That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.

There's a whole lot of stuff in the new release.
DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:

DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.
I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models are something I can easily play with myself. That's where the new distilled models come in.

To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.
This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).

Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?
(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a bit more attention to how they copied those over.)

Licensing aside, these distilled models are fascinating beasts.
Running DeepSeek-R1-Distill-Llama-8B-GGUF
Quantized versions are already starting to show up. So far I've tried just one of those, unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI, and it's really fun to play with.

I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.
First I fetched and ran the model using Ollama itself:
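Something like this, assuming the Q8_0 quantization of the Unsloth GGUF on Hugging Face (the exact tag is a reconstruction):

```sh
# Fetch the GGUF straight from Hugging Face, then start an interactive chat
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```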
This downloads around 8.5GB of model data and starts an interactive chat interface.

Once the model has been fetched, LLM can talk to it as well. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.
I installed the plugin and ran it like this:
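A sketch of those two steps, again assuming the Q8_0 tag:

```sh
# Install the Ollama plugin for LLM, then address the same model through llm
llm install llm-ollama
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```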
Each response starts with a `<think>...</think>` pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:
Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.

First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're generally found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.
[...]
It then thought for 20 paragraphs before outputting the joke! Here's the full transcript and here's the output at the end, after the `</think>` tag:

The Joke:
A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its huge size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place saucers on the table, causing a few spills.

After a series of comical mishaps, the walrus looks up and says with a smile, "This isn't so fishy anymore." The pelican smirks and says, "Maybe not, but we do have a lot of krill in our tea!"
Explanation:
- Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comedic chaos.
- Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.
So yeah, the joke is terrible. But the process of getting there was such an interesting insight into how these new models work.

This was a fairly small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B (currently my favourite GPT-4 class model that I've run on my own machine) I have high expectations.
Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this (a 34GB download):
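Presumably something along these lines; a Q3_K_M quantization of the 70B model is in the right ballpark for a 34GB download, but the tag is an assumption:

```sh
# Pull the larger distilled model, then prompt it through llm-ollama
ollama pull hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M' \
  'a joke about a pelican and a walrus who run a tea room together'
```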
Can it draw a pelican?
I tried my classic Generate an SVG of a pelican riding a bicycle prompt too. It did not do very well:

It looked to me like it got the order of the elements wrong, so I followed up with:
the background ended up covering the rest of the image

It thought some more and gave me this:

As with the earlier joke, the chain of thought in the transcript was far more interesting than the end result.
Other ways to try DeepSeek-R1
If you want to try the model out without installing anything at all you can do so using chat.deepseek.com. You'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then select the "DeepThink" option below the prompt input box.
DeepSeek offer the model via their API, using an OpenAI-compatible format. You can access that via LLM by dropping this into your extra-openai-models.yaml configuration file:
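A minimal entry, following LLM's documented schema for OpenAI-compatible endpoints (model_id is what you pass to -m; api_key_name names the key stored in the next step):

```yaml
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```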
Then run `llm keys set deepseek` and paste in your API key, then use `llm -m deepseek-reasoner 'prompt'` to run prompts.
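Spelled out:

```sh
# Store the DeepSeek API key under the name referenced by api_key_name above
llm keys set deepseek
# Then prompt the reasoner model directly
llm -m deepseek-reasoner 'a joke about a pelican and a walrus who run a tea room together'
```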
This won't show you the reasoning tokens, sadly. Those are served up by the API (example here) but LLM doesn't yet have a way to display them.