Hardware requirements for running Ollama without nasty surprises

Last updated: December 23, 2025
  • Ollama's performance depends mainly on RAM, GPU, and model quantization, not so much on the application itself.
  • With 16 GB of RAM and an 8–12 GB GPU, quantized 7B–13B models are comfortable for everyday use.
  • 30B–70B models need GPUs with 16–32 GB of VRAM and at least 32 GB of RAM to be genuinely usable.
  • Choosing the right model size and format for your hardware prevents crashes and makes smooth, private local AI possible.

Ollama hardware requirements

If you are thinking about running artificial-intelligence models on your own computer, sooner or later you will run into Ollama. And that is exactly where the big question arises: what hardware do I need for the models to run well and not crawl? It is not enough for them to start; the key is being able to use them comfortably every day, and for that it helps to know the kinds of computer hardware involved.

Throughout this article we will look in detail at what Ollama does, what the different model sizes (7B, 13B, 70B, etc.) require, how CPU, GPU, RAM, and disk affect performance, and which configurations suit your situation, whether you want a simple writing assistant or you intend to run beasts like Llama 3 with tens of billions of parameters, or vision and OCR models.

What is Ollama, and why does hardware make such a big difference?

In essence, Ollama is a language-model client that lets you run LLMs locally on your machine, without relying on the cloud. It uses engines such as llama.cpp for inference and wraps all that complexity in a simple tool, with a CLI and a REST API, built on top of the neural-network inference runtimes that power these models.
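As a quick illustration of that REST API, here is a minimal sketch of a request against a locally running server (the default port 11434 and the model name "llama3" are assumptions; you would have pulled the model beforehand):

```shell
# Build a minimal request body for Ollama's /api/generate endpoint.
# "llama3" is an example model name; stream=false asks for a single JSON reply.
payload='{"model":"llama3","prompt":"Explain VRAM in one sentence.","stream":false}'
echo "$payload"
# With the server up, you would send it like this:
# curl -s http://localhost:11434/api/generate -d "$payload"
```

The same endpoint is what web front-ends like Open WebUI talk to behind the scenes.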

Its role is to be a "command center" where you download, manage, and run models such as Llama 3, Mistral, Gemma, Phi, Qwen, DeepSeek, or multimodal models like LLaVA. Its advantage is that you can use it completely offline, keep your data at home, and avoid paying per token as you would with cloud APIs.

However, although Ollama itself is lightweight and undemanding, the models it runs are very resource-hungry. Each LLM has millions or billions of parameters, and that translates into gigabytes of memory and storage, plus a heavy load on the CPU and, if you have one, the GPU.

So if someone tries to run a large model (for example, a 70B Llama) on a computer with a powerful CPU but a modest GPU and barely enough RAM, the usual result is that it "works, technically", but so slowly that it is useless. The key is to balance CPU, GPU, RAM, disk, and model type properly.

Model types in Ollama and how they affect requirements

In Ollama's library you will see models organized by family and size: 1B, 2B, 4B, 7B, 13B, 30B, 65B, 70B, 405B... That number (B for billions) indicates the approximate parameter count, and it is one of the factors that most determines the hardware required.

We can group them loosely into categories, which helps a lot when estimating what machine you need to run each group of models comfortably:

  • Tiny models (270M – 4B): designed for modest machines (light laptops, even phones or mini-PCs). Fast, but with limited reasoning ability.
  • Small models (4B – 14B): the sweet spot for balanced "home" models. Great for everyday chat, office tasks, light coding help, and so on.
  • Mid-size models (14B – 70B): these already play in a different league; they need powerful hardware, plenty of RAM and, if possible, a GPU with lots of VRAM.
  • Large models (> 70B): beasts designed for near-professional infrastructure (high-end GPUs, multiple graphics cards, dedicated servers, well-equipped high-end Macs, etc.).

Beyond size, another factor comes into play: quantization. In Ollama you will see suffixes like q4_K_M, q5_1, q3_K_S, q8_0, f16, etc. These labels indicate how much the model's weights have been compressed:

  • FP16 / FP32 (f16, f32): barely compressed; top quality but heavy memory usage. A 7B in FP16 already needs around 14 GB for the weights alone.
  • Q4 (q4_0, q4_K_M…): 4-bit quantization; a large size reduction with a moderate impact on quality. Usually the "sweet spot".
  • Q3, Q2 (q3_K_S, q2_K…): very aggressive quantization; very small size in exchange for some loss of accuracy. Useful on very limited hardware.
  • Q5, Q6, Q8: intermediate steps between aggressive compression and FP16; higher quality, higher memory usage.

The practical consequence is clear: the same 7B model can occupy around 28 GB in FP32, about 14 GB in FP16, or roughly 4 GB in Q4. That translates directly into the GPU VRAM you need and the amount of RAM that has to back the load.
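The arithmetic behind those figures is simple: parameters (in billions) × bits per weight ÷ 8 gives the approximate weight size in GB (real files add some overhead for embeddings and metadata). A rough sketch, assuming ~4.5 effective bits per weight for Q4 once quantization scales are included:

```shell
# Approximate model weight size in GB: params (billions) * bits per weight / 8.
size_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}
size_gb 7 32    # 7B at FP32 → 28.0
size_gb 7 16    # 7B at FP16 → 14.0
size_gb 7 4.5   # 7B at ~Q4  → 3.9
```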

Minimum and recommended hardware requirements for running Ollama locally

If your worry is whether your computer can handle Ollama at all, the answer is usually yes; the real question is which model you will be able to run comfortably. We will break it down by component: RAM, CPU, GPU, and disk, with practical recommendations based on real-world performance and documentation from various specialized guides.

RAM: the most critical resource

RAM is the first bottleneck when talking about local LLMs. In general terms, you can think in these tiers:

  • 8 GB of RAM: a workable floor. It allows small models (1B, 3B, some heavily quantized 7B variants). You will feel the limits, though, especially if the operating system and your browser are already using a lot of memory; everything is likely to run slower, with extra latency.
  • 16 GB of RAM: the sensible standard today. Comfortable for 7B and even 13B models quantized to Q4, especially with a GPU. You can hold complex conversations without the system grinding to a halt.
  • 32 GB of RAM or more: recommended if you want mid-size models (30B, 40B, 70B) or heavy workloads such as very long contexts, several models on the same server, multiple users, or web front-ends like Open WebUI on top of Ollama.

Remember that RAM is not consumed by the model alone: the operating system, the browser, your IDE, Docker, Open WebUI, and so on all claim their share. If you need to free memory in certain scenarios, it helps to learn how to reduce RAM usage in applications such as the browser. For intensive use, 16 GB is currently the "comfortable minimum" and 32 GB is starting to be a very sensible target.

CPU: modern instruction sets and core count

Ollama can run on CPU alone, but the experience varies enormously depending on the processor. Beyond core count, it matters that the CPU supports advanced instruction sets such as AVX2 and, better still, AVX-512, which accelerate the matrix and vector operations LLMs lean on so heavily.

As a reasonable guideline:

  • Acceptable minimum: a modern quad-core CPU (for example, a recent-generation Intel i5 or an equivalent Ryzen) with AVX2 support. With that you can run 7B models with some patience, especially well-quantized ones.
  • Recommended: a recent processor, 11th-generation Intel or later or AMD Zen 4, with 8 or more cores and AVX-512 support where available. That way you get better response times and fewer bottlenecks, even alongside a GPU.

If your plan is to run very large models (for example, trying a 70B Llama 3 on a CPU plus a small GPU), the CPU will struggle and you will see very high token-generation times. In those cases, the smartest move is to choose smaller models or invest in a suitable GPU.

GPU and VRAM: when it matters and how much you need

A GPU is not mandatory, but it marks a turning point. A good GPU with enough VRAM can turn a sluggish experience into something genuinely usable, especially with 7B to 13B models and quantized variants.

As a very useful reference for quantized models (roughly Q4), you can estimate something like this:

  • 7B → ~4 GB of VRAM
  • 13B → ~8 GB of VRAM
  • 30B → ~16 GB of VRAM
  • 65–70B → ~32 GB of VRAM
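Those rules of thumb are easy to turn into a quick check. A sketch using the estimates above (the thresholds are purely indicative, not guarantees):

```shell
# Largest model class (at ~Q4) that the rule-of-thumb table says fits
# a given amount of VRAM in GB.
largest_fit() {
  v="$1"
  if   [ "$v" -ge 32 ]; then echo "65-70B"
  elif [ "$v" -ge 16 ]; then echo "30B"
  elif [ "$v" -ge 8  ]; then echo "13B"
  elif [ "$v" -ge 4  ]; then echo "7B"
  else                       echo "sub-4B models"
  fi
}
largest_fit 8    # → 13B
largest_fit 24   # → 30B
```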

These are approximate figures, but they make it clear that a GPU like an RTX 2060 SUPER with 8 GB of VRAM is plenty for 7B and can handle 13B, but falls far short for 70B. Even with an i9 and 64 GB of RAM, the system would be forced to spill much of the load onto RAM and CPU, and latency would shoot up.

In practical terms:

  • With 4–6 GB of VRAM: focus on well-quantized 7B models. They work very well for chat, writing, and everyday tasks.
  • With 8–12 GB of VRAM: you can work comfortably with 7B and 13B, and even 30B if you are willing to go slower.
  • With 20–24 GB of VRAM: now you enter the territory of serious 30B–40B models, plus heavily quantized 70B, especially if you back them with plenty of RAM.
  • With 32 GB of VRAM or more: this is where 70B starts to make real sense for interactive use, as long as the rest of the system keeps up.

For OCR or other specialized models (e.g., vision), a GPU with 20–24 GB of VRAM is a very solid base for smooth operation, especially when the model involves tens of billions of parameters. For lighter OCR or vision variants (2B–7B), 8–12 GB will be perfectly sufficient.

Disk storage: how much space models take

As for disk space, the Ollama application itself takes very little; what really eats space is the models. For a basic or testing setup, some 50 GB will do, but once you start collecting models, things grow quickly.

As a rough guide for quantized models:

  • Small models (1B–4B) → around 2 GB per model.
  • Mid-size models (7B–13B) → typically 4–8 GB per model depending on quantization.
  • Large models (30B–70B) → easily 16–40 GB each.
  • Very large models (> 100B) → can exceed 200 GB per model and even reach terabytes in extreme cases.

It is a good idea to use a fast SSD (NVMe if possible) so the initial model load is quick. In addition, Ollama lets you change where models are stored via the OLLAMA_MODELS environment variable, so you can use a large secondary drive and keep the primary one uncluttered; for more on capacity and drive types, see the storage hardware guide.
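OLLAMA_MODELS is a standard Ollama environment variable; a minimal sketch of relocating the model store (the path below is just an example):

```shell
# Point Ollama's model store at a roomier location (example path).
export OLLAMA_MODELS="$HOME/big-drive/ollama-models"
mkdir -p "$OLLAMA_MODELS"
# Restart the server/service so it picks up the new path, e.g. on Linux:
# systemctl restart ollama
```

Set the variable in your shell profile (or the service's environment) so it survives reboots.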

Specific requirements for running particular models with Ollama

Although each model has its own quirks, with Ollama's current catalog some clear guidelines emerge for the common usage categories: general chat, coding, vision/OCR models, and large 70B-class models.

General chat models (Llama, Mistral, Gemma, Qwen…)

For typical "local ChatGPT" use with models such as Llama 3.x 7B/8B, Mistral 7B, Gemma 2B/7B, or a mid-size Qwen, a sensible setup today would look like this:

  • Reasonable minimum:
    • A modern quad-core CPU with AVX2.
    • 16 GB of RAM.
    • No GPU, or a basic GPU with 4–6 GB of VRAM.
    • At least a 50 GB SSD for the system plus one or two models.
  • A very comfortable configuration with plenty of headroom for 7B–13B:
    • A CPU with 8 or more cores (a modern i7/i9 or Ryzen 7/9).
    • 32 GB of RAM if you like to keep many things open.
    • A GPU with 8–12 GB of VRAM (RTX 3060/3070 or equivalent, AMD RX 6700 or better, or a well-specced Mac with M1/M2/M3).
    • A 1 TB SSD if you plan to collect models.

In these scenarios, 7B models with Q4_K_M or Q5_K_M quantization perform extremely well and deliver more than enough quality for personal use, technical writing, study tasks, or writing assistance.
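In Ollama's library those quantizations are exposed as tags on each model. A sketch (the tag below is an example; the exact tags published vary per model, so check the library page):

```shell
# Explicitly quantized tag for an 8B chat model (example tag name).
model="llama3:8b-instruct-q4_K_M"
echo "would pull: $model"
# With Ollama installed, download and chat with it:
# ollama pull "$model"
# ollama run "$model"
```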

Coding models (DeepSeek, CodeLlama, code-focused Phi)

Models focused on programming generally have requirements similar to general chat models of the same size, but it is wise to allow extra RAM and VRAM if you will run them alongside a heavy IDE and several open projects.

For example, to run something like DeepSeek-Coder at 7B–8B, or a CodeLlama of similar size, comfortably, a sensible combination would be:

  • A modern 6–8-core CPU.
  • 32 GB of RAM if you work with many tools at once (IDE, browser with many tabs, Docker, etc.).
  • A GPU with at least 8 GB of VRAM to run the model smoothly.

It also works on much weaker hardware, but you will notice longer response times when generating long blocks of code or complex analyses. For smaller models, such as Phi-4 Mini, the requirements are much lower, and they run well even on 16 GB systems with a modest GPU.

Vision and OCR models (LLaVA, OCR models, multimodal models)

Models capable of processing images (vision/OCR), such as LLaVA, multimodal variants of Llama 3.x, and certain OCR models, add an extra layer of complexity. At the hardware level they roughly match the requirements of a text model of the same size, but they benefit greatly from a GPU.

If we are talking about a mid-size OCR model (say, in the 7B–13B range) and you want to run it comfortably at home to process documents, scanned images, and so on, it makes sense to aim for something like:

  • A GPU with 20–24 GB of VRAM if the model is really large or if you want to offload almost all processing to the card.
  • A GPU with 8–12 GB of VRAM if you choose lighter, well-quantized variants; they will keep performing well as long as you do not overdo image sizes or huge contexts.
  • A minimum of 16 GB of RAM, although 32 GB gives a much more comfortable margin for heavy use.
  • A modern CPU so it does not become a bottleneck while the GPU is loaded.

The direct answer to the common question "can I run an OCR model on a GPU with 20–24 GB of VRAM?" is yes; that is an excellent range for mid-to-large vision/OCR models in Ollama, as long as you have enough RAM and a decent CPU.

Large models (Llama 3 70B and similar)

Trying to run Llama 3 70B with a very powerful CPU (for example, an 11th-generation i9) and 64 GB of RAM but a GPU like an 8 GB RTX 2060 SUPER is a perfect example of "yes, but no." The model may eventually load, but:

  • Part of the model does not fit in VRAM and leans heavily on RAM.
  • The CPU has to take on much of the inference work.
  • Time per token rises and the experience stops being usable at all.

For a 70B to make sense at home or in non-professional environments, you need, at a minimum, something like:

  • 32 GB of RAM as a baseline, 64 GB if you want extra headroom.
  • A GPU with at least 24–32 GB of VRAM to load most of the model at a reasonable quantization (Q4_K_M or similar).
  • A high-end CPU with 8–16 cores.

If you do not meet these numbers, you are far better off running well-quantized 7B–13B models or, if you truly need 70B-level quality, considering a dedicated server (local or in the cloud), a very powerful Mac, or several GPUs working together.

Requirements for installing Ollama on a VPS or server

Another very common approach is to install Ollama on a VPS or dedicated server and use it through the API or a web interface (for example, with Open WebUI). This involves not only resources but also the operating system and permissions.

In guides from providers such as Hostinger, the following minimums are recommended for a VPS aimed at Ollama:

  • RAM: at least 16 GB so small and mid-size models do not choke the system.
  • CPU: 4–8 vCores, depending on model sizes and the number of concurrent users.
  • Storage: at least 12 GB, although in practice it is advisable to aim higher (50–100 GB) if you will try several models.
  • Operating system: above all Linux, preferably Ubuntu 22.04 or later, or a recent stable Debian.
  • Root access or sudo permissions to install dependencies, configure systemd, and so on.

If your VPS includes an NVIDIA GPU, you will also need to install and configure CUDA or the NVIDIA container toolkit if you use Docker. With AMD, ROCm is the usual route on Linux, along with the appropriate Adrenalin drivers on Windows. In GPU-less environments, the server will rely on CPU and RAM, so keep expectations modest; you can also manage it remotely over a remote desktop connection if you need a graphical interface.


Typical hardware scenarios and which models to use on each

To keep all of the above from being pure theory, it helps to look at some common hardware combinations and which model types suit each case when using Ollama.

A mid-range desktop or laptop

Imagine a typical setup:

  • An i5 or Ryzen 5 CPU from a few years ago (4–6 cores).
  • 16 GB of RAM.
  • An integrated GPU, or a dedicated one with 4 GB.
  • A 512 GB SSD.

In this scenario, the sensible thing is to aim for:

  • Quantized 1B–3B models (Gemma 2B, Phi-4 Mini, Llama 3.x 1B) for maximum fluidity.
  • 7B models at Q4 if you accept somewhat longer response times.
  • Using Ollama from the terminal and, if you want a web interface, running Open WebUI carefully so as not to overload the RAM.

You will be able to have a local writing assistant, generate summaries, do some analysis, and handle light editing tasks, but it is not the right environment for 13B models and above.

Mid-to-high-end machines focused on local AI

Here we are talking about a PC of this sort:

  • A modern i7/i9 or Ryzen 7/9 CPU with 8–16 cores.
  • 32 GB of RAM.
  • A GPU with 12–24 GB of VRAM (RTX 4070/4080, 3090, 4090, an AMD equivalent, or similar).
  • A 1–2 TB SSD.

This configuration greatly widens the range of possibilities:

  • Solid 7B–13B at Q4/Q5 for chat, code, data analysis… with very good response times.
  • 30B models, and even some quantized 70B if you accept extra latency.
  • Mid-size vision/OCR models leaning heavily on the GPU.

This is the kind of machine on which you can assemble a serious local AI environment, with multiple models, a web interface, REST API integrations, and professional workflows without depending on external services.

A "beast-mode" local server or workstation

At the high end we find setups with:

  • Several GPUs with 24–48 GB of VRAM each, or a single top-tier one.
  • 64–128 GB of RAM.
  • Many-core CPUs, such as recent Threadripper or Xeon models.

This is where large models (>70B, MoE, heavy vision, etc.) start to be practical, even with several simultaneous users or complex integrations. It is obviously a much more expensive scenario, but it also gives you capabilities comparable to some commercial APIs, with full control of your data inside your own infrastructure.

Useful tips to get the most out of your hardware with Ollama

Short of buying more RAM or a better GPU, there are several common practices that help you get more out of what you already have and avoid surprises when running large models with Ollama.

First, it pays to choose the right model for the use case. It makes no sense to use a 70B to draft simple emails when a well-tuned 7B is perfectly sufficient. Likewise, a 30B makes no sense if your GPU only has 6 GB of VRAM; a 7B at Q4 will be the better choice.

Another important lever is playing with runtime parameters (temperature, num_ctx, num_predict, etc.), either in a Modelfile or via the CLI/API. Using absurdly large contexts (num_ctx of 32k or more) when you have little RAM or VRAM will slow the whole system down without adding much in most scenarios.
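Those parameters can be baked into a Modelfile so the limits apply every time. A minimal sketch (the base model name is an example):

```shell
# Create a variant with a capped context and output length via a Modelfile.
cat > Modelfile <<'EOF'
FROM llama3:8b
PARAMETER num_ctx 4096
PARAMETER num_predict 512
PARAMETER temperature 0.7
EOF
# Register and use the variant like any other model:
# ollama create llama3-4k -f Modelfile
# ollama run llama3-4k
```

Keeping num_ctx at 4096 instead of 32k dramatically shrinks the KV-cache memory the model needs per session.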

It is also advisable to keep an eye on which models are loaded and on which processor, using ollama ps. There you will see whether a model is really running on the GPU or on the CPU, and how much memory it has loaded. Adjusting the OLLAMA_KEEP_ALIVE variable makes models unload from memory when idle, freeing up resources.
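Both knobs are quick to use from a shell; a sketch:

```shell
# Unload idle models after 5 minutes instead of keeping them resident.
export OLLAMA_KEEP_ALIVE=5m
echo "keep-alive set to $OLLAMA_KEEP_ALIVE"
# With the server running, inspect what is loaded and where:
# ollama ps    # shows each loaded model's size and its CPU/GPU split
```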

Finally, remember that quantization is your friend. Creating Q4_K_M or Q5_K_M variants of a model originally in FP16 lets you run on far more modest hardware, with a quality loss that is usually unnoticeable in real use.

After seeing the whole picture, the clear takeaway is that Ollama is not the demanding component; the models are. Understanding how size, quantization, RAM, and VRAM relate lets you choose the right hardware and LLM combination for your needs: from a 16 GB laptop running a light 7B to a workstation with a 24 GB GPU driving solid vision and OCR models. By carefully adjusting expectations and parameters, it is entirely possible to have powerful, private AI running on your own machine without a monthly bill.
