Data scientists and machine learning practitioners handle a significant volume of data of many different kinds in a typical data science project. Many advanced models come with varied configurations and features, and hyperparameter tuning goes through many iterations before it reaches optimal performance.
In such a situation, every data change and every adjustment to the model-building process must be tracked and measured to determine what worked and what did not. It is also important to be able to return to an earlier configuration and inspect earlier results.
Data version control, which helps manage data and models and produce reproducible results, is one of the technologies that let us keep track of all of this.
In this post, we take a closer look at data version control and at the best tools you can use for it. Let's get started.
What Is Data Version Control?
Every production system needs versioning and a single point of access to the most recent data. Any resource that changes frequently, especially one that several users modify at the same time, needs an audit trail so that every change can be traced.
A version control system is responsible for keeping everyone on the team on the same page. It ensures that everyone works on the latest version of a file and, more importantly, that everyone can collaborate on the same project at the same time.
With the right tools, you can achieve this with little effort!
With a reliable data versioning strategy, you get immutable datasets and a complete history of all your research. Data versioning tools are essential to your workflow if you care about reproducibility, traceability, and ML model lineage.
They give you a version identifier for an artifact, such as the hash of a dataset or a model, that you can use to identify and compare it. This data version is usually recorded in your metadata management solution so that your model training is versioned and reproducible.
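To make the idea of a dataset version concrete, here is a minimal sketch in Python, assuming the dataset is a directory of files. The function names are hypothetical and not taken from any of the tools below; the sketch hashes each file's contents with SHA-256 and then hashes the sorted list of relative paths and file hashes, so any change to the data yields a different identifier.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hash one file in 1 MiB chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def dataset_fingerprint(root: str) -> str:
    """Return one SHA-256 digest identifying the exact contents of a directory.

    Files are visited in sorted relative-path order, so the result does not
    depend on the order in which the filesystem lists them.
    """
    root_path = Path(root)
    files = [p for p in root_path.rglob("*") if p.is_file()]
    digest = hashlib.sha256()
    for path in sorted(files, key=lambda p: p.relative_to(root_path).as_posix()):
        rel = path.relative_to(root_path).as_posix()
        # Each entry is "relative/path<TAB>file-hash<NEWLINE>".
        digest.update(f"{rel}\t{file_sha256(path)}\n".encode("utf-8"))
    return digest.hexdigest()
```

Storing this string next to a training run's parameters is one simple way to record exactly which data a model was trained on.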
The Best Data Version Control Tools
Now it is time to look at the leading data version control solutions available, which you can use to keep track of every part of your work.
1. Git LFS
The Git LFS project is free to use. Inside Git, large files such as audio samples, videos, datasets, and images are replaced with text pointers, while the file contents are stored on a remote server such as GitHub.com or GitHub Enterprise.
It lets you use Git to version large files, even ones a few GB in size, host more in your Git repositories by using external storage, and clone and fetch repositories that contain large files much faster. For data management, it is about as simple as it gets: after a one-time setup, you keep working with ordinary Git commands and no extra toolkit.
It limits the amount of data you download, which means cloning repositories and fetching large files from them is faster. The pointers are lightweight text files that point to the content stored in LFS.
As a result, when you push your repo to the main repository, the push completes quickly and takes up little space.
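The pointer mechanism described above can be sketched in Python. This builds a pointer in the layout Git LFS uses for its v1 pointer files (a `version` line, an `oid sha256:` line, and a `size` line). In practice the `git lfs` client generates and resolves these for you once you run commands such as `git lfs track`, so the helper names here are purely illustrative.

```python
import hashlib

LFS_SPEC = "https://git-lfs.github.com/spec/v1"


def lfs_pointer(content: bytes) -> str:
    """Build the small text pointer that is committed in place of a large file."""
    oid = hashlib.sha256(content).hexdigest()
    return f"version {LFS_SPEC}\noid sha256:{oid}\nsize {len(content)}\n"


def parse_lfs_pointer(text: str) -> dict:
    """Recover the object id and size the pointer refers to."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    return {"oid": fields["oid"].split(":", 1)[1], "size": int(fields["size"])}
```

Only the few bytes returned by `lfs_pointer` live in Git history; the content itself is looked up on the LFS server by its SHA-256 `oid`.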
Pros
- Integrates easily into most companies' development workflows.
- No need to manage extra permissions, because it uses the same permissions as the Git repository.
Cons
- Git LFS requires dedicated servers to store your data. As a result, your data science teams become locked in, and your engineering workload grows.
- It is highly specialized and may require a range of other tools for later stages of the data science workflow.
Pricing
Free for everyone to use.
2. LakeFS
LakeFS is an open-source data versioning solution that stores data in S3 or GCS and offers a Git-like branching and committing paradigm that scales to petabytes.
This branching approach makes your data ACID-compliant by letting changes happen on isolated branches that can be created, merged, and rolled back atomically and instantly.
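The branch, commit, merge, and revert workflow can be illustrated with a toy in-memory model. This is not the lakeFS API and ignores storage and proper conflict detection; it only shows why creating a branch is cheap (it copies a pointer, not data) and why a commit or rollback is atomic (a single pointer swap). The `ToyRepo` and `Commit` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Commit:
    objects: Dict[str, str]          # path -> content; never mutated after creation
    parent: Optional["Commit"] = None
    message: str = ""


class ToyRepo:
    """Toy Git-like branching over data objects; not the lakeFS API."""

    def __init__(self) -> None:
        self.branches: Dict[str, Commit] = {"main": Commit(objects={})}
        self.staging: Dict[str, Dict[str, str]] = {"main": {}}

    def branch(self, name: str, source: str = "main") -> None:
        # Zero-copy branch: the new branch points at the same commit object.
        self.branches[name] = self.branches[source]
        self.staging[name] = {}

    def put(self, branch: str, path: str, value: str) -> None:
        # Uncommitted writes stay isolated in the branch's staging area.
        self.staging[branch][path] = value

    def commit(self, branch: str, message: str) -> None:
        head = self.branches[branch]
        objects = {**head.objects, **self.staging[branch]}
        # One pointer assignment: readers see the old or the new state, never a mix.
        self.branches[branch] = Commit(objects, head, message)
        self.staging[branch] = {}

    @staticmethod
    def _history(commit: Optional[Commit]) -> List[Commit]:
        chain = []
        while commit is not None:
            chain.append(commit)
            commit = commit.parent
        return chain

    def merge(self, source: str, target: str = "main") -> None:
        src, dst = self.branches[source], self.branches[target]
        dst_ids = {id(c) for c in self._history(dst)}
        # Branch point: the newest commit on the source that the target also has.
        base = next(c for c in self._history(src) if id(c) in dst_ids)
        # Apply only what changed on the source since the branch point.
        changes = {p: v for p, v in src.objects.items() if base.objects.get(p) != v}
        merged = {**dst.objects, **changes}  # if both sides changed a path, source wins
        self.branches[target] = Commit(merged, dst, f"merge {source} into {target}")

    def revert(self, branch: str) -> None:
        # Roll back by moving the branch pointer to the previous commit.
        head = self.branches[branch]
        if head.parent is not None:
            self.branches[branch] = head.parent
```

Because commits are never modified, a rollback is just a pointer move; a real system would also report conflicting edits instead of letting the source branch win.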
LakeFS enables teams to run reproducible, atomic, and versioned data lake operations. It is a newcomer to the scene, but it is a force to be reckoned with.
It applies Git-like branching and version control to the data in your lake and can scale to petabytes of data. It is worth a look even if you need version control at exabyte scale.
Pros
- Git-like operations such as branch, commit, merge, and revert.
- Pre-commit and pre-merge hooks enable CI/CD for data.
- Provides advanced features such as ACID transactions on simple cloud object storage like S3 and GCS, while remaining format-agnostic.
- Revert changes to data instantly.
- Scales easily, supporting very large data lakes. Version control can be provided in both development and production environments.
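Commit and merge hooks are typically used to run data quality checks before changes reach the main branch. The function below is a hypothetical example of the kind of check such a hook might call, assuming the data is CSV; how it is wired into lakeFS (for example, behind a webhook) is left out because that configuration is tool-specific.

```python
import csv
import io
from typing import Iterable, List


def pre_merge_check(csv_text: str, required_columns: Iterable[str],
                    non_null: Iterable[str] = ()) -> List[str]:
    """Return a list of problems; an empty list means the merge may proceed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    problems = [f"missing column: {c}" for c in required_columns if c not in header]
    non_null = list(non_null)
    # Data rows are numbered from 1, not counting the header.
    for row_no, row in enumerate(reader, start=1):
        for col in non_null:
            if col in header and not (row.get(col) or "").strip():
                problems.append(f"row {row_no}: empty {col}")
    return problems
```

A hook would block the commit or merge whenever the returned list is non-empty, which keeps bad data off the production branch.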
Cons
- LakeFS is a new product, so its features and documentation may change faster than those of more established solutions.
- Because it focuses on data versioning, you will need other tools for other parts of the data science workflow.
Pricing
Free for everyone to use.
3. DVC
Data Version Control (DVC) is a free data versioning solution designed for data science and machine learning applications. It lets you define your pipeline in any language.
By managing large files, datasets, machine learning models, code, and more, the tool makes machine learning models shareable and reproducible. It follows Git's lead, offering a simple command line that can be set up in just a few steps.
As its name suggests, DVC does more than version data. It also helps teams manage machine learning pipelines and models.
Finally, DVC helps improve the consistency and reproducibility of your team's models. Instead of complicated file suffixes and comments in code, use Git branches to try new ideas. To keep track of results, use automatic metric tracking instead of pen and paper.
To move consistent bundles of machine learning models, data, and code into production, onto remote machines, or to a colleague's desktop, you can use push/pull commands instead of ad-hoc scripts.
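DVC's core idea is to hash data files, keep their contents in a cache keyed by that hash, and commit only a small metadata file to Git; `dvc add`, `dvc push`, and `dvc pull` then move the cached contents to and from remote storage. The sketch below imitates that flow with the standard library only, assuming a local directory stands in for the remote. The JSON metadata file and the `add`/`push`/`pull` helpers are simplified stand-ins, not DVC's actual `.dvc` file format or internals.

```python
import hashlib
import json
import shutil
from pathlib import Path


def _md5(path: Path) -> str:
    h = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def add(data_file: Path, cache: Path) -> Path:
    """Copy a file into a content-addressed cache and write a small metadata file."""
    digest = _md5(data_file)
    target = cache / digest[:2] / digest[2:]  # e.g. cache/ab/cdef...
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():                   # identical content is stored once
        shutil.copy2(data_file, target)
    meta = data_file.with_name(data_file.name + ".meta.json")
    meta.write_text(json.dumps({"path": data_file.name, "md5": digest}))
    return meta                               # this small file is what Git would track


def push(cache: Path, remote: Path) -> None:
    """Upload cached objects that the remote does not have yet."""
    for obj in cache.rglob("*"):
        if obj.is_file():
            dest = remote / obj.relative_to(cache)
            if not dest.exists():
                dest.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(obj, dest)


def pull(meta: Path, remote: Path) -> Path:
    """Restore the data file described by a metadata file from the remote."""
    info = json.loads(meta.read_text())
    src = remote / info["md5"][:2] / info["md5"][2:]
    out = meta.with_name(info["path"])
    shutil.copy2(src, out)
    return out
```

Because objects are keyed by their hash, pushing twice uploads nothing new, and any commit's metadata file is enough to restore the exact bytes it refers to.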
Pros
- Lightweight, open source, and works with all major cloud platforms and storage types.
- Flexible, format- and framework-agnostic, and easy to use.
- Every ML model can be traced back to its source code and data.
Cons
- DVC's pipeline management and version control are tightly coupled. If your team already uses another data pipeline product, there will be redundant functionality.
- Because DVC is lightweight, your team may need to build extra features by hand to make it easy to use.
Pricing
Free for everyone to use.
4. Delta Lake
Delta Lake is an open-source storage layer that improves data lake reliability. It provides ACID transactions and scalable metadata handling, and it unifies streaming and batch data processing.
It works with the Apache Spark APIs and sits on top of your existing data lake. Delta Sharing, described by its creators as the first open protocol for secure enterprise data sharing, makes it easy to share data with other organizations regardless of the computing platforms they use.
Delta Lake can handle petabytes of data with ease. Metadata is stored just like the data, and users can inspect it with the Describe Detail command. Delta Lake has a unified architecture that can read both streaming and batch data.
Upserts are easy to perform with Delta. These upserts, or merges, into a Delta table work much like a SQL MERGE statement: you can use them to merge data from another DataFrame into your table and perform updates, inserts, and deletes.
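The semantics of such an upsert can be shown without Spark. This pure-Python sketch applies MERGE-style rules to lists of row dictionaries: matching keys are updated, new keys are inserted, and rows flagged for deletion are removed. The `_delete` flag and the `merge_upsert` name are assumptions made for the example; in Delta Lake itself you would express this with a SQL `MERGE INTO` statement or the merge builder on a Delta table.

```python
from typing import Dict, List


def merge_upsert(target: List[Dict], source: List[Dict], key: str = "id") -> List[Dict]:
    """Apply MERGE-style semantics of source rows onto target rows, matched by key."""
    rows = {row[key]: dict(row) for row in target}
    for change in source:
        k = change[key]
        values = {c: v for c, v in change.items() if c != "_delete"}
        if change.get("_delete"):
            rows.pop(k, None)       # WHEN MATCHED AND flagged THEN DELETE
        elif k in rows:
            rows[k].update(values)  # WHEN MATCHED THEN UPDATE
        else:
            rows[k] = values        # WHEN NOT MATCHED THEN INSERT
    return sorted(rows.values(), key=lambda r: r[key])
```

For example, merging `[{"id": 2, "v": "B"}, {"id": 3, "v": "c"}, {"id": 1, "_delete": True}]` into a table holding ids 1 and 2 updates row 2, inserts row 3, and deletes row 1 in a single pass.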
Pros
- Many capabilities, such as ACID transactions and robust metadata management, can be added to your existing data storage solution.
- Delta Lake can easily handle tables with billions of partitions and files at petabyte scale.
- Reduces the need for manual data version control and other data chores, letting developers focus on building products on top of their data lakes.
Cons
- Because it was designed to work with Spark and big data, Delta Lake is often overkill for many workloads.
- It requires a dedicated data format, which limits its flexibility and can make it incompatible with your current formats.
Pricing
Free for everyone to use.
5. Dolt
Dolt is a SQL database that you can fork, clone, branch, merge, push, and pull just like a Git repository. To improve the user experience of a version-controlled database, Dolt lets data and schema evolve together.
It is a great tool for you and your colleagues to collaborate on. You can connect to Dolt just as you would to any other MySQL database and run queries or change data using SQL commands.
When it comes to data versioning, Dolt is one of a kind: it is a database, unlike other solutions that only version data. Although the software is still in its early stages, the aim is to make it fully compatible with Git and MySQL in the near future.
All the commands you are used to from Git work with Dolt. Git versions files; Dolt versions tables. Using the command-line interface, you can import CSV files, make your changes, push them to a remote, and merge your colleagues' changes.
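To illustrate what "versioning tables instead of files" means, the following toy uses Python's built-in `sqlite3` module to snapshot a table at each commit and diff two commits by primary key. It is not how Dolt stores data and does not use Dolt's commands; the `VersionedTable` class is hypothetical and only demonstrates the row-level added/removed/modified view that a versioned database can offer.

```python
import json
import sqlite3
from typing import Dict, List


class VersionedTable:
    """Toy table with commit snapshots and row-level diffs (not Dolt's API)."""

    def __init__(self) -> None:
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
        self.db.execute(
            "CREATE TABLE commits (version INTEGER PRIMARY KEY AUTOINCREMENT,"
            " message TEXT, snapshot TEXT)"
        )

    def execute(self, sql: str, params=()) -> None:
        # Ordinary SQL edits against the working table.
        self.db.execute(sql, params)

    def commit(self, message: str) -> int:
        # Store the full table contents as a JSON snapshot and return its version.
        rows = self.db.execute("SELECT id, name FROM people ORDER BY id").fetchall()
        cur = self.db.execute(
            "INSERT INTO commits (message, snapshot) VALUES (?, ?)",
            (message, json.dumps(rows)),
        )
        return cur.lastrowid

    def _snapshot(self, version: int) -> Dict[int, str]:
        (snap,) = self.db.execute(
            "SELECT snapshot FROM commits WHERE version = ?", (version,)
        ).fetchone()
        return {row_id: name for row_id, name in json.loads(snap)}

    def diff(self, old: int, new: int) -> Dict[str, List[int]]:
        """Compare two commits by primary key."""
        a, b = self._snapshot(old), self._snapshot(new)
        return {
            "added": sorted(set(b) - set(a)),
            "removed": sorted(set(a) - set(b)),
            "modified": sorted(k for k in set(a) & set(b) if a[k] != b[k]),
        }
```

Full snapshots keep the example short; a real versioned database stores changes far more compactly, but the diff it reports has the same shape.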
Pros
- Lightweight and open source.
- Compared with more obscure options, it offers a SQL interface, which makes it more accessible to data analysts.
Cons
- Compared with other approaches to versioning database data, Dolt is still a maturing product.
- Because Dolt is a database, you have to move your data into it to benefit.
Pricing
Anyone is welcome to use the community edition. The platform does not publish prices; instead, you need to contact the vendor.
6. Pachyderm
Pachyderm is a feature-rich, free version control system for data science. Pachyderm Enterprise is a powerful data science platform designed for large-scale collaboration in highly secure environments.
Pachyderm is one of the few full data science platforms on this list. Its goal is to provide a platform that manages the complete data lifecycle and makes it easy to reproduce the results of machine learning models. In this context, Pachyderm is known as "the Docker of data": it packages your execution environment in Docker containers, which makes it easy to reproduce the same results.
By combining versioned data with Docker, Pachyderm lets data scientists and DevOps teams deploy models with confidence. Its efficient storage system can hold petabytes of structured and unstructured data while keeping storage costs low.
Across all pipeline stages, file-based versioning provides a complete audit trail for all data and artifacts, including intermediate outputs. These foundations drive many of the tool's capabilities and help teams get the most out of it.
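The idea of a complete audit trail across pipeline stages can be sketched as a provenance log: each stage records content hashes of its inputs and outputs, so any output can be traced back to the exact upstream data. This is an illustration under simplifying assumptions (in-memory byte blobs, truncated SHA-256 hashes), not Pachyderm's implementation; `ProvenanceLog` is a hypothetical name.

```python
import hashlib
from typing import Callable, Dict, List, Tuple


def content_hash(data: bytes) -> str:
    # Truncated for readability; a real system would keep the full digest.
    return hashlib.sha256(data).hexdigest()[:12]


class ProvenanceLog:
    """Records which input versions produced which outputs at each stage."""

    def __init__(self) -> None:
        self.records: List[dict] = []

    def run_stage(self, name: str, func: Callable[[Dict[str, bytes]], Dict[str, bytes]],
                  inputs: Dict[str, bytes]) -> Dict[str, bytes]:
        """Run func on named byte inputs and log the hashes on both sides."""
        outputs = func(inputs)
        self.records.append({
            "stage": name,
            "inputs": {k: content_hash(v) for k, v in inputs.items()},
            "outputs": {k: content_hash(v) for k, v in outputs.items()},
        })
        return outputs

    def lineage(self, output_hash: str) -> List[Tuple[str, str]]:
        """Walk backwards from an output hash to every upstream (stage, input hash)."""
        found, frontier, seen = [], [output_hash], set()
        while frontier:
            target = frontier.pop()
            if target in seen:  # guards against pass-through stages looping forever
                continue
            seen.add(target)
            for rec in self.records:
                if target in rec["outputs"].values():
                    for h in rec["inputs"].values():
                        found.append((rec["stage"], h))
                        frontier.append(h)
        return found
```

Because intermediate outputs are hashed too, the lineage of a model includes every cleaned or transformed file that fed into it, not just the raw inputs.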
Pros
- Because it is container-based, your data environments are portable and easy to move between cloud providers.
- Robust, able to scale from small systems to very large ones.
Cons
- With many moving parts, such as the Kubernetes server required to run the free version of Pachyderm, there is a steep learning curve.
- Pachyderm can be difficult to integrate into a company's existing infrastructure because of its many technical components.
Pricing
You can start using the platform with the community edition; for the enterprise edition, you need to contact the vendor.
7. Neptune
Model-building metadata is managed by an ML metadata store, an essential part of the MLOps stack. Neptune acts as a central metadata store across the entire MLOps workflow.
You can track, visualize, and compare thousands of machine learning models in one place. Neptune includes features such as experiment tracking, a model registry, and model monitoring, along with a collaborative interface. It integrates with more than 25 different tools and libraries, including several model training and hyperparameter tuning tools.
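At its core, an experiment metadata store keeps parameters and metric histories per run and lets you compare runs. The minimal in-memory sketch below shows that idea; it is not Neptune's client library, and `MetadataStore`, `start_run`, `log_metric`, and `best_run` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Run:
    run_id: str
    params: Dict[str, float]
    metrics: Dict[str, List[float]] = field(default_factory=dict)


class MetadataStore:
    """Minimal in-memory stand-in for an experiment metadata store."""

    def __init__(self) -> None:
        self.runs: Dict[str, Run] = {}

    def start_run(self, run_id: str, params: Dict[str, float]) -> Run:
        self.runs[run_id] = Run(run_id, dict(params))
        return self.runs[run_id]

    def log_metric(self, run_id: str, name: str, value: float) -> None:
        # Keep the full history so learning curves can be compared later.
        self.runs[run_id].metrics.setdefault(name, []).append(value)

    def best_run(self, metric: str, higher_is_better: bool = True) -> Run:
        """Compare runs by the last logged value of a metric."""
        scored = [r for r in self.runs.values() if r.metrics.get(metric)]
        pick = max if higher_is_better else min
        return pick(scored, key=lambda r: r.metrics[metric][-1])
```

A hosted store adds persistence, dashboards, and sharing on top, but the underlying record per run (parameters plus metric series) is the same idea.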
You can sign up for Neptune without a credit card; a Gmail account is enough.
Pros
- Easy to integrate with any pipeline, flow, codebase, or framework.
- Real-time views, a simple API, and fast support.
- With Neptune, you can keep a "backup" of all your experiment data in one place and restore it later.
Cons
- It is not fully open source. The individual plan may be enough for personal use, but that access is limited to one month.
- A few minor design bugs still need to be fixed.
Pricing
You can start using the platform with an individual plan that is free for everyone. Paid tiers start at $150/month.
Conclusion
In this post, we discussed the best data versioning tools. As we have seen, each tool has its own set of features. Some are free, while others require payment. Some suit a small business well, while others are a better fit for a large enterprise.
Therefore, choose the software that best fits your goals after weighing the pros and cons. We encourage you to try a free trial before buying a premium product.