Data scientists and machine learning practitioners deal with a significant amount of data of many kinds in a typical data science project. Many models are developed with different configurations and features, along with repeated rounds of parameter tuning to reach the best performance.
In such a setting, every change to the data and every adjustment to the model-building process needs to be tracked and measured to determine what worked and what did not. It is also important to be able to go back to an earlier version and look at previous results.
Data Version Control (DVC), which helps manage data and the underlying models and produce reproducible results, is one of the technologies that lets us keep track of all this.
In this post, we will take a close look at data version control and the best tools for it. Let's get started.
What is Data Version Control?
Versioning is needed in every production system. There should be a single point of access to the most current data. Any resource that is modified frequently, especially by many users at once, requires an audit trail to keep track of all changes.
A version control system is responsible for making sure everyone on the team is on the same page. It ensures that everyone on the team is working on the latest version of a file and, most importantly, that everyone can collaborate on the same project at the same time.
With the right tools, you can achieve this with little effort.
You will have consistent datasets and a thorough record of all your experiments if you use a reliable data versioning approach. Data versioning tools are essential to your workflow if you care about reproducibility, traceability, and ML model lineage.
They help you obtain a version of an artifact, such as a hash of a dataset or model, which you can use to identify and compare it. This data version is usually logged to your metadata management solution to ensure your model training is versioned and reproducible.
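To make the idea of a version identifier concrete, here is a minimal sketch in Python, assuming a dataset small enough to hold in memory as bytes. The function name `dataset_version` is hypothetical and does not belong to any of the tools covered in this post. It simply hashes the content with SHA-256, so identical data always yields the same identifier and any change yields a different one.

```python
import hashlib


def dataset_version(content: bytes) -> str:
    """Return a SHA-256 hex digest that identifies this exact content."""
    return hashlib.sha256(content).hexdigest()


# Two hypothetical snapshots of the same CSV file.
v1 = dataset_version(b"id,label\n1,cat\n2,dog\n")
v2 = dataset_version(b"id,label\n1,cat\n2,dog\n3,bird\n")

print(v1 == dataset_version(b"id,label\n1,cat\n2,dog\n"))  # same bytes, same version
print(v1 == v2)  # an added row changes the version
```

An identifier like this is what you would typically log next to a training run, so that a result can later be tied back to the exact data that produced it.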
Top Data Version Control Tools
Now it is time to look at the best data version control solutions available, which you can use to keep track of every part of your code.
1. Git LFS
The Git LFS project is free to use. Inside Git, large files such as audio samples, videos, databases, and photos are replaced with text pointers, while the file contents are stored on a remote server such as GitHub.com or GitHub Enterprise.
It lets you use Git to version large files, up to several GB in size, host more in your Git repositories by using external storage, and clone and fetch from repositories with large files faster. When it comes to data versioning, this is a very good solution. After the initial setup, you keep your usual Git workflow: day-to-day work needs no additional commands, storage systems, or tooling.
It reduces the amount of data you download. This means that cloning and fetching from repositories with large files is faster. The pointers are lightweight and point to the LFS store.
As a result, when you push your repo to the main repository, it updates quickly and takes up less space.
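To show why pointers are so much lighter than the files they stand for, the sketch below builds pointer text in the layout used by the Git LFS v1 pointer format (a version line, a SHA-256 object ID, and a size). This is only an illustration: real pointers are written by the `git lfs` client, and the helper name `make_lfs_pointer` is hypothetical.

```python
import hashlib

LFS_SPEC_URL = "https://git-lfs.github.com/spec/v1"


def make_lfs_pointer(content: bytes) -> str:
    """Build pointer text in the layout of the Git LFS v1 pointer format."""
    oid = hashlib.sha256(content).hexdigest()
    return (
        f"version {LFS_SPEC_URL}\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


large_file = b"\x00" * 5_000_000  # stand-in for a 5 MB binary asset
pointer = make_lfs_pointer(large_file)

print(pointer)
print(len(pointer.encode()), "bytes in the pointer vs", len(large_file), "bytes of content")
```

Only the small pointer travels with the Git history, while the content itself lives in the LFS store.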
Pros
- It integrates easily into most companies' development workflows.
- There is no need to manage additional permissions, because it uses the same permissions as the Git repository.
Cons
- Git LFS requires dedicated servers to store your data. As a result, your data science teams end up locked in, and your engineering workload increases.
- It is highly specialized, and may require a variety of other tools for later stages of the data science workflow.
Pricing
It is free for everyone to use.
2. LakeFS
LakeFS is an open-source data versioning solution that stores data in S3 or GCS and offers a Git-like branching and committing model that scales to petabytes.
This branching model makes your data lake ACID compliant by allowing changes to happen in isolated branches that can be created, merged, and rolled back atomically and instantly.
LakeFS lets teams build data lake operations that are repeatable, atomic, and versioned. It is a newcomer to the field, but it is a force to be reckoned with.
It uses a Git-like branching and version control approach to interact with your data lake, and it can scale to petabytes of data. Version control can be applied at exabyte scale as well.
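The branch, commit, merge, and roll back cycle can be pictured with a toy in-memory model. This is a sketch of the concept only, not the LakeFS API: the class `ToyRepo` and all of its methods are hypothetical, the merge is deliberately naive (the source branch wins, and deletions are not modeled), and nothing here touches real object storage.

```python
from typing import Dict, List

Snapshot = Dict[str, str]  # object path -> content


class ToyRepo:
    """In-memory toy: each branch is a list of snapshots (commits)."""

    def __init__(self) -> None:
        self.branches: Dict[str, List[Snapshot]] = {"main": [{}]}

    def branch(self, name: str, source: str = "main") -> None:
        # A new branch starts from the source history; snapshots are shared, not copied.
        self.branches[name] = list(self.branches[source])

    def commit(self, branch: str, changes: Snapshot) -> None:
        # A commit is a new snapshot layered on top of the branch head.
        head = self.branches[branch][-1]
        self.branches[branch].append({**head, **changes})

    def merge(self, source: str, target: str) -> None:
        # Naive merge: the source head wins, applied to the target as one commit.
        self.commit(target, self.branches[source][-1])

    def revert(self, branch: str) -> None:
        # Roll back by recording the previous snapshot as a new commit.
        history = self.branches[branch]
        if len(history) >= 2:
            history.append(history[-2])

    def head(self, branch: str) -> Snapshot:
        return self.branches[branch][-1]


repo = ToyRepo()
repo.commit("main", {"sales/2023.csv": "v1"})
repo.branch("experiment")
repo.commit("experiment", {"sales/2023.csv": "v2-cleaned"})
print(repo.head("main"))   # main is untouched by work on the branch
repo.merge("experiment", "main")
print(repo.head("main"))   # the change lands in a single step
repo.revert("main")
print(repo.head("main"))   # back to the pre-merge snapshot
```

The point of the isolation is that readers of `main` never see half-finished work: a change either arrives as one commit or not at all.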
Pros
- Git-like operations, including branching, committing, merging, and reverting.
- Pre-commit and pre-merge hooks are used for data CI/CD checks.
- It provides advanced features such as ACID transactions on simple cloud storage like S3 and GCS, all while remaining format agnostic.
- Revert changes to data in real time.
- It scales easily, allowing it to accommodate very large data lakes. Version control can be provided for both development and production environments.
Cons
- LakeFS is a new product, so functionality and documentation may change more quickly than with more established solutions.
- Because it focuses on data versioning, you will need a variety of additional tools for other parts of the data science workflow.
Pricing
It is free for everyone to use.
3. DVC
Data Version Control is a free data versioning solution built for data science and machine learning applications. It is a tool that lets you define your pipeline in any language.
By managing large files, datasets, machine learning models, code, and so on, the tool makes machine learning models shareable and reproducible. It follows Git's lead in providing a simple command line that can be set up in just a few steps.
Despite what its name suggests, DVC is not only about data versioning. It also facilitates pipeline and machine learning model management for teams.
Finally, DVC helps improve the consistency and reproducibility of your team's models. Instead of using convoluted file suffixes and comments in code, take advantage of Git branches to try out new ideas. Along the way, use automated metric tracking instead of paper and pencil.
To move consistent bundles of machine learning models, data, and code into production, onto remote machines, or to a colleague's desktop, you can use push/pull commands instead of ad-hoc scripts.
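One way to picture reproducible pipelines is a stage that reruns only when its inputs change. The toy sketch below illustrates that idea and nothing more: it does not use DVC's API or file formats, and the names `fingerprint`, `run_stage`, `lock`, and `train` are hypothetical.

```python
import hashlib
from typing import Callable, Dict

lock: Dict[str, str] = {}     # stage name -> fingerprint of the inputs last used
outputs: Dict[str, str] = {}  # stage name -> cached output


def fingerprint(*parts: str) -> str:
    """Hash all inputs of a stage into one fingerprint."""
    digest = hashlib.sha256()
    for part in parts:
        digest.update(part.encode("utf-8"))
        digest.update(b"\x00")  # separator so ("ab", "c") differs from ("a", "bc")
    return digest.hexdigest()


def run_stage(name: str, data: str, params: str, fn: Callable[[str, str], str]) -> bool:
    """Run fn only if data or params changed; return True when it actually ran."""
    current = fingerprint(data, params)
    if lock.get(name) == current:
        return False  # inputs unchanged, reuse the cached output
    outputs[name] = fn(data, params)
    lock[name] = current
    return True


def train(data: str, params: str) -> str:
    # Stand-in for real training: the "model" is just a description string.
    return f"model(rows={len(data.splitlines())}, {params})"


print(run_stage("train", "1,cat\n2,dog", "lr=0.1", train))   # first run executes
print(run_stage("train", "1,cat\n2,dog", "lr=0.1", train))   # nothing changed, skipped
print(run_stage("train", "1,cat\n2,dog", "lr=0.01", train))  # parameter changed, reruns
```

Because the fingerprint covers both data and parameters, two teammates with the same inputs end up with the same recorded state, which is the property that makes results repeatable.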
Pros
- It is lightweight and open source, and it works with all major cloud platforms and storage types.
- It is flexible, format and framework agnostic, and easy to implement.
- The entire evolution of every ML model can be traced back to its source code and data.
Cons
- DVC's pipeline management and version control are tightly coupled. This creates redundancy if your team is already using another data pipeline product.
- Because DVC is lightweight, your team may need to build additional features by hand to make it more usable.
Pricing
It is free for everyone to use.
4. Delta Lake
Delta Lake is an open-source storage layer that increases data lake reliability. Delta Lake supports ACID transactions and scalable metadata management, in addition to streaming and batch data processing.
It works with Apache Spark APIs and sits on top of your existing data lake. Delta Sharing is described as the industry's first open protocol for secure data sharing, making it simple to exchange data with other organizations regardless of their computing platforms.
Delta Lake can handle petabytes of data with ease. Metadata is stored in the same way as data, and users can retrieve it using the Describe Detail method. Delta Lake has a unified architecture that can read both streaming and batch data.
Upserts are simple to do with Delta. These upserts, or merges, into a Delta table are comparable to SQL MERGE. You can use them to merge data from another DataFrame into your table and perform updates, inserts, and deletes.
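The semantics of an upsert can be sketched in a few lines of plain Python. This illustrates the merge logic only and does not use the Delta Lake or Spark APIs. The function `merge_upsert` and the convention that a `None` row means "delete" are assumptions made for the example, and transactional guarantees are not modeled.

```python
from typing import Dict, Optional

Row = Dict[str, object]


def merge_upsert(target: Dict[int, Row], source: Dict[int, Optional[Row]]) -> Dict[int, Row]:
    """Return a new table: matched keys are updated, new keys inserted, None deletes."""
    result = {key: dict(row) for key, row in target.items()}
    for key, row in source.items():
        if row is None:
            result.pop(key, None)    # delete the row if it exists
        elif key in result:
            result[key].update(row)  # key matched: update the given columns
        else:
            result[key] = dict(row)  # key not matched: insert a new row
    return result


customers = {1: {"name": "Ana", "city": "Lisbon"}, 2: {"name": "Ben", "city": "Oslo"}}
changes = {2: {"city": "Bergen"}, 3: {"name": "Chi", "city": "Hanoi"}, 1: None}

print(merge_upsert(customers, changes))
```

A single merge therefore covers all three operations, keyed on whether each incoming row matches an existing one.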
Pros
- Many capabilities, such as ACID transactions and robust metadata management, become available in your existing data storage solution.
- Delta Lake can now easily manage tables with billions of partitions and files at petabyte scale.
- It reduces the need for manual data version management and other data concerns, allowing developers to focus on building products on top of their data lakes.
Cons
- Because it was designed to work with Spark and big data, Delta Lake is often overkill for many tasks.
- It requires a dedicated data format, which limits its flexibility and can make it incompatible with your current formats.
Pricing
It is free for everyone to use.
5. Dolt
Dolt is a SQL database that supports forking, cloning, branching, merging, pushing, and pulling in the same way as a Git repository. To improve the experience of using a version-controlled database, Dolt lets data and schema evolve in sync.
It is a great tool for you and your colleagues to collaborate. You can connect to Dolt the same way you would to any other MySQL database and run queries or change data using SQL commands.
When it comes to data versioning, Dolt is one of a kind. Dolt is a database, in contrast to other solutions that only version data. While the software is still at an early stage, there are plans to make it fully compatible with Git and MySQL soon.
All the commands you are used to with Git also work with Dolt. Git versions files; Dolt versions tables. Using the command-line interface, you can import CSV files, commit your changes, publish them to a remote, and merge your teammates' changes.
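What it means to version tables rather than files can be illustrated with a row-level diff between two versions of a table. The sketch below is plain Python and not Dolt's implementation or API: the function `diff_tables` is hypothetical, and tables are modeled as dictionaries keyed by primary key.

```python
from typing import Dict, List, Tuple

Table = Dict[int, Dict[str, str]]  # primary key -> row


def diff_tables(old: Table, new: Table) -> List[Tuple[str, int]]:
    """List row-level changes between two versions of a table."""
    changes: List[Tuple[str, int]] = []
    for key in sorted(old.keys() | new.keys()):
        if key not in old:
            changes.append(("added", key))
        elif key not in new:
            changes.append(("removed", key))
        elif old[key] != new[key]:
            changes.append(("modified", key))
    return changes


before: Table = {1: {"name": "Ana"}, 2: {"name": "Ben"}}
after: Table = {1: {"name": "Ana"}, 2: {"name": "Benjamin"}, 3: {"name": "Chi"}}

for change, key in diff_tables(before, after):
    print(change, key)
```

A file-based diff would only report that the file changed, whereas a table-aware diff can say which rows were added, removed, or modified.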
Pros
- Lightweight and partly open source.
- Compared with more obscure options, it has a SQL interface, which makes it more accessible to data analysts.
Cons
- Compared with other database versioning approaches, Dolt is still a maturing product.
- Because Dolt is a database, you have to migrate your data into it to get the benefits.
Pricing
Everyone is welcome to use the community edition. The platform does not publish a price for the premium tier; instead, you need to contact the vendor.
6. Pachyderm
Pachyderm is a free, feature-rich version control system for data science. Pachyderm Enterprise is a powerful data science platform built for large-scale collaboration in highly secure environments.
Pachyderm is one of the few data science platforms on this list. Its goal is to provide a platform that manages the entire data lifecycle and makes it easy to reproduce the results of machine learning models. In this context, Pachyderm is known as the "Docker of Data". Pachyderm packages your execution environment using Docker containers, which makes it easy to reproduce the same results.
Data scientists and DevOps teams can deploy models with confidence thanks to the combination of versioned data and Docker. Thanks to an efficient storage system, petabytes of structured and unstructured data can be stored while storage costs are kept low.
Across all pipeline stages, file-based versioning provides a thorough audit trail for all data and artifacts, including intermediate results. Many of the tool's capabilities are built on these pillars, which help teams get the most out of it.
Pros
- Because it is based on containers, your data environments are portable and easy to move between cloud providers.
- It is robust and can scale from small to very large systems.
Cons
- Because there are many moving parts, such as the Kubernetes server required to run Pachyderm's free edition, the learning curve is steeper.
- Pachyderm can be challenging to integrate into a company's existing infrastructure because of its many technology components.
Pricing
You can start using the platform with the community edition; for the enterprise edition, you need to contact the vendor.
7. Neptune
Model-building metadata is managed by an ML metadata store, an essential component of the MLOps stack. Neptune serves as the metadata store for the whole MLOps workflow.
You can track, visualize, and compare thousands of machine learning models in one place. It includes features such as experiment tracking, a model registry, and model monitoring, along with a collaborative interface. It integrates with more than 25 different tools and libraries, including many model training and hyperparameter tuning tools.
You can sign up for Neptune without using your credit card. A Gmail account is enough to sign up.
Pros
- Integration with any pipeline, workflow, codebase, or framework is simple.
- Real-time visualizations, a simple API, and fast support.
- With Neptune, you can create a "backup" of all your experiment data in one place, which you can retrieve later.
Cons
- Although it is not fully open source, the Individual plan is probably sufficient for private use, although such access is limited to one month.
- There are a few minor design flaws to be found.
Pricing
You can start using the platform with the Individual plan, which is free for everyone to use. The paid tier starts at $150 per month.
Conclusion
In this post, we discussed the top data versioning tools. Each tool, as we have seen, has its own set of features. Some are free, while others require payment. Some suit a small-business model, while others are better suited to large enterprises.
As a result, you should choose the best software for your purposes after weighing the pros and cons. We recommend trying a free trial version before you buy a premium product.