Data scientists and machine learning engineers deal with large volumes of data of many kinds in a data science project. Many models are built with different configurations and features, along with repeated rounds of parameter tuning to get the best performance.
In such situations, every change to the data and to the model-building approach needs to be tracked and tested to establish what worked and what did not. It is also important to be able to go back to an earlier version and review previous results.
Data Version Control (DVC), which helps manage data, the underlying model, and reproducible runs, is one technology that helps us keep track of all of this.
In this post, we take a close look at data version control and the best tools for it. Let's get started.
What is Data Version Control?
Versioning is essential in every production system. It gives you a single place to access the most recent data. Any resource that is modified often, especially by several users at once, needs an audit trail that records every change.
A version control system is responsible for keeping everyone on the team on the same page. It ensures that everyone on the team is working on the latest version of a file and, more importantly, that everyone can work on the same project at the same time.
With the right tools, you can do this with little effort!
You will have consistent datasets and a complete history of all your research if you use a reliable data versioning approach. Data versioning tools are essential to your workflow if you care about reproducibility, traceability, and ML model lineage.
They help you obtain a version of an artifact, such as a hash of a dataset or model, which you can use to identify and compare it. This data version is usually logged in your metadata management solution to ensure that your model training is versioned and reproducible.
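To make the idea of "a hash of a dataset" concrete, here is a minimal sketch that derives a version identifier from a file's contents. The function names `dataset_version` and `same_version` are hypothetical helpers for illustration, not part of any tool covered in this post.

```python
import hashlib
from pathlib import Path


def dataset_version(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return a SHA-256 hex digest that identifies the file's exact contents."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large datasets do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_version(a: Path, b: Path) -> bool:
    """Two files count as the same version when their digests match."""
    return dataset_version(a) == dataset_version(b)
```

Logging the returned digest next to each training run is one simple way to record which data a model was trained on.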
Best Data Version Control Tools
Now it is time to look at the best data versioning solutions available, which you can use to track every part of your code.
1. Git LFS
The Git LFS project is free to use. Within Git, large files such as audio, video, datasets, and graphics are replaced with text pointers, and the file contents are stored on a remote server such as GitHub.com or GitHub Enterprise.
It lets you use Git to version large files, up to a few GB in size, host more in your Git repositories by using external storage, and clone and fetch large files faster. For data management, this is a lightweight approach. Once the extension is installed and told which file patterns to track, you keep your usual Git workflow, with no separate storage system or tooling to learn.
It reduces the amount of data you download. This means that cloning and fetching large files from the repository is faster. The pointers are lightweight objects that point to the LFS storage.
As a result, when you push your repo to the main repository, it updates quickly and takes up less space.
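The pointer idea can be sketched in a few lines. A Git LFS pointer is a small text file with a spec version line, the SHA-256 of the real content, and its size in bytes. The helper `lfs_pointer` below is a hypothetical illustration that builds such text in memory; it does not call Git or the LFS client.

```python
import hashlib


def lfs_pointer(content: bytes) -> str:
    """Build the text of a Git LFS style pointer for the given file content."""
    oid = hashlib.sha256(content).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


large_file = b"\x00" * 5_000_000  # stand-in for a 5 MB binary asset
pointer = lfs_pointer(large_file)
print(pointer)
print(f"{len(large_file)} bytes replaced by a {len(pointer)}-byte pointer")
```

Only the short pointer text lives in the Git history, which is why clones and fetches stay small.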
Pros
- Integrates easily into most companies' development workflows.
- No additional permissions to manage, since it uses the same permissions as the Git repository.
Cons
- Git LFS requires dedicated servers to store your data. As a result, your data science teams will be locked in, and your engineering workload will go up.
- Highly specialized, and may require a range of other tools for the later stages of the data science workflow.
Pricing
It is free for anyone to use.
2. LakeFS
LakeFS is an open-source data versioning solution that stores data in S3 or GCS and has a Git-like branching and committing model that scales to petabytes.
This branching model makes your data lake ACID compliant by allowing changes to happen in isolated branches that can be created, merged, and rolled back atomically and instantly.
LakeFS lets teams build data lake operations that are repeatable, atomic, and versioned. It is a newcomer to the scene, but it is a force to be reckoned with.
It works with your data lake through a Git-like branching and version control model and scales to petabytes of data. Version control can also be applied at exabyte scale.
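The branch, commit, merge, and roll back workflow described above can be sketched with a toy in-memory model. This is a conceptual illustration only: `ToyRepo` and its methods are hypothetical and are not the LakeFS API, and a real system handles conflicts and storage very differently.

```python
class ToyRepo:
    """Hypothetical in-memory model of Git-like branching over object storage."""

    def __init__(self):
        # Each branch holds a list of snapshots: {object_path: content}.
        self.branches = {"main": [{}]}

    def branch(self, name, source="main"):
        # A new branch starts from the source history; snapshots are shared.
        self.branches[name] = list(self.branches[source])

    def commit(self, branch, changes):
        # All changes become visible together as one new snapshot.
        snapshot = {**self.branches[branch][-1], **changes}
        self.branches[branch].append(snapshot)

    def merge(self, source, dest):
        # Simplified merge: the source's latest snapshot wins on conflicts.
        merged = {**self.branches[dest][-1], **self.branches[source][-1]}
        self.branches[dest].append(merged)

    def revert(self, branch):
        # Roll back by discarding the latest snapshot (never the initial one).
        if len(self.branches[branch]) > 1:
            self.branches[branch].pop()

    def read(self, branch):
        return self.branches[branch][-1]


repo = ToyRepo()
repo.commit("main", {"events/part-0.parquet": "v1"})
repo.branch("experiment")
repo.commit("experiment", {"events/part-0.parquet": "v2"})
print(repo.read("main"))   # main is untouched while the branch changes
repo.merge("experiment", "main")
print(repo.read("main"))   # the change lands in a single step
repo.revert("main")
print(repo.read("main"))   # and can be rolled back
```

The point of the sketch is isolation: readers of `main` never see a half-finished change, because work happens on a branch and arrives as one snapshot.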
Pros
- Git-like operations include branching, committing, merging, and reverting.
- Pre-commit and pre-merge hooks can be used for CI/CD checks on data.
- Brings advanced capabilities such as ACID transactions to plain cloud storage such as S3 and GCS, while staying format agnostic.
- Revert changes to data instantly.
- Easy scaling, which lets it support very large data lakes. Version control can be provided for both development and production environments.
Cons
- LakeFS is a relatively new product, so its features and documentation may change faster than those of older solutions.
- Since it focuses on data versioning, you will need additional tools for other stages of the data science workflow.
Pricing
It is free for anyone to use.
3. DVC
Data Version Control is a free data versioning solution designed for data science and machine learning applications. It is a tool that lets you define your pipeline in any language.
By handling large files, datasets, machine learning models, code, and so on, the tool makes machine learning models shareable and reproducible. It follows Git's lead by offering a simple command line that can be set up in a few steps.
Despite its name, DVC is not only about data versioning. It also helps teams manage pipelines and machine learning models.
Finally, DVC helps improve the consistency and reproducibility of your team's models. Instead of convoluted file suffixes and code comments, take advantage of Git branches to try out new ideas. To find your way, use metric tracking instead of paper and pencil.
To ship consistent bundles of machine learning models, data, and code into production, to remote machines, or to a colleague's computer, you can use push/pull commands instead of ad-hoc scripts.
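One idea behind reproducible pipelines is to re-run a stage only when its inputs have changed. The sketch below illustrates that idea with a hash of the dependency files stored in a small lock file. It is a conceptual sketch, not DVC's implementation; `fingerprint`, `run_stage`, and the `toy.lock` file name are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def fingerprint(paths):
    """Hash the names and contents of all dependency files into one digest."""
    digest = hashlib.sha256()
    for path in sorted(map(str, paths)):
        digest.update(path.encode())
        digest.update(Path(path).read_bytes())
    return digest.hexdigest()


def run_stage(name, deps, command, lock_file=Path("toy.lock")):
    """Run `command` only if a dependency changed since the last recorded run."""
    lock = json.loads(lock_file.read_text()) if lock_file.exists() else {}
    current = fingerprint(deps)
    if lock.get(name) == current:
        return False  # dependencies unchanged, so the stage is skipped
    command()
    lock[name] = current
    lock_file.write_text(json.dumps(lock, indent=2))
    return True
```

Calling `run_stage("train", ["data.csv", "train.py"], train_fn)` twice in a row would run `train_fn` once, then skip it until one of the two files changes.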
Pros
- Lightweight, open source, and works with all major cloud platforms and storage types.
- Flexible, format and framework agnostic, and easy to adopt.
- Every version of an ML model can be traced back to its source code and data.
Cons
- DVC's pipeline management and version control are tightly coupled. There will be redundancy if your team already uses another pipeline tool.
- Since DVC is lightweight, your team may need to build some extra features by hand to make it easier to use.
Pricing
It is free for anyone to use.
4. Delta Lake
Delta Lake is an open-source storage layer that brings reliability to data lakes. It supports ACID transactions and scalable metadata handling, and it unifies streaming and batch data processing.
It is compatible with the Apache Spark APIs and sits on top of your existing data lake. Delta Sharing is billed as the industry's first open protocol for secure data sharing, which makes it easier to exchange data with other organizations regardless of their computing platforms.
Delta Lake handles petabytes of data with ease. Metadata is stored in the same way as data, and users can retrieve it with the DESCRIBE DETAIL command. Delta Lake has a unified architecture that can read both streaming and batch data.
Upserts are straightforward with Delta. These upserts, or merges, into a Delta table are similar to SQL MERGE. You can use them to merge data from another DataFrame into your table and perform updates, inserts, and deletes.
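The semantics of an upsert can be shown without Spark. The plain-Python sketch below applies a batch of source rows to a target table: matched rows are updated, unmatched rows are inserted, and matched rows flagged for deletion are removed. The function `merge_upsert` and the `_deleted` flag are assumptions made for illustration; this is not the Delta Lake API.

```python
def merge_upsert(target, source, key="id"):
    """Apply source rows to target rows, matching on `key`.

    - source row with "_deleted": True  -> remove the matching target row
    - source row matching a target row  -> update that row's fields
    - source row with no match          -> insert it
    """
    result = {row[key]: dict(row) for row in target}
    for row in source:
        if row.get("_deleted"):
            result.pop(row[key], None)
            continue
        clean = {k: v for k, v in row.items() if k != "_deleted"}
        result[row[key]] = {**result.get(row[key], {}), **clean}
    return sorted(result.values(), key=lambda r: r[key])


target = [{"id": 1, "city": "Oslo"}, {"id": 2, "city": "Rome"}]
source = [
    {"id": 2, "city": "Milan"},    # matched: update
    {"id": 3, "city": "Lima"},     # not matched: insert
    {"id": 1, "_deleted": True},   # matched and flagged: delete
]
print(merge_upsert(target, source))
```

In Delta Lake itself the same three outcomes are expressed as clauses of a single merge operation, which is then applied to the table as one transaction.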
Pros
- Features such as ACID transactions and robust metadata management become available in your existing data storage.
- Delta Lake can efficiently handle tables with billions of partitions and files at petabyte scale.
- Reduces the need for manual data version management and other data issues, letting developers focus on building products on top of their data lakes.
Cons
- Because it was designed to work with Spark and big data, Delta Lake is often overkill for many tasks.
- It requires a dedicated data format, which limits its flexibility and may make it incompatible with your existing formats.
Pricing
It is free for anyone to use.
5. Dolt
Dolt is a SQL database that supports forking, cloning, branching, merging, pushing, and pulling in the same way a Git repository does. To improve the user experience of a version-controlled database, Dolt lets data and schema evolve together.
It is a great tool for you and your coworkers to collaborate on. You can connect to Dolt just as you would to any MySQL database and run queries or modify data with SQL commands.
When it comes to data versioning, Dolt is one of a kind. Dolt is a database, unlike other solutions that only version data. Although the product is still at an early stage, the aim is for it to be fully compatible with Git and MySQL soon.
All the commands you know from Git work with Dolt as well. Git versions files; Dolt versions tables. Using the command-line interface, you can import CSV files, commit your changes, push them to a remote, and merge your teammates' changes.
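"Git versions files, Dolt versions tables" means that a change is described row by row rather than line by line. The sketch below compares two versions of a table keyed by a primary key and reports added, removed, and modified rows. It is a conceptual illustration in plain Python; `table_diff` is a hypothetical helper, not a Dolt command or API.

```python
def table_diff(old_rows, new_rows, key="id"):
    """Return rows added, removed, and modified between two table versions."""
    old = {row[key]: row for row in old_rows}
    new = {row[key]: row for row in new_rows}
    added = [new[k] for k in sorted(new.keys() - old.keys())]
    removed = [old[k] for k in sorted(old.keys() - new.keys())]
    modified = [
        (old[k], new[k])
        for k in sorted(old.keys() & new.keys())
        if old[k] != new[k]
    ]
    return {"added": added, "removed": removed, "modified": modified}


v1 = [{"id": 1, "price": 10}, {"id": 2, "price": 20}]
v2 = [{"id": 2, "price": 25}, {"id": 3, "price": 30}]
print(table_diff(v1, v2))
```

A row-level diff like this is what makes reviewing and merging a teammate's data changes practical in a versioned database.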
Pros
- Lightweight and partly open source.
- Unlike more obscure options, it has a SQL interface, which makes it more accessible to data analysts.
Cons
- Compared with other database versioning options, Dolt is still a maturing product.
- Since Dolt is a database, you must migrate your data into it to benefit.
Pricing
Anyone can use the community edition. The platform does not publish pricing for the premium tier; instead, you need to contact the vendor.
6. Pachyderm
Pachyderm is a free, feature-rich version control system for data science. Pachyderm Enterprise is a powerful data science platform designed for collaboration in highly secure environments.
Pachyderm is one of the few full data science platforms covered here. Its goal is to provide a platform that manages the whole data lifecycle and makes it easy to reproduce the results of machine learning models. In this sense, Pachyderm is known as the "Docker of data". Pachyderm packages your execution environment using Docker containers, which makes it easy to reproduce the same results.
Data scientists and DevOps teams can deploy models with confidence thanks to the combination of versioned data and Docker. Thanks to an efficient storage system, petabytes of structured and unstructured data can be stored while storage costs stay low.
Across all pipeline stages, file-based versioning provides a complete audit trail for all data and artifacts, including intermediate outputs. Many of the tool's capabilities rest on these pillars, which help teams get the most out of it.
Pros
- Because it is container based, your data environments are portable and easy to migrate between cloud providers.
- Robust, able to scale from relatively small to very large systems.
Cons
- Because there are many moving parts, such as the Kubernetes server needed to run Pachyderm's free edition, the learning curve is steeper.
- Pachyderm can be hard to integrate into a company's existing infrastructure because of its many technical components.
Pricing
You can start using the platform with the community edition; for the enterprise edition, you need to contact the vendor.
7. Neptune
Model-building metadata is managed by an ML metadata store, which is an essential part of the MLOps stack. For any MLOps workflow, Neptune serves as a central metadata store.
You can track, visualize, and compare thousands of machine learning models in one place. It includes features such as experiment tracking, a model registry, and model monitoring, along with a collaborative interface. It also integrates with more than 25 tools and libraries, including a number of model training and hyperparameter tuning tools.
You can sign up for Neptune without using your credit card. A Gmail account is enough instead.
Pros
- Integrating with any pipeline, workflow, codebase, or framework is straightforward.
- Real-time visualizations, a simple API, and responsive support.
- With Neptune, you can create a "backup" of all your data in one place, which you can recover later.
Cons
- Although it is not open source, the individual version is probably enough for private use, though such access is limited to one month.
- A few minor bugs can be found.
Pricing
You can start using the platform with the Individual plan, which is free for everyone. Paid plans start at $150/month.
Conclusion
In this post, we discussed the best data versioning tools. Each tool, as we have seen, has its own strengths. Some were free, while others required payment. Some fit a small-business model well, while others are better suited to a large enterprise.
As a result, you should choose the best software for your goals after weighing its pros and cons. We encourage you to try a free trial before buying a premium product.