It can be hard to keep track of all the available services and architecture options when thinking about data platforms.
An enterprise data platform typically comprises data warehouses, data models, data lakes, and reports, each with a specific purpose and its own required skill set. In contrast, a new design called the data lakehouse has emerged over the past few years.
The "data lakehouse" is a storage architecture that combines the flexibility of data lakes with the data management of data warehouses.
In this post we examine the data lakehouse in depth, including its components, features, and architecture.
What Is a Data Lakehouse?
As the name implies, a data lakehouse is a new type of data architecture that combines a data lake and a data warehouse to address the shortcomings of each on its own.
In essence, a lakehouse system uses low-cost storage to keep large amounts of data in its original formats, just as a data lake does. A metadata layer added on top of the store gives the data structure and enables data management tools like those found in data warehouses.
It stores the large volumes of structured, semi-structured, and unstructured data that organizations collect from the business applications, systems, and devices they use.
Data lakes usually rely on low-cost storage infrastructure with a file application programming interface (API) and keep data in generic, open file formats.
This lets many teams access all of the company's data through a single system for a variety of projects, such as data science, machine learning, and business intelligence.
Features
- Low-cost storage: A data lakehouse must be able to store data in inexpensive object storage, such as Google Cloud Storage, Azure Blob Storage, or Amazon Simple Storage Service, using an open file format such as ORC or Parquet.
- Data optimization capabilities: Data layout optimization, caching, and indexing are a few examples of how a data lakehouse must be able to optimize data while preserving the data's original format.
- Transactional metadata layer: Sitting on top of the underlying low-cost storage, this layer enables the data management capabilities that data warehouse performance depends on.
- Declarative DataFrame API support: Most AI tools can use DataFrames to retrieve raw object-store data. Declarative DataFrame API support makes it possible to dynamically optimize the presentation and structure of the data in response to a particular data science or AI workload.
- ACID transaction support: The acronym ACID, which stands for atomicity, consistency, isolation, and durability, is central to defining a transaction and to ensuring data consistency and reliability. Such transactions were previously possible only in data warehouses, but the lakehouse makes them available on data lakes as well. This addresses the poor data quality that data lakes suffer from when several data pipelines read and write data concurrently.
Data Lakehouse Components
At a high level, the data lakehouse architecture is divided into two main layers: storage and processing. The lakehouse platform (that is, the data lake) manages data ingestion into the storage layer.
The processing layer can then query the data in the storage layer directly with a variety of tools, without first loading it into a data warehouse or converting it to a proprietary format.
BI applications, as well as AI and ML technologies, can then use the data. This design provides the economics of a data lake, and because any processing engine can read the data, businesses are free to make the prepared data available for analysis by a range of systems. Processing and analyzing data this way can improve both performance and cost.
Because the architecture supports database transactions that meet the following ACID criteria (atomicity, consistency, isolation, and durability), it also lets multiple parties read and write data concurrently within the system:
- Atomicity means that a transaction either succeeds completely or does not happen at all. If a process is interrupted, this helps avoid data loss or corruption.
- Consistency ensures that transactions happen in a predictable, consistent manner. It maintains data integrity by ensuring that all data is valid according to predefined rules.
- Isolation ensures that no transaction can be affected by any other transaction in the system until it completes. This lets multiple parties read from and write to the same system simultaneously without interfering with each other.
- Durability ensures that changes to the data persist after a transaction completes, even in the event of a system failure. Any changes made by a transaction are stored permanently.
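To make the ACID properties above more concrete, here is a minimal sketch of how a commit log on top of plain files can provide them. It is loosely inspired by the transaction-log idea used by open table formats, not a reproduction of any real implementation; the function names, the log layout, and the `part-*.parquet` file names are assumptions made for illustration, and no real Parquet files are written.

```python
import json
import os
import tempfile


def commit(log_dir, version, actions):
    """Try to publish `actions` as table version `version`.

    The entry is written to a temporary file first and then hard-linked
    to its final name. The link either happens or it does not
    (atomicity), and it fails if that version already exists, so only
    one concurrent writer can claim a version number (isolation).
    """
    final_path = os.path.join(log_dir, f"{version:08d}.json")
    fd, tmp_path = tempfile.mkstemp(dir=log_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(actions, f)
        f.flush()
        os.fsync(f.fileno())  # durability: force the entry to disk
    try:
        os.link(tmp_path, final_path)
        return True
    except FileExistsError:
        return False  # another writer committed this version first
    finally:
        os.remove(tmp_path)


def snapshot(log_dir):
    """Replay committed entries in order to find the live data files."""
    files = set()
    for name in sorted(os.listdir(log_dir)):
        if not name.endswith(".json"):
            continue  # ignore in-flight temporary files
        with open(os.path.join(log_dir, name)) as f:
            for action in json.load(f):
                if action["op"] == "add":
                    files.add(action["file"])
                else:  # "remove"
                    files.discard(action["file"])
    return files


log_dir = tempfile.mkdtemp()
first = commit(log_dir, 0, [{"op": "add", "file": "part-0.parquet"}])
# Two writers race for version 1; only one of them can win.
writer_a = commit(log_dir, 1, [{"op": "add", "file": "part-a.parquet"}])
writer_b = commit(log_dir, 1, [{"op": "add", "file": "part-b.parquet"}])
print(first, writer_a, writer_b)  # True True False
print(sorted(snapshot(log_dir)))  # ['part-0.parquet', 'part-a.parquet']
```

In this sketch the losing writer would typically re-read the log and retry with the next version number, which is one common way to handle optimistic concurrency.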
Data Lakehouse Architecture
Databricks (the creator and designer of Delta Lake) and AWS are two major advocates of the data lakehouse concept. We therefore rely on their knowledge and insight to describe the architecture of lakehouses.
A data lakehouse system typically has five layers:
- Ingestion layer
- Storage layer
- Metadata layer
- API layer
- Consumption layer
Ingestion layer
The first layer of the system is responsible for collecting data from a variety of sources and delivering it to the storage layer. It can use several protocols to connect to many internal and external sources, and it combines batch and streaming data processing. Typical sources include:
- NoSQL databases,
- file shares,
- CRM applications,
- websites,
- IoT sensors,
- social media,
- Software as a Service (SaaS) applications, and
- relational database management systems, etc.
At this stage, components such as Apache Kafka for data streaming and AWS Database Migration Service (AWS DMS) for importing data from RDBMSs and NoSQL databases can be employed.
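As a rough illustration of an ingestion layer that handles both batch and streaming input, the sketch below lands records from two stand-in sources as JSON Lines files in a directory that plays the role of the storage layer. Everything here is hypothetical: the source functions, file names, and micro-batch size are assumptions, and a real pipeline would read from systems such as those listed above rather than from in-memory generators.

```python
import json
import os
import tempfile


def batch_source():
    """Stand-in for a bulk extract, e.g. rows exported from an RDBMS."""
    return [{"id": 1, "source": "crm"}, {"id": 2, "source": "crm"}]


def stream_source():
    """Stand-in for an event stream, e.g. messages read from a topic."""
    for i in range(3, 8):
        yield {"id": i, "source": "iot"}


def micro_batches(events, size):
    """Group a stream into small batches so each becomes one file."""
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final, possibly smaller, batch


def land(storage_dir, name, records):
    """Write records unchanged, as JSON Lines, into the storage layer."""
    path = os.path.join(storage_dir, f"{name}.jsonl")
    with open(path, "w") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
    return path


storage_dir = tempfile.mkdtemp()
landed = [land(storage_dir, "crm-batch-0", batch_source())]
for n, batch in enumerate(micro_batches(stream_source(), size=2)):
    landed.append(land(storage_dir, f"iot-stream-{n}", batch))
print(len(landed))  # 4: one batch file plus three micro-batch files
```

Landing the data unchanged keeps the ingestion step simple; structure is applied later by the metadata layer rather than at write time.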
Storage layer
The lakehouse architecture is designed to store all kinds of data as objects in low-cost object stores, such as AWS S3. Because the data is kept in open file formats, client tools can read these objects directly from the store.
This makes it possible for many APIs and consumption-layer components to access and use the same data. The metadata layer stores the schemas of structured and semi-structured datasets so that components can apply them to the data as they read it.
The lakehouse is best suited to cloud repository services that separate compute from storage, but it can also be built on-premises, for example on the Hadoop Distributed File System (HDFS).
Metadata layer
The metadata layer is the foundational component that sets the data lakehouse design apart. It is a unified catalog that provides metadata (information about other pieces of data) for every object stored in the lake and lets users apply management capabilities such as:
- ACID transactions, which ensure that concurrent transactions see a consistent version of the database;
- caching, to cache files from the cloud object store;
- indexing, which adds data structure indexes to speed up query processing;
- zero-copy cloning, to duplicate data objects; and
- data versioning, to keep specific versions of the data, etc.
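Two of the capabilities in the list above, data versioning and zero-copy cloning, follow naturally once the catalog tracks which files make up each table version. The sketch below is a toy model under that assumption; `TableMetadata` and its methods are hypothetical names, not the API of any real catalog.

```python
from dataclasses import dataclass, field


@dataclass
class TableMetadata:
    """Hypothetical catalog entry: each version is a tuple of file names."""
    name: str
    versions: list = field(default_factory=list)

    def commit(self, add=(), remove=()):
        """Record a new version; data files themselves are never edited."""
        current = self.versions[-1] if self.versions else ()
        removed = set(remove)
        kept = tuple(f for f in current if f not in removed)
        self.versions.append(kept + tuple(add))
        return len(self.versions) - 1  # the new version number

    def files(self, version=None):
        """Return the data files of one version (latest by default)."""
        if version is None:
            version = len(self.versions) - 1
        return self.versions[version]

    def clone(self, new_name):
        """Zero-copy clone: copy the file references, never the files."""
        return TableMetadata(new_name, [self.files()])


orders = TableMetadata("orders")
orders.commit(add=["part-0.parquet", "part-1.parquet"])           # version 0
orders.commit(add=["part-2.parquet"], remove=["part-0.parquet"])  # version 1

dev = orders.clone("orders_dev")
dev.commit(add=["part-dev.parquet"])  # changes only the clone's metadata

print(orders.files(0))  # ('part-0.parquet', 'part-1.parquet')
print(orders.files())   # ('part-1.parquet', 'part-2.parquet')
print(dev.files())      # ('part-1.parquet', 'part-2.parquet', 'part-dev.parquet')
```

Because old versions are just older lists of file references, reading a past version costs nothing extra until the underlying files are physically cleaned up.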
In addition, the metadata layer enables schema management, the use of data warehouse schema architectures such as star and snowflake schemas, and data governance and auditing directly on the data lake, which improves the integrity of the entire data pipeline.
Schema management includes schema enforcement and schema evolution. By rejecting any write that does not conform to the table's schema, schema enforcement lets users maintain data integrity and quality.
Schema evolution allows the table's current schema to be modified to accommodate changing data. Because there is a single management interface on top of the data lake, access control and auditing capabilities are available as well.
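The difference between schema enforcement and schema evolution can be sketched in a few lines. This is an in-memory illustration only: the `Table` class, the `evolve` flag, and the use of Python types as column types are assumptions chosen for brevity, not how any particular lakehouse engine exposes these features.

```python
class SchemaError(ValueError):
    """Raised when a write does not match the table's schema."""


class Table:
    def __init__(self, schema):
        self.schema = dict(schema)  # column name -> expected Python type
        self.rows = []

    def append(self, row, evolve=False):
        new_columns = {
            k: type(v) for k, v in row.items() if k not in self.schema
        }
        # Enforcement: unknown columns are rejected unless evolution is on.
        if new_columns and not evolve:
            raise SchemaError(f"unknown columns: {sorted(new_columns)}")
        # Enforcement: known columns must carry the declared type.
        for column, expected in self.schema.items():
            if column in row and not isinstance(row[column], expected):
                raise SchemaError(f"{column!r} must be {expected.__name__}")
        # Evolution: widen the schema with the newly seen columns.
        self.schema.update(new_columns)
        self.rows.append(row)


events = Table({"id": int, "amount": float})
events.append({"id": 1, "amount": 9.5})

try:
    events.append({"id": "2", "amount": 1.0})  # wrong type for "id"
    rejected = False
except SchemaError:
    rejected = True

events.append({"id": 3, "amount": 2.0, "currency": "EUR"}, evolve=True)
print(rejected, sorted(events.schema), len(events.rows))
# True ['amount', 'currency', 'id'] 2
```

Making evolution an explicit opt-in, as in this sketch, keeps accidental schema drift out of the table while still allowing deliberate changes.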
API layer
Another important layer of the architecture hosts a number of APIs that all end users can use to process tasks faster and obtain more advanced analytics.
Metadata APIs make it easier to identify and access the data objects required for a given application.
As for machine learning libraries, some of them, such as TensorFlow and Spark MLlib, can read open file formats like Parquet and query the metadata layer directly.
At the same time, DataFrame APIs offer greater opportunities for optimization, letting developers structure and transform distributed data.
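One reason declarative DataFrame APIs leave room for optimization is that a query can be recorded as a plan and reordered before anything runs. The toy `LazyFrame` below illustrates that idea with a single rewrite, moving filters ahead of column selection. The class and its methods are hypothetical, and production engines apply far more sophisticated planning than this.

```python
class LazyFrame:
    """Records operations as a plan instead of running them right away."""

    def __init__(self, rows, plan=()):
        self._rows = rows
        self._plan = plan

    def select(self, *columns):
        return LazyFrame(self._rows, self._plan + (("select", columns),))

    def where(self, column, value):
        return LazyFrame(self._rows, self._plan + (("where", (column, value)),))

    def optimized_plan(self):
        """Run filters first so later steps touch fewer rows.

        In this toy model the reordering is safe because filters only
        drop rows and selections only drop columns.
        """
        filters = [step for step in self._plan if step[0] == "where"]
        others = [step for step in self._plan if step[0] != "where"]
        return filters + others

    def collect(self):
        """Execute the optimized plan and return the resulting rows."""
        rows = self._rows
        for op, arg in self.optimized_plan():
            if op == "where":
                column, value = arg
                rows = [r for r in rows if r[column] == value]
            else:  # "select"
                rows = [{c: r[c] for c in arg} for r in rows]
        return rows


rows = [
    {"id": 1, "country": "ZA", "amount": 10.0},
    {"id": 2, "country": "DE", "amount": 7.5},
    {"id": 3, "country": "ZA", "amount": 3.0},
]
query = LazyFrame(rows).select("id", "country").where("country", "ZA")
print([op for op, _ in query.optimized_plan()])  # ['where', 'select']
print(query.collect())
# [{'id': 1, 'country': 'ZA'}, {'id': 3, 'country': 'ZA'}]
```

The caller describes what result is wanted, and the frame decides the execution order, which is the property that lets an engine adapt the work to the data layout.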
Consumption layer
The consumption layer hosts Power BI, Tableau, and other tools and applications. With the lakehouse design, client applications can access all of the metadata and all of the data stored in the lake.
Every user within a company can use the lakehouse to carry out all kinds of analytics tasks, including building business intelligence dashboards and running SQL queries and machine learning jobs.
Advantages of a Data Lakehouse
Organizations can build a data lakehouse to unify their current data platform and streamline their entire data management process. By breaking down the silo barriers between different sources, a data lakehouse can remove the need for separate solutions.
This consolidation yields a much more efficient end-to-end process than working with separately curated data sources. It has several advantages:
- Less administration: Instead of extracting raw data and preparing it for use in a data warehouse, a data lakehouse makes the data of any connected source available and organized for use.
- Increased cost-effectiveness: Data lakehouses are built on modern infrastructure that separates compute from storage, which makes it simple to expand storage without adding compute power. Relying on low-cost data storage is enough to make scaling inexpensive.
- Better data governance: Data lakehouses are built on standardized open architectures, which allows more control over security, metrics, role-based access, and other important management components. By consolidating resources and data sources, they simplify and improve governance.
- Simplified standards: Connectivity was very limited in the 1980s, when data warehouses were first developed, so localized schema standards were often created within individual businesses or even departments. Many types of data now have open schema standards, and data lakehouses take advantage of this by ingesting multiple data sources that share an overlapping standardized schema, which streamlines processes.
Disadvantages of a Data Lakehouse
Despite all the hoopla surrounding data lakehouses, it is important to remember that the idea is still very new. Be sure to weigh the drawbacks before committing fully to this new design.
- Monolithic structure: The all-inclusive design of a lakehouse offers several advantages, but it also raises some issues. Monolithic architectures can be rigid and hard to maintain, and they often result in poor service for all users. Architects and designers typically prefer a more modular architecture that they can customize for different use cases.
- The technology is not there yet: The ultimate goal involves a significant amount of machine learning and artificial intelligence. These technologies must mature further before lakehouses can perform as envisioned.
- Not a significant improvement over existing structures: There is still considerable skepticism about how much value lakehouses will actually add. Some critics argue that a lake-plus-warehouse design paired with the right automation tools can achieve comparable efficiency.
Challenges of a Data Lakehouse
Adopting the data lakehouse approach can be difficult. Because of the complexity of its components, it is a mistake to view the data lakehouse as an all-encompassing structure or as "one platform for everything."
In addition, as adoption of data lakehouses grows, businesses will have to migrate their current data warehouses to them, relying only on the promise of success without any demonstrated economic benefit.
If there are latency issues or outages during the migration, it can become expensive, time-consuming, and possibly risky.
Business users also have to adopt highly specialized technology from the particular vendors that market their solutions, explicitly or implicitly, as data lakehouses. These solutions may not always work with the other tools connected to the data lake at the center of the system, which adds to the problems.
It can also be difficult to provide 24/7 analytics while running business-critical workloads, because doing so requires infrastructure that can scale cost-effectively.
Conclusion
The data lakehouse is the newest type of data architecture to emerge in recent years. It draws on several fields, including information technology, open-source software, the cloud, and distributed storage protocols.
It lets businesses centrally store all types of data from any location, which makes management and analysis simpler. The data lakehouse is a genuinely intriguing concept.
Any company would gain a significant competitive edge from an all-in-one data platform that is as fast and efficient as a data warehouse and as flexible as a data lake.
The idea is still new and still developing. As a result, it may take some time to find out whether it will become widespread.
We should all be curious about where the lakehouse architecture is headed.