Data lakehouses combine the concepts of the data warehouse and the data lake for businesses.
These tools let you build cost-effective data storage solutions by combining the low-cost storage of data lakes with the data structure and management capabilities found in data warehouses.
In addition, data movement and redundancy are reduced, less time is spent on administration, and simpler schema and data governance procedures become practical.
A single data lakehouse has many advantages over a storage system made up of several separate solutions.
These tools are also used by data scientists to deepen their understanding of business intelligence and machine learning workflows.
This article takes a quick look at the data lakehouse, its capabilities, and the tools available.
Introduction to the Data Lakehouse
A new type of data architecture called the "data lakehouse" combines a data lake and a data warehouse to address the weaknesses of each on its own.
A lakehouse system, like a data lake, uses low-cost storage to keep large volumes of data in its raw form.
Adding a metadata layer on top of that store gives the data structure and enables data management tools similar to those found in data warehouses.
It holds large volumes of structured, semi-structured, and unstructured data collected from the various business applications, systems, and devices used across the organization.
As a result, unlike a data lake, a lakehouse system can manage that data and optimize it for SQL performance.
It can also store and process large volumes of varied data at a lower cost than data warehouses.
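The idea of a metadata layer on top of cheap raw storage can be sketched in a few lines. This is only a minimal illustration, not how any particular lakehouse product works: the `Catalog` and `TableMeta` classes, the `orders` table, and the CSV file name are all hypothetical, and a temporary local directory stands in for object storage.

```python
import csv
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class TableMeta:
    """Hypothetical catalog entry: a schema plus the raw files behind a table."""
    columns: dict                      # column name -> Python type used for casting
    files: list = field(default_factory=list)


class Catalog:
    """Toy metadata layer sitting on top of plain file storage (an assumption)."""

    def __init__(self) -> None:
        self.tables = {}

    def register(self, name: str, columns: dict) -> None:
        self.tables[name] = TableMeta(columns)

    def add_file(self, name: str, path: Path) -> None:
        self.tables[name].files.append(path)

    def scan(self, name: str):
        """Read every registered raw file and cast values using the schema."""
        meta = self.tables[name]
        for path in meta.files:
            with path.open(newline="") as handle:
                for row in csv.DictReader(handle):
                    yield {col: cast(row[col]) for col, cast in meta.columns.items()}


def demo() -> list:
    # A temporary directory stands in for low-cost object storage.
    with tempfile.TemporaryDirectory() as root:
        raw = Path(root) / "orders_2024.csv"
        raw.write_text("order_id,amount\n1,9.50\n2,20.00\n")
        catalog = Catalog()
        catalog.register("orders", {"order_id": int, "amount": float})
        catalog.add_file("orders", raw)
        return list(catalog.scan("orders"))


rows = demo()
print(rows)
```

The raw file stays untouched in storage; only the catalog knows that it belongs to a table and what types its columns have.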
A data lakehouse helps when you need to run any kind of data access or analytics against any data but are not yet sure which data or analytics you will need.
A lakehouse architecture will work well if performance is not the top priority.
That does not mean you should build your entire architecture on a lakehouse.
More information on how to choose between a data lake, a lakehouse, a data warehouse, or a specialized analytics database for each use case can be found here.
Features of a Data Lakehouse
- Concurrent reading and writing of data
- Adaptability and scalability
- Schema support with data governance tools
- Affordable storage
- Support for all data types and file formats
- Optimized access for data science and machine learning tools
- A single system that your data teams can use to move workloads through more quickly and accurately
- Real-time capabilities for data science, machine learning, and analytics applications
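One common way to allow concurrent reads and writes is to keep immutable table versions and let writers commit optimistically. The sketch below illustrates that idea only; `VersionedTable` and `append_with_retry` are hypothetical names, and real table formats implement this with a transaction log in storage rather than an in-memory list.

```python
import threading
from typing import Optional


class VersionedTable:
    """Toy table whose history is an append-only list of immutable snapshots."""

    def __init__(self) -> None:
        self._snapshots = [()]            # version 0 is the empty table
        self._lock = threading.Lock()

    def latest_version(self) -> int:
        return len(self._snapshots) - 1

    def read(self, version: Optional[int] = None) -> tuple:
        """Readers pin a version, so later commits never change what they see."""
        if version is None:
            version = self.latest_version()
        return self._snapshots[version]

    def commit(self, base_version: int, new_rows: list) -> bool:
        """Optimistic commit: succeed only if nobody committed since base_version."""
        with self._lock:
            if base_version != self.latest_version():
                return False              # conflict, the caller should retry
            self._snapshots.append(self._snapshots[-1] + tuple(new_rows))
            return True


def append_with_retry(table: VersionedTable, rows: list) -> None:
    """Retry the commit against the newest version until it is accepted."""
    while not table.commit(table.latest_version(), rows):
        pass


table = VersionedTable()
pinned = table.latest_version()           # a reader pins version 0
writers = [
    threading.Thread(target=append_with_retry, args=(table, [i])) for i in range(4)
]
for writer in writers:
    writer.start()
for writer in writers:
    writer.join()

print(table.read(pinned))                 # the pinned reader still sees version 0
print(sorted(table.read()))               # the latest version holds every write
```

Because snapshots are never modified in place, readers do not block writers, and a writer that loses a race simply retries against the newer version.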
Top 5 Data Lakehouse Tools
Databricks
Databricks, founded by the creators of Apache Spark, who also made it open source, offers a managed Apache Spark service and positions itself as a platform for data lakes.
The data lake, Delta Lake, and Delta Engine components of the Databricks lakehouse architecture power business intelligence, data science, and machine learning use cases.
The data lake is a public cloud storage repository.
With support for metadata management, batch and streaming data processing on multi-structured datasets, data discovery, secure access controls, and SQL analytics, Databricks offers many of the data warehousing functions one would expect to see in a data lakehouse platform.
Databricks recently introduced Auto Loader, which automates ETL and data ingestion and uses data samples to infer the schema of different data types, delivering key components of a data lake storage strategy.
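Inferring a schema from a sample of records can be illustrated with a small, generic sketch. This is not the Auto Loader API or its actual algorithm; the function names, the type labels, and the widening order (null, int, float, string) are assumptions chosen for the example.

```python
# Assumed widening order: a column takes the widest type seen in the sample.
_WIDTH = {"null": 0, "int": 1, "float": 2, "string": 3}


def infer_value_type(text: str) -> str:
    """Classify one raw text value as null, int, float, or string."""
    if text == "":
        return "null"
    for name, parser in (("int", int), ("float", float)):
        try:
            parser(text)
            return name
        except ValueError:
            continue
    return "string"


def infer_schema(samples: list) -> dict:
    """Merge per-value types across sample records into one schema."""
    schema = {}
    for record in samples:
        for column, text in record.items():
            seen = infer_value_type(text)
            current = schema.get(column, "null")
            schema[column] = max(current, seen, key=_WIDTH.__getitem__)
    return schema


# Hypothetical sample of raw records, all values still text.
samples = [
    {"id": "1", "price": "9.5", "note": ""},
    {"id": "2", "price": "10", "note": "rush"},
]
schema = infer_schema(samples)
print(schema)
```

Here `price` resolves to float because one sampled value has a decimal point, and `note` resolves to string once a non-empty text value appears.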
Alternatively, users can build ETL pipelines between their public cloud data lake and Delta Lake using Delta Live Tables.
On paper, Databricks appears to have every advantage, but setting up the solution and building its data pipelines requires a lot of manual work from skilled developers.
At scale, the solution also becomes more complex. It is more complicated than it looks.
Ahana
A data lake is a single, central place where you can store any type of data you choose at scale, including both unstructured and structured data. AWS S3, Microsoft Azure, and Google Cloud Storage are three common choices for data lake storage.
Data lakes are extremely popular because they are very affordable and easy to use; you can store as much of any kind of data as you like for very little money.
But a data lake does not come with built-in tools for analytics, querying, and so on.
You need a query engine and a data catalog on top of the data lake (which is where Ahana Cloud comes in) to query your data and put it to use.
A new lakehouse design has evolved that takes the best of both the data warehouse and the data lake.
This means it is open and flexible, offers good price/performance, scales like a data lake, supports transactions, and has a high level of security comparable to a data warehouse.
A high-performance SQL query engine is the brain behind the data lakehouse. Because of it, you can run high-performance analytics on the data in your data lake.
Ahana Cloud for Presto is SaaS for Presto on AWS, which makes it remarkably easy to start using Presto in the cloud.
For your S3-based data lake, Ahana comes with a built-in data catalog and caching. Ahana gives you the features of Presto without requiring you to manage the overhead, because it handles that internally.
AWS Lake Formation, Apache Hudi, and Delta Lake are a few of the transaction managers that are part of the stack and integrated with it.
Dremio
Organizations want to explore large, rapidly growing volumes of data quickly, easily, and efficiently.
Dremio believes that an open data lakehouse, one that combines the advantages of data lakes and data warehouses in an open way, is the best way to achieve this.
Dremio's lakehouse platform provides an experience that works for everyone, with a simple UI that lets users complete their analytics in a fraction of the time.
Dremio offers Dremio Cloud, a fully managed data platform, along with two new services: Dremio Sonar, a lakehouse query engine, and Dremio Arctic, an intelligent metastore for Apache Iceberg that brings a Git-like experience to the lakehouse.
All of an organization's SQL workloads can run on the consistent, scalable Dremio Cloud platform, which also automates data management operations.
It is built for SQL, offers a Git-like experience, is open source, and is always free.
It was built to be the lakehouse platform that data teams love.
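What a "Git-like experience" for tables might mean can be sketched with a toy catalog in which branches are named pointers to immutable snapshots. This is purely illustrative and is not the Dremio Arctic API; `BranchingCatalog`, its methods, and the branch names are hypothetical.

```python
class BranchingCatalog:
    """Toy catalog: each branch name points at an immutable table snapshot."""

    def __init__(self) -> None:
        self.branches = {"main": ()}

    def create_branch(self, name: str, source: str = "main") -> None:
        # A new branch starts at the same snapshot as its source; nothing is copied.
        self.branches[name] = self.branches[source]

    def append(self, branch: str, rows: list) -> None:
        # Writing creates a new snapshot and moves only this branch's pointer.
        self.branches[branch] = self.branches[branch] + tuple(rows)

    def merge(self, source: str, target: str = "main") -> None:
        # Fast-forward only: the target must be an ancestor (prefix) of the source.
        src, tgt = self.branches[source], self.branches[target]
        if src[: len(tgt)] != tgt:
            raise ValueError("target has diverged; fast-forward not possible")
        self.branches[target] = src


catalog = BranchingCatalog()
catalog.create_branch("etl")
catalog.append("etl", ["row-1", "row-2"])
main_before_merge = catalog.branches["main"]   # main is isolated from the ETL work
catalog.merge("etl")
print(main_before_merge, catalog.branches["main"])
```

Work done on the `etl` branch stays invisible to readers of `main` until the merge moves the pointer, which is the property that makes experimentation on production tables safer.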
Because Dremio Cloud uses open-source table and file formats such as Apache Iceberg and Apache Parquet, your data persists in your own data lake storage.
Future innovations can be adopted easily, and the right engine can be chosen based on your workload.
Snowflake
Snowflake is a cloud data and analytics platform that can meet the needs of both data lakes and data warehouses.
It started as a data warehouse system built on cloud infrastructure.
The platform includes a centralized storage repository that sits on top of public cloud storage from AWS, Microsoft Azure, or Google Cloud Platform (GCP).
Next comes a multi-cluster compute layer, where users can spin up a virtual data warehouse and run SQL queries against their data storage.
The architecture decouples storage and compute resources, letting organizations scale each independently as needed.
Finally, Snowflake provides a services layer with metadata categorization, resource management, data governance, operations, and other features.
BI tool connectors, metadata management, access controls, and SQL queries are just a few of the data warehouse functions the platform excels at providing.
Snowflake, however, is limited to a single SQL-based query engine.
As a result, it is simpler to manage but less flexible, and the multi-model vision of the data lake is not realized.
In addition, before data from cloud storage can be searched or analyzed, Snowflake requires businesses to load it into a centralized storage layer.
This manual data pipelining requires upfront ETL, provisioning, and formatting of the data before it can be explored. Scaling these manual processes makes them frustrating.
Snowflake's data lakehouse is another option that looks like a good fit on paper but in practice departs from the data lake principle of simple data ingestion.
Oracle
The modern, open architecture known as the "data lakehouse" makes it possible to store, understand, and analyze all your data.
It combines the breadth and flexibility of open-source data lakes with the power and depth of data warehouses.
New AI frameworks and prebuilt AI services can be used with lakehouse data on Oracle Cloud Infrastructure (OCI).
An open-source data lake lets you work with additional types of data. But the time and effort needed to manage it can be a persistent drawback.
OCI offers fully managed lakehouse services at low prices and with minimal management, so you can expect lower operating costs, better scalability and security, and the ability to consolidate all your existing data in one place.
A data lakehouse increases the value of data warehouses and data marts, which are important to successful businesses.
With the lakehouse, data can be retrieved from several locations with a single SQL query.
Existing applications and tools get transparent access to all the data without needing modification or requiring new skills.
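Retrieving data from several locations with a single SQL query can be illustrated with a small stand-in. The sketch below uses SQLite's `ATTACH DATABASE` from the Python standard library to join two separate in-memory databases; it is not Oracle's implementation, and the `warehouse` and `lake` schema names, tables, and rows are made up for the example.

```python
import sqlite3

# Two attached in-memory databases stand in for two separate data locations.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS warehouse")
conn.execute("ATTACH DATABASE ':memory:' AS lake")

conn.execute("CREATE TABLE warehouse.customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE lake.clicks (customer_id INTEGER, page TEXT)")
conn.executemany(
    "INSERT INTO warehouse.customers VALUES (?, ?)",
    [(1, "Ada"), (2, "Lin")],
)
conn.executemany(
    "INSERT INTO lake.clicks VALUES (?, ?)",
    [(1, "/home"), (1, "/pricing"), (2, "/home")],
)

# One SQL statement spans both locations.
rows = conn.execute(
    """
    SELECT c.name, COUNT(*) AS clicks
    FROM warehouse.customers AS c
    JOIN lake.clicks AS k ON k.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
print(rows)
conn.close()
```

The application issues ordinary SQL and does not need to know that the two tables live in different stores, which is the point the text makes about transparent access.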
Conclusion
The arrival of data lakehouse solutions reflects a major trend in big data: the convergence of analytics and data storage in unified data platforms, to maximize the business value of data while reducing the time, cost, and complexity of extracting that value.
Platforms including Databricks, Snowflake, Ahana, Dremio, and Oracle are all tied to the "data lakehouse" vision, but each has a distinct feature set and a tendency to operate more like a data warehouse than a true data lake.
When a solution is marketed as a "data lakehouse," businesses should be careful about what that really means.
Businesses need to look past marketing buzzwords like "data lakehouse" and instead examine the features of each platform to choose the data platform that will best scale with their business in the future.