Data lakehouses bring together the concepts of the data warehouse and the data lake for businesses.
These tools let you build cost-effective data storage solutions by combining the low-cost storage of data lakes with the data structures and management features found in data warehouses.
In addition, data movement and redundancy are reduced, less time is spent on administration, and streamlined schema and data governance processes become a practical reality.
A single data lakehouse has many advantages over a storage system made up of multiple solutions.
Data scientists also use these tools to deepen their understanding of business intelligence and machine learning processes.
This article takes a quick look at the data lakehouse, its capabilities, and the tools that are available.
Introduction to the Data Lakehouse
A new kind of data architecture called a "data lakehouse" combines a data lake and a data warehouse to address the weaknesses that each has on its own.
A lakehouse system, like a data lake, uses low-cost storage to keep large amounts of data in its original form.
Adding a metadata layer on top of that store provides data structure and enables data management tools similar to those found in data warehouses.
It holds large amounts of structured, semi-structured, and unstructured data gathered from the various business applications, systems, and devices used across the enterprise.
As a result, unlike a data lake, a lakehouse system can manage that data and optimize it for SQL performance.
It can also store and process large amounts of varied data at a lower cost than a data warehouse.
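To make the idea of a metadata layer more concrete, here is a minimal Python sketch. It assumes raw JSON-lines files as the low-cost storage and a plain in-memory dictionary as the catalog. The names `CATALOG`, `register_table`, and `read_table` are hypothetical and do not come from any lakehouse product. The catalog records where a table's files live and which typed columns it exposes, and the reader applies that schema only when the data is read.

```python
import json
import os
import tempfile

# Hypothetical metadata layer: table name -> file locations plus a declared schema.
CATALOG = {}


def register_table(name, paths, schema):
    """Record where a table's raw files live and which typed columns it exposes."""
    CATALOG[name] = {"paths": list(paths), "schema": dict(schema)}


def read_table(name):
    """Yield rows from the raw files, shaped and typed by the catalog schema."""
    entry = CATALOG[name]
    for path in entry["paths"]:
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                if not line.strip():
                    continue
                raw = json.loads(line)
                row = {}
                for column, column_type in entry["schema"].items():
                    value = raw.get(column)
                    # Missing values stay None; present values are cast to the declared type.
                    row[column] = None if value is None else column_type(value)
                yield row


def demo():
    """Write two raw JSON-lines files, register them as one table, and read it back."""
    folder = tempfile.mkdtemp()
    first = os.path.join(folder, "part-0.jsonl")
    second = os.path.join(folder, "part-1.jsonl")
    with open(first, "w", encoding="utf-8") as handle:
        handle.write('{"id": "1", "amount": "9.5", "note": "raw extra field"}\n')
    with open(second, "w", encoding="utf-8") as handle:
        handle.write('{"id": 2, "amount": 3}\n')
    register_table("orders", [first, second], {"id": int, "amount": float})
    return list(read_table("orders"))


rows = demo()
```

In this sketch the raw files are never rewritten. The undeclared `note` field stays in storage but is left out of the table view, which is the kind of structure a metadata layer adds on top of otherwise unmanaged files.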
A data lakehouse is a good fit when you want to run any kind of data access or analytics against any data but are not yet sure which data or which analytics you will need.
A lakehouse architecture will work well as long as performance is not your primary concern.
That does not mean you should base your entire architecture on a lakehouse.
Whether a data lake, a lakehouse, a data warehouse, or a specialized database is the right choice has to be decided for each use case.
Features of a Data Lakehouse
- Concurrent reading and writing of data
- Flexibility and scalability
- Schema support with data governance tools
- Affordable storage
- Support for all data types and file formats
- Optimized access to data science and machine learning tools
- A single system through which your data teams can move workloads more quickly and accurately
- Real-time capabilities for data science, machine learning, and analytics initiatives
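Concurrent reading and writing, the first feature in the list above, is commonly achieved with a versioned commit log and optimistic concurrency control. The following is a minimal single-process Python sketch of that idea, not the mechanism of any specific product. The class names `TableLog` and `CommitConflict` are hypothetical, and treating every stale write as a conflict is a deliberate simplification.

```python
class CommitConflict(Exception):
    """Raised when another writer committed after this writer read the table."""


class TableLog:
    """Hypothetical append-only commit log; each commit is an immutable batch of rows."""

    def __init__(self):
        self._commits = []

    @property
    def version(self):
        """The number of commits applied so far."""
        return len(self._commits)

    def snapshot(self, version=None):
        """Return all rows visible at the given version (latest by default)."""
        upto = self.version if version is None else version
        return [row for commit in self._commits[:upto] for row in commit]

    def commit(self, rows, read_version):
        """Append rows only if nobody else committed since read_version."""
        if read_version != self.version:
            raise CommitConflict(
                f"expected version {read_version}, log is at {self.version}"
            )
        self._commits.append(list(rows))
        return self.version


log = TableLog()
start_version = log.version                   # a writer reads the table at version 0
log.commit([{"id": 1}], start_version)        # first commit succeeds, log is at version 1
reader_version = log.version                  # a reader pins its view at version 1
log.commit([{"id": 2}], log.version)          # a second writer commits version 2

try:
    log.commit([{"id": 3}], start_version)    # stale writer still assumes version 0
    conflict_detected = False
except CommitConflict:
    conflict_detected = True
```

The reader that pinned version 1 keeps seeing a consistent snapshot while later commits land, and the stale writer is rejected instead of silently overwriting newer data. A real system would also need atomic writes to shared storage, which this sketch leaves out.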
Top 5 Data Lakehouse Tools
Databricks
Databricks, founded by the original developers of Apache Spark, who also open-sourced it, provides a managed Apache Spark service and positions itself as a platform for data lakes.
The data lake, Delta Lake, and Delta Engine components of the Databricks lakehouse architecture enable business intelligence, data science, and machine learning use cases.
The data lake is a public cloud storage repository.
With support for metadata management, batch and streaming processing of multi-structured datasets, data discovery, secure access controls, and SQL analytics,
Databricks offers most of the data warehousing functions one would expect to see in a data lakehouse platform.
Databricks recently unveiled Auto Loader, which automates ETL and data ingestion and uses data sampling to infer the schema for a variety of data types, delivering key components of a data lake storage strategy.
Alternatively, users can build ETL pipelines between their public cloud data lake and Delta Lake using Delta Live Tables.
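Schema inference by sampling, mentioned above in connection with Auto Loader, can be illustrated in general terms. The Python sketch below is not the Auto Loader API and makes no claim about how Databricks implements it. The function names and the small set of type names are assumptions chosen for the example. It inspects a random sample of records and widens each column's type as needed.

```python
import random


def infer_type(value):
    """Map a Python value to a coarse column type name."""
    if isinstance(value, bool):  # bool must be checked before int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    return "string"


def merge_types(left, right):
    """Widen two observed types to one that can hold both."""
    if left == right:
        return left
    if {left, right} == {"long", "double"}:
        return "double"
    return "string"  # fall back to the most permissive type


def infer_schema(records, sample_size, seed=0):
    """Infer column types from a random sample instead of a full scan."""
    records = list(records)
    sample = random.Random(seed).sample(records, min(sample_size, len(records)))
    schema = {}
    for record in sample:
        for column, value in record.items():
            observed = None if value is None else infer_type(value)
            if schema.get(column) is None:
                schema[column] = observed
            elif observed is not None:
                schema[column] = merge_types(schema[column], observed)
    # Columns that were only ever null default to string.
    return {column: found or "string" for column, found in schema.items()}


records = [
    {"id": 1, "price": 2},
    {"id": 2, "price": 2.5, "tag": None},
    {"id": 3, "price": 4, "tag": "x", "active": True},
]
schema = infer_schema(records, sample_size=3)
```

The trade-off is visible in `sample_size`: a smaller sample is cheaper to read but can miss a rarely populated column or a value that would have widened a type, so an inferred schema usually needs a way to evolve later.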
On paper, Databricks appears to have every advantage, but setting up the solution and building its data pipelines takes a lot of manual work from skilled developers.
At scale, the solution also becomes more complex. It is harder than it looks.
Ahana
A data lake is a single, central location where you can store any type of data you choose at scale, including both unstructured and structured data. AWS S3, Microsoft Azure, and Google Cloud Storage are three common data lakes.
Data lakes are very popular because they are affordable and easy to use; you can store as much of any type of data as you like for very little money.
But a data lake does not provide built-in tools for analytics, querying, and so on.
You need a query engine and a data catalog on top of the data lake (which is where Ahana Cloud comes in) to query your data and put it to use.
The new data lakehouse architecture was developed to offer the best of both the data warehouse and the data lake.
This means it is transparent, flexible, offers good price/performance, scales like a data lake, supports transactions, and provides a high level of security comparable to a data warehouse.
Your high-performance SQL query engine is the brain behind the data lakehouse. Thanks to it, you can run high-performance analytics on the data in your data lake.
Ahana Cloud for Presto is a SaaS offering for Presto on AWS that makes it very easy to start using Presto in the cloud.
For your S3-based data lake, Ahana comes with a built-in data catalog and caching. Ahana gives you the features of Presto without requiring you to manage the overhead, because it handles that internally.
AWS Lake Formation, Apache Hudi, and Delta Lake are some of the transaction managers that form part of the stack and integrate with it.
Dremio
Organizations want to analyze large amounts of rapidly growing data quickly, easily, and efficiently.
Dremio believes that an open data lakehouse, which combines the benefits of data lakes and data warehouses on an open foundation, is the right way to achieve this.
Dremio's lakehouse platform provides an experience that works for everyone, with a simple UI that lets users complete analytics in a fraction of the time.
Dremio offers Dremio Cloud, a fully managed lakehouse platform, and has launched two new services: Dremio Sonar, a lakehouse query engine, and Dremio Arctic, an intelligent metastore for Apache Iceberg that provides a Git-like experience for the lakehouse.
All of an organization's SQL workloads can run on the highly scalable Dremio Cloud platform, which automates data management tasks.
It is built for SQL, offers a Git-like experience, is open source, and is always free.
Dremio designed it to be a lakehouse platform that data teams love.
Because it uses open-source table and file formats such as Apache Iceberg and Apache Parquet, your data stays in your own data lake storage when you use Dremio Cloud.
Future innovations can be adopted easily, and the right engine can be chosen based on your workload.
Snowflake
Snowflake is a cloud data and analytics platform that can meet the needs of both data lakes and data warehouses.
It started out as a data warehouse system built on cloud infrastructure.
The platform includes a centralized storage layer that sits on top of public cloud storage from AWS, Microsoft Azure, or Google Cloud Platform (GCP).
Next comes a multi-cluster compute layer, where users can start a virtual data warehouse and run SQL queries against the stored data.
The architecture decouples storage from compute resources, letting organizations scale the two independently as needed.
Finally, Snowflake offers a services layer with metadata cataloging, resource management, data governance, transactions, and other features.
BI tool connectors, metadata management, access control, and SQL queries are a few of the data warehousing functions that the platform excels at providing.
Snowflake, however, is limited to a single SQL-based query engine.
As a result, it is simpler to manage but less adaptable, and the vision of a multi-model data lake is not realized.
In addition, before data from cloud storage can be searched or analyzed, Snowflake requires businesses to load it into the centralized storage layer.
The manual data pipeline process requires up-front ETL, provisioning, and data formatting before the data can be explored. These manual processes become cumbersome as they scale.
Snowflake's data lakehouse is another option that looks suitable on paper but, in practice, deviates from the data lake principle of keeping data in its original form.
Oracle
A modern, open architecture known as a "data lakehouse" makes it easy to store, understand, and analyze all of your data.
The breadth and flexibility of the most popular open-source data lake solutions are combined with the power and depth of data warehouses.
New AI frameworks and prebuilt AI services can be used with a data lakehouse on Oracle Cloud Infrastructure (OCI).
It is possible to work with additional data types when you use an open-source data lake. But the time and effort required to manage it can be an ongoing drawback.
OCI offers fully managed open-source lakehouse services at low rates and with minimal management, so you can expect lower operating costs, better scalability and security, and the ability to consolidate all of your existing data in one place.
A data lakehouse increases the value of data warehouses and data marts, which are essential to successful businesses.
Through the lakehouse, data can be retrieved from multiple locations with a single SQL query.
Existing programs and tools gain transparent access to all data without needing modification or new skills.
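The idea of reaching data in several locations with one SQL query can be shown on a small scale. The Python sketch below uses two separate SQLite database files as stand-ins for two storage locations and joins them in a single statement with SQLite's `ATTACH DATABASE`. It is an illustration of the concept only and says nothing about how OCI implements it. The file names, table names, and sample rows are invented for the example.

```python
import os
import sqlite3
import tempfile


def build_sources(folder):
    """Create two separate database files standing in for two storage locations."""
    warehouse_path = os.path.join(folder, "warehouse.db")
    lake_path = os.path.join(folder, "lake.db")

    warehouse = sqlite3.connect(warehouse_path)
    warehouse.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    warehouse.executemany(
        "INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Lin")]
    )
    warehouse.commit()
    warehouse.close()

    lake = sqlite3.connect(lake_path)
    lake.execute("CREATE TABLE events (customer_id INTEGER, amount REAL)")
    lake.executemany(
        "INSERT INTO events VALUES (?, ?)", [(1, 10.0), (1, 5.5), (2, 7.0)]
    )
    lake.commit()
    lake.close()
    return warehouse_path, lake_path


def total_spend_per_customer(warehouse_path, lake_path):
    """Join tables from both locations in a single SQL statement."""
    connection = sqlite3.connect(warehouse_path)
    try:
        # Make the second location visible under the schema name "lake".
        connection.execute("ATTACH DATABASE ? AS lake", (lake_path,))
        cursor = connection.execute(
            """
            SELECT c.name, SUM(e.amount) AS total
            FROM main.customers AS c
            JOIN lake.events AS e ON e.customer_id = c.id
            GROUP BY c.name
            ORDER BY c.name
            """
        )
        return cursor.fetchall()
    finally:
        connection.close()


folder = tempfile.mkdtemp()
warehouse_path, lake_path = build_sources(folder)
result = total_spend_per_customer(warehouse_path, lake_path)
```

The calling code only sees ordinary SQL and ordinary result rows, which is the sense in which existing tools can keep working without modification.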
Conclusion
The arrival of data lakehouse solutions reflects a major trend in big data: combining analytics and data storage in unified data platforms to maximize the business value of data while reducing the time, cost, and complexity of extracting that value.
Platforms including Databricks, Snowflake, Ahana, Dremio, and Oracle have all been associated with the "data lakehouse" idea, but each has its own distinct set of features and tends to function more like a data warehouse than a true data lake.
When a solution is marketed as a "data lakehouse," businesses should be careful about what that actually means.
Businesses should look beyond marketing jargon such as "data lakehouse" and instead examine the features of each platform to choose the data platform that will grow with them in the future.