Companies hold more data than ever before as they rely on it more heavily to inform critical business decisions, improve their product offerings, and provide better customer service.
With data being generated at an ever-increasing rate, the cloud offers several benefits for data processing and analytics, including scalability, reliability, and availability.
The cloud ecosystem also offers many tools and technologies for data processing and analytics. The two most commonly used big data storage architectures are data warehouses and data lakes.
Each has a weakness: a data lake on its own is less appealing because you cannot easily query and model the data while it is in use, while using a data warehouse for streaming data is wasteful.
So which type of cloud architecture should we choose?
Should we consider the newer concept of the data lakehouse, or should we settle for the limitations of the warehouse or the constraints of the lake?
A new data storage architecture called the "data lakehouse" combines the flexibility of data lakes with the data management of data warehouses.
Understanding the different big data storage approaches is essential for building a reliable data storage pipeline for business intelligence (BI), data analytics, and machine learning (ML) workloads, depending on your company's needs.
In this post, we'll take a close look at the data warehouse, the data lake, and the data lakehouse, along with their benefits, limitations, and pros and cons. Let's get started.
What is a Data Warehouse?
A data warehouse is a centralized data repository that an organization uses to store large volumes of data from many sources. It serves as the organization's single source of "data truth" and is essential for reporting and business analytics.
Typically, data warehouses combine relational data sets from multiple sources, such as application, business, and transactional data, to store historical data. Before it is loaded into the warehouse, the data is transformed and cleaned so that it can serve as the single source of data truth.
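The transform-and-clean-before-load flow described above can be sketched in a few lines. This is a minimal illustration only: it uses Python's built-in `sqlite3` as a stand-in for a real warehouse, and the source records, the `sales` table, and the `transform` and `run_etl` helpers are all hypothetical names invented for the example.

```python
import sqlite3

# Hypothetical raw records from two source systems (illustrative data only).
app_events = [
    {"customer": " Alice ", "amount": "19.99", "day": "2024-01-05"},
    {"customer": "BOB", "amount": "5", "day": "2024-01-06"},
]
transactions = [
    {"customer": "alice", "amount": "30.01", "day": "2024-01-07"},
    {"customer": "carol", "amount": "not-a-number", "day": "2024-01-07"},
]


def transform(record, source):
    """Clean one raw record; return None if it cannot be repaired."""
    try:
        amount = float(record["amount"])
    except ValueError:
        return None  # reject rows that fail validation
    return (record["customer"].strip().lower(), amount, record["day"], source)


def run_etl(conn):
    """Extract from both sources, transform, then load into one table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales ("
        "customer TEXT NOT NULL, amount REAL NOT NULL, "
        "day TEXT NOT NULL, source TEXT NOT NULL)"
    )
    cleaned = []
    for source, rows in (("app", app_events), ("transactions", transactions)):
        for row in rows:
            result = transform(row, source)
            if result is not None:
                cleaned.append(result)
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", cleaned)
    conn.commit()
    return len(cleaned)


conn = sqlite3.connect(":memory:")
loaded = run_etl(conn)
totals = dict(
    conn.execute(
        "SELECT customer, ROUND(SUM(amount), 2) FROM sales GROUP BY customer"
    ).fetchall()
)
print(loaded, totals)
```

Because the cleaning happens before the load, every row that reaches the `sales` table already fits the schema, which is what lets downstream SQL queries treat the table as trustworthy.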
Businesses invest in data warehouses because of their ability to quickly deliver business insights from every part of the company. Business analysts, data engineers, and decision makers can access the data in a warehouse using BI tools, SQL clients, and other less advanced (i.e., non-data-science) analytics solutions.
It is expensive to maintain a warehouse with an ever-growing volume of data, and a data warehouse cannot handle raw or unstructured data. In addition, it is not the ideal choice for advanced data analysis techniques such as machine learning or predictive modeling.
In return, a data warehouse offers fast query responses and high-quality data. Google BigQuery, Amazon Redshift, Azure SQL Data Warehouse, and Snowflake are cloud services available for data warehousing.
Benefits of a Data Warehouse
- Increased efficiency and speed for business intelligence and data analytics workloads: Data warehouses shorten the time needed to prepare and analyze data. They integrate easily with data analytics and business intelligence tools because data from the warehouse is trusted and consistent. In addition, data warehouses save the time needed to gather data and empower teams to use data for reports, dashboards, and other analytics needs.
- Improved data consistency, quality, and standardization: Organizations collect data from a variety of sources, including user, sales, and transactional data. A firm can rely on the data for its business needs because data warehousing consolidates company data into a uniform, standardized format that can serve as a single source of data truth.
- Better decision-making overall: Data warehousing facilitates better decisions by providing a central store for both recent and historical data. By processing data in data warehouses to obtain accurate insights, decision makers can assess risk, understand customer needs, and improve goods and services.
- Better business intelligence: Data warehousing bridges the gap between large volumes of raw data, which is often collected routinely as a matter of course, and curated data that provides insight. Warehouses serve as the foundation of an organization's data storage, enabling it to answer complex questions about its data and use the answers to make defensible business decisions.
Limitations of a Data Warehouse
- Lack of data flexibility: While data warehouses excel at handling structured data, semi-structured and unstructured data such as logs, streaming data, and social media data can be a challenge for them. This makes it hard to recommend data warehouses for use cases involving machine learning and artificial intelligence.
- Setup and maintenance costs: Data warehouses can be expensive to set up and maintain. Moreover, a data warehouse is not static; it ages and requires regular maintenance, which is costly.
Pros
- Data is easy to find, retrieve, and query.
- Once the data has been cleaned, preparing it with SQL is easy.
Cons
- You are locked in to a single analytics vendor.
- Analyzing and storing unstructured or streaming data is very expensive.
What is a Data Lake?
Data lakes promise, and make possible, the storage of every kind of data. It is an advantage to have data in an accessible form, held in a central location and available for analysis.
A data lake is a centralized, highly flexible storage repository where large volumes of structured and unstructured data are stored in their raw, unaltered, and unformatted forms.
A data lake uses a flat architecture and object storage to hold data in its raw state, in contrast to data warehouses, which store relational data that has already been "cleaned."
Unlike data warehouses, which have difficulty handling data in this format, data lakes are flexible, reliable, and affordable, and they let businesses gain additional insight from unstructured data.
In data lakes, data is extracted, loaded, and transformed (ELT) for analysis purposes, rather than having a schema or data structure fixed at the time the data is collected.
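The ELT ordering described above, where structure is applied only when the data is read, can be illustrated with a small sketch. It assumes a local temporary directory standing in for object storage and newline-delimited JSON as the raw format; the event records and the `load_raw` and `read_temperatures` helpers are hypothetical names made up for the example.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical raw events with inconsistent shapes (illustrative data only).
raw_events = [
    '{"device": "sensor-1", "temp_c": 21.5}',
    '{"device": "sensor-2", "temp_f": 70.7, "note": "legacy unit"}',
    '{"user": "alice", "text": "hello"}',
]


def load_raw(lake_dir, lines):
    """Extract + Load: write records untouched, with no schema check."""
    path = Path(lake_dir) / "events.jsonl"
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


def read_temperatures(path):
    """Transform on read: project the raw records onto a schema chosen now."""
    rows = []
    for line in path.read_text(encoding="utf-8").splitlines():
        record = json.loads(line)
        if "temp_c" in record:
            celsius = record["temp_c"]
        elif "temp_f" in record:
            celsius = round((record["temp_f"] - 32) * 5 / 9, 1)
        else:
            continue  # not a temperature reading; other readers may want it
        rows.append({"device": record["device"], "temp_c": celsius})
    return rows


with tempfile.TemporaryDirectory() as lake:
    stored = load_raw(lake, raw_events)
    temperatures = read_temperatures(stored)

print(temperatures)
```

Nothing is rejected at load time, so the chat-style record stays in the lake for some future use case. The trade-off is that every reader has to carry its own cleaning logic, which is one reason data quality in a lake tends to be harder to guarantee.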
By working with many kinds of data from IoT devices, social media, and streaming sources, data lakes enable machine learning and predictive analytics.
A data lake is also a natural fit for data scientists who can work with raw data, whereas a data warehouse is easier for business users. A data lake is well suited to user profiling, predictive analytics, machine learning, and similar tasks.
Although data lakes address many of the issues with data warehouses, their data quality is lower and their query speed is often insufficient. In addition, business users need extra tooling to run SQL queries. A poorly organized data lake can also end up full of stagnant, unused data.
Benefits of a Data Lake
- Support for a wide range of machine learning and data science use cases: It is easy to apply various machine learning and deep learning algorithms to data in data lakes because the data is stored in an open, raw form.
- Flexibility: Data lakes let you store data in any format or medium without the need for a predefined schema, which is a major advantage. Because the data is left in its original state, future use cases can be supported and more of the data can be analyzed.
- A single home for all data: Data lakes can hold both structured and unstructured data, so you avoid storing the two kinds in separate environments. They provide one place to store an organization's various kinds of data.
- Lower cost: Compared with traditional data warehousing, data lakes are less expensive because they are designed to run on inexpensive commodity hardware, such as object storage, which is usually optimized for a low cost per gigabyte stored.
Limitations of a Data Lake
- Poor performance for data analytics and business intelligence use cases: Data lakes can become disorganized if they are not adequately maintained, which makes it hard to integrate them with business intelligence and analytics tools. In addition, for reporting and analytics use cases, the lack of consistent data structures and ACID (atomicity, consistency, isolation, and durability) transaction support can lead to subpar query performance.
- Weak reliability and security: The lack of consistency in data lakes makes it hard to enforce data reliability and security. Since data lakes can hold any form of data, it can be hard to develop appropriate data security and governance standards for sensitive data types.
Pros
- Affordable solutions for all kinds of data.
- Can handle both structured and semi-structured data.
- Well suited to complex data processing and streaming.
Cons
- A sophisticated pipeline has to be built.
- It takes some time before the data becomes queryable.
- It takes time to ensure data reliability and quality.
What is a Data Lakehouse?
A novel big data storage architecture called the "data lakehouse" combines the best aspects of data lakes and data warehouses. With a data lakehouse, all of your data, whether structured, semi-structured, or unstructured, can be stored in one place with best-in-class machine learning, business intelligence, and streaming capabilities.
Data lakes holding all types of data are usually the starting point for data lakehouses; the data is then converted to the Delta Lake format (an open-source storage layer that brings reliability to data lakes).
Data lakes with Delta Lake support the ACID transactions found in traditional data warehouses. In essence, a lakehouse system uses low-cost storage to hold large amounts of data in their original forms, just like data lakes.
Adding a metadata layer on top of the store also gives the data structure and enables data management tools similar to those found in data warehouses.
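To make the idea of a metadata layer over cheap file storage more concrete, here is a toy sketch. It is not how Delta Lake or any real lakehouse engine is implemented; it only shows, under simplified assumptions, how a commit log and a declared schema can sit on top of plain files. The `ToyTable` class, the `SCHEMA` dictionary, and the file naming are hypothetical.

```python
import json
import tempfile
from pathlib import Path

SCHEMA = {"device": str, "temp_c": float}  # hypothetical table schema


class ToyTable:
    """Data files plus a commit log that says which files belong to the table."""

    def __init__(self, root):
        self.data = Path(root) / "data"
        self.log = Path(root) / "log"
        self.data.mkdir(parents=True)
        self.log.mkdir(parents=True)

    def _validate(self, rows):
        for row in rows:
            if set(row) != set(SCHEMA):
                raise ValueError(f"unexpected columns: {sorted(row)}")
            for column, expected in SCHEMA.items():
                if not isinstance(row[column], expected):
                    raise ValueError(f"{column} must be {expected.__name__}")

    def append(self, rows):
        """Write a data file, then publish it with one new log entry."""
        self._validate(rows)  # schema enforcement happens before any write
        version = len(list(self.log.iterdir()))
        data_file = self.data / f"part-{version:05d}.json"
        data_file.write_text(json.dumps(rows), encoding="utf-8")
        # Mode "x" fails if the entry already exists, so two writers
        # cannot both claim the same version number.
        with open(self.log / f"{version:05d}.json", "x", encoding="utf-8") as f:
            json.dump({"add": data_file.name}, f)
        return version

    def read(self):
        """Return rows only from files that the log has committed."""
        rows = []
        for entry in sorted(self.log.iterdir()):
            name = json.loads(entry.read_text(encoding="utf-8"))["add"]
            rows.extend(json.loads((self.data / name).read_text(encoding="utf-8")))
        return rows


with tempfile.TemporaryDirectory() as root:
    table = ToyTable(root)
    table.append([{"device": "sensor-1", "temp_c": 21.5}])
    # A stray file that was never committed is invisible to readers.
    (table.data / "orphan.json").write_text('[{"device": "x", "temp_c": 0.0}]')
    try:
        table.append([{"device": "sensor-2", "temp_c": "hot"}])
        rejected = False
    except ValueError:
        rejected = True
    visible = table.read()

print(rejected, visible)
```

The data files themselves stay as ordinary files on inexpensive storage. What changes is that readers consult the log rather than listing the directory, so half-written or malformed data never becomes part of the table.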
This lets many teams access all of the company's data through a single system for different initiatives, such as data science, machine learning, and business intelligence.
Benefits of a Data Lakehouse
- Support for a wide range of workloads: To facilitate advanced analytics, data lakehouses give users direct access to some of the most popular business intelligence tools (Tableau, Power BI). In addition, data scientists and machine learning engineers can easily use the data because data lakehouses use open data formats (such as Parquet) with APIs and machine learning libraries in languages such as Python and R.
- Cost-effectiveness: Data lakehouses use low-cost object storage solutions to get the storage characteristics of data lakes. By offering a single solution, data lakehouses also eliminate the cost and time associated with managing multiple data storage systems.
- Ease of data versioning, governance, and security: The data lakehouse architecture enforces schema and data integrity, making it easier to build effective data security and governance processes.
- Reduced data duplication: Data lakehouses offer a single, multipurpose data storage platform that can meet all of a company's data needs. Many businesses choose a hybrid solution to get the benefits of both the data warehouse and the data lake, but that approach can lead to costly data duplication, which a lakehouse avoids.
- Support for open formats: Open formats are file types that can be used by many software applications and whose specifications are publicly available. Lakehouses can store data in common file formats such as Apache Parquet and ORC (Optimized Row Columnar).
Limitations of a Data Lakehouse
A major drawback of the data lakehouse is that it is still a new and evolving technology. As a result, it is uncertain whether it will deliver on its promises. It may take years before data lakehouses can compete with established big data storage systems.
However, given the pace of modern innovation, it is hard to rule out that a different data storage system will eventually replace it.
Pros
- One platform holds all the data, which means there are fewer systems to maintain.
- Atomicity, consistency, isolation, and durability are not compromised.
- It is more affordable.
- It is easy to manage, and any issues can be fixed quickly.
- It makes building pipelines easier.
Cons
- Setup can take time.
- It is too young and too immature to qualify as an established storage system.
Data Warehouse vs. Data Lake vs. Data Lakehouse
The data warehouse has a long history in business intelligence, reporting, and analytics applications and is the original big data storage technology.
Data warehouses, however, are expensive and struggle with varied and unstructured data, such as streaming data. Data lakes were developed for machine learning and data science workloads, to manage raw data in a variety of forms on affordable storage.
Although data lakes work well with unstructured data, they lack the ACID transactions of data warehouses, which makes it hard to ensure data consistency and reliability.
The newest data storage architecture, known as the "data lakehouse," combines the reliability and consistency of data warehouses with the affordability and flexibility of data lakes.
Conclusion
To conclude, building a data lakehouse from scratch can be difficult. In practice, you will almost certainly use a platform designed to support an open lakehouse architecture.
So take care to research the features and implementation of each platform before buying. Companies that want a mature, structured data solution focused on business intelligence and data analytics use cases may consider a data warehouse.
However, businesses that want a scalable, affordable big data solution to power data science and machine learning workloads on unstructured data should consider data lakes.
Consider whether your business needs more from its data than data warehouse and data lake technology can provide, or whether you want a single solution that combines advanced analytics and machine learning workloads on your data. In that case, a data lakehouse is a sensible choice.