Companies are capturing more data than ever as they rely on it more heavily to inform critical business decisions, improve product offerings, and deliver better customer service.
With data being created at an exponential rate, the cloud offers several advantages for data processing and analytics, including scalability, reliability, and availability.
The cloud ecosystem also offers several tools and technologies for data processing and analytics. The two most widely used big data storage architectures are data warehouses and data lakes.
A data lake is less appealing when you need to query and model data on the fly, while using a data warehouse to store streaming data is wasteful.
Which type of cloud architecture should we choose?
Should we consider the newer data lakehouse concept, or should we settle for the constraints of a warehouse or the limits of a lake?
A novel data architecture called the "data lakehouse" combines the flexibility of data lakes with the data management of data warehouses.
Understanding the main big data storage approaches is essential for building a reliable data storage pipeline for business intelligence (BI), data analytics, and machine learning (ML) workloads, depending on your company's needs.
In this post, we will take a close look at the Data Warehouse, the Data Lake, and the Data Lakehouse, along with their benefits, limitations, and pros and cons. Let's get started.
What is a Data Warehouse?
A data warehouse is a centralized data repository that an organization uses to hold large volumes of data from many sources. It serves as the organization's single source of "data truth" and is essential for reporting and business analytics.
Typically, data warehouses combine relational data sets from several sources, such as application, business, and transactional data, to store historical data. Before being loaded into the warehouse, data is transformed and cleaned so that it can serve as a single source of data truth.
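The transform-then-load flow described above can be sketched in a few lines. This is only a minimal illustration, assuming an in-memory SQLite database as a stand-in for a real warehouse; the sample records, the `orders` table, and the `transform` and `load_warehouse` helpers are all hypothetical names invented for this example.

```python
import sqlite3

# Hypothetical raw records from two source systems (illustrative data only).
app_events = [
    {"customer": " Alice ", "amount": "19.99", "source": "app"},
    {"customer": "BOB", "amount": "5", "source": "app"},
]
sales_rows = [
    {"customer": "alice", "amount": "30.00", "source": "sales"},
    {"customer": "", "amount": "12.50", "source": "sales"},  # missing name: rejected
]


def transform(record):
    """Clean one raw record; return None if it fails validation."""
    name = record["customer"].strip().lower()
    if not name:
        return None
    return (name, float(record["amount"]), record["source"])


def load_warehouse(conn, records):
    """Create the fixed schema up front, then load only cleaned rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "customer TEXT NOT NULL, amount REAL NOT NULL, source TEXT NOT NULL)"
    )
    cleaned = [row for row in map(transform, records) if row is not None]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", cleaned)
    conn.commit()
    return len(cleaned)


conn = sqlite3.connect(":memory:")
loaded = load_warehouse(conn, app_events + sales_rows)

# Analysts can now query consistent data with plain SQL.
totals = conn.execute(
    "SELECT customer, ROUND(SUM(amount), 2) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()
print(loaded, totals)
```

The point of the sketch is the ordering: the schema exists before any data arrives, and records that fail cleaning never reach the table, which is what lets downstream SQL queries trust the contents.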
Businesses invest in data warehouses because of their ability to quickly deliver business insights from every part of the company. Using BI tools, SQL clients, and other less advanced (that is, non-data-science) analytics solutions, business analysts, data engineers, and decision-makers can access data from data warehouses.
A warehouse with an ever-growing data volume is expensive to maintain, and a data warehouse cannot handle raw or unstructured data. In addition, it is not a suitable option for complex data analysis techniques such as machine learning or predictive modeling.
On the other hand, a data warehouse offers fast query responses and high-quality data. Google BigQuery, Amazon Redshift, Azure SQL Data Warehouse, and Snowflake are cloud services available for data warehousing.
Benefits of a Data Warehouse
- Increased efficiency and speed of business intelligence and data analytics workloads: Data warehouses shorten the time needed to prepare and analyze data. They connect easily to data analytics and business intelligence tools because data from a warehouse is reliable and consistent. In addition, data warehouses save the time needed for data collection and give teams the ability to use data for reports, dashboards, and other analytics requirements.
- Improved data consistency, quality, and standardization: Organizations collect data from a variety of sources, including user, sales, and transactional data. A company can rely on its data for business needs because data warehousing consolidates corporate data into a consistent, standardized format that can serve as a single source of data truth.
- Better overall decision-making: Data warehousing supports better decision-making by providing a centralized store of both recent and historical data. By processing data in data warehouses to obtain accurate insights, decision-makers can assess risks, understand what customers want, and improve products and services.
- Better business intelligence: Data warehousing bridges the gap between bulk raw data, which is often collected routinely as a matter of course, and curated data that provides insights. Warehouses serve as the foundation of an organization's data storage, enabling it to answer complex questions about its data and use the answers to make defensible business decisions.
Limitations of a Data Warehouse
- Lack of data flexibility: Although data warehouses excel at handling structured data, semi-structured and unstructured formats such as log analytics, streaming, and social media data can be a challenge for them. This makes it hard to recommend data warehouses for use cases involving machine learning and artificial intelligence.
- Expensive to install and maintain: Data warehouses can be costly to set up and maintain. Moreover, a data warehouse is usually not static; it ages and requires regular maintenance, which adds to the cost.
Pros
- Data is easy to find, retrieve, and query.
- As long as the data is already clean, preparing data with SQL is simple.
Cons
- You are forced to use a single analytics vendor.
- Analyzing and storing unstructured or streaming data is very expensive.
What is a Data Lake?
Data lakes promise to accommodate every type of data. It is valuable to have data stored in an accessible form in a central location where it is available to be read.
A data lake is a centralized, highly flexible storage repository where large amounts of structured and unstructured data are stored in their raw, unaltered, and unformatted forms.
A data lake uses a flat architecture and object storage to keep data in its raw state, in contrast to data warehouses, which store relational data that has already been "cleaned."
Unlike data warehouses, which have difficulty handling data in this form, data lakes are flexible, reliable, and affordable, and they allow businesses to gain advanced insight from unstructured data.
In data lakes, data is extracted, loaded, and then transformed (ELT) for analysis, rather than having a schema or data structure established at the time of data collection.
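To make the ELT idea above concrete, here is a small sketch of "schema on read." It assumes a temporary local directory as a stand-in for lake storage and JSON Lines as the raw format; the sample events and the `land_raw` and `read_temperatures` helpers are hypothetical and exist only for illustration.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical raw events with inconsistent shapes, as a lake might receive them.
raw_events = [
    {"device": "sensor-1", "temp_c": 21.5},
    {"device": "sensor-2", "temperature": "70F"},  # different field and unit
    {"user": "alice", "text": "great product"},    # unrelated social media record
]


def land_raw(lake_dir, events):
    """Extract + load: write every record unchanged, one JSON object per line."""
    path = Path(lake_dir) / "events.jsonl"
    with path.open("w", encoding="utf-8") as fh:
        for event in events:
            fh.write(json.dumps(event) + "\n")
    return path


def read_temperatures(path):
    """Transform at read time: apply a schema only for this one analysis."""
    readings = {}
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            event = json.loads(line)
            if "temp_c" in event:
                readings[event["device"]] = float(event["temp_c"])
            elif str(event.get("temperature", "")).endswith("F"):
                fahrenheit = float(event["temperature"][:-1])
                readings[event["device"]] = round((fahrenheit - 32) * 5 / 9, 1)
            # Records that do not fit this schema are skipped, not deleted.
    return readings


with tempfile.TemporaryDirectory() as lake_dir:
    lake_file = land_raw(lake_dir, raw_events)
    stored_lines = len(lake_file.read_text(encoding="utf-8").splitlines())
    temperatures = read_temperatures(lake_file)

print(stored_lines, temperatures)
```

Nothing is rejected at write time, so all three records land in the lake. The cost of that flexibility shows up in the reader, which has to cope with every shape the data may take.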
By accommodating many kinds of data from IoT devices, social media, and streaming sources, data lakes enable machine learning and predictive analytics.
A data scientist who can process raw data can work directly with a data lake, whereas a data warehouse is easier for business users to work with. A data lake is well suited to user profiling, predictive analytics, machine learning, and similar tasks.
Although data lakes address several problems with data warehouses, their data quality is low and their query speed is inadequate. In addition, business users need extra tools to run SQL queries. A poorly built data lake may also suffer from data stagnation.
Benefits of a Data Lake
- Support for a wide range of machine learning and data science use cases: It is easy to apply different machine learning and deep learning algorithms to data in data lakes because the data is stored in an open, raw form.
- A major advantage is the versatility of a data lake, which lets you store data in any format or medium without the need for a predefined schema. Because data is left in its original state, future use cases can be supported and more data can be analyzed.
- Data lakes can hold both structured and unstructured data, so you avoid storing the two kinds in different environments. They provide a single location for all of an organization's data types.
- Compared with traditional data warehouses, data lakes are less expensive because they are designed to run on inexpensive commodity hardware, such as object storage, which is usually optimized for a low cost per gigabyte stored.
Limitations of a Data Lake
- Poor performance for data analytics and business intelligence use cases: Data lakes can become disorganized if they are not adequately maintained, making it difficult to connect them with business intelligence and analytics tools. In addition, for reporting and analytics use cases, the lack of consistent data structures and ACID (atomicity, consistency, isolation, and durability) transaction support can lead to suboptimal query performance.
- Lack of data reliability and security: The lack of data consistency in data lakes makes it difficult to enforce data reliability and security. Because data lakes can hold any form of data, it can be hard to develop appropriate security and governance standards for sensitive data types.
Pros
- An affordable solution for all types of data.
- Able to handle both structured and semi-structured data.
- Well suited to complex data processing and streaming.
Cons
- Requires a complex pipeline to build.
- Data takes some time to become queryable.
- It takes time to ensure data reliability and quality.
What is a Data Lakehouse?
A novel big data storage architecture called the "data lakehouse" combines the best features of data lakes and data warehouses. All your data, whether structured, semi-structured, or unstructured, can be stored in one place, with the machine learning, business intelligence, and streaming capabilities that a data lakehouse makes possible.
Data lakes of all kinds are typically the starting point for data lakehouses; the data is then converted to the Delta Lake format (an open-source storage layer that brings reliability to data lakes).
Data lakes with Delta Lake enable the ACID transactional processes of traditional data warehouses. In essence, a lakehouse system uses cheap storage to hold large amounts of data in its original forms, just as data lakes do.
Adding a metadata layer on top of that store also gives the data structure and enables data management tools similar to those found in data warehouses.
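One way to picture a metadata layer on top of cheap file storage is an ordered commit log that decides which data files readers are allowed to see. The sketch below is a toy built on that general idea, not Delta Lake's actual protocol or API; the `ToyLakehouseTable` class, the `_log` directory name, and the file names are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


class ToyLakehouseTable:
    """Toy table: raw data files plus an ordered commit log (hypothetical design)."""

    def __init__(self, root):
        self.root = Path(root)
        self.log_dir = self.root / "_log"
        self.log_dir.mkdir(parents=True, exist_ok=True)

    def _next_version(self):
        return len(list(self.log_dir.glob("*.json")))

    def write_data_file(self, name, rows):
        """Write a data file. It stays invisible to readers until committed."""
        (self.root / name).write_text(json.dumps(rows), encoding="utf-8")
        return name

    def commit(self, file_names):
        """Publish files by creating the next log entry; mode 'x' fails if it exists."""
        version = self._next_version()
        entry = self.log_dir / f"{version:08d}.json"
        with entry.open("x", encoding="utf-8") as fh:
            json.dump({"add": list(file_names)}, fh)
        return version

    def read(self):
        """Return rows only from files listed in committed log entries."""
        rows = []
        for entry in sorted(self.log_dir.glob("*.json")):
            for name in json.loads(entry.read_text(encoding="utf-8"))["add"]:
                rows.extend(json.loads((self.root / name).read_text(encoding="utf-8")))
        return rows


with tempfile.TemporaryDirectory() as root:
    table = ToyLakehouseTable(root)
    first = table.write_data_file("part-0.json", [{"id": 1}, {"id": 2}])
    table.commit([first])

    # A second writer has produced a file but has not committed it yet.
    table.write_data_file("part-1.json", [{"id": 3}])
    visible_before = table.read()

    table.commit(["part-1.json"])
    visible_after = table.read()

print(len(visible_before), len(visible_after))
```

Readers never list the storage directory directly; they trust only the log. A half-written or uncommitted file therefore cannot leak into query results, which is the intuition behind layering transactional guarantees over plain files.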
This enables many teams to access all of a company's data through a single system for a variety of projects, such as data science, machine learning, and business intelligence.
Benefits of a Data Lakehouse
- Support for a wider range of workloads: To facilitate complex analytics, data lakehouses give users direct access to some of the best-known business intelligence tools (Tableau, Power BI). In addition, data scientists and machine learning engineers can use the data easily because data lakehouses use open data formats (such as Parquet) along with APIs and machine learning frameworks for languages such as Python and R.
- Cost-effectiveness: Data lakehouses use low-cost object storage to take advantage of the inexpensive storage characteristics of data lakes. By providing a single solution, data lakehouses also eliminate the cost and time associated with managing multiple data storage systems.
- Simpler data versioning, governance, and security: The data lakehouse design enforces schema and data integrity, making it easier to build effective data security and governance mechanisms.
- Data lakehouses provide a single, multipurpose data storage platform that can accommodate all of a company's data requirements, which reduces data duplication. Most businesses opt for a hybrid solution because of the advantages of both a data warehouse and a data lake. That strategy, however, can lead to costly data duplication.
- Support for open formats: Open formats are file types that can be used by many software applications and whose specifications are publicly available. Lakehouses can reportedly store data in file formats such as Apache Parquet and ORC (Optimized Row Columnar).
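The schema enforcement mentioned in the benefits above can be illustrated with a small write-time check. This is a simplified sketch in plain Python rather than any particular lakehouse platform's API; `ORDERS_SCHEMA`, `SchemaError`, `validate_batch`, and `append` are hypothetical names, and a list stands in for the stored table.

```python
# Hypothetical table schema: column name -> required Python type.
ORDERS_SCHEMA = {"order_id": int, "customer": str, "amount": float}


class SchemaError(ValueError):
    """Raised when a batch does not match the table schema."""


def validate_batch(rows, schema):
    """Reject the whole batch if any row has wrong columns or types."""
    for index, row in enumerate(rows):
        if set(row) != set(schema):
            raise SchemaError(f"row {index}: columns {sorted(row)} do not match schema")
        for column, expected in schema.items():
            if not isinstance(row[column], expected):
                raise SchemaError(f"row {index}: {column!r} is not {expected.__name__}")
    return rows


table = []  # stands in for the stored table


def append(rows):
    """All-or-nothing append: validate first, then write."""
    table.extend(validate_batch(rows, ORDERS_SCHEMA))


append([{"order_id": 1, "customer": "alice", "amount": 19.99}])

try:
    append([
        {"order_id": 2, "customer": "bob", "amount": 5.0},
        {"order_id": "3", "customer": "carol", "amount": 7.5},  # wrong type
    ])
    rejected = False
except SchemaError:
    rejected = True

print(len(table), rejected)
```

Because validation runs before anything is written, the bad batch leaves the table untouched, including its one valid row. A plain data lake, by contrast, would have accepted both rows as-is.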
Limitations of a Data Lakehouse
The main drawback of a lakehouse is that it is still a new and evolving technology. As a result, it is uncertain whether it will deliver on its promises. It may take years before data lakehouses can compete with established big data storage systems.
However, given the pace of modern innovation, it is hard to rule out that a different data storage system will eventually take its place.
Pros
- A single platform holds all the data, which means there are fewer host names to maintain.
- Atomicity, consistency, isolation, and durability are not compromised.
- Very affordable.
- Easy to manage, and any problems are quick to fix.
- Makes it easier to build a pipeline.
Cons
- Setup may take time.
- Too new and too far from maturity to qualify as an established storage system.
Data Warehouse vs. Data Lake vs. Data Lakehouse
The data warehouse has a long history in business intelligence, reporting, and analytics applications and is the original big data storage technology.
On the other hand, data warehouses are expensive and have trouble handling diverse and unstructured data, such as streaming data. Data lakes were developed for machine learning and data science workloads, to manage raw data in various forms on affordable storage.
Although data lakes work well with unstructured data, they lack the ACID transactional capabilities of data warehouses, which makes it a challenge to ensure data consistency and reliability.
The newest data storage architecture, known as the "data lakehouse," combines the reliability and consistency of data warehouses with the accessibility and flexibility of data lakes.
Conclusion
In conclusion, building a data lakehouse from scratch can be difficult. Moreover, you will most likely use a platform designed to enable an open data lakehouse architecture.
So take care to research the features and implementations of each platform before buying. Companies looking for a mature, structured data solution focused on business intelligence and data analytics use cases may consider a data warehouse.
However, businesses looking for a flexible, affordable big data solution to power data science and machine learning workloads on unstructured data should consider data lakes.
Consider whether your business needs more from its data than data warehouse and data lake technologies can offer, or whether you want a solution that combines complex analytics and machine learning workloads on your data. In that case, a data lakehouse is a sensible option.