Companies are collecting more data than ever before as they increasingly rely on it to inform critical business decisions, improve their product offerings, and provide better customer service.
With data being generated at an exponential rate, the cloud offers several benefits for data processing and analytics, including scalability, reliability, and availability.
The cloud ecosystem also offers a range of tools and technologies for data processing and analytics. The two most widely used big-data storage architectures are data warehouses and data lakes.
Using a data lake is less appealing because you cannot query the data through a predefined model, even though the data is still valuable, while using a data warehouse to store streaming data is wasteful.
So which kind of cloud architecture should we choose?
Should we consider the newer data lakehouse concept, or should we settle for the limitations of the warehouse or the constraints of the lake?
A newer data storage design called the "data lakehouse" combines the flexibility of data lakes with the data management of data warehouses.
Understanding the different approaches to data storage is essential for building a reliable data storage pipeline for business intelligence (BI), data analytics, and machine learning (ML) workloads, depending on your company's needs.
In this post, we will take a close look at the Data Warehouse, the Data Lake, and the Data Lakehouse, covering their benefits, limitations, and pros and cons. Let's get started.
What is a Data Warehouse?
A data warehouse is a centralized data repository that an organization uses to hold large volumes of data from many sources. It acts as the organization's single source of "data truth" and is essential for reporting and business analytics.
Typically, data warehouses combine relational data sets from several sources, such as application, business, and transaction data, in order to store historical data. Before it is loaded into the warehouse, data is transformed and cleaned so that it can serve as a single source of truth.
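The transform-then-load step described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular warehouse product works: the in-memory SQLite database stands in for a real warehouse, and the `raw_orders` records, the `orders` table, and the `transform` and `load` helpers are all hypothetical names made up for the example.

```python
import sqlite3

# Hypothetical raw records from two source systems (illustrative data only).
raw_orders = [
    {"order_id": "1", "customer": " Alice ", "amount": "19.99", "source": "web"},
    {"order_id": "2", "customer": "BOB", "amount": "5.00", "source": "store"},
    {"order_id": "2", "customer": "BOB", "amount": "5.00", "source": "store"},  # duplicate
    {"order_id": "3", "customer": "carol", "amount": "n/a", "source": "web"},  # bad amount
]


def transform(records):
    """Clean and standardize records before they reach the warehouse."""
    seen = set()
    cleaned = []
    for rec in records:
        try:
            amount = float(rec["amount"])
        except ValueError:
            continue  # drop rows whose amount cannot be parsed
        order_id = int(rec["order_id"])
        if order_id in seen:
            continue  # drop duplicate orders
        seen.add(order_id)
        cleaned.append((order_id, rec["customer"].strip().title(), amount, rec["source"]))
    return cleaned


def load(conn, rows):
    """Load cleaned rows into a table with a fixed, typed schema."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT NOT NULL, "
        "amount REAL NOT NULL, source TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse (assumption)
load(conn, transform(raw_orders))
total_by_source = dict(
    conn.execute("SELECT source, SUM(amount) FROM orders GROUP BY source")
)
print(total_by_source)
```

Because the cleaning happens before the load, every later SQL query sees the same deduplicated, typed rows, which is what lets the warehouse act as a single source of truth.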
Businesses invest in data warehouses because they can quickly deliver business insights from every part of the company. Business analysts, data engineers, and decision-makers can access the data in the warehouse through BI tools, SQL clients, and other less advanced (that is, non-data-science) analytics solutions.
Maintaining a warehouse that holds an ever-growing volume of data is expensive, and a data warehouse cannot handle raw or unstructured data. It is also not the best option for complex data analysis such as machine learning or predictive modeling.
What a data warehouse does offer is fast query responses and high-quality data. Google BigQuery, Amazon Redshift, Azure SQL Data Warehouse (now Azure Synapse Analytics), and Snowflake are cloud services available for data warehousing.
Benefits of a Data Warehouse
- Greater efficiency and speed for business intelligence and data analytics workloads: Data warehouses shorten the time needed to prepare and analyze data. Because the data in a warehouse is reliable and consistent, it connects easily to data analytics and business intelligence tools. Data warehouses also cut the time needed to gather data and let teams use it for reports, dashboards, and other analytics needs.
- Improved consistency, quality, and standardization of data: Organizations collect data from many sources, including user, sales, and transaction data. Data warehousing consolidates company data into a uniform, standardized format that can serve as a single source of truth, so the business can trust the data it relies on.
- Better decision-making across the board: Data warehousing supports better decisions by providing a central store of current and historical data. By processing the data in the warehouse to obtain accurate insights, decision-makers can assess risks, understand customer needs, and improve products and services.
- Better business intelligence: Data warehousing bridges the gap between big data, which is often collected as a matter of course, and curated data that yields insights. The warehouse serves as the foundation of an organization's data storage, enabling it to answer complex questions about its data and use the answers to make defensible business decisions.
Limitations of a Data Warehouse
- Lack of data flexibility: Data warehouses handle structured data well, but semi-structured and unstructured data such as logs, streaming data, and social media data can be difficult for them. This makes it hard to recommend data warehouses for use cases involving machine learning and artificial intelligence.
- Expensive to set up and maintain: Data warehouses can be costly to deploy and maintain. Moreover, a data warehouse is rarely static; it ages and needs regular upkeep, which adds to the cost.
Pros
- Data is easy to find, retrieve, and query.
- As long as the data is already clean, preparing it with SQL is simple.
Cons
- You are locked into a single analytics vendor.
- Analyzing and storing unstructured or streaming data is very expensive.
What is a Data Lake?
Data lakes can take in data of every kind and make storing it affordable. They are useful for keeping data in one central, accessible place where it is readily available to read.
A data lake is a centralized, highly flexible storage repository where large volumes of structured and unstructured data are kept in their raw, untransformed, and unformatted form.
In contrast to data warehouses, which store relational data that has already been "cleaned," a data lake uses a flat architecture and object storage to keep data in its raw state.
Unlike data warehouses, which struggle with data in this form, data lakes are flexible, reliable, and affordable, and they let businesses gain deeper insight from unstructured data.
In a data lake, data is extracted, loaded, and then transformed (ELT) for analysis, rather than having a schema or data structure defined at the time the data is collected.
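The idea of deferring the schema until the data is analyzed (often called schema-on-read) can be sketched as follows. This is a simplified illustration under stated assumptions: the `raw_lines` list stands in for raw files in a lake, and the event fields and the `read_temperatures` helper are hypothetical.

```python
import json

# Raw events land in the "lake" exactly as they arrive (illustrative data only).
raw_lines = [
    '{"device": "sensor-1", "temp_c": 21.5, "ts": "2024-01-01T00:00:00"}',
    '{"device": "sensor-2", "temperature": "19", "ts": "2024-01-01T00:01:00"}',
    '{"user": "alice", "action": "login"}',  # a different kind of event entirely
]


def read_temperatures(lines):
    """Apply a schema at read time: keep only rows that fit this analysis."""
    rows = []
    for line in lines:
        event = json.loads(line)
        # Two source systems used different field names; reconcile them here.
        temp = event.get("temp_c", event.get("temperature"))
        if "device" not in event or temp is None:
            continue  # not a temperature reading; other analyses may still use it
        rows.append({"device": event["device"], "temp_c": float(temp)})
    return rows


readings = read_temperatures(raw_lines)
print(readings)
```

Nothing is rejected when the data is stored; each analysis decides later which records it needs and how to interpret them. The trade-off is that cleaning work is repeated by every reader instead of being done once up front.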
By working with many types of data, from IoT devices, social media, and streaming sources, data lakes enable machine learning and predictive analytics.
A data lake is also a good fit for data scientists who are able to work with raw data, whereas a data warehouse is easier for business users. A data lake is well suited to user profiling, predictive analytics, machine learning, and similar tasks.
Although data lakes solve several of the problems of data warehouses, their data quality is poor and their query speed is inadequate. Business users also need additional tools to run SQL queries. A poorly organized data lake can end up with stagnant data.
Benefits of a Data Lake
- Support for a wide range of machine learning and data science use cases: Because the data in a data lake is stored in an open, raw form, it is easy to apply a variety of machine learning and deep learning algorithms to it.
- Versatility: A data lake lets you store data in any format or medium without requiring a preset schema, which is a major advantage. Keeping the data in its original state means future use cases can be supported and more of the data can be analyzed.
- A single location for all data: Data lakes can hold both structured and unstructured data, so you do not have to store different kinds of data in different environments. They give the organization one place for all of its data types.
- Lower cost: Compared with traditional data warehouses, data lakes are inexpensive because they are designed to run on low-cost commodity storage, such as object storage, which is typically optimized for a low cost per gigabyte stored.
Limitations of a Data Lake
- Poor performance for data analytics and business intelligence use cases: Data lakes can become disorganized if they are not properly maintained, which makes them hard to connect to business intelligence and analytics tools. In addition, the lack of consistent data structures and of ACID (atomicity, consistency, isolation, and durability) transaction support can lead to suboptimal query performance for reporting and analytics use cases.
- Weak reliability and security: The inconsistency of data lakes makes it hard to enforce data reliability and security, so both tend to be lacking. Because a data lake can hold any form of data, it can be difficult to develop data security and governance standards that fit every data type.
Pros
- An affordable solution for all types of data.
- Can handle both structured and semi-structured data.
- Well suited to complex data processing and streaming.
Cons
- Requires a sophisticated pipeline to be built.
- Data takes some time to become queryable.
- Ensuring data reliability and quality takes time.
What is a Data Lakehouse?
A newer big-data storage design called the "data lakehouse" combines the best features of data lakes and data warehouses. All of your data, whether structured, semi-structured, or unstructured, can be kept in one place, with strong machine learning, business intelligence, and streaming capabilities made possible by the lakehouse.
A data lakehouse usually starts out as a data lake holding all kinds of data; the data is then converted to Delta Lake format (an open-source storage layer that brings reliability to data lakes).
Data lakes with Delta Lake gain the ACID transaction behavior of traditional data warehouses. In short, a lakehouse system uses inexpensive storage to keep large volumes of data in their original form, just as a data lake does.
Adding a metadata layer on top of that storage also gives the data structure and provides data management capabilities like those found in data warehouses.
This lets many teams access all of the company's data through a single system for a variety of projects, such as data science, machine learning, and business intelligence.
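To make the idea of a metadata layer over cheap file storage more concrete, here is a toy Python sketch. It is not Delta Lake's actual implementation or API: the `TinyTable` class, the `_log.json` file name, and the `SCHEMA` mapping are hypothetical, and the sketch assumes a single writer on a local filesystem. It only illustrates two ideas from the text, namely that a commit log decides which raw files belong to the table, and that a schema check can reject bad writes before they become visible.

```python
import json
import os
import tempfile

SCHEMA = {"id": int, "name": str}  # hypothetical table schema


class TinyTable:
    """Toy metadata layer: a commit log decides which data files are visible."""

    def __init__(self, root):
        self.root = root
        self.log_path = os.path.join(root, "_log.json")

    def _committed(self):
        """Return the list of data files that have been committed so far."""
        if not os.path.exists(self.log_path):
            return []
        with open(self.log_path) as f:
            return json.load(f)

    def append(self, rows):
        # Schema enforcement: reject the whole batch if any row does not fit.
        for row in rows:
            if set(row) != set(SCHEMA) or any(
                not isinstance(row[k], t) for k, t in SCHEMA.items()
            ):
                raise ValueError(f"row does not match schema: {row}")
        files = self._committed()
        name = f"part-{len(files):05d}.json"
        with open(os.path.join(self.root, name), "w") as f:
            json.dump(rows, f)  # data file exists but is not yet visible
        tmp = self.log_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(files + [name], f)
        os.replace(tmp, self.log_path)  # atomic rename on POSIX publishes the commit

    def read(self):
        """Read only the rows from files listed in the commit log."""
        rows = []
        for name in self._committed():
            with open(os.path.join(self.root, name)) as f:
                rows.extend(json.load(f))
        return rows


table = TinyTable(tempfile.mkdtemp())
table.append([{"id": 1, "name": "alice"}])
try:
    table.append([{"id": "2", "name": "bob"}])  # wrong type for id
except ValueError:
    pass  # the rejected batch leaves the table unchanged
print(table.read())
```

Readers never see a half-written batch because a file only counts once the log that names it has been swapped into place. Real lakehouse table formats add much more on top of this, such as concurrent writers, versioned history, and columnar file formats.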
Benefits of a Data Lakehouse
- Support for a wider range of workloads: To enable in-depth analysis, data lakehouses give users direct access to some of the most popular business intelligence tools (Tableau, Power BI). In addition, data scientists and machine learning engineers can use the data easily, because lakehouses use open data formats (such as Parquet) along with APIs and machine learning ecosystems such as Python and R.
- Cost-effectiveness: Data lakehouses use low-cost object storage, following the cost-effective storage approach of data lakes. By providing a single solution, they also eliminate the cost and time involved in managing multiple data storage systems.
- Ease of data versioning, governance, and security: The data lakehouse design enforces schema and data integrity, which makes it simpler to build effective data security and governance mechanisms.
- Reduced data duplication: Data lakehouses provide a single, multipurpose data storage platform that can accommodate all of a company's data needs. Most businesses opt for a hybrid of a data warehouse and a data lake to get the benefits of both, but that approach can lead to costly data duplication.
- Support for open formats: Open formats are file types that have publicly available specifications and can be used by many software applications. Lakehouses are reported to be able to store data in standard file formats such as Apache Parquet and ORC (Optimized Row Columnar).
Limitations of a Data Lakehouse
The biggest drawback of the data lakehouse is that it is still a young, emerging technology. As a result, it is not yet certain whether it will live up to its promises. It may take years before data lakehouses can compete with established big-data storage systems.
That said, given the pace of modern innovation, it is hard to say whether some other data storage approach will eventually replace it.
Pros
- A single platform holds all the data, which means fewer storage systems to maintain.
- Atomicity, consistency, isolation, and durability are not compromised.
- It is much more cost-effective.
- It is easy to manage, and any issues can be resolved quickly.
- It makes pipelines easier to build.
Cons
- Setting it up can take time.
- It is young and still far from being an established storage system.
Data Warehouse Vs Data Lake Vs Data Lakehouse
The data warehouse has a long history in enterprise intelligence, reporting, and analytics applications, and it is the original big-data storage technology.
Data warehouses, however, are costly and struggle to handle varied and unstructured data, such as streaming data. Data lakes were developed for machine learning and data science workloads, to handle raw data in a variety of formats on affordable storage.
Although data lakes work well with unstructured data, they lack the ACID transaction capabilities of data warehouses, which makes it hard to guarantee consistency and reliability.
The newest data storage design, known as the "data lakehouse," combines the reliability and consistency of data warehouses with the affordability and flexibility of data lakes.
Conclusion
In conclusion, building a data lakehouse from scratch can be difficult. In practice, you will most likely use a platform designed to support an open data lakehouse architecture.
So take care to evaluate the features and implementation of each platform before you buy. Companies looking for a mature, structured data solution focused on business intelligence and data analytics use cases can consider a data warehouse.
Businesses looking for a scalable, affordable big-data solution to power data science and machine learning workloads on unstructured data, on the other hand, should consider a data lake.
Suppose your business needs more from its data than data warehouse and data lake technologies can provide, or you are looking for one solution that combines advanced analytics and machine learning operations on your data. In that situation, a data lakehouse is a sensible choice.