It can be overwhelming to think through all the available services and architecture choices when considering data platforms.
An enterprise data platform typically comprises data warehouses, data models, data lakes, and reports, each with its own purpose and required skill set. In contrast, a new design called the data lakehouse has emerged over the past few years.
A "data lakehouse" is a data storage architecture that combines the flexibility of data lakes with the data management of data warehouses.
In this post we will examine the data lakehouse in depth, including its components, features, architecture, and more.
What Is a Data Lakehouse?
As the name implies, a data lakehouse is a new type of data architecture that combines a data lake with a data warehouse to address the shortcomings of each.
In short, a lakehouse system uses low-cost storage to keep large volumes of data in their original formats, as data lakes do. Adding a metadata layer on top of the store gives the data structure and provides data management capabilities like those found in data warehouses.
It stores large volumes of structured, semi-structured, and unstructured data that an organization collects from the various business applications, systems, and devices it uses.
Most of the time, data lakes use low-cost storage systems with a file application programming interface (API) to hold data in open, generic file formats.
This lets many teams access all of a company's data through a single system for different projects, such as data science, machine learning, and business intelligence.
Features
- Low-cost storage: A data lakehouse must be able to store data in inexpensive object storage, such as Google Cloud Storage, Azure Blob Storage, or Amazon Simple Storage Service (Amazon S3), natively in formats such as ORC or Parquet.
- Data optimization capability: Data layout optimization, caching, and indexing are a few examples of how a data lakehouse must be able to optimize data while preserving the data's original format.
- Transactional metadata layer: Sitting on top of the underlying low-cost storage, this layer enables the data management capabilities that are essential to data warehouse performance.
- Support for a declarative DataFrame API: Most AI tools can use DataFrames to retrieve data from the object store. Support for a declarative DataFrame API adds the ability to dynamically optimize the data's presentation and structure in response to a particular data science or AI workload.
- Support for ACID transactions: ACID, which stands for atomicity, consistency, isolation, and durability, is a key concept for defining transactions and ensuring the consistency and reliability of data. Such transactions were previously possible only in data warehouses, but the lakehouse makes them available for data lakes as well. With multiple data pipelines reading and writing data concurrently, this addresses the problem of poor data quality.
Components of a Data Lakehouse
At a high level, the data lakehouse architecture is divided into two main tiers. The lakehouse platform manages data ingestion into the storage layer (that is, the data lake).
Without needing to load the data into a data warehouse or convert it into a proprietary format, the processing layer can then query the data in the storage layer directly using a variety of tools.
BI applications, along with AI and ML technologies, can then use the data. This design offers the economics of a data lake, but because any processing engine can read this data, businesses have the flexibility to make the prepared data available for analysis by a variety of systems. Processing performance and cost can both be improved with this approach to processing and analysis.
Because it supports database transactions that adhere to the following ACID (atomicity, consistency, isolation, and durability) principles, the architecture also lets multiple parties read and write data concurrently within the system:
- Atomicity means that a transaction either completes in full or does not happen at all. If a process is interrupted, this helps prevent data loss or corruption.
- Consistency guarantees that transactions happen in a predictable, consistent way. It preserves the integrity of the data by ensuring that all data is valid according to predefined rules.
- Isolation guarantees that, until it completes, no transaction can be affected by any other transaction in the system. This lets multiple parties read from and write to the same system at the same time without interfering with one another.
- Durability guarantees that changes to data in the system persist after a transaction completes, even in the event of a system failure. Any change made by a transaction is stored permanently.
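To make atomicity and consistency concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for a transactional table. A real lakehouse would rely on its own table format and metadata layer; the `accounts` table, the `transfer` helper, and the `balances` helper are assumptions made purely for illustration.

```python
import sqlite3

# In-memory database standing in for a transactional lakehouse table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"  # consistency rule
)
conn.execute("INSERT INTO accounts VALUES ('a', 100), ('b', 0)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates commit together or not at all."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance, so the credit
        # to dst was rolled back along with the failed debit.
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


first = transfer(conn, "a", "b", 30)    # valid transfer
second = transfer(conn, "a", "b", 500)  # would overdraw account "a"
```

The second transfer violates the balance rule partway through, so the whole transaction is rolled back and the balances stay as they were after the first transfer.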
Data Lakehouse Architecture
Databricks (the originator of the Delta Lake concept) and AWS are the two main advocates of the data lakehouse concept. We will therefore rely on their knowledge and insight to explain how lakehouses are architected.
A data lakehouse system typically has five layers:
- Ingestion layer
- Storage layer
- Metadata layer
- API layer
- Consumption layer
Ingestion layer
The first layer of the system collects data from various sources and delivers it to the storage layer. It can use a number of protocols to connect to many internal and external sources, combining batch and streaming data processing capabilities. Sources include:
- NoSQL databases,
- file shares,
- CRM applications,
- websites,
- IoT sensors,
- social media,
- Software as a Service (SaaS) applications, and
- relational database management systems, among others.
At this stage, components such as Apache Kafka for streaming data and AWS Database Migration Service (AWS DMS) for importing data from RDBMS and NoSQL databases can be used.
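As a rough sketch of what an ingestion layer does, the example below lands records from one batch source and one simulated streaming source in a shared raw zone as JSON Lines files. The `land_records` function, the source names, and the temporary directory are all hypothetical; a production setup would typically use tools like the ones named above.

```python
import json
import tempfile
from pathlib import Path


def land_records(raw_zone, source, records):
    """Append records from one source to its JSON Lines file in the raw zone."""
    target = Path(raw_zone) / f"{source}.jsonl"
    count = 0
    with target.open("a", encoding="utf-8") as fh:
        for record in records:
            # Keep the payload untouched and tag it with where it came from.
            fh.write(json.dumps({"source": source, "payload": record}) + "\n")
            count += 1
    return count


def stream_events():
    """Stand-in for a streaming source such as a message queue consumer."""
    yield {"sensor": "t1", "temp": 21.5}
    yield {"sensor": "t1", "temp": 21.7}


raw_zone = tempfile.mkdtemp()  # hypothetical landing area
batch_rows = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Lin"}]

landed_batch = land_records(raw_zone, "crm_batch", batch_rows)
landed_stream = land_records(raw_zone, "iot_stream", stream_events())
```

Both paths go through the same function, which is the point of the sketch: batch and streaming inputs end up in one storage location in their original shape.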
Storage layer
The lakehouse architecture is designed to allow various kinds of data to be stored as objects in low-cost object stores, such as Amazon S3. Because the objects use open file formats, client tools can then read them directly from the store.
This makes it possible for multiple APIs and consumption-layer components to access and use the same data. The metadata layer stores the schemas of structured and semi-structured datasets so that components can apply them to the data as they read it.
A lakehouse is best suited to cloud repository services that separate compute from storage. It can also be built on-premises, for example on the Hadoop Distributed File System (HDFS).
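Applying a stored schema at read time can be sketched as follows. The `CATALOG` dictionary plays the role of the metadata layer and `RAW_OBJECT` the role of a raw object in the store; both, along with `read_with_schema`, are illustrative assumptions, and CSV is used only because it needs no third-party library.

```python
import csv
import io

# Hypothetical catalog entry: column names mapped to the type to apply on read.
CATALOG = {"orders": {"order_id": int, "amount": float, "country": str}}

# Raw object as it might sit in the store: plain text, every value a string.
RAW_OBJECT = "order_id,amount,country\n1,19.99,ZW\n2,5.00,ZA\n"


def read_with_schema(raw_text, table):
    """Parse a raw CSV object and cast each column using the catalog schema."""
    schema = CATALOG[table]
    rows = []
    for raw in csv.DictReader(io.StringIO(raw_text)):
        rows.append({col: cast(raw[col]) for col, cast in schema.items()})
    return rows


orders = read_with_schema(RAW_OBJECT, "orders")
total = sum(row["amount"] for row in orders)
```

The stored object never changes; only the reader consults the schema, so several tools can share the same file and still see typed columns.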
Metadata layer
The metadata layer is the essential component of the data lakehouse and is what sets this architecture apart. It is a unified catalog that provides metadata (information about other pieces of data) for every object stored in the lake and lets users apply management features such as:
- ACID transactions, which ensure that concurrent transactions see a consistent version of the database;
- caching, to cache files from the cloud object store;
- indexing, to add data structure indexes that speed up queries;
- zero-copy cloning, to duplicate data objects; and
- data versioning, to keep specific versions of the data, and so on.
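Data versioning and zero-copy cloning can both be illustrated with an append-only log of file lists. The `VersionedTable` class below is a toy built for this post, not the API of any real table format, and the file names are placeholders.

```python
class VersionedTable:
    """Toy table whose state is an append-only log of committed file lists."""

    def __init__(self):
        self._log = [()]  # version 0: an empty table

    def commit(self, add=(), remove=()):
        """Record a new version and return its version number."""
        removed = set(remove)
        current = self._log[-1]
        new_state = tuple(f for f in current if f not in removed) + tuple(add)
        self._log.append(new_state)
        return len(self._log) - 1

    def files(self, version=None):
        """List the files visible at a version (latest when omitted)."""
        if version is None:
            version = len(self._log) - 1
        return list(self._log[version])

    def clone(self):
        """Zero-copy clone: duplicate the log of references, not the files."""
        copy = VersionedTable()
        copy._log = list(self._log)
        return copy


table = VersionedTable()
v1 = table.commit(add=["a.parquet", "b.parquet"])
v2 = table.commit(add=["c.parquet"], remove=["a.parquet"])

snapshot = table.clone()
snapshot.commit(add=["d.parquet"])  # changes the clone only
```

Reading `table.files(v1)` returns the earlier file list even after later commits, which is the idea behind keeping specific versions of the data.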
In addition, the metadata layer enables schema management, the use of DW schema architectures such as star/snowflake schemas, and data governance and auditing capabilities directly on the data lake, improving the integrity of the entire data pipeline.
Schema management includes schema evolution and schema enforcement. By rejecting any writes that do not match the table's schema, schema enforcement lets users maintain data integrity and quality.
Schema evolution allows an existing table's schema to be modified to accommodate changing data. Because there is a single management interface on top of the data lake, access control and auditing capabilities are also available.
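The difference between schema enforcement and schema evolution can be sketched with a small in-memory table. The `Table` and `SchemaError` classes are hypothetical and greatly simplified; they assume that evolution only ever adds nullable columns.

```python
class SchemaError(ValueError):
    """Raised when a write or schema change does not fit the table."""


class Table:
    """Toy table that validates every write against its schema."""

    def __init__(self, schema):
        self.schema = dict(schema)  # column name -> expected Python type
        self.rows = []

    def append(self, row):
        """Schema enforcement: reject rows whose columns or types differ."""
        if set(row) != set(self.schema):
            raise SchemaError(
                f"columns {sorted(row)} do not match {sorted(self.schema)}"
            )
        for col, expected in self.schema.items():
            value = row[col]
            if value is not None and not isinstance(value, expected):
                raise SchemaError(f"column {col!r} expects {expected.__name__}")
        self.rows.append(dict(row))

    def add_column(self, name, expected):
        """Schema evolution: add a nullable column; old rows get None."""
        if name in self.schema:
            raise SchemaError(f"column {name!r} already exists")
        self.schema[name] = expected
        for row in self.rows:
            row[name] = None


orders = Table({"order_id": int, "amount": float})
orders.append({"order_id": 1, "amount": 19.99})

try:
    orders.append({"order_id": "two", "amount": 5.0})  # wrong type
    rejected = False
except SchemaError:
    rejected = True

orders.add_column("country", str)
orders.append({"order_id": 2, "amount": 5.0, "country": "ZA"})
```

Enforcement protects the existing rows from a malformed write, while evolution changes the schema deliberately so that new data with an extra column can be accepted.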
API layer
Another important layer of the architecture hosts a number of APIs that all end users can use to complete tasks faster and obtain more advanced analytics.
Metadata APIs make it easier to identify and access the data objects needed for a given application.
As for machine learning libraries, some of them, such as TensorFlow and Spark MLlib, can read open formats like Parquet and query the metadata layer directly.
At the same time, DataFrame APIs offer greater opportunities for optimization, letting developers structure and transform distributed data.
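One reason a DataFrame-style API leaves room for optimization is that transformations can be recorded as a plan and executed only when results are requested. The `LazyFrame` class below is a hypothetical, single-machine sketch of that idea and does not mirror the API of Spark or any other library.

```python
def _project(rows, columns):
    """Yield each row reduced to the requested columns."""
    for row in rows:
        yield {c: row[c] for c in columns}


class LazyFrame:
    """Records transformations and runs them only when collect() is called."""

    def __init__(self, rows, plan=()):
        self._rows = rows
        self._plan = plan

    def filter(self, predicate):
        """Return a new frame with a filter step added to the plan."""
        return LazyFrame(self._rows, self._plan + (("filter", predicate),))

    def select(self, *columns):
        """Return a new frame with a projection step added to the plan."""
        return LazyFrame(self._rows, self._plan + (("select", columns),))

    def collect(self):
        """Execute the recorded plan, step by step, and return a list of rows."""
        result = iter(self._rows)
        for op, arg in self._plan:
            if op == "filter":
                result = filter(arg, result)
            else:  # "select"
                result = _project(result, arg)
        return list(result)


rows = [
    {"id": 1, "country": "ZW", "amount": 10},
    {"id": 2, "country": "ZA", "amount": 5},
    {"id": 3, "country": "ZW", "amount": 7},
]

query = LazyFrame(rows).filter(lambda r: r["country"] == "ZW").select("id", "amount")
result = query.collect()
```

Nothing is computed while `query` is being built. A real engine could inspect the recorded plan at that point, for example to reorder steps or skip files, before touching any data.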
Consumption layer
Power BI, Tableau, and other tools and applications are hosted in the consumption layer. With the lakehouse design, all metadata and all data stored in the lake are available to client applications.
All users within a company can use the lakehouse to carry out all kinds of analytics tasks, including building business intelligence dashboards and running SQL queries and machine learning jobs.
Advantages of a Data Lakehouse
Organizations can build a data lakehouse to unify their current data platform and streamline their entire data management process. By breaking down the silo barriers between different sources, a data lakehouse can replace the need for separate solutions.
Compared with curated data sources, this integration yields a much more efficient end-to-end process. This has several benefits:
- Less administration: Rather than extracting data from raw sources and preparing it for use in a data warehouse, a data lakehouse lets any source connected to it have its data available and organized for use.
- Better cost-effectiveness: Data lakehouses are built on modern infrastructure that separates compute and storage, making it easy to add storage without adding compute power. Using only low-cost data storage yields cost-effective scalability.
- Better data governance: Data lakehouses are built on standardized open architecture, allowing greater control over security, metrics, role-based access, and other important governance elements. By consolidating resources and data sources, they simplify and improve governance.
- Simplified standards: Because connectivity was very limited in the 1980s, when data warehouses were first developed, localized schema standards were often created within businesses, even within departments. Data lakehouses take advantage of the fact that many types of data now have open schema standards, ingesting multiple data sources with overlapping uniform schemas to simplify processes.
Disadvantages of a Data Lakehouse
Despite the hoopla surrounding data lakehouses, it is important to keep in mind that the concept is still very new. Be sure to weigh the drawbacks before committing fully to this new architecture.
- Monolithic structure: A lakehouse's all-inclusive design offers several benefits, but it also raises some issues. Monolithic architecture often leads to poor service for all users and can be rigid and hard to maintain. Typically, architects and designers prefer a more modular architecture that they can tailor to different use cases.
- The technology is not there yet: The end goal involves a significant amount of machine learning and artificial intelligence. These technologies must mature further before lakehouses can perform as envisioned.
- Not a significant advance over existing architectures: There is still considerable skepticism about how much additional value lakehouses will actually provide. Some critics argue that a lake-warehouse design paired with the right automation tools can achieve comparable efficiency.
Challenges of a Data Lakehouse
Adopting data lakehouse technology can be difficult. Because of the complexity of its components, it is not accurate to view the data lakehouse as an all-encompassing ideal structure or as "one platform for everything."
In addition, because of the growing adoption of data lakes, businesses will have to migrate their existing data warehouses to them, relying only on the promise of success without demonstrated economic benefit.
If there are latency problems or outages during the migration process, this can end up being expensive, time-consuming, and possibly risky.
According to some vendors who explicitly sell or market solutions as data lakehouses, business users must adopt highly specialized technologies. These may not always work with other tools connected to the data lake at the center of the system, which adds to the problem.
In addition, it can be difficult to provide 24/7 analytics while running business-critical workloads, which calls for infrastructure with cost-effective scalability.
Conclusion
The data lakehouse is the newest kind of data architecture to emerge in recent years. It combines various fields, such as information technology, open-source software, cloud computing, and distributed storage protocols.
It enables businesses to centrally store all kinds of data from any location, simplifying management and analysis. The data lakehouse is quite an intriguing concept.
Any firm would have a significant competitive edge if it had access to an all-in-one data platform that was as fast and efficient as a data warehouse while also being as flexible as a data lake.
The idea is still developing and remains new. As a result, it may take some time to determine whether it will become widespread.
We should all be curious about where the lakehouse architecture is headed.