Hive is one of the most widely used Big Data analytics tools in industry, and it is a good place to start if you are new to Big Data. This Apache Hive tutorial covers the basics of Apache Hive, why Hive is needed, its features, and everything else you should know.
Let's first look at the Hadoop framework on which Apache Hive is built.
Apache Hadoop
Apache Hadoop is a free, open-source platform for storing and processing large datasets that range in size from gigabytes to petabytes. Instead of requiring one very large computer to store and analyze the data, Hadoop lets you cluster many computers and analyze large datasets in parallel.
Its two core components are MapReduce and the Hadoop Distributed File System:
- MapReduce - MapReduce is a parallel programming model for processing large volumes of structured, semi-structured, and unstructured data on clusters of commodity hardware.
- HDFS - HDFS (Hadoop Distributed File System) is the part of the Hadoop framework that stores the data. It is a fault-tolerant file system that runs on commodity hardware.
Several sub-projects (tools) in the Hadoop ecosystem, including Sqoop, Pig, and Hive, complement the core Hadoop modules.
- Hive - Hive is a framework for writing SQL-style scripts that perform MapReduce computations.
- Pig - Pig is a procedural programming language that can be used to script MapReduce processes.
- Sqoop - Sqoop is a tool for importing and exporting data between HDFS and an RDBMS.
What Is Apache Hive?
Apache Hive is an open-source data warehouse system for reading, writing, and managing large datasets stored directly in the Apache Hadoop Distributed File System (HDFS) or in other data storage systems such as Apache HBase.
SQL developers can use Hive to write Hive Query Language (HQL) statements for data query and analysis that closely resemble standard SQL statements. It was built to make MapReduce programming easier by removing the need to learn and write lengthy Java code. Instead, you write your queries in HQL, and Hive builds the map and reduce jobs for you.
Apache Hive's SQL-like interface has become the gold standard for ad-hoc searches, summarization, and analysis of Hadoop data. When deployed on cloud computing networks, the solution is especially cost-effective and scalable, which is why many firms, including Netflix and Amazon, continue to develop and improve Apache Hive.
History
Joydeep Sen Sarma and Ashish Thusoo co-created Apache Hive while they were at Facebook. Both recognized that getting the most out of Hadoop would mean writing fairly complex Java MapReduce jobs. They also realized they could not train their rapidly growing engineering and analyst teams in the skills needed to use Hadoop across the company. Engineers and analysts already used SQL widely as their interface to data.
Although SQL could meet most analytics needs, the developers also wanted to retain Hadoop's programmability. Apache Hive grew out of these two goals: a SQL-based declarative language that also lets developers bring their own scripts and programs when SQL is not enough.
It was also designed to hold centralized (Hadoop-based) metadata about all of the company's datasets, to make it easier to build data-driven organizations.
How Does Apache Hive Work?
In short, Apache Hive translates an input program written in the SQL-like HiveQL language into one or more Java MapReduce, Tez, or Spark jobs. (All of these execution engines are compatible with Hadoop YARN.) Apache Hive then organizes the data into tables for the Hadoop Distributed File System (HDFS) and runs the jobs on a cluster to produce the answer.
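One way to look at this translation is HiveQL's EXPLAIN statement, which prints the plan of stages Hive would run for a query instead of running it. The following is only a sketch: the table page_views and its url column are hypothetical, the file name is arbitrary, and the script calls Hive only if a hive executable is found on the PATH.

```shell
# Write a HiveQL script that asks Hive for the execution plan of a query.
# The table page_views and its url column are hypothetical examples.
cat > explain_example.hql <<'EOF'
EXPLAIN
SELECT url, COUNT(*) AS hits
FROM page_views
GROUP BY url;
EOF

# Run the script only when the hive CLI is installed; otherwise just report.
if command -v hive >/dev/null 2>&1; then
  hive -f explain_example.hql
else
  echo "hive not found on PATH; wrote explain_example.hql"
fi
```

The exact shape of the printed plan depends on the Hive version and on which execution engine is configured.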
Data
Apache Hive tables are organized much like tables in a relational database, with data units arranged from larger to smaller. Databases are made up of tables, which are divided into partitions, which are in turn divided into buckets. The data is accessed through HiveQL (Hive Query Language) and can be overwritten or appended. Within each database, table data is serialized, and each table has its own HDFS directory.
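The table, partition, and bucket hierarchy described above can be sketched with a single table definition. Everything specific here is an assumption made for illustration: the table name page_views, its columns, the partition column view_date, and the bucket count of 32.

```shell
# Write a HiveQL script that declares a partitioned, bucketed table.
cat > page_views.hql <<'EOF'
-- Hypothetical table: one partition per view_date value, and within each
-- partition the rows are hashed on user_id into 32 buckets.
CREATE TABLE page_views (
  user_id BIGINT,
  url     STRING
)
PARTITIONED BY (view_date STRING)
CLUSTERED BY (user_id) INTO 32 BUCKETS;
EOF

# Run the script only when the hive CLI is installed; otherwise just report.
if command -v hive >/dev/null 2>&1; then
  hive -f page_views.hql
else
  echo "hive not found on PATH; wrote page_views.hql"
fi
```

Partitioning on a column that queries filter by often lets Hive skip whole partitions, while bucketing spreads the rows of each partition across a fixed number of files.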
Architecture
Now let's turn to the most important part of Hive: its architecture. The components of Apache Hive are as follows:
Metastore - The metastore keeps track of information about each table, such as its schema and location. Hive also stores partition metadata here, which lets the driver track the progress of the various datasets distributed across the cluster. The metadata is stored in a conventional RDBMS format. Because the driver depends on this metadata to keep track of the data, a backup server regularly replicates it so that it can be recovered if data is lost.
Driver - The driver receives HiveQL statements and acts as a controller. It starts executing a statement by creating a session, and it monitors the life cycle and progress of the execution. While a HiveQL statement is being executed, the driver stores the metadata that is needed. It also serves as the collection point for the data or query results produced after the Reduce operation.
Compiler - The compiler compiles HiveQL queries. It first converts the query into an Abstract Syntax Tree (AST). After checking for compatibility and compile-time errors, it converts the AST into a Directed Acyclic Graph (DAG). The result is an execution plan that contains the tasks and the steps MapReduce must take to produce the output the query describes.
Optimizer - The optimizer applies various transformations to the execution plan to produce an optimized DAG. For better performance it combines transformations, such as converting a pipeline of joins into a single join. To improve speed, it may also split tasks, for example by applying a transformation to the data before a reduce operation.
Executor - Once compilation and optimization are complete, the executor runs the tasks. It is also responsible for pipelining them.
CLI, UI, and Thrift Server - The command-line interface (CLI) is a user interface that lets an external user interact with Hive. Hive's Thrift server, much like the JDBC or ODBC protocols, lets external clients communicate with Hive over a network.
Security
Apache Hive is integrated with Hadoop security, which uses Kerberos for mutual authentication between client and server. HDFS dictates the permissions of files newly created by Apache Hive, which lets you authorize by user, group, and others.
Key Features
- Hive supports external tables, which let you query data where it already lives, without moving it into Hive-managed storage.
- It also supports partitioning data at the table level to improve speed.
- Apache Hive fits Hadoop's low-level interface requirements well.
- Hive makes data summarization, querying, and analysis easier.
- HiveQL does not require programming skills; a basic understanding of SQL queries is enough.
- Hive can also be used to run ad-hoc queries for data analysis.
- It is scalable, familiar, and extensible.
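As an illustration of the external-table feature listed above, here is a sketch of a definition that points Hive at files that already exist. The table name web_logs, its columns, the comma delimiter, and the /data/web_logs location are all hypothetical.

```shell
# Write a HiveQL script that declares an external table over existing files.
cat > web_logs.hql <<'EOF'
-- Hypothetical external table over comma-delimited text files.
-- Hive records only the schema and the location of the data.
CREATE EXTERNAL TABLE web_logs (
  ip  STRING,
  url STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/web_logs';
EOF

# Run the script only when the hive CLI is installed; otherwise just report.
if command -v hive >/dev/null 2>&1; then
  hive -f web_logs.hql
else
  echo "hive not found on PATH; wrote web_logs.hql"
fi
```

Dropping an external table removes only its metadata; the files under the given location are left in place.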
Benefits
Apache Hive supports end-of-day reports, daily transaction reviews, ad-hoc searches, and data analysis. The comprehensive insights Apache Hive provides offer significant competitive advantages and make it easier for you to respond to market demands.
Here are some of the benefits of having such insights readily available:
- Ease of use - With its SQL-like language, querying data is easy to understand.
- Accelerated data insertion - Because Apache Hive reads the schema without verifying the table type or schema definition, data does not have to be read, parsed, and serialized to disk in the database's internal format. In a conventional database, by contrast, data must be validated every time it is added.
- High scalability, flexibility, and cost-efficiency - Because the data is stored in HDFS, Apache Hive can hold hundreds of petabytes of data, making it a far more scalable option than a conventional database. As a cloud-based Hadoop service, Apache Hive also lets customers quickly spin virtual servers up and down to meet changing workloads.
- Extensive working capacity - On large datasets, Hive can handle up to 100,000 queries per hour.
Limitations
- Apache Hive queries generally have very high latency.
- Subquery support is limited.
- Real-time queries and row-level changes are not available in Apache Hive.
- Older Hive releases do not support materialized views.
- UPDATE and DELETE operations are not supported on ordinary (non-transactional) Hive tables.
- It is not intended for OLTP (online transaction processing).
Getting Started with Apache Hive
Apache Hive is a strong companion to Hadoop that simplifies and streamlines your workflows. To get the most out of Apache Hive, seamless integration is essential. The first step is to visit the project website.
1. Installing Hive from a Stable Release
Start by downloading the most recent stable release of Hive from one of the Apache download mirrors (see Hive Releases). Next, unpack the tarball. This creates a subdirectory named hive-x.y.z (where x.y.z is the release number).
Set the HIVE_HOME environment variable to point to the installation directory.
Finally, add $HIVE_HOME/bin to your PATH.
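Taken together, the three installation steps might look like the sketch below. The release number is left as the placeholder x.y.z, exactly as in the text, and the tarball name hive-x.y.z.tar.gz is an assumption. Recent releases are typically packaged under names of the form apache-hive-x.y.z-bin, so adjust both names to match the file you actually downloaded.

```shell
# Placeholder release number, as in the text; substitute the real one.
HIVE_RELEASE="x.y.z"
HIVE_TARBALL="hive-${HIVE_RELEASE}.tar.gz"   # assumed tarball name

# Unpack only if the tarball has actually been downloaded into this directory.
if [ -f "$HIVE_TARBALL" ]; then
  tar -xzvf "$HIVE_TARBALL"
fi

# Point HIVE_HOME at the unpacked directory and put its bin/ on the PATH.
export HIVE_HOME="$PWD/hive-${HIVE_RELEASE}"
export PATH="$HIVE_HOME/bin:$PATH"
```

To make the settings permanent, the two export lines can be added to your shell's startup file.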
2. Running Hive
Hive uses Hadoop, so:
- you must either have Hadoop on your PATH or tell Hive where Hadoop is installed (conventionally through the HADOOP_HOME environment variable).
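A quick way to check which of these conditions holds in your current shell is sketched below. It only inspects the environment and does not start Hive; HADOOP_STATUS is simply a variable name chosen for this example.

```shell
# Report how Hive would be able to locate Hadoop in this shell.
if command -v hadoop >/dev/null 2>&1; then
  HADOOP_STATUS="hadoop found on PATH at $(command -v hadoop)"
elif [ -n "${HADOOP_HOME:-}" ]; then
  HADOOP_STATUS="using HADOOP_HOME=$HADOOP_HOME"
else
  HADOOP_STATUS="hadoop not found: add it to PATH or export HADOOP_HOME"
fi
echo "$HADOOP_STATUS"
```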
3. DDL Operations
Creating a Hive Table
As a first example, consider a table named pokes with two columns, the first an integer and the second a string.
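A statement that creates such a table might look like the sketch below. The column names foo and bar are illustrative choices; the text fixes only the table name and the two column types. The script file name is arbitrary, and Hive is called only if it is installed.

```shell
# Write the CREATE TABLE statement to a HiveQL script file.
cat > create_pokes.hql <<'EOF'
-- Column names foo and bar are illustrative; only the types come from the text.
CREATE TABLE pokes (foo INT, bar STRING);
EOF

# Run the script only when the hive CLI is installed; otherwise just report.
if command -v hive >/dev/null 2>&1; then
  hive -f create_pokes.hql
else
  echo "hive not found on PATH; wrote create_pokes.hql"
fi
```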
Browsing Through Tables
The SHOW TABLES statement lists all tables.
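A minimal sketch of browsing follows. Besides listing every table, it shows two related statements: SHOW TABLES with a pattern, and DESCRIBE to list one table's columns. It assumes a table named pokes exists.

```shell
# Write a few browsing statements to a HiveQL script file.
cat > browse_tables.hql <<'EOF'
-- List every table in the current database.
SHOW TABLES;
-- List only the tables whose names end in 's'.
SHOW TABLES '.*s';
-- Show the columns of one table (assumes pokes exists).
DESCRIBE pokes;
EOF

# Run the script only when the hive CLI is installed; otherwise just report.
if command -v hive >/dev/null 2>&1; then
  hive -f browse_tables.hql
else
  echo "hive not found on PATH; wrote browse_tables.hql"
fi
```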
Altering and Dropping Tables
Tables can be renamed, and columns can be added or replaced.
Note that REPLACE COLUMNS replaces all existing columns and changes only the table's schema, not its data. The table must use a native SerDe. REPLACE COLUMNS can also be used to drop columns from the table's schema.
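The sketch below shows one statement for each of the alterations just described. The table events and all new names (events_archive, new_col, baz) are hypothetical; pokes is assumed to have the columns foo INT and bar STRING.

```shell
# Write the ALTER TABLE examples to a HiveQL script file.
cat > alter_tables.hql <<'EOF'
-- Rename a (hypothetical) table.
ALTER TABLE events RENAME TO events_archive;
-- Add a column to pokes.
ALTER TABLE pokes ADD COLUMNS (new_col INT COMMENT 'added later');
-- Replace the full column list; here baz takes the place of new_col.
ALTER TABLE pokes REPLACE COLUMNS (foo INT, bar STRING, baz INT COMMENT 'replaces new_col');
-- Drop columns from the schema by listing only the ones to keep.
ALTER TABLE pokes REPLACE COLUMNS (foo INT COMMENT 'only the first column is kept');
EOF

# Run the script only when the hive CLI is installed; otherwise just report.
if command -v hive >/dev/null 2>&1; then
  hive -f alter_tables.hql
else
  echo "hive not found on PATH; wrote alter_tables.hql"
fi
```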
Dropping Tables
A table is removed with the DROP TABLE statement.
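For instance, the pokes table used in the earlier examples could be dropped as sketched here. Treat this with care: for a managed (non-external) table, dropping removes the table's data as well as its metadata.

```shell
# Write the DROP TABLE statement to a HiveQL script file.
cat > drop_pokes.hql <<'EOF'
-- Removes the table definition; for a managed table the data goes too.
DROP TABLE pokes;
EOF

# Run the script only when the hive CLI is installed; otherwise just report.
if command -v hive >/dev/null 2>&1; then
  hive -f drop_pokes.hql
else
  echo "hive not found on PATH; wrote drop_pokes.hql"
fi
```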
Apache Hive has many more operations and features, which you can learn about on the official website.
Conclusion
Hive is a data warehouse system, built on top of Apache Hadoop, for querying and analyzing large datasets. Professionals choose it over other programs, tools, and software because it is designed primarily for very large data and is easy to use.
We hope this tutorial helps you get started with Apache Hive and makes your workflows more efficient. Let us know in the comments.