Hive is a widely used tool for Big Data analytics in industry, and it is a good place to start if you are new to Big Data. This Apache Hive tutorial walks through the basics of Apache Hive, why Hive is needed, its features, and everything else you need to know.
Let's first understand the Hadoop framework on which Apache Hive is built.
Apache Hadoop
Apache Hadoop is a free, open-source platform for storing and processing large datasets ranging in size from gigabytes to petabytes. Instead of requiring one large computer to store and analyze the data, Hadoop lets you cluster many computers so that they analyze massive datasets in parallel.
MapReduce and the Hadoop Distributed File System are two of its components:
- MapReduce - MapReduce is a parallel programming technique for handling large volumes of structured, semi-structured, and unstructured data on clusters of commodity hardware.
- HDFS - HDFS (Hadoop Distributed File System) is the part of the Hadoop framework that stores the data Hadoop processes. It is a fault-tolerant file system that runs on commodity hardware.
Sub-projects (tools) in the Hadoop ecosystem, including Sqoop, Pig, and Hive, are used to support the Hadoop modules.
- Hive - Hive is a framework for writing SQL-style scripts that perform MapReduce computations.
- Pig - Pig is a procedural programming language that can be used to script MapReduce processes.
- Sqoop - Sqoop is a tool for importing and exporting data between HDFS and an RDBMS.
What is Apache Hive?
Apache Hive is an open-source data warehouse system for reading, writing, and managing large datasets stored directly in the Apache Hadoop Distributed File System (HDFS) or in other data storage systems such as Apache HBase.
SQL developers can use Hive to write Hive Query Language (HQL) statements for data query and analysis that are comparable to regular SQL statements. It was designed to make MapReduce programming easier by removing the need to learn and write lengthy Java code. Instead, you write your queries in HQL, and Hive builds the map and reduce jobs for you.
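To give a feel for how close HQL is to ordinary SQL, here is a minimal sketch of an aggregation query. The table `sales` and its columns `region` and `amount` are hypothetical names chosen for illustration; they do not come from any real dataset.

```sql
-- Hypothetical table, assumed to exist: sales(region STRING, amount DOUBLE)
-- Hive turns this single statement into the map and reduce work for you.
SELECT region,
       COUNT(*)    AS orders,
       SUM(amount) AS total_amount
FROM sales
GROUP BY region
ORDER BY total_amount DESC
LIMIT 10;
```

Anyone who has written a GROUP BY in a relational database should be able to read this without knowing anything about MapReduce.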
The SQL-like interface of Apache Hive has become a widely used standard for ad-hoc querying, summarizing, and analyzing Hadoop data. When deployed on cloud computing networks, this solution is cost-effective and scalable, which is why many firms, including Netflix and Amazon, continue to develop and improve Apache Hive.
History
Joydeep Sen Sarma and Ashish Thusoo created Apache Hive during their time at Facebook. Both recognized that to get the most out of Hadoop, they would have to write fairly complex Java MapReduce jobs. They also realized that they could not teach their rapidly growing engineering and analyst teams the skills needed to use Hadoop across the company. Engineers and analysts, however, already used SQL as their everyday interface.
While SQL could meet most analytics needs, the developers also wanted to keep Hadoop's programmability. Apache Hive arose from these two goals: a SQL-based declarative language that also let developers plug in their own scripts and programs when SQL was not enough.
It was also developed to hold centralized, Hadoop-based metadata about all the datasets in the company, to make it easier to build a data-driven organization.
How does Apache Hive work?
In short, Apache Hive translates an input program written in the HiveQL (SQL-like) language into one or more Java MapReduce, Tez, or Spark jobs. (All of these execution engines are compatible with Hadoop YARN.) Apache Hive then organizes the data into tables for the Hadoop Distributed File System (HDFS) and runs the jobs on a cluster to produce an answer.
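As a rough illustration of that translation step, the sketch below selects an execution engine for the session and then runs an ordinary query. It assumes Tez is installed on the cluster, and the table `web_logs` with its column `status` is a hypothetical example.

```sql
-- Pick the execution engine for this session.
-- Assumption: Tez is available; 'mr' would select classic MapReduce instead.
SET hive.execution.engine=tez;

-- Hypothetical table, assumed to exist: web_logs(status INT, ...)
-- Hive compiles this statement into one or more jobs for the chosen engine.
SELECT status, COUNT(*) AS hits
FROM web_logs
GROUP BY status;
```

The query text stays the same whichever engine is selected; only the jobs Hive generates behind the scenes differ.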
Data
Apache Hive tables are organized much like tables in a relational database, with data units ordered from larger to smaller. Databases are made up of tables that are divided into partitions, which are further divided into buckets. HiveQL (Hive Query Language) is used to access the data, which can be overwritten or appended. Table data is serialized within each database, and each table has its own HDFS directory.
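The database, table, partition, bucket hierarchy described above could look like the following sketch. All names here (`page_views`, `staging_views`, the columns, and the bucket count of 8) are assumptions made up for the example.

```sql
-- Hypothetical table split into partitions by date,
-- with each partition further split into 8 buckets by user_id.
CREATE TABLE page_views (
  user_id BIGINT,
  url     STRING
)
PARTITIONED BY (view_date STRING)
CLUSTERED BY (user_id) INTO 8 BUCKETS;

-- Append rows to one partition
-- (staging_views is a hypothetical source table assumed to exist).
INSERT INTO TABLE page_views PARTITION (view_date = '2024-01-01')
SELECT user_id, url FROM staging_views;

-- Or overwrite the contents of that partition entirely.
INSERT OVERWRITE TABLE page_views PARTITION (view_date = '2024-01-01')
SELECT user_id, url FROM staging_views;
```

Each partition typically becomes a subdirectory (for example `view_date=2024-01-01`) under the table's own HDFS directory, which is what lets Hive skip irrelevant data when a query filters on the partition column.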
Architecture
Now we will look at the most important aspect of Hive: its architecture. The components of Apache Hive are as follows:
Metastore - Keeps track of information about each table, such as its structure and location. Partition metadata is also kept here, which lets the driver track the progress of the various datasets spread across the cluster. The metadata is stored in a standard RDBMS format and is essential for the driver to keep track of the data. A backup server replicates it regularly so that it can be recovered in case of data loss.
Driver - The driver receives HiveQL statements and acts as a controller. It starts the execution of a statement by creating a session, and it monitors the lifecycle and progress of the execution. While a HiveQL statement runs, the driver stores the metadata that is needed. It also serves as the collection point for data or query results after the Reduce operation.
Compiler - Compiles the HiveQL query by converting it into an execution plan. The plan lists the tasks, including the steps MapReduce must take to produce the result the query asks for. Hive's compiler first converts the query into an Abstract Syntax Tree (AST). After checking for compatibility and compile-time errors, it converts the AST into a Directed Acyclic Graph (DAG).
Optimizer - Optimizes the DAG by applying various transformations to the execution plan. It combines transformations for better efficiency, such as turning a pipeline of joins into a single join. To improve speed, the optimizer may also split tasks, for example by applying a transformation to the data before a reduce operation.
Executor - Once compilation and optimization are complete, the executor runs the tasks. It also takes care of pipelining them.
CLI, UI, and Thrift Server - The command-line interface (CLI) is a user interface that lets an external user interact with Hive. Hive's Thrift server, similar to the JDBC or ODBC protocols, lets external clients communicate with Hive over a network.
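Two standard HiveQL statements make some of these components visible from the CLI. The sketch below assumes a table named `pokes` with a string column `bar` already exists; both names are illustrative assumptions.

```sql
-- Metastore: show the stored structure and storage location of a table.
DESCRIBE FORMATTED pokes;

-- Compiler and optimizer: print the plan of stages for a query
-- instead of running it.
EXPLAIN
SELECT bar, COUNT(*) AS occurrences
FROM pokes
GROUP BY bar;
```

The exact shape of the EXPLAIN output depends on the Hive version and execution engine, so treat it as a way to explore the plan rather than as a fixed format.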
Security
Apache Hive is integrated with Hadoop security, which uses Kerberos for mutual client-server authentication. HDFS dictates the permissions for newly created files in Apache Hive, allowing you to authorize by user, group, and others.
Key Features
- Hive supports external tables, which let you process data where it already lives without moving it into Hive's own warehouse directory.
- It also supports partitioning data at the table level to increase speed.
- Apache Hive spares users from working directly with Hadoop's low-level interfaces.
- Hive makes data summarization, querying, and analysis easier.
- HiveQL does not require any programming skills; a basic understanding of SQL queries is sufficient.
- We can use Hive to run ad-hoc queries for data analysis.
- It is scalable, familiar, and extensible.
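As an illustration of the external-table feature listed above, here is a minimal sketch. The table name `raw_events`, its columns, the comma delimiter, and the path `/data/raw/events` are all hypothetical and would need to match your own files.

```sql
-- Hypothetical external table over comma-separated files that already exist
-- at the given location. Hive records only the metadata; the files stay put.
CREATE EXTERNAL TABLE IF NOT EXISTS raw_events (
  event_time STRING,
  user_id    BIGINT,
  action     STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/raw/events';
```

Dropping an external table removes its metadata from Hive but leaves the underlying files in place, which makes it a reasonable choice for data shared with other tools.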
Benefits
Apache Hive supports end-of-day reports, daily transaction reviews, ad-hoc searches, and data analysis. The broad insight it provides offers a significant competitive advantage and makes it easier to respond to market demands.
Here are some of the benefits of having such insight readily available:
- Ease of use - With its SQL-like language, querying data is easy to understand.
- Fast data insertion - Because Apache Hive applies the schema when data is read, without verifying the table type or schema definition at load time, data does not have to be read, parsed, and serialized to disk in the database's internal format. In a traditional database, by contrast, data must be validated every time it is added.
- Superior scalability, flexibility, and cost-efficiency - Because data is stored in HDFS, Apache Hive can hold hundreds of petabytes of data, making it far more scalable than a traditional database. When run as a cloud-based Hadoop service, Apache Hive lets users quickly spin virtual servers up or down to meet changing workloads.
- Extensive working capacity - Huge datasets can support up to 100,000 queries per hour.
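The fast-insertion point can be sketched as follows. The table `clicks`, its columns, the tab delimiter, and the HDFS path `/tmp/clicks.tsv` are assumptions invented for this example.

```sql
-- Hypothetical tab-delimited text table.
CREATE TABLE clicks (
  user_id BIGINT,
  url     STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- Loading moves the file into the table's directory;
-- its contents are not parsed or validated at this point.
LOAD DATA INPATH '/tmp/clicks.tsv' INTO TABLE clicks;

-- The schema is applied only when the data is read. For a text table like
-- this one, fields that do not fit the declared type come back as NULL.
SELECT user_id, url
FROM clicks
LIMIT 10;
```

The trade-off is that malformed input surfaces at query time rather than at load time, so it can be worth checking for unexpected NULLs after loading new files.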
Limitations
- In general, Apache Hive queries have very high latency.
- Subquery support is limited.
- Real-time queries are not available, and row-level changes are restricted.
- Materialized views are not supported in releases before Hive 3.0.
- UPDATE and DELETE operations are not supported on ordinary tables; since Hive 0.14 they are available only on transactional (ACID) tables.
- It is not intended for OLTP (online transaction processing).
Getting Started with Apache Hive
Apache Hive is a strong Hadoop companion that simplifies and streamlines your workflow. To get the most out of Apache Hive, seamless integration is essential. The first step is to visit the Apache Hive website.
1. Installing Hive from a Stable Release
Start by downloading the most recent stable release of Hive from one of the Apache download mirrors (see Hive Releases). Next, unpack the tarball, for example with tar -xzvf. This creates a subdirectory named hive-x.y.z (where x.y.z is the release number).
Set the environment variable HIVE_HOME to point to the installation directory.
Finally, add $HIVE_HOME/bin to your PATH.
2. Running Hive
Hive uses Hadoop, so:
- you must have Hadoop in your path, OR set the HADOOP_HOME environment variable to your Hadoop installation directory
3. DDL Operations
Creating Hive Tables
A simple CREATE TABLE statement can create a table called pokes with two columns, the first an integer and the second a string.
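A statement matching that description might look like the sketch below. The column names `foo` and `bar` are illustrative assumptions; only the table name and the two column types come from the text.

```sql
-- Create a table named pokes with an integer column and a string column.
-- Column names foo and bar are placeholders chosen for this example.
CREATE TABLE pokes (foo INT, bar STRING);
```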
Browsing through Tables
The SHOW TABLES statement lists all the tables.
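A few browsing statements, sketched under the assumption that a table named `pokes` already exists:

```sql
-- List every table in the current database.
SHOW TABLES;

-- List only the tables whose names match a pattern (here: names ending in 's').
SHOW TABLES '.*s';

-- Show the columns of one table.
DESCRIBE pokes;
```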
Altering and Dropping Tables
Table names can be changed, and columns can be added or replaced.
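The following sketch shows one statement for each of those changes. It assumes a hypothetical table `events` and a table `pokes` with columns `foo` and `bar` already exist; the new names `events_archive`, `new_col`, and `baz` are likewise made up for the example.

```sql
-- Rename a table.
ALTER TABLE events RENAME TO events_archive;

-- Add a column to the end of the existing column list.
ALTER TABLE pokes ADD COLUMNS (new_col INT COMMENT 'added after creation');

-- Replace the whole column list with a new one.
ALTER TABLE pokes REPLACE COLUMNS (
  foo INT,
  bar STRING,
  baz INT COMMENT 'baz replaces new_col'
);
```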
Note that REPLACE COLUMNS replaces all existing columns and changes only the table's schema, not the data. The table must use a native SerDe. REPLACE COLUMNS can also be used to drop columns from the table's schema.
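Dropping columns this way amounts to listing only the columns you want to keep. A minimal sketch, again assuming a table `pokes` whose first column is an integer named `foo`:

```sql
-- Keep only the first column; every other column is removed from the schema.
-- The underlying data files are left unchanged.
ALTER TABLE pokes REPLACE COLUMNS (foo INT COMMENT 'only keep the first column');
```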
Dropping Tables
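Dropping a table is a single statement. The sketch below assumes the `pokes` table from the earlier examples.

```sql
-- Remove the table; IF EXISTS avoids an error when the table is already gone.
DROP TABLE IF EXISTS pokes;
```

For a managed (non-external) table this also deletes the table's data, so it is worth double-checking the table name before running it.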
Apache Hive has additional operations and features that you can learn about by visiting the official website.
Conclusion
By definition, Hive is a data warehouse interface for querying and analyzing large datasets, built on top of Apache Hadoop. Professionals prefer it over other programs, tools, and software because it is designed for data-intensive work and is easy to use.
I hope this tutorial helps you get started with Apache Hive and makes your workflow more efficient. Let us know in the comments.