Hive is a widely used Big Data analytics tool in business, and it is a good place to start if you are new to Big Data. This Apache Hive tutorial walks through the essentials of Apache Hive: why Hive is needed, its features, and everything else you should know.
Let's begin by understanding the Hadoop framework on which Apache Hive is built.
Apache Hadoop
Apache Hadoop is a free, open-source platform for storing and processing large datasets ranging in size from gigabytes to petabytes. Hadoop lets you cluster many computers to analyze massive datasets in parallel, rather than requiring one large computer to store and analyze the data.
MapReduce and the Hadoop Distributed File System are two of its components:
- MapReduce - MapReduce is a parallel programming model for processing large volumes of structured, semi-structured, and unstructured data on clusters of commodity hardware.
- HDFS - HDFS (Hadoop Distributed File System) is the component of the Hadoop framework that stores the data Hadoop processes. It is a fault-tolerant file system that runs on commodity hardware.
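To make the map and reduce phases more concrete, here is a minimal single-machine sketch built from ordinary shell tools. It is only an analogy and does not use Hadoop: the sample text and the variable names INPUT and COUNTS are made up for illustration.

```shell
# Sample input standing in for files spread across a cluster.
INPUT='hive hadoop hive
hdfs hive hadoop'

# map:     tr splits every line into one word (the key) per line
# shuffle: sort places identical keys next to each other
# reduce:  uniq -c counts each key; awk prints the result as key=count
COUNTS=$(printf '%s\n' "$INPUT" | tr -s ' ' '\n' | sort | uniq -c | awk '{print $2 "=" $1}')

printf '%s\n' "$COUNTS"
# prints:
# hadoop=2
# hdfs=1
# hive=3
```

On a real cluster the same three phases run on many machines at once, with HDFS supplying the input blocks.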
Various sub-projects (tools) in the Hadoop ecosystem, including Sqoop, Pig, and Hive, are used to support the Hadoop modules.
- Hive - Hive is a tool for writing SQL-style queries that run as MapReduce computations.
- Pig - Pig is a procedural language that can be used to script MapReduce workflows.
- Sqoop - Sqoop is a tool for importing and exporting data between HDFS and an RDBMS.
What is Apache Hive?
Apache Hive is an open-source data warehouse system for reading, writing, and managing large datasets stored directly in the Apache Hadoop Distributed File System (HDFS) or in other data storage systems such as Apache HBase.
SQL developers can use Hive to write Hive Query Language (HQL) statements for data query and analysis that are comparable to standard SQL statements. It was designed to make MapReduce programming easier by removing the need to learn and write lengthy Java code. Instead, you write your queries in HQL, and Hive builds the map and reduce jobs for you.
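As a rough illustration of how close HQL is to standard SQL, the sketch below passes one aggregate query to the Hive command line. The table page_views and its columns are hypothetical, and the guard only prints the statement when no hive executable is on the PATH.

```shell
# Hypothetical table and columns; the statement itself is ordinary HiveQL.
HQL='SELECT url, COUNT(*) AS hits FROM page_views GROUP BY url ORDER BY hits DESC LIMIT 10;'

if command -v hive >/dev/null 2>&1; then
  # -e runs the quoted statement and then exits
  hive -e "$HQL"
else
  echo "hive CLI not found; statement not run: $HQL"
fi
```

Nothing in the query mentions mappers or reducers; Hive derives those from the GROUP BY and ORDER BY clauses.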
The SQL-like interface of Apache Hive has become the gold standard for ad-hoc querying, summarizing, and analyzing Hadoop data. When deployed on cloud computing infrastructure, this solution is particularly cost-effective and scalable, which is why many firms, including Netflix and Amazon, continue to develop and improve Apache Hive.
History
During their time at Facebook, Joydeep Sen Sarma and Ashish Thusoo co-created Apache Hive. Both realized that to get the most out of Hadoop, they would have to write some fairly complex Java MapReduce jobs. They also realized they could not teach their fast-growing engineering and analyst teams the skills needed to leverage Hadoop across the company. Engineers and analysts commonly used SQL as their interface.
While SQL could meet most analytics needs, the creators also wanted to retain Hadoop's programmability. Apache Hive grew out of these two goals: a SQL-based declarative language that also let developers plug in their own scripts and programs when SQL was not enough.
It was also designed to hold centralized (Hadoop-based) metadata about all the datasets in the company, to make building data-driven organizations easier.
How does Apache Hive work?
In short, Apache Hive translates an input program written in the HiveQL (SQL-like) language into one or more Java MapReduce, Tez, or Spark jobs. (All of these are execution engines compatible with Hadoop YARN.) Apache Hive then organizes the data into tables for the Hadoop Distributed File System (HDFS) and runs the jobs on a cluster to produce an answer.
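One way to see this translation is to ask Hive for its plan instead of running the query. The sketch below is a minimal example under assumptions: page_views is a hypothetical table, and a working Hive installation is needed for real output. SET with only a property name prints the current value of hive.execution.engine, and EXPLAIN prints the stages Hive would run.

```shell
# Two HiveQL statements: show the configured engine, then the plan for a query.
# page_views is a hypothetical table used only for illustration.
HQL='SET hive.execution.engine;
EXPLAIN SELECT url, COUNT(*) FROM page_views GROUP BY url;'

if command -v hive >/dev/null 2>&1; then
  hive -e "$HQL"
else
  echo "hive CLI not found; statements not run:"
  printf '%s\n' "$HQL"
fi
```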
Data
Apache Hive tables are organized the same way as tables in a relational database, with data units ordered from largest to smallest. Databases are made up of tables, which are divided into partitions, which are in turn divided into buckets. HiveQL (Hive Query Language) is used to access the data, which can be modified or appended. Table data is serialized within each database, and each table has its own HDFS directory.
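The database, table, partition, bucket hierarchy can be sketched with a single table definition. The table name, column names, and bucket count below are illustrative assumptions, not values from this tutorial.

```shell
# Hypothetical table: one partition per day (dt), rows hashed on user_id into 8 buckets.
HQL='CREATE TABLE page_views (user_id BIGINT, url STRING)
PARTITIONED BY (dt STRING)
CLUSTERED BY (user_id) INTO 8 BUCKETS;'

if command -v hive >/dev/null 2>&1; then
  hive -e "$HQL"
else
  echo "hive CLI not found; statement not run:"
  printf '%s\n' "$HQL"
fi
```

With a layout like this, each partition becomes a subdirectory such as dt=2024-01-01 under the table's HDFS directory, and each bucket is a file inside that subdirectory.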
Architecture
We now turn to the most important part, the Hive architecture. The components of Apache Hive are:
Metastore - Stores information about each table, such as its schema and location. Partition metadata is also kept in the metastore. This lets the driver track the progress of the various datasets distributed across the cluster. The data is stored in a traditional RDBMS format. Hive metadata is critical for the driver to keep track of the data, so a backup server regularly replicates it so that it can be recovered if data is lost.
Driver - HiveQL statements are received by the driver, which acts as a controller. The driver starts execution of a statement by creating a session, and it monitors the life cycle and progress of the execution. During execution of a HiveQL statement, the driver stores the necessary metadata. It also serves as the collection point for data or query results after the Reduce operation.
Compiler - Compiles the HiveQL query into an execution plan. The plan lists the tasks, along with the steps MapReduce must take to produce the output the query asks for. Hive's compiler first converts the query into an Abstract Syntax Tree (AST). After checking for compatibility and compile-time errors, it converts the AST into a Directed Acyclic Graph (DAG).
Optimizer - Optimizes the DAG by applying various transformations to the execution plan. It combines transformations to improve efficiency, such as converting a pipeline of joins into a single join. To improve speed, the optimizer can also split tasks, such as applying a transformation to the data before a reduce operation.
Executor - The executor runs the tasks once compilation and optimization are complete. It pipelines the tasks.
CLI, UI, and Thrift Server - The command-line interface (CLI) is a user interface that lets an external user interact with Hive. Hive's Thrift server, similar to the JDBC or ODBC protocols, lets external clients interact with Hive over a network.
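A network client connection might look like the following sketch. It goes beyond the text above in several assumed ways: it uses Beeline, the JDBC command-line client shipped with Hive, and it assumes a HiveServer2 instance listening on localhost at port 10000 with a database named default.

```shell
# Assumed connection details; adjust host, port, and database for your cluster.
JDBC_URL="jdbc:hive2://localhost:10000/default"

if command -v beeline >/dev/null 2>&1; then
  # -u gives the JDBC URL, -e runs one statement and exits
  beeline -u "$JDBC_URL" -e 'SHOW TABLES;'
else
  echo "beeline not found; would connect to $JDBC_URL"
fi
```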
Security
Apache Hive is integrated with Hadoop security, which uses Kerberos for mutual client-server authentication. HDFS dictates the permissions for newly created files in Apache Hive, which lets you grant access by user, group, and others.
Key features
- Hive supports external tables, which let you process data without storing it in HDFS.
- It also supports partitioning data at the table level to improve speed.
- Apache Hive fits Hadoop's low-level interface requirements well.
- Hive makes data summarization, querying, and analysis easier.
- HiveQL does not require any programming expertise; a basic understanding of SQL queries is enough.
- Hive can also be used for ad-hoc queries for data analysis.
- It is scalable, familiar, and extensible.
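The external-table feature from the list above can be sketched as follows. The table name, column, and storage path are hypothetical; LOCATION simply points Hive at wherever the files already live.

```shell
# Hypothetical external table over files that already exist at the given path.
HQL="CREATE EXTERNAL TABLE raw_logs (line STRING)
LOCATION '/data/raw_logs';"

if command -v hive >/dev/null 2>&1; then
  hive -e "$HQL"
else
  echo "hive CLI not found; statement not run:"
  printf '%s\n' "$HQL"
fi
```

Dropping an external table removes only Hive's metadata; the underlying files are left in place.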
Benefits
Apache Hive enables end-of-day reports, daily transaction reviews, ad-hoc queries, and data analysis. The comprehensive insights it provides give significant competitive advantages and make it easier for you to respond to market demands.
Here are some of the benefits of having such insights readily available:
- Ease of use - With its SQL-like language, querying data is easy to learn.
- Accelerated data insertion - Because Apache Hive applies the schema when data is read (schema on read) rather than verifying the table type or schema definition on load, data does not have to be read, parsed, and serialized to disk in the database's internal format. In a traditional database, by contrast, data must be validated every time it is added.
- Superior scalability, flexibility, and cost-efficiency - Because the data is stored in HDFS, Apache Hive can handle hundreds of petabytes of data, making it a far more scalable option than a traditional database. As a cloud-based Hadoop service, Apache Hive lets users quickly spin virtual servers up or down to meet changing workloads.
- Increased workload - Large datasets can handle up to a hundred queries per hour.
Limitations
- In general, Apache Hive queries have very high latency.
- Subquery support is limited.
- Real-time queries and row-level changes are not available in Apache Hive.
- Older Hive releases have no support for materialized views.
- Update and delete operations are not supported on ordinary (non-transactional) Hive tables.
- It is not designed for OLTP (online transaction processing).
Getting started with Apache Hive
Apache Hive is a strong Hadoop companion that simplifies and streamlines your workflows. To get the most out of Apache Hive, seamless integration is essential. The first step is to go to the website.
1. Installing Hive from a Stable Release
Start by downloading the most recent stable release of Hive from one of the Apache download mirrors (see Hive Releases). The tarball must then be unpacked. This creates a subdirectory named hive-x.y.z (where x.y.z is the release number).
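The unpacking step can be sketched as below. The version string is a placeholder that mirrors the hive-x.y.z naming used above; the actual archive name depends on the release you downloaded and may differ.

```shell
# Placeholder version; replace x.y.z with the release number you downloaded.
HIVE_VERSION="x.y.z"
TARBALL="hive-${HIVE_VERSION}.tar.gz"

if [ -f "$TARBALL" ]; then
  # -x extract, -z decompress gzip, -v list files as they are written, -f archive name
  tar -xzvf "$TARBALL"
else
  echo "Download $TARBALL first, then run this again."
fi
```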
Set the environment variable HIVE_HOME to point to the installation directory.
Finally, add $HIVE_HOME/bin to your PATH.
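Both environment settings fit in a few lines. This sketch assumes the release was unpacked into a hive-x.y.z directory under the current working directory; point HIVE_HOME elsewhere if yours lives in a different place.

```shell
# Assumption: the unpacked release is ./hive-x.y.z relative to the current directory.
HIVE_HOME="$(pwd)/hive-x.y.z"
export HIVE_HOME

# Put Hive's launcher scripts ahead of the existing PATH entries.
export PATH="$HIVE_HOME/bin:$PATH"

echo "HIVE_HOME=$HIVE_HOME"
```

These exports last only for the current shell session; adding them to your shell profile makes them permanent.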
2. Running Hive
Hive uses Hadoop, so:
- you must have Hadoop in your path OR
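The alternative after "OR" is missing from the list above. The sketch below checks the first condition and, as an assumption rather than something stated in this tutorial, suggests setting the HADOOP_HOME environment variable as the usual fallback when the hadoop command is not on the PATH.

```shell
# Check whether the hadoop command is reachable on PATH.
if command -v hadoop >/dev/null 2>&1; then
  HADOOP_STATUS="found at $(command -v hadoop)"
else
  # Assumed fallback: HADOOP_HOME pointing at your Hadoop installation directory.
  HADOOP_STATUS="not on PATH; set HADOOP_HOME to your Hadoop installation directory"
fi

echo "hadoop: $HADOOP_STATUS"
```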
3. DDL Operations
Creating a Hive Table
The first DDL example creates a table called pokes with two columns, the first an integer and the second a string.
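A statement matching that description might look like the sketch below. The table name pokes comes from the text; the column names foo and bar are assumptions chosen for illustration.

```shell
# pokes with an integer column and a string column (column names are assumed).
HQL='CREATE TABLE pokes (foo INT, bar STRING);'

if command -v hive >/dev/null 2>&1; then
  hive -e "$HQL"
else
  echo "hive CLI not found; statement not run: $HQL"
fi
```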
Browsing through Tables
The SHOW TABLES statement lists all of the tables.
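Browsing can be sketched with two statements: SHOW TABLES for the table list and DESCRIBE for the columns of one table. The table pokes is assumed to exist already.

```shell
# List every table in the current database, then show the columns of pokes.
HQL='SHOW TABLES;
DESCRIBE pokes;'

if command -v hive >/dev/null 2>&1; then
  hive -e "$HQL"
else
  echo "hive CLI not found; statements not run:"
  printf '%s\n' "$HQL"
fi
```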
Altering and Dropping Tables
Table names can be changed, and columns can be added or replaced.
Note that REPLACE COLUMNS replaces all existing columns and changes only the table's schema, not the data. The table must use a native SerDe. REPLACE COLUMNS can also be used to drop columns from the table's schema.
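The three kinds of change can be sketched as follows. The names are assumptions: pokes is taken to have columns foo and bar, new_col is an invented column, and events and events_archive are hypothetical tables used only to show a rename.

```shell
# 1. ADD COLUMNS appends new_col to the schema of pokes.
# 2. REPLACE COLUMNS rewrites the column list; leaving new_col out drops it from the schema only.
# 3. RENAME TO changes the name of a (hypothetical) events table.
HQL='ALTER TABLE pokes ADD COLUMNS (new_col INT);
ALTER TABLE pokes REPLACE COLUMNS (foo INT, bar STRING);
ALTER TABLE events RENAME TO events_archive;'

if command -v hive >/dev/null 2>&1; then
  hive -e "$HQL"
else
  echo "hive CLI not found; statements not run:"
  printf '%s\n' "$HQL"
fi
```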
Dropping Tables
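Dropping a table takes a single statement. The sketch assumes the pokes table from the earlier step; for a regular (non-external) table this removes both the metadata and the data.

```shell
# Remove the pokes table created earlier.
HQL='DROP TABLE pokes;'

if command -v hive >/dev/null 2>&1; then
  hive -e "$HQL"
else
  echo "hive CLI not found; statement not run: $HQL"
fi
```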
Apache Hive has many more operations and features, which you can learn about on the official website.
Conclusion
Hive is a data warehouse interface for querying and analyzing large datasets, built on top of Apache Hadoop. Professionals choose it over other programs, tools, and software because it is designed primarily for large volumes of data and is easy to use.
I hope this tutorial helps you get started with Apache Hive and makes your workflows more efficient. Let us know in the comments.