Hive is a tool widely used for Big Data analytics in business, and it is a good place to start if you are new to Big Data. This Apache Hive tutorial covers the basics of Apache Hive: why Hive matters, its features, and everything else you need to know.
Let's first understand the Hadoop architecture on which Apache Hive is built.
Apache Hadoop
Apache Hadoop is a free, open-source platform for storing and processing large datasets ranging from gigabytes to petabytes. Hadoop allows many computers to be clustered so that massive datasets can be analyzed in parallel, instead of requiring one large computer to store and analyze the data.
MapReduce and the Hadoop Distributed File System (HDFS) are two of its core components:
- MapReduce - MapReduce is a parallel programming model for processing large volumes of structured, semi-structured, and unstructured data on clusters of commodity hardware.
- HDFS - HDFS (Hadoop Distributed File System) is the part of Hadoop that stores the datasets to be processed. It is a fault-tolerant file system that runs on commodity hardware.
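As a rough intuition for the MapReduce model described above, the classic word count can be mimicked on a single machine with a shell pipeline. This is only an analogy and involves no Hadoop at all; the sample text and the `counts` variable are made up for illustration.

```shell
# Local stand-in for a MapReduce word count (illustrative only, no Hadoop).
input='hive pig sqoop
hive hadoop hive'

# Map:     split every line into one word per line (one key per record).
# Shuffle: sort so that identical keys end up next to each other.
# Reduce:  count each run of identical keys, then print "word<TAB>count".
counts=$(printf '%s\n' "$input" \
  | tr -s ' ' '\n' \
  | sort \
  | uniq -c \
  | awk '{ print $2 "\t" $1 }')

printf '%s\n' "$counts"
```

On a real cluster the map and reduce steps run in parallel on many machines, and the framework performs the sort-and-shuffle between them.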
Several sub-projects (tools) in the Hadoop ecosystem, including Sqoop, Pig, and Hive, are used to support the Hadoop modules.
- Hive - Hive is a framework for writing SQL-style queries that perform MapReduce operations.
- Pig - Pig is a procedural language that can be used to write scripts for MapReduce operations.
- Sqoop - Sqoop is a tool for importing and exporting data between HDFS and an RDBMS.
What Is Apache Hive?
Apache Hive is open-source data warehouse software for reading, writing, and managing large datasets stored directly in the Apache Hadoop Distributed File System (HDFS) or in other data storage systems such as Apache HBase.
SQL developers can use Hive to write Hive Query Language (HQL) statements for querying and analyzing data; these statements closely resemble standard SQL statements. Hive was designed to make MapReduce programming easier by removing the need to learn and write lengthy Java code. Instead, you write your queries in HQL, and Hive creates the map and reduce jobs for you.
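To give a feel for how close HQL is to ordinary SQL, here is a small sketch of a query saved to a script file and passed to the Hive CLI. The `page_views` table, its `url` and `dt` columns, the date, and the file name are all hypothetical; the Hive call is skipped on machines where the CLI is not installed.

```shell
# Hypothetical table and columns, used only to show what HiveQL looks like.
cat > top_pages.hql <<'EOF'
SELECT url, COUNT(*) AS views
FROM page_views
WHERE dt = '2024-01-01'
GROUP BY url
ORDER BY views DESC
LIMIT 10;
EOF

# Run the script only where the Hive CLI is actually installed.
if command -v hive >/dev/null 2>&1; then
  hive -f top_pages.hql
fi
```

Nothing in the query mentions mappers or reducers; Hive derives those jobs from the statement.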
Apache Hive's SQL-like interface has become the gold standard for ad-hoc querying, summarizing, and analyzing Hadoop data. When deployed on cloud computing networks, the solution is particularly cost-effective and scalable, which is why many companies, including Netflix and Amazon, continue to develop and improve Apache Hive.
History
During their time at Facebook, Joydeep Sen Sarma and Ashish Thusoo co-created Apache Hive. Both realized that to get the most out of Hadoop, they would have to write quite complex Java MapReduce jobs. They recognized that they would not be able to teach their fast-growing engineering and analyst teams the skills needed to use Hadoop across the company. Engineers and analysts, however, already used SQL widely as an interface.
Although SQL could meet most analytics needs, the developers also wanted to bring in Hadoop's programmability. Apache Hive grew out of these two goals: a SQL-based declarative language that also let engineers plug in their own scripts and programs when SQL was not enough.
It was also designed to hold centralized (Hadoop-based) metadata about all the datasets in the company, to make it easier to build data-driven organizations.
How Does Apache Hive Work?
In short, Apache Hive translates an input program written in the HiveQL (SQL-like) language into one or more Java MapReduce, Tez, or Spark jobs. (All of these execution engines are compatible with Hadoop YARN.) Apache Hive then organizes the data into tables for the Hadoop Distributed File System (HDFS) and runs the jobs on a cluster to produce an answer.
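The choice of engine can be made per session through the `hive.execution.engine` setting. The sketch below assumes Tez is installed on the cluster and reuses a hypothetical `page_views` table; the file name is illustrative as well.

```shell
# Write a short HiveQL script that selects the execution engine first.
cat > engine_demo.hql <<'EOF'
-- Pick the engine that runs the generated jobs: mr, tez, or spark.
SET hive.execution.engine=tez;
-- Hypothetical table; Hive turns this statement into jobs for that engine.
SELECT COUNT(*) FROM page_views;
EOF

# Run the script only where the Hive CLI is actually installed.
if command -v hive >/dev/null 2>&1; then
  hive -f engine_demo.hql
fi
```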
Data
Apache Hive tables are organized much like tables in a relational database, with data units arranged from larger to more granular. Databases are made up of tables, which are divided into partitions, which are in turn divided into buckets. The data is accessed through HiveQL (Hive Query Language) and can be overwritten or appended. Within each database, table data is serialized, and each table has its own HDFS directory.
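The table, partition, and bucket hierarchy can be sketched as a single table definition. The table name, column names, and the bucket count of 4 are assumptions chosen for illustration.

```shell
# Hypothetical table: one partition per day, rows hashed into 4 buckets.
cat > create_page_views.hql <<'EOF'
CREATE TABLE page_views (
  user_id BIGINT,
  url     STRING
)
PARTITIONED BY (dt STRING)
CLUSTERED BY (user_id) INTO 4 BUCKETS;
EOF

# Run the script only where the Hive CLI is actually installed.
if command -v hive >/dev/null 2>&1; then
  hive -f create_page_views.hql
fi
```

With a layout like this, each value of `dt` becomes a subdirectory of the table's HDFS directory, and the buckets are files inside each partition directory.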
Architecture
We now turn to the most important part of the Hive architecture. The components of Apache Hive are as follows:
Metastore - Stores information about each table, such as its schema and location. Hive also includes partition metadata, which helps the driver track how datasets are distributed across the cluster. The data is stored in a traditional RDBMS format. Hive metadata is critical for the driver to keep track of the data. A backup server regularly replicates the data so that it can be recovered in case of data loss.
Driver - HiveQL statements are received by the driver, which acts as a controller. The driver starts execution of a statement by creating sessions. It monitors the life cycle and progress of the execution. While a HiveQL statement is executing, the driver stores the necessary metadata. It also serves as the collection point for data or query results after the reduce step.
Compiler - Compiles the HiveQL query, converting it into an execution plan. The plan lists the tasks, including the steps MapReduce must perform to produce the output the query describes. Hive's compiler first converts the query into an Abstract Syntax Tree (AST). After checking it for compatibility and compile-time errors, it converts the AST into a Directed Acyclic Graph (DAG).
Optimizer - Optimizes the DAG by performing various transformations on the execution plan. It aggregates transformations for better performance, such as converting a pipeline of joins into a single join. To improve speed, the optimizer can also split tasks, such as applying a transformation to the data before a reduce operation.
Executor - Once compilation and optimization are complete, the executor runs the tasks. It also takes care of pipelining the tasks.
CLI, UI, and Thrift Server - The command-line interface (CLI) and user interface (UI) let an external user interact with Hive. Hive's Thrift server, similar to the JDBC or ODBC protocols, lets external clients communicate with Hive over a network.
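One way to look at what the compiler and optimizer produce is to prefix a query with `EXPLAIN`, which prints the plan instead of running it. The query and the `page_views` table below are hypothetical, and the exact output format depends on the Hive version and execution engine.

```shell
# Ask Hive for the execution plan of a (hypothetical) query.
cat > explain_demo.hql <<'EOF'
EXPLAIN
SELECT url, COUNT(*) FROM page_views GROUP BY url;
EOF

# Run the script only where the Hive CLI is actually installed.
if command -v hive >/dev/null 2>&1; then
  hive -f explain_demo.hql
fi
```

The printed plan lists the stages and how they depend on one another, which is the DAG that the executor then works through.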
Security
Apache Hive is integrated with Hadoop security, which uses Kerberos for mutual client-server authentication. HDFS dictates the permissions for newly created files in Apache Hive, allowing you to authorize by user, group, and others.
Key Features
- Hive supports external tables, which let you process data where it already lives rather than loading it into Hive-managed storage in HDFS.
- It also supports partitioning of data at the table level to improve speed.
- Apache Hive fulfills Hadoop's low-level interface requirements.
- Hive makes data summarization, querying, and analysis easier.
- HiveQL does not require programming skills; a basic understanding of SQL queries is enough.
- We can also use Hive to run ad-hoc queries for data analysis.
- It is scalable, familiar, and extensible.
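The external-table feature from the list above can be sketched as follows. The table name, columns, comma delimiter, and the `/data/web_logs` location are assumptions for illustration, not values from this tutorial.

```shell
# Hypothetical external table over comma-separated files that already exist.
cat > create_web_logs.hql <<'EOF'
CREATE EXTERNAL TABLE web_logs (
  ip  STRING,
  url STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/web_logs';
EOF

# Run the script only where the Hive CLI is actually installed.
if command -v hive >/dev/null 2>&1; then
  hive -f create_web_logs.hql
fi
```

Dropping an external table removes only its metadata; the files under the given location stay where they are.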
Benefits
Apache Hive enables end-of-day reports, daily transaction reviews, ad-hoc queries, and data analysis. The detailed insights Apache Hive provides offer a competitive advantage and make it easier to respond to market demands.
Here are some of the benefits of having such insights:
- Ease of use - With its SQL-like language, querying data is easy to learn.
- Accelerated data insertion - Because Apache Hive reads the schema without verifying the table type or schema definition, data does not have to be read, parsed, and serialized to disk in the database's internal format. In contrast, in a traditional database, data must be validated every time it is added.
- Superior scalability, flexibility, and cost-efficiency - Because data is stored in HDFS, Apache Hive can hold hundreds of petabytes of data, making it a far more scalable option than a traditional database. As a cloud-based Hadoop service, Apache Hive lets customers quickly spin servers up and down to meet changing workloads.
- Extensive working capacity - Large-scale deployments can handle more than 100,000 queries per hour.
Limitations
- In general, Apache Hive queries have very high latency.
- Subquery support is limited.
- Apache Hive does not offer real-time queries, and row-level updates are restricted to transactional (ACID) tables.
- Materialized views are not supported in older Hive releases.
- In Hive, update and delete operations are not supported on ordinary, non-transactional tables.
- Not designed for OLTP (online transaction processing).
Getting Started with Apache Hive
Apache Hive is a powerful companion to Hadoop that simplifies and streamlines your workflows. To get the most out of Apache Hive, seamless integration is essential. The first step is to visit the Apache Hive website.
1. Installing Hive from a Stable Release
Start by downloading the most recent stable release of Hive from one of the Apache download mirrors (see Hive Releases). Next, unpack the tarball. This creates a subdirectory named hive-x.y.z (where x.y.z is the release number).
Then set the environment variable HIVE_HOME to point to the installation directory.
Finally, add $HIVE_HOME/bin to your PATH.
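The three installation steps above can be sketched in the shell as follows. Here `x.y.z` is kept as a placeholder for the release number, and the tarball name is an assumption; current releases may use a different file name, so check what you actually downloaded.

```shell
# "x.y.z" stands for the release number, as in the text; replace it with the
# version you actually downloaded. The tarball name below is an assumption.
HIVE_RELEASE="hive-x.y.z"

# Unpack the tarball if it is present in the current directory.
if [ -f "${HIVE_RELEASE}.tar.gz" ]; then
  tar -xzvf "${HIVE_RELEASE}.tar.gz"
fi

# Point HIVE_HOME at the unpacked directory and put its bin/ on the PATH.
export HIVE_HOME="$PWD/$HIVE_RELEASE"
export PATH="$HIVE_HOME/bin:$PATH"
```

To make the settings permanent, the two `export` lines would normally go into your shell profile.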
2. Running Hive
Hive uses Hadoop, so you must either have Hadoop on your PATH or set the HADOOP_HOME environment variable to your Hadoop installation directory.
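A minimal sketch of that check is shown below. The fallback path `/opt/hadoop` is hypothetical, so adjust it to wherever Hadoop lives on your system.

```shell
# /opt/hadoop is a hypothetical install location; adjust it to your system.
if command -v hadoop >/dev/null 2>&1; then
  echo "Found hadoop on PATH: $(command -v hadoop)"
else
  export HADOOP_HOME="${HADOOP_HOME:-/opt/hadoop}"
  echo "hadoop not on PATH; using HADOOP_HOME=$HADOOP_HOME"
fi

# Start the Hive CLI only if HIVE_HOME points at a real installation.
if [ -n "${HIVE_HOME:-}" ] && [ -x "$HIVE_HOME/bin/hive" ]; then
  "$HIVE_HOME/bin/hive"
fi
```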
3. DDL Operations
Creating Hive Tables
A CREATE TABLE statement for this step creates a table called pokes with two columns: the first is an integer and the second is a string.
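A statement matching that description might look like the sketch below. The column names `foo` and `bar` are assumptions (the text only gives the column types), and the script file name is illustrative.

```shell
# Table pokes: an integer column followed by a string column.
cat > create_pokes.hql <<'EOF'
CREATE TABLE pokes (foo INT, bar STRING);
EOF

# Run the script only where the Hive CLI is actually installed.
if command -v hive >/dev/null 2>&1; then
  hive -f create_pokes.hql
fi
```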
Browsing through Tables
Listing All Tables
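Listing tables takes a single statement, which can be passed to the Hive CLI with its `-e` option. The `DESCRIBE` statement is added here as a natural companion for browsing; it assumes the `pokes` table from the previous step exists.

```shell
# HiveQL to run; pokes is the table name used in the text.
list_tables='SHOW TABLES;'
describe_pokes='DESCRIBE pokes;'

# Run the statements only where the Hive CLI is actually installed.
if command -v hive >/dev/null 2>&1; then
  hive -e "$list_tables"      # names of all tables in the current database
  hive -e "$describe_pokes"   # column names and types of one table
fi
```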
Altering and Dropping Tables
Table names can be changed, and columns can be added or replaced.
Note that REPLACE COLUMNS replaces all existing columns and changes only the table's schema, not the data. The table must use a native SerDe. REPLACE COLUMNS can also be used to drop columns from the table's schema.
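The renaming, adding, and replacing described above can be sketched as follows. The new table name `pokes_renamed` and the columns `new_col` and `baz` are hypothetical, as are `foo` and `bar`, which this sketch assumes to be the original columns of `pokes`.

```shell
# Hypothetical ALTER TABLE statements against the pokes table.
cat > alter_pokes.hql <<'EOF'
-- Rename the table.
ALTER TABLE pokes RENAME TO pokes_renamed;
-- Add a column at the end of the schema.
ALTER TABLE pokes_renamed ADD COLUMNS (new_col INT COMMENT 'a comment');
-- Replace the whole column list (schema only; data files are untouched).
ALTER TABLE pokes_renamed REPLACE COLUMNS (foo INT, bar STRING, baz INT);
-- Drop columns by listing only the ones to keep.
ALTER TABLE pokes_renamed REPLACE COLUMNS (foo INT COMMENT 'only keep the first column');
EOF

# Run the script only where the Hive CLI is actually installed.
if command -v hive >/dev/null 2>&1; then
  hive -f alter_pokes.hql
fi
```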
Dropping Tables
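Dropping a table is again a single statement. The sketch assumes a table named `pokes` exists; for a managed table this removes the data as well as the metadata, so use it with care.

```shell
# HiveQL to run; pokes is the table name used in the text.
drop_pokes='DROP TABLE pokes;'

# Run the statement only where the Hive CLI is actually installed.
if command -v hive >/dev/null 2>&1; then
  hive -e "$drop_pokes"
fi
```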
There are many more operations and features in Apache Hive, which you can learn about by visiting the official website.
Conclusion
Hive is a data warehouse software interface for querying and analyzing large datasets, built on top of Apache Hadoop. Professionals choose it over other programs, tools, and software because it is designed to work with very large volumes of data and is easy to use.
We hope this tutorial helps you get started with Apache Hive and makes your workflows more efficient. Let us know in the comments.