Data science matters more than ever these days!
So much so that the data scientist was crowned holder of the "Sexiest Job of the Twenty-First Century," even though nobody expected geeky work to be sexy!
With so much data waiting to be made sense of, data science is in high demand right now.
Python, with its support for statistical analysis and data modeling and its readable syntax, is one of the best programming languages for extracting value from that data.
Python keeps impressing developers when it comes to tackling data science problems. It is a widely used, object-oriented, open-source language with a rich ecosystem of extensions.
Python also has an outstanding set of data science libraries that developers rely on every day to solve problems.
Here are the best Python libraries to consider:
1. Pandas
Pandas is a package designed to help developers work intuitively with "labeled" and "relational" data. It is built around two main data structures: "Series" (one-dimensional, similar to a list of items) and "DataFrames" (two-dimensional, like a table with several columns).
Pandas makes it easy to load data into DataFrame objects, handle and fill in missing values, add or remove DataFrame columns, and visualize data with histograms and other plots.
It also provides tools for reading and writing data between in-memory data structures and many different file formats.
In short, it is ideal for quick and easy data reshaping, aggregation, reading and writing, and visualization. When you start a data science project, Pandas is usually the first library to reach for to handle and analyze your data.
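To make the Series/DataFrame split and the missing-data helpers concrete, here is a minimal sketch. The small table, its column names, and the in-memory `io.StringIO` buffer (standing in for a real CSV file) are made up for illustration; filling gaps with the column mean is just one of several possible strategies.

```python
import io

import pandas as pd

# A Series is one-dimensional: a labeled list of values (None becomes NaN).
ages = pd.Series([34, None, 29], index=["ana", "ben", "chi"], name="age")

# A DataFrame is two-dimensional: a table with named columns.
df = pd.DataFrame({"name": ["ana", "ben", "chi"], "city": ["Lilongwe", "Blantyre", None]})
df["age"] = ages.values  # add a new column

# Handle missing values: fill the numeric gap with the column mean,
# then drop any row that still has no city.
df["age"] = df["age"].fillna(df["age"].mean())
df = df.dropna(subset=["city"])

# Remove a column we no longer need.
df = df.drop(columns=["name"])

# Write to CSV and read it back, using an in-memory buffer instead of a file.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)
print(restored)
```

The same `to_csv`/`read_csv` pair works with a file path in place of the buffer.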
2. NumPy
NumPy (Numerical Python) is an excellent tool for scientific computing and for both basic and advanced array operations.
The library provides many handy features for working with n-dimensional arrays and matrices in Python.
It makes it easy to process arrays whose values all share the same data type and to perform mathematical operations on whole arrays at once (vectorization). In fact, using NumPy's array type to vectorize mathematical operations improves performance and reduces execution time.
Multidimensional support for mathematical and logical operations is the library's core feature. NumPy functions can be used for indexing, sorting, and reshaping data, and for representing images and sound waves as arrays of real numbers.
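A short sketch of vectorization and broadcasting, assuming a toy price table invented for this example. The array expression computes the same row totals as the explicit Python loop, but NumPy does the work in compiled code instead of one element at a time.

```python
import numpy as np

# A 2-D array of floats: each row is an order, each column a product price.
prices = np.array([[120.0, 95.5, 60.0], [80.0, 150.0, 42.5]])  # shape (2, 3)
qty = np.array([2, 1, 4])  # shape (3,)

# Vectorized: broadcasting multiplies every row by qty with no Python loop.
totals = prices * qty
row_sums = totals.sum(axis=1)

# The equivalent explicit loop, for comparison (much slower on large arrays).
loop_sums = [sum(p * q for p, q in zip(row, qty)) for row in prices]

# Reshaping and sorting are array operations too.
flat_sorted = np.sort(prices.reshape(-1))
print(row_sums, flat_sorted)
```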
3. Matplotlib
In the Python world, Matplotlib is one of the most widely used libraries. It is used to create static, animated, and interactive data visualizations, and it offers many chart types and customization options.
Developers can draw histograms, scatter plots, and other graphs and then adjust nearly every element of them. The open-source library also provides an object-oriented API for embedding plots in applications.
Building complex visualizations with it, however, often takes more code than you might expect.
It is worth noting that other popular libraries work smoothly alongside Matplotlib.
Among other places, it is used in Python scripts, the Python and IPython shells, Jupyter notebooks, and web application servers.
Line plots, bar charts, pie charts, histograms, scatterplots, error charts, power spectra, stem plots, and almost any other kind of chart can be made with it.
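A minimal sketch of the object-oriented API, drawing a histogram and a scatter plot on one figure. The random data is synthetic, and the `Agg` backend plus the in-memory PNG buffer are assumptions so the example runs without a display; in a notebook you would normally just show the figure.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render off-screen, no window needed
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data: 500 samples from a normal distribution.
rng = np.random.default_rng(seed=0)
values = rng.normal(loc=50, scale=10, size=500)

# Object-oriented API: one Figure holding two Axes side by side.
fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(8, 3))

ax_hist.hist(values, bins=20)
ax_hist.set_title("Histogram")

# Plot each value against the next one.
ax_scatter.scatter(values[:-1], values[1:], s=5)
ax_scatter.set_title("Scatter plot")

fig.tight_layout()
buffer = io.BytesIO()
fig.savefig(buffer, format="png")  # a file path works here as well
plt.close(fig)
```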
4. Seaborn
The Seaborn library is built on Matplotlib. With Seaborn you can produce more attractive and informative graphics than with Matplotlib alone.
Seaborn provides a dataset-oriented API for exploring relationships among multiple variables, along with broad support for data visualization.
Seaborn offers many visualization options, including time series plots, joint distributions, violin plots, and more.
It uses semantic mapping and statistical aggregation to produce informative plots that give deeper insight. It includes a number of dataset-oriented plotting functions that work with data frames and arrays holding whole datasets.
Its visualizations include bar charts, histograms, scatterplots, error bars, and other graphs. The library also has tools for choosing color palettes, which help reveal patterns in a dataset.
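A rough sketch of Seaborn's dataset-oriented style, assuming a tiny made-up DataFrame of study hours and scores. Passing `hue="group"` is the semantic mapping described above: Seaborn assigns a color from the chosen palette to each group and builds the legend itself. The `Agg` backend and PNG buffer are only there so it runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen so no display is needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data: exam scores against hours studied for two groups.
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
    "score": [52, 58, 63, 70, 74, 48, 55, 61, 66, 72],
    "group": ["A"] * 5 + ["B"] * 5,
})

# Pick two colors from one of Seaborn's built-in palettes.
palette = sns.color_palette("colorblind", 2)

# hue maps the "group" column to color; Seaborn labels the axes and draws the legend.
ax = sns.scatterplot(data=df, x="hours", y="score", hue="group", palette=palette)
ax.set_title("Score vs. hours studied")

buffer = io.BytesIO()
ax.figure.savefig(buffer, format="png")
plt.close(ax.figure)
```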
5. Scikit-learn
Scikit-learn is the leading Python library for data modeling and model evaluation, and one of the most useful Python libraries overall. It has many features built specifically for modeling.
It includes a wide range of supervised and unsupervised machine learning algorithms, as well as well-defined functions for ensemble learning and boosting.
Data scientists use it for routine machine learning and data mining tasks such as clustering, regression, model selection, dimensionality reduction, and classification. It also comes with thorough documentation and performs remarkably well.
Scikit-learn can be used to build many kinds of supervised and unsupervised models, such as classification, regression, support vector machines, random forests, nearest neighbors, naive Bayes, decision trees, clustering, and so on.
This machine learning library offers a set of simple yet effective tools for data analysis and data mining.
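Much of scikit-learn's appeal is that every estimator shares the same `fit`/`predict` interface. This sketch assumes the Iris dataset bundled with scikit-learn and compares a random forest with k-nearest neighbors; the split ratio and hyperparameters are arbitrary choices for illustration, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Iris ships with scikit-learn, so no download is needed.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "knn": KNeighborsClassifier(n_neighbors=5),
}

# Every estimator exposes the same fit/predict methods, so one loop handles both.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

print(scores)
```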
6. XGBoost
XGBoost is an optimized gradient boosting library designed for speed, flexibility, and portability. It implements machine learning algorithms under the Gradient Boosting framework, and its fast, accurate tree boosting can solve a wide range of data science problems.
It provides parallel tree boosting, which helps teams tackle many different data science problems. Another benefit is that developers can run the same code on Hadoop, SGE, and MPI.
It is also dependable in both distributed and out-of-core (larger-than-memory) settings.
7. TensorFlow
TensorFlow is a free, open-source AI platform with a broad set of tools, libraries, and resources. Anyone doing machine learning work in Python should be familiar with it.
It is an open-source symbolic math library, developed by Google, for numerical computation using data flow graphs. In a TensorFlow data flow graph, the nodes represent mathematical operations.
The graph's edges, on the other hand, are the multidimensional data arrays, known as tensors, that flow between the nodes. This design lets developers distribute computation across one or more CPUs or GPUs on a desktop, mobile device, or server without changing their code.
TensorFlow's core is implemented in C and C++. With TensorFlow, you can build and train machine learning models using high-level APIs such as Keras.
It also offers multiple levels of abstraction, so you can pick the right one for your model. TensorFlow lets you deploy machine learning models in the cloud, in the browser, or on your own device.
It is a very useful tool for tasks such as object detection, speech recognition, and more, and it helps in building artificial neural networks that need to handle many data sources.
8. Keras
Keras is a free, open-source, Python-based neural network library for artificial intelligence, deep learning, and data science work. In data science, neural networks are also used to interpret perceptual data such as images and audio.
It provides tools for building models, graphing data, and analyzing data. It also includes pre-labeled datasets that can be imported and loaded quickly.
It is easy to use, modular, and well suited to exploratory research. It lets you build fully connected, convolutional, pooling, recurrent, embedding, and other kinds of neural network layers.
These layers can be combined into complete neural networks for large, complex datasets, which makes it an excellent library for building neural networks.
It is easy to use and gives developers plenty of flexibility. On the other hand, Keras can be slower than other Python machine learning packages.
This is because it first builds a computational graph on top of its backend and then uses that graph to perform operations. Keras is remarkably expressive and flexible when it comes to novel research.
9. PyTorch
PyTorch is a popular Python package for deep learning and machine learning. It is an open-source, Python-based scientific computing package for implementing deep learning and neural networks on large datasets.
Facebook uses it heavily to build the neural networks behind features such as face recognition and automatic tagging.
PyTorch is a platform for data scientists who want to complete deep learning tasks quickly. It supports tensor computation with GPU acceleration.
It is also used for other tasks, including building dynamic networks and computing gradients.
Best of all, PyTorch lets developers move easily from theory and research to training and production in machine learning and deep learning work, while offering great flexibility and speed.
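To illustrate GPU-ready tensors and automatic gradient computation, here is a toy sketch that fits a single weight `w` in `y = w * x` by plain gradient descent. The data (true weight 3), learning rate, and step count are assumptions chosen so the loop converges; real projects would usually rely on `torch.nn` and `torch.optim` rather than updating parameters by hand.

```python
import torch

# Use the GPU when one is available; the same code runs unchanged on the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Toy data generated from y = 3 * x.
x = torch.tensor([1.0, 2.0, 3.0, 4.0], device=device)
y = 3.0 * x

# requires_grad=True asks autograd to track operations on w.
w = torch.zeros(1, device=device, requires_grad=True)
learning_rate = 0.01

for step in range(100):
    loss = ((w * x - y) ** 2).mean()  # mean squared error
    loss.backward()                   # autograd writes d(loss)/dw into w.grad
    with torch.no_grad():             # update w without recording the update itself
        w -= learning_rate * w.grad
        w.grad.zero_()                # clear the gradient before the next step

print(f"learned w = {w.item():.4f}")
```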
10. NLTK
NLTK (Natural Language Toolkit) is a popular Python package among data scientists. Text classification, tokenization, semantic reasoning, and other natural language processing tasks can all be handled with NLTK.
NLTK can also be used for fairly complex AI (artificial intelligence) tasks. It was originally developed to support teaching and research across various AI and machine learning paradigms, such as linguistics and cognitive theory.
It has been used to develop AI algorithms and learning models for real-world problems. It is widely adopted as a teaching tool and for self-study, as well as a platform for prototyping and building research systems.
Classification, parsing, semantic reasoning, stemming, tagging, and tokenization are all supported.
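A small sketch of tokenization and frequency counting. It deliberately uses `wordpunct_tokenize`, a regular-expression tokenizer that needs no extra data download; NLTK's `word_tokenize` and its part-of-speech tagger generally give better results but require fetching resources with `nltk.download` first. The sample sentence is made up.

```python
from nltk import FreqDist
from nltk.tokenize import wordpunct_tokenize

text = "NLTK splits text into tokens. Tokens can then be counted, tagged, or parsed."

# Split on word characters vs. punctuation; no downloaded model is required.
tokens = wordpunct_tokenize(text)

# Keep alphabetic tokens only, lowercased, and count them.
words = [t.lower() for t in tokens if t.isalpha()]
freq = FreqDist(words)

print(tokens)
print(freq.most_common(3))
```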
Conclusion
That wraps up the top ten Python libraries for data science. These libraries are updated frequently as data science and machine learning keep growing in popularity.
There are many Python libraries for data science, and which one you choose depends on the kind of project you are working on.