In 2021, businesses are increasingly aware of the value of collecting customer engagement data.
Over-reliance on these data points, however, often leads organizations to treat customer input as a statistic, which is a one-dimensional way of listening to the voice of the customer.
The voice of the customer cannot be tallied or converted into a number.
It must be read, summarized, and, above all, understood.
The fact is that companies must listen to what their customers say on every channel through which they interact with them, whether by phone, email, or live chat.
Every company should prioritize monitoring and evaluating the sentiment of customer feedback, but companies have traditionally struggled to handle this data and turn it into meaningful intelligence.
With sentiment analysis, this is no longer the case.
In this tutorial, we take a closer look at sentiment analysis, its benefits, and how to use the NLTK library to perform sentiment analysis on data.
What is sentiment analysis?
Sentiment analysis, often called opinion mining, is a method of analyzing people's feelings, thoughts, and opinions.
Sentiment analysis allows businesses to better understand their customers, increase revenue, and improve their products and services based on customer input.
The difference between a software system that can analyze customer sentiment and a salesperson or customer service representative trying to infer it is the software's ability to derive objective results from raw text. This is achieved primarily through natural language processing (NLP) and machine learning techniques.
From emotion detection to text classification, sentiment analysis has a wide range of applications. A popular use is on text data, where it helps a company monitor the sentiment of product reviews or customer feedback.
Various social media sites also use it to evaluate the sentiment of posts: if the sentiment is too strong or violent, or falls below their threshold, the post is deleted or hidden.
Benefits of Sentiment Analysis
The following are some of the most important benefits of sentiment analysis that should not be overlooked.
- Helps you gauge how your brand is perceived by your target audience.
- Provides direct customer feedback to help you develop your product.
- Increases sales revenue and prospecting.
- Expands upselling opportunities among your product's champions.
- Makes proactive customer service a practical option.
Numbers can give you information such as the raw performance of a marketing campaign, the amount of engagement on a prospecting call, and the number of tickets waiting on customer support.
However, they won't tell you why a particular event happened or what caused it. Analytics tools such as those from Google and Facebook, for example, can help you evaluate the performance of your marketing efforts.
But they do not give you deep insight into why a particular campaign was a success.
Sentiment analysis has the potential to be a game changer in this regard.
Sentiment Analysis - Problem Statement
The goal is to determine, from a collection of tweets about six US airlines, whether each tweet expresses positive, negative, or neutral sentiment.
This is a typical supervised learning task in which, given a text string, we must classify it into one of several predefined categories.
Solution
We will use a machine learning pipeline to tackle this problem. We start by importing the required libraries and the dataset.
Then we perform exploratory data analysis to see whether there are any patterns in the data. After that, we preprocess the text to convert the textual input into numeric data that a machine learning system can use.
Finally, we train and evaluate our sentiment analysis models using machine learning methods.
1. Importing the libraries
Load the required libraries.
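The import cell itself is not reproduced in this copy. A plausible set, assuming pandas, NumPy, Matplotlib, Seaborn and scikit-learn are installed, might look like the sketch below. The exact list is an assumption based on the steps this tutorial describes. NLTK, which the introduction mentions, would be imported the same way with import nltk if you use it for tokenizing or stop words.

```python
import re  # regular expressions, used to clean tweet text

import matplotlib.pyplot as plt  # plotting
import numpy as np               # numeric arrays
import pandas as pd              # tabular data handling
import seaborn as sns            # statistical charts

# scikit-learn pieces for vectorizing, splitting, training and scoring
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
```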
2. Importing the dataset
This article is based on a dataset that is available on GitHub. The dataset is loaded with the Pandas read_csv function.
Use the head() function to inspect the first five rows of the dataset.
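Because the file's location is not given here, the sketch below loads a tiny in-memory stand-in instead of the real CSV. The rows are made up, and the column names (airline, airline_sentiment, airline_sentiment_confidence, text) are assumptions about the dataset's layout. For the real data, pass the file path or raw URL to pd.read_csv in place of SAMPLE_CSV.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: three made-up rows using the
# column names assumed for the airline tweets CSV.
SAMPLE_CSV = io.StringIO(
    "airline,airline_sentiment,airline_sentiment_confidence,text\n"
    "United,negative,1.0,@united my flight was delayed again\n"
    "Delta,positive,0.9,@delta thanks for the smooth trip!\n"
    "Virgin America,neutral,0.6,@VirginAmerica what time is boarding?\n"
)

# read_csv accepts a local path, a URL, or a file-like object such as this one.
airline_tweets = pd.read_csv(SAMPLE_CSV)

# head() returns the first five rows by default (here, all three).
print(airline_tweets.head())
```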
3. Data Analysis
Let's explore the data to see whether any trends emerge. But first, we change the default plot size to make the charts easier to read.
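One way to enlarge every subsequent chart is to change Matplotlib's global figure size. The 8 by 6 inch value is an assumption chosen for illustration, not a figure taken from this article.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

# figure.figsize is (width, height) in inches; Matplotlib's default is 6.4 x 4.8.
plt.rcParams["figure.figsize"] = (8, 6)
```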
Let's start with the number of tweets received by each airline. We will use a pie chart for this.
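A minimal sketch of that pie chart follows. It uses a handful of made-up rows in place of the real dataset and assumes the airline name lives in a column called airline.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import pandas as pd

# Made-up rows standing in for the real dataset; `airline` is an assumed column name.
airline_tweets = pd.DataFrame(
    {"airline": ["United", "United", "Delta", "US Airways", "United", "Delta"]}
)

# Count tweets per airline, then draw each airline's share as a pie slice.
tweets_per_airline = airline_tweets["airline"].value_counts()
ax = tweets_per_airline.plot(kind="pie", autopct="%1.0f%%")
ax.set_ylabel("")  # hide the axis label pandas adds next to the pie
```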
The output shows the percentage of public tweets for each airline.
Next, let's look at how sentiment is distributed across all the tweets.
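The overall sentiment split can be sketched the same way. The rows are made up, and airline_sentiment is an assumed column name holding the labels negative, neutral and positive.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import pandas as pd

# Made-up labels standing in for the real dataset.
airline_tweets = pd.DataFrame(
    {"airline_sentiment": ["negative", "negative", "negative", "neutral", "positive"]}
)

# normalize=True returns each class's share of all tweets instead of raw counts.
sentiment_share = airline_tweets["airline_sentiment"].value_counts(normalize=True)
ax = sentiment_share.plot(kind="pie", autopct="%1.0f%%")
ax.set_ylabel("")
```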
Now let's examine the distribution of sentiment for each individual airline.
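A per-airline breakdown can be built by grouping on both the airline and the sentiment label, then pivoting the labels into columns. This is a sketch on made-up rows, and both column names are assumptions.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import pandas as pd

# Made-up rows standing in for the real dataset.
airline_tweets = pd.DataFrame(
    {
        "airline": ["United", "United", "United", "Delta", "Delta"],
        "airline_sentiment": ["negative", "negative", "positive", "neutral", "negative"],
    }
)

# One row per airline, one column per sentiment, cells hold tweet counts.
airline_sentiment = (
    airline_tweets.groupby(["airline", "airline_sentiment"])
    .size()
    .unstack(fill_value=0)  # combinations that never occur become 0
)

# Grouped bar chart: one cluster of bars per airline.
ax = airline_sentiment.plot(kind="bar")
```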
According to the results, for almost all the airlines the majority of tweets are negative, followed by neutral and then positive tweets. Virgin America is probably the only airline where the proportions of the three sentiments are comparable.
Finally, we use the Seaborn library to find the average confidence level of the tweets in each of the three sentiment categories.
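A sketch of that chart is below, on made-up rows. The confidence column name airline_sentiment_confidence is an assumption. The same averages are also computed directly with pandas so that they can be read as numbers.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import pandas as pd
import seaborn as sns

# Made-up rows standing in for the real dataset.
airline_tweets = pd.DataFrame(
    {
        "airline_sentiment": ["negative", "negative", "neutral", "positive"],
        "airline_sentiment_confidence": [1.0, 0.8, 0.6, 0.7],
    }
)

# Average confidence per sentiment class, as plain numbers.
mean_confidence = airline_tweets.groupby("airline_sentiment")[
    "airline_sentiment_confidence"
].mean()

# barplot aggregates with the mean by default, so each bar is a class average.
ax = sns.barplot(
    x="airline_sentiment", y="airline_sentiment_confidence", data=airline_tweets
)
```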
The output shows that the confidence level for negative tweets is higher than for positive or neutral tweets.
4. Cleaning the data
Tweets contain many slang words and punctuation marks. Before we can train a machine learning model, we need to clean our tweets.
However, before we start cleaning the tweets, we need to split our dataset into a feature set and a label set.
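Splitting into features and labels amounts to pulling two columns out of the DataFrame. The sketch assumes the tweet text is in a column named text and the sentiment label in airline_sentiment, and it uses made-up rows.

```python
import pandas as pd

# Made-up rows standing in for the real dataset.
airline_tweets = pd.DataFrame(
    {
        "airline_sentiment": ["negative", "positive", "neutral"],
        "text": [
            "@united my flight was delayed again",
            "@delta thanks for the smooth trip!",
            "@VirginAmerica what time is boarding?",
        ],
    }
)

features = airline_tweets["text"].to_numpy()              # model input: raw tweet text
labels = airline_tweets["airline_sentiment"].to_numpy()   # model target: sentiment class
```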
Once the data has been divided into features and labels, we can clean it. Regular expressions are used to do this.
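The regular-expression cleanup could look roughly like this. The function name clean_tweet and the exact patterns are illustrative assumptions: the sketch replaces punctuation with spaces, drops stray single letters (such as the t left over from can't), collapses whitespace, and lowercases the result.

```python
import re


def clean_tweet(text: str) -> str:
    """Lowercase a tweet and strip punctuation and stray single letters."""
    text = re.sub(r"\W", " ", text)              # non-word characters -> space
    text = re.sub(r"\s+[a-zA-Z]\s+", " ", text)  # isolated single letters
    text = re.sub(r"^[a-zA-Z]\s+", "", text)     # single letter at the very start
    text = re.sub(r"\s+", " ", text)             # collapse repeated whitespace
    return text.strip().lower()


raw = "@united I can't believe it!!! Flight delayed 3 hrs :("
print(clean_tweet(raw))  # united can believe it flight delayed 3 hrs
```

Applied to the whole feature set, this would be a list comprehension such as [clean_tweet(str(t)) for t in features].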
5. Representing text as numbers
Statistical algorithms use mathematics to train machine learning models. Mathematics, however, works only with numbers.
We must first convert the text into numbers so that statistical algorithms can work with it. There are three basic approaches for doing this: Bag of Words, TF-IDF, and Word2Vec.
Fortunately, the TfidfVectorizer class in Python's Scikit-Learn library can be used to convert text features into TF-IDF feature vectors.
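A minimal sketch of the vectorizing step, on four made-up, already cleaned tweets. The parameter values are assumptions for illustration. On a real corpus you would typically also tune min_df and max_df to drop very rare and very common terms.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up, already cleaned tweets standing in for the real feature set.
processed_features = [
    "flight delayed again terrible service",
    "great flight friendly crew",
    "what time does boarding start",
    "lost my bag terrible service",
]

vectorizer = TfidfVectorizer(
    max_features=2500,     # keep at most the 2500 most frequent terms
    stop_words="english",  # drop scikit-learn's built-in English stop words
)

# fit_transform learns the vocabulary and returns a sparse matrix with
# one row per tweet and one column per retained term.
tfidf_features = vectorizer.fit_transform(processed_features)
print(tfidf_features.shape)
```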
6. Creating Training and Test Sets from the Data
Finally, we need to split our data into training and test sets before we train our algorithms.
The training set is used to train the algorithm, and the test set is used to evaluate the performance of the machine learning model.
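The split can be done with scikit-learn's train_test_split. The 80/20 ratio and the fixed random_state are assumptions for illustration, and the ten placeholder tweets are made up.

```python
from sklearn.model_selection import train_test_split

# Placeholder features and labels standing in for the vectorized tweets.
features = [f"tweet {i}" for i in range(10)]
labels = ["negative"] * 6 + ["neutral"] * 2 + ["positive"] * 2

# Hold out 20% of the rows for testing; random_state makes the shuffle repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=0
)
print(len(X_train), len(X_test))  # 8 2
```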
7. Training the Model
Once the data has been split into training and test sets, machine learning algorithms are used to learn from the training data.
You can use any machine learning algorithm. Here, however, the Random Forest algorithm is used because of its ability to handle non-normalized data.
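Training a random forest on TF-IDF vectors might look like the sketch below. The six training tweets are made up, and n_estimators=200 and random_state=0 are assumed values rather than settings confirmed by this article.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up training tweets and labels standing in for the real training set.
train_texts = [
    "terrible delay and rude staff",
    "lost my luggage awful service",
    "flight cancelled worst airline",
    "great crew and smooth flight",
    "love the friendly service",
    "fantastic trip thank you",
]
train_labels = ["negative"] * 3 + ["positive"] * 3

# Turn the text into TF-IDF vectors, then fit the forest on them.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)

text_classifier = RandomForestClassifier(n_estimators=200, random_state=0)
text_classifier.fit(X_train, train_labels)
```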
8. Making Predictions and Evaluating the Model
Once the model has been trained, the final step is to make predictions. To do so, we call the predict method on the RandomForestClassifier object that we trained.
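The prediction step is sketched below on made-up data. The important detail is that unseen tweets go through transform, not fit_transform, so that they are encoded with the vocabulary learned from the training set. Hyperparameter values are assumptions.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up training and test tweets standing in for the real split.
train_texts = [
    "terrible delay and rude staff",
    "lost my luggage awful service",
    "flight cancelled worst airline",
    "great crew and smooth flight",
    "love the friendly service",
    "fantastic trip thank you",
]
train_labels = ["negative"] * 3 + ["positive"] * 3
test_texts = ["awful delay", "friendly crew thank you"]

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)  # reuse the fitted vocabulary

text_classifier = RandomForestClassifier(n_estimators=200, random_state=0)
text_classifier.fit(X_train, train_labels)

# predict returns one label per row of X_test.
predictions = text_classifier.predict(X_test)
print(predictions)
```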
Finally, classification metrics such as the confusion matrix, the F1 measure, accuracy, and so on can be used to evaluate the performance of machine learning models.
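These metrics are all available in sklearn.metrics. The sketch below scores a small hand-written set of true and predicted labels, so the numbers are illustrative only and unrelated to the airline results.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Hand-written labels for illustration; in practice these come from the
# test split and from the classifier's predict method.
y_test = ["negative", "negative", "negative", "neutral", "positive", "positive"]
predictions = ["negative", "negative", "neutral", "neutral", "positive", "negative"]

class_names = ["negative", "neutral", "positive"]

# Rows are true classes, columns are predicted classes, in class_names order.
cm = confusion_matrix(y_test, predictions, labels=class_names)
print(cm)

# Precision, recall and F1 for each class.
print(classification_report(y_test, predictions, labels=class_names))

# Fraction of tweets labelled correctly: 4 of 6 here.
accuracy = accuracy_score(y_test, predictions)
print(accuracy)
```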
As the output shows, our algorithm achieved an accuracy of 75.30%.
Conclusion
Sentiment analysis is one of the most common NLP tasks because it helps determine overall public opinion on a given topic.
We have seen how several Python libraries can help with sentiment analysis.
We analyzed public tweets about six US airlines and reached an accuracy of around 75%.
I recommend trying another machine learning algorithm, such as logistic regression, SVM, or KNN, to see whether you can get better results.