By 2021, businesses will have mastered collecting customer interaction data.
Over-reliance on these data points, however, often leads organizations to treat customer input as statistics, which is a one-dimensional way of listening to the voice of the customer.
The voice of the customer cannot be tallied or converted into a number.
It must be read, summarized, and above all, understood.
The reality is that companies need to listen to what their customers say on every channel they use, whether phone calls, emails, or live chat.
Every company should prioritize monitoring and evaluating the sentiment of customer feedback, but companies have long struggled to manage this data and turn it into meaningful intelligence.
With sentiment analysis, this is no longer the case.
In this tutorial, we'll take a close look at sentiment analysis, its benefits, and how to use the NLTK library to perform sentiment analysis on data.
What is sentiment analysis?
Sentiment analysis, often known as opinion mining, is a method of analyzing people's feelings, thoughts, and opinions.
Sentiment analysis lets businesses gain a better understanding of their customers, increase revenue, and improve their products and services based on customer input.
The difference between a software system that can analyze customer sentiment and a salesperson or customer service representative trying to infer it is the former's sheer ability to extract precise results from raw text. This is accomplished mainly through natural language processing (NLP) and machine learning techniques.
Sentiment analysis has a wide range of applications, from detecting emotion to classifying text into categories. Its most popular use is on text data, where it helps a company track the sentiment of product reviews or customer comments.
Various social media sites also use it to check the sentiment of posts: if the sentiment is too strong or violent, or falls below their threshold, the post is deleted or hidden.
Benefits of Sentiment Analysis
The following are some of the most important benefits of sentiment analysis that should not be overlooked.
- It helps you assess how your product is perceived among your target demographic.
- It gives you direct client feedback to help you improve your product.
- It increases sales revenue and prospecting.
- It increases upselling opportunities among your product's champions.
- It makes proactive customer service a practical option.
Numbers can give you information such as the raw performance of a marketing campaign, the level of engagement on a prospecting call, and the number of tickets pending in customer support.
However, they won't tell you why a particular event happened or what caused it. Analytics tools such as those from Google and Facebook, for example, can help you evaluate the performance of your marketing efforts.
But they don't give you in-depth insight into why a particular campaign succeeded.
Sentiment analysis has the potential to be a game-changer here.
Sentiment Analysis - Problem Statement
The goal is to determine whether a tweet about one of six US airlines expresses positive, negative, or neutral sentiment.
This is a typical supervised learning task: given a text string, we must classify it into one of several predefined categories.
Solution
We'll use a standard machine learning workflow to tackle this problem. We'll begin by importing the required libraries and the dataset.
We'll then perform exploratory data analysis to see whether there are any patterns in the data. After that, we'll preprocess the text to convert the textual data into numeric data that a machine learning algorithm can use.
Finally, we'll train and evaluate our sentiment analysis models using machine learning methods.
1. Importing Libraries
Load the required libraries.
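The import list of the original code is not shown here, so the following is only a sketch of what the steps in this tutorial would need. It assumes pandas and NumPy for data handling, the standard re module for regular expressions, and Scikit-Learn for vectorizing, splitting, training, and scoring. Plotting libraries such as Matplotlib and Seaborn would be imported in the same place if you draw the charts.

```python
# Assumed import set covering the steps described in this tutorial.
import re  # regular expressions for cleaning tweets

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
```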
2. Import the Dataset
This article is based on a dataset that is available on GitHub. The dataset is loaded with the Pandas read_csv function.
The head() function then shows the first five rows of the dataset.
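As a rough sketch of these two calls, the example below reads a tiny in-memory CSV instead of the real file so that it runs on its own. The three rows are made up, and the column names (airline, airline_sentiment, airline_sentiment_confidence, text) are assumptions about how the dataset is laid out.

```python
import io

import pandas as pd

# Hypothetical three-row sample standing in for the real CSV file.
# The column names are assumptions, not taken from the dataset itself.
sample_csv = io.StringIO(
    "airline,airline_sentiment,airline_sentiment_confidence,text\n"
    "Virgin America,positive,0.65,@VirginAmerica great flight today!\n"
    "United,negative,1.0,@united my bag is lost again\n"
    "Delta,neutral,0.68,@Delta what time does boarding start?\n"
)

# read_csv accepts a file path, a URL, or a file-like object.
airline_tweets = pd.read_csv(sample_csv)

# head() returns the first five rows (here only three exist).
print(airline_tweets.head())
```

With the real dataset, you would pass the path or URL of the CSV file to read_csv in place of sample_csv.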
3. Data Analysis
Let's explore the data to see whether there are any trends. But first, we'll change the default plot size to make the charts easier to read.
Let's start with the number of tweets received by each airline. We'll use a pie chart for this.
The output shows the percentage of public tweets for each airline.
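The numbers behind such a pie chart can be computed with pandas alone. The sketch below uses a made-up ten-row sample and assumes the airline name lives in a column called airline; the percentages it prints are for the sample only, not for the real dataset.

```python
import pandas as pd

# Hypothetical sample; the "airline" column name is an assumption.
airline_tweets = pd.DataFrame(
    {
        "airline": [
            "United", "United", "United", "Delta", "Delta",
            "American", "American", "US Airways", "Southwest", "Virgin America",
        ]
    }
)

# Share of tweets per airline, as a percentage of all tweets.
share = airline_tweets["airline"].value_counts(normalize=True) * 100
print(share.round(1))
```

If Matplotlib is installed, share.plot(kind="pie", autopct="%1.0f%%") should draw the corresponding pie chart, and plt.rcParams["figure.figsize"] is the setting that controls the default plot size.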
Let's look at how sentiment is distributed across all tweets.
Now let's examine the sentiment distribution for each individual airline.
According to the results, the majority of tweets for almost all airlines are negative, followed by neutral and then positive tweets. Virgin America is perhaps the only airline where the proportions of the three sentiments are comparable.
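One way to tabulate both views at once is a cross-tabulation, sketched below on a made-up sample. The column names airline and airline_sentiment are assumptions, and the counts are illustrative only. The "All" row gives the overall sentiment distribution, and the other rows give it per airline.

```python
import pandas as pd

# Hypothetical sample; column names are assumptions.
airline_tweets = pd.DataFrame(
    {
        "airline": [
            "United", "United", "United", "Delta", "Delta",
            "Virgin America", "Virgin America", "Virgin America",
        ],
        "airline_sentiment": [
            "negative", "negative", "neutral", "negative", "positive",
            "negative", "neutral", "positive",
        ],
    }
)

# Rows: airlines, columns: sentiment classes; margins=True adds "All" totals.
counts = pd.crosstab(
    airline_tweets["airline"],
    airline_tweets["airline_sentiment"],
    margins=True,
)
print(counts)
```

Dropping the "All" row and column and calling .plot(kind="bar") on the result is one way to turn this table into a grouped bar chart.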
Finally, we'll use the Seaborn library to find the average confidence level of tweets in each of the three sentiment categories.
The output shows that the confidence level for negative tweets is higher than for positive or neutral tweets.
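A Seaborn bar plot shows the mean of the y-values per category by default, so the same figures can be obtained with a pandas groupby. The sketch below does that on made-up confidence values; the column names are assumptions and the numbers are not real results.

```python
import pandas as pd

# Hypothetical sample values; column names are assumptions.
airline_tweets = pd.DataFrame(
    {
        "airline_sentiment": [
            "negative", "negative", "neutral", "neutral", "positive", "positive",
        ],
        "airline_sentiment_confidence": [1.0, 0.9, 0.6, 0.7, 0.8, 0.7],
    }
)

# Mean confidence per sentiment class.
mean_confidence = airline_tweets.groupby("airline_sentiment")[
    "airline_sentiment_confidence"
].mean()
print(mean_confidence)
```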
4. Data Cleaning
Tweets contain many slang words and punctuation marks. Before we train the machine learning model, we need to clean our tweets.
However, before we start cleaning the tweets, we must split our dataset into a feature set and a label set.
Once the data has been split into features and labels, we can clean it. We'll use regular expressions to do this.
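The two steps could look roughly like this. The helper clean_tweet and the exact regular expressions are illustrative choices rather than the original code, and the column names text and airline_sentiment are assumptions.

```python
import re

import pandas as pd

# Hypothetical sample; column names are assumptions.
airline_tweets = pd.DataFrame(
    {
        "airline_sentiment": ["positive", "negative"],
        "text": [
            "@VirginAmerica it's a GREAT flight!!! :)",
            "@united   my bag is lost... AGAIN",
        ],
    }
)

# Split into features (tweet text) and labels (sentiment class).
features = airline_tweets["text"].tolist()
labels = airline_tweets["airline_sentiment"].tolist()


def clean_tweet(text):
    """Illustrative cleaner: strip symbols, stray letters, extra spaces."""
    text = re.sub(r"\W", " ", text)            # non-word characters -> spaces
    text = re.sub(r"\b[a-zA-Z]\b", " ", text)  # drop isolated single letters
    text = re.sub(r"\s+", " ", text).strip()   # collapse repeated whitespace
    return text.lower()


processed_features = [clean_tweet(tweet) for tweet in features]
print(processed_features)
```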
5. Representing Text Numerically
Statistical algorithms use mathematics to train machine learning models, and mathematics works only with numbers.
We must therefore first convert the text into numbers that statistical algorithms can handle. There are three basic approaches for doing this: Bag of Words, TF-IDF, and Word2Vec.
Fortunately, the TfidfVectorizer class in Python's Scikit-Learn module can be used to convert text features into TF-IDF feature vectors.
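A minimal sketch of this conversion on four made-up cleaned tweets follows. The settings are assumptions: it uses Scikit-Learn's built-in English stop-word list, and on the full dataset you would probably also tune parameters such as max_features, min_df, and max_df.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical cleaned tweets.
processed_features = [
    "the flight was late and the crew was rude",
    "great flight and friendly crew",
    "my bag was lost again",
    "thanks for the quick boarding",
]

# stop_words="english" drops common words such as "the" and "was".
vectorizer = TfidfVectorizer(stop_words="english")
tfidf_matrix = vectorizer.fit_transform(processed_features)

# One row per tweet, one column per vocabulary term.
print(tfidf_matrix.shape)
print(sorted(vectorizer.vocabulary_))
```

Each row of tfidf_matrix is the numeric representation of one tweet, stored as a sparse matrix because most terms do not occur in most tweets.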
6. Creating the Training and Test Sets
Finally, we must split our data into training and test sets before training our algorithms.
The training set is used to train the algorithm, and the test set is used to evaluate the performance of the machine learning model.
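Scikit-Learn's train_test_split is one common way to make this split. The sketch below uses placeholder items in place of the real feature vectors, and the 80/20 ratio and random_state value are assumptions.

```python
from sklearn.model_selection import train_test_split

# Placeholder items standing in for the vectorized tweets.
processed_features = [f"tweet {i}" for i in range(10)]
labels = ["negative"] * 6 + ["neutral"] * 2 + ["positive"] * 2

# Hold out 20% of the data for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    processed_features, labels, test_size=0.2, random_state=0
)

print(len(X_train), len(X_test))
```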
7. Model Development
After the data has been split into training and test sets, machine learning methods are used to learn from the training data.
You can use any machine learning algorithm. Here, however, we'll use the Random Forest approach because of its ability to handle non-normalized data.
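Training could be sketched as follows, on six made-up tweets. The n_estimators and random_state values are assumptions chosen for illustration.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical cleaned training tweets and their labels.
train_tweets = [
    "flight delayed again terrible service",
    "lost my bag worst airline",
    "what time does boarding start",
    "is there wifi on this flight",
    "great crew and smooth flight",
    "thanks for the quick upgrade",
]
train_labels = [
    "negative", "negative", "neutral", "neutral", "positive", "positive",
]

# Convert text to TF-IDF vectors, then fit the forest on them.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_tweets)

text_classifier = RandomForestClassifier(n_estimators=200, random_state=0)
text_classifier.fit(X_train, train_labels)

print(text_classifier.classes_)
```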
8. Predictions and Model Evaluation
Once the model has been trained, the final stage is to make predictions. To do this, we call the predict method on the RandomForestClassifier object that we trained.
Finally, classification metrics such as the confusion matrix, F1 score, and accuracy can be used to evaluate the performance of machine learning models.
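The sketch below strings prediction and evaluation together on made-up data, so the scores it prints say nothing about the real dataset. Fitting the vectorizer on the training tweets only is a choice made in this sketch, and the hyperparameters are assumptions.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Hypothetical cleaned tweets.
train_tweets = [
    "flight delayed again terrible service",
    "lost my bag worst airline",
    "what time does boarding start",
    "is there wifi on this flight",
    "great crew and smooth flight",
    "thanks for the quick upgrade",
]
train_labels = ["negative", "negative", "neutral", "neutral", "positive", "positive"]
test_tweets = ["terrible delayed flight", "great smooth flight", "what time is boarding"]
test_labels = ["negative", "positive", "neutral"]

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_tweets)
X_test = vectorizer.transform(test_tweets)  # reuse the training vocabulary

text_classifier = RandomForestClassifier(n_estimators=200, random_state=0)
text_classifier.fit(X_train, train_labels)

# Predict the sentiment class of each held-out tweet.
predictions = text_classifier.predict(X_test)

class_names = ["negative", "neutral", "positive"]
matrix = confusion_matrix(test_labels, predictions, labels=class_names)
accuracy = accuracy_score(test_labels, predictions)

print(matrix)
print(classification_report(test_labels, predictions, labels=class_names, zero_division=0))
print(accuracy)
```

In the confusion matrix, rows are the true classes and columns are the predicted classes, in the order given by class_names.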
Our algorithm achieved an accuracy of 75.30%.
Conclusion
Sentiment analysis is one of the most common NLP tasks because it helps identify overall public opinion on a particular topic.
We saw how several Python libraries can help with sentiment analysis.
We analyzed public tweets about six US airlines and reached an accuracy of about 75%.
I'd suggest trying another machine learning algorithm, such as logistic regression, SVM, or KNN, to see whether you can achieve better results.
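As a starting point for that experiment, the sketch below fits the three suggested classifiers on the same made-up TF-IDF features and records each one's accuracy. The data, the hyperparameters, and the names candidates and scores are all illustrative assumptions, and scores on such a tiny sample are not meaningful in themselves.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Hypothetical cleaned tweets.
train_tweets = [
    "flight delayed again terrible service",
    "lost my bag worst airline",
    "what time does boarding start",
    "is there wifi on this flight",
    "great crew and smooth flight",
    "thanks for the quick upgrade",
]
train_labels = ["negative", "negative", "neutral", "neutral", "positive", "positive"]
test_tweets = ["terrible delayed flight", "great smooth flight", "what time is boarding"]
test_labels = ["negative", "positive", "neutral"]

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_tweets)
X_test = vectorizer.transform(test_tweets)

# Candidate models with assumed, untuned settings.
candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(kernel="linear"),
    "KNN": KNeighborsClassifier(n_neighbors=3),
}

scores = {}
for name, model in candidates.items():
    model.fit(X_train, train_labels)
    scores[name] = model.score(X_test, test_labels)  # mean accuracy

print(scores)
```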