I'm sure you have heard of artificial intelligence, along with terms such as machine learning and natural language processing (NLP).
That is especially likely if you work for a firm that handles hundreds, if not thousands, of customer contacts every day.
Analyzing data from social media posts, emails, chats, open-ended survey responses, and other sources is not an easy process, and it gets even harder when it is left to people alone.
That is why many people are excited about what artificial intelligence can do for their daily work and for their businesses.
AI-powered text analysis uses a variety of methods or algorithms to interpret language automatically. One of them is topic analysis, which is used to discover topics in text automatically.
Businesses can use topic analysis models to hand simple tasks over to machines instead of overloading employees with too much data.
Think how much time your team could save and spend on more important work if a computer could sift through endless lists of customer reviews or support tickets every morning.
In this guide, we will look at topic modelling and the different methods used for it, and get some hands-on experience with it.
What is Topic Modelling?
Topic modelling is a type of text mining in which unsupervised and supervised statistical machine learning techniques are used to identify patterns in a corpus or a large volume of unstructured text.
It can take your large collection of documents and use a similarity technique to group words into clusters of terms and discover topics.
That sounds a bit complex and difficult, so let's simplify the topic modelling process!
Imagine that you are reading a newspaper with a set of colored highlighters in your hand.
Isn't that old-fashioned?
I know that few people read printed newspapers these days; everything is digital, and highlighters are a thing of the past! Pretend to be your father or mother!
So, as you read the newspaper, you highlight the important words.
One more assumption!
You use a different hue to highlight the keywords of different themes. You then group the keywords according to their assigned color and topic.
Each collection of words marked with one color is a list of keywords for a given topic. The number of different colors you picked is the number of themes.
This is the most basic form of topic modelling. It helps in understanding, organizing, and summarizing large collections of text.
However, keep in mind that automated topic models need a lot of content to be effective. If you have a short paper, you may want to go old school and use highlighters!
It also helps to spend some time getting to know the data. This gives you a basic sense of what the topic model should find.
For example, a diary may be about your present and past relationships. In that case, I would expect my text-mining robot-buddy to come up with similar themes.
This can help you better judge the quality of the topics you have found and, if necessary, tweak the keyword sets.
Components of Topic Modelling
Probabilistic Model
In a probabilistic model, random variables and probability distributions are incorporated into the representation of an event or phenomenon.
A deterministic model gives a single possible outcome for an event, whereas a probabilistic model gives a probability distribution as its solution.
These models take into account the fact that we rarely have complete knowledge of a situation. There is almost always an element of randomness to consider.
For example, life insurance is based on the fact that we know we will die, but we do not know when. These models can be partly deterministic, partly random, or wholly random.
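To make the contrast concrete, here is a minimal Python sketch. The function names, the outcomes, and the weights are all hypothetical and chosen only for illustration; the point is that the first function returns one fixed number, while the second returns a distribution over outcomes.

```python
import random
from collections import Counter

def deterministic_model(principal, rate, years):
    """Same inputs always produce the same single outcome."""
    return principal * (1 + rate) ** years

def probabilistic_model(outcomes, weights, n_samples=10_000, seed=0):
    """Estimate a probability distribution over outcomes by sampling."""
    rng = random.Random(seed)
    draws = rng.choices(outcomes, weights=weights, k=n_samples)
    counts = Counter(draws)
    return {outcome: counts[outcome] / n_samples for outcome in outcomes}

# One number, no uncertainty (roughly 110.25).
balance = deterministic_model(100.0, 0.05, 2)

# Hypothetical weights: the answer is a distribution, not a single value.
distribution = probabilistic_model(["claim", "no claim"], [0.3, 0.7])

print(balance)
print(distribution)
```

Topic models sit on the probabilistic side: they return distributions over topics and words rather than a single fixed label.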
Information Retrieval
Information retrieval (IR) is a software program that organizes, stores, retrieves, and evaluates information from document repositories, particularly textual information.
The technology helps users find the information they need, but it does not explicitly return answers to their questions. It informs them of the existence and location of documents that may provide the required information.
Relevant documents are those that meet the user's requirements. A perfect IR system would return only relevant documents.
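As a rough illustration of that behaviour, the sketch below ranks a few documents against a query with TF-IDF weights and cosine similarity. The document names and texts are made up, and TF-IDF is only one common choice of scoring, assumed here for the example. Note that the result is a list of document locations, not an answer.

```python
import math
from collections import Counter

# Hypothetical mini-collection: name (location) -> text.
documents = {
    "doc1.txt": "topic models find themes in large text collections",
    "doc2.txt": "customer support tickets arrive every morning",
    "doc3.txt": "matrix factorization extracts features from text",
}

def tokenize(text):
    return text.lower().split()

def build_index(docs):
    """Compute a TF-IDF vector (term -> weight) for every document."""
    tokenized = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokenized)
    doc_freq = Counter(t for tokens in tokenized.values() for t in set(tokens))
    idf = {t: math.log(n_docs / df) + 1.0 for t, df in doc_freq.items()}
    index = {}
    for name, tokens in tokenized.items():
        tf = Counter(tokens)
        index[name] = {t: count * idf[t] for t, count in tf.items()}
    return index, idf

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def search(query, index, idf):
    """Return (document name, score) pairs with a positive score, best first."""
    q_tf = Counter(tokenize(query))
    q_vec = {t: c * idf[t] for t, c in q_tf.items() if t in idf}
    scored = [(name, cosine(q_vec, vec)) for name, vec in index.items()]
    hits = [pair for pair in scored if pair[1] > 0]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

index, idf = build_index(documents)
results = search("text themes", index, idf)
for name, score in results:
    print(name, round(score, 3))
```

Documents that share no terms with the query are left out, which mirrors the idea that an ideal system returns only relevant documents.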
Topic Coherence
Topic coherence measures score a single topic by measuring the degree of semantic similarity between the high-scoring terms in that topic. These metrics help distinguish topics that are semantically interpretable from topics that are artifacts of statistical inference.
If a set of statements or facts support each other, they are said to be coherent.
As a result, a coherent set of facts can be interpreted in a context that covers all or most of the facts. "The game is a team sport," "the game is played with a ball," and "the game demands great physical effort" are all examples of coherent statements.
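One way to put a number on this is a document co-occurrence score. The sketch below uses the UMass-style coherence measure, which is one of several coherence metrics and is assumed here for illustration; the small corpus and the two word lists are invented. Scores closer to zero indicate words that tend to appear in the same documents.

```python
import math

def umass_coherence(top_words, documents):
    """UMass-style coherence: sum of log((D(w_i, w_j) + 1) / D(w_j)) over
    ordered word pairs, where D counts the documents containing the words.
    Assumes every word in top_words occurs in at least one document."""
    doc_sets = [set(doc.lower().split()) for doc in documents]

    def doc_count(*words):
        return sum(all(w in doc for w in words) for doc in doc_sets)

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            w_i, w_j = top_words[i], top_words[j]
            score += math.log((doc_count(w_i, w_j) + 1) / doc_count(w_j))
    return score

corpus = [
    "the game is a team sport",
    "the game is played with a ball",
    "the game demands great physical effort",
    "the stock market fell sharply today",
    "investors sold shares as the market fell",
]

coherent_score = umass_coherence(["game", "team", "ball"], corpus)
mixed_score = umass_coherence(["game", "market", "ball"], corpus)
print(coherent_score, mixed_score)
```

The list that mixes sports and finance words scores lower because "game" and "market" never share a document.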
Different Methods of Topic Modelling
This critical process can be carried out with various algorithms or methods. Among them are:
- Latent Dirichlet Allocation (LDA)
- Non-Negative Matrix Factorization (NMF)
- Latent Semantic Analysis (LSA)
- Probabilistic Latent Semantic Analysis (pLSA)
Latent Dirichlet Allocation (LDA)
To discover relationships among multiple documents in a corpus, the statistical and graphical concept of Latent Dirichlet Allocation is used.
Using the Variational Expectation Maximization (VEM) technique, the maximum likelihood estimate from the full corpus of text is obtained.
Traditionally, the top few words from a bag of words are selected.
However, such a list on its own is entirely devoid of semantics.
According to this technique, each document is represented by a probability distribution over topics, and each topic by a probability distribution over words.
Non-Negative Matrix Factorization (NMF)
Non-Negative Matrix Factorization is a cutting-edge feature extraction technique.
NMF is useful when there are many attributes and the attributes are ambiguous or weakly predictive. By combining attributes, NMF can produce meaningful patterns, topics, or themes.
NMF produces each feature as a linear combination of the original attribute set.
Each feature has a set of coefficients that measure the weight of each attribute on that feature. Each numerical attribute, and each distinct value of each categorical attribute, has its own coefficient.
All coefficients are non-negative.
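A small NumPy sketch shows the idea. It uses the classic multiplicative update rules for the squared-error objective, which is one standard way to fit NMF and is assumed here; the 4x4 document-term matrix is invented. Each row of `H` is a feature expressed as non-negative coefficients over the original attributes (terms).

```python
import numpy as np

def nmf(V, n_components, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V into W @ H using multiplicative
    updates. W and H start non-negative and the updates keep them so."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = V.shape
    W = rng.random((n_rows, n_components)) + eps
    H = rng.random((n_components, n_cols)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical document-term counts: rows are documents, columns are terms.
V = np.array(
    [
        [3, 2, 0, 0],
        [2, 3, 0, 0],
        [0, 0, 4, 1],
        [0, 0, 1, 4],
    ],
    dtype=float,
)

W, H = nmf(V, n_components=2)
error = np.linalg.norm(V - W @ H)

print(H.round(2))   # feature -> coefficient per original term
print(W.round(2))   # document -> weight per feature
print(round(float(error), 3))
```

Because nothing can be subtracted, each feature has to be built up additively from terms, which is what tends to make the resulting themes readable.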
Latent Semantic Analysis
Latent semantic analysis is another unsupervised learning method, used to extract relationships between words in a set of documents.
This helps us select the relevant documents. Its primary function is to reduce the dimensionality of a large corpus of text data.
The unnecessary data removed this way would otherwise act as background noise when extracting the required insights from the data.
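The dimensionality reduction is usually done with a truncated singular value decomposition. The sketch below applies it to a tiny invented corpus with plain NumPy; keeping `k = 2` dimensions is an arbitrary choice for the example.

```python
import numpy as np

# Hypothetical toy corpus.
docs = [
    "cats chase mice",
    "dogs chase cats",
    "stocks and bonds fall",
    "bonds fall as stocks rise",
]

# Build a document-term count matrix by hand.
vocab = sorted({word for doc in docs for word in doc.split()})
X = np.array([[doc.split().count(word) for word in vocab] for doc in docs],
             dtype=float)

# Full SVD, then keep only the k strongest latent dimensions.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
doc_vectors = U[:, :k] * S[:k]      # documents in the latent space
term_vectors = Vt[:k].T * S[:k]     # terms in the same latent space

# Rank-k reconstruction; what is dropped is treated as noise.
X_k = doc_vectors @ Vt[:k]
dropped = np.linalg.norm(X - X_k)

print(doc_vectors.round(2))
print(round(float(dropped), 3))
```

Documents can then be compared in the `k`-dimensional space instead of the full vocabulary space.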
Probabilistic Latent Semantic Analysis (pLSA)
Probabilistic latent semantic analysis (PLSA), sometimes known as probabilistic latent semantic indexing (PLSI, especially in information retrieval circles), is a statistical technique for analyzing two-mode and co-occurrence data.
In effect, as in latent semantic analysis, from which PLSA evolved, a low-dimensional representation of the observed variables can be derived in terms of their affinity to certain hidden variables.
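As a hedged sketch of how the hidden variables are estimated, the code below fits the asymmetric form of pLSA, P(w | d) = sum over z of P(w | z) P(z | d), with a plain EM loop in NumPy. The count matrix, the number of topics, and the iteration count are assumptions made for the example.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit pLSA by EM on a document-term count matrix.
    Returns P(z | d) with shape (docs, topics) and
    P(w | z) with shape (topics, words)."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: P(z | d, w) is proportional to P(z | d) * P(w | z).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        resp = joint / (joint.sum(axis=1, keepdims=True) + 1e-12)
        # M-step: re-estimate both distributions from expected counts.
        weighted = counts[:, None, :] * resp      # (docs, topics, words)
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_z_d, p_w_z

# Hypothetical counts: two documents use terms 0-1, two use terms 2-3.
counts = np.array(
    [
        [3, 2, 0, 0],
        [2, 3, 0, 0],
        [0, 0, 4, 1],
        [0, 0, 1, 4],
    ],
    dtype=float,
)

p_z_d, p_w_z = plsa(counts, n_topics=2)
print(p_z_d.round(2))
print(p_w_z.round(2))
```

The hidden variable `z` plays the role of the topic: each document gets a low-dimensional description as a mixture over `z`.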
Hands-on with Topic Modelling in Python
Now I will walk you through a topic modelling task with the Python programming language, using a real-world example.
I will be modelling research articles. The dataset I will be using here comes from kaggle.com. You can easily get all the files I am using in this task from this page.
Let's start with topic modelling in Python by importing all the necessary libraries.
The next step is to read all the datasets I will be using in this task.
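A minimal sketch of these two steps with pandas might look like the following. The helper name `load_datasets` and the file names in the usage note are hypothetical, since the exact files are not named here; adjust them to whatever the downloaded dataset contains.

```python
import pandas as pd

def load_datasets(train_source, test_source):
    """Read the train and test CSV data into DataFrames.
    Each source can be a file path or an open file-like object."""
    train = pd.read_csv(train_source)
    test = pd.read_csv(test_source)
    return train, test
```

Assuming the files were saved as `train.csv` and `test.csv`, the call would be `train, test = load_datasets("train.csv", "test.csv")`, after which `train.head()` gives a first look at the columns.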
Exploratory Data Analysis
EDA (Exploratory Data Analysis) is a statistical approach that relies on visual elements. It uses statistical summaries and graphical representations to discover trends and patterns and to test assumptions.
I will perform some exploratory data analysis before starting the modelling to see whether there are any patterns or relationships in the data.
Now we will look for missing (null) values in the test dataset.
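With pandas, a missing-value check can be sketched as below. The small `sample` frame and its column names are placeholders standing in for the real test set.

```python
import pandas as pd

def null_report(df):
    """Count missing values per column, largest count first."""
    return df.isnull().sum().sort_values(ascending=False)

# Hypothetical stand-in for the test dataset.
sample = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "ABSTRACT": ["a study of graphs", None, "deep learning survey"],
    }
)

report = null_report(sample)
print(report)
```

Columns with a non-zero count need a decision before modelling, such as dropping the affected rows or filling them with an empty string.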
Next, I will plot a histogram and a boxplot to look at the distribution of the variables.
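The numbers behind such plots are the character and word counts of each abstract. Here is a minimal sketch that computes them and summarizes their minimum, median, and maximum; the helper name and the three sample strings are made up for illustration.

```python
import pandas as pd

def length_summary(texts):
    """Return min, median, and max of character and word counts."""
    series = pd.Series(list(texts))
    chars = series.str.len()
    words = series.str.split().str.len()
    lengths = pd.DataFrame({"chars": chars, "words": words})
    return lengths.agg(["min", "median", "max"])

# Hypothetical stand-ins for article abstracts.
sample_abstracts = [
    "topic models",
    "a b c d",
    "latent semantic analysis of text",
]

summary = length_summary(sample_abstracts)
print(summary)
```

If matplotlib is installed, the same count columns can be drawn with pandas' `Series.plot.hist()` and `Series.plot.box()`.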
The number of characters in the abstracts of the train set varies widely.
In the train set, abstracts range from a minimum of 54 characters to a maximum of more than 452 characters. The median is close to 1058 characters, the same as in the test set.
The test set starts slightly lower: its shortest abstract has 46 characters, against 54 in the training set.
The test set has a median of 1058 characters, which is very similar to the training set.
The number of words in the training set follows a pattern similar to the number of characters.
Abstracts range from a minimum of 8 words to a maximum of 665 words. The median word count is 153.
In the test set, abstracts have a minimum of 7 words and a maximum of 452 words.
The median in this case is 153, the same as the median in the training set.
Using Tags for Topic Modelling
There are several approaches to topic modelling. I will use tags in this task; let's see how we can do that by exploring the tags.
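Exploring tags usually starts with counting how often each one occurs and which articles carry it. The sketch below assumes each article comes with a list of tags; the sample tags are invented, and the real dataset may store them differently (for example as one column per tag).

```python
from collections import Counter

# Hypothetical sample: one list of tags per article.
article_tags = [
    ["machine learning", "statistics"],
    ["physics"],
    ["machine learning", "computer vision"],
    ["statistics", "probability"],
]

def tag_frequencies(tag_lists):
    """Count how many articles carry each tag."""
    return Counter(tag for tags in tag_lists for tag in tags)

def articles_per_tag(tag_lists):
    """Map each tag to the indices of the articles that carry it."""
    groups = {}
    for idx, tags in enumerate(tag_lists):
        for tag in tags:
            groups.setdefault(tag, []).append(idx)
    return groups

frequencies = tag_frequencies(article_tags)
groups = articles_per_tag(article_tags)

print(frequencies.most_common(3))
print(groups["statistics"])
```

Frequent tags are natural candidates for topic labels, and the article groups give a reference to compare against whatever topics a model discovers.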
Applications of Topic Modelling
- A text summary can be used to identify the topic of a document or book.
- It can be used to remove candidate bias from exam grading.
- Topic modelling can be used to build semantic relationships between words in graph-based models.
- It can improve customer service by detecting and responding to keywords in a customer's enquiry. Customers will have more faith in you because you gave them the help they needed at the right time and without causing them any trouble. As a result, customer loyalty rises sharply, and the value of the company increases.
Conclusion
Topic modelling is a type of statistical modelling used to uncover the abstract "topics" present in a collection of documents.
It is a form of statistical model used in machine learning and natural language processing to surface the abstract topics present in a set of documents.
It is a text-mining technique widely used to discover latent semantic patterns in a body of text.