I'm sure you've heard of artificial intelligence, along with terms such as machine learning and natural language processing (NLP).
That is especially likely if you work for a firm that handles hundreds, if not thousands, of customer contacts every day.
Analyzing data from social media posts, emails, chats, open-ended survey responses, and other sources is not a simple process, and it becomes even harder when it is left to people alone.
That is why many people are excited about what artificial intelligence could do for their day-to-day work and for their businesses.
AI-powered text analysis uses a wide range of methods or algorithms to interpret language naturally. One of them is topic analysis, which is used to detect topics in texts automatically.
Businesses can use topic analysis models to hand simple tasks over to machines rather than burdening employees with too much data.
Consider how much time your team could save and devote to more important work if a computer could filter through endless lists of customer surveys or support issues every morning.
In this guide, we will look at topic modeling, the different methods of topic modeling, and get some hands-on experience with it.
What is Topic Modeling?
Topic modeling is a type of text mining in which statistical machine learning techniques, typically unsupervised ones, are used to detect patterns in a corpus or a large volume of unstructured text.
It can take your large collection of documents and use a similarity measure to arrange the words into clusters of terms and discover topics.
That sounds complex and difficult, so let's simplify the topic modeling procedure!
Imagine you are reading a newspaper with a set of colored highlighters in your hand.
Isn't that old-fashioned?
I realize that these days few people read printed newspapers; everything is digital, and highlighters are a thing of the past! Pretend you are your father or mother!
So, as you read the newspaper, you highlight the important terms.
One more assumption!
You use a different hue to highlight the keywords of different themes. You group the keywords according to the color assigned to each topic.
Each collection of words marked with a particular color is a list of keywords for a given topic. The number of different colors you chose is the number of themes.
This is topic modeling at its most basic. It helps with understanding, organizing, and summarizing large collections of text.
However, keep in mind that automated topic models need a lot of content to be effective. If you have a short paper, you may want to go old school and use highlighters!
It is also helpful to spend some time getting to know the data. This gives you a basic sense of what the topic model should find.
For example, suppose the text is a diary about your current and past relationships. You would then expect your text-mining robot buddy to come up with similar themes.
This can help you better assess the quality of the topics you have identified and, if necessary, adjust the keyword sets.
Components of Topic Modeling
Probabilistic Modeling
In probabilistic models, random variables and probability distributions are built into the representation of an event or phenomenon.
A deterministic model gives a single possible outcome for an event, while a probabilistic model gives a probability distribution as its solution.
These models account for the fact that we seldom have complete knowledge of a situation. There is almost always an element of randomness to consider.
For example, life insurance is based on the fact that we know we will die, but we don't know when. These models can be partly deterministic, partly random, or entirely random.
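To make the contrast concrete, here is a minimal sketch built on the life insurance example. The fixed value of 80 years and the normal distribution (mean 80, standard deviation 10) are illustrative assumptions, not figures from any actuarial table, and both function names are hypothetical.

```python
import random
from collections import Counter


def deterministic_lifespan(current_age: int) -> int:
    """Deterministic model: one fixed answer for a given input."""
    return 80  # hypothetical fixed life expectancy


def probabilistic_lifespan(current_age: int, trials: int = 10_000, seed: int = 0) -> dict:
    """Probabilistic model: a distribution over possible outcomes."""
    rng = random.Random(seed)
    # Assumption for illustration only: age at death is roughly normal
    # around 80 (sd 10) and can never be below the current age.
    samples = [max(current_age, round(rng.gauss(80, 10))) for _ in range(trials)]
    counts = Counter(samples)
    # Convert raw counts into estimated probabilities per age.
    return {age: n / trials for age, n in sorted(counts.items())}


distribution = probabilistic_lifespan(40)
print(deterministic_lifespan(40))        # a single number
print(len(distribution), "possible outcomes with probabilities")
```

The deterministic function always returns the same value, while the probabilistic one returns many possible ages, each with an estimated probability.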
Information Retrieval
Information retrieval (IR) refers to software that organizes, stores, retrieves, and evaluates information from document repositories, particularly textual information.
The technology helps users find the information they need, but it does not explicitly answer their questions. Instead, it reports the existence and location of documents that may contain the required information.
Relevant documents are those that meet the user's needs. A flawless IR system would return only relevant documents.
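As a rough illustration of retrieval, the sketch below ranks documents against a query using TF-IDF weights and cosine similarity. This is one common approach, chosen here as an assumption; the article does not name a specific ranking method, and the function names and sample documents are made up for the example.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def build_index(docs):
    """Return per-document TF-IDF vectors (as dicts) and the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n_docs / df) + 1.0 for term, df in doc_freq.items()}
    vectors = [{t: c * idf[t] for t, c in Counter(tokens).items()} for tokens in tokenized]
    return vectors, idf


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, docs, top_k=2):
    """Return (document index, score) pairs for the best matches."""
    vectors, idf = build_index(docs)
    # Query terms that never occur in the collection get zero weight.
    q_vec = {t: c * idf.get(t, 0.0) for t, c in Counter(tokenize(query)).items()}
    scored = sorted(((cosine(q_vec, v), i) for i, v in enumerate(vectors)), reverse=True)
    return [(i, round(s, 3)) for s, i in scored[:top_k] if s > 0]


docs = [
    "topic models group words into topics",
    "the football team won the match",
    "latent topics summarize large document collections",
]
results = search("topics in document collections", docs)
print(results)
```

The system returns pointers to documents (their indices), not a direct answer to the question, and the document with no matching terms is left out.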
Topic Coherence
Topic coherence scores a single topic by measuring the degree of semantic similarity between the high-scoring words in that topic. These metrics help distinguish topics that are semantically interpretable from topics that are artifacts of statistical inference.
If a group of claims or facts support each other, they are said to be coherent.
A coherent set of facts can therefore be understood within a context that covers all or most of the facts. “The game is a team sport,” “the game is played with a ball,” and “the game demands great physical effort” together form an example of a coherent set of facts.
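One way to put a number on coherence is the UMass measure, which rewards topic words that tend to appear in the same documents. The sketch below is a simplified version under that assumption; the article does not say which coherence metric it has in mind, and the tiny corpus is invented for illustration.

```python
import math


def umass_coherence(top_words, documents):
    """Sum of log((D(w_m, w_l) + 1) / D(w_l)) over ordered word pairs.

    D(...) counts the documents that contain all the given words.
    Higher scores mean the words co-occur more often.
    """
    doc_sets = [set(doc.lower().split()) for doc in documents]

    def doc_count(*words):
        return sum(all(w in s for w in words) for s in doc_sets)

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            denom = doc_count(top_words[l])
            if denom == 0:
                continue  # skip words that never occur in the corpus
            score += math.log((doc_count(top_words[m], top_words[l]) + 1) / denom)
    return score


documents = [
    "the game is a team sport",
    "the game is played with a ball",
    "the team kicks the ball in the game",
    "stocks fell as the market closed",
    "the market rallied on bank earnings",
]
coherent = umass_coherence(["game", "team", "ball"], documents)
mixed = umass_coherence(["game", "market", "ball"], documents)
print(coherent, mixed)
```

The word list drawn from a single theme scores higher than the list that mixes sport and finance terms, which is the behavior a coherence metric is meant to capture.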
Different Methods of Topic Modeling
This critical procedure can be carried out with a variety of algorithms or methods. Among them are:
- Latent Dirichlet Allocation (LDA)
- Non-Negative Matrix Factorization (NMF)
- Latent Semantic Analysis (LSA)
- Probabilistic Latent Semantic Analysis (pLSA)
Latent Dirichlet Allocation (LDA)
Latent Dirichlet Allocation is a statistical and graphical model used to detect relationships between the many texts in a corpus.
Using the Variational Expectation Maximization (VEM) approach, an approximate maximum likelihood estimate is obtained from the full corpus of text.
Traditionally, the top few words from the bag of words would simply be selected.
However, a bag of words ignores word order, so the meaning of the sentence is lost entirely.
With this technique, each text is represented by a probability distribution over topics, and each topic by a probability distribution over words.
Non-Negative Matrix Factorization (NMF)
Non-Negative Matrix Factorization is a leading feature extraction technique.
NMF is useful when there are many attributes and the attributes are ambiguous or have weak predictive power. By combining attributes, NMF can produce meaningful patterns, topics, or themes.
NMF builds each feature as a linear combination of the original attribute set.
Each feature has a set of coefficients that measure the importance of each attribute to that feature. Each numerical attribute, and each value of each categorical attribute, has its own coefficient.
All coefficients are non-negative.
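A minimal sketch of the idea, using the classic multiplicative update rules in NumPy, is shown below. The small matrix `V` is an invented document-term count matrix, and the function name `nmf` and its parameters are hypothetical; libraries such as scikit-learn provide more robust solvers.

```python
import numpy as np


def nmf(V, n_components, n_iter=500, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (docs x terms) into W @ H.

    W (docs x topics) and H (topics x terms) stay non-negative because
    every update multiplies by a ratio of non-negative quantities.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = V.shape
    W = rng.random((n_rows, n_components)) + eps
    H = rng.random((n_components, n_cols)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Hypothetical counts: documents 0-1 use terms 0-1, documents 2-3 use terms 2-3.
V = np.array([
    [2, 1, 0, 0],
    [3, 2, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 3],
], dtype=float)

W, H = nmf(V, n_components=2)
error = np.linalg.norm(V - W @ H)
print(W.round(2))
print(H.round(2))
print("reconstruction error:", round(float(error), 3))
```

Each row of `H` plays the role of a topic (a non-negative weighting of terms), and each row of `W` says how strongly a document uses each topic.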
Latent Semantic Analysis
Latent semantic analysis is another unsupervised learning method, used to extract associations between words across a set of documents.
This helps us select the relevant documents. Its main function is to reduce the dimensionality of a large corpus of textual data.
The dimensions it discards mostly hold unnecessary data, which acts as background noise when you try to extract the insights you need.
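The dimensionality reduction can be sketched with a truncated singular value decomposition in NumPy. The four short documents and the choice of `k = 2` dimensions are assumptions for illustration only.

```python
import numpy as np

# Hypothetical mini-corpus with two themes (pets, finance).
docs = [
    "cat dog pet",
    "dog pet animal",
    "stock market bank",
    "market bank money",
]

# Build a document-term count matrix by hand.
vocab = sorted({w for d in docs for w in d.split()})
X = np.array([[d.split().count(w) for w in vocab] for d in docs], dtype=float)

# SVD, then keep only the k largest singular values (the latent dimensions).
U, S, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
doc_vectors = U[:, :k] * S[:k]  # each document as a point in k dimensions


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


same_theme = cosine(doc_vectors[0], doc_vectors[1])
different_theme = cosine(doc_vectors[0], doc_vectors[2])
print(X.shape, "->", doc_vectors.shape)
print(same_theme, different_theme)
```

In the reduced space, the two pet documents end up close together even though they do not share every word, while the pet and finance documents stay far apart.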
Probabilistic Latent Semantic Analysis (pLSA)
Probabilistic latent semantic analysis (PLSA), sometimes called probabilistic latent semantic indexing (PLSI, particularly in information retrieval circles), is a statistical technique for analyzing two-mode and co-occurrence data.
In effect, as with latent semantic analysis, from which PLSA evolved, a low-dimensional representation of the observed variables can be derived in terms of their affinity to certain hidden variables.
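A compact sketch of pLSA fitted with expectation maximization (EM) is given below, where the hidden variable is the topic `z` and the observed variables are documents `d` and words `w`. The count matrix is invented, and the function name `plsa` and its parameters are hypothetical.

```python
import numpy as np


def plsa(counts, n_topics, n_iter=100, seed=0):
    """Fit P(z|d) and P(w|z) to a docs x words count matrix with EM."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: responsibilities P(z | d, w), shape (docs, topics, words).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        resp = joint / (joint.sum(axis=1, keepdims=True) + 1e-12)
        # M-step: re-estimate both distributions from expected counts.
        weighted = counts[:, None, :] * resp
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_z_d, p_w_z


# Hypothetical co-occurrence counts: rows are documents, columns are words.
counts = np.array([
    [2, 1, 0, 0],
    [3, 2, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 3],
], dtype=float)

p_z_d, p_w_z = plsa(counts, n_topics=2)
print(p_z_d.round(2))  # topic mix per document
print(p_w_z.round(2))  # word distribution per topic
```

The low-dimensional representation mentioned above is `p_z_d`: each document is described by just two numbers, its affinity to each hidden topic.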
Hands-on Topic Modeling in Python
Now I'll walk you through a topic modeling task in the Python programming language, using a real-world example.
I'll be modeling research articles. The dataset I'll use here comes from kaggle.com. You can easily get all the files I'm using in this task from this page.
Let's get started with topic modeling in Python by importing all the essential libraries:
The next step is to read all the datasets I'll be using in this task:
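The original code for these two steps is not shown here, so the following is only a sketch of what importing and loading might look like with pandas. The file name `train.csv` and the columns `ID`, `TITLE`, and `ABSTRACT` are assumptions, and an in-memory string stands in for the downloaded file so the snippet runs on its own.

```python
import io

import pandas as pd

# Hypothetical stand-in for a downloaded file. With the real data this would
# be something like pd.read_csv("train.csv") (file name assumed).
TRAIN_CSV = io.StringIO(
    "ID,TITLE,ABSTRACT\n"
    "1,Sample title A,We study topic models for research abstracts.\n"
    "2,Sample title B,A short abstract about matrix factorization.\n"
)

train = pd.read_csv(TRAIN_CSV)
print(train.shape)
print(train.head())
```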
Exploratory Data Analysis
EDA (Exploratory Data Analysis) is a statistical approach that relies on visual elements. It uses statistical summaries and graphical representations to discover trends and patterns and to test assumptions.
Before I start topic modeling, I'll do some exploratory data analysis to see whether there are any patterns or relationships in the data:
Now we'll look for null values in the test dataset:
Now I'll plot a histogram and a boxplot to look at the relationship between the variables.
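A null-value check might look like the sketch below. The small `test` DataFrame and its column names are hypothetical placeholders for the real test set.

```python
import pandas as pd

# Hypothetical stand-in for the test dataset, with a few missing values.
test = pd.DataFrame({
    "ID": [1, 2, 3],
    "TITLE": ["A", "B", None],
    "ABSTRACT": ["text one", None, None],
})

# isnull() marks missing cells; sum() counts them per column.
null_counts = test.isnull().sum()
print(null_counts)
```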
The number of characters in the abstracts of the train set varies widely.
In the train set, abstracts have a minimum of 54 and a maximum of 4551 characters. The average is 1065 characters.
The test set appears to have a narrower range than the training set, with a minimum of 46 characters and a maximum of 2841.
Even so, the test set has an average of 1058 characters, which is similar to the training set.
The number of words in the training set follows a similar pattern to the number of characters.
Abstracts there have a minimum of 8 words and a maximum of 665 words, and the median word count is 153.
In the test set, abstracts have a minimum of seven words and a maximum of 452 words.
The median in this case is 153, the same as the median in the training set.
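Figures like these can be computed with a few pandas string operations. The sketch below uses three made-up abstracts and an assumed `ABSTRACT` column, so its numbers are unrelated to the ones reported above.

```python
import pandas as pd

# Hypothetical abstracts standing in for the real training set.
train = pd.DataFrame({
    "ABSTRACT": [
        "one two three",
        "a much longer abstract with seven words",
        "tiny",
    ]
})

# Character and word counts per abstract.
train["char_count"] = train["ABSTRACT"].str.len()
train["word_count"] = train["ABSTRACT"].str.split().str.len()

# Minimum, median, and maximum for both measures.
summary = train[["char_count", "word_count"]].agg(["min", "median", "max"])
print(summary)
```

With matplotlib installed, `train["char_count"].plot.hist()` and `train["char_count"].plot.box()` would draw the corresponding histogram and boxplot.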
Using Tags for Topic Modeling
There are several strategies for topic modeling. I'll use tags in this task; let's see how to do that by exploring the tags:
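The original exploration code is not shown, so this is only a sketch of one way to explore tags, assuming each tag is stored as a 0/1 column next to the abstract. The tag names and the DataFrame contents are hypothetical.

```python
import pandas as pd

# Hypothetical data: one 0/1 column per tag (layout assumed).
train = pd.DataFrame({
    "ABSTRACT": ["abstract 1", "abstract 2", "abstract 3", "abstract 4"],
    "Computer Science": [1, 0, 1, 1],
    "Physics": [0, 1, 0, 0],
    "Mathematics": [0, 0, 1, 0],
})

tag_columns = [c for c in train.columns if c != "ABSTRACT"]

# How many abstracts carry each tag, most common first.
tag_counts = train[tag_columns].sum().sort_values(ascending=False)

# How many tags each abstract carries.
tags_per_doc = train[tag_columns].sum(axis=1)

print(tag_counts)
print(tags_per_doc.tolist())
```

Counts like these show which topics dominate the collection and whether abstracts usually carry one tag or several.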
Applications of Topic Modeling
- Text summaries can be used to identify the topic of a document or book.
- It can be used to remove candidate bias from exam scoring.
- Topic modeling can be used to build semantic relationships between words in graph-based models.
- It can improve customer service by detecting and responding to keywords in a customer's query. Customers will trust you more because you gave them the help they needed at the right time and without causing them any trouble. As a result, customer loyalty rises sharply, and so does the company's value.
Conclusion
Topic modeling is a form of statistical modeling used in machine learning and natural language processing to uncover the abstract “topics” present in a collection of texts.
It is a widely used text mining method for finding hidden semantic patterns in a body of text.