Every Machine Learning project depends on a good dataset. This large body of data is what lets you train and validate your ML model, so a big part of the work in an ML project is finding the dataset that best fits your needs. However, you will not always find an option that matches what you want: many files that look interesting turn out, in the end, not to be.
It can be frustrating to spend time downloading piles of data before you arrive at a suitable set. With that in mind, we have gathered some options that look interesting and can help you develop your ML project. Note that some of them are intended for personal rather than commercial use, so treat these options as a way to gain experience in the ML universe.
Dataset basics
Before we list any datasets, we need to define a few terms. Artificial Intelligence projects, especially Machine Learning ones, need a large amount of data to train the algorithm. This data is collected into a dataset, which is what the algorithm learns from.
With this data, the algorithm is trained (and then tested) so that it can find patterns, establish relationships and, in this way, make decisions on its own. Without training, Machine Learning algorithms cannot do anything. So the better the training data, the better the model will perform. For a dataset to be useful in a project, quantity is not enough: how well the data is organized matters too.
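The train-then-test idea above is usually implemented by holding back part of the dataset so the model is evaluated on examples it never saw during training. The sketch below is a minimal, hypothetical helper (the name `train_test_split` and the 20% default are illustrative choices, not taken from any particular library) that shuffles with a fixed seed so the split is reproducible.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of `rows` and split it into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)                  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed -> the same split on every run
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    data = list(range(10))  # stand-in for ten labelled examples
    train, test = train_test_split(data, test_fraction=0.3)
    print(len(train), len(test))
```

Keeping the test set separate is what makes the "better data, better model" claim measurable: performance on `test` estimates how the model will behave on new data.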
Ideally, the data should be well labeled. Take chatbots as an example: language input matters, but a careful syntactic analysis is also needed so that the resulting algorithm can understand when the person it is talking to uses slang. Only then will the virtual assistant be able to respond according to what the user actually asked.
Datasets can be generated from surveys, user purchase data, reviews left on services, and many other methods for collecting useful information organized into the columns and rows of a CSV file.
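To make the "columns and rows" idea concrete, here is a minimal sketch that reads a small, made-up review export with Python's standard `csv` module. The column names (`user_id`, `service`, `rating`, `comment`) are hypothetical. With a real file you would open it with `open(path, newline="", encoding="utf-8")` and pass the file object to `csv.DictReader` instead of the in-memory string.

```python
import csv
import io

# Hypothetical review export: one row per response, one column per field.
RAW_CSV = """user_id,service,rating,comment
1,delivery,5,Fast and friendly
2,delivery,2,Package arrived late
3,support,4,Solved my issue
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for row in reader:
        row["rating"] = int(row["rating"])  # CSV cells are strings; convert numeric columns
        rows.append(row)
    return rows


rows = load_rows(RAW_CSV)
average_rating = sum(r["rating"] for r in rows) / len(rows)
```

Converting types right after parsing is a deliberate choice: every value `csv` returns is a string, and leaving ratings as text would make later arithmetic fail or silently compare strings.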
Before you start looking for the perfect dataset, you need to know the goal of your project, especially if it belongs to a specific domain such as weather, finance or health. That goal determines the source you will draw your dataset from.
ML datasets
Chatbot training
An effective chatbot needs a large amount of training data to resolve user queries quickly without human intervention. However, the main bottleneck in chatbot development is obtaining realistic, task-oriented dialog data to train these Machine Learning based systems.
A conversational dataset collects data in a question-and-answer format. It is ideal for training chatbots that give automated responses to an audience. Without this data, a chatbot cannot quickly resolve user queries or answer user questions without the need for human intervention.
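A question-and-answer dataset can already power a very simple retrieval chatbot: match the user's message to the most similar stored question and return its answer. The sketch below uses word-overlap (Jaccard) similarity on three hypothetical pairs; the pairs, the `answer` function and the 0.2 threshold are illustrative assumptions, and production chatbots use far richer models.

```python
import re

# Hypothetical question-answer pairs in the format a conversational dataset provides.
QA_PAIRS = [
    ("What are your opening hours?", "We are open from 8am to 6pm, Monday to Friday."),
    ("How do I reset my password?", "Use the 'Forgot password' link on the login page."),
    ("Do you ship internationally?", "Yes, we ship to most countries."),
]


def tokens(text):
    """Lower-case the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def answer(query, pairs=QA_PAIRS, min_score=0.2):
    """Return the answer whose stored question best overlaps the query, or None."""
    q = tokens(query)
    best_score, best_answer = 0.0, None
    for question, reply in pairs:
        t = tokens(question)
        union = q | t
        score = len(q & t) / len(union) if union else 0.0  # Jaccard similarity
        if score > best_score:
            best_score, best_answer = score, reply
    # Below the threshold, return None so the conversation can be handed to a human.
    return best_answer if best_score >= min_score else None
```

For example, `answer("How can I reset my password")` returns the password-reset reply, while an unrelated message such as `"tell me a joke"` returns `None`.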
Using this data, businesses can build a tool that gives customers fast answers 24/7 and costs much less than a team of people handling customer support.
1. Question-Answer Dataset
This dataset provides Wikipedia articles, questions about them, and manually written answers. It was collected between 2008 and 2010 for use in academic research.
2. Language Data
Language Data is a Yahoo-managed collection of data generated by some of the company's services, such as Yahoo! Answers, an open community where users post questions and answers.
3. WikiQA
The WikiQA corpus is another collection of questions and answers. The questions come from Bing, while the answers link to a Wikipedia page that may answer the original question.
In total, the dataset has more than 3,000 questions and 29,258 sentences, of which about 1,400 are labeled as answers to the corresponding question.
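The WikiQA numbers above imply an answer-sentence layout: each question comes with several candidate sentences, and only some are labeled as answers. The sketch below assumes rows of the form `(question_id, question, sentence, label)` with `label` 1 for an answer and 0 otherwise; this tuple layout and the sample rows are hypothetical, so check the column names of the actual release before adapting it.

```python
from collections import defaultdict

# Hypothetical rows in an answer-sentence layout: (question_id, question, sentence, label).
ROWS = [
    ("Q1", "Who designed the bridge?", "The bridge opened in 1932.", 0),
    ("Q1", "Who designed the bridge?", "It was designed by a local engineer.", 1),
    ("Q2", "Why is the sky blue?", "The sky appears in many paintings.", 0),
]


def group_candidates(rows):
    """Map each question id to its list of (sentence, label) candidates."""
    grouped = defaultdict(list)
    for qid, _question, sentence, label in rows:
        grouped[qid].append((sentence, label))
    return dict(grouped)


def answerable_questions(grouped):
    """Return the sorted ids of questions with at least one sentence labeled as an answer."""
    return sorted(qid for qid, cands in grouped.items() if any(lbl == 1 for _, lbl in cands))


grouped = group_candidates(ROWS)
positive_fraction = sum(r[3] for r in ROWS) / len(ROWS)  # share of sentences labeled as answers
```

Grouping by question matters because the labels are heavily imbalanced (roughly 1,400 positives among 29,258 sentences), and a question can end up with no positive candidate at all.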
Government data
Government-produced datasets provide demographic data, a major input for projects that aim to understand social trends, design public policy, and improve society. This can be useful for political campaigns, targeted advertising, or market analysis.
These datasets usually contain anonymized data, so models can work with the raw records without exposing any individual's private information.
4. Data.gov
Launched in 2009, Data.gov is the United States government's open data portal. Its catalog is impressive: more than 218,000 datasets that can be filtered by format, tags, categories, and topics.
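Filtering a large catalog by format, tags and category amounts to combining optional predicates over each entry's metadata. The sketch below applies that idea to a hypothetical in-memory list; it is not Data.gov's API, and the field names (`format`, `tags`, `category`) and entries are assumptions chosen for illustration.

```python
# Hypothetical catalog entries; a real portal exposes much richer metadata.
CATALOG = [
    {"title": "Hourly Weather Observations", "format": "CSV", "tags": {"weather", "climate"}, "category": "Climate"},
    {"title": "County Health Rankings", "format": "JSON", "tags": {"health"}, "category": "Health"},
    {"title": "Daily Rainfall Totals", "format": "CSV", "tags": {"weather"}, "category": "Climate"},
]


def filter_catalog(entries, fmt=None, tag=None, category=None):
    """Return titles of entries matching every filter given; None means 'ignore this field'."""
    titles = []
    for entry in entries:
        if fmt is not None and entry["format"].lower() != fmt.lower():
            continue
        if tag is not None and tag not in entry["tags"]:
            continue
        if category is not None and entry["category"] != category:
            continue
        titles.append(entry["title"])
    return titles
```

Treating `None` as "no filter" lets one function cover any combination of facets, which mirrors how catalog search pages let you stack filters.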
5. EU Open Data Portal
The EU Open Data Portal provides access to open data published by European Union institutions. This data can be used for both commercial and non-commercial purposes. Users have access to more than 15,500 datasets covering topics such as health, energy, the environment, culture, and education.
Health data
In the wake of the ongoing global health crisis, datasets created by health organizations are important for developing effective, life-saving solutions. They can help identify risk factors, map how diseases spread, and speed up diagnosis.
This data includes health records, numbers of sick people, disease prevalence, medication use, nutritional values, and much more.
6. Global Health Observatory
This dataset is an initiative of the World Health Organization (WHO). It provides public data on a range of health areas, organized by themes such as health systems, tobacco control, pregnancy, and HIV/AIDS.
7. CORD-19
CORD-19 is a corpus of academic publications on COVID-19 and other articles about the novel coronavirus. It is an open dataset meant to generate new insights into COVID-19.
Economic data
Financial datasets usually gather a large amount of information, because this kind of data is commonly collected over long periods. They are well suited to economic forecasting and to designing investment strategies.
With the right financial data, a Machine Learning model may be able to predict the behavior of a given asset. That is why the financial sector invests so heavily in effective ML models: anything that predicts even reasonably accurately can be worth millions of dollars. Machine Learning is also already used to anticipate how citizens behave, which influences how policymakers do their work.
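Before training any ML model on financial series, it helps to have a trivial baseline to compare against. The sketch below fits an ordinary least-squares line to a price series indexed by time step (t = 0, 1, 2, ...) and extrapolates one step ahead. The price values are hypothetical. Real asset prices are noisy, and a straight-line trend is not a reliable predictor, only a yardstick any model should beat.

```python
def fit_linear_trend(values):
    """Least-squares fit of value = intercept + slope * t for t = 0, 1, ..., n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_t = (n - 1) / 2
    mean_v = sum(values) / n
    s_tv = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(values))
    s_tt = sum((t - mean_t) ** 2 for t in range(n))
    slope = s_tv / s_tt
    intercept = mean_v - slope * mean_t
    return intercept, slope


def forecast_next(values):
    """Extrapolate the fitted line to the next time step, t = n."""
    intercept, slope = fit_linear_trend(values)
    return intercept + slope * len(values)


if __name__ == "__main__":
    prices = [10.0, 12.0, 14.0, 16.0]  # hypothetical closing prices
    print(forecast_next(prices))
```

The slope is the covariance of time and value divided by the variance of time, so a perfectly linear series like the one above is extrapolated exactly.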
8. International Monetary Fund
The IMF dataset contains a range of economic and financial indicators, statistics on member countries, and other data on lending and exchange rates.
9. World Bank
The World Bank repository contains many datasets with economic information from countries around the world: more than 17,000 datasets, organized by continent.
Product and service reviews
Sentiment analysis is now used in many fields, helping businesses understand and anticipate their customers more accurately. It is increasingly applied to social media monitoring, brand monitoring, voice of the customer (VoC), customer service, and market research.
Sentiment analysis uses NLP (natural language processing) techniques with rule-based, hybrid, or Machine Learning based algorithms to learn from the data in these datasets.
The data needed for sentiment analysis must be specialized, and a lot of it is required. The hardest part of training a sentiment analysis system is not finding large amounts of data; it is finding the right datasets. These datasets should cover a broad range of sentiment analysis domains and use cases.
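Of the approaches mentioned above, the rule-based one is the easiest to sketch: look each word up in a sentiment lexicon, add up the scores, and apply simple rules such as flipping the sign after a negator. The tiny lexicon and negator list below are hypothetical. Real rule-based systems rely on curated lexicons with thousands of entries, and labeled review datasets are what you would use to check how well the rules work.

```python
import re

# Tiny hypothetical lexicon: positive words score above 0, negative words below 0.
LEXICON = {"good": 1, "great": 2, "love": 2, "excellent": 2,
           "bad": -1, "poor": -1, "terrible": -2, "broken": -2}
NEGATORS = {"not", "never", "no"}


def sentiment_score(text):
    """Sum lexicon scores, flipping the sign of a word that directly follows a negator."""
    words = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, word in enumerate(words):
        value = LEXICON.get(word, 0)
        if i > 0 and words[i - 1] in NEGATORS:
            value = -value  # "not good" counts as negative
        score += value
    return score


def label(text):
    """Map the summed score to a coarse sentiment label."""
    s = sentiment_score(text)
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "neutral"
```

For example, `label("Great product, I love it")` gives `"positive"` and `label("not good")` gives `"negative"`.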
10. Amazon Reviews
This dataset contains about 35 million Amazon reviews, covering 18 years of collected data. It includes product data, user data, and the text of the reviews.
11. Yelp Reviews
Yelp also provides a dataset built from information collected by its service. It includes more than 8 million reviews, 1 million tips, and almost 1.5 million business attributes, such as opening hours and availability.
12. IMDB Reviews
This dataset contains 25,000 movie reviews for training and another 25,000 for testing, selected at random from IMDB, a site dedicated to movie ratings. It also provides additional unlabeled data.
Datasets for your first steps in ML
13. Wine Quality Dataset
This dataset describes the red and white variants of vinho verde wine produced in northern Portugal. The goal is to model wine quality from physicochemical tests. It is a good fit for anyone who wants to practice building a predictive model.
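One beginner-friendly way to predict wine quality from physicochemical measurements is k-nearest-neighbors regression: average the quality scores of the k most similar wines. The sketch below uses only two features (alcohol and volatile acidity) and five made-up samples, so the numbers are hypothetical. In practice you would use all of the dataset's features and standardize them first, because otherwise the feature with the largest scale dominates the distance.

```python
import math

# Hypothetical samples: (alcohol % vol, volatile acidity g/L) -> quality score.
SAMPLES = [
    ((9.4, 0.70), 5),
    ((9.8, 0.88), 5),
    ((12.8, 0.30), 7),
    ((13.0, 0.28), 8),
    ((11.0, 0.50), 6),
]


def knn_predict(features, samples=SAMPLES, k=3):
    """Average the quality of the k samples closest to `features` (Euclidean distance).

    Features are used unscaled here for brevity; standardize them in real use.
    """
    ranked = sorted(samples, key=lambda s: math.dist(features, s[0]))
    nearest = ranked[:k]
    return sum(quality for _, quality in nearest) / len(nearest)
```

For example, `knn_predict((12.9, 0.29), k=2)` averages the two high-alcohol samples and returns 7.5.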
14. Titanic Dataset
This dataset contains data on 887 real Titanic passengers, with columns for whether each one survived, their age, passenger class, sex, and the fare they paid. It was part of a challenge launched on the Kaggle platform, whose goal was to build a model that predicts which passengers survived the sinking of the Titanic.
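A common first step in the Titanic challenge is a rule-based baseline: predict that female passengers survived and male passengers did not, then measure accuracy. Any trained model should beat this number. The six rows below are hypothetical and only mimic a few of the columns described above; with the real data you would load the CSV and use its actual column names.

```python
# Hypothetical rows mimicking a few of the dataset's columns.
PASSENGERS = [
    {"sex": "female", "pclass": 1, "survived": 1},
    {"sex": "female", "pclass": 3, "survived": 0},
    {"sex": "male", "pclass": 1, "survived": 1},
    {"sex": "male", "pclass": 3, "survived": 0},
    {"sex": "male", "pclass": 2, "survived": 0},
    {"sex": "female", "pclass": 2, "survived": 1},
]


def predict(passenger):
    """Baseline rule: predict survival (1) for female passengers only."""
    return 1 if passenger["sex"] == "female" else 0


def accuracy(rows):
    """Fraction of rows where the baseline prediction matches the recorded outcome."""
    correct = sum(predict(r) == r["survived"] for r in rows)
    return correct / len(rows)
```

On these six rows the rule gets four right, so `accuracy(PASSENGERS)` is 4/6.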
Platforms for finding more datasets
If you want to go further and find your own dataset, the best approach is to search the popular repositories of the Machine Learning world:
Kaggle
Kaggle, a subsidiary of Google LLC, is an online community of data scientists and Machine Learning practitioners. Kaggle lets users find and publish datasets, explore data and build models in a web-based data science environment, work with other data scientists and Machine Learning engineers, and enter competitions to solve data science challenges.
Kaggle started in 2010 by offering Machine Learning competitions. It now also offers a public data platform, a cloud-based workbench for data science, and Artificial Intelligence education.
Dataset Search
Dataset Search is a Google search engine that helps researchers find data that is freely available for use on the web. Across the web, there are millions of datasets on almost any subject you might be interested in.
If you are looking to buy a puppy, you could find datasets compiling complaints from puppy buyers or studies on puppy cognition. Or, if you like skiing, you could find data on ski resort revenue, or injury rates and participation numbers. Dataset Search has indexed almost 25 million of these datasets, giving you a single place to search for datasets and find links to where the data is hosted.
UCI Machine Learning Repository
The UCI Machine Learning Repository is a collection of databases, domain theories, and data generators that the Machine Learning community uses for the empirical analysis of Machine Learning algorithms. The archive was created as an FTP archive in 1987 by David Aha and fellow graduate students at UC Irvine.
Since then, students, educators, and researchers around the world have used it widely as a primary source of ML datasets. As an indication of its impact, the archive has been cited more than 1,000 times, making it one of the 100 most cited "papers" in all of computer science.
Quandl
Quandl (now part of Nasdaq Data Link) is a platform that offers its users economic, financial, and other datasets. Users can download free data, buy paid data, or sell data on Quandl. It can be a useful resource when developing trading algorithms, for example.
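As an example of the kind of trading algorithm such data feeds, the sketch below implements a simple moving-average crossover: it emits "buy" when a short-window average rises above a long-window average and "sell" when it falls below. The windows (3 and 5) and the price list are hypothetical, and this is an illustration of the technique, not a recommended or profitable strategy.

```python
def moving_average(values, window):
    """Simple moving average; positions before the first full window are None."""
    if window < 1:
        raise ValueError("window must be >= 1")
    out = [None] * len(values)
    for i in range(window - 1, len(values)):
        out[i] = sum(values[i - window + 1 : i + 1]) / window
    return out


def crossover_signals(prices, short=3, long=5):
    """Return (index, "buy" | "sell") wherever the short average crosses the long one."""
    fast = moving_average(prices, short)
    slow = moving_average(prices, long)
    signals = []
    for i in range(1, len(prices)):
        if None in (fast[i - 1], slow[i - 1]):
            continue  # not enough history yet for both averages
        before = fast[i - 1] - slow[i - 1]
        after = fast[i] - slow[i]
        if before <= 0 < after:
            signals.append((i, "buy"))
        elif before >= 0 > after:
            signals.append((i, "sell"))
    return signals


if __name__ == "__main__":
    prices = [10, 10, 10, 10, 10, 9, 8, 7, 8, 9, 10, 11, 12]  # hypothetical prices
    print(crossover_signals(prices))
```

Comparing the sign of the gap between the two averages on consecutive days, instead of the raw values, is what turns two smoothed series into discrete buy and sell events.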
Conclusion
By exploring these resources, you are sure to find valuable inputs for your projects. Choose the dataset that best fits your specific needs, and always remember that quality matters as much as quantity. The dataset is the foundation of any Machine Learning project, and building on high-quality data is essential to avoid drawing wrong conclusions.