Every Machine Learning project depends on a good dataset. The dataset is the large collection of data that lets you train and validate your ML model. A major part of the work in an ML project is therefore finding the dataset that fits your needs. It is not always possible to find one that matches what you want, because many files that look interesting turn out not to be useful in the end.
Downloading countless datasets until you find the right one can waste a lot of time. With that in mind, we have gathered some options that look interesting and can help you improve your ML project. Note that some of them are intended for personal rather than commercial use, so treat these options as a way to gain experience in the ML universe.
Basics of Datasets
Before we talk about datasets, we need to define a few terms. Artificial Intelligence projects, and Machine Learning projects in particular, need a large amount of data to train the algorithm. This volume of data is gathered into a dataset, which is essential for training the algorithm.
With this data, the algorithm is trained (and also tested) and becomes able to find patterns, establish relationships and, as a result, make decisions on its own. Without training, Machine Learning algorithms cannot perform any task. Therefore, the better the training data, the better the model will perform. For a dataset to be useful to a project, quantity is not enough: how the data is classified matters too.
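The split between training and testing mentioned above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a prescribed method: `split_dataset`, its 20% default test fraction, and the fixed seed are hypothetical choices, and libraries such as scikit-learn provide their own splitting utilities.

```python
import random


def split_dataset(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of `rows` and split it into (train, test) lists.

    Hypothetical helper: the 0.2 default fraction and seed 42 are arbitrary choices.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(rows)                  # copy, so the caller's data is not reordered
    random.Random(seed).shuffle(shuffled)  # a fixed seed makes the split reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


train, test = split_dataset(list(range(10)), test_fraction=0.3)
print(len(train), len(test))  # 7 3
```

Keeping the test rows out of training is what makes the test score an honest estimate of how the model behaves on data it has never seen.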
Ideally, the data should be well labeled. Take chatbots as an example: language input is important, but a careful syntactic analysis must also be done so that the resulting algorithm can tell whether the person it is talking to is using slang. Only then will the assistant be able to give answers that match what the user asked.
Datasets can be built from surveys, customer purchase data, reviews left on services, and many other methods that collect valuable information and organize it into the columns and rows of a CSV file.
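To make the "columns and rows" idea concrete, here is a minimal sketch that reads CSV text with Python's standard `csv` module. The column names and the three rows are hypothetical placeholders for something like a survey export; a real file would be opened from disk instead of an in-memory string.

```python
import csv
import io

# Hypothetical survey export: each row is one record, each column one attribute.
SAMPLE_CSV = """customer_id,age,rating,comment
1,34,5,Fast delivery
2,27,3,Average service
3,45,4,Good value
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = load_rows(SAMPLE_CSV)
print(len(rows))           # 3
print(rows[0]["comment"])  # Fast delivery

# csv returns every value as a string, so numeric columns need explicit conversion.
ratings = [int(row["rating"]) for row in rows]
print(sum(ratings) / len(ratings))  # 4.0
```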
Before you start looking for the perfect dataset, it is important to know the goal of your project, especially if it belongs to a specific field such as weather, finance, or health. That goal determines where you should get your dataset from.
Datasets for ML
Chatbot training
A functional chatbot needs a large amount of training data so that it can resolve user queries quickly without human intervention. The main bottleneck in developing a chatbot, however, is obtaining realistic, task-oriented dialog data to train these Machine Learning-based systems.
A conversational dataset collects data in question-and-answer format. It is ideal for training chatbots that give automated answers to an audience. Without this data, a chatbot cannot quickly resolve user queries or answer user questions without a human having to step in.
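As a rough illustration of how question-and-answer pairs can drive automated replies, the sketch below answers a query by returning the stored answer whose question has the highest word overlap (Jaccard similarity) with it. This is a deliberately simple retrieval baseline, not how a trained ML chatbot works; the `QA_PAIRS` content, the `answer` function, and the 0.2 threshold are all hypothetical.

```python
import re

# Hypothetical question-answer pairs in the format a conversational dataset provides.
QA_PAIRS = [
    ("What are your opening hours?", "We are open from 9 am to 6 pm, Monday to Friday."),
    ("How can I reset my password?", "Use the 'Forgot password' link on the login page."),
    ("Do you ship internationally?", "Yes, we ship to most countries."),
]


def tokens(text):
    """Lowercase the text and return its set of word tokens (punctuation is dropped)."""
    return set(re.findall(r"\w+", text.lower()))


def answer(query, pairs=QA_PAIRS, threshold=0.2):
    """Return the answer whose question best overlaps the query, or a hand-off message."""
    q = tokens(query)
    best_score, best_answer = 0.0, None
    for question, reply in pairs:
        t = tokens(question)
        union = q | t
        score = len(q & t) / len(union) if union else 0.0  # Jaccard similarity
        if score > best_score:
            best_score, best_answer = score, reply
    if best_score < threshold:
        return "Sorry, I don't know. Let me connect you with a person."
    return best_answer


print(answer("How do I reset my password?"))
```

The fallback message shows where human intervention would still be needed; a larger, better-labeled dataset is what shrinks that gap.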
Using these datasets, businesses can build a tool that gives customers quick answers 24/7 and costs considerably less than a team of people doing customer support.
1. Question-Answer Dataset
This dataset provides a set of Wikipedia articles together with manually written questions and their answers. It was collected between 2008 and 2010 for use in academic research.
2. Language Data
Language Data is a database run by Yahoo containing information generated from other company services, such as Yahoo! Answers, which works as an open community where users post questions and answers.
3. WikiQA
The WikiQA corpus also contains a set of questions and answers. The questions come from Bing, while the answers are linked to a Wikipedia page that could potentially answer the original question.
In total, the dataset contains over 3,000 questions and a set of 29,258 sentences, of which about 1,473 are labeled as answers to their corresponding question.
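Because WikiQA pairs each question with several candidate sentences and labels only some of them as answers, it helps to summarize that structure before training. The sketch below assumes a simplified row layout of `(question_id, sentence, label)` with `label` 1 for a correct answer; the real file has more columns, so check its header, and the rows here are made up. Counting questions that have at least one labeled answer matters because some questions may have none.

```python
from collections import defaultdict

# Hypothetical rows in a simplified answer-sentence-selection layout:
# (question_id, candidate_sentence, label), where label 1 marks a correct answer.
ROWS = [
    ("Q1", "The Eiffel Tower is in Paris.", 1),
    ("Q1", "It was completed in 1889.", 0),
    ("Q2", "Python was first released in 1991.", 1),
    ("Q2", "It is named after Monty Python.", 0),
    ("Q3", "Mount Everest is in the Himalayas.", 0),
]


def summarize(rows):
    """Count questions, candidate sentences, answer sentences, and answered questions."""
    by_question = defaultdict(list)
    for qid, sentence, label in rows:
        by_question[qid].append((sentence, label))
    answered = [q for q, cands in by_question.items() if any(lab == 1 for _, lab in cands)]
    return {
        "questions": len(by_question),
        "sentences": len(rows),
        "answer_sentences": sum(label for _, _, label in rows),
        "questions_with_answer": len(answered),
    }


print(summarize(ROWS))
# {'questions': 3, 'sentences': 5, 'answer_sentences': 2, 'questions_with_answer': 2}
```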
Government data
Government-produced datasets provide demographic data, which is essential for projects that aim to understand social trends, design public policy, and improve society. They can be useful for political parties, targeted advertising, or market research.
These datasets usually contain anonymized data, so while models can access the raw data, individual privacy is not violated.
4. Data.gov
Launched in 2009, Data.gov is the United States government's open data source. Its catalog is impressive: more than 218,000 datasets that can be browsed by format, tags, type, and topic.
5. EU Open Data Portal
The EU Open Data Portal gives access to open data shared by European Union institutions. This data can be used for both commercial and non-commercial purposes. More than 15,500 datasets are available to users, covering topics such as health, energy, the environment, culture, and education.
Health data
Given the ongoing global health crisis, datasets produced by health organizations are essential for building effective, life-saving solutions. These datasets can help identify risk factors, model disease transmission patterns, and speed up diagnosis.
They contain health records, patient counts, disease prevalence, medication use, nutritional values, and much more.
6. Global Health Observatory
This dataset is an initiative of the World Health Organization (WHO). It gives the public data on a range of health areas, organized by theme, such as health systems, tobacco control, HIV/AIDS, and more. There is also an option to query COVID-19 data.
7. CORD-19
CORD-19 is a collection of scholarly articles on COVID-19 and other research on the novel coronavirus. This open dataset was created to help generate new insights into COVID-19.
Economics data
Economics datasets usually gather a large amount of information, because they are commonly collected over long periods. They are ideal for building economic forecasts or identifying investment trends.
With the right financial data, a Machine Learning model can forecast the trend of a given asset. That is why the financial sector is doing everything in its power to build effective ML models: anything that can forecast even moderately well has the potential to generate millions of dollars. Machine Learning is already being used to predict citizens' behavior, which is changing the way policymakers do their work.
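Before training any forecasting model on a long economic series, it is common practice to measure a naive baseline that the model must beat. The sketch below forecasts the next value as the average of the last few observations and scores that rule with a one-step-ahead backtest using mean absolute error. It is not a Machine Learning model, the function names are hypothetical, and the `prices` values are made up.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if not 1 <= window <= len(series):
        raise ValueError("window must be between 1 and len(series)")
    return sum(series[-window:]) / window


def backtest_mae(series, window=3):
    """Mean absolute error of one-step-ahead forecasts, using only past values each time."""
    errors = [
        abs(moving_average_forecast(series[:t], window) - series[t])
        for t in range(window, len(series))
    ]
    if not errors:
        raise ValueError("series too short to backtest with this window")
    return sum(errors) / len(errors)


prices = [100, 102, 101, 105, 107]  # made-up values for illustration
print(moving_average_forecast(prices, window=3))
print(backtest_mae(prices, window=2))  # 2.5
```

The backtest only ever looks at values before the point being forecast, which mirrors how a real forecast would be used.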
8. International Monetary Fund
The IMF dataset covers economic and financial indicators, member country statistics, and other debt and exchange-rate data.
9. World Bank
The World Bank's repository contains a variety of datasets with economic information from different countries. There are more than 17,000 datasets, divided by continent.
Product and service reviews
Sentiment analysis has found uses in many sectors, helping businesses accurately anticipate and learn from their customers and clients. It is increasingly used for social media monitoring, brand monitoring, voice of the customer (VoC), customer service, and market research.
Sentiment analysis uses natural language processing (NLP) methods and algorithms that may be rule-based, hybrid, or rely on Machine Learning techniques to learn from datasets.
The data needed for sentiment analysis must be specialized and is required in large quantities. The hardest part of training a sentiment analysis model is not finding large amounts of data; it is finding the right dataset. These datasets must cover a wide range of sentiment analysis domains and use cases.
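The rule-based approach mentioned above can be illustrated with a tiny lexicon scorer: count positive and negative words, and flip a word's polarity when it directly follows a negator such as "not". The word lists here are small hypothetical placeholders; practical rule-based systems use much larger curated lexicons and more rules, and ML-based systems learn these signals from labeled review datasets instead.

```python
import re

# Tiny hypothetical lexicon; real rule-based systems use much larger curated word lists.
POSITIVE = {"good", "great", "excellent", "love", "fast"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "slow"}
NEGATORS = {"not", "never", "no"}


def sentiment(text):
    """Label text as positive, negative, or neutral by summing word polarities."""
    words = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, word in enumerate(words):
        polarity = int(word in POSITIVE) - int(word in NEGATIVE)
        # A negator directly before a sentiment word flips its polarity ("not good").
        if polarity and i > 0 and words[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for review in ["Great product, fast delivery", "Not good at all", "The box arrived"]:
    print(review, "->", sentiment(review))
```

Mixed reviews such as "I love it but shipping was slow" cancel out to neutral here, which is exactly the kind of case where a well-labeled dataset and a learned model tend to do better.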
10. Amazon Reviews
This dataset contains 35 million Amazon reviews, spanning 18 years of collected data. It includes product and user information as well as the review text.
11. Yelp Reviews
Yelp also provides a dataset based on information collected by its service. It contains more than 8 million reviews, a million tips, and about 1.2 million business attributes, such as opening hours and availability.
12. IMDB Reviews
This dataset contains a set of 25,000 movie reviews for training and another 25,000 for testing, taken from IMDb, the site known for its movie ratings. It also provides additional unlabeled data.
Datasets for first steps in ML
13. Wine Quality Dataset
This dataset provides information on red and white wines produced in northern Portugal. The goal is to determine wine quality based on physicochemical tests. It is a good choice for anyone who wants to practice building predictive models.
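A first predictive model on this kind of data can be as simple as fitting a straight line from one physicochemical feature to the quality score. The sketch below uses ordinary least squares with a single feature; it assumes the file is semicolon-separated with `alcohol` and `quality` columns (verify against the copy you download), and the three rows are made-up placeholders rather than real measurements.

```python
import csv
import io


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept for a single feature."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("feature has zero variance")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Assumption: semicolon-separated file with "alcohol" and "quality" columns.
# These rows are placeholders, not real measurements.
SAMPLE = '"alcohol";"quality"\n9.0;5\n10.0;6\n11.0;7\n'

rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter=";"))
alcohol = [float(r["alcohol"]) for r in rows]
quality = [float(r["quality"]) for r in rows]

slope, intercept = fit_line(alcohol, quality)
print(round(slope, 3), round(intercept, 3))   # 1.0 -4.0
print(round(slope * 10.5 + intercept, 3))     # predicted quality for 10.5: 6.5
```

On the real dataset you would use many features and evaluate on a held-out test split; the single-feature fit just shows the mechanics.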
14. Titanic Dataset
This dataset contains data on 887 real Titanic passengers, with columns describing whether each passenger survived, their age, passenger class, sex, and the fare they paid. It was part of a challenge launched by the Kaggle platform, whose goal was to build a model that could predict which passengers survived the sinking of the Titanic.
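A useful first step in a survival-prediction challenge like this one is a simple baseline: predict, for each group, the outcome that was most common in the training data. The sketch below groups by a `sex` column and predicts the majority `survived` value; the lowercase column names are assumptions (check your file's header), and `SAMPLE_ROWS` is a made-up toy sample, not real passenger data.

```python
from collections import Counter, defaultdict


def fit_group_baseline(rows, group_key="sex", target_key="survived"):
    """Learn the most common target value within each group."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[group_key]][row[target_key]] += 1
    return {group: c.most_common(1)[0][0] for group, c in counts.items()}


def accuracy(model, rows, group_key="sex", target_key="survived", default=0):
    """Fraction of rows whose target matches the group's predicted value."""
    hits = sum(model.get(r[group_key], default) == r[target_key] for r in rows)
    return hits / len(rows)


# Made-up toy rows for illustration only.
SAMPLE_ROWS = [
    {"sex": "female", "survived": 1},
    {"sex": "female", "survived": 1},
    {"sex": "female", "survived": 0},
    {"sex": "male", "survived": 0},
    {"sex": "male", "survived": 0},
    {"sex": "male", "survived": 1},
]

model = fit_group_baseline(SAMPLE_ROWS)
print(model)                          # {'female': 1, 'male': 0}
print(accuracy(model, SAMPLE_ROWS))
```

Any trained model should beat this baseline on held-out passengers; if it does not, the extra features are not adding much.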
Platforms for finding more datasets
If you want to keep searching for your dataset, the best approach is to browse the best-known repositories in the Machine Learning world:
Kaggle
Kaggle, a subsidiary of Google LLC, is an online community of data scientists and Machine Learning practitioners. Kaggle lets users find and publish datasets, explore and build models in a web-based data science environment, work with other data scientists and Machine Learning engineers, and enter competitions to solve data science challenges.
Kaggle started in 2010 by offering Machine Learning competitions and now also offers a public data platform, a cloud-based workbench for data science, and Artificial Intelligence education.
Dataset Search
Dataset Search is a search engine from Google that helps researchers find data that is freely available online. Across the web, there are millions of datasets on almost any subject that interests you.
If you are looking to buy a puppy, you might find datasets compiling complaints from puppy buyers or studies on puppy cognition. If you like skiing, you might find data on ski resort revenue or on injury rates and participation numbers. Dataset Search has indexed almost 25 million datasets, giving you a single place to search for datasets and find links to the data.
UCI Machine Learning Repository
The UCI Machine Learning Repository is a collection of databases, domain theories, and data generators that the Machine Learning community uses for the empirical analysis of Machine Learning algorithms. It was created as an ftp archive in 1987 by David Aha and fellow graduate students at UC Irvine.
Since then, it has been widely used by students, educators, and researchers around the world as a primary source of ML datasets. As an indication of the archive's impact, it has been cited over 1,000 times, making it one of the top 100 most cited "papers" in all of computer science.
Quandl
Quandl is a platform that offers its users economic, financial, and other datasets. Users can download free data, buy paid data, or sell data to Quandl. It can be a useful resource for developing trading algorithms, for example.
Conclusion
By exploring these tools, you are sure to find great options for your projects. Be sure to choose the dataset that best fits your specific needs, and always keep in mind that quantity is not enough: quality matters too. The dataset is the foundation of any Machine Learning project, and building on quality data is essential to avoid ending up with poor results.