Every Machine Learning project depends on good data. A large, high-quality dataset is what lets you train and validate your ML model. As a result, a large share of the work in an ML project goes into finding data that fits your needs. However, it is not always possible to find a dataset that matches what you want, since many files that look promising at first turn out not to be.
It can be frustrating to waste time downloading countless datasets before you land on a good one. With that in mind, we have gathered a selection of options that look interesting and may help you build your ML project. Note that some are intended for personal rather than commercial use, so treat this selection as a way to learn more about the ML ecosystem.
Dataset Basics
Before listing the datasets, we should define a few terms. Artificial Intelligence projects, and Machine Learning projects in particular, need large amounts of data to train an algorithm. This data is gathered into a dataset, which is what the algorithm learns from.
With this data, algorithms are trained, and also tested, so that they can find patterns, establish relationships, and make decisions autonomously. Without training, Machine Learning algorithms cannot do anything. So the better the training data, the better the model will perform. For a dataset to be useful for the task, quantity is not the only concern: how the data is distributed matters too.
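One way to respect the distribution of the data when setting rows aside for testing is a stratified split, which keeps each label's share roughly the same in both parts. The sketch below is only an illustration using the Python standard library; the function name `stratified_split`, its parameters, and the toy rows are assumptions made for this example, not something the datasets below prescribe.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_of, test_fraction=0.2, seed=0):
    """Split rows into train/test while keeping each label's share roughly equal."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)

    train, test = [], []
    for group in by_label.values():
        rng.shuffle(group)
        # At least one test example per label when the group has 2+ rows.
        n_test = max(1, round(len(group) * test_fraction)) if len(group) > 1 else 0
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Toy, made-up rows: (text, label). 8 "ham" and 2 "spam" examples.
rows = [(f"message {i}", "ham") for i in range(8)]
rows += [(f"offer {i}", "spam") for i in range(2)]

train, test = stratified_split(rows, label_of=lambda r: r[1], test_fraction=0.25)
print(len(train), "training rows,", len(test), "test rows")
```

A plain random split of such a small, unbalanced set could easily leave the rare label out of the test rows entirely, which is the problem this approach avoids.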
Ideally, the data should also be well labeled. Consider chatbots: language coverage is essential, but careful analysis is also needed so that the resulting algorithms understand when the speaker is using slang. Only then can the assistant produce a response that matches what the user asked.
Datasets can be built from surveys, user purchases, reviews left on services, and many other sources that allow useful information to be collected and organized into rows and columns in a CSV file.
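To make the rows-and-columns layout concrete, here is a minimal sketch that parses CSV text with Python's standard `csv` module. The column names and values are made up for illustration, and an in-memory string stands in for a real file.

```python
import csv
import io

# Made-up sample standing in for a real file; column names are hypothetical.
raw = """customer_id,product,rating
1,keyboard,5
2,monitor,3
3,keyboard,4
"""

# Each CSV line becomes one record (row); the header supplies the column names.
reader = csv.DictReader(io.StringIO(raw))
records = [
    {
        "customer_id": int(r["customer_id"]),
        "product": r["product"],
        "rating": int(r["rating"]),
    }
    for r in reader
]

columns = list(records[0].keys())
average_rating = sum(r["rating"] for r in records) / len(records)
print(columns)
print(len(records), "rows, average rating", average_rating)
```

For a file on disk, the same `csv.DictReader` call would be given an open file object instead of the `io.StringIO` wrapper.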
Before you start searching for the perfect data, it is important to know the goal of your project, especially if it belongs to a specific domain such as weather, finance, or health. This will tell you where to source your data.
Datasets for ML
Chatbot Training
An effective chatbot needs plenty of training data to resolve user queries quickly without human intervention. However, the main obstacle in chatbot development is obtaining realistic, task-oriented data to train these Machine Learning systems.
A conversational dataset collects data in question-and-answer form. It is well suited to training chatbots that give automated answers to their audience. Without this data, a chatbot will fail to resolve user queries quickly or to answer them without human involvement.
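The question-and-answer format can be put to work even without a trained model. The sketch below is a deliberately simple retrieval baseline, not how production chatbots are built: it returns the stored answer whose question shares the most words with the user's question. The helper names, the `min_overlap` threshold, and the sample pairs are all assumptions made for this illustration.

```python
import re


def tokens(text):
    """Lowercase a string and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def best_answer(question, qa_pairs, min_overlap=0.2):
    """Return the answer whose stored question shares the most words with `question`.

    Similarity is the Jaccard index of the two token sets. None is returned
    when nothing reaches `min_overlap`, so the bot can hand off to a human.
    """
    asked = tokens(question)
    best_score, best = 0.0, None
    for stored_question, answer in qa_pairs:
        stored = tokens(stored_question)
        union = asked | stored
        score = len(asked & stored) / len(union) if union else 0.0
        if score > best_score:
            best_score, best = score, answer
    return best if best_score >= min_overlap else None


# Made-up question-answer pairs for illustration only.
qa_pairs = [
    ("What are your opening hours?", "We are open 9am to 5pm, Monday to Friday."),
    ("How do I reset my password?", "Use the 'Forgot password' link on the login page."),
]

print(best_answer("how can I reset my password", qa_pairs))
print(best_answer("do you sell bicycles", qa_pairs))
```

The second query matches nothing well enough and yields `None`, which is the point where a real system would fall back to a human agent or a default reply.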
Using these datasets, businesses can build a tool that gives customers quick answers 24/7 and is considerably cheaper than staffing a human customer support team.
1. Question-Answer Dataset
This set provides Wikipedia articles together with manually generated questions and answers about them. The data was collected between 2008 and 2010 for use in academic research.
2. Language Data
Language Data is a repository maintained by Yahoo with data drawn from some of the company's services, such as Yahoo! Answers, an open community for users to post questions and answers.
3. WikiQA
The WikiQA corpus also contains a set of question-and-answer pairs. The questions come from Bing, while the answers link to a Wikipedia page that could resolve the original question.
In total, the dataset has more than 3,000 questions and a set of 29,258 sentences, of which about 1,400 are labeled as answers to their corresponding question.
Government Data
Datasets produced by governments provide demographic information, which is essential for projects that aim to understand social behavior, shape public policy, and improve society. They can also be useful for political campaigns, targeted advertising, or market analysis.
These sets usually contain anonymized data, so models can learn from them without violating anyone's privacy.
4. Data.gov
Launched in 2009, Data.gov is the United States government's open data portal. Its catalog is impressive: more than 218,000 datasets that can be filtered by type, tag, format, and topic.
5. EU Open Data Portal
The EU Open Data Portal gives open access to data shared by European Union institutions. The data can be used for both commercial and non-commercial purposes. Users have access to more than 15.5 thousand datasets covering topics such as health, energy, environment, culture, and education.
Health Data
Given the health challenges unfolding around the world, data recorded by health organizations is essential for developing solutions that save lives. These datasets can help identify risk factors, model disease transmission, and speed up diagnosis.
These records include health histories, patient demographics, disease prevalence, medication use, nutrition, and more.
6. Global Health Observatory
This data portal is part of the World Health Organization (WHO). It provides public data on many areas of health, organized by topics such as health systems, tobacco control, maternal health, and HIV/AIDS. It also lets you explore data on COVID-19.
7. CORD-19
CORD-19 is a collection of scholarly publications on COVID-19 and other articles related to the novel coronavirus. It is an open dataset created to help generate new insights about COVID-19.
Financial Data
Datasets from the financial domain usually hold a great deal of information, since they have typically been collected over long periods. They are well suited to building economic forecasts or designing investment strategies.
With the right financial records, a Machine Learning model can predict how a given asset will behave. That is why the financial sector is doing everything it can to build effective ML: anything that can predict asset behavior even modestly well has the potential to generate millions of dollars. Machine Learning is already being used to predict citizens' behavior, which affects how policymakers work.
8. International Monetary Fund
The IMF data portal holds a range of economic and financial indicators, statistics on member countries, and other data on loans and exchange rates.
9. World Bank
The World Bank repository contains a variety of datasets with economic information from countries around the world. There are more than 17,000 datasets, grouped by continent.
Product and Service Reviews
Sentiment analysis has found applications in many sectors and now helps businesses gauge and learn from their customers accurately. It is widely used in social media monitoring, brand monitoring, voice of the customer (VoC), customer service, and market research.
Sentiment analysis uses NLP (Natural Language Processing) methods and algorithms that are rule-based, hybrid, or rely on Machine Learning techniques to learn from a dataset.
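As a rough illustration of the rule-based end of that spectrum, the sketch below scores text against small word lists and flips the polarity after a negation. The lexicon is a tiny hand-made assumption for this example; real rule-based systems rely on far larger lexicons, and ML-based ones learn these associations from labeled reviews such as the datasets listed below.

```python
import re

# Tiny hand-made lexicon; a real system would use a much larger one.
POSITIVE = {"good", "great", "excellent", "love", "fast"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "slow"}
NEGATIONS = {"not", "never", "no"}


def sentiment(text):
    """Score text with word lists; a negation flips the next sentiment word."""
    score = 0
    negate = False
    for word in re.findall(r"[a-z']+", text.lower()):
        if word in NEGATIONS:
            negate = True
            continue
        # +1 for a positive word, -1 for a negative word, 0 otherwise.
        polarity = (word in POSITIVE) - (word in NEGATIVE)
        if polarity:
            score += -polarity if negate else polarity
            negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment("Great phone, fast delivery"))
print(sentiment("The battery is not good"))
print(sentiment("It arrived on Tuesday"))
```

Rules like these are easy to inspect but brittle: sarcasm, slang, and domain-specific wording slip past them, which is one reason labeled review datasets are valuable.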
The data used for sentiment analysis needs to be specific and highly relevant. The hardest part of sentiment analysis is not finding a lot of data; rather, it is finding the right datasets. These datasets should cover a wide range of sentiment analysis applications and use cases.
10. Amazon Reviews
This dataset contains about 35 million Amazon reviews, spanning 18 years of collected data. It includes product, user, and review information.
11. Yelp Reviews
Yelp also offers a dataset based on data collected from its service. It has more than 8 million reviews, 1 million tips, and about 1.5 million business attributes, such as opening hours and availability.
12. IMDB Reviews
This dataset has 25,000 movie reviews for training and another 25,000 for testing, drawn from IMDB, the well-known movie rating site. It also provides unlabeled data as an extra.
Beginner Datasets for ML
13. Wine Quality Dataset
This set provides information on wines, both red and white, produced in northern Portugal. The goal is to model wine quality based on physicochemical tests. It is a good choice for anyone who wants to practice building a prediction system.
14. Titanic Dataset
This set contains data on 887 people who were aboard the Titanic, with columns recording whether they survived, their age, passenger class, gender, and the fare they paid. The data was part of a challenge launched by the Kaggle platform, whose goal was to build a model that could predict which passengers survived the sinking of the Titanic.
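Before reaching for a full model on data shaped like this, it can help to establish a simple baseline. The sketch below learns, for each value of one column, the most common outcome and then measures how often that rule is right. The six rows are invented for illustration and are not real passenger records; the field names mirror the columns described above but are assumptions about how a file would be loaded.

```python
from collections import defaultdict

# Made-up rows shaped like the columns described above; not real passenger records.
passengers = [
    {"survived": 1, "age": 29, "pclass": 1, "sex": "female", "fare": 80.0},
    {"survived": 0, "age": 40, "pclass": 3, "sex": "male", "fare": 8.0},
    {"survived": 1, "age": 4, "pclass": 2, "sex": "female", "fare": 20.0},
    {"survived": 0, "age": 35, "pclass": 3, "sex": "male", "fare": 7.5},
    {"survived": 1, "age": 50, "pclass": 1, "sex": "male", "fare": 60.0},
    {"survived": 0, "age": 22, "pclass": 3, "sex": "female", "fare": 9.0},
]


def fit_majority_by(rows, feature):
    """Learn, for each value of `feature`, the most common outcome (ties predict 0)."""
    counts = defaultdict(lambda: [0, 0])  # value -> [did not survive, survived]
    for row in rows:
        counts[row[feature]][row["survived"]] += 1
    return {value: int(c[1] > c[0]) for value, c in counts.items()}


def accuracy(rows, feature, rule):
    """Fraction of rows where the rule's prediction matches the recorded outcome."""
    hits = sum(rule[row[feature]] == row["survived"] for row in rows)
    return hits / len(rows)


rule = fit_majority_by(passengers, "sex")
score = accuracy(passengers, "sex", rule)
print(rule, score)
```

Note that the score here is measured on the same rows the rule was learned from, so it overstates how well the rule would generalize; with the real dataset, a held-out test portion should be used instead.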
Platforms for Finding More Datasets
If you want to go further and find your own data, a good approach is to browse the well-known repositories of the Machine Learning ecosystem:
Kaggle
Kaggle, a subsidiary of Google LLC, is an online community of data scientists and Machine Learning practitioners. Kaggle lets users find and publish datasets, explore and build models in a web-based data science environment, work with other data scientists and Machine Learning engineers, and enter competitions to solve data science challenges.
Kaggle started in 2010 by offering Machine Learning competitions and now also offers a public data platform, a cloud-based workbench for data science, and Artificial Intelligence education.
Dataset Search
Dataset Search is a search engine from Google that helps researchers locate online data that is freely available for use. Across the web there are millions of datasets on almost any subject that interests you.
If you are looking to buy a puppy, you can find datasets compiling complaints from puppy buyers or studies on dog cognition. Or if you like skiing, you can find data on the revenue of ski resorts or on injury rates and participation numbers. Dataset Search has indexed about 25 million of these datasets, giving you a single place to search for datasets and find links to where the data lives.
UCI Machine Learning Repository
The UCI Machine Learning Repository is a collection of databases, domain theories, and data generators that the Machine Learning community uses for the empirical analysis of Machine Learning algorithms. The archive was created as an ftp archive in 1987 by David Aha and fellow graduate students at UC Irvine.
Since then, it has been widely used by students, educators, and researchers around the world as a primary source of ML datasets. As an indication of the archive's impact, it has been cited more than 1,000 times, making it one of the 100 most cited "papers" in computer science.
Quandl
Quandl is a platform that provides users with financial, economic, and alternative data. Users can download free data, buy paid data, or sell data to Quandl. It can be a useful tool for developing trading algorithms, for example.
Conclusion
After exploring these resources, you are sure to find good inputs for your projects. Make sure you choose the data that best fits your specific needs, and always remember: it is not only about quantity, but also about quality. The dataset is the foundation of any Machine Learning project, and it is important to build on good material to avoid the risk of reaching wrong conclusions.