Table of Contents
- 1. CelebFaces Attributes Dataset
- 2. DOTA
- 3. Google Facial Expression Comparison Dataset
- 4. Visual Genome
- 5. LibriSpeech
- 6. Cityscapes
- 7. Kinetics Dataset
- 8. CelebAMask-HQ
- 9. Penn Treebank
- 10. VoxCeleb
- 11. SIXray
- 12. US Accidents
- 13. Ocular Disease Recognition
- 14. Heart Disease
- 15. CLEVR
- 16. Universal Dependencies
- 17. KITTI-360
- 18. MOT (Multiple Object Tracking)
- 19. PASCAL 3D+
- 20. Facial Deformable Models of Animals
- 21. MPII Human Pose Dataset
- 22. UCF101
- 23. AudioSet
- 24. Stanford Natural Language Inference
- 25. Visual Question Answering
- Conclusion
These days, most of us focus on developing machine learning and AI models and on tackling problems with existing datasets. But first, we need to define what a dataset is, why it matters, and what role it plays in building robust AI and ML solutions.
Today, we have a wide range of open datasets that we can use for research or for building applications that address real-world problems in many fields.
Still, the shortage of high-quality datasets is a cause for concern. The volume of data has grown enormously and will keep growing at a rapid pace.
In this post, we'll cover freely available datasets that you can use to develop your next AI project.
1. CelebFaces Attributes Dataset
The CelebFaces Attributes Dataset (CelebA) contains more than 200K celebrity photos with 40 attribute annotations per image, which makes it an excellent starting point for projects such as face recognition, face detection, landmark (or facial part) localization, and face editing and synthesis. The photos in this collection also cover a wide range of poses and background clutter.
2. DOTA
DOTA (Dataset for Object deTection in Aerial images) is a large-scale object detection dataset covering 15 common categories (for example ship, plane, and vehicle), with 1,411 training images and 458 validation images.
3. Google Facial Expression Comparison Dataset
The Google Facial Expression Comparison dataset contains about 500,000 image triplets built from 156,000 face photos. Notably, every triplet in the dataset was annotated by at least six human raters.
The dataset is useful for projects that involve facial expression analysis, such as expression-based image retrieval, emotion classification, and expression synthesis. To get access to the dataset, you need to fill in a short form.
4. Visual Genome
Visual Genome provides visual question answering data in a multiple-choice setting. It is made up of 101,174 MSCOCO photos and 1.7 million QA pairs, with an average of 17 questions per image.
Compared with the Visual Question Answering dataset, Visual Genome has a more balanced distribution across six question types: What, Where, When, Who, Why, and How.
In addition, Visual Genome includes 108K photos that are densely annotated with objects, attributes, and relationships.
5. LibriSpeech
The LibriSpeech corpus is a collection of roughly 1,000 hours of audiobooks from the LibriVox project. Most of the recorded books come from Project Gutenberg.
The training data is split into three partitions of 100 hours, 360 hours, and 500 hours, while the dev and test sets each contain about 5 hours of audio.
6. Cityscapes
Cityscapes is one of the best-known databases of stereo videos of urban street scenes.
It contains recordings from 50 different cities, most of them in Germany, with pixel-accurate annotations plus GPS coordinates, outside temperature, ego-motion data, and right stereo views.
7. Kinetics Dataset
Kinetics is one of the best-known large-scale, high-quality video datasets for human action recognition. It has at least 600 video clips for each of its 600 human action classes, for a total of more than 500,000 clips.
The videos were taken from YouTube; each one is about 10 seconds long and is labeled with a single class.
8. CelebAMask-HQ
CelebAMask-HQ is a collection of 30,000 high-resolution face photos with carefully annotated masks and 19 classes, including facial components and accessories such as skin, nose, eyes, eyebrows, ears, mouth, lip, hair, hat, eyeglasses, earring, necklace, neck, and cloth.
The dataset can be used to train and evaluate algorithms for face parsing, face recognition, and GANs for face generation and editing.
9. Penn Treebank
The English Penn Treebank (PTB) corpus, and in particular the portion of the corpus corresponding to Wall Street Journal articles, is one of the best-known and most frequently used corpora for evaluating sequence labeling models.
The task consists of annotating each word with its part-of-speech tag. The corpus is also commonly used for character-level and word-level language modeling.
10. VoxCeleb
VoxCeleb is a large-scale speaker identification dataset extracted automatically from open-source media. It contains over a million utterances from more than 6K speakers.
Because the dataset is audio-visual, it can also be used for a number of additional applications, including visual speech synthesis, speech separation, cross-modal transfer from face to voice or vice versa, and training face recognition from video to complement existing face recognition datasets.
11. SIXray
The SIXray dataset contains 1,059,231 X-ray images collected at subway stations and annotated by security inspectors to identify six main classes of prohibited items: guns, knives, wrenches, pliers, scissors, and hammers. In addition, bounding boxes for each prohibited item were added by hand to the test sets so that object localization performance can be evaluated.
12. US Accidents
The name of the dataset, US Accidents, already gives away its subject. This countrywide car accident dataset covers 49 US states and includes data from February 2016 to December 2021.
The collection currently holds about 1.5 million accident records. It was gathered in real time using several traffic APIs.
These APIs broadcast traffic data captured by a variety of sources, including traffic cameras, law enforcement agencies, and the US and state departments of transportation.
13. Ocular Disease Recognition
Ocular Disease Intelligent Recognition (ODIR) is a structured ophthalmic database with information on 5,000 patients, including their age, color fundus photographs of their left and right eyes, and doctors' diagnostic keywords.
The dataset is a real-world collection of patient data gathered by Shanggong Medical Technology Co., Ltd. from different hospitals and medical centers in China. The annotations were labeled by trained human readers under quality control management.
14. Heart Disease
The Heart Disease dataset helps identify the presence of heart disease in a patient based on 76 attributes such as age, sex, chest pain type, and resting blood pressure.
With 303 instances, the database is typically used simply to distinguish the presence of disease (values 1, 2, 3, 4) from its absence (value 0).
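The presence-versus-absence framing can be sketched in a few lines of Python. This is only an illustration: the field name `num` and the three sample records are assumptions made up for the sketch, not rows taken from the dataset.

```python
# Hypothetical records standing in for patient rows.
# The "num" field name (diagnosis value 0-4) is an assumption for this sketch.
records = [
    {"age": 63, "sex": 1, "num": 0},
    {"age": 67, "sex": 1, "num": 2},
    {"age": 41, "sex": 0, "num": 4},
]


def to_binary_label(diagnosis: int) -> int:
    """Map diagnosis 0 to 0 (absence) and 1-4 to 1 (presence)."""
    if diagnosis not in range(5):
        raise ValueError(f"unexpected diagnosis value: {diagnosis}")
    return 0 if diagnosis == 0 else 1


# One binary target per record, ready for a two-class classifier.
labels = [to_binary_label(record["num"]) for record in records]
print(labels)
```

Collapsing the four disease values into one class turns the problem into plain binary classification, at the cost of discarding the severity information.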
15. CLEVR
The CLEVR (Compositional Language and Elementary Visual Reasoning) dataset is a Visual Question Answering benchmark. It contains photos of 3D-rendered objects, and each photo comes with a set of highly compositional questions that fall into several categories.
The dataset consists of 70,000 images and 700,000 questions for training, 15,000 images and 150,000 questions for validation, and 15,000 images and 150,000 questions for testing. Answers, scene graphs, and functional programs are provided for all training and validation images and questions.
16. Universal Dependencies
The Universal Dependencies (UD) project aims to create cross-linguistically consistent morphology and syntax treebank annotation for many languages. Version 2.7, released in 2020, contains 183 treebanks in 104 languages.
The annotation consists of universal POS tags, dependency heads, and universal dependency labels.
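UD treebanks are distributed in the tab-separated CoNLL-U format, where each token line has ten columns and the three annotation layers mentioned here sit in the UPOS, HEAD, and DEPREL columns. The following is a minimal sketch of reading those columns; the two-token sentence is hand-written for illustration and does not come from any treebank, and `parse_conllu_sentence` is a hypothetical helper name.

```python
# Hand-written sample in CoNLL-U layout (10 tab-separated columns per token):
# ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC
SAMPLE = (
    "# text = Dogs bark\n"
    "1\tDogs\tdog\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\t_\n"
)


def parse_conllu_sentence(block: str) -> list:
    """Return one dict per ordinary token with its POS tag, head, and label."""
    tokens = []
    for line in block.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}")
        if not cols[0].isdigit():
            continue  # skip multiword ranges (e.g. 1-2) and empty nodes (e.g. 1.1)
        tokens.append(
            {
                "id": int(cols[0]),
                "form": cols[1],
                "upos": cols[3],
                "head": int(cols[6]),  # 0 marks the root of the sentence
                "deprel": cols[7],
            }
        )
    return tokens


tokens = parse_conllu_sentence(SAMPLE)
for token in tokens:
    print(token["form"], token["upos"], token["head"], token["deprel"])
```

For real projects an established CoNLL-U reader is usually a better choice than a hand-rolled parser, since it also handles multiword tokens and enhanced dependencies.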
17. KITTI-360
KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) is one of the most widely used datasets in mobile robotics and autonomous driving.
It consists of hours of traffic scenarios recorded with a variety of sensor modalities, including high-resolution RGB and grayscale stereo cameras and a 3D laser scanner. The dataset has been extended over time by several researchers who separately annotated different parts of it to suit their own needs.
18. MOT (Multiple Object Tracking)
MOT (Multiple Object Tracking) is a multi-object tracking dataset that covers indoor and outdoor scenes of public places, with pedestrians as the objects of interest. The video for each scene is split into two clips, one for training and one for testing.
The dataset includes object detections in the video frames from three detectors: SDP, Faster-RCNN, and DPM.
19. PASCAL 3D+
Pascal3D+ is a multi-view dataset made up of photos collected in the wild, that is, images of object categories with high variability, taken under uncontrolled conditions, in cluttered scenes, and from many different viewpoints. Pascal3D+ covers 12 rigid object categories taken from the PASCAL VOC 2012 dataset.
These objects are annotated with pose information (azimuth, elevation, and distance to the camera). Pascal3D+ also includes pose-annotated photos of these 12 categories from the ImageNet collection.
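To make the three pose values concrete, the sketch below turns an (azimuth, elevation, distance) triple into a camera position in an object-centered frame. The axis convention used here (angles in degrees, z pointing up, azimuth measured in the x-y plane from the x axis) is an assumption chosen for illustration and may not match the convention in the dataset's own annotation files; `camera_position` is a hypothetical helper name.

```python
import math


def camera_position(azimuth_deg: float, elevation_deg: float, distance: float):
    """Convert a viewpoint to Cartesian coordinates around an object at the origin.

    Assumed convention (not taken from the dataset): degrees, z is up,
    azimuth is measured in the x-y plane starting from the x axis.
    """
    azimuth = math.radians(azimuth_deg)
    elevation = math.radians(elevation_deg)
    x = distance * math.cos(elevation) * math.cos(azimuth)
    y = distance * math.cos(elevation) * math.sin(azimuth)
    z = distance * math.sin(elevation)
    return (x, y, z)


# A camera level with the object, rotated a quarter turn, two units away.
position = camera_position(azimuth_deg=90.0, elevation_deg=0.0, distance=2.0)
print(position)
```

Whatever convention is used, the resulting point always lies at the annotated distance from the object, which is a handy sanity check when loading pose labels.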
20. Facial Deformable Models of Animals
The Facial Deformable Models of Animals (FDMA) project aims to challenge the methods currently used for facial landmark localization and tracking on human faces, and to develop new methods that can cope with the much larger variability that characterizes animal facial features.
Existing algorithms have shown that they can detect and track landmarks on human faces while handling variations caused by changes in facial expression or pose, partial occlusion, and illumination.
21. MPII Human Pose Dataset
The MPII Human Pose Dataset contains about 25K photos, of which 15K are training samples, 3K are validation samples, and 7K are test samples.
Poses are annotated by hand with up to 16 body joints, and the photos are taken from YouTube videos covering 410 different human activities.
22. UCF101
The UCF101 dataset contains 13,320 video clips classified into 101 categories. These 101 categories fall into five types: body motion, human-human interaction, human-object interaction, playing musical instruments, and sports.
The videos come from YouTube and add up to 27 hours of footage.
23. AudioSet
AudioSet is an audio event dataset made up of more than 2 million human-annotated 10-second video clips. The clips are annotated using a hierarchical ontology of 632 event classes, which means the same sound can be labeled in different ways.
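A small sketch can show why a hierarchical ontology lets one sound carry several valid labels. The three-node fragment below is hand-written for illustration and is not loaded from the actual ontology file; `PARENT` and `label_path` are hypothetical names.

```python
# Hand-written fragment of a label hierarchy (child -> parent).
# This is an illustrative stand-in, not the dataset's ontology file.
PARENT = {
    "Speech": "Human voice",
    "Human voice": "Human sounds",
    "Human sounds": None,  # top-level class
}


def label_path(label: str) -> list:
    """Return the chain of classes from the top of the hierarchy down to `label`."""
    path = []
    current = label
    while current is not None:
        path.append(current)
        current = PARENT[current]
    return path[::-1]


# A spoken clip could reasonably be tagged with any class along this path.
print(label_path("Speech"))
```

When evaluating a classifier on hierarchical labels, a common design choice is to treat a prediction of an ancestor class as partially correct rather than as a plain miss.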
24. Stanford Natural Language Inference
The SNLI (Stanford Natural Language Inference) dataset contains 570K sentence pairs that were labeled by hand as entailment, contradiction, or neutral.
The premises are image captions from Flickr30k, while the hypotheses were written by crowd-sourced annotators who were shown a premise and asked to produce entailing, contradicting, and neutral sentences.
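The structure of a single example is simple enough to sketch as a small record type. The sentences below are invented for illustration and are not taken from the dataset; `NLIPair` and `label_counts` are hypothetical names.

```python
from collections import Counter
from typing import NamedTuple

LABELS = {"entailment", "contradiction", "neutral"}


class NLIPair(NamedTuple):
    premise: str     # in SNLI this would be an image caption
    hypothesis: str  # a sentence written by an annotator
    label: str       # one of LABELS


# Invented examples: one premise paired with three hypotheses.
pairs = [
    NLIPair("A man plays a guitar on stage.", "A person is making music.", "entailment"),
    NLIPair("A man plays a guitar on stage.", "The stage is empty.", "contradiction"),
    NLIPair("A man plays a guitar on stage.", "The man is a famous musician.", "neutral"),
]


def label_counts(examples) -> Counter:
    """Count examples per label, rejecting anything outside the three classes."""
    for example in examples:
        if example.label not in LABELS:
            raise ValueError(f"unknown label: {example.label}")
    return Counter(example.label for example in examples)


print(label_counts(pairs))
```

Checking the label distribution like this is a cheap first step before training, since a heavily skewed split would make plain accuracy misleading.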
25. Visual Question Answering
Visual Question Answering (VQA) is a dataset containing open-ended questions about images. Answering these questions requires an understanding of vision, language, and commonsense knowledge.
Conclusion
As machine learning and artificial intelligence (AI) become more widespread in every industry and in our daily lives, the number of tools and the amount of information available on the subject will grow as well.
Ready-made public datasets offer a good starting point for developing AI models, and they also let experienced ML programmers save time and focus on other aspects of their projects.