Table of Contents
- 1. CelebFaces Attributes Dataset
- 2. DOTA
- 3. Google Facial Expression Comparison Dataset
- 4. Visual Genome
- 5. LibriSpeech
- 6. Cityscapes
- 7. Kinetics Dataset
- 8. CelebAMask-HQ
- 9. Penn Treebank
- 10. VoxCeleb
- 11. SIXray
- 12. US Accidents
- 13. Ocular Disease Recognition
- 14. Heart Disease
- 15. CLEVR
- 16. Universal Dependencies
- 17. KITTI-360
- 18. MOT (Multiple Object Tracking)
- 19. PASCAL3D+
- 20. Facial Deformable Models of Animals
- 21. MPII Human Pose Dataset
- 22. UCF101
- 23. AudioSet
- 24. Stanford Natural Language Inference
- 25. Visual Question Answering
- Conclusion
Nowadays, many of us focus on developing machine learning and AI models and on solving problems with existing datasets. But first, we need to define what a dataset is, why it matters, and the role it plays in building robust AI and ML solutions.
Today, a wealth of open datasets is available for conducting research or building applications that tackle real-world problems across many sectors.
However, the scarcity of large, well-annotated datasets remains a concern, even though the overall volume of data has grown enormously and will keep expanding rapidly.
In this post, we cover freely available datasets you can use for your next AI project.
1. CelebFaces Attributes Dataset
The CelebFaces Attributes Dataset (CelebA) contains more than 200K celebrity images, each annotated with 40 attributes, which makes it a good starting point for tasks such as face detection, face recognition, landmark (or facial part) localization, and face editing and synthesis. The images also cover large pose variations and heavy background clutter.
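CelebA's attribute labels are often distributed as a plain-text table in which each row holds an image filename followed by 40 values of 1 or -1. The sketch below assumes that layout (an image count on the first line and the attribute names on the second; verify this against your copy) and converts the labels to 0/1 so the share of images with each attribute can be computed. The three-column sample is a hand-made excerpt, not real data.

```python
def parse_celeba_attrs(lines):
    """Parse CelebA-style attribute lines into (names, {image: [0/1, ...]}).

    Assumed layout: line 0 = image count, line 1 = attribute names,
    remaining lines = "<image> v1 v2 ..." with values in {1, -1}.
    """
    names = lines[1].split()
    labels = {}
    for row in lines[2:]:
        parts = row.split()
        if not parts:
            continue  # skip blank lines
        image, values = parts[0], parts[1:]
        if len(values) != len(names):
            raise ValueError(f"{image}: expected {len(names)} values, got {len(values)}")
        # Map -1 -> 0 and 1 -> 1 so the labels suit binary losses.
        labels[image] = [1 if int(v) == 1 else 0 for v in values]
    return names, labels


def positive_rate(names, labels):
    """Fraction of images for which each attribute is present."""
    n = len(labels)
    return {name: sum(vec[i] for vec in labels.values()) / n for i, name in enumerate(names)}


# Toy example with 3 of the 40 attributes (values invented for illustration).
sample = [
    "2",
    "Eyeglasses Smiling Young",
    "000001.jpg -1  1  1",
    "000002.jpg  1 -1  1",
]
names, labels = parse_celeba_attrs(sample)
print(positive_rate(names, labels))  # {'Eyeglasses': 0.5, 'Smiling': 0.5, 'Young': 1.0}
```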
2. DOTA
DOTA (Dataset for Object deTection in Aerial images) is a large-scale benchmark for object detection in aerial images. It covers 15 common categories (for example, ship, plane, and vehicle) and provides 1,411 images for training and 458 images for validation.
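DOTA annotates objects with oriented boxes. A minimal sketch, assuming each object line lists four corner points followed by a category name and a difficulty flag (`x1 y1 x2 y2 x3 y3 x4 y4 category difficult`) and treating any shorter line as metadata, could parse an annotation file and derive the horizontal box that many detectors expect. `AerialObject` and `parse_dota_lines` are hypothetical helpers, not part of an official toolkit.

```python
from dataclasses import dataclass


@dataclass
class AerialObject:
    corners: list  # four (x, y) points of the oriented box
    category: str
    difficult: bool

    def axis_aligned_box(self):
        """Smallest horizontal box (xmin, ymin, xmax, ymax) enclosing the corners."""
        xs = [x for x, _ in self.corners]
        ys = [y for _, y in self.corners]
        return min(xs), min(ys), max(xs), max(ys)


def parse_dota_lines(lines):
    """Parse object lines; lines with fewer than 10 fields are treated as metadata."""
    objects = []
    for line in lines:
        parts = line.split()
        if len(parts) < 10:
            continue  # e.g. image source or ground sample distance headers
        coords = [float(v) for v in parts[:8]]
        corners = list(zip(coords[0::2], coords[1::2]))
        objects.append(AerialObject(corners, parts[8], parts[9] == "1"))
    return objects


objs = parse_dota_lines(["imagesource:GoogleEarth", "gsd:0.146", "10 20 50 25 45 60 5 55 plane 0"])
print(objs[0].category, objs[0].axis_aligned_box())
```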
3. Google Facial Expression Comparison Dataset
The Google Facial Expression Comparison dataset consists of around 500,000 image triplets built from 156,000 face images. Notably, each triplet in the dataset was annotated by at least six human raters.
The dataset is useful for tasks that involve analyzing facial expressions, such as expression-based image retrieval, emotion classification, and expression synthesis. To access it, you must fill out a short form.
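Because each triplet has several raters, a single training label has to be derived from their answers. The sketch below uses a simple majority vote and assumes each answer is encoded as the index (1, 2, or 3) of the image a rater judged least similar to the other two; that encoding and the `aggregate_triplet` helper are assumptions for illustration, and the dataset's actual annotation format may differ.

```python
from collections import Counter


def aggregate_triplet(votes, min_raters=6):
    """Majority-vote a triplet label from per-rater answers.

    votes: ints in {1, 2, 3} (assumed encoding: the image a rater marked
    as least similar). Returns (label, agreement), or None when there are
    too few raters or a tie for first place.
    """
    if len(votes) < min_raters:
        return None
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # no clear majority
    label, n = counts[0]
    return label, n / len(votes)


print(aggregate_triplet([3, 3, 3, 1, 3, 2]))
```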
4. Visual Genome
Visual Genome contains Visual Question Answering data in a multiple-choice setting. It consists of 101,174 images from MSCOCO with 1.7 million QA pairs, an average of 17 questions per image.
Compared with the Visual Question Answering dataset, Visual Genome has a more balanced distribution over six question types: What, Where, When, Who, Why, and How.
In addition, Visual Genome includes 108K images densely annotated with objects, attributes, and relationships.
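To check how balanced a set of questions is across the six types, a rough heuristic is to classify each question by its first word. This is a sketch under that assumption only: questions such as "What's...?" or "Is it...?" land in an `other` bucket instead of being parsed properly.

```python
from collections import Counter

QUESTION_TYPES = ("what", "where", "when", "who", "why", "how")


def question_type(question):
    """Return the question's leading wh-word, or 'other' if it is not one of the six."""
    words = question.strip().lower().split()
    first = words[0].strip("?,.") if words else ""
    return first if first in QUESTION_TYPES else "other"


def type_distribution(questions):
    """Share of questions per type (heuristic: classify by the first word only)."""
    if not questions:
        return {}
    counts = Counter(question_type(q) for q in questions)
    total = len(questions)
    return {t: counts[t] / total for t in (*QUESTION_TYPES, "other")}


print(type_distribution(["What color is the bus?", "Where is the dog?"]))
```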
5. LibriSpeech
The LibriSpeech corpus is a collection of roughly 1,000 hours of audiobooks from the LibriVox project. Most of the audiobooks come from Project Gutenberg.
The training data is split into three sets of 100, 360, and 500 hours, while the development and test sets each contain about 5 hours of audio.
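A back-of-the-envelope calculation helps when planning storage. LibriSpeech audio is sampled at 16 kHz; the sketch below additionally assumes 16-bit mono samples once decoded, and shows that the three training sets add up to 960 hours, which together with the development and test sets accounts for the roughly 1,000 hours quoted above. The split keys are illustrative labels, not the official split names.

```python
SAMPLE_RATE_HZ = 16_000  # LibriSpeech audio is sampled at 16 kHz
BYTES_PER_SAMPLE = 2     # assumption: 16-bit mono PCM after decoding

TRAIN_SPLITS_HOURS = {"100h": 100, "360h": 360, "500h": 500}


def decoded_size_gb(hours):
    """Approximate size in GB (10**9 bytes) of `hours` of decoded audio."""
    return hours * 3600 * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE / 1e9


total_train = sum(TRAIN_SPLITS_HOURS.values())
print(total_train)           # 960
print(decoded_size_gb(100))  # 11.52
```

Compressed (FLAC) files on disk are considerably smaller than this decoded estimate.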
6. Cityscapes
Cityscapes is one of the best-known large-scale datasets of stereo video sequences recorded in urban street scenes.
It contains recordings from 50 cities, most of them in Germany, with high-quality pixel-level annotations plus GPS coordinates, outside temperature, ego-motion data, and the corresponding stereo views.
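Pixel-level annotations like these are typically used to score semantic segmentation with per-class intersection over union (IoU). The pure-Python sketch below compares a predicted label map with a ground-truth map; treating label 255 as "ignore" follows a common Cityscapes training convention but is an assumption here, and real pipelines would use array libraries for speed.

```python
def per_class_iou(pred, target, num_classes, ignore_label=255):
    """IoU per class for two equally sized label maps (lists of rows).

    Pixels whose target equals ignore_label are skipped; classes absent
    from both maps get None instead of an IoU.
    """
    inter = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if t == ignore_label:
                continue
            union[t] += 1
            if p == t:
                inter[t] += 1
            elif 0 <= p < num_classes:
                union[p] += 1  # a false positive for the predicted class
    return [i / u if u else None for i, u in zip(inter, union)]


def mean_iou(ious):
    valid = [v for v in ious if v is not None]
    return sum(valid) / len(valid) if valid else 0.0


ious = per_class_iou([[0, 0], [1, 1]], [[0, 1], [1, 255]], num_classes=2)
print(ious, mean_iou(ious))
```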
7. Kinetics Dataset
Kinetics is one of the best-known large-scale, high-quality video datasets for human action recognition. Each of its 600 human action classes has at least 600 video clips, for roughly 500,000 clips in total.
The clips are taken from YouTube; each lasts about 10 seconds and is labeled with a single action class.
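Since every clip is about 10 seconds long, models usually see a fixed number of frames sampled from it. A common approach, sketched below, splits the clip into equal segments and takes the middle frame of each; the 25 fps frame rate in the example is an assumption, as source videos vary.

```python
def uniform_frame_indices(num_frames, num_samples):
    """Pick `num_samples` frame indices spread evenly across a clip.

    Each index is the centre of one of `num_samples` equal-length segments.
    """
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    segment = num_frames / num_samples
    return [min(num_frames - 1, int(segment * (i + 0.5))) for i in range(num_samples)]


# A 10-second clip at an assumed 25 fps has 250 frames.
print(uniform_frame_indices(250, 8))  # [15, 46, 78, 109, 140, 171, 203, 234]
```

When a clip has fewer frames than requested samples, indices repeat, which keeps the output length fixed.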
8. CelebAMask-HQ
CelebAMask-HQ is a collection of 30,000 high-resolution face images with fine-grained mask annotations for 19 classes, covering facial components and accessories such as skin, nose, eyes, eyebrows, ears, mouth, lips, hair, hat, eyeglasses, earrings, necklace, neck, and clothing.
The dataset can be used to train and evaluate algorithms for face parsing, face recognition, and GANs for face generation and editing.
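Per-part masks are often stored as one binary image per facial component, which then has to be merged into a single label map for face parsing. The sketch below assumes the masks are already loaded as nested lists of 0/1 values; the part subset, their label ids, and the overwrite order are hypothetical choices, not the dataset's official mapping.

```python
# Hypothetical subset of part names and label ids (0 = background).
PART_LABELS = {"skin": 1, "nose": 2, "hair": 3, "hat": 4}


def merge_part_masks(part_masks, height, width, order=("skin", "nose", "hair", "hat")):
    """Combine per-part binary masks into one label map.

    part_masks maps a part name to a height x width list of 0/1 rows.
    Parts later in `order` overwrite earlier ones where masks overlap.
    """
    label_map = [[0] * width for _ in range(height)]
    for name in order:
        mask = part_masks.get(name)
        if mask is None:
            continue  # not every image has every part (e.g. no hat)
        value = PART_LABELS[name]
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    label_map[y][x] = value
    return label_map


print(merge_part_masks({"skin": [[1, 1], [1, 0]], "hair": [[0, 1], [0, 0]]}, 2, 2))
```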
9. Penn Treebank
The English Penn Treebank (PTB) corpus, and in particular the portion containing Wall Street Journal articles, is one of the best-known and most widely used corpora for evaluating part-of-speech tagging models.
In the tagging task, every word must be labeled with its part of speech. The corpus is also frequently used for character-level and word-level language modeling.
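A standard sanity check before training a part-of-speech tagger is the most-frequent-tag baseline: tag each known word with the tag it received most often in training. The sketch below uses a few hand-made (word, tag) pairs in PTB tag notation rather than the real corpus, and defaulting unknown words to `NN` is a simplifying assumption.

```python
from collections import Counter, defaultdict


def train_baseline(tagged_sentences):
    """Learn each word's most frequent tag from lists of (word, tag) pairs."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def tag(model, words, default="NN"):
    # Unknown words fall back to a single default tag.
    return [model.get(w.lower(), default) for w in words]


def accuracy(model, tagged_sentences):
    correct = total = 0
    for sentence in tagged_sentences:
        words, gold = zip(*sentence)
        for predicted, expected in zip(tag(model, words), gold):
            correct += predicted == expected
            total += 1
    return correct / total


train = [
    [("The", "DT"), ("stock", "NN"), ("rose", "VBD")],
    [("Shares", "NNS"), ("rose", "VBD"), ("sharply", "RB")],
]
model = train_baseline(train)
print(accuracy(model, [[("The", "DT"), ("market", "NN"), ("fell", "VBD")]]))
```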
10. VoxCeleb
VoxCeleb is a large-scale speaker identification dataset generated automatically from open-source media. It contains more than one million utterances from over 6,000 speakers.
Because the data is audio-visual, it can also support a wider range of applications, including visual speech synthesis, speech separation, cross-modal transfer from face to voice (or vice versa), and training face recognition models on video to complement existing face recognition datasets.
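For speaker verification on data like VoxCeleb, a common approach compares fixed-length speaker embeddings of two utterances with cosine similarity and accepts the pair above a threshold. The sketch assumes the embeddings come from some speaker model not shown here, and the 0.7 threshold is a hypothetical placeholder that would normally be tuned on held-out trials.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero-length embedding")
    return dot / (norm_a * norm_b)


def same_speaker(emb_a, emb_b, threshold=0.7):
    """Accept the trial if the two utterance embeddings are similar enough.

    The threshold is hypothetical; tune it on same- and different-speaker trials.
    """
    return cosine_similarity(emb_a, emb_b) >= threshold


print(same_speaker([1.0, 0.0, 0.0], [0.9, 0.1, 0.0]))
```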
11. SIXray
The SIXray dataset contains 1,059,231 X-ray images collected from subway stations and annotated by human security inspectors for six common categories of prohibited items: guns, knives, wrenches, pliers, scissors, and hammers. In addition, bounding boxes for every prohibited item were added manually to the test set to evaluate object localization performance.
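Because one X-ray image can contain several kinds of prohibited items, or none, image-level labels are naturally multi-label. The sketch below encodes them as a multi-hot vector over the six categories; the lower-case class identifiers are an assumption, not necessarily the names used in the released files.

```python
SIXRAY_CLASSES = ("gun", "knife", "wrench", "pliers", "scissors", "hammer")


def multi_hot(items):
    """Encode the prohibited items found in one X-ray image as a 0/1 vector."""
    unknown = set(items) - set(SIXRAY_CLASSES)
    if unknown:
        raise ValueError(f"unknown classes: {sorted(unknown)}")
    present = set(items)
    return [1 if name in present else 0 for name in SIXRAY_CLASSES]


print(multi_hot(["knife", "hammer"]))
```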
12. US Accidents
As the name suggests, US Accidents is a dataset of traffic accidents in the United States. This countrywide car accident dataset covers February 2016 to December 2021 and spans 49 US states.
The collection currently holds about 1.5 million accident records, gathered in real time through multiple traffic APIs.
These APIs broadcast traffic data captured by a variety of sources, including traffic cameras, law enforcement agencies, and US and state departments of transportation.
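A typical first step with accident records is aggregating them by location and time. The sketch assumes a CSV export with `State` and `Start_Time` columns (check the column names in the version you download) and counts records per state and year; the three sample rows are invented.

```python
import csv
import io
from collections import Counter


def accidents_by_state_and_year(csv_text):
    """Count accident records per (State, year) from CSV text.

    Assumes 'State' and 'Start_Time' columns, with Start_Time beginning
    with a four-digit year (e.g. '2016-02-08 05:46:00').
    """
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        year = int(row["Start_Time"][:4])
        counts[(row["State"], year)] += 1
    return counts


sample = """ID,Severity,Start_Time,State
A-1,3,2016-02-08 05:46:00,OH
A-2,2,2016-02-08 06:07:59,OH
A-3,2,2021-12-31 23:10:00,CA
"""
print(accidents_by_state_and_year(sample))
```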
13. Ocular Disease Recognition
Ocular Disease Intelligent Recognition (ODIR) is a structured ophthalmic database of 5,000 patients that includes each patient's age, color fundus photographs of the left and right eyes, and doctors' diagnostic keywords.
This real-world collection of patient records was gathered by Shanggong Medical Technology Co., Ltd. from various hospitals and medical centers in China. The annotations were labeled by trained human readers under quality-control management.
14. Heart Disease
This heart disease dataset is used to detect the presence of heart disease in a patient based on 76 attributes, such as age, sex, chest pain type, and resting blood pressure.
The dataset contains 303 cases, and work on it focuses solely on distinguishing the presence of disease (values 1, 2, 3, 4) from its absence (value 0).
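The presence/absence framing above amounts to collapsing the multi-valued target into a binary label. The sketch below assumes comma-separated rows with the target in the last column and `?` marking missing values; skipping incomplete rows is a simplification (imputation is a common alternative), and the shortened sample rows are invented.

```python
def binarize_target(num):
    """Map the target to 1 (disease present, values 1-4) or 0 (absent, value 0)."""
    value = int(float(num))
    if value not in (0, 1, 2, 3, 4):
        raise ValueError(f"unexpected target value: {num!r}")
    return 0 if value == 0 else 1


def load_rows(lines):
    """Parse comma-separated rows whose last field is the target.

    Rows containing '?' (a missing value) are skipped for simplicity.
    """
    features, labels = [], []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue
        fields = stripped.split(",")
        if "?" in fields:
            continue
        features.append([float(v) for v in fields[:-1]])
        labels.append(binarize_target(fields[-1]))
    return features, labels


# Shortened, invented rows (real rows have more feature columns).
sample = ["63.0,1.0,1.0,145.0,0", "67.0,1.0,4.0,160.0,2", "56.0,1.0,?,120.0,0"]
features, labels = load_rows(sample)
print(labels)  # [0, 1]
```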
15. CLEVR
CLEVR (Compositional Language and Elementary Visual Reasoning) is a synthetic Visual Question Answering dataset. It contains images of 3D-rendered objects, and each image comes with a set of highly compositional questions that fall into several categories.
The dataset contains 70,000 images and about 700,000 questions for training, 15,000 images and 150,000 questions for validation, and 15,000 images and 150,000 questions for testing. Objects, answers, scene graphs, and functional programs are provided for all training and validation images and questions.
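CLEVR's functional programs describe how to answer a question as a chain of simple operations over the objects in a scene. The toy interpreter below illustrates the idea with a simplified program encoding of its own (`filter`, `count`, `exist`); it is not the dataset's actual program format, and the scene is invented.

```python
# A toy scene: each object carries CLEVR-style attributes (values invented).
scene = [
    {"color": "red", "shape": "cube", "size": "large", "material": "rubber"},
    {"color": "red", "shape": "sphere", "size": "small", "material": "metal"},
    {"color": "blue", "shape": "cube", "size": "small", "material": "metal"},
]

# "How many red cubes are there?" as a chain of simple operations.
program = [("filter", "color", "red"), ("filter", "shape", "cube"), ("count",)]


def execute(program, scene):
    objects = list(scene)  # start from every object in the scene
    for step in program:
        op = step[0]
        if op == "filter":
            _, attribute, value = step
            objects = [o for o in objects if o[attribute] == value]
        elif op == "count":
            return len(objects)
        elif op == "exist":
            return len(objects) > 0
        else:
            raise ValueError(f"unsupported operation: {op}")
    return objects


print(execute(program, scene))  # 1
```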
16. Universal Dependencies
The Universal Dependencies (UD) project aims to develop cross-linguistically consistent treebank annotation for many languages. Version 2.7, released in 2020, contains 183 treebanks in 104 languages.
The annotation consists of universal part-of-speech (UPOS) tags, dependency heads, and universal dependency labels.
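UD treebanks are distributed in the CoNLL-U format, where each token line has ten tab-separated fields. A minimal sketch that extracts the form, universal POS tag, head, and dependency label for one sentence might look like this; the example sentence is hand-made.

```python
def parse_conllu_sentence(block):
    """Extract (form, upos, head, deprel) tuples from one CoNLL-U sentence.

    Fields: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC.
    Comment lines start with '#'; multiword token ranges (e.g. '1-2') and
    empty nodes (e.g. '8.1') are skipped.
    """
    tokens = []
    for line in block.strip().splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 10:
            raise ValueError(f"expected 10 fields, got {len(fields)}: {line!r}")
        token_id = fields[0]
        if "-" in token_id or "." in token_id:
            continue
        tokens.append((fields[1], fields[3], int(fields[6]), fields[7]))
    return tokens


rows = [
    "# text = Dogs bark.",
    "\t".join(["1", "Dogs", "dog", "NOUN", "NNS", "Number=Plur", "2", "nsubj", "_", "_"]),
    "\t".join(["2", "bark", "bark", "VERB", "VBP", "_", "0", "root", "_", "_"]),
    "\t".join(["3", ".", ".", "PUNCT", ".", "_", "2", "punct", "_", "_"]),
]
print(parse_conllu_sentence("\n".join(rows)))
```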
17. KITTI-360
KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) is one of the most widely used datasets for mobile robotics and autonomous driving.
It contains hours of traffic scenarios recorded with a range of sensor modalities, including high-resolution RGB cameras, grayscale stereo cameras, and a 3D laser scanner. Over time, many researchers have extended the dataset by manually annotating parts of it to suit their needs.
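The laser-scanner data in KITTI is commonly read from binary files. The sketch below assumes each scan is a flat sequence of little-endian float32 values, four per point (x, y, z, reflectance); with NumPy the same layout is often loaded with `np.fromfile(path, dtype=np.float32).reshape(-1, 4)`.

```python
import struct


def read_velodyne_points(data):
    """Decode a KITTI-style laser scan from raw bytes.

    Assumes a flat sequence of little-endian float32 values,
    four per point: x, y, z, reflectance.
    """
    point_size = 4 * 4  # four float32 values
    if len(data) % point_size:
        raise ValueError("byte length is not a multiple of 16")
    return [struct.unpack_from("<4f", data, offset) for offset in range(0, len(data), point_size)]


# Usage with a file (path is hypothetical):
# with open("scan.bin", "rb") as f:
#     points = read_velodyne_points(f.read())
```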
18. MOT (Multiple Object Tracking)
MOT (Multiple Object Tracking) is a dataset for tracking multiple objects in indoor and outdoor scenes of public places, with pedestrians as the objects of interest. The video for each scene is split into two clips, one for training and one for testing.
The dataset also provides object detections in the video frames from three detectors: SDP, Faster R-CNN, and DPM.
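Tracking annotations and detections are typically CSV rows with one box per object per frame. Assuming the MOTChallenge layout that begins with `frame, id, bb_left, bb_top, bb_width, bb_height, conf`, the sketch below groups rows into per-identity trajectories. In detection files the id column is usually -1, so grouping is only meaningful for ground truth or tracker output.

```python
import csv
import io
from collections import defaultdict


def load_tracks(csv_text):
    """Group MOTChallenge-style rows by track id.

    Returns {track_id: [(frame, (left, top, width, height)), ...]} sorted by frame.
    """
    tracks = defaultdict(list)
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue
        frame, track_id = int(row[0]), int(row[1])
        box = tuple(float(v) for v in row[2:6])
        tracks[track_id].append((frame, box))
    for boxes in tracks.values():
        boxes.sort(key=lambda item: item[0])
    return dict(tracks)


print(load_tracks("2,1,104,201,40,90,1\n1,1,100,200,40,90,1\n1,2,300,180,35,85,1\n"))
```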
19. PASCAL3D+
The PASCAL3D+ multi-view dataset consists of images in the wild, that is, images of object categories with high variability, captured in uncontrolled settings, in cluttered scenes, and under many different poses. PASCAL3D+ contains 12 categories of rigid objects selected from the PASCAL VOC 2012 dataset.
These objects are annotated with pose information (azimuth, elevation, and distance to the camera). PASCAL3D+ also adds pose-annotated images of these 12 categories from the ImageNet dataset.
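Azimuth, elevation, and distance together describe where the camera sits relative to the object. The sketch below converts them to a Cartesian camera position using one common spherical convention; the axis orientation and the assumption that angles are given in degrees should both be checked against the dataset's own toolbox before use.

```python
import math


def camera_position(azimuth_deg, elevation_deg, distance):
    """Camera centre in the object frame from an (azimuth, elevation, distance) viewpoint.

    Assumed convention: azimuth rotates about the vertical z-axis and
    elevation raises the camera above the x-y plane; angles in degrees.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = distance * math.cos(el) * math.sin(az)
    y = -distance * math.cos(el) * math.cos(az)
    z = distance * math.sin(el)
    return x, y, z


print(camera_position(0, 0, 5))
```

Whatever the axis convention, the returned point always lies at the given distance from the object's origin.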
20. Facial Deformable Models of Animals
The Facial Deformable Models of Animals (FDMA) project aims to push beyond current methods for human facial landmark detection and tracking, and to develop new algorithms that can cope with the much larger variation that characterizes animal faces.
Existing algorithms have shown that they can detect and track landmarks on human faces under variations caused by changes in facial expression or pose, partial occlusion, and illumination.
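Landmark detectors are commonly evaluated with the normalized mean error (NME): the average distance between predicted and ground-truth landmarks divided by a reference length, such as the distance between the eyes. The sketch below implements that general metric; which landmark pair defines the reference distance is an assumption, and FDMA's own evaluation protocol may differ.

```python
import math


def normalized_mean_error(predicted, ground_truth, left_idx, right_idx):
    """Mean landmark error divided by a reference distance.

    predicted / ground_truth: lists of (x, y) landmarks in the same order.
    The reference is the distance between two ground-truth landmarks
    (e.g. the outer eye corners); the choice of pair is an assumption.
    """
    if len(predicted) != len(ground_truth) or not ground_truth:
        raise ValueError("landmark lists must be non-empty and equal length")
    reference = math.dist(ground_truth[left_idx], ground_truth[right_idx])
    errors = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(errors) / len(errors) / reference


gt = [(0, 0), (10, 0), (5, 5)]
pred = [(0, 0), (10, 0), (5, 8)]
print(normalized_mean_error(pred, gt, left_idx=0, right_idx=1))
```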
21. MPII Human Pose Dataset
The MPII Human Pose Dataset contains about 25K images, of which 15K are training samples, 3K are validation samples, and 7K are test samples.
The poses were manually annotated with 16 body joints, and the images were taken from YouTube videos covering 410 different human activities.
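Pose estimates on MPII are usually reported with PCKh, the percentage of joints predicted within a fraction (typically 0.5) of the person's head segment length. The sketch below computes it for one person, assuming the head size has already been derived from the annotations.

```python
import math


def pckh(predicted, ground_truth, head_size, alpha=0.5, visible=None):
    """Fraction of joints within alpha * head_size of the ground truth.

    predicted / ground_truth: lists of (x, y), e.g. the 16 joints of one person.
    visible: optional list of booleans; invisible joints are not scored.
    """
    if visible is None:
        visible = [True] * len(ground_truth)
    threshold = alpha * head_size
    scored = [
        math.dist(p, g) <= threshold
        for p, g, v in zip(predicted, ground_truth, visible)
        if v
    ]
    return sum(scored) / len(scored) if scored else 0.0


gt = [(0, 0), (10, 10), (20, 20), (30, 30)]
pred = [(1, 0), (10, 14), (20, 26), (30, 30)]
print(pckh(pred, gt, head_size=10))
```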
22. UCF101
The UCF101 dataset contains 13,320 video clips organized into 101 categories. These 101 categories are grouped into five types: body motion, human-human interaction, human-object interaction, playing musical instruments, and sports.
The videos come from YouTube and total 27 hours.
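UCF101 clips are commonly named `v_<Action>_g<group>_c<clip>.avi`, and clips from the same group are reported to share features such as background and viewpoint. The sketch below assumes that naming pattern, parses the file names, and keeps whole groups on one side of a train/test split so near-duplicate clips do not leak into the test set. The `test_groups` argument is a hypothetical parameter; the dataset ships its own recommended splits.

```python
import re

# Assumed clip naming pattern: v_<Action>_g<group>_c<clip>.avi
CLIP_NAME = re.compile(r"v_(?P<action>\w+?)_g(?P<group>\d+)_c(?P<clip>\d+)\.avi")


def parse_clip_name(path):
    name = path.rsplit("/", 1)[-1]
    match = CLIP_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognised clip name: {path}")
    return match["action"], int(match["group"]), int(match["clip"])


def split_by_group(paths, test_groups):
    """Send every clip from a held-out group to the test set."""
    train, test = [], []
    for path in paths:
        _, group, _ = parse_clip_name(path)
        (test if group in test_groups else train).append(path)
    return train, test


print(parse_clip_name("ApplyEyeMakeup/v_ApplyEyeMakeup_g08_c01.avi"))
```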
23. AudioSet
AudioSet is an audio event dataset of more than two million human-annotated 10-second video clips. The clips are labeled using a hierarchical ontology of 632 event classes, which means the same sound can carry several labels at different levels of the hierarchy.
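Because the ontology is hierarchical, a clip labeled with a specific class implicitly belongs to its ancestor classes as well. The sketch below expands labels up the tree using a tiny, illustrative child-to-parent map; the real ontology is distributed separately and is much larger.

```python
# A small, illustrative slice of a hierarchical ontology: child -> parent.
PARENT = {
    "Dog": "Domestic animals, pets",
    "Cat": "Domestic animals, pets",
    "Domestic animals, pets": "Animal",
}


def expand_labels(labels):
    """Add every ancestor of each label, so 'Dog' also implies its parents."""
    expanded = set()
    for label in labels:
        # Stop early once we reach a label whose ancestors are already included.
        while label is not None and label not in expanded:
            expanded.add(label)
            label = PARENT.get(label)
    return expanded


print(sorted(expand_labels(["Dog"])))
```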
24. Stanford Natural Language Inference
The SNLI (Stanford Natural Language Inference) dataset contains 570k sentence pairs manually labeled as entailment, contradiction, or neutral.
The premises are image captions from Flickr30k, while the hypotheses were written by crowdsourced annotators who were shown a premise and asked to produce entailing, contradicting, and neutral sentences.
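SNLI is commonly distributed as JSON lines. The sketch assumes each record has `sentence1` (premise), `sentence2` (hypothesis), and `gold_label` fields, with `-` marking pairs where the annotators reached no consensus, and drops those pairs. The integer ids in `LABELS` are an arbitrary choice.

```python
import json

LABELS = {"entailment": 0, "neutral": 1, "contradiction": 2}


def load_snli(jsonl_text):
    """Return (premise, hypothesis, label_id) triples from JSON-lines text."""
    examples = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        label = record["gold_label"]
        if label not in LABELS:
            continue  # '-' means no annotator consensus
        examples.append((record["sentence1"], record["sentence2"], LABELS[label]))
    return examples


sample = "\n".join(json.dumps(r) for r in [
    {"sentence1": "A dog runs.", "sentence2": "An animal moves.", "gold_label": "entailment"},
    {"sentence1": "A dog runs.", "sentence2": "A cat sleeps.", "gold_label": "-"},
    {"sentence1": "A man sings.", "sentence2": "Nobody sings.", "gold_label": "contradiction"},
])
print(load_snli(sample))
```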
25. Visual Question Answering
Visual Question Answering (VQA) is a dataset of open-ended questions about images. Answering these questions requires an understanding of vision, language, and commonsense knowledge.
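Because the questions are open-ended and several people answer each one, VQA is usually scored with a consensus-based accuracy that gives full credit when at least three annotators gave the predicted answer. The sketch below implements that simplified rule; the official evaluation also normalizes answers more thoroughly and averages over subsets of annotators.

```python
def _normalise(answer):
    # Simplified normalisation: trim whitespace and lower-case only.
    return answer.strip().lower()


def vqa_accuracy(predicted, human_answers):
    """Simplified VQA accuracy: min(#matching human answers / 3, 1)."""
    target = _normalise(predicted)
    matches = sum(_normalise(a) == target for a in human_answers)
    return min(matches / 3, 1.0)


answers = ["2"] * 2 + ["two"] * 8
print(vqa_accuracy("Two", answers))
```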
Conclusion
As machine learning and artificial intelligence (AI) spread into almost every business and into our daily lives, the number of resources and datasets available on the subject keeps growing.
Public datasets offer a great starting point for building AI models, and they let experienced ML practitioners save time and focus on other parts of their projects.