Advanced analytics and machine learning programs run on data, but getting access to that data for research can be difficult because of privacy challenges and business processes.
Synthetic data, which can be shared and used in ways that real data cannot, is a promising new alternative. However, this new approach is not free of risks or drawbacks, so businesses need to think carefully about where and how they use it.
In the AI era, one could also say that data is the new oil, but only a select few are sitting on a gusher. As a result, many people produce their own fuel, one that is affordable and effective. It is known as synthetic data.
In this post, we take a broad look at synthetic data: why you should use it, how it is created, what makes it different from real data, which use cases it suits, and more.
So, what is synthetic data?
When real datasets fall short in quality, quantity, or variety, synthetic data can be used to train AI models in place of real historical data.
When existing data does not meet business requirements, or carries privacy risks when used to develop machine learning models, test software, and the like, synthetic data can be a valuable tool for a business's AI efforts.
Simply put, synthetic data is commonly used as a substitute for real data. More precisely, it is data that is artificially generated and annotated by simulations or computer algorithms.
Synthetic data is information produced artificially by a computer program rather than captured from real-world events. Companies can add synthetic data to their training data to cover use cases and scenarios their real data misses, reduce data collection costs, or comply with privacy regulations.
Synthetic data is now more accessible than ever thanks to greater processing power and data storage options such as the cloud. It makes it easier to build AI solutions that deliver more value to end users, which is a welcome development.
How important is synthetic data, and why should you use it?
When training AI models, developers typically need large datasets with accurate labels. Neural networks become more accurate when they are trained on more diverse data.
Collecting and labeling large datasets containing anywhere from hundreds to millions of items, however, can take a prohibitive amount of time and money. Synthetic data can sharply reduce the cost of producing training data. For example, a training image that costs $5 from a data labeling provider might cost only $0.05 when generated synthetically.
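To put the per-image prices above in perspective, here is a quick back-of-the-envelope calculation. The $5 and $0.05 figures come from the example in the text; the dataset size of 100,000 images is a hypothetical choice, and the sketch ignores the up-front cost of building a generator.

```python
# Per-image prices from the example above; the dataset size is hypothetical.
LABELED_PRICE_USD = 5.00     # image bought from a data labeling provider
SYNTHETIC_PRICE_USD = 0.05   # image generated synthetically


def dataset_cost(num_images: int, price_per_image: float) -> float:
    """Total cost of a dataset at a flat per-image price."""
    return num_images * price_per_image


num_images = 100_000  # hypothetical dataset size
labeled = dataset_cost(num_images, LABELED_PRICE_USD)
synthetic = dataset_cost(num_images, SYNTHETIC_PRICE_USD)

print(f"labeled:    ${labeled:,.2f}")
print(f"synthetic:  ${synthetic:,.2f}")
print(f"cost ratio: {labeled / synthetic:.0f}x")
```

With flat per-image prices the ratio stays at 100x whatever the dataset size; in practice, the expertise and time needed to build a good generator narrow that gap.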
Synthetic data can reduce costs while also easing privacy concerns around potentially sensitive data collected in the real world.
Compared with real data, which may not accurately reflect the full range of real-world conditions, synthetic data can help reduce bias. By supplying rare events that are plausible but hard to capture in authentic data, it can add much greater variety.
Synthetic data may be a good fit for your project for the reasons listed below:
1. Model robustness
Give your models access to far more varied data without having to collect it. With synthetic data, you can train your model on variants of the same person with different haircuts, facial hair, glasses, head poses, and so on, as well as different skin tones, ethnic features, bone structures, freckles, and other traits, producing unique faces that make the model more robust.
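One simple way to picture this is to enumerate combinations of attributes, with each combination becoming one request to a synthetic face generator. The attribute names and values below are hypothetical placeholders, and no particular rendering tool is assumed; the sketch only builds the list of variant specifications.

```python
import itertools

# Hypothetical attribute values; a real pipeline would pass each combination
# to a face-rendering or generative model (not shown here).
ATTRIBUTES = {
    "hairstyle": ["short", "long", "curly"],
    "facial_hair": ["none", "beard"],
    "glasses": [False, True],
    "head_pose_deg": [-30, 0, 30],
    "skin_tone": ["I", "III", "V", "VI"],  # e.g. Fitzpatrick-style categories
}


def variant_specs(attributes):
    """Yield one dict per combination of attribute values."""
    keys = list(attributes)
    for values in itertools.product(*(attributes[k] for k in keys)):
        yield dict(zip(keys, values))


specs = list(variant_specs(ATTRIBUTES))
print(len(specs))  # 3 * 2 * 2 * 3 * 4 combinations for one identity
print(specs[0])
```

Because the number of variants is the product of the option counts, adding even one more attribute multiplies the variety available to the model.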
2. Edge cases are accounted for
Machine learning algorithms favor balanced datasets. Think back to our facial recognition example: those companies' models would have been more accurate (and in fact, some of these businesses are doing exactly this), and more ethical, had they generated synthetic data of darker-skinned faces to fill the gaps in their data. With synthetic data, teams can cover a far wider range of use cases, including scenarios where data is scarce or nonexistent.
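For tabular data, a common way to fill such gaps is to oversample the under-represented group with synthetic points. The sketch below follows the idea behind SMOTE (Synthetic Minority Over-sampling Technique): each new point lies at a random position on the segment between a minority sample and one of its nearest minority neighbours. It is a simplified illustration with made-up feature values, not a replacement for a library implementation such as the SMOTE class in imbalanced-learn.

```python
import math
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points by interpolating between a randomly
    chosen minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by distance to `base`.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(p, base))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical 2-D features for an under-represented group.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_like(minority, n_new=6)
print(len(new_points), new_points[0])
```

Interpolating between real minority samples keeps the new points inside the region the group already occupies, which adds volume without inventing implausible values.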
3. It can be obtained much faster than "real" data
Teams can generate large volumes of synthetic data quickly. This is especially useful when real-life data depends on rare events. When collecting data for a self-driving car, for example, teams may struggle to gather enough real-world data on extreme road conditions simply because those conditions occur so rarely. To speed up the laborious annotation process, data scientists can also use algorithms that label synthetic data automatically as it is generated.
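Because the generator decides what goes into each sample, it can emit the label at the same moment, with no manual annotation step. The sketch below assumes a hypothetical driving-scenario generator whose parameters (weather, visibility, obstacle) double as ground truth; the weights and the "hazard" rule are invented for illustration, not a real labeling policy.

```python
import random
from dataclasses import dataclass


@dataclass
class Scenario:
    weather: str
    visibility_m: float
    obstacle: bool
    label: str  # assigned by the generator, not by a human annotator


def generate_scenarios(n, seed=0):
    rng = random.Random(seed)
    # Over-weight the rare, hard conditions that are difficult to capture on real roads.
    weathers = ["clear", "rain", "fog", "snow"]
    weights = [0.1, 0.3, 0.3, 0.3]
    out = []
    for _ in range(n):
        weather = rng.choices(weathers, weights=weights)[0]
        visibility = rng.uniform(20, 80) if weather == "fog" else rng.uniform(100, 1000)
        obstacle = rng.random() < 0.5
        # Hypothetical labeling rule: the generator already knows the ground truth.
        label = "hazard" if obstacle and visibility < 100 else "normal"
        out.append(Scenario(weather, visibility, obstacle, label))
    return out


batch = generate_scenarios(1000)
print(sum(s.label == "hazard" for s in batch), "hazard scenarios labeled automatically")
```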
4. It protects private user information
Depending on the business and the type of data, companies can face security problems when handling sensitive data. In the healthcare industry, for example, patient data often includes personal health information (PHI), which must be handled with strict security.
Because synthetic data does not describe real individuals, privacy concerns are reduced. Consider synthetic data as an alternative if your team must comply with specific data privacy regulations.
Real data vs. synthetic data
Real data is collected or measured in the real world. This kind of data is created the instant someone uses a smartphone, laptop, or computer, wears a smartwatch, visits a website, or does anything online.
Real data can also come from surveys, both online and offline. Synthetic data, by contrast, is produced in digital environments. Although no part of it is captured from real-world events, synthetic data is built to closely mimic real data in its important properties.
Using synthetic data in place of real data is a promising idea because it can supply the training data that machine learning models require. But it is far from certain that artificial data can solve every problem that arises in the real world.
Use cases
Synthetic data is useful for a range of commercial purposes, including model training, model validation, and testing new products. Here are a few of the fields leading its use in machine learning:
1. Healthcare
Given how sensitive its data is, the healthcare industry is well suited to synthetic data. Teams can use synthetic data to model the physiology of a wide range of patient types, helping diagnose diseases faster and more accurately.
Google's melanoma detection model is an interesting example: it incorporates synthetic data of people with darker skin (an area unfortunately underrepresented in clinical data) so that the model performs well across all skin types.
2. Automotive
Companies building self-driving cars often use simulators to test performance. In bad weather, for example, collecting real road data can be dangerous or difficult.
Relying on live tests with real cars on real roads is generally not a good idea, because there are too many variables to account for across all the different driving scenarios.
3. Data portability
Organizations need reliable, secure ways to share their training data with others. Masking personally identifiable information (PII) before making a dataset public is another interesting application of synthetic data. When datasets from scientific research, medical records, sociodemographic studies, and other fields that may contain PII are exchanged in this form, they are known as privacy-preserving synthetic data.
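A minimal sketch of that idea: each PII field is replaced with a synthetic surrogate before the data is shared, and the same real value always maps to the same surrogate so records can still be linked. The field names, the secret key, and the surrogate format are hypothetical. Strictly speaking this is pseudonymization, since the non-PII columns are still real records; a keyed HMAC is used instead of a plain hash so that outsiders cannot recover names simply by hashing guesses.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-secret"  # hypothetical; never ship it with the data
PII_FIELDS = ("name", "email")          # hypothetical schema


def surrogate(field, value, key=SECRET_KEY):
    """Deterministic synthetic stand-in for a PII value (same input, same output)."""
    digest = hmac.new(key, f"{field}:{value}".encode(), hashlib.sha256).hexdigest()[:8]
    if field == "email":
        return f"user_{digest}@example.com"
    return f"person_{digest}"


def mask_record(record):
    """Replace PII fields with surrogates; leave all other fields untouched."""
    return {k: surrogate(k, v) if k in PII_FIELDS else v for k, v in record.items()}


rows = [
    {"name": "Jane Doe", "email": "jane@corp.test", "age": 41, "diagnosis": "J45"},
    {"name": "Jane Doe", "email": "jane@corp.test", "age": 41, "diagnosis": "E11"},
]
shared = [mask_record(r) for r in rows]
print(shared[0])
```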
4. Security
Synthetic data can make organizations considerably more secure. Returning to our facial recognition example, you may be familiar with the term "deep fakes", which describes fabricated images or videos. Businesses can generate deep fakes to test their facial recognition and security systems. Synthetic data is also used in video surveillance to train models faster and at lower cost.
Synthetic data and machine learning
To build a robust and reliable model, machine learning algorithms need a substantial amount of data to process. Without synthetic data, producing such a large volume of data can be challenging.
Synthetic data can be especially valuable in domains such as computer vision and image processing, where model development is accelerated by generating synthetic data early on. A notable development in image recognition is the use of Generative Adversarial Networks (GANs), which typically combine two networks: a generator and a discriminator.
While the discriminator network tries to tell real images from fake ones, the generator network works to produce synthetic images that closely resemble real-world images.
In machine learning, GANs belong to the broader family of neural networks. The two networks are trained together, and each keeps improving in response to the other.
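The adversarial setup described above is commonly written as a minimax objective. Here $D(x)$ is the discriminator's estimated probability that $x$ is real, $G(z)$ is the generator's output for a random noise vector $z$, $p_{\text{data}}$ is the distribution of real data, and $p_z$ is the noise distribution. This is the standard textbook formulation, shown for orientation rather than as the setup of any specific product:

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator tries to increase $V$ by scoring real samples close to 1 and generated samples close to 0; the generator tries to decrease it by making $D(G(z))$ close to 1.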
When creating synthetic data, you can change the environment and type of data as needed to improve model performance. Accurate labels come cheaply with synthetic data, because the generator already knows what each sample contains, whereas accurately labeled real-world data can be very expensive.
How do you create synthetic data?
The methods used to build a synthetic dataset are as follows:
Based on statistical distributions
The strategy here is to draw numbers from a distribution, or to study the real statistical distributions, and create fake data that looks comparable. In some cases, no real data exists at all.
A data scientist who understands the statistical distribution of the real data well can produce a dataset containing a random sample from any distribution. The normal, exponential, chi-square, and lognormal distributions, among others, are examples of probability distributions that can be used for this.
The data scientist's level of knowledge about the situation will strongly affect the accuracy of the trained model.
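A minimal sketch using Python's standard `random` and `statistics` modules: estimate the parameters of an assumed distribution from a (hypothetical) real sample, then draw as many synthetic values as needed. Assuming the data is normal is exactly the kind of judgment the paragraph above describes; the other distributions named earlier are available too, and the parameters shown for them are illustrative only.

```python
import random
import statistics

# Hypothetical "real" measurements, e.g. session lengths in minutes.
real = [12.1, 9.8, 11.4, 10.2, 13.0, 8.9, 10.7, 11.9, 9.5, 12.4]

# Step 1: fit the assumed distribution (normal) to the real sample.
mu = statistics.fmean(real)
sigma = statistics.stdev(real)

# Step 2: sample synthetic values from the fitted distribution.
rng = random.Random(42)
synthetic = [rng.gauss(mu, sigma) for _ in range(10_000)]

print(f"real:      mean={mu:.2f}  stdev={sigma:.2f}")
print(f"synthetic: mean={statistics.fmean(synthetic):.2f}  "
      f"stdev={statistics.stdev(synthetic):.2f}")

# Other distributions mentioned above, with illustrative parameters:
exp_sample = rng.expovariate(1 / mu)        # exponential with mean mu
logn_sample = rng.lognormvariate(0.0, 0.5)  # lognormal
chi2_sample = rng.gammavariate(4 / 2, 2)    # chi-square with 4 degrees of freedom
```

The synthetic sample will match the real mean and spread closely, but only the shape of the assumed distribution; if the real data is skewed or multimodal, a normal fit will quietly hide that.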
Based on a model
This approach first builds a model that explains the observed behavior, then uses that model to generate random data. In essence, this means fitting the real data to a known distribution. Companies can then use the Monte Carlo method to generate synthetic data.
Distributions can also be fitted with machine learning models such as decision trees. Data scientists must watch for overfitting, however, because decision trees tend to overfit when they are allowed to grow deep.
Based on deep learning
Deep learning models built on a Variational Autoencoder (VAE) or a Generative Adversarial Network (GAN) are two ways to create synthetic data. VAEs are a type of unsupervised machine learning model.
VAEs are made up of encoders, which compress and condense the real data, and decoders, which reconstruct a representation of the real data from that compressed form. The basic goal of a VAE is to keep its output as similar as possible to its input. GAN models, by contrast, consist of two neural networks that compete against each other.
The first network, known as the generator, is responsible for producing fake data. The second, the discriminator, compares the generated synthetic data with real data to try to determine whether a dataset is fake. When the discriminator detects a fake dataset, it signals this back to the generator.
The generator then adjusts the next batch of data it gives the discriminator. As training continues, the discriminator gets better at spotting fake data, and the generator in turn gets better at producing data that passes as real. This type of model is commonly used in finance for fraud detection and in healthcare for medical imaging.
Data augmentation is a different technique that data scientists use to produce additional data, but it should not be confused with synthetic data. Simply put, data augmentation is the practice of adding new samples, derived from the existing records, to a real dataset.
For example, several images can be created from a single image by adjusting its orientation, brightness, magnification, and so on. In other cases, a real dataset is used with only the personal information removed. That is data anonymization, and such a dataset should likewise not be considered synthetic data.
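A minimal sketch of image augmentation on a tiny grayscale image stored as nested lists (pixel values 0 to 255), so it runs without an imaging library. Flipping and rotation stand in for orientation changes and brightness scaling for lighting; magnification is omitted for brevity. Real pipelines usually rely on a dedicated imaging library instead.

```python
def hflip(img):
    """Mirror each row left to right."""
    return [row[::-1] for row in img]


def rotate90(img):
    """Rotate 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def brighten(img, factor):
    """Scale pixel intensities, clamping to the 0-255 range."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in img]


def augment(img):
    """Produce several variants of one image; each keeps the original's label."""
    return [hflip(img), rotate90(img), brighten(img, 1.3), brighten(hflip(img), 0.7)]


original = [
    [0, 50, 100],
    [150, 200, 250],
]
variants = augment(original)
for v in variants:
    print(v)
```

Every variant is derived from a real image, which is what separates augmentation from generating synthetic data from scratch.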
Challenges and limitations of synthetic data
Although synthetic data offers various benefits that can help firms with data science work, it also has certain limitations:
- Data reliability: It is widely understood that any machine learning/deep learning model is only as good as the data it is fed. In this context, the quality of synthetic data depends heavily on the quality of the input data and of the model used to generate it. It is important to make sure the source data is free of bias, because any bias can be reflected very clearly in the synthetic data. In addition, data quality should be verified and validated before making any predictions.
- Requires expertise, effort, and time: Although creating synthetic data may be easier and cheaper than collecting real data, it still takes some expertise, time, and effort.
- Mimicking outliers: A perfect replica of real-world data is impossible; synthetic data can only approximate it. As a result, some outliers present in real data may not be captured by synthetic data, and outliers can matter more than typical data points.
- Output control and quality assurance: Synthetic data is meant to replicate real-world data, so manual verification of the data becomes critical. For complex datasets generated by algorithms, it is important to verify the data's accuracy before feeding it into machine learning/deep learning models.
- User acceptance: Because synthetic data is a novel concept, not everyone will be ready to trust predictions made with it. To increase user acceptance, awareness of how useful synthetic data can be must first be raised.
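One lightweight form of the quality check described in this list is to compare summary statistics of the synthetic data against the real data before training on it. The sketch below flags any column whose mean or standard deviation drifts more than a relative tolerance from the real value; the 10% tolerance, the column names, and the values are arbitrary assumptions, and in practice you would add distribution-level tests and manual review.

```python
import statistics


def compare_columns(real, synthetic, rel_tol=0.10):
    """Return, per column, the list of statistics whose synthetic value
    differs from the real value by more than rel_tol (relative)."""
    report = {}
    for col in real:
        r, s = real[col], synthetic[col]
        checks = {
            "mean": (statistics.fmean(r), statistics.fmean(s)),
            "stdev": (statistics.stdev(r), statistics.stdev(s)),
        }
        report[col] = [
            name for name, (rv, sv) in checks.items()
            if abs(sv - rv) > rel_tol * abs(rv)
        ]
    return report


real = {"age": [34, 45, 29, 51, 38, 42], "income_k": [52, 61, 48, 75, 58, 66]}
synthetic = {"age": [33, 46, 30, 50, 37, 43], "income_k": [90, 95, 88, 99, 92, 97]}
report = compare_columns(real, synthetic)
print(report)  # an empty list means the column passed
```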
The future
The use of synthetic data has grown considerably over the past decade. Although it saves companies time and money, it is not without problems. It can lack the outliers that occur naturally in real data and that are important to the accuracy of some models.
It is also worth noting that the quality of synthetic data often depends on the input data used to create it: bias in the input data can quickly propagate into the synthetic data, so the importance of starting from high-quality data cannot be overstated.
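A small illustration of how bias carries over: if a naive generator simply learns the category frequencies of a skewed source dataset and samples from them, the synthetic data inherits the same skew. The 90/10 split and the group names are hypothetical.

```python
import random
from collections import Counter

# Hypothetical source data: one group is heavily under-represented.
source = ["group_a"] * 900 + ["group_b"] * 100

# A naive generator: learn category frequencies, then sample from them.
freq = Counter(source)
categories = list(freq)
weights = [freq[c] for c in categories]

rng = random.Random(7)
synthetic = rng.choices(categories, weights=weights, k=10_000)

share_b = Counter(synthetic)["group_b"] / len(synthetic)
print(f"group_b share in source:    {freq['group_b'] / len(source):.0%}")
print(f"group_b share in synthetic: {share_b:.1%}")
```

Nothing in the generator is broken here; it faithfully reproduces its source, skew included. Correcting the imbalance means deliberately changing the target proportions, for example by passing equal weights to `rng.choices`.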
Finally, synthetic data requires additional output control, including comparing it with human-annotated real data to make sure no inconsistencies are introduced. Despite these obstacles, synthetic data remains a promising field.
It helps us build novel AI solutions even when real-world data is unavailable. Most importantly, it enables businesses to build products that are inclusive and reflect the diversity of their customers.
In a data-driven future, synthetic data aims to help data scientists take on novel and creative work that would be difficult to accomplish with real-world data alone.
Conclusion
In some situations, synthetic data can ease a shortage of data, or a lack of suitable data, within a business or organization. We also looked at which techniques can help generate synthetic data and who can benefit from it.
We also discussed some of the difficulties of working with synthetic data. For business decisions, real data will always be preferred. However, synthetic data is the next best option when such real data is not available for analysis.
Keep in mind, though, that generating synthetic data requires data scientists with a solid understanding of data modeling. A thorough understanding of the real data and its surrounding context is also essential. This helps ensure that the generated data is as accurate as possible.