In recent years, deep learning models have become increasingly effective at understanding human language.
Consider projects such as GPT-3, which can now generate entire articles and websites. GitHub recently introduced GitHub Copilot, a service that suggests code snippets after you simply describe the kind of code you want.
Researchers at OpenAI, Facebook, and Google have been exploring ways to apply deep learning to another task: image captioning. Using large datasets with millions of entries, they have produced some remarkable results.
More recently, these researchers have tried the reverse task: generating images from a caption. Is it now possible to create an entirely new image from a description alone?
This guide explores two of the leading text-to-image models: OpenAI's DALL-E 2 and Google's Imagen AI. Each project has introduced fundamental techniques that could change society as we know it.
But first, let's clarify what we mean by text-to-image.
What is text-to-image generation?
Text-to-image models let computers create new, unique images from prompts. A person provides a text description of the image they want, and the model tries to produce an image that matches that description as closely as possible.
Machine learning models have relied on large datasets of image-caption pairs to keep improving their performance.
Most text-to-image models use a transformer language model to interpret prompts. This kind of model is a neural network that tries to learn the context and semantic meaning of natural language.
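At the core of a transformer is attention. Attention lets each token's representation draw on information from the other tokens in the prompt, which is how the model picks up context. The minimal pure-Python sketch below implements scaled dot-product attention, softmax(QKᵀ/√d)V. It deliberately omits the learned projection matrices, multiple heads, and stacked layers that a real text encoder uses, and the toy token vectors are made up for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Compute softmax(Q K^T / sqrt(d)) V, with one vector per token."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # How strongly this query token attends to each key token.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return outputs


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy embeddings for three tokens
    # Self-attention: every token serves as query, key, and value.
    print(scaled_dot_product_attention(tokens, tokens, tokens))
```

In self-attention, each output row blends all three token vectors. The blend is weighted toward the tokens most similar to the query.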
Generative models, such as diffusion models and generative adversarial networks (GANs), then synthesize the image.
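A diffusion model is trained by gradually adding noise to real images. It then learns to reverse that process, so that at generation time it can turn pure noise into a new image. The sketch below shows only the forward (noising) step in the common DDPM-style form x_t = √ᾱ_t · x₀ + √(1 − ᾱ_t) · ε. The linear beta schedule, the 1,000-step count, and the helper names are illustrative assumptions, not settings taken from DALL-E 2 or Imagen. The "image" is a flat list of pixel values in [-1, 1].

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Illustrative linear noise schedule (assumed DDPM-style values)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bars(betas):
    """Cumulative product of (1 - beta_t): the fraction of signal left at step t."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def noisy_sample(x0, t, abar, rng):
    """Draw x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps, with eps ~ N(0, 1)."""
    a = abar[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]


if __name__ == "__main__":
    rng = random.Random(0)
    abar = alpha_bars(linear_beta_schedule(1000))
    x0 = [0.5, -0.2, 0.9, 0.0]  # toy "image"
    print(noisy_sample(x0, 10, abar, rng))   # still close to x0
    print(noisy_sample(x0, 999, abar, rng))  # almost pure noise
```

The generator's job is to undo these steps one at a time, starting from pure noise.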
What is DALL-E 2?
DALL-E 2 is a model from OpenAI, released in April 2022. It was trained on a dataset of millions of captioned images to learn associations between words and phrases and images.
Users can type a simple phrase, such as "a cat eating lasagna", and DALL-E 2 will generate its own interpretation of what the phrase describes.
Besides creating images from scratch, DALL-E 2 can edit existing ones. In the example below, DALL-E produced an edited image of a room with a sofa added to it.
DALL-E 2 is one of several similar projects OpenAI has released over the past few years. OpenAI's GPT-3 became a talking point when it appeared to generate text in a wide variety of styles.
At the time of writing, DALL-E 2 is still in beta. Interested users can join the waitlist and wait for access.
How does it work?
DALL-E 2's results are impressive, and you may be wondering how it all works.
DALL-E 2 is an example of a multimodal implementation of OpenAI's GPT-3 project.
First, the user's text input is passed to a text encoder, which maps the prompt into a representation space. For this step, DALL-E 2 uses another OpenAI model, CLIP (Contrastive Language-Image Pre-Training), to extract semantic information from the natural-language prompt.
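CLIP is trained contrastively. It embeds captions and images into a shared space, scores every caption-image pair in a batch by cosine similarity, and learns to give the matching pairs (the diagonal) the highest scores. The sketch below computes that symmetric contrastive loss. The 3-dimensional vectors are made-up stand-ins for real CLIP embeddings, which come from trained encoders and have hundreds of dimensions. The temperature of 0.07 and the function names are assumptions for illustration.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def clip_style_loss(text_embs, image_embs, temperature=0.07):
    """Symmetric contrastive loss where text i is meant to match image i."""
    n = len(text_embs)
    logits = [[cosine_similarity(t, v) / temperature for v in image_embs] for t in text_embs]

    def cross_entropy(row, target):
        # -log softmax(row)[target], computed stably via log-sum-exp.
        m = max(row)
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        return log_sum - row[target]

    text_to_image = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    image_to_text = sum(cross_entropy(columns[j], j) for j in range(n)) / n
    return (text_to_image + image_to_text) / 2


if __name__ == "__main__":
    texts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    print(clip_style_loss(texts, [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]))  # matched pairs: low loss
    print(clip_style_loss(texts, [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]))  # mismatched pairs: high loss
```

When the pairs line up, the loss is near zero. Swapping the images makes it large, and this is the signal that pulls matching text and image embeddings together during training.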
Next, a model known as the prior maps the text encoding to an image encoding. This image encoding should capture the semantic information obtained in the text-encoding step.
To produce the actual image, DALL-E 2 uses an image decoder, which generates a visual from the semantic information and details in the image encoding. OpenAI uses a modified version of its GLIDE model for this image-generation step. GLIDE relies on a diffusion model to create images.
Adding GLIDE to DALL-E 2 made the model's output more photorealistic. Because GLIDE is stochastic (its output is partly random), DALL-E 2 can easily create variations of an image by running the model again and again.
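The three stages described above (text encoder, prior, decoder) can be sketched as a chain of functions. Every piece here is a hypothetical toy stand-in. `encode_text` hashes the prompt instead of running CLIP. `prior` is a fixed affine map, whereas the real prior is a learned generative model. `decode` adds seeded Gaussian noise instead of running a diffusion decoder like GLIDE. The sketch only illustrates the data flow, and how a stochastic decoder yields a different variation for each random seed while the same seed reproduces the same output.

```python
import hashlib
import random

EMBED_DIM = 4  # toy size; real CLIP embeddings are much larger


def encode_text(prompt):
    """Hypothetical text encoder: deterministic toy embedding derived from a hash."""
    digest = hashlib.sha256(prompt.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:EMBED_DIM]]


def prior(text_embedding):
    """Hypothetical prior: maps a text embedding to an image embedding (toy affine map)."""
    return [0.5 * x + 0.1 for x in text_embedding]


def decode(image_embedding, seed, size=2):
    """Hypothetical stochastic decoder: a size x size 'image' built from the embedding plus noise."""
    rng = random.Random(seed)
    mean = sum(image_embedding) / len(image_embedding)
    return [[mean + rng.gauss(0.0, 0.1) for _ in range(size)] for _ in range(size)]


def generate(prompt, seed):
    """Text -> text embedding -> image embedding -> image."""
    return decode(prior(encode_text(prompt)), seed)


if __name__ == "__main__":
    prompt = "a cat eating lasagna"
    for seed in range(3):
        print(seed, generate(prompt, seed))  # one variation per seed
```

Keeping the prompt fixed and changing only the seed mirrors the "run it again for another variation" behaviour described above.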
Limitations
Despite its impressive results, DALL-E 2 still has some limitations.
Spelling
Prompts that ask DALL-E 2 to render text show that it struggles to spell words. Experts believe this is because spelling information is not part of the training dataset.
Compositional Reasoning
Researchers have noted that DALL-E 2 still has some difficulty with compositional reasoning. In short, the model can recognize the individual elements of an image but struggles to work out the relationships between them.
For example, given the prompt "a red cube on top of a blue cube", DALL-E correctly generates a blue cube and a red cube but fails to place them correctly. The model has also been observed to struggle with prompts that require drawing a specific number of objects.
Bias in the Dataset
When a prompt lacks specific details, DALL-E has been observed to default to depicting white or Western people and settings. This representational bias stems from the large proportion of Western-centric images in the dataset.
The model has also been observed to follow gender stereotypes. For example, the prompt "flight attendant" produces images of female flight attendants.
What is Google Imagen AI?
Google Imagen AI is a model designed to create photorealistic images from input text. Like DALL-E, it uses a transformer language model to understand the text and relies on a diffusion model to create high-fidelity images.
Alongside Imagen, Google released DrawBench, a benchmark for text-to-image models. Using DrawBench, the team found that human raters preferred Imagen's output over that of other models, including DALL-E 2.
How does it work?
Like DALL-E, Imagen first converts the user's prompt into a text embedding, using a frozen text encoder.
Imagen uses a diffusion model that learns how to turn a pattern of noise into an image. This first output is low-resolution, so it is then passed to super-resolution diffusion models that increase the resolution of the final image. The base diffusion model outputs a 64×64 pixel image, which is upscaled (via an intermediate 256×256 stage) to a high-resolution 1024×1024 image.
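The cascade can be sketched as a base stage followed by two 4× upscaling stages (64 → 256 → 1024). In this sketch, nearest-neighbour upsampling is only a placeholder for the super-resolution diffusion models. The real models are text-conditioned and generate new fine detail rather than copying pixels. `base_model` and `upsample_nearest` are hypothetical names, and the base stage simply returns random grayscale values.

```python
import random


def base_model(seed, size=64):
    """Placeholder for the base diffusion model: a size x size grayscale image."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(size)] for _ in range(size)]


def upsample_nearest(image, factor):
    """Stand-in for a super-resolution stage: repeat each pixel factor x factor times."""
    out = []
    for row in image:
        wide = [p for p in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])  # copy so rows stay independent
    return out


def cascade(seed):
    image = base_model(seed)            # 64 x 64
    image = upsample_nearest(image, 4)  # 256 x 256 (first super-resolution stage)
    image = upsample_nearest(image, 4)  # 1024 x 1024 (second super-resolution stage)
    return image


if __name__ == "__main__":
    final = cascade(0)
    print(len(final), "x", len(final[0]))
```

Generating at low resolution first keeps the expensive base diffusion step small. The later stages then only need to add detail.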
According to the Imagen team's research, large frozen language models trained only on text data are nonetheless highly effective text encoders for text-to-image generation.
The study also introduced dynamic thresholding. This technique makes images more photorealistic by allowing a higher guidance weight during generation, without the oversaturated pixels that high guidance weights otherwise produce.
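Dynamic thresholding acts on the model's prediction of the clean image x̂₀ at each sampling step. As described in the Imagen paper, a value s is set to a high percentile of the absolute pixel values. If s > 1, the pixels are clipped to [-s, s] and divided by s, which pulls saturated pixels back into range; otherwise, the step reduces to ordinary clipping to [-1, 1]. The sketch below shows this together with the classifier-free guidance formula whose weight it lets you raise. The percentile default of 99.5, the nearest-rank percentile method, and all function names are illustrative assumptions. Pixel values are assumed to lie in [-1, 1].

```python
import math


def classifier_free_guidance(eps_uncond, eps_cond, weight):
    """Guided noise prediction: eps_uncond + w * (eps_cond - eps_uncond)."""
    return [u + weight * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def percentile(values, q):
    """Nearest-rank percentile (q in [0, 100]) of a list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100.0 * len(ordered)))
    return ordered[rank - 1]


def static_threshold(x0_pred):
    """Baseline: clip every predicted pixel to [-1, 1]."""
    return [min(max(x, -1.0), 1.0) for x in x0_pred]


def dynamic_threshold(x0_pred, q=99.5):
    """Clip to [-s, s] and divide by s, where s is the q-th percentile of |x| (at least 1)."""
    s = percentile([abs(x) for x in x0_pred], q)
    s = max(s, 1.0)  # when s <= 1 this is the same as static thresholding
    return [min(max(x, -s), s) / s for x in x0_pred]


if __name__ == "__main__":
    x0_pred = [0.5, -3.0, 2.0, 1.5]  # a toy, oversaturated prediction
    print(static_threshold(x0_pred))          # several pixels pinned at +/-1
    print(dynamic_threshold(x0_pred, q=100))  # values rescaled, relative contrast kept
```

Static clipping pins several pixels at the boundary. Dynamic thresholding rescales them and keeps their relative differences, which is why large guidance weights stop washing out the image.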
DALL-E 2 vs. Imagen Performance
Preliminary results from Google's benchmark show that human raters prefer images generated by Imagen over those from DALL-E 2 and from other text-to-image models such as Latent Diffusion and VQGAN+CLIP.
The Imagen team's results also show that their model is better at spelling text, a known weakness of DALL-E 2.
However, since Google has not yet released the model to the public, it remains to be seen how accurate Google's benchmarks are.
Conclusion
The rise of photorealistic text-to-image models is controversial, because these models are ripe for misuse.
The technology could be used to create explicit content or serve as a tool for disinformation. Researchers at Google and OpenAI are aware of this, which is why the technology is not yet available to everyone.
Text-to-image models also have significant economic implications. Will professions such as modeling, photography, and art be affected if models like DALL-E become mainstream?
For now, these models still have limitations. Zooming in on any AI-generated image will reveal its imperfections. But with OpenAI and Google competing to build the most capable models, it may only be a matter of time before one produces the perfect output: an image indistinguishable from the real thing.
What do you think will happen when the technology gets that far?