You probably already know that a computer can describe an image.
For example, a photo of a dog playing with your children might be captioned "dog and children in the garden." But did you know the reverse is now possible too? You type a few words, and the machine generates a new image.
Unlike a Google search, which looks up existing images, everything produced here is new. In recent years, OpenAI has been one of the leading organizations in this area, reporting remarkable results.
They train their algorithms on large databases of text and images. They have published a paper on their GLIDE image model, which was trained on hundreds of millions of images. In terms of photorealism, it surpasses their earlier DALL-E model.
In this post, we will look at OpenAI's GLIDE, one of several appealing efforts aimed at generating and editing photorealistic images with text-guided diffusion models. Let's get started.
What is OpenAI GLIDE?
Although most images can be described in words, creating images from text input requires specialized skills and a significant amount of time.
Letting an AI agent generate photorealistic images from natural language prompts not only allows people to create rich and varied visual content with unprecedented ease, but also enables simpler iterative refinement and fine-grained control over the generated images.
GLIDE can be used to edit existing images with natural language prompts: inserting new objects, creating shadows and reflections, performing image inpainting, and so on.
It can also turn basic line drawings into photorealistic images, and it has strong zero-shot generation and editing capabilities in complex scenarios.
Recent research has shown that likelihood-based diffusion models can also produce high-quality synthetic images, especially when combined with a guidance technique that trades off diversity against fidelity.
OpenAI published a guided diffusion model in May 2021, which allows diffusion models to be conditioned on a classifier's labels. GLIDE builds on this success by bringing guided diffusion to the problem of text-conditional image synthesis.
After training a 3.5 billion parameter GLIDE diffusion model that uses a text encoder to condition on natural language descriptions, the researchers compared two guidance strategies: CLIP guidance and classifier-free guidance.
CLIP is a scalable approach for learning joint representations of text and images that produces a score based on how close an image is to a caption.
The team applied this technique to their diffusion models by replacing the classifier with a CLIP model that "guides" the models. Classifier-free guidance, by contrast, is a technique for guiding diffusion models that does not require training a separate classifier.
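To make classifier-free guidance more concrete, here is a minimal sketch of the usual way it is written: the model predicts the noise twice, once with the caption and once without, and the two predictions are extrapolated using a guidance scale. The function name, variable names, and toy vectors are illustrative assumptions and are not taken from OpenAI's code.

```python
import numpy as np

def classifier_free_guidance(eps_cond, eps_uncond, guidance_scale):
    """Blend conditional and unconditional noise predictions.

    A guidance_scale of 1.0 returns the conditional prediction unchanged;
    larger values push the result further toward the text condition.
    """
    eps_cond = np.asarray(eps_cond, dtype=np.float64)
    eps_uncond = np.asarray(eps_uncond, dtype=np.float64)
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# Toy stand-ins for one model evaluated twice:
# once with the caption, once with an empty caption.
eps_with_text = np.array([0.2, -0.1, 0.5])
eps_without_text = np.array([0.1, 0.0, 0.3])
guided = classifier_free_guidance(eps_with_text, eps_without_text, 3.0)
```

Because both predictions come from the same network, no separate classifier has to be trained, which is the point made above.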
GLIDE Architecture
The GLIDE architecture consists of three components: an Ablated Diffusion Model (ADM) trained to generate a 64 × 64 image, a text model (a transformer) that influences image generation through a text prompt, and an upsampling model that converts the small 64 × 64 images into more interpretable 256 × 256 pixel images.
The first two components work together to steer the image generation process so that it properly reflects the text prompt, while the last is needed to make the generated images easier to interpret. The GLIDE project was inspired by a report published in 2021 showing that ADM approaches outperform the currently popular state-of-the-art generative models in terms of image sample quality.
For the ADM, the GLIDE authors used the same ImageNet 64 × 64 model architecture as Dhariwal and Nichol, but scaled the model width to 512 channels. As a result, the visual part of the model has roughly 2.3 billion parameters.
Unlike Dhariwal and Nichol, the GLIDE team wanted more direct control over the image generation process, so they paired the visual model with an attention-based transformer. By processing text input prompts, GLIDE gives you some control over the output of the image generation process.
This is achieved by training the transformer model on a suitably large dataset of images and captions (similar to the one used in the DALL-E project).
To condition on the text, it is first encoded into a sequence of K tokens. These tokens are then fed into a transformer model. The transformer's output is used in two ways. First, in the ADM model, the final token embedding is used in place of the class embedding.
Second, the last layer of token embeddings (a sequence of feature vectors) is projected separately to the dimensionality of each attention layer in the ADM model and concatenated to the attention context at each layer.
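The two uses of the transformer output can be sketched with plain arrays. This is only a shape-level illustration under made-up sizes: every dimension below is hypothetical and far smaller than in the real model, and random numbers stand in for the trained transformer and projection weights.

```python
import numpy as np

rng = np.random.default_rng(0)

K = 8            # number of text tokens (hypothetical)
text_width = 16  # transformer width (hypothetical, far smaller than the real encoder)
attn_dim = 10    # channel width of one ADM attention layer (hypothetical)
n_pixels = 4     # spatial positions attending in that layer (hypothetical)

# Stand-in for the transformer's last-layer output: one feature vector per token.
token_embeddings = rng.normal(size=(K, text_width))

# Use 1: the final token embedding takes the place of the class embedding.
class_like_embedding = token_embeddings[-1]

# Use 2: project the token sequence to this attention layer's width.
# Each attention layer would have its own projection matrix.
projection = rng.normal(size=(text_width, attn_dim))
projected_tokens = token_embeddings @ projection  # shape (K, attn_dim)

# Then concatenate the projected tokens to the layer's attention context.
image_context = rng.normal(size=(n_pixels, attn_dim))
attention_context = np.concatenate([image_context, projected_tokens], axis=0)
```

After the concatenation, every image position in that layer can attend to the K text tokens as well as to the other image positions.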
In effect, this allows the ADM model to generate an image from novel combinations of familiar text tokens in a unique and photorealistic way, based on its learned understanding of the input words and their associated images. This text-encoding transformer contains 1.2 billion parameters and uses 24 residual blocks of width 2048.
Finally, the upsampler diffusion model has roughly 1.5 billion parameters and differs from the base model in that its text encoder is smaller, with a width of 1024 rather than 2048, and it uses 384 base channels. As its name suggests, this model upsamples the generated sample to make it easier for both machines and humans to interpret.
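As a quick sanity check, the parameter counts quoted in this section add up to the 3.5 billion figure mentioned earlier. The snippet below is just that arithmetic; the variable names are our own.

```python
# Parameter counts in billions, as quoted in the text above.
base_visual_params = 2.3  # visual (ADM) part of the 64 x 64 base model
base_text_params = 1.2    # text-encoding transformer of the base model
upsampler_params = 1.5    # 64 x 64 -> 256 x 256 upsampler

base_total = round(base_visual_params + base_text_params, 1)
pipeline_total = round(base_total + upsampler_params, 1)
print(f"base model: {base_total}B, base + upsampler: {pipeline_total}B")
```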
Diffusion Model
GLIDE generates images using its own version of the ADM (ADM-G, where the G stands for "guided"). The ADM-G model is a modification of the diffusion U-net model. A diffusion U-net model differs substantially from more familiar image synthesis techniques such as VAEs, GANs, and transformers.
Diffusion models build a Markov chain of diffusion steps that gradually adds random noise to the data, and then learn to reverse the diffusion process and reconstruct the desired data samples from noise alone. The process works in two phases: forward diffusion and reverse diffusion.
The forward diffusion process, given a data point drawn from the true data distribution, adds a small amount of noise to the sample over a set sequence of steps. As the number of steps grows and approaches infinity, the sample loses all of its distinguishable features and the sequence begins to resemble an isotropic Gaussian distribution.
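A minimal sketch of forward diffusion, assuming the common Gaussian formulation in which a noisy sample at step t can be drawn directly from the clean sample. The linear noise schedule, the step count, and the function names are assumptions chosen for illustration and are not taken from the GLIDE paper.

```python
import numpy as np

def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear noise schedule (an assumption)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def forward_diffuse(x0, t, alpha_bars, rng):
    """Sample x_t from x_0 as sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * noise."""
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise
    return x_t, noise

rng = np.random.default_rng(0)
alpha_bars = make_alpha_bars(num_steps=1000)
x0 = np.ones((64, 64))  # toy "image" with every pixel equal to 1
x_early, _ = forward_diffuse(x0, 10, alpha_bars, rng)   # still close to x0
x_late, _ = forward_diffuse(x0, 999, alpha_bars, rng)   # almost pure noise
```

Early in the chain the sample is still dominated by the original image, while at the last step the signal coefficient is close to zero and the sample is nearly indistinguishable from Gaussian noise.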
During the reverse diffusion phase, the diffusion model learns to undo the effect of the added noise on the images and to guide the generated image back toward its original form by trying to approximate the distribution of the original input samples.
A finished model can do this starting from actual Gaussian noise and a prompt. The ADM-G approach differs from the above in that a model, either CLIP or a custom transformer, influences the reverse diffusion phase using the input prompt tokens.
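The reverse phase can be sketched as a loop of denoising steps. The version below uses a standard DDPM-style ancestral sampling step with a linear noise schedule, both of which are assumptions rather than GLIDE's exact sampler. To keep the sketch runnable without a trained network, the noise predictor is a hypothetical oracle that cheats by knowing the clean image; in GLIDE that role is played by the text-conditioned U-net.

```python
import numpy as np

def reverse_step(x_t, t, predicted_noise, betas, alpha_bars, rng):
    """One DDPM-style ancestral sampling step from x_t to x_{t-1}."""
    alpha_t = 1.0 - betas[t]
    mean = (x_t - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * predicted_noise) / np.sqrt(alpha_t)
    if t == 0:
        return mean  # no fresh noise is added on the final step
    return mean + np.sqrt(betas[t]) * rng.standard_normal(x_t.shape)

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)  # linear schedule (an assumption)
alpha_bars = np.cumprod(1.0 - betas)
x0 = np.full((8, 8), 0.5)              # toy "clean image"

def oracle_noise(x_t, t):
    """Hypothetical stand-in for the trained network: it cheats by knowing x0."""
    return (x_t - np.sqrt(alpha_bars[t]) * x0) / np.sqrt(1.0 - alpha_bars[t])

x = rng.standard_normal(x0.shape)      # start from pure Gaussian noise
for t in reversed(range(len(betas))):
    x = reverse_step(x, t, oracle_noise(x, t), betas, alpha_bars, rng)
```

With a perfect noise predictor the loop walks the noise back to the clean image. A real model only approximates that predictor, and guidance changes the predicted noise at every step.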
GLIDE Capabilities
1. Image Generation
The most popular and widely used application of GLIDE will probably be image synthesis. Although the images are modest and GLIDE struggles with animal and human forms, the potential for one-shot image generation is nearly endless.
It can create images of animals, celebrities, landscapes, buildings, and much more, and it can do so in a variety of artistic styles as well as photorealistically. The authors argue that GLIDE can interpret and translate a wide variety of text inputs into visual form, as the samples in their paper illustrate.
2. GLIDE Inpainting
GLIDE's automatic image inpainting is arguably its most exciting application. GLIDE can take an existing image as input, process it with a text prompt in mind for the regions that need to change, and then readily make effective modifications to those parts.
It can be combined with an editing method such as SDEdit to produce even better results. In the future, applications that leverage capabilities like these could be instrumental in developing no-code image editing approaches.
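The idea of changing only selected regions can be illustrated with a simple mask composite. This is an illustrative assumption about how such a constraint can be expressed, not a description of GLIDE's internals; the function name and the toy arrays are hypothetical.

```python
import numpy as np

def composite_inpaint(original, generated, edit_mask):
    """Keep original pixels where edit_mask is 0 and generated pixels where it is 1."""
    edit_mask = edit_mask.astype(original.dtype)
    return edit_mask * generated + (1.0 - edit_mask) * original

original = np.zeros((4, 4))   # toy existing image
generated = np.ones((4, 4))   # toy model output for the same canvas
edit_mask = np.zeros((4, 4))
edit_mask[1:3, 1:3] = 1       # only the central 2 x 2 patch may change
result = composite_inpaint(original, generated, edit_mask)
```

Everything outside the mask is copied from the existing image, so the text prompt only affects the marked region.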
Conclusion
Now that we have walked through the process, you should have a grasp of the fundamentals of how GLIDE works, as well as the breadth of its capabilities in image generation and image editing.