You may already know that a computer can describe an image.
For example, a picture of a dog playing with your children might be captioned 'a dog and children in a garden.' But did you know that the reverse is now possible too? You type a few words, and the machine generates a new image.
Unlike a Google search, which looks up existing photos, everything here is newly generated. In recent years, OpenAI has been one of the leading organizations in this area, reporting striking results.
They train their algorithms on large text and image datasets. They published a paper on their GLIDE model, which was trained on hundreds of millions of images. In terms of photorealism, it surpasses their earlier 'DALL-E' model.
In this post, we will look at OpenAI's GLIDE, one of several interesting efforts aimed at generating and editing photorealistic images with text-guided diffusion models. Let's begin.
What is OpenAI GLIDE?
While most images can be described in words, creating images from text input requires specialized skill and a lot of time.
Letting an AI agent generate photorealistic images from natural-language prompts not only lets people create rich and varied visual content with unprecedented ease, but also allows iterative refinement and fine-grained control of the generated images.
GLIDE can be used to edit existing photos by using natural-language text prompts to insert new objects, add shadows and reflections, perform image inpainting, and so on.
It can also turn basic line drawings into photorealistic images, and it has strong zero-shot generation and editing capabilities for complex scenarios.
Recent research has shown that likelihood-based diffusion models can generate high-quality synthetic images, especially when combined with a guidance technique that trades off diversity for fidelity.
OpenAI published a guided diffusion model in May 2021 that allows diffusion models to be conditioned on a classifier's labels. GLIDE builds on this success by bringing guided diffusion to the problem of text-conditional image synthesis.
After training a 3.5 billion parameter GLIDE diffusion model that uses a text encoder to condition on natural-language descriptions, the researchers compared two guidance strategies: CLIP guidance and classifier-free guidance.
CLIP is a scalable technique for learning joint representations of text and images. It provides a score based on how close an image is to a caption.
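As a rough illustration of the scoring idea, the sketch below computes a cosine similarity between an image embedding and a caption embedding. The vectors and the helper name `clip_style_score` are hypothetical stand-ins invented for this example; a real CLIP model would produce the embeddings with its trained image and text encoders.

```python
import numpy as np

def clip_style_score(image_emb, caption_emb):
    """Cosine similarity between two embedding vectors (hypothetical helper)."""
    image_emb = np.asarray(image_emb, dtype=float)
    caption_emb = np.asarray(caption_emb, dtype=float)
    # Normalize both vectors so the dot product lies in [-1, 1].
    image_emb = image_emb / np.linalg.norm(image_emb)
    caption_emb = caption_emb / np.linalg.norm(caption_emb)
    return float(image_emb @ caption_emb)

# Toy embeddings, invented purely for illustration.
image = [0.9, 0.1, 0.3]
matching_caption = [0.8, 0.2, 0.4]
unrelated_caption = [-0.5, 0.9, -0.1]

score_match = clip_style_score(image, matching_caption)
score_other = clip_style_score(image, unrelated_caption)
print(score_match > score_other)
```

A caption whose embedding points in roughly the same direction as the image embedding scores higher, which is the signal a guidance method can push the generated image toward.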
The team applied this strategy to their diffusion models by replacing the classifier with a CLIP model that "guides" the models. Classifier-free guidance, by contrast, is a strategy for steering diffusion models that does not involve training a separate classifier.
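Classifier-free guidance is usually written as a simple extrapolation between two noise predictions from the same model, one made with the caption and one made with an empty caption. The sketch below shows only that combination step; the function name and the toy arrays are assumptions made for illustration, not code from the GLIDE release.

```python
import numpy as np

def classifier_free_guidance(eps_cond, eps_uncond, scale):
    """Combine conditional and unconditional noise predictions.

    scale = 1 returns the conditional prediction unchanged; larger values
    push the result further in the direction suggested by the caption.
    """
    eps_cond = np.asarray(eps_cond, dtype=float)
    eps_uncond = np.asarray(eps_uncond, dtype=float)
    return eps_uncond + scale * (eps_cond - eps_uncond)

# Toy predictions standing in for the model's two forward passes.
eps_uncond = np.array([0.0, 1.0])
eps_cond = np.array([1.0, 1.0])

guided = classifier_free_guidance(eps_cond, eps_uncond, scale=3.0)
print(guided)
```

Raising `scale` generally improves how closely samples follow the caption at the cost of diversity, which is the trade-off between diversity and fidelity that guidance methods control.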
GLIDE Architecture
The GLIDE architecture consists of three components: an Ablated Diffusion Model (ADM) trained to generate a 64 × 64 image, a text model (a transformer) that influences image generation through a text prompt, and an upsampling model that converts the small 64 × 64 images into more interpretable 256 × 256 pixel images.
The first two components work together to steer the image generation process so that it accurately reflects the text prompt, while the last is needed to make the generated images easier to interpret. The GLIDE project was inspired by a report published in 2021 showing that ADM techniques outperformed the popular state-of-the-art generative models of the time in terms of image sample quality.
For the ADM, the GLIDE authors used the same ImageNet 64 × 64 model architecture as Dhariwal and Nichol, but scaled the model width to 512 channels. As a result, the visual part of the model has about 2.3 billion parameters.
The GLIDE team, unlike Dhariwal and Nichol, wanted more direct control over the image generation process, so they combined the visual model with an attention-based transformer. By processing a text prompt as input, GLIDE gives you some control over the output of the image generation process.
This is achieved by training the transformer model on a suitably large dataset of images and captions (similar to the one used in the DALL-E project).
To condition on the text, it is first encoded into a sequence of K tokens. The tokens are then fed into a transformer model. The transformer's output is used in two ways. First, in the ADM model, the final token embedding is used in place of the class embedding.
Second, the last layer of token embeddings (a sequence of K feature vectors) is projected separately to the dimensionality of each attention layer in the ADM model and concatenated to the attention context at each layer.
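The two uses of the transformer output can be sketched with plain arrays. Everything below is a shape-level illustration under assumed toy sizes: the dimensions, the random "projection" matrix, and the variable names are hypothetical, and in a trained model the projection would be learned.

```python
import numpy as np

rng = np.random.default_rng(0)

K, text_width = 8, 16            # toy sizes, far smaller than GLIDE's
attn_width, num_pixels = 12, 5   # hypothetical width of one attention layer

# Last layer of the text transformer: K feature vectors.
token_embeddings = rng.normal(size=(K, text_width))

# Use 1: the final token embedding stands in for a class embedding.
class_like_embedding = token_embeddings[-1]               # shape (text_width,)

# Use 2: project all K embeddings to this attention layer's width ...
projection = rng.normal(size=(text_width, attn_width))    # learned in practice
text_context = token_embeddings @ projection              # shape (K, attn_width)

# ... and concatenate them to the image features the layer attends over.
image_context = rng.normal(size=(num_pixels, attn_width))
attention_context = np.concatenate([text_context, image_context], axis=0)

print(class_like_embedding.shape, attention_context.shape)
```

Because each attention layer can have a different width, a separate projection of this kind would be needed per layer.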
In practice, this allows the ADM model to generate an image from novel combinations of familiar text tokens in a unique and photorealistic fashion, based on its learned understanding of the input words and their associated images. This text-encoding transformer has 1.2 billion parameters and uses 24 residual blocks with a width of 2048. Together with the roughly 2.3 billion parameters of the visual model, this accounts for the 3.5 billion parameters of the base model.
Finally, the upsampler diffusion model has about 1.5 billion parameters. It differs from the base model in that its text encoder is smaller, with a width of 1024 rather than the base model's 2048, and it uses 384 base channels. This model, as the name suggests, upsamples the generated samples so that they are easier for both machines and humans to interpret.
The Diffusion Model
GLIDE generates images using its own version of ADM (ADM-G, for "guided"). The ADM-G model is a modification of the diffusion U-Net model. A diffusion U-Net model differs substantially from more familiar image synthesis techniques such as VAEs, GANs, and transformers.
Diffusion models build a Markov chain of diffusion steps that gradually inject random noise into the data, and then learn to reverse the diffusion process and reconstruct the desired data samples from noise alone. The process works in two stages: forward diffusion and reverse diffusion.
The forward diffusion process, given a data point from the true sample distribution, adds a small amount of noise to the sample over a predetermined series of steps. As the number of steps grows and approaches infinity, the sample loses all distinguishable features and the sequence begins to resemble an isotropic Gaussian distribution.
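The forward process has a convenient closed form: a noisy sample at step t can be drawn directly from the clean sample. The sketch below assumes the linear noise schedule commonly used in the DDPM literature (the article does not state GLIDE's schedule), and the function names and toy image are illustrative only.

```python
import numpy as np

def make_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def q_sample(x0, t, alpha_bar, rng):
    """Draw x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I)."""
    noise = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar(1000)
x0 = np.ones((64, 64))                       # toy "image"

x_early = q_sample(x0, 0, alpha_bar, rng)    # almost unchanged
x_late = q_sample(x0, 999, alpha_bar, rng)   # almost pure Gaussian noise
print(alpha_bar[0], alpha_bar[-1])
```

The shrinking value of `alpha_bar` is what makes the original signal fade: by the last step almost nothing of `x0` remains and the sample is close to standard Gaussian noise.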
During the reverse diffusion stage, the diffusion model learns to undo the effect of the added noise on the images and steers the generated image back toward its original form by trying to match the original input sample distribution.
A trained model can do this starting from pure Gaussian noise and a prompt. The ADM-G approach differs from the plain process described above in that a model, either CLIP or a specialized transformer, influences the reverse diffusion stage using the text prompt tokens provided as input.
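A minimal sketch of the reverse loop is shown below. It uses the standard DDPM update with a fixed variance, which is a simplification swapped in for illustration rather than GLIDE's exact sampler, and `toy_noise_predictor` is a hypothetical placeholder for the trained, text-conditioned U-Net.

```python
import numpy as np

def reverse_step(x_t, t, predicted_noise, betas, alpha_bar, rng):
    """One DDPM-style denoising step from x_t to x_{t-1}."""
    beta_t = betas[t]
    alpha_t = 1.0 - beta_t
    # Mean of x_{t-1}: remove the predicted noise contribution.
    mean = (x_t - beta_t / np.sqrt(1.0 - alpha_bar[t]) * predicted_noise) / np.sqrt(alpha_t)
    if t == 0:
        return mean  # no fresh noise on the final step
    return mean + np.sqrt(beta_t) * rng.normal(size=x_t.shape)

def toy_noise_predictor(x_t, t, prompt):
    """Hypothetical stand-in for the trained model; it ignores its inputs."""
    return np.zeros_like(x_t)

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 50)          # assumed short linear schedule
alpha_bar = np.cumprod(1.0 - betas)

x = rng.normal(size=(8, 8))                  # start from pure Gaussian noise
for t in reversed(range(len(betas))):
    eps = toy_noise_predictor(x, t, prompt="a dog and children in a garden")
    x = reverse_step(x, t, eps, betas, alpha_bar, rng)

print(x.shape)
```

With a real model, the prompt would change the predicted noise at every step, which is how the text steers the image that emerges from the noise.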
GLIDE's Capabilities
1. Image Generation
The most popular and widely used application of GLIDE will likely be image synthesis. Although the images are small and GLIDE struggles with animal and human forms, the possibilities for one-shot image generation are almost endless.
It can produce images of animals, celebrities, landscapes, buildings, and much more, and it can do so in a variety of artistic styles as well as photorealistically. The paper's authors report that GLIDE can interpret a wide range of textual prompts and render them in visual form, as seen in the samples below.
2. GLIDE Inpainting
Automatic photo inpainting is GLIDE's most interesting application. GLIDE can take an existing image as input, process it with a text prompt that describes the regions to be changed, and then make effective edits to those regions with ease.
For better results, it should be used together with an editing model such as SDEdit. In the future, apps that take advantage of capabilities like this could be essential for developing no-code image-editing tools.
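The idea of restricting an edit to chosen regions can be illustrated with a mask. The sketch below is not GLIDE's inpainting procedure, which this post does not detail; it only shows, with hypothetical toy arrays and a hypothetical helper, how a mask keeps the untouched pixels and takes new pixels only where an edit was requested.

```python
import numpy as np

def composite_edit(original, generated, mask):
    """Keep original pixels where mask == 0 and generated pixels where mask == 1."""
    mask = mask[..., None].astype(float)     # broadcast over the RGB channels
    return mask * generated + (1.0 - mask) * original

original = np.zeros((4, 4, 3))               # toy image: all black
generated = np.ones((4, 4, 3))               # toy model output: all white
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1                           # region the user asked to change

edited = composite_edit(original, generated, mask)
print(edited[..., 0])
```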
Conclusion
Now that we have walked through the process, you should have a grasp of the basics of how GLIDE works, as well as the breadth of its capabilities in image generation and image editing.