In recent years, the generative models known as "diffusion models" have become very popular, and for good reason.
A handful of landmark papers published in 2020 and 2021 have shown the world what diffusion models can do, such as outperforming GANs on image synthesis.
Most recently, practitioners will have seen diffusion models used in DALL-E 2, OpenAI's image generation model released last month.
Given this recent success, many machine learning practitioners are no doubt curious about the inner workings of diffusion models.
In this post, we look at the theoretical underpinnings of diffusion models, their architecture, their advantages, and more. Let's get started.
What is a Diffusion Model?
Let's start by finding out why this kind of model is called a diffusion model.
In physics, diffusion is a term from thermodynamics. A system is not in equilibrium if a substance, such as a scent, is highly concentrated in one place.
For the system to reach equilibrium, diffusion has to take place. The scent molecules spread from the region of higher concentration through the whole system, making the system uniform throughout.
Eventually, everything becomes homogeneous because of diffusion.
Diffusion models are inspired by this non-equilibrium thermodynamic scenario. They use a Markov chain, which is a sequence of random variables in which the value of each variable depends only on the state of the previous one.
Taking an image, we successively add a certain amount of noise to it throughout the forward diffusion stage.
After storing the noisier image, we create the next image in the sequence by introducing additional noise.
This procedure is carried out many times. Repeating it enough times yields an image of pure noise.
How, then, can we produce an image from this noisy one?
The diffusion process is reversed using a neural network. The same network, with the same weights, is used at every step of the reverse diffusion process to produce the image at step t-1 from the image at step t.
To make the task simpler, instead of having the network predict the image, one can have it predict the noise at each step, which is then removed from the image.
In either case, the neural network architecture must be chosen so that it preserves the dimensionality of the data.
Deep Dive into the Diffusion Model
A diffusion model consists of a forward process (also known as the diffusion process), in which a datum (usually an image) is progressively noised, and a reverse process (also known as the reverse diffusion process), in which noise is transformed back into a sample from the target distribution.
When the noise level is sufficiently low, conditional Gaussians can be used to define the sampling chain transitions in the forward process. Combining this fact with the Markov assumption leads to a simple parameterization of the forward process:
$$q(x_{1:T} \mid x_0) := \prod_{t=1}^{T} q(x_t \mid x_{t-1}), \qquad q(x_t \mid x_{t-1}) := \mathcal{N}\left(x_t; \sqrt{1 - \beta_t}\, x_{t-1}, \beta_t I\right)$$
Here $\beta_1, \ldots, \beta_T$ is a variance schedule (either learned or fixed) which ensures that, for sufficiently large T, $x_T$ is approximately an isotropic Gaussian.
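As a rough illustration of the forward transition above, the following NumPy sketch applies $q(x_t \mid x_{t-1})$ repeatedly to a toy array. The function name `forward_step`, the toy "image", the number of steps and the schedule values are assumptions made for illustration only.

```python
import numpy as np

def forward_step(x_prev, beta_t, rng):
    """Draw x_t ~ N(sqrt(1 - beta_t) * x_prev, beta_t * I)."""
    noise = rng.standard_normal(x_prev.shape)
    return np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * noise

rng = np.random.default_rng(0)
T = 1000                                # assumed number of steps
betas = np.linspace(1e-4, 0.02, T)      # assumed fixed variance schedule

x = np.full((64, 64), 3.0)              # toy "image" with constant value 3
for beta_t in betas:
    x = forward_step(x, beta_t, rng)

# After T steps the original signal is almost gone and x should
# resemble a draw from an isotropic standard Gaussian.
print(float(x.mean()), float(x.std()))
```

With these settings the printed mean and standard deviation should come out close to 0 and 1 respectively, which is the "approximately isotropic Gaussian" behaviour described above.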
The reverse process is where the magic of the diffusion model happens. During training, the model learns to reverse this diffusion process in order to generate new data. Starting from pure Gaussian noise
$$p(x_T) := \mathcal{N}(x_T; 0, I),$$
the model learns the joint distribution $p_\theta(x_{0:T})$ as
$$p_\theta(x_{0:T}) := p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t), \qquad p_\theta(x_{t-1} \mid x_t) := \mathcal{N}\left(x_{t-1}; \mu_\theta(x_t, t), \Sigma_\theta(x_t, t)\right)$$
where the time-dependent parameters of the Gaussian transitions are learned. In particular, note how the Markov formulation means that a given reverse transition distribution depends only on the previous timestep (or the following timestep, depending on how you look at it):
$$p_\theta(x_{t-1} \mid x_t) := \mathcal{N}\left(x_{t-1}; \mu_\theta(x_t, t), \Sigma_\theta(x_t, t)\right)$$
Model Training
A diffusion model is trained by finding the reverse Markov transitions that maximize the likelihood of the training data. In practice, training amounts to minimizing the variational upper bound on the negative log likelihood.
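$$\mathbb{E}\left[-\log p_\theta(x_0)\right] \le \mathbb{E}_q\left[-\log \frac{p_\theta(x_{0:T})}{q(x_{1:T} \mid x_0)}\right] = \mathbb{E}_q\left[-\log p(x_T) - \sum_{t \ge 1} \log \frac{p_\theta(x_{t-1} \mid x_t)}{q(x_t \mid x_{t-1})}\right] =: L$$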
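For orientation, the bound $L$ can be rewritten as a sum of KL-divergence terms. The decomposition below is the standard one from the DDPM literature rather than something derived in this post, and the labels $L_T$, $L_{t-1}$ and $L_0$ are the names used there.

```latex
L = \mathbb{E}_q\Big[
      \underbrace{D_{\mathrm{KL}}\big(q(x_T \mid x_0)\,\|\,p(x_T)\big)}_{L_T}
    + \sum_{t>1} \underbrace{D_{\mathrm{KL}}\big(q(x_{t-1} \mid x_t, x_0)\,\|\,p_\theta(x_{t-1} \mid x_t)\big)}_{L_{t-1}}
    \underbrace{-\,\log p_\theta(x_0 \mid x_1)}_{L_0}
\Big]
```

Written this way, $L_T$ compares the end of the forward process with the prior, each $L_{t-1}$ compares one learned reverse transition with the corresponding forward-process posterior, and $L_0$ is the likelihood of the final decoding step.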
Now that we have the mathematical foundation for our objective function, we need to decide how to implement our diffusion model. The only choice required for the forward process is the variance schedule, whose values generally increase over the course of the process.
For the reverse process, we need to choose the parameterization of the Gaussian distribution and the model architecture.
The only requirement on our design is that the input and output have the same dimensionality. This underscores the high degree of flexibility that diffusion models afford.
Below, we go into these choices in more detail.
Forward Process
For the forward process, we have to define the variance schedule. In particular, we set the variances to be time-dependent constants and ignore the fact that they could be learned. For example, a schedule that increases linearly from $\beta_1 = 10^{-4}$ to $\beta_T = 0.02$ can be used.
Because the variance schedule is fixed, $L_T$ (the term of the bound $L$ that compares $q(x_T \mid x_0)$ with the prior $p(x_T)$) becomes a constant with respect to our set of learnable parameters, which lets us ignore it during training regardless of the specific values chosen.
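A fixed schedule like this is easy to precompute. The sketch below builds a linearly increasing schedule together with the cumulative products that give a closed form for $q(x_t \mid x_0)$, namely $\mathcal{N}(\sqrt{\bar{\alpha}_t}\, x_0, (1 - \bar{\alpha}_t) I)$. The value T = 1000 and the helper name `q_sample` are assumptions for illustration, and the definitions $\alpha_t = 1 - \beta_t$ and $\bar{\alpha}_t = \prod_{s \le t} \alpha_s$ follow the usual DDPM notation.

```python
import numpy as np

T = 1000                                  # assumed number of diffusion steps
betas = np.linspace(1e-4, 0.02, T)        # beta_1 ... beta_T, fixed (not learned)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)           # alpha_bar_t = prod_{s<=t} alpha_s

def q_sample(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0) in one shot; t is 1-indexed."""
    eps = rng.standard_normal(x0.shape)
    a_bar = alpha_bars[t - 1]
    x_t = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps
    return x_t, eps

rng = np.random.default_rng(0)
x0 = np.ones((8, 8))                      # toy clean input
x_t, eps = q_sample(x0, t=500, rng=rng)

# alpha_bar is close to 1 at t = 1 (almost no noise) and close to 0 at t = T.
print(alpha_bars[0], alpha_bars[-1])
```

Sampling $x_t$ directly from $x_0$ avoids looping through every intermediate step, which is what makes training on randomly chosen timesteps practical.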
Reverse Process
We now go over the choices required to define the reverse process. Recall that we defined the reverse Markov transitions as Gaussians:
$$p_\theta(x_{t-1} \mid x_t) := \mathcal{N}\left(x_{t-1}; \mu_\theta(x_t, t), \Sigma_\theta(x_t, t)\right)$$
We now need to specify the functional forms of $\mu_\theta$ and $\Sigma_\theta$. Although there are more sophisticated ways to parameterize $\Sigma_\theta$, we simply set
$$\Sigma_\theta(x_t, t) = \sigma_t^2 I$$
$$\sigma_t^2 = \beta_t$$
In other words, we treat the multivariate Gaussian as a product of independent Gaussians with identical variance, a variance value that can change over time. These variances are set to match the variance schedule of the forward process.
As a result of this new formulation, we have:
$$p_\theta(x_{t-1} \mid x_t) := \mathcal{N}\left(x_{t-1}; \mu_\theta(x_t, t), \Sigma_\theta(x_t, t)\right) := \mathcal{N}\left(x_{t-1}; \mu_\theta(x_t, t), \sigma_t^2 I\right)$$
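To make this concrete, here is a minimal sketch of drawing one reverse transition when the covariance is fixed to $\sigma_t^2 I$ with $\sigma_t^2 = \beta_t$. The names `reverse_step` and `mu_theta` are hypothetical, and `mu_theta` below is only a placeholder for a learned mean function.

```python
import numpy as np

def reverse_step(x_t, t, mu_theta, betas, rng):
    """Draw x_{t-1} ~ N(mu_theta(x_t, t), beta_t * I); t is 1-indexed."""
    sigma_t = np.sqrt(betas[t - 1])          # sigma_t^2 = beta_t
    z = rng.standard_normal(x_t.shape)
    return mu_theta(x_t, t) + sigma_t * z

# Hypothetical placeholder for a trained network: it just shrinks its input.
def mu_theta(x_t, t):
    return 0.9 * x_t

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)        # assumed fixed schedule
x_t = rng.standard_normal((4, 4))
x_prev = reverse_step(x_t, t=1000, mu_theta=mu_theta, betas=betas, rng=rng)
print(x_prev.shape)                          # (4, 4)
```

Because the variance is not learned, the network only has to supply the mean, and the output has the same shape as the input.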
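This leads to the alternative loss function shown below, which the authors found to give more stable training and better results. Here $\epsilon \sim \mathcal{N}(0, I)$ is the noise added to $x_0$, $\epsilon_\theta$ is the network that predicts it, $\alpha_t := 1 - \beta_t$ and $\bar{\alpha}_t := \prod_{s=1}^{t} \alpha_s$:
$$L_{\text{simple}}(\theta) := \mathbb{E}_{t, x_0, \epsilon}\left[\left\| \epsilon - \epsilon_\theta\left(\sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,\ t\right) \right\|^2\right]$$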
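The loss above can be estimated by sampling a timestep and a noise vector for each training example. The following is a minimal NumPy sketch of such an estimate, not a full training loop: `simple_loss` and `zero_model` are hypothetical names, the uniform sampling of t over 1..T follows the usual DDPM setup, and a real implementation would use an autodiff framework and a trained network.

```python
import numpy as np

def simple_loss(eps_model, x0_batch, alpha_bars, rng):
    """One-sample Monte Carlo estimate of L_simple for a batch of clean inputs."""
    n = x0_batch.shape[0]
    T = len(alpha_bars)
    t = rng.integers(1, T + 1, size=n)                  # t ~ Uniform{1, ..., T}
    eps = rng.standard_normal(x0_batch.shape)           # eps ~ N(0, I)
    # Reshape alpha_bar_t so it broadcasts over the non-batch axes.
    a_bar = alpha_bars[t - 1].reshape(n, *([1] * (x0_batch.ndim - 1)))
    x_t = np.sqrt(a_bar) * x0_batch + np.sqrt(1.0 - a_bar) * eps
    pred = eps_model(x_t, t)
    # Squared L2 norm per example, averaged over the batch.
    per_example = np.sum((eps - pred) ** 2, axis=tuple(range(1, x0_batch.ndim)))
    return float(np.mean(per_example))

# Hypothetical untrained "network" that always predicts zero noise.
def zero_model(x_t, t):
    return np.zeros_like(x_t)

rng = np.random.default_rng(0)
alpha_bars = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))
x0 = rng.uniform(-1.0, 1.0, size=(16, 8, 8))
print(simple_loss(zero_model, x0, alpha_bars, rng))
```

For the zero-predicting placeholder, the expected loss is simply the expected squared norm of the noise, which is 64 for an 8 by 8 input; a trained noise predictor would drive the value lower.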
The authors also draw a connection between this formulation of diffusion models and score-matching generative models based on Langevin dynamics. Much like the independent and concurrent development of wave-based quantum mechanics and matrix-based quantum mechanics, which revealed two equivalent formulations of the same phenomena, it appears that diffusion models and score-based models may be two sides of the same coin.
Network Architecture
Although our simplified loss function aims to train a model $\epsilon_\theta$, we have not yet decided on the architecture of this model. Recall that the model only needs to have the same input and output dimensionality.
Given this restriction, it is perhaps unsurprising that U-Net-like architectures are commonly used to build image diffusion models.
Along the path of the reverse process, many transformations are applied under continuous conditional Gaussian distributions. Recall, however, that the goal of the reverse process is to produce an image made up of integer pixel values. We therefore need a way to obtain discrete (log) likelihoods for each possible pixel value across all pixels.
This is accomplished by setting the last transition in the reverse diffusion chain to an independent discrete decoder. The likelihood of a given image $x_0$ given $x_1$ is then:
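$$p_\theta(x_0 \mid x_1) = \prod_{i=1}^{D} \int_{\delta_-(x_0^i)}^{\delta_+(x_0^i)} \mathcal{N}\left(x; \mu_\theta^i(x_1, 1), \sigma_1^2\right) dx$$
$$\delta_+(x) = \begin{cases} \infty & \text{if } x = 1 \\ x + \frac{1}{255} & \text{if } x < 1 \end{cases} \qquad \delta_-(x) = \begin{cases} -\infty & \text{if } x = -1 \\ x - \frac{1}{255} & \text{if } x > -1 \end{cases}$$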
where the superscript i indicates the extraction of one coordinate and D is the dimensionality of the data.
The goal at this point is to determine the likelihood of each integer value for a given pixel, given the distribution over possible values for that pixel at time t = 1.
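The integral for a single pixel is just a difference of Gaussian CDF values, so it can be sketched with the standard library alone. The code below assumes, as in the DDPM setup, that integer pixel values 0 to 255 are scaled linearly to [-1, 1]; the names `normal_cdf` and `pixel_probability` and the example values of the mean and standard deviation are hypothetical.

```python
import math

def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2) at x; math.erf also handles +/- infinity."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def pixel_probability(k, mu, sigma):
    """Probability that a pixel takes integer value k in {0, ..., 255}.

    Assumes pixel values are scaled linearly to [-1, 1], so neighbouring
    values are 2/255 apart and each bin has half-width 1/255.
    """
    x = k / 127.5 - 1.0
    upper = math.inf if k == 255 else x + 1.0 / 255.0    # delta_plus
    lower = -math.inf if k == 0 else x - 1.0 / 255.0     # delta_minus
    return normal_cdf(upper, mu, sigma) - normal_cdf(lower, mu, sigma)

mu, sigma = 0.2, 0.05        # hypothetical decoder mean and sigma_1 for one pixel
probs = [pixel_probability(k, mu, sigma) for k in range(256)]
best = max(range(256), key=lambda k: probs[k])
print(sum(probs), best)
```

Because the two edge bins extend to infinity, the 256 probabilities cover the whole real line and sum to 1 up to rounding. The product over all D coordinates in the formula then becomes a sum of per-pixel log probabilities.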
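Final Objective
According to the researchers, the best results came from predicting the noise component of an image at a given timestep. Ultimately, they use the following objective:
$$L_{\text{simple}}(\theta) := \mathbb{E}_{t, x_0, \epsilon}\left[\left\| \epsilon - \epsilon_\theta\left(\sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,\ t\right) \right\|^2\right]$$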
The training and sampling procedures of our diffusion model are summarized in the following figure:
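In code, the sampling procedure can be sketched roughly as follows. The mean used here, $\mu_\theta(x_t, t) = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}}\, \epsilon_\theta(x_t, t)\right)$, is the standard DDPM expression for a noise-predicting model and is an assumption relative to this post, which does not state it. The function `sample` and the placeholder `eps_model` are hypothetical names.

```python
import numpy as np

def sample(eps_model, shape, betas, rng):
    """Ancestral sampling: start from x_T ~ N(0, I) and step back to x_0."""
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    T = len(betas)
    x = rng.standard_normal(shape)                       # x_T
    for t in range(T, 0, -1):                            # t = T, ..., 1
        b, a, a_bar = betas[t - 1], alphas[t - 1], alpha_bars[t - 1]
        eps_hat = eps_model(x, t)                        # predicted noise
        mean = (x - b / np.sqrt(1.0 - a_bar) * eps_hat) / np.sqrt(a)
        # No fresh noise is added on the final step.
        z = rng.standard_normal(shape) if t > 1 else np.zeros(shape)
        x = mean + np.sqrt(b) * z                        # sigma_t^2 = beta_t
    return x

# Hypothetical untrained stand-in; a real model would be a trained network.
def eps_model(x_t, t):
    return np.zeros_like(x_t)

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 50)                      # short schedule to keep the demo fast
x0 = sample(eps_model, (8, 8), betas, rng)
print(x0.shape)                                          # (8, 8)
```

With an untrained placeholder the output is meaningless noise; the point is only the control flow, in which the same network is called once per step with the current image and timestep.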
Advantages of Diffusion Models
As already noted, research into diffusion models has multiplied recently. Inspired by non-equilibrium thermodynamics, diffusion models now deliver state-of-the-art image quality.
Beyond cutting-edge image quality, diffusion models offer a number of other benefits, such as not requiring adversarial training.
The difficulties of adversarial training are well documented, so where a non-adversarial alternative offers comparable performance and training efficiency, it is usually the better choice.
In terms of training efficiency, diffusion models also offer the benefits of scalability and parallelizability.
Although diffusion models can seem to produce results out of thin air, those results rest on many careful and interesting mathematical choices, and industry best practices are still evolving.
Conclusion
In conclusion, researchers have demonstrated high-quality image synthesis using diffusion probabilistic models, a class of latent variable models inspired by ideas from non-equilibrium thermodynamics.
These models have achieved a great deal thanks to their state-of-the-art results and non-adversarial training, and given how young the field is, further advances can be expected in the coming years.
In particular, diffusion models have been found to be essential to the performance of advanced models such as DALL-E 2.
The full paper can be accessed here.