In recent years, generative models known as "diffusion models" have attracted a great deal of attention, and for good reason.
A handful of papers published in 2020 and 2021 have shown the world what diffusion models are capable of, such as beating GANs at image synthesis.
Most recently, practitioners will have seen diffusion models used in DALL-E 2, OpenAI's image generation model released last month.
Given this recent success, many Machine Learning practitioners are no doubt curious about how Diffusion Models work.
In this post, we will look at the theoretical foundations of Diffusion Models, their design, their benefits, and more. Let's get started.
What Is a Diffusion Model?
Let's start by considering why this model is called a diffusion model.
In physics classes, diffusion is a term from thermodynamics. A system is not in equilibrium when a substance, such as a scent, is highly concentrated in one place.
For the system to reach equilibrium, diffusion must occur: the scent molecules spread out from the region of highest concentration through the whole system, until the system is uniform throughout.
Eventually, diffusion makes everything homogeneous.
Diffusion models are inspired by non-equilibrium thermodynamics. They use a Markov chain, a sequence of random variables in which the value of each variable depends only on the state immediately before it.
Given an image, we successively add a small amount of noise to it during the forward process.
After storing the noisy image, we produce the next image in the sequence by adding more noise.
This process is repeated many times. Repeating it enough times turns the image into pure noise.
So how can we generate an image from this corrupted one?
The diffusion process is reversed using a neural network. The same network, with the same weights, is used at every reverse step to produce the image at step t−1 from the image at step t.
Instead of having the network predict the image directly, we can simplify the task further by having it predict, at each step, the noise that should be removed from the image.
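To make the noise-prediction idea concrete, here is a minimal NumPy sketch. It assumes the closed form x_t = √ᾱ_t · x_0 + √(1−ᾱ_t) · ε that appears inside the loss function later in this post, where ᾱ_t measures how much of the original signal is left at step t. The function `estimate_x0` and the value `alpha_bar_t = 0.3` are hypothetical. Given a perfect noise prediction, removing the scaled noise recovers the clean image exactly.

```python
import numpy as np

def estimate_x0(x_t, eps_pred, alpha_bar_t):
    """Remove predicted noise from x_t to estimate the clean image x_0.

    Assumes x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps.
    """
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(8, 8))   # toy "image" with values in [-1, 1]
eps = rng.standard_normal(x0.shape)        # the noise that was added
alpha_bar_t = 0.3                          # hypothetical noise level at step t
x_t = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

# With a perfect noise prediction, the clean image is recovered exactly.
x0_hat = estimate_x0(x_t, eps, alpha_bar_t)
print(np.allclose(x0_hat, x0))  # True
```

In practice the network's prediction is imperfect, so this estimate is only approximate; the sampling procedure instead removes a little noise per step.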
In either case, the neural network architecture must be chosen so that it preserves the dimensionality of the data.
A Deeper Dive into Diffusion Models
A diffusion model consists of a forward process (also called the diffusion process), in which a datum (usually an image) is progressively noised, and a reverse process (also called the reverse diffusion process), in which noise is transformed back into a sample from the target distribution.
When the noise level is low enough, the sampling-chain transitions of the forward process can be set to conditional Gaussians. Combining this fact with the Markov assumption gives a simple parameterization of the forward process:
$$q(x_{1:T} \mid x_0) := \prod_{t=1}^{T} q(x_t \mid x_{t-1}), \qquad q(x_t \mid x_{t-1}) := \mathcal{N}\left(x_t; \sqrt{1-\beta_t}\, x_{t-1}, \beta_t \mathbf{I}\right)$$
where β_1, …, β_T is a variance schedule (either learned or fixed) which, for sufficiently large T, ensures that x_T is nearly an isotropic Gaussian.
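The forward chain can be simulated directly from the transition above. This minimal NumPy sketch assumes a linear schedule from β_1 = 10^-4 to β_T = 0.02 with T = 1000 (the values quoted in the Forward Process section) and starts from a deliberately non-Gaussian "image" (every entry equal to 5); after T steps the entries look like draws from a standard Gaussian. `forward_chain` is a hypothetical helper name.

```python
import numpy as np

def forward_chain(x0, betas, rng):
    """Apply q(x_t | x_{t-1}) = N(sqrt(1 - beta_t) x_{t-1}, beta_t I) for every step."""
    x = x0
    for beta in betas:
        noise = rng.standard_normal(x.shape)
        x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * noise
    return x

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)   # assumed linear variance schedule
x0 = np.full(20_000, 5.0)            # far from N(0, I) on purpose
x_T = forward_chain(x0, betas, rng)
print(x_T.mean(), x_T.std())         # mean near 0, standard deviation near 1
```

Each step shrinks the signal by √(1−β_t) and adds fresh noise of variance β_t, so the variance stays bounded while the memory of x_0 decays.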
The reverse process is where the magic of diffusion models happens. During training, the model learns to reverse this diffusion process in order to generate new data. Starting from pure Gaussian noise p(x_T) := N(x_T; 0, I), the model learns the joint distribution p_θ(x_{0:T}) as:
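$$p_\theta(x_{0:T}) := p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t), \qquad p_\theta(x_{t-1} \mid x_t) := \mathcal{N}\left(x_{t-1}; \mu_\theta(x_t, t), \Sigma_\theta(x_t, t)\right)$$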
where the time-dependent parameters of the Gaussian transitions are learned. Note in particular that the Markov formulation means that a given reverse diffusion transition distribution depends only on the previous timestep (or the following timestep, depending on how you look at it):
$$p_\theta(x_{t-1} \mid x_t) := \mathcal{N}\left(x_{t-1}; \mu_\theta(x_t, t), \Sigma_\theta(x_t, t)\right)$$
Training
A diffusion model is trained by finding the reverse Markov transitions that maximize the likelihood of the training data. In practice, training amounts to minimizing the variational upper bound on the negative log likelihood:
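$$\mathbb{E}\left[-\log p_\theta(x_0)\right] \le \mathbb{E}_q\left[-\log \frac{p_\theta(x_{0:T})}{q(x_{1:T} \mid x_0)}\right] = \mathbb{E}_q\left[-\log p(x_T) - \sum_{t \ge 1} \log \frac{p_\theta(x_{t-1} \mid x_t)}{q(x_t \mid x_{t-1})}\right] =: L$$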
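The bound can be sanity-checked numerically in a toy one-dimensional case. This sketch assumes data x_0 ∼ N(0, 1), the forward transitions defined earlier with a short linear schedule (T = 50), and a hypothetical reverse model p_θ(x_{t−1}|x_t) = N(√(1−β_t) x_t, β_t). Because each forward step keeps the marginal at N(0, 1), that reverse model is the exact reverse of the chain, so the bound is tight: the quantity inside the expectation equals −log p(x_0) on every sampled path.

```python
import numpy as np

def log_normal(x, mean, var):
    """Log density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

rng = np.random.default_rng(0)
T = 50
betas = np.linspace(1e-4, 0.02, T)
n_paths = 1000

# Toy data distribution x0 ~ N(0, 1); each forward step keeps the marginal at N(0, 1).
x_prev = rng.standard_normal(n_paths)
x0 = x_prev.copy()
log_ratio_sum = np.zeros(n_paths)  # sum over t of log p(x_{t-1} | x_t) - log q(x_t | x_{t-1})
for beta in betas:
    a = np.sqrt(1.0 - beta)
    x_t = a * x_prev + np.sqrt(beta) * rng.standard_normal(n_paths)
    log_q = log_normal(x_t, a * x_prev, beta)   # forward transition q(x_t | x_{t-1})
    log_p = log_normal(x_prev, a * x_t, beta)   # hypothetical reverse model p(x_{t-1} | x_t)
    log_ratio_sum += log_p - log_q
    x_prev = x_t

# Per-path value inside the expectation that defines L (x_prev now holds x_T).
L_per_path = -log_normal(x_prev, 0.0, 1.0) - log_ratio_sum
nll = -log_normal(x0, 0.0, 1.0)               # exact -log p(x0)
print(np.allclose(L_per_path, nll))           # True: the bound is tight here
```

With a learned, imperfect reverse model the per-path value would instead sit above −log p(x_0) on average, which is exactly why minimizing L pushes the model toward the true reverse process.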
Model Choices
With the mathematical formulation of our objective in place, we now need to decide how our Diffusion Model will be implemented. For the forward process, the only choice required is the variance schedule, whose values generally increase over the course of the process.
For the reverse process, we mainly need to choose the parameterization of the Gaussian distributions and the model architecture.
The only requirement on the architecture is that its input and output have the same dimensionality, which shows how much flexibility Diffusion Models offer.
Below, we explore these choices in more detail.
Forward Process
For the forward process, we must specify the variance schedule. We set the variances to time-dependent constants, ignoring the possibility that they could be learned. For example, a linear schedule from β_1 = 10^-4 to β_T = 0.02 can be used.
Because the schedule is fixed, L_T is constant with respect to our set of learnable parameters, so we can ignore it during training regardless of which values are chosen.
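A minimal NumPy sketch of the fixed schedule and of why L_T drops out. In the standard decomposition of L, L_T is the KL divergence between q(x_T|x_0) = N(√ᾱ_T x_0, (1−ᾱ_T) I) and p(x_T) = N(0, I), where α_t = 1 − β_t and ᾱ_t is the running product of the α_t. The sketch assumes T = 1000 and a toy 64-value "image"; the helper name `kl_to_standard_normal` is hypothetical. Nothing in the computation involves network parameters.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # fixed linear schedule: beta_1 = 1e-4 ... beta_T = 0.02
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)      # alpha_bar_t = product over s <= t of (1 - beta_s)

def kl_to_standard_normal(mean, var):
    """KL( N(mean, var I) || N(0, I) ), summed over dimensions."""
    return 0.5 * np.sum(var + mean ** 2 - 1.0 - np.log(var))

# L_T for one fixed toy "image"; only the schedule and x0 enter, no learnable parameters.
x0 = np.linspace(-1.0, 1.0, 64)
L_T = kl_to_standard_normal(np.sqrt(alpha_bars[-1]) * x0, 1.0 - alpha_bars[-1])
print(alpha_bars[-1], L_T)           # both tiny, and neither depends on the network
```

Since ᾱ_T is so close to zero, q(x_T|x_0) is already almost exactly N(0, I), which is what makes starting the reverse process from pure noise reasonable.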
Reverse Process
We now turn to the choices needed to define the reverse process. Recall that we defined the reverse Markov transitions as Gaussians:
$$p_\theta(x_{t-1} \mid x_t) := \mathcal{N}\left(x_{t-1}; \mu_\theta(x_t, t), \Sigma_\theta(x_t, t)\right)$$
We must now define the functional forms of μ_θ and Σ_θ. Although there are more sophisticated ways to parameterize Σ_θ, we simply set
$$\Sigma_\theta(x_t, t) = \sigma_t^2 \mathbf{I}, \qquad \sigma_t^2 = \beta_t$$
In other words, we assume that the multivariate Gaussian is a product of independent Gaussians with identical variance, a variance value that can change over time. These variances are set equal to those of the forward process variance schedule.
Given this new formulation, we have:
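$$p_\theta(x_{t-1} \mid x_t) := \mathcal{N}\left(x_{t-1}; \mu_\theta(x_t, t), \Sigma_\theta(x_t, t)\right) := \mathcal{N}\left(x_{t-1}; \mu_\theta(x_t, t), \sigma_t^2 \mathbf{I}\right)$$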
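Sampling from N(μ, σ_t² I) is just "mean plus independent, equally scaled noise in every dimension". This NumPy sketch draws many samples of a 3-dimensional x_{t−1} and checks that the empirical covariance is close to β_t on the diagonal and close to zero elsewhere. The value β_t = 0.02 and the zero mean standing in for μ_θ(x_t, t) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
beta_t = 0.02                     # hypothetical value of beta_t at some step t
sigma_t = np.sqrt(beta_t)         # sigma_t^2 = beta_t
mu = np.zeros(3)                  # stand-in for mu_theta(x_t, t); a real model computes this

# Each row is one draw of x_{t-1} ~ N(mu, sigma_t^2 I): the mean plus isotropic noise.
samples = mu + sigma_t * rng.standard_normal((200_000, 3))

cov = np.cov(samples, rowvar=False)
print(np.round(cov, 4))           # approximately beta_t on the diagonal, 0 elsewhere
```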
This leads to the following alternative loss function, which the authors found to give more stable training and better results:
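$$L_{\text{simple}}(\theta) := \mathbb{E}_{t, x_0, \epsilon}\left[\left\| \epsilon - \epsilon_\theta\left(\sqrt{\bar\alpha_t}\, x_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon, t\right) \right\|^2\right]$$
where α_t := 1 − β_t, ᾱ_t := ∏_{s=1}^{t} α_s, ε ∼ N(0, I), t is drawn uniformly from {1, …, T}, and ε_θ is the network's prediction of the noise.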
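A Monte Carlo estimate of L_simple takes only a few lines. This NumPy sketch assumes the linear schedule from the Forward Process section, a model with the hypothetical signature `eps_model(x_t, t)`, and averages the squared error over dimensions as well as the batch (which only rescales L_simple by the constant number of dimensions). Two stand-in models check the code: an untrained model that always predicts zero, giving a loss near 1, and an "oracle" that cheats by knowing x_0, giving a loss near 0.

```python
import numpy as np

def l_simple(eps_model, x0, alpha_bars, rng):
    """One Monte Carlo estimate of L_simple for a batch x0 of shape (batch, dim)."""
    T = len(alpha_bars)
    t = rng.integers(1, T + 1, size=x0.shape[0])       # t ~ Uniform{1, ..., T}
    ab = alpha_bars[t - 1][:, None]                     # alpha_bar_t for each example
    eps = rng.standard_normal(x0.shape)                 # eps ~ N(0, I)
    x_t = np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps    # closed-form sample of x_t
    return np.mean((eps - eps_model(x_t, t)) ** 2)      # mean over batch and dimensions

rng = np.random.default_rng(0)
alpha_bars = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))
x0 = rng.uniform(-1.0, 1.0, size=(4096, 16))

def zero_model(x_t, t):                                 # hypothetical untrained model
    return np.zeros_like(x_t)

def oracle_model(x_t, t):                               # cheats by knowing x0; only for checking
    ab = alpha_bars[t - 1][:, None]
    return (x_t - np.sqrt(ab) * x0) / np.sqrt(1.0 - ab)

loss_zero = l_simple(zero_model, x0, alpha_bars, rng)
loss_oracle = l_simple(oracle_model, x0, alpha_bars, rng)
print(loss_zero, loss_oracle)                           # close to 1.0 and close to 0.0
```

In real training, `eps_model` would be a neural network and this quantity would be minimized by gradient descent on its parameters.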
The authors also draw a connection between this formulation of diffusion models and score-matching generative models based on Langevin dynamics. Much like the independent and concurrent development of wave-based quantum physics and matrix-based quantum mechanics, which revealed two equivalent formulations of the same phenomena, it appears that Diffusion Models and Score-Based models may be two sides of the same coin.
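One concrete piece of that connection, shown here as a numerical check rather than the full argument: for q(x_t|x_0) = N(√ᾱ_t x_0, (1−ᾱ_t) I), the score ∇_{x_t} log q(x_t|x_0) equals −ε/√(1−ᾱ_t). Predicting the noise is therefore equivalent, up to a known scaling, to estimating a score. This NumPy sketch compares that analytic score with a central finite-difference gradient; ᾱ_t = 0.5 and the 5-dimensional toy data are hypothetical.

```python
import numpy as np

def log_q(x_t, x0, ab):
    """log N(x_t; sqrt(ab) x0, (1 - ab) I), summed over dimensions."""
    var = 1.0 - ab
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x_t - np.sqrt(ab) * x0) ** 2 / var)

rng = np.random.default_rng(0)
ab = 0.5                                         # hypothetical alpha_bar_t
x0 = rng.uniform(-1.0, 1.0, 5)
eps = rng.standard_normal(5)
x_t = np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps

# Central finite-difference gradient of log q with respect to each coordinate of x_t.
h = 1e-5
numeric = np.array([
    (log_q(x_t + h * e, x0, ab) - log_q(x_t - h * e, x0, ab)) / (2 * h)
    for e in np.eye(5)
])
analytic = -eps / np.sqrt(1.0 - ab)              # score expressed through the noise
print(np.allclose(numeric, analytic, atol=1e-6))  # True
```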
Network Architecture
Although our simplified loss function aims to train a model ε_θ, we have not yet decided on the architecture of this model. Recall that the model must have the same input and output dimensionality.
Given this restriction, it is perhaps unsurprising that U-Net-like architectures are commonly used to implement image diffusion models.
The reverse process consists of many transformations under continuous conditional Gaussian distributions. Recall, however, that the reverse process is meant to produce an image, which is composed of integer pixel values. We therefore need a way to obtain discrete (log) likelihoods for each possible pixel value across all pixels.
This is done by setting the last transition in the reverse diffusion chain to an independent discrete decoder. To determine the likelihood of a given image x_0 given x_1, we treat the data dimensions as independent:
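$$p_\theta(x_0 \mid x_1) = \prod_{i=1}^{D} \int_{\delta_-(x_0^i)}^{\delta_+(x_0^i)} \mathcal{N}\left(x; \mu_\theta^i(x_1, 1), \sigma_1^2\right) dx$$
$$\delta_+(x) = \begin{cases} \infty & \text{if } x = 1 \\ x + \frac{1}{255} & \text{if } x < 1 \end{cases} \qquad \delta_-(x) = \begin{cases} -\infty & \text{if } x = -1 \\ x - \frac{1}{255} & \text{if } x > -1 \end{cases}$$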
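where the superscript i denotes the extraction of one coordinate and D is the number of dimensions of the data. Pixel values are integers in {0, 1, …, 255} scaled linearly to [−1, 1], which is why the edge cases are x = 1 and x = −1.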
The goal here is to determine the probability of each integer value for a given pixel, given the distribution over possible values for that pixel in the slightly noised image at time t = 1.
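The discrete decoder can be sketched with the Gaussian CDF: the probability of pixel value k is the Gaussian mass between δ_−(x) and δ_+(x), with the outermost bins extended to ±∞ so that the 256 probabilities sum to 1. This sketch uses Python's `math.erf` and NumPy; the means and σ_1 = 0.05 are hypothetical stand-ins for μ_θ^i(x_1, 1) and σ_1, and the helper names are illustrative.

```python
import math
import numpy as np

def normal_cdf(x, mean, std):
    """CDF of N(mean, std^2); x may be +/- infinity."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def pixel_likelihoods(mu, sigma):
    """Probability of each of the 256 pixel values (scaled to [-1, 1]) for one dimension."""
    values = np.arange(256) * 2.0 / 255.0 - 1.0           # 0..255 mapped to [-1, 1]
    probs = np.empty(256)
    for k, x in enumerate(values):
        upper = math.inf if k == 255 else x + 1.0 / 255.0    # delta_plus
        lower = -math.inf if k == 0 else x - 1.0 / 255.0     # delta_minus
        probs[k] = normal_cdf(upper, mu, sigma) - normal_cdf(lower, mu, sigma)
    return probs

probs = pixel_likelihoods(mu=0.1, sigma=0.05)   # hypothetical mu_theta^i(x_1, 1) and sigma_1
print(probs.sum(), probs.argmax())              # sums to 1; most likely pixel is the one nearest 0.1

# Independence across dimensions: log p(x0 | x1) is a sum of per-pixel log-probabilities.
x0_int = [140, 12, 250]                         # hypothetical integer pixels of a 3-pixel image
mus = [0.1, -0.9, 0.95]                         # hypothetical per-pixel means
log_lik = sum(math.log(pixel_likelihoods(m, 0.05)[k]) for k, m in zip(x0_int, mus))
print(log_lik)
```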
Final Objective
According to the authors, the best results came from predicting the noise component of an image at a given timestep. In the end, they use the following objective:
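$$L_{\text{simple}}(\theta) := \mathbb{E}_{t, x_0, \epsilon}\left[\left\| \epsilon - \epsilon_\theta\left(\sqrt{\bar\alpha_t}\, x_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon, t\right) \right\|^2\right]$$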
The training and sampling procedures for our diffusion model are summarized in the following figure:
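As a rough companion to the sampling half of that procedure, here is a NumPy sketch of ancestral sampling under the choices made above: a fixed linear β schedule and σ_t² = β_t. It uses the standard DDPM expression for the mean in terms of the predicted noise, μ_θ(x_t, t) = (x_t − β_t/√(1−ᾱ_t) · ε_θ(x_t, t)) / √α_t, which this post does not derive. `eps_model` is a hypothetical stand-in for a trained network; with the dummy zero model used below, the output is not a meaningful image, only a check that the loop runs and preserves shape.

```python
import numpy as np

def sample(eps_model, shape, betas, rng):
    """Ancestral sampling through the reverse chain with sigma_t^2 = beta_t."""
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal(shape)                           # x_T ~ N(0, I)
    for t in range(len(betas), 0, -1):                       # t = T, ..., 1
        beta, alpha, ab = betas[t - 1], alphas[t - 1], alpha_bars[t - 1]
        eps_pred = eps_model(x, t)
        mean = (x - beta / np.sqrt(1.0 - ab) * eps_pred) / np.sqrt(alpha)  # mu_theta(x_t, t)
        z = rng.standard_normal(shape) if t > 1 else np.zeros(shape)      # no noise on the last step
        x = mean + np.sqrt(beta) * z                         # x_{t-1} ~ N(mu_theta, beta_t I)
    return x                                                 # x_0

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)

def dummy_model(x_t, t):                                     # hypothetical untrained network
    return np.zeros_like(x_t)

x0 = sample(dummy_model, (4, 8), betas, rng)
print(x0.shape)                                              # (4, 8)
```

Training pairs naturally with this loop: each training step draws t, ε, and x_0, and takes a gradient step on the L_simple loss above.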
Benefits of Diffusion Models
As mentioned above, research on diffusion models has grown rapidly in recent years. Inspired by non-equilibrium thermodynamics, Diffusion Models currently produce State-of-the-Art image quality.
Beyond cutting-edge image quality, Diffusion Models offer a range of other benefits, including not requiring adversarial training.
The difficulties of adversarial training are well documented, so when non-adversarial alternatives with comparable performance and training efficiency exist, it is usually better to choose them.
On the topic of training efficiency, Diffusion Models also offer the added benefits of scalability and parallelizability.
Although Diffusion Models almost seem to produce results out of thin air, many careful and interesting mathematical choices and details lay the foundation for these results, and best practices are still evolving in the literature.
Conclusion
In conclusion, researchers have demonstrated high-quality image synthesis results using diffusion models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics.
Diffusion Models have already achieved impressive results, combining State-of-the-Art image quality with non-adversarial training, and given how young the approach is, further improvements can be expected in the coming years.
In particular, Diffusion Models have been shown to be essential to the performance of cutting-edge models such as DALL-E 2.