In recent years, deep learning models have become much better at understanding human language.
Consider projects such as GPT-3, which can now generate entire articles and websites. GitHub recently released GitHub Copilot, a service that suggests whole code snippets when you simply describe the kind of code you need.
Researchers at OpenAI, Facebook, and Google have been working on ways to apply deep learning to another task: image captioning. Using large datasets with millions of entries, they have produced some impressive results.
More recently, these researchers have tried the reverse task: generating images from captions. Is it possible to create a new image from a description?
This guide looks at two of the leading text-to-image models: OpenAI's DALL-E 2 and Google's Imagen AI. Each project introduces groundbreaking methods that could change society as we know it.
But first, let's clarify what we mean by text-to-image generation.
What is text-to-image generation?
Text-to-image models allow computers to create new, unique images from a prompt. People can now provide a text description of the image they want, and the model will try to generate an image that matches the description as closely as possible.
Machine learning models have increasingly relied on large datasets of image-caption pairs to further improve performance.
Most text-to-image models use a transformer language model to interpret prompts. This type of model is a neural network that tries to learn the context and semantic meaning of natural language.
Next, generative models such as diffusion models and generative adversarial networks are used to produce the image.
What is DALL-E 2?
DALL-E 2 is a model from OpenAI that was released in April 2022. It was trained on a dataset of millions of labeled images to link words and phrases to images.
Users can type a simple phrase, such as "a cat eating lasagna", and DALL-E 2 outputs its own interpretation of what the phrase describes.
Besides generating images from scratch, DALL-E 2 can also edit existing images. In one example, DALL-E 2 produced an edited image of a room with a sofa added.
DALL-E 2 is just one of several similar projects that OpenAI has released over the past few years. OpenAI's GPT-3 made headlines when it proved able to produce text in a variety of styles.
Currently, DALL-E 2 is still in beta testing. Interested users can sign up for the waitlist and wait for access.
How does it work?
While DALL-E 2's results are impressive, you may be wondering how it all works.
DALL-E 2 is an example of a multimodal model from OpenAI. The original DALL-E was built on a version of GPT-3, while DALL-E 2 combines CLIP with diffusion models.
First, the user's prompt is fed into a text encoder that maps the prompt into a representation space. DALL-E 2 uses another OpenAI model called CLIP (Contrastive Language-Image Pre-Training) to extract semantic information from natural language.
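To make the idea of a shared representation space more concrete, here is a minimal sketch of how a text embedding can be compared with image embeddings using cosine similarity, which is the kind of comparison CLIP-style models are trained around. The three-dimensional vectors and the names `cat_photo` and `car_photo` are invented for illustration; they are not real CLIP outputs, and a real encoder produces much longer vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_match(text_embedding, image_embeddings):
    """Return the name of the image whose embedding is closest to the text."""
    return max(
        image_embeddings,
        key=lambda name: cosine_similarity(text_embedding, image_embeddings[name]),
    )


# Hypothetical embeddings, made up for this example only.
text_embedding = [0.9, 0.1, 0.2]  # stands in for "a cat eating lasagna"
image_embeddings = {
    "cat_photo": [0.8, 0.2, 0.1],
    "car_photo": [0.1, 0.9, 0.3],
}

print(best_match(text_embedding, image_embeddings))
```

With these toy values, the text vector points in nearly the same direction as the cat image vector, so that image is selected as the closest match.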
Next, a model known as the prior maps the text encoding to an image encoding. This image encoding should capture the semantic information found in the text encoding step.
To generate the actual image, DALL-E 2 uses an image decoder that produces the visual output from the semantic information and the image encoding. OpenAI uses a modified version of its GLIDE model to generate the image. GLIDE relies on a diffusion model to create images.
Adding GLIDE to the DALL-E 2 model enabled more photorealistic output. Because the GLIDE model is stochastic, or random, DALL-E 2 can easily create variations by running the model again and again.
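The three stages described above (text encoder, prior, decoder) and the effect of randomness can be sketched with toy stand-ins. Everything here is an assumption made for illustration: `encode_text`, `prior`, `decode`, and `generate` are hypothetical names, the arithmetic is arbitrary, and none of it reflects OpenAI's actual models. The sketch only shows the data flow and why a different random seed yields a different output for the same prompt.

```python
import random


def encode_text(prompt, dim=4):
    """Toy stand-in for a text encoder: one repeatable vector per prompt."""
    rng = random.Random(prompt)  # seeding with the prompt keeps it deterministic
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def prior(text_embedding):
    """Toy stand-in for the prior: maps a text embedding to an image embedding."""
    return [0.5 * value for value in text_embedding]


def decode(image_embedding, seed, steps=10):
    """Toy stand-in for a stochastic decoder (not a real diffusion model).

    Starts from random noise and moves part of the way toward the target
    at each step, so the result depends on the embedding and on the seed.
    """
    rng = random.Random(seed)
    sample = [rng.gauss(0.0, 1.0) for _ in image_embedding]
    for _ in range(steps):
        sample = [s + 0.2 * (t - s) for s, t in zip(sample, image_embedding)]
    return sample


def generate(prompt, seed):
    """Run the full toy pipeline: text -> text embedding -> image embedding -> output."""
    return decode(prior(encode_text(prompt)), seed)


first = generate("a cat eating lasagna", seed=0)
second = generate("a cat eating lasagna", seed=1)
print(first)
print(second)
```

The two printed vectors come from the same prompt but different seeds, which mirrors how rerunning a stochastic decoder gives variations rather than an identical image.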
Limitations
Despite the impressive results of the DALL-E 2 model, it still faces some limitations.
Spelling Text
Prompts that ask DALL-E 2 to render text show that it has trouble spelling words. Experts believe this may be because spelling information is not part of the training dataset.
Compositional Reasoning
Researchers note that DALL-E 2 still has some trouble with compositional reasoning. Put simply, the model can understand individual features of an image but still struggles to work out the relationships between those features.
For example, when given the prompt "red cube on top of a blue cube", DALL-E generates a blue cube and a red cube correctly but fails to position them correctly. The model has also been observed to struggle with prompts that require a specific number of objects in the output.
Bias in the dataset
When the prompt gives no other details, DALL-E has been observed to depict white or Western people and settings. This bias arises because Western-centric images are overrepresented in the dataset.
The model has also been observed to follow gender stereotypes. For example, typing the prompt "flight attendant" mostly generates images of female flight attendants.
What is Google Imagen AI?
Google's Imagen AI is a model that aims to generate photorealistic images from text input. Like DALL-E, it uses transformer language models to understand the text and relies on diffusion models to generate high-quality images.
Alongside Imagen, Google also released a benchmark for text-to-image models called DrawBench. Using DrawBench, the team observed that human raters preferred Imagen's output over that of other models, including DALL-E 2.
How does it work?
Like DALL-E, Imagen first converts the user's prompt into a text embedding using a frozen text encoder.
Imagen uses a diffusion model that learns to turn a pattern of noise into images. The initial output is low resolution and is later passed through another model, known as a super-resolution diffusion model, to increase the resolution of the final image. The first diffusion model outputs a 64 × 64 pixel image, which is then upsampled to a high-resolution 1024 × 1024 image.
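The resolution cascade can be illustrated with simple arithmetic. In the sketch below, the 64 and 1024 pixel sizes come from the description above, and the intermediate 256 pixel stage follows the Imagen paper. The `upsample_nearest` function is a hypothetical placeholder: it only repeats pixels, whereas Imagen's super-resolution stages are diffusion models that add new detail.

```python
def upsample_nearest(image, factor):
    """Enlarge a 2D grid by repeating each pixel `factor` times along each axis."""
    enlarged = []
    for row in image:
        wide_row = [pixel for pixel in row for _ in range(factor)]
        enlarged.extend(list(wide_row) for _ in range(factor))
    return enlarged


# Side lengths of each stage in pixels; 256 is the intermediate stage.
STAGES = [64, 256, 1024]

# Placeholder base image filled with zeros, standing in for the first model's output.
image = [[0.0] * STAGES[0] for _ in range(STAGES[0])]

for low, high in zip(STAGES, STAGES[1:]):
    image = upsample_nearest(image, high // low)
    print(f"{low}x{low} -> {len(image)}x{len(image[0])}")

# Each side grows 16 times overall, so the pixel count grows 16 * 16 times.
print("pixel count growth factor:", (STAGES[-1] // STAGES[0]) ** 2)
```

Generating a small image first and enlarging it in stages means the most expensive, text-guided step runs on only 4,096 pixels instead of more than a million.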
According to the Imagen team's research, frozen language models trained only on text data are still highly effective text encoders for text-to-image generation.
The study also introduces the concept of dynamic thresholding. This sampling technique lets the model use higher guidance weights while generating an image, which makes the results look more photorealistic.
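A rough sketch of the idea, under stated assumptions: a higher guidance weight pushes the model's prediction further toward the text-conditioned result, which can drive predicted pixel values far outside the valid range of -1 to 1. Static thresholding simply clips those values, while dynamic thresholding picks a threshold `s` from a high percentile of the absolute pixel values, clips to `[-s, s]`, and divides by `s`. The function names, the sample values, and the flat list used in place of an image are all illustrative choices, not code from the Imagen team.

```python
def guided_prediction(unconditional, conditional, weight):
    """Classifier-free guidance: weight > 1 exaggerates the effect of the text."""
    return [u + weight * (c - u) for u, c in zip(unconditional, conditional)]


def percentile(values, p):
    """Linear-interpolation percentile (p between 0 and 100) of a non-empty list."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * p / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def static_threshold(pixels):
    """Clip every predicted pixel value to the range [-1, 1]."""
    return [max(-1.0, min(1.0, x)) for x in pixels]


def dynamic_threshold(pixels, p=99.5):
    """Clip to [-s, s], where s is a high percentile of |pixel|, then rescale by s."""
    s = max(1.0, percentile([abs(x) for x in pixels], p))
    return [max(-s, min(s, x)) / s for x in pixels]


# Made-up predicted pixel values after strong guidance; several exceed [-1, 1].
predicted = [-3.0, -0.5, 0.2, 0.8, 2.5]
print(static_threshold(predicted))
print(dynamic_threshold(predicted))
```

With static clipping, both out-of-range values collapse onto the boundary and lose their relative differences. Dynamic thresholding rescales the whole set, so fewer pixels end up saturated.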
DALL-E 2 vs Imagen performance
Initial results from Google's benchmark show that human respondents prefer images generated by Imagen over those from DALL-E 2 and other text-to-image models such as Latent Diffusion and VQGAN+CLIP.
Output from the Imagen team also shows that their model does better at spelling text, a known weakness of the DALL-E 2 model.
However, since Google has not yet released the model to the public, it remains to be seen how accurate Google's benchmarks are.
Conclusion
The rise of photorealistic text-to-image models is controversial because these models are ripe for misuse.
The technology could be used to create explicit content or serve as a tool for disinformation. Researchers from Google and OpenAI are aware of this, which is part of the reason these technologies are not yet available to everyone.
Text-to-image models also have significant economic implications. Will professions such as models, photographers, and artists be affected if models like DALL-E become mainstream?
For now, these models still have limitations. Holding any AI-generated image up to scrutiny reveals its imperfections. With both OpenAI and Google competing to build more capable models, it may only be a matter of time before we see a truly flawless output: an image indistinguishable from a real one.
What do you think will happen when the technology goes that far?