In recent years, deep learning models have become remarkably good at understanding human language.
Consider services like GPT-3, which can now generate entire essays and even websites. GitHub recently released GitHub Copilot, a service that suggests code snippets based on a description of the code you want.
Researchers at OpenAI, Facebook, and Google have been applying deep learning to another task: image captioning. By training on large datasets containing millions of captions, they have achieved some impressive results.
More recently, these researchers have tried the reverse task: generating images from captions. Is it now possible to create a brand-new image from nothing but its description?
This guide looks at two of the most advanced text-to-image models: OpenAI's DALL-E 2 and Google's Imagen AI. Both projects introduce cutting-edge techniques that could change society as we know it.
But first, let's clarify what we mean by text-to-image generation.
What Is Text-to-Image Generation?
Text-to-image models let computers create new and unique images from a prompt. A person provides a text description of the image they want, and the model tries to generate an image that matches that description as closely as possible.
These machine learning models rely on large datasets of captioned images to improve their performance.
Most text-to-image models use a transformer language model to interpret the prompt. A transformer is a type of neural network that learns the context and semantic meaning of natural language.
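To make "learning context" a little more concrete: the central operation inside a transformer is scaled dot-product attention, in which every token's query vector is compared with every token's key vector, and the resulting weights mix the value vectors. The minimal sketch below implements that formula, softmax(Q Kᵀ / √d) V, in plain Python. The toy token vectors and function names are illustrative assumptions, not taken from any of the models discussed here.

```python
import math
from typing import List

Vector = List[float]

def softmax(xs: Vector) -> Vector:
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries: List[Vector], keys: List[Vector], values: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V, one query at a time."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # How strongly this query matches every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # The output is the attention-weighted average of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))])
    return outputs

# Hypothetical toy token vectors; in self-attention the same vectors act as Q, K and V.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(attention(tokens, tokens, tokens))
```

A real transformer first multiplies each token vector by learned projection matrices to obtain separate queries, keys, and values, and runs many attention heads in parallel; those parts are omitted to keep the core formula visible.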
A generative model, such as a diffusion model or a generative adversarial network (GAN), then produces the image itself.
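A diffusion model is trained by gradually corrupting clean images with Gaussian noise and learning to undo that corruption. The sketch below shows only the forward (noising) side, using the standard closed form x_t = √ᾱ_t · x_0 + √(1 − ᾱ_t) · ε and the linear noise schedule from the original DDPM paper. Treating an "image" as a flat list of pixel values, and the function names used here, are assumptions made to keep the example dependency-free.

```python
import math
import random
from typing import List

def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    # Linearly spaced per-step noise variances beta_t.
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def alpha_bars(betas: List[float]) -> List[float]:
    # Cumulative product of (1 - beta_t): how much of the clean signal survives at step t.
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def add_noise(x0: List[float], t: int, abar: List[float], rng: random.Random) -> List[float]:
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I)."""
    a = abar[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]

rng = random.Random(0)
abar = alpha_bars(linear_beta_schedule(1000))
x0 = [0.5, -0.5, 1.0, -1.0]  # toy "image" with pixels scaled to [-1, 1]
print(add_noise(x0, 10, abar, rng))   # early step: still close to x0
print(add_noise(x0, 999, abar, rng))  # last step: almost pure noise
```

The generative network is then trained to predict the added noise ε from x_t and t, which is what allows it to run the process in reverse, from noise back to an image.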
What Is DALL-E 2?
DALL-E 2 is an AI model developed by OpenAI and released in April 2022. It was trained on millions of captioned images so that it learns to associate words and phrases with visual content.
Users can type a simple phrase, such as "a cat eating lasagna," and DALL-E 2 will generate its own interpretation of what the phrase describes.
Besides creating images from scratch, DALL-E 2 can also edit existing images. In the example below, DALL-E produced an edited version of a room with an extra sofa added.
DALL-E 2 is one of several related models OpenAI has released over the past few years. OpenAI's GPT-3 became well known for its apparent ability to generate many different kinds of text.
At the time of writing, DALL-E 2 is still in beta testing. Interested users can join the waitlist and wait for access.
How Does It Work?
DALL-E 2's results are impressive, but you may be wondering how it actually works.
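DALL-E 2 is the successor to the original DALL-E, which OpenAI described as a multimodal version of GPT-3 trained to generate images from text. DALL-E 2 itself takes a different, multi-stage approach.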
First, the user's prompt is passed through a text encoder that maps it into a representation space. For this step, DALL-E 2 uses another OpenAI model, CLIP (Contrastive Language-Image Pre-Training), to extract semantic information from natural language.
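CLIP learns its shared text-image space with a contrastive objective: within a batch of matching caption/image pairs, each caption's embedding should be most similar to its own image's embedding and dissimilar to the others, and vice versa. The sketch below computes that symmetric loss on toy 2-D embeddings in plain Python. It illustrates how CLIP was trained, not what DALL-E 2 does at generation time (where the already-trained CLIP text encoder simply embeds the prompt). The fixed temperature of 0.07 is an assumption here, since CLIP learns this value during training.

```python
import math
from typing import List

Vector = List[float]

def normalize(v: Vector) -> Vector:
    # Scale to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy(logits: Vector, target: int) -> float:
    # -log softmax(logits)[target], computed with the log-sum-exp trick.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[target]

def clip_loss(text_embs: List[Vector], image_embs: List[Vector], temperature: float = 0.07) -> float:
    """Symmetric contrastive loss: caption i should match image i and no other image."""
    t = [normalize(v) for v in text_embs]
    im = [normalize(v) for v in image_embs]
    n = len(t)
    # logits[i][j] = cosine similarity of caption i and image j, divided by the temperature.
    logits = [[sum(a * b for a, b in zip(t[i], im[j])) / temperature for j in range(n)] for i in range(n)]
    text_to_image = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    image_to_text = sum(cross_entropy(columns[j], j) for j in range(n)) / n
    return (text_to_image + image_to_text) / 2

captions = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical caption embeddings
matched = [[1.0, 0.0], [0.0, 1.0]]   # image i points the same way as caption i
swapped = [[0.0, 1.0], [1.0, 0.0]]   # images paired with the wrong captions
print(clip_loss(captions, matched), clip_loss(captions, swapped))
```

Minimizing this loss pulls matching captions and images together in the shared space, which is why a CLIP text embedding carries the kind of visual meaning the later stages need.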
Next, a model called the prior maps the text encoding to an image encoding. This image encoding should capture the semantic information extracted in the text-encoding step.
To produce the final image, DALL-E 2 uses an image decoder that generates visuals from the semantic information and details contained in the image encoding. OpenAI uses a modified version of its GLIDE model for this step. GLIDE relies on a diffusion model to generate images.
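Putting the stages together, the data flow can be sketched as a chain of functions: prompt → CLIP text embedding → prior → CLIP image embedding → decoder → image. Every model in the sketch below is a hypothetical placeholder (`clip_text_encoder`, `prior`, and `decoder` are names invented for this illustration and only shuffle random numbers around); just the order of the stages and the kind of data passed between them reflect the description above.

```python
import random
from typing import List

Vector = List[float]
Image = List[List[float]]

def clip_text_encoder(prompt: str, dim: int = 8) -> Vector:
    # Hypothetical stand-in for CLIP's text encoder: a deterministic pseudo-embedding seeded by the prompt.
    rng = random.Random(prompt)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]

def prior(text_emb: Vector, rng: random.Random) -> Vector:
    # Hypothetical stand-in for the prior, which maps a text embedding to an image embedding.
    # The real prior is a trained generative model; here the input is only perturbed slightly.
    return [x + rng.gauss(0.0, 0.1) for x in text_emb]

def decoder(image_emb: Vector, size: int, rng: random.Random) -> Image:
    # Hypothetical stand-in for the GLIDE-based diffusion decoder: a size x size grayscale grid
    # whose pixels depend on the embedding plus random noise, clamped to [-1, 1].
    mean = sum(image_emb) / len(image_emb)
    return [[max(-1.0, min(1.0, mean + rng.gauss(0.0, 0.5))) for _ in range(size)] for _ in range(size)]

def generate(prompt: str, seed: int = 0, size: int = 64) -> Image:
    """Prompt -> text embedding -> prior -> image embedding -> decoder -> image."""
    rng = random.Random(seed)
    text_emb = clip_text_encoder(prompt)
    image_emb = prior(text_emb, rng)
    return decoder(image_emb, size, rng)

img = generate("a cat eating lasagna")
print(len(img), len(img[0]))
```

Keeping the stages as separate functions mirrors the design described above: the text encoder, the prior, and the decoder are distinct models that can be trained and swapped independently.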
Adding GLIDE to DALL-E 2 improved the quality of the generated images. Because GLIDE's sampling is stochastic (random by default), DALL-E 2 can easily produce variations of an image simply by running the model again.
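The randomness described above can be illustrated with a toy version of DDPM ancestral sampling, the basic sampling procedure for diffusion models: start from pure Gaussian noise, repeatedly remove the predicted noise, and inject a little fresh noise at each step. In the sketch below, the trained network is replaced by the exact noise predictor for a made-up dataset in which every pixel is either -1 or +1. That toy denoiser, the 16-pixel "image", and the linear noise schedule are assumptions for illustration, not GLIDE's actual model or settings. Because both the starting noise and the injected noise depend on the seed, each run lands on a different result from the same model.

```python
import math
import random
from typing import List, Tuple

def make_schedule(num_steps: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> Tuple[List[float], List[float]]:
    # Linear beta schedule and its cumulative products alpha_bar_t.
    betas = [beta_start + (beta_end - beta_start) * i / (num_steps - 1) for i in range(num_steps)]
    alpha_bars, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        alpha_bars.append(prod)
    return betas, alpha_bars

def toy_denoiser(x_t: float, alpha_bar: float) -> float:
    """Exact (minimum mean squared error) noise prediction for toy data where each pixel is -1 or +1.

    Hypothetical stand-in for a trained network such as GLIDE's; used only for illustration.
    """
    x0_hat = math.tanh(math.sqrt(alpha_bar) * x_t / (1.0 - alpha_bar))  # E[x_0 | x_t]
    return (x_t - math.sqrt(alpha_bar) * x0_hat) / math.sqrt(1.0 - alpha_bar)

def sample(seed: int, num_pixels: int = 16, num_steps: int = 1000) -> List[float]:
    """DDPM ancestral sampling: start from pure noise and denoise step by step."""
    rng = random.Random(seed)
    betas, alpha_bars = make_schedule(num_steps)
    x = [rng.gauss(0.0, 1.0) for _ in range(num_pixels)]
    for t in reversed(range(num_steps)):
        beta, abar = betas[t], alpha_bars[t]
        new_x = []
        for xi in x:
            eps = toy_denoiser(xi, abar)
            mean = (xi - beta / math.sqrt(1.0 - abar) * eps) / math.sqrt(1.0 - beta)
            # Fresh noise at every step except the last is what makes sampling stochastic.
            noise = math.sqrt(beta) * rng.gauss(0.0, 1.0) if t > 0 else 0.0
            new_x.append(mean + noise)
        x = new_x
    return x

print([round(v, 2) for v in sample(seed=0)])
print([round(v, 2) for v in sample(seed=1)])  # same model, different seed, different "image"
```

Every pixel ends up near -1 or +1, as the toy data dictates, but which pattern appears depends on the seed. This is the same mechanism that lets a diffusion decoder return a different variation each time it is run.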
Limitations
Despite its impressive results, DALL-E 2 still has limitations.
Spelling
Prompts that ask DALL-E 2 to render text show that it struggles with spelling. Experts suspect this may be because spelling information is not captured in its training dataset.
Compositional Reasoning
Researchers have observed that DALL-E 2 still struggles with compositional reasoning. In short, the model can understand the individual components of an image but has trouble working out how those components relate to each other.
For example, given the prompt "a red cube on top of a blue cube," DALL-E correctly generates a red cube and a blue cube but fails to arrange them as described. The model has also been observed to struggle with prompts that ask for a specific number of objects.
Dataset Bias
When the prompt does not specify otherwise, DALL-E has been observed to depict white people and Western settings by default. This representational bias stems from the large share of Western images in its dataset.
The model has also been observed to reproduce gender stereotypes. For example, the prompt "flight attendant" mostly produces images of female flight attendants.
What Is Google Imagen AI?
Google Imagen AI is a model that aims to generate photorealistic images from input text. Like DALL-E, it uses a transformer language model to understand the text and relies on diffusion models to generate high-quality images.
Alongside Imagen, Google also released DrawBench, a benchmark for text-to-image models. Using DrawBench, the team found that human raters preferred Imagen's output over that of other models, including DALL-E 2.
How Does It Work?
Like DALL-E, Imagen first converts the user's prompt into a text embedding, using a frozen text encoder (a pretrained T5-XXL language model).
Imagen then uses a diffusion model that learns to transform random noise into images. These initial images are low-resolution, so they are passed through further models, called super-resolution diffusion models, that increase the resolution of the final image. The base diffusion model produces a 64 × 64 pixel image, which two super-resolution stages then upsample to 256 × 256 and finally to a high-resolution 1024 × 1024 image.
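The cascade can be pictured as a sequence of upsampling stages. The sketch below tracks the side length through Imagen's stages (64 → 256 → 1024, a factor of 4 each time) and uses plain nearest-neighbour upsampling as a hypothetical stand-in for each super-resolution diffusion model. A real super-resolution stage does much more: it is itself a diffusion model, conditioned on the text and on the lower-resolution image, that synthesizes new detail rather than just repeating pixels.

```python
from typing import List, Sequence

Image = List[List[float]]

def cascade_resolutions(base: int, factors: Sequence[int]) -> List[int]:
    # Side length of the image after each stage of the cascade.
    sizes = [base]
    for f in factors:
        sizes.append(sizes[-1] * f)
    return sizes

def upsample_nearest(image: Image, factor: int) -> Image:
    """Hypothetical placeholder for a super-resolution model: repeat each pixel factor x factor times."""
    return [
        [row[c // factor] for c in range(len(row) * factor)]
        for row in image
        for _ in range(factor)
    ]

def cascade(base_image: Image, factors: Sequence[int]) -> Image:
    # Pass the base sample through each upsampling stage in turn.
    image = base_image
    for f in factors:
        image = upsample_nearest(image, f)
    return image

print(cascade_resolutions(64, (4, 4)))  # [64, 256, 1024]
```

Splitting generation this way lets the expensive base model work at only 64 × 64, while the later stages focus on adding fine detail.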
According to the Imagen team's research, large frozen language models trained only on text are highly effective text encoders for text-to-image generation.
The research also introduces a technique called dynamic thresholding. It makes it possible to use large classifier-free guidance weights during image generation without the pixels saturating, which yields more photorealistic images.
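The Imagen paper describes dynamic thresholding roughly as follows: at each sampling step, take s to be a chosen percentile of the absolute pixel values in the predicted clean image; if s > 1, clip the prediction to [-s, s] and divide by s, pulling saturated pixels back into range. Classifier-free guidance, the source of the large guidance weights, combines a text-conditioned and an unconditioned noise prediction as ε_uncond + w · (ε_cond − ε_uncond). The sketch below implements both on flat lists of pixel values. The 99.5th-percentile default is an assumption (the percentile is a tunable hyperparameter), and the function names are illustrative.

```python
import math
from typing import List

Vector = List[float]

def classifier_free_guidance(eps_cond: Vector, eps_uncond: Vector, w: float) -> Vector:
    """Push the prediction away from the unconditioned one: eps_uncond + w * (eps_cond - eps_uncond)."""
    return [u + w * (c - u) for c, u in zip(eps_cond, eps_uncond)]

def percentile(values: Vector, p: float) -> float:
    # Linear interpolation between the closest ranks (the same rule as NumPy's default).
    xs = sorted(values)
    k = (len(xs) - 1) * p / 100.0
    lo, hi = math.floor(k), math.ceil(k)
    return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)

def dynamic_threshold(x0_pred: Vector, p: float = 99.5) -> Vector:
    """Dynamic thresholding of a predicted clean image whose pixels should lie in [-1, 1]."""
    s = percentile([abs(v) for v in x0_pred], p)
    # s is floored at 1, so when pixels are already in range this reduces to clipping to [-1, 1].
    s = max(s, 1.0)
    return [min(max(v, -s), s) / s for v in x0_pred]

print(classifier_free_guidance([0.9, -1.2, 0.3], [0.5, -0.4, 0.1], w=7.5))
print(dynamic_threshold([1.8, -2.4, 0.7, 0.2]))  # hypothetical over-saturated prediction
```

Plain clipping to [-1, 1] would flatten every overshooting pixel to the same extreme value; dividing by s instead rescales the whole image, which preserves relative contrast when a large guidance weight pushes values out of range.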
DALL-E 2 vs. Imagen Performance
Early results from Google's benchmark show that surveyed participants preferred images generated by Imagen over those from DALL-E 2 and other text-to-image models such as Latent Diffusion and VQGAN+CLIP.
Sample outputs published by the Imagen team also show that their model is better at rendering spelled-out text, a known weakness of DALL-E 2.
However, since Google has not released the model to the public, it remains to be seen how accurate Google's benchmarks are.
Conclusion
The rise of text-to-image models is controversial because these models are ripe for misuse.
The technology could be used to generate explicit content or as a tool for spreading disinformation. Researchers at Google and OpenAI are aware of this, which is why these technologies are not yet available to everyone.
Text-to-image models also raise economic questions. Will models, artists, and photographers see their work affected if tools like DALL-E become widespread?
For now, these models still have limitations. Close inspection of any AI-generated image will reveal its flaws. But with OpenAI and Google competing to build ever more capable models, it may only be a matter of time before flawless output is produced: an image indistinguishable from a real one.
What do you think will happen when the technology reaches that point?