In recent years, deep learning models have become much better at understanding human language.
Think of projects like GPT-3, which can now generate entire articles and websites. GitHub recently launched GitHub Copilot, a service that supplies whole code snippets from nothing more than a description of the code you need.
Researchers at OpenAI, Facebook, and Google have been working on ways to use deep learning for another task: captioning images. Using large datasets with millions of entries, they have produced some surprising results.
More recently, these researchers have attempted the reverse task: generating images from captions. Is it now possible to create an entirely new image from nothing but a description?
This guide will explore two of the most advanced text-to-image models: OpenAI's DALL-E 2 and Google's Imagen AI. Each of these projects has introduced groundbreaking methods that may change society as we know it.
But first, let's understand what we mean by text-to-image generation.
What is text-to-image generation?
Text-to-image models allow computers to create new and unique images based on prompts. People can now provide a text description of the image they want to produce, and the model will try to create a visual that matches that description as closely as possible.
To improve performance, these machine learning models rely heavily on large datasets of paired images and captions.
Most text-to-image models use a transformer language model to interpret prompts. This type of model is a neural network that tries to learn the context and semantic meaning of natural language.
Next, generative models such as diffusion models and generative adversarial networks are used for image synthesis.
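To give a feel for the diffusion family of generative models, here is a minimal sketch of the standard forward noising step on a handful of pixel values. It is a toy under stated assumptions: the linear noise schedule, the step count, and all function names are illustrative choices, not details taken from DALL-E 2 or Imagen. A real model trains a neural network to predict the noise; here the true noise is reused to show that knowing it is enough to recover the original pixels.

```python
import math
import random

def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for an assumed linear beta schedule."""
    alpha_bar, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar

def add_noise(x0, alpha_bar_t, rng):
    """Forward process: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a, b = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    x_t = [a * x + b * e for x, e in zip(x0, eps)]
    return x_t, eps

def recover_x0(x_t, eps, alpha_bar_t):
    """Invert the forward step given the noise (a trained network would predict eps)."""
    a, b = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [(x - b * e) / a for x, e in zip(x_t, eps)]

rng = random.Random(0)
pixels = [0.2, -0.5, 0.9, 0.0]          # toy "image" with values in [-1, 1]
alpha_bar = linear_alpha_bar(1000)
noisy, eps = add_noise(pixels, alpha_bar[500], rng)
restored = recover_x0(noisy, eps, alpha_bar[500])
```

By the last step of the schedule almost none of the original signal remains, which is why sampling can start from pure noise and work backwards.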
What is DALL-E 2?
DALL-E 2 is a computer model from OpenAI that was released in April 2022. The model was trained on a database of millions of labeled images so that it could associate words and phrases with images.
Users can type a simple phrase, such as “a cat eating lasagna”, and DALL-E 2 will generate its own interpretation of what the phrase is trying to describe.
Aside from creating images from scratch, DALL-E 2 can also edit existing images. In one example, DALL-E 2 produced an edited image of a room with a sofa added.
DALL-E 2 is just one of many similar projects OpenAI has released in the past few years. OpenAI's GPT-3 made the news when it appeared to generate text in a variety of styles.
For now, DALL-E 2 is in beta testing. Interested users can sign up for the waitlist and wait for access.
How does it work?
While DALL-E 2's results are impressive, you may wonder how it all works.
DALL-E 2 is the successor to the original DALL-E, which was a multimodal implementation of OpenAI's GPT-3 project.
First, the user's text prompt is fed into a text encoder that maps the prompt to a representation space. DALL-E 2 uses another OpenAI model called CLIP (Contrastive Language-Image Pre-Training) to extract semantic information from natural language.
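The idea of a shared representation space can be illustrated with a small sketch: if captions and images are both mapped to vectors, a matching pair should point in a similar direction. The three-dimensional vectors, the dictionary names, and `best_caption` below are hypothetical and hand-written for illustration; they are not produced by CLIP and do not reflect its real API.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical embeddings; a real encoder would compute these from data.
text_embeddings = {
    "a cat eating lasagna": [0.9, 0.1, 0.2],
    "a red cube on a blue cube": [0.1, 0.8, 0.3],
}
image_embeddings = {
    "cat_photo": [0.8, 0.2, 0.1],
    "cubes_render": [0.0, 0.9, 0.4],
}

def best_caption(image_name):
    """Return the caption whose embedding is most similar to the image's."""
    image_vec = image_embeddings[image_name]
    return max(
        text_embeddings,
        key=lambda caption: cosine_similarity(text_embeddings[caption], image_vec),
    )
```

Contrastive training pushes matching caption and image vectors together and mismatched ones apart, so a lookup like `best_caption("cat_photo")` picks the cat caption.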
Next, a model known as the prior maps the text encoding to an image encoding. This image encoding should capture the semantic information found in the text encoding step.
To create the actual image, DALL-E 2 uses an image decoder that generates a visual from the semantic information in the image encoding. OpenAI uses a modified version of its GLIDE model to perform the image generation. GLIDE relies on a diffusion model to create images.
Adding GLIDE to the DALL-E 2 model enabled more photorealistic output. Because the GLIDE model is stochastic, or randomly determined, DALL-E 2 can create variations simply by running the model repeatedly.
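The three stages described above (text encoder, prior, stochastic decoder) can be sketched as a toy pipeline. Every function here is a hypothetical stand-in that only mimics the data flow: the real CLIP encoder, prior, and GLIDE-based decoder are large neural networks, and none of these names or formulas come from OpenAI's code.

```python
import random

EMBED_DIM = 4  # illustrative size; real embeddings are far larger

def encode_text(prompt):
    """Stand-in for the text encoder: deterministic prompt -> vector."""
    rng = random.Random(prompt)  # seeding with a string is reproducible
    return [rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]

def prior(text_embedding):
    """Stand-in for the prior: maps a text embedding to an image embedding."""
    return [0.5 * x + 0.1 for x in text_embedding]

def decode_image(image_embedding, seed, rows=4):
    """Stand-in for the stochastic decoder: same embedding, fresh noise per seed."""
    rng = random.Random(seed)
    return [[e + rng.gauss(0.0, 0.1) for e in image_embedding] for _ in range(rows)]

def generate(prompt, seed):
    """Run the full text -> text embedding -> image embedding -> image chain."""
    return decode_image(prior(encode_text(prompt)), seed)

variation_a = generate("a cat eating lasagna", seed=1)
variation_b = generate("a cat eating lasagna", seed=2)
repeat_a = generate("a cat eating lasagna", seed=1)
```

The same prompt always yields the same image embedding, so the differences between `variation_a` and `variation_b` come entirely from the decoder's randomness, which is the mechanism behind generating variations.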
Limitations
Despite the impressive results of the DALL-E 2 model, it still faces some limitations.
Spelling Text
Prompts that try to make DALL-E 2 generate text reveal that it has difficulty spelling words. Experts suspect this may be because spelling information is not part of the training dataset.
Compositional Reasoning
Researchers have observed that DALL-E 2 still has some difficulty with compositional reasoning. Simply put, the model can understand individual aspects of an image while still struggling to work out the relationships between those aspects.
For example, given the prompt “a red cube on top of a blue cube”, DALL-E 2 will accurately generate a blue cube and a red cube but fail to place them correctly. The model has also been observed to struggle with prompts that require a specific number of objects to be produced.
Bias in the Dataset
If the prompt contains no further details, DALL-E 2 has been observed to depict white or Western people and environments. This representational bias arises from the abundance of Western-centric images in the dataset.
The model has also been observed to follow gender stereotypes. For example, typing the prompt “flight attendant” mostly generates images of female flight attendants.
What is Google Imagen AI?
Google's Imagen AI is a model that aims to create photorealistic images from input text. Similar to DALL-E 2, the model uses transformer language models to understand text and relies on diffusion models to create high-quality images.
Alongside Imagen, Google also released a benchmark for text-to-image models called DrawBench. Using DrawBench, they observed that human raters preferred Imagen's output over that of other models, including DALL-E 2.
How does it work?
Similar to DALL-E 2, Imagen first converts the user's prompt into a text embedding using a frozen text encoder.
Imagen uses a diffusion model that learns to turn a pattern of noise into images. The initial output is low-resolution and is later passed to another model, known as a super-resolution diffusion model, to increase the resolution of the final image. The first diffusion model outputs a 64×64 pixel image, which is later upsampled into a high-resolution 1024×1024 image.
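The jump from 64×64 to 1024×1024 is easier to picture as a cascade. The sketch below assumes two 4× stages (64 to 256 to 1024), which is how the Imagen paper describes the cascade; the article itself only gives the endpoints. Nearest-neighbour pixel repetition is used purely as a placeholder for the learned super-resolution diffusion models, and the function names are illustrative.

```python
def upsample_nearest(image, factor):
    """Repeat every pixel `factor` times along both axes (placeholder upsampler)."""
    out = []
    for row in image:
        wide = [pixel for pixel in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out

def run_cascade(base_image, factors):
    """Apply each upsampling stage in turn and record the side length after each."""
    image = base_image
    sizes = [len(image)]
    for factor in factors:
        image = upsample_nearest(image, factor)
        sizes.append(len(image))
    return image, sizes

# Toy 64x64 checkerboard standing in for the base model's output.
base = [[(r + c) % 2 for c in range(64)] for r in range(64)]
final, sizes = run_cascade(base, factors=(4, 4))
```

Here `sizes` records the side length after each stage. Splitting the work this way lets the base model focus on overall composition while later stages only need to add detail.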
Based on the Imagen team's research, large frozen language models trained only on text data are still highly effective text encoders for text-to-image generation.
The research also introduces the concept of dynamic thresholding. This method lets the sampler use larger guidance weights when generating an image, which makes the results look more photorealistic.
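As a rough illustration of dynamic thresholding, the sketch below follows the description in the Imagen paper: at each sampling step, take a chosen percentile `s` of the absolute predicted pixel values and, if `s` exceeds 1, clip the prediction to [-s, s] and divide by `s`. The percentile value, the sample data, and the function names are assumptions made for this example, and static clipping to [-1, 1] is included only for comparison.

```python
def percentile_abs(values, p):
    """p-th percentile (0-100) of the absolute values, with linear interpolation."""
    ordered = sorted(abs(v) for v in values)
    rank = (len(ordered) - 1) * p / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)

def static_threshold(x0_pred):
    """Baseline: clip every predicted pixel to [-1, 1]."""
    return [max(-1.0, min(1.0, v)) for v in x0_pred]

def dynamic_threshold(x0_pred, p):
    """Clip to [-s, s] with s taken from the data, then rescale into [-1, 1]."""
    s = max(1.0, percentile_abs(x0_pred, p))
    return [max(-s, min(s, v)) / s for v in x0_pred]

# Toy prediction with values pushed outside [-1, 1], as high guidance tends to do.
x0_pred = [-3.0, -1.5, -0.2, 0.0, 0.4, 1.2, 2.5]
static = static_threshold(x0_pred)
dynamic = dynamic_threshold(x0_pred, p=90.0)

saturated_static = sum(1 for v in static if abs(v) == 1.0)
saturated_dynamic = sum(1 for v in dynamic if abs(v) == 1.0)
```

With static clipping, every out-of-range value collapses to exactly -1 or 1, which is the saturation that dynamic thresholding is meant to reduce; rescaling by `s` keeps more of the relative differences between pixels.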
DALL-E 2 vs Imagen Performance
Initial results from Google's benchmark show that respondents prefer Imagen-generated images over those from DALL-E 2 and other text-to-image models such as Latent Diffusion and VQGAN+CLIP.
Output from the Imagen team has also shown that their model performs better at spelling text, a known weakness of the DALL-E 2 model.
However, since Google has not yet released the model to the public, it remains to be seen how accurate Google's benchmarks are.
Conclusion
The rise of photorealistic text-to-image models is controversial because these models are ripe for unethical use.
The technology could be used to create explicit content or serve as a tool for disinformation. Researchers from both Google and OpenAI are aware of this, which is part of why these technologies are not yet accessible to everyone.
Text-to-image models also have significant economic implications. Will professions such as modeling, photography, and art be affected if models like DALL-E 2 become mainstream?
For now, these models still have limitations. Holding any AI-generated image up to scrutiny will reveal its imperfections. With both OpenAI and Google competing for the most effective models, it may be only a matter of time before a truly perfect output is generated: an image indistinguishable from the real thing.
What do you think will happen when the technology gets that far?