Have you ever imagined talking with an AI that understands both what is said and what is shown? The MultiModal-GPT paradigm combines language understanding with visual understanding.
It enables more accurate and more varied human-computer interaction. MultiModal-GPT can write descriptive captions, count individual objects, and answer general questions.
But how does it do that? And what can you do with MultiModal-GPT?
Let's start from the beginning and explore the possibilities ahead of us.
With the arrival of language models such as GPT-4, natural language processing technology is undergoing a major shift. Innovations such as ChatGPT have already become part of our daily lives.
And it looks like there is more to come!
GPT-4 and Its Limitations
GPT-4 has shown remarkable skill in multimodal conversations with humans. Researchers have tried to reproduce this capability, but because of the large number of image tokens, building models that incorporate fine-grained visual information can be very computationally expensive.
Existing models also do not include language-only instruction data in their training, which keeps them from taking part in zero-shot, multi-turn conversations about images.
Building on the Flamingo Framework
A new approach called MultiModal-GPT was developed to communicate with people using both language and visuals.
To make this possible, the developers built on the Flamingo framework, which had already been pretrained to understand text and visuals.
Flamingo needed adapting, however, because it could not hold extended conversations involving both text and images.
The adapted MultiModal-GPT model can extract information from images and combine it with language to understand and carry out human instructions.
MultiModal-GPT
MultiModal-GPT is an AI model that can follow a variety of human requests, such as describing visuals, counting objects, and answering questions. It understands and follows instructions using a combination of visual and textual data.
The researchers trained the model on both visual data and language-only data to improve its ability to converse with people. This joint training produced a noticeable improvement in its conversational performance.
They found that high-quality training data is essential for good conversations, because a small dataset with short answers can lead the model to give terse responses to any instruction.
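As a rough illustration of that point about short answers, the sketch below filters out training samples whose responses are very short. The field names, the helper filter_short_responses, and the five-word threshold are all assumptions made for this example, not details taken from the MultiModal-GPT training code.

```python
def filter_short_responses(samples, min_words=5):
    """Keep only samples whose response has at least `min_words` words.

    `min_words` is a hypothetical threshold chosen for illustration.
    """
    return [s for s in samples if len(s["response"].split()) >= min_words]


# Two made-up samples: one terse answer, one fuller description.
samples = [
    {"instruction": "What color is the car?", "response": "Red."},
    {
        "instruction": "Describe the image.",
        "response": "A red car is parked beside a quiet street lined with trees.",
    },
]

kept = filter_short_responses(samples)
print(len(kept))  # number of samples that survive the filter
```

A filter like this trades dataset size for response quality; whether that trade is worthwhile depends on how much data remains afterwards.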
What Can You Do with MultiModal-GPT?
Conversation
Like the language models that came before it, one of MultiModal-GPT's core capabilities is holding natural-language conversations. This means users can interact with the model much as they would with a real person.
For example, MultiModal-GPT can give users a detailed recipe for making noodles or recommend restaurants where they might eat. The model can also answer general questions about travel plans.
Object Recognition
MultiModal-GPT can recognize objects in images and answer questions about them. For example, the model can recognize Freddie Mercury in a picture and answer questions about him.
It can also count the people in an image and describe what they are doing. This recognition capability is applicable in many fields, including e-commerce, healthcare, and security.
MultiModal-GPT can also recognize text within digital images. This means the model can read what is written in an image and extract the relevant information. For example, it can read the characters in an image and identify the author of a book.
This makes it a very useful tool for document processing, data entry, and content analysis.
Reasoning and Knowledge Generation
MultiModal-GPT can reason and generate knowledge about the world. This means it can describe an image in full and also tell users the season in which the picture was taken.
This ability is useful in many fields, including environmental monitoring, agriculture, and meteorology. The model can also generate creative content such as poems, stories, and songs, which makes it a good tool for creative work.
The Inner Workings of MultiModal-GPT
Template for Unified Instructions
The team proposes a single template that unifies unimodal language data and multimodal vision-and-language data, so that the MultiModal-GPT model can be trained on both in a synergistic way.
This unified approach aims to improve the model's performance across a range of tasks by drawing on the complementary strengths of both kinds of data and encouraging a deeper understanding of the underlying concepts.
The team uses the Dolly 15k and Alpaca GPT4 datasets for language-only instruction following. A prompt template structures each dataset input so that every example follows a consistent instruction-following format.
Image: Overview of the Dolly 15k dataset
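To make the idea of a unified template more concrete, here is a minimal sketch of how one function could format both language-only and vision-and-language samples. The preamble wording, the section markers, the "<image>" placeholder, and the build_prompt helper are assumptions for illustration and may differ from the template the team actually uses.

```python
# Assumed preamble and markers, in the style of common instruction-tuning templates.
PREAMBLE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)
IMAGE_PLACEHOLDER = "<image>"  # hypothetical token marking where image features go


def build_prompt(instruction, response="", has_image=False, context=""):
    """Format one sample; the image section is added only for multimodal data."""
    parts = [PREAMBLE]
    if has_image:
        parts.append(f"### Image:\n{IMAGE_PLACEHOLDER}")
    parts.append(f"### Instruction:\n{instruction}")
    if context:
        parts.append(f"### Input:\n{context}")
    parts.append(f"### Response:\n{response}")
    return "\n".join(parts)


# A language-only sample and a vision-and-language sample share one format.
text_only = build_prompt("Name three primary colors.", "Red, yellow, and blue.")
with_image = build_prompt("How many people are in the picture?", has_image=True)
print(with_image)
```

Because both kinds of data pass through the same function, the model sees one consistent layout whether or not an image is present.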
How Does the Model Work?
Three main components make up the MultiModal-GPT model: a vision encoder, a perceiver resampler, and a language decoder. The vision encoder takes in the image and produces a set of features that characterize it.
With the help of the perceiver resampler, the language decoder uses the information from the vision encoder to generate text describing the image.
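The role of the perceiver resampler can be sketched as a single cross-attention step: a fixed set of latent queries attends over however many image features the vision encoder produces, so the language decoder always receives the same number of visual tokens. This is a simplified toy in plain Python with random vectors; a real resampler uses learned projections and several layers, and every name and size below is an assumption.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def resample(image_features, latents):
    """Each latent query takes an attention-weighted average of the image features.

    The output has one vector per latent, regardless of how many
    image features were passed in.
    """
    dim = len(latents[0])
    outputs = []
    for query in latents:
        weights = softmax(
            [dot(query, feat) / math.sqrt(dim) for feat in image_features]
        )
        outputs.append(
            [sum(w * feat[i] for w, feat in zip(weights, image_features))
             for i in range(dim)]
        )
    return outputs


random.seed(0)
DIM, NUM_LATENTS = 8, 4  # toy sizes, chosen only for the example
latents = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_LATENTS)]
small_image = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(16)]
large_image = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(64)]

tokens_small = resample(small_image, latents)
tokens_large = resample(large_image, latents)
print(len(tokens_small), len(tokens_large))  # prints "4 4"
```

Both inputs yield four visual tokens, which is what lets the decoder handle images of different sizes with a fixed budget of visual inputs.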
The language decoder is the part of the model that understands language and produces text. The model is trained to predict the next word in a sequence, using both language-only and vision-and-language instruction data.
This teaches the model how to respond to human instructions and lets it produce appropriate text to describe images.
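Next-word prediction on instruction data can be sketched as follows: the prompt and response token IDs are concatenated, and each position is asked to predict the token that follows it. In this sketch only response tokens count as training targets, which is a common convention in instruction tuning and an assumption here; the token IDs and helper names are made up for illustration.

```python
IGNORE_INDEX = -100  # marker for positions excluded from the loss (assumed convention)


def build_labels(prompt_ids, response_ids):
    """Concatenate prompt and response; mask the prompt part of the labels."""
    input_ids = list(prompt_ids) + list(response_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels


def next_token_pairs(input_ids, labels):
    """List (context, target) pairs: the tokens up to position t predict token t + 1."""
    pairs = []
    for t in range(len(input_ids) - 1):
        target = labels[t + 1]
        if target != IGNORE_INDEX:
            pairs.append((input_ids[: t + 1], target))
    return pairs


prompt_ids = [101, 7, 8, 9]   # made-up IDs standing in for the formatted instruction
response_ids = [21, 22, 2]    # made-up IDs for the response plus an end-of-sequence token

input_ids, labels = build_labels(prompt_ids, response_ids)
for context, target in next_token_pairs(input_ids, labels):
    print(context, "->", target)
```

Each printed pair is one prediction the model is trained on: the full prompt predicts the first response token, and every later response token is predicted from everything before it.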
The Team Behind It
MultiModal-GPT was created by a group of researchers and engineers from Shanghai AI Laboratory, The University of Hong Kong, and Tianjin University, led by Tao Gong, Chengqi Lyu, and Shilong Zhang. Yudong Wang, Miao Zheng, Qian Zhao, Kuikun Liu, Wenwei Zhang, Ping Luo, and Kai Chen all contributed to the model's research and development.
The team's expertise spans natural language processing, computer vision, and machine learning. Its members have published numerous papers at top conferences and in journals, and have received various honors and recognition for their scientific work.
The team's research focuses on developing state-of-the-art models and methods that enable more natural and intelligent interaction between people and technology.
The development of MultiModal-GPT is a notable achievement in the field because it is one of the first models to combine vision and language in a single framework for multi-turn conversation.
The team's contributions to the research and development of MultiModal-GPT could have a significant influence on the future of natural language processing and human-machine interaction.
How to Use MultiModal-GPT
For beginners, using the MultiModal-GPT tool is straightforward. Simply go to https://mmgpt.openmmlab.org.cn/ and click the "Upload Image" button.
Choose a file to upload, then type your prompt into the text field. To generate a response from the model, click the "Submit" button, which appears below the text field.
You can experiment with different images and instructions to learn more about the model's capabilities.
Installation
To install the MultiModal-GPT package, clone the repository from GitHub and install its dependencies with the following terminal commands:
git clone https://github.com/open-mmlab/Multimodal-GPT.git
cd Multimodal-GPT
pip install -r requirements.txt
pip install -v -e .
Alternatively, create a new conda environment with:
conda env create -f environment.yml
Once the package is installed, you can run the demo locally by downloading the pretrained weights and saving them in the checkpoints folder.
The Gradio demo can then be launched by running the command "python app.py".
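Before launching the demo, it can help to confirm that the weights are where the app expects them. The sketch below lists the installation commands from this section and checks that a checkpoints folder exists and is not empty. The folder name "checkpoints" and both helper functions are assumptions for illustration, so check the repository's README for the exact layout it expects.

```python
from pathlib import Path

REPO_URL = "https://github.com/open-mmlab/Multimodal-GPT.git"


def install_commands():
    """Return the shell commands from the installation steps, in order."""
    return [
        f"git clone {REPO_URL}",
        "cd Multimodal-GPT",
        "pip install -r requirements.txt",
        "pip install -v -e .",
    ]


def checkpoints_ready(folder="checkpoints"):
    """True if `folder` exists and holds at least one entry.

    The default folder name is an assumption, not a verified path.
    """
    path = Path(folder)
    return path.is_dir() and any(path.iterdir())


for command in install_commands():
    print(command)
print("checkpoints ready:", checkpoints_ready())
```

This only prints the commands and checks for files; it does not run the installation or verify that the downloaded weights are the right ones.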
Potential Drawbacks
Despite its strong performance, the MultiModal-GPT model still has flaws and room for improvement.
For example, when faced with complex or ambiguous visuals, the model may not always recognize and understand the content of the input correctly. This can lead to inaccurate predictions or unexpected behavior from the model.
In addition, particularly when the input is complex or open-ended, the model may not always produce the best possible output. For instance, in one case where the model misidentified a book cover, its answer may have been influenced by how similar the two book covers looked.
Conclusion
Overall, the MultiModal-GPT model represents a significant step forward in natural language processing and machine learning. It is also great fun to use and experiment with, so you should give it a try!
However, like all models, it has limitations, and it needs further refinement and tuning to achieve strong performance across a range of applications and domains.