Have you ever wished you could talk to an AI that understands both language and visual data? The MultiModal-GPT paradigm combines language processing with visual understanding.
This makes human-computer interaction more natural and more versatile. MultiModal-GPT can write descriptive captions, count specific objects, and answer general user questions.
But how does it do that? And what can you do with MultiModal-GPT?
Let's start from the beginning and look at the possibilities ahead of us.
With the arrival of language models such as GPT-4, language processing technology is going through a major shift. Innovations like ChatGPT have already become part of our daily lives.
And they seem to keep coming!
GPT-4 and Its Limitations
GPT-4 has shown remarkable proficiency in multimodal dialogue with humans. Several studies have tried to replicate this performance, but because an image expands into a large number of tokens, building models that take in detailed visual information can be computationally expensive.
Existing models also do not include language instructions in their training, which prevents them from taking part in zero-shot, multi-turn image-text dialogue.
Building on the Flamingo Framework
A new model called MultiModal-GPT was designed to communicate with people using both language and vision.
To make this possible, its developers built on the Flamingo framework, which had already been pretrained to understand both text and visual input.
Flamingo needed changes, however, because it could not hold extended conversations that involve both text and images.
The fine-tuned MultiModal-GPT model can extract information from images and combine it with language to understand and carry out human instructions.
MultiModal-GPT
MultiModal-GPT is an AI model that can follow a range of human requests, such as describing visual content, counting objects, and answering questions. It understands and follows instructions using a combination of visual and textual data.
The researchers trained the model jointly on vision-and-language data and language-only data to improve MultiModal-GPT's ability to converse with people. This joint training produced a noticeable improvement in its dialogue performance.
They also found that high-quality training data is essential for good dialogue performance, because a limited dataset with short answers can lead the model to give terse responses to any instruction.
What Can You Do with MultiModal-GPT?
Holding Conversations
Like the language models that came before it, one of MultiModal-GPT's core features is its ability to hold natural-language conversations. This means users can interact with the model much as they would with a real person.
For example, MultiModal-GPT can give users a detailed recipe for making noodles or recommend restaurants for eating out. The model can also answer general questions about users' travel plans.
Object Recognition
MultiModal-GPT can recognize objects in images and answer questions about them. For example, the model can recognize Freddie Mercury in a picture and answer questions about him.
It can also count the number of people in an image and describe what they are doing. This kind of recognition has applications in many fields, including e-commerce, healthcare, and security.
MultiModal-GPT can also recognize characters inside digital images. This means the model can read text in images and extract the relevant information. For example, it can read the text on a book cover and identify the book's author.
This is especially useful for document management, data entry, and content analysis.
Reasoning and Knowledge Generation
MultiModal-GPT can reason about the world and generate knowledge about it. This means it can give a full description of an image and even say in which season the picture was taken.
This skill is useful in a variety of fields, including environmental monitoring, agriculture, and meteorology. The model can also produce creative writing such as poems, stories, and songs, which makes it a handy tool for creative work.
Inner Workings of MultiModal-GPT
Unified Instruction Template
The team proposes a single template that unifies unimodal language data and multimodal vision-and-language data, so that the MultiModal-GPT model can be trained on both in a synergistic way.
This combined approach aims to improve the model's performance across a range of tasks by drawing on the strengths of both data modalities and encouraging a deeper understanding of the underlying concepts.
The team uses the Dolly 15k and Alpaca GPT4 datasets to train the language-only instruction-following ability. Each sample from these datasets is rendered through a prompt template so that the whole dataset follows one consistent instruction-following format.
Figure: Summary of the Dolly 15k dataset
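To make the idea of a shared template concrete, here is a minimal Python sketch. It assumes an Alpaca-style layout with "###" section markers and an "<image>" placeholder; the header sentence, the section names, and the build_prompt helper are illustrative assumptions, not the exact template shipped with MultiModal-GPT.

```python
# Assumed Alpaca-style layout; the real MultiModal-GPT template may differ.
HEADER = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)


def build_prompt(instruction, response="", context="", has_image=False):
    """Render one training sample with a single layout shared by both data types."""
    parts = [HEADER]
    if has_image:
        # "<image>" is a hypothetical placeholder for where visual tokens go.
        parts.append("### Image:\n<image>")
    parts.append("### Instruction:\n" + instruction.strip())
    if context.strip():
        # Optional extra input, as found in Dolly- or Alpaca-style records.
        parts.append("### Input:\n" + context.strip())
    parts.append("### Response:\n" + response.strip())
    return "\n\n".join(parts)


# A language-only sample and a vision-and-language sample use the same layout.
text_only = build_prompt("Name three primary colors.", "Red, yellow, and blue.")
with_image = build_prompt(
    "How many people are in the picture?", "Four.", has_image=True
)

print(text_only)
print("-" * 40)
print(with_image)
```

Because both kinds of sample pass through the same function, the only difference the model sees is the extra image section, which is the point of a unified template.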
How Does the Model Work?
The MultiModal-GPT model is made up of three main components: a language decoder, a perceiver resampler, and a vision encoder. The vision encoder takes in the image and produces a set of features that represent it.
The perceiver resampler condenses those features, and the language decoder then uses the result to generate text that describes the image.
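The role of the resampler is easier to see in code. The sketch below is a toy, single-head cross-attention in plain Python: a fixed set of latent queries attends over however many patch features the vision encoder produced, so the decoder always receives the same number of visual tokens. The random vectors, the dimensions, and the absence of learned projections or stacked layers are simplifying assumptions; this is not the module used in Flamingo or MultiModal-GPT.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def resample(patch_features, latent_queries):
    """Cross-attend each latent query over all patch features.

    Returns one vector per latent query, whatever the number of patches.
    """
    dim = len(latent_queries[0])
    outputs = []
    for query in latent_queries:
        # Scaled dot-product score between this query and every patch.
        scores = [
            sum(q * p for q, p in zip(query, patch)) / math.sqrt(dim)
            for patch in patch_features
        ]
        weights = softmax(scores)
        # Attention-weighted average of the patch features.
        outputs.append(
            [sum(w * patch[d] for w, patch in zip(weights, patch_features))
             for d in range(dim)]
        )
    return outputs


rng = random.Random(0)
dim, num_latents = 4, 2  # toy sizes chosen for illustration only

# Stand-ins for learned latent queries and for vision-encoder outputs.
latents = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_latents)]
small_image = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(9)]
large_image = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(49)]

small_tokens = resample(small_image, latents)
large_tokens = resample(large_image, latents)
print(len(small_tokens), len(large_tokens))  # prints "2 2"
```

Keeping the number of visual tokens fixed is what keeps the cost of feeding an image to the language decoder under control, which ties back to the token-count problem mentioned earlier.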
The language decoder is the part of the model that understands language and generates text. The model is trained to predict the next token in a sequence, using both language-only and vision-and-language instruction-following data.
This teaches the model how to respond to instructions from people and how to produce acceptable text descriptions of images.
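The next-token objective can be written down in a few lines. The following sketch computes the mean negative log-likelihood of the target tokens from hand-written probability tables. The token strings, the probabilities, and the choice to score only the response tokens are assumptions made for illustration (response-only masking is common in instruction tuning); they are not taken from the MultiModal-GPT training code.

```python
import math


def next_token_loss(step_probs, tokens, loss_mask):
    """Mean negative log-likelihood of each token given the tokens before it.

    step_probs[i] is the predicted distribution for tokens[i + 1].
    loss_mask[i + 1] says whether tokens[i + 1] counts toward the loss.
    """
    total, count = 0.0, 0
    for i in range(len(tokens) - 1):
        if not loss_mask[i + 1]:
            continue  # prompt tokens are context only, not training targets
        target = tokens[i + 1]
        total -= math.log(step_probs[i][target])
        count += 1
    return total / count if count else 0.0


# Hypothetical tokenized sample: prompt tokens first, then the response.
tokens = ["### Instruction:", "count", "### Response:", "two", "<eos>"]
loss_mask = [False, False, False, True, True]

# Made-up model outputs: one distribution per predicted position.
step_probs = [
    {"count": 0.5, "two": 0.5},
    {"### Response:": 0.9, "two": 0.1},
    {"two": 0.25, "three": 0.75},
    {"<eos>": 1.0},
]

loss = next_token_loss(step_probs, tokens, loss_mask)
print(round(loss, 4))  # prints 0.6931
```

Only "two" and "<eos>" are scored here, so the loss is the average of -log(0.25) and -log(1.0). Training pushes the probability of the correct next token up, which lowers this value.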
The Team Behind It
MultiModal-GPT was developed by a team of researchers and engineers led by Tao Gong, Chengqi Lyu, and Shilong Zhang, with author affiliations at Shanghai AI Laboratory, the University of Hong Kong, and Tianjin University, and it is released through the OpenMMLab project. Yudong Wang, Miao Zheng, Qian Zhao, Kuikun Liu, Wenwei Zhang, Ping Luo, and Kai Chen all contributed to the research and development of the model.
Natural language processing, computer vision, and machine learning are all areas of expertise for the team. They have published a number of papers at top-tier conferences and in journals, and have received various awards and honors for their scientific work.
The team's research focuses on developing cutting-edge models and methods that enable more natural and intelligent interaction between people and technology.
The development of MultiModal-GPT is a notable milestone in the field, since it is one of the first models to combine vision and language in a single framework for multi-round dialogue.
The team's contributions to MultiModal-GPT research and development could have a substantial influence on the future of natural language processing and human-machine interaction.
How to Use MultiModal-GPT
For beginners, using MultiModal-GPT is straightforward. Go to https://mmgpt.openmmlab.org.cn/ and click the "Upload Image" button.
Choose an image file to upload, then type a text prompt into the text field. To generate a response from the model, click the "Submit" button, which appears below the text field.
You can experiment with different images and prompts to learn more about what the model can do.
Installation
To install the MultiModal-GPT package, clone the repository from GitHub and install its dependencies with the following terminal commands:
git clone https://github.com/open-mmlab/Multimodal-GPT.git
cd Multimodal-GPT
pip install -r requirements.txt
pip install -v -e .
Alternatively, run conda env create -f environment.yml to create a new conda environment. After installation, you can run the demo locally once you have downloaded the pretrained weights and stored them in the checkpoints folder.
The Gradio demo can then be started by running the command "python app.py".
Potential Drawbacks
Despite its strong performance, the MultiModal-GPT model still has flaws and room for improvement.
For example, when it deals with very complex or ambiguous visual input, the model may not always recognize and understand the context of that input. This can lead to inaccurate predictions or responses from the model.
In addition, especially when the input is complex or open-ended, the model may not always produce the best answer or result. When a book cover is misidentified, for instance, the model's answer may have been swayed by the close visual similarity between two book covers.
Conclusion
Overall, the MultiModal-GPT model represents a big step forward in natural language processing and machine learning. It is also a lot of fun to use and experiment with. So, you should give it a try!
However, like all models, it has limitations, and it needs further refinement and development to reach its full potential across different applications and domains.