Have you ever wished you could chat with an AI that understands both spoken language and visual data? MultiModal-GPT combines language processing with visual understanding.
It enables more accurate and more varied human-computer interaction. MultiModal-GPT can produce detailed captions, count individual objects, and answer general questions from users.
But how does it do that? And what can you do with MultiModal-GPT?
Let's start from the beginning and look at what is possible.
With the arrival of language models such as GPT-4, natural language processing is changing quickly. Tools such as ChatGPT have already become part of our lives.
And they keep coming!
GPT-4 and Its Limitations
GPT-4 has shown remarkable skill in multimodal dialogues with humans. Studies have tried to replicate this capability, but because of the large number of image tokens, models that take in fine-grained visual information can be computationally expensive.
Existing models also leave language-only instruction data out of their training, which limits their ability to take part in zero-shot, multi-turn image-text conversations.
Building on the Flamingo Framework
A new model called MultiModal-GPT was developed so that conversations with humans can draw on both linguistic and visual cues.
To make this possible, the developers used the Flamingo framework, which had already been trained to understand both text and visuals.
Flamingo needed some changes, however, because it could not hold extended dialogues involving both text and visuals.
The updated MultiModal-GPT model can gather information from images and combine it with language to understand and carry out human instructions.
MultiModal-GPT
MultiModal-GPT is an AI model that can follow a variety of human requests, such as describing visuals, counting objects, and answering questions. It understands and follows instructions using a mix of visual and verbal data.
To improve MultiModal-GPT's ability to converse with people, the researchers trained the model jointly on vision-and-language data and language-only data. This led to a noticeable improvement in the quality of its dialogue.
They also found that high-quality training data is essential for good conversational performance, because a small dataset with short answers can lead the model to give short answers to any instruction.
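One simple way to act on the point about short answers is to screen instruction data by response length before training. The sketch below is only an illustration of that idea, not the team's actual data pipeline: the function name `drop_short_responses`, the `instruction`/`response` keys, and the five-word threshold are all assumptions made for this example.

```python
from typing import Dict, List


def drop_short_responses(
    examples: List[Dict[str, str]], min_words: int = 5
) -> List[Dict[str, str]]:
    """Keep only examples whose response has at least `min_words` words.

    The threshold is a hypothetical choice for illustration.
    """
    return [ex for ex in examples if len(ex["response"].split()) >= min_words]


# Two toy examples: one with a terse answer, one with a full sentence.
examples = [
    {"instruction": "What is in the image?", "response": "A dog."},
    {
        "instruction": "Describe the image.",
        "response": "A brown dog is running across a grassy field.",
    },
]

filtered = drop_short_responses(examples)
print(len(filtered))  # only the full-sentence example remains
```

A word-count filter is crude, but it makes the trade-off concrete: terse answers are removed at the cost of a smaller dataset.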
What Can You Do with MultiModal-GPT?
Engaging in Conversations
Like the language models that came before it, one of MultiModal-GPT's key features is its ability to hold natural-language conversations. This means users can interact with the model much as they would with a real person.
For example, MultiModal-GPT can give users a detailed recipe for making noodles or suggest restaurants for dining out. The model can also answer general questions about users' travel plans.
Object Recognition
MultiModal-GPT can recognize things in photos and answer questions about them. For example, the model can recognize Freddie Mercury in an image and answer questions about him.
It can also count the number of people in an image and describe what they are doing. This recognition ability has applications in fields such as e-commerce, healthcare, and security.
MultiModal-GPT can also recognize text within digital images. This means the model can read the text in photos and extract useful information. For example, it can recognize the characters in an image and name the author of a book.
That makes it a useful tool for document management, data entry, and content analysis.
Reasoning and Knowledge Generation
MultiModal-GPT can reason and generate knowledge about the world. This means it can give full explanations of photos and even tell which season an image was taken in.
This skill is useful in fields such as environmental monitoring, agriculture, and meteorology. The model can also produce creative content such as poems, stories, and songs, which makes it a good tool for creative work.
The Inner Workings of MultiModal-GPT
Unified Instruction Template
The team presents a single template that unifies language-only data and multimodal vision-and-language data, so that the MultiModal-GPT model can be trained effectively in a synergistic way.
This combined strategy aims to improve the model's performance across a range of tasks by drawing on the complementary strengths of both kinds of data and encouraging a deeper understanding of the underlying concepts.
The team uses the Dolly 15k and Alpaca GPT4 datasets to train language-only instruction following. A prompt template formats the input of these datasets so that all instruction-following examples share a consistent format.
Figure: Overview of the Dolly 15k dataset
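To make the idea of a single template concrete, here is a minimal sketch of how one function could format both language-only and vision-and-language examples. The header wording, the `### Image:` / `### Instruction:` / `### Response:` section names, and the `<image>` placeholder follow the common Alpaca-style layout and are assumptions for illustration; the exact template used by the project may differ.

```python
# Assumed Alpaca-style header; the project's exact wording may differ.
HEADER = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)


def build_prompt(instruction: str, response: str = "", has_image: bool = False) -> str:
    """Format one example with a shared template.

    When `has_image` is True, an image section with a placeholder token is
    added so that vision-and-language examples share the same layout as
    language-only ones.
    """
    parts = [HEADER, ""]
    if has_image:
        parts += ["### Image:", "<image>", ""]
    parts += ["### Instruction:", instruction, "", "### Response:", response]
    return "\n".join(parts)


text_prompt = build_prompt("Give three tips for staying healthy.")
vision_prompt = build_prompt("How many people are in the picture?", has_image=True)
print(vision_prompt)
```

Because both kinds of data pass through the same function, the model always sees the instruction and the response in the same positions, whether or not an image is attached.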
How Does the Model Work?
Three main components make up the MultiModal-GPT model: a language decoder, a perceiver resampler, and a vision encoder. The vision encoder takes in an image and produces a set of features that represent it.
With the help of the perceiver resampler, the language decoder uses the information from the vision encoder to generate text that describes the image.
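The role of the perceiver resampler can be sketched with a toy example: a small, fixed set of latent queries attends over however many image features the vision encoder produces, and returns a fixed number of visual tokens for the language decoder. This is a simplified, single-head, single-layer NumPy sketch under assumed sizes (196 patches, 8 latents, width 16) with random weights; it is not the model's real implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def resample(image_features, latents, w_q, w_k, w_v):
    """Single-head cross-attention: latent queries attend over image features.

    Returns (visual_tokens, attention_weights). The number of output tokens
    equals the number of latents, regardless of how many features come in.
    """
    q = latents @ w_q                        # (num_latents, d)
    k = image_features @ w_k                 # (num_patches, d)
    v = image_features @ w_v                 # (num_patches, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (num_latents, num_patches)
    attn = softmax(scores, axis=-1)
    return attn @ v, attn


rng = np.random.default_rng(0)
d = 16                                       # assumed feature width
image_features = rng.normal(size=(196, d))   # stand-in for vision encoder output
latents = rng.normal(size=(8, d))            # learned in a real model; random here
w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

visual_tokens, attn = resample(image_features, latents, w_q, w_k, w_v)
print(visual_tokens.shape)  # (8, 16)
```

Compressing the image into a few tokens is what keeps the cost manageable when the number of image tokens would otherwise be large.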
The language decoder is the part of the model that understands language and generates text. The model is trained to predict the next word in a sequence, using both language-only and vision-and-language instruction-following data.
This teaches the model how to respond to human instructions and to produce suitable text for image descriptions.
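The next-word objective described above can be written as a cross-entropy loss in which position t predicts token t+1. The NumPy sketch below illustrates that objective. The `loss_mask` argument, which restricts the loss to chosen target tokens (for example, response tokens only), is an assumption added for illustration and is not stated in the text above.

```python
import numpy as np


def next_token_loss(logits, token_ids, loss_mask):
    """Mean cross-entropy for predicting token t+1 from position t.

    logits:    (T, V) scores over the vocabulary at each position
    token_ids: (T,)   the actual token sequence
    loss_mask: (T,)   1 where a token counts as a training target, else 0
    """
    pred = logits[:-1]                      # positions 0..T-2 predict tokens 1..T-1
    target = token_ids[1:]
    mask = loss_mask[1:].astype(float)
    pred = pred - pred.max(axis=-1, keepdims=True)
    log_probs = pred - np.log(np.exp(pred).sum(axis=-1, keepdims=True))
    nll = -log_probs[np.arange(len(target)), target]
    return float((nll * mask).sum() / mask.sum())


vocab_size = 5
token_ids = np.array([1, 3, 2, 4])
loss_mask = np.array([0, 0, 1, 1])          # assume the first two tokens are the prompt
uniform_logits = np.zeros((len(token_ids), vocab_size))

loss = next_token_loss(uniform_logits, token_ids, loss_mask)
print(loss)  # a uniform prediction gives log(vocab_size)
```

The same loss applies to both kinds of training data; for image examples, the decoder simply conditions on the visual tokens as well as the text.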
The Team Behind It
MultiModal-GPT was built by a team of researchers and engineers associated with OpenMMLab, the organization that hosts the project's repository, led by Tao Gong, Chengqi Lyu, and Shilong Zhang. Yudong Wang, Miao Zheng, Qian Zhao, Kuikun Liu, Wenwei Zhang, Ping Luo, and Kai Chen all contributed to the model's research and development.
Natural language processing, computer vision, and machine learning are all areas of expertise within the team. Its members have published papers at top-tier conferences and in leading journals, and have received various honors and awards for their scientific work.
The team's research focuses on developing cutting-edge models and methods that enable natural, intelligent interaction between people and technology.
The development of MultiModal-GPT is a notable achievement in the field, as it is one of the first models to combine vision and language in a single framework for multi-turn dialogue.
The team's contribution to the research and development of MultiModal-GPT could have a significant influence on the future of natural language processing and human-machine interaction.
How to Use MultiModal-GPT
For beginners, using the MultiModal-GPT tool is straightforward. Go to https://mmgpt.openmmlab.org.cn/ and press the "Upload Image" button.
Choose the image file to upload, then type a text prompt in the text field. To get a response from the model, click the "Submit" button, which appears below the text field.
You can experiment with different photos and instructions to learn more about the model's abilities.
Installation
To install the MultiModal-GPT package, clone the repository from GitHub and install its dependencies with the following terminal commands:
git clone https://github.com/open-mmlab/Multimodal-GPT.git
cd Multimodal-GPT
pip install -r requirements.txt
pip install -v -e .
Alternatively, run conda env create -f environment.yml to set up a new conda environment. After installation, you can run the demo locally by downloading the pre-trained weights and saving them in the checkpoints folder.
The Gradio demo can then be launched by running the command "python app.py".
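Before launching the demo, it can help to confirm that the weights are where the app expects them. The helper below is a hypothetical preflight check, not part of the project: the function name `checkpoints_ready` and the assumption that the folder is named `checkpoints` at the repository root are made for this example, and it only verifies that the folder exists and is not empty.

```python
from pathlib import Path


def checkpoints_ready(repo_dir: str, folder: str = "checkpoints") -> bool:
    """Return True if `folder` exists under `repo_dir` and contains at least one entry.

    The folder name and location are assumptions; adjust them to your setup.
    """
    ckpt_dir = Path(repo_dir) / folder
    return ckpt_dir.is_dir() and any(ckpt_dir.iterdir())


if __name__ == "__main__":
    # Assumes the repository was cloned into ./Multimodal-GPT.
    if checkpoints_ready("Multimodal-GPT"):
        print("Checkpoints found; the demo can be started.")
    else:
        print("No checkpoints found; download the pre-trained weights first.")
```

The check does not validate file names or contents, so a successful result only means that something has been placed in the folder.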
Potential Problems
Although it performs well, the MultiModal-GPT model still has flaws and room for improvement.
For example, when dealing with complex or ambiguous visuals, the model may not always recognize and understand the context of the input. This can lead to inaccurate predictions or responses.
In addition, particularly when the input is difficult or open-ended, the model may not always produce the best response. For example, in one case where the model misidentified a book cover, its answer may have been affected by how similar the two book covers were.
Conclusion
Overall, the MultiModal-GPT model is a significant step forward for natural language processing and machine learning. It is also enjoyable to use and experiment with, so it is worth trying yourself.
Like all models, however, it has limitations, and it needs further refinement and improvement to reach top performance across different applications and domains.