Can you use AI to create a new record by your favorite artist?
Recent breakthroughs in machine learning have shown that models can now make sense of complex data such as text and images. OpenAI's Jukebox shows that even music can be modeled convincingly by a neural network.
Music is a difficult thing to model. You have to account for both simple features such as tempo, loudness, and pitch and more complex features such as lyrics, instruments, and musical structure.
Using advanced machine learning techniques, OpenAI found a way to convert raw audio into a representation that other models can use.
This article explains what Jukebox can do, how it works, and the current limitations of the technology.
What is Jukebox AI?
Jukebox is a neural network model from OpenAI that can generate music, including singing. The model can produce music in a wide range of genres and artist styles.
For example, Jukebox can generate a rock song in the style of Elvis Presley or a hip hop song in the style of Kanye West. You can visit this website to explore how well the model captures the sound of your favorite artists and genres.
The model takes a genre, an artist, and lyrics as input. These inputs steer a model that was trained on more than a million songs, together with their artist and lyric data.
How does Jukebox work?
Let's look at how Jukebox manages to generate novel raw audio from a model trained on over a million songs.
The Encoding Process
While some music generation models use MIDI training data, Jukebox is trained on actual raw audio files. To compress the audio into a discrete space, Jukebox uses an autoencoder approach known as VQ-VAE.
VQ-VAE stands for Vector Quantized Variational Autoencoder, which may sound complicated, so let's break it down.
First, let's try to understand what we want to achieve here. Compared to lyrics or sheet music, a raw audio file is far more complex. If we want our model to "learn" from songs, we need to convert the audio into a more compressed and simplified representation. In machine learning, we call this underlying representation a latent space.
An autoencoder is an unsupervised learning method that uses a neural network to find nonlinear latent representations of a given data distribution. An autoencoder consists of two parts: an encoder and a decoder.
The encoder tries to find a latent representation of the raw data, while the decoder uses that latent representation to try to reconstruct the data in its original format. The autoencoder essentially learns how to compress raw data in a way that minimizes the reconstruction error.
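To make the encoder and decoder roles concrete, here is a minimal sketch of a toy linear autoencoder in plain Python. It is not how Jukebox is implemented: the dataset, the one-number latent code, the starting weights, and the learning rate are all assumptions chosen so the example stays small.

```python
# Toy linear autoencoder: 2-D points -> 1-D code -> 2-D reconstruction.
# The data, sizes, and learning rate are illustrative assumptions.

# Points that lie on the line y = 2x, so one number is enough to describe each.
data = [(x, 2.0 * x) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]

def encode(point, w):
    """Encoder: compress a 2-D point into a single latent value."""
    return w[0] * point[0] + w[1] * point[1]

def decode(z, v):
    """Decoder: expand the latent value back into a 2-D point."""
    return (v[0] * z, v[1] * z)

def reconstruction_loss(w, v):
    """Mean squared reconstruction error over the dataset."""
    total = 0.0
    for p in data:
        r = decode(encode(p, w), v)
        total += (r[0] - p[0]) ** 2 + (r[1] - p[1]) ** 2
    return total / len(data)

def train(steps=500, lr=0.05):
    """Fit encoder and decoder weights with plain gradient descent."""
    w = [0.5, 0.5]  # encoder weights
    v = [0.5, 0.5]  # decoder weights
    for _ in range(steps):
        grad_w = [0.0, 0.0]
        grad_v = [0.0, 0.0]
        for p in data:
            z = encode(p, w)
            r = decode(z, v)
            err = (r[0] - p[0], r[1] - p[1])
            # Error signal passed back through the decoder to the encoder.
            back = err[0] * v[0] + err[1] * v[1]
            for i in range(2):
                grad_v[i] += 2.0 * err[i] * z / len(data)
                grad_w[i] += 2.0 * back * p[i] / len(data)
        for i in range(2):
            w[i] -= lr * grad_w[i]
            v[i] -= lr * grad_v[i]
    return w, v

initial_loss = reconstruction_loss([0.5, 0.5], [0.5, 0.5])
w, v = train()
final_loss = reconstruction_loss(w, v)
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.6f}")
```

The only training signal is the gap between each input and its reconstruction, which is what makes the method unsupervised.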
Now that we know what an autoencoder does, let's try to understand what we mean by a "variational" autoencoder. Compared to regular autoencoders, variational autoencoders add a prior to the latent space.
Without diving into the math, adding a probabilistic prior keeps the latent distribution tightly packed. The main difference between a VAE and a VQ-VAE is that the latter uses a discrete latent representation rather than a continuous one.
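The "vector quantized" part can be sketched as a nearest-neighbor lookup: each continuous vector from the encoder is replaced by the index of the closest entry in a codebook. The codebook and the encoder outputs below are made-up values for illustration. In a real VQ-VAE the codebook is learned during training.

```python
# Toy vector quantization: replace a continuous vector with the index of its
# nearest codebook entry. The codebook values are made up for illustration.

codebook = [
    (0.0, 0.0),
    (1.0, 1.0),
    (2.0, 0.5),
    (-1.0, 2.0),
]

def squared_distance(a, b):
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(vector):
    """Return the index of the codebook entry closest to `vector`."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(vector, codebook[i]))

def dequantize(index):
    """Look a code back up to recover its continuous vector."""
    return codebook[index]

# A short sequence of continuous encoder outputs (hypothetical values).
encoder_outputs = [(0.1, -0.2), (0.9, 1.2), (1.8, 0.4), (-0.7, 1.9)]
codes = [quantize(vec) for vec in encoder_outputs]
print(codes)
```

After this step the audio is represented by a sequence of integers, which is the kind of discrete token sequence a language-model-style network can work with.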
Jukebox's VQ-VAE compresses the audio at several levels, and each level encodes the input independently. The bottom-level encoding produces the highest-quality reconstruction. The top-level encoding keeps only the essential musical information.
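A quick calculation shows why the amount of compression matters. The numbers below (a 44.1 kHz sample rate, a four-minute song, and downsampling factors of 8, 32, and 128 for the three levels) are not stated in this article, so treat them as illustrative assumptions.

```python
SAMPLE_RATE_HZ = 44_100   # assumed CD-quality sample rate
SONG_SECONDS = 4 * 60     # a hypothetical four-minute song
# Assumed downsampling factor per level, from least to most compressed.
LEVELS = {"bottom": 8, "middle": 32, "top": 128}

raw_samples = SAMPLE_RATE_HZ * SONG_SECONDS

def codes_per_song(factor):
    """Number of discrete codes when one code covers `factor` audio samples."""
    return raw_samples // factor

print(f"raw audio samples: {raw_samples:,}")
for name, factor in LEVELS.items():
    print(f"{name:>6} level: {codes_per_song(factor):,} codes")
```

Under these assumptions the most compressed level describes the song with far fewer codes than the least compressed one, which fits the idea that it can only hold the broad musical content.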
Using Transformers
Now that we have music codes produced by the VQ-VAE, we can try to generate music in this compressed discrete space.
Jukebox uses autoregressive transformers to generate the output audio. Transformers are a type of neural network that works well with sequential data. Given a sequence of tokens, a transformer model tries to predict the next token.
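The autoregressive loop itself is simple: predict a distribution over the next token, sample from it, append the result, and repeat. The sketch below uses a hypothetical lookup table in place of a transformer, so it only illustrates the sampling loop and not the model. The three-code vocabulary and its probabilities are invented.

```python
import random

# Hypothetical next-token probabilities for a 3-code vocabulary.
# A real transformer would compute these from the whole preceding sequence;
# this toy table looks only at the previous token.
NEXT_TOKEN_PROBS = {
    0: [0.1, 0.6, 0.3],
    1: [0.3, 0.1, 0.6],
    2: [0.6, 0.3, 0.1],
}

def predict_next(sequence):
    """Stand-in for the model: a probability distribution over the next token."""
    return NEXT_TOKEN_PROBS[sequence[-1]]

def generate(prompt, length, seed=0):
    """Autoregressive sampling: sample a token, append it, and repeat."""
    rng = random.Random(seed)
    sequence = list(prompt)
    while len(sequence) < length:
        probs = predict_next(sequence)
        token = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        sequence.append(token)
    return sequence

tokens = generate(prompt=[0], length=12)
print(tokens)
```

Because every new token depends on the tokens before it, generation has to proceed one step at a time.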
Jukebox uses a simplified variant of Sparse Transformers. Once all the prior models have been trained, the transformer generates compressed codes, which are then decoded back into raw audio using the VQ-VAE decoder.
Artist and Genre Conditioning in Jukebox
Jukebox's generative model is made more controllable by providing additional conditioning signals during training.
The first models are given artist and genre labels for each song. This reduces the entropy of the audio prediction and lets the model reach better quality. The labels also let us steer the model toward a particular style.
Besides artist and genre, timing signals are added during training. These signals include the total length of the song, the start time of a particular sample, and the fraction of the song that has elapsed. This extra information helps the model learn audio patterns that depend on the overall structure.
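As a rough sketch, the three timing signals could be computed for each training chunk as follows. The function name, the field names, and the use of seconds as the unit are assumptions made for this example.

```python
def timing_signals(song_length_s, chunk_start_s):
    """Build the three timing features for one training chunk.

    Returns the total song length, the chunk's start time, and the
    fraction of the song that has elapsed when the chunk begins.
    """
    if song_length_s <= 0:
        raise ValueError("song_length_s must be positive")
    if not 0 <= chunk_start_s <= song_length_s:
        raise ValueError("chunk_start_s must fall inside the song")
    return {
        "total_length_s": song_length_s,
        "start_time_s": chunk_start_s,
        "fraction_elapsed": chunk_start_s / song_length_s,
    }

# A chunk that starts three minutes into a four-minute song.
signals = timing_signals(song_length_s=240.0, chunk_start_s=180.0)
print(signals)
```

A chunk with a high elapsed fraction tells the model it is near the end of the song, which is the kind of cue it needs to place an ending in the right spot.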
For example, the model may learn that applause in a live recording occurs at the end of a song. It may also learn that some genres have longer instrumental sections than others.
Lyrics
The conditioned models described in the previous section can generate a variety of singing voices. However, the words they sing are often incoherent and unintelligible.
To control the generative model when it comes to lyrics, the researchers provide additional context during training. To help map the lyric data onto the timing of the actual audio, the researchers used Spleeter to extract the vocals and NUS AutoLyricsAlign to obtain word-level alignments of the lyrics.
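Once every word has a timestamp, picking the lyrics that belong to a given audio chunk becomes a simple filter. The sketch below assumes a hypothetical alignment format of (word, start time in seconds) pairs. It does not reflect the actual output format of Spleeter or NUS AutoLyricsAlign.

```python
# Hypothetical word-level alignment: (word, start time in seconds).
aligned_lyrics = [
    ("we", 0.5),
    ("sing", 1.0),
    ("all", 6.0),
    ("night", 7.5),
    ("long", 12.0),
]

def lyrics_for_chunk(alignment, chunk_start_s, chunk_length_s):
    """Return the words whose start time falls inside the audio chunk."""
    chunk_end_s = chunk_start_s + chunk_length_s
    return [word for word, start in alignment
            if chunk_start_s <= start < chunk_end_s]

# Words sung between 5 and 10 seconds into the song.
chunk_words = lyrics_for_chunk(aligned_lyrics, chunk_start_s=5.0, chunk_length_s=5.0)
print(chunk_words)
```

Pairing each audio chunk with only the words sung during it gives the model a much tighter link between text and sound than handing it the full lyric sheet.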
Limitations of the Jukebox Model
One of the main limitations of Jukebox is its grasp of larger musical structures. For example, a short 20-second clip of the output may sound impressive, but listeners will notice that the usual structure of repeating choruses and verses is missing from the final output.
The model is also slow to sample from. It takes roughly 9 hours to fully render one minute of audio. This limits the number of songs that can be generated and prevents the model from being used in interactive applications.
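To put that figure in perspective, here is a back-of-the-envelope calculation. It assumes rendering time grows linearly with the length of the audio, which the article does not state, and it uses a hypothetical four-minute song.

```python
HOURS_PER_AUDIO_MINUTE = 9  # figure quoted in the text

def render_hours(song_minutes):
    """Estimated rendering time, assuming cost grows linearly with length."""
    return song_minutes * HOURS_PER_AUDIO_MINUTE

# Minutes of compute needed for each minute of audio.
slowdown_factor = HOURS_PER_AUDIO_MINUTE * 60

print(f"4-minute song: about {render_hours(4)} hours")
print(f"about {slowdown_factor}x slower than real time")
```

At that rate a single full-length song takes more than a day to render, which is why interactive use is out of reach.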
Finally, the researchers have noted that the training dataset is primarily in English and mostly reflects Western musical conventions. Future AI research could focus on generating music in other languages and in non-Western musical styles.
Conclusion
The Jukebox project highlights the growing ability of machine learning models to build accurate latent representations of complex data such as raw audio. Similar breakthroughs are happening in text, as seen in projects like GPT-3, and in images, as seen in OpenAI's DALL-E 2.
While research in this space is impressive, there are still concerns about intellectual property rights and the impact these models may have on the creative industries as a whole. Researchers and creators should keep working closely together to ensure that these models can continue to improve.
Future generative music models may be able to serve as a tool for musicians or as an application for creators who need custom music for their projects.