Suppose you are trying to teach a robot to walk. Unlike teaching a computer to predict stock prices or classify images into categories, we do not really have a large dataset we can use to train our robot.
Although it may come naturally to you, walking is actually a very complex action. Taking a single step typically involves dozens of different muscles working together. The effort and technique needed to walk from one place to another also depend on various factors, including whether you are carrying something and whether there is an incline or some other kind of obstacle.
In situations like these, we can use a method known as reinforcement learning, or RL. With RL, you define a specific goal you want your model to achieve and let the model gradually learn on its own how to accomplish it.
In this article, we will explore the basics of reinforcement learning and how we can apply the RL framework to a variety of real-world problems.
What is reinforcement learning?
Reinforcement learning is a subfield of machine learning that focuses on finding solutions by rewarding desired behaviors and punishing undesired ones.
Unlike supervised learning, a reinforcement learning approach typically has no training dataset that provides the correct output for a given input. Without training data, the algorithm must find a solution through trial and error. The algorithm, which we usually refer to as the agent, must find the solution by itself by interacting with the environment.
Researchers decide which outcomes are rewarded and which actions the algorithm is able to take. Every action the algorithm takes receives some kind of feedback that indicates how well the algorithm is doing. Over the course of training, the algorithm uses this feedback to gradually arrive at a good solution to the problem.
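The interaction loop described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `CorridorEnv` class, its `reset` and `step` methods, the reward values, and the `run_episode` helper are all hypothetical names chosen for this example.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: reach the last cell of a short corridor."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        # Start every episode at the left end of the corridor.
        self.position = 0
        return self.position

    def step(self, action):
        # action is -1 (left) or +1 (right); the agent cannot leave the corridor.
        self.position = min(max(self.position + action, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 10 if done else -1  # assumed feedback values
        return self.position, reward, done


def run_episode(env, choose_action, max_steps=100):
    """Let the agent act until the goal is reached or the step budget runs out."""
    state = env.reset()
    total_reward = 0
    for _ in range(max_steps):
        action = choose_action(state)            # the agent decides
        state, reward, done = env.step(action)   # the environment responds
        total_reward += reward                   # feedback accumulates
        if done:
            break
    return total_reward


rng = random.Random(0)
random_total = run_episode(CorridorEnv(), lambda state: rng.choice([-1, 1]))
always_right_total = run_episode(CorridorEnv(), lambda state: 1)
```

An agent that always moves right reaches the goal in four steps and collects a total of 7, while the random agent can never do better than that. Learning amounts to closing the gap between the two using nothing but the reward signal.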
A Simple Example: A 4×4 Grid
Let's look at a simple example of a problem we can solve with reinforcement learning.
Suppose we have a 4×4 grid as our environment. Our agent is placed at random on one of the squares, along with a few obstacles. The grid contains three "pit" obstacles that must be avoided and a single "diamond" reward that the agent must find. The complete description of our environment is known as the state of the environment.
In our RL model, the agent can move to any adjacent square as long as no obstacle blocks it. The set of all valid actions in a given environment is known as the action space. The goal of our agent is to find the shortest path to the reward.
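As a rough sketch, the state and action space of such a grid could be represented as follows. The article does not give coordinates for the pits or the diamond, so the layout below is invented for illustration, and "adjacent" is assumed to mean the four squares above, below, left, and right.

```python
GRID_SIZE = 4

# Hypothetical layout as (row, column) pairs; not taken from the article.
PITS = {(1, 1), (2, 3), (3, 0)}
DIAMOND = (3, 3)

# Assumption: the agent moves one square up, down, left, or right.
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def valid_actions(state):
    """Return the actions available from `state`, i.e. its action space."""
    row, col = state
    allowed = []
    for name, (d_row, d_col) in ACTIONS.items():
        new_row, new_col = row + d_row, col + d_col
        in_bounds = 0 <= new_row < GRID_SIZE and 0 <= new_col < GRID_SIZE
        # A move is valid only if it stays on the grid and avoids every pit.
        if in_bounds and (new_row, new_col) not in PITS:
            allowed.append(name)
    return allowed


corner_actions = valid_actions((0, 0))
```

From the top-left corner only "down" and "right" are available, because the other two moves would leave the grid.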
Our agent will use a reinforcement learning approach to find the path to the diamond that requires the fewest steps. Each correct step gives the agent a reward, and each wrong step subtracts from the agent's reward. The model computes the total reward once the agent reaches the diamond.
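The article does not say which learning algorithm the agent uses. One common choice for small grids is tabular Q-learning, sketched below purely as an illustration. The grid layout, the reward values of -1 per step and 10 for the diamond, and the hyperparameters `alpha`, `gamma`, and `epsilon` are all assumptions made for this example.

```python
import random

GRID_SIZE = 4
PITS = {(1, 1), (2, 3), (3, 0)}  # hypothetical layout
DIAMOND = (3, 3)                 # hypothetical position
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def next_state(state, action):
    d_row, d_col = MOVES[action]
    return (state[0] + d_row, state[1] + d_col)


def valid_actions(state):
    # Moves that stay on the grid and do not enter a pit.
    result = []
    for action in MOVES:
        row, col = next_state(state, action)
        if 0 <= row < GRID_SIZE and 0 <= col < GRID_SIZE and (row, col) not in PITS:
            result.append(action)
    return result


FREE_SQUARES = [(r, c) for r in range(GRID_SIZE) for c in range(GRID_SIZE)
                if (r, c) not in PITS and (r, c) != DIAMOND]


def train(episodes=2000, alpha=0.5, gamma=1.0, epsilon=0.2, seed=0):
    """Tabular Q-learning: estimate the long-run value of each (state, action)."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in FREE_SQUARES for a in valid_actions(s)}
    for _ in range(episodes):
        state = rng.choice(FREE_SQUARES)  # random start square
        for _ in range(50):               # cap the episode length
            actions = valid_actions(state)
            if rng.random() < epsilon:    # occasionally try a random move
                action = rng.choice(actions)
            else:                         # otherwise take the best-known move
                action = max(actions, key=lambda a: q[(state, a)])
            new_state = next_state(state, action)
            if new_state == DIAMOND:
                reward, future = 10, 0.0
            else:
                reward = -1
                future = max(q[(new_state, a)] for a in valid_actions(new_state))
            # Nudge the estimate toward reward plus discounted future value.
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            if new_state == DIAMOND:
                break
            state = new_state
    return q


def greedy_path(q, start, max_steps=20):
    """Follow the highest-valued action from `start` until the diamond."""
    path = [start]
    while path[-1] != DIAMOND and len(path) <= max_steps:
        state = path[-1]
        action = max(valid_actions(state), key=lambda a: q[(state, a)])
        path.append(next_state(state, action))
    return path


q_table = train()
path = greedy_path(q_table, (0, 0))
```

After training, following the highest-valued action from each square traces a route from the top-left corner to the diamond. On this layout no route from that corner can be shorter than six moves, so a six-move path means the agent has learned a shortest route.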
Now that we have defined the agent and the environment, we must also define the rules used to determine the next action the agent takes, given its current state and its surroundings.
Policies and Rewards
In a reinforcement learning model, a policy is the strategy an agent uses to accomplish its goals. The agent's policy decides what the agent should do next, given the current state of the agent and its environment.
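In code, a simple deterministic policy is just a mapping from states to actions. The sketch below uses a hand-written lookup table on a hypothetical 4×4 grid; the coordinates, the `follow` helper, and the position of the diamond are assumptions for illustration only.

```python
DIAMOND = (3, 3)  # hypothetical position, as (row, column)
MOVES = {"down": (1, 0), "right": (0, 1)}

# A deterministic policy as a lookup table: state -> action.
policy = {
    (0, 0): "right",
    (0, 1): "right",
    (0, 2): "down",
    (1, 2): "down",
    (2, 2): "down",
    (3, 2): "right",
}


def follow(policy, start, goal, max_steps=20):
    """Apply the policy repeatedly and record the squares visited."""
    state, path = start, [start]
    while state != goal and len(path) <= max_steps:
        d_row, d_col = MOVES[policy[state]]  # look up what the policy says
        state = (state[0] + d_row, state[1] + d_col)
        path.append(state)
    return path


path = follow(policy, (0, 0), DIAMOND)
```

Here the policy was written by hand. In reinforcement learning, the agent has to discover a table like this, or a function that plays the same role, from rewards alone.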
The agent must evaluate the possible policies to determine which one is optimal.
In our simple example, landing on an empty square returns a value of -1. If the agent lands on the square with the diamond reward, it receives a value of 10. Using these values, we can compare different policies with a utility function U.
Now let's compare the utility of two policies: Policy A, which crosses three empty squares before reaching the diamond, and Policy B, which crosses five:
U(A) = -1 - 1 - 1 + 10 = 7
U(B) = -1 - 1 - 1 - 1 - 1 + 10 = 5
The results show that Policy A is the better way to reach the reward. Therefore, the agent will choose Policy A over Policy B.
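The same comparison can be reproduced in a few lines. This sketch assumes the utility of a policy is simply the sum of the rewards collected along its path, using the values given in the article: -1 per empty square and 10 for the diamond. The variable names are illustrative.

```python
STEP_REWARD = -1      # landing on an empty square
DIAMOND_REWARD = 10   # landing on the diamond

# Rewards collected along each path, in order.
rewards_a = [STEP_REWARD] * 3 + [DIAMOND_REWARD]
rewards_b = [STEP_REWARD] * 5 + [DIAMOND_REWARD]


def utility(rewards):
    """Assumed utility function: the plain sum of all rewards on the path."""
    return sum(rewards)


utilities = {"A": utility(rewards_a), "B": utility(rewards_b)}
best_policy = max(utilities, key=utilities.get)
```

This yields a utility of 7 for Policy A and 5 for Policy B, so `best_policy` is "A", matching the calculation above.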
Exploration vs. Exploitation
The trade-off between exploration and exploitation is a dilemma the agent faces throughout the decision process in reinforcement learning.
Should agents focus on exploring new paths or options, or should they keep using the options they already know?
If the agent chooses to explore, there is a chance it will find a better option, but it also risks wasting time and resources. On the other hand, if the agent chooses to exploit a solution it already knows, it may miss out on a better option.
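One widely used way to balance the two is the epsilon-greedy rule, which the article does not mention by name: with a small probability the agent explores a random option, and otherwise it exploits the option it currently believes is best. The function name, the value estimates, and the choice of `epsilon` below are assumptions for illustration.

```python
import random


def epsilon_greedy(estimated_values, epsilon, rng):
    """Pick an action: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        # Explore: choose any action uniformly at random.
        return rng.choice(list(estimated_values))
    # Exploit: choose the action with the highest estimated value.
    return max(estimated_values, key=estimated_values.get)


rng = random.Random(0)
estimates = {"left": 2.0, "right": 5.0, "up": 1.0}  # hypothetical estimates

# With epsilon = 0 the agent never explores and always picks the best estimate.
always_exploit = [epsilon_greedy(estimates, 0.0, rng) for _ in range(100)]

# With epsilon = 0.1 most choices exploit, but some try other actions.
mostly_exploit = [epsilon_greedy(estimates, 0.1, rng) for _ in range(1000)]
```

A larger `epsilon` makes the agent try unfamiliar options more often at the cost of more wasted moves. A smaller one does the opposite and raises the risk of settling on a mediocre option.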
Practical Applications
Here are some of the ways AI researchers have used reinforcement learning models to solve real-world problems:
Reinforcement Learning in Self-Driving Cars
Reinforcement learning has been applied to self-driving cars to improve their ability to drive safely and efficiently. The technique allows autonomous cars to learn from their mistakes and continually adjust their behavior to improve their performance.
For example, the London-based AI company Wayve has successfully applied a deep reinforcement learning model to autonomous driving. In their experiment, they used a reward function that maximizes the amount of time the car drives without the onboard safety driver providing input.
RL models also help cars make decisions based on their surroundings, such as avoiding obstacles or merging into traffic. These models must find a way to convert the complex environment around the car into a state representation the model can work with.
Reinforcement Learning in Robotics
Researchers have been using reinforcement learning to build robots that can learn complex tasks. With these RL models, robots are able to observe their environment and make decisions based on what they observe.
For example, research has been done on using reinforcement learning models to allow bipedal robots to learn how to walk on their own.
Researchers regard RL as a key method in the field of robotics. Reinforcement learning gives robotic agents a framework for learning complex behaviors that may be difficult to engineer by hand.
Reinforcement Learning in Games
RL models have also been used to learn how to play video games. Agents can be set up to learn from their mistakes and continually improve their performance in the game.
Researchers have already built agents that can play games such as chess, Go, and poker. In 2013, DeepMind used deep reinforcement learning to allow a model to learn to play Atari games from scratch.
Many board games and video games have a limited action space and a concrete, well-defined goal. These traits work in the RL model's favor. RL methods can quickly iterate over millions of simulated games to learn strong strategies for winning.
Conclusion
Whether it is learning to walk or learning to play video games, RL models have proven to be useful AI frameworks for solving problems that require complex decision-making.
As the technology continues to evolve, both researchers and developers will continue to find new applications that take advantage of a model's ability to teach itself.
Which practical applications do you think reinforcement learning could help with?