Let's say you're trying to teach a robot to walk. Unlike teaching a computer to predict stock prices or classify images, we don't really have a large dataset we can use to train our robot.
Although it may come naturally to you, walking is actually a very complex action. Taking a single step typically involves dozens of different muscles working together. The effort and technique needed to get from one place to another also depend on various factors, including whether you're carrying something and whether there is an incline or some other kind of obstacle.
In situations like these, we can use a method known as reinforcement learning, or RL. With RL, you define a specific goal you want your model to achieve and let the model gradually learn by itself how to reach it.
In this article, we'll explore the basics of reinforcement learning and how the RL framework can be applied to a variety of real-world problems.
What Is Reinforcement Learning?
Reinforcement learning is a subset of machine learning that focuses on finding solutions by rewarding desired behaviors and punishing undesired ones.
Unlike supervised learning, reinforcement learning typically has no training data that provides the correct output for a given input. Without that data, the algorithm must find a solution through trial and error. The algorithm, which we usually refer to as the agent, has to discover the solution by itself by interacting with the environment.
Researchers decide which outcomes to reward and what the algorithm is allowed to do. Every action the algorithm takes receives some form of feedback that indicates how well it is doing. Over the course of training, the algorithm eventually converges on the best solution it can find for the problem.
A Simple Example: 4×4 Grid
Let's take a look at a simple example of a problem we can solve with reinforcement learning.
Let's say we have a 4×4 grid as our environment. Our agent is placed at random in one of the squares, alongside a few obstacles. The grid contains three "pit" obstacles that must be avoided and a single "diamond" reward that the agent must find. The complete description of the environment at a given moment is known as the environment's state.
In our RL model, the agent can move to any adjacent square as long as no obstacle blocks it. The set of all valid actions in a given environment is known as the action space. The goal of our agent is to find the shortest path to the reward.
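The grid and its action space can be sketched in a few lines of Python. This is only an illustration: the pit and diamond coordinates, the name valid_actions, the restriction to four-directional moves (no diagonals), and the choice to treat pits as blocked squares are all assumptions, since the article does not specify them.

```python
# Hypothetical layout: the article gives no coordinates for the pits or the diamond.
GRID_SIZE = 4
PITS = {(1, 1), (2, 3), (3, 0)}   # assumed positions of the three pits
DIAMOND = (3, 3)                  # assumed position of the diamond

# Assumed action set: one step up, down, left, or right as (row, col) offsets.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def valid_actions(state):
    """Return the actions that keep the agent on the grid and out of a pit."""
    row, col = state
    actions = []
    for name, (d_row, d_col) in MOVES.items():
        new_row, new_col = row + d_row, col + d_col
        on_grid = 0 <= new_row < GRID_SIZE and 0 <= new_col < GRID_SIZE
        if on_grid and (new_row, new_col) not in PITS:
            actions.append(name)
    return actions


print(valid_actions((0, 0)))  # ['down', 'right'] (corner square)
print(valid_actions((1, 0)))  # ['up', 'down'] ((1, 1) is a pit, so 'right' is excluded)
```

Here the state is simply the agent's (row, column) position, and the action space at each square is whatever valid_actions returns for it.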
Our agent will use reinforcement learning to find the path to the diamond that requires the fewest steps. Each good step adds to the agent's reward, and each bad step subtracts from it. The model calculates the total reward once the agent reaches the diamond.
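One common way to learn such a path by trial and error is tabular Q-learning. The article does not name a specific algorithm, so the sketch below swaps in Q-learning purely as an illustration. The grid layout, the reward values (-1 per step, +10 for the diamond, -10 for a pit), the rule that falling into a pit ends the episode, and the hyperparameters are all assumptions.

```python
import random
from collections import defaultdict

GRID_SIZE = 4
PITS = {(1, 1), (2, 3), (3, 0)}               # assumed pit positions
DIAMOND = (3, 3)                              # assumed diamond position
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
STEP_REWARD, DIAMOND_REWARD, PIT_PENALTY = -1, 10, -10  # assumed values


def step(state, action):
    """Apply an action; return (next_state, reward, episode_finished)."""
    row, col = state[0] + action[0], state[1] + action[1]
    if not (0 <= row < GRID_SIZE and 0 <= col < GRID_SIZE):
        return state, STEP_REWARD, False      # walking into a wall wastes a step
    if (row, col) in PITS:
        return (row, col), PIT_PENALTY, True  # assumed: a pit ends the episode
    if (row, col) == DIAMOND:
        return (row, col), DIAMOND_REWARD, True
    return (row, col), STEP_REWARD, False


def train(episodes=3000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values with tabular Q-learning from random start squares."""
    rng = random.Random(seed)
    free = [(r, c) for r in range(GRID_SIZE) for c in range(GRID_SIZE)
            if (r, c) not in PITS and (r, c) != DIAMOND]
    q = defaultdict(float)                    # (state, action index) -> value
    for _ in range(episodes):
        state = rng.choice(free)
        for _ in range(50):                   # cap the episode length
            if rng.random() < epsilon:
                a = rng.randrange(4)          # occasional random move
            else:
                a = max(range(4), key=lambda i: q[(state, i)])
            nxt, reward, done = step(state, ACTIONS[a])
            best_next = 0.0 if done else max(q[(nxt, i)] for i in range(4))
            q[(state, a)] += alpha * (reward + gamma * best_next - q[(state, a)])
            state = nxt
            if done:
                break
    return q


def greedy_path(q, start, max_steps=20):
    """Follow the highest-valued action from each square until the episode ends."""
    path, state = [start], start
    for _ in range(max_steps):
        a = max(range(4), key=lambda i: q[(state, i)])
        state, _, done = step(state, ACTIONS[a])
        path.append(state)
        if done:
            break
    return path


q_values = train()
path = greedy_path(q_values, (0, 0))
print(path)
```

With this assumed layout, the shortest route from the top-left corner to the diamond takes six moves, so a well-trained table should produce a path of seven squares that ends at the diamond.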
Now that we've defined the agent and the environment, we need to define the rules that determine the next action the agent will take, given its current state and environment.
Policies and Rewards
In a reinforcement learning model, a policy is the strategy an agent uses to accomplish its goals. The agent's policy decides what the agent should do next, given the current state of the agent and its environment.
The agent must evaluate the candidate policies to see which one is optimal.
In our simple example, landing on an empty space returns a value of -1. When the agent lands on the space with the diamond reward, it receives a value of 10. Using these values, we can compare different policies with a utility function U.
Now let's compare the utility of two policies, A and B. Policy A crosses three empty squares before reaching the diamond, while Policy B crosses five:
U(A) = -1 - 1 - 1 + 10 = 7
U(B) = -1 - 1 - 1 - 1 - 1 + 10 = 5
The results show that Policy A is the better way to reach the reward, so the agent will prefer Policy A over Policy B.
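The same comparison can be written as a short Python sketch. The function name utility and the policies dictionary are hypothetical helpers introduced here; the reward values and step counts are read directly off the two sums above.

```python
STEP_REWARD = -1      # value for landing on an empty space
DIAMOND_REWARD = 10   # value for landing on the diamond


def utility(empty_steps):
    """Total reward of a path that crosses empty squares, then reaches the diamond."""
    return empty_steps * STEP_REWARD + DIAMOND_REWARD


policies = {"A": 3, "B": 5}   # number of empty squares each policy crosses
scores = {name: utility(steps) for name, steps in policies.items()}
best = max(scores, key=scores.get)

print(scores)  # {'A': 7, 'B': 5}
print(best)    # A
```

Because every extra step costs one point, the policy with the fewest steps always has the highest utility in this setup.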
Exploration vs. Exploitation
The exploration vs. exploitation trade-off is a problem the agent has to face throughout its decision-making process.
Should the agent focus on exploring new paths and options, or should it keep exploiting the options it already knows?
If the agent chooses to explore, it has a chance of finding a better option, but it risks wasting time and resources. On the other hand, if the agent chooses to exploit the solution it already knows, it may miss out on a better one.
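A common way to balance the two is an epsilon-greedy rule: with a small probability epsilon the agent tries a random option, and otherwise it picks the option it currently believes is best. The article does not prescribe this rule, so treat the sketch below as one possible illustration; the function name, the value estimates, and the epsilon values are all hypothetical.

```python
import random


def epsilon_greedy(estimates, epsilon, rng):
    """Pick an option index: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))                      # explore
    return max(range(len(estimates)), key=estimates.__getitem__)  # exploit


rng = random.Random(0)
estimates = [1.0, 2.5, 0.3]   # hypothetical value estimates for three options

# With epsilon = 0 the agent never explores and always picks option 1.
always_exploit = [epsilon_greedy(estimates, 0.0, rng) for _ in range(5)]
print(always_exploit)  # [1, 1, 1, 1, 1]

# With epsilon = 0.3 it still favors option 1 but sometimes tries the others.
mixed = [epsilon_greedy(estimates, 0.3, rng) for _ in range(1000)]
print(mixed.count(1) / len(mixed))
```

A larger epsilon means more exploration and a higher chance of discovering a better option, at the cost of more steps spent on options that turn out to be worse.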
Practical Applications
Here are some of the ways AI researchers have used reinforcement learning models to solve real-world problems:
Reinforcement Learning in Self-Driving Cars
Reinforcement learning has been applied to self-driving cars to improve their ability to drive safely and efficiently. The technology lets autonomous cars learn from their mistakes and continuously adjust their behavior to improve performance.
For example, the London-based AI company Wayve has successfully applied a deep reinforcement learning model to autonomous driving. In their experiment, they used a reward function that maximizes the amount of time the vehicle runs without the driver providing input.
RL models also help cars make decisions based on their surroundings, such as avoiding obstacles or merging into traffic. These models need a way to convert the complex environment around the car into a representative state space the model can understand.
Reinforcement Learning in Robotics
Researchers have used reinforcement learning to build robots that can learn complex tasks. With these RL models, robots are able to observe their environment and make decisions based on their observations.
For example, research has been done on using reinforcement learning models to allow bipedal robots to learn how to walk on their own.
Researchers consider RL a key method in the field of robotics. Reinforcement learning gives robotic agents a framework for learning sophisticated movements that would otherwise be difficult to engineer by hand.
Reinforcement Learning in Gaming
RL models have also been used to learn how to play video games. Agents can be set up to learn from their mistakes and continuously improve their performance in a game.
Researchers have already developed agents that can play games such as chess, Go, and poker. In 2013, DeepMind used deep reinforcement learning to allow a model to learn how to play Atari games from scratch.
Many board games and video games have a limited action space and a well-defined, concrete goal. These traits work to the advantage of an RL model. RL methods can quickly iterate over millions of simulated games to learn effective strategies for winning.
Conclusion
Whether it's learning to walk or learning to play video games, RL models have proven to be useful AI frameworks for solving problems that require complex decision-making.
As the technology continues to evolve, both researchers and developers will keep finding new applications that take advantage of the model's ability to teach itself.
What practical applications do you think reinforcement learning could help with?