Minimally Naturalistic Artificial Intelligence

Steven Hansen
Department of Psychology
Stanford University
Stanford, CA 94303
[email protected]

1 Introduction

The rapid advancement of machine learning techniques has re-energized research into general artificial intelligence. While the idea of domain-agnostic meta-learning is appealing, this emerging field must come to terms with its relationship to human cognition and the statistics and structure of the tasks humans perform. The position of this article is that only by aligning our agents' abilities and environments with those of humans do we stand a chance at developing general artificial intelligence (GAI).

A broad reading of the famous "No Free Lunch" theorem [1] is that there is no universally optimal inductive bias or, equivalently, that bias-free learning is impossible. This follows from the fact that there are an infinite number of ways to extrapolate data, any of which might be the one used by the data-generating environment; an inductive bias prefers some of these extrapolations to others, which lowers performance in environments using these adversarial extrapolations. We may posit that the optimal GAI is the one that maximally exploits the statistics of its environment to create its inductive bias, accepting the fact that this agent is guaranteed to be extremely sub-optimal for some alternative environments. This trade-off appears benign when thinking about the environment as being the physical universe, as performance on any fictive universe is obviously irrelevant. But we should expect a sharper inductive bias if we further constrain our environment. Indeed, we implicitly do so by defining GAI in terms of accomplishing tasks that humans consider useful. One common version of this is the need for "common-sense reasoning", which implicitly appeals to the statistics of the physical universe as perceived by humans [2].

2 The CommAI Environment

The CommAI environment [3] is both too broad and too narrow. It is too broad in that the tasks need not bear any relation to those encountered by humans, and are untethered to the statistics that lie therein.
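The No Free Lunch trade-off above can be illustrated with a toy extrapolation problem: two learners with different inductive biases, each evaluated in two data-generating environments. The specific biases and environments below are illustrative assumptions, not anything prescribed by the theorem; the point is only that each bias wins exactly where the environment's statistics match it.

```python
import numpy as np

def extrapolate_linear(x, y, x_new):
    """Inductive bias: the observed trend continues linearly."""
    slope, intercept = np.polyfit(x, y, 1)
    return slope * x_new + intercept

def extrapolate_mean(x, y, x_new):
    """Inductive bias: the future looks like the average past."""
    return np.full_like(x_new, np.mean(y))

def mse(pred, true):
    return float(np.mean((pred - true) ** 2))

x = np.arange(10, dtype=float)
x_new = np.array([20.0, 30.0])

# Environment A: a genuinely linear process.
y_a = 2.0 * x + 1.0
true_a = 2.0 * x_new + 1.0

# Environment B: a bounded, oscillating process (period 2).
y_b = 3.0 * (-1.0) ** x
true_b = 3.0 * (-1.0) ** x_new

# Each bias wins in the environment that matches it and loses in the other.
err_lin_a = mse(extrapolate_linear(x, y_a, x_new), true_a)   # ~0: bias matches
err_mean_a = mse(extrapolate_mean(x, y_a, x_new), true_a)    # large: bias mismatched
err_lin_b = mse(extrapolate_linear(x, y_b, x_new), true_b)   # large: bias mismatched
err_mean_b = mse(extrapolate_mean(x, y_b, x_new), true_b)    # 9.0: bias matches
```

Neither learner dominates; which one is "optimal" is entirely a property of which environment generates the data.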
For example, imagine that a CommAI agent is trying to follow written driving directions to reach a location in a simulated car. Most of the time the teacher gives standard instructions akin to those of a human. But sometimes, directions are given under the assumption that the world is governed by non-Euclidean geometry (e.g. "drive until these parallel streets intersect"). An optimal agent should spend time trying to infer the geometry governing the teacher's directions. But such an agent is woefully sub-optimal in the real world, as directions given under these beliefs are far too rare to warrant such exploratory behavior. The variation in the CommAI environment doesn't match the variation in the real environment.

This criticism appears to be addressed by appealing to meta-learning or learning-to-learn: for example, by first exploring the space of possible teachers, and then forming an exploration strategy based on the empirical variation. But this merely pushes the problem to a higher level of abstraction. We can imagine an environment whereby the type of variation in the set of teachers does not align with reality, such as teachers whose geometric beliefs change by the day of the week. An optimal agent would thus have to track the statistics of the teachers' beliefs conditioned on the day of the week in order to infer the appropriate exploration strategy, which would clearly lead to spurious correlations in a naturalistic setting.

30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.

The CommAI environment is also too narrow, in that it throws out naturalistic structure in the input and output representations in the pursuit of simplicity. While the bit stream passed between the teacher and the agent can send image data, its one-dimensional nature means stripping it of its two-dimensional structure. While this could be recovered by the agent, such a problem is non-trivial and would likely slow progress on its ultimate task, which is problematic since one of the environment's primary goals is to strip away extraneous preprocessing in order to focus efforts on general learning principles.
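Returning to the driving-directions example above, the mismatch between CommAI-style variation and naturalistic variation can be made concrete with a single Bayesian update over teacher types. The two likelihood values below are illustrative assumptions; only the priors matter for the argument.

```python
def p_noneuclidean(prior, p_odd_given_ne=0.9, p_odd_given_std=0.01):
    """Posterior belief that the teacher holds non-Euclidean geometric beliefs
    after observing one geometrically odd direction.
    The likelihoods are illustrative assumptions, not measured values."""
    num = p_odd_given_ne * prior
    return num / (num + p_odd_given_std * (1.0 - prior))

# A CommAI-style world where both teacher types are equally common:
p_uniform = p_noneuclidean(prior=0.5)    # high: geometry inference pays off
# A naturalistic world where such teachers are vanishingly rare:
p_natural = p_noneuclidean(prior=1e-6)   # tiny: the odd direction reads as noise
```

Under the uniform prior a single odd instruction should trigger exploration of the teacher's geometry; under the naturalistic prior the same observation is correctly dismissed. An agent whose prior is calibrated to CommAI's variation will thus explore in ways that are pathological in the real world.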
The authors attempt to address this criticism by the suggested use of object detection systems to convert image data into a format with a similarity structure akin to linguistic data, but this ignores the fact that much of the recent progress in machine learning has been due to systems that adapt their representations to the ultimate objective function, foregoing the modularization necessitated by an object detection system [4].

None of this is to say that the CommAI environment won't be a useful tool in the pursuit of general artificial intelligence. Indeed, the high barrier to entry imposed by vision-based environments is probably the reason why more work isn't done on higher-level tasks like instruction taking. Separating these projects may accelerate progress in both areas, as the reasons behind unsuccessful modeling attempts can be more readily diagnosed. However, we must be cognizant that we've made this split for pragmatic reasons rather than philosophical ones.

3 The Case of Reward Inference

A concrete case where generic and minimally naturalistic approaches to artificial intelligence diverge is the problem of goal inference. While reward is in some sense the simplest possible feedback, there are many cases where even a reward signal is hard to come by [5]. Indeed, social learning appears to be one of these cases [6]. Instead, we must frame our agent's motivations in terms of a latent intention that the teacher has in mind, but can only communicate indirectly (e.g. by sending reward or verbal instructions). The most generic way to handle such a situation is to take the reward signal at face value and treat other indications of latent intention as additional state information. This was the approach taken in the work of Branavan et al. [7] (though they did not have to handle the issue of imprecise rewards). In trying to solve a video game (Civilization 2) via model-based reinforcement learning, they utilized the written instructions in the game manual to augment the state space and focus the search.

A more psychologically informed approach can be seen in the work of Ullman et al., where the task is to explain the behavior of other agents in the environment.
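A minimal sketch of Bayesian goal inference by inverse planning, in the spirit of this line of work: an observed agent moves on a line toward one of several candidate goals, its actions are assumed softmax-rational with respect to each goal, and Bayes' rule inverts that assumed policy into a posterior over goals. The one-dimensional world, the rationality parameter beta, and the goal set are illustrative assumptions, not Ullman et al.'s actual model.

```python
import numpy as np

ACTIONS = (-1, +1)  # step left or step right on a line

def action_probs(pos, goal, beta=2.0):
    """Softmax-rational likelihood: moves that close the distance to the
    goal are exponentially more probable (beta controls rationality)."""
    utils = np.array([-abs(pos + a - goal) for a in ACTIONS], dtype=float)
    p = np.exp(beta * (utils - utils.max()))
    return p / p.sum()

def infer_goal(trajectory, goals):
    """Bayesian inverse planning: posterior over goals given observed moves."""
    post = np.ones(len(goals)) / len(goals)  # uniform prior over goals
    for pos, action in trajectory:
        idx = ACTIONS.index(action)
        post *= np.array([action_probs(pos, g)[idx] for g in goals])
        post /= post.sum()
    return post

# An observer watches another agent step rightward from 0 to 3.
trajectory = [(0, +1), (1, +1), (2, +1)]
posterior = infer_goal(trajectory, goals=[-5, +5])
# posterior[1] (the +5 goal) now carries almost all the probability mass
```

The same machinery applies when the observed agent is a teacher and the latent "goal" is the intention behind an imprecise reward signal: the learner inverts a model of the teacher's behavior rather than taking the feedback at face value.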
One could take the generic approach and try to find a mapping from observations of the other agents to the correct explanation. Instead, the behavior of other agents was understood by inverse planning on a model of each agent's behavior. Not only did this approach predict human performance on this task; related work has shown that this approach is also a computationally viable way to handle imprecise reward signals [6].

Having an agent model its teacher is very difficult, as the agent is likely to only observe the teacher in circumscribed, unrepresentative situations (e.g. when the teacher is about to teach something). However, if the teacher is sufficiently similar to the agent, the agent can use a model of its own behavior as a stand-in for a teacher model. One of the overarching assumptions of GAI is that we want to be the ones giving instructions; we want to be the teacher. The growing body of work on human learning suggests that in order to do so tractably, we should make a deliberate effort to make the inductive biases of our agent as close to ours as possible by training in naturalistic situations. The extent to which this constraint is weighed against others, such as the computational resources required for simulating an environment, is an open question, but some minimal commitment to naturalism is necessary.

References

[1] Wolpert, D. H., & Macready, W. G. (1997). No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1), 67-82.

[2] Minsky, M. L., Singh, P., & Sloman, A. (2004). The St. Thomas common sense symposium: designing architectures for human-level intelligence. AI Magazine, 25(2), 113.

[3] Mikolov, T., Joulin, A., & Baroni, M. (2015). A roadmap towards machine intelligence. arXiv preprint arXiv:1511.08130.

[4] Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., ... & Petersen, S. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529-533.

[5] Loftin, R., Peng, B., MacGlashan, J., Littman, M. L., Taylor, M. E., Huang, J., & Roberts, D. L. (2016).
Learning behaviors via human-delivered discrete feedback: modeling implicit feedback strategies to speed up learning. Autonomous Agents and Multi-Agent Systems, 30(1), 30-59.

[6] Ullman, T., Baker, C., Macindoe, O., Evans, O., Goodman, N., & Tenenbaum, J. B. (2009). Help or hinder: Bayesian models of social goal inference. In Advances in Neural Information Processing Systems (pp. 1874-1882).

[7] Branavan, S. R. K., Silver, D., & Barzilay, R. (2011, June). Learning to win by reading manuals in a Monte-Carlo framework. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1 (pp. 268-277). Association for Computational Linguistics.