Introduction to Artificial Intelligence

Marc Toussaint

February 6, 2017

The majority of slides on search, CSP and logic are adapted from Stuart Russell. This is a direct concatenation and reformatting of all lecture slides and exercises from the Artificial Intelligence course (winter term 2016/17, U Stuttgart), including indexing to help prepare for exams.

Contents

1 Introduction . . . . . 6

2 Search . . . . . 14
Motivation & Outline
2.1 Problem Formulation & Examples . . . . . 14
    Example: Romania (2:3) Problem Definition: Deterministic, fully observable (2:5)
2.2 Basic Tree Search Algorithms . . . . . 16
    Tree search implementation: states vs nodes (2:12) Tree Search: General Algorithm (2:13) Breadth-first search (BFS) (2:16) Complexity of BFS (2:17) Uniform-cost search (2:18) Depth-first search (DFS) (2:19) Complexity of DFS (2:20) Iterative deepening search (2:22) Complexity of Iterative Deepening Search (2:24) Graph search and repeated states (2:26)
2.3 Greedy and A∗ Search . . . . . 26
    Best-first Search (2:30) Greedy Search (2:32) Complexity of Greedy Search (2:35) A∗ search (2:36) A∗: Proof 1 of Optimality (2:38) Complexity of A∗ (2:39) A∗: Proof 2 of Optimality (2:40) Admissible heuristics (2:42) Memory-bounded A∗ (2:45)

3 Probabilities . . . . . 34
Motivation & Outline
    Probabilities as (subjective) information calculus (3:3) Inference: general meaning (3:5) Frequentist vs Bayesian (3:6)
3.1 Basic definitions . . . . . 37
    Definitions based on sets (3:8) Random variables (3:9) Probability distribution (3:10) Joint distribution (3:11) Marginal (3:11) Conditional distribution (3:11) Bayes' Theorem (3:13) Multiple RVs, conditional independence (3:14)
3.2 Probability distributions . . . . . 39
    Bernoulli and Binomial distributions (3:16) Beta (3:17) Multinomial (3:20) Dirichlet (3:21) Conjugate priors (3:25)
3.3 Distributions over continuous domain . . . . . 44
    Dirac distribution (3:28) Gaussian (3:29) Particle approximation of a distribution (3:33) Utilities and Decision Theory (3:36) Entropy (3:37) Kullback-Leibler divergence (3:38)
3.4 Monte Carlo methods . . . . . 49
    Monte Carlo methods (3:40) Rejection sampling (3:41) Importance sampling (3:42) Student's t, Exponential, Laplace, Chi-squared, Gamma distributions (3:44)

4 Bandits, MCTS, & Games . . . . . 51
Motivation & Outline
4.1 Bandits . . . . . 51
    Multi-armed Bandits (4:2)
4.2 Upper Confidence Bounds (UCB) . . . . . 53
    Exploration, Exploitation (4:7) Upper Confidence Bound (UCB1) (4:8)
4.3 Monte Carlo Tree Search . . . . . 55
    Monte Carlo Tree Search (MCTS) (4:14) Upper Confidence Tree (UCT) (4:19) MCTS for POMDPs (4:20)
4.4 MCTS applied to POMDPs* . . . . . 58
4.5 Game Playing . . . . . 60
    Minimax (4:29) Alpha-Beta Pruning (4:32) Evaluation functions (4:37) UCT for games (4:38)
4.6 Beyond bandits . . . . . 66
    Global Optimization (4:43) GP-UCB (4:46) Active Learning (4:50)
4.7 Active Learning . . . . . 69

5 Formal models of interactive domains . . . . . 72
5.1 Basic Taxonomy of domain models . . . . . 72
    PDDL (5:10) Noisy Deictic Rules (5:13) Markov Decision Process (5:15) POMDP (5:18) Dec-POMDP (5:22) Control (5:23)

6 Dynamic Programming . . . . . 82
Motivation & Outline
6.1 Dynamic Programming . . . . . 82
    Value Function (6:3) Bellman optimality equation (6:7) Value Iteration (6:9) Q-Function (6:10) Q-Iteration (6:11) Proof of convergence of Q-Iteration (6:12)
6.2 Dynamic Programming in Belief Space . . . . . 88

7 Reinforcement Learning . . . . . 94
Motivation & Outline
7.1 Learning in MDPs . . . . . 97
    Temporal difference (TD) (7:10) Q-learning (7:11) Proof of convergence of Q-learning (7:14) Eligibility traces (7:16) Model-based RL (7:28)
7.2 Exploration . . . . . 107
    Epsilon-greedy exploration in Q-learning (7:31) R-Max (7:32) Bayesian RL (7:34) Optimistic heuristics (7:35)
7.3 Policy Search, Imitation, & Inverse RL* . . . . . 109
    Policy gradients (7:39) Imitation Learning (7:43) Inverse RL (7:46)

8 Constraint Satisfaction Problems . . . . . 117
Motivation & Outline
8.1 Problem Formulation & Examples . . . . . 117
    Inference (8:2) Constraint satisfaction problems (CSPs): Definition (8:4) Map-Coloring Problem (8:5)
8.2 Methods for solving CSPs . . . . . 121
    Backtracking (8:11) Variable order: Minimum remaining values (8:16) Variable order: Degree heuristic (8:17) Value order: Least constraining value (8:18) Constraint propagation (8:19) Tree-structured CSPs (8:25)

9 Graphical Models . . . . . 130
Motivation & Outline
9.1 Bayes Nets and Conditional Independence . . . . . 130
    Bayesian Network (9:4) Conditional independence in a Bayes Net (9:8) Inference: general meaning (9:13)
9.2 Inference Methods in Graphical Models . . . . . 136
    Inference in graphical models: overview (9:18) Monte Carlo (9:20) Importance sampling (9:23) Gibbs sampling (9:25) Variable elimination (9:28) Factor graph (9:31) Belief propagation (9:37) Message passing (9:37) Loopy belief propagation (9:41) Junction tree algorithm (9:43) Maximum a-posteriori (MAP) inference (9:47) Conditional random field (9:48)

10 Dynamic Models . . . . . 150
Motivation & Outline
    Markov Process (10:1) Hidden Markov Model (10:2) Filtering, Smoothing, Prediction (10:3) HMM: Inference (10:4) HMM inference (10:5) Kalman filter (10:8)

11 Propositional Logic . . . . . 157
Motivation & Outline
11.1 Syntax & Semantics . . . . . 157
    Knowledge base: Definition (11:3) Wumpus World example (11:4) Logic: Definition, Syntax, Semantics (11:7) Propositional logic: Syntax (11:9) Propositional logic: Semantics (11:10) Logical equivalence (11:12)
11.2 Inference Methods . . . . . 167
    Inference (11:19) Horn Form (11:23) Modus Ponens (11:23) Forward chaining (11:24) Completeness of Forward Chaining (11:27) Backward Chaining (11:28) Conjunctive Normal Form (11:31) Resolution (11:31) Conversion to CNF (11:32)

12 First-Order Logic . . . . . 183
Motivation & Outline
12.1 The FOL language . . . . . 184
    FOL: Syntax (12:4) Universal quantification (12:6) Existential quantification (12:6)
12.2 FOL Inference . . . . . 187
    Reduction to propositional inference (12:16) Unification (12:19) Generalized Modus Ponens (12:20) Forward Chaining (12:21) Backward Chaining (12:27) Conversion to CNF (12:33) Resolution (12:35)

13 Relational Probabilistic Modelling and Learning . . . . . 200
Motivation & Outline
13.1 STRIPS-like rules to model MDP transitions . . . . . 200
    Markov Decision Process (MDP) (13:2) STRIPS rules (13:3) Planning Domain Definition Language (PDDL) (13:3) Learning probabilistic rules (13:9) Planning with probabilistic rules (13:11)
13.2 Relational Graphical Models . . . . . 204
    Probabilistic Relational Models (PRMs) (13:20) Markov Logic Networks (MLNs) (13:24) The role of uncertainty in AI (13:31)

14 Exercises . . . . . 212
14.1 Exercise 1 . . . . . 212
14.2 Exercise 2 . . . . . 215
14.3 Exercise 3 . . . . . 217
14.4 Exercise 4 . . . . . 219
14.5 Exercise 5 . . . . . 221
14.6 Exercise 6 . . . . . 223
14.7 Exercise 6 . . . . . 225
14.8 Exercise 9 . . . . . 227
14.9 Exercise 7 . . . . . 228

Index . . . . . 230

[Figure: course-overview mind-map relating propositional vs. relational and deterministic vs. sequential/probabilistic domains to the course topics: search (BFS, backtracking, alpha/beta pruning), CSP, propositional logic, FOL, constraint propagation, fwd/bwd chaining, MCTS, UCB, bandits, utilities, Decision Theory, MDPs, Dec-POMDPs, multi-agent models, dynamic programming V(s), Q(s,a), relational MDPs, graphical models, HMMs, message passing, ML, Reinforcement Learning, Active Learning]

1 Introduction

(some slides based on Stuart Russell's AI course)

• The current hype about AI
  – The singularity
  – Ethics
  – The value problem
  – The outrageous inability of humans to define what is "good"
  – Paperclips
• What's the route to AI?
  – Neuroscience? (EU Big Brain project)
  – Deep Learning? (Pure Machine Learning?, DeepMind (London))
  – Social/Emotional/Consciousness/Feelings stuff?
  – Hardcore classical AI? Modern probabilistic/learning AI?
  – Robotics?
• What are things AI will never be able to do?
• Why is there no university department for Intelligence Research?!

1:1

Ups and downs of AI – history

1943    McCulloch & Pitts: Boolean circuit model of brain
1950    Turing's "Computing Machinery and Intelligence"
1952–69 Look, Ma, no hands!
1950s   Early AI programs, including Samuel's checkers program, Newell & Simon's Logic Theorist, Gelernter's Geometry Engine
1956    Dartmouth meeting: "Artificial Intelligence" adopted
1965    Robinson's complete algorithm for logical reasoning
1966–74 AI discovers computational complexity; neural network research almost disappears
1969–79 Early development of knowledge-based systems
1980–88 Expert systems industry booms
1988–93 Expert systems industry busts: "AI Winter"
1985–95 Neural networks return to popularity
1988–   Resurgence of probability; general increase in technical depth; "Nouvelle AI": ALife, GAs, soft computing
1995–   Agents, agents, everywhere ...
2003–   Human-level AI back on the agenda

1:2

What is intelligence?

• Maybe it is easier to first ask what systems we actually talk about:
  – Decision making
  – Interacting with an environment
• Then define objectives!
  – Quantify what you consider good or successful
  – Intelligence means to optimize ...

1:3

Intelligence as Optimization?

• A cognitive scientist or psychologist: "Why are you AI people always so obsessed with optimization? Humans are not optimal!"
• That's a total misunderstanding of what "being optimal" means.
• Optimization principles are a means to describe systems:
  – Feynman's "unworldliness measure" objective function
  – Everything can be cast optimal – under some objective
  – Optimality principles are just a scientific means of formally describing systems and their behaviors (esp. in physics, economy, ... and AI)
  – Toussaint, Ritter & Brock: The Optimization Route to Robotics – and Alternatives.
Künstliche Intelligenz, 2015
• Generally, I would roughly distinguish three basic types of problems:
  – Optimization
  – Logical/categorial Inference (CSP, find feasible solutions)
  – Probabilistic Inference

1:4

What are interesting generic objectives?

• Learn to control all degrees of freedom of the environment that are controllable
  – DOFs are mechanical/kinematic DOFs, objects, light/temperature, mood of humans
  – This objective is generic: no preferences, no limits
  – Implies to actively go exploring and finding controllable DOFs
  – Acting to Learn (instead of 'Learning to Act' for a fixed task)
  – Related notions in other fields: (Bayesian) Experimental Design, Active Learning, curiosity, intrinsic motivation
• At time T, the system will be given a random task (e.g., a random goal configuration of DOFs); the objective then is to reach it as quickly as possible

1:5

Interactive domains

• We assume the agent is in interaction with a domain.
  – The world is in a state s_t ∈ S (see below on what that means)
  – The agent senses observations y_t ∈ O
  – The agent decides on an action a_t ∈ A
  – The world transitions to a new state s_{t+1}
• The observation y_t describes all information received by the agent (sensors, also rewards, feedback, etc.) if not explicitly stated otherwise

(The technical term for this is a POMDP)

1:6

State

• The notion of state is often used imprecisely
• At any time t, we assume the world is in a state s_t ∈ S
• s_t is a state description of a domain iff future observations y_{t⁺}, t⁺ > t, are conditionally independent of all history observations y_{t⁻}, t⁻ < t, given s_t and the future actions a_{t:t⁺}:

[Figure: graphical model of the agent–world interaction: world states s0, s1, s2, s3; observations y0–y3; agent actions a0–a3]

• Notes:
  – Intuitively, s_t describes everything about the world that is "relevant"
  – Worlds do not have additional latent (hidden) variables to the state s_t

1:7

Examples

• What is a sufficient definition of state of a computer that you interact with?
• What is a sufficient definition of state for a thermostat scenario?
  (First, assume the 'room' is an isolated chamber.)
• What is a sufficient definition of state in an autonomous car case?
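As a toy illustration of the interaction loop above (the agent senses y_t, decides on a_t, the world transitions to s_{t+1}), here is a minimal sketch of the isolated-chamber thermostat case. All class names, dynamics, and parameters are illustrative assumptions, not part of the lecture notes:

```python
import random

# Hypothetical thermostat domain illustrating the formal quantities
# s_t (world state), y_t (observation), a_t (action). The dynamics and
# sensor-noise model are invented for illustration.

class ThermostatWorld:
    """Isolated chamber: the temperature alone is a sufficient state s_t."""
    def __init__(self, temp=15.0):
        self.temp = temp  # the world state s_t

    def step(self, action):
        # a_t in A = {"heat", "off"}; the world transitions to s_{t+1}
        self.temp += 1.0 if action == "heat" else -0.5
        # y_t: a noisy sensor reading of the temperature
        return self.temp + random.gauss(0.0, 0.1)

class ReactiveAgent:
    """Stimulus-response agent: maps the observation y_t directly to a_t."""
    def __init__(self, target=20.0):
        self.target = target

    def act(self, y):
        return "heat" if y < self.target else "off"

world, agent = ThermostatWorld(), ReactiveAgent()
y = world.step("off")      # initial observation
for t in range(100):
    a = agent.act(y)       # agent decides a_t from y_t
    y = world.step(a)      # world transitions; agent senses the next observation
print(f"temperature after 100 steps: {y:.1f}")  # oscillates near the target
```

Note that if the chamber were not isolated (say, heat leaks depending on the outside weather), the temperature alone would no longer be a sufficient state: future observations would depend on history beyond s_t.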
→ in real worlds, the exact state is practically not representable
→ all models of domains will have to make approximating assumptions (e.g., about independencies)

1:8

How can agents be formally described?

... or, what formal classes of agents do exist?

• Basic alternative agent models:
  – The agent maps y_t ↦ a_t (a stimulus-response mapping .. non-optimal)
  – The agent stores all previous observations and maps
      f : y_{0:t}, a_{0:t-1} ↦ a_t
    f is called the agent function. This is the most general model, including the others as special cases.
  – The agent stores only the recent history and maps y_{t-k:t}, a_{t-k:t-1} ↦ a_t (crude, but may be a good heuristic)
  – The agent is some machine with its own internal state n_t, e.g., a computer, a finite state machine, a brain ... The agent maps (n_{t-1}, y_t) ↦ n_t (internal state update) and n_t ↦ a_t
  – The agent maintains a full probability distribution (belief) b_t(s_t) over the state, maps (b_{t-1}, y_t) ↦ b_t (Bayesian belief update), and b_t ↦ a_t

1:9

POMDP coupled to a state machine agent

[Figure: graphical model of a POMDP coupled to a state machine agent: world states s0–s2, observations y0–y2, actions a0–a2, rewards r0–r2, and the agent's internal states n0–n2]

1:10

Multi-agent domain models

(The technical term for this is a Decentralized POMDP)

[Figure: multi-agent interaction diagram, from Kumar et al., IJCAI 2011]

• This is a special type (simplification) of a general Dec-POMDP
• Generally, this level of description is very general, but NEXP-hard.
  Approximate methods can yield very good results, though.

1:11

• We gave a very general model (formalization) of what it means that an agent takes decisions in an interactive environment
• There are many flavors of this:
  – Fully observable vs. partially observable
  – Single agent vs. multi-agent
  – Deterministic vs. stochastic
  – Structure of the state space: discrete, continuous, hybrid; factored; relational
  – Discrete vs. continuous time

1:12

Organisation
Description: