Attend, Adapt and Transfer: Attentive Deep Architecture for Adaptive Transfer from multiple sources in the same domain
Janarthanan Rajendran, Aravind S. Lakshminarayanan, Mitesh M. Khapra, P Prasanna, Balaraman Ravindran
University of Michigan, Indian Institute of Technology Madras, McGill University
ICLR 2017
Presenter: Jack Lanchantin

Outline
1 Knowledge Transfer and A2T
2 Knowledge Transfer with A2T
    Reinforcement Learning
    Policy Transfer
    Value Transfer
3 Experiments and Results
    Selective Transfer
    Avoiding Transfer
    Choosing When to Transfer
4 Conclusions

Knowledge Transfer
There are N source tasks, with K_1, K_2, ..., K_N being the solutions of the source tasks (e.g. tennis coaches).
K_B is the base solution for the target task, which starts learning from scratch (the tennis student's initial knowledge).
K_T is the solution we want to learn for the target task T (the tennis student's final skills).

This paper: use a combination of the solutions to obtain K_T,

    K_T(s) = \sum_{i=1}^{N} w_{i,s} K_i(s) + w_{N+1,s} K_B(s)    (1)

where w_{i,s} is the weight of solution i at state s (learned by a separate network).
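To make Eq. (1) concrete, below is a minimal NumPy sketch of the state-dependent weighted combination, assuming the weights come from a softmax over logits produced by a single linear layer. The names (a2t_combine, attention_params, etc.) are illustrative assumptions, not the paper's actual architecture, which uses a learned deep attention network to produce the weights.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def a2t_combine(state_features, source_outputs, base_output, attention_params):
    """Sketch of Eq. (1): K_T(s) = sum_i w_{i,s} K_i(s) + w_{N+1,s} K_B(s).

    source_outputs   : list of N arrays, K_i(s) from the (frozen) source solutions
    base_output      : array, K_B(s) from the trainable base solution
    attention_params : (N+1, d) matrix of a hypothetical linear attention layer
                       mapping d-dimensional state features to N+1 logits
    """
    logits = attention_params @ state_features           # one logit per solution
    w = softmax(logits)                                   # w_{1,s}, ..., w_{N+1,s}, sum to 1
    outputs = source_outputs + [base_output]              # K_1(s), ..., K_N(s), K_B(s)
    return sum(w_i * K_i for w_i, K_i in zip(w, outputs))

# Example: two source Q-value vectors and a base Q-value vector over 3 actions
K1, K2 = np.array([1.0, 0.2, 0.5]), np.array([0.1, 0.9, 0.3])
KB = np.zeros(3)
phi_s = np.array([0.4, -1.2])             # illustrative state features
W = np.random.randn(3, 2)                 # N+1 = 3 logits from 2 features
print(a2t_combine(phi_s, [K1, K2], KB, W))
```

Because the weights are a function of the current state, the combination can attend to different source solutions in different parts of the state space, which is what makes the transfer selective.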
Attention Network for Selective Transfer (A2T)
The target solution K_T(s) is the state-dependent weighted combination of the N source solutions and the base solution given in Eq. (1), with the weights w_{1,s}, ..., w_{N+1,s} produced by the attention network.

Background: Reinforcement Learning
S: finite set of states
A: finite set of actions
P: state transition probability matrix, P^a_{ss'} = P[S_{t+1} = s' | S_t = s, A_t = a]
r(s, a): reward function
R_t: return, the discounted sum of rewards over the agent's trajectory, R_t = r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + ... = \sum_{k=0}^{\infty} \gamma^k r_{t+k}, where \gamma is the discount factor
\pi: policy, a distribution over actions given states, \pi(a, s) = P[A_t = a | S_t = s]
V(s): state value function, the expected return of a policy \pi for every state, V_\pi(s) = E_\pi[R_t | S_t = s]
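As a small illustration of the return and value-function definitions above, here is a short Python sketch that computes the discounted return of one finite trajectory and a simple Monte Carlo estimate of V_\pi(s). The function names and the gamma = 0.9 value are assumptions for illustration, not taken from the paper.

```python
def discounted_return(rewards, gamma=0.9):
    """R_t = r_t + gamma*r_{t+1} + gamma^2*r_{t+2} + ... = sum_k gamma^k r_{t+k},
    computed backwards over one finite trajectory of rewards."""
    R = 0.0
    for r in reversed(rewards):
        R = r + gamma * R
    return R

def mc_value_estimate(returns_from_state):
    """Monte Carlo stand-in for V_pi(s) = E_pi[R_t | S_t = s]:
    average the returns observed after visiting state s under policy pi."""
    return sum(returns_from_state) / len(returns_from_state)

# Example: a reward of 1 arrives two steps after time t (illustrative numbers)
print(discounted_return([0.0, 0.0, 1.0]))        # 0.81
print(mc_value_estimate([0.81, 0.9, 1.0]))       # about 0.90
```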