
Attend, Adapt and Transfer: Attentive Deep Architecture for Adaptive Transfer from multiple sources in the same domain

37 pages · 2017 · 2.71 MB · English
by Janarthanan Rajendran, Aravind S. Lakshminarayanan, Mitesh M. Khapra, P Prasanna, Balaraman Ravindran

Attend, Adapt and Transfer: Attentive Deep Architecture for Adaptive Transfer from multiple sources in the same domain
Janarthanan Rajendran, Aravind S. Lakshminarayanan, Mitesh M. Khapra, P Prasanna, Balaraman Ravindran
University of Michigan, Indian Institute of Technology Madras, McGill University
ICLR 2017. Presenter: Jack Lanchantin

Outline
1. Knowledge Transfer and A2T
2. Knowledge Transfer with A2T
   - Reinforcement Learning
   - Policy Transfer
   - Value Transfer
3. Experiments and Results
   - Selective Transfer
   - Avoiding Transfer
   - Choosing When to Transfer
4. Conclusions

Knowledge Transfer

- N source tasks, with K_1, K_2, ..., K_N being the solutions of the source tasks (e.g. tennis coaches).
- K_B is the base solution for the target task, which starts learning from scratch (the tennis student's initial knowledge).
- K_T is the solution we want to learn for the target task T (the tennis student's final skills).

This paper: use a combination of the solutions to obtain K_T:

K_T(s) = w_{N+1,s} K_B(s) + \sum_{i=1}^{N} w_{i,s} K_i(s)    (1)

where w_{i,s} is the weight of solution i at state s, learned by a separate attention network.
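To make Eq. (1) concrete, here is a minimal NumPy sketch of the state-dependent combination. Everything in it is a hypothetical stand-in rather than the authors' implementation: the source solutions are random linear maps and the "attention network" is a single random matrix followed by a softmax, whereas in A2T these are trained deep networks.

```python
import numpy as np

# Minimal sketch of Eq. (1); the solution networks, attention network,
# and state features below are hypothetical stand-ins, not the paper's code.

rng = np.random.default_rng(0)

N = 3            # number of source-task solutions K_1..K_N
state_dim = 4    # size of the (assumed) state feature vector
n_actions = 5    # solutions are assumed to output per-action scores

# Stand-ins for the frozen source solutions K_1..K_N and the base solution K_B.
K_sources = [rng.standard_normal((state_dim, n_actions)) for _ in range(N)]
K_base = rng.standard_normal((state_dim, n_actions))

# Stand-in for the separate attention network: maps a state s to
# N + 1 logits (one per source solution, plus one for the base solution).
W_attn = rng.standard_normal((state_dim, N + 1))

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def K_T(s):
    """Combined solution from Eq. (1): a state-dependent convex combination."""
    w = softmax(s @ W_attn)                 # w_{1,s}, ..., w_{N+1,s}
    out = w[N] * (s @ K_base)               # w_{N+1,s} * K_B(s)
    for i in range(N):
        out += w[i] * (s @ K_sources[i])    # + sum_i w_{i,s} * K_i(s)
    return out

s = rng.standard_normal(state_dim)
print(K_T(s))   # per-action combined scores for state s
```

The softmax keeps the weights positive and summing to one, so K_T(s) remains a convex combination of the base and source solutions at every state.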
Attention Network for Selective Transfer (A2T)

A2T realizes Eq. (1) with an attention network: the weights w_{i,s} are produced per state, so the agent can attend to different source solutions, or fall back to the base solution, in different parts of the state space.

Background: Reinforcement Learning

- S: finite set of states
- A: finite set of actions
- P: state transition probability matrix, P^a_{ss'} = \Pr[S_{t+1} = s' \mid S_t = s, A_t = a]
- r(s, a): reward function
- R_t: return, the discounted sum of rewards over the agent's trajectory: R_t = r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \cdots = \sum_{k=0}^{\infty} \gamma^k r_{t+k}
- \pi: policy function, a distribution over actions given states: \pi(a, s) = \Pr[A_t = a \mid S_t = s]
- V(s): state value function, the expected return of policy \pi from every state: V_\pi(s) = \mathbb{E}_\pi[R_t \mid S_t = s]
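As a quick illustration of the return definition, the following sketch computes R_t backwards via the recursion R_t = r_t + \gamma R_{t+1}. The reward sequence and discount factor are made-up values for the example, not anything from the paper.

```python
# Illustrative sketch of the return R_t = sum_k gamma^k r_{t+k};
# rewards and gamma below are made-up example values.

def discounted_return(rewards, gamma=0.99):
    """Compute R_t for t = 0 given rewards r_0, r_1, ..., r_T."""
    R = 0.0
    for r in reversed(rewards):    # work backwards: R_t = r_t + gamma * R_{t+1}
        R = r + gamma * R
    return R

rewards = [0.0, 0.0, 1.0, 0.0, 5.0]
print(discounted_return(rewards))  # 0.99^2 * 1 + 0.99^4 * 5 ≈ 5.78
```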
