Attend, Adapt and Transfer: Attentive Deep Architecture for Adaptive Transfer from multiple sources in the same domain
Janarthanan Rajendran, Aravind S. Lakshminarayanan, Mitesh M. Khapra, P Prasanna, Balaraman Ravindran
University of Michigan, Indian Institute of Technology Madras, McGill University
ICLR 2017
Presenter: Jack Lanchantin
Outline
1 Knowledge Transfer and A2T
2 Knowledge Transfer with A2T
Reinforcement Learning
Policy Transfer
Value Transfer
3 Experiments and Results
Selective Transfer
Avoiding Transfer
Choosing When to Transfer
4 Conclusions
Knowledge Transfer
N source tasks, with K_1, K_2, ..., K_N being the solutions of the source tasks (e.g. tennis coaches)
K_B is the base solution for the target task, which starts learning from scratch (the tennis student's initial knowledge)
K_T is the solution we want to learn for the target task T (the tennis student's final skills)
This paper: use a combination of these solutions to obtain K_T:

K_T(s) = w_{N+1,s} K_B(s) + Σ_{i=1}^N w_{i,s} K_i(s)    (1)

w_{i,s} is the weight of solution i at state s (learned by a separate attention network)
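To make Eq. (1) concrete, here is a minimal NumPy sketch of the state-wise combination. It is illustrative only; the function and variable names (combine_solutions, source_outputs, base_output) are assumptions, not the authors' code.

```python
# Eq. (1): the target solution K_T(s) is a convex combination of the N frozen
# source solutions K_1..K_N and the learnable base solution K_B, weighted by
# state-dependent attention weights w_{i,s}.
import numpy as np

def combine_solutions(source_outputs, base_output, weights):
    """source_outputs: list of N arrays, each K_i(s) (e.g. action scores).
    base_output: array K_B(s) of the same shape.
    weights: array of length N+1 summing to 1; weights[:N] weight the sources,
    weights[N] weights the base solution."""
    stacked = np.stack(source_outputs + [base_output])   # (N+1, num_actions)
    return (weights[:, None] * stacked).sum(axis=0)      # K_T(s)

# Toy usage: 2 source solutions, 3 actions, attention mostly on source 1.
k1, k2 = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])
k_base = np.array([0.2, 0.3, 0.5])
w = np.array([0.7, 0.2, 0.1])          # w_{1,s}, w_{2,s}, w_{N+1,s}
print(combine_solutions([k1, k2], k_base, w))
```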
Attention Network for Selective Transfer (A2T)
K_T(s) = w_{N+1,s} K_B(s) + Σ_{i=1}^N w_{i,s} K_i(s)    (1)
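The weights w_{i,s} in Eq. (1) come from a separate attention network conditioned on the current state. Below is a minimal PyTorch sketch assuming a small two-layer MLP with a softmax output; the layer sizes and depth are illustrative assumptions, not the paper's exact architecture.

```python
# Attention network: maps the state to N+1 soft weights (one per source
# solution plus one for the base network), normalized with a softmax so the
# weights are in [0, 1] and sum to 1.
import torch
import torch.nn as nn

class AttentionNetwork(nn.Module):
    def __init__(self, state_dim, num_sources, hidden_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(state_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_sources + 1),  # +1 for the base solution
        )

    def forward(self, state):
        # Returns w_{1,s}, ..., w_{N,s}, w_{N+1,s} for each state in the batch.
        return torch.softmax(self.mlp(state), dim=-1)

# Usage: weights for a batch of 4 states with 8 features and 2 source tasks.
attn = AttentionNetwork(state_dim=8, num_sources=2)
w = attn(torch.randn(4, 8))   # shape (4, 3)
```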
Background: Reinforcement Learning

S: finite set of states
A: finite set of actions
P: state transition probability matrix, P^a_{ss'} = P[S_{t+1} = s' | S_t = s, A_t = a]
r(s,a): reward function
R_t: return, the discounted sum of rewards over the agent's trajectory:
R_t = r_t + γ r_{t+1} + γ^2 r_{t+2} + ... = Σ_{k=0}^∞ γ^k r_{t+k}, where γ ∈ [0, 1) is the discount factor
π: policy, a distribution over actions given states: π(a, s) = P[A_t = a | S_t = s]
V(s): state value function, the expected return of policy π for every state: V_π(s) = E_π[R_t | S_t = s]
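As a quick check on the return and value definitions above, here is a short Python sketch (illustrative only, not tied to the paper's experiments) of the discounted return R_t and a Monte Carlo estimate of V_π(s).

```python
# Discounted return of a finite reward sequence, and a Monte Carlo estimate of
# V_pi(s) as the average return over episodes that start in state s.
def discounted_return(rewards, gamma=0.99):
    """R_t = sum_k gamma^k * r_{t+k}, accumulated backwards over the trajectory."""
    ret = 0.0
    for r in reversed(rewards):
        ret = r + gamma * ret
    return ret

def monte_carlo_value(episode_rewards, gamma=0.99):
    """V_pi(s) ~ average of R_t over episodes starting in s."""
    returns = [discounted_return(rs, gamma) for rs in episode_rewards]
    return sum(returns) / len(returns)

print(discounted_return([1.0, 0.0, 1.0], gamma=0.9))  # 1 + 0.9*0 + 0.81*1 = 1.81
```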