Teaching a Powered Prosthetic Arm with an Intact Arm Using Reinforcement Learning

by Gautham Vasan

A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science

Department of Computing Science
University of Alberta

© Gautham Vasan, 2017

Abstract

The idea of an amputee playing the piano with all the flair and grace of an able-handed person may seem like a futuristic fantasy. While many prosthetic limbs look lifelike, finding one that also moves naturally has proved more of a challenge for both researchers and amputees. Even though sophisticated upper extremity prostheses like the Modular Prosthetic Limb (MPL) are capable of effecting almost all of the movements of a human arm and hand, they can be useful only if robust systems of control are available. The fundamental issue is that there is a significant mismatch between the number of controllable functions available in modern prosthetic arms and the number of control signals that can be provided by an amputee at any given moment. In order to bridge the gap in control, we require a neural interface that can translate the physiological signals of the user into a large number of joint commands for simultaneous, coordinated control of the artificial limb.

In this thesis, we focus on a collaborative approach towards the control of powered prostheses. In our approach, the user shifts greater autonomy to the prosthetic device, thereby sharing the burden of control between the human and the machine. In essence, the prosthesis or rehabilitative device learns to "fill in the gaps" for the user. With this view in mind, we developed a method that could allow someone with an amputation to use their non-amputated arm to teach their prosthetic arm how to move in a natural and coordinated way by simply showing the prosthetic arm the right way to move in response to inputs from the user. Such a paradigm could well exploit the muscle synergies already learned by the user.
Consider cases where an amputee has a desired movement goal, e.g., "add sugar to my coffee," "button up my shirt," or "shake hands with an acquaintance". In these more complicated examples, it may be difficult for a user to frame their objectives in terms of device control parameters or existing device gestures, but they may be able to execute these motions skillfully with their remaining biological limb.

As a first contribution of this thesis, we present results from our work on learning from demonstration using Actor-Critic Reinforcement Learning (ACRL), and show that able-bodied subjects (n = 3) are able to train a prosthetic arm to perform synergistic movements in three degrees of freedom (DOF) (wrist flexion, wrist rotation and hand open/close). The learning system uses only the joint position and velocity information from the prosthesis and above-elbow myoelectric signals from the user. We also assessed the performance of the system with an amputee participant and demonstrate that the learning-from-demonstration paradigm can be used to teach a prosthetic arm natural, coordinated movements with the intact arm.

For our second contribution, we describe a sensor fusion and artificial vision based control approach that could potentially give rise to context-aware control of a multi-DOF prosthesis. Our results indicate that the learning system can make use of the additional sensory and motor information to determine context and differentiate between different movement synergies.

Our results suggest that this learning-from-demonstration paradigm may be well suited to use by both patients and clinicians with minimal technical knowledge, as it allows a user to personalize the control of his or her prosthesis without having to know the underlying mechanics of the prosthetic limb.
This approach may extend in a straightforward way to next-generation prostheses with precise finger and wrist control, such that these devices may someday allow users to perform fluid and intuitive movements like playing the piano, catching a ball, and comfortably shaking hands.

Preface

A version of Chapter 3 has been accepted for publication and presentation as Gautham Vasan, Patrick M. Pilarski, "Learning from Demonstration: Teaching a Myoelectric Prosthesis with an Intact Limb via Reinforcement Learning," in Proc. of the 2017 IEEE International Conference on Rehabilitation Robotics (ICORR), London, United Kingdom, 2017. It was presented as a poster and also as a 60-second fast-forward session podium talk. Our work was nominated for the best poster award (top 20 among 600+ posters) as a part of RehabWeek 2017.

An extended abstract of this paper was also presented as a poster and podium talk (20 mins) at the Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM), Ann Arbor, Michigan, June 11-14, 2017. I was responsible for experimental design, implementation, execution, and analysis, and for the bulk of manuscript composition. Pilarski was the supervisory author and contributed to the manuscript and provided implementation feedback and insights throughout.

This research received ethics approval for two protocols from the human research ethics board at the University of Alberta.

Dedication

To my parents, Vasan and Vijayalakshmi. Thank you both for the love and support over the years and letting me live on my own terms.

To my brother, Srinath, thanks for putting up with all my random notions and long rants about AI, politics, etc., and countering my arguments with a great sense of humor.

We have a brain for one reason and one reason only: to produce adaptable and complex movements.
– Daniel Wolpert

Acknowledgements

I wish to express my gratitude to Patrick M. Pilarski. His enthusiasm and optimistic outlook towards research is highly contagious :) To Patrick: thanks a lot for your mentorship and constant encouragement.
You've helped me grow as a scientist and provided numerous opportunities to be a part of amazing research. I couldn't have asked for a better supervisor.

I'd also like to thank Rich Sutton. His course was the major decisive factor in my choice to work on Reinforcement Learning. He's provided me with critical feedback which I believe has made me a better researcher. Many thanks to Martha White and Ming Chan for their thoughtful questions and suggestions as a part of my thesis committee.

I'm incredibly grateful to Jaden Travnik, Vivek Veeriah, Craig Sherstan, Michael Rory Dawson and Kory Mathewson for all their help throughout. We've had several interesting discussions about AI, neuroscience and robotics. I am also hugely grateful to other colleagues in the BLINC lab (Alex, Nadia, Dylan, Katherine, Tarvo, McNiel, Jon, Sai, Riley, Quinn, Oggy and James) and the RLAI lab for their genuine affability and camaraderie.

Many thanks to Trevor Lashyn, Matthew Curran and Ming Chan for getting me involved in the Cerebral Palsy and Spasticity trials. It was a great learning experience.

Table of Contents

1 Introduction
  1.1 What Should the Ideal Prosthetic Control System Do?
  1.2 Our Approach to Addressing These Challenges
  1.3 Outline
  1.4 Key Contributions
    1.4.1 Extending Learning from Demonstration to Powered Prosthesis Domains
    1.4.2 Context-Aware Learning from Demonstration
    1.4.3 Development of Delsys Trigno, CyberTouch II, Thalmic Myo and Bento Arm Control and Software Interface

2 Background
  2.1 Myoelectric Control
    2.1.1 Degrees of Control
    2.1.2 Conventional Myoelectric Control
    2.1.3 Pattern Recognition
    2.1.4 Motor Skill Learning
  2.2 Reinforcement Learning (RL)
    2.2.1 The Agent-Environment Interface
    2.2.2 RL Notation
    2.2.3 Value Function Approaches (Critic-Only Methods)
    2.2.4 Policy Gradient Framework
    2.2.5 Value Function Methods vs Policy Search
    2.2.6 Actor-Critic Methods
    2.2.7 Actor Critic Reinforcement Learning (ACRL)
    2.2.8 Tile coding
  2.3 Prosthetic Control as a Robot Control Problem

3 Learning from Demonstration: Teaching a Prosthetic Arm Using an Intact Arm via Reinforcement Learning
  3.1 Overview
  3.2 Introduction
  3.3 Methods
  3.4 Learning a Control Policy in Real-Time Using Actor-Critic Reinforcement Learning
  3.5 Results
  3.6 Discussion
    3.6.1 Comparison of Performance Between Subjects
    3.6.2 Transferability of Results
    3.6.3 Why Reinforcement Learning?
    3.6.4 Extending to Applications Beyond Upper-Limb Prostheses
  3.7 Extending to Bimanual Tasks such as Playing the Guitar
  3.8 Conclusion

4 Case Study: Learning from Demonstration with an Amputee Participant
  4.1 Introduction
  4.2 Methods
    4.2.1 Learning a Control Policy Using ACRL
  4.3 Results
  4.4 Conclusion

5 Context-aware Control of a Multi-DOF Prosthesis
  5.1 Introduction
  5.2 Related Work
  5.3 Methods
    5.3.1 Hardware
    5.3.2 Prediction in Adaptive Control
    5.3.3 Making Predictions Using TD(λ)
  5.4 Results
  5.5 Discussion
    5.5.1 End-to-end RL for Context-Aware Control in a Real World Setting
    5.5.2 Sensor Fusion for Context-Aware Control of a Multi-DOF Prosthesis
    5.5.3 Representation Learning
    5.5.4 Study Limitations and Future Developments
  5.6 Conclusion

6 Discussion & Future Work
  6.1 Information Extraction for Visuomotor Learning
    6.1.1 Feature Integration Theory of Visual Attention
  6.2 Alternative Approaches to Learn a Robust Control Policy
    6.2.1 Motor Primitives
    6.2.2 Natural Actor-Critic (NAC)
    6.2.3 Apprenticeship Learning via Inverse Reinforcement Learning
    6.2.4 Guided Policy Search
  6.3 Intelligent Behavior Principles Drawn from the Neurobiology of an Octopus
7 Conclusion

Bibliography

Appendices
  A ROS Infrastructure
  B Thalmic Myo Armband
  C Delsys Trigno Wireless Lab
  D CyberTouch II
  E Integrating Bento Arm with Trigno & CyberGlove

List of Figures

  2.1 Modular Prosthetic Limb (MPL)
  2.2 Signal Processing Stages for Pattern Recognition
  2.3 The Agent-Environment Interaction in RL
  2.4 Example of Value Estimation Using Temporal Difference (TD) Learning in Grid World
  2.5 Actor-Critic Architecture
  2.6 Tile coding
  3.1 Collaborative Control of a Prosthetic Arm
  3.2 Experimental Setup Which Includes the Bento Arm, Delsys Trigno Wireless Lab and CyberTouch II
  3.3 Schematic Showing the Flow of Information Through the Experimental Setup During the Training Period
  3.4 Anatomical Terms of Hand Motion
  3.5 Schematic Showing the Flow of Information During Deployment
  3.6 Comparison of mean absolute angular error accumulated over the course of ~45 min of learning
  3.7 Comparison of target (grey line) and achieved (colored lines) actuator trajectories over training and testing periods
  3.8 Testing the Learned Control Policy Over Different Sessions (on Different Days)
  4.1 Learning from Demonstration - Study with an Amputee Participant
  4.2 We obtain control signals and state information from the control arm. The training arm provides reward signals for the RL agent.
  4.3 Comparison of differential EMG signal (red line) and target actuator trajectories (blue line) over training.
      There exists a clear correlation between wrist rotation joint angle and the processed EMG signals (left), whereas there is no distinct correlation between wrist flexion joint angle and the processed EMG signals (right).
  4.4 Amputee trial - Comparison of target (grey line) and achieved (colored lines) actuator trajectories over training and testing periods
  4.5 Amputee trial - Comparison of mean absolute angular error accumulated over the course of ~45 min of learning
  5.1 Experimental Setup Which Includes Artificial Vision, Bento Arm, Thalmic Myo Armband and CyberTouch II
  5.2 Example Input Images of the Different Objects Provided to the Learning System
  5.3 Wrist Rotation Angle Predictions in the Contextual Learning From Demonstration Trials
  5.4 Wrist Flexion Angle Predictions in the Contextual Learning From Demonstration Trials