Proc.R.Soc.B (2011) 278, 2325–2332 doi:10.1098/rspb.2010.2518 Publishedonline 5January 2011 Risk-sensitivity and the mean-variance trade-off: decision making in sensorimotor control Arne J. Nagengast1,2,*,†, Daniel A. Braun1,† and Daniel M. Wolpert1 1Computational and Biological Learning Lab, Department of Engineering, University of Cambridge, Cambridge CB2 1PZ, UK 2Department of Experimental Psychology, University of Cambridge, Cambridge CB2 3EB, UK Numerouspsychophysicalstudiessuggestthatthesensorimotorsystemchoosesactionsthatoptimizethe average cost associated with a movement. Recently, however, violations of this hypothesis have been reported in line with economic theories of decision-making that not only consider the mean payoff, but are also sensitive to risk, that is the variability of the payoff. Here, we examine the hypothesis that risk-sensitivity in sensorimotor control arises as a mean-variance trade-off in movement costs. We designed a motor task in which participants could choose between a sure motor action that resulted in a fixed amount of effort and a risky motor action that resulted in a variable amount of effort that could be either lower or higher than the fixed effort. By changing the mean effort of the risky action while experimentally fixing its variance, we determined indifference points at which participants chose equiprobably between the sure, fixed amount of effort option and the risky, variable effort option. Depending on whether participants accepted a variable effort with a mean that was higher, lower or equaltothefixedeffort,theycouldbeclassifiedasrisk-seeking,risk-averseorrisk-neutral.Mostsubjects wererisk-sensitiveinourtaskconsistentwithamean-variancetrade-offineffort,thereby,underliningthe importance of risk-sensitivity in computational models of sensorimotor control. Keywords: motor control; risk; decision-making; mean-variance trade-off; prospect theory 1. INTRODUCTION between the expected payoff (mean return) E(x) and the In the fields of psychology and economic decision- variability of the payoff (risk) Var(x), such that U(x)¼ making, it is well established that risk attitudes influence E(x)2uVar(x). The parameter uexpresses the decision- human behaviour. For example, when given a choice maker’s risk attitude: risk-neutral decision-makers are betweenasurebetof$50anda50:50chanceofwinning only sensitive to the expected payoff (u¼0), while risk- $100 or $0, most people would prefer the sure bet, even averse individuals discount payoff variability (u.0) and though on average the two options have the same mean risk-seekers consider it a bonus (u,0). In biology, payoff. In fact, a risk-averse decision-maker would even mean-variance models of risk-sensitivity have been pre- prefer a sure bet with a slightly lower payoff, say $45, viously applied in ecology [8] and neuroeconomics, and thus accept a $5 risk premium—a fact that is elucidating the neural underpinnings of risk-sensitivity exploited by insurance companies in their policies. in economic choice tasks [9–16]. In psychology and By contrast, risk-seeking individuals assign higher value behaviouraleconomics,manyotherstudieshavealsopro- to options that have greater variability—for example, vided evidence for risk-sensitivity in the context of whengamblinginacasino.Riskmightalsoplayanimpor- prospecttheory,inwhichrisk isthoughttoarisethrough tantroleinmotortasks.Consider,forexample,aclimber nonlinear distortions of values and probabilities [3]. who has to choose between different routes—a long Incontrast,mostresearchonthehumanmotorsystem secure route or a shorter route that could lead to the has emphasized risk-neutrality and has not considered goal faster, but could take longer if slippery. On his way payoff variance as a potential influence on behaviour. he might be faced with manysuch decisions. For example, a number of studies have proposed that Thetheoryofriskindecision-makinggoesbacktothe humans choose movement strategies so as to maximize eighteenthcentury[1]andhassinceflourishedintoahost an average gain in inherently uncertain motor tasks that ofdifferentmodelsofdecision-makingunder uncertainty involve both spatially [17–19] and temporally structured [2–7]. One of the most popular risk models in modern rewards [20,21]. As average gain models only consider finance is Markowitz’ risk-return model, in which the meanrewards,theyareneutralwithrespecttorisk.Simi- value U(x) of an investment x is modelled as a trade-off larly, current computational theories of motor control often consider exclusively mean movement costs and are, therefore, risk-neutral. For example, in most studies on optimal feedback control theory, the optimal *Authorforcorrespondence([email protected]). †Theseauthorscontributedequallytothestudy. behaviour does not consider how variable the movement cost is, but only depends on the average cost [22–27]. Electronic supplementary material is available at http://dx.doi.org/ 10.1098/rspb.2010.2518orviahttp://rspb.royalsocietypublishing.org. Recently, however, violations of the mean payoff Received21November2010 Accepted6December2010 2325 Thisjournalisq2011TheRoyalSociety 2326 A. J. Nagengast et al. Decision-making in sensorimotor control hypothesishavebeenreportedinmotorcontroltasks.Wu vertical axis of the screen (figure 1). The effort circles et al. [28] showed, for example, that in a pointing task representedallthepossibleeffortlevelsthatcouldbeexperi- subjects exhibit risk-seeking behaviour in line with pro- enced by the subject in the effort stage of that trial. The spect theory, because they systematically underweight yellow circle was always x ¼10 cm from the start yellow small probabilities and overweight large probabilities of location (the sure bet), while the test stimuli were rep- hitting designated targets by pointing movements. Simi- resented by the green and red circles, with the green larly, Nagengast et al. [29] showed that subjects exhibit circle always having a shorter distance, x ,10cm (lower green risk-averse behaviour in a motor task that required them effort),andtheredcirclealwaysagreaterdistance,x .10 red to control a Brownian particle under different levels of cm(highereffort),fromthestartinglocation.Thecoloursof noise. Subjects’ changes in control gain depended on the three effort circles corresponded to the coloursthat were their risk-sensitivity in line with the predictions of a used to indicate different target regions on two walls that risk-sensitive optimal feedback controller [30]. Here, we were located 20cm lateral to the starting location and examine the hypothesis that risk-sensitivity in sensori- extended the full height of the screen. Subjects moved from motor control tasks can be understood as a trade-off the starting location to hit one of the two walls. The left betweenthemeanmovementcostandthevariabilityofthe wall was entirely yellow, whereas the right wall was red with cost,analogoustotherisk-returnmodelusedineconomics. a green region embedded whose height was varied between trials(figure1).Thegreenregiondeterminedtheprobability ofp ,whichwasequilibratedinatestsessiontofitsubjects’ hit 2. METHODS individual motor variability (compare experimental sessions). (a)Experimentalset-up Depending on which of the three colour regions subjects hit Fifteenright-handedsubjects(eightmale,sevenfemale,aged they would have to move to the corresponding effort circle. 20–30) participated in the experiment after providing Therefore, they could always choose the yellow effort circle written informed consent. The experimental protocols if they wished (sure bet) or take the risky option of aiming were approved by the local ethics committee. Subjects were for the green region and either reach to the green or red naivetothepurposeoftheexperimentandnoneofthesub- effort circle depending on the outcome. To make the task jects reported any sensory or motor deficits. While seated, more demanding, the movement time was limited to 0.3s subjects used their right hand to grasp the handle of a (iflonger,subjectshadtorepeatthetrial)andweintroduced vBOT force-generating robotic manipulandum, which a visual gain of 3 in the y-direction relative to the starting could be moved in the horizontal plane (for details, see location (i.e. errors were magnified threefold) and this gain [31]).Thepositionandvelocityofthehandwerecomputed waskeptconstantthroughouttheexperiment. online at 1000Hz. Subjects could not see their arm but the position of their hand could be displayed in the plane of (ii) Effortstage thearmusing areflected rear-projection system. Aftertheyhadmadetheirdecisionandhitoneoftheregions, Thetaskwasanimplicitmotorversionofabinaryeconomic subjects had to move their hand to the corresponding effort decision-making task. In the economics domain probabilities circle and hold it there for 1.5s against a spring-like force andrewards(orlosses)aretypicallybothrepresentedexplicitly F that was pushing them to the right and whose magni- byinformingsubjectsaboutthenumbersinvolved.Incontrast, right tude wasproportional to the distance x that they wereaway inourtasklossesweredeterminedbytheeffortsubjectshadto fromthestartinglocation,F ¼k.x.Thespringconstant exerttoachieveamovementandtheprobabilitiesweredeter- right kwas adjusted to the strength of each subject at the begin- minedimplicitlybythesubjects’motor variability.Weuseda ning of the experiment. We used body weight as a proxy two-alternative forced-choice paradigm in which subjects for maximum force production and the spring constants chose on each trial between a certain fixed effort movement ranged fromk¼125Nm21 for the lightest (weight approx. andagambleinwhichtheywouldhavetomakeeitheralower 50kg) to k¼200Nm21 for the heaviest subject (weight or higher (than the fixed) effort movement. Which of these approx. 80kg). As the spatial range of targets was small we efforts they would experience if they chose the gamble was ignore changes in configuration on the arm (biomechanics) determinedprobabilistically,withprobabilityp and12p , hit hit affecting thesubjective measures ofeffort. respectively.Theprobabilityp wasimplicitlyencodedbythe hit sizeofasmalltargetregionsubjectscouldtrytohitinalimited time(withthetargetsizecalibratedsothattheprobabilityofhit- (iii) Experimentalsessions tingthetarget,phit,wascontrolled).Ifthetargetwashit,they The first 200 trials were a trainingsession, in which subjects thenmadethelowereffortmovement,butiftheymissedthey practiced hitting the green region on the right-hand side, madethehighereffortmovement.Eachtrialoftheexperiment, which varied in size from trial-to-trial (0.5–5cm, 20 trials therefore, involved two stages. First, subjects made a choice each).Thenext50trialsfamiliarizedsubjectswiththedifferent betweena sure and a risky strategy (decisionstage) andthen effort levels. Subjects moved to hit the yellow wall anywhere producedamovementundertheassociatedeffortlevel(effort along its length and then moved to the yellow effort circle stage). The main experimental manipulation was to change whose distance varied from trial-to-trial (1–19cm, five trials theeffortlevelsovertrialssoastoinfluencethemeanandvar- each). The subsequent 100 trials (the ‘s-estimation session’) ianceoftheeffortandstudyhowthesechangesinfluencechoice was used to estimate subjects’ endpoint variability. Subjects behaviour.Subjectswereinstructedtochoosetheoptionthat attemptedtohitasmall0.5cmgreenregion(equivalenttoa theypreferred. rangeofmotionofthehandof0.5/3¼0.16cmowingtothe visualgain).Thevarianceofthe(approx.)Gaussianendpoint (i)Decision stage distribution was used to establish the relationship between The decision stage started with three effort circles (green, targetsizeandhittingprobabilityfordifferenttargetsizesthat yellow and red; 0.75cm radius) being displayed along the wasusedsubsequently.Thelast400trialswerethetestsession Proc.R.Soc.B(2011) Decision-making in sensorimotor control A. J. Nagengast et al. 2327 certainty utility probability We discretized both probability and effort space, com- puted all possible combinations and selected those combinations that had a particular variance within a given tolerance. The probabilityof hitting a target p wasdiscre- hit tized into steps of 0.01 (101 levels) corresponding to a set ofheightsofthegreenregionthatdependedontheindivid- ualsubject’svariance in pointing. Themovementeffort was discretized into steps of 0.5cm with E ranging from 0 to hit 2 9.5cm and Emiss from 11.5 to 20cm, corresponding to the effortforthegreenandredcircles.Allpossiblecombinations ofE ,E andp (hencep ¼12p )wereconsidered 1–p hit miss hit miss hit force hit resulting in 20(cid:2)20(cid:2)101¼40,400 combinations. The mean effort m¼p . E þp . E and the variance hit hit miss miss s2¼(E 2m)2 . p þ(E 2m)2 . p were computed hit hit miss miss for all combinations. Lotteries with a variance of s2¼ 1 1 p hit f1,5,11,17,24g+0.5 were selected and saved as five stimu- lus sets used in the experiment resulting in n¼ f1148,1366,1076,780,713g different stimuli for every set. From these five stimulus sets, we selected those stimuli for 1 – phit presentationduringtheexperimentthatwouldprovidemaxi- mum information about the subjects’ indifference points (mean effort) where subjects would choose equiprobably between the risky strategy and the sure bet strategy. To this end,weselectedthestimulibasedonastandardadaptivefit- Figure1.Schematicofexperiment.Atrialinthe‘mean-var- tingprotocol(QUEST)[32,33].Thismethodselectsthenext iancesession’consistedoftwostages:adecisionstageandan stimulustoliewithinthe95%confidenceintervalofthecur- effort stage. Three possible circular targets were displayed rentestimateoftheindifferencepointbasedonfittingallthe (green, the closest; red, the furthest; yellow, always at datatoa logistic function.The trialsforeachofthe five var- 10 cm from the origin). The target selection from these depended on the outcome of the decision stage. (1) In lim- iance levels were interleaved in a pseudo-random order with ited time, subjects chose to move their hand (represented atotalof80trialsateachvariancelevel.Thisprocedurepro- by the small blue circle) either to the left or to the right. ducedindifferencepointsforeachofthefivevariancelevels. The left-hand side was a sure bet and the yellow circular target was always selected. Moving to the right was risky (b)Models and subjects attempted to hit a small green target. Having To estimate subjects’ risk-sensitivity, we modelled decisions established the subjects’ Gaussian endpoint distribution for made by ideal actor models whose choices were contami- this movement previously, a given target size corresponded nated by noise and we used maximum-likelihood methods toaparticularprobabilityofhittingthetargetp .Therefore, hit toestimateparametersoftheidealactormodels.Inparticu- if subjects chose the risky strategy they would have a prob- lar, we considered the mean-variance model and prospect ability of p of hitting the green target-wall and 12 p of hit hit theory to explain subjects’ choice behaviour. The noise hitting the red target-wall. The size of the yellow wall was modelforbothcasescanbefoundtogetherwiththemethods always the same. (2) In the effort stage, subjects moved to the corresponding target where they had to push against a for the model comparison in the electronic supplementary stiffspring requiringaforceF .Wevariedtheprobability material. right p and the red and green circular target positions to estab- hit lish for which effort level subjects were indifferent between (i)Mean-variance model the sure bet and the risky option for five levels of effort Asoutlinedin§1,themean-variancemodelofrisk-sensitivity variance. postulates a utility function that containstermsthatinclude both the mean payoff and the variance of the payoff such (the ‘mean-variance session’) in which we measured the that U (x)¼2E(x)þu Var(x), where x is the distribution 1 1 subjects’choicebehaviour. of possible distances to the effort circles and u is the risk- 1 parameter (risk-averse for u ,0, risk-neutral for u ¼0 1 1 (iv)Stimulus setforfinding indifferencepoints andrisk-seekingforu .0).Notethatthesignoftheutility 1 We wished to examine how variability of the effort affected has been reversed since distances are‘disutilities’. Also note subjects’choicesbetweenthesurebetandtheriskystrategy. that we can use the distance x as a proxy for effort, since Todothis,wewantedtofindindifferencepointswheresub- theforcedependsonxinalinearfashionandutilitiesarecar- jects would choose each possibility equiprobably (p¼0.5). dinaluptoalineartransform—thatis,choicesthatsatisfythe Aswewereinterestedinhowvarianceaffectstheindifference usualrationalityaxiomscanberepresentedbyautilityindex pointwecreatedstimulussetsfortheriskychoicethathada that is unique up to a linear transformation. We also use a fixedvarianceandonlyvariedinthemean—therebyfinding slightly more general formulation of risk-sensitivity, by the mean for the risky choice to which subjects would be including higher order statistics beyond the variance. This indifferent to choosing the sure bet. To create this stimulus can be easily achieved by means of a utility function of the set with a fixed variance that differs only in the mean form U2(x)¼2u221 lnE(e2(1/2)u2x) that has the same terms effort, we manipulated both the hitting probabilities (height asU (x)inthefirsttwotermsofitsTaylorSeriesexpansion 1 of the green region) and effort levels of the risky choice (withu ¼4u).Importantly,thesamegeneralizationcanbe 2 1 (locations ofthe redandgreeneffort circles). usedtointroducerisk-sensitivitytooptimalfeedbackcontrol Proc.R.Soc.B(2011) 2328 A. J. Nagengast et al. Decision-making in sensorimotor control models[29,30].Accordingly,thesurebetinourexperiment * * 20 can be represented as U (x )¼2x and the risky 2 yellow yellow n15 alternative as U2(fxgreen, xredg)¼2u221 ln(phit e2(1/2)u2xgreenþ mea10 (12phit)e2(1/2)u2xred). 5 q =0.46 q =0.16 1 1 (ii) Prospect theory * * Unlike the mean-variance approach, prospect theory does 15 n not have a single risk-parameter. Instead, prospect theory ea10 m postulatesdifferentvaluefunctionsvþ(x)andv2(x)thatdis- 5 q =0.16 q =0.14 tort the objective value of x and different probability 1 1 weightingfunctionswþ(p)andw2(p)thatdistorttheobjec- * * tive probabilities depending on a particular reference point, 15 n i.e. depending on whether one deals with gains (þ) or ea10 m losses (2) or both. Risk-sensitivity then depends on the 5 q =0.12 q =0.12 shapeofthevaluefunctionaswellastheshapeoftheweight- 1 1 ing function. In our experiment, we exclusively deal with * * losses, since all outcomes require effort (the reference point 15 n is0effort).For purelossprospects,theutilityof a prospect ea10 m withbinaryoutcomesxredandxgreenandassociatedprobabil- 5 q =0.11 q =0.08 ities (12phit), and phit is given by U(x)¼[12w2(phit)] 1 1 v2(x )þw2((12p ))v2(x ). To parameterize this green hit red * decision model, we used a standard value function family 15 n proposed by Kahnemann & Tversky [3] v2(x)¼2xa and ea10 m a common probability weighting function family proposed 5 q =0.04 q =0.03 byPrelec[34]w2(p)¼exp[2(2lnp)g].Thedecisionmodelis 1 1 thendeterminedbytheparametersaandg.Consequently,we * q =−0.05 canwritethesurebetoptionasU(x )¼v2(x )andthe 15 1 yellow yellow n risky option as U(fxgreen, xredg)¼(12w2(phit)) v2(xgreen)þ mea10 w2(12phit)v2(xred). 5 q =0.02 1 q =−0.05 * q =−0.20 3. RESULTS 15 1 1 n (a) Mean-variance indifference points ea10 m To test the mean-variance hypothesis of risk for motor 5 control, we designed a probabilistic decision-making task in which subjects could choose between a sure 1 5 11 17 24 1 5 11 17 24 variance variance bet—a movement of a fixed effort—or a risky option—a movement entailing either a lower or higher effort Figure 2. Mean-variance trade-off. The result of the (figure 1). By controlling the mean and variance of the experiment for all 14 subjects ordered from the most effort of the risky option, we found indifference points risk-seeking to the most risk-averse. The indifference points where subjects chose equiprobably between the sure bet +s.d. obtained from the five psychometric curves are and the risky option (see electronic supplementary showninblack.Thebestlinesoffitobtainedusingweighted material, figure S1 shows the psychometric curves for a linear regression are shown in blue. The risk-attitude par- typical subject). These indifference points were stable ameteru1 is the line’s slope and is shown in the right-hand corners of the subplots. For all but three subjects, the null through the course of the experiment—that is they did hypothesis of risk-neutrality could berejected with p,0.05 not shift owing to fatigue, for example—and thus they (markedwithan asterisk). reflect a stationary choice pattern (see electronic sup- plementary material, results and figure S2). At the indifference point, the mean effort of the risky choice indifference points depending on the level of variance. relative to the fixed effort could be equal (risk-neutral), Ascanbeseenbytheregressionsmarkedwithanasterisks higher (risk-seeking) or lower (risk-averse). Therefore, in figure 2, for all except three subjects, the null hypoth- risk-averse subjects only accept the risky reach if the esisofrisk-neutrality,i.e.alineindistinguishablefromthe meaneffortlevelislowerthanthefixedeffortalternative, horizontal, could be rejected with p,0.05. whereas risk-seeking subjects are prepared to take a gamble even at unfavourable odds with the hope for the (b) Mean-variance models improbable outcome requiring lower effort than the The slope of the linear fits allowed us also to infer the fixed effort alternative. risk-parameter in the simple mean-variance model. For Figure 2 shows the indifference points at the five the sure-bet reach, the effort circle is always at 10cm, variance levels for all 14 subjects. We used weighted i.e. Us ¼2E(10)¼210, and for the risky option 1 least-squares regression to obtain linear fits of the five Ur(x)¼2E(x)þuVar(x).Thecurveofindifferencepoints 1 1 mean-variance indifference points. The slope of these of mean effort levels at different variances can hence be fits informs us about the risk-sensitivity. A slope of zero describedbytheconditionUs¼Ur(x)resultingin 1 1 is compatible with risk-neutrality. A non-zero slope of these fits implies that subjects modulated their EðxÞ¼u1VarðxÞþ10; ð3:1Þ Proc.R.Soc.B(2011) Decision-making in sensorimotor control A. J. Nagengast et al. 2329 Table 1. Parameter estimates. Mean-variance (U ). The mean parameterestimates ofu +s.d. of a mean-variance decision- 1 1 maker obtained from the linear regression analysis of the subjects’ indifference points (see figure 2). Mean-Variance (U ). 2 The mean parameter estimates of u +s.d. (estimated using bootstrapping with 1000 repetitions) of a mean-variance 2 decision-maker obtained using a maximum-likelihood analysis of a noisy decision-maker. Prospect theory. The mean parameter estimates of a+s.d. and g+s.d. (estimated using bootstrapping with 1000 repetitions) of a prospect theory decision-makerobtainedusingamaximum-likelihoodanalysisofanoisydecision-maker. subject 1 2 3 4 5 6 7 8 9 10 11 12 13 14 mean-variance(U ) 1 u 0.46 0.16 0.16 0.14 0.12 0.12 0.11 0.08 0.04 0.03 0.02 20.05 20.05 20.2 1 + 0.18 0.03 0.02 0.02 0.03 0.03 0.04 0.03 0.01 0.04 0.05 0.03 0.04 0.06 mean-variance(U ) 2 u 0.43 0.18 0.22 0.25 0.27 0.13 0.16 0.13 0.19 0.1 0.06 20.18 20.07 20.34 2 + 0.03 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.03 0.02 0.03 0.02 0.03 0.03 prospecttheory a 0.28 0.12 0.13 0.13 0.22 0.09 0.06 0.09 0.25 0.28 0.12 2.61 2.76 4.76 + 0.05 0.04 0.03 0.04 0.05 0.03 0.03 0.06 0.14 0.18 0.04 0.23 0.45 0.22 g 0.54 1.28 1.45 0.94 0.76 1.67 1.59 2.31 0.88 0.93 2.05 1.22 3.87 1.71 + 0.1 0.14 0.13 0.1 0.08 0.31 0.21 0.26 0.11 0.12 0.17 0.15 0.24 0.08 wheretheslopeistherisk-attitudeparameteru wewishto The utility model then is given by U (x)¼22u21 determine,and10isexpectedtobetheintercept1oftheindif- lnE(e2(1/2)u2x2)andU2ðxÞ¼(cid:3)2u(cid:3)21lnEðe(cid:3)ð1=2Þu22pffixffiÞ,respec2t- ference curve. Based on this analysis, we found that the ively. Importantly, nonlinear transformations of the utility majority of subjects, that is nine subjects, can be classified leadtotherepresentationofdifferentpreferences.However, as risk-seeking in the task, three as indistinguishable from thebestfitsforthesenonlinearscalesweresignificantlyworse risk-neutralandtheremainingtwoasrisk-averse.Therisk- than the best fits with the linear force scale (p,0.001, attitudeparameteru rangedfrom0.46forthemostrisk-seek- likelihoodratiotest).Thissuggeststhatour mean-variance 1 ingto20.2forthemostrisk-aversesubject(seetable1forthe modelthatassumedanundistortedperceptionoftheexperi- estimated values). This provides evidence that subjects are encedforcesfitsthedatabetterthanmean-variancemodels not indifferent to the variance of the outcome but have a that assume either super-linear or sub-linear perception of certainattitudetowardsriskthatinfluencestheirdecisions. theexperiencedforces. Tocheckforconsistencyoftheinferredrisk-sensitivity parameters, we used a slightly more complex mean- variance model (see §2) to derive risk-sensitivity (c) Prospect theory model parameters based on subjects’ trial-by-trial choices and Adifferentwayoflookingathumandecision-makinghas then compared the two sets of risk-parameters for all been suggested by Kahnemann & Tversky. In their orig- subjects.Theidealactormodelassumedautilityfunction inal formulation of prospect theory [3] and its later U (x)¼22u21lnE(e2(1/2)u2x), where u is a risk- extension cumulative prospect theory (CPT) [35], devi- 2 2 2 parameter. We used a maximum-likelihood method to ations from risk-neutrality are due to two factors—the estimatetheparameteru foreachsubject (seeelectronic distortion of probabilities in the probability weighting 2 supplementary material, methods for details). The risk- function and the curvature in the value function. In attitude parameter u ranged from 0.43 for the most CPT, people’s value function is described as convex for 2 risk-seeking to 20.34 for the most risk-averse subject monetary losses and concave for monetary gains. In (see table 1 for the estimated values). The results addition, people act as if they misperceive probability, obtained using the two methods to estimate the risk putting too much weight on small probabilities and too parameters u and u are in good agreement (r¼0.91, little weight on large probabilities. This is captured by a 1 2 p,0.0001). To test whether this risk-based model was value function and probability weighting function whose better than a risk-neutral model, we used the Bayesian shapeisdeterminedbyaparameteraandg,respectively information criterion (BIC) to compare the ideal actor (see§2fordetails).Werepeatedthemaximum-likelihood model to a risk-neutral model. The BIC for the risk- analysisforaCPTdecision-makerandestimatedthepar- sensitive model was smaller than for the risk-neutral ametersaandg(see table 1 and figure 3a,b). The three model (risk-neutral decision-maker: BIC¼6256.1, risk- subjects that had been classified as risk-averse had sensitive decision-maker: BIC¼6156.2) supporting the convex value functions, the remaining subjects had con- risk-sensitive model and corroborating the findings from cave value functions. In general, the estimated u and a 2 theregressionanalysisoftheindifferencepoints.Alikeli- wereanti-correlated(r¼20.89,p,0.001).Thepicture hoodratiotestfornestedmodelsconfirmedthefindingof was more mixed for the probability weighting function theBICanalysisandshowedthattherisk-sensitivemodel (r¼20.43, p.0.05) but the majority of subjects fits the data significantly better (p,0.001). seemed to be under rather than overweight small prob- Wealsofittherisk-sensitiveidealactormodelwithtwo abilities (g¼1.51+0.23). Based on BIC, a model differentcoordinatesystems,whereforcesarenotperceived comparison with the risk-neutral model was not in linearly,butnonlinearlyeitherasthesquare(super-linear) favour of the CPT model (risk-neutral decision-maker: or the square root (sub-linear) of the objective force. BIC¼6256.1, CPT decision-maker: BIC¼6293.9); Proc.R.Soc.B(2011) 2330 A. J. Nagengast et al. Decision-making in sensorimotor control 10 1.0 1.0 8 0.8 0.8 6 0.6 0.6 ue p) ual val 4 w( 0.4 pact 0.4 2 0.2 0.2 0 0 0 5 10 0.5 1.0 0.5 1.0 outcome p ppredicted Figure3.Parameterestimatesfortheprospecttheoryfitsandcontrolresults.(a)Theestimatedvaluefunctionforeachsubject (blue)andthemeanacrosssubject(red).Thedashedlineindicatesarisk-neutralvaluefunction.(b)Theestimatedprobability weightingfunctionw(p)foreachsubject (blue)andthemeanacrosssubject(red).Thedashedlineindicatesnodistortionof probabilities.(c)Theempiricalprobabilityofhittingthetargetinthe‘mean-variancesession’versusthehittingprobabilitypre- dicted by using subjects’ endpoint variability from the ‘s-estimation session’ with 1 s.e.m. across subjects. The dashed lines indicates a perfect match betweenthetwo. however, based on the Akaike information criterion (AIC) baselineof afixed certaineffort,we found that most sub- theCPTmodelwaspreferred(risk-neutraldecision-maker: jects were prepared to accept a gamble with higher mean AIC¼6163.2,CPTdecision-maker:AIC¼6015.4).Com- effort when variability was high (risk-seeking), whereas paring the CPT model to the mean-variance model, we some were risk-neutral and a minority would only accept found that the mean-variance model was preferred both alowermeaneffort(risk-averse).Ourresultsareconsistent basedonBIC(mean-variancemodel:BIC¼6156.2,CPT with a risk-sensitive decision-maker that trades off the decision-maker:BIC¼6293.9)andbasedonAIC(mean- mean and variance of movement effort, but inconsistent variance model: AIC¼5970.6, CPT decision-maker: with a risk-neutral account of motorcontrol. AIC¼6015.4). A number of previous studies have found that people maximizeexpectedgaininmovementtasksinwhichsub- jects made speeded pointing movements and the spatial (d) Control of experimental assumptions [18,19] or temporal outcome [20,21] of their movement Our experiment depends on the assumption that the resulted in a monetary payoff. These studies compared subjects’ endpoint variability did not change from the subjects’ behaviour with an ideal actor model that maxi- ‘s-estimation session’ to the ‘mean-variance session’. mized expected payoff. Crucially, the optimal movement This was true for 14 out of 15 subjects (all p.0.3, strategy suggested by such models is independent of the two-sample Kolmogorov–Smirnov test for the mean- variance of the payoff. This should, however, not be corrected endpoint-distribution of the ‘s-estimation confused with the variance of the movement outcome session’ and the ‘mean-variance session’). On average, the (see electronic supplementary material, discussion for endpoint-variability (s) of subjects was 1.90+0.44 cm mathematical details). The fact that various kinds of in the ‘s-estimation session’ and 1.86+0.31cm in the movement variability play an important role in the ‘mean-variance session’. One subject had to be excluded choice of suitable movement strategies is well known fromtheanalysisasthestandarddeviationofhismovements [17] and taken into account by expected gain models. changed drastically from 5.86cm in the ‘s-estimation ses- This raises the question as to why these previous studies sion’ to 1.70cm in ‘mean-variance session’ (p,0.002, have not reported risk-sensitivity. One key difference two-sample Kolmogorov–Smirnov test). Furthermore, our from our study is that in these previous studies the experimentaldesignreliedonpredictingthesubjects’hitting mean and variance of the reward were not manipulated probabilityfromtheirendpointvariability.Figure3cshowsa independently of each other making it difficult to estab- plot of the empirical probabilityof hitting the target in the lish the effect of one variable alone on subjects’ ‘mean-variance session’ versus the hitting probability behaviour.Implicitinthe‘gain-maximizationhypothesis’ predicted by using subjects’ endpoint variability from the is also that the utilityof money is linear acrossthewhole ‘s-estimation session’. Using linear regression on the data rangeandnotconcaveforgainsandconvexforlossesasis after subtracting the diagonal and testing for zero slope the usual consensus in behavioural economics [3]. (t8¼1.08, p.0.3) and zero intercept (t8¼0.9, p.0.3) A possible reason why the linear utility function is suc- suggests coincidence of the data with the diagonal and cessful is that these studies used very small monetary hence confirms accurate prediction of hitting probabilities remunerations of only a few cents (2.5 cents maximum duringtheexperiment. reward per trial and 12.5 cents maximum loss per trial [18,19,36]). That is they effectively only tested subjects over a very narrow (possibly linear) range of their utility 4. DISCUSSION functions.Indeed,arecentstudythatusedlargerrewards Inourstudy,weexaminedwhethersubjectsaresensitiveto reportedthesamevaluefunctionformoneyinmovement the variance of movement costs rather than just the mean tasksasineconomicdecision-makingtasks[28]andisat level of movement costs. In particular, we investigated odds with the ‘expected gain maximization’ hypothesis. howsubjectstradeoffthemeaneffortagainstthevariabil- Wu et al. [28] examined violations of expected utility ity of effort during a movement. Compared with the theory in a motor task that involved making accurate Proc.R.Soc.B(2011) Decision-making in sensorimotor control A. J. Nagengast et al. 2331 pointing movements. In particular, they investigated vio- value and how the brain0s different valuation and action lations of the so-called independence axiom, stating that selection system interact and vie for control to arrive at preferences should not be affected by the addition of a an overt behavioural decision [41]. ‘common consequence’. Consider two different tasks in Current computational accounts of motorcontrol-like which subjects can choose between lotteries of the form optimal feedback control theory are risk-neutral [26,27] [p U($V ), p U($V ),...] where there is a probability p and only consider minimization of the expectation of a 1 1 2 2 1 of receiving $V that has a subjective utility of U($V ), cost function, usually with terms for positional accuracy 1 1 etc (we assume without loss of generality that U($0)¼0). and effort. The variance of the cost does not influence In the first task, we can choose between two lotteries these models when computing the optimal movement [0.33U($2500),0.67U($0)] and [0.34U($2400),0.66 policy. However, models of risk-sensitive optimal feed- U($0)]. In a second task, we can choose between back controllers are compatible with a mean-variance [0.33U($2500),0.66U($2400),0.01U($0)] and [0.66 trade-off in movement costs as found in the current U($2400),0.34U($2400)]¼[1.0U($2400)]. These two study,becausethefirsttwotermsoftheTaylorexpansion tasks only differ in their ‘common consequence’ in that oftherisk-sensitivecostfunctioncorrespondtomeanand the second task simply adds 0.66U($2400) to both variance of the movement cost. Recently, we have shown lotteries in task 1. However, in the first task, people how risk-sensitive optimal feedback control can account tend to prefer the first lottery implying that 0.33 for sensorimotor behaviour under uncertainty in a con- U($2500).0.34U($2400) whereas in the second task tinuous motor task where subjects had to control a they tend to prefer the second lottery as it has a guaran- Brownian particle under different noise levels [29]. In teed outcome. Therefore, some decision-makers reverse this previous study, we found that subjects showed theirpreferencebetweenthetasks.Importantly,expected mostly risk-averse behaviour, whereas in the current utility theory does not allow preference reversals of this study and in the study by Wu et al. [28] subjects were kind. Wu et al. [28] observed, however, exactly this kind mostly risk-seeking. An important difference between of preference reversals violating the independence these experiments is that in the previous study the noise axiom. By introducing common consequences in their was given by the Brownian particle, whereas in the task, Wu et al. [28] simultaneously changed the mean current study (and also in [28]) the noise was given by andthevarianceoftheirpayoffs.Incontrast,inourexper- subjects’ own motor noise. In non-motor settings, the imentaldesignwedidnotusecommonconsequencesand ‘illusionofcontrol’[42]isoneofthecorefactorsincaus- insteadwereabletofixthepayoffvarianceoftheriskylot- ingpeopletomistakegamesofpurechancewithgamesof tery and only change its mean payoff. By examining skilleventhoughtheyarenotcontrollable[43].Hence,a subjects choice between this risky lottery and the certain possible explanation for the difference in risk-sensitivity lottery(zerovarianceandfixedpayoff),wecoulddirectly inourcasemightbethatsubjectsarerisk-seekingbecause measure indifference points (for five different levels of theytendtobeover-confidentabouttheirowngenerated variance) where subjects chose equiprobably between motor noise, but risk-averse with respect to noise that is the two lotteries. This separate manipulation of mean given in their environment. This hypothesis could be and variance allowed us to directly show that subjects tested in future experiments. trade off mean and variance of movement costs. To compareour results to Wu et al. [28], we also fit a The authors would like to thank Luc Selen for helpful prospect theory model to our data, where risk-sensitivity discussions, and Ian Howard and James Ingram for depends bothonthedistortionoftheprobabilityweight- technical assistance. This work was supported by the ing function and the curvature of the value function. Wellcome Trust and the European Project SENSOPAC Similar to their results, our fit indicated that small prob- (IST-2005-028056). A.J.N. was financially supported by a MRCresearch studentship. abilitieswereunderweightedinmostsubjectsandthatthe valuefunction wasmostlyconcave,bothofwhichiscon- sistentwithrisk-seekingbehaviour.However,whetherthe REFERENCES brain represents risk in agreement with either the mean- 1 Bernoulli, D. 1954 Exposition of a new theory on the variance approach or with the prospect theory account measurementofrisk.Econometrica22,23–36,orig.1738. iscurrentlysubjectofanongoingdebate[37].Recentevi- 2 Arrow,K.1965Aspectsofthetheoryofrisk-bearing.Yrjo¨ dence from electrophysiological and functional imaging Jahnssonin Sa¨a¨tio¨. studieshasprovidedsupportforboththeories.Insupport 3 Kahneman, D. & Tversky, A. 1979 Prospect theory: of the mean-variance approach, separate encoding of an analysis of decision under risk. Econometrica 47, reward magnitude and risk has been reported in 263–291. (doi:10.2307/1914185) humans [14–16] as well as in non-human primates 4 Markowitz, H. 1952 Portfolio selection. J. Financ. 7, [38]. However, recent studies have also found neural 77–91. (doi:10.2307/2975974) evidence in favour of prospect theory. Martino et al. 5 Neumann,J.V.&Morgenstern,O.1944Theoriesofgames [39],forexample,reportedneuralcorrelatesofthefram- and economic behavior. Princeton, NJ: Princeton ing effect, that is the susceptibility of the decision-maker University Press. 6 Pratt,J.1964Riskaversioninthesmallandinthelarge. to the manner in which options are presented. In Econometrica32,122–136.(doi:10.2307/1913738) addition, Hsu et al. [40] found that neural responses in 7 Savage,L.J.1972Thefoundationsofstatistics.NewYork, thebraindependedonprobabilitiesinanonlinearfashion NY:Dover Publications. during a risky task. Both effects are cornerstones of pro- 8 Real, L., Ott, J. & Silverfine, E. 1982 On the tradeoff spect theory. In our experiment, the model comparison between the mean and the variance in foraging: effect favours the mean-variance approach. However, further of spatial distribution and color preference. Ecology 63, studies are needed to elucidate how the brain represents 1617–1623. (doi:10.2307/1940101) Proc.R.Soc.B(2011) 2332 A. J. Nagengast et al. Decision-making in sensorimotor control 9 Christopoulos,G.I.,Tobler,P.N.,Bossaerts,P.,Dolan, 25 Nagengast, A. J., Braun, D. A. & Wolpert, D. M. 2009 R.J.&Schultz,W.2009Neuralcorrelatesofvalue,risk, Optimalcontrolpredictshumanperformanceonobjects and risk aversion contributing to decision making under with internal degrees of freedom. PLoS Comput. Biol. 5, risk. J. Neurosci. 29, 12574–12583. (doi:10.1523/ e1000419.(doi:10.1371/journal.pcbi.1000419) JNEUROSCI.2614-09.2009) 26 Todorov, E. 2004 Optimality principles in sensorimotor 10 Fecteau,S.,Pascual-Leone,A.,Zald,D.H.,Liguori,P., control.Nat.Neurosci.7,907–915.(doi:10.1038/nn1309) The´oret, H., Boggio, P. S. & Fregni, F. 2007 Activation 27 Todorov,E.&Jordan,M.I.2002Optimalfeedbackcon- of prefrontal cortex by transcranial direct currentstimu- trol as a theory of motor coordination. Nat. Neurosci. 5, lation reduces appetite for risk during ambiguous 1226–1235. (doi:10.1038/nn963) decision making. J. Neurosci. 27, 6212–6218. (doi:10. 28 Wu,S.-W.,Delgado,M.R.&Maloney,L.T.2009Econ- 1523/JNEUROSCI.0314-07.2007) omic decision-making compared with an equivalent 11 Floden, D., Alexander, M. P., Kubu, C. S., Katz, D. & motor task. Proc. NatlAcad. Sci. USA 106, 6088–6093. Stuss, D. T. 2008 Impulsivity and risk-taking behavior (doi:10.1073/pnas.0900102106) in focal frontal lobe lesions. Neuropsychologia 46, 213– 29 Nagengast, A. J., Braun, D. A. & Wolpert, D. M. 2010 223. (doi:10.1016/j.neuropsychologia.2007.07.020) Risk-sensitiveoptimalfeedbackcontrolaccountsforsen- 12 Huettel, S. A., Stowe, C. J., Gordon, E. M., Warner, sorimotor behavior under uncertainty. PLoS Comput. B.T.&Platt,M.L.2006Neuralsignaturesofeconomic Biol. 6,e1000857.(doi:10.1371/journal.pcbi.1000857) preferencesforriskandambiguity.Neuron49,765–775. 30 Whittle,P.1981Risk-sensitivelinear/quadratic/Gaussian (doi:10.1016/j.neuron.2006.01.024) control. Adv. Appl. Probab. 13, 764–777. (doi:10.2307/ 13 Knoch,D.,Gianotti,L.R.R.,Pascual-Leone,A.,Treyer, 1426972) V., Regard, M., Hohmann, M. & Brugger, P. 2006 Dis- 31 Howard, I. S., Ingram, J. N. & Wolpert, D. M. 2009 A ruption of right prefrontal cortex by low-frequency modular planar robotic manipulandum with end-point repetitive transcranial magnetic stimulation induces torque control. J. Neurosci. Meth. 181, 199–211. risk-taking behavior. J. Neurosci. 26, 6469–6472. (doi:10.1016/j.jneumeth.2009.05.005) (doi:10.1523/JNEUROSCI.0804-06.2006) 32 Ko¨rding, K. P., Fukunaga, I., Howard, I. S., Ingram, 14 Preuschoff, K., Bossaerts, P. & Quartz, S. R. 2006 J. N. & Wolpert, D. M. 2004 A neuroeconomics Neural differentiation of expected reward and risk in approach to inferring utility functions in sensorimotor human subcortical structures. Neuron 51, 381–390. control.PLoS Biol. 2, e330. (doi:10.1016/j.neuron.2006.06.024) 33 Watson, A. & Pelli, D. 1983 QUEST: a Bayesian 15 Tobler,P.N.,O’Doherty,J.P.,Dolan,R.J.&Schultz,W. adaptive psychometric method. Percept. Psychophys. 33, 2007 Reward value coding distinct from risk attitude- 113–120. related uncertainty coding in human reward systems. 34 Prelec, D. 1998 The probability weighting function. J. Neurophysiol. 97, 1621–1632. (doi:10.1152/jn.00745. Econometrica66,497–527. (doi:10.2307/2998573) 2006) 35 Tversky, A. & Kahneman, D. 1992 Advances in 16 Tobler, P. N., Christopoulos, G. I., O’Doherty, J. P., prospect theory: cumulative representation of uncer- Dolan, R. J. & Schultz, W. Risk-dependent reward tainty. J. Risk Uncertain. 5, 297–323. (doi:10.1007/ value signal in human prefrontal cortex. Proc. Natl BF00122574) Acad. Sci. USA 106, 7185–7190. (doi:10.1073/pnas. 36 Trommersha¨user, J., Gepshtein, S., Maloney, L. T., 0809599106) Landy, M. S. & Banks, M. S. 2005 Optimal compen- 17 Cohen,R.&Sternad,D.2009Variabilityinmotorlearn- sation for changes in task-relevant movement variability. ing: Relocating, channeling and reducting noise. Exp. J. Neurosci.25, 7169–7178. BrainRes.193,69–83.(doi:10.1007/s00221-008-1596-1) 37 Boorman,E.D.&Sallet,J.2009Mean-varianceorpro- 18 Trommersha¨user, J., Maloney, L. T. & Landy, M. S. spect theory: the nature of value representations in the 2003 Statistical decision theory and the selection of human brain. J. Neurosci. 29, 7945–7947. (doi:10. rapid, goal-directed movements. J. Opt. Soc. Am. A 20, 1523/JNEUROSCI.1876-09.2009) 1419–1433. 38 Tobler, P. N., Fiorillo, C. D. & Schultz, W. 2005 Adap- 19 Trommersha¨user, J., Maloney, L. T. & Landy, M. S. tive coding of reward value by dopamine neurons. 2003Statisticaldecisiontheoryandtrade-offsinthecon- Science307,1642–1645.(doi:10.1126/science.1105370) trolofmotor response. Spat.Vis. 16,255–275. 39 Martino,B.D.,Kumaran,D.,Seymour,B.&Dolan,R.J. 20 Dean,M.,Wu,S.-W.&Maloney,L.T.2007Tradingoff 2006Frames,biases,andrationaldecision-makinginthe speed and accuracy in rapid, goal-directed movements. human brain. Science 313, 684–687. (doi:10.1126/ J. Vis.7,1–12.(doi:10.1167/7.5.10) science.1128356) 21 Hudson, T. E., Maloney, L. T. & Landy, M. S. 2008 40 Hsu, M., Krajbich, I., Zhao, C. & Camerer, C. F. 2009 Optimalcompensationfortemporaluncertaintyinmove- Neuralresponsetorewardanticipationunderriskisnon- mentplanning.PLoSComput.Biol.4,e1000130.(doi:10. linear in probabilities. J. Neurosci. 29, 2231–2237. 1371/journal.pcbi.1000130) (doi:10.1523/JNEUROSCI.5296-08.2009) 22 Braun,D.A.,Aertsen,A.,Wolpert,D.M.&Mehring,C. 41 Rangel,A.,Camerer,C.&Montague,P.R.2008Afra- 2009 Learning optimal adaptation strategies in unpre- mework for studying the neurobiology of value-based dictable motor tasks. J. Neurosci. 29, 6472–6478. decision making. Nat. Rev. Neurosci. 9, 545–556. (doi:10.1523/JNEUROSCI.3075-08.2009) (doi:10.1038/nrn2357) 23 Diedrichsen,J.2007Optimaltask-dependentchangesof 42 Langer, E. 1975 The illusion of control. J. Pers. Soc. bimanualfeedbackcontrolandadaptation.Curr.Biol.17, Psychol. 32, 311–328. (doi:10.1037/0022-3514.32.2. 1675–1679. (doi:10.1016/j.cub.2007.08.051) 311) 24 Izawa, J., Rane, T., Donchin, O. & Shadmehr, R. 2008 43 Davis, D., Sundahl, I. & Lesbo, M. 2000 Illusory per- Motor adaptation as a process of reoptimization. sonal control as a determinant of bet size and type in J. Neurosci. 28, 2883–2891. (doi:10.1523/JNEUR- casino craps games. J. Appl. Soc. Psychol. 30, 1224– OSCI.5359-07.2008) 1242.(doi:10.1111/j.1559-1816.2000.tb02518.x) Proc.R.Soc.B(2011)