Conceived and designed the experiments: EJAT DAB DMW. Performed the experiments: EJAT. Analyzed the data: EJAT. Wrote the paper: EJAT DAB DMW.
The authors have declared that no competing interests exist.
Sensorimotor learning has been shown to depend on both prior expectations and sensory evidence in a way that is consistent with Bayesian integration. Thus, prior beliefs play a key role during the learning process, especially when only ambiguous sensory information is available. Here we develop a novel technique to estimate the covariance structure of the prior over visuomotor transformations – the mapping between actual and visual location of the hand – during a learning task. Subjects performed reaching movements under multiple visuomotor transformations in which they received visual feedback of their hand position only at the end of the movement. After experiencing a particular transformation for one reach, subjects have insufficient information to determine the exact transformation, and so their second reach reflects a combination of their prior over visuomotor transformations and the sensory evidence from the first reach. We developed a Bayesian observer model in order to infer the covariance structure of the subjects' prior, which was found to give high probability to parameter settings consistent with visuomotor rotations. Therefore, although the set of visuomotor transformations experienced had little structure, the subjects had a strong tendency to interpret ambiguous sensory evidence as arising from rotation-like transformations. We then exposed the same subjects to a highly-structured set of visuomotor transformations, designed to be very different from the set of visuomotor rotations. During this exposure the prior was found to have changed significantly to have a covariance structure that no longer favored rotation-like transformations. In summary, we have developed a technique which can estimate the full covariance structure of a prior in a sensorimotor task and have shown that the prior over visuomotor transformations favors a rotation-like structure.
Moreover, through experience of a novel task structure, participants can appropriately alter the covariance structure of their prior.
When learning a new skill, such as riding a bicycle, we can adjust the commands we send to our muscles based on two sources of information. First, we can use sensory inputs to inform us how the bike is behaving. Second, we can use prior knowledge about the properties of bikes and how they behave in general. This prior knowledge is represented as a probability distribution over the properties of bikes. These two sources of information can then be combined by a process known as Bayes' rule to optimally identify the properties of a particular bike. Here, we develop a novel technique to identify the probability distribution of a prior in a visuomotor learning task in which the visual location of the hand is transformed from the actual hand location, similar to when using a computer mouse. We show that subjects have a prior that tends to interpret ambiguous information about the task as arising from a visuomotor rotation but that experience of a particular set of visuomotor transformations can alter the prior.
Uncertainty poses a fundamental problem for perception, action and decision-making. Despite our sensory inputs providing only a partial and noisy view of the world, and our motor outputs being corrupted by significant amounts of noise, we are able to both perceive and act on the world in what appears to be an efficient manner.
In the Bayesian framework, the prior can have a strong impact on the update, with particular priors leading to inductive biases when confronted with insufficient information. Many perceptual biases have been explained as the influence of priors learned from the statistics of the real world, such as the prior for lower speed when interpreting visual motion.
In sensorimotor tasks, a number of studies have shown that when participants are exposed to a task which has a fixed statistical distribution, they incorporate this into their prior and combine it with new evidence in a way that is consistent with Bayesian estimation.
If one uses Bayesian estimation in an attempt to learn the parameters of a new motor task, the prior over the parameters will affect the estimates. While previously priors have been either imposed on a motor task or assumed, there has been no paradigm that allows the natural prior distribution to be assessed in sensorimotor tasks. Here we develop a technique capable of estimating the prior over tasks.
We examine visuomotor transformations, in which a discrepancy is introduced between the hand's actual and visual locations, and estimate the prior over visuomotor transformations. Importantly, we are not simply trying to estimate the mean of the prior but its full covariance structure. Subjects made reaching movements which alternated between batches in which feedback of the hand's position was either veridical or had a visuomotor transformation applied to it. By exposing participants to a large range of visuomotor transformations we are able to fit a Bayesian observer model to estimate the prior. Our model assumes that at the start of each transformation batch a prior is used to instantiate the belief over visuomotor transformations and this is used to update the posterior after each trial of a transformation batch. The prior to which the belief is reset at the start of a transformation trial may change with experience. For our model we estimate the average prior used over an experimental session by assuming it is fixed within a session, as we expect the prior to only change slowly in response to the statistics of experience.
Our approach allows us to study the inductive biases of visuomotor learning in a quantitative manner within a Bayesian framework and to estimate the prior distribution over transformations. Having estimated the prior in one experimental session, we examine whether extensive training in two further sessions with a particular distribution of visuomotor transformations could alter the participants' prior.
Subjects made reaching movements to targets presented in the horizontal plane, with feedback of the hand position projected into the plane of movement by a virtual-reality projection system only at the end of each reach (terminal feedback). Reaches were from a starting circle,
Each session alternated between veridical and transformed batches of trials. Each subject participated in three sessions, the first using an uncorrelated distribution of transformations, and the second and third using a correlated distribution. The joint distributions of
where we define the
Subject | Session 1 Transforms | Session 1 Trials | Delay (days) | Session 2 Transforms | Session 2 Trials | Delay (days) | Session 3 Transforms | Session 3 Trials
1 | 120 | 745 | 3 | 118 | 786 | 9 | 120 | 850
2 | 150 | 947 | 3 | 150 | 830 | 8 | 200 | 1102
3 | 144 | 827 | 4 | 150 | 860 | 8 | 180 | 977
4 | 133 | 944 | 3 | 140 | 929 | 9 | 160 | 1075
5 | 150 | 871 | 5 | 150 | 838 | 8 | 206 | 1076
6 | 140 | 970 | 6 | 124 | 928 | 9 | 155 | 1117
7 | 160 | 1090 | 5 | 151 | 1035 | 7 | 144 | 955
8 | 133 | 861 | 3 | 108 | 731 | 7 | 134 | 762
The number of transformations and trials in each experimental session, and the lengths of the delay in days between sessions.
The starting point of the reaches (1 cm radius circle) and the area from which the centres of targets were drawn (
Column
Compensatory responses tend to be in the correct direction: Column D shows that target-hand vectors on trials 2 and 3 tend to be in the same direction as the target-hand vector that would place the cursor on the target (
We fit subjects' performance on the first two trials of each transformed batch using a Bayesian observer model in which we assume subjects attempt to estimate the four parameters (
The plots show six 2-dimensional views of the 4-dimensional probability space of the
An optimal observer would integrate this prior with information received on the first trial (hand position and visual feedback of hand position) to generate a posterior over transformations. Even if there were no noise in proprioception or vision, the information from the first trial would not uniquely specify the underlying transformation. For example, for a particular feedback on the first trial the evidence is compatible with many settings of the four parameters (grey lines and planes in
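This ambiguity can be made concrete with a small numerical sketch (illustrative values only, not the experimental geometry): a single noiseless observation of hand position and cursor position provides only two linear constraints on the four matrix elements, leaving a two-parameter family of transformations that all produce identical feedback.

```python
import numpy as np

# One reach gives hand position h and cursor feedback c = A @ h:
# two equations in the four unknown elements of A.
h = np.array([0.10, 0.15])                     # hand position (m), illustrative
theta = 0.2                                    # rotation angle (rad)
A_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])  # a pure rotation
c = A_true @ h                                 # noiseless first-trial feedback

# Any matrix A_true + N with N @ h = 0 is equally consistent with the data.
# Matrices satisfying N @ h = 0 have rows orthogonal to h:
n = np.array([-h[1], h[0]])                    # vector orthogonal to h
N = np.outer([1.0, -2.0], n)                   # one member of the 2-D null family

A_alt = A_true + N
assert np.allclose(A_alt @ h, c)               # same feedback, different transform
```

Because every member of this family explains the first trial equally well, the second reach necessarily reflects the prior.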
In Session 1, transformations were sampled so as to minimize pairwise correlations between elements of the transformation matrix. This ‘uncorrelated’ distribution was designed to avoid inducing learning of new correlations. The set of transformations experienced in the first session is shown in the top-left cell of
Left column: Session 1. Right column: Session 2. Top row: the distributions of transformations in the two sessions. In each case 700 of the experimental transformations are plotted in the six projections of the 4-D space of linear transformations used in
We also analyzed the orientations of these covariance ellipses. Confidence limits on the orientation angle of the long axis of each ellipse were obtained by bootstrapping. The bottom-left cell of
Each subject participated in Session 2 between three and six days after Session 1, and in Session 3 between seven and nine days after Session 2 (
The priors fit to the data of the five subjects in Session 2 are shown in the middle-right cell of
In Session 3 (see
The top line shows the best fits in each of the experimental sessions, for each of the eight subjects; the middle line shows means and confidence limits on the covariance orientation angles. The bottom-left graph shows the mean across subjects of the orientation angles from the best fits to each subject's data, with 95% confidence limits on the mean found by bootstrapping.
To assess the extent to which our Bayesian observer model explained the data, we compared the magnitudes of its errors in predicting hand positions to the errors made by four other models: (A) the ‘no-adaptation’ model, which assumes the hand hits the centre of the target on all trials; (B) the ‘shift’ model, which is also a Bayesian observer but assumes the transformation is a translation; (C) the ‘rotation & uniform scaling’ model, another Bayesian observer that assumes the transformation is a rotation combined with a scaling; (D) the ‘affine’ model, which is a Bayesian observer more general than the standard model in that it accounts for linear transformations combined with shifts. Comparisons of hand position prediction error were made for each trial of a transformed batch from the 2nd to the 7th, although it should be remembered that trials after the 3rd represent progressively fewer batches, with only 44% of batches lasting to the 4th trial and only 19% lasting to the 7th. The Bayesian observer models integrated information about a transformation from all previous trials of a batch when making a prediction for the next trial. Since the Bayesian observer models were all fit to data from the second trials of each transformed batch (i.e. the standard model used the fits presented above), comparison of prediction errors on the second trials themselves was done using 10-fold cross-validation for these models, in order to avoid over-fitting by complex models.
To compare the models we focus on trial 3, which is late enough that the subjects have received a considerable amount of information about the transformation (just enough to specify the whole transformation matrix, in noiseless conditions) but early enough that all batches can be included.
Models are compared on the basis of their mean error, across subjects and sessions, in predicting subjects' hand positions on trials 2–7 of transformation batches. For each trial, all batches that lasted for at least that number of trials are used. Errors are capped at 20 cm before averaging, to reduce the effect of outliers. Trial 2 values are computed using 10-fold cross-validation, and later trial values are computed using fits to all transformation batches.
We also varied the origin of the linear transformations that we used in the Bayesian observer model, to see if the coordinate system used by the experimental subjects was based around the starting point of the reaches (small circle in
For each small square the shading denotes the performance of the standard Bayesian observer model when the origin of the linear transformations is set to the centre of that square. Performance is measured using the error between modelled and measured second-trial hand positions, averaged within an experimental session for one subject (after capping all errors at 20 cm) and then averaged across all subjects and all sessions. The small circle shows the start point of the reaches, which is used as the origin in all other modelling. The cross shows the approximate position of the eyes (
By exposing participants to numerous linear transformations (
Our study has three key novel features. First, we have developed a technique which can, unlike previous paradigms, estimate the full covariance structure of a prior in a sensorimotor task. Second, we have shown that for our task the prior over visuomotor transformations favors rotation-like structures. Third, we have shown that through experience of a novel correlation structure between the task parameters, participants appropriately alter the covariance structure of their prior.
Previous studies have attempted to determine the natural coordinate system used for visuomotor transformations. The dominant paradigm has been to expose subjects to a limited alteration in the visuomotor map and examine generalization to novel locations in the workspace. These studies show that when a single visual location is remapped to a new proprioceptive location, the visuomotor map shows extensive changes throughout the workspace when examined in one-dimensional
To study this covariance structure in the fitted priors, we analyzed both the correlation coefficients between elements of the transformation matrix – as a measure of the strength of the relationship between elements – and the orientation of the covariance ellipses of pairs of elements – as a measure of the slope of the relationship. A strong, significant negative correlation was seen between the off-diagonal elements of the
as this corresponds to
Vetter and colleagues
Importantly, to measure the prior we ensured that the distribution of transformations in the first session was relatively unstructured in the space of the four elements of the transformation matrix, and in particular the distribution of transformations used had only a very small correlation between the off-diagonal elements. Therefore, it is unlikely (particularly given the adaptation results discussed below) that the prior for rotations came about because of the particular set of transformations used in our paradigm.
Our approach of probing a subject's prior with many transformations would be disrupted if the learning of these transformations interfered with each other. Many studies have shown interference between the learning of similar but opposing visuomotor perturbations.
The previous work on visuomotor generalization cited above
Recent studies have shown that when exposed to tasks that follow a structured distribution, subjects can learn this structure and use it to facilitate learning of novel tasks corresponding to the structure.
Previous studies have also demonstrated the ability of people to learn priors over novel sensorimotor tasks. For instance, one study showed that subjects learned a non-zero-mean Gaussian prior over horizontal shifts.
In the current study we have made a number of simplifying assumptions which facilitated our analysis but which we believe could be relaxed in future studies. First, we have analyzed the prior within the Cartesian coordinate system in which the prior is over the elements of the set of
Furthermore, the comparison of different models in this paper (
A further simplifying assumption was that the prior takes on a multivariate Gaussian distribution over elements of the transformation matrix. The true prior could be both nonlinear and non-Gaussian in our parameterization and as such our estimation may be an approximation to the true prior. While it may be possible to develop techniques to find a prior which has more complex structure, such as a mixture of Gaussians, such an analysis would require far more data for the extra degrees of freedom incurred by a more complex model.
Another model assumption is that the subject uses the MAP transformation to choose their hand position. Although it is common for Bayesian decision models to use point estimates of parameters when making decisions, different rules that also take into account the observer's uncertainty over the transformation may better model the data.
Our model was purely parametric, with the observer performing inference directly over the parameters of the transformation matrix. In the future it will be interesting to consider hierarchical observer models which would perform inference over
All eight subjects were naïve to the purpose of the experiments. Experiments were performed using a vBOT planar robotic manipulandum
All subjects gave written informed consent in accordance with the requirements of the Psychology Research Ethics Committee of the University of Cambridge.
In the first session, subjects alternated between making reaching movements under veridical and transformed feedback (see
Transformed trials were the same as veridical trials except that: 1) a linear transformation was applied between the hand's final location and the displayed cursor position, and this transformation was kept fixed within a batch; 2) the position of the visual target (3 cm radius) had to satisfy an added requirement not to overlap the cursor position of the preceding trial; 3) to end a batch, subjects had to complete at least three trials and place the centre of the hand cursor within a target circle; and 4) starting on the eighth trial, a batch could spontaneously terminate with a probability of 0.2 after each trial.
For the transformed trials the cursor position (
The target color, yellow or blue, indicated whether the trial was veridical or transformed respectively. Subjects were told that on ‘blue’ trials the feedback was not of their actual hand position, but was related to their hand position by a rule. Subjects were told to attempt to learn, and compensate for, this rule in order to hit the targets, and that the rule would be constant across trials until they had hit a target and a set of ‘yellow’ trials had begun. They were told that a new rule was chosen each time a new set of blue trials started, and was unrelated to the rule of the previous set.
In the second and third sessions, subjects again alternated between making reaching movements under veridical and transformed feedback. However, in the transformed feedback batches, full-feedback trials were included in which the transformed hand cursor was continuously displayed throughout the trial, in order to speed up learning of the transformations and thus of the distribution of transformations. On these trials the batch did not terminate on reaching the target (1 cm radius) and these trials occurred randomly after the third trial with probability
To sample a transformation from the correlated distribution used in sessions 2 and 3, elements
In Session 1, the transformation on the first trial was also selected from the correlated distribution. This ensured that the distribution of evidence given to the subject on the first trial was consistent across sessions. However, on the second trial of a batch a new transformation consistent with the first-trial evidence was chosen, and then used for this and all remaining trials of the batch. This new transformation is treated in our analysis as if it had been the transformation throughout the batch, since it would have generated the same evidence on the first trial as the transformation from the correlated distribution. The new transformation was chosen such that across batches there were negligible correlations between any pair of elements in the eventual transformation matrices. To achieve this, at the start of the second trial elements
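Our reading of this resampling step can be sketched as follows; the choice of which elements are drawn freely and which are solved from the first-trial constraints is an assumption for illustration, as are the sampling distributions.

```python
import numpy as np

rng = np.random.default_rng(0)

def evidence_matched_transform(A_first, h1, rng):
    """Return a new 2x2 transformation giving the same first-trial cursor
    position c = A_first @ h1. The off-diagonal elements are drawn freely
    (illustrative Gaussians); the diagonal elements are then solved from
    the two first-trial constraint equations:
        a11*h1x + a12*h1y = cx
        a21*h1x + a22*h1y = cy
    """
    c = A_first @ h1
    a12 = rng.normal(0.0, 0.6)
    a21 = rng.normal(0.0, 0.7)
    a11 = (c[0] - a12 * h1[1]) / h1[0]
    a22 = (c[1] - a21 * h1[0]) / h1[1]
    return np.array([[a11, a12], [a21, a22]])

h1 = np.array([0.12, 0.08])                      # first-trial hand position (m)
A_first = np.array([[1.1, 0.3], [-0.2, 0.9]])    # first-trial transformation
A_new = evidence_matched_transform(A_first, h1, rng)
assert np.allclose(A_new @ h1, A_first @ h1)     # identical first-trial evidence
```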
Correlation in uncorrelated distribution | 1.00 | 0.13 | 0.05 | 0.13
 | | 1.00 | −0.09 | 0.03
 | | | 1.00 | 0.01
 | | | | 1.00
S.D. in uncorrelated distribution | 0.64 | 0.62 | 0.72 | 0.53
S.D. in correlated distribution | 0.53 | 0.54 | 0.54 | 0.41
Mean in uncorrelated distribution | 1.12 | 0.01 | −0.01 | 1.07
Mean in correlated distribution | 1.17 | 0.03 | 0.03 | 0.99
Top: statistics of the ‘uncorrelated’ and ‘correlated’ distributions, estimated from the 1130 transforms used in Session 1 and the 1091 transforms used in Session 2 respectively.
Our observer model starts each transformation batch within an experimental session with the same prior probability distribution over transformations. Over the course of each batch, it optimally combines this prior with the evidence shown to the subject, and on each trial uses the updated distribution to select its final hand position.
We vectorize the transformation matrix, i.e.
On any transformed trial
Our aim is to find the prior
since for tractability we model the internal representation of the hand position
We now express the likelihood function in terms of the vectorized transformation matrix (
where
We multiply this Gaussian likelihood with the Gaussian distribution over transformations to give an updated distribution over transformations
where
The observer then takes the MAP estimate of the transformation (
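The underlying computation is a standard conjugate-Gaussian update over the vectorized transformation. The sketch below uses a row-stacked vectorization and illustrative noise and prior values; it is not the fitted model, only the form of the belief update and the MAP reach.

```python
import numpy as np

def update_belief(mu, S, h, c, sigma_v=0.01):
    """One conjugate-Gaussian update of the belief over the row-stacked
    transformation vector theta = (a11, a12, a21, a22).
    Observation model: c = M @ theta + visual noise, with M = kron(I2, h).
    sigma_v is an illustrative visual-noise s.d. (m)."""
    M = np.kron(np.eye(2), h)                         # 2x4 design matrix
    R_inv = np.eye(2) / sigma_v**2                    # observation precision
    S_inv = np.linalg.inv(S)                          # prior precision
    post_cov = np.linalg.inv(S_inv + M.T @ R_inv @ M)
    post_mu = post_cov @ (S_inv @ mu + M.T @ R_inv @ c)
    return post_mu, post_cov

# Prior centred on the identity transformation (illustrative covariance):
mu0 = np.array([1.0, 0.0, 0.0, 1.0])
S0 = np.diag([0.3, 0.3, 0.3, 0.3]) ** 2

# First-trial evidence (illustrative positions, m):
h1, c1 = np.array([0.10, 0.12]), np.array([0.13, 0.09])
theta, _ = update_belief(mu0, S0, h1, c1)

# MAP reach: aim so that the MAP transformation maps the hand onto the target.
A_map = theta.reshape(2, 2)
target = np.array([0.05, 0.15])
h2 = np.linalg.solve(A_map, target)               # second-trial hand position
assert np.allclose(A_map @ h2, target)
```

Note how the posterior pulls the belief toward transformations that explain the first-trial feedback while the prior covariance determines which of the many consistent transformations is favored.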
It can be shown that scaling the visual noise constant,
For a given prior covariance over the elements of the transformation matrix, the model predicts the optimal locations for the reaches on the second trial of each batch (
with
We then optimized the covariance matrix for each subject in each session to minimize the cost. We did this by optimizing the 10 free elements of the
A trust-region-reflective algorithm implemented by the
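In Python terms (the original fitting used Matlab's fmincon), one way to keep the candidate covariance positive semi-definite while optimizing its 10 free elements is to parameterize it by a lower-triangular Cholesky factor. The cost function below is a runnable stand-in, not the actual prediction-error cost used in the fits:

```python
import numpy as np
from scipy.optimize import minimize

def unpack_chol(params):
    """Build a 4x4 covariance from its 10 lower-triangular Cholesky
    parameters, guaranteeing positive semi-definiteness."""
    L = np.zeros((4, 4))
    L[np.tril_indices(4)] = params
    return L @ L.T

def cost(params, data):
    """Stand-in cost: squared difference between the candidate covariance
    and the sample covariance of some data. The real cost was the error
    between modelled and measured second-trial hand positions."""
    return np.sum((unpack_chol(params) - np.cov(data, rowvar=False)) ** 2)

rng = np.random.default_rng(1)
data = rng.multivariate_normal(np.zeros(4),
                               np.diag([0.4, 0.3, 0.3, 0.2]), 500)

x0 = np.eye(4)[np.tril_indices(4)]       # start from an identity factor
res = minimize(cost, x0, args=(data,), method="L-BFGS-B")
S_fit = unpack_chol(res.x)               # fitted prior covariance
```

As in the fitting procedure described above, many such optimizations from random starting points would be run and the best kept.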
A total of 825 simulated datasets were created by sampling random ‘generating’ priors (created in the same way as the random precision matrices used to initiate model fits) and running the model on an artificial experiment with 150 transformations chosen as for the real experiments. Zero-mean Gaussian noise of covariance
The model was fit to each of these datasets by taking the best of 100 fits. These best fits always gave a lower cost than did the generating prior, due to the finite sample size of the artificial data set. Since our analysis of priors concentrates on the covariance orientation angles and correlation coefficients between pairs of elements, we sought to establish that the differences between these statistics in the generating and fitted priors were small. The median absolute difference in covariance angle between the generating prior and the fitted prior was
(A) The distribution of the difference in covariance orientation angle between pairs of elements in the generating and fitted priors, aggregated across all six pairings of elements. (B) The corresponding distribution when random priors are compared. (C) The distribution of the absolute difference in correlation coefficient between pairs of elements in the generating and fitted priors, aggregated across all six pairings of elements. (D) The corresponding distribution when random priors are compared.
The standard Bayesian observer model described above correctly assumes the cursor position to be at a linear transformation of the hand position,
The ‘shift’ model assumes the cursor position to be at a shift of the hand position,
The ‘rotation & scaling’ model assumes transformations to consist of a rotation and uniform scaling. This was implemented in polar coordinates centred on the start position, as a shift by
or in vector form,
The ‘affine transformations’ model is the most general of all, assuming the hand position to be subject to a linear transformation and a shift,
The mean transformation is
in order to restrict the number of free parameters to 13 (rather than a possible 21).
The same trust-region-reflective algorithm as for the standard model was used to fit the affine model. A slower active-set algorithm, also implemented by the fmincon function of Matlab's Optimization Toolbox, was used to fit the shift and rotation & scaling models; the choice of optimization method was not so important when fitting these models, which have fewer parameters.
Models were compared on the basis of errors between the predicted and actual hand positions. These predictive errors were capped at 20 cm to minimize the effect of outliers, then averaged across all transformations within an experimental session, and then across all subjects and sessions. For trials 3–7 of transformed batches, the Bayesian observer models used priors fit to the second trial of all transformation batches. For comparing prediction errors on the second trial itself, 10-fold cross-validation was used so that complex models did not benefit from over-fitting. The transformations experienced by a subject in one session were assigned into 10 non-overlapping and evenly-spaced groups. For example, if the session included 111 transformations, group 1 consisted of transformations 1, 11, 21, ..., 101, 111; group 2 consisted of transformations 2, 12, 22, ..., 92, 102, etc. Second-trial hand positions were predicted for each group using priors fit as normal to the other nine groups.
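The interleaved group assignment and error capping can be sketched directly (the error values below are hypothetical):

```python
import numpy as np

def interleaved_folds(n_transforms, n_folds=10):
    """Assign 1-based transformation indices to evenly spaced,
    non-overlapping groups: group g gets transforms g, g+10, g+20, ..."""
    return {g: list(range(g, n_transforms + 1, n_folds))
            for g in range(1, n_folds + 1)}

folds = interleaved_folds(111)
assert folds[1] == [1, 11, 21, 31, 41, 51, 61, 71, 81, 91, 101, 111]
assert folds[2][-1] == 102

# Prediction errors are capped at 20 cm before averaging within a session:
errors_cm = np.array([3.2, 55.0, 7.9, 120.0])     # hypothetical errors (cm)
mean_capped = np.minimum(errors_cm, 20.0).mean()  # = 12.775
```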