Analyzed the data: MH. Contributed reagents/materials/analysis tools: MH. Wrote the paper: MH DN. Conceived and designed the simulations: MH. Performed the simulations: MH. Mathematically proved the convergence of the equations: MH DN.
The authors have declared that no competing interests exist.
Recent theoretical studies have proposed that the redundant motor system in humans achieves well-organized stereotypical movements by minimizing motor effort cost and motor error. However, it is unclear how this optimization process is implemented in the brain, presumably because conventional schemes have assumed a priori that the brain somehow constructs the optimal motor command, and have largely ignored the underlying trial-by-trial learning process. In contrast, recent studies focusing on the trial-by-trial modification of motor commands based on error information have suggested that forgetting (i.e., memory decay), which is usually considered an inconvenient factor in motor learning, plays an important role in minimizing the motor effort cost. Here, we examine whether trial-by-trial error-feedback learning with slight forgetting could minimize the motor effort and error in a highly redundant neural network for sensorimotor transformation and whether it could predict the stereotypical activation patterns observed in primary motor cortex (M1) neurons. First, using a simple linear neural network model, we theoretically demonstrated that: 1) this algorithm consistently leads the neural network to converge at a unique optimal state; 2) the biomechanical properties of the musculoskeletal system necessarily determine the distribution of the preferred directions (PD; the direction in which the neuron is maximally active) of M1 neurons; and 3) the bias of the PDs is steadily formed during the minimization of the motor effort. Furthermore, using a non-linear network model with realistic musculoskeletal data, we demonstrated numerically that this algorithm could consistently reproduce the PD distribution observed in various motor tasks, including two-dimensional isometric torque production, two-dimensional reaching, and even three-dimensional reaching tasks.
These results may suggest that slight forgetting in the sensorimotor transformation network is responsible for solving the redundancy problem in motor control.
It is thought that the brain can optimize motor commands to produce efficient movements; however, it is unknown how this optimization process is implemented in the brain. Here we examine a biologically plausible hypothesis in which slight forgetting in the motor learning process plays an important role in the optimization process. Using a neural network model for motor learning, we first demonstrated theoretically that motor learning with a slight forgetting factor consistently leads the network to converge at an optimal state. In addition, by applying the forgetting scheme to a more sophisticated neural network model with realistic musculoskeletal data, we showed that the model could account for the reported stereotypical activity patterns of muscles and motor cortex neurons in various motor tasks. Our results support the hypothesis that slight forgetting, which is conventionally considered to diminish motor learning performance, plays a crucial role in the optimization process of the redundant motor system.
The motor system exhibits tremendous redundancy
The hypothesis that the brain selects a solution that minimizes the cost of movement has long been proposed
It should be noted that these conventional optimization studies tacitly assume that the brain somehow constructs a motor command that theoretically minimizes the cost function, and largely ignore the underlying trial-by-trial learning process
However, it is unknown whether the decay algorithm could minimize the cost (
To gain insight into these mechanisms, we conducted computer simulations of motor learning by applying the “feedback-with-decay” algorithm to a redundant neural network model for sensorimotor transformation. First, we used a simple linear model to gain a firm theoretical understanding of the effect of the decay on the minimization of the cost (
As a simple example of a redundant motor task, we considered a task that requires the production of torque in a two-joint system with redundant actuators (
First, we considered the case where the synaptic weights are solely modified to reduce the error, according to the following equation:
Trial-dependent changes in the magnitude of error (
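Since the update equation itself is not reproduced here, the following is only a minimal sketch of such an error-feedback ("feedback-only") rule for a linear network; the fixed output map `A`, the learning rate `alpha`, and the network sizes are illustrative assumptions, not the paper's exact equation or parameters.

```python
import numpy as np

# Minimal sketch of a "feedback-only" update (illustrative; A, alpha, and the
# network sizes are assumptions, not the paper's exact equation or parameters).
# Intermediate activity: u = W @ x (x: desired torque); output torque: y = A @ u.
rng = np.random.default_rng(0)
n_neurons, n_dim = 1000, 2
A = rng.normal(size=(n_dim, n_neurons)) / np.sqrt(n_neurons)  # fixed output map
W = rng.normal(size=(n_neurons, n_dim)) * 0.01                # learned synaptic weights
alpha = 0.05                                                  # learning rate

for _ in range(2000):
    x = rng.normal(size=n_dim)          # desired torque on this trial
    e = x - A @ W @ x                   # motor error
    W += alpha * A.T @ np.outer(e, x)   # gradient step on the squared error

print(np.linalg.norm(x - A @ W @ x))    # error approaches zero over trials
```

Under this rule the error is driven toward zero, but the final weights depend on the initial weights, since nothing in the update penalizes weight (effort) magnitude.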
However, the situation was considerably different when modification of the synaptic weights based on error feedback was not perfect, but incorporated
In mathematical terms, the modification of the synaptic weights based on the feedback-with-decay rule (Eq. (3)) is similar to the gradient descent rule for minimizing the cost function
Furthermore, we have also proven that the synaptic weight matrix (
The above results indicate three important points regarding the “feedback-with-decay” rule. First, the optimal solution can be obtained using only
Another interesting observation regarding the formation of the bias of the PDs is that when the initial synaptic weight is relatively small (see cyan trace in
In summary, in the linear neural network model, the “feedback-with-decay” rule consistently leads to the optimal synaptic weight and the optimal PD bias, whereas the “feedback-only” rule only predicts the approximate direction of the optimal PD bias in limited conditions.
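The convergence claim can be checked numerically. In the sketch below (illustrative parameters; Gaussian target torques are used instead of the paper's fixed target set), the feedback-with-decay rule reaches essentially the same weights from two different initializations, and those weights match the unique ridge-regularized solution, i.e., the minimizer of squared error plus a small effort cost.

```python
import numpy as np

# Numerical check (illustrative parameters; Gaussian targets are an assumption):
# the feedback-with-decay rule converges to a unique state, independent of the
# initial synaptic weights, that matches the ridge solution
#   W* = A.T @ inv(A @ A.T + (gamma/alpha) * I).
rng = np.random.default_rng(1)
n_neurons, n_dim = 1000, 2
A = rng.normal(size=(n_dim, n_neurons)) / np.sqrt(n_neurons)  # fixed output map
alpha, gamma = 0.05, 0.001                                    # learning / decay rates

def train(W, n_trials=10000):
    for _ in range(n_trials):
        x = rng.normal(size=n_dim)                  # desired torque on this trial
        e = x - A @ W @ x                           # motor error
        W = (1 - gamma) * W + alpha * A.T @ np.outer(e, x)  # feedback with decay
    return W

W1 = train(rng.normal(size=(n_neurons, n_dim)) * 0.01)
W2 = train(rng.normal(size=(n_neurons, n_dim)) * 0.05)
W_star = A.T @ np.linalg.inv(A @ A.T + (gamma / alpha) * np.eye(n_dim))

print(np.linalg.norm(W1 - W2), np.linalg.norm(W1 - W_star))  # both close to zero
```

As the decay-to-learning ratio shrinks toward zero, this ridge solution approaches the minimum-norm (pseudo-inverse) solution, which is why slight forgetting suffices to pick out the effort-minimizing weights.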
Next, we examined whether these aspects hold true in non-linear neural network models that additionally include a muscle layer whose activity (
First, each corticospinal neuron receives the desired movement parameters from the input layer, and its firing rate obeys cosine tuning
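As an illustration of cosine tuning (the baseline and modulation depth below are hypothetical values, not fitted data):

```python
import numpy as np

# Cosine tuning of a model neuron: firing is maximal along the preferred
# direction (PD) and rectified at zero. b (baseline) and k (modulation depth)
# are hypothetical values for illustration.
def firing_rate(theta, pd, b=10.0, k=8.0):
    return np.maximum(0.0, b + k * np.cos(theta - pd))  # spikes/s, non-negative

directions = np.deg2rad(np.arange(0, 360, 45))        # 8 movement directions
rates = firing_rate(directions, pd=np.deg2rad(90.0))  # neuron with PD at 90 deg
print(rates.round(1))  # peak of 18.0 at 90 deg, trough of 2.0 at 270 deg
```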
First, we simulated the isometric torque production task with a two-joint system (shoulder and elbow) conducted by Herter et al.
Trial-dependent changes in the magnitude of error (
Interestingly, the predicted PD distribution (
Thus, error-based learning with slight forgetting seems to predict the non-uniform PD distribution of M1 neurons; however, what happens if forgetting is not slight? Theoretical considerations suggest that a relatively large decay rate leads the system to assign much more weight to minimizing the motor effort cost (
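For the linear case this trade-off can be shown in closed form: the steady state of the feedback-with-decay rule is a ridge solution, and sweeping the decay-to-learning ratio shifts weight from error reduction toward effort reduction. The matrix `A` and the ratio values below are illustrative assumptions.

```python
import numpy as np

# Illustrative error/effort trade-off of the decay rate: larger ratios of
# decay to learning rate (lam) shrink the effort cost at the price of error.
# A and the lam values are stand-ins, not the paper's parameters.
rng = np.random.default_rng(2)
A = rng.normal(size=(2, 1000)) / np.sqrt(1000)

results = {}
for lam in (1e-3, 1e-1, 1.0):
    W = A.T @ np.linalg.inv(A @ A.T + lam * np.eye(2))  # steady-state weights
    err = np.linalg.norm(np.eye(2) - A @ W)             # residual torque error
    effort = np.linalg.norm(W) ** 2                     # effort (weight-norm) cost
    results[lam] = (err, effort)
    print(f"lam={lam:g}  error={err:.3f}  effort={effort:.3f}")
```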
Next, we examined whether the weight decay rule can predict the characteristic bias of the PD distribution of M1 neurons observed during the reaction time period before reaching movements. Since the activity of M1 neurons just before reach initiation would reflect the activity necessary to produce the initial acceleration, we focused on the initial ballistic phase of a reaching movement. To mimic the initial phase, we modified the network by replacing the “desired torque” in
First, we simulated the reaching task with a two-joint system in a horizontal plane described by Scott et al.
The model was further extended to 3D reaching movements.
It has long been hypothesized that well-organized stereotypical movements are achieved by minimizing the cost (
A small number of previous studies have proposed a mechanism for how the cost of the motor effort is minimized in the brain on a trial-by-trial basis. Kitazawa
In contrast, recent studies have suggested that forgetting might be useful to minimize the motor effort
The present study further applied the “feedback-with-decay” algorithm to the sensorimotor transformation network, which includes M1 neurons. We initially used a linear neural network and theoretically derived the necessary conditions for convergence on the optimal state. Importantly, these conditions seem to be satisfied in the actual brain. First, the decay rate is known to be much smaller than the learning rate
The “feedback-with-decay” rule can be considered biologically plausible in that it does not need to explicitly calculate the sum of the squared neural activity (total effort cost) by gathering activity information from a vast number of neurons. Since weight decay in each synapse could occur independently of other synapses, a global summation across all neurons would not be needed. Using a framework of weight decay, it would be possible for the CNS to minimize even the motor effort cost during movement of the whole body. One may argue that since we perceive tiredness, the brain must compute the total energetic cost (or motor effort cost); however, to the best of our knowledge, individual neurons that encode the total energetic cost have not been discovered. It is rather likely that such a physical quantity is represented by a large number of distributed neurons in the brain, and this distributed information may be perceived as tiredness. Since it is unclear whether the total energetic cost could be read out from such distributed information, decay would be a more promising mechanism for minimizing motor effort. Furthermore, our simulation results indicate that the formation of an optimal PD distribution pattern for M1 neurons was not necessarily accompanied by the realization of a nearly optimal muscle activation pattern (compare
Although we referred to the “feedback-with-decay” algorithm as biologically plausible, it should be noted that our simulation algorithm is not fully biologically plausible because it still depends on an artificial calculation (i.e., error back-propagation). Although it is well established that error information is available to the cerebellum
The important point of the present study is that we theoretically proved that the “feedback-with-decay” rule consistently leads the PDs of M1 neurons to converge at a distribution that is orthogonal to the MD distribution. Although Guigon et al.
Importantly, the non-linear model combined with the realistic musculoskeletal parameters can reproduce the non-uniform PD distribution of M1 neurons observed during various motor tasks. The origin of the PD bias has been a hotly debated topic in neurophysiology
Another interesting finding is that even the “feedback-only” rule approximately predicts the skewed PD distribution of M1 neurons if the following two conditions are satisfied: a large number of neurons participate in the task (condition #2) and the initial synaptic weight is considerably smaller than the pseudo-inverse matrix (
According to our mathematical consideration, the weight decay rate must be substantially lower than the learning rate (see Supporting
The present scheme also implies that motor learning has two different time scales: a fast process associated with error correction and a slow process associated with optimizing efficiency through weight decay (
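This separation of time scales can be seen in a simple simulation of the linear rule (illustrative parameters): the error collapses within a few hundred trials at a rate set by the learning rate, while the effort cost keeps drifting down on the much slower decay time scale.

```python
import numpy as np

# Two time scales in the feedback-with-decay rule (illustrative parameters):
# fast error correction (rate ~alpha) versus slow effort reduction (rate ~gamma).
rng = np.random.default_rng(3)
A = rng.normal(size=(2, 1000)) / np.sqrt(1000)
W = rng.normal(size=(1000, 2)) * 0.05
alpha, gamma = 0.05, 0.0005

errors, efforts = [], []
for _ in range(4000):
    x = rng.normal(size=2)                 # desired torque on this trial
    e = x - A @ W @ x
    W = (1 - gamma) * W + alpha * A.T @ np.outer(e, x)
    errors.append(np.linalg.norm(e))
    efforts.append(np.linalg.norm(W) ** 2)

# Error has already converged by trial 500; effort is still decreasing slowly.
print(np.mean(errors[500:510]) / np.mean(errors[:10]), efforts[500] / efforts[-1])
```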
Due to its simplicity, our model provided clear insights into the role of weight decay in optimization; however, it naturally has several limitations. First, the model considered only corticospinal neurons, although M1 also includes inhibitory interneurons. However, it is noteworthy that our model could predict the PD distribution of M1 neurons recorded from non-human primates, suggesting that most of the neurons recorded in previous experiments were corticospinal neurons. Indeed, considering the large size of corticospinal pyramidal neurons, it is likely that the chance of recording these neurons is relatively high because stable isolation over an extended period of time is required in such experiments
Second, a uniform distribution was assumed for the neuron-muscle connectivity (
Third, the model only considered static tasks (i.e., isometric force production) and an instantaneous ballistic task (i.e., the initial phase of the reaching movement). Such a single time point model is unrealistic for reaching movements in that it ignores the change of limb posture, posture-dependent changes in the muscle moment arms, multi-joint dynamics during motion, and the deceleration phase. This limitation prevents us from predicting the essential features of movement such as trajectory formation and online trajectory correction
First, we used a linear neural network to transform the desired torque (input layer) into the actual torque (output layer) through an intermediate layer that consisted of 1000 neurons (
The network was trained to produce the appropriate output torque by randomly presenting 8 target torques (
In the feedback-only rule, the synaptic weight
The procedures in the feedback-with-noise rule were the same as in the feedback-only rule, except that signal-dependent noise (SDN) was added to the actuator activity and to the synaptic modification. The activation of each actuator was determined by:
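Since the activation equation is not reproduced here, the following is only a generic sketch of signal-dependent noise, assuming (as in many motor-control models) that the noise standard deviation scales with the magnitude of the mean command; the coefficient is an illustrative value.

```python
import numpy as np

# Generic sketch of signal-dependent noise (SDN): noise standard deviation
# proportional to the magnitude of the command. The coefficient c = 0.2 is an
# illustrative assumption, not the paper's value.
rng = np.random.default_rng(4)

def actuator_output(u, c=0.2):
    """Noisy activation: larger commands receive proportionally larger noise."""
    return u + c * np.abs(u) * rng.normal(size=np.shape(u))

u = np.array([0.1, 1.0, 5.0])
samples = np.array([actuator_output(u) for _ in range(10000)])
print(samples.std(axis=0))  # standard deviation grows roughly as 0.2 * |u|
```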
In the feedback-with-decay rule, the synaptic weight
The initial synaptic weights were set to random values as follows:
To confirm the effectiveness of weight decay in a more realistic model, we also considered a neural network model with a muscle layer whose activity (
Using realistic muscle data, we modeled a 2D upper limb that had 2 degrees of freedom (DOF; shoulder and elbow joints) with 26 muscle elements (
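A minimal sketch of such a muscle layer follows; random moment arms stand in for the realistic musculoskeletal data used in the study.

```python
import numpy as np

# Minimal muscle-layer sketch: joint torque equals the moment-arm matrix times
# non-negative muscle activations. The random moment arms are illustrative
# stand-ins for the realistic musculoskeletal data used in the study.
rng = np.random.default_rng(5)
n_joints, n_muscles = 2, 26               # shoulder/elbow, 26 muscle elements

moment_arms = rng.normal(size=(n_joints, n_muscles))

def torque(activations):
    a = np.maximum(0.0, activations)      # muscle activity cannot be negative
    return moment_arms @ a

tau = torque(rng.uniform(0.0, 1.0, n_muscles))
print(tau.shape)  # (2,): shoulder and elbow torque
```

The rectification is what makes this layer non-linear: unlike the linear model, a muscle can only pull, so torque in a given direction must be composed from a redundant set of non-negative activations.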
The network model can also be applied to the task of producing the linear acceleration of the fingertip (i.e., the initial phase of the reaching movement) by replacing the torque in
We further extended the model to 3D reaching movements. We modeled a 3D upper limb with 4 DOF; (3 DOF for the shoulder and 1 DOF for the elbow) with 26 muscle elements (
For the 3D simulation, 14 equally spaced targets (
the probability of appearance was equal for all 14 targets;
the probability for targets #1 and #3 was 8/28 (1/28 for the other targets);
the probability for targets #2 and #4 was 8/28 (1/28 for the other targets);
the probability for targets #5 and #6 was 8/28 (1/28 for the other targets).
In total, we conducted 20 simulations (5 initial weights × 4 probability conditions).
To examine the significance of the bimodal distribution obtained from the simulation, we performed the Rayleigh test for uniformity against a bimodal alternative (
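A sketch of such a test (assuming the standard angle-doubling transform for axial/bimodal data followed by the ordinary Rayleigh statistic, with the first-order p-value approximation):

```python
import numpy as np

# Rayleigh test against a bimodal alternative via angle doubling (a standard
# transform for axial data): a bimodal distribution with modes 180 deg apart
# becomes unimodal after doubling, so the usual Rayleigh statistic applies.
def rayleigh_bimodal(angles):
    a = 2.0 * np.asarray(angles)                        # angle doubling
    n = a.size
    r = np.hypot(np.cos(a).sum(), np.sin(a).sum()) / n  # mean resultant length
    z = n * r ** 2                                      # Rayleigh statistic
    return r, np.exp(-z)                                # first-order p-value

rng = np.random.default_rng(6)
bimodal = np.concatenate([rng.normal(0.0, 0.3, 100),
                          rng.normal(np.pi, 0.3, 100)])  # two modes 180 deg apart
r, p = rayleigh_bimodal(bimodal)
print(p)  # very small p: uniformity rejected in favor of bimodality
```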