The authors have declared that no competing interests exist.
Conceived and designed the experiments: CK SS AG. Performed the experiments: CK SS. Analyzed the data: CK SS GS AG. Contributed reagents/materials/analysis tools: CK SS. Wrote the paper: CK SS GS AG.
According to a prominent view of sensorimotor processing in primates, selection and specification of possible actions are not sequential operations. Rather, a decision for an action emerges from competition between different movement plans, which are specified and selected in parallel. For action choices which are based on ambiguous sensory input, the frontoparietal sensorimotor areas are considered part of the common underlying neural substrate for selection and specification of action. These areas have been shown capable of encoding alternative spatial motor goals in parallel during movement planning, and show signatures of competitive value-based selection among these goals. Since the same network is also involved in learning sensorimotor associations, competitive action selection (decision making) should not only be driven by the sensory evidence and expected reward in favor of either action, but also by the subject's learning history of different sensorimotor associations. Previous computational models of competitive neural decision making used predefined associations between sensory input and corresponding motor output. Such hard-wiring does not allow modeling of how decisions are influenced by sensorimotor learning or by changing reward contingencies. We present a dynamic neural field model which learns arbitrary sensorimotor associations with a reward-driven Hebbian learning algorithm. We show that the model accurately simulates the dynamics of action selection with different reward contingencies, as observed in monkey cortical recordings, and that it correctly predicted the pattern of choice errors in a control experiment. With our adaptive model we demonstrate how network plasticity, which is required for association learning and adaptation to new reward contingencies, can influence choice behavior. 
The field model provides an integrated and dynamic account for the operations of sensorimotor integration, working memory and action selection required for decision making in ambiguous choice situations.
Decision making requires the selection between alternative actions. It has been suggested that action selection is not separate from motor preparation of the corresponding actions, but rather that the selection emerges from the competition between different movement plans. We expand on this idea, and ask how action selection mechanisms interact with the learning of new action choices. We present a neurodynamic model that provides an integrated account of action selection and the learning of sensorimotor associations. The model explains recent electrophysiological findings from monkeys' sensorimotor cortex, and correctly predicted a newly described characteristic pattern of their choice errors. Based on the model, we present a theory of how geometrical sensorimotor mapping rules can be learned by association without the need for an explicit representation of the transformation rule, and how the learning history of these associations can have a direct influence on later decision making.
Actions beyond simple reflexes are generally not the direct consequence of a sensory input. Instead, the association of a specific sensory input with an appropriate action has to be learned from experience, and depends on the behavioral context. Often these context-dependent associations can be described in terms of a general mapping rule. In most situations, subjects can choose among more than one associated action. This requires a process for action selection, a form of decision making. We propose that a reward-based learning mechanism for forming new sensorimotor associations is integrated in the action selection system. Through this integration in a common neural substrate, the learning history directly influences the decision process.
While traditional psychological theories tended to view decision making as the outcome of a higher cognitive process which is separate from perception and action
We present a dynamic neural field (DNF) model that unifies the processes of sensory integration, working memory formation, associative learning and action selection in a context-dependent mapping task. The model implements a reward-driven Hebbian mechanism that allows it to learn simple associative sensorimotor mappings from reward history. The model selects from a continuum of ‘behavioral’ options through an integrated competition process between potential action plans. This framework reflects the conceptual idea of integrated sensorimotor and decision processing
With this model, first, we explain adaptive decision behavior and its neural underpinnings in tasks which require rule-based selection of spatial motor goals. As an example, we use the model to mimic the behavioral and neural findings of a previous monkey experiment. In this experiment, the authors investigated the preparatory neural activity in situations in which, in response to an ambiguous visual cue, one of two potential motor goals could be ‘freely’ chosen according to two different spatial transformation rules
Our results provide support for two assumptions, which are more general than the specific examples for which we directly demonstrate the suitability of our approach. The first assumption regards the neural mechanism underlying context-specific “rule-based” spatial remapping in visuomotor tasks. It is in general unclear if rules that can be derived by abstraction from concrete examples are encoded as such in the monkey brain, or if instead the brain stores the individual underlying associations that constitute the rule. We propose that spatial mapping rules are learned, at least in our monkey experiments, by local associations. The nature of local associations limits the ability to generalize a mapping rule and imposes interactions between novel cues and already trained cue locations, which lead to specific patterns of choice errors. The second assumption regards the interaction of sensorimotor learning with adaptive choice behavior in action selection tasks. We propose that the same reward-driven Hebbian learning mechanism which allows learning of arbitrary stimulus-response mappings also contributes to adapting the choice behavior to changing probabilistic reward contingencies in a free-choice task, in addition to other biasing factors for adapting choice behavior. As an inevitable consequence, the learning and reward history influences the decision process, and biases the behavior in free-choice situations.
This study was granted permission to carry out experiments on vertebrates by the Niedersächsisches Landesamt für Verbraucherschutz und Lebensmittelsicherheit, No 33-9-42502-047-064/07. All animal work was conducted according to the German Animal Welfare Act and all experiments were conducted in conformity with the European Communities Council Directive of November 1986 (86/609/EEC).
Our approach addresses sensorimotor association learning and decision making in situations in which context-dependent remapping of a spatial sensory (e.g. visual) location onto different motor (e.g. reach) goals is required, and the mapping is achieved according to geometric transformation rules. Different variants of the task were employed in previous studies
At the beginning of a trial, either a single spatial cue (PMG task, B) or a spatial and a contextual cue (DMG task, A) was presented, indicated by a white circle (spatial cue) and a colored rectangle (contextual cue). During the memory period no cue was shown. The ‘go’-signal instructed the subject to make a reach movement towards the goal, which was either at the same location as the spatial cue (direct trial; green) or at the diametrically opposite location (inferred trial; blue). In one part of the PMG trials the contextual cue was presented at the end of the memory period (PMG-CI), and in another part no contextual cue was shown at all (PMG-NC) and a free choice had to be made (see
We use a model architecture that consists of multiple dynamic neural fields (DNF) to capture the neural processes underlying cue perception, working memory for visual locations, movement plan formation, and movement initiation (
(A) The model consists of four interconnected DNFs and a set of dynamic nodes. The spatial input field, motor preparation field, and motor field are one-dimensional fields that span the space of possible spatial cue/reach directions. The two-dimensional association field is defined over this directional space as well as a second dimension along which selectivity for the contextual cue develops. Its activation is shown color coded (red highest, blue lowest activation). The activation of the two context nodes is shown as a bar plot. Fixed projections between the fields are shown as white arrows; variable projections (that are subject to learning) are shown through dark red arrows with a weight matrix W. (B) Lateral interactions in DNFs, shown exemplarily for the motor preparation field. Exogenous input from other fields (indicated by grey arrows at the bottom) locally increases activation (red). Regions of high activation produce an output signal (the soft threshold of the sigmoid output function is indicated by the dashed line), which acts on other parts of the field and is also projected to other fields of the architecture. The lateral interactions consist of local excitatory connections and surrounding inhibitory connections, which together implement a soft competition between distant field regions. This creates a selection property in the field, promoting the formation of a single peak even for multi-modal input.
The model is largely pre-structured in its inter-field connectivity (white projection arrows in
Selecting the spatial stimulus location as ‘default’ motor plan; this is realized by a direct topological connectivity between a spatial sensory input field and the motor-related fields (
Selecting one reach plan out of several alternatives; this is realized by lateral inhibition and a winner-take-all dynamic (
Memorizing the last stimulus location in absence of the stimulus; this is realized by local excitation which can form self-sustained peaks of activation in DNFs (
These basic functions allow the model to produce a memory-guided reach directly towards a previously cued goal position as a default behavior. In addition to this, the model must be flexible enough to learn different spatial mappings from the spatial sensory input onto the motor output. This is achieved through additional plastic connectivity (red connections in
DNFs describe neural activation patterns through the evolution of continuous activation distributions over time, emphasizing the role of attractor states and instabilities
Here,
The field output is close to zero for low activation levels, rises around a soft threshold (arbitrarily placed at zero), and saturates for higher activations. The specific pattern of lateral interactions promotes the formation of localized peaks of activation as attractor states of the field dynamics (
Depending on the interaction parameters (see
For numerical simulations, the conceptually continuous field dynamics have to be discretized in space and time. To perform comparisons with electrophysiological data, the field output at one point in the field is equated to the firing rate of neurons with a corresponding selectivity profile. Evidence for a cortical organization that supports the neural field dynamics has been shown by
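The discretization described above can be illustrated with a minimal sketch. It assumes the standard Amari-type field equation, τ du/dt = −u + h + s(x) + Σ w(x − x′) f(u(x′)), integrated with a forward Euler step over a circular direction space. The kernel shape (Gaussian local excitation minus a constant, global inhibition), all parameter values, and the names `simulate_field`, `sigmoid` and `bump` are illustrative assumptions and are not taken from the model described in this paper.

```python
import math

def sigmoid(u, beta=4.0):
    """Soft-threshold output: near 0 for u << 0, 0.5 at u = 0, near 1 for u >> 0."""
    return 1.0 / (1.0 + math.exp(-beta * u))

def simulate_field(n=36, steps=600, tau=20.0, dt=1.0, h=-2.0,
                   c_exc=3.0, sigma_exc=10.0, c_inh=2.5):
    """Euler-integrate a 1-D circular field driven by two competing inputs.

    All parameter values are hypothetical and chosen for illustration only.
    """
    dx = 360.0 / n                       # spatial sampling step in degrees
    positions = [i * dx for i in range(n)]

    def circ_dist(a, b):
        d = abs(a - b) % 360.0
        return min(d, 360.0 - d)

    # Lateral kernel indexed by offset: local excitation minus global inhibition.
    kernel = [c_exc * math.exp(-circ_dist(0.0, k * dx) ** 2 / (2.0 * sigma_exc ** 2)) - c_inh
              for k in range(n)]

    def bump(x, center, amp, width=15.0):
        return amp * math.exp(-circ_dist(x, center) ** 2 / (2.0 * width ** 2))

    # Bimodal input: a stronger bump at 90 deg and a weaker one at 270 deg.
    stimulus = [bump(x, 90.0, 4.0) + bump(x, 270.0, 3.5) for x in positions]

    u = [h] * n                          # start at the resting level
    for _ in range(steps):
        out = [sigmoid(v) for v in u]
        lateral = [sum(kernel[(i - j) % n] * out[j] for j in range(n))
                   for i in range(n)]
        # Forward Euler step of tau * du/dt = -u + h + s + lateral
        u = [u[i] + (dt / tau) * (-u[i] + h + stimulus[i] + lateral[i])
             for i in range(n)]
    return positions, [sigmoid(v) for v in u]

positions, output = simulate_field()
winner = positions[max(range(len(output)), key=output.__getitem__)]
print("strongest output at", winner, "deg")
```

With these assumed parameters, the field should settle on a single peak at the location of the stronger input while the output at the weaker input location stays below the soft threshold, which illustrates the selection property for multi-modal input.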
The dynamic model for context-dependent reaching consists of a set of interconnected DNFs and discrete nodes that can be organized into four levels: Perception (spatial and context input fields), memory and association (association field), movement planning (motor preparation field) and movement initiation (motor field), which are shown in
A direct pathway from the
The motor preparation field in turn reciprocally projects to the
An additional indirect pathway from the spatial input to the motor preparation field runs through the
The association field receives a second input from a set of two
(A, C, E, G) Weight difference matrix from the context input nodes to the association field. The color at each point of the field indicates the difference of the weights from the inferred context input node and the direct context input node to that point in the association field. In the untrained network, weight differences are randomly distributed around 0 without any spatial pattern (A). Over the course of IR training, distinct areas sensitive for direct or inferred context input evolve at the trained spatial positions (C, E, G). (B, D, F, H) Index shift in the projection from the association field to the motor preparation field (difference between spatial position of a point in the association field and the position in the motor preparation field to which it projects most strongly). In the beginning each point in the same spatial column preferably connects to the corresponding spatial position in the motor preparation field (B, index shift = 0°). After IR training those areas which prefer the inferred context input preferably connect to the opposite spatial position in the motor preparation field, corresponding to an index shift of about 180° (D, F, H).
When a peak has formed in the association field, it remains stable even without exogenous inputs due to strong self-excitation. This peak provides a second input to the motor preparation field. The projection from the association to the motor preparation field is initially topologically organized along the spatial dimension, so that it supports a delayed reach movement to the memorized location of a previously presented spatial cue, but it is likewise subject to learning.
The projections from the context neurons to the association field and from the association field to the motor preparation field are adapted according to a reward-driven Hebbian learning rule
The learning rules are applied once for each trial after the system has selected a response, which is defined as a sufficiently strong peak in the motor field (smoothed field output at one position exceeds a threshold
The connection weights from the context neurons to the association field are updated according to the reward-dependent instar rule:
Here,
With the instar learning rule, only those neurons in the association field that are active during a trial adapt their incoming connection weights from the context nodes. In the case of a positive reward signal, the weights of these neurons are adapted in such a way that the weight patterns become more similar to the current output pattern of the set of context nodes. The neurons whose weights have been adapted will be driven more strongly if the same output pattern of the context nodes appears again in subsequent trials, and will receive proportionally less input from different output patterns of the context nodes. Note that there is no normalization on the presynaptic side, such that multiple regions in the association field can form preference for the same context input without competition between them. This means that the instar rule supports development of divergent projections from the context nodes to the association field.
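The behavior described in this paragraph can be sketched with the generic textbook form of a reward-gated instar rule, Δw_ik = η · r · a_i · (c_k − w_ik), where a_i is the output of association-field neuron i and c_k the output of context node k. This form, the learning rate `eta`, and the function name are assumptions made for illustration; the exact equation and parameters of the model may differ.

```python
def instar_update(weights, context_out, assoc_out, reward, eta=0.1):
    """Reward-gated instar step (assumed generic form).

    weights[i][k]: weight from context node k to association neuron i.
    Only rows of active postsynaptic neurons (assoc_out[i] > 0) change; with
    positive reward they move toward the current context output pattern.
    """
    return [[w + eta * reward * a_i * (c_k - w)
             for w, c_k in zip(row, context_out)]
            for row, a_i in zip(weights, assoc_out)]

# Two association neurons, two context nodes (toy values).
w = [[0.5, 0.5],
     [0.5, 0.5]]
context = [1.0, 0.0]       # e.g. only the first context node is active
assoc = [1.0, 0.0]         # only the first association neuron is active
w_new = instar_update(w, context, assoc, reward=1.0)
print(w_new)
```

In this toy call, the incoming weights of the active neuron move toward the context pattern (the first weight grows, the second shrinks), and the weights of the inactive neuron stay unchanged. Because each postsynaptic neuron is updated independently, several neurons can acquire the same context preference.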
The weights from the association field to the motor preparation field are adapted according to the reward-dependent outstar rule:
Analogously,
With the outstar rule, the normalization of the weights is reversed compared to the instar rule. Again, weights are only adapted for those neurons in the association field that are active in a given trial (these are now the presynaptic neurons). If the reward signal is positive, outgoing weights of these neurons are adapted in such a way that the weight patterns become more similar to the postsynaptic output pattern in the motor preparation field, which reflects the actually performed reach. In the case of a failed trial with a negative reward signal, the connections from the active regions in the association field to the active region in the motor preparation field are weakened and the projections to all inactive regions are strengthened. This increases the probability that a different motor response is chosen in the next trial with the same conditions. Due to the normalization in this learning rule, each region in the association field can only strongly support a single motor response, but different regions may support the same response without competition between them. This means that the outstar rule supports development of convergent projections from the association field to the motor preparation field.
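The effect of a rewarded versus a failed trial can be illustrated with the generic textbook form of a reward-gated outstar rule, Δw_ji = η · r · a_j · (m_i − w_ji), where a_j is the output of the presynaptic association-field neuron j and m_i the output at motor-preparation position i. As with any such sketch, this form, the learning rate `eta`, and the function name are assumptions; the model's own equation may differ in detail.

```python
def outstar_update(weights, assoc_out, motor_out, reward, eta=0.1):
    """Reward-gated outstar step (assumed generic form).

    weights[j][i]: weight from association neuron j to motor-preparation
    position i. Only rows of active presynaptic neurons (assoc_out[j] > 0)
    change. Positive reward pulls them toward the motor output pattern,
    negative reward pushes them away from it.
    """
    return [[w + eta * reward * a_j * (m_i - w)
             for w, m_i in zip(row, motor_out)]
            for row, a_j in zip(weights, assoc_out)]

# One active association neuron projecting to two motor positions (toy values).
w = [[0.5, 0.5]]
assoc = [1.0]
motor = [1.0, 0.0]         # the reach was made to the first position

rewarded = outstar_update(w, assoc, motor, reward=1.0)
failed = outstar_update(w, assoc, motor, reward=-1.0)
print(rewarded, failed)
```

In the rewarded case the weight to the chosen position grows and the other shrinks. In the failed case the signs flip: the projection to the chosen position is weakened and the projection to the inactive position is strengthened, which makes a different response more likely on the next trial.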
In the first step, we will use our model to reproduce and explain the behavioral and neural observations of a previous monkey experiment, in which the authors investigated neural selectivity in the frontoparietal cortex during selection of rule-based spatial motor-goals
Due to the pre-structuring, our model by default (without further training) produces ‘direct’ reaches, i.e., reaches to the spatial cue location. We trained the model to perform both direct and inferred reaches depending on the context (
This task was used as a control condition to test whether the model had properly learned to make direct and inferred choices depending on the context during IR training. In the DMG task the spatial and the contextual cue were presented simultaneously at the beginning of the memory period (
This task was used to examine the ongoing decision making process in situations with incomplete information (partial pre-cueing). It was a variation of the DMG task in which the spatial and the contextual cues were separated in time (
We used the PMG-NC task to test the free-choice behavior of the model. PMG-NC trials were identical to PMG-CI trials, except that no context instruction was shown at all. In this case, two different reward schedules determined which trials were rewarded and which were not (see below). When the model was trained with PMG-NC trials, these were randomly interspersed with PMG-CI trials (PMG-NC:PMG-CI ratio 40∶60), equivalent to the monkey experiment.
In the PMG-NC trials, two algorithms determined which of the two potential motor goals would be rewarded. In the
The monkey behavioral and neuronal data which we refer to in this study are taken from a previous electrophysiological study and are described in detail elsewhere
During the IR training task the model acquired the initially unknown inferred mapping rule, in addition to the default direct mapping. The model forms the required stimulus-response associations in the following way: The spatial cue induces an activation peak at the corresponding location in the association field, and at the same time in the motor preparation field (via the direct pathway). The association field peak remains self-sustained after the input disappears, and keeps supporting the activation in the motor preparation field, due to the a-priori topology of this projection (
The IR-trained model was then tested in the DMG task. It reached a performance of 99% (n = 4000). This successful training confirms that the model can perform both the direct and inferred reach in a flexible, context-dependent manner, by re-learning local associations. To test whether this architecture and its integrated learning mechanism can solve a more general class of tasks, we also tested the system with a larger number of different contexts, all indicating different mapping rules. For three different contexts (with associated rotations of 0°, 180°, and 90°), the model still reached a performance of 94% in the DMG task after an analogous training procedure. For four contexts (indicating required rotations of 0°, 180°, 90° and 270°), a performance of 90% was reached (n = 4000). The decrease in performance for a higher number of different contexts is a consequence of interference between different context preferences in the association field: If the number of contexts becomes too high, the context-specific regions that form during learning are no longer cleanly separated, and the corresponding projections to the motor preparation field do not form correctly. This limitation could be overcome in the model either by making the interactions in the association field sharper (decreasing the kernel width in the context dimension) or by increasing the field size along the context dimension. In a biological system, the former would correspond to a sharpening of tuning properties of the neurons and the latter would correspond to the recruitment of a larger number of neurons for the association task.
A mechanism which learns spatial transformations via local associations instead of global geometrical rules is limited in its ability to generalize to new cue locations. We tested the generalization limits of our model and compared them to those of a monkey that performed the same task. The model was IR-trained with four spatial cue locations (e.g. the four cardinal directions) as described before. The model was then tested with four novel cue locations at positions between the trained locations (oblique directions). The model was unable to fully generalize the task to the new locations. Importantly, the model made specific goal selection errors (
Reaches performed by monkey (black) and model (white) were analyzed when generalizing from cardinal to oblique spatial cue directions. Bars show proportion of reaches in a direction relative to the rewarded goal (this means, 0° reaches are directed towards the correct goal, all others are failed reaches). Direct reaches to cardinal (A) and oblique (C) goals are almost always performed correctly. Inferred reaches to trained (cardinal) goals (B) are also almost always performed correctly, as was to be expected. If inferred reaches were required to oblique positions (D), both monkey and model show a similar pattern of failed reaches, illustrated in the inset of panel (D): Most reaches were made either in a previously trained cardinal direction adjacent to the goal direction (red, deviation of 45°) or in the direction of the spatial cue, meaning that a direct reach was performed (green, deviation of 180° from the goal direction).
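The error categories used in this figure follow from the angular deviation between the executed reach and the rewarded goal. The sketch below shows one way to compute them. The tolerance window of 20° and the category labels are hypothetical choices for illustration and are not taken from the analysis in this study.

```python
def angular_deviation(reach_deg, goal_deg):
    """Smallest absolute angle (0..180 deg) between reach and goal direction."""
    d = abs(reach_deg - goal_deg) % 360.0
    return min(d, 360.0 - d)

def classify_reach(reach_deg, goal_deg, tolerance=20.0):
    """Assign a reach to one of the categories described in the caption.

    'correct'       : deviation near 0 deg
    'adjacent'      : deviation near 45 deg (neighboring trained direction)
    'cue_direction' : deviation near 180 deg (direct reach in an inferred trial)
    'other'         : anything else
    The tolerance is an assumed value.
    """
    dev = angular_deviation(reach_deg, goal_deg)
    if dev <= tolerance:
        return "correct"
    if abs(dev - 45.0) <= tolerance:
        return "adjacent"
    if abs(dev - 180.0) <= tolerance:
        return "cue_direction"
    return "other"

# Inferred trial with an oblique cue at 225 deg: the rewarded goal is at 45 deg.
goal = (225.0 + 180.0) % 360.0
for reach in (45.0, 90.0, 0.0, 225.0):
    print(reach, classify_reach(reach, goal))
```

For the oblique inferred trial in this example, reaches to the neighboring cardinal directions (0° or 90°) fall into the adjacent category, and a reach to the cue location itself (225°) is classified as a reach in the cue direction.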
The monkey control experiment was performed accordingly (previously unpublished data from one monkey). After learning context-specific direct and inferred reaches (as described in
In summary, because our model implements the context-specific mapping task via local association learning, it predicted specific spatial generalization errors, which we could confirm in the monkey behavior. The model provides a mechanistic explanation for these particular error types. As detailed in the previous section, the association field has formed two context-specific regions for each trained spatial cue direction. This is the result of the Hebbian learning. The regions which are specific for the inferred context project to the reach field at a position opposite to the spatial cue direction. If spatial input arrives from the spatial input layer for an untrained direction, together with an inferred-context signal from the context input nodes, it will create a peak between two of the context-specific sub-regions in the association field (
Two snapshots of the activation patterns in the model during the memory period are shown, taken from different trials that developed different movement plans due to random noise in the model. In both cases, the spatial cue was located at 225° (an oblique direction not used during training), the blue context input indicates that an inferred reach should be performed. The model is depicted in the same form as in
We note that the adjacent direction error can also occur in oblique trials with the ‘direct’ context signal, and does appear in the simulation results in a small proportion of trials (
A core idea of our approach is that the mechanisms which are implemented for learning sensorimotor associations allow the network to also adapt its reward-based choice behavior. We tested this by confronting an IR-trained model with different reward schedules. To emulate the scenario of the previous monkey experiment
After IR-training, the model is capable of correctly performing not only DMG trials, but also instructed trials in which the context cue appears later than the spatial cue (PMG-CI trials). In these trials, the model achieved 92% (n = 4000) correct choices (monkey performance in the electrophysiological study was >98%). We then probed the model's free-choice behavior by presenting a spatial cue but no context cue (PMG-NC trials).
If no context instruction is given in a trial, both model and monkeys show an inherent bias to perform the inferred reach after training (A). A balanced choice behavior (B) can be achieved by application of an appropriate reward schedule (BMRS).
The bias for selecting inferred reaches is also apparent in the output pattern of the motor preparation field in the model (
Plots show the averaged and normalized field output from the motor preparation field in the model (A, C) and from electrophysiological recordings in PRR (B, D) during the PMG task. Prior to averaging and normalizing, the real and model neurons' selectivity profiles were aligned according to their preferred directions in DMG trials (PD: preferred direction, OD: opposite-to-preferred direction). The averaged and normalized activity of real neurons during the PMG task in the biased (B) and balanced (D) datasets is shown for three epochs, aligned to cue onset, ‘go’-signal, and movement onset, since the length of the epochs was variable. The model neurons were aligned accordingly even though the epochs had fixed lengths. It can be seen that during the memory period in the model and in the real data plots, only one activation ridge is stable throughout the memory period, before a bias minimizing reward schedule (BMRS; see
We then switched to a bias-minimizing reward schedule (BMRS). The model parameters (connection weights) that had developed in the previous testing phase were taken as starting conditions. The model developed a balanced choice behavior under the new reward schedule (
A major implication of an overlapping neural substrate and shared learning mechanism for sensorimotor association learning and reward-based action selection is that the learning history must inevitably influence the choice behavior. A surprising finding in our previous monkey study was the strong bias of the well-trained monkeys to almost exclusively prepare and execute the inferred reach in free-choice situations with EPRS reward. We hypothesized that this bias arose from the higher number of inferred reach trials during early training
In the model, the initial presentation of the spatial cue induces the formation of a sustained activation peak in the association field (
The figure shows two snapshots of the activation patterns in the model during a single PMG trial. (A) During the memory period, after the presentation of a spatial cue, an activation peak has formed in the association field. Its position along the spatial axis reflects the direction of the spatial cue, while its location along the second dimension is unspecific and spans both context-sensitive regions (shown as outlines in the association field, green for direct, white for inferred context). The region that shows preference for the inferred context is substantially larger than the direct-context region, due to the high proportion of inferred trials during training. This region projects to the location in the motor preparation field which codes for a reach in the direction opposite to the spatial cue. The competitive interactions in the motor preparation field further amplify this stronger input that supports the inferred reach. (B) When a context signal for a direct trial is given at the end of the memory period, the context input induces a shift of the peak in the association field: It is pulled almost completely onto the region specific for the direct context with which it partly overlapped. The input to the motor preparation field changes accordingly, leading to a switch in that field's activation pattern and a stronger activation of the ‘direct’ reach direction.
As presented above, when the model is trained with a ratio of 80% inferred trials to emulate the intense inferred reach training procedure in the electrophysiological study, it develops a bias to prepare the inferred reach in PMG trials, with a time course of activation in the motor preparation field that qualitatively reproduces the recorded neural activity in monkeys (
We systematically varied the ratio of direct to inferred trials during the IR training in the model and tested the resulting spatial response profiles in the motor preparation field and the choice behavior in PMG-NC trials (
(A) The behavioral bias for inferred reaches in the free-choice trials depends on the percentage of inferred trials during IR training and rises continuously in a sigmoidal fashion (logistic fit function; black curve). (B) The difference of the mean activation of the motor preparation field at the preferred and opposite-to-preferred position during the memory period shows a softer, but also approximately sigmoid increase when the number of inferred trials is increased.
Note that this result is indeed an effect of the input statistics, not of the expected reward for different choices. Even in training sets with 100% reward rate for both direct and inferred reaches the described biases still developed in the model (data not shown).
Neurophysiological data suggest that learning of sensorimotor associations, decision making, and movement planning share a common neural substrate that includes frontoparietal sensorimotor areas
Delineating a 1-to-1 correspondence between our model and the neurophysiological functional architecture of the primate brain can obviously only be coarse, for several reasons. In particular, it is still largely unclear how many and which brain areas (cortical and subcortical) contribute to such high-level tasks as context-dependent, rule-based, and reward-driven visuomotor reach-goal selection, and in which way they do so. For example, similar task-related neural activation patterns during spatial goal selection tasks can be found in parietal and in premotor areas (e.g.
Yet, the model should be seen as a rough sketch of cortical frontoparietal visuomotor processing. The spatial input field is retinocentrically organized, and mimics the organization of extrastriate visual cortex and the available dorsal-stream visual input to the frontoparietal reach network via areas V6/V6a in the parieto-occipital sulcus
The indirect pathway in our model allows for flexible, context-specific, goal-directed behavior. The two-dimensional association field, which implements the working memory and the actual rule learning, is reminiscent of processing in the cortico-basal loops between the prefrontal cortex (PFC) and the premotor cortex (PMC) with the basal ganglia
When we designed the adaptive DNF architecture, we assumed that the behavior of the monkeys in the experiment did not rely on an explicit representation of a geometrical transformation rule to achieve the visuomotor mapping, but rather on specific associations between individual stimulus combinations and the rewarded motor response. This may at first seem counterintuitive for a task that can be described unambiguously through a simple rule. It should be noted, however, that from a computational perspective the forming of concrete associations (which can be achieved by established mechanisms like Hebbian learning) is much more straightforward than the recognition and implementation of a general rule.
Our assumption was supported by the control experiment, in which the monkey had to generalize the learned mapping “rule” to untrained positions. If the monkey had applied a geometric transformation rule, one would have expected easy generalization to novel goal directions. Instead, the monkey showed a highly specific pattern of errors that the model was able to predict, and which in the model was an emergent effect of the local association learning. Note that the observed failed reaches could not be explained by a failure of the monkey to identify the proper context, since direct trials in all directions and inferred trials in cardinal directions were performed correctly. Instead, the associated motor responses to the untrained oblique goal positions in the inferred context were undefined. This led to responses which were either guided by the default behavior (a seeming ‘context’ error), or which resulted in the selection of a neighboring trained motor association (adjacent direction error). These observations suggest that the context-dependent reach task in monkeys was not learned through the application of a general mapping rule to the spatial cue positions, but rather by individual, local associations between the spatial and context cue and the rewarded reach location.
The adaptable DNF model implements such association learning in that it develops specialized attractor states in the association field, with dedicated sub-regions which prefer different mapping “rules”. In this implementation the context errors originate in the initial topological connection pattern from the association field to the motor preparation field, which is still prevailing after the IR-training for those spatial cue directions that have not been trained. The adjacent direction errors can be explained by a spread of activation from such untrained regions in the association field to neighboring sub-regions which were affected by the IR training and are now associated with the inferred context.
In the model, the adjacent direction error also occurs in a small percentage of the oblique direct trials, due to the same mechanism. The fact that this error is not observed in the experimental data may indicate that some aspect of the task is not fully captured by the computational model. For instance, the representation and processing of the direct and inferred context signals may not be as symmetrical in the biological system as it is in the model. In particular, the context signal might affect the processing more globally, e.g. by strengthening the direct pathway for the ‘direct’ context cue and the indirect pathway for the ‘inferred’ context cue. This would decrease the impact of training certain directions on the behavior in direct trials. Such a mechanism of executive control has previously been employed in a DNF model of task switching
We note that this implementation of the spatial mapping rule in the association field does in principle not have to be locally restricted. If the sub-regions that implement the ‘inferred’ mapping were expanded over the whole spatial dimension, and if their projections to the motor preparation field were changing in a more continuous fashion, they could implement the general mapping rule for arbitrary spatial cue directions. Forming such a connection pattern would require a sufficiently large number of training directions, which would provide the necessary fine sampling of the directional space to generalize the mapping rule to all directions through averaging. Conversely, the model in its current state is not capable of generalization in a stricter sense, such as the transfer of a rule to completely novel stimuli. Introducing such capabilities would require a substantial extension of the current architecture.
This does not mean that the mechanism we presented cannot also be involved in the learning of abstract rules. It is conceivable (e.g. in the case of humans performing this task) that generalized connection patterns as described above for different mapping rules accumulate and prevail in the system. Learning a specific variation of a mapping task then only requires the association of the context cue with the appropriate known mapping. This would allow a fast generalization from few examples. In general, however, we propose that the learning via local associations may be the default case, and that forming of true generalizations is an extension that builds on previously learned associations and additional neural structures.
With the adaptive DNF model, we integrate two behavioral functions in a single neural architecture. On the one hand, we provide a process model of movement plan formation and action selection. It is in this respect similar to another recent modeling study of decision making in the fronto-parietal cortex
A reward-driven Hebbian learning algorithm enables the model to adapt to changes in the reward schedule in a manner similar to what is called the ‘matching law’. This means that biases in the reward schedule can produce biases in choice behavior and thereby adapt the choice to the reward probabilities
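The qualitative effect can be illustrated with a minimal sketch that is much simpler than the DNF model itself. It assumes two discrete options, choice probabilities proportional to a scalar association weight, and a reward-gated delta-rule update standing in for the reward-driven Hebbian rule. The function name, the learning rate, and the initial weights are hypothetical and are not taken from the model.

```python
import random


def simulate_choices(reward_probs, n_trials=5000, lr=0.05, seed=0):
    """Toy learner: return the fraction of trials on which each option was chosen.

    Assumptions (not the model's equations): choice probability is
    proportional to a scalar weight per option, and only the weight of the
    chosen option is updated, moving toward the obtained reward.
    """
    rng = random.Random(seed)
    n_options = len(reward_probs)
    weights = [0.5] * n_options  # hypothetical initial association strengths
    counts = [0] * n_options
    for _ in range(n_trials):
        # Select an option with probability proportional to its weight;
        # a small floor keeps the total weight strictly positive.
        safe_weights = [max(w, 1e-6) for w in weights]
        choice = rng.choices(range(n_options), weights=safe_weights)[0]
        counts[choice] += 1
        # Reward is drawn from the schedule of the chosen option.
        reward = 1.0 if rng.random() < reward_probs[choice] else 0.0
        # Reward-gated update of the chosen association only.
        weights[choice] += lr * (reward - weights[choice])
    return [c / n_trials for c in counts]


if __name__ == "__main__":
    print(simulate_choices([0.8, 0.2]))
```

In this sketch each weight drifts toward the reward probability of its option, so the option that is rewarded more often also comes to be chosen more often. This only approximates matching-like behavior and should not be read as a quantitative account of the matching law.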
Interestingly, in free-choice tasks which do not encourage balanced behavior (i.e., choice-independent reward schedules like our EPRS), the learning algorithm can easily lead to a biased behavior. Even small imbalances in the probabilities of either choice can self-enhance the probability of the same choice in later trials. This is especially true if the reward probability is high (e.g. 100%), in which case an initially randomly chosen option will be more likely to be chosen again. Such a behavioral bias in free-choice trials is evident in our electrophysiological study
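The self-enhancing effect can be sketched with a toy urn-like process, again far simpler than the DNF model. It assumes two options with equal initial weights, choice probabilities proportional to the weights, and a fixed weight increment for every rewarded choice. The function name, the increment size, and the trial count are hypothetical.

```python
import random


def run_free_choice(n_trials=200, increment=1.0, reward_prob=1.0, seed=None):
    """Toy free-choice block: return the final probability of choosing option 0.

    Assumptions (not the model's equations): both options start with weight
    1.0, and the reward does not depend on which option is chosen.
    """
    rng = random.Random(seed)
    weights = [1.0, 1.0]
    for _ in range(n_trials):
        p_first = weights[0] / (weights[0] + weights[1])
        choice = 0 if rng.random() < p_first else 1
        # Choice-independent schedule: either option is rewarded alike.
        if rng.random() < reward_prob:
            # A rewarded choice strengthens only the chosen association,
            # making the same choice more likely on later trials.
            weights[choice] += increment
    return weights[0] / (weights[0] + weights[1])


if __name__ == "__main__":
    # Different seeds stand in for different sessions with identical schedules.
    print([round(run_free_choice(seed=s), 2) for s in range(10)])
```

With a reward probability of 1, the first random choice already moves the preference away from 0.5, and later choices tend to reinforce whichever option happened to be selected early. Repeated runs with different seeds therefore end with widely varying preferences, although the reward schedule favors neither option.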
Not only does the free-choice reward learning affect the learned associations, but, vice versa, the input statistics during learning of the stimulus-response associations also have an impact on the free-choice behavior, as our model results show. For example, the model's free-choice behavior can be biased even if the model is perfectly able to solve the instructed tasks. Humans rely on prior probabilities if they have to base their decision on lacking or ambiguous evidence
Our model successfully integrates sensorimotor processing and working memory formation with decision making. The reward-driven Hebbian learning mechanism which we use for learning context-dependent visuomotor mappings is sufficient to also explain adaptation to probabilistic reward contingencies and at the same time creates susceptibility to input statistics during learning. With our model we could reproduce the electrophysiological results from a previous study