Conceived and designed the experiments: SJK. Performed the experiments: SJK KJF. Analyzed the data: SJK KJF. Contributed reagents/materials/analysis tools: KJF. Wrote the paper: SJK JD KJF.
The authors have declared that no competing interests exist.
In this paper, we suggest that cortical anatomy recapitulates the temporal hierarchy that is inherent in the dynamics of environmental states. Many aspects of brain function can be understood in terms of a hierarchy of temporal scales at which representations of the environment evolve. The lowest level of this hierarchy corresponds to fast fluctuations associated with sensory processing, whereas the highest levels encode slow contextual changes in the environment, under which faster representations unfold. First, we describe a mathematical model that exploits the temporal structure of fast sensory input to track the slower trajectories of their underlying causes. This model of sensory encoding or perceptual inference establishes a proof of concept that slowly changing neuronal states can encode the paths or trajectories of faster sensory states. We then review empirical evidence that suggests that a temporal hierarchy is recapitulated in the macroscopic organization of the cortex. This anatomic-temporal hierarchy provides a comprehensive framework for understanding cortical function: the specific time-scale that engages a cortical area can be inferred by its location along a rostro-caudal gradient, which reflects the anatomical distance from primary sensory areas. This is most evident in the prefrontal cortex, where complex functions can be explained as operations on representations of the environment that change slowly. The framework provides predictions about, and principled constraints on, cortical structure–function relationships, which can be tested by manipulating the time-scales of sensory input.
Currently, there is no theory that explains how the large-scale organization of the human brain can be related to our environment. This is astonishing because neuroscientists generally assume that the brain represents events in our environment by decoding sensory input. Here, we propose that the brain models the entire environment as a collection of hierarchical, dynamical systems, where slower environmental changes provide the context for faster changes. We suggest that there is a simple mapping between this temporal hierarchy and the anatomical hierarchy of the brain. Our theory provides a framework for explaining a wide range of neuroscientific findings by a single principle.
Our brains navigate our bodies, including our sensory apparatus, through a
dynamically changing environment. This is a remarkable achievement, because a
specific behaviour might be optimal in the short term but suboptimal over longer
time periods. It is even more remarkable that the brain selects among different
behaviours quickly and online. Causal dynamics and structure in the environment are
critical for selecting behaviour, because the brain can learn this structure to
predict the future, and exploit these predictions to negotiate the environment
adaptively. Ontogenetically, there is good reason to believe that the brain learns
regularities in the environment from exposure to sensory input and internally
generated signals
For an adaptive agent, surprise means sampling unexpected input given the
expectations of the agent. Mathematically, surprise or improbability is quantified
by −ln p(y|m): the negative log-probability of the sensory input y under the agent's model m.
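Surprise itself cannot be evaluated directly by an agent, but the variational free energy furnishes an upper bound that can be. This is a standard result from the free-energy literature, sketched here in generic notation (not an equation reproduced from this paper):

```latex
F(\tilde{y}, q) \;=\; -\ln p(\tilde{y} \mid m) \;+\;
\mathrm{KL}\!\left[\, q(\vartheta) \,\middle\|\, p(\vartheta \mid \tilde{y}, m) \,\right]
\;\;\geq\;\; -\ln p(\tilde{y} \mid m)
```

Minimising F with respect to the recognition density q both tightens the bound on surprise and makes q approximate the posterior over environmental causes ϑ.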
To predict extero- and interoceptive input online, an agent must entertain dynamic
expectations about its input using an internal model of environmental causes and
their trajectories. These models reduce high-dimensional input to a few variables or
‘causes’ in the environment. These environmental causes do not
need to be physical objects but can be any quantity that predicts the
agent's past and future sensory input (we use prediction here in reference
to the mapping between causes and their sensory consequences; this mapping subsumes
but is more than a forecast of future events). Critically, from the point of view of
an agent, its body is a part of the environment. Therefore, internal models embed an
agent's knowledge about how environmental dynamics, including its own
movements, generate sensory input
In general, the sensory consequences of environmental causes are mediated by
dynamical systems. This necessarily induces delays in the mapping between causes and
their sensory consequences. How can an agent accommodate this temporal dislocation
to explain causes?
Predictions about sensory input at fast time-scales become imprecise when projected
too far into the future. One way to deal with this uncertainty is to use concepts to
guide representations at shorter time-scales. If predictions of sensory input remain
veridical at a fast time-scale and action ensures these predictions are fulfilled,
the agent will avoid surprising input. The ensuing behaviour would be consistent
with the agent's concepts. Note that an agent following this principle can
still handle novel, unexpected input, although the agent might experience a large
prediction error and adapt its internal model accordingly (see simulations). If the
high-level representations or concepts prove correct in predicting sensory input,
they confirm the validity of those concepts. Therefore, concepts can be seen as
self-fulfilling prophecies, which, given a compliant environment, would appear to
mediate goals, plans and long-term strategies for exchange with the world
The novel contribution of this paper is to consider hierarchical models, in which high-level states change more slowly than low-level states, and to relate these models to structure-function relationships in the brain. The basic idea is that temporal hierarchies in the environment are transcribed into anatomical hierarchies in the brain; high-level cortical areas encode slowly changing contextual states of the world, while low-level areas encode fast trajectories. We will present two arguments in support of this hypothesis. First, using simulations, we will demonstrate that hierarchical dependencies among dynamics in the environment can be exploited to recognise the causes of sensory input. The ensuing recognition models have a hierarchical structure that is reminiscent of cortical hierarchies in the brain. Second, we will consider neuroscientific evidence that suggests the cortical organisation recapitulates hierarchical dependencies among environmental dynamics.
Note that this paper is not about hierarchies of neuronal dynamics; see e.g.
In this section, we present a modelling approach to show, as a proof-of-principle, that perception can be understood in terms of inverting hierarchical models and that these models entail a separation of temporal scales.
Here, we model the neuronal states of an internal model in an abstract fashion, to describe their evolution under continuous sensory input. This allows us to focus on how the brain could exploit dependencies between dynamics at different time-scales, using internal models.
We pursue the notion that synthetic agents can extract information about another agent, at various time-scales, by modelling the sensory input originating from that agent with an internal generative model. We will describe how one agent produces a song and how another agent decodes the ensuing auditory input. We will deal with environmental dynamics at two time-scales (fast and slow). In our model, the dynamics at the slow scale enter as ‘control’ parameters of the dynamics at the fast scale.
Our example uses birdsong: There is a large body of theoretical and experimental
evidence that birdsongs are generated by dynamic, nonlinear and hierarchical
systems
It may be that the recognition of human song or speech is implemented using
hierarchical structures too, although the experimental evidence for this seems
much scarcer. In particular, speech has been construed as the output of a
multi-level hierarchical system, which must be decoded at different time-scales
Recently, Laje et al.
To generate birdsong sonograms, we use the Lorenz attractor at both levels.
For both levels, we used
We will call the vectors
(A) At the first level, there are two outputs (i.e., data) (left: blue and green solid lines) and three hidden states of a Lorenz attractor (right: blue, green, and red solid lines). The second level is also a Lorenz attractor, but evolves at a time-scale that is an order of magnitude slower than the first. At the second level, the causal state (left: blue solid line) serves as the control parameter (Rayleigh number) of the first-level attractor and is governed by the hidden states at the second level (right: blue, green, and red solid lines). The red dotted lines (top left) indicate the observation error on the output. (B) Sonogram (time-frequency representation) constructed from the model output. High intensities represent time-frequency locations with greater power.
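The generative architecture described above can be sketched numerically as two Lorenz systems, where the slow level's state sets the Rayleigh number of the fast level. The parameter values, coupling term, and integration scheme below are illustrative assumptions, not the values used in the paper's simulations:

```python
import numpy as np

def lorenz(s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Flow of the Lorenz system at state s = (x, y, z)."""
    x, y, z = s
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def coupled_song(n_steps=4000, dt=0.001):
    """Two coupled Lorenz systems: the slow level's first hidden state
    modulates the Rayleigh number (rho) of the fast level."""
    slow = np.array([0.9, 0.8, 30.0])
    fast = np.array([0.9, 0.8, 30.0])
    out = np.zeros((n_steps, 2))
    for t in range(n_steps):
        # the slow level evolves an order of magnitude more slowly
        slow = slow + (dt / 10.0) * lorenz(slow)
        # assumed coupling: slow state perturbs rho, keeping the fast
        # level near its chaotic regime
        rho_fast = 24.0 + slow[0] / 8.0
        fast = fast + dt * lorenz(fast, rho=rho_fast)
        # two of the fast states serve as outputs driving the sonogram
        out[t] = fast[:2]
    return out
```

Plotting a spectrogram of the two output channels would give a synthetic 'sonogram' whose chirp structure drifts as the slow level evolves.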
Inversion of this forward model corresponds to perception or mapping from the
sonogram to the underlying cause in the singing bird. In this example,
recognition involves the online estimation of the states at both levels.
Although two of the states (those controlling amplitude and frequency of the
acoustic input) at the first-level are accessed easily, the third
Given some sensory data
The free-energy comprises an energy term
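In the standard variational formulation (generic notation assumed from the free-energy literature, not reproduced from this paper's equations), the free energy decomposes into an energy term, expected under the recognition density q, minus the entropy of q:

```latex
F \;=\; \underbrace{\big\langle -\ln p(\tilde{y}, \vartheta \mid m) \big\rangle_{q}}_{\text{energy}}
\;-\; \underbrace{\big\langle -\ln q(\vartheta) \big\rangle_{q}}_{\text{entropy}}
```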
Generally, the variables
Here, Equations 1 and 2 specify the generative model in terms of the likelihood
function
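A generic two-level hierarchical dynamic model of this kind has the following form (a sketch in notation assumed from the hierarchical dynamic-modelling literature; the paper's Equations 1 and 2 are not reproduced here):

```latex
\begin{aligned}
y &= g^{(1)}\!\big(x^{(1)}, v^{(1)}\big) + z^{(1)}, &\qquad
\dot{x}^{(1)} &= f^{(1)}\!\big(x^{(1)}, v^{(1)}\big) + w^{(1)},\\
v^{(1)} &= g^{(2)}\!\big(x^{(2)}, v^{(2)}\big) + z^{(2)}, &\qquad
\dot{x}^{(2)} &= f^{(2)}\!\big(x^{(2)}, v^{(2)}\big) + w^{(2)}.
\end{aligned}
```

Here, hidden states x evolve within a level, causal states v link levels (in the birdsong example, the slowly evolving second level supplies the Rayleigh number that parameterises the fast first level), and z, w are random fluctuations.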
Under the free-energy principle, the agent must implement models that represent,
at each moment in time, the dynamics of causes in the environment, as in
Equations 1 and 2. Because these equations also prescribe how the motions of
various states couple to each other, our generative model covers not just the
states but their motion, acceleration, and higher-order temporal derivatives. These are
referred to collectively as ‘generalised coordinates of
motion’, in the sense that the trajectory (or motion) of any dynamical
system can be described within this frame of reference. We use the following
notation for a vector of generalized coordinates:
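In generic notation (assumed here, since the original expression is not reproduced), a state x in generalised coordinates of motion, together with the derivative operator D that shifts its components, can be written as:

```latex
\tilde{x} = \big(x,\; x',\; x'',\; \ldots\big)^{\mathrm{T}},
\qquad
D\tilde{x} = \big(x',\; x'',\; x''',\; \ldots\big)^{\mathrm{T}}
```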
In our simulations, we used six high-order temporal derivatives for the hidden
states
In this section, we generate synthetic birdsong using the coupled Lorenz
oscillators described above and model a ‘listening’ bird
during song recognition by inverting the model using Equation 5, where we
consider the conditional moments,
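In schemes of this kind, recognition takes a generic gradient-descent form, in a frame of reference that moves with the represented trajectory (a sketch in assumed notation; the paper's Equation 5 is not reproduced here):

```latex
\dot{\tilde{\mu}} \;=\; D\tilde{\mu} \;-\; \frac{\partial F(\tilde{y}, \tilde{\mu})}{\partial \tilde{\mu}}
```

At the free-energy minimum the gradient vanishes, and the conditional mode moves with the trajectory it represents.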
In
Observed
data (see
To simulate the LFPs, we multiplied the prediction errors by their precisions, modelling the activity of neurons that encode prediction error: we assume here that LFPs are an expression of precision-weighted prediction error, see
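This reading can be sketched as follows; the function name and interface are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def simulated_lfp(y, y_pred, noise_sd):
    """Precision-weighted prediction error as a proxy for an LFP trace.

    Precision is taken to be the inverse variance of the observation
    noise (an assumption), so confident (high-precision) channels
    produce larger responses to the same prediction error.
    """
    precision = 1.0 / noise_sd ** 2
    return precision * (np.asarray(y) - np.asarray(y_pred))
```

For instance, with observation noise of standard deviation 0.5 (precision 4), a prediction error of 0.5 yields a simulated response of 2.0.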
We deliberately chose to generate both levels of the birdsong with the same (Lorenz) attractor to show it is possible to invert generative models with temporal hierarchies comprising more than two levels: because we were able to reconstruct the dynamics at the second level given the first, we can argue, by induction, that this process is repeatable to any hierarchical order, with increasing temporal scales. This is because the dynamics at the second level are exactly the same as the first (but evolve more slowly). Having established that online recognition returns sensible results, we can ask two interesting questions. First, what happens when the sensory input violates hierarchical predictions? Second, how would the second level express itself empirically, using LFPs and lesion studies?
First, we simulated a surprising song, in which the last chirps were omitted. We
stopped the bird's singing after 1.4 seconds, which effectively removed
the last two chirps (
The sensory data presented in
This example was chosen to show how hierarchical models might disclose themselves
empirically. Consider the simulated LFP responses based on prediction error in
Here we simulated a synthetic bird whose second level had been removed. In
(A) The single-level model can explain the data (no song interruption) well. (B) The single-level model quickly approaches the zero line after an interruption at 1.4 seconds. (C) Simulated LFPs for model inversion in (A). (D) Simulated LFPs for model inversion in (B).
First, the larger and more enduring prediction error of the two-level system
signals that something unexpected and potentially important has happened (a cat
might have put an abrupt end to the rendition). The second-level prediction
error could then be explained away by supraordinate causes (i.e., a nearby
predator) whose representation may be essential for survival. In short,
hierarchical systems can register and explain away surprising violations of
temporal succession, on extended time-scales. Second, the two-level system can
infer slowly changing causes to which the single-level system is blind. These
second-level dynamics may carry useful information; for example, that the
singing bird is strong and well-fed. Missing this information may pose a serious
disadvantage when it comes to choosing a mate. Finally, the second level adds
stability to the inversion process and renders recognition more robust to random
fluctuations in the environment. The coupling of the fast to the slow level
improves inference on degraded sensory input by providing empirical priors. This
is shown in
We show only the output of each model and the causal state of the two-level model. (A) The two-level model can explain the data relatively well, although it misses the third syllable. (B) The single-level model is unable to predict the data at all.
A key aspect of the recognition model above rests on the nonlinearity of the internal model. It is this nonlinearity that allows high-level states to act as control parameters that reconfigure the motion of faster low-level states. If the equations of motion at each level were linear in the states, each level would simply convolve its supraordinate inputs with an impulse response function. This precludes the induction of faster dynamics, because linear convolutions can only suppress or amplify the input frequencies; they cannot create new ones. The environment, however, is nonlinear: long-term causes may disclose themselves through their influence on the autonomous nonlinear dynamics of other systems. To predict the ensuing environmental trajectories accurately, top-down effects in the agent's internal model must be nonlinear too.
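The claim that linear convolutions cannot create new frequencies, while even a simple nonlinearity can, is easy to check numerically (an illustrative sketch, independent of the paper's model):

```python
import numpy as np

fs, f0 = 1000.0, 40.0                     # sample rate and input frequency (Hz)
t = np.arange(0, 1, 1 / fs)
x = np.sin(2 * np.pi * f0 * t)

# Linear system: convolution with an impulse response can only
# re-weight frequencies already present in the input.
h = np.exp(-100.0 * t[:50])
h /= h.sum()
linear_out = np.convolve(x, h, mode="same")

# Nonlinear system: squaring creates frequencies absent from the
# input (a DC component and a harmonic at 2 * f0).
nonlinear_out = x ** 2

def dominant_freqs(sig, rel_thresh=0.2):
    """Frequencies whose spectral magnitude exceeds rel_thresh * max."""
    spec = np.abs(np.fft.rfft(sig))
    freqs = np.fft.rfftfreq(len(sig), 1 / fs)
    return set(np.round(freqs[spec > rel_thresh * spec.max()]).tolist())
```

The linear output contains appreciable power only around the 40 Hz input frequency, whereas squaring introduces components at 0 Hz and 80 Hz that the input did not contain.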
The simulations have shown how environmental trajectories at two different time-scales can be extracted from fast sensory input. This simple example of how a synthetic bird recognises songs provides a metaphor for how the human brain might exploit temporal structure in the environment. Obviously, the brain affords many more levels than two and operates on much higher-dimensional input. However, the principle of hierarchical inference, with a separation of time-scales, could be an inherent part of neuronal computations. If the generative model employed by the brain embodies autonomous dynamics that are coupled nonlinearly by control parameters, each level in the hierarchy may represent a specific time-scale. In the following, we will discuss two bodies of neuroscientific evidence for such a mapping: (i) modulatory backward connections, which operate at slower time-scales than forward connections, and (ii) a cortical gradient of environmental time-scales. We then relate the principle of hierarchical inference to other theoretical accounts in neuroscience.
There is extensive literature on the hierarchical organisation of the brain,
in particular of the cortex
There is a key functional distinction between forward and backward
connections that renders backward connections more nonlinear or modulatory
in their effects on neuronal responses, e.g.,
Assuming that the brain employs a temporal hierarchy and that
‘wiring costs’
Cortical Areas | Brief Description | Time-Scale of Environmental Dynamics | Section in |
Sensory and association cortex | Sensory processing follows a temporal hierarchy | Milliseconds to hundreds of milliseconds | Section 1 |
Primary motor and premotor cortex | Motor areas serve the hierarchical prediction of the sensory consequences of movement trajectories | Tens of milliseconds to seconds | Section 2 |
Rostral anterior cingulate cortex | Hierarchical, contextual influence on action prediction | Tens of seconds to much longer periods | Section 3 |
Lateral prefrontal cortex | Hierarchically ordered ‘cognitive control’ system | Tens of seconds to much longer periods | Section 4 |
Orbitofrontal cortex | Representation of temporally most stable environmental states | Very long periods | Section 5 |
The location along this gradient determines the time-scale of the environmental dynamics that are represented.
The concept of modelling sensory dynamics and their relation to neuronal
representations can be related to several approaches in theoretical physics
There is extensive literature on the hierarchical structure of human
behaviour, see
There are several theories that relate to the hypothesis that the operations
of specific brain systems pertain to temporal structure of the environment.
An exemplary approach is Fuster's sensorimotor hierarchy
Other models, in particular from motor control theory, try to explain
perception and action via forward modelling and reinforcement learning,
e.g.,
There is a large experimental and theoretical literature on coupled neuronal
dynamics, e.g.,
We have proposed that the brain employs a hierarchical model, where nonlinear coupling among hierarchical levels endows each with a distinct temporal scale. At low levels of this hierarchy (e.g., close to primary sensory areas), neuronal states represent the trajectories of short-lived environmental causes. Conversely, high levels represent the context in which lower levels unfold. Critically, at each level, representations depend on, and interact with, representations at other levels. We presented simulations that provide a proof of concept that a temporal hierarchy is a natural model with which to recover information about dynamic environmental causes. In addition, we have discussed empirical findings that support the conclusion that cortical structure recapitulates a hierarchy of temporal scales.
The principle of a temporal hierarchy provides a theoretical framework for
experiments in systems neuroscience. The predictions based on this account could
be addressed by making time-scale an experimental factor. For visual areas,
Hasson et al.
Review of neuroscientific evidence. In sections 1 to 5, evidence is reviewed that cortical structure and function reflect an anatomic-temporal hierarchy, following a rostro-caudal gradient.
We thank Katharina von Kriegstein for valuable discussions and her comments on an earlier version of the manuscript. We thank Christian Ruff, Chris Frith, Jérémie Mattout, Debbie Talmi, Sven Bestmann, and Felix Blankenburg for their comments on earlier versions of the manuscript.