Research Article
Integrated Information in Discrete Dynamical Systems: Motivation and Theoretical Framework
Department of Psychiatry, University of Wisconsin, Madison, Wisconsin, United States of America
Abstract
This paper introduces a time- and state-dependent measure of integrated information, φ, which captures the repertoire of causal states available to a system as a whole. Specifically, φ quantifies how much information is generated (uncertainty is reduced) when a system enters a particular state through causal interactions among its elements, above and beyond the information generated independently by its parts. This mathematical characterization is motivated by the observation that integrated information captures two key phenomenological properties of consciousness: (i) there is a large repertoire of conscious experiences so that, when one particular experience occurs, it generates a large amount of information by ruling out all the others; and (ii) this information is integrated, in that each experience appears as a whole that cannot be decomposed into independent parts. This paper extends previous work on stationary systems and applies integrated information to discrete networks as a function of their dynamics and causal architecture. An analysis of basic examples indicates the following: (i) φ varies depending on the state entered by a network, being higher if active and inactive elements are balanced and lower if the network is inactive or hyperactive. (ii) φ varies for systems with identical or similar surface dynamics depending on the underlying causal architecture, being low for systems that merely copy or replay activity states. (iii) φ varies as a function of network architecture. High φ values can be obtained by architectures that conjoin functional specialization with functional integration. Strictly modular and homogeneous systems cannot generate high φ because the former lack integration, whereas the latter lack information. Feedforward and lattice architectures are capable of generating high φ but are inefficient. 
(iv) In Hopfield networks, φ is low for attractor states and neutral states, but increases if the networks are optimized to achieve tension between local and global interactions. These basic examples appear to match well against neurobiological evidence concerning the neural substrates of consciousness. More generally, φ appears to be a useful metric to characterize the capacity of any physical system to integrate information.
Author Summary
We have suggested that consciousness has to do with a system's capacity to generate integrated information. This suggestion stems from considering two basic properties of consciousness: (i) each conscious experience generates a large amount of information, by ruling out alternative experiences; and (ii) the information is integrated, meaning that it cannot be decomposed into independent parts. We introduce a measure that quantifies how much integrated information is generated by a discrete dynamical system in the process of transitioning from one state to the next. The measure captures the information generated by the causal interactions among the elements of the system, above and beyond the information generated independently by its parts. We present numerical analyses of basic examples, which match well against neurobiological evidence concerning the neural substrates of consciousness. The framework establishes an observer-independent view of information by taking an intrinsic perspective on interactions.
Citation: Balduzzi D, Tononi G (2008) Integrated Information in Discrete Dynamical Systems: Motivation and Theoretical Framework. PLoS Comput Biol 4(6): e1000091. doi:10.1371/journal.pcbi.1000091
Editor: Olaf Sporns, Indiana University, United States of America
Received: December 26, 2007; Accepted: April 29, 2008; Published: June 13, 2008
Copyright: © 2008 Balduzzi, Tononi. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This work was supported by the NIH Director's Pioneer Award and the J. S. McDonnell Foundation.
Competing interests: The authors have declared that no competing interests exist.
* E-mail: gtononi@wisc.edu
Introduction
Scientists and engineers are usually interested in how information can be transmitted or stored from the perspective of a user. However, it is just as important to consider information, in the classic sense of reduction of uncertainty, from the perspective of an autonomous system. How much information is generated when the system enters a particular state by virtue of causal interactions among its elements? And to what extent is the information generated by the system as a whole, as opposed to the information generated independently by its parts? Addressing these questions requires the development of a new framework that is based on the notion of integrated information [1],[2].
The need for such a framework is not merely academic. Indeed, it was initially motivated by one of the most baffling scientific problems – the generation of conscious experience by the brain. We know that certain regions of the brain, for example the thalamocortical system [3], are essential for consciousness, whereas other regions, such as the cerebellum, are not, though the cerebellum has even more neurons and is seemingly just as complicated. We also know that consciousness fades during sleep early in the night, although neurons in the thalamocortical system remain just as active as during quiet wakefulness. During generalized seizures neurons fire even more strongly and synchronously, yet consciousness is suspended or much reduced. Why is this the case? Specifically, what are the necessary and sufficient conditions for a physical system to generate experience? This problem – also known as the first problem of consciousness – is thought to be rather hard, as it is not easy to see how “subjective” experience could be squeezed out of a collection of physical elements.
The integrated information theory of consciousness represents an attempt to address the first problem of consciousness from first principles [4],[5]. Starting from a phenomenological analysis, the theory argues that consciousness is integrated information. It proceeds by defining integrated information and suggesting how it can be measured for stationary systems. Finally, it shows that the integrated information perspective can provide a parsimonious account for several key empirical facts about the relationship between consciousness and the brain.
In the present work, our goal is to provide a definition and measure of integrated information for systems of discrete elements that evolve through time. This extension provides a framework for integrated information that is fully general and can be applied in principle to any kind of physical system. It also permits further predictions concerning the relationships between brain processes and consciousness. Finally, irrespective of the relevance for understanding consciousness, the notion of integrated information presented here may be useful for characterizing computational systems not merely as processors or stores of information, but as integrators of information.
It is useful to briefly examine the phenomenological observations that motivate the integrated information approach to consciousness. From a first-person perspective – the perspective of the system that is actually capable of generating subjective experience – two fundamental properties of consciousness are apparent: i) there is a large repertoire of conscious experiences. This means that, when one particular experience occurs, it generates a lot of information; ii) each experience is integrated, i.e. it appears as a whole that cannot be decomposed into independent parts [4],[5]. Since we tend to take consciousness for granted, these two properties are best understood by resorting to thought experiments: one involving a photodiode and the other a digital camera.
Information.
Consider the following: You are facing a blank screen that is alternately on and off, and you have been instructed to say “light” when the screen turns on and “dark” when it turns off. A photodiode – a very simple light-sensitive device – has also been placed in front of the screen, and is set up to beep when the screen emits light and to stay silent when it does not. The first problem of consciousness reduces to this: when you distinguish between the screen being on or off, you have the subjective experience of seeing light or dark. The photodiode can also distinguish between the screen being on or off, but presumably it does not have a subjective experience of light and dark. What is the key difference between you and the photodiode?
According to the theory, the difference has to do with how much information is generated when that distinction is made. Information is classically defined as reduction of uncertainty when a particular outcome occurs out of a repertoire of alternative outcomes: the more numerous the outcomes, the greater the reduction of uncertainty, and thus the information. When the blank screen turns off, the photodiode enters one of its two possible states and beeps, yielding 1 bit of information. However, when you see the blank screen turn off, the state you enter rules out a very large number of possible states. Imagine that, instead of turning homogeneously off, the screen were to display at random every frame from every movie that was ever produced. Without any effort, each of these frames would cause you to enter a different state and see a different image. This means that when you enter the particular state (“seeing pure darkness”) you rule out not just “seeing light,” but an extraordinarily large number of alternative possibilities. Whether or not you think of the bewildering number of alternatives (you won't and you can't), this corresponds to an extraordinary amount of information. Importantly, this information has nothing to do with how complicated the scene is – pure darkness or a busy city street – but only with the number of alternative outcomes.
Integration.
While the ability to distinguish among a large number of states is a fundamental difference between you and the photodiode, by itself it is not enough to account for the presence of consciousness. To see why, consider an idealized megapixel digital camera, whose sensor chip is essentially a collection of a million photodiodes. Even if each photodiode in the sensor chip were just binary, the camera could distinguish among $2^{1{,}000{,}000}$ states, an immense number, corresponding to 1,000,000 bits of information. Indeed, the camera would enter a different state for every frame from every movie that was ever produced. Yet few would argue that the camera is conscious. What is the key difference between you and the camera?
According to the theory, the difference has to do with integrated information. An external observer may consider the camera chip as a single system with a repertoire of $2^{1{,}000{,}000}$ states. In reality, however, the chip is not an integrated entity: since its 1,000,000 photodiodes have no way to interact, the state of each photodiode is causally independent of that of the others, so the chip is really a collection of 1,000,000 independent photodiodes, each with a repertoire of 2 states. This is easy to show: if the sensor chip were cut into its individual photodiodes, the performance of the camera would not change at all. By contrast, your vast repertoire of conscious states truly belongs to an integrated system, since it cannot be subdivided into repertoires of states available to independent components. Thus, a conscious image is always experienced as an integrated whole: no matter how hard you try, you cannot experience the left half of the visual field independently of the right half, or colors independently of shapes. Underlying this unity of experience are causal interactions within your brain, which make the state of each element causally dependent on that of other elements. Indeed, unlike the camera, your brain's performance breaks down if its elements are disconnected. And so does consciousness: for example, splitting the brain in two along the corpus callosum prevents causal interactions between the two hemispheres and splits experience in two, so that the right half of the visual field is experienced independently of the left.
This phenomenological analysis suggests that, to generate consciousness, a physical system must have a large repertoire of available states (information) and it must be unified, i.e. it should not be decomposable into a collection of causally independent subsystems (integration). How can one establish the size of the repertoire of states available to a unified system?
Our goal is to provide a way to measure how much information is generated when a physical system enters one particular state out of a repertoire of possible states, but only to the extent that the information is generated by the system as a whole, above and beyond the information generated independently by its parts. Previous work [2],[4],[5] focused on neural systems modeled as stationary multidimensional Gaussian processes. This had the advantage that analytical results could be obtained, but suffered from the drawback that time and the changing dynamics of the systems were not taken into account. In this paper, we extend the theory to include time as a discrete variable. We apply the theory to simple examples, discrete systems of a dozen or fewer elements. Although these systems are too small to be considered at all realistic, we choose them to illustrate the relationship between integrated information and the anatomical connectivity, causal architecture, dynamics, and noise levels of the networks.
Models
To evaluate how much integrated information is generated when a system enters a particular state, we consider simple systems composed of a few interacting elements. Though the present framework is meant to be general, it is convenient to think of neural elements that can be active (fire) or inactive and can communicate through directed connections.
Let X be a system consisting of n elements, which are taken to be abstract indivisible units. Each element is assumed to have a finite repertoire of outputs, with no accessible internal structure. In the examples below the repertoire of the elements will typically consist of two outputs: 0 or 1, corresponding to silence or firing. The internal states of the elements are irrelevant because it is only through outputs that an element can causally affect the rest of the system.
Elements are linked by connections to form a directed graph, specifying which source elements are capable of affecting which target elements. Each target element is endowed with a “mechanism” or rule through which it determines its next output based on the inputs it receives. These mechanisms are assumed to be elementary, for example AND, XOR; they can also be probabilistic.
Time is assumed to pass in discrete instants, which could correspond to milliseconds, for example. We use the word state to refer to the total output of a given subset of a discrete system at a given instant in time. Finally, the elements are memoryless, meaning they are modeled as first-order Markov processes: the output of an element at time t depends only on the inputs at time t−1. In future work we will extend the framework to include elements with memory and explain how the natural time frame over which a system generates integrated information is specified.
Notation.
We refer to systems and subsets of systems by capital letters: X, S and so forth. Uppercase letters with subscripts (X0, S0) denote probability distributions of perturbations that are physically imposed on the outputs of a subset at a given time, e.g. at t = 0. Lowercase letters with subscripts (x1, s1) denote events: the actual output of the subset in question at a particular time, e.g. at t = 1.
Information
First, we need to evaluate how much information is generated by a system when it enters a particular state, x1, out of its repertoire (a repertoire is a probability distribution on the set of output states of a system). The information generated should be a function of how large the repertoire of possible states is, and how much uncertainty about the repertoire is reduced by entering state x1. Also, the reduction of uncertainty must be produced by interactions among the elements of the system acting through their causal mechanisms, which is why we call it effective information.
Let us first consider an isolated system, as in Figure 1. The system consists of three AND-gates and transitions from state x0 = 110 at time zero to state x1 = 001 at time one. How much effective information does the system generate? To answer the question we need to precisely describe: i) the alternative states available to the system (the a priori repertoire); ii) those states that the architecture of the system specifies as causes of x1 (the a posteriori repertoire). Effective information captures the information generated by the system by measuring the difference between these two repertoires.
Figure 1. Effective information generated by entering a particular state.
A system of three connected AND-gates transitions from state x0 = 110 at time zero to x1 = 001 at time one. The a priori repertoire is the uniform distribution on the 8 possible outputs of the elements of the system. The causal architecture of the system specifies that state 110 is the unique cause of x1, so the a posteriori repertoire (shown in cyan) assigns probability 1 to state 110 and 0 to all other states. Effective information generated by the system transitioning to x1 is 3 bits.
doi:10.1371/journal.pcbi.1000091.g001
Effective information is defined as the entropy of the a posteriori repertoire relative to the a priori repertoire, which we write as:
$$ei(X_0 \to x_1) = H\left[\, p(X_0 \to x_1) \,\big\|\, p^{\max}(X_0) \,\right] \qquad (1A)$$
The a priori repertoire is the probability distribution on the set of possible outputs of the elements considered independently, with each output equally likely. This repertoire includes all possible states of the system prior to considering the effects of its causal architecture and the fact that it entered state x1. This distribution is imposed onto the system, i.e. we perform a perturbation in the sense of [6]. The a priori repertoire coincides with the maximum entropy (maxent) distribution on the states of the system; we denote it by pmax(X0). No perturbation can be ruled out a priori, since it is only by passing a state through the mechanism that the system generates information. The maximum entropy distribution formalizes the notion of complete ignorance [7]. In Figure 1 the a priori repertoire assigns equal probability to each of the $2^3 = 8$ possible outputs of the system.
The a posteriori repertoire p(X0 → x1) is the repertoire of states that could have led to x1 through causal interactions. We determine the a posteriori repertoire by forcibly intervening in the system and imposing each state in the a priori repertoire, thus we implement a perturbational approach [1],[2],[4],[6]; see also [8],[9] which apply perturbations to measure the average interaction between subsets for general distributions. Considering each a priori perturbation in turn we find that some perturbations could have caused (led to) x1 and others not (either deterministically or with a certain probability). The a posteriori repertoire is formally captured by Bayes' rule, which keeps track of which perturbations cause (lead to) the given effect (see Text S1, section 3). In Figure 1 x0 is the unique perturbation that causes x1, so the a posteriori repertoire assigns weight 1 to x0 and weight 0 to all other perturbations.
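The perturbational recipe just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation (their formal treatment is in Text S1, section 3). It assumes binary, memoryless elements that respond independently to the previous state, and it assumes a wiring for Figure 1 in which each AND-gate reads the other two elements; this wiring is consistent with the transitions described in the text but is not taken from the figure itself. The helper names `and_gates`, `transition_prob` and `a_posteriori` are hypothetical.

```python
from itertools import product

def and_gates(x0):
    """Firing probability of each element at t = 1 given the outputs x0.
    Assumed wiring (not specified in the text): each gate ANDs the other two."""
    a, b, c = x0
    return (float(b and c), float(a and c), float(a and b))

def transition_prob(fire_probs, x0, x1):
    """p(x1 | x0) for memoryless elements that respond independently to x0."""
    p = 1.0
    for p_i, out in zip(fire_probs(x0), x1):
        p *= p_i if out else 1.0 - p_i
    return p

def a_posteriori(fire_probs, n, x1):
    """Impose every state of the uniform a priori repertoire, record how likely
    each perturbation is to lead to x1, and normalise (Bayes' rule; the uniform
    prior cancels in the ratio)."""
    states = list(product((0, 1), repeat=n))
    likelihood = {x0: transition_prob(fire_probs, x0, x1) for x0 in states}
    total = sum(likelihood.values())
    if total == 0:
        raise ValueError("x1 cannot be reached from any perturbation")
    return {x0: w / total for x0, w in likelihood.items()}

posterior = a_posteriori(and_gates, 3, (0, 0, 1))
print({x0: p for x0, p in posterior.items() if p > 0})  # {(1, 1, 0): 1.0}
```

Because the a priori repertoire is uniform, it cancels in Bayes' rule, so normalising the likelihood of x1 under each perturbation is enough to obtain the a posteriori repertoire.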
Relative entropy (also known as Kullback-Leibler divergence, see Text S1, section 1) is the uncertainty reduction provided by an a posteriori repertoire with respect to an a priori repertoire. It is always non-negative, and is zero if and only if the repertoires are identical. In our case the information is generated by the system when, through causal interactions among its elements, it enters state x1 and thereby specifies an a posteriori distribution with respect to an a priori distribution. By comparing the a priori and a posteriori repertoires effective information measures those “differences that make a difference” [10].
Given that the second term is a maximum entropy distribution, Equation 1A can be written more simply as a difference of entropies:
$$ei(X_0 \to x_1) = H\big(p^{\max}(X_0)\big) - H\big(p(X_0 \to x_1)\big) \qquad (1B)$$
Here H(p) is the entropy of a probability distribution p. The entropy of the a priori repertoire is n bits in a system of n binary elements. The second term is the entropy of the a posteriori repertoire, which lies between 0 and n bits depending on the state x1 and the architecture of the system. It follows that a system of n binary elements generates at most n bits of effective information.
In Figure 1 the entropy of the a priori repertoire is 3 bits and that of the a posteriori is 0 bits, so 3 bits of effective information are generated by the system when it enters x1: one out of eight perturbations is specified by the system as a cause of its current state, and the other 7 perturbations are ruled out, thus reducing uncertainty (generating information).
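As a numerical check, the sketch below evaluates both Equation 1A (relative entropy) and Equation 1B (difference of entropies) for the Figure 1 example. It relies on the same assumptions as before: binary memoryless elements and a hypothetical wiring in which each AND-gate reads the other two elements. All function names are illustrative only.

```python
import math
from itertools import product

def and_gates(x0):  # assumed wiring: each gate ANDs the other two elements
    a, b, c = x0
    return (float(b and c), float(a and c), float(a and b))

def a_posteriori(fire_probs, n, x1):
    """Bayes' rule over a uniform a priori repertoire of n binary elements."""
    lik = {}
    for x0 in product((0, 1), repeat=n):
        p = 1.0
        for p_i, out in zip(fire_probs(x0), x1):
            p *= p_i if out else 1.0 - p_i
        lik[x0] = p
    total = sum(lik.values())
    return {x0: w / total for x0, w in lik.items()}

def entropy(dist):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def relative_entropy(p, q):
    """H[p || q] in bits; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log2(pi / q[s]) for s, pi in p.items() if pi > 0)

n, x1 = 3, (0, 0, 1)
post = a_posteriori(and_gates, n, x1)
prior = {s: 1 / 2**n for s in post}        # maximum entropy a priori repertoire
ei_1a = relative_entropy(post, prior)       # Equation 1A
ei_1b = entropy(prior) - entropy(post)      # Equation 1B
print(ei_1a, ei_1b)  # 3.0 3.0
```

The two forms agree because the relative entropy of any distribution with respect to the uniform distribution on 2^n states is n bits minus its entropy.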
In Figure 2, we show that effective information depends both on the size of the repertoire and on how much uncertainty is reduced by the mechanisms of the system. Figure 2A depicts a system of two elements. The a priori repertoire is smaller than in Figure 1, and effective information is reduced to 2 bits. Figure 2B shows the AND-gate system entering state x1 = 000. In this case the a posteriori repertoire specified by the system contains four perturbations that cannot be distinguished by its causal architecture, since each of the four perturbations leads to 000. Fewer alternatives from the a priori repertoire are ruled out, so effective information is 1 bit.
Figure 2. Effective information: a few examples.
Each panel depicts a different system, which has entered a particular state. The a priori and a posteriori repertoires are shown and effective information is measured. (A) is a simple system of two elements that copy each other's previous outputs (a couple). Effective information is 2 bits, less than for the system in Figure 1 since the repertoire of outputs is smaller. (B) shows the AND-gate system of Figure 1 entering the state 000. This state is less informative than 001 since the a posteriori repertoire specified by the system includes four perturbations; effective information is reduced to 1 bit. The systems in (C) and (D) generate no effective information. In (C) the elements always fire regardless of their inputs, corresponding to an inescapable fixed point. In (D) the elements fire or are silent at random, so that the prior state is irrelevant. In both cases the a posteriori repertoire is the maximum entropy distribution since no alternatives have been ruled out, so effective information is zero.
doi:10.1371/journal.pcbi.1000091.g002
Finally, Figure 2C and 2D illustrate two systems that generate no effective information. In Figure 2C the elements fire no matter how the system is perturbed, so the system always enters state x1 = 111. The process of entering x1 does not rule out any alternative states, so the a posteriori repertoire coincides with the a priori repertoire and effective information is zero. In Figure 2D the elements fire or not with 50% probability no matter how the system is perturbed. In other words, the behavior of the system is completely dominated by noise. Again, the process of entering x1 does not rule out any alternative states, so the a posteriori repertoire coincides with the a priori repertoire and effective information is zero.
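The four panels of Figure 2 can be reproduced with a single helper that applies Equation 1B. This is a sketch under assumptions: the AND-gate wiring is the hypothetical one used above, and the states chosen for panels A and D are arbitrary (for a couple, every reachable state gives 2 bits; for pure noise, every state gives 0 bits). Mechanisms are written as functions returning each element's firing probability, which lets deterministic and probabilistic elements share one code path.

```python
import math
from itertools import product

def effective_information(fire_probs, n, x1):
    """ei(X0 -> x1) = H(pmax(X0)) - H(p(X0 -> x1)) in bits (Equation 1B).
    fire_probs(x0) returns each element's probability of firing at t = 1."""
    likelihood = []
    for x0 in product((0, 1), repeat=n):
        p = 1.0
        for p_i, out in zip(fire_probs(x0), x1):
            p *= p_i if out else 1.0 - p_i
        likelihood.append(p)
    total = sum(likelihood)
    h_post = -sum((w / total) * math.log2(w / total) for w in likelihood if w > 0)
    return n - h_post  # the maxent a priori repertoire has n bits of entropy

def couple(x0):       # (A) two elements copy each other's previous output
    return (float(x0[1]), float(x0[0]))

def and_gates(x0):    # (B) assumed wiring: each gate ANDs the other two
    return (float(x0[1] and x0[2]), float(x0[0] and x0[2]), float(x0[0] and x0[1]))

def always_fire(x0):  # (C) inescapable fixed point
    return (1.0, 1.0, 1.0)

def coin_flips(x0):   # (D) pure noise: the prior state is irrelevant
    return (0.5, 0.5, 0.5)

examples = {
    "A": (couple, 2, (1, 0)),
    "B": (and_gates, 3, (0, 0, 0)),
    "C": (always_fire, 3, (1, 1, 1)),
    "D": (coin_flips, 3, (0, 1, 1)),
}
for panel, (mech, n, x1) in examples.items():
    print(panel, effective_information(mech, n, x1))
```

The loop should print 2.0, 1.0, 0.0 and 0.0 bits for panels A to D, matching the values stated in the caption of Figure 2.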
Effective information in systems that are not isolated.
Up to now we have exclusively considered isolated systems. Suppose we embed X in some larger system W that forms the “world” of X. Inputs from the environment, E = W \ X, to X cannot be accounted for by X internally. From X's point of view they are a source of extrinsic noise, since the information generated by the system must be due to causal interactions within the system. In general, to compute effective information one should average over all possible external inputs with the maximum entropy distribution (see Text S1, section 3, for details).
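A toy reading of this averaging rule is sketched below: a one-element system X whose mechanism also reads one environment element e. The input from e is averaged out with a uniform (maxent) distribution before Bayes' rule is applied to X's own a priori repertoire. Both mechanisms are hypothetical, and the exact procedure in Text S1 may differ in detail; this is only meant to show how extrinsic inputs dilute effective information.

```python
import math

def ei_with_environment(fire_prob, x1):
    """Effective information (bits) of a one-element system X in state x1.
    fire_prob(x0, e0) is the probability that X fires at t = 1; the input
    from the environment element e is averaged with a uniform distribution."""
    likelihood = {}
    for x0 in (0, 1):
        p_fire = sum(fire_prob(x0, e0) for e0 in (0, 1)) / 2  # average over e
        likelihood[x0] = p_fire if x1 else 1.0 - p_fire
    total = sum(likelihood.values())
    h_post = -sum((w / total) * math.log2(w / total)
                  for w in likelihood.values() if w > 0)
    return 1.0 - h_post  # a priori repertoire of one binary element: 1 bit

def and_with_env(x0, e0):  # hypothetical: X fires iff it and e both fired
    return float(x0 and e0)

def copy_env(x0, e0):      # hypothetical: X merely copies e
    return float(e0)

print(ei_with_environment(and_with_env, 1))  # only x0 = 1 can lead to x1 = 1
print(ei_with_environment(and_with_env, 0))  # some uncertainty about x0 remains
print(ei_with_environment(copy_env, 1))      # X's own past is irrelevant
```

In the first case X alone specifies its cause (1 bit); in the second the extrinsic input leaves an a posteriori repertoire of (2/3, 1/3), so only about 0.08 bits are generated; in the third the state of X is entirely due to the environment and effective information is zero.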
Integrated information
Next, we must evaluate how much information is generated by a system above and beyond what can be accounted for by its parts acting independently.
Consider Figure 3A. Effective information ei(X0 → x1) generated by the system, considered as a single entity, is 4 bits. In this case, however, it is clear that the two couples do not constitute a single entity at all: since there are no causal interactions between them, each of the disjoint couples generates 2 bits of information independently (Figure 3B). Effective information tells us how much information is generated without taking into account the extent to which the information is integrated. What we need to know, instead, is how much information is generated by the system as a whole, over and above the information generated independently by its parts, that is, we need to measure integrated information.
Figure 3. Integrated information for a system of two disjoint couples.
The panels analyze the same system of two disjoint couples from three different perspectives. The interactions in the system are displayed in cyan. Those interactions that occur within a part are shown in red, and those between parts are in dark blue. (A) computes effective information for the entire system X, finding it to be 4 bits. (B) computes effective information generated by each of the couples independently and then computes integrated information φ(x1), finding it to be 0 bits since the two couples do not interact. Notice that the combined a posteriori repertoire of the parts coincides with the a posteriori repertoire of the system; the parts account for all the interactions within X. (C) considers a partition of the system other than the minimum information partition. Since neither part is isolated, neither can account internally for the effect of its interactions with the other part; these interactions are treated as extrinsic noise, so each part specifies a maximum entropy a posteriori repertoire. Effective information generated across the partition is 4 bits.
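Integrated information φ (I for information and O for integration) is defined as the entropy of the a posteriori repertoire of the system relative to the combined a posteriori repertoires of the parts:
$$\varphi(x_1) = H\left[\, p(X_0 \to x_1) \,\Big\|\, \prod_{M^k \in P^{\mathrm{MIP}}} p(M^k_0 \to \mu^k_1) \,\right] \qquad (2A)$$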
where M and μ stand for parts, and PMIP is the minimum information partition, which represents the natural decomposition of the system into parts.
The a posteriori repertoires of the parts are found by considering each part as a system in its own right (averaging over inputs from other parts and from outside the system, Figure 4). Each part has an a priori repertoire, given by the maximum entropy distribution. The product of the a priori repertoires of the parts is the same as the a priori repertoire of the system, since the elements are treated independently in both cases. The a posteriori repertoire of each part Mk is specified (as for the whole, X, in the previous section) by its causal architecture and its current state $\mu^k_1$, after averaging over external inputs. Thus the rest of the system is treated as a source of extrinsic noise by each part. The effective information generated independently by the parts, shown in red in Figure 4, is the sum of the entropies of their a posteriori repertoires relative to their a priori repertoires.
Figure 4. Effective information generated across the minimum information partition.
(A) depicts the interactions within the system that are quantified by effective information of the entire system. (B) disentangles the interactions, showing interactions within parts in red, and interactions between parts in dark blue. (C) is a schematic of the relationship between the repertoires specified by the system and the parts. Effective information, represented by the arrows, is the entropy of a lower repertoire relative to an upper one. φ(x1) is the entropy of the a posteriori repertoire of the system relative to the combined a posteriori repertoire of the minimal parts.
doi:10.1371/journal.pcbi.1000091.g004
Integrated information, shown in dark blue, measures the information generated by the system through causal interactions among its elements (its a posteriori repertoire) with respect to (over and above) the information generated independently by its parts (their combined a posteriori repertoires). In particular, integrated information is zero if and only if the system can be decomposed into a collection of independent parts. Thus, φ(x1) of a system captures how much “the whole is more than the sum (or rather the product) of its parts.”
To exemplify, consider again the system of Figure 3, where the natural decomposition into parts is given by the subsets M1 and M2, as shown in Figure 3B. The a posteriori repertoire of M1 specifies perturbation 10. Similarly, the a posteriori repertoire of M2 specifies perturbation 01. The combined a posteriori repertoire of the parts specifies perturbation 1001 (red notch), coinciding with the a posteriori perturbation specified by the entire system. The system as a whole rules out no alternatives beyond those already ruled out by its parts, so integrated information is
$$\varphi(x_1) = H\left[\, p(X_0 \to x_1) \,\Big\|\, p(M^1_0 \to \mu^1_1)\, p(M^2_0 \to \mu^2_1) \,\right] = 0 \text{ bits}.$$
The system generates no information as a whole, over and above that generated by its parts.
Of note, a related measure is stochastic interaction [11], which quantifies the average interactions between subsets of a system. Briefly, our approach is distinguished by comparing the whole to the parts, rather than the parts to one another; see Text S1, section 8, for detailed discussion and technical motivation.
The minimum information partition.
In the case of the two couples the natural decomposition of the system into parts is captured by partition PMIP. Considering other partitions, for example the partition shown in Figure 3C, would miss the obvious decomposition of the system into independent parts and lead to erroneous estimates of integrated information. This example suggests that, for any system, we need to find the informational “weakest link”, i.e. the decomposition into those parts that are most independent (least integrated). This weakest link is given by the minimum information partition PMIP, which can be found by searching over all partitions of the system after appropriate normalization.
To do so, let us define the effective information across an arbitrary partition P as
$$ei(X_0 \to x_1 / P) = H\left[\, p(X_0 \to x_1) \,\Big\|\, \prod_{M^k \in P} p(M^k_0 \to \mu^k_1) \,\right]$$
where the parts Mk are mutually disjoint and together cover the system. A special case to consider is the total partition P = {X}. Since the part is the entire system, the a posteriori repertoire of the part equals that of the system, so if we apply Equation 2A to the total partition we always obtain zero. Thus we define effective information across the total partition to be the effective information generated by the entire system, as in Equation 1A. For a system containing no elements with self-connections, effective information generated by the system (across the total partition) and effective information across the partition into individual elements coincide.
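The definition of effective information across a partition can be sketched for the two disjoint couples of Figure 3. Assumptions: elements 0 and 1 form the first couple and elements 2 and 3 the second (the indexing is ours), and the state x1 = 0110 is chosen because its a posteriori perturbation is 1001, as in the example in the text. Each part's repertoire is obtained by summing out every input from outside the part, which is proportional to averaging with the maxent distribution. Names such as `part_posterior` and `ei_across` are hypothetical.

```python
import math
from itertools import product

def couples(x0):
    """Firing probabilities at t = 1 for two disjoint couples (Figure 3):
    elements 0<->1 and 2<->3 copy each other's previous output."""
    return (float(x0[1]), float(x0[0]), float(x0[3]), float(x0[2]))

def part_posterior(mech, part, x1):
    """A posteriori repertoire of `part` over its own outputs at t = 0.
    Inputs from outside the part are extrinsic noise, summed out uniformly
    (the constant averaging factor cancels on normalisation)."""
    lik = {m0: 0.0 for m0 in product((0, 1), repeat=len(part))}
    for x0 in product((0, 1), repeat=len(x1)):
        probs, p = mech(x0), 1.0
        for i in part:
            p *= probs[i] if x1[i] else 1.0 - probs[i]
        lik[tuple(x0[i] for i in part)] += p
    total = sum(lik.values())
    return {m0: w / total for m0, w in lik.items()}

def ei_across(mech, partition, x1):
    """ei(X0 -> x1 / P) in bits. The total partition P = {X} falls back to
    Equation 1A (relative entropy with respect to the maxent repertoire)."""
    n = len(x1)
    whole = part_posterior(mech, tuple(range(n)), x1)
    posts = [part_posterior(mech, part, x1) for part in partition]
    ei = 0.0
    for x0, p in whole.items():
        if p > 0:
            if len(partition) == 1:
                q = 2.0 ** -n
            else:
                q = math.prod(post[tuple(x0[i] for i in part)]
                              for part, post in zip(partition, posts))
            ei += p * math.log2(p / q)
    return ei

x1 = (0, 1, 1, 0)
for name, partition in [("natural", [(0, 1), (2, 3)]),
                        ("across couples", [(0, 2), (1, 3)]),
                        ("total", [(0, 1, 2, 3)]),
                        ("singletons", [(0,), (1,), (2,), (3,)])]:
    print(name, ei_across(couples, partition, x1))
```

The natural partition gives 0 bits, whereas the partition of Figure 3C, the total partition and the partition into single elements each give 4 bits; the last two agree, as expected for a system without self-connections.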
Normalization.
Normalization is necessary because effective information across an asymmetric bipartition where one part contains a single element and the second part contains the rest will typically be less than across a symmetric partition into two parts of equal size. Similarly, effective information across partitions into many parts tends to be higher than across partitions into few parts. To fairly compare different partitions we therefore introduce the normalization:
$$N_P = (m - 1)\,\min_k H^{\max}(M^k_0)$$
where m is the number of parts in the partition. The normalization is the size of the smallest a priori repertoire of a part multiplied by the number of other parts. In particular, for a partition into two parts NP is the size of the smaller a priori repertoire. The normalization for the total partition is NP = Hmax(X0).
The minimum information partition (MIP) can then be defined as the partition for which normalized effective information is a minimum:

$$P^{\mathrm{MIP}} = \arg\min_{P} \left\{ \frac{\mathrm{ei}(X_0 \to x_1/P)}{N_P} \right\}$$
If more than one partition attains the minimum normalized value, we select as the minimum information partition(s) those that generate the lowest un-normalized quantity of effective information. Once the minimum information partition has been found, integrated information can be simply expressed as

$$\varphi(x_1) = \mathrm{ei}(X_0 \to x_1/P^{\mathrm{MIP}}) \qquad \text{(2B)}$$
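The selection rule (minimum normalized effective information, ties broken by the smaller un-normalized value) can be sketched as an exhaustive search. This self-contained sketch assumes a deterministic network of binary elements with uniform a priori repertoires and inputs from outside each part averaged as extrinsic noise; all function names and both example networks are hypothetical. Enumerating every set partition visits Bell(n) partitions, so it is only practical for very small systems, and the tolerance `tol` is a floating-point convenience rather than part of the definition.

```python
import itertools
import math
from typing import Callable, Dict, Iterator, List, Sequence, Tuple

State = Tuple[int, ...]
Rule = Callable[[State], int]  # full prior state x0 -> one element's next value


def repertoire(rules: Sequence[Rule], part: Sequence[int], mu1: State) -> Dict[State, float]:
    """p(M0 -> mu1): uniform prior; inputs from outside `part` averaged as noise."""
    w: Dict[State, float] = {}
    for x0 in itertools.product((0, 1), repeat=len(rules)):
        if all(rules[i](x0) == v for i, v in zip(part, mu1)):
            m0 = tuple(x0[i] for i in part)
            w[m0] = w.get(m0, 0.0) + 1.0
    total = sum(w.values())
    return {m0: c / total for m0, c in w.items()}


def ei(rules: Sequence[Rule], x1: State, partition: Sequence[Sequence[int]]) -> float:
    n = len(rules)
    whole = repertoire(rules, range(n), x1)
    if len(partition) == 1:  # total partition (Equation 1A)
        return sum(p * math.log2(p * 2 ** n) for p in whole.values())
    parts = [(M, repertoire(rules, M, tuple(x1[i] for i in M))) for M in partition]
    return sum(
        p * math.log2(p / math.prod(rep[tuple(x0[i] for i in M)] for M, rep in parts))
        for x0, p in whole.items()
    )


def set_partitions(items: List[int]) -> Iterator[List[List[int]]]:
    """Every partition of `items` into non-empty, disjoint blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in set_partitions(rest):
        for i in range(len(smaller)):  # add `first` to an existing block
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        yield [[first]] + smaller  # or give it a block of its own


def normalization(partition: Sequence[Sequence[int]]) -> int:
    sizes = [len(b) for b in partition]  # binary elements: H^max(M_0) = |M| bits
    return sizes[0] if len(sizes) == 1 else (len(sizes) - 1) * min(sizes)


def phi(rules: Sequence[Rule], x1: State, tol: float = 1e-9):
    """Return (phi, MIP): minimise ei/N_P, breaking ties by the smaller raw ei."""
    best = None  # (normalised ei, raw ei, partition)
    for P in set_partitions(list(range(len(rules)))):
        raw = ei(rules, x1, P)
        norm = raw / normalization(P)
        if (best is None or norm < best[0] - tol
                or (abs(norm - best[0]) <= tol and raw < best[1] - tol)):
            best = (norm, raw, P)
    return best[1], best[2]


# Hypothetical examples: two mutually copying pairs, and a 3-element copy ring.
pairs = [lambda x: x[1], lambda x: x[0], lambda x: x[3], lambda x: x[2]]
ring = [lambda x: x[2], lambda x: x[0], lambda x: x[1]]
print(phi(pairs, (1, 0, 1, 0)))  # (0.0, [[0, 1], [2, 3]])
print(phi(ring, (1, 0, 0)))      # (3.0, [[0, 1, 2]])
```

The two independent pairs decompose along the cut that severs no connections, so φ = 0, whereas in the ring every cut severs a copy link and the total partition wins.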
Integrated information is bounded.
For a discrete system composed of n binary elements, φ(x1) ≤ n bits. This follows since the normalization is largest for the total partition, and for this partition effective information is $\mathrm{ei}(X_0 \to x_1) \le H^{\max}(X_0) = n$ bits.
Complexes
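For any given system X, we are now in a position to identify those subsets that are capable of integrating information: the complexes. A subset S of X forms a complex when it enters state s1 if φ(s1) > 0 and S is not contained in some larger set with strictly higher φ. A complex whose proper subsets all have strictly lower φ is called a main complex. For instance, the complex with the maximum value of φ in a given system necessarily forms a main complex.

$$S \text{ is a complex} \iff \varphi(s_1) > 0 \ \text{and}\ \varphi(t_1) \le \varphi(s_1) \ \text{for all } T \supsetneq S; \qquad S \text{ is a main complex} \iff S \text{ is a complex and}\ \varphi(r_1) < \varphi(s_1) \ \text{for all non-empty } R \subsetneq S \qquad \text{(3A)}$$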
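Once φ is known for every non-empty subset of X in its current state (computed by whatever means), the two definitions reduce to set comparisons. The sketch below assumes such a table is available; the function names and the φ values are hypothetical and purely illustrative, chosen so that a high-φ main complex sits inside a larger, lower-φ complex, as noted for Figure 5.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

Subset = FrozenSet[str]


def complexes(phi: Dict[Subset, float]) -> List[Subset]:
    """Subsets with phi > 0 not contained in a larger set with strictly higher phi."""
    return [S for S, v in phi.items()
            if v > 0 and not any(S < T and w > v for T, w in phi.items())]


def main_complexes(phi: Dict[Subset, float]) -> List[Subset]:
    """Complexes whose proper (non-empty) subsets all have strictly lower phi."""
    return [S for S in complexes(phi)
            if all(w < phi[S] for R, w in phi.items() if R < S)]


# Hypothetical phi values (bits) for every non-empty subset of X = {a, b, c}.
X = "abc"
phi = {frozenset(s): 0.0 for k in range(1, len(X) + 1) for s in combinations(X, k)}
phi[frozenset("ab")] = 2.0
phi[frozenset("abc")] = 1.0

print(sorted("".join(sorted(S)) for S in complexes(phi)))       # ['ab', 'abc']
print(sorted("".join(sorted(S)) for S in main_complexes(phi)))  # ['ab']
```

Both {a, b} and {a, b, c} are complexes, but only {a, b}, the subset with the highest φ, is a main complex, because {a, b, c} contains a subset with higher φ.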
At each instant in time any system of elements can be decomposed into its constituent complexes, which form its fundamental units. Indeed, only a complex can be properly considered to form a single entity. For a complex, and only for a complex, it is meaningful to say that, when it enters a particular state out of its repertoire, it generates an amount of integrated information corresponding to its φ value.
Decomposing a system into complexes.
Figure 5 shows how a system X can be analyzed to find its constituent complexes, shown in shades of gray. From the figure, we see that complexes have the following properties: i) the same element can belong to more than one complex, and complexes can overlap; in particular, a smaller complex of high φ (main complex) may be contained within a larger complex of low φ; ii) a complex can be causally connected to elements that are not part of it (the input and output elements of a complex are called ports-in and ports-out, respectively); iii) groups of elements with identical causal architectures can generate different amounts of integrated information depending on their ports-in and ports-out (subsets A and B in Figure 5).
Figure 5. Decomposing systems into overlapping complexes.
In this example elements are parity gates: they fire if they receive an odd number of spikes. Links without arrows are bidirectional. The system is decomposed into three of its complexes, shown in shades of gray. Observe that: i) complexes can overlap; ii) a complex can interact causally with elements not part of it; iii) groups of elements with identical architectures generate different amounts of integrated information, depending on their ports-in and ports-out (compare subset A, the dark gray filled-in circle, with subset B, the right-hand circle).
doi:10.1371/journal.pcbi.1000091.g005
Elements independently driven by a complex do not generate integrated information.
Figure 6 shows a system of interacting elements, A, with three additional elements attached that copy its outputs. In its current state, subset A forms a main complex and generates 3 bits of integrated information. However, the entire system does not form a complex: φ(x1) = 0, since the interactions outside of A are redundant. Elements {n4, n5, n6} are analogous to the photodiodes in a digital camera, taking a snapshot of A's state; the snapshot generates no integrated information over and above the original. An interaction clearly occurs between elements n3 and n6, but from the perspective of the entire system it is redundant. Restricting attention to subset B, the couple, we see that B generates 1 bit of integrated information.
Figure 6. Elements driven by a complex do not contribute to integrated information.
The system is constructed using the AND-gate system of Figure 1, with the addition of three elements copying the inner triple. The AND-triple forms a main complex, as do the couples. However, the entire system generates no integrated information and does not form a complex, since X generates no information over and above that generated by subset A.
doi:10.1371/journal.pcbi.1000091.g006
Complexes must be analyzed at the level of elementary components and operations.
Finding the integrated information generated by a system requires analyzing it into complexes from the ground up in terms of elementary components and elementary mechanisms or operations. Figure 7 shows two examples of systems that appear to generate a large amount of integrated information, but on closer analysis dissolve into many weakly interacting components with low φ.
Figure 7. Analyzing systems in terms of elementary components.
(A) and (C) show systems that on the surface appear to generate a large amount of integrated information. The units in (A) have a repertoire of $2^n$ outputs, with the bottom unit copying the top. Integrated information is n bits. Analyzing the internal structure of the system, shown in (B), we find n disjoint couples, each integrating 1 bit of information; the entire system, however, is not integrated. (C) shows a system of binary units. The top unit receives inputs from 8 other units and performs an AND-gate-like operation, firing if and only if all 8 inputs are spikes. Increasing the number of inputs appears to increase φ easily and without limit. (D) examines a possible implementation of the internal architecture of the top unit using binary AND-gates. The architecture has a bottleneck, shown in red, so that φ = 1 bit regardless of the number of input units.
doi:10.1371/journal.pcbi.1000091.g007
Consider the system in Figure 7A. If we ignore internal structure, we might assume that the system is made up of two components, each with a repertoire of $2^n$ outputs. If the lower component copies the output of the upper one at the previous time step, then this two-unit system generates n bits of integrated information, and it would seem trivial to implement systems with arbitrarily large values of φ. However, we need to consider how such components could be built. Figure 7B depicts a simple construction: each component contains n binary elements, and the connection between the components decomposes into couplings between pairs of elements. Analyzing the system at this more detailed level uncovers a collection of disjoint couples, each of which forms an independent complex and generates 1 bit of integrated information. Since the system as a whole is disconnected, φ = 0 bits. The dotted elliptical components have been artificially imposed on the system and do not reflect the underlying causal interactions, so the higher-level analysis yields an incorrect value of φ. Note that if we attempt to address this problem by adding horizontal connections between elements, so that each component is integrated internally, we introduce a second problem: the horizontal couplings shrink the a posteriori repertoires of the components, reducing effective information between them. We discuss a related example, and similar considerations for continuous systems, in Text S1, sections 10 and 11.
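The step from "disjoint couples" to φ = 0 follows directly from Equation 2A and the normalization. Write $P^{C} = \{C^1, \dots, C^n\}$ for the partition into couples and $c^k_1$ for the current state of couple $C^k$ (notation introduced here for illustration). Because no element of one couple receives input from another couple, the a posteriori repertoire of the whole factorizes into the repertoires of the couples, so the relative entropy across $P^C$ vanishes:

$$
p(X_0 \to x_1) = \prod_{k=1}^{n} p(C^k_0 \to c^k_1)
\quad\Longrightarrow\quad
\mathrm{ei}(X_0 \to x_1/P^{C}) = H\Big[\textstyle\prod_{k} p(C^k_0 \to c^k_1) \,\Big\|\, \prod_{k} p(C^k_0 \to c^k_1)\Big] = 0 .
$$

Since effective information is non-negative and $N_P > 0$ for every partition, $P^C$ attains the minimum normalized value of 0, and the tie-breaking rule can only select a partition whose un-normalized effective information is also 0; hence $\varphi(x_1) = 0$ bits. By contrast, the component-level analysis of Figure 7A treats the lower unit's state as singling out one of the upper unit's $2^n$ prior outputs, which gives $\log_2 2^n = n$ bits.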
Figure 7C presents a similar situation. The system contains nine binary components, with a single component receiving inputs from the other eight; that component fires if and only if all eight inputs were active at the previous time step. The minimum information partition is the total partition P = {X}, and φ(x1) = 8 bits when the top component is firing, since its state uniquely specifies the prior state of the other eight components. Increasing the number of inputs feeding into the top component while keeping the same rule (fire if and only if all inputs are active) seems to provide a method for constructing systems with high φ from binary components. The difficulty once again lies in physically implementing a component that processes n inputs at a single point in space and at a single instant in time for large n. Figure 7D shows a possible internal architecture of the component, constructed from a hierarchy of logical AND-gates. When analyzed at this level, the system generates 1 bit of integrated information regardless of the number of inputs feeding into the top component, since the bipartition framed by the red cut forms a bottleneck.
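The claim that a firing top component "uniquely specifies the prior state of the other eight" can be checked by counting. This sketch isolates only that quantity (the bits the top unit's state specifies about its k inputs, against a uniform a priori repertoire); it is not a full φ computation over all nine components and does not model the internal architecture of Figure 7D. The function name `bits_about_inputs` is hypothetical.

```python
import itertools
import math


def bits_about_inputs(k: int, fires: bool) -> float:
    """Bits the top unit's current state specifies about the prior state of its k inputs.

    Assumes a uniform a priori repertoire over the 2**k input states and the
    rule 'fire if and only if all k inputs were active'."""
    consistent = sum(1 for s in itertools.product((0, 1), repeat=k) if all(s) == fires)
    # Relative entropy of a uniform posterior over `consistent` states vs. a uniform prior.
    return k - math.log2(consistent)


print(bits_about_inputs(8, True))   # 8.0: only the all-active input state is consistent
print(bits_about_inputs(8, False))  # about 0.0056: 255 of 256 input states remain
print(bits_about_inputs(16, True))  # 16.0
```

The firing state grows in informativeness with k, which is why the rule seems to offer unbounded φ at the level of whole components; the silent state, by contrast, rules out almost nothing.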
The examples in this paper assume that the elements are abstract indivisible objects and that the rules are simple (logic gates, threshold functions and variations thereof). In future work we will investigate the internal structure of elements and determine the conditions under which they can be considered to be indivisible.
Extrinsic inputs can contribute to integrated information within a complex.
The a posteriori repertoire of a complex X is specified using only information that is intrinsic to the complex; extrinsic inputs from the environment E = W \ X are averaged over and treated as extrinsic noise. At first glance it appears that environmental inputs cannot meaningfully contribute to the integrated information generated by X; however, this is not the case.
Consider the cartoon example shown in Figure 8. The gray box is a main complex, with environmental inputs (red arrows) entering at the bottom. The bulk of the main complex (the black zig-zag) is not shown. The portion depicted can be considered, for example, as an idealization of the visual center of the mammalian cortex. It is dominated by strong feedforward connections driving the elements, with weak feedback and lateral connections. The system enters state x1. To what extent does the a posteriori repertoire of the system reflect environmental inputs?
Figure 8. Integrated information and extrinsic inputs.
The gray box represents a main complex. Red arrows are input from the environment. Black arrows depict strong feedforward connections; gray arrows are weaker modulatory connections. The black zig-zag represents the bulk of the main complex. The current state of row Ra is determined by extrinsic inputs, which are treated as extrinsic noise. However, the current state of row Rb, together with the feedforward architecture of the system, specifies the prior state of Ra, so that the system is able to distinguish extrinsic inputs once they have caused an interaction between elements within the main complex. Similarly, row Rc specifies higher-order invariants in the prior state of row Rb.
doi:10.1371/journal.pcbi.1000091.g008
We answer the question by considering the contribution of the current state of three rows of interest, labeled Ra through Rc, to the a posteriori repertoire. The current state of Ra is entirely determined by the feedforward connections from the environment. External inputs are treated as noise, so the current state of Ra does nothing to reduce uncertainty regarding the a priori repertoire of states on X. Now consider the current state of Rb. As shown, Rb simply copies Ra, so the current state of Rb exactly specifies the prior state of Ra. If Ra is also copying its inputs, then the environmental inputs contribute to the information integrated by the system, albeit one temporal and spatial step removed. R