Falling towards Forgetfulness: Synaptic Decay Prevents Spontaneous Recovery of Memory

James V. Stone; Peter E. Jupp

doi:10.1371/journal.pcbi.1000143

Abstract

Long after a new language has been learned and forgotten, relearning a few words seems to trigger the recall of other words. This “free-lunch learning” (FLL) effect has been demonstrated both in humans and in neural network models. Specifically, previous work proved that linear networks that learn a set of associations, then partially forget them all, and finally relearn some of the associations, show improved performance on the remaining (i.e., nonrelearned) associations. Here, we prove that relearning forgotten associations decreases performance on nonrelearned associations; an effect we call negative free-lunch learning. The difference between free-lunch learning and the negative free-lunch learning presented here is due to the particular method used to induce forgetting. Specifically, if forgetting is induced by isotropic drifting of weight vectors (i.e., by adding isotropic noise), then free-lunch learning is observed. However, as proved here, if forgetting is induced by weight values that simply decay or fall towards zero, then negative free-lunch learning is observed. From a biological perspective, and assuming that nervous systems are analogous to the networks used here, this suggests that evolution may have selected physiological mechanisms that involve forgetting using a form of synaptic drift rather than synaptic decay, because synaptic drift, but not synaptic decay, yields free-lunch learning.

Author Summary

If you learn a skill, then partially forget it, does relearning part of that skill induce recovery of other parts of the skill? More generally, if you learn a set of associations, then partially forget them, does relearning a subset induce recovery of the remaining associations? In previous work, in which participants learned the layout of a scrambled computer keyboard, the answer to this question appeared to be “yes.” More recently, we modeled this “free-lunch learning” effect using artificial neural networks, in which the synaptic strength between each pair of model neurons is a connection weight. We proved that if forgetting is induced by allowing each weight value to drift randomly, then free-lunch learning is almost inevitable. However, if, after learning a set of associations, forgetting is induced by allowing each connection weight to decay or fall toward zero, then relearning a subset of associations decreases performance on the remaining associations. This suggests that evolution may have selected physiological mechanisms that involve forgetting using a form of synaptic drift rather than synaptic decay, because synaptic drift yields free-lunch learning, whereas decay does not.

Citation: Stone JV, Jupp PE (2008) Falling towards Forgetfulness: Synaptic Decay Prevents Spontaneous Recovery of Memory. PLoS Comput Biol 4(8): e1000143. https://doi.org/10.1371/journal.pcbi.1000143

Editor: Karl J. Friston, University College London, United Kingdom

Received: December 7, 2007; Accepted: June 25, 2008; Published: August 22, 2008

Copyright: © 2008 Stone, Jupp. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: No funding was received for this work.

Competing interests: The authors have declared that no competing interests exist.

Introduction

The idea that structural changes underpin the formation of new memories can be traced to the 19th century [1]. More recently, Hebb proposed that “When an axon of cell A is near enough to excite B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased” [2]. It is now widely accepted that learning involves some form of Hebbian adaptation, and a growing body of evidence suggests that Hebbian adaptation is associated with the long-term potentiation (LTP) observed in neuronal systems [3]. LTP is an increase in synaptic efficacy which occurs in the presence of pre-synaptic and post-synaptic activity, and can be specific to a single synapse. One consequence of Hebbian adaptation is that information regarding a specific association is distributed amongst many synaptic connections, and therefore gives rise to a distributed representation of each association.

In [4], participants learned the layout of letters on a “scrambled” keyboard. After a period of forgetting, participants relearned a subset of letter positions. Crucially, this improved performance on the remaining (i.e., nonrelearned) letter positions. However, whereas relearning some associations shows evidence of FLL in some studies [4]–[6], this is not found in all studies [7]. This discrepancy may arise because the many studies of this general phenomenon use a wide variety of materials and procedures, with some measuring recall and others measuring recognition performance, for example. Within psychology, one relevant effect is known as part-set cueing inhibition.

Part-set cueing inhibition [8] occurs when a subject is exposed to part of a set of previously learned items, which is found to reduce recall of nonrelearned items. However, [9] showed that a learned row of words was better recalled if the cues consisted of a subset of words placed in their learned positions than if cue words were placed in other positions. In this case, part-set cueing seems to improve performance, but only if each “part” appears in the spatial position in which it was originally learned. This position-specificity is consistent with the FLL effect reported using the “scrambled keyboard” procedure in [4] but has no obvious concomitant in network models (e.g., [4],[10],[11]).

If the brain stores information as distributed representations, then each neuron contributes to the storage of many associations. Therefore, relearning some old and partially forgotten associations should affect the integrity of other associations learned at about the same time. As noted above, previous work has shown that relearning some forgotten associations does not disrupt other associations, but partially restores them. This FLL effect has also been demonstrated in neural network models ([10],[12]), where it can accelerate evolution of adaptive behaviors [13]. Crucially, in [12], the proof that relearning some associations partially restores other associations assumes that forgetting is caused by the addition of isotropic noise to connection weights, which could result from the cumulative effect of small random changes in connection weights. In contrast, here we prove that if forgetting is induced by shrinking weights towards zero, so that weights “fall” towards the origin, then relearning some associations disrupts other associations.

The protocol used to examine FLL here is the same as that used in [4] and [12] and is as follows (see Figure 1). First, learn a set of n₁+n₂ associations A = A₁∪A₂ consisting of two subsets A₁ and A₂ of n₁ and n₂ associations, respectively. After all learned associations A have been partially forgotten, measure performance error on subset A₁. Finally, relearn only subset A₂ and then remeasure performance on subset A₁. FLL occurs if relearning subset A₂ improves performance on A₁.

Figure 1. Free-lunch learning protocol.

Two subsets of associations A₁ and A₂ are learned. After partial forgetting (see text), performance error E_pre on subset A₁ is measured. Subset A₂ is then relearned to pre-forgetting levels of performance, and performance error E_post on subset A₁ is re-measured. If E_post<E_pre then FLL has occurred, and the amount of FLL is δ = E_pre−E_post. Redrawn from [12].

https://doi.org/10.1371/journal.pcbi.1000143.g001

In order to preclude a common misunderstanding, we emphasize that, for a network with n connection weights, it is assumed that n≥n₁+n₂ ; that is, the number of connection weights on each output unit is not less than the number n₁+n₂ of learned associations. Using the class of linear network models described below, up to n associations can be learned perfectly (see [12]).

The proofs below refer to a network with one output unit. However, these proofs apply to networks with multiple output units, because the n connections to each output unit can be considered as a distinct network, in which case our results can be applied to the network associated with each output unit.

Definition of Performance Error

Each association consists of an input vector x and a corresponding target value d. For a network with weight vector w, the response to an input vector x is y = w·x. We define the performance error for input vectors x₁,…,x_k and desired outputs d₁,…,d_k to be

$$E(x_1,\dots,x_k; w, d_1,\dots,d_k) = \sum_{i=1}^{k} (y_i - d_i)^2, \tag{1}$$

where y_i = w·x_i is the output response to the input vector x_i. By putting X = (x₁,…,x_k)^T and d = (d₁,…,d_k)^T, we can write Equation 1 succinctly as

$$E(X; w, d) = \|Xw - d\|^2. \tag{2}$$

The two subsets A₁ and A₂ consist of n₁ and n₂ associations, respectively. Let w₀ be the network weight vector after A₁ and A₂ are learned. When A₁ and A₂ are forgotten, the network weight vector changes to w₁, say, and the performance error on A₁ becomes E_pre = E(X;w₁,d). Finally, relearning A₂ yields a new weight vector, w₂, say, and the performance error on A₁ is E_post = E(X;w₂,d). Free-lunch learning has occurred if performance error on A₁ is less after relearning A₂ than it was before relearning A₂ (i.e., if E_post<E_pre).

Given weight vectors w₁ and w₂, a matrix X of input vectors, and a vector d of desired outputs, define

$$\delta(w_1, w_2; X, d) = E(X; w_1, d) - E(X; w_2, d), \tag{3}$$

which we shall also refer to simply as δ.

In previous work [12], we assumed that the “forgetting vector” v (defined as v = w₁−w₀) has an isotropic distribution. Here we shall assume instead that the post-forgetting weight vector w₁ is given by

$$w_1 = r\,w_0 \tag{4}$$

for some (possibly random) scalar r, so that

$$v = -(1-r)\,w_0 \tag{5}$$

and therefore

$$w_1 = w_0 - (1-r)\,w_0. \tag{6}$$

The interpretation of Equation 6 is that forgetting consists of making the optimal weight vector w₀ “fall” towards the origin by a falling factor 1−r.
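As a minimal sketch of forgetting by falling (not the authors' code), the NumPy fragment below builds a weight vector w₀ that fits a small set of associations exactly, shrinks it by a factor r, and confirms that the resulting error equals (1−r)²∥d∥². It assumes that the performance error is the summed squared error ∥Xw−d∥²; the function names `performance_error` and `fall`, the seed, and the problem sizes are illustrative choices.

```python
import numpy as np

def performance_error(X, w, d):
    """Summed squared error ||Xw - d||^2 over the rows of X (assumed error measure)."""
    residual = X @ w - d
    return float(residual @ residual)

def fall(w0, r):
    """Forgetting by falling: shrink the weight vector towards the origin."""
    return r * w0

rng = np.random.default_rng(0)   # seed chosen arbitrarily
n = 6                            # number of weights (illustrative)
X = rng.standard_normal((4, n))  # four associations, so n >= k holds
d = rng.standard_normal(4)

# Minimum-norm weights that satisfy Xw = d exactly (one consistent choice of w0).
w0 = np.linalg.lstsq(X, d, rcond=None)[0]

r = 0.7
w1 = fall(w0, r)
v = w1 - w0                      # forgetting vector, equal to -(1 - r) * w0

E_pre = performance_error(X, w1, d)
# Because X w0 = d, the residual after falling is (r - 1) d.
print(E_pre, (1 - r) ** 2 * float(d @ d))
```

The two printed numbers should agree to rounding error, because every output is scaled by r when the weights are scaled by r.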

Results

We provide theoretical results, and compare these with results obtained using computer simulations. In essence, our theoretical and simulation results indicate that falling weights induce negative FLL, which decreases with the square of the falling factor 1−r.

Theoretical Results

Our two main theorems are summarised here, and proofs are provided in the Methods section. These theorems apply to a network with n weights which learns n₁+n₂ associations A = A₁∪A₂, and then after partial forgetting, relearns the n₂ associations in A₂.

We prove that if n₁+n₂≤n (so that, in general, the associations A₁ and A₂ are consistent) and the joint distribution of (X₁,d₁) is isotropic (where X₁ and d₁ are the matrix of inputs and the vector of desired outputs for subset A₁ of associations) then the expected value of δ is negative (recall that δ is defined in Equation 3). We then prove that the probability P(δ<0) that δ is negative approaches unity as n₁ approaches ∞.

Theorem 1

For every non-zero value of r, the expected value of δ given r is negative. More precisely,

$$E[\delta \mid r] \propto -(1-r)^2 \le 0, \tag{7}$$

with equality only in trivial cases, and where the constant of proportionality is guaranteed to be positive. Thus, the expected amount of FLL is negative (or zero).

From a physiological perspective, the case r<1 is obviously of interest because it represents synaptic weight decay. However, from a mathematical perspective, Theorem 1 applies to every value of r, and so it also holds for r>1. In other words, any movement of the weight vector w along the line connecting w₀ to the origin yields an expectation of negative FLL, in accordance with Theorem 1.

Theorem 2

Under mild conditions on the distributions of the input/output pairs (X₁,d₁) and (X₂,d₂),(8)where x and are any columns of and , respectively, and

Theorem 2 implies that, if (i) the number (n₁) of associations in A₁ is a fixed non-zero proportion ( n₁/n ) of the number n of connection weights, (ii) E[∥d₁∥²]E[∥d₂∥⁻²] is bounded as n → ∞, and (iii) γ(n) → 0 as n → ∞ then P(δ>0) → 0 as n → ∞, i.e., the amount of FLL is negative, with a probability which tends to 1 as n → ∞.

For example, if we assume that (i) each input vector x = (x₁,…,x_n) is chosen from an isotropic Gaussian distribution and (ii) the variance of x_i is then γ(n) = 2/n, , and E[∥d₁∥²]E[∥d₂∥⁻²] = n₁/(n₂−1). This ensures that P(δ>0) → 0 as n → ∞.

Simulation Results

Simulation was carried out on a network with n input units and one output unit. The set A of associations consisted of k input vectors (x₁,…,x_k) and k corresponding desired scalar output values (d₁,…,d_k). Each input vector comprised n elements x = (x₁,…,x_n). The values of x_i and d_i were chosen from a Gaussian distribution with unit variance. A network's output y_i is a weighted sum of input values, y_i = Σ_j w_j x_ij, where x_ij is the jth component of the ith input vector x_i, and each weight w_j is the connection between the jth input unit and the output unit.

Given that the network error for a given set of k associations is E = Σ_i (y_i − d_i)², the derivative ∇E of E with respect to w yields the delta learning rule Δw = −η∇E, where η is the learning rate, which is adjusted according to the number of weights.
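The delta rule can be sketched as batch gradient descent on the summed squared error. The fragment below is illustrative rather than the code used for the simulations: the function name `delta_rule`, the step size (set from the largest singular value of X so that the iteration is stable), the number of steps, and the problem sizes are all assumptions.

```python
import numpy as np

def delta_rule(X, d, n_steps=5000):
    """Batch delta rule: repeatedly step down the gradient of ||Xw - d||^2."""
    k, n = X.shape
    w = np.zeros(n)                          # start from zero weights
    # Assumed step size: half the reciprocal of the largest eigenvalue of X^T X.
    eta = 0.5 / np.linalg.norm(X, 2) ** 2
    for _ in range(n_steps):
        y = X @ w                            # network outputs y_i = w . x_i
        w -= eta * 2.0 * X.T @ (y - d)       # delta w = -eta * dE/dw
    return w

rng = np.random.default_rng(1)
X = rng.standard_normal((10, 20))            # k = 10 associations, n = 20 weights
d = rng.standard_normal(10)

w_delta = delta_rule(X, d)
w_direct = np.linalg.lstsq(X, d, rcond=None)[0]   # minimum-norm exact solution
print(np.max(np.abs(X @ w_delta - d)))            # largest remaining output error
```

Starting from zero weights, the iterates stay in the span of the input vectors, so the delta rule should converge to the same minimum-norm solution that a direct least-squares solver returns.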

However, in order to save time, we used an equivalent learning method. Learning of the k = n associations in A = A₁∪A₂ was performed by solving a set of n simultaneous equations using a standard method, after which the weight vector w₀ was obtained; this provided perfect performance on all n associations. Partial forgetting was induced by making weights “fall” towards the origin w₁ = rw₀, after which performance error was E_pre. Relearning the n₂ = n/2 associations in A₂ was implemented with k = n₂ as above, after which performance error was E_post.

In each simulation, each value in each input vector x_i, and each target value d_i, was chosen from the same isotropic Gaussian distribution with unit variance. There were 100 input units, and one output unit. The subsets A₁ and A₂ each consisted of 50 associations. The value of δ = E_pre−E_post was obtained in each of 100 simulations, using a different random seed for each simulation. In Figure 2, the mean of 100 values of δ is shown for various values of the falling factor 1−r.
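The simulation protocol described above can be sketched in a few lines of NumPy. This is a reconstruction under stated assumptions, not the authors' code: relearning A₂ is implemented as the orthogonal projection of w₁ onto the set of weight vectors that fit A₂ (computed with a minimum-norm least-squares solve), a single random generator supplies all runs instead of one seed per run, and the function names are illustrative.

```python
import numpy as np

def fll_delta(n1, n2, r, rng):
    """Run one learn / fall / relearn cycle and return delta = E_pre - E_post."""
    n = n1 + n2                                   # as many weights as associations
    X = rng.standard_normal((n, n))               # rows are input vectors
    d = rng.standard_normal(n)                    # target outputs
    X1, d1, X2, d2 = X[:n1], d[:n1], X[n1:], d[n1:]

    w0 = np.linalg.solve(X, d)                    # learn A1 and A2 exactly
    w1 = r * w0                                   # forgetting: weights fall

    # Relearn A2: smallest change to w1 that restores X2 w = d2, i.e. the
    # orthogonal projection of w1 onto the constraint set of A2.
    w2 = w1 + np.linalg.lstsq(X2, d2 - X2 @ w1, rcond=None)[0]

    E_pre = np.sum((X1 @ w1 - d1) ** 2)
    E_post = np.sum((X1 @ w2 - d1) ** 2)
    return float(E_pre - E_post)

def mean_fll_per_association(r, n1=50, n2=50, n_runs=100, seed=0):
    """Average delta / n1 over n_runs independent simulations."""
    rng = np.random.default_rng(seed)
    return float(np.mean([fll_delta(n1, n2, r, rng) for _ in range(n_runs)])) / n1

for r in (0.9, 0.5, 0.1):
    print(f"falling factor {1 - r:.1f}: mean delta/n1 = "
          f"{mean_fll_per_association(r):+.3f}, -(1-r)^2 = {-(1 - r) ** 2:+.3f}")
```

Under these assumptions the simulated mean should be negative for every r other than 1 and should lie close to −(1−r)², in line with the comparison plotted in Figure 2.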

Figure 2. Free-lunch learning decreases as the network's weight vector falls toward the origin.

A network with 100 input units and one output unit learns two subsets A₁ and A₂, each of which consists of 50 associations. After learning A₁ and A₂, the network has a weight vector w = w₀, but after partial forgetting, the weight vector is w = w₁. If forgetting consists of subtracting a proportion 1−r of w₀ such that w₁ = w₀−(1−r)w₀ then the weight vector “falls” towards the origin; the factor 1−r is called the falling factor. After forgetting, performance error on A₁ is E_pre, an error which changes to E_post after relearning A₂, where this change is δ = E_pre−E_post. Given that there are n₁ associations in A₁, the expected free-lunch learning per association in A₁ is therefore E[δ/n₁|r]. Solid curve: the expected FLL, E[δ/n₁|r], where this expectation is taken over 100 computer simulations. Dashed curve: theoretical prediction of E[δ/n₁|r] (see Equation 7), using a constant of proportionality equal to unity, so that the predicted free-lunch learning is E_predict[δ/n₁|r] = −(1−r)². As predicted, free-lunch learning E[δ/n₁|r] becomes more negative as the falling factor 1−r increases.

https://doi.org/10.1371/journal.pcbi.1000143.g002

The Geometry of Forgetting

We present a brief account of the geometry which underpins the results reported here, for a network with two input units and one output unit, as shown in Figure 3A. This network learns two associations A₁ = (X₁,d₁) and A₂ = (X₂,d₂).

Figure 3. Geometric example of how relearning A₂ increases the error on A₁.

(A) A network with two input units and one output unit, with connection weights ω_a and ω_b defines a weight vector w = (ω_a,ω_b). The network learns two associations A₁ and A₂. For example, A₁ is the mapping from input vector x₁ = (x₁₁,x₁₂) to desired output value d₁, and learning A₁ consists of adjusting w until the network output y₁ = w·x₁ equals d₁. (B) For a given association A₂ = (X₂,d₂), the corresponding constraint line in the space defined by (ω_a,ω_b) is L₂. Irrespective of the precise value of the target output value d₁ in association A₁, if d₁ is distributed isotropically then +d₁ is as probable as −d₁. When averaged over +d₁ and −d₁, the change δ in error on A₁ induced by relearning A₂ can be shown to be −(1−r)²e², where w₁^± = rw₀^±. Since this is less than zero, the expected change E[δ|r]<0. (Figure 3A redrawn from [12]).

https://doi.org/10.1371/journal.pcbi.1000143.g003

Figure 3B provides a geometric example of how relearning A₂ increases the error on A₁. After learning A₁ and A₂, w = w₀. The effects of forgetting and relearning can be seen by ignoring the ± superscripts and subscripts for now. After partial forgetting, w = w₁, and performance error E_pre = p². Relearning A₂ yields w₂, the orthogonal projection of w₁ on to L₂, and performance error is E_post = q². FLL occurs if δ = E_pre−E_post>0, or equivalently if p²−q²>0 (see [12], Appendices A–C for proofs). Forgetting here consists of reducing w₀ by a factor r<1, so that w₁ = rw₀.

The plus and minus signs in Figure 3B refer to two versions A₁⁺ and A₁⁻ of association A₁, in which X₁ is the same and the target d₁ has the same magnitude but opposite signs: d₁⁺ = +d₁ and d₁⁻ = −d₁.

We now find the expected change in error induced by relearning a given association A₂. After learning followed by forgetting, the change in error on after relearning A₂ is . After learning followed by forgetting, the change in error on after relearning A₂ is . Using similar triangles in Figure 3B,(9)(10)Therefore, the total change in error on and induced by relearning A₂ (on different occasions) is(11)(12)(13)Irrespective of the precise value of the target output value d₁ in A₁, if the distribution of d₁ is isotropic then +d₁ is as probable as −d₁. If the total change in error for two instances ( and ) of A₁ is −2(1−r)²e² then the expected change (conditional on e ) is E[δ|e] = −(1−r)²e². Therefore, if forgetting is induced by falling weight values, then the expected change in error E[δ]<0.
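The total change −2(1−r)²e² can be checked numerically for a two-weight network. In the sketch below the input vectors, targets, and value of r are arbitrary illustrative numbers, and e is taken to be the network output for x₁ at the point of L₂ closest to the origin; this reading of e is an assumption, since it is inferred from the algebra rather than from Figure 3B itself.

```python
import numpy as np

def delta_2d(x1, d1, x2, d2, r):
    """Change in error on A1 = (x1, d1) after falling and relearning A2."""
    w0 = np.linalg.solve(np.vstack([x1, x2]), np.array([d1, d2]))  # learn both
    w1 = r * w0                                                    # fall
    w2 = w1 + x2 * (d2 - x2 @ w1) / (x2 @ x2)    # project w1 onto the line L2
    E_pre = (x1 @ w1 - d1) ** 2
    E_post = (x1 @ w2 - d1) ** 2
    return E_pre - E_post

x1 = np.array([1.0, 0.0])        # unit-length input of A1 (illustrative values)
x2 = np.array([0.6, 0.8])        # unit-length input of A2
d1, d2, r = 0.9, 0.5, 0.6

# Sum the change in error over the two versions of A1 with targets +d1 and -d1.
total = delta_2d(x1, +d1, x2, d2, r) + delta_2d(x1, -d1, x2, d2, r)

m = d2 * x2 / (x2 @ x2)          # point of L2 closest to the origin
e = x1 @ m                       # assumed meaning of e in this sketch
print(total, -2 * (1 - r) ** 2 * e ** 2)
```

The terms that are linear in d₁ cancel between the two versions of A₁, which is why the total depends on d₁ only through its sign symmetry and is never positive.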

Discussion

We have proved and demonstrated that, in one of the simplest forms of neural network model, relearning part of a previously learned set of associations reduces performance on the remaining non-relearned associations. This result is in stark contrast to our previous results, which proved that relearning induced partial recovery of non-relearned items [12]. The only difference between these two studies is the way in which forgetting was induced.

An obvious physiological concomitant of Hebbian learning is long-term potentiation (LTP), which seems to underpin learned behaviors [14]. LTP can last for hours, days or even months, and usually follows an exponential decay [3]. However, some forms of LTP do not seem to decay [15], and have been shown to be stable for up to one year [16]. Such stability is remarkable, but from a statistical point of view, would almost certainly be accompanied by random fluctuations which would have a cumulative effect over time; and indeed, fluctuations are apparent in the stable LTP reported in [16]. Crucially, it is not known if the forgetting of learned behaviors is caused by decaying efficacy at many synapses, or by the cumulative effect of random fluctuations in stable LTP-induced synaptic efficacies. Here, decaying efficacy is analogous to weight values that fall toward zero in network models, whereas the cumulative effect of random fluctuations is analogous to the addition of random noise, or drifting, of weight values in network models.

Given a choice between forgetting via synaptic weights that fall towards zero and weights that drift isotropically, has evolution chosen drifting or falling? If all other things were equal then forgetting via synaptic drift would seem to be the obvious choice. This is because drifting ensures that relearning a subset of associations improves performance on other associations, whereas falling decreases performance. However, other things are rarely equal. The expected magnitude of weights increases with drifting but decreases with falling. (Consider a hypersphere centered on the origin, with radius ∥w₀∥ . Simple geometry shows that more than half of all directions emanating from w₀ yield a new weight vector w₁ which lies outside the hypersphere, and therefore E[∥w₁∥]>E[∥w₀∥] (assuming, for example, that all vectors w₁−w₀ have the same length).) This decrease in weight magnitudes effectively reduces neuronal firing rates, which reduces metabolic costs relative to costs incurred by synaptic drift. Synaptic drift therefore confers mnemonic benefits, but these benefits come at a metabolic price. Thus the increased fitness gained from the mnemonic benefits of synaptic drift must be offset against their metabolic costs. In essence, even free-lunch learning comes at a price.
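The geometric remark about weight magnitudes can be illustrated with a small Monte Carlo experiment. The sketch assumes, as in the parenthetical argument above, that every drift vector w₁−w₀ has the same length; the dimension, the drift length, the value of r, and the sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n, n_samples = 100, 2000
w0 = rng.standard_normal(n)
step = 0.5 * np.linalg.norm(w0)          # common length of every w1 - w0 (assumed)

# Drift: add a vector of fixed length in a uniformly random direction.
directions = rng.standard_normal((n_samples, n))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
drift_norms = np.linalg.norm(w0 + step * directions, axis=1)

# Fall: shrink w0 towards the origin by a factor r.
r = 0.5
fall_norm = np.linalg.norm(r * w0)

# Fraction of drifted vectors that end up outside the hypersphere of radius ||w0||.
frac_outside = np.mean(drift_norms > np.linalg.norm(w0))
print(frac_outside, drift_norms.mean() / np.linalg.norm(w0), fall_norm / np.linalg.norm(w0))
```

More than half of the drifted vectors should land outside the hypersphere, and their mean length should exceed ∥w₀∥, whereas falling with r<1 always shortens the weight vector.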

Methods

We proceed by deriving expressions for E_pre, E_post, and for δ = E_pre−E_post. We prove that if n₁+n₂≤n then the expected value of δ is negative. We then prove that the probability P(δ<0) that δ is negative approaches unity as n₁ approaches ∞.

Performance Errors

Given a c×n matrix X and a c -dimensional vector d, let L_X_,d be the affine subspaceof . If X and d are consistent (i.e., there is a w such that Xw = d) thenGiven weight vectors w₁ and w₂, a matrix X of input vectors, and a vector d of desired outputs, definewhere E_pre = E(X;w₁,d) and E_post = E(X;w₂,d). Let be any element of L_X_,d. Then (14)

If X_i has rank n_i then transposing the QR decomposition of (or, equivalently, using Gram–Schmidt orthonormalisation of the rows of X_i) givesfor unique n_i×n_i and n_i×n matrices T_i and Z_i with T_i lower triangular with positive diagonal elements, and . Simple calculation shows that, for any weight vector w, and are orthogonal. Since , it follows that the matrix represents the operator that projects orthogonally onto the image of . Because(15)the image of is contained in that of . As both these images have dimension n_i, they must be equal, and so represents the operator which projects orthogonally onto the image of .

Now suppose that X and d are consistent, where

Then, after the network has learned A₁ and A₂, the weight vector w₀ satisfies(16)(If, as below, n₁+n₂≤n, X₂ and d₂ are consistent, and (X₁,d₁) has a continuous distribution then Equation 16 holds with probability 1.)

Falling

We now assume that forgetting is induced by weight values “falling” towards the origin at zero, i.e., forgetting consists of shrinking the weight vector w₀ by a (possibly random) factor r towards the “dead state” 0. Thus the post-forgetting weight vector w₁ is given by

$$w_1 = r\,w_0 \tag{17}$$

and so the “forgetting vector” v = w₁−w₀ is

$$v = -(1-r)\,w_0. \tag{18}$$

The form of forgetting given by Equation 17 is very different from that investigated in [12], where v has an isotropic distribution and is independent of (X₁,d₁) and (X₂,d₂).

Let w₂ be the orthogonal projection of w₁ onto L₂. Then

Manipulation gives(19)and so(20)

Then Equations 14, 16, and 18–20 yield(21)
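The calculation leading to δ can be sketched as follows. The sketch assumes that X₂ has full row rank and that w₀ fits both subsets exactly (X₁w₀ = d₁ and X₂w₀ = d₂); the symbols X₂⁺ and m are introduced here for illustration and are not necessarily the notation of Equations 19–21.

```latex
\begin{align*}
w_2 &= w_1 + X_2^{+}(d_2 - X_2 w_1), \qquad X_2^{+} = X_2^{T}(X_2 X_2^{T})^{-1},\\
    &= r\,w_0 + (1-r)\,m, \qquad m = X_2^{+} d_2 \quad (\text{since } X_2 w_0 = d_2),\\
X_1 w_1 - d_1 &= -(1-r)\,d_1 \quad (\text{since } X_1 w_0 = d_1),\\
X_1 w_2 - d_1 &= (1-r)\,(X_1 m - d_1),\\
\delta &= \|X_1 w_1 - d_1\|^2 - \|X_1 w_2 - d_1\|^2
        = (1-r)^2\left(2\,d_1^{T} X_1 m - \|X_1 m\|^2\right).
\end{align*}
```

The cross term 2d₁ᵀX₁m changes sign when X₁ is replaced by −X₁, so if (X₁,d₁) is isotropic and independent of (X₂,d₂,r), that term has zero conditional mean (when the mean exists). What remains is −(1−r)²E[∥X₁m∥²], which is never positive and vanishes only if r = 1 or m = 0.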

The Case of Isotropic Random (X₁,d₁)

In this section we assume that the distribution of (X₁,d₁) is isotropic, i.e., that (UX₁V,Ud₁) has the same distribution as (X₁,d₁) for all orthogonal n₁×n₁ matrices U and all orthogonal n×n matrices V. Then taking the conditional expectation of Equation 21 for given X₂, d₂, and r gives the following theorem.

Theorem 1

If

1. n₁+n₂≤n,
2. X₂ and d₂ are consistent,
3. the distribution of (X₁,d₁) is continuous and isotropic,
4. X₁, d₁, and (X₂,d₂,r) are independent,

then(22)where x is any column of .

Corollary 1

If 1.-3. of Theorem 1 hold then

$$E[\delta(w_1, w_2; X_1, d_1) \mid X_2, d_2, r] \le 0, \tag{23}$$

with equality if and only if either r = 1 or d₂ = 0.

Corollary 1 says that (apart from trivial exceptions) the expected amount of FLL is negative.

To obtain Theorem 2, it is useful to have some moments of isotropic distributions. Let x be isotropically distributed on . Then Equations 9.6.1 and 9.6.2 of Mardia and Jupp (2000), together with some algebraic manipulation, yield(24)(25)as in Equations A.14 and A.15 of [12].

The other tool used in proving Theorem 2 is the formula

$$\operatorname{var}(Y \mid Z) = E[\operatorname{var}(Y \mid X, Z) \mid Z] + \operatorname{var}(E[Y \mid X, Z] \mid Z) \tag{26}$$

for any random variables X,Y,Z for which these quantities exist. Equation 26 is an application to the conditional distribution of Y|Z of the standard conditional variance formula that is given in Equation 2b.3.6 on page 97 of [17].

Taking the expectation and variance of Equation 21 as only d₁ varies and using Equation 24 gives(27)(28)

Taking the expectation of Equation 28 as only X₁ varies and using Equation 24 gives(29)

We now suppose that(30)

Then taking the variance of Equation 27 as only X₁ varies and using Equation 25 gives(31)

Adding Equations 29 and 31 and using Equation 26 yields(32)

To obtain an upper bound on the conditional probability of FLL (i.e., on P(δ≥0|X₂,d₂,r)), we use Chebyshev's inequality, which states that, for any random variable Y and any positive value of t,

$$P(|Y - E[Y]| \ge t) \le \frac{\operatorname{var}(Y)}{t^2}.$$

Applying Chebyshev's inequality to the conditional distribution of δ(w₁,w₂,X₁,d₁) given (X₂,d₂,r), taking t = E[δ(w₁,w₂;X₁,d₁)|X₂,d₂,r], and noting that (by Equation 23) t≤0, we obtain(33)
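The step from Chebyshev's inequality to a bound on the probability of FLL can be spelled out as follows. Here μ and σ² are shorthand introduced for this sketch, denoting the conditional mean and variance of δ given (X₂,d₂,r), and it is assumed that μ<0 (the non-trivial case of Equation 23), so that Chebyshev's inequality is applied with the positive value |μ|.

```latex
\begin{align*}
P(\delta \ge 0 \mid X_2, d_2, r)
  &= P(\delta - \mu \ge -\mu \mid X_2, d_2, r)\\
  &\le P(|\delta - \mu| \ge |\mu| \mid X_2, d_2, r)\\
  &\le \frac{\sigma^2}{\mu^2}.
\end{align*}
```

The bound is therefore small whenever the conditional standard deviation of δ is small relative to the magnitude of its (negative) conditional mean.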

Substituting Equations 22 and 32 into Equation 33 gives(34)where

For any positive-definite symmetric matrix A and vector x, diagonalization of A, together with the fact that x+1/x≥2 for positive x, yields(35)
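One inequality that follows from exactly these ingredients is (xᵀAx)(xᵀA⁻¹x) ≥ ∥x∥⁴. Whether this is the precise form of Equation 35 is an assumption; the sketch below shows only how diagonalization and the bound t+1/t ≥ 2 combine, with Q, Λ, λ_i, and y as illustrative notation.

```latex
\begin{align*}
A &= Q \Lambda Q^{T}, \quad \Lambda = \operatorname{diag}(\lambda_1,\dots,\lambda_n),\ \lambda_i > 0,
  \qquad y = Q^{T} x,\\
(x^{T} A x)(x^{T} A^{-1} x)
  &= \Big(\sum_i \lambda_i y_i^2\Big)\Big(\sum_j \lambda_j^{-1} y_j^2\Big)
   = \frac{1}{2}\sum_{i,j}\Big(\frac{\lambda_i}{\lambda_j} + \frac{\lambda_j}{\lambda_i}\Big) y_i^2 y_j^2\\
  &\ge \sum_{i,j} y_i^2 y_j^2 = \|y\|^4 = \|x\|^4 .
\end{align*}
```

Equality holds when x lies in a single eigenspace of A, for example when A is a multiple of the identity.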

Combining Equations 34 and 35 with the fact that gives(36)

Taking the expectation of Equation 36 over X₂ yields(37)where x and are any columns of and , respectively.

Taking the expectation of Equation 37 over d₂ and r yields the following theorem.

Theorem 2

If (a) conditions 1.-4. of Theorem 1 hold, (b) the columns of are distributed independently, (c) X₂, d₂, and r are independent, (d) the distribution of (X₂,d₂) is isotropic, and (e) E[∥d₂∥⁻²] is finite then(38)where x and are any columns of and , respectively, and

Corollary 2

If the conditions of Theorem 2 hold andwhere x and are any columns of and , respectively, then

Thusprovided that n₁/n and n₂/n are bounded away from zero.

Acknowledgments

Thanks to David Sterratt for asking, “What would happen to free-lunch learning if weights decayed?” and to three anonymous reviewers for their detailed comments.

Author Contributions

Conceived and designed the experiments: JS. Performed the experiments: JS. Analyzed the data: JS. Contributed reagents/materials/analysis tools: JS. Wrote the paper: JS PEJ. Mathematical proofs: PEJ.

References

1. Tanzi E (1893) I fatti e le induzioni nell'odierna istologia del sistema nervoso. Riv Sper Freniatr Med Leg 19: 419–472.
2. Hebb D (1949) The Organization of Behavior: A Neuropsychological Theory. New York: Wiley.
3. Abraham W (2003) How long will long-term potentiation last? Philos Trans R Soc Lond B Biol Sci 358: 735–744.
4. Stone J, Hunkin N, Hornby A (2001) Predicting spontaneous recovery of memory. Nature 414: 167–168.
5. Coltheart M, Byng S (1989) A treatment for surface dyslexia. In: Seron X, editor. Cognitive Approaches in Neuropsychological Rehabilitation. London: Lawrence Erlbaum Associates.
6. Weekes B, Coltheart M (1996) Surface dyslexia and surface dysgraphia: treatment studies and their theoretical implications. Cogn Neuropsychol 13: 277–315.
7. Atkins P (2001) What happens when we relearn part of what we previously knew? Predictions and constraints for models of long-term memory. Psychol Res 65: 202–215.
8. Roediger H III (1973) Inhibition in recall from cueing with recall targets. J Verbal Learn Verbal Behav 12: 644–657.
9. Serra M, Nairne J (2000) Part-set cuing of order information: implications for associative theories. Mem Cognit 28: 847–855.
10. Hinton G, Plaut D (1987) Using fast weights to deblur old memories. Proceedings Ninth Annual Conference of the Cognitive Science Society. pp. 177–186.
11. Atkins P, Murre J (1998) Recovery of unrehearsed items in connectionist models. Connect Sci 10: 99–119.
12. Stone J, Jupp P (2007) Free-lunch learning: modelling spontaneous recovery of memory. Neural Comput 19: 194–217.
13. Stone J (2007) Distributed representations accelerate evolution of adaptive behaviours. PLoS Comput Biol 3: e147.
14. Whitlock J, Heynen A, Shuler M, Bear M (2006) Learning induces long-term potentiation in the hippocampus. Science 313: 1093–1097.
15. Staubli U, Lynch G (1987) Stable hippocampal long-term potentiation elicited by theta pattern stimulation. Brain Res 435: 227–234.
16. Abraham WC, Logan B, Greenwood JM, Dragunow M (2002) Induction and experience-dependent consolidation of stable long-term potentiation lasting months in the hippocampus. J Neurosci 22: 9626–9634.
17. Rao C (1973) Linear Statistical Inference and its Applications. 2nd edition. New York: Wiley.
