1 Introduction

One line of experimental evidence suggests that synapses may occupy only a very limited number of discrete states of synaptic strength (Petersen et al. 1998; Montgomery and Madison 2002, 2004; O’Connor et al. 2005a, b; Bartol et al. 2015), or may change their strengths via discrete, jump-like processes (Yasuda et al. 2003; Bagal et al. 2005; Sobczyk and Svoboda 2007). Discrete state synapses overcome the catastrophic forgetting of the Hopfield model (Hopfield 1982) in associative memory tasks, turning memory systems into so-called palimpsests, which learn new memories by forgetting old ones (Nadal et al. 1986; Parisi 1986). Unfortunately, memory lifetimes in the simplest such models are rather limited, growing only logarithmically with the number of synapses (Tsodyks 1990; Amit and Fusi 1994; see also Leibold and Kempter 2006; Barrett and van Rossum 2008; Huang and Amit 2010). Memory lifetimes may be extended by considering either sparse coding at the population level (Tsodyks and Feigel’man 1988) or complex models of synaptic plasticity in which synapses can express metaplasticity (changes in internal states) without necessarily expressing plasticity (changes in strength) (Fusi et al. 2005; Leibold and Kempter 2008; Elliott and Lagogiannis 2012; Lahiri and Ganguli 2013). Two previous studies have examined complex models of synaptic plasticity operating in concert with sparse coding (Leibold and Kempter 2008; Rubin and Fusi 2007). For a discussion of the possible roles of the persistence and transience of memories and the synaptic mechanisms underlying synaptic stability, see, for example, Richards and Frankland (2017), and Rao-Ruiz et al. (2021).

We have proposed integrate-and-express models of synaptic plasticity in which synapses act as low-pass filters in order to control fluctuations in developmental patterns of synaptic connectivity (Elliott 2008; Elliott and Lagogiannis 2009). We have also applied these complex models of synaptic plasticity to memory formation, retention and longevity with discrete synapses (Elliott and Lagogiannis 2012), finding that they outperform cascade models (Fusi et al. 2005) in most biologically relevant regions of parameter space (Elliott 2016b). In this paper, we consider the role of sparse coding in the memory dynamics of a filter-based model. For comparison, we also consider the cascade model (Fusi et al. 2005), the serial synapse model (Leibold and Kempter 2008; Rubin and Fusi 2007) and a model of simple synapses (Tsodyks 1990) using our protocols.

Our paper is organised as follows. In Sect. 2, we present our general approach by describing the two memory storage protocols that we study, considering two different definitions of memory lifetimes, and obtaining general, model-independent results. Then, in Sect. 3, we consider both simple and complex models of synaptic plasticity, obtaining the analytical results required to study memory lifetimes in detail. We compare and contrast results for memory lifetimes in simple and complex models in Sect. 4. Finally, in Sect. 5, we briefly discuss our results.

2 General approach and formulation

We provide a convenient list of the most commonly used mathematical symbols and their meanings, excluding those that appear in the appendices, in Table 1.

Table 1 List of frequently used mathematical symbols and their meanings (excluding those in Appendix B)

2.1 Memories and memory lifetimes

We consider a population of P neurons forming a memory system, perhaps performing association or auto-association tasks. Let each neuron receive N synaptic connections from N other neurons that are randomly selected from the entire population. Fully recurrent connectivity would imply that \(N = P - 1\) (excluding self-connections) but in general \(N \ll P\). Other than the requirement that \(N < P\), N may be regarded as mathematically independent of P. Memories are stored sequentially, one after the other, by this memory system. We take them to be stored at times \(t \ge 0\) s governed by a Poisson process of rate r Hz. This continuous time approach is more realistic than a discrete time approach in which memories are stored at uniformly spaced time steps. Due to ongoing synaptic plasticity driven by the storage of later memories, the synaptic patterns that embody earlier memories may be degraded, so that the fidelity of recall of earlier memories may fall over time, ultimately falling to an equilibrium or background level of complete amnesia. It is typical in these scenarios to track the fidelity of recall of the first memory as subsequent memories are stored. This first memory is taken to be stored at time \(t=0\) s on the background equilibrium probability distribution of synaptic strengths.

In previous work, we have focused on a single neuron, or perceptron, in such a system and have examined its recall of stored memories. Here, in a sparse population coding context, we must consider the collective dynamics of the entire population of neurons, but these collective dynamics are nevertheless driven by synaptic processes occurring at the level of single perceptrons in the system. Considering, then, a single perceptron in this population, let its N synapses have strengths \(S_i(t) \in \{-1, +1\}\), \(i=1, \ldots , N\), at time \(t \ge 0\) s. These two strength states should be thought of as low and high rather than inhibitory and excitatory. As memories are presented to the system for storage, the perceptron is exposed to synaptic inputs characterised by the N-dimensional vectors \(\underline{\xi }^\alpha \), \(\alpha = 0, 1, 2, \ldots \), where \(\alpha \) indexes the memories. The component \(\xi _i^\alpha \) represents the input through synapse i during the presentation of memory \(\alpha \), and for simplicity we assume that these components are independent between synapses and across memories.

In response to each of these memory vectors, the perceptron must generate the correct activation or output. With inputs \(x_i\) through its N synapses, the perceptron’s activation is defined as usual by

$$\begin{aligned} h_{\underline{x}}(t) = \frac{1}{N} \sum _{i=1}^N x_i \, S_i(t). \end{aligned}$$
(1)
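For concreteness, Eq. (1) is simply the input-weighted average of the binary synaptic strengths. A minimal NumPy sketch (variable names are ours, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)

N = 1000                              # number of synapses
S = rng.choice([-1.0, +1.0], size=N)  # binary strengths S_i(t)
x = rng.choice([-1.0, +1.0], size=N)  # an input vector through the N synapses

h = np.dot(x, S) / N                  # Eq. (1): the perceptron's activation
```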

The perceptron’s output is some possibly nonlinear function of its activation, where this output can correspond to spontaneous activity under conditions of no (or spontaneous) input. We track the fidelity of recall of the first memory \(\underline{\xi }^0\) by examining the perceptron’s activation upon re-presentation (but not re-storage) of this memory at later times \(t > 0\) s. We refer to \(h(t) \equiv h_{\underline{\xi }^0}(t)\) as the tracked memory signal or just the memory signal. The dynamics of h(t) will determine the lifetime of memory \(\underline{\xi }^0\), at least as far as this single perceptron’s capacity to generate the correct output upon re-presentation of \(\underline{\xi }^0\) is concerned. Of course, we are not interested in the lifetime of any particular tracked memory \(\underline{\xi }^0\) stored on any particular pattern of synaptic connectivity and subject to any particular sequence of subsequent, non-tracked memories \(\underline{\xi }^\alpha \), \(\alpha > 0\), stored at any particular set of Poisson-distributed times \(0< t_1< t_2< t_3 < \cdots \). Rather, we are interested only in the lifetime of a typical tracked memory subject to a typical sequence of later memories. Thus, we consider only the statistical properties of h(t) when suitably averaged over all memories.

Memory lifetimes may be defined in a variety of ways using these statistical properties. The simplest definition is to consider the mean and variance of h(t)

$$\begin{aligned} \mu (t)&= \mathsf {E}[ h(t) ], \end{aligned}$$
(2a)
$$\begin{aligned} \sigma (t)^2&= \mathsf {Var}[ h(t) ], \end{aligned}$$
(2b)

define the signal-to-noise ratio (SNR) as:

$$\begin{aligned} \mathrm {SNR}(t) = \left[ \mu (t) - \mu (\infty ) \right] /\sigma (t), \end{aligned}$$
(3)

and then define the memory lifetime as that value of t, call it \(\tau _{\mathrm {snr}}\), that is the (largest) finite, non-negative solution of the equation \(\mathrm {SNR}(\tau _{\mathrm {snr}}) = 1\) when this solution exists; otherwise, we set \(\tau _{\mathrm {snr}} = 0\) s (Tsodyks 1990). This is the last time at which \(\mu (t)\) is distinguishable from its equilibrium value \(\mu (\infty )\) at the level of one standard deviation. Although an “ideal observer” approach to defining memory lifetimes has also been considered (Fusi et al. 2005; Lahiri and Ganguli 2013), it is essentially equivalent to the SNR approach (Elliott 2016b).
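Operationally, \(\tau _{\mathrm {snr}}\) is a one-dimensional root of \(\mathrm {SNR}(t) = 1\). A minimal sketch, assuming a hypothetical, monotonically decaying signal purely for illustration:

```python
import numpy as np
from scipy.optimize import brentq

# Hypothetical exponentially decaying signal, for illustration only.
mu = lambda t: 0.1 * np.exp(-0.01 * t)     # mu(t), with mu(inf) = 0
sigma = lambda t: 0.02                     # sigma(t), here constant

def snr(t):
    return (mu(t) - 0.0) / sigma(t)        # Eq. (3) with mu(inf) = 0

# tau_snr: largest finite, non-negative root of SNR(t) = 1, else 0 s.
# brentq requires a bracketing interval; [0, 1e6] suffices for this example.
tau_snr = brentq(lambda t: snr(t) - 1.0, 0.0, 1e6) if snr(0.0) > 1.0 else 0.0
```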

The activation h(t) provides a direct read-out of the perceptron’s response to the re-presentation of \(\underline{\xi }^0\) at later times, and would correspond to a neuron’s membrane potential in a more realistic, integrate-and-fire model. By focusing on this read-out of the perceptron’s state, we are naturally led to consider the first passage time (FPT) for the perceptron’s activation to fall below firing threshold, and thus to consider the mean first passage time (MFPT) for this process, which is the mean taken over all tracked and non-tracked memories (Elliott 2014). We may then define an alternative memory lifetime, call it \(\tau _{\mathrm {mfpt}}\), as this MFPT for the perceptron’s activation in response to re-presentation of a typical tracked memory to fall below firing threshold. We have extensively discussed and contrasted the SNR and FPT approaches to defining memory lifetimes elsewhere (Elliott 2014, 2016a, 2017a). In essence, SNR memory lifetimes are only valid in asymptotic, typically large N regimes, while FPT memory lifetimes are valid in all regimes. SNR lifetimes must therefore be interpreted with caution.

To compute FPT lifetimes, we require \(\mathsf {Prob}[ h_{\alpha +1} | h_\alpha ]\), the transition probability that the perceptron’s activation (in response to re-presentation of the tracked memory) is \(h_{\alpha +1}\) immediately after the storage of the average non-tracked memory \(\underline{\xi }^{\alpha +1}\), given that its activation is \(h_\alpha \) immediately before the memory’s storage. This transition probability is most easily computed in simple models of synaptic plasticity, for which it is independent of the memory storage step (Elliott 2014, 2017a, 2019). This independence arises because simple synapses with only two strength states are “stateless” (Elliott 2020), having no internal states and not enough strength states to carry information between consecutive memory storage steps. In this case, all the probabilities \(\mathsf {Prob}[ h_{\alpha +1} | h_\alpha ]\) over all the possible, discrete values of \(h_\alpha \) and \(h_{\alpha +1}\) define the elements of a transition matrix in the perceptron’s activation between memory storage steps that is independent of the non-tracked memory storage step \(\alpha + 1\) (\(\alpha \ge 0\)). We can then drop the index \(\alpha \) and consider general elements \(\mathsf {Prob}[ \, h \, | \, h' \, ]\) between any two possible values of the perceptron activation, h and \(h'\). We will therefore examine FPT lifetimes only for simple synapses, and SNR memory lifetimes for both simple and complex synapses, but with the understanding that SNR results must be interpreted cautiously. With the transition probabilities \(\mathsf {Prob}[ \, h \, | \, h' \, ]\) being independent of the memory storage step, and with the storage of the definite tracked memory \(\underline{\xi }^0\) inducing the definite activation \(h_0\) immediately after its storage, the FPT lifetime of the memory \(\underline{\xi }^0\) is the solution of the equation

$$\begin{aligned} \tau _{\mathrm {mfpt}} ( h_0 ) = \frac{1}{r} + \sum _{h > \vartheta } \tau _{\mathrm {mfpt}} ( h ) \, \mathsf {Prob}[ \, h \, | \, h_0 \,], \end{aligned}$$
(4)

where transitions to activations below the firing threshold \(\vartheta \) are disallowed (Elliott 2014; see van Kampen 1992, for a general discussion). Equation (4) generalises in the obvious way to an integral equation when h can be regarded as a continuous rather than discrete variable. Solving Eq. (4) for \(\tau _{\mathrm {mfpt}}(h_0)\) for all values of \(h_0 > \vartheta \) entails solving a linear system involving the perceptron activation transition matrix. We therefore refer to Eq. (4), for simplicity but somewhat inaccurately, as a matrix equation, to distinguish it from its integral equation equivalent in the continuum limit. This matrix or integral equation (MIE) approach to FPTs is exact. We may also consider an approximate, Fokker–Planck equation (FPE) approach by computing the jump moments induced by \(\mathsf {Prob}[ \, h \, | \, h' \, ]\). Because memories are stored as a Poisson process, the jump moments are simply

$$\begin{aligned} A(h')&= \mathsf {E}[ ( h - h' )^1 \, | \, h' \, ], \end{aligned}$$
(5a)
$$\begin{aligned} B(h')&= \mathsf {E}[ ( h - h' )^2 \, | \, h' \, ], \end{aligned}$$
(5b)

where the expectation values are calculated with respect to \(\mathsf {Prob}[ \, h \, | \, h' \, ]\). Then, standard methods (Elliott 2014; see van Kampen 1992, in general) give the MFPT as the solution of the equation:

$$\begin{aligned} - \frac{1}{r} = A(h_0) \frac{{d}}{{d} h_0} \tau _{\mathrm {mfpt}}(h_0) + \frac{1}{2} B(h_0) \frac{{d}^2}{{d} h_0^2} \tau _{\mathrm {mfpt}}(h_0), \end{aligned}$$
(6)

subject to the boundary condition \(r \tau _{\mathrm {mfpt}}(\vartheta ) = 0\). Equations similar to Eqs. (4) and (6) give the higher-order FPT moments (Elliott 2019). Given \(\tau _{\mathrm {mfpt}}( h_0 )\), we obtain \(\tau _{\mathrm {mfpt}}\) by averaging over the distribution of \(h_0\) (for values of \(h_0 > \vartheta \)), corresponding to averaging over the tracked memory \(\underline{\xi }^0\), i.e. \(\tau _{\mathrm {mfpt}} = \langle \tau _{\mathrm {mfpt}}( h_0 ) \rangle _{h_0 > \vartheta }\).
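Because Eq. (4) couples the MFPTs linearly across the supra-threshold activation values, it amounts to a single linear solve. A sketch, assuming the transition matrix has already been restricted to the supra-threshold activations (so that columns may sum to less than one):

```python
import numpy as np

def mfpt(Q, r):
    """Solve Eq. (4) for tau_mfpt(h_0) over supra-threshold activations.

    Q[i, j] = Prob[h_i | h_j], restricted to activations above the firing
    threshold; transitions to sub-threshold activations are absorbing and
    so are omitted from Q.
    """
    n = Q.shape[0]
    # Eq. (4): tau_j = 1/r + sum_i Q[i, j] tau_i, i.e. (I - Q^T) tau = (1/r) 1.
    return np.linalg.solve(np.eye(n) - Q.T, np.full(n, 1.0 / r))
```

Averaging the returned vector over the distribution of \(h_0\) (restricted to \(h_0 > \vartheta \)) then gives \(\tau _{\mathrm {mfpt}}\).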

2.2 Hebb protocol

We adopt and adapt the memory storage protocol employed by Leibold and Kempter (2008). Their memory system performs an association task. Within the population of P neurons, a sub-population of “cue” neurons is required to activate a sub-population of “target” neurons. Synapses from cue to target neurons experience potentiating induction signals during memory storage, while those from target to cue experience depressing induction signals; all other synapses do not experience plasticity induction signals. Although Leibold and Kempter (2008) do allow for the possibility of some overlap between cue and target sub-populations, this will not be relevant here. The storage of different memories involves different cue and target sub-populations, so that the entire population of P neurons will be involved in storing many memories over time. If cue and target sub-populations are of equal size, as we assume, then potentiation and depression processes are equally balanced on average. This assumption stands in lieu of realistic neuron models, in which we expect (Elliott 2016a) synaptic plasticity to be dynamically regulated to move to stable dynamical fixed points in which such balancing is achieved automatically (Bienenstock et al. 1982; Burkitt et al. 2004; Appleby and Elliott 2006).

While Leibold and Kempter (2008) consider activities \(\xi _i^\alpha \in \{ 0, 1 \}\), corresponding to inactive (\(\xi _i^\alpha = 0\)) and active (\(\xi _i^\alpha = 1\)) input neurons, we will consider the more general case of \(\xi _i^\alpha \in \{ \zeta , 1 \}\), with \(0 \le \zeta < 1\), where \(\xi _i^\alpha = \zeta \) represents a spontaneous, non-evoked, or background level of activity for an input neuron that is in neither cue nor target sub-populations, while \(\xi _i^\alpha = 1\) represents evoked activity from a cue or target input. We often refer below to active and inactive inputs or neurons, with the understanding that we mean evoked activity and spontaneous activity, respectively. Because synaptic plasticity occurs only between cue and target neurons, synapses between a pre- or postsynaptic neuron that is only spontaneously active do not undergo synaptic plasticity. This accords with our expectations from known physiology: protocols for long-term potentiation (LTP; Bliss and Lømo 1973) and long-term depression (LTD; Lynch et al. 1977) require sustained bouts of evoked electrical activity rather than just spontaneous levels of activity. On a broadly BCM view of synaptic plasticity (Bienenstock et al. 1982), we would expect two thresholds for synaptic plasticity: as activity levels ramp up from spontaneous to weak to strong tetanisation, plasticity switches from none to LTD to LTP. Since synaptic plasticity can only occur between pairs of active, synaptically coupled neurons in this scenario, we refer to it as the Hebb protocol: Hebbian synaptic plasticity is typically understood to mean activity-dependent, bidirectional synaptic plasticity between active pre- and postsynaptic neurons. Although spontaneous activity has by assumption no impact on synaptic plasticity here, it nevertheless has a direct impact on h(t).

For a particular perceptron, let the probability that it is active during the storage of any particular memory be g. Since the perceptron could be part of either cue or target sub-populations, the probability that it is either cue or target during the storage of a memory is just \({\textstyle \frac{1}{2}}g\). The probability that any one of its synaptic inputs is active during memory storage is also just g. However, for the purposes of clarity it is convenient to distinguish between these two probabilities, so we denote the probability that an input is active as f (\(\equiv g\)). In this way, the appearance of a factor of g indicates a global, postsynaptic factor due to the perceptron, or postsynaptic cell, being in the cue or target population, while a factor of f indicates a local, presynaptic factor due to an input being in the cue or target population. The probability g, or f, controls the sparseness of the memory representation in this memory system. Considering just a single perceptron, if it is neither cue nor target, then none of its synapses can experience plasticity induction signals. If it is a cue, then only those inputs that correspond to target cells (if any) experience plasticity induction signals, and specifically depressing signals. If it is a target, then similarly only cue inputs experience induction signals, and so only potentiating signals. Without loss of generality, we may therefore just assume that during memory storage, an active perceptron’s active inputs are either all cue or all target neurons. This simplifying assumption effectively doubles on average the rate of plasticity induction signals experienced by synapses compared to the scenario in which the perceptron’s active inputs could represent a combination of cue and target neurons. We could therefore just scale f accordingly.

We summarise the Hebb protocol in Fig. 1, which schematically illustrates a sample of the population of pairs of pre- and postsynaptic neurons, showing all possible combinations of presynaptic activities and postsynaptic roles with their respective probabilities, together with the direction of synaptic plasticity induced by them.

Fig. 1

Schematic illustration of the Hebb protocol for memory storage. Six pairs of synaptically coupled neurons are shown. Each cell body is represented by a triangle, with the value (\(\zeta \) or 1) inside the triangle indicating the neuron’s activity during memory storage. A neuron’s axon is denoted by a directed line, while two of its dendrites are denoted by the dashed lines. Synaptic coupling is indicated by a small black blob where an axon terminates on a dendrite, with the symbol to the right of the blob indicating the direction of induced synaptic plasticity during memory storage (“\(\uparrow \)” indicates potentiation, “\(\downarrow \)” depression, and “\(\times \)” no change). The labels “C”, “N” or “T” attached to a postsynaptic cell body indicate that the neuron is a cue cell, neither a cue cell nor a target cell, or a target cell, respectively, in the population. Probabilities of presynaptic activity (f or \(1-f\)) are indicated, as are the joint probabilities of postsynaptic activity and specific role (\(\frac{1}{2}g\) or \(1-g\)). The fact that an active presynaptic neuron synapsing on a cue or target cell always experiences the induction of depression or potentiation, respectively, reflects the simplifying assumption discussed in the main text

To assess memory lifetimes under this protocol, we may track the ability of the cue sub-population to successfully evoke activity in the target sub-population. Considering a single perceptron in the target sub-population, we may obtain general expressions for \(\mu (t)\) and \(\sigma (t)^2\) in Eq. (2), where these expressions are independent of any particular model of synaptic plasticity. Because \(h(t) = \frac{1}{N} \sum _{i=1}^N \xi _i^0 S_i(t)\) and similarly \(h(t)^2 = \frac{1}{N^2} \sum _{i,j=1}^N \xi _i^0 \xi _j^0 S_i(t) S_j(t)\), their expectation values lead to

$$\begin{aligned} \mu (t)&= f \, \mathsf {E}\left[ S_1(t) \, | \, + \, \right] + (1 - f) \, \zeta \, \overbrace{ \mathsf {E}\left[ S_1(t) \, | \, {\times } \, \right] }^{\equiv 0}, \end{aligned}$$
(7a)
$$\begin{aligned} \sigma (t)^2&= \frac{f + (1-f) \zeta ^2 - \mu (t)^2}{N} \nonumber \\&\quad + \frac{N-1}{N} \Big \{ f^2 \, \mathsf {E}\left[ S_1(t) S_2(t) \, | \, {+} {+} \, \right] \nonumber \\&\quad + 2 f (1-f) \, \zeta \, \mathsf {E}\left[ S_1(t) S_2(t) \, | \, {+} {\times } \, \right] \nonumber \\&\quad + (1-f)^2 \zeta ^2 \, \mathsf {E}\left[ S_1(t) S_2(t) \, | \, {\times } {\times } \, \right] - \mu (t)^2 \Big \}, \end{aligned}$$
(7b)

where we could pick any synapse i in Eq. (7a) and any distinct pair of synapses i and j in Eq. (7b) but we restrict without loss of generality to \(i=1\) and \(j=2\). In these equations, we condition on whether a synapse has experienced a potentiating induction signal (“\(+\)”) with probability f or not (“\(\times \)”) with probability \(1-f\), during the storage of \(\underline{\xi }^0\). For the models of synaptic plasticity that we consider below, the (marginal) equilibrium probability distribution of any single synapse’s strength is uniform, or \(\mathsf {Prob}[ S_i (\infty ) = \pm 1 ] = {\textstyle \frac{1}{2}}\), so that if a synapse does not experience a plasticity induction signal during the storage of \(\underline{\xi }^0\), then \(\mathsf {E}\left[ S_i(t) \, | \, {\times } \, \right] \equiv 0\) at \(t=0\) s and this remains true for all times \(t \ge 0\) s when potentiation and depression processes are treated symmetrically, as indicated in Eq. (7a). However, for the pairwise correlations in Eq. (7b) that condition on one or both synapses not having experienced an induction signal, the expectation values do not vanish under the Hebb protocol. This is because of the higher-order equilibrium correlational structure induced by the fact that it is impossible for some of the synapses of an active neuron to experience potentiating induction signals while others experience depressing induction signals during the storage of the same memory under the Hebb protocol.

We may obtain general expressions for the expectation values in Eq. (7) by writing down the transition processes that govern changes in a single synapse’s strength or simultaneous changes in a pair of synapses’ strengths. Let each synapse have s possible internal states for each of its two possible strengths \(\pm 1\), so that the possible state of a synapse is described by a 2s-dimensional vector, with the internal states for strength \(-1\) (respectively, \(+1\)) corresponding to the first (respectively, last) s components. Given the stochastic nature of the plasticity induction signals, this vector defines a joint probability distribution for a synapse’s combined strength and internal state. Let the transition matrix \(\mathbb {M}_+\) implement the definite change in a synapse’s state in response to a potentiating induction signal, and \(\mathbb {M}_-\) that for a depressing induction signal. We then determine the transition matrix governing the change in a single synapse’s state in response to the storage of a typical non-tracked memory by conditioning on all possible combinations of presynaptic activity and postsynaptic role. Defining \(\mathbb {K}_\pm = (1 - f) \mathbb {I} + f \mathbb {M}_\pm \), where \(\mathbb {I}\) is the identity matrix, this transition matrix is

$$\begin{aligned} \mathbb {T}_1 = (1 - g) \, \mathbb {I} + {\textstyle \frac{1}{2}}\, g \, \mathbb {K}_+ + {\textstyle \frac{1}{2}}\, g \, \mathbb {K}_-, \end{aligned}$$
(8a)

or just \(\mathbb {T}_1 = (1 - g) \, \mathbb {I} + g \, \mathbb {K}\), where \(\mathbb {K} = (1 - f) \mathbb {I} + f \mathbb {M}\) with \(\mathbb {M} = {\textstyle \frac{1}{2}}( \mathbb {M}_+ + \mathbb {M}_- )\). The three terms in Eq. (8a) arise from conditioning on the three possible perceptron roles in memory storage (determined by the global factor g), while the two terms in each of \(\mathbb {K}_\pm \) arise from conditioning on the two possible levels of presynaptic activity (determined by the local factor f). Similarly, the transition operator that governs simultaneous changes in pairs of synapses’ states during typical non-tracked memory storage is

$$\begin{aligned} \mathbb {T}_2 = (1 - g) \, \mathbb {I} \otimes \mathbb {I} + {\textstyle \frac{1}{2}}\, g \, \mathbb {K}_+ \otimes \mathbb {K}_+ + {\textstyle \frac{1}{2}}\, g \, \mathbb {K}_- \otimes \mathbb {K}_-, \end{aligned}$$
(8b)

with the generalisation to \(\mathbb {T}_n\) for any number of synapses n being clear. The (marginal) equilibrium probability distribution of a single synapse’s state, denoted by \(\underline{A}_1\), is the (normalised) eigenvector of \(\mathbb {T}_1\) with unit eigenvalue, which is also just the unit eigenvector of \(\mathbb {M}\). That for any pair of synapses, \(\underline{A}_2\), corresponds to the unit eigenvector of \(\mathbb {T}_2\). However, because \(\mathbb {T}_2 \ne (1-g) \mathbb {I} \otimes \mathbb {I} + g \, \mathbb {K} \otimes \mathbb {K}\), then \(\underline{A}_2 \ne \underline{A}_1 \otimes \underline{A}_1\). Rather, \(\underline{A}_2\) must be explicitly computed as the unit eigenstate of \({\textstyle \frac{1}{2}}( \mathbb {K}_+ \otimes \mathbb {K}_+ + \mathbb {K}_- \otimes \mathbb {K}_- )\). It is this failure of factorisation that induces the non-trivial pairwise correlational structure in the equilibrium state.
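Numerically, \(\underline{A}_1\) and \(\underline{A}_2\) are obtained directly as eigenvectors with unit eigenvalue. A sketch, assuming the unit eigenvalue is simple (function names are ours):

```python
import numpy as np

def unit_eigvec(T):
    """Normalised unit-eigenvalue eigenvector of a column-stochastic matrix."""
    w, V = np.linalg.eig(T)
    v = np.real(V[:, np.argmin(np.abs(w - 1.0))])
    return v / v.sum()

def equilibria(Mp, Mm, f):
    """A_1 and the Hebb-protocol A_2 from the model matrices M_plus, M_minus."""
    I = np.eye(Mp.shape[0])
    Kp, Km = (1 - f) * I + f * Mp, (1 - f) * I + f * Mm
    A1 = unit_eigvec(0.5 * (Mp + Mm))
    # Note A_2 != A_1 (x) A_1: factorisation fails under the Hebb protocol.
    A2 = unit_eigvec(0.5 * (np.kron(Kp, Kp) + np.kron(Km, Km)))
    return A1, A2
```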

Using \(\mathbb {T}_1\) and \(\mathbb {T}_2\), we may write down the conditional expectation values in Eq. (7). We define the vector \(\underline{\Omega }^{\mathrm {T}} = \left( {-} \underline{1}^\mathrm {T} \, | \, {+} \underline{1}^{\mathrm {T}} \right) \), where \(\mathrm {T}\) denotes the transpose and the s-dimensional vector \(\underline{1}\) is a vector all of whose components are unity. This vector weights synaptic states according to their two possible strengths. Then,

$$\begin{aligned} \mathsf {E}\left[ S_1(t) \, | \, + \, \right] = \underline{\Omega }^{\mathrm {T}} e^{\left( \mathbb {T}_1 - \mathbb {I} \right) r t} \, \mathbb {M}_+ \underline{A}_1, \end{aligned}$$
(9a)

and

$$\begin{aligned}&\mathsf {E}\left[ S_1(t) S_2(t) \, | \, {+} {+} \, \right] \nonumber \\&\quad = \big ( \underline{\Omega }^{\mathrm {T}} \otimes \underline{\Omega }^{\mathrm {T}} \big ) e^{\left( \mathbb {T}_2 - \mathbb {I} \otimes \mathbb {I} \right) r t} \left( \mathbb {M}_+ \otimes \mathbb {M}_+ \right) \underline{A}_2, \end{aligned}$$
(9b)

and for the other two pairwise expectation values in Eq. (7b), we replace \(\mathbb {M}_+ \otimes \mathbb {M}_+\) in Eq. (9b) by \(\mathbb {M}_+ \otimes \mathbb {I}\) for \({+} {\times }\) and \(\mathbb {I} \otimes \mathbb {I}\) for \({\times } {\times }\). Since \(\mathbb {T}_1 - \mathbb {I} = f g \left( \mathbb {M} - \mathbb {I} \right) \), we have \(\mu (t) = f \, \underline{\Omega }^{\mathrm {T}} e^{\left( \mathbb {M} - \mathbb {I} \right) f g r t} \, \mathbb {M}_+ \underline{A}_1\), so that sparse coding just introduces a multiplicative factor of f and scales the rate r by the product fg in \(\mu (t)\). In the equilibrium limit, by definition \(\exp [(\mathbb {T}_n - \mathbb {I} \otimes \cdots \otimes \mathbb {I})rt] \, \underline{v} \rightarrow \underline{A}_n\) for any state \(\underline{v}\) corresponding to a probability distribution, as \(t \rightarrow \infty \). Hence, \(\mu (\infty ) = f \, \underline{\Omega }^{\mathrm {T}} \underline{A}_1 \equiv 0\), which always follows when potentiation and depression processes are treated symmetrically. For the equilibrium variance, we obtain

$$\begin{aligned} \sigma (\infty )^2&= \frac{f + (1-f) \zeta ^2}{N} \nonumber \\&\quad + \frac{N-1}{N} \left[ f + (1 - f) \zeta \right] ^2 \big ( \underline{\Omega }^{\mathrm {T}} \otimes \underline{\Omega }^{\mathrm {T}} \big ) \underline{A}_2. \end{aligned}$$
(10)

The second, covariance term does not in general vanish because of the equilibrium synaptic pairwise correlations.
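The mean is therefore straightforward to evaluate by matrix exponentiation, using \(\mu (t) = f \, \underline{\Omega }^{\mathrm {T}} e^{\left( \mathbb {M} - \mathbb {I} \right) f g r t} \, \mathbb {M}_+ \underline{A}_1\). A sketch, using the state ordering defined above (first s components for strength \(-1\)):

```python
import numpy as np
from scipy.linalg import expm

def mean_signal(Mp, Mm, A1, f, g, r, t):
    """mu(t) = f Omega^T exp[(M - I) f g r t] M_plus A_1."""
    dim = Mp.shape[0]                  # dim = 2s
    s = dim // 2
    Omega = np.concatenate([-np.ones(s), np.ones(s)])
    M = 0.5 * (Mp + Mm)
    return f * (Omega @ expm((M - np.eye(dim)) * f * g * r * t) @ Mp @ A1)
```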

These general results allow us to obtain SNR lifetimes when the matrices \(\mathbb {M}_\pm \) are specified for any particular model of synaptic plasticity. We defer the derivation of the transition matrix elements \(\mathsf {Prob}[ \, h \, | \, h' \, ]\) that are required for FPT lifetimes until we explicitly discuss simple models of synaptic plasticity in Sect. 3.1.

2.3 Hopfield protocol

Although the Hebb protocol is intuitive as a means of exploring memory lifetimes in an associative memory system, its non-trivial equilibrium distribution of synaptic states is awkward. To avoid this awkwardness, we may consider an alternative protocol that is nevertheless equivalent to the Hebb protocol in the limit of small fgN. We first define the protocol and then demonstrate the equivalence.

During memory storage, instead of defining cue and target sub-populations, we now specify the entire activity pattern, representing a memory, across the whole population of neurons. We allow these activities to take values from the set \(\{ -1, -\zeta , +\zeta , +1 \}\) with probabilities \(\{ {\textstyle \frac{1}{2}}f, {\textstyle \frac{1}{2}}(1-f), {\textstyle \frac{1}{2}}(1-f), {\textstyle \frac{1}{2}}f \}\). Here, the values \(\pm 1\) represent evoked activity (the neuron is involved in memory storage), with \(+1\) (respectively, \(-1\)) representing a strongly (respectively, weakly) tetanising stimulus in the usual LTP (respectively, LTD) sense. In contrast, the values \(\pm \zeta \) represent spontaneous activity (the neuron is not involved in memory storage). For a single perceptron, this amounts to specifying memory vectors \(\underline{\xi }^\alpha \) with components \(\xi _i^\alpha \) taking one of these four values, and also specifying the perceptron’s required output in response to an input vector, where this output is drawn from the same set with the same probabilities, but with f replaced by g as usual. We can track the perceptron’s activation when its required output is either \(+1\) or \(-1\), but by symmetry its activation would differ only by a sign between these two cases, so for concreteness we just take the required output to be \(+1\) during the storage of \(\underline{\xi }^0\). As with the Hebb protocol, a synapse does not experience a plasticity induction signal if its presynaptic input or the postsynaptic perceptron itself is only spontaneously active. However, if both input and perceptron are active, then the synapse experiences a plasticity induction signal, either potentiating if both activities are the same or depressing if different. This is just the standard Hopfield rule (Hopfield 1982), so we refer to this protocol as the Hopfield protocol: we obtain a pattern of synaptic plasticity induction signals in response to evoked activity that is identical to the standard Hopfield rule, but supplemented by the presence of spontaneous activity that does not induce synaptic plasticity.
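A sketch of sampling one such memory vector (the function name is ours):

```python
import numpy as np

rng = np.random.default_rng(1)

def hopfield_pattern(N, f, zeta):
    """One Hopfield-protocol memory vector: components drawn from
    {-1, -zeta, +zeta, +1} with probabilities {f/2, (1-f)/2, (1-f)/2, f/2}."""
    return rng.choice([-1.0, -zeta, +zeta, +1.0], size=N,
                      p=[0.5 * f, 0.5 * (1 - f), 0.5 * (1 - f), 0.5 * f])
```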

Figure 2 summarises the Hopfield protocol, showing all allowed combinations of pre- and postsynaptic activities during memory storage, together with their associated probabilities and induced plasticity induction signals. Depressing and potentiating induction signals both occur with the same overall probability \({\textstyle \frac{1}{2}}f g\) as in the Hebb protocol.

Computing \(\mu (t)\) and \(\sigma (t)^2\) for the Hopfield protocol and using the various symmetries \(\mathsf {E}\left[ S_1(t) \, | \, + \, \right] = - \mathsf {E}\left[ S_1(t) \, | \, - \, \right] \), \(\mathsf {E}\left[ S_1(t) S_2(t) \, | \, {+} {+} \, \right] = \mathsf {E}\left[ S_1(t) S_2(t) \, | \, {-} {-} \, \right] \), etc., we obtain

$$\begin{aligned} \mu (t)&= f \, \mathsf {E}\left[ S_1(t) \, | \, + \, \right] , \end{aligned}$$
(11a)
$$\begin{aligned} \sigma (t)^2&= \frac{f + (1-f) \zeta ^2 - \mu (t)^2}{N} \nonumber \\&\quad + \frac{N-1}{N} \Big \{ f^2 \, \mathsf {E}\left[ S_1(t) S_2(t) \, | \, {+} {+} \, \right] \nonumber \\&\quad + (1-f)^2 \zeta ^2 \, \underbrace{\mathsf {E}\left[ S_1(t) S_2(t) \, | \, {\times } {\times } \, \right] }_{\equiv 0} - \mu (t)^2 \Big \}. \end{aligned}$$
(11b)

These are structurally identical to the expressions in Eq. (7) for the Hebb protocol, except that the linear terms in \(\zeta \) are absent because of cancellation. Had we instead used a single level \(\zeta \) of spontaneous activity rather than the two levels \(\pm \zeta \), we would have obtained identical linear terms, too. Writing down the transition operators \(\mathbb {T}_1\) and \(\mathbb {T}_2\) in the Hopfield protocol, we obtain

$$\begin{aligned} \mathbb {T}_1&= (1 - g) \, \mathbb {I} + g \, \mathbb {K}, \end{aligned}$$
(12a)
$$\begin{aligned} \mathbb {T}_2&= (1 - g) \, \mathbb {I} \otimes \mathbb {I} + g \, \mathbb {K} \otimes \mathbb {K}, \end{aligned}$$
(12b)

with immediate generalisation to \(\mathbb {T}_n\). The (marginal) equilibrium distribution of a pair of synapses’ states is therefore determined by the unit eigenstate of \(\mathbb {K} \otimes \mathbb {K}\) and thus of \(\mathbb {M} \otimes \mathbb {M}\), and so is just \(\underline{A}_2 = \underline{A}_1 \otimes \underline{A}_1\); again, generalisation to \(\underline{A}_n\) is immediate. The result is that all conditional expectation values involving at least one synapse that does not experience a plasticity induction event during the storage of \(\underline{\xi }^0\) vanish, when potentiation and depression processes are treated symmetrically. So, whether we use four-level or three-level activities in the Hopfield protocol, the \(\zeta \)-dependent contributions to the covariance term in Eq. (11b) drop out, as indicated, so that the variance is affected only by the \(\zeta ^2\) term in the first term on the right-hand side (RHS) of Eq. (11b). Moreover, the covariance term vanishes entirely in the large t, equilibrium limit, since \(\mathsf {E}\left[ S_1(t) S_2(t) \, | \, {+} {+} \, \right] \rightarrow \big ( \underline{\Omega }^{\mathrm {T}} \underline{A}_1 \big )^2 \equiv 0\), so that \(\sigma (\infty )^2 = [ f + (1-f) \zeta ^2 ] / N\).

The equivalence of the Hebb and Hopfield protocols in the limit of small fgN is now clear. The corresponding transition matrices \(\mathbb {T}_1\) are in any case identical for both protocols, and hence so are the means. For \(\mathbb {T}_2\), in both protocols we have that

$$\begin{aligned} \mathbb {T}_2 - \mathbb {I} \otimes \mathbb {I} = f g \left[ \left( \mathbb {M} - \mathbb {I} \right) \otimes \mathbb {I} + \mathbb {I} \otimes \left( \mathbb {M} - \mathbb {I} \right) \right] + \mathscr {O}(f^2 g), \end{aligned}$$
(13)

and for general \(\mathbb {T}_N\) the \(\mathscr {O}(f g)\) term on the RHS contains N terms, each of which contains \(N-1\) factors of \(\mathbb {I}\) and just one factor of \(\mathbb {M} - \mathbb {I}\). This structure reflects the fact that in the limit of small fgN, at most one of the perceptron’s synapses experiences a plasticity induction signal, regardless of the protocol. The corresponding unit eigenstate of \(\mathbb {T}_N\) in this limit is just \(\underline{A}_1 \otimes \cdots \otimes \underline{A}_1\), regardless of the protocol. Therefore, in the small fgN limit, the equilibrium distribution of synaptic states in the Hebb protocol reduces to that in the Hopfield protocol, and all statistical properties of h(t) must therefore also reduce in the same way. The Hopfield protocol therefore offers a way of extrapolating the small fgN behaviour of the Hebb protocol to larger f without the awkwardness of the Hebb protocol’s equilibrium structure in this regime. Furthermore, the simpler form of the results in the Hopfield protocol allow us to use it to extract the scaling properties of memory lifetimes as a function of small f (or g) in both protocols.

For the non-sparse-coding case of \(f=1\), spontaneous activity does not contribute to the Hopfield protocol’s dynamics, and we recover precisely the Hopfield model with discrete-state synapses. For \(f<1\), we expand the possible activities of neurons to allow for spontaneously active neurons that are not involved in memory storage. Thus, although the Hopfield protocol provides a convenient tool for examining the small fgN limit of the Hebb protocol, we also regard the Hopfield protocol as a fully fledged protocol in its own right, because it constitutes a very natural way of examining sparse coding with a Hopfield plasticity rule.

2.4 Population memory lifetimes

So far we have focused on the memory dynamics of a single perceptron. We now consider the memory dynamics of the entire population of P neurons. We do this only for the Hopfield protocol for simplicity. The tracked memory will evoke activity in a sub-population of on average gP neurons. In an experimental protocol, during the storage of the tracked memory we can at least in principle explicitly identify all those neurons that are active, and then subsequently track all their activities during later re-presentations of the tracked memory. Because of synaptic coupling between these tracked neurons and the other on average \((1-g)P\) neurons, spontaneous activity in the other neurons will affect and potentially degrade the activation of the tracked neurons upon re-presentation of the tracked memory, affecting the tracked neurons’ ability to read out the tracked memory. But, as we are only concerned with the tracked neurons’ read-out of the tracked memory, we do not need to explicitly track the activities of all these other neurons: their activities do not directly form part of the memory signal from the tracked neurons.

Fig. 2

Schematic illustration of the Hopfield protocol for memory storage. The format of this figure is essentially identical to that for the Hebb protocol in Fig. 1, except that labels indicating postsynaptic roles are not required. To avoid duplication, spontaneously active neurons are shown with both possible spontaneous activity levels, \(\pm \zeta \); the probability shown applies to each of these levels separately rather than to both together

In the Hopfield protocol, a tracked neuron will by definition have an output of \(+1\) or \(-1\) during memory storage. For a single perceptron, we focused on an output of \(+1\) without loss of generality. A perceptron with an initial output of \(-1\) will have identical dynamics to one with an initial output of \(+1\), except that the activation will be reversed in sign. Therefore, we can just define the memory signal for any active perceptron to be \(\pm h(t)\), depending on this sign. Denoting the moment generating function (MGF) of \(+h(t)\) for a tracked neuron with an initial output of \(+1\) by \(\mathscr {M}(z;t)\), the MGF for \(-h(t)\) for a tracked neuron with an initial output of \(-1\) will also be just \(\mathscr {M}(z;t)\). All tracked neurons therefore have the same MGF for their memory signals.

Suppose that \(P_{\mathrm {eff}}\) neurons form the sub-population that stores the tracked memory, where \(P_{\mathrm {eff}}\) is binomially distributed with parameter P and probability g. Although these neurons’ activations will not in general evolve independently, as an extremely coarse approximation we assume that their activations do evolve independently during subsequent memory storage (cf. Rubin and Fusi 2007). Population memory lifetimes obtained from this simplifying assumption will therefore only be theoretical, and perhaps very loose, upper bounds on exact memory lifetimes. With this simplification, the MGF for the memory signal from the tracked sub-population is then \(\left[ \mathscr {M}(z;t) \right] ^{P_{\mathrm {eff}}}\), by independence. Averaging over \(P_{\mathrm {eff}}\), the MGF of the population memory signal is then just \(\left[ (1-g) + g \mathscr {M}(z;t) \right] ^P\). The mean, \(\mu _p(t)\), and variance, \(\sigma _p(t)^2\), of this population signal follow directly. Ignoring covariance terms (or considering the limit \(t \rightarrow \infty \) for the variance), we have \(\mu _p(t) = g P \mu (t)\) and \(\sigma _p(t)^2 \approx g P \sigma (t)^2\), where \(\mu (t)\) and \(\sigma (t)^2\) are the single-perceptron mean and variance above. Hence, the population SNR, \(\mathrm {SNR}_p(t) = \left[ \mu _p(t) - \mu _p(\infty ) \right] / \sigma _p(t)\), is just scaled by the factor \(\sqrt{g P}\) relative to the single-perceptron SNR, so

$$\begin{aligned} \mathrm {SNR}_p(t) \approx \sqrt{g P} \; \mathrm {SNR}(t). \end{aligned}$$
(14)

The population SNR memory lifetime, which we denote by \(\tau _{\mathrm {pop}}\), is then the solution of \(\mathrm {SNR}_p(\tau _{\mathrm {pop}}) = 1\). With \(\sigma (t)^2 \approx [f + (1-f) \zeta ^2] / N\) in the Hopfield protocol, \(\mathrm {SNR}_p(t)\) depends on \(\mathscr {N} = N P\), the total number of synapses in the memory system, but it also contains the additional factor of \(\sqrt{g}\) compared to \(\mathrm {SNR}(t)\), which modifies scaling behaviour compared to single-perceptron results.
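Given any single-perceptron \(\mathrm {SNR}(t)\), Eq. (14) therefore reduces \(\tau _{\mathrm {pop}}\) to the same root-finding problem with an extra factor of \(\sqrt{g P}\). A sketch, again assuming a monotonically decaying SNR:

```python
import numpy as np
from scipy.optimize import brentq

def tau_pop(snr, g, P, t_max=1e9):
    """Population SNR lifetime: largest root of sqrt(gP) SNR(t) = 1, else 0 s.

    snr: callable giving the single-perceptron SNR(t), assumed decaying."""
    scale = np.sqrt(g * P)
    if scale * snr(0.0) <= 1.0:
        return 0.0
    return brentq(lambda t: scale * snr(t) - 1.0, 0.0, t_max)
```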

3 Models of synaptic plasticity

3.1 Simple synapses: the stochastic updater

The simplest model of synaptic plasticity to consider is one in which synapses lack any internal states so that \(s=1\), and given a plasticity induction signal, they change strength (if possible) with some fixed probability p (Tsodyks 1990). Because a synapse just changes its strength stochastically in this model, we have called such a synapse a “stochastic updater” (SU; Elliott and Lagogiannis 2012). The underlying strength transition matrices are then

$$\begin{aligned} \mathbb {M}_+ = \begin{pmatrix} 1-p & 0 \\ p & 1 \end{pmatrix}, \quad \mathbb {M}_- = \begin{pmatrix} 1 & p \\ 0 & 1-p \end{pmatrix}, \end{aligned}$$
(15)

and so

$$\begin{aligned} \mathbb {K}_+ = \begin{pmatrix} 1 - \psi & 0 \\ \psi & 1 \end{pmatrix}, \quad \mathbb {K}_- = \begin{pmatrix} 1 & \psi \\ 0 & 1 - \psi \end{pmatrix}, \end{aligned}$$
(16)

where we define \(\psi = f p\) for convenience.
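A sketch constructing Eqs. (15) and (16); fed into the eigenvector sketch of Sect. 2.2, these matrices reproduce \(\underline{A}_1 = \frac{1}{2}(1,1)^{\mathrm {T}}\) and the pairwise correlation \(\kappa _2\) given below (parameter values are arbitrary):

```python
import numpy as np

p, f = 0.5, 0.1
psi = f * p                          # psi = f p
Mp = np.array([[1 - p, 0.0],
               [p,     1.0]])        # Eq. (15): potentiation
Mm = np.array([[1.0,   p],
               [0.0, 1 - p]])        # Eq. (15): depression
I2 = np.eye(2)
Kp = (1 - f) * I2 + f * Mp           # Eq. (16): K_+, entries 1 - psi, psi
Km = (1 - f) * I2 + f * Mm           # Eq. (16): K_-
```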

The equilibrium distribution of a single synapse’s strength in both protocols is just the normalised unit eigenvector of \(\mathbb {K}\), or \(\underline{A}_1 = \frac{1}{2}\left( 1, 1 \right) ^{\mathrm {T}}\). For the Hopfield protocol, any pair of synapses’ strengths has the equilibrium distribution \(\underline{A}_2 = \underline{A}_1 \otimes \underline{A}_1\). For the Hebb protocol, we require the unit eigenstate of \({\textstyle \frac{1}{2}}( \mathbb {K}_+ \otimes \mathbb {K}_+ + \mathbb {K}_- \otimes \mathbb {K}_- )\), which gives

$$\begin{aligned} \underline{A}_2&= \frac{1 + \kappa _2}{4} \left[ \begin{pmatrix} 1 \\ 0 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ 1 \end{pmatrix} \otimes \begin{pmatrix} 0 \\ 1 \end{pmatrix} \right] \nonumber \\&\quad + \frac{1 - \kappa _2}{4} \left[ \begin{pmatrix} 1 \\ 0 \end{pmatrix} \otimes \begin{pmatrix} 0 \\ 1 \end{pmatrix} + \begin{pmatrix} 0 \\ 1 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 0 \end{pmatrix} \right] , \end{aligned}$$
(17)

where \(\kappa _2 = \psi / (2 - \psi )\). The quantity \(\kappa _2\) determines the pairwise correlations present in this state, since \(\big ( \underline{\Omega }^{\mathrm {T}} \otimes \underline{\Omega }^{\mathrm {T}} \big ) \underline{A}_2 \equiv \kappa _2\). For \(f \rightarrow 0\), \(\kappa _2 \rightarrow 0\) and \(\underline{A}_2 \rightarrow \underline{A}_1 \otimes \underline{A}_1\). With these equilibrium distributions, we may explicitly compute \(\mu (t)\) and \(\sigma (t)^2\) in both protocols using Eqs. (7) and (11). For the common mean, we obtain

$$\begin{aligned} \mu (t) = f p \, e^{-f g p r t}, \end{aligned}$$
(18)

and for the two variances, we need the various correlation functions in Eqs. (7b) and (11b). For the Hebb protocol, these are

$$\begin{aligned} \mathsf {E}\left[ S_1(t) S_2(t) \, | \, {+} {+} \right]&= p^2 \, e^{- (2 - f p) f g p r t} \nonumber \\&\quad + \Big [ 1 - p \, (2-p) \, e^{- (2 - f p) f g p r t} \Big ] \kappa _2, \end{aligned}$$
(19a)
$$\begin{aligned} \mathsf {E}\left[ S_1(t) S_2(t) \, | \, {+} {\times } \right]&= \Big [ 1 - p \, e^{- (2 - f p) f g p r t} \Big ] \kappa _2, \end{aligned}$$
(19b)
$$\begin{aligned} \mathsf {E}\left[ S_1(t) S_2(t) \, | \, {\times } {\times } \right]&= \kappa _2, \end{aligned}$$
(19c)

and for the Hopfield protocol, we just set \(\kappa _2 = 0\) in these equations. These results allow us to determine SNR lifetimes for simple, SU synapses. Approximating the Hopfield variance by its asymptotic form \(\sigma (\infty )^2\), the single-perceptron SNR memory lifetime in the Hopfield protocol for a stochastic updater is then

$$\begin{aligned} \tau _{\mathrm {snr}} \approx \frac{1}{2fgpr} \log _e \frac{f^2 p^2}{\sigma _N (\infty )^2}, \end{aligned}$$
(20)

and for the population SNR lifetime \(\tau _{\mathrm {pop}}\), we replace \(\sigma _N(\infty )\) by \(\sigma _{\mathscr {N}}(\infty ) / \sqrt{g}\), where the subscript in \(\sigma _X(\infty )^2 = [f + (1-f) \zeta ^2]/X\) indicates either N or \(\mathscr {N} = N P\).
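Both lifetimes are closed-form and trivial to evaluate. A sketch (returning 0 s when no positive solution exists):

```python
import numpy as np

def tau_snr_su(f, g, p, r, N, zeta=0.0):
    """Eq. (20): single-perceptron SNR lifetime for the stochastic updater."""
    var = (f + (1 - f) * zeta**2) / N            # sigma_N(inf)^2
    return max(0.0, np.log(f**2 * p**2 / var) / (2 * f * g * p * r))

def tau_pop_su(f, g, p, r, N, P, zeta=0.0):
    """Population version: sigma_N(inf) replaced by sigma_(NP)(inf)/sqrt(g)."""
    var = (f + (1 - f) * zeta**2) / (N * P) / g
    return max(0.0, np.log(f**2 * p**2 / var) / (2 * f * g * p * r))
```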

To determine FPT lifetimes, we require \(\mathsf {Prob}[ \, h \, | \, h' \, ]\) for the MIE approach, or the induced jump moments \(A(h')\) and \(B(h')\) for the FPE method. We relegate the derivation of \(\mathsf {Prob}[ \, h \, | \, h' \, ]\) to Appendix A, where we also indicate our numerical methods for obtaining FPTs. From Appendix A, we obtain the jump moments in Eq. (5) for the FPE approach to FPTs. For both protocols we get the same first jump moment

$$\begin{aligned} A(h') = - \psi g h', \end{aligned}$$
(21)

and for the second jump moment, we get

$$\begin{aligned} B(h' \, | \, N_{\mathrm {eff}})&= \left( \frac{N_{\mathrm {eff}}}{N^2} + \frac{N - N_{\mathrm {eff}}}{N^2} \, \zeta ^2 \right) \psi (2 - \psi ) g \nonumber \\&\quad + \psi ^2 g \, (h')^2 + \frac{N_{\mathrm {eff}}(N_{\mathrm {eff}}- 1)}{N^2} \psi ^2 g \nonumber \\&\quad + 2 \, \frac{N_{\mathrm {eff}}(N - N_{\mathrm {eff}})}{N^2} \, \psi ^2 g \, \zeta \nonumber \\&\quad + \frac{(N - N_{\mathrm {eff}}) (N - N_{\mathrm {eff}}-1 )}{N^2} \psi ^2 g \, \zeta ^2, \end{aligned}$$
(22a)
$$\begin{aligned} B(h' \, | \, N_{\mathrm {eff}})&= \left( \frac{N_{\mathrm {eff}}}{N^2} + \frac{N - N_{\mathrm {eff}}}{N^2} \, \zeta ^2 \right) \psi (2 - \psi ) g \nonumber \\&\quad + \psi ^2 g \, (h')^2, \end{aligned}$$
(22b)

for the Hebb and Hopfield protocols, respectively. We have explicitly indicated the dependence of \(B(h')\) on \(N_{\mathrm {eff}}\), where \(N_{\mathrm {eff}}\) is the number of a perceptron’s synapses that are active during the storage of \(\underline{\xi }^0\). We write \(B(h' \, | \, N_{\mathrm {eff}}) = \psi g B_0(N_{\mathrm {eff}}) + \psi ^2 g \, (h')^2\), where we separate out the quadratic dependence on \(h'\) and it is convenient to remove an overall factor of \(\psi g\) from the definition of \(B_0(N_{\mathrm {eff}})\). Dropping the quadratic term from \(B(h' \, | \, N_{\mathrm {eff}})\) is equivalent to considering dynamics based on the Ornstein–Uhlenbeck process (Uhlenbeck and Ornstein 1930), which we have found to be a very good approximation (Elliott 2014, 2017a, 2019), so we work with just the constant term.

For the MIE approach to FPTs, a technical difficulty as discussed in Appendix A requires us to restrict to the specific case of \(\zeta = 0\) only. We use numerical methods to obtain FPT lifetimes from the MIE approach, but for small fN, the dynamics are dominated by \(N_{\mathrm {eff}}= 1\). For \(N_{\mathrm {eff}}= 1\) and \(\vartheta = 0\), Eq. (4) is trivial because the only contribution to the sum involves no transition, occurring with probability \(1 - {\textstyle \frac{1}{2}}\psi g\) regardless of the protocol. Writing \(\sigma _{\mathrm {fpt}}^2\) as the variance in the FPT, we obtain

$$\begin{aligned} \text{ MIE } \quad \left\{ \begin{array}{l} \tau _{\mathrm {mfpt}} \sim {\displaystyle \frac{N (1+p)}{p g r}}, \\ \sigma ^2_{\mathrm {fpt}} \sim {\displaystyle \frac{4 N (1+p)}{p^2 f g^2 r}}, \end{array} \right. \end{aligned}$$
(23)

at leading order, for small f (\(= g\)) in both protocols. We see that \(\tau _{\mathrm {mfpt}}\) scales as 1/f in this regime, but that \(\sigma _{\mathrm {fpt}}\) scales as \(1/f^{3/2}\). Although \(\sigma _{\mathrm {fpt}}\) swamps \(\tau _{\mathrm {mfpt}}\) for small f, \(\tau _{\mathrm {mfpt}}\) is nevertheless robustly positive. We may use our earlier results to obtain the corresponding forms for the FPE approach to FPT lifetimes for small f (see Eqs. (3.29) and (3.30) in Elliott 2017a). We obtain

$$\begin{aligned} \text{ FPE } \quad \left\{ \begin{array}{l} \tau _{\mathrm {mfpt}} \sim {\displaystyle \frac{\log _e 2}{2 p f g r}},\\ \sigma ^2_{\mathrm {fpt}} \sim {\displaystyle \frac{\pi ^2 + 6 \log _e^2 2}{24 p^2 f^2 g^2 r}}. \end{array} \right. \end{aligned}$$
(24)

In contrast to Eq. (23), now \(\tau _{\mathrm {mfpt}}\) scales as \(1/f^2\) and not 1/f, and \(\sigma _{\mathrm {fpt}}\) scales as \(1/f^2\) and not \(1/f^{3/2}\). Moreover, in the FPE approach, the FPT moments have lost their overall scaling with N. Although the forms in Eq. (24) are obtained using mean field approximations that are expected to be invalid when fN is small, in fact we obtain the same scaling behaviour when the expectation values are obtained by averaging properly over \(h_0\) and \(N_{\mathrm {eff}}\). Our simulation results, discussed in Sect. 4, agree with the behaviour in Eq. (23). Therefore, the failure of the FPE approach for small fN in Eq. (24) is due to the approximations intrinsic to the FPE approach itself. These include the diffusion and especially the continuum limit. For small fN, the system is nowhere near the continuum limit, so the scaling behaviour must be incorrect there.
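The small-f MIE scaling in Eq. (23) may be checked by direct Monte Carlo simulation of a stochastic-updater perceptron under the Hopfield protocol, here with \(\zeta = 0\), \(\vartheta = 0\) and tracked output \(+1\). A schematic sketch (not the simulation code underlying Sect. 4):

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_fpt(N, f, g, p, r, n_trials=1000):
    """Monte Carlo estimate of the FPT moments of h(t), SU synapses, zeta = 0."""
    fpts = []
    for _ in range(n_trials):
        S = rng.choice([-1.0, 1.0], size=N)              # equilibrium strengths
        active = rng.random(N) < f                       # tracked memory xi^0
        xi0 = np.where(active, rng.choice([-1.0, 1.0], size=N), 0.0)
        flip = active & (rng.random(N) < p)              # store xi^0, output +1
        S[flip] = xi0[flip]
        h, t = np.dot(xi0, S) / N, 0.0
        if h <= 0.0:                                     # condition on h_0 > 0
            continue
        while h > 0.0:
            t += rng.exponential(1.0 / r)                # Poisson storage times
            if rng.random() < g:                         # perceptron is active
                a = rng.random(N) < f                    # active inputs
                sgn = rng.choice([-1.0, 1.0], size=N)    # +/- induction signals
                flip = a & (rng.random(N) < p)
                S[flip] = sgn[flip]                      # SU strength update
            h = np.dot(xi0, S) / N
        fpts.append(t)
    return np.mean(fpts), np.var(fpts)
```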

3.2 Complex synapses

We now turn to models of complex synapses that have internal states, so that \(s > 1\). In such models, synapses can undergo metaplastic changes in their internal states without expressing changes in synaptic strength. We will only consider SNR lifetimes in relation to complex synapses. We have studied FPT lifetimes for filter-based synaptic plasticity for both bistate (Elliott 2017b) and multistate (Elliott 2020) synapses in a non-sparse coding context, but we have yet to consider other models of complex synapses. We therefore restrict to SNR lifetimes, but with the caveat that they are valid only in an asymptotic regime.

We have discussed filter-based models of synaptic plasticity at length elsewhere (Elliott 2008; Elliott and Lagogiannis 2009, 2012; Elliott 2016b), so we only briefly summarise them here. Synapses are proposed to implement a form of low-pass filtering by integrating plasticity induction signals in an internal filter state. Synapses then filter out high-frequency noise in their induction signals and pass only low-frequency trends, rendering them less susceptible to changes in strength due to fluctuations in their inputs. Potentiating (respectively, depressing) induction signals increment (respectively, decrement) the filter state, with synaptic plasticity being expressed (if possible) only when the filter reaches an upper (respectively, lower) threshold. For symmetric potentiation and depression processes, we may take these thresholds to be \(\pm \Theta \). The filter can occupy the \(2 \Theta -1\) states \(-(\Theta -1), \ldots , +(\Theta -1)\), with the thresholds \(\pm \Theta \) not being occupiable states. Several variant filter models are distinguishable by their different dynamics upon reaching threshold (Elliott 2016b), but we consider only the simplest of them here. In the simplest model, the filter always resets to the zero filter state upon reaching threshold, regardless of its strength state and regardless of the type of plasticity induction signal. This filter generalises to any multistate synapse. If the synapse is saturated at its upper (respectively, lower) strength state and reaches its upper (respectively, lower) filter threshold upon receipt of a potentiating (respectively, depressing) induction signal, the filter resets to zero despite the fact that it cannot increment (respectively, decrement) its strength. The transitions for this filter for the case of \(\Theta = 3\) are illustrated in Fig. 3A. Although for clarity we have shown all permitted transitions between all filter and strength states, we stress that each synapse possesses only a single synaptic filter: the filter is not duplicated for each strength state. Transitions in filter state occur independently of strength state. Nevertheless, to describe transitions in the joint strength and filter state, we require \(2(2 \Theta -1) \times 2(2 \Theta -1)\) matrices, so \(s = 2 \Theta - 1\), although the number of required physical states for filter-based synapses is just \(2 \Theta - 1\) for the filter states themselves, and an additional, binary-valued variable for the bistate strength, so a total of \(2 \Theta \) states.
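A sketch constructing \(\mathbb {M}_\pm \) for this filter, with states ordered as above (index \(b s + j\), where \(b \in \{0,1\}\) labels strength \(-1\) or \(+1\), and \(j = 0, \ldots , s-1\) encodes filter value \(j - (\Theta - 1)\)); the names and ordering conventions are ours:

```python
import numpy as np

def filter_matrices(theta):
    """M_plus, M_minus for the reset-to-zero filter model with threshold theta."""
    s = 2 * theta - 1                  # number of occupiable filter states
    zero = theta - 1                   # index of the zero filter state
    Mp, Mm = np.zeros((2 * s, 2 * s)), np.zeros((2 * s, 2 * s))
    for b in range(2):                 # b = 0: strength -1; b = 1: strength +1
        for j in range(s):
            src = b * s + j
            if j + 1 == s:             # potentiation reaches +Theta:
                Mp[1 * s + zero, src] = 1.0   # reset filter, strength -> +1
            else:
                Mp[b * s + j + 1, src] = 1.0  # increment the filter only
            if j == 0:                 # depression reaches -Theta:
                Mm[0 * s + zero, src] = 1.0   # reset filter, strength -> -1
            else:
                Mm[b * s + j - 1, src] = 1.0  # decrement the filter only
    return Mp, Mm
```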

Fig. 3

Strength and internal state transitions for various models of complex synapses. Coloured circles indicate synaptic states, with red (respectively, blue) circles corresponding to strength \(S=-1\) (respectively, \(S=+1\)), and the labelled numbers inside the circles identifying the particular internal states (indexed by I for filter states and i for serial and cascade states). Different internal states of the same strength state are organised in the same vertical column, while different strength states correspond to different columns. Solid (respectively, dashed) lines between states show transitions caused by potentiating (respectively, depressing) induction signals, with arrows indicating the direction of the transition. Loops to and from the same state indicate no transition. Three different models are shown, as labelled, corresponding to a \(\Theta = 3\) filter model (A), and \(s = 5\) serial (B) and cascade (C) synapse models. For the filter and serial synapse models, given the presence of an induction signal of the correct type, the transition probabilities are unity. For the cascade model, the transition probabilities are as discussed in the main text

We state without derivation the result for \(\mu (t)\) in this filter model:

$$\begin{aligned} \mu _{\mathrm {fil}}(t)&= \frac{f}{\Theta ^3} \sum _{l=0}^{\Theta -1} \cot ^2 {\textstyle \frac{(2l+1) \pi }{4 \Theta }} \, e^{- f g r t \left[ 1 - \cos \frac{(2l+1) \pi }{2 \Theta } \right] } \nonumber \\&\quad - \frac{4f}{\Theta ^3} \sum _{l=0}^{\big \lfloor \frac{\Theta -1}{2}\big \rfloor } \cot ^2 {\textstyle \frac{(2l+1) \pi }{2\Theta }} \, e^{- f g r t \left[ 1 - \cos \frac{(2l+1) \pi }{\Theta } \right] }, \end{aligned}$$
(25)

where \(\lfloor \cdot \rfloor \) denotes the floor function. This expression is obtained from Eq. (4.24) in Elliott (2016b) just by multiplying by f and inserting a factor of fg into the exponents. This result is required for obtaining SNR lifetimes. The pairwise correlation functions required for \(\sigma (t)^2\) are computed via numerical matrix methods using the matrices \(\mathbb {M}_\pm \) for the filter model (given in Elliott (2016b) or implied by the transitions in Fig. 3A), and we also obtain the Hebb equilibrium distribution \(\underline{A}_2\) by numerical methods.

To estimate SNR lifetimes in the filter model for the Hopfield protocol, we consider the slowest decaying mode in the first and second terms of Eq. (25). For non-sparse coding, it is usually enough to consider just the slowest mode in the first term, but with sparseness, both terms must be considered for a better approximation. For \(\Theta \) large enough, we then have

$$\begin{aligned} \mu _{\mathrm {fil}}(t) \approx \frac{16 f}{\pi ^2 \Theta } \left( e^{-\pi ^2 f g r t / 8 \Theta ^2} - e^{-\pi ^2 f g r t / 2 \Theta ^2} \right) . \end{aligned}$$
(26)

Approximating the Hopfield variance by its asymptotic form \(\sigma (\infty )^2\), the single-perceptron SNR memory lifetime for the filter model is then

$$\begin{aligned} f g r \tau ^{\mathrm {(fil)}}_{\mathrm {snr}} \approx \frac{4 \Theta ^2}{\pi ^2} \log _e \frac{256}{\pi ^4 \Theta ^2} \frac{f^2}{\sigma _N(\infty )^2} - \frac{\pi ^4 \Theta ^5}{512} \frac{\sigma _N(\infty )^3}{f^3}, \nonumber \\ \end{aligned}$$
(27)

where in deriving this expression, we have regarded the second term as a correction to the first term, with the first term arising purely from the first term in Eq. (26). To obtain the population SNR memory lifetime \(\tau ^{\mathrm {(fil)}}_{\mathrm {pop}}\), we again just replace \(\sigma _N(\infty )\) by \(\sigma _{\mathscr {N}}(\infty )/\sqrt{g}\) in Eq. (27).
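As a check on Eq. (27), the SNR lifetime can also be extracted numerically from the two-mode approximation of Eq. (26) by locating the late-time root of \(\mu _{\mathrm {fil}}(t) = \sigma _N(\infty )\); the sketch below assumes \(\sigma _N(\infty )\) is supplied, and the bracketing of the root is ours.

```python
import numpy as np
from scipy.optimize import brentq

def tau_snr_fil(f, g, r, theta, sigma_inf):
    """SNR lifetime from Eq. (26): mu rises from zero to a peak and then
    decays, so the lifetime is the root on the decaying tail."""
    a = np.pi**2 * f * g * r / (8 * theta**2)   # slowest decay rate
    mu = lambda t: (16*f/(np.pi**2*theta)) * (np.exp(-a*t) - np.exp(-4*a*t)) - sigma_inf
    t_peak = np.log(4.0) / (3.0 * a)            # maximiser of mu(t)
    t_hi = 10 * t_peak
    while mu(t_hi) > 0:                         # expand until the tail is bracketed
        t_hi *= 2
    return brentq(mu, t_peak, t_hi)             # assumes mu(t_peak) > sigma_inf
```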

We also consider the serial synapse model (Leibold and Kempter 2008; Rubin and Fusi 2007). In this model, a synapse performs a symmetric, unbiased, one-step random walk on a set of 2s states between reflecting boundaries. The first (respectively, second) group of s states is identified as corresponding to strength \(-1\) (respectively, \(+1\)). For each strength state, there are thus s metastates. If a synapse has strength \(-1\) (respectively, \(+1\)) and experiences a sequence of depressing (respectively, potentiating) induction signals, then it is pushed into progressively higher metastates. However, the synapse can change strength only when in the lowest, \(i=1\) metastate. The transitions are illustrated in Fig. 3B. The transition matrices \(\mathbb {M}_\pm \) are just

$$\begin{aligned} \mathbb {M}_+&= \mathrm {diag}\{ \overbrace{0 , \ldots , 0}^{2s-1} , 1 \} + \mathrm {diag}_l \{ \overbrace{1, \ldots , 1}^{2s-1} \}, \end{aligned}$$
(28a)
$$\begin{aligned} \mathbb {M}_-&= \mathrm {diag}\{ 1, \underbrace{0, \ldots , 0}_{2s-1} \} + \mathrm {diag}_u \{ \underbrace{1, \ldots , 1}_{2s-1} \}, \end{aligned}$$
(28b)

where \(\mathrm {diag}_u\) and \(\mathrm {diag}_l\) denote the upper and lower diagonals, respectively. The eigen-decomposition of \(\mathbb {M} = \frac{1}{2}\left( \mathbb {M}_+ {+} \mathbb {M}_- \right) \) is standard (cf. Elliott 2016a, for the eigen-decomposition of the similar matrix \(\mathbb {C}\) there), so we can directly evaluate \(\mu (t) = f \, \underline{\Omega }^{\mathrm {T}} e^{\left( \mathbb {M} - \mathbb {I} \right) f g r t} \, \mathbb {M}_+ \underline{A}_1\), where \(\underline{A}_1^{\mathrm {T}} = \left( \, \underline{1}^{\mathrm {T}} \; | \; \underline{1}^{\mathrm {T}} \right) /(2s)\). We obtain

$$\begin{aligned} \mu _{\mathrm {ser}}(t) = \frac{f}{s^2} \sum _{l=0}^{s-1} (-1)^l \cot {\textstyle \frac{(2l+1)\pi }{4s}} \, e^{ - f g r t \left[ 1 - \cos \frac{(2l+1)\pi }{2s} \right] }. \end{aligned}$$
(29)
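Eq. (29) is easily cross-checked against the matrix form quoted above; in the sketch below, the state ordering runs from the deepest \(-1\) metastate to the deepest \(+1\) metastate, and we assume that \(\underline{\Omega }\) is simply the vector of state strengths (\(\pm 1\)).

```python
import numpy as np
from scipy.linalg import expm

def serial_matrices(s):
    """Transition matrices of Eqs. (28a) and (28b) for the serial model."""
    n = 2 * s
    Mp = np.diag(np.r_[np.zeros(n - 1), 1.0]) + np.diag(np.ones(n - 1), -1)
    Mm = np.diag(np.r_[1.0, np.zeros(n - 1)]) + np.diag(np.ones(n - 1), +1)
    return Mp, Mm

def mu_ser_matrix(t, f, g, r, s):
    """Mean signal via mu(t) = f Omega^T exp[(M - I) f g r t] M+ A1."""
    Mp, Mm = serial_matrices(s)
    M = 0.5 * (Mp + Mm)
    omega = np.r_[-np.ones(s), np.ones(s)]     # assumed strength vector
    A1 = np.ones(2 * s) / (2 * s)              # uniform equilibrium distribution
    return f * omega @ expm((M - np.eye(2 * s)) * f * g * r * t) @ (Mp @ A1)
```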

For the Hebb protocol, we again use numerical matrix methods to obtain \(\underline{A}_2\). To estimate SNR lifetimes for the Hopfield protocol, it is sufficient to consider just the slowest decaying term in Eq. (29), giving

$$\begin{aligned} \mu _{\mathrm {ser}}(t) \approx \frac{4f}{\pi s} e^{-\pi ^2 f g r t/8 s^2}, \end{aligned}$$
(30)

for s large enough, and hence

$$\begin{aligned} f g r \tau ^{\mathrm {ser}}_{\mathrm {snr}} \approx \frac{4s^2}{\pi ^2} \log _e \frac{16}{\pi ^2 s^2} \frac{f^2}{\sigma _N(\infty )^2}, \end{aligned}$$
(31)

as the required approximation, with \(\tau ^{\mathrm {ser}}_{\mathrm {pop}}\) obtained in the usual way.

In the cascade model of synaptic plasticity (Fusi et al. 2005), there are also 2s metalevels, s for each bistate strength state, but unlike the serial synapse model, a potentiating (respectively, depressing) induction signal for a synapse with strength \(-1\) (respectively, \(+1\)) in metastate i can with probability \(2^{1-i}\) (or \(2^{2-i}\) for \(i = s\)) cause the synapse to change strength and return to metastate \(i=1\). The same probabilities govern transitions to higher metastates. The transitions are illustrated in Fig. 3C. The cascade model essentially constitutes a tower of stochastic updaters that progressively render the synapse less labile. We have extensively analysed the cascade model elsewhere (Elliott and Lagogiannis 2012) and compared its memory performance to filter-based synapses, which outperform the cascade model in almost all biologically relevant regions of parameter space (Elliott 2016b). It is possible to obtain analytical results for the Laplace transform of the mean dynamics in the cascade model (Elliott and Lagogiannis 2012), but here we use numerical matrix methods. Rubin and Fusi (2007) give a formula for the SNR based on finding a fit to numerical results. The implied formula for the mean is

$$\begin{aligned} \mu _{\mathrm {cas}}(t) \approx \frac{14f}{5 s} \frac{e^{-f g r t / 2^{s-2}}}{1+ f g r t}. \end{aligned}$$
(32)

Taking the asymptotic variance \(\sigma (\infty )^2\) in the Hopfield protocol, we can then use the expression \(\mu _{\mathrm {cas}}(t) / \sigma _N(\infty )\) for the SNR. This still cannot be solved analytically for the SNR lifetime \(\tau ^{\mathrm {cas}}_{\mathrm {snr}}\) (or the population form \(\tau ^{\mathrm {cas}}_{\mathrm {pop}}\)), but we can use it to obtain numerical solutions that can be compared to results obtained from exact matrix methods.
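Given the fitted mean of Eq. (32), the corresponding SNR lifetime can be obtained by simple root-finding; a minimal sketch, assuming the asymptotic standard deviation \(\sigma _N(\infty )\) is supplied and lies below \(\mu _{\mathrm {cas}}(0) = 14f/(5s)\), is:

```python
import numpy as np
from scipy.optimize import brentq

def tau_snr_cas(f, g, r, s, sigma_inf):
    """SNR lifetime from the fitted cascade mean of Eq. (32); mu_cas is
    monotonically decreasing, so a single bracketed root suffices."""
    mu = lambda t: (14*f/(5*s)) * np.exp(-f*g*r*t/2**(s - 2)) / (1 + f*g*r*t) - sigma_inf
    t_hi = 1.0 / (f * g * r)
    while mu(t_hi) > 0:                 # expand until the root is bracketed
        t_hi *= 2
    return brentq(mu, 0.0, t_hi)
```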

A serial or cascade synapse possesses 2s states, with each set of s metalevels duplicated for each strength. Metalevel i for strength \(-1\) cannot be identified with metalevel i for strength \(+1\) because the transitions induced by plasticity induction signals are in opposite directions. This is in contrast to the filter model, in which the filter transitions are independent of the strength state. Serial and cascade synapses therefore require a full complement of 2s physical states to characterise a synapse, while a filter synapse possesses \(2 \Theta \) physical states and not \(2 (2 \Theta - 1)\) states. Hence, we may directly compare the performance of a filter synapse with threshold \(\Theta \) to a serial or cascade synapse with a total of 2s metastates, or s metastates per strength state.

4 Results

We now turn to a discussion of our results, comparing and contrasting the various models of synaptic plasticity considered above, for the Hebb and Hopfield protocols. For simplicity, we consider simulation results only for SU synapses, in order to validate our analytical results. Simulations are run according to protocols discussed extensively elsewhere (see, for example, Elliott and Lagogiannis 2012; Elliott 2014), but modified to allow for sparse coding. We first consider single-perceptron memory lifetimes and then population memory lifetimes.

4.1 Single-perceptron memory lifetimes

In Fig. 4, we show results for memory lifetimes for SU synapses with no spontaneous activity, \(\zeta = 0\), comparing the Hopfield and Hebb protocols. We consider both FPT and SNR lifetimes, and for FPT lifetimes, we show results for both the FPE and MIE approaches. Simulation results are also shown, although only for \(f \ge 10^{-3}\): for smaller values it becomes increasingly difficult to obtain enough statistics for decent averaging due to the longer simulation run times. We select an update probability of \(p=1/10\), which is our standard choice of p in earlier work (see, for example, Elliott 2014). From Eq. (24), \(r \tau _{\mathrm {mfpt}}\) and \(r \sigma _{\mathrm {fpt}}\) are expected to scale as \(1/f^2\) for small f for the FPE approach, so we remove this scaling by multiplying by \(f^2\), which in this figure affords greater clarity and resolution.

Fig. 4

Convergence of Hebb and Hopfield protocol results for stochastic updater synapses in the limit of sparse coding. Scaled single-perceptron memory lifetimes are shown as a function of sparseness, f. Results in red (respectively, blue) correspond to the Hopfield (respectively, Hebb) protocol. Shaded regions indicate \(f^2 r ( \tau _{\mathrm {mfpt}} \pm \sigma _{\mathrm {fpt}} )\) (with the central solid line showing \(f^2 r \tau _{\mathrm {mfpt}}\)) computed using the FPE approach to FPTs, so that we show the (scaled) MFPT \(\tau _{\mathrm {mfpt}}\) surrounded by the one standard deviation region around it, governed by \(\sigma _{\mathrm {fpt}}\). Short-dashed lines show \(f^2 r \tau _{\mathrm {mfpt}}\) obtained using the exact, MIE approach to FPTs. Circular data points correspond to results from simulation, for \(f \ge 10^{-3}\). Long-dashed lines show results for \(f^2 r \tau _{\mathrm {snr}}\); \(r \tau _{\mathrm {snr}} = 0\) for the Hebb protocol over the whole range of f in panel A. The value of N is indicated in each panel. In all panels, \(p = 1/10\), \(\zeta = 0\) and \(\vartheta = 0\)

Above we showed that the Hopfield and Hebb protocols must coincide for \(f \lessapprox 1/\sqrt{N}\). For the various choices of N used in Fig. 4, we see this convergence of the two protocols’ results, which become indistinguishable for f below \(1/\sqrt{N}\), for all forms of memory lifetime. Focusing first on \(r \tau _{\mathrm {mfpt}}\) from the FPE approach, for smaller N we clearly see that \(f^2 r \tau _{\mathrm {mfpt}}\) asymptotes to a common, N-independent constant as f becomes small; we would see the same behaviour for larger N too, but would need to take smaller values of f than those used in this figure. We also see that \(f^2 r \tau _{\mathrm {mfpt}}\) from the MIE approach tracks that from the FPE approach quite closely, and indeed for intermediate values of f and smaller choices of N, it plateaus, so that \(r \tau _{\mathrm {mfpt}}\) scales as \(1/f^2\) in this regime. However, for \(N=10^3\) and \(f \lessapprox 10^{-3}\), we clearly see the MIE \(f^2 r \tau _{\mathrm {mfpt}}\) turn downwards and approach zero as f decreases. This behaviour is consistent with the derived form of the exact scaling behaviour in Eq. (23), in which \(r \tau _{\mathrm {mfpt}} \propto 1/f\) for small f. We also just see this change for \(N=10^4\) for f close to \(10^{-4}\), but for larger N we would need to take f smaller to see the 1/f scaling of the exact form of \(r \tau _{\mathrm {mfpt}}\). Our simulation results agree with those from the MIE approach, validating both. Although the simulations do not reach values of f small enough to exhibit the full switch to 1/f scaling for \(N=10^3\) in Fig. 4A, they nevertheless clearly show the start of the down-turn at \(f=10^{-3}\).

For \(f > 1/\sqrt{N}\) in Fig. 4, we see very significant differences between the Hebb and Hopfield protocols. While for the Hopfield protocol \(f^2 r \tau _{\mathrm {mfpt}}\) grows like \(\log _e N\) for fN large enough, this is not the case for the Hebb protocol. For f in the region of unity, \(f^2 r \tau _{\mathrm {mfpt}}\) is, roughly speaking, independent of N. This means that the dynamics are dominated by the correlations between pairs of synapses’ strengths in the Hebb protocol. For \(f=1\), we obtain \(r \tau _{\mathrm {mfpt}} \approx 5.34\) and 5.35 for \(N=10^3\) and \(N=10^6\), respectively, from the FPE approach. (The corresponding values from the MIE approach are 6.64 and 6.79, respectively.) In the regime of f not too far from unity, memory lifetimes in the Hebb protocol are therefore significantly reduced by the synaptic correlations induced by this protocol, and the influence of these correlations cannot be removed by increasing N.

We see that \(r \tau _{\mathrm {mfpt}}\) is robustly positive in Fig. 4 for all choices of N over the whole range of displayed f, and it remains so for small f because of the scaling behaviour discussed above. However, looking at the one standard deviation region around \(r \tau _{\mathrm {mfpt}}\), it is clear that in some regimes of f, there can be high variability in FPT memory lifetimes. For the Hopfield protocol, this regime of high variability occurs for small f (where what counts as “small” f depends on N), while in the Hebb protocol, there is an additional regime for f close to unity. High variability does not mean that memories cannot be stored: \(r \tau _{\mathrm {mfpt}}\) is always robustly positive. Rather, high variability simply means that some memories are stored strongly while others are stored weakly or not at all.

Turning to a consideration of \(r \tau _{\mathrm {snr}}\), we see from Fig. 4 that \(r \tau _{\mathrm {snr}}\) exists (i.e. \(r \tau _{\mathrm {snr}} > 0\)) in precisely those regions of low variability in FPT lifetimes. Indeed, the results for \(r \tau _{\mathrm {snr}}\) track quite closely those for \(r (\tau _{\mathrm {mfpt}} - \sigma _{\mathrm {fpt}})\) over some range of f, and deviate from them elsewhere. We have shown in a non-sparse coding context that FPT and SNR lifetimes for simple synapses essentially coincide (up to additive constants) in the regime in which the distribution of \(h_0\) is tightly concentrated around its supra-threshold mean (Elliott 2017a). For the specific case of \(\vartheta = 0\), as here, we showed that if we can write the initial variance \(\sigma (0)^2\) in the form \(\sigma (0)^2 \approx B_0(N)/2\), then the parameter \(\mu ' \equiv \mu (0) \sqrt{2/B_0(N)}\) must be large enough, which means \(\mu ' \gtrapprox 2\) (Elliott 2017a). We then have that \(\mu ' \approx \mu (0) / \sigma (0) \gtrapprox 2\), which is just a condition on the initial SNR. Using the pre-averaged form \(\langle B_0(N_{\mathrm {eff}}) \rangle _{N_{\mathrm {eff}}}\) (see Appendix A), this condition reduces to \(4/(p^2 N) \lessapprox f\) in the Hopfield protocol for \(\zeta = 0\). For the Hebb protocol, the limit of large N with p not too close to unity additionally satisfies the requirement on \(\sigma (0)^2\), giving the upper bound \(f \lessapprox p / 2\) for \(\zeta = 0\). In the Hebb protocol, we therefore have the interval \(4/(p^2 N) \lessapprox f \lessapprox p / 2\) for equivalence of SNR and FPT memory lifetimes, for \(\zeta = 0\). (We must have \(N \gtrapprox 8/p^3\) for this interval to exist.) With \(p = 1/10\) in Fig. 4, these conditions are \(400/N \lessapprox f\) and \(400/N \lessapprox f \lessapprox 0.05\) for the Hopfield and Hebb protocols, respectively. For \(400/N \lessapprox f\) in both protocols (except for the Hebb protocol for \(N=10^3\), where the bounding range of f is invalid), we do indeed see that the FPE results for \(f^2 r \tau _{\mathrm {mfpt}}\) and those for \(f^2 r \tau _{\mathrm {snr}}\) run essentially parallel to each other, but that for \(f < 400/N\), \(f^2 r \tau _{\mathrm {snr}}\) peels away from \(f^2 r \tau _{\mathrm {mfpt}}\). The same is true for the Hebb protocol for \(N > 10^3\): as f increases above 0.05, \(f^2 r \tau _{\mathrm {snr}}\) also peels away from \(f^2 r \tau _{\mathrm {mfpt}}\). Thus, these two estimates for the two protocols appear to capture well the region of f for which \(r \tau _{\mathrm {snr}}\) is a reliable indicator of memory longevity. SNR lifetimes are therefore acceptable surrogates for FPT lifetimes when the latter are subject to low variability, but outside these regions SNR lifetimes fail to capture the possibility of memory storage, albeit with high variability. Importantly, the requirement that \(f \gtrapprox 4/(p^2 N)\) in both protocols means that the SNR approach cannot be extended to small, let alone very small, f, because such values violate the asymptotic regime. Essentially, then, the SNR approach cannot probe the very sparse coding regime in either protocol.
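A quick worked check of this interval for \(p = 1/10\) (an illustrative snippet; the numbers follow directly from the bounds just derived) confirms that it is empty for \(N = 10^3\) but exists for larger N:

```python
p = 0.1
for N in (1e3, 1e4, 1e5, 1e6):
    lo, hi = 4 / (p**2 * N), p / 2      # 400/N <~ f <~ 0.05 for p = 1/10
    print(N, lo, hi, lo < hi)           # interval exists only for N >~ 8/p**3 = 8000
```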

For the Hopfield protocol, Eq. (20) is just

$$\begin{aligned} r \tau _{\mathrm {snr}} \approx \frac{1}{2 f^2 p} \log _e \frac{N f^2 p^2}{f + (1-f) \zeta ^2}. \end{aligned}$$
(33)

With \(\zeta = 0\), we require \(f>1/(p^2 N)\) for \(r \tau _{\mathrm {snr}} > 0\), and we see precisely these threshold values for the different choices of N in Fig. 4. Alternatively, we require \(N>1/(fp^2)\) for memories to be stored according to the SNR criterion. However, these conditions do not carry over to FPT memory lifetimes: we need neither a minimum N nor a minimum f for \(r \tau _{\mathrm {mfpt}} > 0\), because it is always positive. This failure of SNR conditions to carry over to the FPT case also applies to any optimality conditions derived from \(r \tau _{\mathrm {snr}}\). From Eq. (33) with \(\zeta = 0\), we may find that value of f, \(f^{\mathrm {opt}}\), that maximises \(r \tau _{\mathrm {snr}}\), giving rise to \(r \tau _{\mathrm {snr}}^{\mathrm {opt}}\), with the result that \(f^{\mathrm {opt}} = \sqrt{e}/(p^2 N)\). The same value essentially applies to the Hebb protocol, albeit with complicated corrections. However, for the validity of the SNR results, both protocols require \(f \gtrapprox 4/(p^2 N)\). If the SNR optimality condition is valid, then it must satisfy \(f^{\mathrm {opt}} = \sqrt{e}/(p^2 N) \gtrapprox 4/(p^2 N)\), or \(\sqrt{e} \gtrapprox 4\). This is clearly false, and hence the SNR optimality condition for f is spurious, because at \(f = f^{\mathrm {opt}}\), the asymptotic validity condition is violated. In fact, we may essentially take f as small as we like and \(r \tau _{\mathrm {mfpt}}\) will continue to grow, albeit with increasing variability in the FPT lifetimes. Thus, although we will shortly consider optimality conditions for SNR memory lifetimes with complex synapses, these conditions must be viewed with extreme caution.
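The stated optimum is easy to verify numerically from Eq. (33); the following sketch (illustrative only) maximises \(r \tau _{\mathrm {snr}}\) over f at \(\zeta = 0\) and compares with \(f^{\mathrm {opt}} = \sqrt{e}/(p^2 N)\):

```python
import numpy as np
from scipy.optimize import minimize_scalar

N, p = 1e5, 0.1
r_tau = lambda f: np.log(N * f * p**2) / (2 * f**2 * p)   # Eq. (33) at zeta = 0
res = minimize_scalar(lambda f: -r_tau(f),
                      bounds=(1/(p**2*N), 1.0), method='bounded')
print(res.x, np.sqrt(np.e)/(p**2*N))   # numerical and analytical f_opt agree
```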

Figure 4 considers only the case of exactly zero spontaneous activity, \(\zeta = 0\). In Fig. 5, we examine the impact of spontaneous activity on SU memory lifetimes. We show only the case of \(N = 10^5\) to avoid unnecessary clutter, but the results are qualitatively similar for other choices of N. In the Hopfield protocol, \(\zeta \) appears only through a quadratic term in \(B(h')\) or \(\sigma (t)^2\), while in the Hebb protocol, \(\zeta \) also appears through a linear term. This difference makes the Hebb protocol much more sensitive to spontaneous activity than the Hopfield protocol, and we see this explicitly in Fig. 5. In the Hopfield protocol, the asymptotic variance takes the form \(\sigma (\infty )^2 = [ f + (1-f) \zeta ^2 ]/N\), so \(\zeta \) exerts a significant influence on memory lifetimes only for \(f \lessapprox \zeta ^2\). We therefore only start to see a divergence of memory lifetimes from those for \(\zeta = 0\) at around \(f \approx \zeta ^2\), and this is confirmed in the figure. However, as f is taken small, the dependence of \(r \tau _{\mathrm {mfpt}}\) (from the FPE) on \(\zeta \) is lost (just as its dependence on N is lost), so that for very small f, \(\zeta \) does not affect (FPE) FPT lifetimes, neither their means nor their variances. This is because for small f, the scaling results in Eq. (24) depend only on the A and not the B jump moment, so they depend only on drift and not diffusion. However, \(\zeta \) appears only through the diffusion term. In contrast to the Hopfield protocol, even a choice of \(\zeta = 0.01\) induces a large reduction in memory lifetimes in the Hebb protocol, at least away from the small f regime. For small f, the Hebb and Hopfield protocols coincide, so we observe the same loss of dependence on \(\zeta \) in (FPE) FPT lifetimes in the Hebb protocol. However, away from the small f regime, the linear term in \(\zeta \) in B or \(\sigma (t)^2\) significantly impacts memory lifetimes.

Fig. 5

Impact of spontaneous activity on stochastic updater single-perceptron memory lifetimes. Results are shown for \(f^2 r \tau _{\mathrm {mfpt}}\) (from the FPE approach) and \(f^2 r \tau _{\mathrm {snr}}\) for both the Hopfield and Hebb protocols, as indicated in the different panels. Different line styles correspond to different levels of spontaneous activity, \(\zeta \), as indicated in the common legend in panel D. Some line styles are absent in panel D because there is no corresponding \(r \tau _{\mathrm {snr}} > 0\). In all panels, \(N = 10^5\), \(p = 1/10\) and \(\vartheta = 0\)

Examining Eq. (33) for the Hopfield protocol, for \(\zeta = 0\), we have just \(f^1\) in the logarithm, while for \(\zeta = 1\), we have \(f^2\). Roughly speaking, for intermediate values of \(\zeta \), the effective power of f switches rapidly from one to two in the vicinity of \(f = \zeta ^2\). This switching can be seen clearly in Fig. 5, where as f decreases, \(f^2 r \tau _{\mathrm {snr}}\) (and also \(f^2 r \tau _{\mathrm {mfpt}}\)) tracks closely the form for \(\zeta = 0\), until it rapidly peels away, following a different power. Although it is still clearly the case that optimality conditions obtained from \(r \tau _{\mathrm {snr}}\) are invalid, it is nevertheless worth examining \(f^{\mathrm {opt}}\). For \(\zeta = 0\), we again obtain \(f^{\mathrm {opt}} = \sqrt{e}/(p^2 N)\), but for \(\zeta = 1\), we instead obtain \(f^{\mathrm {opt}} = \sqrt{e}/(p^2 N)^{1/2}\), so that the N-dependence changes. The corresponding optimal lifetimes are \(r \tau _{\mathrm {snr}}^{\mathrm {opt}} = p^3 N^2 / (4 e)\) for \(\zeta = 0\) and \(r \tau _{\mathrm {snr}}^{\mathrm {opt}} = p N / (2 e)\) for \(\zeta = 1\). Of course, we see explicitly in Fig. 5 that these SNR-derived optimal values of f and thus maximum possible SNR lifetimes are invalid, but SNR lifetimes do at least indicate when FPT lifetimes are subject to lower variability and when they are subject to higher variability.

Considering \(\zeta = 1\) is of course biologically meaningless, as then there is no distinction between spontaneous and evoked electrical activity levels. However, taking either \(\zeta = 0\) or \(\zeta = 1\) allows explicit optimality results to be obtained for these two cases, while such results are not available for intermediate values of \(\zeta \). As just indicated, empirically we observe a very rapid switching in dynamics in the vicinity of \(\zeta = \sqrt{f}\), with the explicit results for \(\zeta =0\) and \(\zeta =1\) therefore indicating the general behaviour prior to and after, respectively, this switching. When we give results for \(\zeta = 1\), we therefore do so with this understanding: that the limit is biologically meaningless, but that it nevertheless indicates the general behaviour for \(\zeta \) in excess of around \(\sqrt{f}\).

We now turn to complex models of synaptic plasticity, considering only SNR lifetimes. In Figs. 6 and 7, we plot SNR lifetimes against sparseness, f, for the three complex models discussed above, for both zero and nonzero spontaneous firing rates, and for the Hopfield (Fig. 6) and Hebb (Fig. 7) protocols. All results are obtained by numerical matrix methods to solve the SNR equation \(\mu (\tau _{\mathrm {snr}}) = \sigma (\tau _{\mathrm {snr}})\), where the standard deviation \(\sigma (t)\) is computed fully rather than via just its asymptotic form \(\sigma (\infty )\).

Fig. 6

Spontaneous activity reduces single-perceptron memory lifetimes and limits sparseness in complex synapse models: Hopfield protocol. Single-perceptron SNR memory lifetimes are shown for different complex models of synaptic plasticity under the Hopfield protocol, as a function of sparseness, f. Each panel shows results for the indicated model and choice of spontaneous activity, either \(\zeta = 0\) or \(\zeta = 0.1\). Results are shown for \(\Theta \) or s ranging from 2 to 12 in increments of 2, with the particular choice identified by line colour as given in the common legend in panel B. In all cases, \(N = 10^5\)

Fig. 7

Spontaneous activity reduces single-perceptron memory lifetimes and limits sparseness in complex synapse models: Hebb protocol. The format of this figure is identical to Fig. 6, except that it shows results for the Hebb protocol, and in the right-hand panels we use the smaller value \(\zeta = 0.01\). Some coloured lines are absent in some panels because there is no corresponding \(r \tau _{\mathrm {snr}} > 0\). In all cases, \(N = 10^5\)

For the Hopfield protocol in Fig. 6, in all cases and for all choices of parameters, we see an onset of SNR lifetimes at a minimum, threshold value of f, the rapid attainment of a peak or optimal value of \(r \tau _{\mathrm {snr}}\), and then a steady fall in lifetimes as f increases further. For all complex models, this onset of SNR lifetimes occurs at increasingly large values of f as \(\Theta \) or s increases. At least for the parameter ranges in this figure, in the filter and serial models, for a given choice of f, increasing \(\Theta \) or s increases \(r \tau _{\mathrm {snr}}\), although as the number of internal states continues to increase, ultimately \(r \tau _{\mathrm {snr}}\) will start to fall. In the case of the cascade model, however, the dependence of \(r \tau _{\mathrm {snr}}\) on s for fixed f is not as simple as for the other complex models. We note that for all models in this figure, the optimal values of \(r \tau _{\mathrm {snr}}\) decrease with increasing \(\Theta \) or s, at least for \(\zeta = 0\). However, when we increase the spontaneous activity to \(\zeta = 0.1\), the optimal values lose most of their dependence on \(\Theta \) or s in the filter and serial models, although not in the cascade model. This loss of dependence on \(\Theta \) or s is strongly N- and \(\zeta \)-dependent. For \(N=10^3\), we must take \(\zeta \) close to unity before this loss of dependence is noticeable, while for \(N=10^6\), even \(\zeta =10^{-3/2} \approx 0.0316\) is sufficient.

For the Hebb protocol, in Fig. 7, for smaller f we obtain essentially the same results as for the Hopfield protocol because these two protocols must coincide for \(f \lessapprox 1/\sqrt{N}\), regardless of the model of synaptic plasticity. However, for larger f, the synaptic correlation terms induced by the Hebb protocol again significantly impact SNR memory lifetimes, with the impact being greater for larger \(\Theta \) or s in the filter and serial models. Thus, as with SU synapses under the Hebb protocol, SNR lifetimes exist only in some interval of f (below \(f=1\)), with this interval shrinking and disappearing as the number of internal states increases (or as p decreases for SU synapses). These dynamics dramatically limit the number of internal states that give rise to positive SNR lifetimes. Nevertheless, as \(\Theta \) or s increases, \(r \tau _{\mathrm {snr}}\) in general increases, at least until the permissible range of f becomes very small and then disappears entirely. For the cascade model, however, the upper limit on f is roughly speaking independent of s, but we also see that in general, as s increases, \(r \tau _{\mathrm {snr}}\) decreases for fixed f. This relative insensitivity of the upper limit of the permissible range of f to s in the cascade model occurs because the cascade model has different metastates with different update probabilities, with some synapses residing in the lower metastates and so having larger update probabilities than those residing in higher metastates.

In the presence of spontaneous activity, we see a dramatic change in the memory lifetimes. Indeed, such is the sensitivity of the Hebb protocol to \(\zeta \), especially for complex synapse models, that in contrast to Fig. 6 for the Hopfield protocol, for which we took \(\zeta = 0.1\), in Fig. 7 we take \(\zeta = 0.01\). Even with just 1% spontaneous activity, the range of numbers of internal states in the filter and serial models that gives rise to positive SNR lifetimes becomes severely restricted. The cascade model under the Hebb protocol is not quite so sensitive, again because of its different metastates, but a 10% level of spontaneous activity would still dramatically restrict the permissible ranges of f and s, compared to the Hopfield protocol.

We quantify these observations by explicitly considering the optimal choices of the parameters f and either \(\Theta \) or s, so \(f^{\mathrm {opt}}\) and either \(\Theta ^{\mathrm {opt}}\) or \(s^{\mathrm {opt}}\), that maximise \(r \tau _{\mathrm {snr}}\), giving rise to \(r \tau _{\mathrm {snr}}^{\mathrm {opt}}\). In Figs. 8 and 9, we plot \(f^{\mathrm {opt}}\) and \(r \tau _{\mathrm {snr}}^{\mathrm {opt}}\) against \(\Theta \) or s, for different levels of spontaneous activity, \(\zeta \), for the particular choice of \(N = 10^5\). Results are obtained both by numerical matrix methods and by using the approximations for \(\mu (t)\) and \(r \tau _{\mathrm {snr}}\) given in Sect. 3.2. For the latter, we maximise \(r \tau _{\mathrm {snr}}\) as a function of f for fixed \(\Theta \) or s.

Fig. 8

Optimal sparseness in complex synapse models for single perceptrons in the Hopfield protocol. The left-hand panels (A, C, E) show the optimal single-perceptron memory lifetimes \(r \tau _{\mathrm {snr}}^{\mathrm {opt}}\) obtained at the corresponding optimal levels of sparseness \(f^{\mathrm {opt}}\) shown in the right-hand panels (B, D, F), for the indicated complex models. Lines show numerical matrix results while the corresponding data points show approximate analytical results obtained as discussed in the main text. Results are shown for different values of \(\zeta \), with identifying line styles corresponding to those in Fig. 5. We have set \(N = 10^5\) in all panels

Fig. 9

Optimal sparseness in complex synapse models for single perceptrons in the Hebb protocol. The format of this figure is essentially identical to Fig. 8, except that it shows results for the Hebb protocol. Approximate analytical results are not available for the Hebb protocol and so are not present. The termination of a line at a threshold value of \(\Theta \) or s indicates that above that value, no choice of f generates \(r \tau _{\mathrm {snr}} > 0\). We have set \(N = 10^5\) in all panels

For the Hopfield protocol in Fig. 8, we see that for \(\zeta = 0\), \(r \tau _{\mathrm {snr}}^{\mathrm {opt}}\) falls as a function of \(\Theta \) or s. However, in the filter and serial models, as \(\zeta \) increases, the fall in \(r \tau _{\mathrm {snr}}^{\mathrm {opt}}\) with \(\Theta \) or s reduces and disappears; indeed, the exact results in fact show a very slight increase in \(r \tau _{\mathrm {snr}}^{\mathrm {opt}}\) with \(\Theta \) or s for \(\zeta = 1\), although this behaviour is not noticeable in Fig. 8. For the displayed choice of \(N = 10^5\), we need only take \(\zeta \approx 0.1\) for the filter and serial models’ \(r \tau _{\mathrm {snr}}^{\mathrm {opt}}\) to be relatively insensitive to \(\Theta \) or s. This is N-dependent: for \(N=10^6\), even \(\zeta = 0.01\) is sufficient; for \(N = 10^3\), \(\zeta \) needs to be quite close to unity. In contrast, for the cascade model, \(r \tau _{\mathrm {snr}}^{\mathrm {opt}}\) always falls with s for any choice of \(\zeta \), including \(\zeta = 1\). The behaviour of the filter and serial models’ \(r \tau _{\mathrm {snr}}^{\mathrm {opt}}\) is easy to extract from the approximate results in Sect. 3.2. Ignoring for simplicity the correction terms in Eq. (27), both filter and serial models’ \(r \tau _{\mathrm {snr}}\) can be written in the form

$$\begin{aligned} r \tau _{\mathrm {snr}} \approx \frac{a \, q^2}{f^2} \log _e \frac{N b f^2}{q^2 [ f + (1-f) \zeta ^2]} \end{aligned}$$
(34)

(cf. Eq. (33)), where a and b are numerical constants and q denotes \(\Theta \) or s. For \(\zeta = 0\) we obtain \(f^{\mathrm {opt}} = q^2 \sqrt{e} / (b N)\) and \(r \tau _{\mathrm {snr}}^{\mathrm {opt}} \approx a b^2 N^2 / (2 e q^2)\), while for \(\zeta = 1\) we obtain \(f^{\mathrm {opt}} = q \sqrt{e} / \sqrt{b N}\) and \(r \tau _{\mathrm {snr}}^{\mathrm {opt}} \approx a b N / e\). Therefore, \(f^{\mathrm {opt}}\) scales differently with q and with N in these two cases, and for \(\zeta = 0\), \(r \tau _{\mathrm {snr}}^{\mathrm {opt}}\) falls as q increases, but for \(\zeta = 1\), \(r \tau _{\mathrm {snr}}^{\mathrm {opt}}\) is completely independent of q. Intermediate choices of \(\zeta \) result in intermediate behaviours between these two extremes, and the correction terms in Eq. (27) merely modify, rather than fundamentally alter, this behaviour. We see in Fig. 8 that the numerical and approximate analytical results agree well for the filter and serial models, and that, moreover, both these models’ optimal values are very similar. Unfortunately, in the case of the cascade model, no such simple analysis, even using the fitted form for \(\mu _{\mathrm {cas}}(t)\) in Eq. (32), is available to explain the fact that \(r \tau _{\mathrm {snr}}^{\mathrm {opt}}\) falls with s for all values of \(\zeta \), including \(\zeta = 1\). The numerical and fitted results for \(r \tau _{\mathrm {snr}}^{\mathrm {opt}}\) agree well in the cascade model, although there are quite large discrepancies between the values of \(f^{\mathrm {opt}}\) obtained from the (exact) numerical methods and those obtained from the fitted expression, particularly for larger values of s and for \(\zeta \) closer to zero than to unity. Fitting our numerical matrix results for \(r \tau _{\mathrm {snr}}^{\mathrm {opt}}\) in the cascade model to power laws in s and N for large enough s, we find that for \(\zeta = 0\), \(f^{\mathrm {opt}} \sim s^2/N\) and \(r \tau _{\mathrm {snr}}^{\mathrm {opt}} \sim N^2/s^4\), while for \(\zeta = 1\), \(f^{\mathrm {opt}} \sim s/\sqrt{N}\) and \(r \tau _{\mathrm {snr}}^{\mathrm {opt}} \sim N/s^2\). While the scaling behaviour of \(f^{\mathrm {opt}}\) is the same as that in the filter and serial models, the dependence of \(r \tau _{\mathrm {snr}}^{\mathrm {opt}}\) on \(q \equiv s\) differs in the cascade model compared to the filter and serial models.
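These scalings can be checked by brute force; the sketch below (with placeholder constants \(a = b = 1\), chosen purely for illustration) maximises Eq. (34) over f on a grid and compares with the stated \(f^{\mathrm {opt}}\) for \(\zeta = 0\) and \(\zeta = 1\):

```python
import numpy as np

def f_opt_eq34(q, N, zeta, a=1.0, b=1.0):
    """Grid maximisation of Eq. (34) over f; tau is set to zero where the
    logarithm's argument drops below unity (no positive lifetime there)."""
    f = np.logspace(-6, 0, 200000)
    arg = N * b * f**2 / (q**2 * (f + (1 - f) * zeta**2))
    tau = np.where(arg > 1, (a * q**2 / f**2) * np.log(arg), 0.0)
    return f[np.argmax(tau)]

q, N = 6, 1e5
print(f_opt_eq34(q, N, 0.0), q**2 * np.sqrt(np.e) / N)   # zeta = 0 prediction
print(f_opt_eq34(q, N, 1.0), q * np.sqrt(np.e / N))      # zeta = 1 prediction
```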

For the Hebb protocol in Fig. 9, again the pairwise correlation structure present in \(\sigma (t)^2\), and the Hebb protocol’s extreme sensitivity to even very small levels of spontaneous activity \(\zeta \), have a significant impact on optimality conditions. In the filter and serial models, the permissible range of \(\Theta \) or s is considerably reduced, so that even for \(\zeta = 0.01\), s cannot exceed 6 in the serial synapse model and \(\Theta \) cannot exceed 5 in the filter model. As N is reduced from the displayed value of \(N = 10^5\), the permissible ranges of \(\Theta \) and s reduce further. The cascade model in the Hebb protocol is also extremely sensitive to noise, but as discussed, the different metastates’ different update probabilities somewhat ameliorate this sensitivity. Nevertheless, increasing \(\zeta \) from \(\zeta = 0\) to just \(\zeta = 0.1\) reduces \(r \tau _{\mathrm {snr}}^{\mathrm {opt}}\) by several orders of magnitude.

In Figs. 10 and 11 we instead examine \(\Theta ^{\mathrm {opt}}\) or \(s^{\mathrm {opt}}\) as a function of f, rather than vice versa, so that we maximise \(r \tau _{\mathrm {snr}}\) with respect to \(\Theta \) or s while holding f fixed. For the Hopfield protocol in Fig. 10, subject to a minimum, threshold requirement, \(\Theta ^{\mathrm {opt}}\) or \(s^{\mathrm {opt}}\) increases as a function of f, for any level of spontaneous activity \(\zeta \), in all three complex models considered here. However, as \(\zeta \) moves from \(\zeta = 0\) to \(\zeta = 1\), the functional dependence of \(\Theta ^{\mathrm {opt}}\) or \(s^{\mathrm {opt}}\) on f changes. We can derive this explicitly by again using the simple expression for \(r \tau _{\mathrm {snr}}\) in Eq. (34) for the filter and serial models. The optimal value of q (either \(\Theta \) or s) is

$$\begin{aligned} q^{\mathrm {opt}} = f \frac{\sqrt{b N}}{\sqrt{e [f + (1-f) \zeta ^2]}}. \end{aligned}$$
(35)

Thus, as f increases, \(q^{\mathrm {opt}}\) essentially switches from linear growth in f to slower, \(\sqrt{f}\) growth, at around \(f \approx \zeta ^2\). This behaviour is clearer for the smaller nonzero choices of \(\zeta \) used in Fig. 10. The corrections due to the additional terms in Eq. (27) do not fundamentally change this behaviour for the filter model. The corresponding optimal SNR memory lifetime is

$$\begin{aligned} r \tau _{\mathrm {snr}}^{\mathrm {opt}} \approx \frac{a b N}{e [f + (1-f) \zeta ^2]} \qquad \text{(at } q^{\mathrm {opt}}\text{)}. \end{aligned}$$
(36)

For \(\zeta = 0\), \(r \tau _{\mathrm {snr}}^{\mathrm {opt}}\) decreases as f increases, but for \(\zeta = 1\), \(r \tau _{\mathrm {snr}}^{\mathrm {opt}}\) is independent of f. As f increases, the transition from \(r \tau _{\mathrm {snr}}^{\mathrm {opt}}\) being independent of f to falling as 1/f is again sharp, occurring around \(f \approx \zeta ^2\). This transition is clear for the filter and serial models in Fig. 10. In the case of the cascade model, however, although \(s^{\mathrm {opt}}\) increases with f, albeit according to clearly different power laws than for the filter and serial models, the corresponding value of \(r \tau _{\mathrm {snr}}^{\mathrm {opt}}\) always decreases as a function of f, regardless of \(\zeta \).
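The switch in the growth of \(q^{\mathrm {opt}}\) at around \(f \approx \zeta ^2\) is immediate from Eq. (35); a short numerical illustration (again with the placeholder constant \(b = 1\)):

```python
import numpy as np

b, N, zeta = 1.0, 1e5, 0.1
q_opt = lambda f: f * np.sqrt(b * N / (np.e * (f + (1 - f) * zeta**2)))  # Eq. (35)
for f in (1e-4, 1e-3, 1e-2, 1e-1):
    print(f, q_opt(f))   # near-linear growth below f ~ zeta**2 = 0.01, ~sqrt(f) above
```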

Fig. 10

Optimal synaptic complexity in complex synapse models for single perceptrons in the Hopfield protocol. The format of this figure is very similar to Fig. 8, except that we have optimised with respect to \(\Theta \) or s rather than f. In panels A and C the lines switch from numerical matrix to approximate analytical results when the corresponding values of \(\Theta ^{\mathrm {opt}}\) or \(s^{\mathrm {opt}}\) exceed 20 in the right-hand panels; before this transition, the lines correspond to numerical matrix results and the discrete points to approximate analytical results. We have set \(N = 10^5\) in all panels

Fig. 11

Optimal synaptic complexity in complex synapse models for single perceptrons in the Hebb protocol. The format of this figure is essentially identical to Fig. 10, except that approximate analytical results are not available for the Hebb protocol. We have set \(N = 10^5\) in all panels

For the Hebb protocol in Fig. 11, again the small f behaviour must be identical to that for the Hopfield protocol in Fig. 10. However, the increase in \(\Theta ^{\mathrm {opt}}\) or \(s^{\mathrm {opt}}\) with increasing f is halted and then reversed as f increases further, as the effects of the pairwise synaptic correlations induced by the Hebb protocol are felt. These correlations not only pull down \(\Theta ^{\mathrm {opt}}\) or \(s^{\mathrm {opt}}\), but they also have a deleterious effect on \(r \tau _{\mathrm {snr}}^{\mathrm {opt}}\), changing the 1/f behaviour of the filter and serial models in the Hopfield protocol to approximately \(1/f^3\) behaviour in the Hebb protocol (obtained by fitting), for \(\zeta = 0\). Furthermore, while spontaneous activity can make \(r \tau _{\mathrm {snr}}^{\mathrm {opt}}\) independent of f before the switch to 1/f behaviour in the Hopfield protocol for the filter and serial models, in the Hebb protocol, \(r \tau _{\mathrm {snr}}^{\mathrm {opt}}\) always decreases with increasing f, for all three complex models considered here.

4.2 Population memory lifetimes

We now turn to population SNR memory lifetimes. Because \(\mathrm {SNR}_p(t) \approx \sqrt{g P} \, \mathrm {SNR}(t)\), optimisation of \(\tau _{\mathrm {pop}}\) with respect to \(\Theta \) or s is not affected by the additional factor of \(\sqrt{g}\). When we instead optimise population SNR lifetimes with respect to \(f = g\), however, the additional factor of \(\sqrt{g}\) in \(\mu _p(t) / \sigma _p(t)\) compared to \(\mu (t) / \sigma (t)\) functionally changes the optima compared to those for a single perceptron, and so we focus on this case. Because of the independence approximation involved in estimating \(\tau _{\mathrm {pop}}\) in Sect. 2.4, \(\tau _{\mathrm {pop}}\) is only an upper bound on population SNR lifetimes, and this will be implicit below.

For simple, SU synapses, Eq. (20) indicates that \(\tau _{\mathrm {snr}}\) and \(\tau _{\mathrm {pop}}\) differ in the logarithmic term, with the former having argument \(N f^2 p^2/[f + (1-f) \zeta ^2]\) and the latter \(\mathscr {N} f^3 p^2/[f + (1-f) \zeta ^2]\). We therefore see immediately that single-perceptron SNR lifetimes with \(\zeta = 1\) and population SNR lifetimes with \(\zeta = 0\) have identical f-dependence. For single-perceptron lifetimes with \(\zeta = 0\) and \(\zeta = 1\) and population lifetimes with \(\zeta = 0\) and \(\zeta = 1\), the f-dependence under the logarithm is \(f^1\), \(f^2\), \(f^2\) and \(f^3\), respectively. The effective power of f switches rapidly in the vicinity of \(f = \zeta ^2\), in an N- or \(\mathscr {N}\)-dependent way. Because \(\mathscr {N} = NP \gg N\), we expect very rapid switching in the population case, with only very small, even negligible levels of spontaneous activity being required to induce the change in effective power. Above we found for single-perceptron optimal lifetimes that \(r \tau _{\mathrm {snr}}^{\mathrm {opt}} \approx p^3 N^2/(4 e)\) at \(f^{\mathrm {opt}} = \sqrt{e}/(p^2 N)\) for \(\zeta = 0\), and \(r \tau _{\mathrm {snr}}^{\mathrm {opt}} \approx p N /(2 e)\) at \(f^{\mathrm {opt}} = \sqrt{e} / (p^2 N)^{1/2}\) for \(\zeta = 1\). For optimal population lifetimes these become \(r \tau _{\mathrm {pop}}^{\mathrm {opt}} \approx p \, \mathscr {N} /(2 e)\) at \(f^{\mathrm {opt}} = \sqrt{e} / (p^2 \mathscr {N})^{1/2}\) for \(\zeta = 0\), and \(r \tau _{\mathrm {pop}}^{\mathrm {opt}} \approx 3 (p \, \mathscr {N}^2)^{1/3} / (4 e)\) at \(f^{\mathrm {opt}} = \sqrt{e} / (p^2 \mathscr {N})^{1/3}\) for \(\zeta = 1\). Spontaneous activity changes the N-dependence of \(r \tau _{\mathrm {snr}}^{\mathrm {opt}}\) from \(N^2\) to N, and the \(\mathscr {N}\)-dependence of \(r \tau _{\mathrm {pop}}^{\mathrm {opt}}\) from \(\mathscr {N}\) to \(\mathscr {N}^{2/3}\), the latter being a smaller overall reduction, although in all cases the dependence on p involves a positive power. Because the dominant behaviour of SNR lifetimes in the filter and serial models is governed by a similar, single logarithmic term, many of these scaling observations for simple synapses carry over unchanged to these complex synapses.
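For concreteness, these SU population optima are transcribed below, evaluated (purely for illustration) at the values of N and P used later in Fig. 12; the snippet merely evaluates the stated formulae.

```python
import numpy as np

p, N, P = 0.1, 1e4, 1e8
Nscr = N * P                                    # script-N = N P
f0 = np.sqrt(np.e / (p**2 * Nscr))              # f_opt, population, zeta = 0
tau0 = p * Nscr / (2 * np.e)                    # r tau_pop_opt, zeta = 0
f1 = np.sqrt(np.e) / (p**2 * Nscr)**(1/3)       # f_opt, population, zeta = 1
tau1 = 3 * (p * Nscr**2)**(1/3) / (4 * np.e)    # r tau_pop_opt, zeta = 1
print(f0, tau0, f1, tau1)
```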

We examine the behaviour of optimal population SNR lifetimes for complex synapses in Fig. 12. Compared to the single-perceptron optimal SNR lifetimes in Fig. 8, the population results in Fig. 12 are markedly different, particularly for the filter and serial models. For these models, with \(\zeta = 0\), \(r \tau _{\mathrm {pop}}^{\mathrm {opt}}\) is now approximately independent of \(\Theta \) or s, while with \(\zeta > 0\), \(r \tau _{\mathrm {pop}}^{\mathrm {opt}}\) grows as a function of \(\Theta \) or s. Even with \(\zeta = 0.01\), this growth is present and of almost the same profile as that for \(\zeta = 1\), while at the single-perceptron level, for smaller choices of N it is necessary to take \(\zeta \) close to unity to halt the decrease in \(r \tau _{\mathrm {snr}}^{\mathrm {opt}}\) with increasing \(\Theta \) or s. This sensitivity to small, nonzero values of \(\zeta \) at the population level is \(\mathscr {N}\)-dependent, but even with \(\mathscr {N} = 10^6\) (e.g. \(N = 10^3\) and \(P = 10^3\)), we only require \(\zeta = 0.1\) for \(r \tau _{\mathrm {pop}}^{\mathrm {opt}}\) to adopt the same profile as that for \(\zeta = 1\). For the cascade model, however, optimal population SNR lifetimes fall with s just as they do for single perceptrons. Nonzero \(\zeta \) does render \(r \tau _{\mathrm {pop}}^{\mathrm {opt}}\) nearly independent of s for small s (\(s \le 6\)), but for larger s, \(r \tau _{\mathrm {pop}}^{\mathrm {opt}}\) falls with s. We may quantify the filter and serial models’ population SNR lifetimes as before, using the slowest decaying modes. We obtain

$$\begin{aligned} r \tau _{\mathrm {pop}} = \frac{a \, q^2}{f^2} \log _e \frac{\mathscr {N} b f^3}{q^2 [ f + (1-f) \zeta ^2 ]}, \end{aligned}$$
(37)

where now we have \(f^3\) rather than \(f^2\) in the numerator of the logarithm, just as for SU synapses. The optimal values of f are now \(f^{\mathrm {opt}} = q \sqrt{e} / \sqrt{b \mathscr {N}}\) (cf. \(q^2 \sqrt{e} / (b N)\)) for \(\zeta = 0\) and \(f^{\mathrm {opt}} = q^{2/3} \sqrt{e} / (b \mathscr {N})^{1/3}\) (cf. \(q \sqrt{e} / \sqrt{b N}\)) for \(\zeta = 1\). The corresponding optimal memory lifetimes are \(r \tau _{\mathrm {pop}}^{\mathrm {opt}} = a b \mathscr {N} / e\) (cf. \(a b^2 N^2/(2 e q^2)\)) and \(r \tau _{\mathrm {pop}}^{\mathrm {opt}} = 3 a (b q \mathscr {N})^{2/3} / (2 e)\) (cf. abN/e), respectively. The corrections due to the additional terms in the filter model’s results again modify but do not fundamentally alter this behaviour. Thus, at the population level, for \(\zeta = 0\), the filter and serial models’ optimal SNR lifetimes are independent of q, while at the single-perceptron level, they fall as \(1/q^2\). However, for \(\zeta = 1\), the population lifetimes grow as \(q^{2/3}\), while for a single perceptron, they are constant. We cannot obtain similar analytical results for the cascade model, so we fit the numerical results for the cascade model in Fig. 12 to power laws in s and \(\mathscr {N}\). We find that for larger values of s, at the population level \(r \tau _{\mathrm {pop}}^{\mathrm {opt}} \sim \mathscr {N}/s^2\) with \(f^{\mathrm {opt}} \sim s/\sqrt{\mathscr {N}}\) for \(\zeta = 0\) (cf. \(N^2/s^4\) and \(s^2/N\), respectively, for a single perceptron), and \(r \tau _{\mathrm {pop}}^{\mathrm {opt}} \sim \mathscr {N}^{2/3}/s^{4/3}\) with \(f^{\mathrm {opt}} \sim s^{2/3}/\mathscr {N}^{1/3}\) for \(\zeta = 1\) (cf. \(N/s^2\) and \(s/\sqrt{N}\), respectively, for a single perceptron), with the same rapid switching behaviour for intermediate \(\zeta \) as for the filter and serial models. The population dynamics soften the fall of \(r \tau _{\mathrm {pop}}^{\mathrm {opt}}\) with s, but not enough to turn this dependence into growth with s.

Fig. 12

Optimal sparseness in complex synapse models for neuronal populations in the Hopfield protocol. The format of this figure is essentially identical to that in Fig. 8, which shows results for the single-perceptron case. Lines show numerical solutions of the equation \(\mu _p(\tau _{\mathrm {pop}}) / \sigma _p(\tau _{\mathrm {pop}}) = 1\) maximised with respect to f, so \(r \tau _{\mathrm {pop}}^{\mathrm {opt}}\) at \(f = f^{\mathrm {opt}}\), while data points show approximate analytical results. We have set \(N=10^4\) and \(P=10^8\), or \(\mathscr {N} = 10^{12}\), in all panels

In Table 2 we summarise the scaling behaviour of \(r \tau ^{\mathrm {opt}}\) and \(f^{\mathrm {opt}}\) as functions of either p or q and of either N or \(\mathscr {N}\), for simple and complex synapses, for both single-perceptron and population results, for \(\zeta = 0\) and \(\zeta = 1\). In each column, regardless of the model, \(f^{\mathrm {opt}}\) scales identically as a function of q or \(p^{-1}\) and of N or \(\mathscr {N}\). This is not surprising for the SU, filter and serial models, whose results for \(f^{\mathrm {opt}}\) derive from the same dominating logarithmic behaviour; the cascade results, however, are obtained by fitting numerical matrix data to power laws, so their agreement with this common scaling is notable. For \(\tau ^{\mathrm {opt}}\), we also obtain the same scaling behaviour as a function of N or \(\mathscr {N}\) within each column, again regardless of the model. However, the scaling of \(\tau ^{\mathrm {opt}}\) with q (or \(p^{-1}\)) within each column does depend on the particular model of plasticity. Across a row, moving from single-perceptron \(\zeta =0\) and \(\zeta =1\) results to population \(\zeta =0\) and \(\zeta =1\) results, the dependence of \(\tau ^{\mathrm {opt}}\) on q (or \(p^{-1}\)) changes in such a way that increasing q (or decreasing p) has an increasingly less deleterious effect on memory lifetimes for SU and cascade synapses. For SU synapses, the power of p reduces from 3 to 1 to \(\frac{1}{3}\), while for cascade synapses the power of q (or s) changes from \(-4\) to \(-2\) to \(-\frac{4}{3}\). For both SU and cascade synapses, optimal memory lifetimes therefore always decrease as p decreases or s (the number of metastates) increases, regardless of the level of spontaneous activity, and regardless of whether we work at the single-perceptron or population level. For filter and serial synapses, however, the power of q changes from \(-2\) to 0 (i.e. no dependence) to \(+\frac{2}{3}\). Increasing the number of metastates or filter states available to a serial or filter synapse therefore increases optimal population SNR lifetimes, but only in the presence of spontaneous activity. As Fig. 12 indicates, only very low levels of spontaneous activity are needed to induce this growth of optimal population SNR lifetimes with the number of internal states available to filter or serial synapses.

Table 2 Overall dependence of optimal single-perceptron and population SNR memory lifetimes and the corresponding optimal sparseness on model parameters. Here, q represents \(\Theta \) or s, depending on the complex model, and is assumed large

5 Discussion

Memory is a complex, multi-level, system-wide phenomenon involving processes occurring over many time scales and across different brain regions, with integrated and orchestrated control processes coordinating, for example, the transition from short- to long-term memory (Eichenbaum and Cohen 2001). Palimpsest models of memory, in which older memories are forgotten as newer ones are stored (Nadal et al. 1986; Parisi 1986), focus on the dynamics of memory storage and retrieval within a single memory system, such as the hippocampal CA3 recurrent network (Andersen et al. 2007). Sparse population coding (see, for example, Csicsvari et al. 2000; Olshausen and Field 2004) enhances memory lifetimes in these memory models by reducing the overall rate of synaptic plasticity at single synapses, so effectively dilating time, and by decorrelating synaptic updates induced by overlapping memories (Tsodyks and Feigel’man 1988). Complex synapse models, involving metaplastic changes in synapses’ internal states without associated changes in synaptic strength, have also been proposed as a way in which to enhance memory lifetimes in palimpsest models (Fusi et al. 2005), whereas we introduced models of integrate-and-express, filter-based synapses as a means of enhancing the stability of developmental patterns of synaptic connectivity in stochastic models of synaptic plasticity (Elliott 2008; Elliott and Lagogiannis 2009).

Understanding the interaction between sparseness and synaptic complexity in palimpsest memory models is therefore crucial (Leibold and Kempter 2008; Rubin and Fusi 2007). Taken at face value, our results for SNR single-perceptron memory lifetimes support two conclusions. First, for optimal single-perceptron SNR lifetimes, optimised with respect to sparseness, longer optimal lifetimes require lower synaptic complexity. Second, for optimal single-perceptron SNR lifetimes, optimised instead with respect to synaptic complexity, longer optimal lifetimes again require lower synaptic complexity as sparseness increases. These conclusions hold regardless of the level of spontaneous activity, although spontaneous activity can prevent the decrease in optimal single-perceptron memory lifetimes. They appear to argue in favour of reduced synaptic complexity in real neurons in the presence of sparse population coding, at least at the single-neuron level. However, at the population level, the first of these conclusions is overturned, at least for filter and serial synapses. Critically, even in the presence of low but nonzero levels of spontaneous activity, optimal population SNR lifetimes, optimised with respect to sparseness, increase rather than decrease with synaptic complexity, for filter and serial synapses but not for cascade synapses. At a population level, sparseness, synaptic complexity and, crucially, nonzero spontaneous activity interact to promote increased optimal population SNR memory lifetimes. It is remarkable that non-cascade complex synapse models therefore appear to require the existence of spontaneous activity in order to reap these benefits in a population setting with sparse population coding.

In reaching these conclusions, we have employed two superficially rather different memory storage protocols. First, the Hebb protocol uses two-level inputs \(\xi _i \in \{ \zeta , 1 \}\), while the Hopfield protocol uses four-level inputs \(\xi _i \in \{ -1, -\zeta , +\zeta , +1 \}\), although in stripping out spontaneous activity, the latter reduces to the standard conventions of the Hopfield model with its two-level inputs \(\xi _i \in \{ -1 , +1 \}\). However, because we have considered binary strength synapses with \(S_i \in \{ -1, +1 \}\), the effective contributions of these two sets of inputs to a perceptron’s activation are identical: in both protocols, \(\xi _i S_i\) takes the same four possible values. Second, the Hebb protocol uses the cue and target sub-populations approach to determine the direction of synaptic plasticity, while the Hopfield protocol uses the standard Hopfield rule governed by the product of evoked pre- and postsynaptic activity. However, in both protocols, synapses experience identical potentiating and depressing plasticity induction signals at the same separate rates, rfg/2. Furthermore, in both protocols, these induction signals are produced by imposing a pattern of electrical activity on the sub-population of active neurons during memory storage, rather than by allowing neurons’ activities to be generated via direct, afferent synaptic drive. Both protocols therefore implicitly assume executive control of memory storage by other brain regions (see, for example, Eichenbaum and Cohen 2001). These two differences are indeed therefore just superficial, and this is reflected in the fact that the mean activation \(\mu (t)\) evolves identically under both protocols. The real difference between the Hebb and Hopfield protocols does not reside in these matters of convention and definition. Rather, it resides in the fact that an active perceptron’s synapses with active inputs experience either only potentiating or only depressing induction signals during memory storage under the Hebb protocol, while in the Hopfield protocol some experience potentiating and others depressing induction signals. This difference gives rise to the Hebb protocol’s complicated equilibrium structure, with its nonzero pairwise and higher-order synaptic correlation functions. Remove this higher-order structure, and the two protocols would have identical statistics for perceptron activation. Indeed, in the limit of small fgN in which at most one of a perceptron’s synapses experiences a plasticity induction signal during memory storage, the dynamical difference between the two protocols vanishes and their statistical structures become identical.

Two earlier studies have considered memory lifetimes in complex models of synaptic plasticity in the presence of sparse population coding (Leibold and Kempter 2008; Rubin and Fusi 2007). Leibold and Kempter (2008) used the cue-target protocol that we have adapted and referred to as the Hebb protocol. They employed synaptic strengths \(S_i \in \{ 0, 1 \}\) rather than our \(S_i \in \{ -1, +1 \}\), although this difference is unimportant because it just amounts to an effective re-definition of the firing threshold \(\vartheta \) (Elliott and Lagogiannis 2012). They also employed two-level activities, but with \(\xi _i \in \{ 0, 1 \}\), so without considering the possible influence of nonzero spontaneous activity, \(\zeta > 0\), on memory lifetimes. Rubin and Fusi (2007) used the Hopfield protocol with two-level activities, \(\xi _i \in \{ -1, +1 \}\), interpreting \(\xi _i = -1\) as spontaneous activity and \(\xi _i = +1\) as evoked activity, and stressed the importance of considering the impact of spontaneous activity on memory lifetimes. We have modelled spontaneous activity in the Hopfield protocol by moving to four-level inputs, but as indicated, this approach is essentially equivalent to two-level inputs for synapses with \(S_i \in \{ -1 , +1 \}\) in terms of the overall statistical structure of perceptron activation. However, by using four activity levels, we are able to vary \(\zeta \) over its allowed range in order to explore the impact of different degrees of spontaneous activity on memory lifetimes. A significant difference between our approach and that of Rubin and Fusi (2007) is that we do not allow spontaneous activity to induce synaptic plasticity, a position that we consider to be mandated by a broadly BCM (Bienenstock et al. 1982) view of synaptic plasticity, as discussed earlier. Finally, our respective definitions of the memory signal, from which SNR memory lifetimes are obtained, differ in a population setting. Rubin and Fusi (2007) define this signal over the entire population of neurons, while we define it over only the sub-population of neurons that are directly involved in memory storage (the equivalent of the target sub-population of Leibold and Kempter (2008)). This difference leads to different scaling behaviours of optimal population SNR memory lifetimes as a function of the sparseness of the population coding.

The difference between the scaling behaviours of optimal SNR memory lifetimes (optimised with respect to sparseness) in the single-perceptron and population cases is intriguing. Furthermore, the role of even very small levels of spontaneous activity in enhancing optimal population SNR lifetimes with increasing synaptic complexity in non-cascade models is fascinating. However, we have cautioned against over-interpreting results from an SNR analysis of memory lifetimes. This analysis depends on the distribution of \(h_0\) being tightly concentrated around its supra-threshold mean. We have shown in earlier work that this requirement is often not satisfied, and that a FPT approach is required to examine memory lifetimes away from this regime (Elliott 2016a, 2017a, 2020). Here, for simple synapses, we have explicitly seen that the single-perceptron SNR analysis breaks down in the limit of small f, and so it cannot probe the very sparse coding regime. The explanation for this failure is straightforward: as f is reduced, the initial SNR \(\mu (0) / \sigma (0)\) reduces, and below some threshold value of f the SNR validity condition \(\mu (0) / \sigma (0) \gtrapprox 2\) fails. For a single perceptron, we saw that this condition is \(N f^2 p^2 / \left[ f + (1-f) \zeta ^2 \right] \gtrapprox 4\) (for either protocol). Plugging in \(f^{\mathrm {opt}}\) for SU synapses with \(\zeta = 0\) and \(\zeta = 1\), this condition becomes \(\sqrt{e} \gtrapprox 4\) and \(e \gtrapprox 4\), respectively, where we saw the former case earlier. Both conditions are violated, although with spontaneous activity the violation is not so great. Although we have not extended our FPT analysis of filter-based synapses (Elliott 2017a, 2020) to the sparse coding regime considered here, the same issues arise with complex synapses. Therefore, we fully expect single-perceptron SNR optimality conditions to be violated for complex synapses, too.

Whether population SNR optimality conditions are violated, in either simple or complex models, is unclear. We would need to extend our single-perceptron FPT analysis to a population setting. Furthermore, this extended analysis would need to be reducible to the population SNR analysis with its rather coarse approximation that neurons’ activities evolve independently, despite synaptic coupling. However, it is extremely tempting to speculate that the simple synapse condition for population SNR validity is just the obvious generalisation, namely \(\mu _p(0) / \sigma _p(0) \gtrapprox 2\). Using the population results for \(f^{\mathrm {opt}}\) for simple synapses, this condition becomes the false \(e \gtrapprox 4\) for \(\zeta = 0\) and the true \(e^{3/2} \gtrapprox 4\) for \(\zeta = 1\). It is thus quite remarkable that if this speculation is borne out by a more careful analysis, then optimal population SNR memory lifetimes for simple synapses are valid in the presence of spontaneous activity.