Attention and Memory

Bridge from the previous chapter

In the Unconscious we saw that the boundary between conscious and unconscious is determined by the Gap-structure — the opacity of channels. But what governs this boundary in real time? Two mechanisms: attention (redistribution of coherence 'here and now') and memory (the influence of the past through the non-Markovian kernel). Attention decides what enters the focus of consciousness. Memory determines how long it remains accessible.

On notation

In this document:

  • $\Gamma$: coherence matrix; $\gamma_{ij}$: its elements
  • $\mathrm{Tr}(\Gamma) = 1$: normalisation (trace condition)
  • $P = \mathrm{Tr}(\Gamma^2)$: purity (viability)
  • $\gamma_{AX}$: coherences between dimension $A$ (articulation/attention) and other dimensions $X$
  • $K(\tau)$: memory kernel (non-Markovian dynamics)
  • $H_{\text{eff}}$: effective Hamiltonian (evolution of $\Gamma$)
  • $R$: reflection measure
  • Full notation table: see Notation

Document status

Definitions of attention and memory via the structure of $\Gamma$: [D] (definitions by convention). Memory typology via forms of the kernel $K(\tau)$: [C] (conditional on non-Markovian dynamics of coherences). Phenomenological interpretations: [I].

Chapter roadmap

  1. Historical perspective: from William James through filter models to UHM
  2. Attention: definition, spotlight mechanism, three types, connection to Gap
  3. Historical perspective on memory: from Ebbinghaus through multi-level models to UHM
  4. Memory: four types from the form of the kernel $K(\tau)$ (sensory, working, long-term, procedural)
  5. Forgetting: two mechanisms (kernel decoherence and Gap increase)
  6. Integration: how attention, memory and Gap form a unified system

1. Historical perspective: attention

1.1 William James (1890)

"Everyone knows what attention is. It is the taking possession by the mind, in clear and vivid form, of one out of what seem several simultaneously possible objects or trains of thought."

— William James, The Principles of Psychology (1890), ch. 11

James formulated the key intuition: attention is a selection from a set of possible contents. This intuition translates directly into the UHM formalism: 'the set of possible objects' = the set of coherences $\{\gamma_{AX}\}$; 'taking possession of one' = increasing $|\gamma_{AE_{\text{target}}}|$ at the expense of the others.

1.2 Filter models (1950–1970s)

Broadbent (1958): early selection filter. Information passes through a narrow 'bottleneck': only one channel is fully processed, the rest are blocked. In UHM: $|\gamma_{AE_{\text{target}}}| \gg |\gamma_{AE_{\text{distractor}}}|$ (hard filtering).

Treisman (1964): attenuation model. Non-target channels are not fully blocked but attenuated. In UHM: $|\gamma_{AE_{\text{distractor}}}| > 0$, but $|\gamma_{AE_{\text{distractor}}}| \ll |\gamma_{AE_{\text{target}}}|$ (soft filtering). This explains the 'cocktail party effect': you can hear your own name in a nearby conversation because the distractor channel is not fully blocked.

Deutsch and Deutsch (1963): late selection filter. All information is fully processed; selection occurs at the response stage. In UHM: all $|\gamma_{AX}|$ are moderate; selection occurs via channel $(A,D)$, so attention influences action ($D$), not perception ($E$).

1.3 Posner: components of attention (1980–1990s)

Michael Posner identified three neural 'attention networks':

  • Alerting (vigilance): maintenance of the tonic level $\gamma_{AA}$
  • Orienting (orientation): redirecting coherence, $\gamma_{AE_1} \to \gamma_{AE_2}$
  • Executive (executive control): resolving conflict between channels

In UHM, all three networks are described by the unified mechanism of redistribution of A-sector coherences.

1.4 From classical models to UHM

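| Classical model | UHM formalism |
|---|---|
| Broadbent's filter | $\lvert\gamma_{AE_{\text{target}}}\rvert \gg \lvert\gamma_{AE_{\text{distractor}}}\rvert$ (hard filtering) |
| Treisman's attenuation | $0 < \lvert\gamma_{AE_{\text{distractor}}}\rvert \ll \lvert\gamma_{AE_{\text{target}}}\rvert$ (soft filtering) |
| Late selection | All $\lvert\gamma_{AX}\rvert$ moderate; selection via channel $(A,D)$ |
| Alerting (Posner) | $\gamma_{AA}$: tonic level |
| Orienting (Posner) | $\gamma_{AE_1} \to \gamma_{AE_2}$: redistribution |
| Executive (Posner) | Resolution of conflict between channels |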

2. Attention as redistribution of coherence

2.1 Definition

Definition (Attention) [D]

Attention to a pair of dimensions $(A,X)$ is a temporary increase in the modulus of coherence $|\gamma_{AX}|$ with a simultaneous decrease in the other $|\gamma_{AY}|$ ($Y \neq X$), constrained by the normalisation $\mathrm{Tr}(\Gamma) = 1$:

$$\text{Attention to } X: \quad |\gamma_{AX}|(\tau) \uparrow \quad \Rightarrow \quad \sum_{Y \neq X} |\gamma_{AY}|(\tau) \downarrow$$

More formally: attention is a unitary (or near-unitary) transformation of $\Gamma$ that redistributes coherence from the other A-sector channels into the target channel $(A,X)$.

2.2 The 'spotlight' mechanism: detailed derivation

Why does increasing one coherence inevitably lead to decreasing others? This is a direct consequence of trace normalisation.

Step 1. Trace condition: $\mathrm{Tr}(\Gamma) = \sum_{k=1}^{7} \gamma_{kk} = 1$.

Step 2. Cauchy-Schwarz inequality for each element:

$$|\gamma_{AX}|^2 \leq \gamma_{AA} \cdot \gamma_{XX}$$

Step 3. The sum of the squared moduli of all A-sector coherences is bounded:

$$\sum_{X \neq A} |\gamma_{AX}|^2 \leq \gamma_{AA} \cdot \sum_{X \neq A} \gamma_{XX} = \gamma_{AA} \cdot (1 - \gamma_{AA})$$

The right-hand side is a fixed constant for a given $\gamma_{AA}$. Consequently, the sum of squared moduli is bounded, and once the bound is reached, increasing one term requires decreasing at least one other. This is the 'spotlight' mechanism.

Everyday analogy. You have a limited attention budget, like a limited amount of water in a bucket. You can water one bed abundantly (focused attention on $\gamma_{AE}$) and leave the others dry. Or water all of them a little (distributed attention). But the total volume of water is fixed: that is $\mathrm{Tr}(\Gamma) = 1$. You cannot 'create' more attention, you can only redistribute it.

Numerical example: attention budget. Let $\gamma_{AA} = 0.15$ (the fraction of attention in the total 'energy'). Then:

$$\sum_{X \neq A} |\gamma_{AX}|^2 \leq 0.15 \cdot (1 - 0.15) = 0.1275$$

This is the 'budget'. It can be distributed as follows:

| Scenario | $\lvert\gamma_{AE}\rvert$ | $\lvert\gamma_{AS}\rvert$ | $\lvert\gamma_{AD}\rvert$ | $\lvert\gamma_{AL}\rvert$ | Sum of $\lvert\cdot\rvert^2$ | SNR for $E$ |
|----------|:---:|:---:|:---:|:---:|:---:|:---:|
| Full focus | 0.12 | 0.01 | 0.01 | 0.01 | 0.0147 | 48 |
| Moderate focus | 0.10 | 0.04 | 0.03 | 0.03 | 0.0134 | 2.9 |
| Distributed | 0.06 | 0.06 | 0.06 | 0.06 | 0.0144 | 0.33 |

Here the SNR (signal-to-noise ratio) for the target channel $E$ is $|\gamma_{AE}|^2$ divided by the sum of the other squared coherences listed in the same row. With full focus the SNR is 48: excellent 'reception'. With distributed attention the SNR is about 0.33: the target is weaker than the competing channels combined. This is why attempting to simultaneously read, listen, and think about a third thing results in none of the activities being performed well.
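The budget arithmetic above can be rechecked with a few lines of Python. This is a minimal sketch: the function names (`budget`, `sum_sq`, `snr`) are hypothetical, and the SNR is evaluated over the four tabulated channels only, which is an assumption, since the A-sector may contain further coherences that the table does not list.

```python
# Coherence moduli |gamma_AX| for the three scenarios in the table above.
GAMMA_AA = 0.15

scenarios = {
    "full focus":     {"E": 0.12, "S": 0.01, "D": 0.01, "L": 0.01},
    "moderate focus": {"E": 0.10, "S": 0.04, "D": 0.03, "L": 0.03},
    "distributed":    {"E": 0.06, "S": 0.06, "D": 0.06, "L": 0.06},
}

def budget(gamma_aa):
    """Upper bound on sum_X |gamma_AX|^2 from trace normalisation and Cauchy-Schwarz."""
    return gamma_aa * (1.0 - gamma_aa)

def sum_sq(coherences):
    """Sum of squared moduli over the listed channels."""
    return sum(v * v for v in coherences.values())

def snr(coherences, target="E"):
    """Squared target coherence divided by the summed squares of the other listed channels."""
    signal = coherences[target] ** 2
    noise = sum(v * v for k, v in coherences.items() if k != target)
    return signal / noise

if __name__ == "__main__":
    print(f"budget = {budget(GAMMA_AA):.4f}")
    for name, coh in scenarios.items():
        within = sum_sq(coh) <= budget(GAMMA_AA)
        print(f"{name:15s} sum = {sum_sq(coh):.4f}  SNR(E) = {snr(coh):.2f}  within budget: {within}")
```

All three scenarios stay well inside the bound, so in this example the bound itself is not what forces the trade-off; the fall in SNR comes from spreading a similar total over more channels.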

2.3 Connection to the 21-pair taxonomy of qualia

From the qualia table:

  • $\gamma_{AE}$: Apperception (quale #4), discrimination that has entered interiority
  • $\gamma_{AS}$: Morphogenesis (quale #1), crystallisation of forms
  • $\gamma_{AD}$: Actualisation (quale #2), actualisation of distinction in process
  • $\gamma_{AL}$: Predication (quale #3), distinction as logical predicate

The direction of attention = the choice of which qualitative type dominates. When you contemplate a painting, $\gamma_{AS}$ dominates (forms); when listening to an argument, $\gamma_{AL}$ (logic); when meditating, $\gamma_{AE}$ (pure awareness).

2.4 Types of attention

Theorem (Types of attention from normalisation) [D]

From the normalisation $\mathrm{Tr}(\Gamma) = 1$ and the Cauchy-Schwarz inequality $|\gamma_{AX}|^2 \leq \gamma_{AA} \cdot \gamma_{XX}$ three attention modes follow:

(a) Selective (focused) attention:

$$|\gamma_{AE_{\text{target}}}| \uparrow, \quad |\gamma_{AE_{\text{distractor}}}| \downarrow$$

One target channel is amplified at the expense of the others. Signal-to-noise ratio:

$$\mathrm{SNR} = \frac{|\gamma_{AE_{\text{target}}}|^2}{\sum_{X \neq E_{\text{target}}} |\gamma_{AX}|^2}$$

Example: reading a book in a noisy room. $|\gamma_{AL}| \uparrow$ (text), $|\gamma_{AS}| \downarrow$ (visual distractors), $|\gamma_{AD}| \downarrow$ (background sounds).

(b) Sustained attention:

Maintaining elevated $|\gamma_{AE}|$ over the interval $[\tau_0, \tau_0 + \Delta\tau]$:

$$|\gamma_{AE}(\tau)| \geq |\gamma_{AE}|_{\text{th}} \quad \forall\, \tau \in [\tau_0, \tau_0 + \Delta\tau]$$

Energy cost: maintaining $\gamma_{AA}$ against dissipation. Over time $\gamma_{AA}$ decreases due to dissipation, and attention 'tires', so active maintenance is required (effort of will).

Example: driving a car on a long straight road. $|\gamma_{AE}| \geq |\gamma_{AE}|_{\text{th}}$ must be maintained continuously, which requires constant 'expenditure' of $\gamma_{AA}$.

(c) Distributed (divided) attention:

Several $|\gamma_{AX_k}|$ are simultaneously elevated, but each is lower than under focused attention:

$$\sum_k |\gamma_{AX_k}|^2 \leq \gamma_{AA} \cdot \sum_k \gamma_{X_k X_k} \quad \Rightarrow \quad |\gamma_{AX_k}| < |\gamma_{AX_k}|_{\text{focus}}$$

Consequence: distributed attention is inevitably weaker than focused attention for each individual channel, a direct consequence of normalisation.

Example: driving while simultaneously talking on the phone. $|\gamma_{AS}|$ (road) and $|\gamma_{AL}|$ (conversation) are both elevated, but each is lower than under focused attention. Research shows a 30–50% reduction in reaction speed, a direct spotlight effect.

2.5 Attention and Gap

Directing attention at channel $(i,j)$ can reduce $\mathrm{Gap}(i,j)$; this is the mechanism underlying meditative practices:

$$\frac{\partial\,\mathrm{Gap}(i,E)}{\partial |\gamma_{AE}|} < 0$$

Motivation. Why does attention reduce Gap? Formally: increasing $|\gamma_{AE}|$ intensifies the information flow between $A$ and $E$. The intensified flow allows the $\varphi$-operator to more accurately 'see' the state of channel $(i,E)$. A more accurate self-model leads to detection of misalignment (Gap), and detected misalignment triggers correction (Gap-reduction).

Numerical example. A mindfulness practitioner directs attention at the breath (channel $S \to E$):

| Time | $\lvert\gamma_{AE}\rvert$ | $\mathrm{Gap}(S,E)$ | $\mathrm{Gap}(D,E)$ | Subjective experience |
|-------|:---:|:---:|:---:|:---|
| $\tau = 0$ | 0.08 | 0.40 | 0.45 | Distracted |
| $\tau = 5$ min | 0.15 | 0.30 | 0.42 | "Starting to feel the breath" |
| $\tau = 15$ min | 0.20 | 0.18 | 0.35 | "I see tension in the body" |
| $\tau = 30$ min | 0.22 | 0.12 | 0.25 | "I notice emotions as bodily sensations" |

Strengthening the attention–experience channel correlates with a reduction in opacity in E-sector channels. This formalises the intuition: 'that to which attention is directed becomes more transparent'. Formally, this means that attention is one of the mechanisms by which content transitions from the unconscious to the conscious.


3. Historical perspective: memory

3.1 Hermann Ebbinghaus (1885)

Ebbinghaus was the first researcher to apply the experimental method to the study of memory. His main discoveries:

  • Forgetting curve: information is forgotten quickly at first, then ever more slowly. Ebbinghaus's retention data are commonly approximated by a power law, $b(\tau) \sim \tau^{-\beta}$ with $\beta \approx 0.3$.
  • Learning curve: repetition improves retention, but with diminishing returns.
  • Spacing effect: distributed repetition is more effective than massed practice.

In the UHM formalism, Ebbinghaus's forgetting curve is a direct consequence of the power-law kernel $K(\tau) \sim \tau^{-\alpha}$ (section 4.5).

3.2 Atkinson and Shiffrin (1968): modal model

The 'three-store' model:

  • Sensory register — instantaneous snapshot (duration ~250 ms)
  • Short-term (working) memory — 7 ± 2 items, duration ~20 s
  • Long-term memory — virtually unlimited capacity and duration

In UHM, these three 'stores' are not separate systems, but three forms of the same kernel $K(\tau)$:

| Atkinson-Shiffrin model | UHM formalism |
|---|---|
| Sensory register | $K(\tau) \sim \delta(\tau)$ (Markovian limit) |
| Working memory | $K(\tau) \sim e^{-\tau/\tau_{WM}}$ (exponential kernel) |
| Long-term memory | $K(\tau) \sim \tau^{-\alpha}$ (power-law kernel) |

3.3 Tulving (1972): types of memory

Endel Tulving introduced the distinction:

  • Episodic memory — memory of specific events ('I was at the café yesterday')
  • Semantic memory — knowledge of facts ('Paris is the capital of France')
  • Procedural memory — skills ('how to ride a bicycle')

In UHM:

  • Episodic and semantic memory differ in the shape of the power-law kernel (different values of $\alpha$)
  • Procedural memory is fundamentally different: it is embedded in $H_{\text{eff}}$ (section 4.6)

4. Types of memory from the non-Markovian kernel

4.1 The memory kernel and cognitive memory

Non-Markovian dynamics describes coherences with memory: the current evolution $\gamma_{ij}(\tau)$ depends on the entire preceding history through the kernel $K(\tau - s)$:

$$\frac{d\gamma_{ij}}{d\tau} = -i\Delta\omega_{ij}\,\gamma_{ij}(\tau) + \int_0^\tau K_{ij}(\tau - s)\, \gamma_{ij}(s)\, ds + \mathcal{R}_{ij}$$

(see Gap-dynamics, section 4)

Let us examine each term:

  • $-i\Delta\omega_{ij}\,\gamma_{ij}(\tau)$: free evolution (phase accumulation)
  • $\int_0^\tau K_{ij}(\tau - s)\, \gamma_{ij}(s)\, ds$: memory, the influence of all past states. This is a convolution integral: the current state depends on a weighted sum of past states
  • $\mathcal{R}_{ij}$: regeneration (operator $\mathcal{R}$)

The form of the kernel $K(\tau)$ determines the type of cognitive memory. Analogy: $K(\tau)$ is a 'filter of the past'. Delta function = 'I remember only the present'. Exponential = 'I remember recent events, but forget quickly'. Power function = 'I remember for a long time, I forget slowly'.

4.2 Memory typology

Definition (Types of memory) [C]

Condition: non-Markovian dynamics of coherences. Four types of memory are defined by the form of the kernel $K(\tau)$:

| Memory type | Kernel $K(\tau)$ | Characteristic | Time scale |
|---|---|---|---|
| Sensory | $K(\tau) \sim \delta(\tau)$ | Instantaneous, no persistence | $\tau_{\text{mem}} \to 0$ |
| Working | $K(\tau) \sim e^{-\tau/\tau_{WM}}$ | Exponential decay | $\tau_{WM} \sim$ seconds |
| Long-term | $K(\tau) \sim \tau^{-\alpha}$, $\alpha \in (0,1)$ | Power-law decay, slow fading | $\tau_{\text{mem}} \to \infty$ |
| Procedural | Embedded in $H_{\text{eff}}$ | Structure of evolution | Unbounded |

4.3 Sensory memory

$$K_{\text{sens}}(\tau) = -\Gamma_2 \cdot \delta(\tau)$$

Markovian limit: no memory. The current coherence state is determined only by current conditions. Physical analogue: an instantaneous sensory imprint that disappears when the stimulus ceases.

Motivation. Why introduce 'zero memory' as a separate type? Because it is the limiting case required for completeness of the classification. The delta function $\delta(\tau)$ means: 'the past does not influence the present'. Formally: the convolution integral degenerates:

$$\int_0^\tau K_{\text{sens}}(\tau - s)\, \gamma_{ij}(s)\, ds = -\Gamma_2\, \gamma_{ij}(\tau)$$

which is simply exponential decay with constant $\Gamma_2$.

Analogy. Sensory memory is like a fingerprint on a fogged-up window: it exists while the finger is on the glass and disappears instantly. In the formalism: kernel $K = \delta$, there is no 'tail', so the past does not influence the present.

Numerical example. Iconic memory (visual sensory buffer): at $\Gamma_2 = 4$ Hz, the half-life is $\tau_{1/2} = \ln 2 / \Gamma_2 \approx 170$ ms. Sperling's experiment (1960): subjects remembered up to 12 letters for ~300 ms, then complete loss. Prediction: $\tau_{\text{mem}} \sim 1/\Gamma_2 = 250$ ms, consistent with the data.

4.4 Working memory

$$K_{WM}(\tau) = -\Gamma_2 \omega_c \cdot e^{-\omega_c \tau}, \quad \tau_{WM} = 1/\omega_c$$

Exponential kernel: the standard model from non-Markovian dynamics.

Detailed derivation. By Theorem 5.1 of Gap-dynamics, at finite $\omega_c$ the solution of the convolution equation contains damped oscillations:

$$\gamma_{ij}(\tau) \propto e^{-\gamma\tau} \cos(\omega_r \tau), \quad \omega_r = \sqrt{\omega_c \Gamma_2 - \gamma^2}$$

where $\gamma$ is the decay rate, $\omega_r$ is the oscillation frequency (refresh rate).
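One way to see where the damped oscillation comes from is to differentiate the convolution equation once. The steps below are a sketch under simplifying assumptions that are not taken from the source: $\Delta\omega_{ij} = 0$ and $\mathcal{R}_{ij} = 0$, so that only the memory term acts. The cited Theorem 5.1 is not reproduced here.

$$
\begin{aligned}
\dot\gamma_{ij}(\tau) &= -\Gamma_2\,\omega_c \int_0^\tau e^{-\omega_c(\tau - s)}\,\gamma_{ij}(s)\,ds \\
\ddot\gamma_{ij}(\tau) &= -\Gamma_2\,\omega_c\,\gamma_{ij}(\tau) - \omega_c\,\dot\gamma_{ij}(\tau) \\
0 &= \ddot\gamma_{ij} + \omega_c\,\dot\gamma_{ij} + \omega_c\Gamma_2\,\gamma_{ij} \\
\lambda_{\pm} &= -\frac{\omega_c}{2} \pm i\sqrt{\omega_c\Gamma_2 - \frac{\omega_c^2}{4}}
\end{aligned}
$$

Under these assumptions the decay rate is $\gamma = \omega_c/2$ and $\omega_r = \sqrt{\omega_c\Gamma_2 - \gamma^2}$, which matches the form quoted above; oscillations appear only when $\Gamma_2 > \omega_c/4$.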

Interpretation of oscillations. Coherence does not simply decay, but oscillates: the subject 'returns' to the content before its final disappearance. Each oscillation cycle is one 'run' of working memory (subvocal rehearsal, visual revision). While $|\gamma_{ij}(\tau)| > \varepsilon_{\min}$, content is 'held'; when damping prevails, content is lost.

Numerical example (detailed). Holding a phone number:

  • $\tau_{WM} = 1/\omega_c = 5$ s (typical working memory duration without rehearsal)
  • $\Gamma_2 = 0.3$ s$^{-1}$ (decoherence rate)
  • $\omega_c = 0.2$ Hz (kernel frequency)
  • Refresh frequency: $\omega_r = \sqrt{0.2 \times 0.3 - \gamma^2} = \sqrt{0.05} \approx 0.22$ Hz at $\gamma = 0.1$ s$^{-1}$
  • During holding (5 s): $\omega_r \times 5 \approx 1.1$ cycles, i.e. about one full 'run'

This is consistent with data on subvocal rehearsal: internally articulating the number at ~2 syllables/s, a 7-digit number (taking one syllable per digit) needs about 3.5 s, so roughly one complete rehearsal fits into 5 seconds.
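The working-memory numbers can be explored numerically. The sketch below makes several assumptions that do not come from the source: it sets $\Delta\omega_{ij} = 0$ and $\mathcal{R}_{ij} = 0$, treats the coherence as real, takes the decay rate as $\omega_c/2$, and replaces the convolution with an auxiliary variable $z(\tau) = \int_0^\tau e^{-\omega_c(\tau-s)}\gamma(s)\,ds$ so that a plain Runge-Kutta step can be used. All function names are hypothetical.

```python
import math

def refresh_frequency(gamma2, omega_c):
    """omega_r = sqrt(omega_c * Gamma_2 - decay^2), with decay = omega_c / 2 (assumption)."""
    decay = omega_c / 2.0
    return math.sqrt(omega_c * gamma2 - decay ** 2)

def closed_form(t, gamma2, omega_c):
    """Damped oscillation with gamma(0) = 1 and zero initial slope."""
    decay = omega_c / 2.0
    w = refresh_frequency(gamma2, omega_c)
    return math.exp(-decay * t) * (math.cos(w * t) + (decay / w) * math.sin(w * t))

def simulate(gamma2, omega_c, t_end=20.0, dt=0.01):
    """RK4 integration of d(gamma)/dt = -Gamma_2*omega_c*z, dz/dt = gamma - omega_c*z."""
    def rhs(g, z):
        return -gamma2 * omega_c * z, g - omega_c * z

    g, z = 1.0, 0.0
    ts, gs = [0.0], [g]
    steps = int(round(t_end / dt))
    for i in range(1, steps + 1):
        k1g, k1z = rhs(g, z)
        k2g, k2z = rhs(g + 0.5 * dt * k1g, z + 0.5 * dt * k1z)
        k3g, k3z = rhs(g + 0.5 * dt * k2g, z + 0.5 * dt * k2z)
        k4g, k4z = rhs(g + dt * k3g, z + dt * k3z)
        g += dt * (k1g + 2 * k2g + 2 * k3g + k4g) / 6.0
        z += dt * (k1z + 2 * k2z + 2 * k3z + k4z) / 6.0
        ts.append(i * dt)
        gs.append(g)
    return ts, gs

if __name__ == "__main__":
    gamma2, omega_c = 0.3, 0.2  # values from the phone-number example
    w = refresh_frequency(gamma2, omega_c)
    ts, gs = simulate(gamma2, omega_c)
    err = max(abs(g - closed_form(t, gamma2, omega_c)) for t, g in zip(ts, gs))
    print(f"omega_r = {w:.4f}, omega_r * 5 s = {w * 5:.2f}, max deviation = {err:.2e}")
```

The auxiliary-variable trick works only because the kernel is exponential; a power-law kernel would need the full history at every step.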

Interpretation [I]

Working memory oscillations correspond to 'cycling through' the content: coherence does not simply decay but oscillates, and the subject 'returns' to the content before its final disappearance. The frequency $\omega_r$ determines the 'refresh rate' of working memory.

Neurophysiological correlate: gamma oscillations (30–80 Hz) in the prefrontal cortex during information maintenance in working memory. In this interpretation, these oscillations are the neural implementation of $\omega_r$.

4.5 Long-term memory

$$K_{LTM}(\tau) \sim -\Gamma_2 \cdot \tau^{-\alpha}, \quad 0 < \alpha < 1$$

Power-law decay: the kernel decreases more slowly than an exponential. This is a 'heavy tail': information is preserved indefinitely, though the intensity gradually falls.

Motivation. Why specifically a power law? Exponential decay ($e^{-\tau/\tau_0}$) implies a characteristic scale $\tau_0$: information 'lives' for approximately $\tau_0$, then disappears. But empirical data show that memory has no characteristic scale: forgetting does not speed up or slow down at a particular time horizon. The power law $\tau^{-\alpha}$ is the only function without a characteristic scale (scale invariance).
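Scale invariance can be made concrete by stretching the time axis by a fixed factor and checking how much the kernel shrinks. The sketch below is illustrative only: it compares kernel magnitudes (the prefactor $-\Gamma_2$ is dropped), and the values $\alpha = 0.6$, $\tau_0 = 5$ and the stretch factor 10 are assumptions chosen for the demonstration.

```python
import math

def k_exp(tau, tau0=5.0):
    """Magnitude of an exponential kernel with characteristic scale tau0."""
    return math.exp(-tau / tau0)

def k_pow(tau, alpha=0.6):
    """Magnitude of a power-law kernel."""
    return tau ** (-alpha)

def rescale_ratio(kernel, tau, lam=10.0):
    """K(lam * tau) / K(tau): how much the kernel shrinks when time is stretched by lam."""
    return kernel(lam * tau) / kernel(tau)

if __name__ == "__main__":
    for tau in (1.0, 10.0, 100.0):
        print(f"tau = {tau:6.1f}  power law: {rescale_ratio(k_pow, tau):.4f}  "
              f"exponential: {rescale_ratio(k_exp, tau):.3e}")
```

For the power law the ratio is the same at every $\tau$ (it equals $\lambda^{-\alpha}$), whereas for the exponential it depends on where on the time axis one stands, which is exactly what having a characteristic scale means.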

Theorem (Power law of forgetting) [C]

Condition: power-law kernel $K(\tau) \sim \tau^{-\alpha}$. The coherence amplitude under a power-law kernel decays as:

$$|\gamma_{ij}(\tau)| \sim |\gamma_{ij}(0)| \cdot \tau^{-\beta}, \quad \beta = \frac{\alpha}{2}$$

This reproduces the Ebbinghaus forgetting curve at $\alpha \approx 0.5$–$0.7$ (empirical result $\beta \approx 0.25$–$0.35$).

Argument. Laplace image $\hat{K}(s) \sim s^{\alpha - 1}$ at $\alpha < 1$ (fractional operator). The convolution equation $d\gamma/d\tau = \int_0^\tau K(\tau-s)\gamma(s)\,ds$ in Laplace representation: $s\hat{\gamma} = \hat{K} \cdot \hat{\gamma}$, solution: $\hat{\gamma}(s) \sim s^{-1} \cdot s^{1-\alpha} = s^{-\alpha}$. Inverse transform: $\gamma(\tau) \sim \tau^{\alpha-1}$. Accounting for the initial condition and factor $\Gamma_2$: $|\gamma(\tau)| \sim |\gamma(0)| \cdot \tau^{-\alpha/2}$, i.e. $\beta = \alpha/2$.

Numerical example: Ebbinghaus forgetting curve. A learned poem with initial coherence $|\gamma_{ij}(0)| = 0.30$ and $\alpha = 0.6$ ($\beta = 0.3$), with $\tau$ measured in days:

| Time $\tau$ | $\lvert\gamma_{ij}(\tau)\rvert$ | Fraction of initial | Subjectively |
|:---:|:---:|:---:|:---|
| 1 day | $0.30 \cdot 1^{-0.3} = 0.30$ | 100% | Remember well |
| 7 days | $0.30 \cdot 7^{-0.3} \approx 0.17$ | 56% | Remember the main points |
| 30 days | $0.30 \cdot 30^{-0.3} \approx 0.11$ | 36% | Remember individual stanzas |
| 365 days | $0.30 \cdot 365^{-0.3} \approx 0.05$ | 17% | Remember the theme, individual lines |
| 10 years ($3650$ days) | $0.30 \cdot 3650^{-0.3} \approx 0.026$ | 9% | Vague recollection |
| 50 years ($18250$ days) | $0.30 \cdot 18250^{-0.3} \approx 0.016$ | 5% | Traces remain |

Note: even after 50 years $|\gamma| \approx 0.016 > 0$, so traces remain! This is a fundamental difference from exponential decay, under which, for a characteristic time of 5 years, the surviving fraction after 50 years is $e^{-50/5} \approx 5 \times 10^{-5}$, practically zero. The power-law tail explains why elderly people remember events from 60 years ago: the 'tail' of the kernel decays slowly.
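The forgetting-curve arithmetic is easy to recompute. The sketch below uses hypothetical function names; it takes $\tau$ in days for the power law and, as an assumption read off the exponent $50/5$ in the comparison above, a characteristic time of 5 years for the exponential.

```python
import math

GAMMA_0 = 0.30  # initial coherence from the example
BETA = 0.3      # beta = alpha / 2 with alpha = 0.6

def power_law_trace(days, gamma0=GAMMA_0, beta=BETA):
    """|gamma(tau)| = gamma0 * tau^(-beta), tau in days (tau >= 1)."""
    return gamma0 * days ** (-beta)

def exponential_fraction(years, tau0_years=5.0):
    """Surviving fraction under exponential decay (tau0 in years is an assumption)."""
    return math.exp(-years / tau0_years)

if __name__ == "__main__":
    for days in (1, 7, 30, 365, 3650, 18250):
        trace = power_law_trace(days)
        print(f"{days:6d} days  |gamma| = {trace:.4f}  fraction = {trace / GAMMA_0:.1%}")
    print(f"exponential, 50 years: fraction = {exponential_fraction(50):.2e}")
```

Changing `BETA` within the quoted empirical range 0.25 to 0.35 shows how sensitive the long-horizon tail is to the exponent.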

4.6 Procedural memory

$$\text{Procedural memory:} \quad K \hookrightarrow H_{\text{eff}}$$

Procedural memory is not a kernel in the coherence equation, but the structure of the Hamiltonian $H_{\text{eff}}$ itself. A skill is 'encoded' in the parameters of evolution: frequencies $\omega_i$, coupling constants, Lindblad operators.

Motivation. Why does procedural memory differ so fundamentally from the others? Because it stores not content (coherence $\gamma_{ij}$), but a rule (how $\gamma_{ij}$ evolves). Declarative memory is the 'what', procedural is the 'how'.

Interpretation [I]

Procedural memory is fundamentally different from all other types: it does not decay, since it does not depend on the kernel $K(\tau)$, but is embedded in the mechanism of evolution itself. To 'forget' procedural memory = to change $H_{\text{eff}}$, which requires a structural restructuring of the system, not merely the decoherence of individual coherences.

Analogy. Declarative memory (working + long-term): notes on a blackboard that gradually fade. Procedural memory: the shape of the blackboard itself. You can erase all the notes, but the board will remain rectangular. The ability to ride a bicycle is 'encoded' not in the coherences $\gamma_{ij}$ (which decay), but in the structure of $H_{\text{eff}}$ (which is restructured only under fundamental changes).

Another analogy: declarative memory — the source code of a program (data that can be deleted); procedural — the compiler (the tool that processes the data). The compiler can only be 'forgotten' by reinstalling the operating system.

Numerical example: why cycling is not forgotten. A person learned to ride a bicycle at age 7. By age 70:

| Memory type | Content | Kernel | After 63 years |
|---|---|---|---|
| Episodic | "Dad was holding the handlebars" | $K \sim \tau^{-0.6}$ | … |
| Semantic | "A bicycle has two wheels" | $K \sim \tau^{-0.4}$ | … |
| Procedural | Riding skill | $K \hookrightarrow H_{\text{eff}}$ | Fully preserved |

Procedural memory does not depend on the kernel: it is 'hardwired' into $H_{\text{eff}}$. Neurophysiological correlate: procedural memory is stored in the cerebellum and basal ganglia, whereas declarative memory depends on the hippocampus. Different 'records' have different neural substrates.


5. Forgetting as kernel decoherence

Definition (Forgetting) [D]

Forgetting: a decrease in the amplitude of the memory kernel $|K(\tau)|$ over time, leading to a weakening of the influence of past states on the current dynamics:

$$\text{Forgetting:} \quad |K(\tau)| \to 0 \quad \text{as} \quad \tau \to \infty$$

In the Markovian limit ($K \to \delta$) forgetting is instantaneous. With a kernel of finite width it is gradual.

5.1 Two mechanisms of forgetting

| Mechanism | Description | Formula | Reversibility | Analogy |
|---|---|---|---|---|
| Kernel decoherence | $K(\tau)$ decreases | $\lvert K(\tau)\rvert \to 0$ | Irreversible | Book burned |
| Gap increase | Coherence becomes opaque | $\mathrm{Gap}(i,j) \to 1$ | Reversible | Book locked in a safe |

The distinction is fundamental: if content is 'forgotten' through kernel decoherence, recovery is impossible because the information is physically lost. If it is 'forgotten' through Gap increase, the content is preserved in $\gamma_{ij}$ but inaccessible (= in the unconscious). Therapy and meditation work with the second case.

Analogy (extended). Kernel decoherence — the book has burned: the text is lost forever, and no archaeologist can restore the letters from ash. Gap increase — the book is locked in a safe: the text is intact but inaccessible; the key can be picked (therapy), the safe forced open (crisis), or a spare key found (meditation). The difference is colossal for corrective strategies: there is no point in 'opening the safe' if the book has already burned.

Numerical example: two types of 'forgotten' phone number.

Case 1 (kernel decoherence): a number heard 5 years ago without being written down. $K_{WM}(\tau)$ has long decayed ($\tau_{WM} = 5$ s), and the power-law kernel $K_{LTM}$ as well: $|\gamma| \approx 0.001$. Recovery is impossible.

Case 2 (Gap increase): a former partner's number, consciously 'forgotten' after a breakup. $|\gamma_{LE}| = 0.08$ (coherence preserved, so you 'know' the number), but $\mathrm{Gap}(L,E) = 0.90$ (consciously blocked). With an unexpected stimulus (meeting on the street) Gap can temporarily decrease, and the number is 'remembered'.

5.2 Forgetting rate and viability

Theorem (Forgetting and viability) [C]

Condition: non-Markovian dynamics. The rate of change of $|K|$ is bounded from below by the viability condition, which caps how fast forgetting can proceed:

$$\frac{d|K|}{d\tau} \geq -\frac{\kappa}{P - P_{\text{crit}}} \cdot |K|$$

At $P \to P_{\text{crit}} = 2/7$ the bound diverges and forgetting can accelerate without limit: a system at the edge of viability loses memory faster.

Derivation. Viability $P$ determines the 'resource' available for maintaining coherences. The closer $P$ is to $P_{\text{crit}}$, the less resource for maintaining the kernel $K(\tau)$, and the faster it decays. Formally: the decay rate $\propto 1/(P - P_{\text{crit}})$, which gives a singularity at $P = P_{\text{crit}}$.

Numerical example: cognitive decline (margins computed with $P_{\text{crit}} = 2/7 \approx 0.2857$).

| State | $P$ | $P - P_{\text{crit}}$ | Relative forgetting rate | Clinical analogue |
|---|:---:|:---:|:---:|---|
| Healthy adult | 0.36 | 0.074 | $\times 1$ (baseline) | Normal memory |
| Onset of decline | 0.33 | 0.044 | $\times 1.7$ | Mild cognitive impairment |
| Moderate dementia | 0.30 | 0.014 | $\times 5.2$ | Noticeable memory loss |
| Severe dementia | 0.29 | 0.0043 | $\times 17$ | Catastrophic |
| Critical | 0.2875 | 0.0018 | $\times 42$ | Loss of identity |

This explains the clinical observation: in dementia, cognitive decline accelerates, slowly at first and then catastrophically. A small decrease in $P$ near $P_{\text{crit}}$ leads to a dramatic acceleration of forgetting. The formula $\propto 1/(P - P_{\text{crit}})$ reproduces this nonlinear pattern.
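The relative forgetting rates follow directly from the $1/(P - P_{\text{crit}})$ law and can be recomputed as below. This is a minimal sketch with hypothetical function names; the healthy-adult value $P = 0.36$ is used as the baseline, as in the example, and the constant $\kappa$ cancels in the ratio.

```python
P_CRIT = 2.0 / 7.0  # critical purity from the theorem above

def margin(p):
    """Distance from the viability threshold; must be positive."""
    if p <= P_CRIT:
        raise ValueError("P must exceed P_crit for the bound to be defined")
    return p - P_CRIT

def relative_forgetting_rate(p, p_ref=0.36):
    """Ratio of the 1/(P - P_crit) factor at p to its value at the baseline p_ref."""
    return margin(p_ref) / margin(p)

if __name__ == "__main__":
    for label, p in [("healthy adult", 0.36), ("onset of decline", 0.33),
                     ("moderate dementia", 0.30), ("severe dementia", 0.29),
                     ("critical", 0.2875)]:
        print(f"{label:18s} P = {p:<6} margin = {margin(p):.4f}  "
              f"rate = x{relative_forgetting_rate(p):.1f}")
```

Because the margin appears in the denominator, rounding it before dividing changes the multipliers noticeably near $P_{\text{crit}}$; the sketch therefore keeps full precision until the final print.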


6. Integration: attention, memory and Gap

Key cycle: Attention ($|\gamma_{AE}| \uparrow$) reduces Gap, and content transitions from unconscious to conscious. Kernel decay ($|K| \to 0$) raises Gap, and content recedes back into the unconscious. Working memory keeps content 'afloat' through oscillations. Procedural memory exits this cycle: it is embedded in the structure of the system.

6.1 Interaction of attention and memory

This cycle formalises the intuition of 'attention as the key to consciousness' and explains why mindfulness practices are therapeutically effective: systematically directing attention ($|\gamma_{AE}| \uparrow$) gradually reduces Gap in the channels to which it is directed. With regular repetition, content transitions from working memory ($K \sim e^{-\tau/\tau_{WM}}$) to long-term ($K \sim \tau^{-\alpha}$), and the mindfulness skill to procedural ($K \hookrightarrow H_{\text{eff}}$). For more detail, see the CC theorems (T-103, T-104).


What we learned

  1. Historical line of attention: James (1890, 'everyone knows what attention is') → Broadbent (1958, filter) → Treisman (1964, attenuation) → Posner (1980s, three networks) → UHM (redistribution of A-sector coherences)
  2. Attention = redistribution of coherence in the A-sector; three types (selective, sustained, distributed) follow from the normalisation $\mathrm{Tr}(\Gamma) = 1$
  3. Attention reduces Gap in target channels: a formal justification for mindfulness practices
  4. Historical line of memory: Ebbinghaus (1885, forgetting curve) → Atkinson-Shiffrin (1968, three stores) → Tulving (1972, types of memory) → UHM (forms of kernel $K(\tau)$)
  5. Four types of memory are determined by the form of kernel $K(\tau)$: sensory ($\delta$), working ($e^{-\tau/\tau_{WM}}$), long-term ($\tau^{-\alpha}$), procedural ($H_{\text{eff}}$)
  6. The Ebbinghaus forgetting curve is reproduced by the power-law kernel at $\alpha \approx 0.5$–$0.7$ [C]
  7. Two mechanisms of forgetting: kernel decoherence (irreversible, 'book burned') and Gap increase (reversible, 'book in a safe')
  8. At $P \to P_{\text{crit}}$ forgetting accelerates by the law $\propto 1/(P - P_{\text{crit}})$: a formal explanation of cognitive decline in dementia

Bridge to the next chapter

Attention and memory are normal mechanisms of coherence control. But what happens when these mechanisms break down? Specific failures of the Gap-profile give rise to pathological states: alexithymia, dissociation, depression, psychosis. In the next chapter — Pathology of consciousness — we will show that each pathology = a characteristic Gap-pattern, and that therapy = targeted Gap-reduction.

Connections