Attention and Memory
In the Unconscious we saw that the boundary between conscious and unconscious is determined by the Gap-structure — the opacity of channels. But what governs this boundary in real time? Two mechanisms: attention (redistribution of coherence 'here and now') and memory (the influence of the past through the non-Markovian kernel). Attention decides what enters the focus of consciousness. Memory determines how long it remains accessible.
In this document:
- $\Gamma$: coherence matrix; $\gamma_{ij}$: its elements
- $\mathrm{Tr}(\Gamma) = 1$: normalisation (trace condition)
- $P = \mathrm{Tr}(\Gamma^2)$: purity (viability)
- $\gamma_{Ak}$: coherences between dimension $A$ (articulation/attention) and other dimensions $k$
- $K(\tau)$: memory kernel (non-Markovian dynamics)
- $H_{\mathrm{eff}}$: effective Hamiltonian (evolution of Γ)
- $R$: reflection measure
- Full notation table: see Notation
Definitions of attention and memory via the structure of $\Gamma$: [D] (definitions by convention). Memory typology via forms of the kernel $K(\tau)$: [C] (conditional on non-Markovian dynamics of coherences). Phenomenological interpretations: [I].
Chapter roadmap
- Historical perspective: from William James through filter models to UHM
- Attention: definition, spotlight mechanism, three types, connection to Gap
- Historical perspective on memory: from Ebbinghaus through multi-level models to UHM
- Memory: four types from the form of the kernel $K(\tau)$ (sensory, working, long-term, procedural)
- Forgetting: two mechanisms (kernel decoherence and Gap increase)
- Integration: how attention, memory and Gap form a unified system
1. Historical perspective: attention
1.1 William James (1890)
"Everyone knows what attention is. It is the taking possession by the mind, in clear and vivid form, of one out of what seem several simultaneously possible objects or trains of thought."
— William James, The Principles of Psychology (1890), ch. 11
James formulated the key intuition: attention is a selection from a set of possible contents. This intuition translates directly into the UHM formalism: 'the set of possible objects' = the set of A-sector coherences; 'taking possession of one' = increasing one coherence at the expense of the others.
1.2 Filter models (1950–1970s)
Broadbent (1958): early selection filter. Information passes through a narrow 'bottleneck': only one channel is fully processed, the rest are blocked. In UHM this is hard filtering: the coherences of non-target channels are driven to zero.
Treisman (1964): attenuation model. Non-target channels are not fully blocked but attenuated. In UHM this is soft filtering: non-target coherences are suppressed but remain non-zero. This explains the 'cocktail party effect': you can hear your own name in a nearby conversation because the distractor channel is not fully blocked.
Deutsch and Deutsch (1963): late selection filter. All information is fully processed; selection occurs at the response stage. In UHM: all A-sector coherences are moderate, and selection occurs at the action stage, so attention influences action, not perception.
1.3 Posner: components of attention (1980–1990s)
Michael Posner identified three neural 'attention networks':
- Alerting (vigilance) — maintenance of the tonic level
- Orienting (orientation) — redirecting coherence:
- Executive (executive control) — resolving conflict between channels
In UHM, all three networks are described by the unified mechanism of redistribution of A-sector coherences.
1.4 From classical models to UHM
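| Classical model | UHM formalism |
|---|---|
| Broadbent's filter | Non-target coherences driven to zero (hard filtering) |
| Treisman's attenuation | Non-target coherences suppressed but non-zero (soft filtering) |
| Late selection | All A-sector coherences moderate; selection at the action stage |
| Alerting (Posner) | Tonic level of A-sector coherence |
| Orienting (Posner) | Redistribution of coherence between channels |
| Executive (Posner) | Resolution of conflict between channels |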
2. Attention as redistribution of coherence
2.1 Definition
Attention to a pair of dimensions $(A, k^*)$ is a temporary increase in the modulus of the coherence $|\gamma_{Ak^*}|$ with a simultaneous decrease in the other $|\gamma_{Ak}|$ ($k \neq k^*$), constrained by the normalisation $\mathrm{Tr}(\Gamma) = 1$.
More formally: attention is a unitary (or near-unitary) transformation of $\Gamma$ that redistributes coherence from the other A-sector channels into the target channel $\gamma_{Ak^*}$.
2.2 The 'spotlight' mechanism: detailed derivation
Why does increasing one coherence inevitably lead to decreasing others? This is a direct consequence of trace normalisation.
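Step 1. Trace condition: $\mathrm{Tr}(\Gamma) = \sum_i \gamma_{ii} = 1$.
Step 2. Cauchy-Schwarz inequality for each element (valid for a positive semidefinite $\Gamma$):
$$|\gamma_{ij}|^2 \le \gamma_{ii}\,\gamma_{jj}$$
Step 3. The sum of all A-sector coherences is bounded:
$$\sum_{k \neq A} |\gamma_{Ak}|^2 \le \gamma_{AA} \sum_{k \neq A} \gamma_{kk} = \gamma_{AA}\,(1 - \gamma_{AA})$$
The right-hand side is a fixed constant for a given $\gamma_{AA}$. Consequently, the sum of squared moduli is bounded, and once that budget is exhausted, increasing one term requires decreasing at least one other. This is the 'spotlight' mechanism.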
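The spotlight bound can be checked numerically. The sketch below assumes the bound takes the form "sum over k ≠ A of |γ_Ak|² is at most γ_AA·(1 − γ_AA)", which follows from Cauchy-Schwarz plus unit trace for any positive semidefinite matrix. The 7×7 size (suggested by the 21 = C(7,2) pairs of the qualia taxonomy), the use of row 0 as the A dimension, and all function names are illustrative assumptions, not part of the source formalism.

```python
import random

def random_density_matrix(n, seed=0):
    """Random n x n positive semidefinite matrix with unit trace (M M^dagger / trace)."""
    rng = random.Random(seed)
    m = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
         for _ in range(n)]
    # (M M^dagger)_ij = sum_k M_ik * conj(M_jk) is positive semidefinite by construction.
    g = [[sum(m[i][k] * m[j][k].conjugate() for k in range(n)) for j in range(n)]
         for i in range(n)]
    trace = sum(g[i][i].real for i in range(n))
    return [[g[i][j] / trace for j in range(n)] for i in range(n)]

def coherence_budget(gamma, a=0):
    """Return (sum of |gamma_ak|^2 over k != a, bound gamma_aa * (1 - gamma_aa))."""
    d = gamma[a][a].real
    used = sum(abs(gamma[a][k]) ** 2 for k in range(len(gamma)) if k != a)
    return used, d * (1.0 - d)

gamma = random_density_matrix(7)  # hypothetical 7-dimensional example
used, bound = coherence_budget(gamma)
print(f"sum |gamma_Ak|^2 = {used:.4f}, bound = {bound:.4f}")
```

Running it with different seeds always leaves the used budget at or below the bound; the bound is a ceiling, so redistribution becomes a strict trade-off only when the ceiling is reached.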
Everyday analogy. You have a limited attention budget, like a limited amount of water in a bucket. You can water one bed abundantly (focused attention on one channel) and leave the others dry. Or water all of them a little (distributed attention). But the total volume of water is fixed: it is the coherence budget set by the trace normalisation. You cannot 'create' more attention, you can only redistribute it.
Numerical example: attention budget. Let (the fraction of attention in the total 'energy'). Then:
This is the 'budget'. It can be distributed as follows:
| Scenario | | | | | Sum | SNR for target channel |
|----------|:---:|:---:|:---:|:---:|:---:|:---:|
| Full focus | | | | | | 36 |
| Moderate focus | | | | | | |
| Distributed | | | | | | 1 |
With full focus, the SNR (signal-to-noise ratio) for the target channel is 36 — excellent 'reception'. With distributed attention, SNR = 1 — on the detection threshold. This is why attempting to simultaneously read, listen, and think about a third thing results in none of the activities being performed well.
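A minimal sketch of how a fixed budget can be split between channels so that the two SNR values quoted above (36 and 1) come out. It assumes four A-sector channels, a budget normalised to 1, and an SNR defined as the target's squared modulus divided by the mean squared modulus of the other channels. That SNR definition and the function names are assumptions made for illustration.

```python
def split_budget(budget, n_channels, snr):
    """Split a fixed budget of squared moduli so that target / each distractor = snr."""
    distractor = budget / (snr + n_channels - 1)
    return [snr * distractor] + [distractor] * (n_channels - 1)

def snr_of(shares):
    """Hypothetical SNR: target share over the mean share of the other channels."""
    others = shares[1:]
    return shares[0] / (sum(others) / len(others))

for label, snr in [("full focus", 36.0), ("distributed", 1.0)]:
    shares = split_budget(1.0, 4, snr)
    print(label, [round(s, 3) for s in shares], "sum =", round(sum(shares), 3))
```

In both scenarios the shares add up to the same budget; only their distribution changes, which is the point of the bucket analogy.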
2.3 Connection to the 21-pair taxonomy of qualia
From the qualia table:
- Apperception (quale #4): discrimination that has entered interiority
- Morphogenesis (quale #1): crystallisation of forms
- Actualisation (quale #2): actualisation of distinction in process
- Predication (quale #3): distinction as logical predicate
The direction of attention = the choice of which qualitative type dominates. When you contemplate a painting, Morphogenesis dominates (forms); when listening to an argument, Predication (logic); when meditating, Apperception (pure awareness).
2.4 Types of attention
From the normalisation and the Cauchy-Schwarz inequality three attention modes follow:
(a) Selective (focused) attention:
One target channel is amplified at the expense of the others. Signal-to-noise ratio:
Example: reading a book in a noisy room. The text is the target channel; visual distractors and background sounds are the suppressed channels.
(b) Sustained attention:
Maintaining elevated over the interval :
Energy cost — maintaining against dissipation. Over time decreases due to dissipation, and attention 'tires' — active maintenance is required (effort of will).
Example: driving a car on a long straight road. must be maintained continuously, which requires constant 'expenditure' of .
(c) Distributed (divided) attention:
Several are simultaneously elevated, but each is lower than under focused attention:
Consequence: distributed attention is inevitably weaker than focused attention for each individual channel — a direct consequence of normalisation.
Example: driving while simultaneously talking on the phone. The road channel and the conversation channel are both elevated, but each is lower than under focused attention. Research shows a 30–50% reduction in reaction speed, a direct spotlight effect.
2.5 Attention and Gap
Directing attention at channel can reduce — this is the mechanism underlying meditative practices:
Motivation. Why does attention reduce Gap? Formally: increasing intensifies the information flow between and . The intensified flow allows the -operator to more accurately 'see' the state of channel . A more accurate self-model leads to detection of misalignment (Gap), and detected misalignment triggers correction (Gap-reduction).
Numerical example. A mindfulness practitioner directs attention at the breath (channel ):
| Time | | | | Subjective experience |
|-------|:---:|:---:|:---:|:---|
| | | | | Distracted |
| min | | | | "Starting to feel the breath" |
| min | | | | "I see tension in the body" |
| min | | | | "I notice emotions as bodily sensations" |
Strengthening the attention–experience channel correlates with a reduction in opacity in E-sector channels. This formalises the intuition: 'that to which attention is directed becomes more transparent'. Formally, this means that attention is one of the mechanisms by which content transitions from the unconscious to the conscious.
3. Historical perspective: memory
3.1 Hermann Ebbinghaus (1885)
Ebbinghaus was the first researcher to apply the experimental method to the study of memory. His main discoveries:
- Forgetting curve: information is forgotten according to a power law — quickly at first, then ever more slowly. Ebbinghaus approximated this as , .
- Learning curve: repetition improves retention, but with diminishing returns.
- Spacing effect: distributed repetition is more effective than massed practice.
In the UHM formalism, Ebbinghaus's forgetting curve is a direct consequence of the power-law kernel (section 4.5).
3.2 Atkinson and Shiffrin (1968): modal model
The 'three-store' model:
- Sensory register — instantaneous snapshot (duration ~250 ms)
- Short-term (working) memory — 7 ± 2 items, duration ~20 s
- Long-term memory — virtually unlimited capacity and duration
In UHM, these three 'stores' are not separate systems, but three forms of the same kernel $K(\tau)$:
| Atkinson-Shiffrin model | UHM formalism |
|---|---|
| Sensory register | Delta-function kernel (Markovian limit) |
| Working memory | Exponential kernel |
| Long-term memory | Power-law kernel |
3.3 Tulving (1972): types of memory
Endel Tulving introduced the distinction:
- Episodic memory — memory of specific events ('I was at the café yesterday')
- Semantic memory — knowledge of facts ('Paris is the capital of France')
- Procedural memory — skills ('how to ride a bicycle')
In UHM:
- Episodic and semantic memory differ in the shape of the power-law kernel (different values of the exponent)
- Procedural memory is fundamentally different: it is embedded in $H_{\mathrm{eff}}$ (section 4.6)
4. Types of memory from the non-Markovian kernel
4.1 The memory kernel and cognitive memory
Non-Markovian dynamics describes coherences with memory: the current evolution depends on the entire preceding history through the kernel :
(see Gap-dynamics, section 4)
Let us examine each term:
- — free evolution (phase accumulation)
- — memory: the influence of all past states. This is a convolutional integral — the current state depends on a weighted sum of past states
- — regeneration (operator ℛ)
The form of the kernel $K(\tau)$ determines the type of cognitive memory. Analogy: $K(\tau)$ is a 'filter of the past'. Delta function = 'I remember only the present'. Exponential = 'I remember recent events, but forget quickly'. Power function = 'I remember for a long time, I forget slowly'.
4.2 Memory typology
Condition: non-Markovian dynamics of coherences. Four types of memory are defined by the form of the kernel $K(\tau)$:
| Memory type | Kernel | Characteristic | Time scale |
|---|---|---|---|
| Sensory | Delta function | Instantaneous, no persistence | Hundreds of milliseconds |
| Working | Exponential | Exponential decay | Seconds |
| Long-term | Power law | Power-law decay, slow fading | Days to decades |
| Procedural | Embedded in $H_{\mathrm{eff}}$ | Structure of evolution | Unbounded |
4.3 Sensory memory
Markovian limit — no memory. The current coherence state is determined only by current conditions. Physical analogue: an instantaneous sensory imprint that disappears when the stimulus ceases.
Motivation. Why introduce 'zero memory' as a separate type? Because it is the limiting case required for completeness of the classification. The delta function means: 'the past does not influence the present'. Formally: the convolution integral degenerates:
— simply exponential decay with constant .
Analogy. Sensory memory is like a fingerprint on a fogged-up window: it exists while the finger is on the glass and disappears instantly. In the formalism: the kernel is a delta function, so there is no 'tail' and the past does not influence the present.
Numerical example. Iconic memory (visual sensory buffer): at Hz, the half-life is ms. Sperling's experiment (1960): subjects remembered up to 12 letters for ~300 ms, then complete loss. Prediction: ms — consistent with the data.
4.4 Working memory
Exponential kernel — the standard model from non-Markovian dynamics.
Detailed derivation. By Theorem 5.1 of Gap-dynamics, at finite the solution of the convolution equation contains damped oscillations:
where is the decay rate, is the oscillation frequency (refresh rate).
Interpretation of oscillations. Coherence does not simply decay, but oscillates: the subject 'returns' to the content before its final disappearance. Each oscillation cycle is one 'run' of working memory (subvocal rehearsal, visual revision). While , content is 'held'; when damping prevails — content is lost.
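A minimal sketch of why an exponential kernel produces damped oscillations. It keeps only the memory term of the evolution equation (free evolution and regeneration are dropped, which is a simplifying assumption) and uses a hypothetical kernel K(τ) = k0·exp(−λτ). Under that assumption, dγ/dt = −∫₀ᵗ K(t−s) γ(s) ds is equivalent to γ'' + λγ' + k0·γ = 0, a damped oscillator with decay rate λ/2 and angular frequency sqrt(k0 − λ²/4) whenever k0 > λ²/4. The parameter names and values are illustrative; they are chosen so that roughly six cycles fit into 5 s.

```python
import math

def simulate(k0, lam, t_end, dt=1e-3):
    """Integrate gamma' = -k0*m, m' = gamma - lam*m with RK4; gamma(0)=1, m(0)=0.

    m(t) is the auxiliary memory variable int_0^t exp(-lam*(t-s)) * gamma(s) ds.
    """
    def f(g, m):
        return -k0 * m, g - lam * m

    g, m = 1.0, 0.0
    out = [g]
    for _ in range(int(round(t_end / dt))):
        k1 = f(g, m)
        k2 = f(g + 0.5 * dt * k1[0], m + 0.5 * dt * k1[1])
        k3 = f(g + 0.5 * dt * k2[0], m + 0.5 * dt * k2[1])
        k4 = f(g + dt * k3[0], m + dt * k3[1])
        g += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        m += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
        out.append(g)
    return out

def closed_form(k0, lam, t):
    """Underdamped solution of gamma'' + lam*gamma' + k0*gamma = 0, gamma(0)=1, gamma'(0)=0."""
    omega = math.sqrt(k0 - lam ** 2 / 4)
    return math.exp(-lam * t / 2) * (
        math.cos(omega * t) + lam / (2 * omega) * math.sin(omega * t))

k0, lam = 57.0, 0.4  # illustrative values only, not taken from the source
traj = simulate(k0, lam, t_end=5.0)
crossings = sum(1 for a, b in zip(traj, traj[1:]) if a * b < 0)
print("zero crossings in 5 s:", crossings)
print("numeric vs closed form at t = 5 s:", traj[-1], closed_form(k0, lam, 5.0))
```

The auxiliary variable turns the convolution into an ordinary two-variable system, which is why a plain Runge-Kutta step is enough here; a kernel that is not a single exponential would need the history integral evaluated directly.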
Numerical example (detailed). Holding a phone number:
- s (typical working memory duration without rehearsal)
- s (decoherence rate)
- Hz (kernel frequency)
- Refresh frequency: Hz at s
- During holding (5 s): cycles — ~6 'runs'
This is consistent with data on subvocal rehearsal: internally articulating the number at ~2 syllables/s, one can complete ~6 rehearsals of a 7-digit number in 5 seconds.
Working memory oscillations correspond to 'cycling through' the content: coherence does not simply decay but oscillates — the subject 'returns' to the content before its final disappearance. The frequency determines the 'refresh rate' of working memory.
Neurophysiological correlate: gamma oscillations (30–80 Hz) in the prefrontal cortex during information maintenance in working memory. These oscillations are the neural implementation of .
4.5 Long-term memory
Power-law decay — the kernel decreases more slowly than an exponential. This is a 'heavy tail': information is preserved indefinitely, though the intensity gradually falls.
Motivation. Why specifically a power law? Exponential decay () implies a characteristic scale : information 'lives' for approximately , then disappears. But empirical data show that memory has no characteristic scale — forgetting does not speed up or slow down at a particular time horizon. The power law is the only function without a characteristic scale (scale invariance).
Condition: power-law kernel . The coherence amplitude under a power-law kernel decays as:
This reproduces the Ebbinghaus forgetting curve at – (empirical result –).
Argument. Laplace image at (fractional operator). The convolution equation in Laplace representation: , solution: . Inverse transform: . Accounting for the initial condition and factor : , i.e. .
Numerical example: Ebbinghaus forgetting curve. A learned poem with initial coherence and ():
| Time | | Fraction of initial | Subjectively |
|:---:|:---:|:---:|:---|
| 1 day | | 100% | Remember well |
| 7 days | | 56% | Remember the main points |
| 30 days | | 34% | Remember individual stanzas |
| 365 days | | 17% | Remember the theme, individual lines |
| 10 years (≈3650 days) | | 8% | Vague recollection |
| 50 years (≈18250 days) | | 5% | Traces remain |
Note: even after 50 years — traces remain! This is a fundamental difference from exponential decay, under which after 50 years — practically zero. The power-law tail explains why elderly people remember events from 60 years ago — the 'tail' of the kernel decays slowly.
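The contrast between the two tails can be sketched in a few lines. The percentages in the table are close to a pure power law (t / 1 day)^(−β) with β ≈ 0.3; that exponent is inferred here from the table's own numbers and is an assumption, as are the function names and the 30-day time constant used for the exponential comparison.

```python
import math

def power_law_fraction(t_days, beta=0.3):
    """Fraction of the day-1 amplitude remaining after t_days under t**(-beta)."""
    return t_days ** (-beta)

def exponential_fraction(t_days, tau_days=30.0):
    """Fraction remaining under exp(-(t - 1)/tau), normalised to 1 at day 1."""
    return math.exp(-(t_days - 1.0) / tau_days)

for t in [1, 7, 30, 365, 3650, 18250]:
    print(f"{t:>6} d  power law: {power_law_fraction(t):6.1%}   "
          f"exponential: {exponential_fraction(t):.1e}")
```

With these assumed parameters the power law stays within about two percentage points of the tabulated fractions, while the exponential is numerically indistinguishable from zero long before the 50-year mark.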
4.6 Procedural memory
Procedural memory is not a kernel in the coherence equation, but the structure of the Hamiltonian $H_{\mathrm{eff}}$ itself. A skill is 'encoded' in the parameters of evolution: frequencies, coupling constants, Lindblad operators.
Motivation. Why does procedural memory differ so fundamentally from the others? Because it stores not content (a coherence), but a rule (how that coherence evolves). Declarative memory is the 'what', procedural is the 'how'.
Procedural memory is fundamentally different from all other types: it does not decay, since it does not depend on the kernel $K(\tau)$, but is embedded in the mechanism of evolution itself. To 'forget' procedural memory = to change $H_{\mathrm{eff}}$, which requires a structural restructuring of the system, not merely the decoherence of individual coherences.
Analogy. Declarative memory (working + long-term) is like notes on a blackboard that gradually fade. Procedural memory is the shape of the blackboard itself: you can erase all the notes, but the board will remain rectangular. The ability to ride a bicycle is 'encoded' not in the coherences (which decay), but in the structure of $H_{\mathrm{eff}}$ (which is restructured only under fundamental changes).
Another analogy: declarative memory — the source code of a program (data that can be deleted); procedural — the compiler (the tool that processes the data). The compiler can only be 'forgotten' by reinstalling the operating system.
Numerical example: why cycling is not forgotten. A person learned to ride a bicycle at age 7. By age 70:
| Memory type | Content | Kernel | After 63 years |
|---|---|---|---|
| Episodic | "Dad was holding the handlebars" | Power law | |
| Semantic | "A bicycle has two wheels" | Power law | |
| Procedural | Riding skill | None (embedded in $H_{\mathrm{eff}}$) | Fully preserved |
Procedural memory does not depend on the kernel: it is 'hardwired' into $H_{\mathrm{eff}}$. Neurophysiological correlate: procedural memory relies on the cerebellum and basal ganglia rather than the hippocampus, which supports declarative memory. Different neural substrates hold different 'records'.
5. Forgetting as kernel decoherence
Forgetting — a decrease in the amplitude of the memory kernel over time, leading to a weakening of the influence of past states on the current dynamics:
In the Markovian limit (delta-function kernel) forgetting is instantaneous. With a finite-memory kernel it is gradual.
5.1 Two mechanisms of forgetting
| Mechanism | Description | Formula | Reversibility | Analogy |
|---|---|---|---|---|
| Kernel decoherence | Kernel amplitude decreases | | Irreversible | Book burned |
| Gap increase | Coherence opaque | | Reversible | Book locked in a safe |
The distinction is fundamental: if content is 'forgotten' through kernel decoherence, recovery is impossible, because the information is physically lost. If it is forgotten through Gap increase, the content is preserved in $\Gamma$ but inaccessible (= in the unconscious). Therapy and meditation work with the second case.
Analogy (extended). Kernel decoherence — the book has burned: the text is lost forever, and no archaeologist can restore the letters from ash. Gap increase — the book is locked in a safe: the text is intact but inaccessible; the key can be picked (therapy), the safe forced open (crisis), or a spare key found (meditation). The difference is colossal for corrective strategies: there is no point in 'opening the safe' if the book has already burned.
Numerical example: two types of 'forgotten' phone number.
Case 1 (kernel decoherence): a number heard 5 years ago without being written down. has long decayed ( s), and the power-law kernel as well: . Recovery is impossible.
Case 2 (Gap increase): a former partner's number, consciously 'forgotten' after a breakup. (coherence preserved — you 'know' the number), but (consciously blocked). With an unexpected stimulus (meeting on the street) Gap can temporarily decrease — and the number is 'remembered'.
5.2 Forgetting rate and viability
Condition: non-Markovian dynamics. The forgetting rate (rate of decrease of ) is bounded from below by the viability condition:
As purity approaches the viability threshold, forgetting accelerates without bound: a system at the edge of viability loses memory faster.
Derivation. Viability determines the 'resource' available for maintaining coherences. The closer purity is to the viability threshold, the less resource remains for maintaining the kernel $K(\tau)$, and the faster it decays. Formally: the decay rate diverges as purity approaches the threshold, which gives a singularity there.
Numerical example: cognitive decline.
| State | Relative forgetting rate | Clinical analogue |
|---|---|---|
| Healthy adult | (baseline) | Normal memory |
| Onset of decline | | Mild cognitive impairment |
| Moderate dementia | | Noticeable memory loss |
| Severe dementia | | Catastrophic |
| Critical | | Loss of identity |
This explains the clinical observation: in dementia, cognitive decline accelerates, slowly at first, then catastrophically. A small decrease in purity near the viability threshold leads to a dramatic acceleration of forgetting. The formula reproduces this nonlinear pattern.
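The nonlinear acceleration can be illustrated with a short sketch. It assumes the simplest law with a singularity at the threshold, a forgetting rate proportional to 1/(P − P_crit), where P stands for purity and P_crit for the viability threshold. That law, both symbols, and every numerical value below are assumptions for illustration, not values from the source.

```python
def relative_forgetting_rate(purity, p_crit, p_baseline):
    """Forgetting rate relative to baseline, assuming rate ~ 1 / (purity - p_crit)."""
    if purity <= p_crit:
        raise ValueError("purity must exceed the viability threshold")
    return (p_baseline - p_crit) / (purity - p_crit)

P_CRIT = 0.30      # hypothetical threshold, for illustration only
P_BASELINE = 0.50  # hypothetical purity of the healthy baseline

for purity in [0.50, 0.40, 0.35, 0.32, 0.31]:
    rate = relative_forgetting_rate(purity, P_CRIT, P_BASELINE)
    print(f"P = {purity:.2f}  relative rate = {rate:5.1f}")
```

Equal steps in purity produce ever larger jumps in the rate as the threshold is approached, which is the 'slowly at first, then catastrophically' pattern described above.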
6. Integration: attention, memory and Gap
Key cycle: Attention (raising the target A-sector coherence) reduces Gap: content transitions from unconscious to conscious. Kernel decay (falling $K(\tau)$) raises Gap: content recedes back into the unconscious. Working memory keeps content 'afloat' through oscillations. Procedural memory exits this cycle: it is embedded in the structure of the system.
6.1 Interaction of attention and memory
This cycle formalises the intuition of 'attention as the key to consciousness' and explains why mindfulness practices are therapeutically effective: systematically directing attention gradually reduces Gap in the channels to which it is directed. With regular repetition, content transitions from working memory (exponential kernel) to long-term memory (power-law kernel), and the mindfulness skill to procedural memory ($H_{\mathrm{eff}}$). For more detail, see the CC theorems (T-103, T-104).
What we learned
- Historical line of attention: James (1890, 'everyone knows what attention is') → Broadbent (1958, filter) → Treisman (1964, attenuation) → Posner (1980s, three networks) → UHM (redistribution of A-sector coherences)
- Attention = redistribution of coherence in the A-sector; three types (selective, sustained, distributed) follow from the normalisation
- Attention reduces Gap in target channels — formal justification for mindfulness practices
- Historical line of memory: Ebbinghaus (1885, forgetting curve) → Atkinson-Shiffrin (1968, three stores) → Tulving (1972, types of memory) → UHM (forms of kernel $K(\tau)$)
- Four types of memory are determined by the form of kernel $K(\tau)$: sensory (delta function), working (exponential), long-term (power law), procedural (embedded in $H_{\mathrm{eff}}$)
- The Ebbinghaus forgetting curve is reproduced by the power-law kernel at – [C]
- Two mechanisms of forgetting: kernel decoherence (irreversible, 'book burned') and Gap increase (reversible, 'book in a safe')
- Near the viability threshold forgetting accelerates without bound: a formal explanation of cognitive decline in dementia
Attention and memory are normal mechanisms of coherence control. But what happens when these mechanisms break down? Specific failures of the Gap-profile give rise to pathological states: alexithymia, dissociation, depression, psychosis. In the next chapter — Pathology of consciousness — we will show that each pathology = a characteristic Gap-pattern, and that therapy = targeted Gap-reduction.
Connections
- Coherence matrix: Definition of Γ — A-sector coherences
- Evolution: Equations of motion — full equation for
- Non-Markovian dynamics: Memory kernel — forms of
- Gap-dynamics: Non-Markovian oscillations — Gap oscillations
- Qualia: 21-pair taxonomy — qualia types associated with the A-dimension
- Unconscious: Gap-structure of the unconscious — connection between forgetting and the unconscious
- ASC: Meditation and attention — shamatha as training of
- CC Theorems: Coherence Cybernetics — T-103 (hedonic vector) and T-104 (stability)