Optimal Learning Bounds
"Nature does not hurry, yet everything is accomplished." — Laozi
In the previous chapter we learned how to implement CC in code: from initialisation to a complete control cycle. But code, however fast it runs, cannot circumvent fundamental constraints. How many examples are truly needed to learn? Shannon, Valiant, and Landauer each asked this question, each in his own language. CC unites all three answers for the first time in a single theorem.
In this chapter we:
- Formalise the learning task for the holon (§1)
- Prove the information bound T-109: how many observations are needed (§2)
- Prove the dynamical bound T-110: how many observations the system will manage to integrate (§3)
- Prove the stabilisation bound T-111: will learning kill the learner? (§4)
- Combine the three bounds into the optimal T-112 (§5)
- Prove minimality N=7 for learning T-113 (§6)
- Carry out a numerical calculation for binary discrimination (§7)
- Compare with classical learning theory — PAC, VC, Shannon, Landauer (§8)
- Extract practical implications for AI, education, and therapy (§9)
A child picks up a hot cup and pulls away their fingers. How many times must they be burned to understand? Once — if the signal is strong enough. Ten times — if the cup is only slightly warm. And if the child is playing, tired, and distracted — even more. Behind this everyday intuition lies a fundamental question: do absolute lower bounds on the learning rate exist — limits that cannot be overcome by improving the algorithm or increasing computational power?
In the twentieth century this question was answered three times — and each answer opened a new horizon:
- Claude Shannon (1948) showed that channel capacity is limited — no encoding allows transmitting more than $C$ bits per second through a noisy channel. This was an information-theoretic bound.
- Leslie Valiant (1984) created PAC-learning and proved that the number of examples required for learning grows at least logarithmically in the number of hypotheses and inversely proportionally to the square of the accuracy. This was a statistical bound.
- Rolf Landauer (1961) established that erasing one bit of information inevitably releases energy $k_B T \ln 2$. This was a thermodynamic bound.
Shannon and channel capacity. In 1948 Claude Shannon, working at Bell Labs, proved a theorem that transformed engineering: there exists a limit of $C$ bits/s above which no encoding allows error-free transmission. Before Shannon, engineers sought the "ideal code"; after him, they understood that the ideal is mathematically defined and achievable. The information bound T-109 inherits this spirit: the quantum Chernoff exponent plays the role of Shannon channel capacity, and the minimum number of observations is the quantum analogue of the Shannon limit.
Valiant and learning complexity. In 1984 Leslie Valiant (future Turing Award laureate) formalised the concept of "learnability" — PAC-learning (Probably Approximately Correct). His key result: the number of examples needed for learning is proportional to $\frac{1}{\epsilon^2}\ln|\mathcal{H}|$, where $|\mathcal{H}|$ is the number of hypotheses and $\epsilon$ is the accuracy. This is a statistical bound: it does not depend on who is learning — a human, a computer, or a bacterium. The dynamical bound T-110 adds what Valiant lacks: time. A PAC-learner has no inertia; a CC-holon does (Fano contraction, T-110).
Landauer and the cost of erasure. Landauer showed that information is not an abstraction, but a physical object. Erasing one bit inevitably releases $k_B T \ln 2 \approx 3 \times 10^{-21}$ J at room temperature. In 2012 the Bérut group confirmed this experimentally. For CC this means: Fano contraction (T-110) is not a mathematical abstraction, but a thermodynamic process. Every step in which the contraction erases coherence is a physical event requiring energy dissipation.
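Landauer's figure is easy to reproduce. A minimal Python check using only standard physical constants (nothing CC-specific is assumed here):

```python
import math

# Landauer limit: minimum energy dissipated by erasing one bit,
# E = k_B * T * ln(2).
k_B = 1.380649e-23  # Boltzmann constant, J/K (exact, SI 2019)
T_room = 300.0      # room temperature, K

E_bit = k_B * T_room * math.log(2)
print(f"Landauer limit at {T_room:.0f} K: {E_bit:.2e} J per bit")
# ~2.87e-21 J — the order of magnitude confirmed by Bérut et al. (2012)
```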
Each of these bounds works in its own domain. But none of them accounts for the specificity of the living learner — a system that simultaneously receives information, integrates it into its dynamics, and must remain alive while doing so. A child burning their fingers is not an abstract PAC-learner, not a Shannon channel, and not a thermodynamic machine. They are a coherent system with limited perceptual bandwidth, finite speed of internal dynamics, and a finite reserve of stability.
Coherence Cybernetics unites all three constraints for the first time in a single theorem. The information bound (T-109) inherits the spirit of Shannon, but operates on quantum states. The dynamical bound (T-110) adds time — the rate at which the system can integrate the received information without losing it to the internal contraction flow. The stabilisation bound (T-111) adds fragility — a constraint on the strength of influences the system can withstand without breaking. Together (T-112) they form a triple lock, all three bolts of which must be opened for successful learning.
And theorem T-113 closes the circle: $N = 7$ is the minimal architecture in which all three locks exist at all. A system of smaller dimension is incapable of learning through regeneration — not because it lacks data, but because it lacks self-observation.
In this document:
- coherence matrix (the system state)
- purity
- target state (categorical self-model)
- spectral gap of the linear part (T-39a [T])
- minimal regeneration (T-59 [T])
- information capacity (T-107 [T])
- stability radius (T-104 [T])
- Enc — perception functor (T-100 [T])
- Dec — action functor (T-101 [T])
This document establishes fundamental lower bounds on the learning rate for a holonomic system. Learning is formalised as the process of updating the self-model on the basis of observations arriving through the perception functor Enc, with the goal of optimising the action functor Dec.
Key result: the learning rate is bounded by three independent mechanisms — informational (T-109), dynamical (T-110), and stabilisation (T-111). Their combination (T-112) gives the optimal bound, and theorem T-113 proves that $N = 7$ is the minimal architecture capable of learning through regeneration.
1. Formal Definition of the Learning Task
1.1 Learning Task for the Holon
The learning task for the holon consists of:
- Hypothesis space — a finite set of environmental states (unknown to the agent)
- Action space — the admissible actions
- Reward function encoding correct behaviour
- Reliability level $1-\delta$, where $\delta$ is the admissible error probability
Connection with dynamics. Each observation under a given hypothesis arrives through the functor Enc (T-100 [T]):
and modifies the coherence matrix via the 3-channel evolution equation (T-102 [T]).
1.2 Criterion for Successful Learning
The task is solved in $n$ observations if after $n$ steps the action produced by the action functor Dec (T-101 [T]) coincides with the optimal action under the true hypothesis, with probability at least $1-\delta$.
Minimum number of observations:
1.3 Learning as Attractor Update
Unlike classical learning (updating model parameters), learning in UHM is a change of the attractor of the dynamical system:
- Observation enters through Enc → the state is perturbed
- The self-model is updated (T-62 [T], its physical realisation)
- The regenerative term drives the state toward the updated attractor
- The functor Dec adapts the action to the new attractor
Analogy: learning in classical machine learning is adjusting knobs on the dashboard (updating weights). Learning in CC is changing the very shape of the river along which water flows: the new attractor draws the system toward new behaviour from within, without an external controller.
Two learning modes:
| Mode | Regeneration rate | Time | Context |
|---|---|---|---|
| Genesis (bootstrap) | minimal (T-59) | slow | Initial bootstrap, no self-model yet |
| Active learning | above minimal | faster than genesis | After reaching the attractor |
2. Information Lower Bound (T-109) [T]
Intuition: Why Information Limits Learning
Imagine you are trying to determine which of the two coins in front of you is fair (50/50) and which is slightly biased (51/49). Even with perfect eyesight and unlimited time to think, you will need to toss the coins many times to distinguish one from the other. The closer the coins are in their properties, the more tosses are needed. This is the information limit: it is determined not by your analytical abilities, but by the amount of information each observation contains.
In classical statistics this limit is given by the Cramér-Rao inequality and the Chernoff exponent. In CC an observation is a quantum channel mapping an external signal to a deformation of the coherence matrix. Therefore the role of the classical exponent is played by the quantum Chernoff exponent — a measure of the distinguishability of two quantum states.
Analogy with language learning: every sentence heard is an "observation." If two languages differ strongly (Russian and Chinese), a few phrases suffice for their distinction. If they differ little (two closely related dialects), hundreds of examples are needed. The information bound T-109 says: however brilliant the learner, one sentence will not suffice to distinguish closely related dialects — this is not a matter of intelligence, but of the physics of information.
Theorem T-109 (Information Bound on Learning) [T]
For a learning task with $|\mathcal{H}|$ hypotheses, the minimum number of observations satisfies
$$n_{\min} \;\ge\; \frac{\ln(1/\delta)}{\xi_{QCB}}$$
where $\xi_{QCB}$ is the quantum Chernoff exponent for the pair of closest post-observation states:
$$\xi_{QCB} \;=\; -\ln \min_{0 \le s \le 1} \operatorname{Tr}\!\left(\rho_1^{\,s}\,\rho_2^{\,1-s}\right)$$
and $\rho_1$, $\rho_2$ are the states after observation under the two closest hypotheses.
A universal upper bound on the exponent (proof step 4) turns this into an absolute floor on the number of observations.
Why this bound is tight. The absolute minimum is achieved when the two observations lead to orthogonal pure post-observation states — maximally distinguishable configurations. This is the ideal case: "hot" and "cold" are completely unlike. In reality hypotheses generate close states, and the bound grows as $1/\Delta^2$.
Proof.
- Quantum hypothesis discrimination. An observation under a given hypothesis generates a post-observation state — a CPTP image under Enc (T-100 [T]). The learning task includes the subtask of distinguishing at least the two closest hypotheses.
- Quantum Chernoff bound (Audenaert et al. 2007): for $n$ independent observations, the optimal error of distinguishing two states decays as $P_e \sim e^{-n\,\xi_{QCB}}$.
- Reliability condition. From $P_e \le \delta$: $n \ge \ln(1/\delta)/\xi_{QCB}$.
- Upper bound on the exponent. From T-107 [T]: the information extractable from one observation does not exceed the Holevo quantity. The quantum Chernoff exponent is bounded by the quantum relative entropy of the two post-observation states (the upper bound is attained for orthogonal pure states).
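The quantum Chernoff exponent has the standard closed form $\xi_{QCB} = -\ln\min_{0\le s\le 1}\operatorname{Tr}(\rho^s\sigma^{1-s})$ (Audenaert et al. 2007), which is easy to evaluate numerically. A small sketch with two illustrative commuting qubit states standing in for the post-observation states (the CC-specific 7-dimensional states are not reconstructed here):

```python
import numpy as np

def mat_pow(rho, s):
    """Fractional power of a Hermitian PSD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(rho)
    w = np.clip(w, 0.0, None)          # guard against tiny negative eigenvalues
    return (V * w**s) @ V.conj().T

def chernoff_exponent(rho, sigma, grid=1001):
    """Quantum Chernoff exponent: -ln min_s Tr(rho^s sigma^(1-s))."""
    ss = np.linspace(0.0, 1.0, grid)
    vals = [np.trace(mat_pow(rho, s) @ mat_pow(sigma, 1 - s)).real for s in ss]
    return -np.log(min(vals))

# Two close states: rho = diag(0.5+d, 0.5-d), sigma = diag(0.5-d, 0.5+d)
d = 0.05
rho   = np.diag([0.5 + d, 0.5 - d])
sigma = np.diag([0.5 - d, 0.5 + d])
xi = chernoff_exponent(rho, sigma)
print(f"xi_QCB = {xi:.5f}")                      # small for close states
print(f"n for delta=0.05: {np.log(1/0.05)/xi:.0f} observations")
```

For this contrast the exponent comes out near $2d^2$, illustrating the $1/\Delta^2$ scaling discussed in §2.1: halving the contrast quadruples the required number of observations.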
2.1 Asymptotics for Close Hypotheses
If hypotheses generate close states, with contrast $\Delta \ll 1$ between them, then the exponent is quadratically small: $\xi_{QCB} \propto \Delta^2$.
Substituting into T-109:
This reproduces the classical $n \propto 1/\Delta^2$ scaling for weak signals. The difference from the classical case: the proportionality factor is determined by the quantum geometry of the state space, not by an arbitrary noise distribution.
2.2 Numerical Estimates
| Parameters | ||
|---|---|---|
| Orthogonal signals () | ||
| Strong contrast () | ||
| Weak contrast () |
At :
| Contrast | at |
|---|---|
| (maximum) | |
3. Dynamical Lower Bound (T-110) [T]
Intuition: Why Dynamics Limits Learning
The information bound says how many observations are needed. The dynamical bound says how many observations the system will manage to integrate. The difference is fundamental.
Imagine a student at a lecture. The professor speaks at 150 words per minute — enough information. But if the student takes notes slowly, part of the information is lost before it can be comprehended. Moreover, early notes are erased from short-term memory while the student is processing new ones. This is a competition between two processes: recording (each observation adds signal) and erasure (internal dynamics blurs the old signal).
In CC erasure has a precise name: Fano contraction with rate $\lambda$ (T-39a). The linear part of the Lindbladian exponentially drives the state toward the maximally mixed state. Each observation is a "recording" of amplitude $a$, but previous recordings decay at rate $\lambda$. The stationary limit of the accumulated signal determines whether it is possible at all to accumulate sufficient signal.
Analogy from neuroscience: short-term memory decays within 15–30 seconds (Peterson & Peterson, 1959). To transfer information to long-term memory, consolidation is required — and it takes time. The dynamical bound T-110 is the formal expression of this neuropsychological fact in the language of the coherence matrix.
Theorem T-110 (Dynamical Bound on Learning) [T]
For a learning task with observations of amplitude $a$ arriving at interval $\Delta t$:
$$n_{\mathrm{dyn}} \;\ge\; \frac{1}{\lambda \Delta t}\,\ln\frac{1}{\,1 - d_{\min}\left(1 - e^{-\lambda \Delta t}\right)/a\,}$$
where:
- $\lambda$ — contraction rate (T-39a [T])
- $d_{\min}$ — minimum Bures distance for reliable discrimination
- $a$ — signal amplitude of one observation
At the natural scale $\Delta t \sim 1/\lambda$ (one observation per relaxation time) the bound depends only on the ratio $d_{\min}/a$.
What happens at the limit. As the amplitude $a$ approaches the solvability threshold from above at fixed $\Delta t$, the dynamical bound diverges logarithmically — signals that are too weak are erased faster than they accumulate. If $\Delta t \to 0$ (observations too frequent), each new signal arrives before the previous one has had time to affect the state, and the effective learning rate does not increase. There exists an optimal observation rate at which the dynamical bound is minimal.
Proof.
- Fano contraction. The linear part contracts all deviations from the stationary state at exponential rate $\lambda$ (T-39a [T]). This means that information recorded in the state decays over time.
- Signal accumulation. The observation at step $k$ contributes a signal of amplitude $a$. By step $n$ this contribution has decayed to $a\,e^{-\lambda (n-k)\Delta t}$. Total accumulated signal:
$$S_n \;=\; a \sum_{k=1}^{n} e^{-\lambda (n-k)\Delta t} \;=\; a\,\frac{1 - e^{-n\lambda \Delta t}}{1 - e^{-\lambda \Delta t}}$$
- Stationary limit. As $n \to \infty$:
$$S_\infty \;=\; \frac{a}{1 - e^{-\lambda \Delta t}}$$
- Discrimination condition. For reliable distinction: $S_n \ge d_{\min}$. Solving for $n$ gives the bound of the theorem. At $\lambda \Delta t \ll 1$ (typical regime) the first approximation $S_\infty \approx a/(\lambda \Delta t)$ holds, using $1 - e^{-x} \approx x$.
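The recording-vs-erasure competition above is a geometric series, which can be checked directly. A sketch with illustrative values of the amplitude, contraction rate, and interval (placeholders, not the book's calibrated parameters):

```python
import math

def accumulated_signal(a, lam, dt, n):
    """Signal after n observations of amplitude a, each decaying as exp(-lam*t).
    Closed form of the geometric series a * sum_k exp(-lam*(n-k)*dt)."""
    r = math.exp(-lam * dt)
    return a * (1 - r**n) / (1 - r)

a, lam, dt = 0.02, 1.0, 1.0            # illustrative values only
S_inf = a / (1 - math.exp(-lam * dt))  # stationary limit of accumulation
for n in (1, 5, 20, 100):
    print(n, round(accumulated_signal(a, lam, dt, n), 5))
print("stationary limit:", round(S_inf, 5))
# If S_inf < d_min, the task is unsolvable: erasure beats recording.
```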
3.1 Physical Meaning
The dynamical bound expresses the competition between recording and erasure:
- Recording: each observation adds a signal of amplitude $a$
- Erasure: Fano contraction removes a fraction $\lambda$ of the accumulated deviation per unit time
- Balance: stationary signal $S_\infty = a/(1 - e^{-\lambda \Delta t})$
If $S_\infty < d_{\min}$, the task is unsolvable at the given parameters — contraction erases the signal faster than it accumulates. Necessary condition for solvability: $a \ge d_{\min}\,(1 - e^{-\lambda \Delta t})$.
3.2 Role of Regeneration
The regenerative term counteracts contraction for components aligned with the attractor. After learning (when the self-model has been updated):
- Components aligned with the learned attractor are strengthened by regeneration
- Components not aligned with it continue to decay
This means that learned information is stabilised in the attractor, while noise is washed out. The effective erasure rate for the learned signal is the contraction rate minus the regeneration rate.
When regeneration dominates contraction, the attractor is stable. From T-98 (balance) [T]: this condition is satisfied for viable states.
4. Stabilisation Lower Bound (T-111) [T]
Intuition: Why Stability Limits Learning
The first two bounds describe whether enough information exists and whether the system manages to process it. The third bound adds a question that classical learning theory usually ignores: will learning kill the learner?
This is not a metaphor. In CC the system is viable only above a purity threshold. Each observation is a perturbation that pushes the state away from the current attractor. Too strong a perturbation pushes the purity below the viability threshold. A system that learns too fast risks destabilising itself.
The biological parallel is clear: traumatic experience can be informative (once — and for life), but too strong a stress causes PTSD or even death. A therapist knows that dosage matters more than content: the right information, delivered too quickly, destroys rather than heals.
In the context of neural network training the stabilisation bound corresponds to the intuition behind choosing a learning rate: too large — and training diverges; too small — and training fails to converge. But in CC this is not merely an engineering heuristic, but a theorem: the maximum observation amplitude is bounded by the stability radius, which is computed exactly from the current state.
Theorem T-111 (Stabilisation Bound on Learning) [T]
Learning must not destabilise the holon: the observation amplitude is bounded from above by the stability radius (T-104 [T]).
In the presence of stochastic noise in observations (signal-to-noise ratio SNR), the number of observations required to overcome the noise scales as $n_{\mathrm{stab}} \ge (\mathrm{SNR}_{\min}/\mathrm{SNR})^2$.
In the typical regime (SNR < 1, a noisy environment) this is the dominant bound.
What happens at the limit. Consider limiting cases:
- When the system sits at the viability boundary, the stability radius shrinks to zero and any non-trivial observation is dangerous. The system is "frozen" — it cannot learn until it has restored its purity reserve. This is the CC analogue of the clinical state: a patient in severe depression does not absorb therapeutic interventions, because their resources are exhausted.
- As SNR → 0 (pure noise), the required number of observations diverges — learning is impossible, not because there is no information, but because every useful signal drowns in the noise, while the noise destabilises the system.
Proof.
- Amplitude constraint. From T-104 [T]: a perturbation whose amplitude exceeds the stability radius can drive the system beyond the viability boundary. Since learning requires the system to remain viable, the amplitude of each observation is bounded from above.
- Noise model. Each observation contains a useful signal of amplitude $a$ and noise of amplitude $\sigma$. Noise enters through the dissipative channel (the most dangerous channel); the constraint from T-104 applies to the combined perturbation.
- Noise averaging. For $n$ observations with independent noise, the effective signal grows as $n$ and the noise as $\sqrt{n}$. Signal-to-noise ratio after $n$ observations: $\mathrm{SNR}_n = \sqrt{n}\cdot\mathrm{SNR}_1$.
- Reliability condition. For $\mathrm{SNR}_n \ge \mathrm{SNR}_{\min}$ (the reliable discrimination threshold): $n \ge (\mathrm{SNR}_{\min}/\mathrm{SNR}_1)^2$.
Connection with T-69 (topological protection [T]): barriers guarantee that discrete phase transitions are impossible — learning is always continuous, and random noise cannot cause a catastrophic jump.
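The $\sqrt{n}$ averaging step in the proof can be verified by simulation. A Monte Carlo sketch with illustrative signal and noise amplitudes (not taken from the text):

```python
import random
import statistics

def empirical_snr(a, sigma, n, trials=20000):
    """SNR of the summed signal over n noisy observations, estimated by
    Monte Carlo: mean of the sum divided by its standard deviation."""
    sums = [sum(a + random.gauss(0, sigma) for _ in range(n))
            for _ in range(trials)]
    return statistics.mean(sums) / statistics.pstdev(sums)

random.seed(0)
a, sigma = 1.0, 2.0                    # per-observation SNR = 0.5 (illustrative)
for n in (1, 4, 16):
    print(n, round(empirical_snr(a, sigma, n), 2))
# SNR grows as sqrt(n): roughly 0.5, 1.0, 2.0 — so reaching a target SNR_min
# requires n >= (SNR_min / SNR_1)^2 observations.
```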
4.1 Learning-Stability Trade-off
There exists a fundamental trade-off: strong observations (a large amplitude) accelerate learning (they reduce the required number of observations), but threaten stability (they increase the risk of crossing the viability boundary).
The optimal amplitude is the largest one that still satisfies the stability constraint of T-111: learning proceeds at the edge of safety.
Substituting it into T-109 gives the optimal learning rate at a given stability reserve.
4.2 Three Stability Zones
From T-106 (diagnostic regimes) [C under calibration]:
| Zone | Available amplitude | Learning mode |
|---|---|---|
| Normal | Large | Fast learning — strong signals can be used |
| Warning | Medium | Careful learning — limit the amplitude |
| Critical | Small | Learning halted — survival priority |
5. Combined Optimal Bound (T-112) [T]
Intuition: Three Locks on One Door
Each of the three bounds is a necessary condition, but none of them is sufficient. They describe three different mechanisms limiting learning:
- T-109 (information): "is there enough data?" — constraint on the quantity of observations
- T-110 (dynamics): "can the system keep up?" — constraint on the rate of integration
- T-111 (stability): "will the system hold?" — constraint on the strength of influences
Like three locks on one door, all three must be opened simultaneously. The bottleneck is determined by the slowest of the three — the strongest lock.
Neural network training provides a good illustration. At the start of training, when the model is far from the optimum, the bottleneck is usually information (one simply needs more data). In the middle — dynamics (the model slowly restructures its weights). Toward the end — stability (each training step risks worsening what has already been achieved). An optimal learning rate scheduler intuitively switches between these regimes — CC makes this switching a theorem.
Theorem T-112 (Optimal Learning Bound) [T]
Minimum number of observations for solving the learning task:
$$n_{\min} \;=\; \max\left\{\, n_{\mathrm{info}},\; n_{\mathrm{dyn}},\; n_{\mathrm{stab}} \,\right\}$$
where:
- $n_{\mathrm{info}}$ — information bound (T-109)
- $n_{\mathrm{dyn}}$ — dynamical bound (T-110)
- $n_{\mathrm{stab}}$ — stabilisation bound (T-111)
Learning passes through three regimes, determined by the bottleneck:
Proof. Each of the three bounds is a necessary condition. If at least one of them is not satisfied:
- $n < n_{\mathrm{info}}$: insufficient information to distinguish hypotheses → the error probability exceeds $\delta$
- $n < n_{\mathrm{dyn}}$: dynamics has not managed to integrate the signal → the accumulated signal is below $d_{\min}$
- $n < n_{\mathrm{stab}}$: noise dominates over signal → unreliable discrimination
Since all three conditions are simultaneously necessary, the minimum is the maximum of the three.
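The combined bound is simply the maximum of the three, and the bottleneck is whichever bound attains it. A minimal sketch (the three input numbers are placeholders, not calibrated CC values):

```python
def n_min(n_info, n_dyn, n_stab):
    """Combined optimal bound T-112: all three conditions are necessary,
    so the minimum number of observations is the maximum of the three."""
    bounds = {"information": n_info, "dynamics": n_dyn, "stability": n_stab}
    bottleneck = max(bounds, key=bounds.get)
    return bounds[bottleneck], bottleneck

# Illustrative regimes:
print(n_min(12, 3, 5))    # clean environment -> information-limited
print(n_min(4, 30, 5))    # fast signals      -> dynamics-limited
print(n_min(4, 3, 80))    # noisy/stressed    -> stability-limited
```

The three calls mirror the regime switching described above: in different situations a different lock becomes the strongest one.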
5.1 Regime Diagram
5.2 Including Genesis Time
For a system starting from the fully mixed state, the total time to solving the task includes genesis:
$$T_{\mathrm{total}} \;=\; \tau_{\mathrm{gen}} + n_{\min}\,\Delta t$$
where $\tau_{\mathrm{gen}}$ (T-59 [T]) is the bootstrap time.
6. Optimality of N=7 for Learning (T-113) [T]
Intuition: Why Learning Requires a Specific Architecture
So far we have derived learning bounds for the fixed architecture $N = 7$. Theorem T-113 poses a deeper question: what is the minimal architecture capable of learning through regeneration?
The answer is surprisingly precise: $N = 7$ — neither more nor less. Systems with $N < 7$ are incapable of learning in principle, while systems with $N > 7$ can learn, but do so less efficiently.
The key link is self-observation. Learning in CC is the update of the self-model. Updating requires comparing the current state with the model, i.e. non-zero reflection. And reflection, in turn, requires a replacement channel that relies on the Fano plane PG(2,2). And the Fano plane exists only at $N = 7$.
Analogy with child development: a newborn does not "learn" in the strict sense — they do not yet have a self-model that can be updated. Learning begins when the child perceives the gap between expectation and reality — and this requires self-observation. Theorem T-113 makes this pedagogical intuition rigorous: without reflection there is no learning, and reflection requires the Fano structure, which exists only at $N = 7$.
Theorem T-113 (Minimality of N=7 for Learning) [T]
Let $N$ be the dimension of the internal space of the holon. Then:
- For $N < 7$: learning through regeneration is impossible
- For $N = 7$: learning is possible with a finite optimal bound (T-112)
- For $N > 7$: learning is possible, but requires strictly more resources:
- Genesis time: longer
- Parameter space: larger ($N^2 - 1$ real parameters)
- No new qualitative capabilities arise
$N = 7$ is the only Pareto-optimal point in the plane (learning capacity, system complexity).
Proof.
- Necessity of self-observation for learning. Learning = update of the self-model. Updating requires comparing the current state with the model, i.e. access to information about one's own state. Formally: a replacement channel with a non-zero reflection measure is required.
- Necessity of Fano structure for self-observation. The replacement channel (T-77 [T], Lindblad operators) requires the Fano plane for the definition of the optimal Lindblad operators (T-82 [T]).
- The Fano plane requires $N = 7$. PG(2,2) has 7 points and 7 lines. Realisation in the internal space requires $N \ge 7$. From Hurwitz's theorem (T-89 [T]): $N = 7$ is the minimum dimension associated with a division algebra (the seven imaginary units of the octonions), which ensures the required structure.
- For $N < 7$: impossibility. No Fano plane → no unique Lindblad decomposition (T-82) → no replacement channel → no reflection → no way to update the self-model on the basis of observations → no learning.
- For $N > 7$: redundancy. Embedding of the 7-dimensional architecture (via Morita equivalence, T-58 [T]) provides all of its mechanisms. Additional dimensions increase:
- the parameter space ($N^2 - 1$ real parameters) — more parameters to update
- the genesis time — longer bootstrap (estimate from generalised T-59)
But information capacity grows only logarithmically in $N$, while complexity grows quadratically. Resource efficiency therefore strictly decreases for $N > 7$. Thus, $N = 7$ is the minimum with non-zero learning capacity and maximum resource efficiency among systems with Fano structure.
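The efficiency argument can be tabulated. A sketch assuming the resource efficiency takes the form $\log_2 N / (N^2 - 1)$, consistent with "capacity grows logarithmically, complexity quadratically" above (the book's exact definition may differ):

```python
import math

def efficiency(N):
    """Assumed resource efficiency: information capacity per state parameter,
    log2(N) / (N^2 - 1). The exact CC definition is not reproduced here."""
    return math.log2(N) / (N**2 - 1)

for N in (7, 8, 10, 14, 21):
    print(N, round(efficiency(N), 5))
# Strictly decreasing for N >= 7: among architectures that admit the
# Fano structure at all, N = 7 is the most resource-efficient.
```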
6.1 Chain of Necessities
Learning → self-model update → self-observation (reflection) → replacement channel → Fano plane PG(2,2) → $N = 7$.
6.2 Parameters at N=7
| Parameter | Value | Source |
|---|---|---|
| Channel capacity | $\log_2 7 \approx 2.81$ bits | T-107 [T] |
| Spectral gap | | T-39a [T] |
| Minimal regeneration | | T-59 [T] |
| Genesis time | | T-59 [T] |
| State parameters | $N^2 - 1 = 48$ (real) | |
| Resource efficiency | | Definition |
7. Application: Binary Discrimination
7.1 The Two-Button Task
Setup. An agent (a CC-holon) interacts with the environment through two buttons: a green one (reward) and a red one (punishment). Which is which is unknown to the agent. Task: learn to press only the green button.
Formalisation:
- Hypothesis space: two hypotheses ("green is on the left" vs "green is on the right")
- Action space: press left, press right
- Reward function: reward for pressing the green button, punishment for the red (under the first hypothesis, "green is on the left")
- $\delta = 0.05$ (95% reliability)
7.2 Signal and Mechanism
Reward and punishment enter through the functor Enc (T-100):
| Type | Channels | Effect on the state |
|---|---|---|
| Reward | regeneration strengthening | purity and coherence grow |
| Punishment | dissipation strengthening | purity and coherence fall |
Through the hedonic mechanism (T-103 [T]+[I]): the agent "feels" the valence and adjusts its action in the direction that minimises it (T-101).
7.3 Estimates of the Number of Presses
Notation: $\Delta$ — total contrast between reward and punishment, $\sigma$ — environmental noise.
Information bound (T-109):
| Contrast | ||
|---|---|---|
| 1.0 (strong) | ||
| 0.5 (medium) | ||
| 0.3 (weak) |
Dynamical bound (T-110, ):
At (minimum distance for reliable discrimination in ):
| Contrast | |
|---|---|
| 1.0 | (instant) |
| 0.5 | |
| 0.3 | |
| 0.01 |
Stabilisation bound (T-111):
At (typical value): .
| SNR | |
|---|---|
| 1.0 (clean signal) | |
| 0.5 | |
| 0.3 | |
| 0.1 |
Combined estimate (T-112):
Typical scenario (, SNR , ):
Bottleneck — information (weak contrast).
Ideal scenario (, SNR , ):
Including genesis (): .
Noisy scenario (, SNR , ):
Bottleneck — information.
7.3a Numerical Example: Computing for a Specific Holon
Let us carry out a full computation for the holon from the case study "Patient A" — an AI agent of a warehouse robot that must learn to distinguish two types of packaging (standard vs fragile).
Given data:
- (after stabilisation, day 7)
- (moderate self-model)
- Contrast between packaging types: (medium — visually distinguishable, but not trivially)
- Environmental noise: (lighting changes, camera occasionally produces glare)
- SNR
- Reliability: (95%)
- Observation interval: (one observation per seconds)
Step 1: Information bound (T-109).
Step 2: Dynamical bound (T-110).
At , using the simplified formula:
With :
Dynamics is not the bottleneck — the contrast is strong enough.
Step 3: Stabilisation bound (T-111).
Check: the signal amplitude exceeds the stability radius. Problem! The signal is too strong — each observation risks destabilising the system.
Direct learning at this amplitude is dangerous. Solution: attenuation — reduce the effective amplitude to 80% of the stability radius (a 20% margin). This is equivalent to a learning rate schedule.
With attenuated amplitude :
- SNR
Recomputing the information bound with :
Step 4: Combined bound (T-112).
Including genesis (the system is already running, so the bootstrap time has already been paid):
Bottleneck: information (weak attenuated contrast). Optimisation strategy: improve the camera (reduce noise → increase SNR → allow a larger effective amplitude → reduce the number of observations).
Without attenuation, fewer observations would be needed, but every fifth one would risk destabilising the agent. With attenuation more observations are required, but safely. The T-111 trade-off: safety costs 2.4× in time. This is not an engineering constraint, but a physical law.
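The attenuation pipeline of Steps 1–4 can be sketched end to end. All formulas and parameter values below are illustrative placeholders (simplified close-hypothesis and SNR expressions, invented numbers), not the book's calibrated ones; the point is the control flow: clamp the amplitude to the stability radius, then take the maximum of the recomputed bounds:

```python
import math

def learning_bound(a, sigma, delta, r_stab, snr_min=3.0):
    """Sketch of the T-109/T-111 pipeline with attenuation.
    Placeholder formulas: n_info ~ ln(1/delta)/(a^2/2) for close
    hypotheses, n_stab ~ (snr_min*sigma/a)^2."""
    if a > r_stab:                     # T-111: raw signal would destabilise
        a = 0.8 * r_stab               # attenuate with a 20% safety margin
    n_info = math.ceil(math.log(1 / delta) / (a**2 / 2))
    n_stab = math.ceil((snr_min * sigma / a)**2)
    return a, max(n_info, n_stab)      # T-112: bottleneck = max of bounds

# Hypothetical parameters loosely following the "Patient A" setup:
a_eff, n = learning_bound(a=0.5, sigma=0.25, delta=0.05, r_stab=0.3)
print(f"attenuated amplitude: {a_eff:.2f}, observations needed: {n}")
```

Raising `r_stab` (a more stable system) lets the amplitude stay higher, which shrinks both bounds — the same trade-off the worked example describes.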
7.4 Prediction for the CC Test
For a CC-architecture with realistic parameters (, SNR ):
until a stable preference for the green button.
Falsification criterion: if the agent learns in fewer observations than the information limit, this violates the quantum Chernoff bound and falsifies the observation model.
8. Comparison with Classical Learning Theory
The CC learning bounds did not arise in a vacuum — they inherit and generalise a number of classical results. This section provides a systematic comparison.
8.1 PAC-Learning and VC-Dimension
In classical PAC-learning (Valiant, 1984), learning with accuracy $\epsilon$ and reliability $1-\delta$ requires (up to constants)
$$m \;\ge\; \frac{1}{\epsilon^{2}}\left(\ln|\mathcal{H}| + \ln\frac{1}{\delta}\right)$$
examples, where $|\mathcal{H}|$ is the cardinality of the hypothesis space. For infinite hypothesis classes the VC-dimension $d_{VC}$ replaces $\ln|\mathcal{H}|$:
$$m \;=\; O\!\left(\frac{d_{VC} + \ln(1/\delta)}{\epsilon^{2}}\right)$$
| Aspect | PAC-learning | CC bounds |
|---|---|---|
| Substrate | Abstract algorithm | Physical dynamical system |
| Information bound | $\ln\vert\mathcal{H}\vert/\epsilon^2$ | $\ln(1/\delta)/\xi_{QCB}$ (T-109) |
| Dynamics | Not accounted for | Fano contraction — key constraint (T-110) |
| Stability | Not accounted for | T-111 — learning must not kill the learner |
| Scaling for weak signals | $1/\epsilon^2$ | $1/\Delta^2$ (quantum limit) |
| Minimal architecture | Arbitrary | $N = 7$ (T-113) |
Key distinction: PAC-learning describes an algorithm, CC describes a physical system. An algorithm has no inertia and does not risk dying. A living learner does.
8.2 Rademacher Complexity and Generalisation
Rademacher complexity measures the ability of a function class to "fit" random noise. The classical generalisation bound has the form
$$R(f) \;\le\; \hat{R}(f) + 2\,\mathfrak{R}_n(\mathcal{F}) + \sqrt{\frac{\ln(1/\delta)}{2n}}$$
In CC the analogue of Rademacher complexity is channel capacity (T-107). The constraint on channel capacity automatically controls overfitting: a system with a fixed capacity of a few bits per observation cannot "memorise" an arbitrarily complex pattern. This is built-in regularisation, arising not from an engineering decision, but from an architectural constraint.
8.3 Shannon Limit and Quantum Chernoff Exponent
The classical Shannon theorem (1948) states: for reliable transmission through a channel with capacity $C$, one needs $n \ge H(\mathcal{H})/C$ observations, where $H(\mathcal{H})$ is the entropy of the hypothesis distribution.
T-109 generalises this result to a quantum channel:
The quantum Chernoff exponent is the quantum analogue of channel capacity, but for the task of discrimination, not transmission. Its absolute maximum is determined by the dimension of the internal space. The classical Shannon limit is recovered when the post-observation states commute (classical states).
8.4 Thermodynamic Bounds on Learning
The Landauer limit ($k_B T \ln 2$ per bit of erasure) is connected to T-110 as follows: Fano contraction is inevitable dissipation, analogous to thermodynamic erasure. Each learning step requires erasing old information and recording new information. Minimum "thermodynamic cost" of learning:
$$\Delta E \;\ge\; k_B T \ln 2 \cdot \Delta S$$
where $\Delta S$ is the change in von Neumann entropy per step (in bits). This connects the CC learning bounds with the physical energetics of cognitive processes.
9. Practical Implications
Theorems T-109 — T-113 are not abstract mathematical results. They have direct implications for three key areas: AI design, education, and therapy.
9.1 Implications for AI and Machine Learning
Architecture. T-113 states that is the minimal architecture for learning through regeneration. For an AI engineer this means: if you are building a system with an internal self-model (not merely an optimiser), you need at least 7 internal "channels" with Fano-structured connections between them.
Learning rate. T-111 provides a theoretical justification for adaptive learning rates: the maximum update amplitude is set by the stability radius. Systems with low purity (unstable models) should learn more slowly. Systems with high purity (stable models) can afford more aggressive training.
Curriculum design. T-112 explains why curriculum learning works: in the early stages the bottleneck is information (simple examples provide larger contrast), in the later stages — stability (complex examples must not destabilise what has already been learned). Optimal strategy: begin with strong, simple signals and gradually transition to weak, subtle ones.
9.2 Implications for Education
Information dosing. T-111 formalises the pedagogical principle of "not overloading the student": each lesson is a perturbation of the student's state, and excessively intense teaching can drive the student out of the viability zone. An overloaded student does not merely "fail to absorb" — they are destabilised.
Spaced repetition. T-110 provides theoretical grounding for the spacing effect (spaced repetition, Ebbinghaus, 1885): each repetition adds signal, and between repetitions contraction erases it. The optimal interval maximises the accumulated signal.
Zone of proximal development. Vygotsky's concept is formalised through the T-111 / §4.1 trade-off: tasks in the "zone of proximal development" are those whose perturbation stays within the stability radius (non-destabilising), but is large enough for the required number of observations to be finite. Tasks that are too complex are beyond the zone: learning is impossible without first strengthening stability.
9.3 Implications for Therapy
Therapeutic window. The three stability zones (§4.2) directly correspond to clinical practice:
- Normal: the patient is in a resourced state — full-power therapeutic interventions.
- Warning: the patient is vulnerable — gentle interventions, supportive therapy.
- Critical: the patient is in crisis — learning halted, stabilisation is the priority.
This principle is known to clinicians empirically (Siegel's "window of tolerance" model). CC derives it from first principles.
Trauma and PTSD. Traumatic experience is an observation whose amplitude exceeds the stability radius. It is not merely "strong" — it pushes the system beyond the viability boundary. Trauma therapy (EMDR, exposure therapy) works through titrated re-presentation with a safely small amplitude, gradually integrating the traumatic experience without destabilisation.
10. Connection with Other Results
| Result | Role in learning bounds | Reference |
|---|---|---|
| T-39a () | Contraction in T-110 | Lindblad Operators |
| T-59 () | Genesis time | Axiom Ω |
| T-69 (Topological protection) | Continuity of learning in T-111 | Composites |
| T-77 (Replacement channel) | Necessity for T-113 | Lindblad Operators |
| T-82 (Fano uniqueness) | Chain in T-113 | Lindblad Operators |
| T-89 (Hurwitz minimality) | in T-113 | Minimality Theorem |
| T-98 (Attractor balance) | Stabilisation of learning | Evolution |
| T-100 (Enc functor) | Observation channel | Sensorimotor Theory |
| T-101 (Dec functor) | Criterion for successful learning | Sensorimotor Theory |
| T-104 (Stability radius) | Amplitude constraint in T-111 | Stability |
| T-107 (Enc capacity) | Upper bound in T-109 | Sensorimotor Theory |
| SAD_MAX = 3 | Fano contraction SAD_MAX | Depth Tower |
11. Conclusion
Learning is one of the most fundamental processes in the universe. From RNA replication to language learning, from species evolution to neural network training — everywhere a system interacts with an environment and changes itself on the basis of received experience. Coherence Cybernetics shows that this process is subject to three absolute constraints, arising from the mathematics of 7-dimensional coherent space.
Three bounds — three questions:
- Information bound (T-109): Is there enough data? The number of observations cannot fall below the information bound; for weak signals the scaling is the quantum limit, which cannot be improved.
- Dynamical bound (T-110): Can the system keep up? Fano contraction can erase information faster than it is recorded. Learning is a race between recording and erasure, and the stationary limit determines whether the task is solvable in principle.
- Stabilisation bound (T-111): Will the learner hold? Learning must not kill the one who is learning. The amplitude constraint is not an engineering convenience but a physical law.
Combined bound (T-112) — the maximum of the three — determines the true bottleneck of learning. In different situations different mechanisms dominate: information in clean environments, dynamics with fast signals, stability under noise and stress.
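The max-of-three logic of T-112 is easy to make explicit. A minimal sketch, assuming the three bounds have already been computed for a given task; the function name and the sample numbers are illustrative, not values from the text.

```python
def combined_bound(n_info: float, n_dyn: float, n_stab: float) -> tuple[float, str]:
    """T-112: the required observation count is the maximum of the three
    bounds; the argmax names the bottleneck mechanism."""
    bounds = {
        "information (T-109)": n_info,
        "dynamics (T-110)": n_dyn,
        "stabilisation (T-111)": n_stab,
    }
    bottleneck = max(bounds, key=bounds.get)  # slowest mechanism wins
    return bounds[bottleneck], bottleneck

# Example regime: clean environment, so the information bound dominates.
n, which = combined_bound(n_info=40, n_dyn=15, n_stab=25)
```

Raising the noise parameters would push `n_stab` above `n_info`, reproducing the regime shift the text describes: the same learner, a different bottleneck.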
Minimality (T-113) closes the chain: learning through regeneration requires self-observation, self-observation requires Fano structure, Fano structure requires N=7. This is not a compromise — it is the only point on the Pareto boundary.
The learning bounds close the chain: structure (N=7, T-113) → channel (Enc, T-107) → information (T-109) → dynamics (T-110) → stability (T-111) → optimum (T-112). Every link is a consequence of axioms A1–A5 and the canonical dynamics, with no additional postulates.
Summary
- T-109 [T]: Information bound — a lower bound on the number of observations, with quantum-limit scaling for weak signals
- T-110 [T]: Dynamical bound — contraction limits the signal integration rate
- T-111 [T]: Stabilisation bound — learning must not kill the learner
- T-112 [T]: Combined bound — the maximum of the three bounds, three regimes
- T-113 [T]: N=7 — minimal architecture for learning through regeneration
- Prediction: for binary discrimination (two actions) ~20–80 observations at typical parameters
What We Learned
- Three learning bounds — information (T-109: is there enough data?), dynamical (T-110: can the system keep up?), stabilisation (T-111: will the learner hold?) — form a "triple lock," all three bolts of which must be opened.
- Combined bound (T-112): the required number of observations is the maximum of the three bounds — the bottleneck is set by the slowest mechanism. In clean environments information dominates; in noisy ones, stability.
- N=7 is the minimal architecture for learning through regeneration (T-113). Learning requires self-observation, self-observation requires the Fano plane, the Fano plane requires N=7. This is not a compromise — it is the only point on the Pareto boundary.
- Numerical example (§7.3a): for a warehouse robot at the stated noise and contrast parameters, the stabilisation constraint requires signal attenuation, increasing training time by 2.4×. Safety costs time — this is a physical law, not an engineering choice.
- Historical roots: Shannon (information), Valiant (statistics), Landauer (thermodynamics) — three facets of one constraint. CC unites them for the first time in a single theorem for a living learner.
We have traveled the full path from the axioms to the learning bounds. But behind the formulas and theorems a question remains: what does all of this mean? What is the ontology of CC — what is real and what is instrumental? Is the matrix a description of consciousness, or consciousness itself? In the next chapter we turn to the philosophical foundations of Coherence Cybernetics — from neutral monism to the ethics of coherent systems.
Related Documents:
- Sensorimotor Theory — Enc/Dec functors, information capacity T-107
- Stability — stability radius T-104, formula T-98
- Definitions — key measures
- Model Systems — computational verification of bounds
- Predictions — predictions 9–10 (learning bounds)
- Applications — practical implications for AI and education
- Comparison with Alternatives — CC vs. PAC learning, VC-dimension
- Measurement Methodology — how to measure the learning rate experimentally
- Exercises — problems on learning bounds (block 4)