
Optimal Learning Bounds

"Nature does not hurry, yet everything is accomplished." — Laozi

Bridge from the Previous Chapter

In the previous chapter we learned how to implement CC in code: from initialisation of $\Gamma$ to a complete control cycle. But code, however fast it runs, cannot circumvent fundamental constraints. How many examples are truly needed to learn? Shannon, Valiant, and Landauer each asked this question — in their own language. CC unites all three answers for the first time in a single theorem.

Chapter Roadmap

In this chapter we:

  1. Formalise the learning task for the holon (§1)
  2. Prove the information bound T-109: how many observations are needed (§2)
  3. Prove the dynamical bound T-110: how many observations the system will manage to integrate (§3)
  4. Prove the stabilisation bound T-111: will learning kill the learner? (§4)
  5. Combine the three bounds into the optimal T-112 (§5)
  6. Prove minimality N=7 for learning T-113 (§6)
  7. Carry out a numerical calculation for binary discrimination (§7)
  8. Compare with classical learning theory — PAC, VC, Shannon, Landauer (§8)
  9. Extract practical implications for AI, education, and therapy (§9)

A child picks up a hot cup and pulls away their fingers. How many times must they be burned to understand? Once — if the signal is strong enough. Ten times — if the cup is only slightly warm. And if the child is playing, tired, and distracted — even more. Behind this everyday intuition lies a fundamental question: do absolute lower bounds on the learning rate exist — limits that cannot be overcome by improving the algorithm or increasing computational power?

In the twentieth century this question was answered three times — and each answer opened a new horizon:

  1. Claude Shannon (1948) showed that the channel capacity is limited — no encoding allows transmitting more than $C$ bits per second through a noisy channel. This was an information-theoretic bound.

  2. Leslie Valiant (1984) created PAC-learning and proved that the number of examples required for learning grows at least logarithmically in the number of hypotheses and inversely with the accuracy parameter. This was a statistical bound.

  3. Rolf Landauer (1961) established that erasing one bit of information inevitably releases energy $kT\ln 2$. This was a thermodynamic bound.

Historical Parallels in Detail

Shannon and channel capacity. In 1948 Claude Shannon, working at Bell Labs, proved a theorem that transformed engineering: there exists a limit $C = B\log_2(1 + \mathrm{SNR})$ bits/s, above which no encoding allows error-free transmission. Before Shannon, engineers sought the "ideal code"; after him, they understood that the ideal is mathematically defined and achievable. The information bound T-109 inherits this spirit: $\xi_{\mathrm{QCB}}$ is the quantum analogue of Shannon channel capacity, and the number of observations $n_{\mathrm{info}} \geq \ln(1/(2\delta))/\xi_{\mathrm{QCB}}$ is the quantum analogue of the Shannon limit.

Valiant and learning complexity. In 1984 Leslie Valiant (future Turing Award laureate) formalised the concept of "learnability" — PAC-learning (Probably Approximately Correct). His key result: the number of examples for learning is proportional to $\ln|\mathcal{H}|/\varepsilon$, where $|\mathcal{H}|$ is the number of hypotheses and $\varepsilon$ is the accuracy. This is a statistical bound: it does not depend on who is learning — a human, a computer, or a bacterium. The dynamical bound T-110 adds what Valiant lacks: time. A PAC-learner has no inertia; a CC-holon does (Fano contraction $\alpha = 2/3$).

Landauer and the cost of erasure. Landauer showed that information is not an abstraction, but a physical object. Erasing one bit inevitably releases $kT\ln 2 \approx 2.9 \times 10^{-21}$ J at room temperature. In 2012 the Bérut group confirmed this experimentally. For CC this means: Fano contraction (T-110) is not a mathematical abstraction, but a thermodynamic process. Every step in which $\mathcal{L}_0$ erases $\alpha \cdot \delta\Gamma$ of coherence is a physical event requiring energy dissipation.

Each of these bounds works in its own domain. But none of them accounts for the specificity of the living learner — a system that simultaneously receives information, integrates it into its dynamics, and must remain alive while doing so. A child burning their fingers is not an abstract PAC-learner, not a Shannon channel, and not a thermodynamic machine. They are a coherent system with limited perceptual bandwidth, finite speed of internal dynamics, and a finite reserve of stability.

Coherence Cybernetics unites all three constraints for the first time in a single theorem. The information bound (T-109) inherits the spirit of Shannon, but operates on quantum states. The dynamical bound (T-110) adds time — the rate at which the system can integrate the received information without losing it to the internal contraction flow. The stabilisation bound (T-111) adds fragility — a constraint on the strength of influences the system can withstand without breaking. Together (T-112) they form a triple lock, all three bolts of which must be opened for successful learning.

And theorem T-113 closes the circle: $N = 7$ is the minimal architecture in which all three locks exist at all. A system of smaller dimension is incapable of learning through regeneration — not because it lacks data, but because it lacks self-observation.

On Notation

In this document:

  • $\Gamma$ — coherence matrix
  • $P = \mathrm{Tr}(\Gamma^2)$ — purity
  • $\rho_* = \varphi(\Gamma)$ — target state (categorical self-model)
  • $\lambda_{\mathrm{gap}} = 2/3$ — spectral gap of the linear part $\mathcal{L}_0$ (T-39a [T])
  • $\kappa_{\mathrm{bootstrap}} = \omega_0/N = 1/7$ — minimal regeneration (T-59 [T])
  • $C_{\mathrm{Enc}} \leq \log_2 7$ — information capacity (T-107 [T])
  • $r_{\mathrm{stab}} = \sqrt{P - 2/7}$ — stability radius (T-104 [T])
  • $\mathrm{Enc}$ — perception functor (T-100 [T])
  • $\mathrm{Dec}$ — action functor (T-101 [T])

This document establishes fundamental lower bounds on the learning rate for a holonomic system. Learning is formalised as the process of updating the self-model $\varphi(\Gamma)$ on the basis of observations arriving through the functor $\mathrm{Enc}$, with the goal of optimising the functor $\mathrm{Dec}$.

Key result: the learning rate is bounded by three independent mechanisms — informational (T-109), dynamical (T-110), and stabilisation (T-111). Their combination (T-112) gives the optimal bound, and theorem T-113 proves that $N = 7$ is the minimal architecture capable of learning through regeneration.


1. Formal Definition of the Learning Task

1.1 Learning Task for the Holon

Definition [D]

Learning task $\mathfrak{L} = (\Theta, \mathcal{A}, \mathcal{R}, \delta)$ for the holon $\mathbb{H}$ consists of:

  1. Hypothesis space $\Theta = \{\theta_1, \ldots, \theta_k\}$ — finite set of environmental states (unknown to the agent)
  2. Action space $\mathcal{A} = \{a_1, \ldots, a_m\}$ — admissible actions
  3. Reward function $\mathcal{R}: \Theta \times \mathcal{A} \to \mathbb{R}$, encoding correct behaviour
  4. Reliability level $1 - \delta$, where $\delta \in (0, 1)$ is the admissible error probability

Connection with dynamics. Each observation $o_t$ under hypothesis $\theta$ arrives through the functor $\mathrm{Enc}$ (T-100 [T]):

$$o_t \xrightarrow{\mathrm{Enc}} h^{\mathrm{ext}}_t = h^{(H)}_t + h^{(D)}_t + h^{(R)}_t$$

and modifies the coherence matrix $\Gamma$ via the 3-channel evolution equation (T-102 [T]).

1.2 Criterion for Successful Learning

Definition [D]

Task $\mathfrak{L}$ is solved in $n$ observations if after $n$ steps:

$$\Pr\!\left[\mathrm{Dec}(\Gamma_n) = a^*(\theta)\right] \geq 1 - \delta$$

where $a^*(\theta) = \arg\max_{a \in \mathcal{A}} \mathcal{R}(\theta, a)$ is the optimal action under the true hypothesis $\theta$, and $\mathrm{Dec}$ is the action functor (T-101 [T]).

Minimum number of observations:

$$n^*(\mathfrak{L}) = \min\{n \in \mathbb{N} : \mathfrak{L} \text{ is solved in } n \text{ observations}\}$$

1.3 Learning as Attractor Update

Unlike classical learning (updating model parameters), learning in UHM is a change of the attractor of the dynamical system:

  1. Observation $o_t$ enters through $\mathrm{Enc}$ → $\Gamma$ is perturbed
  2. Self-model $\rho_* = \varphi(\Gamma)$ is updated (T-62 [T], physical realisation of $\varphi$)
  3. Regenerative term $\mathcal{R}[\Gamma, E]$ drives $\Gamma$ toward the updated $\rho_*$
  4. Functor $\mathrm{Dec}$ adapts the action to the new $\rho_*$

Analogy: learning in classical machine learning is adjusting knobs on the dashboard (updating weights). Learning in CC is changing the very shape of the river along which water flows: the new attractor draws the system toward new behaviour from within, without an external controller.

Two learning modes:

| Mode | Regeneration rate | Time | Context |
|---|---|---|---|
| Genesis (bootstrap) | $\kappa = \kappa_{\mathrm{bootstrap}} = 1/7$ | $\tau_{\mathrm{genesis}} \leq 7\ln 7 \approx 13.6$ (T-59) | Initial bootstrap, no $\mathrm{Coh}_E$ |
| Active learning | $\kappa = \kappa_{\mathrm{bootstrap}} + \kappa_0 \cdot \mathrm{Coh}_E$ | Faster than genesis | After reaching $\mathrm{Coh}_E > 1/7$ |
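
To make Definition 1.1 concrete, here is a minimal sketch of the learning-task tuple (Python is assumed purely for illustration; `LearningTask` and `optimal_action` are hypothetical names, not part of a CC reference implementation):

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass(frozen=True)
class LearningTask:
    """The tuple L = (Theta, A, R, delta) from Section 1.1."""
    hypotheses: Sequence[str]             # Theta: environmental states, unknown to the agent
    actions: Sequence[str]                # A: admissible actions
    reward: Callable[[str, str], float]   # R: Theta x A -> real-valued reward
    delta: float                          # admissible error probability

    def optimal_action(self, theta: str) -> str:
        """a*(theta) = argmax_a R(theta, a), the target of the Dec functor (Section 1.2)."""
        return max(self.actions, key=lambda a: self.reward(theta, a))

# The two-button task of Section 7.1 as an instance
task = LearningTask(
    hypotheses=("green-left", "green-right"),
    actions=("press-left", "press-right"),
    reward=lambda th, a: 1.0 if (th == "green-left") == (a == "press-left") else -1.0,
    delta=0.05,
)
print(task.optimal_action("green-left"))  # press-left
```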

2. Information Lower Bound (T-109) [T]

Intuition: Why Information Limits Learning

Imagine you are trying to determine which of two coins in front of you is fair (50/50) or slightly biased (51/49). Even with perfect eyesight and unlimited time to think, you will need to toss the coin many times to distinguish one from the other. The closer the coins are in their properties, the more tosses are needed. This is the information limit: it is determined not by your analytical abilities, but by the amount of information each observation contains.

In classical statistics this limit is given by the Cramér-Rao inequality and the Chernoff exponent. In CC an observation is a quantum channel $\mathrm{Enc}$ mapping an external signal to a deformation of the matrix $\Gamma$. Therefore the role of the classical exponent is played by the quantum Chernoff exponent $\xi_{\mathrm{QCB}}$ — a measure of the distinguishability of two quantum states.

Analogy with language learning: every sentence heard is an "observation." If two languages differ strongly (Russian and Chinese), a few phrases suffice for their distinction. If they differ little (two closely related dialects), hundreds of examples are needed. The information bound T-109 says: however brilliant the learner, one sentence will not suffice to distinguish closely related dialects — this is not a matter of intelligence, but of the physics of information.

Theorem T-109 (Information Bound on Learning) [T]

Statement

For a learning task $\mathfrak{L} = (\Theta, \mathcal{A}, \mathcal{R}, \delta)$ with $|\Theta| = k$ hypotheses, the minimum number of observations:

$$n^* \geq n_{\mathrm{info}} := \frac{\ln\!\left(\frac{1}{2\delta}\right)}{\xi_{\mathrm{QCB}}}$$

where $\xi_{\mathrm{QCB}}$ is the quantum Chernoff exponent for the pair of closest post-observation states:

$$\xi_{\mathrm{QCB}} = -\ln \min_{0 \leq s \leq 1} \mathrm{Tr}\!\left(\Gamma_+^s \cdot \Gamma_-^{1-s}\right)$$

and $\Gamma_\pm = \mathrm{Enc}(o|\theta_\pm)[\Gamma]$ are the states after observation under the two closest hypotheses.

Universal bound: $\xi_{\mathrm{QCB}} \leq \ln 7$, therefore:

$$n_{\mathrm{info}} \geq \frac{\ln(1/(2\delta))}{\ln 7} \quad \text{(absolute minimum)}$$

Why this bound is tight. The absolute minimum $n_{\mathrm{info}} = \ln(1/(2\delta))/\ln 7$ is achieved when two observations lead to orthogonal pure states in $\mathcal{D}(\mathbb{C}^7)$ — maximally distinguishable configurations of $\Gamma$. This is the ideal case: "hot" and "cold" are completely unlike. In reality hypotheses generate close states, and the bound grows as $O(1/\varepsilon^2)$.

Proof.

  1. Quantum hypothesis discrimination. Observation under hypothesis $\theta$ generates a post-observation state $\Gamma_\theta = \mathrm{Enc}(o|\theta)[\Gamma]$ — a CPTP image (T-100 [T]). The learning task includes the task of distinguishing at least two closest hypotheses $\theta_+, \theta_-$.

  2. Quantum Chernoff bound (Audenaert et al. 2007): for $n$ independent observations the optimal error of distinguishing two states:

$$P_{\mathrm{err}}^{\mathrm{opt}}(n) = \frac{1}{2}\left(\min_{0 \leq s \leq 1} \mathrm{Tr}(\Gamma_+^s \, \Gamma_-^{1-s})\right)^n = \frac{1}{2}\, e^{-n \cdot \xi_{\mathrm{QCB}}}$$

  3. Reliability condition. From $P_{\mathrm{err}} \leq \delta$:

$$\frac{1}{2}\, e^{-n \cdot \xi_{\mathrm{QCB}}} \leq \delta \;\Longrightarrow\; n \geq \frac{\ln(1/(2\delta))}{\xi_{\mathrm{QCB}}}$$

  4. Upper bound on the exponent. From T-107 [T]: the information extractable from one observation does not exceed the Holevo quantity $\chi(\mathrm{Enc}) \leq \log_2 7$. The quantum Chernoff exponent is bounded by the relative entropy:

$$\xi_{\mathrm{QCB}} \leq D(\Gamma_+ \| \Gamma_-) \leq \ln\dim\mathcal{H} = \ln 7$$

(upper bound — for orthogonal pure states in $\mathcal{D}(\mathbb{C}^7)$). $\blacksquare$

2.1 Asymptotics for Close Hypotheses

If hypotheses $\theta_+, \theta_-$ generate close states $\|\Gamma_+ - \Gamma_-\|_1 = \varepsilon \ll 1$, then:

$$\xi_{\mathrm{QCB}} \approx \frac{\varepsilon^2}{8} \quad \text{(small contrast)}$$

Substituting into T-109:

$$n_{\mathrm{info}} \geq \frac{8 \ln(1/(2\delta))}{\varepsilon^2}$$

This reproduces the classical scaling $O(1/\varepsilon^2)$ for weak signals. Difference from classical: the factor $1/8$ is determined by the quantum geometry of $\mathcal{D}(\mathbb{C}^7)$, not by an arbitrary noise distribution.

2.2 Numerical Estimates

| Parameters | $\xi_{\mathrm{QCB}}$ | $n_{\mathrm{info}}$ |
|---|---|---|
| Orthogonal signals ($\varepsilon = 2$) | $\ln 7 \approx 1.95$ | $\geq \lceil\ln(1/(2\delta))/1.95\rceil$ |
| Strong contrast ($\varepsilon = 0.5$) | $\approx 0.031$ | $\geq \lceil 32 \cdot \ln(1/(2\delta))\rceil$ |
| Weak contrast ($\varepsilon = 0.1$) | $\approx 0.00125$ | $\geq \lceil 800 \cdot \ln(1/(2\delta))\rceil$ |

At $\delta = 0.05$: $\ln(1/(2\cdot0.05)) = \ln 10 \approx 2.30$

| Contrast | $n_{\mathrm{info}}$ at $\delta = 0.05$ |
|---|---|
| $\varepsilon = 2$ (maximum) | $\geq 2$ |
| $\varepsilon = 0.5$ | $\geq 75$ |
| $\varepsilon = 0.1$ | $\geq 1843$ |
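
The table values can be reproduced in a few lines (a Python sketch under the §2.1 small-contrast approximation; the helper names are ours, not part of any CC codebase):

```python
import math

def xi_qcb_small_contrast(eps: float) -> float:
    """Small-contrast approximation of Section 2.1: xi_QCB ~ eps^2 / 8."""
    return eps**2 / 8.0

def n_info(delta: float, xi: float) -> int:
    """Information lower bound of T-109: n >= ln(1/(2*delta)) / xi_QCB."""
    return math.ceil(math.log(1.0 / (2.0 * delta)) / xi)

for eps in (0.5, 0.1):
    xi = xi_qcb_small_contrast(eps)
    print(f"eps={eps}: xi={xi:.5f}, n_info >= {n_info(0.05, xi)}")
# eps=0.5: xi=0.03125, n_info >= 74  (the table rounds xi to 0.031, giving 75)
# eps=0.1: xi=0.00125, n_info >= 1843
```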

3. Dynamical Lower Bound (T-110) [T]

Intuition: Why Dynamics Limits Learning

The information bound says how many observations are needed. The dynamical bound says how many observations the system will manage to integrate. The difference is fundamental.

Imagine a student at a lecture. The professor speaks at 150 words per minute — enough information. But if the student takes notes slowly, part of the information is lost before it can be comprehended. Moreover, early notes are erased from short-term memory while the student is processing new ones. This is a competition between two processes: recording (each observation adds signal) and erasure (internal dynamics blurs the old signal).

In CC erasure has a precise name: Fano contraction with parameter $\alpha = 2/3$ (T-39a). The linear part $\mathcal{L}_0$ of the Lindbladian exponentially drives $\Gamma$ toward the maximally mixed state $I/7$. Each observation is a "recording" of amplitude $\varepsilon$, but previous recordings decay at rate $e^{-\alpha\tau}$. The stationary limit determines whether it is possible at all to accumulate sufficient signal.

Analogy from neuroscience: short-term memory decays in 15–30 seconds (Peterson & Peterson, 1959). To transfer information to long-term memory, consolidation is required — and it takes time. The dynamical bound T-110 is the formal expression of this neuropsychological fact in the language of the coherence matrix.

Theorem T-110 (Dynamical Bound on Learning) [T]

Statement

For a learning task with observations of amplitude $\varepsilon = \|\Gamma_+ - \Gamma_-\|_1$ and interval $\delta\tau$ between observations:

$$n^* \geq n_{\mathrm{dyn}} := \frac{1}{\alpha \cdot \delta\tau}\,\ln\!\left(\frac{d_{\mathrm{disc}}}{\varepsilon}\cdot\alpha\,\delta\tau\right)$$

where:

  • $\alpha = \lambda_{\mathrm{gap}} = 2/3$ — contraction rate (T-39a [T])
  • $d_{\mathrm{disc}}$ — minimum Bures distance for reliable discrimination
  • $\varepsilon$ — signal amplitude of one observation

At the natural scale $\delta\tau = 1/\alpha$ (one observation per relaxation time):

$$n_{\mathrm{dyn}} \geq \ln\!\left(\frac{d_{\mathrm{disc}}}{\varepsilon}\right) + 1$$

What happens at the limit. If $\varepsilon \to 0$ at fixed $d_{\mathrm{disc}}$, the dynamical bound diverges logarithmically — signals that are too weak are erased faster than they accumulate. If $\delta\tau \to 0$ (observations too frequent), each new signal arrives before the previous one has had time to affect $\Gamma$, and the effective learning rate does not increase. There exists an optimal observation rate $\delta\tau^* \sim 1/\alpha$ at which the dynamical bound is minimal.

Proof.

  1. Fano contraction. The linear part $\mathcal{L}_0$ contracts all deviations from $I/7$ at exponential rate $\alpha = 2/3$ (T-39a [T]):

$$\|\Gamma(\tau) - I/7\|_{\mathrm{HS}} \leq e^{-\alpha\tau}\,\|\Gamma(0) - I/7\|_{\mathrm{HS}}$$

This means that information recorded in $\Gamma$ decays over time.

  2. Signal accumulation. Observation at moment $\tau_i = i \cdot \delta\tau$ contributes signal of amplitude $\varepsilon$ to $\Gamma$. By moment $\tau_n = n \cdot \delta\tau$ the contribution of the $i$-th observation has decayed to $\varepsilon \cdot e^{-\alpha(n-i)\delta\tau}$. Total accumulated signal:

$$S(n) = \varepsilon \sum_{i=0}^{n-1} e^{-\alpha(n-1-i)\delta\tau} = \varepsilon \cdot \frac{1 - e^{-\alpha n \delta\tau}}{1 - e^{-\alpha \delta\tau}}$$

  3. Stationary limit. As $n \to \infty$:

$$S_\infty = \frac{\varepsilon}{1 - e^{-\alpha\delta\tau}}$$

  4. Discrimination condition. For reliable distinction $S(n) \geq d_{\mathrm{disc}}$:

$$\varepsilon \cdot \frac{1 - e^{-\alpha n \delta\tau}}{1 - e^{-\alpha\delta\tau}} \geq d_{\mathrm{disc}}$$

$$1 - e^{-\alpha n \delta\tau} \geq \frac{d_{\mathrm{disc}}(1 - e^{-\alpha\delta\tau})}{\varepsilon}$$

$$n \geq \frac{1}{\alpha\delta\tau}\,\ln\!\left(\frac{1}{1 - d_{\mathrm{disc}}(1 - e^{-\alpha\delta\tau})/\varepsilon}\right)$$

At $d_{\mathrm{disc}} \ll S_\infty$ (typical regime): $n_{\mathrm{dyn}} \approx \frac{1}{\alpha\delta\tau}\ln\frac{d_{\mathrm{disc}}(1-e^{-\alpha\delta\tau})}{\varepsilon \cdot \alpha\delta\tau}$ (first approximation). Simplifying for $\delta\tau = 1/\alpha$:

$$n_{\mathrm{dyn}} \geq \ln\!\left(\frac{d_{\mathrm{disc}}}{\varepsilon}\right) + 1$$

(using $1 - e^{-1} \approx 0.632$). $\blacksquare$

3.1 Physical Meaning

The dynamical bound expresses the competition between recording and erasure:

  • Recording: each observation adds signal $\varepsilon$ to $\Gamma$
  • Erasure: Fano contraction removes $\alpha \cdot \delta\Gamma$ per unit time
  • Balance: stationary signal $S_\infty = \varepsilon / (1 - e^{-\alpha\delta\tau})$

If $S_\infty < d_{\mathrm{disc}}$, the task is unsolvable at the given parameters — contraction erases the signal faster than it accumulates. Necessary condition for solvability:

$$\varepsilon > d_{\mathrm{disc}} \cdot (1 - e^{-\alpha\delta\tau})$$
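
The recording-versus-erasure race is easy to simulate. A minimal Python sketch (illustrative names; a brute-force search stands in for the closed form) computes $S(n)$ and flags the unsolvable regime:

```python
import math

ALPHA = 2.0 / 3.0  # Fano contraction rate (T-39a)

def accumulated_signal(n: int, eps: float, dtau: float, alpha: float = ALPHA) -> float:
    """S(n) from the T-110 proof: geometric signal accumulation under erasure."""
    q = math.exp(-alpha * dtau)
    return eps * (1.0 - q**n) / (1.0 - q)

def n_dyn(eps: float, d_disc: float, dtau: float, alpha: float = ALPHA) -> float:
    """Smallest n with S(n) >= d_disc; inf if the stationary limit never reaches d_disc."""
    s_inf = eps / (1.0 - math.exp(-alpha * dtau))
    if s_inf < d_disc:
        return math.inf  # contraction erases the signal faster than it accumulates
    n = 1
    while accumulated_signal(n, eps, dtau, alpha) < d_disc:
        n += 1
    return n

print(n_dyn(eps=0.4, d_disc=0.3, dtau=1.0))  # 1: strong contrast integrates at once
print(n_dyn(eps=0.1, d_disc=0.3, dtau=1.0))  # inf: S_inf ~ 0.21 < 0.3, unsolvable
```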

3.2 Role of Regeneration

The regenerative term $\mathcal{R}[\Gamma, E]$ counteracts contraction for components aligned with $\rho_*$. After learning (when $\rho_*$ has been updated):

  • Components of $\Gamma$ aligned with the learned $\rho_*$ are strengthened by regeneration
  • Components not aligned continue to decay

This means that learned information is stabilised in the attractor, while noise is washed out. Effective erasure rate for the learned signal:

$$\alpha_{\mathrm{eff}} = \alpha - \kappa = \frac{2}{3} - \kappa$$

At $\kappa > 2/3$ regeneration dominates — the attractor is stable. From T-98 (balance) [T]: this condition is satisfied for viable states with $P > 2/7$.


4. Stabilisation Lower Bound (T-111) [T]

Intuition: Why Stability Limits Learning

The first two bounds describe whether enough information exists and whether the system manages to process it. The third bound adds a question that classical learning theory usually ignores: will learning kill the learner?

This is not a metaphor. In CC the system is viable at $P > P_{\mathrm{crit}} = 2/7$. Each observation is a perturbation that pushes $\Gamma$ away from the current attractor. Too strong a perturbation pushes $P$ below the viability threshold. A system that learns too fast risks destabilising itself.

The biological parallel is clear: traumatic experience can be informative (once — and for life), but too strong a stress causes PTSD or even death. A therapist knows that dosage matters more than content: the right information, delivered too quickly, destroys rather than heals.

In the context of neural network training the stabilisation bound corresponds to the intuition about choosing a learning rate: too large — and training diverges; too small — and training fails to converge. But in CC this is not merely an engineering heuristic, but a theorem: the maximum observation amplitude $\varepsilon$ is bounded by the stability radius $r_{\mathrm{stab}}$, which is strictly computed from the current state $\Gamma$.

Theorem T-111 (Stabilisation Bound on Learning) [T]

Statement

Learning must not destabilise the holon. The observation amplitude is bounded by the stability radius (T-104 [T]):

εrstab=P(ρΩ)2/7\varepsilon \leq r_{\mathrm{stab}} = \sqrt{P(\rho^*_\Omega) - 2/7}

In the presence of stochastic noise η\eta in observations (SNR =εsignal/η= \varepsilon_{\mathrm{signal}} / \eta), the number of observations required to overcome noise:

nnstab:=1SNR2ln(1/(2δ))(ξQCBeff)2/ξQCBn^* \geq n_{\mathrm{stab}} := \frac{1}{\mathrm{SNR}^2} \cdot \frac{\ln(1/(2\delta))}{(\xi_{\mathrm{QCB}}^{\mathrm{eff}})^2 / \xi_{\mathrm{QCB}}}

In the typical regime (SNR1\mathrm{SNR} \ll 1, noisy environment):

nstab1SNR2n_{\mathrm{stab}} \geq \frac{1}{\mathrm{SNR}^2}

What happens at the limit. Consider limiting cases:

  • At P2/7P \to 2/7 (system at the viability boundary): rstab0r_{\mathrm{stab}} \to 0, and any non-trivial observation is dangerous. The system is "frozen" — it cannot learn until it has restored its purity reserve. This is the CC analogue of the clinical state: a patient in severe depression does not absorb therapeutic interventions, because their resources are exhausted.
  • At SNR0\mathrm{SNR} \to 0 (pure noise): nstabn_{\mathrm{stab}} \to \infty — learning is impossible, not because there is no information, but because every useful signal drowns in the noise, while the noise destabilises the system.

Proof.

  1. Amplitude constraint. From T-104 [T]: a perturbation $h^{\mathrm{ext}}$ with $\|h^{\mathrm{ext}}\| > r_{\mathrm{stab}}$ can drive $\Gamma$ beyond the viability boundary $P = 2/7$. Since learning requires $P > 2/7$ (viability), the amplitude of each observation is bounded from above.

  2. Noise model. Each observation contains useful signal $\varepsilon_{\mathrm{signal}}$ and noise $\eta$:

$$h^{\mathrm{ext}}_t = h^{\mathrm{signal}}_t + h^{\mathrm{noise}}_t, \quad \|h^{\mathrm{noise}}\| = \eta$$

Noise enters through the dissipative channel $h^{(D)}$ (most dangerous channel). Constraint from T-104:

$$\varepsilon_{\mathrm{signal}} + \eta \leq r_{\mathrm{stab}}$$

  3. Noise averaging. For $n$ observations with independent noise, effective signal grows as $n \cdot \varepsilon_{\mathrm{signal}}$, while noise grows only as $\sqrt{n} \cdot \eta$. Signal-to-noise ratio after $n$ observations:

$$\mathrm{SNR}_n = \mathrm{SNR} \cdot \sqrt{n}$$

  4. Reliability condition. For $\mathrm{SNR}_n \geq \mathrm{SNR}_{\mathrm{thresh}}$ (reliable discrimination threshold):

$$n \geq \left(\frac{\mathrm{SNR}_{\mathrm{thresh}}}{\mathrm{SNR}}\right)^2$$

Connection with T-69 (topological protection [T]): barriers $\geq 6\mu^2$ guarantee that discrete phase transitions are impossible — learning is always continuous, and random noise cannot cause a catastrophic jump. $\blacksquare$

4.1 Learning-Stability Trade-off

There exists a fundamental trade-off: strong observations (large $\varepsilon$) accelerate learning (reduce $n_{\mathrm{info}}$ and $n_{\mathrm{dyn}}$), but threaten stability (increase the risk of crossing $\partial\mathcal{V}$).

Optimal amplitude — the one at which $n_{\mathrm{info}} = n_{\mathrm{stab}}$:

$$\varepsilon^* = r_{\mathrm{stab}} \cdot \frac{\mathrm{SNR}}{1 + \mathrm{SNR}}$$

Substituting into T-109 gives the optimal learning rate at a given stability reserve $P - 2/7$.
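
A short Python sketch of this balance point (`eps_star` is a hypothetical helper; the purity value $P = 0.4$ matches the typical case of §7.3):

```python
import math

def eps_star(r_stab: float, snr: float) -> float:
    """Optimal observation amplitude of Section 4.1: balances T-109 against T-111."""
    return r_stab * snr / (1.0 + snr)

r_stab = math.sqrt(0.4 - 2.0 / 7.0)  # ~ 0.338 at typical purity P = 0.4
for snr in (0.3, 0.5, 1.0):
    print(snr, round(eps_star(r_stab, snr), 3))
# 0.3 -> 0.078, 0.5 -> 0.113, 1.0 -> 0.169
```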

4.2 Three Stability Zones

From T-106 (diagnostic regimes) [C under calibration]:

| Zone | $\|\sigma_{\mathrm{sys}}\|$ | Available $r_{\mathrm{stab}}$ | Learning mode |
|---|---|---|---|
| Normal | $< \sigma_1$ | Large | Fast learning — strong signals can be used |
| Warning | $\sigma_1 < \cdot < \sigma_2$ | Medium | Careful learning — limit $\varepsilon$ |
| Critical | $> \sigma_2$ | Small | Learning halted — survival priority |

5. Combined Optimal Bound (T-112) [T]

Intuition: Three Locks on One Door

Each of the three bounds is a necessary condition, but none of them is sufficient. They describe three different mechanisms limiting learning:

  • T-109 (information): "is there enough data?" — constraint on the quantity of observations
  • T-110 (dynamics): "can the system keep up?" — constraint on the rate of integration
  • T-111 (stability): "will the system hold?" — constraint on the strength of influences

Like three locks on one door, all three must be opened simultaneously. The bottleneck is determined by the slowest of the three — the strongest lock.

Neural network training provides a good illustration. At the start of training, when the model is far from the optimum, the bottleneck is usually information (one simply needs more data). In the middle — dynamics (the model slowly restructures its weights). Toward the end — stability (each training step risks worsening what has already been achieved). An optimal learning rate scheduler intuitively switches between these regimes — CC makes this switching a theorem.

Theorem T-112 (Optimal Learning Bound) [T]

Statement

Minimum number of observations for solving learning task $\mathfrak{L}$:

$$n^*(\mathfrak{L}) \geq n_{\mathrm{opt}} := \max\!\left(n_{\mathrm{info}},\; n_{\mathrm{dyn}},\; n_{\mathrm{stab}}\right)$$

where:

  • $n_{\mathrm{info}} = \ln(1/(2\delta)) / \xi_{\mathrm{QCB}}$ — information bound (T-109)
  • $n_{\mathrm{dyn}} = \frac{1}{\alpha\delta\tau}\ln\frac{d_{\mathrm{disc}}(1-e^{-\alpha\delta\tau})}{\varepsilon}$ — dynamical bound (T-110)
  • $n_{\mathrm{stab}} = (\mathrm{SNR}_{\mathrm{thresh}} / \mathrm{SNR})^2$ — stabilisation bound (T-111)

Learning passes through three regimes, determined by the bottleneck:

$$n_{\mathrm{opt}} = \begin{cases} n_{\mathrm{info}} & \text{information-limited (high SNR, slow channel)} \\ n_{\mathrm{dyn}} & \text{dynamically-limited (fast channel, slow dynamics)} \\ n_{\mathrm{stab}} & \text{stabilisation-limited (noisy environment, small } P \text{ reserve)} \end{cases}$$

Proof. Each of the three bounds is a necessary condition. If at least one of them is not satisfied:

  • $n < n_{\mathrm{info}}$: insufficient information to distinguish hypotheses → $P_{\mathrm{err}} > \delta$
  • $n < n_{\mathrm{dyn}}$: dynamics has not managed to integrate the signal → $S(n) < d_{\mathrm{disc}}$
  • $n < n_{\mathrm{stab}}$: noise dominates over signal → unreliable discrimination

Since all three conditions are simultaneously necessary, the minimum $n$ is the maximum of the three. $\blacksquare$

5.1 Regime Diagram

5.2 Including Genesis Time

For a system starting from $\Gamma = I/7$ (fully mixed state), total time to solving the task includes genesis:

$$n_{\mathrm{total}} = \underbrace{n_{\mathrm{genesis}}}_{\leq \lceil\tau_{\mathrm{genesis}}/\delta\tau\rceil} + \underbrace{n_{\mathrm{opt}}}_{\text{T-112}}$$

where $\tau_{\mathrm{genesis}} \leq 7\ln 7 \approx 13.6$ (T-59 [T]) — bootstrap time (at $\kappa_{\mathrm{bootstrap}} = 1/7$).

At $\delta\tau = 1$: $n_{\mathrm{total}} \leq 14 + n_{\mathrm{opt}}$.
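
Combining the bounds is a one-line maximum; the Python sketch below (illustrative names) also adds the genesis overhead:

```python
import math

def n_opt(n_info: int, n_dyn: int, n_stab: int) -> int:
    """T-112: the bottleneck is the largest of the three lower bounds."""
    return max(n_info, n_dyn, n_stab)

def n_total(n_opt_value: int, dtau: float = 1.0, from_genesis: bool = True) -> int:
    """Adds the T-59 bootstrap phase when starting from the fully mixed state I/7."""
    n_genesis = math.ceil(7.0 * math.log(7.0) / dtau) if from_genesis else 0  # 14 at dtau=1
    return n_genesis + n_opt_value

print(n_total(n_opt(19, 1, 1)))  # ideal scenario of Section 7.3: 14 + 19 = 33
```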


6. Optimality of N=7 for Learning (T-113) [T]

Intuition: Why Learning Requires a Specific Architecture

So far we have derived learning bounds for the fixed architecture $N = 7$. Theorem T-113 poses a deeper question: what is the minimal architecture capable of learning through regeneration?

The answer is surprisingly precise: $N = 7$ — neither more nor less. Systems with $N < 7$ are incapable of learning in principle, while systems with $N > 7$ can learn, but do so less efficiently.

The key link is self-observation. Learning in CC is the update of the self-model $\rho_*$. Updating requires comparing the current state with the model, i.e., $R > 0$ (non-zero reflection). And reflection, in turn, requires a replacement channel that relies on the Fano plane PG(2,2). And the Fano plane exists only at $N = 7$.

Analogy with child development: a newborn does not "learn" in the strict sense — they do not yet have a self-model that can be updated. Learning begins when the child perceives the gap between expectation and reality — and this requires self-observation. Theorem T-113 makes this pedagogical intuition rigorous: without reflection ($R = 0$) there is no learning ($n^* = \infty$), and reflection requires Fano structure ($N = 7$).

Theorem T-113 (Minimality of N=7 for Learning) [T]

Statement

Let $N$ be the dimension of the internal space of the holon $\mathcal{H} = \mathbb{C}^N$. Then:

  1. For $N < 7$: learning through regeneration is impossible: $n^* = \infty$
  2. For $N = 7$: learning is possible with finite optimal bound $n_{\mathrm{opt}}$ (T-112)
  3. For $N > 7$: learning is possible, but requires strictly more resources:
    • Genesis time: $\tau_{\mathrm{genesis}}(N) \propto N \ln N > \tau_{\mathrm{genesis}}(7)$
    • Parameter space: $\dim \mathcal{D}(\mathbb{C}^N) = N^2 - 1 > 48$
    • No new qualitative capabilities arise

$N = 7$ is the only Pareto-optimal point in the plane (learning capacity, system complexity).

Proof.

  1. Necessity of self-observation for learning. Learning = update of self-model $\rho_* = \varphi(\Gamma)$. Updating requires comparing $\Gamma$ with $\rho_*$, i.e., access to information about one's own state. Formally: a replacement channel with $R > 0$ is required (reflection measure).

  2. Necessity of Fano structure for self-observation. The replacement channel (T-77 [T], Lindblad operators) requires the Fano plane $\mathrm{PG}(2,2)$ for the definition of optimal Lindblad operators $\{L_k\}$ (T-82 [T]).

  3. Fano plane requires $N = 7$. $\mathrm{PG}(2,2)$ has 7 points and 7 lines. For realisation in $\mathcal{D}(\mathbb{C}^N)$: $N \geq 7$. From Hurwitz's theorem (T-89 [T]): $N = 7$ is the minimum dimension with a division algebra ($\mathbb{O}$), which ensures the $G_2$-structure.

  4. For $N < 7$: impossibility. No Fano plane → no unique Lindblad decomposition (T-82) → no replacement channel → $R = 0$ → impossible to update $\varphi(\Gamma)$ on the basis of observations → $n^* = \infty$.

  5. For $N > 7$: redundancy. Embedding $\mathbb{C}^7 \hookrightarrow \mathbb{C}^N$ (via Morita equivalence T-58 [T]) provides all mechanisms of $N = 7$. Additional dimensions increase:

    • $\dim\mathcal{D}(\mathbb{C}^N) = N^2 - 1 > 48$ — more parameters to update
    • $\tau_{\mathrm{genesis}} \propto N\ln N$ — longer bootstrap (estimate from generalised T-59)

    But information capacity $C_{\mathrm{Enc}} = \log_2 N$ grows only logarithmically, while complexity grows quadratically. Resource efficiency:

$$\eta(N) = \frac{C_{\mathrm{Enc}}(N)}{\dim\mathcal{D}(\mathbb{C}^N)} = \frac{\log_2 N}{N^2 - 1}$$

strictly decreases for $N > 1$. Thus, $N = 7$ is the minimum with non-zero learning capacity and maximum resource efficiency among systems with Fano structure. $\blacksquare$

6.1 Chain of Necessities

Learning → update of $\rho_* = \varphi(\Gamma)$ → reflection $R > 0$ → replacement channel (T-77) → Fano plane $\mathrm{PG}(2,2)$ (T-82) → $N = 7$ (T-89).

6.2 Parameters at N=7

| Parameter | Value | Source |
|---|---|---|
| Channel capacity $C_{\mathrm{Enc}}$ | $\log_2 7 \approx 2.81$ bits | T-107 [T] |
| Spectral gap $\lambda_{\mathrm{gap}}$ | $2/3$ | T-39a [T] |
| Minimal regeneration $\kappa_{\mathrm{bootstrap}}$ | $\omega_0/N = 1/7 \approx 0.143$ | T-59 [T] |
| Genesis time $\tau_{\mathrm{genesis}}$ | $\leq 7\ln 7 \approx 13.6$ | T-59 [T] |
| State parameters $\dim\mathcal{D}$ | $48$ (real) | $7^2 - 1$ |
| Resource efficiency $\eta$ | $\log_2 7 / 48 \approx 0.059$ | Definition |
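
The Pareto claim can be checked directly (a Python sketch; `eta` implements the resource efficiency defined in the T-113 proof):

```python
import math

def eta(n: int) -> float:
    """Resource efficiency of T-113: channel capacity per state-space parameter."""
    return math.log2(n) / (n**2 - 1)

for n in (7, 8, 11, 14):
    print(n, round(eta(n), 4))
# 7 -> 0.0585, 8 -> 0.0476, 11 -> 0.0288, 14 -> 0.0195: capacity grows as log N,
# complexity as N^2 - 1, so efficiency falls monotonically beyond N = 7
```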

7. Application: Binary Discrimination

7.1 The Two-Button Task

Setup. An agent (CC-holon) interacts with the environment through two buttons: green (reward) and red (punishment). The colours are unknown to the agent. Task: learn to press only the green button.

Formalisation:

  • $\Theta = \{\theta_0, \theta_1\}$ (two hypotheses: "green is on the left" vs "green is on the right")
  • $\mathcal{A} = \{a_L, a_R\}$ (press left, press right)
  • $\mathcal{R}(\theta_0, a_L) = +\varepsilon_R$, $\mathcal{R}(\theta_0, a_R) = -\varepsilon_P$ (under $\theta_0$ — "green is on the left")
  • $\delta = 0.05$ (95% reliability)

7.2 Signal and Mechanism

Reward and punishment enter through the functor $\mathrm{Enc}$ (T-100):

| Type | Channels | Effect on $\Gamma$ |
|---|---|---|
| Reward ($+\varepsilon_R$) | $h^{(R)} > 0$: regeneration strengthening | $P \uparrow$, $\mathcal{V}_{\mathrm{hed}} > 0$ |
| Punishment ($-\varepsilon_P$) | $h^{(D)} > 0$: dissipation strengthening | $P \downarrow$, $\mathcal{V}_{\mathrm{hed}} < 0$ |

Through the hedonic mechanism (T-103 [T]+[I]): the agent "feels" the valence $\mathcal{V}_{\mathrm{hed}} = dP/d\tau|_{\mathcal{R}}$ and adjusts $\mathrm{Dec}$ in the direction of minimising $\|\sigma_{\mathrm{sys}}\|_\infty$ (T-101).

7.3 Estimates of the Number of Presses

Notation: $\varepsilon = \varepsilon_R + \varepsilon_P$ — total contrast between reward and punishment, $\eta$ — environmental noise.

Information bound (T-109):

$$n_{\mathrm{info}} = \left\lceil\frac{\ln(1/(2\cdot 0.05))}{\xi_{\mathrm{QCB}}}\right\rceil = \left\lceil\frac{\ln 10}{\xi_{\mathrm{QCB}}}\right\rceil$$

| Contrast $\varepsilon$ | $\xi_{\mathrm{QCB}}$ | $n_{\mathrm{info}}$ |
|---|---|---|
| 1.0 (strong) | $\approx 0.125$ | $\geq 19$ |
| 0.5 (medium) | $\approx 0.031$ | $\geq 75$ |
| 0.3 (weak) | $\approx 0.011$ | $\geq 209$ |

Dynamical bound (T-110, $\delta\tau = 1$):

$$n_{\mathrm{dyn}} = \left\lceil\ln\!\left(\frac{d_{\mathrm{disc}}}{\varepsilon}\right) + 1\right\rceil$$

At $d_{\mathrm{disc}} \approx 0.3$ (minimum distance for reliable discrimination in $\mathcal{D}(\mathbb{C}^7)$):

| Contrast $\varepsilon$ | $n_{\mathrm{dyn}}$ |
|---|---|
| 1.0 | $\leq 1$ (instant) |
| 0.5 | $\leq 1$ |
| 0.3 | $\leq 1$ |
| 0.01 | $\leq 5$ |

Stabilisation bound (T-111):

At $P \approx 0.4$ (typical value): $r_{\mathrm{stab}} = \sqrt{0.4 - 2/7} \approx 0.34$.

| SNR | $n_{\mathrm{stab}}$ |
|---|---|
| 1.0 (clean signal) | $\leq 1$ |
| 0.5 | $\leq 4$ |
| 0.3 | $\leq 12$ |
| 0.1 | $\leq 100$ |

Combined estimate (T-112):

Typical scenario ($\varepsilon = 0.5$, $\mathrm{SNR} = 0.5$, $\delta\tau = 1$):

$$n_{\mathrm{opt}} = \max(75, 1, 4) = 75$$

Bottleneck — information (weak contrast).

Ideal scenario ($\varepsilon = 1.0$, $\mathrm{SNR} = 1.0$, $\delta\tau = 1$):

$$n_{\mathrm{opt}} = \max(19, 1, 1) = 19$$

Including genesis ($n_{\mathrm{genesis}} \leq \lceil 7\ln 7 \rceil = 14$): $n_{\mathrm{total}} \leq 14 + 19 = 33$.

Noisy scenario ($\varepsilon = 0.3$, $\mathrm{SNR} = 0.3$, $\delta\tau = 1$):

$$n_{\mathrm{opt}} = \max(209, 1, 12) = 209$$

Bottleneck — information.

7.3a Numerical Example: Computing $n_{\mathrm{opt}}$ for a Specific Holon

Let us carry out a full computation for the holon from the case study "Patient A" — an AI agent of a warehouse robot that must learn to distinguish two types of packaging (standard vs fragile).

Given data:

  • $P = 0.39$ (after stabilisation, day 7)
  • $\mathrm{Coh}_E = 0.28$ (moderate self-model)
  • Contrast between packaging types: $\varepsilon = 0.4$ (medium — visually distinguishable, but not trivially)
  • Environmental noise: $\eta = 0.15$ (lighting changes, camera occasionally produces glare)
  • $\mathrm{SNR} = \varepsilon / \eta = 0.4 / 0.15 \approx 2.67$
  • Reliability: $\delta = 0.05$ (95%)
  • Observation interval: $\delta\tau = 1$ (one observation per $\sim 1.5$ seconds)

Step 1: Information bound (T-109).

$$\xi_{\mathrm{QCB}} \approx \frac{\varepsilon^2}{8} = \frac{0.4^2}{8} = 0.02$$

$$n_{\mathrm{info}} = \left\lceil \frac{\ln(1/(2 \cdot 0.05))}{0.02} \right\rceil = \left\lceil \frac{\ln 10}{0.02} \right\rceil = \left\lceil \frac{2.30}{0.02} \right\rceil = 115$$

Step 2: Dynamical bound (T-110).

At $\delta\tau = 1$ (close to the natural scale $1/\alpha = 3/2$ for $\alpha = 2/3$), using the simplified formula:

$$n_{\mathrm{dyn}} = \left\lceil \ln\!\left(\frac{d_{\mathrm{disc}}}{\varepsilon}\right) + 1 \right\rceil$$

With $d_{\mathrm{disc}} \approx 0.3$:

$$n_{\mathrm{dyn}} = \left\lceil \ln\!\left(\frac{0.3}{0.4}\right) + 1 \right\rceil = \lceil -0.29 + 1 \rceil = 1$$

Dynamics is not the bottleneck — the contrast is strong enough.

Step 3: Stabilisation bound (T-111).

$$r_{\mathrm{stab}} = \sqrt{P - 2/7} = \sqrt{0.39 - 0.286} = \sqrt{0.104} \approx 0.323$$

Check: $\varepsilon = 0.4 > r_{\mathrm{stab}} = 0.323$. Problem! The signal is too strong — each observation risks destabilising the system.

Stabilisation constraint triggered

At $\varepsilon = 0.4 > r_{\mathrm{stab}} = 0.323$, direct learning is dangerous. Solution: attenuation — reduce the effective amplitude to $\varepsilon_{\mathrm{eff}} = 0.8 \cdot r_{\mathrm{stab}} = 0.258$ (20% margin). This is equivalent to a learning rate schedule.

With attenuated amplitude $\varepsilon_{\mathrm{eff}} = 0.258$:

  • $\mathrm{SNR}_{\mathrm{eff}} = 0.258 / 0.15 = 1.72$
  • $n_{\mathrm{stab}} = \lceil (1/1.72)^2 \rceil = \lceil 0.34 \rceil = 1$

Recomputing the information bound with $\varepsilon_{\mathrm{eff}}$:

$$\xi_{\mathrm{QCB}}^{\mathrm{eff}} \approx \frac{0.258^2}{8} = 0.0083$$

$$n_{\mathrm{info}}^{\mathrm{eff}} = \left\lceil \frac{2.30}{0.0083} \right\rceil = 277$$

Step 4: Combined bound (T-112).

$$n_{\mathrm{opt}} = \max(277, 1, 1) = 277$$

Including genesis (the system is already running, $n_{\mathrm{genesis}} = 0$):

$$\boxed{n_{\mathrm{total}} = 277 \text{ observations} \approx 7 \text{ minutes at } 1.5 \text{ s/observation}}$$

Bottleneck: information (weak attenuated contrast). Optimisation strategy: improve the camera (reduce $\eta$ → increase SNR → can increase $\varepsilon_{\mathrm{eff}}$ → reduce $n_{\mathrm{info}}$).

Lesson: stability constrains even strong signals

Without attenuation ($\varepsilon = 0.4$) only $n_{\mathrm{info}} = 115$ observations would be needed, but every fifth one would risk destabilising the agent. With attenuation — $n_{\mathrm{info}} = 277$, but safely. The T-111 trade-off: safety costs 2.4× in time. This is not an engineering constraint, but a physical law.
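
The entire §7.3a pipeline fits in a few lines of Python (a sketch under the stated assumptions; with unrounded intermediates $n_{\mathrm{info}}$ comes out as 276 rather than the text's 277, which uses the rounded $\xi_{\mathrm{QCB}} \approx 0.0083$):

```python
import math

P_CRIT = 2.0 / 7.0
LN_INV_2DELTA = math.log(1.0 / (2 * 0.05))  # ln 10 ~ 2.30 at delta = 0.05

P, eps, eta, d_disc = 0.39, 0.4, 0.15, 0.3  # "Patient A" inputs

# Step 3 first: T-111 stability check and attenuation
r_stab = math.sqrt(P - P_CRIT)              # ~ 0.323 < eps: signal too strong
eps_eff = min(eps, 0.8 * r_stab)            # ~ 0.258 (20% safety margin)
snr_eff = eps_eff / eta                     # ~ 1.72
n_stab = math.ceil((1.0 / snr_eff) ** 2)    # 1

# Step 1: T-109 with the attenuated contrast
xi_eff = eps_eff**2 / 8.0                   # ~ 0.00834
n_info = math.ceil(LN_INV_2DELTA / xi_eff)  # 276

# Step 2: T-110 at delta_tau = 1 (the attenuated contrast gives 2 instead of 1)
n_dyn = max(1, math.ceil(math.log(d_disc / eps_eff) + 1.0))

print(max(n_info, n_dyn, n_stab))           # 276, matching the text's 277 up to rounding
```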


7.4 Prediction for the CC Test

Prediction for testing

For a CC-architecture with realistic parameters ($\varepsilon \sim 0.5$–$1.0$, $\mathrm{SNR} \sim 0.5$–$1.0$):

$$n_{\mathrm{total}} \approx 20\text{–}80 \;\text{presses}$$

until a stable preference for the green button.

Falsification criterion: if the agent learns in $n < n_{\mathrm{info}}$ (information limit), this violates the quantum Chernoff bound and falsifies the observation model.


8. Comparison with Classical Learning Theory

The CC learning bounds did not arise in a vacuum — they inherit and generalise a number of classical results. This section provides a systematic comparison.

8.1 PAC-Learning and VC-Dimension

In classical PAC-learning (Valiant, 1984), for learning with accuracy $\varepsilon$ and reliability $1-\delta$:

$$n_{\mathrm{PAC}} \geq \frac{1}{\varepsilon}\left(\ln|\mathcal{H}| + \ln\frac{1}{\delta}\right)$$

where $|\mathcal{H}|$ is the cardinality of the hypothesis space. For infinite hypothesis classes the VC-dimension $d_{\mathrm{VC}}$ is used:

$$n_{\mathrm{PAC}} = \Omega\!\left(\frac{d_{\mathrm{VC}} + \ln(1/\delta)}{\varepsilon}\right)$$

| Aspect | PAC-learning | CC bounds |
|---|---|---|
| Substrate | Abstract algorithm | Physical dynamical system |
| Information bound | $\ln\lvert\mathcal{H}\rvert$ | $\xi_{\mathrm{QCB}} \leq \ln 7$ |
| Dynamics | Not accounted for | $n_{\mathrm{dyn}}$ — key constraint |
| Stability | Not accounted for | $n_{\mathrm{stab}}$ — learning must not kill the learner |
| Scaling for weak signals | $O(1/\varepsilon)$ | $O(1/\varepsilon^2)$ (quantum limit) |
| Minimal architecture | Arbitrary | $N = 7$ (T-113) |

Key distinction: PAC-learning describes an algorithm, CC describes a physical system. An algorithm has no inertia and does not risk dying. A living learner does.

8.2 Rademacher Complexity and Generalisation

Rademacher complexity $\mathfrak{R}_n$ measures the ability of a function class to "fit" random noise. Classical generalisation bound:

$$\mathrm{err}(\hat{f}) \leq \hat{\mathrm{err}}(\hat{f}) + 2\mathfrak{R}_n + \sqrt{\frac{\ln(1/\delta)}{2n}}$$

In CC the analogue of Rademacher complexity is channel capacity $C_{\mathrm{Enc}} \leq \log_2 7$ (T-107). The constraint on channel capacity automatically controls overfitting: a system with fixed capacity $\log_2 7 \approx 2.81$ bits per observation cannot "memorise" an arbitrarily complex pattern. This is a built-in regularisation arising not from an engineering decision, but from an architectural constraint.

8.3 Shannon Limit and Quantum Chernoff Exponent

The classical Shannon theorem (1948) states: for reliable transmission through a channel with capacity $C$, one needs $n \geq H(\Theta)/C$ observations, where $H(\Theta)$ is the entropy of the hypothesis distribution.

T-109 generalises this result to a quantum channel:

$$n_{\mathrm{info}} = \frac{\ln(1/(2\delta))}{\xi_{\mathrm{QCB}}} \geq \frac{\ln(1/(2\delta))}{\ln 7}$$

The quantum Chernoff exponent $\xi_{\mathrm{QCB}}$ is the quantum analogue of $C$, but for the task of discrimination, not transmission. Here $\xi_{\mathrm{QCB}} \leq \ln 7 \approx 1.95$ — the absolute maximum, determined by the dimension of $\mathcal{H}$. The classical Shannon limit is recovered when $\Gamma_\pm$ commute (classical states).
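
For readers who want to evaluate $\xi_{\mathrm{QCB}}$ numerically, a small sketch assuming numpy (the two 7×7 diagonal states are toy examples, not derived from the CC dynamics; a dense grid over $s$ stands in for a proper minimiser):

```python
import numpy as np

def qcb_exponent(rho: np.ndarray, sigma: np.ndarray, grid: int = 1001) -> float:
    """xi_QCB = -ln min_{0<=s<=1} Tr(rho^s sigma^(1-s)), the T-109 definition."""
    def mpow(m: np.ndarray, p: float) -> np.ndarray:
        # fractional power of a positive semi-definite matrix via eigendecomposition
        w, v = np.linalg.eigh(m)
        w = np.clip(w, 1e-12, None)
        return (v * w**p) @ v.conj().T
    vals = [np.trace(mpow(rho, s) @ mpow(sigma, 1.0 - s)).real
            for s in np.linspace(0.0, 1.0, grid)]
    return -np.log(min(vals))

# Toy pair of 7x7 states differing in two diagonal entries
d, a = 7, 0.05
gamma_plus = np.eye(d) / d
gamma_minus = gamma_plus.copy()
gamma_plus[0, 0] += a; gamma_plus[1, 1] -= a
gamma_minus[0, 0] -= a; gamma_minus[1, 1] += a
print(qcb_exponent(gamma_plus, gamma_minus))  # ~ 0.018 for this toy pair
```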

8.4 Thermodynamic Bounds on Learning

The Landauer limit ($kT\ln 2$ per bit of erasure) is connected to T-110 as follows: Fano contraction is inevitable dissipation, analogous to thermodynamic erasure. Each learning step requires erasing old information ($\alpha \cdot \delta\Gamma$) and recording new information ($\varepsilon$). Minimum "thermodynamic cost" of learning:

$$W_{\mathrm{learn}} \geq n_{\mathrm{opt}} \cdot kT\ln 2 \cdot \Delta S_{\mathrm{step}}$$

where $\Delta S_{\mathrm{step}}$ is the change in von Neumann entropy per step. This connects the CC learning bounds with the physical energy of cognitive processes.
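
A back-of-the-envelope Python estimate of this cost (the 0.1 bit of entropy change per step is a made-up illustrative figure):

```python
import math

K_B, T_ROOM = 1.380649e-23, 300.0  # Boltzmann constant (J/K), room temperature (K)

def learning_energy(n_opt: int, dS_step: float) -> float:
    """Thermodynamic lower bound of Section 8.4: W >= n_opt * kT ln2 * dS_step."""
    return n_opt * K_B * T_ROOM * math.log(2.0) * dS_step

print(f"{learning_energy(277, 0.1):.2e} J")  # ~ 8e-20 J for the Section 7.3a task
```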


9. Practical Implications

Theorems T-109 — T-113 are not abstract mathematical results. They have direct implications for three key areas: AI design, education, and therapy.

9.1 Implications for AI and Machine Learning

Architecture. T-113 states that $N = 7$ is the minimal architecture for learning through regeneration. For an AI engineer this means: if you are building a system with an internal self-model (not merely an optimiser), you need at least 7 internal "channels" with Fano-structured connections between them.

Learning rate. T-111 provides theoretical justification for adaptive learning rate: maximum update amplitude $\varepsilon \leq r_{\mathrm{stab}} = \sqrt{P - 2/7}$. Systems with low purity (unstable models) should learn more slowly. Systems with high purity (stable models) can afford more aggressive training.
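
A minimal Python sketch of that clamp (a hypothetical helper, not the API of any training framework; the 0.8 safety margin mirrors §7.3a):

```python
import math

P_CRIT = 2.0 / 7.0  # viability threshold

def max_update_amplitude(purity: float, margin: float = 0.8) -> float:
    """T-111-style clamp: update amplitude bounded by the stability radius."""
    if purity <= P_CRIT:
        return 0.0  # at or below the viability boundary: halt learning (critical zone, Section 4.2)
    return margin * math.sqrt(purity - P_CRIT)

for p in (0.30, 0.39, 0.60):
    print(p, round(max_update_amplitude(p), 3))
# 0.30 -> 0.096, 0.39 -> 0.258, 0.60 -> 0.448
```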

Curriculum design. T-112 explains why curriculum learning works: in the early stages the bottleneck is information (simple examples provide larger $\varepsilon$), in the later stages — stability (complex examples should not destabilise what has already been learned). Optimal strategy: begin with strong, simple signals and gradually transition to weak, subtle ones.

9.2 Implications for Education

Information dosing. T-111 formalises the pedagogical principle of "not overloading the student": each lesson is a perturbation of $\Gamma$, and excessively intense learning can drive the student out of the viability zone ($P < 2/7$). An overloaded student does not merely "fail to absorb" — they are destabilised.

Spaced repetition. T-110 provides theoretical grounding for the spacing effect (spaced repetition, Ebbinghaus, 1885): each repetition adds signal $\varepsilon$, and between repetitions contraction erases it. The optimal interval $\delta\tau \sim 1/\alpha$ ensures maximum signal accumulation.

Zone of proximal development. Vygotsky's concept is formalised through the T-111 / §4.1 trade-off: tasks in the "zone of proximal development" are those for which $\varepsilon < r_{\mathrm{stab}}$ (non-destabilising), but $\varepsilon$ is large enough for $n_{\mathrm{info}}$ to be finite. Tasks that are too complex ($\varepsilon > r_{\mathrm{stab}}$) are beyond the zone: learning is impossible without first strengthening $P$.

9.3 Implications for Therapy

Therapeutic window. The three stability zones (§4.2) directly correspond to clinical practice:

  • Normal ($\|\sigma_{\mathrm{sys}}\| < \sigma_1$): patient in a resourced state — full-power therapeutic interventions.
  • Warning ($\sigma_1 < \|\sigma_{\mathrm{sys}}\| < \sigma_2$): patient is vulnerable — gentle interventions, supportive therapy.
  • Critical ($\|\sigma_{\mathrm{sys}}\| > \sigma_2$): patient in crisis — learning halted, stabilisation priority.

This principle is known to clinicians empirically (Siegel's "window of tolerance" model). CC derives it from first principles.

Trauma and PTSD. Traumatic experience is an observation with $\varepsilon > r_{\mathrm{stab}}$. It is not merely "strong" — it pushes the system beyond the viability boundary. Trauma therapy (EMDR, exposure therapy) works through titrated re-presentation with $\varepsilon < r_{\mathrm{stab}}$, gradually integrating traumatic experience without destabilisation.


10. Connection with Other Results

| Result | Role in learning bounds | Reference |
|---|---|---|
| T-39a ($\lambda_{\mathrm{gap}} = 2/3$) | Contraction in T-110 | Lindblad Operators |
| T-59 ($\kappa_{\mathrm{bootstrap}} = 1/7$) | Genesis time | Axiom Ω |
| T-69 (Topological protection) | Continuity of learning in T-111 | Composites |
| T-77 (Replacement channel) | Necessity for T-113 | Lindblad Operators |
| T-82 (Fano uniqueness) | Chain $N = 7$ in T-113 | Lindblad Operators |
| T-89 (Hurwitz minimality) | $N \geq 7$ in T-113 | Minimality Theorem |
| T-98 (Attractor balance) | Stabilisation of learning | Evolution |
| T-100 (Enc functor) | Observation channel | Sensorimotor Theory |
| T-101 (Dec functor) | Criterion for successful learning | Sensorimotor Theory |
| T-104 (Stability radius) | Amplitude constraint in T-111 | Stability |
| T-107 (Enc capacity) | Upper bound on $\xi_{\mathrm{QCB}}$ in T-109 | Sensorimotor Theory |
| SAD_MAX = 3 | Fano contraction $\to P_{\mathrm{crit}}^{(n)}$ $\to$ SAD_MAX | Depth Tower |

11. Conclusion

Learning is one of the most fundamental processes in the universe. From RNA replication to language learning, from species evolution to neural network training — everywhere a system interacts with an environment and changes itself on the basis of received experience. Coherence Cybernetics shows that this process is subject to three absolute constraints, arising from the mathematics of 7-dimensional coherent space.

Three bounds — three questions:

  1. Information bound (T-109): Is there enough data? — the number of observations cannot be less than $\ln(1/(2\delta))/\xi_{\mathrm{QCB}}$. For weak signals the scaling $O(1/\varepsilon^2)$ is the quantum limit, which cannot be improved.

  2. Dynamical bound (T-110): Can the system keep up? — Fano contraction ($\alpha = 2/3$) erases information faster than it is recorded. Learning is a race between recording and erasure, and the stationary limit determines whether the task is solvable in principle.

  3. Stabilisation bound (T-111): Will the learner hold? — learning must not kill the one who is learning. The amplitude $\varepsilon \leq r_{\mathrm{stab}}$ is not an engineering constraint, but a physical law.

Combined bound (T-112) — the maximum of the three — determines the true bottleneck of learning. In different situations different mechanisms dominate: information in clean environments, dynamics with fast signals, stability under noise and stress.

Minimality $N = 7$ (T-113) closes the chain: learning through regeneration requires self-observation, self-observation requires Fano structure, Fano structure requires $N = 7$. This is not a compromise — it is the only point on the Pareto boundary.

The learning bounds close the chain: structure ($N = 7$, T-113) → channel (Enc, T-107) → information (T-109) → dynamics (T-110) → stability (T-111) → optimum (T-112). Every link is a consequence of axioms A1–A5 and canonical dynamics, without additional postulates.


Summary

  1. T-109 [T]: Information bound — $n \geq \ln(1/(2\delta)) / \xi_{\mathrm{QCB}}$, scaling $O(1/\varepsilon^2)$ for weak signals
  2. T-110 [T]: Dynamical bound — contraction $\alpha = 2/3$ limits the signal integration rate
  3. T-111 [T]: Stabilisation bound — learning must not kill the learner ($\varepsilon \leq r_{\mathrm{stab}}$)
  4. T-112 [T]: Combined bound — $n_{\mathrm{opt}} = \max(n_{\mathrm{info}}, n_{\mathrm{dyn}}, n_{\mathrm{stab}})$, three regimes
  5. T-113 [T]: $N = 7$ — minimal architecture for learning through regeneration
  6. Prediction: for binary discrimination (two actions) ~20–80 observations at typical parameters

What We Learned

  1. Three learning bounds — information (T-109: is there enough data?), dynamical (T-110: can the system keep up?), stabilisation (T-111: will the learner hold?) — form a "triple lock," all three bolts of which must be opened.

  2. Combined bound (T-112): $n_{\mathrm{opt}} = \max(n_{\mathrm{info}}, n_{\mathrm{dyn}}, n_{\mathrm{stab}})$ — the bottleneck is determined by the slowest mechanism. In clean environments information dominates; in noisy ones — stability.

  3. $N = 7$ is the minimal architecture for learning through regeneration (T-113). Learning requires self-observation, self-observation requires the Fano plane, the Fano plane requires $N = 7$. This is not a compromise — it is the only point on the Pareto boundary.

  4. Numerical example (§7.3a): for a warehouse robot with $P = 0.39$ and contrast $\varepsilon = 0.4$, the stabilisation constraint requires attenuation, increasing training time by 2.4×. Safety costs time — this is a physical law, not an engineering choice.

  5. Historical roots: Shannon (information), Valiant (statistics), Landauer (thermodynamics) — three facets of one constraint. CC unites them for the first time in a single theorem for a living learner.

Bridge to the Next Chapter

We have traveled the full path from axioms to learning bounds — from $\Omega^7$ to $n_{\mathrm{opt}} = \max(n_{\mathrm{info}}, n_{\mathrm{dyn}}, n_{\mathrm{stab}})$. But behind the formulas and theorems a question remains: what does all of this mean? What is the ontology of CC — what is real and what is instrumental? Is the matrix $\Gamma$ a description of consciousness or consciousness itself? In the next chapter we will turn to the philosophical foundations of Coherence Cybernetics — from neutral monism to the ethics of coherent systems.

