Definition: Epistemic Compression in Grammars and in AI
Epistemic compression is the process by which the human mind reduces the infinite potential of experience into finite systems of reference—rules, models, or categories—so that knowledge becomes communicable, repeatable, and decidable.
Compression proceeds through four operations:
- Dimension Reduction → stripping irrelevant or noisy features from sensory or conceptual input.
- Indexical Substitution → replacing raw intuitions with symbolic tokens (numbers, terms, concepts).
- Recursive Transformation → applying lawful operations to refine meaning within bounded contexts.
- Closure → halting the process at a stable form (proof, rule, narrative resolution, judgment).
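The four operations above can be sketched as a toy pipeline. Everything here is illustrative: the function names, thresholds, and the HIGH/LOW token alphabet are my own assumptions, not part of the source.

```python
def dimension_reduction(readings, threshold=0.1):
    # Strip low-signal features (noise removal).
    return [r for r in readings if abs(r) >= threshold]

def indexical_substitution(values):
    # Replace raw magnitudes with symbolic tokens.
    return ["HIGH" if v > 0.5 else "LOW" for v in values]

def recursive_transformation(tokens):
    # Lawful rewrite within a bounded context: collapse runs of equal tokens.
    out = []
    for t in tokens:
        if not out or out[-1] != t:
            out.append(t)
    return out

def closure(tokens):
    # Halt at a stable, decidable form: a single judgment.
    return "MOSTLY_HIGH" if tokens.count("HIGH") >= tokens.count("LOW") else "MOSTLY_LOW"

signal = [0.02, 0.9, 0.8, 0.03, 0.7, 0.2, 0.05]
verdict = closure(recursive_transformation(indexical_substitution(dimension_reduction(signal))))
print(verdict)  # MOSTLY_HIGH
```

An unbounded stream of readings ends as one communicable, testable token.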
Compression is forced on us by cognitive constraints:
- Limited memory → we cannot store infinite details; compression turns flux into durable representations.
- Bounded attention → we cannot process everything simultaneously; compression focuses relevance.
- Costly inference → reasoning consumes time and energy; compression reduces the search space.
- Need for coordination → cooperation requires shared, testable references; compression produces common syntax.
Functionally, compression:
- Reduces entropy in the space of possible beliefs.
- Enables decidability by converting ambiguity into testable claims.
- Supports prediction by stabilizing causal relations.
- Facilitates cooperation by aligning individuals under shared constraints.
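The entropy claim can be made concrete with a small example of my own (the readings and category cutoffs are invented): binning fine-grained observations into a few categories lowers the Shannon entropy of the belief space.

```python
import math
from collections import Counter

def entropy(samples):
    # Shannon entropy (bits) of the empirical distribution over samples.
    n = len(samples)
    return -sum(c / n * math.log2(c / n) for c in Counter(samples).values())

raw = [0.11, 0.12, 0.13, 0.48, 0.52, 0.55, 0.91, 0.93]   # 8 distinct readings
compressed = ["LOW" if x < 0.33 else "MID" if x < 0.66 else "HIGH" for x in raw]

print(entropy(raw))         # 3.0 bits: every reading is distinct
print(entropy(compressed))  # ~1.56 bits: only three categories remain
```

Fewer distinguishable states means fewer possible beliefs to search over.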
LLMs typically externalize latent reasoning by generating step‑by‑step narratives—Chain‑of‑Thought (CoT)—that guide ambiguous, high‑dimensional prompts through intermediate linguistic steps toward a conclusion. CoT transforms the internal, expansive search space into a linear sequence of human‑readable "mini‑grammar" steps; each reduction brings us closer to a concise, checkable conclusion. The grammar here is natural language, constrained by the syntax and semantics the LLM has internalized.
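A minimal sketch of CoT prompting, assuming a common template convention (the prompt wording, the "Answer:" convention, and `fake_completion` are my own stand-ins; no model is called):

```python
def cot_prompt(question: str) -> str:
    # Wrap the question in a template that elicits intermediate steps.
    return f"Q: {question}\nA: Let's think step by step.\n"

def extract_answer(completion: str) -> str:
    # Closure step: pull the final, checkable claim out of the narrative.
    # Assumed convention: the model ends with a line "Answer: <value>".
    for line in reversed(completion.strip().splitlines()):
        if line.startswith("Answer:"):
            return line.removeprefix("Answer:").strip()
    return completion.strip().splitlines()[-1]

fake_completion = (
    "There are 3 boxes with 4 apples each.\n"
    "3 * 4 = 12.\n"
    "Answer: 12"
)

print(cot_prompt("How many apples are in 3 boxes of 4?"))
print(extract_answer(fake_completion))  # 12
```

The intermediate lines are the "mini‑grammar" steps; the extracted answer is the closure.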
The Hierarchical Reasoning Model (HRM) performs this compression in latent space rather than in text:
- Two‑Level Recurrence: a low‑level module (L) handles fast, detailed, local computations, while a high‑level module (H) sets a slow, abstract planning context.
- Hierarchical Convergence: each low‑level sequence converges to a fixed point under the current high‑level context; the high‑level module then updates and resets the low‑level one, creating nested cycles of compression and refinement.
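The two-level loop can be sketched with scalar states (my own toy dynamics, assuming a contractive low-level map; these are not HRM's actual update equations):

```python
import math

def low_step(z_l, z_h, x):
    # Fast, local computation conditioned on the frozen high-level context z_h.
    # The 0.5 factor keeps the map contractive, so a fixed point exists.
    return math.tanh(0.5 * z_l + z_h + x)

def high_step(z_h, z_l_star):
    # Slow, abstract update absorbing the converged low-level state.
    return 0.5 * (z_h + z_l_star)

def hrm_like(x, cycles=3, inner=50):
    z_h = z_l = 0.0
    for _ in range(cycles):          # nested cycles of compression and refinement
        for _ in range(inner):       # converge L under the current H context
            z_l = low_step(z_l, z_h, x)
        z_h = high_step(z_h, z_l)    # update H, giving L a new sub-problem
    return z_h

print(hrm_like(0.3))
```

Each inner loop is one "compression" to a stable form; the outer loop strings these closures into a plan.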
- Training Without BPTT: instead of backpropagation through time, HRM uses a one‑step gradient approximation, computing gradients at the equilibrium and drastically reducing memory cost.
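A numerical sketch of the one-step idea, under my own assumptions (a small contractive tanh map, not HRM's architecture): iterate to the fixed point without storing the trajectory, then differentiate only the final step, treating the incoming state as a constant.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
W = rng.normal(scale=0.1, size=(d, d))   # small scale keeps the map contractive
x = rng.normal(size=d)                    # input / high-level context
y = rng.normal(size=d)                    # target

def step(z):
    return np.tanh(W @ z + x)

# 1) Run to the fixed point z* = step(z*); no per-step state is stored.
z = np.zeros(d)
for _ in range(50):
    z = step(z)

# 2) One-step gradient of L = 0.5 * ||step(z*) - y||^2 w.r.t. W,
#    holding z* fixed (no unrolling through the 50 iterations).
pre = W @ z + x
out = np.tanh(pre)
delta = (out - y) * (1.0 - out ** 2)      # dL/dpre via tanh' = 1 - tanh^2
grad_W = np.outer(delta, z)               # dL/dW with z* treated as constant
loss = 0.5 * float(np.sum((out - y) ** 2))
```

Memory cost is constant in the number of iterations, instead of growing linearly as BPTT's stored trajectory would.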
- Adaptive Computation: a reinforcement‑learning‑based Q‑head decides when to halt reasoning based on problem complexity, spending more cycles on harder tasks and fewer on easier ones.
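Adaptive halting can be sketched with a scalar stand-in for the learned Q-head (the dynamics, threshold, and scoring rule are invented for illustration): stop once the state change between cycles is small enough.

```python
def refine(state, x):
    # One reasoning cycle (placeholder dynamics: move halfway toward x).
    return 0.5 * state + 0.5 * x

def halt_score(prev, curr):
    # Stand-in for a learned Q-head: confidence that more cycles won't help.
    return 1.0 - abs(curr - prev)

def reason(x, max_cycles=100, halt_threshold=0.999):
    state = 0.0
    for cycles in range(1, max_cycles + 1):
        new = refine(state, x)
        done = halt_score(state, new) >= halt_threshold
        state = new
        if done:
            return state, cycles
    return state, max_cycles

print(reason(0.1))   # "easy" input: converges in few cycles
print(reason(8.0))   # "hard" input: needs more cycles
```

The halting decision is what makes the process decidable: it always terminates in a well-defined state.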
- Compression: complex reasoning is reduced to nested latent fixed‑point computations, eliminating the need for explicit textual reasoning paths.
- Decidability: the halting mechanism ensures the process concludes in a well‑defined state, producing a testable output.
- Efficiency: HRM achieves deep, Turing‑complete computation using only 27 M parameters and ~1,000 training examples, far fewer than CoT models require.
- Sudoku (Extreme): near‑perfect accuracy where CoT fails entirely.
- Maze Solving (30×30): optimal pathfinding on tasks where far larger CoT models fail.
- ARC‑AGI Benchmark: 40–55 % accuracy, well above much larger models.
Source date: 2025-08-22 20:17:11 UTC
Original post: https://x.com/i/articles/1958986830499782692