Sidebars to Support “LLMs Don’t Just Predict The Next Word”
Constraint layers sit on top of the raw generative model and inject external demands into the incremental generation process.

Types of constraints:
- Truth & Factuality – e.g., retrieval-augmented generation, knowledge-graph checks.
- Logical & Legal Rules – symbolic reasoners, compliance filters, safety policies.
- Stylistic & Rhetorical Demands – tone control, summarization, pedagogical framing.

Mechanism: each token's probability distribution is modified before sampling to exclude or penalize paths that violate the constraint set.

Analogy to the brain: the prefrontal cortex exerts top-down control over motor and language areas, pruning utterances that would violate social, moral, or strategic rules before speech leaves the mouth.

Constraint layers thus transform raw generative capacity into purposeful, accountable output.
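The mechanism above can be sketched in a few lines of plain Python. This is a minimal illustration, not any particular library's API: token "ids" are strings for readability, logits are a dict of raw scores, and the `allowed` predicate stands in for whatever constraint layer (retrieval check, compliance filter, style rule) is in force. Forbidden tokens are dropped before the softmax, so they receive zero probability mass.

```python
import math
import random

def constrained_sample(logits, allowed, temperature=1.0):
    """Sample a token after masking out constraint-violating options.

    logits: dict mapping token -> raw model score
    allowed: predicate deciding whether a token may be emitted here
    """
    # Constraint layer: exclude forbidden tokens *before* the softmax,
    # so the remaining probability mass is renormalized over legal paths.
    kept = {t: s for t, s in logits.items() if allowed(t)}
    # Numerically stabilized softmax over the surviving tokens.
    m = max(kept.values())
    weights = {t: math.exp((s - m) / temperature) for t, s in kept.items()}
    z = sum(weights.values())
    probs = {t: w / z for t, w in weights.items()}
    # Sample from the renormalized distribution.
    r, acc = random.random(), 0.0
    for t, p in probs.items():
        acc += p
        if acc >= r:
            return t
    return t  # fall through on floating-point round-off

# Hypothetical example: the highest-scoring token is blocked, so the
# sampler can only ever return "cat" or "dog".
logits = {"cat": 2.0, "bomb": 3.0, "dog": 1.0}
token = constrained_sample(logits, allowed=lambda t: t != "bomb")
```

A penalizing (rather than excluding) variant would subtract a bias from the offending scores instead of dropping them; the renormalization step is the same.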
Before producing any token, the LLM converts the entire prompt into a latent space through self-attention.

- Self-attention as compression: every token attends to every other token, integrating context into a unified representation.
- Emergent structure: the latent space encodes syntax, semantics, discourse relations, even weak causal models of the world.
- Dynamic update: as each token is generated, the latent space is locally altered, refining predictions for what follows.
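The "every token attends to every other token" step can be made concrete with a stripped-down single-head attention function. This sketch omits the learned query/key/value projections and multi-head machinery of a real transformer; it keeps only the core operation: each token's output is a softmax-weighted average of all token vectors, weighted by scaled dot-product similarity.

```python
import math

def self_attention(x):
    """Minimal single-head self-attention over a list of token vectors.

    Each token's output mixes in every other token's vector,
    weighted by how strongly the two vectors align.
    """
    d = len(x[0])
    out = []
    for q in x:
        # Scaled dot-product score of this query against every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in x]
        # Softmax turns scores into attention weights summing to 1.
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        a = [wi / z for wi in w]
        # Context-mixed representation: weighted sum of all vectors.
        out.append([sum(ai * v[j] for ai, v in zip(a, x))
                    for j in range(d)])
    return out

# Two toy one-hot "tokens": each output blends both inputs,
# i.e. context has been integrated into every position.
mixed = self_attention([[1.0, 0.0], [0.0, 1.0]])
```

Because every output row is a convex combination of all input rows, no token's representation is computed in isolation: this is the compression-into-context step the bullet list describes.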
This latent space is why LLMs can:

- Translate across languages without explicit dictionaries.
- Follow instructions never seen in training.
- Reason across multiple steps without symbolic planning.

It functions as a temporary world-model, reconstructed anew for each prompt, guiding output toward coherence and relevance.
Modern neuroscience views the brain as a hierarchical prediction engine:

- Sensory input generates prediction errors when reality diverges from expectation.
- Cortical hierarchies minimize these errors by updating internal models of the world.
- Speech and action emerge as predictions about motor output that continuously correct themselves as feedback arrives.
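The error-minimization loop above can be reduced to a one-layer toy model. This is a deliberately simplified sketch (a scalar belief, an identity generative model, a fixed learning rate, all chosen here for illustration): the model predicts the input, the mismatch is the prediction error, and the error drives the belief update until expectation and reality converge.

```python
def predictive_update(belief, observation, learning_rate=0.2, steps=20):
    """One-layer sketch of prediction-error minimization.

    The internal model predicts the input; the prediction error
    (observation minus prediction) updates the belief, shrinking
    the error toward zero over successive steps.
    """
    errors = []
    for _ in range(steps):
        prediction = belief              # identity generative model
        error = observation - prediction  # reality vs. expectation
        belief += learning_rate * error   # update the internal model
        errors.append(abs(error))
    return belief, errors

# A belief of 0.0 confronted with an observation of 1.0:
# the error shrinks geometrically as the model adapts.
belief, errors = predictive_update(0.0, 1.0)
```

A cortical hierarchy stacks many such loops, with each layer predicting the activity of the one below; the single loop here is just the repeated unit.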
Key parallels with LLMs:

- Hierarchical structure → transformer layers vs. cortical layers.
- Incremental generation → token-by-token output vs. phoneme-by-phoneme speech.
- Constraint gating → prefrontal control vs. external rule layers.

Both systems produce incremental, demand-satisfying trajectories through predictive world-models, rather than executing pre-written plans.
Source date (UTC): 2025-09-28 00:34:59 UTC
Original post: https://x.com/i/articles/1972097670773960795