THE VIRTUE OF SMALL MODELS?
Can I steelman this a bit?
1 – The paradigm (dimensions), vocabulary (references), grammar (rules of expression formation), and logic (constraints on available operations) available in math are tiny, and those available in programming are highly constrained.
2 – The same properties are larger in the physical sciences, and far larger still in the behavioral sciences. The properties of language reduce to dimensions whose combinatorics exceed those of any other domain.
3 – So you are measuring small domains with small, internal closure. In other words, you’re claiming that the easiest problems can be reduced to the smallest paradigm, vocabulary, grammar, and logic.
Um… it’s absurdly obvious.
Why are humans so effective at language, behavior, cooperation, and cooperation at scale, yet find mathematics and programming a challenge?
It’s also… absurdly obvious.
4 – Why are small-parameter models better at tiny grammars, and why are large-parameter models better at vast grammars?
It’s also… absurdly obvious:
The number of dimensions captured in every referent; the number of operations (field of potential) available to every referent; and the use of real-world closure instead of internal (set) closure.
My team, my organization, and I work in the ‘hard’ grammars: we have to discover the means of closure available to LLMs. And LLMs can provide that closure only with real-world evidence, not with tests of internal consistency by permutability.
There is no substitute for the relationship between the paradigm (collection of domains), domains (axes of causality), referents in a domain (names of positions in a domain), available transformations (operations), and, most importantly, the means of closure (limits providing tests of equality and inequality) within that paradigm.
As such, all the ‘hard problems’ require surviving adversarial competition by the only means of closure available: demonstrated behavior in reality, under realism, naturalism, and operationalism.
Hence: large models for hard problems of wide causal density and high combinatorics; small models for easy problems of narrow density but high permutability.
Curt Doolittle
Runcible
NLI
Source date (UTC): 2025-09-21 22:14:01 UTC
Original post: https://twitter.com/i/web/status/1969887867557368035