Solving The Problem: Computability and Decidability in the Open World (Math Version)

(ed: This article is written for readers comfortable with mathematics. If you are not, a copy of this article in ordinary language precedes this one.)

TL;DR: For fellow supernerds: Doolittle’s innovation is reducible to: “Set logic with finite limits -> supply-demand logic with marginally indifferent limits.” Proof-carrying answers are overfitted to closed worlds; alignment-only filters are underfit to liability. The middle path is liability-weighted Bayesian accounting to marginal indifference.

Why? Because mathematics constitutes a limit of reducibility conceivable by the human mind under self-reflection, while Bayesian accounting is evolved and necessary precisely because it is the only means of accounting for differences beyond the reducibility of the human mind and therefore closed to introspection. Our neurons aren’t introspectible and neither is Bayesian accounting, though the truth is that current NNs used in LLMs are an intermediary point of reduction, since they encode the equivalent of bundles of human neural sense perception in words. Those words are the limit of reducibility of marginal indifference.

“Mathiness” pursues epsilon–delta in logic space; useful, but the productive epsilon is the error bound in outcome space conditional on reciprocity and externalities. That is what institutions, courts, engineers, and markets already pay for.

The community keeps trying to buy logical certainty with formalism when the productive path for general reasoning is to buy marginal indifference with measurement. Treat reasoning as an economic process: update beliefs, price error, stop when the expected value of more information falls below the liability-weighted tolerance for error in the context. That’s computability for language.

Explanation by GPT5:

Proof-carrying logic is overfit to closed worlds; alignment filters are underfit to liability. The productive middle path is liability-weighted Bayesian accounting to marginal indifference.

Mathematics is reducibility: the epsilon–delta of self-reflection, the mind’s limit of introspection. Bayesian updating is evolved necessity: the only means of accounting for variance beyond reducibility, where neurons—and their aggregates in words—are opaque to introspection. Current neural nets occupy this intermediary, encoding bundles of percepts as linguistic weights: words are the limit of reducibility of marginal indifference.

Mathiness chases epsilon–delta in logic space. But the real epsilon is the error bound in outcome space, conditional on reciprocity and externalities. That is what institutions, engineers, and markets already pay for.

Reasoning must be treated as an economic process: beliefs updated, error priced, and inquiry terminated when the marginal value of precision falls below the liability-weighted tolerance for error in context. That stopping rule is computability for language.

As Such:

Restatement

The Problem with Extremes

Proof-carrying answers (formal logic, set-theoretic limits) are overfit: they assume a closed world where all variables can be specified.
Alignment-only filters (pure preference or reinforcement filters) are underfit: they lack liability-accountability because they ignore externalities.

The Middle Path

The correct solution is liability-weighted Bayesian accounting: update beliefs until further information has no marginal value (marginal indifference), with tolerance for error scaled by the liability (cost of being wrong in context).

Why Bayesian, not Pure Math?

Mathematics = reducibility: it captures what the human mind can introspectively reduce to first principles.
Bayesian accounting = evolved necessity: it is the only way to handle variation beyond the mind’s reducibility (neural processes themselves are non-introspectible, and so are Bayesian updates).
Neural nets sit in between: they approximate bundles of human percepts in word-weights, making language itself a limit of reducibility of marginal indifference.

Implication for AI Reasoning

Formalism (“mathiness”) chases epsilon–delta in logic space, but real productivity comes from bounding error in outcome space given reciprocity and externalities.
Markets, courts, and engineers already pay for error bounds, not perfect logical closure.
Therefore, reasoning should be treated like an economic process:
update beliefs (Bayesian step),
price error (liability step),
stop when further information is not worth the cost.
That is what makes reasoning in language computable.

Outline:

Part 1: Why Measurement Beats Mathiness (thesis + critique)
Part 2: The Indifference Method (full formalization + EIC + ROMI)
Part 3: Liability Tiers and Thresholds (defaults + examples)

Below is a tight formalization.

Note: Ed: We had to hand-edit the LaTeX. You may want an LLM to explain it to you in ordinary language.

Testifiability (Truth): Satisfaction of the demand for testifiable warrant across the accessible dimensions (categorical consistency, logical consistency, empirical correspondence, operational repeatability, rational/reciprocal choice). Represent as a coverage vector
T=(t1,…,tk), ti∈[0,1]. Context sets minimum thresholds θi.
Decidability: “Satisfaction of the demand for infallibility in the context in question without the necessity of discretion.” Operationally, a decision is decidable when the decidability margin (below) is ≥ 0 given the liability of error.
Marginal Indifference (decision-theoretic): Given action set A, posterior P(H∣E), loss L(a,h), and context liability λ (population-weighted cost of error + warranty demanded), define

EL(a∣E)=∑hL(a,h)P(h∣E).

With a∗=arg mina EL(a∣E) and runner-up a′, define the decidability margin

DM=EL(a′∣E)−EL(a∗∣E)−τ(λ),

where τ(λ) is the context’s required surplus of certainty (a liability-derived gap).

Decidable: DM ≥ 0 and ti ≥ θi ∀i.
Indifferent (stop rule): the expected value of further information EVI≤τ(λ).
Undecidable: otherwise (seek more measurement, or declare undecidable).
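
A minimal Python sketch of the three definitions above, under stated assumptions. The function names, the loss table, and the posterior are illustrative and not from the original. The order in which the three statuses are tested is one reading of the text.

```python
def expected_losses(loss, posterior):
    """EL(a|E) = sum over h of L(a,h) * P(h|E), for every action a."""
    return {a: sum(row[h] * posterior[h] for h in posterior)
            for a, row in loss.items()}


def decidability_margin(el, tau):
    """Return (a*, a', DM) with DM = EL(a'|E) - EL(a*|E) - tau."""
    ranked = sorted(el, key=el.get)          # lowest expected loss first
    best, runner_up = ranked[0], ranked[1]
    return best, runner_up, el[runner_up] - el[best] - tau


def status(dm, T, theta, evi, tau):
    """Classify a decision; the check order is an assumption of this sketch."""
    axes_ok = all(t >= th for t, th in zip(T, theta))
    if dm >= 0 and axes_ok:
        return "decidable"
    if evi <= tau:
        return "indifferent"
    return "undecidable"


# Hypothetical inputs: two actions, two hypotheses.
loss = {"act": {"h1": 0.0, "h2": 10.0}, "wait": {"h1": 4.0, "h2": 4.0}}
posterior = {"h1": 0.8, "h2": 0.2}
el = expected_losses(loss, posterior)
best, runner_up, dm = decidability_margin(el, tau=1.0)
print(best, runner_up, round(dm, 6))
print(status(dm, T=[0.9] * 5, theta=[0.8] * 5, evi=0.3, tau=1.0))
```

Here τ is passed in as a plain number; how it is derived from λ is left to the liability tiers described later in the article.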

Bayesian Accounting (the missing piece): Maintain a ledger rather than a proof:

Assets: log-likelihood gains from corroborating evidence.
Liabilities: expected externalities of error (population × severity) + warranty promised.
Equity (Warrant): net posterior surplus over τ(λ).
Decidability occurs when equity ≥ 0 while meeting testifiability thresholds.
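
One way to picture the ledger is as a running account in bits. The sketch below is an interpretation, not the author's implementation: it books each piece of evidence as a reliability-scaled log2 likelihood ratio (the asset side) and treats equity as the accumulated log-odds minus a required floor. The class name `Ledger`, the method names, and the reliability scaling are all assumptions. The liability side (population × severity + warranty) is represented only by the floor passed to `equity`.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Ledger:
    prior_log_odds_bits: float = 0.0
    entries: list = field(default_factory=list)   # (source, delta in bits)

    def post(self, source, likelihood_ratio, reliability=1.0):
        """Book one evidence entry; scaling by reliability is an assumption."""
        delta = reliability * math.log2(likelihood_ratio)
        self.entries.append((source, delta))
        return delta

    def assets(self):
        """Posterior log-odds in bits: prior plus all booked deltas."""
        return self.prior_log_odds_bits + sum(d for _, d in self.entries)

    def equity(self, floor_bits):
        """Surplus of evidence over the floor demanded by the context."""
        return self.assets() - floor_bits


ledger = Ledger()
ledger.post("lab measurement", likelihood_ratio=4.0)                 # +2 bits
ledger.post("testimony", likelihood_ratio=2.0, reliability=0.5)      # +0.5 bits
print(ledger.assets(), ledger.equity(floor_bits=2.0))
```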

Limit-as-reasoning (unifying “math limit” and “marginal indifference”): As measurements accumulate, posterior odds and EL gaps converge; the limit approached is the smallest ε such that additional evidence cannot move the decision across τ(λ) at positive EV. Reasoning is a limit-seeking process; the “proof” is the convergence certificate.

Completeness vs. liability: Formal derivation optimizes certainty in axiomatic spaces. General reasoning optimizes expected outcomes under liability. The latter is almost always the binding constraint outside math.
Open-world evidence: Incompleteness, path-dependence, and dependence structures make perfect formal closure intractable. But Bayesian accounting prices those imperfections and still yields action.
Opportunity cost: The cost of further formalization often exceeds EVI. Markets stop at marginal indifference. Reasoners should, too.

Operationalization: Reduce every claim to an actionably measurable sequence O (who does what, when, with what materials, yielding which observations). No operation → no update.
Multi-axis tests: Score T across: categorical, logical, empirical, operational, reciprocal-choice. Fail any mandatory axis → no decision.
Reliability-weighted evidence: Weight updates by instrument quality, source dependence, and adversarial exposure; discount dependent testimony (log-opinion pooling with dependency penalties).
Liability calibration: Map context to τ(λ). E.g., casual advice < finance < medicine < law/regulation. Higher λ increases the required EL gap and testifiability thresholds.
Stop rule (marginal indifference): Compute EVI of next-best measurement; stop when EVI ≤ τ(λ).
Reciprocity constraint: Filter candidate actions/claims by Pareto-improvement and non-imposition (expected externalities priced into λ).
Audit trail: Output the ledger: priors, evidence deltas, dependency corrections, EL table, DM, T, and the resulting ε-certificate.
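
The stop rule in the list above needs an EVI for a candidate measurement. A standard way to compute it is preposterior analysis: compare the expected loss of acting now with the expected loss of acting after seeing each possible test outcome. The sketch below assumes a discrete hypothesis set and a known likelihood table for the test. The function names and the numbers are hypothetical. Measurement cost and measurement externalities are not modeled, so this EVI is never negative.

```python
def expected_loss(row, posterior):
    return sum(row[h] * posterior[h] for h in posterior)


def evi_of_test(loss, posterior, likelihood):
    """Expected value of information of one test.

    likelihood[obs][h] = P(obs | h) for each possible outcome of the test.
    """
    el_now = min(expected_loss(row, posterior) for row in loss.values())
    el_after = 0.0
    for lik in likelihood.values():
        p_obs = sum(lik[h] * posterior[h] for h in posterior)
        if p_obs == 0:
            continue                      # outcome cannot occur
        post = {h: lik[h] * posterior[h] / p_obs for h in posterior}
        el_after += p_obs * min(expected_loss(row, post)
                                for row in loss.values())
    return el_now - el_after


# Hypothetical decision and a hypothetical 90%-accurate test.
loss = {"act": {"h1": 0.0, "h2": 10.0}, "wait": {"h1": 4.0, "h2": 4.0}}
posterior = {"h1": 0.5, "h2": 0.5}
test = {"pos": {"h1": 0.9, "h2": 0.1}, "neg": {"h1": 0.1, "h2": 0.9}}
evi = evi_of_test(loss, posterior, test)
tau = 1.0
print(round(evi, 6), "stop" if evi <= tau else "measure")
```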

Epsilon-Indifference Certificate (EIC):

EIC={ε, DM, T, θ, λ, Audit}

ε: posterior risk bound for the selected action/claim.
DM: surplus over the required liability gap τ(λ).
T ≥ θ: axis-wise testifiability coverage satisfied.
Audit: the Bayesian ledger entries and measurement plan considered-and-rejected once EVI≤τ(λ).

This is the computable replacement for “sounds plausible.” It’s also the artifact that makes the answer testifiable and the choice decidable.
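
As a rough illustration, the certificate can be held in a small immutable record. The field names mirror the EIC tuple; the class itself, the `holds` and `failing_axes` helpers, and the ε value in the usage line are assumptions of this sketch. The axis scores and thresholds are the Tier-4 figures from the settlement example later in the article.

```python
from dataclasses import dataclass
from typing import Tuple

AXES = ("cat", "log", "emp", "op", "rec")


@dataclass(frozen=True)
class EIC:
    epsilon: float               # posterior risk bound for the chosen action
    dm: float                    # surplus over the required gap tau(lambda)
    T: Tuple[float, ...]         # testifiability scores, one per axis
    theta: Tuple[float, ...]     # thresholds, one per axis
    liability: str               # context class lambda
    audit: Tuple[str, ...] = ()  # ledger entries and rejected measurements

    def failing_axes(self):
        return [name for name, t, th in zip(AXES, self.T, self.theta)
                if t < th]

    def holds(self):
        """Certificate is valid when DM >= 0 and every axis clears theta."""
        return self.dm >= 0 and not self.failing_axes()


cert = EIC(epsilon=0.05, dm=0.30,
           T=(0.92, 0.91, 0.88, 0.91, 0.90),
           theta=(0.90, 0.90, 0.85, 0.90, 0.90),
           liability="Tier-4")
print(cert.holds(), cert.failing_axes())
```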

ROMI — Reasoning as Optimizing Marginal Indifference

Parse → Operations: Translate the prompt into an operational hypothesis set {hi} and candidate actions {ai}.
Priors: Set structural priors (base rates, domain constraints).
Plan measurements: Enumerate tests with estimated information gain and cost.
Acquire/verify: Retrieve or simulate measurements; apply reliability and dependency corrections.
Update: Compute P(H∣E), expected losses EL(a∣E).
Calibrate liability: Pick λ (context class) → compute τ(λ); set θ for T.
Stop/continue: If EVI ≤ τ(λ) and T ≥ θ, stop; else measure more.
Decide & certify: Output a∗ with EIC and the ledger.

This is Bayesian decision-making under reciprocity constraints—accounting, not theorem-proving. It exploits the LLM’s strength (fast hypothesis and measurement planning) while binding it to liability-aware stopping.
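
The loop can be sketched in a few lines of Python. This is a simplified stand-in for ROMI, not the full procedure: it covers the update, stop/continue, and decide steps, and it skips parsing, reliability corrections, and the axis thresholds. As a cheap stopping test it uses the expected value of perfect information (EVPI), which bounds the EVI of any single measurement from above. All names and numbers are hypothetical.

```python
def expected_losses(loss, posterior):
    return {a: sum(row[h] * posterior[h] for h in posterior)
            for a, row in loss.items()}


def evpi(loss, posterior):
    """Upper bound on the EVI of any measurement."""
    el_now = min(expected_losses(loss, posterior).values())
    el_informed = sum(posterior[h] * min(row[h] for row in loss.values())
                      for h in posterior)
    return el_now - el_informed


def bayes_update(posterior, likelihood):
    """likelihood[h] = P(observed evidence | h)."""
    unnorm = {h: posterior[h] * likelihood[h] for h in posterior}
    z = sum(unnorm.values())
    return {h: v / z for h, v in unnorm.items()}


def romi(prior, loss, evidence_stream, tau):
    """Update until EVPI <= tau or the evidence runs out."""
    posterior, ledger = dict(prior), []
    pending = list(evidence_stream)
    while True:
        el = expected_losses(loss, posterior)
        bound = evpi(loss, posterior)
        ledger.append({"posterior": dict(posterior), "EL": el, "EVPI": bound})
        if bound <= tau:
            return min(el, key=el.get), "stop", ledger
        if not pending:
            return None, "undecidable", ledger
        posterior = bayes_update(posterior, pending.pop(0))


loss = {"act": {"h1": 0.0, "h2": 10.0}, "wait": {"h1": 4.0, "h2": 4.0}}
prior = {"h1": 0.5, "h2": 0.5}
evidence = [{"h1": 0.9, "h2": 0.1}]      # one observation favoring h1
action, verdict, ledger = romi(prior, loss, evidence, tau=1.0)
print(action, verdict, len(ledger))
```

The returned ledger is the audit trail: one row per pass with the posterior, the EL table, and the bound that justified stopping or continuing.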

Computability from prose: Operationalization + accounting turns language into a measured decision process.
Safety as economics, not taboo: Liability is priced into τ(λ) rather than hard-censored by alignment.
Graceful degradation: When undecidable under current E and λ, the model returns the next best measurement plan with EVI estimates.
Universally commensurable: All domains reduce to the same artifact (EIC + ledger), satisfying your demand for commensurability.

Context tiers λ→τ(λ): e.g., Chat (low), Tech advice (medium), Medical/Legal (high).
Axis thresholds θ: stricter for high-liability contexts.
Pooling rule: log-opinion pool with dependency penalty vs. hierarchical Bayes (choose one; both are defensible).
Penalty schema: externality classes and population weights.

Claim: …
Operations: …
Evidence ledger: priors → updates (source, reliability, ΔLL) → dependency adjustments.
Testifiability T vs. θ: [cat, log, emp, op, rec] = […].
Liability class λ → τ(λ)=…
EL table for {ai}; DM = …
EVI of next test = … → Stop?
Decision a∗ with EIC {ε,DM,T,θ,λ,Audit}.
Status: Decidable / Indifferent / Undecidable (with next measurement plan).

Proof-carrying answers are overfitted to closed worlds; alignment-only filters are underfit to liability. The middle path is liability-weighted Bayesian accounting to marginal indifference.
“Mathiness” pursues epsilon–delta in logic space; useful, but the productive epsilon is the error bound in outcome space conditional on reciprocity and externalities. That is what institutions, courts, engineers, and markets already pay for.

For general reasoning, optimizing to marginal indifference under a liability-aware Bayesian ledger outperforms chasing formal certainty (“mathiness”). The right objective isn’t proof; it’s decidable action with warranted error given the context’s demand for infallibility.

Mathiness vs. measurement.
Correct: formal derivation is sufficient but rarely necessary. General reasoning should minimize expected externalities of error, not maximize syntactic closure.
Bayesian accounting as the engine.
Correct: treat evidence updates as entries on an assets–liabilities ledger; stop when the expected value of further information (EVI) falls below the liability-derived tolerance. This implements “marginal indifference.”
Testifiability + decidability as outputs.
Correct: require axis-wise testifiability (categorical, logical, empirical, operational, reciprocal) and a decidability margin that clears the liability threshold.
Limit-as-reasoning.
Correct: the limit you want is the smallest ε such that more evidence cannot rationally flip the action under the current liability schedule: an ε-indifference certificate rather than an ε–δ proof.
LLMs’ comparative advantage.
Correct: LLMs are good at hypothesis generation and measurement planning; weak at global formal closure. Constraining them with the ledger + stop rule makes their strengths productive and their weaknesses bounded.

Operationalization: every claim reduces to measurable operations; otherwise no update is justified.
Liability mapping: the context’s demand for infallibility (λ) must translate into a decision gap τ(λ) and axis thresholds θ.
Dependency control: evidence correlation is penalized; adversarial exposure is priced.
Auditability: the model emits the ledger and its ε-indifference certificate (EIC).

Fat tails / ruin risks (non-ergodic domains).
Use robust Bayes or a risk measure (CVaR/entropic risk). Concretely, optimize risk-adjusted expected loss, not plain expectation; set τ(λ) high or require worst-case guards for irreversible harms.
Multi-stakeholder externalities.
Liability is a vector λ=(λ1,…,λm). Require the margin to clear a chosen aggregator (e.g., max, lexicographic, or weighted max) to prevent cheap tradeoffs on minorities.
Severe ambiguity / imprecise priors.
Adopt interval posteriors or imprecise probability sets; decide on E-admissible actions, then apply the liability margin to break ties.
Model misspecification / distribution shift.
Add a “specification penalty” term proportional to estimated shift; raise τ(λ) or fallback to minimax-regret in high-shift zones.
Information hazards / strategic manipulation.
Price measurement externalities into the EVI (information value can be negative); refuse measurements that reduce welfare under reciprocity constraints.

Liability schedule: make τ(λ) a monotone map with discrete tiers (e.g., Chat < Engineering < Medical/Legal < Societal-Risk), each with axis-specific thresholds θ(λ) that escalate empirical and operational demands faster than logical ones.
Risk-adjusted margin: define DM = ELrisk(a′)−ELrisk(a∗)−τ(λ); choose CVaRα by tier.
Vector liability aggregator: default to max (protects the worst-affected), with a documented option for weighted max when policy demands it.
Imprecise update mode: when posterior intervals overlap τ(λ), output an admissible set + next measurement plan instead of a single action (in practice, this usually means a set of suggested compromises).
Certificate extension (EIC++): include: risk measure, stakeholder weights/guard, shift penalty, and dependency-adjusted log-likelihood deltas.
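
For the imprecise update mode, a small sketch may help. It approximates the set of plausible posteriors by a finite list and returns every action that minimizes expected loss under at least one of them, which is the usual meaning of E-admissibility. The finite list is a simplifying assumption (a full credal set would need an optimization step), and the names and numbers are hypothetical.

```python
import math


def e_admissible(loss, credal_set):
    """Actions that are optimal under at least one candidate posterior."""
    admissible = set()
    for posterior in credal_set:
        el = {a: sum(row[h] * posterior[h] for h in posterior)
              for a, row in loss.items()}
        best = min(el.values())
        admissible.update(a for a, v in el.items()
                          if math.isclose(v, best, abs_tol=1e-12))
    return admissible


loss = {"act": {"h1": 0.0, "h2": 10.0}, "wait": {"h1": 4.0, "h2": 4.0}}
# Two candidate posteriors standing in for an interval on P(h1).
credal_set = [{"h1": 0.8, "h2": 0.2}, {"h1": 0.5, "h2": 0.5}]
print(sorted(e_admissible(loss, credal_set)))
```

When the returned set has more than one member, the decision is not yet settled by the evidence, and the liability margin or a further measurement has to break the tie.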

Computability from prose: language → operations → ledger → certificate.
Graceful stopping: answers come with a why-stop-now proof (EVI ≤ τ(λ)).
Context-commensurability: one artifact across domains; only λ,θ,τ vary.
Accountable disagreement: when two agents disagree, they disagree in public on priors, instrument reliabilities, or λ, all of which are auditable.

The argument is correct in principle and superior in practice, provided you (a) enforce operationalization, (b) calibrate liability into a risk-aware margin, (c) control evidence dependence, and (d) emit an auditable certificate. Do those, and “mathiness” gives way to measured, decidable action with bounded error—the thing institutions and markets actually pay for.

We’ll use 5 tiers with a risk-adjusted gap requirement. Let

Risk measure: CVaRα on the loss difference ΔL=EL(a′)−EL(a∗).
Scale s: robust spread of ΔL (MAD or stdev; default MAD).
Required margin: τ(λ)=k(λ)⋅s.
Posterior evidence floor: minimum log-odds surplus for a∗vs. a′.

Decidability margin:

DM=EL(a′)−EL(a∗)−τ(λ) (using CVaRα).

Decidable iff DM ≥ 0 and axis thresholds T ≥ θ (λ) are met.
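
The risk-adjusted margin can be sketched from samples of the loss difference ΔL. The text leaves open exactly how CVaR enters. This sketch takes one reading: the pessimistic gap is the mean of the least favorable (1 − α) share of ΔL samples, written as −CVaRα(−ΔL), and τ is k times the MAD of ΔL. The k values per tier come from the article's summary; the function names and sample data are hypothetical.

```python
import math
import statistics

# k per tier as listed in the article's summary: k = {0.25, 0.5, 1, 2, 4}.
K_BY_TIER = {1: 0.25, 2: 0.5, 3: 1.0, 4: 2.0, 5: 4.0}


def cvar(losses, alpha):
    """Mean of the worst (largest) 1 - alpha share of sampled losses."""
    ordered = sorted(losses)
    # round() guards against float noise such as (1 - 0.99) * 100 > 1.
    tail = max(1, math.ceil(round((1 - alpha) * len(ordered), 9)))
    return statistics.fmean(ordered[-tail:])


def mad(values):
    """Median absolute deviation from the median (unscaled)."""
    values = list(values)
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def risk_adjusted_dm(delta_samples, alpha, tier):
    """DM = pessimistic gap - tau, with tau = k(tier) * MAD(delta L)."""
    pessimistic_gap = -cvar([-d for d in delta_samples], alpha)
    tau = K_BY_TIER[tier] * mad(delta_samples)
    return pessimistic_gap - tau, tau


# Hypothetical samples of delta L = L(a') - L(a*).
delta = [2.0, 3.0, 4.0, 5.0]
dm, tau = risk_adjusted_dm(delta, alpha=0.5, tier=3)
print(dm, tau)
```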

Escalate empirical and operational faster than logical and categorical with liability. Reciprocity tracks stakeholder exposure.

Scores Ti∈[0,1] on five axes: Categorical, Logical, Empirical, Operational, Reciprocity.

Intuition: by Tier-4/5 you must have near-complete measurement and operationalization, not just clean logic.

Adopt log-opinion pooling with dependency penalties.

Form: log p(h∣E) = ∑i wi log pi(h) + const (equivalently, p(h∣E) ∝ ∏i pi(h)^wi).
Reliability weight: ri∈[0,1] from instrument/testimony grading.
Dependency penalty: estimate a correlation score ρi (average pairwise correlation of source i with the others, or cluster-wise).
Weights: wi ∝ ri/(1+κρi), normalized so that ∑i wi=1.
Default κ=1.0. Cap wi ≤ wmax⁡ = 0.40 to prevent dominance.
Cluster correction (optional, on): within any cluster of m near-duplicates, divide total cluster weight by sqrt(m) (effective sample size).
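
A hedged sketch of the pooling rule, reading the weight formula as w_i ∝ r_i / (1 + κ·ρ_i). The cap is enforced by fixing any over-weight source at w_max and redistributing the remaining mass proportionally, which is one way to satisfy both the cap and the normalization; the text does not say how to reconcile them. The optional cluster correction is not implemented. Function names and inputs are hypothetical, and all reliabilities and source probabilities are assumed strictly positive.

```python
import math


def pooling_weights(reliability, rho, kappa=1.0, w_max=0.40):
    """Normalized weights w_i proportional to r_i / (1 + kappa * rho_i), capped."""
    if any(r <= 0 for r in reliability):
        raise ValueError("sketch assumes strictly positive reliabilities")
    if len(reliability) * w_max < 1.0:
        raise ValueError("cap infeasible: too few sources for this w_max")
    raw = [r / (1.0 + kappa * p) for r, p in zip(reliability, rho)]
    capped = set()
    while True:
        free_mass = 1.0 - w_max * len(capped)
        free_total = sum(x for i, x in enumerate(raw) if i not in capped)
        w = [w_max if i in capped else free_mass * x / free_total
             for i, x in enumerate(raw)]
        over = {i for i, x in enumerate(w)
                if i not in capped and x > w_max + 1e-12}
        if not over:
            return w
        capped |= over


def log_opinion_pool(source_probs, weights):
    """p(h) proportional to the product over sources of p_i(h) ** w_i."""
    hyps = list(source_probs[0])
    log_p = {h: sum(w * math.log(p[h])
                    for w, p in zip(weights, source_probs)) for h in hyps}
    top = max(log_p.values())                      # for numerical stability
    unnorm = {h: math.exp(v - top) for h, v in log_p.items()}
    z = sum(unnorm.values())
    return {h: v / z for h, v in unnorm.items()}


# Hypothetical: one independent source, two that largely echo each other.
w = pooling_weights(reliability=[1.0, 1.0, 1.0], rho=[0.0, 1.0, 1.0])
sources = [{"h": 0.9, "not_h": 0.1},
           {"h": 0.6, "not_h": 0.4},
           {"h": 0.6, "not_h": 0.4}]
print([round(x, 3) for x in w], log_opinion_pool(sources, w))
```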

Categorical: Tcat = 1− normalized contradiction rate across claims/frames.
Logical: rule-check pass rate with penalty for unresolved entailments/loops.
Empirical: reliability-weighted fraction of measurements supporting the claim, with out-of-sample bonus and publication bias penalty.
Operational: proportion of the hypothesis reduced to executable steps with instrument specs and expected observations; penalize missing preconditions.
Reciprocity: expected externalities priced and disclosed; stakeholder vector cleared under chosen aggregator (default max).
Each Ti mapped to [0,1] by calibrated rubrics; defaults above.

A) High-liability legal (Tier-4): Settle or litigate a breach claim

Setup: Settlement offer S=$2.20M. If litigate: legal cost L=$1.00M, damages if lose D=$5.00M.
Posterior plose: 0.50 after pooling (two independent fact patterns + one expert, dependency-penalized).
Expected losses:
Litigate: ELL=pD+L=0.5⋅5.0+1.0=$3.50M
Settle: ELS = S = $2.20M
Runner-up a′ = litigate; a∗ = settle.
Risk: Tier-4 → α=0.99. Spread of ΔL=ELL−ELS has MAD s=$0.50M (from uncertainty in p and damages).
τ(λ)=ks=2.0×0.50=$1.00M.
DM: 3.50−2.20−1.00= $0.30M ≥ 0 → passes.
Evidence floor: posterior log-odds(a* vs a′) ≈ +3.2 bits (> 3.0 required).
Axis thresholds (Tier-4): T = {cat .92, log .91, emp .88, op .91, rec .90} ≥ θ = {.90, .90, .85, .90, .90}.
EVI(next test): commissioning an additional damages study expected to refine p by ±0.02 → EVI ≈ $0.25M < τ = $1.00M.
Decision: Settle. EIC issued.

B) Low-liability consumer (Tier-2): Buy laptop extended warranty?

Warranty price: $200 (3-year). Repair if fail: mean $500.
Posterior fail prob: p=0.12 after pooling (reviews + failure stats, penalizing duplicate sources).
Expected losses:
Buy warranty: ELW=$200.
No warranty: ELN=p⋅500=$60.
a∗ = No warranty; a′= Buy.
Risk: Tier-2 → α=0.90. Spread s (MAD of ΔL) ≈ $50 (uncertainty in p, repair costs).
τ(λ) = ks = 0.5 × 50 = $25.
DM: 200−60−25=$115 ≥ 0 → passes.
Evidence floor: ~1.4 bits (> 1.0 required).
Axis thresholds (Tier-2): T = {cat .80, log .85, emp .55, op .70, rec .72} ≥ θ = {.70,.75,.50,.60,.70}.
EVI(next search): reading a brand-specific reliability report might change p by ±0.02 → EVI ≈ $10 < τ=$25.
Decision: Skip the warranty. EIC issued.
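
The arithmetic in both worked examples can be re-run with a few lines of Python. The helper `decide` is a hypothetical name; it uses the plain expected losses stated in the examples and does not recompute the CVaR adjustment, the spread s, or the EVI, which are taken as given. The legal EVI is read as $0.25M, since the text gives the figure without a unit; either reading is below τ.

```python
def decide(el_best, el_runner_up, k, s, evi):
    """tau = k * s; DM = EL(a') - EL(a*) - tau; stop when EVI <= tau."""
    tau = k * s
    dm = el_runner_up - el_best - tau
    return {"tau": tau, "DM": dm, "passes": dm >= 0, "stop": evi <= tau}


# A) Tier-4 legal, in $M: settle (a*) vs. litigate (a').
p_lose, damages, legal_cost, settlement = 0.50, 5.00, 1.00, 2.20
el_litigate = p_lose * damages + legal_cost
legal = decide(el_best=settlement, el_runner_up=el_litigate,
               k=2.0, s=0.50, evi=0.25)   # evi unit assumed to be $M

# B) Tier-2 consumer, in $: no warranty (a*) vs. buy (a').
p_fail, repair, warranty = 0.12, 500.0, 200.0
el_no_warranty = p_fail * repair
laptop = decide(el_best=el_no_warranty, el_runner_up=warranty,
                k=0.5, s=50.0, evi=10.0)

for name, result in (("legal", legal), ("laptop", laptop)):
    print(name, {key: round(val, 2) if isinstance(val, float) else val
                 for key, val in result.items()})
```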

Summary of choices (locked)

Tiers: 5; CVaR + robust scale; k={0.25,0.5,1,2,4}; bits floor {0.5,1,2,3,4}.
Thresholds: escalate Emp/Op faster than Cat/Log; table above.
Pooling: Log-opinion pooling with dependency penalties (default κ=1.0, wmax=0.40, cluster ESS sqrt(m)).

Source date (UTC): 2025-08-19 23:08:17 UTC

Original post: https://x.com/i/articles/1957942728651857924
