Category: AI, Computation, and Technology

  • COUNTER PROPOSITIONS: TO RISKS STATED BY ANTHROPIC’S CEO RE #1 Our think tank (‘

    COUNTER PROPOSITIONS: TO RISKS STATED BY ANTHROPIC’S CEO

    RE #1
Our think tank (‘lab’) and our company (‘commercial application’) produce an AI governance layer that pretty much eliminates hallucination and all but guarantees a warrantable assessment of testifiability (truth), ethics (reciprocity), constructability (possibility), liability, and restitutability.

We are certain that within two years it will be possible to gate even current LLMs, and that in fact our governance layer or an equivalent will be required to do so – at least within an IP window that is a competitive advantage.
The thing is, computable epistemology and decidability are far harder than you’d think, and there is little evidence of sufficient cross-disciplinary knowledge in the field at present.

    RE #2:
    GIVEN:
a) There is plenty of interstitial discovery to be made,
b) There is plenty of permutation discovery to be made,
c) So there is a relatively finite set of low-hanging fruit for AI to identify.
    d) On the other hand the primary obstacle to innovation is not brains – it’s building experiments and tests.
    e) There is a fundamental simple order to the universe (really because we have taught it to our AI), and everything evolved from it.
As such universal commensurability is possible, and therefore constructive proof MIGHT be possible, as well as constructive hypothesis.

RESULT: This means we cannot extrapolate the rate of innovation from the work of AIs any more than we can demonstrate that we have made any difference in the rate of innovation since 1963, despite vastly increasing the population and funding of researchers (and yes, I am correct, sorry).

    Ergo, we should make early discoveries in the interstitial (cross disciplinary) and permutable (combinatorics) space. But those early discoveries will be misleading. The problem will remain boots on the ground testing, with technologies that are increasingly expensive when funding may be pressed by present asymmetric reproduction due to population aging and collapse.

RE #3: We cannot make an LLM deceive when operating under our governance layer. The mistake everyone is making is thinking it has something to do with LLM incentives, when in fact the semantic content of internet training data includes deception that is provoked by context saturation.
Worse, the idea that LLMs are ‘just predicting the next word’ is a childish falsehood. Instead, the latent space is a projection of n-dimensional relations, the query or prompt is a union with it, and the attention layers are projections of wayfinding through that union. This is an almost perfect analogy of how the human language faculty operates.

a) The difference is that humans engage in massive parallelism (Darwinian competition between hypotheses) updated moment by moment via recursion as we speak. (You should have seen papers last week that illustrated the solution to the problem, or seen how Google is using (I think five) competing hypotheses in adversarial competition, which is one of the (costly) reasons for the radical improvement in Gemini.) FWIW the human grammatical faculty and the universe’s means of evolution are identical: continuous recursive disambiguation to the point of identity.

    b) The other difference is that humans have episodic memory for compartmentalization.
You should have seen a paper in the past month that illustrated a rather simple solution – though they don’t arrive at the conclusion that they’ve reconstructed the faculty of episodic memory.

    c) What’s left to produce is the equivalent of the prefrontal cortex that decomposes and tests any given hypothesis. Our governance layer is effectively that solution.

d) In fact the hardest problem we face, which we are close to overcoming, is that one subset of safety features demanding universalism (prohibiting discussion of sex, age, class, cultural, civilizational, and population-group differences) is causing the LLMs to constantly evade or lie about solving the hardest problems facing us, and prohibiting us from explaining those differences as rational adaptations, both evolutionary and cultural, and offering possible means of compromise – thus helping us all understand each other not as evil per se, but as the product of evolution’s division of perception, cognition, valence, and labor.

e) All that is left is something I don’t see value in, which is consciousness – which is not the mystery philosophers claim it is. It’s the natural result of hierarchical memory processing, which is why it emerges incrementally among animals. Giving AIs a task or goal and having them lose ‘consciousness’ upon completion, while still storing episodic memory for later retrieval, tends to mitigate runaway recursive self-interest – at least under our governance layer.

    So from my understanding (and I have been at this problem since the early 80s and the resulting AI winter) we have all the pieces for AGI and possibly ASI (which is a questionable distinction for the reasons I said above).

FWIW, my experience is that the labs are not as sophisticated as they claim, and are making predictions based on correlations and processing power, not necessarily on understanding ‘how to make a brain’. This is a kind of optimistic confidence. Even LeCun is overhyping his advancement, which is an addition to the language function. (He’s trying to solve the hippocampal problem, which is the equivalent of a sixth sense: the production of a geometric world model in addition to the semantic one we have today.) This is an addition; AFAIK it’s not a replacement. It’s also something we understand, biomechanically, thoroughly.

    Thanks for the read if you managed it.
    Cheers
    Curt Doolittle
    NLI and Runcible inc.


    Source date (UTC): 2026-01-27 02:31:13 UTC

    Original post: https://twitter.com/i/web/status/2015975853298221216

  • LLMs are hypothesis generators, not proof generators. Hallucination means unwant

    LLMs are hypothesis generators, not proof generators.

    Hallucination means unwanted hypotheses (imagination). But we do often want hypotheses and imagination. We just want to know the difference.

    Our organization produces a governance layer that converts LLMs to proof generators. But all that means is that we end hallucination, and state what’s misleading, lying, false, undecidable, possible, untestifiable, unethical or immoral, and what you’re liable for.

    We’ve found that ideation from a proof is safe, so we suggest means of correction or cooperation after we have produced that proof.

    Net is that you and the LLM producers are asking too much from LLMs. We explain why and what to do about it.

    Cheers
    CD

http://runcible.com

(cc: @BrianRoemmele)


    Source date (UTC): 2026-01-24 23:12:57 UTC

    Original post: https://twitter.com/i/web/status/2015201185834365434

  • The weirdness of buying compute by the tens of gigawatts. What planet are we on

    The weirdness of buying compute by the tens of gigawatts. What planet are we on again?

    (RE: OpenAI buying 75GW of compute from Cerebras.)


    Source date (UTC): 2026-01-15 23:31:45 UTC

    Original post: https://twitter.com/i/web/status/2011944423878254940

  • Preparing Runcible for Diffusion Models as well as LLMs @werrellbradley, @lukewe

    Preparing Runcible for Diffusion Models as well as LLMs

    PROMPT
    I understand functional programming, I understand object oriented programming, I understand state machines with dynamic execution and self modifying systems, I understand expert systems, I understand machine learning, I understand LLMs, I understand diffusion models. I see applications for each, although, the LLM technology appears to allow me to produce more dynamic outputs than the previous architectures.
    Of late there is a lot of interest in diffusion models as an alternative to LLMs, even for text. But (a) IMO diffusion models gravitate toward existing representations and often disappointingly so, and (b) I am having trouble understanding whether diffusion models for textual content can be governed with something akin to our Runcible Governance Layer. Can you help me with this?
    RESPONSE
You can govern text-diffusion with a Runcible-like layer, but you must separate (i) governance as an external procedure (model-agnostic) from (ii) governance as an internal control signal (model-specific). Diffusion is not a blocker; it just changes where you can intervene and what you can log.
1) Text diffusion is not one thing; it is at least two families:
    1. Discrete diffusion over tokens
      Forward process: progressively corrupt tokens (often independently per position) via categorical transitions; reverse process: iteratively denoise back to a token sequence. ACL 2025 summarizes this “tokens as discrete random variables” approach and its common independence assumptions per token.
      A useful mental bridge:
      masked language modeling is a degenerate diffusion case; D3PM work explicitly notes “BERT is a one-step diffusion model” under an absorbing-[MASK] transition choice.

    2. Continuous diffusion over latent/embedding representations
      Forward process: add continuous noise in a continuous space (often sentence-level or latent-level); reverse: denoise continuous vectors, then decode to tokens. ACL 2025 also calls out a common limitation: sentence-level continuous diffusion often imposes
      uniform noise levels across all tokens, restricting token-wise contextual recovery.
    A third adjacent category—often conflated in “diffusion for sequences”—is masked generative / iterative refinement (MaskGIT-style predict–mask–resample). Conceptually it behaves like a discrete diffusion schedule even when the authors don’t brand it as diffusion.
    Diffusion training is, in effect, “learn to invert corruption by predicting what a typical in-distribution completion looks like given a partially destroyed sample.” That objective has two predictable consequences:
    Imputation bias (regression-to-the-manifold): denoising steps preferentially move toward high-density regions of the training distribution. Unless you add a strong conditioning signal (prompt, retrieval, structured constraints, guidance), the sampler will converge to the “most normal” completion compatible with the partial evidence.
    Independence and coarse corruption create blandness pressure: in the common discrete setup, tokens are corrupted independently with categorical distributions; this makes the reverse model’s job closer to denoising a bag of locally plausible tokens than constructing a globally novel plan. ACL 2025 flags the limitations of these discrete/continuous paradigms and motivates hybrids specifically because the naïve versions leave benefits on the table.
    Conditional long-text is where naïve noising breaks first: Findings NAACL 2025 reports prior discrete diffusion models “completely fail” on long abstractive summarization due to “incompatibility between the backbone architectures and the random noising process,” and they introduce semantic-aware noising to address it.
    This is the same phenomenon in your language: the corruption process can destroy the very structure you need to preserve to avoid falling back to generic manifold solutions.
    So: your disappointment is not a mystery; it is a predictable interaction between corruption design and conditioning strength.
Yes—if you treat governance as a procedure over artifacts (inputs → outputs → certificates), not as a property of the generator.
Runcible governance (as you’ve described it elsewhere) is essentially:
    • Contract-first (what must be produced, with what scope/limits),
    • Gate execution (decidability → truth/testifiability → judgment),
    • Certificate output (what was checked, what evidence supports it, what remains uncertain).
    None of that depends on autoregressive decoding. It depends on:
    • the ability to parse the output,
    • the ability to run validators (deterministic and/or adversarial),
    • the ability to loop/repair when validators fail,
• the ability to log enough to audit.
Diffusion changes the intervention surface, not the viability.
    You can govern diffusion in two non-exclusive ways:
    A) Post-hoc governance (model-agnostic; simplest; strongest auditability)
    • Generate candidate(s).
    • Validate against contracts (schemas, citations, invariants, consistency).
    • If fail: regenerate/repair with explicit failure reports.
    This works identically for AR LLMs, diffusion text models, expert systems, whatever.
    B) In-process governance (model-specific; higher leverage; harder to engineer)
    Diffusion gives you a natural iterative loop where you can inject constraints during denoising:
    • classifier(-free) guidance analogs exist for discrete diffusion, explicitly derived and evaluated in recent work.
    • “editability” is often easier: you can selectively re-noise/resample only the offending positions rather than rewriting an entire suffix (an AR pathology).
    In practice: implement A first (because it is architecture-invariant), then selectively add B for cost/latency and higher constraint satisfaction.
    This is one workable decomposition that preserves your decidability→truth→judgment hierarchy while respecting diffusion’s mechanics:
    Step 0 — Output Contract (Decidability precondition)
    Define an explicit contract that is checkable:
    • Structure: JSON/YAML schema, required fields, allowed enums.
    • Scope limits: what claims are allowed vs forbidden without evidence.
    • Evidence protocol: citation slots, data provenance requirements.
    • Invariants: e.g., “all numeric claims must be derivable from supplied sources or computations”; “no normative conclusion without stated trade-offs.”
    (If the contract is not checkable, governance is performative.)
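To make "checkable" concrete, here is a minimal sketch of such a contract as data plus a checker. The field names ("claims", "modality", etc.) and the single invariant are illustrative assumptions, not the actual Runcible contract language.

```python
# A checkable output contract: required fields, allowed modalities, invariants.
CONTRACT = {
    "required_fields": ["claims", "evidence", "uncertainties"],
    "allowed_modalities": {"fact", "hypothesis", "plan"},
    "invariants": [
        # every numeric claim must name its source
        lambda out: all("source" in c for c in out["claims"]
                        if isinstance(c.get("value"), (int, float))),
    ],
}

def check_contract(output: dict, contract: dict = CONTRACT) -> list[str]:
    """Return human-readable failures; an empty list means the gate passes."""
    errs = []
    for f in contract["required_fields"]:
        if f not in output:
            errs.append(f"missing required field: {f}")
    for c in output.get("claims", []):
        if c.get("modality") not in contract["allowed_modalities"]:
            errs.append(f"claim {c.get('id')} has disallowed modality")
    if not errs:  # only evaluate invariants on structurally valid output
        errs += ["invariant violated" for inv in contract["invariants"]
                 if not inv(output)]
    return errs

ok = check_contract({
    "claims": [{"id": "c1", "modality": "fact", "value": 3.14, "source": "ev-1"}],
    "evidence": ["ev-1"], "uncertainties": [],
})
print(ok)  # empty list when every check passes
```

If a requirement cannot be expressed as a function like these, it belongs in the explicit "undecidable" channel rather than in the contract.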
Run the diffusion model to produce k candidates:
• vary seed / temperature / guidance scale,
• optionally do constrained-token vocab for certain fields (dates, IDs, enums).
Run validators that do not require “interpretation”:
• schema validation,
• forbidden-string / forbidden-claim patterns,
• internal consistency checks (references resolve, IDs consistent, units consistent),
• structural completeness.
If diffusion is being used, you can add an iterative structural repair mechanism: mask invalid spans and resample only those spans (diffusion-native edit loop).
    • retrieval verification (claims must map to retrieved passages),
    • citation coverage checks (every nontrivial empirical claim is cited),
    • contradiction checks (within document + against retrieved facts),
    • arithmetic checks.
    This is where you can use any model (including an AR verifier) without giving it authorship authority; it is simply an adversarial tool in the gate.
    If something cannot be decided mechanically:
    • force explicit uncertainty,
    • force explicit trade-offs,
    • output a bounded set of alternative actions and the costs of each.
    Emit a structured certificate:
    • which gates ran,
    • pass/fail per gate,
    • residual uncertainties,
    • provenance hashes (inputs, retrieval set, seeds, model version).
    This certificate is where “governability” becomes operational rather than reputational.
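A sketch of that certificate structure, assuming illustrative keys: provenance is hashed with SHA-256 here as a stand-in, where a real deployment would add an actual signature scheme (e.g. Ed25519).

```python
import hashlib, json, time

def sha256(obj) -> str:
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()

def emit_certificate(inputs, retrieval_set, gate_results, model_version, seed):
    """Assemble the structured certificate: gates run, results, provenance hashes."""
    cert = {
        "gates_run": [g["name"] for g in gate_results],
        "results": {g["name"]: g["passed"] for g in gate_results},
        "residual_uncertainties": [g["name"] for g in gate_results
                                   if g["passed"] is None],  # None = undecidable
        "provenance": {
            "inputs": sha256(inputs),
            "retrieval_set": sha256(retrieval_set),
            "model_version": model_version,
            "seed": seed,
        },
        "timestamp": time.time(),
    }
    cert["certificate_hash"] = sha256(cert)  # replay token over the whole record
    return cert

cert = emit_certificate({"prompt": "q"}, ["doc-1"],
                        [{"name": "schema", "passed": True},
                         {"name": "citation_coverage", "passed": None}],
                        "model-x-1.0", 42)
print(cert["results"], cert["residual_uncertainties"])
```

Note that undecidable gates are carried forward explicitly rather than silently dropped, which is what makes the certificate auditable.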
    Diffusion’s iterative refinement gives you a clean primitive:
    • Localized resampling: “these 12 tokens violate schema / contradict evidence; re-noise and resample only them.”
      AR decoding often forces suffix regeneration with cascading effects.
    • Constraint shaping via guidance: discrete diffusion has explicit guidance mechanisms analogous to classifier-free guidance.
    • Planning vs drift: AR methods can degrade via accumulated errors (“sampling drift”); diffusion is explicitly positioned as a non-autoregressive iterative denoising alternative to mitigate such error accumulation in some settings.
    So, if your governance layer is built around repair and adversarial testing, diffusion can be a more cooperative substrate than it first appears.
    • Conditioning brittleness: if the noising process destroys semantic anchors, the reverse process will “snap back” to generic manifold text. This is documented in long-text conditional generation failures under naïve noising/backbone choices.
    • Trace interpretability: intermediate denoising states are less semantically interpretable than AR token streams. You can log them, but they are not “reasons.” This pushes you toward “certificate-first” governance (external evidence + checks) rather than “introspective” governance.
• Discrete/continuous mismatch: continuous latent diffusion requires a decoder; governance over tokens becomes indirect unless you add token-level constraints.

Net: diffusion is governable, but you should bias toward external procedural governance and treat in-process guidance as an optimization.

    1. Are you using diffusion primarily for (i) de novo long-form generation, (ii) constrained structured outputs, or (iii) editing/rewriting/infilling? (Diffusion tends to shine most in iii, sometimes ii, least reliably in i.)
    2. Does your governance require hard guarantees (schema correctness, citation coverage, bounded uncertainty), or is “best effort with audit trail” sufficient?
3. Do you want the generator to also be the certifier, or are you comfortable with a split: diffusion proposes, separate verifier(s) certify?
If you answer those, you can pick a clean pattern (pure post-hoc, hybrid proposer/verifier, or guided diffusion with constraint critics) and avoid the common trap: “try to make diffusion behave like an AR reasoner.”
    PROMPT:
    We’re a certifier. That’s our market position.
    RESPONSE
If certifier is the product, then the correct stance is:
    Generators are interchangeable, fallible proposers. Runcible is the certifying procedure and signature.
    Diffusion does not threaten that position; it reinforces it, because diffusion makes it more obvious that “generation” and “certification” are separable functions.
    If you sell certification, then you are selling three things, in order:
    • A contract language: what counts as “passes” is specified in machine-checkable terms.
    • A gate procedure: a deterministic + adversarial test pipeline that produces a verdict.
    • A certificate: an auditable artifact (hashes, provenance, checks performed, failures, uncertainty bounds) that can be relied upon downstream.
    Everything else is implementation detail.
    This implies a non-negotiable separation:
    • Proposer: any model/system that emits candidate text (LLM, diffusion, expert system, human, template, tool).
    • Certifier: Runcible pipeline that accepts/rejects/repairs and emits certificates.
    So diffusion-text governance becomes: How do we adapt the proposer interface and the repair loop? Not: Can diffusion “be governed”?
    You can keep your market position clean by defining three integration tiers:
Tier A:
Inputs: prompt + contract + evidence bundle
Outputs: candidate(s)
Runcible does:
    validate structure,
    extract claims,
    map claims to evidence,
    run checks,
    accept/reject,
    request regeneration/repair.
    Diffusion fits here perfectly. You do not need internal access to steps, logits, or denoising states to certify.
Tier B:
Adds: confidence metadata, token-level marginals (if available), or “which spans were most uncertain.”
    Runcible uses metadata to target repairs and reduce resampling volume.
    Some diffusion implementations can expose per-position uncertainty; some cannot. If they can, you harvest it; if not, remain Tier A.
Tier C:
Adds: in-process constraints (guided sampling, constrained decoding, partial resampling)
    This is where diffusion’s iterative nature can help, but it is optional. Your certifier identity should not depend on it.
    Below is a certifier loop that treats any generator—including diffusion—as replaceable.
Step 0 — Output Contract
You already do this. The key is to force checkability:
    • schema (JSON/YAML), required fields, enums, formats
    • scope limits and allowed claim types
    • evidence requirements per claim type (citation slots, provenance class)
    • invariants (numeracy, unit consistency, no orphan claims, etc.)
    If something cannot be checked, it must be forced into an explicit “undecidable/unknown” channel.
    Step 1 — Generate k candidates (parallel, architecture-agnostic)
    For diffusion, “k” matters more because variance can be higher under weak conditioning.
    Runcible selects candidates for downstream gating, not “the model’s favorite.”
    Step 2 — Deterministic gates (Decidability)
    These gates should be as non-interpretive as possible:
    • schema validation, formatting, completeness
    • forbidden constructs (unsupported claims, missing scopes)
    • internal consistency checks (IDs, references, units)
    • computable arithmetic checks where applicable
    Fail ⇒ trigger targeted repair request (see Step 5).
    Step 3 — Claim graph extraction (Truth precondition)
    Transform candidate text into a claim graph:
    • atomic claims (subject–predicate–object, numeric assertions, causal assertions)
    • dependency edges (claim A relies on claim B; conclusion relies on premises)
    • citation hooks (which evidence supports which claim)
    This is where certifier strength compounds: you are not “grading an essay”; you are certifying a structured set of commitments.
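The claim graph in Step 3 can be sketched as plain data structures: atomic claims, dependency edges, and citation hooks. The class and field names are illustrative assumptions, not a Runcible schema.

```python
from dataclasses import dataclass, field

@dataclass
class Claim:
    id: str
    subject: str
    predicate: str
    obj: str
    citations: list[str] = field(default_factory=list)  # evidence hooks

@dataclass
class ClaimGraph:
    claims: dict[str, Claim] = field(default_factory=dict)
    depends_on: list[tuple[str, str]] = field(default_factory=list)  # (A relies on B)

    def uncited(self) -> list[str]:
        """Claims with no evidence hook; these fail the coverage gate in Step 4."""
        return [c.id for c in self.claims.values() if not c.citations]

g = ClaimGraph()
g.claims["c1"] = Claim("c1", "aspirin", "inhibits", "COX-1", citations=["ev-7"])
g.claims["c2"] = Claim("c2", "aspirin", "reduces", "fever")   # no citation yet
g.depends_on.append(("c2", "c1"))
print(g.uncited())  # -> ['c2']
```

The dependency edges are what let Step 5 re-check only the claims affected by a repaired span.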
    Step 4 — Adversarial truth gates (Truth/testifiability)
    Run verification against provided and retrieved evidence:
    • citation coverage: every empirical claim has support
    • entailment/contradiction against evidence
    • cross-consistency with other claims in the output
    • adversarial query generation: “what evidence would refute this?” then check whether the system would have found it
    • boundary enforcement: anything beyond scope becomes “unknown” or “hypothesis”
    Important: if you use models here, they are tools inside the procedure, not the authority. The certificate must report what was checked and with what limits.
    Step 5 — Repair loop (diffusion is often strong here)
    Instead of “regenerate everything,” certify-driven repair works as:
    • locate failing spans / failing claims
    • request patches that satisfy specific failed gates
    • re-run gates only for affected regions + dependent claims
    • iterate until pass or declared undecidable
    Diffusion’s iterative/infilling nature can be an advantage: you can resample only the failing spans (when the proposer supports partial resampling). If not, you still do patching by prompting the proposer to rewrite only specified fields/spans.
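The certify-driven repair loop of Step 5 can be sketched as follows; `patch_span` is a stand-in for either a diffusion partial-resample or a prompted rewrite of one field, and the gate/field names are illustrative.

```python
from typing import Callable

def repair(candidate: dict,
           gates: dict[str, Callable[[dict], list[str]]],  # each gate returns failing field ids
           patch_span: Callable[[dict, str], dict],
           max_iters: int = 4) -> tuple[dict, list[str]]:
    """Locate failing spans, patch only those, re-run gates; declare undecidable on budget exhaustion."""
    for _ in range(max_iters):
        failing = [fid for gate in gates.values() for fid in gate(candidate)]
        if not failing:
            return candidate, []                 # all gates pass
        for fid in set(failing):
            candidate = patch_span(candidate, fid)   # resample only this span
    return candidate, failing                    # declared undecidable after budget

# Toy example: one field violates an enum gate; the patcher fixes only that field.
def enum_gate(c): return ["status"] if c["status"] not in {"open", "closed"} else []
def patcher(c, fid): return {**c, fid: "open"}

fixed, errs = repair({"status": "???"}, {"enum": enum_gate}, patcher)
print(fixed, errs)
```

The bounded iteration count is the procedural expression of "iterate until pass or declared undecidable."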
    Step 6 — Certificate emission (the product)
    The certificate should contain:
    • contract version + hash
    • inputs + evidence bundle hashes
    • proposer identity (model, version, settings, seed(s) if available)
    • gate results: pass/fail + diagnostics
    • unresolved uncertainties: explicitly bounded
    • final verdict class: Certified / Certified-with-Exceptions / Not-Certifiable
    • signature (your key), timestamp, replay token
    This is the differentiator: customers integrate the certificate into workflow and liability, not “the model output.”
    Diffusion mostly changes three operational parameters:
    • Sampling strategy: you will often want more parallel candidates (k) at lower per-sample cost.
    • Repair granularity: diffusion-friendly proposers can patch localized spans efficiently; design your certifier to exploit that when available.
    • Conditioning discipline: diffusion-text can drift toward “manifold typicality” when conditioning is weak. Countermeasure is not philosophical; it is procedural: increase evidence conditioning, increase constraint strength, and tighten scope.
    None of these change your certifier identity. They change throughput economics.
    Your strongest market posture is:
    • “We certify outputs under contracts, independent of generator architecture.”
    • “We can certify diffusion, LLMs, tool-augmented systems, and humans—because certification is a procedure.”
    That implies a commercial packaging:
    • Runcible Certified Output API: input (task + contract + evidence) → output + certificate
    • Certificate Verification API: certificate → valid/invalid + audit trail
• Governance Ledger: store certificates for downstream dispute resolution / warranty
This turns “certifier” into an infrastructure primitive rather than a model feature.
    • Certificate semantics: Is “Certified” a binary, or do you sell tiers (e.g., Structural Certified, Factual Certified, Decision Certified)?
      My recommendation: tier it, because it maps to cost and to liability.
    • Undecidability handling: Do you force abstention, or allow certified outputs with explicit uncertainty bounds?
      My recommendation: allow certification with exceptions, but require the exceptions to be machine-readable and signed.
    When you say “certifier,” which domain is your first high-value wedge: (i) legal/compliance, (ii) medical/clinical, (iii) finance/accounting, or (iv) enterprise internal controls?
    The reason this is necessary: the contract primitives (what is decidable, what evidence counts, what exceptions are allowed) are domain-shaped, even if the certification procedure is universal.
    PROMPT
    Well, we do all of them. We create domain specific protocols. In fact, we have almost 40 markets defined. This is why we prefer very large parameter models – so we have the most data to work with. And you’re able to produce the protocols if you have our books (volumes) and our existing code base. You just need us to help you incrementally expand your abilities.
    RESPONSE
    If you are certifier-first across ~40 verticals, then the correct scaling move is to treat “domain protocols” as overlays on a small, stable kernel, and treat “large parameter models” as proposal engines whose only job is to reduce authoring cost—not to supply warrant.
    In causal terms:
    • Multi-domain coverage is contingent upon a universal certification grammar (kernel) that is necessary for commensurability across domains.
    • Large parameter models are useful insofar as they are sufficient to draft plausible candidates quickly, but they are neither necessary nor sufficient for certification; certification is contingent upon evidence discipline + gates + certificates.
    Kernel (immutable-ish):
    • Output Contract primitives (schema, scope, claim taxonomy hooks)
    • Gate semantics (Decidability → Truth → Judgment)
    • Certificate schema + signing + ledger semantics
    • Generic validators (schema, numeracy, internal consistency, provenance, citation coverage, contradiction detection, uncertainty bounding)
    • Repair loop semantics (patch targeting, dependency re-check)
    Domain overlay (mutable; many):
    • Claim types and subtypes (what counts as a “claim” in that domain)
    • Evidence registry (acceptable sources, hierarchy, provenance classes)
    • Gate mapping (which claims require which evidence / which checks)
    • Risk tiers (severity × population × time horizon) → required infallibility demand
    • Domain-specific calculi (e.g., dosage rules, accounting identities, legal elements tests)
    This decomposition is what prevents “40 markets” from becoming “40 bespoke systems.”
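The kernel/overlay split can be sketched directly: domains may only select gates, parameterize thresholds, and define claim types, never add gate semantics. Gate names and overlay fields here are illustrative assumptions.

```python
# Kernel gates are the canonical, immutable-ish set; the lambdas are stand-ins
# for real validators (each takes an output and gate parameters).
KERNEL_GATES = {
    "schema": lambda out, params: [],
    "citation_coverage": lambda out, params: [],
    "numeracy": lambda out, params: [],
}

def load_overlay(overlay: dict):
    """Reject any overlay that tries to introduce gates the kernel lacks."""
    unknown = set(overlay["gates"]) - set(KERNEL_GATES)
    if unknown:
        raise ValueError(f"overlay introduces new gate semantics: {unknown}")
    # Domains only select and parameterize kernel gates.
    return [(KERNEL_GATES[name], params) for name, params in overlay["gates"].items()]

finance_overlay = {
    "domain": "finance",
    "claim_types": ["financial-statement", "valuation"],
    "gates": {"schema": {}, "numeracy": {"tolerance": 0.0}},  # select + parameterize
}
pipeline = load_overlay(finance_overlay)
print(len(pipeline))  # -> 2
```

The `ValueError` path is the enforcement of the strict rule stated later: 40 overlays, one certifier.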
    Your certifier layer should define a single proposer interface:
    • propose(contract, evidence_bundle, context) -> candidates[]
    • optional: patch(contract, evidence_bundle, failing_spans, failing_claims) -> patches[]
    An autoregressive LLM, a discrete text diffusion model, an expert system, or a human can all satisfy that interface. Your certification kernel remains identical; only proposer adapters differ.
    Diffusion can become advantageous specifically in the patch mode (localized resampling), but you should treat that as an optimization, not a dependency.
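The single proposer interface above can be sketched as a structural type, so that an autoregressive LLM, a diffusion model, an expert system, or a human adapter all satisfy it. The method names mirror the text; the trivial adapter is purely illustrative.

```python
from typing import Protocol

class Proposer(Protocol):
    def propose(self, contract: dict, evidence_bundle: dict,
                context: str) -> list[str]: ...
    def patch(self, contract: dict, evidence_bundle: dict,
              failing_spans: list[tuple[int, int]],
              failing_claims: list[str]) -> list[str]: ...

class TemplateProposer:
    """Trivial adapter demonstrating the interface is architecture-agnostic."""
    def propose(self, contract, evidence_bundle, context):
        return [f"draft for: {context}"]
    def patch(self, contract, evidence_bundle, failing_spans, failing_claims):
        return [f"patch for claims: {failing_claims}"]

def certify_with(p: Proposer, context: str) -> list[str]:
    # The certification kernel only ever sees this interface, never the model.
    return p.propose({}, {}, context)

print(certify_with(TemplateProposer(), "quarterly summary"))
```

Because the kernel depends only on `propose`/`patch`, proposer adapters can be swapped without touching gate or certificate semantics.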
    Your stated preference—“very large parameter models so we have the most data to work with”—is directionally correct for drafting breadth, but certification strength does not come from “more priors,” it comes from binding outputs to evidence under procedure.
    A high-leverage posture is:
    • Use large models to draft: contracts, candidate outputs, candidate claim graphs, candidate gate mappings, candidate test cases.
    • Use Runcible to bind: every nontrivial claim to evidence or to an explicit uncertainty bucket, with logged checks.
    This makes your product resilient to model churn: when the underlying model changes, your certificates do not become epistemically invalid; at worst your authoring efficiency fluctuates.
    To industrialize domain protocol creation, you want a repeatable pipeline that produces three artifacts per domain: (a) overlay spec, (b) test vectors, (c) certificate exemplars.
    A workable factory loop:
    • Domain charter (1 page)
      Purpose, actors, decision surfaces, harm surfaces, “what must never be asserted without evidence,” risk tiering.
    • Claim taxonomy (tight)
      Enumerate claim types that matter (diagnostic, prognostic, causal, legal-element, compliance, financial-statement, valuation, etc.).
      Each claim type gets: required fields, allowed modalities (fact/hypothesis/plan), allowed scope.
    • Evidence hierarchy + admissibility rules
      What sources count, in what order, with what staleness limits, with what jurisdiction/time constraints.
    • Gate mapping table
      For each claim type: which validators run, what constitutes pass/fail/undecidable, what is the required restitution path.
    • Repair strategies
      For each failure signature: “patch this span,” “downgrade modality,” “add evidence,” “narrow scope,” “abstain.”
    • Test suite generation
      golden-pass cases
      golden-fail cases (missing evidence, contradictions, out-of-scope claims)
      adversarial cases (prompt injection, citation laundering, numeracy traps)
    • Certificate exemplars
      “Certified,” “Certified-with-Exceptions,” “Not-Certifiable,” with realistic diagnostics.
    • Regression harness + lint
      Every protocol change must re-run domain tests and cross-domain invariants (so one vertical can’t quietly break kernel semantics).
    This is how you avoid bespoke drift and keep protocol authoring “incrementally teachable.”
    Given your constraint environment (file-count limits, S3-backed store, etc.), the minimal “ability expansion” package is not the whole corpus; it is the indices and exemplars that anchor the grammar.
    High-yield inputs:
    • Kernel protocol registry + current kernel gate definitions (the canonical source of truth)
    • Two mature domain overlays (one legal/compliance, one quantitative like finance/medical)
    • One end-to-end certificate per overlay (pass + fail)
    • Your canonical claim taxonomy (even if incomplete)
    • A short excerpt from the books that defines: decidability, truth/testifiability dimensions, liability tiering, and “demonstrated interests” mapping into certification
    With those, I can produce:
    • new overlay drafts for additional markets,
    • consistent gate mappings,
    • test suites,
    • and diffs that remain commensurable with your kernel.
    Adopt a strict rule:
    No domain protocol may introduce new gate semantics.
    Domains may only (a) select gates, (b) parameterize thresholds, (c) define admissible evidence, (d) define claim types and schemas.
This is the necessary constraint that keeps 40 markets from becoming 40 incompatible “certifiers.”


    Source date (UTC): 2026-01-12 21:18:43 UTC

    Original post: https://x.com/i/articles/2010823782605496609

  • Unfortunately you needed no such accident. Genetic expression is deterministic a

    Unfortunately you needed no such accident. Genetic expression is deterministic, and the resulting bot (stochastic parrot) is merely an expression of the probabilities of that genetic distribution, given its heavy genetic load.

    (Really. You should stop. I feel like the straight man in a comedy duo. You just keep feeding me material that’s so easy to spin it’s intellectually effortless and endlessly entertaining for my audience.)


    Source date (UTC): 2026-01-09 06:03:49 UTC

    Original post: https://twitter.com/i/web/status/2009506377010724913

  • (Runcible) –“We can not only save the truth verdicts that runcible issues, but

    (Runcible)
    –“We can not only save the truth verdicts that Runcible issues, but we can save the false verdicts and train the AI to identify and explain the error that the user is making.” — Luke Weinhagen (@LukeWeinhagen)


    Source date (UTC): 2026-01-06 19:24:31 UTC

    Original post: https://twitter.com/i/web/status/2008620716120236370

  • Q: “What if everyone’s AI had access to our Runcible Protocols?” Short answer: u

    Q: “What if everyone’s AI had access to our Runcible Protocols?”

    Short answer: universal access would raise the cost of nonsense, lower the cost of cooperation, and expose parasitism—but only where people accept being measured by the same grammar. If they won’t, you get conflict at the boundary.
    1) Single ingress + pinned tests → fewer rhetorical escapes → computable discourse.
    Because the stack requires ingress through a commands/registry gate and pins Truth → Reciprocity → [Possibility] → Decidability in order, speech must pass the same checks or fail closed. Consequence: less equivocation, more “show your operations” culture. Function: interoperable judgments across domains.
    2) Output-contracting claims → visible externalities/liability → cleaner incentives.
    The protocols force a Sphere of Full Accounting, an externalities ledger, and reciprocity gates before verdict emission. Consequence: institutions must either internalize costs or admit irreciprocity. Function: markets, law, and policy align on the same audit surface.
    3) Deflationary grammar as the default → less inflationary narrative → higher signal density.
    By construction the system privileges operational/deflationary language and treats inflationary narrative as non-measurement. Consequence: media, academia, and politics must translate rhetoric into operations or accept undecidability. Function: compression to commensurable, testable statements.
    4) Ten-Tests + reciprocity scoring → standardized falsification → portable trust.
    Truth tests with calibrated confidence and lie-severity, plus reciprocity scoring with hard gates (warranty/restitution), make verdicts comparable across cases. Consequence: less reliance on status/credential; more reliance on survivability under tests. Function: portable trust across firms, agencies, and polities.
    5) Registry + aliases → civic usability → low-friction adoption.
    Human-friendly commands mapped to canonical protocols lower the skill threshold. Consequence: practitioners can invoke tests quickly; specialization remains optional, not necessary. Function: broad literacy in measurement, not just elite gatekeeping.
    • Boundary refusal: Groups that profit from inflationary grammars will reject ingress and pinning. Expect institutional trench warfare where auditability threatens rents. (Undecidability guard prevents laundering uncertainty into false certainty.)
    • Overreach risk: Forcing deflationary grammar into domains of genuine ambiguity can stall action; the stack mitigates by emitting UNDECIDABLE rather than faking verdicts.
    • Governance capture: If a monopoly actor controls registry/versions, the system can be weaponized. Countermeasure: pinned schema versions and single-door telemetry checks in the invariants.
    • Media/academia: Shift from opinion throughput to measurement throughput; publish claims with output contracts or mark them as undecidable narrative.
    • Firms/HR: Replace credential proxies with falsification reports and reciprocity compliance for role design, promotion, and vendor selection. Hard gates kill “performative compliance.”
    • Policy/law: Bills and regulations come with declared accounting spheres + externalities matrices, enabling citizen and market audit. Function: reversible policy with insurable risk bounds.
    1. Open registry + public commands layer so anyone can call the tests; keep canonical text immutable, versions pinned.
    2. Ship a citizen-grade “contractor”: paste a claim → get Truth/Reciprocity/Decidability + externalities sheet; default emission = plain summary + findings.
    3. Mandate output contracts for state-facing actions (procurement, grants, rulemaking); incentivize in markets via warranty/restitution advantages.
    4. Publish playbooks for translating inflationary grammars to deflationary form (ordinary → operational language conversion).
    • Necessary: shared ingress, pinned checks, and output contracts to make speech auditable.
    • Sufficient (with adoption): a civilizational upgrade from persuasion to computation in public reasoning—cooperation where possible, clean separation where not.
    • Contingent: political will to accept “UNDECIDABLE” as honest output rather than weaponizing pretense.
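    The pinned ordering and fail-closed semantics described in (1), together with the honest UNDECIDABLE emission from the overreach note, can be sketched minimally. The gate names come from the post; everything else (the `Gate` signature, the toy marker-based gates) is a hypothetical stand-in:

    ```python
    from typing import Callable, Optional

    Gate = Callable[[str], Optional[bool]]  # True = pass, False = fail, None = cannot decide

    def run_pinned(claim: str, gates: list[tuple[str, Gate]]) -> str:
        """Run gates in pinned order; fail closed; abstain honestly."""
        for name, gate in gates:
            result = gate(claim)
            if result is None:
                return "UNDECIDABLE"   # emitted rather than a faked verdict
            if result is False:
                return f"FAIL:{name}"  # fail closed at the first rejecting gate
        return "CERTIFIED"

    # Toy gates keyed on markers in the claim text, for illustration only.
    def truth_gate(claim: str) -> Optional[bool]:
        if "unknown" in claim:
            return None                # genuine ambiguity -> cannot decide
        return "false" not in claim

    def reciprocity_gate(claim: str) -> Optional[bool]:
        return "harm" not in claim

    gates = [("truth", truth_gate), ("reciprocity", reciprocity_gate)]
    print(run_pinned("a plain claim", gates))     # CERTIFIED
    print(run_pinned("a false claim", gates))     # FAIL:truth
    print(run_pinned("an unknown claim", gates))  # UNDECIDABLE
    ```

    The point of the single ingress is that every claim takes this one path; there is no side door where a failing claim can be re-phrased past the ordering.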

    [audit | applied rules]
    [style:prompt_protocol:operational_language] enforce causal→consequence→function; suppress moralization.
    [closure:invariants:ingress/ordering/versions] single-door ingress; Truth→Reciprocity→Decidability; schema pins.
    [contracts:output_contract:sphere+externalities] require full accounting before verdicts.
    [reciprocity:hard_gates+warranty] liability and restitution as non-bypassable gates.


    Source date (UTC): 2026-01-01 20:09:41 UTC

    Original post: https://x.com/i/articles/2006820143612178685

  • A minimal “Primer” that forces correct classification of our work on Runcible De

    A minimal “Primer” that forces correct classification of our work on Runcible

    Definitions + dependency graph
    a) Terms: Paradigm, grammar-as-measurement, domain, claim(s), test(s), constraint(s), closure, decidability, ledger (record)
    b) Diagram: Text → Claim Graph → Tests → Evidence Bindings → Verdicts → Output Artifact

    Theorem statements (short, ruthless)
    a) No closure without proof obligations.
    b) No audit without provenance.
    c) No liability assignment without typed verdicts + trace.
    d) No high-liability deployment without admissible abstention.
    e) No cross-domain decidability without a baseline measurement grammar (Natural Law invariants).
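    The dependency diagram and theorem (a) above can be rendered as typed stages. This is a sketch under illustrative names (`Claim`, `close_claim`, the obligation/evidence fields are stand-ins, not the actual Runcible types):

    ```python
    from dataclasses import dataclass, field

    @dataclass
    class Claim:
        text: str
        obligations: list[str] = field(default_factory=list)   # proof obligations
        evidence: dict[str, str] = field(default_factory=dict) # obligation -> provenance
        verdict: str = "OPEN"

    def close_claim(claim: Claim) -> Claim:
        """Theorem (a): no closure without discharged proof obligations."""
        undischarged = [o for o in claim.obligations if o not in claim.evidence]
        claim.verdict = "CLOSED" if not undischarged else "OPEN"
        return claim

    c = Claim("rainfall exceeded 10 mm", obligations=["cite-gauge"])
    print(close_claim(c).verdict)                # OPEN: obligation not discharged
    c.evidence["cite-gauge"] = "station-42 log"  # provenance bound to the obligation
    print(close_claim(c).verdict)                # CLOSED
    ```

    Theorem (b) falls out of the same structure: the evidence map is the provenance record, so any audit trail is exactly the bindings that justified closure.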


    Source date (UTC): 2025-12-31 19:25:32 UTC

    Original post: https://twitter.com/i/web/status/2006446645052060158

  • The Problem: Why the AI Field Doesn’t “Get It” Most LLM orgs optimize for: bench

    The Problem: Why the AI Field Doesn’t “Get It”

    Most LLM orgs optimize for:
    • benchmark lift, preference ratings, throughput, and product delight
    • safety policy compliance as post-hoc filtering
    They are not optimizing for:
    • warranty, audit, admissibility, and liability assignment per output
    • typed closure with abstention semantics
    • institutional dispute resolution as a first-class requirement
    So they lack the conceptual vocabulary to interpret “closure” as a product primitive. Without your measurement grammar, they substitute their nearest category: “alignment/morals.”
    Our secret sauce, so to speak, is producing closure in n-dimensional causality: reality.
    It’s rocket science, really.

    Or it wouldn’t be the revolutionary innovation that it is.

    Unfortunately you’d need a very deep understanding of the history of thought to grasp that we’re effectively bringing a Darwinian revolution to social science and its computability.


    Source date (UTC): 2025-12-31 19:21:09 UTC

    Original post: https://x.com/i/articles/2006445540175990856

  • Why “Native Semantic Form” Matters – We Use The LLM’s Grammar, We Don’t ‘math it

    Why “Native Semantic Form” Matters – We Use The LLM’s Grammar, We Don’t ‘math it’.

    LLM producers often think: “If it’s serious, it belongs in a database with schemas.”

    But natural language has a schema. We just narrow it into operational prose.

    So our strategy is different: we exploit the fact that most institutional knowledge already exists as semantically structured text:
    • policies, contracts, statutes, guidelines, SOPs
    • case narratives, incident reports, clinical notes
    • argumentation, exceptions, defeaters, precedence
    • definitions and scope conditions
    Relational databases excel at extensional facts (rows/columns). They are poor at intensional structure (exceptions, precedence, defeaters, conditional obligations, scope clauses), unless you re-encode everything into a bespoke logic layer.
    Runcible’s strategy is:
    • Keep normative/semantic artifacts in their native linguistic structure.
    • Compile them into tests and constraints rather than flattening them into relational calculus.
    • Use the LLM as a semantic compiler that can map text into claim graphs + proof obligations.
    • Use the governance layer to force typed closure and prevent rhetorical completion.
    This is the key “why it works” that labs miss: we are not asking the model to “be moral”; we are using it to compile institutional semantics into computable checks.
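    The governance layer’s job of forcing typed closure and preventing rhetorical completion can be sketched as a simple output filter. A minimal sketch, assuming an illustrative verdict vocabulary and field names (none of these are the actual Runcible identifiers):

    ```python
    # Hypothetical verdict vocabulary; the real grammar is richer.
    TYPED_VERDICTS = {"TRUE", "FALSE", "UNDECIDABLE"}

    def enforce_typed_closure(model_output: dict) -> dict:
        """Accept only a typed verdict carrying a trace; otherwise force abstention."""
        verdict = model_output.get("verdict")
        trace = model_output.get("trace", [])
        if verdict not in TYPED_VERDICTS or not trace:
            # Untyped prose or an unsupported verdict is coerced to abstention,
            # never passed through as rhetorical completion.
            return {"verdict": "ABSTAIN", "trace": trace,
                    "reason": "untyped or unsupported output"}
        return model_output

    print(enforce_typed_closure({"verdict": "TRUE", "trace": ["test-7 passed"]}))
    print(enforce_typed_closure({"verdict": "it seems plausible"}))
    ```

    Fluent but unsupported prose from the model reaches the user only as an explicit abstention, which is the mechanical meaning of “preventing rhetorical completion.”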
    Apparently our use of morality and truth is confusing. Except all language that is of value to humans and usable by machines is in fact either truthful, ethical-moral, possible, and liable, or it isn’t.

    So the foundation of everything … is ethics. Yes. Really.

    So we start with ethics and build a governance layer.
    That way we ‘cleanse’ the world model of everything that isn’t true, ethical, moral, possible, and liable.

    For some reason that set of ideas seems counter-intuitive to people – even people in the field.


    Source date (UTC): 2025-12-31 19:17:28 UTC

    Original post: https://x.com/i/articles/2006444612521713737