-
Discrete diffusion over tokens
Forward process: progressively corrupt tokens (often independently per position) via categorical transitions; reverse process: iteratively denoise back to a token sequence. ACL 2025 summarizes this “tokens as discrete random variables” approach and its common per-token independence assumptions.
A useful mental bridge: masked language modeling is a degenerate diffusion case; D3PM work explicitly notes “BERT is a one-step diffusion model” under an absorbing-[MASK] transition choice.
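The absorbing-state forward process can be sketched in a few lines: each position is independently replaced by [MASK] with a probability that grows with the step, and at the final step everything is masked, which is exactly the one-step “BERT” case. The linear t/T schedule and names below are illustrative assumptions, not the D3PM implementation.

```python
import random

def absorbing_forward(tokens, t, T, mask_token="[MASK]", rng=None):
    """Corrupt a token sequence at diffusion step t of T by independently
    replacing each token with the absorbing [MASK] state with probability t/T.
    Toy sketch of an absorbing-state forward process (schedule is illustrative)."""
    rng = rng or random.Random(0)
    p = t / T
    return [mask_token if rng.random() < p else tok for tok in tokens]

toks = "the cat sat on the mat".split()
print(absorbing_forward(toks, t=3, T=6))   # roughly half the positions masked
print(absorbing_forward(toks, t=6, T=6))   # fully masked: the one-step "BERT" case
```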
-
Continuous diffusion over latent/embedding representations
Forward process: add continuous noise in a continuous space (often sentence-level or latent-level); reverse: denoise continuous vectors, then decode to tokens. ACL 2025 also calls out a common limitation: sentence-level continuous diffusion often imposes uniform noise levels across all tokens, restricting token-wise contextual recovery.
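A minimal sketch of the continuous forward process, assuming a toy linear schedule: the latent is interpolated toward Gaussian noise. Note that the same noise level `a` applies to every position of the latent, which is precisely the uniform-noise limitation just mentioned.

```python
import math, random

def gaussian_forward(latent, t, T, rng=None):
    """Add Gaussian noise to a continuous sentence-level latent at step t of T
    under a simple linear schedule: x_t = sqrt(a)*x_0 + sqrt(1-a)*eps with
    a = 1 - t/T.  Toy sketch; real schedules (cosine, learned) differ."""
    rng = rng or random.Random(0)
    a = 1.0 - t / T
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
            for x in latent]

x0 = [0.5, -1.2, 0.3]
print(gaussian_forward(x0, t=0, T=10))   # == x0: no noise at t = 0
```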
-
Contract-first (what must be produced, with what scope/limits),
-
Gate execution (decidability → truth/testifiability → judgment),
-
Certificate output (what was checked, what evidence supports it, what remains uncertain).
-
the ability to parse the output,
-
the ability to run validators (deterministic and/or adversarial),
-
the ability to loop/repair when validators fail,
-
the ability to log enough to audit.

Diffusion changes the intervention surface, not the viability.
-
Generate candidate(s).
-
Validate against contracts (schemas, citations, invariants, consistency).
-
If fail: regenerate/repair with explicit failure reports.
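The generate → validate → repair cycle above can be written as a small skeleton. All names here are illustrative: `propose()` returns a candidate, each validator returns a list of failure reports (empty means pass), and `repair()` consumes the explicit failure reports.

```python
def certify_loop(propose, validators, repair, max_rounds=3):
    """Generate -> validate -> repair skeleton (illustrative names).
    Stops on pass or after max_rounds, re-validating after the last repair."""
    candidate = propose()
    failures = []
    for _ in range(max_rounds):
        failures = [f for v in validators for f in v(candidate)]
        if not failures:
            return candidate, []          # all gates passed
        candidate = repair(candidate, failures)
    failures = [f for v in validators for f in v(candidate)]
    return candidate, failures            # not certifiable within budget

# toy demo: a "schema" gate requiring an 'answer' field
propose = lambda: {"draft": "x"}
needs_answer = lambda c: [] if "answer" in c else ["missing field: answer"]
repair = lambda c, fails: {**c, "answer": "unknown"}
out, fails = certify_loop(propose, [needs_answer], repair)
print(out, fails)
```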
-
classifier(-free) guidance analogs exist for discrete diffusion, explicitly derived and evaluated in recent work.
-
“editability” is often easier: you can selectively re-noise/resample only the offending positions rather than rewriting an entire suffix (an AR pathology).
-
Structure: JSON/YAML schema, required fields, allowed enums.
-
Scope limits: what claims are allowed vs forbidden without evidence.
-
Evidence protocol: citation slots, data provenance requirements.
-
Invariants: e.g., “all numeric claims must be derivable from supplied sources or computations”; “no normative conclusion without stated trade-offs.”
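A contract of this shape can be expressed as plain data so it can be versioned and hashed, with schema checking as the first deterministic gate. Field names below are illustrative, not a fixed format.

```python
# A minimal output contract: schema + scope + invariants, as plain data.
CONTRACT = {
    "schema": {
        "required": ["claim", "evidence_ids", "uncertainty"],
        "enums": {"uncertainty": ["low", "medium", "high"]},
    },
    "scope": {"forbidden_without_evidence": ["empirical", "numeric"]},
    "invariants": ["numeric claims derivable from sources",
                   "no normative conclusion without trade-offs"],
}

def check_schema(output, contract):
    """Return a list of schema violations (empty = pass)."""
    fails = [f"missing: {k}" for k in contract["schema"]["required"]
             if k not in output]
    for field, allowed in contract["schema"]["enums"].items():
        if field in output and output[field] not in allowed:
            fails.append(f"bad enum {field}={output[field]!r}")
    return fails

print(check_schema({"claim": "x", "evidence_ids": [], "uncertainty": "med"},
                   CONTRACT))   # -> ["bad enum uncertainty='med'"]
```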
-
retrieval verification (claims must map to retrieved passages),
-
citation coverage checks (every nontrivial empirical claim is cited),
-
contradiction checks (within document + against retrieved facts),
-
arithmetic checks.
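Two of these validators are easy to make deterministic; the sketch below assumes claims carry an explicit `type` and `citations` field (illustrative names) and checks only trivially parseable `a + b = c` assertions.

```python
import re

def citation_coverage(claims):
    """Return ids of empirical claims lacking any citation."""
    return [c["id"] for c in claims
            if c["type"] == "empirical" and not c.get("citations")]

def arithmetic_check(text):
    """Verify simple 'a + b = c' assertions embedded in prose."""
    fails = []
    for a, b, c in re.findall(r"(\d+)\s*\+\s*(\d+)\s*=\s*(\d+)", text):
        if int(a) + int(b) != int(c):
            fails.append(f"{a}+{b}={c}")
    return fails

claims = [{"id": "c1", "type": "empirical", "citations": []},
          {"id": "c2", "type": "definition"}]
print(citation_coverage(claims))                             # -> ['c1']
print(arithmetic_check("Totals: 2 + 2 = 5 and 3 + 4 = 7."))  # -> ['2+2=5']
```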
-
force explicit uncertainty,
-
force explicit trade-offs,
-
output a bounded set of alternative actions and the costs of each.
-
Localized resampling: “these 12 tokens violate schema / contradict evidence; re-noise and resample only them.”
AR decoding often forces suffix regeneration with cascading effects.
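The localized-resampling move is mechanically simple: mask only the failing positions and let the denoiser resample those slots conditioned on the untouched context. The sketch below shows the masking step only (no real denoiser; names illustrative).

```python
def renoise_spans(tokens, bad_positions, mask="[MASK]"):
    """Re-noise only the failing positions: mask them, leave the rest fixed.
    A denoiser would then resample just these slots, conditioned on context."""
    return [mask if i in bad_positions else t for i, t in enumerate(tokens)]

toks = "total revenue was 5 million USD".split()
print(renoise_spans(toks, {3, 4}))
# -> ['total', 'revenue', 'was', '[MASK]', '[MASK]', 'USD']
```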
-
Constraint shaping via guidance: discrete diffusion has explicit guidance mechanisms analogous to classifier-free guidance.
-
Planning vs drift: AR methods can degrade via accumulated errors (“sampling drift”); diffusion is explicitly positioned as a non-autoregressive iterative denoising alternative to mitigate such error accumulation in some settings.
-
Conditioning brittleness: if the noising process destroys semantic anchors, the reverse process will “snap back” to generic manifold text. This is documented in long-text conditional generation failures under naïve noising/backbone choices.
-
Trace interpretability: intermediate denoising states are less semantically interpretable than AR token streams. You can log them, but they are not “reasons.” This pushes you toward “certificate-first” governance (external evidence + checks) rather than “introspective” governance.
-
Discrete/continuous mismatch: continuous latent diffusion requires a decoder; governance over tokens becomes indirect unless you add token-level constraints.

The upshot: diffusion is governable, but you should bias toward external procedural governance and treat in-process guidance as an optimization.
-
Are you using diffusion primarily for (i) de novo long-form generation, (ii) constrained structured outputs, or (iii) editing/rewriting/infilling? (Diffusion tends to shine most in iii, sometimes ii, least reliably in i.)
-
Does your governance require hard guarantees (schema correctness, citation coverage, bounded uncertainty), or is “best effort with audit trail” sufficient?
-
Do you want the generator to also be the certifier, or are you comfortable with a split: diffusion proposes, separate verifier(s) certify?

If you answer those, you can pick a clean pattern (pure post-hoc, hybrid proposer/verifier, or guided diffusion with constraint critics) and avoid the common trap: “try to make diffusion behave like an AR reasoner.”
-
A contract language: what counts as “passes” is specified in machine-checkable terms.
-
A gate procedure: a deterministic + adversarial test pipeline that produces a verdict.
-
A certificate: an auditable artifact (hashes, provenance, checks performed, failures, uncertainty bounds) that can be relied upon downstream.
-
Proposer: any model/system that emits candidate text (LLM, diffusion, expert system, human, template, tool).
-
Certifier: the Runcible pipeline that accepts/rejects/repairs candidates and emits certificates.
-
schema (JSON/YAML), required fields, enums, formats
-
scope limits and allowed claim types
-
evidence requirements per claim type (citation slots, provenance class)
-
invariants (numeracy, unit consistency, no orphan claims, etc.)
-
schema validation, formatting, completeness
-
forbidden constructs (unsupported claims, missing scopes)
-
internal consistency checks (IDs, references, units)
-
computable arithmetic checks where applicable
-
atomic claims (subject–predicate–object, numeric assertions, causal assertions)
-
dependency edges (claim A relies on claim B; conclusion relies on premises)
-
citation hooks (which evidence supports which claim)
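A claim graph with these three ingredients can be a small data structure; the payoff is that after a repair you can compute exactly which dependent claims must be re-gated. Field names below are illustrative, not a fixed Runcible schema.

```python
from dataclasses import dataclass, field

@dataclass
class Claim:
    """Atomic claim node: subject-predicate-object plus hooks."""
    id: str
    spo: tuple                                       # (subject, predicate, object)
    depends_on: list = field(default_factory=list)   # dependency edges
    citations: list = field(default_factory=list)    # evidence hooks

def affected_by(claims, failing_id):
    """Transitive closure of claims depending on a failing claim,
    i.e. everything that must be re-gated after a repair."""
    out, frontier = set(), {failing_id}
    while frontier:
        out |= frontier
        frontier = {c.id for c in claims
                    if set(c.depends_on) & frontier} - out
    return out

claims = [Claim("p1", ("drug", "reduces", "risk"), citations=["src-1"]),
          Claim("c1", ("therapy", "is", "recommended"), depends_on=["p1"])]
print(sorted(affected_by(claims, "p1")))   # -> ['c1', 'p1']
```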
-
citation coverage: every empirical claim has support
-
entailment/contradiction against evidence
-
cross-consistency with other claims in the output
-
adversarial query generation: “what evidence would refute this?” then check whether the system would have found it
-
boundary enforcement: anything beyond scope becomes “unknown” or “hypothesis”
-
locate failing spans / failing claims
-
request patches that satisfy specific failed gates
-
re-run gates only for affected regions + dependent claims
-
iterate until pass or declared undecidable
-
contract version + hash
-
inputs + evidence bundle hashes
-
proposer identity (model, version, settings, seed(s) if available)
-
gate results: pass/fail + diagnostics
-
unresolved uncertainties: explicitly bounded
-
final verdict class: Certified / Certified-with-Exceptions / Not-Certifiable
-
signature (your key), timestamp, replay token
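The certificate fields above can be assembled into one auditable payload. This is a sketch under loud assumptions: the “signature” here is a plain SHA-256 over the payload standing in for a real key signature, and the field set is abbreviated.

```python
import hashlib, json, time

def make_certificate(contract, evidence, output, gate_results, verdict):
    """Assemble an auditable certificate (sketch; fields abbreviated)."""
    payload = {
        "contract_hash": hashlib.sha256(
            json.dumps(contract, sort_keys=True).encode()).hexdigest(),
        "evidence_hashes": [hashlib.sha256(e.encode()).hexdigest()
                            for e in evidence],
        "output_hash": hashlib.sha256(output.encode()).hexdigest(),
        "gate_results": gate_results,     # pass/fail + diagnostics
        "verdict": verdict,               # Certified / with-Exceptions / Not-Certifiable
        "timestamp": time.time(),
    }
    payload["signature"] = hashlib.sha256(   # placeholder for a real key signature
        json.dumps(payload, sort_keys=True).encode()).hexdigest()
    return payload

cert = make_certificate({"v": 1}, ["source text"], "output",
                        {"schema": "pass"}, "Certified")
print(cert["verdict"], cert["signature"][:12])
```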
-
Sampling strategy: you will often want more parallel candidates (k) at lower per-sample cost.
-
Repair granularity: diffusion-friendly proposers can patch localized spans efficiently; design your certifier to exploit that when available.
-
Conditioning discipline: diffusion text can drift toward “manifold typicality” when conditioning is weak. The countermeasure is not philosophical but procedural: increase evidence conditioning, increase constraint strength, and tighten scope.
-
“We certify outputs under contracts, independent of generator architecture.”
-
“We can certify diffusion, LLMs, tool-augmented systems, and humans—because certification is a procedure.”
-
Runcible Certified Output API: input (task + contract + evidence) → output + certificate
-
Certificate Verification API: certificate → valid/invalid + audit trail
-
Governance Ledger: store certificates for downstream dispute resolution / warranty.

This turns “certifier” into an infrastructure primitive rather than a model feature.
-
Certificate semantics: Is “Certified” a binary, or do you sell tiers (e.g., Structural Certified, Factual Certified, Decision Certified)?
My recommendation: tier it, because it maps to cost and to liability.
-
Undecidability handling: Do you force abstention, or allow certified outputs with explicit uncertainty bounds?
My recommendation: allow certification with exceptions, but require the exceptions to be machine-readable and signed.
-
Multi-domain coverage depends on a universal certification grammar (the kernel), which is what makes results commensurable across domains.
-
Large-parameter models are useful insofar as they can draft plausible candidates quickly, but they are neither necessary nor sufficient for certification; certification depends on evidence discipline + gates + certificates.
-
Output Contract primitives (schema, scope, claim taxonomy hooks)
-
Gate semantics (Decidability → Truth → Judgment)
-
Certificate schema + signing + ledger semantics
-
Generic validators (schema, numeracy, internal consistency, provenance, citation coverage, contradiction detection, uncertainty bounding)
-
Repair loop semantics (patch targeting, dependency re-check)
-
Claim types and subtypes (what counts as a “claim” in that domain)
-
Evidence registry (acceptable sources, hierarchy, provenance classes)
-
Gate mapping (which claims require which evidence / which checks)
-
Risk tiers (severity × population × time horizon) → required infallibility demand
-
Domain-specific calculi (e.g., dosage rules, accounting identities, legal elements tests)
-
propose(contract, evidence_bundle, context) -> candidates[]
-
optional: patch(contract, evidence_bundle, failing_spans, failing_claims) -> patches[]
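The two calls above define the whole proposer interface; a structural type makes that pluggability concrete. The sketch below assumes Python's `typing.Protocol`; the `TemplateProposer` is a deliberately trivial conforming example, not a real generator.

```python
from typing import Protocol, List

class Proposer(Protocol):
    """Interface any generator must satisfy to plug into the certifier.
    Signatures mirror the two calls above; types are illustrative."""
    def propose(self, contract: dict, evidence_bundle: dict,
                context: str) -> List[str]: ...
    def patch(self, contract: dict, evidence_bundle: dict,
              failing_spans: list, failing_claims: list) -> List[str]: ...

class TemplateProposer:
    """Trivial conforming proposer: fills a template; patches by abstaining."""
    def propose(self, contract, evidence_bundle, context):
        return [f"claim: {context} [evidence: {list(evidence_bundle)}]"]
    def patch(self, contract, evidence_bundle, failing_spans, failing_claims):
        return ["abstain: insufficient evidence"]

p: Proposer = TemplateProposer()
print(p.propose({}, {"doc1": "..."}, "demo"))
```

Because `Protocol` uses structural typing, an LLM wrapper, a diffusion sampler, or a human-in-the-loop queue all satisfy `Proposer` without inheriting from it.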
-
Use large models to draft: contracts, candidate outputs, candidate claim graphs, candidate gate mappings, candidate test cases.
-
Use Runcible to bind: every nontrivial claim to evidence or to an explicit uncertainty bucket, with logged checks.
-
Domain charter (1 page)
Purpose, actors, decision surfaces, harm surfaces, “what must never be asserted without evidence,” risk tiering.
-
Claim taxonomy (tight)
Enumerate claim types that matter (diagnostic, prognostic, causal, legal-element, compliance, financial-statement, valuation, etc.).
Each claim type gets: required fields, allowed modalities (fact/hypothesis/plan), allowed scope.
-
Evidence hierarchy + admissibility rules
What sources count, in what order, with what staleness limits, with what jurisdiction/time constraints.
-
Gate mapping table
For each claim type: which validators run, what constitutes pass/fail/undecidable, what is the required restitution path.
-
Repair strategies
For each failure signature: “patch this span,” “downgrade modality,” “add evidence,” “narrow scope,” “abstain.”
-
Test suite generation
golden-pass cases
golden-fail cases (missing evidence, contradictions, out-of-scope claims)
adversarial cases (prompt injection, citation laundering, numeracy traps)
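These three case families can live as data so the regression harness replays them after every protocol change. The case structure and the toy certifier below are illustrative assumptions.

```python
# Golden-pass / golden-fail / adversarial cases as replayable data.
CASES = [
    {"name": "golden_pass", "output": {"claim": "x", "citations": ["e1"]},
     "expect": "Certified"},
    {"name": "missing_evidence", "output": {"claim": "x", "citations": []},
     "expect": "Not-Certifiable"},
    {"name": "citation_laundering",   # cites a source not in the registry
     "output": {"claim": "x", "citations": ["bogus"]},
     "expect": "Not-Certifiable"},
]

def run_suite(certify, cases):
    """Map each case name to whether the certifier matched the expected verdict."""
    return {c["name"]: certify(c["output"]) == c["expect"] for c in cases}

# toy certifier: pass iff citations exist and are in the evidence registry
known = {"e1"}
certify = lambda o: ("Certified" if o["citations"]
                     and set(o["citations"]) <= known else "Not-Certifiable")
print(run_suite(certify, CASES))
# -> {'golden_pass': True, 'missing_evidence': True, 'citation_laundering': True}
```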
-
Certificate exemplars
“Certified,” “Certified-with-Exceptions,” “Not-Certifiable,” with realistic diagnostics.
-
Regression harness + lint
Every protocol change must re-run domain tests and cross-domain invariants (so one vertical can’t quietly break kernel semantics).
-
Kernel protocol registry + current kernel gate definitions (the canonical source of truth)
-
Two mature domain overlays (one legal/compliance, one quantitative like finance/medical)
-
One end-to-end certificate per overlay (pass + fail)
-
Your canonical claim taxonomy (even if incomplete)
-
A short excerpt from the books that defines: decidability, truth/testifiability dimensions, liability tiering, and “demonstrated interests” mapping into certification
-
new overlay drafts for additional markets,
-
consistent gate mappings,
-
test suites,
-
and diffs that remain commensurable with your kernel.