How To Use Our Methodology On Your LLM
- A computable curation grammar (from Vol. 2) that turns messy prose into scored claims with warrants, operations, contexts, externalities, and liability.
- A reciprocity and truth test battery (Vol. 2–4) that assigns TRC scores (Truth/Testifiability, Reciprocity, Commensurability) and Liability costs to each item.
- Socratic teacher datasets & rubrics (derived from all volumes) that show the model how to pass those tests—not just tell it.
- Adversarial + cooperative prompts that stress the model on precisely those failure modes that cause hallucination, motivated inference, and irreciprocal outputs.
- Evaluation harnesses that turn those scores into dataset-level and run-time KPIs.
Start with the domains where errors are most costly (legal/medical/finance/science/enterprise). Don’t boil the internet. Use our grammar + tests to filter and reweight your existing corpora and vendor feeds. Treat everything else as background pretraining.
Replace vague preference rubrics with a TRC+L rubric: reward testifiable, reciprocal, commensurable answers; penalize irreciprocity and unjustified inference. This immediately improves answer quality without changing pretraining.
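As a minimal sketch, the TRC+L rubric can be collapsed into a scalar reward for preference scoring. The weights and the [0, 1] scales below are illustrative assumptions, not calibrated values from the methodology:

```python
from dataclasses import dataclass

# Illustrative weights; real values would be tuned per domain (see the
# domain-variance note below on wT/wC vs. wR).
W_T, W_R, W_C, W_L = 0.4, 0.3, 0.3, 0.5

@dataclass
class Scores:
    t: float          # Truth/Testifiability, in [0, 1]
    r: float          # Reciprocity, in [0, 1]
    c: float          # Commensurability, in [0, 1]
    liability: float  # projected Liability cost, in [0, 1]

def trc_l_reward(s: Scores) -> float:
    """Reward testifiable, reciprocal, commensurable answers; penalize Liability."""
    return W_T * s.t + W_R * s.r + W_C * s.c - W_L * s.liability
```

A perfect answer with zero projected Liability scores 1.0; any Liability exposure pulls the reward down without touching the TRC terms.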
Train a small policy/checker on our Socratic data. Use it to pre-score candidate data and to generate contrastive pairs (good/bad under TRC+L). Human adversarialists spot-check deltas.
Upweight sources whose per-document TRC and per-domain commensurability are high; downweight sources that systematically fail reciprocity (propaganda, clickbait, rhetorical inflation). Keep the scale; change the mixture.
Deploy the checker as a post-decoder critic or reflection step: when an answer’s TRC margin is low or projected Liability is high, force the model to (a) retrieve evidence, (b) expose operations, or (c) abstain.
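A sketch of that reflection step, assuming the checker emits a scalar TRC margin and a projected Liability per answer; the thresholds and escalation order are assumptions for illustration:

```python
from enum import Enum

# Illustrative cutoffs, not values from the methodology.
TRC_MARGIN_MIN = 0.2
LIABILITY_MAX = 0.5

class Action(Enum):
    ANSWER = "answer"
    RETRIEVE = "retrieve_evidence"
    EXPOSE_OPS = "expose_operations"
    ABSTAIN = "abstain"

def route(trc_margin: float, projected_liability: float, has_operations: bool) -> Action:
    """Post-decoder critic: escalate when TRC margin is low or Liability is high."""
    if trc_margin >= TRC_MARGIN_MIN and projected_liability <= LIABILITY_MAX:
        return Action.ANSWER
    if projected_liability > LIABILITY_MAX and trc_margin < TRC_MARGIN_MIN:
        return Action.ABSTAIN      # neither margin nor evidence: defer
    if not has_operations:
        return Action.EXPOSE_OPS   # claim lacks an inspectable operation chain
    return Action.RETRIEVE         # margin is low but recoverable: fetch evidence
```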
Use TRC for inclusion and weighting. Use L to decide where to invest human effort.
We convert text into a minimally sufficient operational program (what one would do to make or test the claim). If no such program exists: low Testifiability. If units or referents are sloppy: low Commensurability.
We check for disclosure of incentives/assumptions, acknowledged externalities, symmetry of costs/benefits, and absence of free-riding. Hidden rent-seeking → downweight. Transparent tradeoffs → upweight.
We project cost of error by severity × population × warranty. This drives where abstention and retrieval are mandatory.
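The severity × population × warranty projection can be written directly; the gate value below is a placeholder for a domain-specific policy threshold:

```python
def projected_liability(severity: float, population: int, warranty: float) -> float:
    """Cost of error: how bad x how many affected x how strongly we vouched."""
    return severity * population * warranty

def mandates_retrieval(liability: float, gate: float = 1000.0) -> bool:
    """Above the gate, retrieval or abstention is mandatory rather than optional."""
    return liability >= gate
```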
We estimate TRC margins under perturbations (slightly changed assumptions, data drift). Small delta → robust claim; big delta → fragile. Use that to rank curation targets.
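A sketch of that perturbation probe, assuming a checker `score_fn` that maps a dict of assumptions to a TRC margin; the jitter model and sample count are assumptions:

```python
import random

def fragility(score_fn, assumptions: dict, n: int = 32, jitter: float = 0.05,
              seed: int = 0) -> float:
    """Max observed delta in TRC margin under small perturbations (big = fragile)."""
    rng = random.Random(seed)
    base = score_fn(assumptions)
    deltas = []
    for _ in range(n):
        # Jitter every numeric assumption slightly and re-score.
        perturbed = {k: v + rng.uniform(-jitter, jitter) for k, v in assumptions.items()}
        deltas.append(abs(score_fn(perturbed) - base))
    return max(deltas)
```

Claims whose margin is insensitive to jitter are robust; claims that flip under tiny perturbations rank high as curation targets.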
- Vendor corpora → de-dupe → source reputation prior.
- Claim slicing (chunking with discourse boundaries).
- First-pass TRC+L scoring (teacher/checker + light human audit on tails).
- Construct domain slices with target TRC distributions (e.g., 0.7+ for safety-critical, 0.5+ for general).
- Upweight high-TRC slices for pretraining and for SFT seed.
- Keep low-TRC background for broad coverage, but cap its mass and mask it from SFT.
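The mixture step can be sketched as follows; the TRC floor, the 20% cap, and the proportional-shrink rule are assumptions for illustration, not the methodology's own parameters:

```python
TRC_FLOOR = 0.5       # below this a slice counts as low-TRC background
BACKGROUND_CAP = 0.2  # low-TRC mass may not exceed 20% of the mixture

def reweight(slices: dict[str, tuple[float, float]]) -> dict[str, float]:
    """slices: name -> (trc_score, raw_weight). Returns normalized mixture weights."""
    weights = {name: w for name, (trc, w) in slices.items()}
    low = [n for n, (trc, _) in slices.items() if trc < TRC_FLOOR]
    total = sum(weights.values())
    low_mass = sum(weights[n] for n in low)
    if total and low_mass / total > BACKGROUND_CAP:
        # Shrink low-TRC slices proportionally so their share equals the cap.
        scale = BACKGROUND_CAP * (total - low_mass) / ((1 - BACKGROUND_CAP) * low_mass)
        for n in low:
            weights[n] *= scale
    z = sum(weights.values())
    return {n: w / z for n, w in weights.items()}
```

The scale is preserved in the sense that every slice stays in the mixture; only the relative mass of low-TRC background is bounded.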
- Replace thumbs-up/down with structured comparisons: “Output A exposes operations, binds referents, and acknowledges externalities; Output B does not.”
- Reward operational transparency and reciprocal framing, not just “helpful.”
- Ship domain-specific truth/reciprocity/commensurability suites with gold rationales.
- Add abstention & deferral tests tied to Liability: the model should sometimes say, “insufficient TRC; need evidence.”
- Checker hook: when TRC is low or L is high, trigger retrieval, self-critique, or handoff to tools/humans.
- Dataset TRC distribution by domain/source/date. (Watch drift.)
- Coverage of operations: % of samples with executable/inspectable operation chains.
- Reciprocity violations caught per N tokens (pretrain, SFT, inference).
- Abstention correctness under high-Liability tests.
- Cost-of-error savings: downstream red-team hours, legal review touches, production incidents.
- Calibration: TRC vs. external evals (e.g., factuality benches, internal truth panels).
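The drift KPI above can be sketched as a per-domain comparison of a recent window against a baseline; the tolerance value is an assumption:

```python
from statistics import mean

def trc_drift(baseline: dict[str, list[float]], window: dict[str, list[float]],
              tol: float = 0.05) -> dict[str, float]:
    """Return domain -> (window mean - baseline mean) for domains drifting past tol."""
    drift = {}
    for domain, base_scores in baseline.items():
        if domain in window and window[domain]:
            delta = mean(window[domain]) - mean(base_scores)
            if abs(delta) > tol:
                drift[domain] = delta
    return drift
```

Run this per source and per ingestion date as well as per domain; a negative delta on a high-wT domain is the signal to re-audit that feed.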
- Scale vs. purity. You will not sanitize the web. Keep scale; steer the mixture with TRC weighting, then focus SFT and RL on high-TRC data.
- Label cost. Use teachers + adversarialists: teachers generate contrasts; adversarialists audit only disagreements and high-Liability slices.
- Domain variance. Weights differ: science/legal get high wT and wC; social/helpfulness gets higher wR (reciprocity of framing, costs to others).
- Latency budget. If runtime checks are expensive, sample the checker: always-on for high-L routes; probabilistic elsewhere.
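The sampled-checker policy is a one-liner in spirit; the threshold and sample rate below are illustrative assumptions:

```python
import random

def should_check(route_liability: float, high_l_threshold: float = 0.5,
                 sample_rate: float = 0.1, rng=None) -> bool:
    """Always-on checking for high-L routes; probabilistic sampling elsewhere."""
    if route_liability >= high_l_threshold:
        return True
    rng = rng or random.Random()
    return rng.random() < sample_rate
```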
- Grammar, checklists, and automated tests for T, R, C, L.
- Socratic training and ready-to-use teacher/checker heads.
- Eval suites and playbooks for adoption Levels 0–2.
- Your domain priorities and cost-of-error model.
- Access to your corpora and mixture machinery.
- A small adversarial data team (2–6 FTE) to close the loop in your environment.
- Curate one slice (e.g., enterprise Q&A or regulatory/compliance). Reweight by TRC; run SFT on the high-TRC subset only.
- Swap your RLHF rubric for TRC+L. Measure factuality, refusal quality, and abstention correctness deltas.
- Introduce abstention in high-L routes with a minimal checker. Track incident reduction.
- Publish a Dataset Card showing TRC distributions and liability gates. This helps auditors and customers immediately.
- Over-formalization → coverage loss. Counter by mixing: keep broad low-TRC background, but bound its influence.
- Gaming the rubric. Update the adversarial prompts quarterly; rotate negative exemplars; audit with blind external panels.
- False certainty. If TRC is low and L is high, the only correct behavior is deferral. We hard-wire that circuit.
Source date (UTC): 2025-08-18 14:41:00 UTC
Original post: https://x.com/i/articles/1957452676175954137