
The Simple Version of the Problem

–“A 2024 paper titled “Responsible artificial intelligence governance: A review and research framework,” published in the Journal of Strategic Information Systems, … identifies a key gap: while numerous frameworks outline principles for responsible AI (e.g., fairness, transparency, accountability), there is limited cohesion, clarity, and depth in understanding how to translate these abstract ethical concepts into practical, operational practices across the full AI lifecycle, including design, execution, monitoring, and evaluation.”–
–“A 2023 study in Nature Machine Intelligence showing that 78% of AI researchers struggle to translate theoretical advances into deployable algorithms.”–
–“Architectures don’t hallucinate—training objectives do.
You don’t fix it in the forward pass, you fix it in the curriculum. The code is fine; the problem is what we teach it to do.”–
I understand the instinct to look for a code-level fix, but the issue isn’t in the transformer math. It’s in what we ask the model to optimize for. Current training optimizes for coherence; my work shows why that produces hallucination. In practical terms:
  • Restructure training data around testifiability, reciprocity, and liability rather than surface coherence.
  • Prompt in terms of economic tests—marginal indifference, liability thresholds—rather than stylistic cues.
  • Evaluate on coverage of truth and reciprocity tests instead of only perplexity and benchmarks.
So yes, you can ‘change something in code tomorrow’—but the code change is trivial compared to the training objective shift. Architectures don’t hallucinate; training does.
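To make the “trivial code change” concrete, here is a purely illustrative sketch of what reweighting training toward liability rather than surface coherence could look like. Everything here is hypothetical: `reweighted_loss` and the per-token liability weights are stand-ins, not part of any existing training library.

```python
def reweighted_loss(token_losses, liability_weights):
    """Scale each token's cross-entropy loss by a liability weight,
    so errors on high-stakes claims cost more than errors on filler.
    Both inputs are plain lists of floats of equal length."""
    total = sum(loss * w for loss, w in zip(token_losses, liability_weights))
    return total / len(token_losses)

# Hypothetical example: three tokens, the second carries a factual
# claim and is weighted 3x; the others are low-stakes connective text.
losses = [0.2, 1.5, 0.4]
weights = [1.0, 3.0, 1.0]
print(round(reweighted_loss(losses, weights), 2))  # 1.7
```

Under ordinary uniform weighting the same losses would average to 0.7; the liability weighting triples the penalty on the factual token, which is the kind of objective-level shift the argument points at.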
They’re asking for a line of code, while I’m describing a shift in paradigm. The way to bridge that gap is to show how our proposal does translate into implementable changes, but at a different layer: training and prompting rather than architecture.
Here’s my answer:
My work isn’t about swapping out a few lines of code in the transformer stack. It’s about solving the deeper problem: LLMs don’t reason because they’re trained to imitate coherence, not to compute truth, reciprocity, or liability. You can’t fix that with a patch to the forward pass. You fix it by changing how the model is trained and what it’s asked to do.
“What does that mean in practice tomorrow morning?
  • Training: curate training data that enforces testifiability, reciprocity, and liability rather than mere coherence. This means restructuring datasets around constructive logic, adversarial dialogue, and measurable closure.
  • Prompting: design prompts as economic tests (price of error, marginal indifference, liability-weighted thresholds), not as instructions for verbosity.
  • Evaluation: stop measuring only perplexity or benchmark scores and start measuring coverage of truth tests, reciprocity tests, and demonstrated interests.”
“I’m providing the blueprint for why current architectures hallucinate and what guarantees are missing. Once you understand that, the engineering changes become obvious: the ‘code’ change is trivial compared to the shift in training objectives and data design. If you only look for a tensor tweak, you’ll miss the systemic fix.”


Source date (UTC): 2025-08-22 20:40:21 UTC

Original post: https://x.com/i/articles/1958992660750114841
