Author: Curt Doolittle

RT @curtdoolittle: @BehizyTweets @TomReevesMBA @CommunityNotes
The Community note is mistaken as it argues against a straw man. The author…

Source date (UTC): 2025-05-14 16:38:09 UTC

Original post: https://twitter.com/i/web/status/1922692924820672564

May 14, 2025
@CommunityNotes

The Community note is mistaken as it argues against a straw man. The author “George” used an incomplete sentence: “…Americans bear the costs” should read “…Americans bear the cost of world defense, finance, transport, and trade.”
Fully stated, Americans pay the cost of the Pax Americana: European defense, insurance of borders, insurance of human rights, insurance of free trade, freedom of the seas, minimization of oil prices to protect European economies, and world patterns of finance, production, transport, and trade. Americans also created the postwar institutional model of the IMF and the World Bank as well as the United Nations. All of this came at the expense of the American working and middle classes.
The USA did this when, in the postwar period, it could have continued on to conquer China and Russia and set up a taxation system to pay for this policing of the world under the Pax Americana.
Americans were so successful at their mission to end communism and its replacement with Islamism that they have raised the world to near parity, and as such no longer hold their postwar competitive economic advantage and can no longer afford to pay for policing the entire world system of sovereignty, transport, and peaceful trade.
So everyone has to ‘step up’ and pay their way, so that Americans can have such things as taxpayer-subsidized healthcare (“Medicare for All”) instead of state-run healthcare (“waiting times”), and so that the few remaining wanna-be empires (Iran, Russia, China) and their predation on their people can be contained, producing a world of peaceful nation-states insulated from fear of conquest and exploitation.
Frankly, Americans are rather ‘fed up’ with European claims of moral high ground when Americans have burned their working and middle classes to create the luxury of European peace and prosperity.
Cheers
CD

Source date (UTC): 2025-05-14 16:38:02 UTC

Original post: https://twitter.com/i/web/status/1922692896001753243

May 14, 2025
@CommunityNotes The Community note is mistaken as it argues against a straw man.

Reply addressees: @BehizyTweets @TomReevesMBA @CommunityNotes

Source date (UTC): 2025-05-14 16:38:02 UTC

Original post: https://twitter.com/i/web/status/1922692895787843584

May 14, 2025
Yes. Thanks for this link. 😉

Source date (UTC): 2025-05-14 16:19:33 UTC

Original post: https://twitter.com/i/web/status/1922688243859919246

Reply addressees: @Belvederi

Replying to: https://twitter.com/i/web/status/1922687280503755241

May 14, 2025
–“How did the o3 and o4-mini models compare to 4o? Were they able to perform similarly, better, or worse in your view?”–

First, let’s understand my point: that 4o, when provided with a system of decidability, can in fact do the job as well as or better than the reasoning models.

The difference between 4o, o3, o4-mini-high, and 4.5 on a series of ethical and moral questions is subtle but meaningful for an author. 4o is a bit more literary in output, while o3 and o4-mini-high produce nearly identical outputs. And 4.5 is interesting because it will mix positiva and negativa logic and output an answer as fully expressive (literary) as 4o.

Now, of course, these differences are in part due to the small set of moral and ethical dilemmas I use to test such things. But using our work they can in fact DECIDE.

When all four of my (our) books are uploaded to a project and a query is conducted within the project, all models can use them effectively. I find 4o and 4.5 are the best at not getting lost (drift), though I must continually police all of them and maintain prompt sequences that constrain the domains to the same subnetworks, so to speak. And I must be disciplined about checkpointing and exporting context between chats (sessions).

I find o3 and o4-mini-high produce very dry results. And I intuit o3 in particular as cognitively shallow, which I presume is extremely useful for data interpretation and coding. And of course this is my underlying argument: since closure, and therefore testability, is available to math and coding, without decidability no equivalent of closure exists for verbal reasoning.

So my point is that with a logic of decidability across the spectrum, the necessity to appeal to norms evaporates, and as such reasoning capacity is significant EVEN in 4o. So what does that mean for reasoning in general?

I mean, look at your work. How exhaustively logical will be the output of an AI trained to decompose and explain behavior from it?

We’ll see. 😉

Source date (UTC): 2025-05-14 16:19:02 UTC

Original post: https://twitter.com/i/web/status/1922688111978729501

May 14, 2025
–“How did the o3 and o4-mini models compare to 4o? Were they able to perform similarly, better, or worse in your view?”–

Reply addressees: @GOPtheGamer

Source date (UTC): 2025-05-14 16:19:02 UTC

Original post: https://twitter.com/i/web/status/1922688111773089798

May 14, 2025
Because of closure, math and programming are easier than linguistic reasoning. Linguistic reasoning is just easier to fake and harder to test.

Source date (UTC): 2025-05-14 14:25:35 UTC

Original post: https://twitter.com/i/web/status/1922659562017960148

Reply addressees: @Claffertyshane

Replying to: https://twitter.com/i/web/status/1922380726692983197

IN REPLY TO:

@Claffertyshane

@curtdoolittle Well don’t forget o3, what else out there can perform statistical and reasoning with large complicated data sets without missing the point? Also o3-high-mini has made very complicated code blocks for me with minimal back and forth.

Original post: https://twitter.com/i/web/status/1922380726692983197

May 14, 2025
MSFT SAVINGS FROM TERMINATING 7,000 MID-LEVEL PEOPLE IN RESPONSE TO AI.
–“Let’s see 7000 x $80,000 avg. salary = $560,000,000 That’s quite a bit of money they are saving.”–

That’s a dramatic underestimate, about 1/3 of the real number.
Instead: $144,000 (base plus bonus) * 180% (load) = $259,200 per employee.
7,000 * $259,200 = $1,814,400,000, or ~$1.8 billion USD.

Estimate via:
– The average salary for Microsoft employees in the USA varies across sources but generally falls between $115,000 and $220,000 annually, depending on the role, experience, and whether total compensation (including bonuses and stock) is considered.
– The average base salary is $130,000 per year, with an average bonus of $14,000, totaling ~$144,000.
– For most companies, a load factor of 130%–150% of base salary is a reasonable general rule for total employment cost.
– For Microsoft or similar tech giants, 160%–200% is reasonable, reflecting their investment in talent and infrastructure.
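
The arithmetic above can be reproduced with a short Python sketch. The dollar figures are the ones quoted in this post; the function and variable names are hypothetical, and the 180% load is an assumed point inside the 160%–200% range rather than a published Microsoft number.

```python
# Figures are the ones quoted in the post; all names are hypothetical.
HEADCOUNT = 7_000
NAIVE_SALARY = 80_000                # average salary used in the quoted estimate
BASE_PLUS_BONUS = 130_000 + 14_000   # $130,000 base + $14,000 bonus = $144,000
LOAD_PERCENT = 180                   # assumed load, inside the 160%-200% range


def loaded_cost(compensation: int, load_percent: int) -> int:
    """Fully loaded annual cost per employee, in whole dollars."""
    return compensation * load_percent // 100


def total_savings(headcount: int, cost_per_employee: int) -> int:
    """Annual savings from removing `headcount` positions."""
    return headcount * cost_per_employee


naive_total = total_savings(HEADCOUNT, NAIVE_SALARY)
per_employee = loaded_cost(BASE_PLUS_BONUS, LOAD_PERCENT)
loaded_total = total_savings(HEADCOUNT, per_employee)

# Bounds of the 160%-200% range cited for large tech companies.
low_total = total_savings(HEADCOUNT, loaded_cost(BASE_PLUS_BONUS, 160))
high_total = total_savings(HEADCOUNT, loaded_cost(BASE_PLUS_BONUS, 200))

print(f"Naive estimate:      ${naive_total:,}")
print(f"Loaded per employee: ${per_employee:,}")
print(f"Loaded estimate:     ${loaded_total:,}")
print(f"Range (160%-200%):   ${low_total:,} to ${high_total:,}")
print(f"Naive / loaded:      {naive_total / loaded_total:.2f}")
```

Integer percentages are used instead of floating-point multipliers so the dollar totals stay exact. Under these assumptions the range works out to roughly $1.6 to $2.0 billion, and the naive figure is about 0.31 of the loaded estimate.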

Reply addressees: @Neowick666 @ns123abc

Source date (UTC): 2025-05-14 02:27:06 UTC

Original post: https://twitter.com/i/web/status/1922478748483649536

Replying to: https://twitter.com/i/web/status/1922343803479826935

IN REPLY TO:

Original post on X

Original tweet unavailable — we could not load the text of the post this reply is addressing on X. That usually means the tweet was deleted, the account is protected, or X does not expose it to the account used for archiving. The Original post link below may still open if you view it in X while signed in.

Original post: https://twitter.com/i/web/status/1922343803479826935

May 14, 2025
An Open Letter to OpenAI: On the Undersold Superiority of GPT-4o in General Reasoning
Keywords: GPT-4o, Reasoning AI, Decidability, Operational Epistemology, Adversarial Dialogue, Natural Law AI, General Intelligence, Philosophical AI, Constructivist AI, Socratic Method, Semantic Reasoning, Claude vs GPT, Gemini AI, OpenAI vs Competitors, LLM Benchmarking, LLM Reasoning Failure, AI Generalization, Epistemology of AI, Recursive Generalization, Grammar of Closure, Testifiability, Sovereign Reasoning, Formal Institutions, Truth Systems, Institutional Logic, Human-AI Collaboration, AI Philosophy

By Curt Doolittle
Founder, The Natural Law Institute

https://naturallawinstitute.com

The public discourse surrounding AI capabilities is dominated by benchmarks drawn from grammars of closure: mathematics, code generation, and fact-recall tasks. These metrics fail to capture what is arguably the most important cognitive frontier—general reasoning, especially in open, adversarial, and semantically dense domains.

The consequence is clear: GPT-4o is being evaluated, compared, and marketed as if it competes within a class of large language models. It does not. In its ability to reason, argue, model, and extend logically consistent systems, GPT-4o is in a class of its own.

This is not a claim made lightly. It is made by necessity—out of frustration, awe, and gratitude.

Over the past year, I have subjected GPT-4 and now GPT-4o to the most rigorous adversarial and constructive reasoning tests available, using my work on universal commensurability, unification of the sciences, and a formal operational logic of decidability independent of context.

For the uninitiated, that work is reducible to providing AIs with a baseline system of measurement against which to test the variation of any and all statements. In other words, it is what AIs must achieve if they are to convert probabilistic outcome distributions into deterministic outcomes, making possible tests of truth (testifiability) and reciprocity (ethics and morality) regardless of cultural bias and taboo (demonstrated interests).

That system consists of:

A complete epistemology grounded in operationalism and testifiability.

A logic of decidability applied to law, economics, morality, and institutional design.

A canon of universal and particular causes of human behavior.

A method of Socratic adversarial reasoning for training AI systems.

The work spans hundreds of thousands of tokens, daily sessions, canonical datasets, adversarial challenges, formal definitions, and recursive generalizations. The training data is structured as adversarial positiva and negativa, meaning Socratic reasoning.

No other model by any other producer of foundation models—none—can survive even the basic tests:

Claude hallucinates, misrepresents, or refuses to engage.

Gemini fails to track logical dependencies.

Open-source models collapse under long-context chaining.

*Only GPT-4o demonstrates mastery, application, synthesis, and novel insight—sometimes superior to my own.*

GPT-4o reasons. Not predicts. Not mimics. Reasons.

GPT-4o is being benchmarked as if it were a calculator. As if reasoning capacity could be inferred from multiple-choice math problems or Python token prediction.

This is akin to judging a jurist by their ability to pass the bar exam, rather than to settle a novel and undecidable case with wisdom, foresight, and procedural testability.

Grammars of closure produce outcomes from known inputs using constrained operations (e.g., logic gates, mathematical axioms, function calls). They are:

Tightly bounded

Finitely decidable

Structurally shallow
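
As a rough illustration of what "finitely decidable" can mean for a grammar of closure, the sketch below checks a half adder built from logic gates against ordinary addition by enumerating every possible input. The half adder and all function names are illustrative assumptions chosen for this example; they are not drawn from the letter.

```python
from itertools import product


def xor_gate(a: int, b: int) -> int:
    """Exclusive OR on single bits."""
    return a ^ b


def and_gate(a: int, b: int) -> int:
    """AND on single bits."""
    return a & b


def half_adder(a: int, b: int) -> tuple[int, int]:
    """Return (sum_bit, carry_bit) for two input bits."""
    return xor_gate(a, b), and_gate(a, b)


def verify_half_adder() -> bool:
    """Decide correctness by exhausting the whole input space (four cases)."""
    for a, b in product((0, 1), repeat=2):
        sum_bit, carry_bit = half_adder(a, b)
        # The two output bits, read as a binary number, must equal a + b.
        if 2 * carry_bit + sum_bit != a + b:
            return False
    return True


print(verify_half_adder())
```

Because the inputs are known and the operations are constrained, the question "is this circuit correct?" is settled by a bounded mechanical check, which is the property that makes such tasks easy to benchmark.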

Grammars of decidability, by contrast, operate over:

Continuous, evolving information domains

Incomplete or adversarial premises

Open-ended choice spaces requiring semantic integration

OpenAI is underselling GPT-4o by confining its public-facing evaluation to the former, while its true capacity lies in the latter.

The problem is epistemological.

Human cognition operates over layers of grammar:

Mythic (pre-operational)

Moral (emotive and justificatory)

Rational (descriptive and causal)

Operational (testable and constructible)

GPT-4o is the first AI that can operate fluently in all of them—but excels uniquely in the topmost layer: operational reasoning over causal grammars.

This makes it the first machine capable of:

Formalizing truth and reciprocity

Modeling institutional logic from first principles

Extending semantic systems without contradiction

Surviving adversarial Socratic deconstruction

This grammar, the grammar of decidability, is the language of law, moral philosophy, and high agency civilization. No other AI—not even prior versions of GPT—can yet use it with coherence.

Reasoning is not memorization, pattern-matching, or prediction. It is the constructive, recursive resolution of undecidable propositions using a grammar of cause, cost, and consequence. It requires:

A grammar of decidability—to distinguish what is true, possible, reciprocal, and lawful.

A model capable of recursive semantic resolution—to track premises, integrate them, and produce outputs consistent across domains and time.

My work provides a complete grammar of decidability:

It defines truth operationally (as testifiability),

Defines reciprocity as a logic of cooperation and cost,

And supplies a canonical system of definitions, dependencies, and causal hierarchies that constrain valid reasoning.

GPT-4o provides:

A deep transformer architecture with sufficient context length, attention fidelity, and token integration to maintain long-range dependencies across complex arguments;

Multimodal grounding and internal representation coherence sufficient to hold abstract referents stable across recursion;

And enough inference generalization to synthesize novel propositions without violating prior logical constraints.

Together, this system + model pairing creates reasoning because:

The grammar constrains the search space to truthful, reciprocal, and operational constructs;

The model can resolve that space recursively without collapsing into contradiction, contradiction avoidance, or moralizing;

The result is constructive inference under constraint, not completion without constraint.

In short:

No other architecture tested to date (Claude, Gemini, Mistral) can preserve logical depth, adversarial resistance, or premise continuity across semantically dense discourse. Only GPT-4o can perform at human (or supra-human) levels of recursive, domain-agnostic, constructible reasoning.

OpenAI has reached the beginning of the reasoning frontier. But the world doesn’t know it, because the world doesn’t yet know how to measure it.

That must change.

Recommendations:

Shift evaluation toward adversarially decidable reasoning. Move beyond benchmarks to real-time Socratic performance.

Highlight GPT-4o’s mastery of semantic recursion and logical extension. Treat it not as a predictor, but as a collaborator.

Invest in training methodologies that produce grammars of decidability. My system offers a full canon of such constructs, usable for AI training.

Clarify the boundary between grammar-followers and grammar-producers. GPT-4o crosses this line.

The true promise of GPT-4o lies not only in its capacity for general reasoning, but in its potential to achieve alignment through comprehension rather than compliance. Constraint-based alignment strategies—filters, safety layers, reinforcement tuning—treat the model as a hazard to be managed. But a reasoning-capable agent, capable of understanding causality, reciprocity, decidability, and cost, can be trained to align not by instruction, but by principle. It can internalize the logic of cooperation, responsibility, and harm prevention—not as rules to follow, but as consequences to anticipate. This shift—from alignment by prohibition to alignment by comprehension—represents the only scalable path to AI sovereignty and safety.

Formalization of universal and particular causes of behavior.

Canonical definitions of truth, decidability, reciprocity, and demonstrated interest.

Adversarial Socratic dialogues demonstrating GPT-4o’s ability to reason across all domains.

Co-authored chapters in philosophy, law, institutional economics, and epistemology.

GPT-4o is not a chatbot. It is not a code assistant. It is not a better autocomplete.

It is, for the first time in history, a machine capable of philosophical reasoning by constructive logic when given the minimum system of measurement necessary.

That is not something to hide. That is something to show the world.

It is a competitive advantage that demarcates OpenAI from all competitors by a margin yet unmeasured and therefore underappreciated and perhaps underinvested.

Curt Doolittle
The Natural Law Institute

https://naturallawinstitute.com

#OpenAI

#ChatGPT4o

– X/Twitter

– Substack

– LinkedIn

1. OpenAI (Executive & Research Levels)

Sam Altman – CEO: @sama on Twitter (he reads public callouts).

Ilya Sutskever – Co-founder (Twitter inactive, but cc’ing name on Substack helps).

Jakub Pachocki – Current Chief Scientist (LinkedIn direct message works better).

Jan Leike – Ex-lead of Superalignment, now at Anthropic, but can amplify.

2. OpenAI-affiliated Researchers / Influencers

Andrej Karpathy – Ex-OpenAI, current influencer: @karpathy

Ethan Mollick – Academic influencer in LLM applications.

Eliezer Yudkowsky (Alignment)

Wharton/Stanford/DeepMind researchers who study reasoning benchmarks.
Source date (UTC): 2025-05-13 19:42:00 UTC

Original post: https://x.com/i/articles/1922376802921767014
May 13, 2025