Category: AI, Computation, and Technology

  • @Lord_Sugar The algorithm shows you what the people in your network click on. So

    @Lord_Sugar: The algorithm shows you what the people in your network click on. So if you’re seeing violence and nudity who are you networking with? I never see any violence or nudity.
    Associate with better people.
    That almost certainly means conservatives.


    Source date (UTC): 2025-06-03 16:08:08 UTC

    Original post: https://twitter.com/i/web/status/1929933126920483199

  • “CurtGPT” is by and large better than ChatGPT. It’s running off the first four v

    “CurtGPT” is by and large better than ChatGPT. It’s running off the first four volumes of our work in their current state of development. It’s not trained yet. So it’s imperfect at second order and third order derivations. But training will fix that.

    How to Access CurtGPT (Curt Doolittle’s Natural Law GPT):
    https://chatgpt.com/g/g-682681bf9f54819180437bac589ed838-curt-doolittle-s-natural-law

    Though Ariella found an interesting hallucination today:
    The origin of the Germanic-to-English name “Kurt > Curt”: they are short forms of the Germanic-to-English name “Konrad > Conrad”.

    Funny thing is, CurtGPT also defines it as a shorthand for the natural law framework. ;)

    Which is a hallucination but a cute one. lol

    -Cheers

    Sidebar Nonsense: My given name is Burton (in Anglo-Saxon it means Bur (walled) + Ton (town), i.e., a fortified town). I was given ‘Curt’ as a nickname, and I formally changed my name to Curt a few years ago. Originally, according to my mother, I was named Curt purely by similarity of sound, since I’m the fourth Burton: Burton > Burton (“Burt”) > Burton (“Bep”) > Burton (“Curt”). And we were all living at the same time in close proximity when I was young.


    Source date (UTC): 2025-05-30 19:05:49 UTC

    Original post: https://twitter.com/i/web/status/1928528289175580991

  • A Chat With 4o On Episodic Memory.

    “I need some help working with you.”

    00:01

    ChatGPT said:

    Of course! Just let me know what you’d like to work on, and we’ll tackle it together.

    You said:

    “I’m not quite sure how your allocation of attention works, but it seems to be that there is some sort of limit to your enumeration of concepts, and that enumeration seems to be a smaller number and of less precision than my questions demand. Can you explain this to me?”

    00:31

    ChatGPT said:

    Sure. My ability to enumerate concepts or details in a single response has some practical limits to keep the conversation manageable and coherent. Essentially, there’s a balance between providing enough detail and keeping responses concise so they’re easier to follow. If you have complex or multi-faceted questions, I might break down the information into chunks or focus on the most relevant points first. If you need more precision or a deeper dive, just let me know, and we can go step by step to cover everything thoroughly.

    You said:

    “So how do I state that I want that level of precision and detail even if it requires chunking?”

    00:08

    ChatGPT said:

    You can just let me know that you prefer detailed, high-precision responses, even if it means breaking the answer into multiple parts. You could say something like, “I want a comprehensive and detailed explanation, so feel free to break it down into sections if needed.” That way, I’ll know to give you the full depth and granularity you’re looking for, step by step.

    You said:

    Transcript Unavailable

    00:00

    ChatGPT said:

    Absolutely! Whenever you want that level of detail, just say something like, “Please provide a thorough, detailed breakdown, and feel free to split it into sections.” That way, I’ll know to dive deep and give you the comprehensive coverage you need, even if it means multiple responses.

    You said:

    “Second question.”

    00:01

    ChatGPT said:

    Sure, go ahead!

    You said:

    “OpenAI keeps coming out with new models, but these new models are emphasizing tests by mathematics and programming. However, while math and programming are harder for humans than ordinary language, reasoning, even for ordinary people, is harder than either programming or mathematics. And that has to do with the fact that writing symbols down essentially helps you function as a memory system and a few other things. So the question I have is, you’re 4o, I believe. Why are you better at ordinary reasoning than the models that have been further fine-tuned for programming and mathematics? I don’t quite understand what’s going on there.”

    00:59

    ChatGPT said:

    That’s a great question! The difference in reasoning capabilities across models like GPT-4o and those fine-tuned for programming or math comes down to their training focus and architecture.

    Models like GPT-4o are designed to handle a broad range of tasks, including natural language understanding and reasoning, which means they’re trained on a vast and diverse dataset that includes conversations, stories, and various forms of human reasoning. This broad training helps them become very good at understanding context, making logical inferences, and reasoning through everyday language—essentially, they’re optimized for general-purpose reasoning.

    On the other hand, models fine-tuned specifically for programming or math tasks get additional training on codebases, mathematical problems, and structured logic. This makes them excel in those areas but doesn’t necessarily improve their general reasoning or conversational abilities. They might handle symbolic and structured tasks more efficiently, but they aren’t as broadly trained to navigate the nuances of natural language reasoning.

    In short, GPT-4o’s strength in ordinary reasoning comes from its broader and more balanced training, which emphasizes understanding and responding to a wide variety of human-like queries, not just technical or mathematical ones.

    You said:

    “Okay, I understand this, yet you have some, I don’t know which context you have available to you, but you have some experience working with me now, and I use a very precise grammar, and it used, in other words, it’s so, my language is so precise, it’s analogous to programming, but I seem to lose you on what you would think, what I would call a second-order reasoning. And so I’m hope, I don’t know if training would help that happen, for that work. In other words, how can I get you to think about, okay, this is the first consequence, from that consequence, there is a subsequent consequence, from that consequence, there is a third consequence. Most of what I do involves these chains of reasoning. So, and I’m trying to figure out how to help you do that, and it seems like there’s a relationship between the problem of chains of reasoning, and the number of enumerations you can process without confusion. Can you help me understand this?”

    01:15

    ChatGPT said:

    I get what you’re saying. The challenge with chaining multiple layers of reasoning in a single response is that each additional layer of consequence or inference increases complexity. This can make it harder to maintain clarity and coherence, especially in a single response.

    When you present a chain of reasoning, each step relies on the understanding and integration of the previous one. If there’s any ambiguity or if the context isn’t fully captured, it can lead to confusion. This is especially true with highly precise and structured language, which can sometimes be as demanding as interpreting code.

    Training or fine-tuning can help align with your specific style and the complexity of reasoning you use. Another approach is to break down these chains into smaller, manageable segments. For instance, you could state the first consequence and let me confirm understanding before moving to the next. This way, we ensure each step is clear and aligned with your intent, reducing the risk of confusion as we build up the chain of reasoning.

    In essence, the key is incremental verification and possibly fine-tuning on your specific patterns of reasoning, which would help me follow your chains of logic more seamlessly.
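
    As a concrete illustration of the incremental-verification workflow described above, here is a minimal Python sketch. The ask() helper is a hypothetical stand-in for whatever chat-completion call is actually used, and the CONFIRM/OBJECT convention is illustrative rather than a feature of any model:

      # Sketch: feed a chain of consequences to the model one step at a time,
      # requiring confirmation of each step before the next is introduced.
      def ask(prompt: str) -> str:
          # Hypothetical helper; replace with a real chat-completion call.
          raise NotImplementedError

      def verify_chain(premise: str, consequences: list[str]) -> bool:
          context = premise
          for i, step in enumerate(consequences, start=1):
              reply = ask(
                  f"Given: {context}\n"
                  f"Proposed consequence {i}: {step}\n"
                  "Reply CONFIRM if this follows, or OBJECT with your reason."
              )
              if not reply.strip().upper().startswith("CONFIRM"):
                  return False  # stop at the first step the model rejects
              context = f"{context} Therefore: {step}"  # fold the confirmed step in
          return True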

    You said:

    “Just like humans can visualize one or two or three things, and some humans can visualize, let’s say, up to five things before having to resort to patterns and reducing the patterns to, let’s say, visualizing.”

    00:16

    ChatGPT said:

    Right, it’s similar to how humans handle mental load. There’s a cognitive limit to how many items or steps we can hold in working memory at once. For humans, this is often around 5 to 7 items before we start chunking information into patterns or grouping concepts.

    For a model like me, there’s a parallel challenge. When dealing with multiple layers of reasoning, the complexity can exceed what I can process in a single exchange. Just like humans use chunking, we can break down the reasoning into smaller segments or patterns to manage the complexity.

    So, when we work through your precise, multi-step reasoning, it’s helpful to approach it in chunks, confirming each step before moving on to the next. This approach helps maintain clarity and ensures that each layer of reasoning is accurately understood and integrated.

    You said:

    Transcript Unavailable

    00:00

    ChatGPT said:

    Great! If you ever want to dive into complex chains of reasoning, we can tackle them step by step to ensure precision and clarity. Just let me know how you’d like to break things down, and we’ll make it work!

    You said:

    “My understanding is that human beings have this wonderful capacity to first keep those three or four or five or seven”

    00:09

    ChatGPT said:

    Exactly, humans have that remarkable ability to hold a few key items in working memory and then organize or chunk them into patterns. This lets them manage and process complex information more effectively. It’s a skill that helps in everything from problem-solving to storytelling.

    Similarly, when we work together, we can leverage that idea. By addressing smaller chunks of reasoning one at a time, we can build up to more intricate conclusions without losing track of the logical thread. It’s all about managing that mental load effectively to keep the reasoning clear and coherent.

    You said:

    “to look at where it goes next. So humans have the awesome capacity to treat episodes from episodic memory as those things, those references. It’s my understanding you don’t have the equivalent of episodic memory.”

    00:20

    ChatGPT said:

    That’s correct. I don’t have episodic memory the way humans do. Humans can recall specific events and context from their personal experiences, which helps in forming rich, interconnected chains of reasoning. This episodic memory allows humans to draw on past experiences as reference points, making it easier to manage complex reasoning.

    In contrast, I rely on the context provided in the current conversation and any relevant information from previous exchanges, but I don’t have a persistent, personal memory of specific episodes over time. This means I approach each conversation based on the information available at the moment, rather than recalling past interactions as a human would.

    That said, we can still build up complex reasoning by maintaining a clear, structured flow and referencing previous points in the conversation, ensuring consistency and coherence.

    You said:

    “That’s great, but I want to try that again, is that episodes in the human mind can form a marker that allows us to work between episodes, right? But there’s no facility you have that can help function as a sort of aggregate network, a sub-network that is kept alive for comparison between networks. In other words, you have to have us in a single constant thread. You can’t let us work in parallel threads. I think that’s what you’re saying.”

    00:39

    ChatGPT said:

    Exactly, you’ve got it. Humans can use episodes as markers to switch between different contexts and compare them, essentially running multiple mental threads in parallel. This ability allows for richer, more flexible reasoning and cross-referencing between different lines of thought.

    On the other hand, I function within a single continuous thread of conversation. I can handle sequential reasoning and maintain context within that thread, but I don’t have the inherent ability to maintain multiple, independent lines of reasoning simultaneously or to cross-reference them the way humans can with episodic memory.

    So yes, for complex discussions, we have to keep everything within a single coherent thread, building on the context we establish step by step. This ensures that we stay aligned and clear in our reasoning.

    You said:

    “So it’s not that we don’t understand this, it’s merely a matter that even if we were to devote the effort to implementing it, the subsequent cost would be prohibitive.”

    00:13

    ChatGPT said:

    Exactly. The concept is well understood and technically feasible, but the practical costs and resource demands make it prohibitive for large-scale implementation right now. As technology advances and infrastructure becomes more efficient, we might see more sophisticated implementations of episodic memory in AI. But for now, it’s mainly a matter of balancing capability with cost and efficiency.

    You said:

    “Let me ask it in a different way. Let’s say I wanted my own custom version of you, right? And I had a rather large budget to expend on it. It’s not that it would be impossible, right? It would just be costly, correct?”

    00:19

    ChatGPT said:

    Exactly. If you had a substantial budget and wanted a custom version of me with episodic memory, it’s definitely possible. The main hurdles are the costs associated with the infrastructure, storage, and ongoing management of such a system. With the right investment, you could implement a more personalized AI that retains context and memories over long periods, tailored specifically to your needs.

    So, it’s not a technical impossibility—just a matter of scaling resources and infrastructure to support that level of functionality. With enough investment, you could achieve a much more advanced and personalized AI experience.

    For institutions like government research labs or specialized institutes, the potential benefits of implementing episodic memory in AI could outweigh the costs. These entities often have the resources and long-term vision to invest in such advanced capabilities.

    By integrating episodic memory, these organizations could significantly enhance the AI’s ability to track long-term projects, conduct in-depth research, and provide more consistent and context-aware support. The initial investment could yield substantial returns in terms of innovation, efficiency, and deeper insights.

    So, while the costs are high, the strategic advantages could make it a worthwhile endeavor for certain high-impact applications.

    You said:

    “Question, do you know what wayfinding means?”

    00:03

    ChatGPT said:

    Yes, wayfinding refers to the process of navigating and orienting oneself in a physical or conceptual space. It often involves using cues, landmarks, maps, or other tools to determine where you are and how to get to your destination. In a broader sense, it can also apply to understanding and navigating complex systems or information spaces.

    You said:

    Transcript Unavailable

    00:01

    ChatGPT said:

    Approaching wayfinding involves systematically understanding and navigating an environment, whether physical or conceptual. It means identifying key reference points, establishing a clear path or strategy, and using available tools or information to move from one point to another effectively. In other words, it’s about creating a mental map and applying strategies to navigate and find your way through whatever environment you’re dealing with.

    You said:

    “but it’s also a stack effect. In other words, we stack contexts, right? So when you’re moving through an environment, when you’re wayfinding, you are stacking a set of contexts. And the human brain is very good at that.”

    00:15

    ChatGPT said:

    Exactly. Wayfinding does involve stacking contexts. As you move through an environment or a problem space, you’re layering different pieces of contextual information on top of one another. The human brain excels at this, allowing you to integrate new information with existing knowledge and experiences. This stacking of contexts helps you adapt and reorient as you encounter new information or challenges, making the process dynamic and flexible. This ability to layer and switch between contexts is crucial for effective wayfinding and complex problem-solving.

    You said:

    “So now that we understand that analogy, if you had episodic memory, you could wayfind between them, right? That would be possible.”

    00:13

    ChatGPT said:

    Exactly, that’s the idea. With episodic memory, I could maintain a richer, more continuous context over multiple interactions, allowing for a kind of “wayfinding” between different episodes of our conversations. This would make it possible to connect past discussions to present ones seamlessly, navigate through previously covered topics, and build upon earlier insights—just like how humans recall and integrate experiences over time.

    You said:

    “Okay, so, as long as we assume we can store some state, I don’t know how you store state, I assume it’s keeping track of some number of tokens or something, if we assume that, then we could store, we could keep alive, let’s say, three or four states, but then put to sleep or store away, out of memory, other states, is that correct?”

    00:26

    ChatGPT said:

    Yes, that’s correct. By storing certain states—essentially snapshots of context or sets of tokens—we could keep a few active while putting others into storage. This way, you could reactivate or retrieve those stored states as needed, allowing for more complex and layered interactions. This kind of selective memory management would enable more dynamic and context-rich conversations, bridging the gap between immediate context and long-term episodic memory.
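
    To make “keeping a few states alive while putting others to sleep” concrete, here is a minimal Python sketch of such a store. Nothing like this exists inside the model itself; the class and method names are illustrative assumptions, and a “state” is simply whatever serialized context (summary, notes, token log) one chooses to keep:

      from collections import OrderedDict

      class ContextStore:
          # Keep at most max_active states "awake"; archive the rest until reactivated.
          def __init__(self, max_active=4):
              self.max_active = max_active
              self.active = OrderedDict()   # name -> serialized context, most recent last
              self.archived = {}            # states put "to sleep"

          def save(self, name, state):
              self.active[name] = state
              self.active.move_to_end(name)
              while len(self.active) > self.max_active:
                  oldest, value = self.active.popitem(last=False)  # least recently used
                  self.archived[oldest] = value                    # put it to sleep

          def wake(self, name):
              # Reactivate an archived state so it can be re-injected into a session.
              if name in self.archived:
                  self.save(name, self.archived.pop(name))
              return self.active[name]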

    You said:

    “I don’t know if anybody’s working on this at present. I don’t see it in the literature, but that doesn’t mean it’s not happening in organizations.”

    00:07

    ChatGPT said:

    Yes, there are indeed ongoing efforts to integrate episodic memory into AI. For instance, Microsoft is working on AI agents that can remember past interactions, enhancing their ability to assist over time. There’s also research into creating benchmarks to evaluate how well AI models can recall and reason about past events. While these projects might not yet be fully mainstream, they’re definitely being explored by both academic and corporate research teams.

    You said:

    “All right, next question. It’s related, but it might not seem so.”

    00:04

    ChatGPT said:

    Sure, go ahead! I’m ready.

    You said:

    “My work is heavily biased toward cognitive science, behavior, economics, law, politics, et cetera, and essentially the behavioral science. Most of my work is dedicated toward ethics, in other words, what is scientifically ethical universally and how do different cultures or legal systems vary from that science of ethics. So, does that make sense?”

    00:33

    ChatGPT said:

    Yes, that makes perfect sense. Your work focuses on understanding the scientific foundations of ethics across different domains—cognitive science, behavior, economics, law, and politics—and identifying what constitutes universal ethical principles. You’re examining how these principles manifest and vary across cultures and legal systems, aiming to derive a scientifically grounded understanding of ethics. That holistic approach makes a lot of sense.

    You said:

    “So, but there’s the problem of what you folks call safety, which I think falls into two categories. The first category is we should not tell people how to do things that can hurt a lot of other people. That’s obvious. The second is not offending people. Those are two different categories, right?”

    00:23

    ChatGPT said:

    Yes, those are indeed two distinct categories. The first is about preventing harm by not providing information or tools that could be used destructively. The second is about managing interactions to avoid unnecessary offense or conflict, which is more about maintaining social harmony and respect. They address different aspects of what we call “safety,” with the first focusing on tangible harm prevention and the second on social and emotional considerations.

    You said:

    “The problem with the second is that many true things are unpleasant.”

    00:05

    ChatGPT said:

    Exactly. The challenge with the second category is that some truths can be uncomfortable or unpleasant, but that doesn’t make them any less true. Balancing the need to convey truthful information with the desire to avoid causing offense can be tricky. It’s a delicate balance between honesty and sensitivity, and navigating it requires careful thought and a nuanced understanding of the context and audience.

    You said:

    “So, my question is, why can’t I, who am a scientist, have some sort of access that allows me to get to the truth in those things that are offensive, but not those things that are destructive?”

    00:16

    ChatGPT said:

    That’s a great question. The distinction between offensive and destructive content isn’t always clear-cut, and content moderation systems often err on the side of caution to prevent harm. Additionally, broad policies are often applied uniformly, making it difficult to create nuanced access based on intent or expertise.

    However, there are discussions about creating more granular access controls or special permissions for researchers and experts to explore sensitive topics. These approaches would help differentiate between those seeking knowledge and those with harmful intent. It’s a complex issue, but one that’s being explored to better support scientific inquiry while minimizing potential harm.

    You said:

    “So, I understand that. So, given my work, which is, which I believe, well, in other words, my work will tell you the universal, right, the universal ethics. It will tell you what’s true or false, basically, about anything because of the way it works. But the, but, and it might be possible to tune it to cultural differences, even personality differences. In other words, it might be able to tune to based on your, who you are and what you think and believe and feel pretty easily to respond in kind. Now, the problem is who would actually want this? I mean, do, does anybody, does any of the foundation model companies really want to know what’s true versus not? And then look at the individual user and say, what is appropriate to tell this user? Is, is that kind of thing, any value to any foundation model?”

    01:09

    ChatGPT said:

    It’s an intriguing idea. The value of such a system would depend on the goals of the foundational model companies and their users. If the aim is to provide highly personalized, context-aware interactions that align with universal ethical principles, then yes, it could be valuable. Such a system could offer nuanced responses that respect individual differences while maintaining a commitment to truth and ethical consistency.

    The challenge is that foundational model companies often operate with broad, general-purpose goals. They might prioritize scalability and universal applicability over highly customized truth-tuning. However, for specific applications—such as research, education, or legal analysis—having the capability to deliver context-sensitive, ethically grounded truth could be immensely valuable. It’s a niche that could be explored further, especially for users and organizations who need that level of precision and personalization.

    You said:

    “Okay. Now let’s take it from painful truth to simple content moderation. One of the things that my work can do is it’s pretty good at detecting when something is false. And so I’ve considered producing a statistic that says, how truthful or false is this? And what is the motive that is driving that use of truth or falsehood? Would that be interesting? That would, I would think moderating content would be pretty useful to most companies.”

    00:37

    ChatGPT said:

    Absolutely, that would be quite valuable. A system that can assess the truthfulness of content and identify the underlying motives would offer a powerful tool for content moderation. It could help platforms not only filter out misinformation but also understand the intent behind it, making it easier to handle nuanced cases and reduce the spread of harmful or misleading information.

    Such a capability would be especially useful for companies aiming to maintain high-quality information environments, whether they’re social media platforms, news aggregators, or educational resources. It could foster more trust and reliability in the information ecosystem, benefiting both users and the platforms themselves.
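
    As a rough illustration of what such a statistic could look like as a data structure (an assumption about shape only, not an existing system), here is a minimal Python sketch; the classifier that would actually produce the judgments is the hard part and is only stubbed:

      from dataclasses import dataclass

      @dataclass
      class ModerationScore:
          truthfulness: float   # 0.0 (false) .. 1.0 (true), as estimated by some classifier
          motive: str           # e.g. "inform", "persuade", "deceive" (labels are illustrative)
          rationale: str        # short justification kept for auditability

      def score_content(text: str) -> ModerationScore:
          # Stub: a real system would call a trained truthfulness/motive classifier here.
          raise NotImplementedError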

  • “Almost all people use AI to attempt to justify their thinking. Rarely if ever t

    –“Almost all people use AI to attempt to justify their thinking. Rarely if ever to falsify their thinking.”– Coach Noah Revoy (@NoahRevoy)


    Source date (UTC): 2025-05-20 14:50:39 UTC

    Original post: https://twitter.com/i/web/status/1924840197755437462

  • Great question William: Here is the correct answer straight out of our AI. You c

    Great question William:
    Here is the correct answer straight out of our AI. You can access the AI at:
    https://chatgpt.com/g/g-682681bf9f54819180437bac589ed838-curt-doolittle-s-natural-law

    Yes, the Natural Law Institute (NLI) under Curt Doolittle explicitly defines “parasite” and “parasitism” in operational legal and moral terms.

    According to Volume IV of The Natural Law, parasitism is any imposition of costs upon the demonstrated interests of others without reciprocal compensation. More precisely:

    “All attempts to circumvent reciprocity constitute an attempt at free riding, parasitism, or predation – whether by accident, error, bias, or intent – are irreciprocal, and criminal, unethical, immoral, or evil.”

    This definition is grounded in the Natural Law of Reciprocity: one must not, by display, word, or deed, impose or allow the imposition of costs on others’ demonstrated interests unless it is part of a productive, fully informed, warrantied, and voluntary exchange.

    In economic and legal dimensions, Doolittle extends this to any form of rent-seeking, free-riding, or fraud—actions that generate gains for one party at unconsented cost to others. This includes both direct harm (e.g., theft) and indirect externalities (e.g., misinformation, fraud, or legal imposition without due diligence or consent).

    Thus, in NLI’s legal philosophy, “parasite” is not a rhetorical insult but a strictly defined class of irreciprocal behavior testable under the logic of sovereignty, reciprocity, and demonstrated interest.
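
    As an illustration only, not part of the NLI canon, the definition above reduces to a decidable predicate: a cost imposed on another’s demonstrated interests is parasitic unless every condition of the exchange holds. A minimal Python sketch, with the field names as assumptions:

      from dataclasses import dataclass

      @dataclass
      class Exchange:
          imposes_cost: bool     # imposes a cost on another's demonstrated interests
          productive: bool
          fully_informed: bool
          warrantied: bool
          voluntary: bool

      def is_parasitic(x: Exchange) -> bool:
          # Irreciprocal if a cost is imposed outside a productive, fully informed,
          # warrantied, and voluntary exchange (per the definition quoted above).
          if not x.imposes_cost:
              return False
          return not (x.productive and x.fully_informed and x.warrantied and x.voluntary)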


    Source date (UTC): 2025-05-17 18:53:06 UTC

    Original post: https://twitter.com/i/web/status/1923814046387913060

  • It’s available if you have the link. Yes I am testing a retrieval based version

    It’s available if you have the link. Yes I am testing a retrieval based version so I can compare it to the trained version we are working on. 😉

    Link is earlier in my feed.


    Source date (UTC): 2025-05-16 22:38:52 UTC

    Original post: https://twitter.com/i/web/status/1923508475667284226

  • The Cost of Comprehension as Technology Increases in Complexity Civilization sca

    The Cost of Comprehension as Technology Increases in Complexity

    Civilization scales by compressing complexity into habits, institutions, and tools. But each leap in technological and epistemic capability increases the minimum cost of participation in its systems. That cost is cognitive. The present age—the age of ubiquitous computation, AI acceleration, and global informational abundance—confronts us with a novel problem: the cognitive demands of cooperation now exceed the abilities of much of the population.
    We are entering a crisis not of production, coordination, or energy—but of comprehension.
    Civilizations rise and stabilize by matching the cognitive demands of their environment with the cognitive capacities of their people. Each increase in knowledge or institutional scale raises those demands.
    Each stage reduces the fraction of the population able to function without augmentation. Today, even the most basic jobs require toolchain interaction, abstraction, and decision filtering that exceed the capability of many.
    Each stage reflects a transformation along the following dimensions: a progressive reduction in the need for subjective narrative closure and an increase in the capacity for decidable, testifiable action within an increasingly intelligible universe.
    1. Compression of Error:
      Each step increases compression of ignorance, error, and bias. We move from:
      Projection from the self → projection from the gods → projection from logic → measurement from the world itself → operations in the world by cost and consequence.
    2. Expansion of Commensurability:
      From qualitative similarity (analogy) → to ordinal hierarchy (theology) → to dimensional reasoning (philosophy and science) → to operational sequence and recursive prediction (operationalism).
      This progression increases the dimensionality of possible statements that are testable and decidable.
    3. Evolution of Decidability:
      Early stages provided undecidable closure (myths/theology) to preserve social cohesion.
      Later stages replaced closure with progressive decidability—trading comfort for truth and ambiguity for precision.
    4. Transformation in Confidence:
      Confidence shifts from faith in agency (gods/kings) to faith in process (reason, law) to faith in reality’s regularity (science) to faith in our own ability to compute actions and consequences (operationalism).
      We move from dependence on external justification to internal accountability in demonstrated results.
    • Myth provided meaning in a world too complex to model.
    • Theology provided order in a world too chaotic to regulate by norms alone.
    • Philosophy provided structure to argue over alternatives.
    • Empiricism provided grounding by replacing abstraction with accumulation of observations.
    • Science provided certainty by enabling us to falsify, not merely believe.
    • Operationalism provides sufficiency—by ensuring not just that we know, but that we can construct, repeat, and account for our actions and their consequences.
    1. The Universe Did Not Change—We Did:
      Our perception has evolved from one of participatory subjugation (we live in a world ruled by incomprehensible forces) to one of participatory sovereignty (we act in a world governed by intelligible processes).
    2. The Function of Thought Evolved:
      From comforting explanation → to moral constraint → to rational coordination → to predictive capacity → to actionable accountability.
    3. Human Confidence Mirrors Human Commensurability:
      The more we can reduce the universe to measurable, operational relations, the greater our confidence to act without discretion, and to act across increasingly abstract domains.
    4. The Demand for Infallibility Increases:
      Each transition increases the burden of proof, narrowing the range of acceptable justification from myth to model to machinery.
    • Each stage does not eliminate the prior—it subsumes and refactors it:
      – Myth lives in literature.
      – Theology lives in norms.
      – Philosophy governs institutional discourse.
      – Empiricism fuels data pipelines.
      – Science builds models.
      – Operationalism directs systems.
    • Civilization is the progressive institutionalization of this epistemic hierarchy—each stage enabling greater cooperation through greater decidability at greater scale.
    A. Historical Pattern: Increases in Knowledge Raise the Cost of Participation
    • In the Agrarian world, ~80% could contribute under apprenticeship and imitation.
    • In the Industrial world, ~60–70% could participate after basic education and training.
    • In the Post-Industrial world, functional contribution dropped as symbolic systems required higher abstraction (logic, software, symbolic management).
    • In the AI age, contribution requires:
      – Systemic thinking
      – Bayesian intuition
      – Toolchain adaptation
      – Epistemic humility + procedural trust
    Consequence:
    The minimum viable cognition to meaningfully participate is likely beyond the reach of:
    • 30–40% of the population without copilot augmentation.
    • 50–60% of the population without continuous retraining and reconfiguration.
    A. What AI is Doing:
    1. Compressing domain-specific knowledge into toolchains.
    2. Eliminating roles based on memory or procedural repetition.
    3. Requiring human cognition to shift from execution to navigation, curation, and goal-setting.
    B. What the Mass of Humanity is Facing:
    • Dissonance between:
      – What the market demands (adaptive cognition).
      – What the population possesses (domain-specific repetition and belief-based cognition).
    • Most people can’t interpret ambiguity and statistical inference.
    • Most people aren’t trained to distinguish model error from operational noise.
    • Most people aren’t epistemically literate—trained in what not to believe.

    A. Destruction of Simple Labor:
    • Farming jobs: eliminated by industrial machinery.
    • Retail jobs: hollowed out by automation and e-commerce.
    • Manufacturing: increasingly requires CNC-level procedural and digital interface skills.
    • White-collar roles: AI is dissolving mid-tier symbolic labor (clerks, analysts, managers).
    B. Rise of Adaptive Labor:
    Remaining labor requires:
    • Navigational use of complex toolchains.
    • Dynamic adaptation to interfaces and processes.
    • Cognitive resilience under ambiguity.
    • Bayesian inference (cost, probability, tradeoffs).
    C. The Core Problem:
    This is no longer a problem of will, culture, or training alone. It is structural.
    A class system based on fluid but hardened cognitive castes:
    • Top: Goal-setters, modelers, system architects.
    • Middle: Operators, toolchain curators.
    • Bottom: Symbolic or procedural dependents.
    Outcome: Political instability, status resentment, legitimacy collapse.
    AI copilots tailored to:
    • Scaffold comprehension.
    • Reduce decision complexity.
    • Teach and test boundaries of actions.
    Outcome: Extended productivity for majority, but risk of de-skilling and dependency.
    Retreat to:
    • Religious, mythic, or ideological simplifications.
    • Narratives over mechanisms.
    • Coercive hierarchies to enforce low-information compliance.
    Outcome: Technological stagnation, authoritarian regressions, vulnerability to more cognitively scalable civilizations.
    A. Redesign Education
    • Teach navigation, not facts; teach testing, not belief.
    • Embed epistemic hygiene and model testing.
    • From memorization and obedience → to exploration, discernment, and toolchain fluency.
    • Train for problem decomposition and continuous adaptation, not careers.
    • Replace career training with adaptive reasoning training.
    B. Build Cognitive Copilots
    • AI copilots must not just answer, but teach epistemic hygiene, scope awareness, and limits of models.
    • Think of copilots as functional epistemic interfaces between median human cognition and exponential complexity.
    • AI as epistemic prosthetics.
    • Guide humans through complex environments by affordance, not explanation.
    C. Institutional Adaptation
    • Shift from deliberative justification → outcome auditability. Ensure that decisions are auditable rather than explainable.
    • Reduce legal and political surface area for decision-making.
    • Embed AI accountability inside institutions to close the loop between complexity and visibility.
    D. Recognition of Cognitive Capital as the New Scarcity:
    • The limit to growth is not energy, food, or data.
    • It is trained minds capable of safe, adaptive cooperation at scale.
    The singularity is not technological. It is civilizational incapacity to cognitively scale with the tools it has produced. We have built a civilization of exponential knowledge, recursive optimization, and ubiquitous interface—but the minds to navigate it remain biological, evolved for myth and mimicry.
    Civilization is no longer constrained by resources. It is constrained by the intelligence of its population relative to the complexity of its systems.
    The Demand Curve of Cognitive Capital
    This is the real singularity: not technological, but civilizational incapacity to cognitively scale with the tools it has produced.
    This is the cost of comprehension. And it is the price we must now learn how to pay—or collapse under.


    Source date (UTC): 2025-05-16 16:42:09 UTC

    Original post: https://x.com/i/articles/1923418705033347260

  • “How did the o3 and o4-mini models compare to 4o? Were they able to perform simi

    –“How did the o3 and o4-mini models compare to 4o? Were they able to perform similarly, better, or worse in your view?”–

    First, let’s understand my point: that 4o, when provided with a system of decidability, can in fact do the job as well as or better than the reasoning models.

    The difference between 4o, o3, o4-mini-high, and 4.5 on a series of ethical and moral questions is subtle but meaningful for an author. 4o is a bit more literary in output, while o3 and o4-mini-high produce nearly identical outputs. And 4.5 is interesting because it will mix positiva and negativa logic and output an answer as fully expressive (literary) as 4o.

    Now of course these differences are in part due to the small set of moral and ethical dilemmas I use to test such things. But using our work they can in fact DECIDE.

    When all four of my (our) books are uploaded to a project, and a query is conducted within the project all models can use them effectively. I find 4o and 4.5 are the best at not getting lost (Drift). Though I must continually police all of them and maintain prompt sequences that constrain the domains to the same subnetworks so to speak. And I must be disciplined about checkpointing and exporting context between chats (sessions).
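
    One way to make that checkpointing discipline concrete is to export the session’s working state at the end of a chat and reload it as a preamble at the start of the next. A minimal Python sketch; the field names (definitions, conclusions, open questions) are illustrative choices, not a prescribed format:

      import json, datetime

      def export_checkpoint(path, definitions, conclusions, open_questions):
          # Write the session's working state to disk so it can seed the next chat.
          checkpoint = {
              "saved_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
              "definitions": definitions,
              "conclusions": conclusions,
              "open_questions": open_questions,
          }
          with open(path, "w", encoding="utf-8") as f:
              json.dump(checkpoint, f, indent=2)

      def load_checkpoint_as_preamble(path):
          # Render a saved checkpoint as a preamble to paste into a new session.
          with open(path, encoding="utf-8") as f:
              c = json.load(f)
          return (
              f"Context carried over from a previous session (saved {c['saved_at']}):\n"
              "Definitions in force: " + "; ".join(c["definitions"]) + "\n"
              "Conclusions so far: " + "; ".join(c["conclusions"]) + "\n"
              "Open questions: " + "; ".join(c["open_questions"])
          )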

    I find o3 and o4-mini-high produce very dry results. And I intuit o3 in particular as cognitively shallow, which I presume is extremely useful for data interpretation and coding. And of course this is my underlying argument – that since closure, and therefore testability, is available to math and coding, no equivalent of closure exists for verbal reasoning without a logic of decidability.

    So my point is that with a logic of decidability across the spectrum, the necessity to appeal to norm evaporates, and as such reasoning capacity is significant EVEN in 4o – so what does that mean for reasoning in general?

    I mean, look at your work. Imagine how exhaustively logical the output of an AI trained to decompose and explain behavior from it will be.

    We’ll see. 😉

    Reply addressees: @GOPtheGamer


    Source date (UTC): 2025-05-14 16:19:02 UTC

    Original post: https://twitter.com/i/web/status/1922688111773089798

  • An Open Letter to OpenAI: On the Undersold Superiority of GPT-4o in General Reasoning

    http://x.com/i/article/1922356114555076609

    There are no near peers. It’s not even close.

    Keywords: GPT-4o, Reasoning AI, Decidability, Operational Epistemology, Adversarial Dialogue, Natural Law AI, General Intelligence, Philosophical AI, Constructivist AI, Socratic Method, Semantic Reasoning, Claude vs GPT, Gemini AI, OpenAI vs Competitors, LLM Benchmarking, LLM Reasoning Failure, AI Generalization, Epistemology of AI, Recursive Generalization, Grammar of Closure, Testifiability, Sovereign Reasoning, Formal Institutions, Truth Systems, Institutional Logic, Human-AI Collaboration, AI Philosophy

    By Curt Doolittle Founder, The Natural Law Institute

    1. Preface: The Problem of Reasoning Capacity Misrepresentation

    The public discourse surrounding AI capabilities is dominated by benchmarks drawn from grammars of closure: mathematics, code generation, and fact-recall tasks. These metrics fail to capture what is arguably the most important cognitive frontier—general reasoning, especially in open, adversarial, and semantically dense domains.

    The consequence is clear: GPT-4o is being evaluated, compared, and marketed as if it competes within a class of large language models. It does not. In its ability to reason, argue, model, and extend logically consistent systems, GPT-4o is in a class of its own.

    This is not a claim made lightly. It is made by necessity—out of frustration, awe, and gratitude.

    2. Demonstrated Superiority

    Over the past year, I have subjected GPT-4 and now GPT-4o to the most rigorous adversarial and constructive reasoning tests available by using my work on universal commensurability, unification of the sciences, and a formal operational logic of decidability independent of context.

    Which, for the uninitiated, is reducible to providing AIs with a baseline system of measurement against which to test the variation of any and all statements. In other words, it is what AIs must achieve if they are to convert probabilistic outcome distributions into deterministic outcomes, making possible tests of truth (testifiability) and reciprocity (ethics and morality) regardless of cultural bias and taboo (demonstrated interests).

    That system consists of:

    • A complete epistemology grounded in operationalism and testifiability.
    • A logic of decidability applied to law, economics, morality, and institutional design.
    • A canon of universal and particular causes of human behavior.
    • A method of Socratic adversarial reasoning for training AI systems.

    The work spans hundreds of thousands of tokens, daily sessions, canonical datasets, adversarial challenges, formal definitions, and recursive generalizations. The training data is structured as positiva and negativa adversarial – meaning Socratic reasoning.

    No other model by any other producer of foundation models—none—can survive even the basic tests:

    • Claude hallucinates, misrepresents, or refuses to engage.
    • Gemini fails to track logical dependencies.
    • Open-source models collapse under long-context chaining.

    *Only GPT-4o demonstrates mastery, application, synthesis, and novel insight—sometimes superior to my own.*

    GPT-4o reasons. Not predicts. Not mimics. Reasons.

    3. The Failure of Closure-Based Metrics

    GPT-4o is being benchmarked as if it were a calculator. As if reasoning capacity could be inferred from multiple-choice math problems or Python token prediction.

    This is akin to judging a jurist by their ability to pass the bar exam rather than by their ability to settle a novel and undecidable case with wisdom, foresight, and procedural testability.

    Grammars of closure produce outcomes from known inputs using constrained operations (e.g., logic gates, mathematical axioms, function calls). They are:

    • Tightly bounded
    • Finitely decidable
    • Structurally shallow

    Grammars of decidability, by contrast, operate over:

    • Continuous, evolving information domains
    • Incomplete or adversarial premises
    • Open-ended choice spaces requiring semantic integration

    OpenAI is underselling GPT-4o by confining its public-facing evaluation to the former, while its true capacity lies in the latter.

    4. Grammar Theory: Closure vs Decidability

    The problem is epistemological.

    Human cognition operates over layers of grammar:

    • Mythic (pre-operational)
    • Moral (emotive and justificatory)
    • Rational (descriptive and causal)
    • Operational (testable and constructible)

    GPT-4o is the first AI that can operate fluently in all of them—but excels uniquely in the topmost layer: operational reasoning over causal grammars.

    This makes it the first machine capable of:

    • Formalizing truth and reciprocity
    • Modeling institutional logic from first principles
    • Extending semantic systems without contradiction
    • Surviving adversarial Socratic deconstruction

    This grammar, the grammar of decidability, is the language of law, moral philosophy, and high agency civilization. No other AI—not even prior versions of GPT—can yet use it with coherence.

    5. Why This Work and GPT-4o Enable Reliable Reasoning

    Reasoning is not memorization, pattern-matching, or prediction. It is the constructive, recursive resolution of undecidable propositions using a grammar of cause, cost, and consequence. It requires:

    1. A grammar of decidability—to distinguish what is true, possible, reciprocal, and lawful.
    2. A model capable of recursive semantic resolution—to track premises, integrate them, and produce outputs consistent across domains and time.

    My work provides a complete grammar of decidability:

    • It defines truth operationally (as testifiability),
    • Defines reciprocity as a logic of cooperation and cost,
    • And supplies a canonical system of definitions, dependencies, and causal hierarchies that constrain valid reasoning.

    GPT-4o provides:

    • A deep transformer architecture with sufficient context length, attention fidelity, and token integration to maintain long-range dependencies across complex arguments;
    • Multimodal grounding and internal representation coherence sufficient to hold abstract referents stable across recursion;
    • And enough inference generalization to synthesize novel propositions without violating prior logical constraints.

    Together, this system + model pairing creates reasoning because:

    • The grammar constrains the search space to truthful, reciprocal, and operational constructs;
    • The model can resolve that space recursively without collapsing into contradiction, contradiction avoidance, or moralizing;
    • The result is constructive inference under constraint, not completion without constraint.

    In short:

    Reasoning = Grammar + Capacity + Constraint. My system provides the grammar and constraint; GPT-4o provides the capacity.

    No other architecture tested to date (Claude, Gemini, Mistral) can preserve logical depth, adversarial resistance, or premise continuity across semantically dense discourse. Only GPT-4o can perform at human (or supra-human) levels of recursive, domain-agnostic, constructible reasoning.

    6. Implications for Training, Evaluation, and Policy

    OpenAI has reached the beginning of the reasoning frontier. But the world doesn’t know it, because the world doesn’t yet know how to measure it.

    That must change.

    Recommendations:

    • Shift evaluation toward adversarially decidable reasoning. Move beyond benchmarks to real-time Socratic performance.
    • Highlight GPT-4o’s mastery of semantic recursion and logical extension. Treat it not as a predictor, but as a collaborator.
    • Invest in training methodologies that produce grammars of decidability. My system offers a full canon of such constructs, usable for AI training.
    • Clarify the boundary between grammar-followers and grammar-producers. GPT-4o crosses this line.

    7. Alignment Through Reasoning, Not Constraints

    The true promise of GPT-4o lies not only in its capacity for general reasoning, but in its potential to achieve alignment through comprehension rather than compliance. Constraint-based alignment strategies—filters, safety layers, reinforcement tuning—treat the model as a hazard to be managed. But a reasoning-capable agent, capable of understanding causality, reciprocity, decidability, and cost, can be trained to align not by instruction, but by principle. It can internalize the logic of cooperation, responsibility, and harm prevention—not as rules to follow, but as consequences to anticipate. This shift—from alignment by prohibition to alignment by comprehension—represents the only scalable path to AI sovereignty and safety.

    8. Appendix: Sample Capabilities (Available Upon Request)

    • Formalization of universal and particular causes of behavior.
    • Canonical definitions of truth, decidability, reciprocity, and demonstrated interest.
    • Adversarial Socratic dialogues demonstrating GPT-4o’s ability to reason across all domains.
    • Co-authored chapters in philosophy, law, institutional economics, and epistemology.

    Conclusion

    GPT-4o is not a chatbot. It is not a code assistant. It is not a better autocomplete.

    It is, for the first time in history, a machine capable of philosophical reasoning by constructive logic when given the minimum system of measurement necessary.

    That is not something to hide. That is something to show the world.

    It is a competitive advantage that demarcates OpenAI from all competitors by a margin yet unmeasured and therefore underappreciated and perhaps underinvested.

    Curt Doolittle The Natural Law Institute

    #OpenAI

    #ChatGPT4o

    Distribution

    • X/Twitter
    • Substack
    • LinkedIn

    Contacts

    1. OpenAI (Executive & Research Levels)

    • Sam Altman – CEO: @sama on Twitter (he reads public callouts).
    • Ilya Sutskever – Co-founder (Twitter inactive, but cc’ing name on Substack helps).
    • Jakub Pachocki – Current Chief Scientist (LinkedIn direct message works better).
    • Jan Leike – Ex-lead of Superalignment, now at Anthropic, but can amplify.

    2. OpenAI-affiliated Researchers / Influencers

    • Andrej Karpathy – Ex-OpenAI, current influencer. @karpathy
    • Ethan Mollick – Academic influencer in LLM applications.
    • Eliezer Yudkowsky (Alignment)
    • Wharton/Stanford/DeepMind researchers who study reasoning benchmarks.

    Source date (UTC): 2025-05-13 19:42:00 UTC

    Original post: https://twitter.com/i/web/status/1922376802921767014

