“How did the o3 and o4-mini models compare to 4o? Were they able to perform simi

–“How did the o3 and o4-mini models compare to 4o? Were they able to perform similarly, better, or worse in your view?”–

First, let’s understand my point: that 4o, when provided with a system of decidability, can in fact do the job as well as or better than the reasoning models.

The difference between 4o, o3, o4-mini-high, and 4.5 on a series of ethical and moral questions is subtle but meaningful for an author. 4o is a bit more literary in output, while o3 and o4-mini-high produce nearly identical outputs. And 4.5 is interesting because it will mix positiva and negativa logic and output an answer as fully expressive (literary) as 4o.

Now, of course, these differences are in part due to the small set of moral and ethical dilemmas I use to test such things. But using our work, they can in fact DECIDE.

When all four of my (our) books are uploaded to a project and a query is conducted within the project, all models can use them effectively. I find 4o and 4.5 are the best at not getting lost (drift), though I must continually police all of them and maintain prompt sequences that constrain the domains to the same subnetworks, so to speak. And I must be disciplined about checkpointing and exporting context between chats (sessions).

I find o3 and o4-mini-high produce very dry results. And I intuit o3 in particular as cognitively shallow, which I presume is extremely useful for data interpretation and coding. And of course this is my underlying argument: closure, and therefore testability, is available to math and coding, but without decidability no equivalent of closure exists for verbal reasoning.

So my point is that with a logic of decidability across the spectrum, the necessity of appealing to norms evaporates, and as such reasoning capacity is significant EVEN in 4o. So what does that mean for reasoning in general?

I mean, look at your work. How exhaustively logical will be the output of an AI trained to decompose and explain behavior from it?

We’ll see. 😉

Reply addressees: @GOPtheGamer

Source date (UTC): 2025-05-14 16:19:02 UTC

Original post: https://twitter.com/i/web/status/1922688111773089798
