Form: Reply

RT @curtdoolittle: @BehizyTweets @TomReevesMBA @CommunityNotes
The Community note is mistaken as it argues against a straw man. The author…

Source date (UTC): 2025-05-14 16:38:09 UTC

Original post: https://twitter.com/i/web/status/1922692924820672564

May 14, 2025
Yes. Thanks for this link. 😉

Source date (UTC): 2025-05-14 16:19:33 UTC

Original post: https://twitter.com/i/web/status/1922688243859919246

Reply addressees: @Belvederi

Replying to: https://twitter.com/i/web/status/1922687280503755241

May 14, 2025
–“How did the o3 and o4-mini models compare to 4o? Were they able to perform similarly, better, or worse in your view?”–

First, let’s understand my point: 4o, when provided with a system of decidability, can in fact do the job as well as or better than the reasoning models.

The difference between 4o, o3, o4-mini-high, and 4.5 on a series of ethical and moral questions is subtle but meaningful for an author. 4o is a bit more literary in output, while o3 and o4-mini-high produce nearly identical outputs. And 4.5 is interesting because it will mix positiva and negativa logic and output an answer as fully expressive (literary) as 4o.

Now, of course, these differences are in part due to the small set of moral and ethical dilemmas I use to test such things. But using our work, they can in fact DECIDE.

When all four of my (our) books are uploaded to a project and a query is conducted within that project, all models can use them effectively. I find 4o and 4.5 are the best at not getting lost (drift), though I must continually police all of them and maintain prompt sequences that constrain the domains to the same subnetworks, so to speak. And I must be disciplined about checkpointing and exporting context between chats (sessions).

I find o3 and o4-mini-high produce very dry results. And I intuit o3 in particular as cognitively shallow, which I presume is extremely useful for data interpretation and coding. And of course this is my underlying argument: closure, and therefore testability, is available to math and coding, but without decidability no equivalent of closure exists for verbal reasoning.

So my point is that with a logic of decidability across the spectrum, the necessity to appeal to norms evaporates, and as such reasoning capacity is significant EVEN in 4o – so what does that mean for reasoning in general?

I mean, look at your work. How exhaustively logical will the output be of an AI trained to decompose and explain behavior from it?

We’ll see. 😉

Reply addressees: @GOPtheGamer

Source date (UTC): 2025-05-14 16:19:02 UTC

Original post: https://twitter.com/i/web/status/1922688111773089798

May 14, 2025
😉

Source date (UTC): 2025-05-13 18:01:06 UTC

Original post: https://twitter.com/i/web/status/1922351412396093771

Reply addressees: @PlayerJuan11 @iAnonPatriot

Replying to: https://twitter.com/i/web/status/1922328996441669781

IN REPLY TO:

@PlayerJuan11

@curtdoolittle @iAnonPatriot Have a good week Curt!

Original post: https://twitter.com/i/web/status/1922328996441669781

May 13, 2025
RT @SydSteyerhart: The American Left has gone full, mask-off, white genocide. This is the most important political development of our lifet…

Source date (UTC): 2025-05-13 16:51:53 UTC

Original post: https://twitter.com/i/web/status/1922333992247775590

May 13, 2025
(Hugs brother) 😉

Source date (UTC): 2025-05-13 16:29:06 UTC

Original post: https://twitter.com/i/web/status/1922328259850584095

Reply addressees: @PlayerJuan11 @iAnonPatriot

Replying to: https://twitter.com/i/web/status/1922284771977126325

IN REPLY TO:

@PlayerJuan11

@curtdoolittle @iAnonPatriot God damn Curt, you’re a pleasant surprise to see in this reply section.

Original post: https://twitter.com/i/web/status/1922284771977126325

May 13, 2025
Big. In the late 80s I built a NN with phonemes as the equivalent of tokens. I’m happy and pleasantly surprised that Meta’s taken this route as it circumvents so many problems we’ve seen emerge. (I’m giddy really.) lol

Source date (UTC): 2025-05-13 16:11:15 UTC

Original post: https://twitter.com/i/web/status/1922323767214162430

Reply addressees: @rohanpaul_ai @AIatMeta

Replying to: https://twitter.com/i/web/status/1921976957991854432

IN REPLY TO:

@rohanpaul_ai

This LLM from Meta skips the tokenizer, reading text like computers do (bytes!) for better performance.

The training and inference code for BLT is released on GitHub.

@AIatMeta just released the model weights for their 8B-param Dynamic Byte Latent Transformer (BLT), an alternative to traditional tokenization.

BLT is a tokenizer-free LLM architecture that processes raw bytes directly.

→ Traditional LLMs rely on tokenization, a pre-processing step grouping bytes into a fixed vocabulary. This introduces issues like domain sensitivity, poor handling of noise, lack of orthographic knowledge, and multilingual inequity, besides being computationally suboptimal as compute is allocated uniformly per token, regardless of information density.

→ BLT tackles this by using a dynamic, learnable method to group bytes into patches based on context, typically using the entropy of the next-byte prediction. This allows allocating more compute where needed (high entropy bytes) and less where it’s not (predictable sequences like whitespace).

→ The architecture features three transformer blocks: two small byte-level models (Local Encoder and Local Decoder) and a large Latent Global Transformer. The Local Encoder maps input bytes to patch representations using cross-attention, the Latent Transformer processes these patches autoregressively, and the Local Decoder converts patch outputs back to bytes, again using cross-attention. Hash n-gram embeddings are added to byte embeddings to incorporate contextual information efficiently.

→ Performance Benchmark
BLT matches the training FLOP-controlled performance of Llama 3 up to 8B parameters and 4T training bytes. It can achieve up to 50% fewer inference FLOPs than Llama 3 by using larger average patch sizes (e.g., 8 bytes vs. Llama 3’s 4.4 bytes).

BLT shows enhanced robustness, outperforming Llama 3 (1T tokens) by +8 points average on noised HellaSwag and significantly on character-manipulation benchmarks like CUTE (+25 points). BLT also demonstrates better scaling trends in fixed-inference-cost scenarios, allowing simultaneous increases in model and patch size.

Original post: https://twitter.com/i/web/status/1921976957991854432
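The entropy-based patching described in the quoted post can be illustrated with a toy sketch. This is not Meta's BLT code: the post describes a learned next-byte predictor, while the sketch below swaps in a simple byte-bigram frequency model as a stand-in. The function names (train_bigram, next_byte_entropy, entropy_patches) and the 1.0-bit threshold are assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict


def train_bigram(corpus: bytes):
    """Count next-byte frequencies per preceding byte (toy stand-in for a byte LM)."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        counts[prev][nxt] += 1
    return counts


def next_byte_entropy(counts, prev: int) -> float:
    """Shannon entropy, in bits, of the next-byte distribution after `prev`."""
    dist = counts.get(prev)
    if not dist:
        return 8.0  # unseen context: assume a uniform distribution over 256 byte values
    total = sum(dist.values())
    return -sum((c / total) * math.log2(c / total) for c in dist.values())


def entropy_patches(data: bytes, counts, threshold: float = 1.0):
    """Start a new patch wherever the predicted next-byte entropy exceeds `threshold`."""
    if not data:
        return []
    patches, start = [], 0
    for i in range(1, len(data)):
        # High entropy means the next byte is hard to predict, so open a new patch there.
        if next_byte_entropy(counts, data[i - 1]) > threshold:
            patches.append(data[start:i])
            start = i
    patches.append(data[start:])
    return patches


# Hypothetical usage: predictable runs stay in one patch, uncertain points split.
corpus = b"the cat sat on the mat. the cat ate the rat. " * 20
model = train_bigram(corpus)
patches = entropy_patches(b"the cat sat on the mat.", model)
```

Patch boundaries fall where the stand-in model is uncertain, so predictable stretches are grouped into longer patches and a downstream model would spend fewer steps on them. A real system would replace the bigram counts with a trained byte-level model.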

May 13, 2025
As an absent father married first to his businesses, I must attribute that credit to my wives. But for all the manners my children do have – more normative in manners, and more kindness rather than proper manners – they are still in part products of their generations, and like most Millennials and Gen Z, they have lost the vestiges of our former lesser aristocracy, and as such, they are weak. I love them. Proud of them. But they are ordinary.

Reply addressees: @Prisoners_Hope @philbak1

Source date (UTC): 2025-05-13 15:41:20 UTC

Original post: https://twitter.com/i/web/status/1922316238967246848

Replying to: https://twitter.com/i/web/status/1922311487185375440

IN REPLY TO:

@Prisoners_Hope

@curtdoolittle @philbak1 Are you saying that you didn’t raise your kids with good manners?

Original post: https://twitter.com/i/web/status/1922311487185375440

May 13, 2025