Yes, I have a shortcut on my desktop to an architectural diagram for GPT-4 and the fking simplicity of it is what I’d expect. I think no one expected the use of the attention layers to produce what they did.
Sam has come out and said that they think they will solve the reasoning…
Source date (UTC): 2024-01-13 20:35:45 UTC
Original post: https://twitter.com/i/web/status/1746269826962670006
Replying to: https://twitter.com/i/web/status/1746267277664583977