I should have mentioned a fourth innovation, but (foolishly) didn’t consider it important at the time I posted:
iv) Limiting the weights to eight bits reduces the memory footprint without (apparently) affecting the outputs.
So Experts, Update Range, Tokens, and Bits, combined with the reduction (synthesis) of OpenAI’s data (and in most cases further reduction into other frameworks), compress the work effort of inference.
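Point iv can be sketched as plain symmetric per-tensor int8 quantization. This is an illustrative assumption, not DeepSeek’s actual scheme (which uses low-precision formats in a more involved way); it just shows why one byte per weight cuts memory 4x versus float32 while keeping outputs close:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization: 1 byte per weight plus a single float scale."""
    scale = np.max(np.abs(w)) / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)

assert q.dtype == np.int8                       # stored at 1/4 the bytes of float32
assert np.max(np.abs(w - w_hat)) <= s / 2 + 1e-6  # rounding error bounded by half a step
```

The reconstruction error is bounded by half the quantization step, which for typical weight distributions is small relative to the weights themselves.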
Source date (UTC): 2025-01-28 18:29:26 UTC
Original post: https://twitter.com/i/web/status/1884307815801761793
Replying to: https://twitter.com/i/web/status/1884059118534811825
IN REPLY TO:
Unknown author
(Doolittle on AI)
RE: Deepseek Nonsense
OK, I’ve been through the code that’s available. Not only is it obvious that the training code isn’t shared, but from what I’ve gathered, they are afraid or ashamed to share it, for good reason.
1) There are three innovations in the code that Deepseek used to save compute:
i) A mixture of experts divides the problem vertically into silos.
ii) Limiting the network hierarchy that’s updated (reinforced) divides the problem scope horizontally.
iii) Predicting phrases instead of tokens reduces the network numerically (and, I suspect, produces more semantic value per byte, so to speak).
2) They used existing code from Meta’s open-source LLM and slightly modified it.
3) I am not positive, but given that the code thinks it’s OpenAI, the absence of the training code, and the similarity of the results, it appears that they either got a copy of the OpenAI weights, OR they traversed the OpenAI graph using multiple accounts instead of ‘training’ Deepseek from source data.
4) In other words, yes, there are innovations, but they are micro-innovations on existing work, and another example of intellectual property theft.
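Innovation (i), the mixture of experts, can be illustrated with a toy top-k gate over hypothetical expert layers (not DeepSeek’s actual routing code): only k of the experts run per input, so compute scales with k rather than with the total expert count.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, top_k = 8, 4, 2

# hypothetical expert weight matrices and a gating matrix (illustrative only)
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
gate_w = rng.standard_normal((d, n_experts))

def moe_forward(x):
    logits = x @ gate_w
    chosen = np.argsort(logits)[-top_k:]   # route to the top-k experts only
    weights = np.exp(logits[chosen])
    weights /= weights.sum()               # softmax over the chosen experts
    # only top_k of n_experts matmuls execute; the rest are skipped entirely
    return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))

x = rng.standard_normal(d)
y = moe_forward(x)
assert y.shape == (d,)
```

This is the "vertical silo" effect: each expert specializes, and the gate decides which silos an input touches.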
Ergo: DeepSeek is little more than a raid on OpenAI’s intellectual property under the pretense of replicating the work effort. That said, the end result is the conversion of OpenAI’s private intellectual property into an open source that others can use, IF we can recreate the training code so that we can tweak the model and add new verticals (Experts) to it.
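Innovation (ii), and the "add an expert and train only that" workflow described above, both come down to freezing most parameters and updating a small subset. A minimal numpy sketch, assuming a hypothetical two-layer linear model with squared loss (real fine-tuning would freeze transformer blocks, not a toy matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.standard_normal((4, 4)) * 0.1   # frozen "base model" layer
W2 = rng.standard_normal((4, 1)) * 0.1   # new trainable head (the added "expert")

x = rng.standard_normal((16, 4))
y = x @ rng.standard_normal((4, 1))      # synthetic targets

def loss():
    return float(np.mean((x @ W1 @ W2 - y) ** 2))

W1_before, loss_before = W1.copy(), loss()
for _ in range(200):
    h = x @ W1
    grad_W2 = h.T @ (h @ W2 - y) / len(x)  # gradient for the unfrozen layer only
    W2 -= 0.1 * grad_W2                    # W1 is never updated

assert np.array_equal(W1, W1_before)       # base weights untouched
assert loss() < loss_before                # the small trainable subset still learns
```

Limiting which parameters receive updates is what shrinks the "horizontal" scope of training: the backward pass and optimizer state cover only the unfrozen subset.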
I don’t have time for this kind of skullduggery, but our company’s future depends upon access to an AI we can train by adding an expert to it, a chain of thought for that expert, and an API consisting of a set of prompts to feed that chain of thought.
Cheers
CD