(Doolittle on AI)
RE: Deepseek Nonsense
Ok, I've been through the code that's available. It's obvious that the training code isn't shared, and from what I've gathered, they are afraid or ashamed to share it, and for good reason.
1) There are three innovations in the code that DeepSeek used to save compute:
i) A mixture of experts divides the problem vertically into silos.
ii) Limiting the network hierarchy that’s updated (reinforced) divides the problem scope horizontally.
iii) Predicting phrases instead of tokens reduces the network numerically (and, I suspect, produces more semantic value per byte, so to speak).
2) They used existing code from Meta's open-source LLM and modified it slightly.
3) I am not positive, but given that the model claims to be OpenAI, the absence of the training code, and the similarity of the results, it appears that they either got a copy of the OpenAI weights, OR they traversed the OpenAI graph using multiple accounts instead of 'training' DeepSeek from source data.
4) In other words: yes, there are innovations, but they are micro-innovations on existing work, and another example of intellectual-property theft.
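The "vertical silos" in (i) can be sketched as a toy top-k mixture-of-experts router. This is purely illustrative (it is not DeepSeek's code, and all names and sizes here are my own): the point is that only K of E expert networks execute per token, so compute scales with K rather than E.

```python
# Toy sketch of top-k mixture-of-experts routing (illustrative only).
# Only K of E expert MLPs run per token, so compute scales with K, not E.
import numpy as np

rng = np.random.default_rng(0)

D, E, K = 8, 4, 2  # hidden size, number of experts, experts used per token

# One tiny linear "expert" each; a router scores tokens against experts.
experts = [rng.standard_normal((D, D)) * 0.1 for _ in range(E)]
router = rng.standard_normal((D, E)) * 0.1

def moe_forward(x):
    """Route token vector x to its top-K experts and mix their outputs."""
    scores = x @ router                # (E,) affinity score per expert
    top = np.argsort(scores)[-K:]     # indices of the K highest-scoring experts
    weights = np.exp(scores[top])
    weights /= weights.sum()          # softmax over the chosen experts only
    # Only K expert matmuls execute; the other E-K experts cost nothing.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(D)
out = moe_forward(token)
print(out.shape)  # (8,)
```

Each "silo" (expert) only sees the tokens routed to it, which is what makes adding a new vertical cheap relative to retraining the whole network.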
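The compute saving in (iii) is easy to see with back-of-envelope arithmetic: a model that emits n tokens per forward pass needs roughly 1/n as many decode steps for the same output. The numbers below are made up for illustration, not DeepSeek's.

```python
# Back-of-envelope sketch: multi-token (phrase) prediction cuts decode steps.
# All numbers are hypothetical, chosen only to show the arithmetic.
import math

seq_len = 512    # tokens to generate (hypothetical)
per_step = 3     # tokens emitted per forward pass with multi-token prediction

steps_single = seq_len                       # classic one-token-at-a-time decoding
steps_multi = math.ceil(seq_len / per_step)  # phrase-at-a-time decoding
print(steps_single, steps_multi)  # 512 171
```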
Ergo: DeepSeek is little more than a raid on OpenAI's intellectual property under the pretense of replicating the work effort. In the end, this converts OpenAI's private intellectual property into open source that others can use, IF we can recreate the training code so that we can tweak the model and add new verticals (experts) to it.
I don't have time for this kind of skullduggery, but our company's future depends upon access to an AI we can train by adding an expert to it, a chain of thought for that expert, and an API consisting of a set of prompts to feed that chain of thought.
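The expert-plus-chain-of-thought-plus-prompt-API workflow described above could be sketched as follows. Everything here is hypothetical (the `Expert` class, the prompt templates, and the `fake_llm` stand-in are my own illustration, not any existing library): an expert is defined by an ordered list of prompt templates, and each step's output feeds the next step's prompt.

```python
# Hypothetical sketch of the workflow described above: an "expert" defined by
# a chain of thought (an ordered list of prompt templates) exposed as an API.
# All names here are my own illustration, not any existing library.
from dataclasses import dataclass, field

@dataclass
class Expert:
    name: str
    chain_of_thought: list[str] = field(default_factory=list)  # ordered prompt templates

    def run(self, question: str, llm) -> str:
        """Feed each prompt in the chain to the model, threading context forward."""
        context = question
        for template in self.chain_of_thought:
            context = llm(template.format(input=context))
        return context

# Stand-in for a real model call; it echoes its prompt so the flow is visible.
def fake_llm(prompt: str) -> str:
    return prompt

law = Expert(
    name="contract-law",
    chain_of_thought=[
        "List the operative clauses in: {input}",
        "For each clause above, state the obligation it creates: {input}",
    ],
)
answer = law.run("Party A shall deliver goods by June 1.", fake_llm)
print(answer.startswith("For each clause"))  # True
```

Swapping `fake_llm` for a real model call would turn the prompt list into the "API" for that expert's chain of thought.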
Cheers
CD
Source date (UTC): 2025-01-28 02:01:12 UTC
Original post: https://twitter.com/i/web/status/1884059118367113216