Category: AI, Computation, and Technology

  • RT @curtdoolittle: @ladypharaoh777 @BrianRoemmele –“How is the legacy data you

    RT @curtdoolittle: @ladypharaoh777 @BrianRoemmele –“How is the legacy data you choose entered into the AI? Is some of it scanned?”–

    Most…


    Source date (UTC): 2025-01-30 23:20:11 UTC

    Original post: https://twitter.com/i/web/status/1885105759165288741

  • “How is the legacy data you choose entered into the AI? Is some of it scanned?”-

    –“How is the legacy data you choose entered into the AI? Is some of it scanned?”–

    Most human knowledge is now available digitally, and automated programs (‘bots’) can crawl the internet much like search engines do, programmatically collecting and indexing information.
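
    For a sense of what that crawling and indexing amounts to mechanically, here is a minimal sketch in Python using only the standard library; the seed URL, page limit, and tiny in-memory index are illustrative placeholders, not anything a real search engine or AI lab actually runs.

      # Minimal illustrative crawler: fetch pages, follow links, and build a
      # tiny word -> URLs index in memory. The seed URL and page limit are
      # placeholder values; real crawlers add politeness, robots.txt, etc.
      import re
      import urllib.request
      from collections import defaultdict, deque
      from html.parser import HTMLParser
      from urllib.parse import urljoin

      class LinkAndTextParser(HTMLParser):
          """Collects href links and visible text from one HTML page."""
          def __init__(self):
              super().__init__()
              self.links, self.text = [], []

          def handle_starttag(self, tag, attrs):
              if tag == "a":
                  for name, value in attrs:
                      if name == "href" and value:
                          self.links.append(value)

          def handle_data(self, data):
              self.text.append(data)

      def crawl(seed_url, max_pages=10):
          """Breadth-first crawl from seed_url, returning an inverted index."""
          index = defaultdict(set)            # word -> set of URLs containing it
          queue, seen = deque([seed_url]), set()
          while queue and len(seen) < max_pages:
              url = queue.popleft()
              if url in seen:
                  continue
              seen.add(url)
              try:
                  with urllib.request.urlopen(url, timeout=10) as resp:
                      html = resp.read().decode("utf-8", errors="replace")
              except Exception:
                  continue                    # skip pages that fail to load
              parser = LinkAndTextParser()
              parser.feed(html)
              for word in re.findall(r"[a-z]+", " ".join(parser.text).lower()):
                  index[word].add(url)
              for link in parser.links:
                  queue.append(urljoin(url, link))
          return index

      if __name__ == "__main__":
          idx = crawl("https://example.com")   # placeholder seed URL
          print(sorted(idx.get("example", set())))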

    Some relevant estimates:
    Arts & Letters – The digitization of literature, music, and other cultural artifacts has surged. While quantifying this as “knowledge” is complex, the volume of digital content available today vastly exceeds anything prior to the digital era.
    Books – An estimated 25-50% of all books published since the invention of the printing press have been produced since the desktop computing era.
    Photographs – Over 90% of all photographs ever taken were created in the digital era.
    Science – 70-80% of all scientific literature has been published since the rise of digital computing.
    Data – The volume of digital data is even more dramatic, with some estimates suggesting that 90% of all existing data was generated in just the last two years.

    Given this explosion of digital content:
    – Search engines and other online repositories can be mined for stored information.
    – Some databases and archives require paid access.
    – Others are circumvented, with their data acquired without permission.
    – Most non-fiction books, along with a substantial portion of fiction, have been digitized and are freely available, particularly in Eastern Europe and Asia.
    – Materials that remain undigitized can be manually scanned, but I am unaware of any AI company actively engaging in large-scale scanning.

    At present, AI models are effectively reducing this massive corpus into a compressed form of meaningful, high-quality human knowledge—acting as a synthesized distillation of available intellectual content.

    (I mean, ChatGPT is familiar with my work, even if it’s ‘off by a bit’ and I’m a relatively minor figure in philosophy and social science.)

    Cheers,
    CD

    Reply addressees: @ladypharaoh777 @BrianRoemmele


    Source date (UTC): 2025-01-30 23:20:05 UTC

    Original post: https://twitter.com/i/web/status/1885105733320269825

    Replying to: https://twitter.com/i/web/status/1884747681307758926

  • RT @BrianRoemmele: Folks are asking is this all I found? No, there is more. I ca

    RT @BrianRoemmele: Folks are asking is this all I found? No, there is more. I can say today DeepSeek used some data from GPT 3.5 and Claude…


    Source date (UTC): 2025-01-30 22:53:59 UTC

    Original post: https://twitter.com/i/web/status/1885099167833612399

  • I asked Grok

    I asked Grok ..
    https://x.com/i/grok/share/HOceFZ9qR0hs7hsudCfRwK8S4


    Source date (UTC): 2025-01-30 14:41:08 UTC

    Original post: https://twitter.com/i/web/status/1884975136966942966

  • RT @ThruTheHayes: AN AI EXPLORATION OF SUBJECTS IN WHICH ONLY MY PEOPLE DEMONSTR

    RT @ThruTheHayes: AN AI EXPLORATION OF SUBJECTS IN WHICH ONLY MY PEOPLE DEMONSTRATE MASTERY

    As an American white man, with a masculine bra…


    Source date (UTC): 2025-01-29 18:03:43 UTC

    Original post: https://twitter.com/i/web/status/1884663729541071285

  • RT @PalmerLuckey: DeepSeek is legitimately impressive, but the level of hysteria

    RT @PalmerLuckey: DeepSeek is legitimately impressive, but the level of hysteria is an indictment of so many.

    The $5M number is bogus. It…


    Source date (UTC): 2025-01-29 17:30:13 UTC

    Original post: https://twitter.com/i/web/status/1884655299480739994

  • RT @NoahRevoy: Even in the human brain lying (that’s what woke is) requires more

    RT @NoahRevoy: Even in the human brain lying (that’s what woke is) requires more brain cycles than telling the truth.

    In an AI that’s mult…


    Source date (UTC): 2025-01-29 13:26:24 UTC

    Original post: https://twitter.com/i/web/status/1884593941380780541

  • RT @Father_Speaking: @curtdoolittle I haven’t considered delving into AI, I have

    RT @Father_Speaking: @curtdoolittle I haven’t considered delving into AI, I have enough hobbies, but the idea of automating you, via a LLM,…


    Source date (UTC): 2025-01-28 23:41:33 UTC

    Original post: https://twitter.com/i/web/status/1884386362444677359

  • (Doolittle on AI) Q: Curt: –“How does AI intersect with Metaphor Theory? AI see

    (Doolittle on AI)
    Q: Curt: –“How does AI intersect with Metaphor Theory? AI seems very logic based while Metaphor Theory seems to postulate that we humans think in layers and layers of metaphors built up based on our physical experience in the world.”– @patriciamdavis

    Metaphor is a category of analogy, and analogy is a category of generalization: neurons compete to achieve coherence among competing networks of generalizations.

    Auto-association of episodic memory in the hippocampal region seeks analogy for the purpose of identification > prediction > competition > valence > attention (choice).

    LLM AI is not logic-based (sets) but Bayesian-probabilistic (statistics). As a result, we are having a hell of a problem making it Reason, because reason (wayfinding) requires parallel competition and preservation of state. So Math and Programming (set logic, procedural logic, finite references and referents) are much, much easier than reasoning (episodic logic, infinite referents). And my (our) work requires mathematical decidability, programmatic reasoning, and auto-associative narrating (episodic traversal).
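
    As a toy contrast, with made-up numbers: set logic answers with certainty, while the model turns scores over candidate next tokens into probabilities and samples one. The vocabulary and logits below are invented for illustration, not real model output.

      # Toy contrast between set logic (deterministic membership) and the
      # statistical next-token step an LLM performs. The vocabulary and the
      # logits are made-up illustrative values, not real model output.
      import math
      import random

      # Set logic: a statement is simply in the set of truths or it is not.
      truths = {"water is wet", "2 + 2 = 4"}
      print("water is wet" in truths)          # True, with certainty

      # LLM-style step: scores (logits) over candidate next tokens become a
      # probability distribution (softmax), and the next token is sampled.
      vocab = ["wet", "dry", "blue"]
      logits = [4.0, 1.0, 0.5]                 # hypothetical scores for "water is ..."
      exps = [math.exp(x) for x in logits]
      probs = [e / sum(exps) for e in exps]
      print({t: round(p, 3) for t, p in zip(vocab, probs)})
      print(random.choices(vocab, weights=probs, k=1)[0])   # usually "wet", not always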

    Cheers
    CURT

    Reply addressees: @patriciamdavis


    Source date (UTC): 2025-01-28 18:38:39 UTC

    Original post: https://twitter.com/i/web/status/1884310136313069569

    Replying to: https://twitter.com/i/web/status/1884306479798440187

  • I should have mentioned the fourth innovation, but (foolishly) didn’t consider i

    I should have mentioned the fourth innovation, but (foolishly) didn’t consider it important at the time I posted:

    iv) Limiting the weights to eight bits reduces the memory required without (apparently) affecting the outputs.

    So Experts, Update Range, Tokens, and Bits, combined with the reduction (synthesis) of OpenAI’s data, and in most cases further reduction into other frameworks, compress the work effort of inference.
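
    As a rough sketch of what eight-bit weights mean in practice, here is a symmetric int8 quantize/dequantize round trip in NumPy; the weight values are random placeholders, and this is just one common 8-bit scheme, not necessarily the exact format DeepSeek used.

      # Symmetric per-tensor int8 quantization sketch: store weights in 8 bits,
      # dequantize on the fly. The weights are random placeholders, and this is
      # one common scheme, not necessarily the exact format DeepSeek used.
      import numpy as np

      w = np.random.randn(4, 4).astype(np.float32)        # pretend layer weights

      scale = np.abs(w).max() / 127.0                     # map the max magnitude to 127
      w_int8 = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
      w_back = w_int8.astype(np.float32) * scale          # dequantize for compute

      print("bytes fp32:", w.nbytes, "bytes int8:", w_int8.nbytes)   # 4x smaller
      print("max abs error:", float(np.abs(w - w_back).max()))       # small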


    Source date (UTC): 2025-01-28 18:29:26 UTC

    Original post: https://twitter.com/i/web/status/1884307815801761793

    Replying to: https://twitter.com/i/web/status/1884059118534811825



    IN REPLY TO:

    Unknown author

    (Doolittle on AI)
    RE: Deepseek Nonsense
    OK, I’ve been through the code that’s available. Not only is it obvious that the training code isn’t shared, but from what I’ve gathered, they are afraid or ashamed to share it, and for good reason.

    1) There are three innovations in the code that DeepSeek used to save compute:
    i) A mixture of experts divides the problem vertically into silos (a generic routing sketch follows below).
    ii) Limiting the network hierarchy that’s updated (reinforced) divides the problem scope horizontally.
    iii) Predicting phrases instead of tokens reduces the network numerically (and, I suspect, produces more semantic value per byte, so to speak).
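
    Here is a generic sketch of the first of those ideas, top-1 mixture-of-experts routing, in NumPy; the expert count, layer sizes, and random weights are placeholders, not DeepSeek’s actual configuration.

      # Generic mixture-of-experts sketch: a router sends each input to its
      # single best expert, so only that expert's weights do work. Sizes and
      # weights are random placeholders, not DeepSeek's architecture.
      import numpy as np

      rng = np.random.default_rng(0)
      d_model, n_experts, d_hidden = 8, 4, 16

      router_w = rng.standard_normal((d_model, n_experts))     # gating weights
      experts = [(rng.standard_normal((d_model, d_hidden)),    # per-expert MLP
                  rng.standard_normal((d_hidden, d_model)))
                 for _ in range(n_experts)]

      def moe_forward(x):
          """Route each input vector to its single best expert (top-1 gating)."""
          scores = x @ router_w                    # (batch, n_experts)
          choice = scores.argmax(axis=-1)          # chosen expert per input
          out = np.zeros_like(x)
          for i, e in enumerate(choice):
              w_in, w_out = experts[e]
              out[i] = np.maximum(x[i] @ w_in, 0) @ w_out      # tiny ReLU MLP
          return out, choice

      x = rng.standard_normal((3, d_model))
      y, picked = moe_forward(x)
      print("expert chosen per input:", picked)    # only 1 of 4 experts runs each time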

    2) They used existing code from Meta’s Open Source LLM, and slightly modified it.

    3) I am not positive, but given that the code thinks it’s OpenAI, the absence of the training code, and the similarity of the results, it appears that they either got a copy of the OpenAI weights, OR they traversed the OpenAI graph using multiple accounts instead of ‘training’ DeepSeek from source data (a generic sketch of that kind of harvesting follows point 4).

    4) In other words, yes, there are innovations, but they are micro-innovations on existing work and another example of intellectual property theft.
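
    For reference, the generic technique described in point 3, harvesting another model’s outputs as training data (distillation), looks roughly like the sketch below; teacher() is a stand-in callable, and the prompts and file name are placeholders, not a real API client.

      # Generic sketch of harvesting another model's outputs as training data
      # (distillation). teacher() is a stand-in callable, not a real API client;
      # the prompts and the output file name are placeholders.
      import json
      from typing import Callable, Iterable

      def harvest(prompts: Iterable[str], teacher: Callable[[str], str], path: str) -> None:
          """Collect (prompt, response) pairs from a teacher model into a JSONL file."""
          with open(path, "w", encoding="utf-8") as f:
              for prompt in prompts:
                  pair = {"prompt": prompt, "response": teacher(prompt)}
                  f.write(json.dumps(pair) + "\n")

      def teacher(prompt: str) -> str:
          return f"[teacher answer to: {prompt}]"   # stub standing in for a hosted model

      harvest(["Explain reciprocity.", "Define decidability."], teacher, "distill.jsonl")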

    Ergo: DeepSeek is little more than a raid on OpenAI’s intellectual property under the pretense of replicating the work effort. In the end, though, it does convert OpenAI’s private intellectual property into an open-source model that can be used by others, IF we can recreate the training code so that we can tweak the model and add new verticals (Experts) to it.

    I don’t have time for this kind of skullduggery, but our company’s future depends upon access to an AI we can train by adding an expert to it, a chain of thought for that expert, and an API consisting of a set of prompts to feed that chain of thought.
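
    A purely hypothetical sketch of what that interface might look like; ExpertSpec, run_expert, and call_model are invented names for illustration, not an existing product or API.

      # Hypothetical shape of "add an expert + a chain of thought + a prompt API".
      # ExpertSpec, run_expert, and call_model are invented names for illustration.
      from dataclasses import dataclass, field
      from typing import Callable, List

      @dataclass
      class ExpertSpec:
          name: str
          chain_of_thought: List[str] = field(default_factory=list)   # ordered prompt steps

      def run_expert(expert: ExpertSpec, question: str,
                     call_model: Callable[[str], str]) -> str:
          """Feed each chain-of-thought prompt to the model, carrying context forward."""
          context = question
          for step in expert.chain_of_thought:
              context = call_model(f"{step}\n\nContext so far:\n{context}")
          return context

      def call_model(prompt: str) -> str:
          return f"[model response to: {prompt[:40]}...]"   # stub for a real endpoint

      law_expert = ExpertSpec(
          name="natural-law",
          chain_of_thought=[
              "Restate the question in operational terms.",
              "Test the claim for reciprocity and decidability.",
              "Summarize the finding in plain language.",
          ],
      )
      print(run_expert(law_expert, "Is this contract reciprocal?", call_model))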

    Cheers
    CD

    Original post: https://x.com/i/web/status/1884059118534811825