DeepSeek has disrupted several long-standing assumptions in AI development:
1. “We have a special sauce, and we are very smart.” No, you don’t. True progress comes from disciplined, principled work grounded in the scientific method. Success is achievable by anyone who approaches the challenge with persistence and rigour, not by clinging to secrecy or overconfidence.
2. “Brute force is the answer.” Relying on vast amounts of data or compute power has never been the optimal strategy. Stacking GPUs without deeper insight is uninspired. A purposeful understanding of training processes and operational mechanisms is far more effective than brute force.
3. “The transformer is enough; let’s focus on propaganda and market dominance.” Incorrect. Real innovation requires reviewing, improving, and evolving current techniques, not settling into complacency and prioritising market share over meaningful advancement.
4. “Heavy investment drives innovation.” Not necessarily. Sharing knowledge and fostering collaboration are more powerful than centralising resources in a few hands. Knowledge distribution outpaces capital concentration in driving progress.
The AI bubble has burst, and it’s a critical moment for the industry. Vast resources have been squandered on unsustainable practices, and now is the time to take stock and recalibrate. We need to leave behind speculation and hype, addressing the fundamental problems transformers have revealed over the last eight years. With lessons learned, the focus must shift to creating something truly innovative and sustainable.
It’s time to abandon brute force as a stand-in for understanding. Prioritise evidence-based analysis and raise the bar for the industry as a whole.
Practical steps forward:
• Investigate the learning process, tracing outputs back to training data to identify what skills or behaviours are being promoted.
• Embrace curriculum-based training data to shape more effective and purposeful learning.
• Move beyond traditional Euclidean geometries, adopting structures better suited to the discrete and hierarchical nature of language (a sketch follows this list).
• Replace black-box evaluations with holistic frameworks based on clear mathematical models, free from anthropomorphic or subjective biases.
• Develop hybrid approaches that integrate continuous stochastic distributions with symbolic latent spaces.
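One concrete candidate for a non-Euclidean structure is hyperbolic space; below is a minimal sketch of distance in the Poincaré ball model, the geometry behind Poincaré embeddings for hierarchical data. The 2-D points and concept labels are invented for illustration; real systems learn these positions during training.

```python
import numpy as np

def poincare_distance(u, v):
    """Distance between two points strictly inside the unit ball.

    Hyperbolic space grows exponentially with radius, which matches
    tree-like (hierarchical) data far better than flat Euclidean space.
    """
    duv = np.dot(u - v, u - v)
    denom = (1 - np.dot(u, u)) * (1 - np.dot(v, v))
    return np.arccosh(1 + 2 * duv / denom)

# A root concept near the origin, two specialisations pushed toward the rim:
root  = np.array([0.01, 0.0])   # e.g. "vehicle"
child = np.array([0.60, 0.0])   # e.g. "car"
leaf  = np.array([0.60, 0.55])  # e.g. "taxi"

print(poincare_distance(root, child))  # parent-child: comparatively short
print(poincare_distance(child, leaf))  # siblings near the rim: much longer
```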
Now is the moment to reimagine AI development with clarity, creativity, and accountability.
They may not have a special sauce, but they do have a special incantation:
“We know how to build AGI and it’s just around the corner… with robotaxis”
'The Bitter Lesson' has been restated to include optimization of data-driven approaches and unsupervised reinforcement learning.
https://www.cs.utexas.edu/~eunsol/courses/data/bitter_lesson.pdf
Agreed. In general, it’s important to balance exploration with consolidation or exploitation. While brute force can be valuable for exploration, this approach needs to be alternated with periods of consolidation. The strategy you choose will naturally depend on your current understanding of the problem space and whether it is bounded or open-ended.
Very interesting. Thanks.
I'll just say that I have a different idea of how categories arise and are represented in language.
Sutton's essay is well worth reading and I agree with many of his points.
"We want AI agents that
can discover like we can, not which contain what we have discovered. Building in our discoveries
only makes it harder to see how the discovering process can be done."
Indeed. But what are LLMs if not "building in our discoveries"? And, of course, it would be great to have such agents as Sutton describes. I think they are so remote at this point that we might as well want time machines while we're at it.
"free from anthropomorphic or subjective biases"
Rather a tall order if the underlying data is human language, wouldn't you agree?
"... hierarchical nature of language"
I don't understand what this is.
I’ve covered these topics several times on my blog.
Before diving in, it’s essential to understand the nature of AI outputs. This is a crucial step in recognising how outputs function and avoiding common misconceptions. Many people assume that asking an LLM for a particular fact will always yield the same response. However, this is not the case. Outputs follow a stochastic distribution, meaning they can range from correct to incorrect, nonsensical, or anywhere in between.
https://ai-cosmos.hashnode.dev/the-ai-trinity-what-everyone-gets-wrong-about-modern-ai-systems
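As a rough illustration of that stochastic distribution, here is a toy next-token sampling step. The vocabulary, logits, and temperature are invented for illustration; real LLMs perform the same softmax-and-sample operation over a vocabulary of ~100k tokens at every step.

```python
import numpy as np

rng = np.random.default_rng()

vocab  = ["Paris", "London", "Lyon", "banana"]
logits = np.array([3.0, 1.2, 0.8, -2.0])   # model scores for the next token
temperature = 1.0

# Softmax turns scores into a probability distribution:
probs = np.exp(logits / temperature)
probs /= probs.sum()

# Ten "identical" queries are ten independent draws from that distribution:
for _ in range(10):
    print(rng.choice(vocab, p=probs))  # usually "Paris", but not always
```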
1) To answer your first question:
There is a psychology behind interpreting AI outputs as individual instances rather than as a cohesive whole. People often assume the output space is uniform and coherent from a human perspective. In reality, outputs may contradict each other depending on the input and the training data, because they do not represent a single entity but a multifaceted, kaleidoscopic system.
1.1) How your perception influences your understanding of AI interactions:
https://ai-cosmos.hashnode.dev/human-psychology-effects-in-ai-exploring-biases-umwelt-and-worldview
1.2) A practical example of this: comparing interpretations of Sam vs Tom outputs.
https://ai-cosmos.hashnode.dev/llms-the-roulette-wheel-of-decision-making
2) To answer your second question:
Regarding the hierarchical nature of language and Euclidean geometries, this is a more complex topic.
In general, language is discrete—a collection or list—unlike continuous data such as temperature. When an LLM maps tokens against each other, the specific trajectories in latent space treat language as if it were continuous. This oversimplification works in practice but introduces new issues. Word embeddings use floats to represent each dimension, which creates boundaries in high-dimensional spaces. These boundaries may or may not align with meaningful concepts in English or any other language.
https://ai-cosmos.hashnode.dev/beyond-gradient-descent
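A minimal sketch of the float-vector mapping described above. The 4-dimensional vectors are invented for illustration; real embeddings have hundreds or thousands of dimensions learned during training.

```python
import numpy as np

# Discrete tokens mapped into a continuous space: each word becomes a
# float vector, and "meaning" becomes geometry (here, cosine similarity).
embeddings = {
    "cat":   np.array([0.9, 0.1, 0.0, 0.3]),
    "dog":   np.array([0.8, 0.2, 0.1, 0.4]),
    "tulip": np.array([0.0, 0.9, 0.8, 0.1]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(embeddings["cat"], embeddings["dog"]))    # high: nearby in space
print(cosine(embeddings["cat"], embeddings["tulip"]))  # low: far apart

# Any similarity threshold you pick draws a boundary through this space,
# and nothing guarantees that boundary matches a concept in any language.
```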
An ontology of concepts is used in natural language as a means of compressing learned relationships. If a car has four wheels, then any sub-class of car also has four wheels, for example. You need not learn that every car ever seen has a certain number of wheels. The same goes for the LLM: no doubt during training, LLMs compress relationships to the most general concepts for a similar result.
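For illustration, the same compression written as code, with invented class names: the fact is stored once on the general concept and inherited by every sub-class.

```python
class Car:
    wheels = 4          # learned once, at the most general level

class Taxi(Car):
    pass                # nothing restated; the fact is inherited

class StationWagon(Car):
    pass

print(Taxi().wheels, StationWagon().wheels)  # 4 4 -- no per-subclass storage
```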
I don't think anything got disrupted whatsoever.
- There was never a special sauce to start with. The base ideas are known.
- Brute force is still very much the answer. DeepSeek did not come up with anything original; they were just a bit clever with optimizations. US vendors also release distilled and cheaper versions of their products.
- The transformer was never enough. In case you did not notice, the DeepSeek architecture is an imitation of the OpenAI reasoning logic.
- Heavy investment drives innovation just fine. DeepSeek imitated existing work. They did no innovation whatsoever.
- No AI bubble burst. The state-of-the-art is a moving goal. It will require heavy research, lots of data, lots of money.
Yes, we must use smarter methods, but DeepSeek did not invent any one of them.
Just curious, when you say that DeepSeek is an imitation of the OpenAI reasoning architecture, is that assumed or actually known? Because I thought OpenAI was refusing to share any useful technical details about how o1 works.
It is a solid guess based on o3's behaviour that it is a tweaked LLM built on chain of thought. The ideas have been around for a while. Specifics may differ, but not by much.
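For what it's worth, chain of thought in its simplest form is just a prompting pattern; the wording below is illustrative, and the guess above is that reasoning models are ordinary LLMs trained or tuned to produce this style by default.

```python
# The same question, with and without an instruction to externalise
# intermediate steps.
question = "A bat and a ball cost $1.10; the bat costs $1 more. Ball price?"

direct_prompt = question

cot_prompt = (
    question
    + "\nLet's think step by step, writing out each intermediate result "
      "before giving the final answer."
)

# An LLM given cot_prompt emits its working ("the ball costs x, the bat
# x + 1.00, so 2x + 1.00 = 1.10, x = 0.05") before answering, which
# measurably improves accuracy on multi-step problems.
print(cot_prompt)
```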
everything you write is great, and 100% antithetical to the american way of life, which is: do everything by violent force, concede no errors, cooperate with nobody, and when in doubt just go absolutely crazy.
I appreciate that this development undercuts all conversations about how wonderful AI will be for humanity. If that’s why we’re doing it then wouldn’t folks celebrate someone doing it cheaper and openly? China was also at the forefront of making cheap textbooks that were not subject to the intellectual property and copyright laws that make them so difficult to access in the West.
How does it undercut anything other than the financial plans of some other, non-Chinese AI companies? It says nothing, positive or negative, about "how wonderful AI will be for humanity".
I didn’t say anything about whether AI is actually good for humanity; what I said was that it changes the claims that it’s being pursued for something other than profit. When the President says we need to be “laser-focused on competing to win,” what is it that we’re trying to win? What do we think the chances are that the call will be not for ramping up innovation but for imposing trade protections and restrictions?
When Trump says it, you can be sure it is about country-on-country competition and/or benefits to himself, his family, and close associates. As for AI companies in capitalist countries, they are pursuing profit but they hope their products are good for humanity as that is both profitable and good.
That is true. Biden wouldn’t publicly say it. Hunter would simply sell expensive AI model weights. 😋
Political banter aside, any time politicians are involved, it’s a losing proposition. It also means no true value in the underlying tech has been realized - everything is posturing.
This - thank you Sherri!
After a day of using DeepSeek and (paid) ChatGPT side by side, giving them the same prompts and having them critique each other's answers, I'd give the win to ChatGPT, but not by a lot. I'd say regarding both LLMs that you have to check their answers against known references before making important decisions, but they're both fine if you just want to chat breezily about something that doesn't matter. Maybe DeepSeek gets a C- and ChatGPT gets a C+.
Sure, DeepSeek is programmed to filter out troublesome topics for the CCP, but I've had ChatGPT refuse to write a sample statute for banning crypto because "that would violate my libertarian principles" it said. And the first time I tried using Claude it also refused to do what I considered to be straightforward G-rated tasks. If any of these programs were my research assistants I'd fire them.
The nice thing about these models is you can use 3-4 of them and pick the one that is useful for a given task, which you couldn’t do with research assistants. I use Claude, Gemini, ChatGPT, and DeepSeek, and each is sometimes useful when the others are not.
I agree. Current generation systems are not to be trusted for any serious work. They are useful for inspiring some ideas and identifying potentially overlooked issues. But claims of any significance always have to be investigated closely if you want to rely on them.
The issue that drives me crazy is that the firms are all constantly rushing on to the next useless prototype instead of fixing the last thing so it actually works. The Chatbots (or Sora, or GPTs, or ...) are still no good for any serious work e.g. for customer service or something, but, instead of fixing that, everyone is running off to create super "autonomous agents" or some such nonsense. I think we can guarantee that those things are never going to work before they rush on to a house robot, or a mathematical proof creator, or something else guaranteed to be half-baked.
It's a bit like watching a bunch of people with ADHD being given tens of billions of dollars to run around playing. On second thoughts, that's exactly what it is.
“firms are all constantly rushing on to the next useless prototype instead of fixing the last thing so it actually works“
That’s because from a fundamental factual and reliability standpoint, they are not fixable.
And the folks developing them know that.
Certainly hard to fix, but I am probably more of a technology optimist on that score.
I am optimistic when it comes to technology that lives in the world of logic and physics.
Every technology before LLMs lived in that world and I don’t believe in exceptions to that rule.
Should say “every successful technology”
How do you “fix” a video generator that produces Bride of Frankensteinian stuff like this?
https://futurism.com/openai-sora-gymnastics-videos
The AI operates in and on the world of pixels, not physics.
Someone should really inform the Nobel committee.
If there is one thing they have aplenty, it is squirrels.
I just asked ChatGPT to write a statute to ban cryptocurrency and it happily provided me with the same.
This doesn't mean I'm a liar, it means another problem with ChatGPT is this kind of inconsistent response to similar prompts.
How is it an economic revolution if it just generates garbage/slop a lot more cheaply?
Sneed, which LLMs do you use, for which purposes, and what issues have you run into so far?
Probably should have called it CheapSeek instead of DeepSeek
Maybe more like, it has upset the garbage bins, and for those heavily invested in such garbage, or the bins, that is more than upsetting.
Yes, but it does give hope to countries without massive AI budgets, like Britain, that they can do a lot with ingenuity and pragmatism.
A small number of mega-tech public and private companies have had a vested interest in keeping the focus purely on scaling because it is a massive barrier to entry. This development appears to shift the paradigm - fast following and innovation are possible without billions of dollars. This is positive for consumers and corporations using the technology. There are feasible use cases for LLMs, but the current ones will never get close to paying back the investment. This has stoked a never-ending hype cycle to distract from this reality. If you marked OpenAI and Anthropic to market, they would probably have lost half their value yesterday. The game is still on, though. Players with massive supercomputers are figuring out how to incorporate these innovations into their work. It will be interesting to see what is next. And PS, it is not AGI, but it can be useful.
You remind me of IBM having a vested interest in mainframes during the 1980s. Instead of these massively-scaled LLMs, I think the future is every phone, tablet, and laptop will ship with one or more personal LLM cores (like CPU and GPU cores today) that sit underneath user-friendly program interfaces and that train themselves on their specific user and can be configured by their specific user.
There are a few parallels, but also some differences. IBM was late to minis and Unix servers, probably driven by similar mindsets. However, IBM had huge profits from mainframes. It is still not clear how the foundation LLM commercial model will shake out. This is an issue for a company like DeepSeek as well.
As Gary says, this seems to be just a case of the Chinese figuring out how to train LLMs more cheaply. As it was done as an open project, other AI companies will likely adopt some version of their techniques. It is as simple as that, which, as I read it, was all Dr. Marcus is saying here. All this stuff I see in the comments about how DeepSeek will revolutionize (something) seems to have missed the point of the article.
I agree that it doesn't take us directly closer to general AI, but I think you'd grant that it's a much, much more attractive platform to build upon at its new price point.
Not a perfect analogy, but computers didn't move beyond being Turing complete when going from vacuum tubes to chips, yet how the toolkit can be harnessed expanded dramatically.
I briefly looked at some chat logs and they made me want to throw up on my keyboard. DeepSeek appears to love filler text, like "hmm, let me think this through for a moment...", or "but wait a sec, I'm not totally sure about my answer yet, I should double check it by yadda yadda yadda..."
I know I'm not the target audience, but ewww. It gives me a "doth protest too much" vibe; since lack of real reasoning abilities has been a consistent criticism of LLMs, DeepSeek's creators appear to have hard-coded in a 'lookit me, I'm reasoning!' chat personality.
"DeepSeek is an economic revolution, and geopolitical wake-up call, but that doesn't directly bring us any closer to AGI."
Since the field of AI refuses to provide a consensus, scientifically valid definition of "intelligence", we cannot say whether DeepSeek brings us directly closer to AGI. What we can say is that DeepSeek and the rest of the LLMs have nothing to do with Genotypic Human Behavior, e.g., Language, as Chomsky pointed out in his 2023 essay "The False Promise of ChatGPT." Unlike some contributors, I do not engage in predicting the future, so I cannot say whether LLMs will have nothing to add to an Information Processing System that can replicate Genotypic Human Behavior. I suspect the basic technology will have some minor role, but ¯\_(ツ)_/¯
Totally agree! The market panic on NVDA was totally senseless. Anyone who can solve the "hallucination" problem will be the final winner.
Nvidia is overvalued and was overdue for a correction. Still a great company, just priced by hype.
The threat to nvidia is huge, because it has dramatically shrunk the size of the overall pie (the total addressable market "TAM"). More accurately, it has very suddenly illustrated to people that the TAM was always far smaller than the figures needed to justify nvidia's valuation.
DeepSeek uses a training model that is 1.5 years old, and it won’t allow any negativity about Xi. Regarding the CCP deference, can that be stripped out of the source code?
Indeed, LLM prompt hacking is a thing.
Reportedly, DeepSeek can be subverted by asking it to substitute certain digits and symbols for particular English letters in the answer.
Then its 'alignment' reinforcement training fails, and it will, for example, give you the details of how tanks were bravely held up in Tiananmen Square.
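Sketching the reported trick: the substitution mapping below is an invented example of such a cipher, not DeepSeek's actual behaviour. The idea is that the alignment filter never sees the flagged words in plain form, and the user undoes the substitution locally.

```python
# Invented digit/symbol-to-letter mapping for illustration:
SUBSTITUTIONS = {"4": "a", "3": "e", "1": "i", "0": "o", "$": "s", "7": "t"}

def decode(text: str) -> str:
    """Undo the character substitution the model was asked to apply."""
    return "".join(SUBSTITUTIONS.get(ch, ch) for ch in text)

print(decode("T14n4nm3n $qu4r3"))  # -> "Tiananmen square"
```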
I love this. Do the CCP not understand the basic nature of LLMs? That which makes them seem human-like also makes their output impossible to control.
Building AGI is hardly the point, though the CEO reportedly wants to do that too. He's doing it open source, with full transparency, and none of the guardrail nonsense, which means it has a better tone, as the model has not been crippled. Being cheap and popular with consumers may be important, however.
We also dramatically underestimated the Soviets' abilities to develop a nuclear bomb.
Joe-1 just dropped again.
The Soviets stole the secrets needed to build one. Respect their espionage abilities, sure, but that's all they needed. I suspect your analogy doesn't apply here unless, of course, this AI is built on theft of knowledge too.
The West gave the Russians the materials and info to make the bomb. The entire Cold War was a scam to justify spending on the MIC and to control the people. The missile gap was fabricated. Look up the writings of Antony Sutton. I think Oliver Stone hits on some of it in his documentary.
DeepSeek will not be used by US corporations, for many reasons. They will use a US solution.
Any of the tricks DeepSeek used will be used by US vendors too. They already offer "flash" versions.
Price wars and lack of moat are always the case in tech.
Weaker players will go under. The rest will find their own niches, and will reach an implicit equilibrium that allows them to make a buck, eventually.