DeepSeek has disrupted several long-standing assumptions in AI development:
1. “We have a special sauce, and we are very smart.” No, you don’t. True progress comes from disciplined, principled work grounded in the scientific method. Success is achievable by anyone who approaches the challenge with persistence and rigour, not by clinging to secrecy or overconfidence.
2. “Brute force is the answer.” Relying on vast amounts of data or compute power has never been the optimal strategy. Stacking GPUs without deeper insight is uninspired. A purposeful understanding of training processes and operational mechanisms is far more effective than brute force.
3. “The transformer is enough; let’s focus on propaganda and market dominance.” Incorrect. Real innovation requires reviewing, improving, and evolving current techniques, not settling into complacency and prioritising market share over meaningful advancement.
4. “Heavy investment drives innovation.” Not necessarily. Sharing knowledge and fostering collaboration are more powerful than centralising resources in a few hands. Knowledge distribution outpaces capital concentration in driving progress.
The AI bubble has burst, and it’s a critical moment for the industry. Vast resources have been squandered on unsustainable practices, and now is the time to take stock and recalibrate. We need to leave behind speculation and hype, addressing the fundamental problems transformers have revealed over the last eight years. With lessons learned, the focus must shift to creating something truly innovative and sustainable.
It’s time to abandon brute force as a stand-in for understanding. Prioritise evidence-based analysis and raise the bar for the industry as a whole.
Practical steps forward:
• Investigate the learning process, tracing outputs back to training data to identify what skills or behaviours are being promoted.
• Embrace curriculum-based training data to shape more effective and purposeful learning.
• Move beyond traditional Euclidean geometries, adopting structures better suited to the discrete and hierarchical nature of language (a sketch follows this list).
• Replace black-box evaluations with holistic frameworks based on clear mathematical models, free from anthropomorphic or subjective biases.
• Develop hybrid approaches that integrate continuous stochastic distributions with symbolic latent spaces.
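One concrete candidate for a non-Euclidean structure is hyperbolic space; below is a minimal sketch of distance in the Poincaré ball model, the geometry behind Poincaré embeddings for hierarchical data. The 2-D points and concept labels are invented for illustration; real systems learn these positions during training.

```python
import numpy as np

def poincare_distance(u, v):
    """Distance between two points strictly inside the unit ball.

    Hyperbolic space grows exponentially with radius, which matches
    tree-like (hierarchical) data far better than flat Euclidean space.
    """
    duv = np.dot(u - v, u - v)
    denom = (1 - np.dot(u, u)) * (1 - np.dot(v, v))
    return np.arccosh(1 + 2 * duv / denom)

# A root concept near the origin, two specialisations pushed toward the rim:
root  = np.array([0.01, 0.0])   # e.g. "vehicle"
child = np.array([0.60, 0.0])   # e.g. "car"
leaf  = np.array([0.60, 0.55])  # e.g. "taxi"

print(poincare_distance(root, child))  # parent-child: comparatively short
print(poincare_distance(child, leaf))  # siblings near the rim: much longer
```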
Now is the moment to reimagine AI development with clarity, creativity, and accountability.
They may not have a special sauce, but they do have a special incantation:
“We know how to build AGI and it’s just around the corner… with robotaxis”
'The Bitter Lesson' has been restated to include optimization of data-driven approaches and unsupervised reinforcement learning.
https://www.cs.utexas.edu/~eunsol/courses/data/bitter_lesson.pdf
Agreed. In general, it’s important to balance exploration with consolidation or exploitation. While brute force can be valuable for exploration, this approach needs to be alternated with periods of consolidation. The strategy you choose will naturally depend on your current understanding of the problem space and whether it is bounded or open-ended.
Very interesting. Thanks.
I'll just say that I have a different idea of how categories arise and are represented in language.
Sutton's essay is well worth reading and I agree with many of his points.
"We want AI agents that
can discover like we can, not which contain what we have discovered. Building in our discoveries
only makes it harder to see how the discovering process can be done."
Indeed. But what are LLMs if not "building in our discoveries"? And, of course, it would be great to have such agents as Sutton describes. I think they are so remote at this point that we might as well want time machines while we're at it.
"free from anthropomorphic or subjective biases"
Rather a tall order if the underlying data is human language, wouldn't you agree?
"... hierarchical nature of language"
I don't understand what this is.
I’ve covered these topics several times on my blog.
Before diving in, it’s essential to understand the nature of AI outputs. This is a crucial step in recognising how outputs function and avoiding common misconceptions. Many people assume that asking an LLM for a particular fact will always yield the same response. However, this is not the case. Outputs follow a stochastic distribution, meaning they can range from correct to incorrect, nonsensical, or anywhere in between.
https://ai-cosmos.hashnode.dev/the-ai-trinity-what-everyone-gets-wrong-about-modern-ai-systems
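As a rough illustration of that stochastic distribution, here is a toy next-token sampling step. The vocabulary, logits, and temperature are invented for illustration; real LLMs perform the same softmax-and-sample operation over a vocabulary of ~100k tokens at every step.

```python
import numpy as np

rng = np.random.default_rng()

vocab  = ["Paris", "London", "Lyon", "banana"]
logits = np.array([3.0, 1.2, 0.8, -2.0])   # model scores for the next token
temperature = 1.0

# Softmax turns scores into a probability distribution:
probs = np.exp(logits / temperature)
probs /= probs.sum()

# Ten "identical" queries are ten independent draws from that distribution:
for _ in range(10):
    print(rng.choice(vocab, p=probs))  # usually "Paris", but not always
```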
1) To answer your first question:
There is a psychology behind interpreting AI outputs as individual instances rather than as a cohesive whole. People often assume the output space is uniform and coherent from a human perspective. In reality, outputs may contradict each other depending on the input and the training data, because they do not represent a single entity but a multifaceted, kaleidoscopic system.
1.1) How your perception influences your understanding of AI interactions:
https://ai-cosmos.hashnode.dev/human-psychology-effects-in-ai-exploring-biases-umwelt-and-worldview
1.2) A practical example of this: comparing interpretations of Sam vs Tom outputs.
https://ai-cosmos.hashnode.dev/llms-the-roulette-wheel-of-decision-making
2) To answer your second question:
Regarding the hierarchical nature of language and Euclidean geometries, this is a more complex topic.
In general, language is discrete—a collection or list—unlike continuous data such as temperature. When an LLM maps tokens against each other, the specific trajectories in latent space treat language as if it were continuous. This oversimplification works in practice but introduces new issues. Word embeddings use floats to represent each dimension, which creates boundaries in high-dimensional spaces. These boundaries may or may not align with meaningful concepts in English or any other language.
https://ai-cosmos.hashnode.dev/beyond-gradient-descent
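A minimal sketch of the float-vector mapping described above. The 4-dimensional vectors are invented for illustration; real embeddings have hundreds or thousands of dimensions learned during training.

```python
import numpy as np

# Discrete tokens mapped into a continuous space: each word becomes a
# float vector, and "meaning" becomes geometry (here, cosine similarity).
embeddings = {
    "cat":   np.array([0.9, 0.1, 0.0, 0.3]),
    "dog":   np.array([0.8, 0.2, 0.1, 0.4]),
    "tulip": np.array([0.0, 0.9, 0.8, 0.1]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(embeddings["cat"], embeddings["dog"]))    # high: nearby in space
print(cosine(embeddings["cat"], embeddings["tulip"]))  # low: far apart

# Any similarity threshold you pick draws a boundary through this space,
# and nothing guarantees that boundary matches a concept in any language.
```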
An ontology of concepts is used in natural language as a means of compressing learned relationships. If a car has four wheels, then any sub-class of car also has four wheels, for example. You need not learn that every car ever seen has a certain number of wheels. The same goes for the LLM: no doubt during training, LLMs compress relationships to the most general concepts for a similar result.
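For illustration, the same compression written as code, with invented class names: the fact is stored once on the general concept and inherited by every sub-class.

```python
class Car:
    wheels = 4          # learned once, at the most general level

class Taxi(Car):
    pass                # nothing restated; the fact is inherited

class StationWagon(Car):
    pass

print(Taxi().wheels, StationWagon().wheels)  # 4 4 -- no per-subclass storage
```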
I don't think anything got disrupted whatsoever.
- There was never a special sauce to start with. The base ideas are known.
- Brute force is still very much the answer. DeepSeek did not come up with anything original; they were just a bit clever with optimizations. US vendors also release distilled and cheaper versions of their products.
- The transformer was never enough. In case you did not notice, the DeepSeek architecture is an imitation of the OpenAI reasoning logic.
- Heavy investment drives innovation just fine. DeepSeek imitated existing work. They did no innovation whatsoever.
- No AI bubble burst. The state-of-the-art is a moving goal. It will require heavy research, lots of data, lots of money.
Yes, we must use smarter methods, but DeepSeek did not invent any one of them.
Just curious, when you say that DeepSeek is an imitation of the OpenAI reasoning architecture, is that assumed or actually known? Because I thought OpenAI was refusing to share any useful technical details about how o1 works.
It is a solid guess based on o3's behaviour that it is a tweaked LLM built on chain of thought. The ideas have been around for a while. Specifics may differ, but not by much.
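For what it's worth, chain of thought in its simplest form is just a prompting pattern; the wording below is illustrative, and the guess above is that reasoning models are ordinary LLMs trained or tuned to produce this style by default.

```python
# The same question, with and without an instruction to externalise
# intermediate steps.
question = "A bat and a ball cost $1.10; the bat costs $1 more. Ball price?"

direct_prompt = question

cot_prompt = (
    question
    + "\nLet's think step by step, writing out each intermediate result "
      "before giving the final answer."
)

# An LLM given cot_prompt emits its working ("the ball costs x, the bat
# x + 1.00, so 2x + 1.00 = 1.10, x = 0.05") before answering, which
# measurably improves accuracy on multi-step problems.
print(cot_prompt)
```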
everything you write is great, and 100% antithetical to the american way of life, which is: do everything by violent force, concede no errors, cooperate with nobody, and when in doubt just go absolutely crazy.
I appreciate that this development undercuts all conversations about how wonderful AI will be for humanity. If that’s why we’re doing it then wouldn’t folks celebrate someone doing it cheaper and openly? China was also at the forefront of making cheap textbooks that were not subject to the intellectual property and copyright laws that make them so difficult to access in the West.
How does it undercut anything other than the financial plans of some other, non-Chinese AI companies? It says nothing, positive or negative, about "how wonderful AI will be for humanity".
I didn’t say anything about whether AI is actually good for humanity; what I said was that it changes the claims that it’s being pursued for something other than profit. When the President says we need to be “laser-focused on competing to win,” what is it that we’re trying to win? What do we think the chances are that the call will be not for ramping up innovation but for imposing trade protections and restrictions?
When Trump says it, you can be sure it is about country-on-country competition and/or benefits to himself, his family, and close associates. As for AI companies in capitalist countries, they are pursuing profit but they hope their products are good for humanity as that is both profitable and good.
That is true. Biden wouldn’t publicly say it. Hunter would simply sell expensive AI model weights. 😋
Political banter aside, any time politicians are involved, it’s a losing proposition. It also means no true value in the underlying tech has been realized - everything is posturing.
This - thank you Sherri!
After a day of using DeepSeek and (paid) ChatGPT side by side, giving them the same prompts and having them critique each other's answers, I'd give the win to ChatGPT, but not by a lot. I'd say regarding both LLMs that you have to check their answers against known references before making important decisions, but they're both fine if you just want to chat breezily about something that doesn't matter. Maybe DeepSeek gets a C- and ChatGPT gets a C+.
Sure, DeepSeek is programmed to filter out troublesome topics for the CCP, but I've had ChatGPT refuse to write a sample statute for banning crypto because "that would violate my libertarian principles" it said. And the first time I tried using Claude it also refused to do what I considered to be straightforward G-rated tasks. If any of these programs were my research assistants I'd fire them.
The nice thing about these models is you can use 3-4 of them and pick the one that is useful for a given task, which you couldn’t do with research assistants. I use Claude, Gemini, ChatGPT, and DeepSeek, and each is sometimes useful when the others are not.
I agree. Current generation systems are not to be trusted for any serious work. They are useful for inspiring some ideas and identifying potentially overlooked issues. But claims of any significance always have to be investigated closely if you want to rely on them.
The issue that drives me crazy is that the firms are all constantly rushing on to the next useless prototype instead of fixing the last thing so it actually works. The Chatbots (or Sora, or GPTs, or ...) are still no good for any serious work e.g. for customer service or something, but, instead of fixing that, everyone is running off to create super "autonomous agents" or some such nonsense. I think we can guarantee that those things are never going to work before they rush on to a house robot, or a mathematical proof creator, or something else guaranteed to be half-baked.
It's a bit like watching a bunch of people with ADHD being given tens of billions of dollars to run around playing. On second thoughts, that's exactly what it is.
“firms are all constantly rushing on to the next useless prototype instead of fixing the last thing so it actually works“
That’s because from a fundamental factual and reliability standpoint, they are not fixable.
And the folks developing them know that.
Certainly hard to fix, but I am probably more of a technology optimist on that score.
I am optimistic when it comes to technology that lives in the world of logic and physics.
Every technology before LLMs lived in that world and I don’t believe in exceptions to that rule.
Should say “every successful technology”
How do you “fix” a video generator that produces Bride of Frankensteinian stuff like this?
https://futurism.com/openai-sora-gymnastics-videos
The AI operates in and on the world of pixels, not physics.
Someone should really inform the Nobel committee.
If there is one thing they have aplenty, it is squirrels.
I just asked ChatGPT to write a statute to ban cryptocurrency and it happily provided me with the same.
This doesn't mean I'm a liar, it means another problem with ChatGPT is this kind of inconsistent response to similar prompts.
How is it an economic revolution if it just generates garbage/slop a lot more cheaply?
Sneed, which LLMs do you use, for which purposes, and what issues have you run into so far?
Probably should have called it CheapSeek instead of DeepSeek
Maybe more like, it has upset the garbage bins, and for those heavily invested in such garbage, or the bins, that is more than upsetting.
Yes, but it does give hope to countries without massive AI budgets, like Britain, that they can do a lot with ingenuity and pragmatism.
A small number of mega-tech public and private companies have had a vested interest in keeping the focus purely on scaling because it is a massive barrier to entry. This development appears to shift the paradigm - fast following and innovation are possible without billions of dollars. This is positive for consumers and corporations using the technology. There are feasible use cases for LLMs, but the current ones will never get close to paying back the investment. This has stoked a never-ending hype cycle to distract from this reality. If you marked OpenAI and Anthropic to market, they would probably have lost half their value yesterday. The game is still on, though. Players with massive supercomputers are figuring out how to incorporate these innovations into their work. It will be interesting to see what is next. And PS, it is not AGI, but it can be useful.
You remind me of IBM having a vested interest in mainframes during the 1980s. Instead of these massively-scaled LLMs, I think the future is every phone, tablet, and laptop will ship with one or more personal LLM cores (like CPU and GPU cores today) that sit underneath user-friendly program interfaces and that train themselves on their specific user and can be configured by their specific user.
There are a few parallels, but also some differences. IBM was late to minis and Unix servers, probably driven by similar mindsets. However, IBM had huge profits from mainframes. It is still not clear how the foundation LLM commercial model will shake out. This is an issue for a company like DeepSeek as well.
As Gary says, this seems to be just a case of the Chinese figuring out how to train LLMs more cheaply. As it was done as an open project, other AI companies will likely adopt some version of their techniques. It is as simple as that, which, as I read it, was all Dr. Marcus is saying here. All this stuff I see in the comments about how DeepSeek will revolutionize (something) seems to have missed the point of the article.
I agree that it doesn't take us directly closer to general AI, but I think you'd grant that it's a much, much more attractive platform to build upon at its new price point.
Not a perfect analogy, but computers didn't move beyond being Turing complete when going from vacuum tubes to chips, yet how the toolkit can be harnessed expanded dramatically.
I briefly looked at some chat logs and they made me want to throw up on my keyboard. DeepSeek appears to love filler text, like "hmm, let me think this through for a moment...", or "but wait a sec, I'm not totally sure about my answer yet, I should double check it by yadda yadda yadda..."
I know I'm not the target audience, but ewww. It gives me a "doth protest too much" vibe; since lack of real reasoning abilities has been a consistent criticism of LLMs, DeepSeek's creators appear to have hard-coded in a 'lookit me, I'm reasoning!' chat personality.
"DeepSeek is an economic revolution, and geopolitical wake-up call, but that doesn't directly bring us any closer to AGI."
Since the field of AI refuses to provide a consensus, scientifically valid definition of "intelligence", we cannot say whether DeepSeek brings us directly closer to AGI. What we can say is that DeepSeek and the rest of the LLMs have nothing to do with Genotypic Human Behavior, e.g., Language, as Chomsky pointed out in his 2023 essay "The False Promise of ChatGPT." Unlike some contributors, I do not engage in predicting the future, so I cannot say whether LLMs will have nothing to add to an Information Processing System that can replicate Genotypic Human Behavior. I suspect the basic technology will have some minor role, but ¯\_(ツ)_/¯
Totally agree! The market panic on NVDA was totally senseless. Anyone who can solve the "hallucination" problem will be the final winner.
Nvidia is overvalued and was overdue for a correction. Still a great company, just priced by hype.
The threat to nvidia is huge, because it has dramatically shrunk the size of the overall pie (the total addressable market "TAM"). More accurately, it has very suddenly illustrated to people that the TAM was always far smaller than the figures needed to justify nvidia's valuation.
DeepSeek uses a training model that is 1.5 years old, and it won’t allow any negativity about Xi. Regarding the CCP deference, can that be stripped out of the source code?
Indeed, LLM prompt hacking is a thing.
Reportedly, DeepSeek can be subverted by asking it to substitute certain digits and symbols for particular English letters in the answer.
Then its 'alignment' reinforcement training fails, and it will, for example, give you the details of how tanks were bravely held up in Tiananmen Square.
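Sketching the reported trick: the substitution mapping below is an invented example of such a cipher, not DeepSeek's actual behaviour. The idea is that the alignment filter never sees the flagged words in plain form, and the user undoes the substitution locally.

```python
# Invented digit/symbol-to-letter mapping for illustration:
SUBSTITUTIONS = {"4": "a", "3": "e", "1": "i", "0": "o", "$": "s", "7": "t"}

def decode(text: str) -> str:
    """Undo the character substitution the model was asked to apply."""
    return "".join(SUBSTITUTIONS.get(ch, ch) for ch in text)

print(decode("T14n4nm3n $qu4r3"))  # -> "Tiananmen square"
```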
I love this. Do the CCP not understand the basic nature of LLMs? That which makes them seem human-like also makes their output impossible to control.
Building AGI is hardly the point, though the CEO reportedly wants to do that too. He's doing it open source, with full transparency, and none of the guardrail nonsense, which means it has a better tone, as the model has not been crippled. Being cheap and popular with consumers may be important, however.
We also dramatically underestimated the Soviets' abilities to develop a nuclear bomb.
Joe-1 just dropped again.
The Soviets stole the secrets needed to build one. Respect their espionage abilities, sure, but that's all they needed. I suspect your analogy doesn't apply here unless, of course, this AI is built on theft of knowledge too.
The West gave the Russians the materials and info to make the bomb. The entire Cold War was a scam to justify spending on the MIC and to control the people. The missile gap was fabricated. Look up the writings of Antony Sutton. I think Oliver Stone hits on some of it in his documentary.
DeepSeek will not be used by US corporations, for many reasons. They will use a US solution.
Any of the tricks DeepSeek used will be used by US vendors too. They already offer "flash" versions.
Price wars and lack of moat are always the case in tech.
Weaker players will go under. The rest will find their own niches, and will reach an implicit equilibrium that allows them to make a buck, eventually.