150 Comments

"On a good scientific calculator, every single number would be in green."

So many have argued over this recently. Critics often say, "well, I can't do that in my head." That's right: you would ask for a calculator, because you have self-reflection about your abilities and an understanding of the tools that extend them.

This led me to write "Why Don't LLMs Ask For Calculators?" - https://www.mindprison.cc/p/why-llms-dont-ask-for-calculators

It is another simple example that completely exposes the lack of any reasoning. Do LLMs know that they are bad at math? Yes. They will state so, based on their training, of course. Do LLMs know what a calculator is? Also yes. And they still can't put these two concepts together to realize they are likely giving you wrong answers.


there's an essay here on LLMs and tools, btw, from maybe 18 months ago; "tools" should be in the title


This is just the idiot operations that are right there in the 500-odd lines of code that make an LLM work. It returns correlations between "stuff". That's why there is no connection between being able to reply with text that seems "intelligent" and actually doing anything intelligent. Those are very different things.

Grok2 and Grok3 have both given me detailed prompts that appear to make total sense. Those prompts describe exactly what Grok should do. And Grok can't make lexical sense of those prompts it generates to save its life. It's just babble.

This is why I say that the most dangerous thing about "AI" is that there is absolutely no difference between babble that looks sensible to us humans, and babble that does not. It's just a babble machine.


After reading your comment, I thought that an LLM should not need to do the multiplication "in its head", nor should it even need a calculator. It could do it using the "pen and paper" approach that many of us learned in primary school years ago. This way, it would not need any external tools. The instructions to do that should be available in its training data, but can it do it?

I decided to test it with Gemini 2.0 Flash, and I got what I thought was an interesting result. It got the numeric result right for a 10-digit by 10-digit multiplication, but the presentation of the method seems wrong. Here is the conversation: https://gemini.google.com/share/e00fc02b0af7

Note that it says each partial result should be shifted to the left, but it cannot do it properly itself! Also, the top row, which should contain all zeros, is completely missing for some reason.
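For comparison, here is a minimal Python sketch (mine, not taken from the Gemini conversation; the function name and layout are just illustrative) of the partial-product layout the pen-and-paper method calls for, with the left shift shown as trailing zeros:

    def long_multiply(a: int, b: int) -> int:
        # Pen-and-paper method: one partial product per digit of b,
        # each shifted one more place to the left (shown here as trailing zeros).
        digits = [int(d) for d in str(b)][::-1]                # least-significant digit first
        partials = [a * d * 10**i for i, d in enumerate(digits)]
        width = len(str(a * b)) + 2
        print(f"{a:>{width}}")
        print(f"x{b:>{width - 1}}")
        print("-" * width)
        for p in partials:
            print(f"{p:>{width}}")
        print("-" * width)
        total = sum(partials)
        print(f"{total:>{width}}")
        return total

    long_multiply(7316, 42)   # partial rows 14632 and 292640, total 307272

Nothing clever is needed; the point is only that the layout being described is easy to state precisely.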

It also claims there are some restrictions that prevent it from formatting things accurately, but I gave it no such instructions. Maybe there's something in its own preamble?

Also, please note that upon being called out on the erroneous alignment, it seemed to acknowledge the error, but proceeded to output exactly the same result, claiming it had been fixed.

I did a few other tests where I tried to get it to output the 20x20 calculation directly, but it refused. In one instance it gave me some Python code that would do the multiplication (the multiplication itself, not the pen-and-paper method), even though I had not asked for any code at all. The overall experience felt like speaking to a strange alien.

None of this goes against the claims in your comment and article, of course. By telling it to use the "pen-and-paper" method explicitly, we are supplying the necessary reasoning externally. I just wanted to share this weird (and perhaps not so useful) experience.


Yes, it is another relevant point, as some have stated that the AI could also use the same method we use to manually multiply. It has sufficient context to write out the entire solution. It has read every math textbook created, so it should know the rules.

But as you point out, it still can't do this properly.

Furthermore, some will say, "agents solve this". But they don't really. You can train the LLM to use agents for simple problems, but once it has to orchestrate a complex workflow, it is going to fall apart. Somewhere along the line it will hallucinate something. Even if using a calculator, it may enter the wrong values or perform the wrong operation. It will never be deterministic. We can't build reliable systems on top of LLMs.


"We can't build reliable systems on top of LLMs." - exactly, it was my 1st thought some years ago, still valid ever since. 😈


Any system that relies on an LLM for any significant part of its functionality will be unreliable.


"It has read every math textbook created, so it should know the rules."

It has read every math book but has understood none of them.


I acknowledge the point that for an LLM-GPT based tool to be regarded as capable of or useful as an AGI tool, it should reasonably be able to discern contexts in which the plausible use of - if not implied need for - a calculator is evident.

It's worth mentioning though (from memory; a precise fact-check is required) that the ability for the user to suggest or nudge LLM tools toward the use of a calculator in framing a response was, at least in the case of OpenAI's GPT, an improvement made quite some time ago (more than a year?). From memory, this occurred perhaps in response to Stephen Wolfram publishing his nice explanation and critique of how GPTs worked, and how they were inherently poor at responses to problems that involved math, geospatial reasoning, etc., essentially because they lacked a world model to reason accurately about such concepts.

I recall that in a subsequent point release of ChatGPT, you could prompt "Using a calculator [...]" and there appeared to be a significant improvement in the accuracy of responses where basic math calculations were an integral component of the response (though I can't recall which version introduced this).

Also worth mentioning that optional integrations with Wolfram Alpha were made available around that time, providing the capability to obtain better geospatial, math, etc. responses.


Yes. Tools have been integrated, but that also requires training, and it only helps with simple problems. The chance of a hallucination still exists. Numbers or input can still be incorrectly provided to the tool, or the results misinterpreted or hallucinated.

If you have a problem that requires orchestration of tools, then the chances of a hallucination somewhere in the chain just increase.
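To put rough numbers on that (illustrative figures only, not measurements): if each tool call or hand-off in a chain is independently correct, say, 98% of the time, the chance the whole chain is correct falls quickly with length. A short Python check:

    per_step_success = 0.98          # assumed reliability of each tool call / hand-off
    for steps in (1, 5, 10, 20):
        print(steps, round(per_step_success ** steps, 3))
    # roughly: 1 -> 0.98, 5 -> 0.904, 10 -> 0.817, 20 -> 0.668

At 20 steps that is roughly a one-in-three chance of a failure somewhere in the chain.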

This kind of architecture will never get rid of random failures. If your use case requires something dependable, LLMs are unlikely to ever meet the needs.


I agree that using LLMs/GPTs as the overarching "architecture" or core of AI systems in and of themselves is problematic in unsolvable ways. And I agree with many if not most of your points that derive from that position.

I think LLMs/GPTs may continue to have future utility *as a subordinate component part of* more well-considered, better-designed AI systems: at the very least as output-forming engines (with the help of upstream world-model engines), and possibly (perhaps less likely) at the front end as user-input clarification/verification tools.

I agree that, as a generalisation, having such tools somewhat randomly "calling out" to orchestrate other tools is problematic. However, I suspect that in some specific cases that approach is possibly 'good enough' for some usage contexts.


So the 99.8th percentile on Codeforces from o3 is evidence of what, exactly?


I'm not certain that we understand what the seemingly positive results achieved when AI tools tackle limited, structured tests created by humans actually mean. It might be argued that the greatest insight gained is that the tests themselves may be flawed in important ways.

A similar analogy can be found in the way that AI systems were able to be trained to win at Go: at least until the "multiple focus areas" strategy was used to trick those systems into making seriously "fatal" errors, resulting in them losing more often than they won.

Such failures demonstrate one of the common systemic flaws evident in many AI systems (particularly LLM-based neural nets): they don't possess a fundamental and complete world model from which to reason. This creates an unsolvable problem. Even in cases where 80%-plus of the answers provided by such systems through "hearsay" corpus pattern-analysis may be regarded as "good enough" to "good" results, it's the inability to detect and manage the potentially outsized impacts from the range of common errors in the remaining ~20% that makes these systems unusable in many if not most contexts that matter. Being unreliably reliable simply doesn't cut it without significant human oversight: therefore the cost models don't work.

One pattern that seems to be becoming increasingly evident is that when topic experts are working with LLM/GPT tools, they more readily notice significant fundamental problems with the responses produced for prompts that focus on their area of expertise.

For example, when very experienced and well-respected top-tier programmers review the code created by AI tools, they spot fundamental architectural and design problems that in the long-term make the code problematic.

These problems are less-evident to those with significantly less practical experience in whole-systems design and development or in specialist areas.

In some ways that isn't surprising: the knowledge base used to create the corpus for an AI-driven code-generation system comprises massive amounts of publicly available code created, by and large, by the "average" coder willing to share their work publicly. Some large portion of that code is likely copy-and-edited code, including copied implementation errors, design errors, etc. To at least some extent, some of the best code in the world is not available as input to LLM corpus generation: that code is likely to be highly prized proprietary IP, or to have security considerations, etc.

At best, what is available to LLMs represents a pretty poor repository on which to base a corpus, and a code-generation system built on it is likely to generate fairly "average" code.


I'm old enough to remember when I put an arithmetic problem into the Google search bar and it gave me some gibberish, and when they just programmed a calculator in, it was a good day.

I don't see why the average user will care if an agentic AI has been told when it needs to use a calculator.

It is WEIRD the way the current models will make a mistake, and you can break things down into little parts and then they see how to do it right. Very humanlike.

But on the bigger-picture question of "how do you have a good sense of calibration toward your own knowledge, strengths, and abilities, and the limitations thereof?", I fear we can't possibly reach AGI with the current approaches, because if you train anything on the internet you will get a radical oversampling of the glib, loquacious, self-assured, and obliviously wrong.


Again I ask, why is the goal to make AI as smart as humans? As CEO of an intelligent robotics company, my question was similar: why make androids? The point of automation is to solve problems. General purpose humans, or general purpose human brains, are hardly the best solutions to human problems, or market opportunities. It's like building a mechanical super-horse instead of inventing a Tesla.

Even for the transhumanists seeking to build the next generation human 2.0, the basis for evolutionary symbiosis between two entities is a division of labor. We should be figuring out which jobs AI and humans each excel at and transitioning toward a symbiont that combines the two (or three, if including robotics.)


the goal isn't to replicate humans, but we should hope for robustness and flexibility in the face of novelty


"We should be figuring out which jobs AI and humans each excel at and transitioning toward a symbiont that combines the two"

We already did that, the AI parts are called "computers".


There is good reason to doubt that most AI developers even know what it is they are chasing.

The evidence of that is that so many of them have very nebulous (often very different) definitions of AGI (when they even have a definition, that is)

The whole endeavor is very UNscientific (notwithstanding the recent award of Nobel prizes in chemistry and physics)

The AI field COULD be a science if more of its practitioners behaved like scientists.


If one does not precisely define the goal in a measurable, independently verified way (without cheating by providing the bots with test answers ahead of time), how can one ever know one has reached it?


I agree with Gary Marcus that scaling cannot get us from AI to AGI. To substantiate that claim, I've outlined at least nine concrete and fatal flaws of AI that cannot be solved simply by scaling: https://tomrearick.substack.com/p/ai-reset. There will be another AI winter and the technology that emerges after it will address each of these flaws.


To add to this list: AI doesn't know what it doesn't know.

I would bet most AI developers have never raised a child. When I ask a child a question they're not sure how to answer, they will ask me more questions, or say they don't know, or even speculate aloud on how they could get the answer.

But LLMs just make something up. Because they don't know anything at all, they are always guessing, and have no awareness of what they do not know. I agree with LeCun, LLMs are a dead end.

Bullshitting is not a sign of intelligence. But given how much certain tech leaders are salivating over LLMs replacing workers, I'm starting to wonder how much of THEIR jobs are just bullshitting...


"Because they don't know anything at all, they are always guessing, and have no awareness of what they do not know."

In other words, they don’t know what they don’t know because they don’t know.*

*And, worse still, they can’t know what they can’t know because they can’t know,


Agreed, I've children. 😜


Good post on your stack.

The "AI winter" (great term!) is going to last much longer than the one in Westeros.


Since it’s unclear what AI practitioners even mean by “AGI”, one might also call it “Unclear winter”


The problem with these commentators is that they operate on the basis of second-hand knowledge and do not formulate and run tests themselves. LLMs do not reason; they *cosplay* reasoning, going through steps which mimic *in form only* the steps that would be taken by somebody who knew what they were doing. Sometimes this yields an answer, sometimes it yields a thing that looks like an answer but isn't, and sometimes it yields ludicrously insane behavior. For a sample you can check out my elementary cryptanalysis tests of a bunch of them in the Substack post I just published, entitled "All LLMs are clueless but some may be useful."


That was a good one, thanks. Pretty astonishing that they can't even get the character counts right. You'd think that would be within reach.


The theory, which I have not been able to verify, seems to be that the fact they operate on tokens (word fragments) makes it impossible for them to do this directly. The ones that write and run a Python program to do the counts get it right.
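A small illustration of the mismatch, assuming the open-source tiktoken package as a stand-in for whatever tokenizer a given model actually uses: counting characters is a one-liner in code, but the model's input is a short list of word-fragment token IDs, not characters.

    import tiktoken   # pip install tiktoken; stand-in for a model's tokenizer

    word = "strawberry"
    print(len(word))                               # 10 characters, trivially computed in code

    enc = tiktoken.get_encoding("cl100k_base")     # a GPT-style byte-pair encoding
    token_ids = enc.encode(word)
    print(token_ids)                               # a short list of integer token IDs
    print([enc.decode([t]) for t in token_ids])    # the word fragments the model actually "sees"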


It's fine with me if an LLM can't do word or character counts, but NOT fine that the LLM will nevertheless confidently assert that it can.


If the bots were actually reasoning, they should not have to call a Python (or other) program to do something simple like count characters.

The bots obviously don’t even know what characters are. All they are doing even when they call a Python routine to do the counting is matching the prompt to the name of a Python routine (or description that accompanied it during training)

If no such routine exists (in python or any other language), the bot will simply fail the task.


I was in graduate school in computer science 40 years ago, and AI researchers then were saying that AGI was coming soon (only they didn't call it AGI back then; they talked about solving the general reasoning problem). Nine years ago, I began serving on my city's planning commission, which makes recommendations about land use decisions, and the constant drumbeat was "what are we going to do about autonomous vehicles?" To which my answer was, nothing, because they're not coming anytime soon, at least not in the way you're contemplating.

AI is the weirdest field. Some amazing things have come out of AI, but once they’ve solved a problem, then it’s like it’s not AI anymore, so they’re always in the position of overpromising and under delivering. No, I don’t have a self driving car, but I have radar assisted cruise control on my <$40K car. That’s amazing. My phone camera has built in image stabilization that’s incredible. AI as a tool is incredibly useful. I just don’t need it to think for me.


"I just don’t need it to think for me." - agreed, it's all about money - how to persuade you to buy something you really don't need. Wait, we'll coerce you to buy, for the greater good. 😈


"I just don't need it to think for me."

Good thing, because it doesn’t.


I am not an AI expert, but I know at least a little bit about a lot of things, like inferential statistics and machine learning, analytical philosophy and the problem of induction, deep learning models and the basics of their training.

It's not hard to be skeptical about the claim that AGI is coming when you ask a few very simple questions: Where is all the data that we haven't already used coming from? Do we have massive amounts of data for every problem that we need to solve in the economy? Will we ever? Do we have any theoretical way of training an AI to reason like a human pattern detector in situations where the data does not provide close-enough examples to get a correct answer? Can we reach AGI before the models start to get polluted by being trained on AI-generated data?

These are all pretty simple questions that don't require a PhD or an IQ of 150 to come up with. But it seems like only Gary Marcus is a voice of reason about this. Even if we have quantum computers, many of these problems remain, due to the current generation of models' dependence on massive amounts of data and to basic philosophical problems.


You know, Gary, Elon Musk likes to talk about working from first principles. It seems to me that the problem with machine learning is that no one is thinking from first principles. Oh, they may say they are, but they're not. Your "Algebraic Mind" is a first-principles kind of book. That's why it's held up for going-on a quarter of a century.

The problem that LLMs have with arithmetic calculation is a first-principles kind of problem. I'd think that anyone who considers, on the one hand, LLM architecture and operation, and on the other, how multi-digit calculation is done (computing partial results, storing them temporarily, and then bringing them back into the computational flow), and thinks about that in a first-principles way, should be clued in to a fundamental problem.


There are many more prima facie examples you can bring into play, but multiplication is probably the most basic one, yes. And it's baffling to me just how naive people are in brushing them away without any scientific rigour whatsoever.

Others include the halting problem and computability as well as recursion, which are fundamentally incompatible with the transformer paradigm*.

* And oh yes, of course you can show that Transformers are Turing-complete in principle and *at the same time* posit that the way they are used in LLMs clearly is *not*.


A$ we have $een, ba$ing one$ deci$ion$ and action$ on fir$t principle$ is overrated.


It is so strange how gullible all the tech journalists have been on AGI. This whole hype cycle has really made me reassess the level of skepticism I have with them going forward.

It’s also got me wondering… maybe this is why we have these delusional Elon fans that think he’s skilled in all technical (and now governmental?) domains.

I have never encountered any entity with true General Intelligence. Human or machine. Every highly skilled or intelligent person I’ve met is specialized. But for some reason tech journalists assume that any entrepreneur with great success has that general intelligence. It’s so odd.


Very many years ago, I used to tell friends that I saw the future of AI being in what would probably be called Artificial Specific Intelligence (ASI).

AI systems that were very narrowly focussed and extremely competent at a very specific task. That's what I would find most useful.

I code in C++, and I use AI whilst coding when I get stuck on a problem and need a push in the right direction. Often it gets itself into a loop of:

AI: "here is the code you asked for"

Me: "There is a bug in X and you didn't account for Y"

AI: "I apologise. Here's the corrected code"

Me: "That's the same as the last code you gave me"

AI: "I apologise. Here's the corrected code"

At this point I rephrase the question...

AI: "here is the code you asked for"

Me: "That's the same as the last code you gave me"

and so on in a never ending loop.

Why can't I have a smaller, less resource-hungry, more programming-focused AI that's been pre-trained on the specifics of the language I am using, rather than one that has been trained on all sorts of poor-quality code it managed to scrape from the internet (sometimes its own)? I really have no need to ask it questions about "the total carrot export trade of the UK in 2024" or some other non-programming-related nonsense that I Just Don't Need.


I'm curious: have you tried some of the recent models like Claude 3.7 in optimised or, honestly, somewhat ideal conditions, like small codebases (~20 kloc) built with detailed instructions?

As an example, the creator of HVM got it to build a smaller implementation based on technical descriptions, got it to successfully apply low-level optimisations, and even got it to translate the code to CUDA.

This is very impressive, and it can be expected that incremental improvements will make this work better at a practical scale. That makes me somewhat afraid, so I would appreciate a more critical and sceptical take.


I have to think that the "human-level AI is imminent" belief is behind Musk's firing of federal workers. He thinks that AI (and probably xAI specifically) will take over their work before long. Particularly with probationary workers: why hire or keep new workers when their job will be done by AI in a few years?

To be clear: this is taking a terrible all-eggs-in-one-basket risk (and personally I'm pretty sure it will fail). If AI doesn't work as well as Elon thinks, the Federal government will be in a heap of trouble. A *sensible* way to approach a transition towards more AI in government (or anywhere else) would be parallel implementation, a technique that has been used in software projects for decades. Implement the software (the AI in this case) but let it run for a couple-three years in parallel with humans, and evaluate where it works and where it doesn't. And where it doesn't work but the software people claim to have a solution, continue the parallel implementation to see if their fix actually works.
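A minimal sketch of that idea in Python (all names here are hypothetical, not any particular system's API): the AI runs in shadow mode alongside the existing human process, disagreements are logged for later evaluation, and downstream consumers still receive the human result.

    def parallel_run(cases, human_process, ai_process, disagreement_log):
        # Shadow-mode evaluation: the human process remains the system of record;
        # the AI result is compared against it and mismatches are logged.
        for case in cases:
            human_result = human_process(case)
            ai_result = ai_process(case)
            if ai_result != human_result:
                disagreement_log.append((case, human_result, ai_result))
            yield human_result        # downstream still consumes the human result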

Note: parallel implementation doesn't take care of the long tail of unusual events. We'll always have those, and therefore we'll always (at least for the next few decades) need humans in the loop.


I don't think you're correct about Musk/Trump's real plan.

From the start it's been obvious to all that the proposed savings were impossible: they were greater than the discretionary spending in the budget. The actual purpose is to provoke public demonstrations large enough to cover the pretext Trump needs to invoke the Insurrection Act (with the likely help of bad actors).


Sorry Gary, I didn't read your article. First of all, I am so tired of Ezra Klein. He is the new media golden boy. He does write well, but he has been wrong for so long; you would think readers would catch on.

Also, I know I have no idea how much AI has invaded my reading. But I find most of what I read nowadays is almost incoherent. Is that AI, or just dumb-ass journalists?

Thank you for your intelligent opinions.


If Klein had talked to scientists in 1895, 99% of them would have told him that they had basically solved “all of science.” So often the consensus even among experts is wrong. Anyway, I don’t think his podcast goes back that far.


but he’s usually better than that, which is why this one was so disappointing


As soon as we figure out the nature of aether, the final secrets of the universe will reveal themselves - 99% of physicists before Einstein.


Meanwhile, outside all this blather about A.G.I. and the fate of humanity...

I have spent my morning thusly: A colleague (MD, very smart, emergency medicine) has a poster I helped with a little that he presented at a conference. He asked for my help turning it into a short communication for journal publication. All fine.

But he couldn't send me the data. I've been asking him for the data for his graphics without getting a response. I sent emails, and yesterday I texted him. He called me, unsure what I need. "Oh, I haven't seen your emails. I get like 1000 emails a day. I can't read all that [deleted] [deleted]." So I send him the email while he watches. And it's off to the races.

There's a spreadsheet of data from experiments conducted that are hard to duplicate. He and a different colleague who traveled a long ways to help him with his data collection collected measurements of [redacted]. To do that, they developed a new method for accurately measuring [redacted]. (More complicated than it sounds, because [redacted] vary [redacted] based on [redacted], and he wanted [redacted] [redacted].) So, all that data they so painstakingly collected was entered into a spreadsheet. Not easy to collect. In fact most people faced with collecting it, would run screaming from the room---literally. Collecting data for [redacted] is very scary if you aren't trained, and could cause death or lifetime disability.

Thus the phone calls. Where is that [deleted]? He entered it in a spreadsheet "the way Microsoft works". That turns out in this case to be in "the cloud". So, over the phone, we eventually find it, sort of. He sends me a spreadsheet that is opened by his MS Excel by sharing it to me as a link. I download this file. And I look for the data. The data is in a different spreadsheet named "[redacted].xlsx". Can't find this separate dataset spreadsheet. So, to save time, because I'm "the software guy", we go through the exercise of finding the password for his cloud. This takes a good hour before he calls me back. Then, I sign in and get a code sent to his email. And Microsoft doesn't like that I am across the country, 1500 miles away. So we f**k around some more.

Finally, I get in, and root around for the data. Find 6 files that have some version of the data. Download it.

Hours wasted on this. This is real life. This is the Jetsons' future.

Then my mind wanders to some future "AGI" that has to deal with stuff like this. I expect that AGI will find a workaround, as I did, and then cement that workaround into itself as "the way to do it." This will turn simple file accesses into a thrashing floundering mess. Then, one day, something happens at Amazon Web Services (AWS) and... everything is gone. And nobody has a clue what is even gone, except that there are empty spots where there used to be "stuff" that "meant something."


You know that they will use the deceptive trick of defining AGI in whatever way suits their claims, for the victory parade, not your far clearer/tougher definition...

https://curriculumredesign.org/wp-content/uploads/The-Frustrating-Quest-to-Define-AGI.pdf


I want to go on record: I have never, ever seen anyone, ever, move the goalposts. ;-)


From Nature. Yesterday.

How AI can achieve human-level intelligence: researchers call for change in tack

"The AAAI report emphasizes that there are many kinds of AI beyond neural networks that deserve to be researched, and calls for more active support of these techniques. These approaches include symbolic AI, sometimes called ‘good old-fashioned AI’, which codes logical rules into an AI system rather than emphasizing statistical analysis of reams of training data."

https://archive.ph/hS18Y


Sounds so crazy, it just might work!!

One of Gary's subscribers wrote that. I bet.


"Imminent" AGI would have catastrophic effects on employment (he "Digidemic"), which would completely upend the economic framework in which AGI would operate. Anyone who touts imminent AGI should be frantically urging government action to mitigate those potential effects, as such mitigation would happen much more slowly than the spread of the catastrophic effects. The absence of any such urging from AGI-hypers suggests that their speculation is hollow self-promotion.


I would give Klein more credit here. We just had a little email interchange, and he is (I think) earnestly urging government action (and political thought) around how to mitigate what he sees as a large potential threat to employment.


This is actually what I found so frustrating: what feels possibly imminent is the EMPLOYMENT of "AGI", whether it's "AGI" or not.

That is ABSOLUTELY disruptive and VERY possible, setting aside the question of whether it qualifies.

And then you have to address the "what if it isn't?" question: implementing a not-actually-smart AGI as a replacement for humans would be enshittification of the actual labour market, somewhere between driving companies off cliffs and washing them out with slop that's sometimes close enough to right that everyone just sort of runs with it and things turn to a bland grey slurry. AND there's an employment crisis.

We don't need "actual AGI" for this crisis to be a real concern, and it was deeply frustrating that he phrased it "only" in those terms, though admittedly the intro put me off so much I couldn't finish the episode.


Yes, the shittification of corporate operations is a problem. But I certainly don't see it on a broad scale. I see corporations trying it and saying "This is worthless". They won't pay beyond pilot-ware; corporations are dumping payments for pilot-ware. Grok3 is given away on X if you pay the minimum. I calculate that my use of Grok has cost Elon at least $20,000 to provide to me, and probably 5 or 10 times that. It's not sustainable in the real world.

An exception is the "article for hire" also known as journalism, but even that depends on giving the service away.

Another exception is the crypto scammers and "Nigerian scam".

And I think hacking is using it.

IOW, economic monstrosities that are parasites and plagues. That's the big use case for this "AI". And even those need it given away as loss leader.

The only LLMs that aren't in this category are applications in biotech training their own models. But even there, they depend on previous work.

Economic and cultural warfare is a new area I think. It could be the greatest use case of all.


I am not convinced this is reliably going to happen in enough places to prevent the occurrence we are worried about. Happy-path testing is just so entrenched that, with the "ELIZA effect", we are seriously hurting. The *simple* tests of the chatbots I have seen that show they are full of crap are either not done before things get to me as a pentester, or somehow people don't care. I don't know which is worse.


> Economic and cultural warfare is a new area I think. It could be the greatest use case of all.

This is actually a very good point: I didn't quite articulate all of this, but I think "is it AGI?" is a distraction if you want to talk about the cultural ramifications. It doesn't have to be. Economic and cultural warfare, for example, don't require a "perfect AGI" to have the desired effects.

I do still worry that we *might* get a model fine-tuned enough for some companies to try to pull this off on some smaller scale, even if, in the end, you're probably right and they'd go "this doesn't have a useful increase in throughput, nevermind".

But some of it could be how long it takes to get them to that "nevermind". I'd like to imagine it would take a pretty special CTO or CEO or whoever to actually dump humans and take this on before realizing it doesn't do as well, but the incremental gains that additional training has provided are what give me this taunting fear: output will get so "close enough" that it will be enough of a mirage to encourage some actual adoption before it collapses in on itself.

Mind you, even though I firmly believe "adoption of crappy pseudo-AGI" is far more *possible* (hey, we already have it, all someone has to do is go all-in and try to replace people with it…)—I am still miles from the confidence in adoption happening that Ezra has, whether it qualifies as AGI or not.


So, RC,

".. a replacement for humans would be enshittification of the actual labour market .." this sounds like an even worse crippling of productivity to the remaining work force as well as the corporation entire - much like most of us older professional workers saw with Affirm-Action and DEI Quotas, 'no white man' hiring policies.

.. Meritless, raised fatherless (no good example) Feelie-thinky, resent-filled non-teammate disrupting slacking co-workers, what those fools that cause problems for the 20% that does 50% of the good work.

But everyone will pretend it was a good choice while leaving the productive few to suffer that poison-dog-vomit dropped in our laps.

God have Mercy.


I think he relies on UBI, no? Which would take cash out of the public's hands and give it to the wealthy, without anything of value being returned to the public.

A job guarantee should be the main program, so the public receives value from the dollars we spend--and not just a few (near-)monopolies.


Thanks for your comment. Good to know. For myself, I'd like to see public demonstrations of such urging at equal intensity with speculation about the imminence of AGI. The two are inseparable. Not to be overly judgemental, yet the speculation likely has more popular (and private) appeal than the urging.


That's a very weird comment. The LessWrong crowd has been the most strident with the claim of imminent AGI, and they have also been extremely concerned about the dire effect it would have on the human economy. See Zvi's article on human depossession, for example.


Or, it might suggest that they either don't give a sh*t, or are actually in favor of those catastrophic effects.
