I love the cope here from Altman and Mollick. "It's a different kind of intelligence", "there's a magic to it I haven't felt before", "I am talking 'vibes' not scores".
Yes gentlemen, this is what bias feels like from the inside.
It’s a different kind of intelligence.
An unintelligent kind.
That is unfair toward all other kinds of bias - bias is always stupid, but the reasons you incur it are not necessarily stupid.
There is a genuine sense of less 'AI noise' in the output with Grok 3 and likely GPT-4.5, which makes it feel more natural, but calling it a different kind of intelligence is just stupid; it's more comparable to better curve fitting in electronics, I guess.
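To make the analogy concrete, here is a toy sketch of my own (obviously nothing to do with how these models are actually built): a higher-capacity fit leaves smaller residuals, so the output looks cleaner, but it's still the same machinery, just fit better.

```python
# Toy illustration of the "better curve fitting" analogy (illustrative only;
# real LLMs are not polynomial fits). A higher-degree fit leaves smaller
# residuals -- less "noise" in the output -- but it is still just curve fitting,
# not a different kind of machinery.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(50)  # noisy signal

for degree in (1, 3, 9):
    coeffs = np.polyfit(x, y, degree)        # least-squares polynomial fit
    residual = y - np.polyval(coeffs, x)     # what the fit fails to capture
    print(f"degree {degree}: RMS residual = {np.sqrt(np.mean(residual**2)):.3f}")
```

The point being: smaller residuals feel like "less noise", not like a new kind of intelligence.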
Funny how both Claude and ChatGPT are asking me to “save money” by paying up front instead of month by month….
I wonder. Anthropic tells me their "special offer" ends tomorrow. I probably won't take it. How long before they make another special offer?
AI has been a huge gift to mostly engagement farmers. Practical uses remain rather narrow.
UBI isn't coming, your disease won't be cured by LLMs, and you will struggle more to find reliable information in the wasteland of hallucinations filling blogs, podcasts, etc.
Exactly. But people are still surrounded by hype and it’s easier to get promoted if you agree with common knowledge 🤨
I mean, even if they got AGI, UBI would still never come. Total fantasy. The oligarchs aren't shoveling billions in to help people like you and me, they're doing it in the hopes of no longer needing us and killing us off. It might generate trillions in GDP, but people like us would see zero cents of it. I don't think I'll ever get why the AI evangelists online think they won't die in the bread line with the rest of us.
Correct. Billions of dollars in hopes of creating the wish-granting machine. And if that were possible, who gets the first wish? Utopian dreamers really haven't thought this through as to what would likely happen.
Thanks for "wish-granting machine".
About to Grant the Impossible. YEP. Thank you Mr. Altman.
The Anthropic CEO went to Davos, said "AI will double the human lifespan by 2030," and everyone sat there in awe instead of laughing at him until he cried, which is what they should have done.
He really said that?
You can always make a point about something stupid being smart from the perspective of a narcissist getting what he wants, but this is stupid intelligent stupid.
But we should all understand that when they say, "AI will double the human lifespan," they aren't talking about you and me, not talking about the species. They're talking about those oligarchs who, you know, are the only ones who really matter.
Gary Marcus, it would be great if you could test OpenEvidence, an LLM specialized in clinical medicine. Doctors use it everywhere, so if OpenEvidence hallucinates as much as ChatGPT, the situation would be tragic.
Does OpenEvidence transparently document how their "AI" works and what tasks and results they stand behind? From a quick search I see...
OpenEvidence is not peer-reviewed
It's not a substitute for clinical expertise
It doesn't provide medical advice, diagnosis, or treatment
I agree that OpenEvidence includes disclaimers and safety measures, such as requiring users to be healthcare professionals (e.g., physicians) and restricting its use to accessing and analyzing clinical evidence rather than serving as a substitute for medical advice. However, many doctors rely on OpenEvidence as a source for the medical advice they give their patients. Moreover, the claim that OpenEvidence does not hallucinate disturbs me. In the link below, Sequoia Capital states:
"The platform searches across 35 million peer-reviewed publications—and, thanks to a new content partnership, is trained on The New England Journal of Medicine. There are no hallucinations; if the literature is inconclusive, OpenEvidence simply doesn’t respond."
https://www.sequoiacap.com/article/partnering-with-openevidence-a-life-saving-healthcare-revolution/
Did yet another cycle of the "Please solve this cryptogram" test. Claude 3.7 got the furthest. GPT 4.0 actually regressed since the last test -- it passed the first hurdle last time and failed it this time. All the others failed the first hurdle. Will have a report out on my Substack as soon as I recover from an afternoon with stupid robots.
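For anyone wondering what kind of puzzle I mean: a plain monoalphabetic substitution cryptogram. A toy generator (purely illustrative, not my actual test prompts) looks something like this:

```python
# A trivial monoalphabetic-substitution cryptogram generator -- just the general
# idea of the puzzle type, not the actual prompts used in the tests.
import random
import string

def make_cryptogram(plaintext: str, seed: int = 42) -> tuple[str, dict]:
    rng = random.Random(seed)
    letters = list(string.ascii_uppercase)
    shuffled = letters[:]
    rng.shuffle(shuffled)
    key = dict(zip(letters, shuffled))               # substitution key
    cipher = "".join(key.get(c, c) for c in plaintext.upper())
    return cipher, key

cipher, key = make_cryptogram("THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG")
print(cipher)   # hand this to the model and ask it to recover the plaintext
```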
Is it just me, or is Mollick sounding less and less academic and thought-leader-ish, and more like a balanced fanboy?
I think what you're observing has been the case since the beginning of '23; I have not seen a difference.
Well, sure, you are right. AGI or anything related isn't going to happen. Just like 'the electronic superhighway' of the 1990s (a.k.a. the internet) wasn't going to bring us perfect information for all, democracy everywhere, and a new economy that would make everybody insanely rich while everything was free, etc. Dehyping such nonsense predictions is a bit like shooting fish in a barrel (been there, done that, thirty years ago). I predict you're going to tell us next year you were right (and you will be, I strongly suspect).
But can we make a decent guess about what *is* going to happen? A dot-com-bust-like correction at some point? But what else? If you understand the tech, it's easy enough to say some stupid prediction *isn't* going to happen. But what might happen? A lot of GenAI-generated noise/junk is a pretty obvious one, but some decent value too, I suspect. GenAI-based conversation bots for lonely people? Anything?
Well, LLMs have cut down the time it takes me to look things up on Wikipedia really quite dramatically. Less so if you include the time taken to check that the answers are actually true.
Can you comment on what your workflow is?
For me, typing "w keyword" in my browser is still ten times faster and a hundred times more energy-efficient than a query to an LLM that might also be less factual than Wikipedia.
As a researcher in mathematics and computer science, I find that current LLMs have cut the time I have to spend learning new things outside my specialization area by roughly 10x. I can instantly find what I need without having to read dozens or hundreds of articles, which was daunting before. I can now ask for an intuitive explanation, and with that in mind I can move far faster through the actual papers. So, not AGI, but incredibly useful assistants. But LLMs only give back what you give to them. If you ask the right questions and you have good intuitions, they will do the boring work for you, but they are not creative.
yeah, I more or less buy that. I don't know about 10x but it's quite helpful.
I was being somewhat facetious, but with that said, it does save time on queries like "does Alzheimer's disease affect cortical iron content". Then you can follow up on the web with what it answers. Just typing those terms into DuckDuckGo doesn't always produce very targeted results. Also, if I'm reading a paper that is somewhat outside my field, it's good for quickly getting to grips with things; e.g., it quickly gave a very useful answer to "how does CNPase change with age".
So yeah, I do find ChatGPT useful. I would easily survive without it, though. I'm about to do some coding and I do expect it to help speed that up. Generally it is good if you are already good at something and can critically assess its output. That seems to be the consensus that's emerging.
This is my question as well. When is the correction, and what benefits and harms will stay even after the correction? I fear one harm is even more people leaning into cognitive bias, and a further erosion of trust.
The money flowing in won't dry up, because the potential payoff of getting to lay off all their human workers forever will be too much for the oligarchs to give up on, so sadly it'll just keep going as is.
While there is the platitude about markets staying irrational longer than (most) can stay solvent, burning through capital without tangible ROI *will* eventually come to an end.
Who buys their products when that happens?
Bots will buy bots to make more bots.
Any other questions?
Does it matter? They'll control all the capital and none will be distributed. Social platforms will just be what they are now which is just bots.
Do lonely people like to talk about the health benefits of eating crushed glass and the cooking benefits of glue on pizza?
Twice the cost of Apollo; that is unreal, Gary!!
Haha, good comparison. Especially since there is no way I would set foot in a landing craft built by any of these models anytime soon...
Tesla landing craft 😀
Even irrational exuberance eventually comes to an end.
It’s spelled “AIrational”
No matter how you spell it, irrAItional or even irrationAIL, the results are the same.
It’s all AIr
You earned a victory lap for sure :D
I must say I do like Tyler Cowen, and have been extremely dismayed by his credulous, non-academic approach to AI. It makes me wonder whether the only reason I find anything he says interesting is Gell-Mann amnesia. Maybe he is all enthusiasm and information, and no actual thought in everything he's doing, not just in discussing AI.
I mean: the podcast linked here is just nauseating.
Same guy who interviewed the Microsoft CEO last week. Dude actually asked if they were getting close to developing immortality and the CEO was just like "this crap can't even sort my email bro."
You're right, Gary. Scaling is a shambles and isn't going to recover. And, as I pointed out somewhere and you've pointed out, these so-called reasoning models actually seem to employ a bit of symbolic computing in their architecture, a bit of expert-system search and control. Alas, I fear these guys are likely to double down on it and throw good money after bad, which is classic sunk-cost behavior.
And yet, what these LLMs can do is utterly remarkable. For example, I typically work across two or three disciplines at a time – chosen from cognitive psych, computational semantics, neuroscience, literary criticism, anthropology and a bit of this and that – and so have trouble getting knowledgeable feedback on my work. But Claude does that for me.
Here's its evaluation of a series of experiments I did on story variation in ChatGPT: https://new-savanna.blogspot.com/2025/02/claude-37-evaluates-my-paper-on-story.html The experiments were, in turn, suggested to me by the work Claude Lévi-Strauss did on myth back in the 1960s. (And, for what it's worth, Sheldon Kline at Wisconsin did an Old School model of Lévi-Strauss's myth work in the late 1970s.)
But it just praises your paper. Academia is mostly about giving and receiving tough but constructive criticism, which I think is far harder to do. At least it is in the sciences.
Oh, I know what Academia is about. It's also about gatekeeping. Gatekeeping has the appearance of tough criticism, but it is a very different activity.
Claude's criticized other work I've done this with. Feel free to read that paper and criticize it. Though, if you do so, I'd appreciate real criticism, not the equivalent of "but we don't do things that way."
Unfortunately, like everyone, I am already turning down journal requests to review papers all the time. If LLMs can help with that problem, it's a good thing; however, I'm pretty dubious from what I've seen.
Your project is very interesting. I wondered about a couple of things. 1. The use of "colorless green ideas": given that this is a known expression that ChatGPT was able to reference, did you attempt to use another syntactically correct but semantically meaningless expression, and would it have given the same or a similar result?
2. What is your evaluation of the ChatGPT evaluation? Do you feel it added anything insightful, either to the work you asked it to 'evaluate' or to your understanding of its methodology or capabilities? Thanks for sharing this.
1.) On your first question, I doubt that the familiarity of the phrase had anything to do with ChatGPT's response, but it's such a strange beast that you don't really know until you try.
At this point I've done over 200 story-variation tests, and hundreds of other tests as well. I've used other nonsense phrases in other contexts; I don't offhand recall just what you're asking about. I noted in the paper that later on I did try the green ideas phrase, but that time ChatGPT told a story. If you check the link in the paper I think you'll see that it told a story where the ideas acted as ideas, in dreams or something like that. I was unable to replicate that exact experiment. Given that OpenAI kept changing the version, it's hard to tell what's going on.
Later on I asked ChatGPT to tell a story, no specification beyond that. So it told a fairytale about Lily. I then began prompting for more information about Lily and for more episodes in the ongoing story. At one point I introduced a flying saucer and a robot named Gort. Here's how ChatGPT responded:
"It sounds like you might be referring to a classic science fiction film called "The Day the Earth Stood Still," which features a robot named Gort and a flying saucer landing on Earth. However, as Lily's story has not previously included elements of science fiction or extraterrestrial visitors, I'm afraid I cannot answer this question within the context of her story. If you have any other questions about Lily and her world, I would be happy to try and answer them for you!"
That's quite similar to the green ideas response. ChatGPT had a sense of what was (ontologically) appropriate for the story and was unwilling to violate that sense. You can find that interaction on my blog: https://new-savanna.blogspot.com/2023/02/exploring-lilys-world-with-chatgpt.html
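If anyone wants to run this sort of probing at scale, a bare-bones script against the API would look something like the sketch below. To be clear, my experiments were done by hand in the ChatGPT web interface; the model name and prompts here are just illustrative placeholders.

```python
# Bare-bones sketch for scripting story-variation probes via the API.
# Placeholder model name and prompts; the original experiments were done by
# hand in the ChatGPT web interface. Requires the `openai` package and an
# OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def continue_story(history: list, prompt: str, model: str = "gpt-4o-mini") -> str:
    """Append a user prompt to the running story and return the model's reply."""
    history.append({"role": "user", "content": prompt})
    response = client.chat.completions.create(model=model, messages=history)
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

history = [{"role": "system", "content": "You are telling an ongoing story."}]
print(continue_story(history, "Tell me a story."))
print(continue_story(history, "A flying saucer lands and a robot named Gort steps out. What happens?"))
```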
2.) On the second issue, I asked it to review that paper in the context of a longer interaction. I began that interaction by asking it to evaluate a long and complex theoretical paper about language and cognition in the brain. It pointed out strengths and weaknesses in the paper and that led to a discussion, not only of that paper, but of an ongoing collaboration I'd begun with a machine vision expert, Ramesh Viswanathan, at Goethe University Frankfurt. It was in the context of that discussion that I uploaded the story variation paper. Why? That's the paper that motivated Ramesh to contact me. What I got from Claude was simply that the paper presented a sensible line of research.
On the one hand, that doesn't seem like much. But when I did that paper, I wasn't undertaking a standard kind of investigation. Rather, I was undertaking something that, as far as I knew, I had made up without precedent. When you do that kind of thing, which I've done a few times, it's useful to have a simple reality check: Is this anything, anything at all?
In this particular case, I already had Viswanathan's approval, which is significant because his background is quite different from mine. In particular, he has a great deal more mathematical expertise than I do. Still, the two of us could be out to lunch on this one.
But Claude 3.7 has, for all practical purposes, been trained on the whole literature (up to its cutoff date). In some sense it "knows" much more than Viswanathan and I put together. That's worth something. Just what it IS worth, I don't know – more than peanuts but most likely somewhat less than gold.
Thanks for your work! My daughter Erica turned me on to it. https://4dthinking.studio/ux-book
I've written 6-7 "AI" rants lately. I would like to call your attention to this one:
https://portraitofthedumbass.blogspot.com/2025/02/hallucinations-my-ass.html
I really feel like everyone discussing LLMs should shun the "hallucinations" term. This was obviously invented by some marketoid to conflate LLMs with human brains, which is complete BS. Let's call them "errors" or "mistakes" or "bullshit" or whatever, but not "hallucinations". That gives them an upgrade they don't deserve.
Gerben Wierda suggested calling them "failed approximations" in his insightful blog post https://ea.rna.nl/2023/11/01/the-hidden-meaning-of-the-errors-of-chatgpt-and-friends/; I strongly recommend reading it :)
Really interesting. I think I will ask ChatGPT what it "thinks" about that article.
All the best marketing ideas are stolen. In this case, from the 1983 movie "War Games". Near the end of the movie Dr. Falken states, "Joshua is hallucinating, he is trying to trick you to get the codes". OpenAI's marketing, entirely driven by anthropomorphism, is how they trick you to get your money.
Adding tens of thousands of GPUs: just how much energy is this using? That's what I'm really concerned about.
They keep doing the same thing and expect the result to be different. Where have we heard that before?? It doesn't take an Einstein to figure it out.
Hype and no substance to keep the money spigot going!!!