114 Comments
Mar 15, 2023 · edited Mar 15, 2023 · Liked by Gary Marcus

Seems like AI development is becoming more about passing standard tests than tackling the hard problems of intelligence.

Hacks that create a hypeable and sellable product are what's favoured.

Mar 15, 2023 · Liked by Gary Marcus

If I had any artistic ability at all, I’d draw the following to try to help people understand the progress that’s been made here:

GPT-3: a blindfolded person throwing a single dart at a dartboard

GPT-4: a blindfolded person throwing two handfuls of darts at a dartboard

Mar 15, 2023 · Liked by Gary Marcus

Wouldn't "confabulation" be a more appropriate word than "hallucination"?

Mar 15, 2023 · Liked by Gary Marcus

"...it does astonishingly well on a whole bunch of standardized tests, like LSATs, GREs, and SATs."

Does it really? I asked it to generate ten sample SAT questions with answers. The first three looked fine. The fourth was "If 4x + 3y = 18 and 2x - y = 4, what is the value of x? A) 1, B) 2, C) 3, D) 4. Answer: B) 2." That's wrong, the answer is 3.

As my next question in the same session, I asked it "If 4x + 3y = 18 and 2x - y = 4, what are the values of x and y? Explain your answer." It came up with the correct answer of 3, and showed a reasonable method of solving it.
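
For anyone who wants to double-check that arithmetic, here is a quick verification of the system of equations (my own sketch in Python with sympy, not anything ChatGPT produced):

```python
# Solve 4x + 3y = 18 and 2x - y = 4 to confirm the correct answer is x = 3.
from sympy import symbols, Eq, solve

x, y = symbols("x y")
solution = solve([Eq(4*x + 3*y, 18), Eq(2*x - y, 4)], [x, y])
print(solution)  # {x: 3, y: 2}, so the correct choice is C) 3, not B) 2
```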

In one sense ("given a pile of multiple choice questions, does it choose the correct answers?") it does do astonishingly well. But if we step back and consider what those kinds of SAT questions are attempting to measure, which I assume to be reasoning ability, examples like the above seem to indicate that it doesn't have that,[1] that it is doing well on the SAT through some different means, and that the SAT questions we've been giving it have been doing a pretty poor job of measuring the abilities we expect the SAT to measure. In that sense, I find "does well on the SAT" to be incredibly misleading.

I think there's a lot of confusion (much of it deliberate) about what ChatGPT is doing and where this can be useful.

[1]: This is not a cherry-picked example; I see examples of obvious failures to reason from ChatGPT all the time, and not just in math. For example, it makes up URLs as references that don't lead to any page and that, after some investigation, appear not to be links that once existed and were taken down, but links that never served anything except a 404 Page Not Found.


Well written. It is not AGI. It’s a bigger and better version of predicting “what comes next”. Nevertheless, it is paving a path to discussions about security, alignment and other issues. Thank you for sharing your insights.


I cross-posted this, Gary. I thought it was a good assessment of strengths and weaknesses. I also agree it is not a path to AGI.

Mar 15, 2023 · Liked by Gary Marcus

Sir: screenshot of chatGPT-4? .... that mentions chatGPT-5??....

Thank you. You are above all AI experts so far in advancing a call to reject these toys as irrelevant, useless and plain primitive contraptions far removed from our Thoughts, Self-Awareness and unitary Consciousness.

But to me these algorithmic engines are a useful tool to exercise my curiosity and hopefully increase my utility as a thinking, helpful citizen of the World.


My assessment is that, in the ML world, GPT-N are amazing achievements; in the AGI world, however, they are dangerous lowest-common-denominator low-hanging fruit.


Its failure to improve on the AP English scores of GPT-3.5 (2 out of 5) also indicates a lack of true generalized intelligence. While most standardized tests ask students to answer questions which already have answers, the AP English tests ask students to read a selection of sources and form an original argument that creatively and analytically incorporates material from the sources.

Other tests have similar style questions, but because they test a student's knowledge of a subject matter, they ask questions related to highly researched areas in the field, and provide sources that scholars have thoroughly analyzed. The AP US History exam, for example, might ask a student to write about the most important causes of the American War for Independence, a subject for which there are thousands of books, articles, and publicly available lectures. It will use documents, like the Declaration of Independence or speeches from figures such as Thomas Jefferson, that frequently appear in secondary texts. GPT-4 can easily pastiche and paraphrase what scholars in the field have written to produce what would be an impressive looking essay for a 17 year old.

The English Exams, however, are not testing mastery of a subject matter, but skillfulness in synthesizing, analyzing, and creatively interpreting information. They can ask questions and provide sources which haven't been picked over by scholars. In other words, they can test the ability to truly think creatively and analytically in the face of new information or questions.


"GPT-4 seems to know little of 2024" - so true!


My quip about GPT-3 when talking to non-AI people was that "it's great at the stuff GPT-2 was good at, and bad at the stuff GPT-2 was bad at". I suspect I will be refurbishing that one-liner.


Does anyone else find the limitations of the LLMs, and therefore (hopefully) of deep learning generally, quite a relief? We are not ready for AGI; indeed, my hope is that it is at least a century away. So this potentially being a false dawn in the quest for AI makes me a lot more optimistic about humanity's future.


https://jamesclear.com/all-models-are-wrong

"In 1976, a British statistician named George Box wrote the famous line, “All models are wrong, some are useful.”

His point was that we should focus more on whether something can be applied to everyday life in a useful manner rather than debating endlessly if an answer is correct in all cases. As historian Yuval Noah Harari puts it, “Scientists generally agree that no theory is 100 percent correct. Thus, the real test of knowledge is not truth, but utility. Science gives us power. The more useful that power, the better the science.” "

It's likely this is a detour on the road to AGI: so what? Some can pursue AGI while others create useful tools. People generally build tools on current technology even while others work to improve that technology.

You note: "solve any of the core problems of truthfulness and reliability". Humans aren't entirely truthful or reliable and yet they are sometimes useful. There are concerns over a replication crisis even in the world of science and flaws noted in the peer review process. Humans are still trying to figure out the best approach to collectively seek reliable information while seeking "truth". Humans don't always agree on results using a judicial process to seek "truth".

Humans in general often don't agree on what is truthful or reliable, so making that a necessary hurdle sets an impossible goal and possibly attaches a constraint that would itself be a detour from the path towards AGI.

In the meantime: people need to grasp that machines can be fallible just like humans. They can compare human sources of information, machine sources, etc. Machines can aid with that process. Yes: tools and methods should be created acknowledging the reality of potential harms, just as people do already regarding other technology. People create anti-virus software and spam filters, etc.

The tech is invented by people trying to solve real-world problems: regulators don't invent the tech, and they usually merely distract from the problem and can in fact detract from solving it. Regulatory capture often leads to big players shutting out competitors, so despite the myths, big players often want regulation. Unfortunately, some humans don't try to find reliable information on all aspects of the subjects they write about.

Admittedly, of course, increasing reliability and accuracy about reality is a goal to strive for, since we'd like to improve on human reasoning: for instance, to give these systems the humility to consider that they may be wrong or unreliable due to flaws in the world or in themselves, something humans should also spend more time on, especially those who are listened to by the public.

You noted that a prior comment of mine was "condescending", but I'd suggest that reaction mirrors the one many have to your own comments, which read as implicitly condescending in the sense of not actually considering or addressing anything other than strawman versions of critiques of your writing.


I often think of human intelligence as having a pre-conscious thought generator to basically come up with ideas that might be bulls***. This is what we observe in mindfulness meditation as the monkey mind, or what some people have called "babel". But we also have an editor or curator who looks at the stream of babel and selects those ideas which make sense in light of a broad cognitive model of reality.

It seems to me like the current generative AI models do a pretty good job of brainstorming and remixing stuff to produce some raw idea material but they totally lack the editor/curator functionality of human intelligence. When using Bing chat, the human user is essentially playing the role of editor to the babel produced by the model.

I would like to understand more why the second part is so difficult to build into the systems themselves. It seems these companies take a very ad hoc approach with the alignment stuff, which is essentially a kind of dumb editor that is slapped on after the fact, and is not integrated in an intelligent way.

I know this is an unsolved problem, but I'm curious what are the current best ideas for how we might build such a combination.
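
To make the question concrete, here is a minimal sketch of what a generate-then-curate loop might look like: a "babel" generator paired with an editor that only lets through drafts it can defend. It's in Python with entirely hypothetical generate and critique functions supplied by the caller; it doesn't reflect how any vendor actually wires their systems.

```python
# Minimal sketch of a generator + editor/curator loop.
# `generate` and `critique` are hypothetical stand-ins provided by the caller.
from typing import Callable, Optional

def curated_answer(prompt: str,
                   generate: Callable[[str], str],
                   critique: Callable[[str, str], float],
                   threshold: float = 0.8,
                   max_tries: int = 5) -> Optional[str]:
    """Return the first draft the critic accepts, or None if none pass."""
    for _ in range(max_tries):
        draft = generate(prompt)          # the "monkey mind": cheap brainstorming
        score = critique(prompt, draft)   # the editor: judges the draft against a model of reality
        if score >= threshold:
            return draft                  # only ideas that survive review get out
    return None                           # nothing passed; better silence than confabulation
```

The hard part, of course, is the critique function itself: today's alignment layers are roughly this shape, but the critic is far shallower than the generator it is supposed to police.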


Inspired by David Deutsch's work, I asked ChatGPT if its hallucinations are a bug or a feature in pursuit of AGI. Who cares about the truth? No one, apparently. Are we deploying misinformation at scale in pursuit of AGI? Yes, according to ChatGPT4: https://twitter.com/AAlmeidaAuthor/status/1635919608338317312

Mar 15, 2023 · edited Mar 15, 2023

It seems to me that AI is confronting us not with an issue of information complexity but with an issue of resilience. Scaling is not, and has never been, something humans in general have had a problem with. We adjust to scale EXTREMELY well.

Similarly, I don't think the issue with AI is its inability to correctly process information. It will always be flawed, like any artificial tool. The greater threat, in my opinion, lurks in this topic's slipstream. As AI becomes more and more reliable and better at returning believable information, humans get more reliant on it. For as well as we handle scale by developing tools to deal with it, we're all the more ready to apply these tools at any scale once we have them. AI, while useful, is just one more temptation (I realize I'm sounding vaguely biblical or at least preachy here - not my intention at all!) to let the already severely lacking level of media literacy slack off further.

I'm observing this in myself regarding creative pursuits. Writing a story with AI assistance is easy, and I actually have to force myself to write without it by now. It's becoming a real fight to resist the temptation, and while the AI comes up with some great ideas, its output sometimes lacks the direction and "foresight" that a story written by a human author (who, after all, tries to get to a point of some sort eventually) would have. But the only way to recreate that direction is to take the lead yourself, and that requires actually walking the walk while the AI seductively offers you a comfy seat in the back of its metaphorical taxicab.
