114 Comments
Mar 15, 2023 · edited Mar 15, 2023 · Liked by Gary Marcus

Seems like AI development is becoming more about passing standard tests than tackling the hard problems of intelligence.

Hacks that create a hypeable and sellable product are what's favoured.

Expand full comment

I think "hypeability" is not the only thing that matters here, but actual problem-solving.

As far as I've seen, GPT-4 can solve many real-world problems, such as programming a Chrome extension. That may not sound humanity-advancing, but it has value nonetheless.

Judging GPT-4 solely on its "true intelligence" is not sensible.

Expand full comment

I’d be curious to know how you view the effect this is going to have on “human programming” at large. As someone just starting out in a web development career, I know how much I’ve learned by having to think through every piece of code myself, even opting to use as few third-party tools and libraries as possible in order to first get a good sense of the underlying systems at play. If programmers start to increasingly rely on the current crop of AI systems, it seems to me there’s a danger of deskilling here, especially in a premature way where we are still dealing with models that hallucinate and lack the ability to “truly reason” themselves.

To be honest, there’s a very personal feeling of loss involved here as well, since I genuinely enjoy writing code, though ultimately, of course, it’s hard to argue with the economics of the matter and I suppose the holy grail of AI research is to one day make such human work obsolete anyway.

Expand full comment

I'm just sceptical that anyone is going to invest in the heavy lifting needed to really approach AGI - salience/relevance, composition, etc. I think the current incentive landscape simply favours short-term, low investment goals.

Expand full comment

Absolutely. That's why AGI will need work from universities and non-profit research centers around the world, which are not worried about profitability.

Expand full comment
Mar 25, 2023·edited Mar 25, 2023

I think, also, full AI will require a situation in which HUMANS are not worried about profitability, otherwise expect big anti-AI movements.

Capitalism will have to be modified somewhat: the early AI pioneers either forgot or simply handwaved away the slight problem that if a robot works as well as a human we all starve to death under the system as is.

Expand full comment

Right, AI is not neutral with respect to capitalism. Many of the futuristic hypotheses forget the simple fact that AI progresses only insofar as it serves capitalist interests…

Expand full comment
Mar 27, 2023·edited Apr 8, 2023

Thank you. At least one other person in the comments section knows this.

It is one of my greatest annoyances with AI that worries about rogue AI (which at the moment is worth worrying about roughly as much as is the sun becoming a red giant) seem to predominate over the employment problem (which is happening NOW).

I see exactly 4 comments, the 3 in this discussion included, out of 84 in the comments section mentioning this.

OK, rant over.

Anyway, that is something that will have to be dealt with. How do you propose dealing with it? I've thought about this for a while and keep coming back to UBI. I feel like I'm missing something.

Another issue with AI + capitalism is that business owners tend to assume they will make more profit if they remove the humans (lower input cost).

Just one small problem: where do you get profit if no one has any money?

Expand full comment

One is an intermediate step toward the other. Just like humans, LLMs have to go to high school and college before they can work on their PhDs.

Expand full comment
Mar 15, 2023 · edited Mar 15, 2023 · Liked by Gary Marcus

AI researcher Yannic Kilcher pointed out that standardized tests assume that the person taking the test is good at the things humans are typically good at. They are meant to compare humans to other humans in areas with high variance. They exist, in short, to measure things that humans are typically not so good at.

That an AI is good at things humans are often bad at is noteworthy, but it isn't new. It also doesn't say much about whether they're good at things humans take for granted, which is the main problem for LLMs in this day and age.

Expand full comment

Think about this in terms of problem domains. Not all problems solved by humans are in the LLM solution set. But what I can tell you is that there is a trend of more and more models (LLMs, DNNs, RNNs) that solve human problems. The trend in our models is converging toward AGI, though a trend is no guarantee of future returns. Will we hit a wall? Sure, but the problems we can now solve were previously solved only by humans. These new models simply expand the domain of problems computers can solve. The question is: in our current economy, what percentage of problems is now in the LLM bucket?

Expand full comment

I don't think that next-token prediction is sufficiently sophisticated to bring about the emergence of the intelligent behaviour we're after. LLMs might be a good fundamental way of compressing a lot of textual data, but it has to be coupled with a different paradigm. GPT-4 is still unable to generate any novel hypotheses about anything despite having been trained on virtually all the knowledge available. It is unable to admit its ignorance and will always be confident about falsehoods. At this stage, it seems crazy to believe that adding more and more parameters and throwing more and more compute at it will magically cause the qualitative leap to true intelligence that we still haven't seen. A newborn baby exhibits a deeper intelligence in some sense than GPT-4, by being able to meta-learn, that is, continually upgrade its own learning algorithm. GPT-4 is following the same trend as most ML/AI publications nowadays: add more layers and parameters, spin up more GPU instances, and get your 1% relative improvement on some ancient and unrepresentative benchmark. We really need to start getting more creative than that.

Expand full comment

I think, in order to believe that our current trajectory with these kinds of AIs will lead to AGI, you have to believe that intelligence will simply emerge once we have enough processing power. That all problem-solving comes back to one type of problem-solver.

With cog neuro constantly advancing, it seems to me like that's getting less and less likely.

Expand full comment

Yeah, I'm not too opposed to the idea of all problem solving coming back to one type of problem solver, or that there is an elegant algorithm for AGI that is yet to be discovered. What I do know is that the algorithm behind GPT-3/4 is not that.

Expand full comment

To me, it's like we don't really understand how humans problem-solve, except that it seems to have something to do with rapidly discounting non-productive avenues of investigation, and we don't know how the brain does this. And then it's like we think we can just data-crunch our way through as an alternative. I mean, it's worth a try but I'm not hopeful.

I think the best thing that might come from the whole AI debacle is that we realise that the human brain is somehow doing some pretty amazing stuff and we need to study it more.

Expand full comment
Mar 17, 2023·edited Mar 17, 2023

Lots of opinions in this comment thread, no references to current research or arguments beyond your individual perception. Honestly, I don't think we have studied these *transformer* models (LLM will soon be a misnomer or is already) for long enough after they have multimodal capabilities. Multimodality seems to improve generalization (anyone expecting perfect performance at this stage in the game may be jumping the gun); see Google's new PaLM-E for instance. Don't discount transformer models yet. Not saying a future algorithm might not do the job better, but I sincerely think this is the right track. With most new technologies, when introduced, people often dismiss the first publicly hyped products, only for the product to mature and do exactly what was advertised on the tin a few years later. I recognize this is somewhat different, but I still see a similar thing going on here. The differences between a human mind and an LLM, and the way it learns, seem to me not to be indicative of a lack of generalization or abstraction but merely an effect of single-modality inputs and lack of real-world experience; more a lack of input quality than of intellectual substance. I think it is a false assumption that the current limits of the current technology indicate a "brick wall" rather than a continued progression. I don't see evidence of a "brick wall", but more of a viral public doubt solely springing up as a counter-effect of the wave of AI optimism. So far, none of the "transformer models can't have real intelligence, it's just statistics" arguments have been any more convincing than the continuing results seen from parameter scaling, as predicted.

Expand full comment

You're saying that learning to fake maturity is a necessary step to attaining maturity? I don't buy that.

Expand full comment

I cannot prove it either. I'm not an AI expert or a cognitive psychologist. But drawing on the only other advanced intelligence on this planet, I can form some hypotheses.

Expand full comment
Mar 15, 2023 · Liked by Gary Marcus

If I had any artistic ability at all, I’d draw the following to try to help people understand the progress that’s been made here:

GPT-3: a blindfolded person throwing a single dart at a dartboard

GPT-4: a blindfolded person throwing two handfuls of darts at a dartboard

Expand full comment

That's what Midjourney is for. :)

Expand full comment

As a software developer, I have found GPT to be very helpful with code generation. It is already useful today, in my daily work. To expand on your analogy of throwing darts: imagine its thrown darts are added to my throws. I don't need it to hit the bullseye every time, I just need it to be helpful.

Expand full comment
Mar 15, 2023 · Liked by Gary Marcus

Wouldn't "confabulation" be a more appropriate word than "hallucination"?

Expand full comment
author

Yes; I should probably switch

Expand full comment

Why bullshitting is better than lying for AI 'mistruthing':

"It is impossible for someone to lie unless he thinks he knows the truth. Producing bullshit requires no such conviction. A person who lies is thereby responding to the truth, and he is to that extent respectful of it. When an honest man speaks, says only what he believes to be true; and for the liar, it is correspondingly indispensable that he considers his statements to be false. For the bullshitter, however, all these bets are off: he is neither on the side of the true nor on the side of the false. His eye is not on the facts at all, as the eyes of the honest man and of the liar are, except insofar as they may be pertinent to his interest in getting away with what he says. He does not care whether the things he says describe reality correctly. He just picks them out, or makes them up, to suit his purpose."

From the essay "On Bullshit" by Harry Frankfurt.

Expand full comment

The technical term is "bullshitting".

From Wikipedia:

... the term "bullshit" is sometimes used to specifically refer to statements produced without particular concern of truth, to distinguish from a deliberate, manipulative lie intended to subvert the truth.

Expand full comment

I think "hallucination" is preferred over "confabulation", or just "lying", because it reflects a lack of intent.

Expand full comment

Confabulation seems much more precise. The stories I'm getting from ChatGPT have local coherence but are globally incoherent, like the stuff a bullshit artist might make up out of a few facts at hand. This doesn't match my understanding of hallucination.

Expand full comment

Confabulation and bullshit imply agency. IMHO, that ChatGPT has no agency is one of the most important things that the public needs to know about this technology. What is it about hallucination that bothers you? I'm not sure we're ever going to find the right word for this as our words all refer to human behavior but, so far, hallucination seems the closest.

Expand full comment
Mar 15, 2023 · Liked by Gary Marcus

"...it does astonishingly well on a whole bunch of standardized tests, like LSATs, GREs, and SATs."

Does it really? I asked it to generate ten sample SAT questions with answers. The first three looked fine. The fourth was "If 4x + 3y = 18 and 2x - y = 4, what is the value of x? A) 1, B) 2, C) 3, D) 4. Answer: B) 2." That's wrong, the answer is 3.
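
For reference, solving that system by substitution (a step the quoted answer never shows):

2x - y = 4  =>  y = 2x - 4
4x + 3(2x - 4) = 18  =>  10x - 12 = 18  =>  x = 3, y = 2

so x = 3, and B) 2 cannot be right.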

As my next question in the same session, I asked it "If 4x + 3y = 18 and 2x - y = 4, what are the values of x and y? Explain your answer." It came up with the correct answer of 3, and showed a reasonable method of solving it.

In one sense ("given a pile of multiple choice questions, does it choose the correct answers?") it does do astonishingly well. But if we step back and consider what those kinds of SAT questions are attempting to measure, which I assume to be reasoning ability, examples like the above seem to indicate that it doesn't have that,[1] is doing well on the SAT through some different means, and the SAT questions we've been giving it have been doing a pretty poor job of measuring the abilities we expect the SAT to measure. In that sense, I find "does well on the SAT" to be incredibly misleading.

I think there's a lot of confusion (much of it deliberate) about what ChatGPT is doing and where this can be useful.

[1]: This is not a cherry-picked example; I see examples of obvious failures to reason all the time from ChatGPT, and not just in math. For example, making up URLs as references that don't lead to a page and, after some investigation, appear to be not links that were there and were taken down, but links that never served anything but a 404 Page Not Found.

Expand full comment
author

Excellent observation

Expand full comment

It seems to me that the performance at tests such as GRE or LSAT is mostly for publicity and hype purposes.

But there are other tests far more relevant for areas such as commonsense reasoning where interesting results are reported; for instance, in the WinoGrande test (involving pronoun resolution, e.g. deciding what "it" refers to in "the trophy doesn't fit in the suitcase because it is too big") we can see a big improvement in GPT-4 compared with both previous GPT versions and also other systems like Chinchilla.

Commonsense reasoning improvements won't make big headlines, but are real progress nonetheless.

Expand full comment
author

Their sentence there is slippery. It’s not even in the top 10 of the WinoGrande leaderboard, afaik

Expand full comment

As usual, Ian Bogost ("Is This the Singularity for Standardized Tests?" _The Atlantic_ 2023-03-21 [1]) does a great job of explaining it:

"But why should it be startling in the slightest that software trained on the entire text of the internet performs well on standardized exams? AI can instantly run what amounts to an open-book test on any subject through statistical analysis and regression. Indeed, that anyone is surprised at all by this success suggests that people tend to get confused about what it means when computers prove effective at human activities."

[1]: https://www.theatlantic.com/technology/archive/2023/03/open-ai-gpt4-standardized-tests-sat-ap-exams/673458/

Expand full comment

The math failures are good examples of the fundamental limitations of LLMs + RLHF. If the system was in any way intelligent, it would have developed a subroutine for these calculations. I mean, it's doing seemingly more impressive things with code than a mere calculator. Of course, given the language modeling paradigm with its primitive optimization function, it's unsurprising that it fails to do it. But more importantly, no amount of parametric upscaling or additional GPUs will change that. The only thing that will change that is a paradigm shift.
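
To make the "subroutine" idea concrete, here is a minimal sketch (purely illustrative, and not anything GPT-4 actually contains) of routing the earlier SAT system to an exact solver instead of predicting the answer token by token:

```python
# Illustrative only: offload the arithmetic to a symbolic solver (sympy)
# rather than asking a language model to guess the answer.
from sympy import Eq, solve, symbols

x, y = symbols("x y")
solution = solve([Eq(4 * x + 3 * y, 18), Eq(2 * x - y, 4)], [x, y])
print(solution)  # {x: 3, y: 2} -- the answer the quoted SAT item got wrong
```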

Expand full comment

Perhaps this also points to flaws in how the SAT is designed and what it is supposed to measure?

Expand full comment

In a way, yes, and in a way, no. The SAT is certainly flawed, and these flaws are fairly well known and well discussed. But it is also a useful tool when used for its original intent, and is not truly terrible in the way that some other tests are.

I think that the problem with using it to "test" ChatGPT is as much or more about it simply being the wrong thing for that domain: just because it's a decent test for humans does not mean that it's going to be any good at, or even able to, test for the same thing in a machine-learning model.

Expand full comment

Well written. It is not AGI. It’s a bigger and better version of predicting “what comes next”. Nevertheless, it is paving the path to discussions about security, alignment, and other issues. Thank you for sharing your insights.

Expand full comment

I cross-posted this, Gary. I thought it was a good assessment of strengths and weaknesses. I also agree it is not a path to AGI.

Expand full comment
author

Thanks, debate friend :)

Expand full comment
Mar 15, 2023 · Liked by Gary Marcus

Sir: screenshot of chatGPT-4? .... that mentions chatGPT-5??....

Thank you. You are above all AI experts so far in advancing a call to reject these toys as irrelevant, useless and plain primitive contraptions far removed from our Thoughts, Self-Awareness and unitary Consciousness.

But to me these algorithmic engines are a useful tool to exercise my curiosity and hopefully increase my utility as a thinking, helpful citizen of the World

Expand full comment

My assessment is that, in the ML world, GPT-N are amazing achievements; in the AGI world, however, they are dangerous lowest-common-denominator low-hanging fruit.

Expand full comment
author

Exactly

Expand full comment

I'm not sure I'd even agree with "lowest-common-denominator low-hanging fruit." I am not going to go so far as to say that the work on ChatGPT is useless as far as AGI is concerned, but I am not seeing any plausible argument that this is a step towards artificial intelligence. Whether ChatGPT's answers are correct or not seems basically random, depending on how correct the training data is and how well the question aligns with the training data. How does this help us?

I find the argument that we can layer more and more hacks and filtering on top of such a system to fix it to be extremely implausible; in my experience (with computer programs, data sets and many other systems) this works for only a little while (when it works at all) and reaches the point of being ineffective long before the problems go away. (Perhaps that's what you mean by "low-hanging fruit.")

Expand full comment

You are right, GPT-4 is not a step towards true AGI, but it could be very useful for practical tasks, as long as they are not high-stakes ones.

The problem with OpenAI is that the quest for AGI is in their mission statement, so they almost have to pretend they are approaching AGI with each step.

Expand full comment

Transformers, however, may well turn out to be a useful component of a neurosymbolic AGI.

Expand full comment

Moreover they are already a useful component of vision systems, and new multistage ML architectures.

Expand full comment
Mar 15, 2023·edited Mar 15, 2023

By "lowest-common-denominator low-hanging fruit" I mean that this is basically what you end up with if you (or your financial backers - same thing) are (to all intents and purposes) in a profit-motivated race for AGI (because whoever ultimately controls the IP underlying AGI owns the future means of production, which is worth trillions). Rather than do the actual work, and solve the really hard problems, which would take decades (to the great displeasure of your financial backers), you instead simply hop from one piece of low-hanging technological fruit to the next (anything that, given sufficient $$$, you can push out in 3-18 months, generates high-profile papers and press releases, stimulates maximum media hype (the Eliza Effect on crack), and most importantly secures the next tranche of investment). This pattern of "research", with a very short (in AGI terms) event horizon, is realistically never going to solve the real problems underlying AGI (which are infinitely more complex than simply pushing data scraped off the internet through a neural net model that can be described in 6 pages), and is instead much more likely to take you down a dead-end path towards misaligned (and thereby dangerous) Artificial Stupidity, which, in addition to inflicting massive (and entirely foreseeable) societal harm at global scale over the next couple of decades, will nevertheless generate billions in short-term revenue, and keep those investment dollars coming. Or am I being too harsh...?

Expand full comment
Mar 15, 2023·edited Mar 15, 2023

You're being harsh, but I don't think *too* harsh; I'm basically on the same page as you here.

I do wonder whether "Artificial Stupidity" can be "misaligned," though. The goal of ChatGPT seems to be to produce convincing answers whether they are correct or not. Is this "misalignment," when the "massive (and entirely foreseeable) societal harm at global scale" is really coming out of the goals of (a subset of) humans, rather than a different intelligence with its *own* goals?

Expand full comment
Mar 15, 2023·edited Mar 15, 2023

Thanks!

The two qualities (intelligence and objective) are orthogonal. So on the one hand an AGI's problem-solving capabilities may be poor, reasonable, satisfactory, excellent, super-human etc, while on the other the final goal towards which it strives (via its problem-solving abilities) may be anything at all, and therefore aligned or mis-aligned with human preferences to any degree.

ChatGPT's "core" goal (i.e. GPT-3's core goal) is simply to generate a statistical continuation (relative to its training corpus) of the input prompt (where the input prompt represents the *user's* goal -- which is of course not necessarily aligned with the best long-term interests of mankind). This combined "GPT-3 + user"'s goal, however, is then modified/contained to some extent by the layer of RLHF that has been applied by OpenAI as a crude attempt at mitigating the combined "GPT-3 + user"'s worst instincts.

Expand full comment

"final goals towards which it strives"

What does it mean for an inanimate, non-sentient, agency-free, non-volitional entity to "strive", or to have "goals"? This whole "alignment" thing strikes me as people letting their anthropomorphizing get the upper hand over their reasoning

Expand full comment
Mar 15, 2023·edited Mar 15, 2023

I know, it seems ridiculous. How can a mindless automaton such as a CPU exhibit such high-level behaviour as "striving towards a goal"? Nevertheless, it is plausibly possible to construct many layers of increasingly complex software-implemented behaviour until the system in question genuinely does "strive towards a goal". Please see e.g. (a) https://cambridgeaisocial.org/assets/Slow_down__A_tutorial_on_and_summary_of_the_current_state_of_AI.pdf, followed by (b) https://bigmother.ai/resources/The_BigMother_Manifesto___summary-LATEST-DRAFT.pdf (apologies, the latter is unfinished, but the earlier sections should answer your question).

Expand full comment

"Whether ChatGPT's answers are correct or not seems basically random, depending on how correct the training data is and how well the question aligns with the training data."

Unlike humans, who, regardless of their education or upbringing, will reliably provide the correct answer to any question presented to them? Also, how can you say it is random and then explain the deterministic process in the same sentence?

Expand full comment

Humans will at least tend to present *consistent* answers when truly presenting their beliefs, and try to justify inconsistencies. ChatGPT does not at all; it will flip-flop all over the place, often even within a single response, for questions where simple logic shows that not all those answers can be correct. The 1973 film _The Sting_ cannot have five and six and seven acts. An intelligent entity would eventually settle on an answer, or "I'm not sure" with a set of answers with reasoning supporting each one. ChatGPT doesn't do that at all; it just again and again comes back with "That previous answer was wrong; here's the correct one," often going back to previous incorrect answers.

I was using "random" in the sense of "so hard to predict that it's effectively random to me," of course. If you're saying it's pseudorandom, sure, I'll agree that that's highly likely.

Expand full comment

Can anybody claim to know what the "right" thing to be doing is for advancing AGI? If not, it seems rather petty to criticize efforts as being the wrong thing to do

Expand full comment
Mar 16, 2023·edited Mar 16, 2023

I don't see why it's petty. If someone's claiming that "X will get us to AGI" and it seems unlikely to do so, you do not need to know a better path to call them out on it.

I may not know how to make it rain right now, but I can still be correct that someone urinating down my leg is not on the path to making it rain, and having ten people urinate down my leg in parallel is not going to bring us closer to making it rain.

I have not seen any good argument that ChatGPT is reasoning and plenty of evidence that seems to indicate it's not. And I've not seen a good argument that reason isn't necessary for AGI, or even basic intelligence. And the attempts to make ChatGPT not do certain things are clearly hacks; the extent to which jailbreaks of those attempts are successful shows that they're a patch on the surface trying to cover a deeper problem that's not yet been properly addressed (and may not be possible to address). That leads me to believe that, while what we learned while building ChatGPT may be useful information on the path to AGI, ChatGPT itself is not going to get anywhere near there.

A (very rough) parallel might be someone trying to build a supersonic aircraft with a straight wing. Improving the straight wing to take it from Mach 0.1 to 0.2 may look like progress, but it's actually no progress at all, because you're going to need a swept or delta wing before you can actually fly supersonic.

Expand full comment
Mar 16, 2023·edited Mar 16, 2023

Again, I don't hear OpenAI talking about closing in on AGI. But even if they were, I don't think we know, for a fact, that their current approach is any less fruitful than any other extant approach, since no one seems to be making great leaps forward.

I certainly agree that GPT isn't doing reasoning, and if we stipulate that reasoning is a necessary element of "intelligence" then the GPT approach is clearly not sufficient, but that doesn't mean it might not solve one or more significant pieces of a larger puzzle (though I'm not saying we know it to be necessary for that, either, but right now it's something we have). As for the hacking (which, btw, I totally agree IS hacking), I see that as completely orthogonal to intelligence. They're mostly trying to stop it from doing things that any one of millions of humans (humans no one would argue weren't "intelligent") might do. They're just things we find distasteful about (some) humans and don't like having our noses rubbed in by an AI.

"... what we learned while building ChatGPT may be useful information on the path to AGI..." -- is that not significant value in and of itself? No project the magnitude of AGI is going to be a linear path, with no side branches or backtracking. Building on your analogy, there're a lot more to practical supersonic flight than wing shape, and you can learn a lot of important stuff playing around at lower air speeds. "Progress" isn't a scalar quantity, it is a vector

Expand full comment

I'm in general agreement with everything you say in that comment; my complaint is more about the apparent expectations of investors and the general public about where this might go and how soon.

From my point of view, all the recent advances, while impressive, still leave us in the same place as we've been for literally more than half a century: "We have lots of large claims about when the big advances are going to come, and still have no idea when, or even if, we might start approaching something resembling AGI, especially for production use."

Remember all the hype about fully autonomous vehicles in the previous decade? As we came into the 2020s there was no big celebration of the 25th anniversary of the first fully autonomous U.S. coast-to-coast drive, probably because bringing that up would not exactly be great for investor confidence in how fast we're moving on this stuff.

Expand full comment
Mar 15, 2023·edited Mar 15, 2023

Of course, anybody can be an armchair AGI researcher. However, some people genuinely *are* AGI researchers! As of 2020, there were about 70 ongoing AGI projects in the world [https://gcrinstitute.org/papers/055_agi-2020.pdf]. For example, please have a look at (a) https://cambridgeaisocial.org/assets/Slow_down__A_tutorial_on_and_summary_of_the_current_state_of_AI.pdf, and (b) https://bigmother.ai (full disclosure, I am the author of both!)

Expand full comment

Of course there are AGI researchers. But how many of them are so convinced they've found the One True Way that they'd characterize alternate approaches as wrong?

Expand full comment

Probably all of them. By definition, if they believed there was a better way than their own approach, then they would pivot towards the better way!

AGI has, in my estimation, a 30+ year postgraduate learning curve. Most AGI researchers, and certainly most AI researchers, are still exploring their own personal learning journeys. I don't believe there's anything wrong with that. Eventually, given sufficient time, the genuinely stronger AGI designs will emerge as superior to the rest. It will just take a few decades!

Expand full comment

Given the lack of concrete successes, I would hope for more humility than that, but you're much closer to the "ground truth". Unfortunate

Expand full comment
Mar 15, 2023·edited Mar 15, 2023

I'm not sure I fully understand what you mean ("Unfortunate"...?)

Nevertheless, *true* AGI is a cathedral project. However smart you are (and the guys at OpenAI et al are all very, very smart), however much money you have (even billions), it will take many decades (and multiple lifetimes) to design, refine, and implement. There's simply no way of escaping that - if you want true AGI, you have to do the work!

Thus the "lack of concrete successes" to date is only to be expected. To paraphrase the movie Apollo 13, we're on step 8 of 1000.

Expand full comment

I think that’s right in that it is its own thing, not a path to AGI. I’d love to see a long list of use cases that ChatGPT is good for. I think there are many. I tend to think of it as having a brilliant but flawed friend helping me with my work. In some roles the brilliance is dominant. In others the flaws.

Expand full comment

Its failure to improve on the AP English scores of GPT-3.5 (2 out of 5) also indicates a lack of true generalized intelligence. While most standardized tests ask students to answer questions which already have answers, the AP English exams ask students to read a selection of sources and form an original argument that creatively and analytically incorporates material from the sources.

Other tests have similar style questions, but because they test a student's knowledge of a subject matter, they ask questions related to highly researched areas in the field, and provide sources that scholars have thoroughly analyzed. The AP US History exam, for example, might ask a student to write about the most important causes of the American War for Independence, a subject for which there are thousands of books, articles, and publicly available lectures. It will use documents, like the Declaration of Independence or speeches from figures such as Thomas Jefferson, that frequently appear in secondary texts. GPT-4 can easily pastiche and paraphrase what scholars in the field have written to produce what would be an impressive looking essay for a 17 year old.

The English Exams, however, are not testing mastery of a subject matter, but skillfulness in synthesizing, analyzing, and creatively interpreting information. They can ask questions and provide sources which haven't been picked over by scholars. In other words, they can test the ability to truly think creatively and analytically in the face of new information or questions.

Expand full comment

"GPT-4 seems to know little of 2024" - so true!

Expand full comment
author

Thanks! Fixed! (2023)

Expand full comment

My quip about GPT-3 when talking to non-AI people was that "it's great at the stuff GPT-2 was good at, and bad at the stuff GPT-2 was bad at". I suspect I will be refurbishing that one-liner.

Expand full comment

Does anyone else find the limitations of the LLMs, and therefore (hopefully) deep learning generally, quite relieving? We are not ready for AGI, indeed my hope is it is at least a century away, so this potentially being a false dawn in the quest for AI makes me a lot more optimistic for humanity's future.

Expand full comment

Interesting. It seems to me that the widespread realization that deep learning is totally irrelevant to solving AGI would in fact accelerate its arrival. I mean, it would cause many to abandon DL and work on other promising approaches.

Should not AGI pessimists instead praise DL and LLMs as much as they can? :-D

Expand full comment

Haha quite possibly - brb am going to cheer DL on twitter!

Expand full comment

OpenAI says plainly that ChatGPT is an augmentation of its users. When people speak to others to clarify their own thinking, by using them as a sounding board, they don't expect flawless thinking from the others. In fact, errors in others can enhance that clarifying process (Psalms 8:2, "Out of the mouth of babes and sucklings hast thou ordained strength"). However, hostile intent on the part of ChatGPT users yields little but amplified hostility. Back in the day, software developers would occasionally say: I can make it foolproof, but I can't make it damn-fool-proof.

As Cassius said to Brutus (in Julius Caesar) Men at some time are masters of their fates: The fault, dear Brutus, is not in our stars, But in ourselves, that we are underlings.

Expand full comment

Me >> Can ChatGPT be likened to babes and sucklings?

ChatGPT >> In the context of Psalm 8:2, "babes and sucklings" refers to young and inexperienced children who are not yet able to fully articulate their thoughts and ideas. As an artificial intelligence language model, ChatGPT is not a human child and does not have the same limitations or developmental stages.

However, in a broader sense, ChatGPT can be compared to a child in the sense that it is still a relatively new technology that is constantly learning and growing. It is still in the early stages of development and has not yet reached its full potential. In this sense, it is like a child that is still growing and developing.

Moreover, ChatGPT's ability to process and generate language is a powerful tool that can be used to convey thoughts and ideas in a way that may be considered "strong" or impactful, similar to how the verse suggests that God can use the words of babes and sucklings to establish strength against His enemies.

Expand full comment

https://jamesclear.com/all-models-are-wrong

"In 1976, a British statistician named George Box wrote the famous line, “All models are wrong, some are useful.”

His point was that we should focus more on whether something can be applied to everyday life in a useful manner rather than debating endlessly if an answer is correct in all cases. As historian Yuval Noah Harari puts it, “Scientists generally agree that no theory is 100 percent correct. Thus, the real test of knowledge is not truth, but utility. Science gives us power. The more useful that power, the better the science.” "

It's likely this is a detour on the road to AGI: so what? Some can pursue AGI while others create useful tools. People build tools on current technology in general, even while others work to improve the technology.

You note: "solve any of the core problems of truthfulness and reliability". Humans aren't entirely truthful or reliable and yet they are sometimes useful. There are concerns over a replication crisis even in the world of science and flaws noted in the peer review process. Humans are still trying to figure out the best approach to collectively seek reliable information while seeking "truth". Humans don't always agree on results using a judicial process to seek "truth".

Humans in general often don't agree on what is truthful or reliable so putting that as a necessary hurdle to achieve is setting an impossible goal and possibly attaching a constraint that would also detour from the path towards AGI.

In the meantime: people need to grasp that machines can be fallible just like humans. They can compare human sources of information, machine sources, etc. Machines can aid with that process. Yes: tools and methods should be created acknowledging the reality of potential harms, just as people do already regarding other technology. People create anti-virus software and spam filters, etc.

The tech is invented by people trying to solve real world problems: regulators don't invent the tech and usually merely distract from the problem and can in fact detract from it. Regulatory capture often leads to big players shutting out competitors so despite myths, often big players want regulation. Unfortunately some humans don't try to find reliable information on all aspects of subjects they write about.

Admittedly of course increasing reliability and accuracy regarding reality is a goal to strive for since we'd like to improve on human reasoning: and to allow for instance humility for them to consider the reality they may be wrong or unreliable due to flaws in the world or themselves, just as humans should spend more time on, especially those who are listened to by the public.

You noted that a prior comment of mine was "condescending", but I'd suggest that reaction mirrors the one many readers have: that your comments are implicitly condescending, in the sense of not actually considering or addressing anything other than strawman versions of critiques of your writings.

Expand full comment

I often think of human intelligence as having a pre-conscious thought generator to basically come up with ideas that might be bulls***. This is what we observe in mindfulness meditation as the monkey mind, or what some people have called "babel". But we also have an editor or curator who looks at the stream of babel and selects those ideas which make sense in light of a broad cognitive model of reality.

It seems to me like the current generative AI models do a pretty good job of brainstorming and remixing stuff to produce some raw idea material but they totally lack the editor/curator functionality of human intelligence. When using Bing chat, the human user is essentially playing the role of editor to the babel produced by the model.

I would like to understand more why the second part is so difficult to build into the systems themselves. It seems these companies take a very ad hoc approach with the alignment stuff, which is essentially a kind of dumb editor that is slapped on after the fact, and is not integrated in an intelligent way.

I know this is an unsolved problem, but I'm curious what are the current best ideas for how we might build such a combination.
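
As a rough sketch of the shape such a combination might take (the generator and the scoring function here are toy stand-ins of my own, not how any deployed system works), the idea is: sample freely, then let a curator pick.

```python
import random

def generate_candidates(prompt: str, n: int = 5) -> list[str]:
    # Stand-in for the "babel" generator; in practice, sample an LLM n times.
    fillers = ["maybe", "certainly", "never", "sometimes", "arguably"]
    return [f"{prompt} {random.choice(fillers)}" for _ in range(n)]

def editor_score(candidate: str) -> int:
    # Stand-in for the missing curator: score candidates against whatever
    # constraints or world-model checks we care about (here, a toy ban list).
    banned = ("never",)
    return -sum(word in candidate.split() for word in banned)

def generate_with_editor(prompt: str) -> str:
    # Brainstorm first, then keep the candidate the "editor" rates highest.
    return max(generate_candidates(prompt), key=editor_score)

print(generate_with_editor("The answer is"))
```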

Expand full comment

"we also have an editor or curator who looks at the [unconscious] stream". Hmmm. So who is watching over that little homunculus? It's mother? Any such supposition remains within the realm of old-school philosophy.

Expand full comment

I'm not sure what you mean when you say somebody would need to watch over the editor or the curator. I'm talking about different cognitive functions here; they just do their thing, and no supervisor is needed. That said, there may be additional layers of metacognition in humans. I'm not claiming that there are exactly two layers. It could well be that there are cognitive functions that monitor and fine-tune our basic thought editor. I'm not a cognitive scientist, but I have read a lot of cognitive science, and this sounds like the sort of thing they talk about.

It seems to me your use of the term homunculus here is entirely rhetorical. You could say that about any theory which posits distinct cognitive modules that do stuff, which is pretty much all of modern cognitive science.

Expand full comment

I'm arguing that you're invoking a Cartesian theatre explanation for human consciousness. I'm not a fan of such dualism, preferring emergent theories. See https://en.wikipedia.org/wiki/Homunculus_argument for more details of the fallacy you're trying to use.

Expand full comment

Intrusive thoughts and schizophrenia are good evidence something like Avi is describing is actually happening.

Expand full comment

Nope. The real world (whatever that might mean) doesn't exist inside our heads. These purely subjective observations are only evidence for the brain being a hallucination/simulation generating machine, from which our consciousness arises as the 'best fit' to the current sensory inputs.

Expand full comment

Inspired by David Deutsch's work, I asked ChatGPT if its hallucinations are a bug or a feature in pursuit of AGI. Who cares about the truth? No one, apparently. Are we deploying misinformation at scale in pursuit of AGI? Yes, according to ChatGPT4: https://twitter.com/AAlmeidaAuthor/status/1635919608338317312

Expand full comment
Mar 15, 2023·edited Mar 15, 2023

It seems to me that AI is confronting us not with an issue of information complexity but with an issue of resilience. Scaling is not, and has never been, something humans in general have had a problem with. We adjust to scale EXTREMELY well.

Similarly, I don't think the issue with AI is its inability to correctly process information. It will always be flawed, like any artificial tool. The greater threat, in my opinion, lurks in this topic's slipstream. As AI becomes more and more reliable and better at returning believable information, humans get more reliant on it. For as well as we handle scale by developing tools to deal with it, we're all the more ready to apply these tools at any scale once we have them. AI, while useful, is just one more temptation (I realize I'm sounding vaguely biblical or at least preachy here - not my intention at all!) to let the already severely lacking level of media literacy slack off further.

I'm observing this in myself regarding creative pursuits. Writing a story with AI assistance is easy, and I actually have to force myself to write without it by now. It's becoming a real fight to resist the temptation, and while the AI comes up with some great ideas, its output sometimes lacks the direction and "foresight" that a story written by a human author (who, after all, tries to get to a point of some sort eventually) would have. But the only way to recreate that direction is to take the lead yourself, and that requires actually walking the walk while the AI seductively offers you a comfy seat in the back of its metaphorical taxicab.

Expand full comment

I struggled to use ChatGPT to generate an outline for a story, because it kept falling into genre cliches no matter how many times I regenerated the output telling it NOT to use those specific cliches, even writing outright contradictory sentences… I had to do such an insane amount of editing to make the story make sense (often by telling the AI to change some specific thing, sometimes even what wording to use) that it seemed dubious whether that was even better than writing it myself. Generative AI is pretty impressive in many respects, but in my experience whenever I try to make it make something with a bit more of a unique viewpoint or design (both for text and visual generative AI) it fails pretty badly, probably due to the fact that the way it's trained is literally to make things that are predictable. It's easy to use AI to generate something that basically already exists, but as soon as you try to make it write or make an image in a style that is a bit more rare, I find it starts to get better results to just do it manually.

Expand full comment

"It will always be flawed, like any artificial tool."

I suppose one can have various definitions of "flawed," but in terms of "always produces correct answers, when it produces an answer," no, not all our computational tools are flawed. Proof assistants (also known as interactive theorem provers) do reliably produce correct answers in the (ever-expanding) areas where they can produce an answer. (And as a corollary, reliably check their suggestions when they suggest going off in one direction or another to try to find a proof.)
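
As a tiny (and admittedly trivial) illustration of that reliability, a Lean 4 proof is either accepted by the kernel or rejected outright; there is no mode in which the assistant confidently asserts something false:

```lean
-- Minimal Lean 4 example: the kernel certifies this proof of commutativity
-- of natural-number addition, or refuses it; it cannot "hallucinate" a result.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```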

Many years ago these kinds of programs were considered AI. They've since grown vastly more capable than their early versions, and at the same time are no longer considered AI. There's a moral in there somewhere....

Expand full comment