77 Comments
Dec 25, 2022 · Liked by Gary Marcus

It may also be time for the field to move on from the term "Artificial Intelligence."

For systems unpolluted by reason and higher-level conceptualization, like GPT-3 and DALL-E, I'd propose something like "cultural synthesizer."

Except GPT-3 clearly does reason and grasp concepts. What else is it doing when you give it the definition of a new word and it correctly uses it in a sentence? Doing something badly or inconsistently is still doing it, and it's tiresome to see people seize on failings as some kind of evidence that it must not be happening.
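
For concreteness, here is roughly what that test looks like in code - a sketch only, assuming the pre-1.0 openai Python client; the nonce word and its definition are made up:

    # Sketch of the "new word" test: define a nonce word, then ask for a sentence using it.
    # Assumes the pre-1.0 `openai` Python client and an API key in OPENAI_API_KEY.
    import os
    import openai

    openai.api_key = os.environ["OPENAI_API_KEY"]

    prompt = (
        'A "farduddle" is a quick little dance you do when you are very excited.\n'
        'Write one sentence that uses the word "farduddle" correctly:'
    )

    response = openai.Completion.create(
        model="text-davinci-003",  # GPT-3-era model; substitute whatever is current
        prompt=prompt,
        max_tokens=40,
        temperature=0.7,
    )
    print(response["choices"][0]["text"].strip())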

It's still only creatively repackaging empirical data (emulating existing reasoning), not creating original thought.

Which is just what most human minds do too, indeed.

Let's put it this way:

In biology, if a gene shows up in 80 percent of sick people and 20 percent of healthy people, that's a significant result.

In math, if a theorem covers 80 percent of cases, you would have to explain the other 20 percent.

In this case we are dealing with math, because this thing is an algorithm and we know how the algorithm works. And we cannot explain the other 20 percent.

Great name!

author

thanks for catching the error on Altman. there's not perfect consensus on values, and i agree about tradeoffs etc, but there is enough to get started; i will write about this in the new year.

I am growing increasingly concerned by what seems to be a "minimum viable alignment" approach taken by OpenAI. They spend a considerable amount of up-front design and engineering capacity for the sole purpose of figuring out how to throw as many TPUs as they can at the training step, and then hastily bolt on whatever adversarial examples they can think of to give the appearance of guardrails after the fact.

From my perspective, despite their self-claimed mandate as the guardians of responsible AI, they're worried about building and shipping technical capacity first. Ethics seem to come later - just like so many other AI startups. They can't even be bothered to think through the issues of turning their models loose on the world - to foresee plagiarism, automated scams, and spam as the most obvious use cases - and they disclaim responsibility by asking people to pinky swear they'll label ChatGPT's output. Whatever assurances Sam Altman is giving right now, I am thoroughly skeptical of OpenAI's willingness to truly design for safety from the ground up.

"aim higher and do it faster" - sama https://twitter.com/sama/status/1604548345946443777

Wish he'd said something like "aim with precision and look before you leap". There's no need to break the speed limit if you're on a highway to hell!

In any case, I agree with you 100%. It frustrates me that Greg Brockman from OpenAI claims ChatGPT is "primarily an alignment advance" https://twitter.com/gdb/status/1599124287633248257 As far as I can tell, most alignment-focused researchers think OpenAI's approach to "alignment" just isn't reliable enough in principle. I'm very concerned that OpenAI will iterate enough on their idea of "alignment" to create something which *seems* trustworthy, but actually fails in unexpected and catastrophic ways. Bad alignment research is like giving psychotherapy to a psychopath: Their problems remain fundamentally the same, they just learn to trick you more and more effectively.

Yeah, the chat interface to GPT-3.5 is an advance on previous modes of interacting with it, but it's almost certainly better described as a UX advance rather than an alignment advance.

The fact that Altman is tweeting things like "the AI revolution can't be controlled or stopped, only directed" indicates to me that he's already letting his team off the hook for not trying to design for alignment in more fundamental ways. It's like saying "the ballistic missile revolution can't be stopped" as you fundraise for a bigger rocket.

The AI revolution can't even be directed. In order to direct AI development we would have to direct all the humans developing AI. Please let us know when the North Koreans agree to follow your direction. The more I read from experts, the less expert they appear.

AI alignment is a fantasy, folks. The concept is being used to pacify the public while this technology is pushed past the point of no return, so as to make more nerds into billionaires. Wake up please. You're being played.

Dec 28, 2022 · Liked by Gary Marcus

It's great at re-inventing the wheel (with a few errors), but can't recognize (or imagine) new forms of transport.

author

never mind the derailments :)

Dec 26, 2022 · Liked by Gary Marcus

Interesting. "It will be amazing, but flawed" pretty much describes all human beings.

#typos: "the same playbook as it predecessors", "explicit it knowledge".

author

thanks for the typos; the value in the essay is in the explication of the specific flaws. (my book Kluge is about the origins of human flaws, incidentally)

Dec 25, 2022 · Liked by Gary Marcus

I agree with pretty much everything you say here, and I want to commend you for making relatively specific predictions that can be falsified. There hasn't been enough of that from AI proponents and skeptics alike, and I'm eager to see how these turn out. (Personally, I expect all of them except #7 to come true.)

I was curious to see how much agreement there is about these predictions from others in the AI community, so I've created a prediction market for each one.

https://manifold.markets/IsaacKing/will-gpt4-be-a-bull-in-a-china-shop

https://manifold.markets/IsaacKing/will-gpt4-be-unreliable-at-reasonin

https://manifold.markets/IsaacKing/will-gpt4-still-hallucinate-facts-g

https://manifold.markets/IsaacKing/will-gpt4-still-not-be-safe-to-use

https://manifold.markets/IsaacKing/will-gpt4-still-not-be-agi-gary-mar

https://manifold.markets/IsaacKing/will-gpt4-still-be-unaligned-gary-m

https://manifold.markets/IsaacKing/will-llms-such-as-gpt4-be-seen-as-o

author

Are you on Twitter and can you tag me there?

Dec 25, 2022 · Liked by Gary Marcus

I am! I just posted a reply to your Tweet advertising this article.

https://twitter.com/IsaacKing314/status/1607125249660981248

My fear is that GPT-4 is being created by looking at GPT-3 failure cases and introducing fixes for each of them, rather than increasing its reasoning powers in any fundamental way. Perhaps it will contain a neural network that will identify certain classes of arithmetic problems and, when triggered, route the numbers occurring in the prompt to a calculator or Mathematica. It will be similar to how autonomous driving advances by dealing with each new edge case as it arises. Instead of increasing our confidence in its abilities, it tells us how much world knowledge and reasoning will be needed to do the job properly and how far we are from getting there.
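
To make the worry concrete, here is a toy sketch of that kind of bolt-on patch (every name here is hypothetical; this is not OpenAI's actual design): a crude check spots arithmetic in the prompt and routes it to a calculator instead of the model.

    # Toy sketch of "route arithmetic to a calculator" patching, not OpenAI's actual design.
    import re

    def looks_like_arithmetic(prompt: str) -> bool:
        # Crude stand-in for a learned classifier that spots arithmetic problems.
        return bool(re.fullmatch(r"\s*[\d\.\s\+\-\*/\(\)]+=?\s*", prompt))

    def calculator(expression: str) -> str:
        # A real system would call a proper math engine, not eval.
        return str(eval(expression.rstrip("= \t"), {"__builtins__": {}}, {}))

    def call_language_model(prompt: str) -> str:
        return "(LLM answer for: " + prompt + ")"  # placeholder for the actual model call

    def answer(prompt: str) -> str:
        # The "fix": intercept the cases the model gets wrong, pass everything else through.
        if looks_like_arithmetic(prompt):
            return calculator(prompt)
        return call_language_model(prompt)

    print(answer("123456 * 789 ="))        # handled by the calculator branch
    print(answer("Why is the sky blue?"))  # falls through to the model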

Thanks for another clear article on the LLM phenomenon. In a roundabout way, LLMs are a good thing for AGI research. They are perfect examples of what not to do if achieving human-like intelligence is the goal. All AGI researchers owe OpenAI and Google a debt of gratitude. Thanks but no thanks. :-)

author

Can’t say that the Scaling hypothesis didn’t get a fair shake…

author

I asked for “GPT-4”; the best example was this. These systems are very poor at reproducing text verbatim.

As someone who has been working with chatbots like ChatGPT since they were first released, I have to say that I think the reviews surrounding the alignment problem are complete bullshit. ChatGPT should be viewed as a tool, not as an entity. It is simply a tool that we can use to craft ideas and write them more clearly. The alignment problem has nothing to do with using a tool like ChatGPT to write.

I also have to take issue with the idea that ChatGPT is easily confused. As an engineer with a background in English, I can tell you that ChatGPT has been an invaluable tool for me in crafting ideas and expressing them clearly. It may not be perfect, but it is still a powerful tool that can be used to great effect.

That being said, I do agree that there are still problems with chatbots like ChatGPT, and that the alignment problem remains a critical and unsolved issue. It is important to be cautious when interacting with any tool or person, and to understand what we can trust and where mistakes may have been made. However, I believe that chatbots like ChatGPT have the potential to be incredibly useful and powerful tools, and I am excited to see what the future holds for this technology.

I am an 80-year-old man with a background in IT and chemical engineering. I studied chemical engineering at Georgia Tech and worked as a chemical engineer for a decade before transitioning to a career in IT, where I helped implement email at DuPont in the 1980s. Despite my success in these fields, I have always struggled with mild dyslexia, which has made it difficult for me to express my thoughts clearly and concisely in writing. Despite this challenge, I have always been an avid reader and have a deep interest in fields such as physics, computer science, artificial intelligence, and philosophy.

To overcome my dyslexia and improve my writing skills, I have turned to tools like ChatGPT. By dictating my thoughts and using ChatGPT to generate text, I am able to communicate more effectively and express my ideas more clearly. Despite the challenges I have faced, my determination and use of technology have allowed me to excel in my career and continue to learn and grow.

All of the above was written by ChatGPT and copied here without my editing. The bot added a few thoughts that I would change, but it expresses my thoughts clearly, and I did the whole process quickly. Without the bot's help, I would've been unable to write the above.

Mike Randolph

Dec 25, 2022 · edited Dec 26, 2022

One of the first things I did once I had access to ChatGPT is have it interpret Steven Spielberg's Jaws using Rene Girard's ideas of mimetic desire and sacrifice: https://3quarksdaily.com/3quarksdaily/2022/12/conversing-with-chatgpt-about-jaws-mimetic-desire-and-sacrifice.html

It did pretty well, perhaps even better than I'd expected. But the fact is I didn't formulate any explicit expectations ("predictions") before I started. If the public is given similar access to GPT-4, I'll repeat the same exercise with it. I won't be at all surprised if it does better than ChatGPT did, but that doesn't mean it will come anywhere close to my own Girardian analysis of Jaws: https://3quarksdaily.com/3quarksdaily/2022/02/shark-city-sacrifice-a-girardian-reading-of-steven-spielbergs-jaws.html

What nonetheless makes ChatGPT's performance impressive? To do the interpretation ChatGPT has to match one body of fairly abstract conceptual material, Girard's ideas, to a different body of material, actors and events in the movie. That's analogical reasoning. As far as I can tell, that involves pattern matching on graphs. ChatGPT has to identify a group of entities and the relationships between them in one body of material and match them to a group of entities in the other body of material which have the same pattern of relationships between them. That requires a good grasp of structure and the ability to "reason" over it. I've now done several cases like this and am convinced that it was not a fluke. ChatGPT really has this capacity.
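
To make the graph-matching intuition concrete, here is a minimal sketch in code (the relation tuples are invented placeholders, not my actual Girardian reading): represent each body of material as labeled relations over entities, then look for a mapping of entities that preserves the pattern of relations.

    # Toy structure-mapping sketch: find an entity mapping that preserves relation labels.
    # The relation tuples are invented placeholders, not a real analysis of Girard or Jaws.
    from itertools import permutations

    source = {("rivalry", "A", "B"), ("scapegoat", "community", "victim")}
    target = {("rivalry", "Brody", "Quint"), ("scapegoat", "Amity", "shark")}

    def entities(relations):
        return sorted({x for (_, a, b) in relations for x in (a, b)})

    def best_mapping(src, tgt):
        src_ents, tgt_ents = entities(src), entities(tgt)
        best, best_score = None, -1
        for perm in permutations(tgt_ents, len(src_ents)):
            mapping = dict(zip(src_ents, perm))
            mapped = {(r, mapping[a], mapping[b]) for (r, a, b) in src}
            score = len(mapped & tgt)  # how many relations the mapping preserves
            if score > best_score:
                best, best_score = mapping, score
        return best, best_score

    print(best_mapping(source, target))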

There were problems with what ChatGPT did, problems that I overlooked in my article because they're the sort of thing that's easier to fix than to explain (here I'm thinking about my experience in grading papers). So there's plenty of room for GPT-4 to demonstrate improvement. Here's a minor example. I asked ChatGPT whether or not Quint had been in the Navy – knowing full well that he had. He replied that there was no mention of that in either the film or the novel. So then I asked about Quint's experience in World War II. This time ChatGPT mentioned that Quint had been in the navy aboard a ship that had been sunk by the Japanese. I can easily imagine that GPT-4 will not need any special prompting to come up with Quint's navy experience.

However well GPT-4 does, it will not come near to what I went through to come up with my Girardian interpretation. In the first place, I actually watched the film, which GPT-4, like ChatGPT, will not be able to do. As far as I know we don't have any artificial vision system capable of watching a feature-length film and recalling what happened. ChatGPT knew about Jaws because it's well-known, there's a lot about it on the internet (the Wikipedia has a decent plot summary), and scripts are readily available.

Beyond that, when I watched Jaws I had no intention of writing about it. I was just watching an important film (credited with being the first blockbuster) that I had never seen. Once I watched it I looked up its Wikipedia entry. And then I started investigating, which meant watching the sequels to see if indeed they weren't as good as the original – they weren't (Jaws 4 is unwatchable). Now I had something to think about: why is the original so much better than the others? That's when it struck me – GIRARD! And that's how it happened. Girard's ideas just came to mind. Once that had happened, I set about verifying and refining that intuition. That took hours of work spread over weeks, and correspondence with a friend who knows Girard's ideas better than I do.

That's a very different and much more mysterious process from what ChatGPT did. I pointed it to Girard and to Jaws and asked it to make the analogy. I did half or more of the work, the hard part. No one told me what to look for in Jaws. How was I able to come up with the hypothesis that Girard's ideas are applicable to Jaws? I don't know, but I observe that I have years of experience doing this kind of thing.

Etc.

The tricky part about asking GPT to pontificate on some topic or intersection of topics is that it's hard to know whether the response it creates is genuinely novel or merely involves parroting its training data. Rene Girard is one of the most influential philosophers of the 20th century; Jaws is one of the most influential films of the 20th century, and there's a good deal of academic literature surrounding it. Is the link you found between the two actually novel, or has it already been the subject of a Master's thesis, or a blog post, or even a Reddit thread? The "black-boxiness" of LLMs makes it difficult to know what's going on. Maybe it was making new connections based on your promptings, or maybe it was drawing on existing ideas. The problem is we really can't know which.

Well, I published a Girardian reading of Jaws at the end of February of this year (2nd link above), but that, I believe, was after the cut-off point for GPT's training corpus. Whether or not anyone else had already made the same analysis, I don't know. Certainly nothing came up while I was doing my work, and I did spend a fair amount of time searching the web for material about Jaws. In any event, I don't really care whether the idea is novel. I'm simply interested in the fact that ChatGPT was able to address it in a more or less coherent way.

Beyond that, what actually interests me is the nature of the task – applying abstract ideas to explicating texts – and I've got other examples of ChatGPT doing that. I've applied Girard, Bowlby on attachment, and the idea of AI alignment to Spielberg's AI, and AI alignment to Tezuka's Astro Boy stories.

I think what we see in the replies you posted on your blog can be explained by complex word/phrase association between summaries of the film and summaries of Girard, rather than "applying abstract ideas to explicating texts." It's not dumb association, but the formulaic nature of its replies (and the broad-brush level of the answers) suggests it's closely wedded to specific language it's found in its training data. But, once again, black box. We can't know who's right.

Ah well... How much difference is there between your formulation and mine? After all, it gets those "complex word/phrase association(s)" more or less right, no? That's just how it applies abstract ideas to texts. I don't care if it is "wedded to specific language it's found in its training data." It's wedded to the right specific language picked from two very different contexts and matches them correctly. And, as I've said, I have lots of examples of this.

And then we have what it did at the beginning of “Abstract concepts and metalingual definition: Does ChatGPT understand justice and charity?”, https://new-savanna.blogspot.com/2022/12/abstract-concepts-and-metalingual.html

I ask it to define justice and it does so. I then ask it what Plato says about justice in The Republic. It does so. The next day I prompt it with a short story where there is a case of injustice. I ask, "do we see justice being served?" It says we do not and explains why. Once again I ask it to define justice. It does so, and, yes, in terms similar to those it used the first time. It IS using formulaic language. After it's defined justice I prompt it: "Would you please change the previous story so that justice is served?" It does so.

That's a VERY complex sequence of linguistic events. And, no, I don't believe ChatGPT is thinking in any deep sense of the word. But I don't think we know just what it's doing. And I'm not referring to its black box nature. If we could open the box and inspect its innards freely, we wouldn't know what to make of them.

As for formulaic language, people use a lot of that as well. See this post I wrote on Joseph Becker’s 1975 idea about the phrasal lexicon, https://new-savanna.blogspot.com/2022/07/the-phrasal-lexicon-system-1-vs-system.html

And this post about formulaic language in the Homeric epics, https://new-savanna.blogspot.com/2022/07/gpt-3-phrasal-lexicon-and-homeric-epics.html

That's fair, and your posts on ChatGPT certainly made me rethink some things. It's incredibly frustrating to not be able to figure out what's actually going on here. How can a program that can "change the previous story so justice is served" also struggle with children's riddles and basic factual information?

Dec 26, 2022 · edited Dec 26, 2022

That's a very good question. Let me suggest part of the answer. Remember that ChatGPT has no access to the physical world. Yet many concepts are defined primarily in physical terms, by how they look, feel to the touch, sound, taste, and smell. ChatGPT is going to get tripped up in an appreciable number of situations involving such words. Many common-sense ideas are "close" to the physical world, and LLMs are known to have trouble with them.

But things like justice, mimetic desire, and so on, they aren’t defined by physical characteristics. You can’t see, smell, or hear justice. How is justice defined, then? It’s defined by stories, stories that exemplify it. Such patterns can be identified in bodies of texts without access to the world. My teacher, David Hays (an “old school” computational linguist who worked with symbolic models) developed the concept of metalingual definition to cover such cases, concepts defined by stories. See, for example, David G. Hays, On "Alienation": An Essay in the Psycholinguistics of Science, https://www.academia.edu/9203457/On_Alienation_An_Essay_in_the_Psycholinguistics_of_Science

I’ve written about that work in, Abstract Patterns in Stories: From the intellectual legacy of David G. Hays, https://www.academia.edu/34975666/Abstract_Patterns_in_Stories_From_the_intellectual_legacy_of_David_G._Hays

The current trends in LLMs seem to imagine that intelligence is primarily static and reactive: a prompt goes in, a response pops out, and that response is the product of a fixed set of algorithms working from a fixed amount of data. But human intelligence is constantly adapting and frequently proactive. Even at a structural, material level, we are living things: our brains and bodies are constantly changing. So is what we know and believe. Intelligence is not finding the right set of algorithms to process a vast (but finite) amount of data. We're never done learning.

Maybe this isn't a route to AGI, but frankly I don't care about that. What I care about personally is whether we can build usable products and automations with it. In my estimation, the answer is very highly likely 'yes'.

We are already seeing some interesting stuff appearing on top of GPT-3, and I'm sure with more maturity we'll get more robust products soon.

The key thing would be how those products are designed, e.g. if we expect the model to spit out the perfect legal contract that doesn't need checking, we'll be there for a long time. But if we design a product that generates possible ways of resolving a customer complaint (based on existing data from the org) and gets the complaint handler to make the final decision, we could probably do that now, and that's very valuable.
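
A rough sketch of what I mean (the function and field names are made up, and it assumes the pre-1.0 openai Python client; the point is only that the model proposes and the handler decides):

    # Sketch: the model drafts candidate resolutions, a human complaint handler picks one.
    # Assumes the pre-1.0 `openai` Python client; all names here are made up.
    import os
    import openai

    openai.api_key = os.environ["OPENAI_API_KEY"]

    def draft_resolutions(complaint: str, policy_snippets: list, n: int = 3) -> list:
        prompt = (
            "Customer complaint:\n" + complaint + "\n\n"
            "Relevant policy excerpts:\n" + "\n".join(policy_snippets) + "\n\n"
            "Suggest one possible resolution, citing the policy where relevant:"
        )
        response = openai.Completion.create(
            model="text-davinci-003",
            prompt=prompt,
            max_tokens=150,
            temperature=0.7,
            n=n,  # several candidates, because no single one is trusted blindly
        )
        return [choice["text"].strip() for choice in response["choices"]]

    def handle(complaint: str, policy_snippets: list) -> str:
        options = draft_resolutions(complaint, policy_snippets)
        for i, option in enumerate(options, 1):
            print(f"Option {i}:\n{option}\n")
        chosen = int(input("Handler: pick an option (0 to escalate): "))
        return options[chosen - 1] if chosen else "escalated for a human-written reply"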

author

We’ll see how effective it is at customer service, where there is a long tail and a real cost of error (in terms of lost customers). So far I don’t think LLMs have had that much impact there, but if you can show otherwise, please do

We'll definitely try! Our current approaches are very basic, so I think with the right product design (that doesn't 100% automate, but helps humans and defers to them), it could be a nice addition. We'll give it a go!

Why the confidence that AGI will inevitably come (within a century, say), especially given that recent LLM trends patently are not heading in that direction? I've yet to see a prediction of this kind grounded in sober analysis of practical tools/concepts that already exist, as opposed to Homer-style empty optimism (or pessimism, depending on your outlook) of the kind: 1. GPT-3. 2. GPT-4. 3. ???? 4. Profit!

Let's have a chat about what AGI is to you and compare it to what I can describe for now. An intelligence model can be surprisingly simple.

Personally, I can't at present envisage any way of being able to objectively describe a particular bit of AI as having AGI, given that we don't have, and aren't anywhere near, a non-Homer-style model of how human intelligence arises from the matter of the brain. Until we have that (I'm not holding my breath), AGI seems to me to be a concept from science fiction, like teleporting: we have a vague idea what it refers to and what its effects would be, but no idea what it actually *is*. AI will clearly continue to go from strength to strength. But I can't see how AGI is a coherent real-world concept, such that we'll be able to say one day, "yesterday everyone was in agreement that we had not reached AGI, but today everyone is in agreement that we have, because x, y, z". What are x, y, and z? What will the arguments be that AGI has now been reached?

AGI seems to me to be a conceptual Catch-22: To posit a meaningful theory of AGI would be to invent it, but anything short of such a theory fails to be a reasonable, achievable goal.

Not sure I follow the second half of your comment, but I think I agree with the overall gist! We think we know what AGI means, but we don't, not precisely. It's a placeholder concept. The link with sci-fi is that we can imagine what the future might look like in this regard, but as long as this prediction is based on imagination and hunches, rather than a coherent theory, it doesn't seem intellectually responsible to provide an estimate of its likelihood. The Drake equation, and the whole idea that we can estimate the likelihood of "intelligent life" and "advanced civilizations" being out there in the galaxy, is a direct parallel. When crucial parameters of your prediction are based on terminally vague, fuzzy notions like "intelligence", your prediction is bound to be "not even wrong".

Will you agree that intelligence is about differentiation - cats from dogs, there from their, right from wrong? When we plan, we differentiate what will work from what will not. When we reason about the root causes of an observable state we differentiate what could cause it from what could not. When we resolve pronoun references or disambiguate "visiting relatives" we also differentiate.

The world is about objects and actions. How do we differentiate/compare them? By properties. Objects are characterized by properties. Actions change properties of objects. Comparable and measurable properties are what you need if you want to handle objects and actions using a computer, be it biological or electronic.

The Turing test implies that natural languages can model the world. Take a sentence in memory and a question about any piece of that sentence. The question contains references to all the pieces but one. To answer the question you only need to compare pieces - differentiation/comparison again.

What is planning? Memory tells you how actions change properties, and you want to achieve a certain change of properties.

As I said, intelligence is simple.
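
Fwiw, the planning part of that picture fits in a few lines of code (a toy sketch; the objects, properties, and actions are invented for illustration): memory records how each action changes properties, and planning is a search for a sequence of actions that produces the desired change.

    # Toy sketch: planning as search over actions that change object properties.
    # All objects, properties, and actions are invented for illustration.
    from collections import deque

    # "Memory tells you how actions change properties": precondition -> effect per action.
    actions = {
        "fill_kettle": ({"kettle": "empty"}, {"kettle": "full"}),
        "boil_kettle": ({"kettle": "full"}, {"kettle": "boiled"}),
        "pour_water":  ({"kettle": "boiled", "cup": "empty"}, {"cup": "tea"}),
    }

    def satisfies(state, props):
        return all(state.get(k) == v for k, v in props.items())

    def plan(state, goal):
        frontier, seen = deque([(dict(state), [])]), set()
        while frontier:
            current, steps = frontier.popleft()
            if satisfies(current, goal):  # the goal is itself a set of property values
                return steps
            key = tuple(sorted(current.items()))
            if key in seen:
                continue
            seen.add(key)
            for name, (pre, effect) in actions.items():
                if satisfies(current, pre):
                    frontier.append(({**current, **effect}, steps + [name]))
        return None

    print(plan({"kettle": "empty", "cup": "empty"}, {"cup": "tea"}))
    # -> ['fill_kettle', 'boil_kettle', 'pour_water']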

I know intelligence is a word. That doesn't mean that it's used to refer to a well-defined and understood concept, such that we can give necessary and sufficient conditions for what constitutes intelligence, that reasonable people will agree on. Now, if you want to define AGI specifically in other terms that do refer to well-defined and understood concepts, that's absolutely fine. Much better in fact. But my main interest in the idea of AGI is in connection with the idea that once it's reached, we will inevitably progress to ASI a moment later. I think this whole idea is based on the fuzzy notion of intelligence (i.e. the vague cloud of ideas associated with the word intelligence), and is therefore incoherent.

Foundational Model of Intelligence

Intelligence = Internal modeling = RNA Processing = Transcription - Splicing - Translation = Analysis - Search - Synthesis =

System 1 is signal transduction (NLU) and System 2 is a DNA-RNA-Protein way (NLP).

Epistemological AGI = Knowledge-based AGI = Neurosymbolic Language Model =

NLU (Symbolic Language Model, Ethics, Concepts) -

Multilingual NLP (Statistical Language Model, Word Forms) -

Multimodal AI (Audio, Image, Interaction) -

Reinforcement Learning from Human Feedback (Ethics, Concepts for NLU)

"The techniques of artificial intelligence are to the mind what bureaucracy is to human social interaction."

— Terry Winograd
