135 Comments

I started working on AI using Lisp machines and knowledge representation somewhat similar to the early Yahoo knowledge encodings, but this was in the 80s. That approach was abandoned as not computationally tractable at the time. Semantic approaches were thought to be the right basis. Now we have found a statistical approach that is in general very useful, but fraught with potential errors due to the lack of semantic underpinnings.

May 27 · edited May 27

Michael, we need functional programming back in the game! We do now have powerful hardware that can do massively parallel tail recursion and trampolining.

But yeah, everything is just corner-cutting sampling (and greatly reduced-precision floats at that) nowadays, with graph neural network traversals and/or decision-forest searches slapped on top of GPT-4 API calls to try to bring the “wayward” LLM back to the fold… somewhat.

Kinda like emptying a bottle of multivitamins on oily, unhealthy pizza (with grotesque amounts of salt and sugar and a stingy sprinkle of the cheapest, nastiest, processed mystery meats) in the hope of experiencing some form of nutrition. Much better than glue, though 🤢


Even just a little bit of semantic processing added to LLMs would make a world of difference. I might try it.

May 23 · Liked by Gary Marcus

Hi Gary! Indeed. 'Computing the next word/token' is all that happens, no more, no less. Given that, the hype that accompanies this calculation is mind-boggling. LLMs can't find anything mind-boggling, as an aside, lol - word calculators don't have a mind to start with.

There is no inherent meaning in any sequence/set of symbols - language, math, music, pixels, electronic circuits, chemical formulae, building floor plans... A foundational model trained on such data can't possibly "understand" any of it.


Emily Bender gave a great example, somewhat similar to Searle's "Chinese Room": suppose you're brought to Thailand, and locked inside a library. You do not understand any Thai. The books in the library are all written in Thai; there are no accompanying pictures or numbers or English. You spend many many years studying these books, and you eventually become familiar with a great many patterns in Thai script, to the point where you can do a decent job of plausibly reconstructing portions of written Thai, given a starting prompt.

The big question is: would you understand Thai?


I'm not Emily (duh and lol) but I have said something similar for years! It's this. Imagine a child who can't read yet being locked up in a room full of books of all kinds, nothing but words, till the kid is 18. When the kid emerges, would he/she know/understand anything at all? An updated room with multimedia feeds would be just as useless, because the audio, pictures and videos would mean equally nothing.

For humans, understanding happens first, symbols come next.


I spent a month in Japan; the only Kanji I learned (and subsequently forgot) was the one for "used", as I was looking for second-hand electronics.


Hinton is either being dishonest, or his level of understanding is even lower than I suspected. Any reversible transform/coding technique can convert one type of data space into another type of data space and then back. So, the statement "They don't pastiche together text that they've read on the web, because they're not storing any text. They're storing these weights." is complete nonsense.
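
(A tiny, hedged sketch of the reversibility point above; nothing below is from Hinton, it just shows that "we store numbers, not text" does not by itself rule out storing the text.)

```python
# Sketch: a reversible encoding turns text into an opaque-looking list of numbers
# and back again, losslessly. The numeric form "isn't text", yet the text is there.
text = "They don't pastiche together text that they've read on the web"

numbers = list(text.encode("utf-8"))        # text -> integers
recovered = bytes(numbers).decode("utf-8")  # integers -> the exact original text

assert recovered == text
print(numbers[:10], "...")
print(recovered)
```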

May 23 · Liked by Gary Marcus

Yes.


If you remove token IDs from the weights of a Large Language Model (LLM), the model will no longer be able to convert text into the numerical representations it uses internally or interpret its output as text. Token IDs are essential for mapping text to embeddings and vice versa. Without them, the model cannot process input or generate meaningful output, as both encoding and decoding rely on these IDs. Thus, the token IDs are crucial for the model's functionality.

so yes LLMs store the text and any data type
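
(A minimal sketch of the role token IDs play, using a made-up two-word vocabulary rather than a real tokenizer; real models use subword vocabularies with tens of thousands of entries, but the encode/decode round trip has the same shape.)

```python
# Toy vocabulary: token -> ID and back. Without this table, the ID sequences the
# model computes over cannot be turned back into text.
vocab = {"hello": 0, "world": 1}
inv_vocab = {i: t for t, i in vocab.items()}

def encode(text):
    return [vocab[w] for w in text.split()]

def decode(ids):
    return " ".join(inv_vocab[i] for i in ids)

ids = encode("hello world")    # [0, 1]: what the network actually sees
print(ids, "->", decode(ids))  # meaningful text again only via the ID table
```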

May 23 · Liked by Gary Marcus

Nothing gross about partial regurgitation. That's how mama birds feed their babies. Oh, pardon me I must have walked into the wrong classroom. Darned LLMs.


People including Hinton apparently don't know how machines WORK.

The public needs a basic LESSON.

Machine 101: What is a machine and what does it do?

A machine is a collection of assemblages (discrete manufactured PARTS) that takes loads (electrical, mechanical, etc.) and transfers them to do work.

What does a machine NOT do?

A machine doesn't have an "internal representation" of anything. Zero, zilch, nada. What you see as "variables" and "statements" in a program only has meaning to YOU, the person looking at the program. It doesn't have any meaning "to" the machine.

EXAMPLE

Here's a catapult (look up a pic on the internet...) Programming a catapult involves adjusting pivot points, tensions, and counterweights. The programming language of a catapult is contained within the positioning of the pivots, the amount of tension, the amount of counterweight, and so on. Does the catapult know "here is a rock, and when someone pulls on a lever I'm supposed to fling that rock in my basket... etc etc etc?" NO. It only "sees" the mechanical stress applied to its various parts, and only in the purely reactive sense (meaning it doesn't "feel" anything.) It doesn't "understand" any "rock," "basket," "lever," or whatever else is there. That goes for any and ALL machines, no matter how complex or sophisticated they appear to be, or if they even display anything or spit out any sound or text like "I'm a helpful assistant that's designed to help you." There isn't any "awareness" of anything whatsoever in there. You may see a machine as aware of something, but it most definitely isn't.

What's NOT a machine? A living entity is NOT a machine. Even a blade of grass you grow is already not a machine; check the dictionary definition if you're not sure. There's no "disassembling" a living organism... They're not made of manufactured parts, so you're going to end up CUTTING or TEARING tissue one way or the other if you even try. Not to mention living organisms are NOT designed; the process of evolution isn't one of design unless you're fielding an Intelligent Design argument, and even then it's not _human_ design. See the definition of what an "artifact" is.

Good grief, the sheer amount of ignorance being perpetuated is utterly dismaying.


I think Hinton subscribes to the view that living things are also machines, and that all the features of living things that you say differentiate them from machines are illusory.

I share your view that this is nonsense, and I'd cite as evidence the utter failure of the biological sciences to achieve anything remotely similar in predictive power to the physical sciences. We can launch a rocket from Earth and calculate where it will be 20 years from now. But put a mouse in a box, and we have no idea what direction it'll take off in. The Hintons of the world think that it is, in principle, possible to deduce the mouse's trajectory ahead of time, if only we had the right formula and initial conditions.

To me, living things are so dramatically different from non-living things that the whole "AGI" project is hopeless nonsense. I'm guessing you agree. People like Hinton see the world very differently - to them, the human brain is equivalent to a massive, complex collection of transistors, and AI researchers are hard at work figuring out the mathematics that connect them.


I think most of the current AI purveyors are mechanists. They believe humans and brains are just a set of physical weights and potentials that give rise to all behaviors, intelligence, morals and self-perceptions. They believe this in spite of the fact we haven't the first clue what consciousness is. They are running on the hope and excitement of a connectionist paradigm that recently had an incremental advance with LLMs, and extrapolating it without limit. I feel confident this is not an accurate reflection of reality.


They are stuck in the obsolete Cartesian/Newtonian weltanschauung. Birhane's article "The Impossibility of Automating Ambiguity" is an excellent analysis and critique.


> They believe humans and brains are just a set of physical weights and potentials that give rise to all behaviors, intelligence, morals and self-perceptions.

What else would they be?


You don't need a theory of what something is in order to show what something isn't. Machines utilize those things, but the brain isn't a machine.


Then what is it?


A living entity (there are articles saying "life" won't be figured out for hundreds of years if EVER). I know they're not machines.


It’s a weird mix of hubris and a rather gross view of living things.


Well said. It's like some people have just decided that there can't be any mysteries. Us clever humans did such a bang-up job with physics, the universe now apparently owes us answers to the rest of our questions.


What is the purpose of science, in your opinion?


It's hard to come up with a great general answer to that question; I'm almost certainly going to exclude things I want to include or vice versa.

I think it's some combination of prediction and explanation of reality. Prediction is most important in practical applications, and in the testing of explanations. Explanations are usually needed for good prediction, and always needed for good prediction when pattern-matching to existing data can't be done. Explanations also satisfy our basic curiosity, which I think drives actual scientists more so than the mere ability to predict.

With respect to the metaphor in my previous post, science describes the method of human inquiry by which we "ask questions of the universe" in a way that makes them answerable empirically.


OK. I don't really get where "hubris and a rather gross view of living things" and "the universe now apparently owes us answers to the rest of our questions" are coming from, then.

Science is how we answer our questions about the universe, right? And there's nothing wrong with trying to answer them? And it's worked really well at answering the questions we used to have.


And ignorance. Can't leave out ignorance.


If that's the case then he needs to check the dictionary and see what the word "machine" refers to.


I like Hinton; there is something of the Malkavian smart arse about him. I'm of the opinion that AGI/ASI will be a new species. Not perfect, not human, but not Animal, Mineral or Vegetable either. Humans can remember verbatim if they try; there is a memory Olympics. I'm betting that when it comes to it the NYT will take the money, not a pyrrhic victory.


Nicely put. I keep telling that to whoever would listen, however in general the reaction is disbelief. I've met some people who literally think their car knows its way home.


There's no reason why a car *couldn't* know its way home, given a sufficiently large brain, but current self-driving cars are no more intelligent than a trained lobster.


The human brain is a machine, too, though.


I disagree. I would guess we're taking "machine" to mean different things, and in that regard it might not matter. Substantively though, if human brains (or any other biological things) count as machines, maybe you'll agree that machines can be categorized into either:

1) Machines created by humans (or other intelligent beings - don't want to leave out chimpanzee tools!)

2) Machines created by some process that we do not fully comprehend, and that we are unable to re-create. (All living things would fall into this category)

My preference is not to call things in category 2) "machines". We don't know how to make them, for one. We can build complex computers, but we can't build living things out of non-living things. And, as I said above, we cannot predict their future states remotely as well as we can predict the future states of the machines we've made. If I put a worm on the floor, no one can tell me exactly where it will be in five minutes time. We have no formulas to get us there. Not even if it's the humble C. elegans, a worm made up of fewer than 1,000 cells, whose neural connectome was mapped almost 40 years ago, and for which massive amounts of data have been collected.

I realize that this doesn't establish the brain as a non-machine. But if we're going to call it that, it's a machine whose commonality with human-made machines is a mystery that won't be solved any time soon.


> Machines created by some process that we do not fully comprehend, and that we are unable to re-create. (All living things would fall into this category)

No, I don't agree at all. There's nothing magical about natural machines that prevents us from re-creating them; it's just a lot of work.

> If I put a worm on the floor, no one can tell me exactly where it will be in five minutes time

So what? You can't predict where a robot vacuum will be, either. You know that software can be non-deterministic?

> it's a machine whose commonality with human-made machines is a mystery that won't be solved any time soon.

I didn't say it would be reproduced soon, but to claim its abilities are fundamentally irreproducible just because it's made of mushy bits is silly pseudointellectualism.


I'll concede that software can give apparently "non-deterministic" outputs, though as far as I know these are pseudo-random: there's still a formula hiding underneath, we just made it really complex so that the output looks stochastic. Give me the seed for the pseudo-random number generator and I (or someone, at least) can, in fact, predict where a robot vacuum will be.
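
(For the curious, a minimal sketch of that claim: with the seed in hand, a "random" trajectory is exactly reproducible.)

```python
import random

# A pseudo-random walk is fully determined by its seed.
def walk(seed, steps=10):
    rng = random.Random(seed)
    position = 0
    for _ in range(steps):
        position += rng.choice([-1, 1])
    return position

print(walk(42), walk(42))  # same seed -> identical "random" outcome
print(walk(7))             # different seed -> (generally) different outcome
```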

Your claim that re-creating natural machines is "just a lot of work" is speculation. Mine, that this is impossible, is also speculation. You're awfully confident in your speculation, considering that what you're describing has never been done. I'll believe it when I see it, which likely means I'll never believe it.

It's funny, both sides in this debate accuse the other of magical thinking. You see the proposition that reality cannot in principle be reduced to equations as "pseudointellectualism". I see the proposition that it can as a confusion between scientific models and that which is being modeled. There's reality, and there's our understanding of reality. The former is under no obligation to conform itself to the latter.


> there's still a formula hiding underneath, we just made it really complex so that the output looks stochastic.

If you are limiting yourself to machines = "pure logic", as someone mentioned earlier, sure, but I don't think anyone does this.

I consider a robot vacuum to be a machine, and as that machine's more-or-less deterministic logic interacts with the real world, sticking more to some rug fibers and slipping more on others, it will move in a non-deterministic way (and will typically involve feedback loops to correct this).

Likewise every analog signal is contaminated by unavoidable outside interference, and every analog-to-digital converter unavoidably records unpredictable thermal noise, and true random number generators use this entropy to behave in unpredictable ways, even if the code is identical on each run. The output or state of a non-deterministic machine is of course irreproducible (whether biological or artificial) but that doesn't mean the machine is.

> Your claim that re-creating natural machines is "just a lot of work" is speculation.

I don't see how. What would prevent re-creating them?

> I see the proposition that it can as a confusion between scientific models and that which is being modeled

I understand the difference. I don't see how that's a barrier to reproduction.


If you can't model something then how do you engineer it?


Your analog signal example is a good one; I agree with what you're saying here. I think I've wrongly ascribed to you an argument you didn't make: that the brain is a *computational* machine (which is only a little removed from the "machine = pure logic" interpretation that you reject). This is where I see the model being swapped with the thing being modeled: in the claim that our computational models of reality are doing the same thing as reality itself. This is a view I associate with Hinton, that I've heard stated strongly by David Deutsch, and that at least seems to be prevalent among computer scientists (but I'm not a computer scientist so maybe I've got that wrong).

What would prevent humans from re-creating brains? Well, what's prevented us from re-creating any kind of life? If you're just saying that life is a thing that somehow came into existence on Earth and so must be in principle createable, and that I just need to imagine human beings intentionally recreating the conditions and process that started with non-life and ended up with brains, I guess I can accept that. I don't see much use in it, but I'm not going to call it a literal impossibility.

I'm curious now if there's anything in the physical world that you'd say is not a machine. Is an atom a machine? What does the concept of "machine" contribute to science or philosophy in your use of the term?


Your hypothesis is that living things are machines. The burden to prove this assertion is on you, not on those who disagree to disprove it. Extraordinary claims require extraordinary evidence.


No. Whether living things are "machines" or not depends only on your definition of "machines". That isn't a scientific question, just a semantic one.

The scientific question would be a falsifiable statement about reproducibility of some function or other.


I'm using the English-language definition. Merriam-Webster 1d: "an assemblage (see ASSEMBLAGE sense 1) of parts that transmit forces, motion, and energy one to another in a predetermined manner".


I think Kurt Gödel made a pretty good argument (by way of his Incompleteness theorems) that it is not - inasmuch as a machine is an embodiment of logical statements.


Since when are machines limited to "an embodiment of logical statements"?


I'm not sure exactly what Ttimo had in mind there, but to be fair, our current working models of computation - Turing machines, lambda calculus, and more practically, stack machines or finite state automata - are all perfectly formal systems, just like propositional logic. Bring in the Curry-Howard isomorphism and the analogy between logic and computation becomes exact. The Church-Turing thesis is the further assertion that these formal models are adequate to encompass anything and everything we might call "computable" - the founders of computer science thought that literally nothing extra was required to fully describe all computations, and this view is widespread among computer scientists today. All modern computers are inspired by these models and add no new capabilities, including the ones that run training and inference for today's generative AI. So for example an LLM is quite literally an (exceptionally complicated) embodiment of pure logic.


> So for example an LLM is quite literally an (exceptionally complicated) embodiment of pure logic.

That doesn't seem right to me at all. Don't LLMs use ReLU, Softmax, etc.? So there's a sort of "switching" going on at the inflection points, but really they're mostly piecewise approximations of non-linear functions and there's no real if-then logic going on. They could hypothetically be implemented using op-amps and resistors (if we wanted to waste the money) with no abrupt transitions between regions, and would work just as well.
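
(For readers who haven't seen them, a minimal sketch of the two functions named above, with toy inputs; no if-then rules over symbols, just arithmetic.)

```python
import math

def relu(x):
    return max(0.0, x)  # piecewise-linear: zero below 0, identity above

def softmax(xs):
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]  # smooth map from scores to probabilities

print([relu(x) for x in (-1.0, 0.5, 2.0)])  # [0.0, 0.5, 2.0]
print(softmax([2.0, 1.0, 0.1]))             # probabilities summing to 1
```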

Do you think of the inverted triple pendulum control system as "pure logic"?


> That doesn't seem right to me at all. Don't LLMs use ReLU, Softmax, etc.? So there's a sort of "switching" going on at the inflection points, but really they're mostly piecewise approximations of non-linear functions and there's no real if-then logic going on.

Certainly we agree that all that math is being emulated (and logistic, GeLU, arctan, linear projection, etc.). I'm not saying that neural nets are *designed* around logic or that that's the motivating paradigm their architects had in mind - not at all (although there are some attempts at encoding probabilistic logic operations, e.g. in neuro-symbolic architectures). But yes, technically the whole thing runs on a digital instruction set using digital floating-point arithmetic to simulate all the math, so yes, the entire thing can in principle be written as one giant logic expression. You could call that pedantic, but it's technically the truth.

> They could hypothetically be implemented using op-amps and resistors (if we wanted to waste the money) with no abrupt transitions between regions, and would work just as well.

Yes, they could, and this would not be a waste of money at all. To the contrary, it would save a lot of resources in the long run because emulating this class of physics in a digital computer is massively more energy intensive than the actual physical system being emulated. See for example what the company Extropic is working on. I predict that *non*-digital neuromorphic chips will one day be embedded in all kinds of low-energy systems.

> Do you think of the inverted triple pendulum control system as "pure logic"?

It's interesting that you bring up this example, because while I'm not familiar with it in particular, there _are_ several simple pendulum systems which are non-integrable, chaotic systems. In other words, any digital (logical) emulation of such a system at fixed precision will inevitably lose accuracy exponentially with time. Really even just a double pendulum is enough (https://en.wikipedia.org/wiki/Double_pendulum). The only way to accurately simulate such a system over arbitrary time scales would be with unbounded computing resources that increase exponentially in the desired time horizon. Not so for a single pendulum - arbitrary precision can be had at any time scale with a cost that is constant in the time scale and at most polynomial in the desired level of precision.
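
(A rough sketch of that exponential loss of accuracy, using the logistic map as a stand-in chaotic system rather than an actual pendulum; the mechanics differ, but the divergence behaviour is the point.)

```python
# Two trajectories of a chaotic map starting 1e-12 apart: the gap grows roughly
# exponentially, so any fixed-precision simulation is accurate only over a short horizon.
def logistic(x, r=4.0):
    return r * x * (1.0 - x)

a, b = 0.2, 0.2 + 1e-12
for step in range(1, 61):
    a, b = logistic(a), logistic(b)
    if step % 10 == 0:
        print(step, abs(a - b))
```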

No, I don't think of these systems as pure logic, even though their digital emulations technically are. The double pendulum really stretches the limit because even its digital emulations are technically arbitrarily imprecise beyond some short time horizon. The main point in this whole thread I think is that the map is not the territory, and analogies are useful but inexact. So we have this map, this analogy that says a neural net is loosely inspired by ideas about how biological nervous systems work, or even by just a class of functions that are known to be universal approximators, but then we run them in an approximate simulation in a digital system which is neither of those things.


Yes, and I'm not going to read your 4000 word article. Make your own argument for why the brain is not a machine. If it's not a machine, what is it, and how does it work?


You want my argument then here's my argument.

1. We know how machines work

2. How machines work prevents them from dealing with referents

3. We do, so we're not machines.

That's the short version. Long version, you'd still have to read my argument but you can skip to the section labeled "Functionalist objections (My response: They fail to account for underdetermination)" https://towardsdatascience.com/artificial-consciousness-is-impossible-c1b2ab0bdc46


What does "dealing with referents" mean and why wouldn't machines be able to do it?


You wanted my argument, I gave you a link to my argument. I wrote the article called "artificial consciousness is impossible." That's the argument itself. It's up to people to take it or leave it.


I really don't want to believe that Geoff Hinton is arguing in bad faith, because 1) he comes across as sincere, and 2) unlike every other person in AI whose name we know, he does not have a massive financial stake in convincing the public that AI "reasons" and "understands" and "learns" in a manner we'd recognize as human-like. He walked away from Google, and I have no reason to suspect that he doesn't believe what he's saying.

I think Hinton just subscribes to an extremely reductionist ontology: the brain is a computer, "intelligence" and "understanding" and "reasoning" are phenomena that emerge from brains, therefore we can make these things emerge from the kinds of computer we make, too. And, since LLMs are "inspired" by our models of the human brain, LLMs are the best candidates for computers that do what brains do.

Unlike most of the people out there hyping AI, Hinton really understands how LLMs work. He earned his fame by making substantive contributions, and I think he's doing a victory lap after a long career of advocating for methods that in his view were unfairly dismissed. His worldview seems utterly absurd to me, which is why I'm constantly gobsmacked by the things he asserts as though they're plain fact. But I don't think he's a bad faith actor of the Sam Altman or Sundar Pichai variety.


I would agree with you about Hinton with respect to his sincerity, understanding and outlook. I took pretty much the same final-year undergraduate course as he did, albeit about 18 years later, in Experimental Psychology at the University of Cambridge. The approach there to understanding the human brain was/is quite distinctive and subscribes to a reductionist ontology, as you suggest.

In those days there were quite close links between experimental psychology at Cambridge and AI at Edinburgh and Sussex, where he subsequently undertook his doctoral and initial post-doctoral studies. As a result of this training, I believe Hinton carries a clear, deep-seated model of computational brain function, with the firm view that there is no difference in essence between the biological and electronic.

Overall, I am a supporter of this approach too, so Geoffrey Hinton's stated views give me more pause for reflection than the ramblings of the Sam Altmans of this world. Hinton is fallible, however. His claims, for example, that "... people should stop training radiologists now. It's just completely obvious that within five years deep learning is going to do better than radiologists ..." showed a laughable misunderstanding of the practice of medicine, for example how (medically significant) innovation drives massive demand for expertise in a sub-field. (Apparently, he denies saying this, and meant only that the machines should be reading the scans for them.)

Overall, I share your sympathy with Hinton and believe his views should be listened to respectfully but sceptically. I understand he is (accurately) quoted as having said "... the future depends on some graduate student who is deeply suspicious of everything I have said ..." Wise words.


Even if you believe that the brain is a computer, it doesn't follow that every sufficiently large computer is a brain. In my view, there was always a bit of magical thinking in the assumption that training a model on text prediction would lead to general intelligence. Early thought experiments about intelligence put a lot of emphasis on the ability to generate believable text (actually, to hold believable conversations, but that part usually gets forgotten), and many people just sort of assumed that if they could nail the effect, the cause would follow. It's Goodhart's law at work, on a billions of dollars scale.

Regarding your last paragraph on Hinton specifically, personal pride can be just as powerful at clouding a person's judgement as a financial interest, if not more so. Hinton's motives may be in some sense more pure than Altman or Pichai, but his arguments are no less specious.

May 23 · edited May 23

I appreciate all of what you've written, but this is my favorite. A particularly good example -- well-explained and summarized -- is better than a million theoretical arguments or arXiv papers. This is truly powerful rhetoric against the irrational exuberance of GenAI proponents.

I was recently discussing GenAI with a friend and he was concerned that he was acting like a Luddite, like someone resisting cars as they took over from horses a century ago. Besides the point that the Luddites might have been misunderstood, as Doctorow likes to point out, I reminded my friend that, first, we're nowhere near GenAI being as valuable as cars compared to horses; but, more importantly, cars, despite their likely net positive benefit, were and are still much more dangerous than horses, killing over a million people per year (~40K in the U.S.). In other words, even if AGI ever arrives, we should still be very circumspect and careful with it despite any potential net benefits, especially as we watch Project Lavender and its offspring incompetently automate governance and war.


When I run workshops for teachers who are drowning in the hype and demands for action, I like to pitch it from a perspective of playing a game of guess the next word.

I then point out that, given "Jingle", by some estimates GPT-4 would take 1 trillion additions and 1 trillion multiplications of a gigantic spreadsheet to give us the "Bells", and the same again for the next "Jingle", and so on... That usually makes a strong point about the cleverness of this approach.

I think an important thing to point out when talking about regurgitation is that the chatbot experience comes with an imposed randomisation of output - the models are FORCED to reach for synonyms by the temperature setting... Of course it is unlikely that they will quote verbatim in that interface, and it is even more damning when they do - it memorised the text so well that the next five best options are either nonexistent or outside the selection scope for a given temperature.
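
(A minimal sketch of the temperature mechanism described above, with made-up scores standing in for a model's logits.)

```python
import math, random

def sample(scores, temperature, rng):
    # Temperature-scaled softmax sampling over candidate next tokens.
    exps = [math.exp(s / temperature) for s in scores.values()]
    total = sum(exps)
    return rng.choices(list(scores), weights=[e / total for e in exps], k=1)[0]

scores = {"Bells": 9.0, "bells": 7.5, "all": 4.0, "the": 3.0}  # hypothetical logits
rng = random.Random(0)
for t in (0.2, 1.0, 2.0):
    print(t, [sample(scores, t, rng) for _ in range(8)])
# Low temperature: the memorised top choice nearly every time.
# Higher temperature: near-synonyms and weaker candidates start to appear.
```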


My favourite explanation of what LLMs are is this:

The following is an analogy to give some perspective and, hopefully, provide better insight into the technology behind LLM-based AI.

Training:

Imagine you are locked alone in a room full of books which appear to be written in Chinese. Since you don't speak Chinese, at first these books look like they are just filled with random symbols, but the more you look, the more you start to notice some simple repeating patterns amid the random chaos.

Intrigued (and bored), you pull out a sheet of paper and begin making a list, keeping track of all the patterns you identify. Symbols that often appear next to other symbols, and so on.

As time goes by and your list grows, you start to notice even more complex relationships. Symbol A is almost always followed by symbol B, unless the symbol immediately before that A is a C, and in that case A is usually followed by D, etc.

Now you've gone through an entire book and have a list of hundreds of combinations of symbols with lines connecting them and a shorthand code you've developed to keep track of the probabilities of each of these combinations.

What do you do next? You grab another book and test yourself. You flip to a random page and look at the last line of symbols, comparing it to your list and trying to guess what the symbols on the next page will be.

Each time, you make a note of how accurate your predictions are, making adjustments to your list and continue repeating this process until you can predict with a high degree of certainty what symbols will be on the next page.

You still have no idea what these symbols mean, but you have an extremely good system for identifying the patterns commonly found within them.

This is how an LLM is trained, but by reading massive libraries worth of books, testing itself millions of times, and compiling a list of billions of parameters to keep track of the relationships between all those symbols.

Inference:

Suddenly, you get a text message. The same symbols you have been studying, in a similar order to what you have seen before. Even though you have no clue what they mean, you consult your list and reply with what you think the most reasonable and expected combination of symbols would be.

To the person on the other end, who DOES know how to read Chinese, they see your reply and understand the meaning of your words as a logical response to the message they sent you, so they reply back. And so on.

That is known as inference: the process of generating text using nothing more than the context of the previous text and an extremely detailed reference table of how those words (token by token) relate to each other, despite having no understanding of, or even a frame of reference to be capable of understanding, what those words themselves mean or the concepts they represent.
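
(The training-then-inference loop in this analogy can be sketched in a few lines: a toy bigram counter over symbols it has no understanding of. Real LLMs learn vastly richer statistics, but the shape is the same.)

```python
from collections import Counter, defaultdict
import random

corpus = "甲乙丙甲乙丁甲乙丙甲乙丁"  # stand-in for the library of unread symbols

# "Training": record which symbol tends to follow which. Nothing else is learned.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

# "Inference": given a prompt symbol, keep emitting statistically likely successors.
def reply(prompt, length=6, seed=0):
    rng = random.Random(seed)
    out, cur = [], prompt
    for _ in range(length):
        options = follows.get(cur)
        if not options:
            break
        cur = rng.choices(list(options), weights=list(options.values()), k=1)[0]
        out.append(cur)
    return "".join(out)

print(reply("甲"))  # a plausible-looking continuation, produced with zero comprehension
```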

Sentience:

That's the gap that needs to be bridged to achieve sentience and self-awareness. The AI would need to be able to understand the actual meaning of those words, not just the probability of them appearing in a certain order, and actually think in discrete concepts, rather than just the probability of one word following another.

It's much more complicated to reach this level. I mean, you can't just give them a dictionary and tell them to learn the definition of words, because they also don't know the meaning of the words in those definitions.

They can, however, create an excellent illusion of understanding, an emergent phenomenon stemming from an unfathomable amount of data and processing power being used to search for patterns so subtle and interconnected that a human mind could never unravel them, but based on simple and fundamental rules.

Summary and conclusion:

LLMs will likely be a part of whatever system first displays true sentience and self awareness, used as a tool to allow the AI to communicate in a way Humans can understand and interact with, but LLMs themselves simply aren't enough on their own.


This is well put. Ironically, over the last decade or two most AI courses have dropped detailed consideration and discussion of Searle's Chinese Room challenge to the AI community and the responses it evoked. So most people under 35 are completely unaware of the debate, a little unfortunate given the fact that we are now surrounded by millions of chattering Chinese Rooms.


Inference is common to plants, animals and machines but, so far, reasoning is unique to humans. I noted Sam mentioning reasoning as an unmet requirement at AllIn a few days ago.


The Salticidae family of spiders are capable of using their vision to locate prey, construct a plan for a stalk, modify the plan as new stimuli are processed, conduct experiments to validate the final stalk plan, and modify the final stalk based on their conclusion(s) from those experiments. Corvids have been proven to use causal, analogical, and statistical reasoning to reach a conclusion. The ability to reason is widespread among primates, not just the Hominidae.


I'm curious how you distinguish inference and reasoning. Are you talking about inference as a form of inductive pattern recognition? And then reasoning as the creation and application of abstract rules / deductive logic?


Yes, words (and mathematics) allow virtual manipulation, time travel, etc. Reasoning is sometimes solitary, but progress in understanding is social, both competitive and cooperative.


"They don't pastiche together text that they've read on the web, because they're not storing any text."

Humans also are not [directly] storing any text in their brains, but they are well capable of pastiche and plagiarism. Fortunately, many humans are also capable of understanding stored text.


I think that you can test this. To get an LLM to plagiarize (I forget the exact metric, something like writing 7 words in a row from another source), choose something that was in the training set, but only once or repeated in the same form. I.e., you used to be able to get them to pretty much recite Jabberwocky verbatim by asking an LLM to write a poem about a Jabberwocky; now they are a bit coy about it, and acknowledge Carroll. But if there is a local news story or a lesser-known poem, then the cluster weights around that topic are sparser (maybe not a technically correct way to say that)...


There are so many examples of this in the world of generative image models. For example, if you prompt one with "cartoon image of a sponge with human features that lives in the ocean", it will give you SpongeBob Squarepants every time. It ain't miraculously independently re-inventing SpongeBob from its understanding of the prompt. Obviously. It's just statistically connecting the prompt to the gazillion images of SpongeBob that it was fed during training.


It was the early days of 1-900 numbers. Noticing that my classmates would often quote sources about whom they knew nothing and without any sense of context, I proposed a 1-900-QUOTE. You could call me, and for a small fee I would give you whatever quote you needed to support your thesis. Over time I would have become so often quoted that I would have been assumed to be an actual expert in everything - Thomas Sowell meets phone sex meets h-index. Alas, the window only opened briefly before I would have been replaced by the Internet and AI.


But nobody knows how it works! Throw some more money at it - there's gotta be a genius in there someplace :(


Garbage in / garbage out.

What is important to understand is that it requires energy to reduce entropy. A matrix of undifferentiated weights with maximal entropy contains vast stores of nonsense, and requires energy to remove nonsense.

A corollary of the “infinite monkey theorem” (and possibly Maxwell’s demon) is that, given enough time, every possible phrase will appear in a sentence somewhere on “the internet”. That is the source of the undifferentiated weights. It takes energy to remove nonsense from internet strings before putting them onto a matrix of weights. You have to filter. Maxwell’s demon had to pluck hot molecules (secretly using energy, surprise!); an LLM trainer has to pluck actual facts.

Simply feeding strings into a model will create a modicum of syntax and grammar, which will reliably predict sequences of symbols that are grammatically and syntactically well-formed.

However, underlying information which reliably predicts reality is another order of effort. We’ve invested thousands of years of human effort, using science, to separate fictions from facts which reliably predict reality.

If you feed a child a nonstop stream of fiction (religious texts), it will understand grammar (albeit outdated) but cannot produce facts about reality.

I don’t think AGI will lie in LLMs like these, which rely on a tethered demon to keep reducing their entropy. I think of a general cognitive system as having three features. It has to have a store (weights), a sensing mechanism to feed new information in, and control of the energy required to filter out nonsense - to reduce free energy - a Friston engine. Finally, it has to have a mechanism to expand its inputs when the immediate area is exhausted.

Step one will be to build an LLM big enough to develop models roughly 1,000x to 1,000,000x as complex as the current ones, to approximate the possible entropy of the human brain (as measured by the density of interconnects). Second is to configure it to automatically assess “strings” before they are incorporated into the store, and verify their truth. Lastly, it must be able to autonomously find new sources when old ones are exhausted - seeking and curiosity.

A great example of this would be the ability to ingest data streams not through a pair of eyes, sensation from skin, gravity, tendons and sounds… but through millions of eyes, something a human can’t, along with gravity, acceleration, taste, smell and so on, integrated from billions of sources. Throw in magnetometers, infrared and ultraviolet, stress sensors in roads everywhere, video feeds from everywhere, temperature gauges over the world, and so on, and energy to separate junk from truth, before remembering everything. World mind. That’s AGI.

A panopticon.

You can also just take LSD and get rid of your own filters ;)

This is equivalent to a model of the world which is continually compared to sensation, and where there is a mismatch a process is invoked to change the model or reject the source.

The second is


The anthropomorphism of LLMs is a big problem. It would help if everyone could think of them in terms of their actual mechanism, that is, mathematics.


I totally agree. When I look at text created by an LLM, I am constantly reminding myself that what I'm reading was generated one token at a time, where each consecutive token was selected from a probability distribution and each probability distribution was created by multiplying vector representations of the current input text by goddamn enormous matrices full of values that were established through a repeated "guess the next word" exercise conducted on an incomprehensibly vast collection of text collected from lord knows where and powered by an army of GPUs sucking up enough megawatt hours to power a small city for a day or two.
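
(A stripped-down sketch of that loop, with a tiny made-up matrix standing in for the enormous ones; it only shows the multiply, softmax, sample, repeat cycle, nothing about how real weights are trained.)

```python
import math, random

vocab = ["the", "cat", "sat", "mat"]
W = [[0.2, 1.5, 0.1, 0.3],   # toy 4x4 "weights"; real models have billions of values
     [0.1, 0.2, 1.8, 0.4],
     [0.3, 0.1, 0.2, 1.6],
     [1.4, 0.2, 0.3, 0.1]]

def softmax(xs):
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

rng = random.Random(0)
token, out = 0, ["the"]
for _ in range(5):
    probs = softmax(W[token])                        # distribution over the next token
    token = rng.choices(range(4), weights=probs)[0]  # pick one token at a time
    out.append(vocab[token])
print(" ".join(out))
```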

This is a useful exercise, because for all our lives, when we've read text, we've imagined the human who wrote it. Written language has for millennia been a form of communication between human beings. It takes discipline to read output from an LLM and not imagine an intelligence on the other end.


I personally verified the cheese on pizza thing and I can still reproduce it as of this morning.

Perhaps fortunately, GenAI should make everyone question everything, including critiques of GenAI.


I was gonna ask if anyone knows if the cheese response is actually real?
