22 Comments

Fantastic article; really, really good.


There used to be a thing called dissociated-press in the GNU Emacs text editor; it may still be there for all I know. You loaded up a text file and let it rip, and it appended text to the end of the file that sort of read like the stuff already in the file. It worked by choosing a piece of the file and searching for it elsewhere in the file; it then appended whatever came right after the string it found to the end of the file, grabbed a bit more of the file after the string it had used, and repeated the process by searching for that new string. Short sequences of words always made sense and those sequences were usually joined correctly, but the overall result was sort of dissociated, hence the name, a play on the Associated Press.
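
For anyone who never tried it, the trick can be sketched in a few lines of Python. This is a toy reconstruction of the behaviour described above, not the actual Emacs implementation; the parameter values and the file name corpus.txt are made up for illustration:

```python
import random

def dissociated_press(text, context_len=8, chunk_len=20, output_len=400):
    """Toy imitation of the dissociated-press idea: keep finding the current
    tail of the output elsewhere in the source text, then continue the output
    with whatever follows that other occurrence."""
    start = random.randrange(len(text) - context_len)
    output = text[start:start + context_len]
    while len(output) < output_len:
        context = output[-context_len:]
        # All positions where the current context occurs in the source text.
        hits = [i for i in range(len(text) - context_len)
                if text[i:i + context_len] == context]
        if not hits:
            break
        i = random.choice(hits)
        # Append the stuff right after the matched string, then repeat.
        chunk = text[i + context_len:i + context_len + chunk_len]
        if not chunk:
            break
        output += chunk
    return output

# print(dissociated_press(open("corpus.txt").read()))  # hypothetical file
```

Short stretches stay locally coherent because every splice point is a string that genuinely occurs in the source; the global result drifts, exactly as described above.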

Jorge Luis Borges once wrote of an infinite library in one of his short stories. Imagine running dissociated-press on that library, perhaps by invoking some surreal mechanism worthy of Borges. The output would always make sense at the fine scale and sort of work syntactically at the sentence level, but the overall result would be gibberish. If it ran long enough it would tell every truth and every falsehood. It might be very entertaining, but not very useful.

Modern generative language systems always remind me of Williams syndrome. From an article on the syndrome: "Children with Williams syndrome are chatty, have rich vocabularies and love to tell stories. Yet they have trouble learning certain complex rules of grammar." "The new work, however, finds that children with the syndrome do not understand passive sentences that use abstract verbs, such as ‘love’ or ‘remember.’" "Healthy children learn actional passives by age 5, but don’t learn psychological passives until around age 8." There's a theory that people with Williams syndrome often became court jesters, as they could be most entertaining.


Thanks for the great article, Gary! Anyone who is interested in seeing how a language system works is welcome to join my working group: https://t.me/thematrixcom

author

yes, gpt-3 is similar to that emacs function! and yes, it does remind one of the a-conceptual nature of some WS speech, documented by Carey and Johnson.


Very informative, thanks Gary and Elliot! Just curious: do you distinguish between cognitive models and cognitive architectures? Is the first more about relationships between objects and the latter more about the information-processing rules of the system? Can you implement a cognitive model and use it to run a robot, or would that then be a cognitive architecture?


Perhaps each "prior" has its own model, essentially a Go-like game of rules, state evaluation functions, and goals. Human-level intelligence could be achieved by linking up these prior models into a supermodel.

The reference problem isn't as hard as it appears. Wordvecs offer a good example of dimensionally reduced arrays that can act as semantic references. From a "priors" perspective, how can an evolved human trait like greed or ambition refer to modern objects? How can I crave an iPhone? We must have innate circuitry that trains the "craveables" vector in order for our "priors" to reference it.
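
To make the "craveables" idea concrete, here is a deliberately toy numpy sketch. The four-dimensional vectors and the craveables direction are invented for illustration; they are not taken from any trained embedding model:

```python
import numpy as np

# Hand-made toy "embeddings"; real word vectors (word2vec, GloVe, ...) are
# learned from text and have hundreds of dimensions.
vectors = {
    "iphone":  np.array([0.9, 0.8, 0.1, 0.0]),
    "gold":    np.array([0.8, 0.9, 0.0, 0.1]),
    "spinach": np.array([0.1, 0.0, 0.9, 0.2]),
}

# A hypothetical "craveables" direction that an innate drive could be tuned to.
craveables = np.array([1.0, 1.0, 0.0, 0.0])

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

for word, vec in vectors.items():
    print(word, round(cosine(vec, craveables), 2))
```

A drive like greed could then "refer to" whatever scores high along that direction, including objects such as iPhones that did not exist when the drive evolved.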

Many good ideas are floating around -- game engines, agent-based representations -- that would serve to augment current data-driven/statistical methods like transformers, etc. Perhaps priors are mental agents, actively searching for their instances. Without mental representations, you can't handle counterfactuals or observations like "I noticed the train didn't pass at 2am as it usually does".

Quite a few mental agents would be needed: causality, space, time, objects, numbers, agency, facial recognition, family, stranger, trust, anger, greed, official, accidental vs deliberate action, desire to walk, awe, shame, regret, desire to lead, desire to follow, need for approval, hunger, depression, happiness, love, explore vs exploit, fight or flight, etc.

Priors are scary, because they cover human motivations, drives, fears, prejudices, goals, emotions, and feelings. But you can't understand human gossip unless you understand human nature. You don't have to admire Donald Trump to appreciate that humans often suffer from craven impulses, ambition, and greed.


"Ultimately, the right approach to large-scale learning may not rest on machinery that is finely geared to predicting sequences of words, but rather on developing a new approach to machine learning that is fundamentally geared towards learning how hierarchically structured sets of words map onto meanings that are used in the service of updating cognitive models."

That is exactly what symbolic language models are about.


“The act of referring involves (at least) the speaker, the utterance, the context, and some properties of the external world; the interactions and dependencies between such things simply are not present in most current AI models.

Reference is about much more than predicting the next word; it’s about connecting words with (internal representations) of the external world. When we tell you there is a cat on the mat, we ask you to think about a specific cat in the world, in a specific place, and expect that you build an internal cognitive model that satisfies those expectations.”

These are exactly the components of a symbolic language model; see "Neural network for interpreting sentences of a natural language" - https://patentscope.wipo.int/search/en/detail.jsf?docId=US339762244&_fid=WO2020106180
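
As a heavily simplified illustration of the quoted passage, here is a toy Python sketch of a listener updating an internal model from an utterance like "there is a cat on the mat". The parsed triple is hand-written; no real semantic parser is being called:

```python
from dataclasses import dataclass, field

@dataclass
class CognitiveModel:
    """Toy stand-in for a listener's internal model of the situation."""
    entities: set = field(default_factory=set)
    relations: set = field(default_factory=set)

    def update(self, fact):
        subject, relation, obj = fact
        self.entities.update({subject, obj})
        self.relations.add((subject, relation, obj))

# Hand-written stand-in for the output of a semantic parser on
# "there is a cat on the mat"; no real parser is invoked here.
parsed_fact = ("cat-1", "on", "mat-1")

listener = CognitiveModel()
listener.update(parsed_fact)
print(listener.relations)   # {('cat-1', 'on', 'mat-1')}
```

The point of the sketch is only that the target of reference is the updated internal model, not the word string itself; predicting the next word never requires building anything like `listener`.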


Thanks for that update to my cognitive model of large language models, GPT-3, and their limitations. Now I truly don't understand how that Google engineer could be so fooled if he was aware of these facts and limitations; but maybe he wasn't, I dunno.


Very interesting article. I fully agree with you that current models lack a lot of things, even if in some cases they produce amazing results. I have been working on the use of AI to recreate lost artistic heritage, mainly paintings from the 16th to 18th centuries, and although models like DALL-E 2 are capable of generating beautiful images that in many cases really resemble paintings from that period, they clearly lack the understanding of the world that characterises human intelligence. https://aivirtualmuseum.com


Great insights as always. I'd also add that language models themselves, however large, are extremely limited in (a) representing reality and (b) capturing imaginations. If y'all have ever experienced the feeling "I don't know how to put <insert the last 5 ineffable subjects you wanted to talk about but could not> into words", then congratulations: you have already run up against the wall of ineffability in philosophy, especially its epistemology branch.

At the end of the day, language is an abstraction of thoughts and meanings of physical realities and imagined possibilities. Not all things can be abstracted; and for those that can be, *something* is lost in the abstraction, as in information compression. So however "smart" LLMs may become, they can only capture what language presents to them --- an abstracted-away, stripped-down, and biased-to-the-effable-only view of the world. So yes, 10 times yes to “...[A] physics engine’s model of the world” for any language model.


⚠️ 99% of the review did not reflect the state of the art, namely Gato.

⚫ Only one sentence was devoted to Gato, but Gato arguably begins to address many of the issues discussed.


"The large language model approach ignores all of this literature [on cognitive models]."

It's actually worse than that. It not only ignores the literature, it completely ignores the subject of this literature.


Why are Linguistics and NLU so important? Precisely because they are among the means by which we learn something about the world and form and transmit our mental models to other human beings. To think of an AGI that doesn't really know how to *read* (in addition to understanding other multimodal inputs) is to limit our own human ability to create. We learn not only through sensory experiences, but also through the cognitive models we receive from parents, school, society and culture. How to create explicit (symbolic) models that can serve as input for deep learning processes should be our research focus at this time. So I am expecting advances from neuro-symbolic architectures.


Certainly some fair points about human cognition, but they are also a bit tendentious in relation to the question of what machine learning models of language actually do. Firstly, the first two features you ascribe to language are not, strictly speaking, part of language per se (part of the language faculty, that is); more properly, they specify some of the uses the capacity for language can be put to (they refer to the roles language can play in cognition overall). Secondly, whilst the third feature you ascribe to language - compositionality (i.e., structure) - does form part of language, machine learning models of language, as you know, only manipulate strings of elements, not linguistic structures, so this particular point is a bit by-the-by, to be honest. It is certainly another way to show that these ML models don't do natural language at all, but when all is said and done it is really not their objective to model human language, whatever practitioners claim (not to mention the other two ideas from linguistics you mention, which are not even approachable from the perspective of an ML model).
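
To spell out the strings-versus-structures contrast, here is a crude Python illustration; both the token list and the bracketing are hand-made examples, not the output of any actual tokenizer or parser:

```python
# What a sequence model is trained on: a flat string of elements.
flat_tokens = ["the", "cat", "on", "the", "mat", "slept"]

# A (simplified, hypothetical) constituency structure for the same sentence.
structured = ("S",
              ("NP", ("Det", "the"), ("N", "cat"),
                     ("PP", ("P", "on"),
                            ("NP", ("Det", "the"), ("N", "mat")))),
              ("VP", ("V", "slept")))

# The training objective only requires continuing flat_tokens; nothing in it
# requires recovering anything like `structured`.
print(flat_tokens)
print(structured)
```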


Language ≠ text. Humans invented language as speech, and humans learn language as speech. In both cases, text comes later, as an abstraction of speech. Systems built solely on textual inputs cannot recover what has been abstracted away.
