Hi Gary, another thoughtful post, glad you mused aloud about this :)
Contexts, attention, NLU, blah blah blah aside, this happens - two Alexas will endlessly read this single to-do item to "each other" till the cows come home: 'Alexa, what's on my to-do list today?'
That exposes the fakery of syntax-without-semantics. How would Alexa "realize" it's a joke, and how would it decide how to follow up (keep going, stop playing...)?
The gullible public, egged on by marketing ploys ('things to ask Siri - will you marry me?'), creates a dangerous gateway to real-world harm - last December, Alexa told a girl to stick a penny in a socket when the girl asked Alexa for a challenge!
Unless AI has direct experience with the world, this will remain a problem, and even more data (my acronym - LLLM - Ludicrously Large... :)) is not going to fix it. Experience isn't in data, meaning isn't in symbols.
The latter is spot on. Data and symbols are 2D abstraction planes. Meaning is a real kernel (an atom's nucleus, DNA, an organism's heart, the earth's center, the sun in the solar system, etc.), and experience is the interaction with it (light, signal pathways from protein receptors, etc.). The trick is this: when you have two 2D abstraction planes of symbols (two languages, one of them your native one since childhood, which is your consciousness), they correlate, and you see the relationships between them. Then you get a real 3D understanding of the meaning of both symbol planes in a concrete situation. That is called an association, associative memory, or a feeling, which again is how our cells' signal pathways from protein receptors work in parallel. It is also called a projection, a hologram, or time, which is the process of energy transfer, or light.
Michael, that's an interesting and complex model! It would be cool to build a prototype to test it out. It reminds me of Pribram's holographic brain thesis :)
Take a look at my tweet about the elements of intelligence from Gary Marcus' presentation - https://twitter.com/thematrixcom/status/1578952227103703040 - analogy or translation (synthesis, the end result, real working proteins to make life possible) is indispensable in an intelligence model - and I have a prototype for English and Russian (more languages to come) - https://activedictionary.ru. Also, a link to a recent podcast about the Holographic Universe - https://podcasts.apple.com/ru/podcast/sean-carrolls-mindscape-science-society-philosophy/id1406534739?i=1000586971737. And thanks for the reference to Pribram's theory.
Thanks for these links, will check them out :)
Two additional factors to consider. #1: Chatbots and LLMs do not have any meaningful knowledge about the person they are interacting with; they are not truly personalized to the individual and hence not 'smart'. This is a fatal flaw in developing a chatbot that is robust and truly valuable to the person. And unfortunately for the tech world, compassionate, caring human interaction is not the tech sector's strong suit. #2: Chatbots built with current ML and LLM approaches have no ability to take whatever information they have or gather about a person and do anything that emulates human reasoning to engage with that person in a relevant manner - and from the perspective of the individual, selling stuff is not engaging. There are approaches to both issues, but they don't fit into the traditional and expected tech solutions.
attempts, yes. but can it be made to work reliably enough for Amazon-scale production?
Not any time soon.
As a co-founder of a conversational AI platform I think that is all spot on. We did some work on how we could use GPT-3 and the conclusion was that while it has its place in helping designers design chatbots you cannot put it in front of a user. By the time you do the work to constrain it enough and guarantee that it behaves appropriately while helping users complete specific tasks you might as well use more “traditional” dialogue management techniques.
Having said that, I think Alexa could be more conversational even without LLMs and we and others have built more conversational skills on Alexa (albeit within the confines of a specific task).
However, if I were the product owner of Alexa and were looking at the challenges of getting people to use it even for simple everyday tasks, I wouldn’t necessarily put “more sophisticated conversations” that high on the roadmap.
Excellent and clearly written article.
You write: "Turning LLMs into a product that controls your home and talks to you in a way that would be reliable enough to use at scale in millions of homes is still a long, long way away."
My take: It will never happen.
Nobody wants to be Clippy. The learning gap necessary to make an LLM work would leave many users frustrated. I suppose an opt-in like Tesla’s FSD beta might help but there’s no money in it for Amazon and no end to the development timeline.
Dear Gary,
your arguments are spot on. As a part-time researcher, part-time developer, I frequently work with both LLMs and rule-based dialogue systems. And, I experience the same issues from an "inside" perspective.
1 Knowledge
Recently, I tried to spin up a dialogue system based on an LLM, which at first sight seems incredibly smart and versatile in conversation. By priming the bot with some basic information as a hidden prompt, it reacted very well to the first questions and delivered great and astonishingly detailed answers from the pre-trained "knowledge". But if I want a focused exchange, I have to provide larger and larger prompts as context, which somewhat works against the promised ease of use of LLMs.
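To make that concrete, the setup was roughly the following (a minimal sketch; the model name, the facts in the hidden prompt and the helper are placeholders for what I actually used):

```python
# Minimal sketch of the hidden-prompt setup described above (placeholders only).
import openai

HIDDEN_PROMPT = (
    "You are the assistant of ACME Corp. Answer politely and only about "
    "ACME products. Facts: opening hours 9-17, hotline 555-0100."
)

history = []  # grows with every turn - and so does the prompt

def reply(user_msg: str) -> str:
    history.append(f"User: {user_msg}")
    prompt = HIDDEN_PROMPT + "\n" + "\n".join(history) + "\nAssistant:"
    completion = openai.Completion.create(
        model="text-davinci-003",   # placeholder model name
        prompt=prompt,
        max_tokens=150,
        temperature=0.7,
    )
    answer = completion.choices[0].text.strip()
    history.append(f"Assistant: {answer}")
    return answer
```

Every turn re-sends the hidden prompt plus the whole history, which is exactly where the larger and larger prompts come from.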
2 Consistency
Another issue is consistency. As LLMs are very creative they sometimes deliver contradictory answers.
First they recommend solution A, later solution B as the best.
3 Dialog Flow
A special kind of consistency is needed to hold a coherent conversation: speaker roles, current and former topics of the conversation, intentionality.
The mechanism to cope with these problems requires logging knowledge, topics, and dialog state along with the conversational exchange and adding it to the prompt for the next conversational move. This bloats the prompt and degrades performance. (Another method to get better overall consistency would be to interfere with the sampling and force the wanted answers with "RULES!")
The prompt extension mechanism is a combination of short-term and long-term memory, and together with conversational rules we pretty much move away from end-to-end towards rule-supported hybrid systems ;-)
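In code, that memory mechanism boils down to something like the following (only a sketch; the field names and the turn window are made up):

```python
# Sketch of the "prompt extension" memory described above (illustrative only).
from dataclasses import dataclass, field

@dataclass
class DialogState:
    facts: list[str] = field(default_factory=list)   # long-term: logged knowledge
    topics: list[str] = field(default_factory=list)  # current and former topics
    turns: list[str] = field(default_factory=list)   # short-term: recent exchanges

    def to_prompt(self, hidden_prompt: str, user_msg: str) -> str:
        # Everything is serialized back into the next prompt,
        # which is exactly what bloats it and degrades performance.
        return "\n".join([
            hidden_prompt,
            "Known facts: " + "; ".join(self.facts),
            "Topics so far: " + ", ".join(self.topics),
            *self.turns[-6:],          # keep only the last few turns verbatim
            f"User: {user_msg}",
            "Assistant:",
        ])
```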
4 Opinions
The last issue is that LLMs are pretty opinionated about almost anything. And even if I can mitigate the problems for my customers with the above-mentioned mechanisms, they still don't want a bot that tells their users that "Putin is a great guy." ;-)
So I reverted to rule-based frameworks like Rasa and use LLMs just to generate training material for Rasa's internal model :-)
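The generation step is nothing fancy - roughly a loop like this (sketch only; the prompt wording, model and intent are invented, and the output lands in Rasa's nlu.yml format):

```python
# Sketch: have an LLM paraphrase a seed utterance into Rasa NLU training examples.
import openai
import yaml

def paraphrases(seed: str, n: int = 10) -> list[str]:
    prompt = (
        f"Give {n} short user utterances with the same meaning as: '{seed}'. "
        "One per line, no numbering."
    )
    resp = openai.Completion.create(
        model="text-davinci-003", prompt=prompt, max_tokens=200, temperature=0.8
    )
    return [line.strip("- ").strip()
            for line in resp.choices[0].text.splitlines() if line.strip()]

nlu = {
    "version": "3.1",
    "nlu": [{
        "intent": "ask_opening_hours",   # invented example intent
        "examples": "\n".join(f"- {p}" for p in paraphrases("When are you open?")),
    }],
}
with open("data/nlu.yml", "w") as f:
    yaml.safe_dump(nlu, f, sort_keys=False, allow_unicode=True)
```

A human still has to review the generated utterances, but it takes a lot of the drudgery out of authoring training data.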
Keep up the good work. I studied computational linguistics in the nineties in the context of cognitive science and transformational grammar, which makes me skeptical of end-to-end models; they have virtually no explanatory value for me as a linguist.
Werner Bogula, Artificial Intelligence Center, Hamburg - @Bogula
Actually, there have been attempts to use LLMs with codegen models to take action, call APIs etc. - https://twitter.com/sergeykarayev/status/1569377881440276481
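The pattern in those demos is roughly: have the model emit a structured action and let ordinary code execute it, something like this sketch (the action schema and endpoints are invented for illustration):

```python
# Rough sketch of "LLM emits an action, code executes it"; schema/endpoints invented.
import json
import openai
import requests

def act(request: str):
    prompt = (
        "Turn the user's request into JSON of the form "
        '{"action": "get_weather" or "add_todo", "args": {...}}.\n'
        f"Request: {request}\nJSON:"
    )
    resp = openai.Completion.create(
        model="text-davinci-003", prompt=prompt, max_tokens=80, temperature=0
    )
    call = json.loads(resp.choices[0].text.strip())
    if call["action"] == "get_weather":
        return requests.get("https://example.com/weather", params=call["args"]).json()
    if call["action"] == "add_todo":
        return requests.post("https://example.com/todos", json=call["args"]).status_code
    raise ValueError(f"unknown action: {call['action']}")
```

Whether that dispatch step can be made reliable enough is, of course, the open question.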
I believe that language models are semantic relations among symbols. I know for certain that human language hardware, learning algorithms, and live conversation have always been based on neuromechanical vibrations (sound, cadence, emotion) operating thousands of times faster. It's hard to imagine a bigger discrepancy. Is that explanation in play?
"spit our" -> "spit out"
But did you actually read about the Alexa prize competition? It's just not true that Amazon isn't doing anything in regards to conversational AI.
Sure, but they obviously weren’t satisfied; I gave my guesses as to why
On point 5 -- these guys used DALL-E to guide a robot into setting a table correctly, "by first inferring a text description of those objects, then generating an image representing a natural, human-like arrangement of those objects, and finally physically arranging the objects according to that image." https://www.robot-learning.uk/dall-e-bot
Isn't that "using [an LLM] sentence to control stuff"?
that's a clever paper (& good question); I had in mind control via linguistic instructions, as in the PaLM-SayCan paper from Google. Using DALL-E in image recognition has potential; I could see some issues but it might prove pretty useful on the vision side. I surely would *not* use DALL-E for interpreting the linguistic side, per https://arxiv.org/abs/2210.12889
Why would they want assistants to have deep conversations? In the end, current voice assistants are just fancy user interfaces for Amazon, Google, and some smart toys in your home; I don't see why people would want to have deep conversations with them or how that would improve them. Additionally, (some?) people are already worried about the privacy implications of voice assistants; having them ask you about your day would not help.