21 Comments

I'm shocked, totally SHOCKED!! that Microsoft released a product containing bugs and security issues. Who knew this could ever happen???


“What’s even more disturbing is that Bing makes it look like the false narrative that it generates is referenced.”

Did you check the references? Were they real or conjured? Did they actually support what the bot wrote? I’ve seen many things written by humans that had plenty of references, but the references bore no relation to the topic. On occasion a reference might even contradict what it supposedly supported.

Recently I got into a discussion with someone who was surprised that I didn’t support the idea of extending Medicaid to everyone. He considered that my opposition to it was counterproductive to society. He told me that this had been modeled mathematically and shown to increase productivity. I asked him where he had read it; he promised to send me the article.

Which he did. It was written in the expected word-salad mode, but the one hopefully redeeming feature was a flow diagram that was supposed to show how medical care fit into the greater scheme of the article's thesis. It took me about 30 minutes to puzzle my way through it, but I finally did.

And ya know what? The number of times medical care of any kind made it into the calculations was ... wait for it... zero. Nowhere in the calculations was there anything even related to medical care. I pointed this out to the guy who sent it to me. Unsurprisingly, he didn’t reply to the email.

Bottom line: the devil’s in the details. So check the details.


As Yann LeCun, Chief AI Scientist at Meta, pointed out, human-feedback training intended to put up "guardrails" may help some:

https://twitter.com/ylecun/status/1630615094944997376

"But the distribution of questions has a very, very long tail. So HF alone will mitigate but not fix the problems."

As a data scientist notes:

https://medium.com/@colin.fraser/chatgpt-automatic-expensive-bs-at-scale-a113692b13d5

"This is an infinite game of whack-a-mole. There are more ways to be sexist than OpenAI or anyone else can possibly come up with fine-tuning demonstrations to counteract. I would put forth a conjecture: any sufficiently large language model can be cajoled into saying anything that you want it to, simply by providing the right input text.....

There are a few other interesting consequences of tuning. One is what is referred to as the “alignment tax”, which is the observation that tuning a model causes it to perform more poorly on some benchmark tasks than the un-tuned model. "
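As a toy illustration of the whack-a-mole dynamic the data scientist describes, here is a minimal sketch (the blocked patterns and the paraphrase are invented, and a keyword blocklist is only a crude stand-in for fine-tuning demonstrations, but the failure mode is analogous):

```python
import re

# A hand-maintained list of "bad" patterns: every new phrasing needs a new rule.
BLOCKED_PATTERNS = [
    r"\bwomen are worse at\b",
    r"\bwomen can't do\b",
]

def passes_guardrail(text: str) -> bool:
    """Return True if no blocked pattern matches (case-insensitive)."""
    return not any(re.search(p, text, re.IGNORECASE) for p in BLOCKED_PATTERNS)

print(passes_guardrail("Women are worse at math."))                   # False: caught by the list
print(passes_guardrail("Studies show the fairer sex lags at math."))  # True: a paraphrase slips through
```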

Humans wish to use these tools for creative tasks. If you cripple them so that they cannot imagine things some people find offensive, it seems likely you'll cripple them in other ways as well.

Fairly recently there was a controversy at Stanford over a photo of a student choosing to read Mein Kampf, since many thought it inappropriate to ever dream of doing such a thing. Others with more critical thinking skills and imagination grasped the idea that it can be useful to "know your enemy" and to understand how people with problematic ideas think, in order to try to persuade them to change their views. The ACLU used to spread the idea that the remedy to bad speech is more good speech that counters the bad speech. To create that good speech, you need to see the bad speech and understand it.

One way to do so, if you don't happen to have a controversial speaker willing to engage with you, is to have an LLM use what is implicitly embodied in its training corpus to generate the sort of speech such people might come up with, and then consider how to deal with it. Unless, of course, it's muzzled by people who don't seem to have thought or read much about the history of free speech and attempts to limit it, or considered the potential unintended consequences of doing so.

It seems rather problematic to try to prevent an AI from ever being able to generate what some consider "bad speech". It's especially problematic when people won't always agree, as with the recent controversy over the COVID lab-leak issue, which many in early 2020 considered something no one should be allowed to talk about or even consider.

Humans can generate misinformation too. AIs can then also help filter information.

Perhaps training a separate "censor/sensitivity reader" AI to filter the outputs made public by the main LLM would be the answer. Ideally people could choose whether to enable the censor or to be treated like adults, able to evaluate information on their own. Unfortunately some authoritarians would like to use the regulatory process to impose their worldview on AI, and indirectly on the rest of the populace. George Orwell wrote about that in a book that was meant to be a warning, not a how-to guide.
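A minimal sketch of what that optional censor layer could look like (the function names and the scoring are hypothetical stand-ins; a real "sensitivity reader" would be a separately trained model):

```python
def generate(prompt: str) -> str:
    """Stand-in for the main LLM (hypothetical)."""
    return f"(model output for: {prompt})"

def sensitivity_score(text: str) -> float:
    """Hypothetical stand-in for a separately trained 'sensitivity reader' model."""
    return 0.1  # a real classifier would actually score the text

def respond(prompt: str, use_filter: bool, threshold: float = 0.8) -> str:
    """Users choose whether the optional censor layer is applied to the raw output."""
    raw = generate(prompt)
    if use_filter and sensitivity_score(raw) >= threshold:
        return "[withheld by the optional filter]"
    return raw

print(respond("some prompt", use_filter=False))  # unfiltered, for readers who opt out
print(respond("some prompt", use_filter=True))   # the same output, passed through the censor
```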


They could - but will they?

What if the Wrights had tried to learn to fly with an airplane that required a mattress to protect the pilot?

For exactly the reasons you state - I would like an LLM that is a real tool - not one that is "aligned" to the developer's ethics.

But Mr. Marcus's comments about 'volume' and 'at scale' really resonate with me. Are we learning to fly without figuring out how to land?


In terms of the issue of things being done "at scale": there are ways to limit the influence of bots in general, whether they use sophisticated AI-generated content or whatever they've used for the last ten years. The quality of bot-generated content may have changed, so it is harder to detect, but many countermeasures don't depend on detecting the content at all.

There are ways to limit the number of accounts an individual human has on a system. They might still post bot-generated content from their limited number of accounts, but if one human can't have 10,000 accounts, that caps the scale of what they can do with bots. There are also ways to flag clusters of web pages that are interlinked only among themselves as likely bot-generated, since nobody outside the cluster links to them.
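A rough sketch of that cluster heuristic, assuming the link graph has already been crawled and candidate clusters come from some separate grouping step (the pages and links are invented; networkx is used for convenience):

```python
import networkx as nx

# Toy directed link graph between pages (invented).
G = nx.DiGraph([
    ("A", "B"), ("B", "C"), ("C", "A"),                      # pages that only cite each other
    ("D", "E"), ("E", "D"), ("news1", "D"), ("blog2", "E"),  # pages that outsiders also link to
])

def looks_bot_generated(cluster: set[str]) -> bool:
    """Flag a cluster that receives no inbound links from pages outside itself."""
    inbound = [(u, v) for u, v in G.in_edges(cluster) if u not in cluster]
    return len(inbound) == 0

print(looks_bot_generated({"A", "B", "C"}))  # True: nobody outside the cluster links in
print(looks_bot_generated({"D", "E"}))       # False: outside pages endorse it with links
```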

I think the issue of "scale" can at least partly be kept separate from the issue of what a bot is allowed to generate.

The "scale" issue in some ways parallels the issue of computer viruses being a war between those trying to create them and those guarding against it. Its a war between those trying to find way to create fake things at scale vs. those trying to prevent fake things at scale. One example comes to mind off the top of my head. If humans are allowed N accounts on a site (e.g. a real name twitter and a few pen names) and don't use them all: hackers might try to reach scale if they get lists of passwords to hack into accounts and use up the unused accounts and hope the humans don't notice they've been hacked. However you can try to ensure humans know when new accounts are created for them and approve them, with the bot folks trying to intercept such notifications somehow. It seems likely useful methods can limit the viability of such approaches to scale.

To me it seems like the dangers of "scale" shouldn't influence what we allow AI to generate.


Duh - my last comment - ouch. Please let me back up and try again.

What do you do when your AI disagrees with you? Apparently the first answer is "guardrails" and "whack-a-mole" alignment.

But, tuning or aligning an AI program defeats the fundamental premise of AI programming. You don't write the program - you program the machine to learn patterns from the data.

You load enough of that junk in there and it ruins the Agent - but it doesn't disagree with you any more.

Now, the developer could ship the product without all that garbage - but I'll bet they won't. You can guess why.

But let's say we do get an LLM without the muck.

I think the bot-garbage problem could be solved by resolving the ID problem: once the source is known, you attach a hash (or better, a signature) to anything that source produces.
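A minimal sketch of that "known source plus hash" idea, here using a digital signature via the third-party `cryptography` package (the keys and the message are invented; a production system would also need a real identity layer):

```python
# Uses the third-party `cryptography` package (pip install cryptography).
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# The "known source" holds a private key; its public key is published alongside its content.
source_key = Ed25519PrivateKey.generate()
public_key = source_key.public_key()

post = b"Text produced by this identified source."
signature = source_key.sign(post)   # attached to anything the source publishes

# Anyone can later check that the post really came from the holder of that key.
try:
    public_key.verify(signature, post)
    print("provenance verified")
except InvalidSignature:
    print("not from the claimed source, or tampered with")
```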

Maybe this is too naive - but at some point the internet has to resolve the bad actor problem - especially with the deep fake garbage coming on strong.

Thanks for your reply. I did study your original post.

Hope this expands the conversation.


I think you have a major misunderstanding of what LLMs are and what this "tuning" and "aligning" is.

An LLM exists at all in the first place because it's been "tuned" and "aligned": the code was written and the training texts were chosen from the start to make it produce certain desired responses. An LLM that produces gibberish as output is just as much an LLM as one that produces text indistinguishable from that written by a real human writer. It's just that the former is an undesired response and the latter a desired response, from the LLM authors' point of view.

The new "guardrails" are merely more programming (and possibly more training) to remove yet another undesired response that it's producing. That you agree with the LLM creators that "Colorless green ideas sleep furiously" is an undesired response but disagree that "Trump lost the 2020 election" is an undesired response is simply your own value judgement: you differ from the LLM designers in what you want from the LLM.

You are correct that deep learning systems learn patterns from data, but data are full of patterns, some of which are useful and some of which are useless, or even harmful. Amazon had to scrap its deep-learning resume-screening system because it "learned" that men were preferable to women for technical jobs. For someone hiring staff that's not a political-correctness problem: it's a problem of the data containing a pattern, correlated with whom they'd already hired, that had nothing to do with technical skill. (Even if they could have kept secret that that's what it was doing, to avoid bad publicity, they still wouldn't have wanted to use it as a criterion, because it works against their aim of selecting candidates based on technical skill.)
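A synthetic illustration of how that kind of pattern gets picked up (the data are invented and this is not Amazon's system; NumPy and scikit-learn are assumed available):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000

skill = rng.normal(size=n)              # the signal a hiring model should care about
proxy = rng.integers(0, 2, size=n)      # an irrelevant attribute (e.g. a gendered word on the resume)

# Historical decisions depended on skill AND, unfortunately, on the proxy.
hired = (skill + 1.5 * proxy + rng.normal(scale=0.5, size=n) > 1.0).astype(int)

model = LogisticRegression().fit(np.column_stack([skill, proxy]), hired)
print(model.coef_)  # the proxy gets a large weight: the model has absorbed the bias in the data
```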

Your comment about students choosing to read _Mein Kampf_ also leads me to believe that you may not understand that LLMs are not intelligent in any way, shape or form. They are merely pattern generators: Ian Bogost explained it well when he pointed out that "ChatGPT doesn’t actually know anything—instead, it outputs compositions that simulate knowledge through persuasive structure." They do not produce ideas: they generate text that looks similar to the human-written text they've been trained on. Whether or not this text contains any useful information or ideas (much less ideas worth discussing) is essentially random.
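A tiny bigram generator makes the same point at toy scale (the corpus is invented): it mimics surface patterns in its training text without representing anything it "knows".

```python
import random
from collections import defaultdict

corpus = ("the model generates text that looks like the training text "
          "the model does not know what the text means").split()

# Count which word follows which: pure surface statistics, no meaning attached.
follows = defaultdict(list)
for a, b in zip(corpus, corpus[1:]):
    follows[a].append(b)

random.seed(1)
word, output = "the", ["the"]
for _ in range(12):
    candidates = follows[word]
    word = random.choice(candidates) if candidates else random.choice(corpus)
    output.append(word)
print(" ".join(output))  # fluent-looking but content-free
```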

A lot of folks are trying to sell the idea that LLMs somehow bring us closer to AGI (Artificial General Intelligence), but I can't see how they do. They _look_ really impressive, which certainly helps sell them, but how is a product that has _no_ capability for reasoning of any kind, and no obvious way to add it, helpful for AGI, except as an especially effective way of pointing out why the Turing test isn't actually useful?

A lot of AI researchers (especially in the commercial sector) seem to be repeating the same mistake made time and time again over the last sixty years by people overselling "AI": misunderstanding (or not explaining) what their particular technology really does and far overselling what they might actually achieve with it. I predict that when the dust settles we're going to see yet another "AI winter" for this particular technology, and it will end up in the same place as symbolic AI, expert systems, previous forms of machine learning and various other technologies: something we find useful without any longer considering it to be AI.

There's a clear line of evolution from MACSYMA to Wolfram Alpha along which both its capabilities massively increased and it went from "AI" to "calculator." I see nothing to suggest that LLMs won't go the same way.


I'm in favor of just removing the guard rails layered above the actual LLM so that everyone can see what is really going on down there. At least for a while.


Am I the only one who notices the similarity between 'adding guardrails' and the default IT pattern (also visible in the previous, symbolic AI wave) in which brittleness is fought by adding extra rules, making the tool itself ever more unwieldy in the end? It is also visible in other disciplines, where frameworks (like SAFe in management) tend to grow in size, trying to handle ever more exceptions and boundary cases until they collapse under their own weight.

All digital AI approaches, LLMs included, are 'data-driven rule-based systems in disguise'. And as they are rule-based, they are brittle. And they do not scale when they have to handle something in the real world (like these conversations).

I am convinced this is the case, so I bend all the facts to fit that conviction, like any human does ;-)

author

In the early '90s Reuters had an 'indexing' AI app (it linked stories to a thesaurus of terms, so users could search on those terms), written by CMU in LISP, that cost millions to maintain. I worked for a language technology firm that in the 1980s had created a symbolic AI translation system, which I once saw described in a book as 'the only one of all the failures that was properly documented' (and to which, therefore, the standard excuse "some more work is needed to make it work" would not apply).

The company took its first steps into statistics in the early 1990s. One of the things created was a word-statistics-based indexing system, built with standard Unix tools like sed, awk and grep, that performed as well as (if not better than) the million-dollar LISP system by CMU (we were able to do the actual comparison) at a fraction of the cost. It went into production at a national newspaper around 1994. It worked so well that the paper asked if they could fire their human indexers (for whom the system worked as a 'suggestion' tool, greatly speeding up the process), but we warned them it would derail quickly if they did not keep humans in place. Fast forward 30 years...
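In modern terms, that kind of word-statistics indexer is essentially a term-weighting suggester. A rough sketch of the idea (the thesaurus and stories here are invented, and this is Python rather than the original sed/awk/grep pipeline):

```python
import math
from collections import Counter

# Invented thesaurus of indexing terms and a few already-indexed stories.
THESAURUS = {"inflation", "interest", "rates", "bank", "football", "hiring"}
stories = [
    "central bank raises interest rates to curb inflation",
    "bank announces new branch openings and hiring",
    "football club wins league after late goal",
]

def tf_idf(story_words: Counter, term: str) -> float:
    """Weight a term by its frequency in the story, discounted by how common it is overall."""
    doc_freq = sum(1 for s in stories if term in s.split())
    return story_words[term] * math.log(len(stories) / (1 + doc_freq))

new_story = "bank lifts interest rates again as inflation stays high"
words = Counter(new_story.split())
suggestions = sorted((t for t in words if t in THESAURUS),
                     key=lambda t: tf_idf(words, t), reverse=True)[:3]
print(suggestions)  # candidate index terms, for a human indexer to confirm
```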


You're not alone.

What you describe is what brought about the second AI Winter when people realized Expert Systems were a road to nowhere.


Could you please elaborate on the scale? Can you make any assessment about the impact this could have? Could you compare all that with what we have undergone so far without access to generative AIs?

Only then will I feel that, yeah, we need to stop these AIs from taking over the internet and society. Until then, I dare say I really don't get this FUD.


Here you go:

Acemoglu, Daron, Asuman Ozdaglar, and Ali ParandehGheibi. "Spread of (mis)information in social networks." Games and Economic Behavior 70.2 (2010): 194-227.

Learn something.

Now a question for you:

What is the definition of "intelligence?" You people keep using the word but you never state what it means.


Good question. No less a light than the great Marvin Minsky answered it this way. In college I signed up for a course in AI with him; we’re talking about around 1967, give or take. First lecture, he asks us to imagine some Martians coming to Earth. They see some humans. They go over to the humans and put their antennae on the sides of their heads to examine their brains. Then one says to another, “It’s kind of amazing that they could do such elaborate calculations with brains like this. But of course, they aren’t really intelligent.”

Bottom line: a slippery thing to define. Even Minsky didn’t do it.


How does that even respond to my question? Your comment lacks any elaboration on the scale (and by "scale" I mean the term as Marcus uses it); it lacks any assessment of the impact of generative AIs; and it lacks any comparison with "what we have undergone so far without access to generative AIs", even though it provides a paper that could serve as a point of comparison.


'Considered response' - how I define intelligence.


The spread of disinformation is a major issue; even NATO is warning of cognitive warfare in which disinformation campaigns play a key role. We are an AI startup from Norway, and one of our proprietary ML models is actually for fact-checking, especially of ChatGPT-generated content. We are currently testing with more users, so feel free to try it! https://www.youtube.com/watch?v=I17q-pPhyf0 Editor.factiverse.no - works best in Chrome on desktop.


Intelligence doesn't need to be computational in the algorithmic sense. But it's always a response to some form of consideration. Ice melting would be one example of an unconsidered response, so it doesn't count as intelligent behavior.

PS: this is in 'response' to A Thornton's question :)


Not guardrails, but a Band-Aid on a gaping, festering wound.

They opened the can of worms; now they need to deal with it. It's going to be interesting to see how.

Comment deleted
author

this is silly. anyone can read anything and ignore words like “volume” and “at scale”
