Hi Gary, so true - where there is easy money to be made (ethics be damned!), there will be (are) scammers - using LLMs as the perfect scamplifiers that they are (fast, cheap, good).
Additionally, if people opt for Bard to summarize their Google search, this might add a second layer of BS coating. Google searchers will therefore face an ugly dichotomy: manually wade through possibly-BS results, or have them served up with a possible extra layer of BS on top!
Further - the BS will be scraped by next gen LLM producers, to get folded (baked?!) in.
None of this looks appealing to any search provider, or to us users.
“scamplifier”!!
Lol, my term (afaik - Google comes up empty-handed on a search) :)
Saty that's brilliant. Quick go trademark it :)
Thanks Birgitte!
I've often seen comments on social media justifying the misinformation along the way as a means of getting AI to where it should be.
We don't need to accept the cesspit of misinformation, toxicity, and bias, which are all due to a lack of 'understanding', when there is better science. https://john-at-pat.medium.com/the-new-nlu-industry-a318c6e138d1
Gary, I've been reading your essays and thoughts ever since I discovered your work. This one... where does one stop nodding one's head? I feel like a quantumly entangled bobblehead. No matter when you observe it, it will be nodding vigorously. :P
If we thought COVID was a pandemic to be feared and respected, the LLM virus our society faces now is unimaginably more virulent, pathological, and destructive. The LLM itself of course cannot be blamed, for it is a non-thinking, non-living, non-sentient technology; its impacts and utilization are on the human species, which is capable of extraordinary feats of creativity, intelligence, and scientific thought, along with extraordinary feats of parasitic deception, toxicity, and greed. Until and unless the technologists come to terms with how their brilliant work is more than likely to be used by the authors of the latter type of "achievement", this dynamic will not change, but merely shift. We've seen it throughout history... same pattern, different tools.
I sometimes miss the kind of discussions I used to find in books about good old-fashioned AI, like the AI book by Norvig. Today, meaningful discussions of fundamentals are overshadowed by buzzwords, by everyone telling us how the current new thing is going to transform the world and what it means for business, and by calls to invest in it.
The probable answer to your question to Yann LeCun is "no, we don't know". Even with Facebook's resources, when I was there two years ago there were similar questions that could and perhaps should have been answered, but weren't. In some cases the issue was simply finding the resources to do the data collection. In others, we knew enough to make an informed guess at the answer, but it wasn't clear that there would be positive business impact from knowing more than that. This is especially true for material that is subject to government regulations. Typically, the requirement in the regulation is to use your best effort given the information you have. If regulation-violating things are happening, and a measurement would clearly demonstrate this, but you don't have the ability or resources to fix the problem, it is sometimes better not to know, at least in the short term.
I urged him to simply say “We don’t know” and he wouldn’t, because he was (then, before he pivoted) committed to saying, come hell or high water, there was no conceivable risk.
What I said was that the measurement which Gary asked Yann LeCun for is similar to (unspecified) other measurements that Facebook could, in the past, have made, but didn't. Decisions about whether to make these measurements are influenced by the business needs and available resources of the company, not just the interest of the measurement. Sorry not to be more specific, but all the information that I have is at least two years out of date, and commercially sensitive anyway. I will say that decisions about misinformation, nasty content and politics are extremely difficult, important and complex.
Whoa. Back to the library card catalog. If that's even a safe option. Maybe the University of California can bring back MELVYL, air-gapped.
Luckily, my imaginary startup has developed a solution to all the problems associated with emerging AI technologies. Yup, we fixed it all!
The key to our solution to all AI problems was the decision to reboot long-ignored 1940s technology which the big players in the tech industry have forgotten all about. I'm not supposed to reveal this until after we go public, so don't tell anybody except those people on the Internet please.
Just between you and me, we're calling our coming breakthrough product GNW!, otherwise known as Global Nuclear War. GNW! will solve all outstanding technical issues with AI, genetic engineering, and pretty much every other technology you can think of.
Reading this reminds me of the immortal Shakespeare line, by a witch from the Scottish play:
"By the pricking of my thumbs, something wicked this way comes!"
This sounds exactly like what ChatGPT could do to online information and truth-searching, so to speak.
Can we crowdsource trust?
Has work been done on validating users? Trusted users validating (to the best of their knowledge) websites, and each other?
Maybe it's not possible - but it appears that something will have to be done - before we are all in the sewer.
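To make the idea concrete, here's a minimal sketch in Python of how reputation-weighted vouching for websites might be scored. The names, votes, and weights are all invented for illustration, not any real system:

```python
# Minimal sketch of crowdsourced trust, assuming a hypothetical scheme where
# vetted users vouch for (or flag) websites and earn reputation from peers.
# All names, votes, and weights below are invented for illustration.
from collections import defaultdict

# (user, website, vote): +1 means "I trust this site", -1 means "I don't".
site_votes = [
    ("alice", "example-news.com", +1),
    ("bob",   "example-news.com", +1),
    ("carol", "spamfarm.example", -1),
]

# Hypothetical per-user reputation (e.g. earned via peer vouching); default 1.0.
reputation = defaultdict(lambda: 1.0, {"alice": 2.0, "bob": 1.5})

def site_trust(votes):
    """Reputation-weighted average vote per site, ranging from -1 to +1."""
    totals, weights = defaultdict(float), defaultdict(float)
    for user, site, vote in votes:
        totals[site] += reputation[user] * vote
        weights[site] += reputation[user]
    return {site: totals[site] / weights[site] for site in totals}

print(site_trust(site_votes))
# {'example-news.com': 1.0, 'spamfarm.example': -1.0}
```

The hard part, of course, is bootstrapping who counts as a trusted user in the first place; the sketch just assumes that problem away.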
The SEO-spam industry has been using much more primitive "content spinners" for several years. The current model is to pay people (usually in developing nations where the rates are lower) to write an original article on a topic related to keywords they think are undervalued in the search market, then run them through content spinners to generate variations for several other "blogs". They're already in position to rapidly scale up their activities based on something like ChatGPT.
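For anyone unfamiliar with the mechanics, here's a toy sketch of what a "content spinner" does. The synonym table and seed article are invented for illustration; real spinners use far larger dictionaries (and now, plausibly, LLMs):

```python
# Toy illustration of the "content spinner" workflow described above: one seed
# article is mechanically rewritten into near-duplicate variants by swapping
# synonyms, so a single paid article can populate several "blogs".
import random

SYNONYMS = {  # invented synonym table for illustration
    "best": ["top", "leading", "finest"],
    "cheap": ["affordable", "budget", "low-cost"],
    "guide": ["overview", "handbook", "primer"],
}

def spin(text: str, rng: random.Random) -> str:
    """Return a variant of `text` with known words replaced by random synonyms."""
    return " ".join(rng.choice(SYNONYMS.get(w.lower(), [w])) for w in text.split())

seed_article = "best cheap laptops buying guide"
rng = random.Random(42)
for _ in range(3):  # one paid article becomes several near-duplicate variants
    print(spin(seed_article, rng))
```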
Indeed they probably already are; just hard to prove…
Our ability to create content, of whatever truthiness and accuracy, is vastly greater than our ability to qualify and vet it.
In about a year, Google may lose all usefulness and credibility to an avalanche of ChatGPT-and-the-like spam. Derek Lowe, in his "In the Pipeline" blog at Science, noted a deluge of bogus papers in the field of Organometallics, and that was before ChatGPT.
So we will need systems from Google or whoever that flag suspected content-farm BS (possibly helped by LLM summaries and comparisons against previously flagged BS websites) and keep an index of this crap, so that the search engines can filter it out or push it much lower in the results.
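Something along these lines, purely as a sketch: the similarity test, threshold, and penalty below are illustrative assumptions, not how Google or any real search engine actually ranks:

```python
# Rough sketch of the suggested filter: compare pages against snippets from
# previously flagged content-farm sites and push close matches far down the
# ranking. Similarity measure, threshold, and penalty are illustrative only.
from difflib import SequenceMatcher

# Hypothetical snippets from sites already flagged as content-farm BS.
FLAGGED_SNIPPETS = [
    "ten amazing tricks doctors don't want you to know",
    "this one weird supplement melts fat overnight",
]

def farm_score(page_text: str) -> float:
    """Highest similarity (0..1) between the page and any flagged snippet."""
    text = page_text.lower()
    return max(SequenceMatcher(None, text, s).ratio() for s in FLAGGED_SNIPPETS)

def rerank(results, penalty=0.5, threshold=0.6):
    """Demote suspected content-farm pages by scaling down their search score."""
    adjusted = [
        (text, score * penalty if farm_score(text) > threshold else score)
        for text, score in results
    ]
    return sorted(adjusted, key=lambda pair: pair[1], reverse=True)

results = [
    ("peer-reviewed study on supplement efficacy", 0.80),
    ("this one weird supplement melts fat overnight!!!", 0.90),
]
print(rerank(results))  # the spammy page drops below the legitimate one
```

The obvious weakness is that LLM-generated spam won't be a near-duplicate of anything already flagged, so any real system would need much smarter signals than string similarity; the sketch only shows where such a filter would sit in the pipeline.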