Really significant finding. Purely data-driven models will never solve these root challenges.
There was a book out there, Weapons of Math Destruction, which documented all of this for the forerunners of LLMs. It's just data + statistics; it comes back in all kinds of disguises, but the paradigm is the same.
Humans are perfectly capable of recognising that something that is statistically overrepresented from *our* perspective doesn't necessarily describe the statistical reality of the world at large. Of course, we fail at this often, but as this paper shows, LLMs seem to fail at this more often than not, because their entire conception of the "world at large" is derived from the statistical picture presented to them by the data they're trained on.
This paper gives excellent proof that deferring decision-making to LLMs is an abrogation of responsibility. It should not be done.
And yes, I agree, makers of these LLMs should withdraw their products, or clearly label that, far from being "almost AGI", these products possess nothing of the kind of intelligence their marketing would have you believe.
I can't wait to watch Andrew Ng try to talk himself in circles on this one. I mean I recall a 2023 paper that found that ChatGPT reproduces societal biases. I agree with you that this cannot stand and that we as the general public deserve better.
I'm not shocked. It's a language model after all. Chalk up one more limitation to people not understanding how bias in AI and ML actually has three distinct layers.
1. Cultural Bias
2. Data Bias
3. Algorithmic Bias... yes, AI is by default a bias we place on the other two.
https://www.polymathicbeing.com/p/eliminating-bias-in-aiml
I'm very happy that there are smart people doing this important research; however, I can't help but think that results like this should be completely unsurprising. The models are simply reflecting the associations found in the training data. I assume similar unwanted correlations are found in many other areas. Try the same approach for questions that use language more associated with the questioner's sex, religion, or national origin, just to name a few. It seems like band-aid solutions could be found for each of these issues (maybe preprocess certain questions to preserve meaning but remove dialect that implies race, sex, etc.), but how far down does this problem go? It's just so very easy to believe that any imposed solution that tries to filter out this "bias" is fighting a losing battle against the very data the models are built on.
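The preprocessing band-aid suggested above could look something like this minimal sketch. The mapping table and function name are hypothetical, and the handful of substitutions is a toy sample; a real system would need far broader linguistic coverage, and the rewriting itself raises its own ethical questions.

```python
import re

# Hypothetical toy mapping of a few dialect-marked tokens to
# Standard American English equivalents. A real normalizer would
# need far more coverage than this illustrative sample.
DIALECT_TO_STANDARD = {
    r"\bain't\b": "isn't",
    r"\bfinna\b": "about to",
    r"\bgonna\b": "going to",
}

def normalize_prompt(text: str) -> str:
    """Rewrite a few dialect-marked tokens before the prompt reaches the model."""
    for pattern, replacement in DIALECT_TO_STANDARD.items():
        text = re.sub(pattern, replacement, text, flags=re.IGNORECASE)
    return text
```

Even this trivial sketch shows the problem the comment raises: each rule only masks one surface marker, while the model's learned associations remain untouched underneath.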
I don't think these problems can be solved with the current technology. We need systems with much more large-scale feedback, and probably multiple independent systems which can evaluate competing priorities and considerations, including symbolic AI systems that can provide hard checks against reality, something completely inaccessible to LLMs. All they have is probabilities and what people say and write, a far cry from reality.
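The "hard checks against reality" idea could be sketched, in its simplest form, as a deterministic layer that recomputes a model's claim before accepting it. All names here are illustrative, and arithmetic is only the easiest case; the commenter's point is that richer symbolic checks are exactly what probability-only systems lack.

```python
import ast
import operator

# Whitelisted arithmetic operators for a safe, deterministic evaluator.
SAFE_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
            ast.Mult: operator.mul, ast.Div: operator.truediv}

def safe_eval(expr: str) -> float:
    """Evaluate a simple arithmetic expression without using eval()."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in SAFE_OPS:
            return SAFE_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval"))

def check_model_claim(expression: str, claimed_value: float,
                      tol: float = 1e-9) -> bool:
    """Accept a model's numeric answer only if it matches a recomputation."""
    return abs(safe_eval(expression) - claimed_value) <= tol
```

A pipeline built this way would route any claim the symbolic layer can verify through the checker, and treat a mismatch as grounds to reject the model's output rather than pass it downstream.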
This is actually quite shocking.
This can use more exposure. I've added my own (with a thanks to you). https://ea.rna.nl/2024/03/07/aint-no-lie-the-unsolvable-prejudice-problem-in-chatgpt-and-friends/
If you aren’t familiar, please look for Erin Reddick. She is the founder of ChatBlackGPT which is currently in Beta.
Here is her LinkedIn
https://www.linkedin.com/in/erinreddick?utm_source=share&utm_campaign=share_via&utm_content=profile&utm_medium=ios_app
Holy cow! That's awful, and there is no fix for it within current LLM tech.
An LLM by its very nature cannot be the foundation of any self-respecting production system that requires reliability and transparency. Recall would not be an issue if it were constrained to being a curiosity item for poem writing and such. Outside of that, recall is quite appropriate to prevent harm. There is no shame in claiming one's core product is not LLM-based; in fact, it should be an honor.
I'll sign, of course.
Gary, it's most pleasing to read a takedown without lip service to AGI or super AI or the like. Thank you. Even lip service feeds the enemy.
This has been a recurring theme for a while. The book I read on the topic was published in 2016 (Cathy O'Neil's "Weapons of Math Destruction"), and not much has changed since then. Gemini's infamous "Woke AI" was really just an ugly plaster over a very real hole in the load-bearing walls of LLMs.
Personally, I can live with this. The models are quite useful if you bear these weaknesses in mind and account for them... but too many people don't realize this is going to be a problem that won't be solved at scale.
"Don't become a statistic", as the old saying goes, speaks to the injustice of simply looking at statistics to make a decision that required a more thorough stakeholder analysis. How about a new saying for decision makers and app developers: "Don't become a stochastic parrot"!
What is missing from the analysis is how other forms of non-standard American English are treated. I don't think English as spoken by much of the country will fare much better. People don't create written text in the same manner as they speak. Even so, people with more "prestigious" careers have generally had to learn to write in correct Standard American English as part of their education (I've had science instructors with English minors ensuring that), with the result that correct English is statistically associated with those careers.
Given the assumption that African American English is more of a spoken dialect than written (show me the English courses requiring students to write in AAE), a better comparison would be with other spoken dialects, e.g., Northeast, Midwest, South, or more particular groupings.