"It's not easy to stand apart from mass hysteria" - Lewis, M. 2010. The Big Short. Penguin.
I’ve put together a timeline of the BS that OpenAI has served the world to date. It’s an eye-opening resource for anyone interested in facts over hype.
https://ai-cosmos.hashnode.dev/the-transformer-rebranding-from-language-model-to-ai-intelligence
The transformer was a twist on RNNs (which were already 20-30 years old at that point). It changed the existing architecture by replacing state-passing between NN runs, plus attention to earlier states (attention itself being a rather old idea), with attention over the context instead. That was an inspired engineering insight, but not a fundamental change.
What transformers enabled was basically one thing: massive parallelism during training, because the serial dependency between runs was gone. That enabled massive growth of the until-then relatively puny RNN-style models. Enough to get output with perfect grammar. Not enough (nor will scale ever bring this) to get output from actual 'understanding'.
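To make the parallelism point concrete, here is a minimal sketch (toy sizes, NumPy only, nothing here is a faithful transformer): the RNN forward pass cannot escape its time loop, because each hidden state depends on the previous one, while self-attention covers all positions in one matrix product.

```python
import numpy as np

# RNN: each step needs the previous hidden state, so the T steps cannot run at once.
def rnn_forward(x, W_h, W_x):
    T, d = x.shape
    h = np.zeros(d)
    states = []
    for t in range(T):                      # serial dependency: h[t] needs h[t-1]
        h = np.tanh(W_h @ h + W_x @ x[t])
        states.append(h)
    return np.stack(states)

# Self-attention: every position attends to the whole context in one matrix
# product, so all T positions are computed in parallel on the hardware.
def attention_forward(x, W_q, W_k, W_v):
    Q, K, V = x @ W_q, x @ W_k, x @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[1])  # one T x T computation, no time loop
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
T, d = 8, 4
x = rng.normal(size=(T, d))
print(rnn_forward(x, rng.normal(size=(d, d)), rng.normal(size=(d, d))).shape)
print(attention_forward(x, *(rng.normal(size=(d, d)) for _ in range(3))).shape)
```

That time loop is exactly what kept RNN training from using the hardware's parallelism; removing it is what let the models grow.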
Thanks for doing this. "PhD-level intelligence" is an interesting bar. Paper submissions by eminent PhDs get reviewed, and they make more reasoning errors than one might think. Political discourse and election returns suggest we reason less and regurgitate more than we realize. Why does almost everyone worldwide end up with the religion of their community, despite easy access to alternatives? Human intelligence may be a somewhat lower bar than what we are demanding. "Confabulating" was not originally a technical term. That said, reporting "fig" as a four-letter name is weak.
The Internet hype (which, lest we forget, was at times as crazy as this one) took about 5-6 years (1994-2000). After that we got a huge correction (but the internet stayed). GenAI will stay too, though most likely at nowhere near the valuation it is now given. While GPT is roughly 5 years old, ChatGPT-*fever* is only 2 years old. It might easily take a few more years for the correction to happen. After all, a lot of bullshit is reported *about* the models *by humans* too. And human convictions change slowly (which is to be expected from a biological and evolutionary standpoint).
The biggest problem with calling ChatGPT and friends Large Language Models is that they aren't language models at all. There is nothing resembling 'language' in the models. It is 'plausible token selection'. A better name is "Language Approximation Model". And good grammar simply is easier to approximate from token statistics than good semantics.
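To make 'plausible token selection' concrete, here is a minimal sketch of the principle: a bigram model trained on a toy corpus. The corpus and everything else here are illustrative; real LLMs are vastly larger and subtler, but the principle is the same: pick a statistically likely continuation, with no model of meaning anywhere.

```python
import random
from collections import defaultdict

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

follows = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev].append(nxt)          # record which tokens follow which

def generate(start, n=8):
    out = [start]
    for _ in range(n):
        candidates = follows.get(out[-1])
        if not candidates:
            break
        out.append(random.choice(candidates))  # plausible, not 'understood'
    return " ".join(out)

print(generate("the"))  # e.g. "the cat sat on the rug . the dog"
```

The output is grammatical-looking precisely because grammar lives in the token statistics; whether the cat was actually on the rug is a question the model cannot even represent.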
The relation between token distributions and language is not unlike the relation between 'ink distribution on paper' and language.
Both successful (non-bullshit, non-'hallucinating') and failed (bullshit, hallucinating) approximations are correct results from the standpoint of 'plausible next token selection'. Hence, LLMs do not make errors; even a 'hallucination' or 'bullshit' is exactly what must be expected, given how they work. Labeling these 'errors' implicitly suggests that 'correct' (from a standpoint of understanding) is the norm.
But as there is no understanding, there also cannot be an error (of understanding).
These systems can show skills without intelligence. We're not used to that in humans, so we sloppily (because that is how our minds mostly work) mistake skills, like perfect grammar, for intelligence.
I’ve been baffled by how many people have said, ‘ChatGPT 3.5 is great, but just wait until ChatGPT 10 comes out… it will change the wOrLD, no humans needed anymore.’ You can’t simply extrapolate the current trend. The law of diminishing returns is real. This image explains it best:
https://preview.redd.it/8ggg9nwli2061.png?width=1080&crop=smart&auto=webp&s=d0a84def4c8ae19f356f400b2ab25487bc702c73
This is a well-written piece. "Truth" does not exist in the training sets, as far as I am aware!
In this context, truth is a label and we recently saw that any label, repeated long and often enough, will be received as truth by someone.
Yes, but to be fair, pre-processing and post-processing now filter out many confabulations, e.g. by adding a search process. In the early days, asking for a person's co-authors, it was not unusual to get people working in the same area but zero actual co-authors. Now it usually gets that right. Early GenAI tools failed abysmally on my test question, "Provide a list of fruit with four letters in their names." Now they only fail occasionally. ("Fig" often appears, including from Copilot a couple of minutes ago. "Strawberry" no longer comes up.) But used sensibly, the tools are good for search. What was a desktop online game popular in 1980? I can't recall, but if it says Rogue, I recognize it. When it cites decent sources you can check, a reference may not support the point (it confabulated), but more often it does, which is a good first step for the person searching. This is not really about the LLM; it asks how much a new Lenat-style AI undertaking is needed, and post-processing adds to the cost and the toll on electricity and groundwater. Would I pay as much for improved search as it costs to deliver? Maybe not.
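For contrast, here is how trivially the four-letter-fruit test falls to code that actually counts letters (the fruit list is illustrative); a symbolic program cannot report "fig", which is part of why the token-level failures are so telling.

```python
# A program that counts letters cannot confuse "fig" with a four-letter name.
fruits = ["fig", "kiwi", "pear", "lime", "plum", "apple", "mango", "strawberry"]
four_letter = [f for f in fruits if len(f) == 4]
print(four_letter)  # ['kiwi', 'pear', 'lime', 'plum']
```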
I find ChatGPT and Claude very good for search, for certain purposes. For most searches, the traditional search engines suit me better. But the AIs are very handy for summarizing a topic. They are also great for generating suggestions.
I cAn'T DO mATh! Numbers is too hard!
1, 10, 2, 5, 3, 7, 19, 20. There! All the numbers!
"I would bet you any sum of money you can get the hallucinations right down into the line of human-expert rate within months."
Is he still accepting this bet? If so, I'm in for $1G. On second thought, make it $1T!
I have been playing with Claude, writing psychiatric fitness-for-duty reports. It wouldn't give me the template unless I said (not proved) that I was a psychiatrist or psychologist, but that was better than ChatGPT, which didn't ask about credentials to write the report. Moreover, Claude prompted me not to use an employee's real name or date of birth in working with Claude, because that would be a privacy (HIPAA) violation. It suggested I use a pseudonym and placeholder dates instead, and put the identifying data only in my final report, which Claude would not see. Claude's template was better than the one I usually use and much better than the template ChatGPT offered. I was impressed by the format and the legal and privacy warnings in Claude. Enough so that I subscribed. But I will use it mostly for a kind of proofreading without the identifying data, and see if that speeds up my processing. I do not intend to use ChatGPT again without major improvements.
Yes, we've seen a lot of "engineering the hell out of it" (or around it).
Regarding 'search': OpenAI compared (as safety research) professionals and amateurs searching with and without GPT-4. It turned out the amateurs got slowed down (they went in unproductive directions because they did not recognise probable bullshit), but the professionals got sped up. Slightly.
Yes. I find that properly and precisely defining a role is critical in both models to obtain proper output.
There are hallucinations, and there are hallucinations. Yes, it still hallucinates, but the point is whether those outputs are absolutely useless, and the answer is no; at times they are genuinely suggestive. The question is whether you want a completely truth-telling AI or a useful one. Also, LLMs are already moving to the next architectural paradigm, and by the early looks of it, it's going to be a game changer.
Bullshit is not good but is it worse than typical human motivated reasoning?
I've agreed from the start that LLMs are not AGI and will not get us to AGI. Another approach is needed.
One must understand that the system level prompt is to be “a helpful AI assistant.” Wrapped into the word “helpful” is that it doesn’t say “no” or “I can’t.”
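As a minimal sketch of what "system level prompt" means in practice, here is the OpenAI-style chat API shape (the model name, prompt wording, and user question are illustrative, not what any vendor actually ships):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model name
    messages=[
        # The system message frames everything that follows; "helpful" here
        # biases the model away from answering "no" or "I can't".
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "Can you do X?"},
    ],
)
print(response.choices[0].message.content)
```

Every user turn is interpreted under that framing, which is why refusals and honest "I don't know" answers are fighting the instruction from the start.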
"At the end of the Ezra Klein interview, I called for a massive investment in new approaches to AI. I still think that is the way to go."
I would think that a fairly massive investment is underway in various skunkworks, but with no light yet at the end of any path conceived so far, we're not hearing about it.
Besides, right now the money is in the stock prices. Big tech is rich beyond reason without having to produce anything but Elon's "AGI by 2026" and the like.
I am all in favor of calling out dishonest marketing, but today is a day for gratitude. Is there truly nothing you would like to express gratitude for? It doesn’t even have to be directed at OpenAI. 🫣
This is Chomsky's poverty of the stimulus: there simply is not enough information in language alone to learn anything about the world. When you have a system that runs on a 9-volt battery and can learn any human language after a few months of natural exposure, it will not have these problems.
I just had an interesting discussion with Claude 3.5 Sonnet, which I initiated by asking if it was familiar with the work of David Hays (my teacher). It began by acknowledging that "this involves fairly obscure historical details from computational linguistics, I want to be careful about potential inaccuracies in my knowledge." It hallucinated the title of a 1960s book on machine translation, but acknowledged that it might not have gotten it right and urged me to check the citation. But it got some things right, mentioning that Hays had worked at RAND and that he had championed dependency grammar. We went on to have an interesting conversation involving Quillian, Schank and Abelson, the perceptual control theory of William Powers, the symbol grounding problem, and the conceptual spaces of Gärdenfors.
It occurred to me, ironically, that 3.5, or some later iteration, might be able to give useful advice about moving beyond LLMs, because it is working from a much larger information base than the current crew in Silly-con Valley.
Prompt, using 4o with web search enabled:
Write a negative review of Wicked based on what critiques actual reviewers or reviewing sites have made of the movie, and cite your sources.
The film adaptation of "Wicked," directed by Jon M. Chu, has garnered mixed reviews, with several critics highlighting significant shortcomings. A primary concern is the decision to split the narrative into two parts, resulting in a protracted runtime that some feel dilutes the story's impact. The New York Post notes that this choice "stretches the original two-and-a-half-hour musical into two movies of two hours and forty minutes each," leading to a "sluggish" pacing that leaves the audience awaiting a resolution not due until the sequel's release in 2025.
NEW YORK POST
Visually, the film has been criticized for its overuse of vibrant colors and a theatrical aesthetic that some find detracts from the immersive experience. Time magazine remarks that the movie is "so aggressively colorful" and "manic in its insistence that it's OK to be different," which can overwhelm viewers rather than engage them.
TIME
The film's attempt to address themes of authoritarianism and discrimination has been described as heavy-handed. The review from RogerEbert.com points out that the movie "wedges in the movie's heavier themes of authoritarianism," resulting in a tone that "swings awkwardly between upbeat wonder and dark oppression."
Additionally, the film's cinematography has faced criticism for poor lighting, which hampers the visual experience. News.com.au suggests that the real critique should be regarding the film's poor lighting rather than other controversies.
NEWS.COM.AU
In summary, while "Wicked" boasts strong performances and high production values, its narrative structure, visual presentation, and thematic execution have drawn notable criticism from reviewers.