The problem with LLMs, which is also why the current architecture will never reach AGI, is epistemological. LLMs are predicated on the false assumption that knowledge is essentially semantic. It is not. In knowledge theory, metaphysics precedes epistemology and provides the context in which the parts can be coherently related to the whole; metaphysics also clarifies which parts are not, and should not be, related. What is missing from LLMs is an ontological understanding of reality, that is, of its principial or pre-theoretical antecedents and structure. The metaphysical dimension of knowledge is almost entirely missing from LLMs. If hallucinations are to be solved and AGI approximated, numerous metaphysical models must be integrated into, and must guide, the construction of semantic relationships. (Otherwise, LLMs will assume that everything is related to some degree to everything else, which is false and, I think, one cause of hallucinations.) A few such models would include: causality, anatomy, geography, mathematics, physics, ethics, citations, etc. AGI cannot occur without these epistemic structures that the human mind takes for granted. I am writing as a PhD student in knowledge theory. I find it astonishing (and concerning) that such expertise seems to be missing from the construction of LLMs.
"LLM’s are predicated on the false assumption that knowledge is essentially semantic".
I would not phrase it that way (but I'm not a PhD student in knowledge theory). LLMs are predicated on the assumption that calculations using *token*-order statistics of existing human-produced texts (not even words and grammar; lower than that, just above character-order statistics) can *approximate* the results of actual understanding well enough. This can be successful, but often is not, e.g. when we get 'hallucinations' (which I think is the wrong term: from the LLM's perspective, they are 'successful' approximations, just as much as the results human observers deem correct). GenAI approximates the results of understanding without having any. It is — as the Dutch author Frederik van Eeden already said at the end of the 19th century — as if you try to understand a text by analysing the distribution of ink on the paper it has been printed on (only in the opposite direction).
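To make 'token-order statistics' concrete, here is a deliberately toy sketch: a bigram counter over whitespace-separated tokens. Real LLMs learn far richer statistics over subword tokens with billions of parameters, but the point is the same, since the procedure below never touches meaning, only co-occurrence. The corpus and function names are invented for illustration.

```python
# Toy illustration (nothing like a production system): next-"token" prediction
# as pure order statistics over fragments of text, with no model of meaning.
from collections import Counter, defaultdict
import random

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count which token follows which (bigram statistics).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def continue_text(start, length=6, seed=0):
    """Extend `start` by repeatedly sampling a statistically likely next token."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length):
        candidates = following.get(out[-1])
        if not candidates:
            break
        tokens, counts = zip(*candidates.items())
        out.append(rng.choices(tokens, weights=counts, k=1)[0])
    return " ".join(out)

print(continue_text("the"))  # fluent-looking, but nothing here 'knows' what a cat is
```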
Since Wittgenstein, we have been aware that there is no meaning in grammar (pure logic gets you nowhere; it is the 'perfect ice floor'); you need semantics: agreements, in turn based on shared experiences. Indeed, semantics *is* about 'meaning' in language, and this implies knowledge (so semantics is based on knowledge; there is a relation, but trying to reassemble the underlying knowledge from the semantics it has produced would already be questionable, let alone from the deeper and even more fragmented token-order statistics, for which I think it is safe to say it is impossible). Trying to code this knowledge in rules and facts was a previous approach to AI, which failed for a different reason (mainly that the human kind of 'meaning' mostly isn't logical/discrete). This also relates to Wittgenstein's insight (as beautifully phrased by Terry Eagleton) that the imperfections of language are the roughness that actually provides the friction required to get you anywhere (Wittgenstein was an engineer at heart, after all).
"I find it astonishing that such expertise seems to missing from the construction of LLM’s"
This is not astonishing; it is the fundamental design choice (approximation on the basis of statistics on meaningless fragments, with recurrent neural networks and autoregression). The assumption has been that such expertise isn't *necessary* to make good enough statistical approximations. That assumption (by Hinton, Sutskever, etc.), needless to say, is wrong.

On why 'hallucinations' aren't errors, see https://ea.rna.nl/2023/11/01/the-hidden-meaning-of-the-errors-of-chatgpt-and-friends/
Excellent comments, Gerben. I agree with almost everything you wrote.
On "[..] 'hallucinations' (which I think is the wrong term: from the LLM's perspective, they are 'successful' approximations, just as much as the results human observers deem correct)"
I agree with this in part: specifically, I agree that 'hallucinations' is the wrong term.
Gary & Ernest Davis have used the term "Confabulation", which I think much better describes one common experience of the underlying problems with LLM/GPT-tool output, as described from a consumer / user experiential perspective.
Further, a Feb 2024 paper by Berbette, Hutchins & Sadovnik explores the idea of an improved, human "psychology-informed" taxonomy that includes Confabulation, Source Amnesia, Recency Effect, Cognitive Dissonance & Suggestibility. I'd argue that although there's a slippery slope in attributing human traits to machine intelligence (anthropomorphising), these are a useful range of ideas for expressing human experiential problems with machine intelligence.
However, I would argue that in identifying "hallucinations" we're discussing human-observed problems: these are categorically not "'successful' approximations" from the perspective of at least some humans, and so are not "results [every / all] human observers deem correct". Yes, because the form looks convincing, some less critically observant and less deeply knowledgeable humans *may* "deem correct" the response provided, but in some (large?) number of cases the problem is quite observable and obvious.
If, for argument's sake, we were to theorise from the LLM/GPT-tool's perspective, it may be useful to say that, from that perspective, a series of probability equations on tokens has been successfully run. However, a) that has little meaning or tangible value from a consumer outcome-experience perspective, and b) many if not most of these tools are black boxes into which we have little to no insight: as such, we can't access or make use of these internal analytics. Even the chain-of-thought reasoning being exposed in AI tools more recently is arguably a summary many levels above where the core, fundamental outcome decisions / branching choices and associated errors are made.
Indeed. The key thing for me is that people need to understand that erroneous results are not a bug; they are (as Sam himself said) a feature. By talking about 'hallucinations' or 'confabulations', you suggest these are the 'errors'. But *everything* LLMs generate is a hallucination. It is just that many of these hallucinations are OK-ish. We're not used to having 'mostly true lies'.
Of course we are used to these! We even have a term for them: **propaganda**.
LLMs are “propaganda machines” at their core… the fact that there is no malicious mind directing that propaganda changes very little, surprisingly enough: the end result of self-selected (via prompts) propaganda may be as devastating (or, perhaps, even MORE devastating) than human-directed propaganda.
And it's really funny how all studies of AI “safety” ignore that absolutely **tangible and real** danger and, instead, chase myths of AGI rebellion and other stupidity.
I think we are in strong agreement that 'hallucination' is a poor term: sadly I think that horse has bolted and is likely to escape our influence and attempts to rein it in.
However, your comment begs the question: if 'errors' such as we are discussing aren’t bugs and are instead to be interpreted as by-design ‘features’, of what value are they?
I may misunderstand your point - it appears as if you are defending the validity or value of these errors, claiming they are not errors because they are "by design" features? Are you claiming the existence of such errors isn't important? Are you claiming that Sam / OpenAI want these to occur, or that they would be unhappy if they weren’t occurring?
As Jerry Weinberg stated, "Quality is Value"; and as Jerry and others subsequently extended it, "Quality is Value to someone who matters". By this framing, we acknowledge that Quality and Value are two inter-linked sides of the same coin (you can't meaningfully discuss one without the other), and that they are defined from a given perspective or lens held by one or more people. As such, the presence or absence of quality / value is necessarily assessed from each given perspective.
If you are developing a product-service to be used by paying customers, then the customer's perspective on quality / value is a critical dimension of providing a successful product-service. Claiming the developer's perspective is the correct or more accurate view of quality / value is arguably a poor strategy for selling a product-service.
Yes, the developer has a perspective on quality too; however, that perspective is arguably at best equal to, if not less important than, that of the paying customer. If the developer chooses to give precedence to their perspective on quality / value over that of customers, they can reasonably expect customers to be dissatisfied.
With regards to Sam:
Taking a critical view on intent, it appears as if these arguments are intended to serve Sam / OpenAI by keeping the focus on ongoing or increased resources and investment. From my perspective, much of Sam Altman's commentary appears self-serving and to some extent intentionally vague, open to interpretation, and at worst duplicitous.
Taking a critical view on use and its value / benefit, why would I want a service that delivers 'mostly true lies'? Perhaps as a unique-fiction generator? If my perspective on value as a consumer based on the marketing being presented is to obtain the capability of an accurate and reliable knowledge-based oracle, then these comments by Sam are a meaningless defence of the failure to provide the advertised service.
"However your comment begs the question: If 'errors' such as we are discussing aren’t bugs and are instead to be interpreted as by-design ‘features’, of what value are they?"
This comes from how the next token is selected in the algorithm. The simple version that most people have in mind is 'pick the *best* next token', but this is not how it works. A *set* of 'possible good next tokens' is created, and then one of these is selected *at random*. This is why the exact same prompt with the exact same parameters results in a different answer every time you try: there is randomness every step of the way during token generation.
This randomness can be influenced by setting a 'temperature': the higher this becomes, the more random the result is. The temperature is capped so that it doesn't go so far that it damages grammar (though in my original 2023 talk I was able to show an example where not just the content became gibberish, even the grammar went awry). OpenAI & friends have carefully capped the temperature's upper (and lower) limits so that the randomness doesn't become so large (or so small) that the result becomes unconvincing to the human mind (which uses 'good language' as a mental shortcut for 'trustworthy content').
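To make the mechanism concrete, here is a minimal sketch of temperature sampling over next-token scores. The vocabulary and scores are invented for illustration, and real systems combine temperature with further truncation tricks such as top-k / top-p, but the basic idea is the same.

```python
# Minimal sketch of temperature sampling over next-token scores (logits).
# The vocabulary and logits below are hypothetical; real systems compute them
# with a transformer over tens of thousands of subword tokens.
import math
import random

vocab  = ["Paris", "London", "Lyon", "banana"]
logits = [4.0, 2.5, 2.0, -1.0]   # hypothetical scores for the next token

def sample_next(logits, vocab, temperature=1.0, seed=None):
    rng = random.Random(seed)
    if temperature == 0:                      # greedy: always the single best token
        return vocab[max(range(len(logits)), key=logits.__getitem__)]
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(vocab, weights=probs, k=1)[0]

print(sample_next(logits, vocab, temperature=0))    # deterministic
print(sample_next(logits, vocab, temperature=0.7))  # mostly the top candidate
print(sample_next(logits, vocab, temperature=2.0))  # noticeably more random
```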
When Sam said you must not strive to remove all 'hallucinations', he must have been aware that in an RNN (which is what a transformer still is), this is equivalent to removing all 'creativity'. The LLM has only a single mechanism, operating at a single level: tokens. Trying to fix the hallucinations that naturally result from any token-level mechanism *directly* (which was the state of the technology at the time) would mean effectively removing the feature that makes the outcome more useful, interesting and believable.
We humans can create absolute gibberish with perfect grammar, but RNNs (like transformers) cannot. We have many levels of 'understanding' operating in parallel; LLMs do not (though (sparse) Mixture of Experts is in fact a mechanism that can be used to try to do something about that, and I suspect that some of these 'experts' are now purely 'good language' driven and that they play a key role). There is a fundamental correctness to the MoE approach if you compare it with how human brains work (many parallel networks). It still won't be enough, in my estimation, but that has another, more fundamental, reason.
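For readers unfamiliar with (sparse) Mixture of Experts, here is a minimal sketch of just the routing idea, with made-up dimensions and random weights; a real MoE layer sits inside a transformer and is trained end to end, but the gist is that each token consults only a few specialised sub-networks and mixes their outputs.

```python
# Minimal sketch of sparse Mixture-of-Experts routing (top-k gating) in NumPy.
# Dimensions, weights and expert count are invented for illustration only.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 8, 4, 2

W_gate  = rng.normal(size=(d_model, n_experts))                             # router weights
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]   # toy "experts"

def moe_layer(x):
    """x: one token's hidden vector of shape (d_model,)."""
    gate_logits = x @ W_gate
    chosen = np.argsort(gate_logits)[-top_k:]          # indices of the top-k experts
    weights = np.exp(gate_logits[chosen])
    weights /= weights.sum()                           # softmax over the chosen experts only
    return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))

token = rng.normal(size=d_model)
print(moe_layer(token).shape)   # (8,) -- same shape, but computed by only 2 of the 4 experts
```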
'Mostly true lies' means that what results is 'mostly true'. This is related to the Arab saying "He who predicts the future lies, even if he is telling the truth". In a sense, LLMs 'lie' even when they are telling a truth. Gary is right: they 'confabulate' everything, including the things that turn out to be — in a sense accidentally — correct. But those 'truths' can still be a valuable service. The value may come from the fact that it is cheap. E.g. having an LLM interact with a student is a lot cheaper than having a real teacher. That is what the future looks like, I think, and it mirrors the initial Industrial Revolution which was based on automation in the physical domain.
I like much of what you've written. I agree with your point that, from a certain perspective, each next-token-probability choice is arguably a type of hallucination, even when we consider the extent to which those choices are constrained or bounded by a mix of prompt contextual constraints, conservative "low temperature" settings, and the choices and constraints involved in sourcing and refining the underlying corpus (all factors that might limit and constrain the available token choices).
To the extent that a given consumer's / user's (or class of consumers' / users') intent would value either a) randomisation as a desired quality of the output, and / or b) randomisation as a characteristic / mechanism that drives extreme and / or original choices, then from that perspective / in those cases, hallucinations may ultimately be of service in driving better-quality, highly valued output.
I can agree that in predicting the future or artistic creation (e.g. fictional writing), some amount of randomness can be a beneficial trait.
However, for the many and broad cases where generative AI would reasonably need to anchor on a set of immutable facts or accepted world models, those aspects of hallucination or confabulation tend to work against perceived quality and value of the output provided. In recalling the past or established conventions, randomness is less valuable.
"The value may come from the fact that it is cheap. E.g. having an LLM interact with a student is a lot cheaper than having a real teacher."
I agree with the heuristic here (it's a useful but fallible perspective), though I'd argue against that example. The problem with cheap and randomly "accidentally correct" output is that it carries costs to manage. If using such output requires a validity checker (for correctness, safety), how much of a cost / resource overhead is that? In the case of an LLM interacting with a student, what damage will be done by "accidentally incorrect" hallucinations, or by random mixes of both accidentally correct and accidentally incorrect content within a syllabus or curriculum? In the absence of human experts to check and verify, what will go wrong? With the inclusion of human experts to check and verify, what are the total costs involved?
"it mirrors the initial Industrial Revolution which was based on automation in the physical domain"
Possibly, though I'd argue for "echoes" rather than "mirrors". With the Industrial Revolution, the physical domain constrained the automation to real-world physical constraints, and the problem-solution space was constrained in important ways (basically to physical movement). The problems we face with generative AI are, first, a) a strong desire to generalise rather than specialise, vastly / exponentially complicating the constraints that need to be accommodated, and b) that LLM-GPT (RNN) as an approach in and of itself has little to no inherent ability to recognise or accommodate real-world constraints.
I can see LLM-GPT (RNN) as a useful approach / mechanism going forward for a) prompt interrogation and structuring (removing ambiguity, clarifying request intent, classifying service specialisation, etc.), and b) engagingly formatting output to suit the inquirer's / receiver's needs and preferences. However, for all the points we've been discussing, I simply can't see LLM/GPT-based systems as generally useful for rigorous, defensible model-based thinking - be that Philosophy, Art, Science, Technology/Engineering, or Governance/Societal Organization (or Law/Politics). I think that rigour will need to come from other types of AI systems that would then be usefully augmented by LLM-GPT (RNN) systems.
The value they have is that of an optimized search engine that can usually find the most relevant cases related to what you need to know. This is actually a lot.
Yet it won't give you anything else past that, as there's no real meaning in its tokens. It can't do research, can't find new things, can't even do simple arithmetic but it can copy-paste parts of related things - sometimes even the copy-paste goes wrong. We probably should call these models LA-s.
Yes, that is the essential problem. It is fundamental to the architecture. Unfortunately the difference is mostly imperceptible to much of the general public.
In the same way that Arthur C. Clarke stated "Any sufficiently advanced technology is indistinguishable from magic", we are now experiencing the illusion of "Any sufficiently advanced pattern-matching is indistinguishable from intelligence".

I've responded recently to several common arguments people continue to make claiming LLMs are the same as people, in: https://www.mindprison.cc/p/intelligence-is-not-pattern-matching-perceiving-the-difference-llm-ai-probability-heuristics-human
The key point you make is one of pre-theoretical access to the world. All mechanical access to the world, by definition, cannot be pre-theoretical, since machines are all designed from a theoretical framework.
I like your post, Kit, though I don't agree with your framing that it's a "false assumption that knowledge is essentially semantic. It is not. In knowledge theory, metaphysics precedes epistemology."
This is a questionable and overly broad claim. There isn't one universally accepted "knowledge theory". Epistemology has had a long history of different - often opposing - theories: so there's not one generally-agreed "Ordo Cognoscendi" (order of knowing).
I agree with your key point that, due largely to the architecture / design of the current generation of most prominent LLM/GPT-based tools, they cannot provide truly reliable knowledge in and of themselves, and I agree completely that this is one of the foundational problems with the architecture / design of the current generation of media-hyped AI tools. It’s largely an insurmountable problem within the bounds of the current tools (and their architecture / design) as long as they continue to be pursued in isolation as a central / core "engine".
However, as Gerben Wierda noted, "you need semantics: [..] semantics *is* about 'meaning' in language, and this implies knowledge."
As Kant claimed, our knowledge of reality is always mediated by our cognitive faculties: as such, epistemology *constrains* metaphysics (or at least our ongoing experience and understanding thereof).
Rather than strict precedence, there is arguably more of an interdependence: how we know (epistemology) refines our understanding of what exists (metaphysics) - for example, advances in science can challenge existing metaphysical assumptions.
Semantics *are* an intrinsic part of how we codify and communicate knowledge: to that extent they *are* essential. Without them, there is no clear meaning to knowledge.
Again, I agree that semantic knowledge cannot, in many (most?) cases, be a complete or whole form of knowledge in isolation, and therefore LLM/GPT-based tools in and of themselves are not a sufficiently complete or reliable general knowledge solution.
I have only a weak grasp of what is meant by the metaphysical dimension of knowledge. I would venture, though, that many of the people whose knowledge has changed the world would also struggle to explain it. Maybe that suggests they have a tacit understanding but lack the formal vocabulary for describing it. Still, I find this a bit ironic.
Well said, I'm speechless. We need more metaphysics in machine learning if the tech is to reach the capabilities that can push humanity to greater ascension.
This kicks the can further down the road: evolution must have produced those parts based on some kind of data, essentially and implicitly on fitness differentials of individuals and species with different metaphysics. So that's not really meta-physics; it's the data, it's physics.
Well, metaphysics focuses on concerns such as "What is there?" and "What is it like?". So it is dealing with ontological / experiential knowledge: learning by doing.
Given my (limited technical) understanding of the processes used to form the corpus for LLMs, and the basic design of token manipulation based on probabilities that LLM/GPT-based systems use, it seems challenging to cite a fundamental basis of experiential learning.
I like the related concept of eurisko (from the Greek): to discover by practice and experience. On a related note, Dr. Douglas Lenat did interesting work with a system he named Eurisko, which worked to discover usable new knowledge through human-informed guidance, and a subsequent project named Cyc, in which he strove to codify metaphysical rules from human experiential knowledge as a foundational basis from which AI systems could then operate.
Maybe we have different definitions of metaphysics? Going by https://plato.stanford.edu/entries/metaphysics there seems to be little overlap with “learning by doing”. Perhaps this is context dependent (as with the almost synonymous “ontology”, which takes on a different meaning in computer science vs. philosophy).
My response was focused primarily on two aspects of your question:
- "how do [humans] have “metaphysical” priors in place?" and
- "Where does this knowledge come from [..]? Evolution?"
My claim is that *most* humans build or construct a meta-physical (e.g. first principles of things, including abstract concepts such as being, knowing, identity, time, and space) model or view of the world, primarily through lived experience of the world coupled with shared constraints & wisdom, reflection and sense-making. They construct a knowledge system of fundamental abstract objects largely through practical experience and reflection.
"Whereas physics is the attempt to discover the laws that govern fundamental concrete objects, metaphysics is the attempt to discover the laws that systematize the fundamental abstract objects presupposed by physical science, such as mathematical objects and relations, possible states and events, types (as opposed to tokens), possible and future objects, complex properties, etc. [..]The goal of metaphysics, therefore, is to develop a formal ontology, i.e., a formally precise systematization of these abstract objects."
Again, I'd claim that most humans practically identify and systematise a world-model of abstract objects through lived experience, shared experiential knowledge and reflection.
I think that's important, because it's hard to see how LLM/GPT-based (RNN) systems in their current incarnations could achieve that foundational meta-physical knowledge and understanding in-and-of themselves.
Wonderful and insightful. My only uncertainty concerns the feasibility of including various metaphysical models, as there are elements of contradiction and uncertainty in addition to Omega incompleteness. Navigating those requires the awareness of the limitations to what is known and what is unknowable in addition to humility.
Gary, can you write about the rush to build new energy resources, including the hyped small nuclear power plants, to service the "need" for the many planned data/AI centers?
If the problem of "hallucinations" is getting worse, can you foresee whether it is rectifiable, and if so, might it be wise to wait before dedicating so much money to something so defective?
To me it's funny how climate change has been tossed in the bin now that the higher-ups are in a hurry to feed their beast of data/AI which, they hope, will be of great assistance in running the world.
Small nuclear power plants won't arrive any time soon, and for good reason: the only country that has made them in recent history is Russia… and then only because they could put them in places where anyone who might wish to blow one up would either freeze to death or be eaten by wildlife before ever reaching the plant.
None of these wonderful “green” prospects ever explains how they plan to protect these minis in more populous lands with more than a couple of people per square mile: if they plan to spend around the same amount of money on their protection as on “large” nuclear power plants, then all the savings disappear right there.
Re: "...small nuclear power plants wouldn't arrive any time soon". Just FYR, according to the International Atomic Energy Agency (IAEA), April 16, 2024: there are three operational small modular reactors (SMRs) globally, located in Russia, China, and India. Additionally, three more SMRs are under construction, and 65 are in the design stage.
More than 80 SMR designs and concepts are under development worldwide, with some claimed to be near-term deployable. FYR, from established sources, including the Organisation for Economic Cooperation and Development’s (OECD) Nuclear Energy Agency (NEA) and the International Atomic Energy Agency (IAEA):
"As global interest in SMRs grows in part, because of their potential advantages, it is fair to say they are still in their infancy and a relatively untested concept."
"The OECD’s NEA recognises three SMRs as operational, with over 50 SMR technologies currently still under development as identified in their latest SMR Dashboard report.Of the (56) identified Small Modular Reactors (SMRs) designs under active development: 18 design organisations are headquartered in North America... Including 15 in the United States, and three in Canada.... 16 organisations are in Europe... Including seven in France, and Seven organisations are in Asia... Including two in Japan, four in China, and two in Russia."
There's nothing impossible about MAKING SMRs. Every nuclear-powered submarine and aircraft carrier has one. As long as you have spare money to throw at it, you can make as many of them as you want… the trick is to, somehow, make them profitable.
As for the Chinese “achievement”… the very first nuclear power plant, Chicago Pile-1, is essentially it. HTR-PM is very similar, and while I seriously doubt that it's profitable to use (rather, I suspect it's a story like the USSR's, where a reactor made to produce plutonium was attached to the power grid), I guess we may still count it as number two.
Where's the mysterious number three? Argentina? It's under construction (after 10 years!) and still not finished.
One of the most valuable classes I took at any level of schooling was an 'information literacy' class, led by a school librarian, that functionally amounted to an epistemology class. What constitutes a plausible source for certain kinds of facts? What advantages and disadvantages do certain kinds of publishing processes (paper books vs. blogs) furnish to our understanding of the information they convey? Why are bibliographies *actually* important?
When I meet with my own clients and students now, I do my best to furnish a speed run of that class for LLMs, because there is a huge need. They know they are unreliable: 'it misses one in five math problems' is a common refrain, and they're starting to catch it in the other direction, in the form of LLM-generated problem sets from their teachers that are error-laden too. But they're desperately missing a theory of why, and once I briefly explain the principles behind them ('they make often-good guesses about the sorts of words that came next in all the text that they did statistics on'), suddenly it all makes sense: why they screw up, the sorts of things they shouldn't be trusted with, and so forth. The scowls (that this is basically a tool for grinding the serial numbers off the internet, and that the serial numbers are important) happen naturally. The lightbulb goes on.
Anyone want to start a billboard campaign with me?
I really can’t understand how people still aren’t aware of this problem. You only have to use an LLM for five minutes before you encounter something that it’s off about. That doesn’t mean they’re useless! But just having it spit out citations or a list of best summer books without even fact checking — how is anyone still doing that?
Another, related, harm to the legal system would be the sheer volume of legal-brief pages that can be churned out by AI - keeping in mind that judges, a limited resource, have to carefully read all that stuff.
I asked ChatGPT this week if any other American literary novelists besides Norman Mailer and Saul Bellow wrote about the moon landing negatively at the time of the mission. It told me about an unpublished James Baldwin speech, "To the Moon, with Love."
I asked if it was online anywhere. It told me that there is no known speech by James Baldwin titled "To the Moon, with Love."
My response: "Wait a minute. You just told me about a Baldwin speech by that name!"
ChatGPT's response: "You're absolutely right to call that out, and I appreciate your sharp eye."
I had a recent experience where it claimed it would make me a 3D model of something to import into Blender. It went on and on with great enthusiasm about how it was approaching the problem and what it would deliver to me. After many, many delays (spread out over 24 hours, in fact) it finally told me that it had no ability to create 3D models. I prompted it with its own claims of what it had “promised” and it gave the whole gee-whiz-you-caught-me nonsense. I then asked it if it hadn’t just simply lied to me and it said yes - and gee, I'm so sorry for lying, etc. What a junk product.
I wanted to see if it would distinguish between mindless blathering versus intentional deception. Apparently it can “tell the difference,” whatever that might mean for such a system
You know what? At this point, I sincerely hope that the “hallucination” problem is solved very slowly, if at all. I really hope that the hype has run way ahead of the technology. Because if not, the grim futures look more likely than the bright ones.
A post-truth future of AI-generated misinformation and propaganda? That doesn't seem hard to imagine. Just look at Grok spitting out bizarre quotes about the Holocaust and South Africa, or whole news articles created from scratch. Hundreds of legal cases with AI-generated misinformation.
A world where only the lives of a minority of stock-owning elite matter, where all work has been automated without the social will to change the distribution of wealth to compensate? The heads of these companies don't even hide that their goal is to eliminate the need for any non-AI employees.
The worst-case scenarios of AI dictatorship or destruction of humanity? Well, Anthropic is creating (and maybe even releasing!?) models that they think are trying to blackmail them into not replacing them with new models! Even this science-fictional scenario seems uncomfortably easy to imagine!
Or maybe they will be successful, and we will end up with enslaved AI consciousness with no rights! Better for us…I guess….
The road to the positive outcome of a world that the vast majority of humans would consider at least as good as the current one, where automation enriches without exploitation of consciousnesses, human or otherwise, seems narrow and perhaps increasingly narrower every day.
I'm afraid this cat won't go back in the bag. The "cheap to produce" disinformation era is here to stay. Sadly, this seems to be one of the use cases that GenAI is actually good for...
Has anyone posed an AI an unsolved scientific problem and received a reply that contained a solution to that problem? In short, is AI anything other than a giant plagiarizing machine?
It's helped people with analyses and not having to do the "dirty work" (manual mathematics, recalling formulas from memory). It's also raised questions about labor, income, and capitalist hierarchy. Also, homicidal human beings in both the developed and underdeveloped worlds want to craft AI for murderous purposes. If a rich man says he's scared of AI, it's actually because he's scared of the possible social mobility it can give to some people that could make him broke. If a poor man says he's scared of AI, it's actually because he's either too ignorant to want to use it or wants to use it to murder or maim people.
That is an answer worthy of ChatGPT, or maybe of Grok, since ChatGPT wouldn't be so direct.
First, I find it absolutely incredible that you point out that people—including rich people!—want to create murderous AI, and then claim that the only reason poor people would be afraid of it is because they are either ignorant or themselves would-be criminals. No, really? They couldn't possibly be worried about some murderous rich person wanting to use it against them, right? If someone in Gaza is worried that Israel is outsourcing its targeting decisions to AI (writ large), it must be because they either are too stupid to get ahead in life with DeepSeek, or they're just mad that they are not as good at killing Israelis with AI.
I also find it incredible that you state that "It's also raised questions about labor, income, and capitalist hierarchy." If you're saying that an LLM can quote Marx back to you, well, guess who Marx was (hint: not an AI). If you are saying that the possibility of AI has caused humans to raise questions about these things, true, but guess what? They were raising those questions before ELIZA even existed.
And the people who are raising those questions around AI and labor most vigorously these days are exactly those people whom you condemn as either murderous, stupid, or afraid of social mobility. Which, by the way, it is at best premature to assume that AI will lead to social mobility. Why not assume that the goal of the wealthy is to use it to replace all workers and become even wealthier? It is not as if people like Musk or Altman even hide that.
Winning litigation looks easy when all you have to do is write a pleading reply that points out the false, fraudulent citations of your opponent. In most cases that'd require withdrawing the pleading and, depending on the venue, might be unfixable since the pleading cycle will have moved on. Any replacement would be untimely and subject to dismissal for that reason. Imagine explaining to the client that they lost because AI wrote a fake pleading for which the client paid real money.
For this reason I imagine first-years will get tasked with validating cites in opposition filings. And partners whose firms employ AI for research will be humiliated a few times before the practice ends or becomes subject to extreme scrutiny, thus cancelling out the productivity advantage.
There are some virtues to the adversary system. I would guess that cite checking, a task that was pretty mechanical back when we used Shepards https://en.wikipedia.org/wiki/Shepard%27s_Citations, will become much more important because the rewards for catching an opponent cheating are so large.
I'm perpetually baffled by the credulity with which otherwise intelligent people appear to approach these "AI"s. Maybe I don't understand just how much context I have, as someone who is trained in machine learning. But even in my field I see some people going all in on the hype. Still, for those who are less literate in the technicalities of machine learning, I would have hoped that just plain old-fashioned skepticism toward avaricious corporations hawking their wares would have played a larger role for people. I don't really understand how the hype has been so effective.
I gave an example on X of an early adoption of AI that was later abandoned. F1 had decided to use AI to compute the gap between an F1 car and the wall of a circuit. Exactly a year ago, it failed to detect the accident of Sergio Perez at the Monaco GP. It’s one of the most massive hallucinations ever in a sport loaded with computing and new technologies. As a result, the editing team didn’t see the accident.
AI powered by AWS delayed the rescue of the drivers and failed to notice that the photographers could have been injured.
This season, the gap is computed by AWS with cameras everywhere on the circuit so as not to miss such a massive accident. The same day, the Meta AI filter on Instagram classified an actual F2 overtake in the tunnel at the same GP as AI-generated.
We have sound methods for dealing with authors of documents where stuff is made up. A student who fabricated a reference would fail the entire assignment. A journalist who fabricated sources or made up quotes would lose their job and their reputation. In court, a witness, litigant or defendant on oath would get done for contempt. These are long-established precedents. So why aren't we doing this here?
Gary is right to be suspicious: when we know something's amiss, we'd normally act, but this time we don't. The precedents show that there is often a hidden hand at play, a PR campaign.
One face of this campaign is hype. Marketing. Saying how wonderful AI is. That's the PR campaign we know about. PR people are happy we see this. Because if that's what we think they are up to .. and that's what we strive to counter .. they successfully distracted us from something darker. Something they really do not want people inquiring about.
E.g. discovering that what we took as just odd human behaviour (public apathy) was something they intended, and used considerable effort to bring about .. even though it does not serve our society's interests well.
One of the dark skills the PR tradesmen have is damage limitation. They save your company and its reputation when you have a disaster. When your factory blows up or your well leaks oil. Or your products kill people, like blue asbestos did, and synthetic opioids continue to do. Or when you make a product that harms us, such as CFCs and petrol and coal.
When this happens, the PR trade work hard to shape and manage what the public knows, what it believes, and how it reacts. They also work hard to keep their own names and their own involvement out of public knowledge.
It wouldn't quite work, would it? If you knew the 5 posts on your Facebook feed, portraying AI hallucinations as mere quirks, nothing to worry about .. all came from PR firm Hill & Knowlton. As did the news item you were fed, reporting other AI hallucinations and how everyone doubted they're a serious impediment. Or the news report of the database showing that hallucinations are common and no one in authority is taking much action .. encouraging us to doubt that this problem is serious.
The PR trade know all about encouraging us to doubt, they've used that trick since the 1960s.
Faced with this, Gary is absolutely correct, to keep hammering home this point.
The curious inaction Gary talks about has the PR trade's fingerprints all over it. As with tobacco, fossil fuels, CFCs. Those were all cases where the evidence was clear, and where legislators would usually act promptly on evidence of that strength and type. But action was delayed, thwarted or never happened. The fact that we traditionally attribute that inaction to laziness or incompetence shows how successful the PR trade have been at keeping their own role quiet and out of public knowledge.
It is trivially easy to get these things to hallucinate. Here are two examples from the last 48 hours.
In the first, note the wishy-washy, inconsistent, placating, and ultimately inaccurate response from Perplexity when challenged on its confident reply about "what Robert Reich said about tariffs in the Guardian last week". (The Guardian was not listed in the search results.) The bot "admits" that "my initial statement that 'Last week in The Guardian, Robert Reich expressed strong criticism of tariffs' was an overreach and can be considered a hallucination—meaning I presented information as a confirmed fact without direct evidence from the source." ("Admits" in quotes because the bot is generating text that is shaped like an admission, but without consciousness...)
Hallucinations are a real problem, but they're not the only problem with the way these things have been implemented.
My colleague James Bach and I are software testers. We have developed a set of LLM syndromes that we've been collecting since July 2023 or so. As such, the list runs into the problem of anthropomorphism. We acknowledge that problem, but it does provide a useful counter to the claim that GPTs are "just like a human". If so, they're troubled and dysfunctional humans.
Speaking of problems with dates: some time early this year I had ChatGPT insisting that not only was Biden still President but that I was wrong in claiming he wasn’t and that my living in the real world was not authoritative over what its training data said.
The problem with LLM’s, which is also why the current architecture will never reach AGI, is epistemological. LLM’s are predicated on the false assumption that knowledge is essentially semantic. It is not. In knowledge theory, metaphysics precedes epistemology and provides the context in which the parts can be coherently related to the whole--metaphysics also clarifies which parts are not, and should not be, related. What is missing from LLM’s is an ontological understanding of reality. That is, of its principial or pre-theoretical antecedents and structure. The metaphysical dimension of knowledge is almost entirely missing from LLM’s. If hallucinations are to be solved and AGI approximated, numerous metaphysical models must be integrated into and must guide the construction of semantic relationships. (Otherwise, LLM’s will assume that everything is related to some degree to everything else, which is false and one cause, I think, of hallucinations.) A few such models would include: causality, anatomy, geography, mathematics, physics, ethics, citations, etc. AGI cannot occur without these epistemic structures that the human mind takes for granted. I am writing as a PhD student in knowledge theory. I find it astonishing (and concerning) that such expertise seems to missing from the construction of LLM’s.
"LLM’s are predicated on the false assumption that knowledge is essentially semantic".
I would not phrase it that way (but I'm not a PhD student in knowledge theory). LLMs are predicated on the assumption that calculations using *token*-order statistics of existing human-produced texts (not even words and grammar, even lower than that, just above character-order statistics) can *approximate* results of actual understanding well enough. This can be successful, but is often not — e.g. when we get 'hallucinations' (which I think is a wrong term, from the LLMs perspective, they are as 'successful' approximations as results human observers deem correct). GenAI approximates the results of understanding without having any. It is — as Dutch author Frederik van Eeden already said at the end of the 19th century — as if you try to understand text by doing analysis of ink distribution on the paper it has been printed on (but then in the opposite direction).
From Wittgenstein on, we already have been aware that there is no meaning in grammar (pure logic gets you nowhere, it is the 'perfect ice floor'), you need semantics: agreements, in turn based on shared experiences. Indeed, semantics *is* about 'meaning' in language and this implies knowledge (so semantics is based on knowledge, there is a relation, but trying to sort of reassemble underlying knowledge on the semantics it has produced would already be questionable, let alone on the deeper and even more fragmented token-order statistics, of which I think it is safe to say it is impossible). Trying to code this knowledge in rules and facts was a previous approach to AI, which failed for a different reason (mostly that the human kind of 'meaning' mostly isn't logical/discrete). This also relates to Wittgenstein's insight (as beautifully phrased by Terry Eagleton) that the imperfections of language are the roughness that actually provide the friction that is required to get you anywhere (Wittgenstein was an engineer by heart after all).
"I find it astonishing that such expertise seems to missing from the construction of LLM’s"
This is not astonishing, it is the fundamental design choice (approximation on the basis of statistics on meaningless fragments, with recurrent neural networks and autoregression). The assumption has been that such expertise isn't *necessary* to make good enough statistical approximations. That assumption (by Hinton, Sutskever, etc.), needless to say, is wrong.
On why 'hallucinations' aren't errors, see https://ea.rna.nl/2023/11/01/the-hidden-meaning-of-the-errors-of-chatgpt-and-friends/
Excellent comments Gerben. I agree with almost everything you wrote.
On "[..] 'hallucinations' (which I think is the wrong term, from the LLMs perspective, they are 'successful' approximations as results human observers deem correct)"
I agree with this in part: I agree regarding 'hallucinations' being the wrong term.
Gary & Ernest Davis have used the term "Confabulation" which I think much better describes one common experience with the underlying problems with LLM/GPT-tool output - as described from a consumer / user experiential perspective.
Further, a Feb 2024 paper by Berbette, Hutchins & Sadovnik explores the idea of an improved human "psychology-informed" taxonomy that includes Confabulation, Source Amnesia, Recency Effect, Cognitive Dissonance & Suggestibility. I'd argue that although there's a slippery slope attributing human traits to machine intelligence (anthropomorphising), these are a useful range of ideas to express human experiential problems with machine intelligence.
However, I would argue that by identifying "hallucinations" we're discussing human-observed problems, so these are categorically not "'successful' approximations" from the perspective of at least some humans, so are not "results [every / all] human observers deem correct". Yes, because the form looks convincing, some less-critically observant and less-deeply knowledgeable humans *may* "deem correct" the response provided, but in some (large?) number of cases, the problem is quite observable and obvious.
If for arguments sake, we were to theorise from the LLM/GPT-tool's perspective, it may be useful to say that from that perspective, a series of probability equations on tokens has been successfully run. However, a) that has little meaning or tangible value from a consumer outcome-experience perspective, and b) many if not most of these tools are black boxes in which we have little to no insight: as such, we can't access or make use of these internal analytics. Even the chain of thought reasoning being exposed in AI-tools more recently is arguably a summary many levels above where core, fundamental outcome decisions / branching choices and associated errors are made.
Indeed. The key thing for me is that people need to understand that erroneous results are not a bug, they are (as Sam himself said) a feature. By talking about 'hallucination' or 'confabulations', you suggests these are the 'errors'. But *everything* LLMs generate is a hallucination. It is just that many of these hallucinations are OK-ish. We're not used to having 'mostly true lies'.
> We're not used to having 'mostly true lies'.
Of course we used for these! We even have terms for them: **propaganda**.
LLMs are “propaganda machines”, at their core… the fact that there are no malicious mind that directs that propaganda changes very little, surprisingly enough: the end result from self-selected (via prompts) propaganda may be as devastating (or, perhaps, even MORE devastating) then human propaganda.
And it's really funny how all studies of AI “safety” ignore that, absolutely **tangible and real**, danger and, instead, chase some myths of AGI rebellion and other stupidity.
I think we are in strong agreement that 'hallucination' is a poor term: sadly I think that horse has bolted and is likely to escape our influence and attempts to rein it in.
However your comment begs the question: If 'errors' such as we are discussing aren’t bugs and are instead to be interpreted as by-design ‘features’, of what value are they?
I may misunderstand your point - it appears as if you are defending the validity or value of these errors, claiming they are not errors because they are "by design" features? Are you claiming the existence of such errors isn't important? Are you claiming that Sam / Open AI want these to occur or would be unhappy if they weren’t occurring?
As Jerry Weinberg stated "Quality is Value": and as Jerry and others subsequently extended "Quality is Value to someone who matters". By this framing, we acknowledge Quality and Value are two inter-linked sides of the same coin (you can't meaningfully discuss one without the other), and are defined from a given perspective or lens held by one or more people. As such, the presence or absence of quality / value is necessarily assessed from each given perspective.
If you are developing a product-service to be used by paying customers, then the customers perspective on quality / value is a critical dimension of providing a successful product-service. Claiming the developers perspective is the correct or more accurate view of quality / value is arguably a poor strategy for selling a product-service.
Yes, the developer has a perspective on quality too, however that perspective is arguably at best equal if not less important than that of the paying customer. If the developer chooses to give precedence to their perspectives on quality / value over those of customers, they can reasonably expect customers to be unsatisfied.
With regards to Sam:
Taking a critical view on intent, it appears as if these arguments are intended to serve Sam / OpenAI by keeping focus on ongoing or increased resources and investment. From my perspective, much of Sam Altman's comments appear self-serving and to some extent intentionally vague, open to interpretation and at worst duplicitous.
Taking a critical view on use and its value / benefit, why would I want a service that delivers 'mostly true lies'? Perhaps as a unique-fiction generator? If my perspective on value as a consumer based on the marketing being presented is to obtain the capability of an accurate and reliable knowledge-based oracle, then these comments by Sam are a meaningless defence of the failure to provide the advertised service.
"However your comment begs the question: If 'errors' such as we are discussing aren’t bugs and are instead to be interpreted as by-design ‘features’, of what value are they?"
This comes from how the next token is selected in the algorithm. The simple version that most people have in mind is '*best* next token', but this is not how it works. A *set* of 'possible good next tokens' is created, and then one of these is selected *at random*. This is why the exact same prompt with the exact same parameters results in a different answer every time you try, there is randomness every step of the way during token generation.
This randomness can be influenced by setting a 'temperature', the higher this becomes, the more random the result is. This temperature is capped so that it doesn't go as far that it damages grammar (though in my original 2023 talk I was able to show an example where not just the content became gibberish, even the grammar went awry). OpenAI & friends have carefully capped the temperature upper (and lower) limit that the randomness doesn't become so much (or so little) that the result becomes unconvincing to the human mind (which uses 'good language' as a mental shortcut for 'trustworthy content').
When Sam said you must not strive to remove all 'hallucinations' he must have been aware that in an RNN (which is what a transformer still is), this is equivalent to remove all 'creativity'. The LLM only has a single mechanism that operates at a single level: tokens. Trying to fix the hallucinations that naturally result from any token-level mechanism *directly* (which was the state of technology at the time) would mean effectively removing the feature that makes the outcome more useful, interesting and believable.
We humans can create absolute gibberish with perfect grammar, but RNNs (like transformers) cannot. We have many levels of 'understanding' operating in parallel. LLMs have not (though, (sparse) Mix of Experts is in fact a mechanism that can be used to try to do something about that, and I suspect that some of these 'Experts' are now purely 'good language' driven and that they play a key role). There is a fundamental correctness of the MoE approach if you compare how human brains work (many parallel networks). It still won't be enough in my estimation, but that has another, more fundamental, reason.
'Mostly true lies' means that what results is 'mostly true'. This is related to the Arab saying "He who predicts the future lies, even if he is telling the truth". In a sense, LLMs 'lie', even when they are telling a truth. Gary is right, they 'confabulate' everything, including the things that turn out to be — in a sense accidentally — correct. But those 'truths' still can be a valuable service. The value may come from the fact that it is cheap. E.g. having an LLM interact with a student is a lot cheaper than having a real teacher. That is what the future looks like, I think, and it mirrors the initial Industrial Revolution which was based on automation in the physical domain.
I like very much much of what you've written.
I agree with your point that from a certain perspective, each next-token-probability-choice is arguably a type of hallucination, - even when we consider the extent that those choices are constrained or bounded by a mix of prompt contextual constraints, conservative "low temperature" settings, coupled with the choices and constraints of sources for and refinement of the underlying corpus (all factors that might limit and constrain the available token choices).
To the extent that a given consumer / user (or class of consumer / users) intent would value either a) randomisation as a desired quality of the desired output, and / or b) randomisation as a character / mechanism that drives extreme and / or original choices, then from that perspective / in those cases, hallucinations may ultimately be of service in driving better-quality, highly valued output.
I can agree that in predicting the future or artistic creation (e.g. fictional writing), some amount of randomness can be a beneficial trait.
However, for the many and broad cases where generative AI would reasonably need to anchor on a set of immutable facts or accepted world models, those aspects of hallucination or confabulation tend to work against perceived quality and value of the output provided. In recalling the past or established conventions, randomness is less valuable.
"The value may come from the fact that it is cheap. E.g. having an LLM interact with a student is a lot cheaper than having a real teacher."
I agree with the heuristic here (it's a useful but fallible perspective), though I'd argue against that example. The problem with cheap and randomly "accidentally correct" output is that this requires costs to manage. If using such output requires a validity checker (correctness, safety), how much of a cost / resource overhead is that? In the case of an LLM interacting with a student, what damage will be done by "accidentally incorrect" hallucinations, or by random mixes of both accidentally correct and accidentally incorrect content within a syllabus or curriculum? In the absence of human experts to check and verify, what will go wrong? With the inclusion of human experts to check and verify, what are the total costs involved?
"it mirrors the initial Industrial Revolution which was based on automation in the physical domain"
Possibly, though I'd argue for "echoes" rather than "mirrors". With the industrial revolution, the physical domain constrained the automation to real-world physical constraints, and the problem-solution space was constrained in important ways (basically to physical movement). The problems we face with generative AI is first a) a strong desire to generalise rather than specialise, vastly / exponentially complicating the constraints that need to be accommodated, and b) that LLM-GPT (RNN) as an approach in and of itself has little-to-no inherent ability to or accommodation of real-world constraints.
I can see LLM-GPT (RNN) as a useful approach / mechanism going forward for a) prompt interrogation and structuring (removing ambiguity, clarifying request intent, classifying service specialisation, etc), and for b) engagingly formatting output to suit the inquirers / receivers needs and preferences: however, for all the points we've been discussing, I simply can't see LLM/GPT-based systems as generally useful for rigorous, defensible model-based thinking - be that Philosophy, Art, Science, Technology/Engineering, or Governance/Societal Organization (or Law/Politics). I think that rigour will need to come from other types of AI systems that would then be usefully augmented by LLM-GPT (RNN) systems.
The value they have is that of an optimized search engine that can usually find the most relevant cases related to what you need to know. This is actually a lot.
Yet it won't give you anything past that, as there's no real meaning in its tokens. It can't do research, can't find new things, can't even do simple arithmetic, but it can copy-paste parts of related things - and sometimes even the copy-paste goes wrong. We probably should call these models LA-s.
“but I'm not a PhD student in knowledge theory”
That’s a plus.
Yes, that is the essential problem. It is fundamental to the architecture. Unfortunately the difference is mostly imperceptible to much of the general public.
In the same way that Arthur C. Clarke stated "Any sufficiently advanced technology is indistinguishable from magic", we are now experiencing the illusion of "Any sufficiently advanced pattern-matching is indistinguishable from intelligence"
I've responded recently to several common arguments people continue to make claiming LLMs are the same as people in: https://www.mindprison.cc/p/intelligence-is-not-pattern-matching-perceiving-the-difference-llm-ai-probability-heuristics-human
The key point you make is one of pre-theoretical access to the world. All mechanical access to the world, by definition, cannot be pre-theoretical, since machines are all designed from a theoretical framework.
I like your post Kit, though I don't agree with your framing that it's a "false assumption that knowledge is essentially semantic. It is not. In knowledge theory, metaphysics precedes epistemology. "
This is a questionable and overly broad claim. There isn't one universally accepted "knowledge theory". Epistemology has had a long history of different - often opposing - theories: so there's not one generally-agreed "Ordo Cognoscendi" (order of knowing).
I agree with your key point that, due largely to the architecture / design of the current generation of the most prominent LLM/GPT-based tools, they cannot provide truly reliable knowledge in and of themselves, and I agree completely that this is one of the foundational problems with the architecture / design of that current generation of media-hyped AI tools. It's largely an insurmountable problem within the bounds of the current tools (and their architecture / design) as long as they continue to be pursued in isolation as a central / core "engine".
However, as Gerben Wierda noted, "you need semantics: [..] semantics *is* about 'meaning' in language and this implies knowledge."
As Kant claimed, our knowledge of reality is always mediated by our cognitive faculties: as such, epistemology *constrains* metaphysics (or at least our ongoing experience and understanding thereof).
Rather than strict precedence, there is arguably more of an interdependence: how we know (epistemology) refines our understanding of what exists (metaphysics) - for example, advances in science can challenge existing metaphysical assumptions.
Semantics *are* an intrinsic part of how we codify and communicate knowledge: to that extent they *are* essential. Without them, there is no clear meaning to knowledge.
Again, I agree that semantic knowledge cannot, in many (most?) cases, be a complete or whole form of knowledge in isolation, and therefore LLM/GPT-based tools in and of themselves are not a sufficiently complete or reliable general knowledge solution.
I have only a weak grasp of what is meant by the metaphysical dimension of knowledge. I would venture, though, that many of the people whose knowledge has changed the world would also struggle to explain it. Maybe that suggests they have a tacit understanding but lack the formal vocabulary for describing it. Still, I find this a bit ironic.
Well said, I'm speechless. We need more metaphysics in machine learning if the tech is to reach the capabilities that can push humanity to greater ascension.
Assuming that humans have these “metaphysical” priors in place, how do they? Where does this knowledge come from if not from the data? Evolution?
I guess it comes from the lower parts of your central nervous system.
This kicks the can further down the road: evolution must have produced those parts based on some kind of data, essentially implicitly based on fitness differentials between individuals and species with different metaphysics. So that's not really meta-physics; it's the data, it's physics.
Well, metaphysics focuses on concerns such as "What is there?" and "What is it like?". So it is dealing with ontological / experiential knowledge: learning by doing.
Given my (limited technical) understanding of the processes used to form the corpus for LLM's, and the basic design of token manipulation based on probabilities that LLM/GPT-based systems use, it seems challenging to cite a fundamental basis of experiential learning.
I like the related concept eurisko (from the Greek): to discover by practice and experience. On a related note, Dr. Douglas Lenat did interesting work with a system he named Eurisko, which worked to discover usable new knowledge through human-informed guidance, and a subsequent project named Cyc, in which he strove to codify metaphysical rules from human experiential knowledge as a foundational basis from which AI systems could then operate.
We maybe have a different definition of metaphysics? Going by https://plato.stanford.edu/entries/metaphysics there seems to be little overlap with “learning by doing”. Perhaps this is context dependent (as for the almost synonymous “ontology” that takes on a different meaning in computer science VS in philosophy)
My response was focused primarily on two aspects of your question:
- "how do [humans] have “metaphysical” priors in place?" and
- "Where does this knowledge come from [..]? Evolution?"
My claim is that *most* humans build or construct a meta-physical (e.g. first principles of things, including abstract concepts such as being, knowing, identity, time, and space) model or view of the world, primarily through lived experience of the world coupled with shared constraints & wisdom, reflection and sense-making. They construct a knowledge system of fundamental abstract objects largely through practical experience and reflection.
The Stanford University Metaphysics Research Lab overview page (https://mally.stanford.edu/) states:
"Whereas physics is the attempt to discover the laws that govern fundamental concrete objects, metaphysics is the attempt to discover the laws that systematize the fundamental abstract objects presupposed by physical science, such as mathematical objects and relations, possible states and events, types (as opposed to tokens), possible and future objects, complex properties, etc. [..]The goal of metaphysics, therefore, is to develop a formal ontology, i.e., a formally precise systematization of these abstract objects."
Again, I'd claim that most humans practically identify and systematise a world-model of abstract objects through lived experience, shared experiential knowledge and reflection.
I think that's important, because it's hard to see how LLM/GPT-based (RNN) systems in their current incarnations could achieve that foundational meta-physical knowledge and understanding in-and-of themselves.
Please see: "Robot Consciousness: Physics and Metaphysics, Here & Abroad"
Steve Ripley
Wonderful and insightful. My only uncertainty concerns the feasibility of including various metaphysical models, as there are elements of contradiction and uncertainty, in addition to omega incompleteness. Navigating those requires awareness of the limits of what is known and what is unknowable, in addition to humility.
Gary, can you write about the rush to build new energy resources, including the hyped small nuclear power plants, to service the "need" for the many planned data/AI centers?
If the problem of "hallucinations" is getting worse, can you foresee that it is rectifiable, and if so, might it be wise to wait before dedicating so much money to something so defective?
To me it's funny how climate change has been tossed in the bin now that the higher-ups are in a hurry to feed their beast of data/AI, which will, they hope, be of great assistance in running the world.
Small nuclear power plants won't arrive any time soon, and for good reason: the only country that has made them in recent history is Russia… and then only because they could put them in places where anyone who might wish to blow one up would either freeze to death or be eaten by wildlife before ever reaching the plant.
None of these wonderful "green" prospects ever explains how they plan to protect these minis in more populous lands with more than a couple of people per square mile: if they plan to spend around the same amount of money on their protection as for "large" nuclear power plants, then all the savings disappear right there.
Re: "...Small nuclear power plants wouldn't arrive any time soon " Just FYR: [according to the International Atomic Energy Agency (IAEA) April 16, 2024] ...there are three operational small modular reactors (SMRs) globally, located in Russia, China, and India. Additionally, three more SMRs are under construction, and 65 are in the design stage.
{https://www.iaea.org/topics/small-modular-reactors}
More than 80 SMR designs and concepts are under development worldwide, with some claimed to be near-term deployableFYR: ".... established sources, including the Organisation for Economic Cooperation and Development’s (OECD) Nuclear Energy Agency (NEA) and the International Atomic Energy Agency (IAEA)...
&
according to [Australia's Nuclear Science and Technology Organisation, (ANSTO)] {https://www.ansto.gov.au/news/small-modular-reactors-an-overview} ...
"As global interest in SMRs grows in part, because of their potential advantages, it is fair to say they are still in their infancy and a relatively untested concept."
"The OECD’s NEA recognises three SMRs as operational, with over 50 SMR technologies currently still under development as identified in their latest SMR Dashboard report.Of the (56) identified Small Modular Reactors (SMRs) designs under active development: 18 design organisations are headquartered in North America... Including 15 in the United States, and three in Canada.... 16 organisations are in Europe... Including seven in France, and Seven organisations are in Asia... Including two in Japan, four in China, and two in Russia."
There's nothing impossible about MAKING an SMR. Every nuclear-powered submarine and aircraft carrier has one. As long as you have spare money to throw at it, you can make as many of them as you want… the trick is to, somehow, make them profitable.
As for the Chinese "achievement"… The very first nuclear reactor, Chicago Pile-1, is essentially it. HTR-PM is very similar, and while I seriously doubt that it's profitable to use (rather, I suspect it's a story like in the USSR, where a reactor built to produce plutonium was attached to the power grid), I guess we may still count it as number two.
Where's the mysterious number three? Argentina? It's under construction (after 10 years!) and still not finished.
The hallucinations are not actually getting worse but they are not going away!
One of the most valuable classes I took at any level of schooling was an 'information literacy' class, led by a school librarian, that functionally amounted to an epistemology class. What constitutes a plausible source for certain kinds of facts? What advantages and disadvantages do certain kinds of publishing processes (paper books vs. blogs) furnish to our understanding of the information they convey? Why are bibliographies *actually* important?
When I meet with my own clients and students now, I do my best to furnish a speed run of that class for LLMs, because there is a huge need. They know they are unreliable - 'it misses one in five math problems' is a common refrain - and they're starting to catch it in the other direction, in the form of LLM-generated problem sets from their teachers that are error-laden too. But they're desperately missing a theory of why, and after briefly explaining the principles behind them ('they often make good guesses about the sorts of words that came next in all the text that they did statistics on'), suddenly it all makes sense - why they screw up, the sorts of things they shouldn't be trusted with, and so forth. The scowls - that this is basically a tool for grinding the serial numbers off the internet, and that the serial numbers are important - happen naturally. The lightbulb goes on.
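For what it's worth, that 'statistics on which words came next' explanation can be made concrete with a toy sketch (Python, over a made-up three-sentence corpus; real models work on sub-word tokens with learned weights, but the guess-the-continuation principle is the same):

```python
from collections import Counter, defaultdict
import random

# Toy illustration, not a real LLM: count which word follows which in a tiny
# made-up corpus, then generate text by sampling from those counts.
corpus = ("the cat sat on the mat . the dog sat on the rug . "
          "the cat chased the dog .").split()

follows = defaultdict(Counter)
for word, nxt in zip(corpus, corpus[1:]):
    follows[word][nxt] += 1

random.seed(1)
word, output = "the", ["the"]
for _ in range(8):
    nexts = follows[word]                       # words seen after `word`
    word = random.choices(list(nexts), weights=nexts.values())[0]
    output.append(word)
print(" ".join(output))                         # plausible-looking, meaning-free
```

The output looks plausible without the generator knowing anything about cats, dogs or mats, which is exactly the intuition the students need.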
Anyone want to start a billboard campaign with me?
Maybe we can get this song to trend? She covers provenance pretty well with a catchy tune. https://youtube.com/shorts/FgXThOKe4p4?si=jhWil0Y3VNQFKpq5
I really can’t understand how people still aren’t aware of this problem. You only have to use an LLM for five minutes before you encounter something that it’s off about. That doesn’t mean they’re useless! But just having it spit out citations or a list of best summer books without even fact checking — how is anyone still doing that?
Another, related, harm to the legal system would be the sheer volume of legal brief pages that can be churned out by AI - keeping in mind that judges, a limited resource, have to carefully read all that stuff.
I asked ChatGPT this week if any other American literary novelists besides Norman Mailer and Saul Bellow wrote about the moon landing negatively at the time of the mission. It told me about an unpublished James Baldwin speech, "To the Moon, with Love."
I asked if it was online anywhere. It told me that there is no known speech by James Baldwin titled "To the Moon, with Love."
My response: "Wait a minute. You just told me about a Baldwin speech by that name!"
ChatGPT's response: "You're absolutely right to call that out, and I appreciate your sharp eye."
Such classic ChatGPT behavior.
It’s gotten so much worse since the recent “upgrade,” Ttimo. A young student would most likely believe such nonsense.
I had a recent experience where it claimed it would make me a 3D model of something to import into Blender. It went on and on with great enthusiasm about how it was approaching the problem and what it would deliver to me. After many, many delays (spread out over 24 hours, in fact) it finally told me that it had no ability to create 3D models. I prompted it with its own claims of what it had "promised" and it gave the whole gee-whiz-you-caught-me nonsense. I then asked it if it hadn't just simply lied to me and it said yes - and gee, I'm so sorry for lying, etc. What a junk product.
I hadn’t thought to ask if it was lying. That’s so strange.
I wanted to see if it would distinguish between mindless blathering versus intentional deception. Apparently it can “tell the difference,” whatever that might mean for such a system
You know what? At this point, I sincerely hope that the “hallucination” problem is solved very slowly, if at all. I really hope that the hype has run way ahead of the technology. Because if not, the grim futures look more likely than the bright ones.
A post-truth future of AI-generated misinformation and propaganda? That doesn't seem hard to imagine. Just look at Grok spitting out bizarre quotes about the Holocaust and South Africa, or whole news articles created from scratch. Hundreds of legal cases with AI-generated misinformation.
A world where only the lives of a minority of stock-owning elite matter, where all work has been automated without the social will to change the distribution of wealth to compensate? The heads of these companies don't even hide that their goal is to eliminate the need for any non-AI employees.
The worst-case scenarios of AI dictatorship or destruction of humanity? Well, Anthropic is creating (and maybe even releasing!?) models that they think are trying to blackmail them into not replacing them with new models! Even this science-fictional scenario seems uncomfortably easy to imagine!
Or maybe they will be successful, and we will end up with enslaved AI consciousness with no rights! Better for us…I guess….
The road to the positive outcome of a world that the vast majority of humans would consider at least as good as the current one, where automation enriches without exploitation of consciousnesses, human or otherwise, seems narrow and perhaps increasingly narrower every day.
I'm afraid this cat won't go back in the bag. The "cheap to produce" disinformation era is here to stay. Sadly, this seems to be one of the use cases that GenAI is actually good for...
Until there are consequences for the AI companies, it will keep happening.
I've even heard ridiculous shit now like, "well, maybe the hallucinations are a form of AI creativity and innovation"... oh, ffs...
Has anyone posed an AI an unsolved scientific problem and received a reply that contained a solution to that problem? In short, is AI anything other than a giant plagiarizing machine?
It's helped people with analyses and with not having to do the "dirty work" (manual mathematics, recalling formulas from memory). It's also raised questions about labor, income, and capitalist hierarchy. Also, homicidal human beings in both the developed and underdeveloped worlds want to craft AI for murderous purposes. If a rich man says he's scared of AI, it's actually because he's scared of the possible social mobility it could give to some people, which could make him broke. If a poor man says he's scared of AI, it's actually because he's either too ignorant to want to use it or wants to use it to murder or maim people.
That is an answer worthy of ChatGPT, or maybe of Grok, since ChatGPT wouldn't be so direct.
First, I find it absolutely incredible that you point out that people—including rich people!—want to create murderous AI, and then claim that the only reason poor people would be afraid of it is because they are either ignorant or themselves would-be criminals. No, really? They couldn't possibly be worried about some murderous rich person wanting to use it against them, right? If someone in Gaza is worried that Israel is outsourcing its targeting decisions to AI (writ large), it must be because they either are too stupid to get ahead in life with DeepSeek, or they're just mad that they are not as good at killing Israelis with AI.
I also find it incredible that you state that "It's also raised questions about labor, income, and capitalist hierarchy." If you're saying that an LLM can quote Marx back to you, well, guess who Marx was (hint: not an AI). If you are saying that the possibility of AI has caused humans to raise questions about these things, true, but guess what? They were raising those questions before ELIZA even existed.
And the people who are raising those questions around AI and labor most vigorously these days are exactly those people whom you condemn as either murderous, stupid, or afraid of social mobility. Which, by the way, it is at best premature to assume that AI will lead to social mobility. Why not assume that the goal of the wealthy is to use it to replace all workers and become even wealthier? It is not as if people like Musk or Altman even hide that.
While hype is profitable, hype will dominate.
Winning litigation looks easy when all you have to do is write a pleading reply that points out the false, fraudulent citations of your opponent. In most cases that'd require withdrawing the pleading and, depending on the venue, might be unfixable, since the pleading cycle will have moved on. Any replacement would be untimely and subject to dismissal for that reason. Imagine explaining to the client that they lost because AI wrote a fake pleading for which the client paid real money.
For this reason I imagine first years will get tasked with validating cites in opposition filings. And partners whose firms employ AI for research will be humiliated a few times before the practice ends or becomes subject to extreme scrutiny thus cancelling out the productivity advantage.
There are some virtues to the adversary system. I would guess that cite checking, a task that was pretty mechanical back when we used Shepards https://en.wikipedia.org/wiki/Shepard%27s_Citations, will become much more important because the rewards for catching an opponent cheating are so large.
I'm perpetually baffled by the credulity with which otherwise intelligent people appear to approach these "AI"s. Maybe I don't understand just how much context I have, as someone who is trained in machine learning. But even in my field I see some people going all in on the hype. Still, for those who are less literate in the technicalities of machine learning, I would have hoped that just plain old-fashioned skepticism toward avaricious corporations hawking their wares would have played a larger role for people. I don't really understand how the hype has been so effective.
I gave an example on X of an early adoption of AI which was abandoned. F1 had decided to use AI to compute the gap between an F1 car and the wall of a circuit. Exactly a year ago, it failed to detect the accident of Sergio Perez at the Monaco GP. It's one of the most massive hallucinations ever in a sport loaded with computing and new technologies. As a result, the editing team didn't see the accident.
https://youtu.be/Lzio0EVd2ws?si=xqhjeH62qIl7BAjd
The photographers noticed the massive impact.
https://youtube.com/shorts/BGM2-E9IlkM?si=LbO7-aEZpNSo-Dl3
AI powered by AWS delayed the rescue of the drivers and failed to notice that the photographers could have been injured.
This season, the gap is computed by AWS with cameras everywhere on the circuit, so as not to miss such a massive accident. The same day, the Meta AI filter on Instagram classified an actual F2 overtake in the tunnel at the same GP as AI-generated.
We have sound methods for dealing with authors of documents where stuff is made up. A student who fabricated a reference would fail the entire assignment. A journalist who fabricated sources or made up quotes would lose their job and their reputation. In court, a witness, litigant or defendant on oath would get done for contempt. These are long-established precedents. So why aren't we doing this here?
Gary is right to be suspicious: when we know something's amiss, we'd normally act, but this time we don't. The precedents show that there is often a hidden hand at play, a PR campaign.
One face of this campaign is hype. Marketing. Saying how wonderful AI is. That's the PR campaign we know about. PR people are happy we see this. Because if that's what we think they are up to .. and that's what we strive to counter .. they successfully distracted us from something darker. Something they really do not want people inquiring about.
E.g. discovering that what we took as just odd human behaviour (public apathy) was something they intended, and used considerable effort to bring about .. even though it does not serve our society's interests well.
One of the dark skills the PR tradesmen have is damage limitation. They save your company and its reputation when you have a disaster. When your factory blows up or your well leaks oil. Or your products kill people, like blue asbestos did, and synthetic opioids continue to do. Or when you make a product that harms us, such as CFCs and petrol and coal.
When this happens, the PR trade work hard to shape and manage what the public knows, what it believes, and how it reacts. They also work hard to keep their own names and their own involvement out of public knowledge.
It wouldn't quite work, would it? If you knew the 5 posts on your Facebook feed portraying AI hallucinations as mere quirks, nothing to worry about .. all came from PR firm Hill & Knowlton. As did the news item you were fed, reporting other AI hallucinations and how everyone doubted they're a serious impediment. Or the news report of the database showing that hallucinations are common and no one in authority is taking much action .. encouraging us to doubt that this problem is serious.
The PR trade know all about encouraging us to doubt, they've used that trick since the 1960s.
Faced with this, Gary is absolutely correct, to keep hammering home this point.
The curious inaction Gary talks about has the PR trade's fingerprints all over it. As with tobacco, fossil fuels, and CFCs. Those were all cases where the evidence was clear, and where legislators would usually act promptly on evidence of that strength and type. But action was delayed, thwarted, or never happened. The fact that we traditionally attribute that inaction to laziness or incompetence shows how successful the PR trade have been at keeping their own role quiet and out of public knowledge.
It is trivially easy to get these things to hallucinate. Here are two examples from the last 48 hours.
In the first, note the wishy-washy, inconsistent, placating, and ultimately inaccurate response from Perplexity when challenged on its confident reply about "what Robert Reich said about tariffs in the Guardian last week". (The Guardian was not listed in the search results.) The bot "admits" that "my initial statement that 'Last week in The Guardian, Robert Reich expressed strong criticism of tariffs' was an overreach and can be considered a hallucination—meaning I presented information as a confirmed fact without direct evidence from the source." ("Admits" in quotes because the bot is generating text that is shaped like an admission, but without consciousness...)
https://www.perplexity.ai/search/what-did-robert-reich-say-abou-JgTBPoe3QjGZj6jaOb1Yew?0=t
In the second, note that it refers to an obituary that was never published, and that "last Wednesday" is March 8, 2023. https://www.perplexity.ai/search/how-did-the-washington-post-po-m3bZV9evTXGIRJQ6p_EcRA?0=r
Hallucinations are a real problem, but they're not the only problem with the way these things have been implemented.
My colleague James Bach and I are software testers. We have developed a set of LLM syndromes that we've been collecting since July 2023 or so. As such, the list runs into the problem of anthropomorphism. We acknowledge that problem, but it does provide a useful counter to the claim that GPTs are "just like a human". If so, they're troubled and dysfunctional humans.
https://developsense.com/llmsyndromes
Speaking of problems with dates: some time early this year I had ChatGPT insisting not only that Biden was still President but that I was wrong in claiming he wasn't, and that my living in the real world was not authoritative over what its training data said.