Gary, I cannot thank you enough for this post. As I am not an engineer but an economist and planner, I need this kind of detail to help me see the structure of the hype machine that Silicon Valley is addicted to. The model is addictive: put half-baked versions of a technology out there to gain mind share and early-adopter market share; follow with marketing hype built on half-truths and incomplete information; then close a funding round at a billion-dollar valuation. I get that. But it is dangerous when dealing with something like AI, which, as you have pointed out, poses significant risks not just to investors but to the public at large when misused. Unfortunately, there is no way to constrain the SV hype model at this time, apart from thoughtful assessments put into the public domain, as you and others are doing. Thank you!
And the misuse is real and serious. The big companies, which have a lot of money to spend and many engineers, can keep training such models, which then get heavily misused (fake news, misinformation, disinformation, toxic content, deception, ...). We academic researchers then spend our time trying to make all this less wrong and less dangerous. And yet we are far from being able to tell what is simply made up by these models (hallucination), what is written by an AI system versus a human (AI detectability, the counter-Turing test; our paper on this just won an EMNLP Outstanding Paper Award), or what is fake and what is real: https://www.linkedin.com/feed/update/urn:li:activity:7117565699258011648
I don't understand why they feel the need to overhype these things. They are pretty remarkable even with their faults, and they can be great tools with the appropriate guardrails and training in place.
Marketing departments have to do something to justify their existence.
The officers of any profit-motivated company have a (legal) fiduciary duty to maximise shareholder value. A little bit of marketing puff is considered to be par for the course.
Because they are incredibly expensive to build, and the very existence of the companies building them (or of the divisions within bigger companies like Google) is propped up by expectations of future LLM/AGI ability. If these products perform less reliably than a human who can be hired at a fraction of the cost, then the valuations of the companies building them will tank and they will no longer have the money to keep designing new systems.
Everyone expects them to do better in the future, so Microsoft, Google, Meta, etc. are willing to keep pumping money into more and more expensive systems. If it turned out that there were hard limits or that progress had slowed for the foreseeable future, then these companies may be looking at hard times or closure.
Thank you, Gary, for doing the semi-thankless job – along with philosophers – of continuing to ask good questions, seeing clearly, raising valid concerns, and generally being willing to say something more commonsensical than the crowd is often willing to hear as it smashes along in the bandwagon towards the next exciting thing, the next shining hope of our savior the LordAGI. As the wagon lurches on towards Reality, one can at least take some pleasure and peace in not being impaled on the rocks of dashed hopes and dreams! :)
Great Python example. Credibility is eroded. Articles with over-the-top AI claims appear, seemingly unchecked, in venues such as Nature. Post-truth?
Gary, you the man! Just keep calling bullshit on them brother!
https://x.com/garymarcus/status/1732884774887567516?s=46
Thank you so much. Your insights, commentary and willingness to share are much, much appreciated.
"Lies, big lies, statistics, benchmarks." Hidden in Google's marketing blitz is actually some interesting information that, if you take a good look, is pretty sobering. For example, one can estimate that GPT-4 has about a 1 in 100 billion chance of producing a correct small (~1,000-line) program (really), and Google's 'state of the art'-surpassing Gemini improves that to about 1 in 20 million... https://ea.rna.nl/2023/12/08/state-of-the-art-gemini-gpt-and-friends-take-a-shot-at-learning/
Hi there, I believe I was able to get GPT-4.5 to achieve the result Filip was after using chain-of-thought prompting. It's worth noting that this was done without examples, but by allowing the system to work iteratively towards a more complex test case. I started with two intersecting rectangles and then moved on to an irregular hexagon intersecting a rectangle. Here's my chat thread, which ends with the latest code. Would love to hear anyone's thoughts. Please let me know if you have problems accessing the thread. Thanks!
https://chat.openai.com/share/e11c5f9c-e3ed-4007-8b7c-9f372509da6c
GPT-4.5 didn't "achieve the result". Some of the cognition required to solve the problem was going on inside your head, and some was going on inside GPT-4.5.
Thanks for your reply! Can you explain that statement in more detail? I'm curious where you see me as contributing cognitively to the task.
Think of it this way. Imagine the *minimum* amount of information that is strictly needed in order to formally specify the problem in question (e.g. a pre- and post-condition for the desired program). If you provide a problem description containing exactly this amount of information (and no more), and the machine (e.g. LLM-based chatbot) generates a correct solution, then the machine has solved the problem "all by itself". But if you provide *more* information than is strictly necessary (and if the machine is unable to correctly solve the problem without this additional information) then you will have performed some of the problem-solving reasoning yourself.
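To make that concrete with the polygon problem discussed in this thread, a minimal specification might contain nothing but a contract like the hypothetical sketch below (an illustration of "no more information than necessary", not anyone's actual prompt):

# pre:  a and b are simple polygons, each given as a list of (x, y) vertices in order
# post: intersect(a, b) returns the region common to both polygons
#       (empty if they do not overlap)
def intersect(a: list[tuple[float, float]], b: list[tuple[float, float]]):
    ...

Anything the prompter supplies beyond that contract (a decomposition into sub-steps, intermediate test cases, corrections between iterations) counts as reasoning done outside the machine.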
Andrej Karpathy likens prompt engineering (such as tree-of-thoughts, chain-of-thoughts, etc) to Kahneman's System 2 (going on inside your head) vs System 1 (going on inside the LLM). See this video [https://www.youtube.com/watch?v=bZQun8Y4L2A] at 27:28.
My own (unfinished draft) AGI paper attempts to define when a problem-solver is "maximally-intrinsically intelligent", i.e. when it's doing all its own thinking. See section 1.2.22 of [https://www.bigmother.ai/_files/ugd/d2335c_d884ad862fe94aa38151cb99e1fe6e74.pdf] for details.
Hope this helps!
Thanks! I'll check out those links and think on this!
No special prompting needed, if you count using a geometry package: https://chat.openai.com/share/783f5333-9f24-4aba-8cc0-70cf76610ebe
A thought (well, cheap shot of the day sort of thing, but...) crossed my mind: Maybe we've forgotten the lesson that we should have learned from Theranos: be cautious when dealing with Stanford dropouts.
In my little diagram of the AGI space [p. 13, here: https://www.bigmother.ai/_files/ugd/d2335c_d884ad862fe94aa38151cb99e1fe6e74.pdf], GPS (the General Problem Solver, developed in the 1950s and '60s) is, by my reckoning, an AGI K. The GPT-4 version of ChatGPT, released some 60 years later, is, by my reckoning, also an AGI K. And now it appears as though Gemini is also going to be ... (wait for it) ... an AGI K. It's starting to look as though AGI K is a difficult barrier to overcome! :-)
There is a wide, wide gulf between academic points and systems in production.
How much "subtle deception" can be pushed before it becomes "actual deception"? A pointed question such as "did you guys make this look like it was in real time?" can be given the slippery reply "we didn't state anywhere that it was in real time." Ugh.
A massive stochastic memorization machine, anyone?
:)
It is a fair statement that Gemini is in the ballpark of GPT-4. Google finally caught up (more or less).
None of this means that progress will stop. LLMs are a first-cut approach; they simply find the most likely solution.
We will likely see specialized modules being integrated into the workflow, to address specific classes of problems. Lots of cool stuff ahead.
I don't know, man. GPT-4 gives me a correct answer to the polygon intersection question using shapely: https://chat.openai.com/share/783f5333-9f24-4aba-8cc0-70cf76610ebe
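For reference, the core of that kind of solution with shapely is only a few lines; here is my own minimal sketch with illustrative coordinates (the linked chat may differ in its details):

from shapely.geometry import Polygon

# an irregular hexagon and a rectangle that overlap
hexagon = Polygon([(0, 2), (2, 0), (5, 0), (7, 2), (5, 4), (2, 4)])
rectangle = Polygon([(4, 1), (9, 1), (9, 3), (4, 3)])

overlap = hexagon.intersection(rectangle)  # empty geometry if they don't intersect
print(overlap.wkt)   # vertices of the intersection region
print(overlap.area)  # its area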
As many of you undoubtedly know already....
While we're obsessing over consumer chatbots, the geopolitical powers are busy integrating AI into their military systems. As the following video outlines, whoever can act fastest has a huge advantage on the battlefield, so AI is being used to automate ever-higher levels of decision making. Here's the trailer on YouTube; the full video is on Netflix.
https://www.youtube.com/watch?v=YsSzNOpr9cE
Sorry to go off the topic of Gemini and the consumer AI hype machinery, but sometimes it's useful to stand back and put a topic in a wider context. As I understand it, the same competitive pressures that have forced the great powers to mass-produce nuclear weapons will also force them to hand more and more military decision making over to automated systems. He who shoots last loses, so....
The idea of tech as the lead element in change has turned marketing, branding, and PR on their heads.
Tools don't change paradigms; values and principles do.
Tech can never be a value or a principle for change, thus tech is just a tool. Use it accordingly.