Friends don’t let friends take demos seriously
Gary, I cannot thank you enough for this post. I am not an engineer but an economist and planner, and I need this kind of detail to help me see the structure of the hype machine that Silicon Valley is addicted to. The model is addictive: put half-baked versions of a technology out there to win mind share and early-adopter market share, follow with marketing hype built on half-truths and incomplete information, then close a funding round at a billion-dollar valuation. I get that. But it is dangerous when dealing with something like AI, which, as you have pointed out, carries significant risks not just to investors but to the public at large when misused. Unfortunately, there is no way to constrain the SV hype model at this time besides thoughtful assessments put into the public domain, as you and others are doing. Thank you!
I don't understand why they feel the need to overhype these things. They are pretty remarkable even with their faults, and with appropriate guardrails and training in place they can be great tools.
Piekniewski's observation about Bard's failure on a programming problem resonated with me, because I've long suspected that LLMs' success on programming problems is due largely to people giving the models textbook problems. So I decided to give it a go with a problem that came up recently in a project I've been working on.
In brief, the results were a disaster. The code it produced was riddled with syntax errors. Even once those were fixed, it was never going to work as written because it tried to use a field from an intermediate result set that was not actually selected. Once that was fixed, it gave the wrong answer.
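To illustrate the class of bug described above, here is a minimal, hypothetical sketch (not the commenter's actual project code, and in Python/SQLite rather than whatever stack the project used): a query whose outer SELECT references a column that the intermediate result set never carried through.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders (id INTEGER, region TEXT, amount REAL);
    INSERT INTO orders VALUES (1, 'east', 10.0), (2, 'west', 20.0);
""")

# Buggy query of the kind described: the inner SELECT keeps only
# `region`, but the outer query then asks for `amount`, which the
# intermediate result set no longer contains.
buggy = """
    SELECT region, SUM(amount)
    FROM (SELECT region FROM orders)
    GROUP BY region;
"""
try:
    con.execute(buggy)
except sqlite3.OperationalError as e:
    print("fails as written:", e)  # no such column: amount

# Fixed: carry `amount` through the intermediate result set.
fixed = """
    SELECT region, SUM(amount)
    FROM (SELECT region, amount FROM orders)
    GROUP BY region;
"""
print(con.execute(fixed).fetchall())  # [('east', 10.0), ('west', 20.0)]
```

At least this kind of failure surfaces loudly; the third bug the commenter mentions, a plausible-looking wrong answer, would not.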
Programming assistants are supposed to be one of the killer apps for AI, but at least for now you need a fair bit of programming skill to fix the model's mistakes. The incorrect answers are even worse. Testing new code is hard, especially when it's code meant to solve a problem whose answer you don't already know, which is pretty much always the case in scientific programming. In the problem I described, my first step was to construct a dataset where I could compute the answer by hand, but a lot of people don't put in that kind of effort, which means a lot of these bugs will go undetected. So I predict that the first big impact of the AI revolution in programming will be a whole raft-load of bugs escaping into production because the AI-generated code produced plausible-looking results and nobody was skeptical enough to dig deeper.
I've also heard they've been successful at generating complex API calls with a lot of parameters to remember, which is useful but not that impressive; IDEs have been doing that without AI help for at least 15 or 20 years.
This is one of the few cases I've encountered in the past decade where the programming was actually the tricky part. Usually the hard part is precisely specifying the problem I am trying to solve, or figuring out what data might shed light on the effect I am trying to study, or things of that sort. As far as I know, the current crop of AI marvels can't really help with any of that.
Thank you Gary for doing the semi-thankless job – along with philosophers – of continuing to ask good questions, seeing clearly, raising valid concerns, and generally being willing to say something more common sense than the crowd is often willing to hear, as they smash along in the bandwagon towards the next exciting thing, the next shining hope of our savior the LordAGI. As the wagon lurches on towards Reality, one can at least take some pleasure and peace in not being impaled on the rocks of dashed hopes and dreams! :)
Great python example. Credibility is eroded. Articles with over the top AI claims appear, seemingly unchecked, in venues such as Nature. Post truth?
Gary, you the man! Just keep calling bullshit on them brother!
Thank you so much. Your insights, commentary and willingness to share are much, much appreciated.
"Lies, big lies, statistics, benchmarks". Hidden in Google's marketing blitz is actually some interesting information, that — if you take a good look — is pretty sobering. E.g. one can estimate that GPT4 has about a 1 in 100 billion chance of producing a correct small (~1000 lines of code) program (really) and Google's — 'state of the art'-surpassing — improves to 1 in 20 million... https://ea.rna.nl/2023/12/08/state-of-the-art-gemini-gpt-and-friends-take-a-shot-at-learning/
Hi there, I believe I was able to get GPT-4.5 to achieve the result Filip was after using Chain-of-Thought prompting. It's worth noting that this was done without examples, instead allowing the system to work iteratively toward a more complex test case. I started with two intersecting rectangles, and then moved on to an irregular hexagon intersecting a rectangle. Here's my chat thread, which ends with the latest code. Would love to hear anyone's thoughts. Please let me know if you have problems accessing the thread. Thanks!
A thought (well, cheap shot of the day sort of thing, but...) crossed my mind: Maybe we've forgotten the lesson that we should have learned from Theranos: be cautious when dealing with Stanford dropouts.
In my little diagram of the AGI space [p 13, here: https://www.bigmother.ai/_files/ugd/d2335c_d884ad862fe94aa38151cb99e1fe6e74.pdf], GPS (The General Problem Solver, developed in the 1950s and 60s) is (by my reckoning) an AGI K. The GPT4 version of ChatGPT, released some 60 years later, is (by my reckoning) also an AGI K. And now it appears as though Gemini is also going to be ... (wait for it) ... an AGI K. It's starting to look as though AGI K is a difficult barrier to overcome! :-)
How much "subtle deception" can be pushed for something to be "actual deception?" A pointed question of "did you guys made this look like in real time" can be given the slippery reply of "we didn't state anywhere that it was in real time." Ugh.
A massive stochastic memorization machine, anyone ?
It is a fair statement that Gemini is in the ballpark of GPT-4. Google finally caught up (more or less).
None of this means that progress will stop. LLMs are a first-cut approach; they simply find the most likely solution.
We will likely see specialized modules being integrated into the workflow, to address specific classes of problems. Lots of cool stuff ahead.
I don't know, man. GPT-4 gives me a correct answer to the polygon intersection question using shapely: https://chat.openai.com/share/783f5333-9f24-4aba-8cc0-70cf76610ebe
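For anyone curious what the shapely route looks like, here is a minimal sketch; the coordinates are made up for illustration and are not the ones from the linked chat.

```python
from shapely.geometry import Polygon

# Two overlapping rectangles (illustrative coordinates).
a = Polygon([(0, 0), (4, 0), (4, 3), (0, 3)])
b = Polygon([(2, 1), (6, 1), (6, 5), (2, 5)])

inter = a.intersection(b)  # the overlapping region, as a new Polygon
print(inter.wkt)           # a polygon covering [2, 4] x [1, 3]
print(inter.area)          # 4.0
```

Of course, getting a correct answer via a battle-tested geometry library says more about shapely than about the model's ability to write the clipping algorithm itself.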
As many of you undoubtedly know already....
While we're obsessing over consumer chatbots, the geopolitical powers are busy integrating AI into their military systems. As the following video outlines, whoever can act the fastest has a huge advantage on the battlefield, so AI is being used to automate ever higher levels of decision making. Here's the trailer on YouTube; the full video is on Netflix.
Sorry to be off the topic of Gemini and the AI consumer hype machinery, but sometimes it's useful to stand back and put a topic in a wider context. As I understand it, the same competitive pressures that forced the great powers to mass-produce nuclear weapons will also force them to hand more and more military decision making over to automated systems. He who shoots last loses, so....