42 Comments

Gary, I cannot thank you enough for this post. As I am not an engineer but an economist and planner, I need this kind of detail to help me see the structure of the hype machine that Silicon Valley is addicted to. The model of putting half-baked versions of a technology out there to gain mind share and early-adopter market share, followed by marketing hype that trades in half-truths and incomplete information, followed by a funding round that lands a billion-dollar valuation, is addictive. I get that. But it is dangerous when dealing with something like AI, which, as you have pointed out, poses significant risks not just to investors but to the public at large when misused. Unfortunately, there is no way to constrain the SV hype model at this time other than thoughtful assessments put into the public domain, as you and others are doing. Thank you!

And the misuse is real and serious. The big companies, which have a lot of money to spend and many engineers, can keep training such models, which then get heavily misused (fake news, misinformation, disinformation, toxic content, deception, ...). We academic researchers then spend our time trying to make this less wrong and less dangerous, and yet we are far from being able to tell what is simply made up by these models (hallucination), what is written by an AI system versus a human (AI detectability, the counter-Turing test; our paper on this just won an EMNLP Outstanding Paper award), or what is fake and what is real: https://www.linkedin.com/feed/update/urn:li:activity:7117565699258011648

I don't understand why they feel the need to over-hype these things. They are pretty remarkable even with their faults, and they can be great tools with the appropriate guardrails and training in place.

Marketing departments have to do something to justify their existence.

The officers of any profit-motivated company have a (legal) fiduciary duty to maximise shareholder value. A little bit of marketing puff is considered to be par for the course.

People say this all the time, but it's not even remotely true! Company officers have a legal responsibility to exercise "due diligence"[1] in furthering the shareholders' interests. That duty of care recognizes that shareholders may have interests other than the monetary value of the company, especially its short-term value. Furthermore, their duty is limited to reasonable efforts to promote the shareholders' interests. In particular, officers are not required to act immorally or unethically just because doing so might add value to the company.

In other words, stop giving company executives a pass on bad behavior because they "have a legal duty to maximize shareholder value," no matter what. They don't.

[1] In finance circles the phrase "due diligence" has come to mean "research", presumably because someone exercising due diligence will do a lot of research; however, the concept actually encompasses a lot more than that.

Because they are incredibly expensive to build, and the very existence of the companies building them (or of the divisions within bigger companies like Google) is being upheld by expectations of future LLM/AGI ability. If these products perform less reliably than a human who can be hired at a fraction of the cost, then the valuations of the companies building them will tank and they will no longer have the money to continue designing new systems.

Everyone expects them to do better in the future, so Microsoft, Google, Meta, etc. are willing to keep pumping money into more and more expensive systems. If it turned out that there were hard limits or that progress had slowed for the foreseeable future, then these companies may be looking at hard times or closure.

Dec 8, 2023 · Liked by Gary Marcus

Piekniewski's observation about Bard's failure on a programming problem resonated with me because I've long suspected that LLMs' success on programming problems is due largely to people giving the models textbook problems [1]. So, I decided to give it a go with a problem that came up recently in a project I've been working on [2].

In brief, the results were a disaster. The code it produced was riddled with syntax errors. Even once those were fixed, it was never going to work as written because it tried to use a field from an intermediate result set that was not actually selected. Once that was fixed, it gave the wrong answer.

Programming assistants are supposed to be one of the killer apps for AI, but at least for now you need a fair bit of programming skill to fix the model's mistakes. The incorrect answers are even worse. Testing new code is hard, especially when it's code meant to solve a problem whose answer you don't already know, which is pretty much always the case in scientific programming. In the problem I described, my first step was to construct a dataset where I could compute the answer by hand, but a lot of people don't put in that kind of effort, which means that a lot of these bugs will go undetected. Therefore, I predict that the first big impact of the AI revolution in programming will be a whole raft-load of bugs escaping into production because the AI-generated code produced plausible-looking results and nobody was skeptical enough to dig deeper.
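For readers who haven't hit this failure mode, here is a hypothetical Python/SQLite sketch of the kind of bug described above (my own illustration, not the actual project code): the generated query refers to a column that the intermediate result set never selected.

```python
# Hypothetical illustration of the failure mode described above (not the
# commenter's actual project code): the generated query refers to a column
# that the intermediate result set never selected.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE readings (site TEXT, day TEXT, value REAL);
    INSERT INTO readings VALUES
        ('A', '2023-12-01', 1.5),
        ('A', '2023-12-02', 2.5),
        ('B', '2023-12-01', 3.0);
""")

# The inner query keeps only `site`, but the outer query then asks for
# MAX(day), a column that no longer exists in the intermediate result.
llm_style_query = """
    SELECT site, MAX(day)
    FROM (SELECT site FROM readings GROUP BY site)
    GROUP BY site;
"""

try:
    conn.execute(llm_style_query)
except sqlite3.OperationalError as err:
    print("query rejected:", err)  # e.g. "no such column: day"
```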

[1] They've also been successful, I've heard, with complex API calls with a lot of parameters to remember, which is useful, but not that impressive. IDEs have been doing that without AI help for at least 15 or 20 years.

[2] This is one of the few cases I've encountered in the past decade where the programming was actually the tricky part. Usually the hard part is precisely specifying the problem I am trying to solve, or figuring out what data might shed light on the effect I am trying to study, or things of that sort. As far as I know, the current crop of AI marvels can't really help with any of that.

author

this sounds exactly correct to me

Dec 7, 2023 · edited Dec 8, 2023 · Liked by Gary Marcus

Thank you Gary for doing the semi-thankless job – along with philosophers – of continuing to ask good questions, seeing clearly, raising valid concerns, and generally being willing to say something more common sense than the crowd is often willing to hear, as they smash along in the bandwagon towards the next exciting thing, the next shining hope of our savior the LordAGI. As the wagon lurches on towards Reality, one can at least take some pleasure and peace in not being impaled on the rocks of dashed hopes and dreams! :)

Great Python example. Credibility is eroded. Articles with over-the-top AI claims appear, seemingly unchecked, in venues such as Nature. Post-truth?

Dec 7, 2023 · Liked by Gary Marcus

Gary, you the man! Just keep calling bullshit on them brother!

Thank you so much. Your insights, commentary and willingness to share are much, much appreciated.

Dec 9, 2023 · Liked by Gary Marcus

"Lies, big lies, statistics, benchmarks". Hidden in Google's marketing blitz is actually some interesting information, that — if you take a good look — is pretty sobering. E.g. one can estimate that GPT4 has about a 1 in 100 billion chance of producing a correct small (~1000 lines of code) program (really) and Google's — 'state of the art'-surpassing — improves to 1 in 20 million... https://ea.rna.nl/2023/12/08/state-of-the-art-gemini-gpt-and-friends-take-a-shot-at-learning/

Hi there, I believe I was able to get GPT-4.5 to achieve the result Filip was after using Chain-of-Thought prompting. It's worth noting that this was done without examples, but allowing the system to work iteratively towards a more complex test case. I started with two intersecting rectangles, and then moved on to an irregular hexagon intersecting a rectangle. Here's my chat thread, which ends with the latest code. Would love to hear anyone's thoughts. Please let me know if you have problems accessing the thread. Thanks!

https://chat.openai.com/share/e11c5f9c-e3ed-4007-8b7c-9f372509da6c

GPT-4.5 didn't "achieve the result". Some of the cognition required to solve the problem was going on inside your head, and some was going on inside GPT-4.5.

So true. Much of the contemporary AI biz runs on the Clever Hans effect.

Thanks for your reply! Can you explain that statement in more detail? I'm curious where you see me as contributing cognitively to the task.

Dec 8, 2023 · edited Dec 8, 2023

Think of it this way. Imagine the *minimum* amount of information that is strictly needed in order to formally specify the problem in question (e.g. a pre- and post-condition for the desired program). If you provide a problem description containing exactly this amount of information (and no more), and the machine (e.g. LLM-based chatbot) generates a correct solution, then the machine has solved the problem "all by itself". But if you provide *more* information than is strictly necessary (and if the machine is unable to correctly solve the problem without this additional information) then you will have performed some of the problem-solving reasoning yourself.
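A toy illustration of that distinction (my own example, not from the comment above): the minimal specification of a sorting task is just its pre- and post-condition; anything beyond that, such as spelling out the algorithm in the prompt, is reasoning supplied by the human.

```python
# Toy illustration (my own example): the *minimal* specification of a
# sorting task is a pre-condition and a post-condition. A prompt that also
# says "repeatedly swap adjacent out-of-order elements" has already handed
# part of the solution (the algorithm) to the machine.
from typing import List

def meets_spec(inp: List[int], out: List[int]) -> bool:
    """Post-condition: `out` is `inp` rearranged into non-decreasing order."""
    return sorted(inp) == out

# Any candidate program, human-written or LLM-generated, can be judged
# against the specification alone:
candidate = lambda xs: sorted(xs)
print(meets_spec([3, 1, 2], candidate([3, 1, 2])))  # True
```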

Andrej Karpathy likens prompt engineering (such as tree-of-thoughts, chain-of-thoughts, etc) to Kahneman's System 2 (going on inside your head) vs System 1 (going on inside the LLM). See this video [https://www.youtube.com/watch?v=bZQun8Y4L2A] at 27:28.

My own (unfinished draft) AGI paper attempts to define when a problem-solver is "maximally-intrinsically intelligent", i.e. when it's doing all its own thinking. See section 1.2.22 of [https://www.bigmother.ai/_files/ugd/d2335c_d884ad862fe94aa38151cb99e1fe6e74.pdf] for details.

Hope this helps!

Thanks! I'll check out those links and think on this!

No special prompting needed, if you count using a geometry package: https://chat.openai.com/share/783f5333-9f24-4aba-8cc0-70cf76610ebe

A thought (well, cheap shot of the day sort of thing, but...) crossed my mind: Maybe we've forgotten the lesson that we should have learned from Theranos: be cautious when dealing with Stanford dropouts.

Dec 7, 2023 · edited Dec 7, 2023

In my little diagram of the AGI space [p 13, here: https://www.bigmother.ai/_files/ugd/d2335c_d884ad862fe94aa38151cb99e1fe6e74.pdf], GPS (The General Problem Solver, developed in the 1950s and 60s) is (by my reckoning) an AGI K. The GPT4 version of ChatGPT, released some 60 years later, is (by my reckoning) also an AGI K. And now it appears as though Gemini is also going to be ... (wait for it) ... an AGI K. It's starting to look as though AGI K is a difficult barrier to overcome! :-)

There is a wide, wide gulf between academic points and systems in production.

How much "subtle deception" can be pushed for something to be "actual deception?" A pointed question of "did you guys made this look like in real time" can be given the slippery reply of "we didn't state anywhere that it was in real time." Ugh.

A massive stochastic memorization machine, anyone?

:)

It is a fair statement that Gemini is in the ballpark of GPT-4. Google finally caught up (more or less).

None of this means that progress will stop. LLMs are a first-cut approach; they simply find the most likely solution.

We will likely see specialized modules being integrated into the workflow, to address specific classes of problems. Lots of cool stuff ahead.

I don't know, man. GPT-4 gives me a correct answer to the polygon intersection question using shapely: https://chat.openai.com/share/783f5333-9f24-4aba-8cc0-70cf76610ebe
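For anyone who can't open the chat link, here is a rough sketch of what a shapely-based answer to the polygon intersection question might look like (illustrative coordinates only, not necessarily what GPT-4 produced in that thread).

```python
# Rough sketch of a geometry-package answer to the polygon intersection
# question (illustrative coordinates only, not necessarily what GPT-4
# produced in the shared chat). Requires: pip install shapely
from shapely.geometry import Polygon

a = Polygon([(0, 0), (4, 0), (4, 4), (0, 4)])  # first polygon
b = Polygon([(2, 2), (6, 2), (6, 6), (2, 6)])  # overlapping second polygon

inter = a.intersection(b)            # shapely does the clipping
print(list(inter.exterior.coords))   # vertices of the intersection region
print(inter.area)                    # 4.0 for these two squares
```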

As many of you undoubtedly know already....

While we're obsessing over consumer chatbots, the geopolitical powers are busy integrating AI into their military systems. As the following video outlines, whoever can act the fastest has a huge advantage on the battlefield, so AI is being used to automate ever higher levels of decision making. Here's the trailer on YouTube; the full video is on Netflix.

https://www.youtube.com/watch?v=YsSzNOpr9cE

Sorry to be off topic of Gemini and the AI consumer hype machinery, but sometimes it's useful to stand back and put a topic in a wider context. As I understand it, the same competitive pressures that have forced the great powers to mass produce nuclear weapons will also force them to hand more and more military decision making over to automated systems. He who shoots last loses, so....
