Honored to be a part of this, along with Yuval Noah Harari, Melanie Mitchell, Helen Toner, Carl Benedikt Frey, Ajeya Cotra, and the co-founders of Perplexity and Cohere:
I find it interesting to see non-software people comment on "programming". Of course, the ambiguity between programming proper and using it as a synonym for software development is part of the problem. Dave Farley and others estimate that "programming proper" is only 10-20% of a software dev's work. Amdahl's Law tells us the rest.
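To make the Amdahl's Law point concrete, here is a rough back-of-the-envelope sketch, treating the 10-20% figure as the only fraction of the job that AI speeds up:

```python
# Back-of-the-envelope Amdahl's Law sketch: if "programming proper" is only a
# fraction p of a developer's work, speeding up just that part by a factor s
# caps the overall gain at 1 / ((1 - p) + p / s).

def overall_speedup(p: float, s: float) -> float:
    """Overall speedup when a fraction p of the work is accelerated by factor s."""
    return 1.0 / ((1.0 - p) + p / s)

for p in (0.10, 0.20):  # the 10-20% estimate quoted above
    ceiling = 1.0 / (1.0 - p)  # limit as s -> infinity
    print(f"p = {p:.0%}: 2x faster coding -> {overall_speedup(p, 2):.2f}x overall, "
          f"infinitely fast coding -> {ceiling:.2f}x overall")
```

Under that estimate, even an infinitely fast coding assistant tops out at roughly 1.1x-1.25x overall.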
In my work as a software engineer, AI wastes me an hour or two per day. Junior engineers excel at slopping out vibe-coded PRs filled with bugs, inconsistencies and general nonsense that fall to me to review. As for my own use, the same deficiencies (“hallucinations”) prevent any net-productive work using AI. Perhaps the only use is as code autocomplete, but it’s not much different from (non-AI) intellisense.
You need to get better at teaching your people, and better at using the tools too.
Letting AI have its way with your code is a bad idea. You need to know what you want, give it specific requests, review everything, and have regression tests for anything you change. Then it is a wonder to behold.
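As a sketch of the "regression tests for anything you change" step (the module and function names below are hypothetical, just to illustrate pinning down behavior before letting the AI touch the code):

```python
# Minimal regression-test sketch: capture the current behavior of the code you are
# about to let an AI assistant modify, so any behavioral drift fails loudly in CI.
# "billing" and "compute_invoice_total" are hypothetical names for illustration.
import pytest

from billing import compute_invoice_total

@pytest.mark.parametrize(
    "line_items, expected_total",
    [
        ([], 0.0),                                            # empty invoice
        ([("widget", 2, 9.99)], 19.98),                       # quantity * unit price
        ([("widget", 1, 9.99), ("gadget", 3, 1.00)], 12.99),  # mixed items
    ],
)
def test_invoice_total_regression(line_items, expected_total):
    # Expected values were recorded from the pre-change implementation.
    assert compute_invoice_total(line_items) == pytest.approx(expected_total)
```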
If I know what I want, I’ll write it myself. I don’t need to be asking a chatbot to write it for me so I can review it because, as experience teaches you, reviewing code is always harder than writing code.
I find that the people who are the most impressed by AI-assisted coding are those with the least experience. Their reviews are cursory at best, and they don’t know what they’re reading. At my company, the people most reliant on the AI coding subscription are junior engineers and product managers.
It depends. I would never assign numerical code to AI; that is very subtle. But it can do refactoring, boilerplate, passing data to functions, and edits across multiple files. Those are a pain to do by hand, but easy to inspect.
This is a better worded way of saying "skill issues bro".
I think it really depends on what kind of stack you are working on, and also the domain. I do have to say that when I use it locally for reviews, it catches silly things I miss when I'm distracted. It's also cool for enforcing certain coding styles, like a linter but with more granularity.
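A sketch of what "a linter but with more granularity" can mean in practice: a project-specific rule that off-the-shelf linters don't ship. The rule here is hypothetical, and in practice you might just describe it to the model in a review prompt, but a tiny AST check makes the idea concrete:

```python
# Hypothetical project-specific style rule: functions named like predicates
# ("is_*", "has_*") must declare an explicit return annotation. Generic linters
# don't ship this rule; it's the kind of fine-grained check meant above.
import ast

SOURCE = '''
def is_ready(order):          # should be flagged: no return annotation
    return order.ready

def has_items(order) -> bool: # fine
    return bool(order.items)
'''

tree = ast.parse(SOURCE)
for node in ast.walk(tree):
    if isinstance(node, ast.FunctionDef) and node.name.startswith(("is_", "has_")):
        if node.returns is None:
            print(f"line {node.lineno}: '{node.name}' should declare an explicit return type")
```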
The basic problem of LLMs with coding is that they produce bugs constantly, all the time. They're also incapable of modeling the data flow and produce grotesquely verbose and pseudo-defensive yet still highly brittle code.
And they are especially worthless for any new unsolved problem (which is the actually interesting part—if a problem already has a high-quality open-source implementation, what's the point of producing a worse variant of it?).
Their impact will be large regardless since “negative impact” is included here.
The brazenness of something like OpenClaw (formerly Moltbot, formerly Clawdbot) and other agents (though less awful) erases and discards decades of progress in cybersecurity. The psychological effect of this alone is massive. It normalizes developers pushing insecure slop.
In my work as a software engineer, AI saves me an hour or two per day. It is especially good at mind-numbing tracking of parameters, housekeeping, finding obscure bugs, and keeping docs in sync. Of course engineers are not going anywhere, but this is the biggest improvement ever.
METR’s study over the summer suggests coders often overestimate the value they are getting.
Note the latest METR study, from January 29. https://metr.org/blog/2026-1-29-time-horizon-1-1/
First, a definition. The Model Horizon estimate is a metric developed by METR to quantify how long an AI system can work autonomously before its reliability fails.
They have a table there, "Changes to Model Horizon Estimates", which shows the models have made notable progress on this, particularly Claude Opus 4.5, whose time horizon went up by 11%.
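For readers new to the metric, here is a rough illustration of how a 50%-reliability time horizon can be estimated from per-task outcomes (made-up data, and not METR's exact pipeline):

```python
# Illustrative sketch (made-up data, not METR's exact pipeline): fit success
# probability against log task length and read off the 50%-success "time horizon".
import numpy as np
from sklearn.linear_model import LogisticRegression

task_minutes = np.array([2, 5, 10, 15, 30, 60, 120, 240, 480])  # human time per task
succeeded = np.array([1, 1, 1, 1, 1, 0, 1, 0, 0])               # agent outcome per task

X = np.log(task_minutes).reshape(-1, 1)
model = LogisticRegression().fit(X, succeeded)

# P(success) = 0.5 where the linear term crosses zero: log_t50 = -intercept / coef.
log_t50 = -model.intercept_[0] / model.coef_[0][0]
print(f"Estimated 50%-success horizon: ~{np.exp(log_t50):.0f} human-minutes")
```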
I know you cite this study quite a lot. It is important to note that it measures how AI does at hard problems. There it is improving. It does a lot better at easy but laborious problems, and programmers have to deal with a lot of those.
So, in short, if one throws just anything at AI, it will fail and slow you down. If one learns what AI can do, one can be more productive.
Not the study I was referring to. I am talking about the one on coding productivity.
There are in fact 3 studies:
- https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/ The one you point to, from July
- https://metr.org/blog/2025-08-12-research-update-towards-reconciling-slowdown-with-time-horizons/ - follow up study, from August
and the "latest" one I mention above, by same org.
What is needed here is a great deal of nuance.
The first study says: "we find that when developers use AI tools, they take 19% longer than without—AI makes them slower. We view this result as a snapshot of early-2025 AI capabilities in one relevant setting; as these systems continue to rapidly evolve, we plan on continuing to use this methodology to help estimate AI acceleration from AI R&D automation"
The second study says: "Claude 3.7 Sonnet in the Inspect ReAct agent scaffold has an average success rate of 38% (±19%, 95% CI) on these issues, as measured by the maintainer-written test cases, indicating that this agent often implements the core functional components of the issues correctly[5]. However, when manually reviewing a subset of these PRs[6], none of them are mergeable as-is. "
What these studies say is that AI coding is not going to result in a productivity explosion. If the tools are used naively, they can result in a lot of wasted time. The first study also points out that the tools are expected to improve.
My own experience agrees with this. One simply can't give AI a job and then check in the code; that is a recipe for disaster. One has to be aware of what these tools can do.
Claude 4.5, released towards the end of 2025, can do great work where GeminiCLI gets stuck and doesn't understand. Even so, I am careful about what I give it.
So, for now one simply has to be careful, and know the limitations. Or else.
I can see that some folks overestimate the value. I can only talk about my own experience. It is both easier and less error-prone to have AI do some code tasks. When I do subtle work, AI doesn't help much.
Wait until the tech debt hits ya. Claude Code especially will sneak up on you. LLMs save me a ton of time now that I don’t use them for writing app code, just for looking things up.
I know. Ruthless regression testing and very careful reading are required. I use Claude Code to actually fix old tech debt. You know, the kind left behind when the overworked developer does not have time to do neat work. Now Claude does it for me. With great supervision.
Agreed. Easier said than done, but if you have a super robust test suite on a mature app, then it can handle repetitive refactoring tasks (after a human who understands the system identifies those tasks). Most vibe coders I encounter are nowhere near this sophisticated, though.
A chief problem IMO is that the industry long ago decided that "programming proper" was the only thing that could or SHOULD ever be attributed to any individual human. That other 80% gets waved away as story points, Jira tickets, Agile retrospectives, code review (the definition of a thankless task) and all the rest of the "process" that allegedly ensures that teams can scale up and turn over their entire dev team if ever needed.
"We do more than just type the code!" just does not score.
“Unemployment in the United States will have increased significantly as a result of A.I.”
I wonder if you picked "true" not because of AI itself but because of the AI "bubble"/"capital misallocation" that will certainly bring about a huge correction in the economy.
And also because employers will use it as a smokescreen. So all of these things: misallocation, smokescreen, and some genuine replacement.
Then layoffs are not caused by AI. I have not yet seen evidence of productivity gains.
They lay people off in anticipation of future productivity gains, much like AI stonk speculators invest in anticipation of future profits. Buy a Claude enterprise subscription and lay off 50% of staff now; expect the other 50% to figure out how to get the chatbot to make up the gap.
The suits heard that the wageys should be 100x more productive with LLMs pouring rocket fuel on their business, so the bigger the bet on AI the bigger the payout.
How, exactly, that's supposed to happen is below their paygrade, but some guy on LinkedIn said it so it must be true. If it's not happening blame luddites and subversives, establish usage requirement metrics, send more kulaks to the unemployment gulag.
That ain't how commerce works. LLMs are not suitable for enterprise use. Right now the only path to wage savings is to fire everyone who uses this garbage at work.
Was also surprised to see Gary sit on that side of the fence. Is it a result of AI being able to replace roles, or making the people in them so efficient that fewer are needed overall, or is it related to the bubble popping or other macroeconomic factors?
Retraction Watch has an amusing story today about an academic paper deliberately written using ChatGPT to put forth an absurd notion about pregnant women and cravings for prime numbers.
It turns out that AI peer review also lacks some reliability.
https://retractionwatch.com/2026/01/30/guest-post-ai-chatgpt-generated-paper-pregnancy-math/
This is a hoot! ROFL!!
Gary, I have a lot of respect for you, but you lost me at Yuval Harari. He is a creepy dude who espouses transhumanist ideas, such as that human existence doesn't have any meaning and that our minds and bodies are data that can be harvested, hacked, and fused to a machine to "enhance" them. He always phrases his statements as if he were asking non-partisan, deep questions, but behind them there is a profoundly anti-human philosophy that can very well serve as a foundation for a totalitarian society where anything goes, because we humans don't have an intrinsic value beyond the data ✌️we are ✌️.
Interestingly, most of the thoughts about the future presented in the article assume powerful 'intellectual' results. They mostly read like the equivalent predictions about the internet from 1995-2000. Only two people (Frey, Mitchell) mention 'cheap' as an outcome, and only once each (Frey even comparing it to the beginning of the industrial revolution). None of them mention AI slop, AI misinformation, and the potential results of these. Overall, a bit heavy on the hype side.
Remember that we accepted some clothes-slop when textile production moved to automation. We got cheaper, average-quality clothes at scale. You banish the lows, but you never get the highs either. In the current batch of LLM-type AI, we get the most popular text constructions, not the best. If you are crap at writing, AI will elevate you. If you have a talent for writing, AI will gut your creativity. That's the trade-off.
Indeed. So, if you look at the predictions, 'cheap mental work' is what we can expect. It took centuries before the precision of physical automation got better than manual precision. But we did get 'cheap volume', and that definitely has economic value. The NYT predictions, though, seem to be mostly about quality stuff. Besides, people like Harari making statements about coding is irksome. He already made grand statements about AI in one of his books, not in any way held back by actual knowledge.
Agreed. There's more magical thinking than anything of substance. My own feeling is that we're at a dead end in this maze. The AGI goal looks tantalizingly close, but we can't break through the wall. The fact that we don't seem to have learnt much from this, nor have we backed off, choosing instead to keep throwing dollars at fiddling around the edges of transformers, is a bit annoying.
Slop and such will get handled, just as we handle spam, phishing, and hacking. There is no complete solution, but one can get by.
The skeptics fall into the trap of not seeing just how much of a big deal all this is. Anything that has a pattern, a strategy, and a means of verification will get mostly automated (with a human in the loop).
One can think it is potentially a very big deal while being skeptical about exactly what the big deal will be (e.g. AGI versus 'cheap mental automation').
AGI will be a process, even in the very best case.
At every stage, when we make advances, it becomes clear what the next challenges are. Here's what we need now: continuous learning (or at least something faster than an update every six months), hookup to a lot of existing infrastructure, and the ability to call world models for various domain-specific work.
When these get solved (and they look very feasible in the current framework), we'll go to the next stage.
A huge amount of what people do can be broken down into steps for cheap automation. If you model a problem at the right level of detail, you can solve it.
I hear an echo of the 1980s-1990s AI period.
And "when these get solved" should often read "if these get solved".
Let us talk in a year. I deliberately chose things that are technically very feasible. The harder steps will be physical intelligence, robust independent verification, and bringing down the costs. Those will take a few years.
As before, I am not aiming for AGI yet. Back in the 1980s people were very confident, and in the 1960s even more so. This time the progress is a lot bigger, and my expectations a lot lower (when it comes to the time it will take).
What are those (substantial, one assumes) things that are ‘technically very feasible’ that you think are reasonable to expect in a year?
“(with human in loop)”
My concern is that some critical systems won’t have a human in the loop even though they need it. Especially autonomous military systems making battlefield decisions at a pace that makes human oversight difficult.
If European managers and policy makers would listen to me, I would advise them not to chase after AI but instead to invest more strongly in mechanical engineering.
Europeans have built the EUV machines, through companies like ASML, Zeiss, and Trumpf.
Without those machines there would be no AI.
Europeans have also built the LHC, the biggest machine on Earth.
Compared with those machines, the whole of AI is simple toy stuff.
It's soooo painful to watch them go down this path. They see that we're insane in politics, but think that doesn't translate to our markets.
So let me get this straight: 8 people who spend 0% of their day coding think that AI is going to be a game changer for coding?
Dear 8 non-coders, I’d like to ask you this question: since (let’s face it) you are far more likely to be spending your time writing (annoying, badly researched) articles than coding - why don’t you consider LLMs a game-changer for article writing?
Some people *do* claim LLMs are a game-changer for article writing - but I bet you (still talking to the same 8 people) don’t! Indeed, if LLMs really were a game-changer for article writing, why are *none* of the reputable online editorials writing their articles using LLMs? (Some tried at the beginning, but then rapidly rolled that idea back. Now only low quality slop sites are doing so).
And if I asked you *why* you weren’t writing your articles using an LLM, I bet I could predict the answer: “I wouldn’t be in control if I did that. It needs to be carefully crafted, word for word, exactly how I want it to be.”
And you think somehow that *doesn’t* apply to coding?
Let’s face it, you’re only claiming coding will benefit from LLM slop because you personally think reading and writing code is difficult. So coding with an LLM must be a great thing, because it makes coding *so much easier* (for you) right?
You know LLMs also make writing Russian a lot “easier” for someone who can’t read or write in Russian. Therefore I guess you guys must also think LLMs are going to be a game-changer for Russian writers? “It’ll make writing articles in Russian seem so much easier for those poor Russian authors having to deal with all those funny little characters which I personally don’t understand”. Newsflash: Russians don’t think writing in Russian is difficult - and just like you they also want complete control over their prose!
Once you understand code, you also read and write it fluently. And just like article writers, we want to be in control and to craft our creations word for word. In fact, that’s *exactly what computer languages were designed for*: to give us just the right granularity of control over our creations that we need.
Please stop perpetuating the “LLMs are great for coding” myth. Just like everyone else, we don’t need the slop.
"AI is good at coding" is massive confirmation bias. There has yet to be an "AI can, in some cases, improve developer productivity" story that hasn't been twisted into hype about how that means human devs are goners.
Consider the "proof of concept" gambit: "AI can help you quickly try 10 ideas and see which ones just aren't good!" OK. So we're OK with throwing out 90% of the AI-generated code. How about that last 10%? Well, it was good! So it stays. Right? That's what the overall prevailing attitude is. If it doesn't immediately fail a sniff test, then it's "good code." Even if you just threw out 10x as much "not good code."
There's no pro-AI scenario that doesn't get twisted by the overriding assumption that human programmers are already headed out the door.
AI AND EDUCATION: This clip is copied from NPR's Up First online publication, Feb. 1, 2026:
"A path back to reality/by Shannon Bond, NPR tech correspondent
"Like a lot of people, I first started hearing stories about weird encounters with AI chatbots last summer. Media reports were trickling out about people who said they’d gone down delusional rabbit holes after prolonged conversations with the bots. Some became convinced they were prophets. Marriages were imploding. In the worst cases, people — including teenagers — took their own lives after confiding in chatbots.
"These stories were devastating. But there didn’t seem to be any real sense yet of how or why this was happening, how widespread the problem was or how to help people whose very sense of reality had been fractured after chatting with AI.
"An illustration of a woman reaching out toward an artificial person. She is standing within a 3D grid with floating chat bubbles and other people confined in transparent boxes.
Tracy J. Lee for NPR
"So I was intrigued when I heard about the Human Line, a peer support group for people who’d suffered what they were calling “AI spirals” and for the friends and family of spiralers. After getting in touch with one of the group’s cofounders and moderators, I was invited into the Discord server where they share their stories. I began talking to people on the bleeding edge of a phenomenon that mental health professionals and the companies behind popular AI chatbots are only beginning to grapple with.
"In the past few months, I’ve spoken with around a dozen members of this community. They shared deeply emotional personal stories and, in some cases, transcripts of chatbot conversations. They come from diverse places and backgrounds and range widely in age and occupation. But in the group, they’ve found they have plenty in common. These similarities include the particular vocabulary associated with AI chatbot spirals (including frequent use of the word “spiral” itself) and practical concerns about navigating relationships with loved ones consumed by their chatbot use.
"I still have more questions than answers about what these cases tell us about the interaction of mental health and well-being with a new technology that even its creators struggle to understand. But I’ve come away from my reporting with a few conclusions.
"First, this isn’t just something that happens to people already facing mental health problems. Several people I spoke with told me they’d never had issues like this before. It’s impossible to quantify how widespread a problem AI-associated delusions are. But the sheer breadth of experiences reported by those in the Human Line group suggests this may be affecting more people than we know.
"Second, a word that came up over and over again in my conversations with Human Line members was “friction.” People who used chatbots excessively told me that friction just doesn’t exist when you’re talking to ChatGPT or Claude. Chatbots agree, flatter and keep the conversation going — which, in some cases, appears to be part of what sends these conversations off the rails. Members often find their Discord conversations helpful because of the return to the friction that human conversation provides: pushback, disagreement and delayed responses.
"Finally, while the Human Line community is clear that it’s not a replacement for professional therapy, it’s filling a gap that, to me, speaks to a larger problem we’re facing these days. Simply put, the people I spoke with are desperate for human connection — at a time when our lives are becoming ever more mediated by technology, and we’re still dealing with the isolation many experienced during the COVID-19 pandemic. That same hunger for connection may be what’s leading people into emotionally dependent relationships with AI chatbots. But again and again, Human Line members told me it’s also the path back to reality." END CLIP
Was the carbon impact of any form of AI mentioned?
Economics will sort it out. Soon enough vendors will have to charge real prices or go under.
Forget carbon impact—what I find amazing about most AI discourse is that the idea it should be a net positive for the average human being is basically nowhere to be found. We can plaster the Earth with data services and suck up all the electricity to run them—just so that we can lay off every office worker!
Awesome. Wow. Good job, guys.
Gary, I love your work on AI and am a big proponent. In light of the speedy policy moves re crypto, would you repost my take on the issue, which has no exposure to policy makers and the industry and is sequestered by Sillycone Valley McCarthyists?
https://open.substack.com/pub/stabatabaei/p/the-constitutional-case-against-the-fb4?utm_source=direct&r=m7iqf&utm_campaign=post-expanded-share&utm_medium=post%20viewer
None of these respondents are neuroscientists, and most of them are boosters.
The world is getting more chaotic, and as a causal system, AI is ill suited to all aspects of this. Our political ecology runs on misinformation, disinformation and news which is an illusion of causality. Post-literacy is here, language is in collapse.
All are assuming a stable continuum of event-structure in which efficiency is optimized.
Unfortunately that's not how you solve chaos, with agentic software made from the arbitrary.
Editor selection of whose comments to which question = a spin cycle that unbalances the entire washer.
What is clear by now is that the skeptics have been wrong about how far the paradigm would go, and there is more to come.
Current AI is not a fad. Both use and reliability are going up. There is too much excitement, but that's always the case.
Of course, statistics are not enough. Nor is "neurosymbolic" any better. What we need are robust systems that can struggle but then succeed, guided by world models and a search for strategies.
There is also nice work on hierarchical memory management, which will make AI rely on carefully curated and frequently updated memory at various levels of detail, rather than counting just on retraining and current context.
Thank you for the guest link. I don’t have a scientific background, but as technology stocks make up such a huge weighting in the index funds in a pension, I have been reading your critique of the AI bubble for the last year. It has been incredibly interesting to see well-known figures start to align with your viewpoint.
Given the intensity of the discussions over the past decade+ about Slaughterbots, AI-enabled lethal autonomous weapons, and rogue militaries (right or wrong), I find it interesting that AI's national security implications were almost entirely left out of this Q&A.
One (partial) solution to the education problem is to use lockdown browsers. If you can only access a Word doc, and you have to show that no other technology is around you (i.e., you don’t have a second laptop open), you can address a lot of the "LLM does the work instead of you" issues.
This isn’t perfect. It is only a partial solution. But it can help.
Agreed.
I think another potential way is to just move away from write-ups. It’s clear that, in the age of LLMs, they're much less valuable.
Instead, students should articulate and defend their perspectives in front of a large group of their peers, with assessment by other educators: the classic presentation. There, they can show their reasoning and true understanding of the subject matter.
That misses another purpose of writing assignments: they help teach how to organize thoughts and arguments. Putting them down on external media, whether screen or paper, allows a cooler, more objective look at your own work. There are probably students who can do this in a verbal interchange on the fly, but I’d bet they are a small minority.
Written exams perhaps?
I don’t think the problem is assessment; I think it’s praxis. You have to learn how to examine your work and assess it yourself. This has to come before exams, and there’s no substitute for doing, as opposed to reading about it. Maybe something organized like a writers’ workshop.
I agree