86 Comments

hugh

This aligns with my experience. I'm a Sr. Engineer and we use Copilot, Cursor, ChatGPT, etc. at my company.

Personally, I haven’t seen a meaningful uptick in feature velocity since we adopted GenAI coding assistants, but I am seeing more code volume from Jr. devs with bizarre bugs. My time digging through PRs has ticked up for sure.

In my dev work I'll find myself turning off Copilot half the time, because its hallucinated suggestions get pretty distracting.

Eric Jeker

It's quite good at explaining code, writing tests, refactoring, writing comments and documentation: basically anything that is based on already-written code. The context is lost most of the time, and for debugging I have a very low success rate; usually it just repeats nonsense. But you can paste in an error message to get a good explanation, which certainly helps.
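A minimal sketch of that workflow, assuming the official OpenAI Python client (the model name and error string here are placeholders):

    # Paste an error message and ask for a plain-language explanation.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    error_text = "TypeError: 'NoneType' object is not subscriptable"
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": f"Explain this error and its likely causes:\n{error_text}",
        }],
    )
    print(response.choices[0].message.content)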

Jim Amos

A software developer learns how to code. An LLM doesn't even know what code is. Throwing together a probabilistic sequence of vectors that appear many times in GitHub repos will only get you so far.

Ragland

Imagine if all this money were given to open-source libraries, frameworks, and higher-level language creators.

They actually raise the level of abstraction and let programmers do more with less code. And it's been happening since the beginning, without any hype.

Andy X Andersen

This is not how things work. You don't get to decide where the money goes; the business people who invest their own money decide.

It is easy to imagine that if only you had those billions, you'd do a better job.

Alexander Kurz

More generally, we have not figured out how to fund public goods (FOSS being just one example).

Joe Jordan

There was a great paper recently (SWE-bench) that found that, off the shelf, the best LLMs solve about 2% of a curated set of GitHub issues. Even if this can be 10x'd by fine-tuning, that still is not a replacement for a software engineer, especially since someone still needs to verify the solutions.

Paul Topping

The only way we'd see a 10x programming productivity gain is if AI could write entire apps reliably from some kind of easy-to-write description. Of course, that is exactly what some hype merchants have claimed. Assuming there were such a problem domain, management would quickly realize that it is so regular that they could write a single program that, given a few input parameters, could generate the target apps with greater reliability and maintainability, and fewer compute resources, than the AI solution.

Alexander Kurz

"The only way we'd see a 10x programming productivity gain is if AI could write entire apps reliably from some kind of easy to write description." Another way could be if AI got 10x more people into programming.

Paul Topping

I know you are joking but, in case you weren't, I believe the productivity gain being sought is per programmer, holding the quality and cost of the programmer constant. Of course, this is an unachievable ideal. For example, many programmers wouldn't want a job prompting a coding AI.

Larry Jewett

If ever there was a misnomer, “prompt engineering” is it.

“Prompt engineering” has nothing to do with real engineering.

Everyone wants to be called an engineer these days (even computer programmers).

Paul Topping

I think it qualifies as an engineering task. I would stop short of calling someone an engineer if that's all they knew how to do.

Alexander Kurz

Maybe something you are missing is that there are a lot of different reasons why people program ... and a lot of ways to be productive writing code ... maybe the kind of SE you are doing is only part of the whole picture?

Paul Topping

Or perhaps a prompt engineer should no longer be called a programmer, or what they do as programming.

Alexander Kurz

As an aside, there is also something called LLM programming ... programming languages to program prompts ... https://github.com/yakazimir/esslli_2024_llm_programming
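For a flavor of the idea, a minimal sketch in plain Python (the ask helper is hypothetical; real LLM programming languages layer control flow and constrained decoding on top of this):

    # Prompts treated as composable program fragments.
    def ask(prompt: str) -> str:
        raise NotImplementedError("wire up an LLM client here")

    def summarize(text: str) -> str:
        return ask(f"Summarize in one sentence:\n{text}")

    def translate(text: str, lang: str) -> str:
        return ask(f"Translate to {lang}:\n{text}")

    # Prompt programs then compose like ordinary functions, e.g.:
    # translate(summarize(document), "German")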

Steven Ray Scott

Sergey Brin recently commented at the All-In Summit that none of his devs are using AI. He thought they should be, and has been trying to encourage them to use it. He says he wowed them a few times when he used AI to quickly generate some demo apps. But it raises the question: why aren't devs, the people most amenable to AI, who readily use and adapt to new technology, adopting it, instead of having to be pushed into using it? Another data point in support of Gary's premise: experts find much less benefit from LLMs than non-experts, who can be happy with an almost-solution.

Ben P

This, exactly. If I've run into a problem that I can't figure out, even after scouring places like Stack Overflow, there's zero chance an LLM is gonna get me the answer. It's basically doing a stupider, less reliable version of searching the internet!

John Wellbelove

I found the opposite. I've had to search pages of Stack Overflow looking for ideas, trying to filter out all the answers that miss the point. I found that although Copilot can sometimes give non-working answers, it's nearly always applicable to the problem and can kickstart me toward the correct solution.

Ben P

Good to hear; maybe I'll try genAI again next time I hit a wall. The few times I've tried, it would give me the kinds of suggestions I'd already seen and knew didn't work, or were addressing a different problem.

Admittedly there's some confirmation bias on my part. I would expect LLMs, given how they work, to have a hard time distinguishing solutions to similar-sounding problems from solutions to the problem I'm facing, and the more niche the problem, the worse they'll perform. When I scan through pages on Stack Overflow, I'm using my domain knowledge to identify promising leads and discard others. LLMs don't have domain knowledge, but they're awesome at syntax.

I'm open to the possibility that I've sold them short. Programming is only a small part of my job, so I speak from limited experience.

John Wellbelove

I have found that it is occasionally unable to give me anything useful for some of the more niche C++ template metaprogramming problems, where there are few examples for it to train on. Sometimes I've had to make several attempts at rewording the problem before it stops repeating the same irrelevant answer.

Alexander Kurz

Coming back to my distinction of shallow vs. deep SE, one can push this a bit further. Even in deep SE there are shallow problems with which LLMs can help; lacking documentation is an important one.

John Wellbelove

I've found it can sometimes get into an infinite loop of giving me a non-working answer (although very close): I tell it that it is wrong, it acknowledges its error, and then gives me the "corrected" code... which is exactly the same as the previous answer.

Alexander Kurz

I like to distinguish shallow and deep SE. Many devs are working on deep SE; LLMs are not very useful there. But being able to create your own apps instead of using corporate apps can be really empowering, and LLMs are great for that. Different application area, different people ... this is where I see a potential for 10x.

Michel Schellekens

So much funding wasted on Sisyphus. No modularity in AI equates to no clever design. Slow and buggy code crops up rather than correct and optimal. Sigh.

Rick Frank

People should be aware that using Copilot is a risky intellectual-property business. If you're writing code for your company, or for hire, you're potentially giving up your copyright to the code. Be careful.

Larry Jewett

Microsoft claims they will pay to fight any copyright-infringement suits, but when it comes right down to it, I'm sure their lawyers will find some excuse not to do so (based on some claimed violation of the end-user agreement). How many people have the money to hire a lawyer to fight Microsoft AND a copyright holder claiming infringement? Good luck with that.

Despite assurances from Microsoft, programmers and others are really foolish to mindlessly use the output of GenAIs before the copyright issues have been resolved in the courts, because the potential fines for infringement can be very steep.

Larry Jewett

Well, if Microsoft is using their own in-house code to train their AI, that would actually be the best argument against using the code generated by the Microsoft AI (even better than the potential copyright-infringement argument).

To say that Microsoft is not known for reliable, secure, bug-free software would not only state the obvious but also be an extreme understatement.

Alexander Kurz

Afaik, the big players train their LLMs on in-house code.

GO GO GOLEMS

As someone who experiences the 10x in real life (despite the cringe attached to the term, I think it's apt), I think critics are missing the obvious in their criticism.

1. building software is mostly not about code

2. LLMs don't do all that well at code, but they can generate things that have the right code shape

3. there are many artifacts that are not production code but are extremely useful to building good software

If you put this all together and focus on "what do humans need to build good software collaboratively", good uses of LLMs become apparent:

- good documentation / rfcs / knowledge bases / onboarding docs / mentoring / etc...

- logging, monitoring, error messages, visualizers, analysis tools, etc...

- prototypes prototypes prototypes. You don't even need to run them, but they are a sort of solo-adventure-whiteboard-brainstorming

I gave a workshop about the topic that hopefully gives a bit more insight into how I approach things: https://www.youtube.com/watch?v=zwItokY087U

Handout is here: https://github.com/go-go-golems/go-go-workshop

What this looks like in practice (albeit for my open-source stuff) is that I can build software like this: https://github.com/go-go-golems/go-go-labs/blob/main/web/voyage/app.md in an hour or two in the evening, after work, without feeling like I am really writing software.

For longer-term software: https://github.com/go-go-golems/go-go-labs/blob/main/pkg/zinelayout/parser/units_doc.md

I don't really care if I have to fill in the 10 lines that do the actual complicated thing, that's fun.

But I 100% stand behind a 10x improvement in quality (productivity is maybe not the best word). Faster "coding" means faster iteration/prototyping, and iteration is one of the key ingredients of building something that is actually useful.

Alexander Kurz

"I don't really care if I have to fill in the 10 lines that do the actual complicated thing, that's fun." That is exactly my experience as well.

Aaron Turner

Even 2x would be hype. And don't forget the LLM terms-of-service forbid working on AI/ML code.

Sufeitzy

A lot of people are in denial.

It's not 10x; my calculations come out to 1000x.

Am I the only one who uses these tools?

I configured an SAP integration to create ANSI X.12 850 messages from IDOC02 documents to a gateway in 5 seconds. Python code to generate full-coverage test vectors: 30 seconds. Installation script with the SAP API: 5 seconds. Should a test fail, the failure is automatically combined with the code for a revision: 5 seconds. It was stable within a minute.
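That fail-and-revise loop has roughly this shape (a sketch only; ask_llm, the pytest invocation, and the round limit are my assumptions, not the actual system):

    import subprocess

    def ask_llm(prompt: str) -> str:
        ...  # any LLM completion call

    def repair_loop(source_path: str, max_rounds: int = 5) -> bool:
        # Run the tests; on failure, feed failure + code back for a revision.
        for _ in range(max_rounds):
            result = subprocess.run(["pytest", "-x"], capture_output=True, text=True)
            if result.returncode == 0:
                return True  # stable: all tests pass
            with open(source_path) as f:
                code = f.read()
            revised = ask_llm(
                f"Tests failed:\n{result.stdout}\n\nCode:\n{code}\n\n"
                "Return the corrected file."
            )
            with open(source_path, "w") as f:
                f.write(revised)
        return False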

I tried more complex usage tests: I generated a device driver for printing which intercepted an image, used a separate AI image API to upscale it to maximum printer resolution, then enqueued the result, all in 5 minutes. It took me longer to find an upscaling service.

When I generate a business book (a 200-page document), as my system streams through the prompt matrix and I want an illustration, the generator requests 10 Python scripts to generate the diagram. They are tested automatically; the first one that works stops the testing and is stored in a library within the book source.
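That first-one-that-works selection is simple to sketch (the helper and its names are hypothetical):

    import subprocess, tempfile

    def first_working_script(candidates: list[str]) -> str | None:
        # Run each generated diagram script; keep the first that exits cleanly.
        for source in candidates:
            with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
                f.write(source)
                path = f.name
            try:
                result = subprocess.run(["python", path], capture_output=True, timeout=60)
            except subprocess.TimeoutExpired:
                continue  # hung script: try the next candidate
            if result.returncode == 0:
                return source  # store this one in the book's script library
        return None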

It took me 8 hours to reverse-engineer a data structure compatible with SAP, ServiceNow, Salesforce, Teamcenter, and Kinaxis: 770 tables, and I set it to auto-populate for a set of enterprise simulations I needed to do.

I created a specialized document-analysis tool in 1 hour that would have taken an ordinary development process a year. I know because many teams had not finished.

This is the limit of what I can share, but let me say, it's wonderful.

Kevin

We make similar use of it. For us it's unbelievably useful. Some of these articles seem... a bit agenda-driven.

Sufeitzy

Yep, denial and inexperience.

I finished up my OpenAI-written contract system today, as a kind of hobby. I'm glad others use these tools. I ran an IT department and worked for the CIO of a Fortune 50 company, overseeing a budget of $1.2B. This would have collapsed my ERP staff within 6 months: we would have been finished with 100% of projects. Aside from endless SAP, the last major custom tool we built has lasted 23 years; it was so integrated into so many systems that it was hard to keep current.

For fun, I've specified how the system can self-mutate to reduce interface rework time from a month to an hour (or less). It's quite insane. It requires almost no people at all. That's 10% of my old supply-chain team gone.

I'm not talking about "copilot"; I'm speaking of coders with 10 or more years of experience made somewhat irrelevant.

The entire Ariba Network model becomes irrelevant for B2B integrations. Tools to “scrape” log files to reverse engineer enterprise systems flow - irrelevant. “No code” hideous graphical torture to build workflow - irrelevant.

I haven't been able to find a single area where these tools, even at their simplest, don't write code that's better than that of highly experienced blue-chip programmers.

Sufeitzy

The role of "programmer" is going to melt. Programmers become somewhat irrelevant, since the problem moves from writing code to precisely specifying behavior.

I don’t hire or fire anyone. But it will be quite rare to need large teams to do detailed work.

Sufeitzy

I know the timing because when I have teams build these things it takes weeks if not months. I once had a quote to do an EDI 850 mapping setup in "only 2 months," for a single supplier/customer relationship.

ANSI X.12 standards have around 900 objects; around 20-30 are commonly used. They are quite old. Likewise, SAP interfaces, even R/4, are quite well known.

It literally took me 5 seconds to get the same SAP setup code.

Alarmed is not the word.

Hallucination is a word for the result of poor specification.

This places a nuclear bomb at the center of the software development process.

You can deny it or you can leverage it.

Sufeitzy

I never use code directly; it is always verified. I've just automated the process heavily. I don't seek to change all minds; I merely point out that, for those of us willing to try new things, it's a radical change with staggering benefits.

Nobody writes assembly code.

Nobody writes Fortran-IV or COBOL code.

Nobody writes Pascal or C.

Advanced techniques tried to get people to “code” graphically. That’s all gone.

Nobody need write C# or Python or Java ever again.

OpenAI has every line of code ever written, just waiting to be conjured up by inference. It's the biggest code library on earth.

OpenAI has every enterprise architecture ever created, nobody has to do solutions again.

OpenAI has every enterprise data model ever considered, nobody has to derive one again.

You just have to carefully ask for the correct one.

It’s not magic, it’s a library.

Sufeitzy

You get it! 😉

Brandon G

As a C# developer, I believe ReSharper and JetBrains' Rider IDE have done more to make my job easier than anything else.

I Am

GenAI can do things that take some people days if not weeks, and does so with more precision than even the best human programmer. It also makes the most insane and subtle bugs I've ever seen.

As someone who's been programming for 15 years, it feels like magic—and with all the same caveats. It can provide incredible value, but ultimately is only as good as the person using it.

I appreciate that you're looking at the broader picture, beyond my and other people's anecdotal evidence. The overall net effect will ultimately reflect the "energy" put into it: a reflection of what motivates the people using it.

What outcome are you hoping for, either for GenAI, or your study and writings about it?

David Hsing

Those types of claims utterly ignore the technical debt up the wazoo that's going to bite every "LLM-code"-infested project out there: https://www.geekwire.com/2024/new-study-on-coding-behavior-raises-questions-about-impact-of-ai-on-software-development/

=====

But while AI may boost production, it could also be detrimental to overall code quality, according to a new research project from GitClear, a developer analytics tool built in Seattle.

The study analyzed 153 million changed lines of code, comparing changes done in 2023 versus prior years, when AI was not as relevant for code generation. Some of the findings include:

“Code churn,” or the percentage of lines thrown out less than two weeks after being authored, is on the rise and expected to double in 2024. The study notes that more churn means higher risk of mistakes being deployed into production.

The percentage of “copy/pasted code” is increasing faster than “updated,” “deleted,” or “moved” code. “In this regard, the composition of AI-generated code is similar to a short-term developer that doesn’t thoughtfully integrate their work into the broader project,” said GitClear founder Bill Harding.

The bottom line, per Harding: AI code assistants are very good at adding code, but they can cause “AI-induced tech debt.”

=====

Alexander Kurz

And thanks for the link, very useful.

Alexander Kurz

"Those types of claims utterly ignore technical debt" On the other hand, LLMs can make some legacy code more maintainable.

Sufeitzy

I did a test today. From zero code, I created a Python-based tool which, when given a PDF, DOCX, or TXT file containing an arbitrarily complex contract in any of a few dozen types and subtypes, uses OpenAI to deconstruct it into a highly structured JSON document containing XML structures that verifiably comply with an arbitrary XML schema (XSD document), following conversion guidelines from major ERP vendors, and creates unified contract classification keys at the contract, section, clause, entity class, entity, datatype, and value levels. I basically just asked OpenAI GPT-4o to write code to do what I do when I analyze procurement contracts.

It supports clause linking and generation of clause, template, type, and variable libraries. It runs interactively or batched, self-instruments performance and prompt-accuracy profiling, tests regenerability of the original contract to ensure no data loss, and helped me discover errors in vendor documentation.

Tomorrow it will capture lightweight tabular text formatting in contracts into XML CDATA (I had to learn XML structure today), embed image blobs (signatures), and I'll allow it to consume all available processing resources, either on a workstation or within cloud compute resources, to parallelize the effort (or until OpenAI shoots me).

The only reason I wasn't done in 4 hours was that I was given a choice between the xmlschema and lxml libraries, and xmlschema gave me inaccurate results, making me think the XML generation was faulty.
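For reference, validating generated XML against an XSD with lxml takes only a few lines (file names here are placeholders):

    from lxml import etree

    schema = etree.XMLSchema(etree.parse("vendor_guidelines.xsd"))
    doc = etree.parse("extracted_clauses.xml")

    if not schema.validate(doc):
        for error in schema.error_log:  # line numbers and messages per violation
            print(error.line, error.message)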

The intention is to digest a quarter-million documents for a huge conversion process. I've reduced the process from 5 years of farming the work out to India to 41,000 process hours (sequential), which should parallelize to 41 hours at 1,000-way concurrency. I've never done distributed processing with modern tools, but I suspect it's pretty easy.
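For per-document work like this it mostly is; a minimal sketch using the standard library, assuming a hypothetical process_contract function and a worker count tuned to the available cores or nodes:

    from concurrent.futures import ProcessPoolExecutor, as_completed

    def process_contract(path: str) -> dict:
        ...  # parse, extract, and validate one document

    def run_batch(paths: list[str], workers: int = 32) -> list[dict]:
        # Fan the documents out across worker processes.
        results = []
        with ProcessPoolExecutor(max_workers=workers) as pool:
            futures = [pool.submit(process_contract, p) for p in paths]
            for future in as_completed(futures):
                results.append(future.result())
        return results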

Had I not been sidelined by the xmlschema library, I would be done and ready for scaled testing. I will be pulling contracts (again, not me but OpenAI-generated software) from the SEC EDGAR website, which holds some contracts filed as part of 10-K filings for materiality; I should be able to find a few hundred.

Again, would this have been 1000 hours of work? 800? 100? Would it have been possible without AI? I do know I had a tool in 6 hours that was not possible 5 years ago, that solves a general problem, and that could be used tomorrow. It's not a "copilot"; I just asked the right questions.

Sufeitzy

Every output is normalized to lowercase and space-collapsed, then compared to the original for clause detection: that's level 1. The output can be 100% concatenated to regenerate the input text.

Element detection in each clause extracts named entities: level 2.

The entities are extracted, typed, and substituted as variables in each clause, then stored for template use: level 3. The XML for each clause can be reversed back and forth, as can the XML for variables. The typing is summed over many contracts to settle on one small set of common variable types (start date and stop date, or quantity/unit-of-measure types, for example).

The variable strings are then typed (date string, integer): level 4. Then values are stored: level 5.

Clauses are given named types over many contracts and standardized: level-6 metadata. Contract types and subtypes are extracted and standardized over collections of contracts: level-7 metadata.

I have separate passes to generate document metadata and conversion statistics. Clauses stripped of enumeration and variabilized are put into a clause library for later reconciliation, same with variable types/names and contract types/names. The output is also locked to a hash of the file, with some other security.
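The level-1 normalization and round-trip check might look something like this (a sketch, not the actual code):

    def normalize(text: str) -> str:
        # Lowercase and collapse whitespace for clause comparison.
        return " ".join(text.lower().split())

    def round_trips(original: str, clauses: list[str]) -> bool:
        # Verify the concatenated clauses regenerate the input text.
        return normalize(" ".join(clauses)) == normalize(original)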

Oh, and I extract images, drop in a conversion placeholder pointing to the file, and run image classification/recognition for conversion annotation.

I asked OpenAI to write the verifiers as it wrote the extractions.

I have a tool I generated which works the other way: I can start with a title and use OpenAI to expand the title into an outline, paragraphs, paragraph elements, and sentences. It generates a fictional character schema, can link multiple volumes, and can generate a multi-volume series, a novel, or a short story; the sections can hold multiple content types: sentences, verse, generated images, tables, and other strongly typed content. Besides fiction, it can write any kind of book: movie scripts, white papers, research papers, software, training manuals, whatever. I write books for friends.

As for work experience, I worked in banking and commodities trading out of my dorm room at Caltech when I was 19, in the early '80s. I used the internet as a playground when it was still ARPANET and you had to have a login to a NIC node. I have run IT departments, worked for CIOs, and been in the industry for almost 45 years. These tools data-architect, solution-architect, and code better than any person or team I've worked with globally in those 45 years. Personally, I've designed chips, built kernels, drivers, and application packages, and delivered or directed code delivery up to ERP scale, in supply chain, finance, product design, sales, service, and marketing.

These tools do the work in 1/1000 the time.

You can deny it, or you can figure out how to leverage it.

Kevin

This article is oh so flawed. There are a huge number of ancillary tasks that revolve around coding too. AI is in our daily workflow across the board, and our overall productivity is increased considerably. 10x? No, of course not, but regularly 2 or 3x. Creating text, making icons, background images, searching for an obscure bug, and hundreds of other tasks that would typically involve multiple people rarely leave the same desk now. Also, it is tangibly improving all the time, and we don't have to change any of our flows to get the improvements, as they are server-side.

PS: we use ChatGPT, Copilot, and Claude. Somehow GPT-4o performs better even though it is supposed to be the same as Copilot.
