We have been experimenting with GitHub Copilot for a few months now, and it has really only amounted to a fancier autocomplete. Often the same prompts generate different results or behavior, so between modifying the prompts and then correcting the code (e.g., unit tests), the productivity gains are minor at best.
This is my experience too with my own personal coding (I'm not part of a team). Still, we keep seeing programmers claim huge gains in productivity from using LLMs. Are they being honest? Or does that depend on the kind of programming one does? I always imagine that these claims come from people in an environment where their entire jobs could be eliminated by the creation of some good abstractions. Take what they are doing, turn it into a few functions or processes with the right parameters, and it becomes a no-code situation.
Assuming your organization consists of multiple programmers all using Copilot, do you have any who disagree with your assessment here? Do you have any who claim huge gains from it?
I have nine people who report to me, and this seems to be the consensus. Interestingly enough, the best results actually come while modifying an application that was primarily built with a code generator I wrote, which aligns with what you're saying. So it works much better with highly pattern-based code, but less well with detailed, specialized code.
I find it does save typing sometimes. I enter the name of some function I want to call and it offers a reasonable guess as to what the parameters need to be based on surrounding code. But that's hardly a huge productivity gain. I spend way more time thinking about design than typing.
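To illustrate the kind of completion I mean, here is a made-up sketch (the function and variable names are invented for the example, not a real Copilot transcript):

```python
# Hypothetical illustration of parameter-filling autocomplete. The names here
# are invented for the example; this is not output from a real Copilot session.
def send_invoice(customer_id: int, amount: float, currency: str = "USD") -> None:
    """Pretend billing helper used only for this illustration."""
    print(f"Invoicing customer {customer_id}: {amount:.2f} {currency}")

customer_id = 4821
order_total = 129.50

# After typing "send_invoice(", an autocomplete tool can often guess the rest
# from the surrounding code:
send_invoice(customer_id, order_total, currency="USD")
```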
I do find ChatGPT and Copilot useful for recalling some word or phrase that I know exists but have forgotten, and also for coming up with good names for things. I'm a big believer in proper naming in programming.
I agree. No huge productivity increase, but very nifty. It can help save on typing, which avoids typos too, and suggests useful stuff. Under very tight supervision it can be a lovely tool.
My experience as well. It can generate code that's about the same as what a Google search used to give me links to on sites like Substack. General-purpose stuff for which there are already tons of code samples.
I tried to use it for code to customize a process automation and content management system, and it was completely worthless compared to going to the web forums dedicated to that enterprise suite.
That might explain part of why I find it hard to believe LLMs help anyone with coding at all. Most programming I do is specialised; the result is never what I would call pattern-based code.
I haven't actually bothered using any, especially since I enjoy programming. Why would I want an LLM to do it for me when I can't even stand IDEs? I can't imagine LLMs are even capable of properly *attempting* a lot of specialised programming (any statistically infrequent programming), let alone succeeding.
I'll start using them the day when they can help me make sense of other people's disassembled code, or with challenging tasks like clean room reverse engineering. Probably, that will be the same day when pigs fly!
I have heard it does very well at writing PowerShell, and I have seen the code it produces. I tried using it for C#, but the code never worked.
Mostly the same impression here. A similar use case is some variation of “boilerplate code”, which, while it can save time, still has to be proofread and more often than not has subtle errors that take more time (and angst) to fix than just writing the thing from scratch would.
I work for a Fortune 500 in a finance-related role, and there's zero adoption of AI that I've seen. I've talked to the IT folks involved, and there's just now some investigation behind the scenes. Stuff like scanning PDFs of contracts for certain language seems on the table, but nothing has been implemented yet. I have to imagine plenty of other companies are taking this kind of wait-and-see approach.
We already have tools for doing text indexes and searches, and they're cheaper and more reliable.
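For instance, a plain keyword scan over extracted text covers much of the "find contracts containing certain language" case without an LLM. A minimal sketch (the pypdf library and the clause list are assumptions here, and image-only scanned PDFs would need OCR first):

```python
# Minimal sketch: flag contracts containing specific language with plain text
# extraction and keyword matching, no LLM involved. Assumes the PDFs have a
# text layer (scanned images would need OCR) and that pypdf is installed.
from pathlib import Path
from pypdf import PdfReader

PHRASES = ["limitation of liability", "termination for convenience"]  # hypothetical clauses

def flag_contracts(folder: str) -> dict[str, list[str]]:
    """Return {filename: [matched phrases]} for every PDF in the folder."""
    hits: dict[str, list[str]] = {}
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        text = " ".join((page.extract_text() or "") for page in PdfReader(pdf_path).pages).lower()
        matched = [p for p in PHRASES if p in text]
        if matched:
            hits[pdf_path.name] = matched
    return hits

if __name__ == "__main__":
    for name, phrases in flag_contracts("contracts/").items():
        print(f"{name}: {', '.join(phrases)}")
```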
Here's what I said about this on LinkedIn:
Generative AI has been embraced by the unfortunately dominant C-suite population of commissars who have taken over from leaders. Commissars sit in their palaces and issue orders detached from reality. Leaders wade through the mud ahead of the troops, know what's going on, and influence by example. It's not an AI problem, it's a societal one.
I feel very sorry for the generations who have grown up under the commissars and who have been deprived of the opportunity to see what leadership looks like.
Upwork link (1st) is broken, fyi.
fixed in the online version
The URL was entered twice. Delete the first instance and you'll get where you need to go.
Can confirm it is broken. 8:09am PST
This one works: https://www.upwork.com/blog/generative-ai-impact-on-work
It would not surprise me if it were true, but there is little in the way of information about methodology, etc. So why trust this particular survey result? Because it confirms what we were already convinced of?
Thanks for this. I agree. Especially when the number one solution to the issue from the “Upwork Research Institute” was to “hire more gig workers.” Haha.
Surely chatbots are not a miracle tool producing magic across the board. They will be slowly integrated where they make sense, and they will improve as time goes by.
I'm not surprised by any of this, especially the part about employers expecting productivity gains while employees don't know how to use AI to achieve them. Too many managers make demands without fully understanding the work. None of them knew how to use AI either. They just knew it was trendy and didn't want to look like they were behind.
I am Jack's complete lack of surprise.
It's the UX, stupid. All LLMs require users to draw on their own knowledge to interact, offering no support for machine-assisted smart queries. The LLM may know things, but it can't pop the user's knowledge bubble. This has been a flaw in search engines for 20 years. The irony is that no matter how much better LLMs get, they are fatally flawed at the UI.
Agreed. It feels like the “product” part hasn’t evolved past a search field, stuck in the 1990s. On the bright side, there is much room for improvement, one hopes?
Is it remotely possible, too, that corporations and big tech companies don't actually understand the workflows and value of their own employees?
A lot of the discussion around AI and even AGI takes an extremely reductive view of what intelligence is, and it often reflects back on how we perceive each other's competencies, attendances, and values.
What if genAI's paradoxical contribution is to limit test this absurdism until it's clear that a singular oracle machine is NOT what intelligence is in any form?
As an office worker, I can attest that Claude Sonnet writes good-enough cover letters for me to switch jobs faster.
> “Over three in four (77%) say AI tools have decreased their productivity and added to their workload in at least one way.”
Yes, that's what Erik Brynjolfsson discovered in a study some months back: that it accelerated the progress of mediocre achievers and held back the high achievers.
This one? https://arxiv.org/abs/2304.11771 from a year ago?
It depends on the actual setting. OpenAI's safety research showed a while back (they investigated whether access to GPT-4 would improve people's ability to build a bioweapon) that GPT-4 sped up professionals (slightly) but slowed down amateurs, above all because they were less likely to quickly recognise nonsense fabrications and errors, and so lost a lot of time following them.
yup, also here: https://www.nber.org/papers/w31161
Customer support agents were the test population. That is a population with rather specific properties, and not the epitome of high-quality work, which is a relevant thing to keep in mind. Here we have an example of "AI does not need to be good to have value" (improving on customer support is not a very high bar, in my experience).
I work in UX design and it's good for ideating UX copy, sloppy qualitative analysis, and structuring presentations.
Otherwise, 90% of the time, the effort expended sifting through walls of stock content to find something reasonable isn't worth it.
Cross-posting both this one and your more detailed post on neurosymbolic AI. The markets finally seem to be getting skeptical of the river of money pouring into Silicon Valley, into an industry with "no moat," as you've astutely pointed out so many times. This comes on the heels of Meta outright trying to crush OpenAI with the latest release of Llama. If anyone believes Zuckerberg released Llama as "open source" because he really wants the whole world to hold hands and get along, they haven't been following the history of Meta! He's doing it to crush competitors, full stop. He's betting Meta has the money to ride it out.
Before predicting productivity gains from LLMs writing code, it would be good to note Brian Kernighan's decades-old comment that "debugging is twice as hard as writing it correctly in the first place".
His comment was about debugging code you wrote yourself, so you start out with a solid understanding of how it works. Debugging code written by someone (or something) else could be 10x harder than writing it yourself.
Debugging usually requires a full understanding of how the code works, which is very hard to attain for code written by someone else. In practice, the easiest way to figure out how such code works is often to try to rewrite it yourself. It is not unusual for coders taking over a project to rewrite much of it simply to gain a full understanding of how it works.
The more code you have to debug, the worse it gets. Having to debug 100k lines written by an LLM with a bug in it is a near-impossible task. And yet, unless the LLM can write 100-line functions with better than 99.9% reliability, you are more likely than not to have a bug in 100k lines.
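A quick back-of-the-envelope check of that arithmetic, treating 100k lines as 1,000 independent 100-line functions (an obvious simplification, but it makes the point):

```python
# Probability that 1,000 independent 100-line functions are all bug-free,
# given a per-function reliability p. Independence is a simplifying assumption,
# not a claim about real codebases.
for p in (0.999, 0.9993, 0.9999):
    p_all_clean = p ** 1000
    print(f"p = {p}: P(no bug in 100k lines) = {p_all_clean:.3f}, "
          f"P(at least one bug) = {1 - p_all_clean:.3f}")
```

At 99.9% per-function reliability the chance of at least one bug is roughly 63%; it takes about 99.93% before a clean 100k-line codebase becomes more likely than not.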
Here in Northern Virginia we have 1/3 of the world's data centers. You read that correctly: one third. I am wondering if the upcoming LLM crash is going to cause a mini-depression here when the accompanying steep reduction in demand for cloud computing occurs. Ideas, anyone? (To clarify, I don't think it will have much effect on employment, but rather on tax revenues.)
Chatbots will continue to improve. As we've seen with AlphaProof, a framework where generation is followed by validation and iteration can greatly improve their reliability.
I expect chatbots will get better at step-by-step reasoning, reflecting on what they've done, using tools, and iterating.
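The basic shape of such a loop is simple. A minimal sketch (the `generate` and `validate` callables are hypothetical placeholders, not any particular vendor's API):

```python
# Minimal sketch of a generate-validate-iterate loop, in the spirit of
# verification-guided systems like AlphaProof. `generate` and `validate` are
# hypothetical placeholders standing in for a model call and an external checker.
from typing import Callable, Optional

def solve(task: str,
          generate: Callable[[str, str], str],
          validate: Callable[[str], tuple[bool, str]],
          max_iters: int = 5) -> Optional[str]:
    feedback = ""
    for _ in range(max_iters):
        candidate = generate(task, feedback)   # propose a solution, given any prior feedback
        ok, feedback = validate(candidate)     # check it with an external tool (compiler, prover, tests)
        if ok:
            return candidate                   # only validated output is returned
    return None                                # give up after max_iters attempts
```

The point is that the model's output is never trusted directly; only candidates that pass the external check are returned.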