We have been experimenting with GitHub Copilot for a few months now and it has really only amounted to a fancier autocomplete. Often the same prompts generate different results or behavior, so between modifying the prompts and then correcting the code (e.g., unit tests), the productivity gains are minor at best.
This is my experience too with my own personal coding (I'm not part of a team). Still, we keep seeing programmers claim huge gains in productivity from using LLMs. Are they being honest? Or does that depend on the kind of programming one does? I always imagine that these claims come from people in an environment where their entire jobs could be eliminated by the creation of some good abstractions. Take what they are doing, turn it into a few functions or processes with the right parameters, and it becomes a no-code situation.
Assuming your organization consists of multiple programmers all using Copilot, do you have any who disagree with your assessment here? Do you have any who are claiming huge gains using it?
I have 9 people that report to me and this seems to be the consensus. Interestingly enough, the best results actually come while modifying an application that was primarily built with a code generator I wrote, which aligns with what you're saying. So it works much better with highly pattern-based code, but works less so with detailed, specialized code.
I find it does save typing sometimes. I enter the name of some function I want to call and it offers a reasonable guess as to what the parameters need to be based on surrounding code. But that's hardly a huge productivity gain. I spend way more time thinking about design than typing.
I do find ChatGPT and CoPilot useful in recalling some word or phrase that I know exists but have forgotten. Also in coming up with good names for things. I'm a big believer in proper naming in programming.
My experience as well. It can generate code that's about the same as what a Google search used to give me links to off of sites like substack. General-purpose stuff that there are already tons of code samples for.
I tried to use it for code to customize a process automation and content management system, and it was completely worthless compared to going to the web forums dedicated to that enterprise suite.
That might explain part of why I find it hard to believe LLMs help anyone with coding at all. Most programming I do is specialised, the result never being what I would call pattern-based code.
I haven't actually bothered using any, especially since I enjoy programming, and why would I want an LLM to do it for me when I can't even stand IDEs? I can't imagine LLMs are even capable of properly *attempting* a lot of specialised programming (any statistically infrequent programming), let alone succeeding.
I'll start using them the day when they can help me make sense of other people's disassembled code, or with challenging tasks like clean room reverse engineering. Probably, that will be the same day when pigs fly!
I have heard it does very well at writing PowerShell, and I have seen the resulting code that it creates. I tried using it for C# but the code never worked.
Mostly the same impression here. A similar use case is some variation of “boilerplate code”, which - while it can save time - still has to be proofread and more often than not, has subtle errors that take more time (and angst) to fix than to just write the thing from scratch.
I work for a Fortune 500 in a finance-related role and there's zero adoption of AI that I've seen. I've talked to IT folks involved and there's just now some investigation behind the scenes. Stuff like scanning PDFs of contracts for certain language seems on the table, but nothing has been implemented yet. I have to imagine plenty of other companies are taking this kind of wait-and-see approach.
We already have tools for doing text indexes and searches, and they're cheaper and more reliable.
Here's what I said about this on LinkedIn:
Generative AI has been embraced by the unfortunately dominant C-suite population of commissars who have taken over from leaders. Commissars sit in their palaces and issue orders detached from reality. Leaders wade through the mud ahead of the troops, know what's going on, and influence by example. It's not an AI problem, it's a societal one.
I feel very sorry for the generations who have grown up under the commissars and who have been deprived of the opportunity to see what leadership looks like.
Upwork link (1st) is broken, fyi.
fixed in the online version
The URL was entered twice. Delete the first instance and you'll get where you need to go
Can confirm it is broken. 8:09am PST
https://www.upwork.com/blog/generative-ai-impact-on-work. This works
It would not surprise me if it was true, but there is little in the way of information about methodology etc. So why trust this particular survey result? Because it confirms what we are already convinced of?
Thanks for this. I agree. Especially when the number one solution to the issue from the “Upwork Research Institute” was to “hire more gig workers.” Haha.
I'm not surprised by any of this, especially the part about employers having expectations of productivity gains and the employees not knowing how to use AI to do so. Too many managers make demands without fully understanding the work. None of them knew how to use AI either. They just knew it was trendy and didn't want to look like they were behind.
I am Jack's complete lack of surprise.
It's the UX, stupid. All LLMs require users to use their own knowledge to interact, offering no support for machine-assisted smart queries. The LLM may know stuff but it can't pop the knowledge bubble. This has been a flaw in search engines for 20 years. The irony is that no matter how much better LLMs get, they are fatally flawed at the UI.
Agreed. It feels like the “product” part hasn’t evolved past a search field - stuck in the 1990s. On the bright side, there is much room for improvement one hopes?
Is it remotely possible, too, that corporations and big tech companies don't actually understand the workflows and value of their own employees?
A lot of the discussion around AI and even AGI supposes extreme reductiveness on what intelligence is, and it often reflects back on how we perceive each other's competencies, attendances, and values.
What if genAI's paradoxical contribution is to limit test this absurdism until it's clear that a singular oracle machine is NOT what intelligence is in any form?
As an office worker, I can attest that Claude Sonnet writes good enough cover letters for me to switch my jobs faster.
>“Over three in four (77%) say AI tools have decreased their productivity and added to their workload in at least one way.”
Yes, that's what Erik Brynjolfsson discovered in a study months back. That it accelerated the progress of mediocre achievers, and held back the high achievers.
This one? https://arxiv.org/abs/2304.11771 from a year ago?
It depends on the actual setting. OpenAI's safety research showed a while back (they researched whether using GPT4 would improve people's ability to build a bioweapon) that GPT4 sped up professionals (slightly) but slowed down amateurs (above all because they were less likely to quickly recognise nonsense fabrications and errors and so lost a lot of time following them).
yup, also here: https://www.nber.org/papers/w31161
Customer support agents were the test population. That is a population with rather specific properties and not the epitome of high quality work. That is a relevant thing to keep in mind. Here we have an example of "AI does not need to be good to have value" (improving on customer support is not a very high bar in my experience)
I work in UX design and it's good for ideating UX copy, sloppy qualitative analysis, and structuring presentations.
Otherwise, 90% of the time, the effort expended sifting through walls of stock content to find something reasonable isn't worth it.
Cross posting both this one and your more detailed post on neurosymbolic AI. The markets finally seem to be getting skeptical of the river of money pouring into Silicon Valley - into an industry with "no moat," as you've astutely pointed out so many times. This on the heels of Meta outright trying to crush OpenAI with the latest release of LLAMA. If anyone believes Zuckerberg released LLAMA as "open source" because he really wants the whole world to hold hands and get along - they haven't been following the history of Meta! He's doing it to crush competitors, full stop. He's betting Meta has the money to ride it out.
Before predicting productivity gains from LLMs writing code, it would be good to note Brian Kernighan's decades-old comment that "debugging is twice as hard as writing it correctly in the first place".
His comment was about debugging code you wrote yourself, so you start out with a solid understanding of how it works. Debugging code written by someone/thing else could be 10x harder than writing it yourself.
Debugging usually requires a full understanding of how the code works, which is very hard to attain for code written by someone else. In practice, the easiest way to figure out how such code works is often to try to rewrite it yourself. It is not unusual for coders taking over a project to rewrite much of it simply to gain a full understanding of how it works.
The more code you have to debug, the worse it gets. Having to debug 100k lines written by an LLM with a bug in it is a near-impossible task. And yet, unless the LLM can write 100-line functions with better than 99.9% reliability, you are more likely than not to have a bug in 100k lines.
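The arithmetic behind that last claim can be checked with a rough sketch. It assumes (my simplification, not something stated above) that the 100k lines split evenly into 100-line functions and that each function is bug-free independently with the same probability; the function names are made up for illustration.

```python
def prob_at_least_one_bug(total_lines, lines_per_function, per_function_reliability):
    """Chance that at least one function contains a bug.

    Assumes every function is bug-free independently with the same
    probability (a simplifying assumption, not a measured property of any LLM).
    """
    n_functions = total_lines // lines_per_function
    # All functions are clean with probability reliability ** n_functions.
    return 1.0 - per_function_reliability ** n_functions


def break_even_reliability(total_lines, lines_per_function):
    """Per-function reliability at which a bug somewhere is a 50/50 proposition."""
    n_functions = total_lines // lines_per_function
    # Solve reliability ** n_functions == 0.5 for reliability.
    return 0.5 ** (1.0 / n_functions)


if __name__ == "__main__":
    p_bug = prob_at_least_one_bug(100_000, 100, 0.999)
    threshold = break_even_reliability(100_000, 100)
    print(f"P(at least one bug) at 99.9% reliability: {p_bug:.3f}")
    print(f"Reliability needed for 50/50 odds: {threshold:.5f}")
```

Under these assumptions, 100k lines is 1,000 functions, and 99.9% per-function reliability still leaves roughly a 63% chance of at least one bug. The break-even point is a bit above 99.93%, so the 99.9% figure is about right.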
Here in northern Virginia we have 1/3 of the world's data centers. You read that correctly: one third. I am wondering if the upcoming LLM crash is going to cause a mini depression here when the accompanying steep reduction in need for cloud computing occurs. Ideas, anyone? (To clarify, I don't think it will have much effect on employment but rather on tax revenues.)
I think I can say that Perplexity increased my productivity by letting me do research faster and ‘fuel’ brainstorming sessions more quickly than a Google-then-browse loop.