102 Comments
Oct 1 · Liked by Gary Marcus

This aligns with my experience. I'm a Sr. Engineer and we use Copilot, Cursor, ChatGPT, etc. at my company.

Personally, I haven’t seen a meaningful uptick in feature velocity since we adopted GenAI coding assistants, but I am seeing more code volume from Jr. devs with bizarre bugs. My time digging through PRs has ticked up for sure.

In my dev work I find myself turning off Copilot half the time, because its hallucinated suggestions get pretty distracting.


Does it help you with debugging at all? I find it helps me with that, but I'm a complete noob so I have my doubts if it would be of any help to someone who knows what they're doing.


It's quite good at explaining code, writing tests, refactoring, writing comments, and documentation: basically anything that is based on already-written code. The context is lost most of the time, and for debugging I have a very low success rate; usually it's just repeating nonsense. But you can paste in an error message to get a good explanation, which certainly helps.


Programming is painstaking work. If you paste in some code and hope for the best you will encounter a lot of grief, much of it much later.

GenAI is an aid, to be used in small incremental doses. But then it is good for you.


AI-machinery makers are now gathering the input of users, in the form of the small doses you describe, to build better machines that can guess what the original request/task requires. The thing is, as Marcus already mentioned: when open-source code is already there, written in a properly reconfigurable way, then it is already reusable and the copilot is not needed. Right now a code copilot is just a non-transparent search engine, because it should point out the source it got the code from (including its license) instead of maintaining the illusion that it is "generating" code or "co-"programming. And on the other hand, if there is a truly new requirement that really needs a new piece of code, algorithm, or data structure, then that will need a person who understands the new requirement/problem and crafts a new solution.

There is a gap in the search for good, properly working code. But current AI code generators are designed (my opinion, from how I read the intentions of OpenAI and Microsoft) to imitate/replicate developers in order to replace them, with the copilot as an intermediate step toward closing their own gap (the current limitations of code generators), not toward closing the search-for-good-code gap (the problem whose solution keeps human agency in place and reduces technical debt).


There are many ways of looking at it. CoPilot is helpful in my work. There's always a need for custom code, even if the pattern is maybe already obvious in other code.

I anticipate there will be future tools for refactoring, debugging, etc.


"GenAI is an aid, to be used in small incremental doses. But then it is good for you." For some uses (that aren't critical if they fail or where errors are easily spotted — think getting code for a plot where you as a human can immediately see not the code but the plot is wrong — it may even be valuable for larger doses. That speeds you up because GenAI's 'estimation' is good enough to get a higher productivity.

But if what you say is true, the question of how GenAI companies are valued becomes important.

The field of coding is very wide. There will be areas with very high productivity growth, and areas with a low risk of negative effects from GenAI's lack of understanding of what it is doing. How much of either we will get, we do not really know, but I suspect a lot of IT won't benefit except on the surface and in isolated cases, where not much depends on very high correctness requirements.


This really depends on if chatbots can improve.

While the work people do is very complex and requires an immense amount of detail, the number of strategies people use is small, and very context-dependent.

Human labor is very, very expensive. Painstakingly training a chatbot to reliably do simple (but still tricky) jobs is very plausible, and the payoff will be immense. We will see more of that. It won't be quick or cheap.

Oct 1 · Liked by Gary Marcus

A software developer learns how to code. An LLM doesn't even know what code is. Throwing together a probabilistic sequence of vectors that appear many times in GitHub repos will only get you so far.

Oct 1 · Liked by Gary Marcus

Imagine if all this money were given to open-source libraries, frameworks, and higher-level language creators...

They actually raise the level of abstraction and let programmers do more with less code. And it's been happening since the beginning, without any hype.


More generally, we have not figured out how to fund public goods (FOSS being just one example).

Oct 1 · Liked by Gary Marcus

There was a great paper recently (SWE-bench) that found that, off the shelf, the best LLM models solve about 2% of a curated set of GitHub issues. Even if this can be 10x'd by fine-tuning, that still is not a replacement for a software engineer, especially since someone still needs to verify the solutions.

Oct 1 · Liked by Gary Marcus

The only way we'd see a 10x programming productivity gain is if AI could write entire apps reliably from some kind of easy to write description. Of course, that is exactly what some hype merchants have claimed. Assuming there was such a problem domain, management would quickly realize that this means it is so regular that they could write a single program that, with a few input parameters, could generate the target apps with greater reliability and maintainability, and fewer compute resources, than the AI solution.


"The only way we'd see a 10x programming productivity gain is if AI could write entire apps reliably from some kind of easy to write description." Another way could be if AI got 10x more people into programming.


I know you are joking but, in case you weren't, I believe that the productivity gain being sought is per-programmer while holding the quality and cost of the programmer constant. Of course, this is an unachievable ideal. For example, many programmers wouldn't want a job prompting a coding AI.


If ever there was a misnomer, “prompt engineering” is it.

"Prompt engineering" has nothing to do with real engineering.

Everyone wants to be called an engineer these days (even computer programmers).


I think it qualifies as an engineering task. I would stop short of calling someone an engineer if that's all they knew how to do.


Maybe something you are missing is that there are a lot of different reasons why people program ... and a lot of ways of being productive writing code ... maybe the kind of SE you are doing is only part of the whole picture?


Or perhaps a prompt engineer should no longer be called a programmer, nor what they do be called programming.


As an aside, there is also something called LLM programming ... programming languages to program prompts ... https://github.com/yakazimir/esslli_2024_llm_programming

Oct 1 · Liked by Gary Marcus

Sergey Brin recently commented at the All-In Summit that none of his devs are using AI. He thought they should be, and has been trying to encourage them to use it. He says he wowed them a few times when he used AI to quickly generate some demo apps. But that raises the question: why aren't devs, the people most amenable to AI, who readily use and adapt to new technology, adopting it, and instead have to be pushed into using it? Another data point in support of Gary's premise: experts find much less benefit from LLMs than non-experts, who can be happy with an almost-solution.


This, exactly. If I've run into a problem that I can't figure out, even after scouring places like stackoverflow, there's zero chance an LLM is gonna get me the answer. It's basically doing a stupider, less reliable version of searching the internet!


I found the opposite. I've had to search pages of Stack Overflow looking for ideas, trying to filter out all the answers that miss the point. I found that although Copilot can sometimes give non-working answers, it's nearly always applicable to the problem and can kickstart me toward the correct solution.


My experience as well, except when the LLM has been trained on deprecated documentation. Then there's no end to how annoying it is.


I've found it can sometimes get into an infinite loop: it gives me a non-working answer (although very close), I tell it that it is wrong, it acknowledges its error, and then gives me the "corrected" code... which is exactly the same as the previous answer.


Good to hear; maybe I'll try GenAI again next time I hit a wall. The few times I've tried, it would give me the kinds of suggestions I'd already seen and knew didn't work, or that were addressing a different problem.

Admittedly there's some confirmation bias on my part. I would expect LLMs, given how they work, to have a hard time distinguishing solutions to similar-sounding problems from solutions to the problem I'm facing, and the more niche the problem, the worse they'll perform. When I scan through pages on Stack Overflow, I'm using my domain knowledge to identify promising leads and discard others. LLMs don't have domain knowledge, but they're awesome at syntax.

I'm open to the possibility that I've sold them short. Programming is only a small part of my job, so I speak from limited experience.


I have found that it occasionally is unable to give me anything useful for some more niche C++ template meta-programming problems where there are few examples for it to train on. Sometimes I've had to make several attempts at rewording the problem until it stops repeating the same irrelevant answer.


Coming back to my distinction of shallow vs. deep SE, one can push this a bit further. Even in deep SE there are shallow problems with which LLMs can help. Missing documentation is an important one.


This doesn't match my experience. I know plenty of FAANG engineers who are using it voluntarily. I suppose it's more useful for some languages than others; I've heard it's terrible at Rust, for example.


I like to distinguish shallow and deep SE. Many devs are working on deep SE. LLMs are not very useful there. But being able to create your own apps instead of using corporate apps can be really empowering and LLMs are great for that. Different application area, different people ... this is where I see a potential for 10x.


So much funding wasted on Sisyphus. No modularity in AI equates to no clever design. Slow and buggy code crops up rather than correct and optimal. Sigh.

Oct 1 · Liked by Gary Marcus

People should be aware that using Copilot is risky from an intellectual-property standpoint. If you're writing code for your company, or for hire, you're potentially giving up your copyright to the code. Be careful.


Microsoft claims they will pay to fight any copyright infringement suits, but when it comes right down to it, I'm sure their lawyers will find some excuse not to do so (based on some claimed violation of the end-user agreement). How many people have the money to hire a lawyer to fight Microsoft AND a copyright holder claiming infringement? Good luck with that.

Despite assurances from Microsoft, programmers and others are really foolish to be mindlessly using the output of GenAIs before the copyright issues have been resolved in the courts because the potential fines for infringement can be very steep.


Well, if Microsoft is using their own in-house code to train their AI, that would actually be the best argument against using the code generated by the Microsoft AI (even better than the potential copyright-infringement argument).

To say that Microsoft is not known for reliable, secure, bug free software would not only be to state the obvious but also be an extreme understatement.


AFAIK, the big players train their LLMs on in-house code.

Oct 1 · edited Oct 1 · Liked by Gary Marcus

Even 2x would be hype. And don't forget the LLM terms-of-service forbid working on AI/ML code.

Oct 1 · edited Oct 1

As someone who experiences the 10x in real life (despite the cringe attached to it, I think it's an apt term), I think critics are missing the obvious in their criticism.

1. building software is mostly not about code

2. LLMs don't do all that well at code, but they can generate things that have the right code shape

3. there are many artifacts besides production code that are extremely useful for building good software

If you put this all together and focus on "what do humans need to build good software collaboratively", good uses of LLMs become apparent:

- good documentation / rfcs / knowledge bases / onboarding docs / mentoring / etc...

- logging, monitoring, error messages, visualizers, analysis tools, etc...

- prototypes prototypes prototypes. You don't even need to run them, but they are a sort of solo-adventure-whiteboard-brainstorming

I gave a workshop about the topic that hopefully gives a bit more insight into how I approach things: https://www.youtube.com/watch?v=zwItokY087U

Handout is here: https://github.com/go-go-golems/go-go-workshop

What this looks like in practice (in my open-source work, at least) is that I can build software like this: https://github.com/go-go-golems/go-go-labs/blob/main/web/voyage/app.md in an hour or two in the evening, after work, without feeling like I am really writing software.

For longer-term software: https://github.com/go-go-golems/go-go-labs/blob/main/pkg/zinelayout/parser/units_doc.md

I don't really care if I have to fill in the 10 lines that do the actual complicated thing, that's fun.

But I 100% stand behind 10x improvement in (productivity is maybe not the best word) quality. Faster "coding" means faster iteration/prototyping, and iteration is one of the key ingredients to building something that actually is useful.


"I don't really care if I have to fill in the 10 lines that do the actual complicated thing, that's fun." That is exactly my experience as well.


I just watched your "10x Development" YouTube video. Very interesting, and I appreciate the demos. I subsequently tried out Phind.

My first question: if the scenario is that you are developing and maintaining a large existing legacy in-house codebase, how do you use these LLM-based code-assistance tools in your day-to-day work when the LLM has no knowledge of the in-house codebase and pasting proprietary code into the LLM for questions/debugging is prohibited?

Another question: in your experience, does LLM hallucination come up as an issue in any generated code? As far as Phind goes, I just tried out its Ask tab (I guess the equivalent of a regular LLM chat session). I asked one arithmetic question, the division of a 4-digit number by a 5-digit number. The answer from Phind was wrong after 3 decimal places, causing further incorrect rounding in the elaborated part of its answer. Then I asked my usual *variation* of the wolf/goat/cabbage/farmer river-crossing problem, and again Phind got it wrong, with a nonsensical answer, even after a couple of rounds of clarification. I chalk it up to hallucination, but in effect it is just generating next tokens to best fit its training examples, which I imagine do not include my particular *variation*. So: in your daily LLM conversations, do you experience any form of hallucination? If yes, how do you deal with it?


I just tried to use Phind to generate some toy code to extract birthDate from JSON text, and after one prompt clarification I got the snippet below as part of the generated code. Are lines 5 and 6 a hallucination? In your experience, have you encountered generated code like this? Someone opined that any mention of hallucination can be attributed to a not-good-enough human prompt. Do you think an LLM should generate the code below as the result of a not-good-enough prompt?

1 for key, value in data.items():
2     if isinstance(value, dict):
3         if 'birthDate' in value:
4             birth_dates.append(value['birthDate'])
5         elif 'name' in value and 'birthDate' in value:
6             birth_dates.append(value['birthDate'])
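For what it's worth, the elif on line 5 can never run: when 'birthDate' is in value, the if on line 3 already catches it, so lines 5 and 6 are dead code. A minimal cleaned-up sketch of the same extraction (assuming a flat dict of dicts, which is my guess at the intended input shape) would be:

    # Collect every birthDate found in the top-level dict values.
    birth_dates = []
    for key, value in data.items():
        if isinstance(value, dict) and 'birthDate' in value:
            birth_dates.append(value['birthDate'])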


Sorry, the indent got stripped out after hitting Post. Lines 3 and 5 are on the same indent level, and lines 4 and 6 are on the same, further-indented level.


A lot of people are in denial.

It's not 10x; my calculations are 1000x.

Am I the only one who uses these tools?

I configured an SAP integration to create ANSI X.12 850 messages from IDOC02 documents to a gateway in 5 seconds. Python code to generate full coverage test vectors: 30 seconds. Installation script with SAP API: 5 seconds. Should a test fail, the failure is automatically combined with the code for a revision: 5 seconds. It was stable within a minute.

Tried more complex usage tests - generated a device driver for printing which intercepted an image, used a separate AI image API to upscale it to maximum printer resolution, then enqueued the result in 5 minutes. It took me longer to find an upscaling service.

When I generate a business book (200-page documents), as my system streams through the prompt matrix and I want an illustration, the generator requests 10 Python scripts to generate the diagram. They are tested automatically; the first one that works stops the testing and is stored in a library within the book source.

It took me 8 hours to reverse engineer a data structure compatible with SAP, ServiceNow, SalesForce, Teamcenter and Kinaxis. 770 tables, and I set it to auto-populate for a set of enterprise simulations I needed to do.

Created a specialized document-analysis tool in 1 hour that would have taken an ordinary development process a year; I know because many teams had not finished.

This is the limit of what I can share, but let me say, it's wonderful.


Your description of the process and the fabulous performance of "these tools" almost seems magical. You use durations like "5 seconds", "30 seconds", etc. Five seconds is what it takes a normal person to type a few words. A few words typed in, and your new system is integrated, onboarded, up and running? I would think corporate IT departments would be falling over each other to acquire such technology. Unfortunately, I don't see that at all.

"I configured an SAP integration to create ANSI X.12 850 messages from IDOC02 documents to a gateway in 5 seconds." Is the SAP integration an LLM-based tool? How did you specify your intended configuration to the tool: mouse-clicking selections, or typing English sentences? Either way seems to require more than 5 seconds if your configuration is at all larger or more complex than pre-canned selections, right? Or is the configuration already a pre-canned selection, and all you had to do was tick one checkbox? In that case 5 seconds is possible, but that is just pre-canned programming, not LLM capability, right? What am I missing here?

"Python code to generate full coverage test vectors: 30 seconds." How do you verify the correctness of the generated vectors? Does that take your time, or do you not really care?

"Installation script with SAP API: 5 seconds." Five seconds for an LLM-based tool to generate some script I can certainly imagine. But again, how much time did it take you to enter your requirements into the tool? How many times did you have to do that before a satisfactory result arrived? How do you know the generated scripts were correct? Do you have to spend time verifying and correcting mistakes, since LLMs do have this feature called hallucination? Or do you not really care?

The same kinds of questions apply to your other examples. Basically: who verifies the generated tests? Who determines whether an automatic test passes or fails? Is that human-configured or LLM-generated? If it is generated, who verifies the pass/fail judgement? How much time does it take a human to convey to the LLM tool how the tests and judges need to work? If no language input is required, then it is just pre-canned programming. If language input is required, then what is the verification process, and what effort ensures the final configured behavior is accurate and correct?

Your examples are either hard to imagine or the result of pre-canned features that probably have nothing to do with an LLM. Without more details it is very hard to picture! If everything presented is factual and all the magical performance can be attributed to LLMs, I seriously suggest you change careers and become an OpenAI enterprise marketing representative. I imagine they would value a 1000x LLM practitioner such as yourself to showcase their models to corporate USA for speedier inroads into industries. The financial rewards there? The sky is the limit.


They have some kind of serious placebo effect going on, I think.


We make similar use of it. For us it's unbelievably useful. Some of these articles seem... a bit agenda-driven.


Yep, denial and inexperience.

I finished up my OpenAI-written contract system today, as a kind of hobby. I'm glad others use these tools. I ran an IT department, worked for the CIO of a Fortune 50 company, and oversaw a budget of $1.2B; this would have collapsed my ERP staff within 6 months, because we would be finished with 100% of projects. Aside from endless SAP, the last major custom tool we built has lasted 23 years; it was so integrated into so many systems that it was hard to keep current.

For fun, I've specified how it can self-mutate to reduce the interface rework time from a month to an hour (or less). It's quite insane. It requires almost no people at all. That's 10% of my old supply-chain team gone.

I'm not talking about "Copilot"; I'm speaking of coders with 10 or more years of experience being made somewhat irrelevant.

The entire Ariba Network model becomes irrelevant for B2B integrations. Tools to "scrape" log files to reverse-engineer enterprise system flows: irrelevant. "No code" hideous graphical torture to build workflows: irrelevant.

I haven’t been able to find a single area where these tools - even at their simplest - don’t write code that’s better than highly experienced blue-chip programmers.


Then you should lay off all the programmers and run the project by yourself; after all, all work is now measured in 5 seconds here, 30 seconds there, sort of like the "hobby" items in your words. You will be the first single-person billion-dollar company/team that Sam imagined a while ago. You should seriously consider advertising your company's name and your team's name so we can all go learn from you. Just imagine the increase in productivity for the benefit of humanity.


If you have a 1000x speedup, then you are more productive than entire companies like Hugging Face, Mistral, and Anthropic.

They all have fewer than 1,000 employees. I look forward to using the amazing software you make.


The role of "programmer" is going to melt away. Programmers become somewhat irrelevant, since the problem moves from writing code to precisely specifying behavior.

I don’t hire or fire anyone. But it will be quite rare to need large teams to do detailed work.


I know the time because when I have teams build these things, it takes weeks if not months. I once had a quote to do an EDI 850 mapping setup in "only 2 months", for a single supplier/customer relationship.

ANSI X.12 standards have around 900 objects, of which around 20-30 are commonly used. They are quite old. Likewise, SAP interfaces, even R/4, are quite well known.

It literally took me 5 seconds to get the same SAP setup code.

Alarmed is not the word.

Hallucination is a word for the result of poor specification.

This will place a nuclear bomb at the center of the software development process.

You can deny it or you can leverage it.


I appreciate your reply, even though it is not directly attached below my response post. To illustrate your point, maybe you could demonstrate your use case (or some made-up example to show the concept, if corporate rules or trade secrets are in the way) in a YouTube video, like the links provided by GO GO GOLEMS in this same comment section. That way you could really help change minds, if it works such magic.

I am trying to figure out what your input to the LLM tool is, what its output is, and how you verify the output's correctness. I just looked up some X12 example messages, one of which looked like a medical-claim invoice. This reminds me of the FIX trading protocol and its associated parsers. Aren't these kinds of industry-standard protocols already equipped with standard parsers? The 2 months of work quoted by your programming team and the identical SAP setup code the LLM then produced in 5 seconds: are they program code, configuration files, or something else? Are they large or small? Is your input (your perfect prompt) a written spec of many volumes of text, or just a few words? All of these factor into one's consideration of applying an LLM to one's own workflow, and some working examples could really help people make the move.

Also, I just posted an example of hallucinated generated code in reply to GO GO GOLEMS's original post (just scroll down a few posts and you will see it). Those are the kinds of things (a benign form, in this case) that I asked about how to verify and correct. I don't think poor prompting is an excuse for such code generation, do you? (The code was generated by Phind, which according to some posts internally uses GPT-4; please correct me if I am wrong.)


I never use code directly; it is always verified. I have just automated the process heavily. I don't seek to change all minds; I merely point out that, for those of us willing to try new things, it's a radical change with staggering benefits.

Nobody writes assembly code.

Nobody writes Fortran-IV or COBOL code.

Nobody writes Pascal or C.

Advanced techniques tried to get people to “code” graphically. That’s all gone.

Nobody need write C# or Python or Java ever again.

OpenAI has every line of code written, just waiting to be conjured up by inference. It’s the biggest code library on earth.

OpenAI has every enterprise architecture ever created, nobody has to do solutions again.

OpenAI has every enterprise data model ever considered, nobody has to derive one again.

You just have to carefully ask for the correct one.

It’s not magic, it’s a library.


That explains a lot! Just wow!


You get it! 😉


As a C# developer, I believe ReSharper and their Rider IDE have done more to make my job easier than anything else.


GenAI can do things that take some people days if not weeks, and does so with more precision than even the best human programmer. It also makes the most insane and subtle bugs I've ever seen.

As someone who's been programming for 15 years, it feels like magic—and with all the same caveats. It can provide incredible value, but ultimately is only as good as the person using it.

I appreciate that you're looking at the broader picture, beyond my, and other people's anecdotal evidence. The overall net effect is going to ultimately reflect the "energy" put into it. It will be a reflection of what motivates those people using it.

What outcome are you hoping for, either for GenAI, or your study and writings about it?


Those types of claims utterly ignore technical debt up the wazoo that's gonna bite every "LLM-code" infested project out there: https://www.geekwire.com/2024/new-study-on-coding-behavior-raises-questions-about-impact-of-ai-on-software-development/

=====

But while AI may boost production, it could also be detrimental to overall code quality, according to a new research project from GitClear, a developer analytics tool built in Seattle.

The study analyzed 153 million changed lines of code, comparing changes done in 2023 versus prior years, when AI was not as relevant for code generation. Some of the findings include:

“Code churn,” or the percentage of lines thrown out less than two weeks after being authored, is on the rise and expected to double in 2024. The study notes that more churn means higher risk of mistakes being deployed into production.

The percentage of “copy/pasted code” is increasing faster than “updated,” “deleted,” or “moved” code. “In this regard, the composition of AI-generated code is similar to a short-term developer that doesn’t thoughtfully integrate their work into the broader project,” said GitClear founder Bill Harding.

The bottom line, per Harding: AI code assistants are very good at adding code, but they can cause “AI-induced tech debt.”

=====
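To make the churn definition quoted above concrete, here is a minimal sketch of how one might compute it from per-line history records; the record shape and the two-week window are assumptions drawn from the quoted definition, not GitClear's actual methodology:

    from datetime import datetime, timedelta

    # Hypothetical per-line history: when each line was authored and when (if ever) it was removed.
    line_history = [
        {"authored": datetime(2023, 3, 1), "removed": datetime(2023, 3, 8)},   # churned within two weeks
        {"authored": datetime(2023, 3, 1), "removed": None},                   # still in the codebase
        {"authored": datetime(2023, 3, 1), "removed": datetime(2023, 6, 1)},   # revised much later
    ]

    def churn_rate(history, window=timedelta(days=14)):
        # Share of authored lines thrown out within the window.
        churned = sum(1 for line in history
                      if line["removed"] is not None
                      and line["removed"] - line["authored"] <= window)
        return churned / len(history)

    print(f"{churn_rate(line_history):.0%}")  # -> 33%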


And thanks for the link, very useful.


"Those types of claims utterly ignore technical debt" On the other hand, LLMs can make some legacy code more maintainable.


I did a test today. Starting from zero code, I created a Python-based tool which, given a PDF, DOCX, or TXT file containing an arbitrarily complex contract of any of a few dozen types and subtypes, uses OpenAI to deconstruct it into a highly structured JSON document containing XML structures that verifiably comply with an arbitrary XML schema (XSD document), following conversion guidelines from major ERP vendors, and creates unified contract-classification keys at the contract, section, clause, entity-class, entity, datatype, and value levels. I basically just asked OpenAI GPT-4o to write code to do what I do when I analyze procurement contracts.

It supports clause linking and generation of a clause library, template library, type library, and variable library. It runs interactively or batched, self-instruments performance and prompt-accuracy profiling, tests regenerability of the original contract to ensure no data loss, and helped me discover errors in vendor documentation.

Tomorrow it will capture lightweight tabular text formatting in contracts into XML CDATA (I had to learn XML structure today) and embed image blobs (signatures), and I'll allow it to consume all available processing resources, either on a workstation or within cloud compute resources, to parallelize the effort (or until OpenAI shoots me).

The only reason I wasn't done in 4 hours was that I was given a choice between xmlschema and lxml, and xmlschema gave me inaccurate results, making me think the XML generation was faulty.
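For anyone curious what that schema-validation step looks like, here is a minimal sketch using the xmlschema library; the schema path and the sample clause XML are placeholders, not the actual artifacts from this project:

    import xmlschema

    # Load the vendor-supplied XSD (placeholder file name).
    schema = xmlschema.XMLSchema("contract.xsd")

    generated_xml = "<clause id='42'><text>Payment is due in 30 days.</text></clause>"

    # is_valid() returns a bool; validate() raises with the reason on failure.
    if schema.is_valid(generated_xml):
        print("clause XML complies with the schema")
    else:
        try:
            schema.validate(generated_xml)
        except xmlschema.XMLSchemaValidationError as err:
            print("validation failed:", err.reason)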

The intention is to digest a quarter-million documents for a huge conversion process. I've reduced the process from 5 years of farming the work out to India to 41,000 processing hours (sequential), which should parallelize down to 41 hours. I've never done distributed processing with modern tools, but I suspect it's pretty easy.

Had I not been sidelined by the xmlschema library, I would be done and ready for scaled testing. I will be pulling contracts (again, not me, but OpenAI-generated software) from the SEC EDGAR website, which holds some contracts that are part of 10-K filings for materiality; I should be able to find a few hundred.

Again, would this have been 1,000 hours of work? 800? 100? Would it have been possible without AI? I do know I had a tool in 6 hours that was not possible 5 years ago, that solves a general problem, and that could be used tomorrow. It's not a "copilot"; I just asked the right questions.


How do you verify the correctness of the output? By correctness I do not mean complying with a schema, as you mentioned, but 100% output fidelity to the input content. That is, how do you know a clause is not missed here or there, or a superfluous one added here or there?

An LLM produces grammatically correct text (whether the grammar is English, XML, or source code) because that is what it is trained to do, with token-level consistency, after consuming the entirety of the Internet. That is a solved problem, and has been for many years, even before GPT. That is not the contention. The key question is: how do you know the generated output content matches your intention? Or do you not care, to a certain degree? If you don't care, then yours is not a good example. If you do care, then how do you verify? Human eyeballing? Then what's the gain? Writing a script to verify? If you manually write a verification script, that is equivalent to writing the converter without using an LLM. If you use an LLM to generate the verification script, how do you know the verification script is correct? Any program that a human can easily verify is not hard to write manually in the first place, and if it is hard for a human to verify the correctness of the verification script, what gain did you just achieve?

You alluded to "tests regenerability of the original contract to ensure no data loss". That point is not clear to me.

Did you regenerate the original contract from the output that was generated from the original contract? That sounds clever, but I would still not trust the correctness by default. Do you require the regenerated contract to be verbatim identical to the original? I find that hard to believe, and even if that is the case, I would still have to verify the intermediate transformation steps (programs, generated or not) to be fully confident. If you don't require verbatim identity, how do you verify that they are content-wise identical without yet another verification script? And who generates/writes/verifies that verification script?

The fact that an LLM can generate any XML/JSON content or conversion script for you is because such data or scripts have more or less appeared in its training set (the entire Internet). (Have you ever typed a question into Google search that Google failed to auto-complete? That shows that whatever you just thought of, someone else has already asked.) That does not mean the LLM found the solution for you by itself. An LLM does not have a solution-finding mechanism; it is a text-completion mechanism. It is helpful to keep that in mind.

If the LLM helps you in your particular case, then all the best to you. All in all, though, your particular example does not seem to fit the general sense of "software engineering" (if we are generous enough to view programming as an engineering discipline). I was a programmer for many years for Wall Street banks and did the type of work you describe as side errands, apart from the main line of development work. I admit that I wish I had had an LLM back then to help me parse a data feed and find something quickly on an ad-hoc basis. But without it, writing a script to parse one way or another was not hard to begin with, and certainly not comparable to actual mission-critical program development, which requires imagination, discipline, design, integration, and verification, all of which today's LLMs lack, precisely because they are not a solution-finding mechanism but a word-completion mechanism.


Every output is normalized to lowercase and space-collapsed, then compared to the original for clause detection; that's level 1. The output can be 100% concatenated to regenerate the input text. Element detection in each clause extracts named entities: level 2. The entities are extracted, typed, and substituted as variables in each clause and stored for template use: level 3. The XML for each clause can be round-tripped back and forth, as can the XML for variables. The typing is summed over many contracts to settle on one small set of common variable types (start date and stop date, or quantity/unit-of-measure types, for example). The variable strings are then typed (date string, integer): level 4; then values are stored: level 5. Clauses are given named types over many contracts and standardized: level-6 metadata. Contract types and subtypes are extracted and standardized over collections of contracts: level-7 metadata. I have separate passes to generate document metadata and conversion statistics. Clauses stripped of enumeration and variabilized are put into a clause library for later reconciliation, and the same goes for variable types/names and contract types/names. The output is also locked to a hash of the file, with some other security.
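In simplified form, the level-1 check amounts to something like the sketch below (illustrative only, not the production code; the normalization is just the lowercasing and whitespace collapsing described above):

    import re

    def normalize(text: str) -> str:
        # Lowercase and collapse every whitespace run to a single space.
        return re.sub(r"\s+", " ", text.lower()).strip()

    def regenerates_input(original: str, clauses: list[str]) -> bool:
        # Level 1: concatenating the extracted clauses must reproduce the input text,
        # so no clause was dropped and none was invented.
        return normalize("".join(clauses)) == normalize(original)

    contract = "Section 1. Payment is due in 30 days.\nSection 2. Late fees apply."
    clauses = ["Section 1. Payment is due in 30 days.\n", "Section 2. Late fees apply."]
    print(regenerates_input(contract, clauses))  # True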

Oh, and I extract images, drop in a conversion placeholder pointing to the file, and run image classification/recognition for conversion annotation.

I asked OpenAI to write the verifiers as it wrote the extractions.

I have a tool I generated which works the other way: I can start with a title and use OpenAI to expand it into an outline, paragraphs, paragraph elements, and sentences. It generates a fictional-character schema, it can link multiple volumes, and it can generate a multi-volume series, a novel, or a short story; the sections can hold multiple content types: sentences, verse, generated images, tables, and other strongly typed content. It can write any kind of book besides fiction: movie scripts, white papers, research papers, software, training manuals, whatever. I write books for friends.

As for work experience: I worked in banking and commodities trading out of my dorm room at Caltech when I was 19, in the early '80s. I used the internet as a playground when it was still ARPANET and you had to have a login to a NIC node. I have run IT departments, worked for CIOs, and been in the industry for almost 45 years. These tools do data architecture, solution architecture, and coding better than any person or team I've worked with globally in those 45 years. Personally, I've designed chips, built kernels, drivers, and application packages, and delivered or directed code delivery up to ERP scale, in supply chain, finance, product design, sales, service, and marketing.

These tools do the work in 1/1000 the time.

You can deny it, or you can figure out how to leverage it.
