Discussion about this post

Kathleen Weber

When I learned that an LLM is simply an average of the Internet, I knew it could never be anything more than a large but shallow pool of mediocrity.

hugh

That benchmark is absurd. How well an LLM does on coding tasks has everything to do with what it's been trained and fine-tuned on. Putting time-to-complete-task on the y-axis is silly.

Coding LLMs are all fine-tuned on popular stacks (e.g. React/Next, Tailwind, FastAPI) and common patterns, so you can have an LLM cook up a React component that would have taken a dev a few days, even weeks or months, as long as what you need isn't too far from the training distribution (think dropdowns, profile pages, comment sections, CRUD endpoints, etc.). Outside that distribution, it's mostly garbage code that will need to be rewritten.

It's also quite hard to tell where the edge of the distribution is. In my experience, I've been surprised by how many basic tasks Claude Code falls apart on.

Of course, the irony is that if you don't know what you're doing (most vibe coders don't), you'll generate code of much lower quality than the equivalent open-source library and take far longer to get it working right.

Even the idea that you can accurately measure the time a coding task will take is laughable to any professional software engineer. SWE work isn't like construction; timing is very hard to estimate.
