2/7/2026
In 2025, a nonprofit called METR ran an actual randomized controlled trial on AI-assisted coding. Not a vibes survey, not a vendor benchmark, but a proper RCT with 16 experienced developers working on real codebases they had maintained for years, using Cursor Pro with Claude Sonnet.
The result? AI made them 19% slower. 🤯
I gotta say, this was supposed to be the bad news. The story that proved the skeptics right. But the more I sat with it, the more I think it's actually the most interesting piece of evidence we have for why agentic coding is here to stay, and why this whole shift feels different from previous hype cycles.
The slowdown wasn't about bad tools. It was about what happens when experienced developers actually engage with AI output responsibly. They spent time reviewing AI suggestions, prompting and re-prompting, waiting for generations, checking results against their own understanding of the codebase. They didn't blindly accept stuff. They applied judgment, and judgment takes time.
So that 19% overhead, I think, is just the cost of doing engineering properly. And it's the same thing that separates what Andrej Karpathy recently started calling "agentic engineering" from the reckless "vibe coding" he accidentally named a year ago.
the naming actually matters
Three days ago Karpathy posted a retrospective on the first anniversary of his vibe coding tweet. He seemed almost embarrassed by it, a shower thought that now has its own Wikipedia article. But the correction he offered wasn't just cosmetic. He proposed "agentic engineering" and defined it pretty precisely:
"agentic because the new default is that you are not writing the code directly 99% of the time, you are orchestrating agents who do and acting as oversight. Engineering to emphasize that there is an art and science and expertise to it. It's something you can learn and become better at, with its own depth of a different kind."
That last sentence is the one I keep coming back to. A different kind of depth. Not shallower than traditional coding, just different. You used to demonstrate competence by writing an elegant implementation, and now you demonstrate it by recognizing when the agent's implementation is subtly wrong, or by designing the constraints that stop it from going wrong in the first place.
Addy Osmani put it really well, I think: "Vibe coding = YOLO. Agentic engineering = AI does the implementation, human owns the architecture, quality, and correctness." That ownership is kinda the whole game, and it only works if you understand what you're owning.
The numbers actually line up with this. The Stack Overflow 2025 Survey (49,000 respondents!) found that developer trust in AI accuracy dropped from 40% to 29% year over year, even as adoption climbed to 84%. The top frustration, cited by 45% of devs, was AI solutions that are "almost right, but not quite." Two-thirds said they spend more time debugging AI output than expected. So basically, people are using these tools more and trusting them less, which sounds contradictory but isn't really. It's a profession learning that the hard part was never the typing.
antirez actually measured the impact
While Karpathy was finding the right vocabulary, antirez (Salvatore Sanfilippo, the creator of Redis) was kinda measuring what it looks like on the ground. In his post "Don't fall into the anti-AI hype" (which pulled 431,000 views), he documented what he actually built with Claude Code in the span of a few hours:
He reproduced weeks of Redis Streams work in about 20 minutes. He built a pure C library for BERT model inference in 5 minutes, 700 lines of code, same output as PyTorch, 15% slower. He fixed transient test failures in Redis, the kind of flaky timing-related bugs that are genuinely miserable to debug manually. He added UTF-8 support to his linenoise library along with a terminal emulation testing framework, something he'd wanted to do for years but couldn't justify the time investment.
His conclusion was pretty blunt: "Writing code is no longer needed for the most part. It is now a lot more interesting to understand what to do, and how to do it."
But (and this is the part I think gets glossed over when people quote that line) antirez didn't just hand Claude Code a vague prompt and accept whatever came back. He had design documents, years of context about his own codebases, and the ability to look at Claude's output and immediately tell when something was off. He inspected, gave guidance, iterated. He was already doing agentic engineering before the term existed. Give the same tools and the same prompts to an inexperienced developer and they'd ship garbage. Confidently. In 5 minutes.
That's what I think the METR study actually shows. AI doesn't replace judgment, it kinda amplifies whatever judgment you already have. If you have good judgment, it acts as leverage. If you don't, it makes you more confident about being wrong, which is honestly how bugs ship at scale.
the shift, from a personal angle
I've been living this transition in a very small, concrete way. I run a personal AI agent (OpenClaw) on a VPS, orchestrated through Telegram. I built the whole system from scratch: hardened the server after a malware scare (which I already wrote about in the previous post!), set up a Tailscale mesh for zero-port access, configured workspace files that shape the agent's personality and operational rules, built a custom Notion integration so it can publish blog posts directly. I switch between DeepSeek V3.2 for daily tasks and Kimi K2.5 for writing. I'm also building cron jobs for morning briefings, overnight research, and weekly behavioral audits.
And I gotta say: I am absolutely not vibe coding. Not a single line of any of this was written "by vibes." Every decision was an architecture decision. What model for what task. What tools the agent can access. What constraints it operates under. What it's allowed to do autonomously versus what requires my confirmation. The prompts matter, sure, but the system design matters way more.
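To make "every decision was an architecture decision" a bit more concrete, here's a minimal Python sketch of that kind of policy: which model handles which task, which tools exist at all, and which ones need a human yes first. To be clear, this is not OpenClaw's actual config format. The class, the tool names and the model labels are all made up for illustration, and the labels are not real API model IDs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentPolicy:
    """Hypothetical policy object; not part of OpenClaw or any real library."""

    model_for_task: dict[str, str]      # task kind -> model label
    allowed_tools: frozenset[str]       # tools the agent may call at all
    needs_confirmation: frozenset[str]  # allowed tools that still need a human yes
    default_model: str = "deepseek-v3.2"  # illustrative label, not an API model ID

    def pick_model(self, task_kind: str) -> str:
        # Unknown task kinds fall back to the default model.
        return self.model_for_task.get(task_kind, self.default_model)

    def authorize(self, tool: str) -> str:
        # Deny by default: anything not explicitly allowed is refused.
        if tool not in self.allowed_tools:
            return "deny"
        # Allowed but sensitive: hand the decision back to the human.
        if tool in self.needs_confirmation:
            return "confirm"
        return "allow"


# Example values are assumptions for illustration only.
POLICY = AgentPolicy(
    model_for_task={"daily": "deepseek-v3.2", "writing": "kimi-k2.5"},
    allowed_tools=frozenset({"read_notes", "web_search", "publish_post"}),
    needs_confirmation=frozenset({"publish_post"}),
)

print(POLICY.pick_model("writing"))
print(POLICY.authorize("publish_post"))
print(POLICY.authorize("run_shell"))
```

The design choice that matters here is the deny-by-default check in `authorize`: the agent only gets what was deliberately granted, and anything with side effects (like publishing a post) goes through a confirmation step.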
This is what I think the shift looks like for working software engineers. We are not becoming obsolete, we're kinda getting promoted to the job we should have been doing all along. Most of what we called "software engineering" was actually implementation labor: translating a known design into syntax. The design part, where you decide what to build and how the components interact and what tradeoffs you're willing to accept, that part was always the actual engineering. We just spent 90% of our time on the other thing because we had to.
Now we don't have to as much. The engineers who thrive will be the ones who were already good at the design part, or who develop that skill fast. The ones who struggle will be those whose entire value proposition was "I can write React components quickly" or "I memorized the API surface of Spring Boot." That's not really engineering, that's typing with context.
the pipeline problem (which kinda worries me)
OK but here's where my optimism hits a wall.
antirez wrote that he worries about people getting fired, and that's a totally legitimate concern. But I worry about something downstream that might actually be worse: where do the next senior engineers come from?
Every senior engineer I know, myself included, built their judgment by writing bad code for years. You learn what good architecture looks like by living through the consequences of bad architecture. You learn to spot subtle bugs because you've spent painful hours debugging similar ones by hand. You develop taste in code because you've read thousands of lines of it, both good and terrible, and built an intuitive sense of what's right.
If junior developers enter the industry in 2026 and their primary workflow is prompting agents and accepting output they don't fully understand, they'll ship code. They might even ship it fast. But they won't be building the muscle memory, the scar tissue, the hard-won intuition that turns a junior into a senior. Osmani flagged this as "dangerous skill atrophy" and honestly I think he's underselling it a bit. It's not really atrophy of existing skills, it's the prevention of skills that were never developed in the first place.
The METR study showed that AI slowed experienced developers down. Imagine what it does to inexperienced ones. They were never fast in the right ways to begin with, so the new pattern just makes them faster at producing things they can't evaluate. Which is just technical debt with a smiley face on it, basically.
antirez ended his post with something pretty generous: "The fun is still there, untouched." And he's right, for people like him. For someone with 20 years of systems programming intuition, AI is pure leverage. You finally get to spend all your time on the interesting problems. But for someone just starting out, the interesting problems are kinda invisible if you've never struggled with the boring ones first. Debugging a segfault for 6 hours teaches you something that no amount of "paste the error back into Claude" ever will.
where I think this goes
I don't think AI kills software engineering. I think it kills the version of software engineering that was already dying, which is the commodity implementation work that we pretended was engineering because it required a CS degree to do at a minimally competent level.
What replaces that is harder and (I think) way more valuable: system design, constraint definition, agent orchestration, quality ownership, failure mode analysis. The stuff that was always the real job, now finally visible because the noise of implementation has been stripped away.
Karpathy is right that agentic engineering is a skill you can learn. antirez is right that refusing to engage with these tools is self-sabotage. The METR study is right that this isn't free, and that the overhead of verification is where the actual engineering happens.
But we really do need to solve the pipeline problem. If we build a world where AI handles all the implementation and humans handle all the judgment, we need to figure out how humans develop that judgment without the implementation reps. Nobody has a good answer for this yet, and if we don't find one, we'll end up with a generation of "agentic engineers" who can orchestrate agents beautifully but can't tell when the agents are confidently, eloquently wrong.
Anyhow, that's the thing that's been keeping me up at night lately. If you've been thinking about this too, I'd love to hear from you.
Stay tuned, and keep on coding! ✨
Sources
- METR: Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity (July 2025, arXiv paper)
- Andrej Karpathy: Vibe Coding 1-Year Retrospective (February 4, 2026)
- Andrej Karpathy: 2025 LLM Year in Review (December 2025)
- antirez: Don't fall into the anti-AI hype (January 2026)
- antirez: Reflections on AI at the end of 2025 (January 2026)
- Addy Osmani: Agentic Engineering (February 2026)
- Stack Overflow 2025 Developer Survey (July 2025)
- Stack Overflow Blog: Developers remain willing but reluctant to use AI (December 2025)