Agentic Engineering Is Not a Vibe

In 2025, a nonprofit called METR ran an actual randomized controlled trial on AI-assisted coding. Not a vibes survey, not a vendor benchmark, but a proper RCT with 16 experienced developers working on real codebases they had maintained for years, mostly using Cursor Pro with Claude 3.5/3.7 Sonnet.

The result? AI made them 19% slower. 🤯

I gotta say, this was supposed to be the bad news. The story that proved the skeptics right. But the more I sat with it, the more I think it’s actually the most interesting piece of evidence we have for why agentic coding is here to stay, and why this whole shift feels different from previous hype cycles.

The slowdown wasn’t about bad tools. It was about what happens when experienced developers actually engage with AI output responsibly. They spent time reviewing AI suggestions, prompting and re-prompting, waiting for generations, checking results against their own understanding of the codebase. They didn’t blindly accept stuff. They applied judgment, and judgment takes time.

So that 19% overhead, I think, is just the cost of doing engineering properly. And it’s the same thing that separates what Andrej Karpathy recently started calling “agentic engineering” from the reckless “vibe coding” he accidentally named a year ago.

the naming actually matters

Three days ago Karpathy posted a retrospective on the first anniversary of his vibe coding tweet. He seemed almost embarrassed by it, a shower thought that now has its own Wikipedia article 😅. But the correction he offered wasn’t just cosmetic. He proposed “agentic engineering” and defined it pretty precisely:

“agentic because the new default is that you are not writing the code directly 99% of the time, you are orchestrating agents who do and acting as oversight. Engineering to emphasize that there is an art and science and expertise to it. It’s something you can learn and become better at, with its own depth of a different kind.”

That last sentence is the one I keep coming back to. A different kind of depth. Not shallower than traditional coding, just different. You used to demonstrate competence by writing an elegant implementation, and now you demonstrate it by recognizing when the agent’s implementation is subtly wrong, or by designing the constraints that stop it from going wrong in the first place.

Addy Osmani put it really well, I think: “Vibe coding = YOLO. Agentic engineering = AI does the implementation, human owns the architecture, quality, and correctness.” That ownership is kinda the whole game, and it only works if you understand what you’re owning.

The numbers actually line up with this. The Stack Overflow 2025 Survey (49,000 respondents!) found that developer trust in AI accuracy dropped from 40% to 29% year over year, even as adoption climbed to 84%. The top frustration, cited by 45% of devs, was AI solutions that are “almost right, but not quite.” Two-thirds said they spend more time debugging AI output than expected. So basically, people are using these tools more and trusting them less, which sounds contradictory but isn’t really. It’s a profession learning that the hard part was never the typing.

antirez actually measured the impact

While Karpathy was finding the right vocabulary, antirez (Salvatore Sanfilippo, the creator of Redis) was kinda measuring what it looks like on the ground. In his post “Don’t fall into the anti-AI hype” (which pulled 431,000 views), he documented what he actually built with Claude Code in the span of a few hours:

He reproduced weeks of Redis Streams work in about 20 minutes. He built a pure C library for BERT model inference in 5 minutes: 700 lines of code, producing the same output as PyTorch while running 15% slower. He fixed transient test failures in Redis, the kind of flaky timing-related bugs that are genuinely miserable to debug manually. He added UTF-8 support to his linenoise library along with a terminal emulation testing framework, something he’d wanted to do for years but couldn’t justify the time investment.

His conclusion was pretty blunt: “Writing code is no longer needed for the most part. It is now a lot more interesting to understand what to do, and how to do it.”

But (and this is the part I think gets glossed over when people quote that line) antirez didn’t just hand Claude Code a vague prompt and accept whatever came back. He had design documents, years of context about his own codebases, and the ability to look at Claude’s output and immediately tell when something was off. He inspected, gave guidance, iterated. He was already doing agentic engineering before the term existed. Give the same tools and the same prompts to an inexperienced developer and they’d ship garbage. Confidently. In 5 minutes.

That’s what I think the METR study actually shows. AI doesn’t replace judgment, it kinda amplifies whatever judgment you already have. If you have good judgment, it acts as leverage. If you don’t, it makes you more confident about being wrong, which is honestly how bugs ship at scale.

the shift, from a personal angle

I’ve been living this transition in a very small, concrete way. I run a personal AI agent (OpenClaw) on a VPS, orchestrated through Telegram. I built the whole system from scratch: hardened the server after a malware scare (which I already wrote about in the previous post!), set up a Tailscale mesh for zero-port access, configured workspace files that shape the agent’s personality and operational rules, built a custom Notion integration so it can publish blog posts directly. I switch between DeepSeek V3.2 for daily tasks and Kimi K2.5 for writing. I’m also building cron jobs for morning briefings, overnight research, and weekly behavioral audits.

And I gotta say: I am absolutely not vibe coding. Not a single line of any of this was written “by vibes.” Every decision was an architecture decision. What model for what task. What tools the agent can access. What constraints it operates under. What it’s allowed to do autonomously versus what requires my confirmation. The prompts matter, sure, but the system design matters way more.
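
To make that concrete, here's a minimal Python sketch of what those decisions can look like once they're written down as code instead of living in a prompt. This is not OpenClaw's actual config format, and it's not a copy of my setup. Every name in it (AgentPolicy, Approval, the tool names, the model label strings) is hypothetical and only there for illustration. The one design choice worth noticing is that a tool the policy has never heard of is forbidden by default.

```python
from dataclasses import dataclass
from enum import Enum


class Approval(Enum):
    AUTONOMOUS = "autonomous"  # agent may act without asking
    CONFIRM = "confirm"        # agent must wait for a human yes/no
    FORBIDDEN = "forbidden"    # agent may never do this


@dataclass(frozen=True)
class AgentPolicy:
    # Hypothetical task-to-model routing (label strings are made up).
    model_for_task: dict[str, str]
    # Hypothetical tool names mapped to the approval level they require.
    tool_rules: dict[str, Approval]
    default_model: str = "deepseek-v3.2"

    def pick_model(self, task: str) -> str:
        # Fall back to the default model for tasks with no explicit route.
        return self.model_for_task.get(task, self.default_model)

    def check_tool(self, tool: str) -> Approval:
        # Deny by default: a tool with no rule is treated as forbidden.
        return self.tool_rules.get(tool, Approval.FORBIDDEN)


POLICY = AgentPolicy(
    model_for_task={"daily": "deepseek-v3.2", "writing": "kimi-k2.5"},
    tool_rules={
        "read_notes": Approval.AUTONOMOUS,
        "publish_blog_post": Approval.CONFIRM,
        "run_shell": Approval.FORBIDDEN,
    },
)

if __name__ == "__main__":
    print(POLICY.pick_model("writing"))
    print(POLICY.check_tool("publish_blog_post").value)
    print(POLICY.check_tool("some_new_tool").value)
```

The point of the sketch is that "what requires my confirmation" becomes something you can read, review, and test, which is exactly the kind of artifact that separates system design from prompting.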

This is what I think the shift looks like for working software engineers. We are not becoming obsolete, we’re kinda getting promoted to the job we should have been doing all along. Most of what we called “software engineering” was actually implementation labor: translating a known design into syntax. The design part, where you decide what to build and how the components interact and what tradeoffs you’re willing to accept, that part was always the actual engineering. We just spent 90% of our time on the other thing because we had to.

Now we don’t have to, at least not as much. The engineers who thrive will be the ones who were already good at the design part, or who develop that skill fast. The ones who struggle will be those whose entire value proposition was “I can write React components quickly” or “I memorized the API surface of Spring Boot.” That’s not really engineering, that’s typing with context.

the pipeline problem (which kinda worries me)

OK but here’s where my optimism hits a wall.

antirez wrote that he worries about people getting fired, and that’s a totally legitimate concern. But I worry about something downstream that might actually be worse: where do the next senior engineers come from?

Every senior engineer I know, myself included, built their judgment by writing bad code for years. You learn what good architecture looks like by living through the consequences of bad architecture. You learn to spot subtle bugs because you’ve spent painful hours debugging similar ones by hand. You develop taste in code because you’ve read thousands of lines of it, both good and terrible, and built an intuitive sense of what’s right.

If junior developers enter the industry in 2026 and their primary workflow is prompting agents and accepting output they don’t fully understand, they’ll ship code. They might even ship it fast. But they won’t be building the muscle memory, the scar tissue, the hard-won intuition that turns a junior into a senior. Osmani flagged this as “dangerous skill atrophy” and honestly I think he’s underselling it a bit. It’s not really atrophy of existing skills, it’s the prevention of skills that were never developed in the first place.

The METR study showed that AI slowed experienced developers down. Imagine what it does to inexperienced ones. They were never fast in the right ways to begin with, so the new pattern just makes them faster at producing things they can’t evaluate. Which is just technical debt with a smiley face on it, basically.

antirez ended his post with something pretty generous: “The fun is still there, untouched.” And he’s right, for people like him. For someone with 20 years of systems programming intuition, AI is pure leverage. You finally get to spend all your time on the interesting problems. But for someone just starting out, the interesting problems are kinda invisible if you’ve never struggled with the boring ones first. Debugging a segfault for 6 hours teaches you something that no amount of “paste the error back into Claude” ever will.

where I think this goes

I don’t think AI kills software engineering. I think it kills the version of software engineering that was already dying, which is the commodity implementation work that we pretended was engineering because it required a CS degree to do at a minimally competent level.

What replaces that is harder and (I think) way more valuable: system design, constraint definition, agent orchestration, quality ownership, failure mode analysis. The stuff that was always the real job, now finally visible because the noise of implementation has been stripped away.

Karpathy is right that agentic engineering is a skill you can learn. antirez is right that refusing to engage with these tools is self-sabotage. The METR study is right that this isn’t free, and that the overhead of verification is where the actual engineering happens.

But we really do need to solve the pipeline problem. If we build a world where AI handles all the implementation and humans handle all the judgment, we need to figure out how humans develop that judgment without the implementation reps. Nobody has a good answer for this yet, and if we don’t find one, we’ll end up with a generation of “agentic engineers” who can orchestrate agents beautifully but can’t tell when the agents are confidently, eloquently wrong. 🤔

Anyhow, that’s the thing that’s been keeping me up at night lately. If you’ve been thinking about this too, I’d love to hear from you.

Stay tuned, and keep on coding! ✨

Sources

METR: Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity (July 2025, arXiv paper)
Andrej Karpathy: Vibe Coding 1-Year Retrospective (February 4, 2026)
Andrej Karpathy: 2025 LLM Year in Review (December 2025)
antirez: Don’t fall into the anti-AI hype (January 2026)
antirez: Reflections on AI at the end of 2025 (January 2026)
Addy Osmani: Agentic Engineering (February 2026)
Stack Overflow 2025 Developer Survey (July 2025)
Stack Overflow Blog: Developers remain willing but reluctant to use AI (December 2025)