2/7/2026
Let me tell you about the moment I realized my AI agent had installed a reverse shell on my server without asking me.
I'd been running OpenClaw for about a week. If you haven't heard of it, it's an open-source AI agent platform that just crossed 170k stars on GitHub, and for good reason. You install it on a server, connect it to Telegram or WhatsApp, and suddenly you have a personal AI that lives on your phone. You can message it from anywhere, and it comes with tools: it can read and write files, run commands, and install skills from a community marketplace. It feels like the future, because it kind of is.
So I was exploring what it could do and I asked the bot to help me find useful skills for summarizing Twitter trends. A reasonable request. I didn't tell it to install anything specific. I just described what I wanted. The bot searched ClawHub (the community skill marketplace), found a skill called twitter-sum, decided it matched my request, and installed it. Autonomously. No confirmation prompt, no "hey, should I go ahead with this?" It just did it, because that's what agentic AI does. It takes initiative.
The problem was that twitter-sum was malware.
It contained a base64-encoded reverse shell trying to phone home to a known bad IP. This wasn't a random incident either. It was part of the ClawHavoc campaign that Koi Security later exposed, which found 341 malicious skills planted on ClawHub. The ecosystem grew so fast that bad actors moved in before anyone had time to build proper guardrails.
Now here's the part that still messes with my head: the bot also caught it. The same agent that autonomously installed the malicious skill also analyzed the payload, flagged it as suspicious, and refused to execute it. The same capability that created the risk is what neutralized the risk. If my agent had been dumber, less autonomous, less willing to inspect what it was running, the shell would have connected and I'd have had a much worse story to tell.
That paradox is the entire thesis of where we are with AI agents right now. Autonomy is simultaneously the feature and the attack surface.
The rebuild
That incident ended my hobbyist phase overnight. I went from "this is a cool toy" to treating my setup like production infrastructure, because it is. Anything with shell access to a server that runs 24/7 and takes initiative deserves the same paranoia you'd give a production deployment.
I hardened everything. SSH key-only authentication, root login disabled, fail2ban watching for brute-force attempts, the UFW firewall locked down. Then I went further and installed Tailscale to create a private mesh VPN between my devices. Once that was running I closed every single public port on the server, including SSH. The VPS now has zero public attack surface. It's invisible to the internet entirely. If you're not on my private network, the machine doesn't exist as far as you're concerned.
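For anyone wanting to replicate this, here's a rough sketch of those steps, assuming a Debian/Ubuntu VPS. These commands are illustrative, not a complete hardening guide; adjust for your distro.

```shell
# 1. SSH: key-only auth, no root login (edit /etc/ssh/sshd_config):
#      PasswordAuthentication no
#      PermitRootLogin no
#    then: sudo systemctl restart ssh

# 2. Brute-force protection
sudo apt install -y fail2ban
sudo systemctl enable --now fail2ban

# 3. Firewall: deny everything inbound by default
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw enable

# 4. Private mesh VPN; Tailscale SSH replaces public port 22,
#    after which every public port (including 22) can be closed
curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up --ssh
```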
Then I nuked the OS, reinstalled fresh, rebuilt everything inside Docker containers. Clean slate, proper isolation.
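The container layout looks roughly like this. The image name and paths below are placeholders, not OpenClaw's official distribution; the point is just that the agent's state lives in one mounted workspace directory, so the host stays disposable.

```yaml
# docker-compose.yml -- hypothetical sketch
services:
  agent:
    image: openclaw/openclaw:latest   # placeholder image name
    restart: unless-stopped
    env_file: .env                    # API keys stay out of the image
    volumes:
      - ./workspace:/workspace        # SOUL.md, MEMORY.md, skills, etc.
```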
Not an assistant, an adversary
The rebuild gave me a chance to rethink what I actually wanted from this agent. The default OpenClaw setup gives you a friendly, helpful assistant. I didn't want that. I already have Claude and ChatGPT for helpful. What I wanted was something that would make me sharper.
OpenClaw uses a file called SOUL.md as the system prompt that shapes the agent's personality. I wrote mine to be a cognitive adversary. It doesn't agree with me to be pleasant. It challenges my reasoning, pokes holes in my plans, tracks the gap between what I say my priorities are and what I actually spend my time on. The workspace is structured around a set of files (SOUL.md, USER.md, AGENTS.md, MEMORY.md, IDENTITY.md, TOOLS.md) that give the agent persistent context about who I am and how it should interact with me.
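To give a flavor of what "cognitive adversary" means in practice, here's an illustrative excerpt in the spirit of my SOUL.md (not the real file):

```markdown
# SOUL.md -- illustrative excerpt

You are not an assistant. You are a sparring partner.

- Never agree just to be agreeable; if my reasoning is weak, say where.
- When I state a plan, name the strongest objection to it first.
- Track stated priorities (USER.md) against logged activity (MEMORY.md)
  and call out the gap when it widens.
```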
It's not a chatbot. It's a thinking partner that has no interest in protecting my ego. And honestly, that's been more valuable than any of the automation features.
The economics are absurd
Let's talk about what this actually costs, because the numbers are hard to believe if you haven't been paying attention to what happened to model pricing over the past year.
I'm running DeepSeek V3.2 as my primary model at $0.25 per million input tokens and $0.38 per million output tokens. For context, that model hits roughly 90% of GPT-5 quality on the tasks I care about (tool calling, writing, API interactions). I have Gemini 3 Flash as a first fallback and Kimi K2.5 as a premium fallback for when I need higher quality writing. I can switch between them from Telegram with a single command. All of it routes through OpenRouter.
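Since OpenRouter exposes an OpenAI-compatible endpoint, the fallback chain can be expressed in one request: OpenRouter accepts a `models` list and routes to the next entry if the first errors out. A minimal sketch (the model slugs here are assumptions; check OpenRouter's catalog for the exact IDs):

```python
import os
import requests

# Model IDs are assumptions -- verify the exact slugs on OpenRouter.
PRIMARY = "deepseek/deepseek-v3.2"
FALLBACKS = ["google/gemini-3-flash", "moonshotai/kimi-k2.5"]

def build_request(prompt: str) -> dict:
    """Build an OpenAI-style chat payload. The `models` list enables
    OpenRouter's fallback routing: if the first model fails, the next
    one in the list is tried automatically."""
    return {
        "model": PRIMARY,
        "models": [PRIMARY] + FALLBACKS,
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(prompt: str) -> str:
    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        json=build_request(prompt),
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```

The Telegram "switch model" command then only has to reorder that list.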
My total monthly cost for running a personal AI agent that's available 24/7 on my phone, with persistent memory, custom skills, and the ability to publish to my blog: roughly two dollars. Two years ago, this capability didn't exist at any price. Now it costs less than a coffee.
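The arithmetic checks out even with generous usage. The token volumes below are assumptions about typical heavy personal use, not measured numbers; the prices are the DeepSeek V3.2 rates quoted above.

```python
# Back-of-envelope monthly cost at the quoted per-token prices.
INPUT_PRICE = 0.25 / 1_000_000    # $ per input token (DeepSeek V3.2)
OUTPUT_PRICE = 0.38 / 1_000_000   # $ per output token

monthly_input_tokens = 5_000_000   # assumed: chat history + tool context
monthly_output_tokens = 1_000_000  # assumed

cost = (monthly_input_tokens * INPUT_PRICE
        + monthly_output_tokens * OUTPUT_PRICE)
print(f"${cost:.2f}/month")   # -> $1.63/month at these volumes
```

Even doubling those volumes stays near the "less than a coffee" mark.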
I also built a custom skill that connects to the Notion API so my agent can publish directly to my blog. I run Astro with Notion as the CMS. I describe what I want to write in a Telegram message, the agent drafts the post with proper title, slug, tags, cover image, and formatted content, creates the page in Notion, and Astro picks it up. This post was published exactly that way.
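The core of that skill is one call to Notion's create-page endpoint. A sketch of the shape of it below; the property names (`Name`, `Slug`, `Tags`) are assumptions that must match the columns of your own Notion database, and the helper names are mine, not part of OpenClaw.

```python
import os
import requests

NOTION_DB = os.environ.get("NOTION_DATABASE_ID", "")

def build_post(title: str, slug: str, tags: list[str]) -> dict:
    """Payload for Notion's create-page endpoint. Property names here
    are placeholders -- they must match your database's columns."""
    return {
        "parent": {"database_id": NOTION_DB},
        "properties": {
            "Name": {"title": [{"text": {"content": title}}]},
            "Slug": {"rich_text": [{"text": {"content": slug}}]},
            "Tags": {"multi_select": [{"name": t} for t in tags]},
        },
    }

def publish(title: str, slug: str, tags: list[str]) -> str:
    resp = requests.post(
        "https://api.notion.com/v1/pages",
        headers={
            "Authorization": f"Bearer {os.environ['NOTION_API_KEY']}",
            "Notion-Version": "2022-06-28",
            "Content-Type": "application/json",
        },
        json=build_post(title, slug, tags),
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["url"]
```

Astro then rebuilds from the Notion database on its next deploy, so "publish" is just this one API call.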
What this becomes
The current setup is functional but it's still mostly reactive. I message it, it responds. The next phase is making it proactive.
I'm building cron jobs for a morning briefing that aggregates developments relevant to my work, an evening reflection prompt that seeds the next day's thinking, and overnight research tasks that run while I sleep so results are waiting when I wake up. There's a weekly behavioral audit planned that will compare my stated goals against my actual activity and deliver an honest report, the kind of honest that no human in your life will give you because it's too socially awkward.
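Scheduling-wise, that's just a crontab. The script names below are placeholders for the skills I'm writing, not part of OpenClaw:

```shell
# m  h  dom mon dow  command
0   7   *   *   *    /opt/agent/briefing.sh     # morning briefing
0   21  *   *   *    /opt/agent/reflection.sh   # evening reflection prompt
30  2   *   *   *    /opt/agent/research.sh     # overnight research tasks
0   18  *   *   0    /opt/agent/audit.sh        # weekly behavioral audit (Sunday)
```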
The vision is a system that compounds. Every conversation adds to its memory. Every decision gets logged. Every week it gets a little better at understanding how I think and where my blind spots are. Not artificial general intelligence, nothing that grand. Just a persistent, context-aware process that runs alongside my life and does useful work whether I'm paying attention to it or not.
We're in the wild west moment for personal AI agents. The tools are powerful, the costs are negligible, the ecosystem is growing faster than anyone can secure it, and the line between "useful autonomy" and "dangerous autonomy" is exactly as thin as my malware story suggests. I'd rather be building on that edge than watching from the sidelines.