Operations, AI Strategy | 26 February 2026 | 5 min read | Lee Leckenby // AI Systems Builder

Shipping agents is easy. Running them is not.

What it actually takes to run agents in production: stability, memory, governance.

// FOCUS

Operational reality of agent systems

// AUDIENCE

Builders, operators, and AI-native product people

// FORMAT

Article

I stood up OpenClaw on a VPS in thirty minutes.
Hardening it took days.

That gap is the real system boundary.

What looked like a quick infrastructure task forced me to design an operating model.

1. The naive start

VPS. Docker. Compose. Reverse proxy.

It ran.

That is where most builders stop.

I treated uptime as proof. It wasn’t.
It only proved the process survived launch. It said nothing about whether it could survive change.

2. The first cracks

Memory uncertainty. Restart noise. Permissions drifting.

The system appeared functional, but it could not prove its own integrity.

I realised I had built a demo loop.
It looked alive. It was not durable.
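One way to make "restart noise" measurable is to read restart counts directly from Docker. This is an illustration, not the setup described here. It assumes the JSON array that `docker inspect` prints, where each container object has a top-level `RestartCount` and a `State.Restarting` flag. The threshold of three restarts is an arbitrary placeholder.

```python
import json

# Hypothetical threshold: more restarts than this is treated as noise worth
# investigating, not normal operation.
RESTART_THRESHOLD = 3


def noisy_containers(inspect_json: str, threshold: int = RESTART_THRESHOLD) -> list[str]:
    """Return names of containers that restarted too often or are restarting now.

    Expects the JSON array printed by `docker inspect <id> [<id> ...]`.
    """
    noisy = []
    for container in json.loads(inspect_json):
        # Docker prefixes container names with "/"; strip it for readability.
        name = container.get("Name", "").lstrip("/")
        restarts = container.get("RestartCount", 0)
        restarting = container.get("State", {}).get("Restarting", False)
        if restarts > threshold or restarting:
            noisy.append(name)
    return noisy


if __name__ == "__main__":
    # Hypothetical sample data, not real logs.
    sample = json.dumps([
        {"Name": "/gateway", "RestartCount": 0, "State": {"Restarting": False}},
        {"Name": "/mem0", "RestartCount": 7, "State": {"Restarting": False}},
    ])
    print(noisy_containers(sample))  # ['mem0']
```

Feed it the output of `docker inspect $(docker ps -aq)`. A non-empty result is a reason to go and look, not a verdict.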

3. Stability is a different problem

The difference between “it runs” and “it survives” is operations.

Not features.

So I added the unglamorous layer:

  • Explicit firewall policy with restricted ingress

  • Reverse proxy with TLS termination

  • Secrets managed in Compose

  • Persistent volumes with ownership control

  • Automated cleanup with systemd timers

Individually, each step is boring.

Collectively, they are the product.
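Some of these layers can be checked automatically. Here is a minimal sketch. It assumes the Compose file has already been parsed into a dict, for example with PyYAML's `yaml.safe_load`. It flags two things: environment variables whose names suggest an inline secret, and published ports not bound to loopback on any service other than the reverse proxy. The secret-name hints and the `proxy` service name are hypothetical, so adjust them to the actual stack.

```python
# Hypothetical hints: env var names containing these look like secrets.
SECRET_HINTS = ("KEY", "TOKEN", "SECRET", "PASSWORD")
LOOPBACK_PREFIXES = ("127.0.0.1:", "[::1]:")


def _env_items(env):
    """Normalise Compose `environment` (mapping or list of "K=V") to pairs."""
    if isinstance(env, dict):
        return list(env.items())
    pairs = []
    for entry in env or []:
        key, _, value = entry.partition("=")
        pairs.append((key, value))
    return pairs


def _is_loopback(port) -> bool:
    """True if a published port binds only to loopback (short or long syntax)."""
    if isinstance(port, dict):  # long syntax: {"target": ..., "host_ip": ...}
        return port.get("host_ip") in ("127.0.0.1", "::1")
    return str(port).startswith(LOOPBACK_PREFIXES)


def audit_compose(config: dict, public_services=("proxy",)) -> list[str]:
    """Return human-readable findings for inline secrets and exposed ports."""
    findings = []
    for name, svc in config.get("services", {}).items():
        for key, value in _env_items(svc.get("environment")):
            # Empty or pass-through values (KEY with no value) are skipped.
            if value and any(hint in key.upper() for hint in SECRET_HINTS):
                findings.append(f"{name}: {key} set inline; use Compose secrets")
        if name not in public_services:
            for port in svc.get("ports", []):
                if not _is_loopback(port):
                    findings.append(f"{name}: port {port} published on all interfaces")
    return findings
```

The secret check is deliberately blunt. It also flags `${VAR}` interpolation, because that value still ends up in the container's environment rather than in a Compose secret mounted as a file.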

4. Memory is everything

Without persistent, validated memory, an agent is stateless theatre.

I configured mem0 OSS. Proved read and write. Tracked permission drift. Added timed repair.

That shifted the work.

I stopped guessing.
I started instrumenting.
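Proving read and write can be reduced to a probe against the store's data directory. This is a minimal sketch. It assumes the memory store persists to a local directory on a mounted volume, and it tests that filesystem path, not mem0's own API. The directory and expected UID you pass in are placeholders.

```python
import os
import tempfile
import uuid
from pathlib import Path


def probe_store(data_dir: str, expected_uid: int) -> dict:
    """Check that `data_dir` is owned by `expected_uid` and supports a write/read round trip.

    `data_dir` and `expected_uid` are placeholders for wherever the memory
    store persists its data and whichever user the container runs as.
    """
    result = {"owner_ok": False, "write": False, "read": False}
    path = None
    try:
        result["owner_ok"] = Path(data_dir).stat().st_uid == expected_uid
        # Write a unique token and read it back: a match proves the round trip,
        # not merely that some file exists.
        token = uuid.uuid4().hex
        fd, path = tempfile.mkstemp(prefix=".probe-", dir=data_dir)
        with os.fdopen(fd, "w") as fh:
            fh.write(token)
        result["write"] = True
        result["read"] = Path(path).read_text() == token
    except OSError as exc:
        result["error"] = str(exc)
    finally:
        # Always remove the sentinel so probes do not accumulate in the store.
        if path is not None:
            try:
                os.remove(path)
            except OSError:
                pass
    return result
```

A probe like this is the kind of check a systemd timer can run on a schedule. It pairs naturally with whatever repair step suits the deployment.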

5. Configuration drift is real

AI infrastructure drifts quietly.

Ownership flips.
Environment variables diverge.
Providers auto-detect and change behaviour.

If I do not instrument it, I do not own it.

Configuration is not documentation.
It is executable constraint.
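Treating configuration as an executable constraint can start with two small checks. The first compares the live environment against pinned values. The second lists files whose owner no longer matches the expected user and group. This sketch rests on assumptions: variable names such as `LLM_PROVIDER` are hypothetical placeholders, not settings any particular agent framework reads. Pinning the provider explicitly is one way to stop auto-detection from silently changing behaviour.

```python
import os
from pathlib import Path

# Hypothetical pinned values; the names are placeholders, not a framework's real settings.
PINNED = {
    "LLM_PROVIDER": "openrouter",
    "OPENROUTER_BASE_URL": "https://openrouter.ai/api/v1",
}


def env_drift(expected: dict[str, str], actual=None) -> dict[str, tuple]:
    """Return {name: (expected, actual)} for every pinned variable that is missing or differs."""
    actual = os.environ if actual is None else actual
    return {k: (v, actual.get(k)) for k, v in expected.items() if actual.get(k) != v}


def ownership_drift(root: str, uid: int, gid: int) -> list[str]:
    """List `root` and any paths under it whose owner is no longer uid:gid."""
    drifted = []
    for path in [Path(root), *Path(root).rglob("*")]:
        st = path.lstat()  # lstat: report symlinks themselves, do not follow them
        if (st.st_uid, st.st_gid) != (uid, gid):
            drifted.append(str(path))
    return drifted
```

Both functions only report. Whether drift triggers an alert, a repair job, or a refusal to start is a separate decision.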

6. Governance is not theatre

The logs showed dangerous configuration flags.

It would have been easy to ignore them.

Instead, I treated them as an attack surface.

Governance is not a policy document.
It is a set of constraints that prevent bad states from existing in the first place.

7. The hidden layer

The final ten percent is not optional. It is the difference between a demo and a platform.

  • MCP wiring so tools actually execute

  • Gateway plumbing: domain, WebSocket URL, canvas mount, exposed host port

  • Provider routing: OpenRouter base URL, merge mode, fallback chain

  • Integrations: Telegram bot, Google OAuth services

  • Data stores: mem0 paths and volumes

None of this is glamorous.

All of it is required.
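Two pieces of this plumbing lend themselves to small, testable checks. One confirms that the gateway's WebSocket URL agrees with the public domain. The other is a provider fallback chain that tries each route in order. This sketch assumes TLS terminates at the reverse proxy, so clients should connect over `wss://`. The domain, provider names, and callables are hypothetical, and this is not how any specific router implements fallback.

```python
from typing import Callable, Sequence
from urllib.parse import urlparse


def check_gateway(domain: str, ws_url: str) -> list[str]:
    """Flag common mismatches between the public domain and the WebSocket URL."""
    problems = []
    parsed = urlparse(ws_url)
    if parsed.scheme != "wss":
        problems.append(f"WebSocket URL uses {parsed.scheme!r}; expected 'wss' behind TLS")
    if parsed.hostname != domain:
        problems.append(f"WebSocket host {parsed.hostname!r} does not match domain {domain!r}")
    return problems


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain has failed."""


def call_with_fallback(
    prompt: str, chain: Sequence[tuple[str, Callable[[str], str]]]
) -> tuple[str, str]:
    """Try each (name, call) in order; return (name, reply) from the first success."""
    errors = []
    for name, call in chain:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real router would narrow this to transport/rate-limit errors
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))
```

Returning the provider name with the reply keeps fallback visible in logs. Otherwise a silent switch to a backup route looks like normal operation.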

8. The bigger lesson

AI agents are distributed systems with state, identity, and attack surface.

Solo builders now carry platform-level responsibility.

The advantage will not belong to whoever spins up agents fastest.

It will belong to whoever keeps them stable, secure, and aligned over time.

I did not just build an agent.

I built infrastructure.
Observability.
Governance scaffolding.
Operational muscle.

That is the work.

And that is the edge.