AI Systems · Agent Architecture · Self-Hosting
10 March 2026 · 6 min read
Lee Leckenby // System Builder

Running agents locally is a different kind of control

After running OpenClaw on a VPS, I rebuilt the stack locally on a Mac Mini. The system works well now, but the real lessons came from operating it daily: architecture decisions, memory design, security boundaries, and the hidden cost of context.

// FOCUS

AI agent operations, local infrastructure, system architecture

// AUDIENCE

Builders, operators, and AI-native product people

// FORMAT

Article

In the last article I wrote about running OpenClaw on a VPS.

It worked.

But it never really felt like mine.

So I rebuilt the entire stack locally on a Mac Mini.

Not because it was easier.

Because it gave me full control of the environment.

And once you start running agents every day, control becomes the real requirement.

The system quickly stops being “an agent”

Standing up OpenClaw is straightforward.

Run the wizard.
Create an agent.
Connect a channel.

Done.

But once the system becomes part of your workflow, new problems appear.

Memory.
Costs.
Context size.
Where work outputs live.
How agents interact with tools.

You stop building an agent.

You start operating a system.

Architecture overview

The easiest way to understand the setup is as layers.

At the top sit the agents: Kodi, Rook, and Lab, each with a distinct role.

Beneath them, the OpenClaw gateway connects those agents to memory, tooling, automation, and the underlying workspace and provider infrastructure that makes the system work.

```mermaid
flowchart TD
    TG[Telegram Interface]
    TS[Tailscale Access]

    K[Kodi Operator Agent]
    R[Rook Revenue Agent]
    L[Lab Model Testing Agent]

    G[OpenClaw Gateway]

    Q[QMD Memory and Session State]
    A[Cron Heartbeat and Ollama Jobs]
    T[Local Tooling and Providers]

    C[Codex Cursor Gemini NotebookLM]
    P[Anthropic OpenRouter Ollama]
    X[Supabase GitHub Google]

    V[Obsidian Vault Kodi Workspace]
    W[Rook Revenue Workspace and Vault Mirror]

    TG --> K
    TG --> R
    TG --> L

    TS --> G

    K --> G
    R --> G
    L --> G

    G --> Q
    G --> A
    G --> T

    T --> C
    T --> P
    T --> X

    Q --> V
    Q --> W

    A --> V
    A --> W
```

Splitting agents changes everything

The base setup assumes one agent.

That breaks down quickly.

So the system now runs three.

Kodi handles operations and coordination.
Rook scouts ideas and revenue opportunities.
Lab is a safe place to test models and fallback strategies.

Each agent has isolated state, model routing, and its own Telegram bot interface.

That separation removed a surprising amount of noise.

Experiments no longer break production behaviour.
Cheap tasks no longer trigger expensive models.
Debugging becomes much easier.
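The routing side of that separation can be sketched as a small table: each agent maps to a default model plus a cheap tier, so housekeeping never hits an expensive provider. This is illustrative only; the model identifiers and tier names here are assumptions, not OpenClaw's actual routing config.

```python
# Hypothetical per-agent model routing table. Agent names mirror the article;
# the model identifiers and tiers are placeholders, not real config.
AGENT_ROUTES = {
    "kodi": {"default": "anthropic/claude-sonnet", "cheap": "ollama/llama3"},
    "rook": {"default": "openrouter/gpt-4o-mini", "cheap": "ollama/llama3"},
    "lab":  {"default": "openrouter/experimental", "cheap": "ollama/llama3"},
}

def route_model(agent: str, task_tier: str = "default") -> str:
    """Pick a model for an agent; unknown tiers fall back to the default."""
    routes = AGENT_ROUTES[agent]
    return routes.get(task_tier, routes["default"])
```

The point of the table is that a cheap task on Rook can never accidentally route to an expensive model, because the tier is chosen per call, not per conversation.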

The vault is the control plane

Instead of the default OpenClaw workspace, the main agent operates inside my Obsidian vault.

The vault runs an IPARAG structure, a digital organisation framework that expands on Tiago Forte’s popular PARA method.

Ideas.
Projects.
Areas.
Resources.
Archive.
Governance.

That structure matters more than it sounds.

Agents read from it.
Write to it.
Organise work inside it.

Humans and agents are operating inside the same system.

That shared environment makes coordination far simpler.
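The IPARAG skeleton itself is just six top-level folders, which is part of why agents can navigate it. A minimal sketch, assuming the vault is an ordinary folder of Markdown files (which is all Obsidian requires):

```python
# Minimal sketch: lay down the six IPARAG folders in an Obsidian vault.
# The vault root path is whatever you point it at; Obsidian treats any
# folder of Markdown files as a vault.
from pathlib import Path

IPARAG = ["Ideas", "Projects", "Areas", "Resources", "Archive", "Governance"]

def scaffold_vault(root: str) -> list[Path]:
    """Create the top-level IPARAG folders and return their paths."""
    created = []
    for name in IPARAG:
        folder = Path(root) / name
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```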

Memory has to be inspectable

QMD-backed retrieval plus file-based memory keeps the system inspectable.

That gives the agents durable context.

But more importantly, it keeps the memory visible.

You can see what the agent knows.
Correct it.
Remove it.

Opaque memory systems make that almost impossible.
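The file-based half of that design can be sketched in a few lines. This is not QMD's retrieval layer, just an illustration of the property that matters: every memory is a line in a plain file you can read, correct, or delete with a text editor.

```python
# Sketch of a human-inspectable, file-backed memory log (JSON Lines).
# Illustrative only: shows the "see it, correct it, remove it" property,
# not the article's actual QMD-backed retrieval.
import json
from pathlib import Path

def remember(path: str, topic: str, fact: str) -> None:
    """Append one memory entry; the file stays readable in any editor."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"topic": topic, "fact": fact}) + "\n")

def recall(path: str, topic: str) -> list[str]:
    """Return all facts stored under a topic."""
    if not Path(path).exists():
        return []
    facts = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            entry = json.loads(line)
            if entry["topic"] == topic:
                facts.append(entry["fact"])
    return facts

def forget(path: str, topic: str) -> None:
    """Drop a topic by rewriting the file — correction is a text edit, not a migration."""
    p = Path(path)
    if not p.exists():
        return
    kept = [l for l in p.read_text(encoding="utf-8").splitlines()
            if json.loads(l)["topic"] != topic]
    p.write_text("\n".join(kept) + ("\n" if kept else ""), encoding="utf-8")
```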

The biggest unsolved problem: context

The hardest issue I’m still working through is context size.

Every request carries a surprising amount of instruction and system context.

Agent rules.
Workspace files.
Memory references.
Tool instructions.

That front-loading adds up quickly.

It works.

But it is inefficient.

The real optimisation problem for agent systems is not just model choice.

It is how much context the system sends on every call.

That is where most token spend hides.
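A back-of-envelope calculation shows why. Using the rough heuristic of one token per four characters, and illustrative (not measured) sizes for each front-loaded section, even a one-line question drags tens of thousands of characters of context with it:

```python
# Back-of-envelope sketch of where per-call token spend hides.
# The ~4 chars/token ratio is a rough heuristic; the section sizes
# below are illustrative stand-ins, not measured from OpenClaw.

def estimate_tokens(text: str) -> int:
    """Crude token estimate: roughly one token per four characters."""
    return max(1, len(text) // 4)

def context_overhead(front_loaded: dict[str, str], user_message: str) -> float:
    """Fraction of a call spent on front-loaded context vs. the actual message."""
    overhead = sum(estimate_tokens(v) for v in front_loaded.values())
    total = overhead + estimate_tokens(user_message)
    return overhead / total

call = {
    "agent_rules": "x" * 8000,        # stand-in for agent rules,
    "workspace_files": "x" * 20000,   # workspace files,
    "memory_refs": "x" * 4000,        # memory references,
    "tool_instructions": "x" * 6000,  # and tool instructions
}
ratio = context_overhead(call, "What changed in the vault today?")
# With these assumed sizes, over 99% of the call is context, not question.
```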

Coding agents are tools, not co-workers

Another lesson was how to handle implementation work.

Instead of forcing the OpenClaw agents to write and debug code themselves, they delegate to local tools.

Codex.
Gemini.
Cursor.

The OpenClaw agents prepare the brief.

The coding tools do the heavy work.

Then the agents review the results.

This keeps agent threads short and avoids endless debugging loops inside chat.
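The hand-off works because the brief is an artifact, not a chat message. A minimal sketch of the pattern, assuming a Markdown brief file that a coding tool picks up (the actual tool invocation is left out, since Codex, Cursor, and Gemini each have their own interface):

```python
# Sketch of the "prepare the brief, delegate, review" pattern.
# The brief format and section names are assumptions; the agent later
# reviews the tool's output against the "Done when" checklist.
from pathlib import Path

def write_brief(path: str, task: str,
                constraints: list[str], done_when: list[str]) -> Path:
    """Write a delegation brief for a coding tool to work from."""
    lines = [f"# Brief: {task}", "", "## Constraints"]
    lines += [f"- {c}" for c in constraints]
    lines += ["", "## Done when"]
    lines += [f"- {d}" for d in done_when]
    brief = Path(path)
    brief.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return brief
```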

Codex became the real sidekick

The tool I rely on most now is the Codex app.

OpenClaw itself lives as a project inside Codex.

Whenever something behaves strangely, Codex helps investigate.

It reviews logs.
Checks OpenRouter token usage.
Surfaces configuration mistakes.

It is also useful for optimisation.

I regularly have it scan the system looking for improvements.

Sometimes that comes from log analysis.

Sometimes from ideas pulled in from other builders writing about similar systems.

The setup is constantly evolving.

Cheap automation matters

Some jobs run on powerful models.

Most should not.

Selected cron and maintenance jobs run on smaller models or local Ollama.

Daily briefings.
Maintenance tasks.
Simple housekeeping.

Those tasks do not need reasoning power.

They need reliability.

Running them locally keeps costs predictable.
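A daily briefing job like this is a one-shot call to Ollama's local HTTP API. The endpoint and payload shape follow Ollama's `/api/generate`; the model name and prompt are assumptions, and the function is meant to run from cron, not inside a chat thread.

```python
# Sketch of a cheap local cron job: a daily briefing from Ollama.
# Uses Ollama's /api/generate endpoint on the default port 11434;
# the model name and prompt are assumptions.
import json
import urllib.request

def build_request(prompt: str, model: str = "llama3") -> dict:
    """Payload for /api/generate (streaming off for one-shot jobs)."""
    return {"model": model, "prompt": prompt, "stream": False}

def daily_briefing(prompt: str, host: str = "http://localhost:11434") -> str:
    """Send the job to a local Ollama instance and return the generated text."""
    payload = json.dumps(build_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{host}/api/generate", data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

If the local model is down, the job fails loudly instead of silently falling back to a paid provider, which is exactly the cost predictability the cron tier is for.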

Security needed attention early

The moment agents connect to real services, security becomes real too.

Two changes made a big difference.

Secrets moved to environment variables instead of config files.

And the gateway stays local on loopback, with Tailscale Serve layered on top for secure access.

That is a much safer posture than exposing a raw endpoint.
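Both changes are small in code. A minimal sketch, where the env var name and port are assumptions (Tailscale Serve then fronts the loopback port, so nothing binds to a public interface):

```python
# Sketch of the two security changes: secrets from the environment
# (fail fast when missing) and a gateway pinned to loopback.
# OPENCLAW_API_KEY and the port are hypothetical names, not real config keys.
import os

def require_secret(name: str) -> str:
    """Read a secret from the environment; refuse to start without it."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"missing required secret: {name}")
    return value

def gateway_config() -> dict:
    """Bind to loopback only; remote access goes through Tailscale Serve."""
    return {
        "host": "127.0.0.1",  # never 0.0.0.0
        "port": 18789,
        "api_key": require_secret("OPENCLAW_API_KEY"),
    }
```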

Not perfect.
But materially better.

It works, but it still has its moments

The system works well now.

But it still has its moments.

Model quirks.
Unexpected context issues.
The occasional runaway instruction loop.
Dropped memory.
Failed cron jobs.

That seems to be the nature of agent systems right now.

You do not finish building them.

You operate them.
You tweak them.

The real lesson

Running agents is easy.

Operating them is the real work.

The intelligence is only one piece.

The architecture around it matters more.

Memory.
Security.
Context discipline.
Model routing.

Without those pieces you have a demo.

With them you start to have a system.