
Agentic Workloads on Linux: Btrfs + Service Accounts at SCaLE 23x

linux · infrastructure · ai · open-source · homelab

Containers solved packaging. They did not solve persistent state for AI agents. That was the opening argument from David Duncan's talk on the Fedora Hatch Day track at SCaLE 23x, and it reframed how I think about where AI workloads belong on a Linux system.

The core idea is simple: each AI agent gets its own Linux service account. Not a container. A user. The agent's memory lives in its home directory, which is backed by a Btrfs subvolume. When you need to snapshot the agent's state, you snapshot the subvolume. When the agent comes back up, its memory is exactly where it was. When you take the agent down, the snapshot persists. That is not something you get from a container without significant external tooling.

The Unix isolation here is deliberate and useful. Service accounts cannot give each other privileges. An agent running as agent-review has no path to escalate to agent-deploy. Standard Linux permission model, nothing exotic. If you manage multiple agents with a YAML file, each one gets its own user and its own Btrfs subvolume, and that constraint is enforced at the OS layer.
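The enforcement mechanism is nothing more than standard mode bits on the home directory. A sketch, simulated with a temp directory rather than a real agent home:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Mode 700: owner only. Any other service account (agent-review,
# agent-deploy, ...) trying to read it gets EACCES from the kernel.
DEMO=$(mktemp -d)
chmod 700 "$DEMO"
stat -c '%a' "$DEMO"   # prints 700
rmdir "$DEMO"
```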

David laid out the memory model in a way that made the design choices clear. Individual agent spaces hold that agent's working memory and context. Shared spaces, what he called community spaces, are directories that other agents can crawl. A research agent can deposit findings into a shared space. A synthesis agent can read from it. No message queue required. No broker. Just a directory with appropriate permissions.

Figure: architecture diagram showing AI agents as Linux service accounts with Btrfs subvolumes for memory, pgvector for long-term storage, btrfs send for incremental model weight replication, and shared community spaces for inter-agent communication.

Long-term memory for each agent goes into pgvector. The Btrfs snapshot handles operational state and context; the vector database handles semantic retrieval across longer time horizons. These are different problems, and the architecture treats them differently.

The Btrfs send/receive capability is where this gets interesting for people running local models. btrfs send -p <parent> <child> computes a binary delta between two snapshots and streams only the changed blocks. For LLM weight updates, that matters. If you fine-tune a model and the weights change, you are not pushing the entire model. You are pushing the diff. On a 70B parameter model at fp16, that is 140GB of base storage. ZSTD compression on Btrfs gets you 15 to 30 percent reduction on model weights. Incremental sends get you the difference between fine-tuning runs, not the full payload every time.

David compared this against the obvious alternatives. S3 sync copies the full object unless you build differential logic on top. Rsync does delta sync but is not snapshot-aware and does not integrate with the memory model the way Btrfs does. Container layers get you deduplication but not the send/receive replication pattern. The one-to-many replication case is where Btrfs really stands apart: a single source snapshot can replicate to multiple agent targets simultaneously.

Why not containers, specifically? The talk addressed this directly. Containers were designed for stateless workloads where the container is destroyed and rebuilt. That pattern works well for services. It works poorly for agents that accumulate context over time and need that context to persist across restarts. The container lifecycle fights the agent lifecycle; this architecture aligns them.

The working example is on GitHub at github.com/davdunc/btrfs-replication-test, using Fedora as the OS. The repository includes deploy scripts for EC2 (two Fedora 43 instances with Btrfs volumes) and scripts to test full and incremental sends over SSH.

I have been running agents in containers out of habit. This talk gave me a reason to think about that more carefully. The service account model maps cleanly to the Unix mental model I already have. The Btrfs integration handles the persistence and replication concerns without a separate tool. For local homelab AI workloads, this is worth experimenting with.


Related: Container Vulnerability Hiding Workshop · KwaaiNet: Distributed AI Inference