
Why I Built a Personal Infrastructure Lab

homelab · open-source · infrastructure · ai · security

It started with a spare machine and a copy of Proxmox. A few hours later I was watching virtual machines boot for the first time and I had one thought: why hadn't I done this sooner?

The honest origin is a question that built up over years working in enterprise IT: what is actually happening underneath the systems I work with every day? Cloud dashboards, managed services, vendor appliances. They all work, until they don't. And when they don't, the gap between "I know how to use this" and "I understand what this is" becomes very visible.

The lab is how I close that gap.

The problem with managed everything

When you operate exclusively at the managed-service layer, you build fluency with interfaces. That fluency is real and valuable. It also has a ceiling.

Joel Spolsky wrote about this in 2002 in "The Law of Leaky Abstractions." Abstractions hide complexity but can't eliminate it. A cloud storage API that looks like a file system behaves very differently under eventual consistency and concurrent writes. A serverless function that looks synchronous conceals cold starts and execution limits. Eventually the substrate leaks through.
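A toy model makes the leak concrete. This is an illustrative sketch, not any real storage API: writes land on a primary and replicate to a read replica after a short lag, so a read immediately after a write can return stale data even though the interface looks like an ordinary key-value store.

```python
import time

class EventuallyConsistentStore:
    """Toy store: writes hit a primary and replicate to a replica later.
    Reads are served from the replica, so read-after-write can be stale."""

    def __init__(self, replication_lag=0.05):
        self.replica = {}
        self.lag = replication_lag
        self._pending = []  # (apply_at, key, value) replication events

    def write(self, key, value):
        self._pending.append((time.monotonic() + self.lag, key, value))

    def read(self, key):
        # Apply any replication events that are "due" before serving the read.
        now = time.monotonic()
        due = [p for p in self._pending if p[0] <= now]
        self._pending = [p for p in self._pending if p[0] > now]
        for _, k, v in due:
            self.replica[k] = v
        return self.replica.get(key)

store = EventuallyConsistentStore()
store.write("config", "v2")
stale = store.read("config")   # None: replication hasn't caught up yet
time.sleep(0.1)
fresh = store.read("config")   # "v2": the replica has converged
```

The interface promises a file-system-like `read`/`write`; the consistency model underneath is what leaks.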

Research published in the European Journal of Engineering Education found that hands-on lab environments measurably enhanced both conceptual understanding and professional skill retention in ways passive learning couldn't. The mechanism is simple: you learn how a system behaves by running it, watching it fail, and fixing it. Documentation tells you how it's supposed to work. Operation tells you how it actually does.

At work, I interact with infrastructure that belongs to the organization. Access is scoped and experimentation is limited, and that's appropriate. The lab inverts that relationship. Everything is mine to configure, break, and understand.

What's actually running

[Figure: personal infrastructure lab architecture diagram]

The foundation is a dedicated server running Proxmox VE, fully open-source and Debian-based. It handles both KVM virtual machines and LXC containers from a single management interface. Workloads are split across two compute nodes: one running Kubernetes, one running Docker containers and Compose stacks.

Networking runs through a Tailscale overlay. Tailscale's architecture separates the control plane (key exchange via a coordination server) from the data plane (direct peer-to-peer WireGuard traffic). Private keys never leave the originating device. The practical result: every node on my network behaves as if it's on the same LAN, regardless of where it physically is.
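A toy sketch of that split (nothing like Tailscale's actual code, just the shape of the idea): the coordinator's registry holds only public keys and endpoints, private keys stay on the node, and payloads travel node to node without touching the control plane.

```python
class Coordinator:
    """Control plane: knows public keys and endpoints, never sees payloads."""

    def __init__(self):
        self.registry = {}

    def register(self, name, public_key, endpoint):
        self.registry[name] = (public_key, endpoint)

    def lookup(self, name):
        return self.registry[name]

class Node:
    def __init__(self, name, endpoint, coordinator, network):
        self.name = name
        self.private_key = f"private-{name}"  # never leaves the node
        self.public_key = f"public-{name}"
        self.inbox = []
        self.coordinator = coordinator
        self.network = network                # stands in for direct p2p links
        network[endpoint] = self
        coordinator.register(name, self.public_key, endpoint)

    def send(self, peer_name, payload):
        # Control plane answers "where is the peer"; data goes directly there.
        _peer_key, peer_endpoint = self.coordinator.lookup(peer_name)
        self.network[peer_endpoint].inbox.append((self.name, payload))

network = {}
coord = Coordinator()
laptop = Node("laptop", "100.64.0.1", coord, network)
server = Node("server", "100.64.0.2", coord, network)
laptop.send("server", "hello over the overlay")
# server.inbox holds the message; coord's registry never saw it
```

The property worth noticing: you could log everything the coordinator stores and still learn nothing about the traffic or the private keys.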

The storage layer is ZFS, with local pools for VM disk images and container volumes, and NFS shares for networked access across nodes.

Four areas of active use

AI and Agents. Ollama runs local LLM inference, handling model download, quantization, memory management, and API serving in one runtime. Inference stays on-device. Running models locally gives direct visibility into what's actually expensive: memory pressure, quantization tradeoffs, the difference in latency between a 7B and 70B model on the same hardware. I'm building agent workflows and RAG pipelines on top of it, with MCP servers connecting local models to tools and data sources.
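A back-of-envelope way to see the quantization tradeoff the hardware makes visible. The formula and the 1.2 overhead factor (KV cache, activations, runtime buffers) are rough assumptions of mine, not Ollama's actual memory accounting:

```python
def model_memory_gb(params_billion, bits_per_weight, overhead=1.2):
    """Rough memory estimate for loading an LLM's weights.
    overhead is a ballpark multiplier for cache and runtime buffers."""
    bytes_per_weight = bits_per_weight / 8
    return params_billion * 1e9 * bytes_per_weight * overhead / 2**30

for name, params in [("7B", 7), ("70B", 70)]:
    for label, bits in [("fp16", 16), ("q4", 4)]:
        print(f"{name} {label}: ~{model_memory_gb(params, bits):.1f} GiB")
```

Even this crude estimate explains the felt difference: a 4-bit 7B model fits comfortably in a few GiB, while a 70B model at fp16 needs on the order of 150 GiB, which is why quantization is the difference between "runs on my node" and "doesn't".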

Automation. The automation layer runs containerized workflow agents and subagents that connect services through webhooks and APIs. Self-hosted means I own the execution environment and the data. Building automations against infrastructure I understand completely changes the debugging experience: when something breaks I know exactly where to look.
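A minimal sketch of the webhook pattern, using a hypothetical HMAC-signed payload and event router rather than any specific tool's API: verify the signature, then dispatch by event type.

```python
import hashlib
import hmac
import json

SECRET = b"shared-webhook-secret"  # hypothetical shared secret

def verify(body: bytes, signature: str) -> bool:
    """Check the sender's HMAC-SHA256 signature over the raw body."""
    expected = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)

HANDLERS = {}

def on(event_type):
    """Decorator registering a handler for one event type."""
    def register(fn):
        HANDLERS[event_type] = fn
        return fn
    return register

@on("backup.completed")
def handle_backup(payload):
    return f"backup ok: {payload['dataset']}"

def dispatch(body: bytes, signature: str):
    if not verify(body, signature):
        raise PermissionError("bad signature")
    event = json.loads(body)
    return HANDLERS[event["type"]](event["payload"])

body = json.dumps({"type": "backup.completed",
                   "payload": {"dataset": "tank/vms"}}).encode()
sig = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
result = dispatch(body, sig)  # "backup ok: tank/vms"
```

Owning the execution environment means the whole chain above is inspectable: the secret, the verification, and the handler are all yours to read when something breaks.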

Security. The lab has an isolated VLAN for security research: vulnerability tooling, Kali, Metasploit, and network analysis, with no exposure to production traffic. SANS frames cybersecurity labs as foundational rather than supplementary. A CompTIA study cited by INE found 93% of employers prefer candidates with hands-on experience, even over formally credentialed candidates without it. The lab is where that experience accumulates.

Distributed systems. Kubernetes forces you to engage with distributed systems concepts directly. Replica sets, rolling updates, health probes, service discovery. These become concrete engineering problems you have to solve, not textbook abstractions. Running it myself, rather than using EKS or GKE, means the scaffolding is visible.
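The rolling-update mechanics can be sketched as a toy simulation of the maxUnavailable=1 case. This is a simplification of what the Deployment controller actually does, but it captures the invariant: replace pods one at a time, never dropping below N-1 ready replicas.

```python
def rolling_update(pods, new_version):
    """pods: list of version strings, mutated in place.
    Returns the ready-pod count after each step of the rollout."""
    ready_history = []
    for i, version in enumerate(pods):
        if version == new_version:
            continue
        pods[i] = None  # old pod terminated (maxUnavailable=1)
        ready_history.append(sum(p is not None for p in pods))
        pods[i] = new_version  # new pod passes its readiness probe
        ready_history.append(sum(p is not None for p in pods))
    return ready_history

pods = ["v1"] * 3
history = rolling_update(pods, "v2")
# pods ends as ["v2", "v2", "v2"]; readiness never dropped below 2 of 3
```

Watching the real controller enforce this, pod by pod, is exactly the kind of scaffolding a managed service hides.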

What I've actually learned

A few observations that have held up over time:

Complexity is cumulative. Systems that work fine in isolation interact unexpectedly under load or at scale. The surface area for failure grows non-linearly as you add components. Knowing what not to add is as important as knowing what to add.
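The non-linear growth is just combinatorics: components grow linearly, but the potential pairwise interactions between them grow quadratically.

```python
from math import comb

def interaction_pairs(n: int) -> int:
    """Number of distinct component pairs: n*(n-1)/2."""
    return comb(n, 2)

for n in (3, 5, 10, 20):
    print(f"{n} components -> {interaction_pairs(n)} pairwise interactions")
# 3 -> 3, 5 -> 10, 10 -> 45, 20 -> 190
```

Going from 10 services to 20 doubles the components but more than quadruples the ways they can interact, and that's before considering three-way effects.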

Documentation describes intent; operation reveals behavior. Official docs are authoritative about how a system is designed to work. They're less reliable about how it behaves in specific configurations or in interaction with specific other systems. That gap is where operational knowledge lives.

Failure is a primary learning mechanism. Principles of Chaos Engineering formalizes what every practitioner discovers empirically: intentionally introducing failures and observing recovery produces reliability knowledge that documentation can't provide. In a personal lab, breaking things on purpose isn't reckless. It's the point.
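A seeded toy version of the idea, illustrative only and not any real chaos tool: inject failures into a fake dependency at a known rate, then check that the retry logic actually recovers.

```python
import random

def flaky_call(fail_prob, rng):
    """Fake dependency that fails at a controlled, injected rate."""
    if rng.random() < fail_prob:
        raise ConnectionError("injected failure")
    return "ok"

def call_with_retry(fn, attempts=5):
    """Recovery mechanism under test: retry up to `attempts` times."""
    for _ in range(attempts):
        try:
            return fn()
        except ConnectionError:
            continue
    return "gave up"

rng = random.Random(42)  # seeded so the experiment is reproducible
outcomes = [call_with_retry(lambda: flaky_call(0.5, rng)) for _ in range(100)]
# With 5 attempts at a 50% failure rate, a call only gives up ~3% of the time
```

The lab version of this is less tidy: pull a network cable, kill a node mid-write, and see whether the recovery story you believed actually holds.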

Everything runs on open source

The lab could not exist in its current form without open-source software. Proxmox VE is AGPL-licensed. Kubernetes is Apache 2.0, governed by the CNCF. Ollama is MIT-licensed. Docker Engine is Apache 2.0. Tailscale's client is open source and builds on WireGuard (GPL).

Every layer is transparent and auditable. When something behaves unexpectedly I can read the source, trace behavior to cause, and test hypotheses. The open-source ecosystem also means the same tools I run at home are the tools running global production infrastructure. There's no toy version here.

What's next

Future posts will go deeper on individual components: the virtualization cluster, local AI inference and agent architecture, MCP server integrations, the security lab environment, and running distributed services over Tailscale.

The lab itself keeps evolving. Current areas of active experimentation: distributed inference across multiple nodes, more sophisticated RAG architectures, and tighter integration between containerized agent workflows and AI services. The direction is toward a more coherent ecosystem, not a collection of interesting projects, but a unified environment with a shared philosophy at its foundation.

That philosophy: open infrastructure, built from transparent components, understood from the substrate up.


Sources

  1. European Journal of Engineering Education, "The impact of take-home laboratories on student perceptions of conceptual and professional learning" (Vol. 49, No. 6, 2024). https://www.tandfonline.com/doi/full/10.1080/03043797.2024.2407480
  2. Joel Spolsky, "The Law of Leaky Abstractions" (2002). https://www.joelonsoftware.com/2002/11/11/the-law-of-leaky-abstractions/
  3. Proxmox VE, Official documentation. https://pve.proxmox.com/pve-docs/chapter-pve-intro.html
  4. Tailscale, "How Tailscale Works." https://tailscale.com/blog/how-tailscale-works
  5. Kubernetes, Official documentation. https://kubernetes.io/docs/home/
  6. CNCF, Kubernetes Project Journey Report. https://www.cncf.io/reports/kubernetes-project-journey-report/
  7. Principles of Chaos Engineering. https://principlesofchaos.org/
  8. Google SRE Book, Introduction. https://sre.google/sre-book/introduction/
  9. SANS Institute, Cybersecurity Labs. https://www.sans.org/mlp/labs
  10. INE, "Hands-On Labs: The Key to Effective Cybersecurity Education." https://ine.com/blog/hands-on-labs-the-key-to-effective-cybersecurity-education
  11. Ollama, Official repository. https://github.com/ollama/ollama
  12. Open Source Initiative, History. https://opensource.org/about/history-of-the-open-source-initiative

Related: Tailscale Changed How I Think About Networking · Starting Fresh