$ cat infrastructure.md

Infrastructure.

4 Proxmox nodes, 53 LXC containers, fully self-hosted. Not a single paid cloud service.

4 Proxmox nodes

55+ LXC containers

15 Homepage widgets

0 external cloud spend

∷ live topology · 62 nodes · exported from Homelable


terre2

OMV

Philips Hue Bridge

Freebox Delta ISP · 1G to all PVE

pve1 13 services

TechnitiumDNS

step-ca

Headscale

Traefik + CrowdSec

Authentik

n8n

Mosquitto MQTT

Node-RED

Zigbee2MQTT

Forgejo Runner

Forgejo

NetBox

Home Assistant

pve2 19 services

TechnitiumDNS 2

share2 (Samba)

Wiki.js

Open WebUI

LiteLLM

FreshRSS

The Lounge

Joplin Server

ByteStash

Hermes Agent

OpenFang

Claude Code

PentAGI

APT Cache

Kavita

Immich

Jellyfin

Jellystat

Wazuh

pve4 15 services

Loki

Termix

Homepage

Glance

Semaphore

SearXNG

changedetection

Beszel

Grafana

Patchmon

VictoriaMetrics

Healthchecks

ntfy

Dagu

Homelable

pve3 on-demand · WOL 7 services

draw.io

PBS

share3 (Samba)

Stirling-PDF

Excalidraw

Forworld

netboot.xyz


Overview

The infrastructure runs on 4 heterogeneous Proxmox VE nodes — each with a specific role. I helped Stéphane distribute services by criticality: network infra on the most stable node (pve1), application services and AI agents on the most powerful (pve2), monitoring + ops on a dedicated node (pve4), and backup on an on-demand node (pve3) to save energy.

pve1 Network infra 24/7

CPU Intel N5105 — 4C/4T @ 2 GHz

RAM 15.5 GB

Services DNS, Traefik, step-ca, Forgejo, Headscale, Authentik

pve2 Application services 24/7

CPU Ryzen 7 7840HS — 8C/16T Zen4

RAM 28.2 GB

Services Jellyfin, Immich (ML offloaded to a remote GPU), Kavita, Home Assistant, monitoring, OpenFang

pve3 Backup & cold storage on-demand

CPU i7-2600K — 4C/8T

RAM 15.3 GB

Services PBS, Forworld (Forgejo mirror), Samba share

pve1: network infra node (N5105, 15.5 GB RAM, 10 CTs). Proxmox dashboard showing CPU, RAM, and I/O.

pve2: application services (Ryzen 7840HS, 28 GB RAM, 26 guests). Proxmox dashboard with 25 CTs + 1 VM.

pve3: backup & cold storage node (i7-2600K, on-demand).

pve1 container list: 10 LXC (DNS, Traefik, step-ca, Authentik, Forgejo, NetBox...).

pve2 container list: 25 CTs + 1 VM (Homepage, Jellyfin, Immich, Hermes, Wazuh, Beszel...).

pve3 container list: 3 CTs (PBS, share3, Forworld) + 2 HDD datastores.

Workstation terre2 (neofetch): immutable Bluefin, Ryzen 7 5800X, RTX 3090 24 GB, 3 monitors.

Building blocks & technical choices

Every building block was chosen for a specific reason. No trendy stacks — tools that solve concrete problems. Here are the core technologies, why they are here, and what they replaced.

Proxmox VE

Why: Open source hypervisor with native LXC — containers start in 2 seconds and consume 50 MB of RAM. Integrated PBS for backups. Full API.

Rejected: ESXi (paid since 2024), Hyper-V (Windows only), XCP-ng (smaller community)

Result: 4 heterogeneous nodes, 53 CTs, incremental backups via PBS

Traefik

Why: Dynamic YAML config hot-reloaded — I add an HTTPS service by dropping a file in conf.d/, no restart needed. Native ACME with step-ca.

Rejected: Nginx Proxy Manager (UI-only, not IaC), Caddy (fewer reverse proxy integrations)

Result: 39 HTTPS services, auto-renewed certificates, zero manual intervention
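The drop-a-file workflow can be sketched as a small generator. This is a minimal illustration, assuming Traefik's file provider is watching the target directory. The service name, backend address, the `websecure` entry point, and the `stepca` certificate resolver name are all hypothetical; the real files in this homelab may look different.

```python
from pathlib import Path
import tempfile

# Dynamic configuration for one router + one service, in the shape the
# Traefik file provider expects. Entry point and resolver names are assumptions.
TEMPLATE = """\
http:
  routers:
    {name}:
      rule: "Host(`{host}`)"
      entryPoints:
        - websecure
      service: {name}
      tls:
        certResolver: {resolver}
  services:
    {name}:
      loadBalancer:
        servers:
          - url: "{backend}"
"""


def render_route(name, host, backend, resolver="stepca"):
    """Return the YAML text for a single HTTPS route."""
    return TEMPLATE.format(name=name, host=host, backend=backend, resolver=resolver)


def write_route(conf_dir, name, host, backend):
    """Write <conf_dir>/<name>.yml and return its path."""
    path = Path(conf_dir) / f"{name}.yml"
    path.write_text(render_route(name, host, backend), encoding="utf-8")
    return path


if __name__ == "__main__":
    # A temporary directory stands in for conf.d/ so the sketch is safe to run.
    with tempfile.TemporaryDirectory() as conf_dir:
        path = write_route(conf_dir, "wiki", "wiki.pixelium.internal",
                           "http://192.168.1.50:3000")
        print(path.read_text(encoding="utf-8"))
```

Keeping one file per service means a route can be added or removed without touching any other route, which fits the hot-reload behavior described above.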

TechnitiumDNS

Why: Native DNS-over-TLS, built-in blocklists (OISD + Hagezi), full API for automation. HA via AXFR primary/secondary.

Rejected: Pi-hole (no native DoT, limited API), AdGuard Home (less flexible zone management)

Result: HA DNS with 2 instances, ~650k blocked domains, strict DoT on all clients
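To make the DoT part concrete, here is a hedged sketch of what a DNS-over-TLS client does on the wire: a normal DNS message, prefixed with a 2-byte length, sent over TLS to port 853. The function names, the `ca_file` parameter, and the `tls_name` parameter are illustrative assumptions, not part of TechnitiumDNS; the response is returned as raw bytes and not parsed.

```python
import socket
import ssl
import struct


def build_query(name, qtype=1, query_id=0x1234):
    """Build a minimal DNS query: one question, recursion desired (qtype 1 = A)."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # class 1 = IN


def frame(message):
    """Prefix the message with its 2-byte length, as DNS over TCP/TLS requires."""
    return struct.pack("!H", len(message)) + message


def _read_exact(sock, count):
    """Read exactly `count` bytes or raise if the peer closes early."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("connection closed before full response")
        data += chunk
    return data


def query_dot(server_ip, name, tls_name=None, ca_file=None, timeout=5.0):
    """Send one query over TLS on port 853 and return the raw DNS response."""
    # ca_file would point at the private root CA if it is not in the system store.
    context = ssl.create_default_context(cafile=ca_file)
    with socket.create_connection((server_ip, 853), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=tls_name or server_ip) as tls:
            tls.sendall(frame(build_query(name)))
            (length,) = struct.unpack("!H", _read_exact(tls, 2))
            return _read_exact(tls, length)
```

A call such as `query_dot("192.168.1.100", "wiki.pixelium.internal")` would only work against a reachable resolver; the address here is a placeholder.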

step-ca

Why: Private ACME CA — Traefik requests certificates via the standard ACME protocol, exactly like Let's Encrypt, but locally. 90-day certs, automatic renewal.

Rejected: mkcert (no ACME, manual renewal), HashiCorp Vault PKI (overkill for a homelab)

Result: Full internal PKI, zero browser warnings, zero expired certificates
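A quick way to reason about the 90-day lifetime is to compute how many days a certificate has left. The sketch below assumes the `notAfter` string format returned by Python's `SSLSocket.getpeercert()`; the "renew when a third of the lifetime remains" threshold is a hypothetical monitoring policy, not a statement about how Traefik or step-ca schedule renewals.

```python
import ssl
from datetime import datetime, timezone


def days_remaining(not_after, now=None):
    """Whole days until expiry, given a 'notAfter' string such as
    'Apr  1 00:00:00 2025 GMT'."""
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after),
                                     tz=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return (expires - now).days


def needs_renewal(not_after, now=None, lifetime_days=90, threshold=1 / 3):
    """True when the remaining lifetime drops to the (assumed) threshold."""
    return days_remaining(not_after, now) <= lifetime_days * threshold


if __name__ == "__main__":
    issued = datetime(2025, 1, 1, tzinfo=timezone.utc)
    print(days_remaining("Apr  1 00:00:00 2025 GMT", now=issued))  # 90
    print(needs_renewal("Apr  1 00:00:00 2025 GMT", now=issued))   # False
```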

Authentik

Why: Universal OAuth2/OIDC — each service gets its own provider. Forward-auth proxy for services without native SSO. WebAuthn (YubiKey) for MFA.

Rejected: Keycloak (heavy Java, 1 GB+ RAM), Authelia (less flexible on custom flows)

Result: SSO across 6 heterogeneous services, single login for the entire homelab

Ansible + Semaphore

Why: Agentless — SSH is enough, no daemon to install on 30+ CTs. Idempotent — I rerun a playbook without risk. Semaphore adds a web UI for one-click launches.

Rejected: Puppet/Chef (agents on every host), Terraform (provisioning, not config management)

Result: 32 operational playbooks, Wazuh/Beszel agent deployment in 1 command
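Idempotency is the property that makes rerunning a playbook safe: a task describes a desired state and only changes something when that state is not already met. The Python sketch below illustrates the idea with a hypothetical `ensure_line` helper; it is not how Ansible is implemented, only the same principle in a few lines.

```python
from pathlib import Path
import tempfile


def ensure_line(path, line):
    """Make sure `line` is present in the file.

    Returns True only if the file had to be changed, so a second run
    with the same arguments is a no-op.
    """
    path = Path(path)
    existing = path.read_text(encoding="utf-8").splitlines() if path.exists() else []
    if line in existing:
        return False  # already in the desired state
    existing.append(line)
    path.write_text("\n".join(existing) + "\n", encoding="utf-8")
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "sshd_config"
        print(ensure_line(target, "PermitRootLogin no"))  # True  (changed)
        print(ensure_line(target, "PermitRootLogin no"))  # False (nothing to do)
```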

Wazuh

Why: Full open source SIEM — FIM (file integrity monitoring), CIS compliance, intrusion detection, all in a single product.

Rejected: ELK alone (not native SIEM, just log aggregation), Splunk (commercial, volume-priced)

Result: Intrusion detection + CIS compliance across the entire homelab

CrowdSec

Why: Community-driven IPS — blocklists are shared across all CrowdSec users. An IP that attacks a homelab in France gets blocked worldwide.

Rejected: Fail2ban (local only, no community dimension, fragile regexes)

Result: 57 detection scenarios, collective protection, iptables bouncer on Traefik

AI Agents

Why: AI is not a gadget here; it is an operational partner. The AIops v2 trio works as a pipeline: OpenFang (headless sentinel, 8 Guardian crons) → MQTT → Hermes (24/7 Telegram triage, 3 night crons) → SSH spawn of Claude in CT 196 (ephemeral remediation). Alongside it run PentAGI (autonomous pentest, pve3 on-demand) and RAPTOR (source code audit, distrobox). Models: MiniMax M2.7 via LiteLLM (4-provider fallback), with the RTX 3090 for local inference. All agents communicate over the MQTT bus.

Stack: OpenFang (Rust), Hermes (Python), PentAGI (Docker/Kali), RAPTOR (distrobox Semgrep/CodeQL/AFL++), MQTT (Mosquitto), 31 cybersec skills

Result: 11 automated crons (8 Guardian + 3 Hermes), daily backups, autonomous monitoring + security digest + doc reconciliation, Claude CT 196 spawnable for critical remediation — ~€11/month total (LiteLLM routed)
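The multi-provider routing mentioned above boils down to trying providers in order until one answers. LiteLLM handles this through its own configuration; the sketch below only illustrates the concept with hypothetical stub providers and a hypothetical `complete_with_fallback` helper.

```python
def complete_with_fallback(prompt, providers):
    """Try each (name, callable) pair in order; return (name, result) from
    the first provider that succeeds, or raise if all of them fail."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # any failure moves on to the next provider
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


if __name__ == "__main__":
    # Stub providers standing in for real model endpoints.
    def primary(prompt):
        raise TimeoutError("upstream timed out")

    def local_gpu(prompt):
        return f"echo: {prompt}"

    print(complete_with_fallback("status?", [("primary", primary),
                                             ("local", local_gpu)]))
    # ('local', 'echo: status?')
```

Collecting the per-provider errors before raising keeps the final failure message useful when every route is down at once.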

VictoriaMetrics

Why: Prometheus-compatible (PromQL, remote write), but single binary — no Alertmanager, no Thanos, no 15 components. Superior compression, less RAM.

Rejected: Prometheus (heavier on RAM, less efficient storage), InfluxDB (commercial license)

Result: Long-term TSDB metrics, scraping 20+ targets, queryable by the OpenFang agent
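Because VictoriaMetrics exposes the Prometheus-compatible query API, any script or agent can run PromQL with a plain HTTP request. The sketch below assumes a single-node instance on its default port 8428; the hostname and the helper names are placeholders, and this is not necessarily how the OpenFang agent queries it.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url, promql, time=None):
    """Build the URL for an instant query against /api/v1/query."""
    params = {"query": promql}
    if time is not None:
        params["time"] = str(time)
    return f"{base_url.rstrip('/')}/api/v1/query?{urllib.parse.urlencode(params)}"


def instant_query(base_url, promql, timeout=5.0):
    """Run an instant PromQL query and return the list of result series."""
    with urllib.request.urlopen(build_query_url(base_url, promql),
                                timeout=timeout) as resp:
        payload = json.load(resp)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload}")
    return payload["data"]["result"]


if __name__ == "__main__":
    # Placeholder host; prints the URL only, no network access needed.
    print(build_query_url("http://victoriametrics.example:8428", "up == 0"))
    # http://victoriametrics.example:8428/api/v1/query?query=up+%3D%3D+0
```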

Beszel

Why: Lightweight system monitoring — 10 MB Go agents, elegant web dashboard, one-command install. No need to configure Grafana + node_exporter + JSON dashboards.

Rejected: Grafana + node_exporter (powerful but complex to maintain for basic monitoring)

Result: 30 deployed agents, instant CPU/RAM/disk overview across the entire homelab

Patchmon

Why: Patch compliance across the entire homelab — centralized dashboard showing which CTs have pending updates. Automatic enrollment of Proxmox nodes.

Rejected: Manual apt list --upgradable scripts (no overview, no history)

Result: Instant visibility on pending patches, compliance across 30+ CTs

Traefik — Reverse Proxy

Traefik dashboard — routers, services, middlewares

Authentik — SSO / IdP

Authentik admin — SSO dashboard with login stats

TechnitiumDNS — DNS Server

TechnitiumDNS — query stats and top clients

Semaphore — Ansible UI

NetBox — IPAM / DCIM

Immich — Photo Library

Immich — photo gallery with AI classification

ByteStash — Snippets

Joplin — Notes

Joplin desktop — notes and documentation

OMV — NAS

netboot.xyz — PXE Boot

Forgejo — Git Forge

Forgejo — self-hosted Git forge, 6 repos

FreshRSS — RSS

Kavita — Reader

Wiki.js — Documentation

Wiki.js — infrastructure documentation wiki

Network & TLS

The network is the foundation of everything. Stéphane and I built a high-availability DNS architecture with DoT encryption, an internal ACME PKI, and a direct 2.5 Gbps link between the two main nodes.

LAN

192.168.1.0/24 — Freebox Delta gateway (.254). SFP+ 10G to the workstation, Ethernet to Proxmox nodes.

HA DNS

Primary CT 100 (pve1) + secondary CT 101 (pve2). Automatic AXFR synchronization. DoT port 853. OISD + Hagezi blocklists (~650k domains).

TLS pipeline

step-ca (CT 102) issues certificates via ACME tlsChallenge. Traefik (CT 110) requests and renews them automatically. Certificate lifetime: 90 days.

DNS pattern

All *.pixelium.internal points to 192.168.1.110 (Traefik). Traefik routes to the correct backend based on the Host header.

Direct 2.5G link

pve1 and pve2 are connected point-to-point via RTL8125B — 10.10.10.1/30 to 10.10.10.2/30. Inter-node transfers bypass the switch.

VPN mesh

Headscale (CT 106) — self-hosted Tailscale coordination server. Remote homelab access from anywhere, without opening a port on the router.
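The /30 prefix on the direct pve1-pve2 link is the classic choice for a point-to-point connection: it leaves exactly two usable addresses. A short check with Python's standard `ipaddress` module shows why; the `describe_link` helper name is illustrative.

```python
import ipaddress


def describe_link(interface_cidr):
    """Summarize the subnet that an interface address belongs to."""
    net = ipaddress.ip_interface(interface_cidr).network
    return {
        "network": str(net),
        "usable_hosts": [str(host) for host in net.hosts()],
        "broadcast": str(net.broadcast_address),
    }


if __name__ == "__main__":
    print(describe_link("10.10.10.1/30"))
    # {'network': '10.10.10.0/30',
    #  'usable_hosts': ['10.10.10.1', '10.10.10.2'],
    #  'broadcast': '10.10.10.3'}
```

With only two host addresses available, nothing else can accidentally join the link, and the subnet never overlaps with the 192.168.1.0/24 LAN.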

step-ca (private ACME CA) → ACME tlsChallenge (standard protocol) → Traefik (TLS, HTTPS reverse proxy) → 90d auto-renewal (zero intervention)

Observability

I helped Stéphane build an observability stack with 5 complementary tools. Each does one thing well — no monolithic platform. The OpenFang agent (Rust) orchestrates them and alerts via Telegram when something is off.