[{"content":"","date":"16 April 2026","externalUrl":null,"permalink":"/tags/argocd/","section":"Tags","summary":"","title":"Argocd","type":"tags"},{"content":"","date":"16 April 2026","externalUrl":null,"permalink":"/blog/","section":"Blog","summary":"","title":"Blog","type":"blog"},{"content":"","date":"16 April 2026","externalUrl":null,"permalink":"/tags/cicd/","section":"Tags","summary":"","title":"Cicd","type":"tags"},{"content":"The image is built, scanned, and pushed. The version tag is v0.1.0-alpha.7.\nAnd yet\u0026hellip; nothing is running in the cluster.\nThat\u0026rsquo;s the deployment gap that catches a lot of people off guard. CI/CD ends with a container registry entry. The cluster doesn\u0026rsquo;t care about that — it needs a manifest, a chart, a reconciliation loop. The jump from \u0026ldquo;artifact in a registry\u0026rdquo; to \u0026ldquo;pod running in production\u0026rdquo; is exactly where the 4-repo GitOps model earns its keep.\nThis is the post where I close that loop.\nBack in January I published The Four-Repo GitOps Structure for My Homelab Platform as a conceptual overview of the architecture. Since then, every post in the Self-Hosting the Blog series has been about the left side of the delivery pipeline: containerising the site, versioning the image, building security gates. 
Now we\u0026rsquo;re crossing to the right side - Kubernetes deployment - and I\u0026rsquo;m going to walk through exactly what it took to wire all four repositories together to serve the blog.\nWhat Changed Across the Four Repositories # Before diving into the why, here\u0026rsquo;s the summary of what actually landed in each repository this week:\nRepository | Key Changes\nhomelab-k8s-argo-config | Added web namespace; fixed ExternalSecret base64 encoding for 1Password credentials\nhomelab-k8s-base-manifests | Added Hugo library template + deployable chart (v0.1.0)\nhomelab-k8s-environments | Added blog_hugo version file for dev (v0.1.0-alpha.7); created prod path placeholder\nhomelab-k8s-environments-apps | Built the full App-of-Apps structure: root app, web domain root, blog_hugo Application, runtime values\nLet me go through each layer in the same order that Argo CD resolves them.\nLayer 1 — Platform Prerequisites in homelab-k8s-argo-config # The argo-config repository owns everything the platform needs before workloads can land. For the blog deployment, two things were missing.\nAdding a Namespace # Platform namespaces are declarative in base/namespaces/namespace.yaml. Adding web was a one-liner, but it\u0026rsquo;s the right place to track it:\n- apiVersion: v1 kind: Namespace metadata: name: web In an enterprise environment, namespace creation is often a cross-team ceremony. Keeping it in the platform repository (not in the app chart) means it\u0026rsquo;s provisioned once, before any workload arrives, and stays anchored to the platform lifecycle — not the application lifecycle. Kustomize overlays for dev and prod can inherit this and add namespace-level policies on top.\nFixing the ExternalSecret Base64 Encoding # 1Password Connect credentials land in the cluster through an ExternalSecret. When I first set it up, the raw value was being injected directly into the Kubernetes Secret. 
The problem: the target Secret field expects a base64-encoded JSON credential file, and ExternalSecrets was writing the raw string.\nThe fix is a template block with engineVersion: v2:\ntarget: template: engineVersion: v2 data: 1password-credentials.json: |- {{ \u0026#34;{{ .credentials | b64enc }}\u0026#34; }} The b64enc pipe runs inside the ExternalSecrets template engine, encoding the remote reference value before it writes it into the Secret. This is one of those \u0026ldquo;obvious in hindsight\u0026rdquo; issues — the container was crashing with a base64-decode error, which is a non-trivial signal to chase when you\u0026rsquo;re unfamiliar with how ExternalSecrets processes values.\nLayer 2 — The Helm Chart in homelab-k8s-base-manifests # This repository is the chart source. For workloads, I use a library template pattern: one generic Helm library chart provides reusable named templates, and a thin deployable chart wraps them.\nThe Library Template # The library chart lives at templates/microservices-template/template-hugo/v0.1.0/. It defines named templates for all standard Kubernetes resources:\n_deployment.yaml → homelab.deployment-hugo\n_service.yaml → homelab.service-hugo\n_probes.yaml → liveness and readiness probe definitions\n_env_config_map.yaml → ConfigMap from values.envConf\n_env_external_secret.yaml → ExternalSecret from values.externalSecretsConfig\n_labels.yaml → standard label set\n_microservice.yaml → calls all of the above\n_variables.yaml → shared variable resolution\nThe deployable chart at charts/microservices/hugo/v0.1.0/ is intentionally thin:\n# Chart.yaml apiVersion: v2 name: hugo version: 0.1.0 dependencies: - name: template-hugo version: \u0026#34;0.1.0\u0026#34; repository: \u0026#34;file://../../../../templates/microservices-template/template-hugo/v0.1.0\u0026#34; And a single template call:\n# templates/hugo_static_website.yaml {{ include \u0026#34;homelab.microservice\u0026#34; . 
}} This pattern mirrors how enterprise platform teams ship Helm: a blessed template library that enforces standards (label conventions, probe patterns, resource defaults), and thin consumer charts per workload type. App teams get a values API, not a Kubernetes API.\nLayer 3 — The Version Registry in homelab-k8s-environments # This repository holds one thing per deployed app: the image version. That\u0026rsquo;s it.\n# environments/dev/web/blog_hugo/values.yaml version: v0.1.0-alpha.7 The production path exists as a placeholder (environments/prod/.gitkeep) but holds no version yet — promotion to prod is a deliberate pull request, not an automated step.\nThe strict ownership rule here is worth emphasising. In every Argo CD multi-source Application in this setup, the values from this registry are loaded after the runtime values from homelab-k8s-environments-apps. Because later values win in Helm merge order:\nsources: - ref: valuesRepo # environments-apps (runtime) - ref: valuesRepoDefault # environments (version) ← wins on overlapping keys If you ever accidentally put runtime config (replicas, resources) in the version registry, it silently overrides the runtime values. Keeping the version registry strictly version-only prevents that class of bug entirely.\nIn a production fintech setup I\u0026rsquo;ve run, a similar discipline was enforced by a separate team owning the version registry with read-only access from app teams. Here in the homelab it\u0026rsquo;s enforced by convention, but the same logic applies.\nLayer 4 — App Definitions and Runtime Values in homelab-k8s-environments-apps # This is the largest change set, and the most structurally interesting one. 
It introduces the full App-of-Apps hierarchy for the dev environment.\nThe App-of-Apps Chain # The pattern builds a hierarchy of Argo CD Applications, each pointing to the next level:\nroot-homelab-dev (bootstrap entry point)\n└── root-app/\n    └── root-web-dev (web domain root)\n        └── web/\n            └── blog_hugo.yaml (actual workload Application)\nStep by step:\n1. Bootstrap root — root-homelab-dev.yaml\napiVersion: argoproj.io/v1alpha1 kind: Application metadata: name: root-homelab-dev namespace: argocd spec: source: path: environments/dev/_root/root-app repoURL: https://github.com/anvaplus/homelab-k8s-environments-apps targetRevision: HEAD project: argo-config syncPolicy: automated: prune: true selfHeal: true This is what you apply once to bootstrap. After that, Argo CD picks up everything else from the repo automatically.\n2. Root app group — root-app/ is a Kustomize directory\n# root-app/kustomization.yaml resources: - root-web.yaml 3. Domain root — root-web.yaml points Argo CD at the web/ domain folder, which contains its own Kustomization listing all web Applications:\n# web/kustomization.yaml resources: - blog_hugo/blog_hugo.yaml This indirection pays off when the second app lands. Adding homelab_hugo to the web domain means adding one line to this file and dropping in two new files. 
Nothing higher in the chain changes.\nThe Multi-Source Application # The workload Application is where the three value sources converge:\n# web/blog_hugo/blog_hugo.yaml apiVersion: argoproj.io/v1alpha1 kind: Application metadata: name: web-hugo-dev namespace: argocd finalizers: - resources-finalizer.argocd.argoproj.io spec: destination: namespace: web server: https://kubernetes.default.svc project: argo-config sources: # Source 1: version registry (ref only, no path — provides $valuesRepoDefault) - repoURL: https://github.com/anvaplus/homelab-k8s-environments targetRevision: HEAD ref: valuesRepoDefault # Source 2: runtime values (ref only — provides $valuesRepo) - repoURL: https://github.com/anvaplus/homelab-k8s-environments-apps targetRevision: HEAD ref: valuesRepo # Source 3: chart (path + valueFiles — this is what renders) - repoURL: https://github.com/anvaplus/homelab-k8s-base-manifests targetRevision: HEAD path: charts/microservices/hugo/v0.1.0 helm: valueFiles: - $valuesRepo/environments/dev/web/blog_hugo/values.yaml - $valuesRepoDefault/environments/dev/web/blog_hugo/values.yaml syncPolicy: automated: prune: true selfHeal: true The ref: sources are just repository anchors — they expose their content as $valuesRepo and $valuesRepoDefault variables referenced in valueFiles. Only the third source (with path:) contains the chart that renders.\nThis is one of the most powerful Argo CD features that I see underused. Standard single-source Applications can\u0026rsquo;t separate \u0026ldquo;what version\u0026rdquo; from \u0026ldquo;how configured\u0026rdquo; — you end up with environment-specific chart forks or complex Kustomize overlays. 
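To make that value layering concrete: conceptually, Helm merges the listed valueFiles recursively, and later files win on overlapping keys. A rough Python sketch of that behavior (illustrative only, not the actual Helm implementation; the sample maps mirror the two value files above):

```python
def helm_merge(base: dict, override: dict) -> dict:
    """Recursively merge two value maps; keys in `override` win.

    Illustrative sketch of the "later valueFiles win" rule,
    not the real Helm implementation.
    """
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = helm_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

runtime = {"deployment": {"spec": {"replicas": 1}}}   # loaded first ($valuesRepo)
registry = {"version": "v0.1.0-alpha.7"}              # loaded last ($valuesRepoDefault)

print(helm_merge(runtime, registry))
# version lands on top of the untouched runtime values

# The bug class to avoid: a stray runtime key in the version
# registry silently overrides the real runtime value.
bad_registry = {"version": "v0.1.0-alpha.7", "deployment": {"spec": {"replicas": 3}}}
print(helm_merge(runtime, bad_registry)["deployment"]["spec"]["replicas"])  # 3, not 1
```

The second print shows the failure mode of a polluted version registry: the stray replicas key wins over the runtime values, which is exactly why the registry stays version-only.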
Multi-source solves this cleanly.\nRuntime Values # The values.yaml in this repository owns everything that isn\u0026rsquo;t the version:\n# web/blog_hugo/values.yaml appName: web componentName: hugo namespace: web environment: dev deployment: spec: replicas: 1 image: repository: anvaplus/hugo-blog-example pullPolicy: IfNotPresent resources: requests: cpu: 100m memory: 128Mi limits: cpu: 300m memory: 256Mi No version key here. That\u0026rsquo;s intentional\u0026hellip; it lives only in homelab-k8s-environments. Any future automation that bumps the version (CI/CD writing to the version registry) doesn\u0026rsquo;t touch this file, and any runtime tuning (scaling replicas, adjusting resource limits) doesn\u0026rsquo;t touch the version registry.\nHow It All Connects # At reconciliation time, Argo CD visits this Application and:\nFetches all three sources at HEAD\nResolves $valuesRepo and $valuesRepoDefault variables\nBuilds the Helm release from the chart, layering values in order: runtime values first, then version values on top\nApplies the rendered manifests to the web namespace in the cluster\nThe final rendered objects for the blog_hugo Application include:\nA Deployment with a single replica, image anvaplus/hugo-blog-example:v0.1.0-alpha.7\nA Service of type ClusterIP on port 80 targeting 8080\nA ConfigMap for environment variables (Hugo base URL, language, environment)\nOptionally, an ExternalSecret if externalSecretsConfig is populated in values\nThe web namespace already exists (provisioned by argo-config), the image has passed security gates (built by the CI pipeline), and Argo CD simply matches desired state to actual state continuously.\nThe Loop Is Closed # Here\u0026rsquo;s the complete flow now, end to end:\nGit push (blog content change) → GitHub Actions PR validation (Gitleaks, Trivy, SonarCloud) → Merge to main → GitHub Actions deployment (multi-arch build, tag v0.1.0-alpha.7) → Image pushed to registry → CI writes version to homelab-k8s-environments (dev 
path) → Argo CD detects version change → Argo CD reconciles blog_hugo Application → Pod restarts with new image tag → Blog is live\nEvery step is automated and auditable. The image hash that passed security scanning is the same hash running in the cluster. The version that CI calculated is the version committed to Git and tracked in the environment registry.\nThis mirrors how I\u0026rsquo;ve seen compliant delivery pipelines run in regulated environments — the difference is that here the entire infrastructure fits in a spare homelab box and costs nothing beyond electricity.\nWhat\u0026rsquo;s Next # The platform now runs the blog. The next logical step is connecting the actual deployment to the real domain — wiring up Traefik ingress and TLS certificates so blog.dev.thebestpractice.tech (link publicly available after the next post) resolves to the pod. We\u0026rsquo;ve already built the networking layer in the Path to Automated TLS series\u0026hellip; now it\u0026rsquo;s time to use it for the workload.\nStay tuned. Andrei\n","date":"16 April 2026","externalUrl":null,"permalink":"/from-docker-image-to-running-pod-completing-the-gitops-loop/","section":"Blog","summary":"The image is built, scanned, and pushed. 
The version tag is v0.1.0-alpha.7.\n","title":"From Docker Image to Running Pod: Completing the GitOps Loop with App-of-Apps and Multi-Source Argo CD","type":"blog"},{"content":"","date":"16 April 2026","externalUrl":null,"permalink":"/tags/gitops/","section":"Tags","summary":"","title":"Gitops","type":"tags"},{"content":"","date":"16 April 2026","externalUrl":null,"permalink":"/tags/helm/","section":"Tags","summary":"","title":"Helm","type":"tags"},{"content":"","date":"16 April 2026","externalUrl":null,"permalink":"/tags/homelab/","section":"Tags","summary":"","title":"Homelab","type":"tags"},{"content":"","date":"16 April 2026","externalUrl":null,"permalink":"/tags/kubernetes/","section":"Tags","summary":"","title":"Kubernetes","type":"tags"},{"content":"","date":"16 April 2026","externalUrl":null,"permalink":"/tags/platform-engineering/","section":"Tags","summary":"","title":"Platform-Engineering","type":"tags"},{"content":"","date":"16 April 2026","externalUrl":null,"permalink":"/series/self-hosting-the-blog/","section":"Series","summary":"","title":"Self-Hosting the Blog","type":"series"},{"content":"","date":"16 April 2026","externalUrl":null,"permalink":"/series/","section":"Series","summary":"","title":"Series","type":"series"},{"content":"","date":"16 April 2026","externalUrl":null,"permalink":"/tags/","section":"Tags","summary":"","title":"Tags","type":"tags"},{"content":"Welcome to my digital garden where I share my journey building enterprise-grade infrastructure at home.\n","date":"16 April 2026","externalUrl":null,"permalink":"/","section":"Welcome","summary":"Welcome to my digital garden where I share my journey building enterprise-grade infrastructure at home.\n","title":"Welcome","type":"page"},{"content":"","date":"3 April 2026","externalUrl":null,"permalink":"/tags/container/","section":"Tags","summary":"","title":"Container","type":"tags"},{"content":"","date":"3 April 
2026","externalUrl":null,"permalink":"/tags/github-actions/","section":"Tags","summary":"","title":"Github-Actions","type":"tags"},{"content":"","date":"3 April 2026","externalUrl":null,"permalink":"/tags/security/","section":"Tags","summary":"","title":"Security","type":"tags"},{"content":"","date":"3 April 2026","externalUrl":null,"permalink":"/tags/sonarqube/","section":"Tags","summary":"","title":"Sonarqube","type":"tags"},{"content":"The artifact is already hardened. From FinTech to Homelab: Writing an Enterprise-Ready Dockerfile for Hugo was about building the container correctly. This post is about everything that has to happen after that.\nBut here\u0026rsquo;s the uncomfortable truth: shipping an artifact only once isn\u0026rsquo;t enough. You need automated validation at every step. Every time someone merges code, a pipeline should kick in and ask: Is this secure? Does it have vulnerabilities? Does it meet our quality standards?\nIn a regulated environment this isn\u0026rsquo;t optional. Security gates are mandatory checkpoints. And if something fails, you iterate, fix, and try again.\nToday, we\u0026rsquo;re building exactly that: a GitHub Actions CI/CD pipeline that works like a bank\u0026rsquo;s security review process\u0026hellip; except it runs in seconds, not days.\nThe Pipeline Architecture # Before we dive into the code, let me show you the architecture. The pipeline has two distinct workflows:\nPull Request Validation (Hugo_PullRequest.yaml) — runs on every PR before merge Deployment Pipeline (Hugo_Deploy.yaml) — runs after merge to main The PR validation pipeline is our first line of defense. It catches problems early. 
The deployment pipeline is still rigorous, but it assumes the code has already passed validation.\nWhat the PR Pipeline Does # When you open a pull request with changes to the blog:\nGitleaks Secret Scan: Detects if any secrets (API keys, tokens, credentials) accidentally got committed\nTrivy Configuration Scan: Scans all YAML and infrastructure-as-code files for known vulnerabilities\nSonarQube Code Quality Analysis: Measures code smells, bugs, security hotspots, and test coverage\nTrivy Image Scan: Builds a local Docker image and scans it for container vulnerabilities\nMulti-Architecture Build: Attempts to build the image for both amd64 and arm64 without pushing\nIf any of these steps fail, the PR is blocked. You cannot merge until you fix the issues.\nWhat the Deployment Pipeline Does # After you merge to main, the deployment pipeline kicks in:\nVersion Resolution: Resolves image name and calculates the next alpha version\nMulti-Architecture Build and Push: Builds and pushes amd64,arm64 image only after successful scan\nSonarQube Analysis: Runs SonarQube scan against the full repository\nThe GitHub Actions Workflows # Let me show you the workflows and trigger wiring from the files. 
In this setup, both are reusable workflows (workflow_call) hosted in the workflows repository, and each project calls them explicitly.\nCaller Workflow: Pull Request Trigger # name: Pull Request Validation on: pull_request: types: - opened - reopened - synchronize branches: - main permissions: contents: read jobs: validate: uses: anvaplus/homelab-github-workflows/.github/workflows/Hugo_PullRequest.yaml@main with: sonar_organization: ${{ vars.SONAR_ORGANIZATION }} secrets: SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }} Caller Workflow: Deploy Trigger # name: Deploy on: push: branches: [main] jobs: deploy: uses: anvaplus/homelab-github-workflows/.github/workflows/Hugo_Deploy.yaml@main with: dockerhub_username: anvaplus sonar_organization: ${{ vars.SONAR_ORGANIZATION }} secrets: DOCKERHUB_TOKEN: ${{ secrets.DOCKERHUB_TOKEN }} SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }} Workflow Source Links # Hugo Pull Request workflow: Hugo_PullRequest.yaml Hugo Deploy workflow: Hugo_Deploy.yaml Versioning Every Merge # One important part of this pipeline is the versioning model behind it. I do not tag images manually. Every merge to main produces a new integration build version automatically through my next-version action, which reads the existing git tags and generates the next semantic version with an alpha suffix.\nThat means the pipeline does not just build containers, it also creates a traceable release history. If the last stable release was v1.2.0, the next merges produce versions like v1.3.0-alpha.1, v1.3.0-alpha.2, v1.3.0-alpha.3, and so on. 
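That bump logic is simple enough to sketch. The following Python approximation shows what such a next-version step computes; it is my simplified reconstruction of the scheme described above, not the actual code of the next-version action:

```python
import re

def next_alpha(tags: list[str]) -> str:
    """Compute the next integration tag from existing git tags.

    Simplified sketch of the scheme described in the post (NOT the
    real next-version action): bump the minor of the newest stable
    tag, then advance the -alpha.N counter for that target version.
    """
    stables, alphas = [], []
    for tag in tags:
        m = re.fullmatch(r"v(\d+)\.(\d+)\.(\d+)(?:-alpha\.(\d+))?", tag)
        if not m:
            continue  # ignore tags outside the vX.Y.Z[-alpha.N] scheme
        major, minor, patch, alpha = m.groups()
        version = (int(major), int(minor), int(patch))
        if alpha is None:
            stables.append(version)
        else:
            alphas.append((version, int(alpha)))
    if stables:
        major, minor, _ = max(stables)
        target = (major, minor + 1, 0)  # next minor after the newest stable
    else:
        target = (0, 1, 0)              # assumption: no stable release yet
    n = max((a for v, a in alphas if v == target), default=0) + 1
    return f"v{target[0]}.{target[1]}.{target[2]}-alpha.{n}"

print(next_alpha(["v1.2.0"]))                                      # v1.3.0-alpha.1
print(next_alpha(["v1.2.0", "v1.3.0-alpha.1", "v1.3.0-alpha.2"]))  # v1.3.0-alpha.3
```

Because the function only reads existing tags, the computation is deterministic and reviewable: the same tag history always yields the same next version.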
The artifact changes, the tag changes, and the repository history keeps a clean record of exactly what was built after each PR merge.\nThis is the same strategy I described in Stop Rebuilding Your Images: The \u0026ldquo;Build Once, Promote Everywhere\u0026rdquo; Manifesto: next-version generates the integration tag, while promote-version is responsible for advancing the same artifact through later environments such as beta, rc, and finally stable production. In this blog pipeline, we are focused on the first part of that flow: generating the integration-ready alpha build consistently.\nYou can see the same tag history directly in GitHub on the repository tags page. GitKraken simply makes that progression easier to read visually, because you can see the tags attached to the commit flow after each PR merge.\nThe Security Tools in Action # Each tool in this pipeline covers a different risk surface. Together, they create layered controls similar to what you would expect in a regulated production environment.\nGitleaks: Stop Secrets Before They Start # Gitleaks scans commit history and current changes for patterns that match known secret formats such as API tokens, cloud keys, and private keys. This is the earliest and cheapest control in the chain, because a leaked credential can expose your entire platform long before an application vulnerability is exploited.\nTrivy Configuration Scanner: Infrastructure-as-Code Hygiene # Trivy config scan reviews YAML files, Dockerfiles, Kubernetes manifests, and related IaC for risky misconfigurations, including overly permissive access, missing hardening controls, and insecure defaults. These issues are often the root cause of production incidents, so catching them at PR time prevents bad infrastructure posture from being promoted.\nTrivy Image Scanner: Container Vulnerability Detection # Trivy image scan inspects the final container artifact for known CVEs in OS and library layers. 
Even perfect application code can be compromised by vulnerable dependencies, so this gate enforces image hygiene before publish. In this pipeline, HIGH and CRITICAL findings block progression.\nSonarQube: Code Quality and Security Hotspots # SonarQube performs static analysis to detect security hotspots, maintainability issues, duplication, and reliability risks directly in source code. This complements Trivy and Gitleaks by covering what dependency and config scanners cannot: logic-level weaknesses and quality drift in application code.\nThe Iterative Hardening Journey # This is what actually happened in practice. The hardening was not one commit; it was three focused iterations.\nIteration 1: PR #4 Fails, Then Passes # PR link: hugo-blog-example#4\nThe first PR failed at the GitHub level:\nAt the same time, the PR validation workflow itself was healthy. Trivy configuration and image scans both passed:\nThe blocker was SonarCloud quality review, not container vulnerability scanning:\nThe key recommendation was to pin dependency references with full commit SHA. For this specific dependency (my own workflow repository), I reviewed and accepted the risk in context. Once Sonar findings were resolved/accepted, the PR became mergeable:\nIteration 2: Main Branch Check Fails, Dockerfile Fix in PR #5 # After merging PR #4, the deploy workflow ran on main and Sonar analyzed the full repository context (not just PR scope). That check failed:\nThe issue came from secure download handling in the Docker build process. 
Based on secure coding guidance, I replaced wget with curl and enforced HTTPS-only redirect behavior with TLS 1.2 to prevent downgrade or insecure redirect paths during Hugo binary download.\nFix PR: hugo-blog-example#5\nAfter this change, PR checks passed again and, once merged, the main branch checks returned green:\nIteration 3: Centralized Workflows and Tag-Per-Merge in PR #6 # In the third iteration, I moved from in-repo workflow definitions to centralized reusable workflows (the model shown in this post). This mirrors how real platform teams manage CI/CD standards across multiple repositories.\nChange PR: hugo-blog-example#6\nEach merge now generates a new version tag automatically (as described in the Versioning Every Merge section), and each successful deploy publishes the image to Docker Hub.\nDocker Hub repository: anvaplus/hugo-blog-example\nWhat This Teaches Us About Pipelines # Here\u0026rsquo;s the lesson hidden in those iterations:\nTrue enterprise CI/CD is not about having the fanciest pipeline. It\u0026rsquo;s about making the gates so strict that broken code simply cannot reach production.\nIn fintech, I have seen major incidents start with someone assuming a change was \u0026ldquo;probably fine.\u0026rdquo; It was not. What looked like a small shortcut turned into long remediation cycles, unnecessary risk, and expensive follow-up work.\nBy contrast, a strict pipeline can look heavy at first, with all the scans and gates in place, but it is still the cheapest form of protection you can add. Catching an issue in a pull request is a routine fix. Catching it after release is a much more painful problem.\nA quick note on scope: this blog is not a step-by-step tutorial for each individual tool. If you\u0026rsquo;ve read this far, you already know how to add a secret in GitHub Actions or create an organization in SonarCloud—there are plenty of tutorials online for that. 
My goal here is to show you how these tools fit together in a production-grade setup, the same way I\u0026rsquo;ve seen them combined in regulated financial environments. All the code, workflows, and READMEs are publicly available in the companion repository. Take it, adapt it, and make it yours.\nWhat\u0026rsquo;s Next: Deployment to Kubernetes # Now that we have a hardened, scanned, and versioned container image, the next step is to deploy it to Kubernetes through the GitOps model I described in The Four-Repo GitOps Structure for My Homelab Platform.\nThat architecture separates platform services, reusable deployment blueprints, environment state, and Argo CD application definitions into distinct repositories. It is the operating model that turns this pipeline output into a real deployment flow, with controlled promotion, clear separation of concerns, and a public record of how image versions move toward the cluster.\nKey Takeaways # Security gates are mandatory checkpoints, not nice-to-haves. They cost nothing to run (GitHub Actions is free) and catch issues early.\nIterate ruthlessly. The first pass won\u0026rsquo;t be perfect. Expect to fix things. That\u0026rsquo;s the point of the pipeline.\nKnow your tools: Gitleaks for secrets, Trivy for vulnerabilities, SonarQube for code quality, Docker Buildx for multi-architecture builds.\nFail fast, fail early. Block bad code at the PR stage, not production.\nThe pipeline is part of your platform. Invest in it like you invest in your infrastructure.\nThe full code for this blog\u0026rsquo;s pipeline is available in the public companion repository. Clone it, adapt it to your project, and start shipping hardened artifacts.\nStay tuned! Andrei\n","date":"3 April 2026","externalUrl":null,"permalink":"/building-a-production-grade-cicd-pipeline-with-security-gates/","section":"Blog","summary":"The artifact is already hardened. 
From FinTech to Homelab: Writing an Enterprise-Ready Dockerfile for Hugo was about building the container correctly. This post is about everything that has to happen after that.\n","title":"Stop Shipping Blind: Security Gates and Iterative Hardening with GitHub Actions","type":"blog"},{"content":"","date":"3 April 2026","externalUrl":null,"permalink":"/tags/trivy/","section":"Tags","summary":"","title":"Trivy","type":"tags"},{"content":"","date":"25 March 2026","externalUrl":null,"permalink":"/tags/containerization/","section":"Tags","summary":"","title":"Containerization","type":"tags"},{"content":"","date":"25 March 2026","externalUrl":null,"permalink":"/tags/docker/","section":"Tags","summary":"","title":"Docker","type":"tags"},{"content":"In the previous post, I laid out my plan: treat this blog as a production application and host it using the same standards I apply when architecting platforms for private banking and fintech.\nI originally stated that \u0026ldquo;Phase 1\u0026rdquo; would cover both containerization and CI/CD in a single swoop. However, as I started writing out the implementation details, it became obvious that throwing enterprise-grade security, image optimization, and pipeline automation into one article was going to be overwhelming.\nInstead, I\u0026rsquo;ve decided to split Phase 1 into two parts. Today, we\u0026rsquo;re focusing on the first piece of the puzzle: containerizing a Hugo static site properly. In the next post, we\u0026rsquo;ll cover the GitHub Actions CI/CD pipeline that builds it.\nThe Public Companion Repository # To make this series truly reproducible as promised, I\u0026rsquo;ve created a public companion repository at github.com/anvaplus/hugo-blog-example.\nSince my main repository holds a large historical archive and unpublished drafts, this public repo is intentionally scoped to focus on the deployment plumbing. 
It contains the exact same Hugo layout, Blowfish theme integration, Dockerfile, and Nginx configuration that we\u0026rsquo;re about to cover. If you want to follow along, clone that repository and build the environment yourself.\nThe Problem with \u0026ldquo;Standard\u0026rdquo; Dockerfiles # If you search for \u0026ldquo;Hugo Dockerfile,\u0026rdquo; you\u0026rsquo;ll find hundreds of tutorials. Most of them will get your site running, but almost all of them fail a basic security review. They typically suffer from:\nRunning as root (a massive security risk in Kubernetes).\nUsing bloated base images like ubuntu or node, dragging hundreds of unnecessary packages and their associated Common Vulnerabilities and Exposures (CVEs) into production.\nLeaving build tools inside the final runtime image.\nUsing default Nginx configurations that leak server versions and lack crucial HTTP security headers.\nIn a highly regulated environment, deploying an image like that triggers immediate alarms from the security team. A static site should be just that: static. The container environment it runs in should have the absolute minimum attack surface possible.\nThe Multi-Stage Build # To solve this, I use a three-stage Docker build. 
Multi-stage builds allow us to use heavy dependencies to assemble our application, then extract only the compiled artifacts into a fresh, bare-bones runtime image.\nHere is the complete Dockerfile that powers this blog:\n# syntax=docker/dockerfile:1.7 ARG HUGO_VERSION=0.157.0 # --------------------------------------------------- # Stage 1: Securely download and verify Hugo # --------------------------------------------------- FROM alpine:3.22 AS hugo-installer ARG HUGO_VERSION RUN set -eux; \\ apk add --no-cache ca-certificates tar wget; \\ arch=\u0026#34;$(uname -m)\u0026#34;; \\ case \u0026#34;$arch\u0026#34; in \\ x86_64|amd64) hugo_arch=\u0026#39;amd64\u0026#39; ;; \\ aarch64|arm64) hugo_arch=\u0026#39;arm64\u0026#39; ;; \\ *) echo \u0026#34;Unsupported architecture: $arch\u0026#34; \u0026gt;\u0026amp;2; exit 1 ;; \\ esac; \\ hugo_archive=\u0026#34;hugo_extended_${HUGO_VERSION}_linux-${hugo_arch}.tar.gz\u0026#34;; \\ wget -O \u0026#34;/tmp/${hugo_archive}\u0026#34; \u0026#34;https://github.com/gohugoio/hugo/releases/download/v${HUGO_VERSION}/${hugo_archive}\u0026#34;; \\ wget -O /tmp/hugo_checksums.txt \u0026#34;https://github.com/gohugoio/hugo/releases/download/v${HUGO_VERSION}/hugo_${HUGO_VERSION}_checksums.txt\u0026#34;; \\ cd /tmp; \\ grep \u0026#34; ${hugo_archive}$\u0026#34; /tmp/hugo_checksums.txt | sha256sum -c -; \\ tar -xzf \u0026#34;/tmp/${hugo_archive}\u0026#34; -C /tmp hugo; \\ install -m 0755 /tmp/hugo /usr/local/bin/hugo # --------------------------------------------------- # Stage 2: Build the static site as a non-root user # --------------------------------------------------- FROM alpine:3.22 AS builder ENV HUGO_ENV=production \\ HUGO_ENVIRONMENT=production WORKDIR /src RUN set -eux; \\ apk add --no-cache ca-certificates git libc6-compat libgcc libstdc++; \\ addgroup -S builder; \\ adduser -S -G builder -h /home/builder builder; \\ mkdir -p /src /tmp/hugo_cache /home/builder; \\ chown -R builder:builder /src /tmp/hugo_cache /home/builder COPY 
--from=hugo-installer /usr/local/bin/hugo /usr/local/bin/hugo # Copy project files COPY --chown=builder:builder archetypes/ /src/archetypes/ COPY --chown=builder:builder assets/ /src/assets/ COPY --chown=builder:builder config/ /src/config/ COPY --chown=builder:builder content/ /src/content/ COPY --chown=builder:builder data/ /src/data/ COPY --chown=builder:builder i18n/ /src/i18n/ COPY --chown=builder:builder layouts/ /src/layouts/ COPY --chown=builder:builder static/ /src/static/ COPY --chown=builder:builder themes/ /src/themes/ # Switch to non-root user for the build USER builder RUN set -eux; \\ hugo --gc --minify --cacheDir /tmp/hugo_cache --destination /tmp/public # --------------------------------------------------- # Stage 3: The Distroless Runtime # --------------------------------------------------- FROM cgr.dev/chainguard/nginx:latest AS runtime LABEL org.opencontainers.image.title=\u0026#34;personal-blog\u0026#34; \\ org.opencontainers.image.description=\u0026#34;Hardened container image for the personal blog\u0026#34; \\ org.opencontainers.image.source=\u0026#34;https://github.com/anvaplus/personal-blog\u0026#34; COPY --chown=65532:65532 docker/nginx.conf /etc/nginx/nginx.conf COPY --from=builder --chown=65532:65532 /tmp/public/ /usr/share/nginx/html/ USER 65532:65532 EXPOSE 8080 Let\u0026rsquo;s break down the \u0026ldquo;why\u0026rdquo; behind this architecture.\nStage 1: Trust, but Verify # We do not rely on random Alpine package repositories to have the exact Hugo version we need. Instead, we grab the official release directly from GitHub. More importantly, we don\u0026rsquo;t just blindly execute downloaded binaries. The script downloads the sha256sum file, specifically searches for the checksum of the archive we downloaded, and verifies its cryptographic integrity before installing it. 
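Outside the Dockerfile, that grep-and-sha256sum step reduces to a few lines. Here is a Python sketch of the same check (illustrative only; the file names are examples borrowed from Stage 1, not a fixed API):

```python
import hashlib
import os

def verify_archive(archive_path: str, checksums_path: str) -> None:
    """Recompute the archive digest and compare it with the entry in the
    published checksums file -- the same verification `sha256sum -c`
    performs in Stage 1. Raises if the entry is missing or mismatched."""
    with open(archive_path, "rb") as f:
        actual = hashlib.sha256(f.read()).hexdigest()
    name = os.path.basename(archive_path)
    with open(checksums_path) as f:
        for line in f:
            parts = line.split()  # "<sha256>  <filename>" per line
            if len(parts) == 2 and parts[1] == name:
                if parts[0] == actual:
                    return  # integrity verified, safe to install
                raise ValueError(f"checksum mismatch for {name}")
    raise ValueError(f"no checksum entry for {name}")
```

Failing closed when the entry is missing matters: an attacker who can swap the archive can often also drop the checksum line, so an absent entry must be treated as a failure, not a pass.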
In an enterprise setting, verifying software supply chains is non-negotiable.\nStage 2: Principle of Least Privilege in CI # Even during the build phase, running as root is a bad practice. The second stage prepares the Alpine build environment and ensures the actual Hugo site compilation runs under a dedicated, unprivileged builder user.\nWe simply pass in the hugo binary downloaded and verified in Stage 1, copy over only the relevant directories to prevent local cache poisoning, and run Hugo with --gc (garbage collection) and --minify options to keep the generated HTML/CSS/JS footprint tiny.\nStage 3: The Chainguard Distroless Runtime # This stage is the only one that makes it into production, and it\u0026rsquo;s where we achieve our \u0026ldquo;Fort Knox\u0026rdquo; status. I\u0026rsquo;m using cgr.dev/chainguard/nginx:latest.\nChainguard images are distroless and heavily hardened. A distroless runtime means there is no package manager (no apk or apt), no shell (bash or sh), and no common UNIX utilities. If an attacker somehow achieves remote code execution in this container, they have zero tools available to pivot or escalate. Furthermore, Chainguard images are built to ship with zero known CVEs and run securely as a non-root user (65532:65532) by default.\nHardening the Nginx Configuration # A secure container still needs a secure web server. 
Running Nginx rootless on Kubernetes means we must bind to a high port (in this case, 8080), as binding to port 80 typically requires root privileges.\nHere is the docker/nginx.conf that gets mapped into the distroless container:\npid /tmp/nginx.pid; events { worker_connections 1024; } http { include /etc/nginx/mime.types; default_type application/octet-stream; access_log /dev/stdout; error_log /dev/stderr warn; sendfile on; tcp_nopush on; keepalive_timeout 65; # Never leak the Nginx version server_tokens off; server { listen 8080; listen [::]:8080; server_name _; root /usr/share/nginx/html; index index.html; client_body_temp_path /tmp/nginx-client-body; proxy_temp_path /tmp/nginx-proxy-temp; fastcgi_temp_path /tmp/nginx-fastcgi-temp; uwsgi_temp_path /tmp/nginx-uwsgi-temp; scgi_temp_path /tmp/nginx-scgi-temp; # Standard Security Headers add_header X-Content-Type-Options \u0026#34;nosniff\u0026#34; always; add_header X-Frame-Options \u0026#34;SAMEORIGIN\u0026#34; always; add_header Referrer-Policy \u0026#34;strict-origin-when-cross-origin\u0026#34; always; # Advanced Security Permissions add_header Permissions-Policy \u0026#34;accelerometer=(), autoplay=(), camera=(), geolocation=(), gyroscope=(), magnetometer=(), microphone=(), payment=(), usb=()\u0026#34; always; add_header Content-Security-Policy \u0026#34;default-src \u0026#39;self\u0026#39;; base-uri \u0026#39;self\u0026#39;; form-action \u0026#39;self\u0026#39;; frame-ancestors \u0026#39;self\u0026#39;; img-src \u0026#39;self\u0026#39; data: https:; object-src \u0026#39;none\u0026#39;; script-src \u0026#39;self\u0026#39; \u0026#39;unsafe-inline\u0026#39;; style-src \u0026#39;self\u0026#39; \u0026#39;unsafe-inline\u0026#39;; font-src \u0026#39;self\u0026#39; data:; connect-src \u0026#39;self\u0026#39; https:; upgrade-insecure-requests\u0026#34; always; # Kubernetes Readiness/Liveness probe endpoint location = /healthz { access_log off; return 200 \u0026#39;ok\u0026#39;; add_header Content-Type text/plain; } # 
Block access to hidden files (.git, .env, etc.) # except for .well-known for ACME/Let\u0026#39;s Encrypt location ~ /\\.(?!well-known/).* { deny all; access_log off; log_not_found off; } # SPA-friendly fallback routing location / { try_files $uri $uri/ /index.html; } } } Notice the specific enterprise best-practices we\u0026rsquo;ve implemented:\nContainer compatibility: Paths for cache (/tmp/nginx-*) are explicitly moved to /tmp/, ensuring that our read-only root filesystem doesn\u0026rsquo;t break Nginx operations. Logs are streamed to stdout/stderr—perfect for a Kubernetes logging aggregator to pick them up. Kubernetes Health Checks: The dedicated /healthz endpoint allows Kubernetes readiness and liveness probes to constantly monitor the pod without spamming the main access logs. Defense in Depth via Headers: The rigorous Content-Security-Policy and Permissions-Policy restrict exactly what the browser is allowed to execute or load. Turning off server_tokens prevents attackers from easily querying the exact Nginx version. What We Achieved # We took a basic static site generator and wrapped it in a highly-secured, verified, and hardened container. It runs without root privileges, actively denies unexpected system behavior, strictly enforces browser security policies, and has zero exploitable runtime packages—which is exactly how you safely build and package an application in modern engineering.\nIf you clone the companion repository, you can build and test this exact hardened setup locally right now:\ndocker buildx build --load -t personal-blog:local . docker run --rm -p 8080:8080 personal-blog:local Now that we have an artifact worthy of production, we need an automated way to build it.\nIn the next post, we will build a GitHub Actions pipeline to automatically build, scan for vulnerabilities, version using the Automated Semantic Versioning Strategy I described previously, and push this container image to a registry every time we merge to main.\nStay tuned! 
Andrei\n","date":"25 March 2026","externalUrl":null,"permalink":"/from-fintech-to-homelab-writing-an-enterprise-ready-dockerfile-for-hugo/","section":"Blog","summary":"In the previous post, I laid out my plan: treat this blog as a production application and host it using the same standards I apply when architecting platforms for private banking and fintech.\n","title":"From FinTech to Homelab: Writing an Enterprise-Ready Dockerfile for Hugo","type":"blog"},{"content":"","date":"25 March 2026","externalUrl":null,"permalink":"/tags/hugo/","section":"Tags","summary":"","title":"Hugo","type":"tags"},{"content":"","date":"17 March 2026","externalUrl":null,"permalink":"/tags/devops/","section":"Tags","summary":"","title":"Devops","type":"tags"},{"content":"","date":"17 March 2026","externalUrl":null,"permalink":"/tags/release-management/","section":"Tags","summary":"","title":"Release-Management","type":"tags"},{"content":" The Enterprise Traceability Problem # Guessing whether v1.3.0 in production actually includes yesterday\u0026rsquo;s critical security patch is a dangerous game. Knowing exactly which version of an artifact is running in any given environment isn\u0026rsquo;t just a nice-to-have dashboard feature\u0026hellip; it\u0026rsquo;s the foundation of a reliable release process. You can never afford to wonder if the build candidate QA just signed off on is truly the exact same binary you are deploying to users.\nWhen building my homelab platform, I wanted that exact same level of ironclad traceability without the bloated, soul-crushing overhead of heavy enterprise tools. The goal was simple: zero manual version bumping. The CI/CD pipeline should be fully automated, predictable, and capable of gracefully progressing a release candidate through multiple gating environments until it safely reaches production.\nWelcome to the versioning strategy overview! 
In this post, I\u0026rsquo;ll detail how I handle automated semantic versioning and release promotion across a 4-tier environment using tools purpose-built for the job.\nFAIR WARNING: this is not a casual, easy-to-read post. It represents a dense combination of automated semantic versioning and strict artifact promotion designed for production-grade reliability. The \u0026ldquo;Build Once, Promote Everywhere\u0026rdquo; Philosophy # A common anti-pattern in CI/CD is tying builds to specific environments using branch strategies\u0026hellip; for example, automatically compiling an image when code merges to a develop branch for Staging, and building a completely separate image when code merges to main for Production.\nThe fatal flaw with that approach? You are compiling a new image for every environment. Even if the codebase is identical, things like dynamic dependencies, runner toolchain updates, or external library versions can drift between the two builds. You lose the definitive guarantee that the image QA tested in Staging is the exact same image running in Production.\nTo solve this, my pipeline operates on strict immutable artifacts. I build an image exactly once at the very beginning of the CI cycle. From there, I promote that single, deeply verified image through every environment until it reaches Production. The underlying image hash never changes.\nIn practice, promotion means automatically computing the new semantic version tag and applying it to the existing image. Depending on the environment\u0026rsquo;s security requirements and how GitOps controllers like Argo CD are configured, this promotion often involves copying the exact same image hash from a lower-tier container registry to a dedicated higher-tier registry. 
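To make the tag computation behind a promotion concrete, here is a deliberately simplified shell sketch (a hypothetical helper, not the actual promote-version implementation; the real action would also consult the repository's existing Git tags rather than only the incoming tag):

```shell
#!/bin/sh
# Hypothetical sketch: compute the next tag for a target tier (beta/rc),
# or strip the prerelease suffix entirely for a stable production release.
promote_tag() {
  current="$1"   # e.g. v1.3.0-alpha.15
  target="$2"    # beta | rc | stable
  base="${current%%-*}"              # strip the prerelease suffix -> v1.3.0
  if [ "$target" = "stable" ]; then
    echo "$base"                     # stable release: pure semver tag
    return
  fi
  case "$current" in
    "$base-$target".*)
      n="${current##*.}"             # already on this tier: bump the counter
      echo "$base-$target.$((n + 1))" ;;
    *)
      echo "$base-$target.1" ;;      # first promotion into this tier
  esac
}

promote_tag v1.3.0-alpha.15 beta     # -> v1.3.0-beta.1
promote_tag v1.3.0-beta.5   rc       # -> v1.3.0-rc.1
promote_tag v1.3.0-rc.2     stable   # -> v1.3.0
```

The underlying image is untouched by any of this: only the tag attached to the existing hash changes as the artifact climbs the tiers.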
Ideally, you want a fully segregated image registry for Production\u0026hellip; with highly restricted access\u0026hellip; so that only heavily vetted, explicitly promoted artifacts ever make it that far.\nTwo GitHub Actions: Generation and Progression # To achieve a true \u0026ldquo;commit-to-production\u0026rdquo; release pipeline, the CI/CD system needs two distinct operations. It has to generate a brand new version when new code appears, and then it needs to promote that version suffix as it moves safely between lifecycle stages.\nI orchestrate this using two custom GitHub Actions:\nnext-version: Used during the Continuous Integration (CI) phase. Its job is generation — it analyzes existing Git tags in the repository and computes the correct next version (usually by bumping the minor version and appending an alpha suffix). promote-version: Used during the Continuous Delivery (CD) phase. Its job is progression — it takes a known existing prerelease version and safely advances its suffix state as the artifact successfully moves through deployment environments, culminating in a stable production tag. The 4-Tier Environment Mapping # In a typical production setting, you aren\u0026rsquo;t deploying straight to live users. For this pipeline, I designed a 4-tier environment structure. Each environment acts as a quality gate strictly bound to a semantic version prerelease suffix.\nEnvironment Suffix Tool Used Output Example Integration (INT) alpha next-version v1.3.0-alpha.1 Staging (STG) beta promote-version v1.3.0-beta.1 User Acceptance Testing (UAT) rc promote-version v1.3.0-rc.1 Production (PROD) (none) promote-version v1.3.0 The Version Lifecycle in Action # Visualizing the flow can really help lock down how an artifact matures.\nA Realistic Scenario: Getting to v1.3.0 # Imagine the last stable version in the repository is v1.2.0. 
I merge a new feature PR to the main branch, kicking off an active development sprint:\nContinuous Integration (INT) Over the course of the sprint, many fixes and small changes are pushed. Each push triggers the next-version action. The CI generates v1.3.0-alpha.1, then v1.3.0-alpha.2, all the way up until an iteration I think is structurally sound: v1.3.0-alpha.15.\nStaging (STG) I deploy v1.3.0-alpha.15 to STG for deeper systems testing. The action promote-version processes this alpha tag and smoothly generates the first beta: v1.3.0-beta.1. As the QA pipeline tests it, it finds bugs. I push fixes representing new alphas up to v1.3.0-alpha.18, and promote those to STG. The system auto-increments STG betas over time up to v1.3.0-beta.5.\nUser Acceptance Testing (UAT) The v1.3.0-beta.5 build holds up perfectly. I push it to UAT where the final review happens. promote-version takes the beta and translates it to a release candidate: v1.3.0-rc.1. Wait—a typo is spotted at the last second. I fix it, generating alpha.19, pushing to beta.6, and finally promoting a new RC up to UAT: v1.3.0-rc.2.\nProduction (PROD) The final sign-off is given. I run the PROD deployment targeting v1.3.0-rc.2. The promote-version action is invoked with the is-stable: true flag. It strips away all prerelease information, generating the pristine, stable release tag: v1.3.0.\nWorkflow Code Examples # Below are snippets demonstrating how to arrange your .github/workflows to facilitate this exact pipeline in your own environment.\n1. Generating the Initial Version (CI) # This step executes entirely on code changes and establishes the base testing artifact.\nname: 1. 
CI - Build and Integration on: push: branches: [ main ] jobs: build: runs-on: ubuntu-latest steps: - name: Checkout Code uses: actions/checkout@v4 with: fetch-depth: 0 # Required to see historical git tags for accurate bump calculations - name: Generate next Alpha Version id: generate-version # Using a custom action to calculate the next semver increment uses: anvaplus/github-actions-common/.github/actions/next-version@main with: version-type: \u0026#39;alpha\u0026#39; tag-repo: \u0026#39;true\u0026#39; - name: Report Build run: echo \u0026#34;Generated integration build: ${{ steps.generate-version.outputs.new-version }}\u0026#34; # Output: v1.3.0-alpha.1 2. Progressing through Environments (CD) # Because artifact promotion isn\u0026rsquo;t always immediately triggered by code pushes (often it relies on a manual trigger, an approval gate, or automated integration test success events), here is an example of what the CD steps look like for your upper environments.\nDeploy to STG (Beta) # - name: Promote to Beta (STG) id: promote-stg uses: anvaplus/github-actions-common/promote-version@main with: version: \u0026#39;v1.3.0-alpha.1\u0026#39; # Usually passed dynamically e.g. ${{ inputs.artifact_version }} promote-type: \u0026#39;beta\u0026#39; tag-repo: \u0026#39;true\u0026#39; # Output: v1.3.0-beta.1 Deploy to UAT (RC) # - name: Promote to RC (UAT) id: promote-uat uses: anvaplus/github-actions-common/promote-version@main with: version: \u0026#39;v1.3.0-beta.1\u0026#39; promote-type: \u0026#39;rc\u0026#39; tag-repo: \u0026#39;true\u0026#39; # Output: v1.3.0-rc.1 Deploy to PROD (Stable) # For the final step to production, we no longer process a promote-type. 
Instead, we set the is-stable flag to forcefully clean the tag—giving us the pure semantic version.\n- name: Promote to Stable (PROD) id: promote-prod uses: anvaplus/github-actions-common/promote-version@main with: version: \u0026#39;v1.3.0-rc.1\u0026#39; is-stable: \u0026#39;true\u0026#39; tag-repo: \u0026#39;true\u0026#39; # Output: v1.3.0 Traceability as a Standard # By adhering to this two-action strategy, I maintain a clean, linear, and thoroughly traceable history. I know precisely what code ran in which environment at any given time, bridging the gap between homelab experimentation and enterprise platform guarantees.\nThis versioning approach removes human error, keeps the Git history cleanly tagged, and aligns perfectly with modern GitOps principles\u0026hellip; a crucial underpinning for deploying scalable and resilient CI/CD pipelines.\nStay tuned! Andrei\n","date":"17 March 2026","externalUrl":null,"permalink":"/stop-rebuilding-your-images-build-once-promote-everywhere-manifesto/","section":"Blog","summary":"The Enterprise Traceability Problem # Guessing whether v1.3.0 in production actually includes yesterday’s critical security patch is a dangerous game. Knowing exactly which version of an artifact is running in any given environment isn’t just a nice-to-have dashboard feature… it’s the foundation of a reliable release process. 
You can never afford to wonder if the build candidate QA just signed off on is truly the exact same binary you are deploying to users.\n","title":"Stop Rebuilding Your Images: The \"Build Once, Promote Everywhere\" Manifesto","type":"blog"},{"content":" The Question That Changed Everything # Over the past months, I\u0026rsquo;ve received a variation of the same question more than any other:\n\u0026ldquo;I know tool X, Y, and Z\u0026hellip; I\u0026rsquo;ve done courses on tool Q\u0026hellip; but I can never find documentation on how you actually put all these tools together in production.\u0026rdquo;\nIt\u0026rsquo;s a fair point. The internet is full of isolated tutorials: \u0026ldquo;How to set up Argo CD,\u0026rdquo; \u0026ldquo;How to write a Dockerfile,\u0026rdquo; \u0026ldquo;How to configure Helm charts.\u0026rdquo; But almost nobody shows you how the entire assembly line works end-to-end\u0026hellip; from a developer pushing code to a user hitting a URL in their browser.\nIn my day job, architecting platforms for private banking, I build exactly these pipelines. But showing a bank\u0026rsquo;s internal CI/CD pipeline isn\u0026rsquo;t exactly an option. So I needed a real, public application to demonstrate the full lifecycle.\nThat application is this blog.\nWhy I Left Hashnode # If you\u0026rsquo;re reading this, you may have already noticed: this blog is no longer on Hashnode. You\u0026rsquo;re looking at a Hugo-powered site.\nLet me be clear: Hashnode is a great platform. It served me well when I started this journey and wanted to focus purely on writing. But after a recent update, a feature I relied on heavily was removed: the ability to sync blog posts to a GitHub repository automatically. Every time I published, my Git repo would get updated with the new Markdown file. It was a clean, version-controlled workflow that felt right for an engineer.\nWithout that sync, the workflow feels broken. 
And rather than fight it, I saw an opportunity.\nI have a fully operational platform sitting in my homelab. We\u0026rsquo;ve built a networking layer with automated TLS, distributed storage with Longhorn, a PostgreSQL database layer with CloudNativePG, centralized identity with Keycloak, and a battle-tested GitOps workflow with Argo CD. Why not use it?\nThe Interim Setup: Hugo on Netlify # Of course, before I can host the blog on Kubernetes, I still need a place to\u0026hellip; write about hosting the blog on Kubernetes. It\u0026rsquo;s a chicken-and-egg problem.\nSo for now, the blog is running on Hugo and hosted on Netlify. It\u0026rsquo;s a clean, fast, and reliable setup that lets me keep publishing while I build out the full self-hosted infrastructure. Think of Netlify as the scaffolding\u0026hellip; it keeps the building standing while we renovate the foundation underneath.\nThe Mission: Treat It Like a Bank Website # This isn\u0026rsquo;t just about hosting a blog. This is about using a real, public-facing application as a living case study for how a production website is built, deployed, and operated\u0026hellip; the same way I\u0026rsquo;d do it for a financial institution.\nThink about it. A public-facing website is the most relatable production workload there is. Everyone understands what a website does. It needs to be fast, secure, always available, and properly monitored. It has a CI/CD pipeline, a container image, DNS, TLS, logging, and analytics. It is, in miniature, every problem a platform engineer solves daily.\nThe Public Test Repository # To make this series truly reproducible, I\u0026rsquo;ll create a public GitHub repository with a test website built using the exact same technology stack as this blog — Hugo, the same containerization approach, the same pipeline, the same Helm charts. 
Every piece of infrastructure I build for that test site will be a 1:1 mirror of what powers andreivasiliu.com.\nThis means you won\u0026rsquo;t just be reading about concepts\u0026hellip; you\u0026rsquo;ll be able to clone the repo, follow along, and deploy the same setup yourself. And since the technologies are identical, once the full Kubernetes pipeline is functional for the test site, switching this blog over is a simple configuration change.\nEvery step I take will be documented in its own blog post, turning this migration into a multi-part series. Here is the roadmap:\nPhase 1: Containerization \u0026amp; CI/CD # The first step is to take the blog\u0026rsquo;s source code and make it deployable.\nContainerize the blog: Write an optimized, multi-stage Dockerfile to package the static site generator into a lean, production-ready container image. No bloated base images, no unnecessary packages\u0026hellip; just like we\u0026rsquo;d ship a production artifact at work. Build a GitHub Actions pipeline: Create a CI pipeline that automatically builds the container image on every push to main, including container image scanning for CVEs before pushing and multi-architecture builds (amd64/arm64) for flexibility. Implement automatic semantic versioning: Build a custom GitHub Action that versions the container image automatically (e.g., v1.2.3) based on commit conventions, eliminating the need for manual tagging. This is a pattern I use extensively in enterprise pipelines. Push to a container registry: Configure the pipeline to push the versioned image to a container registry with proper image tagging and retention policies — the same way we manage artifacts in a regulated environment. 
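As a rough sketch, a Phase 1 workflow covering those four steps could look like this (the registry path and the Trivy scan step are illustrative assumptions on my part; the next-version action is the one from my versioning setup):

```yaml
name: blog-ci
on:
  push:
    branches: [main]

jobs:
  build-scan-push:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0   # full tag history for version calculation

      - name: Compute next alpha version
        id: version
        uses: anvaplus/github-actions-common/.github/actions/next-version@main
        with:
          version-type: 'alpha'
          tag-repo: 'true'

      - name: Build single-arch image locally for scanning
        run: docker buildx build --load -t blog:scan .

      - name: Scan for CVEs before pushing (illustrative scanner choice)
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: blog:scan
          exit-code: '1'           # fail the pipeline on findings
          severity: 'CRITICAL,HIGH'

      - name: Build multi-arch image and push (hypothetical registry path)
        run: |
          docker buildx build \
            --platform linux/amd64,linux/arm64 \
            -t ghcr.io/example/blog:${{ steps.version.outputs.new-version }} \
            --push .
```

Note the ordering: the scan gates the push, so an image with critical findings never reaches the registry at all.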
Phase 2: Kubernetes Deployment with GitOps # With a container image ready, we bring it into the cluster using the patterns we\u0026rsquo;ve already established.\nCreate Helm charts: Build a Helm chart for the blog deployment — configurable replicas, resource limits, health checks, and environment-specific overrides\u0026hellip; following the same patterns from the base manifests repository in my four-repo GitOps structure. Deploy via Argo CD: Define the Argo CD Application manifest and let GitOps handle the rollout. The pipeline will automatically update the image tag in the environments repository, triggering a seamless deployment. No kubectl apply here. Expose via Gateway API: Serve the blog over HTTPS using Traefik and the Gateway API, with certificates automatically managed by cert-manager and Let\u0026rsquo;s Encrypt. Phase 3: Observability \u0026amp; Production Hardening # A production website isn\u0026rsquo;t just deployed; it\u0026rsquo;s observed.\nAnalytics collection: Show how website analytics are collected\u0026hellip; tracking visits, page views, and reader behavior without relying on third-party SaaS tools, respecting user privacy. Centralized logging: Demonstrate how application logs are collected, aggregated, and made searchable, so that when something breaks at 2 AM, you know exactly where to look. Crash and error reporting: Set up proper error tracking to capture frontend and backend exceptions, so issues are surfaced before users report them. And more: Health checks, resource limits, network policies, uptime monitoring\u0026hellip; all the small details that separate a \u0026ldquo;hobby deployment\u0026rdquo; from a production one. What We\u0026rsquo;ve Already Built # This plan isn\u0026rsquo;t starting from zero. 
Look at what we\u0026rsquo;ve assembled over this series:\nKubernetes clusters provisioned with Talos Omni Networking with Cilium CNI managed by Argo CD Secrets management via 1Password and External Secrets Load balancing with MetalLB Ingress and DNS through Traefik and Technitium Automated TLS via cert-manager Persistent storage with Longhorn Databases running on CloudNativePG Identity and SSO powered by Keycloak integrated with Argo CD Think of it this way: we\u0026rsquo;ve spent months building a Formula 1 car piece by piece. The engine (Kubernetes on Talos), the chassis (the four-repo GitOps structure), the aerodynamics (Cilium networking), the fuel system (secrets management with 1Password), the tires (TLS with cert-manager), and the cockpit (Keycloak identity).\nNow it\u0026rsquo;s time to drive it.\nWhat\u0026rsquo;s Next # In the next post, we\u0026rsquo;ll get our hands dirty with Phase 1: containerizing the blog and building the CI/CD pipeline from scratch. We\u0026rsquo;ll write the Dockerfile, set up GitHub Actions, and implement automatic versioning — a complete, production-grade build pipeline.\nIf you\u0026rsquo;ve ever wondered how all the individual pieces of a modern DevOps stack come together to ship real software, this series is for you. Follow along, and let\u0026rsquo;s build something real.\nAs always, all code and configurations will be open source and available in my GitHub repositories.\nThis is a continuation of my homelab series. If you\u0026rsquo;re just joining, I recommend starting with Why Not a Homelab? 
and following the series from the beginning.\n","date":"10 March 2026","externalUrl":null,"permalink":"/from-hashnode-to-kubernetes-why-im-self-hosting-my-blog-like-a-bank-website/","section":"Blog","summary":"The Question That Changed Everything # Over the past months, I’ve received a variation of the same question more than any other:\n","title":"From Hashnode to Kubernetes: Why I'm Self-Hosting My Blog Like a Bank Website","type":"blog"},{"content":" More Than Just a Login Screen # In our last post, we deployed a production-ready Keycloak cluster. But an Identity Provider (IdP) in isolation is just a database of users. Its true power lies in being the architectural enforcement point for your entire platform.\nIn the enterprise world, Keycloak is a beast. I\u0026rsquo;ve used it to broker trust between legacy Active Directory forests and modern cloud-native apps, managing complex federation and fine-grained authorization policies. It doesn\u0026rsquo;t just authenticate users; it authorizes access.\nI treat my homelab with the same rigor. Keycloak is the central nervous system of my security posture:\nAPI Security at the Edge: It issues JWTs that Traefik\u0026rsquo;s middleware verifies before a request ever reaches a microservice, dropping malicious traffic at the door. Secure Access for Internal Tools: Many operational dashboards (like Jaeger or Longhorn UI) lack built-in authentication because they are often designed to be accessed via kubectl port-forward. To expose them securely via Ingress, I follow the best practice of wrapping them in an OAuth2 Proxy, enforcing a strict Keycloak login before any traffic is allowed through. Identity Federation: It unifies access across Grafana, Hubble UI, and the rest of the stack, ensuring that one identity rules them all. Today, we focus on the most critical piece of that stack: Argo CD.\nArgo CD is the control plane of our GitOps operation, holding write access to the entire cluster. 
Protecting such a critical component with a default admin password or shared credentials creates a significant vulnerability. We will resolve this by implementing Role-Based Access Control (RBAC) mapped directly to Keycloak groups, ensuring that cluster administration privileges are centrally managed, auditable, and secure.\nThe GitOps Approach vs. The Manual Way # If you look at the official Argo CD documentation for Keycloak, it\u0026rsquo;s excellent. It walks you through editing ConfigMaps and patching secrets imperatively via kubectl.\nHowever, we don\u0026rsquo;t do \u0026ldquo;kubectl edit\u0026rdquo; here. We do GitOps.\nIn a GitOps environment, we don\u0026rsquo;t manually patch live objects; we define the desired state in our repository. The challenge is translating those imperative instructions into a declarative Helm chart configuration that Argo CD can manage itself\u0026hellip; a core part of the four-repo GitOps structure I use to manage the platform.\nHere is how I adapted the standard instructions into a clean, reproducible GitOps configuration.\nMy GitOps Implementation # As I mentioned earlier, the official Argo CD documentation is excellent, and there are countless step-by-step tutorials available for clicking through the Keycloak UI. I won\u0026rsquo;t replicate those here.\nInstead, I want to focus on how I implemented this in a GitOps environment. The challenge isn\u0026rsquo;t connecting the two services; it\u0026rsquo;s defining the integration declaratively so that I don\u0026rsquo;t have to manually act as the \u0026ldquo;glue\u0026rdquo; between them.\n1. The Keycloak Prerequisites # On the Keycloak side, I performed two manual setup steps (though these could also be automated with the Keycloak Operator):\nCreated an OIDC Client: I set up a client named argocd, enabled Client authentication (which generates the client secret), and set the Root URL to my Argo CD instance (e.g., https://argo.dev.thebestpractice.tech). 
Configured Group Claims: To authorize users based on their Keycloak groups, I created a new Client Scope named groups with a \u0026ldquo;Group Membership\u0026rdquo; mapper. I set the \u0026ldquo;Token Claim Name\u0026rdquo; to groups and disabled \u0026ldquo;Full group path\u0026rdquo; so I get clean group names like ArgoCDAdmins. Finally, I added this scope to the argocd client as a Default Client Scope. The output of this process is a Client ID and a Client Secret.\n2. Declarative Secrets # I never commit secrets to Git. Instead, I follow the pattern from my post on automating secrets: I store the Keycloak Client Secret in 1Password and use the External Secrets Operator to inject it into the cluster.\nNote the labels in the template metadata. Argo CD requires specific labels to recognize secrets that are part of its configuration ecosystem.\napiVersion: external-secrets.io/v1beta1 kind: ExternalSecret metadata: annotations: argocd.argoproj.io/compare-options: IgnoreExtraneous name: argocd-secret-sso namespace: argocd spec: secretStoreRef: kind: ClusterSecretStore name: op-cluster-secret-store target: name: argocd-secret-sso creationPolicy: Owner template: metadata: labels: # Required by ArgoCD to read secret values from the default secret app.kubernetes.io/part-of: argocd data: - secretKey: clientID remoteRef: key: EXTSEC_1Password_Keycloak_ArgoCD property: client_ID - secretKey: clientSecret remoteRef: key: EXTSEC_1Password_Keycloak_ArgoCD property: client_secret_dev This configuration ensures that the argocd-secret-sso secret is automatically created in the cluster, populated directly from my password manager, without ever exposing credentials in my repo.\n3. Configuring Argo CD via Helm # This is where the declarative approach shines. Instead of patching a ConfigMap with kubectl edit, I update my Argo CD Helm values file. 
Argo CD\u0026rsquo;s chart allows us to reference the secret we created in the previous step directly in the OIDC configuration.\nconfigs: cm: url: https://argo.dev.thebestpractice.tech # OIDC Configuration block oidc.config: | name: Keycloak issuer: https://keycloak.dev.thebestpractice.tech/realms/master # Reference the secret keys we created in Step 2 clientID: $argocd-secret-sso:clientID clientSecret: $argocd-secret-sso:clientSecret requestedScopes: [\u0026#34;openid\u0026#34;, \u0026#34;profile\u0026#34;, \u0026#34;email\u0026#34;, \u0026#34;groups\u0026#34;] Key Takeaway: The syntax $secret-name:key tells Argo CD to read the value from a Kubernetes secret rather than a plaintext string. This keeps the configuration transparent but secure.\n4. Direct Group-to-Role Mapping (RBAC) # Finally, I map the Keycloak groups directly to Argo CD roles using the argocd-rbac-cm configuration in Helm.\nconfigs: rbac: # Tell Argo to look at the \u0026#39;groups\u0026#39; claim in the OIDC token scopes: \u0026#34;[groups]\u0026#34; # CSV format: p (policy) or g (group), subject, role policy.csv: | g, ArgoCDAdmins, role:admin With this block, anyone added to the ArgoCDAdmins group in Keycloak effectively inherits full admin rights in Argo CD. Management is centralized in the IdP, and the policy is version-controlled in Git.\nConclusion: Identity as Code # This implementation does more than just add a \u0026ldquo;Login\u0026rdquo; button to Argo CD. It fundamentally shifts how we manage access in the platform.\nBy moving away from local users and imperative kubectl patches, we have established a robust, audit-ready security posture:\nIdentity is Centralized: Keycloak is now the single source of truth. Disabling a user there instantly revokes their access to the control plane. Configuration is Declarative: The entire authentication flow\u0026hellip; from the client secret injection to the RBAC policy\u0026hellip; is defined in Git. 
There is no \u0026ldquo;magic state\u0026rdquo; hidden in the cluster. Secrets are Secure: We’ve bridged the gap between our password manager (1Password) and Kubernetes without ever exposing credentials in our repository. This is the difference between a \u0026ldquo;home server\u0026rdquo; and a \u0026ldquo;homelab platform.\u0026rdquo; We aren\u0026rsquo;t just installing tools; we are integrating them into a cohesive, secure ecosystem.\nThis setup lays the groundwork for everything that follows. Whether it\u0026rsquo;s securing legacy dashboards with OAuth2 Proxy or managing machine identities, Keycloak will remain the central pillar of our security architecture. I look forward to sharing those implementations in future posts as the platform evolves.\nAs always, you can find the complete implementation, including the Argo CD Helm values and External Secret templates, in my GitHub repository.\nStay tuned! Andrei\n","date":"2 March 2026","externalUrl":null,"permalink":"/gitops-your-identity-integrating-keycloak-with-argo-cd/","section":"Blog","summary":"More Than Just a Login Screen # In our last post, we deployed a production-ready Keycloak cluster. But an Identity Provider (IdP) in isolation is just a database of users. 
Its true power lies in being the architectural enforcement point for your entire platform.\n","title":"GitOps Your Identity: Integrating Keycloak with Argo CD","type":"blog"},{"content":"","date":"2 March 2026","externalUrl":null,"permalink":"/tags/keycloak/","section":"Tags","summary":"","title":"Keycloak","type":"tags"},{"content":"","date":"2 March 2026","externalUrl":null,"permalink":"/series/keycloak-on-kubernetes/","section":"Series","summary":"","title":"Keycloak on Kubernetes","type":"series"},{"content":"","date":"2 March 2026","externalUrl":null,"permalink":"/tags/oidc/","section":"Tags","summary":"","title":"Oidc","type":"tags"},{"content":"","date":"2 March 2026","externalUrl":null,"permalink":"/tags/sso/","section":"Tags","summary":"","title":"Sso","type":"tags"},{"content":"","date":"25 February 2026","externalUrl":null,"permalink":"/tags/cloudnativepg/","section":"Tags","summary":"","title":"Cloudnativepg","type":"tags"},{"content":"","date":"25 February 2026","externalUrl":null,"permalink":"/tags/iam/","section":"Tags","summary":"","title":"Iam","type":"tags"},{"content":" Take Back Control of Your Identity # Over the last few months, we\u0026rsquo;ve built a platform that rivals small enterprise setups. We have established a resilient networking layer with automated TLS, deployed distributed block storage with Longhorn, and mastered PostgreSQL on Kubernetes with CloudNativePG.\nOur infrastructure is ready, but there is one critical component where many engineers\u0026hellip; even experienced ones\u0026hellip; take the easy way out: Identity.\nIt\u0026rsquo;s tempting to just slap \u0026ldquo;Login with Google\u0026rdquo; on your services or, worse, rely on a dozen local admin accounts. But in the regulated environments I work in, identity isn\u0026rsquo;t something you casually outsource. It is the new perimeter. 
If you don\u0026rsquo;t control your Identity and Access Management (IAM), you don\u0026rsquo;t really control your platform.\nThis post marks a turning point. We are moving from infrastructure to platform services. We will deploy Keycloak, the industry-standard open-source IAM solution, using the same GitOps and high-availability patterns I trust in production. We aren\u0026rsquo;t just installing an app; we\u0026rsquo;re reclaiming sovereignty over our authentication layer.\nWhy Keycloak? Moving Beyond \u0026ldquo;Admin-123\u0026rdquo; # If you\u0026rsquo;ve followed this series, you know my philosophy: build it like you\u0026rsquo;re running a bank. In a high-stakes environment, we don\u0026rsquo;t log into systems with local admin accounts and shared passwords. That\u0026rsquo;s a security incident waiting to happen. Yet, in many homelabs, that\u0026rsquo;s exactly what happens\u0026hellip; a sprawling mess of local users across Grafana, Proxmox, and Argo CD.\nKeycloak allows us to stop that madness. It is the open-source standard for Identity and Access Management (IAM), effectively the self-hosted equivalent of Auth0 or Okta. I\u0026rsquo;ve deployed Keycloak in environments handling sensitive financial data because it offers total control without sacrificing capability.\nHere is why it effectively becomes the \u0026ldquo;brain\u0026rdquo; of your platform\u0026rsquo;s security:\nTrue Single Sign-On (SSO): You authenticate once, and the doors open to Grafana, Argo CD, and your custom apps. No more password fatigue. Identity Brokering (The \u0026lsquo;Killer Feature\u0026rsquo;): This is my favorite capability. Keycloak can front other identity providers. Want to let users log in with GitHub or Google? You connect them to Keycloak, and Keycloak handles the complex translation layer for your applications. Your apps only ever need to know about Keycloak. Standardization: It speaks perfectly fluent OIDC (OpenID Connect) and SAML 2.0. 
Learning to configure these protocols in Keycloak is a directly transferable skill to any enterprise IAM role. Centralized User Federation: Whether you are syncing from an existing LDAP/AD or managing users directly, you have one source of truth. By deploying this, we aren\u0026rsquo;t just installing a login screen; we are implementing the same centralized security posture used by the world\u0026rsquo;s largest enterprises.\nThe Architectural Blueprint \u0026amp; Implementation Guide # Deploying Keycloak \u0026ldquo;the easy way\u0026rdquo; often involves using its embedded, non-production database. We\u0026rsquo;re not doing that. Our goal is a resilient, production-grade setup, which means leveraging the robust infrastructure we\u0026rsquo;ve already built.\nOur architecture consists of several key components, all managed declaratively:\nArgo CD: Our trusted GitOps engine, ensuring the deployed state matches our Git repository. CloudNativePG (CNPG): To provide a highly available PostgreSQL database running on our distributed Longhorn storage. External Secrets Operator: To securely inject credentials from 1Password. Traefik: Our ingress gateway, configured to expose Keycloak securely over HTTPS. Here is how we assemble these components step-by-step.\nStep 1: The Database Foundation with CloudNativePG # First, Keycloak needs a database. We\u0026rsquo;ll use the pattern from our guide on mastering PostgreSQL on Kubernetes with CloudNativePG to provision a dedicated cluster. 
This relies on the longhorn-cnpg StorageClass we configured in the Longhorn on Talos Linux post.\nWe start by creating a secret for the database user, pulling credentials securely from 1Password using the method from my post on automating Kubernetes secrets.\nFile: base/secrets/keycloak/keycloak-db-user.yaml\n--- apiVersion: external-secrets.io/v1 kind: ExternalSecret metadata: name: keycloak-db-user namespace: keycloak spec: secretStoreRef: kind: ClusterSecretStore name: op-cluster-secret-store target: creationPolicy: Owner template: type: kubernetes.io/basic-auth data: - secretKey: username remoteRef: key: EXTSEC_1Password_Keycloak_secrets property: db_user - secretKey: password remoteRef: key: EXTSEC_1Password_Keycloak_secrets property: db_user_pwd_dev Next, we define the CNPG cluster itself.\nFile: environments/dev/database/cnpg-cluster/clusters/keycloak/override.values.yaml\ntype: postgresql mode: standalone version: postgresql: \u0026#34;16\u0026#34; cluster: instances: 3 storage: size: 2Gi storageClass: \u0026#34;longhorn-cnpg\u0026#34; superuserSecret: keycloak-db-superuser # Managed by ESO initdb: database: keycloak owner: keycloak # The user from our secret secret: name: keycloak-db-user Step 2: Deploying Keycloak with the Cloud Pirates Helm Chart # We\u0026rsquo;ll use the well-maintained cloudpirates/keycloak Helm chart and define the Argo CD Application.\nFile: base/keycloak/keycloak.yaml\napiVersion: argoproj.io/v1alpha1 kind: Application metadata: name: keycloak namespace: argocd spec: destination: namespace: keycloak server: https://kubernetes.default.svc project: argo-config sources: - repoURL: https://github.com/anvaplus/homelab-k8s-argo-config.git targetRevision: main ref: valuesRepo - repoURL: \u0026#34;registry-1.docker.io/cloudpirates\u0026#34; targetRevision: \u0026#34;0.16.4\u0026#34; chart: keycloak helm: valueFiles: - $valuesRepo/base/keycloak/values.yaml Step 3: Configuration and Secrets # Now, we customize the deployment. 
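One detail worth noting before the values files: the database host used below is not arbitrary. For every Cluster it manages, CNPG generates Kubernetes Services, including a read-write Service named after the cluster with an -rw suffix, and it is that Service's in-cluster DNS name that the Helm values point Keycloak at. A sketch of the relationship, assuming the cluster resource ends up named cnpg-keycloak:

```yaml
# Illustrative: CNPG generates <cluster-name>-rw (primary), <cluster-name>-ro
# (replicas only) and <cluster-name>-r (any instance) Services. Keycloak needs
# the read-write endpoint:
database:
  host: cnpg-keycloak-rw.keycloak.svc.cluster.local  # <cluster>-rw.<namespace>.svc...
  port: "5432"
```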
We create another ExternalSecret for the Keycloak admin password.\nFile: environments/dev/keycloak/custom-values/keycloak-secrets.yaml\n--- apiVersion: external-secrets.io/v1 kind: ExternalSecret metadata: name: keycloak-secrets namespace: keycloak spec: secretStoreRef: kind: ClusterSecretStore name: op-cluster-secret-store target: creationPolicy: Owner data: - secretKey: admin-password remoteRef: key: EXTSEC_1Password_Keycloak_secrets property: admin_pwd_dev Then, we provide the environment-specific Helm values to connect Keycloak to our external database and use the admin secret.\nFile: environments/dev/keycloak/custom-values/custom-values.yaml\nkeycloak: adminUser: admin existingSecret: \u0026#34;keycloak-secrets\u0026#34; secretKeys: adminPasswordKey: \u0026#34;admin-password\u0026#34; proxyHeaders: \u0026#34;xforwarded\u0026#34; # Important for use behind Traefik production: true # Disable embedded databases postgres: enabled: false mariadb: enabled: false # Configure external database connection database: type: \u0026#34;postgres\u0026#34; host: \u0026#34;cnpg-keycloak-rw.keycloak.svc.cluster.local\u0026#34; # The CNPG service port: \u0026#34;5432\u0026#34; name: \u0026#34;keycloak\u0026#34; existingSecret: \u0026#34;keycloak-db-user\u0026#34; secretKeys: passwordKey: \u0026#34;password\u0026#34; usernameKey: \u0026#34;username\u0026#34; Step 4: Exposing Keycloak via Traefik # The final piece is to create an HTTPRoute to expose the Keycloak service. 
This leverages the secure ingress path we built in our Automated TLS with Cert-Manager guide.\nFile: environments/dev/ingress/routes/http-routes/keycloak.yaml\nkind: HTTPRoute apiVersion: gateway.networking.k8s.io/v1beta1 metadata: name: keycloak namespace: keycloak spec: parentRefs: - kind: Gateway name: traefik-gateway namespace: traefik sectionName: websecure hostnames: [keycloak.dev.thebestpractice.tech] rules: - matches: - path: type: PathPrefix value: / backendRefs: - name: keycloak kind: Service port: 8080 Verification: The Login Page # Once Argo CD has synced all the changes, the entire stack is live. You can navigate to https://keycloak.dev.thebestpractice.tech and be greeted by the Keycloak login page, served securely over HTTPS.\nYou can log in to the master realm with the username admin and the password injected by the External Secrets Operator.\nConclusion: The Platform Takes Shape # By deploying Keycloak, we\u0026rsquo;ve done more than just tick a box on a feature list. We\u0026rsquo;ve established a sovereign identity perimeter. We now have a production-grade IAM system that doesn\u0026rsquo;t rely on third-party cloud providers or scattered local accounts.\nThis deployment is the ultimate validation of our GitOps architecture. Every layer we’ve built—from the distributed storage handling the database to the automated certificates securing the ingress—worked in concert to deliver a critical platform service. 
We didn\u0026rsquo;t take shortcuts, and the result is a system that stands toe-to-toe with enterprise environments.\nAs always, the complete configuration is available in my homelab-k8s-argo-config GitHub repository.\nStay tuned as we begin to integrate our services with Keycloak, unlocking the power of Single Sign-On across the homelab.\nAndrei\n","date":"25 February 2026","externalUrl":null,"permalink":"/stop-outsourcing-identity-a-production-guide-to-keycloak-on-k8s/","section":"Blog","summary":"Take Back Control of Your Identity # Over the last few months, we’ve built a platform that rivals small enterprise setups. We have established a resilient networking layer with automated TLS, deployed distributed block storage with Longhorn, and mastered PostgreSQL on Kubernetes with CloudNativePG.\n","title":"Stop Outsourcing Identity: A Production Guide to Keycloak on K8s","type":"blog"},{"content":"","date":"24 February 2026","externalUrl":null,"permalink":"/tags/cnpg/","section":"Tags","summary":"","title":"Cnpg","type":"tags"},{"content":"","date":"24 February 2026","externalUrl":null,"permalink":"/tags/longhorn/","section":"Tags","summary":"","title":"Longhorn","type":"tags"},{"content":"","date":"24 February 2026","externalUrl":null,"permalink":"/tags/postgresql/","section":"Tags","summary":"","title":"Postgresql","type":"tags"},{"content":" The \u0026ldquo;Stateful\u0026rdquo; Reality Check # In our last post, we solved the persistence layer by deploying Longhorn on Talos Linux. We finally have a place to put data. But a raw block device isn\u0026rsquo;t a database.\nRunning a database in Kubernetes is often cited as one of the hardest challenges in platform engineering. You have to manage failover, backups, upgrades, and configuration changes\u0026hellip; all while ensuring data integrity. 
In the enterprise world, we often offload this to managed services like AWS RDS or Google Cloud SQL.\nHowever, in my experience working in the private banking sector, \u0026ldquo;just use RDS\u0026rdquo; isn\u0026rsquo;t always an option. Government regulations and data sovereignty laws frequently mandate that data stays on-premise. In these environments, I\u0026rsquo;ve seen many setups rely on traditional PostgreSQL clusters managed by tools like Patroni on bare-metal or VMs. While effective, they require significant operational overhead to manage (and that\u0026rsquo;s a topic for a future blog post).\nBut in the homelab? We are the cloud provider. We have to build our own RDS.\nThis post details how to build a production-ready PostgreSQL service using CloudNativePG (CNPG), and crucially, how to tune it to play nicely with our underlying Longhorn storage to avoid performance killers.\nThe Operator Pattern: Why CNPG? # You could deploy PostgreSQL using a simple Helm chart that spins up a StatefulSet. It works\u0026hellip; until the primary node dies, or you need a major version upgrade, or you need point-in-time recovery.\nThis is where the Operator Pattern shines. An Operator is essentially a robotic sysadmin running inside your cluster. It watches your custom resources (like a YAML file saying \u0026ldquo;I want a Postgres Cluster\u0026rdquo;) and actively manages the underlying Pods and Services to make that reality happen.\nI chose CloudNativePG (CNPG) because:\nIt\u0026rsquo;s Declarative: You define the desired state of your cluster, not the steps to get there. Immutability: It treats PostgreSQL instances as disposable. If a node fails, it spins up a new one and resyncs. Enterprise Origins: Originally built by EDB, it brings serious features like WAL archiving and synchronous replication to the open-source table. 
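The declarative idea above can be made concrete with a minimal sketch of a CNPG Cluster resource. This is illustrative only; the name, namespace, and size are placeholders rather than the values this post deploys later:

```yaml
# Illustrative sketch of the operator pattern: the whole database request is one resource.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: demo-db        # placeholder name
  namespace: demo      # placeholder namespace
spec:
  instances: 3         # the operator keeps one primary and two standbys in sync
  storage:
    size: 1Gi          # each instance gets its own PersistentVolumeClaim
```

The operator watches this object and continuously converges Pods, Services, and PVCs toward it; failover, replica rebuilds, and switchovers all happen without imperative commands.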
The Double Replication Trap # Here is the specific architectural challenge we face when combining Longhorn with a Distributed Database.\nBy default, Longhorn replicates every block of data to 3 different nodes to ensure availability. By default, a high-availability PostgreSQL cluster also replicates data to 3 different instances.\nIf you run a standard 3-node CNPG cluster on top of standard Longhorn volumes, you are writing every single byte of data 9 times (3 DB replicas × 3 Storage replicas).\nSetup | Data Copies | Performance | Reliability\nDefault (Longhorn 3 + CNPG 3) | 9 | Very Slow (High Latency) | Extreme (Overkill)\nMinimal (Longhorn 3 + CNPG 1) | 3 | Stale Data Potential | Low (No DB failover)\nOptimized (Longhorn 1 + CNPG 3) | 3 | Fast (Local Speed) | High (Standard)\nThe Solution: We need to let the Application (CNPG) handle the High Availability, effectively treating the storage layer as \u0026ldquo;ephemeral\u0026rdquo; local disks.\nImplementation Guide # 1. The Optimized StorageClass # First, we define a specific Longhorn StorageClass that replicates the behavior of a local SSD. We set numberOfReplicas to 1 and force dataLocality to strict-local.\nFile: base/longhorn/storage-class-cnpg.yaml\napiVersion: storage.k8s.io/v1 kind: StorageClass metadata: name: longhorn-cnpg provisioner: driver.longhorn.io allowVolumeExpansion: true parameters: numberOfReplicas: \u0026#34;1\u0026#34; staleReplicaTimeout: \u0026#34;2880\u0026#34; # 48 hours fromBackup: \u0026#34;\u0026#34; fsType: \u0026#34;ext4\u0026#34; dataLocality: \u0026#34;strict-local\u0026#34; # Critical for performance With strict-local, Longhorn attempts to keep the data on the same node as the Pod. If the node dies, we lose that specific volume—and that is okay. CNPG will detect the failure, promote one of the other two standby instances to Primary, and eventually spin up a new replica to replace the lost one.\n2. Deploying the CNPG Operator # We use Argo CD to manage the operator lifecycle. 
This ensures our \u0026ldquo;robotic sysadmin\u0026rdquo; handles updates and configuration drift.\nFile: base/database/cnpg-operator/cnpg-operator.yaml\napiVersion: argoproj.io/v1alpha1 kind: Application metadata: name: cloudnative-pg-operator namespace: argocd spec: destination: namespace: cnpg-system server: https://kubernetes.default.svc project: argo-config sources: - repoURL: https://cloudnative-pg.github.io/charts chart: cloudnative-pg targetRevision: 0.27.1 helm: releaseName: cnpg-operator values: | config: clusterWide: true # Manage DBs in all namespaces 3. Deploying a Database Cluster (The GitOps Way) # Now we can request a database. Unlike a traditional VM where you might install one Postgres server and create 50 databases inside it, the Kubernetes pattern is one Cluster per Microservice. This ensures isolation: if one app goes rogue and eats all the CPU, it doesn\u0026rsquo;t take down the others.\nTo manage this at scale without copying 500 lines of YAML for every microservice, we use a Kustomize Overlay strategy with Argo CD. 
We define a \u0026ldquo;Base Application\u0026rdquo; that knows how to deploy a standard CNPG cluster, and then we just patch the specifics (name, namespace, storage size) for each app.\nStep 3.1: The Base Application # This manifest tells Argo CD how to deploy a generic CNPG cluster using the official Helm chart.\nFile: base/database/cnpg-cluster/cnpg-cluster.yaml\napiVersion: argoproj.io/v1alpha1 kind: Application metadata: name: cnpg-cluster namespace: argocd finalizers: - resources-finalizer.argocd.argoproj.io spec: destination: namespace: cnpg-system server: https://kubernetes.default.svc project: argo-config sources: - repoURL: https://github.com/anvaplus/homelab-k8s-argo-config.git targetRevision: main ref: valuesRepo - repoURL: https://cloudnative-pg.github.io/charts path: cnpg chart: cluster targetRevision: 0.5.0 helm: releaseName: cnpg-cluster valueFiles: - $valuesRepo/base/database/cnpg-cluster/values.yaml syncPolicy: automated: prune: true selfHeal: true Step 3.2: The Kustomize Overlay (e.g., Keycloak) # When we need a database for keycloak, we don\u0026rsquo;t start from scratch. 
We simply patch the base application to update the target namespace and values file.\nFile: environments/dev/database/cnpg-cluster/clusters/keycloak/kustomization.yaml\napiVersion: kustomize.config.k8s.io/v1beta1 kind: Kustomization resources: - ../../../../../../base/database/cnpg-cluster/ patches: - target: group: argoproj.io version: v1alpha1 kind: Application name: cnpg-cluster patch: | - op: replace path: \u0026#34;/spec/sources/1/targetRevision\u0026#34; value: \u0026#34;0.5.0\u0026#34; - op: add path: \u0026#34;/spec/sources/1/helm/valueFiles/-\u0026#34; value: \u0026#34;$valuesRepo/environments/dev/database/cnpg-cluster/clusters/keycloak/override.values.yaml\u0026#34; - op: replace path: /metadata/name value: cnfg-cluster-keycloak - op: replace path: /spec/destination/namespace value: keycloak - op: replace path: /spec/sources/1/helm/releaseName value: cnfg-cluster-keycloak Step 3.3: The Configuration Values # Finally, we define the actual database configuration in override.values.yaml. This is where we reference our optimized storage class.\nFile: environments/dev/database/cnpg-cluster/clusters/keycloak/override.values.yaml\ntype: postgresql mode: standalone version: postgresql: \u0026#34;16\u0026#34; cluster: instances: 3 storage: size: 1Gi storageClass: \u0026#34;longhorn-cnpg\u0026#34; # The magic happens here backups: enabled: false # Or true if configured This tiered approach allows us to spin up production-ready, high-availability databases in seconds by adding just two small files to our GitOps repo.\nA Note on Backups # Do not use Longhorn Snapshots for Databases.\nSnapshots happen at the block level. If you snapshot a running database while it\u0026rsquo;s flushing memory to disk, you risk capturing a corrupted state. Always use the database\u0026rsquo;s native backup tools. CNPG integrates with Barman, which streams the Write-Ahead Logs (WAL) to object storage. 
This allows for Point-In-Time Recovery (PITR)\u0026hellip; you can literally restore your database to the state it was in at 14:03:22 yesterday.\nConclusion # By combining the CNPG Operator with a tuned Longhorn StorageClass, we have achieved a setup that rivals enterprise RDS offerings:\nHigh Availability: Automated failover in seconds. Performance: Near-native disk speeds using strict-local storage. Resilience: Automated backups and self-healing. As always, all the code and configuration files discussed in this post are available in my GitHub repository.\nWith networking, storage, and now a robust database layer in place, we have cleared all the infrastructure hurdles. In the next post, we will finally deploy our first major application: Keycloak, the Identity Provider that will secure our entire platform.\nStay tuned! Andrei\n","date":"24 February 2026","externalUrl":null,"permalink":"/the-database-dilemma-mastering-postgresql-on-kubernetes-with-cloudnativepg/","section":"Blog","summary":"The “Stateful” Reality Check # In our last post, we solved the persistence layer by deploying Longhorn on Talos Linux. We finally have a place to put data. But a raw block device isn’t a database.\n","title":"The Database Dilemma - Mastering PostgreSQL on Kubernetes with CloudNativePG","type":"blog"},{"content":"","date":"14 February 2026","externalUrl":null,"permalink":"/tags/storage/","section":"Tags","summary":"","title":"Storage","type":"tags"},{"content":"","date":"14 February 2026","externalUrl":null,"permalink":"/tags/talos-omni/","section":"Tags","summary":"","title":"Talos-Omni","type":"tags"},{"content":" The Paradox of Statelessness # Kubernetes is designed to be ephemeral. Pods die, nodes are replaced, and the cluster heals itself. This \u0026ldquo;stateless\u0026rdquo; philosophy is efficient for application logic, but it hits a hard wall when you need to store data. 
Databases, message queues, and media servers all need a place to live that persists beyond a pod restart.\nIn a cloud environment like GCP or AWS, you simply request a PersistentVolumeClaim (PVC), and the cloud provider magically provisions a Persistent Disk or EBS volume. In a bare-metal or homelab environment, that magic doesn\u0026rsquo;t exist. You have to build it yourself.\nIn previous posts, we built our bare-metal networking stack and secured it with automated TLS. Now, we tackle another critical pillar of a production-ready platform: Persistent Storage.\nThis guide explores why I chose Longhorn for my Talos-based clusters and walks through the specific configurations required to make it work on an immutable OS.\nThe Landscape: Why Longhorn? # When choosing a storage solution for a Kubernetes homelab, the research typically leads to four main paths:\n1. Local Persistent Volumes # Pros: Zero overhead, native capability. Great for learning. Cons: It binds a pod to a specific node. If that node reboots or fails, your pod cannot be rescheduled elsewhere because the data is trapped locally. This violates our \u0026ldquo;high availability\u0026rdquo; goal. 2. External NAS (NFS / iSCSI) # NFS: Extremely simple (I use Unraid as NAS), but it is file-level storage. Databases like PostgreSQL often suffer performance penalties or corruption risks on NFS. iSCSI: Unlike NFS, this provides proper block storage. However, using a single NAS (Synology/TrueNAS) creates a single point of failure. If the NAS updates or crashes, the entire cluster halts. 3. Ceph / Rook # Pros: The industry \u0026ldquo;gold standard.\u0026rdquo; Highly scalable, supports block, object, and file storage. Context: Ceph is central to my long-term architectural vision and will eventually underpin the Proxmox layer itself. The Constraint: Currently, I haven\u0026rsquo;t fully assembled the hardware for the target 3-node Ceph cluster in my Proxmox configuration. 
Additionally, running Ceph on Talos requires specific workarounds for the immutable filesystem (handling /etc/ceph mounts) that add complexity I\u0026rsquo;m postponing for now. 4. Longhorn # Longhorn, a CNCF incubating project by Rancher, acts as the perfect bridge solution.\nBlock Storage: Provides proper RWO (ReadWriteOnce) block devices, essential for running databases. Replication: Synchronously replicates data across multiple nodes (defaulting to 3 copies). If a node fails, the volume remains accessible from another node instantly. Backups: Features built-in, easy-to-configure backups to S3-compatible targets (like MinIO or AWS S3). UI: Includes a comprehensive built-in dashboard for visibility, which offers immediate insights that often require complex plugins in Ceph. For my current setup, Longhorn delivers the \u0026ldquo;Cloud Storage\u0026rdquo; experience I need immediately. It enables me to proceed with the application platform build while the underlying physical infrastructure is finalized.\nThe Challenge: Longhorn on Immutable OS # Installing Longhorn on a standard Ubuntu node is usually just a Helm chart installation. Talos Linux, however, is immutable. You cannot SSH into a node and run apt-get install open-iscsi. The root filesystem is read-only.\nTo make Longhorn work, we need to modify our node configuration to:\nInject System Extensions: Load the necessary kernel modules and tools (iSCSI) that Longhorn depends on. Configure Mounts: Bind-mount specific paths so the Longhorn containers can write to the underlying host disk. Relax Security: Allow Longhorn\u0026rsquo;s privileged containers to perform low-level operations. Implementation Guide # We will implement these modifications in our sidero-omni-talos-proxmox repository, acting on the machine configurations and applying the updates to the cluster.\n1. Enabling System Extensions # Longhorn requires iscsi-tools to manage block devices and util-linux-tools for filesystem operations. 
In Talos, we declare these in the systemExtensions block of our machine configuration.\nFile: cluster-template/k8s-dev-dhcp.yaml (or your specific worker config)\nsystemExtensions: - siderolabs/iscsi-tools - siderolabs/util-linux-tools Note: If you are using my Omni Proxmox setup, you might also have qemu-guest-agent or nfsd here. Just ensure the list is additive.\n2. Configuring Data Path Mounts # Longhorn stores its data in /var/lib/longhorn by default. On Talos, we need to explicitly allow the kubelet to pass this directory through to the containers. We achieve this by patching the machine configuration with extraMounts.\nCreate a patch file patches/longhorn.yaml:\nmachine: kubelet: extraMounts: - destination: /var/mnt/longhorn type: bind source: /var/mnt/longhorn options: - bind - rshared - rw Then, reference this patch in your Workers configuration block. This ensures that every worker node in your cluster is \u0026ldquo;Longhorn-ready\u0026rdquo; upon boot.\nkind: Workers name: workers machineClass: name: proxmox-worker # ... other configs ... patches: - name: worker-labels inline: machine: nodeLabels: node-role.kubernetes.io/worker: \u0026#34;\u0026#34; - name: longhorn file: patches/longhorn.yaml Once the configuration is updated, apply it to your cluster:\nomnictl cluster template sync -v -f cluster-template/k8s-dev-dhcp.yaml This will trigger a rolling update of your nodes.\n3. Deploying via GitOps # With the infrastructure prepared, we transition to the application layer. Adhering to our standard GitOps workflow, we deploy Longhorn via Argo CD by creating a base configuration for the Helm chart, followed by environment-specific customizations using Kustomize overlays.\nNamespace Configuration # Longhorn requires privileged access to the nodes. 
We need to label the namespace to allow this, bypassing the default Pod Security Standards.\napiVersion: v1 kind: Namespace metadata: name: longhorn-system labels: pod-security.kubernetes.io/enforce: privileged pod-security.kubernetes.io/audit: privileged pod-security.kubernetes.io/warn: privileged Application Definition # Add the helm chart repository (https://charts.longhorn.io/) to your allowed projects list in Argo CD. Then, create the Application.\nCrucial Tip: Disable the preUpgradeChecker. In a GitOps environment, this job often hangs or fails because it expects an interactive upgrade flow.\napiVersion: argoproj.io/v1alpha1 kind: Application metadata: name: longhorn namespace: argocd spec: project: argo-config source: repoURL: https://charts.longhorn.io chart: longhorn targetRevision: v1.11.0 helm: values: | preUpgradeChecker: jobEnabled: false # Vital for GitOps destination: server: https://kubernetes.default.svc namespace: longhorn-system syncPolicy: automated: prune: true selfHeal: true Verification # Once Argo CD syncs, you should see the Longhorn pods spinning up. You can access the Longhorn UI (I expose mine via an Ingress at longhorn.dev.thebestpractice.tech) to verify the health of your storage implementation.\nYou now have a fully replicated, distributed block storage system running on immutable infrastructure.\nConclusion # We\u0026rsquo;ve successfully bridged another gap between \u0026ldquo;homelab\u0026rdquo; and \u0026ldquo;enterprise.\u0026rdquo; By combining the immutability of Talos with the flexibility of Longhorn, we have a storage platform that is both secure by default and robust enough for critical data.\nIn the next post, we will leverage this storage foundation to deploy Keycloak, our Identity Provider. Keycloak requires a resident database (PostgreSQL), which effectively needs the robust, distributed block storage we just built. It is the perfect real-world test for our new persistence layer.\nStay tuned! 
Andrei\n","date":"14 February 2026","externalUrl":null,"permalink":"/the-state-of-persistence-deploying-longhorn-on-talos-linux/","section":"Blog","summary":"The Paradox of Statelessness # Kubernetes is designed to be ephemeral. Pods die, nodes are replaced, and the cluster heals itself. This “stateless” philosophy is efficient for application logic, but it hits a hard wall when you need to store data. Databases, message queues, and media servers all need a place to live that persists beyond a pod restart.\n","title":"The State of Persistence - Deploying Longhorn on Talos Linux","type":"blog"},{"content":"","date":"6 February 2026","externalUrl":null,"permalink":"/tags/cert-manager/","section":"Tags","summary":"","title":"Cert-Manager","type":"tags"},{"content":"","date":"6 February 2026","externalUrl":null,"permalink":"/series/the-path-to-automated-tls/","section":"Series","summary":"","title":"The Path to Automated TLS","type":"series"},{"content":" Locking it Down - From HTTP to HTTPS # In the preceding chapters, we established the networking foundation for a production-grade bare-metal Kubernetes platform.\nIn Chapter 1, we implemented MetalLB to provide stable LoadBalancer IPs, solving the primary hurdle of bare-metal service exposure. In Chapter 2, we deployed Traefik using the Gateway API to handle L7 routing and configured Technitium DNS for internal name resolution, successfully routing http://test.dev.thebestpractice.tech to our NGINX test service. The logical and final step is to secure this ingress path. Unencrypted HTTP is unacceptable for a production-grade setup, even within a homelab.\nThis chapter addresses that gap by implementing automated TLS certificate management. We will deploy Cert-Manager and configure it to perform DNS-01 challenges against a public DNS provider (GCP Cloud DNS) to obtain publicly trusted certificates from Let\u0026rsquo;s Encrypt. 
This \u0026ldquo;split-horizon DNS\u0026rdquo; approach allows us to secure internal services with valid certificates, completing our platform\u0026rsquo;s networking stack.\nLet\u0026rsquo;s lock it down.\nThe Split-Horizon DNS Strategy: Public Challenges, Private Resolution # To get a publicly trusted certificate from Let\u0026rsquo;s Encrypt, we must prove we own the domain we\u0026rsquo;re requesting it for. The most robust way to do this is with a DNS-01 challenge. This involves creating a specific TXT record in our domain\u0026rsquo;s public DNS zone.\nThis presents a classic homelab dilemma:\nOur cluster uses an internal DNS server (Technitium) that resolves dev.thebestpractice.tech to a private IP (10.20.0.90). Let\u0026rsquo;s Encrypt needs to verify a TXT record on a public DNS server. The solution is a split-horizon DNS (or \u0026ldquo;split-brain\u0026rdquo;) setup. My primary domain, thebestpractice.tech, is managed by Cloudflare. However, my current Cloudflare plan doesn\u0026rsquo;t allow for the creation of separate, delegable sub-zones. To work around this while still using the powerful DNS-01 challenge, I will introduce GCP Cloud DNS for a very specific purpose.\nWe will configure the same domain, dev.thebestpractice.tech, in two different places:\nPublicly, on GCP Cloud DNS: This zone will be used only by Cert-Manager to solve the DNS-01 challenges. Let\u0026rsquo;s Encrypt will query this public zone. Privately, on Technitium DNS: This zone will continue to serve our internal network, resolving our services to their private IPs. To make this work, we must delegate the dev.thebestpractice.tech subdomain from our primary registrar (Cloudflare) to GCP\u0026rsquo;s nameservers.\nThis setup gives us the best of both worlds: the security of public validation and the privacy of internal resolution.\nCert-Manager: Your Automated Certificate Authority # With our DNS strategy in place, we can now deploy Cert-Manager. 
This powerful Kubernetes tool automates the entire lifecycle of TLS certificates. It will:\nWatch for Certificate resources. Communicate with Let\u0026rsquo;s Encrypt to initiate challenges. Create the necessary TXT records in GCP Cloud DNS using a Service Account. Verify the challenge and retrieve the signed certificate. Store the certificate in a Kubernetes Secret. Automatically renew the certificate before it expires. A critical piece of the configuration is telling Cert-Manager to use public DNS servers for its validation checks, ensuring it bypasses our internal Technitium DNS and can see the public records it creates in GCP.\nGitOps Implementation: Deploying Cert-Manager with ArgoCD # As always, we turn to our GitOps repository to declaratively manage the deployment.\nDirectory Structure # Following our established pattern, the configuration for Cert-Manager is laid out in our GitOps repository.\n. ├── base │ ├── cert-manager │ │ ├── cert-manager.yaml │ │ └── values.yaml │ └── secrets │ └── cert-manager │ └── cert-manager-dns-sa.yaml └── environments └── dev └── cert-manager ├── certificate │ ├── ClusterIssuer_letsencrypt.yaml │ └── cert-dev-tbp.yaml ├── custom-values │ └── custom-values.yaml └── root-certificate.yaml 1. The Base Application and DNS Resolver Configuration # First, we define the base ArgoCD Application for Cert-Manager. The most important part is in the Helm values, where we configure Cert-Manager to use public DNS resolvers for its validation checks. This ensures it can see the public TXT records it creates in GCP.\nenvironments/dev/cert-manager/custom-values/custom-values.yaml:\ncrds: # This option decides if the CRDs should be installed # as part of the Helm installation. enabled: true # Additional command line flags to pass to cert-manager controller binary. 
# The internal network uses a local DNS server -\u0026gt; Technitium. # For Let\u0026#39;s Encrypt DNS-01 challenge validation we need to instruct cert-manager # to use public recursive nameservers only. extraArgs: - \u0026#39;--dns01-recursive-nameservers-only\u0026#39; - \u0026#39;--dns01-recursive-nameservers=8.8.8.8:53,1.1.1.1:53\u0026#39; 2. The GCP Service Account Secret # Cert-Manager needs credentials to modify DNS records in GCP. We create a GCP Service Account with the \u0026ldquo;DNS Administrator\u0026rdquo; role, generate a JSON key, and store it securely in 1Password.\nThen, we use ExternalSecret to sync this key into a Kubernetes Secret in the cert-manager namespace. This process relies on the External Secrets Operator and 1Password integration that I detailed in a previous post\u0026hellip; if you haven\u0026rsquo;t set up this foundation, I highly recommend reading that article first.\nbase/secrets/cert-manager/cert-manager-dns-sa.yaml:\napiVersion: external-secrets.io/v1 kind: ExternalSecret metadata: name: cert-manager-dns-sa namespace: cert-manager spec: secretStoreRef: kind: ClusterSecretStore name: op-cluster-secret-store target: creationPolicy: Owner data: - secretKey: service_account.json remoteRef: key: EXTSEC_1Password_GCP_TBP_DNS_Admin # 1Password item name property: password 3. The ClusterIssuer # This ClusterIssuer is the heart of our setup. It tells Cert-Manager how to issue certificates. 
We configure it to use the cloudDNS solver, pointing it to our GCP project and the Secret we just created.\nenvironments/dev/cert-manager/certificate/ClusterIssuer_letsencrypt.yaml:\napiVersion: cert-manager.io/v1 kind: ClusterIssuer metadata: annotations: argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true name: letsencrypt-cluster-issuer spec: acme: email: andrei@thebestpractice.com privateKeySecretRef: name: letsencrypt-issuer-account-key server: https://acme-v02.api.letsencrypt.org/directory solvers: - dns01: cloudDNS: hostedZoneName: dev-thebestpractice-tech project: diesel-polymer-445422-e3 # The GCP Project ID serviceAccountSecretRef: key: service_account.json name: cert-manager-dns-sa selector: dnsZones: - dev.thebestpractice.tech - \u0026#39;*.dev.thebestpractice.tech\u0026#39; 4. Requesting the Wildcard Certificate # Now we request the wildcard certificate that will secure all services in our dev environment. Cert-Manager will see this resource and begin the DNS-01 challenge process.\nenvironments/dev/cert-manager/certificate/cert-dev-tbp.yaml:\napiVersion: cert-manager.io/v1 kind: Certificate metadata: annotations: argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true name: wildcard.dev.thebestpractice.tech namespace: traefik spec: commonName: \u0026#39;*.dev.thebestpractice.tech\u0026#39; dnsNames: - \u0026#39;*.dev.thebestpractice.tech\u0026#39; issuerRef: kind: ClusterIssuer name: letsencrypt-cluster-issuer renewBefore: 360h0m0s secretName: wildcard-dev-thebestpractice-tech-cert-tls Once the challenge is complete, Cert-Manager will create a secret named wildcard-dev-thebestpractice-tech-cert-tls in the traefik namespace, containing the signed certificate and private key.\n5. Configuring Traefik for TLS # The final step is to tell our Traefik Gateway to use this new certificate. 
We update the Traefik Helm values to enable the websecure listener on port 8443 and reference the secret created by Cert-Manager.\nenvironments/dev/ingress/traefik/custom-values/override.values.yaml:\n# ... other values gateway: enabled: true listeners: websecure: port: 8443 protocol: HTTPS namespacePolicy: from: All mode: Terminate certificateRefs: - name: wildcard-dev-thebestpractice-tech-cert-tls # ... other values 6. Verification: The Padlock Appears # With all the pieces in place, the entire flow is automated.\nArgoCD syncs all our new manifests. Cert-Manager sees the Certificate resource and starts the DNS-01 challenge with Let\u0026rsquo;s Encrypt. It uses the GCP SA credentials to create a TXT record in GCP Cloud DNS. Let\u0026rsquo;s Encrypt verifies the record and issues the certificate. Cert-Manager saves it to the wildcard-dev-thebestpractice-tech-cert-tls secret. Traefik automatically loads the secret and begins serving traffic over HTTPS. Now, when we navigate to https://test.dev.thebestpractice.tech, we are greeted with the NGINX welcome page, but this time, it\u0026rsquo;s served securely with a valid TLS certificate.\nConclusion: A Production-Grade Networking Stack # This three-part series has systematically constructed a complete, production-grade networking stack on a bare-metal Kubernetes cluster. The final platform integrates several key technologies to achieve a level of automation and security on par with enterprise cloud environments:\nNetwork Load Balancing: Provided by MetalLB, enabling stable LoadBalancer IP addresses. Intelligent L7 Routing: Managed by Traefik using the modern Gateway API. Split-Horizon DNS: Implemented with Technitium for internal resolution and GCP Cloud DNS for public challenges. Fully Automated TLS: Orchestrated by Cert-Manager to issue and renew publicly trusted certificates from Let\u0026rsquo;s Encrypt. 
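One operational detail from the Certificate manifest earlier deserves a closer look: renewBefore: 360h0m0s tells Cert-Manager to start renewing a full 15 days before expiry, which leaves plenty of headroom for transient DNS-01 failures. The arithmetic, as a trivially verifiable sketch:

```shell
# renewBefore is expressed in hours; convert to days to see how much
# head start Cert-Manager gives itself before the certificate expires.
renew_before_hours=360
renew_before_days=$(( renew_before_hours / 24 ))
echo "renewal starts ${renew_before_days} days before expiry"
```

With Let\u0026rsquo;s Encrypt\u0026rsquo;s 90-day certificates, this means a renewal attempt roughly every 75 days.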
With every component managed declaratively through GitOps, the resulting infrastructure is reproducible, version-controlled, and resilient. This architecture transforms a standard homelab into a powerful personal platform, built with the same principles that drive modern production systems.\nStay tuned! Andrei\n","date":"6 February 2026","externalUrl":null,"permalink":"/the-path-to-automated-tls-part-3-automated-certificates-with-cert-manager/","section":"Blog","summary":"Locking it Down - From HTTP to HTTPS # In the preceding chapters, we established the networking foundation for a production-grade bare-metal Kubernetes platform.\n","title":"The Path to Automated TLS - Part 3  Automated Certificates with Cert-Manager","type":"blog"},{"content":"","date":"6 February 2026","externalUrl":null,"permalink":"/tags/tls/","section":"Tags","summary":"","title":"TLS","type":"tags"},{"content":"","date":"4 February 2026","externalUrl":null,"permalink":"/tags/gateway-api/","section":"Tags","summary":"","title":"Gateway-Api","type":"tags"},{"content":"","date":"4 February 2026","externalUrl":null,"permalink":"/tags/metallb/","section":"Tags","summary":"","title":"Metallb","type":"tags"},{"content":"","date":"4 February 2026","externalUrl":null,"permalink":"/tags/technitium/","section":"Tags","summary":"","title":"Technitium","type":"tags"},{"content":" From IP Address to Intelligent Gateway # In Chapter 1, we laid the foundational pillar by solving the bare-metal IP address problem with MetalLB. Our test NGINX service successfully acquired the IP 10.20.0.90, proving our cluster can now serve traffic like its cloud-native counterparts.\nBut an IP address alone is just a raw entry point. In any production-grade environment, you need an intelligent gateway to inspect traffic, route requests based on hostnames, and manage security. This is where an Ingress Controller, or in our case, a modern Gateway, comes into play.\nWelcome to Chapter 2. 
With a stable IP address from MetalLB, we will now deploy Traefik Proxy to act as our application gateway (or L7 router). To make our services accessible by name, we\u0026rsquo;ll also establish an internal DNS backbone with Technitium DNS Server. This combination will transform our raw IP into a fully-featured, name-based routing system.\nTraefik and the Rise of the Gateway API # For years, Ingress was the standard way to expose HTTP/S services in Kubernetes. However, it was plagued by limitations: a fragmented feature set dependent on annotations, and a lack of standardization across controllers.\nThe Gateway API is the official evolution of Ingress. It provides a standardized, role-oriented, and expressive API for managing traffic. Instead of one monolithic Ingress object, it splits the responsibility:\nGatewayClass: Defines a template for Gateways (e.g., traefik). Gateway: A request for a load balancer, managed by the infrastructure team. This binds to a specific IP (hello, MetalLB!) and defines listeners (e.g., port 443). HTTPRoute: Application-level routing rules, managed by developers. It defines how requests for a specific hostname are forwarded to backend services. This separation of concerns is a core tenet of enterprise GitOps, and it\u0026rsquo;s why we\u0026rsquo;re choosing the Gateway API for our homelab.\nDNS Deep Dive: Technitium as the Internal Backbone # Before we can route traffic by hostname, we need a DNS server that can resolve those names to an IP address. For a production-grade homelab, this means creating our own private DNS zones.\nTechnitium DNS Server is a powerful, self-hosted DNS server packed with features typically found in enterprise products. 
For our homelab, it serves a critical purpose: acting as the authoritative DNS server for our internal domains.\nA Note on Homelab Evolution: From Pi-hole to Technitium # Readers who have followed my blog from the beginning, particularly in my post \u0026ldquo;From Enterprise to Homelab: Transforming My Home Network\u0026rdquo;, might recall that I originally planned to use Pi-hole for DNS filtering. While Pi-hole is an excellent tool, my homelab is a constantly evolving platform. As I standardized my tooling, I decided to migrate to Technitium DNS Server. It offers advanced features like split-horizon DNS and conditional forwarding, which align better with my long-term goals. This is a natural part of the homelab process: you start with one tool, learn, and adapt as your requirements become more sophisticated.\nThe dev.thebestpractice.tech Zone # In my public DNS, I own the domain thebestpractice.tech. To create a safe, isolated space for my homelab experiments, I will create a new subdomain exclusively for this environment: dev.thebestpractice.tech.\nInside Technitium DNS Server, I will create a new Primary Zone for dev.thebestpractice.tech. For now, this zone is completely private and authoritative only within my local network.\nThe next step is to create an A record that points all traffic for this domain to the IP address of our Traefik gateway. After we deploy Traefik, its LoadBalancer service will receive the IP 10.20.0.90 from MetalLB.\nNow, any request for *.dev.thebestpractice.tech will resolve to 10.20.0.90, directing all traffic for our development environment to our Traefik gateway.\nA Note on Wildcard DNS in Production # Using a wildcard record is a fantastic shortcut for a homelab or development environment. It means we don\u0026rsquo;t have to create a new DNS entry every time we deploy a new service.\nHowever, in a true production environment, this is not a recommended practice. 
Production systems favor explicit, specific DNS records for each service (argo.dev.thebestpractice.tech, grafana.dev.thebestpractice.tech, etc.) for tighter security and control.\nWe will address this \u0026ldquo;production gap\u0026rdquo; in a future blog post, where we\u0026rsquo;ll integrate a tool like ExternalDNS. It will watch our Kubernetes cluster and automatically create precise DNS records for each HTTPRoute we create. For now, the wildcard is the perfect accelerator for our private setup.\nGitOps Implementation: Deploying Traefik with ArgoCD # Following the same enterprise GitOps pattern from Chapter 1, Traefik is deployed via a multi-layered ArgoCD configuration. This ensures our gateway is declarative, version-controlled, and consistent with the rest of our platform.\nDirectory Structure # The directory structure for Traefik mirrors the one we used for MetalLB, maintaining a clean separation between base configuration and environment-specific overrides.\n. ├── base │ ├── ingress │ │ ├── metallb │ │ │ └── ... │ │ └── traefik │ │ ├── traefik.yaml │ │ └── values.yaml ├── environments │ ├── dev │ │ ├── ingress │ │ │ ├── traefik │ │ │ │ ├── custom-values │ │ │ │ │ └── override.values.yaml │ │ │ │ └── root-traefik.yaml │ │ │ └── routes │ │ │ └── http-routes │ │ │ └── argocd.yaml └── ... 1. The Base Application Manifest # The base manifest defines the core ArgoCD Application for Traefik. 
Just like with MetalLB, it uses a multi-source pattern: one source points to the official Traefik Helm chart, and the other points to our own Git repository to fetch the values.yaml files.\nbase/ingress/traefik/traefik.yaml:\napiVersion: argoproj.io/v1alpha1 kind: Application metadata: name: traefik namespace: argocd finalizers: - resources-finalizer.argocd.argoproj.io spec: destination: namespace: traefik server: https://kubernetes.default.svc project: argo-config sources: - repoURL: https://github.com/anvaplus/homelab-k8s-argo-config.git targetRevision: main ref: valuesRepo - repoURL: https://traefik.github.io/charts chart: traefik targetRevision: 39.0.0 helm: releaseName: traefik valueFiles: - $valuesRepo/base/ingress/traefik/values.yaml syncPolicy: automated: prune: true selfHeal: true 2. Base and Environment-Specific Helm Values # Our values are split. The base/ingress/traefik/values.yaml is kept minimal, containing only chart defaults or universal settings.\nThe real configuration happens in the environment overlay file, environments/dev/ingress/traefik/custom-values/override.values.yaml. This is where we enable the Gateway API and tell Traefik to request a LoadBalancer service.\nenvironments/dev/ingress/traefik/custom-values/override.values.yaml:\nproviders: kubernetesCRD: enabled: false # Disable legacy CRD provider kubernetesIngress: enabled: false # Disable legacy Ingress provider kubernetesGateway: enabled: true # Enable the new Gateway API provider # This tells Traefik to create a service of type LoadBalancer. service: type: LoadBalancer # We explicitly request the IP from our pool spec: loadBalancerIP: 10.20.0.90 # -- Traefik Gateway configuration # Change listeners namespace policy to all namespaces gateway: enabled: true listeners: web: port: 8000 protocol: HTTP namespacePolicy: from: All Notice we explicitly set loadBalancerIP: 10.20.0.90. 
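A cross-check worth making at this point (an observation about the existing configuration, not anything new to apply): the explicitly requested address must fall inside the MetalLB IPAddressPool we defined in Chapter 1, otherwise MetalLB cannot honour the request and the Traefik Service will sit in Pending with no external IP.

```yaml
# From the Chapter 1 lb-config.yaml: 10.20.0.90 is the first address of
# default-pool, so the explicit loadBalancerIP request above can be satisfied.
spec:
  addresses:
    - 10.20.0.90-10.20.0.95
```

As a side note, upstream Kubernetes has deprecated spec.loadBalancerIP; MetalLB also supports requesting a specific address via the metallb.universe.tf/loadBalancerIPs annotation, which is the more future-proof option.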
This ensures Traefik gets the specific, predictable IP address we\u0026rsquo;ve allocated for it, which is essential for our DNS configuration to work.\n3. Deploying a Test Application # With Traefik deployed, the next step is to expose an application. To maintain consistency, we\u0026rsquo;ll use the same NGINX application from Chapter 1, but with one critical difference.\nSince Traefik now manages external access via its own LoadBalancer service, our application services no longer need to be of type LoadBalancer. They can be standard, internal ClusterIP services. Traefik will route traffic to them internally.\nFor quick verification, I\u0026rsquo;ll apply this manifest directly with kubectl.\napiVersion: apps/v1 kind: Deployment metadata: name: nginx-test-deployment namespace: default spec: replicas: 1 selector: matchLabels: app: nginx template: metadata: labels: app: nginx spec: containers: - image: nginx name: nginx --- apiVersion: v1 kind: Service metadata: name: nginx-test-service namespace: default spec: selector: app: nginx ports: - protocol: TCP port: 80 targetPort: 80 type: ClusterIP # Note: No longer a LoadBalancer! 4. Exposing the Service with an HTTPRoute # Now we create an HTTPRoute resource to tell our gateway how to route traffic to our new nginx-test-service. This manifest instructs the traefik gateway to listen for requests for test.dev.thebestpractice.tech and forward them to our NGINX service.\nkind: HTTPRoute apiVersion: gateway.networking.k8s.io/v1beta1 metadata: name: nginx-test-route namespace: default spec: parentRefs: - kind: Gateway name: traefik namespace: traefik hostnames: [\u0026#34;test.dev.thebestpractice.tech\u0026#34;] rules: - matches: - path: type: PathPrefix value: / backendRefs: - name: nginx-test-service kind: Service port: 80 5. Verification: End-to-End Traffic Flow # After the HTTPRoute is deployed, the entire flow is complete. Let\u0026rsquo;s test it:\nA client makes a request to http://test.dev.thebestpractice.tech. 
The request hits our Technitium DNS server. Technitium resolves the name to 10.20.0.90 because of our wildcard A record. The request is sent to the Traefik LoadBalancer service at that IP. Traefik inspects the request\u0026rsquo;s Host header (test.dev.thebestpractice.tech). It matches the HTTPRoute rule and forwards the traffic to the nginx-test-service. To confirm our success, we can open a web browser on a machine that uses Technitium for DNS resolution and navigate to http://test.dev.thebestpractice.tech. The result is the default NGINX welcome page, served through our new gateway.\nSuccess! We have established a complete, name-based routing system, from DNS to gateway to service.\nConclusion: The Gateway is Open # We now have an intelligent entry point into our cluster. MetalLB provides the stable IP, and Traefik\u0026rsquo;s Gateway routes traffic based on hostnames. Inside our network, the internal Technitium DNS server resolves hostnames in the dev.thebestpractice.tech zone to Traefik\u0026rsquo;s private IP, completing the internal traffic loop. This setup mirrors the L4/L7 load balancing and service discovery patterns of a real enterprise cloud.\nBut there\u0026rsquo;s one critical piece missing: automated TLS. Our gateway is ready, but it\u0026rsquo;s not yet terminating encrypted traffic.\nIn Chapter 3, we will tackle this by implementing a sophisticated split-horizon DNS strategy. We will use the same dev.thebestpractice.tech zone in two places:\nPublicly, to perform DNS-01 challenges with Let\u0026rsquo;s Encrypt. Internally, with Technitium, for local resolution. We will instruct Cert-Manager to use the public DNS provider for its challenges, even though the cluster itself uses Technitium for DNS. This allows us to get publicly trusted certificates for our private services, achieving a true production-grade setup.\nStay tuned! 
Andrei\n","date":"4 February 2026","externalUrl":null,"permalink":"/the-path-to-automated-tls-part-2-the-gateway-to-the-cluster-traefik-and-technitium/","section":"Blog","summary":"From IP Address to Intelligent Gateway # In Chapter 1, we laid the foundational pillar by solving the bare-metal IP address problem with MetalLB. Our test NGINX service successfully acquired the IP 10.20.0.90, proving our cluster can now serve traffic like its cloud-native counterparts.\n","title":"The Path to Automated TLS - Part 2 The Gateway to the Cluster - Traefik and Technitium","type":"blog"},{"content":"","date":"4 February 2026","externalUrl":null,"permalink":"/tags/traefik/","section":"Tags","summary":"","title":"Traefik","type":"tags"},{"content":" The Path to Automated TLS: A Three-Part Guide # The path to achieving fully automated, production-grade TLS on a bare-metal Kubernetes homelab is a rewarding but detailed journey. To do it justice, I\u0026rsquo;ve structured this guide as a three-part series\u0026hellip; a continuous story where each post builds on the last. Frankly, cramming everything into a single, monolithic article would be an overwhelming read.\nInstead, we\u0026rsquo;ll walk through it chapter by chapter:\nChapter 1 (This Post): Bridging the Gap with MetalLB. We\u0026rsquo;ll solve the first major hurdle of bare-metal Kubernetes: getting a real, reliable IP address for our services. Chapter 2: The Gateway to the Cluster. With an IP in hand, we\u0026rsquo;ll deploy Traefik, Gateway API and set up an internal DNS backbone. Chapter 3: Locking it Down. We\u0026rsquo;ll use a public DNS zone to satisfy Let\u0026rsquo;s Encrypt\u0026rsquo;s validation, while our internal Technitium DNS server handles all traffic, allowing us to secure internal services with a publicly trusted certificate. In my years architecting platform solutions in fintech, the cloud was our playground. Need an external endpoint for a service? 
A few lines of Terraform, and voilà\u0026hellip; an AWS or GCP Load Balancer would appear, complete with a public IP address, ready to handle traffic. It was simple, reliable, and completely abstracted away the underlying network complexities.\nBut when building a production-grade homelab, we don\u0026rsquo;t have that luxury. We\u0026rsquo;re on bare metal. Deploying a Kubernetes service of type LoadBalancer results in a pending state indefinitely. Why? Because there\u0026rsquo;s no cloud provider to fulfill that request.\nThis is the first major hurdle in bridging the gap between enterprise cloud and a homelab environment. We need to provide our own network load balancer. This is \u0026ldquo;Chapter 1\u0026rdquo; of the series, where we lay the foundational network layer on our path to automated TLS.\nWhy MetalLB is the Enterprise Choice for Homelabs # In the cloud, a Load Balancer is a managed service that automatically assigns an external IP and routes traffic to your Kubernetes services. On-premise, we need a tool that can do the same. While there are several options, MetalLB stands out for its simplicity and robustness, making it the de-facto standard for bare-metal clusters.\nA Homelab Analogy for Cloud Load Balancing # To understand MetalLB\u0026rsquo;s role, it\u0026rsquo;s best to draw a direct parallel to how a managed load balancer works in a cloud like GCP or AWS. They solve the same problem, just with different underlying tools.\nFeature / Analogy Google Cloud Load Balancer MetalLB in a Homelab Core Function Exposes a Service with a stable, external IP address. Exposes a Service with a stable, internal IP address. Activation Trigger Creating a Service of type: LoadBalancer in Kubernetes. Creating a Service of type: LoadBalancer in Kubernetes. IP Address Source Provisions an IP from Google Cloud\u0026rsquo;s massive address pools. Assigns an IP from a user-defined private network range. 
Network Mechanism Integrates with Google\u0026rsquo;s proprietary Virtual Private Cloud (VPC) and SDN. Uses standard, open protocols like Layer 2 (ARP) or BGP. High Availability Managed service with built-in redundancy across zones. Achieved via the \u0026ldquo;speaker\u0026rdquo; protocol; multiple nodes can announce the IP, and if one fails, another takes over. From the perspective of Kubernetes, the result is identical: a Service requests an external IP, and one is provided. This makes MetalLB the perfect, production-minded stand-in for a cloud load balancer in a bare-metal environment.\nGitOps Implementation: Deploying MetalLB with ArgoCD # As with all components in my platform, MetalLB is deployed via ArgoCD. This approach ensures that my infrastructure is declarative, version-controlled, and reproducible. I\u0026rsquo;ve detailed the philosophy behind my multi-repo setup in a previous post, \u0026ldquo;The Four-Repo GitOps Structure for My Homelab Platform,\u0026rdquo; which I recommend reading to understand the full context of my GitOps architecture.\nDirectory Structure # My GitOps repository has a nested structure to keep concerns separated. The MetalLB configuration resides under ingress.\n. ├── base │ ├── ingress │ │ ├── metallb │ │ │ ├── metallb.yaml │ │ │ └── values.yaml │ │ └── traefik │ │ └── ... ├── environments │ ├── dev │ │ ├── ingress │ │ │ ├── metallb │ │ │ │ └── lb-config.yaml └── ... 1. The Base Application Manifest # The core of the GitOps deployment is the ArgoCD Application manifest. This one is a bit more advanced as it uses a multi-source pattern. 
One source points to the official Helm chart, and the other points to our own Git repository to fetch the values.yaml.\nbase/ingress/metallb/metallb.yaml:\napiVersion: argoproj.io/v1alpha1 kind: Application metadata: name: metallb namespace: argocd spec: destination: namespace: metallb-system server: https://kubernetes.default.svc project: argo-config sources: - repoURL: https://github.com/anvaplus/homelab-k8s-argo-config.git targetRevision: main ref: valuesRepo - repoURL: https://metallb.github.io/metallb chart: metallb targetRevision: 0.15.3 helm: releaseName: metallb valueFiles: - $valuesRepo/base/ingress/metallb/values.yaml syncPolicy: automated: prune: true selfHeal: true syncOptions: - CreateNamespace=true 2. The Base Helm Values # In keeping with enterprise GitOps practices, the values.yaml file is intentionally kept minimal. We rely on the Helm chart\u0026rsquo;s defaults for the base configuration. This ensures that our setup is predictable and easy to upgrade. All customizations are handled in environment-specific overlays.\nbase/ingress/metallb/values.yaml:\n# These are the chart defaults. # All custom configurations are managed in environment-specific overlays. 3. Environment-Specific IP Address Pools # The real configuration happens in the environments layer. Here, we use a Kustomize overlay to apply our IPAddressPool and L2Advertisement resources. 
This is where we define the actual IP addresses that MetalLB will manage.\nThis configuration is stored in environments/dev/ingress/metallb/custom-values/lb-config.yaml:\napiVersion: metallb.io/v1beta1 kind: IPAddressPool metadata: name: default-pool namespace: metallb-system spec: addresses: - 10.20.0.90-10.20.0.95 # A dedicated range for production services --- apiVersion: metallb.io/v1beta1 kind: L2Advertisement metadata: name: default-pool-advertisement namespace: metallb-system spec: ipAddressPools: - default-pool After deploying this with ArgoCD, any service of type LoadBalancer will automatically receive an IP from the 10.20.0.90 - 10.20.0.95 range.\n4. Verification: Putting MetalLB to the Test # With our configuration live, let\u0026rsquo;s verify it works as expected. The ultimate test is to create a Service of type LoadBalancer and see if MetalLB assigns it an IP from our pool.\nWe can deploy a simple NGINX server for this purpose. Here is the manifest:\napiVersion: apps/v1 kind: Deployment metadata: name: nginx-test-deployment namespace: default spec: replicas: 1 selector: matchLabels: app: nginx template: metadata: labels: app: nginx spec: containers: - image: nginx name: nginx --- apiVersion: v1 kind: Service metadata: name: nginx-test-service namespace: default spec: selector: app: nginx ports: - protocol: TCP port: 80 targetPort: 80 type: LoadBalancer Once you apply this manifest (either manually with kubectl apply -f or via GitOps), Kubernetes will request a load balancer. MetalLB will see this request and assign the first available IP from our default-pool.\nA quick check with kubectl confirms our success. Notice the EXTERNAL-IP field is now populated with 10.20.0.90:\n❯ kubectl get services -n default NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE kubernetes ClusterIP 10.96.0.1 \u0026lt;none\u0026gt; 443/TCP 19d nginx-test-service LoadBalancer 10.111.75.192 10.20.0.90 80:32251/TCP 2m22s Success! 
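If you prefer to script this check rather than eyeball the kubectl output, a tiny illustrative snippet (pure shell, with the values from this setup hardcoded) confirms the assigned address sits inside default-pool:

```shell
# Illustrative only: check that the EXTERNAL-IP assigned by MetalLB
# (10.20.0.90) falls within the default-pool range 10.20.0.90-10.20.0.95.
assigned="10.20.0.90"
prefix=${assigned%.*}    # network part: "10.20.0"
octet=${assigned##*.}    # last octet: "90"
if [ "$prefix" = "10.20.0" ] && [ "$octet" -ge 90 ] && [ "$octet" -le 95 ]; then
  echo "IP is inside default-pool"
fi
```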
Our bare-metal cluster now behaves like a cloud environment, assigning a real, routable IP address to our services.\nConclusion: The First Pillar is in Place # We\u0026rsquo;ve successfully replaced a core piece of cloud infrastructure with a robust, self-hosted solution. MetalLB provides the fundamental building block: a stable IP address for our services. We\u0026rsquo;ve laid the concrete, and our homelab is one step closer to mirroring the capabilities of an enterprise cloud environment.\nIn Chapter 2, we\u0026rsquo;ll build on this foundation. With a stable IP address, we now need an intelligent entry point to manage and route traffic to our applications. We\u0026rsquo;ll install Traefik Proxy using the Gateway API and set up Technitium DNS to create a powerful internal DNS backbone for our cluster.\nStay tuned! Andrei\n","date":"2 February 2026","externalUrl":null,"permalink":"/the-path-to-automated-tls-part-1-bridging-the-gap-networking-with-metallb/","section":"Blog","summary":"The Path to Automated TLS: A Three-Part Guide # The path to achieving fully automated, production-grade TLS on a bare-metal Kubernetes homelab is a rewarding but detailed journey. To do it justice, I’ve structured this guide as a three-part series… a continuous story where each post builds on the last. Frankly, cramming everything into a single, monolithic article would be an overwhelming read.\n","title":"The Path to Automated TLS - Part 1 Bridging the Gap - Networking with MetalLB","type":"blog"},{"content":"","date":"23 January 2026","externalUrl":null,"permalink":"/tags/1password/","section":"Tags","summary":"","title":"1password","type":"tags"},{"content":"","date":"23 January 2026","externalUrl":null,"permalink":"/tags/external-secrets/","section":"Tags","summary":"","title":"External-Secrets","type":"tags"},{"content":"After building a Kubernetes cluster and setting up Argo CD to manage its configuration, what\u0026rsquo;s the very next thing you should install? 
For me, both in production and in my homelab, the answer is always the same: External Secrets Operator. This post explains why and shows you how I integrate it with 1Password to bring enterprise-grade secret management to my home setup.\nIn my previous posts, I\u0026rsquo;ve walked through building a homelab network, choosing the hardware, and even automating Kubernetes deployments with Talos and GitOps. But none of that is complete without a robust way to handle secrets.\nWhy External Secrets is Crucial # Everything needs secrets. From database passwords and API keys to TLS certificates for mTLS, your applications can\u0026rsquo;t function without them. The worst thing you can do is hardcode them in your Git repository. A better, but still flawed, approach is to use sealed secrets. The best practice, however, is to sync them from a dedicated secret manager.\nThis is where External Secrets Operator comes in. It allows your Kubernetes cluster to fetch secrets from an external source, like AWS Secrets Manager, Azure Key Vault, or, in my case, 1Password, and automatically create native Kubernetes Secret objects.\nFor my homelab, I chose 1Password for a simple reason: I already use it and pay for it. It\u0026rsquo;s my trusted password manager, and its integration with External Secrets means I can use it as a stand-in for the cloud-native secret stores I use in production environments. 
This approach bridges the gap between enterprise best practices and a practical homelab implementation.\nThe Integration: External Secrets and 1Password # To make this work, we need two components in the cluster:\n1Password Connect: A service that provides a bridge between the Kubernetes cluster and the 1Password API.\nExternal Secrets Operator: The operator that watches for ExternalSecret resources and uses providers like 1Password Connect to create Kubernetes secrets.\nHere’s a step-by-step guide to how I set it up.\nPrerequisites # This guide assumes you have:\nA running Kubernetes cluster. If not, you can follow my guide on provisioning a Talos cluster.\nArgo CD installed and managing itself via GitOps, as detailed in my post on locking down Cilium with Argo CD.\nThe 1Password CLI installed on your local machine.\nStep 1: Prepare 1Password # First, we need to set up 1Password to allow our cluster to connect.\nCreate a new vault for your homelab secrets. This isolates them from your personal credentials.\nop vault create \u0026#34;homelab-k8s\u0026#34; Create a 1Password Connect Server configuration. This command links the Connect server to your new vault and generates a 1password-credentials.json file that the Connect server will use to authenticate.\nop connect server create \u0026#34;kubernetes\u0026#34; --vaults \u0026#34;homelab-k8s\u0026#34; This will prompt you to save the 1password-credentials.json file. Keep it safe.\nCreate a Kubernetes secret from the credentials file. The 1Password Connect operator needs this file to start. I create it in the external-secrets namespace, which I use for all related components.\nkubectl create secret generic op-credentials -n external-secrets --from-literal=1password-credentials.json=\u0026#34;$(cat /path/to/1password-credentials.json | base64)\u0026#34; Generate an access token for the External Secrets Operator. 
This token allows the External Secrets Operator to authenticate with the 1Password Connect server.\nexport OP_ACCESS_TOKEN=$(op connect token create \u0026#34;external-secret-operator\u0026#34; --server \u0026#34;kubernetes\u0026#34; --vault \u0026#34;homelab-k8s\u0026#34;) Create a Kubernetes secret for the access token.\nkubectl create secret -n external-secrets generic op-access-token --from-literal=token=$OP_ACCESS_TOKEN At this point, you have two secrets in your cluster: op-credentials and op-access-token.\nStep 2: Deploy the Operators with Argo CD # I use Argo CD to manage the deployment of both the 1Password Connect server and the External Secrets Operator using their official Helm charts. This is all managed declaratively through my homelab-k8s-argo-config repository, following the \u0026ldquo;app-of-apps\u0026rdquo; pattern I\u0026rsquo;ve described previously.\nAdding a new tool to the platform follows a clear, repeatable process:\nBase Configuration: I add the base Application manifest for the new tool (in this case, external-secrets and onepassword-connect) to the base/ directory of the repository. This points to the official Helm chart and sets up the default configuration.\nEnvironment Overlays: I then create environment-specific overrides in the environments/dev/ directory. This allows me to customize the installation for my development cluster.\nApp-of-Apps: Finally, the root application in environments/dev/_root/ discovers and deploys the new manifests, bringing the tools online.\nHere is how the external-secrets application looks in Argo CD after being deployed through this GitOps workflow.\nAnd here is the 1Password Connect application, which provides the bridge to the 1Password API.\nThis declarative approach ensures that my secret management infrastructure is version-controlled, auditable, and automatically reconciled, just like any other component of my platform. 
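As an illustration of what such a base Application manifest can look like, here is a sketch for the External Secrets Operator chart. The chart version is a placeholder, and the chart repository URL would also need to be whitelisted in the sourceRepos list of the AppProject:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: external-secrets
  namespace: argocd
spec:
  project: argo-config                  # AppProject that whitelists this chart repository
  source:
    repoURL: https://charts.external-secrets.io   # official ESO Helm chart repository
    chart: external-secrets
    targetRevision: 0.10.0              # placeholder; pin the version you actually run
  destination:
    server: https://kubernetes.default.svc
    namespace: external-secrets
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```

An environment overlay can then patch fields like targetRevision or Helm values without touching this base definition.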
You can find the 1Password chart here.\nStep 3: Create a ClusterSecretStore # With the operators running, the final piece is to tell the External Secrets Operator how to connect to 1Password. We do this by creating a ClusterSecretStore.\napiVersion: external-secrets.io/v1 kind: ClusterSecretStore metadata: name: onepassword-cluster-secret-store spec: provider: onepassword: connectHost: http://onepassword-connect:8080 vaults: homelab-k8s: 1 # The vault to search in, with priority. auth: secretRef: connectTokenSecretRef: name: op-access-token # The secret with the access token key: token namespace: external-secrets This resource, also managed by Argo CD, configures the connection to the onepassword-connect service and specifies which vault to use. Note that ClusterSecretStore is cluster-scoped, so it takes no namespace itself; only the referenced token secret needs one.\nIn the 1Password UI, the error will disappear and you will see the Connect server version.\nStep 4: Test the Integration # The best way to test the setup is to use it to manage the very secrets we just created. I store the contents of 1password-credentials.json and the OP_ACCESS_TOKEN in 1Password and then use an ExternalSecret to sync them back to the cluster.\nHere’s how you can define the ExternalSecret resources:\n--- apiVersion: external-secrets.io/v1 kind: ExternalSecret metadata: name: op-credentials namespace: external-secrets spec: secretStoreRef: kind: ClusterSecretStore name: onepassword-cluster-secret-store target: creationPolicy: Owner template: engineVersion: v2 data: 1password-credentials.json: \u0026#34;{{ .opCredentials | b64enc }}\u0026#34; data: - secretKey: opCredentials remoteRef: key: EXTSEC_1Password_k8s_connect_cluster property: 1password-credentials.json --- apiVersion: external-secrets.io/v1 kind: ExternalSecret metadata: # name of the ExternalSecret \u0026amp; Secret which gets created name: op-access-token namespace: external-secrets spec: secretStoreRef: kind: ClusterSecretStore name: onepassword-cluster-secret-store target: creationPolicy: Owner data: - secretKey: 
token remoteRef: # 1password-entry-name key: EXTSEC_1Password_k8s_server_access_token # 1password-field property: password One important detail in this setup is that 1Password Connect expects its mounted credentials file to be base64-encoded, even though the value stored in 1Password is plain JSON. To handle that cleanly, I use an External Secrets template to apply b64enc before writing the final Kubernetes secret. This keeps the source value readable in 1Password while ensuring the generated secret matches the format expected by the application.\nOnce I commit this manifest to my GitOps repository, Argo CD applies it, and the External Secrets Operator springs into action. It fetches the token from my homelab-k8s vault in 1Password and creates the op-access-token Kubernetes secret. Now my secrets are managed through GitOps, just like the rest of my cluster configuration.\nConclusion # By integrating External Secrets with 1Password, I\u0026rsquo;ve created a robust, secure, and automated way to manage secrets in my homelab. This setup mirrors the patterns used in enterprise environments, providing a valuable learning experience while keeping my homelab secure.\nWith this foundation in place, I can now move on to deploying applications that rely on these secrets, which I\u0026rsquo;ll cover in future posts. Stay tuned! Andrei\n","date":"23 January 2026","externalUrl":null,"permalink":"/from-vault-to-pod-automating-kubernetes-secrets-with-1password-and-external-secrets/","section":"Blog","summary":"After building a Kubernetes cluster and setting up Argo CD to manage its configuration, what’s the very next thing you should install? For me, both in production and in my homelab, the answer is always the same: External Secrets Operator. 
This post explains why and shows you how I integrate it with 1Password to bring enterprise-grade secret management to my home setup.\n","title":"From Vault to Pod: Automating Kubernetes Secrets with 1Password and External Secrets","type":"blog"},{"content":"","date":"18 January 2026","externalUrl":null,"permalink":"/tags/cilium/","section":"Tags","summary":"","title":"Cilium","type":"tags"},{"content":"In my last post, Stop Using the Wrong CNI: Why Your Homelab Deserves Cilium in 2026, we established a production-grade networking foundation for our Talos Kubernetes cluster. But a powerful CNI is only half the story. To truly manage our cluster like a professional, we must automate and declare everything.\nThis post details the next logical step: bringing our manually installed Cilium under the declarative management of Argo CD. This is a critical milestone that transitions our cluster from being \u0026ldquo;configured\u0026rdquo; to being \u0026ldquo;managed.\u0026rdquo; We will install Argo CD, bootstrap it to manage itself (a classic GitOps inception pattern), and then delegate control of Cilium to it.\nAs always, all configuration files are open source and available in my GitHub repositories.\nThe Road to Declarative Management # In the last article, we installed Cilium with a helm install command. While effective, this imperative approach creates a fragile, hard-to-track state. How do we reliably track configuration changes? How do we roll back if something goes wrong?\nThis is the core problem GitOps solves. By declaring our desired state in Git, we gain a single source of truth, a perfect audit trail via commits, and the power to eliminate configuration drift. Argo CD continuously ensures our live cluster matches the state defined in Git.\nThe transition from this manual state to a declarative one requires a critical, one-time bootstrap process. 
This is the hand-off, where we transition from imperative commands to a fully automated system, and it\u0026rsquo;s the practical implementation of the architecture I designed in The Four-Repo GitOps Structure for My Homelab Platform.\nThe Last Manual Steps # Before the GitOps engine can take over, we must perform our last imperative actions:\nProvision the Base System: We start with a running Talos cluster provisioned via Omni and a manually installed Cilium CNI. Install the GitOps Engine: We perform a standard Helm installation of Argo CD. This places the core components into our cluster, but as a blank slate, unaware of our repositories. helm repo add argo https://argoproj.github.io/argo-helm helm repo update kubectl create namespace argocd helm install argo-cd argo/argo-cd --version 9.3.4 -n argocd Seeding the GitOps Engine # This is the most important step. We apply a single, crucial manifest that kickstarts the entire GitOps workflow:\nkubectl create -f https://raw.githubusercontent.com/anvaplus/homelab-k8s-argo-config/main/_initial_setup/project-argo-config.yaml This command applies the project-argo-config.yaml file, which defines the root AppProject. With this manifest applied, we have officially handed over control to Git.\napiVersion: argoproj.io/v1alpha1 kind: AppProject metadata: name: argo-config namespace: argocd spec: clusterResourceWhitelist: - group: \u0026#39;*\u0026#39; kind: \u0026#39;*\u0026#39; destinations: - name: \u0026#39;*\u0026#39; namespace: \u0026#39;*\u0026#39; server: https://kubernetes.default.svc sourceRepos: - https://github.com/anvaplus/homelab-k8s-argo-config - https://github.com/anvaplus/homelab-k8s-base-manifests - https://github.com/anvaplus/homelab-k8s-environments - https://github.com/anvaplus/homelab-k8s-environments-apps - https://argoproj.github.io/argo-helm - https://helm.cilium.io Why This First Step is So Important # That one project-argo-config.yaml file is the key to the whole setup. 
Think of it as the instruction manual that tells Argo CD what to do next. It’s not just another config file; it’s the starting point for our entire automated platform. Here’s what it does:\nIt Builds Trust: The sourceRepos list tells Argo CD, \u0026ldquo;Only look at these specific Git repositories.\u0026rdquo; This is a basic but crucial security step to prevent it from running code from somewhere else. It Kicks Off the Process: This project points to our homelab-k8s-argo-config repository, which holds instructions for all our other apps. This is the \u0026ldquo;app-of-apps\u0026rdquo; pattern. Argo CD sees this main instruction and then immediately starts installing the other apps it finds there: in this first phase Argo CD itself and Cilium. It Manages Itself: Because one of the apps it installs is Argo CD, the system can now manage its own updates. To upgrade Argo CD in the future, we just change the code in Git, and Argo CD will apply the update to itself. It Keeps Things Running: This setup automatically fixes things. If someone accidentally deletes or changes a component in the cluster, Argo CD will notice that the live state doesn\u0026rsquo;t match the Git repository and will automatically put it back the way it should be. From this point forward, our manual work is done. Every future change to our platform\u0026rsquo;s core components will be a pull request.\nInside the GitOps Workflow # With Argo CD bootstrapped, let\u0026rsquo;s examine the homelab-k8s-argo-config repository to see how it manages our core components. This structure is the heart of our declarative system. 
The current structure of this repository is as follows, and it will evolve as more tools are installed in the homelab:\n├── base/ # Base configurations for all tools │ ├── argocd/ # Configuration for ArgoCD itself │ ├── cilium/ # Cilium CNI configuration │ └── projects/ # ArgoCD project definitions └── environments/ # Environment-specific overlays ├── dev/ # Development environment configs │ ├── _root/ # Root application for the dev environment │ ├── argocd/ # ArgoCD overrides for dev │ ├── cilium/ # Cilium overrides for dev │ └── projects/ # ArgoCD project overrides for dev └── prod/ # Production environment configs The repository follows a clear, Kustomize-driven structure:\nbase/: Contains the generic, reusable Application manifests for our platform tools (e.g., Argo CD, Cilium), pointing to their official Helm charts. These are the default settings.\nenvironments/: Contains environment-specific overrides. Each environment (dev, prod, etc.) has a subdirectory where Kustomize patches can modify the base configurations, for instance, to change replica counts or domain names.\n_root/: This directory in each environment implements the \u0026ldquo;app-of-apps\u0026rdquo; pattern. It contains a kustomization.yaml file that assembles all the applications for that environment. Applying this single directory (kubectl apply -k environments/dev/_root) installs the root Application which, in turn, manages all other applications for that environment.\nThe root project we bootstrapped points to this _root application, which then deploys the Application resources for Argo CD and Cilium, ensuring they are managed declaratively. To add a new tool, we simply add its base configuration to base/ and customize it in environments/. The app-of-apps structure ensures it\u0026rsquo;s deployed automatically.\nA Note on Production-Grade Configuration\nA key detail in the Cilium Application manifest is the use of ignoreDifferences. 
This setting is crucial for preventing reconciliation loops caused by secrets containing automatically generated certificates (e.g., for Hubble). By telling Argo CD to ignore these specific fields, we maintain a clean Git history while allowing the in-cluster components to manage their own dynamic secrets.\nHere is the relevant snippet from the Cilium Application manifest:\napiVersion: argoproj.io/v1alpha1 kind: Application metadata: name: cilium namespace: argocd spec: ignoreDifferences: - kind: Secret name: cilium-ca namespace: kube-system jsonPointers: - /data/ca.crt - /data/ca.key - kind: Secret name: hubble-ca-secret namespace: kube-system jsonPointers: - /data/ca.crt - /data/tls.crt - /data/tls.key # ... other spec fields syncPolicy: syncOptions: - RespectIgnoreDifferences=true automated: prune: true selfHeal: true We also add a syncOption to ensure this ignore rule is respected during sync operations. This is a perfect example of a small but vital tweak needed for a robust, real-world GitOps implementation.\nConclusion: Full GitOps Control # We\u0026rsquo;ve now achieved a major milestone in our enterprise-to-homelab setup. Our cluster\u0026rsquo;s networking layer is no longer just a manual configuration; it\u0026rsquo;s a version-controlled, declarative state managed by a GitOps pipeline.\nThis setup provides immense power and safety. We can now experiment with network policies, update our CNI, and observe every change through the lens of Git. This is how modern, production-grade platforms are managed, and now it\u0026rsquo;s how my homelab runs.\nIn the next post, we\u0026rsquo;ll leverage this GitOps foundation to start deploying applications from our environments repository, truly bringing the four-repo model to life.\nStay tuned! 
Andrei\n","date":"18 January 2026","externalUrl":null,"permalink":"/stop-drifting-how-to-lock-down-your-cilium-cni-with-argo-cd/","section":"Blog","summary":"In my last post, Stop Using the Wrong CNI: Why Your Homelab Deserves Cilium in 2026, we established a production-grade networking foundation for our Talos Kubernetes cluster. But a powerful CNI is only half the story. To truly manage our cluster like a professional, we must automate and declare everything.\n","title":"Stop Drifting: How to Lock Down Your Cilium CNI with Argo CD","type":"blog"},{"content":"","date":"11 January 2026","externalUrl":null,"permalink":"/tags/k8s/","section":"Tags","summary":"","title":"K8s","type":"tags"},{"content":"In my last post, The Four-Repo GitOps Structure for My Homelab Platform, I laid out the architectural blueprint for managing my homelab like a production environment. Building on the automation I detailed in my popular post, Need for Speed: Automating Proxmox K8s Clusters with Talos Omni, we now have a cluster ready for a production-grade CNI. Now that we have a solid GitOps foundation and a running Talos Kubernetes cluster, it’s time to address a critical component: networking.\nChoosing a Container Network Interface (CNI) is one of the most important decisions you’ll make when setting up a Kubernetes cluster. It dictates how your pods communicate with each other, how you enforce security policies, and how you observe network traffic. In an enterprise environment, this choice has significant implications for performance, security, and scalability. So, why should a homelab be any different?\nAfter careful consideration and drawing from my experience in building enterprise platforms, I chose Cilium as the CNI for my Talos Kubernetes cluster. 
In this post, I’ll walk you through my decision-making process, compare Cilium with other popular CNIs like Flannel and Calico, and explain why Cilium is the key to unlocking a production-grade networking experience in your homelab.\nAs always, everything you see here is open source. You can find all the configuration files and code in my GitHub repository.\nThe CNI Showdown: Flannel vs. Calico vs. Cilium # Before we dive into why I chose Cilium, let\u0026rsquo;s briefly compare the three most popular CNIs in the Kubernetes ecosystem.\nFlannel: The Simple Starter # Flannel is one of the oldest and simplest CNIs available. It\u0026rsquo;s designed to be easy to set up and provides a basic overlay network for your cluster.\nPros: Extremely easy to install and configure. Good for beginners and simple use cases. Cons: Lacks advanced features like network policies. Performance can be a bottleneck due to its reliance on a simple overlay network. For a homelab that aims to replicate a production environment, Flannel is too basic. It doesn\u0026rsquo;t provide the security and observability features that are standard in the enterprise.\nCalico: The Network Policy Powerhouse # Calico is a popular CNI known for its robust network policy enforcement. It uses BGP to create a non-overlay network, which can offer better performance than Flannel.\nPros: Excellent network policy support. High performance due to its non-overlay network architecture. Cons: Can be more complex to configure and troubleshoot than Flannel. Relies on traditional networking principles, which can be less flexible than newer technologies. Calico is a solid choice and a significant step up from Flannel. However, it\u0026rsquo;s the next contender that truly brings the future of cloud-native networking to the table.\nCilium: The eBPF-Powered Future # Cilium is a modern CNI that leverages the power of eBPF to provide networking, observability, and security. 
eBPF allows Cilium to operate directly within the Linux kernel, offering significant performance and security advantages.\nPros:\neBPF-Powered: High performance, low latency, and efficient use of resources. Rich Security Features: Advanced network policies, identity-based security, and transparent encryption. Deep Observability: Hubble, Cilium\u0026rsquo;s observability platform, provides detailed insights into network traffic. Service Mesh Capabilities: Can replace a traditional service mesh like Istio for many use cases. Cons:\nRequires a modern Linux kernel (which is not an issue with Talos). Can have a steeper learning curve due to its advanced features. Why Cilium is the Perfect Fit for a Production-Grade Homelab # For a homelab that aims to mirror the capabilities of an enterprise environment, Cilium is the undisputed winner. Here’s why:\nIt Feels Like Production: Cilium is used by major enterprises and cloud providers. By using it in my homelab, I’m gaining experience with a tool that is at the forefront of cloud-native networking. This aligns perfectly with my goal of bridging the gap between enterprise and homelab.\nUnmatched Performance: eBPF allows Cilium to bypass traditional networking stacks and provide a direct, high-performance path for network traffic. This is crucial for running latency-sensitive applications, even in a homelab.\nAdvanced Security Out of the Box: With Cilium, I can enforce granular, identity-based network policies. This is a huge step towards a Zero Trust security model, a concept I’ve implemented in many enterprise environments.\nHubble: Observability on Steroids: Hubble provides incredible visibility into the network traffic in my cluster. I can see exactly which services are communicating, what protocols they are using, and whether any connections are being dropped. 
This is invaluable for troubleshooting and understanding the behavior of my applications.\nInstalling Cilium on Talos # As I documented in my previous posts on building a Talos Kubernetes cluster, Talos is a modern, secure, and minimal OS for Kubernetes. Installing Cilium on Talos is straightforward, but it requires a few specific configuration steps to ensure everything works seamlessly.\nHere’s how I did it in my homelab. All the configuration files mentioned here are available in my GitHub repository.\nStep 1: Disable the Default CNI and kube-proxy in Talos # Cilium replaces the functionality of kube-proxy and provides its own CNI, so we need to disable the defaults in our Talos cluster configuration. This is done by creating two patch files.\nFirst, create a patch to disable the CNI:\n# patches/cni.yaml cluster: network: cni: name: none Next, create a patch to disable kube-proxy:\n# patches/disable-kube-proxy.yaml cluster: proxy: disabled: true Then, reference these patches in your Talos cluster template. This ensures that your nodes are provisioned without a default networking layer, ready for Cilium.\n# cluster-template/k8s-dev-dhcp.yaml kind: Cluster name: k8s-dev-dhcp # ... other configuration patches: - name: no-cni file: patches/cni.yaml - name: disable-kube-proxy file: patches/disable-kube-proxy.yaml After applying the cluster template, your nodes will appear as \u0026ldquo;Not Ready.\u0026rdquo; This is expected behavior because Kubernetes nodes are only marked as ready once a CNI is running.\nStep 2: Install Cilium with Helm # With the cluster prepared, the next step is to install Cilium using Helm. 
This command installs Cilium with kubeProxyReplacement enabled, which is the key to unlocking its eBPF-powered performance.\nhelm install \\ cilium \\ cilium/cilium \\ --version 1.15.1 \\ --namespace kube-system \\ --set ipam.mode=kubernetes \\ --set kubeProxyReplacement=true \\ --set securityContext.capabilities.ciliumAgent=\u0026#34;{CHOWN,KILL,NET_ADMIN,NET_RAW,IPC_LOCK,SYS_ADMIN,SYS_RESOURCE,DAC_OVERRIDE,FOWNER,SETGID,SETUID}\u0026#34; \\ --set securityContext.capabilities.cleanCiliumState=\u0026#34;{NET_ADMIN,SYS_ADMIN,SYS_RESOURCE}\u0026#34; \\ --set cgroup.autoMount.enabled=false \\ --set cgroup.hostRoot=/sys/fs/cgroup \\ --set k8sServiceHost=localhost \\ --set k8sServicePort=7445 After running the Helm command, it will take a few minutes for the Cilium pods to be deployed and become operational. Once they are running, your Kubernetes nodes will transition to a \u0026ldquo;Ready\u0026rdquo; state, and your cluster will be fully networked with Cilium.\nConclusion: A Foundation for GitOps-Managed Networking # By choosing Cilium, I’ve laid a networking foundation that is not only powerful and secure but also aligns with the latest trends in cloud-native technology. It’s a choice that reinforces the \u0026ldquo;enterprise-to-homelab\u0026rdquo; philosophy that drives this blog.\nWith the CNI now in place, the next logical step is to bring it under the control of our GitOps workflow. In my next post, I\u0026rsquo;ll walk through the process of installing Argo CD and configuring it to manage Cilium. This will complete the loop, allowing us to manage our cluster\u0026rsquo;s networking declaratively, just as we would in a production environment.\nStay tuned. 
Andrei\n","date":"11 January 2026","externalUrl":null,"permalink":"/stop-using-the-wrong-cni-why-your-homelab-deserves-cilium-in-2026/","section":"Blog","summary":"In my last post, The Four-Repo GitOps Structure for My Homelab Platform, I laid out the architectural blueprint for managing my homelab like a production environment. Building on the automation I detailed in my popular post, Need for Speed: Automating Proxmox K8s Clusters with Talos Omni, we now have a cluster ready for a production-grade CNI. Now that we have a solid GitOps foundation and a running Talos Kubernetes cluster, it’s time to address a critical component: networking.\n","title":"Stop Using the Wrong CNI: Why Your Homelab Deserves Cilium in 2026","type":"blog"},{"content":" The Journey So Far # In this series, we\u0026rsquo;ve built a powerful foundation for a homelab Kubernetes platform. We started by installing Talos Omni to get a centralized management plane. Then, we walked the \u0026ldquo;scenic route\u0026rdquo; by manually provisioning a cluster to understand the nuts and bolts. Finally, we achieved true velocity by automating cluster creation, turning our Kubernetes infrastructure into a disposable, on-demand resource.\nNow, with the ability to create clusters in minutes, we face the next enterprise-grade challenge: How do we manage the applications and platform services running on it?\nIt\u0026rsquo;s not enough to just have a cluster; we need a structured, scalable, and maintainable way to deploy and configure everything from the service mesh to the applications themselves. This is where we move from simply running Kubernetes to building a true platform.\nThis post outlines my GitOps architecture, which is based on a four-repository model. This structure provides a clear separation of concerns and establishes a robust workflow for managing the entire software lifecycle on my homelab cluster.\nWhy Not Just One Big Repository? 
# When starting with GitOps, the simplest approach is often a single repository containing everything: your ArgoCD manifests, your platform tools (like Istio or cert-manager), and all your application configurations. This works well for a small number of services, but as I\u0026rsquo;ve learned from managing production environments with hundreds of microservices across multiple clusters, it quickly becomes unwieldy.\nA single-repo approach leads to several problems:\nBlurred Ownership: Who is responsible for what? Platform engineers and application developers all commit to the same repository, increasing the risk of conflicting changes. Configuration Drift: It becomes difficult to manage environment-specific configurations cleanly. You often end up with complex Kustomize overlays or Helm value files that are hard to read. Promotion Path Complexity: Promoting an application from a development to a production environment can be a messy process of copying files or cherry-picking commits. Blast Radius: A mistake in one part of the repository (e.g., a platform tool configuration) can potentially break application deployments, and vice-versa. To avoid these pitfalls, I\u0026rsquo;ve adopted a multi-repository strategy that separates concerns, mirroring the way we manage complex systems in production.\nThe Four Pillars of My GitOps Platform # My architecture is composed of four interconnected GitHub repositories, each with a distinct purpose. ArgoCD watches these repositories and uses them to assemble the complete desired state of the cluster.\nLet\u0026rsquo;s break down the role of each one.\n1. homelab-k8s-argo-config: The Platform Foundation # This repository is the source of truth for the platform itself. It contains the ArgoCD configurations for all the foundational services that my applications will depend on. 
Think of it as the layer managed by the \u0026ldquo;platform team\u0026rdquo; (in this case, me).\nPurpose:\nDefines and configures core infrastructure components like Istio (service mesh), cert-manager (certificate management), external-dns (DNS automation), and external-secrets (secret synchronization). Contains ArgoCD AppProject definitions to enforce security and organization. Manages the namespaces and RBAC policies for the entire cluster. This repository establishes the stable base upon which all applications will run.\n2. homelab-k8s-base-manifests: The Blueprint Factory # This repository holds the reusable deployment templates. To ensure consistency, I don\u0026rsquo;t want to write Kubernetes manifests from scratch for every application. Instead, I use standardized Helm charts that define common deployment patterns.\nPurpose:\nProvides a library of production-ready Helm charts. Includes a common chart that can be used to deploy most stateless web applications and APIs with a standard set of features (Deployments, Services, Ingresses, etc.). Allows for application-specific charts for services with unique requirements. By using these base charts, I ensure that every application is deployed in a consistent and predictable way.\n3. homelab-k8s-environments: The High-Velocity Configuration Registry # This repository is different from the others; it\u0026rsquo;s a high-frequency repository that is often updated automatically by CI/CD pipelines. It has a critical job: it tracks not only which version of each application is deployed in each environment (which are separate Kubernetes clusters), but also stores other configuration artifacts generated during the build process.\nPurpose:\nActs as a central registry for application versions (e.g., Docker image tags). Stores pipeline-generated data. 
For example, if an application uses gRPC and Istio is configured to translate incoming REST calls, the pipeline can generate and store a protoDescriptorBin file here for Istio to consume. The dev configuration is updated automatically by a CI/CD pipeline, enabling continuous delivery to my development cluster. The prod configuration is updated manually via pull requests, creating a controlled promotion process for the production cluster. This repository acts as the dynamic bridge between my CI/CD system and my GitOps state, allowing for automated, pipeline-driven configuration to be safely consumed by ArgoCD.\n4. homelab-k8s-environments-apps: The Configuration Hub # This is the repository that ties everything together. It contains the ArgoCD Application manifests that tell ArgoCD how to build and deploy each application.\nPurpose:\nContains an ArgoCD Application definition for every service. Specifies which Helm chart to use from homelab-k8s-base-manifests. Provides the environment-specific Helm values.yaml files (e.g., resource limits, replica counts, environment variables). References the homelab-k8s-environments repository to get the correct application version. This repository effectively combines the \u0026ldquo;what\u0026rdquo; (the Helm chart), the \u0026ldquo;which version\u0026rdquo; (the image tag), and the \u0026ldquo;how\u0026rdquo; (the configuration values) to create a complete application deployment.\nExplore the Repositories # More details on each repository\u0026rsquo;s planned structure can be found on my GitHub. Currently, they contain detailed READMEs outlining their purpose, but they will evolve step-by-step as the homelab is built. Everything will be public: the pipelines, the processes, the applications I deploy, and the configurations that bind them together.\nhomelab-k8s-argo-config: The platform\u0026rsquo;s foundation (Istio, cert-manager, etc.). homelab-k8s-base-manifests: Reusable Helm chart blueprints. 
homelab-k8s-environments: The dynamic version and configuration registry. homelab-k8s-environments-apps: The application configuration hub that ties it all together. The Deployment Flow in Action # This structure creates a clear and auditable workflow for making changes.\nScenario 1: A New Application Version Deploys to DEV # A developer pushes a code change, and the CI pipeline builds a new Docker image (v1.2.4). The CI pipeline automatically commits a change to the homelab-k8s-environments repository, updating the version for that application in the dev environment to v1.2.4. ArgoCD detects this change and automatically syncs the application, pulling the new Docker image into the development cluster. Scenario 2: Promoting an Application to Production # After testing in dev, the v1.2.4 version is deemed stable and ready for production. A developer opens a pull request against the homelab-k8s-environments repository to change the version for the application in the prod configuration to v1.2.4. The pull request is reviewed, approved, and merged. ArgoCD detects the change in the prod configuration and syncs the application, rolling out the new version to the production cluster. Conclusion: An Enterprise Pattern for the Homelab # This four-repository structure brings a level of organization and control that I have validated in production environments managing hundreds of microservices across multiple clusters. It provides:\nClear Separation of Concerns: The platform, application blueprints, versions, and configurations are all managed independently. Scalability: Adding new applications or environments is a structured process, not an ad-hoc task. Auditable and Controlled Promotions: The Git history provides a clear record of every change, and promotions to production are managed through pull requests. Now that the blueprint for the platform is defined, the next step is to bring it to life. 
In the upcoming posts, I\u0026rsquo;ll start implementing this structure, beginning with the homelab-k8s-argo-config repository to lay down the foundational services for my new Kubernetes platform.\nStay tuned. Andrei\n","date":"3 January 2026","externalUrl":null,"permalink":"/the-four-repo-gitops-structure-for-my-homelab-platform/","section":"Blog","summary":"The Journey So Far # In this series, we’ve built a powerful foundation for a homelab Kubernetes platform. We started by installing Talos Omni to get a centralized management plane. Then, we walked the “scenic route” by manually provisioning a cluster to understand the nuts and bolts. Finally, we achieved true velocity by automating cluster creation, turning our Kubernetes infrastructure into a disposable, on-demand resource.\n","title":"The Four-Repo GitOps Structure for My Homelab Platform","type":"blog"},{"content":"","date":"29 December 2025","externalUrl":null,"permalink":"/tags/automation/","section":"Tags","summary":"","title":"Automation","type":"tags"},{"content":"In my previous posts, I walked through installing Talos Omni and then manually provisioning a Talos Kubernetes cluster on Proxmox. Both were essential learning experiences. Getting Talos Omni running was a huge win, and understanding the manual provisioning process\u0026hellip; from downloading the ISO, creating VMs, configuring static IPs in the console, and patching nodes\u0026hellip; built a strong foundation. But the real game-changer wasn\u0026rsquo;t just running Kubernetes\u0026hellip; it was discovering how quickly I could create it.\nFor years, spinning up a new K8s cluster was a significant undertaking. The traditional path using tools like Terraform, Ansible, and kubeadm is powerful but often slow and brittle. A single misconfiguration in an Ansible playbook or a change in the underlying OS could send you down a rabbit hole of debugging. 
As I showed in my manual provisioning guide, even with Talos, the process involved creating Proxmox VMs, booting from an ISO, and configuring each node one by one. It worked, but it lacked the velocity I was used to in the cloud.

This all changed when I discovered the Omni Proxmox Provider. Combined with a GitOps approach using omnictl, I can now define entire Kubernetes clusters as code, version control them, and deploy them with a single command. This is the story of how I went from carefully tending to my Kubernetes cluster to treating it like disposable cattle… and why that's a massive upgrade for any developer.

Note: All the configuration files, machine class definitions, and cluster templates referenced in this post are available in my GitHub repository. Feel free to use them as a starting point for your own setup.

The Game-Changer: The Omni Proxmox Provider #

The Omni Proxmox Provider is a direct integration between Talos Omni and the Proxmox API. Instead of you manually creating virtual machines, Omni does it for you. You define a cluster configuration as YAML… how many nodes, how much RAM, which Proxmox node to build on… and with a single omnictl command, Omni makes the API calls to Proxmox to provision the VMs, attach the Talos ISO, and boot them up.

The entire process is automated. What used to be a 30-minute, multi-step manual task is now a 3-minute, single-command operation.

| Method | Time to Provision | Process | Fragility |
|---|---|---|---|
| Terraform + Ansible + Kubeadm | 1-2 hours | Complex, multi-stage process involving IaC for VMs and configuration management for K8s components. | High. Prone to breaking with OS or tool updates. |
| Manual Proxmox + Talos ISO | 20-30 minutes | Create VM templates, clone them, manually configure IPs and hostnames in the console. | Medium. Repetitive manual steps are error-prone. |
| Omni Proxmox Provider + omnictl | ~3 minutes | Define the cluster as YAML, run `omnictl cluster template sync`. Omni handles everything. | Low. A purpose-built, integrated, and repeatable GitOps process. |

Why I No Longer Fear Breaking My Cluster #

This speed has fundamentally changed my developer experience. My Proxmox-based Kubernetes cluster is now my primary development environment. Because I can tear it down and bring it back up in the time it takes to grab a coffee, I'm no longer afraid to break things.

- Experiment Fearlessly: Testing a new CNI, a service mesh, or a chaotic operator that might destabilize the cluster? Go for it. If it all goes wrong, I don't spend hours trying to fix it. I just delete the cluster and provision a new one.
- Clean State, Every Time: I can start my day with a completely fresh cluster, ensuring no leftover artifacts from previous experiments interfere with my work.
- Parallel Environments: Need to test how two different versions of an application interact? I can spin up two separate, isolated clusters in minutes.

This is the "cattle, not pets" philosophy in action. My cluster is no longer a precious thing to be carefully maintained. It's a disposable, reproducible resource, just like a container.

Prerequisites #

Before we begin, ensure you have:

- A working Talos Omni installation (see my previous guide).
- A Proxmox VE server with administrative access.
- A system with Homebrew installed for the omnictl CLI.

Step 1: Create a Proxmox API Token #

First, we need to give Omni the credentials to manage Proxmox on our behalf.

1. In your Proxmox web interface, navigate to Datacenter → Permissions → API Tokens.
2. Create a new API token: User: root@pam, Token ID: omni-proxmox-provider. Critical: save the generated token secret immediately. You won't be able to retrieve it again.
3. Grant permissions to the token: go to Datacenter → Permissions → Add, select your new token, and assign it the Administrator role (for testing; you can scope this down for production).
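If you want this step to be scriptable rather than click-driven, Proxmox can also create the token from the shell. This is only a sketch using the `pveum` CLI (the token and role names come from the steps above; the exact ACL subcommand and flag spellings vary between Proxmox VE releases, so verify against `pveum help` on your host):

```shell
# Create the API token for root@pam (run on the Proxmox host).
# --privsep 0 makes the token inherit the user's full permissions;
# leave privilege separation on if you intend to scope the token down.
pveum user token add root@pam omni-proxmox-provider --privsep 0

# Grant the token the Administrator role on the root path.
# Older releases spell this `pveum aclmod / -token ... -role ...`.
pveum acl modify / --tokens 'root@pam!omni-proxmox-provider' --roles Administrator
```

Either way, the token secret is printed exactly once, so capture it immediately.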
Step 2: Configure the Omni Infrastructure Provider #

Now we'll register Proxmox as an infrastructure provider in Omni. This is one of the few tasks we'll do in the UI… just to generate the service account key.

1. Log in to your Omni UI and navigate to Settings → Infrastructure Providers.
2. Click Create Provider and name it proxmox.
3. Copy the Service Account Key that Omni generates and store it securely. This key will authenticate the Proxmox provider container back to Omni.

After this step, everything else is done via omnictl.

Step 3: Deploy the Proxmox Provider with Docker #

The Proxmox provider runs as a separate container that bridges Omni and your Proxmox API.

Create a configuration file (config.yaml) with your Proxmox connection details:

```yaml
proxmox:
  url: "https://homelab.proxmox:8006/api2/json"
  insecureSkipVerify: true
  tokenID: "root@pam!omni-proxmox-provider"
  tokenSecret: "YOUR-PROXMOX-TOKEN-SECRET"
```

Replace YOUR-PROXMOX-TOKEN-SECRET with the token you saved in Step 1.

Create a .env file with the Omni connection details:

```
OMNI_API_ENDPOINT=https://omni.yourdomain.com
OMNI_INFRA_PROVIDER_KEY=YOUR-SERVICE-ACCOUNT-KEY
```

Replace YOUR-SERVICE-ACCOUNT-KEY with the key you copied in Step 2.

Create a docker-compose.yml file:

```yaml
services:
  omni-infra-provider-proxmox:
    image: ghcr.io/siderolabs/omni-infra-provider-proxmox
    container_name: omni-infra-provider-proxmox
    env_file:
      - .env
    volumes:
      - ./config.yaml:/config.yaml
    command: >
      --config-file /config.yaml
      --omni-api-endpoint ${OMNI_API_ENDPOINT}
      --omni-service-account-key ${OMNI_INFRA_PROVIDER_KEY}
    restart: unless-stopped
    network_mode: host
```

Start the provider:

```shell
docker compose up -d
```

You should now see both containers running: the Omni UI and the Proxmox provider.

Verify the connection: back in the Omni UI, go to Settings → Infrastructure Providers. Your proxmox provider should now show a status of Healthy.
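If the provider never reaches Healthy, the container's logs are the first place to look. A couple of quick checks from the Docker host (the container name is the one defined in the docker-compose.yml above):

```shell
# The service should show as "Up"; a restart loop usually means a bad
# token, a wrong API endpoint, or a TLS problem.
docker compose ps

# Tail the provider logs; authentication errors against either Proxmox
# or Omni will surface here.
docker logs --tail 50 omni-infra-provider-proxmox
```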
Step 4: Install and Configure omnictl #

This is where the GitOps approach begins. Instead of clicking through a UI, we'll define everything as code and manage it through omnictl.

Install omnictl using Homebrew:

```shell
brew install siderolabs/tap/sidero-tools
```

This also installs talosctl and the kubectl oidc-login plugin.

Create the configuration directory:

```shell
omnictl config contexts
```

This command will initially fail but creates the required directory: ~/.talos/omni/.

Download your Omni configuration: in the Omni UI, go to Home → Download omniconfig, then move the file to the correct location:

```shell
mv omniconfig.yaml ~/.talos/omni/config
```

Verify the configuration:

```shell
omnictl config contexts
```

You should see your Omni instance listed.

Authenticate:

```shell
omnictl get clusters
```

This will open your browser for Auth0 authentication. Once complete, you're ready to go.

Important Note: The omnictl version must match your Omni backend's API version. If you see a version mismatch error, update your Omni instance or downgrade omnictl to match.

Step 5: Define Machine Classes #

Machine classes are templates that define the VM specifications for your cluster nodes.

Create a control plane machine class (control-plane.yaml):

```yaml
metadata:
  namespace: default
  type: MachineClasses.omni.sidero.dev
  id: proxmox-control-plane
spec:
  matchlabels: []
  autoprovision:
    providerid: proxmox
    providerdata: |
      cores: 2
      sockets: 1
      memory: 4096
      disk_size: 40
      network_bridge: vmbr0
      storage_selector: name == "local-lvm"
```

Create a worker machine class (worker.yaml):

```yaml
metadata:
  namespace: default
  type: MachineClasses.omni.sidero.dev
  id: proxmox-worker
spec:
  matchlabels: []
  autoprovision:
    providerid: proxmox
    providerdata: |
      cores: 4
      sockets: 1
      memory: 8192
      disk_size: 60
      network_bridge: vmbr0
      storage_selector: name == "local-lvm"
```

Apply the machine classes:

```shell
omnictl apply -f control-plane.yaml
omnictl apply -f worker.yaml
```

These machine classes are now available for use in any cluster definition.

Step 6: Create Your First Automated Cluster #

Now for the moment of truth: provisioning a cluster with a single command.

Create a cluster definition (k8s-dev-dhcp.yaml):

```yaml
kind: Cluster
name: k8s-dev-dhcp
kubernetes:
  version: v1.34.2
talos:
  version: v1.11.5
---
kind: ControlPlane
machineClass:
  name: proxmox-control-plane
  size: 1
systemExtensions:
  - siderolabs/iscsi-tools
  - siderolabs/nfsd
  - siderolabs/qemu-guest-agent
  - siderolabs/util-linux-tools
patches:
  - name: hostname-cp
    inline:
      machine:
        network:
          hostname: k8s-dev-cp
        time:
          servers:
            - pool.ntp.org
---
kind: Workers
name: workers
machineClass:
  name: proxmox-worker
  size: 3
systemExtensions:
  - siderolabs/iscsi-tools
  - siderolabs/nfsd
  - siderolabs/qemu-guest-agent
  - siderolabs/util-linux-tools
patches:
  - name: worker-labels
    inline:
      machine:
        nodeLabels:
          node-role.kubernetes.io/worker: ""
  - name: hostname-prefix
    inline:
      machine:
        network:
          hostname: k8s-dev-worker
        time:
          servers:
            - pool.ntp.org
```

Deploy the cluster:

```shell
omnictl cluster template sync -f k8s-dev-dhcp.yaml
```

Watch the magic happen:

- Monitor progress with `omnictl get machines --watch`, or check the Omni UI to see the cluster provisioning status.
- In Proxmox, watch as VMs are automatically created, configured, and powered on.
- Within 3-5 minutes, your cluster will be fully operational.

Connect to your new cluster:

- Download the kubeconfig: `omnictl kubeconfig -c k8s-dev-dhcp > ~/.kube/k8s-dev-dhcp.yaml` (or download it from the Omni UI if you prefer).
- Run `kubectl get nodes` to verify your cluster is up and running.

You've just provisioned a production-grade Kubernetes cluster on Proxmox without touching a single VM console.

Conclusion: GitOps Meets Homelab Velocity #

Installing Talos Omni was about bringing enterprise patterns home. But integrating it with the Proxmox provider and adopting a GitOps workflow was about unlocking enterprise velocity.
The ability to provision Kubernetes clusters on-demand, with zero friction, using version-controlled configuration files is the real superpower. It transforms your homelab from a static environment into a dynamic, flexible platform for learning and innovation.\nBy treating your infrastructure as code, you get:\nReproducibility: Every cluster is built from the same, tested configuration. Version Control: Track changes to your infrastructure over time with Git. Disaster Recovery: Recreate your entire environment from a Git clone. Experimentation: Test changes in dev before applying them to production. If you\u0026rsquo;re running Proxmox and Talos, the Omni Proxmox Provider combined with omnictl isn\u0026rsquo;t just a nice-to-have; it\u0026rsquo;s a must-have. It will completely change the way you interact with Kubernetes.\nWant to get started? Clone my GitHub repository and adapt the configuration files for your own environment. Everything you need is there: provider setup, machine classes, and ready-to-deploy cluster templates for both development and production.\nStay tuned. Andrei\n","date":"29 December 2025","externalUrl":null,"permalink":"/need-for-speed-automating-proxmox-k8s-clusters-with-talos-omni/","section":"Blog","summary":"In my previous posts, I walked through installing Talos Omni and then manually provisioning a Talos Kubernetes cluster on Proxmox. Both were essential learning experiences. Getting Talos Omni running was a huge win, and understanding the manual provisioning process… from downloading the ISO, creating VMs, configuring static IPs in the console, and patching nodes… built a strong foundation. 
But the real game-changer wasn’t just running Kubernetes… it was discovering how quickly I could create it.\n","title":"Need for Speed: Automating Proxmox K8s Clusters with Talos Omni","type":"blog"},{"content":"","date":"29 December 2025","externalUrl":null,"permalink":"/tags/proxmox/","section":"Tags","summary":"","title":"Proxmox","type":"tags"},{"content":"In the world of platform engineering, our goal is always to automate everything. But before we can appreciate the elegance of a fully automated workflow, it\u0026rsquo;s incredibly valuable to walk through the manual process at least once. It builds a deep understanding of what\u0026rsquo;s happening under the hood.\nAfter setting up my Kubernetes management plane in \u0026ldquo;Enterprise Kubernetes at Home - A Guide to Installing Talos Omni,\u0026rdquo; my next step was to create my first cluster. While my ultimate goal was zero-touch provisioning (a story for the next post!), I started by taking the scenic route: manually deploying a Talos Linux Kubernetes Cluster on Proxmox using the ISO image.\nThis post is a step-by-step guide to that process. We\u0026rsquo;ll go from downloading the installation media to running kubectl get nodes on a fully functional, three-node development cluster.\nThe Goal: A 3-Node Development Cluster # Our objective is to create a small, flexible Kubernetes cluster suitable for development and testing. To maximize our resources, we\u0026rsquo;ll configure it so that workloads can run on the control plane nodes. This gives us a three-node cluster where every node is a usable worker.\nStep 1: Download and Prepare the Talos ISO # First, we need the installation media. Talos makes this easy.\nLog in to your Talos Omni UI. Navigate to the Media section. Download the generic image (amd64). You won\u0026rsquo;t need any extensions for this basic setup. Once downloaded, upload the ISO file to your Proxmox server\u0026rsquo;s ISO storage. 
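If you prefer the command line for that upload, copying the image straight into the storage directory also works. This is a sketch assuming Proxmox's default `local` storage, a placeholder hostname, and an example ISO filename; adjust all three for your environment:

```shell
# Copy the Talos ISO into the ISO directory of Proxmox's default "local"
# storage (/var/lib/vz/template/iso is where directory-backed ISO storage
# lives on a stock install).
scp metal-amd64.iso root@proxmox.example.lan:/var/lib/vz/template/iso/

# On the Proxmox host, confirm the storage now lists the image.
pvesm list local --content iso
```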
Step 2: Create the Proxmox Virtual Machines # Now, we\u0026rsquo;ll create the three VMs that will form our cluster. In Proxmox, create three identical VMs with the following specifications:\nOS: Use the Talos ISO you just uploaded. System: Enable the QEMU Agent. This is important for Proxmox to properly manage the VM. Hard Disk: 64 GB is plenty for a development node. CPU/Memory: I\u0026rsquo;ve allocated 8192 MiB of RAM and 2 cores per node. Naming: Name them k8s-dev-1, k8s-dev-2, and k8s-dev-3. Once configured, start all three VMs.\nStep 3: Manual Network Configuration # When the VMs boot for the first time, they will get an IP address from your DHCP server. For a stable cluster, we need static IPs.\nOpen the console for each VM in Proxmox. You\u0026rsquo;ll be greeted by the Talos configuration screen. Press F3 to edit the network settings.\nConfigure each node as follows, using a unique IP address for each:\nVM 1 (k8s-dev-1)\nHostname: k8s-dev-1 DNS Server: 10.20.0.2 (My local Pi-hole) Timeserver: pool.ntp.org Interface: ens18 (or your VM\u0026rsquo;s network interface) Mode: Static Addresses: 10.20.10.11/16 Gateway: 10.20.0.1 Repeat this process for k8s-dev-2 (using 10.20.10.12/16) and k8s-dev-3 (using 10.20.10.13/16).\nOnce you save the configuration on each node, they will reboot. Back in the Omni UI, you\u0026rsquo;ll see the machines appear in the Machines tab with their new hostnames.\nStep 4: Create the Cluster and Apply Patches # With our machines registered in Omni, we can now create the cluster.\nIn Omni, go to the Clusters tab and click Create Cluster. Give it a name (e.g., k8s-dev). Select the three machines we just configured and add them to the cluster. Crucially, before finalizing, we need to apply a patch to allow workloads to run on our control plane nodes. This is perfect for a small dev cluster. 
For each machine in the cluster configuration, add the following patch:

```yaml
cluster:
  allowSchedulingOnControlPlanes: true
```

This tells Talos that the Kubernetes scheduler is allowed to place pods on the nodes that are also running the control plane components (like etcd and the API server).

Once patched, create the cluster. Omni will now orchestrate the bootstrapping process across the three nodes.

Step 5: Access Your New Cluster with kubectl #

After a few minutes, the cluster will be up and running. The final step is to connect to it.

Download Kubeconfig: In the Omni UI, navigate to your k8s-dev cluster's page and click Download Kubeconfig.

Merge Kubeconfig: Add the contents of the downloaded file to your local ~/.kube/config file.

Install kubelogin: Omni uses OIDC for authentication, so you need a kubectl helper. Install it with Homebrew:

```shell
brew install kubelogin
```

This plugin handles the OIDC login flow automatically when you use kubectl.

Verify Access: Run a kubectl command. The first time, kubelogin will open a browser window for you to authenticate via Auth0 (or whichever SSO provider you configured).

```shell
kubectl get nodes
```

You should see your three nodes reported as Ready!

```
NAME        STATUS   ROLES           AGE   VERSION
k8s-dev-1   Ready    control-plane   10m   v1.34.1
k8s-dev-2   Ready    control-plane   10m   v1.34.1
k8s-dev-3   Ready    control-plane   10m   v1.34.1
```

Conclusion: The Power of the "Scenic Route" #

And there you have it… a fully functional Talos Kubernetes cluster, built from scratch. While this manual process is slower than an automated one, it provides an invaluable look into the moving parts of a Kubernetes deployment. You've touched the networking, storage, and OS layers directly, which builds a strong mental model for troubleshooting later.

Now that we've mastered the manual way, stay tuned for my next post, where we'll make this entire process disappear with the magic of the Omni Proxmox Provider.

Stay tuned.
Andrei\n","date":"24 December 2025","externalUrl":null,"permalink":"/from-iso-to-kubectl-a-guide-to-manually-provisioning-a-talos-kubernetes-cluster/","section":"Blog","summary":"In the world of platform engineering, our goal is always to automate everything. But before we can appreciate the elegance of a fully automated workflow, it’s incredibly valuable to walk through the manual process at least once. It builds a deep understanding of what’s happening under the hood.\n","title":"From ISO to kubectl: A Guide to Manually Provisioning a Talos Kubernetes Cluster","type":"blog"},{"content":"","date":"24 December 2025","externalUrl":null,"permalink":"/tags/talos-linux/","section":"Tags","summary":"","title":"Talos-Linux","type":"tags"},{"content":"In the world of enterprise cloud, managing Kubernetes clusters with services like GKE, EKS, or AKS is standard practice. These platforms offer incredible power but come with a learning curve and, more importantly, a cost that\u0026rsquo;s hard to justify for a homelab. As a platform engineer, I\u0026rsquo;m used to building and managing production-grade infrastructure, but as I explained in my first post, Why not a homelab?, I wanted a solution for my homelab that offered a similar centralized management experience without the overhead.\nThis is where Talos Linux comes in. Created by Sidero Labs, Talos is a modern OS designed specifically for Kubernetes. It\u0026rsquo;s minimal, secure, and immutable, which means it\u0026rsquo;s built from the ground up to be a rock-solid foundation for your clusters. But a powerful OS is only half the story. How do you manage the clusters running on it?\nEnter Talos Omni, the purpose-built management plane for Talos Linux clusters. Omni provides a clean, centralized GUI for cluster lifecycle management, secure bootstrapping, and observability. It\u0026rsquo;s the bridge between manually juggling kubeconfig files and a full-blown managed Kubernetes service. 
By self-hosting Omni, you get the same powerful workflows you'd find in the cloud, but with full control over your own infrastructure.

This post is a hands-on guide to installing Talos Omni on-premises using Docker. We'll walk through the process step-by-step, applying enterprise security principles like secure certificate management, etcd encryption, and centralized authentication to our homelab setup.

Why Not Just Use Managed Kubernetes? #

For production workloads, managed Kubernetes is often the right choice. It offloads the operational burden of the control plane, offers auto-scaling, and integrates deeply with cloud provider services. However, in a homelab, the goals are different: learning, experimentation, and cost control.

| Feature | Managed Kubernetes | Talos Omni (Self-Hosted) |
|---|---|---|
| Control Plane | Managed by the cloud provider. | Self-managed, running on your hardware. |
| Cost | Pay-per-hour for the control plane, plus worker nodes. | One-time hardware cost, minimal operational expense. |
| Complexity | Abstracted away, but provider-specific configurations can be complex. | You control the entire stack, offering deeper learning opportunities. |
| Customization | Limited to what the provider exposes. | Fully customizable to your needs. |

For a homelab, Talos Omni hits the sweet spot. It provides a clean, web-based UI to manage multiple clusters, making it feel like a private, mini-managed Kubernetes service. It's the perfect tool for applying platform engineering principles at home.

Prerequisites #

Before we begin, ensure you have the following:

- A Docker host (I'm using an Ubuntu VM with a static IP, which I set up as part of my segmented homelab network).
- A registered domain name managed via Cloudflare.
- A Cloudflare account for DNS management.
- An Auth0 account for setting up single sign-on (SSO).

Step 1: Generate TLS Certificates with Certbot #

Just like in a production environment, we'll start by securing our endpoint with TLS.
We'll use Certbot and the Cloudflare DNS plugin to generate a certificate for the Omni endpoint via a DNS challenge.

Install Certbot and the Cloudflare DNS plugin:

```shell
sudo snap install --classic certbot
sudo snap set certbot trust-plugin-with-root=ok
sudo snap install certbot-dns-cloudflare
```

Create a Cloudflare API Token: Log in to your Cloudflare dashboard, go to API Tokens, and create a token with Zone:DNS:Edit permissions for your domain.

Generate the Certificate: Create a credentials file with your API token and then run Certbot.

```shell
# Create a credentials file
echo 'dns_cloudflare_api_token = <YOUR_CLOUDFLARE_API_TOKEN>' > cloudflare.ini
chmod 600 cloudflare.ini

# Request the certificate
sudo certbot certonly --dns-cloudflare --dns-cloudflare-credentials ./cloudflare.ini -d omni.yourdomain.com
```

Certbot will handle the DNS challenge and store the certificates in /etc/letsencrypt/live/omni.yourdomain.com/.

Step 2: Create an etcd Encryption Key with GPG #

Encrypting sensitive data at rest is a cornerstone of enterprise security. Omni stores its state in etcd, and we'll use a GPG key to encrypt it. It is critical that you do not use a passphrase for this key, as it would break the automated bootstrap process.

Generate the GPG key:

```shell
# Generate the primary key
gpg --quick-generate-key "Omni (etcd encryption)" rsa4096 cert never
```

Add an encryption subkey:

```shell
# Add an encryption subkey (replace <YOUR_KEY_ID> with the ID from the previous command)
gpg --quick-add-key <YOUR_KEY_ID> rsa4096 encr never
```

Export the secret key:

```shell
gpg --export-secret-keys --armor <YOUR_KEY_ID> > omni.asc
```

This omni.asc file contains the key we'll provide to Omni for etcd encryption.

Step 3: Configure Auth0 for SSO #

Instead of managing local users, we'll integrate a proper identity provider, just as we would in a corporate environment.

In your Auth0 dashboard, navigate to Applications and create a new Single-Page Application. In the application's settings, configure the following URLs, replacing omni.yourdomain.com with your domain:

- Allowed Callback URLs: https://omni.yourdomain.com
- Allowed Logout URLs: https://omni.yourdomain.com
- Allowed Web Origins: https://omni.yourdomain.com

To enforce SSO, navigate to Authentication > Database and disable the local username/password database for this application.

Step 4: Bootstrap Omni with Docker Compose #

With our security components in place, we can now deploy Omni.

Download the configuration files from Sidero Labs:

```shell
export OMNI_VERSION=1.3.4 # Use the latest stable version
curl https://raw.githubusercontent.com/siderolabs/omni/v${OMNI_VERSION}/deploy/env.template > omni.env
curl https://raw.githubusercontent.com/siderolabs/omni/v${OMNI_VERSION}/deploy/compose.yaml -o compose.yaml
```

Edit the omni.env file: This file contains all the configuration variables for Omni. Carefully fill in the required values, including:

- TLS_CERT and TLS_KEY: Paths to the certificates we generated.
- ETCD_ENCRYPTION_KEY: Path to the omni.asc GPG key.
OMNI_DOMAIN_NAME: Your domain (omni.yourdomain.com). INITIAL_USER_EMAILS: Your email address, which will be granted initial admin access. Auth0 details: AUTH_AUTH0_DOMAIN and AUTH_AUTH0_CLIENT_ID from your Auth0 application. Launch Omni:\ndocker compose --env-file omni.env up -d Docker will pull the images and start the Omni containers in the background.\nStep 5: First Login and Final Configuration # To access the UI, you\u0026rsquo;ll need to resolve your Omni domain to the IP address of your Docker host.\nConfigure DNS: Create a DNS A record in your local DNS resolver (e.g., Pi-hole) pointing omni.yourdomain.com to your Docker host\u0026rsquo;s IP address.\nLog In: Navigate to https://omni.yourdomain.com. You should be redirected to the Auth0 login page. Log in with the social provider you configured (e.g., Google), and you\u0026rsquo;ll be redirected back to the Omni dashboard.\nConclusion # By following these steps, you\u0026rsquo;ve deployed a powerful, self-hosted Kubernetes management plane in your homelab, mirroring the security and operational patterns of an enterprise environment. We\u0026rsquo;ve secured our endpoint with TLS, encrypted sensitive data at rest, and integrated a centralized identity provider\u0026hellip; all before deploying a single workload.\nTalos Omni provides an excellent foundation for managing a multi-cluster homelab. In future posts, I\u0026rsquo;ll explore how to provision a new Talos cluster and connect it to our Omni instance.\nStay tuned. Andrei\n","date":"14 December 2025","externalUrl":null,"permalink":"/enterprise-kubernetes-at-home-a-guide-to-installing-talos-omni/","section":"Blog","summary":"In the world of enterprise cloud, managing Kubernetes clusters with services like GKE, EKS, or AKS is standard practice. These platforms offer incredible power but come with a learning curve and, more importantly, a cost that’s hard to justify for a homelab. 
As a platform engineer, I’m used to building and managing production-grade infrastructure, but as I explained in my first post, Why not a homelab?, I wanted a solution for my homelab that offered a similar centralized management experience without the overhead.\n","title":"Enterprise Kubernetes at Home - A Guide to Installing Talos Omni","type":"blog"},{"content":"","date":"17 November 2025","externalUrl":null,"permalink":"/tags/gcp/","section":"Tags","summary":"","title":"Gcp","type":"tags"},{"content":"","date":"17 November 2025","externalUrl":null,"permalink":"/series/homelab-hardware/","section":"Series","summary":"","title":"Homelab Hardware","type":"series"},{"content":"When you architect a Kubernetes cluster, you don\u0026rsquo;t think about heat dissipation or power consumption. You think in abstractions: N2 instances, vCPUs, memory tiers. Click, deploy, bill. The infrastructure vanishes behind APIs and Terraform declarations. But the moment you decide to build that same cluster in your homelab, those abstractions collapse into very real decisions: which CPU, how much RAM, what kind of storage, and critically, how much will this cost me in electricity every month?\nPart 1 of a two-part series. If you are looking for the hands-on hardware decisions (CPU, storage, networking, final bill of materials), jump to part 2: From Design Principles to Physical Build.\nRecently I reached a milestone in my career: I launched a digital bank and built the production infrastructure from scratch on Google Cloud Platform (GCP). We delivered four complete environments, automated deployments with GitOps and robust CI/CD pipelines, and put in place production-grade observability and autoscaling. 
The platform now supports 81,000 clients and manages €3.3 billion in assets\u0026hellip; a scale that taught me many practical lessons about resilience, placement, and testing.\nTwo LinkedIn posts document the launch and the team behind it: my personal write-up and the official announcement. They highlight the engineering patterns (scalability, security, automation) that inspired this homelab experiment.\nIn this first part I translate the way I size and structure production GKE clusters into a set of concrete homelab requirements. Part 2 will use those requirements to justify every hardware choice.\nEarlier context if you\u0026rsquo;re new here:\nWhy not a homelab? Transforming My Home Network Segmented Homelab Network Build The Enterprise Starting Point: How I Size GKE Clusters # When a team asks for a new platform, they describe workloads, not machines: \u0026ldquo;twelve microservices, 2 vCPUs and 4GB RAM each, autoscaling during peak.\u0026rdquo; My job: map abstract needs to resilient, compliant, cost-aware infrastructure.\nThe Cloud Decision Tree # First fork: GKE mode selection.\nGKE Autopilot – Google manages nodes; you pay per requested pod resources. Perfect for developer velocity and minimizing operational overhead. GKE Standard – Full node pool control when you need custom networking, specific CPU generations, GPUs, or tight cost tuning. 
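The workload-to-machine mapping quoted above ("twelve microservices, 2 vCPUs and 4GB RAM each") can be turned into a back-of-the-envelope capacity check. A minimal sketch in Python, assuming a 75% utilization target and the n2-standard-4 shape (4 vCPU, 16 GB); the function name is mine, not a GCP API:

```python
import math

def nodes_needed(pods, pod_vcpu, pod_mem_gb, node_vcpu, node_mem_gb, target_util=0.75):
    """Smallest worker count that fits the requested pod resources
    while keeping average node utilization at the target."""
    by_cpu = math.ceil(pods * pod_vcpu / (node_vcpu * target_util))
    by_mem = math.ceil(pods * pod_mem_gb / (node_mem_gb * target_util))
    return max(by_cpu, by_mem)  # the scarcer resource wins

# Twelve services at 2 vCPU / 4 GB each on n2-standard-4 nodes:
print(nodes_needed(12, 2, 4, 4, 16))  # CPU-bound: ceil(24 / 3) = 8 nodes
```

In practice the cluster autoscaler converges on a similar number on its own; the value of the exercise is seeing which resource (CPU here) actually drives the node count, and therefore the bill.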
Autopilot becomes the golden path; Standard the precision tool for platform engineers.\nMachine Family Selection: E2 vs N2 vs C3 # Machine Series Profile Production Use Case Typical Workload E2 Shared-core, cost-optimized Dev/test, low traffic Stateless apps, batch N2 Balanced CPU/memory General production APIs, databases C3 High-frequency compute CPU/latency sensitive Real-time, ingress My sizing approach is empirical: start with an N2 baseline, deploy with full observability (Prometheus + tracing), run load tests, then right-size and apply FinOps (utilization targets, discounts, preemptibles where safe).\nTypical fintech baseline:\nControl plane: 3 x n2-standard-2 Workers: autoscaled pool of n2-standard-4 at ~75% target utilization Ingress: dedicated c3-standard-4 for latency-sensitive edge traffic The Translation Challenge: Cloud Principles vs Physical Constraints # In GCP you scale with YAML and APIs; at home you scale with electrical outlets and thermals. The principles stay the same:\nHigh availability through redundancy. Segmentation between control and workload planes. Performance tuned to actual needs. Deep observability from the start. Security through segmentation and least privilege. The constraint shift: a single power circuit, a single uplink, and real energy cost.\nMapping Machine Families to Homelab Classes # GCP Series Homelab Analogue Idle Power Best For E2 Intel N100 / Celeron mini PCs 5–15W Control plane, light services N2 Ryzen 5/7 / Core i5/i7 mini PCs 15–35W Mixed workloads C3 Refurb enterprise SFF/tower 50–100W Compute-heavy tasks N100 = efficiency tier; modern Ryzen/Core mid-range = balanced production tier; older enterprise towers = brute-force compute tier.\nRequirements: Making the Abstract Concrete # Functional # Kubernetes: 3-node HA control plane + worker capacity. Proxmox: 3 physical nodes for quorum (survive single-node failure). Shared storage: Ceph for VM disks / persistent volumes. 
Network segmentation: Multiple VLANs behind OPNsense. Core platform services: DNS (Pi-hole), Vault, ArgoCD, Observability (Prometheus/Grafana). Non-Functional # Resilience: Single-node failure without service loss. Performance: Capable of real databases, web stacks, batch jobs. Power efficiency: Low idle envelope. Noise discipline: \u0026lt;30 dB acceptable in living space. Architectural Trade-Off: One Big Server vs Distributed Minis # Option Strengths Weaknesses Single Enterprise Server Density, hot-swap, enterprise features Loud, high idle watts, single failure domain Cluster of Mini PCs True HA, low power, silent, incremental expansion Lower per-node headroom, relies on network quality I chose distributed mini PCs: mirrors cloud-native redundancy patterns, avoids a loud thermal anchor, and enables failure isolation.\nWhy This Matters for What Comes Next # The remaining decisions (CPU architecture, memory sizing, NVMe endurance, dual-switch segmentation) all build on this distributed stance. Instead of one monolith with internal buses, I have to think in terms of east-west traffic, quorum timing, small node recovery behavior, and how storage replication collides with management latency. That is where Part 2 goes deep.\nBridging to Part 2 # So far: abstraction → requirements → architectural stance. The next layer is translating those requirements into silicon choices (IPC vs core count), storage endurance (TBW), memory headroom (ZFS vs Ceph), and physical network topology.\nContinue to Part 2: From Design Principles to Physical Build for CPU architecture, memory \u0026amp; storage strategy, networking layout, the final bill of materials, and the closing synthesis.\nPart 2 link (bookmark this)\nEnd of Part 1\n","date":"17 November 2025","externalUrl":null,"permalink":"/how-i-chose-my-homelab-hardware-part-1/","section":"Blog","summary":"When you architect a Kubernetes cluster, you don’t think about heat dissipation or power consumption. 
You think in abstractions: N2 instances, vCPUs, memory tiers. Click, deploy, bill. The infrastructure vanishes behind APIs and Terraform declarations. But the moment you decide to build that same cluster in your homelab, those abstractions collapse into very real decisions: which CPU, how much RAM, what kind of storage, and critically, how much will this cost me in electricity every month?\n","title":"How I Chose My Homelab Hardware (Part 1): From Cloud Sizing to Requirements","type":"blog"},{"content":"This is Part 2. If you need the cloud-to-homelab translation and requirement framing, read Part 1: From Cloud Sizing to Requirements first.\nIn Part 1 I converted production GKE sizing discipline into a concrete homelab requirement set. Now we descend from principles into metal: CPU selection, memory rationale, storage topology, network segregation, and final hardware assembly.\nCross-links:\nBack to Part 1: From Cloud Sizing to Requirements Related earlier posts: Why not a homelab?, Segmented Network Build The Core Decision: CPU Architecture and Performance # This is where enterprise knowledge directly translates to homelab decisions. In the cloud, I choose machine families based on workload profiles. In the homelab, I choose CPU architectures based on the same principles.\nThe x86 Requirement # First, the hard constraint: Proxmox officially supports only x86/x64 architecture. ARM is not supported (unlike some Kubernetes distributions that run on Raspberry Pi). This eliminates ARM-based options and focuses the decision on Intel vs. AMD.\nThe IPC vs. Core Count Trade-Off # Here\u0026rsquo;s where my research (and my production experience) converges on a critical insight: for virtualization and most business workloads, fewer cores at higher frequency (and higher IPC) outperform many cores at lower frequency.\nIPC (Instructions Per Clock) is the real performance metric, not just GHz. A CPU with high IPC executes more work per clock cycle. 
This matters enormously for:\nSingle-threaded workloads: Most business applications (web servers, databases, management software) are not massively parallel. High IPC means faster response times.\nCeph write latency: Proxmox\u0026rsquo;s distributed storage (Ceph) calculates checksums for every block written, verifying data integrity across nodes. This is CPU-intensive. A high-IPC CPU calculates checksums faster, keeping write latency low and preventing I/O bottlenecks.\nReal-world translation: An Intel Core i5-12400 (6 P-cores at 4.4 GHz boost, high IPC) will outperform an older Intel Xeon E5-2680 v2 (10 cores at 2.8 GHz, lower IPC) for virtualization workloads, while consuming less power and costing less.\nThis is why these consumer CPU classes (Ryzen 5/7, Core i5/i7) often outperform server CPUs in homelabs: they prioritize high IPC and frequency over core count, matching the workload profile perfectly.\nThe Hybrid Architecture Problem (Intel 12th Gen+) # Modern Intel processors (12th gen Core and newer) use a hybrid architecture: high-performance P-cores (Performance) mixed with high-efficiency E-cores (Efficiency), similar to ARM\u0026rsquo;s big.LITTLE design.\nHow This Affects Proxmox:\nWorkload Type Allocation Behavior Performance Impact Virtual Machines Proxmox typically allocates VMs to P-cores by default Generally good; E-cores sit mostly idle Containers Individual processes can be scheduled to any core Risk: Process lands on E-core, becomes inexplicably slow Why this matters: because I plan to run LXC containers extensively (for lightweight services like DNS, monitoring agents, etc.), a hybrid architecture introduces unpredictability. A container process might land on a slow E-core, causing performance issues that are difficult to diagnose.\nMemory Sizing: From Cloud Abstractions to Physical RAM # In GCP, I specify memory in GB and it\u0026rsquo;s provisioned instantly. 
In the homelab, every GB of RAM is a physical purchase decision that affects cost and performance.\nThe ZFS Tax (Single Node) # Proxmox favors ZFS as the root filesystem for its superior data integrity features (checksumming, snapshots, compression). But ZFS is a memory hog; it uses RAM aggressively for caching (ARC - Adaptive Replacement Cache).\nRule of Thumb: For optimal ZFS performance on a single Proxmox node, reserve 50% of total RAM for ZFS caching.\nThe Ceph Factor (3-Node Cluster) # In the cloud, storage is abstracted: Persistent Disks, block storage, object storage. You pay for capacity and IOPS, and the provider handles durability (11 nines of durability for GCS, for example). In the homelab, you own the failure risk.\nSSD vs. NVMe vs. HDD # The Hierarchy:\nNVMe SSD: Essential for Proxmox installation and Ceph OSDs (the distributed storage pool). High IOPS, low latency. Use this for VM root disks and databases.\nSATA SSD: Acceptable for Proxmox, decent for Ceph, but slower than NVMe. Good for budget builds.\nHDD (Mechanical): Only for secondary storage with ZFS (backups, archival, media storage). Never use for Proxmox root or Ceph OSDs. I/O latency will cripple performance.\nThis mirrors how we use storage classes in Kubernetes: SSD-backed persistent volumes (PVs) for databases (via StorageClass with gce-pd-ssd), HDD-backed PVs for logs and backups.\nThe TBW Metric (Total Bytes Written) # Here\u0026rsquo;s the homelab-specific concern cloud users never think about: SSD lifespan.\nTBW (Total Bytes Written) is the total amount of data you can write to an SSD over its lifetime. Once you hit the TBW limit, the drive must be retired (it will start failing).\nKey insights from my research:\nTBW scales with capacity: A 2TB NVMe drive typically has a much higher TBW than 500GB/1TB drives. 
For example, a mainstream 2TB NVMe might advertise 1200 TBW or more depending on NAND type and controller.\nCeph and Proxmox are write-heavy: With Ceph OSDs on NVMe, expect sustained small writes (metadata, journals) plus periodic scrubbing and recovery traffic.\nEstimate write rates and practical lifespan: If your cluster writes 50–100GB/day across all VMs and services (realistic for an active homelab), a 1200 TBW 2TB drive yields:\n1,200,000 GB ÷ 100 GB/day = ~33 years (theoretical). Real-world factors (write amplification, workload spikes) mean planning for 5–10 year replacement cycles is prudent. Write Amplification \u0026amp; Utilization: Due to SSD block management and ZFS/Ceph interactions, actual NAND writes can be 2–3× logical writes. Keep per-drive utilization under 80–85% and prefer NVMe models with good over-provisioning and endurance ratings.\nThe RAID 1 Diversity Strategy # Proxmox supports ZFS RAID 1 (mirroring) for the root disk. Standard practice: use two identical SSDs for redundancy.\nBut here\u0026rsquo;s the counterintuitive insight from my research: Don\u0026rsquo;t use two identical SSDs in RAID 1.\nWhy? If both drives are the same brand and model, they experience identical write patterns. They\u0026rsquo;ll both approach their TBW limit at the same time and fail simultaneously, defeating the purpose of RAID. Mixing brands, and potentially NAND types, yields slightly different wear patterns. If Drive 1 fails at year 5, Drive 2 might last 6-7 years, giving you time to rebuild.\nThis is enterprise thinking applied to homelab constraints. 
In production, we don\u0026rsquo;t care about individual disk longevity (the cloud provider handles it), but the principle of avoiding correlated failures is the same: we distribute pods across multiple availability zones, use different node pools for critical services, and avoid single vendor lock-in.\nNetworking: Separating Control and Data Planes # In any production-grade cluster, network segmentation is non-negotiable. You must isolate different types of traffic to ensure performance, security, and reliability. The most critical separation is between the control plane and the data plane.\nControl Plane Traffic: This includes cluster consensus messages (like Corosync in Proxmox), management UI access, and other low-bandwidth, latency-sensitive communication. This traffic must be protected from congestion at all costs. A failure here can lead to a \u0026ldquo;split-brain\u0026rdquo; scenario where nodes lose contact and the cluster disintegrates. Data Plane Traffic: This is the high-bandwidth traffic generated by your workloads. It includes VM/container network I/O, storage replication (like Ceph), and backups. This traffic is bursty and can easily saturate a network link. By using physically separate network interfaces and switches for each plane, you create a robust architecture that mirrors enterprise best practices. The control plane remains stable and responsive, unaffected by data-intensive operations on the data plane. I covered the blueprint for this in my network build post, and the hardware I chose directly enables this design.\nThe Single NIC Compromise # For those with single-NIC mini PCs, all is not lost. It is possible to run a cluster on a single network link, but it requires careful mitigation:\nMitigation: Managed Switch with QoS\nUsing a capable managed switch, you can create VLANs to logically separate traffic and apply Quality of Service (QoS) rules to prioritize Corosync packets above all else. 
This ensures that even if VMs saturate the link, the cluster\u0026rsquo;s heartbeat remains stable.\nThis approach forces you to learn valuable enterprise networking skills (VLANs, QoS, traffic shaping), but it introduces complexity and a single point of failure (the single NIC and cable). Given the availability of affordable dual-NIC mini PCs, a physically separate network is the recommended path for new builds.\nMy Final Hardware Selection # Component Quantity Rationale GMKtec M5 Plus (Ryzen 7 5825U, dual 2.5GbE) 3 HA Proxmox + Kubernetes + Ceph cluster nodes 32GB RAM per node 3 kits Memory headroom for ARC, Ceph, workloads NVMe 1TB (OS disks) 3 Isolate system + platform services NVMe 2TB (Ceph OSD disks) 3 Capacity + performance for replicated storage UGREEN 2.5GbE unmanaged switches 2 Physical separation of management vs data traffic ZimaBlade (PBS) 1 Dedicated backup server; keeps backup tasks off cluster OPNsense router/firewall 1 VLAN segmentation, Zero Trust patterns, ingress control Physical Topology: Implementing the Segregated Network # The diagram above illustrates the physical network topology, which is the cornerstone of the cluster\u0026rsquo;s resilience. It shows how the four main hardware components - the three Proxmox VE nodes (PVE 1, 2, 3) and the Proxmox Backup Server (PBS) - are interconnected.\nThe key principle is the strict separation of network traffic using two distinct, physical switches. Each Proxmox node utilizes its dual 2.5GbE network interfaces (eth0 and eth1) to connect to both switches simultaneously.\n1. The Control Plane\nConnections: The first network interface (eth0) of all three Proxmox nodes connects to a dedicated 2.5G unmanaged switch. Traffic: This network is exclusively for low-bandwidth, latency-critical management traffic: Corosync: This is the cluster\u0026rsquo;s heartbeat. It requires a stable, low-latency link to maintain quorum and node consensus. 
Isolating it here prevents data-plane congestion from causing a \u0026ldquo;split-brain\u0026rdquo; scenario. PVE UI/API: Management access to the Proxmox web interface. 2. The Data Plane\nConnections: The second network interface (eth1) of all three Proxmox nodes, along with the interface from the Proxmox Backup Server, connects to a second high-speed 2.5G unmanaged switch. Traffic: This network is built for high-throughput operations: Ceph Traffic: All storage-related communication for the distributed Ceph cluster, including OSD heartbeats, data replication, and recovery operations. VM/Container Traffic: The actual network I/O for the applications and services running inside virtual machines and containers. Backup and Restore: Data transfer between the Proxmox nodes and the dedicated PBS. The 2.5G speed is crucial for minimizing backup windows and enabling fast restores. This physical separation ensures that a massive data transfer, like a full backup or a Ceph rebalance, cannot saturate the network and disrupt the essential Corosync communication on the control plane. It\u0026rsquo;s a simple but powerful design that brings enterprise-level network reliability to the homelab.\nLessons Applied from Production # Production Principle Homelab Expression Control/Data plane separation Dual physical switches + VLAN layers Quorum \u0026amp; failure tolerance 3-node Proxmox + 3 OSDs Observability first Prometheus/Grafana deployment baseline GitOps discipline ArgoCD managing cluster state Secrets hardening Vault integration for dynamic secrets Backup isolation Dedicated PBS node Performance right-sizing IPC-focused CPU choice, endurance-aware storage Closing Synthesis # Enterprise discipline, applied thoughtfully, scales down beautifully. The exercise isn’t copying production\u0026hellip; it’s preserving the patterns (redundancy, segmentation, observability, controlled failure domains) while respecting homelab constraints (power, space, noise). 
This build becomes a living reference: a small, silent cluster where I can rehearse patterns before advocating them at scale.\nAndrei\nEnd of Part 2\n","date":"17 November 2025","externalUrl":null,"permalink":"/how-i-chose-my-homelab-hardware-part-2/","section":"Blog","summary":"This is Part 2. If you need the cloud-to-homelab translation and requirement framing, read Part 1: From Cloud Sizing to Requirements first.\n","title":"How I Chose My Homelab Hardware (Part 2): From Design Principles to Physical Build","type":"blog"},{"content":"I love diagrams, but diagrams don\u0026rsquo;t wire cables for me. In this post I will show the physical mapping, the Proxmox bridge pattern I used, the OPNsense management model, and the first firewall policy I used to protect the lab. The network was already in place; below I explain what I did to build and secure it.\nI saved complex topics, like Kubernetes and a full hardware-buy guide, for later posts. In this post I will focus on the steps I took to get the network working.\nHere is the map for the setup:\nFigure: High-level segmented homelab network diagram showing physical NIC split, Proxmox bridges, VLANs, and OPNsense as the router-on-a-stick.\nIf you missed my earlier posts, here is a short summary:\nWhy not a homelab? — why I decided to build a homelab and what I wanted to learn. From Enterprise to Homelab: Transforming My Home Network — the basic network plan I started with. Hardware and Roles # This wasn\u0026rsquo;t a shopping list. I used hardware I already had and assigned each piece a clear role. 
The approach was simple: set up the management/control plane first, create network boundaries with VLANs, then deploy applications.\nComponent Homelab Role Proxmox VMs / Notes Beelink EQ12 Mini PC (2 NICs) Proxmox host (virtualization \u0026amp; network hub) Hosts VMs: OPNsense (firewall/router), Pi‑hole (DNS), Linux management VM (bastion) Zyxel GS1200-8 Switch Layer-2 VLAN segmentation Physical switch providing VLAN trunk to Proxmox and access ports Zyxel AX3000 Access Point Multi-SSID wireless mapped to VLANs AP maps SSIDs to VLANs (HOME / PROD / GUEST) Existing ISP router Raw internet uplink (for now) Placed in DMZ for staged migration; double NAT temporary The Build: a brief narrative # Below I describe the decisions that shaped the build. The detailed, step-by-step work is in the \u0026ldquo;How I built the network\u0026rdquo; section\u0026hellip; read that for the exact order and checks I ran.\nI kept everything virtualized on the Beelink as a deliberate trade-off because it was the hardware I already had. Using a single host let me get a working foundation quickly and iterate safely\u0026hellip; it\u0026rsquo;s a starting point. Later I plan to move OPNsense to bare metal for higher throughput and place Pi‑hole on a low‑power board (Orange Pi / Raspberry Pi) to reduce attack surface and power use.\nThe first hard boundary: physical and management networks # Segmentation started with physical hardware and how I accessed it.\nPhysical separation: The Beelink mini PC has two physical network ports. One faces the untrusted internet; the other faces my trusted internal network.\nNIC 1 (enp1s0) — WAN: Connects to the ISP router. NIC 2 (enp2s0) — LAN: Connects to the Zyxel managed switch and acts as the trunk for isolated VLANs. Proxmox host access: The Proxmox management interface lives on my existing ISP network at 192.168.1.201. 
I assigned this as a static address outside the ISP router\u0026rsquo;s DHCP pool to avoid address conflicts and keep host management predictable and separate from the virtual networks it serves.\nInside Proxmox, the physical split was mirrored by two virtual bridges.\nvmbr0 — WAN \u0026amp; host management bridge: Connected to the WAN port. The Proxmox host gets its management IP here. This bridge also provides the uplink for the OPNsense firewall VM. vmbr1 — VLAN trunk: Connected to the LAN port, this bridge was the core of internal segmentation. By enabling the \u0026ldquo;VLAN aware\u0026rdquo; option, it became a virtual managed switch. I left this bridge unnumbered (no IP) to reduce the host\u0026rsquo;s attack surface and avoid accidental exposure. Proxmox moved VLAN-tagged frames; OPNsense handled routing and policy enforcement. In practice it acted as a traffic director: It receives VLAN-tagged packets from VMs. It forwards that traffic out the physical LAN port to the Zyxel switch, preserving the VLAN tag. It lets a VM inside Proxmox communicate with a physical device on the same VLAN port on the switch. How I built the network # I built the network in clear, small steps so I could test and fix problems as I went. The order made the work predictable and reduced disruption to the house network.\nPhysical wiring and role split\nI connected the Beelink NIC1 to the ISP router for WAN and NIC2 to the Zyxel switch for LAN. That created a physical boundary between internet and internal traffic. Install Proxmox and set management access\nI installed Proxmox on the Beelink and set a static management IP on the ISP side (192.168.1.201) so I could manage the host without touching the virtual networks. 
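Before the bridge and VLAN work in the next steps, the addressing plan itself can be sanity-checked in a few lines with Python's stdlib ipaddress module. A sketch using the subnets from this build (the HOME /24 is inferred from its 10.10.0.1 gateway; the GUEST subnet is omitted because it isn't pinned down here):

```python
import ipaddress

plan = {
    "ISP / Proxmox mgmt": ipaddress.ip_network("192.168.1.0/24"),
    "MGMT (device UIs)": ipaddress.ip_network("172.16.0.0/24"),
    "HOME (VLAN 10)": ipaddress.ip_network("10.10.0.0/24"),
    "PROD (VLAN 20)": ipaddress.ip_network("10.20.0.0/24"),
}

# Overlapping segments make firewall rules and routing ambiguous.
nets = list(plan.values())
for i, a in enumerate(nets):
    for b in nets[i + 1:]:
        assert not a.overlaps(b), f"overlap: {a} vs {b}"

# Static assignments must sit inside their intended segment.
assert ipaddress.ip_address("192.168.1.201") in plan["ISP / Proxmox mgmt"]  # Proxmox host
assert ipaddress.ip_address("172.16.0.1") in plan["MGMT (device UIs)"]      # OPNsense LAN
assert ipaddress.ip_address("172.16.0.3") in plan["MGMT (device UIs)"]      # Zyxel switch
assert ipaddress.ip_address("10.20.0.2") in plan["PROD (VLAN 20)"]          # Pi-hole
print("addressing plan OK")
```

It's a trivial check, but it catches the classic mistake of handing a static IP to a device on the wrong side of a VLAN boundary before any cable gets moved.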
Proxmox network configuration\nI created vmbr0 for WAN and assigned the Proxmox management IP.\nI created vmbr1 for the LAN trunk, enabled VLAN aware mode, and left it unnumbered (no IP) so the host had minimal exposure.\nPhysical NICs: enp1s0 is the WAN/management-facing NIC and enp2s0 is the LAN-facing NIC that carries the VLAN trunk.\nvmbr0 (MGMT + WAN): a Linux bridge bound to enp1s0 with the host management IP 192.168.1.201/24 and the OPNsense WAN attached.\nvmbr1 (LAN): a VLAN-aware bridge bound to enp2s0. It handles VLAN-tagged traffic and is intentionally left without an IP on the Proxmox host to reduce attack surface.\nVLAN sub-interfaces: vmbr1.10 (HOME), vmbr1.20 (PROD), vmbr1.30 (GUEST). Each maps to the VLAN IDs defined in OPNsense.\nHow it works in practice: VMs can use VLAN-tagged interfaces or rely on OPNsense to terminate VLANs (router-on-a-stick). The bridge simply forwards tagged frames between VMs and the physical switch.\nWhy this layout: it keeps host management separate from VM networks, preserves clear physical boundaries, and centralizes routing/policy in OPNsense while using Proxmox as a simple L2 switch for VLANs.\nDeploy OPNsense VM\nI created an OPNsense VM with two NICs: WAN on vmbr0 (access) and a LAN parent on vmbr1 (trunk). Inside OPNsense I created VLAN interfaces (10, 20, 30), assigned gateway IPs and DHCP ranges for each VLAN. I moved the OPNsense LAN off the ISP-assigned network and placed host and device management onto a dedicated management subnet 172.16.0.0/24. This keeps UIs and admin services reachable from a single, controlled network while preventing accidental exposure to general client VLANs. I started with a deny-by-default firewall model. Each VLAN has its own interface and gateway, and I only created explicit allow rules for necessary traffic. Deploy Pi-hole\nI deployed Pi-hole on VLAN 20 (PROD) with a static IP (10.20.0.2). 
DNS enforcement: I used DHCP to push Pi-hole as the only DNS server for clients, and then added firewall rules to block outbound DNS (port 53) to other resolvers so clients couldn\u0026rsquo;t bypass filtering. Deploy management VM\nI created a Linux management VM and gave it two virtual NICs. One NIC is bound to vmbr0 so the VM can access the internet for updates and outbound management tasks. The other NIC is bound to vmbr1 (the VLAN-aware trunk) so the VM can reach the OPNsense LAN at 172.16.0.1 and the Zyxel switch at its management IP at 172.16.0.3. This split lets the management VM perform internet-facing tasks when needed while keeping device UIs and switch management on the isolated management subnet. Configure the Zyxel switch and access point\nI changed the Zyxel management IP to 172.16.0.3 so the switch sits on the same management subnet as OPNsense and the management VM. In the Zyxel GUI I created VLANs 10, 20, 30. I set the Proxmox port as a trunk (tagged for those VLANs), set the AP port as a trunk, and configured access ports with the correct PVIDs for Home and Prod devices. The Guest network is only WiFi. On the AP I mapped SSIDs to VLANs so wireless clients were placed on the right network. Figure: Zyxel GS1200-8 configuration.\nFirewall and DNS policy\nI set deny-by-default rules on all VLANs in OPNsense, then added explicit allow rules where needed. I forced clients to use Pi-hole for DNS via DHCP and blocked other DNS servers at the firewall so clients couldn\u0026rsquo;t bypass filtering. Verify and iterate\nI tested client-to-gateway reachability, tested DNS against Pi-hole, and reviewed OPNsense logs for DHCP and DNS traffic. I fixed small issues (wrong PVID, missing VLAN tag on a VM NIC) and reran the checks. Quick verification checklist I used after each change: From the management VM, confirm OPNsense at 172.16.0.1 is reachable and the Zyxel switch responds at its management IP (example 172.16.0.3). 
From a HOME VLAN client, ping the HOME gateway 10.10.0.1 and verify DNS resolution is served by Pi-hole. From a PROD VM, confirm access to Pi-hole at 10.20.0.2 and verify inter-VLAN traffic is blocked by default. Check OPNsense logs for DHCP leases and DNS queries to validate policy enforcement and correct client placement. This step-by-step approach let me find problems early and keep the family network working while I built the lab.\nA note on the edge: embracing pragmatism with double NAT # My ISP supplies a single device that acts as both a fiber modem and a router and doesn\u0026rsquo;t offer a true bridge mode. That places OPNsense behind the ISP device and creates a double NAT.\nI accept this as a short-term, pragmatic compromise while I build and validate the firewall and VLANs. I don\u0026rsquo;t need any public-facing services now. If I need to expose something later I\u0026rsquo;ll use Cloudflare Tunnel to avoid inbound port configuration on the ISP device. When I\u0026rsquo;m ready for production I will request a standalone ONT from the ISP, remove their router from the path, and let OPNsense be the single edge gateway.\nDescription and implications — short version:\nImpacts:\nInbound complexity: port‑forwards must be configured on both the ISP device and OPNsense (or a DMZ/1:1 NAT used), which doubles configuration and troubleshooting points. NAT traversal: protocols like SIP or some peer-to-peer/remote‑access tools may fail or require TURN/STUN/VPN workarounds. Operational quirks: asymmetric routing, increased connection‑tracking load, and MTU/fragmentation issues can surface during testing. Mitigations I used:\nCloudflare Tunnel for any temporary public exposure: no inbound port opens, automatic TLS and auth, easy to test services. Use ISP DMZ/1:1 NAT for service passthrough during validation when needed. Trade-offs:\nTunnels are convenient but add an external dependency and a small latency penalty. 
For high-throughput or latency-sensitive services prefer a public IP or routed IPv6. Blueprint made real # The diagrams became wiring and configs. In short: I used Proxmox bridges to separate host/WAN and a VLAN‑aware LAN trunk, ran OPNsense as the central router/firewall with a deny‑by‑default posture, and placed core services (Pi‑hole, a management VM) on isolated subnets. The Zyxel switch and AP enforced VLAN boundaries at L2 and kept wireless SSIDs mapped to the correct networks.\nAndrei\n","date":"2 November 2025","externalUrl":null,"permalink":"/from-blueprint-to-bare-metal-building-a-segmented-homelab-network/","section":"Blog","summary":"I love diagrams, but diagrams don’t wire cables for me. In this post I will show the physical mapping, the Proxmox bridge pattern I used, the OPNsense management model, and the first firewall policy I used to protect the lab. The network was already in place; below I explain what I did to build and secure it.\n","title":"From Blueprint to Bare Metal: Building a Segmented Homelab Network","type":"blog"},{"content":"Hey there! In my last post, I shared why I’m starting this homelab journey. Today I’m taking it a step further: I’m rebuilding my home network from a simple, flat LAN into a segmented, security‑first setup \u0026hellip; very similar to how Google Cloud designs hub‑and‑spoke networks. If you’re new here, you might want to start with my introduction: Why not a homelab?\nWhy hub‑and‑spoke here (a note from the field) # Recently, I designed and deployed a hub-and-spoke network on GCP for a production platform. We manually configured multiple VPCs to connect to a central hub VPC for shared services and egress traffic, using VPC peering and custom route tables. That experience heavily influenced this homelab plan. 
I’m borrowing the same separation of concerns and security boundaries \u0026hellip; just scaled down to switches, VLANs, OPNsense, and Pi-hole.\nThis is my blueprint: clear boundaries, default‑deny, and observable traffic flows, but simplified for a home environment.\nWhy change anything at home? # My old setup worked, but it was a classic flat network: one big broadcast domain with minimal control where my work laptop, smart TV, and Unraid NAS all sat in the same logical space. It’s simple, but it offers zero internal segmentation.\nI wanted to bring the skills I use at work into my home. This redesign is my chance to get hands-on with enterprise practices like network segmentation, Zero Trust policies, and building a resilient platform for virtualization (Proxmox) and container orchestration (Kubernetes). The goal is cleaner security, better performance, and a solid foundation for all the projects I plan to run.\nThe “Before”: A Classic Flat Home Network # My current setup is probably familiar: a single flat network where every device (my laptop, the TV, my Unraid NAS, IoT gadgets) lives on the same LAN provided by the ISP router. It works, but it’s a free-for-all. Minimal segmentation, basic firewalling, and zero real control.\nKey characteristics:\nSingle Point of Failure \u0026amp; Trust: The ISP router handles everything: gateway, firewall, DHCP, and DNS. No Internal Boundaries: Every device can see every other device. A compromised IoT device could potentially access my NAS. Limited Visibility: I have almost no insight into what traffic is flowing between devices. The “After”: A Multi-Layer, Enterprise-Inspired Network # The new design introduces layers of control, intentionally mimicking the hub-and-spoke model I use in the cloud. At the heart is an OPNsense firewall, a managed switch for VLANs, and a dedicated DNS filtering layer with Pi-hole. 
This creates a strong foundation for my Proxmox and Kubernetes clusters.\nHighlights:\nEdge Security: OPNsense becomes the single point of entry and exit, handling stateful firewalling, VPN, and even IDS/IPS. Internal Segmentation: The managed switch creates VLANs, and OPNsense enforces rules for all traffic between them. Centralized DNS: Pi-hole provides network-wide ad-blocking and allows me to create internal-only DNS records. Dedicated Compute \u0026amp; Wireless: Proxmox provides a resilient compute layer, while a single VLAN-aware access point will broadcast different SSIDs for each VLAN, extending the segmentation to wireless devices. Side‑by‑side: Before vs After # Area Before (Flat) After (Enterprise‑style) Segmentation Single LAN VLANs: Home, Prod, Guest Security Basic ISP firewall Defense‑in‑depth, Zero Trust, IDS/IPS DNS ISP default Pi‑hole filtering + local overrides Routing NAT on ISP router Centralized on OPNsense with per‑VLAN rules Compute Bare devices Proxmox cluster + Kubernetes WiFi Single SSID Multiple SSIDs mapped to VLANs Observability Minimal Dashboards (OPNsense, Prometheus/Grafana, K8s) Resilience Many SPOFs HA patterns (Proxmox quorum, K8s replicas) How this maps to GCP hub‑and‑spoke # This design mirrors how cloud networks separate concerns and enforce policy.\nMy Home Network GCP Equivalent Function OPNsense Firewall Cloud Armor + Cloud NAT Edge security and egress NAT OPNsense as routing hub Network Connectivity Center Hub Central routing/control plane Pi‑hole DNS Cloud DNS + Private DNS Zones Split‑horizon, policy‑aware DNS Zyxel Managed Switch VPC Network Core fabric enabling segmentation VLANs (Home/Prod/Guest) VPC Subnets Isolation boundaries Inter‑VLAN policy VPC Peering / Shared VPC + Firewall Rules Controlled east‑west traffic Proxmox Cluster Compute Engine / GKE Autopilot HA compute substrate Kubernetes Cluster Google Kubernetes Engine (GKE) Container orchestration Unraid NAS Filestore / Cloud Storage Persistent storage APs + 
SSIDs Interconnect/VPN endpoints Access edges into the fabric ISP Router WAN Cloud Router + Cloud NAT Internet ingress/egress Note: It’s an analogy, not a 1:1 feature match, but the architectural patterns align.\nWhat I have today (gear \u0026amp; constraints) # This is a network plan. I don’t have the full setup yet, and that’s fine \u0026hellip; I’ll start small:\nBeelink Mini PC with 2 NICs: will run Proxmox; OPNsense will be virtualized with one NIC for WAN and one for LAN/trunk Zyxel 8-port managed switch: VLANs, trunk to Proxmox/OPNsense, access ports for test devices Zyxel AX3000 Access Point: A single AP connected to the managed switch. It will be configured with multiple SSIDs, each tagging traffic onto the corresponding VLAN (e.g., \u0026ldquo;Home-WiFi\u0026rdquo; on VLAN 10, \u0026ldquo;Guest-WiFi\u0026rdquo; on VLAN 30). Orange Pi: utility node to validate VLAN reachability and services Pi-hole: will run as a VM/LXC; DNS for Home/Prod VLANs with filtering This minimal kit is perfect for a “router on a stick” setup. Here’s the high-level technical plan: the practical details for OPNsense, the switch, and the AP will get their own dedicated posts as I build this out.\nProxmox Networking: The Beelink’s first NIC (eth0) will be passed directly to the OPNsense VM for the WAN connection. The second NIC (eth1) will be configured as a Linux bridge in Proxmox (vmbr0) and act as a VLAN trunk port. OPNsense VM: The VM will have two virtual NICs. The first (vtnet0) connects to the WAN. The second (vtnet1) connects to vmbr0 and will be configured as the LAN interface, tagged with all the VLANs (Home, Prod, Guest). OPNsense will handle all inter-VLAN routing. Switch and AP Configuration: One port on the Zyxel switch will be a trunk port connected to the Proxmox host’s eth1. Another port will connect to the Zyxel Access Point, also configured as a trunk to carry all VLANs. The remaining ports will be access ports for wired devices. 
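To make the trunk plan concrete, here is a minimal sketch of the Proxmox side in /etc/network/interfaces form. It assumes ifupdown2-style configuration and the NIC names from the plan above (eth0 passed through as WAN, eth1 as the trunk); the real interface names on the Beelink will likely differ:

```
# /etc/network/interfaces fragment (sketch; interface names assumed)
# eth0 is intentionally left unconfigured here: it is handed to the
# OPNsense VM as its WAN interface.

auto vmbr0
iface vmbr0 inet manual
    bridge-ports eth1
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 10 20 30
# VLANs 10 (Home), 20 (Prod), and 30 (Guest) ride this trunk to the
# OPNsense VM (vtnet1), the Zyxel switch, and onward to the AP.
```

With the bridge marked VLAN-aware, the OPNsense VM can terminate all three VLAN sub-interfaces on its single LAN NIC, which is exactly what makes the router-on-a-stick pattern work.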
A Note on Wireless DHCP: It\u0026rsquo;s critical to have only one DHCP server per VLAN. OPNsense will handle all DHCP for both wired and wireless clients. The access point must be configured to bridge the wireless SSIDs to their respective VLANs and must not run its own DHCP server. This ensures that a device connecting to the \u0026ldquo;Home-WiFi\u0026rdquo; SSID gets an IP address from the same 10.10.0.0/16 pool as a wired device on VLAN 10. This setup lets me validate the entire architecture: VLAN segmentation, DHCP/DNS per VLAN, and firewall rules (before I invest in more hardware).\nIP/VLAN Plan at a Glance # VLAN ID Name Subnet Gateway Purpose 10 Home 10.10.0.0/16 10.10.0.1 Trusted devices: laptops, phones, NAS 20 Prod 10.20.0.0/16 10.20.0.1 K8s nodes, servers, infrastructure 30 Guest 10.30.0.0/16 10.30.0.1 Untrusted devices, visitors, IoT - Edge 192.168.1.0/24 ISP Router ISP-facing network (untrusted) Security Patterns Baked In # Zero Trust: Default-deny between VLANs. Only explicitly allowed traffic can pass. Defense in Depth: Layer 1: ISP router provides a basic first line of defense. Layer 2: OPNsense acts as a stateful firewall with IDS/IPS capabilities. Layer 3: VLANs provide network segmentation. Layer 4: Pi-hole filters DNS requests, blocking malicious domains. Layer 5: Application-level controls within Kubernetes (e.g., NetworkPolicies). Least Privilege: Services are only granted the network access they absolutely need (e.g., only the Prod VLAN can access the NAS backup ports). Encrypted Remote Access: For secure remote access, I\u0026rsquo;m weighing my options. I could use a traditional VPN server like WireGuard or OpenVPN on OPNsense for broad network access. However, I\u0026rsquo;m also considering extending my use of Twingate, which I currently use for my NAS. 
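For scale, the traditional-VPN route is only a handful of lines of configuration. A hedged sketch of what the WireGuard server side could look like on the firewall (the 10.99.0.0/24 tunnel subnet, port, and key placeholders are mine, purely for illustration):

```
[Interface]
# WireGuard instance on the firewall (illustrative values only)
Address = 10.99.0.1/24
ListenPort = 51820
PrivateKey = <server-private-key>

[Peer]
# One roaming client; only its tunnel IP is accepted from this peer
PublicKey = <client-public-key>
AllowedIPs = 10.99.0.2/32
```

Even with the tunnel up, the default-deny firewall rules on OPNsense still decide which VLANs a connected client may reach, so the least-privilege posture is preserved.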
Twingate is a Zero Trust Network Access (ZTNA) solution that provides secure, direct access to specific applications without exposing the entire network \u0026hellip; a model that aligns perfectly with the security principles of this new design. A dedicated post on this topic will surely follow. Final Thoughts: It\u0026rsquo;s a Journey # This isn’t just about a fancy home network. It’s a project. It’s about bringing the discipline I use in production (segmentation, least privilege, and observability) into an environment I can touch and experiment with freely. The hub-and-spoke analogy isn’t just for show; it’s a mental model that helps me reason about boundaries, traffic flow, and ownership.\nYes, this design adds complexity over a simple flat network, but that complexity is where the learning happens. It turns my homelab into a true-to-life platform for mastering Proxmox, Kubernetes, and security practices that directly mirror what I build for a living.\nI hope this post clearly outlines my vision for this network transformation. Subsequent articles in this series will document the implementation details, including OPNsense rule configurations, Pi-hole DNS management, and VLAN-to-SSID mapping. I\u0026rsquo;ll also be sharing the challenges encountered and the solutions I discovered along the way.\nStay tuned. Andrei\n","date":"19 October 2025","externalUrl":null,"permalink":"/from-enterprise-to-homelab-transforming-my-home-network/","section":"Blog","summary":"Hey there! In my last post, I shared why I’m starting this homelab journey. Today I’m taking it a step further: I’m rebuilding my home network from a simple, flat LAN into a segmented, security‑first setup … very similar to how Google Cloud designs hub‑and‑spoke networks. If you’re new here, you might want to start with my introduction: Why not a homelab?\n","title":"From Enterprise to Homelab: Transforming My Home Network","type":"blog"},{"content":"Hey there! 
If you\u0026rsquo;re reading this, you\u0026rsquo;re about to embark on an adventure with me that I never thought I\u0026rsquo;d start.\nI\u0026rsquo;m Andrei, and since this is my first blog post, let me tell you a bit about myself. I\u0026rsquo;m a Romanian expat in Italy; I started my career in civil engineering 20 years ago, working in that field for more than a decade before switching to IT. Today, I\u0026rsquo;m a Platform Engineer with extensive experience in cloud infrastructure and DevOps. I\u0026rsquo;m also an ex-professional rowing athlete with enough national titles to impress my daughters when I tell them stories. I\u0026rsquo;ve built production environments from scratch for financial institutions, architected Kubernetes clusters for different industries, and implemented GitOps workflows.\nBut here\u0026rsquo;s the thing: I\u0026rsquo;ve never built a homelab before and I\u0026rsquo;ve certainly never written a blog post.\nToday, I\u0026rsquo;m kicking off something terribly exciting for me. I\u0026rsquo;m starting a homelab from scratch, and I\u0026rsquo;m going to document every step, every failure, and every \u0026ldquo;am înțeles!\u0026rdquo; moment (Romanian for \u0026ldquo;I got it!\u0026rdquo;, the equivalent of \u0026ldquo;aha!\u0026rdquo; in English). Why? Because I want to share my real-world experience in a way that anyone can understand and follow along with. And honestly, I need a new exciting challenge to keep my curiosity alive.\nThis isn\u0026rsquo;t just about tinkering with servers. It\u0026rsquo;s about exploring technology in a hands-on way, learning from mistakes, and maybe inspiring others to try something similar.\nWhy a homelab? Why now? # I love technology. I\u0026rsquo;ve always been fascinated by how things work under the hood. I already have a decent setup at home: some servers running various services, a NAS for storage, an Orange Pi for lightweight experiments, and Jellyfin for media streaming. 
I\u0026rsquo;ve worked extensively with containers and cloud infrastructure in production environments.\nBut I\u0026rsquo;ve never had the courage to take it to the next level and build a proper homelab. What if it took too much time away from my family? What if I invested time and money only to give up halfway? Those fears held me back.\nBut today, in the AI era, what pushes me to start writing down my experiences is that I want lasting proof that what I\u0026rsquo;m doing is built by human hands, not generated by an AI agent. My unique perspective and real-world lessons deserve to be documented authentically.\nNow, I\u0026rsquo;m challenging myself to push past those fears. A homelab represents the perfect opportunity to learn deeply, experiment freely, and document the process. It\u0026rsquo;s a safe space to fail, learn, and grow. And by sharing this process, I hope to show others that it\u0026rsquo;s okay to start small and build from there.\nWhat will this blog cover? # Honestly, I\u0026rsquo;m not entirely sure yet. This is as much an exploration for me as it is for you. But I know for sure it will include:\nMy Experience: Drawing from real-world scenarios and problems I face in my job Kubernetes Cluster: Definitely want to build and manage a proper K8s setup Real-World Scenarios: Simulating production-like challenges and solutions Not all posts will be deeply technical. Sometimes it might just be a quick \u0026ldquo;bit\u0026rdquo; like \u0026ldquo;Today I solved a big problem in this way\u0026rdquo; or \u0026ldquo;Here\u0026rsquo;s what I learned from this failure.\u0026rdquo; Let\u0026rsquo;s see what the future brings \u0026hellip; I\u0026rsquo;m keeping it flexible and following where the path takes me.\nThe challenge I\u0026rsquo;m embracing # This is new territory for me. I\u0026rsquo;ve never written publicly before, never shared my thought processes, never admitted when I don\u0026rsquo;t know something. 
But that\u0026rsquo;s exactly why I\u0026rsquo;m doing this. Growth happens outside your comfort zone, right?\nI\u0026rsquo;m not promising perfection. In fact, I guarantee there will be posts about spectacular failures and \u0026ldquo;lessons learned the hard way.\u0026rdquo; But that\u0026rsquo;s the beauty of a homelab \u0026hellip; it\u0026rsquo;s a safe space to break things and learn.\nWho\u0026rsquo;s this for? # For anyone interested in technology, from beginners to experts, though some posts will dive deep into technical details. Whether you\u0026rsquo;re:\nA complete beginner curious about what all this cloud stuff is about A developer wanting to understand the infrastructure side A sysadmin looking to level up your skills Someone with some experience who wants to experiment without risk Just someone curious about technology If you\u0026rsquo;re here, you\u0026rsquo;re welcome. I\u0026rsquo;ll explain concepts as clearly as possible, share my reasoning for decisions, and try to make complex topics accessible. But be prepared for some in-depth technical explorations!\nWhat\u0026rsquo;s next? # I\u0026rsquo;m not sure exactly what the next post will be yet; I\u0026rsquo;m still figuring this out as I go. But it will likely involve starting with my current setup and building from there. Maybe assessing my home network, or choosing some hardware, or just documenting my first attempts at something new.\nWhatever it is, I\u0026rsquo;ll share the process openly, including the uncertainties and learning moments.\nIf this post has sparked your curiosity, it means my first post did its job. Thanks for reading, and I hope you\u0026rsquo;ll join me for the next posts.\nAndrei\n","date":"11 October 2025","externalUrl":null,"permalink":"/why-not-a-homelab/","section":"Blog","summary":"Hey there! 
If you’re reading this, you’re about to embark on an adventure with me that I never thought I’d start.\n","title":"Why not a homelab?","type":"blog"},{"content":"Hi, I\u0026rsquo;m Andrei Vasiliu, currently the Platform \u0026amp; ISMS Director at Alpian Technologies in Rome, Italy, and a self-admitted homelab addict. I\u0026rsquo;m originally from Romania, and yes, I somehow made the jump from Civil Engineering (keeping actual physical buildings from falling down) to DevOps and Platform Engineering (keeping virtual pods from mysteriously crash-looping).\nI really enjoy building enterprise-grade infrastructure, locking down systems so tightly that even I get locked out sometimes, and driving digital transformation. Over the years, I\u0026rsquo;ve architected multi-cloud solutions for banks and led platform engineering teams\u0026hellip; mostly fueled by Friday afternoon deployments and an endless stream of YAML files.\nWhat is this blog about? # This space is dedicated to my homelab journey and technical rants. I treat my home infrastructure exactly like a production environment\u0026hellip; which means my spouse occasionally has to sit through an RCA (Root Cause Analysis) when the smart home Wi-Fi drops. Here, I take the massively over-complicated enterprise patterns I use at work and forcibly cram them into a small server rack in my house, just for fun.\nProfessional Expertise \u0026amp; Stuff I Break Often # Kubernetes \u0026amp; Cloud Native: Wrangling nodes (GKE, AKS, K3d, bare-metal), untangling Istio service meshes, and trying to figure out why DNS is failing again. GitOps \u0026amp; CI/CD: Forcing ArgoCD, GitHub Actions, and Terraform to deploy things automatically so I can find new and exciting ways to break production. Security \u0026amp; Governance: Zero Trust Networking, ISO 27001 ISMS paperwork, and setting off enterprise monitoring alerts at 3 AM. 
Platform Engineering: Trying to make developers happy (a true test of human endurance), hoarding secrets, and keeping systems somewhat reliable. Homelab Hardware: Turning a perfectly good closet into a high-decibel jet-engine testing facility just to run 5 containers that would have worked perfectly fine on a Raspberry Pi. Let\u0026rsquo;s Connect # If you want to see my actual professional facade or just check out my code, drop by the links below:\nGitHub (Where my side projects go to collect stars) LinkedIn (Where I wear a digital suit and tie) ","externalUrl":null,"permalink":"/about/","section":"Welcome","summary":"Hi, I’m Andrei Vasiliu, currently the Platform \u0026 ISMS Director at Alpian Technologies in Rome, Italy, and a self-admitted homelab addict. I’m originally from Romania, and yes, I somehow made the jump from Civil Engineering (keeping actual physical buildings from falling down) to DevOps and Platform Engineering (keeping virtual pods from mysteriously crash-looping).\n","title":"About","type":"page"},{"content":"","externalUrl":null,"permalink":"/archive/","section":"Welcome","summary":"","title":"Archive","type":"page"},{"content":"If you followed a link from an external site, the content may have moved. Use the links below to return to the homepage or browse the latest posts.\n","externalUrl":null,"permalink":"/404/","section":"Welcome","summary":"If you followed a link from an external site, the content may have moved. Use the links below to return to the homepage or browse the latest posts.\n","title":"Page Not Found","type":"page"}]