Mini Cloud Platform: Bare-Metal Infrastructure

Private datacenter-equivalent infrastructure, built from scratch with MAAS, k3s, ArgoCD, and GitOps.


What This Project Is

This documentation covers a complete bare-metal infrastructure built locally using MAAS (Metal as a Service), provisioning a 3-node cluster ready for Kubernetes and production workloads.

Equivalent to: AWS EC2 + VPC + Auto Provisioning, but local.

Infrastructure at a Glance

| Node | IP | Role | Hardware |
|---|---|---|---|
| set-hog | 10.0.0.2 | Control Plane | ThinkPad T15 Gen 1 |
| fast-skunk | 10.0.0.4 | Worker | ThinkPad T490 |
| fast-heron | 10.0.0.7 | Worker | ThinkPad T490 |

MAAS Controller: Ubuntu + dual NIC (WiFi → internet, Ethernet → 10.0.0.1)
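The addressing above can be sanity-checked with a few lines of Python. This is a minimal sketch, not part of the project: the node names and IPs are copied from the table, the 10.0.0.0/24 prefix is taken from the summary at the end of this page, and the `check_inventory` helper is a hypothetical name introduced here for illustration.

```python
import ipaddress

# Addresses copied from the table above. The /24 prefix comes from the
# cluster network (10.0.0.0/24) named in the summary at the end of the page.
CLUSTER_NET = ipaddress.ip_network("10.0.0.0/24")
MAAS_CONTROLLER = ipaddress.ip_address("10.0.0.1")

NODES = {
    "set-hog": "10.0.0.2",
    "fast-skunk": "10.0.0.4",
    "fast-heron": "10.0.0.7",
}


def check_inventory(nodes, network, controller):
    """Return a list of problems; an empty list means the inventory is consistent."""
    problems = []
    seen = {}  # address -> first node name that claimed it
    for name, raw in nodes.items():
        addr = ipaddress.ip_address(raw)
        if addr not in network:
            problems.append(f"{name}: {addr} is outside {network}")
        if addr == controller:
            problems.append(f"{name}: {addr} collides with the MAAS controller")
        if addr in seen:
            problems.append(f"{name}: {addr} is already used by {seen[addr]}")
        seen[addr] = name
    return problems


if __name__ == "__main__":
    issues = check_inventory(NODES, CLUSTER_NET, MAAS_CONTROLLER)
    print("inventory OK" if not issues else "\n".join(issues))
```

A check like this is cheap to rerun whenever a node is added, and it catches the kind of overlap that is otherwise only noticed when a machine fails to get its lease.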


Complete Roadmap

Each phase builds directly on the previous one: nothing requires something that hasn't been set up yet.

| Phase | Topic | Key Technology | Status |
|---|---|---|---|
| 0 | MAAS + 3-node provisioning | MAAS, PXE, cloud-init | ✅ Done |
| 1 | Kubernetes cluster | k3s | 🔜 Next |
| 2 | kubectl local access | kubeconfig | 🔜 |
| 3 | Remote access from anywhere | Tailscale, Cloudflare Tunnel, Homer | 🔜 |
| 4 | Load balancer IPs on bare-metal | MetalLB | 🔜 |
| 5 | Persistent storage | Longhorn, NFS | 🔜 |
| 6 | Expose apps to the network | NGINX Ingress | 🔜 |
| 7 | Private container registry | Harbor + Trivy | 🔜 |
| 8 | Cluster monitoring | Prometheus, Grafana | 🔜 |
| 9 | First real workload | kubectl, containerd | 🔜 |
| 10 | Infrastructure automation | Ansible | 🔜 |
| 11 | Infrastructure as Code | Terraform / OpenTofu, Crossplane | 🔜 |
| 12 | GitOps deployment | ArgoCD | 🔜 |
| 13 | CI/CD pipelines | GitLab / Gitea | 🔜 |
| 14 | Backup & disaster recovery | Velero, etcd snapshots | 🔜 |
| 15 | Security hardening | Vault, cert-manager, RBAC | 🔜 |
| 16 | Automation & workflows | n8n, Temporal, Apache Airflow | 🔜 |
| 17 | Event-driven & autoscaling | KEDA, NATS | 🔜 |
| 18 | Developer platform | Backstage | 🔜 |
| 19 | AI / ML platform | Ollama, MLflow, Kubeflow | 🔜 |
| 20 | Reliability & chaos testing | Chaos Mesh | 🔜 |
| 21 | Advanced observability | Loki, Jaeger, Alertmanager | 🔜 |
| 22 | eBPF networking (advanced) | Cilium, Hubble | 🔜 |
| - | Data Layer | Kafka/Redpanda, ClickHouse, dbt, Superset, OpenMetadata | 🔜 |
| - | Security Layer | Keycloak, OPA/Gatekeeper, Falco, Cosign+SBOM, kube-bench | 🔜 |
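The ordering rule for this roadmap (no phase depends on a later one) can be checked mechanically. The sketch below is only an illustration: the `PREREQS` map covers a handful of phases, and its edges are assumptions made here for the example, not a dependency list published by the project. `out_of_order` is likewise a hypothetical helper name.

```python
# Hypothetical prerequisite map: phase number -> phases it needs first.
# The edges are illustrative assumptions, not taken from the project.
PREREQS = {
    0: [],    # MAAS + 3-node provisioning
    1: [0],   # k3s needs provisioned nodes
    2: [1],   # kubectl access needs a running cluster
    4: [1],   # MetalLB runs inside the cluster
    6: [4],   # ingress assumed to get its external IP from MetalLB
    12: [1],  # ArgoCD is deployed onto the cluster
}


def out_of_order(prereqs):
    """Return (phase, prerequisite) pairs where the prerequisite is not an earlier phase."""
    return [
        (phase, dep)
        for phase, deps in sorted(prereqs.items())
        for dep in deps
        if dep >= phase
    ]


if __name__ == "__main__":
    violations = out_of_order(PREREQS)
    print("ordering OK" if not violations else violations)
```

If a later phase were ever listed as a prerequisite of an earlier one, the pair would show up in the returned list, which is a signal to renumber the roadmap.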

Final Stack (When Complete)

── INFRASTRUCTURE ──────────────────────────────────────────────────
MAAS           → bare-metal provisioning
k3s            → Kubernetes cluster
MetalLB        → load balancer IPs
Longhorn       → distributed storage
Harbor         → private container registry

── AUTOMATION & DELIVERY ───────────────────────────────────────────
Ansible        → infrastructure automation
Terraform      → infrastructure as code
Crossplane     → Kubernetes-native IaC
ArgoCD         → GitOps
GitLab         → CI/CD

── PLATFORM SERVICES ───────────────────────────────────────────────
Velero         → backup & disaster recovery
Vault          → secrets management
n8n            → visual workflow automation
Temporal       → code-based workflow orchestration
Airflow        → data pipeline scheduling
KEDA           → event-driven autoscaling
NATS           → message broker
Backstage      → developer portal

── OBSERVABILITY ───────────────────────────────────────────────────
Prometheus     → metrics
Grafana        → dashboards
Loki           → logs
Jaeger         → traces
Chaos Mesh     → reliability testing
Cilium         → eBPF networking + Hubble
Tailscale      → remote access VPN

── DATA LAYER ──────────────────────────────────────────────────────
Redpanda       → event streaming (Kafka-compatible)
Debezium       → change data capture (CDC)
ClickHouse     → columnar analytics warehouse
dbt            → SQL transformation layer
Superset       → self-hosted BI dashboards
OpenMetadata   → data catalog, lineage, governance

── SECURITY LAYER ──────────────────────────────────────────────────
Keycloak       → SSO / OIDC identity provider
OPA/Gatekeeper → admission control (policy as code)
Falco          → runtime threat detection (eBPF)
Cosign         → image signing + SBOM supply chain
kube-bench     → CIS compliance scoring

── AI / ML ─────────────────────────────────────────────────────────
Ollama         → local LLMs (Mistral, LLaMA 3)
MLflow         → ML experiment tracking
Kubeflow       → ML pipelines + distributed training

CV / LinkedIn Summary

  • Designed and deployed a 3-node bare-metal infrastructure using MAAS
  • Implemented automated OS provisioning via PXE network boot
  • Built isolated cluster network (10.0.0.0/24) with DHCP/DNS management
  • Resolved complex networking issues (IPv6 conflicts, DHCP overlap, alias interfaces)
  • Deployed full Kubernetes platform: k3s, ArgoCD, Prometheus, Harbor, Vault
  • Built private AI platform with local LLM serving (Ollama) and ML pipelines (Kubeflow)
  • Implemented remote access via Tailscale VPN and Cloudflare Tunnel
  • Applied chaos engineering with Chaos Mesh to validate cluster resilience
  • Built end-to-end data platform: Kafka/Redpanda → ClickHouse → dbt → Superset with OpenMetadata governance
  • Implemented enterprise security: Keycloak SSO, OPA/Gatekeeper admission control, Falco runtime detection, Cosign supply chain signing