- Introduction
- Features
- Prerequisites
- Installation
- Usage
- Examples
- Advanced Configurations
- Troubleshooting
- Final Product
- Additional Resources
ClusterCreator automates the creation and maintenance of fully functional Kubernetes (K8S) clusters of any size on Proxmox. Leveraging Terraform/OpenTofu and Ansible, it facilitates complex setups, including decoupled etcd clusters, diverse worker node configurations, and optional integration with Unifi networks and VLANs.
Having a virtualized K8S cluster allows you to not only simulate a cloud environment but also scale and customize your cluster to your needs—adding or removing nodes and disks, managing backups and snapshots of the virtual machine disks, customizing node class types, and controlling state.
Watch a step-by-step demo on my blog.
- Automated VM and VLAN Creation: Utilize OpenTofu to create VMs and VLANs tailored to your cluster needs.
- Kubernetes Installation and Configuration: Ansible playbooks handle the installation of Kubernetes and essential add-ons.
- Scalable Cluster Management: Easily add or remove nodes, customize node classes, and manage hardware requirements.
- Optional Unifi Network Integration: Configure dedicated networks and VLANs with Unifi.
- Highly Available Control Plane: Implement HA control planes using Kube-VIP.
- Customizable Networking: Support for dual-stack networking (IPv4 & IPv6).
- Dynamic Worker Classes: Define worker nodes with varying CPU, memory, disk, and networking specifications.
Before proceeding, ensure you have the following:
- Proxmox VE: A running Proxmox cluster.
- Unifi Controller (optional): For managing networks and VLANs.
- Terraform/OpenTofu: Installed and configured.
- Ansible: Installed on your control machine.
- Access Credentials: For Proxmox and Unifi (if used).
ClusterCreator requires access to the Proxmox cluster. Execute the following commands on your Proxmox server to create a datacenter user:
pveum user add terraform@pve -comment "Terraform User"
pveum role add TerraformRole -privs "Datastore.Allocate Datastore.AllocateSpace Datastore.AllocateTemplate Datastore.Audit Pool.Allocate Pool.Audit Sys.Audit Sys.Console Sys.Modify SDN.Use VM.Allocate VM.Audit VM.Clone VM.Config.CDROM VM.Config.Cloudinit VM.Config.CPU VM.Config.Disk VM.Config.HWType VM.Config.Memory VM.Config.Network VM.Config.Options VM.Migrate VM.Monitor VM.PowerMgmt User.Modify Mapping.Use"
pveum aclmod / -user terraform@pve -role TerraformRole
sudo pveum user token add terraform@pve provider --privsep=0
For additional documentation, see Proxmox API Token Authentication.
Rename and edit secrets.tf.example to secrets.tf. These secrets are used by Tofu to interact with Proxmox and Unifi.
cp secrets.tf.example secrets.tf
- Proxmox Credentials: Refer to the Proxmox user and API token setup above.
- Unifi Credentials: Create a service account in the Unifi Controller with Site Admin permissions for the Network app.
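For orientation, the secrets file ends up holding values along these lines. This is a minimal sketch with illustrative variable names; copy the real keys from secrets.tf.example rather than from here:

```hcl
# Hypothetical variable names -- the authoritative list is in secrets.tf.example.
variable "proxmox_api_token" {
  description = "Proxmox API token, in user@realm!tokenid=secret form"
  type        = string
  sensitive   = true
}

variable "unifi_password" {
  description = "Password for the Unifi service account (only if Unifi is used)"
  type        = string
  sensitive   = true
}
```

Marking the variables sensitive keeps Tofu from printing their values in plan and apply output.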
Rename and edit .env.example to .env. These secrets are used in bash scripts for VM operations.
cp .env.example .env
Note: There may be overlapping configurations between secrets.tf and .env.
Customize the following configuration files to suit your environment:
- k8s.env: Specify Kubernetes versions and template VM configurations.
- vars.tf: Define non-sensitive variables for Tofu.
- clusters.tf: Configure cluster specifications. Update the username to your own.
- main.tf: Manage VM, VLAN, and pool resources with Tofu.
Run the create_template.sh script to generate a cloud-init ready VM template for Tofu.
./create_template.sh
What It Does:
- Installs necessary apt packages (e.g., kubeadm, kubelet, kubectl, helm).
- Compiles and installs packages from source (e.g., cilium CLI, etcdctl).
- Updates the operating system.
- Configures system settings for Kubernetes (kernel modules, sysctl).
- Sets up multipath configuration for storage systems like Longhorn.
- Supports both Ubuntu and Debian images.
Outcome: A VM template that installs all required packages and configurations, ready for cloud-init.
Initialize Tofu modules. This step is required only once.
tofu init
Create a dedicated workspace for your cluster.
tofu workspace new <cluster_name>
Purpose: Ensures Tofu commands are scoped to the specified cluster. Switch between workspaces using:
tofu workspace switch <cluster_name>
Apply the Tofu configuration to create VMs and related resources.
tofu apply [--auto-approve] [-var="template_vm_id=<vm_id>"]
Functionality:
- Clones the VM template.
- Sets cloud-init parameters.
- Creates a Proxmox pool and Unifi VLAN (if configured).
- Generates cluster_config.json for Ansible.

Default template_vm_id: 9000
Run the Ansible playbooks to set up Kubernetes.
./install_k8s.sh --cluster_name <CLUSTER_NAME> [-a/--add-nodes]
Options:
- --add-nodes: Adds new nodes to an existing cluster.
Includes:
- Optional decoupled etcd cluster setup.
- Highly available control plane with Kube-VIP.
- Cilium CNI (with optional dual-stack networking).
- Metrics server installation.
- Node labeling and tainting.
- StorageClass configuration.
- Node preparation and joining.
Note: Avoid using --add-nodes for setting up or editing a decoupled etcd cluster.
Configure your kubeconfig to interact with the clusters:
export KUBECONFIG=~/.kube/config:~/.kube/alpha.yml:~/.kube/beta.yml:~/.kube/gamma.yml
Tip: Add the export command to your shell's configuration file (~/.bashrc or ~/.zshrc) for persistence.
Use tools like kubectx or kubie to switch between contexts.
Remove a node from the cluster:
./remove_node.sh -n/--cluster-name <CLUSTER_NAME> -h/--hostname <NODE_HOSTNAME> -t/--timeout <TIMEOUT_SECONDS> [-d/--delete]
Options:
- --delete: Deletes and resets the node for fresh re-commissioning.

Note: Not applicable for decoupled etcd nodes.
Reset the Kubernetes cluster:
./uninstall_k8s.sh -n/--cluster_name <CLUSTER_NAME> [-h/--single-hostname <HOSTNAME_TO_RESET>]
Options:
- --single-hostname: Resets a specific node. Without this, all nodes are reset, and the cluster is deleted.
Remove VMs, pools, and VLANs:
tofu destroy [--auto-approve] [--target='proxmox_virtual_environment_vm.node["<vm_name>"]']
Options:
- --target: Specifies particular VMs to destroy.
Manage VM power states:
./powerctl_pool.sh [--start|--shutdown|--pause|--resume|--hibernate|--stop] <POOL_NAME> [--timeout <timeout_in_seconds>]
Requirements: QEMU Guest Agent must be running on VMs.
Execute bash commands on specified Ansible host groups:
./run_command_on_host_group.sh [-n/--cluster-name <CLUSTER_NAME>] [-g/--group <GROUP_NAME>] [-c/--command '<command>']
Example:
./run_command_on_host_group.sh -n mycluster -g all -c 'sudo apt update'
A minimal cluster resembling Minikube or Kind.
- Cluster Name: alpha
- Control Plane:
- Nodes: 1
- Specifications: 16 CPU cores, 16GB RAM, 100GB disk
Note: With no worker nodes, the control plane is left untainted, allowing it to run workloads.
Expand with additional worker nodes for diverse workloads.
- Cluster Name: beta
- Control Plane:
- Nodes: 1
- Specifications: 4 CPU cores, 4GB RAM, 30GB disk
- Workers:
- Nodes: 2 (class: general)
- Specifications: 8 CPU cores, 4GB RAM, 30GB disk each
Note: etcd nodes are utilized by control plane nodes but are not explicitly shown.
A robust setup with multiple control and etcd nodes, including GPU workers.
- Cluster Name: gamma
- Control Plane:
- Nodes: 3
- Specifications: 4 CPU cores, 4GB RAM, 30GB disk each
- Decoupled etcd:
- Nodes: 3
- Specifications: 2 CPU cores, 2GB RAM, 30GB disk each
- Workers:
- General Nodes: 5
- Specifications: 8 CPU cores, 4GB RAM, 30GB disk
- GPU Nodes: 2
- Specifications: 2 CPU cores, 2GB RAM, 20GB disk, with attached GPUs
Leverage OpenTofu and Ansible to create highly dynamic cluster configurations:
- Control Plane Nodes: 1 to ∞
- etcd Nodes: 0 to ∞
- Worker Nodes: 0 to ∞, with varying classes (defined by name, CPU, memory, disk, networking, labels)
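A cluster entry in clusters.tf might look roughly like the following. The attribute names here are illustrative, not the project's actual schema; mirror the existing entries in clusters.tf when defining your own:

```hcl
# Illustrative shape only -- match the schema already used in clusters.tf.
gamma = {
  cluster_name        = "gamma"
  control_plane_count = 3
  etcd_count          = 3
  node_classes = {
    general = { count = 5, cores = 8, memory = 4096, disk = 30 }
    gpu     = { count = 2, cores = 2, memory = 2048, disk = 20 }
  }
}
```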
Configure IPv4 and IPv6 support:
- IPv6 Disabled (ipv6.enabled = false): Cluster operates with IPv4 only.
- IPv6 Enabled, Single Stack (ipv6.enabled = true, ipv6.dual_stack = false): Host and VLAN have IPv6, but the cluster uses IPv4.
- IPv6 Enabled, Dual Stack (ipv6.enabled = true, ipv6.dual_stack = true): Both IPv4 and IPv6 are active within the cluster.
Note: IPv6-only clusters are not supported due to complexity and external dependencies (e.g., GitHub Container Registry lacks IPv6).
Tip: The HA kube-vip API server can utilize an IPv6 address without enabling dual-stack.
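The three modes above reduce to two flags. A sketch of the relevant fragment, using the setting names quoted above (verify the exact structure against clusters.tf):

```hcl
# Dual-stack example; set dual_stack = false for IPv6 on the host/VLAN only,
# or enabled = false for an IPv4-only cluster.
ipv6 = {
  enabled    = true
  dual_stack = true
}
```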
Define custom worker classes in clusters.tf to meet specific workload requirements:
- GPU Workers:
  - Example: Already implemented in clusters.tf.
  - Use Case: AI and machine learning workloads.
- Storage Workers:
  - Configuration: Extra disks, taints for storage systems like Rook.
- Database Workers:
  - Configuration: Increased memory for database operations.
- FedRAMP Workers:
  - Configuration: Taints to restrict workloads to government containers.
- Backup Workers:
  - Configuration: Reduced CPU and memory, expanded disks, taints for backup storage.
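As one concrete sketch, a tainted backup class could be declared like this. The attribute names and taint string are hypothetical; use the GPU class already present in clusters.tf as the real reference for the schema:

```hcl
# Hypothetical "backup" worker class -- small CPU/RAM, large disk, tainted
# so that only workloads tolerating the taint schedule here.
backup = {
  count  = 2
  cores  = 2
  memory = 2048
  disk   = 200
  taints = ["node-role/backup=true:NoSchedule"]
}
```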
Common Issues:
- Proxmox Clone Failures: Proxmox may struggle with cloning identical templates repeatedly.
  - Solution: Retry tofu apply multiple times with larger cluster sizes, or add nodes in smaller batches to distribute them across the Proxmox cluster.
- Configuration Conflicts: Errors related to existing configurations or unresponsive VMs.
  - Solution: Ensure no conflicting resources exist before applying, and use ./uninstall_k8s.sh to reset VMs if necessary.

Workaround: For persistent issues, create brand-new VMs to ensure a clean environment.
- Flux Kubernetes Repository: christensenjairus/Flux-Kubernetes
Explore how to orchestrate Kubernetes infrastructure and applications using infrastructure as code with Flux.