Experimental: This project is experimental and a work in progress. Use at your own risk and do not expect thorough support!
This project deploys EKS Anywhere (EKS-A) on Bare Metal on Equinix Metal using the minimum requirements.
See https://aws.amazon.com/blogs/containers/getting-started-with-eks-anywhere-on-bare-metal/ for more information about EKS-A on Bare Metal.
A guided step-by-step manual installation workshop is available at https://equinix-labs.github.io/eks-anywhere-on-equinix-metal-workshop/. If you want to learn more about how EKS-A is installed on Metal, to better understand how and where you can adapt changes for your environments, we recommend following the manual workshop.
In the examples/lab directory, you can find a Terraform module to facilitate EKS-A on Bare Metal lab environments.
EKS-A requires UEFI booting, which is supported by the following Equinix Metal On Demand plans:
- m3.small.x86
- m3.large.x86
- n3.xlarge.x86
- a3.large.x86
With your Equinix Metal account, project, and a User API token, you can use Terraform v1+ to install a proof-of-concept demonstration environment for EKS-A on Bare Metal.
Enter the examples/deploy directory:
$ cd examples/deploy
Create a terraform.tfvars file in the root of this project with metal_api_token and project_id defined. These are the required variables needed to run terraform apply. See variables.tf for additional settings that you may wish to customize.
# terraform.tfvars
metal_api_token="...your Metal User API Token here..."
project_id="...your Metal Project ID here..."
Note: Project API Tokens cannot be used to access some Gateway features used by this project. A User API Token is required.
Terraform will create an Equinix Metal VLAN, Metal Gateway, IP Reservation, and Equinix Metal servers to act as the EKS-A Admin node and worker devices. Terraform will also create the initial hardware.csv with the details of each server and register it with the eks-anywhere CLI to create the cluster. The worker nodes will be provisioned through Tinkerbell to act as a control-plane node and a worker node.
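For reference, the generated hardware.csv follows the same layout described in the manual instructions later in this README. An illustrative sketch with two nodes (hypothetical MAC and TEST-NET addresses; exact columns and values will differ per deployment):
hostname,vendor,mac,ip_address,gateway,netmask,nameservers,disk,labels
eksa-node-cp-001,Equinix,aa:bb:cc:dd:ee:01,192.0.2.3,192.0.2.1,255.255.255.240,8.8.8.8|8.8.4.4,/dev/sda,type=cp
eksa-node-worker-001,Equinix,aa:bb:cc:dd:ee:02,192.0.2.4,192.0.2.1,255.255.255.240,8.8.8.8|8.8.4.4,/dev/sda,type=worker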
Once complete, you'll see the following output:
$ terraform apply
... (~12m later)
Apply complete! Resources: 19 added, 0 changed, 0 destroyed.
Outputs:
eksa_admin_ip = "203.0.113.3"
eksa_admin_ssh_key = "/Users/username/.ssh/my-eksa-cluster-xed"
eksa_admin_ssh_user = "root"
eksa_nodes_sos = tomap({
"eksa-node-cp-001" = "b0e1426d-4d9e-4d01-bd5c-54065df61d67@sos.sv15.platformequinix.com"
"eksa-node-worker-001" = "84ffa9c7-84ce-46eb-97ff-2ae310fbb360@sos.sv15.platformequinix.com"
})
SSH into the EKS-A Admin node and follow the EKS-A on Bare Metal instructions to continue within the Kubernetes environment.
ssh -i $(terraform output -json | jq -r .eksa_admin_ssh_key.value) root@$(terraform output -json | jq -r .eksa_admin_ip.value)
root@eksa-admin:~# kubectl get nodes
NAME STATUS ROLES AGE VERSION
eksa-node-cp-001 Ready control-plane,master 7m56s v1.22.10-eks-7dc61e8
eksa-node-worker-001 Ready <none> 5m30s v1.22.10-eks-7dc61e8
This section is an example of adding a new node of the exact same type as the previous nodes to the cluster. For example, if you use the project defaults you'll want to add an m3.small.x86 as the new node. Also, this example only adds a new worker node for simplicity. Adding control-plane nodes is possible, but requires thinking through how many nodes are added as well as labeling them as type=cp instead of type=worker.
NEW_HOSTNAME="your new hostname"
POOL_ADMIN="IP address of your admin machine"
metal device create --plan m3.small.x86 --metro da --hostname $NEW_HOSTNAME \
  --ipxe-script-url http://$POOL_ADMIN/ipxe/ --operating-system custom_ipxe
Make note of the device's UUID; you can use metal device get to list your devices.
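If you'd rather not copy the UUID by hand, the same jq pattern used elsewhere in this guide can look it up by hostname (a sketch, assuming $NEW_HOSTNAME is still set in your shell); otherwise set DEVICE_ID manually as shown below.
# Optional shortcut: look up the new device's UUID by hostname
DEVICE_ID=$(metal devices list -o json | jq -r '.[] | select(.hostname == "'$NEW_HOSTNAME'") | .id')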
DEVICE_ID="UUID you noted above"
BOND0_PORT=$(metal devices get -i $DEVICE_ID -o json |
jq -r '.network_ports [] | select(.name == "bond0") | .id')
ETH0_PORT=$(metal devices get -i $DEVICE_ID -o json |
jq -r '.network_ports [] | select(.name == "eth0") | .id')
VLAN_ID="Your VLAN ID, likely 1000"
metal port convert -i $BOND0_PORT --layer2 --bonded=false --force
metal port vlan -i $ETH0_PORT -a $VLAN_ID
Put the following in a new CSV file, hardware2.csv:
hostname,mac,ip_address,gateway,netmask,nameservers,disk,labels
<HOSTNAME>,<MAC_ADDRESS>,<IP>,<GATEWAY>,<NETMASK>,8.8.8.8|8.8.4.4,/dev/sda,type=worker
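For example, a populated hardware2.csv for an added worker might look like this (all values are illustrative):
hostname,mac,ip_address,gateway,netmask,nameservers,disk,labels
eksa-node-worker-002,aa:bb:cc:dd:ee:03,192.0.2.5,192.0.2.1,255.255.255.240,8.8.8.8|8.8.4.4,/dev/sda,type=worker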
Get your machine deployment group name:
kubectl get machinedeployments -n eksa-system
Generate the Kubernetes YAML from your hardware2.csv file:
eksctl anywhere generate hardware -z hardware2.csv > cluster-scale.yaml
Edit cluster-scale.yaml and remove the two bmc items.
Use the MachineDeployment group name along with the CSV file to scale the cluster.
kubectl apply -f cluster-scale.yaml
kubectl scale machinedeployments -n eksa-system <Your MachineDeployment Group Name> --replicas 1
This section covers the basic steps to connect your cluster to EKS with the EKS Connector. There are many more details (including prerequisites like IAM permissions) in the EKS Connector Documentation.
Connect to the eksa-admin host.
ssh -i $(terraform output -json | jq -r .eksa_admin_ssh_key.value) root@$(terraform output -json | jq -r .eksa_admin_ip.value)
Follow the AWS documentation and set the environment variables with your authentication info for AWS. For example:
export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
export AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
export AWS_DEFAULT_REGION=us-west-2
Now use eksctl to register the cluster:
eksctl register cluster --name my-cluster --provider my-provider --region region-code
If it succeeded, the output will show several .yaml files that were created and need to be registered with the cluster. For example, at the time of writing, applying those files would be done like so:
kubectl apply -f eks-connector.yaml,eks-connector-clusterrole.yaml,eks-connector-console-dashboard-full-access-group.yaml
Even more info can be found at the eksctl documentation.
Note: This section serves as manual instructions for installing EKS-A on Bare Metal on Equinix Metal. The Terraform install above performs all of these steps for you. These instructions offer a step-by-step install with copy+paste commands that simplify the process. Refer to the open issues, and please open an issue if you encounter something not represented there.
Steps below align with EKS-A on Bare Metal instructions. While the steps below are intended to be complete, follow along with the EKS-A Install guide for best results.
No open issues are currently blocking. If you run into something unexpected, check the open issues and open a new issue reporting your experience.
The following tools will be needed on your local development environment where you will be running most of the commands in this guide.
- A Unix-like environment (Linux, OSX, Windows WSL)
- jq
- metal-cli (v0.9.0+)
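A quick sanity check (a minimal sketch) that the required tools are available before continuing:
# Verify the prerequisite tools are on your PATH
command -v jq metal || echo "install the missing tools before continuing"
jq --version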
- Create an EKS-A Admin machine using the metal-cli. Create an API Key, register it with the Metal CLI, then provision the admin machine:
metal init
metal device create --plan=m3.small.x86 --metro=da --hostname eksa-admin --operating-system ubuntu_20_04
- Create a VLAN:
metal vlan create --metro da --description eks-anywhere --vxlan 1000
- Create a Public IP Reservation (16 addresses):
metal ip request --metro da --type public_ipv4 --quantity 16 --tags eksa
The following variables will be used in executable snippets in later steps to refer to specific addresses within the pool. The correct IP reservation is chosen by looking for a single IP reservation with the "eksa" tag applied.
# Capture the ID, Network, Gateway, and Netmask using jq
VLAN_ID=$(metal vlan list -o json | jq -r '.virtual_networks | .[] | select(.vxlan == 1000) | .id')
POOL_ID=$(metal ip list -o json | jq -r '.[] | select(.tags | contains(["eksa"]))? | .id')
POOL_NW=$(metal ip list -o json | jq -r '.[] | select(.tags | contains(["eksa"]))? | .network')
POOL_GW=$(metal ip list -o json | jq -r '.[] | select(.tags | contains(["eksa"]))? | .gateway')
POOL_NM=$(metal ip list -o json | jq -r '.[] | select(.tags | contains(["eksa"]))? | .netmask')
# POOL_ADMIN will be assigned to eksa-admin within the VLAN
POOL_ADMIN=$(python3 -c 'import ipaddress; print(str(ipaddress.IPv4Address("'${POOL_GW}'")+1))')
# PUB_ADMIN is the provisioned IPv4 public address of eksa-admin which we can use with ssh
PUB_ADMIN=$(metal devices list -o json | jq -r '.[] | select(.hostname=="eksa-admin") | .ip_addresses [] | select(contains({"public":true,"address_family":4})) | .address')
# PORT_ADMIN is the bond0 port of the eksa-admin machine
PORT_ADMIN=$(metal devices list -o json | jq -r '.[] | select(.hostname=="eksa-admin") | .network_ports [] | select(.name == "bond0") | .id')
# POOL_VIP is the floating IPv4 public address assigned to the current lead kubernetes control plane
POOL_VIP=$(python3 -c 'import ipaddress; print(str(ipaddress.ip_network("'${POOL_NW}'/'${POOL_NM}'").broadcast_address-1))')
# TINK_VIP is the floating IPv4 public address assigned to the Tinkerbell stack
TINK_VIP=$(python3 -c 'import ipaddress; print(str(ipaddress.ip_network("'${POOL_NW}'/'${POOL_NM}'").broadcast_address-2))')
- Create a Metal Gateway:
metal gateway create --ip-reservation-id $POOL_ID --virtual-network $VLAN_ID
- Create Tinkerbell worker nodes eksa-node-001 and eksa-node-002 with Custom iPXE http://{eks-a-public-address}. These nodes will be provisioned as EKS-A Control Plane or Worker nodes.
for a in {1..2}; do
  metal device create --plan m3.small.x86 --metro da --hostname eksa-node-00$a \
    --ipxe-script-url http://$POOL_ADMIN/ipxe/ --operating-system custom_ipxe
done
Note that the ipxe-script-url doesn't actually get used in this process; we're only setting it because it is a requirement for using the custom_ipxe operating system type.
- Add the VLAN to the eksa-admin bond0 port:
metal port vlan -i $PORT_ADMIN -a $VLAN_ID
Configure the Layer 2 VLAN network on eksa-admin with this snippet:
ssh root@$PUB_ADMIN tee -a /etc/network/interfaces << EOS
auto bond0.1000
iface bond0.1000 inet static
  pre-up sleep 5
  address $POOL_ADMIN
  netmask $POOL_NM
  vlan-raw-device bond0
EOS
Activate the Layer 2 VLAN network with this command:
ssh root@$PUB_ADMIN systemctl restart networking
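To confirm the VLAN sub-interface came up with the expected address, a quick check (a sketch; exact output formatting varies by distribution) is:
# bond0.1000 should now carry the $POOL_ADMIN address
ssh root@$PUB_ADMIN ip addr show bond0.1000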
- Convert eksa-node-*'s network ports to Layer2-Unbonded and attach them to the VLAN:
node_ids=$(metal devices list -o json | jq -r '.[] | select(.hostname | startswith("eksa-node")) | .id')
i=1 # We will increment "i" for the eksa-node-* nodes. "1" represents the eksa-admin node.
for id in $(echo $node_ids); do
  let i++
  BOND0_PORT=$(metal devices get -i $id -o json | jq -r '.network_ports [] | select(.name == "bond0") | .id')
  ETH0_PORT=$(metal devices get -i $id -o json | jq -r '.network_ports [] | select(.name == "eth0") | .id')
  metal port convert -i $BOND0_PORT --layer2 --bonded=false --force
  metal port vlan -i $ETH0_PORT -a $VLAN_ID
done
- Capture the MAC addresses and create the hardware.csv file on eksa-admin in /root/ (run this on the host with the metal CLI on it):
  - Create the CSV header:
echo hostname,vendor,mac,ip_address,gateway,netmask,nameservers,disk,labels > hardware.csv
  - Use metal and jq to grab HW MAC addresses and add them to the hardware.csv:
node_ids=$(metal devices list -o json | jq -r '.[] | select(.hostname | startswith("eksa-node")) | .id')
i=1 # We will increment "i" for the eksa-node-* nodes. "1" represents the eksa-admin node.
for id in $(echo $node_ids); do
  # Configure only the first node as a control-plane node
  if [ "$i" = 1 ]; then TYPE=cp; else TYPE=worker; fi; # change to 3 for HA
  NODENAME="eks-node-00$i"
  let i++
  MAC=$(metal device get -i $id -o json | jq -r '.network_ports | .[] | select(.name == "eth0") | .data.mac')
  IP=$(python3 -c 'import ipaddress; print(str(ipaddress.IPv4Address("'${POOL_GW}'")+'$i'))')
  echo "$NODENAME,Equinix,${MAC},${IP},${POOL_GW},${POOL_NM},8.8.8.8|8.8.4.4,/dev/sda,type=${TYPE}" >> hardware.csv
done
The BMC fields are omitted because Equinix Metal does not expose the BMC of nodes. EKS Anywhere will skip BMC steps with this configuration.
- Copy hardware.csv to eksa-admin:
scp hardware.csv root@$PUB_ADMIN:/root
- We've now provided the eksa-admin machine with all of the variables and configuration it needs.
- Log in to eksa-admin with the LC_POOL_ADMIN, LC_POOL_VIP, and LC_TINK_VIP variables defined:
# SSH into eksa-admin. The special args and environment settings are just tricks to plumb $POOL_ADMIN, $POOL_VIP, and $TINK_VIP into the eksa-admin environment.
LC_POOL_ADMIN=$POOL_ADMIN LC_POOL_VIP=$POOL_VIP LC_TINK_VIP=$TINK_VIP ssh -o SendEnv=LC_POOL_ADMIN,LC_POOL_VIP,LC_TINK_VIP root@$PUB_ADMIN
Note: The remaining steps assume you have logged into eksa-admin with the SSH command shown above.
- Install eksctl and the eksctl-anywhere plugin on eksa-admin:
curl "https://github.com/weaveworks/eksctl/releases/latest/download/eksctl_$(uname -s)_amd64.tar.gz" \
  --silent --location \
  | tar xz -C /tmp
sudo mv /tmp/eksctl /usr/local/bin/
export EKSA_RELEASE="0.14.3" OS="$(uname -s | tr A-Z a-z)" RELEASE_NUMBER=30
curl "https://anywhere-assets.eks.amazonaws.com/releases/eks-a/${RELEASE_NUMBER}/artifacts/eks-a/v${EKSA_RELEASE}/${OS}/amd64/eksctl-anywhere-v${EKSA_RELEASE}-${OS}-amd64.tar.gz" \
  --silent --location \
  | tar xz ./eksctl-anywhere
sudo mv ./eksctl-anywhere /usr/local/bin/
- Install kubectl on eksa-admin:
snap install kubectl --channel=1.25 --classic
Version 1.25 matches the version used in the eks-anywhere repository.
Alternatively, install via APT:
curl -fsSLo /usr/share/keyrings/kubernetes-archive-keyring.gpg https://packages.cloud.google.com/apt/doc/apt-key.gpg
echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
apt-get update
apt-get install kubectl
- Install Docker. Run the Docker install script:
curl -fsSL https://get.docker.com -o get-docker.sh
chmod +x get-docker.sh
./get-docker.sh
Alternatively, follow the instructions from https://docs.docker.com/engine/install/ubuntu/.
- Create the EKS-A cluster config:
export TINKERBELL_HOST_IP=$LC_TINK_VIP
export CLUSTER_NAME="${USER}-${RANDOM}"
export TINKERBELL_PROVIDER=true
eksctl anywhere generate clusterconfig $CLUSTER_NAME --provider tinkerbell > $CLUSTER_NAME.yaml
Note: The remaining steps assume you have defined the variables set above.
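A quick way to confirm they are still set in the current shell (a minimal sketch):
echo "CLUSTER_NAME=$CLUSTER_NAME LC_POOL_ADMIN=$LC_POOL_ADMIN LC_POOL_VIP=$LC_POOL_VIP LC_TINK_VIP=$LC_TINK_VIP"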
Install yq
snap install yq
Generate a public SSH key and store it in a variable called SSH_PUBLIC_KEY:
ssh-keygen -t rsa
export SSH_PUBLIC_KEY=$(cat /root/.ssh/id_rsa.pub)
- Run the yq command below to make the following necessary changes to the $CLUSTER_NAME.yaml file:
- Set control-plane IP for Cluster resource.
- Set the TinkerbellDatacenterConfig resource spec in config
- Set the public ssh key in TinkerbellMachineConfig users[name=ec2-user].sshAuthorizedKeys
- Set the hardwareSelector for each TinkerbellMachineConfig
- Change the templateRef for each TinkerbellMachineConfig section
yq eval -i '
  (select(.kind == "Cluster") | .spec.controlPlaneConfiguration.endpoint.host) = env(LC_POOL_VIP) |
  (select(.kind == "TinkerbellDatacenterConfig") | .spec.tinkerbellIP) = env(LC_TINK_VIP) |
  (select(.kind == "TinkerbellMachineConfig") | (.spec.users[] | select(.name == "ec2-user")).sshAuthorizedKeys) = [env(SSH_PUBLIC_KEY)] |
  (select(.kind == "TinkerbellMachineConfig" and .metadata.name == env(CLUSTER_NAME) + "-cp" ) | .spec.hardwareSelector.type) = "cp" |
  (select(.kind == "TinkerbellMachineConfig" and .metadata.name == env(CLUSTER_NAME)) | .spec.hardwareSelector.type) = "worker" |
  (select(.kind == "TinkerbellMachineConfig") | .spec.templateRef.kind) = "TinkerbellTemplateConfig" |
  (select(.kind == "TinkerbellMachineConfig") | .spec.templateRef.name) = env(CLUSTER_NAME)
  ' $CLUSTER_NAME.yaml
- Append the following to the $CLUSTER_NAME.yaml file:
cat << EOF >> $CLUSTER_NAME.yaml
---
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: TinkerbellTemplateConfig
metadata:
  name: ${CLUSTER_NAME}
spec:
  template:
    global_timeout: 6000
    id: ""
    name: ${CLUSTER_NAME}
    tasks:
    - actions:
      - environment:
          COMPRESSED: "true"
          DEST_DISK: /dev/sda
          IMG_URL: https://anywhere-assets.eks.amazonaws.com/releases/bundles/29/artifacts/raw/1-25/bottlerocket-v1.25.6-eks-d-1-25-7-eks-a-29-amd64.img.gz
        image: public.ecr.aws/eks-anywhere/tinkerbell/hub/image2disk:6c0f0d437bde2c836d90b000312c8b25fa1b65e1-eks-a-29
        name: stream-image
        timeout: 600
      - environment:
          CONTENTS: |
            # Version is required, it will change as we support
            # additional settings
            version = 1
            # "eno1" is the interface name
            # Users may turn on dhcp4 and dhcp6 via boolean
            [enp1s0f0np0]
            dhcp4 = true
            dhcp6 = false
            # Define this interface as the "primary" interface
            # for the system. This IP is what kubelet will use
            # as the node IP. If none of the interfaces has
            # "primary" set, we choose the first interface in
            # the file
            primary = true
          DEST_DISK: /dev/sda12
          DEST_PATH: /net.toml
          DIRMODE: "0755"
          FS_TYPE: ext4
          GID: "0"
          MODE: "0644"
          UID: "0"
        image: public.ecr.aws/eks-anywhere/tinkerbell/hub/writefile:6c0f0d437bde2c836d90b000312c8b25fa1b65e1-eks-a-29
        name: write-netplan
        pid: host
        timeout: 90
      - environment:
          BOOTCONFIG_CONTENTS: |
            kernel {
                console = "ttyS1,115200n8"
            }
          DEST_DISK: /dev/sda12
          DEST_PATH: /bootconfig.data
          DIRMODE: "0700"
          FS_TYPE: ext4
          GID: "0"
          MODE: "0644"
          UID: "0"
        image: public.ecr.aws/eks-anywhere/tinkerbell/hub/writefile:6c0f0d437bde2c836d90b000312c8b25fa1b65e1-eks-a-29
        name: write-bootconfig
        pid: host
        timeout: 90
      - environment:
          DEST_DISK: /dev/sda12
          DEST_PATH: /user-data.toml
          DIRMODE: "0700"
          FS_TYPE: ext4
          GID: "0"
          HEGEL_URLS: http://${LC_POOL_ADMIN}:50061,http://${LC_TINK_VIP}:50061
          MODE: "0644"
          UID: "0"
        image: public.ecr.aws/eks-anywhere/tinkerbell/hub/writefile:6c0f0d437bde2c836d90b000312c8b25fa1b65e1-eks-a-29
        name: write-user-data
        pid: host
        timeout: 90
      - image: public.ecr.aws/eks-anywhere/tinkerbell/hub/reboot:6c0f0d437bde2c836d90b000312c8b25fa1b65e1-eks-a-29
        name: reboot-image
        pid: host
        timeout: 90
        volumes:
        - /worker:/worker
      name: ${CLUSTER_NAME}
      volumes:
      - /dev:/dev
      - /dev/console:/dev/console
      - /lib/firmware:/lib/firmware:ro
      worker: '{{.device_1}}'
    version: "0.1"
EOF
- Create an EKS-A Cluster. Double check that $LC_POOL_ADMIN and $CLUSTER_NAME are set correctly before running this (they were passed through SSH or otherwise defined in previous steps). Otherwise, set them manually!
eksctl anywhere create cluster --filename $CLUSTER_NAME.yaml \
  --hardware-csv hardware.csv --tinkerbell-bootstrap-ip $LC_POOL_ADMIN
- When the command above indicates that it is Creating new workload cluster, reboot the two nodes. This forces them to attempt an iPXE boot from the Tinkerbell stack that the eksctl anywhere command creates. Note that this must be done without interrupting the eksctl anywhere create cluster command.
Option 1 - You can use this command to automate it, but you'll need to be back on the original host:
node_ids=$(metal devices list -o json | jq -r '.[] | select(.hostname | startswith("eksa-node")) | .id')
for id in $(echo $node_ids); do
  metal device reboot -i $id
done
Option 2 - Instead of rebooting the nodes from the host, you can force the iPXE boot from your local machine by accessing each node's SOS console. You can retrieve the UUID and facility code of each node using the Metal CLI, the UI console, or the Equinix Metal API. By default, any existing SSH key in the project can be used to log in.
ssh {node-uuid}@sos.{facility-code}.platformequinix.com -i </path/to/ssh-key>
- If the whole process is successful, you will see log messages like the following:
Installing networking on workload cluster
Creating EKS-A namespace
Installing cluster-api providers on workload cluster
Installing EKS-A secrets on workload cluster
Installing resources on management cluster
Moving cluster management from bootstrap to workload cluster
Installing EKS-A custom components (CRD and controller) on workload cluster
Installing EKS-D components on workload cluster
Creating EKS-A CRDs instances on workload cluster
Installing GitOps Toolkit on workload cluster
GitOps field not specified, bootstrap flux skipped
Writing cluster config file
Deleting bootstrap cluster
:tada: Cluster created!
--------------------------------------------------------------------------------------
The Amazon EKS Anywhere Curated Packages are only available to customers with the Amazon EKS Anywhere Enterprise Subscription
--------------------------------------------------------------------------------------
Enabling curated packages on the cluster
Installing helm chart on cluster	{"chart": "eks-anywhere-packages", "version": "0.2.30-eks-a-29"}
- To verify whether the nodes are deployed properly, SSH back into eksa-admin, copy the generated kubeconfig into place, and query the cluster:
LC_POOL_ADMIN=$POOL_ADMIN LC_POOL_VIP=$POOL_VIP LC_TINK_VIP=$TINK_VIP ssh -o SendEnv=LC_POOL_ADMIN,LC_POOL_VIP,LC_TINK_VIP root@$PUB_ADMIN
cp <CLUSTER_NAME Directory>/<CLUSTER_NAME>-eks-a-cluster.kubeconfig /root/.kube/config
kubectl get nodes -A
kubectl get pods -A
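If the deployment succeeded, the node listing should resemble the example output shown in the Terraform section earlier in this README, with both nodes reporting a Ready status; node names come from your hardware.csv, and ages and versions will differ for the manual install:
NAME                   STATUS   ROLES                  AGE     VERSION
eksa-node-cp-001       Ready    control-plane,master   7m56s   v1.22.10-eks-7dc61e8
eksa-node-worker-001   Ready    <none>                 5m30s   v1.22.10-eks-7dc61e8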