Stay frosty like Tony
A home for my system configurations and home lab using Nix Flakes.
Be warned, I'm still learning and experimenting.
Features:
- ACME certificates for TLS on services
- Reverse proxy with mTLS protection to internal services
- Private CA TLS certificates for direct service access (k8s, OPNsense, Proxmox, access point)
- Wireguard VPN remote access
- Unencrypted client DNS trapped, filtered, and upgraded to DoT
- Malicious traffic dropped with dynamic list updates
- DDNS
- QEMU guest agent
- Prometheus metrics export
- Wake-on-LAN GUI
- Configuration backed up to two locations
Todo:
- Maybe Tailscale OPNsense
- Test local DNS from VPNs
- Find a DDNS provider that supports the generic update mechanism, not a proprietary API. Switch to Inadyne DDNS client for that?
- Host authoritative DNS server, maybe Hickory. See ns-global for some resiliency.
- Review net.inet.tcp.tso for VM safety/perf
- Add DNAT port forwarding for Proxmox management GUI from 443 to 8006
- Enable mDNS responses from OPNsense box permanently
- Tune Wireguard
- Add IPsec
- Fix/add OpenVPN ref ref
- Figure out why DNAT of DNS traffic to loopback doesn't work and has to be LAN IP address
- Figure out how to make the configuration work when the v6 prefix changes
- Add compatibility option/translation layer for IPv6->IPv4
- Remove IPv4
- Host an authoritative DNS
- See about getting my own AS and IPv6 prefix
Features:
- iPXE/TFTP with Netboot for multi-option
- Automatic disk resize on NixOS VMs
- Mix of VMs and physical nodes
- Standardized configuration and configuration management of all nodes
Todo:
- Convert nodes to use ssh certificates for client authentication and server certificates instead of TOFU
- Swap my user to a lower privilege one on Proxmox and OPNsense
- See about more modern watchdog options - apparently this one is ancient 32 bit PCI
- Debug watchdog not stopping on control node reboot.
- Either stabilize or hardware watchdog Topton N100
- Work out watchdog on OPNsense/BSD
- Set up OpenAMT for out-of-band management.
Features:
- Caddy reverse proxy
- Prometheus+Alertmanager+Grafana monitoring stack
- Garage S3 cluster
- Valheim server
- Nix binary cache
- Kanidm identity management
Todo:
- Advanced monitoring (Mimir, Tempo, Loki, Trickster, Victoria Metrics, InfluxDB, etc)
- Configure what can be for Otel
- Spire for node identity
- Stop Spire agent dying if stale join token
- Secrets (Vault/OpenBao?)
- Certificate authority? (step-ca?)
- More identity integration
- Switch routing to dynamic subdomains.
- Add Uptime Kuma publicly
- Deploy external dead man's switch and route Alertmanager to it.
- Find a nice way to make foundational services upstream in Nginx config either nicer or subsume it.
- Look into different Nix store cache, maybe Attic or Harmonia
Features:
- Custom, bare-metal deployment configuration
- Private CA certificates
- Single-stack IPv6 with native routing/no overlay
- Dynamic BGP peering of nodes with router/OPNsense
- CoreDNS inside cluster
Todo:
- External-DNS, Certificate-Manager, FluxCD
- Figure cluster bootstrapping out
- Do dynamically-delegated prefixes for node pod CIDRs. Honestly I'm not sure this is a value-add but it would be cool. See diagram below.
- Set up IPv6 public ingress and firewalling
- Make FRR BGP config persist on OPNsense
- Use the kubernetes mkCert and mkKubeConfig functions example
- Look into kubernetes managing itself with etc+cluster CAs in /etc/kubernetes/pki
- See about CSR auto-approval project
- Add WASM runtime
- Find some kind of dynamic PV/storage option. I'm thinking Longhorn. post 1 post 2 Maybe OpenEBS.
- Play around with Timoni, Kluctl, etc
- Add tracing endpoint for Containerd, maybe monitor better article prom docs
- "Package" an app using generic Helm charts
- Write a custom cloud provider using SSH and WoL.
- Adjust the custom cloud provider to use OpenAMT.
- Pull k8s module out into its own flake/repo/overlay?
- Use sig-addonmanager to bootstrap a CD tool and a CNI
- Look into buildEnv over devShell
- Get a container image build with nix going (Jamey blog, Amos's example)
See also:
Pre-requisites:
- NixOS flashed to USB
- Mash F10/Esc to hit the BIOS (this was a throwback and a pain to do). Or just use systemctl reboot --firmware-setup ~ Future Ariel.
- Update the BIOS
- Load the settings from HpSetup.txt OR follow along the rest.
- Configure the following
- Ensure legacy boot is enabled.
- I disabled secure boot and MS certificate in case
- Turn off fast boot (might be optional)
- Add boot delay 5 seconds (purely QoL)
- Ensure USB takes priority over local disk
- I disabled prompt on memory change so if I add RAM later I don't have to displace the system.
- I disabled Intel's sgx or whatnot. Don't trust it after the RST debacle.
- Save and reboot
- Hit escape to select boot option of USB (esc maybe not required)
- Follow the instructions to install NixOS
- 23.05 (but higher is fine)
- User nixos
- Same password for root
- Auto login (QoL but consult your threat model)
- Use nix-shell to obtain Git and Helix
- Clone this flake repo from GitHub
- Copy the machine-specific disk config from /etc/nixos/hardware-configuration.nix. Place it in the machine's hardware-configuration.nix in the flake repo. (This step may no longer be necessary)
- Nix rebuild switch to the flake's config.
- Confirm SSH remote access is working.
- Reboot and enter bios.
- Turn fast boot back on
- Set boot delay to 0
- Disable UEFI boot priority. If we need to boot from USB we'll reenter the BIOS.
- Save BIOS changes and one last confirmation that the system boots and is remotable.
- Move the machine to its final home.
- Remotely retrieve the hardware configuration and commit it to the flake repo.
- Download BIOS update and place on Ventoy USB.
- Mash F10 to enter BIOS, boot update. Enter 1 and it should update.
- Reset and wait, it will beep and hang and reboot but eventually it should come good.
- Set USB boot precedence above internal drive/s
- Boot Proxmox installer and walk through. Set static IP with the same netmask as the router's DHCP netmask. Best I can tell this is required to send traffic back to origin.
- Highly recommended but optional: trust your SSH keys.
curl https://github.com/arichtman.keys >> ~/.ssh/authorized_keys
- Optionally, add static DHCP lease to the router. If you do this, you can also optionally remove the fixed interface configuration. Edit /etc/network/interfaces and switch the virtual bridge network configuration from manual to dhcp.
- Optionally, install trusted certificates. Instructions are on my blog.
- Run some of the proxmox helper scripts. At least the post install one to fix sources. I also ran the microcode update, CPU scaling governor, and kernel cleanup (since I had been operating for a while).
- Enable IOMMU. First, check GRUB/systemd with efibootmgr -v. If GRUB:
sed -i -r -e 's/(GRUB_CMDLINE_LINUX_DEFAULT=")(.*)"/\1\2 intel_iommu=on"/' /etc/default/grub
- Load the VFIO modules at boot (/etc/modules takes one module per line):
printf '%s\n' vfio vfio_iommu_type1 vfio_pci vfio_virqfd >> /etc/modules
- Reboot to check config
- Set BIOS settings:
  - Boot:
    - Disable beep
    - Enable fast boot
    - Enable network stack
  - Chipset:
    - PCH-IO:
      - Enable Wake on lan and BT
      - Enable TCO timer
- Install Prometheus node exporter, apt install prometheus-node-exporter.
- Install Avahi daemon to enable mDNS, apt install avahi-daemon.
- Install grub package so actual grub binaries get updates, apt install grub-efi-amd64.
- Optionally comment out the Cron job on reboot that sets it to power save.
- Disable IPMI service since we don't have support, systemctl disable openipmi.
If I check /etc/grub.d/000_ proxmox whatever it says update-grub isn't the way and to use proxmox-boot-tool refresh.
It also looks like there's a specific proxmox grub config file under /etc/default/grub.d/proxmox-ve.cfg.
I don't expect it hurts much to have iommu on as a machine default, and we're not booting anything else...
Might tidy up the sed config command though.
Looking at the systemd we could probably do both without harm.
That one's also using the official proxmox command.
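On tidying up the sed command: a minimal sketch of an idempotent version, assuming the same Intel flag and the stock GRUB_CMDLINE_LINUX_DEFAULT line. The function name enable_iommu and the optional file argument are mine, not part of the original setup; re-running it leaves the file unchanged.

```shell
#!/bin/sh
# Append intel_iommu=on to GRUB_CMDLINE_LINUX_DEFAULT unless it is already there.
# $1: grub defaults file (hypothetical parameter; defaults to /etc/default/grub)
enable_iommu() {
  grub_file="${1:-/etc/default/grub}"
  # Already present on the cmdline: nothing to do.
  if grep -q '^GRUB_CMDLINE_LINUX_DEFAULT=".*intel_iommu=on' "$grub_file"; then
    return 0
  fi
  tmp="$(mktemp)"
  # Same substitution as the one-liner, written to a temp file first
  # so a failed sed never truncates the original.
  sed -E 's/^(GRUB_CMDLINE_LINUX_DEFAULT=")(.*)"/\1\2 intel_iommu=on"/' "$grub_file" > "$tmp" &&
    cat "$tmp" > "$grub_file"
  rm -f "$tmp"
}
```

Going by the note in the grub.d file, this would be followed by proxmox-boot-tool refresh rather than update-grub.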
References:
- Proxmox package repo docs
- Servethehome net passthru tutorial
- Reddit BIOS post
- Actual BIOS download
- Grub forum post
- Arch wiki on CPU scaling
- Proxmox performance tuning
- Proxmox CPU selection tutorial
We did run mkfs -t ext4 but it didn't allow us to use the disk in the GUI.
So using the GUI we wiped the disk and initialized it with GPT.
For the USB rust bucket we found the device name with fdisk -l.
Then we ran mkfs -t ext4 /dev/sdb, followed by a mount /dev/sdb /media/backup.
Never mind, same dance with the GUI, followed by heading to Node > Disks > Directory and creating one.
Use blkid to pull details and populate a line in /etc/fstab for auto remount of backup disk. Ref.
/etc/fstab:
# <file system> <mount point> <type> <options> <dump> <pass>
/dev/pve/root / ext4 errors=remount-ro 0 1
UUID=C61A-7940 /boot/efi vfat defaults 0 1
/dev/pve/swap none swap sw 0 0
proc /proc proc defaults 0 0
UUID=b35130d3-6351-4010-87dd-6f2dac34cfba /mnt/pve/Backup ext4 defaults,nofail,x-systemd.device-timeout=5 0 2
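A small sketch of building that last line from blkid output instead of by hand. The function name fstab_line is made up for illustration, and /dev/sdb1 in the usage comment is an assumption about where the backup partition lives.

```shell
#!/bin/sh
# Print an fstab entry for an ext4 disk, using the same mount options
# as the Backup line above.
# $1: filesystem UUID, $2: mount point
fstab_line() {
  printf 'UUID=%s %s ext4 defaults,nofail,x-systemd.device-timeout=5 0 2\n' "$1" "$2"
}

# Usage (device path is an assumption):
# fstab_line "$(blkid -s UUID -o value /dev/sdb1)" /mnt/pve/Backup >> /etc/fstab
```

The nofail and device-timeout options mean the host still boots if the USB disk is unplugged.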
I used this to shift OPNsense to 999 and any templates to >=1000.
- Stop VM
- Get storage group name
lvs -a
- Rename disk
lvrename prod vm-100-disk-0 vm-999-disk-0
- Enter
/etc/pve/nodes/proxmox/qemu-server
- Edit conf file to use renamed disk.
- Move conf file to new id
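The steps above, sketched as a dry run that only prints the commands so they can be eyeballed before running. It assumes a single disk named vm-<id>-disk-0 and the node name proxmox from the path above; the function name renumber_vm_plan is hypothetical.

```shell
#!/bin/sh
# Print (not run) the commands to move a VM from one ID to another.
# $1: old VM ID, $2: new VM ID, $3: LVM volume group (from lvs -a)
# Assumes one disk per VM, named vm-<id>-disk-0.
renumber_vm_plan() {
  old="$1" new="$2" vg="$3"
  conf_dir=/etc/pve/nodes/proxmox/qemu-server
  echo "qm stop $old"
  echo "lvrename $vg vm-$old-disk-0 vm-$new-disk-0"
  echo "sed -i 's/vm-$old-disk-0/vm-$new-disk-0/' $conf_dir/$old.conf"
  echo "mv $conf_dir/$old.conf $conf_dir/$new.conf"
}

# Usage: renumber_vm_plan 100 999 prod
```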
nix-shell -p cloud-utils
growpart /dev/sda 1
resize2fs /dev/sda1
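A rough sketch wrapping the two resize commands so the partition path is derived from the disk name rather than typed twice. The helper names part_dev and grow_root are made up for illustration; the filesystem is assumed to be ext4, as above.

```shell
#!/bin/sh
# Print the partition device for a disk and partition number.
# Disks whose name ends in a digit (nvme0n1, mmcblk0) take a "p" separator.
part_dev() {
  case "$1" in
    *[0-9]) printf '%sp%s\n' "$1" "$2" ;;
    *)      printf '%s%s\n' "$1" "$2" ;;
  esac
}

# Grow partition $2 on disk $1, then grow the ext4 filesystem on it.
grow_root() {
  growpart "$1" "$2"
  resize2fs "$(part_dev "$1" "$2")"
}

# Usage: grow_root /dev/sda 1
```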
- Download iso and unpack
- Move iso to
/var/lib/vz/template/iso
- Create VM with adjustments:
I'm trying 2 cores now, utilization was low but we had spikes which I suspect were system stuff
- Start at boot
- SSD emulation, 48GiB
- 1 socket+ 2 cores, NUMA enabled
- 2048 MiB RAM
- Use proxmox under datacenter to configure a backup schedule.
The following should keep a rolling 4-weekly history.
- Sunday 0100
- Notify only on fail
- Keep weekly 4
- Boot machine and follow installer
- Add PCIe ethernet controllers
- Boot system and root login
- Assign WAN and LAN interfaces to ethernet controllers
- Check for updates, either opkg update or maybe system > firmware
- Add static DHCP leases for any machines using static IPs so Unbound will serve records for them
- Install an intermediate cert and its corresponding bundle under system > trust
- Switch to using the TLS certificate under System > Settings > Administration
- Set both interfaces to delete protected and IPv6 SLAAC
- Under System > General:
- set hostname
- set domain
- configure DNS servers
- Disallow DNS override on WAN
- Reporting > netflow set capture on
- System > settings > cron
- once daily to update the block lists
- once weekly after the backup is taken (this ensures we can restore)
- Follow Ben Tasker's stuff
- Configure Unbound DNS service
- enable DNSSEC
- enable DHCP lease registration
- Disallow system nameservers in DoT and add records with blank domains+port 853
- Enable blocklists
- OISD Ads
- Steven Black
- Hagezi Multi Pro++
- Enable data capture
- Firewall
- Add aliases for static boxes, localhost
- Create a NAT port-forward:
- LAN interface
- IPv4+6
- TCP+UDP
- Invert
- Destination LAN net
- from dns to dns
- Redirect target Localhost:53
- Test
- DNS redirection:
- Unbound host override bing.com to something
- Check this returns the override
dig +trace @4.4.4.4 bing.com
- Ad blocking https://d3ward.github.io/toolz/adblock.html
- Block bad IPs
- Add aliases for IP lists (see references)
- Add firewall rules: Either floating destination FireHOL level 3 + Spamhaus OR WAN outbound FireHOL Level 1 + WAN inbound all bad IPs
- Unbound DoT tutorial
- DNS tutorial
- OPNsense forum thread
- Blocking blog post
- Fedi posts about it alt server Spamhaus
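The DNS redirection test can be scripted along these lines. This is a sketch only: answer_matches is a made-up helper, and 192.0.2.1 stands in for whatever address the Unbound host override for bing.com points at.

```shell
#!/bin/sh
# Succeed if the expected address appears as a whole line in the answers.
# $1: newline-separated answers (e.g. from dig +short), $2: expected address
answer_matches() {
  printf '%s\n' "$1" | grep -qxF -- "$2"
}

# Usage: query an outside resolver; if the NAT rule traps the query,
# the override comes back instead of the real record.
# answer_matches "$(dig +short @4.4.4.4 bing.com A)" 192.0.2.1 && echo "redirect OK"
```

Matching whole lines avoids a false positive where the expected address is only a prefix of the returned one.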
Follow one of the 6000 tutorials AKA yes, I forgot to document it.
Follow tutorial AKA forgot to document it.
See also wg0.conf in this repo.
/usr/local/etc/rc.syshook.d/update/99-alacritty-terminal:
#!/bin/sh
# Configures terminal for Alacritty
curl -sSL https://raw.githubusercontent.com/alacritty/alacritty/master/extra/alacritty.info -o alacritty.info
infotocap alacritty.info >> /usr/share/misc/termcap
cap_mkdb /usr/share/misc/termcap
rm alacritty.info
- NextCloud backup, configure with an app key.
- Git backup. Create an uninitialized repository and provide API key and HTTPS URL.
- Prometheus exporter for monitoring.
- DynamicDNS client, configure with AWS Access Key.
- tftp plugin (unmaintained but workable) src. Make directory /usr/local/tftp and download netboot.xyz.kpxe. I also downloaded netboot.xyz.efi for good measure. Enable TFTP and set listening IP to 0.0.0.0. This defaulted to 127.0.0.1 which may have worked but I didn't test.
- ACME client tutorial
- Install os-wol to wake on LAN. Add all physical machines to the list of known, you can use ISC DHCP leases to find all the MACs in one place.
- Optionally: themes (rebellion)
Notes:
I will revisit the resources supplied after running the box for a bit.
CPU seems fine, spiky with what I think are Python runtime startups from the control layer. RAM looks consistently under about 1GB so I'll trim that back from the recommended minimum 2GB. We're doing pretty well on space too but I'm less short on that.
References:
- Access Proxmox directly
- Visit Backup storage and confirm there is an intact and appropriate snapshot image to restore.
- Disable VM protection and delete it.
- Select backup image and restore to VM ID 999. Enable start on boot and disable unique feature.
- Use GUI installer to deploy, the CLI one's a pain, 12GiB storage minimum, 4 cores/8GiB ram (min 4+ GiB). We'll scale it down later, it bombs completely without plenty of RAM.
- Thing's a pain to bootstrap and the web console is limited
Sudo edit /etc/nixos/configuration.nix
- Enable the openssh service
security.sudo.wheelNeedsPassword = false
- Disable OSprober by removing the line
- Set disk source to
/dev/sda
- I'm not sure those last 2 make much of a difference once the machine is under flake control
- Rebuild
- Bounce the machine so it releases using the new hostname
- Then pull down some keys to get in, it should have already DHCP'd over the bridge network.
mkdir -p ~/.ssh && curl https://github.com/arichtman.keys -o ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys
- Upgrade the system
sudo nixos-rebuild switch --upgrade --upgrade-all
- Reboot to test
- Clear history
sudo nixos-rebuild list-generations
sudo rm /nix/var/nix/profiles/system-#-link
sudo nix-collect-garbage --delete-old
- Adjust hardware down to 1/2GiB.
- Make template
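The key-fetch step above overwrites authorized_keys, and the append variant used on Proxmox duplicates entries on every re-run. A minimal sketch of a merge that is safe to repeat, reading keys on stdin; the function name merge_keys and its optional path argument are assumptions for illustration.

```shell
#!/bin/sh
# Add each key read from stdin to an authorized_keys file, skipping
# keys that are already present.
# $1: target file (hypothetical parameter; defaults to ~/.ssh/authorized_keys)
merge_keys() {
  auth="${1:-$HOME/.ssh/authorized_keys}"
  mkdir -p "$(dirname "$auth")"
  touch "$auth"
  chmod 600 "$auth"
  # The "|| [ -n ... ]" keeps a final line that lacks a trailing newline.
  while IFS= read -r key || [ -n "$key" ]; do
    [ -n "$key" ] || continue
    grep -qxF -- "$key" "$auth" || printf '%s\n' "$key" >> "$auth"
  done
}

# Usage: curl -fsSL https://github.com/arichtman.keys | merge_keys
```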
Arguably this mingles with substratum, as PKI/trust/TLS is required or very desirable for VPN/HTTPS etc. SPIFFE/SPIRE will address this somewhat.
- Create root CA
xkcdpass --delimiter - --numwords 4 > root-ca.pass
step certificate create "ariel@richtman.au" ./root-ca.pem ./root-ca-key.pem --profile root-ca --password-file ./root-ca.pass
- Distribute the intermediate certificates and keys
- Secure the root CA, it's a bit hidden but Bitwarden does take attachments.
- Publish the root CA, with my current setup this meant uploading it to s3.
- Update the sha256 for the root certificate fetchUrl call
garage layout assign --zone garage.services.richtman.au --capacity 128GB $(garage node id 2>/dev/null)
garage layout apply --version 1
# Create a client certificate with admin
step certificate create cluster-admin cluster-admin.pem cluster-admin-key.pem \
--ca ca.pem --ca-key ca-key.pem --insecure --no-password --template granular-dn-leaf.tpl --set-file dn-defaults.json --not-after 8760h \
--set organization=system:masters
# Construct the kubeconfig file
# Here we're embedding certificates to avoid breaking stuff if we move or remove cert files
kubectl config set-cluster home --server https://fat-controller.systems.richtman.au:6443 --certificate-authority ca.pem --embed-certs=true
kubectl config set-credentials home-admin --client-certificate cluster-admin.pem --client-key cluster-admin-key.pem --embed-certs=true
kubectl config set-context --user home-admin --cluster home home-admin
- Create private key
openssl genpkey -out klient-key.pem -algorithm ed25519
- Create CSR
openssl req -new -config klient.csr.conf -key klient-key.pem -out klient.csr
export KLIENT_CSR=$(base64 klient.csr | tr -d "\n")
- Submit the CSR to the cluster
envsubst -i klient-csr.yaml | kubectl apply -f -
- Approve the request
kubectl certificate approve user
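The contents of klient-csr.yaml aren't shown here, so the following is a guess at its shape: a sketch that renders a CertificateSigningRequest manifest directly from the CSR file, without the exported variable. The function name render_csr and the object name passed to it are assumptions; the signer and usage shown are the standard ones for API server client certificates.

```shell
#!/bin/sh
# Print a CertificateSigningRequest manifest for a client certificate.
# $1: name for the CSR object, $2: path to the PEM CSR file
render_csr() {
  # The API expects the PEM CSR base64-encoded on a single line.
  request="$(base64 < "$2" | tr -d '\n')"
  cat <<EOF
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: $1
spec:
  request: $request
  signerName: kubernetes.io/kube-apiserver-client
  usages:
    - client auth
EOF
}

# Usage: render_csr klient klient.csr | kubectl apply -f -
```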
For security reasons, it's not possible for nodes to self-select roles.
We can label our nodes using label.sh.
hmmm, deleting the nodes (reasonably) removes labels. ...and since they can't self-identify, we have to relabel every time. I expect taints would work the same way, so we couldn't use a daemonset or spread topology with labeling privileges since it wouldn't know what to label the node. Unless... we deploy it with a configMap? That's kinda lame. I suppose all the nodes that need this are dynamic, ergo ephemeral and workers, so we could make something like that. Heck, a static pod would work fine for this and be simple as. But then it'd be a pod, which is a continuous workload, which we really don't need. A job would suit better, but then it's like, why even run this on the nodes themselves? Have the node self-delete (it'll self-register again anyway), and have the admin box worry about admin like labelling. I wonder if there's any better way security-wise to have nodes be trusted with certain labels. Already they need apiServer-trusted client certificates, it'd be cool if the metadata on those could determine labels.
There is a way to tell the Kubelet to register with labels but it's limited to a specific group. I doubt the Kubelet has an option to open that up and since we're getting denied even starting the binary it's probably not settable on the APIserver.
kubectl get csr --no-headers -o jsonpath='{.items[*].metadata.name}' | xargs -r kubectl certificate approve
Checking builds manually: nix build .#nixosConfigurations.fat-controller.config.system.build.toplevel
Minimal install ~3.2 gigs
Lab-node with master node about 3.2 gb also, so will want more headroom.
Add to nomicon
- fakesha256
- nix-prefetch-url > hash.txt
Using tasker
Profile: AutoPrivateDNS
State: Wifi Connected [ SSID:sugar_monster_house MAC:* IP:* Active:Any ]
Enter: Anon
A1: Custom Setting [ Type:Global Name:private_dns_mode Value:off Use Root:Off Read Setting To: ]
Exit: Anon
A1: Custom Setting [ Type:Global Name:private_dns_mode Value:hostname Use Root:Off Read Setting To: ]
nix shell nixpkgs#android-tools -c adb shell pm grant net.dinglisch.android.taskerm android.permission.WRITE_SECURE_SETTINGS
References:
Trust chain system install:
sudo security add-trusted-cert -r trustRoot -k /Library/Keychains/System.keychain -d ~/Downloads/root-ca.pem
OPNsense/openssl's ciphers are too new, to install client certificate you may need to pkcs12 bundle legacy.
openssl pkcs12 -export -legacy -out Certificate.p12 -in certificate.pem -inkey key.pem
- Update everything
softwareupdate -ia
- Optionally install rosetta
softwareupdate --install-rosetta --agree-to-license
I didn't explicitly install it but it's on there somehow now. There was some mention that it auto-installs if you try running x86_64 binaries.
- Determinate Systems install nix
curl --proto '=https' --tlsv1.2 -sSf -L https://install.determinate.systems/nix | sh -s -- install
- Until this is resolved nix-darwin/nix-darwin#149
sudo mv /etc/nix/nix.conf /etc/nix/.nix-darwin.bkp.nix.conf
- Nix-Darwin build and run installer
nix-build https://github.com/LnL7/nix-darwin/archive/master.tar.gz -A installer
./result/bin/darwin-installer
edit default configuration.nix? n
# Accept the option to manage nix-darwin using nix-channel or else it bombs
manage using channels? y
add to bashrc y
add to zshrc? y
create /run? y
# a nix-channel call will now fail
Bootstrapping:
- do the xcode-install method
- Build manually once
nix build github:arichtman/nix#darwinConfigurations.macbook-pro-work.system
- Switch manually once
./result/sw/bin/darwin-rebuild switch --flake .#macbook-pro-work
- If bootstrapped, build according to flake
./result/sw/bin/darwin-rebuild switch --flake github:arichtman/nix
To do: look into Nix VMs on Mac
some very wip notes about the desktop.
- Installer with nVidia drivers worked ok in simplified mode
- Despite the claims of signing automation for secure boot it still needs to be disabled, 'less you like 800x600.
- Bluetooth pair the speaker though you may have to change the codec in settings > sound
- I ran bluetoothctl trust $MAC to try and start off autoconnect
- I fiddled about in display settings to get orientation of monitors correct
- Used the Determinate Systems Nix installer
- Added trusted-users = @wheel to /etc/nix/nix.conf
- Used nix shell helix home-manager to bootstrap home-manager switch --flake . -b backup
- Installed my root certificate
sudo curl https://www.richtman.au/root-ca.pem -o source/anchors/root-ca.pem
sudo update-ca-trust
- Enabled WoL tutorial
- Set resolved's upstream DNS from DHCPv4, figure out what to do about v6 dynamic DNS server.
- Fix Firefox image pasting
- Get CLI clipboard access post
- Learn about universal blue/ostree and decide if I want to keep this
- find the proper fix to not sourcing the nix-daemon script that sets PATH correctly
- look into errors running tracker-miner-fs-3.service
- Work out how to uninstall nano-default-editor
rpm-ostree override remove
- Fix Zellij exits still leaving you in a Bash session.
- Make Alacritty visible on the launch pad or whatever it's called
- Opinionated flake structure
- Home-manager configuration options
- Misterio77's starter configs
- Just generally sucking at it, spelunking nixpkgs and NixOS-WSL source Nix files
- Jake Hamilton videos
- Nebucatnetzer's config
- Post about inline nix use helm