
Homelab Proxmox + VyOS + Debian setup from scratch


Here I document my home server config. I integrate the router into the same box by using a second USB ethernet adapter as the WAN port. Running Proxmox, I can have a Debian installation for my usual stuff and a separate router VM for routing/firewall/adblock. Since setting this up in 2022, the setup has grown considerably.


Goal

I’m looking for a small and energy efficient server with some storage capability.

Todo

  1. Set up warning in case thin volumes exceed usage of thin pool (autoextend? source? (sleeplessbeastie.eu)); see the autoextend sketch after this list
  2. fstrim guest os volumes after rebooting pve host (source (askubuntu.com))
  3. Move Home Assistant to separate VM – less nesting –> done
  4. Reconfigure disk sizes, 8GB was a bit too conservative for proxmox –> set to 12GB by claiming swap, OK for now
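For the first todo item: LVM can grow thin pools automatically via dmeventd. A sketch of the relevant /etc/lvm/lvm.conf knobs (the 80/20 values are my arbitrary picks, not something I have tested here):

activation {
    # auto-extend a thin pool once its usage crosses 80%...
    thin_pool_autoextend_threshold = 80
    # ...growing it by 20% of its size each time
    thin_pool_autoextend_percent = 20
    # thin pools must be monitored by dmeventd for this to trigger
    monitoring = 1
}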

Hardware

As stated above, I’ve settled on a NUC with an extra 2.5" bay, which suits my needs.

Target services & architecture

I considered the following virtualization software:

For routing, I considered the following platforms:

In the end I settled on Proxmox + VyOS.

Software distribution

I went with Ubuntu Server 20.04 LTS before, but it had frequent updates (daily, it felt, and not just security fixes). On top of that, Ubuntu’s ideological direction is a bit off course, hence for this setup I’m using Debian Stable instead, which incidentally is the same host OS used by Proxmox.

Setting up Proxmox

Download ISO

Get ISO from proxmox (proxmox.com) and copy to USB disk (proxmox.com) on macOS using hdiutil and dd:

hdiutil convert proxmox-ve_*.iso -format UDRW -o proxmox-ve.dmg
diskutil list
diskutil unmountDisk /dev/diskX
# note: macOS' BSD dd wants a lowercase unit suffix (1m)
sudo dd if=proxmox-ve.dmg bs=1m of=/dev/rdiskX

Installation

LVM settings

Install Proxmox as described on their wiki (proxmox.com), reserving a small part for the OS and most disk space for VMs.

ZFS settings

I use ZFS RAID0 on two disks with default settings (ashift=12, compress=on, checksum=on, copies=1, ARC max size=3201MB, hdsize=1800GB), using the SSD as primary disk.

Initial networking

In my Proxmox-as-router setup, I use a USB ethernet dongle for the WAN connection and the onboard NIC for LAN purposes. I set the LAN-facing NIC as the management interface, and leave IP/gateway/DNS as-is during installation; these will be fixed later. My initial /etc/network/interfaces looks like:

cat << 'EOF' > /etc/network/interfaces
auto lo
iface lo inet loopback

auto enx7c10c9194780
iface enx7c10c9194780 inet dhcp
#iface enx7c10c9194780 inet manual

#auto vmbr1
#iface vmbr1 inet manual
#  bridge-ports enx7c10c9194780
#  bridge-stp off
#  bridge-fd 0
##WAN

iface eno1 inet manual

auto vmbr0
iface vmbr0 inet manual
  bridge-ports eno1
  bridge-stp off
  bridge-fd 0
  bridge-vlan-aware yes
  bridge-vids 2-4094
#LAN

auto vmbr0.10
iface vmbr0.10 inet static
  address 172.17.10.6/24
  #gateway 172.17.10.1
#Mgmt interface
EOF

service networking restart

Now connect to 172.17.10.6 over VLAN 10 to further configure the machine.
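Since the management IP sits on tagged VLAN 10, the client needs a tagged sub-interface too. A minimal sketch for a Linux client, assuming the client NIC is eth0 and 172.17.10.2 is a free address:

ip link add link eth0 name eth0.10 type vlan id 10
ip addr add 172.17.10.2/24 dev eth0.10
ip link set eth0.10 up
# web UI is then at https://172.17.10.6:8006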

Post-installation

Optional: post-install fixes, sourced from Proxmox Helper Scripts (github.io). Never blindly run scripts from the internet; copy the relevant bits instead.

apt-get update
apt-get dist-upgrade
apt install sudo

# From https://raw.githubusercontent.com/tteck/Proxmox/main/misc/post-pve-install.sh
# Disable Enterprise Repository
sed -i "s/^deb/#deb/g" /etc/apt/sources.list.d/pve-enterprise.list
sed -i "s/^deb/#deb/g" /etc/apt/sources.list.d/ceph.list

# Enable No-Subscription Repository https://pve.proxmox.com/wiki/Package_Repositories#sysadmin_no_subscription_repo
cat << 'EOF' >>/etc/apt/sources.list
# Enable No-Subscription Repository https://pve.proxmox.com/wiki/Package_Repositories#sysadmin_no_subscription_repo
deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription
EOF

# Optional: disable Subscription Nag
echo "DPkg::Post-Invoke { \"dpkg -V proxmox-widget-toolkit | grep -q '/proxmoxlib\.js$'; if [ \$? -eq 1 ]; then { echo 'Removing subscription nag from UI...'; sed -i '/data.status/{s/\!//;s/active/NoMoreNagging/}' /usr/share/javascript/proxmox-widget-toolkit/proxmoxlib.js; }; fi\"; };" >/etc/apt/apt.conf.d/no-nag-script
apt --reinstall install proxmox-widget-toolkit &>/dev/null

# Upgrade proxmox now
apt-get update
apt-get dist-upgrade
reboot

# remove unused kernels (from https://raw.githubusercontent.com/tteck/Proxmox/main/misc/kernel-clean.sh)
dpkg --list | grep 'kernel-.*-pve' | awk '{print $2}' | grep -v $(uname -r) | sort -V
apt purge proxmox-kernel-6.8.4-2-pve-signed
proxmox-boot-tool refresh

Limit journal filesize to 100M

# set SystemMaxUse=100M
sed -i "s/^.SystemMaxUse=/SystemMaxUse=100M/g" /etc/systemd/journald.conf
grep SystemMaxUse /etc/systemd/journald.conf

service systemd-journald restart

Add regular user with sudo power:

adduser tim

usermod -aG sudo,adm tim
mkdir -p ~tim/.ssh/
touch ~tim/.ssh/authorized_keys
chown -R tim:tim ~tim/.ssh
chmod og-rwx ~tim/.ssh/authorized_keys
cat << 'EOF' >>~tim/.ssh/authorized_keys
ssh-rsa AAAAB...
EOF

Now forbid root login for SSH and forbid password authentication (use public key only):

sed -i "s/^PermitRootLogin yes/PermitRootLogin no # TvW disallow root login/g" /etc/ssh/sshd_config
grep PermitRootLogin /etc/ssh/sshd_config

sed -i "s/^.PasswordAuthentication yes/PasswordAuthentication no # TvW disallow password (only allow pubkey)/g" /etc/ssh/sshd_config
grep PasswordAuthentication /etc/ssh/sshd_config

sshd -t
systemctl restart ssh

Optional: add (lower privileged) user to Proxmox VE:

pveum user add tim@pve -firstname "Tim"
pveum passwd tim@pve
pveum acl modify / -user tim@pve -role PVEVMAdmin

Enable colors in shell & vim:

sed -i "s/^# alias /alias /" ~/.bashrc

cat << 'EOF' >> ~/.bashrc
alias grep='grep --color=auto'
alias fgrep='fgrep --color=auto'
alias egrep='egrep --color=auto'
EOF

Set the ondemand cpu governor for power saving. Add intel_pstate=disable to the boot parameters (2019) (linuxquestions.org) (details here (proxmox.com)):

sed -i 's/^GRUB_CMDLINE_LINUX_DEFAULT="quiet/GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_pstate=disable/' /etc/default/grub
grep GRUB_CMDLINE_LINUX_DEFAULT /etc/default/grub
# vi /etc/default/grub
# GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_pstate=disable"
update-grub && /usr/sbin/reboot

# or if using systemd-boot:
vi /etc/kernel/cmdline
/usr/sbin/proxmox-boot-tool refresh && reboot

# confirm parameter was passed:
sudo dmesg | grep "Command line"

cat /sys/devices/system/cpu/cpufreq/policy0/scaling_driver
cat /sys/devices/system/cpu/cpufreq/policy0/scaling_available_governors

Set the scaling governor to schedutil (kernel.org) (superior to ondemand and conservative):

apt install cpufrequtils
echo schedutil | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
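The echo above does not survive a reboot. Since cpufrequtils is installed anyway, its Debian defaults file can pin the governor at boot; a sketch (assuming the packaged sysv service is active, untested with schedutil):

echo 'GOVERNOR="schedutil"' | sudo tee /etc/default/cpufrequtils
sudo systemctl restart cpufrequtils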

Alternatively try enabling speedstep in BIOS (2018) (stackoverflow.com) (not needed for my setup).

Measure power consumption:
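One software-side option is Intel’s RAPL counter (package power only; for actual wall power, use a plug-in energy meter); a sketch:

# cumulative package energy in microjoules; read twice, divide the delta by the interval
cat /sys/class/powercap/intel-rapl:0/energy_uj; sleep 10; cat /sys/class/powercap/intel-rapl:0/energy_uj
# or interactively:
apt install powertop
powertop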

Optional tweaks & bugfixes

Optional: assign USB dongle interface a nice name (stackexchange.com). N.B. this breaks proxmox recognizing the adapter as network interface in the GUI, disabling some configuration options.

echo 'SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="7c:10:c9:19:47:80", NAME="usb0"' | tee -a /etc/udev/rules.d/70-persistent-net.rules

echo "auto usb0
iface usb0 inet dhcp" | tee -a /etc/network/interfaces

udevadm control --reload-rules && udevadm trigger

Optional: Fix boot delay after bluetooth driver error. Looks like it’s caused by DHCP timeout (askubuntu.com) on unconnected ethernet port, leave as is. Add intel-ibt-17* bluetooth driver for NUC –> does not work, conflicts with proxmox kernel. Wait until adopted in main PVE kernel.

Bugfix: the bridge is brought up before the physical port exists: “error: vmbr1: bridge port enx000ec6955446 does not exist”. Options (see the sketch after this list):

  1. Increase bridge_maxwait to 40s
  2. Alternative: increase bridge_waitport?
  3. Try something else
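A sketch of option 1 in /etc/network/interfaces (bridge_maxwait is the classic bridge-utils/ifupdown knob; whether it fully fixes the race here is untested):

auto vmbr1
iface vmbr1 inet manual
  bridge-ports enx000ec6955446
  bridge-stp off
  bridge-fd 0
  bridge_maxwait 40
#WAN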

Setting up Proxmox storage

I have two disks of different speeds (PCIe & SATA SSD). I want the VMs to run on the fast disk and bulk data to live on the slow disk. With LVM, I ensure this by first extending the volume group over both disks (because Proxmox was installed on one of the disks, which should be the fast one), then creating two thin pools: the first one, for VMs, resides on the fast disk; the second spans the remainder of the fast disk plus the full slow disk. This wastes a small amount of space between the two thin pools, but ensures the VMs run on the fast disk.

With ZFS I haven’t found a way to organize this: the installer creates only a single pool, and while a disk can optionally be used as cache (SLOG/L2ARC), that requires a separate disk. Also, within one pool you cannot pin data to a specific disk, e.g. “In general, you should probably expect a pool’s performance to exhibit the worst characteristics of each vdev inside it. In practice, there’s no guarantee where reads will come from inside the pool – they’ll come from “whatever vdev they were written to”, and the pool gets to write to whichever vdevs it wants to for any given block(s).” (jrs-s.net). The advantage of ZFS is that it can compress on the fly, which is generally great for VM images.

Some other sources

  1. Performance comparison between ZFS and LVM (proxmox.com)

ZFS

Ars (arstechnica.com) has a good background article on zfs to get you started, this Reddit post (reddit.com) clarifies how Proxmox sets it up by default, and the PVE manual is also useful on ZFS (proxmox.com).

TODO: maybe set vm.swappiness = 10
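Should I get to it, persisting that would be a one-liner (99-swappiness.conf is an arbitrary filename):

echo "vm.swappiness = 10" > /etc/sysctl.d/99-swappiness.conf
sysctl --system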

Initial zfs pool config looks something like this:

zpool status
  pool: rpool
 state: ONLINE
config:

  NAME                                                       STATE     READ WRITE CKSUM
  rpool                                                      ONLINE       0     0     0
    ata-ST2000LM007-1R8174_WCC0HHLE-part3                    ONLINE       0     0     0
    ata-Samsung_SSD_850_EVO_M.2_250GB_S33CNX0J508589Y-part3  ONLINE       0     0     0

errors: No known data errors
zpool list -v
NAME                                                        SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
rpool                                                      2.04T  1.47G  2.04T        -         -     0%     0%  1.00x    ONLINE  -
  ata-ST2000LM007-1R8174_WCC0HHLE-part3                    1.82T  1.05G  1.81T        -         -     0%  0.05%      -    ONLINE
  ata-Samsung_SSD_850_EVO_M.2_250GB_S33CNX0J508589Y-part3   232G   436M   230G        -         -     0%  0.18%      -    ONLINE
zfs list
NAME               USED  AVAIL  REFER  MOUNTPOINT
rpool             1.47G  1.97T   104K  /rpool
rpool/ROOT        1.47G  1.97T    96K  /rpool/ROOT
rpool/ROOT/pve-1  1.47G  1.97T  1.47G  /
rpool/data          96K  1.97T    96K  /rpool/data
rpool/var-lib-vz    96K  1.97T    96K  /var/lib/vz

By default pve creates these entries:

  1. rpool is the ZFS pool
  2. rpool/ROOT dataset for root?
  3. rpool/ROOT/pve-1 dataset for pve
  4. rpool/data parent dataset for VM images (zvols) (freebsd.org)
  5. rpool/var-lib-vz dataset mounted at /var/lib/vz for backups

Part of this is reflected in /etc/pve/storage.cfg:

cat /etc/pve/storage.cfg
dir: local
  path /var/lib/vz
  content iso,vztmpl,backup

zfspool: local-zfs
  pool rpool/data
  sparse
  content images,rootdir

Expand over all disks

If your second disk is larger than your boot disk, the initial zfs pool doesn’t use the full second disk, e.g. here my sda is only used for 50%, likely because the boot disk (nvme0n1) is half the size of sda:

root@pve:/home/tim# lsblk
NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
sda           8:0    0  3.6T  0 disk 
├─sda1        8:1    0 1007K  0 part 
├─sda2        8:2    0    1G  0 part 
└─sda3        8:3    0  1.8T  0 part 
nvme0n1     259:0    0  1.8T  0 disk 
├─nvme0n1p1 259:1    0 1007K  0 part 
├─nvme0n1p2 259:2    0    1G  0 part 
└─nvme0n1p3 259:3    0  1.8T  0 part 

To expand, follow this guide (doussan.info):

/usr/sbin/cfdisk
# Expand partition manually
zpool online -e rpool ata-Samsung_SSD_860_EVO_4TB_S45JNB0M500432F-part3
zpool list -v

Profit!

Tweak & configure

Now we tweak the initial zfs config:

  1. Reserve 20GB as minimum for PVE
  2. 150GB quota (github.io) for backups so it doesn’t choke the rest
  3. Set recordsize=16k, atime=off, and logbias=throughput to minimize (unixtutorial.org) wear (serverfault.com)
  4. Split data volume into one for VMs (vmdata) and bulk (bulkdata)

zpool set autotrim=on rpool
zfs set reservation=20GB rpool/ROOT/pve-1
zfs set quota=150G rpool/var-lib-vz
zfs set recordsize=16k atime=off rpool

zfs rename rpool/data rpool/vmdata
sed -i 's/rpool\/data/rpool\/vmdata/' /etc/pve/storage.cfg

Now expand by creating second pool on remainder of fast disk + all of slow disk for static data:

zpool create tank nvme0n1p5 sda
zpool set autotrim=on tank
zfs set atime=off logbias=throughput exec=off tank

zfs create -o compress=zstd tank/bulk
zfs create -o compress=zstd tank/backups
zfs create -o compress=zstd tank/backups/backupsmba
zfs set quota=300G tank/backups/backupsmba
zfs create -o compress=zstd tank/backups/backupsmbp
zfs set quota=1200G tank/backups/backupsmbp
zfs create -o compress=zstd tank/backups/data-tim
zfs create -o compress=zstd tank/backups/data-helene

reboot

Set up periodic scrubbing, weekly for OS disk, monthly for data disk:

sudo systemctl enable zfs-scrub-monthly@tank.timer --now
sudo systemctl enable zfs-scrub-weekly@rpool.timer --now

The resulting view looks like:

zfs list
NAME                       USED  AVAIL  REFER  MOUNTPOINT
rpool                     20.0G  1.95T   104K  /rpool
rpool/ROOT                20.0G  1.95T    96K  /rpool/ROOT
rpool/ROOT/pve-1          1.34G  1.97T  1.34G  /
rpool/backups              392K  1.95T   104K  /rpool/backups
rpool/backups/backupsmba    96K  1.95T    96K  /rpool/backups/backupsmba
rpool/backups/backupsmbp    96K  1.95T    96K  /rpool/backups/backupsmbp
rpool/backups/backupstex    96K  1.95T    96K  /rpool/backups/backupstex
rpool/bulkdata             192K  1.95T    96K  /rpool/bulkdata
rpool/bulkdata/bulk         96K  1.95T    96K  /rpool/bulkdata/bulk
rpool/var-lib-vz            96K   500G    96K  /var/lib/vz
rpool/vmdata                96K  1.95T    96K  /rpool/vmdata

You can see that rpool/var-lib-vz decreased in AVAIL to the 500GB quota, and that the maximum AVAIL for everything except rpool/ROOT/pve-1 has dropped by the 20GB reservation. The nice thing about ZFS is that the PVE root is also part of the main pool, while for LVM this (by default) is not the case, i.e. with LVM you might have created a PVE root partition that is too small (really annoying) or too big (wastes space).

Measure power consumption on backup NUC (NUC5i3RYH, 256GB m.2 SSD + 2TB 2.5" HDD):

Initial load: 20:13:35 up 9 min, 1 user, load average: 0.13, 0.22, 0.13

LVM

Expand LVM over two disks

Sources here (serverfault.com), here (kenmoini.com), here (proxmox.com), here (proxmox.com) and here (sleeplessbeastie.eu).

Find the disk to extend LVM onto, create a full-disk LVM partition, and wipe previous LVM configs:

lsblk -o name,size,type,model,serial
cfdisk /dev/sda
pvremove -y -ff /dev/sda*

Find the disk by serial and create a new physical volume (PV) for LVM. If the disk contained previous partitions, pvcreate may warn about existing signatures (hence the pvremove above):

ls /dev/disk/by-id/*WCC0HHLE*
pvcreate /dev/disk/by-id/ata-ST2000LM007-1R8174_WCC0HHLE-part1
pvs -a

Now extend the existing volume group onto it:

vgextend pve /dev/disk/by-id/ata-ST2000LM007-1R8174_WCC0HHLE-part1
vgs -a

Remove the ‘local-lvm’ storage via the GUI or pvesm to start with a clean slate.

pvesm remove local-lvm

Now remove the default data thin pool; we recreate our own pools below (extending might also have worked):

lvremove pve/data
lvs -a
# tim@pve2:~$ lvs -a
#   LV   VG  Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
#   root pve -wi-ao---- 20.00g                                                    
#   swap pve -wi-ao----  2.00g                                                    

Set up LVM thin pools & thin volumes

My setup looks something like:

Show current setup

lvs -a
pvdisplay
lvdisplay

Create the data setup in LVM. In my case sda is slow, sdb is fast:

# Create thinpool for vms on fast disk. I want 0.5TB. In case the disk is big enough to hold this, run
lvcreate --thin -L 0.5T pve/thinpool_vms /dev/sdb3

# If the disk is not big enough and you need to expand over the second disk, check the available extents on the fast disk, create thinpool to fill this up
pvdisplay -m /dev/sdb3 | grep "Free PE"
lvcreate --thin -l53730 pve/thinpool_vms /dev/sdb3
lvextend -L 0.5T /dev/mapper/pve-thinpool_vms
# Create thinpool on remainder of fast disk + slow disk
lvcreate --thin -l 100%FREE pve/thinpool_data

Ensure the data split across disks was successful.

pvdisplay -m /dev/sda1
pvdisplay -m /dev/sdb3
lvdisplay -m /dev/pve/thinpool_vms
lvdisplay -m /dev/pve/thinpool_data

# Create LVs for future use on thinpool_data
lvcreate --thinpool pve/thinpool_data --name lv_bulk --virtualsize 1.2T
lvcreate --thinpool pve/thinpool_data --name lv_backup_vms --virtualsize 0.5T
lvcreate --thinpool pve/thinpool_data --name lv_backup_mbp --virtualsize 1T
lvcreate --thinpool pve/thinpool_data --name lv_backup_mba --virtualsize 0.25T
# lvcreate --thinpool pve/thinpool_data --name lv_backup_tex --virtualsize 1.25T
lvcreate --thinpool pve/thinpool_data --name lv_data_tim --virtualsize 0.75T
lvcreate --thinpool pve/thinpool_data --name lv_data_helene --virtualsize 0.75T

Allow freeing up of unused space on thin volumes (see source (askubuntu.com))

vi /etc/lvm/lvm.conf
# issue_discards = 1
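A sed one-liner instead of hand-editing (assuming the stock Debian lvm.conf, which I believe ships issue_discards = 0 in the devices section; verify with the grep):

sed -i 's/issue_discards = 0/issue_discards = 1/' /etc/lvm/lvm.conf
grep issue_discards /etc/lvm/lvm.conf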

Create filesystems on drive

mkfs.ext4 /dev/mapper/pve-lv_bulk
mkfs.ext4 /dev/mapper/pve-lv_backup_vms
mkfs.ext4 /dev/mapper/pve-lv_backup_mbp
mkfs.ext4 /dev/mapper/pve-lv_backup_mba
# mkfs.ext4 /dev/mapper/pve-lv_backup_tex
mkfs.ext4 /dev/mapper/pve-lv_data_tim
mkfs.ext4 /dev/mapper/pve-lv_data_helene

Mount in Proxmox

mkdir /mnt/bulk
mkdir -p /mnt/backup/{vms,mba,mbp,tex,data-tim,data-helene}
mount /dev/mapper/pve-lv_bulk /mnt/bulk/
mount /dev/mapper/pve-lv_backup_vms /mnt/backup/vms
mount /dev/mapper/pve-lv_backup_mbp /mnt/backup/mbp
mount /dev/mapper/pve-lv_backup_mba /mnt/backup/mba
# mount /dev/mapper/pve-lv_backup_tex /mnt/backup/tex
mount /dev/mapper/pve-lv_data_tim /mnt/backup/data-tim
mount /dev/mapper/pve-lv_data_helene /mnt/backup/data-helene

chmod og-rx /mnt/backup/{vms,mba,mbp,tex,data-tim,data-helene}
chmod og-rx /mnt/bulk/

Ensure automount

cat << 'EOF' >>/etc/fstab
/dev/mapper/pve-lv_bulk       /mnt/bulk        ext4  defaults  0  2
/dev/mapper/pve-lv_backup_vms /mnt/backup/vms  ext4  defaults  0  2
/dev/mapper/pve-lv_backup_mbp /mnt/backup/mbp  ext4  defaults  0  2
/dev/mapper/pve-lv_backup_mba /mnt/backup/mba  ext4  defaults  0  2
/dev/mapper/pve-lv_backup_tex /mnt/backup/tex  ext4  defaults  0  2
/dev/mapper/pve-lv_data_tim /mnt/backup/data-tim  ext4  defaults  0  2
/dev/mapper/pve-lv_data_helene /mnt/backup/data-helene  ext4  defaults  0  2
EOF

Add directory to PVE storage manager

pvesm add dir backup --path /mnt/backup/vms --content vztmpl,iso,backup

Add thin pool to PVE storage manager

pvesm scan lvmthin pve
pvesm add lvmthin thinpool_vms --vgname pve --thinpool thinpool_vms

Push back backups from elsewhere & optionally resize disks/partitions (serverfault.com)

e2fsck -fy /dev/pve/vm-200-disk-0
resize2fs /dev/pve/vm-200-disk-0 300G
lvreduce -L 300G /dev/pve/vm-200-disk-0

# Edit LXC config in /etc/pve/lxc
#rootfs: thinpool_vms:vm-200-disk-0,size=300G

Final setup looks something like:

tim@pve:~$ sudo lvs -a
  LV                    VG  Attr       LSize   Pool          Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lv_backup_mba         pve Vwi-aotz-- 256.00g thinpool_data        2.96                                   
  lv_backup_mbp         pve Vwi-aotz--   1.00t thinpool_data        83.89                                  
  lv_backup_vms         pve Vwi-aotz-- 512.00g thinpool_data        96.31                                  
  lv_bulk               pve Vwi-aotz--   1.20t thinpool_data        97.45                                  
  lv_data_helene        pve Vwi-aotz-- 768.00g thinpool_data        0.44                                   
  lv_data_tim           pve Vwi-aotz-- 768.00g thinpool_data        0.48                                   
  [lvol0_pmspare]       pve ewi------- 128.00m                                                             
  root                  pve -wi-ao---- <11.08g                                                             
  swap                  pve -wi-a-----   1.00g                                                             
  thinpool_data         pve twi-aotz--  <4.95t                      50.64  28.42                           
  [thinpool_data_tdata] pve Twi-ao----  <4.95t                                                             
  [thinpool_data_tmeta] pve ewi-ao----  80.00m                                                             
  thinpool_vms          pve twi-aotz-- 512.00g                      37.40  22.63                           
  [thinpool_vms_tdata]  pve Twi-ao---- 512.00g                                                             
  [thinpool_vms_tmeta]  pve ewi-ao---- 128.00m                                                             
  vm-100-disk-0         pve Vwi-aotz--   8.00g thinpool_vms         97.57                                  
  vm-101-disk-0         pve Vwi-aotz--  32.00g thinpool_vms         96.35                                  
  vm-101-disk-1         pve Vwi-aotz--   4.00m thinpool_vms         0.00                                   
  vm-202-disk-0         pve Vwi-aotz--   8.00g thinpool_vms         50.22                                  
  vm-203-disk-0         pve Vwi-aotz-- 256.00g thinpool_vms         58.13                                  

Maintenance

Sometimes trimming might be required (how/when does this happen automatically?)
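Partially answering my own question: Debian ships util-linux’s fstrim.timer, which trims all mounted filesystems weekly. Enabling it inside each guest should cover most of the manual work below (a sketch, untested in my setup):

# inside each guest
systemctl enable fstrim.timer --now
systemctl list-timers fstrim.timer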

Before:

tim@pve:~$ sudo lvs -a
  LV                    VG  Attr       LSize   Pool          Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lv_backup_mba         pve Vwi-aotz-- 256.00g thinpool_data        2.96                                   
  lv_backup_mbp         pve Vwi-aotz--   1.00t thinpool_data        1.86                                   
  lv_backup_tex         pve Vwi-aotz--   1.25t thinpool_data        1.84                                   
  lv_backup_vms         pve Vwi-aotz-- 512.00g thinpool_data        69.36                                  
  lv_bulk               pve Vwi-aotz--   3.00t thinpool_data        36.66                                  
  [lvol0_pmspare]       pve ewi------- 128.00m                                                             
  root                  pve -wi-ao---- <11.08g                                                             
  swap                  pve -wi-a-----   1.00g                                                             
  thinpool_data         pve twi-aotz--  <4.95t                      30.24  21.56                           
  [thinpool_data_tdata] pve Twi-ao----  <4.95t                                                             
  [thinpool_data_tmeta] pve ewi-ao----  80.00m                                                             
  thinpool_vms          pve twi-aotz-- 512.00g                      41.91  25.68                           
  [thinpool_vms_tdata]  pve Twi-ao---- 512.00g                                                             
  [thinpool_vms_tmeta]  pve ewi-ao---- 128.00m                                                             
  vm-100-disk-0         pve Vwi-aotz--   8.00g thinpool_vms         97.56                                  
  vm-101-disk-0         pve Vwi-aotz--  32.00g thinpool_vms         96.18                                  
  vm-101-disk-1         pve Vwi-aotz--   4.00m thinpool_vms         0.00                                   
  vm-201-disk-0         pve Vwi-a-tz-- 300.00g thinpool_vms         41.88                                  
  vm-202-disk-0         pve Vwi-aotz--   8.00g thinpool_vms         86.02                                  
  vm-203-disk-0         pve Vwi-aotz-- 256.00g thinpool_vms         16.99                                  

Trimming

tim@pve:~$ sudo pct fstrim 201
/var/lib/lxc/201/rootfs/: 175.5 GiB (188471455744 bytes) trimmed
tim@pve:~$ sudo pct fstrim 101
Configuration file 'nodes/pve/lxc/101.conf' does not exist
tim@pve:~$ sudo pct fstrim 203
/var/lib/lxc/203/rootfs/: 213.1 GiB (228798124032 bytes) trimmed
tim@pve:~$ sudo pct fstrim 202
/var/lib/lxc/202/rootfs/: 5.4 GiB (5754052608 bytes) trimmed
tim@pve:~$ sudo fstrim /mnt/bulk
tim@pve:~$ sudo fstrim /mnt/backup/vms

After:

tim@pve:~$ sudo lvs -a
  LV                    VG  Attr       LSize   Pool          Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lv_backup_mba         pve Vwi-aotz-- 256.00g thinpool_data        2.96                                   
  lv_backup_mbp         pve Vwi-aotz--   1.00t thinpool_data        1.86                                   
  lv_backup_tex         pve Vwi-aotz--   1.25t thinpool_data        1.84                                   
  lv_backup_vms         pve Vwi-aotz-- 512.00g thinpool_data        39.00                                  
  lv_bulk               pve Vwi-aotz--   3.00t thinpool_data        36.66                                  
  [lvol0_pmspare]       pve ewi------- 128.00m                                                             
  root                  pve -wi-ao---- <11.08g                                                             
  swap                  pve -wi-a-----   1.00g                                                             
  thinpool_data         pve twi-aotz--  <4.95t                      27.17  20.62                           
  [thinpool_data_tdata] pve Twi-ao----  <4.95t                                                             
  [thinpool_data_tmeta] pve ewi-ao----  80.00m                                                             
  thinpool_vms          pve twi-aotz-- 512.00g                      41.28  25.53                           
  [thinpool_vms_tdata]  pve Twi-ao---- 512.00g                                                             
  [thinpool_vms_tmeta]  pve ewi-ao---- 128.00m                                                             
  vm-100-disk-0         pve Vwi-aotz--   8.00g thinpool_vms         97.56                                  
  vm-101-disk-0         pve Vwi-aotz--  32.00g thinpool_vms         96.18                                  
  vm-101-disk-1         pve Vwi-aotz--   4.00m thinpool_vms         0.00                                   
  vm-201-disk-0         pve Vwi-a-tz-- 300.00g thinpool_vms         41.88                                  
  vm-202-disk-0         pve Vwi-aotz--   8.00g thinpool_vms         34.82                                  
  vm-203-disk-0         pve Vwi-aotz-- 256.00g thinpool_vms         17.32                                  

Resizing / adjusting

From 5.4.15. Shrinking Logical Volumes (redhat.com):

pct shutdown 201
lvreduce --resizefs -L 1.2t pve/lv_bulk
lvresize --resizefs -L 0.3t pve/lv_backup_mba
lvresize --resizefs -L 1.3t pve/lv_backup_mbp
lvremove pve/lv_backup_tex
pct start 201

Setting up VyOS & networking

I’ve documented my VyOS networking setup in a post at setting up VyOS from scratch (vanwerkhoven.org).

Proxmox networking

To get VLAN-aware VyOS working on Proxmox, set up VLAN-aware networking on the management interface (proxmox.com). See also “Virtualize OPNsense on Proxmox as Your Primary Router” (homenetworkguy.com). Resulting /etc/network/interfaces config:

auto lo
iface lo inet loopback

auto enx000ec6955446
# iface enx000ec6955446 inet dhcp
iface enx000ec6955446 inet manual

auto vmbr1
iface vmbr1 inet manual
 bridge-ports enx000ec6955446
 bridge-stp off
 bridge-fd 0
#WAN

iface enp0s25 inet manual

auto vmbr0
iface vmbr0 inet manual
  bridge-ports enp0s25
  bridge-stp off
  bridge-fd 0
  bridge-vlan-aware yes
  bridge-vids 2-4094
#LAN

auto vmbr0.10
iface vmbr0.10 inet static
  address 172.17.10.6/24
  gateway 172.17.10.1
#Mgmt interface

source /etc/network/interfaces.d/*

Now (re)connect to pve over ethernet at 172.17.10.6 using VLAN 10.

Configure GS108EP switch

I use a GS108EP switch to connect my access points over PoE. To ensure these end up in the right VLAN, we need to configure the switch as well. Unfortunately the GS108E does not support changing the management VLAN from 1, so we have to use a workaround.

As a reminder:

  1. The PVID defines the VLAN that untagged frames TO the switch are sent to (ingress). This is typically the same as the one and only untagged VLAN on an 802.1Q VLAN port.
  2. The port membership determines how VLAN tags are applied to traffic going out from the switch (egress).

Since we use our management interface untagged, we can use VLAN 10 for the VyOS config and the rest of the network, and VLAN 1 for the Netgear switch.

Setting up Debian

Proxmox supports two guest architectures: full VMs (KVM/QEMU) and containers (LXC).

I finally went for LXC because of disk speed (arstechnica.com):

fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=4k --numjobs=1 --size=2g --iodepth=1 --runtime=30 --time_based --end_fsync=1
pve host zfs ext4 (20250301)
  WRITE: bw=198MiB/s (208MB/s), 198MiB/s-198MiB/s (208MB/s-208MB/s), io=6082MiB (6377MB), run=30698-30698msec

pve host zfs raid0 2 device (20250228)
  WRITE: bw=33.3MiB/s (34.9MB/s), 33.3MiB/s-33.3MiB/s (34.9MB/s-34.9MB/s), io=1013MiB (1062MB), run=30430-30430msec

pve host zfs raid0 1 device (20250228)
  WRITE: bw=70.0MiB/s (73.4MB/s), 70.0MiB/s-70.0MiB/s (73.4MB/s-73.4MB/s), io=2121MiB (2224MB), run=30277-30277msec

pve host zfs raid0 (20250226)
  WRITE: bw=53.1MiB/s (55.7MB/s), 53.1MiB/s-53.1MiB/s (55.7MB/s-55.7MB/s), io=1605MiB (1683MB), run=30229-30229msec

pve host:
  WRITE: bw=220MiB/s (231MB/s), 220MiB/s-220MiB/s (231MB/s-231MB/s), io=6690MiB (7015MB), run=30431-30431msec

debian LXC guest @ mountpoint (20250226)
  WRITE: bw=52.5MiB/s (55.0MB/s), 52.5MiB/s-52.5MiB/s (55.0MB/s-55.0MB/s), io=1584MiB (1661MB), run=30172-30172msec

debian VM guest @ virtio:
  WRITE: bw=67.5MiB/s (70.7MB/s), 67.5MiB/s-67.5MiB/s (70.7MB/s-70.7MB/s), io=2048MiB (2147MB), run=30363-30363msec

debian VM guest @ scsi:
  WRITE: bw=31.1MiB/s (32.6MB/s), 31.1MiB/s-31.1MiB/s (32.6MB/s-32.6MB/s), io=1493MiB (1566MB), run=48003-48003msec

debian LXC guest @ virtio
  WRITE: bw=192MiB/s (202MB/s), 192MiB/s-192MiB/s (202MB/s-202MB/s), io=5876MiB (6161MB), run=30533-30533msec

old proteus:
  WRITE: bw=172MiB/s (180MB/s), 172MiB/s-172MiB/s (180MB/s-180MB/s), io=5338MiB (5598MB), run=31078-31078msec

fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=64k --size=128m --numjobs=16 --iodepth=16 --runtime=30 --time_based --end_fsync=1
pve host zfs ext4 (20250301)
  WRITE: bw=1698MiB/s (1780MB/s), 99.6MiB/s-118MiB/s (104MB/s-124MB/s), io=50.8GiB (54.5GB), run=30103-30626msec

pve host zfs raid0 1 device (20250228)
  WRITE: bw=689MiB/s (723MB/s), 13.2MiB/s-117MiB/s (13.8MB/s-123MB/s), io=21.4GiB (23.0GB), run=30213-31814msec

pve host zfs raid0 (20250226)
  WRITE: bw=741MiB/s (777MB/s), 16.8MiB/s-265MiB/s (17.6MB/s-278MB/s), io=22.6GiB (24.3GB), run=31220-31226msec

pve host:
  WRITE: bw=2429MiB/s (2547MB/s), 140MiB/s-164MiB/s (147MB/s-172MB/s), io=72.7GiB (78.0GB), run=30127-30641msec

debian LXC guest @ mountpoint (20250226)
  WRITE: bw=499MiB/s (524MB/s), 15.7MiB/s-161MiB/s (16.5MB/s-169MB/s), io=17.9GiB (19.2GB), run=36632-36644msec

debian VM guest @ virtio:
  WRITE: bw=1856MiB/s (1946MB/s), 108MiB/s-123MiB/s (114MB/s-129MB/s), io=55.7GiB (59.8GB), run=30133-30712msec

debian LXC guest @ virtio
  WRITE: bw=2045MiB/s (2145MB/s), 117MiB/s-141MiB/s (123MB/s-148MB/s), io=61.9GiB (66.4GB), run=30585-30979msec

old proteus:
  WRITE: bw=286MiB/s (300MB/s), 15.8MiB/s-20.8MiB/s (16.6MB/s-21.8MB/s), io=9648MiB (10.1GB), run=30656-33702msec

Install & configure Debian server as LXC (selected)

Get images using the Proxmox VE Appliance Manager (proxmox.com):

sudo pveam update
sudo pveam available
sudo pveam download local debian-11-standard_11.6-1_amd64.tar.zst
sudo pveam download local debian-12-standard_12.7-1_amd64.tar.zst
sudo pveam list local

Check storage to use

pvesm status

Create and configure LXC container (proxmox.com) based on downloaded image. Ensure it’s an unprivileged container to protect our host and router running on it.

sudo pct create 203 local:vztmpl/debian-12-standard_12.7-1_amd64.tar.zst --description "Debian 12 LXC" --hostname proteus2 --rootfs thinpool_vms:256 --unprivileged 1 --cores 4 --memory 16384 --ssh-public-keys /root/.ssh/tim.id_rsa.pub --net0 name=eth0,bridge=vmbr0,firewall=0,gw=172.17.10.1,ip=172.17.10.7/24,tag=10

Now configure networking, on Proxmox’ vmbr0 with VLAN tag 10. This means the guest can only communicate on VLAN 10:

# This does not work, cannot create network device on vmbr0.10
# pct set 203 --net0 name=eth0,bridge=vmbr0.10,firewall=0,gw=172.19.10.1,ip=172.19.10.2/24
# Does not work:
# pct set 203 --net0 name=eth0,bridge=vmbr0,firewall=0,gw=172.17.10.1,ip=172.17.10.2/24,trunks=10
# Works:
# pct set 203 --net0 name=eth0,bridge=vmbr0,firewall=0,gw=172.17.10.1,ip=172.17.10.2/24,tag=10
sudo pct set 203 --onboot 1

Optional: only required if host does not have this set up correctly (could be because network was not available at init)

sudo pct set 203 --searchdomain lan.vanwerkhoven.org --nameserver 172.17.10.1

If SSH into guest fails or takes a long time, this can be due to LXC / Apparmor security features (stackoverflow.com) which prevent mount from executing. To solve, ensure nesting is allowed (ostechnix.com):

sudo pct set 203 --features nesting=1

To enable Docker (jlu5.com) inside the LXC container, we need both nesting & keyctl:

sudo pct set 203 --features nesting=1,keyctl=1

Start & log in, set root password, configure some basics

sudo pct start 203
sudo pct enter 203

passwd
apt install sudo vim
dpkg-reconfigure locales
dpkg-reconfigure tzdata

Add regular user, add to system groups (debian.org), and set ssh key

adduser tim
usermod -aG adm,render,sudo,staff tim
mkdir -p ~tim/.ssh/
touch ~tim/.ssh/authorized_keys
chown -R tim:tim ~tim/.ssh

cp /root/.ssh/authorized_keys ~tim/.ssh/authorized_keys
chmod og-rwx ~tim/.ssh/authorized_keys

cat << 'EOF' >>~tim/.ssh/authorized_keys
ssh-rsa AAAA...
EOF

# Allow non-root to use ping
setcap cap_net_raw+p $(which ping)

Update & upgrade and install automatic updates (linode.com)

sudo apt update
sudo apt upgrade

sudo apt install unattended-upgrades
# Comment 'label=Debian' to not auto-update too much
sudo vi /etc/apt/apt.conf.d/50unattended-upgrades

# Tweak some settings
cat << 'EOF' | sudo tee -a /etc/apt/apt.conf.d/50unattended-upgrades
Unattended-Upgrade::Remove-Unused-Kernel-Packages "true";
Unattended-Upgrade::Remove-New-Unused-Dependencies "true";
Unattended-Upgrade::Remove-Unused-Dependencies "true";
EOF

sudo unattended-upgrades --dry-run --debug

Install Docker (docker.com). Need to use custom apt repo to get latest version which works inside an unprivileged LXC container (as proposed on the docker forums (docker.com)):

sudo apt remove docker docker-engine docker.io containerd runc docker-compose

sudo apt update

sudo apt install \
   ca-certificates \
   curl \
   gnupg \
   lsb-release

sudo mkdir -m 0755 -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/debian/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg

echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/debian \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

sudo apt update
sudo apt install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

sudo docker run hello-world

Non-solutions

I also tried these options that didn’t work for my older Docker version:

And we may need to change (stackoverflow.com) boot (my-take-on.tech) parameters (proxmox.com):

sed -i 's/^GRUB_CMDLINE_LINUX_DEFAULT="quiet/GRUB_CMDLINE_LINUX_DEFAULT="quiet systemd.unified_cgroup_hierarchy=0/' /etc/default/grub

This failed.

Docker inside an (unprivileged) LXC is not supported, but can be made to work.

Try newer version of docker as proposed on the docker forums (docker.com)

sudo apt-get install docker-compose-plugin docker-compose docker.io

Fails.

Try to install all packages:

sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

Works! So it was missing a package?!

Now try to install the original Debian Docker packages (fewer apt repositories means more stability):

sudo apt-get remove docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

Install & configure Debian server as VM (not used)

# Get ISO from https://cdimage.debian.org/debian-cd/current/amd64/iso-cd/
ls /var/lib/vz/template/iso/
qm create 200 --name proteus --description "Debian VM server" --cores 4 --memory 12288 --net0 virtio,bridge=vmbr0,firewall=0,tag=10 --ide2 media=cdrom,file=local:iso/debian-11.6.0-amd64-netinst.iso --virtio0 thinpool_vms:300
# ipconfig0 did not work? (it only applies to cloud-init images) --ipconfig0 gw=172.17.10.1,ip=172.17.10.2/24
qm set 200 -serial0 socket
qm set 200 --onboot 1

Open a terminal via SPICE/xterm.js, install the OS as usual, remove the install image, and reboot:

qm start 200

# in guest: install image as usual
qm set 200 --ide2 none
qm reboot 200

Add QEMU guest agent

qm set 200 --agent 1
qm agent 200 ping
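qm agent 200 ping only succeeds once the agent actually runs inside the guest:

# inside the Debian guest
apt install qemu-guest-agent
systemctl enable --now qemu-guest-agent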

Test docker

apt install docker.io docker-compose
sudo docker run hello-world

Works

Enable GPU sharing (cetteup.com) in VM:

# vi /etc/default/grub
# GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_pstate=disable
#intel_iommu=on 
#i915.enable_gvt=1"
GRUB_CMDLINE_LINUX_DEFAULT+="intel_iommu=on i915.enable_gvt=1"
proxmox-boot-tool refresh && reboot

# Check for success
cat /proc/cmdline
dmesg | grep -e DMAR -e IOMMU

# Load modules
cat << 'EOF' >> /etc/modules
# Modules required for PCI passthrough
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd

# Modules required for Intel GVT
kvmgt
exngt
vfio-mdev
EOF
reboot

# Pass through PCI --> Via GUI
# Not sure how this works on CLI, something like: qm set 200 --hostpci0 0000:00:02.0,mdev=i915-GVTg_V5_4

Expose bulk storage to Debian server

I prefer to keep the guest OS disks smallish so I can back them up. However, if I want to store bulk data I don’t have enough space. To solve this, there are three approaches to share storage from host to guest:

  1. Via Samba on host machine, mount in guest. Pro: always works. Con: more complex setup, increases host attack surface
  2. Via bind mount points. Pro: works well in LXC. Fast. Con: only LXC (selected)
  3. Via disk pass-through. Pro: works well in KVM (& LXC?). Fast. Con: cannot write from two guests simultaneously.

Automounting Samba in LXC guest didn’t work for me, giving error “Starting of mnt-bulk.automount not supported.” LXC containers are special, apparently (reddit.com). However I document the steps here for reference.

1. Share data via Samba (not used)

Set up Samba server on Proxmox (digitalocean.com).

Install, disable the unnecessary NetBIOS daemon, and stop Samba itself during configuration.

apt install samba
systemctl stop nmbd.service
systemctl disable nmbd.service
systemctl stop smbd.service
# systemctl disable smbd.service

Configure

[global]
   server string = pve.vanwerkhoven.org
   server role = standalone server
   interfaces = lo vmbr0.10
   bind interfaces only = yes
   disable netbios = yes
   smb ports = 445
   log file = /var/log/samba/smb.log
   max log size = 10000
   # log level = 3 passdb:5 auth:5

Add users

adduser --home /mnt/bulk --no-create-home --shell /usr/sbin/nologin --disabled-password --uid 1010 bulkdata
adduser --home /mnt/backup/mbp --no-create-home --shell /usr/sbin/nologin --disabled-password --uid 1011 backupmbp
adduser --home /mnt/backup/mba --no-create-home --shell /usr/sbin/nologin --disabled-password --uid 1012 backupmba
adduser --home /mnt/backup/tex --no-create-home --shell /usr/sbin/nologin --disabled-password --uid 1013 backuptex

chown backupmbp:backupmbp /mnt/backup/mbp
chown backupmba:backupmba /mnt/backup/mba
chown backuptex:backuptex /mnt/backup/tex
chown bulkdata:bulkdata /mnt/bulk

chmod 2770 /mnt/backup/{mba,mbp}
chmod 2770 /mnt/bulk

openssl rand -base64 20
smbpasswd -a backupmbp
smbpasswd -a backupmba
smbpasswd -a backuptex

smbpasswd -e backupmba
smbpasswd -e backupmbp
smbpasswd -e backuptex

Set up shares

[bulk]
    path = /mnt/bulk
    browseable = yes
    read only = no
    writable = yes
    force create mode = 0660
    force directory mode = 2770
    valid users = sambarw
[backupmbp]
    comment = Time Machine mbp
    path = /mnt/backup/mbp
    browseable = yes
    writeable = yes
    create mask = 0600
    directory mask = 0700
    spotlight = yes
    vfs objects = catia fruit streams_xattr
    fruit:aapl = yes
    fruit:time machine = yes
    valid users = backupmbp
[backupmba]
    comment = Time Machine MBA
    path = /mnt/backup/mba
    browseable = yes
    writeable = yes
    create mask = 0600
    directory mask = 0700
    spotlight = yes
    vfs objects = catia fruit streams_xattr
    fruit:aapl = yes
    fruit:time machine = yes
    valid users = backupmba

Restart Samba

systemctl restart smbd.service

Now mount Samba share automatically (stackexchange.com) on client from pve host:

sudo apt install smbclient cifs-utils

cat << 'EOF' >>/root/.smbcredentials
user=sambarw
password=redacted
EOF

Automount, but ensure mounting doesn’t fail (askubuntu.com) because network is not up yet (askubuntu.com).

sudo mkdir /mnt/bulk
sudo chown root:users /mnt/bulk/
sudo chmod g+rw /mnt/bulk/
cat << 'EOF' | sudo tee -a /etc/fstab
//pve.lan.vanwerkhoven.org/bulk /mnt/bulk   cifs    credentials=/root/.smbcredentials,rw,uid=tim,gid=users,auto,x-systemd.automount,_netdev      0       0
EOF

2. Share data via mount points (LXC only) (selected)

In the second approach, we mount something on the host and propagate it to the guest (bayton.org), or create a privileged container (proxmox.com).

Mount points require some care regarding UID/GIDs (e.g. see documented on the proxmox wiki (proxmox.com)), but overall seem an easy method to get storage from host to guest.

What worked for me was adding a mountpoint using pct (thushanfernando.com):

sudo mkdir /mnt/bulk
sudo chown tim:users /mnt/bulk
sudo chmod g+w /mnt/bulk

Create users on the host (e.g. bulkdata:bulkdata) whose UID/GID we’ll propagate to the guest:

adduser --home /mnt/bulk --no-create-home --shell /usr/sbin/nologin --disabled-password --uid 1010 bulkdata
adduser --home /mnt/backup/mbp --no-create-home --shell /usr/sbin/nologin --disabled-password --uid 1011 backupmbp
adduser --home /mnt/backup/mba --no-create-home --shell /usr/sbin/nologin --disabled-password --uid 1012 backupmba
adduser --home /mnt/backup/tex --no-create-home --shell /usr/sbin/nologin --disabled-password --uid 1013 backuptex
usermod -aG bulkdata tim

Set up UID/GID mapping to propagate users 1010–1019 to the same UID on the host (e.g. using this tool (github.com)). N.B. this is only required if you want to write from both the host and the guest. If you only write in (multiple) guests, you only need to ensure the users/groups writing from the different guests have the same UID/GID.

cat << 'EOF' >>/etc/pve/lxc/201.conf
# uid map: from uid 0, map 1010 uids in the ct to the range starting at 100000 on the host, so 0..1009 (ct) → 100000..101009 (host)
lxc.idmap = u 0 100000 1010
lxc.idmap = g 0 100000 1010
# map 10 uids starting at 1010 straight through, so 1010..1019 (ct) → 1010..1019 (host)
lxc.idmap = u 1010 1010 10
lxc.idmap = g 1010 1010 10
# map the remaining 64516 uids starting at 1020, so 1020..65535 (ct) → 101020..165535 (host)
lxc.idmap = u 1020 101020 64516
lxc.idmap = g 1020 101020 64516
EOF

Add the following to /etc/subuid and /etc/subgid (there might already be entries in the file, also for root):

cat << 'EOF' >>/etc/subuid
root:1010:10
EOF
cat << 'EOF' >>/etc/subgid
root:1010:10
EOF

Now add the actual bind mount:

pct shutdown 201
pct set 201 -mp0 /mnt/bulk,mp=/mnt/bulk
pct start 201

and that’s it. Now we can continue configuring the services.

3. Via disk pass through (KVM only) (not used)

Pass through bulk storage using volume pass through with virtio (should be faster than SCSI or IDE (proxmox.com)):

qm set 200 -virtio1 /dev/disk/by-id/dm-name-pve-lv_bulk,backup=0,snapshot=0
#qm set 200 -scsi1 /dev/disk/by-id/dm-name-pve-lv_bulk,backup=0,snapshot=0

Proxmox hardening

Tips from Samuel’s Website (samuel.domains) and pveproxy(8) man page (proxmox.com)

Limit server access to specific IPs:

cat << 'EOF' >>/etc/default/pveproxy
# TvW 20230114 added for security reasons
DENY_FROM="all"
ALLOW_FROM="172.17.10.0/24"
POLICY="allow"

# For PVE-Manager >= 6.4 only.
LISTEN_IP="172.17.10.4"
EOF

Disable NFS:

cat << 'EOF' >>/etc/default/nfs-common
# TvW 20230114 disabled for security reasons
NEED_STATD=no
EOF

Install Unifi Network Application (controller) as LXC container

Install the Unifi Network Application (controller) on Debian (the only supported Linux platform) using the Unifi guide (ui.com) and the Alpine guide (alpinelinux.org).

Get images using the Proxmox VE Appliance Manager (proxmox.com):

pveam update
pveam available
pveam download local debian-11-standard_11.6-1_amd64.tar.zst #OR Alpine?
pveam list local

Check storage to use

pvesm status

Create and configure LXC container (proxmox.com) based on downloaded image. Ensure it’s an unprivileged container to protect our host and router running on it. Also configure networking, run on Proxmox’ vmbr0 with VLAN ID 10 in the Management VLAN.

pct create 202 local:vztmpl/debian-11-standard_11.6-1_amd64.tar.zst --description "Debian LXC Unifi Network Application" --hostname unifi --rootfs thinpool_vms:8 --unprivileged 1 --cores 2 --memory 2048 --ssh-public-keys /root/.ssh/tim.id_rsa.pub --net0 name=eth0,bridge=vmbr0,firewall=0,gw=172.17.10.1,ip=172.17.10.5/24,tag=10
pct set 202 --onboot 1

Optional: only required if host does not have this set up correctly (could be because network was not available at init):

pct set 202 --searchdomain lan.vanwerkhoven.org --nameserver 172.17.10.1

Start & log in, set root password, configure some basics

pct start 202
pct enter 202

passwd
apt install sudo vim
dpkg-reconfigure locales
dpkg-reconfigure tzdata

If SSH into guest fails or takes a long time, this can be due to LXC / Apparmor security features (stackoverflow.com) which prevent mount from executing. To solve, ensure nesting is allowed (ostechnix.com):

pct shutdown 202
pct set 202 --features nesting=1
pct start 202

Hardening sshd is not required: by default, root is only allowed to login with pubkey authentication.

Install required packages to add Unifi apt source, then add new source & related keys

apt-get update && apt-get install ca-certificates apt-transport-https
echo 'deb https://www.ui.com/downloads/unifi/debian stable ubiquiti' | tee /etc/apt/sources.list.d/100-ubnt-unifi.list
wget -O /etc/apt/trusted.gpg.d/unifi-repo.gpg https://dl.ui.com/unifi/unifi-repo.gpg 

Unifi (v7.3.83 in my case) has very specific MongoDB requirements:

 unifi : Depends: mongodb-server (>= 2.4.10) but it is not installable or
                  mongodb-10gen (>= 2.4.14) but it is not installable or
                  mongodb-org-server (>= 2.6.0) but it is not installable
         Depends: mongodb-server (< 1:4.0.0) but it is not installable or
                  mongodb-10gen (< 4.0.0) but it is not installable or
                  mongodb-org-server (< 4.0.0) but it is not installable

Prep for specific MongoDB version, see this guide (mongodb.com). The MongoDB repo for Stretch (Debian 9) has the newest compatible version (3.6) with a matching pgp key, a bit newer than the 3.4 version as written in the Unifi guide (ui.com). The PGP key for this repo will expire on 2023-12-09, not sure what will happen then.

wget -O /etc/apt/trusted.gpg.d/mongodb-repo.gpg https://pgp.mongodb.com/server-3.6.pub
echo "deb https://repo.mongodb.org/apt/debian stretch/mongodb-org/3.6 main" | tee /etc/apt/sources.list.d/mongodb-org-3.6.list
apt-get update

Install the Unifi Network Application from apt; this takes 560 MB of disk space for the package & required dependencies (yeah, for just a controller).

apt-get update && apt-get install unifi

Enable, autostart, and start Unifi service:

systemctl is-enabled unifi
systemctl enable unifi
systemctl start unifi

Update & upgrade and install automatic updates (linode.com)

apt update && apt upgrade

apt install unattended-upgrades
# Comment 'label=Debian' to not auto-update too much
vi /etc/apt/apt.conf.d/50unattended-upgrades

# Tweak some settings
cat << 'EOF' | sudo tee -a /etc/apt/apt.conf.d/50unattended-upgrades
Unattended-Upgrade::Remove-Unused-Kernel-Packages "true";
Unattended-Upgrade::Remove-New-Unused-Dependencies "true";
Unattended-Upgrade::Remove-Unused-Dependencies "true";
EOF

sudo unattended-upgrades --dry-run --debug

Migrate Debian services

Install dependencies; prefer Python packages via apt for a system-wide install (askubuntu.com) and arguably some security, since we don’t install from the public pip repository:

apt install jq curl python3-netcdf4

Service overview

Now:

  1. InfluxDB + data (port X) – via apt 1.6 (else we need a special apt repo) –> OK DONE
  2. Worker scripts:
     * All scripts: (1) unify naming to <source>2<target>, e.g. knmi2influxdb; (2) update in-place with credentials, ideally with backwards compatibility
     * co2signal –> OK DONE (migrate to HA: no, HA has time lag issue; normalize, separate secrets, add influxdb login: OK; tested: OK)
     * knmi –> OK DONE (migrate to HA: no, keep separate; normalize, separate secrets, add influxdb login; tested: OK)
     * mqtt2influxdb –> OK done (migrate to HA: not possible, different functionality; migrated: OK; tested: OK)
     * smeter –> OK done –> phase out, use dsmr reader on HASS (migrate to HA: yes?; migrated: OK; tested: OK)
     * water_meter_reader –> OK done (migrate to HA: already working via esphome detector & mqtt push)
     * epexspot –> OK done (migrate to HA: no?; migrated: OK; tested: OK)
     * hue –> phase out, use powercalc on HASS (migrate to HA: no, requires too much calculation/processing; migrated: OK; tested: NOK)
     * mkwebdata (migrate to HA: no; migrated: OK; tested: OK)
     * multical –> OK done –> phase out (migrate to HA: no, custom stuff)
     * SBFspot –> OK done –> phase out (check which scripts are being used, archive old ones; read secrets from external file; migrated: OK; tested: OK)
     * evohome –> phase out (migrate to HA: yes?; needed in future: no; migrated: OK; tested: NOK)
  3. Collectd (for data generation/collection) – on proxmox?
  4. Nginx + letsencrypt (port 80/443) –> OK DONE
  5. Docker
  6. Mosquitto (glueing home automation) – on proteus –> OK DONE

Later:

  1. Transmission (downloading torrents) – on proteus
  2. Plex/Jellyfin (HTPC) – needs hw accel, requires running in a privileged container
  3. smbd (for Time Machine backups) – on proteus

Service hardware requirements

Get HW accel in guest/container: https://www.reddit.com/r/jellyfin/comments/s417qw/hardware_acceleration_inside_proxmox_lxc_not/ (reddit.com)

Prepare old server

  1. Stop cron jobs from collecting data (not strictly necessary)

Docker

First install docker (also see above)

sudo apt install docker.io docker-compose

Reverse proxy options for containers

Optional: prepare forwarding traffic from WAN to containers using a reverse proxy, following some best practices (reddit.com), e.g. using nginx-proxy (github.com).

  1. Expose & publish Docker container ports on the host, then reverse proxy to the specific port (e.g. -p 127.0.0.1:8000:8000)
  2. Use an internal Docker network to map the reverse proxy (e.g. dynamically using nginx-proxy (github.com))
  3. Use Traefik (digitalocean.com) to reverse proxy for Docker (bonus: built-in ACME challenge)

I decided to go for option 1: most effort and most overkill (security/speed) for my situation :p
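For reference, a minimal sketch of what option 1 looks like in nginx (app.example.org and port 8000 are placeholders; my real vhosts also carry the letsencrypt TLS config):

server {
    listen 80;
    server_name app.example.org;  # placeholder hostname
    location / {
        # container published on loopback only, e.g. -p 127.0.0.1:8000:8000
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}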

Portainer to ease container management –> not needed

And optionally install portainer to help manage docker. Bind to localhost to ensure this service cannot be accessed outside the machine

sudo docker volume create portainer_data
sudo docker run -d \
  --name portainer \
  --restart=always \
  -p 127.0.0.1:8000:8000 -p 127.0.0.1:9443:9443 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v portainer_data:/data \
  portainer/portainer-ce:latest
sudo docker ps

InfluxDB

Make backup on old system

/usr/bin/influxd backup -portable /home/tim/backup/influx_snapshot.db/$(date +%Y%m%d)-migrate

Use the (old) native Debian package for stability, security & the fewest additional apt repositories:

apt install influxdb-client influxdb
scp -P 10022 -r tim@172.17.10.107:/home/tim/backup/influx_snapshot.db/20230713-migrate .
influxd restore -portable /home/tim/backup/influx_snapshot.db/20230713-migrate/

Migrate config, reload

scp -P 10022 tim@172.17.10.107:/etc/influxdb/influxdb.conf influxdb.conf-migrate
sudo diff /etc/influxdb/influxdb.conf /etc/influxdb/influxdb.conf-migrate
sudo service influxdb restart

Add users in InfluxDB (influxdata.com)

influx -precision rfc3339

CREATE USER influxadmin WITH PASSWORD 'pwd' WITH ALL PRIVILEGES

CREATE USER influxwrite WITH PASSWORD 'pwd'
GRANT WRITE ON collectd TO influxwrite
GRANT WRITE ON smarthomev3 TO influxwrite
CREATE USER influxread WITH PASSWORD 'pwd'
GRANT READ ON collectd TO influxread
GRANT READ ON smarthomev3 TO influxread
CREATE USER influxreadwrite WITH PASSWORD 'pwd'
GRANT ALL ON collectd TO influxreadwrite
GRANT ALL ON smarthomev3 TO influxreadwrite

Test account with curl

chmod o-r ~/.profile
cat << 'EOF' >>~/.profile
export INFLUX_USERNAME=influxadmin
export INFLUX_PASSWORD=pwd
EOF

curl -G http://localhost:8086/query --data-urlencode "q=SHOW DATABASES"
curl -G http://localhost:8086/query -u influxwrite:pwd   --data-urlencode "q=SHOW DATABASES"

In case InfluxDB is not running, check that the path to types.db is correct.

Failed to connect to http://localhost:8086: Get "http://localhost:8086/ping": dial tcp [::1]:8086: connect: connection refused
Please check your connection settings and ensure 'influxd' is running.
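For reference, the collectd listener bit of /etc/influxdb/influxdb.conf (1.x); the types.db path below is the Debian collectd default and may differ on your system:

[[collectd]]
  enabled = true
  bind-address = ":25826"
  database = "collectd"
  typesdb = "/usr/share/collectd/types.db"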

Restore retention policies (vanwerkhoven.org) (Archive (archive.org)) – NB this was not necessary when restoring from backup

SHOW RETENTION POLICIES ON collectd
CREATE RETENTION POLICY "always" ON "collectd" DURATION INF REPLICATION 1
CREATE RETENTION POLICY "five_days" ON "collectd" DURATION 5d REPLICATION 1 DEFAULT

# For Grafana viewing - see https://github.com/grafana/grafana/issues/4262#issuecomment-475570324
INSERT INTO always rp_config,idx=1 rp="five_days",start=0i,end=432000000i -9223372036854775806
INSERT INTO always rp_config,idx=2 rp="always",start=432000000i,end=3110400000000i -9223372036854775806

# Restore continuous queries
cq_60m_cpu       CREATE CONTINUOUS QUERY cq_60m_cpu ON collectd BEGIN SELECT mean(value) AS value INTO collectd.always.cpu FROM collectd.five_days.cpu GROUP BY time(1h), * END
cq_60m_cpufreq   CREATE CONTINUOUS QUERY cq_60m_cpufreq ON collectd BEGIN SELECT mean(value) AS value INTO collectd.always.cpufreq FROM collectd.five_days.cpufreq GROUP BY time(1h), * END
cq_60m_df        CREATE CONTINUOUS QUERY cq_60m_df ON collectd BEGIN SELECT mean(value) AS value INTO collectd.always.df FROM collectd.five_days.df GROUP BY time(1h), * END
cq_60m_interface CREATE CONTINUOUS QUERY cq_60m_interface ON collectd BEGIN SELECT mean(rx) AS rx, mean(tx) AS tx INTO collectd.always.interface FROM collectd.five_days.interface GROUP BY time(1h), * END
cq_60m_iwinfo    CREATE CONTINUOUS QUERY cq_60m_iwinfo ON collectd BEGIN SELECT mean(value) AS value INTO collectd.always.iwinfo FROM collectd.five_days.iwinfo GROUP BY time(1h), * END
cq_60m_load      CREATE CONTINUOUS QUERY cq_60m_load ON collectd BEGIN SELECT mean(longterm) AS longterm, mean(midterm) AS midterm, mean(shortterm) AS shortterm INTO collectd.always.load FROM collectd.five_days.load GROUP BY time(1h), * END
cq_60m_memory    CREATE CONTINUOUS QUERY cq_60m_memory ON collectd BEGIN SELECT mean(value) AS value INTO collectd.always.memory FROM collectd.five_days.memory GROUP BY time(1h), * END
cq_60m_ping      CREATE CONTINUOUS QUERY cq_60m_ping ON collectd BEGIN SELECT mean(value) AS value INTO collectd.always.ping FROM collectd.five_days.ping GROUP BY time(1h), * END
SHOW CONTINUOUS QUERIES

Home Assistant

VM

Get Home Assistant image (home-assistant.io)

cd /var/lib/vz/template/iso
sudo wget https://github.com/home-assistant/operating-system/releases/download/13.1/haos_ova-13.1.qcow2.xz
sudo xz -d haos_ova-13.1.qcow2.xz 

Create the new VM. See this guide (stefandroid.com) for command-line examples

qm create 101 -agent 1 -tablet 0 -localtime 1 -bios ovmf -cpu host -cores 4 -memory 8192 -name haos -net0 virtio,bridge=vmbr0,macaddr=02:85:73:A4:71:88,tag=10 -onboot 1 -ostype l26 -scsihw virtio-scsi-pci

# Allow xterm.js
qm set 101 -serial0 socket

qm importdisk 101 /var/lib/vz/template/iso/haos_ova-13.1.qcow2 thinpool_vms
qm set 101 --scsi0 thinpool_vms:vm-101-disk-0,cache=writethrough
qm set 101 --boot c --bootdisk scsi0
pvesm alloc thinpool_vms 101 vm-101-disk-1 4M
qm set 101 -efidisk0 thinpool_vms:vm-101-disk-1

Explanation copied from the guide (stefandroid.com)

  1. Create the VM. I’m using 4 cores & 8GB here
  2. Import the decompressed qcow2 image as a disk to the local-lvm storage. Change the storage if you store your Proxmox VMs somewhere else. –> I added ,cache=writethrough in case it’s not default
  3. Assign the imported disk from (2) to the VM.
  4. Set the imported disk from (2) as the boot disk.
  5. Allocate 4 MiB for the EFI disk.
  6. Assign the EFI disk to the VM.

The tteck script (githubusercontent.com) does the same, but a bit more opaquely.

Docker

Migrate config from old machine, see https://www.home-assistant.io/installation/linux#install-home-assistant-container (home-assistant.io)

# Create backup on old config (HA Core)
sudo systemctl stop home-assistant@homeassistant.service
sudo tar czvf ~/homeassistant.tar.gz ~homeassistant/.homeassistant

# Move to new machine & right place
scp oldserver:homeassistant.tar.gz newserver:/var/lib/
scp -r -P 10022 tim@172.17.10.107:homeassistant.tar.gz .
cd /var/lib/ && sudo tar xvf ./homeassistant.tar.gz
sudo mv .homeassistant homeassistant
sudo chown root:root homeassistant
sudo chmod og-rwx homeassistant/

Start docker container via docker run or docker compose:

# Run new docker, do not use --privileged for safety and easier running in LXC
# https://community.home-assistant.io/t/why-does-the-documentation-say-we-need-priviledged-mode-for-a-docker-install-now/336556/2
sudo docker run -d \
  --name homeassistant \
  --restart=unless-stopped \
  -e TZ=Europe/Brussels \
  -v /var/lib/homeassistant:/config \
  --network=host \
  ghcr.io/home-assistant/home-assistant:stable

cat << 'EOF' >> ~tim/docker/home-assistant-compose.yml
#version: '3'
# https://www.home-assistant.io/installation/linux#docker-compose
# docker compose -f home-assistant-compose.yml up -d  # run
# docker compose -f home-assistant-compose.yml pull # update
services:
  homeassistant:
    container_name: homeassistant
    image: "ghcr.io/home-assistant/home-assistant:stable"
    volumes:
      - /var/lib/homeassistant:/config
      - /etc/localtime:/etc/localtime:ro
    restart: unless-stopped
    network_mode: host
    devices:
      - /dev/usb-dresden_elektronik_ingenieurtechnik_GmbH_ConBee_II_DE2149131-if00
EOF
sudo docker compose -f home-assistant-compose.yml up -d

Forward Zigbee USB device - not working

I want some usb devices to be accessible in my containers. We could use udev to set user/owner (stackexchange.com) on the original device, however this might mess up something on the host using those devices. Instead, I use udev to create symlinks (stackexchange.com) and run a script that copies the devices and chowns them suitably for use in my containers.

First we create udev rules to create symlinks that I can programmatically find (suffix with container-link). I use the same udev rule to run a script after usb devices are online.

cat << 'EOF' | sudo tee /etc/udev/rules.d/65-usb-for-containers.rules
SUBSYSTEM=="tty", ENV{ID_SERIAL}=="FTDI_FT232R_USB_UART_AC2F17KR", SYMLINK+="FTDI_FT232R_USB_UART_AC2F17KR-container-link", RUN+="/usr/local/bin/mk_usb-for-containers.sh"
SUBSYSTEM=="tty", ENV{ID_SERIAL}=="FTDI_FT232R_USB_UART_AQ00K6K3", SYMLINK+="FTDI_FT232R_USB_UART_AQ00K6K3-container-link", RUN+="/usr/local/bin/mk_usb-for-containers.sh"
SUBSYSTEM=="tty", ENV{ID_SERIAL}=="dresden_elektronik_ingenieurtechnik_GmbH_ConBee_II_DE2149131", SYMLINK+="dresden_elektronik_ingenieurtechnik_GmbH_ConBee_II_DE2149131-container-link", RUN+="/usr/local/bin/mk_usb-for-containers.sh"
EOF

In the script I copy the devices to a separate location. Copying nodes can be done with tar (stackoverflow.com) or with cp -R (stackoverflow.com) (ibm.com); a tar sketch follows after the script.

cat << 'EOF' | sudo tee /usr/local/bin/mk_usb-for-containers.sh
#!/usr/bin/env bash
sudo rm -f /lxc/201/devices/*container-link && sudo cp -Lrp /dev/*-container-link /lxc/201/devices/ && sudo chown 100000:100020 /lxc/201/devices/*
EOF
sudo chmod 0750 /usr/local/bin/mk_usb-for-containers.sh
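For reference, the tar variant would look something like this (untested sketch; tar needs -h to dereference the symlinks, and preserves device nodes when extracting as root):

cd /dev && sudo tar chf - *-container-link | sudo tar xf - -C /lxc/201/devices/
sudo chown 100000:100020 /lxc/201/devices/*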

Then reload the rules to test that this works

sudo udevadm control --reload-rules && sudo service udev restart && sudo udevadm trigger

Then create the lxc mount points based on the new links in my separate location:

lxc.cgroup2.devices.allow: c 188:* rwm
lxc.mount.entry: /lxc/201/devices/FTDI_FT232R_USB_UART_AC2F17KR-container-link dev/usb-FTDI_FT232R_USB_UART_AC2F17KR-if00-port0 none bind,optional,create=file
lxc.mount.entry: /lxc/201/devices/FTDI_FT232R_USB_UART_AQ00K6K3-container-link dev/usb-FTDI_FT232R_USB_UART_AQ00K6K3-if00-port0 none bind,optional,create=file
lxc.cgroup2.devices.allow: c 166:* rwm
lxc.mount.entry: /lxc/201/devices/dresden_elektronik_ingenieurtechnik_GmbH_ConBee_II_DE2149131-container-link dev/usb-dresden_elektronik_ingenieurtechnik_GmbH_ConBee_II_DE2149131-if00 none bind,optional,create=file

Now apply to Home Assistant Docker image

root@pve:/etc/pve/lxc# ls -l /dev/serial/by-id/usb-dresden_elektronik_ingenieurtechnik_GmbH_ConBee_II_DE2149131-if00
# lrwxrwxrwx 1 root root 13 Jul 27 20:50 /dev/serial/by-id/usb-dresden_elektronik_ingenieurtechnik_GmbH_ConBee_II_DE2149131-if00 -> ../../ttyACM0
root@pve:/etc/pve/lxc# ls -l /dev/ttyACM0 
#crw-rw---- 1 root dialout 166, 0 Jul 27 20:51 /dev/ttyACM0

mkdir -p /lxc/201/devices
cd /lxc/201/devices/
sudo mknod -m 660 usb-dresden_elektronik_ingenieurtechnik_GmbH_ConBee_II_DE2149131-if00 c 166 0
sudo chown 100000:100020 usb-dresden_elektronik_ingenieurtechnik_GmbH_ConBee_II_DE2149131-if00
ls -al /lxc/201/devices/usb-dresden_elektronik_ingenieurtechnik_GmbH_ConBee_II_DE2149131-if00

cat << 'EOF' | sudo tee -a /etc/pve/lxc/201.conf
lxc.cgroup2.devices.allow: c 166:* rwm
lxc.mount.entry: /lxc/201/devices/usb-dresden_elektronik_ingenieurtechnik_GmbH_ConBee_II_DE2149131-if00 dev/usb-dresden_elektronik_ingenieurtechnik_GmbH_ConBee_II_DE2149131-if00 none bind,optional,create=file
EOF

Error when restarting Docker container – WIP

Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error creating device nodes: mount /dev/usb-dresden_elektronik_ingenieurtechnik_GmbH_ConBee_II_DE2149131-if00:/var/lib/docker/overlay2/963a244fa0d220f872cc0e02714e6045b112c5db6404ce5a47903ec936b2e51e/merged/dev/usb-dresden_elektronik_ingenieurtechnik_GmbH_ConBee_II_DE2149131-if00 (via /proc/self/fd/6), flags: 0x1000: no such file or directory: unknown

Not due to old docker: https://forum.proxmox.com/threads/docker-container-would-not-start-after-upgrading-proxmox-ve-8-1-to-8-2-with-oci-runtime-create-failed.145875/ (proxmox.com)

Check mount points (docker.com)

sudo docker container inspect b52be18b84a5 --format '{{ json .Mounts }}' | jq
[
  {
    "Type": "bind",
    "Source": "/var/lib/homeassistant",
    "Destination": "/config",
    "Mode": "rw",
    "RW": true,
    "Propagation": "rprivate"
  },
  {
    "Type": "bind",
    "Source": "/etc/localtime",
    "Destination": "/etc/localtime",
    "Mode": "ro",
    "RW": false,
    "Propagation": "rprivate"
  }
]

Recreate merged/ folder

https://stackoverflow.com/questions/65655199/how-to-fix-docker-container-after-deleting-overlay2-folder (stackoverflow.com) https://stackoverflow.com/questions/70666374/how-to-disable-docker-diff (stackoverflow.com) https://github.com/docker/for-mac/issues/1396 (github.com) – for mac specifically


docker system prune --all
sudo systemctl stop docker
sudo rm -r /var/lib/docker/overlay2/
sudo systemctl restart docker
DOCKER_BUILDKIT=0
docker compose down && docker compose build --progress plain --no-cache && docker compose up --> this didn't recreate my images

sudo docker system prune -a

sudo docker compose -f pigallery2-compose.yml pull
sudo docker compose -f pigallery2-compose.yml down
sudo docker system prune
sudo docker compose -f pigallery2-compose.yml up -d

sudo docker compose -f nextcloud-compose.yml pull
sudo docker compose -f nextcloud-compose.yml down
sudo docker system prune
sudo docker compose -f nextcloud-compose.yml up -d

sudo docker compose -f home-assistant-compose.yml pull
sudo docker compose -f home-assistant-compose.yml down
sudo docker system prune
sudo docker compose -f home-assistant-compose.yml up -d

Alternative: retry adding USB using cgroups (stackoverflow.com)

https://stackoverflow.com/questions/24225647/docker-a-way-to-give-access-to-a-host-usb-or-serial-device (stackoverflow.com) https://marc.merlins.org/perso/linux/post_2018-12-20_Accessing-USB-Devices-In-Docker-_ttyUSB0_-dev-bus-usb-_-for-fastboot_-adb_-without-using-privileged.html (merlins.org)

Run zigbee2mqtt - WIP

On Linux

Because USB forwarding into Docker is not working, running zigbee2mqtt (zigbee2mqtt.io) separately might be more robust.

Check that nodejs is not used anywhere

apt-cache rdepends --installed nodejs

Get the custom apt repo for nodejs. I don’t like running custom scripts as root from the internet, so I downloaded the script and decomposed it into separate commands.

# sudo curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -

sudo apt install -y apt-transport-https ca-certificates curl gnupg

sudo mkdir -p /usr/share/keyrings
sudo rm -f /usr/share/keyrings/nodesource.gpg || true
sudo rm -f /etc/apt/sources.list.d/nodesource.list || true

curl -fsSL https://deb.nodesource.com/gpgkey/nodesource-repo.gpg.key | sudo gpg --dearmor -o /usr/share/keyrings/nodesource.gpg

node_version="20.x"
arch=$(dpkg --print-architecture)
echo "deb [arch=$arch signed-by=/usr/share/keyrings/nodesource.gpg] https://deb.nodesource.com/node_$node_version nodistro main" | sudo tee /etc/apt/sources.list.d/nodesource.list

# N|solid Config
echo "Package: nsolid" | sudo tee /etc/apt/preferences.d/nsolid > /dev/null
echo "Pin: origin deb.nodesource.com" | sudo tee -a /etc/apt/preferences.d/nsolid > /dev/null
echo "Pin-Priority: 600" | sudo tee -a /etc/apt/preferences.d/nsolid > /dev/null

# Nodejs Config
echo "Package: nodejs" | sudo tee /etc/apt/preferences.d/nodejs > /dev/null
echo "Pin: origin deb.nodesource.com" | sudo tee -a /etc/apt/preferences.d/nodejs > /dev/null
echo "Pin-Priority: 600" | sudo tee -a /etc/apt/preferences.d/nodejs > /dev/null

sudo apt update -y

and install nodejs

sudo apt install nodejs git make g++ gcc libsystemd-dev
# g++ is already the newest version (4:10.2.1-1).
# gcc is already the newest version (4:10.2.1-1).
# git is already the newest version (1:2.30.2-1+deb11u2).
# make is already the newest version (4.3-4.1).
# The following NEW packages will be installed:
#   libsystemd-dev nodejs
# Verify that the correct nodejs and npm (automatically installed with nodejs)
# version has been installed
node --version  # Should output v18.x, v20.x, or v21.x
npm --version  # Should output 9.X or 10.X

Create a new user and directory for zigbee2mqtt and set that user as its owner. We do need a home dir as npm uses it for dependencies.

sudo adduser --disabled-login zigbee2mqtt
sudo adduser zigbee2mqtt dialout

sudo mkdir /opt/zigbee2mqtt
sudo chown -R zigbee2mqtt: /opt/zigbee2mqtt

Create mosquitto account

passwd=$(openssl rand -base64 24)
sudo mosquitto_passwd -b /etc/mosquitto/passwd zigbee2mqtt "$passwd"
echo "$passwd" # note this down for configuration.yaml
sudo systemctl restart mosquitto.service 

Clone Zigbee2MQTT repository

sudo -u zigbee2mqtt git clone --depth 1 https://github.com/Koenkk/zigbee2mqtt.git /opt/zigbee2mqtt

Install dependencies (as user zigbee2mqtt)

cd /opt/zigbee2mqtt
sudo -u zigbee2mqtt npm ci

If this command fails and returns an ERR_SOCKET_TIMEOUT error, run this command instead: npm ci --maxsockets 1

Build the app

sudo -u zigbee2mqtt npm run build

Copy and open the configuration file

sudo -u zigbee2mqtt  cp /opt/zigbee2mqtt/data/configuration.example.yaml /opt/zigbee2mqtt/data/configuration.yaml
sudo -u zigbee2mqtt  vim /opt/zigbee2mqtt/data/configuration.yaml
# Home Assistant integration (MQTT discovery)
homeassistant: true

# allow new devices to join, set this to false by default, then temporarily allow via frontend, see https://www.zigbee2mqtt.io/guide/usage/pairing_devices.html#frontend-recommended
permit_join: false

# MQTT settings
mqtt:
  # MQTT base topic for zigbee2mqtt MQTT messages
  base_topic: zigbee2mqtt
  # MQTT server URL
  server: 'mqtt://localhost'
  # MQTT server authentication, uncomment if required:
  user: user
  password: password
  # Choose your channel carefully, see https://www.metageek.com/training/resources/zigbee-wifi-coexistence/ and https://www.zigbee2mqtt.io/advanced/zigbee/02_improve_network_range_and_stability.html#reduce-wi-fi-interference-by-changing-the-zigbee-channel
  channel: 11

# Note: all options are optional
availability:
  active:
    # Time after which an active device will be marked as offline in
    # minutes (default = 10 minutes)
    timeout: 5
  passive:
    # Time after which a passive device will be marked as offline in
    # minutes (default = 1500 minutes aka 25 hours)
    timeout: 30

# Serial settings
serial:
  # Location of CC2531 USB sniffer
  #port: /dev/ttyACM0
  port: /dev/usb-dresden_elektronik_ingenieurtechnik_GmbH_ConBee_II_DE2149131-if00
  adapter: deconz

advanced:
  # Set network_key: GENERATE to let Zigbee2MQTT generate a new random key on the first start. The configuration.yml gets updated with the new key. Changing the network_key requires repairing of all devices
  # https://www.zigbee2mqtt.io/guide/configuration/zigbee-network.html#network-config
  network_key: GENERATE

frontend: true

Start

cd /opt/zigbee2mqtt
sudo -u zigbee2mqtt npm start

Start as daemon

cat << 'EOF' | sudo tee /etc/systemd/system/zigbee2mqtt.service
[Unit]
Description=zigbee2mqtt
After=network.target

[Service]
Environment=NODE_ENV=production
Type=notify
ExecStart=/usr/bin/node index.js
WorkingDirectory=/opt/zigbee2mqtt
StandardOutput=inherit
# Or use StandardOutput=null if you don't want Zigbee2MQTT messages filling syslog, for more options see systemd.exec(5)
StandardError=inherit
WatchdogSec=10s
Restart=always
RestartSec=10s
User=zigbee2mqtt

[Install]
WantedBy=multi-user.target
EOF
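Reload systemd and enable the service:

sudo systemctl daemon-reload
sudo systemctl enable --now zigbee2mqtt.service
systemctl status zigbee2mqtt.service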

Now configure at http://localhost:8080

Ensure you set the Reporting settings correctly; e.g. some Innr plugs report energy updates only on large changes (~0.5 kWh). Concretely, set Min rep change to ~0.001 for Cluster seMetering & Attribute currentSummDelivered. See also this github issue (github.com).
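The same can presumably be done over MQTT via the bridge request API; a sketch, where the device name my_plug is hypothetical and the payload fields are taken from the zigbee2mqtt docs (verify before use):

mosquitto_pub -u user -P password \
  -t zigbee2mqtt/bridge/request/device/configure_reporting \
  -m '{"id": "my_plug", "endpoint": 1, "cluster": "seMetering", "attribute": "currentSummDelivered", "minimum_report_interval": 5, "maximum_report_interval": 3600, "reportable_change": 1}'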

On Docker - doesn’t work

Create mosquitto account

passwd=$(openssl rand -base64 24)
sudo mosquitto_passwd -b /etc/mosquitto/passwd zigbee2mqtt "$passwd"
sudo systemctl restart mosquitto.service 

Create data directory, get default config, update mosquitto account

sudo mkdir /var/lib/zigbee2mqtt/
sudo chmod o-rwx /var/lib/zigbee2mqtt/

sudo wget https://raw.githubusercontent.com/Koenkk/zigbee2mqtt/master/data/configuration.yaml -P /var/lib/zigbee2mqtt/data
sudo chmod o-rwx /var/lib/zigbee2mqtt/data/configuration.yaml

Init docker compose config (zigbee2mqtt.io):

cat << 'EOF' | tee ~tim/docker/zigbee2mqtt-compose.yml
# Start: docker compose up -d zigbee2mqtt
# Update: docker compose pull zigbee2mqtt && docker compose up -d zigbee2mqtt
# version: '3.8' # obsolete
services:
  zigbee2mqtt:
    container_name: zigbee2mqtt
    image: koenkk/zigbee2mqtt
    restart: unless-stopped
    volumes:
      - /var/lib/zigbee2mqtt/data:/app/data
      - /run/udev:/run/udev:ro
    ports:
      # Frontend port
      - 8080:8080
    environment:
      - TZ=Europe/Berlin
    devices:
      # Make sure this matches your adapter location
      - /dev/usb-dresden_elektronik_ingenieurtechnik_GmbH_ConBee_II_DE2149131-if00:/dev/ttyACM0
    # Optional: run as rootless mode, but has quite some caveats (see https://docs.docker.com/engine/security/rootless/)
    # group_add:
    #   - dialout
    # user: 1000:1000
EOF

Start

sudo docker compose -f zigbee2mqtt-compose.yml up -d

This creates an error for the USB forward:

Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error creating device nodes: mount /dev/ttyACM0:/var/lib/docker/overlay2/666a6b67f09494fbe8eef8cd80abf60e8d3d091a94bfacb4cb871d6465f526d1/merged/dev/ttyACM0 (via /proc/self/fd/6), flags: 0x1000: no such file or directory: unknown

Clean up docker container

Reverse proxy via nginx

server {
  listen 443 ssl http2;
  listen [::]:443 ssl http2;

  server_name homeassistant.vanwerkhoven.org;

  location / {
    # include snippets/nginx-server-proxy-tim.conf;
    # TvW 20230222 Default options for server blocks acting as reverse proxy. Should be part of location / { }
    # include modules-available/nginx-server-proxy-tim.conf; 
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_set_header X-Forwarded-Host $server_name;
    #proxy_set_header X-Forwarded-Ssl on;
    #proxy_set_header Upgrade $http_upgrade;
    #proxy_set_header Connection "upgrade";

    #client_max_body_size 16G;
    proxy_buffering off;
    proxy_pass http://127.0.0.1:8123;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection upgrade;
  }
  # include snippets/nginx-server-ssl-tim.conf;
  # TvW 20230222 Default options for server blocks serving ssl

  # Added 20190122 TvW Add HTTPS strict transport security
  add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;

  ssl_certificate /etc/letsencrypt/live/vanwerkhoven.org/fullchain.pem; # managed by Certbot
  ssl_certificate_key /etc/letsencrypt/live/vanwerkhoven.org/privkey.pem; # managed by Certbot
  include /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot
  ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot
}

epexspot add-on

To get electricity spot prices directly into Home Assistant, I use the ha_epex_spot (github.com) component. I selected the 3.0.0-dev-5 pre-release as 3.x breaks compatibility with 2.x, and although this version might have some bugs, I hope it saves me a future migration (but beware of premature optimization (xkcd.com)).

wget https://github.com/mampfes/ha_epex_spot/archive/refs/tags/3.0.0-dev-5.zip -O ha_epex_spot-3.0.0-dev-5.zip
unzip ha_epex_spot-3.0.0-dev-5.zip
scp -O -P 22222 -r ha_epex_spot-3.0.0-dev-5/custom_components/epex_spot/ root@homeassistant.lan.vanwerkhoven.org:/mnt/data/supervisor/homeassistant/custom_components

If you get sh: /usr/libexec/sftp-server: not found; scp: Connection closed, your OpenSSH client is too new (openwrt.org) for the receiving end; that’s why we use the -O flag with scp.

Add the epex_spot integration, select EPEX web scraper.

Tweak ‘Net Price’ to local situation:

  1. Set uplift to 0.10154
  2. Set VAT to 21%

Install Apex-charts (github.com) to see next day prices:

mkdir -p config/www
wget https://github.com/RomRider/apexcharts-card/releases/download/v2.1.2/apexcharts-card.js -P config/www

Add resource via GUI (github.com):

  1. Make sure advanced mode is enabled in your user profile (click on your user name to get there)
  2. Navigate to Dashboard –> Edit Dashboard –> Manage resources. Click the ‘+ Add Resource’ icon in bottom-right.
  3. Enter URL /local/apexcharts-card.js and select type “JavaScript Module”.
  4. Restart Home Assistant.

Add chart via these instructions (github.com), with some tweaks (see also here (github.com) and here (github.com)):

type: custom:apexcharts-card
header:
  show: true
  title: Electricity Prices
graph_span: 48h
span:
  start: day
now:
  show: true
  label: Now
yaxis:
  - decimals: 0
    min: ~0
    max: ~15
apex_config:
  legend:
    show: false
  xaxis:
    tooltip: false
  tooltip:
    x:
      show: true
      format: HH:00 - HH:59
series:
  - entity: sensor.epex_spot_data_price
    name: Electricity Price
    type: column
    unit: ct/kWh
    extend_to: end
    data_generator: |
      return entity.attributes.data.map((entry) => {
        return [new Date(entry.start_time), entry.price_per_kwh*100];
      });

Solaredge Modbus add-on

Get home-assistant-solaredge-modbus add-on (github.com):

wget https://github.com/binsentsu/home-assistant-solaredge-modbus/archive/refs/tags/V1.11.1.zip -O home-assistant-solaredge-modbus-V1.11.1.zip
unzip home-assistant-solaredge-modbus-V1.11.1.zip
scp -O -P 22223 -r home-assistant-solaredge-modbus-1.11.1/custom_components/solaredge_modbus root@homeassistant.lan.vanwerkhoven.org:config/custom_components/

Migrate HA to MariaDB

@TODO Figure out how to set up MariaDB later (if we need it at all)

Optimize configuration: add mariadb, influxdb, tweak recorder to only store relevant stuff https://smarthomescene.com/guides/optimize-your-home-assistant-database/ (smarthomescene.com) https://community.home-assistant.io/t/migrating-home-assistant-database-from-sqlite-to-mariadb/96895 (home-assistant.io)

# Remove old MySQL installation
sudo apt remove mysql-server-8.0 mysql-server mysql-client-8.0 mysql-client-core-8.0 mysql-common
sudo apt-get autoremove --purge
sudo apt install mariadb-server

# Fix apparmor because of old mysql installation
# https://askubuntu.com/questions/1185710/mariadb-fails-despite-apparmor-profile
# https://stackoverflow.com/questions/40997257/mysql-service-fails-to-start-hangs-up-timeout-ubuntu-mariadb
echo "# TvW 20230127 fix apparmor issue mariadb" | sudo tee -a /etc/apparmor.d/usr.sbin.mysqld
echo "/usr/sbin/mysqld { }" | sudo tee -a /etc/apparmor.d/usr.sbin.mysqld
sudo apparmor_parser -v -R /etc/apparmor.d/usr.sbin.mysqld
sudo systemctl restart mariadb
sudo reboot
# Did not help? Try reboot
#sudo /etc/init.d/apparmor reload

sudo mysql_secure_installation

## create database
mysql -e 'CREATE SCHEMA IF NOT EXISTS `hass_db` DEFAULT CHARACTER SET utf8mb4'
## create user (use a safe password please)
mysql -e "CREATE USER 'hass_user'@'localhost' IDENTIFIED BY 'pwd'"
mysql -e "GRANT ALL PRIVILEGES ON hass_db.* TO 'hass_user'@'localhost'"
mysql -e "GRANT usage ON *.* TO 'hass_user'@'localhost'"

Migrate: method 1, use only SQL

pip install sqlite3-to-mysql
sqlite3mysql -f ./home-assistant_v2.db -d hass_db -u hass_user -p
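As a sanity check after migrating, compare row counts (assuming the default recorder schema with a states table):

sqlite3 ./home-assistant_v2.db 'SELECT COUNT(*) FROM states;'
mysql -u hass_user -p -e 'SELECT COUNT(*) FROM hass_db.states;'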

Migrate: method 2, todo

sqlite3 ~homeassistant/.homeassistant/home-assistant_v2.db .dump > hadump.sql
git clone https://github.com/athlite/sqlite3-to-mysql

recorder:
  auto_purge: true
  purge_keep_days: 21
  auto_repack: true
  db_url: mysql://hass_user:pwd@localhost/hass_db?unix_socket=/var/run/mysqld/mysqld.sock&charset=utf8mb4

ntfy.sh

Install using Debian instructions (ntfy.sh).

sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://archive.heckel.io/apt/pubkey.txt | sudo gpg --dearmor -o /etc/apt/keyrings/archive.heckel.io.gpg
sudo apt install apt-transport-https
sudo sh -c "echo 'deb [arch=amd64 signed-by=/etc/apt/keyrings/archive.heckel.io.gpg] https://archive.heckel.io/apt debian main' \
    > /etc/apt/sources.list.d/archive.heckel.io.list"  
sudo apt update
sudo apt install ntfy
sudo systemctl enable ntfy
sudo systemctl start ntfy

Add subdomain for ntfy (local and global DNS), then add reverse proxy to nginx (ntfy.sh):

server {
        listen 443 ssl http2;
        listen [::]:443 ssl http2;

        server_name ntfy.vanwerkhoven.org;

        location / {
                include snippets/nginx-server-proxy-tim.conf;
                proxy_buffering off;
                # Use fixed IP instead because DNS might not be up yet
                # resulting in error
                # "nginx: [emerg] host not found in upstream"
                # https://stackoverflow.com/questions/32845674/nginx-how-to-not-exit-if-host-not-found-in-upstream
                resolver 172.17.10.1 valid=30s;
                set $upstream_ha ntfy.lan.vanwerkhoven.org;
                proxy_pass http://$upstream_ha:2586;
                proxy_set_header Upgrade $http_upgrade;
                proxy_set_header Connection "upgrade";
                client_max_body_size 0; # Stream request body to backend
        }
        include snippets/nginx-server-ssl-tim.conf;
        include snippets/nginx-server-cert-vanwerkhoven-tim.conf;
}

Test config and restart

sudo nginx -t
sudo systemctl restart nginx.service

Configure ntfy service /etc/ntfy/server.yml:

base-url: "https://ntfy.vanwerkhoven.org"
listen-http: ":2586"
cache-file: "/var/cache/ntfy/cache.db"
attachment-cache-dir: "/var/cache/ntfy/attachments"
attachment-total-size-limit: "256M"
attachment-file-size-limit: "15M"
attachment-expiry-duration: "3h"
behind-proxy: true
# For notifications via Firebase & APNS
upstream-base-url: "https://ntfy.sh"

auth-file: "/var/lib/ntfy/user.db"
auth-default-access: "deny-all"

Notification topics layout depending on audience

Create users

sudo ntfy user add --role=user app_client
sudo ntfy user add --role=user app_pub

sudo ntfy access app_pub 't_*' rw
sudo ntfy access app_client 't_*' ro

sudo ntfy token add app_pub
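Check the result (ntfy has list commands for users and ACL entries):

sudo ntfy user list
sudo ntfy access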

Restart

sudo systemctl restart ntfy

Test pub/sub locally

ntfy sub -d http://ntfy.lan.vanwerkhoven.org:2586/test
ntfy pub -d http://ntfy.lan.vanwerkhoven.org:2586/test test

Test via reverse proxy

ntfy sub --token tk_iu9tgrhirttz854zisxf7q6hzrkhb -d https://ntfy.vanwerkhoven.org/t_all
ntfy pub --token tk_iu9tgrhirttz854zisxf7q6hzrkhb -t "Title hello world"  -d https://ntfy.vanwerkhoven.org/t_all message

Integrate in HA, either using Apprise (home-assistant.io) or the REST (home-assistant.io) command (diecknet.de). I chose the latter for fewer dependencies, adding this to configuration.yaml directly:

shell_command:
    ntfy: >
        curl
        -X POST
        --url 'https://ntfy.vanwerkhoven.org/{{ topic | default("t_verbose") }}'
        --data '{{ message }}'
        --header 'X-Title: {{ title }}'
        --header 'X-Tags: {{ tags }}'
        --header 'X-Priority: {{ priority | default("default") }}'
        --header 'X-Delay: {{ delay }}'
        --header 'X-Actions: {{ actions }}'
        --header 'X-Click: {{ click }}'
        --header 'X-Icon: {{ icon }}'        
        --header 'Authorization: Bearer tk_xx'

Some useful emojis for tags can be found in the ntfy docs (ntfy.sh).

Add notification to automations

action: shell_command.ntfy
data:
  tags: arrow_forward
  topic: t_verbose
  title: Dryer started

action: shell_command.ntfy
data:
  tags: white_check_mark
  topic: t_all
  title: Dryer finished
  message: >-
    Used {{ ((states(power_consumption_sensor_var) | float) - 
    (power_consumption_start | float)) | round(2) }}{{power_consumption_unit}}

Add to proxmox –> via GUI

Add to Grafana, see instructions in ntfy docs (ntfy.sh).

Optionally define a shorter message template under ‘Contact points’ -> ‘Notification template’ -> ‘+Add notification template group’ -> ‘Add example’ -> ‘Default template for notification messages’, then edit as desired, e.g.:

{{- /* This is a copy of the "default.message" template. */ -}}
{{- /* Edit the template name and template content as needed. */ -}}
{{ define "default.message.brief" }}{{ if gt (len .Alerts.Firing) 0 }}**Firing**
{{ template "__text_alert_list.brief" .Alerts.Firing }}{{ if gt (len .Alerts.Resolved) 0 }}

{{ end }}{{ end }}{{ if gt (len .Alerts.Resolved) 0 }}**Resolved**
{{ template "__text_alert_list.brief" .Alerts.Resolved }}{{ end }}{{ end }}

{{ define "__text_alert_list.brief" }}{{ range . }}
Value: {{ template "__text_values_list.brief" . }}
Labels:
{{ range .Labels.SortedPairs }} - {{ .Name }} = {{ .Value }}
{{ end }}{{ end }}{{ end }}

{{ define "__text_values_list.brief" }}{{ if len .Values }}{{ $first := true }}{{ range $refID, $value := .Values -}}
{{ if $first }}{{ $first = false }}{{ else }}, {{ end }}{{ $refID }}={{ $value }}{{ end -}}
{{ else }}[no value]{{ end }}{{ end }}

Add ntfy to crontab (ntfy.sh)

curl -H 'Authorization: Bearer tk_xx' -H tags:warning -H prio:high -d "Certificate renewal failed" https://ntfy.vanwerkhoven.org/t_all

Grafana

We can either use apt or the docker image. I go for apt here so I can more easily re-use my letsencrypt certificate via /etc/grafana/grafana.ini (grafana.com).

Install for Debian (grafana.com)

sudo apt-get install -y apt-transport-https
sudo apt-get install -y software-properties-common wget
sudo wget -q -O /usr/share/keyrings/grafana.key https://apt.grafana.com/gpg.key

Add repo

echo "deb [signed-by=/usr/share/keyrings/grafana.key] https://apt.grafana.com stable main" | sudo tee -a /etc/apt/sources.list.d/grafana.list

Install

sudo apt-get install grafana

Start now & start automatically

sudo systemctl daemon-reload
sudo systemctl start grafana-server
sudo systemctl status grafana-server
sudo systemctl enable grafana-server.service

Enable HTTPS using letsencrypt certificate (grafana.com)

sudo ln -s /etc/letsencrypt/live/vanwerkhoven.org/privkey.pem /etc/grafana/grafana.key
sudo ln -s /etc/letsencrypt/live/vanwerkhoven.org/fullchain.pem /etc/grafana/grafana.crt

# Allow access
sudo groupadd letsencrypt-cert
sudo usermod --append --groups letsencrypt-cert grafana

sudo chgrp -R letsencrypt-cert /etc/letsencrypt/*
sudo chmod -R g+rx /etc/letsencrypt/*
sudo chgrp -R grafana /etc/grafana/grafana.crt /etc/grafana/grafana.key
sudo chmod 400 /etc/grafana/grafana.crt /etc/grafana/grafana.key
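Check that the grafana user can actually read both files; if this fails, revisit the permissions above (e.g. chmod 440 instead of 400):

sudo -u grafana cat /etc/grafana/grafana.crt /etc/grafana/grafana.key > /dev/null && echo OK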

Migrate configuration

  1. Install used plugin on new server – later
  2. Stop Grafana service on source and destination server – OK
  3. Copy /var/lib/grafana/grafana.db from old to new server – OK
  4. Check /etc/grafana/grafana.ini - OK
  5. Reconnect to datasource
scp -P 10022 tim@172.17.10.107:/etc/grafana/grafana.ini /etc/grafana/grafana.ini-migrate
sudo diff /etc/grafana/grafana.ini /etc/grafana/grafana.ini-migrate

Grafana notifications

https://grafana.com/docs/grafana/latest/alerting/fundamentals/alert-rules/message-templating/ (grafana.com) https://grafana.com/docs/grafana/latest/alerting/manage-notifications/template-notifications/using-go-templating-language/ (grafana.com)

Nextcloud

Install the regular Docker image (github.com) (instead of the all-in-one image with possibly too much junk). Enabling cron requires a separate container (github.com), and we link to Redis by setting REDIS_HOST:

cat << 'EOF' | tee ~tim/docker/nextcloud-compose.yml
# https://github.com/nextcloud/docker#running-this-image-with-docker-compose
# docker compose -f nextcloud-compose.yml up -d 


volumes:
  nextcloud:
  db:

services:
  db:
    image: mariadb:10.11
    restart: always
    command: --transaction-isolation=READ-COMMITTED --log-bin=binlog --binlog-format=ROW
    volumes:
      - db:/var/lib/mysql
    environment:
      - MYSQL_ROOT_PASSWORD=dtgUUVhZcbuGmks4gajHBHcnXX2yXyve
      - MYSQL_PASSWORD=LlI8Fcp0vBi7QIgcwQ02vFGtH8I2Wn2B
      - MYSQL_DATABASE=nextcloud
      - MYSQL_USER=nextcloud

  redis:
    image: redis:alpine
    restart: always

  app:
    image: nextcloud:30
    restart: always
    ports:
      - 9081:80
    depends_on:
      - db
      - redis
    links:
      - db
      - redis
    volumes:
      - nextcloud:/var/www/html
    environment:
      - MYSQL_PASSWORD=LlI8Fcp0vBi7QIgcwQ02vFGtH8I2Wn2B
      - MYSQL_DATABASE=nextcloud
      - MYSQL_USER=nextcloud
      - MYSQL_HOST=db
      - APACHE_DISABLE_REWRITE_IP=1
      - TRUSTED_PROXIES=172.17.10.0/24
      - REDIS_HOST=redis

  cron:
    image: nextcloud:30
    volumes_from:
      - app
    entrypoint: /cron.sh
EOF

sudo docker compose -f nextcloud-compose.yml up -d 

Migrate data, then force Nextcloud to scan for new files (nextcloud.com) (see useful docs on external storage (nextcloud.com)).

rsync -e 'ssh -p 10022' --archive --progress --verbose root@172.17.10.107:/var/snap/nextcloud/common/nextcloud/data/timlow/files/ .
docker exec -u www-data 715a4e700667 php occ files:scan --all

Add https schema for reverse proxy config (nextcloud.com)

vim /var/lib/docker/volumes/docker_nextcloud/_data/config/config.php 
# 'overwrite.cli.url' => 'https://nextcloud.vanwerkhoven.org'
# 'overwritehost'     => 'nextcloud.vanwerkhoven.org',
# 'overwriteprotocol' => 'https',
# 'overwritewebroot'  => '/',

Update: should probably not do this but instead use parameters like specified here (github.com):

    environment:
      - APACHE_DISABLE_REWRITE_IP=1
      - TRUSTED_PROXIES=172.17.10.0/24

Enable file uploads >2M
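A sketch for this, assuming the PHP_UPLOAD_LIMIT environment variable documented in the nextcloud/docker README; the nginx proxy in front needs a matching client_max_body_size:

# In nextcloud-compose.yml, under the app service's environment:
#   - PHP_UPLOAD_LIMIT=16G
# And in the nginx server block proxying Nextcloud:
#   client_max_body_size 16G;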

Optional: make certain folders accessible outside Docker using bind mount, first as trial, then permanently on boot via /etc/fstab:

sudo mount -o bind /var/lib/docker/volumes/docker_nextcloud/_data/data/tim/files/alexandra /media/alexandra

cat << 'EOF' | sudo tee -a /etc/fstab
/var/lib/docker/volumes/docker_nextcloud/_data/data/tim/files/alexandra /media/alexandra none bind
EOF

Running OCC commands (github.com) is done as:

docker compose exec --user www-data app php occ
sudo docker compose -f nextcloud-compose.yml exec --user www-data app php occ db:add-missing-indices
sudo docker compose -f nextcloud-compose.yml exec --user www-data app php occ maintenance:repair --include-expensive

Upgrading nextcloud is documented on github (github.com):

  1. Increment major version by 1 in compose file
  2. Run sudo docker compose -f nextcloud-compose.yml pull && sudo docker compose -f nextcloud-compose.yml up -d

This worked smoothly from 24 –> 25 –> 26 –> 27 –> 28 –> 29 –> 30, and MariaDB from 10.5 to 10.11 (at the final step as suggested by nextcloud itself).

If you get 502 Bad Gateway from nginx, nextcloud is likely still starting up.

PiGallery2

Configure the docker compose file (github.com) for PiGallery2 only (github.com); we do the nginx reverse proxy ourselves. Furthermore, bind mount (docker.com) the images directory directly to the source in the Nextcloud Docker volume.

cat << 'EOF' > ~tim/docker/pigallery2-compose.yml
version: '3.2'
# Version 3.2 required for long-syntax volume configuration -- see https://docs.docker.com/compose/compose-file/compose-file-v3/#volumes
# Source: https://github.com/bpatrik/pigallery2/blob/master/docker/README.md
# docker compose -f pigallery2-compose.yml up -d

services:
  pigallery2:
    image: bpatrik/pigallery2:latest
    container_name: pigallery2
    environment:
      - NODE_ENV=production # set to 'debug' for full debug logging
    volumes:
      - "/var/lib/pigallery/config:/app/data/config" # CHANGE ME -> OK
      - "db-data:/app/data/db"
      # - "/media/alexandra:/app/data/images:ro" # CHANGE ME -> OK
      - type: bind
        source: /var/lib/docker/volumes/docker_nextcloud/_data/data/tim/files/alexandra/
        target: /app/data/images
        read_only: true
      - "/var/lib/pigallery/tmp:/app/data/tmp" # CHANGE ME -> OK
    ports:
      - 3010:80
    restart: always

volumes:
  db-data:
EOF

sudo docker compose -f pigallery2-compose.yml pull
sudo docker compose -f pigallery2-compose.yml up -d 

Add virtual host, something like below:

server {
        listen 443 ssl http2;
        listen [::]:443 ssl http2;

        server_name photos.vanwerkhoven.org;

        location / {
                include snippets/nginx-server-proxy-tim.conf; 
                client_max_body_size 1G;
                proxy_pass http://127.0.0.1:3010;
         }
        include snippets/nginx-server-ssl-tim.conf;

        ssl_certificate /etc/letsencrypt/live/vanwerkhoven.org/fullchain.pem; # managed by Certbot
        ssl_certificate_key /etc/letsencrypt/live/vanwerkhoven.org/privkey.pem; # managed by Certbot
}

Log in, configure settings, create a sharing link.

Mosquitto

Install daemon and clients

sudo apt install mosquitto mosquitto-clients

Port configuration, don’t use SSL for now

cat << 'EOF' | sudo tee  /etc/mosquitto/conf.d/tim.conf
# TvW 20190818
# From https://www.digitalocean.com/community/questions/how-to-setup-a-mosquitto-mqtt-server-and-receive-data-from-owntracks
connection_messages true
log_timestamp true

# https://www.digitalocean.com/community/tutorials/how-to-install-and-secure-the-mosquitto-mqtt-messaging-broker-on-ubuntu-16-04
# TvW 201908
allow_anonymous false
password_file /etc/mosquitto/passwd

listener 1883
EOF

cat << 'EOF' | sudo tee /etc/mosquitto/conf.d/ssl-tim.conf.off
# Letsencrypt needs different CA https://mosquitto.org/blog/2015/12/using-lets-encrypt-certificates-with-mosquitto/ 
# Or not?
#cafile /etc/ssl/certs/DST_Root_CA_X3.pem
certfile /etc/letsencrypt/live/vanwerkhoven.org/cert.pem
cafile /etc/letsencrypt/live/vanwerkhoven.org/chain.pem
keyfile /etc/letsencrypt/live/vanwerkhoven.org/privkey.pem

tls_version tlsv1.2
listener 8883
EOF

Port users from old server

sudo touch /etc/mosquitto/passwd
sudo chown mosquitto /etc/mosquitto/passwd
sudo chmod og-rwx /etc/mosquitto/passwd
cat << 'EOF' | sudo tee -a /etc/mosquitto/passwd
user:$6$SALT$7HASH==
EOF

Test run config (sudo is important, else you might get Error: Unable to write pid file.)

sudo /usr/sbin/mosquitto -c /etc/mosquitto/mosquitto.conf -v

If running mosquitto >2.0 and using letsencrypt certificates, make sure to copy them properly after deployment (mosquitto.org), using e.g. this script (github.com). I’m not using this as it requires too many moving parts. Instead, consider using a 100-year self-signed cert.
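A sketch for generating such a long-lived self-signed cert (the CN and paths are assumptions, adjust to taste):

sudo openssl req -x509 -newkey rsa:4096 -nodes -days 36500 \
  -subj "/CN=mqtt.lan.vanwerkhoven.org" \
  -keyout /etc/mosquitto/certs/mqtt-selfsigned.key \
  -out /etc/mosquitto/certs/mqtt-selfsigned.crt
sudo chown mosquitto /etc/mosquitto/certs/mqtt-selfsigned.key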

Worker scripts

co2signal2influxdb

Done, no extra actions

SBFspot

Because it’s not possible to tunnel bluetooth into LXC guests, install this one on the pve host (source (~January 2020) (proxmox.com): ‘Bluetooth on Linux works as a network device, i.e. when the matching driver for the Bluetooth hardware is loaded, the ‘hci’ device registers not as USB but as a network adapter. Passing through the USB /dev node is therefore useless, because the communication runs over entirely different interfaces.’ - NB this (github.com) and this (proxmox.com) don’t work for bluetooth).

Install SBFspot (github.com) from source (no builds for x86):

sudo apt-get -y --no-install-recommends install bluetooth libbluetooth-dev
sudo apt-get install -y libboost-date-time-dev libboost-system-dev libboost-filesystem-dev libboost-regex-dev
sudo apt-get install -y sqlite3 libsqlite3-dev
sudo apt-get install -y g++
sudo apt-get install -y mosquitto-clients

sudo mkdir /var/log/sbfspot.3
sudo chown -R $USER:$USER /var/log/sbfspot.3

sbfspot_version=3.9.7
wget -c https://github.com/SBFspot/SBFspot/archive/V$sbfspot_version.tar.gz

# Slightly tweaked from docs
mkdir sbfspot-$sbfspot_version
tar -xvf V$sbfspot_version.tar.gz -C sbfspot-$sbfspot_version --strip-components 1

cd sbfspot-$sbfspot_version/SBFspot
make -j4 sqlite
sudo make install_sqlite

Port data / configuration from previous setup

scp -r -P 10022 tim@172.17.10.107:/usr/local/bin/sbfspot.3/SBFspot.cfg /usr/local/bin/sbfspot.3/
sudo chown root:tim /usr/local/bin/sbfspot.3/SBFspot.cfg
sudo chmod o-r /usr/local/bin/sbfspot.3/SBFspot.cfg
sudo chmod g+w /usr/local/bin/sbfspot.3/SBFspot.cfg
rsync -av -e "ssh -p 10022" tim@172.17.10.107:/var/lib/sbfspot /var/lib/

Check bluetooth devices, select one to use

hcitool dev
# Devices:
#   hci1  00:19:0E:07:4E:47 # Belkin (Atech)
#   hci0  04:EA:56:87:A6:12 # Intel

Add to cron

# SBFspot, every minute in sync with smeterd. since SBFspot/bluetooth don't
*/1 5-22 * * * /home/tim/workers/SBFspot2influxdb/get_sbfspot_daydata.sh
30 23 * * * /usr/local/bin/sbfspot.3/SBFspot -sp0 -ad7 -am2 -ae2 -finq -q 2>&1 | logger -p user.err; /home/tim/workers/SBFspot2influxdb/sbfspot_month2influxdb.sh

Repair any gaps from SMA history

/home/tim/workers/sbfspot2influxdb/sbfspot_month2influxdb.sh 20230714

epexspot2influxdb.py

pip install entsoe-py

heat meter

On host: identify & forward USB port (github.com)

ls -l /dev/serial/by-id/usb-FTDI_FT232R_USB_UART_AQ00K6K3-if00-port0
# lrwxrwxrwx 1 root root 13 Jul 15 22:01 /dev/serial/by-id/usb-FTDI_FT232R_USB_UART_AQ00K6K3-if00-port0 -> ../../ttyUSB0
ls -l /dev/ttyUSB1
# crw-rw---- 1 root dialout 188, 1 Jul 15 22:03 /dev/ttyUSB1

mkdir -p /lxc/201/devices
cd /lxc/201/devices/
sudo mknod -m 660 usb-FTDI_FT232R_USB_UART_AQ00K6K3-if00-port0 c 188 1
sudo chown 100000:100020 usb-FTDI_FT232R_USB_UART_AQ00K6K3-if00-port0
ls -al /lxc/201/devices/usb-FTDI_FT232R_USB_UART_AQ00K6K3-if00-port0

cat << 'EOF' | sudo tee -a /etc/pve/lxc/201.conf
lxc.cgroup2.devices.allow: c 188:* rwm
lxc.mount.entry: /lxc/201/devices/usb-FTDI_FT232R_USB_UART_AQ00K6K3-if00-port0 dev/usb-FTDI_FT232R_USB_UART_AQ00K6K3-if00-port0 none bind,optional,create=file
EOF

smeterd

On host: identify & forward USB port (github.com)

ls -l /dev/serial/by-id/usb-FTDI_FT232R_USB_UART_AC2F17KR-if00-port0 
# lrwxrwxrwx 1 root root 13 Jul 15 22:01 /dev/serial/by-id/usb-FTDI_FT232R_USB_UART_AC2F17KR-if00-port0 -> ../../ttyUSB0
ls -l /dev/ttyUSB0
# crw-rw---- 1 root dialout 188, 0 Jul 15 22:01 /dev/ttyUSB0

mkdir -p /lxc/201/devices
cd /lxc/201/devices/
sudo mknod -m 660 usb-FTDI_FT232R_USB_UART_AC2F17KR-if00-port0 c 188 0
sudo chown 100000:100020 usb-FTDI_FT232R_USB_UART_AC2F17KR-if00-port0
ls -al /lxc/201/devices/usb-FTDI_FT232R_USB_UART_AC2F17KR-if00-port0

cat << 'EOF' | sudo tee -a /etc/pve/lxc/201.conf
lxc.cgroup2.devices.allow: c 188:* rwm
lxc.mount.entry: /lxc/201/devices/usb-FTDI_FT232R_USB_UART_AC2F17KR-if00-port0 dev/usb-FTDI_FT232R_USB_UART_AC2F17KR-if00-port0 none bind,optional,create=file
EOF

Get script, add user to dialout

sudo apt install python3-pip -y
pip install smeterd

sudo usermod --append --groups dialout tim

Live DNS IP updater

@TODO migrate to new server & set live

Via gandi-live-dns-config.py

sudo install -m 600 -o tim -g tim /dev/null /etc/gandi-live-dns-config.py # equivalent to touch && chmod 600 && chown tim:tim
cat << 'EOF' | sudo tee /etc/gandi-live-dns-config.py
# my config
api_secret='secret API string goes here'
domains={'vanwerkhoven.org':['www','home','nextcloud','photos','alexandramaya']}
ttl='1800' # our IP doesn't change that often, 30min down is ~OK
ifconfig4='http://whatismyip.akamai.com' # returns ipv4
ifconfig6='' # disabled until we get IPv6 right for VPN/firewall/etc.
#ifconfig6='https://ifconfig.co/ip' # returns ipv6
interface='' # set empty because else we get local ipv6
EOF

# Add crontab entry
# TvW 20210927 Disabled because I want some subdomains ipv4-only (home) because 
# of VPN. Also, if my IPv6 address changes I need to update router firewalling
# and port forwarding as well. -- Update: run all hostnames as ipv4 only for now
*/5 * * * * python3 /home/tim/workers/gandi-live-dns/src/gandi-live-dns.py >/dev/null 2>&1 

Nginx

Here we install and configure nginx. This DigitalOcean guide (digitalocean.com) is a useful reference for nginx configuration.

More sources:

Base install

sudo apt install nginx

Inspect, clean, and migrate nginx configuration:

sudo install -m 644 -o root -g root /dev/null /etc/nginx/conf.d/nginx-http-tim.conf # equivalent to touch && chmod 644 && chown root:root
cat << 'EOF' | sudo tee /etc/nginx/conf.d/nginx-http-tim.conf
# TvW 20230222 Additional default http block configuration settings, included automatically by default nginx.conf

# TvW 20200604 Disabled don't advertise version
server_tokens off;

# Add log format separate per virtual host so we can use goaccess to view who visits the server
# Parse using 
# `goaccess /var/log/nginx/access.log --log-format=VCOMBINED -o report-all.html`
log_format vcombined '$host: $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"';

access_log /var/log/nginx/access.log vcombined;

# TvW 20230222 expand gzip options - don't remember why, probably speed
gzip_vary on;
gzip_proxied any;
gzip_comp_level 6;
gzip_buffers 16 8k;
gzip_http_version 1.1;
gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;
EOF
sudo install -m 644 -o root -g root /dev/null /etc/nginx/snippets/nginx-server-site-tim.conf # equivalent to touch && chmod 644 && chown root:root
cat << 'EOF' | sudo tee /etc/nginx/snippets/nginx-server-site-tim.conf
# TvW 20230222 Default options for server blocks serving files
# include snippets/nginx-server-site-tim.conf;

# Add index.php to the list if you are using PHP
index index.php index.html index.htm index.nginx-debian.html;

location / {
  # First attempt to serve request as file, then
  # as directory, then fall back to displaying a 404.
  try_files $uri $uri/ =404;
  # This is cool because no php is touched for static content.
  # include the "?$args" part so non-default permalinks doesn't break when using query string
  #try_files                $uri $uri/ /index.php?$args;
}

# deny access to .htaccess files, if Apache's document root 
# concurs with nginx's one
location ~ /\.ht {
  deny all;
}

# Cache control
location ~* \.(?:js|css|png|jpg|jpeg|webp|gif|ico)$ {
  expires 30d;
  add_header Cache-Control "public, no-transform";
}
EOF
sudo install -m 644 -o root -g root /dev/null /etc/nginx/snippets/nginx-server-proxy-tim.conf # equivalent to touch && chmod 644 && chown root:root
cat << 'EOF' | sudo tee /etc/nginx/snippets/nginx-server-proxy-tim.conf
# TvW 20230222 Default options for server blocks acting as reverse proxy. Should be part of location / { }
# include snippets/nginx-server-proxy-tim.conf; 
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_set_header X-Forwarded-Host $server_name;
#proxy_set_header X-Forwarded-Ssl on;
#proxy_set_header Upgrade $http_upgrade;
#proxy_set_header Connection "upgrade";
EOF
sudo install -m 644 -o root -g root /dev/null /etc/nginx/snippets/nginx-server-ssl-tim.conf # equivalent to touch && chmod 644 && chown root:root
cat << 'EOF' | sudo tee /etc/nginx/snippets/nginx-server-ssl-tim.conf
# TvW 20230222 Default options for server blocks serving ssl
# include snippets/nginx-server-ssl-tim.conf;

# Added 20190122 TvW Add HTTPS strict transport security
add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
# Added 20190121 TvW Logjam attack - see weakdh.org
ssl_dhparam /etc/ssl/private/dhparams_weakdh.org.pem;
EOF

Fix logrotate conf to keep logs for a year (instead of 14 days):

cat << 'EOF' | sudo tee /etc/logrotate.d/nginx
/var/log/nginx/*.log {
  # Rotate weekly instead of default daily
  weekly
  missingok
  # Keep 52 instead of 14 files
  rotate 52
  compress
  # Don't delay, compress after first rotation
  # delaycompress
  notifempty
  create 0640 www-data adm
  sharedscripts
  prerotate
    if [ -d /etc/logrotate.d/httpd-prerotate ]; then \
      run-parts /etc/logrotate.d/httpd-prerotate; \
    fi \
  endscript
  postrotate
    invoke-rc.d nginx rotate >/dev/null 2>&1
  endscript
}
EOF

Parse logs (goaccess.io) into visually digestible data using Goaccess:

# --persist/--keep-db-files on all files parsed
# --restore/--load-from-disk on second & subsequent files parsed
mkdir -p /tmp/goaccess-{nextcloud,photos,all}/

# At reboot, run goaccess on all files, then run on latest file every 5min
zgrep --no-filename "^nextcloud.vanwerkhoven.org" /var/log/nginx/access.log* | nice -n 19 goaccess --log-format=VCOMBINED -o /var/www/html/stats/report-nextcloud.html --keep-db-files --db-path /tmp/goaccess-nextcloud/ - 
zgrep --no-filename "^nextcloud.vanwerkhoven.org" /var/log/nginx/access.log | nice -n 19 goaccess --log-format=VCOMBINED -o /var/www/html/stats/report-nextcloud.html --load-from-disk --keep-db-files --db-path /tmp/goaccess-nextcloud/ - 
zgrep --no-filename "^photos.vanwerkhoven.org" /var/log/nginx/access.log* | nice -n 19  goaccess --log-format=VCOMBINED -o /var/www/html/stats/report-photos.html --keep-db-files --db-path /tmp/goaccess-photos/ - 
zgrep --no-filename "^photos.vanwerkhoven.org" /var/log/nginx/access.log | nice -n 19  goaccess --log-format=VCOMBINED -o /var/www/html/stats/report-photos.html --keep-db-files --load-from-disk --db-path /tmp/goaccess-photos/ - 
zgrep --no-filename -v '^nextcloud.vanwerkhoven.org\|^photos.vanwerkhoven.org' /var/log/nginx/access.log* | nice -n 19 goaccess --log-format=VCOMBINED -a -o /var/www/html/stats/report-all.html --keep-db-files --db-path /tmp/goaccess-all/ - 
zgrep --no-filename -v '^nextcloud.vanwerkhoven.org\|^photos.vanwerkhoven.org' /var/log/nginx/access.log | nice -n 19 goaccess --log-format=VCOMBINED -a -o /var/www/html/stats/report-all.html --keep-db-files --load-from-disk --db-path /tmp/goaccess-all/ - 
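A crontab sketch implementing this scheme (shown for the nextcloud report only; /tmp is wiped at reboot, so recreate the db dir first):

@reboot mkdir -p /tmp/goaccess-nextcloud/ && zgrep --no-filename "^nextcloud.vanwerkhoven.org" /var/log/nginx/access.log* | nice -n 19 goaccess --log-format=VCOMBINED -o /var/www/html/stats/report-nextcloud.html --keep-db-files --db-path /tmp/goaccess-nextcloud/ -
*/5 * * * * zgrep --no-filename "^nextcloud.vanwerkhoven.org" /var/log/nginx/access.log | nice -n 19 goaccess --log-format=VCOMBINED -o /var/www/html/stats/report-nextcloud.html --load-from-disk --keep-db-files --db-path /tmp/goaccess-nextcloud/ -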

# Optional in case of problems, use something like below (from: https://goaccess.io/faq#configuration)
# LC_TIME="en_US.UTF-8" bash -c 'goaccess /var/log/nginx/access.log --log-format=VCOMBINED -o report.html'
# Dump & inspect existing config
nginx -T

# Migrate config
scp -r oldserver:/etc/nginx/nginx.conf newserver:/etc/nginx/nginx.conf # -- if you don't have tweaks here you might want to keep the vanilla configuration in case something's improved.
scp -r oldserver:/etc/nginx/conf.d/ newserver:/etc/nginx/conf.d/
scp -r oldserver:/etc/nginx/modules-available/ newserver:/etc/nginx/modules-available/
scp -r oldserver:/etc/nginx/sites-available/ newserver:/etc/nginx/sites-available/
scp -r oldserver:/etc/nginx/sites-enabled/ newserver:/etc/nginx/sites-enabled/

Migrate Certbot. The recommended package manager is snap (eff.org), which has some FOSS (letsencrypt.org) issues (linuxmint.com), being closed source. Hence we stick with apt for now, which has an older version (1.12.0) but should be fine (I was still using 0.40.0 on my old Ubuntu server).

Alternatives:

Install the certbot client; this installs both /etc/cron.d/certbot and a systemd timer, which can be seen by running systemctl list-timers (see this explanation (letsencrypt.org)).

sudo apt install certbot python3-certbot-dns-gandi

Two options:

  1. Get new certificate with maybe new account (preferred)
  2. Migrate certificates

New certificates

sudo apt install certbot python3-certbot-dns-gandi python3-certbot-nginx

sudo install -m 600 -o root -g root /dev/null /etc/letsencrypt/gandi.ini # equivalent to touch && chmod 600 && chown root:root
cat << 'EOF' | sudo tee /etc/letsencrypt/gandi.ini
 # live dns v5 api key
certbot_plugin_gandi:dns_api_key=APIKEY

# optional organization id, remove it if not used
certbot_plugin_gandi:dns_sharing_id=SHARINGID
EOF

# Get certificate, use old plugin syntax because debian uses an old certbot client
sudo certbot certonly -a certbot-plugin-gandi:dns --certbot-plugin-gandi:dns-credentials /etc/letsencrypt/gandi.ini -d vanwerkhoven.org -d \*.vanwerkhoven.org --server https://acme-v02.api.letsencrypt.org/directory

# IMPORTANT NOTES:
#  - Congratulations! Your certificate and chain have been saved at:
#    /etc/letsencrypt/live/<domain>/fullchain.pem
#    Your key file has been saved at:
#    /etc/letsencrypt/live/<domain>/privkey.pem
#    Your certificate will expire on 2023-05-24. To obtain a new or
#    tweaked version of this certificate in the future, simply run
#    certbot again. To non-interactively renew *all* of your
#    certificates, run "certbot renew"


# Optional: Run nginx installer to install to servers, else install manually
sudo certbot run --nginx --certbot-plugin-gandi:dns-credentials /etc/letsencrypt/gandi.ini -d vanwerkhoven.org -d \*.vanwerkhoven.org --server https://acme-v02.api.letsencrypt.org/directory

# Optional: install automatic certificate renewal (also installed by default), either explicitly using the plugin, or implicitly via settings stored in /etc/letsencrypt/renewal/<domain>.org.conf
0 0 * * 0 certbot renew -q --authenticator dns-gandi --dns-gandi-credentials /etc/letsencrypt/gandi.ini --server https://acme-v02.api.letsencrypt.org/directory # explicitly use settings
0 0 * * 0 certbot renew -q # implicitly use settings

Migrate certificates

Transfer settings/certs (serverfault.com), something like:

ssh proteus
sudo scp -r /etc/letsencrypt/*  <target>

Didn’t work this out

Deploy Let’s Encrypt certificates

@TODO figure out how to propagate the certificate safely and automatically across services. Push certificate to PVE (proxmox.com):

  1. Use SSH with unencrypted public key authentication only available to specific user
  2. Use shared disk mount / mount point, copy new certificates there, poll daily from receiving server
cp fullchain.pem /etc/pve/nodes/pve/pveproxy-ssl.pem
cp private-key.pem /etc/pve/nodes/pve/pveproxy-ssl.key
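A third option could be certbot’s deploy-hook mechanism: scripts in /etc/letsencrypt/renewal-hooks/deploy/ run after each successful renewal. A sketch, assuming certbot runs on the PVE host itself (otherwise combine with one of the transports above):

cat << 'EOF' | sudo tee /etc/letsencrypt/renewal-hooks/deploy/pve-cert.sh
#!/usr/bin/env bash
# Copy the renewed certificate into the PVE cluster filesystem and reload pveproxy
cp /etc/letsencrypt/live/vanwerkhoven.org/fullchain.pem /etc/pve/nodes/pve/pveproxy-ssl.pem
cp /etc/letsencrypt/live/vanwerkhoven.org/privkey.pem /etc/pve/nodes/pve/pveproxy-ssl.key
systemctl reload-or-restart pveproxy
EOF
sudo chmod 0750 /etc/letsencrypt/renewal-hooks/deploy/pve-cert.sh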

Jellyfin

sudo apt install jellyfin

GPU acceleration in lxc guest

Set up GPU acceleration (from this reddit post (reddit.com)):

On host, identify hardware device:

apt install vainfo
ls -l /dev/dri
drwxr-xr-x 2 root root         80 Jul 21 20:08 by-path
crw-rw---- 1 root video  226,   0 Jul 21 20:08 card0
crw-rw---- 1 root render 226, 128 Jul 21 20:08 renderD128

Check what group ids video and render have:

grep "video\|render" /etc/group
video:x:44:
render:x:103:

Prepare the GID mapping to the guest, allowing root to map the group ids of video and render via /etc/subgid:

cat << 'EOF' | sudo tee -a /etc/subgid
root:44:1
root:103:1
EOF

Now pass the hardware through to lxc and map the video and render groups: the idmap lines below map container gids 0-43 to host 100000-100043, gid 44 (video) straight to host gid 44, gids 45-104 to host 100045-100104, gid 105 to host 103 (render), and gids 106-1009 to host 100106-101009. Note I have to merge this with my group mapping of bulkdata user/groups.

cat << 'EOF' | sudo tee -a /etc/pve/lxc/201.conf
lxc.cgroup2.devices.allow: c 226:* rwm
lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir

lxc.idmap: g 0 100000 44
lxc.idmap: g 44 44 1
lxc.idmap: g 45 100045 60
lxc.idmap: g 105 103 1
lxc.idmap: g 106 100106 904
EOF

After this, reboot the LXC container:

# From host:
sudo pct reboot 201
# From guest:
sudo reboot

Now prepare the guest, using Debian native packages (alternatively, install from Intel apt repo (intel.com))

sudo usermod -aG render,video root
sudo usermod -aG render,video jellyfin

sudo apt install --no-install-recommends libva2 libigdgmm11 mesa-va-drivers intel-media-va-driver-non-free
sudo apt install vainfo

Now vainfo should work:

sudo vainfo
error: can't connect to X server!
libva info: VA-API version 1.10.0
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so
libva info: Found init function __vaDriverInit_1_10
libva info: va_openDriver() returns 0
vainfo: VA-API version: 1.10 (libva 2.10.0)
vainfo: Driver version: Intel iHD driver for Intel(R) Gen Graphics - 21.1.1 ()
vainfo: Supported profile and entrypoints
      VAProfileNone                   : VAEntrypointVideoProc
      VAProfileNone                   : VAEntrypointStats
      VAProfileMPEG2Simple            : VAEntrypointVLD
      VAProfileMPEG2Simple            : VAEntrypointEncSlice
      VAProfileMPEG2Main              : VAEntrypointVLD
      VAProfileMPEG2Main              : VAEntrypointEncSlice
      VAProfileH264Main               : VAEntrypointVLD
      VAProfileH264Main               : VAEntrypointEncSlice
      VAProfileH264Main               : VAEntrypointFEI
      VAProfileH264Main               : VAEntrypointEncSliceLP
      VAProfileH264High               : VAEntrypointVLD
      VAProfileH264High               : VAEntrypointEncSlice
      VAProfileH264High               : VAEntrypointFEI
      VAProfileH264High               : VAEntrypointEncSliceLP
      VAProfileVC1Simple              : VAEntrypointVLD
      VAProfileVC1Main                : VAEntrypointVLD
      VAProfileVC1Advanced            : VAEntrypointVLD
      VAProfileJPEGBaseline           : VAEntrypointVLD
      VAProfileJPEGBaseline           : VAEntrypointEncPicture
      VAProfileH264ConstrainedBaseline: VAEntrypointVLD
      VAProfileH264ConstrainedBaseline: VAEntrypointEncSlice
      VAProfileH264ConstrainedBaseline: VAEntrypointFEI
      VAProfileH264ConstrainedBaseline: VAEntrypointEncSliceLP
      VAProfileVP8Version0_3          : VAEntrypointVLD
      VAProfileVP8Version0_3          : VAEntrypointEncSlice
      VAProfileHEVCMain               : VAEntrypointVLD
      VAProfileHEVCMain               : VAEntrypointEncSlice
      VAProfileHEVCMain               : VAEntrypointFEI
      VAProfileHEVCMain10             : VAEntrypointVLD
      VAProfileHEVCMain10             : VAEntrypointEncSlice
      VAProfileVP9Profile0            : VAEntrypointVLD
      VAProfileVP9Profile2            : VAEntrypointVLD

Profit!

Plex

Install as Docker image or via apt source (plex.tv) (I chose apt install because it has fewer dependencies)

echo deb https://downloads.plex.tv/repo/deb public main | sudo tee /etc/apt/sources.list.d/plexmediaserver.list
curl https://downloads.plex.tv/plex-keys/PlexSign.key | sudo apt-key add -
sudo apt install plexmediaserver

Disable local network auth (plex.tv) in advanced settings (plex.tv)

<Preferences allowedNetworks="172.17.0.0/255.255.0.0" />

For first login, log in via localhost using ssh tunnel, e.g.

ssh -L 32400:localhost:32400 proteus
open http://localhost:32400/web

Open question: set up an HTTPS reverse proxy or not?
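
If I do go that route, a minimal nginx sketch could look like the following (hostname and TLS handling are illustrative, not my actual config):

server {
  listen 443 ssl;
  server_name plex.lan.vanwerkhoven.org;  # hypothetical hostname
  # ssl_certificate / ssl_certificate_key omitted here

  location / {
    proxy_pass http://127.0.0.1:32400;
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    # Plex uses websockets on some endpoints
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
  }
}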

Collectd

Install on the Proxmox host (to get pure CPU stats). The ping and snmp libraries must be added manually, otherwise collectd fails with ERROR: dlopen("/usr/lib/collectd/ping.so") failed: liboping.so.0: cannot open shared object file:

sudo apt install collectd-core liboping0 libsnmp40

Set up /etc/collectd/collectd.conf:

# datadir: "/var/lib/collectd/rrd/"
# libdir: "/usr/lib/collectd/"

BaseDir "/var/run/collectd"
Include "/etc/collectd/conf.d"
PIDFile "/var/run/collectd.pid"
PluginDir "/usr/lib/collectd"
TypesDB "/usr/share/collectd/types.db"

Hostname "pve"
Interval 60

LoadPlugin memory
<Plugin memory>
  ValuesPercentage false
  ValuesAbsolute true
</Plugin>

LoadPlugin cpu
<Plugin cpu>
  ValuesPercentage false
  ReportByCpu false
  ReportByState false
</Plugin>

LoadPlugin cpufreq

LoadPlugin ping
<Plugin ping>
  TTL 127
  Interval 60
  AddressFamily ipv4
  Host "dataix.ru"
  Host "linx.net"
  Host "ams-ix.net"
</Plugin>

LoadPlugin network
<Plugin network>
  Server "influxdb.lan.vanwerkhoven.org" "25826"
  Forward false
</Plugin>

LoadPlugin load

LoadPlugin snmp
<Plugin snmp>
  <Data "ifmib_if_octets32_table">
    Type "if_octets"
    Table true
    TypeInstanceOID "IF-MIB::ifDescr"
    # TypeInstancePrefix "if" # Optional
    Values "IF-MIB::ifInOctets" "IF-MIB::ifOutOctets"
    InvertMatch true
    Ignore "*eth1*"
  </Data>
  # Numerical approach, less readable once passed on to influxdb because interface name is not used
  # <Data "ifmib_if_octets32">
  #   Type "if_octets"
  #   Table false
  #   TypeInstance "iso.3.6.1.2.1.2.2.1.2.9"
  #   Values "iso.3.6.1.2.1.2.2.1.10.9" "iso.3.6.1.2.1.2.2.1.16.9"
  # </Data>
  <Host "vyos">
    Address "172.17.10.1"
    Version 2
    Community "public"
    Collect "ifmib_if_octets32" "ifmib_if_octets32_table"
    Interval 30
  </Host>
</Plugin>

LoadPlugin df
<Plugin "df">
  Device "tank"
  Device "rpool"
  Device "rpool/ROOT/pve-1"
</Plugin>

Enable SNMP on VyOS; the data flows from VyOS -> Proxmox -> InfluxDB server.

set service snmp community public authorization ro
set service snmp community public network 172.17.10.0/24
set service snmp listen-address 172.17.10.1

Test this, then find the OIDs you need (I wanted traffic through my WAN interface):

snmpwalk -v 2c -c public 172.17.10.1 | less
# Get MIB on your collectd server:
# MIB search path: /home/tim/.snmp/mibs:/usr/share/snmp/mibs:/usr/share/snmp/mibs/iana:/usr/share/snmp/mibs/ietf

mkdir -p /tmp/migrate_mibs/
scp -r vyos@vyos:/usr/share/snmp/mibs /tmp/migrate_mibs/
sudo chown root:root /tmp/migrate_mibs/mibs/*.txt
sudo mv /tmp/migrate_mibs/mibs/*.txt /usr/share/snmp/mibs/

# IF: iso.3.6.1.2.1.2.2.1.2.9 = STRING: "eth1.300"
# RX: iso.3.6.1.2.1.2.2.1.10.9 = Counter32: 1306354856
# TX: iso.3.6.1.2.1.2.2.1.16.9 = Counter32: 968162683
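
With the MIBs in place you can query ifDescr directly to find the right interface index, rather than paging through the full walk (output illustrative, consistent with the OIDs above):

snmpwalk -v 2c -c public 172.17.10.1 IF-MIB::ifDescr
# IF-MIB::ifDescr.9 = STRING: eth1.300  <- index 9, matching the .10.9/.16.9 OIDs above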

Monitor LVM usage

The LVM plugin for collectd is deprecated, and telegraf is too new for Debian. Instead we hack together our own plugin (sketched in shell below).

Data we need:

sudo lvs -S 'lv_attr =~ ^t'
  LV            VG  Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  thinpool_data pve twi-aotz--  <4.95t             25.52  19.78                           
  thinpool_vms  pve twi-aotz-- 512.00g             17.65  16.10                           

sudo lvs -S 'lv_attr =~ ^V'
  LV            VG  Attr       LSize   Pool          Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lv_backup_mba pve Vwi-aotz-- 256.00g thinpool_data        2.96                                   
  lv_backup_mbp pve Vwi-aotz--   1.00t thinpool_data        1.86                                   

Desired format (only LV name, Data%, and Meta%):

  thinpool_data 25.52  19.78                           
  thinpool_vms  17.65  16.10                           

Check lvmreport(7) for how to shape the right names; get the list of output columns with sudo lvs -O help:

sudo lvs -S 'lv_attr =~ ^t|V' -o lv_name,data_percent,metadata_percent --noheading --separator ","
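
With the volumes shown earlier, this yields output along these lines (the trailing field is empty for thin volumes, which have no metadata usage):

  thinpool_data,25.52,19.78
  thinpool_vms,17.65,16.10
  lv_backup_mba,2.96,
  lv_backup_mbp,1.86,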

Allow a non-root user to run this via a sudoers drop-in:

sudo adduser --no-create-home --disabled-login collectd-plugin
cat << 'EOF' | sudo tee /etc/sudoers.d/collectd-plugin
%collectd-plugin ALL= NOPASSWD: /usr/sbin/lvs
EOF
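
Sanity-check the drop-in syntax and the resulting privileges:

sudo visudo -c
sudo -l -U collectd-plugin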

Alternatively you could set capabilities on the lvs binary itself; however, that would allow all users to run it – see here (stackexchange.com) and here (stackoverflow.com).

Create a collectd Exec plugin (collectd.org) based on the lvs output, with the type/instance use inspired by the df plugin (collectd.org). See the plain text protocol (collectd.org) and the collectd-exec(5) man page (collectd.org) for details.

sudo install -m 755 /dev/null /usr/local/bin/collectd-lvm-usage.sh
cat << 'EOF' | sudo tee /usr/local/bin/collectd-lvm-usage.sh
#!/usr/bin/env bash
# /usr/local/bin/collectd-lvm-usage.sh
# Emit thin pool/volume usage in collectd's plain text protocol.
HOSTNAME="${COLLECTD_HOSTNAME:-localhost}"
INTERVAL="${COLLECTD_INTERVAL:-60}"

while true; do
  while IFS=',' read -r volname dataused metaused; do
    # lvs pads fields with whitespace; strip it before use
    volname="${volname//[[:space:]]/}"
    dataused="${dataused//[[:space:]]/}"
    metaused="${metaused//[[:space:]]/}"
    echo "PUTVAL \"$HOSTNAME/lvm-${volname:-undefined}/percent_bytes-used_data\" interval=$INTERVAL N:${dataused:-99}"
    # Thin volumes have no metadata usage, so only report it when present
    if [[ -n ${metaused} ]]; then
      echo "PUTVAL \"$HOSTNAME/lvm-${volname}/percent_bytes-used_meta\" interval=$INTERVAL N:${metaused}"
    fi
  done <<< "$(sudo lvs -S 'lv_attr =~ ^t|V' -o lv_name,data_percent,metadata_percent --noheading --separator ",")"
  sleep "$INTERVAL"
done
EOF
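
Quick manual test; the PUTVAL lines below are illustrative, based on the lvs numbers above (Ctrl-C to stop the loop):

sudo -u collectd-plugin /usr/local/bin/collectd-lvm-usage.sh | head -n 2
# PUTVAL "localhost/lvm-thinpool_data/percent_bytes-used_data" interval=60 N:25.52
# PUTVAL "localhost/lvm-thinpool_data/percent_bytes-used_meta" interval=60 N:19.78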

Add this to your collectd config:

LoadPlugin exec
<Plugin exec>
  Exec "collectd-plugin:collectd-plugin" "/usr/local/bin/collectd-lvm-usage.sh"
</Plugin>
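
Restart collectd and check the exec plugin comes up cleanly:

sudo systemctl restart collectd
sudo journalctl -u collectd -n 20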

Transmission

Install transmission:

sudo apt install transmission-daemon
sudo usermod -aG bulkdata debian-transmission

Update the config; stop the daemon first, since transmission overwrites settings.json with its in-memory config on exit. Disabling rpc-authentication-required is optional – you could also keep the default transmission:transmission credentials.

sudo systemctl stop transmission-daemon.service
# Then edit the relevant keys in /etc/transmission-daemon/settings.json:
    "blocklist-url": "http://list.iblocklist.com/?list=bt_level1&fileformat=p2p&archiveformat=gz",
    "download-dir": "/mnt/bulk/temp",
    "incomplete-dir": "/mnt/bulk/temp",
    "rpc-authentication-required": false,
    "rpc-whitelist": "127.0.0.1,172.17.20.*",

Add port 9091 to VyOS firewall

set firewall name FW_TRUST2INFRA rule 210 destination port +9091
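
As with the other VyOS changes, commit and persist:

commit
save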

Crashes and recovery

Unscheduled power off

On pve:

Aug 11 13:14:37 pve systemd-modules-load[438]: Inserted module 'kvmgt'
Aug 11 13:14:37 pve systemd-modules-load[438]: Failed to find module 'exngt'
Aug 11 13:14:37 pve systemd-modules-load[438]: Failed to find module 'vfio-mdev'
[...]
Aug 11 13:14:37 pve systemd-fsck[560]: There are differences between boot sector and its backup.
Aug 11 13:14:37 pve systemd-fsck[560]: This is mostly harmless. Differences: (offset:original/backup)
Aug 11 13:14:37 pve systemd-fsck[560]:   65:01/00
Aug 11 13:14:37 pve systemd-fsck[560]:   Not automatically fixing this.
Aug 11 13:14:37 pve systemd-fsck[560]: Dirty bit is set. Fs was not properly unmounted and some data may be corrupt.
Aug 11 13:14:37 pve systemd-fsck[560]:  Automatically removing dirty bit.
Aug 11 13:14:37 pve systemd-fsck[560]: *** Filesystem was changed ***
Aug 11 13:14:37 pve systemd-fsck[560]: Writing changes.
Aug 11 13:14:37 pve systemd-fsck[560]: /dev/nvme0n1p2: 5 files, 84/130812 clusters
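
The "Not automatically fixing this" line refers to the boot sector/backup mismatch on the EFI system partition. A hedged manual fix, assuming /dev/nvme0n1p2 is the ESP and it can be unmounted:

sudo umount /boot/efi              # if mounted there
sudo fsck.vfat -r /dev/nvme0n1p2   # -r: interactive repair, offers to sync boot sector and backup
sudo mount /boot/efi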

Changelog

20231011: extended PVE root partition to 10GB because disk was full (8GB was too optimistic)

sudo swapoff -a                                  # disable swap so the LV can be shrunk
free -h                                          # verify swap is off
sudo vim /etc/fstab                              # adjust/remove the swap entry
lsblk
sudo lvreduce --size -2G /dev/mapper/pve-swap    # shrink swap by 2G (re-run mkswap if you keep it)
sudo lvextend -l +100%FREE /dev/mapper/pve-root  # give the freed space to root
lsblk -o +PARTTYPE
sudo resize2fs /dev/mapper/pve-root              # grow the filesystem into the new LV size
df -h                                            # verify

20240809: extended backup_vm volume because backups were getting too big

sudo lvextend -L +0.25T /dev/mapper/pve-lv_backup_vms
sudo resize2fs /dev/mapper/pve-lv_backup_vms

#Networking #Nextcloud #Nginx #Security #Server #Smarthome #Debian #Vyos #Proxmox #Unix