16 data disks and counting
I have a lot of data: the 12 3.5" slots on the primary storage host are all filled, with the exception of a cold spare and a 20TB disk I connect quarterly to update the offline backup. Most of the data is fairly unique: personal pictures/videos, backups of old websites, images of prior computers (one day I’ll virtualize them for fun), tons of (actual) Linux ISOs (I keep a copy of almost all I use), and more. I’m a frequent visitor to the Internet Archive and like to keep personal copies of things.
Since most of the data is irreplaceable, I have had quality backups for a long time (3-2-1 to the extreme).
The hardware / data
All data lives on ZFS, with every pool able to withstand at least 1 disk failure:

- `tank` (42TB)
  - Primary archive storage, segmented into `tier0` and `tier1`
  - 6 disks, 3 vdevs of mirrored disks
  - Some “shucked”, some Seagate Exos
- `ssdpool01` (500GB)
  - NVMe disk pool for VMs
  - 2 disks, 1 vdev of mirrored disks
  - Samsung consumer-grade SSDs that are heavily abused
- `scratchpool01` (1TB)
  - Heavy write workloads
  - 2 disks, 1 vdev of mirrored disks
  - Ancient 15k 2.5" SAS disks from reddit
- `scratchpool02` (4TB)
  - Reliable ingest/cache for “uncommitted” data
  - 2 disks, 1 vdev of mirrored disks
  - Ancient 7200rpm 3.5" SAS disks from reddit
- `archivedr` (42TB)
  - Disaster-recovery copy of the primary archive storage
  - 6 disks, 3 vdevs of mirrored disks
  - Some “shucked”, some Seagate Exos
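For reference, a mirrored-vdev layout like `tank`'s can be sketched as follows. This is illustrative only: the device names and dataset names are placeholders, not my actual configuration.

```shell
# Illustrative sketch: a 3-vdev pool of 2-way mirrors (6 disks total),
# matching the tank layout above. Device paths are hypothetical.
zpool create tank \
  mirror /dev/disk/by-id/ata-DISK0 /dev/disk/by-id/ata-DISK1 \
  mirror /dev/disk/by-id/ata-DISK2 /dev/disk/by-id/ata-DISK3 \
  mirror /dev/disk/by-id/ata-DISK4 /dev/disk/by-id/ata-DISK5

# Separate datasets for the two archive tiers
zfs create tank/tier0
zfs create tank/tier1
```

Mirrored vdevs trade capacity for easy expansion and fast resilvers, which suits a pool that grows two disks at a time.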
Backups
Backups are performed differently based on workload:

- Archive data in `tank` is backed up via file-level tools from a central VM that mounts the `tank` ZFS filesystem over NFS
- Virtual machine data in `ssdpool01`, `scratchpool01`, and `scratchpool02` is backed up via Proxmox Backup Server
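The NFS arrangement for the backup VM is roughly the following sketch; hostnames, paths, and export options here are placeholders, not my actual config.

```shell
# Hypothetical export on the storage host (/etc/exports):
#   /tank  backupvm(ro,no_subtree_check)

# On the backup VM, mount the archive read-only for the backup tools:
mount -t nfs -o ro storagehost:/tank /mnt/tank
```

Mounting read-only keeps the backup VM from ever modifying the archive it is supposed to protect.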
Archive Data
This is the “exciting” backup method. I have 4 backups of the “Archive” data:
Tier 0 (critical data)

- `borg` backup to BorgBase in Europe
- `restic` backup to Backblaze B2 in California

Tier 0 and 1 (everything)

- `borg` backup to a ZFS pool on a secondary host over 10Gb Ethernet
- `restic` quarterly “offline” backup to a 20TB Seagate Exos which lives in a safe
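The two-tool approach boils down to pointing each tool at its own repository. A minimal sketch, assuming placeholder repository URLs, bucket names, and paths (none of these are my real values):

```shell
# borg to an offsite repository (e.g. a BorgBase-style SSH repo)
borg create --stats \
  ssh://user@repo.example.com/./repo::tier0-{now} \
  /mnt/tank/tier0

# restic to Backblaze B2; credentials are read from the environment
# (B2_ACCOUNT_ID / B2_ACCOUNT_KEY and RESTIC_PASSWORD)
restic -r b2:example-bucket:tier0 backup /mnt/tank/tier0
```

Because borg and restic have independent repository formats and deduplication logic, a bug or corruption in one is very unlikely to affect the other copy.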
Virtual Machines
This is the less exciting backup method with only 1 local online copy.
I elaborate more in the 2023 Summer Homelab services architecture post, but I keep all local services in self-contained VMs to make backup/restore dead simple. Most of the VMs are simply backed up in a consistent way (qemu freeze/thaw) via Proxmox Backup Server every night at 12:30am CT. It’s boring and reliable, but I’ve used Proxmox Backup Server to restore 25+ VMs during host swaps, so it’s been a mainstay ever since it was released in 2020 to replace the vzdump method.
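The nightly job is equivalent to something like the following `vzdump` invocation (the storage name is a placeholder; in practice the job is defined as a scheduled backup job in Proxmox VE rather than run by hand):

```shell
# Hypothetical one-shot equivalent of the nightly PBS job:
# back up all VMs using snapshot mode, which triggers the qemu
# guest agent freeze/thaw for filesystem-consistent backups.
vzdump --all 1 --mode snapshot --storage pbs-storage
```

Snapshot mode is what makes the backups consistent without downtime: the guest agent flushes and freezes filesystems just long enough to take the snapshot.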
Security
All data (local and remote) backups are encrypted prior to upload. Borg, restic, and Proxmox Backup Server treat encrypted backups as first-class citizens, which makes this trivial to implement (in restic’s case, you don’t get a choice).
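Encryption is set up once, when each repository is created. A sketch with placeholder repository locations:

```shell
# borg: the encryption mode is chosen explicitly at init time
borg init --encryption=repokey-blake2 ssh://user@repo.example.com/./repo

# restic: every repository is encrypted; init prompts for the password
restic -r b2:example-bucket:tier0 init
```

After that, every `borg create` or `restic backup` against the repository is encrypted with no further effort.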
Another “security” item is the choice of tooling: all Archive data is backed up using 2 tools, in case one tool corrupts backups via an update or develops issues with its repository format.
Future Improvements
All of my infrastructure is being migrated to provisioning as code, which will help with backing up services. The exception is databases, but the offline backup method will likely move from a bash script to Ansible that can dump databases and copy them over to something.
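The Ansible version might look roughly like this sketch; the module names are real, but the hosts, database names, and paths are hypothetical:

```yaml
# Hypothetical Ansible tasks: dump a database, then pull the dump back
# to the control node for the offline backup.
- name: Dump the application database
  community.postgresql.postgresql_db:
    name: appdb
    state: dump
    target: /tmp/appdb.dump

- name: Copy the dump to the backup staging area
  ansible.builtin.fetch:
    src: /tmp/appdb.dump
    dest: backups/
    flat: true
```

The win over the bash script is idempotent, per-host inventory-driven runs rather than one hand-maintained script per database.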
Summary
This backup architecture is a little wasteful (similar to the services hosting). I’ll need to trim things up a bit as I scale up but I don’t expect much scaling in the foreseeable future as my free time has dwindled (affecting time to search for interesting data sets). For now, I generally sleep pretty well knowing the data is highly available and backed up with multiple tools over many mediums (cloud and local).