Does the dandelion drop all its seeds at the base of its stalk? Does the cuckoo lay its eggs in one nest? So long as your backups are in one place, you are vulnerable to the fortunes of the world.
Time Machine works well for local backups on macOS. Recently, I did an inventory of my data assets and came to the conclusion that it would be best to have an off-site backup of my data, in addition to a local Time Machine backup. It’s a good idea to have some redundancy in case the Time Machine HDD gets lost, damaged or corrupted.
There are two main requirements that I had when choosing an off-site backup tool:
- Must have end-to-end encryption (E2EE)
- Must be open-source (mostly so that I can easily verify the E2EE claim)
The first requirement excludes services like Dropbox and Google Drive from the search. The second requirement excludes closed-source software that claims to have E2EE (e.g. Arq, Backblaze).
In the end, I decided to learn how to use BorgBackup. BorgBackup (or Borg) is an open source “deduplicating archiver with compression and encryption”. This project has many contributors and turned 10 years old in March 2020.
There are many good articles describing how Borg works and how to use it, so I won’t focus on that (but I will link to these articles in the process). Instead, this article is a compilation of notes that I took while setting up and automating Borg on macOS Catalina.
FYI this article was originally written for macOS Catalina 10.15.5 and borg 1.1.13. In Dec 2023 it received a major update for macOS Sonoma 14.2.1 and borg 1.2.7.
Hosting
I decided to go with Hetzner Storage Box since it comes with Borg support. In this case there is no need to set up a Borg server, only the client on the local machine. The only thing that needs to be done on the server is adding the local machine's SSH public key to ~/.ssh/authorized_keys on the server. Storage Boxes use RAID, which is great since Borg does not add redundancy to deal with hardware malfunction.
In general, any VPS hosting would work; you just need to make sure that both the client and the server have Borg installed.
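On a self-managed VPS, the "add the public key to authorized_keys" step can be scripted. Below is a minimal sketch, assuming a regular shell on the server with Borg installed; the helper name `add_borg_key` and all paths are hypothetical. Besides appending the key only once, it prefixes the key with a forced `borg serve --restrict-to-path` command and the OpenSSH `restrict` option, a setup described in the Borg documentation on hosting repositories, so the key can only be used for Borg access to one path:

```shell
#!/bin/sh
# Hypothetical helper (self-managed VPS only): append a client's public key to
# authorized_keys, forcing every login with that key to run `borg serve`
# limited to a single repository path. Safe to run more than once.
add_borg_key() {
    pubkey_file=$1                               # copy of the client's public key
    repo_path=$2                                 # only path this key may access
    auth_keys=${3:-$HOME/.ssh/authorized_keys}   # target file (default: ~/.ssh)
    ssh_dir=$(dirname "$auth_keys")

    key=$(cat "$pubkey_file") || return 1

    # with StrictModes, sshd ignores keys in files others can write to
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1
    touch "$auth_keys" && chmod 600 "$auth_keys" || return 1

    # do nothing if this exact key is already listed
    if grep -qF -- "$key" "$auth_keys"; then
        return 0
    fi
    printf 'command="borg serve --restrict-to-path %s",restrict %s\n' \
        "$repo_path" "$key" >> "$auth_keys"
}
```

For example, after copying the laptop's public key to the server as /tmp/mbp2015.pub, running `add_borg_key /tmp/mbp2015.pub /home/borg/repos/mbp2015` as the repository owner would grant access to that one repo only.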
Installation
There is a Homebrew formula for Borg, so installation on macOS is simple:
brew install borgbackup
For the server, there are distribution packages available for most common Linux and BSD distros.
Remote backup setup
Repo initialization
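Before a backup can be made, a repository has to be initialized. The main decision at this point is the choice of encryption mode (it cannot be changed later).
Borg 1.x offers several modes of authenticated encryption. All of the encrypting modes use AES-CTR-256 for encryption, combined with a MAC for authentication (HMAC-SHA256, or keyed BLAKE2b-256 for the -blake2 variants). The repokey modes store the encrypted key in the repository config, while the keyfile modes store it in a file on the client.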
See Borg docs on Encryption and borg init for more information on these options.
Example initialization (USER and HOST stand for the Storage Box login and hostname):
borg init --encryption=repokey-blake2 \
ssh://USER@HOST:23/./backup/mbp2015
Backup script
I used a combination of sources to write the backup script. The Hetzner article gives a detailed walkthrough of a typical archive-then-prune workflow, so I won’t go into great detail here. The Borg usage docs are also very useful, especially when writing command arguments.
Here is the final backup script that I’ve been using for my daily backups for the last several years. It contains detailed comments in case any of the sources goes down. The comments describing command arguments sit above each command rather than between its arguments, because in shell a comment line inside a backslash-continued command ends that command early:
#!/bin/sh
export BORG_FILES_CACHE_TTL=40 # https://borgbackup.readthedocs.io/en/stable/faq.html#it-always-chunks-all-my-files-even-unchanged-ones
# default repo location so that we can use '::archive' shorthand notation later
# (USER and HOST are placeholders for the Storage Box login and hostname)
export BORG_REPO='ssh://USER@HOST:23/./backup/mbp2015'
# get repo passphrase from 1Password
export BORG_PASSCOMMAND='/opt/homebrew/bin/op read "op://borg-backup/remote/passphrase"'
# some helpers and error handling:
info() { printf "\n%s %s\n\n" "$( date )" "$*" >&2; }
trap 'echo $( date ) Backup interrupted >&2; exit 2' INT TERM
info "Starting backup"
# create a daily backup:
#   --verbose                    work on log level INFO
#   --list --filter=AME          output list of items added (A) or modified (M),
#                                and items that caused an error when accessed (E)
#   --stats --show-rc            print stats for the created archive at the end;
#                                log the return code (rc)
#   --compression auto,lzma,6    use lzma compression (low speed, high compression);
#                                use a heuristic to decide per chunk whether to compress (auto)
#   '::{hostname}-daily-{now}'   archive name; the repo is inferred from $BORG_REPO
#   remaining arguments          directories to back up
/opt/homebrew/bin/borg create \
--verbose \
--list --filter=AME \
--stats --show-rc \
--compression auto,lzma,6 \
'::{hostname}-daily-{now}' \
/Users/mmxmb/my_important_docs \
/Users/mmxmb/my_photos \
/Users/mmxmb/Desktop
backup_exit=$?
info "Pruning repository"
# prune the repo:
#   --list --stats                          output verbose list of archives kept/pruned;
#                                           display prune stats at the end
#   --glob-archives '{hostname}-daily-*'    only consider archive names starting with this prefix
#   --show-rc                               log the return code (rc)
#   --keep-daily/--keep-weekly/--keep-monthly
#                                           number of archives to keep for each time interval;
#                                           example visualisation: https://borgbackup.readthedocs.io/en/stable/usage/prune.html
/opt/homebrew/bin/borg prune \
--list --stats \
--glob-archives '{hostname}-daily-*' \
--show-rc \
--keep-daily 7 \
--keep-weekly 5 \
--keep-monthly 6
prune_exit=$?
# use highest exit code as exit code
global_exit=$(( backup_exit > prune_exit ? backup_exit : prune_exit ))
if [ ${global_exit} -eq 1 ];
then
info "Backup and/or Prune finished with a warning"
fi
if [ ${global_exit} -gt 1 ];
then
info "Backup and/or Prune finished with an error"
fi
exit ${global_exit}
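The last lines of the script rely on Borg's return-code convention. Borg 1.x documents 0 as success, 1 as a warning (e.g. a file changed while it was being backed up), 2 as an error, and 128+N as "killed by signal N". A small sketch with hypothetical helper names that picks the higher of two codes and turns a code into a readable word for the log:

```shell
#!/bin/sh
# Hypothetical helpers around Borg 1.x return codes:
# 0 = success, 1 = warning, 2 = error, 128+N = killed by signal N.

# print the larger of two return codes (the same rule the script uses)
max_rc() {
    if [ "$1" -gt "$2" ]; then echo "$1"; else echo "$2"; fi
}

# print a short description of a single return code
describe_rc() {
    if [ "$1" -eq 0 ]; then
        echo "success"
    elif [ "$1" -eq 1 ]; then
        echo "warning"
    elif [ "$1" -ge 128 ]; then
        echo "killed by signal $(( $1 - 128 ))"
    else
        echo "error"
    fi
}
```

In the script this could become a single log line, e.g. `info "Backup and/or Prune finished: $(describe_rc "$(max_rc "$backup_exit" "$prune_exit")")"`.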
Note that I am using 1Password as my secrets manager. In particular, I store the backup server SSH key and the encryption passphrase in a dedicated vault. I use the 1Password SSH agent to access the SSH private key (configured in my SSH config) and the 1Password CLI to access the passphrase. Set BORG_RSH if you need to customize how Borg uses ssh (e.g. to specify the path to a private key file).
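For example, if the backup key lives in a file rather than in the 1Password SSH agent, an export like the following near the top of the script would point Borg's ssh at it. The key path is hypothetical, and since Borg splits BORG_RSH into separate arguments, this assumes the path contains no spaces:

```shell
#!/bin/sh
# Hypothetical key path; $HOME is expanded by the shell at export time.
# IdentitiesOnly=yes stops ssh from also offering keys held by an agent.
export BORG_RSH="ssh -i $HOME/.ssh/borg_backup_ed25519 -o IdentitiesOnly=yes"
```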
One minor issue with running Borg in a shell script is that 1Password tells me that bash is trying to access my vault, which is very broad, since it does not say which script is asking.
For audit purposes, I prefer to wrap the shell script in a small Go executable:
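package main
import (
"errors"
"log"
"os"
"os/exec"
"path/filepath"
)
// main runs backup.sh from the directory that contains this binary,
// forwarding the script's output and its exit code.
func main() {
ex, err := os.Executable()
if err != nil {
log.Fatal(err)
}
dir := filepath.Dir(ex)
script := filepath.Join(dir, "backup.sh")
cmd := exec.Command("/bin/sh", script)
cmd.Stdout = os.Stdout
cmd.Stderr = os.Stderr
if err := cmd.Run(); err != nil {
// keep the script's own exit code (1 = warning, 2 = error) instead of
// turning every non-zero exit into log.Fatal's exit code 1
var exitErr *exec.ExitError
if errors.As(err, &exitErr) && exitErr.ExitCode() > 0 {
os.Exit(exitErr.ExitCode())
}
log.Fatal(err)
}
}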
Save the compiled binary as remote_backup in the same directory as backup.sh, since the wrapper looks for the script next to itself. Now 1Password shows a more specific message.
At this point it is a good idea to stop and interact with Borg to check that everything works smoothly before moving on to the automation step. In my case, I initialized a test Borg repo, used the above script to create an archive of a directory full of text files (so that I didn’t have to wait long for compression and encryption), tested the extract and prune commands, and finally deleted the repo.
Remote backup automation
The Hetzner tutorial assumes that cron is used for scheduling the backup process; some other articles that you may find online use systemd. Since Apple deprecated cron back in 2005, we should use launchd instead, which is kind of like systemd but for macOS.
Since I want to automate backups for my laptop, i.e. a machine that is not always online and that is used only by me when logged in as mmxmb, it makes sense to define a user agent job. A user agent job is a job that runs on behalf of the currently logged-in user. The job definition for launchd is specified in a special XML file called a property list. Here is the remote Borg backup property list that I use:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>mmxmb.borgbackup-remote</string>
<key>Program</key>
<string>/path/to/remote/backup/script/go_wrapper_binary</string>
<key>StandardErrorPath</key>
<string>/path/to/remote/backup/log/backup.log</string>
<key>StandardOutPath</key>
<string>/path/to/remote/backup/log/backup.log</string>
<key>RunAtLoad</key>
<true/>
<key>StartCalendarInterval</key>
<dict>
<key>Hour</key>
<integer>12</integer>
<key>Minute</key>
<integer>0</integer>
</dict>
</dict>
</plist>
This file is then saved as ~/Library/LaunchAgents/mmxmb.borgbackup-remote.plist (any job.label.plist filename would work). In general, all user agent jobs should be stored in ~/Library/LaunchAgents.
Here’s an overview of the relevant job properties:
- Label is a unique identifier of a job for the launchd instance.
- Program is the path to the executable. In this case it’s the Go wrapper binary that runs the backup script.
- StandardOutPath and StandardErrorPath redirect the output generated by the job to a file. These are important for observability and debugging.
- RunAtLoad runs the job as soon as it is loaded. For an agent this means running it at login.
- StartCalendarInterval schedules the job to run at a specific time. In this case the job runs every day at 12:00 PM.
For more information on launchd and job properties see launchd docs.
The job is loaded using:
launchctl bootstrap gui/$(id -u $(whoami)) ~/Library/LaunchAgents/mmxmb.borgbackup-remote.plist
If the job is loaded successfully, you should see a macOS notification about it, as well as a corresponding message in Console.
To unload the job use:
launchctl bootout gui/$(id -u $(whoami)) ~/Library/LaunchAgents/mmxmb.borgbackup-remote.plist
For more information on launchctl see launchctl docs.
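Two other launchctl subcommands are handy while testing the job: `kickstart` starts it immediately, and `print` shows launchd's view of it, including its state and last exit code. A sketch with hypothetical helper names; `service_target` works anywhere, the other two need macOS:

```shell
#!/bin/sh
# Hypothetical helpers for inspecting launchd user agents (macOS).

# build a service target such as gui/501/mmxmb.borgbackup-remote
service_target() {
    printf 'gui/%s/%s\n' "$(id -u)" "$1"
}

# start the job now; without -k, an already running backup is left alone
run_job_now() {
    launchctl kickstart "$(service_target "$1")"
}

# print launchd's view of the job (state, last exit code, paths, schedule)
show_job() {
    launchctl print "$(service_target "$1")"
}
```

For example, `run_job_now mmxmb.borgbackup-remote` followed by `tail -f /path/to/remote/backup/log/backup.log` exercises the whole pipeline without waiting for noon. Leaving out `-k` is deliberate: that flag would kill a backup that is already in progress.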
Now the backup job should run every day at noon or right after login. It’s good to check the logs once in a while to make sure that the jobs run successfully.
As an aside, one advantage of using launchd over cron on a laptop is that cron jobs do not execute if the system is turned off or asleep. launchd jobs scheduled with StartCalendarInterval run when the computer wakes up if it was asleep when the job should have run. However, if the machine is off when the job should have run, the job does not execute until the next designated time occurs. See Apple Developer Documentation Archive: Scheduling Timed Jobs.
Local backup
At this point why not use Borg for local backups as well?
The way I use my laptop, an external backup HDD stays attached for a few days while I work at my desk, and is then detached for a while when I take the laptop somewhere. Because the HDD is not always attached (unlike an always-available backup server), the backup script and the launchd job definition need slight adjustments.
In particular, it is nice to have a local backup run once every 24 hours. This is easily achievable with the StartCalendarInterval launchd property, just like in the remote backup job definition. But if the backup drive hasn’t been attached for a few days, ideally the backup job runs as soon as the drive is re-attached, rather than at the next StartCalendarInterval trigger. On Linux, this can be achieved using a udev rule (see the Automated backups to a local hard drive tutorial). Since udev doesn’t exist on macOS, here’s one way to achieve similar functionality with another launchd property and a bit of shell scripting.
Local backup job definition:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>mmxmb.borgbackup-local</string>
<key>Program</key>
<string>/path/to/local/backup/script/local_backup</string>
<key>StandardErrorPath</key>
<string>/path/to/local/backup/log/backup.log</string>
<key>StandardOutPath</key>
<string>/path/to/local/backup/log/backup.log</string>
<key>StartCalendarInterval</key>
<dict>
<key>Hour</key>
<integer>11</integer>
<key>Minute</key>
<integer>0</integer>
</dict>
<key>WatchPaths</key>
<array>
<string>/Volumes/backup-disk</string>
</array>
</dict>
</plist>
The new property, which is not used in the remote backup job definition, is WatchPaths. Here’s how WatchPaths works when pointed at a directory:
If the path points to a directory, creating and removing this directory, as well as creating, removing and writing files in this directory will start the job. Actions performed in subdirectories of this directory will not be detected.
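If the backup volume is named backup-disk and has a backup directory that contains the Borg repos, then the backup job is triggered when the volume is mounted or unmounted. From the point of view of WatchPaths, that is equivalent to the /Volumes/backup-disk directory being created or deleted. Keeping the repos in a subdirectory also means that Borg’s own writes do not retrigger the job, since WatchPaths ignores changes in subdirectories.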
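The script below treats "the directory /Volumes/backup-disk exists" as "the disk is mounted". If you want a stricter test, for example in case an empty directory could be left behind under /Volumes, one option is to check that the directory is the mount point of its own filesystem. A minimal sketch; the helper name is hypothetical and it assumes the mount point path contains no spaces:

```shell
#!/bin/sh
# Hypothetical helper: succeed only if $1 is the mount point of a filesystem.
# `df -P` prints one POSIX-format line for the filesystem containing $1;
# its last field is where that filesystem is mounted.
# Assumes the path contains no spaces (awk splits fields on whitespace).
is_mount_point() {
    [ -d "$1" ] || return 1
    mounted_on=$(df -P "$1" 2>/dev/null | tail -n 1 | awk '{ print $NF }')
    [ "$mounted_on" = "$1" ]
}
```

In the local backup script, the mount check could then read `if ! is_mount_point "$MOUNTPOINT"; then ... fi`.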
Since the backup needs to start only when the volume is mounted, the backup script has to handle the case where the job is triggered by the volume being unmounted:
#!/bin/sh
# it seems that sometimes launchd job is triggered on volume mount
# but the disk is not immediately accessible, so sleeping for a bit helps
sleep 5
DISK_NAME=backup-disk
MOUNTPOINT=/Volumes/$DISK_NAME
# some helpers and error handling:
info() { printf "\n%s %s\n\n" "$( date )" "$*" >&2; }
trap 'echo $( date ) Backup interrupted >&2; exit 2' INT TERM
# exit if disk is not mounted; launchd job gets triggered both when disk is mounted/unmounted
if [ ! -d "$MOUNTPOINT" ]; then
info "The disk $MOUNTPOINT is not mounted. Exiting."
exit 0
fi
export BORG_FILES_CACHE_TTL=40 # https://borgbackup.readthedocs.io/en/stable/faq.html#it-always-chunks-all-my-files-even-unchanged-ones
# default repo location so that we can use '::archive' shorthand notation later
export BORG_REPO="$MOUNTPOINT/backup/mbp2015"
export BORG_PASSCOMMAND='/opt/homebrew/bin/op read "op://borg-backup/local/passphrase"'
# get unix time of the last complete backup
# source: https://projects.torsion.org/witten/borgmatic/issues/86
LAST_BACKUP_TIME=$(/opt/homebrew/bin/borg list --sort-by timestamp --format '{time:%s}{TAB}{name}{NEWLINE}' | grep -v '\.checkpoint$' | tail -1 | cut -f 1)
# find time difference between now and last complete backup
NOW_TIME=$(date +"%s")
SECONDS_SINCE_LAST=$((NOW_TIME - LAST_BACKUP_TIME))
SECONDS_IN_DAY=86400
# exit if last backup took place less than 24 hours ago
if [ $SECONDS_SINCE_LAST -lt $SECONDS_IN_DAY ];
then
info "Last backup happened less than 24 hours ago. Exiting."
exit 0
fi
info "Starting local backup"
# create a daily backup
/opt/homebrew/bin/borg create \
--verbose \
--list --filter=AME \
--stats --show-rc \
--compression auto,lzma,6 \
'::{hostname}-daily-{now}' \
/Users/mmxmb/my_important_docs \
/Users/mmxmb/my_photos \
/Users/mmxmb/Desktop
backup_exit=$?
info "Pruning local repository"
# prune the repo
/opt/homebrew/bin/borg prune \
--list --stats \
--glob-archives '{hostname}-daily-*' \
--show-rc \
--keep-daily 7 \
--keep-weekly 5 \
--keep-monthly 6
prune_exit=$?
# use highest exit code as exit code
global_exit=$(( backup_exit > prune_exit ? backup_exit : prune_exit ))
if [ ${global_exit} -eq 1 ];
then
info "Backup and/or Prune finished with a warning"
fi
if [ ${global_exit} -gt 1 ];
then
info "Backup and/or Prune finished with an error"
fi
exit ${global_exit}
This script also contains some logic that prevents the backup job from creating a new archive too often, e.g. when the backup drive is re-attached many times throughout the day.
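The same 24-hour rule can be pulled out into two small functions, which also makes the first-run case explicit: with an empty repo there is no previous archive, so a backup is always due. A sketch with hypothetical helper names; it reads lines in the `{time:%s}{TAB}{name}` format used above from standard input, so it can be tried without a repository:

```shell
#!/bin/sh
# Hypothetical helpers for the "at most one local backup per 24 hours" rule.

# read "<unix time><TAB><archive name>" lines (oldest first) on stdin and
# print the time of the newest archive that is not a checkpoint
last_complete_backup_time() {
    grep -v '\.checkpoint$' | tail -n 1 | cut -f 1
}

# succeed if a backup is due: there is no previous archive, or the last one
# is at least min_age seconds (default: one day) older than now
backup_is_due() {
    last=$1
    now=$2
    min_age=${3:-86400}
    [ -z "$last" ] && return 0
    [ $((now - last)) -ge "$min_age" ]
}
```

Wired into the script, this might look like `LAST=$(/opt/homebrew/bin/borg list --sort-by timestamp --format '{time:%s}{TAB}{name}{NEWLINE}' | last_complete_backup_time)` followed by `if ! backup_is_due "$LAST" "$(date +%s)"; then info "Last backup happened less than 24 hours ago. Exiting."; exit 0; fi`.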