Does the dandelion drop all its seeds at the base of its stalk? Does the cuckoo lay its eggs in one nest? So long as your backups are in one place, you are vulnerable to the fortunes of the world.
Time Machine works well for local backups on macOS. Recently, I did an inventory of my data assets and came to the conclusion that it would be best to have an off-site backup of my data, in addition to a local Time Machine backup. It’s a good idea to have some redundancy in case the Time Machine HDD gets lost, damaged or corrupted.
I had two main requirements when choosing an off-site backup tool:
- Must have end-to-end encryption (E2EE)
- Must be open-source (mostly so that I can easily verify the E2EE claim)
The first requirement excludes services like Dropbox and Google Drive from the search. The second requirement excludes closed-source software that claims to have E2EE (e.g. Arq, Backblaze).
In the end, I decided to learn how to use BorgBackup. BorgBackup (or Borg) is an open source “deduplicating archiver with compression and encryption”. This project has many contributors and turned 10 years old in March 2020. Borg has also been called the “holy grail of backups”; this definitely intrigued me.
There are many good articles describing how Borg works and how to use it, so I won’t focus on that (but I will link to these articles in the process). Instead, this article is a compilation of notes that I took while setting up and automating Borg on macOS Catalina.
FYI this article was written using macOS Catalina 10.15.5 and borg 1.1.13.
Hosting
I decided to go with Hetzner Storage Box since it comes with Borg support. In this case there is no need to worry about setting up a Borg server, only the client on the local machine. The only thing that needs to be done on the server is adding the local machine's SSH public key to ~/.ssh/authorized_keys. Storage Boxes use RAID, which is great since Borg does not add redundancy to deal with hardware malfunction.
In general, any VPS hosting would work; you just need to make sure that both the client and the server have Borg installed.
Installation
There is a Homebrew cask for Borg, so installation on macOS is simple:
brew cask install borgbackup
For the server, there are distribution packages available for most common Linux and BSD distros.
Remote backup setup
Repo initialization
Before a backup can be made, a repository has to be initialized. The main decision at this point is the choice of encryption key mode, which cannot be changed later.
Borg offers 4 options for authenticated encryption with associated data (AEAD):
- repokey: the key is stored inside the repo but still needs a passphrase; uses HMAC-SHA-256 for authentication
- keyfile: the key is stored on the client and still needs a passphrase; uses HMAC-SHA-256 for authentication
- repokey-blake2: like repokey, but uses BLAKE2b-256 for authentication
- keyfile-blake2: like keyfile, but uses BLAKE2b-256 for authentication
All 4 options use AES-256-CTR for encryption.
See Borg docs on Encryption and borg init for more information on these options.
Example initialization:
borg init --encryption=repokey-blake2 \
ssh://username@username.your-storagebox.de:23/./backup/mbp2015
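The repository URL packs in a few details: the Storage Box username (which also appears in the host name), port 23, and a path starting with ./, which Borg treats as relative to the current directory on the server (for an SSH login, normally the user's home directory). As a small illustration, here is a hypothetical helper (storagebox_repo is not part of Borg or of any Hetzner tooling) that assembles a URL of the same shape, reusing the your-storagebox.de domain from the example above.

```shell
#!/bin/sh
# storagebox_repo: hypothetical helper (not part of Borg or Hetzner tooling)
# that prints a Borg SSH repository URL shaped like the one used with borg init.
#   $1: Storage Box username (also the first label of the host name)
#   $2: repository path, relative to the SSH user's current directory
storagebox_repo() {
    # port 23 and the "/./" relative-path prefix are copied from the example URL
    printf 'ssh://%s@%s.your-storagebox.de:23/./%s\n' "$1" "$1" "$2"
}

# usage:
# borg init --encryption=repokey-blake2 "$(storagebox_repo username backup/mbp2015)"
```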
Backup script
I used a combination of sources to write the backup script:
- The Practical Administrator: Backups using Borg
- Hetzner Tutorials: Install and Configure BorgBackup
- Borg Documentation
Since both articles give a detailed walkthrough of a typical archive-then-prune workflow, I won’t go into great detail here. Borg usage docs are also very useful, especially when writing command arguments.
Here is my final backup script, with detailed comments in case any of the sources goes down. The comments sit above each command rather than between its arguments, because in sh a comment line inside a backslash-continued command ends that command early:
#!/bin/sh
# default repo location so that we can use '::archive' shorthand notation later
export BORG_REPO='ssh://username@username.your-storagebox.de:23/./backup/mbp2015'
# explicitly specify the SSH key
export BORG_RSH='ssh -i /Users/mmxmb/.ssh/storagebox_key'
# only the passphrase is needed for a repokey borg repo
export BORG_PASSPHRASE='VERY_LONG_PASSPHRASE'
# some helpers and error handling:
info() { printf "\n%s %s\n\n" "$( date )" "$*" >&2; }
trap 'echo $( date ) Backup interrupted >&2; exit 2' INT TERM
info "Starting backup"
# create a daily backup:
#   --verbose: work on log level INFO
#   --list --filter=AME: output list of items added (A) and modified (M),
#     and also output if an error happened when accessing a file (E)
#   --stats --show-rc: print stats for the created archive at the end; log the return code (rc)
#   --compression auto,lzma,6: use lzma compression (low speed, high compression), with a
#     heuristic that decides per chunk whether to compress or not (auto)
#   '::{hostname}-daily-{now}': archive name; the repo is inferred from $BORG_REPO
#   the remaining arguments are the directories to back up
/usr/local/bin/borg create \
--verbose \
--list --filter=AME \
--stats --show-rc \
--compression auto,lzma,6 \
'::{hostname}-daily-{now}' \
/Users/mmxmb/my_important_docs \
/Users/mmxmb/my_photos \
/Users/mmxmb/Desktop
backup_exit=$?
info "Pruning repository"
# prune the repo:
#   --list --stats: output verbose list of archives kept/pruned; display prune stats at the end
#   --prefix: only consider archive names starting with this prefix
#   --keep-daily/--keep-weekly/--keep-monthly: number of archives to keep for each time interval
#     example visualisation: https://borgbackup.readthedocs.io/en/stable/usage/prune.html
/usr/local/bin/borg prune \
--list --stats \
--prefix '{hostname}-daily-' \
--show-rc \
--keep-daily 7 \
--keep-weekly 5 \
--keep-monthly 6
prune_exit=$?
# use highest exit code as exit code
global_exit=$(( backup_exit > prune_exit ? backup_exit : prune_exit ))
if [ ${global_exit} -eq 1 ];
then
info "Backup and/or Prune finished with a warning"
fi
if [ ${global_exit} -gt 1 ];
then
info "Backup and/or Prune finished with an error"
fi
exit ${global_exit}
This script is adapted mostly from The Practical Administrator article.
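The exit-code handling at the end of the script relies on Borg's return-code convention: Borg 1.1 documents 0 as success, 1 as a warning (the operation reached its normal end, but something, such as an unreadable file, needs attention), 2 as an error and 128+N as killed by signal N, so the larger of the two codes is the worse outcome. The sketch below pulls that logic out into two hypothetical functions (max_rc and rc_summary are illustrative names, not part of Borg) so it can be tried without a repository.

```shell
#!/bin/sh
# max_rc: print the larger of two exit codes, like the
# "use highest exit code as exit code" step in the backup script
max_rc() {
    echo $(( $1 > $2 ? $1 : $2 ))
}

# rc_summary: map a Borg 1.1 return code to the wording used in the log
# (0 = success, 1 = warning, anything higher = error)
rc_summary() {
    if [ "$1" -eq 0 ]; then
        echo "success"
    elif [ "$1" -eq 1 ]; then
        echo "warning"
    else
        echo "error"
    fi
}

# example: create finished with a warning (e.g. an unreadable file), prune succeeded
backup_exit=1
prune_exit=0
global_exit=$(max_rc "$backup_exit" "$prune_exit")
echo "$global_exit $(rc_summary "$global_exit")"   # prints "1 warning"
```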
At this point it is a good idea to stop and play around with Borg to check that everything works smoothly before moving on to the automation step. In my case, I initialized a test Borg repo, used the above script to create an archive of a directory full of text files (so that I don’t have to wait much for compression and encryption), tested extract and prune commands, and finally deleted the repo.
Remote backup automation
Both aforementioned tutorials assume that either cron or systemd is used for scheduling the backup process. Since Apple deprecated cron back in 2005, we need to use launchd instead, which is kind of like systemd but for macOS.
A launchd daemon job is defined in a special XML file called a property list. Here is a sample Borg backup property list:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>mmxmb.borgbackup</string>
<key>Program</key>
<string>/path/to/remote/backup/script/backup.sh</string>
<key>StandardErrorPath</key>
<string>/path/to/remote/backup/log/backup.log</string>
<key>StandardOutPath</key>
<string>/path/to/remote/backup/log/backup.log</string>
<key>UserName</key>
<string>mmxmb</string>
<key>RunAtLoad</key>
<true/>
<key>StartCalendarInterval</key>
<dict>
<key>Hour</key>
<integer>12</integer>
<key>Minute</key>
<integer>0</integer>
</dict>
</dict>
</plist>
This file is then saved as /Library/LaunchDaemons/mmxmb.borgbackup.plist (any job.label.plist filename would work).
Here’s an overview of the relevant job properties:
- Label is a unique identifier of the job for the launchd instance.
- Program is the path to the executable. In this case it’s the backup script.
- StandardOutPath and StandardErrorPath are for redirecting output generated by the script. These are very important for observability.
- UserName specifies which user the job runs as. It’s important to make sure that this user has access to all files that Borg archives, as well as to the job log files and the Borg server SSH key.
- RunAtLoad is self-descriptive and purely optional. Daemons are loaded at boot time, so the script will run after each boot. This also makes debugging the job slightly easier.
- StartCalendarInterval schedules the job to run at a specific time. In this case the job runs every day at 12:00 PM.
For more information on job properties, see the launchd website.
The job is loaded using:
sudo launchctl load /Library/LaunchDaemons/mmxmb.borgbackup.plist
If RunAtLoad is false, the job can be started after loading using its label:
sudo launchctl start mmxmb.borgbackup
To unload the job use:
sudo launchctl unload /Library/LaunchDaemons/mmxmb.borgbackup.plist
It’s a good idea to keep an eye on /var/log/system.log for possible errors when loading and starting a job.
As an aside, one significant advantage of launchd over cron on a laptop is that cron jobs do not execute if the system is turned off or asleep. If the computer was asleep when a launchd job scheduled with StartCalendarInterval should have run, the job runs when the computer wakes up. However, if the machine is off when the job should have run, the job does not execute until the next designated time occurs. See Apple Developer Documentation Archive: Scheduling Timed Jobs.
Automation pitfalls
I used cron
and systemd
extensively but never used launchd
. Therefore I had quite a few problems when setting up the launchd
job.
Log files permissions
After loading and attempting to run the job, a cryptic error message appears in /var/log/system.log:
Jun 1 01:23:45 mmxmbs-MacBook-Pro com.apple.xpc.launchd[1] (mmxmb.borgbackup[18763]): Service could not initialize: 19F101: xpcproxy + 14521 [XXX][XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX]: 0xd
Jun 1 01:23:45 mmxmbs-MacBook-Pro com.apple.xpc.launchd[1] (mmxmb.borgbackup[18763]): Service exited with abnormal code: 78
This is caused by the user specified under the UserName key not having write permissions for the log files specified in StandardErrorPath and StandardOutPath. Giving that user write access to the log files resolves the error.
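Since exit code 78 says little about the cause, it can save time to check the log paths before loading the job. A minimal sketch follows, assuming it is run while logged in as the user named in UserName; check_logs is a hypothetical helper that reports every given path that is missing or not writable by the current user. It treats a missing file as a failure, so the log files should be created first, for example with touch while logged in as that user.

```shell
#!/bin/sh
# check_logs: hypothetical pre-flight check, meant to be run as the user named
# in the job's UserName key. For every path given (e.g. the StandardOutPath
# and StandardErrorPath values), print the ones that are missing or not
# writable by the current user; return non-zero if any check failed.
check_logs() {
    status=0
    for log in "$@"; do
        if [ ! -f "$log" ] || [ ! -w "$log" ]; then
            echo "not writable: $log"
            status=1
        fi
    done
    return $status
}

# usage, while logged in as the job user (placeholder path from the property list):
# check_logs /path/to/remote/backup/log/backup.log || echo "fix log file permissions first"
```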
SSH key permissions
The job starts but soon exits with an error (logged to the backup log file) when creating an archive:
Remote: Host key verification failed.
Connection closed by remote host. Is borg working on the server?
terminating with error status, rc 2
When debugging this, I make sure I am logged in as the user specified in the job definition. I then try to SSH to the server with maximum verbosity and the SSH key specified explicitly:
ssh -vvv -i /Users/mmxmb/.ssh/storagebox_key username@username.your-storagebox.de
This is what I see:
debug1: Next authentication method: publickey
debug1: Offering public key: /Users/mmxmb/.ssh/storagebox_key RSA SHA256:... explicit agent
debug3: send packet: type 50
debug2: we sent a publickey packet, wait for reply
debug3: receive packet: type 60
debug1: Server accepts key: /Users/mmxmb/.ssh/storagebox_key RSA SHA256:... explicit agent
debug3: sign_and_send_pubkey: RSA SHA256:...
debug3: sign_and_send_pubkey: signing using ssh-rsa
debug3: send packet: type 50
debug3: receive packet: type 51
debug1: Authentications that can continue: publickey,password
debug2: we did not send a packet, disable method
debug3: authmethod_lookup password
debug3: remaining preferred: ,password
And this is what I expect:
debug1: Next authentication method: publickey
debug1: Offering public key: /Users/mmxmb/.ssh/storagebox_key RSA SHA256:... explicit agent
debug3: send packet: type 50
debug2: we sent a publickey packet, wait for reply
debug3: receive packet: type 60
debug1: Server accepts key: /Users/mmxmb/.ssh/storagebox_key RSA SHA256:... explicit agent
debug3: sign_and_send_pubkey: RSA SHA256:...
debug3: sign_and_send_pubkey: signing using rsa-sha2-512
debug3: send packet: type 50
debug3: receive packet: type 52
debug1: Authentication succeeded (publickey)
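Comparing the two outputs: in both cases the server accepts the offered key (packet type 60), but in the failing case the signed authentication request that follows is rejected (packet type 51, authentication failure) instead of accepted (type 52, success). The line debug2: we did not send a packet, disable method then shows the client giving up on public key authentication.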
In my case, the problem is caused by incorrect permissions for the SSH keys used to authenticate with the Borg server. To resolve the problem, I create a new key while logged in as the user specified in the job definition and upload the public key to the server.
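Because the job runs as the user named in UserName, that user has to be able to read the private key, and OpenSSH also refuses private key files that grant any permissions to group or others. A minimal sketch of a check for both conditions follows; key_ok is a hypothetical helper meant to be run while logged in as the job user, and it reads the permission bits from ls -l output so that it behaves the same with the BSD and GNU versions of ls.

```shell
#!/bin/sh
# key_ok: hypothetical helper, to be run as the user named in the job's
# UserName key. Succeeds only if that user can read the private key and the
# key grants no permissions to group or others (OpenSSH refuses such keys).
key_ok() {
    key=$1
    [ -r "$key" ] || return 1
    # characters 5-10 of the `ls -l` mode string are the group and other bits
    [ "$(ls -ld "$key" | cut -c5-10)" = "------" ]
}

# usage, while logged in as the job user:
# if ! key_ok /Users/mmxmb/.ssh/storagebox_key; then
#     echo "key unreadable or too open; try: chmod 600 /Users/mmxmb/.ssh/storagebox_key"
# fi
```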
SIP woes
The backup script now seems to run well and archives almost everything that is needed. However, the last message in the backup log is:
Tue Jun 1 12:34:56 EDT 2020 Backup and/or Prune finished with a warning
This turns out to be caused by the following error during borg create:
/Users/mmxmb/Desktop: scandir: [Errno 1] Operation not permitted: '/Users/mmxmb/Desktop'
E /Users/mmxmb/Desktop
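This error is due to macOS privacy protections, which are often lumped together with System Integrity Protection (SIP), although strictly speaking folders like ~/Desktop are guarded by a separate mechanism called TCC (Transparency, Consent, and Control). When a third-party binary is run in the foreground and needs to scan a restricted directory like ~/Desktop, a dialog pops up asking whether it’s OK to give the binary access. But when a binary runs in the background (e.g. as a daemon), its access is denied by default.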
This can be fixed by giving the binary Full Disk Access (FDA) in the System Preferences -> Security & Privacy -> Privacy settings. The problem in our case is that the backup script is not a binary, so it is not possible to give it FDA. Giving FDA to /usr/local/bin/borg doesn’t work either since, from the point of view of macOS, the restricted directory is accessed by the program specified in the launchd job, not by borg.
One somewhat hacky workaround that I found in a relevant AskDifferent answer is to create a wrapper binary that calls the backup script. Here’s the code for such a binary written in Go:
// backup provides a binary to run the borgbackup script on macOS Catalina with Full Disk Access
package main
import (
"log"
"os"
"os/exec"
"path/filepath"
)
func main() {
ex, err := os.Executable()
if err != nil {
log.Fatal(err)
}
dir := filepath.Dir(ex)
script := filepath.Join(dir, "backup.sh")
cmd := exec.Command("/bin/sh", script)
cmd.Stdout = os.Stdout
cmd.Stderr = os.Stderr
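if err := cmd.Run(); err != nil {
// pass the script's own exit code (e.g. 1 for a Borg warning, 2 for an error)
// through to launchd instead of always exiting with status 1
if exitErr, ok := err.(*exec.ExitError); ok && exitErr.ExitCode() >= 0 {
os.Exit(exitErr.ExitCode())
}
log.Fatal(err)
}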
}
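Compile this file, put the resulting binary in the same directory as backup.sh (the wrapper looks for the script next to itself), and replace the Program property of the backup job property list with the path to this binary. Finally, give FDA to this binary.
Now the backup job should run as expected and terminate with a success status.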
Local backup
At this point why not use Borg for local backups as well?
Given the way I use my laptop, an external backup HDD may be attached for a few days while I work at my desk, and then detached for some time when I need to take the laptop somewhere. The fact that the HDD is not always attached to the laptop (as opposed to an always-available backup server) requires the backup script and the launchd job definition to be adjusted slightly.
In particular, having a local backup run once every 24 hours is nice. This is easy to achieve with the StartCalendarInterval launchd property, just like in the remote backup job definition. But if the backup drive hasn’t been attached for a few days, it would be ideal for the backup job to run as soon as the drive is re-attached, not on the next StartCalendarInterval trigger. On Linux, this can be achieved using a udev rule (see the Automated backups to a local hard drive tutorial). Since udev doesn’t exist on macOS, here’s one way to get similar functionality with another launchd property and some shell scripting.
Local backup launch daemon job definition:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>mmxmb.borgbackup-local</string>
<key>Program</key>
<string>/path/to/local/backup/script/backup.sh</string>
<key>StandardErrorPath</key>
<string>/path/to/local/backup/log/backup.log</string>
<key>StandardOutPath</key>
<string>/path/to/local/backup/log/backup.log</string>
<key>UserName</key>
<string>mmxmb</string>
<key>StartCalendarInterval</key>
<dict>
<key>Hour</key>
<integer>11</integer>
<key>Minute</key>
<integer>0</integer>
</dict>
<key>WatchPaths</key>
<array>
<string>/Volumes/backup-disk</string>
</array>
</dict>
</plist>
The new property, which is not used in the remote backup job definition, is WatchPaths. Here’s how WatchPaths works when pointed at a directory:
If the path points to a directory, creating and removing this directory, as well as creating, removing and writing files in this directory will start the job. Actions performed in subdirectories of this directory will not be detected.
If the backup volume is named backup-disk and has a backup directory which contains the Borg repos, then the backup job is triggered when the volume is mounted or unmounted. From the point of view of WatchPaths, that is equivalent to the /Volumes/backup-disk directory being created or deleted.
Since the backup should start only when the volume is mounted, the backup script has to handle the case where the job is triggered by the volume being unmounted:
#!/bin/sh
# it seems that sometimes launchd job is triggered on volume mount
# but the disk is not immediately accessible, so sleeping for a bit helps
sleep 5
DISK_NAME=backup-disk
MOUNTPOINT=/Volumes/$DISK_NAME
# some helpers and error handling:
info() { printf "\n%s %s\n\n" "$( date )" "$*" >&2; }
trap 'echo $( date ) Backup interrupted >&2; exit 2' INT TERM
# exit if disk is not mounted; launchd job gets triggered both when disk is mounted/unmounted
if [ ! -d "$MOUNTPOINT" ]; then
info "The disk $MOUNTPOINT is not mounted. Exiting."
exit 0
fi
export BORG_REPO="$MOUNTPOINT/backup/mbp2015"
export BORG_PASSPHRASE='VERY_LONG_PASSPHRASE'
# get unix time of the last complete backup
# source: https://projects.torsion.org/witten/borgmatic/issues/86
LAST_BACKUP_TIME=`/usr/local/bin/borg list --sort timestamp --format '{time:%s}{TAB}{name}{NEWLINE}' | grep -v '\.checkpoint$' | tail -1 | cut -f 1`
# find time difference between now and last complete backup
NOW_TIME=`date +"%s"`
SECONDS_SINCE_LAST=$((NOW_TIME - LAST_BACKUP_TIME))
SECONDS_IN_DAY=86400
# exit if last backup took place less than 24 hours ago
if [ $SECONDS_SINCE_LAST -lt $SECONDS_IN_DAY ];
then
info "Last backup happened less than 24 hours ago. Exiting."
exit 0
fi
info "Starting local backup"
# create a daily backup
/usr/local/bin/borg create \
--verbose \
--list --filter=AME \
--stats --show-rc \
--compression auto,lzma,6 \
'::{hostname}-daily-{now}' \
/Users/mmxmb/my_important_docs \
/Users/mmxmb/my_photos \
/Users/mmxmb/Desktop
backup_exit=$?
info "Pruning local repository"
# prune the repo
/usr/local/bin/borg prune \
--list --stats \
--prefix '{hostname}-daily-' \
--show-rc \
--keep-daily 7 \
--keep-weekly 5 \
--keep-monthly 6
prune_exit=$?
# use highest exit code as exit code
global_exit=$(( backup_exit > prune_exit ? backup_exit : prune_exit ))
if [ ${global_exit} -eq 1 ];
then
info "Backup and/or Prune finished with a warning"
fi
if [ ${global_exit} -gt 1 ];
then
info "Backup and/or Prune finished with an error"
fi
exit ${global_exit}
This script also contains some logic that prevents the backup job from creating a new archive more often than once every 24 hours, e.g. when the backup drive is re-attached many times throughout the day.
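The 24-hour check relies on a small pipeline over borg list output. Here it is pulled out into two hypothetical functions (last_complete_time and backup_due are illustrative names, not part of Borg) so that the logic can be tried on sample input without a repository; the input format is assumed to match the --format '{time:%s}{TAB}{name}{NEWLINE}' string used in the script, sorted oldest to newest.

```shell
#!/bin/sh
# last_complete_time: read "unix time<TAB>archive name" lines (oldest first)
# on stdin, skip archives whose names end in ".checkpoint", and print the
# unix time of the newest remaining archive (empty output if there is none)
last_complete_time() {
    grep -v '\.checkpoint$' | tail -n 1 | cut -f 1
}

# backup_due: succeed if at least 24 hours (86400 s) have passed between
# $1 (last complete backup) and $2 (now); an empty $1 means no archives yet
backup_due() {
    [ -z "$1" ] && return 0
    [ $(( $2 - $1 )) -ge 86400 ]
}

# usage inside the local backup script (borg options as in the script above):
# LAST_BACKUP_TIME=$(/usr/local/bin/borg list --sort timestamp \
#     --format '{time:%s}{TAB}{name}{NEWLINE}' | last_complete_time)
# if ! backup_due "$LAST_BACKUP_TIME" "$(date +%s)"; then exit 0; fi
```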
SIP woes, again
If the launchd job calls the script directly, there is a problem even before archive creation starts:
Sat Jun 06 12:34:56 EDT 2020 Starting local backup
Local Exception
Traceback (most recent call last):
File "borg/locking.py", line 130, in acquire
FileExistsError: [Errno 17] File exists: '/Volumes/backup-disk/backup/mbp2015/lock.exclusive'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "borg/archiver.py", line 4565, in main
File "borg/archiver.py", line 4497, in run
File "borg/archiver.py", line 161, in wrapper
File "borg/repository.py", line 190, in __enter__
File "borg/repository.py", line 421, in open
File "borg/locking.py", line 350, in acquire
File "borg/locking.py", line 363, in _wait_for_readers_finishing
File "borg/locking.py", line 134, in acquire
File "borg/locking.py", line 159, in kill_stale_lock
PermissionError: [Errno 1] Operation not permitted: '/Volumes/backup-disk/backup/mbp2015/lock.exclusive'
Platform: Darwin mmxmbs-MacBook-Pro.local 19.5.0 Darwin Kernel Version 19.5.0: Tue May 26 12:34:56 PDT 2020; root:xnu-6153.121.2~2/RELEASE_X86_64 x86_64
Borg: 1.1.13 Python: CPython 3.5.9 msgpack: 0.5.6
PID: 50157 CWD: /
sys.argv: ['/usr/local/bin/borg', 'create', '--verbose', '--list', '--filter=AME', '--stats', '--show-rc', '--compression', 'auto,lzma,6', '::{hostname}-daily-{now}', '/Users/mmxmb/my_important_docs', '/Users/mmxmb/photos', '/Users/mmxmb/Desktop']
SSH_ORIGINAL_COMMAND: None
terminating with error status, rc 2
As with ~/Desktop, SIP restricts access to removable volumes, so the same trick of creating a wrapper binary and giving it FDA has to be applied here.