BorgBackup on macOS

June 24, 2020 - December 29, 2023

Does the dandelion drop all its seeds at the base of its stalk? Does the cuckoo lay its eggs in one nest? So long as your backups are in one place, you are vulnerable to the fortunes of the world.

- The Tao of Backup


Time Machine works well for local backups on macOS. Recently, I did an inventory of my data assets and came to the conclusion that it would be best to have an off-site backup of my data, in addition to a local Time Machine backup. It’s a good idea to have some redundancy in case the Time Machine HDD gets lost, damaged or corrupted.

There are two main requirements that I had when choosing an off-site backup tool:

1. Data must be end-to-end encrypted (E2EE).
2. The backup software must be open source.

The first requirement excludes services like Dropbox and Google Drive from the search. The second requirement excludes closed-source software that claims to have E2EE (e.g. Arq, Backblaze).

In the end, I decided to learn how to use BorgBackup. BorgBackup (or Borg) is an open source “deduplicating archiver with compression and encryption”. This project has many contributors and turned 10 years old in March 2020.

There are many good articles describing how Borg works and how to use it, so I won’t focus on that (but I will link to these articles in the process). Instead, this article is a compilation of notes that I took while setting up and automating Borg on macOS Catalina.


FYI this article was originally written for macOS Catalina 10.15.5 and borg 1.1.13. In Dec 2023 it received a major update for macOS Sonoma 14.2.1 and borg 1.2.7.

Hosting

I decided to go with Hetzner Storage Box since it comes with Borg support. In this case there is no need to set up a Borg server, only the client on the local machine. The only thing that needs to be done on the server is adding the local machine's SSH public key to ~/.ssh/authorized_keys. Storage Boxes use RAID, which is great since Borg does not add redundancy to deal with hardware malfunction.

In general, any VPS hosting would work; you just need to make sure that both the client and the server have Borg installed.

Installation

There is a Brew formula for Borg so installation on macOS is simple:

brew install borgbackup

For the server, there are distribution packages available for most common Linux and BSD distros.

Remote backup setup

Repo initialization

Before a backup can be made, a repository has to be initialized. The main decision at this point is which encryption key mode to use (this cannot be changed later).

Borg offers various options for authenticated encryption with associated data (AEAD). All options use AES-CTR-256 for encryption.

See Borg docs on Encryption and borg init for more information on these options.

Example initialization:

borg init --encryption=repokey-blake2 \
    ssh://[email protected]:23/./backup/mbp2015

Backup script

I used a combination of sources to write the backup script:

The Hetzner article gives a detailed walkthrough of a typical archive-then-prune workflow, so I won’t go into great detail here. The Borg usage docs are also very useful, especially when writing command arguments.

Here is my final backup script that I’ve been using for my daily backups for the last several years. This script contains detailed comments in case any of the sources goes down; remove the comments in between command arguments before running:

#!/bin/sh

export BORG_FILES_CACHE_TTL=40 # https://borgbackup.readthedocs.io/en/stable/faq.html#it-always-chunks-all-my-files-even-unchanged-ones
# default repo location so that we can use '::archive' shorthand notation later
export BORG_REPO='ssh://[email protected]:23/./backup/mbp2015'
# get repo passphrase from 1Password
export BORG_PASSCOMMAND='/opt/homebrew/bin/op read "op://borg-backup/remote/passphrase"'

# some helpers and error handling:
info() { printf "\n%s %s\n\n" "$( date )" "$*" >&2; }
trap 'echo $( date ) Backup interrupted >&2; exit 2' INT TERM

info "Starting backup"

# create a daily backup
/opt/homebrew/bin/borg create \
    # work on log level INFO
    --verbose \
    # output list of items added (A), modified (M)
    # also output if error happened when accessing a file (E)
    --list --filter=AME \
    # print stats for the created archive at the end; log the return code (rc)
    --stats --show-rc \
    # use lzma compression (low speed, high compression)
    # use a heuristic to decide per chunk whether to compress or not (auto)
    --compression auto,lzma,6 \
    # repo name is inferred from $BORG_REPO
    '::{hostname}-daily-{now}' \
    # directories to back up
    /Users/mmxmb/my_important_docs \
    /Users/mmxmb/my_photos \
    /Users/mmxmb/Desktop

backup_exit=$?

info "Pruning repository"

# prune the repo
/opt/homebrew/bin/borg prune \
    # output verbose list of archives kept/pruned; display prune stats at the end
    --list --stats \
    # only consider archive names starting with this prefix
    --glob-archives '{hostname}-daily-' \
    --show-rc \
    # number of archives to keep for each time interval
    # example visualisation: https://borgbackup.readthedocs.io/en/stable/usage/prune.html
    --keep-daily 7 \
    --keep-weekly 5 \
    --keep-monthly 6

prune_exit=$?

# use highest exit code as exit code
global_exit=$(( backup_exit > prune_exit ? backup_exit : prune_exit ))

if [ ${global_exit} -eq 1 ];
then
    info "Backup and/or Prune finished with a warning"
fi

if [ ${global_exit} -gt 1 ];
then
    info "Backup and/or Prune finished with an error"
fi

exit ${global_exit}
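
Since the comments between the continuation lines have to be removed before the script can run, one alternative is to build the argument list step by step with `set --`, so that every option keeps its comment. This is only a sketch with the same options as above, not the script I actually run; `create_daily_archive` is a hypothetical function name and `BORG_BIN` is a hypothetical override variable that is not part of Borg:

```sh
#!/bin/sh
# Sketch: build the `borg create` arguments incrementally with `set --`.
# create_daily_archive and BORG_BIN are hypothetical names, not part of Borg.
create_daily_archive() {
    # work on log level INFO
    set -- --verbose
    # list items added (A), modified (M), or with access errors (E)
    set -- "$@" --list --filter=AME
    # print stats for the created archive; log the return code (rc)
    set -- "$@" --stats --show-rc
    # lzma compression, with a per-chunk heuristic (auto)
    set -- "$@" --compression auto,lzma,6
    # archive name; the repo is inferred from $BORG_REPO
    set -- "$@" '::{hostname}-daily-{now}'
    # directories to back up
    set -- "$@" /Users/mmxmb/my_important_docs /Users/mmxmb/my_photos /Users/mmxmb/Desktop

    "${BORG_BIN:-/opt/homebrew/bin/borg}" create "$@"
}
```

Calling `create_daily_archive` and then reading `$?` would slot into the script in place of the `borg create` command. Inside a function, `set --` only changes that function's positional parameters, so the caller's arguments are left alone.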

Note that I am using 1Password as my secrets manager. In particular, I store the backup server SSH key and the encryption passphrase in a dedicated vault. I use the 1Password SSH agent to access the SSH private key (configured in my ssh config) and the 1Password CLI to access the passphrase. Set BORG_RSH if you need to customize how Borg uses ssh (e.g. to specify the path to a private key file).
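
For a setup without an SSH agent, BORG_RSH could look something like the line below. The key path is a made-up example, so point it at wherever your private key actually lives:

```sh
# Hypothetical key file name; adjust to your own setup.
# IdentitiesOnly=yes tells ssh to offer only the key given with -i.
export BORG_RSH="ssh -i $HOME/.ssh/borg_backup_ed25519 -o IdentitiesOnly=yes"
```

This line would go next to the other exports at the top of the backup script.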

One minor issue with running Borg from a shell script is that 1Password reports that bash is trying to access my vault, which is a very broad description:

[Screenshot: 1Password access request, "Allow bash to use SSH key"]

[Screenshot: 1Password access request, "Allow bash to get CLI access"]

I prefer to wrap the shell script in a Go executable for audit purposes:

package main

import (
    "log"
    "os"
    "os/exec"
    "path/filepath"
)

func main() {
    ex, err := os.Executable()
    if err != nil {
        log.Fatal(err)
    }
    dir := filepath.Dir(ex)
    script := filepath.Join(dir, "backup.sh")
    cmd := exec.Command("/bin/sh", script)
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr
    if err := cmd.Run(); err != nil {
        log.Fatal(err)
    }
}

Save the compiled binary as remote_backup. Now 1Password shows a better message:

[Screenshot: 1Password access request, "Allow remote backup to use SSH key"]

[Screenshot: 1Password access request, "Allow remote backup to get CLI access"]

At this point it is a good idea to stop and interact with Borg to check that everything works smoothly before moving on to the automation step. In my case, I initialized a test Borg repo, used the above script to create an archive of a directory full of text files (so that I don’t have to wait long for compression and encryption), tested the extract and prune commands, and finally deleted the repo.
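
A dry run like that can be sketched as a small function. This is an illustration rather than the exact commands I typed: `borg_smoke_test` and `BORG_BIN` are hypothetical names, the archive name `smoke-1` is arbitrary, and the function assumes that BORG_PASSPHRASE or BORG_PASSCOMMAND is already set for the throwaway repo:

```sh
#!/bin/sh
# Sketch of a throwaway smoke test against a temporary repo.
# borg_smoke_test and BORG_BIN are hypothetical names, not part of Borg.
borg_smoke_test() {
    repo=$1                  # location of a temporary repo
    src=$2                   # small directory to archive
    borg=${BORG_BIN:-borg}

    "$borg" init --encryption=repokey-blake2 "$repo"  || return 1
    "$borg" create --stats "$repo::smoke-1" "$src"    || return 1
    "$borg" list "$repo"                              || return 1
    # --dry-run reads the archive back without writing any files
    "$borg" extract --dry-run "$repo::smoke-1"        || return 1
    "$borg" prune --list --keep-daily 1 "$repo"       || return 1
    # remove the whole test repo (borg asks for confirmation)
    "$borg" delete "$repo"
}
```

For example, `borg_smoke_test /tmp/borg-smoke-repo ~/some_text_files` exercises the same commands against a local repo; the two paths are placeholders.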

Remote backup automation

The Hetzner tutorial assumes that cron is used for scheduling the backup process; some other articles that you may find online use systemd. Since Apple deprecated cron back in 2005, we need to use launchd, which is kind of like systemd but for macOS.

Since I want to automate backups for my laptop, i.e. a machine that is not always online and that is used only by me when logged in as mmxmb, it makes sense to define a user agent job. A user agent job is a job that runs on behalf of the currently logged-in user. A launchd job definition is specified in a special XML file called a property list. Here is the remote Borg backup property list that I use:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
	<key>Label</key>
	<string>mmxmb.borgbackup-remote</string>
	<key>Program</key>
    <string>/path/to/remote/backup/script/go_wrapper_binary</string>
	<key>StandardErrorPath</key>
    <string>/path/to/remote/backup/log/backup.log</string>
	<key>StandardOutPath</key>
    <string>/path/to/remote/backup/log/backup.log</string>
    <key>RunAtLoad</key>
    <true/>
	<key>StartCalendarInterval</key>
	<dict>
		<key>Hour</key>
		<integer>12</integer>
		<key>Minute</key>
		<integer>0</integer>
	</dict>
</dict>
</plist>

This file is then saved as ~/Library/LaunchAgents/mmxmb.borgbackup-remote.plist (any job.label.plist filename would work). In general, all user agent jobs should be stored in ~/Library/LaunchAgents.

Here’s an overview of relevant job properties:

Label: a unique identifier for the job.
Program: path to the executable that launchd runs.
StandardErrorPath, StandardOutPath: files that receive the job's stderr and stdout.
RunAtLoad: run the job once as soon as it is loaded, which for a user agent means at login.
StartCalendarInterval: run the job at the given calendar time, here every day at 12:00.

For more information on launchd and job properties see launchd docs.


The job is loaded using:

launchctl bootstrap gui/$(id -u $(whoami)) ~/Library/LaunchAgents/mmxmb.borgbackup-remote.plist

If the job is loaded successfully, you should see the following notification:

[Screenshot: notification shown after the job is loaded]

And the following message in Console:

[Screenshot: launchd scheduled job success message in Console]

To unload the job use:

launchctl bootout gui/$(id -u $(whoami)) ~/Library/LaunchAgents/mmxmb.borgbackup-remote.plist

For more information on launchctl see launchctl docs.
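
Two other launchctl subcommands are handy for checking on a loaded job: `print` shows launchd's view of the job and `kickstart` starts it immediately. A small sketch with hypothetical helper names (`job_target`, `job_status`, `job_run_now`) that builds the `gui/<uid>/<label>` service target from a job label:

```sh
#!/bin/sh
# Hypothetical helpers around launchctl for a user agent job.

# Service target of a user agent: gui/<uid>/<label>
job_target() {
    printf 'gui/%s/%s\n' "$(id -u)" "$1"
}

# Show what launchd knows about the job
job_status() {
    launchctl print "$(job_target "$1")"
}

# Start the job right away instead of waiting for the next scheduled run
job_run_now() {
    launchctl kickstart "$(job_target "$1")"
}
```

For example, `job_run_now mmxmb.borgbackup-remote` triggers a backup without waiting for noon.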

Now the backup job should run every day at noon or right after login. It’s good to check the logs once in a while to make sure that the jobs run successfully.
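
One quick way to scan the log is to count the runs that did not end with return code 0. The sketch below is an assumption-laden helper, not something Borg provides: `count_nonzero_rc` is a hypothetical name, and it assumes that `--show-rc` makes each borg command end its output with a line that finishes in `rc <number>`:

```sh
#!/bin/sh
# Hypothetical helper: count log lines that end in a non-zero "rc <number>".
# Assumes the log format produced by --show-rc ends such lines with the code.
count_nonzero_rc() {
    grep -c 'rc [1-9][0-9]*$' "$1"
}
```

Running `count_nonzero_rc /path/to/remote/backup/log/backup.log` prints 0 when no warning or error was recorded. The count covers the whole file, so it keeps reporting old failures until the log is rotated or truncated.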


As an aside, one advantage of using launchd over cron on a laptop is that cron jobs do not execute if the system is turned off or asleep. launchd jobs scheduled with StartCalendarInterval run when computer wakes up, if the computer was asleep when the job should have run. However, if the machine is off when the job should have run, the job does not execute until the next designated time occurs. See Apple Developer Documentation Archive: Scheduling Timed Jobs.

Local backup

At this point why not use Borg for local backups as well?

The way I use my laptop, an external backup HDD can be attached for a few days, while I use the laptop at my desk, and then detached for some time when I need to take my laptop with me somewhere. The fact that the HDD is not always attached to the laptop (as opposed to an always available backup server) requires the backup script and launchd job definition to be adjusted slightly.

In particular, having a local backup run once every 24 hours is nice. This is easily achievable with the StartCalendarInterval launchd property, just like in the remote backup job definition. But if the backup drive hasn’t been attached for a few days, it would be ideal for the backup job to run as soon as the drive is re-attached, and not on the next StartCalendarInterval trigger. On Linux, this can be achieved using a udev rule (see Automated backups to a local hard drive tutorial). Since udev doesn’t exist on macOS, here’s one way to achieve similar functionality with another launchd property and some shell scripting.

Local backup launch daemon job definition:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>mmxmb.borgbackup-local</string>
    <key>Program</key>
    <string>/path/to/local/backup/script/local_backup</string>
    <key>StandardErrorPath</key>
    <string>/path/to/local/backup/log/backup.log</string>
    <key>StandardOutPath</key>
    <string>/path/to/local/backup/log/backup.log</string>
    <key>StartCalendarInterval</key>
    <dict>
        <key>Hour</key>
        <integer>11</integer>
        <key>Minute</key>
        <integer>0</integer>
    </dict>
    <key>WatchPaths</key>
    <array>
        <string>/Volumes/backup-disk</string>
    </array>
</dict>
</plist>

The new property, which is not used in the remote backup job definition, is WatchPaths. Here’s how WatchPaths works when pointed at a directory:

If the path points to a directory, creating and removing this directory, as well as creating, removing and writing files in this directory will start the job. Actions performed in subdirectories of this directory will not be detected.

If the backup volume is named backup-disk and it has a backup directory which contains Borg repos, then the backup job is triggered when the volume is mounted or unmounted. From the point of view of WatchPaths, that is equivalent to the /Volumes/backup-disk directory being created or deleted.
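
Because the job fires on both events, whatever it runs has to find out whether the volume is actually there. One way to sketch that check is a polling helper with a timeout. `wait_for_dir` is a hypothetical name, and a directory showing up is only a rough signal that the disk is ready:

```sh
#!/bin/sh
# Hypothetical helper: poll until directory $1 exists, giving up after
# $2 seconds (default 30). Returns 0 if the directory is there, 1 otherwise.
wait_for_dir() {
    dir=$1
    remaining=${2:-30}
    while [ ! -d "$dir" ]; do
        [ "$remaining" -le 0 ] && return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}
```

For example, `wait_for_dir /Volumes/backup-disk/backup/mbp2015 30 || exit 0` would let the job exit quietly when it was triggered by an unmount, and continue as soon as the repo directory becomes visible after a mount.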

Since the backup needs to start only when the volume is mounted, the backup script needs to handle the case when the job is triggered when the volume is unmounted:

#!/bin/sh

# it seems that sometimes launchd job is triggered on volume mount
# but the disk is not immediately accessible, so sleeping for a bit helps
sleep 5

DISK_NAME=backup-disk
MOUNTPOINT=/Volumes/$DISK_NAME

# some helpers and error handling:
info() { printf "\n%s %s\n\n" "$( date )" "$*" >&2; }
trap 'echo $( date ) Backup interrupted >&2; exit 2' INT TERM

# exit if disk is not mounted; launchd job gets triggered both when disk is mounted/unmounted
if [ ! -d "$MOUNTPOINT" ]; then
  info "The disk $MOUNTPOINT is not mounted. Exiting."
  exit 0
fi

export BORG_FILES_CACHE_TTL=40 # https://borgbackup.readthedocs.io/en/stable/faq.html#it-always-chunks-all-my-files-even-unchanged-ones
# default repo location so that we can use '::archive' shorthand notation later
export BORG_REPO="$MOUNTPOINT/backup/mbp2015"
export BORG_PASSCOMMAND='/opt/homebrew/bin/op read "op://borg-backup/local/passphrase"'

# get unix time of the last complete backup
# source: https://projects.torsion.org/witten/borgmatic/issues/86
LAST_BACKUP_TIME=$(/opt/homebrew/bin/borg list --sort-by timestamp --format '{time:%s}{TAB}{name}{NEWLINE}' | grep -v '\.checkpoint$' | tail -1 | cut -f 1)

# find time difference between now and last complete backup
NOW_TIME=`date +"%s"`
SECONDS_SINCE_LAST=$((NOW_TIME - LAST_BACKUP_TIME))
SECONDS_IN_DAY=86400

# exit if last backup took place less than 24 hours ago
if [ $SECONDS_SINCE_LAST -lt $SECONDS_IN_DAY ];
then
  info "Last backup happened less than 24 hours ago. Exiting."
  exit 0
fi

info "Starting local backup"

# create a daily backup
/opt/homebrew/bin/borg create \
    --verbose \
    --list --filter=AME \
    --stats --show-rc \
    --compression auto,lzma,6 \
    '::{hostname}-daily-{now}' \
    /Users/mmxmb/my_important_docs \
    /Users/mmxmb/my_photos \
    /Users/mmxmb/Desktop

backup_exit=$?

info "Pruning local repository"

# prune the repo
/opt/homebrew/bin/borg prune \
    --list --stats \
    --glob-archives '{hostname}-daily-' \
    --show-rc \
    --keep-daily 7 \
    --keep-weekly 5 \
    --keep-monthly 6

prune_exit=$?

# use highest exit code as exit code
global_exit=$(( backup_exit > prune_exit ? backup_exit : prune_exit ))

if [ ${global_exit} -eq 1 ];
then
    info "Backup and/or Prune finished with a warning"
fi

if [ ${global_exit} -gt 1 ];
then
    info "Backup and/or Prune finished with an error"
fi

exit ${global_exit}

This script also contains some logic that prevents the backup job from creating a new archive too often, e.g. when the backup drive is re-attached many times throughout the day.
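
The 24-hour check can also be pulled out into a small function, which makes it easier to try with different timestamps. This is a sketch rather than part of my script: `backup_is_due` is a hypothetical name, and it treats a missing timestamp (no archives yet) as "a backup is due":

```sh
#!/bin/sh
# Hypothetical helper: succeed if enough time has passed since the last backup.
# $1: unix time of the last complete backup (empty if there is none yet)
# $2: current unix time
# $3: minimum interval in seconds (default: 86400, i.e. 24 hours)
backup_is_due() {
    last=${1:-0}
    now=$2
    min_interval=${3:-86400}
    [ $((now - last)) -ge "$min_interval" ]
}
```

In the local backup script this would read `backup_is_due "$LAST_BACKUP_TIME" "$(date +%s)" || exit 0`.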