Pull Backups with Borgmatic

Posted on Nov 24, 2023

I use Borg for my personal backups; I find its compression and deduplication features really useful. However, it is architected around the threat model of a client you trust and a backup server you don’t. As a result, backups are all encrypted and push-based (i.e. the client ssh’s into the remote and only sends encrypted chunks). While this is fine for a lot of use cases, I wanted to back up a VPS that I maintain, and I didn’t want to give that VPS perpetual access to my home network.

Thankfully, Borg publishes a guide that details how to perform pull-based backups. One of its prescribed methods uses socat(1) and a reverse ssh tunnel to open a pathway for backups to flow. While the guide is thorough, it does leave a couple of things to the imagination. I wanted to write a post detailing how I set this up with my backup server, and how I integrated it into Borgmatic, a Borg automation tool.

A word of caution: I wouldn’t necessarily call this post a tutorial, but rather a writeup of some details that I ran into along the way that I feel are worth sharing.

Setting Up The Backup Socket

The aforementioned Borg guide is detailed and worth reading, but to recap:

The Borg server places a UNIX socket in /run.
The server listens on this socket using socat, handing each connection to an instance of borg serve (which is what borg uses for RPC¹)
The server connects to the client using an SSH reverse tunnel, forwarding its local UNIX socket to a remote UNIX socket.
The client can then perform a set of borg commands over this socket, using Borg’s rsh option.
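For reference, the guide’s socat-based setup boils down to something like the following. This is a sketch rather than the guide’s exact commands: the socket path, the host names and the function names are placeholders of my own, and every step goes through a small run helper so that setting DRY_RUN=1 prints the command instead of executing it.

```shell
#!/usr/bin/env bash
# Sketch of the socat-based pull setup described in the Borg guide.
# SOCK, the host names and the function names are placeholders.

SOCK=/run/borg/backup.sock
REPO=/path/to/my/repository

# Print the command instead of running it when DRY_RUN=1.
run() {
	if [[ "${DRY_RUN:-0}" == 1 ]]; then
		printf '%s\n' "$*"
	else
		"$@"
	fi
}

# Backup server: listen on a UNIX socket and hand each connection (fork)
# to a fresh `borg serve` restricted to the repository.
serve_over_socat() {
	run socat "UNIX-LISTEN:${SOCK},fork" \
		"EXEC:borg serve --append-only --restrict-to-path ${REPO}"
}

# Backup server: forward that socket to the same path on the client ($1)
# through a reverse ssh tunnel.
open_reverse_tunnel() {
	run ssh -R "${SOCK}:${SOCK}" "$1"
}

# Client, inside that ssh session: borg talks to the socket instead of
# spawning ssh. The host in the URL is ignored; $1 is the archive name.
client_create() {
	run borg --rsh="sh -c 'exec socat STDIO UNIX-CONNECT:${SOCK}'" \
		create "ssh://borg-server${REPO}::$1" /
}

# Example: DRY_RUN=1 open_reverse_tunnel borgc@client
```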

This is definitely a great starting point, and will by all means work. One thing that caught my eye, however, was this note in the documentation:

When used in production you may also use systemd socket-based activation instead of socat on the server side. You would wrap the borg serve command in a service unit and configure a matching socket unit to start the service whenever a client connects to the socket.

Well, that’s certainly interesting! Unfortunately, no example is given (remember how I said some things are left to the imagination?). While I’ve seen socket units before, I’ve never actually written one, so this was a fun learning opportunity. For the uninitiated, a socket unit instructs systemd to start a unit once a connection is made to a socket. We can use this to replace the socat listener that the guide has us run on the server.

Here’s the unit I ended up with, saved as remote-backup.socket:

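[Unit]
Description=Borg Backup Socket
PartOf=remote-backup@.service

[Socket]
ListenStream=/run/remote-backup/borg.sock
# Spawn one remote-backup@.service instance per connection
Accept=yes
SocketUser=remote-backup

# Needed for `systemctl enable` to work
[Install]
WantedBy=sockets.target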

Importantly, we instruct systemd to create this socket with the owner remote-backup, which is simply an unprivileged user I’ve created for the backup². Now, when a client connects to this socket, systemd will start an instance of remote-backup@.service. This must be a templated unit (note the @ in the service name): because we set Accept, systemd spawns a new instance of the service for every connection. Here’s what that unit looks like:

[Unit]
Description=Serve Remote Borg Backup
After=network-online.target remote-backup.socket
Requires=remote-backup.socket

[Service]
ExecStart=borg serve --append-only --restrict-to-path /path/to/my/repository
Type=simple
User=remote-backup
StandardInput=socket
StandardOutput=socket
StandardError=journal

Most of this unit is fairly straightforward if you’ve ever written a systemd service unit before: we set a dependency between this service and the socket unit, and we run borg serve, just as the socat listener did. Again, we run this service as the remote-backup user, to sandbox the execution a little bit³. There are a few gotchas, though.

Firstly, we need to redirect the StandardInput and StandardOutput of the borg serve process to the socket. systemd does not do this by default, and Borg clients will unceremoniously fail to connect if they don’t get any responses to their RPC messages. Secondly, we must explicitly tell systemd to send stderr to the journal with StandardError=journal. If this is not specified, systemd sends stderr to the same place as StandardOutput (the socket), which delivers human-readable messages to the Borg client (and of course, the client chokes on these). You could just as well set this to null rather than journal, but I figure logging it may be useful.

One other small tidbit: because systemd gives these service instances templated names, you can view their journal with journalctl -e -u 'remote-backup*' (mind the quotes).

If all you wanted out of this guide was an example of the socket unit the Borg documentation alludes to, you’re all done once you run systemctl enable --now remote-backup.socket. Otherwise, read on!
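In practice, installing and checking the units might look like the sketch below. It assumes the two unit files sit in the current directory under the names used above, and that the socket unit has an [Install] section (e.g. WantedBy=sockets.target), since systemctl enable needs one. The helper names are hypothetical, and stat -c is GNU coreutils syntax.

```shell
#!/usr/bin/env bash
# Sketch: install the socket and service units, start listening, and check
# that the socket exists with the expected owner.

SOCKET_UNIT=remote-backup.socket
SERVICE_UNIT=remote-backup@.service
SOCK_PATH=/run/remote-backup/borg.sock
SOCK_OWNER=remote-backup

# Print the command instead of running it when DRY_RUN=1.
run() {
	if [[ "${DRY_RUN:-0}" == 1 ]]; then
		printf '%s\n' "$*"
	else
		"$@"
	fi
}

install_units() {
	run install -m 0644 "$SOCKET_UNIT" "$SERVICE_UNIT" /etc/systemd/system/ &&
		run systemctl daemon-reload &&
		run systemctl enable --now "$SOCKET_UNIT"
}

# Succeed only if $1 is a UNIX socket owned by user $2 (GNU stat syntax).
socket_ok() {
	[[ -S "$1" ]] && [[ "$(stat -c %U "$1")" == "$2" ]]
}

# Example (as root):
#   install_units && socket_ok "$SOCK_PATH" "$SOCK_OWNER" && echo listening
```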

Setting Up The Remote To Accept Connections

Now that we have a way to activate borg serve, we need to prepare the VPS for our reverse ssh tunnel. The details here depend on how you’ve configured your remote server, but this is what I did.

First, I created a dedicated user, borg, for the backup process to ssh in as. Its authorized_keys contains the public ssh key of the backup server’s remote-backup user.
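On a typical Linux VPS, that could look roughly like this. The helper name and the public key file are placeholders, and I’m assuming useradd creates a matching borg group (the default on most distributions). As before, DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch for the VPS: create the dedicated "borg" user and authorize the
# backup server's remote-backup key ($1 is a copy of its public key).

# Print the command instead of running it when DRY_RUN=1.
run() {
	if [[ "${DRY_RUN:-0}" == 1 ]]; then
		printf '%s\n' "$*"
	else
		"$@"
	fi
}

provision_borg_user() {
	local pubkey=$1
	# sshd runs the remote command through the user's login shell.
	run useradd --create-home --shell /bin/bash borg &&
		run install -d -m 0700 -o borg -g borg /home/borg/.ssh &&
		run install -m 0600 -o borg -g borg "$pubkey" /home/borg/.ssh/authorized_keys
}

# Example (as root): provision_borg_user /root/remote-backup.pub
```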

Second, I added a sudoers entry that allows this user to run borg as root, so that it can read the entire disk for a backup. My entry looks like this:

borg    ALL=(root:root) NOPASSWD:SETENV: /usr/bin/borg

I use NOPASSWD here so that I can automate the backup process, but I wouldn’t recommend doing this unless you’ve disabled password auth for this user and remote ssh connections. SETENV is also there for some additional automation details that I will get to later.
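Since a broken sudoers file can lock you out of sudo entirely, it’s worth validating the rule before installing it. Here is a sketch, assuming your sudoers has the usual #includedir /etc/sudoers.d (or @includedir) line; the drop-in name and the helper name are my own choices.

```shell
#!/usr/bin/env bash
# Sketch: install the sudoers rule as a drop-in, but only after visudo has
# accepted it. sudo skips drop-in files whose names contain a ".", so the
# default name below has none.

SUDOERS_LINE='borg    ALL=(root:root) NOPASSWD:SETENV: /usr/bin/borg'

# Print the command instead of running it when DRY_RUN=1.
run() {
	if [[ "${DRY_RUN:-0}" == 1 ]]; then
		printf '%s\n' "$*"
	else
		"$@"
	fi
}

install_sudoers_rule() {
	local dest=${1:-/etc/sudoers.d/borg} tmp status=0
	tmp=$(mktemp) || return 1
	printf '%s\n' "$SUDOERS_LINE" > "$tmp"
	# -c only checks syntax; -f points visudo at the candidate file.
	if run visudo -cf "$tmp"; then
		run install -m 0440 "$tmp" "$dest" || status=$?
	else
		status=1
	fi
	rm -f "$tmp"
	return "$status"
}

# Example (as root): install_sudoers_rule && sudo -l -U borg
```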

Lastly, there’s a note in the Borg docs that caught my attention:

As the default value of OpenSSH for StreamLocalBindUnlink is no, the socket file created by sshd is not removed. Trying to connect a second time, will print a short warning, and the forwarding does not take place

This note hints that we can set StreamLocalBindUnlink yes to ease reconnection problems. One thing worth noting, though, is that this must be set in the VPS’ sshd config: with a reverse tunnel, it is the VPS’ sshd that creates the socket file. If we were setting up a forward tunnel, this option would be specified on the backup server⁴.
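A sketch of that change as an sshd drop-in. It also adds the AcceptEnv BORG_PASSPHRASE line that the Borgmatic wrapper later in this post relies on. It assumes sshd_config includes /etc/ssh/sshd_config.d/*.conf (the Debian and Ubuntu default) and that the service is called ssh (it is sshd on many other distributions); the drop-in file name is my own.

```shell
#!/usr/bin/env bash
# Sketch for the VPS: let sshd replace a stale forwarded socket and accept
# the passphrase variable sent by the backup server.

DROPIN=/etc/ssh/sshd_config.d/borg-pull.conf

render_dropin() {
	cat <<'EOF'
# Remove an existing socket file before binding a new -R forward.
StreamLocalBindUnlink yes
# Let the client pass the repository passphrase (ssh -o SendEnv=...).
AcceptEnv BORG_PASSPHRASE
EOF
}

apply_dropin() {
	render_dropin > "$DROPIN" || return 1
	# Validate the full configuration before reloading; roll back on error.
	if ! sshd -t; then
		rm -f "$DROPIN"
		return 1
	fi
	systemctl reload ssh
}

# Example (as root): apply_dropin
```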

At this point, we should be able to actually perform a backup by hand! Here’s the command I ran on my backup server:

sudo -u remote-backup ssh \
	-R /tmp/borg.sock:/run/remote-backup/borg.sock \
	borg@vps \
	"sudo borg --rsh=\"sh -c 'exec socat STDIO UNIX-CONNECT:/tmp/borg.sock'\" \
		create ssh://server/path/to/my/repository::test /"

The hostname in the ssh:// string is not important, as the address will be ignored when the connection is sent to borg serve.
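Getting the quoting right by hand is fiddly, because ssh joins its arguments with spaces and the VPS’ shell parses the result a second time. One way around that, sketched below with a hypothetical remote_borg_cmd helper, is to escape each borg argument with bash’s printf %q. This assumes the borg user’s login shell on the VPS is bash.

```shell
#!/usr/bin/env bash
# Sketch: build the command string that ssh should run on the VPS.
# Every borg argument is escaped with printf %q so it survives the
# remote shell's second round of parsing.

SOCK=/tmp/borg.sock

remote_borg_cmd() {
	local rsh="sh -c 'exec socat STDIO UNIX-CONNECT:${SOCK}'"
	printf 'sudo borg --rsh=%q' "$rsh"
	if (( $# )); then
		printf ' %q' "$@"
	fi
	printf '\n'
}

# Example: list the archives to confirm the test archive landed.
#   sudo -u remote-backup ssh -R /tmp/borg.sock:/run/remote-backup/borg.sock \
#       borg@vps "$(remote_borg_cmd list ssh://server/path/to/my/repository)"
```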

Hooking It Up To Borgmatic

Borgmatic is pretty great, but out of the box it will just call borg directly on the machine it’s running on. This is fine for most use cases, but here I need to actually run borg on the VPS while still keeping Borgmatic on the backup server (or somewhere else that can handle the automation). Thankfully, Borgmatic exposes a configuration option, local_path, that lets us use an alternative borg binary. I wrote this small wrapper script and pointed Borgmatic to it:

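#!/usr/bin/env bash
# Forward the backup server's borg socket to /tmp/borg.sock on the VPS, then
# run borg there as root with its RPC going through that socket. Arguments
# from Borgmatic are escaped with printf %q, since ssh joins them with
# spaces and the remote shell splits them again.

exec ssh -R /tmp/borg.sock:/run/remote-backup/borg.sock \
	-o SendEnv=BORG_PASSPHRASE \
	borg@vps \
	"sudo -E borg --rsh=\"sh -c 'exec socat STDIO UNIX-CONNECT:/tmp/borg.sock'\" $(printf '%q ' "$@")"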

The main new piece here is the SendEnv of BORG_PASSPHRASE: I chose to store this passphrase on the backup server rather than the VPS. Borgmatic hands the passphrase to Borg in the BORG_PASSPHRASE environment variable, which I pass through the ssh connection with SendEnv⁵. On the VPS, sudo -E keeps that variable in borg’s environment, which is what the SETENV tag in the sudoers entry permits.
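For completeness, the Borgmatic side could be a config along these lines. It’s a sketch assuming borgmatic 1.8’s flat configuration format and a version that exports encryption_passphrase to Borg as BORG_PASSPHRASE (as described above); the wrapper path, repository label and passphrase are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: write a borgmatic config that uses the wrapper instead of a local
# borg binary. Paths in source_directories refer to the VPS, since borg
# itself runs there.

WRAPPER=/usr/local/bin/borg-on-vps   # hypothetical wrapper location

write_borgmatic_config() {
	local dest=$1
	cat > "$dest" <<EOF
source_directories:
    - /

repositories:
    - path: ssh://server/path/to/my/repository
      label: vps

# Run the wrapper instead of the local borg binary.
local_path: ${WRAPPER}

# Handed to borg as BORG_PASSPHRASE; the wrapper forwards it with SendEnv.
encryption_passphrase: "change-me"
EOF
}

# Example:
#   write_borgmatic_config /etc/borgmatic.d/vps.yaml
#   sudo -u remote-backup borgmatic --config /etc/borgmatic.d/vps.yaml create --stats
```

The backup then runs as remote-backup, since that is the user whose key the VPS trusts.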

That’s all there is to it! To recap, we

Use Borgmatic to execute a script that opens a reverse ssh tunnel, exposing the backup server’s UNIX socket on the VPS.
Execute Borg on the VPS to send data back to the backup server.
Use a systemd service on the backup server to activate borg serve.

¹ Even if you are a regular Borg user, you may not have seen this before: it is normally executed on the remote for you by the borg command. However, because there is no ssh access from the client to the server, we must do this ourselves. Check out the documentation for more details.
² You can omit this if you’d like, but I wouldn’t recommend it. While this only controls the socket ownership, we’re allowing a foreign system to send data to us, so we may as well sandbox it slightly. Running everything locally as a non-root user grants us an extra layer of security.
³ You will want to make sure this user can read and write your Borg repository, of course :)
⁴ This comment on StackExchange is what tipped me off here.
⁵ Note that you must have an AcceptEnv for this variable in your remote’s sshd config.