Pull Backups with Borgmatic
I use Borg for my personal backups; I find its compression and deduplication features really useful. However, it is architected around the threat model of having a client you trust, but a backup server you don’t. As a result, backups are all encrypted and push-based (i.e. the client SSHes into the remote and only sends encrypted chunks). While this is fine for quite a lot of use cases, I wanted to back up a VPS that I maintain, and I didn’t really want to give that VPS perpetual access to my home network.
Thankfully, Borg publishes a
guide
that details how to perform pull-based backups. One of its prescribed methods
involves using socat(1)
and a reverse ssh tunnel to open a pathway for backups
to flow. While its documentation is quite thorough in its methods, it does leave
a couple of things to the imagination. I wanted to write a post detailing how I
set this up with my backup server, and how I managed to integrate it into
Borgmatic, a Borg automation tool.
A word of caution: I wouldn’t necessarily call this post a tutorial, but rather a writeup of some details that I ran into along the way that I feel are worth sharing.
Setting Up The Backup Socket
The aforementioned Borg guide is quite detailed, and is worth reading, but to recap:

- The Borg server places a UNIX socket in `/run`.
- The server listens on this socket using `socat`, forwarding output to an instance of `borg serve` (which is what `borg` uses for RPC¹).
- The server connects to the client using an SSH reverse tunnel, forwarding its local UNIX socket to a remote UNIX socket.
- The client can then perform a set of `borg` commands over this socket, using Borg’s `rsh` option.
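To make the guide’s approach concrete, the `socat` variant looks roughly like this on the server side. This is my own sketch of the scheme the Borg documentation describes, not a verbatim copy; the socket paths and repository location are illustrative.

```shell
# On the backup server: listen on a UNIX socket and hand each
# connection off to a fresh borg serve process
socat UNIX-LISTEN:/run/borg/backup.sock,fork \
    EXEC:"borg serve --append-only --restrict-to-path /path/to/my/repository"

# In a separate shell, still on the backup server: forward that socket
# to the client over a reverse SSH tunnel so the client can reach borg serve
ssh -R /tmp/borg.sock:/run/borg/backup.sock user@client
```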
This is definitely a great starting point, and will by all means work. One thing that caught my eye, however, was this note in the documentation:

> When used in production you may also use systemd socket-based activation instead of socat on the server side. You would wrap the `borg serve` command in a service unit and configure a matching socket unit to start the service whenever a client connects to the socket.
Well, that’s certainly interesting! Unfortunately, no example is given (remember how I said some things are left to the imagination?). While I’ve seen socket units before, I’ve never actually written one, so this was a fun learning opportunity. For the uninformed, a socket unit instructs systemd to start a service once a connection is made to a socket. We can use this to emulate the `socat` process the guide instructs us to run on the server.
Here’s the unit I ended up with:
```ini
[Unit]
Description=Borg Backup Socket
PartOf=remote-backup@.service

[Socket]
ListenStream=/run/remote-backup/borg.sock
Accept=yes
SocketUser=remote-backup
```
Importantly, we instruct systemd to create this socket with the owner `remote-backup`, which is simply an unprivileged user I’ve created for the backup². Now, when there is a connection on this socket, systemd will start `remote-backup@.service`. This must be a templated unit (note the `@` in the service name): because we use the `Accept` option, systemd will spawn a new instance of the service for every connection. Here’s what that unit looks like:
```ini
[Unit]
Description=Serve Remote Borg Backup
After=network-online.target remote-backup.socket
Requires=remote-backup.socket

[Service]
ExecStart=borg serve --append-only --restrict-to-path /path/to/my/repository
Type=simple
User=remote-backup
StandardInput=socket
StandardOutput=socket
StandardError=journal
```
Most of this unit is fairly straightforward if you’ve ever written a systemd service unit before: we set a dependency between this service and the socket unit, and we run `borg serve`, just as the `socat` invocation did. Again, we scope this service to the `remote-backup` user, to sandbox the execution a little bit³. There are a few gotchas, though.
Firstly, we need to redirect `StandardInput` and `StandardOutput` of the `borg serve` process to the socket. systemd does not do this by default, and the Borg client will unceremoniously fail to connect if it doesn’t get any responses to its RPC messages. Secondly, we must explicitly tell systemd to redirect stderr to the journal with `StandardError=journal`. If this is not specified, systemd will forward stderr to the same location as `StandardOutput` (the socket), which will send human-readable messages to the Borg client (and of course, the client chokes on these). You could also very well set this to `null` rather than `journal`, but I figure logging it may be useful.
One other small tidbit: because systemd starts these service units with templated names, if you want to view the journal for them, you can do so with `journalctl -e -u 'remote-backup*'` (mind the quotes).
If all you wanted out of this guide was an example of the socket unit the Borg documentation alludes to, you’re all done once you `systemctl enable --now` the socket unit. Otherwise, read on!
Setting Up The Remote To Accept Connections
Now that we have a way to activate `borg serve`, we need to prepare the VPS for our reverse SSH tunnel. This section definitely depends on how you’ve chosen to configure your remote server, but this is what I did.

First, I created a dedicated user for the backup process to SSH in as. This user has the public SSH key of the backup server’s `remote-backup` user in its `authorized_keys`.
Second, I added a line to my `sudoers` to allow this remote user to execute `borg` as root so that it can read the entire disk for a backup. My entry looks like this:

```
borg ALL=(root:root) NOPASSWD:SETENV: /usr/bin/borg
```
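It’s worth validating an entry like this before relying on it. If you keep it in a drop-in file (the filename here is my own choice, not anything prescribed), `visudo` can check the syntax without installing it:

```shell
# Check a sudoers drop-in for syntax errors before it takes effect
visudo -cf /etc/sudoers.d/remote-backup
```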
I use `NOPASSWD` here so that I can automate the backup process, but I wouldn’t recommend doing this unless you’ve disabled password authentication for this user and for remote SSH connections. `SETENV` is also there for some additional automation details that I will get to later.
Lastly, there’s a note in the Borg docs that caught my attention:

> As the default value of OpenSSH for `StreamLocalBindUnlink` is no, the socket file created by sshd is not removed. Trying to connect a second time will print a short warning, and the forwarding does not take place.
This note alludes to the fact that we can set `StreamLocalBindUnlink yes` to ease reconnection problems, but one thing worth noting is that this must be done in the VPS’ sshd config, given we are using a reverse tunnel. If we were setting up a forward tunnel, this option would be specified on the backup server⁴.
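For reference, the relevant sshd configuration on the VPS might look something like this. Scoping it to the backup user with a `Match` block is my own choice rather than anything the Borg docs prescribe:

```
# /etc/ssh/sshd_config (on the VPS)
# Remove stale forwarded sockets so reconnecting doesn't fail
Match User borg
    StreamLocalBindUnlink yes
```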
At this point, we should be able to actually perform a backup by hand! Here’s what I ran, starting from my backup server:

```shell
# On the backup server: open the reverse tunnel and log into the VPS
sudo -u remote-backup ssh \
    -R /tmp/borg.sock:/run/remote-backup/borg.sock \
    borg@vps

# Then, from the resulting shell on the VPS:
sudo borg \
    --rsh="sh -c 'exec socat STDIO UNIX-CONNECT:/tmp/borg.sock'" \
    create ssh://server/path/to/my/repository::test /
```
The hostname in the `ssh://` string is not important, as the address will be ignored when the connection is sent to `borg serve`.
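You can also collapse the tunnel and the remote `borg` invocation into a single `ssh` command. As a sanity check that the archive landed, something like this should work (my own sketch; `-t` allocates a tty so Borg can prompt for a passphrase on an encrypted repository):

```shell
sudo -u remote-backup ssh -t \
    -R /tmp/borg.sock:/run/remote-backup/borg.sock \
    borg@vps \
    "sudo borg --rsh=\"sh -c 'exec socat STDIO UNIX-CONNECT:/tmp/borg.sock'\" \
        list ssh://server/path/to/my/repository"
```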
Hooking It Up To Borgmatic
Borgmatic is pretty great, but out of the box it will just call `borg` directly on the machine it’s running on. This is fine for most use cases, but here I need to actually run `borg` on the VPS, while still keeping Borgmatic on the backup server (or somewhere else that can handle the automation). Thankfully, Borgmatic exposes a configuration flag, `local_path`, that lets us use an alternative `borg` binary. I wrote this small wrapper script, and pointed `borgmatic` to it:
```bash
#!/usr/bin/env bash
exec ssh -R /tmp/borg.sock:/run/remote-backup/borg.sock \
    -o SendEnv=BORG_PASSPHRASE \
    borg@vps \
    $'sudo -E borg --rsh="sh -c \'exec socat STDIO UNIX-CONNECT:/tmp/borg.sock\'" '"$@"
```
The only thing that is new here is the `SendEnv` of `BORG_PASSPHRASE`; I chose to store this passphrase on the backup server, rather than the VPS. `borgmatic` sends this to the Borg binary via the environment variable `BORG_PASSPHRASE`, which I pass through the SSH connection with `SendEnv`⁵.
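For completeness, the relevant bit of the Borgmatic configuration might look something like this. The wrapper’s install path is hypothetical, and depending on your Borgmatic version these keys may need to live under the `location:` section instead of at the top level:

```yaml
# borgmatic configuration on the backup server
local_path: /usr/local/bin/borg-remote   # the wrapper script (path is hypothetical)
repositories:
    - path: ssh://server/path/to/my/repository
```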
That’s all there is to it! To recap, we:

- Use Borgmatic to execute a script that will set up a local UNIX socket between the VPS and the backup server.
- Execute Borg on the VPS to send data back to the backup server.
- Use a systemd service on the backup server to activate `borg serve`.
1. Even if you are a regular Borg user, you may not have seen this before. This is normally executed on the remote for you by the `borg` command. However, because there is no ssh access from the client to the server, we must do this ourselves. Check out the documentation for more details. ↩︎
2. You can omit this if you’d like, but I wouldn’t recommend it. While this is strictly controlling the socket ownership, we’re allowing a foreign system to send data to us, so we may as well sandbox it slightly. Running everything locally as a non-root user grants us an extra layer of security. ↩︎
3. You will want to make sure this user can read your Borg repository, of course :) ↩︎
4. This comment on StackExchange is what tipped me off here. ↩︎
5. Note, you must have an `AcceptEnv` for this variable in your remote’s sshd config. ↩︎