Linux backup options – PHP Developer

Until recently I’ve been happily using Obnam for all my backup needs, both on local machines and servers. However, the main developer has decided to step down, and the community, including myself, hasn’t put forward a volunteer to take over. This means that I’m looking for a new primary backup system by the end of the year.

Even if Obnam development were continuing, I would still want to look into other software, because I don’t want to lose all my backups to a bug in a single backup program. Ideally I will run at least two different backup systems, possibly three (depending on disk space – it’s cheap but not free).

The requirements of a good backup system are:

Data integrity: No point in having corrupted backups.
Restorability: Can you restore from places other than the original host? How easy is it to restore one file out of the thousands you have backed up?
Simplicity: You need to be able to say ‘back up these files to this location’ as simply as possible. The harder this is, the less likely you are to bother backing up at all.
Speed: The quicker backups are, the more frequently you can run them.

There are plenty of other ‘nice to haves’, but the above are absolute requirements.

rsync

The simplest backup option I can think of is to run rsync to copy the source files to a backup target. This has the advantages of being simple and low-bandwidth, since rsync is intelligent enough to only send the parts of files which have changed. You can also run rsync over SSH, giving you transport security and the ability to use keys without a passphrase for unattended backups.

The biggest downside to rsync is that there is no easy way to remove files from the backup a given amount of time after they have been removed from the source. You also can’t easily keep revisions of files. Effectively you have a single snapshot of the source, which is better than nothing but doesn’t give you the ability to go back in time.
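As a rough illustration of the single-snapshot approach, here is a minimal Python sketch that builds and runs an rsync command over SSH. The function names, paths and key location are hypothetical. The flags used (-a, -z, --delete and -e) are standard rsync options, but check them against your own rsync version before relying on this.

```python
import shlex
import subprocess


def build_rsync_command(source, target, ssh_key=None, delete=False):
    """Return an rsync argument list that mirrors `source` to `target`."""
    cmd = ["rsync", "-az"]  # -a: archive mode, -z: compress data in transit
    if delete:
        # Remove files from the backup as soon as they vanish from the source.
        cmd.append("--delete")
    if ssh_key:
        # Run over SSH with a dedicated key (hypothetical path supplied by caller).
        cmd += ["-e", "ssh -i " + shlex.quote(ssh_key)]
    cmd += [source, target]
    return cmd


def run_backup(source, target, **options):
    """Run rsync and raise CalledProcessError if it exits non-zero."""
    subprocess.run(build_rsync_command(source, target, **options), check=True)
```

Note that --delete removes files from the target immediately, which is exactly the limitation described above: there is no grace period and no history.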

tar files

Another simple backup option is to create a tar file of everything you want to back up, and then transfer that elsewhere. Usually you rotate the files using filenames, e.g. you store 01-backup.tar on the first of the month, 02-backup.tar on the second etc., giving you a month’s worth of backups.
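The day-of-month rotation can be sketched with Python's standard tarfile module. This is only an illustration: the function names are made up, and it writes gzip-compressed archives (01-backup.tar.gz rather than 01-backup.tar) on the assumption that you want compression.

```python
import datetime
import tarfile
from pathlib import Path


def rotating_archive_name(day=None):
    """Return e.g. '01-backup.tar.gz' for the first day of the month."""
    day = day or datetime.date.today()
    return f"{day.day:02d}-backup.tar.gz"


def create_backup(sources, backup_dir, day=None):
    """Write a gzip-compressed tar of `sources` into `backup_dir`.

    The file for a given day of the month overwrites the one written on the
    same day of the previous month, which is what gives the rotation.
    """
    archive = Path(backup_dir) / rotating_archive_name(day)
    with tarfile.open(archive, "w:gz") as tar:
        for source in sources:
            # Store each source under its own name rather than its absolute path.
            tar.add(source, arcname=Path(source).name)
    return archive
```

Transferring the resulting file to another machine is left out here; any copy mechanism will do.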

The big advantages of tar files are:

Ubiquity: tar is installed on most Linux and macOS machines by default. It’s also available for Solaris and Windows (via Cygwin).
Compression: gzip, bzip2 and LZMA2 are all supported out of the box.
Stability and longevity: tar has been around for decades.

However, you don’t get any de-duplication with tar, which means you either have to store multiple copies of every file, or work out a mechanism for doing incremental backups. Storing multiple copies is fine (and a good idea) if you have the disk space, but on servers I often don’t have this option.
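One possible mechanism for incremental backups is to archive only the files modified since the previous run. The sketch below assumes you record the time of the last backup yourself (as seconds since the epoch). The function names are hypothetical, and it does not track deleted files, so treat it as a starting point rather than a complete scheme. GNU tar also has a --listed-incremental option which may be a better fit.

```python
import os
import tarfile


def changed_since(root, last_backup_time):
    """Yield paths of regular files under `root` modified after the timestamp."""
    for directory, _subdirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(directory, name)
            if os.path.isfile(path) and os.path.getmtime(path) > last_backup_time:
                yield path


def incremental_tar(root, archive_path, last_backup_time):
    """Archive only the files changed since `last_backup_time`.

    Returns the sorted list of relative paths that were added.
    """
    added = []
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in changed_since(root, last_backup_time):
            relative = os.path.relpath(path, root)
            tar.add(path, arcname=relative)
            added.append(relative)
    return sorted(added)
```

Restoring then means unpacking the last full backup followed by each incremental archive in order, which is part of the price you pay for saving disk space.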

Borg

Borg describes itself as a ‘deduplicating archiver with compression and encryption’. You can back up to local storage or to remote hosts over SSH. It’s been around for several years and seems to be stable, and it has a huge feature set, although I haven’t used most of it. It also appears to scale well, and I know people who are using it to back up a large estate of machines to multiple destinations.

The major downside to Borg is that the command to verify the contents of a repository or an archive, borg check, is incredibly slow, at least for my use case. I also find the idea of creating a new archive name for each snapshot a bit annoying, although I got around this with a simple wrapper script that uses the current date/time. Despite this, Borg is my primary backup system at the moment.
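A wrapper along those lines might look like the following Python sketch. It is not my actual script: the function names, the archive name format and the repository path are illustrative, and it assumes the Borg 1.x `borg create REPO::ARCHIVE PATH...` syntax.

```python
import datetime
import subprocess


def archive_name(prefix, now=None):
    """Return a unique archive name such as 'laptop-2017-09-02_134500'."""
    now = now or datetime.datetime.now()
    stamp = now.strftime("%Y-%m-%d_%H%M%S")
    return f"{prefix}-{stamp}"


def build_borg_create(repo, prefix, paths, now=None):
    """Return the argument list for `borg create REPO::ARCHIVE PATH...`."""
    return ["borg", "create", f"{repo}::{archive_name(prefix, now)}", *paths]


def run_backup(repo, prefix, paths):
    """Create a new timestamped archive; raises if borg exits non-zero."""
    subprocess.run(build_borg_create(repo, prefix, paths), check=True)
```

Calling run_backup("/mnt/backup/repo", "laptop", ["/home/me"]) from cron would then produce a fresh archive name on every run.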

Restic

Restic is another de-duplicating backup system which works in a similar way to Borg, at least from the user’s point of view.

Advantages of Restic include:

Active: Development is ongoing and the team are quick to respond to issues (my documentation pull request was merged within 24 hours).
Cross-platform: Restic is written in Go, which makes it easy to cross-compile to other platforms, although I haven’t tested it on Windows. It also produces a single binary which you can upload to servers.

However, Restic is a new piece of software, and I generally prefer my backup software, like my filesystems, to be old, boring and stable. For that reason, I wouldn’t recommend using Restic as your primary backup system until it has settled down. It’s a good secondary system though and I’m looking forward to improvements over the coming months.

My choices

At the moment I’m using Borg as my primary backup system and Restic as a secondary, with tar files as a tertiary system when disk space isn’t a problem (e.g. backing up my desktop to a 1TB USB drive). All of my systems run Linux, but I’m still on the lookout for good solutions for Windows (must be open source) to encourage automated backups amongst my family.