From 3e23ce6f406f8caf39e3295e541301f66873f031 Mon Sep 17 00:00:00 2001
From: Kaan Barmore-Genc
Date: Sun, 29 May 2022 16:39:04 -0400
Subject: [PATCH] My New Backup Setup with Kopia

---
 .../posts/2022.05.29.my-new-backup-kopia.md | 65 +++++++++++++++++++
 1 file changed, 65 insertions(+)
 create mode 100644 content/posts/2022.05.29.my-new-backup-kopia.md

diff --git a/content/posts/2022.05.29.my-new-backup-kopia.md b/content/posts/2022.05.29.my-new-backup-kopia.md
new file mode 100644
index 0000000..a95dc68
--- /dev/null
+++ b/content/posts/2022.05.29.my-new-backup-kopia.md
@@ -0,0 +1,65 @@
+---
+title: "My New Backup Setup with Kopia"
+date: 2022-05-29T16:37:03-04:00
+draft: false
+toc: false
+images:
+tags:
+  - untagged
+---
+
+I've recently switched to [Kopia](https://kopia.io/) after having some trouble with Duplicati. There
+was some sort of issue with mono (the runtime used by Duplicati) not reading the
+certificate files on my system, and failing to authenticate the Backblaze B2
+connections. Most workarounds I read online did not solve the issue, and the
+problem was still not fixed after months of waiting, so I decided it might be
+time to check out some other backup solutions.
+
+## What I want from backup software
+
+There are some features that I think are crucial for backup software.
+
+- Incremental backups. These save massive amounts of space, and they're
+  non-negotiable in my opinion. I'm not going to waste space storing a hundred
+  duplicates of each file; any sane backup solution must be able to deduplicate
+  the data in the backups.
+- Client-side encryption. While I have some level of trust for the services I'm
+  backing up my data on, I don't trust them to not read through my data. Google has implemented a [copyrighted material scanner](https://torrentfreak.com/google-drive-uses-hash-matching-detect-pirated-content/), and [that scanner has gone haywire](https://www.bleepingcomputer.com/news/security/google-drive-flags-nearly-empty-files-for-copyright-infringement/). While I have nothing illegal in my backups, I'd rather keep my data out of these services' hands.
+- Compression is also important to me. A lot of the data on my computer that I
+  want to back up is stuff like code files, configuration files, game saves and
+  such. A lot of these files are not compressed well or at all, so compressing
+  the backed up data can be a major win in terms of space savings. With the
+  right algorithms, modern processors can decompress data faster than disks can
+  read or write it, so this usually comes at effectively no cost too. Of
+  course this may be less important for you if what you are trying to back up is
+  already compressed data like images, videos, and music files.
+- Being able to restore only some files or folders without doing a full restore.
+  Some services like Backblaze B2 charge you for data downloaded, so it's
+  important that if I'm only restoring a few files, I can do so without
+  downloading the entire archive.
+
+## Kopia
+
+Kopia checks all these boxes. Backups are incremental, and everything is encrypted
+client side. Compression is supported and is configurable, and you can mount
+your backups to restore only a subset of files or read them without restoring.
+
+Something small that is amazing though is that Kopia can read `.gitignore` files
+to filter those files out of your backups! As a developer this is amazing,
+because I already have gitignore files set up to ignore things like
+`node_modules` or project build directories, which I wouldn't care about backing
+up. Thanks to Kopia, these are immediately filtered out of my backups without
+any extra effort.
+
+## Are incremental backups and compression really worth it?
+
+Yes, absolutely!
+
+Right now I have Kopia set up to back up my home directory, which contains about
+9.8GB of data after excluding all the cache files, video games, and applying
+gitignores. I have 13 snapshots of this, which in total take up 4.9GB on disk.
+13 snapshots take less space than the entirety of my home directory!
+
+I have Kopia set up to use pgzip compression, which seems to be the best option
+in terms of compression speed versus storage size.
+