add "my local data storage setup"

commit 36abeb13c5 (parent ddf6f1bc1e)

@@ -22,7 +22,7 @@
     (sift :to-resource #{#"^extra/(.*)"})
     (sift :to-resource #{#"^CNAME"})
     (garden :styles-var 'site.styles/base :output-to "main.css" :pretty-print (if optimize? false true))
-    (cljs :optimizations (if optimize? :advanced :none) :source-map (if optimize? false true))
+    ;(cljs :optimizations (if optimize? :advanced :none) :source-map (if optimize? false true))
     (perun/ttr) ;; Time to read
     (perun/word-count)
     (perun/render :renderer 'site.core/page)

content/raid.md (new file, 188 lines)
@@ -0,0 +1,188 @@
---
title: My local data storage setup
date: 2022-03-10
---

Recently, I've needed a bit more storage. In the past I've relied on Google
Drive, but if you need a lot of space Google Drive becomes prohibitively
expensive. The largest option available, 2 TB, runs you $100 a year at the time
of writing. While Google Drive comes with a lot of features, it also comes with
a lot of privacy concerns, and I need more than 2 TB anyway. Another option
would be Backblaze B2 or AWS S3, but the cost is even higher. Just to set a
point of comparison, 16 TB of storage would cost $960 a year with B2 and a
whopping $4000 a year with S3.

Luckily, the cost of storage per GB has been coming down steadily. Large hard
drives are cheap to come by, and while these drives are not incredibly fast,
they are much faster than my internet connection. Hard drives it is then!

While I could get one very large hard drive, it's generally a better idea to get
multiple smaller ones. That's partly because smaller drives often offer a better
$/GB rate, but also because having multiple drives lets me mitigate the risk of
data loss. So after a bit of searching, I found these "Seagate Barracuda Compute
4TB" drives. You can find them on
[Amazon](https://www.amazon.com/gp/product/B07D9C7SQH/) or
[BestBuy](https://www.bestbuy.com/site/seagate-barracuda-4tb-internal-sata-hard-drive-for-desktops/6387158.p?skuId=6387158).

These hard drives are available for $70 each at the time I'm writing this, and I
bought 6 of them. That gets me to around $420, plus a bit more for SATA cables.
Looking at [Backblaze Hard Drive Stats](https://www.backblaze.com/blog/backblaze-drive-stats-for-2021/),
I think it's fair to assume these drives will last at least 5 years. Dividing
the cost by the expected lifetime works out to $84 per year, far below what the
cloud storage costs! It's of course not as reliable, and it requires maintenance
on my end, but the difference in price is just too large to ignore.

## Setup

I decided to set this all up inside my desktop computer. I have a large case, so
fitting all the hard drives in is not a big problem, and my motherboard does
support 6 SATA drives (in addition to the NVMe drive that I'm booting off of). I
also run Linux on my desktop computer, so I've got all the required software
available.

For the software side of things, I decided to go with `mdadm` and `ext4`. There
are other options available, like ZFS (not included in the Linux kernel) or
btrfs (where raid-5 and raid-6 are known to be unreliable), but this was the
setup I found the most comfortable and the easiest to understand. The way it
works is that `mdadm` combines the disks and presents them as a single block
device, and then `ext4` formats and uses that block device the same way it would
any regular drive.

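To make that concrete, here's a rough sketch of those two layers. The device
names, array name, and RAID level below are assumptions for illustration, not
necessarily what I used:

```sh
# Combine the disks into one array (placeholder devices and RAID level).
mdadm --create /dev/md/storage --level=6 --raid-devices=6 /dev/sd[b-g]

# The array shows up as a single block device; format and mount it like
# any regular drive.
mkfs.ext4 /dev/md/storage
mkdir -p /mnt/storage
mount /dev/md/storage /mnt/storage
```
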
### Steps

I was originally planning to write the steps I followed here, but in truth I
just followed whatever the [ArchLinux wiki](https://wiki.archlinux.org/title/RAID#Installation)
told me. So I'll just recommend you follow that as well.

The only thing I'll warn you about is that the wiki doesn't clearly note just
how long this process takes. It took almost a week for my array to build, and
until the build is complete the array runs at reduced performance. Be patient
and give it some time to finish. As a reminder, you can always check the build
status with `cat /proc/mdstat`.

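For reference, this is roughly how I'd keep an eye on the build, and how the
wiki has you make the array assemble automatically on boot once you're happy
with it (the config path below is the Arch default, so adjust for your distro):

```sh
# Re-print the array status every minute; the array stays usable (if slow)
# while the initial build runs.
watch -n 60 cat /proc/mdstat

# Record the array so it is assembled automatically on boot.
mdadm --detail --scan >> /etc/mdadm.conf
```
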
## Preventative maintenance

Hard drives have a tendency to fail, and because RAID arrays are resilient, the
failures can go unnoticed. You **need** to regularly check that the array is
okay. Unfortunately, while there are quite a few resources online on how to set
up RAID, very few of them actually talk about how to set up scrubs (full scans
to look for errors) and error monitoring.

For my setup, I decided to use systemd to check for and report issues. For this,
I set up two timers: one that checks if there are any reported errors on the
RAID array, and another that scrubs the RAID array. A systemd timer comes in two
parts, a service file and a timer file, so here are all the files.

- `array-scrub.service`
```toml
[Unit]
Description=Scrub the disk array
After=multi-user.target
OnFailure=report-failure-email@array-scrub.service

[Service]
Type=oneshot
User=root
ExecStart=bash -c '/usr/bin/echo check > /sys/block/md127/md/sync_action'

[Install]
WantedBy=multi-user.target
```
- `array-scrub.timer`
```toml
[Unit]
Description=Periodically scrub the array.

[Timer]
OnCalendar=Sat *-*-* 05:00:00

[Install]
WantedBy=timers.target
```

The timer above drives the scrub operation: it tells RAID to scan the drives for
errors. In my experience the scan can take up to a couple of days to complete,
so I run it once a week.

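If you want to peek at how a scrub is going, or what it found, the md sysfs
entries are handy. A quick sketch (`md127` is just the name my array happened to
get, so substitute your own):

```sh
# Shows "check" while a scrub is running, "idle" otherwise; /proc/mdstat
# also shows the progress percentage.
cat /sys/block/md127/md/sync_action
cat /proc/mdstat

# After a scrub finishes, a non-zero mismatch count deserves a closer look.
cat /sys/block/md127/md/mismatch_cnt
```
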
- `array-report.service`
```toml
[Unit]
Description=Check raid array errors that were found during a scrub or normal operation and report them.
After=multi-user.target
OnFailure=report-failure-email@array-report.service

[Service]
Type=oneshot
ExecStart=/usr/bin/mdadm -D /dev/md127

[Install]
WantedBy=multi-user.target
```
- `array-report.timer`
```toml
[Unit]
Description=Periodically report any issues in the array.

[Timer]
OnCalendar=daily

[Install]
WantedBy=timers.target
```

This second timer checks the RAID array status to see if any errors were found.
It runs much more often (once a day), both because the check is instant and
because RAID can find errors during regular operation, even when you are not
actively running a scan.

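Neither timer does anything until it is enabled. Assuming the unit files live in
`/etc/systemd/system`, activating them looks something like this:

```sh
# Pick up the new unit files, then enable and start both timers.
systemctl daemon-reload
systemctl enable --now array-scrub.timer array-report.timer

# Sanity check: both timers should be listed with a next-run time.
systemctl list-timers 'array-*'
```
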
### Error reporting

Another important thing here is this line in the service file:
```toml
OnFailure=report-failure-email@array-report.service
```

The automated checks are of no use if I don't know when something actually
fails. Luckily, systemd can run a service when another service fails, so I'm
using this to report failures to myself. Here's what the service file looks
like:

- `report-failure-email@.service`
```toml
[Unit]
Description=status email for %i to user

[Service]
Type=oneshot
ExecStart=/usr/local/bin/systemd-email address %i
User=root
```
- `/usr/local/bin/systemd-email`
```sh
#!/bin/sh

/usr/bin/sendmail -t <<ERRMAIL
To: homelab@bgenc.net
From: systemd <root@$HOSTNAME>
Subject: Failure on $2
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=UTF-8

$(systemctl status --lines 100 --no-pager "$2")
ERRMAIL
```

The service just runs this shell script, which is a thin wrapper around
sendmail. The `%i` in the service file is the part after the `@` when the
service is used: the `OnFailure` hook puts `array-report` after the `@`, systemd
passes that to the email service as `%i`, and the email service passes it on to
the mail script as its second argument.

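One way to see that `%i` plumbing end to end is to start the template instance
by hand:

```sh
# Starts report-failure-email@.service with %i set to "array-report",
# which in turn runs: /usr/local/bin/systemd-email address array-report
systemctl start report-failure-email@array-report.service
```
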
To send emails, you also need to set up `sendmail`. I decided to install
[msmtp](https://wiki.archlinux.org/title/Msmtp) and set it up to use my Gmail
account to send me an email.

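I won't repeat the msmtp setup here, but the system-wide config ends up looking
roughly like the sketch below. The addresses, password file, and log path are
placeholders; on Arch the companion `msmtp-mta` package is what provides the
`sendmail` command the script above calls.

```sh
# Sketch of /etc/msmtprc for sending through Gmail (placeholder values).
cat > /etc/msmtprc <<'EOF'
defaults
auth           on
tls            on
tls_trust_file /etc/ssl/certs/ca-certificates.crt
logfile        /var/log/msmtp.log

account        gmail
host           smtp.gmail.com
port           587
from           example@gmail.com
user           example@gmail.com
passwordeval   "cat /etc/msmtp-gmail-password"

account default : gmail
EOF
chmod 600 /etc/msmtprc
```
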
To test that the error reporting works, edit `array-report.service` and change
the `ExecStart` line to `ExecStart=false`. Then run the report service with
`systemctl start array-report.service`. You should get an email letting you know
that the `array-report` service failed, with the last 100 lines of the service
status attached.

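Instead of editing the unit file in place, a drop-in override does the same
thing and is easier to undo; roughly:

```sh
# In the editor that opens, add:
#   [Service]
#   ExecStart=
#   ExecStart=/usr/bin/false
systemctl edit array-report.service

# Trigger the failure; the OnFailure= hook should now send the email.
systemctl start array-report.service

# Put things back to normal afterwards.
systemctl revert array-report.service
```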