
# My local data storage setup

2022-03-10 00:00

Recently, I've needed a bit more storage. In the past I've relied on Google Drive, but if you need a lot of space Google Drive becomes prohibitively expensive. The largest option available, 2 TB, runs you $960 a year with B2 and a whopping $4000 a year with S3.

Luckily, the cost of storage per GB has been coming down steadily. Large hard drives are cheap to come by, and while these drives are not incredibly fast, they are much faster than my internet connection. Hard drives it is then!

While I could get a very large hard drive, it's generally a better idea to get multiple smaller hard drives. That's because these drives often offer a better $/GB rate, but also because it allows us to mitigate the risk of data loss. So after a bit of searching, I found these "Seagate Barracuda Compute 4TB" drives. You can find them on Amazon or BestBuy.

=> https://www.amazon.com/gp/product/B07D9C7SQH/ Amazon
=> https://www.bestbuy.com/site/seagate-barracuda-4tb-internal-sata-hard-drive-for-desktops/6387158.p?skuId=6387158 BestBuy

These hard drives are available for $420 in total, plus a bit more for SATA cables. Looking at Backblaze Hard Drive Stats, I think it's fair to assume these drives will last at least 5 years. Dividing the cost by the expected lifetime, that gets me $84 per year, far below what cloud storage costs! It's of course not as reliable, and it requires maintenance on my end, but the difference in price is just too large to ignore.

=> https://www.backblaze.com/blog/backblaze-drive-stats-for-2021/ Backblaze Hard Drive Stats
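
The arithmetic behind that yearly figure, as a tiny shell sketch. The `cost_per_year` helper is a made-up name for illustration; the $420 and the 5 years are the numbers from above, and the 5 years is an assumption rather than a guarantee.

```sh
#!/bin/sh
# Hypothetical helper: spread a one-time cost (whole dollars) over a lifetime in years.
cost_per_year() {
  echo $(( $1 / $2 ))
}

cost_per_year 420 5   # prints 84
```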

## Setup

I decided to set this all up inside my desktop computer. I have a large case so fitting all the hard drives in is not a big problem, and my motherboard does support 6 SATA drives (in addition to the NVMe that I'm booting off of). I also run Linux on my desktop computer, so I've got all the required software available.

For the software side of things, I decided to go with `mdadm` and `ext4`. There are also other options available like ZFS (not included in the Linux kernel) or btrfs (raid-5 and raid-6 are known to be unreliable), but this was the setup I found the most comfortable and easy to understand. How it works is that `mdadm` combines the disks and presents them as a single block device, then `ext4` formats and uses that block device the same way it would any regular drive.
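
As a rough sketch of that layering, the helper below only prints the two commands involved; it does not run them. The `plan_array` name, the RAID level, `/dev/md0` and the member disk names are all placeholders for illustration, so follow the wiki for the real procedure.

```sh
#!/bin/sh
# Hypothetical helper: print (not run) the commands that build an array and put ext4 on it.
# usage: plan_array <md device> <raid level> <member disk>...
plan_array() {
  md=$1
  level=$2
  shift 2
  # mdadm combines the member disks into one block device...
  echo "mdadm --create --verbose $md --level=$level --raid-devices=$# $*"
  # ...and ext4 is then created on it like on any regular drive.
  echo "mkfs.ext4 $md"
}

# Placeholder level and device names, for illustration only.
plan_array /dev/md0 5 /dev/sdb /dev/sdc /dev/sdd
```

Printing the commands first makes it easy to double check the device names before running anything destructive.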

### Steps

I was originally planning to write the steps I followed here, but in truth I just followed whatever the ArchLinux wiki told me. So I'll just recommend you follow that as well.

=> https://wiki.archlinux.org/title/RAID#Installation ArchLinux wiki

The only thing I'll warn you about is that the wiki doesn't clearly note just how long this process takes. It took almost a week for the array to build, and until the build is complete the array runs at reduced performance. Be patient, and just give it some time to finish. As a reminder, you can always check the build status with `cat /proc/mdstat`.
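
If you only care about the progress number, a small filter can pull it out. This is a minimal sketch that assumes the progress line has the usual `recovery = 8.5%` shape; `build_progress` is a made-up helper name.

```sh
#!/bin/sh
# Hypothetical helper: pull the progress figure out of mdstat-style text on stdin.
# Prints nothing (and exits non-zero) when no build, check or reshape is running.
build_progress() {
  grep -Eo '(resync|recovery|reshape|check) *= *[0-9.]+%'
}

# On a real machine: build_progress < /proc/mdstat
```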

## Preventative maintenance

Hard drives have a tendency to fail, and because RAID arrays are resilient, the failures can go unnoticed. You **need** to regularly check that the array is okay. Unfortunately, while there are quite a few resources online on how to set up RAID, very few of them actually talk about how to set up scrubs (full scans to look for errors) and error monitoring.

For my setup, I decided to set up systemd to check and report issues. For this, I first set up 2 timers: 1 that checks if there are any reported errors on the RAID array, and another that scrubs the RAID array. A systemd timer has 2 parts, a service file and a timer file, so here are all the files.

- `array-scrub.service`
  ```toml
  [Unit]
  Description=Scrub the disk array
  After=multi-user.target
  OnFailure=report-failure-email@array-scrub.service

  [Service]
  Type=oneshot
  User=root
  ExecStart=bash -c '/usr/bin/echo check > /sys/block/md127/md/sync_action'

  [Install]
  WantedBy=multi-user.target
```
- `array-scrub.timer`
  ```toml
  [Unit]
  Description=Periodically scrub the array.

  [Timer]
  OnCalendar=Sat *-*-* 05:00:00

  [Install]
  WantedBy=timers.target
```

The timer above handles the scrub operation: it tells RAID to scan the drives for errors. In my experience the scan can take up to a couple of days to complete, so I run it once a week.
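
To see what a scrub is doing, or what it found, you can read the md sysfs files next to `sync_action`. The sketch below assumes the array is `md127` as in the service above and that the kernel exposes `mismatch_cnt` in the same directory; `scrub_status` is a made-up helper name.

```sh
#!/bin/sh
# Hypothetical helper: summarize scrub state from an md sysfs directory.
# usage: scrub_status [dir]   (dir defaults to /sys/block/md127/md)
scrub_status() {
  dir=${1:-/sys/block/md127/md}
  # sync_action reads "idle" when no check or rebuild is running
  action=$(cat "$dir/sync_action") || return 2
  # mismatch_cnt holds the mismatch count reported by the last check
  mismatches=$(cat "$dir/mismatch_cnt") || return 2
  echo "action=$action mismatches=$mismatches"
  # succeed only when nothing is running and nothing was flagged
  [ "$action" = "idle" ] && [ "$mismatches" -eq 0 ]
}
```

Taking the directory as an argument keeps the helper easy to try against a scratch directory before pointing it at the real array.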

- `array-report.service`
  ```toml
  [Unit]
  Description=Check raid array errors that were found during a scrub or normal operation and report them.
  After=multi-user.target
  OnFailure=report-failure-email@array-report.service

  [Service]
  Type=oneshot
  # --test makes the exit status reflect the array's health, so a degraded array fails this service
  ExecStart=/usr/bin/mdadm -D --test /dev/md127

  [Install]
  WantedBy=multi-user.target
```
- `array-report.timer`
  ```toml
  [Unit]
  Description=Periodically report any issues in the array.

  [Timer]
  OnCalendar=daily

  [Install]
  WantedBy=timers.target
```

This second timer checks the RAID array status to see if any errors were found. It runs much more often (once a day), because the check is instant, and also because RAID can find errors during regular operation even when you are not actively running a scan.
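
For a quick manual look at the same thing, `/proc/mdstat` shows one letter per member disk: `U` for a working disk and `_` for a failed or missing one. The sketch below is a made-up `array_degraded` helper that looks for that underscore; it is an illustration, not a replacement for the service above.

```sh
#!/bin/sh
# Hypothetical helper: exit 0 if mdstat-style text on stdin shows a failed or missing member.
# Healthy arrays show e.g. [UUU]; a degraded one shows an underscore, e.g. [UU_].
array_degraded() {
  grep -Eq '\[[U_]*_[U_]*\]'
}

# On a real machine: array_degraded < /proc/mdstat && echo "array is degraded"
```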

### Error reporting

Another important thing here is this line in the service file:

```toml
OnFailure=report-failure-email@array-report.service
```

The automated checks are of no use if I don't know when something actually fails. Luckily, systemd can run a service when another service fails, so I'm using this to report failures to myself. Here's what the service file looks like:

- `report-failure-email@.service`
  ```toml
  [Unit]
  Description=status email for %i to user

  [Service]
  Type=oneshot
  ExecStart=/usr/local/bin/systemd-email address %i
  User=root
```
- `/usr/local/bin/systemd-email`
  ```sh
  #!/bin/sh

  /usr/bin/sendmail -t <<ERRMAIL
  To: homelab@bgenc.net
  From: systemd <root@$HOSTNAME>
  Subject: Failure on $2
  Content-Transfer-Encoding: 8bit
  Content-Type: text/plain; charset=UTF-8

  $(systemctl status --lines 100 --no-pager "$2")
  ERRMAIL
```

The service just runs this shell script, which is a thin wrapper around sendmail. The `%i` in the service is the part after `@` when you use the service. You can see that the `OnFailure` hook puts `array-report` after the `@`, which gets passed to the email service, which in turn passes it on to the mail script.
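
To make the `%i` substitution concrete, here is a small sketch that extracts the same piece of the unit name by hand. `instance_of` is a made-up helper for illustration; systemd does this for you.

```sh
#!/bin/sh
# Hypothetical helper: show which part of a template unit name ends up in %i.
instance_of() {
  unit=$1
  rest=${unit#*@}          # drop everything up to and including the first "@"
  echo "${rest%.service}"  # drop the unit type suffix
}

instance_of report-failure-email@array-report.service   # prints array-report
```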

To send emails, you also need to set up `sendmail`. I decided to install msmtp, and set it up to use my Gmail account to send me an email.

=> https://wiki.archlinux.org/title/Msmtp msmtp

To test that the error reporting works, edit `array-report.service` and change the `ExecStart` line to `ExecStart=false`. Then reload the unit files with `systemctl daemon-reload` and run the report service with `systemctl start array-report.service`. You should now get an email letting you know that the `array-report` service failed, with the last 100 lines of the service status included.