RPi NAS: Extras – Replacing a Drive, Drive Selection and Space Balancing

19 January 2024

Sometimes hard drives fail, but luckily we’ve set up our NAS so that at least two copies of each important file exist, so we shouldn’t have lost any data. To make sure it stays that way, this post looks at how to replace failed drives.

This post is part of a series about building a Network-Attached Storage (NAS) with redundancy using a Raspberry Pi (RPi). See here for a list of all posts in this series.


Replacing Broken Hard Drives

Let’s assume the drive at /nas_mounts/drive3 has failed. The bad news is that any data that belonged to a Samba share that only saves one copy of each file will be lost. The good news is that we set up our NAS so that important data is saved at least twice, so unless we’re really unlucky[1] all important data should still be there.

Let’s also assume that we don’t have a replacement drive available right away. Ideally

  • our remaining drives have enough capacity to compensate for the failed drive and
  • there are enough drives with spare capacity so that the required number of duplicates can be saved[2].

We’re in a risky situation if our remaining drives don’t have enough free space to compensate for the failed drive. In that case any additional drive failure could lead to a loss of data. So

  1. get a replacement drive ASAP and
  2. consider using a temporary medium, like a sufficiently large USB stick (or many), in the meantime[3].
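Whether the remaining drives have enough room in total can be estimated with a small helper like the one below. This is only a rough sketch: it sums the free space that df reports and compares it to a number you supply, so it ignores Greyhole’s min_free setting and the requirement that the copies of one file end up on different drives. The function names and the mount points in the usage line are assumptions for illustration.

```shell
# Print the free space (in KiB) of the file system that contains the given path.
free_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Sum the free space of all given mount points and compare it to NEEDED_KIB.
# Usage: enough_room NEEDED_KIB MOUNT_POINT...
# Returns 0 if the combined free space is at least NEEDED_KIB, 1 otherwise.
enough_room() {
    needed=$1
    shift
    total=0
    for mount_point in "$@"; do
        total=$((total + $(free_kib "$mount_point")))
    done
    echo "free: ${total} KiB, needed: ${needed} KiB"
    [ "$total" -ge "$needed" ]
}
```

For example, if roughly 500GB were stored on the failed drive, `enough_room 500000000 /nas_mounts/drive1 /nas_mounts/drive2` tells you whether the two remaining (hypothetical) mount points could hold the new duplicates.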

Note that you don’t have to use new, empty drives. Any drive with some free space can be added to Greyhole as described below in section Adding the Replacement Drive. Obviously, don’t format the drive if there’s already data on it, and consider choosing a different mount point if it’s only a temporary drive.

Removing the Broken Drive

Now let’s assume that there is sufficient space on our remaining drives (maybe only after adding temporary drives). First we have to let Greyhole know that the old drive is no longer available. We can do that by removing the drive from Greyhole with the command

sudo greyhole --remove=/nas_mounts/drive3/gh

Greyhole will ask you if the drive is still available. In this case it isn’t available so we’ll have to answer with No.

Then we run a file system check to create new duplicates of the files that were previously saved on drive3 (this is not done automatically by the previous command).

sudo greyhole --fsck

The new duplicates will be spread out among the remaining drives. Our data should now be safe even if another drive fails.

Adding the Replacement Drive

Once our new hard drive has arrived we can add it to Greyhole as we did during the initial setup (have a look at post 8 for a demonstration). To recapitulate:

  1. Connect the new drive and format it if necessary.
  2. Get the UUID of the new drive with the help of
ls -l /dev/disk/by-uuid
  3. Open /etc/fstab and replace the UUID of the failed drive with the one of the new drive (the mount point stays the same; it doesn’t have to, but it’s convenient that way).
  4. Save and close fstab. Make sure the new fstab contains no errors with
sudo systemctl daemon-reload
sudo findmnt --verify
  5. Mount the new drive with sudo mount -a and check if it was mounted correctly with lsblk.
  6. Next we need to create the folder /nas_mounts/drive3/gh. If the drive is newly formatted then the permissions will not yet be correct. So let’s create the new gh directory and update permissions with
sudo mkdir /nas_mounts/drive3/gh
sudo chown -R nas_user:users /nas_mounts/drive3
  7. Now we can add the drive to Greyhole, for example using the Web Interface (see post 8) or via Greyhole’s configuration file. For the latter option open /etc/greyhole.conf and add the line below. Then close the file and restart Greyhole with sudo systemctl restart greyhole.
storage_pool_drive = /nas_mounts/drive3/gh, min_free: 10gb
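If you’d rather not edit the UUID by hand, the fstab edit from the list above could be scripted roughly as follows. This is a minimal sketch, assuming that the fstab entries start with UUID=... at the beginning of the line; the function name swap_uuid is made up for this example. It keeps a backup of the file with the suffix .bak and refuses to change anything unless the old UUID appears exactly once.

```shell
# Replace one UUID with another in an fstab-style file, keeping a .bak backup.
# Usage: swap_uuid FILE OLD_UUID NEW_UUID
swap_uuid() {
    file=$1
    old=$2
    new=$3
    # Count the entries that start with the old UUID (grep exits non-zero on 0 matches).
    count=$(grep -c "^UUID=${old}[[:space:]]" "$file" || true)
    if [ "$count" -ne 1 ]; then
        echo "expected exactly one entry for UUID=${old}, found ${count}" >&2
        return 1
    fi
    # Swap the UUID and leave the rest of the line (mount point, options) untouched.
    sed -i.bak "s/^UUID=${old}\([[:space:]]\)/UUID=${new}\1/" "$file"
}
```

A cautious way to use it is to run it on a copy first, for example `cp /etc/fstab /tmp/fstab.new` followed by `swap_uuid /tmp/fstab.new OLD_UUID NEW_UUID`, then compare the copy with the original before copying it back with sudo and running the verification commands from the list.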

Greyhole won’t automatically move files from the existing drives to the new drive. You can tell Greyhole to rebalance the drives, but that may not work as expected; see section Space Balancing for more information.

Replacing Working Hard Drives

Now let’s assume that drive3 has not yet failed but that our monthly S.M.A.R.T. check reported it as being close to failing. So we immediately bought a new hard drive which we now have available.

The procedure to replace the old drive is almost identical to the one in the previous section. The only difference is that when we run

sudo greyhole --remove=/nas_mounts/drive3/gh

and Greyhole asks us if the drive is still available, we can answer with Yes. Any files that are only available on drive3 (i.e. files of shares that save data only once) will now be moved to the other drives. So we won’t lose any data.

When the command has finished we can unmount and physically remove the drive from the RPi. If some of our drives are of the same make then it may be difficult to spot which drive we want to remove (although failing drives are often easy to spot due to the noise they make). I put little stickers on my HDDs to make it easier to identify them. For example, the drive mounted under /nas_mounts/drive3 has a 3 written on its sticker.

So that’s all for the main part of this post. In the remaining two sections we’ll look more closely at a few additional aspects of how Greyhole works.

Drive Selection Algorithms

Let’s take a brief detour to drive selection algorithms. When new files get added to a Samba share that’s managed by Greyhole, Greyhole has to decide to which drives the new files should be copied. By default it chooses the drives with the largest amount of free space. An alternative is to choose the drives to which Greyhole has so far saved the least amount of data[4]. You can select which algorithm Greyhole should use by opening Greyhole’s configuration file (vim /etc/greyhole.conf) and setting the option drive_selection_algorithm. You can also do it on the web interface under Greyhole Config – Drive Selection.
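For reference, the option looks roughly like this in /etc/greyhole.conf. The two value names are the ones used in Greyhole’s sample configuration file; please double-check them against the comments in your own greyhole.conf, since they may differ between versions. As with other changes to the configuration file, Greyhole has to be restarted afterwards.

```
# Default: prefer the drives with the most free space.
drive_selection_algorithm = most_available_space

# Alternative: prefer the drives on which Greyhole has stored the least data.
# drive_selection_algorithm = least_used_space
```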

If all your drives are roughly of the same size then both algorithms will fill up your drives evenly. The difference only becomes apparent when some drives are a lot larger than others.

For example, consider a setup with two 4TiB disks and two 2TiB disks in which each file is saved twice. Under the alternative algorithm, as you add data to the NAS all four disks will be filled up evenly until there are (roughly) 2TiB on each of the four disks. If a drive fails now then 2TiB will have to be copied to a replacement drive (or to the remaining existing drives), regardless of which drive fails. Compare that to the default algorithm where a much larger amount of data has to be copied when a 4TiB disk fails than when a 2TiB disk fails (which can be problematic[5]).

On the other hand, consider a setup with one 4TiB disk and two 2TiB disks. Under the alternative policy all three disks will fill up evenly until there are (roughly) 2TiB on each disk. Once that point is reached only the 4TiB drive will have any space left. Thus, any files that are subsequently added to the NAS cannot be saved on two drives but only on the one 4TiB drive. So there won’t be any data redundancy for new files until you add new drives[6]. Under the default algorithm we would be able to add new files (with redundancy) until all drives are full[7].
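The three-disk example above can be reproduced with a toy simulation. To be clear, this is not Greyhole’s actual code, just a simplified model of the two policies as described in this section: every file has a size of one unit, each file is placed on the requested number of different drives, and ties are broken in favour of the drive listed first. The function name simulate and its output format are made up for this sketch.

```shell
# Usage: simulate ALGORITHM COPIES CAPACITY...
# ALGORITHM is most_available_space or least_used_space.
# Adds one-unit files until fewer than COPIES drives have room left, then
# prints the number of files stored followed by the used space of each drive.
simulate() {
    algo=$1
    copies=$2
    shift 2
    awk -v algo="$algo" -v copies="$copies" -v caps="$*" '
    BEGIN {
        n = split(caps, cap, " ")
        for (i = 1; i <= n; i++) used[i] = 0
        files = 0
        while (1) {
            for (i = 1; i <= n; i++) taken[i] = 0
            placed = 0
            # Pick one drive per copy, never the same drive twice for one file.
            for (c = 1; c <= copies; c++) {
                best = 0
                for (i = 1; i <= n; i++) {
                    if (taken[i] || used[i] >= cap[i]) continue
                    if (best == 0) { best = i; continue }
                    if (algo == "most_available_space") {
                        if (cap[i] - used[i] > cap[best] - used[best]) best = i
                    } else {
                        if (used[i] < used[best]) best = i
                    }
                }
                if (best == 0) break
                taken[best] = 1
                placed++
            }
            # Stop as soon as a file can no longer be stored with full redundancy.
            if (placed < copies) break
            for (i = 1; i <= n; i++) if (taken[i]) used[i]++
            files++
        }
        out = files
        for (i = 1; i <= n; i++) out = out " " used[i]
        print out
    }'
}
```

With capacities given in GiB, `simulate least_used_space 2 4096 2048 2048` prints `3072 2048 2048 2048` (redundancy runs out after 3TiB of unique data, with 2TiB left unused on the large drive), whereas `simulate most_available_space 2 4096 2048 2048` prints `4096 4096 2048 2048` (all drives are filled completely).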

Personally, I’ve stuck to the default policy because I don’t see a huge advantage from the alternative policy.

Space Balancing

So after this little detour let’s come back to the main topic of this post. A drive that’s been newly added to the NAS will typically be empty. So the new drive will probably have a lot of free space while the existing drives don’t. With the following command we can tell Greyhole to move files across drives so that the available space becomes more evenly balanced.

sudo greyhole --balance

Let’s have a closer look at how this command works (I’m going into details here because at first I was a bit surprised when nothing happened after running the command on my test setup).

First the target free space is calculated by summing the free space of all drives, subtracting the minimum free space (which defaults to 10GiB) once per drive, and dividing the result by the number of drives. That’s just the average usable free space across all drives. So for example, let’s assume we have the following four drives:

  • Drive0, total capacity 916GiB, free space 849GiB
  • Drive1, total capacity 56.1GiB, free space 33.2GiB
  • Drive2, total capacity 112GiB, free space 107GiB
  • Drive3, total capacity 28GiB, free space 16.6GiB

Since drive0 is much larger than the others, one copy of every file that was added to the NAS was saved on drive0 and a second copy was saved on one of the other drives. The target free space is calculated as

(849 + 33.2 + 107 + 16.6 - 4 * 10) / 4 = 241.45GiB.

Any drive with more free space than the target free space can receive files (so there will be less free space afterwards). Any drive with less free space than the target free space can have files moved out (so there will be more free space afterwards). The intended effect is that free space becomes more even across all drives. That’s exactly what the default Drive Selection Algorithm (see previous section) aims to achieve.
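The calculation and the resulting classification can be sketched in a few lines of shell. This only mirrors the description above and is not taken from Greyhole’s source code; the function names are made up, and the drives are simply numbered in the order in which their free space is passed in.

```shell
# Usage: target_free_space MIN_FREE FREE_SPACE...
# Prints (sum of free space - number of drives * MIN_FREE) / number of drives.
target_free_space() {
    min_free=$1
    shift
    printf '%s\n' "$@" | awk -v min_free="$min_free" '
        { total += $1; n++ }
        END { printf "%.2f\n", (total - n * min_free) / n }'
}

# Usage: classify_drives MIN_FREE FREE_SPACE...
# Prints for each drive whether it lies above or below the target free space.
classify_drives() {
    target=$(target_free_space "$@")
    shift
    i=0
    for free in "$@"; do
        awk -v i="$i" -v free="$free" -v target="$target" 'BEGIN {
            verdict = (free + 0 > target + 0) ? "can receive files" : "can give up files"
            print "drive" i, verdict
        }'
        i=$((i + 1))
    done
}
```

Running `target_free_space 10 849 33.2 107 16.6` prints 241.45, and `classify_drives 10 849 33.2 107 16.6` reports that only drive0 can receive files while drives 1, 2 and 3 can give up files.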

In the specific example we’re using here one drive is significantly larger than the others[8]. In this case drive0 can receive files (849 > 241.45) but drives 1, 2 and 3 cannot receive files (33.2 < 241.45, …). The only way to bring the drives closer to the target free space is by moving files from drives 1, 2 or 3 to drive0. However, in this setup each file is already saved once on drive0 so no file will be moved at all (this is perfectly in line with the default Drive Selection Algorithm).

If you’re using the alternative Drive Selection Algorithm then you probably want to use a rebalancing strategy that results in files being evenly distributed across drives (rather than free space being evenly distributed across drives). At the time of writing such a rebalancing strategy is not implemented (see here), but if you really need it then you can raise an issue on Greyhole’s GitHub page[9].


Footnotes:

  1. We’ll also lose any data that was saved twice but whose second copy suffered from data corruption. See this post for a way to minimize (but not eliminate) that risk. For really important data there are recovery tools or professional recovery services …
  2. Specifically, the remaining drives should have enough free space to hold new (to-be-created) duplicates of the files that were on drive3. Also, for each file on drive3 there should be at least one drive with spare capacity that doesn’t yet contain the file.
  3. You can use the commands from section Adding the Replacement Drive to add temporary drives, but create a new mount point for the temporary drives instead of reusing the existing /nas_mounts/drive3.
  4. So in the default algorithm we focus on balancing free space whereas in the alternative algorithm we focus on balancing used space (space used by Greyhole).
  5. Having to copy more data takes longer and, depending on your hardware (e.g. for HDDs), may cause additional drives to fail due to increased wear. This can be risky. For example, consider the situation where a 4TiB drive fails. We now have to copy all data from the other 4TiB drive to the two 2TiB drives. If the second 4TiB drive fails during the copy process then we may lose all data.
  6. You could then reshuffle data from the 2TiB disks to the 4TiB disk to make some space. E.g. if a file is saved on both 2TiB disks (but not on the 4TiB disk) then you could move one copy to the 4TiB disk. Based on my tests you’d have to do that manually, though. Greyhole doesn’t automatically do it for you.
  7. In this specific scenario 6TiB have been written to the drives in total. Since each file is saved twice, that corresponds to 3TiB of unique data. Under the default algorithm 3TiB would have gone onto the 4TiB drive and 1.5TiB onto each of the 2TiB drives. So there’s another 1TiB of space left on the 4TiB disk and another 0.5TiB of space available on each of the 2TiB drives.
  8. That’s also the case in my actual setup although not as extreme.
  9. If you really want to move files then you can also do it manually with sudo greyhole --mv ... (see Greyhole’s man pages for details).
