RPi NAS: Extras – Greyhole

I thought it might be a good idea to briefly explain how Greyhole works before we continue with the main series of posts. This post is intended as an overview of Greyhole; it doesn’t contain any detailed instructions (those will be covered in separate posts).


We’ll use Greyhole to provide data redundancy for our NAS. In part 8 of this series we’ll do the actual installation and detailed setup, but until then it’s useful to have a rough understanding of how Greyhole works.

Setup

Let’s first look at a few important steps when setting up Greyhole:

  • First we have to create Samba shares. Those are network folders which our users can mount and use for storing and retrieving data. Users will only ever interact with the Samba shares.
  • Then we add some drives (hard disks or solid-state drives) to Greyhole. This is where our data will actually be saved. Users won’t access those drives directly (only indirectly, via the Samba shares). We’ll call these drives storage drives.
  • Next we choose the Samba shares for which Greyhole should be active. Greyhole can only provide data redundancy on shares for which it is enabled. If Greyhole is not enabled for a Samba share then the data of that share will not be moved to the storage drives (but it will remain on the share and there won’t be any redundancy).
  • Finally, we have to decide how many copies of the files on a Samba share should be saved. Each copy will be saved on a different drive. So if, for example, we’ve added three drives to Greyhole, then each file on a Samba share can be saved at most three times. We can set the number of copies for each share separately (so there may be two copies saved of all files on share A and three copies of all files on share B).
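
To make the last point concrete, here is a minimal Python sketch of the rule that each copy lives on a different drive, so a share can never have more copies than there are storage drives. This is only a model of the idea, not Greyhole’s configuration format; the drive paths, share names and the `effective_copies` helper are made up for illustration.

```python
def effective_copies(requested, drive_count):
    """Cap the requested number of copies at the number of storage drives.

    Each copy must be saved on a different drive, so we can never
    keep more copies than we have drives.
    """
    if requested < 1:
        raise ValueError("at least one copy is needed")
    return min(requested, drive_count)


# Hypothetical storage drives and per-share settings.
drives = ["/mnt/hdd1", "/mnt/hdd2", "/mnt/hdd3"]
requested = {"ShareA": 2, "ShareB": 3, "ShareC": 5}

plan = {share: effective_copies(n, len(drives)) for share, n in requested.items()}
print(plan)  # ShareC asks for 5 copies but only 3 drives exist
```

With three drives, share A keeps two copies, share B keeps three, and the hypothetical share C is capped at three.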

Saving and Retrieving Data

As mentioned in the introduction, if Greyhole is activated for a Samba share, then it monitors all transactions on that share. If a file is copied to the share then Greyhole will want to process it. Generally, Greyhole processes files sequentially. So the first file that’s copied onto the share is processed first, followed by the second, etc.

If a file gets locked before Greyhole has been able to process it (e.g. if a user opened it for writing immediately after creating the file) then Greyhole will wait until the file becomes available. So the locked file and all files that were added to the share after the locked file will only be processed after the lock has been released (e.g. after the file has been closed).
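
The effect of a locked file on the queue can be sketched in a few lines of Python. This is a simplified model of the behaviour described above, assuming a plain first-in-first-out queue; the file names and the `process_ready` function are made up, and Greyhole’s real implementation will differ.

```python
from collections import deque


def process_ready(queue, locked):
    """Process files in arrival order and stop at the first locked file.

    Everything behind the locked file stays in the queue until the
    lock has been released.
    """
    processed = []
    while queue and queue[0] not in locked:
        processed.append(queue.popleft())
    return processed


# Three files arrive on the share; the second one is still open for writing.
queue = deque(["a.txt", "b.txt", "c.txt"])
locked = {"b.txt"}

first_pass = process_ready(queue, locked)   # only a.txt can be processed
locked.discard("b.txt")                     # the user closes b.txt
second_pass = process_ready(queue, locked)  # now b.txt and c.txt follow

print(first_pass, second_pass)
```

Note that `c.txt` is not locked itself, but it still has to wait because it was added after `b.txt`.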

When there is no lock on a file then Greyhole can process it. That normally happens almost immediately (within a few seconds of the file becoming available). Depending on your configuration, two or more copies of each file will be saved[1]. Each of the copies is saved on a different storage drive. So when Greyhole processes a file it has to decide on which drives to save it. Currently you can choose from two different storage selection algorithms. The default algorithm saves the new file on the drives with the largest amount of available space (see this post for more details).
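
The idea behind the default selection (prefer the drives with the most available space) can be sketched like this. It is a rough model under the assumption that we simply rank drives by free space; the drive paths, the free-space numbers and the `pick_drives` function are hypothetical, and this is not Greyhole’s actual code.

```python
from typing import Dict, List


def pick_drives(free_space: Dict[str, int], copies: int) -> List[str]:
    """Return the `copies` drives with the most free space, largest first."""
    ranked = sorted(free_space, key=free_space.get, reverse=True)
    return ranked[:copies]


# Hypothetical free space per storage drive, in GB.
free_space = {"/mnt/hdd1": 500, "/mnt/hdd2": 1200, "/mnt/hdd3": 800}

chosen = pick_drives(free_space, copies=2)
print(chosen)  # ['/mnt/hdd2', '/mnt/hdd3']
```

Because every copy goes to a different drive, asking for more copies than there are drives simply returns all drives.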

After the drives have been selected, Greyhole copies the file to each of the chosen drives. Then it replaces the original file on the Samba share with a symbolic link to one of the new copies of the file.

When we read a file on the Samba share, we actually follow the symbolic link, which redirects us to the copy on one of the storage drives[2].
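
The copy-then-link step is easy to try out in a temporary directory. The following Python sketch assumes a system that supports symbolic links (e.g. Linux) and uses made-up folder names; it ignores details such as subfolders, metadata and error handling, so treat it as an illustration of the principle rather than of what Greyhole does internally.

```python
import os
import shutil
import tempfile


def store_with_symlink(share_file, drive_dirs):
    """Copy a file to every chosen drive, then replace it with a symlink."""
    name = os.path.basename(share_file)
    copies = []
    for drive in drive_dirs:
        target = os.path.join(drive, name)
        shutil.copy2(share_file, target)  # one copy per storage drive
        copies.append(target)
    os.remove(share_file)                 # the original leaves the share ...
    os.symlink(copies[0], share_file)     # ... and a link to one copy takes its place
    return copies


# Build a throwaway "share" and two "storage drives".
root = tempfile.mkdtemp()
share = os.path.join(root, "share")
drives = [os.path.join(root, "drive1"), os.path.join(root, "drive2")]
for folder in [share] + drives:
    os.makedirs(folder)

path = os.path.join(share, "notes.txt")
with open(path, "w") as f:
    f.write("hello")

copies = store_with_symlink(path, drives)

# Reading the file on the share follows the link to the first copy.
with open(path) as f:
    content = f.read()
print(os.path.islink(path), content)
```

The user still sees `notes.txt` on the share, but the data now lives on the storage drives.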

Trash

If we don’t need a file anymore we can delete it from the Samba share. By default Greyhole then moves the copies on the storage drives into a trash folder. That means that deleting a file doesn’t immediately free up space. For that we’ll have to empty the trash. If you don’t want to use the trash, you can deactivate it. We’ll look at Greyhole’s trash in more detail in this post.
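
Why deleting doesn’t free up space straight away becomes obvious in a small Python sketch: the copy is only moved into a trash folder on the same drive, and the space is reclaimed when that folder is emptied. The folder name `trash` and both helper functions are assumptions made for this example, not Greyhole’s real names.

```python
import os
import shutil
import tempfile


def move_to_trash(copy_path, drive_root, trash_name="trash"):
    """Move a file copy into the drive's trash folder instead of deleting it."""
    trash_dir = os.path.join(drive_root, trash_name)
    os.makedirs(trash_dir, exist_ok=True)
    dest = os.path.join(trash_dir, os.path.basename(copy_path))
    shutil.move(copy_path, dest)
    return dest


def empty_trash(drive_root, trash_name="trash"):
    """Delete the trash folder; only now is the space actually freed."""
    shutil.rmtree(os.path.join(drive_root, trash_name), ignore_errors=True)


# A throwaway "storage drive" holding one copy of a file.
drive = tempfile.mkdtemp()
copy_path = os.path.join(drive, "video.bin")
with open(copy_path, "wb") as f:
    f.write(b"x" * 1024)

trashed = move_to_trash(copy_path, drive)
size_in_trash = os.path.getsize(trashed)    # the data is still on the drive
exists_before = os.path.exists(trashed)

empty_trash(drive)
exists_after = os.path.exists(trashed)      # gone after emptying the trash
print(size_in_trash, exists_before, exists_after)
```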

General Data Access

Generally, all interactions with the NAS should be done via (mounted) Samba shares. Greyhole can only process changes if it knows about them, and it will only know about them if they happen through Samba. So we shouldn’t change any of the data on the storage drives directly.

We’re almost done. When a user saves data to the NAS, it is first stored on a Samba share and then moved to some of the storage drives. That means we have to be careful when choosing where our Samba shares are saved. They must be in a location with enough free space to hold all the data we want to copy to our NAS at once. That space is freed up again as Greyhole moves the data from the Samba shares to the storage drives. Greyhole’s wiki recommends placing the Samba shares on the largest storage drive.
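
If you want to check in advance whether a large transfer will fit, you can compare its size with the free space at the location that holds the Samba shares. Here is a minimal Python sketch using the standard library; the function name `has_room_for` and the example path are assumptions, so adjust them to your own setup.

```python
import shutil


def has_room_for(share_location, incoming_bytes):
    """Return True if the share location has enough free space for the transfer."""
    free = shutil.disk_usage(share_location).free
    return incoming_bytes <= free


# Example: will a 20 GB transfer fit where the shares are stored?
# "." stands in for the real share location here.
twenty_gb = 20 * 1024 ** 3
print(has_room_for(".", twenty_gb))
```

Keep in mind that this only looks at a single moment in time; the free space changes as Greyhole moves files off the share.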

Finally, to save data to the NAS we don’t write it directly to the disk location where the Samba shares are saved. Instead we have to mount the shares and write our data to the mounted shares (so Samba and Greyhole know about it and can process it).

I think that’s a good introduction to how Greyhole works. Now let’s continue with the main part of this series.


Footnotes:

  1. You could also save just one copy, but then there’s no real need for Greyhole.
  2. Note that you will only read the data from one of the drives. This is different from how some RAID setups work, for example, where the data can be read in parallel from multiple drives, which improves read speeds.
