RPi NAS: Extras – Blocking Greyhole while Copying to the NAS

10 January 2024

On the Raspberry Pi 4B the bandwidth of the USB bus is limited to 4 Gb/s for all USB devices taken together. If you’re copying a lot of data onto the NAS then Greyhole will start moving it while the copy process is still active. This can cause some congestion on the USB bus, especially if Greyhole creates multiple duplicates of the files on your share. In this post we’re going to look at two ways to prevent Greyhole from running while the copy process is active.

This post is part of a series about building a Network-Attached Storage (NAS) with redundancy using a Raspberry Pi (RPi). See here for a list of all posts in this series.


The upside of temporarily blocking Greyhole is that the copy process will generally be a bit faster. At least that’s true if we copy multiple files. If we only copy one large file then there will be no impact because, regardless of whether we block it or not, Greyhole can only start moving that file once the copy process has finished. I can think of two potential downsides of this approach, but both of them feel quite contrived:

  • If the drive that holds our Samba share has less free space than the amount of data we want to copy to the NAS then we’ll get a copy error once the drive’s capacity is reached. This may be avoided to some extent if Greyhole immediately starts moving data from the share to the storage drives (although if the drive holding our Samba share is not large enough then we should probably reconsider the Samba share’s placement and the disk layout we’ve chosen).
  • If the drive holding the Samba shares fails during the copy process then we’ll lose the data on the NAS because no duplicates have been created yet (although we probably still have the data on our computer).

Quite contrived, right? If you can think of real disadvantages please let me know in the comments below.

Using a Lock File

One way of blocking Greyhole from running while we copy data to the Samba share is to first create a lock file. In bash we can do so with

echo "I'm a Lock File" > /path/to/Samba/share/lockfile; exec 3>/path/to/Samba/share/lockfile; flock -x 3

This command creates a file called lockfile on the Samba share. The name is arbitrary, so you can call the file anything you like. The same goes for the text we write into it (here I’m a Lock File): it can be anything. Just don’t use touch /path/to/Samba/share/lockfile because Greyhole will simply ignore the file if it’s empty.

The second part of the command opens a file descriptor (a handle) for the lock file. Descriptors 0, 1 and 2 refer to standard input, standard output and standard error; descriptors 3-9 can be used freely. The descriptor is just a way for the shell to reference the open file. Finally we use flock to actually lock the file. In this form, without a command to run, flock expects the number of an open file descriptor rather than a file name (that’s why we opened the descriptor first).

We run all commands in one line (i.e. in quick succession) to minimize the chance that Greyhole starts processing the lock file after we’ve created it but before we’ve locked it. (Once the file has been processed, locking it has no effect.)

At this point there’s a file on the share but it’s locked. Greyhole works on new files in the order in which they were added to the share. Since lockfile is locked Greyhole won’t work on it (or on any file added to the share after lockfile) until we release the lock. So now we can copy data onto the share without Greyhole processing it.

Note, Greyhole will still run and realize that new files are being added. It just won’t start duplicating and moving them.

When we’re done with the copy process we have to release the lock with

exec 3>&-; rm /path/to/Samba/share/lockfile
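To avoid forgetting the release step, the lock and release commands can be wrapped around the copy itself. The following is only a sketch: the function name copy_with_lock and its two arguments are made up for illustration, and it assumes that a plain cp is how the data gets copied. It uses the same commands as above, run in a subshell so that descriptor 3 does not stay open in the calling shell.

```shell
#!/usr/bin/env bash

# copy_with_lock SRC SHARE (hypothetical helper, not part of Greyhole)
# Holds a lock file on SHARE while SRC is copied into it and removes the
# lock file afterwards, even if the copy fails.
copy_with_lock() {
    local src=$1 share=$2
    local lockfile="$share/lockfile"
    (
        # Release the lock and delete the lock file when the subshell exits.
        trap 'exec 3>&-; rm -f "$lockfile"' EXIT
        # Same three steps as in the one-liner: create, open, lock.
        echo "I'm a Lock File" > "$lockfile"; exec 3>"$lockfile"; flock -x 3
        # The actual copy; its exit status becomes the function's status.
        cp -r -- "$src" "$share"/
    )
}
```

A call such as copy_with_lock ~/photos /path/to/Samba/share (the source folder is just an example) would then copy the folder and leave no lock file behind.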

Behaviour

The lock file is opt-in: for each copy we can decide whether we want to use it. Put differently, we have to remember to use it. There is also some (potentially) unwanted behaviour if we lock a share repeatedly:

Consider a situation where we create a lock file, copy some data and delete the lock file. Shortly afterwards we want to add some more data. So we create another lock file, copy the additional data and then remove the lock file. After we created the second lock file Greyhole won’t process any of the additional data (as intended). It will, however, process all of the original data we copied onto the NAS before. So in that situation the lock file will not block Greyhole from working on the original data while we copy the additional data. This may not be desired.

Using a Script

Greyhole allows for scripts to be run when certain events happen¹. This is achieved with hooks. Before we look at those hooks, let’s create a script that will check if data is currently being copied to a Samba share. If that’s the case then the script will sleep and thereby block Greyhole. The script will only return when no more data is being copied.

To find out if data is being copied to a Samba share we can use the command sudo smbstatus -L. The output of the command contains a list of files that are currently locked². Those files are normally locked because they are being written to. If no files are locked then smbstatus outputs No locked files. For some reason that output goes to the error stream instead of the standard output. Thus, to parse it with grep we redirect the error stream to the standard output (with 2>&1). The script sleeps as long as there are locks (i.e. as long as data is being written to the NAS). It only returns once all locks are gone.

#!/usr/bin/bash
# Block until Samba reports no locked files (i.e. nothing is being written)
while :
do
    # smbstatus puts 'No locked files' into the error stream
    if sudo smbstatus -L 2>&1 | grep -qF 'No locked files'; then
        break
    fi
    sleep 0.5s
done
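Should a client keep a file open for a long time, the loop above would block Greyhole for just as long. One possible variation, sketched here under that assumption, caps the waiting time. The function names no_samba_locks and wait_for_unlock and the optional check-command argument are made up for this example; the smbstatus test itself is the same as in the script.

```shell
#!/usr/bin/env bash

# no_samba_locks (hypothetical helper)
# Succeeds when smbstatus reports that no files are locked.
no_samba_locks() {
    sudo smbstatus -L 2>&1 | grep -qF 'No locked files'
}

# wait_for_unlock MAX_SECONDS [CHECK_COMMAND...] (hypothetical helper)
# Polls CHECK_COMMAND (default: no_samba_locks) every half second.
# Returns 0 once the check succeeds, 1 if MAX_SECONDS pass first.
wait_for_unlock() {
    local max_seconds=$1 deadline
    shift
    # Fall back to the Samba check when no command was given.
    [ "$#" -gt 0 ] || set -- no_samba_locks
    deadline=$(( $(date +%s) + max_seconds ))
    until "$@"; do
        # Give up once the deadline has been reached.
        [ "$(date +%s)" -lt "$deadline" ] || return 1
        sleep 0.5s
    done
    return 0
}
```

A hook script based on this sketch would end with a call such as wait_for_unlock 3600, where the one-hour limit is an arbitrary example value.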

Now let’s have a look at Greyhole’s hooks. The available hooks are

create, edit, rename, delete, mkdir, rmdir,

warning, error, critical, fsck,

idle and not_idle.

If you’re interested in more information about those hooks, have a look at the manual of greyhole.conf. I think for our purpose hooks create and not_idle are useful.

If we choose not_idle then Greyhole will run the script just before it starts processing files. Let’s assume we’re doing a large file transfer. Once the first file has been copied to the NAS, Greyhole’s daemon will become active and it will call the script. The process that copies files is still active and so there are file locks. The script will block until all files have been copied. Then Greyhole will start duplicating and moving files.

If we choose create then the script will be run after every file that has been copied from the Samba shares folder to the storage drives. This adds some overhead to Greyhole’s operations, but it will interrupt Greyhole any time new data gets copied to the NAS.

To add a hook, add the following line to /etc/greyhole.conf and restart Greyhole’s daemon (sudo systemctl restart greyhole). Of course, the script must have the execute bit set (chmod +x /path/to/shell/script.sh).

hook[not_idle] = /path/to/shell/script.sh

Behaviour

Unlike the lock file, the script will always run so we don’t have to remember to use it. Let’s consider the same situation we described at the end of the previous section (repeated locking and copying of data to the NAS) but instead of the lock file we use the script. In that case the behaviour will depend on when the script is run. If it’s run at event not_idle then the behaviour will be the same as for the lock file. If it’s run after event create then Greyhole will stop processing existing data when new data gets added.

A potential downside of the latter option is that you’ll have to make sure to give Greyhole enough time to process new data before the drive holding the Samba share is full (Greyhole won’t move out any data while you write to the NAS so the drive holding Samba shares will just keep growing while you add data). Of course you could expand the shell script so that it only works when there is sufficient space available.
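Such a free-space check could look roughly like the sketch below. The function names has_free_space and block_while_copying and the threshold argument are made up for illustration; the free space is read from df, and the smbstatus test is the one from the script above. As soon as the drive holding the share has less than the given amount of space left, the loop ends and Greyhole is allowed to move data out, even if files are still being written.

```shell
#!/usr/bin/env bash

# has_free_space DIR MIN_KIB (hypothetical helper)
# Succeeds if the filesystem holding DIR has at least MIN_KIB KiB available.
has_free_space() {
    local dir=$1 min_kib=$2 avail_kib
    # df -Pk prints a header line and one data line in KiB units;
    # column 4 of the data line is the available space.
    avail_kib=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kib" -ge "$min_kib" ]
}

# block_while_copying DIR MIN_KIB (hypothetical helper)
# Sleeps while Samba reports locked files, but stops blocking as soon as
# the free space on DIR drops below MIN_KIB.
block_while_copying() {
    local dir=$1 min_kib=$2
    while has_free_space "$dir" "$min_kib"; do
        # smbstatus puts 'No locked files' into the error stream
        if sudo smbstatus -L 2>&1 | grep -qF 'No locked files'; then
            break
        fi
        sleep 0.5s
    done
}
```

A hook script built on this sketch would end with a call such as block_while_copying /path/to/Samba/share 10485760, which keeps blocking only while at least 10 GiB are free (the threshold is an arbitrary example value).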

That’s the beauty of Linux. You can adapt anything to your liking.


Footnotes:

  1. See section hook in man greyhole.conf for details.
  2. The option -L means that only information about locks is printed, nothing else.
