Four HDDs

RPi NAS: Part 5 Storage Drives

21 January 2024

In the previous post we selected a USB Hub for our NAS. But before we can save the data that arrives via Ethernet we have to pick a drive on which to store the Samba shares. In this post we’ll investigate which transfer speeds we can expect for HDDs and SSDs. Then we’ll pick one of them to hold our Samba shares.

This post is part of a series about building a Network-Attached Storage (NAS) with redundancy using a Raspberry Pi (RPi). See here for a list of all posts in this series.


In the second part of this series we looked at the boot medium for our Raspberry Pi. The version of Raspberry Pi OS we used (2023-10-10) requires at least 5.8GB disk space. That’s small enough for us to consider an SD card or a USB stick as boot medium. But for storing data we want devices whose capacity is on the order of terabytes. So the only real options available to us for storage are HDDs and SSDs.

Everything we said in the second part about access times and transfer speeds still holds here. In that post we found that under ideal circumstances SSDs should outperform HDDs. However, if you’re like me then you’ll want to reuse some HDDs that you already have at home (on which you’ve been storing your data). Also, HDDs are still quite a bit cheaper than SSDs so they are an attractive option for bulk-storing data.

Remember from our discussion about Greyhole that data is first copied onto a Samba share and then moved and duplicated to a set of storage drives. I’m going to assume that we’re reusing the existing HDDs as storage drives. The location of the Samba shares, however, is not yet determined. We could put them on an existing HDD or on a (potentially newly bought) SSD.

The objective of this post is to find out if putting our Samba shares on an SSD can give a significant speed boost over putting them on a HDD. This is not guaranteed because the overall performance will be limited by

  • the network speed (~117.5MB/s),
  • the hardware of the Raspberry Pi, and
  • the software we use for data redundancy (Greyhole).

Note that both the 3.5″ HDD and the SSD can achieve much higher speeds than 117.5MB/s, so neither drive will be the limiting factor for read and write speeds.
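As a quick sanity check on that network figure: gigabit Ethernet carries 125,000,000 bytes per second raw, and with Ethernet/IP/TCP headers at the standard 1500-byte MTU roughly 94% of that is usable payload (the 94% factor is my assumption here, not a number from the earlier posts).

```shell
# Raw gigabit Ethernet minus protocol overhead at a 1500-byte MTU.
awk 'BEGIN { printf "%.1f MB/s\n", 125e6 * 0.94 / 1e6 }'   # prints 117.5 MB/s
```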

Speed Tests

Initially I wrote a section about the kinds of file access patterns that occur at different stages of the NAS. However, it ended up being quite a dry read and it’s probably not necessary. So let’s get right to the speed tests.

Types of Speed Tests

There are two major ways in which drives are typically benchmarked:

  • sequential and
  • random

reads and writes. Sequential tests read/write data from/to a contiguous part of the disk (i.e. to locations adjacent to each other1). Random tests read/write from/to random locations on the disk. The latter is typically slower because it has the additional overhead of addressing the new locations (and for HDDs the physical head has to move2).

Which of those tests (if any) is more appropriate depends on the intended usage of the device. For example, a web server will receive lots of queries from different users. To satisfy them, databases are accessed at many different locations, so fast random read access is important for that application.

There is a very good and powerful program to test random and sequential reads and writes called fio. You can use it to simulate all sorts of disk access patterns. However, we know pretty well what kind of access patterns we can expect. So instead of synthetic test programs, let’s use some actual data and see how HDDs and SSDs perform.
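For reference, a minimal fio job file covering both test types might look like this (a sketch; the directory, sizes and block sizes are placeholder values you'd adjust to your setup):

```ini
; Two jobs: one sequential read test, one random read test.
[global]
directory=/mnt/test
size=1g
direct=1

[seq-read]
rw=read
bs=1m

[rand-read]
rw=randread
bs=4k
```

You would run this with `fio jobs.fio` and read the bandwidth numbers from its report.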

I’ll test two scenarios:

  1. Copying an archived directory (tar.gz) of project data of size 24.1GB.
  2. Copying a directory of photographs of size 17.5GB. The directory contains multiple sub-folders (one for each day of a trip). Each sub-folder contains many RAW (~32MB per file), JPG (~16MB) and sidecar (~2KB) files.

The scripts I used for running the speed tests can be found here.
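The gist of those scripts is simple: time the copy and divide the byte count by the elapsed time. A minimal sketch (the helper name and the example paths are mine, not taken from the linked scripts):

```shell
#!/bin/sh
# Convert a byte count and a duration into MB/s (decimal megabytes).
mb_per_s() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.2f", b / s / 1e6 }'
}

# Example usage with hypothetical paths:
#   start=$(date +%s); cp -r /data/photos /mnt/nas/share/; end=$(date +%s)
#   mb_per_s "$(du -sb /data/photos | cut -f1)" "$((end - start))"

mb_per_s 24100000000 205   # the 24.1GB archive copied in 205 s -> prints 117.56
```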

Test Environment

To get reliable results we need a consistent environment. For these tests I connected my computer and the Raspberry Pi via Ethernet to a 4G internet router. I did most of the tests on one day. During that day somebody in my household was using the WiFi to stream movies. Initially I didn’t think that would be an issue but on the next day, when I wanted to run some additional tests, I got much faster speeds. I suspect that this 4G router is not very powerful and that Ethernet speeds drop when WiFi is used at the same time.

So unless stated otherwise, the results below refer to speeds obtained over Ethernet while somebody was streaming movies over WiFi. Thus, these speeds may be underestimating the speeds we can achieve from a better router or a less-used network.

Copying Data onto the NAS

Before we run an actual test we have to decide which command to use for copying data. Two of the most common commands (at least on Linux systems) are cp and rsync. Both can copy data, but rsync has many more features and its primary use is copying data to remote locations3. One of its features is that, after copying a file, it uses checksums to verify that the file was copied correctly. So there is a good chance that errors during the copy process are found and corrected. I ran the tests below with both commands. The cp command was marginally faster, so I'm reporting those numbers here4.

Let’s start with copying data to Samba shares for which Greyhole is disabled. So although Greyhole has been started, the Greyhole daemon is idle throughout the process. This should give us an idea of the overhead caused by Samba (and by somebody streaming a video over WiFi).

The table below shows copy speeds for when the Samba shares are on the SSD or on the 3.5″ HDD respectively. Two 2.5″ HDDs are connected via a powered USB hub as storage drives (but they are not used in this test because Greyhole is not active for the Samba share). All HDDs are already up and running for all tests in this section, so no spin-up time is included.

Drive       Scenario 1       Scenario 2
            Speed in MB/s    Speed in MB/s
SSD         112.76           96.17
3.5″ HDD    112.96           92.95
Copying data to a Samba share for which Greyhole is disabled. All data was copied six times, the first one was discarded. The reported numbers are the average of the remaining five runs.

I also tested putting Samba shares on my third 2.5″ HDD but it repeatedly locked up during the copy process5.

Before we look at the numbers I want to point out that these are empirical measurements over just five samples. I don't report deviation statistics here (to keep the tables readable), but the standard deviation is usually a very small number of MB/s. So these numbers give a good overall impression, but small differences between them could be due to chance.
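The averaging itself is trivial; for completeness, this is the shape of it (the run values below are made-up examples, not my measurements):

```shell
# Six example runs in MB/s; drop the first (warm-up) run and average the rest.
runs="112.1 112.9 112.5 113.0 112.7 112.6"
echo "$runs" | tr ' ' '\n' | tail -n +2 \
    | awk '{ s += $1 } END { printf "%.2f", s / NR }'   # prints 112.74
```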

Now to the numbers. These write speeds are a bit lower than the network speed of 117.5MB/s. In scenario 1 we copy one large file to the NAS, so there's no overhead from opening and closing multiple small files (as in the second scenario). The gap to the network speed is roughly 5MB/s. When I ran the test again on the second day I got a speed of 116.1MB/s. So ~1.5MB/s is a rough guess of the overhead added by Samba and ~3.5MB/s is a rough guess of the overhead added by somebody streaming over WiFi. The difference between scenarios 1 and 2 (~15-20MB/s) is a rough estimate of the overhead of opening/copying/closing many smaller files instead of one large file. The SSD and the 3.5″ HDD perform similarly, with the SSD being a bit faster for multiple smaller files.

Next we’re going to enable Greyhole for the Samba share but we’ll use a lock file to prevent Greyhole from duplicating data until the copy process has finished. So Greyhole will run and it will notice that there are new files but it won’t copy them. See this post for the commands to create and remove the lock file.

Drive       Scenario 1       Scenario 2
            Speed in MB/s    Speed in MB/s
SSD         n/a              91.51
3.5″ HDD    n/a              89.16
Copying data to a Samba share for which Greyhole is enabled. A lock file is used so Greyhole cannot start duplicating data during the process. All data was copied six times, the first one was discarded. The reported numbers are the average of the remaining five runs.

We’ve lost roughly 4MB/s to Greyhole’s non-copy-related overheads6. Initially I only ran this test for scenario 2. On the second day I wanted to run it for scenario 1 but again I got much higher speeds (roughly 116MB/s). I didn’t put the numbers into the table because it wouldn’t be a fair comparison to the numbers of scenario 2.

Back to the first day. Let’s now run the full system. Greyhole is enabled and it duplicates data while we still copy to the NAS.

Drive       Scenario 1       Scenario 2
            Speed in MB/s    Speed in MB/s
SSD         108.27           88.49
3.5″ HDD    104.96           84.45
Copying data to a Samba share with Greyhole enabled. All data is duplicated onto two 2.5″ HDDs. All data was copied six times, the first one was discarded. The reported numbers are the average of the remaining five runs.

We’ve lost another 2-5MB/s to the copy process. Overall it seems that the SSD performs just a bit better than the 3.5″ HDD, but remember that none of these tests includes the spin-up time of HDDs.

Reading Data from the NAS

We now do the same test as before but instead of writing the archive or photos to the NAS we read from it. When reading from the NAS we first have to look up the symbolic link on the Samba share and then the actual file on the storage drive. This redirection adds overheads (see this post for details).
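The indirection is easy to picture with a small stand-in (the temp directories below play the roles of the storage drive and the Samba share; real Greyhole paths will differ):

```shell
# Stand-ins: "store" plays the storage drive, "share" plays the Samba share.
store=$(mktemp -d)
share=$(mktemp -d)

# Greyhole keeps the real file on the storage drive...
echo "raw photo data" > "$store/img.raw"
# ...and leaves a symbolic link on the share pointing at it.
ln -s "$store/img.raw" "$share/img.raw"

# A read from the share first resolves the link, then opens the real file:
readlink "$share/img.raw"   # the extra lookup step
cat "$share/img.raw"        # prints: raw photo data
```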

In the previous section on writing to the NAS I used the copy command cp because it performed a bit better than rsync. In this section on reading from the NAS, rsync was much faster. So the numbers below refer to copying data with rsync.

When I ran the test on the first day I got the following speeds:

Drive       Scenario 1       Scenario 2
            Speed in MB/s    Speed in MB/s
SSD         92.23            63.31
3.5″ HDD    95.46            70.89
Reading data from the Samba share. All data was read six times, the first one was discarded. The reported numbers are the average of the remaining five runs.

I was a bit surprised that the 3.5″ HDD performed a little better than the SSD. So on the next day I ran the test again and got these results:

Drive       Scenario 1       Scenario 2
            Speed in MB/s    Speed in MB/s
SSD         100.57           73.17
3.5″ HDD    100.37           76.50
Reading data from the Samba share. All data was read six times, the first one was discarded. The reported numbers are the average of the remaining five runs.

This time the router was used exclusively by the NAS. Again, there is no major difference in the read speed between the SSD and the 3.5″ HDD (although it does look like the HDD may have the edge, again this doesn’t include spin-up times).

The Impact of Access Time

So far our tests have excluded the time it takes HDDs to spin up. And that’s because the spin-up time is a one-off cost. It will always be the same, irrespective of how many bytes we copy. So it doesn’t make sense to include that time in an average write or read test (the results would change depending on the number of bytes written or read).

However, the spin-up time does matter. In the second part of this series we roughly estimated it for my HDDs. Both the 3.5″ and the 2.5″ HDDs took about 3.5 seconds to spin up.

So what does that mean for the results we got above? The tables in the previous sections only presented average speeds. In absolute terms, each of the tests took somewhere between 3.5-5 minutes to complete, so spin-up adds a few seconds (1.2-1.7 percent) to that.

I think I’d barely notice having to wait a few seconds longer when copying a large file or folder. However, I did notice the increased access time when browsing through the Samba shares. With the SSD the folder structure loads almost instantly; with the 3.5″ HDD there’s a small but noticeable lag (which can be annoying).

Alternative Disk Layouts

Lots of other disk layouts are possible. If you want to further improve write and especially read speeds, you could consider getting one SSD for every HDD you use. Greyhole can stick shares to one or more specific disks via the options sticky_files and stick_into. The symbolic links created on the Samba share always point to the first drive you defined. So you could stick each share 1) to an SSD and 2) to a HDD7. That way all read operations from the NAS access only SSDs (but there’s still the speed penalty of using a USB hub with the RPi 4b). The HDDs are then only used for data redundancy. The downside is that using more SSDs makes the NAS more expensive.
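Sketched as a greyhole.conf fragment, that layout could look like this (the share name and mount points are examples, and the exact syntax should be checked against the Greyhole documentation for your version):

```ini
# Pin the Photos share to specific drives. Symbolic links on the Samba
# share will point at the first stick_into entry, i.e. the SSD.
sticky_files = Photos/
	stick_into = /mnt/ssd1/gh
	stick_into = /mnt/hdd1/gh
```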

Decision Time

I think both the SSD and the 3.5″ HDD produce very good speeds. Writing data to the NAS at 80-100MB/s and reading at 60-100MB/s is certainly fast enough for my purposes.

It looks like using the SSD for storing Samba shares results in slightly faster write speeds and very similar read speeds. Browsing through the Samba shares is a lot smoother with the SSD. For me the differences are small enough so that I have no preference for either of the two.

For my specific setup I picked the 3.5″ HDD because I want to use my SSD for another purpose (backing up photos when I’m on trips).

So that’s all for our speed tests. In the next post we’ll look at which power supply to use for our Raspberry Pi.


Footnotes:

  1. Some file systems, like Ext4, leave (a little) space between files to allow them to grow. So data is not actually saved in perfectly adjacent locations. Also note that if the files are adjacent to each other then they are adjacent in the logical abstraction of the disk. The physical locations may not be adjacent (that’s up to the disk controller). ↩︎
  2. see https://en.wikipedia.org/wiki/Hard_disk_drive_performance_characteristics for more details ↩︎
  3. For more details see the manual with man rsync or have a look at this website. ↩︎
  4. I decided to use the fastest command in each case; I’m not sure if that’s the right decision. ↩︎
  5. I don’t know the exact cause of the lock-up of the 2.5″ HDD. No problems occurred with the 3.5″ HDD but that drive is self-powered and built for different requirements (the 2.5″ drive probably had to make some compromises to allow for its improved portability). ↩︎
  6. These 4MB/s seem a bit high. But to investigate further I’d have to spend more time testing, and by the end of the second day I didn’t really want to spend much more time on it. I hope you understand. ↩︎
  7. There are more options. For each share you can select to which set of drives data shall be saved using drive_selection_groups. So you could, for example, set a drive selection group for a share which contains one SSD and all HDDs. Then stick the share into the SSD so all symbolic links point to the SSD for fast reading and writing (a free HDD is then used for additional file copies). ↩︎
