{"id":414,"date":"2023-12-19T10:07:10","date_gmt":"2023-12-19T10:07:10","guid":{"rendered":"http:\/\/192.168.8.136\/wordpress\/?p=414"},"modified":"2024-01-25T14:23:20","modified_gmt":"2024-01-25T13:23:20","slug":"rpi-nas-extras-hdd-s-m-a-r-t","status":"publish","type":"post","link":"https:\/\/mpr-projects.com\/index.php\/2023\/12\/19\/rpi-nas-extras-hdd-s-m-a-r-t\/","title":{"rendered":"RPi NAS: Extras &#8211; HDD S.M.A.R.T."},"content":{"rendered":"\n<p>S.M.A.R.T. stands for Self-Monitoring, Analysis, and Reporting Technology. Practically that means that HDDs that support SMART can run self-tests to diagnose problems and report them to us. Although Greyhole can provide our NAS with redundancy, i.e. no data is lost if an HDD fails, it&#8217;s still good to know how healthy our HDDs are before we decide to use them and once they&#8217;re in operation. So in this post we&#8217;ll have a little look at how we can use SMART.<\/p>\n\n\n<p>This post is part of a series about building a Network-Attached Storage (NAS) with redundancy using a Raspberry Pi (RPi). See <a href=\"https:\/\/mpr-projects.com\/index.php\/2023\/11\/13\/building-a-raspberry-pi-nas-with-data-redundancy-part-1-overview\/#RPi_NAS_Post_List\" data-type=\"post\" data-id=\"8\">here<\/a> for a list of all posts in this series.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" style=\"margin-top:var(--wp--preset--spacing--30);margin-bottom:var(--wp--preset--spacing--30)\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">Basics<\/h2>\n\n\n\n<p>We&#8217;re going to use <a href=\"https:\/\/www.smartmontools.org\/\" target=\"_blank\" rel=\"noreferrer noopener\">smartmontools<\/a><sup data-fn=\"a83e52dc-3e27-4409-8b0c-9b93d2fc1117\" class=\"fn\"><a href=\"#a83e52dc-3e27-4409-8b0c-9b93d2fc1117\" id=\"a83e52dc-3e27-4409-8b0c-9b93d2fc1117-link\">1<\/a><\/sup> to access the SMART features of our HDDs. Let&#8217;s plug in a HDD and have a look what information we can get. 
Note, there&#8217;s no need to mount the drive once it&#8217;s connected to the computer (although it won&#8217;t hurt if you do). The drive I&#8217;ve just connected, which I salvaged from an old Laptop, has path <em>\/dev\/sdb<\/em>. We can get some basic information about it with<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo smartctl -i \/dev\/sdb<\/code><\/pre>\n\n\n\n<p>which should return an output as shown below:<\/p>\n\n\n\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;69e79629420ea&quot;}\" data-wp-interactive=\"core\/image\" data-wp-key=\"69e79629420ea\" class=\"wp-block-image size-large wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"617\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" data-wp-on-window--resize=\"callbacks.setButtonStyles\" src=\"https:\/\/mpr-projects.com\/wp-content\/uploads\/2023\/12\/SMART_1-edited-1024x617.png\" alt=\"\" class=\"wp-image-562\" srcset=\"https:\/\/mpr-projects.com\/wp-content\/uploads\/2023\/12\/SMART_1-edited-1024x617.png 1024w, https:\/\/mpr-projects.com\/wp-content\/uploads\/2023\/12\/SMART_1-edited-300x181.png 300w, https:\/\/mpr-projects.com\/wp-content\/uploads\/2023\/12\/SMART_1-edited-768x463.png 768w, https:\/\/mpr-projects.com\/wp-content\/uploads\/2023\/12\/SMART_1-edited.png 1131w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><button\n\t\t\tclass=\"lightbox-trigger\"\n\t\t\ttype=\"button\"\n\t\t\taria-haspopup=\"dialog\"\n\t\t\taria-label=\"Enlarge\"\n\t\t\tdata-wp-init=\"callbacks.initTriggerButton\"\n\t\t\tdata-wp-on--click=\"actions.showLightbox\"\n\t\t\tdata-wp-style--right=\"state.imageButtonRight\"\n\t\t\tdata-wp-style--top=\"state.imageButtonTop\"\n\t\t>\n\t\t\t<svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"12\" height=\"12\" fill=\"none\" viewBox=\"0 0 
12 12\">\n\t\t\t\t<path fill=\"#fff\" d=\"M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z\" \/>\n\t\t\t<\/svg>\n\t\t<\/button><\/figure>\n\n\n\n<p>We&#8217;re mainly looking at the last two lines, which tell us that SMART is available and enabled. If it&#8217;s available but not enabled then you can enable it with<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo smartctl --smart=on \/dev\/sdb<\/code><\/pre>\n\n\n\n<p>Let&#8217;s now check if a selftest has been run before. We can use the command below to list selftests.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo smartctl -l selftest \/dev\/sdb<\/code><\/pre>\n\n\n\n<p>I have never run a selftest on this drive so the output is<\/p>\n\n\n\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;69e7962943760&quot;}\" data-wp-interactive=\"core\/image\" data-wp-key=\"69e7962943760\" class=\"wp-block-image size-large wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"327\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" data-wp-on-window--resize=\"callbacks.setButtonStyles\" src=\"https:\/\/mpr-projects.com\/wp-content\/uploads\/2023\/12\/SMART_2-edited-1024x327.png\" alt=\"\" class=\"wp-image-563\" srcset=\"https:\/\/mpr-projects.com\/wp-content\/uploads\/2023\/12\/SMART_2-edited-1024x327.png 1024w, https:\/\/mpr-projects.com\/wp-content\/uploads\/2023\/12\/SMART_2-edited-300x96.png 300w, https:\/\/mpr-projects.com\/wp-content\/uploads\/2023\/12\/SMART_2-edited-768x245.png 768w, https:\/\/mpr-projects.com\/wp-content\/uploads\/2023\/12\/SMART_2-edited.png 1134w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" 
\/><button\n\t\t\tclass=\"lightbox-trigger\"\n\t\t\ttype=\"button\"\n\t\t\taria-haspopup=\"dialog\"\n\t\t\taria-label=\"Enlarge\"\n\t\t\tdata-wp-init=\"callbacks.initTriggerButton\"\n\t\t\tdata-wp-on--click=\"actions.showLightbox\"\n\t\t\tdata-wp-style--right=\"state.imageButtonRight\"\n\t\t\tdata-wp-style--top=\"state.imageButtonTop\"\n\t\t>\n\t\t\t<svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"12\" height=\"12\" fill=\"none\" viewBox=\"0 0 12 12\">\n\t\t\t\t<path fill=\"#fff\" d=\"M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z\" \/>\n\t\t\t<\/svg>\n\t\t<\/button><\/figure>\n\n\n\n<p>So let&#8217;s run a test to see how the drive is doing. There are a few options available. Let&#8217;s start with a short test. According to the man pages the short test &#8220;check[s] the electrical and mechanical performance as well as the read performance of the disk&#8221;.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo smartctl -t short \/dev\/sdb<\/code><\/pre>\n\n\n\n<p>This command starts the test and tells us to come back later (see the image below). The short selftest is fast (2 minutes for this drive) and it runs in the background. If you&#8217;ve mounted the drive then you can still use it during the self-test, although it may be slower than without the selftest. You can also run the selftest in the foreground with command <code>sudo smartctl -t short -C \/dev\/sdb<\/code>. 
In that case no partitions of the drive should be mounted.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"299\" src=\"https:\/\/mpr-projects.com\/wp-content\/uploads\/2023\/12\/SMART_3-1024x299.png\" alt=\"\" class=\"wp-image-554\" srcset=\"https:\/\/mpr-projects.com\/wp-content\/uploads\/2023\/12\/SMART_3-1024x299.png 1024w, https:\/\/mpr-projects.com\/wp-content\/uploads\/2023\/12\/SMART_3-300x88.png 300w, https:\/\/mpr-projects.com\/wp-content\/uploads\/2023\/12\/SMART_3-768x224.png 768w, https:\/\/mpr-projects.com\/wp-content\/uploads\/2023\/12\/SMART_3.png 1430w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>After about two minutes we can list the selftests again.<\/p>\n\n\n\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;69e7962944db6&quot;}\" data-wp-interactive=\"core\/image\" data-wp-key=\"69e7962944db6\" class=\"wp-block-image size-large wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"272\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" data-wp-on-window--resize=\"callbacks.setButtonStyles\" src=\"https:\/\/mpr-projects.com\/wp-content\/uploads\/2023\/12\/SMART_4-1024x272.png\" alt=\"\" class=\"wp-image-555\" srcset=\"https:\/\/mpr-projects.com\/wp-content\/uploads\/2023\/12\/SMART_4-1024x272.png 1024w, https:\/\/mpr-projects.com\/wp-content\/uploads\/2023\/12\/SMART_4-300x80.png 300w, https:\/\/mpr-projects.com\/wp-content\/uploads\/2023\/12\/SMART_4-768x204.png 768w, https:\/\/mpr-projects.com\/wp-content\/uploads\/2023\/12\/SMART_4.png 1438w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" 
\/><button\n\t\t\tclass=\"lightbox-trigger\"\n\t\t\ttype=\"button\"\n\t\t\taria-haspopup=\"dialog\"\n\t\t\taria-label=\"Enlarge\"\n\t\t\tdata-wp-init=\"callbacks.initTriggerButton\"\n\t\t\tdata-wp-on--click=\"actions.showLightbox\"\n\t\t\tdata-wp-style--right=\"state.imageButtonRight\"\n\t\t\tdata-wp-style--top=\"state.imageButtonTop\"\n\t\t>\n\t\t\t<svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"12\" height=\"12\" fill=\"none\" viewBox=\"0 0 12 12\">\n\t\t\t\t<path fill=\"#fff\" d=\"M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z\" \/>\n\t\t\t<\/svg>\n\t\t<\/button><\/figure>\n\n\n\n<p>According to the last line the test was completed without an error. That&#8217;s great! Let&#8217;s try to get a bit more information by running<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo smartctl -x \/dev\/sdb<\/code><\/pre>\n\n\n\n<p>The output of that command can be quite long. 
I&#8217;m going to focus on what I think is most important for us.<\/p>\n\n\n\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;69e796294537b&quot;}\" data-wp-interactive=\"core\/image\" data-wp-key=\"69e796294537b\" class=\"wp-block-image size-large wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"761\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" data-wp-on-window--resize=\"callbacks.setButtonStyles\" src=\"https:\/\/mpr-projects.com\/wp-content\/uploads\/2023\/12\/SMART_5-1024x761.png\" alt=\"\" class=\"wp-image-556\" srcset=\"https:\/\/mpr-projects.com\/wp-content\/uploads\/2023\/12\/SMART_5-1024x761.png 1024w, https:\/\/mpr-projects.com\/wp-content\/uploads\/2023\/12\/SMART_5-300x223.png 300w, https:\/\/mpr-projects.com\/wp-content\/uploads\/2023\/12\/SMART_5-768x571.png 768w, https:\/\/mpr-projects.com\/wp-content\/uploads\/2023\/12\/SMART_5.png 1192w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><button\n\t\t\tclass=\"lightbox-trigger\"\n\t\t\ttype=\"button\"\n\t\t\taria-haspopup=\"dialog\"\n\t\t\taria-label=\"Enlarge\"\n\t\t\tdata-wp-init=\"callbacks.initTriggerButton\"\n\t\t\tdata-wp-on--click=\"actions.showLightbox\"\n\t\t\tdata-wp-style--right=\"state.imageButtonRight\"\n\t\t\tdata-wp-style--top=\"state.imageButtonTop\"\n\t\t>\n\t\t\t<svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"12\" height=\"12\" fill=\"none\" viewBox=\"0 0 12 12\">\n\t\t\t\t<path fill=\"#fff\" d=\"M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z\" \/>\n\t\t\t<\/svg>\n\t\t<\/button><\/figure>\n\n\n\n<p>This is a table that describes certain attributes of the drive. 
The first point to note is that these attributes are not standardized. So there is no document that describes exactly what the attributes mean. Their exact meaning is set by each manufacturer and that information is typically not published. So the table is not as useful as it could be but there&#8217;s still quite a lot of information in it.<\/p>\n\n\n\n<p>The second column contains the name of the attribute. The last column contains its <em>RAW_VALUE<\/em>. That value typically has some physical meaning. For example, <em>RAW_VALUE<\/em> in the row with id 194 shows a value of 30, which means that the temperature of the drive was 30 degrees Celsius.<\/p>\n\n\n\n<p>The raw values then get converted to some encoded value <em>VALUE<\/em>. That conversion takes place inside the drive so how exactly that happens is up to the manufacturer. The values can be in the range 1 to 253. A higher value is usually better. Some manufacturers don&#8217;t use 253 as the highest value for all attributes but some other value, for example 100. <em>WORST<\/em> refers to the worst value that was recorded for an attribute<sup data-fn=\"6fb03318-83a8-4b9d-99e2-e31a35cda532\" class=\"fn\"><a href=\"#6fb03318-83a8-4b9d-99e2-e31a35cda532\" id=\"6fb03318-83a8-4b9d-99e2-e31a35cda532-link\">2<\/a><\/sup>. Finally, <em>THRESH<\/em> contains a threshold value which is set by the manufacturer. 
If <em>VALUE<\/em> falls below this threshold then there is an increased risk of drive failure.<\/p>\n\n\n\n<p>Look <a href=\"https:\/\/en.wikipedia.org\/wiki\/Self-Monitoring,_Analysis_and_Reporting_Technology#Known_ATA_S.M.A.R.T._attributes\" target=\"_blank\" rel=\"noreferrer noopener\">here<\/a> for a general description of these attributes (remember, their exact meaning varies by manufacturer). A few of them, like <em>Read Error Rate<\/em>, are quite important because they can indicate imminent disk failures.<\/p>\n\n\n\n<p>I&#8217;ve also run the commands above with another old HDD and I got the following result. You can see in the row with id 1 that the drive has failed.<\/p>\n\n\n\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;69e79629459f9&quot;}\" data-wp-interactive=\"core\/image\" data-wp-key=\"69e79629459f9\" class=\"wp-block-image size-large wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"849\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" data-wp-on-window--resize=\"callbacks.setButtonStyles\" src=\"https:\/\/mpr-projects.com\/wp-content\/uploads\/2023\/12\/SMART_6-1024x849.png\" alt=\"\" class=\"wp-image-567\" srcset=\"https:\/\/mpr-projects.com\/wp-content\/uploads\/2023\/12\/SMART_6-1024x849.png 1024w, https:\/\/mpr-projects.com\/wp-content\/uploads\/2023\/12\/SMART_6-300x249.png 300w, https:\/\/mpr-projects.com\/wp-content\/uploads\/2023\/12\/SMART_6-768x637.png 768w, https:\/\/mpr-projects.com\/wp-content\/uploads\/2023\/12\/SMART_6.png 1182w\" sizes=\"auto, 
(max-width: 1024px) 100vw, 1024px\" \/><button\n\t\t\tclass=\"lightbox-trigger\"\n\t\t\ttype=\"button\"\n\t\t\taria-haspopup=\"dialog\"\n\t\t\taria-label=\"Enlarge\"\n\t\t\tdata-wp-init=\"callbacks.initTriggerButton\"\n\t\t\tdata-wp-on--click=\"actions.showLightbox\"\n\t\t\tdata-wp-style--right=\"state.imageButtonRight\"\n\t\t\tdata-wp-style--top=\"state.imageButtonTop\"\n\t\t>\n\t\t\t<svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"12\" height=\"12\" fill=\"none\" viewBox=\"0 0 12 12\">\n\t\t\t\t<path fill=\"#fff\" d=\"M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z\" \/>\n\t\t\t<\/svg>\n\t\t<\/button><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">About Load Cycle Count<\/h3>\n\n\n\n<p>I&#8217;ve seen a few discussions in online forums about <em>load cycle count<\/em>. Remember that HDDs have a <em>head<\/em> which moves around to read and write data. When a disk is not in use, the head can be moved into a <em>parking position<\/em> where it&#8217;s not directly above the disk. This is done to protect the disk from damage<sup data-fn=\"b5d87284-e743-487d-b1ff-e4ffa494ac1a\" class=\"fn\"><a href=\"#b5d87284-e743-487d-b1ff-e4ffa494ac1a\" id=\"b5d87284-e743-487d-b1ff-e4ffa494ac1a-link\">3<\/a><\/sup>. <em>Load cycle count<\/em> counts how often the head has been parked. Repeatedly parking the head (many thousands of times) can cause some wear and tear. Disks are typically rated for a certain number of load cycles. For example, depending on the model, WD HDDs are typically rated for 350,000 to 600,000 cycles. That&#8217;s quite a large number.<\/p>\n\n\n\n<p>In the main part of this series I assumed that we access our NAS three times a day (which is roughly my access pattern). If the head gets parked after each access then we have three load cycles per day. 
That&#8217;s 1095 per year, so we&#8217;ll only reach 350k cycles after roughly 320 years. If we access the disk once every 10 minutes then we still have more than 6.5 years before we reach 350k load cycles. So under normal circumstances we shouldn&#8217;t have to worry about the load cycle count in our NAS.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Long Selftest<\/h2>\n\n\n\n<p>In the first section we ran a short selftest. Now let&#8217;s run a more extensive test. This test is similar to the short test but it goes over the drive a lot more thoroughly.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo smartctl -t long \/dev\/sdb<\/code><\/pre>\n\n\n\n<p>It can take many hours for this test to finish. So it may be a good idea to run it overnight. Note, depending on your setup, the test may get interrupted and <code>sudo smartctl -l selftest \/dev\/sdb<\/code> may report an error <em>Interrupted (host reset)<\/em>. In my case it worked fine on the Raspberry Pi but not on my main computer. We can prevent this error by running<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo watch -d--cumulative -n 15 smartctl -a \/dev\/sdb<\/code><\/pre>\n\n\n\n<p>right after we start the extended test. This command will run <code>sudo smartctl -a \/dev\/sdb<\/code> every 15 seconds and it will highlight any changes in the output (so we can see the progress of the test). We don&#8217;t have to use <em>smartctl<\/em> with option <em>-a<\/em>; we could use any command that queries some information from the drive. The important point is that we regularly query the drive so that it&#8217;s kept awake.<\/p>\n\n\n\n<p>Note, the command doesn&#8217;t stop when the selftest has finished. You&#8217;ll have to do that manually with Ctrl-c.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Scheduling Regular Checks<\/h2>\n\n\n\n<p>We probably don&#8217;t want to run the selftest manually for every drive in our NAS. So let&#8217;s automate those checks. 
I&#8217;m going to run a long self test once a month. We could use the systemd service called <em>smartd<\/em> that&#8217;s included in <em>smartmontools<\/em>. However, I don&#8217;t think it&#8217;s appropriate for our NAS so we&#8217;ll first go over using <em>cron<\/em> for scheduling checks and then briefly discuss why we&#8217;re not using <em>smartd<\/em>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Cron<\/h3>\n\n\n\n<p>Cron is a utility that allows us to schedule when certain programs should run. We&#8217;ll use it to periodically (monthly) run a script that<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>starts a self test,<\/li>\n\n\n\n<li>keeps the drive awake while the script runs and<\/li>\n\n\n\n<li>writes the output to one or multiple files on the file system.<\/li>\n<\/ul>\n\n\n\n<p>If you set up email on your RPi then you&#8217;ll receive the result of the self test in your inbox. Unfortunately my email provider ProtonMail doesn&#8217;t support smtp-sending on individual plans (not even on the Unlimited plan!). I don&#8217;t want to set up gmail or any other &#8216;free&#8217; <sup data-fn=\"14552963-097e-4b92-8374-73921d6dc44f\" class=\"fn\"><a href=\"#14552963-097e-4b92-8374-73921d6dc44f\" id=\"14552963-097e-4b92-8374-73921d6dc44f-link\">4<\/a><\/sup> provider on my RPi so I&#8217;m going to write the result into files on each Samba share. The script can be found <a href=\"https:\/\/github.com\/mpr-projects\/RPi-NAS\/blob\/main\/nas_smart_test.py\" target=\"_blank\" rel=\"noreferrer noopener\">here<\/a>. It&#8217;s a bit longer because it contains lots of checks. The script has the following signature:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>nas_smart_test.py type device polling_interval output_path --output_into_subfolders<\/code><\/pre>\n\n\n\n<p><em>Type<\/em> refers to the type of self test. We&#8217;ll be using <em>long<\/em> for our monthly tests. <em>Device<\/em> is a path to the drive that should run the self test. 
Ideally we use a path that uniquely identifies the disk. Device names like <em>\/dev\/sdb<\/em> are not guaranteed to stay the same, so they are not ideal for this usage. Let&#8217;s instead use the disk id, which we can get with <code>ls -l \/dev\/disk\/by-id<\/code>. This command will return something like<\/p>\n\n\n\n<p class=\"has-text-align-center\">ata-WDC_WD19SDRW-45VUVS2_WD-WX33EC15ZMDN -&gt; ..\/..\/sdb<\/p>\n\n\n\n<p>which tells us the id of drive <em>sdb<\/em>. Note, we want the id of the entire drive, not of a partition, so don&#8217;t use the lines with <em>-partX<\/em> at the end.<\/p>\n\n\n\n<p><em>Polling_interval<\/em> refers to the number of seconds between successive calls to the disk. This has to be small enough so the disk doesn&#8217;t go to sleep or get disconnected. My drives go to sleep after 30 minutes. Below I&#8217;ll set the polling interval to 20 minutes (1200 seconds).<\/p>\n\n\n\n<p><em>Output_path<\/em> refers to the directory where the result of the self test should be saved. If option <em>--output_into_subfolders<\/em> is specified then the results will not be saved in <em>output_path<\/em> but in each of the subfolders of <em>output_path<\/em> (so we can save the result in each Samba share).<\/p>\n\n\n\n<p>Let&#8217;s save the script under <em>\/usr\/local\/bin\/nas_smart_test.py<\/em> and make sure it&#8217;s executable. If it isn&#8217;t then let&#8217;s make it executable with <code>sudo chmod +x \/usr\/local\/bin\/nas_smart_test.py<\/code>. Finally, the command we&#8217;ll run is:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\/usr\/local\/bin\/nas_smart_test.py long \/dev\/disk\/by-id\/ata-WDC_WD19SDRW-45VUVS2_WD-WX33EC15ZMDN 1200 \/nas_mounts\/hdd1\/shares --output_into_subfolders<\/code><\/pre>\n\n\n\n<p>The only thing left to do is to add this command to cron. To specify a command for cron we run <code>crontab -e<\/code>. 
Since we need to run our command with root privileges we&#8217;ll do<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo crontab -e<\/code><\/pre>\n\n\n\n<p>The structure of crontab is very simple. We need to enter five integers that specify when the command should be run, followed by the actual command. The integers refer to <em>minute<\/em>, <em>hour<\/em>, <em>day of the month<\/em>, <em>month<\/em> and <em>day of the week<\/em>. We don&#8217;t have to specify all values. If we want to leave one unspecified<sup data-fn=\"12d1a11a-3fa3-40b7-8b97-c26f028af364\" class=\"fn\"><a href=\"#12d1a11a-3fa3-40b7-8b97-c26f028af364\" id=\"12d1a11a-3fa3-40b7-8b97-c26f028af364-link\">5<\/a><\/sup> then we type <em>*<\/em>. We&#8217;re going to run the self test at 2am on the first of each month. So the entry in crontab is<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>0 2 1 * * \/usr\/local\/bin\/nas_smart_test.py long \/dev\/disk\/by-id\/ata-WDC_WD19SDRW-45VUVS2_WD-WX33EC15ZMDN 1200 \/nas_mounts\/hdd1\/shares --output_into_subfolders<\/code><\/pre>\n\n\n\n<p>Save and exit. It may be a good idea to check if our Raspberry Pi uses the correct time. If you&#8217;re not using a GUI, run <code>timedatectl<\/code> to see if the local time is correct (cron uses local time)<sup data-fn=\"ef2e7653-9ff4-41d5-88d7-9e06dde16f40\" class=\"fn\"><a href=\"#ef2e7653-9ff4-41d5-88d7-9e06dde16f40\" id=\"ef2e7653-9ff4-41d5-88d7-9e06dde16f40-link\">6<\/a><\/sup>.<\/p>\n\n\n\n<p>From now on, every month a long self test will be run and the result will be saved on each share. You can always adjust the options to your liking (e.g. only save the results to one Samba share).<\/p>\n\n\n\n<p>If you&#8217;re interested in why we&#8217;re not using <em>smartd<\/em> then keep on reading the next section.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Smartd<\/h3>\n\n\n\n<p>As mentioned above, I don&#8217;t think <em>smartd<\/em> is suitable for our NAS. 
By default, when <em>smartd<\/em> is active, it polls disks every 30 minutes. This prevents the disks from going to sleep. In my NAS, which sees only light usage, disks <em>should<\/em> go to sleep when they&#8217;re not being used. We can add the following line into the file <em>\/etc\/default\/smartmontools<\/em> to extend the polling interval to, for example, one hour (1h = 3600 seconds).<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>smartd_opts=\"--interval=3600\"<\/code><\/pre>\n\n\n\n<p>We could increase that interval to a much larger number, e.g. to one month, so that <em>smartd<\/em> only wakes up our drives when needed<sup data-fn=\"d5d0a87f-5bb5-4b29-8e3e-bf5fe391e604\" class=\"fn\"><a href=\"#d5d0a87f-5bb5-4b29-8e3e-bf5fe391e604\" id=\"d5d0a87f-5bb5-4b29-8e3e-bf5fe391e604-link\">7<\/a><\/sup>. I do find that approach quite hacky and messy though.<\/p>\n\n\n\n<p>We can specify some <em>smartd<\/em> settings by creating or editing the file <em>\/etc\/smartd.conf<\/em>. One option you&#8217;d think would help here is<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>-n POWERMODE&#091;,N]&#091;,q]<\/code><\/pre>\n\n\n\n<p>where POWERMODE can be, for example, <em>standby<\/em>. This option prevents <em>smartd<\/em> from polling a disk that&#8217;s in standby or sleep mode. Sounds perfect for our situation but it only works for ATA devices. My HDDs are SAT devices (ATA behind a SCSI translation layer) so this option has no effect.<\/p>\n\n\n\n<p>Most of my HDDs are behind a powered USB hub which prevents them from going into low-power mode. So we could use <em>smartd<\/em> for those. But there&#8217;s also an HDD connected directly to the RPi for which <em>smartd<\/em> won&#8217;t work. 
Overall, it seems a lot easier to just use <em>cron<\/em> for all devices.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" style=\"margin-top:var(--wp--preset--spacing--30);margin-bottom:var(--wp--preset--spacing--30)\"\/>\n\n\n\n<p>Footnotes:<\/p>\n\n\n<ol class=\"wp-block-footnotes\"><li id=\"a83e52dc-3e27-4409-8b0c-9b93d2fc1117\">To install run <code>sudo apt install smartmontools<\/code> on the Raspberry Pi or <code>sudo pacman -S smartmontools<\/code> on Arch. The package name will be similar on other systems. <a href=\"#a83e52dc-3e27-4409-8b0c-9b93d2fc1117-link\" aria-label=\"Jump to footnote reference 1\">\u21a9\ufe0e<\/a><\/li><li id=\"6fb03318-83a8-4b9d-99e2-e31a35cda532\">Since <em>WORST<\/em> refers to the worst value we should ask what <em>VALUE<\/em> refers to. Is it the latest value? An average? I don&#8217;t know what it is. If you do, please let me know in the comments below. <a href=\"#6fb03318-83a8-4b9d-99e2-e31a35cda532-link\" aria-label=\"Jump to footnote reference 2\">\u21a9\ufe0e<\/a><\/li><li id=\"b5d87284-e743-487d-b1ff-e4ffa494ac1a\">If the head is away from the disk then it won&#8217;t touch its surface when the disk is spun up or down. And if there&#8217;s a shock to the disk (e.g. if you knock it over) then the head also won&#8217;t be able to damage its surface. <a href=\"#b5d87284-e743-487d-b1ff-e4ffa494ac1a-link\" aria-label=\"Jump to footnote reference 3\">\u21a9\ufe0e<\/a><\/li><li id=\"14552963-097e-4b92-8374-73921d6dc44f\">They are not really free, you do pay with your data. <a href=\"#14552963-097e-4b92-8374-73921d6dc44f-link\" aria-label=\"Jump to footnote reference 4\">\u21a9\ufe0e<\/a><\/li><li id=\"12d1a11a-3fa3-40b7-8b97-c26f028af364\">If a value is unspecified then it will always match. For example, if you don&#8217;t specify month then the command will run every month (as long as the other values match). 
<a href=\"#12d1a11a-3fa3-40b7-8b97-c26f028af364-link\" aria-label=\"Jump to footnote reference 5\">\u21a9\ufe0e<\/a><\/li><li id=\"ef2e7653-9ff4-41d5-88d7-9e06dde16f40\">If it&#8217;s not correct then that could be because not enough time is allowed for the server to respond to time update requests. You could try to open <em><code>\/etc\/systemd\/timesyncd.conf<\/code><\/em> and increase <em><code>RootDistanceMaxSec<\/code><\/em> to a larger value, e.g. 15 or 30. Restart the service with <code>sudo systemctl restart systemd-timesyncd.service<\/code>. <a href=\"#ef2e7653-9ff4-41d5-88d7-9e06dde16f40-link\" aria-label=\"Jump to footnote reference 6\">\u21a9\ufe0e<\/a><\/li><li id=\"d5d0a87f-5bb5-4b29-8e3e-bf5fe391e604\">Smartd will run our job after a poll. So if we want the checks to be run at 2am but smartd does its monthly poll at 5pm then your check will run at around 5pm. So we&#8217;d have to be very careful to time smartd&#8217;s polling well. This approach will be even less suitable if we have multiple jobs that should run at different days and times.  <a href=\"#d5d0a87f-5bb5-4b29-8e3e-bf5fe391e604-link\" aria-label=\"Jump to footnote reference 7\">\u21a9\ufe0e<\/a><\/li><\/ol>","protected":false},"excerpt":{"rendered":"<p>S.M.A.R.T. stands for Self-Monitoring, Analysis, and Reporting Technology. Practically that means that HDDs that support SMART can run self-tests to diagnose problems and report them to us. Although Greyhole can provide our NAS with redundancy, i.e. no data is lost if an HDD fails, it&#8217;s still good to know how healthy our HDDs are before [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":772,"comment_status":"open","ping_status":"open","sticky":false,"template":"wp-custom-template-single-with-sidebar-1","format":"standard","meta":{"_eb_attr":"","footnotes":"[{\"content\":\"To install run <code>sudo apt install smartmontools<\/code> on the Raspberry Pi or <code>sudo pacman -S smartmontools<\/code> on Arch. 
The package name will be similar on other systems.\",\"id\":\"a83e52dc-3e27-4409-8b0c-9b93d2fc1117\"},{\"content\":\"Since <em>WORST<\/em> refers to the worst value we should ask what <em>VALUE<\/em> refers to. Is it the latest value? An average? I don't know what it is. If you do, please let me know in the comments below.\",\"id\":\"6fb03318-83a8-4b9d-99e2-e31a35cda532\"},{\"content\":\"If the head is away from the disk then it won't touch its surface when the disk is spun up or down. And if there's a shock to the disk (e.g. if you knock it over) then the head also won't be able to damage its surface. \",\"id\":\"b5d87284-e743-487d-b1ff-e4ffa494ac1a\"},{\"content\":\"They are not really free, you do pay with your data.\",\"id\":\"14552963-097e-4b92-8374-73921d6dc44f\"},{\"content\":\"If a value is unspecified then it will always match. For example, if you don't specify month then the command will run every month (as long as the other values match).\",\"id\":\"12d1a11a-3fa3-40b7-8b97-c26f028af364\"},{\"content\":\"If it's not correct then that could be because not enough time is allowed for the server to respond to time update requests. You could try to open <em><code>\/etc\/systemd\/timesyncd.conf<\/code><\/em> and increase <em><code>RootDistanceMaxSec<\/code><\/em> to a larger value, e.g. 15 or 30. Restart the service with <code>sudo systemctl restart systemd-timesyncd.service<\/code>.\",\"id\":\"ef2e7653-9ff4-41d5-88d7-9e06dde16f40\"},{\"content\":\"Smartd will run our job after a poll. So if we want the checks to be run at 2am but smartd does its monthly poll at 5pm then your check will run at around 5pm. So we'd have to be very careful to time smartd's polling well. This approach will be even less suitable if we have multiple jobs that should run at different days and times. 
\",\"id\":\"d5d0a87f-5bb5-4b29-8e3e-bf5fe391e604\"}]"},"categories":[3,10],"tags":[9,7,4,5],"class_list":["post-414","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-linux","category-projects","tag-data-safety","tag-greyhole","tag-linux","tag-raspberry-pi"],"_links":{"self":[{"href":"https:\/\/mpr-projects.com\/index.php\/wp-json\/wp\/v2\/posts\/414","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mpr-projects.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mpr-projects.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mpr-projects.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mpr-projects.com\/index.php\/wp-json\/wp\/v2\/comments?post=414"}],"version-history":[{"count":43,"href":"https:\/\/mpr-projects.com\/index.php\/wp-json\/wp\/v2\/posts\/414\/revisions"}],"predecessor-version":[{"id":1569,"href":"https:\/\/mpr-projects.com\/index.php\/wp-json\/wp\/v2\/posts\/414\/revisions\/1569"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/mpr-projects.com\/index.php\/wp-json\/wp\/v2\/media\/772"}],"wp:attachment":[{"href":"https:\/\/mpr-projects.com\/index.php\/wp-json\/wp\/v2\/media?parent=414"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mpr-projects.com\/index.php\/wp-json\/wp\/v2\/categories?post=414"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mpr-projects.com\/index.php\/wp-json\/wp\/v2\/tags?post=414"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}