SMART Disk Failure Prediction

SMART (sometimes written as S.M.A.R.T.) is a monitoring technology that measures disk drive “health parameters.” It stands for Self-Monitoring, Analysis, and Reporting Technology and is a standard that hardware manufacturers implement to monitor reliability and anticipate failures in hard disk drives (HDDs) and solid-state drives (SSDs). Note that the set of attributes, the analysis methods, and the way SMART values are read and interpreted differ from one manufacturer to another.

When SMART attributes point to a possible disk drive failure, a user can choose to replace the affected drive to mitigate data loss and outage. Manufacturers can also use recorded SMART data as input to future drive designs.

Admittedly, not all disk failures can be anticipated by SMART. It is still, however, a very useful baseline to monitor drive health in terms of predictable failures resulting from mechanical wear and gradual degradation of storage surfaces.

SMART monitoring of disks running Linux

Every popular Linux distribution offers SMART monitoring tools. The package is usually called smartmontools and includes at least two executables: the smartctl utility and the smartd daemon.

Smartctl helps you check your current drive state, run drive self-tests, and enable/disable the SMART capability of your drive (sda in the examples below):

To show all SMART information of the drive:

smartctl -a /dev/sda

To show just drive identity info (included in smartctl -a output):

smartctl -i /dev/sda

To do a self-test on the drive:

smartctl -t long /dev/sda

Note: You can run either a short test or a long test. A short test is a basic check that runs for a minute or two; a long test examines the full drive and can take hours on a large disk.
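A self-test runs inside the drive itself, so smartctl -t returns right away and the outcome has to be read back later. The sketch below wraps two standard smartctl options for that: -H (the drive's overall health verdict) and -l selftest (the self-test log). The function name smart_report is made up for illustration, and the commands usually need root privileges.

```shell
# Print the overall health verdict and the self-test log for one drive.
# smart_report is a hypothetical helper name, not part of smartmontools.
smart_report() {
  dev="$1"
  smartctl -H "$dev"            # overall health self-assessment
  smartctl -l selftest "$dev"   # results of earlier self-tests
}
```

Once a test has had time to finish, calling smart_report /dev/sda shows whether it completed without error.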

Run the smartd daemon to enable continuous drive monitoring. It notifies you when key SMART attributes change drastically. You can set it up so that notifications appear in the system logs and are also sent to you by email.
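To get a feel for the kind of change the daemon watches for, you can scan the attribute table yourself. The sketch below assumes the ATA table layout printed by smartctl -A (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE) and lists the attributes whose normalized value has dropped to or below the vendor threshold. The name failing_attributes is a hypothetical helper, and NVMe or SCSI drives print a different format that it does not handle.

```shell
# Read an ATA attribute table on stdin and print the names of attributes
# whose normalized VALUE (column 4) is at or below THRESH (column 6).
# A threshold of 0 means the attribute can never fail, so it is skipped.
failing_attributes() {
  awk '$1 ~ /^[0-9]+$/ && NF >= 10 && $6 + 0 > 0 && $4 + 0 <= $6 + 0 { print $2 }'
}
```

A typical call would be smartctl -A /dev/sda | failing_attributes, which prints nothing when no attribute has reached its threshold.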

The daemon configuration file is usually located at /etc/smartd.conf or /etc/smartd/smartd.conf. By default everything in it is commented out except the DEVICESCAN -H -m root line. For x86-compatible servers with ATA drives, you only need to replace root with your actual email address, e.g. “DEVICESCAN -H -m someguy@example.com”. A daemon started with this configuration monitors all the drives it recognizes. Note, however, that a drive may drop out during a reboot and would then go unnoticed, which is why you should list the drives explicitly. See the sample configuration below with the comments stripped:

/dev/sda -H -m someguy@example.com
/dev/sdb -H -m someguy@example.com
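As an illustration of what else a per-drive line can carry, here is a sketch built from directives documented in the smartd.conf manual. The schedule and the choice of options are assumptions to adapt, not part of the setup described above.

```
# -a : monitor health status, attribute changes, and the error and self-test logs
# -s : short self-test daily between 02:00 and 03:00,
#      long self-test on Saturdays between 03:00 and 04:00
/dev/sda -a -s (S/../.././02|L/../../6/03) -m someguy@example.com
/dev/sdb -a -s (S/../.././02|L/../../6/03) -m someguy@example.com
```

Temporarily adding -M test to a line makes smartd send a single test email at startup, which is a quick way to confirm that mail delivery works.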

For the daemon to start automatically at boot, enable it with your distribution's service management utility, e.g. update-rc.d, chkconfig, or systemctl.
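On a systemd-based distribution, enabling and starting the daemon might look like the sketch below. The unit name is an assumption: it is commonly smartd, but some Debian-based releases name it smartmontools, so the hypothetical helper enable_smartd takes it as an optional argument.

```shell
# Enable the SMART daemon at boot, start it now, and report whether it is running.
# Assumption: the unit is called "smartd" unless another name is passed in.
enable_smartd() {
  unit="${1:-smartd}"
  systemctl enable --now "$unit" &&
    systemctl is-active "$unit"
}
```

Run enable_smartd as root, or enable_smartd smartmontools if that is the unit name on your system.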

We recommend exploring the other capabilities of smartd and smartctl listed in the corresponding manuals of the Linux distribution you are using.

SMART monitoring of disks running Windows

While Windows does notify you via a pop-up window if your drive is near failure, it does not have a built-in tool that shows your hard disk’s SMART data. To get around this, you can install the smartmontools package. It is configured the same way as described above for Linux, but you need an additional helper utility such as Blat so that SMART notifications can be emailed to you. There is also an open-source graphical user interface for smartctl called GSmartControl.

Running the command “%SMARTCTL_FOLDER%\smartd.exe” install installs smartd as a system service. For more information on getting smartd and Blat to work together, follow the link.

Disk drive manufacturers usually provide proprietary tools for monitoring SMART attributes on Windows. Do a quick search and find the right one for you.