Monitoring your RAID array and hard disk health with SMART should be priority No. 1 on any system. Early warnings and preventive maintenance will save you time, money and huge headaches in the long run. It’s also good practice to buy your hard drives at different times or from multiple vendors, to reduce the chance that a single bad production run or firmware bug affects every drive in the array.
We once had two drives drop out of a 3-disk RAID5 array because of buggy firmware, and only through some very good luck were we able to rebuild the array and preserve the data. In most cases, losing two disks from a RAID5 array leaves the data completely unrecoverable, and you would have to rebuild the system manually and restore the data from your backups.
Budget for a few spare hard disks, especially since it can be very difficult to find the same drive model once it goes out of production (which happens often). And of course, always keep external backups in addition to your RAID.
Installing and Configuring smartmontools (smartd)
We’ll be installing SMART disk monitoring on Citrix XenServer with a RAID5 array, but setup and configuration should be similar on other Linux distributions and on non-RAID systems. (On those, consider installing the smartmontools package with yum or apt-get instead.)
# wget http://mirror.centos.org/centos-5/5.5/os/i386/CentOS/smartmontools-5.38-2.el5.i386.rpm
# rpm -hiv smartmontools-5.38-2.el5.i386.rpm
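Edit /etc/smartd.conf, comment out the DEVICESCAN line (when DEVICESCAN is the first entry, smartd ignores every other line in the file), and add one entry per physical disk to suit your hardware. In our case we’re using the 3ware/LSI 3w-9xxx driver, which exposes the controller as /dev/twa0 and addresses each disk behind it with -d 3ware,N, where N is the controller port.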
The configuration below monitors each physical disk in our RAID array (-a turns on smartd’s standard set of health, attribute and log checks) and schedules a long self-test on each disk every Saturday, starting in the 2-3am, 3-4am and 4-5am windows respectively.
We don’t want to run tests on more than one disk at a time, since all three disks belong to the same array. And because we run full backups around this time on Sundays, we schedule the tests for Saturday instead.
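To make the -s patterns a little less cryptic, here is a minimal sketch of how we understand smartd to evaluate them, based on the smartd.conf(5) man page: at each polling interval smartd forms a string of the form T/MM/DD/d/HH for the current time and starts test T when the pattern matches the whole string. The function names schedule_matches and make_stamp are our own, and make_stamp assumes GNU date for its -d option. Treat this as an approximation for checking a pattern by hand, not as smartd’s actual code.

```shell
#!/bin/sh
# Sketch: how a smartd "-s" schedule pattern is evaluated, based on our
# reading of smartd.conf(5). An approximation, not smartd's source code.
# At each polling interval (30 minutes by default) smartd builds "T/MM/DD/d/HH":
#   T  = test type (L = long self-test, S = short self-test)
#   MM = month 01-12, DD = day of month 01-31
#   d  = day of week, 1 (Monday) to 7 (Sunday)
#   HH = hour 00-23
# and starts test T if the extended regex matches the whole string.

# schedule_matches PATTERN STAMP: exit status 0 if PATTERN matches all of STAMP.
schedule_matches() {
    # Wrap the pattern in ^( )$ so that a partial match is not enough.
    printf '%s\n' "$2" | grep -Eq "^($1)\$"
}

# make_stamp TYPE DATE: print the "T/MM/DD/d/HH" string for DATE (GNU date).
make_stamp() {
    printf '%s/%s\n' "$1" "$(date -d "$2" +%m/%d/%u/%H)"
}

# Example: 2024-06-01 was a Saturday, so the first disk's long test is due
# during the 02:00 hour but not during the 03:00 hour.
for t in '2024-06-01 02:30' '2024-06-01 03:30'; do
    stamp=$(make_stamp L "$t")
    if schedule_matches 'L/../../6/02' "$stamp"; then
        echo "$stamp: long test due"
    else
        echo "$stamp: no test"
    fi
done
```

Because each disk’s pattern pins a single day and hour, a typo such as 6/2 instead of 6/02 silently never matches; running a candidate pattern through a check like this before a reload is a cheap way to catch that.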
If smartd detects a problem at any time, it sends an email to the address given with -m (for example, -m firstname.lastname@example.org), or a local mail to root if you use -m root.
Be sure to enable outgoing email on XenServer first, or these alerts will never reach you.
#DEVICESCAN -H -m root
/dev/twa0 -d 3ware,0 -a -s L/../../6/02 -m email@example.com
/dev/twa0 -d 3ware,1 -a -s L/../../6/03 -m firstname.lastname@example.org
/dev/twa0 -d 3ware,2 -a -s L/../../6/04 -m email@example.com
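If your controller has more ports, typing each line by hand gets tedious. Below is a minimal sketch of a hypothetical helper, gen_entries (our own name, not part of smartmontools), that prints one line per disk in the same format as the entries above, spacing the start hours by a configurable step. Keep in mind that a long self-test on a large disk can take several hours, so a one-hour step only staggers the start times; smartctl -c -d 3ware,0 /dev/twa0 reports the drive’s recommended polling time for the extended self-test, which is a reasonable basis for choosing the step.

```shell
#!/bin/sh
# Hypothetical helper (not part of smartmontools): print one smartd.conf
# entry per disk on a 3ware 9xxx controller, giving each disk its own
# start hour for the weekly long self-test.
#
# Usage: gen_entries DEVICE NDISKS DAY START_HOUR STEP ADDRESS
#   DEVICE     controller node, e.g. /dev/twa0
#   NDISKS     number of disks (ports 0 .. NDISKS-1)
#   DAY        day of week for the test, 1 (Mon) to 7 (Sun)
#   START_HOUR hour for port 0; port N starts at START_HOUR + N * STEP
#   STEP       hours between consecutive disks' start times
#   ADDRESS    value for -m (an email address, or root)
gen_entries() {
    dev=$1 n=$2 day=$3 hour=$4 step=$5 addr=$6
    # Refuse schedules that would spill past midnight into the next day,
    # since the DAY field would then be wrong for the later disks.
    last=$(( hour + (n - 1) * step ))
    if [ "$last" -gt 23 ]; then
        echo "gen_entries: last test would start at hour $last" >&2
        return 1
    fi
    port=0
    while [ "$port" -lt "$n" ]; do
        h=$(( hour + port * step ))
        # %02d zero-pads the hour, as the L/MM/DD/d/HH pattern expects.
        printf '%s -d 3ware,%d -a -s L/../../%d/%02d -m %s\n' \
            "$dev" "$port" "$day" "$h" "$addr"
        port=$((port + 1))
    done
}
```

For example, gen_entries /dev/twa0 3 6 2 1 root prints the three disk lines above, addressed to root. Review the output before appending it to /etc/smartd.conf.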
Next, make sure the daemon starts automatically whenever the host reboots:
# chkconfig smartd on
Finally, start the service (the init script also accepts stop and reload):
# /etc/init.d/smartd start