RAID disk monitoring on Xenserver with email alerting

July 8th, 2011 | Posted by techblog in Linux | Virtualization

Unlike ESXi, Xenserver doesn’t have much health monitoring built in, especially in the free license versions. Xenserver has always been the lightweight, thin, streamlined and better-performing virtualization product, though over time Citrix has been slowly adding features and modifying its licensing structure to be more competitive. For now, we’ll handle our monitoring the old-fashioned way, and that’s perfectly fine: there’s nothing like being a systems administrator who actually knows how to work with the products you use.

Please be sure to check out our article on S.M.A.R.T. disk monitoring, which should be used in conjunction with this one on Xenserver. And of course, always keep external backups in addition to your RAID.

We’re using a 3ware (LSI) 9650SE RAID card, but the process should be similar for other hardware RAID controllers. We’ll assume that you already have the correct driver loaded for your RAID card and that your Xenserver system is working.

Download the Linux CLI tools for your RAID card

The first thing you’ll need is a set of command line tools for your RAID card. These are typically provided by the hardware vendor in the support/download section of its website.

3ware (LSI) 9650SE-4LPML tw_cli command line tools for Linux

3ware/LSI Product Lookup

You can download the tools directly to your Xenserver hypervisor using wget, or copy the tw_cli file over from another machine with scp. These command line tools are the core of what we need to monitor the status of our RAID array. There are no special install instructions: just run # chmod 750 tw_cli to make the file executable.
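A minimal sketch of the install step, assuming tw_cli has already been downloaded or copied into the current directory. The helper name install_tw_cli is hypothetical, and the default destination /root/tw_cli is simply the path the monitoring script later in this article expects:

```shell
#!/bin/bash
# install_tw_cli SRC [DEST]: hypothetical helper that copies the tw_cli binary
# into place and applies the chmod 750 described above.
install_tw_cli() {
    local src=$1
    local dest=${2:-/root/tw_cli}   # assumed location, matches monitor-raid.sh

    # Copy the downloaded (or scp'd) binary to its final location
    cp "$src" "$dest" || return 1
    # Owner rwx, group r-x, others no access
    chmod 750 "$dest" || return 1
    # Sanity check: the current user must now be able to execute it
    [ -x "$dest" ]
}
```

For example, run install_tw_cli ./tw_cli and then /root/tw_cli /c0 show. With GNU coreutils, install -m 750 tw_cli /root/tw_cli does the copy and the chmod in a single command.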

Now let’s check the status of our array manually (see the built-in help of your CLI tools for full usage):

# ./tw_cli /c0 show

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-5    OK             -       -       256K    931.303   RiW    ON     

VPort Status         Unit Size      Type  Phy Encl-Slot    Model
------------------------------------------------------------------------------
p0    OK             u0   465.76 GB SATA  0   -            WDC WD5003ABYX-01WE
p1    OK             u0   465.76 GB SATA  1   -            WDC WD5003ABYX-01WE
p2    OK             u0   465.76 GB SATA  2   -            WDC WD5003ABYX-01WE

This basic output shows the current state of our array. What we’re watching for is the Status field changing to “DEGRADED”, since that is when we want to be alerted. Other statuses, such as “REBUILDING”, can also be used to notify you when an array is being rebuilt.
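If you would rather act on the Status column itself than grep the whole output for a word, a small awk filter can pull it out of the unit rows. This is a sketch that assumes the 3ware table layout shown above (unit rows start with u0, u1 and so on, and Status is the third column). The function names unit_statuses and all_units_ok are hypothetical, and other vendors’ tools will need a different pattern:

```shell
#!/bin/bash
# unit_statuses: reads `tw_cli /c0 show` output on stdin and prints
# "<unit> <status>" for every unit row (assumes the 3ware column layout).
unit_statuses() {
    # Unit rows begin with u0, u1, ...; the Status column is field 3
    awk '$1 ~ /^u[0-9]+$/ { print $1, $3 }'
}

# all_units_ok: exit status 0 only if at least one unit row was found and
# every unit reports OK, so a tw_cli error is not mistaken for a healthy array.
all_units_ok() {
    unit_statuses | awk '
        { n++ }
        $2 != "OK" { bad = 1 }
        END { if (n == 0 || bad) exit 1; exit 0 }'
}
```

Piping the output above through unit_statuses (/root/tw_cli /c0 show | unit_statuses) prints u0 OK, while /root/tw_cli /c0 show | all_units_ok || echo alert flags any status other than OK, including REBUILDING.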

Creating a Custom Monitoring Script

Our custom Xenserver RAID monitoring script is pretty basic, since all we really want to know at this point is when the array hits a DEGRADED status, and to be emailed when that happens.

Please be sure to check out our article on configuring a free-licensed Xenserver to send email through an SMTP server.

#!/bin/bash
# monitor-raid.sh: email an alert when the 3ware array reports DEGRADED

# Run the CLI tool once so the check and the email use the same output
output=$(/root/tw_cli /c0 show)

if ! echo "$output" | grep -q DEGRADED; then
    echo "Everything ok"
else
    echo "Error, array DEGRADED"

    SUBJECT="RAID ARRAY DEGRADED!"
    EMAIL="you@yourdomain.tld"
    EMAILMESSAGE="/tmp/emailmessage.txt"
    echo "Below is the output of the CLI tool (tw_cli /c0 show):" > "$EMAILMESSAGE"
    # Quote the variable so the tables keep their line breaks in the email
    echo "$output" >> "$EMAILMESSAGE"
    /bin/mail -s "$SUBJECT" "$EMAIL" < "$EMAILMESSAGE"
fi
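The line that copies the tw_cli output into the message is sensitive to quoting. An unquoted command substitution passed to echo is split into words by the shell, so every newline and run of spaces collapses into a single space and the tables arrive as one long line. That is exactly what happened to the Adaptec output in the comment below. This self-contained demonstration uses a two-line sample in place of real tw_cli output:

```shell
#!/bin/bash
# Two lines standing in for tw_cli output (columns padded with spaces)
sample=$(printf 'Unit  UnitType  Status\nu0    RAID-5    DEGRADED')

# Unquoted: word splitting turns each newline and run of spaces into one space
unquoted=$(echo $sample)
# Quoted: the text is passed through unchanged, line breaks included
quoted=$(echo "$sample")

printf 'unquoted: %s line(s)\n' "$(printf '%s\n' "$unquoted" | grep -c '')"
printf 'quoted: %s line(s)\n' "$(printf '%s\n' "$quoted" | grep -c '')"
```

It prints unquoted: 1 line(s) followed by quoted: 2 line(s). Writing echo "$output" (or redirecting the tool straight into the message file) keeps the email readable.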

You can use monitor-raid.sh as a starting point for your own custom monitoring, though for us it is sufficient as is. We set up a cron job to run it every 15 minutes. If you’re implementing this on multiple servers, change the email subject to include the hostname. (Note: make sure monitor-raid.sh is executable, e.g. # chmod 750 /root/monitor-raid.sh)
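Following the hostname suggestion, here is one possible variant for several hosts, sketched under a few assumptions: the subject carries the host name (taken from uname -n), REBUILDING triggers an alert as well as DEGRADED, and tw_cli runs only once per check. The variable names, the check_raid function and the message wording are illustrative, not part of the original script:

```shell
#!/bin/bash
# Illustrative variant of monitor-raid.sh (all names here are assumptions):
# the subject includes the host name and REBUILDING alerts as well as DEGRADED.
TW_CLI=${TW_CLI:-/root/tw_cli}
EMAIL=${EMAIL:-you@yourdomain.tld}
ALERT_STATUSES='DEGRADED|REBUILDING'

check_raid() {
    local output matches host
    host=$(uname -n)    # node name, i.e. this server's host name

    # Run tw_cli once and reuse its output for both the check and the email
    if ! output=$("$TW_CLI" /c0 show); then
        echo "tw_cli failed on $host" >&2
        return 2
    fi

    # Keep only the lines that mention one of the alert statuses
    matches=$(printf '%s\n' "$output" | grep -E -- "$ALERT_STATUSES")
    if [ -z "$matches" ]; then
        echo "Everything ok"
        return 0
    fi

    # Quoted expansions keep the tw_cli tables on separate lines
    printf 'RAID alert on %s:\n%s\n\nFull output of tw_cli /c0 show:\n%s\n' \
        "$host" "$matches" "$output" |
        mail -s "RAID ARRAY ALERT on $host" "$EMAIL"
    return 1
}
```

To use it as monitor-raid.sh, add a final line that calls check_raid. The function returns 0 when the array is healthy, 1 after sending an alert, and 2 if tw_cli itself fails; you may want that last case to send an email too.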

Creating a cron job

Creating a cron job is fairly easy: run # crontab -e and add the following line (in vi, :wq writes the file and quits when you’re finished):

*/15 * * * * /root/monitor-raid.sh >/dev/null 2>&1
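If you would rather not edit the crontab by hand, for example when rolling this out to several servers, the same line can be appended with a pipeline. This is a sketch: add_cron_line is a hypothetical helper that reads the current crontab on stdin and adds the monitor-raid.sh line only if it is not already present, so running it twice does no harm:

```shell
#!/bin/bash
# The entry from above; adjust the path if monitor-raid.sh lives elsewhere
CRON_LINE='*/15 * * * * /root/monitor-raid.sh >/dev/null 2>&1'

# add_cron_line: reads a crontab on stdin and writes it to stdout with
# CRON_LINE appended, unless an identical line is already present.
add_cron_line() {
    local current
    current=$(cat)
    if printf '%s\n' "$current" | grep -qxF -- "$CRON_LINE"; then
        # Already installed: pass the crontab through unchanged
        printf '%s\n' "$current"
    else
        # Keep any existing entries, then add ours at the end
        if [ -n "$current" ]; then
            printf '%s\n' "$current"
        fi
        printf '%s\n' "$CRON_LINE"
    fi
}
```

Install it with # crontab -l 2>/dev/null | add_cron_line | crontab - and confirm with # crontab -l. The redirect hides the error crontab -l prints when root has no crontab yet, and crontab - reads the new table from standard input.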

And that’s it. Now our script checks the array every 15 minutes and only emails us when it’s degraded. Simple and effective.



One Response

Tom says:

Friday, 28th September 2012 at 3:56 am

Hello,

Thanks for sharing! I agree, very simple and effective.
I use a different RAID controller (Adaptec), so I tweaked your script accordingly.
Anyway, the format of the email is not easily readable. See below.

Below is the output of the CLI tool (/usr/StorMan/arcconf getconfig 1 LD):
Controllers found: 1 ---------------------------------------------------------------------- Logical device information ---------------------------------------------------------------------- Logical device number 0 Logical device name : VMs RAID level : 1 Status of logical device : Optimal Size : 285696 MB Read-cache mode : Disabled Write-cache mode : Disabled (write-through) Write-cache setting : Disabled (write-through) Partitioned : Yes Protected by Hot-Spare : No Bootable : Yes Failed stripes : No -------------------------------------------------------- Logical device segment information -------------------------------------------------------- Segment 0 : Present (0,0) JTVMASPL Segment 1 : Present (0,1) JTVMBAPL Logical device number 1 Logical device name : Data RAID level : 1 Status of logical device : Optimal Size : 476160 MB Read-cache mode : Disabled Write-cache mode : Disabled (write-through) Write-cache setting : Disabled (write-through) Partitioned : No Protected by Hot-Spare : No Bootable : No Failed stripes : No -------------------------------------------------------- Logical device segment information -------------------------------------------------------- Segment 0 : Present (0,2) 9QMCMZG8 Segment 1 : Present (0,3) 9QMCR2T3 Command completed successfully.

Any advice on better formatting for the email?

Thanks buddy!
