Unlike ESXi, Xenserver doesn’t have much health monitoring built in (especially not the free license versions). Xenserver has always been the lightweight, thin, streamlined and better-performing virtualization product, though as time progresses Citrix is slowly adding more features and modifying its licensing structure to be more competitive. But for now we’ll handle our monitoring the old-fashioned way, and that’s perfectly fine: nothing like being an actual systems administrator who knows how to work with the products you employ.
Please be sure to check out our article on S.M.A.R.T disk monitoring as it should be used in conjunction with this article for Xenserver. And of course, always use external backups along with your RAID.
While we’re using a 3ware (LSI) 9650SE RAID card, the process should be relatively similar for other hardware RAID devices. We’ll assume that you already have the correct driver loaded for your RAID card and that your Xenserver system works.
Download the Linux CLI tools for your RAID card
The first thing you’ll need is a set of command line tools for your RAID card. These are typically provided by the hardware vendor in the support/download section of their site.
3ware (LSI) 9650SE-4LPML tw_cli command line tools for Linux
You can download directly to your Xenserver hypervisor using wget, or SCP the tw_cli file from another machine. These command line tools are the core of what we need to monitor the status of our RAID array. There are no special install instructions; just run # chmod 750 tw_cli so the file can be executed.
Now we can check the status of our array manually. (Please see the built-in help of your CLI tools for usage.)
# ./tw_cli /c0 show

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-5    OK             -       -       256K    931.303   RiW    ON

VPort Status         Unit Size      Type  Phy Encl-Slot    Model
------------------------------------------------------------------------------
p0    OK             u0   465.76 GB SATA  0   -            WDC WD5003ABYX-01WE
p1    OK             u0   465.76 GB SATA  1   -            WDC WD5003ABYX-01WE
p2    OK             u0   465.76 GB SATA  2   -            WDC WD5003ABYX-01WE
This is the basic output which gives us information on our current array. What we’ll be looking for is a change in the status field to “DEGRADED”, as this is when we’d like to be alerted. Other statuses such as “REBUILDING” can also be used to notify when an array is being rebuilt.
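If you want to catch more than one status, a small filter over the tw_cli table can list every unit or port that is not OK. The following is only a sketch: raid_problems is a hypothetical helper name, the sample text is invented for illustration (it is not captured from a real degraded array), and it assumes the column layout shown above, where the status is the third column for units and the second column for ports.

```shell
#!/bin/bash
# Sketch: list every unit (u0, u1, ...) or port (p0, p1, ...) whose status is not OK.
# Assumes the tw_cli column layout from the article; raid_problems is a hypothetical name.

raid_problems() {
    awk '
        $1 ~ /^u[0-9]+$/ && $3 != "OK" { print $1, $3 }   # unit rows: status is column 3
        $1 ~ /^p[0-9]+$/ && $2 != "OK" { print $1, $2 }   # port rows: status is column 2
    '
}

# Invented sample standing in for the output of: /root/tw_cli /c0 show
sample_output='Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-5    DEGRADED       -       -       256K    931.303   RiW    ON

VPort Status         Unit Size      Type  Phy Encl-Slot    Model
------------------------------------------------------------------------------
p0    OK             u0   465.76 GB SATA  0   -            WDC WD5003ABYX-01WE
p1    DEGRADED       u0   465.76 GB SATA  1   -            WDC WD5003ABYX-01WE
p2    OK             u0   465.76 GB SATA  2   -            WDC WD5003ABYX-01WE'

# Prints one line per problem: "u0 DEGRADED" and "p1 DEGRADED"
printf '%s\n' "$sample_output" | raid_problems
```

On a live system you would pipe the real command into the filter instead of the sample, for example /root/tw_cli /c0 show | raid_problems. An empty result means every unit and port reported OK.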
Creating a Custom Monitoring Script
Our custom Xenserver RAID monitoring script is pretty basic, as all we really want at this point is to know when the array hits a DEGRADED status and, of course, to get an email when that happens.
Please be sure to check out our article on setting up Xenserver (free license) to use an SMTP server to send email.
#!/bin/bash
# monitor-raid.sh: email an alert when the 3ware array reports DEGRADED

# Keep any line of the controller summary that contains DEGRADED
checkraid=$(/root/tw_cli /c0 show | grep DEGRADED)

if [ -z "$checkraid" ]; then
    echo "Everything ok"
else
    echo "Error, array DEGRADED"
    SUBJECT="RAID ARRAY DEGRADED!"
    EMAIL="email@example.com"
    EMAILMESSAGE="/tmp/emailmessage.txt"
    echo "Below is the output of the CLI tool (tw_cli /c0 show):" > "$EMAILMESSAGE"
    # Append the output directly so the table keeps its line breaks
    /root/tw_cli /c0 show >> "$EMAILMESSAGE"
    /bin/mail -s "$SUBJECT" "$EMAIL" < "$EMAILMESSAGE"
fi
You can use this script as a starting point for your own custom monitoring, though for us this is sufficient. We set up a cron job to run our monitor-raid.sh script every 15 minutes. If you’re implementing this on multiple servers, change the email subject to include the hostname. (Note: make sure monitor-raid.sh is executable.)
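For the multi-server case, here is a minimal sketch of how the subject and message body could be built. build_subject and write_report are hypothetical helper names introduced for illustration, uname -n is used here to obtain the hostname, and the report is passed through cat so the table keeps its line breaks in the email.

```shell
#!/bin/bash
# Sketch: build the alert subject and body for a multi-server setup.
# build_subject and write_report are hypothetical helpers, not part of tw_cli.

# Prefix the subject with the hostname so alerts from several servers can be told apart.
build_subject() {
    echo "[$(uname -n)] RAID ARRAY DEGRADED!"
}

# Write a header line, then copy stdin unchanged into the given file.
# cat preserves newlines, so the CLI table stays readable in the message.
write_report() {
    local file="$1"
    echo "Below is the output of the CLI tool (tw_cli /c0 show):" > "$file"
    cat >> "$file"
}

# Possible use inside monitor-raid.sh (left commented out in this sketch):
# /root/tw_cli /c0 show | write_report /tmp/emailmessage.txt
# /bin/mail -s "$(build_subject)" "email@example.com" < /tmp/emailmessage.txt
```

The same two helpers should work for other controllers as well: only the command piped into write_report and the header text would change.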
Creating a CRON job
Creating a cron job is fairly easy: run # crontab -e and add the following line (:wq will write and quit when finished).
*/15 * * * * /root/monitor-raid.sh >/dev/null 2>&1
And that’s it. Now our script checks the array every 15 minutes and only emails us when it’s degraded. Simple and effective.
Thanks for sharing! I agree, very simple and effective.
I use a different RAID controller (Adaptec), so I tweaked your script accordingly.
However, the format of the email is not easily readable. See below.
Below is the output of the CLI tool (/usr/StorMan/arcconf getconfig 1 LD):
Controllers found: 1 ———————————————————————- Logical device information ———————————————————————- Logical device number 0 Logical device name : VMs RAID level : 1 Status of logical device : Optimal Size : 285696 MB Read-cache mode : Disabled Write-cache mode : Disabled (write-through) Write-cache setting : Disabled (write-through) Partitioned : Yes Protected by Hot-Spare : No Bootable : Yes Failed stripes : No ——————————————————– Logical device segment information ——————————————————– Segment 0 : Present (0,0) JTVMASPL Segment 1 : Present (0,1) JTVMBAPL Logical device number 1 Logical device name : Data RAID level : 1 Status of logical device : Optimal Size : 476160 MB Read-cache mode : Disabled Write-cache mode : Disabled (write-through) Write-cache setting : Disabled (write-through) Partitioned : No Protected by
Hot-Spare : No Bootable : No Failed stripes : No ——————————————————– Logical device segment information ——————————————————– Segment 0 : Present (0,2) 9QMCMZG8 Segment 1 : Present (0,3) 9QMCR2T3 Command completed successfully.
Any advice on better formatting for the email?