Hardware RAID

Warning

Using MegaCli and lsiutil without knowing what you are doing is dangerous. You risk losing your data, so back up before doing anything.


Introduction

This guide helps you verify the state of your RAID and the health of your hard drives on a server with a hardware RAID controller (LSI, LSI MegaRaid and 3ware [Deprecated]).

MegaRaid RAID controller

Gathering RAID information

Before verifying your RAID state, let’s start by making sure you have a MegaRaid controller:

lspci | grep -i lsi | grep -i megaraid
03:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2108 [Liberator] (rev 05)

This confirms that the server does indeed have a MegaRaid RAID controller installed.

To gather and list the available RAID arrays, you can use the MegaCli command:

MegaCli -LDInfo -Lall -aALL
Adapter 0 - Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name :
RAID Level : Primary-1, Secondary-0, RAID Level Qualifier-0
Size : 36.321 GB
Sector Size : 512
Mirror Data : 36.321 GB
State : Optimal
Strip Size : 64 KB
Number Of Drives : 2
Span Depth : 1
Default Cache Policy: WriteBack, ReadAdaptive, Cached, Write Cache OK if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Cached, Write Cache OK if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy : Disk's Default
Encryption Type : None
Bad Blocks Exist: No
Is VD Cached: No
 
 
Virtual Drive: 1 (Target Id: 1)
Name :
RAID Level : Primary-1, Secondary-0, RAID Level Qualifier-0
Size : 2.727 TB
Sector Size : 512
Mirror Data : 2.727 TB
State : Optimal
Strip Size : 64 KB
Number Of Drives : 2
Span Depth : 1
Default Cache Policy: WriteBack, ReadAdaptive, Cached, Write Cache OK if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Cached, Write Cache OK if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy : Disk's Default
Encryption Type : None
Bad Blocks Exist: No
Is VD Cached: Yes
Cache Cade Type : Read Only
 
Exit Code: 0x00

We can see 2 virtual drives, each composed of 2 physical hard drives, for a total of 4 physical disks.

In this case, the RAID status shows Optimal, which means the RAID is up and running fine.

If the RAID status shows Degraded, we recommend that you verify the hard drives’ state as well.
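
To check every virtual drive at once, the State lines can be pulled out of the MegaCli output. The following is a minimal sketch, assuming the output format shown above; ld_states is a hypothetical helper name and is not part of MegaCli.

```shell
# Summarise "MegaCli -LDInfo -Lall -aALL" output read from stdin.
# Prints one "VD <n>: <state>" line per virtual drive and returns 1
# if any virtual drive is not Optimal.
# ld_states is a hypothetical helper name, not a MegaCli feature.
ld_states() {
    awk '
        BEGIN { bad = 0 }
        # Remember the number of the virtual drive being described
        /^Virtual Drive:/ { vd = $3 }
        # The "State" line carries the array status
        /^State[ \t]*:/ {
            state = $0
            sub(/^State[ \t]*:[ \t]*/, "", state)
            sub(/[ \t\r]+$/, "", state)
            print "VD " vd ": " state
            if (state != "Optimal") bad = 1
        }
        END { exit bad }
    '
}
```

It could be used as: MegaCli -LDInfo -Lall -aALL | ld_states || echo "At least one virtual drive is not Optimal"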

Checking hard drives’ state

First, you must list the Device Id of each drive in order to fully test them with smartmontools:

MegaCli -PDList -aAll | egrep 'Slot\ Number|Device\ Id|Inquiry\ Data|Raw|Firmware\ state' | sed 's/Slot/\nSlot/g'
 
Slot Number: 0
Device Id: 4
Raw Size: 279.460 GB [0x22eec130 Sectors]
Firmware state: Online, Spun Up
Inquiry Data: BTWL3450062J300PGN  INTEL SSDSC2BB300G4                     D2010355
 
Slot Number: 1
Device Id: 5
Raw Size: 279.460 GB [0x22eec130 Sectors]
Firmware state: Online, Spun Up
Inquiry Data: BTWL345003X6300PGN  INTEL SSDSC2BB300G4                     D2010355
 
Slot Number: 2
Device Id: 7
Raw Size: 2.728 TB [0x15d50a3b0 Sectors]
Firmware state: Online, Spun Up
Inquiry Data:       PN2234P8K2PKDYHGST HUS724030ALA640                    MF8OAA70
 
Slot Number: 3
Device Id: 6
Raw Size: 2.728 TB [0x15d50a3b0 Sectors]
Firmware state: Online, Spun Up
Inquiry Data:       PN2234P8JYP59YHGST HUS724030ALA640                    MF8OAA70

With smartmontools’ smartctl command, we will test each hard drive like this:

smartctl -d megaraid,N -a /dev/sdX
N: the drive’s Device Id
X: the RAID’s device letter: /dev/sda = 1st RAID, /dev/sdb = 2nd RAID, etc.
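
When there are several drives, the smartctl commands can be generated from the drive listing instead of being typed one by one. This is only a sketch: pd_smartctl_cmds is a hypothetical helper name, and the block device passed as argument (/dev/sda by default) is an assumption that you must adapt to the RAID the drives belong to.

```shell
# Read "MegaCli -PDList -aAll" output on stdin and print one smartctl
# command per physical drive, using each "Device Id" value as N.
# $1 is the RAID block device (assumed default: /dev/sda).
# pd_smartctl_cmds is a hypothetical helper name.
pd_smartctl_cmds() {
    dev=${1:-/dev/sda}
    awk -v dev="$dev" '
        # "Device Id: 4" -> third field is the Device Id
        /^Device Id:/ { print "smartctl -d megaraid," $3 " -a " dev }
    '
}
```

For example, MegaCli -PDList -aAll | pd_smartctl_cmds /dev/sda prints the commands so that they can be reviewed before being run.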

Note

In some situations, you may receive this output:

/dev/sda [megaraid_disk_00] [SAT]: Device open changed type from 'megaraid' to 'sat'

You must then replace megaraid by sat+megaraid:

smartctl -d sat+megaraid,N -a /dev/sdX

Warning

If one of your hard drives is showing SMART errors, perform a full backup of your data as soon as possible and contact our support.
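
A full smartctl report is long, so it can help to filter it for a few attributes whose non-zero raw value often points to a failing ATA hard drive. The sketch below is an illustration only: the attribute list is our assumption (SSDs and SAS drives may report different names), the raw value is assumed to be the 10th column of the attribute table, and smart_suspects is a hypothetical helper name.

```shell
# Read "smartctl -a" output on stdin and print the selected ATA
# attributes whose raw value (10th column) is greater than zero.
# Returns 1 if at least one such attribute is found.
# The attribute list is an assumption made for illustration.
smart_suspects() {
    awk '
        $2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable)$/ && $10 + 0 > 0 {
            print $2 "=" $10
            bad = 1
        }
        END { exit bad + 0 }
    '
}
```

For example: smartctl -d megaraid,4 -a /dev/sda | smart_suspects. An empty result does not prove that the drive is healthy, so the full report remains the reference.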

RAID resynchronisation progress

If you had one or more hard drives replaced, the RAID will re-synchronize automatically.

You can use this command to see which hard drives are currently rebuilding:

MegaCli -PDList -aAll | egrep 'Slot\ Number|Device\ Id|Inquiry\ Data|Raw|Firmware\ state' | sed 's/Slot/\nSlot/g'
 
Slot Number: 0
Device Id: 4
Raw Size: 279.460 GB [0x22eec130 Sectors]
Firmware state: Online, Spun Up
Inquiry Data: BTWL3450062J300PGN  INTEL SSDSC2BB300G4                     D2010355
 
Slot Number: 1
Device Id: 5
Raw Size: 279.460 GB [0x22eec130 Sectors]
Firmware state: Online, Spun Up
Inquiry Data: BTWL345003X6300PGN  INTEL SSDSC2BB300G4                     D2010355
 
Slot Number: 2
Device Id: 7
Raw Size: 2.728 TB [0x15d50a3b0 Sectors]
Firmware state: Online, Spun Up
Inquiry Data:       PN2234P8K2PKDYHGST HUS724030ALA640                    MF8OAA70
 
Slot Number: 3
Device Id: 6
Raw Size: 2.728 TB [0x15d50a3b0 Sectors]
Firmware state: Rebuild
Inquiry Data:       PN2234P8JYP59YHGST HUS724030ALA640                    MF8OAA70

To monitor the progress of the resync, you can use this command:

MegaCli -PDRbld -ShowProg -PhysDrv [EncID:SlotID] -aALL
EncID: Enclosure ID
SlotID: Slot ID

The slot ID is the Slot Number shown in the listing above. The enclosure ID is reported as Enclosure Device ID in the unfiltered output of MegaCli -PDList -aAll.
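
The progress command can also be built automatically for every drive that is rebuilding. The sketch below assumes the unfiltered MegaCli -PDList -aAll output, in which each drive block contains an Enclosure Device ID line and a Slot Number line before its Firmware state line; rebuild_cmds is a hypothetical helper name.

```shell
# Read unfiltered "MegaCli -PDList -aAll" output on stdin and print a
# "-PDRbld -ShowProg" command for each drive in the Rebuild state.
# Assumes "Enclosure Device ID" and "Slot Number" precede
# "Firmware state" within each drive block.
# rebuild_cmds is a hypothetical helper name.
rebuild_cmds() {
    awk '
        /^Enclosure Device ID:/ { enc = $4 }
        /^Slot Number:/         { slot = $3 }
        /^Firmware state: Rebuild/ {
            print "MegaCli -PDRbld -ShowProg -PhysDrv [" enc ":" slot "] -aALL"
        }
    '
}
```

For example: MegaCli -PDList -aAll | rebuild_cmds. Nothing is printed when no drive is rebuilding.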

CacheCade

Note

CacheCade is a module from LSI that improves the random read performance of hard drives by using an SSD as a front-end caching device.

To verify the CacheCade configuration:

MegaCli -CfgCacheCadeDsply -a0

To see which RAID array is associated with the CacheCade:

MegaCli -CfgCacheCadeDsply -a0 | grep "Associated LDs"

LSI RAID controller

Gathering RAID information

Before verifying your RAID state, let’s start by making sure you have an LSI RAID controller card:

lspci | grep -i lsi | grep -vi megaraid
01:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2004 PCI-Express Fusion-MPT SAS-2 [Spitfire] (rev 03)

This confirms that the server does indeed have an LSI RAID controller installed.

Note

The grep -vi megaraid command removes MegaRaid RAID controller cards from the lspci output, as MegaRaid cards are made by LSI Corporation as well. The -i flag is needed because lspci reports the name as MegaRAID.
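
Both checks can be combined to tell which section of this guide applies to a server. This is a minimal sketch based on the two lspci filters used in this guide; raid_controller_type is a hypothetical helper name.

```shell
# Classify the RAID controller from "lspci" output read on stdin.
# Prints "megaraid", "lsi" or "none".
# raid_controller_type is a hypothetical helper name.
raid_controller_type() {
    # Keep only LSI lines; "|| true" avoids a failure when none match
    lsi_lines=$(grep -i 'lsi' || true)
    if printf '%s\n' "$lsi_lines" | grep -qi 'megaraid'; then
        echo megaraid
    elif [ -n "$lsi_lines" ]; then
        echo lsi
    else
        echo none
    fi
}
```

For example: lspci | raid_controller_type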

To gather and list the available RAID arrays, you can use the lsiutil command:

lsiutil -p1 -a 1,0 21
 
LSI Logic MPT Configuration Utility, Version 1.63-OVH (27a4f9f54c)
 
1 MPT Port found
 
     Port Name         Chip Vendor/Type/Rev    MPT Rev  Firmware Rev  IOC
 1.  ioc0              LSI Logic SAS2004 03      200      13000000     0
 
RAID actions menu, select an option:  [1-99 or e/p/w or 0 to quit] 1
 
 
Volume 0 is DevHandle 011e, Bus 1 Target 0, Type RAID1 (Mirroring)
  Volume Name:
  Volume WWID:  0aaf504551c8efe5
  Volume State:  optimal, enabled, background init complete
  Volume Settings:  write caching disabled, auto configure hot swap enabled
  Volume draws from Hot Spare Pools:  0
  Volume Size 1906394 MB, 2 Members
  Primary is PhysDisk 1 (DevHandle 0009, Bus 0 Target 0)
  Secondary is PhysDisk 0 (DevHandle 000a, Bus 0 Target 1)
 
RAID actions menu, select an option:  [1-99 or e/p/w or 0 to quit] 0

We can see 1 virtual drive, which is composed of 2 physical hard drives or “members”, for a total of 2 physical drives.

In this case, the RAID status shows Optimal, which means the RAID is up and running fine.

If the RAID status shows Degraded, we recommend that you verify the hard drives’ state as well.
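
The volume state can also be extracted from the lsiutil output for scripted checks. The sketch below assumes the Volume State line format shown above; volume_states is a hypothetical helper name and not an lsiutil feature.

```shell
# Read "lsiutil -p1 -a 1,0 21" output on stdin.
# Prints one "Volume <n>: <state>" line per volume and returns 1 if
# any state does not start with "optimal".
# volume_states is a hypothetical helper name.
volume_states() {
    awk '
        # "Volume 0 is DevHandle ..." -> second field is the volume number
        /^Volume [0-9]+ is DevHandle/ { vol = $2 }
        /^[ \t]*Volume State:/ {
            state = $0
            sub(/^[ \t]*Volume State:[ \t]*/, "", state)
            sub(/[ \t\r]+$/, "", state)
            print "Volume " vol ": " state
            if (state !~ /^optimal/) bad = 1
        }
        END { exit bad + 0 }
    '
}
```

For example: lsiutil -p1 -a 1,0 21 | volume_states || echo "At least one volume is not optimal"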

Note

When you have just received the server, you may see [In Progress: data scrub]. This is not a problem: it is an automated process run by the controller firmware in order to lower uncorrectable errors as much as possible.

Checking hard drives’ state

To take a look at the hard drives’ state as seen by the RAID controller, you can use this command:

lsiutil -p1 -a 2,0 21
 
LSI Logic MPT Configuration Utility, Version 1.63-OVH (27a4f9f54c)
 
1 MPT Port found
 
     Port Name         Chip Vendor/Type/Rev    MPT Rev  Firmware Rev  IOC
 1.  ioc0              LSI Logic SAS2004 03      200      13000000     0
 
RAID actions menu, select an option:  [1-99 or e/p/w or 0 to quit] 2
 
 
PhysDisk 0 is DevHandle 000a, Bus 0 Target 1
  PhysDisk State:  optimal
  PhysDisk Size 1906394 MB, Inquiry Data:  ATA      HGST HUS724020AL AA70
  Path 0 is DevHandle 000a, Bus 0 Target 1, online, primary
  Path 1 is DevHandle 000a, invalid
 
PhysDisk 1 is DevHandle 0009, Bus 0 Target 0
  PhysDisk State:  optimal
  PhysDisk Size 1906394 MB, Inquiry Data:  ATA      HGST HUS724020AL AA70
  Path 0 is DevHandle 0009, Bus 0 Target 0, online, primary
  Path 1 is DevHandle 0009, invalid
 
RAID actions menu, select an option:  [1-99 or e/p/w or 0 to quit] 0

In this case both drives show as Optimal.

Since the LSI card exposes the drives through sg-map, we must use the /dev/sgX device (X being the device number, for example /dev/sg1) that corresponds to each hard drive in order to test it with smartmontools.

Here’s how to list them:

cat /proc/scsi/scsi | grep Vendor
  Vendor: LSI      Model: Logical Volume   Rev: 3000
  Vendor: ATA      Model: HGST HUS724020AL Rev: AA70
  Vendor: ATA      Model: HGST HUS724020AL Rev: AA70

Note

Each line represents an sg device, which is mapped according to the order of the devices shown here.

Ex:
Vendor: LSI Model: Logical Volume Rev: 3000 => /dev/sg0
Vendor: ATA Model: HGST HUS724020AL Rev: AA70 => /dev/sg1
Vendor: ATA Model: HGST HUS724020AL Rev: AA70 => /dev/sg2
etc...

To list the right devices with a single command, you can use the following:

grep Vendor /proc/scsi/scsi | awk '!/LSI/ { print "/dev/sg" (NR - 1) }'
/dev/sg1
/dev/sg2

With smartmontools’ smartctl command, we will test each hard drive as follows:

smartctl -a /dev/sgX
X: the sg device number shown by the above command
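
To avoid typing one command per drive, the smartctl commands can be generated from /proc/scsi/scsi. This sketch relies on the same ordering assumption as above (the n-th Vendor line maps to /dev/sg(n-1)); sg_smartctl_cmds is a hypothetical helper name.

```shell
# Read the content of /proc/scsi/scsi on stdin and print one smartctl
# command per physical drive, skipping the LSI logical volume.
# Assumes the n-th "Vendor" line corresponds to /dev/sg(n-1).
# sg_smartctl_cmds is a hypothetical helper name.
sg_smartctl_cmds() {
    awk '
        BEGIN { n = 0 }
        /Vendor:/ {
            if ($0 !~ /LSI/) print "smartctl -a /dev/sg" n
            n++
        }
    '
}
```

For example, sg_smartctl_cmds < /proc/scsi/scsi prints the commands so that they can be checked before being run.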

Warning

If one of your hard drives is showing SMART errors, perform a full backup of your data as soon as possible and contact our support.

RAID resynchronisation progress

If you had one or more hard drives replaced, the RAID will re-synchronize automatically.

To check whether the RAID is resynchronising and to monitor its progress, you can use this command:

lsiutil -p1 -a 3,0 21
 
LSI Logic MPT Configuration Utility, Version 1.63-OVH (27a4f9f54c)
 
1 MPT Port found
 
     Port Name         Chip Vendor/Type/Rev    MPT Rev  Firmware Rev  IOC
 1.  ioc0              LSI Logic SAS2004 03      200      13000000     0
 
RAID actions menu, select an option:  [1-99 or e/p/w or 0 to quit] 3
 
Volume 0 is DevHandle 011e, Bus 1 Target 0, Type RAID1 (Mirroring)
 
Volume 0 State:  degraded, enabled, resync in progress
Resync Progress:  total blocks 624943104, blocks remaining 484024888, 77%
 
RAID actions menu, select an option:  [1-99 or e/p/w or 0 to quit] 0

Warning

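The percentage value shown in the command result is NOT the completion percentage. It is the remaining percentage. In the example above, 77% of the blocks still have to be synchronised, so the resync is only about 22.5% complete.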
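
The completion percentage can be computed from the block counters on the Resync Progress line as (total - remaining) / total. The sketch below assumes the line format shown above; resync_done_percent is a hypothetical helper name.

```shell
# Read "lsiutil -p1 -a 3,0 21" output on stdin and print the completed
# share of the resync, computed as (total - remaining) * 100 / total.
# resync_done_percent is a hypothetical helper name.
resync_done_percent() {
    awk '
        /Resync Progress:/ {
            gsub(/,/, "")   # drop commas so the counters are clean fields
            for (i = 2; i <= NF; i++) {
                if ($i == "blocks" && $(i - 1) == "total") total = $(i + 1)
                if ($i == "remaining") remaining = $(i + 1)
            }
            if (total > 0) printf "%.1f%%\n", (total - remaining) * 100 / total
        }
    '
}
```

With the sample line above (624943104 total blocks, 484024888 remaining), this prints 22.5%.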

3Ware RAID controller

Danger

This RAID controller card is deprecated. We highly recommend that you contact OVH Support to schedule an intervention to replace it with an LSI or LSI MegaRaid RAID controller, as 3ware RAID controllers have proven to be rather unstable.