Software RAID

Warning

It is dangerous to experiment with mdadm if you do not know what you are doing: you risk losing your data, so make a backup before doing anything. If you have just had a disk replaced in your server, you can refer to the Disk replacement section.


MDADM installation

mdadm might not be installed on the server. On distributions that use RPM packages, you can install it like so:

rpm -Uvh --nodeps ftp://ftp.ovh.net/made-in-ovh/sources/mdadm-2.5.5p1-1.i386.rpm
Retrieving ftp://ftp.ovh.net/made-in-ovh/sources/mdadm-2.5.5p1-1.i386.rpm
Preparing... ########################################### 100%
1:mdadm ########################################### 100%
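
On Debian-based distributions, mdadm should be available from the standard repositories and can be installed with apt:

apt-get install mdadm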

For more information, use the --help flag:

mdadm --misc --help
Usage: mdadm misc_option devices...

This usage is for performing some task on one or more devices, which
may be arrays or components, depending on the task.
The --misc option is not needed (though it is allowed) and is assumed
if the first argument is a misc option.

Options that are valid with the miscellaneous mode are:
  --query          -Q : Display general information about how a
                        device relates to the md driver
  --detail         -D : Display details of an array
  --examine        -E : Examine superblock on an array component
  --examine-bitmap -X : Display contents of a bitmap file
  --zero-superblock   : erase the MD superblock from a device.
  --run            -R : start a partially built array
  --stop           -S : deactivate array, releasing all resources
  --readonly       -o : mark array as readonly
  --readwrite      -w : mark array as readwrite
  --test           -t : exit status 0 if ok, 1 if degraded, 2 if dead, 4 if missing

Information

To get information about the status of the RAID, use the --misc mode with the --detail option:

mdadm --misc --detail /dev/md1
/dev/md1:
Version : 00.90.00
Creation Time : Tue Nov 7 22:01:16 2006
Raid Level : raid1
Array Size : 3068288 (2.93 GiB 3.14 GB)
Device Size : 3068288 (2.93 GiB 3.14 GB)
Raid Devices : 2
Total Devices : 1
Preferred Minor : 1
Persistence : Superblock is persistent
 
Update Time : Wed Nov 8 15:42:35 2006
State : active, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
 
UUID : 4a9a446d:af55e24b:b311aa61:8dc74ed4
Events : 0.12
 
    Number   Major   Minor   RaidDevice   State
       0       8       1         0        active sync   /dev/sda1
       1       0       0         1        removed

We can see that we have a 3 GB RAID1 array, which is running in degraded mode because the sdb1 partition is missing.
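
For a quicker one-line summary, you can also use the --query option listed in the help above:

mdadm --query /dev/md1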

Listing partitions on a disk

cat /proc/partitions

or

parted /dev/sdX print

Creating a Software RAID

Requirements

To create a software RAID on your server, you need to:

  • Have root access to the server
  • Have a Linux server
  • Have two hard drives of the same size (for partition parity)

Warning

This guide assumes that you want to create a software RAID on two blank hard drives. If this is not the case, you will lose your data.

Finding usable disks

The first step is to determine which disks you want to use to create your RAID.

This example will show how to create a RAID using two blank hard drives with the same capacity.

Use the parted -l command to find out which disks you can use:

parted -l
Model: ATA Hitachi HDP72502 (scsi)
Disk /dev/sda: 172GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
 
Number Start End Size Type File system Flags
1 1049kB 20,0GB 20,0GB primary ext4 boot, raid
2 20,0GB 170GB 150GB primary ext4 raid
3 170GB 172GB 1998MB primary linux-swap(v1)
 
 
Model: ATA Hitachi HDP72502 (scsi)
Disk /dev/sdb: 172GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
 
Number Start End Size Type File system Flags
1 1049kB 20,0GB 20,0GB primary ext4 raid
2 20,0GB 170GB 150GB primary ext4 raid
3 170GB 172GB 1998MB primary linux-swap(v1)
 
 
Model: ATA ST3250820AS (scsi)
Disk /dev/sdc: 268GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
 
Number Start End Size Type File system Flags
 
 
Model: ATA ST3250820AS (scsi)
Disk /dev/sdd: 268GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
 
Number Start End Size Type File system Flags
 
 
Model: Linux Software RAID Array (md)
Disk /dev/md1: 20,0GB
Sector size (logical/physical): 512B/512B
Partition Table: loop
 
Number Start End Size File system Flags
1 0,00B 20,0GB 20,0GB ext4
 
 
Model: Linux Software RAID Array (md)
Disk /dev/md2: 150GB
Sector size (logical/physical): 512B/512B
Partition Table: loop
 
Number Start End Size File system Flags
1 0,00B 150GB 150GB ext4

In this example we can see two RAIDs:

  • /dev/md1
  • /dev/md2

And two blank disks (without partitions):

  • /dev/sdc
  • /dev/sdd

So we will create a RAID1 array with a partition on each disk (sdc and sdd).

Creating the partitions

Here we have two blank disks, so we will create a partition on the first disk and then copy that partition layout to the second disk.

We will again use parted to create the partitions. If the disk is blank, we must first set either an msdos or a gpt label.

Note

Use a gpt label for disks larger than 2 TB.

To set a label use this command:

parted /dev/sdc -s mklabel msdos

Warning

This command is destructive: do NOT run it on a disk that is not blank, or all the data it contains will be lost. You also need to replace /dev/sdc with the correct path to your disk.
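
If the disk is larger than 2 TB, set a gpt label instead (the same warning applies):

parted /dev/sdc -s mklabel gpt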

Then we can locate the beginning and end of the free space:

parted /dev/sdc -s print free
Model: ATA ST3250820AS (scsi)
Disk /dev/sdc: 268GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
 
Number Start End Size Type File system Flags
32,3kB 268GB 268GB Free Space

This tells us that the new partition can begin at 32,3kB and end at 268GB.

So we create the partition like so:

parted --align optimal /dev/sdc mkpart primary 32,3kB 268GB

Finally, we can copy our new partition layout to the second disk.

If you are using a GPT label

Use this command:

sgdisk -R=/dev/sdd /dev/sdc
/dev/sdc: the disk to copy the partition table from
/dev/sdd: the disk to copy the partition table to
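
Note that sgdisk -R also copies the disk and partition GUIDs, so it is good practice to randomize them on the destination disk afterwards:

sgdisk -G /dev/sdd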

If you are using an msdos label

sfdisk -d /dev/sdc | sfdisk /dev/sdd
Checking that no-one is using this disk right now ...
OK
 
Disk /dev/sdd: 32635 cylinders, 255 heads, 63 sectors/track
Old situation:
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0

Device Boot Start End #blocks Id System
/dev/sdd1 0 - 0 0 0 Empty
/dev/sdd2 0 - 0 0 0 Empty
/dev/sdd3 0 - 0 0 0 Empty
/dev/sdd4 0 - 0 0 0 Empty
New situation:
Units = sectors of 512 bytes, counting from 0

Device Boot Start End #blocks Id System
/dev/sdd1 63 524287999 524287937 83 Linux
/dev/sdd2 0 - 0 0 Empty
/dev/sdd3 0 - 0 0 Empty
/dev/sdd4 0 - 0 0 Empty
Warning: partition 1 does not end at a cylinder boundary
 
Warning: no primary partition is marked bootable (active)
This does not matter for LILO, but the DOS MBR will not boot this disk.
Successfully wrote the new partition table
 
Re-reading the partition table ...
 
If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)
to zero the first 512 bytes:  dd if=/dev/zero of=/dev/foo7 bs=512 count=1
(See fdisk(8).)
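
To check that both disks now carry the same partition layout, you can print the partition table of the destination disk:

parted /dev/sdd -s print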

Creating the RAID

To create the RAID, use the following command:

mdadm --create /dev/mdZ --metadata=0.90 --level=L --assume-clean --raid-devices=N /dev/sdX1 /dev/sdY1
/dev/mdZ: the RAID volume you want to create (it must not already exist)
--level: the RAID type (0, 1, 5, 6)
--raid-devices: the number of devices in the RAID

In our example, we will create a RAID1 /dev/md4 with the partitions /dev/sdc1 and /dev/sdd1:

mdadm --create /dev/md4 --metadata=0.90 --level=1 --assume-clean --raid-devices=2 /dev/sdc1 /dev/sdd1

Now we must add the newly created RAID to the mdadm.conf file. To find the information you need to add, use this command:

mdadm --examine --scan --verbose
ARRAY /dev/md/1 level=raid1 metadata=0.90 num-devices=2 UUID=168dcc81:bf4adfaa:ac9374dc:fcf4afea name=nsxxxxxx:0 devices=/dev/sdb1,/dev/sda1
ARRAY /dev/md/2 level=raid1 metadata=0.90 num-devices=2 UUID=2a6589cb:c252378a:bdd0da02:6812cf40 name=nsxxxxxx:1 devices=/dev/sdb2,/dev/sda2
ARRAY /dev/md4 level=raid1 num-devices=2 UUID=17a97424:924182e2:77f5fb22:80a1782a devices=/dev/sdd1,/dev/sdc1

So the line we need to add should look like this:

ARRAY /dev/md4 level=raid1 num-devices=2 UUID=17a97424:924182e2:77f5fb22:80a1782a devices=/dev/sdd1,/dev/sdc1

We will add this line to /etc/mdadm/mdadm.conf.
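
One way to append it, assuming the new array is /dev/md4 (note that the file may be located at /etc/mdadm.conf on some distributions):

mdadm --detail --brief /dev/md4 >> /etc/mdadm/mdadm.conf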

Custom kernel with initrd

If you have a custom kernel with an initrd, you must also modify the mdadm.conf inside the initrd.

Find the initrd file using this command:

grep -s initrd /boot/grub/grub.cfg /boot/grub.cfg /etc/lilo.conf | head -1
/boot/grub/grub.cfg:    initrd  /boot/initrd.img-3.2.0-4-amd64

Create a safety backup of your initrd:

cp -a /boot/initrd.img-3.2.0-4-amd64 /boot/initrd.img-3.2.0-4-amd64.old

Create a temporary directory:

mkdir /tmp/initrd && cd /tmp/initrd

Find the compression used for the initrd:

file /boot/initrd.img-3.2.0-4-amd64
/boot/initrd.img-3.2.0-4-amd64: gzip compressed data, from Unix, last modified: Fri May 16 18:37:00 2014, max compression

Extract the initrd:

gzip

gunzip -c '/boot/initrd.img-3.2.0-4-amd64' | cpio -i

lzma

lzma -dc -S .lz /boot/initrd.img-3.2.0-4-amd64 | cpio -id

Modify the mdadm.conf file as explained in the Creating the RAID section.
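
In the extracted tree (here /tmp/initrd), the file is usually located at etc/mdadm/mdadm.conf. For example, to append the same ARRAY line as before:

echo 'ARRAY /dev/md4 level=raid1 num-devices=2 UUID=17a97424:924182e2:77f5fb22:80a1782a devices=/dev/sdd1,/dev/sdc1' >> etc/mdadm/mdadm.conf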

Finally, recreate the initrd:

gzip

find * | cpio -o -H newc | gzip -9 -c > '/boot/initrd.img-3.2.0-4-amd64'

lzma

find * | cpio --quiet --dereference -o -H newc | lzma -7 > /boot/initrd.img-3.2.0-4-amd64
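
On Debian-based systems, you can sanity-check the rebuilt image by listing its contents and verifying that your mdadm.conf is included:

lsinitramfs /boot/initrd.img-3.2.0-4-amd64 | grep mdadm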

Formatting the RAID

To complete your RAID array, you must format the new volume. In this example we will format it using ext4:

mkfs.ext4 /dev/md4
mke2fs 1.42.5 (29-Jul-2012)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
16384000 inodes, 65535968 blocks
3276798 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
2000 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872
 
Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

Creating the mount point

If you want your new RAID partition to be automatically mounted on your server at boot time, you need to add the line /dev/mdZ /mnt/data FILESYSTEM defaults 0 2 to the /etc/fstab file.

/dev/mdZ: the path of your RAID volume
/mnt/data: the mount point where the partition will be mounted
FILESYSTEM: the filesystem used on the RAID volume

So, following our example, our fstab line should be /dev/md4 /mnt/data ext4 defaults 0 2
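
Make sure the mount point directory exists before mounting:

mkdir -p /mnt/data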

To make sure your volume mounts correctly, use this command:

mount -a

Verifications

To verify the state of your RAID array:

cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty]
md4 : active raid1 sdd1[1] sdc1[0]
      262143872 blocks [2/2] [UU]

md1 : active raid1 sda2[0] sdb2[1]
      146157440 blocks [2/2] [UU]

md0 : active raid1 sda1[0] sdb1[1]
      19514240 blocks [2/2] [UU]

unused devices: <none>

To verify that the volume is mounted, you can use either df or mount:

df -h
Filesystem            Size  Used Avail Use% Mounted on
udev                   10M     0   10M   0% /dev
tmpfs                 5.0M     0  5.0M   0% /run/lock
tmpfs                 3.4G   13M  3.4G   1% /run/shm
tmpfs                 1.6G  388K  1.6G   1% /run
/dev/md2               20G  1.8G   17G  10% /
/dev/md4              1.8T  195G  1.6T  12% /mnt/data
mount
udev on /dev type devtmpfs (rw,relatime,size=10240k,nr_inodes=2038465,mode=755)
tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k)
tmpfs on /run/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,size=3475020k)
tmpfs on /run type tmpfs (rw,nosuid,noexec,relatime,size=1632760k,mode=755)
/dev/md2 on / type ext3 (rw,relatime,errors=remount-ro,user_xattr,acl,barrier=0,data=ordered)
/dev/md4 on /mnt/data type ext4 (rw,relatime,errors=remount-ro,user_xattr,acl,barrier=0,data=ordered)

Disk replacement

If you have software RAID and your server has just had a disk replaced, the new disk is blank and contains no partition table.

A quick way to determine which disk has no partition table is to use the parted command:

parted -l
Error: /dev/sda: unrecognised disk label
Model: ATA Hitachi HUA72201 (scsi)
Disk /dev/sda: 1000GB
Sector size (logical/physical): 512B/512B
Partition Table: unknown
Disk Flags:
 
Model: ATA ST1000DM003-9YN1 (scsi)
Disk /dev/sdb: 1000GB
Sector size (logical/physical): 512B/4096B
Partition Table: msdos
Disk Flags:
 
Number  Start   End     Size    Type     File system     Flags
  1      2097kB  21.5GB  21.5GB  primary  ext4            boot, raid
  2      21.5GB  1000GB  978GB   primary  ext4            raid
  3      1000GB  1000GB  538MB   primary  linux-swap(v1)
  [...]

In this specific example, we see the new disk is /dev/sda as it doesn’t have any partition table.

You must then copy the partition table from the disk containing the data to the new disk.

If sda is the new disk:

sfdisk -d /dev/sdb | sfdisk /dev/sda

If sdb is the new disk:

sfdisk -d /dev/sda | sfdisk /dev/sdb

Note

sfdisk does not support GPT partition tables, which are becoming more and more common.

So if the server uses GPT with software RAID, use sgdisk instead:

If sda is the new disk:

sgdisk -R=/dev/sda /dev/sdb

If sdb is the new disk:

sgdisk -R=/dev/sdb /dev/sda

Adding a disk in degraded mode

Here is an example of the commands used to add a disk to a RAID array in degraded mode:

mdadm /dev/md1 --manage --add /dev/sdb1
mdadm: hot added /dev/sdb1

The RAID is now rebuilding:

cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid5]
read_ahead 1024 sectors
md1 : active raid1 sdb1[2] sda1[0]
      3068288 blocks [2/1] [U_]
      [=>...................]  recovery =  5.4% (167028/3068288) finish=0.8min speed=55676K/sec
md2 : active raid1 sda2[0]
      240597376 blocks [2/1] [U_]
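
You can follow the rebuild progress in real time with watch:

watch -n 2 cat /proc/mdstat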

A few minutes later the reconstruction will be finished:

mdadm --misc --detail /dev/md1
/dev/md1:
Version : 00.90.00
Creation Time : Tue Nov 7 22:01:16 2006
Raid Level : raid1
Array Size : 3068288 (2.93 GiB 3.14 GB)
Device Size : 3068288 (2.93 GiB 3.14 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 1
Persistence : Superblock is persistent
 
Update Time : Wed Nov 8 16:01:02 2006
State : active
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
 
UUID : 4a9a446d:af55e24b:b311aa61:8dc74ed4
Events : 0.14
 
    Number   Major   Minor   RaidDevice   State
       0       8       1         0        active sync   /dev/sda1
       1       8      17         1        active sync   /dev/sdb1

We will do the same on md2:

mdadm /dev/md2 --manage --add /dev/sdb2
mdadm: hot added /dev/sdb2

The rebuild is in progress:

mdadm --misc --detail /dev/md2
/dev/md2:
Version : 00.90.00
Creation Time : Tue Nov 7 22:01:16 2006
Raid Level : raid1
Array Size : 240597376 (229.45 GiB 246.37 GB)
Device Size : 240597376 (229.45 GiB 246.37 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 2
Persistence : Superblock is persistent
 
Update Time : Wed Nov 8 16:06:43 2006
State : active, degraded, recovering
Active Devices : 1
Working Devices : 2
Failed Devices : 0
Spare Devices : 1
 
Rebuild Status : 0% complete
 
UUID : e8754328:26f28599:8b834aba:ade37773
Events : 0.10
 
    Number   Major   Minor   RaidDevice   State
       0       8       2         0        active sync   /dev/sda2
       1       0       0         1        removed

       2       8      18         2        spare   /dev/sdb2
cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid5]
read_ahead 1024 sectors
md1 : active raid1 sdb1[1] sda1[0]
      3068288 blocks [2/2] [UU]

md2 : active raid1 sdb2[2] sda2[0]
      240597376 blocks [2/1] [U_]
      [>....................]  recovery =  0.2% (655104/240597376) finish=73.2min speed=54592K/sec
unused devices: <none>

Finally, check the size of your swap:

cat /proc/swaps
Filename      Type        Size    Used   Priority
/dev/sdb3     partition   522104  43984  -1

You must now recreate the swap area on the new disk and activate it:

mkswap /dev/sda3
swapon -a

You can see with the free command that the whole swap space is activated again.
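
For example (the Swap line should show the full size again):

free -m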