Configuring XFS on a Very Large RAID Array

This was one of those times when you’ve got a huge task at hand and you get bogged down in some tiny detail that consumes a full day… I’m setting up a new MythTV system, and the first step is configuring the RAID array. The hardware was a piece of cake and partitioning was no problem, but how do you create an XFS volume that’ll survive multiple expansions?

You see, I’ve been stung by this before, I think with ext3. You start out with a modest file system, expand it a couple of times, and then you’re out of inodes. You’ve got plenty of free space, but you can’t create any more files. Worse, you can’t add more inodes without destroying the file system, which means backing the whole thing up, recreating it, restoring, etc…
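
If you want to see how close you are to that wall, df can report inode usage alongside block usage; the mount point below is just a placeholder for wherever the file system lives:

# df -i /data

The IUse% column shows how much of the inode table is already consumed.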

Luckily, XFS allocates new inodes on the fly so that problem’s fixed. But I will be ‘growing’ this file system several times, so how do I start my 1.3 TB array with a file system that will grow to over 5 TB? Turns out to be fairly easy. Common wisdom on the Internet is to create an otherwise-default XFS file system, but with at least 512-byte inodes for anything over 1 TB. Since this array will probably grow to around 5 TB, I made my inodes 1k (1024 bytes). That’s supposed to spread the inodes around the file system a bit better and be a bit more efficient for directories with lots of files in them. Good so far.
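
If you want to double-check what you ended up with after creating the file system, xfs_info reports the inode size (the isize field on the meta-data line) for a mounted XFS volume; /data here is just a stand-in for wherever you mount the array:

# xfs_info /data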

Next up: performance. I ran some tests and was only getting about 180 MB/s read/write throughput. Compared to other benchmarks on the Internet, that seemed pretty low.

I thought my XFS RAID parameters were incorrect, so I started goofing off with the -d su=XX,sw=XX parameters to speed things up. These parameters tell XFS what the underlying RAID geometry is so that it can read and write to the array efficiently. The su (stripe unit) is the RAID chunk size (64k in my case) and sw (stripe width) is the number of data drives in the array, so the effective stripe width is su times sw. In my case, I have three drives total, one parity and two data, so sw=2 for an effective 128k stripe. After a while, nothing was making a performance impact. I then tested a single SATA II 7200 RPM drive by itself and got 40 MB/s of throughput. Suddenly 180 MB/s on a two-data-drive array ain’t lookin’ so bad. Then I noticed the 400 MB/s benchmarks came from eight-drive arrays. The more drives, the faster the array, so that probably explained it.
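
For reference, here’s roughly what those geometry options look like when combined with the inode size from earlier; the 64k chunk and sw=2 match my three-drive, single-parity layout, and the device path is the same LVM volume used later in this post:

# /sbin/mkfs.xfs -i size=1k -d su=64k,sw=2 -f /dev/data/lvol0

Note that mkfs.xfs takes the chunk size with a k suffix and the stripe width as a multiplier (number of data drives), not a byte count.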

The tests I used were primarily:
Write test:

# dd if=/dev/zero of=10gb bs=1M count=10240
10737418240 bytes (11 GB) copied, 22.459 seconds, 478 MB/s

Read test:

# dd if=10gb of=/dev/zero bs=1M count=10240
10737418240 bytes (11 GB) copied, 28.7843 seconds, 373 MB/s
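
One caveat on those dd numbers: without forcing the data to disk or bypassing the page cache, they partly measure RAM rather than the array. If you want more conservative figures, variants like these (same test file, purely illustrative) sync the write before reporting, drop the cache, and then do a direct-I/O read:

# dd if=/dev/zero of=10gb bs=1M count=10240 conv=fdatasync
# echo 3 > /proc/sys/vm/drop_caches
# dd if=10gb of=/dev/null bs=1M iflag=direct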

I also tried bonnie++ but the results were largely incomprehensible.

That’s when I decided to stop caring about the stripe size and width, primarily because they didn’t appear to make a difference and also because it seems you can’t change them after you create the file system. If I baked in the parameters above, what would happen when I added another drive? I’d be hosed, I suspect.

The command I used to create the file system was:

/sbin/mkfs.xfs -i size=1k -f /dev/data/lvol0

…where /dev/data/lvol0 is my LVM logical volume.

I’ll be expanding the array shortly. I’ll let you know how it goes.
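
For the curious, the grow step should just be the usual LVM-plus-xfs_growfs sequence, something along these lines; the /dev/md0 physical volume and /data mount point are placeholders for whatever your setup actually uses:

# pvresize /dev/md0
# lvextend -l +100%FREE /dev/data/lvol0
# xfs_growfs /data

xfs_growfs works on the mounted file system, so no downtime is required.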
