Great set of file system benchmarks over at alephnull

Over at alephnull they’ve put together a really nice set of file system benchmarks, tracking all sorts of options that can improve or hurt performance in various mdraid configurations. From their site:

Linux Software RAID Performance Comparisons (2012)

The goal of this study is to determine the best configuration parameters for a 5-spindle software RAID using Linux as an NFS file server for Big Data analysis and (mostly write) backup storage.

Large sequential read and write access to files ranging from about 512MB to 4GB is particularly interesting.

Those with more spindles, more users, or more complex I/O patterns should not expect these results to apply to or scale to their environment.

Experimental details are provided below. Here is a summary of the recommendations.

A Comparison of Chunk Size for Software RAID-5

A Comparison of Cache Size for Software RAID-5

A Comparison of Payload Align Size for Cryptsetup Over Software RAID-5

A Comparison of Software RAID-5 With and Without Encryption and Ext4fs

A Comparison of Various Ext4fs Options

A Comparison of NFSv4 rsize and wsize Values

 

Linux Software RAID Performance Comparisons

The goal of this study is to determine the cheapest reasonably performant solution for a 5-spindle software RAID configuration using Linux as an NFS file server for a home office. Normal I/O includes home directory service, mostly-read-only large file service (e.g., MP3s), and nightly rsync-based backup.

Those with more spindles, more users, or more complex I/O patterns should not expect these results to apply to or scale to their environment.

Experimental details are provided below. Here is a summary of the recommendations.

A Comparison of Two Cheap 8-port SATA Controllers

When using software RAID, on mostly-single-threaded workloads, the Supermicro AOC-SAT2-MV8 has superior price/performance compared with the Promise SuperTrak STTX8650.

A Comparison of Chunk Size for Software RAID-5

The chunksize for 5-spindle RAID-5 should be 128KB (“mdadm … -c 128”).

A Comparison of Software RAID Types

RAID-5 and RAID-6 have nearly identical performance for I/O sizes less than about 256KB. Never use RAID-0.

A Comparison of Cryptographic Algorithms for Software RAID-5

AES-128 is fastest, although AES-256 is comparable for most workloads and should be considered for potentially better security.

When using a chunksize of 128KB for 5-spindle RAID-5, use the recommended full-stripe size of 512KB (1024 sectors) for the align-payload parameter for cryptsetup.
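
The 512KB figure follows from the array geometry: each stripe of a 5-spindle RAID-5 holds four data chunks plus one parity chunk, and cryptsetup counts align-payload in 512-byte sectors. A minimal sketch of that arithmetic in Python follows; the function names are illustrative and not part of any tool.

```python
SECTOR_BYTES = 512  # cryptsetup counts --align-payload in 512-byte sectors


def full_stripe_kb(chunk_kb, spindles, parity_disks=1):
    """Data held by one full stripe, in KB (parity chunks excluded)."""
    data_disks = spindles - parity_disks
    return chunk_kb * data_disks


def kb_to_sectors(size_kb):
    """Convert a size in KB to a count of 512-byte sectors."""
    return size_kb * 1024 // SECTOR_BYTES


# RAID-5 keeps one parity chunk per stripe, so 5 spindles give 4 data chunks.
stripe_kb = full_stripe_kb(chunk_kb=128, spindles=5)
align_payload = kb_to_sectors(stripe_kb)
print(stripe_kb, align_payload)
```

For a different chunk size or spindle count, the same two steps give the value to pass as align-payload.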

A Comparison of Stride and Stripe-Width Values (for ext3 and ext4dev)

When using a 5-spindle RAID-5 with a chunksize of 128KB and a value of 512KB for the align-payload parameter, using the recommended stride of 128KB and stripe-width of 512KB is reasonable for both ext3 and ext4.

A Comparison of Ext3 and Ext4dev

Quite similar for these uncached tests. Ext4dev may provide better multi-threaded and mixed read/write performance.

A Comparison of Various NFS Configurations

Memory speed doesn’t much matter.

CPU speed is significant.

Larger [rw]size is better for multi-threaded workloads, but worse for single threaded workloads. 64KB is a good compromise.

Summary of Comparisons

For 5-spindle RAID-5, chunk size should be 128k. The AES-256 encrypted file system payload alignment should be 4 times this value, or 512k. The file system should be informed of these values as a stride of 128k and a stripe-width of 512k. For NFS, the rsize and wsize values should be at least 64k. NFSv4 should be used because it avoids lockd issues.

As a departure from this summary, consider AES-128 for improved write performance.

Final Build

After this analysis, the file server was built with the following commands:

  • fdisk -H 224 -S 56 /dev/sd[bcdef], start 1, end 155000
    The fdisk parameters were recommended by Ted Ts’o for SSDs and should not help with spinning media.
  • badblocks -b 4096 -s -v -w -t random /dev/sd[bcdef]1
    When running 5 copies simultaneously, 2 drives sustained 30MB/s writes, and the other 3 sustained 15MB/s — the difference being in the controller’s ports. Reads were 40MB/s on the two drives attached to the faster ports, and 20MB/s on the other ports.
  • mdadm -C /dev/md0 /dev/sd[bcdef]1 -c 128 -n 5 -l 5
    The -c parameter sets the chunk size to 128k. RAID reconstruction proceeded by reading at about 24MB/s from 4 disks and writing at 24MB/s to the other disk.
  • cryptsetup -c aes-cbc-essiv:sha256 -s 128 -h sha256 --align-payload=1024 luksFormat /dev/md0
    This uses aes-128 with a 512k payload alignment.
  • cryptsetup luksOpen /dev/md0 r0
  • pvcreate --metadatasize 506k /dev/mapper/r0
    This is also from Ted’s blog, but in this case I want to align the boundaries to 512k, since that is the payload alignment for encryption.
  • pvs /dev/mapper/r0 -o+pe_start
  • vgcreate -s 1g v0 /dev/mapper/r0
  • lvcreate -L 2t -n m v0
  • lvcreate -L 300g -n o v0
  • mke2fs -t ext4 -i1048576 -m0 -E stride=32,stripe-width=128 /dev/mapper/v0-m
  • mke2fs -t ext4 -i65536 -m0 -E stride=32,stripe-width=128 /dev/mapper/v0-o
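
The stride=32 and stripe-width=128 values in the mke2fs commands are the 128KB chunk and 512KB full stripe expressed in filesystem blocks rather than kilobytes. A rough sketch of the conversion is below. It assumes a 4KB ext4 block size, which the commands above do not set explicitly with -b, and the helper name is hypothetical.

```python
def ext4_raid_options(chunk_kb, data_disks, block_kb=4):
    """Return (stride, stripe_width) in filesystem blocks.

    block_kb=4 is an assumption: the mke2fs commands in the post
    do not pass -b, so this relies on the usual 4KB block size.
    """
    stride = chunk_kb // block_kb       # blocks per RAID chunk
    stripe_width = stride * data_disks  # blocks per full data stripe
    return stride, stripe_width


# 5-spindle RAID-5: 4 data disks, 128KB chunk.
stride, stripe_width = ext4_raid_options(chunk_kb=128, data_disks=4)
print(f"-E stride={stride},stripe-width={stripe_width}")
```

If the filesystem were created with a different block size, both values would scale accordingly.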
