TL2000_PowerVault_User_Guide_UG_en (PDF)
This started as testing the TL2000 tape device and ended up as an exercise in optimising the read performance of the Dell R510 server.
Goal: optimise read performance of home directories on the Dell R510.
This was not expected: the disks originally in maris were slower than those in both desiree and cabbage. The "slow" disks are now in desiree. The penalty was roughly 20% when measured with RAID 10 and the tar read test of the home directories.
RAID card: stripe size of 1024k (1M) = maximum. Default settings for read and write (I think these are Adaptive Read Ahead and Write Back).
LVM: standard LVM setup as documented, making sure to get the alignment right.
yum --enablerepo=dag install gdisk
gdisk /dev/sdb
pvcreate -v /dev/sdb1
VGNAME=$(perl -e 'use POSIX; $v=(uname())[1];$v=~ s/\..*//;print ucfirst($v)."Span0\n"')
vgcreate -v $VGNAME --physicalextentsize 32M /dev/sdb1
lvcreate -l 100%VG --name data -v $VGNAME
File-system:
IDEV=/dev/$VGNAME/data
mkfs.xfs -d su=1024k,sw=5 -i attr=2 -l internal,version=2 $IDEV
fstab mount options: defaults,nosuid,nodev,quota,noatime,logbufs=8,logbsize=256k
Note: slight performance improvement without LVM, maybe 5%
RAID card: stripe size of 64k = minimum. Default settings for read and write (I think these are Adaptive Read Ahead and Write Back).
LVM: standard LVM setup as documented, making sure to get the alignment right.
yum --enablerepo=dag install gdisk
gdisk /dev/sdb
pvcreate -v /dev/sdb1
VGNAME=$(perl -e 'use POSIX; $v=(uname())[1];$v=~ s/\..*//;print ucfirst($v)."Span0\n"')
vgcreate -v $VGNAME --physicalextentsize 32M /dev/sdb1
lvcreate -l 100%VG --name data -v $VGNAME
File-system:
IDEV=/dev/$VGNAME/data
mkfs.xfs -d su=64k,sw=9 -i attr=2 -l internal,version=2 $IDEV
fstab mount options: defaults,nosuid,nodev,quota,noatime,logbufs=8,logbsize=256k
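To double-check that the su/sw values and the LVM alignment actually took effect, something like the following should do it (a sketch, not part of the recorded runs; xfs_info reports sunit/swidth in filesystem blocks, so with the default 4k block size the 1024k/sw=5 layout should show sunit=256, swidth=1280):
xfs_info /dev/$VGNAME/data
# pe_start should be a multiple of the full stripe width
pvs -o +pe_start /dev/sdb1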
Power on the TL2000
The device will be created on the attached server:
/dev/changer
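If the node doesn't show up, the SCSI view can be checked with something like this (assumes the lsscsi package is available; the changer appears as type "mediumx", the drive as "tape"):
lsscsi -g
ls -l /dev/changer /dev/st0 /dev/nst0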
Check status
mtx -f /dev/changer status
Storage Changer /dev/changer:1 Drives, 23 Slots ( 0 Import/Export )
Data Transfer Element 0:Empty
Storage Element 1:Empty
Storage Element 2:Empty
Storage Element 3:Empty
Storage Element 4:Empty
Storage Element 5:Empty
Storage Element 6:Empty
- snip -
Notes:
Data Transfer Element 0 is the tape drive
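The drive itself (as opposed to the changer) is driven through the st driver; a quick check, assuming it turned up as /dev/st0:
mt -f /dev/st0 status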
mtx -f /dev/changer inquiry
Product Type: Medium Changer
Vendor ID: 'IBM '
Product ID: '3573-TL '
Revision: '9.50'
Attached Changer: No
# Load media from slot 1 into drive:
$ mtx -f /dev/sg3 load 1
# status query includes:
Data Transfer Element 0:Full (Storage Element 1 Loaded)
# unload drive to slot 1 (default as not specified):
mtx -f /dev/changer unload
# load from slot 24:
mtx -f /dev/changer load 24
# test transfer of data to tape:
tar cpvf /dev/st0 ./howto/
# Errors you get if you try to write to /dev/changer or /dev/sg3
tar -cpvf /dev/sg3 ./howto/
tar: /dev/sg3: Cannot write: Cannot allocate memory
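/dev/changer and /dev/sg3 are control devices for the changer, not the tape data path, so tar has to go to the st driver instead. For the record, /dev/st0 rewinds on close while /dev/nst0 does not, which matters when appending several archives to one tape; a sketch (./other/ is just a placeholder directory):
# two archives back to back on the same tape
tar cpf /dev/nst0 ./howto/
tar cpf /dev/nst0 ./other/
# rewind, skip the first filemark, then list the second archive
mt -f /dev/nst0 rewind
mt -f /dev/nst0 fsf 1
tar tvf /dev/nst0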
# table of contents:
tar tvf /dev/st0
# Test backup of two copies of the DAMTP home directories:
cd /local/sync
# trigger a scan of the file system:
find . -iname "*rtljkhfgnalgnalegn*"
# then the actual tar of 1.3TB of files:
time tar cpf /dev/st0 ./cinghome ./cinghome2
real 1694m23.938s
user 6m53.381s
sys 55m43.215s
28.24 hours for 1.26TB of home directory data
(1260*1000)/(1694*60) = 12.4 MB/sec
# this time with compression:
time tar cpzf /dev/st0 ./cinghome ./cinghome2
real 2480m50.416s
user 1505m25.931s
sys 78m24.657s
(1260*1000)/(2480*60) = 8.5 MB/sec
# create a 128GB tar archive on disk:
maris:/local/sync/cinghome $ tar cpf /local/sync/test_data.tar ./gr ./chaos -b 300
# time the same tar to tape:
maris:/local/sync/cinghome $ time tar cpf /dev/st0 ./gr ./chaos -b 300
real 108m30.785s
user 0m14.866s
sys 3m5.396s
(128*1000)/(108*60) = 19.75 MB/sec
# time the same tar to /dev/null via dd
maris:/local/sync/cinghome $ time tar cpf - ./gr ./chaos -b 300 | dd of=/dev/null bs=100MB
real 90m31.963s
user 0m16.024s
sys 3m38.151s
(128*1000)/(90.5*60) = 23.57 MB/sec
# repeat:
real 89m46.543s
user 0m16.793s
sys 3m42.144s
(128*1000)/(89.75*60) = 23.77 MB/sec
# reboot and repeat
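(An alternative to a full reboot for getting a cold cache between runs would be to drop the page cache; not what was done here, just a sketch:)
sync
echo 3 > /proc/sys/vm/drop_caches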
# same test on cabbage
cabbage:/local/backup/cinghome $ time tar cpf - ./gr ./chaos -b 300 | dd of=/dev/null bs=100MB
real 84m10.258s
user 0m15.674s
sys 3m46.482s
(128*1000)/(84.22*60) = 25.33 MB/sec
# reboot and repeat
cabbage:/local/backup/cinghome $ time tar cpf - ./gr ./chaos -b 300 | dd of=/dev/null bs=100MB
real 85m19.239s
user 0m15.635s
sys 3m36.686s
(128*1000)/(85.33*60) = 25 MB/sec
# reboot and mount with nodiratime option
cabbage:/local/backup/cinghome $ time tar cpf - ./gr ./chaos -b 300 | dd of=/dev/null bs=100MB
real 85m27.705s
user 0m15.716s
sys 3m37.414s
= same
# try with RAID6
# desiree, RAID6 over 12 2TB HDDs
desiree:/local/backup/cinghome $ time tar cpf - ./gr ./chaos -b 300 | dd of=/dev/null bs=100MB
real 44m18.994s
user 0m14.170s
sys 3m31.057s
(128*1000)/(44.33*60) = 48.12 MB/sec
# reboot and repeat
desiree:/local/backup/cinghome $ time tar cpf - ./gr ./chaos -b 300 | dd of=/dev/null bs=100MB
real 47m34.463s
user 0m15.097s
sys 3m29.015s
(128*1000)/(47.5*60) = 44.91 MB/sec
# Now with other mount options (logbufs=8,logbsize=256k)
desiree:/local/backup/cinghome $ time tar cpf - ./gr ./chaos -b 300 | dd of=/dev/null bs=100MB
real 47m5.655s
user 0m14.659s
sys 3m33.329s
= same
# maris RAID10 with logbufs=8,logbsize=256k
maris:/local/sync/cinghome $ time tar cpf - ./gr ./chaos -b 300 | dd of=/dev/null bs=100MB
real 91m29.291s
user 0m16.067s
sys 3m41.533s
(128*1000)/(91.5*60) = 23.3 MB/sec
# RAID 10 on maris with nodiratime mount option for xfs
maris:/local/sync/cinghome $ time tar cpf - ./gr ./chaos -b 300 | dd of=/dev/null bs=100MB
# oops, reading more suggests that nodiratime is set when noatime is set
# xfs parameter changes
http://www.practicalsysadmin.com/wiki/index.php/XFS_optimisation
http://www.mythtv.org/wiki/XFS_Filesystem#Mounting_the_XFS_filesystem_with_high_performance_options
http://everything2.com/title/Filesystem+performance+tweaking+with+XFS+on+Linux
http://xfs.org/index.php/XFS_FAQ
# Perc performance tuning?
Suggests stripe size increase to 512k
# maris, remove noatime and nodiratime
maris:/local/sync/cinghome $ time tar cpf - ./gr ./chaos -b 300 | dd of=/dev/null bs=100MB
real 91m25.559s
user 0m16.183s
sys 3m40.502s
= same
# cabbage, lazy-count=1 during mkfs.xfs (command sketched below)
cabbage:/local/backup/cinghome $ time tar cpf - ./gr ./chaos -b 300 | dd of=/dev/null bs=100MB
real 87m53.122s
user 0m16.089s
sys 3m44.112s
= same
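For reference, lazy-count is a log option given at mkfs time; the command presumably looked something like this (the su/sw values are placeholders and should match whatever RAID layout is in use):
mkfs.xfs -d su=64k,sw=5 -i attr=2 -l internal,version=2,lazy-count=1 $IDEV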
# desiree, RAID6 over 11 disks (+ 1 HS)
desiree:/local/backup/cinghome $ time tar cpf - ./gr ./chaos -b 300 | dd of=/dev/null bs=100MB
real 46m23.132s
user 0m14.535s
sys 3m33.222s
(128*1000)/(46.4*60) = 45.97MB/sec
# simple write test
desiree:/local/backup $ time dd if=/dev/zero of=./2TB bs=1024k count=2048000
2048000+0 records in
2048000+0 records out
2147483648000 bytes (2.1 TB) copied, 4737.26 seconds, 453 MB/s
real 78m58.576s
user 0m2.875s
sys 44m37.443s
# same write test on maris with RAID10 over 10 disks (+ 2 HS)
2048000+0 records in
2048000+0 records out
2147483648000 bytes (2.1 TB) copied, 4044.37 seconds, 531 MB/s
real 67m25.275s
user 0m3.183s
sys 51m31.996s
# desiree
RAID6 (11 disks + 1 HS), stripe size 512k
desiree:/local/backup/cinghome $ time tar cpf - ./gr ./chaos -b 300 | dd of=/dev/null bs=100MB
real 58m54.944s
user 0m15.399s
sys 3m36.378s
(128*1000)/(59*60) = 36.15MB/sec
# simple write test
desiree:/local/backup $ time dd if=/dev/zero of=./2TB bs=1024k count=2048000
2147483648000 bytes (2.1 TB) copied, 2743.88 seconds, 783 MB/s
# cabbage
RAID10 (10 disks + 2 HS), stripe size 512k
cabbage:/local/backup/cinghome $ time tar cpf - ./gr ./chaos -b 300 | dd of=/dev/null bs=100MB
real 34m33.936s
user 0m14.291s
sys 3m35.915s
(128*1000)/(34.5*60) = 61.83MB/sec
# repeat to check: 30 mins (even faster)
# simple write test
cabbage:/local/backup/cinghome $ time dd if=/dev/zero of=./2TB bs=1024k count=2048000
2147483648000 bytes (2.1 TB) copied, 4309.78 seconds, 498 MB/s
# reboot and run read test again, to be sure.
real 36m0.916s
= same
# try without LVM, mkfs.xfs directly onto partition
# cabbage, RAID 10 + 2 HS, 512k stripe width
cabbage:/local/backup/cinghome $ time tar cpf - ./gr ./chaos -b 300 | dd of=/dev/null bs=100MB
real 32m1.775s
(128*1000)/(32*60) = 66.66MB/sec
# improved?
# reboot and re-run the test
real 33m27.284s
= conclude slight improvement without LVM, maybe 5%
# desiree, RAID6 + 1 HS, 512k stripe width
desiree:/local/backup/cinghome $ time tar cpf - ./gr ./chaos -b 300 | dd of=/dev/null bs=100MB
real 56m29.759s
(128*1000)/(56.5*60) = 37.76 MB/sec
# improved?
# reboot and re-run the test
real 57m1.217s
# slight improvement
# try with software RAID
# Other stripe sizes with each RAID level: 1MB, 256k
# cabbage, 1MB stripe width, RAID10 over 10 disks
cabbage:/local/backup/cinghome $ time tar cpf - ./gr ./chaos -b 300 | dd of=/dev/null bs=100MB
real 27m2.647s
# reboot and try again.
real 27m39.247s
(128*1000)/(27*60) = 79 MB/sec
# write test
time dd if=/dev/zero of=./2TB bs=1024k count=2048000
2147483648000 bytes (2.1 TB) copied, 3991.52 seconds, 538 MB/s
# desiree, 1MB stripe width, RAID6 over 11 disks
desiree:/local/backup/cinghome $ time tar cpf - ./gr ./chaos -b 300 | dd of=/dev/null bs=100MB
real 59m40.819s
# reboot and retry
real 61m13.788s
# write test
time dd if=/dev/zero of=./2TB bs=1024k count=2048000
2147483648000 bytes (2.1 TB) copied, 2837.29 seconds, 757 MB/s
# cabbage, 256k stripe width, RAID10 over 10 disks
IDEV=/dev/sdb1
mkfs.xfs -d su=256k,sw=5 -i attr=2 -l internal,version=2 $IDEV
cabbage:/local/backup/cinghome $ time tar cpf - ./gr ./chaos -b 300 | dd of=/dev/null bs=100MB
real 43m53.178s
time dd if=/dev/zero of=./2TB bs=1024k count=2048000
2147483648000 bytes (2.1 TB) copied, 4945.34 seconds, 434 MB/s
# desiree, 256k stripe width, RAID6 over 11 disks
IDEV=/dev/sdb1
mkfs.xfs -d su=256k,sw=9 -i attr=2 -l internal,version=2 $IDEV
desiree:/local/backup/cinghome $ time tar cpf - ./gr ./chaos -b 300 | dd of=/dev/null bs=100MB
real 46m27.165s
time dd if=/dev/zero of=./2TB bs=1024k count=2048000
2147483648000 bytes (2.1 TB) copied, 2960.77 seconds, 725 MB/s
# cabbage, software RAID10 over 10 disks
/sbin/mdadm --create --metadata=1.2 --verbose /dev/md0 --level=10 --raid-devices=10 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1 /dev/sdj1 /dev/sdk1
mkfs.xfs -d su=64k,sw=5 -i attr=2 -l internal,version=2 /dev/md0
cabbage:/local/backup/cinghome $ time tar cpf - ./gr ./chaos -b 300 | dd of=/dev/null bs=100MB
real 88m34.933s
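One caveat with the md arrays: they resync in the background after creation, which competes with the benchmark for disk bandwidth. Progress (and the resync speed ceiling) can be checked with something like:
cat /proc/mdstat
mdadm --detail /dev/md0
cat /proc/sys/dev/raid/speed_limit_max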
# desiree, software RAID6 over 11 disks
/sbin/mdadm --create --metadata=1.2 --verbose /dev/md0 --level=6 --raid-devices=11 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1 /dev/sdj1 /dev/sdk1 /dev/sdl1
mkfs.xfs -d su=64k,sw=9 -i attr=2 -l internal,version=2 /dev/md0
desiree:/local/backup/cinghome $ time tar cpf - ./gr ./chaos -b 300 | dd of=/dev/null bs=100MB
real 88m35.789s
# amazingly similar times for software RAID6 and RAID10
rsync -axSHR ./cinghome/ -e "ssh -2 -x -c arcfour" cabbage:/local/backup
# configure maris with the "winning" RAID level and other config
mkfs.xfs -d su=1024k,sw=5 -i attr=2 -l internal,version=2 $IDEV
# oh no, just 60MB/sec yet managed 80MB/sec before - must be missing something...
1156684185600 bytes (1.2 TB) copied, 2221.45 seconds, 521 MB/s
# configure desiree and cabbage with RAID10 over 10 disks with 1MB stripe size
# desiree
real 30m0.640s
(128*1000)/(30*60) = 71MB/sec
# cabbage
real 29m52.322s
(128*1000)/(30*60) = 71MB/sec
# run test, again, on maris
slower... 60MB/sec odd
# swapped disks between maris and desiree.
The RAID cards give warnings at boot; press "f" to import the foreign configuration from the disks.
Then a simple /etc/fstab edit to use the different LVM PV/VG name.
Wait for the consistency check to finish before the read test.
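To pick up the foreign VG after the swap, something along these lines should be enough (a sketch; the VG names follow the Hostname+Span0 convention used in the setup above):
vgscan
vgs        # e.g. MarisSpan0 vs DesireeSpan0
lvs
# edit /etc/fstab to mount /dev/<VGNAME>/data at the usual mount point, then
mount -a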
maris:/local/sync/cinghome $ time tar cpf - ./gr ./chaos -b 300 | dd of=/dev/null bs=100MB
real 30m16.459s
desiree:/local/backup/cinghome $ time tar cpf - ./gr ./chaos -b 300 | dd of=/dev/null bs=100MB
real 38m33.996s
# looks like something is wrong with the set of disks that was in maris and is now
# in desiree. Conclusion: use maris (now) for tape backup.
# Write tests (of small files)
# in /local/backup/cinghome/ or /local/sync/cinghome/
time rsync -axSHR gr ../
# desiree
real 62m48.320s
# maris
real 63m18.660s
# read/write test with RAID10, 10 disks, 64k and 512k stripe size
# cabbage, 64k stripe size
mkfs.xfs -d su=64k,sw=5 -i attr=2 -l internal,version=2 $IDEV
real 95m53.450s
real 97m28.721s
# maris, 512k stripe size
mkfs.xfs -d su=512k,sw=5 -i attr=2 -l internal,version=2 $IDEV
real 64m35.318s
real 64m50.431s
# desiree, 1024k stripe size, with "slow" set of disks swapped from maris
real 62m12.340s
real 62m49.950s
conclude that 1MB stripe size for RAID10 is optimal for read and write of
small files
# checking RAID initialisation status
/usr/local/sbin/MegaCli -LDInfo -LALL -aALL
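The same tool should also report progress of a background initialisation or consistency check while one is running; the syntax below is from memory, so treat it as an assumption:
/usr/local/sbin/MegaCli -LDBI -ShowProg -LALL -aALL
/usr/local/sbin/MegaCli -LDCC -ShowProg -LALL -aALL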
# tar test of 1.2TB to tape, maris
cd /local/sync
time tar cpf /dev/st0 ./cinghome ./cinghome2
real 746m3.958s
12.43 hours for 1.2TB (previously 28.2 hours)
(1260*1000)/(746*60) = 28.15 MB/sec
# maybe try changing the lvm "PE Size" (vgdisplay reports this as 32MB)
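The PE size is fixed at vgcreate time (it was set with --physicalextentsize 32M above), so changing it means recreating the VG and LV; a sketch, with 64M picked arbitrarily:
vgdisplay $VGNAME | grep "PE Size"
# destroys the existing LV/VG layout
lvremove $VGNAME/data
vgremove $VGNAME
vgcreate -v $VGNAME --physicalextentsize 64M /dev/sdb1
lvcreate -l 100%VG --name data -v $VGNAME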
# small files write test using rsync from same file-system to itself.
cabbage:/local/backup/cinghome $ time rsync -axSHR waves ../
6.3G waves
real 3m5.448s
(6.3*1000)/(3*60) = 35MB/sec
cabbage:/local/backup/cinghome $ time rsync -axSHR gr ../
real 61m9.404s
(105*1000)/(61*60) = 28.7MB/sec
# similar test on cingulum
cingulum:/local/cinghome $ time rsync -axSHR waves ./temp/
real 33m43.245s
user 1m17.246s
sys 2m23.737s
(6.3*1000)/(33*60) = 3.18 MB/sec
not really fair as cingulum is hosting live home directories
# maris setup for DAMTP home directories (in the first instance)
# RAID10 stripe size of 1024k
maris:/local/marishome/cinghome $ time rsync -axSHR gr ../
real 58m41.427s
user 18m37.255s
sys 15m8.833s
# try with xen