Question

OpenStack Swift deployment documentation says:

Swift’s disk usage pattern is the worst case possible for RAID, and performance degrades very quickly using RAID 5 or 6.

But I failed to find any elaboration or explanation of that. So, before I dig deep into the Swift source code, I'd like to ask the community:

  • what should the RAID-friendly "disk usage pattern" be?
  • what's so special about Swift's disk usage?

Solution

Swift has an almost entirely random I/O model because of its Ring data structure. In short, the Ring maps files uniformly across all disks.
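To make the "maps files uniformly" point concrete, here is a simplified sketch of hash-based placement. It is not Swift's actual Ring code: the real Ring also handles replicas, zones and device weights. `PART_POWER`, `DISKS` and the object paths are hypothetical values picked for illustration.

```python
import hashlib
from collections import Counter

PART_POWER = 8                           # assumption: 2**8 = 256 partitions
DISKS = [f"disk{i}" for i in range(8)]   # hypothetical disk names


def partition_for(path: str) -> int:
    """Hash the object path and keep the top PART_POWER bits as the partition."""
    digest = hashlib.md5(path.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") >> (32 - PART_POWER)


def disk_for(path: str) -> str:
    """Toy partition-to-disk table: partitions are spread round-robin over disks."""
    return DISKS[partition_for(path) % len(DISKS)]


def placement_counts(n_objects: int) -> Counter:
    """Count how many of n_objects synthetic object paths land on each disk."""
    return Counter(
        disk_for(f"/account/container/object-{i}") for i in range(n_objects)
    )


if __name__ == "__main__":
    for disk, count in sorted(placement_counts(8000).items()):
        print(disk, count)
```

Because placement follows the hash and not the object name or arrival order, consecutive uploads land on unrelated disks, so each disk sees a stream of small, scattered writes.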

RAID 5 and RAID 6 perform very badly under a heavy random-write workload, because every small write also forces the array to read and rewrite parity. See more information here
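A rough way to see the cost is the commonly cited "write penalty": the number of back-end disk I/Os needed for one small random write. The sketch below is a back-of-the-envelope estimate that ignores controller cache and full-stripe writes. The disk count and per-disk IOPS are made-up example numbers.

```python
# Back-end I/Os per small random write (commonly cited rule-of-thumb values):
#   RAID 0:  1 (just write the data)
#   RAID 10: 2 (write both mirror copies)
#   RAID 5:  4 (read data, read parity, write data, write parity)
#   RAID 6:  6 (same as RAID 5, but with two parity blocks)
WRITE_PENALTY = {"raid0": 1, "raid10": 2, "raid5": 4, "raid6": 6}


def random_write_iops(level: str, n_disks: int, iops_per_disk: float) -> float:
    """Estimate the small random-write IOPS an array can sustain."""
    return n_disks * iops_per_disk / WRITE_PENALTY[level]


if __name__ == "__main__":
    # Hypothetical array: 8 disks at 100 random IOPS each.
    for level in ("raid0", "raid10", "raid5", "raid6"):
        print(level, round(random_write_iops(level, 8, 100)))
```

With these example numbers the same eight disks deliver roughly 400 random writes per second as RAID 10, 200 as RAID 5 and about 133 as RAID 6.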

The scenario is similar to a database. Databases such as MongoDB also store their files somewhat uniformly, and you will find that they do not suggest RAID 5 or RAID 6 either. Only RAID 10 is recommended.

OTHER TIPS

Why do you need RAID with Swift in the first place?

Swift natively uses XFS, and most operations are handled by its own placement mechanism, called the Ring.

Alternatively, if you want to dig deeper into the Ring, my colleagues did a video deep dive on it.

Hope it helps,

Atul

  1. what should the RAID-friendly "disk usage pattern" be?

People use RAID cards for the following reasons:

1) To protect against single-drive failures (except with RAID 0).
2) To get higher I/O performance than single drives (RAID 5, 6, 10, 50, etc., plus write-back cache with a BBU).
3) To use more drives than the motherboard can support, via RAID/HBA cards.
4) To get some storage management features (GUI or command-line tools).

  2. what's so special about Swift's disk usage?

Swift disk I/O is:

1) Mostly random on the account/container/object (A/C/O) servers.
2) Highly concurrent.
3) Amplified by a factor of at least 6x for each object PUT (3 object writes plus 3 container updates, before counting replication, the auditor and other background processes).
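The 6x figure above is simple arithmetic, sketched below. It assumes the container ring uses the same replica count as the object ring and ignores the background processes already mentioned. The PUT rate is a made-up example.

```python
def disk_writes_per_put(replicas: int = 3) -> int:
    """Minimum disk writes caused by one object PUT."""
    object_writes = replicas      # one copy of the object per replica
    container_updates = replicas  # assumption: container replicas == object replicas
    return object_writes + container_updates


def cluster_write_ops(puts_per_second: int, replicas: int = 3) -> int:
    """Minimum disk writes per second across the cluster for a given PUT rate."""
    return puts_per_second * disk_writes_per_put(replicas)


if __name__ == "__main__":
    print(disk_writes_per_put())    # 6
    print(cluster_write_ops(500))   # 3000
```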

OpenStack Swift is designed to run on commodity servers and hard drives, meaning the lowest-cost hardware of reasonably good quality, which often does not include RAID cards. However, one would need a RAID/HBA card to use 8-10 or more HDDs in a server. So in practice, when the motherboard cannot support the number of HDDs the server chassis can hold, many people use an HBA card, or use a RAID card and configure each HDD as a single-drive RAID 0.

You certainly can use RAID 5, 6 or 10 and give up some capacity to gain some protection and performance, but that often costs more than needed. Swift has a tunable replication factor, which defaults to 3x.
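To put a number on the capacity cost, here is a rough sketch that stacks Swift's replication on top of each disk layout. The server size (12 disks of 4 TB) is a made-up example, and the function name and layout labels are just for illustration.

```python
def usable_tb(layout: str, disks: int, tb_per_disk: float, replicas: int = 3) -> float:
    """Usable capacity of one server after the RAID overhead and Swift replication."""
    if layout == "jbod":        # each disk exposed on its own (or single-drive RAID 0)
        data_disks = disks
    elif layout == "raid5":     # one disk's worth of parity
        data_disks = disks - 1
    elif layout == "raid6":     # two disks' worth of parity
        data_disks = disks - 2
    elif layout == "raid10":    # every disk is mirrored
        data_disks = disks / 2
    else:
        raise ValueError(f"unknown layout: {layout}")
    return data_disks * tb_per_disk / replicas


if __name__ == "__main__":
    for layout in ("jbod", "raid5", "raid6", "raid10"):
        print(layout, round(usable_tb(layout, 12, 4), 1), "TB usable")
```

With 3x replication the plain-disk layout already keeps only a third of the raw space, and RAID 10 underneath halves that again, paying twice for redundancy Swift already provides.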

Licensed under: CC-BY-SA with attribution