Btrfs with RAID5 is safe now

Btrfs’ wiki page about RAID5 contains some serious warnings about using it.

While searching for encrypted mass storage that is easy to integrate and to service, I stumbled across many filesystems. I seriously considered ZFS, but it had disadvantages for me. The worst ones were that disks cannot (easily) be removed from an existing ZFS pool, and the typical RAID5 limitation that every drive is only used up to the size of the smallest drive.

Well, Btrfs lets you remove drives at any time. Btrfs also has RAID functionality built in, so the RAID and the filesystem work together instead of “you do your part, and I do mine”. Because of this, the filesystem knows exactly what is going on at both the block layer and the file system layer.
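As a rough sketch of how undramatic that is in practice, removing a drive is a single command and btrfs migrates its data to the remaining drives first. The device name and mount point here are just placeholders for your own setup:

    # Remove a drive from a mounted btrfs filesystem; btrfs moves the
    # data stored on it to the remaining drives before releasing it.
    btrfs device remove /dev/sdc /mnt/storage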

In the end, Btrfs with its built-in RAID5 became my choice, even though this RAID5 is considered very dangerous and not production ready. Here is why I still use it.

Something that I just want to add in this post: Btrfs’ RAID5 works with sub-partitions (chunks), an approach I would like to see in every RAID5 implementation. Instead of striping one big partition, which leads to the typical RAID5 disk space limitations, btrfs creates many smaller chunks and distributes these chunks and their parity across the available disks in the best way it sees possible. Don’t worry: in the end, all these small chunks are still visible as one single filesystem.
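If you want to see how btrfs has spread these chunks and their parity over your disks, you can inspect the allocation yourself. The mount point below is just an example path:

    # Space used per allocation profile (Data, Metadata, System)
    btrfs filesystem df /mnt/storage

    # Per-device allocation, shown as a table
    btrfs filesystem usage -T /mnt/storage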

But that’s not the topic of this post.

Btrfs’ RAID5/6 mode is considered very dangerous to use in a production environment, and that warning comes from the Btrfs developers themselves. But how dangerous is it really?

I have been evaluating Btrfs in RAID5 for months now, on a test machine with 6 old drives that I found, in various sizes from 500 GB to 2.5 TB. Using btrfs in RAID5, I could use almost all of the available space, not just 500 GB per drive.
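For reference, creating such a pool over mixed-size disks is a single command. The device names below are only examples, and putting the metadata on raid1 is a common recommendation rather than a requirement; you can also use raid5 for it:

    # RAID5 for data, RAID1 for metadata, across six drives of mixed sizes
    mkfs.btrfs -d raid5 -m raid1 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg

    # Mount it via any member device and check how the space is laid out
    mount /dev/sdb /mnt/storage
    btrfs filesystem usage /mnt/storage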

But I was sceptical about the problems btrfs’ RAID5 comes with. There were two major issues that made RAID5 not ready for production. One of them is the write hole, which I will come to in a moment. The other one, scrub and auto-repair, was solved recently.
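Scrub is the mechanism that reads everything back, verifies the checksums and repairs bad blocks from the parity. A run looks roughly like this, with the mount point again being an example:

    # Read all data and metadata, verify checksums, repair from parity where possible
    btrfs scrub start /mnt/storage

    # Check the progress and the number of corrected errors
    btrfs scrub status /mnt/storage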

The write hole, on the other hand, is still open. The Btrfs status wiki page puts it like this:

“The write hole is the last missing part, preliminary patches have been posted but needed to be reworked.”

The write hole only occurs when the parity could not be fully written to the disk(s), for example because of a power failure or a kernel panic. Even then, the filesystem only gets into trouble when you have to replace a failed disk some time later and the data is reconstructed from the stale parity that was never fully written, resulting in a corrupted filesystem.

The write hole can be “closed” by running a rebalance of the filesystem, which rewrites the parity data.
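In practice that means starting a balance after an unclean shutdown; the balance rewrites the chunks and, with them, the parity. The mount point is again just an example:

    # Rewrite all chunks, which recalculates and rewrites the parity as well
    btrfs balance start /mnt/storage

    # Watch the progress from another shell
    btrfs balance status /mnt/storage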

But this is a general problem for parity RAIDs: (almost) every RAID5 implementation fails in the same way if it cannot write its parity data completely.

So I did some testing with my crossbreed test machine. I created a large btrfs RAID5 filesystem and played through many failure scenarios. I also tested the write hole scenario: I wrote data, pulled the machine’s power cord while it was still writing, rebalanced the RAID after the restart, and then replaced a disk so the RAID had to rebuild it. I did this 5 times, and apart from the files that were being written at the moment of the power loss being incomplete (which makes sense), a checksum comparison of all files showed that everything was okay.
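For anyone who wants to reproduce this, the disk replacement and the checksum comparison are nothing special. Roughly like this, with placeholder device names, mount point and checksum file:

    # Before the test: record checksums of every file on the filesystem
    find /mnt/storage -type f -exec sha256sum {} + > /root/checksums.txt

    # Replace a drive with a new one and let btrfs rebuild its contents
    # (if the old drive is already gone, use its devid instead of the path)
    btrfs replace start /dev/sdd /dev/sdh /mnt/storage
    btrfs replace status /mnt/storage

    # After the power loss, the rebalance and the replace: verify all checksums
    sha256sum -c /root/checksums.txt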

To prevent these write holes in the first place, I use an uninterruptible power supply (UPS) and a very good power supply unit. If there is a power outage, my storage server shuts down safely.

My storage server has been running for 8 months now with btrfs in RAID5, and I save all my data on it. All my data. That’s how much I trust btrfs’ RAID5; it is fast and works like a charm. Two drives were added during this time.
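Adding those two drives was just as simple. Roughly like this, again with placeholder names:

    # Add a new, empty drive to the running filesystem
    btrfs device add /dev/sdi /mnt/storage

    # Spread the existing data and parity over the new drive as well
    btrfs balance start /mnt/storage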

I myself consider btrfs’ RAID5 safe. You can try it, too.