This document is meant for users with some technical skills. Absolute beginners are probably better off starting elsewhere.
What is BTRFS:
The purpose of this document is to clarify some concepts and describe how BTRFS works in principle, as there seems to be a lot of misinformation, or simply a lack of knowledge, out there.
Note: the author of this document is just a regular user with some C coding experience and an interest in BTRFS, so while I will try hard to avoid misinformation, there is a slight risk that I might in fact add to the misinformation that is already there.
Note#2: As of writing (30 June 2018), this document is a work in progress and must be considered unfinished.
Btrfs is a (mostly) self-healing Copy On Write (COW) filesystem for Linux with a lot of fancy features. It is maturing, which means that some parts of it work great, some parts need a little bit of know-how, and some parts of the filesystem are experimental and only recommended for testing purposes.
So what can BTRFS do for you:
How does BTRFS work:
- Well, it is a filesystem, so it will allow you to store files in an organized way just like other filesystems such as ext3, ext4, xfs, etc.
- It will checksum data and metadata (data is your files, metadata is data that describes your filesystem)
- It will therefore tell you if you have a corrupt file...
- ...and if set up correctly BTRFS can therefore auto-repair corrupt data for you and self-heal in most situations.
- It supports multiple storage devices
- Which means you will not require LVM, MD-RAID or HW-RAID to run BTRFS on multiple devices.
- It supports separate data and metadata (filesystem structures) profiles similar to traditional RAID0, RAID1, RAID10, RAID5 and RAID6 (Warning: RAID5 and RAID6 are currently experimental and have data-loss bugs)
- It allows for snapshots
- Snapshots are a "saved state" of (parts of) your filesystem.
- (Snapshots are also subvolumes)
- It allows you to create subvolumes - which act like common directories.
- Subvolumes are a separate "tree" of the filesystem
- Note: While a subvolume can in many ways be compared to a partition, it is not a partition in the traditional sense and it is definitely not a separate block device that can be used to run other filesystems.
- Subvolumes act like directories with some extra features (like for example their own UUID)
- Future plans include allowing different redundancy levels per subvolume
- This means that you can have a directory where all stuff is redundant (like RAID1 / 10 / 5 or 6)
- ...or you may have a non-redundant directory for speed (like RAID 0)
- It will allow you to convert to different redundancy (or non-redundancy) profiles online - while the system is fully operational
- It supports verifying the filesystem consistency (scrubbing) while online.
- It is more or less self-healing and will auto-fix corruptions and structural damage to the filesystem in most cases
- It offers transparent compression (current algorithms are zlib, lzo and zstd)
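As a rough illustration of the subvolume and snapshot features listed above, the sketch below creates a subvolume, takes a read-only snapshot of it and lists the result. The mount point /mountpoint and the names data and data-snapshot are placeholders made up for this example. The small run helper is also just an assumption of this sketch and not part of BTRFS: it only prints each command unless you set DRY_RUN=0.

```shell
#!/bin/sh
# Hypothetical helper: print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

MNT=/mountpoint    # placeholder: where the BTRFS filesystem is mounted

# Create a subvolume; afterwards it can be used like a normal directory.
run btrfs subvolume create "$MNT/data"

# Take a read-only (-r) snapshot of that subvolume.
run btrfs subvolume snapshot -r "$MNT/data" "$MNT/data-snapshot"

# List all subvolumes; snapshots show up here too, since they are subvolumes.
run btrfs subvolume list "$MNT"
```

Run it once as is to see what would be executed, then again as root with DRY_RUN=0 when the paths fit your system.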
So let's get to work...:
- Allocation of data
- Required storage space is always allocated piece by piece in chunks
- A chunk is usually 1GB (which means that a 1TB storage device may be split up into 1024 chunks of 1GB each)
- BTRFS by default differentiates between metadata-chunks and data-chunks
- Metadata is the data that describes the filesystem structure
- Data is just data e.g. the content of your files....
- Chunks can have different redundancy profiles (duplication, striping, mirroring, parity, dual parity)
- Duplication
- Works almost like RAID1 on a single device: maintains two mirrors of the chunk on a single storage device
- Striping
- Works almost like RAID0: will stripe data across all storage devices
- Mirroring
- Works almost like RAID1 on multiple storage devices: will place two copies of the chunk on two different storage devices
- Parity (Note: as of writing this, this functionality is unstable and not recommended for use)
- Works almost like RAID5 on multiple storage devices: will stripe on all but one storage device and keep additional parity information on the remaining storage device.
- Dual parity (Note: as of writing this, this functionality is unstable and not recommended for use)
- Works almost like RAID6 on multiple storage devices: like parity above, but with two parity devices
- BTRFS currently uses DUP, RAID0, RAID1, RAID10, RAID5 and RAID6 as profile names - this is technically not correct, but it is "close enough" to what most people can relate to and understand.
- NOTE: RAID5- and RAID6-like configurations are, as of writing this, unstable and bug-prone. Do not use them for anything serious unless you have (verified working) backups.
- When BTRFS needs space it will allocate a chunk of space from one (or more) of your storage device(s) and make sure the chunk is copied, striped or whatever fits the bill according to the explanation above.
- Keep in mind that allocated space on disk is not necessarily used space on disk.
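The chunk arithmetic above can be checked with a few lines of shell. This is only a back-of-the-envelope sketch: chunk_count is a made-up helper name, and it assumes the usual 1GB chunk size mentioned above unless you pass another size.

```shell
#!/bin/sh
# chunk_count DEVICE_GB [CHUNK_GB]
# Prints how many whole chunks fit on a device of the given size.
chunk_count() {
    device_gb=$1
    chunk_gb=${2:-1}    # assumption: the usual 1GB chunk size
    echo $(( device_gb / chunk_gb ))
}

chunk_count 1024    # a 1TB device split into 1GB chunks
```

Remember that this only says how many chunks can be allocated; how full each allocated chunk is, is a separate question.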
- Automatic error recovery:
- If you set up BTRFS in a way that resembles, for example, RAID-1, then BTRFS may auto-repair if it hits a bad copy, provided that the alternate copy is good. Note: BTRFS does not check both copies on read (for performance reasons).
- This means that BTRFS may always read copy A and never bother to read copy B. If copy B becomes corrupt unnoticed, and copy A also becomes corrupt at some later point, there is no good copy left to recover from. You therefore need to run regular filesystem scrubs to ensure that BTRFS verifies both copies (which implies an auto-repair from the good copy if one is destroyed)
- Profiles that offer redundancy are: DUP (two copies on a single disk, so it survives no disk failure), RAID-1 (survives one failed disk), RAID-10 (one failed disk), RAID-5 (one failed disk) and RAID-6 (two failed disks).
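Since regular scrubs are what makes BTRFS look at every copy, here is a minimal sketch of starting one and checking on it. /mountpoint is a placeholder, and the run helper is a hypothetical convenience of this sketch that prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical helper: print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

MNT=/mountpoint    # placeholder: where the BTRFS filesystem is mounted

# Start a scrub; it runs in the background and verifies all copies.
run btrfs scrub start "$MNT"

# Show progress and the number of errors found so far.
run btrfs scrub status "$MNT"
```

How often to scrub is up to you; many people simply schedule it from cron or a systemd timer.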
The first thing that you want to keep in mind is that you typically want your hard drives partitioned. BTRFS can be run directly on the underlying device, but it is not always advised or beneficial to do so. If you, for example, want to make your disk pool bootable (with redundancy) you might want to install GRUB on all of your disks so that if one disk is toast you will still be able to boot your system comfortably.
Basic usage:
How to create a filesystem:
mkfs.btrfs /dev/sdx
...and if you want to use multiple storage devices...
mkfs.btrfs /dev/sdx /dev/sdy /dev/sdz
How to add a device to the filesystem
btrfs device add /dev/sdn /mountpoint
How to remove a device from the filesystem
btrfs device remove /dev/sdn /mountpoint
How to switch between storage profiles online:
btrfs balance start -dconvert=profile -mconvert=profile /mountpoint
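To make the generic command above concrete, the sketch below converts both data and metadata to the raid1 profile and then checks the progress. The choice of raid1 and the path /mountpoint are just examples, and the run helper is a hypothetical convenience that prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical helper: print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

MNT=/mountpoint    # placeholder: where the BTRFS filesystem is mounted
PROFILE=raid1      # example target profile for both data and metadata

# Rewrite all chunks using the new profile while the filesystem stays mounted.
run btrfs balance start -dconvert="$PROFILE" -mconvert="$PROFILE" "$MNT"

# Check how far the balance has come.
run btrfs balance status "$MNT"
```

A conversion like this needs enough devices and free space for the target profile, so raid1 only makes sense with at least two devices.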
The available storage profiles are described here:
Note: the new format first states the number of copies that should be made, then how these copies should be stored.
nC (in nCmSpP) = number of copies
mS (in nCmSpP) = number of devices to stripe over (m=max)
pP (in nCmSpP) = number of parity devices to use
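To show how a name in the new format can be read, here is a small sketch that splits such a name into its three parts. parse_profile is a made-up helper for this document and not a BTRFS tool; it only understands the nCmSpP pattern described above and ignores anything else in the name.

```shell
#!/bin/sh
# parse_profile NAME
# Splits a new-format profile name such as 1CmS2P into copies, stripe and parity.
# The body runs in a subshell ( ... ) so the helper variables do not leak out.
parse_profile() (
    name=$1
    # Leading digits followed by C: number of copies.
    copies=$(printf '%s\n' "$name" | sed -n 's/^\([0-9][0-9]*\)C.*/\1/p')
    # A number (or m for max) followed by S: devices to stripe over.
    stripe=$(printf '%s\n' "$name" | sed -n 's/.*C\([0-9m][0-9]*\)S.*/\1/p')
    # Trailing digits followed by P: number of parity devices.
    parity=$(printf '%s\n' "$name" | sed -n 's/.*[^0-9]\([0-9][0-9]*\)P$/\1/p')
    echo "copies=${copies:-?} stripe=${stripe:-none} parity=${parity:-0}"
)

parse_profile 2C        # mirroring, no striping, no parity
parse_profile 1CmS2P    # one copy, striped over max devices, two parity devices
```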
Storage profile name (old format) | Storage profile name (new format) | Description (a block of data / metadata is stored like so) | Technical description | Redundancy (device failures allowed) | Total storage utilization (in percent)
--- | --- | --- | --- | --- | ---
SINGLE | 1C | Only one copy on any device | No replicas | 0 | 100
DUP | 2CD | Two copies on one storage device | One local replica | 0 | 50
RAID0 | 1CmS | One copy, striped over all storage devices | Striping | 0 | 100
RAID1 | 2C | Two copies on different storage devices | 1xReplica | 1 | 50
RAID10 | 2CmS | Two copies, striped over different storage devices | 1xReplica+1xStripe | 1 | 50
N/A | 3C | Three copies on different storage devices | 2xReplicas | 2 | 33
N/A | 4C | Four copies on different storage devices | 3xReplicas | 3 | 25
RAID5 | 1CmS1P | One copy striped over all but one storage device (used for parity) | 1xStripe+1xParity | 1 | ((num_devices-1)*100) / num_devices
RAID6 | 1CmS2P | One copy striped over all but two storage devices (used for parity) | 1xStripe+2xParity | 2 | ((num_devices-2)*100) / num_devices
N/A | 1CmS3P | One copy striped over all but three storage devices (used for parity) | 1xStripe+3xParity | 3 | ((num_devices-3)*100) / num_devices
SINGLE
DUP
STRIPE
MIRROR
MIRROR2
MIRROR3
MIRRORSTRIPE
STRIPEPARITY1
STRIPEPARITY2
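The storage utilization figures for the profiles can be reproduced with plain shell arithmetic. The two function names below are made up for this sketch, and because the shell only does integer arithmetic the results are rounded down.

```shell
#!/bin/sh
# copy_utilization COPIES
# Utilization in percent when every block is stored COPIES times.
copy_utilization() {
    echo $(( 100 / $1 ))
}

# parity_utilization NUM_DEVICES PARITY_DEVICES
# Utilization in percent: ((num_devices - parity) * 100) / num_devices
parity_utilization() {
    num_devices=$1
    parity=$2
    echo $(( (num_devices - parity) * 100 / num_devices ))
}

copy_utilization 2        # RAID1 (2C): prints 50
copy_utilization 3        # 3C: prints 33
parity_utilization 4 1    # RAID5 on four devices: prints 75
parity_utilization 6 2    # RAID6 on six devices: prints 66
```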
NOTE: Because BTRFS will store small files in the metadata, don't be fooled into thinking that data=RAID6 protects all your files against dual disk failure unless you also have the metadata stored in the same profile.
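One way to check for this is to compare the data and metadata profiles that the filesystem reports. The sketch below is an assumption-laden helper, not an official tool: profiles_match is a made-up name, and it assumes that btrfs filesystem df prints lines starting with "Data, <profile>:" and "Metadata, <profile>:", so please verify that against the output on your own system. If several Data or Metadata lines are present (for example during a conversion), only the last one of each is compared.

```shell
#!/bin/sh
# profiles_match
# Reads "btrfs filesystem df" style output on stdin and reports whether the
# data and metadata profiles are the same.
# Exit status: 0 = same profile, 1 = different, 2 = could not tell.
profiles_match() {
    awk -F'[,:] *' '
        $1 == "Data"     { data = $2 }
        $1 == "Metadata" { meta = $2 }
        END {
            if (data == "" || meta == "") { print "unknown"; exit 2 }
            if (tolower(data) == tolower(meta)) { print "match: " data }
            else { print "mismatch: data=" data " metadata=" meta; exit 1 }
        }'
}

# Intended usage (needs a mounted BTRFS filesystem, /mnt is a placeholder):
#   btrfs filesystem df /mnt | profiles_match
```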
How to view filesystem allocation (Beware: this is NOT the same as usage)
btrfs filesystem usage -T /mnt