Root Cause Analysis and Recovery of a btrfs Out-of-Space Crash

Series of Events and Steps to Recovery

1. On 7th Feb 2019, while I was running `sudo zypper dup`, which was estimated to download and upgrade about 1.4GiB of packages, Konsole and other KDE apps suddenly crashed during the installation stage.

2.0. Inspecting the system, I noticed that `df -h /` reported 100% disk usage.

2.1. Worried about filesystem corruption, I immediately ran `sync` and forced the system to shut down.

3. I then booted into Snapper’s “pre” snapshot (taken just before the upgrade) from the boot menu.

4. Inspecting the system again, I noticed that `df -h /` reported nearly full disk usage. Presumably, the space had been exhausted during the system upgrade.

sudo df -h /
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/cr_sda2 40G 37G 3.1G 93% /

5. I then ran `du -shx /`, which reported only about a quarter of that usage (9.3G of 37G).

sudo du -xhs /
9.3G /

6. At this point, I suspected that a hidden volume related to snapshots was taking up “shadow” space in the root filesystem.

7. I then did some research and realised that btrfs subvolumes act like a “tree of filesystems”, which `du -x` does not cross: each subvolume reports its own device number, so `du -x` treats it as a separate filesystem. For example, on btrfs, / and /var may look like different filesystems from the UNIX point of view, yet still belong to the same btrfs filesystem as different “subvolumes”.
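To see why `du -x` stops at subvolume boundaries, here is a minimal Python sketch of the check it relies on. A directory whose device number (`st_dev`) differs from the starting directory's is treated as a separate filesystem and is not entered. On btrfs each subvolume reports its own device number, so the sketch lists the subvolume roots (and any real mountpoints) that `du -x` would skip. The function name `find_device_boundaries` is my own, and this approximates `du`'s behaviour rather than reproducing its implementation.

```python
import os

def find_device_boundaries(root):
    """Return directories under `root` whose device number differs from root's.

    Approximates the test behind `du -x`: a directory on a different
    device is treated as another filesystem and is not descended into.
    On btrfs, every subvolume has its own device number.
    """
    root_dev = os.lstat(root).st_dev
    boundaries = []
    for dirpath, dirnames, _filenames in os.walk(root):
        kept = []
        for name in dirnames:
            path = os.path.join(dirpath, name)
            try:
                dev = os.lstat(path).st_dev
            except OSError:
                continue  # vanished or unreadable; ignore it
            if dev != root_dev:
                boundaries.append(path)  # record it, but do not descend
            else:
                kept.append(name)
        dirnames[:] = kept  # prune os.walk in place, like du -x
    return sorted(boundaries)
```

On a btrfs root, `find_device_boundaries("/")` should list subvolume mountpoints such as /.snapshots alongside the usual pseudo-filesystems like /proc. Walking the whole root this way can take a while.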

See: https://btrfs.wiki.kernel.org/index.php/SysadminGuide

“A btrfs subvolume is an independently mountable POSIX filetree” … “Most other POSIX filesystems have a single mountable root, btrfs has an independent mountable root for the volume (top level subvolume) and for each subvolume; a btrfs volume can contain more than a single filetree, it can contain a forest of filetrees.”

8. With this in mind, I ran `df -h`, which listed the mountpoints of the different “subvolumes” of the root “volume”, each reporting the same usage as the root “volume” itself. I had always found this odd, but never understood what it meant until now.

This also explains why `btrfs filesystem usage` showed nothing that contradicted `df -h /`: both report usage for the whole volume, not for individual subvolumes.

Filesystem Size Used Avail Use% Mounted on
...
/dev/mapper/cr_sda2 40G 37G 3.1G 93% /
/dev/mapper/cr_sda2 40G 37G 3.1G 93% /opt
/dev/mapper/cr_sda2 40G 37G 3.1G 93% /usr/local
/dev/mapper/cr_sda2 40G 37G 3.1G 93% /srv
/dev/mapper/cr_sda2 40G 37G 3.1G 93% /.snapshots
/dev/mapper/cr_sda2 40G 37G 3.1G 93% /boot/grub2/x86_64-efi
/dev/mapper/cr_sda2 40G 37G 3.1G 93% /boot/grub2/i386-pc
/dev/mapper/cr_sda2 40G 37G 3.1G 93% /var
/dev/mapper/cr_sda2 40G 37G 3.1G 93% /tmp
...
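The repeated lines above can be made explicit with a small Python sketch that groups `df` output by source device. Every mountpoint backed by the same device reports identical totals, because `df` reports usage for the whole btrfs volume rather than per subvolume, so that usage should be counted once per device. The function name is hypothetical, and the sketch assumes each `df` entry fits on one line.

```python
def group_mounts_by_source(df_output):
    """Group the mountpoints in `df -h` output by their source device.

    Returns {source: {"usage": {(size, used, avail), ...}, "mounts": [...]}}.
    A single usage tuple shared by many mounts means they are views of
    one volume (for example, subvolumes of the same btrfs filesystem).
    """
    groups = {}
    for line in df_output.strip().splitlines():
        fields = line.split()
        # Skip the header row and elided lines such as "...".
        if len(fields) < 6 or fields[0] == "Filesystem":
            continue
        source, size, used, avail = fields[:4]
        mountpoint = " ".join(fields[5:])  # fields[4] is Use%
        entry = groups.setdefault(source, {"usage": set(), "mounts": []})
        entry["usage"].add((size, used, avail))
        entry["mounts"].append(mountpoint)
    return groups
```

Fed the listing above, this should return a single source, `/dev/mapper/cr_sda2`, with one usage tuple `("40G", "37G", "3.1G")` shared by all of its mountpoints.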

9. I then ran `du -shx` on each “subvolume” mountpoint, but switched to `du -sh` after realising that these mountpoints may themselves contain nested subvolumes, as the /.snapshots mountpoint does.

From the output, the /.snapshots mountpoint looked the most suspicious, at 64G of usage compared to 4.9G for /var.

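10. Since 64G exceeds the total capacity of the volume (40G), the snapshots must be sharing data. A btrfs snapshot initially references the same blocks as its source, and only changed data is written anew (copy-on-write), which works much like deduplication. The actual “block” usage of different subvolumes therefore has to be calculated with btrfs tools, not standard UNIX tools, because the standard tools count a shared block once for every copy and cannot see that two copies may occupy the same space.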

11. From this, I suspected that space had run out because changes had accumulated on my system since the older snapshots were taken. The more the system diverges from a snapshot, the less data they share, which forces btrfs to store multiple versions of system libraries.
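A toy model makes this over-counting concrete. In the Python sketch below, each snapshot is a set of hypothetical block numbers, and unchanged files share blocks as they would after a copy-on-write snapshot. It is an illustration only, not how btrfs actually accounts for extents. Summing each snapshot's size, as `du` does, counts shared blocks repeatedly, while the volume stores each block once. The same model shows the effect described in step 11: as the live system diverges, an older snapshot ends up holding blocks that nothing else references.

```python
def du_style_total(snapshots):
    """Sum every snapshot's block count independently, as `du` would."""
    return sum(len(blocks) for blocks in snapshots.values())

def stored_blocks(snapshots):
    """Count each distinct block once, as the filesystem stores it."""
    return len(set().union(*snapshots.values()))

def exclusive_blocks(snapshots, name):
    """Blocks referenced only by `name`; deleting it would free these."""
    others = set().union(*(b for n, b in snapshots.items() if n != name))
    return len(snapshots[name] - others)

# Hypothetical data: 100 original blocks, then files rewritten over time.
snapshots = {
    "snapshot_1": set(range(0, 100)),                        # original state
    "snapshot_2": set(range(0, 90)) | set(range(100, 110)),  # 10 blocks rewritten
    "live": set(range(0, 80)) | set(range(100, 120)),        # 10 more rewritten
}

print(du_style_total(snapshots))                  # 300
print(stored_blocks(snapshots))                   # 120
print(exclusive_blocks(snapshots, "snapshot_1"))  # 10
```

Here `du` would see 300 blocks while only 120 are actually stored. This is the same kind of gap as 64G under /.snapshots on a 40G volume. Blocks 90 to 99 survive only in `snapshot_1`, so that snapshot now pins space of its own.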

12. By default, btrfs does not track the unique block usage of each subvolume. That accounting is handled by its “quota” subsystem (quota groups, or qgroups). The “quota” here does not refer to the normal quotas applied to UNIX users; it refers to btrfs’ own limits on how much space different subvolumes can take up.

13. Thus, to see the actual space used by each subvolume, you must enable the “quota” system, even if you have no intention of restricting subvolume space usage.

14. I ran `btrfs quota enable /` to enable the quota feature, and then ran `btrfs qgroup show /` to list the space usage of the subvolumes in the root filesystem “volume”.

Thankfully, my installation was recent enough to keep track of “quota groups” automatically from the beginning, so I didn’t need to run a rescan.

See: https://lwn.net/Articles/579009/

btrfs qgroup show /
qgroupid rfer excl
-------- ---- ----
0/5 16.00KiB 16.00KiB
0/257 16.00KiB 16.00KiB
0/258 6.12GiB 6.12GiB
0/259 5.62MiB 5.62MiB
0/260 2.78GiB 2.78GiB
0/261 16.00KiB 16.00KiB
0/262 72.27MiB 72.27MiB
0/263 3.50MiB 3.50MiB
0/264 16.00KiB 16.00KiB
0/265 4.64MiB 4.64MiB
0/536 7.67GiB 7.56GiB
0/946 9.62GiB 4.68GiB
0/1334 8.91GiB 17.84MiB
0/1339 8.91GiB 15.99MiB
0/1341 9.60GiB 196.58MiB
0/1350 9.01GiB 16.00KiB
0/1357 9.01GiB 16.00KiB
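The `rfer` (referenced) column counts all data a subvolume can reach, including data shared with other subvolumes. The `excl` (exclusive) column counts only data that no other subvolume references, which is roughly what deleting that subvolume would free. The following Python sketch parses output in the format shown above and ranks qgroups by exclusive usage. The function names are my own, and the sketch assumes sizes are printed with binary unit suffixes and no space, as above.

```python
import re

# Binary unit suffixes as printed in the `btrfs qgroup show` output above.
UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}

def parse_size(text):
    """Convert a size such as '4.68GiB' to a (rounded) number of bytes."""
    match = re.fullmatch(r"([0-9.]+)(B|KiB|MiB|GiB|TiB)", text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    return round(float(number) * UNITS[unit])

def parse_qgroups(output):
    """Map qgroupid -> (rfer_bytes, excl_bytes) from `btrfs qgroup show` text."""
    qgroups = {}
    for line in output.strip().splitlines():
        fields = line.split()
        # Skip the header and dashed separator; data rows start with 'level/id'.
        if len(fields) < 3 or "/" not in fields[0]:
            continue
        qgroups[fields[0]] = (parse_size(fields[1]), parse_size(fields[2]))
    return qgroups

def top_exclusive(qgroups, n=3):
    """Return the n qgroup IDs with the largest exclusive usage."""
    return sorted(qgroups, key=lambda q: qgroups[q][1], reverse=True)[:n]
```

Applied to the listing above, `top_exclusive(parse_qgroups(text))` gives `0/536`, `0/258` and `0/946`. These are the subvolumes that would free the most space if deleted.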

15. After enabling btrfs quota, I wondered whether snapper had a feature that makes use of this information, and it does: a command called `snapper setup-quota`.

Even without running that command, I could now see the space usage of snapshots in `snapper list`, so I assume that `setup-quota` does at least what I did by enabling quota manually.

16. The space usage in `snapper list` showed the initial installation snapshot taking 4.68GiB and all others taking less than 200MiB, so the snapshots were not as big a space sink as I had suspected.

17. By correlating `btrfs qgroup show /` with `btrfs subvolume list /`, I could see that /tmp was taking up 2G, and that clearing it was the easiest way to get the system back online. Using ncdu, I deleted the largest files there.
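Level-0 qgroup IDs match subvolume IDs: qgroup `0/258` tracks the subvolume with ID 258 in `btrfs subvolume list /`. Below is a minimal Python sketch of that correlation. It assumes the `ID <n> gen <n> top level <n> path <path>` line format of `btrfs subvolume list`. The function names, and the IDs and paths in the usage lines, are hypothetical and not taken from my system.

```python
def parse_subvolume_list(output):
    """Map subvolume ID -> path from `btrfs subvolume list` lines.

    Expects lines like 'ID 258 gen 1234 top level 257 path @/var'.
    """
    paths = {}
    for line in output.strip().splitlines():
        fields = line.split()
        if len(fields) >= 9 and fields[0] == "ID" and fields[7] == "path":
            paths[int(fields[1])] = " ".join(fields[8:])
    return paths

def label_qgroups(qgroup_excl, subvol_paths):
    """Replace level-0 qgroup IDs ('0/<subvolume ID>') with subvolume paths.

    `qgroup_excl` maps qgroupid -> exclusive bytes. IDs with no matching
    subvolume keep their qgroupid as the label; higher levels are skipped.
    """
    labelled = {}
    for qgroupid, excl in qgroup_excl.items():
        level, _, subvol_id = qgroupid.partition("/")
        if level == "0":
            labelled[subvol_paths.get(int(subvol_id), qgroupid)] = excl
    return labelled

# Hypothetical sample data for illustration only.
sample = """ID 256 gen 1042 top level 5 path @
ID 300 gen 1187 top level 256 path @/tmp"""
print(label_qgroups({"0/256": 16384, "0/300": 2147483648},
                    parse_subvolume_list(sample)))
# {'@': 16384, '@/tmp': 2147483648}
```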

18. I then ran `df -h /` again to check that space had been freed: 5.5G was now available, up from the initial 3.1G.

df -h /
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/cr_sda2 40G 34G 5.5G 87% /

19. Finally, I ran `snapper rollback` to restore the system to its state before the upgrade.

Prevention of Issue

1. Because btrfs is copy-on-write and snapshots keep the old files, I must make sure there is enough space for both the pre-existing libraries and the new ones when doing a full system upgrade.

2. I must monitor the filesystem usage before upgrades, and prune snapshots and subvolumes if necessary.
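A pre-upgrade check along these lines can be scripted. This Python sketch compares the free space reported for `/` with the size of the incoming packages. Because the pre-upgrade snapshot keeps the old files, the new ones need genuinely free space. The 20% margin and the function name are my own assumptions, not btrfs or zypper rules. On btrfs, the free figure from `statvfs` (which `shutil.disk_usage` uses) is itself only an estimate.

```python
import shutil

def enough_space_for_upgrade(new_bytes, path="/", margin=0.2):
    """Return True if the free space at `path` covers `new_bytes` plus a margin.

    Snapshots keep referencing the old files, so upgraded files cannot
    reuse their blocks and need free space of their own.
    The default 20% margin is an arbitrary safety assumption.
    """
    free = shutil.disk_usage(path).free
    return free >= new_bytes * (1 + margin)

if __name__ == "__main__":
    # Hypothetical figure: the download size from step 1. The installed size
    # is usually larger, because packages are compressed.
    needed = int(1.4 * 1024**3)
    print("OK to upgrade" if enough_space_for_upgrade(needed) else "Prune snapshots first")
```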

I also learnt more about btrfs snapshots from this exercise!
