In this post we are going to illustrate how ZFS allocates and calculates storage space, using a simple single-disk zpool as an example.
Our environment is a VirtualBox VM running Ubuntu with the ZFS package installed. The VM has two virtual disks assigned; the second one, sdb, will be used for the ZFS pool.
First, we will examine sdb disk configuration with the fdisk -l command:
wg@ubuntu:~$ sudo fdisk -l /dev/sdb
Disk /dev/sdb: 1000 GiB, 1073741824000 bytes, 2097152000 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
wg@ubuntu:~$
The total disk capacity is 1,073,741,824,000 bytes or 1000 GiB. Also note the sector size value of 512 bytes.
Let's create a zpool with the name pool1 using the whole disk sdb:
wg@ubuntu:~$ sudo zpool create -f pool1 /dev/sdb
wg@ubuntu:~$ sudo zpool status pool1
  pool: pool1
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        pool1       ONLINE       0     0     0
          sdb       ONLINE       0     0     0

errors: No known data errors
wg@ubuntu:~$
Our pool has been created successfully. The first thing we will do now is to check the size of the pool with zpool list or zpool get size:
wg@ubuntu:~$ sudo zpool list pool1
NAME    SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
pool1   992G    64K   992G         -     0%     0%  1.00x  ONLINE  -
wg@ubuntu:~$
wg@ubuntu:~$ sudo zpool get size pool1
NAME   PROPERTY  VALUE  SOURCE
pool1  size      992G   -
wg@ubuntu:~$
To obtain the exact pool size value in bytes we will use zpool get with the -p ("parsable values") switch:
wg@ubuntu:~$ sudo zpool get -p size pool1
NAME PROPERTY VALUE SOURCE
pool1 size 1065151889408 -
wg@ubuntu:~$
The difference between the size of the disk and the size of pool1 reported by the zpool get -p command is: 1,073,741,824,000 B - 1,065,151,889,408 B = 8,589,934,592 B = 8 GiB
This means we've just "lost" 8 GiB of the total disk capacity.
We will start our investigation by checking the disk partition configuration:
wg@ubuntu:~$ sudo fdisk -l /dev/sdb
Disk /dev/sdb: 1000 GiB, 1073741824000 bytes, 2097152000 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: A299EECE-7ABC-1A47-9C34-516CFE85A82D

Device          Start        End    Sectors  Size Type
/dev/sdb1        2048 2097133567 2097131520 1000G Solaris /usr & Apple ZFS
/dev/sdb9  2097133568 2097149951      16384    8M Solaris reserved 1
wg@ubuntu:~$
Comparing this output with the fdisk output obtained earlier, we can see that in the process of pool configuration, ZFS has created two new partitions:
- sdb1 - the data partition covering almost the whole disk (2,097,131,520 sectors = 1,073,731,338,240 bytes), which holds the actual pool data
- sdb9 - a small 8 MiB "Solaris reserved 1" partition
Both partitions are aligned on 1 MiB (2048-sector) boundaries - there is 1 MiB of unpartitioned space at the beginning and at the end of the disk.
Hence, we've just "found" 10 MiB = 8 MiB + 1 MiB + 1 MiB. Not much, but it is a start.
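As a quick sanity check, the partition-level accounting can be reproduced with a few lines of Python. This is just a sketch; the sector counts are copied from the fdisk listing above rather than read from the disk:

```python
SECTOR = 512                        # logical sector size reported by fdisk

disk_sectors = 2097152000           # whole disk: 1000 GiB
sdb1_sectors = 2097131520           # ZFS data partition
sdb9_sectors = 16384                # "Solaris reserved 1" partition (8 MiB)
front_gap    = 2048                 # 1 MiB alignment gap before sdb1
end_gap      = disk_sectors - (2097149951 + 1)  # 1 MiB gap after sdb9

# Everything outside the data partition adds up to the 10 MiB we "found"
outside_data = (front_gap + sdb9_sectors + end_gap) * SECTOR
print(outside_data / 2**20)         # -> 10.0 (MiB)

# The four pieces cover the disk exactly
assert front_gap + sdb1_sectors + sdb9_sectors + end_gap == disk_sectors
```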
As the next step, let's take a look at the vdev configuration for our pool with the help of the zdb -C command:
wg@ubuntu:~$ sudo zdb -C pool1

MOS Configuration:
        version: 5000
        name: 'pool1'
        state: 0
        txg: 4
        pool_guid: 15475987159508587651
        errata: 0
        hostname: 'ubuntu'
        vdev_children: 1
        vdev_tree:
            type: 'root'
            id: 0
            guid: 15475987159508587651
            create_txg: 4
            children[0]:
                type: 'disk'
                id: 0
                guid: 8609600542146121006
                path: '/dev/sdb1'
                whole_disk: 1
                metaslab_array: 34
                metaslab_shift: 33
                ashift: 9
                asize: 1073726619648
                is_log: 0
                create_txg: 4
        features_for_read:
            com.delphix:hole_birth
            com.delphix:embedded_data
wg@ubuntu:~$
First, let's examine the asize (allocatable size) value, which shows the amount of space that can be allocated from this vdev. The difference between the asize value and the data partition size (2097131520 sectors * 512 B) reported by the fdisk -l command above is: 1,073,731,338,240 B - 1,073,726,619,648 B = 4,718,592 B = 4.5 MiB
This 4.5 MiB space is used for:
- four vdev labels, 256 KiB each (4 x 256 KiB = 1 MiB)
- the boot block reservation (3.5 MiB)
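For reference, here is how the 4.5 MiB breaks down, using the label and boot block sizes listed in the summary table at the end of this post (a minimal sketch, nothing more):

```python
vdev_labels = 4 * 256 * 2**10       # four 256 KiB vdev labels = 1 MiB
boot_block  = int(3.5 * 2**20)      # 3.5 MiB boot block reservation

print(vdev_labels + boot_block)            # 4718592 B
print((vdev_labels + boot_block) / 2**20)  # 4.5 MiB
```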
Well, this is one more step in the right direction, but 4.5 MiB is still a far cry from the "missing" 8 GiB.
The next parameter from the zdb output that we'd like to check is metaslab_shift. For space management purposes, each vdev is divided into metaslabs - roughly 200 or fewer per vdev. The metaslab size is always a power of two, 2^N, where N is given by the metaslab_shift parameter. In our case N = 33, therefore the metaslab size for our pool is: 2^33 B = 8,589,934,592 B = 8 GiB
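In code this is a one-liner; the only input is the metaslab_shift value taken from the zdb -C output above:

```python
metaslab_shift = 33                  # from zdb -C
metaslab_size = 1 << metaslab_shift  # 2^33 bytes

print(metaslab_size)                 # 8589934592 B
print(metaslab_size / 2**30)         # 8.0 GiB
```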
The actual number of metaslabs per vdev can be determined with the help of the zdb -m command:
wg@ubuntu:~$ sudo zdb -m pool1
Metaslabs:
vdev 0
metaslabs 124 offset spacemap free
--------------- ------------------- --------------- -------------
metaslab 0 offset 0 spacemap 37 free 8.00G
metaslab 1 offset 200000000 spacemap 0 free 8G
metaslab 2 offset 400000000 spacemap 0 free 8G
metaslab 3 offset 600000000 spacemap 0 free 8G
metaslab 4 offset 800000000 spacemap 0 free 8G
metaslab 5 offset a00000000 spacemap 0 free 8G
metaslab 6 offset c00000000 spacemap 0 free 8G
...
... Output truncated
...
Multiplying the number of metaslabs by the metaslab size: 124 * 8 GiB = 992 GiB = 1,065,151,889,408 B
The result is equal to the size of the pool reported by the zpool get -p size command.
Let's subtract the size of the pool from the asize value reported by zdb -C:
1,073,726,619,648 B - 1,065,151,889,408 B = 8,574,730,240 B = 8177.5 MiB
This space is what remains of the vdev capacity after allocation of 124 metaslabs. Since the
remaining space is less than one metaslab in size, it is of no use for ZFS.
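The whole metaslab accounting can be reproduced from the asize and metaslab_shift values alone (again, just a sketch using the numbers reported by zdb):

```python
asize = 1073726619648                # allocatable vdev size from zdb -C
metaslab_size = 1 << 33              # 8 GiB, from metaslab_shift = 33

metaslab_count = asize // metaslab_size      # full metaslabs that fit
pool_size = metaslab_count * metaslab_size   # what zpool reports as "size"
leftover = asize - pool_size                 # unusable tail of the vdev

print(metaslab_count)                # 124
print(pool_size)                     # 1065151889408 B
print(leftover / 2**20)              # 8177.5 MiB, less than one metaslab
```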
Adding up all of the "losses" we've discovered so far gives exactly 8 GiB:
8177.5 MiB + 4.5 MiB + 10 MiB = 8192 MiB = 8 GiB
Bingo!!!
Now we will take a look at the discrepancy between the storage capacity values reported by the zpool list and the zfs list commands:
wg@ubuntu:~$ sudo zpool list pool1
NAME    SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
pool1   992G    64K   992G         -     0%     0%  1.00x  ONLINE  -
wg@ubuntu:~$
wg@ubuntu:~$ sudo zfs list pool1
NAME    USED  AVAIL  REFER  MOUNTPOINT
pool1    55K   961G    19K  /pool1
wg@ubuntu:~$
To get the exact values in bytes we will use the zpool get -p and the zfs get -p commands:
wg@ubuntu:~$ sudo zpool get -p size,free,allocated pool1
NAME   PROPERTY   VALUE          SOURCE
pool1  size       1065151889408  -
pool1  free       1065151823872  -
pool1  allocated  65536          -
wg@ubuntu:~$
wg@ubuntu:~$ sudo zfs get -p avail,used pool1
NAME   PROPERTY   VALUE          SOURCE
pool1  available  1031865836544  -
pool1  used       56320          -
wg@ubuntu:~$
To calculate the total capacity we need to add the available and used values reported by the zfs command: 1,031,865,836,544 B + 56,320 B = 1,031,865,892,864 B
The difference between the value obtained from the zfs command and the pool size value is: 1,065,151,889,408 B - 1,031,865,892,864 B = 33,285,996,544 B = 31 GiB
This is known as the slop space reservation. It ensures that critical ZFS operations can complete even when very little free space remains in the pool.
Slop space is calculated as 1/32 of the zpool capacity:
1,065,151,889,408 B * 1/32 = 33,285,996,544 B = 31 GiB
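Both views of the slop space agree, as a short Python check shows (values taken from the zpool get -p and zfs get -p outputs above; this assumes the plain 1/32 rule with no additional clamping):

```python
pool_size = 1065151889408            # zpool "size"
zfs_avail = 1031865836544            # zfs "available"
zfs_used  = 56320                    # zfs "used"

slop = pool_size - (zfs_avail + zfs_used)
print(slop)                          # 33285996544 B
print(slop / 2**30)                  # 31.0 GiB
print(pool_size // 32)               # 33285996544 B -> matches 1/32 of the pool
```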
To recap: we've used a disk with a raw capacity of 1000 GiB to create a single-disk zpool and ended up with 961 GiB, or 96.1%, of usable ZFS space. In our case the total overhead was 39 GiB, or 3.9%.
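As a final check, the recap figures follow directly from the zfs numbers quoted earlier:

```python
raw_gib    = 1000                                  # raw disk capacity in GiB
usable_gib = (1031865836544 + 56320) / 2**30       # zfs available + used

print(usable_gib)                                  # 961.0 GiB
print(round(usable_gib / raw_gib * 100, 1))        # 96.1 %
print(raw_gib - usable_gib)                        # 39.0 GiB of overhead
```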
The details for the ZFS overhead items we've discussed in the post are summarized in the following table:
| Item | Size | Where Applicable |
|------|------|------------------|
| Partition labels and alignment | 33 KiB - 2 MiB (actual value may vary between OSs / distributions) | Per physical disk |
| Reserved partition | 8 MiB | Per physical disk |
| Vdev labels | 4 x 256 KiB = 1 MiB | Per disk vdev / physical disk |
| Boot block reservation | 3.5 MiB | Per disk vdev / physical disk |
| Metaslab allocation loss | 0% - 0.5% (approximately) of the vdev capacity | Per top level vdev |
| Slop space reservation | 1/32 or 3.125% of the pool capacity | Per zpool |