WintelGuy.com
ZFS Storage Overhead

In this post we are going to illustrate how ZFS allocates and calculates storage space, using a simple single-disk zpool as an example.

Our environment is a VirtualBox VM running Ubuntu with the ZFS package installed. The VM has two virtual disks assigned:

  • sda is used for OS installation;
  • sdb will be used to create a zpool.

First, we will examine the sdb disk configuration with the fdisk -l command:

wg@ubuntu:~$ sudo fdisk -l /dev/sdb
Disk /dev/sdb: 1000 GiB, 1073741824000 bytes, 2097152000 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
wg@ubuntu:~$

The total disk capacity is 1,073,741,824,000 bytes or 1000 GiB. Also note the sector size value of 512 bytes.
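
As a quick sanity check, these figures can be reproduced with bash arithmetic (treating 1 GiB as 2^30 bytes, as elsewhere in this post):

$ echo $(( 1073741824000 / 2**30 ))   # total capacity in GiB
1000
$ echo $(( 1073741824000 / 512 ))     # number of 512-byte sectors
2097152000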

Let's create a zpool with the name pool1 using the whole disk sdb:

wg@ubuntu:~$ sudo zpool create -f pool1  /dev/sdb
wg@ubuntu:~$ sudo zpool status pool1
  pool: pool1
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        pool1       ONLINE       0     0     0
          sdb       ONLINE       0     0     0

errors: No known data errors
wg@ubuntu:~$

Our pool has been created successfully. The first thing we will do is check the size of the pool with zpool list or zpool get size:

wg@ubuntu:~$ sudo zpool list  pool1
NAME    SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
pool1   992G    64K   992G         -     0%     0%  1.00x  ONLINE  -
wg@ubuntu:~$
wg@ubuntu:~$ sudo zpool get size pool1
NAME   PROPERTY   VALUE  SOURCE
pool1  size       992G   -
wg@ubuntu:~$

To obtain the exact pool size value in bytes we will use zpool get with the -p ("parsable values") switch:

wg@ubuntu:~$ sudo zpool get -p size  pool1
NAME   PROPERTY   VALUE          SOURCE
pool1  size       1065151889408  -
wg@ubuntu:~$

The difference between the size of the disk and the size of pool1 reported by the zpool get -p command is: 1,073,741,824,000 B - 1,065,151,889,408 B = 8,589,934,592 B = 8 GiB

This means we've just "lost" 8 GiB out of the total disk capacity.
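
If you'd like to verify the difference yourself, a bash one-liner using the byte values shown above gives the same result:

$ echo $(( 1073741824000 - 1065151889408 ))            # bytes "lost"
8589934592
$ echo $(( (1073741824000 - 1065151889408) / 2**30 ))  # the same value in GiB
8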

We will start our investigation by checking the disk partition configuration:

wg@ubuntu:~$ sudo fdisk -l /dev/sdb
Disk /dev/sdb: 1000 GiB, 1073741824000 bytes, 2097152000 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: A299EECE-7ABC-1A47-9C34-516CFE85A82D

Device          Start        End    Sectors  Size Type
/dev/sdb1        2048 2097133567 2097131520 1000G Solaris /usr & Apple ZFS
/dev/sdb9  2097133568 2097149951      16384    8M Solaris reserved 1
wg@ubuntu:~$

Comparing this output with the fdisk output obtained earlier, we can see that in the process of pool configuration, ZFS has created two new partitions:

  • sdb1 - data partition with the size of 2,097,131,520 sectors or 1,073,731,338,240 bytes;
  • sdb9 - "reserved" partition with the size of 16,384 sectors or 8 MiB.

Both partitions are aligned on 1 MiB (2,048-sector) boundaries - there is 1 MiB of unused space at the beginning of the disk and another 1 MiB at the end.

Hence, we've just "found" 10 MiB = 8 MiB + 1 MiB + 1 MiB. Not much, but it is a start.
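
The partition sizes and alignment gaps can be recomputed from the fdisk sector counts; below is a small bash sketch assuming the 512-byte sector size reported above:

$ echo $(( 2097131520 * 512 ))                         # sdb1 data partition, bytes
1073731338240
$ echo $(( 16384 * 512 / 2**20 ))                      # sdb9 reserved partition, MiB
8
$ echo $(( 2048 * 512 / 2**20 ))                       # gap before sdb1, MiB
1
$ echo $(( (2097152000 - 2097149952) * 512 / 2**20 ))  # gap after sdb9, MiB
1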

As the next step, let's take a look at the vdev configuration for our pool with the help of the zdb -C command:

wg@ubuntu:~$ sudo zdb -C pool1

MOS Configuration:
        version: 5000
        name: 'pool1'
        state: 0
        txg: 4
        pool_guid: 15475987159508587651
        errata: 0
        hostname: 'ubuntu'
        vdev_children: 1
        vdev_tree:
            type: 'root'
            id: 0
            guid: 15475987159508587651
            create_txg: 4
            children[0]:
                type: 'disk'
                id: 0
                guid: 8609600542146121006
                path: '/dev/sdb1'
                whole_disk: 1
                metaslab_array: 34
                metaslab_shift: 33
                ashift: 9
                asize: 1073726619648
                is_log: 0
                create_txg: 4
        features_for_read:
            com.delphix:hole_birth
            com.delphix:embedded_data
wg@ubuntu:~$

First, let's examine the asize (allocatable size) value, which shows the amount of space that can be allocated from this vdev. The difference between the data partition size reported by the fdisk -l command above (2,097,131,520 sectors * 512 B = 1,073,731,338,240 B) and the asize value is: 1,073,731,338,240 B - 1,073,726,619,648 B = 4,718,592 B = 4.5 MiB

This 4.5 MiB space is used for:

  • four copies of the vdev label ( 4 x 256 KiB = 1 MiB );
  • boot block reservation ( 3.5 MiB ).

Well, this is one more step in the right direction, but 4.5 MiB is still a far cry from the "missing" 8 GiB.
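
A quick bash check of this step, using the partition size from fdisk and the asize value from zdb (the 1 MiB / 3.5 MiB split follows the label and boot block breakdown listed above):

$ echo $(( 2097131520 * 512 - 1073726619648 ))   # partition size minus asize, bytes
4718592
$ echo $(( 4 * 256 * 1024 + 3584 * 1024 ))       # 4 vdev labels + 3.5 MiB (3584 KiB) boot block
4718592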

The next parameter from the zdb output that we'd like to check is metaslab_shift. Vdevs are divided into 200 or fewer metaslabs for the purpose of space management. The metaslab size is always a power of two, 2^N bytes, where N is defined by the metaslab_shift parameter. In our case N = 33, therefore the metaslab size for our pool is: 2^33 B = 8,589,934,592 B = 8 GiB
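
In bash, the metaslab size follows directly from the metaslab_shift value:

$ echo $(( 2**33 ))          # metaslab size in bytes
8589934592
$ echo $(( 2**33 / 2**30 ))  # the same value in GiB
8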

The actual number of metaslabs per vdev can be determined with the help of the zdb -m command:

wg@ubuntu:~$ sudo zdb -m pool1

Metaslabs:
        vdev          0
        metaslabs   124   offset                spacemap          free
        ---------------   -------------------   ---------------   -------------
        metaslab      0   offset            0   spacemap     37   free    8.00G
        metaslab      1   offset    200000000   spacemap      0   free       8G
        metaslab      2   offset    400000000   spacemap      0   free       8G
        metaslab      3   offset    600000000   spacemap      0   free       8G
        metaslab      4   offset    800000000   spacemap      0   free       8G
        metaslab      5   offset    a00000000   spacemap      0   free       8G
        metaslab      6   offset    c00000000   spacemap      0   free       8G
...
... Output truncated
...

Multiplying the number of metaslabs by the metaslab size: 124 * 8 GiB = 992 GiB = 1,065,151,889,408 B

The result is equal to the size of the pool reported by the zpool get -p size command.
Let's subtract the size of the pool from the asize value reported by zdb -C: 1,073,726,619,648 B - 1,065,151,889,408 B = 8,574,730,240 B = 8177.5 MiB

This space is what remains of the vdev capacity after the allocation of 124 metaslabs. Since the remaining space is smaller than one metaslab, it cannot be used by ZFS.
Adding up all of the "losses" we've discovered so far gives us exactly 8 GiB: 8,177.5 MiB + 4.5 MiB + 10 MiB = 8,192 MiB = 8 GiB

Bingo!!!
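
Here is the whole reconciliation in one place, as a bash sketch built from the values collected above:

$ echo $(( 124 * 2**33 ))                                # 124 metaslabs = pool size, bytes
1065151889408
$ echo $(( 1073726619648 - 124 * 2**33 ))                # vdev space left over after metaslab allocation
8574730240
$ echo $(( (8574730240 + 4718592 + 10485760) / 2**30 ))  # leftover + labels/boot block + partitioning, GiB
8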

Now we will take a look at the discrepancy between the storage capacity values reported by the zpool list and the zfs list commands:

wg@ubuntu:~$ sudo zpool list  pool1
NAME    SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
pool1   992G    64K   992G         -     0%     0%  1.00x  ONLINE  -
wg@ubuntu:~$
wg@ubuntu:~$ sudo zfs list pool1
NAME    USED  AVAIL  REFER  MOUNTPOINT
pool1    55K   961G    19K  /pool1
wg@ubuntu:~$ 

To get the exact values in bytes we will use the zpool get -p and the zfs get -p commands:

wg@ubuntu:~$ sudo zpool get -p size,free,allocated  pool1
NAME   PROPERTY   VALUE          SOURCE
pool1  size       1065151889408  -
pool1  free       1065151823872  -
pool1  allocated  65536          -
wg@ubuntu:~$
wg@ubuntu:~$ sudo zfs get -p avail,used  pool1
NAME   PROPERTY   VALUE          SOURCE
pool1  available  1031865836544  -
pool1  used       56320          -
wg@ubuntu:~$

To calculate the total capacity we need to add the available and used values reported by the zfs command: 1,031,865,836,544 B + 56,320 B = 1,031,865,892,864 B

The difference between the value obtained from the zfs command and the pool size value is: 1,065,151,889,408 B - 1,031,865,892,864 B = 33,285,996,544 B = 31 GiB

This is known as the slop space reservation. It ensures that some critical ZFS operations can complete even when very little free space remains in the pool.
Slop space is calculated as 1/32nd of the zpool capacity: 1,065,151,889,408 B * 1/32 = 33,285,996,544 B = 31 GiB
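
Once again, the numbers can be double-checked with bash arithmetic, using the values from the zpool get -p and zfs get -p output above:

$ echo $(( 1031865836544 + 56320 ))          # available + used, bytes
1031865892864
$ echo $(( 1065151889408 - 1031865892864 ))  # difference from the pool size
33285996544
$ echo $(( 1065151889408 / 32 ))             # 1/32 of the pool size = slop space
33285996544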


To recap: we've used a disk with a raw capacity of 1000 GiB to create a single-disk zpool and ended up with 961 GiB of usable ZFS space, or 96.1% of the raw capacity. In our case the total overhead was 39 GiB, or 3.9%.
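
The recap figures can be verified the same way; with a 1000 GiB raw disk, the percentages follow directly from the GiB values:

$ echo $(( 1031865892864 / 2**30 ))                    # usable ZFS space, GiB (96.1% of 1000 GiB)
961
$ echo $(( (1073741824000 - 1031865892864) / 2**30 ))  # total overhead, GiB (3.9% of 1000 GiB)
39
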
The details for the ZFS overhead items we’ve discussed in the post are summarized in the following table:

Item                             Size                                                                 Where Applicable
Partition labels and alignment   33 KiB - 2 MiB (actual value may vary between OSs / distributions)   Per physical disk
Reserved partition               8 MiB                                                                Per physical disk
Vdev labels                      4 x 256 KiB = 1 MiB                                                  Per disk vdev / physical disk
Boot block reservation           3.5 MiB                                                              Per disk vdev / physical disk
Metaslab allocation loss         0% - 0.5% (approximately) of the vdev capacity                       Per top level vdev
Slop space reservation           1/32 or 3.125% of the pool capacity                                  Per zpool