← back

2018-09-23: getting a handle on zettabytes with ZFS


All systems need storage, and modern business especially so: companies with many employees or complex mathematical and scientific data often deal with a large number of big files, which requires a considerable amount of hard disk storage.

Existing RAID configurations demand significant hardware, and some, like RAID5, suffer from the so-called "write hole", where an interrupted stripe write leaves the parity inconsistent, with painful consequences when a disk later fails. RAID6 reduces quite a lot of this risk, but it requires even more physical drives.

Given that the pace of development of disk firmware tends to be faster than that of the physical hardware, even disks built on the same physical platters can frequently carry different firmware versions and features. Hence, to truly ensure that the desired RAID is safe and that no unforeseen minor incompatibilities occur, a company is often forced to purchase several additional spares up-front.

Multiple attempts to find alternatives to RAID were very platform dependent or vendor specific, so they were not widely adopted by the industry. The largest and most popular of these was ZFS from Sun Microsystems, while on the Linux side there was an attempt with the Btrfs file system. Neither was ideal, since ZFS was tied to Solaris and a license incompatible with the Linux kernel, and Btrfs has been in an extended beta since 2007.

Then in 2013 the OpenZFS group was able to carry the code base forward as a fork: Sun Microsystems had open sourced Solaris, and ZFS with it, back in 2005, and when Oracle purchased Sun and stopped publishing the source, the community continued from the last open release. Modern storage needs for big systems are served by this open and feature-rich version of ZFS. From Wikipedia:

ZFS is scalable, and includes extensive protection against data corruption, support for high storage capacities, efficient data compression, integration of the concepts of filesystem and volume management, snapshots and copy-on-write clones, continuous integrity checking and automatic repair, RAID-Z, native NFSv4 ACLs, and can be very precisely configured.

The feature list above is rather overwhelming; however, ZFS is designed to make it easy to manage a diverse storage ecosystem.

A word of caution to the reader: the ZFS code base uses the CDDL license, which is not compatible with the GPL license of the Linux kernel. Ergo, you will either need to find a distro that ships a pre-built or DKMS-built module (e.g. Ubuntu), or attempt to build the code yourself against a kernel that works with that ZFS version; a given ZFS release will not always work with any given Linux kernel.
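
On Ubuntu, for instance, the module and the userland tools come in a single package (the package name will differ on other distros):

sudo apt install zfsutils-linux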

ZFS works by means of storage pools, which are fundamentally groups of hard drives that all interact with each other. Disks in one pool cannot affect the contents of another. At any time, existing drives can be removed from a pool or new drives attached to it.

To create a pool, use the create command with a desired name, the RAID type and a list of devices to attach to the pool. For this example, we create a RAID-1 style mirror using drives referenced by-id, and call it example_pool.

sudo zpool create example_pool mirror \
/dev/disk/by-id/ata-SERIAL_NUMBER_12345-part1 \
/dev/disk/by-id/ata-SERIAL_NUMBER_23456-part1

Where '-part1' refers to the intended partition. The mirrors created by ZFS tend to be very robust, and while other RAID styles are possible, they are much less flexible. A basic overview of some of the more interesting commands is demonstrated below.
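
Once created, the layout and health of the pool can be checked at any time:

sudo zpool status example_pool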

Create a dataset (a ZFS file system) inside the pool:

sudo zfs create name_of_pool/volume0
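
Datasets can then be tuned individually. For example, transparent compression can be enabled per dataset (lz4 is a common choice) and verified afterwards:

sudo zfs set compression=lz4 name_of_pool/volume0
sudo zfs get compression name_of_pool/volume0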

Tell ZFS to start on boot-up:

sudo systemctl enable zfs.target
sudo systemctl start zfs.target
sudo systemctl enable zfs-import-cache
sudo systemctl enable zfs-mount
sudo systemctl enable zfs-import.target
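
To confirm that the units were picked up, list them (the exact output varies by distro and ZFS version):

systemctl list-unit-files 'zfs*'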

Consider scrubbing the pool at least once a week: a scrub walks all of the stored data, verifies its checksums and repairs any silent corruption it finds:

sudo zpool scrub name_of_pool
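
The progress of a running scrub shows up in the output of 'sudo zpool status name_of_pool'. To automate the weekly run, a root crontab entry along these lines works (the schedule and binary path here are just an example; check where your distro installs zpool):

0 3 * * 0 /sbin/zpool scrub name_of_pool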

Create, list, rollback, clone and destroy snapshots:

sudo zfs snapshot name_of_pool/volume1@123456
sudo zfs list -t snapshot -o name,creation
sudo zfs rollback -r name_of_pool/volume1@123456
sudo zfs clone name_of_pool/volume1@123456 name_of_pool/volume1_restore
sudo zfs destroy name_of_pool/volume1@123456
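
Snapshots are also the unit of replication: zfs send serialises a snapshot and zfs receive replays it elsewhere. A minimal local example, assuming a second pool named backup_pool already exists:

sudo zfs send name_of_pool/volume1@123456 | sudo zfs receive backup_pool/volume1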

Split a mirrored pool, either to create new separate pools or to replace disks:

sudo zpool split name_of_pool name_of_brand_new_pool \
/dev/disk/by-id/ata-SERIAL_NUMBER_23456
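
The new pool is left exported after the split; bring it online by importing it:

sudo zpool import name_of_brand_new_pool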

Attach a new drive to a pool with an existing device:

sudo zpool attach name_of_pool \
/dev/disk/by-id/ata-SERIAL_NUMBER_of_existing_device \
/dev/disk/by-id/ata-SERIAL_NUMBER_of_brand_new_hdd
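
ZFS will then resilver the new drive. Progress shows up in the status output, and once resilvering completes the old drive can be dropped from the mirror if the goal was a replacement:

sudo zpool status name_of_pool
sudo zpool detach name_of_pool /dev/disk/by-id/ata-SERIAL_NUMBER_of_existing_device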

Check the history of all of the pools, or of a given pool:

sudo zpool history
sudo zpool history name_of_pool

Monitor the current I/O of the pools, refreshing every 6 seconds:

sudo zpool iostat 6
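
Adding -v breaks the statistics down per device, which helps when hunting for a single slow disk:

sudo zpool iostat -v name_of_pool 6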

As an aside, you can also mix-and-match hard drives of multiple sizes, though a mirror only ever provides the capacity of its smallest member.

For interest, if you have a set of mechanical platter drives and an SSD with an operating system installed, you could preserve your data by making the SSD the first drive of a pool and then attaching the mechanical drives as a mirror.
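
A minimal sketch of that idea, assuming hypothetical serial numbers and a spare data partition on the SSD (be careful not to hand zpool the partition that holds your operating system):

sudo zpool create data_pool /dev/disk/by-id/ata-SSD_SERIAL-part2
sudo zpool attach data_pool \
/dev/disk/by-id/ata-SSD_SERIAL-part2 \
/dev/disk/by-id/ata-HDD_SERIAL-part1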

That covers most of the important commands of ZFS; however, there are still quite a lot of other features available. Consider reading the zpool and zfs man pages if you are curious.