Tuesday, August 30, 2011

Zettabyte File System (ZFS) in Solaris 10

ZFS is an advanced modern filesystem from Sun Microsystems, originally designed for Solaris/OpenSolaris.

ZFS is a new file system in Solaris 10 that provides excellent data integrity and performance compared to other file systems, particularly in enterprise storage scenarios. Unlike earlier file systems, it is a 128-bit file system, which means it can scale to accommodate very large amounts of data. It is perhaps the world's first 128-bit file system. But why do we need so much scalability? The reason is simple: in an enterprise, data is continuously stored on servers and keeps growing. Enterprises want to keep as much of this data live as possible so that it can be retrieved quickly when required.
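
To put the 128-bit figure in perspective, here is a small piece of arithmetic in Python. It only compares the raw number of bytes that a 64-bit and a 128-bit byte offset can address; it says nothing about the practical limits of any particular ZFS release, and the names used (EIB, limit_64, limit_128, capacity_ratio) are made up for this illustration.

```python
# Illustrative arithmetic only: raw address-space sizes, not ZFS limits.
EIB = 2 ** 60  # bytes in one exbibyte

limit_64 = 2 ** 64    # bytes addressable with a 64-bit byte offset
limit_128 = 2 ** 128  # bytes addressable with a 128-bit byte offset


def capacity_ratio() -> int:
    """How many times larger the 128-bit address space is."""
    return limit_128 // limit_64


if __name__ == "__main__":
    print("64-bit limit in EiB:", limit_64 // EIB)
    print("128-bit space is 2**64 times larger:", capacity_ratio() == 2 ** 64)
```

In other words, a 64-bit byte offset tops out at 16 EiB, and the 128-bit space is larger than that by a further factor of 2**64.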

ZFS has many features which can benefit all kinds of users - from the simple end-user to the biggest enterprise systems:

  • Provable integrity - it checksums all data and metadata, which makes it possible to detect hardware errors (hard disk corruption, flaky IDE cables, and so on)
  • Atomic updates - the on-disk state is consistent at all times, so there's no need to perform a lengthy filesystem check after forced reboots or power failures
  • Instantaneous snapshots and clones - these make it possible to keep hourly, daily and weekly backups efficiently, and to experiment with new system configurations without risk
  • Built-in (optional) compression
  • Highly scalable
  • Pooled storage model - creating filesystems is as easy as creating a new directory. You can efficiently have thousands of filesystems, each with its own quotas and reservations, and different properties (compression algorithm, checksum algorithm, and so on)
  • Built-in stripes (RAID-0), mirrors (RAID-1) and RAID-Z (it's like software RAID-5, but more efficient due to ZFS's copy-on-write transactional model).
  • Many others (variable sector sizes, adaptive endianness, ...)
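
The atomic updates and instantaneous snapshots in the list above both follow from the copy-on-write idea. The Python sketch below is a toy model of that idea, not how ZFS is actually implemented: the class ToyCowStore and its methods are hypothetical names invented for this example, and real ZFS works on trees of on-disk blocks rather than Python dictionaries.

```python
class ToyCowStore:
    """Toy copy-on-write store: existing blocks are never overwritten."""

    def __init__(self):
        self._blocks = {}    # block id -> immutable bytes
        self._next_id = 0
        self.live = {}       # name -> block id (the current state)
        self.snapshots = {}  # snapshot name -> frozen name -> block id map

    def write(self, name: str, data: bytes) -> None:
        # 1. Put the new data in a fresh block; old blocks stay untouched.
        block_id = self._next_id
        self._next_id += 1
        self._blocks[block_id] = bytes(data)
        # 2. Build a new pointer map and switch to it in one assignment.
        #    Until that assignment, readers still see the old, consistent state.
        new_live = dict(self.live)
        new_live[name] = block_id
        self.live = new_live

    def snapshot(self, snap: str) -> None:
        # A snapshot just keeps a reference to the current pointer map.
        # No data is copied, which is why it is effectively instantaneous.
        self.snapshots[snap] = self.live

    def read(self, name: str, snap: str = None) -> bytes:
        table = self.live if snap is None else self.snapshots[snap]
        return self._blocks[table[name]]


if __name__ == "__main__":
    store = ToyCowStore()
    store.write("config", b"version 1")
    store.snapshot("before-upgrade")
    store.write("config", b"version 2")
    print(store.read("config"))
    print(store.read("config", snap="before-upgrade"))
```

Because a write never touches an existing block, the snapshot keeps seeing the old contents for free, and an interrupted write leaves the previous pointer map fully intact.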

In traditional file systems, data is stored on a single disk or on a large volume consisting of multiple disks. ZFS uses a pooled storage model instead: every storage device is part of a single expandable storage pool, regardless of where the data is being written. Many file systems can be created on top of the pool, which helps administrators scale the system easily and efficiently. You no longer need to resize individual file systems; just add a storage device to the pool. With this architecture, every file system in the pool shares the capacity and I/O resources of the pool as a whole.

ZFS also deals with data corruption, both noisy and silent. In the noisy case, you perform an I/O operation and the disk returns an error message, say, 'Can't read the specified block.' In the silent case, you perform an I/O operation and the system returns corrupted results without reporting any error. ZFS identifies these corruptions and, where a redundant copy of the data is available, even corrects them, something most existing file systems can't do.

Managing existing file systems is also difficult. For example, you upgrade your system and then find that the file system isn't supported on the new machine, so you have to copy all the data. This would consume a lot of time, but ZFS helps alleviate this. Moreover, existing file systems have limitations in terms of volume size, file size and so on.
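
The silent-corruption case can be made concrete with a small Python sketch. It is a toy model under stated assumptions, not the ZFS implementation: ToyMirror and checksum are hypothetical names, SHA-256 from Python's hashlib stands in for whatever checksum algorithm the file system is configured to use, and each "disk" is just a dictionary. The point it illustrates is that a checksum stored apart from the data lets a read detect a bad copy, and a mirror lets it repair that copy from a good one.

```python
import hashlib


def checksum(data: bytes) -> bytes:
    # Stand-in checksum for this sketch (an assumption, not ZFS's default).
    return hashlib.sha256(data).digest()


class ToyMirror:
    """Toy mirrored store that verifies every read against a checksum."""

    def __init__(self, copies: int = 2):
        self.disks = [dict() for _ in range(copies)]  # block id -> bytes
        self.sums = {}  # block id -> expected checksum, kept apart from data

    def write(self, block_id: int, data: bytes) -> None:
        self.sums[block_id] = checksum(data)
        for disk in self.disks:
            disk[block_id] = data

    def read(self, block_id: int) -> bytes:
        expected = self.sums[block_id]
        for disk in self.disks:
            data = disk.get(block_id)
            if data is not None and checksum(data) == expected:
                # Found a verified copy: overwrite any copy that differs
                # from it (missing or corrupted) before returning.
                for other in self.disks:
                    if other.get(block_id) != data:
                        other[block_id] = data
                return data
        # No copy matched its checksum: detected, but not correctable.
        raise IOError("block %r is corrupt on every copy" % block_id)


if __name__ == "__main__":
    mirror = ToyMirror()
    mirror.write(0, b"payload")
    mirror.disks[0][0] = b"pay1oad"  # simulate silent corruption on one disk
    print(mirror.read(0))            # served from the good copy
    print(mirror.disks[0][0])        # the bad copy has been repaired
```

If every copy fails verification, the read raises an error rather than returning bad data, which turns silent corruption into the noisy kind.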

ZFS definitely looks like a great engineering achievement, and its makers have every right to be proud of it. In their own words, they've blown away 20 years of obsolete assumptions, and they now refer to ZFS as the last word in filesystems.

When ZFS was first announced, I'm sure many Linux hackers thought it would be a great idea to port such a great filesystem to Linux. Unfortunately, the ZFS source is distributed under Sun's CDDL license, which is (some say deliberately) incompatible with the GPL license that the Linux kernel uses. So it looks like there will be no native port of ZFS for Linux in the foreseeable future. What a pity.