[Disclosure: I was invited by GestaltIT as a delegate to their Storage Field Day 19 event from Jan 22-24, 2020 in the Silicon Valley USA. My expenses, travel, accommodation and conference fees were covered by GestaltIT, the organizer and I was not obligated to blog or promote the vendors’ technologies to be presented at this event. The content of this blog is of my own opinions and views]

Western Digital dived into Storage Field Day 19 in full force as they did in Storage Field Day 18. A series of high impact presentations, each curated for the diverse requirements of the audience. Several open source initiatives were shared, all open standards to address present inefficiencies and designed and developed for a greater future.

Zoned Storage

One of the initiatives is to increase the efficiencies around SMR and SSD zoning capabilities and removing the complexities and overlaps of both mediums. This is the Zoned Storage initiatives a technical working proposal to the existing NVMe standards. The resulting outcome will give applications in the user space more control on the placement of data blocks on zone aware devices and zoned SSDs, collectively as Zoned Block Device (ZBD). The implementation in the Linux user and kernel space is shown below:

I woke up yesterday morning with a shocker of a news. IBM announced that they were buying Redhat for USD34 billion. Never in my mind that Redhat would sell but I guess that USD190.00 per share was too tempting. Redhat (RHT) was trading at USD116.68 on the previous Friday’s close.

Redhat is one of my favourite technology companies. I love their Linux development and progress, and I use a lot of Fedora and CentOS in my hobbies. I started with Redhat back in 2000, when I became obsessed to get my RHCE (Redhat Certified Engineer). I recalled on almost every weekend (Saturday and Sunday) back in 2002 when I was in the office, learning Redhat, and hacking scripts to be really good at it. I got certified with RHCE 4 with a 96% passing mark, and I was very proud of my certification.

One of my regrets was not joining Redhat in 2006. I was offered the job as an SE by Josep Garcia, and the very first position in Malaysia. Instead, I took up the Hitachi Data Systems job to helm the project implementation and delivery for the Shell GUSto project. It might have turned out differently if I did.

The IBM acquisition of Redhat left a poignant feeling in me. In many ways, Redhat has been the shining star of Linux. They are the only significant one left leading the charge of open source. They are the largest contributors to the Openstack projects and continue to support the project strongly whilst early protagonists like HPE, Cisco and Intel have reduced their support. They are of course, the perennial top 3 contributors to the Linux kernel since the very early days. And Redhat continues to contribute to projects such as containers and Kubernetes and made that commitment deeper with their recent acquisition of CoreOS a few months back.

There’s better a lot of chatter about the default file system in the recently released Fedora 16. Fedora 16 was released on November 8, 2011. For months, there were rumours that the default file system for Fedora 16 will be btrfs (btree file system, or better known as butter-FS). But after the release, the default file system of Fedora 16 is still ext4, much to the dismay of many. btrfs holds a lot of potential because in the space of less than 4 years, it has moved up the hierarchy to be one of the top file systems in the Linux ecosystem.

File systems in Linux has not seen had a knight in shining armour for the long time. The file systems kings of the Linux world were ext2/3 for the RedHat and Debian flavoured distros and reiserfs for the SuSE flavoured distros. ext4 is now default in most Linux distros but the concept of ext2/3/4 has not changed much since its inception decades ago.

At the same time, reiserfs had a lot of promise as well, but its development and progress have lost it lustre after its principal developer, Han Reiser was convicted of the murder of his wife a few years ago. If you are KPC (Malaysian Chinese colloquialism, meaning busybody), you can read the news here.

btrfs is going to be the new generation of file systems for Linux and even Ted T’so, the CTO of Linux Foundation and principal developer admitted that he believed btrfs is the better direction because “it offers improvements in scalability, reliability, and ease of management”.

For those who has studied computer science, B-Tree is a data structure that is used in databases and file systems. B-Tree is an excellent data structure to store billions and billions of objects/data and is able to provide fast data retrieval in logarithmic time. And the B-Tree implementation is already present in some of the file systems such as JFS, XFS and ReiserFS. However, these file systems are not shadow-paging filesystems (popularly known as copy-on-write file systems).

You see, B-Tree, in its native form of implementation, is very incompatible with COW file systems. In fact, the implementation was thought of impossible, until someone by the name of Ohad Rodeh came along. He presented a paper in Usenix FAST ’07 which described the challenges of combining the B-Tree concept with shadow paging file systems. The solution, as he described it, was to introduce insert/remove key into the tree structure, and removing the dependency of intra-leaf linking.

Chris Mason, one of the developers of reiserfs, took Ohad’s idea and created a shadow-paging file system based on the B-Tree idea.

Traditional file systems tend to follow the idea of the Berkeley Fast File Systems. Different cylinder groups have its own inode, bitmap and disk blocks. The used space in one cylinder group cannot be shared to another cylinder group, resulting in wastage. At the same time, performance can be an issue as the disk read/head frequently have to move to the inodes to find out where the next used or free blocks will be. In a way, it looks like the diagram below.

Chris Mason’s idea of btrfs made the file system looked like this:

Today, a quick check in btrfs wiki page, shows that the main Btrfs features available at the moment include:

Extent based file storage

2^64 byte == 16 EiB maximum file size

Space-efficient packing of small files

Space-efficient indexed directories

Dynamic inode allocation

Writable snapshots, read-only snapshots

Subvolumes (separate internal filesystem roots)

Checksums on data and metadata

Compression (gzip and LZO)

Integrated multiple device support

RAID-0, RAID-1 and RAID-10 implementations

Efficient incremental backup

Background scrub process for finding and fixing errors on files with redundant copies

Online filesystem defragmentation

And Chris Mason and his team have still plenty more to offer for btrfs. It will likely be the default file system in Fedora 17 and at the rate it is going, could be the file system of choice for RedHat, Debian and SuSE distros very soon.

There’s been a lot of comparisons between btrfs and ZFS, since both are part of Oracle now. ZFS is obviously a much more mature file system, with more enterprise features and more robust, (Incidentally, ZFS just celebrated its 10th birthday on Halloween 2011 – see Matt Ahren’s blog) and the btrfs is the rising star in the Linux world. But at this moment, the 2 file systems are set apart in their market positioning and deployment.

ZFS is based on the CDDL (Common Development and Distribution License) while btrfs is based on GNU GPL, open source licensing. There are controversies surrounding CDDL licensing scheme, while is incompatible with GNU GPL scheme.

Oracle can count itself very lucky to have 2 of the most promising and prominent COW file systems around. It will be interesting to see what Oracle will do next. As a proponent of innovation, community and sharing, I sincerely hope that both file systems will continue to thrive in Oracle’s brutal, sales-driven organization. We certainly don’t want to see controversies about dual ownership BS of Oracle and mySQL that could end in 2015.