Over time, however, we also need to maintain the HCFS tests. Here's a quick way to confirm the behaviour of a test on Hadoop trunk, in case you want to know that the test "actually works" before you blame your Hadoop connector.
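
A typical invocation, assuming you run it from the hadoop-common module of a trunk checkout (the module path and exact test class name may differ in your version), looks like this:

```bash
mvn test -Dtest=TestRawLocalContractAppend
```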

The above line runs the TestRawLocalContractAppend case. Since that test may be skipped depending on your configuration, you may need to temporarily comment out the skipIfUnsupported line in setup().
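
For reference, the skipIfUnsupported call lives in the test's setup() method. A rough sketch of that method as it appears in an append contract test (the exact code in your checkout may differ):

```java
// Inside a contract test such as AbstractContractAppendTest;
// SUPPORTS_APPEND comes from the ContractOptions interface.
@Override
public void setup() throws Exception {
  super.setup();
  // Comment this line out temporarily to force the test to run even
  // when the contract XML does not declare append support:
  skipIfUnsupported(SUPPORTS_APPEND);
}
```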

September 2, 2014: How to update the HCFS contract and test classes as the FileSystem evolves.

When we define new file system behaviours, it's critical to update the contract documentation and tests. However, this is not particularly difficult, because the HCFS contracts only consist of:

* Unit tests, which are extended for file systems and parameterized by option fields from the contract XML files

* XML files, which define a filesystem's semantics

* A series of .md files, which define a semi-formal specification.

The steps to extend the semantics of any FileSystem are now quite simple and explicit.

* An XML file to define the FileSystem semantics. This file needs to be loaded in your unit tests: the contract defines the semantics of your file system, and the unit tests then test against the parameters you define. For example:
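
A sketch, using option names from the fs.contract.* convention that ships with Hadoop (the values shown are illustrative):

```xml
<configuration>
  <property>
    <name>fs.contract.supports-append</name>
    <value>true</value>
  </property>
  <property>
    <name>fs.contract.rename-returns-false-if-dest-exists</name>
    <value>true</value>
  </property>
</configuration>
```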

And so on. All the classes you can override live in org.apache.hadoop.fs.contract, and you can scan the existing Hadoop source code for examples of how to properly override them.
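
As a minimal sketch of such an override, assuming a hypothetical MyFSContract (your own AbstractFSContract subclass, which loads the contract XML above), wiring a connector into the shared append tests looks roughly like this:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.contract.AbstractContractAppendTest;
import org.apache.hadoop.fs.contract.AbstractFSContract;

// Reuse the shared append test suite against your own filesystem.
public class TestMyFSContractAppend extends AbstractContractAppendTest {
  @Override
  protected AbstractFSContract createContract(Configuration conf) {
    // MyFSContract (hypothetical) points at the contract XML that
    // declares your filesystem's semantics.
    return new MyFSContract(conf);
  }
}
```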

The completion of this coherent and flexible test framework allows us to expand upon and customize Hadoop file system work. To extend the contract tests or add new semantics, there is a clear path: the .md files, which live in the hadoop-common source code under src/site/markdown/filesystem/.... These can easily be browsed in the Apache Hadoop source repository.

Thanks to Steve Loughran and the other HADOOP-9361 reviewers for this critical and timely update to the Hadoop contract tests. This democratizes storage underneath Hadoop and enables the continued co-evolution of storage and YARN computation, hand in hand.

In order to begin iterating on improved HCFS test coverage, we need an unambiguous mechanism for implementing HCFS tests. After implementing them, we want to compare our test coverage against a gold standard, and finally, where any gaps exist, we want to be able to justify them using the simplest semantics possible, ideally in code. The JIRAs below address these three issues; once they are completed, HCFS testing will be much simpler.

* HADOOP-9361: A programmatic way for FS implementations to broadcast the features they support, so that all file systems can reuse the same basic test libraries without needing to do ad-hoc overriding of certain implementations.

Hadoop has a pluggable FileSystem architecture: third-party filesystems can be enabled for Hadoop by developing a plugin that mediates between the Hadoop FileSystem interface and the interface of the third-party filesystem. For those developing such a plugin, however, there is no comprehensive test library to validate that the plugin yields a Hadoop-compatible FileSystem implementation.
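
As an illustration, a plugin is typically wired in through configuration. With a hypothetical URI scheme myfs and implementation class org.example.MyFileSystem, the core-site.xml entry would look like this:

```xml
<property>
  <name>fs.myfs.impl</name>
  <value>org.example.MyFileSystem</value>
</property>
```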

What do we mean by comprehensive? We mean that there is a test for every single operation in the FS Interface which properly tests the expected behavior of that operation given the full variability of its parameters (a sketch of what this means for a single operation follows the plan below). To create a comprehensive test library, we plan to do the following:

* Focus on the Hadoop 2.0 FS Interface. If possible, create a work stream that would allow testing and validation of the FS 1.0 Interface as well.

* Create a Hadoop 2.0 FileSystem Interface Specification, both for developers creating plugins and as additional background for interested users. This should be created as JavaDoc and managed in JIRA so that it supports proper governance.
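
As promised above, here is a sketch of what "full variability" means for a single operation, using rename() as the example. This is illustrative JUnit 4 code run against the local filesystem; a real contract test would swap in the filesystem under test and consult its contract XML for the expected outcome:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.Before;
import org.junit.Test;
import static org.junit.Assert.*;

public class RenameVariabilitySketch {
  private FileSystem fs;
  private Path base;

  @Before
  public void setup() throws Exception {
    // The local filesystem stands in for the filesystem under test.
    fs = FileSystem.getLocal(new Configuration());
    base = new Path(System.getProperty("java.io.tmpdir"), "rename-sketch");
    fs.delete(base, true);
    fs.mkdirs(base);
  }

  @Test
  public void testRenameNonexistentSource() throws Exception {
    // Some filesystems return false here, others throw; a contract
    // option is what tells a shared test which outcome to expect.
    try {
      assertFalse(fs.rename(new Path(base, "missing"), new Path(base, "dest")));
    } catch (IOException alsoAcceptable) {
      // acceptable for filesystems that raise instead of returning false
    }
  }

  @Test
  public void testRenameOntoExistingFile() throws Exception {
    Path src = new Path(base, "src");
    Path dst = new Path(base, "dst");
    fs.create(src).close();
    fs.create(dst).close();
    // Contract-dependent: overwrite, return false, or throw.
    try {
      boolean renamed = fs.rename(src, dst);
      assertTrue(renamed || fs.exists(src)); // either moved, or left in place
    } catch (IOException alsoAcceptable) {
      // acceptable for filesystems that reject rename-over-existing
    }
  }
}
```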

The workstream definition at the top of this page has been updated to reflect the new additions to the initiative.