Differences between revisions 19 and 20
Revision 19 as of 2014-04-09 22:49:21
Size: 6479
Editor: JayVyas
Comment:
Revision 20 as of 2014-08-12 23:57:15
Size: 8779
Editor: JayVyas
Comment: I believe the monolithic workstream is now complete, and people can implement, extend, and evolve the HCFS work iteratively over time.
'''Hadoop FileSystem and FileContext work, largely concluded.'''

Good news! With https://issues.apache.org/jira/browse/HADOOP-9361, we are now able to test Hadoop FileSystems in an unambiguous, declarative manner,
using a combination of:

* An XML file that defines your FileSystem's semantics. This file must be loaded by your unit tests: the contract declares how your file system behaves, and the tests then adjust their expectations to the options you set. For example:
{{{
   <property>
   <name>fs.contract.supports-unix-permissions</name>
   <value>true</value>
   </property>
}}}
* The standard contract test superclasses bundled with Hadoop. These are built into the hadoop-common tests jar (hadoop-common-3.0.0-SNAPSHOT-tests.jar).

* Your own subclasses of each of the above superclasses. To add them, manually create classes that extend the superclasses in the Hadoop tests jar, like so:
{{{
 // The contract: binds your contract XML to your FileSystem
 public class MyHCFSContract extends AbstractFSContract
 // Each test class returns that contract from createContract(Configuration)
 public class TestMyHCFSCreateTests extends AbstractContractCreateTest
 public class TestMyHCFSFSContractDelete extends AbstractContractDeleteTest
 public class TestMyHCFSContractLoaded extends AbstractFSContractTestBase
 public class TestMyHCFSContractMkdir extends AbstractContractMkdirTest
 public class TestMyHCFSContractOpen extends AbstractContractOpenTest
}}}
And so on. All the classes you can override are in the org.apache.hadoop.fs.contract package, and the existing Hadoop source code contains examples of how to override them properly.
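The declarative mechanism can be illustrated without Hadoop on the classpath. The sketch below is hypothetical: `ContractOptionsSketch` is not a Hadoop class. It parses a contract file shaped like the XML example above and answers feature queries by prefixing `fs.contract.`, which, as far as we understand it, is roughly what Hadoop's `AbstractFSContract.isSupported(feature, default)` does. The `supports-atomic-rename` key is used here only as an example of an option the file does not declare.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical stand-in for a contract; NOT Hadoop's AbstractFSContract API.
public class ContractOptionsSketch {
  // Every contract option is namespaced under this prefix in the XML.
  static final String PREFIX = "fs.contract.";

  private final Map<String, String> props = new HashMap<>();

  // Parse <property><name>...</name><value>...</value></property> entries.
  public static ContractOptionsSketch parse(String xml) throws Exception {
    ContractOptionsSketch contract = new ContractOptionsSketch();
    Document doc = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    NodeList properties = doc.getElementsByTagName("property");
    for (int i = 0; i < properties.getLength(); i++) {
      Element p = (Element) properties.item(i);
      String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
      String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
      contract.props.put(name, value);
    }
    return contract;
  }

  // Answer "does this FS claim <feature>?", falling back to a default if unset.
  public boolean isSupported(String feature, boolean defaultValue) {
    String value = props.get(PREFIX + feature);
    return value == null ? defaultValue : Boolean.parseBoolean(value);
  }

  public static void main(String[] args) throws Exception {
    ContractOptionsSketch contract = parse(
        "<configuration><property>"
            + "<name>fs.contract.supports-unix-permissions</name>"
            + "<value>true</value>"
            + "</property></configuration>");
    // A test consults the contract and skips checks the FS does not claim.
    if (contract.isSupported("supports-unix-permissions", false)) {
      System.out.println("running permission checks");
    }
    if (!contract.isSupported("supports-atomic-rename", false)) {
      System.out.println("skipping atomic-rename checks");
    }
  }
}
```

A real contract test would mark unsupported cases as skipped rather than print, so that missing semantics show up as skips instead of failures.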

With this coherent, flexible test framework in place, we can expand on and customize Hadoop file system work. There is a clear path for extending the contract tests or adding new semantics:
the .md specification files inside the existing hadoop-common source tree. See the src/site/markdown/filesystem/.... files to do so.
They can easily be browsed here:

https://github.com/apache/hadoop-common/tree/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem

Thanks to Steve Loughran and the other HADOOP-9361 reviewers for this critical and timely update to the Hadoop contract tests. It democratizes the storage layer underneath Hadoop and lets storage and YARN computation continue to co-evolve, hand in hand.

Hadoop FileSystem and FileContext Workstream (2014)

To begin iterating on HCFS test coverage, we need an unambiguous mechanism for implementing HCFS tests. Once HCFS tests are implemented, we want to compare their coverage against a gold standard and, where gaps exist, justify them using the simplest semantics possible, ideally in code. The first three JIRAs below address these three issues. Once they are completed, HCFS testing will be much simpler.

* HADOOP-9361: A programmatic way for FS implementations to broadcast the features they support, so that all file systems can reuse the same basic test libraries without needing to do ad-hoc overriding of certain implementations.

* HADOOP-10463: Boosting RawLocalFileSystem test coverage and

* HADOOP-10461: Creating an injection framework for any HCFS to plug into (feedback pending from the OrangeFS community and others), which supports:

  • Ignoring or modifying the expectations of certain tests (i.e., alongside the semantics of HADOOP-9361).
  • Handling the ceremony of inheriting from or invoking the underlying test utility classes as needed.

* BIGTOP-1089: Scale testing as a universal HCFS integration test, confirming that the whole ecosystem works together at the FS interface level (scale testing and updating to 50 input splits pending from the OrangeFS community).
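The injection framework was still pending feedback when this was written, so the following is only a hypothetical sketch of what ignoring or modifying the expectations of certain tests could look like. `HcfsOverridesSketch`, its methods, and the test names in `main` are all invented for illustration and are not part of Hadoop. A file system registers the tests it wants skipped, plus tests where it expects a specific exception instead of normal completion.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of per-HCFS test overrides; not the HADOOP-10461 design.
public class HcfsOverridesSketch {
  public enum Result { PASSED, FAILED, SKIPPED }

  // A test body that may throw, like a JUnit test method.
  @FunctionalInterface
  public interface TestBody {
    void run() throws Exception;
  }

  private final Set<String> ignored = new HashSet<>();
  private final Map<String, Class<? extends Exception>> expected = new HashMap<>();

  // The HCFS declares tests whose semantics it does not support...
  public HcfsOverridesSketch ignore(String testName) {
    ignored.add(testName);
    return this;
  }

  // ...and tests whose expected outcome differs (here: a specific exception).
  public HcfsOverridesSketch expect(String testName, Class<? extends Exception> type) {
    expected.put(testName, type);
    return this;
  }

  // Run one test, honoring any override registered for it.
  public Result run(String testName, TestBody body) {
    if (ignored.contains(testName)) {
      return Result.SKIPPED;
    }
    Class<? extends Exception> override = expected.get(testName);
    try {
      body.run();
      // Normal completion passes only when no exception was expected.
      return override == null ? Result.PASSED : Result.FAILED;
    } catch (Exception e) {
      // An exception passes only if it matches the overridden expectation.
      return override != null && override.isInstance(e) ? Result.PASSED : Result.FAILED;
    }
  }

  public static void main(String[] args) {
    // Test names below are illustrative placeholders.
    HcfsOverridesSketch overrides = new HcfsOverridesSketch()
        .ignore("testRenameOverExistingFile")
        .expect("testDeleteNonEmptyDirNonRecursive", IOException.class);
    System.out.println(overrides.run("testRenameOverExistingFile",
        () -> { throw new IllegalStateException("unsupported"); }));
    System.out.println(overrides.run("testDeleteNonEmptyDirNonRecursive",
        () -> { throw new IOException("directory not empty"); }));
    System.out.println(overrides.run("testMkdirs", () -> { }));
  }
}
```

Keeping overrides as data, rather than as ad-hoc subclass overrides, is one way to make each deviation from the shared tests visible and justifiable in code.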

In another thread, we will work to improve coverage of RawLocalFileSystem (LocalFs/LocalFileSystem).

Hadoop FileSystem Validation Workstream (2013)

Hadoop has a pluggable FileSystem architecture. Third-party FileSystems can be enabled for Hadoop by developing a plugin that mediates between the Hadoop FileSystem interface and the interface of the third-party FileSystem. However, developers of such plugins have no comprehensive test library with which to validate that their plugin yields a Hadoop-compatible FileSystem implementation.
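As a rough illustration of the plugin idea: in Hadoop itself a plugin subclasses `org.apache.hadoop.fs.FileSystem` and is registered under a configuration key of the form `fs.SCHEME.impl`. The sketch below avoids that dependency, so `MiniFileSystem`, `VendorStoreClient`, and `VendorFileSystemPlugin` are hypothetical, heavily reduced stand-ins. It shows the adapter translating a Hadoop-style `mkdirs` (which also creates missing parents) onto a flat key store, the kind of semantic mapping a compatibility test library has to verify.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class PluginAdapterSketch {
  // Hypothetical, heavily reduced stand-in for the Hadoop-side FS interface.
  interface MiniFileSystem {
    boolean exists(String path) throws IOException;
    boolean mkdirs(String path) throws IOException;
  }

  // Hypothetical third-party store client: flat keys, no real directories.
  static class VendorStoreClient {
    private final Map<String, String> objects = new HashMap<>();
    String head(String key) { return objects.get(key); }
    void put(String key, String body) { objects.put(key, body); }
  }

  // The plugin mediates: Hadoop-style calls in, vendor calls out.
  static class VendorFileSystemPlugin implements MiniFileSystem {
    private final VendorStoreClient client;

    VendorFileSystemPlugin(VendorStoreClient client) {
      this.client = client;
    }

    @Override
    public boolean exists(String path) {
      return client.head(path) != null;
    }

    // Hadoop-style mkdirs also creates missing parents, so the plugin
    // writes one empty directory-marker object per path component.
    @Override
    public boolean mkdirs(String path) {
      StringBuilder prefix = new StringBuilder();
      for (String part : path.split("/")) {
        if (part.isEmpty()) {
          continue;
        }
        prefix.append('/').append(part);
        client.put(prefix.toString(), "");
      }
      return true;
    }
  }

  public static void main(String[] args) {
    VendorFileSystemPlugin fs = new VendorFileSystemPlugin(new VendorStoreClient());
    fs.mkdirs("/data/logs");
    System.out.println(fs.exists("/data") + " " + fs.exists("/data/logs") + " " + fs.exists("/tmp"));
  }
}
```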

What do we mean by comprehensive? We mean that there is a test for every single operation in the FS Interface that properly tests the expected behavior of the operation given the full variability of its parameters. To create a comprehensive test library, we plan to do the following:

* Focus on the Hadoop 2.0 FS Interface. If possible, also create a workstream that allows testing and validation of the FS 1.0 Interface.

* Undertake an audit of the Hadoop FileSystem 1.0 Test Coverage - Link to Work In Progress

* Undertake an audit of the Hadoop FileSystem 2.0 Test Coverage - Link to Work In Progress

* Document the FileSystem 2.0 Specification (as a JavaDoc) as a JIRA Ticket

* Create a gap analysis contrasting the FileSystem 2.0 Specification and the audits of existing FileSystem 2.0 Test Coverage.

* Create tests to fill in the gaps

  • Also, create a test strategy for handling Object/Block Stores as Hadoop FileSystems.
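The gap analysis step can be sketched as a set difference: the operations a specification requires, minus the operations an audit found tests for. In this hypothetical sketch the "specification" is a reduced interface (`SpecifiedFs`, invented for illustration) whose methods are enumerated by reflection, and the audit result is a hand-written set. It only checks coverage per operation; covering the full variability of each operation's parameters would need a finer-grained audit.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

public class CoverageGapSketch {
  // Hypothetical, reduced stand-in for the FS interface under specification.
  interface SpecifiedFs {
    void create(String path);
    void open(String path);
    void delete(String path, boolean recursive);
    void rename(String src, String dst);
    void mkdirs(String path);
    void append(String path);
  }

  // Operations the specification requires: every public method of the interface.
  static Set<String> specifiedOperations(Class<?> api) {
    Set<String> ops = new TreeSet<>();
    for (Method m : api.getMethods()) {
      ops.add(m.getName());
    }
    return ops;
  }

  // Gap = specified operations minus operations the audit found tests for.
  static Set<String> gaps(Set<String> specified, Set<String> tested) {
    Set<String> missing = new TreeSet<>(specified);
    missing.removeAll(tested);
    return missing;
  }

  public static void main(String[] args) {
    Set<String> tested = new TreeSet<>(Arrays.asList("create", "open", "delete", "mkdirs"));
    System.out.println(gaps(specifiedOperations(SpecifiedFs.class), tested));
  }
}
```

Running `main` prints the uncovered operations in sorted order: `[append, rename]`.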

Once the comprehensive test library is complete, the provider of a third-party FileSystem can use it to verify compatibility with Hadoop by:

  • Passing Functional Validation: Successfully passing the test library described above.

  • Passing Ecosystem Validation: Successfully passing the Hadoop Integration Tests from Apache BigTop.


Next Meeting

June 25th 2013: Face-to-face meeting at Red Hat in Mountain View, the day before Hadoop Summit. Details and sign-up: http://hadoop-fs.eventbrite.com/


Work thus far


June 10th 2013 9AM PST via Google Hangout

Attendees: Tim St. Clair, Matt Farrellee, Steve Watt, Jay Vyas, Steve Loughran, Sanjay Radia, Andrew Purtell, Joe Buck, Roman Shaposhnik, Nathan (?)

Agenda:

- Discussion of the goals of the work

- Steve Loughran to give an update on the Hadoop FS Tests he developed for SWIFT

- Discussion on where people would like to participate

Outcome:

- Validation of the current goals, plus the addition of:

  • Leveraging BigTop to use the Hadoop ecosystem tests as additional test coverage for a FileSystem implementation.

  • Create a workstream to identify whether Object/Blob stores have unique properties that make them a special case for test coverage as a Hadoop FS, and create a strategy for handling Object/Block Stores.

  • Focus tests on the AbstractFileSystem class rather than the FileSystem class (which is itself an abstract class). Yes, this can be confusing.

  • Create a Hadoop 2.0 FileSystem Interface Specification for developers creating plugins, which also serves as background for interested users. It should be written as JavaDoc and managed in JIRA so that it is subject to proper governance.

The workstream definition at the top of this page has been updated to reflect the new additions to the initiative.


June 4th 2013

Created a diff report contrasting Hadoop FileSystem 1.0 and 2.0.

The next step is to evaluate how comprehensive the unit test coverage is for FileSystem 1.0 and 2.0. This is a work-in-progress audit of the FileSystem 1.0 test library.


May 23rd 2013: A broader call for participation was made to the hadoop-core dev list, proposing:

* broader participation in defining the expected behavior of Hadoop FileSystem operations

* creating a comprehensive test suite verifying compliance with the expected behavior of a Hadoop FileSystem

* several Google Hangouts and a workshop to discuss the topics

The following parties responded that they were interested in participating: mbhandarkar@gopivotal.com, shv.hadoop@gmail.com, stevel@hortonworks.com, erlv5241@gmail.com, shaposhnik@gmail.com, apurtell@apache.org, cdouglas@apache.org, jayhawk@cs.ucsc.edu, sanjay@hortonworks.com, dmitry.bimatov@isilon.com

HCFS/Progress (last edited 2014-08-12 23:57:15 by JayVyas)