HDFS Futures

Below is a categorized list, with descriptions, of future HDFS features.

Goal: HDFS for Production Use

  1. Reliable and Secure: The file system is solid enough for users to feel comfortable using it in "production"
  2. Good Enough Performance: HDFS should not limit the scaling of the Grid and the utilization of the nodes in the Grid
  3. Rich Enough FS Features for applications
  4. Sufficient Operations and Management features to manage a large 4K cluster


Service Scaling

This means scaling the Name Service (aka Namenode) and the number of Datanodes that can be present in an HDFS system.

For scaling the Name service (Namenode), there are two main issues: throughput and response time, and the size of the namespace (i.e. number of files/dirs).

Improving one may improve the other.

Summary of various options that scale name space and its performance (details below)

(Also see ScaleNN_Sea_of_Options.pdf)

Scaling Name Service Throughput and Response Time

Scaling Namespace (i.e. number of files/dirs)

Since the name node stores block and name objects in memory, the size of the name space (and hence the number of files) is limited by the amount of heap memory. Currently a 14GB heap (i.e. a 16GB machine) allows 60 million block and name objects. Each file needs one name object plus one object per block, so with 2 blocks per file each file costs 3 objects, and one is limited to 20 million files. This is a significant restriction for large clusters. Besides adding more memory, several options are listed below.
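The file limit above can be worked through as a small calculation. This is a minimal sketch, not a sizing tool: the function name `max_files` is hypothetical, and it assumes each file costs exactly one name object plus one object per block, ignoring directory objects and any other heap overhead. Only the 60 million object capacity comes from the figures quoted above.

```python
def max_files(object_capacity: int, blocks_per_file: int) -> int:
    """Estimate how many files fit in the name node heap.

    Assumption (illustrative only): a file consumes one name object
    plus one block object per block; directories are ignored.
    """
    if blocks_per_file < 0:
        raise ValueError("blocks_per_file must be non-negative")
    objects_per_file = 1 + blocks_per_file
    return object_capacity // objects_per_file


# 60 million objects is the capacity quoted for a 14GB heap.
OBJECT_CAPACITY = 60_000_000

if __name__ == "__main__":
    for blocks in (1, 2, 4):
        print(f"{blocks} block(s) per file: {max_files(OBJECT_CAPACITY, blocks):,} files")
```

Under these assumptions the limit falls as files span more blocks, so fewer, larger files reduce pressure on the name node heap only up to the point where block objects dominate.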

Partition/distribute Name node (will also help performance)

Several Options:

Name Service Availability (includes integrity of NN data, HA, etc.)

Integrity of NN Image and Journal

Faster Startup

Restart and Failover

Security: Authorization and ACLs

File Features

File IO Performance

Namespace Features

File Data Integrity (For NN see NN data integrity above)

Operations and Management Features

Hadoop Protocol RPC

RPC Timeouts, Connection handling, Queue handling, threading

Client-side recovery from NN restarts and failovers

Versioning

Multiple Language Support

Benchmarks and Performance Measurements

Diagnosability

Development Support

Intercluster Features

BCP support
