Notes by Arun C. Murthy

  • Shared goals
    • Hadoop means HDFS & Map-Reduce in the context of this set of slides
  • Priorities
    • Yahoo
      • Correctness
      • Availability: not the same as high availability (six 9s, etc.), i.e. SPOFs
      • API Compatibility
      • Scalability
      • Operability
      • Performance
      • Innovation
    • Cloudera
      • Test coverage, API coverage
      • APL-licensed codec (LZO replacement)
      • Security
      • Wire compatibility
      • Cluster-wide resource availability
      • New APIs (FileContext, MR Context objects), documentation of their advantages
      • HDFS to better support non-MR use-cases
      • Cluster metrics hooks
      • MR modularity (package)
    • Facebook
      • Correctness
      • Availability, High Availability, Failover, Continuous Availability
      • Scalability
  • Bar for patches/features keeps going higher as the project matures
    • Build consensus (e.g. Python Enhancement Proposals, JSRs, etc.)
    • Run/test on your own to prove the concept/feature, or branch and finish
    • Early versions of libraries (e.g. input formats, compression codecs) should be started outside of the project (GitHub etc.)
      • GitHub for all the above
      • Prune contrib
  • Maven for packaging
  • Tom: Hadoop Common/HDFS/MapReduce 0.21 release
  • Owen: Release Manager (see slides)
  • Agenda for next meeting
    • Eli: Hadoop Enhancement Process (modelled on PEP?)
    • Branching strategies: Development Models