Apache Zebra Wiki


Zebra is a storage layer that provides a high-level data access abstraction and a tabular view of data in Hadoop, which can free Pig users from implementing their own data storage/retrieval code. It provides

In the future, Zebra could also support predicate pushdown to improve performance further. Zebra is initially released as a contrib project in Pig and may become a Hadoop subproject later on.


Zebra requires a Hadoop 20 build that supports TFile (as of July 24th, 2009, this means Hadoop 20 with Hadoop patch 6150 applied). It works with Pig 0.3.0 with patch PIG-660 applied; that patch makes Pig work with Hadoop 20. Zebra itself has been submitted as PIG-833.

Getting Zebra

Zebra has been committed as a Pig contrib project at:

Zebra source code

Compilation prerequisite:

How to compile:

The Zebra jar will be generated in the build/contrib/zebra directory.
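As a quick check after a build, the following minimal sketch lists any jar files found in the output directory. It is not part of Zebra: the class name FindZebraJar and the helper findJars are hypothetical, and because this page does not give the exact jar file name, the sketch simply matches any file ending in .jar.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Zebra or Pig.
public class FindZebraJar {

    // Returns the *.jar files directly inside dir, sorted by path.
    // Returns an empty list if dir does not exist or is not a directory.
    static List<Path> findJars(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return Collections.emptyList();
        }
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .filter(p -> p.getFileName().toString().endsWith(".jar"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Default to the output directory named on this page; the path is
        // resolved relative to the current working directory (assumed to be
        // the root of the Pig source tree).
        Path dir = Paths.get(args.length > 0 ? args[0] : "build/contrib/zebra");
        List<Path> jars = findJars(dir);
        if (jars.isEmpty()) {
            System.err.println("No jar found in " + dir + "; has the build been run?");
        } else {
            jars.forEach(System.out::println);
        }
    }
}
```

Run it from the root of the Pig source tree, or pass the directory as the first argument. The path it prints is the jar to put on the classpath for the sample code.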

Running Zebra

Sample MapReduce code and Pig scripts are attached to this wiki.

Javadoc is available at Zebra JavaDoc.