Load/Store Functions

Load/Store Functions are written by implementing one or both of the interfaces given below.

If the LoadFunc interface is implemented, the function can be used to load tuples. If the StoreFunc interface is implemented, the function can be used to store tuples. Since loading and storing are usually tied to each other, most functions implement both interfaces, as PigStorage and BinStorage do. Occasionally, however, a function may be written only for loading.

For examples of how to implement the following interfaces, see PigStorage or BinStorage.

/**
 * This interface is used to implement functions to parse records
 * from a dataset.
 */
public interface LoadFunc {
        /**
         * Specifies the portion of an InputStream from which to read tuples.
         * Because the starting and ending offsets may not fall on record
         * boundaries, the implementor must determine the actual starting and
         * ending offsets in such a way that an arbitrarily sliced-up file is
         * processed in its entirety.
         * <p>
         * A common way of handling slices in the middle of records is to start at
         * the given offset and, if the offset is not zero, skip to the end of the
         * first record (which may be a partial record) before reading tuples.
         * Reading continues until a tuple has been read that ends at an offset past
         * the ending offset.
         *  
         * @param fileName the name of the file to be read
         * @param is the stream representing the file to be processed.
         * @param offset the offset to start reading tuples.
         * @param end the ending offset for reading.
         * @throws IOException
         */
        public abstract void bindTo(String fileName, InputStream is, long offset, long end) throws IOException;

        /**
         * Retrieves the next tuple to be processed.
         * @return the next tuple to be processed or null if there are no more tuples
         * to be processed.
         * @throws IOException
         */
        public abstract Tuple getNext() throws IOException;     
}
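
The slice-handling strategy described in the bindTo comment can be sketched as follows. This is a minimal illustration, not the PigStorage implementation: the class LineSliceLoader and its loadSlice helper are hypothetical, List<String> stands in for Tuple so the example compiles without Pig on the classpath, records are assumed to be newline-terminated and tab-delimited, and the stream passed to bindTo is assumed to be already positioned at the starting offset.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical loader; a real one would implement LoadFunc and return Tuple.
class LineSliceLoader {
    private InputStream in;
    private long pos; // absolute offset of the next unread byte
    private long end; // ending offset of this slice

    // Assumption: `is` is already positioned at `offset`.
    public void bindTo(String fileName, InputStream is, long offset, long end)
            throws IOException {
        this.in = is;
        this.pos = offset;
        this.end = end;
        if (offset != 0) {
            // Skip the first (possibly partial) record. The loader bound to
            // the preceding slice reads past its own end to pick it up.
            readLine();
        }
    }

    // Returns the next record split on tabs, or null when the slice is done.
    public List<String> getNext() throws IOException {
        if (in == null || pos > end) {
            return null; // the previous record already ended past `end`
        }
        String line = readLine();
        return line == null ? null : Arrays.asList(line.split("\t", -1));
    }

    // Reads bytes up to and including the next newline, tracking the offset.
    // Returns null only if the stream was already at end of file.
    private String readLine() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        boolean sawAny = false;
        int b;
        while ((b = in.read()) != -1) {
            pos++;
            sawAny = true;
            if (b == '\n') {
                break;
            }
            buf.write(b);
        }
        return sawAny ? new String(buf.toByteArray(), StandardCharsets.UTF_8) : null;
    }

    // Hypothetical helper: loads every record belonging to one slice of `data`.
    static List<List<String>> loadSlice(byte[] data, int offset, int end) {
        List<List<String>> out = new ArrayList<>();
        try {
            LineSliceLoader loader = new LineSliceLoader();
            InputStream is = new ByteArrayInputStream(data, offset, data.length - offset);
            loader.bindTo("in-memory", is, offset, end);
            for (List<String> t = loader.getNext(); t != null; t = loader.getNext()) {
                out.add(t);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

With this scheme, a record that starts exactly on a slice boundary is read by the earlier slice and skipped by the later one, so every record is processed exactly once no matter where the file is cut.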

The corresponding interface for storing tuples is:

/**
 * This interface is used to implement functions to write the records
 * of a dataset to an output stream.
 */
public interface StoreFunc {

        /**
         * Specifies the OutputStream to write to. This will be called before
         * putNext(Tuple) is invoked.
         * 
         * @param os The stream to write tuples to.
         * @throws IOException
         */
    public abstract void bindTo(OutputStream os) throws IOException;

    /**
     * Writes a tuple to the output stream to which this instance was
     * previously bound.
     * 
     * @param f the tuple to store.
     * @throws IOException
     */
    public abstract void putNext(Tuple f) throws IOException;

    /**
     * Do any kind of post-processing now that the last tuple has been
     * stored. DO NOT CLOSE THE STREAM in this method. The stream will be
     * closed later outside of this function.
     * 
     * @throws IOException
     */
    public abstract void finish() throws IOException;  
}
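
The calling sequence of the store methods (bindTo, then putNext for each tuple, then finish) can be sketched as follows. This is only an illustration under stated assumptions: the class TabDelimitedStorer and its storeAll helper are hypothetical, List<String> stands in for Tuple so the example compiles without Pig on the classpath, and the tab-delimited, newline-terminated output format is chosen for the example rather than taken from PigStorage.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical storer; a real one would implement StoreFunc and accept Tuple.
class TabDelimitedStorer {
    private OutputStream os;

    // Called once, before any call to putNext.
    public void bindTo(OutputStream os) throws IOException {
        this.os = os;
    }

    // Writes one record as tab-separated fields followed by a newline.
    public void putNext(List<String> f) throws IOException {
        os.write(String.join("\t", f).getBytes(StandardCharsets.UTF_8));
        os.write('\n');
    }

    // Post-processing after the last tuple: flush, but do NOT close the
    // stream, since the caller closes it later.
    public void finish() throws IOException {
        os.flush();
    }

    // Hypothetical helper: drives the storer the way a caller would and
    // returns everything that was written.
    static String storeAll(List<List<String>> tuples) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        TabDelimitedStorer storer = new TabDelimitedStorer();
        try {
            storer.bindTo(out);
            for (List<String> t : tuples) {
                storer.putNext(t);
            }
            storer.finish();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Keeping finish() limited to flushing leaves ownership of the stream with the caller, which matches the warning in the interface comment.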