Differences between revisions 2 and 3
Revision 2 as of 2008-05-26 13:58:44
Size: 6697
Editor: PiSong
Comment:
Revision 3 as of 2009-09-20 23:38:07
Size: 6701
Editor: localhost
Comment: converted to 1.6 markup
Deletions are marked like this. Additions are marked like this.
Line 166: Line 166:
http://people.apache.org/~pisong/dot1.png {{http://people.apache.org/~pisong/dot1.png}}

Plan Testing Helper

This is a small utility that I developed for testing my type checking logic. I think it might be useful for other people as well so I have refactored a bit to make it more generic.

Use cases

Here are steps that I do for type checking:-

  • Construct a plan
  • Run type-checking logic against the plan
  • Construct the expected plan
  • Compare structures of the actual plan and the expected plan.

Here are steps that one might do for query parser:-

  • Given a query string, construct the plan.
  • Construct the expected plan
  • Compare two plans

Here for testing plan optimizer:-

  • Construct a plan
  • Run optimizer
  • Construct the expected plan
  • Compare structures of the actual plan and the expected plan.

What can be facilitated?

So there are two common bits from above use cases:-

  1. Construct the expected plan
  2. Compare two plans

Construct a plan

What is Dot Language?

Dot language is a text graph description language. There are three main object types: node, edge, and graph. All of them can have custom attributes.

( See http://en.wikipedia.org/wiki/DOT_language )

Sample Dot graph

digraph plan1 {
    
    load [color="black"]

    load -> distinct -> split -> splitOut1 [style=dotted] ;
    split -> splitOut2 ;
    splitOut1 -> cross ;
    splitOut2 -> cross ;
}

Note: "digraph" dictates that this is a description of directed graph which is the domain we're interested in.

Note: "load [color="black"]" is attaching an attribute to the node. This is optional.

By extending Dot a bit, we can encode our logical plan in the following format:-

digraph graph1 {
    
    node [schema="field1: int, field2: float"]       /* "node" means the following nodes will inherit this attribute */

    load        [key="1", type="LOLoad"] ;
    distinct    [key="2", type="LODistinct"] ;
    split       [key="3", type="LOSplit"] ;
    splitout1   [key="4", type="LOForEach"] ;
    splitout2   [key="5", type="LOForEach"] 
    cross       [key="6", type="LOCross", schema="field1: int, field2: float, field3: int, field4: float"] ;

    load -> distinct -> split -> splitOut1 ;
    split -> splitOut2 ;
    splitOut1 -> cross ;
    splitOut2 -> cross ;
}

And this can be translated to a plan using a loader class (API will be provided)

Compare two plans

I will provide API like this:-

/***
 * This abstract class is a base for plan comparer
 */

public abstract class PlanStructuralComparer<E extends Operator,
                                             P extends OperatorPlan<E>> {

    /***
     * This method does structural comparison of two plans based on:-
     *  - Graph connectivity
     *
     * The current implementation is based on simple key-based
     * vertex matching.
     *
     * @param plan1 the first plan
     * @param plan2 the second plan
     * @param messages where the error messages go
     * @return
     */
    public boolean structurallyEquals(P plan1, P plan2, StringBuilder messages)  ;


    /***
     * Same as above in case just want to compare but
     * don't want to know the error messages
     * @param plan1
     * @param plan2
     * @return
     */
    public boolean structurallyEquals(P plan1, P plan2) ;
}

A subtype which is interested in type information would look like this:-

/***
 * This class is used for LogicalPlan comparison
 */
public class LogicalPlanComparer
                extends PlanStructuralComparer<LogicalOperator, LogicalPlan> {

    /***
     * This method does naive structural comparison of two plans.
     *
     * Things we compare :-
     *  - Things compared in the super class
     *  - Types of matching nodes
     *  - Schema associated with each operator
     *
     * @param plan1
     * @param plan2
     * @param messages
     * @return
     */
    @Override
    public boolean structurallyEquals(LogicalPlan plan1,
                                      LogicalPlan plan2,
                                      StringBuilder messages) {
        // Stage 1: Compare connectivity
        if (!super.structurallyEquals(plan1, plan2, messages)) return false ;

        // Stage 2: Compare node types
        if (isMismatchNodeType(plan1, plan2, messages)) return false ;

        // Stage 3: Compare schemas
        if (isMismatchSchemas(plan1, plan2, messages)) return false ;
   
        // else
        return true ;
    }

Dot Trick

One can plot a graph written in Dot language by just doing like:-

dot -Tpng dot1.dot > dot1.png

Or alternatively,

dotty dot1.dot

NOTE: You need graphviz installed on your machine to do these things.

Here is a sample graph generated from the given sample.

http://people.apache.org/~pisong/dot1.png

Current Status & Issues

  • Working code will be available in 1-2 days (Today = 26th May)
  • Doesn't work with inner plans yet. Inner plans may have to be constructed separately.

Appendix

The API:-

OperatorPlanLoader - This class is an abstract base class for loading a plan from Dot

public abstract class OperatorPlanLoader<E extends Operator,
                                         P extends OperatorPlan<E>> {

    /***
     * This method is used for loading an operator plan encoded in Dot format
     * @param dotContent the dot content
     * @param clazz the plan type to be created
     * @return
     */
    public P load(String dotContent) {

    /***
     * This method has be overridden to instantiate the correct node type
     *
     * @param node
     * @param plan
     * @return
     */
    protected abstract E createOperator(Node node, P plan) ;
}

Structures captured from Dot (Before being converted to plan):-

/***
 * This represents graph structure in DOT format
 */
public class DotGraph {

    public String name;
    public List<Edge> edges = new ArrayList<Edge>() ;
    public List<Node> nodes = new ArrayList<Node>() ;
    public Map<String, String> attributes = new HashMap<String,String>() ;


    public DotGraph(String name) {
        this.name = name ;
    }

}

/***
 * This represents a node in DOT format
 */
public class Node {

    public String name ;
    public Map<String, String> attributes = new HashMap<String,String>() ;

}

/**
 * This represents an edge in DOT format.
 * An edge in DOT can have attributes but we're not interested
 */
public class Edge {
    public String fromNode ;
    public String toNode ;
}

PlanTestingHelper (last edited 2009-09-20 23:38:07 by localhost)