Differences between revisions 1 and 2
Revision 1 as of 2013-04-10 17:13:01
Size: 10488
Comment: added the documentation for Gen
Revision 2 as of 2013-04-12 15:54:10
Size: 10554
Deletions are marked like this. Additions are marked like this.
Line 13: Line 13:
{{{ {{{#!java
Line 37: Line 37:
{{{ {{{#!java
Line 67: Line 67:
{{{ {{{#!java
Line 82: Line 82:
{{{ {{{#!java
Line 88: Line 88:
{{{ {{{#!java
Line 100: Line 100:
{{{ {{{#!java
Line 134: Line 134:
{{{ {{{#!java
Line 145: Line 145:
{{{ {{{#!java
Line 165: Line 165:
{{{ {{{#!java
Line 200: Line 200:
{{{ {{{#!java
Line 217: Line 217:
{{{ {{{#!java

Gen: A Java Package for Constructing and Manipulating Abstract Syntax Trees

Gen is a Java preprocessor that extends the Java language with special syntax to make the task of handling Abstract Syntax Trees (ASTs) easier. Two classes are used by Gen: the class Tree that captures an AST (with subclasses VariableLeaf, IntegerLeaf, FloatLeaf, StringLeaf, and Node), and the class Trees that captures a list of ASTs (see the Appendix for the detailed API). A Tree is a tree-like data structure, which is used for representing various tree-like data structures used in compiler construction.

For example, the Java expression

   1 new Node("Binop",
   2          Trees.nil.append(new VariableLeaf("Plus"))
   3                   .append(new VariableLeaf("x"))
   4                   .append(new Node("Binop",
   5                                    Trees.nil.append(new VariableLeaf("Minus"))
   6                                             .append(new VariableLeaf("y"))
   7                                             .append(new VariableLeaf("z")))))

constructs the AST Binop(Plus,x,Binop(Minus,y,z)). To make the task of writing these tree constructions less tedious, Gen extends Java with the syntactic form #< >. For example, #<Binop(Plus,x,Binop(Minus,y,z))> is equivalent to the above Java expression. That is, the text within the brackets #< > is used by Gen to generate Java code, which creates the tree-like form (an instance of the class Tree) that represents this text. Values of the class Tree can be included into the form generated by the #< > brackets by "escaping" them with a backquote character (`). The operand of the escape operator (the backquote operator) is expected to be an expression of type Tree that provides the value to "fill in" the hole in the bracketed text at that point (actually, an escaped string/int/float is also lifted to a Tree). For example, in

   1   Tree x = #<join(a,b,p)>;
   2   Tree y = #<select(`x,q)>;
   3   Tree z = #<project(`y,A)>;

y is set to #<select(join(a,b,p),q)> and z to #<project(select(join(a,b,p),q),A)>. There is also bracketed syntax, #[ ], for constructing instances of Trees.

The bracketed syntax takes the following form:

bracketed ::=

"#<" expr ">"

a Tree construction (an AST)

"#[" arg "," ... "," arg "]"

a Trees construction (a list of ASTs)

expr ::=


the representation of a variable name


the representation of an integer


the representation of a float number


the representation of a string

"`" name

escaping to the value of name

"`(" code ")"

escaping to the value of code

name "(" arg "," ... "," arg ")"

the representation of a Node with zero or more children

"`" name "(" arg "," ... "," arg ")"

the representation of a Node with escaped name

arg ::=


the representation of an expression

"..." name

escaping to a list of ASTs (Trees) bound to name

"...(" code ")"

escaping to a list of ASTs (Trees) returned by code

where code is any Java code. The #< `(code) > embeds the value returned by the Java code code of type Tree to the term representation inside the brackets. For example, #<`f(6,...r,g("ab",`(k(x))),`y)> is equivalent to the following Java code:

   1 new Node(f,
   2          Trees.nil.append(new IntegerLeaf(6))
   3                   .append(r)
   4                   .append(new Node("g",Trees.nil.append(new StringLeaf("ab"))
   5                                                 .append(k(x))))
   6                   .append(y))

If f="h", y=#<m(1,"a")>, and k(x) returns the value #<8>, then the above term is equivalent to #<h(6,g("ab",8),m(1,"a"))>. The three dots (...) construct is used to indicate a list of children in a Node. Since this list is an instance of the class Trees, the type of name in ...name is also Trees. For example, in

   1   Trees r = #[join(a,b,p),select(c,q)];
   2   Tree z = #<project(...r)>;

z will be bound to #<project(join(a,b,p),select(c,q))>. Finally, to iterate over Trees, use a for-loop:

   1 for ( Tree v: #[a,b,c] )
   2     System.out.println(v);

Gen provides a case statement syntax with patterns. Patterns match the Tree representations with similar shape. Escape operators applied to variables inside these patterns represent variable patterns, which "bind" to corresponding subterms upon a successful match. This capability makes it particularly easy to write functions that perform source-to-source transformations. A function that simplifies arithmetic expressions can be expressed easily as:

   1   Tree simplify ( Tree e ) {
   2     match e {
   3     case plus(`x,0): return x;
   4     case times(`x,1): return x;
   5     case times(`x,0): return #<0>;
   6     case _: return e;
   7     }
   8   }

where the _ pattern matches any value. For example, simplify(#<times(z,1)>) returns #<z>, since times(z,1) matches the second case pattern. The syntax of the case statement is:

case_stmt ::=

"match" code "{" case ... case "}"

case ::=

"case" pattern ":" code

pattern ::=


exact match with a name


exact match with an integer


exact match with a float number


exact match with a string

"`" name

match any Tree and bind name to the matched value

name "(" arg "," ... "," arg ")"

match with a Node with a given name and match the Node children

"`" name "(" arg "," ... "," arg ")"

match the Node children and bind name to the Node name


match any Tree

arg ::=


match with a pattern

"..." name

match with a list of ASTs (Trees) and bind name to the matched list


match any arguments

For example, the pattern `f(...r) matches any Node: when it is matched with #<join(a,b,c)>, it binds f to the string "join" and r to the list #[a,b,c]. Another example is the following function that adds the terms #<8> and #<9> as children to any Node e:

   1   Tree add_arg ( Tree e ) {
   2     match e {
   3     case `f(...r): return #<`f(8,9,...r)>;
   4     case `x: return x;
   5     }
   6   }

The special keyword fail is used in Gen to abort the current matching and skip to the next case. For example

   1     match e {
   2     case `f(...r,join(`x,`y),...s):
   3          if (x.equals(y))
   4              return #<`f(...r,`x,...s)>;
   5          else fail;
   6     case `x: return x;
   7     }

will transform join(`x,`y) to x if x is equal to y (it is not permitted to use a variable twice in a pattern, that is, join(`x,`x) is an illegal pattern).

Appendix: The Tree API

The class Tree captures an AST, and the class Trees captures a list of ASTs. A Tree is a tree-like data structure, which is used for representing various tree-like data structures used in compiler construction.

   1 abstract class Tree {
   2     public boolean is_node ();     // is this a Node?
   3     public boolean is_variable (); // is this a VariableLeaf?
   4     public boolean is_long ();     // is this a LongLeaf?
   5     public boolean is_string ();   // is this a StringLeaf?
   6     public boolean is_double ();   // is this a DoubleLeaf?
   7     public String variableValue ();// return the variable name
   8     public long longValue ();      // return the long value
   9     public String stringValue ();  // return the string value
  10     public double doubleValue ();  // return the double value
  11 }
  12 class VariableLeaf extends Tree {
  13     public VariableLeaf ( String s );
  14     public String value ();
  15 }
  16 class LongLeaf extends Tree {
  17     public LongLeaf ( long n );
  18     public long value ();
  19 }
  20 class DoubleLeaf extends Tree {
  21     public DoubleLeaf ( double n );
  22     public double value ();
  23 }
  24 class StringLeaf extends Tree {
  25     public StringLeaf ( String s );
  26     public String value ();
  27 }
  28 class Node extends Tree {
  29     public Node ( String n, Trees cs );  // construct a new Node with a given name n and children cs
  30     public String name ();               // Node label
  31     public Trees children ();            // Node children
  32 }

where the Trees class represents a list of ASTs (a linked list):

   1 class Trees implements Iterable<Tree> {
   2     public Tree  head();               // the head of the list
   3     public Trees tail();               // the rest of the list (the list without the head)
   4     public Trees ( Tree e, Trees t );  // construct a new list with head e and tail t
   5     public final static Trees nil;     // the empty list (same as #[])
   6     public Trees cons ( Tree e );      // put the element e before the list
   7     public Trees append ( Tree e );    // append the element e at the end of list (does not change 'this')
   8     public Trees append ( Trees s );   // append the list s at the end of list (does not change 'this')
   9     public boolean is_empty ();        // is this an empty list?
  10     public int length ();              // return the list length
  11     public Trees reverse ();           // return the reverse of the list (does not change 'this')
  12     public boolean member ( Tree e );  // is e a member of the list?
  13     public Tree nth ( int n );         // return the nth list element
  14 }

To iterate over Trees, one can use Java iterators:

   1 Trees ts = ...;
   2 for( Tree e: ts )
   3    System.out.println(e);

GenDoc (last edited 2013-04-12 15:54:10 by LeonidasFegaras)