Revision 1 as of 2008-05-19 12:27:43
Editor: PiSong

The new LOProject works in two different ways:

  • Given one index, it outputs a datum.
  • Given two or more indexes, it outputs a tuple.

Besides that, it can be marked as a sentinel, meaning it bridges data from the outer plan to the inner plan.

Doesn't that seem like too many meanings for a single operator?
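To make the overloading concrete, here is a minimal Python sketch of the behaviour described above. It is only a model for discussion: the class name OverloadedProject, the evaluate method, and the sample row are all assumptions, not Pig's actual LOProject code.

```python
# Hypothetical model of the overloaded behaviour; not Pig's real LOProject.
class OverloadedProject:
    def __init__(self, indexes, sentinel=False):
        self.indexes = list(indexes)
        # When True, the operator reads from the outer plan's input.
        self.sentinel = sentinel

    def evaluate(self, row):
        # One index yields a bare datum; two or more yield a tuple.
        if len(self.indexes) == 1:
            return row[self.indexes[0]]
        return tuple(row[i] for i in self.indexes)


row = ("apache.org", 42, "pig")                   # made-up input tuple
single = OverloadedProject([1]).evaluate(row)     # 42
multi = OverloadedProject([0, 2]).evaluate(row)   # ("apache.org", "pig")
```

The return type depends on how many indexes were passed, and the sentinel flag adds a third, unrelated role to the same class.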

Example

C = COGROUP A BY $0, B BY S1 ;
D = FOREACH C GENERATE FLATTEN(A.(f1, f2)), group ;

Here are the inner plans (inside GENERATE):

     (plan1)                 (plan2)

Project(A.(f1, f2))         Project(group) 

The Project in the first plan returns a projected bag, while the one in the second plan returns a datum. Both also act as bridges between the outer and inner plans.

My suggestion

It would be cleaner and more understandable if we just:

  1. Introduce LOSentinel, which can be used to get one field out of the outer plan (from a tuple or bag).
  2. Use LOProject only when projecting tuples or bags (and output a tuple/bag).

The following examples show the plans inside LOGenerate:
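A rough Python sketch of the proposed split, to show how the two responsibilities separate. The class names Sentinel and Project, the evaluate method, and the modelling of a bag as a list of tuples are assumptions made for illustration; the real operators would live in Pig's Java logical plan.

```python
class Sentinel:
    """Hypothetical: fetches exactly one field of the outer plan's tuple."""

    def __init__(self, index):
        self.index = index

    def evaluate(self, outer_tuple):
        return outer_tuple[self.index]


class Project:
    """Hypothetical: projects columns; tuple in, tuple out; bag in, bag out."""

    def __init__(self, indexes):
        self.indexes = list(indexes)

    def _project(self, t):
        return tuple(t[i] for i in self.indexes)

    def evaluate(self, value):
        if isinstance(value, list):  # a bag, modelled as a list of tuples
            return [self._project(t) for t in value]
        return self._project(value)


outer = ("k", [(1, "a", True), (2, "b", False)])  # made-up (group, bag) tuple
bag = Sentinel(1).evaluate(outer)
projected = Project([0, 1]).evaluate(bag)         # [(1, "a"), (2, "b")]
one_column = Project([0]).evaluate((1, "a", True))  # (1,), still a tuple
```

With this split, Project never returns a bare datum, even for a single index, and only Sentinel touches the outer plan.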

Example1

B = FOREACH A GENERATE x1*x2 ;

Sentinel(x1) Sentinel(x2) 
        \    /
          MUL
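
Example1 can be mimicked in a few lines of Python. This is only a sketch: the helper names sentinel and binary are made up, and x1 and x2 are assumed to sit at positions 0 and 1 of each tuple of A.

```python
import operator


def sentinel(index):
    # Hypothetical Sentinel: pulls one field out of the outer tuple.
    return lambda outer: outer[index]


def binary(op, left, right):
    # Combines the outputs of two sub-plans with a binary operator.
    return lambda outer: op(left(outer), right(outer))


# Sentinel(x1) and Sentinel(x2) feeding MUL.
plan = binary(operator.mul, sentinel(0), sentinel(1))

A = [(2, 3), (4, 5)]               # made-up input relation
B = [(plan(row),) for row in A]    # [(6,), (20,)]
```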

Example2

FOREACH C GENERATE FLATTEN(A.(f1, f2)), group ;

     (plan1)                 (plan2)

    Sentinel(A)             Sentinel(group)
        |
  Project(f1, f2)          

Note: Flatten is handled by LOGenerate
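
A possible way to picture Example2, with flatten applied by the generate step rather than by the inner plans. All helper names are hypothetical, and the sketch assumes each cogrouped tuple is laid out as (group, bag of A, bag of B), with f1 and f2 at positions 0 and 1 of A.

```python
from itertools import product


def sentinel(index):
    # Hypothetical Sentinel: pulls one field out of the outer tuple.
    return lambda outer: outer[index]


def project(indexes, source):
    # Hypothetical Project: keeps the given columns of every tuple in a bag.
    return lambda outer: [tuple(t[i] for i in indexes) for t in source(outer)]


def generate(outer, plans):
    # plans is a list of (plan, flatten) pairs; flattening happens here.
    columns = []
    for plan, flatten in plans:
        value = plan(outer)
        columns.append(list(value) if flatten else [(value,)])
    # Cross product of the columns, concatenating the pieces of each row.
    return [sum(parts, ()) for parts in product(*columns)]


c_row = ("k1", [(1, 2, 3), (4, 5, 6)], [("x",)])  # made-up cogroup output
plans = [
    (project([0, 1], sentinel(1)), True),   # plan1: Sentinel(A) -> Project(f1, f2)
    (sentinel(0), False),                   # plan2: Sentinel(group)
]
out = generate(c_row, plans)  # [(1, 2, "k1"), (4, 5, "k1")]
```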

Example3

W = LOAD '...' AS (url, outlink);
G = GROUP W by url;
R = FOREACH G {
        FW = FILTER W BY outlink eq 'www.apache.org';
        PW = FW.outlink;
        DW = DISTINCT PW;
        GENERATE group, COUNT(DW);
}

   (plan1)           (plan2)

  Sentinel(group)   Sentinel(W)
                        |
                      Filter
                        |
                  Project(outlink)
                        |
                     Distinct 
                        |
                       COUNT
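
Example3 reads as a linear pipeline hanging off Sentinel(W). The Python sketch below models it under stated assumptions: each tuple of G is (group, bag of W), W tuples are (url, outlink), and every helper name is invented for illustration.

```python
def sentinel(index):
    # Hypothetical Sentinel: pulls one field out of the outer tuple.
    return lambda outer: outer[index]


def pipe(source, *stages):
    # Chains operators so that each one consumes the previous output.
    def run(outer):
        value = source(outer)
        for stage in stages:
            value = stage(value)
        return value
    return run


def filter_apache(bag):
    return [t for t in bag if t[1] == "www.apache.org"]


def project_outlink(bag):
    return [(t[1],) for t in bag]


def distinct(bag):
    return list(dict.fromkeys(bag))  # drops duplicates, keeps order


plan1 = sentinel(0)  # Sentinel(group)
plan2 = pipe(sentinel(1), filter_apache, project_outlink, distinct, len)

g_row = ("a.com", [("a.com", "www.apache.org"),
                   ("a.com", "www.apache.org"),
                   ("a.com", "b.org")])      # made-up grouped tuple
result = (plan1(g_row), plan2(g_row))        # ("a.com", 1)
```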

Thoughts?

NewLOProject (last edited 2009-09-20 23:38:21 by localhost)