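The new LOProject works in two different ways: it either projects columns out of a bag and returns a bag, or it picks out a single field and returns a datum.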

Besides that, it can be marked as a sentinel, meaning it bridges data from the outer plan to the inner plan.

Doesn't it seem to have too many meanings?

Example

B = COGroup A BY $0, B BY $1 ;
C = FOREACH B GENERATE flatten(A.(f1, f2)), group ;

Here are the inner plans (inside GENERATE):-

     (plan1)                 (plan2)

Project(A.(f1, f2))         Project(group) 

The one in the first plan returns a projected bag, but the one in the second plan returns a datum. Both of them also act as bridges between the outer and inner plans.

My suggestion

It would be cleaner and more understandable if we just:-

  1. Introduce LOSentinel, which can be used to get one field out of the outer plan (from a tuple or a bag).
  2. Use LOProject only when projecting tuples or bags (and outputting a tuple/bag).

The following examples show the plans inside LOGenerate:-

Example1

B = FOREACH A GENERATE x1*x2 ;

Sentinel(x1) Sentinel(x2) 
        \    /
          MUL
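
To make the Example1 plan concrete, here is a minimal toy model in Python. It is not Pig's actual Java code: the class names Sentinel and Mul, the evaluate method, the foreach helper, and the use of a dict as the outer record are all assumptions made for illustration. It only shows the idea that each Sentinel pulls exactly one field out of the outer plan and everything else works on plain values.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

# Hypothetical stand-in for the record that the outer plan hands to the
# inner plan: a mapping from field name to value.
Record = Dict[str, Any]


@dataclass(frozen=True)
class Sentinel:
    """Bridges one field from the outer plan into the inner plan."""
    field: str

    def evaluate(self, record: Record) -> Any:
        return record[self.field]


@dataclass(frozen=True)
class Mul:
    """Multiplies the values produced by two child operators."""
    left: Any
    right: Any

    def evaluate(self, record: Record) -> Any:
        return self.left.evaluate(record) * self.right.evaluate(record)


def foreach(records: List[Record], inner_plan: Any) -> List[Any]:
    """Stand-in for the outer FOREACH: run the inner plan once per record."""
    return [inner_plan.evaluate(r) for r in records]


# Inner plan for: B = FOREACH A GENERATE x1*x2 ;
plan = Mul(Sentinel("x1"), Sentinel("x2"))

# Made-up input rows, purely for illustration.
result = foreach([{"x1": 2, "x2": 3}, {"x1": 4, "x2": 5}], plan)
```

In this sketch Mul never sees the outer record layout; only the two Sentinel leaves do, which is the separation the suggestion is after.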

Example2

FOREACH C GENERATE FLATTEN(A.(f1, f2)), group ;

     (plan1)                 (plan2)

    Sentinel(A)             Sentinel(group)
        |
  Project(f1, f2)          

Note: Flatten is handled by LOGenerate
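
The Example2 plans can be sketched the same way. This is a toy Python model under stated assumptions, not Pig's implementation: the classes Sentinel and Project, the generate function with its (plan, flatten) pairs, and the dict-based record are hypothetical. It shows Project taking a bag and returning a bag, while flattening is done by the generate step rather than by any operator inside the inner plans.

```python
from itertools import product


class Sentinel:
    """Bridges one field from the outer plan into the inner plan."""

    def __init__(self, field):
        self.field = field

    def evaluate(self, record):
        return record[self.field]


class Project:
    """Projects columns out of a bag; both input and output are bags."""

    def __init__(self, child, columns):
        self.child = child
        self.columns = columns

    def evaluate(self, record):
        bag = self.child.evaluate(record)
        return [tuple(row[c] for c in self.columns) for row in bag]


def generate(record, plans):
    """Stand-in for LOGenerate. `plans` is a list of (plan, flatten) pairs."""
    parts = []
    for plan, flatten in plans:
        value = plan.evaluate(record)
        # A flattened bag offers one alternative per tuple; any other value
        # offers a single one-field alternative.
        parts.append(list(value) if flatten else [(value,)])
    # Cross the alternatives and concatenate each combination into one row.
    return [sum(combo, ()) for combo in product(*parts)]


# One made-up record as a cogroup might produce it: the key plus one bag per input.
record = {
    "group": 1,
    "A": [{"f1": "a", "f2": "b", "f3": "c"}, {"f1": "d", "f2": "e", "f3": "f"}],
    "B": [],
}

# FOREACH C GENERATE FLATTEN(A.(f1, f2)), group ;
plan1 = Project(Sentinel("A"), ["f1", "f2"])
plan2 = Sentinel("group")
rows = generate(record, [(plan1, True), (plan2, False)])
```

Keeping the flatten flag next to the plan, instead of inside it, means Project never needs to know whether its output bag will be unrolled.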

Example3

W = LOAD '...' AS (url, outlink);
G = GROUP W by url;
R = FOREACH G {
        FW = FILTER W BY outlink eq 'www.apache.org';
        PW = FW.outlink;
        DW = DISTINCT PW;
        GENERATE group, COUNT(DW);
}

   (plan1)           (plan2)

  Sentinel(group)   Sentinel(W)
                        |
                      Filter
                        |
                  Project(outlink)
                        |
                     Distinct 
                        |
                       COUNT
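
As a rough check that the Example3 chain composes, here is a toy Python sketch of plan2. Everything in it is an assumption for illustration (the class names, the evaluate method, the dict-based record, and the sample rows); it is not how Pig implements Filter, Distinct, or COUNT. The point is only that a single Sentinel at the top feeds a chain of bag-to-bag operators.

```python
class Sentinel:
    """Bridges one field from the outer plan into the inner plan."""

    def __init__(self, field):
        self.field = field

    def evaluate(self, record):
        return record[self.field]


class Filter:
    """Keeps the rows of the child's bag that satisfy the predicate."""

    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def evaluate(self, record):
        return [row for row in self.child.evaluate(record) if self.predicate(row)]


class Project:
    """Projects columns out of a bag; both input and output are bags."""

    def __init__(self, child, columns):
        self.child = child
        self.columns = columns

    def evaluate(self, record):
        return [tuple(row[c] for c in self.columns) for row in self.child.evaluate(record)]


class Distinct:
    """Removes duplicate tuples from the child's bag, keeping first occurrences."""

    def __init__(self, child):
        self.child = child

    def evaluate(self, record):
        return list(dict.fromkeys(self.child.evaluate(record)))


class Count:
    """Returns the number of tuples in the child's bag."""

    def __init__(self, child):
        self.child = child

    def evaluate(self, record):
        return len(self.child.evaluate(record))


# One made-up group as GROUP W BY url might produce it.
record = {
    "group": "u1",
    "W": [
        {"url": "u1", "outlink": "www.apache.org"},
        {"url": "u1", "outlink": "www.apache.org"},
        {"url": "u1", "outlink": "www.example.com"},
    ],
}

plan1 = Sentinel("group")
plan2 = Count(
    Distinct(
        Project(
            Filter(Sentinel("W"), lambda row: row["outlink"] == "www.apache.org"),
            ["outlink"],
        )
    )
)

# GENERATE group, COUNT(DW);
output = (plan1.evaluate(record), plan2.evaluate(record))
```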

Thoughts?