Execution Block - A distributed query plan consists of a tree of execution blocks. In other words, a logical plan of a query statement is broken into multiple parts, each of which is included in an execution block. Each execution block is a distributed processing phase which is similar to map or reduce phase in MapReduce. A number of tasks are created with the logical plan and control flags of an execution block.
SubQuery - It is a control instance and a state machine for an execution block.
QueryUnitAttempt - Like MapReduce, each running query unit (task) has an attempt instance and an attempt id. A task can be restarted if it is failed. So, we need a way to identify failed or succeeded tasks.
The TaskRunnerLaunchImpl class handles two events CONTAINER_REMOTE_LAUNCH and CONTAINER_REMOTE_CLEANUP, which lead to launching a TaskRunner and killing a running TaskRunner respectively. These events come from SubQuery::allocateContainers(SubQuery) method.
- A logical plan of an execution block which created the task.
- A fragment - an input path, an offset range, and schema. This is available only if the execution is leaf.
Fetch URIs - HTTP URIs to fetch the results processed by TaskRunners of the previous execution block. This is available only if the execution is non-leaf.
Initially, a Task registers fetch URIs to fetchLauncher (ExecutorService) in order to pull data, and it restore a logical plan. Then, a physical operator tree is created from the logical plan via PhysicalPlannerImpl. Finally, Task::run() method is called, and then Task's status is changed to RUNNING.
Also, a running Task periodically sends a statusUpdate report to TaskRunnerListener via RPC. A StatusUpdate report includes a status and some statistics of the running task. If the running task is failed, TaskRunner sends a message 'fatal' to TaskRunnerListenerImpl. If the task is completed, TaskRunner sends a message 'done' to TaskRunnerListenerImpl.
For each execution block, TaskRunner is launched by Yarn's ContainerManager. TaskRunner processes a Task at a time. If TaskRunner has an available slot, it sends 'getTask' to TaskRunnerListner. If TaskRunner receives the response (QueryUnitRequest) of 'getTask' message, TaskRunner creates an instance of Task from the response.
In the current implementation, there are four messages as follows:
A sequence diagram of statusUpdate, done, and fatal messages
A sequence diagram of getTask message