Differences between revisions 2 and 3
Revision 2 as of 2013-06-13 09:08:38
Size: 4366
Editor: HyunsikChoi
Comment: I've attached some sequence diagrams.
Revision 3 as of 2013-07-17 00:43:38
Size: 4386
Editor: HenrySaputra
Comment:

Terminology

  • Execution Block - A distributed query plan consists of a tree of execution blocks. In other words, the logical plan of a query statement is broken into multiple parts, each of which belongs to one execution block. Each execution block is a distributed processing phase, similar to a map or reduce phase in MapReduce. A number of tasks are created from the logical plan and control flags of an execution block.

  • SubQuery - It is a control instance and a state machine for an execution block.

  • QueryUnit (Task) - QueryUnit denotes a task. The name QueryUnit was chosen in the design stage; we are now renaming QueryUnit to Task.

  • QueryUnitAttempt - As in MapReduce, each running query unit (task) has an attempt instance and an attempt id. A task can be restarted if it fails, so we need a way to tell failed attempts from succeeded ones.

TaskRunnerLaunchImpl class

The main objective of TaskRunnerLaunchImpl is to launch TaskRunner through Yarn's ContainerManager.

The TaskRunnerLaunchImpl class handles two events, CONTAINER_REMOTE_LAUNCH and CONTAINER_REMOTE_CLEANUP, which launch a TaskRunner and kill a running TaskRunner, respectively. These events come from the SubQuery::allocateContainers(SubQuery) method.
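
The two-event handling described above can be sketched roughly as follows. This is only an illustration: ContainerEventType, ContainerEvent, and LauncherSketch are hypothetical names, and the sketch records container ids in a list instead of calling a real ContainerManager.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical event types mirroring the two events named in the text.
enum ContainerEventType { CONTAINER_REMOTE_LAUNCH, CONTAINER_REMOTE_CLEANUP }

// Hypothetical event carrying the id of the container it refers to.
final class ContainerEvent {
    final ContainerEventType type;
    final String containerId;

    ContainerEvent(ContainerEventType type, String containerId) {
        this.type = type;
        this.containerId = containerId;
    }
}

// Stand-in for TaskRunnerLaunchImpl (assumption: the real class talks to
// the ContainerManager; here we only track which containers run a TaskRunner).
final class LauncherSketch {
    private final List<String> running = new ArrayList<>();

    void handle(ContainerEvent event) {
        switch (event.type) {
            case CONTAINER_REMOTE_LAUNCH:
                // Launch a TaskRunner in the given container.
                running.add(event.containerId);
                break;
            case CONTAINER_REMOTE_CLEANUP:
                // Kill the TaskRunner running in the given container.
                running.remove(event.containerId);
                break;
        }
    }

    // Returns a copy so callers cannot modify the internal state.
    List<String> runningContainers() {
        return new ArrayList<>(running);
    }
}
```

Launching containers "c1" and "c2" and then cleaning up "c1" leaves only "c2" in runningContainers().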

Task

In TaskRunner, a Task is created from the response (QueryUnitRequest) of the 'getTask()' RPC call. A Task contains three main attributes:

  • A logical plan of the execution block which created the task.
  • A fragment - an input path, an offset range, and a schema. This is available only if the execution block is a leaf.
  • Fetch URIs - HTTP URIs to fetch the results processed by TaskRunners of the previous execution block. This is available only if the execution block is a non-leaf.

Initially, a Task registers its fetch URIs with fetchLauncher (an ExecutorService) in order to pull data, and it restores the logical plan. Then, a physical operator tree is created from the logical plan via PhysicalPlannerImpl. Finally, the Task::run() method is called, and the Task's status is changed to RUNNING.
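
As a rough sketch of this startup sequence, under several assumptions: TaskSketch and TaskStatusSketch are hypothetical names, plans are represented as plain strings, fetching is simulated without any network access, and waiting for the fetches inside run() is a choice made for the example rather than something the text specifies.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical status values; the text only names RUNNING.
enum TaskStatusSketch { INITIALIZED, RUNNING }

final class TaskSketch {
    private final List<URI> fetchUris;
    // Plays the role of fetchLauncher from the text.
    private final ExecutorService fetchLauncher = Executors.newFixedThreadPool(2);
    private final List<Future<String>> fetches = new ArrayList<>();
    private TaskStatusSketch status = TaskStatusSketch.INITIALIZED;
    private String physicalPlan;

    TaskSketch(List<URI> fetchUris) {
        this.fetchUris = fetchUris;
    }

    void init(String serializedLogicalPlan) {
        // 1. Register every fetch URI with the fetch launcher (simulated fetch).
        for (URI uri : fetchUris) {
            fetches.add(fetchLauncher.submit(() -> "fetched:" + uri));
        }
        // 2. Restore the logical plan (stand-in: trim the serialized form).
        String logicalPlan = serializedLogicalPlan.trim();
        // 3. Build the physical operator tree (stand-in for PhysicalPlannerImpl).
        physicalPlan = "physical(" + logicalPlan + ")";
    }

    // Marks the task RUNNING, waits for the simulated fetches, and returns them.
    List<String> run() {
        status = TaskStatusSketch.RUNNING;
        List<String> fetched = new ArrayList<>();
        try {
            for (Future<String> f : fetches) {
                fetched.add(f.get(5, TimeUnit.SECONDS));
            }
        } catch (Exception e) {
            throw new IllegalStateException("fetch failed", e);
        } finally {
            fetchLauncher.shutdown();
        }
        return fetched;
    }

    TaskStatusSketch status() { return status; }

    String physicalPlan() { return physicalPlan; }
}
```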

Also, a running Task periodically sends a statusUpdate report to TaskRunnerListener via RPC. A statusUpdate report includes the status and some statistics of the running task. If the running task fails, TaskRunner sends a 'fatal' message to TaskRunnerListenerImpl. If the task completes, TaskRunner sends a 'done' message to TaskRunnerListenerImpl.
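
The choice between the three messages can be sketched as below. The names TaskOutcomeSketch, StatusReportSketch, and ReportSketch are hypothetical, and the single progress field is an assumed stand-in for the statistics mentioned above; the real report format is not described here.

```java
// Hypothetical view of a task's state at reporting time.
enum TaskOutcomeSketch { IN_PROGRESS, FAILED, COMPLETED }

// Hypothetical report: a message name plus one statistic (progress in [0, 1]).
final class StatusReportSketch {
    final String message;
    final double progress;

    StatusReportSketch(String message, double progress) {
        this.message = message;
        this.progress = progress;
    }
}

final class ReportSketch {
    // Picks the message name that the text associates with each outcome.
    static String messageFor(TaskOutcomeSketch outcome) {
        switch (outcome) {
            case FAILED:
                return "fatal";
            case COMPLETED:
                return "done";
            default:
                return "statusUpdate";
        }
    }

    // Builds the report that would be sent to the listener over RPC.
    static StatusReportSketch reportFor(TaskOutcomeSketch outcome, double progress) {
        return new StatusReportSketch(messageFor(outcome), progress);
    }
}
```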

TaskRunner

For each execution block, a TaskRunner is launched by Yarn's ContainerManager. A TaskRunner processes one Task at a time. If the TaskRunner has an available slot, it sends 'getTask' to TaskRunnerListener. When the TaskRunner receives the response (QueryUnitRequest) to the 'getTask' message, it creates an instance of Task from the response.
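
A minimal sketch of this pull loop follows. QueryUnitRequestSketch, ListenerSketch, and RunnerSketch are hypothetical names, and treating a null response as "no task left" is an assumption made for the example, not documented behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the QueryUnitRequest response.
final class QueryUnitRequestSketch {
    final String taskId;

    QueryUnitRequestSketch(String taskId) {
        this.taskId = taskId;
    }
}

// Hypothetical client-side view of the listener's 'getTask' call.
interface ListenerSketch {
    // Assumption: returns null when no task is left to hand out.
    QueryUnitRequestSketch getTask();
}

final class RunnerSketch {
    private final ListenerSketch listener;
    private final List<String> executed = new ArrayList<>();

    RunnerSketch(ListenerSketch listener) {
        this.listener = listener;
    }

    // Pulls and processes one task at a time until the listener has none left.
    void runLoop() {
        QueryUnitRequestSketch request;
        while ((request = listener.getTask()) != null) {
            // A real TaskRunner would create a Task from the request and run it.
            executed.add(request.taskId);
        }
    }

    List<String> executedTasks() {
        return new ArrayList<>(executed);
    }
}
```

Because ListenerSketch has a single method, a queue of pending requests can serve as a test listener through a method reference such as pending::poll.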

TaskListenerImpl

It receives messages sent from a number of TaskRunners and passes them on as events to event handlers such as QueryUnitAttempt and TaskScheduler.
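
The message-to-event forwarding can be sketched as a small dispatcher. ListenerDispatchSketch is a hypothetical name, and which handler receives which message is decided by whoever registers the handlers; the routing shown in any usage of this sketch is an assumption, since the text does not spell it out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical dispatcher: handlers subscribe to a message name and receive
// the id of the attempt that the message refers to.
final class ListenerDispatchSketch {
    private final Map<String, List<Consumer<String>>> handlers = new HashMap<>();

    void register(String messageName, Consumer<String> handler) {
        handlers.computeIfAbsent(messageName, k -> new ArrayList<>()).add(handler);
    }

    // Forwards the message as an event to every registered handler and
    // returns how many handlers were notified.
    int onMessage(String messageName, String attemptId) {
        List<Consumer<String>> targets =
                handlers.getOrDefault(messageName, new ArrayList<>());
        for (Consumer<String> target : targets) {
            target.accept(attemptId);
        }
        return targets.size();
    }
}
```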

In the current implementation, there are four messages as follows:

A sequence diagram of statusUpdate, done, and fatal messages

http://diveintodata.files.wordpress.com/2013/06/statusupdate_done_fatal1.png

A sequence diagram of getTask message

http://diveintodata.files.wordpress.com/2013/06/gettask.png

Architecture (last edited 2013-07-17 00:43:38 by HenrySaputra)