...

  1. Check local disk:
    1. Check whether the local disk is writable.
    2. Delete files under local dir (bsp.local.dir)
  2. Clear state table:
    1. Clean up tasks (Map[TaskAttemptID -> TaskInProgress])
    2. Initialize job (Map[BSPJobID -> RunningJob])
    3. Clean up running tasks (Map[TaskAttemptID -> TaskInProgress])
    4. Configure max tasks, default to 3.
  3. Start HTTP server: An embedded HTTP service.
  4. Start task report server: Handles communication between the GroomServer and its spawned child tasks. See TaskRunner.BspChildRunner
  5. Start worker server: An RPC service that listens for directions from the master.
  6. Register with BSPMaster: The GroomServer enrolls itself with BSPMaster using GroomServerStatus.
  7. Start message dispatcher (Instructor): (TODO: refactor needed)
  8. Start monitor service: A process that exports metrics, task status, and related information.
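
The first two startup steps above can be sketched roughly as follows. This is a minimal illustration, not the actual GroomServer code: the class name GroomStartupSketch, the initialize method, and the use of String keys and values in place of TaskAttemptID, BSPJobID, TaskInProgress, and RunningJob are all assumptions made to keep the example self-contained.

```java
import java.io.File;
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of startup steps 1 and 2; not the real GroomServer class.
class GroomStartupSketch {
    static final int DEFAULT_MAX_TASKS = 3; // "Configure max tasks, default to 3"

    // String stands in for the real key and value types (assumption).
    final Map<String, String> tasks = new ConcurrentHashMap<>();        // TaskAttemptID -> TaskInProgress
    final Map<String, String> runningJobs = new ConcurrentHashMap<>();  // BSPJobID -> RunningJob
    final Map<String, String> runningTasks = new ConcurrentHashMap<>(); // TaskAttemptID -> TaskInProgress
    int maxTasks;

    void initialize(File localDir, Integer configuredMaxTasks) throws IOException {
        // Step 1: make sure the local directory exists and is writable, then empty it.
        if (!localDir.isDirectory() && !localDir.mkdirs()) {
            throw new IOException("Cannot create local dir: " + localDir);
        }
        if (!localDir.canWrite()) {
            throw new IOException("Local dir is not writable: " + localDir);
        }
        deleteContents(localDir);

        // Step 2: clear the state tables and configure the task limit.
        tasks.clear();
        runningJobs.clear();
        runningTasks.clear();
        maxTasks = (configuredMaxTasks != null) ? configuredMaxTasks : DEFAULT_MAX_TASKS;
    }

    // Recursively deletes everything under dir, but keeps dir itself.
    static void deleteContents(File dir) {
        File[] children = dir.listFiles();
        if (children == null) {
            return;
        }
        for (File child : children) {
            if (child.isDirectory()) {
                deleteContents(child);
            }
            child.delete();
        }
    }
}
```

Steps 3 through 8 start network services and are omitted here, since their wiring depends on classes not described in this section.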

...

Task Management

The GroomServer state is one of:

  • NORMAL: Everything works fine.
  • STALE: This happens when DiskErrorException is thrown.
  • DENIED: This indicates failing to establish connection to BSPMaster.
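
As a rough illustration of the three states listed above, the transitions could be modeled like this. The enum and helper names are hypothetical, DiskErrorException here is a local stand-in for the real exception class, and treating java.net.ConnectException as the sign of a failed BSPMaster connection is an assumption.

```java
import java.io.IOException;
import java.net.ConnectException;

// Local stand-in for the real DiskErrorException (assumption, for self-containment).
class DiskErrorException extends IOException {
    DiskErrorException(String message) {
        super(message);
    }
}

enum GroomState { NORMAL, STALE, DENIED }

// Hypothetical helper; not part of the actual GroomServer API.
class GroomStateSketch {
    // Returns the state that follows from an observed failure.
    // A null failure means nothing went wrong.
    static GroomState next(GroomState current, Exception failure) {
        if (failure == null) {
            return GroomState.NORMAL;
        }
        if (failure instanceof DiskErrorException) {
            return GroomState.STALE;   // local disk problem
        }
        if (failure instanceof ConnectException) {
            return GroomState.DENIED;  // could not reach BSPMaster
        }
        // Other failures are not covered by the description; keep the current state.
        return current;
    }
}
```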

...

The GroomServer

  • receives instructions from BSPMasters.
  • spawns one or more tasks as separate JVM processes, in which the tasks are then executed.
  • monitors spawned processes via ping; when a task
    • is out of contact (failed or crashed): launches a new process and restarts the task, with the maximum number of attempts set to 3
    • exceeds the maximum number of attempts: updates the task status and notifies the BSPMasters
  • sends heartbeats to BSPMasters.
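
The ping-and-restart behavior described above can be sketched as follows. This is only an illustration: TaskSupervisorSketch, TaskProcess, and Launcher are hypothetical names, and the sketch assumes that the limit of 3 counts total launches, including the first one.

```java
// Hypothetical sketch of ping-based supervision; not the actual GroomServer code.
class TaskSupervisorSketch {
    static final int MAX_ATTEMPTS = 3; // assumption: counts all launches, including the first

    enum Outcome { RUNNING, FAILED }

    // A spawned child process that can be pinged (stand-in for the real child JVM).
    interface TaskProcess {
        boolean ping();
    }

    // Starts a fresh process for the task.
    interface Launcher {
        TaskProcess launch();
    }

    private final Launcher launcher;
    private TaskProcess current;
    private int attempts;

    TaskSupervisorSketch(Launcher launcher) {
        this.launcher = launcher;
        this.current = launcher.launch();
        this.attempts = 1;
    }

    // One monitoring round: ping the task and restart it if it is out of contact.
    Outcome check() {
        if (current.ping()) {
            return Outcome.RUNNING;
        }
        if (attempts >= MAX_ATTEMPTS) {
            // Attempts exhausted: the caller would update the task status and notify the master.
            return Outcome.FAILED;
        }
        attempts++;
        current = launcher.launch();
        return Outcome.RUNNING;
    }

    int attempts() {
        return attempts;
    }
}
```

A real monitor would call check() on a timer for each task and report a FAILED outcome with the next heartbeat or status update.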

Scenario

Normal Case

Failure Case

  • A task failure event happens.
  • The corresponding GroomServer detects the task failure.
  • The GroomServer tries to restart the task.
  • If the restart fails, the GroomServer reports this by marking the task as failed.

Components

  • Registrator: Registers with GroomManager.
  • TaskManager: Performs task management.
    • Launch a task.
    • Stop a task.
    • Kill a task.
    • Resume a task.
  • Monitor: Reports GroomServer and task-related information.
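
The four TaskManager operations listed above could be tracked with a small in-memory state table, sketched below. The class name, the String task IDs, and the transition rules (stop only applies to a running task, resume only to a stopped one, kill is final) are assumptions; this section does not specify them.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory sketch of the TaskManager operations; transition rules are assumed.
class TaskManagerSketch {
    enum TaskState { RUNNING, STOPPED, KILLED }

    private final Map<String, TaskState> states = new HashMap<>();

    void launch(String taskId) {
        states.put(taskId, TaskState.RUNNING);
    }

    // Assumption: only a running task can be stopped.
    void stop(String taskId) {
        transition(taskId, TaskState.RUNNING, TaskState.STOPPED);
    }

    // Assumption: only a stopped task can be resumed.
    void resume(String taskId) {
        transition(taskId, TaskState.STOPPED, TaskState.RUNNING);
    }

    // Assumption: any known task can be killed, and a killed task stays killed.
    void kill(String taskId) {
        if (!states.containsKey(taskId)) {
            throw new IllegalStateException("Unknown task: " + taskId);
        }
        states.put(taskId, TaskState.KILLED);
    }

    TaskState stateOf(String taskId) {
        return states.get(taskId);
    }

    private void transition(String taskId, TaskState from, TaskState to) {
        if (states.get(taskId) != from) {
            throw new IllegalStateException(
                "Task " + taskId + " is " + states.get(taskId) + ", expected " + from);
        }
        states.put(taskId, to);
    }
}
```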