...
- receives instructions from BSPMasters.
- spawns one or more tasks as separate JVM processes, in which the tasks are then executed.
- monitors spawned processes via ping; when a task is
  - out of contact (failed/crashed): launches a new process and restarts the task, with the maximum number of attempts set to 3
  - over the maximum number of attempts: updates the task status and notifies BSPMasters
- sends heartbeats to BSPMasters
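The ping-based monitoring above can be sketched as a table of last-contact timestamps. This is a minimal illustration, not Hama's implementation: the class name `PingMonitor`, its methods, and the timeout value are hypothetical, and time is passed in explicitly so the logic is easy to follow. A task counts as out of contact once its last ping reply is older than the timeout.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; the design above does not name this class or fix a timeout.
class PingMonitor {
    private final long timeoutMillis;
    // Last time each task answered a ping, keyed by task id.
    private final Map<String, Long> lastContact = new HashMap<>();

    PingMonitor(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Records a successful ping reply from a task at time nowMillis. */
    void recordPong(String taskId, long nowMillis) {
        lastContact.put(taskId, nowMillis);
    }

    /** Returns the tasks that have not answered within the timeout. */
    List<String> outOfContact(long nowMillis) {
        List<String> lost = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastContact.entrySet()) {
            if (nowMillis - e.getValue() > timeoutMillis) {
                lost.add(e.getKey());
            }
        }
        return lost;
    }

    public static void main(String[] args) {
        PingMonitor monitor = new PingMonitor(3_000);
        monitor.recordPong("task_0", 0);
        monitor.recordPong("task_1", 2_500);
        System.out.println(monitor.outOfContact(4_000)); // [task_0]
    }
}
```

In a running GroomServer the ping itself would be issued periodically (for example from a scheduled executor), and every id returned by `outOfContact` would be handed to the restart logic.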
Scenarios
Normal Case
- A GroomServer forks a new task.
- The spawned task sends an acknowledgement, together with its metrics, to the GroomServer.
- Upon receiving the task's acknowledgement, the GroomServer exports the metrics to the monitoring system.
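The normal case can be sketched as a small handler that forwards the metrics carried by a task's acknowledgement to the monitoring system. Everything here is an assumption for illustration: `TaskAck`, `MetricsExporter`, and `AckHandler` are hypothetical names, and metrics are modeled as name/value pairs because the design does not define their format or the export transport.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical GroomServer-side handler for task acknowledgements.
class AckHandler {
    private final MetricsExporter exporter;

    AckHandler(MetricsExporter exporter) {
        this.exporter = exporter;
    }

    /** Called when a task's ack arrives: forward each metric to the monitor. */
    void onAck(TaskAck ack) {
        ack.metrics().forEach((name, value) -> exporter.export(ack.taskId(), name, value));
    }

    public static void main(String[] args) {
        List<String> exported = new ArrayList<>();
        AckHandler handler = new AckHandler(
            (task, name, value) -> exported.add(task + " " + name + "=" + value));
        handler.onAck(new TaskAck("task_0", Map.of("heapUsedBytes", 1024L)));
        System.out.println(exported); // [task_0 heapUsedBytes=1024]
    }
}

// Hypothetical acknowledgement a freshly forked task sends to its GroomServer.
record TaskAck(String taskId, Map<String, Long> metrics) {}

// Hypothetical hook into the monitoring system.
interface MetricsExporter {
    void export(String taskId, String name, long value);
}
```

Keeping the exporter behind an interface lets the GroomServer stay agnostic of which monitoring system receives the metrics.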
Failure Case
- A task fails.
- The corresponding GroomServer detects the task failure.
- The GroomServer tries to restart the task.
- If the restart fails, the GroomServer marks the task as failed and exports that status.
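The failure case can be sketched as a bounded retry loop. This sketch assumes that "max attempt set to 3" means at most three relaunch attempts; `RestartPolicy`, `recover`, the `relaunch` hook (which would start a new JVM process for the task), and the `notifyMaster` callback are all hypothetical names, not part of the design.

```java
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;

// Hypothetical restart logic for a task the GroomServer has found to be out of contact.
class RestartPolicy {
    // Assumed reading of "max attempt set to 3": three relaunch attempts.
    static final int MAX_ATTEMPTS = 3;

    enum TaskStatus { RUNNING, FAILED }

    /**
     * Tries to relaunch a failed task up to MAX_ATTEMPTS times.
     * relaunch returns true if the new process came back up;
     * notifyMaster is called with the task id once all attempts are used up.
     */
    static TaskStatus recover(String taskId, BooleanSupplier relaunch, Consumer<String> notifyMaster) {
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            if (relaunch.getAsBoolean()) {
                return TaskStatus.RUNNING;
            }
        }
        // All attempts failed: mark the task as failed and tell the BSPMaster.
        notifyMaster.accept(taskId);
        return TaskStatus.FAILED;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated relaunch that succeeds on the second try.
        TaskStatus s = recover("task_0", () -> ++calls[0] == 2, id -> System.out.println("notify " + id));
        System.out.println(s + " after " + calls[0] + " attempts"); // RUNNING after 2 attempts
    }
}
```

Passing the relaunch and notification steps in as functions keeps the retry rule separate from process management, so the limit can be tested without forking real JVMs.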