== Task Management ==
* receives instructions from [[BSPMaster|BSPMasters]].
* spawns one or more tasks as separate JVM processes, in which the tasks are then executed.
* monitors spawned processes via ''ping''; when a task is
  * out of contact (failed or crashed): launches a new process and restarts the task, with '''max attempt''' set to '''3'''
  * exceeding '''max attempt''': updates the task status and notifies [[BSPMaster|BSPMasters]]
* sends heartbeats to [[BSPMaster|BSPMasters]]
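The restart policy above can be sketched in Java as follows. This is a minimal illustration, not Hama's implementation: it assumes that '''max attempt''' counts total launches (the first run plus restarts), and `TaskSupervisor`, `supervise`, and the `IntPredicate` that stands in for "spawn a JVM and monitor it via ''ping''" are hypothetical names.

```java
import java.util.function.IntPredicate;

class TaskSupervisor {
    // Mirrors the max attempt value of 3 (assumed here to count total launches).
    static final int MAX_ATTEMPTS = 3;

    enum Outcome { SUCCEEDED, FAILED }

    /**
     * Launches a task and relaunches it each time it fails or goes out of
     * contact, giving up after MAX_ATTEMPTS launches.
     *
     * @param taskId          label used in log lines
     * @param attemptSucceeds hypothetical stand-in for "spawn a JVM and monitor
     *                        it via ping"; receives the 1-based attempt number
     *                        and returns true if that attempt completed
     */
    static Outcome supervise(String taskId, IntPredicate attemptSucceeds) {
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            if (attemptSucceeds.test(attempt)) {
                return Outcome.SUCCEEDED;
            }
            if (attempt < MAX_ATTEMPTS) {
                System.out.println(taskId + ": attempt " + attempt + " lost contact, relaunching");
            }
        }
        // Max attempt exceeded: this is the point where GroomServer would
        // update the task status and notify BSPMaster.
        System.out.println(taskId + ": giving up after " + MAX_ATTEMPTS + " attempts");
        return Outcome.FAILED;
    }

    public static void main(String[] args) {
        // Loses contact twice, then completes on the third launch.
        System.out.println(supervise("task_0001", attempt -> attempt == 3));
        // Never completes, so the supervisor stops after MAX_ATTEMPTS launches.
        System.out.println(supervise("task_0002", attempt -> false));
    }
}
```

Passing the launch-and-ping step in as a predicate keeps the retry bookkeeping separate from process management, so the policy can be exercised without spawning real JVMs.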
GroomServer is a process whose main responsibility is to manage BSP tasks. In addition to task management, GroomServer collaborates with BSPMaster so that jobs are executed correctly. The work GroomServer performs includes:
* Check local disk:
  * Check whether the local disk is writable.
  * Delete files under the local directory (bsp.local.dir).
* Clear state table.
* Start HTTP server: an embedded HTTP service.
* Start worker server: an RPC service that listens for directions from BSPMaster.
* Register with BSPMaster: enroll itself with BSPMaster by sending its GroomServerStatus.
* Start message dispatcher (Instructor): (TODO: refactor needed)
* Start monitor service: a process that exports metrics, task status, and other information.
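The local disk step (check that the directory is writable, then delete the files under it) might look roughly like this in plain Java NIO. This is a sketch under assumptions: `LocalDirCheck`, `checkWritable`, and `clearContents` are hypothetical names, the nested `DiskErrorException` only stands in for the exception named in the state list, and the directory is taken from the command line instead of being read from the bsp.local.dir setting.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LocalDirCheck {
    // Hypothetical stand-in for a DiskErrorException; not Hama's own class.
    static class DiskErrorException extends IOException {
        DiskErrorException(String msg) { super(msg); }
        DiskErrorException(String msg, Throwable cause) { super(msg, cause); }
    }

    /** Creates dir if needed and throws DiskErrorException unless it is a writable directory. */
    static void checkWritable(Path dir) throws DiskErrorException {
        if (Files.exists(dir) && !Files.isDirectory(dir)) {
            throw new DiskErrorException("Not a directory: " + dir);
        }
        try {
            Files.createDirectories(dir);
        } catch (IOException e) {
            throw new DiskErrorException("Cannot create directory: " + dir, e);
        }
        if (!Files.isWritable(dir)) {
            throw new DiskErrorException("Directory is not writable: " + dir);
        }
    }

    /** Deletes every file and subdirectory under dir, keeping dir itself. */
    static void clearContents(Path dir) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order puts children before their parent directories,
            // so each directory is already empty when it is deleted.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            if (!p.equals(dir)) {
                Files.delete(p);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: LocalDirCheck <local-dir>");
            return;
        }
        Path localDir = Path.of(args[0]);
        checkWritable(localDir);
        clearContents(localDir);
        System.out.println("Cleared " + localDir);
    }
}
```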
GroomServer states include:
* NORMAL: everything works fine.
* STALE: entered when a DiskErrorException is thrown.
* DENIED: indicates a failure to establish a connection to BSPMaster.
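One way to model the three states is an enum plus a small function that maps check outcomes onto it. This is a hypothetical sketch rather than GroomServer's code: the `DiskCheck` and `MasterConnection` hooks and the nested `DiskErrorException` are stand-ins, and checking the disk before contacting BSPMaster is an assumption of the sketch.

```java
import java.io.IOException;

class GroomStateCheck {
    /** The three states listed above. */
    enum GroomState { NORMAL, STALE, DENIED }

    // Hypothetical stand-in for a DiskErrorException; not Hama's own class.
    static class DiskErrorException extends IOException {
        DiskErrorException(String msg) { super(msg); }
    }

    /** Hypothetical hook for the local disk check. */
    @FunctionalInterface
    interface DiskCheck { void run() throws DiskErrorException; }

    /** Hypothetical hook for connecting to BSPMaster. */
    @FunctionalInterface
    interface MasterConnection { void connect() throws IOException; }

    /**
     * A DiskErrorException yields STALE, a failed connection to BSPMaster
     * yields DENIED, and otherwise the state is NORMAL. Running the disk
     * check first is an assumption of this sketch.
     */
    static GroomState evaluate(DiskCheck disk, MasterConnection master) {
        try {
            disk.run();
        } catch (DiskErrorException e) {
            return GroomState.STALE;
        }
        try {
            master.connect();
        } catch (IOException e) {
            return GroomState.DENIED;
        }
        return GroomState.NORMAL;
    }

    public static void main(String[] args) {
        System.out.println(evaluate(() -> {}, () -> {})); // NORMAL
        System.out.println(evaluate(
                () -> { throw new DiskErrorException("local dir is read-only"); },
                () -> {})); // STALE
        System.out.println(evaluate(
                () -> {},
                () -> { throw new IOException("connection refused"); })); // DENIED
    }
}
```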