...

In general, a system designed to deal with failures usually needs to apply techniques including unit of mitigation, redundancy, fault detection, fault recovery [4], and so on.

The architecture defines the basic unit which performs functions of a system according to requirements.

Providing redundant units.

Unit of mitigation: GroomServer(s) / BSPMaster

Redundant units: GroomServer(s)

Fault detection: System monitor, heartbeat.

Fault recovery: Fail over

Fault observers are designed to detect faults or errors at an early stage so that other strategies, such as error recovery, can be employed to correct the problem.
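As an illustration of heartbeat-based fault detection, the following is a minimal sketch and not code from the project. The class HeartbeatMonitor, its methods, and the timeout value are all hypothetical names chosen for this example. It assumes each GroomServer periodically reports a heartbeat and that a server is considered failed once no heartbeat has arrived within the timeout. Timestamps are passed in explicitly so the behaviour is deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical heartbeat tracker for illustration only. */
final class HeartbeatMonitor {
    private final long timeoutMillis;
    // Last heartbeat timestamp per server name; TreeMap keeps names sorted.
    private final Map<String, Long> lastSeen = new TreeMap<>();

    HeartbeatMonitor(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Records a heartbeat received from the given server at time nowMillis. */
    synchronized void heartbeat(String groomServer, long nowMillis) {
        lastSeen.put(groomServer, nowMillis);
    }

    /**
     * Returns the servers whose last heartbeat is older than the timeout
     * and stops tracking them, so each failure is reported only once.
     */
    synchronized List<String> detectFailures(long nowMillis) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastSeen.entrySet()) {
            if (nowMillis - entry.getValue() > timeoutMillis) {
                failed.add(entry.getKey());
            }
        }
        for (String name : failed) {
            lastSeen.remove(name);
        }
        return failed;
    }

    /** Returns the servers still tracked as alive, i.e. fail-over candidates. */
    synchronized List<String> aliveServers() {
        return new ArrayList<>(lastSeen.keySet());
    }
}
```

In this sketch a monitor would call detectFailures periodically; the returned names identify the units to recover, and aliveServers lists the redundant units that work could fail over to.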

Architecture

Task Failure

The execution of a task is spawned from the GroomServer so that a failure of the task does not bring down the GroomServer. The following steps are performed in the scenario of a task failure.

...