SPOF: Single Point of Failure

An SPOF is a single part of a system, which, if it fails, takes down the entire system.

In Hadoop, the NameNode is a SPOF. DNS may also be SPOF, unless you have failover. The network may be a SPOF, depending on is redundancy, as can be the room's power supply.

People can be SPOFs too. If only one person knows the database admin password, you can be sure they are on vacation when the password is needed. If only one developer understands how the mission-critical MapReduce algorithm works, they are the SPOF.

There is always a SPOF. Even when you think you have anticipated and eliminated every SPOF, there is one out there, waiting. How you find it? You wait.

  • No labels