Server Not Available Yet

This message can appear in the logs of a DataNode:

2011-06-30 11:30:40,403 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310. Already tried 0 time(s).
2011-06-30 11:30:41,404 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310. Already tried 1 time(s).
2011-06-30 11:30:42,404 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310. Already tried 2 time(s).
2011-06-30 11:30:43,405 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310. Already tried 3 time(s).
2011-06-30 11:30:44,405 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310. Already tried 4 time(s).
2011-06-30 11:30:45,406 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310. Already tried 5 time(s).
2011-06-30 11:30:46,407 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310. Already tried 6 time(s).
2011-06-30 11:30:47,407 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310. Already tried 7 time(s).
2011-06-30 11:30:48,408 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310. Already tried 8 time(s).
2011-06-30 11:30:49,409 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310. Already tried 9 time(s).
2011-06-30 11:30:49,410 INFO org.apache.hadoop.ipc.RPC: Server at namenode/10.8.1.2:54310 not available yet, Zzzzz...
2011-06-30 11:30:51,411 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310. Already tried 0 time(s).
2011-06-30 11:30:52,412 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310. Already tried 1 time(s).
2011-06-30 11:30:53,412 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310. Already tried 2 time(s).
2011-06-30 11:30:54,413 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310. Already tried 3 time(s).
2011-06-30 11:30:55,414 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310. Already tried 4 time(s).
2011-06-30 11:30:56,414 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310. Already tried 5 time(s).
2011-06-30 11:30:57,415 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310. Already tried 6 time(s).
2011-06-30 11:30:58,416 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310. Already tried 7 time(s).
2011-06-30 11:30:59,416 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310. Already tried 8 time(s).
2011-06-30 11:31:00,417 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310. Already tried 9 time(s).
2011-06-30 11:31:00,418 INFO org.apache.hadoop.ipc.RPC: Server at namenode/10.8.1.2:54310 not available yet, Zzzzz...

What's happening here is that the DataNode cannot connect to the NameNode. Rather than fail, it assumes that the NameNode is temporarily offline: it hasn't started yet or is being restarted. The DataNode will happily wait for the NameNode to come back up and, as soon as it does, report in. After retrying once a second for ten attempts, the client backs off for a couple of seconds, then tries again.
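The retry-and-back-off pattern visible in the log can be sketched in a few lines of Python. This is only an illustration modelled on the timestamps above, not Hadoop's own code: the function name wait_for_server, its parameters and the default intervals are all assumptions made for the example.

```python
import time


def wait_for_server(connect, max_retries=10, retry_interval=1.0,
                    backoff=2.0, max_rounds=None,
                    sleep=time.sleep, log=print):
    """Call connect() until it succeeds and return its result.

    Hypothetical helper: the names and default values are assumptions
    chosen to resemble the log above, not Hadoop settings.
    """
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        for tried in range(max_retries):
            try:
                return connect()
            except OSError:
                # Connection refused, host unreachable, timeout, ...
                log(f"Retrying connect to server. Already tried {tried} time(s).")
                sleep(retry_interval)
        # A whole round of retries failed: back off before starting over.
        log("Server not available yet, Zzzzz...")
        sleep(backoff)
        rounds += 1
    raise TimeoutError("server still unavailable after all retry rounds")
```

With max_rounds left at None the loop never gives up, which mirrors the behaviour described above: the caller simply waits until the server comes back. The sleep and log arguments exist only so the loop can be exercised without real delays.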

This process of retrying and backing off is a key part of how an HDFS cluster handles the temporary outage of a NameNode. It works well provided the network is set up and running correctly. The same messages can also be triggered by cluster setup problems, which anyone setting up a Hadoop cluster is likely to encounter:

  1. The NameNode hasn't been started yet. Fix: start the NameNode.

  2. The fs.default.name property in core-site.xml doesn't point to the correct hostname for the NameNode, and the DataNodes are trying to connect to the wrong server. Look at the server name in the log and verify it is valid.

  3. The port in the fs.default.name property is wrong. Verify the NameNode is listening at that port; if not, correct the site settings.

  4. The client can't resolve the hostname, or it is resolving to the wrong address. Verify the IP address in the logs.

  5. Connection problems. Look at the network connectivity options in the TroubleShooting page.
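
Several of the checks in this list (does the hostname resolve, to which address, and is anything listening on the port) can be run from the DataNode's host with a short script. The following is a minimal sketch using only the Python standard library; check_namenode is a hypothetical helper written for this page, not a Hadoop tool, and the three status strings it returns are likewise made up for the example.

```python
import socket


def check_namenode(host, port, timeout=3.0):
    """Return (status, address, detail) for a host/port pair.

    status is "unresolved", "unreachable" or "ok". Hypothetical helper;
    it only tests name resolution and a plain TCP connect.
    """
    try:
        # Cause 4: does the name resolve, and to which IPv4 address?
        address = socket.gethostbyname(host)
    except socket.gaierror as err:
        return ("unresolved", None, str(err))
    try:
        # Causes 1, 3 and 5: is something accepting connections there?
        with socket.create_connection((address, port), timeout=timeout):
            return ("ok", address, "")
    except OSError as err:
        return ("unreachable", address, str(err))


if __name__ == "__main__":
    # Hostname and port taken from the log excerpt on this page.
    print(check_namenode("namenode", 54310))
```

Compare the address it reports with the one in the log and with the NameNode's real address. A result of "ok" only shows that some process accepts TCP connections on that port; it does not prove that the process is the NameNode.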

ServerNotAvailable (last edited 2011-06-30 10:50:37 by SteveLoughran)