You can get a BindException "Address already in use" if a port on a machine is already in use and a service (NameNode, JobTracker, DataNode, TaskTracker, HTTP Server, etc.) tries to create a socket on that same port to listen for incoming requests.

Possible Causes

If the port is "0", then the OS is looking for any free port, so the port-in-use and port-below-1024 problems are highly unlikely to be the cause. Hostname confusion and network setup are the likely causes.

Because only one process can listen on a given TCP address and port, whatever is already listening there is stopping the service from coming up. You will need to track down and stop that process, or configure the service you are trying to start to listen on a different port.
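The failure is easy to reproduce outside Hadoop. The following is a minimal sketch in Python rather than Java, so the error surfaces as an OSError instead of a BindException; the function name and the loopback address are illustrative assumptions. It binds a first socket to port 0 so the OS picks a free port, then tries to bind a second socket to that same port.

```python
import errno
import socket


def demonstrate_address_in_use(host="127.0.0.1"):
    """Bind one listener, then try to bind a second socket to the same port.

    Returns (port, error_code); error_code is None if the second bind
    unexpectedly succeeded.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Port 0 asks the OS for any free port, so this bind should not clash.
        first.bind((host, 0))
        first.listen(1)
        port = first.getsockname()[1]
        try:
            # The port is now taken, so a second bind is expected to fail.
            second.bind((host, port))
        except OSError as exc:
            return port, exc.errno
        return port, None
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    port, code = demonstrate_address_in_use()
    print(f"port {port}: second bind failed with", errno.errorcode.get(code, code))
```

On most systems the returned error code should equal errno.EADDRINUSE, which is the same condition a Java service reports as "Address already in use".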

How to track down the problem

  1. Identify which host/IP address the program is trying to use.
  2. Make sure the hostname is valid: try to ping it; use ifconfig to list the network interfaces and their IP addresses.

  3. Make sure the hostname/IP address is one belonging to the host in question.
  4. Try to identify why the port is in use. Running telnet <hostname> <port> and pointing a web browser at it are both good tricks.

  5. Identify which port the program is trying to bind to.
  6. Identify which program is already using that port.
  7. As root, use netstat -a -t --numeric-ports -p to list the ports in use by number and owning process. (On OS X, use lsof instead.)

  8. Change the configuration of one of the programs to listen on a different port.
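Some of these checks can be scripted. The sketch below is one possible approach in Python, not a Hadoop tool; the helper names is_local_address and port_is_listening are made up for illustration, and it handles IPv4 only. The first helper resolves a hostname and tests whether this host can bind to the resulting address (steps 2 and 3); the second tests whether anything accepts TCP connections on a port, much like the telnet trick in step 4. It cannot tell you which process owns the port; that still needs netstat or lsof.

```python
import errno
import socket


def is_local_address(hostname):
    """Return True if hostname resolves to an IPv4 address this host can bind to."""
    # Raises socket.gaierror if the hostname cannot be resolved at all.
    address = socket.gethostbyname(hostname)
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Port 0 means "any free port": only the address is being tested.
        probe.bind((address, 0))
        return True
    except OSError as exc:
        if exc.errno == errno.EADDRNOTAVAIL:
            # The address does not belong to any interface on this host.
            return False
        raise
    finally:
        probe.close()


def port_is_listening(hostname, port, timeout=2.0):
    """Return True if a TCP connection to hostname:port is accepted."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    probe.settimeout(timeout)
    try:
        # connect_ex returns 0 on success and an error code otherwise.
        return probe.connect_ex((hostname, port)) == 0
    finally:
        probe.close()
```

For example, is_local_address("localhost") should normally return True, while a False result for the hostname in the service's configuration points to the hostname confusion described above.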

Finally, this is not a Hadoop problem; it is a host, network or Hadoop configuration problem. As it is your cluster, only you can track down the problem. Sorry.

BindException (last edited 2015-10-26 11:05:32 by SteveLoughran)