You can get a java.net.BindException: Address already in use if a socket on a machine is already in use and a service (NameNode, JobTracker, DataNode, TaskTracker, HTTP Server, etc.) tries to create a socket on that same port to listen for incoming requests. Possible causes include:
- The port is in use (likeliest)
- If the port number is below 1024, the OS may be preventing your program from binding to a "trusted port"
- If the configuration is a hostname:port value, the hostname may be wrong, or its IP address may not be one your machine has.
- There is an instance of the service already running.
- If you are running on EC2, your service is trying to bind to a public Elastic IP address, either explicitly (using the public hostname or IP) or implicitly (using "0.0.0.0" as the address).
If the port is "0", the OS picks any free port, so the port-in-use and port-below-1024 problems are highly unlikely to be the cause. Hostname confusion and network setup are the likely causes.
As you cannot have more than one process listening on a TCP port, whatever is already listening is stopping your service from coming up. You will need to track down and stop that process, or reconfigure the service you are trying to start so that it listens on a different port.
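The two-listener rule is easy to demonstrate in plain Java. A minimal sketch (the class name is made up for illustration): it asks the OS for a free port by binding port 0, then tries to bind that same port a second time, which fails with the familiar exception:

```java
import java.net.BindException;
import java.net.ServerSocket;

public class BindDemo {
    public static void main(String[] args) throws Exception {
        // Port 0 asks the OS for any free port, so this bind always succeeds.
        ServerSocket first = new ServerSocket(0);
        int port = first.getLocalPort();
        try {
            // A second bind to the same port fails while the first socket is open.
            ServerSocket second = new ServerSocket(port);
            second.close();
        } catch (BindException e) {
            // Typically "Address already in use" (exact wording varies by JDK/OS).
            System.out.println("BindException: " + e.getMessage());
        } finally {
            first.close();
        }
    }
}
```

Once `first` is closed, the port can be bound again; that is why stopping the other process (or changing the configured port) fixes the startup failure.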
How to track down the problem
- Identify which host/IP address the program is trying to use.
- Make sure the hostname is valid: try to ping it; use ifconfig to list the network interfaces and their IP addresses.
- Make sure the hostname/IP address is one belonging to the host in question.
- Try to identify why the port is in use. telnet <hostname> <port> and pointing a web browser at it are both good tricks.
- Identify which port the program is trying to bind to
- Identify the port that is in use and the program that is in use
- As root, use netstat -a -t --numeric-ports -p to list the ports that are in use, by number and process. (On OS X you need to use lsof instead.)
- Change the configuration of one of the programs to listen on a different port.
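The telnet/browser trick above can also be done programmatically. A minimal sketch (the class name, method name, and two-second timeout are illustrative choices, not part of any Hadoop API): it attempts a TCP connection and reports whether anything is listening on the given host and port:

```java
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortProbe {
    /** Returns true if something accepts TCP connections on host:port. */
    static boolean inUse(String host, int port) {
        try (Socket s = new Socket()) {
            // Connect with a 2-second timeout rather than hanging indefinitely.
            s.connect(new InetSocketAddress(host, port), 2000);
            return true;
        } catch (Exception e) {
            // Connection refused, unknown host, or timeout: nothing reachable.
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8080; // example port
        System.out.println(host + ":" + port + " in use: " + inUse(host, port));
    }
}
```

If this reports the port as in use before you start your service, something else already owns it; netstat/lsof will tell you what.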
Finally, this is not a Hadoop problem; it is a host, network, or Hadoop configuration problem. As it is your cluster, only you can track down the cause. Sorry.