You get an Unknown Host Error, often wrapped in a Java IOException, when one machine on the network cannot determine the IP address of a host that it is trying to connect to by way of its hostname. This can happen during file upload (in which case the client machine has the hostname problem), or inside the Hadoop cluster.
Some possible causes, in approximately reverse order of likelihood (not an exclusive list):

- You are using DNS, but the site's DNS server does not have an entry for the node. Test: do an nslookup <hostname> from the client machine.
- You are using /etc/hosts entries, but the calling machine's hosts file lacks an entry for the host. FQDN entries in /etc/hosts files must contain a trailing dot. See the Using hosts files section below. Test: do a ping <hostname>. from the client machine (note the trailing dot).
- The hostname in the configuration files (such as core-site.xml) is misspelled.
- The hostname in the configuration files (such as core-site.xml) is confused with the hostname of another service. For example, you are using the hostname of the YARN Resource Manager in the fs.defaultFS configuration option to define the namenode (see the example core-site.xml entry after this list).
- The machine's own hostname is missing from /etc/hosts; the service will fail to start as it cannot determine which network card/address to use.

Less likely causes are all network configuration/router issues. As it is your network, only you can find and track down the problem. That said, any tooling to help Hadoop track down such problems in-cluster would be welcome, as would extra diagnostics. If you have to extend Hadoop to track down these issues, submit your patches!
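Because a wrong or mixed-up hostname in core-site.xml is such a common cause, here is a sketch of a correct non-HA fs.defaultFS entry; namenode.example.com and port 8020 are placeholders. The value must name the namenode, not the Resource Manager, and it must resolve from every client and worker node:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://namenode.example.com:8020</value>
</property>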
Some tactics to help solve the problem:
As well as nslookup, the dig command is invaluable for tracking down DNS problems, though it does assume you understand DNS records. Now is a good time to learn.

Unless the root cause has been identified, the problem may return.
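To get started with dig, here are two typical queries; foo.example.com and the DNS server address are placeholders:

dig foo.example.com              # ask the default resolver for the host's A record
dig @10.0.0.1 foo.example.com    # ask a specific DNS server, bypassing the default

Comparing the answers from the default resolver and the cluster's own DNS server is a quick way to spot a stale or missing record.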
Using hosts files

If you are using /etc/hosts files instead of DNS-based lookups, and your hosts files have FQDNs, you must ensure that the FQDN includes a trailing dot. Thus a correct hosts file entry for foo.example.com may look like this:
1.2.3.4 foo.example.com foo.example.com. foo
The Hadoop host resolver ensures hostnames are terminated with a trailing dot prior to lookup to avoid the security issue described in RFC 1535.
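To check that both forms resolve on a given Linux machine, one option is getent, which consults /etc/hosts through the system resolver; foo.example.com is again a placeholder:

getent hosts foo.example.com     # the plain FQDN form
getent hosts foo.example.com.    # the trailing-dot form that Hadoop actually looks up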
Unknown Host exception and HA HDFS

This exception can also surface when setting up HA HDFS. As documented, HA HDFS requires you to list the namenode IDs of a cluster in the property dfs.ha.namenodes.mycluster, where "mycluster" is the name of your HA cluster:
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
Then for the filesystem URL, you use the name of the cluster:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://mycluster</value>
</property>
If you get an Unknown Host Exception, and the host is the name of your HA cluster (here, mycluster), it means that the HDFS client hasn't recognized that this is an HA cluster, and has instead tried to connect to it directly on the default HDFS port. This happens when the dfs.ha.namenodes.mycluster property is unset, or when the cluster name is inconsistent across the properties. Check your configuration and try again.
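As a minimal sketch of a consistent HA configuration, the cluster name must match across all of the related properties in hdfs-site.xml; the hostnames and ports below are placeholders:

<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>namenode1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>namenode2.example.com:8020</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

If any of these properties spells the nameservice ID differently from the one in fs.defaultFS, the client falls back to treating mycluster as an ordinary hostname, and the lookup fails.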
Finally, because this is a configuration problem, filing bug reports is not going to help: they will only be closed as Invalid Issues.