Isolating YARN Applications in Docker Containers
The Docker executor for YARN requires work in YARN along with corresponding work in Docker to provide the necessary API endpoints. This page collects the related tickets from both projects in one place.
The advantages that containers and Docker offer to Hadoop YARN are well understood. Here is a partial list.
Isolation of software dependencies and configuration: With applications encapsulated within Docker containers, the software dependencies and system configuration an application requires can be specified independently of those of the host and of other applications running on the cluster.
Security: The privilege scope of a task is limited to the container it runs in. For example, root in the container would have no root privileges on the host. The Linux capabilities a task holds, the devices it can access, and so on can also be controlled.
Performance isolation: Containers provide dynamically tunable limits on a task's use of resources such as CPU, memory and IO bandwidth.
Consistency: All tasks of an application run in an identical software environment defined by the container and its image, regardless of the state of the host. For example, an application could run in an Ubuntu environment making use of Ubuntu-specific software, while the host itself runs RHEL.
Quick provisioning: The central repository of container images decouples software state and configuration from hardware, enabling a relatively stateless base platform to be rapidly provisioned for a YARN application by automatically pulling the right container image on demand.
Programmability: Dockerfiles provide a fast and canonical mechanism to produce the file system context and configuration required for a YARN application.
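As a rough illustration of the security and performance isolation points above, the sketch below builds a `docker run` argument vector that caps memory, sets a relative CPU weight, and drops Linux capabilities. The helper name `docker_run_argv`, its parameters, and the image and capability values in the usage line are assumptions made for this example only; the flags themselves (`--memory`, `--cpu-shares`, `--cap-drop`, `--cap-add`, `--device`) are standard `docker run` options, but their availability depends on the Docker version in use.

```python
from typing import List, Sequence


def docker_run_argv(image: str,
                    command: Sequence[str],
                    memory_mb: int,
                    cpu_shares: int,
                    keep_capabilities: Sequence[str] = (),
                    devices: Sequence[str] = ()) -> List[str]:
    """Build (but do not execute) a `docker run` command line for one task.

    Hypothetical helper for illustration; not part of YARN or Docker.
    """
    argv = ["docker", "run", "--rm"]
    # Performance isolation: hard memory cap and a relative CPU weight.
    argv += ["--memory", f"{memory_mb}m", "--cpu-shares", str(cpu_shares)]
    # Security: drop every Linux capability, then add back only the
    # ones the task was explicitly granted.
    argv += ["--cap-drop", "ALL"]
    for cap in keep_capabilities:
        argv += ["--cap-add", cap]
    # Only the listed host devices are exposed inside the container.
    for dev in devices:
        argv += ["--device", dev]
    argv.append(image)
    argv.extend(command)
    return argv


if __name__ == "__main__":
    # Example values are placeholders, not recommendations.
    print(" ".join(docker_run_argv("ubuntu:14.04", ["sleep", "60"],
                                   memory_mb=1024, cpu_shares=512,
                                   keep_capabilities=["NET_BIND_SERVICE"])))
```

Returning the argument vector instead of running it keeps the sketch free of side effects; a caller could hand the list to `subprocess.run` on a host where Docker is installed.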
Realizing these benefits requires changes to both Docker and YARN. Several of the Docker features needed for the above are already available, such as excluding the intermediate data directory from the copy-on-write file system and exposing the data node's Unix socket from the host inside the container for short-circuit IO. The following new pieces of work need to be done.
YARN Docker executor
An initial patch of the Docker executor.
- Some of the Docker features below may only be made available via Docker's REST endpoint. The Docker executor should connect to that endpoint rather than shell out to invoke those features.
Docker support for user namespaces, to map the root user in the container to an unprivileged user on the host. Currently, root in a Docker container has root privileges on the host.
Container network configuration that allows the task and application master containers to talk to each other. The NAT'ed non-routable IP addresses assigned by Docker don't allow the task to reach the application master running in a container on a different host. Possible approaches to addressing this and relevant tickets are outlined here.
Dynamic tuning of resource limits for granular control over resource allocation. Docker currently does not allow a container's resource limits to be changed once the container has been created.
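The Docker executor item above recommends talking to Docker's REST endpoint instead of shelling out. A minimal sketch of what that could look like follows, assuming the daemon listens on a Unix socket (the path `/var/run/docker.sock` is Docker's usual default, but is an assumption here) and assuming the `POST /containers/create` request shape with `Image`, `Cmd`, and `HostConfig` fields `Memory`, `CpuShares`, and `Binds`. The class and function names are hypothetical, and the field names and endpoint should be checked against the remote API version actually deployed.

```python
import http.client
import json
import socket
from typing import Dict, Sequence, Tuple


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that speaks HTTP over a Unix domain socket."""

    def __init__(self, socket_path: str, timeout: float = 10.0):
        # The host name is only used for the Host header.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def create_container_body(image: str, command: Sequence[str],
                          memory_bytes: int, cpu_shares: int,
                          binds: Sequence[str]) -> Dict:
    """Build the JSON body for a container-create request (assumed shape)."""
    return {
        "Image": image,
        "Cmd": list(command),
        "HostConfig": {
            "Memory": memory_bytes,    # hard memory limit in bytes
            "CpuShares": cpu_shares,   # relative CPU weight
            "Binds": list(binds),      # "host_path:container_path" mounts
        },
    }


def create_container(body: Dict,
                     socket_path: str = "/var/run/docker.sock"
                     ) -> Tuple[int, str]:
    """POST the body to the daemon; return (HTTP status, raw response text)."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("POST", "/containers/create", body=json.dumps(body),
                     headers={"Content-Type": "application/json"})
        resp = conn.getresponse()
        return resp.status, resp.read().decode("utf-8")
    finally:
        conn.close()
```

Building the request body separately from sending it lets the executor's resource mapping be checked without a running Docker daemon; `create_container` needs a live daemon and suitable permissions on the socket. The `Binds` entry is where a host directory for intermediate data could be mounted into the container.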