Isolating YARN Applications in Docker Containers
The Docker executor for YARN involves work on YARN along with its counterpart in Docker to forge the necessary API end points. The purpose of this page is to collect related tickets across both projects in one location.
May 2015 Update
The initial implementation of Docker Container Executor required that all YARN containers be launched in a Docker container. While this approach allowed us to more quickly get hands-on experiencing bringing Docker to YARN, it's not practical for production clusters. Also, we noticed that a production-quality implementation of Docker Container Executor would require borrowing a large amount of important -- and security-sensitive -- code and configuration from Linux Container Executor (and it's supporting binary).
As a result, we've concluded that the best way to bring Docker to a production cluster is to add Docker as a feature of the Linux Container Executor. With this features, individual jobs -- and even individual YARN containers -- can be configured to use Docker containers, while other jobs can continue to use regular Linux containers.
Based on this conclusion, we have developed the following plan for moving forward:
* Add to the Linux Container Executor (LCE) the option to launch containers using Docker.
* Add this functionality in a way that leverages LCE's existing ability to create cgroups to obtain cgroups for Docker containers.
* Add the ability to load the Docker image from a localized tar file (in addition to being able to load from a Docker registry).
* Extend our Docker work to behave correctly in Kerberized clusters.
* Verify that the distributed cache works correctly in Docker containers (we think it does, but we haven't fully tested).
We are targeting a beta release of this functionality for Hadoop 2.8.
The advantages Containers and Docker offer to Hadoop YARN are well understood:
* Security. YARN is typically deployed as a multi-tenant environment in large organizations with multiple groups sharing a common IT-managed cluster. Tasks from different tenants could potentially be scheduled on the same host. Containers securely isolate those tasks by limiting the privilege scope of a task to the container in which it runs. Root in the container is distinct from root on the host. Even though the root in a container could run privileged operations, it only affects the container counterparts of the host resources but not the host directly. Specific Linux Capabilities possessed by the task, devices accessible to it, etc. are adjusted for each container.
When combined with Software Defined Networking techniques, containers isolate the network traffic of different tenant applications. Then the tasks of one customer would not be able to maliciously or unintentionally snoop the traffic of another tenant.
* Performance isolation. Containers provide resource accounting and enforce resource limits on the processes running within them to prevent applications from stepping on each other. For fine-grain control, resource limits associated with CPU, memory and I/O bandwidth can be tuned on-the-fly as decided by the resource manager.
* Higher utilization by co-scheduling CPU and I/O bound jobs. In a multitenant environment, applications have varying resource needs. While some tasks are compute intensive, others could be I/O-bound. When the tasks of an I/O bound job are scheduled on a node, its compute resources go unused and vice versa. Due to the security risk of co-locating the tasks of different tenants on a shared machine, the idle resources are not allocated to other tenants even if they are able to utilize them. Containers prevent such resource underutilization by securely isolating tasks from one another, so that they can be safely co-scheduled on the same host.
* Consistency. Distributed YARN applications consist of tasks that need to run on different cluster nodes deployed with an identical host environment. Any discrepancies may cause application misbehavior. Containers ensure that all the tasks of an application run in a consistent software environment defined by the container and its image, regardless of the state of the host. For example, an application could run in an Ubuntu environment making use of Ubuntu-specific software, while the host itself runs RHEL.
* Isolation of software dependencies and configuration. YARN is designed to be modular, with well-defined interfaces between applications and its core. This allows applications to be built as independent binaries which often rely on third party software. For example, an application that predicts consumer spending based on linear regression might have a dependency on Matlab. Since the tasks of an application could be potentially scheduled to run on any host in the cluster, these software dependencies would have to be installed on all the cluster nodes. A variety of applications all sharing the same YARN cluster can quickly clutter the nodes with their respective software dependencies. Installing all dependencies across all hosts is an unscalable approach. In some cases, the software dependencies and their versions may be mutually conflicting.
With applications encapsulated in Docker containers, software dependencies and the system configuration required for them can be specified independent of the host and other applications running on the cluster.
* Reproducible and programmable mechanism to define application environments. Docker supports a mechanism to programmatically build out a consistent environment required for YARN applications. The build process can be run offline with its products stored in the central repository of container images. At the time of deployment, the image bits are quickly streamed into the cluster without incurring the overhead of runtime configuration.
* Rapid provisioning. The central repository of container images decouples software state and configuration from the hardware, enabling a relatively stateless base platform to be rapidly provisioned for a YARN application, by automatically pulling the right container image on demand. When the job finishes the containers are simply removed, returning the cluster to its pristine state.
Realizing these benefits requires changes to both Docker and YARN. Summary of those changes (existing and proposed) are listed below.
YARN-1964 Initial support for running YARN applications within Docker containers through Docker Container Executor (DCE)
YARN-2477 DCE must support secure mode
YARN-2478 Nested containers should be supported
YARN-2479 DCE must support handling of distributed cache
YARN-2480 DCE must support user namespaces
YARN-2482 DCE configuration
- Some of the Docker features below may only be made available via its REST endpoint. DCE should connect to it rather than shell out to invoke those functions.
Support for user namespace to map root user in the container to an unprivileged user on the host. Currently root in a Docker container has root privileges on the host.
Container network configuration that allows the task and application master containers to talk to each other. The NAT'ed non-routable IP addresses assigned by Docker don't allow the task to reach the application master running in a container on a different host. Possible approaches to addressing this and relevant tickets are outlined here.
Dynamic tuning of resource limits for granular control over resources allocation. Docker currently does not allow changing container resources once created.