Status

Authors: Weiqing Yang, Jian He

Current state: ACCEPTED

Discussion thread: http://mail-archives.apache.org/mod_mbox/samza-dev/201911.mbox/browser

JIRA: SAMZA-2067 - Getting issue details... STATUS

Released:

Problem

Currently Samza applications can be run on YARN or in standalone mode. This SEP enables Samza on Kubernetes.

Background

This section describes a bit of background of Kubernetes. If you are familiar with Kubernetes, you can skip this section.

Kubernetes is a container orchestration system for automating deployment, scaling, and management of containerized applications. Kubernetes consists of Master components and Node components.

Figure 1. Kubernetes architecture

Master Components

API-server: Component on the master that exposes the Kubernetes API. It is the front-end for the Kubernetes control plane.
Scheduler: Component on the master that watches newly created Pods that have no node assigned, and selects a node for them to run on.

API-server + Scheduler is similar to the ResourceManager in YARN. The difference is that API-server and scheduler runs in separate containers/processes, whereas ResourceManager is a single process.

Controller/Operator: Kubernetes ships built-in controllers such as StatefulSet that ensures the number of replicas for an application is up and running. Users can also develop their own operators (The idiom in Kubernetes community to usually call user-developed controllers as operator). In this case, the operator is similar to the AM in YARN.

Node Components

Kubelet: An agent that runs on each node in the cluster. It makes sure that containers are running in a Pod. This is similar to NodeManager in YARN.
Pod: A group of one or more containers, with shared storage/network, and a specification for how to run the containers (refer to here). Note that Pod is similar to the container concept in YARN.
Container-runtime: The container runtime is the software that is responsible for running containers, most notably, docker.

Proposed Changes

The Samza Operator, similar to the Samza AM in YARN, is the control hub for Samza applications running on Kubernetes. It is responsible for requesting Pods from Kubernetes and coordinating work assignment across Pods.

Below graph describes the lifecycle of a Samza application running on Kubernetes.

Figure 2. Lifecycle of Samza applications running on Kubernetes

The run-app.sh script is started providing the location of your application’s binaries and its config file. The script instantiates an ApplicationRunner, which is the main entry-point responsible for running your application.
The ApplicationRunner parses your configs and writes them to a special Kafka topic named - the Coordinator Stream for distributing them. It proceeds to submit a request to Kubernetes API-server to launch the job-coordinator Pod.
The job-coordinator Pod (The AM, in YARN’s parlance) is started, It is then responsible for managing the overall application. It reads configs from the Coordinator Stream and computes work-assignments for individual Pods.
It also determines the hosts each Pod should run on taking data-locality into account. It proceeds to send Pod creation requests to API-server.
The Kubelet will watch the requests and start the task Pods. If the application’s dependencies are hosted in remote artifact repositories like HDFS. They need to be downloaded to the nodes first. How to download?

M1: the task Pod can leverage the Kubernetes Init-container functionality to download the dependencies.
M2: the regular container can download the dependencies first before executing the core logic.
M3: the other way is to pre-bake all the dependencies into the container image itself, but that is less flexible as it requires all the code, configs to be available in the image. Regardless of M1 or M2, this method can always be used.
M1 vs M2: The Init-containers is ensured to be run before regular containers. In M1, if the regular container fails, the Init-container will not be re-run. In M2, if the regular container fails, it needs to handle the case to not re-run the logic to download the resources.

When the task Pod is started, each Pod first queries the job-coordinator pod to determine its work-assignments and configs. It then proceeds to execute its assigned tasks.
The job-coordinator does the typical control-loop pattern, ensures the current state matching the desired state. e.g. It monitors how many task Pods are alive and creates new Pods to match the desired replicas if any fails

Host Affinity & Kubernetes

This document describes a mechanism to allow Samza to request containers from YARN on a specific machine. This locality-aware container assignment is particularly useful for containers to access their local state on the machine. This mechanism leverages a feature in YARN to be able to request container by hostname.

Similar primitive is provided in Kubernetes to allow users to request pods by hostname. This document describes the feature. Particularly, “preferredDuringSchedulingIgnoredDuringExecution” policy can be used to “run my pod on host X, if not satisfied, run it elsewhere.”

Alternatively, If a remote storage, instead of local, can be used for persisting Samza task state, the goal of container-state-rebinding can be achieved by dynamically attach the remote storage to the container even if the container is restarted on a different host, by leveraging the Kubernetes PersistentVolume primitive. This is usually useful in a cloud environment where remote storage is typically accessible.

Implementation

Prepare a base container image for Samza application including all the Samaza framework jars etc.
The run-app.sh and ApplicationRunner needs to be modified to support submitting apps to Kubernetes api-server.
Implement a KubeClusterResourceManager, similar to YARN AM, that creates task Pods in Kubernetes and re-create if any task fails
Prepare a container that can download the dependencies from remote artifact repositories.
Refactor Samza Core logic to support Samza on K8s and Samza on Yarn.

Job-Coordiantor Details

The Job-Coordinator is very similar to YARN AM. When it starts, it first reads the JobModel from coordinator stream and then create pods from Kubernetes with the container information provided. The Kubelet will then start the containers.

Figure 3. Workflow of Samza on Kubernetes

Note that there is a difference here between Kubernetes and YARN. In YARN, there are usually two stages: AM first requests the containers and then launches the container on NodeManager

In Kubernetes, requesting containers and starting containers are done in a single call. The createPod API will request the pods and then the pods will be started by kubelet automatically.

Due to this difference, the current samza on YARN implementations regarding storing the resource requests and then match the received containers with the stored pending requests are not needed.

Comparison with Samza on Yarn

Figure 4. Workflow of Samza on Yarn

By comparing Figure 3 and Figure 4, we can see the workflow of Samza on K8s is simpler than the workflow of Samza on Yarn. We need to refactor Samza Core logic to support both.

Reference

The java client for interacting with Kubernetes APIs: https://github.com/fabric8io/kubernetes-client

Space shortcuts

Child pages

Problem

Background

Proposed Changes

Host Affinity & Kubernetes

Implementation

Job-Coordiantor Details

Comparison with Samza on Yarn

Reference

Space shortcuts

Child pages

SEP-20: Samza on Kubernetes

Problem

Background

Proposed Changes

Host Affinity & Kubernetes

Implementation

Job-Coordiantor Details

Comparison with Samza on Yarn

Reference