runtime-client-server

Client/Server container runtime

Abstract

A proposal for a client/server implementation of the kubelet container runtime interface.

Motivation

Currently, any container runtime has to be linked into the kubelet. This makes experimentation difficult, and prevents users from landing an alternate container runtime without landing code in core Kubernetes.

To facilitate experimentation and to enable user choice, this proposal adds a client/server implementation of the new container runtime interface. The main goals of this proposal are:

  • make it easy to integrate new container runtimes
  • improve code maintainability

Proposed design

Design of client/server container runtime

The main idea of the client/server container runtime is to keep the main control logic in the kubelet while the remote runtime performs only dedicated actions. An alpha container runtime API is introduced for integrating new container runtimes. The API is based on protobuf and gRPC, which offer a number of benefits:

  • Faster than JSON
  • Client bindings for free: gRPC supports ten languages
  • No hand-written encoding/decoding code needed
  • Easy management of API interfaces: server and client interfaces are generated automatically

A new container runtime manager, KubeletGenericRuntimeManager, will be introduced to the kubelet. It will:

  • conform to kubelet’s Runtime interface
  • manage Pods and Containers lifecycle according to kubelet policies
  • call remote runtime’s API to perform specific pod, container or image operations

A simple workflow of invoking the remote runtime API when starting a Pod with two containers looks like this:

Kubelet                  KubeletGenericRuntimeManager       RemoteRuntime
   +                              +                               +
   |                              |                               |
   +---------SyncPod------------->+                               |
   |                              |                               |
   |                              +---- Create PodSandbox ------->+
   |                              +<------------------------------+
   |                              |                               |
   |                              XXXXXXXXXXXX                    |
   |                              |          X                    |
   |                              |    NetworkPlugin.             |
   |                              |       SetupPod                |
   |                              |          X                    |
   |                              XXXXXXXXXXXX                    |
   |                              |                               |
   |                              +----    Pull image1   -------->+
   |                              +<------------------------------+
   |                              +---- Create container1 ------->+
   |                              +<------------------------------+
   |                              +---- Start container1 -------->+
   |                              +<------------------------------+
   |                              |                               |
   |                              +----    Pull image2   -------->+
   |                              +<------------------------------+
   |                              +---- Create container2 ------->+
   |                              +<------------------------------+
   |                              +---- Start container2 -------->+
   |                              +<------------------------------+
   |                              |                               |
   | <-------Success--------------+                               |
   |                              |                               |
   +                              +                               +

Deleting a Pod looks like this:

Kubelet                  KubeletGenericRuntimeManager      RemoteRuntime
   +                              +                               +
   |                              |                               |
   +---------SyncPod------------->+                               |
   |                              |                               |
   |                              +----   Stop container1   ----->+
   |                              +<------------------------------+
   |                              +----  Delete container1  ----->+
   |                              +<------------------------------+
   |                              |                               |
   |                              +----   Stop container2   ----->+
   |                              +<------------------------------+
   |                              +----  Delete container2  ----->+
   |                              +<------------------------------+
   |                              |                               |
   |                              XXXXXXXXXXXX                    |
   |                              |          X                    |
   |                              |    NetworkPlugin.             |
   |                              |       TeardownPod             |
   |                              |          X                    |
   |                              XXXXXXXXXXXX                    |
   |                              |                               |
   |                              |                               |
   |                              +---- Delete PodSandbox  ------>+
   |                              +<------------------------------+
   |                              |                               |
   | <-------Success--------------+                               |
   |                              |                               |
   +                              +                               +

API definition

Since we are going to introduce more image formats and want to separate image management from containers and pods, this proposal introduces two services, RuntimeService and ImageService. Both services are defined in pkg/kubelet/api/v1alpha1/runtime/api.proto:

// Runtime service defines the public APIs for remote container runtimes
service RuntimeService {
    // Version returns the runtime name, runtime version and runtime API version
    rpc Version(VersionRequest) returns (VersionResponse) {}

    // CreatePodSandbox creates a pod-level sandbox.
    // The definition of PodSandbox is at https://github.com/kubernetes/kubernetes/pull/25899
    rpc CreatePodSandbox(CreatePodSandboxRequest) returns (CreatePodSandboxResponse) {}
    // StopPodSandbox stops the sandbox. If there are any running containers in the
    // sandbox, they should be force terminated.
    rpc StopPodSandbox(StopPodSandboxRequest) returns (StopPodSandboxResponse) {}
    // DeletePodSandbox deletes the sandbox. If there are any running containers in the
    // sandbox, they should be force deleted.
    rpc DeletePodSandbox(DeletePodSandboxRequest) returns (DeletePodSandboxResponse) {}
    // PodSandboxStatus returns the Status of the PodSandbox.
    rpc PodSandboxStatus(PodSandboxStatusRequest) returns (PodSandboxStatusResponse) {}
    // ListPodSandbox returns a list of PodSandboxes.
    rpc ListPodSandbox(ListPodSandboxRequest) returns (ListPodSandboxResponse) {}

    // CreateContainer creates a new container in specified PodSandbox
    rpc CreateContainer(CreateContainerRequest) returns (CreateContainerResponse) {}
    // StartContainer starts the container.
    rpc StartContainer(StartContainerRequest) returns (StartContainerResponse) {}
    // StopContainer stops a running container with a grace period (i.e., timeout).
    rpc StopContainer(StopContainerRequest) returns (StopContainerResponse) {}
    // RemoveContainer removes the container. If the container is running, the container
    // should be force removed.
    rpc RemoveContainer(RemoveContainerRequest) returns (RemoveContainerResponse) {}
    // ListContainers lists all containers by filters.
    rpc ListContainers(ListContainersRequest) returns (ListContainersResponse) {}
    // ContainerStatus returns status of the container.
    rpc ContainerStatus(ContainerStatusRequest) returns (ContainerStatusResponse) {}

    // Exec executes the command in the container.
    rpc Exec(stream ExecRequest) returns (stream ExecResponse) {}
}

// Image service defines the public APIs for managing images
service ImageService {
    // ListImages lists existing images.
    rpc ListImages(ListImagesRequest) returns (ListImagesResponse) {}
    // ImageStatus returns the status of the image.
    rpc ImageStatus(ImageStatusRequest) returns (ImageStatusResponse) {}
    // PullImage pulls an image with authentication config.
    rpc PullImage(PullImageRequest) returns (PullImageResponse) {}
    // RemoveImage removes the image.
    rpc RemoveImage(RemoveImageRequest) returns (RemoveImageResponse) {}
}
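
To illustrate why a separate ImageService is convenient for callers, here is a minimal Go sketch of a "pull only if missing" helper written against the four image operations. The `ImageService` interface, the `AuthConfig` type, and the in-memory `memImages` fake are hypothetical stand-ins; the real client would be generated from api.proto, and its request and response types would differ.

```go
package main

import (
	"fmt"
	"sort"
)

// AuthConfig is a hypothetical placeholder for the authentication config
// that PullImage accepts.
type AuthConfig struct{ Username, Password string }

// ImageService is a hand-written mirror of the four RPCs, with simplified
// signatures that are assumptions of this sketch.
type ImageService interface {
	ListImages() []string
	ImageStatus(name string) bool
	PullImage(name string, auth *AuthConfig) error
	RemoveImage(name string) error
}

// memImages is an in-memory fake used only to exercise the helper below.
type memImages struct{ images map[string]bool }

func newMemImages() *memImages { return &memImages{images: map[string]bool{}} }

func (m *memImages) ListImages() []string {
	out := make([]string, 0, len(m.images))
	for name := range m.images {
		out = append(out, name)
	}
	sort.Strings(out)
	return out
}

func (m *memImages) ImageStatus(name string) bool { return m.images[name] }

func (m *memImages) PullImage(name string, _ *AuthConfig) error {
	if name == "" {
		return fmt.Errorf("image name must not be empty")
	}
	m.images[name] = true
	return nil
}

func (m *memImages) RemoveImage(name string) error {
	delete(m.images, name)
	return nil
}

// ensureImage pulls the image only when ImageStatus reports it as absent.
// It returns true when a pull actually happened.
func ensureImage(svc ImageService, name string, auth *AuthConfig) (bool, error) {
	if svc.ImageStatus(name) {
		return false, nil
	}
	if err := svc.PullImage(name, auth); err != nil {
		return false, err
	}
	return true, nil
}

func main() {
	svc := newMemImages()
	first, _ := ensureImage(svc, "image1", nil)
	second, _ := ensureImage(svc, "image1", nil)
	fmt.Println(first, second, svc.ListImages())
}
```

Because the helper depends only on the image operations, it would work unchanged whether the image service is served by the same process as the runtime service or by a different one.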

Note that some types in pkg/kubelet/api/v1alpha1/runtime/api.proto are already defined in Container runtime interface/integration. We should decide how to integrate the types in #25899 with the gRPC services:

  • Auto-generate those types into protobuf by go2idl
    • Pros:
      • tracks type changes automatically: all type changes in Go are automatically generated into proto files
    • Cons:
      • type changes may break existing API implementations, e.g. new fields added automatically may not be noticed by the remote runtime
      • Go types need to be converted to gRPC-generated types, and vice versa
      • attribute order needs careful handling so as not to break the generated protobufs (this could be done by using protobuf tags)
      • go2idl doesn’t support gRPC, so protoc-gen-gogo is still required for generating the gRPC client
  • Embed those types as raw protobuf definitions and generate Go files by protoc-gen-gogo
    • Pros:
      • decouples type definitions: all type changes in Go are added to proto manually, so it’s easier to track gRPC API version changes
      • Kubelet could reuse Go types generated by protoc-gen-gogo to avoid type conversions
    • Cons:
      • duplicate definitions of the same types
      • hard to track type changes automatically
      • proto files need to be managed manually

For better version control and faster iteration, this proposal embeds all those types in api.proto directly.

Implementation

Each new runtime should implement the gRPC server defined by pkg/kubelet/api/v1alpha1/runtime/api.proto. For version control, KubeletGenericRuntimeManager will call RemoteRuntime’s Version() interface with the runtime API version. To keep backward compatibility, the API follows the standard protobuf guidelines for deprecating or adding interfaces.
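
A version handshake of this kind might look roughly like the Go sketch below. Everything here is an assumption made for illustration: the `VersionRequest` and `VersionResponse` shapes (the three response fields follow the comment on the Version RPC), the `versionFunc` stand-in for the gRPC call, the `kubeletAPIVersion` constant, and the strict equality check. The proposal does not specify how a mismatch is handled.

```go
package main

import "fmt"

// VersionRequest carries the API version the kubelet speaks (assumed shape).
type VersionRequest struct{ Version string }

// VersionResponse carries the runtime name, runtime version and runtime API
// version, as described for the Version RPC (assumed field names).
type VersionResponse struct {
	RuntimeName       string
	RuntimeVersion    string
	RuntimeAPIVersion string
}

// versionFunc stands in for the generated gRPC client call.
type versionFunc func(VersionRequest) (VersionResponse, error)

// kubeletAPIVersion is an assumed value taken from the api.proto path.
const kubeletAPIVersion = "v1alpha1"

// checkRuntimeVersion calls Version() and rejects a runtime whose API
// version differs from the kubelet's. Strict equality is a simplification.
func checkRuntimeVersion(call versionFunc) (VersionResponse, error) {
	resp, err := call(VersionRequest{Version: kubeletAPIVersion})
	if err != nil {
		return resp, fmt.Errorf("querying runtime version: %w", err)
	}
	if resp.RuntimeAPIVersion != kubeletAPIVersion {
		return resp, fmt.Errorf("runtime %s %s speaks API %q, kubelet speaks %q",
			resp.RuntimeName, resp.RuntimeVersion, resp.RuntimeAPIVersion, kubeletAPIVersion)
	}
	return resp, nil
}

func main() {
	// A fake runtime that echoes back the requested API version.
	fake := func(req VersionRequest) (VersionResponse, error) {
		return VersionResponse{
			RuntimeName:       "example-runtime",
			RuntimeVersion:    "0.1.0",
			RuntimeAPIVersion: req.Version,
		}, nil
	}
	resp, err := checkRuntimeVersion(fake)
	fmt.Println(resp.RuntimeName, err)
}
```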

A new flag, --container-runtime-endpoint (which overrides --container-runtime), will be introduced to the kubelet; it identifies the unix socket file of the remote runtime service. A second new flag, --image-service-endpoint, will identify the unix socket file of the image service.
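
As a rough illustration of how the two flags could be parsed, the sketch below uses Go's standard `flag` package. The `runtimeOptions` struct, the `parseRuntimeFlags` helper, the "docker" default, and the socket paths are all hypothetical; only the flag names and the override rule come from this proposal.

```go
package main

import (
	"flag"
	"fmt"
)

// runtimeOptions holds the runtime-related kubelet flags (hypothetical type).
type runtimeOptions struct {
	ContainerRuntime string // existing --container-runtime flag
	RuntimeEndpoint  string // new --container-runtime-endpoint flag
	ImageEndpoint    string // new --image-service-endpoint flag
}

// parseRuntimeFlags parses only the flags discussed here. The "docker"
// default is an assumption made for this sketch.
func parseRuntimeFlags(args []string) (runtimeOptions, error) {
	var o runtimeOptions
	fs := flag.NewFlagSet("kubelet", flag.ContinueOnError)
	fs.StringVar(&o.ContainerRuntime, "container-runtime", "docker",
		"built-in container runtime to use")
	fs.StringVar(&o.RuntimeEndpoint, "container-runtime-endpoint", "",
		"unix socket file of the remote runtime service")
	fs.StringVar(&o.ImageEndpoint, "image-service-endpoint", "",
		"unix socket file of the image service")
	err := fs.Parse(args)
	return o, err
}

// usesRemoteRuntime reports whether the endpoint flag overrides
// --container-runtime, as described in the text.
func (o runtimeOptions) usesRemoteRuntime() bool { return o.RuntimeEndpoint != "" }

func main() {
	// The socket paths below are made-up examples.
	o, err := parseRuntimeFlags([]string{
		"--container-runtime-endpoint=/var/run/example-runtime.sock",
		"--image-service-endpoint=/var/run/example-image.sock",
	})
	if err != nil {
		fmt.Println("flag error:", err)
		return
	}
	fmt.Println(o.usesRemoteRuntime(), o.RuntimeEndpoint, o.ImageEndpoint)
}
```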

To facilitate switching the current container runtimes (e.g. docker or rkt) to the new runtime API, KubeletGenericRuntimeManager will provide a plugin mechanism that allows either a local implementation or a gRPC implementation to be specified.
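
One possible shape for such a plugin mechanism is a small registry of constructors behind a common interface, sketched below in Go. The `RuntimeService` interface (reduced to a single method), the `localRuntime` and `grpcRuntime` types, and the `registry` are hypothetical; the proposal does not define how plugins are registered or selected.

```go
package main

import (
	"fmt"
	"sort"
)

// RuntimeService is a one-method stand-in for the full runtime API.
type RuntimeService interface {
	Describe() string
}

// localRuntime represents an in-process implementation (e.g. docker or rkt).
type localRuntime struct{ name string }

func (l localRuntime) Describe() string { return "local " + l.name }

// grpcRuntime represents a client for a remote server; the actual gRPC
// connection is omitted from this sketch.
type grpcRuntime struct{ endpoint string }

func (g grpcRuntime) Describe() string { return "gRPC at " + g.endpoint }

// factory builds a RuntimeService from a plugin-specific argument.
type factory func(arg string) RuntimeService

// registry maps a plugin kind to its factory.
type registry map[string]factory

// names returns the registered plugin kinds in sorted order.
func (r registry) names() []string {
	out := make([]string, 0, len(r))
	for k := range r {
		out = append(out, k)
	}
	sort.Strings(out)
	return out
}

// build looks up the plugin kind and constructs the implementation.
func (r registry) build(kind, arg string) (RuntimeService, error) {
	f, ok := r[kind]
	if !ok {
		return nil, fmt.Errorf("unknown runtime plugin %q (have %v)", kind, r.names())
	}
	return f(arg), nil
}

// defaultRegistry registers the two kinds mentioned in the text.
func defaultRegistry() registry {
	return registry{
		"local": func(name string) RuntimeService { return localRuntime{name: name} },
		"grpc":  func(endpoint string) RuntimeService { return grpcRuntime{endpoint: endpoint} },
	}
}

func main() {
	reg := defaultRegistry()
	for _, c := range [][2]string{{"local", "docker"}, {"grpc", "/var/run/example-runtime.sock"}} {
		rt, err := reg.build(c[0], c[1])
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(rt.Describe())
	}
}
```

Since both implementations satisfy the same interface, the manager's control logic would not need to know which one it is talking to.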

Community Discussion

This proposal was first filed by @brendandburns at kubernetes/13768: