pod-security-context

Abstract

A proposal for refactoring SecurityContext to have pod-level and container-level attributes in order to correctly model pod- and container-level security concerns.

Motivation

Currently, containers have a SecurityContext attribute which contains information about the security settings the container uses. In practice, many of these attributes are uniform across all containers in a pod. Simultaneously, there is also a need to apply the security context pattern at the pod level to correctly model security attributes that apply only at a pod level.

Users should be able to:

  1. Express security settings that are applicable to the entire pod
  2. Express base security settings that apply to all containers
  3. Override only the settings that need to be differentiated from the base in individual containers

This proposal is a dependency for other changes related to security context:

  1. Volume ownership management in the Kubelet
  2. Generic SELinux label management in the Kubelet

Goals of this design:

  1. Describe the use cases for which a pod-level security context is necessary
  2. Thoroughly describe the API backward compatibility issues that arise from the introduction of a pod-level security context
  3. Describe all implementation changes necessary for the feature

Constraints and assumptions

  1. We will not design for intra-pod security; we are not currently concerned about isolating containers in the same pod from one another
  2. We will design for backward compatibility with the current V1 API

Use Cases

  1. As a developer, I want to correctly model security attributes which belong to an entire pod
  2. As a user, I want to be able to specify container attributes that apply to all containers without repeating myself
  3. As an existing user, I want to be able to use the existing container-level security API

Use Case: Pod level security attributes

Some security attributes make sense only to model at the pod level. For example, it is a fundamental property of pods that all containers in a pod share the same network namespace. Therefore, using the host namespace makes sense to model at the pod level only, and indeed, today it is part of the PodSpec. Other host namespace support is currently being added and these will also be pod-level settings; it makes sense to model them as a pod-level collection of security attributes.

Use Case: Override pod security context for container

Some use cases require the containers in a pod to run with different security settings. As an example, a user may want to have a pod with two containers, one of which runs as root with the privileged setting, and one that runs as a non-root UID. To support use cases like this, it should be possible to override appropriate (i.e., not intrinsically pod-level) security settings for individual containers.

Proposed Design

SecurityContext

For posterity and ease of reading, note the current state of SecurityContext:

package api

type Container struct {
    // Other fields omitted

    // Optional: SecurityContext defines the security options the pod should be run with
    SecurityContext *SecurityContext `json:"securityContext,omitempty"`
}

type SecurityContext struct {
    // Capabilities are the capabilities to add/drop when running the container
    Capabilities *Capabilities `json:"capabilities,omitempty"`

    // Run the container in privileged mode
    Privileged *bool `json:"privileged,omitempty"`

    // SELinuxOptions are the labels to be applied to the container
    // and volumes
    SELinuxOptions *SELinuxOptions `json:"seLinuxOptions,omitempty"`

    // RunAsUser is the UID to run the entrypoint of the container process.
    RunAsUser *int64 `json:"runAsUser,omitempty"`

    // RunAsNonRoot indicates that the container should be run as a non-root user.  If the RunAsUser
    // field is not explicitly set then the kubelet may check the image for a specified user or
    // perform defaulting to specify a user.
    RunAsNonRoot bool `json:"runAsNonRoot,omitempty"`
}

// SELinuxOptions contains the fields that make up the SELinux context of a container.
type SELinuxOptions struct {
    // SELinux user label
    User string `json:"user,omitempty"`

    // SELinux role label
    Role string `json:"role,omitempty"`

    // SELinux type label
    Type string `json:"type,omitempty"`

    // SELinux level label.
    Level string `json:"level,omitempty"`
}

PodSecurityContext

PodSecurityContext specifies two types of security attributes:

  1. Attributes that apply to the pod itself
  2. Attributes that apply to the containers of the pod

In the internal API, fields of the PodSpec controlling the use of the host PID, IPC, and network namespaces are relocated to this type:

package api

type PodSpec struct {
    // Other fields omitted

    // Optional: SecurityContext specifies pod-level attributes and container security attributes
    // that apply to all containers.
    SecurityContext *PodSecurityContext `json:"securityContext,omitempty"`
}

// PodSecurityContext specifies security attributes of the pod and container attributes that apply
// to all containers of the pod.
type PodSecurityContext struct {
    // Use the host's network namespace. If this option is set, the ports that will be
    // used must be specified.
    // Optional: Default to false.
    HostNetwork bool
    // Use the host's IPC namespace
    HostIPC bool

    // Use the host's PID namespace
    HostPID bool

    // Capabilities are the capabilities to add/drop when running containers
    Capabilities *Capabilities `json:"capabilities,omitempty"`

    // Run the container in privileged mode
    Privileged *bool `json:"privileged,omitempty"`

    // SELinuxOptions are the labels to be applied to the container
    // and volumes
    SELinuxOptions *SELinuxOptions `json:"seLinuxOptions,omitempty"`

    // RunAsUser is the UID to run the entrypoint of the container process.
    RunAsUser *int64 `json:"runAsUser,omitempty"`

    // RunAsNonRoot indicates that the container should be run as a non-root user.  If the RunAsUser
    // field is not explicitly set then the kubelet may check the image for a specified user or
    // perform defaulting to specify a user.
    RunAsNonRoot bool
}

// Comments and generated docs will change for the container.SecurityContext field to indicate
// the precedence of these fields over the pod-level ones.

type Container struct {
    // Other fields omitted

    // Optional: SecurityContext defines the security options the pod should be run with.
    // Settings specified in this field take precedence over the settings defined in
    // pod.Spec.SecurityContext.
    SecurityContext *SecurityContext `json:"securityContext,omitempty"`
}

In the V1 API, the pod-level security attributes which are currently fields of the PodSpec are retained on the PodSpec for backward compatibility purposes:

package v1

type PodSpec struct {
    // Other fields omitted

    // Use the host's network namespace. If this option is set, the ports that will be
    // used must be specified.
    // Optional: Default to false.
    HostNetwork bool `json:"hostNetwork,omitempty"`
    // Use the host's pid namespace.
    // Optional: Default to false.
    HostPID bool `json:"hostPID,omitempty"`
    // Use the host's ipc namespace.
    // Optional: Default to false.
    HostIPC bool `json:"hostIPC,omitempty"`

    // Optional: SecurityContext specifies pod-level attributes and container security attributes
    // that apply to all containers.
    SecurityContext *PodSecurityContext `json:"securityContext,omitempty"`
}

The pod.Spec.SecurityContext specifies the security context of all containers in the pod. The containers’ securityContext field is overlaid on the base security context to determine the effective security context for the container.

The new V1 API should be backward compatible with the existing API. Backward compatibility is defined as:

  1. Any API call (e.g. a structure POSTed to a REST endpoint) that worked before your change must work the same after your change.
  2. Any API call that uses your change must not cause problems (e.g. crash or degrade behavior) when issued against servers that do not include your change.
  3. It must be possible to round-trip your change (convert to different API versions and back) with no loss of information.

Previous versions of this proposal attempted to deal with backward compatibility by defining the affect of setting the pod-level fields on the container-level fields. While trying to find consensus on this design, it became apparent that this approach was going to be extremely complex to implement, explain, and support. Instead, we will approach backward compatibility as follows:

  1. Pod-level and container-level settings will not affect one another
  2. Old clients will be able to use container-level settings in the exact same way
  3. Container level settings always override pod-level settings if they are set

Examples

  1. Old client using pod.Spec.Containers[x].SecurityContext

    An old client creates a pod:

    ```yaml apiVersion: v1 kind: Pod metadata: name: test-pod spec: containers:

    • name: a securityContext: runAsUser: 1001
    • name: b securityContext: runAsUser: 1002 ```

    looks to old clients like:

    apiVersion: v1
    kind: Pod
    metadata:
      name: test-pod
    spec:
      containers:
      - name: a
        securityContext:
          runAsUser: 1001
      - name: b
        securityContext:
          runAsUser: 1002

    looks to new clients like:

    apiVersion: v1
    kind: Pod
    metadata:
      name: test-pod
    spec:
      containers:
      - name: a
        securityContext:
          runAsUser: 1001
      - name: b
        securityContext:
          runAsUser: 1002
  2. New client using pod.Spec.SecurityContext

    A new client creates a pod using a field of pod.Spec.SecurityContext:

    ```yaml apiVersion: v1 kind: Pod metadata: name: test-pod spec: securityContext: runAsUser: 1001 containers:

    • name: a
    • name: b ```

    appears to new clients as:

    apiVersion: v1
    kind: Pod
    metadata:
      name: test-pod
    spec:
      securityContext:
        runAsUser: 1001
      containers:
      - name: a
      - name: b

    old clients will see:

    apiVersion: v1
    kind: Pod
    metadata:
      name: test-pod
    spec:
      containers:
      - name: a
      - name: b
  3. Pods created using pod.Spec.SecurityContext and pod.Spec.Containers[x].SecurityContext

    If a field is set in both pod.Spec.SecurityContext and pod.Spec.Containers[x].SecurityContext, the value in pod.Spec.Containers[x].SecurityContext wins. In the following pod:

    ```yaml apiVersion: v1 kind: Pod metadata: name: test-pod spec: securityContext: runAsUser: 1001 containers:

    • name: a securityContext: runAsUser: 1002
    • name: b ```

    The effective setting for runAsUser for container A is 1002.

Testing

A backward compatibility test suite will be established for the v1 API. The test suite will verify compatibility by converting objects into the internal API and back to the version API and examining the results.

All of the examples here will be used as test-cases. As more test cases are added, the proposal will be updated.

An example of a test like this can be found in the OpenShift API package

E2E test cases will be added to test the correct determination of the security context for containers.

Kubelet changes

  1. The Kubelet will use the new fields on the PodSecurityContext for host namespace control
  2. The Kubelet will be modified to correctly implement the backward compatibility and effective security context determination defined here