A Host volume source is probably the simplest volume type to define, needing only a single path. However, that simplicity comes with many assumptions and caveats.
This proposal describes one of the issues associated with Host volumes — their silent and implicit creation of directories on the host — and proposes a solution.
Right now, under Docker, when a bindmount references a hostPath that does not already exist, the path is created as an empty directory owned by root. This is rarely what the user actually wants, because hostPath volumes are typically used to express a dependency on an existing external file or directory. This concern was raised during the initial implementation of this behavior in Docker, and it was suggested that orchestration systems could manage volume creation better than Docker; Docker nevertheless creates the directory itself.
To fix this problem, I propose allowing a pod to specify whether a given hostPath should exist prior to the pod running, whether it should be created, and what it should exist as. I also propose the inclusion of a default value which matches the current behavior to ensure backwards compatibility.
To understand exactly when this behavior will or won’t be correct, it’s important to look at the use-cases of Host Volumes. The table below broadly classifies the use-cases of Host Volumes and states whether each would benefit from this change.

| Use-case | Description | Examples | Benefits from this change? | Why? |
|---|---|---|---|---|
| Accessing an external system, data, or configuration | Data or a unix socket is created by a process on the host, and a pod within kubernetes consumes it | fluentd-es-addon, addon-manager, kube-proxy, etc | :white_check_mark: | Fails faster and with more useful messages, and won’t run when basic assumptions are false (e.g. that docker is the runtime and the docker.sock exists) |
| Providing data to external systems | Some pods wish to publish data to the host for other systems to consume, sometimes to a generic directory and sometimes to more component-specific ones | Kubelet core components which bindmount their logs out to | :white_check_mark: | Sometimes, but not always. Whether a missing directory is a problem depends on the directory. |
| Communicating between instances and versions of yourself | A pod can use a hostPath directory as a sort of cache and, as opposed to an emptyDir, persist the directory between versions of itself | etcd, caches | :x: | It’s pretty much always okay to create them |

One additional motivating factor for this change is that, under the rkt runtime, paths are not created when they do not exist. This change moves the management of these volumes into the Kubelet, which benefits the rkt container runtime.
I propose that the
object be changed to include the following additional field:
`Type` - An optional string of `exists|file|device|socket|directory`. If not set, it will default to the backwards-compatible behavior described below.

| Value | Behavior |
|---|---|
| unset | If nothing exists at the given path, an empty directory will be created there. Otherwise, behaves like `exists` |
| `exists` | If nothing exists at the given path, the pod will fail to run and provide an informative error message |
| `file` | If a file does not exist at the given path, the pod will fail to run and provide an informative error message |
| `device` | If a block or character device does not exist at the given path, the pod will fail to run and provide an informative error message |
| `socket` | If a socket does not exist at the given path, the pod will fail to run and provide an informative error message |
| `directory` | If a directory does not exist at the given path, the pod will fail to run and provide an informative error message |

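
To make the shape of the proposed field concrete, the following is a minimal Go sketch of how it might be declared. The names `HostPathType`, `HostPathSource`, the constant names, and the `IsValid` helper are all hypothetical and chosen for illustration only; the proposal itself fixes only the string values and the fact that the field is optional.

```go
package main

import "fmt"

// HostPathType is a hypothetical name for the value of the proposed Type field.
type HostPathType string

// The string values come from the proposal; the constant names are assumptions.
const (
	HostPathUnset     HostPathType = "" // backwards-compatible default
	HostPathExists    HostPathType = "exists"
	HostPathFile      HostPathType = "file"
	HostPathDevice    HostPathType = "device"
	HostPathSocket    HostPathType = "socket"
	HostPathDirectory HostPathType = "directory"
)

// HostPathSource is a hypothetical stand-in for the host volume source object.
type HostPathSource struct {
	Path string
	Type HostPathType // optional; the zero value means "unset"
}

// IsValid reports whether t is one of the values listed in the proposal.
func (t HostPathType) IsValid() bool {
	switch t {
	case HostPathUnset, HostPathExists, HostPathFile,
		HostPathDevice, HostPathSocket, HostPathDirectory:
		return true
	}
	return false
}

func main() {
	src := HostPathSource{Path: "/var/run/docker.sock", Type: HostPathSocket}
	fmt.Println(src.Path, src.Type.IsValid())
	fmt.Println(HostPathType("bogus").IsValid())
}
```

Using the empty string for the unset case means existing objects that omit the field would keep today's behavior without any migration, which is one way to get the backwards compatibility the proposal asks for.
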
Additional possible values, which are proposed to be excluded:

| Value | Behavior | Reason for exclusion |
|---|---|---|
| | Like `file`, but if nothing exists an empty file is created instead | In general, bindmounting the parent directory of the file you intend to create addresses this use-case |
| | If a path does not exist, then do not create any container-mount at all | This would better be handled by a new field entirely if this behavior is desirable |

This feature does not make sense for any of the other volume types, simply because all of the other types are already fully qualified. For example, an NFS volume must already exist, or else it will not mount. Similarly, EmptyDir volumes will always exist as a directory.
Only the HostVolume and SubPath means of referencing a path have the potential to reference arbitrary incorrect or nonexistent things without erroring out.
One alternative is to augment Host Volumes with a `MustExist` bool and provide no further granularity. This would allow toggling between the unset and `exists` behaviors described above. This would likely cover the “90%” use-case and would be a simpler API. In my opinion, it would be sufficient for all of the examples linked above.
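
As a rough illustration of how this boolean alternative relates to the string-typed field, the sketch below maps a `MustExist` value onto the two behaviors it could select. The function name `typeFromMustExist` and the type and constant names are hypothetical, and the mapping of `false` to the unset (create-if-missing) default is an assumption based on the behaviors described earlier.

```go
package main

import "fmt"

// HostPathType and its constants are hypothetical names for illustration.
type HostPathType string

const (
	HostPathUnset  HostPathType = ""       // create an empty directory if nothing exists
	HostPathExists HostPathType = "exists" // fail if nothing exists
)

// typeFromMustExist shows that a single bool can only select between
// two of the behaviors; file, device, socket and directory are not expressible.
func typeFromMustExist(mustExist bool) HostPathType {
	if mustExist {
		return HostPathExists
	}
	return HostPathUnset
}

func main() {
	fmt.Printf("%q %q\n", typeFromMustExist(true), typeFromMustExist(false))
}
```
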
It’s proposed that, prior to starting a pod, the Kubelet validates that the given path meets the qualifications of its type. Namely, if the type is unset, the Kubelet will create an empty directory if none exists there, and for each of the other types the Kubelet will perform the given validation prior to running the pod. This validation might be done by a volume plugin, but further technical consideration (out of scope of this proposal) is needed.
This proposal does not attempt to change the state of volume permissions. Currently, a HostPath volume is created with `root` ownership and `755` permissions. This behavior will be retained. An argument for this behavior is given here.
This proposal should not impact SELinux relabeling. Verifying the presence and type of a given path will be logically separate from SELinux labeling. Similarly, creating the directory when it doesn’t exist will happen before any SELinux operations and should not impact it.
A containerized kubelet would have difficulty creating directories. The
implementation will likely respect the
containerized flag, or similar,
allowing it to either break out or be “/rootfs/” aware and thus operate as
Ideally the validation would be done at the time the bindmounts are created; otherwise, a given path or directory can change between the time it is validated and the time the container runtime attempts to create the mount.
The only way to solve this problem is to integrate these sorts of qualifications into the container runtimes themselves.
I don’t think this problem is severe enough that we need to push to solve it; rather I think we can simply accept this minor race, and if runtimes eventually allow this we can begin to leverage them.