In a self-hosted Kubernetes deployment (see this comment for background on self-hosted kubernetes), we face an initial bootstrap problem. When running self-hosted components, there needs to be a mechanism for pivoting from the initial bootstrap state to the kubernetes-managed (self-hosted) state. In the case of a self-hosted kubelet, this means pivoting from the initial kubelet defined and run on the host to the kubelet pod which has been scheduled to the node.
This proposal presents a solution to the kubelet bootstrap problem, and assumes a functioning control plane (e.g. an apiserver, controller-manager, scheduler, and etcd cluster) and a kubelet that can securely contact the API server. This functioning control plane can be temporary, and need not be the “production” control plane that will be used after the initial pivot / bootstrap.
In order to understand the goals of this proposal, one must understand what “self-hosted” means. This proposal defines “self-hosted” as a kubernetes cluster that is installed and managed by the kubernetes installation itself. This means that each kubernetes component is described by a kubernetes manifest (DaemonSet, Deployment, etc.) and can be updated via kubernetes.
The overall goal of this proposal is to make kubernetes easier to install and upgrade. We can then treat kubernetes itself just like any other application hosted in a kubernetes cluster, and have access to easy upgrades, monitoring, and durability for core kubernetes components themselves.
We intend to achieve this by using kubernetes to manage itself. However, in order to do that we must first “bootstrap” the cluster, by using kubernetes to install kubernetes components. This is where this proposal fits in, by describing the necessary modifications, and required procedures, needed to run a self-hosted kubelet.
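As a concrete illustration of what “described by a kubernetes manifest” means, the self-hosted kubelet itself might be expressed as a DaemonSet along these lines. This is a hypothetical sketch; the image name, lock-file path, and flags are placeholders, not part of this proposal:

```yaml
# Hypothetical sketch of a "self-hosted" kubelet managed by kubernetes itself.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kubelet
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: kubelet
  template:
    metadata:
      labels:
        app: kubelet
    spec:
      hostNetwork: true
      hostPID: true
      containers:
      - name: kubelet
        image: example.com/kubelet:v1   # placeholder image
        securityContext:
          privileged: true
        command:
        - /kubelet
        - --lock-file=/var/run/lock/kubelet.lock   # placeholder path
        volumeMounts:
        - name: run
          mountPath: /var/run
      volumes:
      - name: run
        hostPath:
          path: /var/run
```

Because the kubelet is now just a pod in a daemonset, upgrading it becomes an ordinary manifest update rather than a host-level operation.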
The approach being proposed for a self-hosted kubelet is a “pivot” style installation. This procedure assumes a short-lived “bootstrap” kubelet will run and start a long-running “self-hosted” kubelet. Once the self-hosted kubelet is running, the bootstrap kubelet will exit. As part of this, we propose introducing a `--bootstrap` flag to the kubelet. The behaviour of that flag is explained in detail below.
We propose adding a new flag to the kubelet, the `--bootstrap` flag, which is assumed to be used in conjunction with the `--lock-file` flag. The `--lock-file` flag is used to ensure only a single kubelet is running at any given time during this pivot process. When the `--bootstrap` flag is provided, after the kubelet acquires the file lock, it will begin asynchronously waiting on inotify events. Once an “open” event is received, the kubelet will assume another kubelet is attempting to take control and will exit.
Thus, the initial bootstrap becomes:

1. The “bootstrap” kubelet is started on the host with the `--bootstrap` and `--lock-file` flags, and acquires the file lock.
2. The control plane schedules the “self-hosted” kubelet pod to the node (e.g. via a daemonset), and the “bootstrap” kubelet runs it.
3. The “self-hosted” kubelet opens the lock file; the resulting “open” inotify event causes the “bootstrap” kubelet to exit.
4. The “self-hosted” kubelet acquires the file lock and takes over management of the node.
During an upgrade of the kubelet, for simplicity we will consider 3 kubelets, namely “bootstrap”, “v1”, and “v2”. We imagine the following scenario for upgrades:

1. The cluster administrator introduces the “v2” kubelet daemonset, replacing “v1”.
2. The “v2” kubelet pod is scheduled to the node and starts; opening the lock file triggers an “open” inotify event, causing the “v1” kubelet to exit.
3. The “bootstrap” and “v2” kubelets then race to acquire the file lock.
4. If “v2” wins, the upgrade has succeeded. If “bootstrap” wins, the “v2” kubelet will fail its health check, be restarted, and attempt to acquire the lock again.
Alternatively, it would also be possible via this mechanism to delete the “v1” daemonset first, allow the “bootstrap” kubelet to take over, and then introduce the “v2” kubelet daemonset, effectively eliminating the race between “bootstrap” and “v2” for lock acquisition, and the reliance on the failing health check procedure.
Eventually this could be handled by a DaemonSet upgrade policy.
This allows a “self-hosted” kubelet while introducing minimal new concepts into the core Kubernetes code base, and remains flexible enough to work well with future bootstrapping services.
Various similar approaches have been discussed here and here. Other discussion around running the kubelet inside a container is here. Note that this is not a strict requirement, as the kubelet could instead be run in a chroot jail via rkt fly or another similar approach.
Additionally, Taints and Tolerations, whose design has already been accepted, would make the overall kubelet bootstrap more deterministic. With this, we would also need the ability for a kubelet to register itself with a given taint when it first contacts the API server. A kubelet could then register itself with a taint such as “component=kubelet”, and a kubelet pod could exist that has a toleration for that taint, ensuring it is the only pod the “bootstrap” kubelet runs.
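Under that scheme, the kubelet daemonset's pod template would carry a toleration matching the taint the node registered with. A hypothetical fragment, using the current toleration syntax and the example “component=kubelet” taint from above:

```yaml
# Hypothetical pod template fragment: tolerate the taint the node was
# registered with, so this is the only pod the "bootstrap" kubelet runs.
spec:
  tolerations:
  - key: component
    operator: Equal
    value: kubelet
    effect: NoSchedule
```

All other pods would lack this toleration and therefore remain unschedulable on the node until the self-hosted kubelet has taken over and the taint is removed.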