There are several methods to deploy a kubernetes cluster, one method that offers some unique advantages is self hosting. The purpose of this proposal is to outline a method to checkpoint specific annotated pods, namely the control plane components, for the purpose of enabling self hosting.
The details of self hosting are beyond the scope of this proposal, and are outlined in the references listed below:
Extra details on this proposal, and its history, can be found in the links below:
The scope of this proposal is bounded, but has the potential for broader reuse in the future. The reader should be mindful of the explicitly stated Non-Goals that are listed below.
The enablement of this feature is gated by a single command line flag that
is passed to the kubelet on startup,
and will be denoted that it is
node.kubernetes.io/bootstrap-checkpoint=trueis added to that Pod, which indicates that it should be checkpointed by the kubelet. When the kubelet receives a notification from the apiserver that a new pod is to run, it will inspect the
--bootstrap-checkpoint-pathflag to determine if checkpointing is enabled. Finally, the kubelet will perform an atomic write of a
Pod_UID.yamlfile when the afore mentioned annotation exists. The scope of this annotation is bounded and will not be promoted to a field.
--bootstrap-checkpoint-path. If the value is specified, it will read in the contents of the that directory and startup the appropriate Pod. Lastly, the kubelet will then pull the list of pods from the api-server and rectify what is supposed to be running according to what is bound, and will go through its normal startup procedure.
Due to its opt-in behavior, administrators will need to take the same precautions necessary in segregating master nodes, when enabling the bootstrap annotation.
Please see WIP Implementation for more details.
Graduating this feature is a responsibility of sig-cluster-lifecycle and sig-node to determine over the course of the 1.10 and 1.11 releases. History has taught us that initial implementations often have a tendency overlook use cases and require refinement. It is the goal of this proposal to have an initial alpha implementation of bootstrap checkpoining in the 1.9 cycle, and further refinement will occur after we have validated it across several deployments.
Testing of this feature will occur in three parts. - Unit testing of standard code behavior - Simple node-e2e test to ensure restart recovery - (TODO) E2E test w/kubeadm self hosted master restart recovery of an apiserver.