We want to remove all cloud-provider-specific logic from the kubernetes/kubernetes repo and restructure the code to make it easy for any cloud provider to extend the Kubernetes core in a consistent manner for their cloud. New cloud providers should look at the Creating a Custom Cluster from Scratch guide and at the cloud provider interface, which they will need to implement.
We are trying to remove all dependencies from the Kubernetes core on any specific cloud provider. Currently we have seven such dependencies. To prevent this number from growing, we have locked the core against the addition of any new cloud provider dependencies, which means all new cloud providers have to implement all of their pieces outside of the core. However, everyone still ends up consuming the current set of seven in-repo dependencies. For the seven in-repo cloud providers, any change to their provider-specific code requires OSS PR approvals and a deployment to get those changes into an official build. The relevant dependencies require changes in the following areas.
For the cloud providers who are in-repo, moving out would allow them to iterate on their solutions more quickly and decouple cloud provider fixes from open source releases. Moving the cloud provider code out also means that Kubernetes processes do not need to load or run code that is unnecessary for the environment they are in. We would like to abstract a core controller manager library to help standardize the behavior of the cloud controller managers produced by each cloud provider. We would also like to minimize the number and scope of controllers running in the cloud controller manager, so as to minimize the surface area for per-cloud-provider deviation.
Have a cloud controller manager in the Kubernetes main repo which hosts all of the controller loops for the in-repo cloud providers. Do not run any cloud provider logic in the kube-controller-manager, the kube-apiserver, or the kubelet. At intermediary points we may move only some of the cloud-specific controllers out (e.g., volumes may move later than the rest).
Forcing cloud providers to use the generic cloud manager.
For the controller manager, we would like to create a set of common code which can be used by both the cloud controller manager and the kube controller manager. The cloud controller manager would then be responsible for running controllers whose function is specific to cloud provider functionality, while the kube controller manager would be responsible for running all controllers whose function is not related to a cloud provider.
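As a hedged illustration of this split (the package and type names below are ours, not the actual upstream API), the common code could expose named controller initializers that both binaries consume with their own controller sets:

```go
package sharedcm // hypothetical name for the common controller-manager library

import "fmt"

// ControllerContext is a stand-in for the shared dependencies (clients,
// informers, configuration) that the real code base threads through.
type ControllerContext struct {
	Stop <-chan struct{}
}

// InitFunc starts a single controller loop.
type InitFunc func(ctx ControllerContext) error

// RunControllers starts every controller in the map. Both the
// kube-controller-manager and a cloud-controller-manager could reuse this,
// each passing in its own set of initializers.
func RunControllers(ctx ControllerContext, controllers map[string]InitFunc) error {
	for name, start := range controllers {
		if err := start(ctx); err != nil {
			return fmt.Errorf("failed to start controller %q: %v", name, err)
		}
	}
	return nil
}
```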
In order to create a 100% cloud-independent controller manager, the controller-manager will be split into multiple binaries:
- Cloud-dependent controller-manager binaries
- The cloud-independent kube-controller-manager that is shipped with Kubernetes releases
The cloud-dependent binaries will run those loops that rely on cloudprovider in separate process(es) within the Kubernetes control plane, while the rest of the controllers run in the cloud-independent controller manager. The decision to move entire controller loops, rather than only the very small parts that rely on the cloud provider, was made because it keeps the implementation simple: otherwise, the shared data structures and utility functions would have to be disentangled and carefully separated to avoid concurrency issues. This approach, among other things, prevents code duplication and improves development velocity.
Note that the controller loop implementations will continue to reside in the core repository; each takes cloudprovider.Interface as an input to its constructor. A vendor-maintained cloud-controller-manager binary can link these controllers in, as they serve as the reference form of the controller implementation.
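For example, a controller constructor along the following lines lets any vendor binary inject its own provider implementation. This is a sketch: the real constructors take additional arguments (clients, informers, configuration), and the cloudprovider import path has moved over time.

```go
package sketch

import (
	cloudprovider "k8s.io/cloud-provider" // formerly k8s.io/kubernetes/pkg/cloudprovider
)

// RouteSyncer is a hypothetical controller that only needs the Routes()
// portion of the cloud provider interface.
type RouteSyncer struct {
	routes cloudprovider.Routes
}

// NewRouteSyncer takes cloudprovider.Interface in its constructor, so a
// vendor-maintained cloud-controller-manager can link the same controller
// code against its own provider implementation.
func NewRouteSyncer(cloud cloudprovider.Interface) (*RouteSyncer, bool) {
	routes, ok := cloud.Routes()
	if !ok {
		return nil, false // this provider does not implement routes
	}
	return &RouteSyncer{routes: routes}, true
}
```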
There are four controllers that rely on cloud-provider-specific code: the node controller, service controller, route controller, and attach/detach controller. Copies of each of these controllers have been bundled together into one binary. The cloud-dependent binary registers itself as a controller and runs the cloud-specific controller loops with the user agent “external-controller-manager”.
RouteController and ServiceController are entirely cloud-specific, so it is straightforward to move these two controller loops out of the cloud-independent binary and into the cloud-dependent binary.
NodeController does a lot more than just talk to the cloud. While monitoring node status, if the status reported by the kubelet is either ‘ConditionUnknown’ or ‘ConditionFalse’, the controller checks whether the node has been deleted from the cloud provider. If it has already been deleted from the cloud provider, the controller deletes the node object immediately, without waiting for the monitorGracePeriod amount of time. This is the only operation that needs to be moved into the cloud-dependent controller manager.
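A minimal sketch of that check, assuming a provider that implements the Instances portion of the interface (the helper name is ours):

```go
package sketch

import (
	"context"

	cloudprovider "k8s.io/cloud-provider"
)

// nodeExistsInCloud reports whether the cloud still knows about the instance
// backing a node. When it returns false, the controller can delete the Node
// object immediately instead of waiting out monitorGracePeriod.
func nodeExistsInCloud(ctx context.Context, cloud cloudprovider.Interface, providerID string) (bool, error) {
	instances, ok := cloud.Instances()
	if !ok {
		// No instances support: err on the side of keeping the node.
		return true, nil
	}
	return instances.InstanceExistsByProviderID(ctx, providerID)
}
```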
Finally, the attachDetachController is tricky: it is not simple to disentangle from the controller-manager, so it will be addressed with Flex Volumes (discussed in a separate section below).
The kube-controller-manager has many controller loops; see NewControllerInitializers for the full list. Among these controller loops, the following are cloud provider dependent.
The nodeController uses the cloudprovider to check if a node has been deleted from the cloud. If the cloud provider reports a node as deleted, this controller immediately deletes the node from Kubernetes. This check removes the need to wait a fixed amount of time before concluding that an inactive node is actually dead.
The volumeController uses the cloudprovider to create, delete, attach, and detach volumes to nodes. For instance, the logic for provisioning, attaching, and detaching an EBS volume resides in the AWS cloudprovider. The volumeController uses this code to perform its operations.
The routeController configures routes for hosts in the cloud provider.
The serviceController maintains a list of currently active nodes, and is responsible for creating and deleting LoadBalancers in the underlying cloud.
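For reference, those dependencies map onto the cloud provider interface roughly as follows. This is an abridged sketch, not the full upstream definition; see the real interface in the code base for the complete method set and signatures:

```go
package sketch

// Interface is an abridged sketch of the cloud provider interface the
// controllers above consume.
type Interface interface {
	// LoadBalancer is used by the service controller; the bool reports
	// whether the provider supports load balancers at all.
	LoadBalancer() (LoadBalancer, bool)
	// Instances is used by the node controller to look up node state.
	Instances() (Instances, bool)
	// Routes is used by the route controller to configure host routes.
	Routes() (Routes, bool)
	// ProviderName returns the cloud provider ID.
	ProviderName() string
}

// LoadBalancer, Instances, and Routes are stand-ins here; each is itself an
// interface with provider-specific create/read/delete operations.
type (
	LoadBalancer interface{}
	Instances    interface{}
	Routes       interface{}
)
```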
Moving on to the kubelet: the majority of the kubelet's calls to the cloud are made during initialization of the Node object. The other uses are configuring routes (in the case of GCE), scrubbing DNS, and periodically polling for IP addresses.
All of the above steps except Node initialization can be moved into a controller. Specifically, IP address polling and the configuration of routes can be moved into the cloud-dependent controller manager.
Scrubbing DNS was found to be redundant, so it can be disregarded; it is being removed.
Finally, Node initialization needs to be addressed. This is the trickiest part: pods will be scheduled even on uninitialized nodes, which can lead to pods being scheduled in incompatible zones and other odd errors. Therefore, an approach is needed in which the kubelet can create a Node but mark it as “NotReady”, and some asynchronous process can later update it and mark it ready. This is now possible thanks to the concept of taints.
This approach requires the kubelet to be started with a known set of taints, which makes the node unschedulable until those taints are removed. The external cloud controller manager will asynchronously update the node object and remove the taints.
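A sketch of the taint involved is below; the key shown is the one adopted upstream for external cloud providers, but treat the exact key and effect as assumptions for other setups.

```go
package sketch

import (
	v1 "k8s.io/api/core/v1"
)

// uninitializedTaint is what the kubelet adds at registration when it is
// started in external cloud provider mode; the cloud controller manager
// removes it once the node has been initialized.
func uninitializedTaint() v1.Taint {
	return v1.Taint{
		Key:    "node.cloudprovider.kubernetes.io/uninitialized",
		Value:  "true",
		Effect: v1.TaintEffectNoSchedule, // keep pods off the node until initialization
	}
}
```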
Finally, the kube-apiserver uses the cloud provider for two purposes: transferring SSH keys to all of the nodes, and, within an admission controller, setting labels on persistent volumes.
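As a rough sketch of the admission use case (the label key reflects the zone labels of that era, and the helper is illustrative rather than the actual plugin code):

```go
package sketch

import (
	v1 "k8s.io/api/core/v1"
)

// zoneLabel is the failure-domain zone label used at the time; treat the
// exact key as an assumption.
const zoneLabel = "failure-domain.beta.kubernetes.io/zone"

// labelVolume shows the shape of what the persistent volume label admission
// controller does: ask the cloud which zone backs the volume, then label the
// PV so the scheduler can keep pods and their volumes in the same zone.
func labelVolume(pv *v1.PersistentVolume, zoneFromCloud string) {
	if pv.Labels == nil {
		pv.Labels = map[string]string{}
	}
	pv.Labels[zoneLabel] = zoneFromCloud
}
```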
Volumes need cloud providers, but they only need their specific cloud providers. The majority of volume management logic resides in the controller manager, and these controller loops need to be moved into the cloud controller manager. The cloud controller manager also needs a mechanism to read initialization parameters from cloud config; this can be done via config maps, as in the sketch below.
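A minimal sketch of reading cloud config from a config map with client-go, where the kube-system/cloud-config coordinates and the config data key are assumptions for illustration:

```go
package sketch

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// loadCloudConfig fetches provider initialization parameters from a
// ConfigMap. The namespace, name, and data key below are illustrative,
// not a fixed upstream convention.
func loadCloudConfig(ctx context.Context, client kubernetes.Interface) (string, error) {
	cm, err := client.CoreV1().ConfigMaps("kube-system").Get(ctx, "cloud-config", metav1.GetOptions{})
	if err != nil {
		return "", err
	}
	return cm.Data["config"], nil
}
```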
There are two entirely different approaches to refactoring volumes: Flex Volumes and CSI (the Container Storage Interface). There is an ongoing effort to move all of the volume logic from the controller-manager into plugins called Flex Volumes. In the Flex Volumes world, all of the vendor-specific code will be packaged in a separate binary as a plugin. After discussion with @thockin, this was deemed the best approach to move all cloud provider dependency for volumes out of the Kubernetes core. Some of the discovery information for this can be found at https://goo.gl/CtzpVm.
This change will introduce new binaries to the list of binaries required to run Kubernetes. The change will be designed such that these binaries can be installed via kubectl apply -f, and the appropriate instances of the binaries will be started.
Issues such as monitoring and configuring the new binaries will generally be left to the cloud provider. However, providers should ensure that test runs upload the logs for these new processes to testgrid.
Applying the cloud controller manager is the only step that differs in the upgrade process. To complete the upgrade, you need to apply the cloud-controller-manager deployment to the setup. A deployment descriptor file will be provided with this change; apply it using
kubectl apply -f cloud-controller-manager.yml
This will start the cloud-specific controller manager in your Kubernetes setup.
The downgrade steps are also the same as before for all components except the cloud-controller-manager. For the cloud-controller-manager, the deployment should be deleted using
kubectl delete -f cloud-controller-manager.yml
This is a proposed structure, and it may change during the 1.11 release cycle. WG-Cloud-Provider will work with individual SIGs to refine these requirements so that they maintain consistency while meeting the technical needs of the provider maintainers.
Each cloud provider hosted within the kubernetes organization shall have a single repository named kubernetes/cloud-provider-<provider_name>. These repositories shall have the following structure:
- A cloud-controller-manager subdirectory that contains the implementation of the provider-specific cloud controller.
- A docs/cloud-controller-manager.md file that describes the options and usage of the cloud controller manager code.
- A docs/testing.md file that describes how the provider code is tested.
- A test entrypoint to run the provider tests.
Additionally, the repository should have:
- A docs/getting-started.md file that describes the installation and basic operation of the cloud controller manager code.
Where the provider has additional capabilities, the repository should have the following subdirectories that contain the common features:
- dns for DNS provider code.
- cni for the Container Network Interface (CNI) driver.
- csi for the Container Storage Interface (CSI) driver.
- flex for the Flex Volume driver.
- installer for custom installer code.
Each repository may have additional directories and files that are used for additional features.
The purpose of these requirements is to define a common structure for the cloud provider repositories owned by current and future cloud provider SIGs. In accordance with the WG-Cloud-Provider charter to “define a set of common expected behaviors across cloud providers”, this proposal defines the location and structure of commonly expected code.
As each provider can and will have additional features that go beyond the expected common code, these requirements apply only to the location of the common code described above.
This document may be amended with additional locations that relate to enabling consistent upstream testing, independent storage drivers, and other code with common integration hooks.
The development of the Cloud Controller Manager and Cloud Provider Interface has enabled the provider SIGs to develop external providers that capture the core functionality of the upstream providers. By defining the expected locations and naming conventions for the external provider code, we will create a consistent experience across providers.
To facilitate community development, providers named in the initial patch can immediately migrate their external provider work into their named repositories. This also makes the SIGs responsible for the implementations of their respective providers.
Each provider will work to implement the required structure during the Kubernetes 1.11 development cycle, with conformance by the 1.11 release. WG-Cloud-Provider may actively change repository requirements during the 1.11 release cycle to respond to collective SIG technical needs.
After the 1.11 release all current and new provider implementations must conform with the requirements outlined in this document.
As part of the graduation to stable or General Availability (GA), we have set both process and technical goals.
We propose the following repository structure for the cloud providers which currently live in the main kubernetes/kubernetes repository:

git@github.com:kubernetes/cloud-provider-wg
git@github.com:kubernetes/cloud-provider-aws
git@github.com:kubernetes/cloud-provider-azure
git@github.com:kubernetes/cloud-provider-gcp
git@github.com:kubernetes/cloud-provider-openstack
We propose this structure in order to obtain
The use of a tracking repository
is proposed to
The ultimate intention of WG Cloud Provider is to prevent multiple classes of software purporting to implement the Cloud Provider interface from fracturing the Kubernetes community, while also ensuring that new cloud providers adhere to standards of quality and that their management follows Kubernetes community norms.
One alternative to consider is the use of a sidecar: the in-tree cloud interface could then be a gRPC call out to that sidecar. We could then leave the kube-apiserver, kube-controller-manager, and kubelet pretty much as is. We would still need separate repos to hold the code for the sidecar and to handle cluster setup for the cloud provider. However, we believe that different cloud providers will (and already do) want different control loops, so we are likely to need something like the cloud controller manager anyway. From that perspective, it seems easier to centralize the effort in that direction; in addition, it should limit the proliferation of new processes across the entire cluster.