Last updated: September 20, 2016
Links:
* 1.3 Schedule Dates
This document chronicles decisions the Storage SIG made about the storage stack near the end of the Kubernetes 1.3 release that were not well understood by the wider community. It explains those decisions and why the SIG made the exception, details the impact, and offers lessons learned for the future.
Kubernetes 1.2 had numerous problems with the storage framework that arose from organic growth of the architecture as it took on new features it was not initially designed for. There were race conditions, maintenance and stability issues, and architectural problems with all major components of the storage stack, including the Persistent Volume (PV) and Persistent Volume Claim (PVC) controller and the attach/detach and mount/unmount logic.
The PV/PVC controller handles the connection of provisioned storage volumes to a user claim for storage. The attach/detach logic handles how volumes are attached to hosts. The mount/unmount logic handles how volumes are mounted into containers. Architecturally in 1.2 the attach/detach logic was part of the kubelet on the node.
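To make the PV/PVC controller's role concrete, here is a minimal Python sketch of binding a user claim to a provisioned volume. The names (`Volume`, `Claim`, `bind`) and the "smallest unbound volume that is large enough" policy are illustrative assumptions for this sketch, not the Kubernetes implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Volume:
    """A provisioned storage volume (hypothetical model)."""
    name: str
    capacity_gib: int
    claim: Optional[str] = None  # name of the bound claim, if any


@dataclass
class Claim:
    """A user's request for storage (hypothetical model)."""
    name: str
    request_gib: int
    volume: Optional[str] = None  # name of the bound volume, if any


def bind(claim: Claim, volumes: List[Volume]) -> Optional[Volume]:
    """Bind the claim to the smallest unbound volume that is large enough.

    Returns the chosen volume, or None if no volume currently fits
    (the claim would then stay pending).
    """
    candidates = [
        v for v in volumes
        if v.claim is None and v.capacity_gib >= claim.request_gib
    ]
    if not candidates:
        return None
    best = min(candidates, key=lambda v: v.capacity_gib)
    # Record the binding on both sides so each object knows its partner.
    best.claim = claim.name
    claim.volume = best.name
    return best
```

In this sketch a claim that cannot be satisfied is simply left unbound; a real controller would have to retry as volumes appear and disappear.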
A characteristic list of issues (as not all of them were well captured in GitHub issues) includes:
Below are the GitHub issues that were filed for this area:
* Problem rescheduling POD with GCE PD disk attached (#14642)
* GCE PD Volumes already attached to a node fail with “Error 400: The disk resource is already being used by” node (#19953)
* Kubelet should be able to delete 10 pods per node in 1m0s (#23591)
* Detach EBS volumes when node restarted (#26847)
* Technical debt: refactor Kubelet.HandlePodCleanups into separate thread (#19645)
* EBS volume mount failures due to “… already attached to an instance” are not retried (#18785)
* Node upgrades: e2e test: ensure persistent volumes survive when pods die (#6084)
* Consider “attach controller” to secure cloud provider credentials (#12399)
Addressing these issues was the main deliverable for storage in 1.3. This required an in-depth rewrite of several components.
Early in the 1.3 development cycle (March 28 to April 1, 2016), several community members in the Storage SIG met at a week-long face-to-face summit at Google’s office in Mountain View to address these issues. A plan was established to approach the attach/detach/mount/unmount issues as a deliberate effort, with contributors already handling the design. Since that work was already in flight and a plan was established, the majority of the summit was devoted to resolving the PV/PVC controller issues. Meeting notes were captured in this document.
Three projects were planned to fix the issues outlined above:
* PV/PVC Controller Redesign (a.k.a. Provisioner/Binder/Recycler controller)
* Attach/Detach Controller
* Kubelet Volume Redesign
At the end of the design summit, the attendees agreed on pseudocode for a rewritten PV/PVC controller and a go-forward plan for the attach/detach controller and kubelet volume redesign.
Resources were established for the PV/PVC controller rework at the conclusion of the design summit, and the existing resources on the attach/detach/mount/unmount work were deemed sufficient to complete the other two projects.
At this point, a group of engineers was assigned to work on the three efforts that comprised the overhaul. The plan included not only development work but also comprehensive testing, with time for the functionality to “soak” for weeks before 1.3 shipped. The engineers formed a hybrid team from Red Hat and Google. The allocation of work made delivering all three sub-deliverables in 1.3 aggressive but reasonable.
Near the end of 1.3 development, on May 13, 2016, approximately one week prior to code freeze, a key engineer for this effort left the project. This disrupted the Kubelet Volume Redesign effort. The PV/PVC controller was complete (PR #24331) and committed at this point. However, the Attach/Detach Controller was dependent on the Kubelet Volume Redesign and was impacted.
The leads involved with the projects met, and the Kubelet Volume Redesign work was handed off from one engineer to another who was familiar with Storage. The decision to continue this work after the 1.3 code freeze date of May 20 was based on the need to address the outstanding issues in 1.2. Also, much of the Attach/Detach Controller work had been committed but was dependent on the Kubelet Volume Redesign effort.
The Kubelet Volume Redesign involved changing fundamental assumptions of data flow and volume operations in kubelet. The high-level change introduced a new volume manager in kubelet that handled mount/unmount logic and enabled attach/detach logic to be offloaded to the master (by default, while retaining the ability for kubelet to do attach/detach on its own). The remaining work to complete the effort was the kubelet volume redesign PR (#26801). Together with the attach/detach controller (PR #25457), this was a substantial change to the stack.
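The division of labor described above can be illustrated with a small Python sketch. Everything here is hypothetical: the class names, the `kubelet_attaches` flag, and the choice to have `mount` return `False` while waiting for an attach are assumptions made for illustration, not the actual kubelet or controller code.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Node:
    """Tracks which volumes are attached to and mounted on one host."""
    name: str
    attached: Set[str] = field(default_factory=set)
    mounted: Set[str] = field(default_factory=set)


class AttachDetachController:
    """Stands in for the master-side component that attaches volumes."""

    def attach(self, node: Node, volume: str) -> None:
        node.attached.add(volume)

    def detach(self, node: Node, volume: str) -> None:
        # Assumption: never detach a volume that is still mounted.
        if volume not in node.mounted:
            node.attached.discard(volume)


class VolumeManager:
    """Stands in for the kubelet-side component that mounts volumes."""

    def __init__(self, node: Node, kubelet_attaches: bool = False) -> None:
        self.node = node
        # When True, the kubelet performs attach itself instead of
        # relying on the master-side controller.
        self.kubelet_attaches = kubelet_attaches

    def mount(self, volume: str) -> bool:
        """Mount the volume if it is attached; report whether it mounted."""
        if volume not in self.node.attached:
            if not self.kubelet_attaches:
                return False  # wait for the controller to attach first
            self.node.attached.add(volume)
        self.node.mounted.add(volume)
        return True

    def unmount(self, volume: str) -> None:
        self.node.mounted.discard(volume)
```

With the default setting, a mount attempt does nothing until the controller has attached the volume; with `kubelet_attaches=True`, the volume manager handles both steps on the node.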
The value of the feature freeze date is that it gives the release time to stabilize. Merging refactoring or feature work after the feature freeze date as an exception is a tool that can be used, albeit sparingly, for the sake of a release. Exceptions should meet certain requirements, which the Kubelet Volume Redesign did not meet.
Kubernetes is an incredibly fast-moving project, with hundreds of active contributors creating a solution that thousands of organizations rely on. Stability, trust, and openness are paramount in both the product and the community around Kubernetes. We undertook this retrospective effort to learn from the 1.3 release’s shipping delay. These action items and other work in the upcoming releases are part of our commitment to continually improve our project, our community, and our ability to deliver production-grade infrastructure platform software.