Scaling Container Technologies at Coinbase with Kubernetes
June 6, 2022
Tl;dr: Our latest evaluation of Kubernetes underscored its suitability for scaling Coinbase in the long run. In the past, a migration to Kubernetes raised concerns because of the operational burden of running and securing the control plane in-house. We’ve now concluded that managed Kubernetes offerings reduce this operational burden without compromising the security of our stack.
By Clare Curtis, Coinbase Staff Software Engineer
Almost two years ago we published a blog post detailing why Kubernetes is not part of our technical stack. At the time, migrating to Kubernetes would have created a whole new set of problems that outweighed any near-term benefits. However, as these technologies have matured, our newly formed Compute Team devised a strategy for leveraging Kubernetes in a way that will deliver a more flexible and scalable version of our current system.
Coinbase has grown considerably since we first considered migrating to Kubernetes. With growth of this kind, it is important to prioritize scalability concerns. As we continue to scale, one of the main areas in need of future-proofing is Coinbase’s compute platform. In mid-2020, our largest service was configured to run on a relatively small number of hosts, whereas today it runs on 10x that number.
In the same period, we quadrupled the size of our engineering team, causing a substantial increase in the number of deployments, each needing entirely new hosts. The increase in the number of deployments has raised concerns over future scalability, as we’re already running into technical limitations of existing APIs and resources. Recurring issues with obtaining enough capacity and having it delivered in a reasonable timeframe caused a rise in failed deployments and required our largest services to dramatically slow down their release process.
While these issues are solvable, we decided to take this opportunity to evaluate whether it made sense to continue investing in a homegrown system or to consider an open source alternative that would be far more scalable in the long term.
In our evaluation of Kubernetes, we found that one of the biggest advantages of a migration is that it decouples host provisioning from service deployment, shifting the burden of managing host acquisition from individual teams to the broader Infrastructure team. This empowers the Infrastructure team to take a holistic approach to host management. Also, capacity constraints are less likely to affect deployments, and we reduce the amount of cloud-provider-specific knowledge that individual engineers need to maintain.
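To make the decoupling concrete, here is a minimal Python sketch, not Coinbase's system and not how the Kubernetes scheduler is actually implemented. It assumes a hypothetical `Host` model and a simple first-fit `schedule` function: hosts are provisioned ahead of time into a shared pool, and a deployment only asks for capacity instead of acquiring new hosts itself.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """A pre-provisioned host in a shared pool (hypothetical model)."""
    name: str
    cpu_free: int  # free CPU, in millicores
    pods: list = field(default_factory=list)


def schedule(pool, service, replicas, cpu_per_replica):
    """Place replicas on existing hosts using first-fit.

    Returns the number of replicas that could not be placed, which is the
    signal for whoever owns the pool to add capacity.
    """
    unplaced = 0
    for i in range(replicas):
        for host in pool:
            if host.cpu_free >= cpu_per_replica:
                host.cpu_free -= cpu_per_replica
                host.pods.append(f"{service}-{i}")
                break
        else:
            # No host in the pool had room for this replica.
            unplaced += 1
    return unplaced


# The pool is owned by an infrastructure team; the deployment never creates hosts.
pool = [Host("host-a", 4000), Host("host-b", 4000)]
print(schedule(pool, "api", replicas=3, cpu_per_replica=1500))
print([(h.name, h.pods) for h in pool])
```

In this toy model a deployment fails only when the whole pool is exhausted, which is the sense in which capacity constraints become a pool-level concern rather than a per-deployment one.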
The Kubernetes community has created a wealth of knowledge and tooling that we can utilize to provide better support to teams and quickly enable new features. Additionally, as Kubernetes is extensible, there’s still the option to build tooling internally and open source it for use across the wider community.
Security is extremely important at Coinbase and securing Kubernetes clusters is a non-trivial endeavor. Transitioning from highly isolated, single-tenant compute to a system that promotes multi-tenancy requires deliberate security design and consideration. Because we have high-security workloads where we have to guarantee isolation, we must run separate clusters and build automated tooling that handles all cluster operations. Giving humans access to operate high-security infrastructure isn’t allowed.
Managed Kubernetes offerings, such as AWS EKS, take on the responsibility of operating, maintaining, and securing the control plane, reducing the operational burden of running many clusters. Reducing our operational burden and security responsibility allows us to focus on building the orchestration and automation that’s required to support many clusters across a large engineering organization. EKS has significantly matured over the past few years and shown that it provides stable, operational Kubernetes while also integrating with features that are commonly used in EC2, such as being able to attach security groups to pods and IAM roles to service accounts. Having these integrations reduces the risk and cost associated with migration, as they allow for migration without having to change the identity or access patterns of our current platform.
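As an illustration of the IAM-roles-for-service-accounts integration mentioned above, the following Python sketch builds a Kubernetes ServiceAccount manifest carrying the `eks.amazonaws.com/role-arn` annotation that EKS uses to associate an IAM role with a service account. The helper `service_account_manifest`, the names, the namespace, and the role ARN are all placeholders chosen for this example; the post does not describe Coinbase's actual configuration.

```python
import json


def service_account_manifest(name, namespace, role_arn):
    """Build a ServiceAccount manifest annotated with an IAM role ARN."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": name,
            "namespace": namespace,
            # EKS reads this annotation to map the service account to an IAM role.
            "annotations": {"eks.amazonaws.com/role-arn": role_arn},
        },
    }


# All values below are placeholders, not real resources.
manifest = service_account_manifest(
    name="example-service",
    namespace="example",
    role_arn="arn:aws:iam::111122223333:role/example-service-role",
)
print(json.dumps(manifest, indent=2))
```

Pods that run under such a service account can assume the named role, so a workload could keep the IAM identity it already used on EC2.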
While the migration to Kubernetes spurred concerns in the past, we’ve now concluded that managed Kubernetes offerings, such as AWS EKS, can reduce the operational burden without compromising security. Ultimately, we realized there is a clear ceiling to the ability of our homegrown system to scale, and while there is a large setup and migration cost associated with a move to Kubernetes, we’re confident that it will be more flexible and scalable than our current system.