Scaling Node Operations at Coinbase

TL;DR: This blog post shares insights on how Coinbase is investing in new tools and processes to scale its node operations.

By Min Choi, Senior Engineering Manager — Crypto Reliability

Blockchain nodes power almost every user experience at Coinbase. We use them to monitor fund movements, help our customers earn their staking rewards, and build the analytics needed to support popular features within our applications. As such, being able to effectively manage blockchain nodes is vital to our core business, and we’re continuing to invest in ways to scale our node operations.

One of the most difficult aspects of node management is keeping up with the constant, and sometimes unpredictable, changes to the node software. Asset developers are continuously releasing new code versions, and some blockchains, such as Tezos, leverage an on-chain governance model to take a community vote on all proposed changes. A decentralized governance model such as this makes it difficult to predict when a change will be released and to prepare our internal systems in advance. An example of such a scenario is depicted in the Messari alert below.

Data provided by https://messari.io/

The consequences of not keeping up with these changes can be severe for our customers. They may cause long delays to balance updates in our core wallets or slashed staking rewards. To help minimize these incidents, we’re focusing investments on the following areas:

Asset Release Manager

This service gives us an extra pair of hands (or should I say “ARM”) to process frequent node upgrades. All puns aside, the ARM service monitors GitHub release activity for dozens of important blockchains and automates the deployment of new node binaries to our non-production environments. This frees up our engineers to focus on service validations and work proactively with asset developers to resolve problems prior to production release.
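To make the release-monitoring idea concrete, here is a minimal Python sketch of the comparison step: given the versions currently deployed and a list of newly observed release tags, work out which assets have a pending upgrade. This is not ARM's actual code. The names `Release`, `parse_version`, and `pending_upgrades` are hypothetical, the tag format is assumed to contain a dotted numeric version, and fetching the release list from GitHub is left out entirely.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    """A release observed upstream (hypothetical shape, for illustration)."""
    asset: str  # e.g. "algorand"
    tag: str    # release tag as published, e.g. "v3.6.2"


def parse_version(tag: str) -> tuple[int, ...]:
    """Extract the dotted numeric part of a tag such as 'v3.6.2' or 'v3.6.2-stable'."""
    match = re.search(r"(\d+(?:\.\d+)*)", tag)
    if match is None:
        raise ValueError(f"no version number in tag {tag!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def pending_upgrades(deployed: dict[str, str], releases: list[Release]) -> list[Release]:
    """Return the newest release per tracked asset that is newer than what is deployed."""
    newest: dict[str, Release] = {}
    for release in releases:
        current = deployed.get(release.asset)
        if current is None:
            continue  # asset is not tracked, ignore it
        if parse_version(release.tag) <= parse_version(current):
            continue  # already deployed or older
        best = newest.get(release.asset)
        if best is None or parse_version(release.tag) > parse_version(best.tag):
            newest[release.asset] = release
    return sorted(newest.values(), key=lambda r: r.asset)
```

Comparing versions as integer tuples rather than strings avoids the classic mistake of treating 3.10.0 as older than 3.9.5.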

The diagram below shows the high-level data flow for ARM.

Here’s a recent example of how the ARM service was leveraged to process a node upgrade for Algorand.

  • On May 9 at 12:44 PM PDT, Algorand version 3.6.2 was released.
  • On May 9 at 1:13 PM PDT, the ARM service filed a ticket to notify our engineers and track the incoming change.
  • On May 9 at 1:43 PM PDT, the required code change was automatically generated for build and deployment.
  • On May 9 at 2:13 PM PDT, the change was automatically deployed to all our non-production environments for Algorand.
  • On May 9 at 2:43 PM PDT, an error in one of the three deployments was detected and the ARM service escalated to an engineer to help investigate.
  • On May 10 at 6:27 AM PDT, the engineer resolved the deployment problem and began service validation testing in preparation for production deployment.

As this timeline shows, the system isn’t completely touchless, meaning engineers are still needed as part of the overall upgrade process. However, the ARM service allows us to transact hundreds of these upgrade operations in parallel, saving countless hours of engineering time, which can then be reinvested into quality assurance efforts.
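The "deploy in parallel, escalate on failure" pattern from the timeline can be sketched in a few lines of Python. This is only an illustration under assumptions: `run_upgrades` and `needs_escalation` are hypothetical names, the `deploy` callable stands in for whatever actually rolls out a node binary, and a thread pool is just one of several ways to run the work concurrently.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_upgrades(
    targets: list[str],
    deploy: Callable[[str], None],
    max_workers: int = 8,
) -> dict[str, str]:
    """Deploy to every target in parallel; map each target to 'ok' or an error message."""

    def attempt(target: str) -> str:
        try:
            deploy(target)
            return "ok"
        except Exception as exc:  # record any failure so it can be escalated
            return f"error: {exc}"

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so outcomes line up with targets
        outcomes = list(pool.map(attempt, targets))
    return dict(zip(targets, outcomes))


def needs_escalation(outcomes: dict[str, str]) -> list[str]:
    """Return the targets whose deployment did not succeed, for hand-off to an engineer."""
    return sorted(target for target, status in outcomes.items() if status != "ok")
```

One failed deployment does not stop the others here, which mirrors the Algorand example: two environments succeeded while the third was handed to an engineer.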

Test-Runner

This is an orchestration service used to execute integration tests, both via temporal workflows and API calls to important systems across Coinbase. As the name may suggest, Test-Runner obtains and stores test results, aggregates them by metadata, and exposes an API to query the results. By making it simple to create these tests and share standardized test results across our engineering teams, we’re able to accelerate our asset addition and incident response processes. We put a lot of value in building reusable integration tests, as we view them as a foundation of our asset maintenance regime.
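As a rough illustration of "store results, aggregate by metadata, expose a query", here is a small in-memory Python sketch. It is not Test-Runner's implementation or API. `ResultRecord`, `ResultStore`, and the metadata keys (`asset`, `env`) are all hypothetical, and a real service would sit behind a database and an HTTP or RPC interface.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ResultRecord:
    """One integration test outcome plus free-form metadata (hypothetical shape)."""
    name: str
    passed: bool
    metadata: dict[str, str] = field(default_factory=dict)


class ResultStore:
    """In-memory stand-in for a results service."""

    def __init__(self) -> None:
        self._results: list[ResultRecord] = []

    def record(self, result: ResultRecord) -> None:
        self._results.append(result)

    def query(self, **filters: str) -> list[ResultRecord]:
        """Return results whose metadata matches every given key/value filter."""
        return [
            r for r in self._results
            if all(r.metadata.get(k) == v for k, v in filters.items())
        ]

    def summarize(self, key: str) -> dict[str, dict[str, int]]:
        """Count passes and failures grouped by one metadata key."""
        summary: dict[str, dict[str, int]] = defaultdict(
            lambda: {"passed": 0, "failed": 0}
        )
        for r in self._results:
            group = r.metadata.get(key, "unknown")
            summary[group]["passed" if r.passed else "failed"] += 1
        return dict(summary)
```

Keeping metadata as plain key/value pairs means any team can tag results (by asset, environment, node version, and so on) without the store needing to know those dimensions in advance.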

The diagram below shows the high-level service architecture for Test-Runner.

Here are a few basic examples of the types of tests that are in scope for Test-Runner.

  1. Balance transfers within Coinbase.
  2. Deposits and withdrawals in and out of Coinbase.
  3. Sweep and restore operations between cold and hot wallets.
  4. Simple trade operations (buy/sell).
  5. Rosetta validation.

Each time a node is upgraded, these tests are automatically triggered via our continuous integration (CI) pipeline, providing a clear validation of success or failure. This helps our engineers make quick and informed operational decisions, such as rolling back to a previous version of the node binary.
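To show how test outcomes might feed an operational decision, here is a hedged Python sketch of one possible rule. The policy itself is an assumption made for illustration, not Coinbase's actual process: a failing critical test suggests rolling back, a missing critical result or a non-critical failure suggests a closer look, and a clean run suggests promoting. `Decision` and `decide` are hypothetical names, and the final call still belongs to an engineer.

```python
from __future__ import annotations

from enum import Enum


class Decision(Enum):
    PROMOTE = "promote"
    ROLL_BACK = "roll_back"
    INVESTIGATE = "investigate"


def decide(results: dict[str, bool], critical: set[str]) -> Decision:
    """Turn a mapping of test name -> passed into a suggested next step.

    Assumed policy (illustrative only):
      - a critical test that never ran -> investigate
      - any critical test failed       -> roll back
      - only non-critical failures     -> investigate
      - everything passed              -> promote
    """
    missing = critical - set(results)
    if missing:
        return Decision.INVESTIGATE
    failed = {name for name, ok in results.items() if not ok}
    if failed & critical:
        return Decision.ROLL_BACK
    if failed:
        return Decision.INVESTIGATE
    return Decision.PROMOTE
```

For example, `decide({"deposit": True, "withdrawal": False}, critical={"deposit", "withdrawal"})` suggests `Decision.ROLL_BACK` under this assumed policy.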

Blockchain Pods

As we add more blockchains to our support catalog, we’re investing in flexible engineering teams designed to collaborate on emerging priorities. Our pods are roughly 5–7 engineers in size, are made up of site reliability and software engineers, and offer opportunities to quickly adapt to changing market conditions. For example, we most recently formed a pod to focus specifically on Ethereum’s upcoming transition from a Proof-of-Work (PoW) to a Proof-of-Stake (PoS) blockchain. The Merge is a very large and extremely complex change, requiring nearly all Coinbase systems to adjust, but it is also a one-time event that doesn’t justify the formation of a permanent engineering team.

We’re also in the process of forming new pods to focus on ERC-20 (Tokens) and ERC-721 (NFTs). In this way, we can pivot on the development of features that harness these standards for the betterment of our customers. By constantly forming and dissolving pods in this manner, we’re able to develop small economies of scale that quickly meet our customer needs. It also gives our engineers the flexibility to choose between areas of technological interest and build subject matter expertise that helps them grow their careers at Coinbase.

Final Thoughts

Developing a comprehensive strategy for node management is a challenging endeavor. While we acknowledge that our own strategy is not without flaws, we take pride in operating at the cutting edge of blockchain technology. Every day, Coinbase engineers work tirelessly in partnership with the greater crypto community to overcome these operational challenges. So if you’re interested in building the financial system of the future, check out the openings on the Crypto Reliability (CREL) team at Coinbase.


Scaling Node Operations at Coinbase was originally published in The Coinbase Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.
