Matt Wrock is a Principal Software Engineer at CenturyLink Cloud, where he works on data center automation. Many of you already know Matt from his blog, Hurry Up and Wait!, where he writes about many aspects of software engineering, including Chef. He spoke on Entering the Chef Ecosystem From a Windows Background at ChefConf 2015, where he also received the title of Awesome Community Chef. Matt is the project founder of Boxstarter, which automates Windows installations, and a contributor to Chocolatey, a package manager for Windows.
In this blog post, Matt takes us through the Chef-based workflow that he and his team have adopted to automatically update CenturyLink Cloud’s infrastructure in an existing data center.
The workflow begins with local development. The following diagram shows the local development environment for creating cookbooks. This environment lives inside of a Vagrant box that is used by anyone at CenturyLink Cloud who works with Chef.
Matt says, “We have a repo that’s our application repo. That includes one cookbook, our workstation cookbook, called chef_workstation. That cookbook sets up such things as Chef DK, the links to our Berkshelf API server, links to our Nexus repository manager, the internal gems, Docker and a Rakefile for testing. The cookbook is consumed by a Vagrantfile, which first builds an Ubuntu image and then runs the cookbook. When it’s done, you’re in a prompt where knife commands work, and you have access to Test Kitchen and our vSphere provisioning driver.”
Matt noted that, for consistency, the same cookbook that builds the workstations also creates the build agents. If tests pass locally, there’s a good chance that they’ll pass on the build agents. Matt’s team uses TeamCity as its build server.
The team also uses Vagrant Cachier, which leverages Squid, a caching proxy for the web. As a result, although it can take about 10 minutes to build a box the first time, subsequent creation times are fast because packages and Chef DK are cached. It’s easy to set an environment up again if it becomes compromised.
Although Matt’s team uses Vagrant to build the workstation, they don’t use the Vagrant kitchen driver to test it. Instead, they use Docker or their actual cloud infrastructure. Docker works with most of the Linux instances. For Windows and the rest of the Linux instances, they use their own vSphere driver.
After a developer finds that a cookbook passes its ChefSpec and Kitchen tests locally, it’s pushed to the master branch on GitHub. That push is caught by the TeamCity build agents, which initiate Phase 1 of the build process. The following diagram shows Phase 1.
Initially, the build agents run ChefSpec and Foodcritic. If the cookbook passes, then another build occurs that bumps up the cookbook version. This process follows the environment cookbook pattern. Matt says, “We have an environment cookbook that basically acts as the glue for all the other cookbooks. It is simply a metadata.rb file and a Berksfile. It describes the top-level dependencies.”
When a cookbook passes its tests, it’s uploaded to the Chef server and the Berksfile.lock of the environment cookbook is updated. That lock file defines all of the cookbook versions that make up the app at any point in time. Finally, all the cookbook versions are promoted to the test environment.
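The version bump and the promotion step can be sketched in plain Ruby. This is only an illustration, not the team's build code: the helper names `bump_patch` and `pin_versions`, the cookbook name, and the version numbers are all hypothetical, and bumping the patch segment is an assumption, since the post doesn't say which segment the build increments. The one grounded detail is that a Chef environment pins cookbooks through a `cookbook_versions` hash of version constraints.

```ruby
require 'json'

# Bump the patch segment of a "major.minor.patch" cookbook version.
# ASSUMPTION: the post does not say which segment the real build bumps.
def bump_patch(version)
  major, minor, patch = version.split('.').map { |part| Integer(part, 10) }
  [major, minor, patch + 1].join('.')
end

# Return a copy of a Chef environment hash with each locked cookbook
# pinned to an exact version ("= x.y.z") under "cookbook_versions".
def pin_versions(environment, locked_versions)
  pins = locked_versions.transform_values { |version| "= #{version}" }
  existing = environment.fetch('cookbook_versions', {})
  environment.merge('cookbook_versions' => existing.merge(pins))
end

# Hypothetical data: a test environment that currently pins one cookbook.
test_env = { 'name' => 'test', 'cookbook_versions' => { 'haproxy' => '= 1.4.2' } }

new_version = bump_patch('1.4.2')
updated_env = pin_versions(test_env, { 'haproxy' => new_version })

puts JSON.pretty_generate(updated_env)
```

Because `pin_versions` returns a new hash rather than mutating its input, the previous environment definition stays available for comparison or rollback.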
Phase 2 is largely about running Kitchen tests. The following diagram shows Phase 2.
Matt says, “We call the first stage of our Kitchen tests a dirty test. If you made a change to a single cookbook and there are two other cookbooks that take a dependency on that cookbook, then we run a full set of Kitchen tests on all three cookbooks.”
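The selection logic behind a dirty test can be sketched as a walk over the dependency graph. The sketch below is an assumption-laden illustration rather than the team's implementation: the cookbook names and the `cookbooks_to_test` helper are hypothetical, and it follows dependents transitively, which the post does not confirm (it may be that only direct dependents are retested).

```ruby
require 'set'

# Hypothetical dependency map: cookbook => cookbooks it depends on.
DEPENDENCIES = {
  'base'    => [],
  'haproxy' => ['base'],
  'web'     => ['base'],
  'db'      => []
}.freeze

# Return the changed cookbook plus every cookbook that depends on it,
# directly or indirectly (ASSUMPTION: transitive dependents are included).
def cookbooks_to_test(changed, dependencies)
  affected = Set[changed]
  queue = [changed]
  until queue.empty?
    current = queue.shift
    dependencies.each do |cookbook, deps|
      next unless deps.include?(current)
      # Set#add? returns nil when the cookbook was already recorded.
      queue << cookbook if affected.add?(cookbook)
    end
  end
  affected.to_a.sort
end

puts cookbooks_to_test('base', DEPENDENCIES).inspect
```

With this sample map, a change to `base` selects three cookbooks (`base`, `haproxy` and `web`), matching the "all three cookbooks" case Matt describes, while a change to `db` selects only `db`.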
If all the cookbooks from the Phase 2 build succeed, then it’s on to Phase 3. Phase 3 promotes the set of cookbooks that make up the application to a number of QA environments and then to the production environments. The following diagram shows Phase 3.
Matt described the process. “If all the builds (in Phase 2) succeed, then the cookbooks are promoted to one of our QA environments. These are real QA environments where people are doing real work against real infrastructure. That environment is made up of all the cookbooks that are green.
“When we decide it’s time to do an actual deployment to production, all I have to do is go to my build server and press a button. The first thing that happens is all those cookbook versions in the green environment get circulated to all our QA data centers. Right now, there are three. The distribution happens one data center at a time. If every node converges in one data center, then the same repeats itself in the next data center.
“Promoting the cookbooks to production is manual because our production environment and our QA environment can’t talk to each other. We have a knife plugin that takes everything that’s changed in the QA environment and pushes it to the production Chef server.
“We first use a canary environment. Technically, it’s a customer data center but it’s a private cloud and the only customer is CenturyLink. If anything goes wrong in that deployment, we personally suffer but our external customers don’t.
“Once the canary data center succeeds, the build server takes over and starts circulating the cookbook version from one data center to another. It’s sort of a continuous delivery situation. We have to manually initiate the process but everything that’s screened in QA should be deployable. We may not deploy it, but we can.”
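Two pieces of the process Matt describes lend themselves to a small sketch: working out which cookbook versions differ between QA and production, and rolling out canary-first, one data center at a time, stopping at the first failure. The code below is a simplified model only. The function names, data center names, and version numbers are hypothetical, the block passed to `roll_out` merely stands in for "every node converged", and none of it reflects the actual knife plugin or build server configuration.

```ruby
# Cookbook pins present in QA that are new or different in production.
def changed_pins(qa_pins, prod_pins)
  qa_pins.reject { |cookbook, version| prod_pins[cookbook] == version }
end

# Deploy to the canary first, then to each remaining data center in order.
# The block must return true when every node in that data center converged.
# The rollout halts at the first data center that fails.
def roll_out(canary, data_centers)
  completed = []
  [canary, *data_centers].each do |data_center|
    return { completed: completed, failed: data_center } unless yield(data_center)
    completed << data_center
  end
  { completed: completed, failed: nil }
end

# Hypothetical pins: only haproxy changed in QA.
qa_pins   = { 'haproxy' => '1.4.3', 'web' => '2.0.0' }
prod_pins = { 'haproxy' => '1.4.2', 'web' => '2.0.0' }
to_push = changed_pins(qa_pins, prod_pins)
puts "Cookbooks to push: #{to_push.keys.join(', ')}"

# Simulate a convergence failure in the second regular data center.
result = roll_out('canary', %w[dc1 dc2 dc3]) { |dc| dc != 'dc2' }
puts "Completed: #{result[:completed].join(', ')}"
puts "Failed: #{result[:failed] || 'none'}"
```

In the simulated run, the canary and `dc1` complete, `dc2` fails, and `dc3` is never attempted, which is the property that makes a sequential rollout safer than a simultaneous one.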
Matt’s team uses Chef provisioning to provision the clusters in a data center. The following diagram shows how this works.
Because the QA and production environments are separate networks, there has to be a way to get the information from one environment to the other. The process uses what Matt refers to as “stamp databags.” A stamp is a self-contained environment. As private clouds become more popular, Matt believes that the number of stamps within a single data center will grow.
Each stamp’s databag contains all of the network and virtualization metadata the provisioner and the cookbooks need. The databag is a JSON file that contains, for example, the network gateway, DNS servers, hypervisor storage and hosts. The databag helps to determine where on the network to find something, such as an HAProxy node.
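To make the idea concrete, here is what a stamp databag item and a lookup against it might look like. Every key name, address, and host name below is invented for illustration, since the post lists only the kinds of data stored (gateway, DNS servers, hypervisor storage and hosts), and `node_address` is a hypothetical helper. The `id` field is the one grounded detail, because Chef requires it on every data bag item.

```ruby
require 'json'

# Hypothetical stamp databag item; all keys and values are made up.
STAMP_ITEM = <<~STAMP
  {
    "id": "stamp1",
    "network": {
      "gateway": "10.0.0.1",
      "dns_servers": ["10.0.0.2", "10.0.0.3"]
    },
    "hypervisor": {
      "storage": "datastore1",
      "hosts": ["esx1.example.internal"]
    },
    "nodes": {
      "haproxy": "10.0.0.10"
    }
  }
STAMP

# Look up where a given kind of node lives in this stamp,
# failing loudly if the stamp does not describe one.
def node_address(stamp, role)
  stamp.fetch('nodes').fetch(role) do
    raise KeyError, "no #{role} node in stamp #{stamp['id']}"
  end
end

stamp = JSON.parse(STAMP_ITEM)
puts "Gateway: #{stamp['network']['gateway']}"
puts "HAProxy: #{node_address(stamp, 'haproxy')}"
```

Inside a real cookbook, the same hash would typically come from Chef's `data_bag_item` helper rather than from an inline string.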
Of course, Matt sees ways of improving the workflow. For example, creating stamp databags isn’t yet automated. He also wants to make the rollout to the data centers faster. Matt’s passion for innovation and continual improvement is something he sums up concisely. He says, “I’m never satisfied.”