Best Practices for Migrating Your Chef Server

Last week we presented a webinar, Best Practices for Migrating Your Chef Server, as a follow-up to our popular 2017 blog post, Migrating your Chef Server with knife-ec-backup and knife-tidy.

During the talk, participants asked so many excellent questions that we couldn’t answer them all. In this post, we’ll answer some of the most frequently asked ones.

Knife-ec-backup Installation and Dependencies

In general, you do not want to install additional gems into the Chef Server’s embedded gemset. Doing so can have unintended consequences for the gem dependencies of the Chef Server. For my demo, I chose to install a ChefDK instance on my Chef Servers for simplicity. In practice, you can use any method you want to install a separate Ruby environment on your Chef Server.

Because knife-ec-backup depends on the pg (PostgreSQL) gem, you will need to install build dependencies so that the gem’s native extension can compile. The package names of these dependencies vary from platform to platform. Our previous migration blog post provides more detail on the specific platform dependencies required to install knife-ec-backup.

Running knife-ec-backup and knife-tidy From a Workstation

You can use knife-ec-backup from a workstation for most, but not all, use cases. Specifically, the --with-user-sql and --with-key-sql options require access to the PostgreSQL port on the Chef Server. That port is not publicly exposed, so those options require either local access to the Chef Server or a port forwarding configuration that allows external access.

When to Use --with-user-sql

If you don’t use Chef Manage or a private Chef Supermarket, you can skip this flag.

Without this option, knife-ec-backup will use the /users/USER endpoint to back up and restore user objects. However, the API does not handle the authentication fields used by oc-id, the Chef Server’s OAuth 2.0 plugin that handles authentication to Chef Manage and private Chef Supermarket. Using --with-user-sql will read and write user records directly in the PostgreSQL database, including the fields used by oc-id.

Editor’s Note: As of knife-ec-backup 2.4.0, there are no longer restrictions on restoring knife-ec-backup data sets that contain user SQL records from multiple Chef Servers into a single Chef Server. Additionally, other improvements have gone into our products to address issues encountered when migrating data from older Chef Servers with less stringent data validation. We encourage you to use the latest versions of our packages wherever possible.

When to Use --with-key-sql

If you don’t have any Chef clients with multiple client keys, you can skip this flag.

Without this option, knife-ec-backup will use the /clients/CLIENT endpoint to back up and restore client objects. However, this endpoint assumes a client only has a single client key. For any clients that have had additional client keys added with the /clients/CLIENT/keys endpoint, only the first key will be returned. Using --with-key-sql will read and write all client key records directly in the PostgreSQL database.

Combining and Splitting Chef Servers

Let’s say you aren’t doing an end-to-end Chef Server migration, and you just want to move organization data from one Chef Server to another. You can use the --only-org option to specify a single Chef organization to back up or restore. Another option is to back up the entire Chef Server and manually delete from your backup directory the organizations and sub-organization data you don’t want restored. You can even combine data from separate ec-backups, as long as you manually account for conflicting data between them. Examples of these conflicts could include:

  • Two roles in two Chef Servers that share the same organization and role name but have different content.
  • Two users in two Chef Servers that share a username but have different emails.
  • Two cookbooks in two Chef Servers that share the same organization, cookbook name, and version but have different content.

Trimming Additional Data In-flight

The output of a successful ec-backup is a directory structure of JSON documents that mirrors the structure of your Chef Server. At the top level are global objects like users, user ACLs, user keys, and an organizations directory that contains the nested objects representing each organization’s data. You can combine, edit, and remove objects to tailor your backup as needed. As knife-tidy highlights, nodes and cookbooks are easily the largest objects in your backup. A flexible backup and restore strategy customized to your company’s usage of the Chef Server can often eliminate non-essential data, reducing both your storage requirements and your backup/restore times. A few examples of this could include:

  • Scrubbing attribute data out of node JSON objects. Since node attributes are regenerated when a node converges its run_list, a minimal node object would only include: node_name, run_list, and environment.
  • Leveraging CI/CD pipelines to upload cookbooks from source to Chef Servers rather than storing them in backups.
  • Deprecating use of older cookbook versions to reduce the number of cookbook versions required for Chef Server operations.
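
The node scrubbing idea above can be sketched in a few lines. This is an illustration rather than a knife-tidy feature: the function name minimal_node is hypothetical, and it assumes the node JSON stores node_name and environment under the keys "name" and "chef_environment". Test a restore of trimmed nodes on a non-production server before relying on it.

```python
import json

# Keys kept in the trimmed node. "name" and "chef_environment" are the assumed
# JSON field names for node_name and environment.
KEEP_KEYS = ("name", "chef_environment", "run_list")


def minimal_node(node):
    """Return a copy of the node with every key dropped except those in KEEP_KEYS."""
    return {key: node[key] for key in KEEP_KEYS if key in node}


node = {
    "name": "web01",
    "chef_environment": "production",
    "run_list": ["role[web]"],
    "automatic": {"platform": "ubuntu"},
    "normal": {"tags": []},
}
print(json.dumps(minimal_node(node), sort_keys=True))
# {"chef_environment": "production", "name": "web01", "run_list": ["role[web]"]}
```

The attribute sections that were dropped are repopulated the next time the node converges against the new server.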

Cutover Practices

Because backups and restores are incremental, the only downtime necessary for your migration is the combined time of your final incremental backup, syncing your backup to your destination, and your final incremental restore. That means you can practice and perfect your migration process without interrupting your existing Chef Server leading up to the migration.

After backing up, syncing, and restoring your initial full backup to the destination server, you will want to script this process into a repeatable scheduled job that runs on a fixed interval (likely nightly). How long these “catch-up” syncs take indicates how much downtime to expect when you are ready to perform your final migration cutover.
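
To measure how long each catch-up cycle takes, a scheduled job could time every step as it runs. The sketch below is one possible shape for that job, not an official tool. The run_timed helper, the /backups/chef path, and the new-chef.example.com host are all hypothetical, and the commands are placeholders: in practice the backup and restore each run wherever they can reach the relevant Chef Server, with whatever options your migration needs.

```python
import subprocess
import time


def run_timed(steps, runner=subprocess.run):
    """Run each (name, command) step in order and return the seconds each one took."""
    durations = {}
    for name, command in steps:
        started = time.monotonic()
        # check=True stops the cycle at the first failing step.
        runner(command, check=True)
        durations[name] = time.monotonic() - started
    return durations


# Placeholder commands: the paths and destination host are hypothetical.
STEPS = [
    ("backup", ["knife", "ec", "backup", "/backups/chef"]),
    ("sync", ["rsync", "-a", "/backups/chef/", "new-chef.example.com:/backups/chef/"]),
    ("restore", ["knife", "ec", "restore", "/backups/chef"]),
]
```

Calling run_timed(STEPS) and summing the returned values gives a rough estimate of the downtime window for the final cutover.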

Performing a Smooth Cut-over

  • Recruit several non-production nodes to become canary nodes, pointing at the new server well in advance of the rest of the fleet.
  • After taking the final backup of the source Chef Server, stop accepting new writes to it. This can be as simple as stopping the server (in a standalone configuration) or taking down the load balancer VIP in a cluster.
  • The simplest way to switch all of your Chef clients over to the new cluster (after the final sync is complete) is to use DNS: update either the alias or the target IP on your DNS server.
  • It is wise to lower the TTL (time-to-live) of your Chef server’s DNS record to 300 seconds at least 24 hours before the cut-over.

Combining the Cut-over With a DNS Name Change

  • You’ll need to update the chef_server_url value in the client.rb configuration file on every node.
  • There are several valid approaches for changing the client.rb configuration on your entire fleet, but we recommend using the chef-client cookbook.
  • It’s very important to verify that the new SSL certificate will be trusted by your fleet of nodes before cut-over. Watch your canary nodes carefully for SSL errors.

Enterprise vs Open Source Chef Server vs OpsWorks

The “EC” in knife-ec-backup refers to Enterprise Chef, which prompted a few questions about what that is. Back in 2014 we had two separate code lines for Chef Server: Enterprise Chef 11 (a commercial product which had multi-tenancy, RBAC and HA) and Open Source Chef Server 11. With the release of Chef Server 12 we eliminated this difference by releasing the chef-server-core package as open source based on the Enterprise Chef code. For more information, see our post here.

AWS OpsWorks for Chef Automate as well as all cloud Marketplace offerings of Chef Automate also include the latest chef-server-core package (aka Chef Server 12).

What this means for you is that it is possible to use knife-ec-backup and knife-tidy to migrate between any version of Enterprise Chef, Chef Server 12, or our Marketplace and managed services offerings!

If you are migrating data from an Open Source Chef 11 Server, check out our notes for upgrading from Open Source Chef 11.

Have more questions?

Join us in the #chef-server-migration channel in the Chef Community Slack!

See you at ChefConf!

Josh Hudson

Josh is Technical Lead of the Strategic Customer Engineering team at Chef. Josh has been at Chef for over two years helping our largest customers solve problems of scale using Chef tools and DevOps best practices. He has over a decade of experience building and managing systems.