Berkshelf API Remediation Followup

As you may be aware, on 2014-05-14 there was an outage of the Berkshelf API service that Berkshelf v3+ uses to resolve dependencies from the Community site. We previously posted a postmortem of the incident. I wanted to take a moment to follow up with the community on where we are with the remediation items.

Chef’s operations team has set up Nagios monitoring for the application. The check is a simple Ruby script that verifies, from outside our infrastructure, that the service is healthy and that the cache is OK. This allows us to better support the application. We also set up a metrics dashboard so we can check the overall health of the service, including application service times, request times, and most importantly, memory usage.
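
For readers who want to monitor their own Berkshelf API instance, a check of this kind might look roughly like the sketch below. It is not the script the operations team runs. It assumes the service exposes a status URL returning JSON with "status" and "cache_status" fields; those field names, the example URL, and the evaluate_status helper are assumptions made for illustration. The exit codes follow the standard Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```ruby
require 'json'

# Standard Nagios plugin exit codes.
OK = 0
WARNING = 1
CRITICAL = 2
UNKNOWN = 3

# Maps a status response body (a JSON string) to [exit_code, message].
# The "status" and "cache_status" field names are assumptions.
def evaluate_status(body)
  data = JSON.parse(body)
  return [UNKNOWN, 'UNKNOWN: response is not a JSON object'] unless data.is_a?(Hash)

  unless data['status'] == 'ok'
    return [CRITICAL, "CRITICAL: service status is #{data['status'].inspect}"]
  end
  unless data['cache_status'] == 'ok'
    return [WARNING, "WARNING: cache status is #{data['cache_status'].inspect}"]
  end

  [OK, 'OK: service and cache are healthy']
rescue JSON::ParserError => e
  [UNKNOWN, "UNKNOWN: could not parse response (#{e.message})"]
end

# Runs as a plugin only when a URL is passed on the command line, e.g.
#   ruby check_berkshelf_api.rb https://berks-api.example.com/status
# (hypothetical file name and URL).
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  require 'net/http'
  begin
    response = Net::HTTP.get_response(URI(ARGV[0]))
    code, message =
      if response.is_a?(Net::HTTPSuccess)
        evaluate_status(response.body)
      else
        [CRITICAL, "CRITICAL: HTTP #{response.code}"]
      end
  rescue StandardError => e
    # Connection refused, DNS failure, timeout, and similar errors.
    code = CRITICAL
    message = "CRITICAL: #{e.class}: #{e.message}"
  end
  puts message
  exit code
end
```

Keeping the evaluation in a separate function from the HTTP call means the health logic can be exercised without a running service, and Nagios only ever sees a one-line message plus an exit code.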

We will continue to improve support for those using a Berkshelf-based workflow for developing and managing cookbooks. The functionality provided by the Berkshelf API service will be rolled into the Supermarket project as an API endpoint, which means it will be built into the community site.

Joshua Timberman

Joshua Timberman is a Code Cleric at CHEF, where he Cures Technical Debt Wounds for 1d8+5 lines of code, casts Protection from Yaks, and otherwise helps continuously improve internal technical process.