Chef Roles Aren’t Evil

“If roles are evil, what about Al-Qaeda?”

You may laugh, but this is an actual quote from a session at the Opscode Community Summit this year. I want to dive deeper into the community’s apparent dislike for roles in Chef, explain why I think they are still useful, and outline some design patterns for using both them and role cookbooks effectively.

“Stahp Using Roles”

I think the Chef community’s revolt against roles crystallized with Jamie Winsor’s presentation entitled “The Berkshelf Way”. There is a slide that looks like this:

[Image: the “stahp using roles” slide from “The Berkshelf Way”]

Jamie’s a great guy and an incredible contributor to the Chef ecosystem. (We voted him an Awesome Chef, after all.) However, his advice, like my advice in this post, should not be followed blindly. Make sure it applies to your particular situation, and understand both the advantages and the disadvantages.

The title of the talk, “The Berkshelf Way”, also has unintended consequences when it comes to roles and whether you should use them. It implies that if you want to use Berkshelf, you must rigorously follow each and every principle in the talk. (Also, I wonder whether the above deck residing under Opscode’s Slideshare account makes readers believe that Jamie’s views on roles are the Official Opscode Viewpoint, whereas no such thing exists.)

Other well-known folks in the community, though, have also spoken out against roles. Doug Ireton from Nordstrom, for example, advocates against setting attributes or the run list in roles, which of course raises the question: what are roles good for?

What’s a Role Again?

Before I address some of the concerns that Jamie and Doug have raised, let’s review what a role is. Opscode’s documentation states:

A role is a way to define certain patterns and processes that exist across nodes in an organization as belonging to a single job function. Each role consists of zero (or more) attributes and a run list.

In other words, a role represents a server function, consisting of the run list and attributes needed to make that node take on that function.

Roles, of course, also factor into the attribute precedence/merge order chart:

[Image: Chef attribute precedence table]

Wait, what’s this force_default and force_override stuff?

You’ll notice that force_default and force_override are recent additions to this matrix. This looks like someone got backed into a corner with attribute precedence because they weren’t using roles anymore. If you don’t use roles, you lose attribute precedence levels 4 and 11, which means the only way to override a default attribute set in attribute files, recipes or environments is to use override. I bet the user was already using override levels 9, 10 or 11 for something else, so they didn’t have “enough levels of precedence”. As such, we wound up with more forcing.
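To make those level numbers concrete, here is a minimal plain-Ruby sketch of the precedence chart as numbered above (roles at levels 4 and 11). It is a toy model, not Chef's actual implementation: real Chef deep-merges attribute hashes, whereas this simply picks the value set at the highest level. The names PRECEDENCE, level_of and resolve, and the port values, are made up for illustration.

```ruby
# Simplified model of the attribute precedence chart: later entries win.
# Level number = array index + 1.
PRECEDENCE = [
  [:default,        :attribute_file], # 1
  [:default,        :recipe],         # 2
  [:default,        :environment],    # 3
  [:default,        :role],           # 4
  [:force_default,  :attribute_file], # 5
  [:force_default,  :recipe],         # 6
  [:normal,         :attribute_file], # 7
  [:normal,         :recipe],         # 8
  [:override,       :attribute_file], # 9
  [:override,       :recipe],         # 10
  [:override,       :role],           # 11
  [:override,       :environment],    # 12
  [:force_override, :attribute_file], # 13
  [:force_override, :recipe],         # 14
  [:automatic,      :ohai]            # 15
].freeze

# Zero-based position of a (type, source) pair in the chart.
def level_of(type, source)
  PRECEDENCE.index([type, source]) or
    raise ArgumentError, "unknown precedence level: #{type}/#{source}"
end

# Return the value set at the highest-precedence level (nil if nothing is set).
def resolve(settings)
  winner = settings.max_by { |s| level_of(s[:type], s[:source]) }
  winner && winner[:value]
end

with_role = [
  { type: :default, source: :attribute_file, value: 80 },
  { type: :default, source: :environment,    value: 8080 },
  { type: :default, source: :role,           value: 8443 }
]
puts resolve(with_role) # the role default (level 4) beats the environment default

# Without the role there is no default level left above the environment,
# so the next step up is force_default or one of the override levels.
without_role = with_role.reject { |s| s[:source] == :role }
puts resolve(without_role)
```

In this model, dropping the role entry leaves the environment default as the winner, which is the "backed into a corner" situation: the only way to beat it is to reach for a forcing or override level.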

One of my favorite quotes from Bryan McLellan, Opscode’s Technical Program Manager for Open Source, is that “more forcing is never the final forcing”. If you are doing this much forcing, you might be doing something wrong. In my view, “doing something wrong” is not using roles at all.

In Defense of Roles

Let’s address the main complaint about roles: they’re not versioned. But what most people want with versioned roles is to version the run list.

Suppose I have a base role and it contains three recipes in its run list: recipe[ntp::client], recipe[chef-client::config], recipe[chef-client]. If I want to add a fourth recipe, recipe[openssh], I’m faced with adding that across all machines that run the base role and deploying it right away. I might break my entire infrastructure that way! This is terrible, right? Yes, it is, which is why folks invented the idea of the “role cookbook” with one or more recipes emulating the run list of that role using include_recipe:

include_recipe "ntp::client"
include_recipe "chef-client::config"
include_recipe "chef-client"

Now if I need to add recipe[openssh] to the run list, I can modify this recipe by adding include_recipe "openssh", bump the cookbook version, and deploy it across my environments in a controlled way.
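As a rough sketch, the change described above might touch three files. The cookbook name role_base, the version numbers, and the staging environment are all assumptions for illustration; only the recipe names come from the example itself.

```ruby
# role_base/recipes/default.rb
include_recipe "ntp::client"
include_recipe "chef-client::config"
include_recipe "chef-client"
include_recipe "openssh" # the newly added fourth recipe

# role_base/metadata.rb
name    "role_base"
version "1.1.0" # bumped from a hypothetical 1.0.0
depends "ntp"
depends "chef-client"
depends "openssh"

# environments/staging.rb
# Pin the new version in one environment first; others stay on the old one.
name "staging"
description "Staging environment (hypothetical)"
cookbook_versions "role_base" => "= 1.1.0"
```

Pinning the cookbook version per environment is what makes the rollout controlled: nodes in other environments keep converging against the older version until their pin is moved.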

Another reason why roles are still valuable: Chef Server indexes the roles applied to each node, so you can dynamically discover other machines based on their role (function), e.g.

webservers = search(:node, 'role:my_corporate_webservers')

If you don’t use roles at all, you don’t get to do this.
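In practice you often want to narrow such a search, for instance to nodes in the same environment. The helper below is a small sketch of building that query string; role_query is a hypothetical name, and scoping by chef_environment is an assumption about what you might want, not something roles require.

```ruby
# Build a node-search query matching one role within one environment.
def role_query(role_name, environment)
  "role:#{role_name} AND chef_environment:#{environment}"
end

puts role_query("my_corporate_webservers", "production")

# Inside a recipe (this part only runs under chef-client) it could be used as:
#   webservers = search(:node, role_query("my_corporate_webservers",
#                                         node.chef_environment))
#   webserver_ips = webservers.map { |n| n["ipaddress"] }.sort
```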

How is this different than the “Berkshelf Way”?

The “Berkshelf Way” advocates never using roles, but simply adding recipes directly to the run lists of your nodes.

Since you lose out on the attribute precedence and merge order that way, I recommend an alternative: having a role in which you set role-specific attributes, if required. The only thing you delegate to a role cookbook is the run list; the role’s run list contains one item, which is the default recipe of the role cookbook. You should have as many role cookbooks as you do roles, and each of those cookbooks should have one and only one recipe in it: the default recipe.
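A role following this pattern could look roughly like the file below. The role name, the role cookbook name role_corpsite_webserver, and the attribute values are all hypothetical; treat it as a sketch of the shape rather than a recommended configuration.

```ruby
# roles/corpsite_webserver.rb
name "corpsite_webserver"
description "Web servers for the corporate site"

# The run list holds exactly one item: the default recipe of the role cookbook.
run_list "recipe[role_corpsite_webserver]"

# Role-specific attributes stay in the role, so they keep their place
# in the precedence chart.
default_attributes(
  "apache" => { "listen_ports" => %w[80 443] }
)
```

The role itself rarely changes under this split: run list changes go through the versioned role cookbook, and only attribute changes touch the role file.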

Wrapping Up: Sensible Design of Roles and Role Cookbooks

In summary:

  • Roles are useful because they factor into attribute precedence and merge order. Without them, you simply have more forcing.
  • Roles allow you to find servers by function within your cookbook code. For example, load balancers can find their backends, app servers can find their database servers, and so on.
  • Role cookbooks allow you to version your role’s run list.
  • If you use role cookbooks, have a role cookbook for every role (1:1). This minimizes the number of dependencies in your role cookbook’s metadata. Don’t have a single role cookbook called “roles”, because this cookbook will depend on every other cookbook in your infrastructure.
  • Each role cookbook should have one and only one recipe that contains enough include_recipe statements to form the run list you would have previously put in the role itself.
  • Keep your roles small, so that the blast radius of making changes to a role’s attributes or the role cookbook is kept to a minimum. Unless you have a very small infrastructure, do not have a role called “webserver”. Instead, have many roles with narrow functions (e.g. “corpsite_webserver”, “app_foo_webserver”).

Epilogue: Versioned Roles

Opscode is likely to add some kind of “versioned role” structure in Chef 12. Until then, the foregoing design principles should stop you from shooting yourself in the foot and having to force-all-the-things.

Julian Dunn

Julian is a former Chef employee