TIMTOWTDI Rides Again – Running Resources on a Limited Schedule

If you’ve been with the Chef Community for a long time, you’ve no doubt gotten caught up in a problem that comes down to “there’s more than one way to do it”. These sorts of issues pop up once in a while, often relying more on personal preference for a solution than on any technical reasoning. Problems with multiple possible solutions aren’t inherently bad, but they can make it hard to find a solution that works for you when it feels like everyone else is doing something different that doesn’t *quite* work in your environment.

One of the things that comes up in the Chef Infra ecosystem is how to handle resources that shouldn’t run as often as the other resources being managed on a node. Sometimes these are heavy processes, like a package database update, that should really only run during times when the node is otherwise not busy to avoid issues with the applications living on the node. We want to keep them in the configuration management system, and have the resources and settings stored in our version control system, but we don’t want the process to run every hour or so when the rest of our system converges. 

One of our community members asked this question on our Discourse list recently, and it led to a lengthy discussion in an internal Slack channel. I’ll summarize some of the solutions we’ve used in various places over the years. If you’ve used something else that has worked well for you, join our Discourse and add your solution!

So, let’s say you’ve got nodes that have some number of recipes assigned to them, and chef-client is running hourly, but you have some Chef Infra resources that should only run once a day. 

Cron It

The classic solution for scheduled tasks on UNIX-y systems is cron. It’s there, it’s dependable, and you’ve probably used it for any number of things. On Windows you have the option of working with Scheduled Tasks. 

Create a cron job that runs chef-client with a run_list override.

  • Pro: This gives you some flexibility for limiting the convergence of many resources that are stored in a cookbook or in multiple recipes. If the resources don’t require elevated privileges, it is possible to run chef-client under a non-privileged user’s cron.
  • Cons: Any resources converged during an override run will not persist to the node’s data. Depending on your daemon schedule, the scheduled chef-client job may also fail to run if the daemon is still executing when the job starts.

Here’s the bit you’ll want from the chef-client docs:

-o RUN_LIST_ITEM, --override-runlist RUN_LIST_ITEM

Replace the current run-list with the specified items. This option will not clear the list of cookbooks (and related files) that is cached on the node. This option will not persist node data at the end of the client run.

If your node is getting its run_list via Policies, you can use the named_run_list feature of the Policyfile to create a separate run_list for just these sorts of cases. A named_run_list appears in the Policyfile.rb, and becomes part of the Policy for your Policy Group. Here’s what the lock file looks like with a named_run_list added:

"name": "timed_run",
 "run_list": [
   "recipe[timed_run::default]"
 ],
 "named_run_lists": {
   "limited_time": [
     "recipe[timed_run::bonus]"
   ]
 }
...
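
For reference, a Policyfile.rb that could produce a lock file like the one above might look roughly like this. It is a sketch rather than the exact file behind that lock: the default_source and the cookbook path are assumptions, while the names come from the lock file.

```ruby
# Policyfile.rb (sketch; default_source and cookbook path are assumed)
name 'timed_run'
default_source :supermarket

# Primary run_list, used by the regular daemonized chef-client runs
run_list 'timed_run::default'

# Extra run_list that is only used when chef-client is started with -n limited_time
named_run_list :limited_time, 'timed_run::bonus'

cookbook 'timed_run', path: '.'
```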

Instead of the -o option, you’ll want the -n (small n) option for chef-client:

-n NAME, --named-run-list NAME

The run-list associated with a policy file.

A bonus of the Policyfile method is that the named_run_list isn’t tracked only in the cron job configuration. It is part of the policy, sitting in the Policyfile next to the primary run_list, so it is easy to reuse and update, and it is available to all hosts configured with this Policy.

You’ll want to manage this chef-client run via a Chef resource in your daemonized chef-client run so you don’t lose track of it. Check out the cron resource or, in Chef Client 14.4 and later, the cron_d resource. For Windows, you’ll want the windows_task resource.
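
A minimal sketch of what that cron_d resource might look like, assuming the limited_time named_run_list from the lock file above. The resource name, the 1am schedule, and the root user are made up for illustration.

```ruby
# Runs the named_run_list once a day at 01:00, separately from the hourly daemon run.
# 'chef-client-limited-time' is an illustrative name, not a convention.
cron_d 'chef-client-limited-time' do
  minute '0'
  hour '1'
  user 'root'
  command 'chef-client -n limited_time'
end
```

Without Policyfiles, the command would use -o with the run_list items instead of -n.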

This is probably the way to go for our original question-asker, as he was contemplating limiting a whole cookbook out of his daemonized chef-client runs. But what if you just have a resource or two that need some time limitations? TIMTOWTDI, but here are some examples.

Manage the Timing in a Recipe – Time.now

Chef Infra is Ruby, so you can make use of bits of Ruby in your recipes. Let’s see what that looks like for guarding a single resource.

Maybe you have a resource you only want to run between 1am and 2am. It can run every day, but only during that hour, and you have no requirement that it should always run at a specific time. You can keep this resource in the regular chef-client run, but use the only_if guard to watch the time for you.

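time1 = 1              # window opens at 01:00
time2 = 2              # window closes at 02:00
now = Time.now.hour    # current hour, 0-23

# `execute` stands in for whichever resource you need to limit
execute 'my_long_resource' do
  command 'some big long command that runs a long time'
  action :run
  only_if { now >= time1 && now < time2 }
end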

This is a bit of a blunt instrument, but you can refine it by storing the times in a node attribute, or by using other parts of the date, such as the day of the week or the month of the year. This page has a good description of what you get from Time.now. This method will probably work for you, as long as your chef-client is running on a regular basis and would normally run during the window you choose. Similarly, if you don’t want the resource to run over the weekend, for example, you could check that as well with Time.now.wday.
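
One wrinkle with the simple comparison above is a window that crosses midnight, say 11pm to 2am. Here is a small sketch of a helper that handles both cases. The in_window? name is hypothetical, not something Chef provides.

```ruby
# Hypothetical helper: true when `hour` (0-23) falls in [start_hour, end_hour).
# When start_hour is greater than end_hour, the window wraps past midnight.
def in_window?(hour, start_hour, end_hour)
  if start_hour <= end_hour
    hour >= start_hour && hour < end_hour
  else
    hour >= start_hour || hour < end_hour
  end
end
```

A guard could then read only_if { in_window?(Time.now.hour, 23, 2) }, assuming the helper is defined somewhere your recipes can see it, such as a cookbook library file.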

Manage the Timing in a Recipe – Lockfiles

Another way to limit the time when a process runs is the use of a lockfile. The chef-client program uses a lockfile to make sure no other instance of chef-client is already running when it starts. System services commonly use lockfiles or pidfiles to keep track of running services in a low-overhead way.

The apt package manager on Debian- and Ubuntu-based Linux systems has a success file that it uses when the update process has run for the package databases. The built-in apt_update resource investigates this file when deciding to execute an update. The code for apt_update is in GitHub so let’s take a look at what we could borrow from this example.

The test for whether or not to run an update is in a function definition on line 52 of the file:

# Determines whether we need to run `apt-get update`
#
# @return [Boolean]
def apt_up_to_date?
 ::File.exist?("#{STAMP_DIR}/update-success-stamp") &&
   ::File.mtime("#{STAMP_DIR}/update-success-stamp") > Time.now - new_resource.frequency
end

So this snippet is doing a couple of interesting things that we might find helpful. It’s running two file status tests on the file update-success-stamp, which is written by the apt-get update process after a successful update. 

The first test makes sure the file exists. The second test compares the modified time of the file to another time: Time.now minus a frequency in seconds. Ruby’s Time objects support this kind of arithmetic directly; subtracting a number of seconds from a Time gives you an earlier Time that you can compare against.

This method will return true or false based on whether or not the current time is far enough away from the last time the resource was run. You can see the rest of the action of this resource in the other parts of the apt_update resource code.
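
The two kinds of Time arithmetic in play here can be tried in plain Ruby, outside of Chef. A minimal sketch, with variable names chosen only for illustration:

```ruby
now = Time.now
one_day = 86_400              # seconds in a day

# Time minus a number of seconds gives another Time (the apt_update style)
yesterday = now - one_day

# Time minus a Time gives the elapsed seconds as a Float
elapsed = now - yesterday

puts yesterday.class
puts elapsed.class
```

The first form gives you a cutoff time to compare an mtime against; the second gives you an age in seconds to compare against a frequency.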

If you don’t want to write a custom resource or stash this kind of thing in a method, you can get this functionality using some built in Chefisms with the guards and notifications.

Let’s say you want to run some resource no more often than every day: 86,400 seconds.

lockfile = "#{Chef::Config[:file_cache_path]}/timed_lockfile"
file lockfile do
 action :nothing
end

execute "big_command" do
 command "some big long command that runs a long time"
 notifies :touch, "file[#{lockfile}]", :immediately
 only_if { (Time.now - ::File.mtime(lockfile)) > 86400 }
end

The interesting bits here are the action :nothing in the file resource, and notifies :touch in the execute later. By adding the file with action :nothing, I’m recording it for the resource list inside the chef-client run. Then, later, I can use the notification to touch it, which will update its timestamps. See the file resource documentation for more information on the touch action.

I did the math a little differently from the example from the apt_update resource; this way made more sense to me as I was reading it in my head, “time now minus the time the file was last modified, should be larger than a day”. YMMV because TIMTOWTDI, of course.

You can make this example safer by checking that the file exists, as in the apt_update example. You’ll also want some way of creating the file initially, otherwise you’ll never run the execute resource: as written, the guard fails when the file doesn’t exist because File.mtime raises an error, and if you add an .exist? test in front of it, the only_if simply returns false. So keep that in mind.
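
One way to cover both points is to treat a missing lockfile as “never run, so run now”. A sketch of such a check in plain Ruby follows. The run_due? name and its arguments are hypothetical, not part of Chef.

```ruby
# Hypothetical helper: true when the guarded command should run again.
# A missing stamp file counts as "never run", so the first run is allowed.
def run_due?(stamp_path, interval_seconds, now = Time.now)
  return true unless ::File.exist?(stamp_path)

  (now - ::File.mtime(stamp_path)) > interval_seconds
end
```

The guard in the recipe could then become only_if { run_due?(lockfile, 86_400) }, assuming the helper lives somewhere the recipe can see it, such as a cookbook library file. The touch notification should create the lockfile on the first successful run, but check that behavior on your Chef version.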

If the command you want to run already has a lockfile or status file of some sort, you can probably use that in many cases, especially if your chef-client is running as root or an administrator. Or you can create one as in this example. If you’re really enterprising, maybe you want to get more in the weeds with the Chef internals, and try adding this feature to the resource API? It might be an interesting feature to add to all the resources so everyone could have, for example, a lockfile and a wait time. Hmm. 

Manage Timing in a Recipe – Gating a Recipe Include

Now let’s do something a bit silly, but hey, you might need it. Maybe you want to only run an included recipe on certain days of the week. We can use Time.now.wday to figure out what day of the week it is, and we can use that to limit when the included recipe will run (wday starts with 0 on Sunday):

day = Time.now.wday
day1 = 1
day2 = 5
if day >= day1 && day <= day2
 include_recipe 'cookbook::weekday'
end

Remember that include_recipe just reads in the resources from the included recipe and puts them inline with the rest of your resources; it doesn’t add the recipe to your run_list.

Summary

So there are a few ways to do it, depending on how much code you need to break out of the usual chef-client schedule. Hopefully you find this helpful. If you have another suggestion, please add it to the Chef Infra Discourse thread! If you have other questions, or are inspired to try adding lockfiles to the resource API, or just want to chat with other like-minded Chefs, please join our Community Slack.

Don’t forget, ChefConf 2020 is coming up! We’ll be in Seattle in June. Watch for the CFP so you can tell us all the clever stuff you’re doing with our projects. We hope to see you there!

Mandi Walls

Mandi is Technical Community Manager for Chef. She can be found online @LNXCHK.