Introducing Partial Search for Opscode Hosted Chef

We have deployed a new search API, dubbed *partial search*, to Opscode
Hosted Chef. The partial search API is designed to reduce the memory
and network bandwidth that chef-client needs to process
search results. You can start experimenting with partial search in
your Hosted Chef organization by downloading the
[partial search cookbook][] and using the `partial_search` method it
provides in place of calls to `search` in your recipes.

[partial search cookbook]: http://community.opscode.com/cookbooks/partial_search

Both the standard search and the new partial search APIs allow you to
search different types of objects. When you execute a search, you
specify an object index, for example the node index, and a query that
is used to match against the relevant index.

The standard search API returns complete Chef objects. When you query
the node index, the result is an array of `Chef::Node` objects that
matched your query, each containing a full set of attributes for the
node.

If you only need the values of a small number of attributes from the
objects matching your query, partial search is for you. With partial
search, you specify the attribute keys and the server extracts the
values for you. The partial search API returns simple hashes containing
just the data of interest. Each result holds only part of an object,
hence the name partial search.

The syntax used for partial search is best explained with an
example. Below is a partial search query that returns the name, IP
address, and received-bytes counter for the eth0 interface of nodes
with role `external-lb`:

[sourcecode language="ruby"]
nodes = partial_search(:node, 'role:external-lb',
    :keys => {
               'name' => ['name'],
               'ip'   => ['ipaddress'],
               'eth0_rx_bytes' => ['counters',
                                   'network',
                                   'interfaces',
                                   'eth0', 'rx', 'bytes']
             })

nodes.each do |result|
    # result is a hash with keys 'name', 'ip', and 'eth0_rx_bytes'
    result['eth0_rx_bytes']
end
[/sourcecode]

The `:keys` option defines a mapping of short names of your choosing
to the attribute key paths you want to extract. Key paths are
specified as an array of keys to avoid escaping issues and allow you
to extract values nested deeply in an object. Each key path gets a
short name (e.g. `eth0_rx_bytes`). The extracted value is available
under the short name in the returned partial search result.
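To make the key-path mapping concrete, here is a plain-Ruby sketch of the extraction that the server performs conceptually. It is not the Chef Server's implementation: `value_at` and `extract_partial` are hypothetical helpers, and the sample node hash is made up and heavily trimmed. Each key path is followed through the nested hashes, and the value lands under its short name; a path that does not exist yields `nil`.

```ruby
# Hypothetical sample node attributes, heavily trimmed for illustration.
SAMPLE_NODE = {
  'name'      => 'lb1.example.com',
  'ipaddress' => '10.0.0.5',
  'counters'  => {
    'network' => {
      'interfaces' => {
        'eth0' => { 'rx' => { 'bytes' => '123456' } }
      }
    }
  }
}.freeze

# Follow one key path through nested hashes; return nil if any step is missing
# or if the path runs into a non-hash value.
def value_at(object, path)
  path.reduce(object) do |current, key|
    current.is_a?(Hash) ? current[key] : nil
  end
end

# Hypothetical helper: build a result hash keyed by the caller's short names.
def extract_partial(object, keys)
  keys.each_with_object({}) do |(short_name, path), result|
    result[short_name] = value_at(object, path)
  end
end

keys = {
  'name'          => ['name'],
  'ip'            => ['ipaddress'],
  'eth0_rx_bytes' => ['counters', 'network', 'interfaces', 'eth0', 'rx', 'bytes']
}

p extract_partial(SAMPLE_NODE, keys)
```

Because each path is an array, a key that itself contains a dot or other separator never needs escaping; every element is matched as a literal hash key.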

Partial search can have a big impact on recipes that
use search to access data for all nodes in your
infrastructure. Consider the [nagios cookbook][], which includes the
following search in its [server recipe][]:

[sourcecode language="ruby"]
nodes = search(:node, "hostname:[* TO *] AND chef_environment:#{node.chef_environment}")
[/sourcecode]

The recipe uses the node results to render the `hosts.cfg`
file. For each node we need the ipaddress, hostname, and
run_list. Rather than fetching full node objects for every node in the
infrastructure, we can use partial search to obtain a much smaller
search result. Here’s the same query rewritten using the `partial_search` method:

[sourcecode language="ruby"]
node_data = partial_search(:node, "hostname:[* TO *] AND chef_environment:#{node.chef_environment}",
                           :keys => {
                                     'ipaddress'  => ['ipaddress'],
                                     'hostname'   => ['hostname'],
                                     'hostgroups' => ['run_list']
                                    })
[/sourcecode]

Since the result is no longer an array of `Chef::Node` objects, I’ve
renamed the variable to `node_data`. In the cookbook, the search
result is used by the [hosts.cfg.erb][] template. Among other things,
the template calls the `run_list` method on the `Chef::Node` objects
returned by the standard search API. Since we now have simple hashes
instead of node objects, there is no `run_list` method to call.
Instead, we post-process the partial search result and convert the
`hostgroups` value into the form needed for the nagios config.

[sourcecode language="ruby"]
# process each node's raw run_list to set its hostgroups
node_data = node_data.map do |n|
    run_list = Chef::RunList.new(*n['hostgroups'])
    roles = run_list.roles
    # nodes without any roles fall into the catch-all 'all' hostgroup
    n['hostgroups'] = roles.empty? ? 'all' : roles.join(',')
    n
end
[/sourcecode]
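If you want to check the hostgroups logic without loading Chef, the sketch below swaps `Chef::RunList` for a small regex parser. That substitution is an assumption made for illustration: it only recognizes items written as `role[name]` and ignores everything else (recipes, bare names), whereas `Chef::RunList` understands the full run-list syntax. `hostgroups_for` and the sample `node_data` are hypothetical.

```ruby
# Matches items like "role[external-lb]" and captures the role name.
ROLE_ITEM = /\Arole\[([^\]]+)\]\z/

# Hypothetical helper mirroring the recipe: comma-joined role names, or 'all'.
def hostgroups_for(run_list_items)
  roles = Array(run_list_items).map { |item| item.to_s[ROLE_ITEM, 1] }.compact
  roles.empty? ? 'all' : roles.join(',')
end

# Hypothetical partial search results with the raw run_list under 'hostgroups'.
node_data = [
  { 'ipaddress' => '10.0.0.5', 'hostname' => 'lb1',
    'hostgroups' => ['role[base]', 'role[external-lb]'] },
  { 'ipaddress' => '10.0.0.6', 'hostname' => 'db1',
    'hostgroups' => ['recipe[mysql::server]'] }
]

# Build new hashes rather than mutating the originals.
node_data = node_data.map do |n|
  n.merge('hostgroups' => hostgroups_for(n['hostgroups']))
end

node_data.each { |n| puts "#{n['hostname']}: #{n['hostgroups']}" }
```

Using `merge` instead of assigning into `n` keeps the original search results untouched, which can make debugging easier if you later log the raw data.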

The extra post-processing effort to massage the raw run_list into the
form needed by the template is a tradeoff of using partial
search. With partial search you get smaller and more efficient search
results, but you have to specify what you want ahead of time and you
don’t have the convenience of the helper methods that come with full
Chef objects.

To complete the partial-search-ization of the nagios server recipe, we
need to adjust the [hosts.cfg.erb][] template. The relevant edited
section should look like this:

[sourcecode language="ruby"]
<% @nodes.each do |n| -%>
<% unless n['hostname'] == node['hostname'] -%>
define host {
  use server
  address <%= n['ipaddress'] %>
  host_name <%= n['hostname'] %>
  hostgroups <%= n['hostgroups'] %>
}
<% end -%>
<% end -%>
[/sourcecode]
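Because the results are plain hashes, you can preview the template logic outside a Chef run with Ruby's standard ERB library (assuming Ruby 2.6 or later for the `trim_mode:` keyword). This is a sketch under assumptions: in a real run Chef supplies `@nodes` through the template resource's `variables` and `node` as the current node, so a hypothetical `HostsCfgPreview` class stands in for both here, and the sample data is made up.

```ruby
require 'erb'

# Trimmed copy of the host definition section; "-%>" drops the trailing newline.
TEMPLATE = <<~'ERB'
  <% @nodes.each do |n| -%>
  <% unless n['hostname'] == node['hostname'] -%>
  define host {
    use server
    address <%= n['ipaddress'] %>
    host_name <%= n['hostname'] %>
    hostgroups <%= n['hostgroups'] %>
  }
  <% end -%>
  <% end -%>
ERB

# Hypothetical stand-in for Chef's template context: exposes @nodes and node.
class HostsCfgPreview
  attr_reader :node

  def initialize(nodes, current_node)
    @nodes = nodes
    @node = current_node
  end

  def render
    ERB.new(TEMPLATE, trim_mode: '-').result(binding)
  end
end

# Hypothetical post-processed partial search results.
nodes = [
  { 'ipaddress' => '10.0.0.5', 'hostname' => 'lb1', 'hostgroups' => 'base,external-lb' },
  { 'ipaddress' => '10.0.0.9', 'hostname' => 'nagios1', 'hostgroups' => 'all' }
]

puts HostsCfgPreview.new(nodes, { 'hostname' => 'nagios1' }).render
```

Only `lb1` gets a `define host` block, since the entry whose hostname matches the current node is skipped.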

[hosts.cfg.erb]: https://github.com/opscode-cookbooks/nagios/blob/master/templates/default/hosts.cfg.erb
[nagios cookbook]: https://github.com/opscode-cookbooks/nagios/
[server recipe]: https://github.com/opscode-cookbooks/nagios/blob/master/recipes/server.rb#L68

So please take partial search for a spin and tell us what you
think. We are particularly interested in whether the
`partial_search` method feels right; that’s why we’re making it
available as a cookbook before baking it into core Chef.

For those of you using the Open Source Chef Server, partial search
will ship with Chef 11.

Finally, if you are using Private Chef, partial search is available as
of version 1.2.2.