Clean Chef code: Depend on public cookbook interfaces

For about a year, we’ve been cleaning up the Chef Infra code for freistilbox to make updating dependencies, Chef versions and even operating systems easier. It’s a lot of work because our early code is functional but not pretty. There have been many instances of “we didn’t know better”, and that’s what refactoring is for. But I also came to realise that we were missing a critical fact: Common software engineering principles and practices apply to infrastructure code like they do to any other type of code.

Or as early Chef developer Joshua Timberman puts it:

“Hey, ya’ll remember when devops really just meant you knew how to write all your bash in ruby instead?”

Ouch. Making this connection earlier would have saved us weeks of work. That’s why I’m going to share my findings in a series of posts.

In this post, I’m going to advocate for treating a Chef cookbook as a unit of software that provides explicit interfaces instead of tempting its users to depend on implementation details.

Chef Infra cookbooks use node attributes as variable parameters for system configuration. In software engineering terms, node attributes are global variables. They’re implementation details of a cookbook. In consequence, using a cookbook’s node attributes for any purpose other than defining setup parameters creates a dependency on implementation details which can change at any time.

For example, it’s common practice to use a search on node attributes for service discovery. As node attributes are global variables, any cookbook can do this:

service_nodes = search(:node, "webservice_id:myservice")

From this code, we can tell that web service nodes are identified by a node attribute named webservice_id; nodes sharing the same webservice_id value belong to the same web service.

The problem with using this information outside the cookbook which provides it is that this particular implementation can change at any time. This kind of tight coupling is a liability. For example, a second attribute webservice_status might get introduced, reducing the node set by adding AND webservice_status:active to the query. Since this change in semantics is not necessarily a breaking change, there’s no simple way like semantic versioning to inform everyone who depends on this unofficial interface.

How about we provide an public API instead? Our web service cookbook could for example provide a class we can use for service discovery:

webservice = Company::Cookbook::Webservice::Discovery.new("myservice")
service_nodes = webservice.nodes

This is easy to implement as a cookbook library. By using namespaces, we make sure that method names don’t conflict. In my practice, I tend to use the camel-cased cookbook name under the namespace Cookbook and the company namespace.

With service discovery encapsulated and hidden behind a public interface, we could even reimplement the cookbook’s service discovery using a different technology like Consul without breaking any code outside our cookbook.

But even if other cookbooks depending on our implementation isn’t a concern, implementing auxiliary logic in a central library instead of scattering it across recipe files makes it much easier to maintain. I’m going to talk about “Plain Old Ruby versus Chef DSL” in a separate post.