Chef Survey 2017: Results

We’re happy to announce that the Chef Survey 2017 results are in. Many thanks to everyone who participated and made their voice heard.

Our survey focused on gaining a better understanding of the trends in productivity, workforce roles, and technology adoption amongst Chef users in our community. In the first few weeks of January we received over 1500 responses. While that’s a small slice of our user base, we received responses from around the world, across all industry verticals. This varied sample pool allows us to get a pulse on the challenges most directly impacting the Chef community.

Here’s a synopsis of what we discovered:

  • Cross-functional teams are likelier to move faster, have greater resiliency, and adopt new technologies faster than traditionally structured teams
  • Managing hybrid infrastructures is the reality for most users
  • Workloads are increasing faster than headcount
  • Compliance automation presents a large opportunity for efficiency gains
  • Speed and efficiency are the biggest target areas for continuous improvement

The changing role of IT organizations

Not surprisingly, DevOps is changing the shape and practices of IT organizations.  The use of automation is on the rise, with more established technologies seeing wider adoption.  Teams that encompass multiple areas of expertise are correlated with a number of performance gains.  Let’s break that down.

Fewer than half of our users describe their teams as infrastructure focused, with the next biggest slice (29%) describing their teams as cross-functional (primarily a mix of app development and infrastructure ops).  In terms of estate coverage, 61% are automating infrastructure, 30% are automating compliance, and only 27% are automating container management.  The majority of respondents (58%) say teams across their companies are using or mostly using standardized tools to automate tasks; however, only 19% of those teams consider the tool-sprawl problem solved.  Cross-functional teams are the most likely to use a common standard for tooling across the company, while security teams are the most likely to use their own tooling.

When looking at measures across the survey, we see statistically significant differences in how cross-functional teams perform against their more siloed peers.  Cross-functional teams are 17% more likely than application teams to release changes to production on a continuous, on-demand basis.  They’re also 23% and 24% more likely than infrastructure and security teams, respectively, to release changes to production on-demand.  Cross-functional teams are the most likely to reduce the time it takes from initial software commit to running that change in production, the most likely to have already completed cloud migration projects, and the most likely to be running containers in their infrastructure.

Hybrid is the new operating model

Whether it’s a move from physical to virtual, datacenter to cloud, or monolithic to microservices, the majority of our users will be living in some form of hybrid operating model for the foreseeable future.

77% of our users are in some phase (migrating or already migrated) of container adoption.  However, the users adopting containers estimate that, on average, only 44% of their infrastructure will be container-based.  95% of respondents indicate they’re in some phase of cloud adoption, with 56% still in the migration process.  Of those, 38% employ a hybrid cloud strategy, 37% favor public-cloud only, and 25% will only use their own private cloud.  Of our users operating in a cloud, on average only 68% of their infrastructure will be cloud-based.  Perhaps surprisingly, 39% of our users are still managing a migration from physical machines to virtual.  Of our users who manage virtual machines (89%), on average only 75% of their infrastructure is virtual.

This signals that many of our users currently operate in heterogeneous environments and plan to keep doing so for the foreseeable future, despite current migrations.  Over a third of respondents are managing multiple migration projects simultaneously.  While a shift toward new technology is in flight for most of our users, the reality is also that legacy infrastructure will continue to be a part of daily operations.

Workloads are increasing faster than headcount

Most respondents see a rise in demand that outpaces the rise in headcount.  In order to stay ahead of rising workloads, teams will have to get more efficient with the resources they already have.

63% of respondents see their workloads increasing, but only 44% expect to see an increase in the size of their development teams.  Development teams are 33% more likely to grow in size in the next year than operations teams.  But perhaps the most telling statistic is that almost half (47%) expect to see the size of their operations teams remain stable or decrease in size, despite the increasing workloads.

Where is all that work coming from?  Unplanned work is a fact of life in IT.  For our respondents, unplanned work accounts for approximately 20% of the work week (i.e., one full day).  Of that time, 42% is spent dealing with deployment failures, 32% is spent dealing with damage from unmanaged change (introduced out-of-band), and 21% is spent re-architecting change already implemented to meet Information Security standards.  Unplanned work always exists; however, these areas of volatility are concerning because they’re mostly avoidable.  Adopting practices that focus on avoiding those pitfalls may be one way to help bridge the expanding gap between workloads and headcount.
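
To make those proportions concrete, the short sketch below converts them into hours per week.  The 40-hour work week is an assumption made purely for illustration (the survey reports only percentages), and the function and variable names are hypothetical.

```python
# Assumption: a 40-hour work week; the survey itself reports only percentages.
HOURS_PER_WEEK = 40
UNPLANNED_SHARE = 0.20  # roughly 20% of the week is unplanned work

# Share of unplanned time by cause, as reported by respondents.
CAUSES = {
    "deployment failures": 0.42,
    "unmanaged (out-of-band) change": 0.32,
    "rework to meet security standards": 0.21,
}


def unplanned_hours(hours_per_week=HOURS_PER_WEEK, share=UNPLANNED_SHARE):
    """Return the weekly hours attributed to each cause of unplanned work."""
    total = hours_per_week * share
    return {cause: round(total * fraction, 2) for cause, fraction in CAUSES.items()}


if __name__ == "__main__":
    for cause, hours in unplanned_hours().items():
        print(f"{cause}: {hours} hours per week")
```

Under that assumption, the three mostly avoidable categories add up to about 7.6 of the 8 unplanned hours each week.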

The opportunity of compliance automation

Most of our community faces a significant burden of work when it comes to compliance solutions.  But a majority haven’t yet automated their way into a better place.

64% of our users have regulatory standards to follow.  Of those users, 73% wait to assess compliance after development work has begun and new features have been implemented.  59% assess compliance once code is already running in production, possibly resulting in additional rework as change is re-architected to meet Information Security standards.  However, even the act of assessing the state of compliance is challenging, with 22% of users making those assessments inconsistently and another 23% not assessing the state of compliance at all.

Compliance policies exist as a way of enforcing application and data security.  The more frequently audits occur and vulnerabilities are remediated, the lower the risk of attackers exploiting known vectors.  75% of our users only assess the state of their compliance policies on a quarterly (or longer) basis, with 46% of those users making assessments at an inconsistent rate.  If vulnerabilities are discovered, fewer than 20% are able to deploy remediations within one day and 55% have inconsistent timeframes or would take weeks to do so.

Automation is consistently the most effective way to scale and stay ahead of increasing workload demands.  Automated compliance testing appears to be a mostly untapped opportunity to both increase security and stay ahead of the expanding gap between workloads and headcount.  In tandem with the infrastructure automation most users have already adopted, compliance automation also makes it possible to create a detect & repair loop with short-feedback cycles that also alleviate some of the unplanned work challenges in the previous section.
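
As a rough illustration of that detect & repair loop, here is a minimal sketch in plain Python.  Everything in it is hypothetical: the setting names, the desired values, and the functions are invented for this example and do not represent any particular tool.  In practice the detect step would typically be a compliance scan and the repair step a configuration management run.

```python
# Hypothetical sketch: the setting names and desired values are invented
# for illustration and do not come from any real tool or policy.
DESIRED = {"ssh_root_login": "no", "password_min_length": 14}


def detect(state, desired=DESIRED):
    """Return the settings whose current value differs from the desired one."""
    return {key: want for key, want in desired.items() if state.get(key) != want}


def repair(state, failures):
    """Return a copy of the state with each failing setting corrected."""
    fixed = dict(state)
    fixed.update(failures)
    return fixed


def detect_and_repair(state, desired=DESIRED):
    """Run one pass of the loop: detect drift, repair it, then re-check."""
    failures = detect(state, desired)
    repaired = repair(state, failures)
    remaining = detect(repaired, desired)
    return repaired, sorted(failures), sorted(remaining)


if __name__ == "__main__":
    node = {"ssh_root_login": "yes", "password_min_length": 14}
    repaired, fixed, remaining = detect_and_repair(node)
    print("fixed:", fixed)
    print("still failing:", remaining)
```

The re-check at the end of each pass is what keeps the feedback cycle short: a failed repair shows up immediately rather than at the next quarterly audit.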

Continuous improvement

At the start of our survey, we asked participants to tell us about their current practices.  At the end, we asked them to identify and rank areas where improvements would make the biggest impact on their overall performance.

As a baseline, about two-thirds indicated they can’t deploy changes to production on-demand, with almost half needing a week or more.  Similarly, for about two-thirds it’s common to have more than a day elapse between an initial software commit and running that code in production.  Respondents indicated change failure rates averaging about 15% when making changes to production (though several outliers exceeded 50%).  When failures occur, most users indicated they recover quickly, but 45% still need four hours or more.

The number one improvement respondents indicated was faster deployment speed, with 49% of respondents having that in their top 3.  Being able to respond to service failures faster was in the top 3 concerns for 63%.  Those concerns were well ahead of lowering the rate of failure when making changes to production.  These results suggest that recovering from failure quickly is more important to our users than preventing failures altogether, which may be seen as inevitable.

We also asked users to provide freeform responses around other possible improvements: both for team performance gains and to create more humane work environments. Surprisingly, the most common response for both categories centered around increasing automation, with performance gains also referencing better deployment solutions as a close second.  That suggests we should continue to see a rise in the use of all types of automation and an emphasis on deployment workflows.

Conclusion

For me, the interesting part of this data set is that we’re seeing both current challenges and potential solutions.  We see not only that maintaining flexibility is an essential part of any solution that manages our hybrid infrastructure and applications, but also that teams that bring the same hybrid approach to their own mix of expertise lead the pack when it comes to addressing these challenges.  Working at Chef, I’ve always been a believer in the power of automation.  However, combing through the responses and coaxing out trends in this data has given me an entirely new appreciation for just how many challenges can and will be solved as we extend the reach of automation throughout IT organizations.  At Chef, we’ve found this data extremely pertinent and we hope this provides clarity and a point of focus for our community as well.

George Miranda

Former Chef Employee