Jesse Robbins interview on DevOps Cafe #19 (w/ full transcript!)

Damon Edwards and John Willis

DevOps Cafe Episode 19

TUESDAY, SEPTEMBER 20, 2011 AT 11:30PM


Follow John Willis on Twitter: @botchagalupe
Follow Damon Edwards on Twitter: @damonedwards
Follow Jesse Robbins on Twitter: @jesserobbins

DevOps Cafe #19 (Audio Transcript)

Damon Edwards: Hello, everybody. Welcome to another episode of DevOps Café. I’m Damon Edwards, coming to you today from Los Angeles.

John Willis: This is John Willis, coming from beautiful Acworth, Georgia.

Damon: John, this is Episode 19. We’re almost into our 20s.

John: Man, that 20th is going to have to be a celebration.

Damon: It is a celebration. But not much intro is required, because you’ve been kicking ass on the DevOps drops, right?

John: Yes, I’ve been having a lot of fun. I joke that I use Instapaper, where you save things to read later, and I have these queues of things to read later that I never read. So now the DevOps drops force me to read them.

Damon: Awesome, yes. So everybody knows what’s going on with you, and we’re both going to be together at PuppetConf next week, right?

John: That’s right.

Damon: What’s the date on that? Do we know? It’s like the 20-something-th? I probably should know that, but…

Damon: And then on this coming Monday – whatever that would be – the 19th, that’s how professional we are. We’ll be giving a talk at the Large-Scale Production Engineering Meetup, which is a really cool meetup they run over at Yahoo in their big URL’s Café. This time it’s going to be on a Monday night. But more importantly, we’ve got a guest today, and he’s already on the line with us, Jesse Robbins from Opscode. How you doing, Jesse?

Jesse Robbins: Hey, guys. Good to be here.

John: Yes, awesome, Jesse. So Jesse, I’ve wanted to get you on a call here, because, as I was joking with you beforehand, I wind up quoting you everywhere I go now. Every time I say, you know what, my [inaudible 01:53] Jesse would say this, and it’s great.

One of the stories that I loved was from the presentation you gave at Boston DevOps on cultural hacks and the bridge between operations and development. I wanted you to pull out some of the gems that I saw in there, and one was what I call the greatest misaligned-incentives story ever told. And just to cue up another one for you: you talked a little bit about how you have to let your boss be your champion, and make sure your boss understands what you’re doing so that your boss can explain it. Those are two of many really cool points you made in that presentation.

Damon: Yes, and Jesse, I think for the listeners who don’t know you as well as we do, maybe it’d be kind of cool to tell just a little bit about what your history was and sort of how you arrived at Opscode. Help set the stage. It’ll probably add a lot more context to those fun stories.

John: For the two people listening who don’t know who you are.

Jesse: It’s funny, I just received an award from the MIT Tech Review, the TR35. And they did a nice profile of me, which I’m actually just going to read, because I’ll let them explain it. But I started off as a firefighter. Actually, I started off working for ISPs when I was in high school, and then I got sick of that. I decided to become a firefighter.

I moved to Seattle to join Seattle Fire, and I applied for a bunch of jobs while I was waiting to go through that process. One was a bus driver in Seattle for King County Transit, and the other one was returning to tech as a senior backup systems engineer at Amazon. Amazon called me back first, and in the decade that followed, as the MIT guys put it, I transformed the way that web companies design and manage complex networks of servers and software.

Damon: Wow.

Jesse: So I think that was a nice summary. But basically, my background is that I have run large-scale web systems since the mid-90s, and it’s been my passion. The route that brought me to Opscode is an interesting one, as any founding story is. Basically, I left Amazon at the end of 2006, in the fourth quarter. I wanted to export a lot of the knowledge that I had gained, knowledge I felt we had collectively been building across the industry.

But really, I wanted to put operations on the map as a crucial and important discipline. At the time, and probably this is still the case, most people really viewed the sysadmin and the operations organization as those guys in the back that no one wanted to talk to.

And instead, it’s clear that we men and women are part of a crucial and important foundation of Internet infrastructure. And it is through our work that the Internet that we know today and all the economic and social change that’s happened as a result of that exists. And I think it’s work that matters.

So I joined O’Reilly Radar, I created the Velocity Web Performance and Operations Conference, and through that I met Adam Jacob, who showed me Chef in its infancy. We shared a lot of really strong opinions about how infrastructure automation could be improved, and about how there weren’t tools that we felt met our needs. I wrote the plan for Opscode along with my co-founders, and we created Opscode, got it funded, launched Chef as a product, launched Opscode Hosted Chef as a product, and have been scaling and growing ever since.

John: Cool. So I was thinking about the misaligned-incentives story. If you want to brush that off, that was a pretty cool story.

Jesse: Sure. So, I was the typical bad-guy operations guy. I like to say “Dr. No.” This goes back before my time at Amazon, but it certainly happened at Amazon, too. At Amazon there are launch posters that you sign at the end of a new product release, and my signature would typically be just the word “no,” and then my name.

And it’s funny, because there are quite a few of those where people would ask for resources and I would try to block them. The reason is a very fundamental disconnect between the business, the development organizations, and the operations organizations. And it comes down to this: I was held accountable for website availability and for the cost of the infrastructure that the web properties I supported were operating on.

So I could only do a bad thing, which was spend money, and I could never get any better than 100 percent availability. Every outage minute meant that I was being penalized, often without a whole lot of control over the operating environment. If the website went down, it wasn’t necessarily my fault.

So this creates this sort of core fundamental misalignment, where an operations organization will happily and in fact is strongly incentivized to block any new change that comes out. And this should sound familiar to everybody who’s listening to this. You want to say no because the operations group is responsible for keeping the site up, and change is what brings the site down. Meanwhile, the business and the dev teams all want you to get stuff pushed out faster.

So, the story that I like to tell is actually about a feature launch. I’m not going to say what it was, but I had an argument with a senior vice president… He may have been just a vice president, I don’t remember. Anyway, he was a guy that I like to call “The VP of Awesome,” because he was responsible for all of the cool things that Amazon launched on the retail side of the business for a period of time.

And he and I had this argument about whether or not the site would go down when this particular feature launched publicly. And I had the power to block, he had the power to override me. And he did, but he did so by saying something profound, which was, the website may go down, but the stock price will go up because we’re rewarded for innovation and for customer delight. And holding things back is not how you build a business.

And I realized pretty much at that moment that I had been on the wrong side of every argument for probably my entire career, when I viewed my job as being stopping people from doing work. And that’s sort of what led the transformation in my career from being the kind of grumpy ops guy who said no, and really liked catching developers screwing up, to the guy who viewed himself as having the first and most important job of serving a business and really trying to make things go faster and better consistently. And so, that’s that anecdote.

John: Cool, yes. And then you said that at that point you changed some of the behavior, forcing developers to wear pagers and things like that.

Jesse: So, one of the core things that I look to – and I talk to a lot of organizations now about DevOps and the cultural shift – is that there are a couple of really simple things people can take to their organizations when they want to export DevOps or import it into their culture. The first is that code that is written and not deployed is money wasted.

One of the things that we looked at in my previous role was, developer productivity is a crucial metric. And this is the metric that sort of makes the DevOps love happen from the ops side. It’s the place where there’s probably a lot of pain, and it’s, how long does it take to deploy code, and how many checks and processes are in the way?

So the critical metric that drives change is pretty simple: how long does it take from code being written to code being deployed in production? When you look at that and start examining why there are all these delays, pretty quickly you find that it’s because we have to build a bunch of support stuff around operations. There are operational acceptance criteria, and all these other things that are basically artifacts of a throw-it-over-the-wall mentality between dev and ops.

And we realized that the fastest and best way of making developers productive was to put them directly on call for their software and let them deploy it themselves. Don’t stand in the way. This is a precursor to continuous deployment, but this all began a number of years ago, before that was a term.

And so, a lot of changes got driven out of that. One of them was that I did in fact hand out thousands of pagers, and we put developers on call for the software that they were responsible for. The operations organization supported them by focusing on the glue and by being the next line on call.

The analogy that I like to use is that we want everyone to know how to use a fire extinguisher, but we don’t expect you to know how to kick down a door and use a big hose. So the operations group shifts from being the people running the show to being the fire department that you call when there’s an emergency, to help contain things. Then you look to other people to restore and improve over time.

Damon: Jesse, you mentioned something interesting there, which is the key metric, or what you see as the key metric: developer productivity. I’m a big believer in the idea that how you improve an organization… Maybe this is not just me. Maybe this is just common wisdom. How you improve an organization is through shared incentives. You give everybody the same goal to target, let them understand how their role impacts that target, and let natural human nature take care of itself from there.

A lot of times when you arrive at a web operations company, they’ve got two piles of metrics. They’ve got their very development-centric metrics. It’s all about the dev team and what they’re churning out. And then you’ve got your ops metrics, which are generally cost and maybe a whole pile of infrastructure stats. That’s what I call information about your servers and all that kind of stuff in an organization.

But very little focus on the fact that this is actually an end to end process, from development all the way through to operations, like you said, that’s really getting it down.

I was wondering if you have any insight on why that happens. At Amazon, you mentioned it was very metrics driven. Was that something that right out of the gate they got right or was there a transformation process to get to that point?

Jesse: I can’t really speak to specifics about Amazon other than to say Amazon is a very metrics heavy culture. They talk a lot about that.

I think that when you look at a company that gets good at web ops, the Oracles, the reason that they get good at web ops is because it’s crucial to their survival. That’s really any business that depends on the web for revenue which is ultimately going to be every business.

First of all, these misalignments occur because of traditional software methodology and mindsets. This is a legacy of our legacy, where the developers wrote code and threw it over the wall to operations, who ran it.

What we’re finding though is that websites, anything that is operating on a constant basis, works much more like a factory. I think we all in the early community, I’ll call it the “velocity community,” began figuring out pretty quickly that there was a lot of prior art here. I like to look to things like the theory of constraints, and the book “The Goal” and stuff that came out of the Toyota production system where rather than viewing engineering as being a separated discipline, it was very tightly integrated with a larger supply chain.

We have come to view websites as factories. We may not be perfect in our application of that, but the reality is that it is not something where there is design and then development to produce something down the road. It is an ongoing process with very tight integration of all the components from engineering to operation and supply which is now our compute providers and hosting companies, network providers, et cetera.

And so, I think the origin of that misalignment is well understood. The reason for the shift is because it’s the only way to compete. So, there’s no major web company that is operating today that is successful using a really broken historically separated dev and ops model. They’re all having to make things tighter and faster and focusing on continuous improvement.

The metrics that you expect to see in an organization really focus on how productive people can be and how efficiently they are able to use resources. If you look at those objectively, you end up breaking down a lot of silos and barriers because those are where those obvious lacks of productivity occur. It’s where the obvious bottlenecks emerge.

Anyway, I think that’s the meta point. When you wonder, “How do these things emerge?” Well, it’s a legacy of the way we developed software prior to the web. “Where are things headed, and why are they all headed in this same direction?” Well, it’s necessary in order to survive.

If it takes you three months to turn around a software feature on a website, you fail. If your website isn’t highly available while you are making those frequent changes, you fail. If it isn’t fast and good from a performance standpoint for all the users, you fail. If it’s not efficient in the way that it operates, you fail.

So, natural selection provides a strong incentive for people to actually improve the way that they do things.

Damon: Right. That’s interesting. I hope that’s a good segue, because I’ll segue to Opscode.

It’s kind of interesting. When we talk about the DevOps movement and DevOps ideas and goals, from the top down we talk a lot about culture. We talk a lot about metrics. We talk a lot about from the process point of view, right?

Yet one of the criticisms of the DevOps movement, which I think is probably at least partially accurate, is that a lot of the excitement and interest goes into the automation tools. Probably rightly so, because that’s where a lot of the innovation, especially the radical innovation, is happening in terms of how it affects the day-to-day life and tooling of a systems administrator, or even a development engineer for that matter. But there’s this idea that we jump right down to the tools.

Opscode, with Chef, is one of the leading tools vendors and innovators in this space. Do you feel like it’s in your purview to help dictate or inform how this end-to-end process should look and what metrics people should be collecting? Do you feel really attached to that, or do you see yourselves more as the arms dealers? Like, “Look, no matter what your conflict is, what your process is, what metrics you’re trying to measure, you’re going to need guns, and we’ve got them for you.” What’s Opscode’s larger interest in the upper levels of the people, process, and tools DevOps stack?

Jesse: We try to generally use Opscode as a force for awesome. Let me explain what we see and how we operate with organizations.

The first thing is that in order to succeed at scale, you need really powerful automation tools. We happen to believe that we’ve built the best one. There are certainly other approaches, but you need an underlying technology set that supports efficiency and scalability and that allows you to realize the benefits of the shift in technology that’s currently underway. Chef is the glue for that.

From our perspective, I think “arms dealer” is probably not the term I would use. I like to think of us as a critical infrastructure provider. There are certainly aspects of it where you get a massive competitive advantage by using a tool like Chef, particularly when you couple it with a service like Opscode Hosted Chef, so you don’t even have to run the Chef server.

But for us, what we tend to see is these organizations say, “Wow. It’s obvious that the big web guys got something right and we need the same tools, power and agility. We need to map that into our organization. What does that mean?”

We explain, “Well, here’s how lots of people are using Chef. Here is how this customer is using it, and here’s how that customer is using it.” What they say is, “Man, I really want to be able to build and deploy a new server in ten minutes. How do you map that to our existing tools and processes?” And we say, “Well, the reality is that you’re in the middle of a big shift, so rather than approaching it from a big top-down ‘let’s throw away everything we’ve been doing previously,’ let’s help you bootstrap a small agile culture.”

We explain how deployment works. We talk about how all the automation works that will support a very productive developer working with awesome sys admins who are helping them automate and scale really, really easily. And a natural product of that is support of the cultural change that’s underway.

If you come to us and say, “We want to use Chef to manage a legacy infrastructure and application,” we’ll certainly help you do that, but the reality is that the previous decade’s infrastructure automation tools are really not what we’re focusing on. We’re focusing on cloud oriented environments. We’re talking about agile environments. We’re talking about environments where people are able to reap the benefits of efficiency and scale that is sort of table stakes now.

That’s what we do with Chef. I think the logical outcome of our work is bigger and bigger organizations are saying, “Wow. There is such a profound difference between what we have done historically and what is possible with Opscode that we’re totally interested in changing in order to be able to do similar things in our organization.”

We see that again and again, including at some large financial institutions with tens of thousands of servers, where they’re saying, “We want the same kind of agility that big web has, but in our environments.” They’re choosing Chef for that.

Basically, there’s a tight coupling between tools and culture. I always like to say, “Culture first,” because if what they want is the same old, same old, if they like taking six months to get a server installed and into production, they probably don’t need a tool like ours. But they’re probably not going to be around very long either.

Damon: Jesse, one of the things we talk a lot about is technical debt in operations. We’ve done a couple of presentations on what we call, “a cloud gone wrong.” One of the things I use a lot, I don’t know if you remember your “Tale of Two Startups” presentation?

Jesse: Yes.

Damon: And so, I always try to explain that. You start off by showing the first 10 days out of a 20-week exercise, or something like that, where one startup puts in a lot of work up front, and then the difference that up-front work makes. I always think of that as the infrastructure, partly infrastructure code. I would say mostly infrastructure code.

Do you want to talk about what we call “technical debt” and what you were trying to say in that “Tale of Two Startups?”

Jesse: Sure. The simple answer is that you do a little bit of work up front to use automation, and interestingly enough, that barrier has gotten lower and lower, at least with Chef.

Damon: Right.

Jesse: Then there’s the work you spend doing the repetitive stuff where you’ve always tried to convince yourself, “OK. Well, I’ll automate in the next step. I’ll automate in the next step. I’ll automate in the next step.” If you’re spending two hours to get a new server into production and you didn’t automate that process, then with the next one you have to do, and the next one, you’re burning days and days and days before you’re even in production, or before you’ve even got a development stack set up.

This is why I talk about it as being table stakes; it’s required to succeed. You have to be able to focus not on the operating system and the deployment, but on what you are supposed to do for your business: the actual business value from which you gain an immediate competitive advantage.

I try to work with organizations and help them figure out where they should be focusing. Particularly as a sysadmin myself, I’ve fetishized my unique ability to build and configure things, that specialized kernel and tooling knowledge that you gain over years. The reality is that the new skill, the new important thing to have, is speed to deployment and time to value, as I’ve become more fond of saying lately.

So when you talk about a start-up or a new project… I’ll actually use a couple of our customers as examples, where you have a developer inside of a large company. They sign up for Hosted Chef and an EC2 or Rackspace account. They are up and running without interacting with their internal IT team, building on top of a scalable foundation in minutes; you can get working with Chef in under 15 minutes.

They’re able to deploy at scale, programmatically and easily, and go pretty seamlessly from their development environment to production without having to recreate all of the glue and one-off scripts and all of the other crap that I feel like I built over and over and over again prior to Opscode, and that we built Opscode largely in order to eliminate.

You now have a developer who is productive in a way that they couldn’t have been before without tools and a programmable infrastructure – infrastructure-level APIs. As a result, they’re able to do something in minutes which previously took weeks. That is so profound when you see it inside of an organization that it’s really actually disruptive at times.

John, when you worked for Opscode, we had a customer, although we won’t say their name, that had a dev and ops organization that started using Chef and EC2, and it freaked out the rest of their organization. That was both because of what was possible and because it had not been possible with any of the tools that they had spent years and years trying to build and improve.

As a result, it’s disruptive within an organization. You’ve had all these people who were saying “no” all the time, and now they’re finally saying “yes,” and it didn’t require internal permission to get there.

Take that tale of two start-ups thing, which predated Opscode, I’ll say. If you’ve got an organization where one guy can do something in minutes and see time to value, from developer writing code to code in production making money, while the guy in the next cube takes two or three weeks to do the same thing, the first guy wins. He wins because he’s got better tools and he’s focusing on the part of the company that matters, the valuable part, which is getting the unique stuff out and into production. He’s not focusing on “How special can I make this server config?”

John: Damon. Damon, let me add just one thing.

Damon: Sure.

John: What’s funny about that company, which we won’t name, is what really drove them over the edge. There were some people who liked Chef, and there were some people who had a legacy infrastructure tool in place. But when they looked at it, their total server farm was under 5,000, and in some of these spin-out projects they were throwing out almost 1,000. They realized they were heading into a world that would choke them. The clothes they were wearing were not warm enough for where they were going camping.

That was, I think, the straw that broke the camel’s back in terms of why they really needed to focus in on Chef. Just looking at a quarter of their infrastructure being spun out for four hours and realizing, “Oh, my God, where is this going to go?” And, “We’ve got to refactor the way we look at automation.” Infrastructure as code.

Jesse: Yeah.

Damon: Actually, my question for Jesse is along those lines. You mentioned competitive advantage, you got… I’m sorry, competitive differentiator or competitive advantage. You mentioned operations as a competitive advantage. You mentioned the need for having people do more value-creating activity. These are all ideas that people who have gotten into the DevOps movement, or who have seen the light and seen what’s possible, understand implicitly, like, “Oh, yeah. I understand. This is a better thing to do.”

Or they’re in a situation, like you and John just mentioned, where they literally see the giant avalanche about to fall on their heads, and that spurs them into action. But the common frustration that I hear a lot from people who are more in the trenches is, “How do I explain to the business that technical operations are a competitive advantage?” It seems like they understand it on the… If you’re in the manufacturing business, you understand that Dell kicked everyone’s ass because operations were their competitive advantage. Same with FedEx and UPS versus the postal service.

Operations matters, but when it comes to the IT space, especially general web operations, a lot of organizations are now entering this space and they don’t really have that… It’s hard to explain to the business that technical operations are a competitive advantage. Do you have any advice for people on how they can have that conversation with the business and open its eyes before it becomes a life-or-death situation?

Jesse: I do. First of all, the hardest part of my advice is that this is a process that takes time. It’s been hard for me in my career when I realized something profound and important that I wanted to tell the world… Fortunately, I’ve been blessed with opportunities to do that more frequently, but when you’re operating within an organization where there’s a lot of cultural resistance, the first thing you have to understand is that change is a continuum, not an event.

I tend to explain a couple of things. In the speech that I gave that John was talking about, I talked about wanting to run back to the office and say, “I learned this new thing called DevOps and it’s going to change everything and make life stop sucking so much. We’ll just give everyone root and we’ll put everyone on call and we’ll deploy our software all the time. Also we’ll make metrics and dashboards and we’ll send them to the CEO. It’s going to be totally awesome.”

You’ll immediately get smacked down, and you’ll actually have built up a bunch of cultural resistance. Now you’re trying to make change happen in an organization. Change is always scary. Change is always hard. It doesn’t matter if it’s obviously the far-better thing for you, because there’s a kind of inertia that you have to overcome.

What I tell people to do, first of all, is not to start from the top down and be really visible… I actually learned this lesson from one of my mentors, Kim Rackmiller, who is another ex-Amazon executive. When I was explaining to Kim some of the frustrations I was having in getting things done as a program manager, she told me that part of my problem was that I was trying to work on the whole company at once, rather than looking for small pockets in which to make change happen.

The simple solution is don’t lead with the solution for the entire company first. You want to start asking questions to explore the problem and make things safe. There’s a great book that I recommend called “Crucial Conversations” that actually explains how to have these very difficult conversations with people about why things aren’t working and what needs to be done. Usually if you’re the person inside of that organization who’s trying to make the change, there’s already a bunch of outages; there’s a bunch of unhappy developers.

There’s a whole bunch of resistance and when you talk about doing something different, people don’t think “Oh, it’s going to make it better,” they think it’s going to make it worse. What I tend to do is lead with questions. I ask things like, “Hey, what are some of the ways in which we could improve this process?” “What are some of the ways that we can make you, the developer, happier?”

When you’re having conversations privately with folks, particularly on the business side, say, “What are you frustrated by?” I guarantee you that what they will say is, “It seems so hard to get the stuff that’s crucial for our business actually done.” Everyone has that problem. “Well, I have some ideas about this. What do you think about this?” And you introduce the conversation slowly and safely.

Ultimately, the best-case scenario is that people will figure out this stuff through the leading questions that you’re asking and they come to it themselves. They realize “What would be better is if we could get stuff deployed faster.”

“Oh, well, what are some of the reasons why we’re not able to do that?”

“Oh, well, we have this change-review process that we have to go through.”

“Why do we have that?”

“Oh, well, one time this guy did this one thing.”

In fact, that’s actually the origin of a lot of terrible enterprise culture. I forget who gave me this example, but a lot of the crap that you end up with in a big company is a result of “Well, one time this person did this, and so we created a process to stop people from ever doing that again.” Usually the cost of that process is way, way, way higher than the actual risk and damage it was meant to prevent.

Anyway, that’s the first thing. Don’t come in with your solution and hope that people are going to fall in love with you. Instead, come in with a question, which is “How can we make things better?” Consistently explore that and maybe nudge a little bit. Ultimately, look for a small project with a bunch of receptive people who you can help make successful. That’s the first thing.

Really, unless you’ve got massive executive buy-in and are able to get everyone moving at the same time on something, it is the only way to begin to make change that actually will work when you’re in the sys admin or developer shoes. You can do things a little bit higher up, but still it’s better to start small and build trust and safety. That’s the first thing.

The second thing, which I always remind people of, is that you can accomplish anything so long as you don’t require credit or compensation. To that end, I suggest that people focus on creating champions. These are the people that are not you. These are the other developers, the sys admins, the business unit heads, the executives within the organization who you can point to and say, “Hey, we’re starting this little project here, but we think it’s pretty exciting. It’s a much better way of doing things, or we think it might be a better way of doing things. Would you help us evangelize that?”

It’s amazing how effective that is. I’ve done that a lot in my availability work, on projects like “Game Day,” which we can talk about another time, or maybe later in this call. But, basically, what you want to do is get a bunch of people spreading the word on your behalf and really seeding the ground ahead of you in order to make things work.

The other piece of that is using a lot of metrics, so you want to start measuring first. I suggest people look at developer productivity: time to deploy as one metric, and time to recover from outages as the other. This gives you what John Allspaw calls “the currency you need for change.” So you start with your small team and your small project, and you measure the difference between the way that things were and the way that things are as you make this series of changes.

As you do that, what you find is you see improvements and you’re able to use that to both arm your champions and to introduce the conversation more broadly. Like, “Hey, we have this different way of doing things. It’s working kind of well for us. Here are some metrics that show it.” That builds a very compelling case for people; one that’s objective, one that doesn’t have you arguing.

Any time you try to introduce the DevOps conversation, you get the same resistance that agile got: “Well, here are all the reasons why this won’t work.” The way to redirect that conversation is “Well, here’s how awesome it is when it does.” I tend not to spend a lot of time trying to argue with people who are entrenched in a position. I try to help them see the light.

The last thing – there are a couple of other pieces – you want to make successes celebrated in public – Oh, sorry, one other thing. The most important champion you can create is your boss. This is something that I have historically been terrible at. Up until very recently, I hadn’t had a boss, other than a board of directors. I now have a boss again. I just hired a new CEO at Opscode, who’s awesome.

The most important thing that you can do is make it so that your boss is able to advocate on your behalf. That’s where those metrics become so important. You need to make it safe for them to provide cover for you in order to get the important work done. They need to be able to tell their people the story of why you’re doing what you’re doing just as well as you can. So, something to keep in mind.

You basically help them tell the story. It’s “We made a change in the way that we’re deploying stuff and it’s resulted in us being able to deploy new features every day. Certainly the first couple of days there was a lot of chaos as we mapped it out, but outages are reduced and the business units are really happy and developers are really happy and the ops people are happy and it’s working well for us. How are things going for you?”

You really want to drive visibility – this is the “celebrate successes” part. You want to make it not a punitive thing where the beatings will continue until website availability improves, but, instead, say, “Look at how awesome it is that we’ve managed to make this change, and this change has really changed our entire business within the smaller team.” That gives people the moment to examine their own stuff and makes it safe for you to have the conversation with them.

The last – this is the secret – is you want to exploit compelling events. Compelling events are things like shifting to the cloud or a major data center outage or a major outage on a website or other things. Every once in a while, something, good or bad, but a major thing happens that makes people receptive to change. It could be a major operating system upgrade. It could be a new version of a deployment technology. It could be just a company-wide mandate to evaluate the cloud.

When I help people figure out how to make change in an organization, usually there is a compelling event that you can either wait for or you can, in some cases, manufacture. That’s the thing that lets you accelerate the loop and spread things out a little bit faster. I found that the bigger the event, the more currency for change you get.

In my previous roles, major outages or problems or issues with data centers allowed me to do a lot more because suddenly people were willing to accept that yep, data centers fail. We need to change our approach or we need to get developers able to deploy code faster because of these reasons.

Similarly, actually within the story of Opscode, there’s this massive shift to the cloud. It’s a once-in-a-decade shift. That literally gave us the currency for change. We were able to get funding because it was clear to a lot of people that things are going to be done differently, both technically and culturally in the next decade. There’s going to be a small number of providers that capture everything. We hope it’s Opscode, but, again, the whole cloud-shift is a massive opportunity for people to change the way they do things.

John: Well, it’s cool. I think we’re running out of time, so I do want to end with one of the things that I think is amazing about you, which is your whole philosophy towards failure. I think we’re going to have to get you back at some point to spend a whole episode on it. I call failure the new black, but I know it’s always been your – what’s your quote? “It’s not tested unless you break it in production.”

Jesse: That’s actually not my quote.

John: Oh, it’s not? I know “failure happens…”

Jesse: My quote is “failure happens and anyone who tells you otherwise is lying.”

John: I’ve heard you say that, though. Or somebody say, “Jesse says it’s not really tested unless we break it in production.”

Jesse: That’s trust but verify. Yeah.

John: But what I do want to know is – we’ve got a few minutes left. You know I’m a close follower and a big fan of Opscode and, full disclosure, I’m a stockholder. Tell us all the great things that are going on there. I’ve seen some really cool stuff over the last couple of months.

Jesse: A couple of trends I think I’ll call out first. The first is that we talk about Chef as being a foundational tool for businesses to succeed at scale, something that every developer, every sys admin, and every organization is able to use. One of the exciting things is that it seems to be growing really fast.

I’d like to highlight our contributor community. We have over 400 contributors now to the Chef project alone and thousands of people registered and using Opscode Hosted Chef. In terms of the exciting stuff that’s happening on our end, what we’re seeing is a lot of big enterprises who have decided that they want Chef and they want it at some pretty impressive scale.

So we launched a product called Opscode Private Chef, which is an on-prem version of Hosted Chef. It’s been an amazing process with the whole team flying around to lots of large, large, large enterprises who are basically ripping out their legacy infrastructure automation stuff and are replacing it with Hosted Chef or with Private Chef.

That’s been pretty awesome. It’s been a little weird, honestly, some of the conversations that we end up in, where it’s tens of thousands of machines. We officially launched Windows support a couple of months ago. That’s been really important for a surprising number of these big, big enterprises that are heterogeneous shops and want to see more there. So we’ve been seeing a lot of interest from the larger providers on Windows.

Those are a couple of exciting things. The other exciting thing we’re seeing a lot is a tooling shift that we talk about occurring across the industry. One of the neat parts about having visibility into Hosted Chef at the scale of 5,000 or so organizations is that we’re seeing a lot of growth on the Rackspace side and with other infrastructure-as-a-service providers, which has been pretty cool.

Then the last interesting thing I’ll talk about is the work, which, John, you had a big hand in, with VMware releasing a bunch of stuff around their cloud PaaS. There’s a bunch of automation with Chef there, some of which is getting released publicly, with more to come very soon. Basically, from an Opscode perspective, the community got bigger and richer.

We’re seeing the momentum accelerate pretty dramatically. The part that I’m excited about is that we’re able to help a lot of big companies where, frankly, I thought it was going to take a lot longer for them to shift. They’ve got these big cloud initiatives, so they’re knocking down our door. It’s been pretty cool to see how ready people are to do something entirely different and that they’re choosing Opscode for that.

John: Very cool.

Damon: Well, hey, Jesse, we’ll let you get back to work; you’ve got a lot of stuff going on. We’ve monopolated – monopolated – monopolized a lot of your time. Monopolated – it’s a new word. I really appreciate it and, as John said, yeah, I’d love to get you back on to talk about the whole failure-is-the-new-black thing and all of that stuff. John, anything else?

John: No, I think it’s good. Great podcast.

Damon: Yeah, great. Thanks a lot Jesse, and everybody else we’ll talk to you next time.

Author Jesse Robbins

Chef Co-Founder & Advisor