DevOps can be interesting: Just read “The Phoenix Project”

I recently finished reading The Phoenix Project, and it was a great read. It’s a novel that teaches DevOps (the intersection of software development and application operations), and it is much more fun to read than you’d expect given the topic. The story makes it an easy read, yet there is real substance: it felt like attending a training course on DevOps best practices. It draws on concepts from ITIL, Lean, and continuous integration (CI), and it recommends books and ideas to study further, much as a mentor would.

If your career is related to software development or deployment, I recommend it. Here are a few quotes that jumped off the page for me:

  • “Remember, it goes beyond reducing WIP [Work in Progress]. Being able to take needless work out of the system is more important than being able to put more work into the system. To do that, you need to know what matters to the achievement of the business objectives, whether it’s projects, operations, strategy, compliance with laws and regulations, security, or whatever.”
  • “Left unchecked, technical debt will ensure that the only work that gets done is unplanned work!”
  • “Unplanned work has another side effect. When you spend all your time firefighting, there’s little time or energy left for planning. When all you do is react, there’s not enough time to do the hard mental work of figuring out whether you can accept new work. So, more projects are crammed onto the plate, with fewer cycles available to each one, which means more bad multitasking, more escalations from poor code, which mean more shortcuts. As Bill said, ‘around and around we go.’ It’s the IT capacity death spiral.”
  • “Everyone knows that in manufacturing, as WIP increases, due-date performance goes down.”
  • “Brent is a worker, not a work center,” I say again. “And I’m betting that Brent is probably a worker supporting way too many work centers. Which is why he’s a constraint.” “Now we’re getting somewhere!” Erik says, smiling. Gesturing broadly at the plant floor below, he says, “Imagine if twenty-five percent of all the work centers down there could only be operated by one person named Brent. What would happen to the flow of work?”
  • “The Third Way is all about ensuring that we’re continually putting tension into the system, so that we’re continually reinforcing habits and improving something. Resilience engineering tells us that we should routinely inject faults into the system, doing them frequently, to make them less painful.”
  • “Studies have shown that practicing five minutes daily is better than practicing once a week for three hours. And if you want to create a genuine culture of improvement, you must create those habits.”
  • “I’m experimenting with putting kanbans around our key resources. Any activities they work on must go through the kanban. Not by e-mail, instant message, telephone, or whatever. “If it’s not on the kanban board, it won’t get done,” she says. “And more importantly, if it is on the kanban board, it will get done quickly. You’d be amazed at how fast work is getting completed, because we’re limiting the work in process. Based on our experiments so far, I think we’re going to be able to predict lead times for work and get faster throughput than ever.”
  • “The wait time is the ‘percentage of time busy’ divided by the ‘percentage of time idle.’ In other words, if a resource is fifty percent busy, then it’s fifty percent idle. The wait time is fifty percent divided by fifty percent, so one unit of time. Let’s call it one hour. So, on average, our task would wait in the queue for one hour before it gets worked. “On the other hand, if a resource is ninety percent busy, the wait time is ‘ninety percent divided by ten percent’, or nine hours. In other words, our task would wait in queue nine times longer than if the resource were fifty percent idle.” I conclude, “So, for the Phoenix task, assuming we have seven handoffs, and that each of those resources is busy ninety percent of the time, the tasks would spend in queue a total of nine hours times the seven steps…”
  • “‘I’ve learned that while the finance goals are important, they’re not the most important. Finance can hit all our objectives, and the company still can fail. After all, the best accounts receivables team on the planet can’t save us if we’re in the wrong market with the wrong product strategy with an R&D team that can’t deliver.’ Startled, I realize he’s talking about Erik’s First Way. He’s talking about systems thinking, always confirming that the entire organization achieves its goal, not just one part of it.”
  • “People think that just because IT doesn’t use motor oil and carry physical packages that it doesn’t need preventive maintenance,” Erik says, chuckling to himself. “That somehow, because the work and the cargo that IT carries are invisible, you just need to sprinkle more magic dust on the computers to get them running again. “Metaphors like oil changes help people make that connection. Preventive oil changes and vehicle maintenance policies are like preventive vendor patches and change management policies. By showing how IT risks jeopardize business performance measures, you can start making better business decisions.”
  • “She created these SOX-404 control documents for the finance team. It shows the end-to-end information flow for the main business processes in each financially significant account. She documented where money or assets entered the system and traced it all the way to the general ledger. “This is pretty standard, but she took it one step further: She didn’t look at any of the IT systems until she understood exactly where in the process material errors could occur and where they would be detected. She found that most of the time, we would detect it in a manual reconciliation step where account balances and values from one source were compared to another, usually on a weekly basis. “When this happens,” he says, with awe and wonder in his voice, “she knew the upstream IT systems should be out of scope of the audit.” “Here’s what she showed the auditors,” John says, excitedly flipping to the second page. “Quote: ‘The control being relied upon to detect material errors is the manual reconciliation step, not in the upstream IT systems.’”
  • “He saw a presentation given by John Allspaw and his colleague Paul Hammond that flipped the world on its head. Allspaw and Hammond ran the IT Operations and Engineering groups at Flickr. Instead of fighting like cats and dogs, they talked about how they were working together to routinely do ten deploys a day! This is in a world when most IT organizations were mostly doing quarterly or annual deployments. Imagine that. He was doing deploys at a rate one thousand times faster than the previous state of the art.”
  • “It’s about continual experimentation, like Scott Cook did at Intuit, where they did over forty experiments during the peak tax filing season to figure out how to maximize customer conversion rates. During the peak tax filing season! “If you can’t out-experiment and beat your competitors in time to market and agility, you are sunk. Features are always a gamble. If you’re lucky, ten percent will get the desired benefits. So the faster you can get those features to market and test them, the better off you’ll be. Incidentally, you also pay back the business faster for the use of capital, which means the business starts making money faster, too.”
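The wait-time rule of thumb in the Phoenix handoff quote is easy to play with yourself. Here is a minimal Python sketch, assuming (as the book's example does) that every resource in the chain is equally busy and that one "unit of time" is an hour; the function names are my own illustrations, not anything from the book. For what it's worth, the busy/idle ratio matches the standard M/M/1 queueing result for average time spent waiting in queue, measured in multiples of the average service time.

```python
def wait_time(percent_busy: int) -> float:
    """Relative time a task waits in queue: '% busy' divided by '% idle'.

    Illustrative helper (hypothetical name); the result is in abstract
    time units, which the book treats as hours.
    """
    if not 0 <= percent_busy < 100:
        raise ValueError("percent_busy must be in the range [0, 100)")
    percent_idle = 100 - percent_busy
    return percent_busy / percent_idle


def total_queue_time(handoffs: int, percent_busy: int) -> float:
    """Total queue time across a chain of handoffs.

    Assumes every resource in the chain has the same utilization.
    """
    return handoffs * wait_time(percent_busy)


if __name__ == "__main__":
    # Wait time grows slowly at first, then explodes as utilization nears 100%.
    for busy in (50, 80, 90, 95, 99):
        print(f"{busy}% busy: wait = {wait_time(busy):g}")
    # The Phoenix example: seven handoffs, each resource 90% busy.
    print(f"7 handoffs at 90% busy: {total_queue_time(7, 90):g} hours in queue")
```

Running it shows why the curve is so punishing: going from 90% to 99% busy raises the wait from 9 to 99 units, and seven handoffs at 90% busy add up to 63 hours spent waiting in queue.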
