Operation Bootstrap

Web Operations, Culture, Security & Startups.

Question: How Did You Build an Automated Service Delivery Pipeline?


Here’s the scenario – you are a development team tasked with making the life of other developers easier. Your organization is breaking a monolithic service into many micro services and the current process for spinning up a new service and all the associated pieces is laborious, error prone, and not elastic at all. Your organization wants a self-service system that allows a team to spin up the scaffolding for a new service in days without being blocked by other teams.

If you have thoughts on this – I have some questions:

  • Have you done this before?
  • How would you describe what you built?
  • More interesting to me, what were the increments?
  • Did you try to use existing tools?
  • What was the team that delivered those increments comprised of (Ops, developers, both)? I know many Ops are developers – but mindsets are probably different.
  • How long did it take?

Keep in mind, I’m not talking about just deploying code, I’m talking about creating an entire dev workflow pipeline from desktop to prod – automatically. The assumption is that if you aren’t doing Continuous Deployment, you are at least doing Continuous Delivery. This probably looks something like a private PaaS at the end of the day – but with automation that extends beyond just spinning up machines to CI, monitoring, everything.

If this story has been told in a blog post you have, or if you just copied what someone else wrote in their blog post, point me that way, but I want to hear about YOUR experience implementing it.

See, I am fairly certain that the answer to this question changes depending on the makeup of the group that built it. Further, I suspect that without the necessary increments – any group will build it wrong (like any software). So for a group that’s been down that road, I’m more interested in the journey than the destination.

Reply via email, in the comments, add links, use the twitters, whatever. I just want to learn – this is not the start of a debate.

Cargo Cult: Devops


We’ve been at this for a few years now, talking about DevOps & what it means to be a DevOps friendly (or insert your term here) organization. For much of this time I’ve held the belief that an organization could change & that by creating examples of awesomeness you could lead a horse to water. Largely, I still think this is true, I still think in an organization absent of examples of what works – creating awesomeness can help people see that something better might be possible.

In tandem with this I’ve watched (and written) countless discussions about what DevOps means. I’ve heard countless definitions of what people think DevOps means which differ from my own definition. I’ve watched organizations create entirely new teams centered around what they understand DevOps to be. The typical charter of these teams centers around working with developers, yes, but also around automation & tooling.

In this process the word/title/team name of “DevOps” has become synonymous with “Operationally focused Development”… or Ops folks who code (sometimes). To me, this is just Operations and Development, but I’m an open minded guy and this post isn’t really about that – if doing this is so different from what you believe Operations is that you need to call it something different, so be it.

For organizations which I observe to be actually embracing the spirit of DevOps as it was originally intended, I find a few things that seem true:

  • Behavior isn’t isolated to Dev & Ops, it happens across the company and is usually promoted as part of the company core values.
  • Behavior is a result of the whole, everyone contributes to making it work and ensuring that it continues to work. They hire (and fire) with this value in mind.
  • Behavior happens because the people involved see it as a means to an end, that working together is how we achieve greatness & no single team can do that.
  • They don’t usually call it DevOps.

Why does that last bullet matter? Because giving it a name doesn’t make it so. Actually giving it a name, I think, removes power from the teams to define how things should work. We already have names for this stuff: Collaboration, Communication, Teamwork. When you call it “DevOps” then I start to wonder what you mean, because it must be different from something I already have a name for.

So today I heard a reference to Cargo Cult, decided to look up this term I’ve used in the past to make sure I was using it correctly, and was struck by how it applies so perfectly to what I see as wrong with the way many folks interpret DevOps.

We’ve seen examples of Cargo Cult in the past. Agile implementations are surely rife with examples of companies implementing a process but not embracing the principles. The world of marketing uses the idea every day to sell you stuff you don’t need:

  • “Installing this IDS will make you more secure” (security engineer not included)
  • “Using Adobe Photoshop will make you produce awesome photos” (ability to use a camera not included)
  • “Buying these jeans will make you look like a movie star” (gym membership, nutritionist, open-schedule definitely not included)

The result you are trying to reproduce is embodied in something that is a subset of what actually produced it. Taking that subset and dropping it into your life doesn’t give you all the things that produced it, you get an empty shell of the thing.

I love to rock climb and spend a fair amount of time at it. Rock Climbers have some observable characteristics – strong hands & upper body, relatively good balance, maybe less sanity than most folks. Many folks first approach climbing thinking that strength is the primary barrier to improvement. They think that to be a good climber they have to get strong. Then you watch some massive muscle-bound gym rat try to climb and you realize that can’t be right.

The reality is that climbers get strong by climbing. Climbers climb because they love the challenge, and they get better by being persistent and having the mental discipline to overcome doubt and fear. You can’t watch a climber and see passion, fear, doubt and their response to it. You can’t read their thoughts and know that, despite that move looking incredibly easy for them, it required very precise movement and exceptional focus and attention. You may not realize that the reason that particular sequence of movements worked for them was because they are 5’10” and have unusually long arms.

Climbers may enjoy the strength benefits of climbing, but if their objective was to become strong, there are more direct means. They become good climbers because they love something more fundamental about it.

In organizations where DevOps works, it isn’t Developers working with Operations that make it work, it’s people wanting to work with other people and the organization encouraging them to find the right solution together that makes it work. Operations seems to work well with Development in these organizations, an observable outcome of the culture, but reproducing that practice in another company isn’t likely to produce the same results. I’d go so far as to say it’s guaranteed to not produce the expected results.

I wrote a long-winded post about what I see as things leading to a functional software development organization. DevOps is not a practice within these things that is singularly important, nor is it a team which is relevant to success. It’s now become a distracting misnomer for a subset of observable traits in successful organizations, few of which contribute to overall success when practiced in isolation. The factors that do contribute to success have been defined for quite a while now: they were defined in Good to Great, they are described in The Phoenix Project, and to a large extent they are at the core of what Agile was intended to be.

If you Cargo Cult DevOps into your organization then you’re just implementing a subset of what successful companies do & you are bound not to see the results you expect, unless your expectations are low.

On the other hand, if your goal is to use it as a hiring tool to clarify to Ops folks that the job you are offering is working on automation, I get it, but wish there was another term for that – because it isn’t DevOps. It’s Operations, or Development, or both. Something we all should be doing anyways.

I’m not really sure this post helps anyone, but it helped me – so thanks for reading.

Why I Infracode


I’ve been involved in, and observed, some recent conversations which have me thinking about why I do what I do. Also, what exactly is it that I do? I was having a discussion about why I enjoy working in the areas I do – which I typically describe as:

  • Config management & Deploy automation
  • Monitoring infra & app integration
  • CI / Build infrastructure

I mentioned that I don’t typically contribute to product features, preferring to focus in these areas. When asked why that was, my first response was – “I suck at algorithms and I never went to school” – both true but inaccurate explanations for why I got where I am. Neither of those things really hold me back from building or contributing to a product. It also doesn’t describe why I’m passionate about what I do.

I’ve never been one to choose to do something because I can’t do what I really want to do. Well, except that if I could jump out of airplanes all day I’d probably do that – totally awesome fun. Instead I rock climb.

I love writing code. I love putting together bits and pieces and building something useful – I’m terrible at finishing coding books because without a practical problem to solve, the incentive just isn’t there. I also get bored pretty quickly. For me, having a stack of things to do is optimal – I move one to the point where I’m blocked and move to something new, I like to focus for periods of time, but then have to switch gears. Occasionally I’ll find something that really gets my attention and it’s like the worst video game addiction – my kids notice, my wife notices, nothing else gets done. That’s rare for me and I am horrible at handling those things other than to just knock it out and get done.

Managing this shifting of priorities & still getting things done is a skill I’ve worked hard to get right. At this point I’m pretty good at it, and struggle working any other way.

Operations has been this for me. In most organizations Ops are the ultimate generalists, both using their own experience to solve problems as well as being adept at engaging other domain experts. We know where the right folks hang out on IRC, we know when to call support and when to just dig in, we learn quickly and are super resourceful. We are exceptionally familiar with the tough problem cycle – initial interest, fear that this is too big, despair that you aren’t going to solve it, the glimmer of hope leading to resolution and awesomeness. I have another post brewing on that specific topic. This isn’t exclusive to Ops, but it’s something you get very used to.

Wait, wasn’t this post about Infracoding?

Yes, the only reason I’m still doing Ops work is because I get to write code. If you were to offer me an Ops job where all I did all day was figure out tough problems for other people to code solutions, I’d tell you to suck it. If you suggest that I can pair with a developer and we can fix it together, I’m happier. If I can get proficient enough to code my own solutions and have other people tell me how to improve what I did, I’d sign up. It’s for this reason that I love the pull request model. I’ve learned from both Sr and Jr team members through PR feedback and it’s a great way for folks to watch and observe what feedback others get.

But along with writing code I get to do other stuff too. Helping to debug strange inconsistencies between monitoring systems – without the ability to code I wouldn’t be able to write the small tools that make testing theories easier. Packaging up and deploying infrastructure, more opportunities to build tooling. Make it easier for Developers to provision their workstations, more coding, more variety. Building networks these days is often pretty limited if you aren’t willing to write code. I know many Ops folks have been doing this for a while, but more and more the % of time an Ops guy spends writing code is increasing, not decreasing. This isn’t universally true, but it’s true of work I am interested in calling Ops.

All along the way, as I get more proficient with coding, as I have become more familiar with patterns you see in production, I can work with developers more closely to help them build better systems. Where developers can definitely get exposed to these same patterns, Ops is pretty directly involved with how things operate in the real world. We get to pull together the resources to make things work and we get to learn in the process. This is the more traditional part of Ops for me, being the guy who helps developers understand the realities of production. But more and more that knowledge gap is closing and Ops are as much an Engineering team as the UI team is. Availability and Agility are features of your product, you have to engineer them in and that requires Developer awareness that is on par with Ops.

Alright Aaron, so what’s your point?

I don’t do feature work because frankly, it seems too focused for me, too specific. Within the Ops role I can go as deep into code as I want but then surface to work on many other things. If I want to go deep on something, there are a bazillion tools out there I can (and sometimes do) contribute to, I can build my own, or not. I get to work with the sharp edges of many tools and find ways to wrap Nerf bits around them. I get to choose when good enough is the enemy of perfection or when perfection is, in fact, the enemy.

I love Ops and every day that role moves closer and closer to awesomeness for me. Although I generally say I don’t know what I want to be when I grow up, I’m pretty sure it’ll look a lot like Ops.

DevopsDays 2013 - We Are Avoiding Culture, Why?


I just got back from Devops Days Austin. It was a really good conference. I think I enjoyed the speakers and Open Spaces at this event more than I did at Devops Days Mountain View last year. Huge props to the organizers and speakers for putting together such a great set of topics.

One thing irked me a bit though, I heard a number of comments about how there was too much talk about culture.

I am confused – but I think I understand.

I attended one Open Space session called “Culture Hacks” which ended up not being so much about hacks as a plea for help. A number of folks expressed concerns about being in a difficult situation where they associate their problems with organizational cultural problems and they were looking for ideas on how to make things better. The suggestions that folks raised, myself included, sounded a lot like typical lean/agile tools – retrospectives, stand-ups, putting developers on-call, communicating metrics, providing incentives, hackathons. I think these are all good suggestions, the problem is that they only go so far. The unfortunate reality is that old saying – you can lead a horse to water, but you can’t make him drink. These are all tools but they don’t solve a problem in an organization that doesn’t want to change.

I think the topic of culture is pretty big and scary for many folks. It’s also very subjective, as John Willis raised in his talk – the culture in which a group of crooks thrive is very different than the culture in which a group of hippies would thrive (John didn’t use Hippies in his example). For each group though, there is a specific culture that allows those individuals to reach their goals in an optimal way. Is one culture right or wrong? Functional or dysfunctional? I think the answer is that if the culture is aligned with the team and makes them perform in an optimal way then it’s good – but it’s good for that group, not for everyone.

As such, if you haven’t defined an objective for your culture, if you’ve hired a mish-mash of people with different objectives and principles then you really aren’t going to find a culture that makes everyone achieve their goals. You can make it better for some, but it’ll likely not be better for others. Getting the folks off the bus who do not align with your desired culture is an important part of a change like this and is not something that an Ops person in one group can do. Here, I think, lies one of the main barriers to DevOps being about culture change – it’s driven by Ops people.

Ops working with Dev is great, we can all do that, but we are individuals who can only set an example and hope the organization follows suit. If they don’t – you can vote with your feet or suck it up, I’m not sure there are many other options. Setting an example often manifests as Ops folks trying to get closer to Dev – the most natural way to do this is to write code, help with release engineering, enable developers to gain access to monitoring, logs, config management, etc. All the tools that we say aren’t actually what Devops is about… These seem to me to be a manifestation of Ops folks doing what they can to get closer to Dev, that’s all.

Of course it isn’t culture, company culture is bigger than this, but it can change a small part of an organization & help set an example for the broader organization. Sometimes it can have a bigger impact – I love this one by John Allspaw as an example. Still, the change was isolated.

I recently read & really liked this TechCrunch article because I feel like it hits on an important point. If I work at a company where the CEO, CTO, COO, VP of Engineering or a variety of other high level positions push a culture that I don’t like – my chances of changing that are small. Further still, there is already a culture which is somewhat defined by the team you’ve hired. Unless you’ve worked very hard to hire for a specific person & team fit then no amount of effort is going to change the organizational culture.

So what is my point with all of this? While Dev & Ops collaboration is important to having a healthy development process – it is so easily undermined by problems with the larger organization. I don’t think that means you do nothing – I just think it helps clarify why talking about culture is hard. I also think having more actionable steps, more examples to replicate would be helpful. Right now the examples we have are of companies who have a good culture throughout – but what about companies who can’t do that? How do I fix my piece? How do I get my 10 person startup pointed in the right direction? How can I build a functional bubble inside a dysfunctional behemoth? What can I do to make my small corner of the world suck less?

There aren’t enough good examples for these questions – and this is why we need to talk about it. Talking about the little bits and pieces that work for you helps – so don’t avoid talking about it because you don’t have all the answers – share what you can. In that Open Space I attended, plenty of folks had little ideas about things that worked for them – they didn’t have a complete solution, but they had some ideas.

I shared my thoughts about how this may look when you start from the ground up and we’re implementing some of these ideas at my current company. Are they all the right things for this company? Probably not – the team will decide. Some of the things that worked at a past company won’t work here – does that mean we are doomed? I don’t think it does.

Culture is very personal – but so are a lot of things that we have some established patterns for. We are homing in on some ideas that work – we need more ideas – and we need to talk about them more. I hope this comes up more at other Devops Days events in the future. It’s an important topic.

Good & Bad Patterns in Development and Operations


As part of my role at a new company I’ve been asked to provide feedback about structuring Dev & Ops as well as what sorts of things work and don’t. I certainly don’t claim to have all the answers, but I’ve seen some very functional and some very dysfunctional organizations. I’ve spent a fair amount of time thinking about what works & why.

Below is a cleaned up version of a message I sent to our CEO who asked for my thoughts on what does and doesn’t work. This was intended as scaffolding for further discussion so I didn’t go into deep details. If you want more details on any particular area just throw some comments out there.

I realize not all these issues are black & white to many folks – there are gray areas. My goal with this message was to drive conversation.

I figure this is probably review to many folks, but maybe it’ll help someone.


First, there are some very simple goals that all these bullets drive toward & they’re somewhat exclusive to SaaS companies:

  • Customers should continuously receive value from Developers as code is incrementally pushed out
  • Developers should get early feedback from customers on changes by enabling features for customers to test
  • We can address problems for customers very quickly – often in a matter of hours
  • We can inspect and understand customer behavior very deeply, gathering exceptional detail about how they use the service.
  • We can swap out components & substantially change the underlying software without the customer knowing (if we do it right)
  • We can measure how happy customers are with changes as we make them based on behavior & feedback

The lists below are what I feel make that possible (Good) and what inhibit it (Bad)

Culture & Communication

Good:

  • Stand ups.
  • Retrospectives
  • Small, self-formed teams (Let folks work on their area of passion)
  • Use Information Radiators whenever possible (Kanban boards, stats on big monitors, etc)
  • Decisions by teams, Leaders facilitate consensus
  • Discovering what doesn’t work is part of finding the right solution, not something to fear.
  • Hackathons allow Developers to do things they are passionate about
  • Hire for personality & team fit first, technical ability second
  • Data driven decisions, strive to have facts to back up decisions.
  • Make the right behavior the easiest thing to do – build a low resistance path to doing the right thing.

Bad:

  • Top down decision making
  • Strict role assignments & Silos
  • Fear of not getting it right the first time
  • Hiring for technical ability thinking team fit will come later
  • Creating process out of fear that makes it difficult to do the right thing.

Eliminate Manual Processes

Good:

  • Continuous Deployment / Delivery
  • Fully automated testing
  • Test Driven Development
  • Fully automated system monitoring, configuration & provisioning
  • Separate Deploy & Release (Feature toggles)
  • Deploy from master, do not branch (Forces particular behaviors)

Bad:

  • Manual testing by a QA Team – sometimes it’s necessary, but should be avoided
  • Deploying off a branch, slows things down & allows for other bad behaviors
  • Writing tests after writing code, code isn’t written with testing in mind
  • Developers relying on other teams to perform tasks that could be automated.
  • Processes that are the result of fear rather than necessary business process.

If it moves, measure it

Good:

  • Collect high resolution metrics about everything you possibly can
  • Developers can add new metrics by pushing new code, do not rely on additional configuration by other teams.
  • Graphs & metrics can be seen by anyone – Developers should rely on these.
  • There should be individuals or teams who are passionate about data visualization & analysis.
  • Dev teams rely on these metrics to make decisions, help identify what metrics are important
  • Developers watch metrics after pushing new code, watch for trend changes (Devs take responsibility for availability)

Bad:

  • Operations has to configure new metrics after developers have added support for them (Manual)
  • Operations monitors metrics & asks Dev teams when they think there’s a problem
  • Developers don’t look at metrics unless something is brought to their attention
  • Code doesn’t expose metrics until someone else asks for it

And here is the long version of all of that…

#1 Culture & Communication

Above all else I consider these most important. I think most problems in other areas of the business can be overcome if you do well in these areas. Rally has been, by far, the best example of a very successful model that I’ve seen in this area. They aren’t unique – there are other companies with similar models & similar successes.

Main points

  • Stand ups. By far the most effective tool for keeping everyone in touch. As teams grow you have to break them apart, so you have a 2nd standup where teams can bring cross-team items to share.
  • Projects are tackled by relatively small, typically self-formed teams. Get individuals who are interested in working in an area together & they feed on each other’s passion.
  • Perform retrospectives. This gives individuals & small groups the ability to voice concerns in a way that fosters resolution. There’s an art to facilitating this but it works well when done right. It also allows recognition of things that are done well.
  • Use open information radiators – it should be easy to see what’s going on by looking at status somewhere vs. having to ask for status, go to meetings, etc. Kanban boards are great for this.
  • Leaders exist to facilitate and help drive consensus but decisions are largely made by teams, not leaders. This makes being a leader harder, but it makes the teams more empowered.
  • Accept that things may not work & the team and company will adjust when things do not work. This makes it easy to try new things & easy for people to vocalize when they think it isn’t working. If it’s hard to change process then people are more resistant to try new things. This goes back to retrospectives for keeping things in check. Also important in this are “spikes” or time boxed efforts explicitly designed to explore possibilities.
  • Give developers time to pursue their own projects for the company. Many awesome features have come out of Hackathons where developers spent their own time to build something they were passionate about.
  • Hire for personality fit first. I have seen many awesome people find a special niche in a company because they grew into a role that you couldn’t hire for – but what made that possible was that they worked well with the team as an individual. Hiring for technical skill also means you lose that skill when that person leaves, I would prefer to have cross-functional teams.
  • Data driven decisions. This helps keep emotion and “I think xyz” out of the discussion & focuses on the data we do and do not have. If we don’t have data we either get more or acknowledge we may not be making the right decision but we’re going to move forward.
  • Make the right thing the easiest thing. I’ve seen too many companies put process out there that makes the “right thing” really difficult, so it gets bypassed. The right thing should be an express train to done – very little resistance and very easy to do. It’s when you start wanting to do things differently that it should become harder, more painful.

Also, everyone owns the quality of the service. This includes availability, performance, user experience, cost to deliver, etc. At my last company, there was exceptional collaboration between Operations, Engineering and Product (and across engineering teams) on all aspects of the service and there was a strong culture of shared ownership & very little finger pointing.

If you want more details on this specific to Rally I wrote a blog post with some more info: Blog Post

#2 Obsessively eliminate manual process – let computers do what they are good at.

This is so much easier to do up front. There should be as little manual process as possible standing between a developer adding value for customers (writing code) and that code getting into production. There may be business process that controls when that feature is enabled for customers – but the act of deploying & testing that code should not be blocked by manual process. I refer to this as separating “Deploy” from “Release” – those are two very different things.
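Separating “Deploy” from “Release” is often done with a feature toggle. The sketch below is only an illustration of the idea: the FeatureToggles class, the "new_checkout" toggle name and the render_checkout function are all made up for this example, and a real system would probably read its toggles from config or a service rather than from memory.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureToggles:
    """Hypothetical in-memory toggle store (an assumption for this sketch)."""
    enabled: set = field(default_factory=set)

    def release(self, name: str) -> None:
        # "Release" is a business decision made after the code is deployed.
        self.enabled.add(name)

    def is_released(self, name: str) -> bool:
        return name in self.enabled


def render_checkout(toggles: FeatureToggles) -> str:
    # The new code path is already deployed, but stays dark until released.
    if toggles.is_released("new_checkout"):
        return "new checkout flow"
    return "old checkout flow"
```

With this shape, pushing the code to production changes nothing for customers; calling release("new_checkout") later is what turns the feature on.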

Testing should only be manual to invalidate assumptions; validating assumptions should be automatic. When we assume that if x is true then y will occur, there should be a test to validate that this is true. Testers should not manually validate these sorts of things unless there is just no way to automate them (rare). Testers are valuable to invalidate assumptions. Testers should be looking at the assumptions made by Developers and helping identify those assumptions that may not always be correct.
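As a rough illustration of “if x is true then y will occur” turned into an automated check, here is a small sketch using Python’s standard unittest module. The shipping_cost function and its free-shipping rule are invented for the example, not taken from any real product.

```python
import unittest


def shipping_cost(order_total: float) -> float:
    """Hypothetical rule: orders of 50 or more ship free, otherwise a flat 5."""
    return 0.0 if order_total >= 50 else 5.0


class ShippingAssumptions(unittest.TestCase):
    def test_large_orders_ship_free(self):
        # Assumption: if the total is at least 50 (x), shipping is free (y).
        self.assertEqual(shipping_cost(50), 0.0)
        self.assertEqual(shipping_cost(120), 0.0)

    def test_small_orders_pay_flat_rate(self):
        # Assumption: anything under 50 pays the flat rate.
        self.assertEqual(shipping_cost(49.99), 5.0)
```

These tests validate what the developer assumed. A tester’s time is better spent asking what was not assumed, for example what should happen with a negative total or a refunded order, and then turning those answers into more tests.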

Too many organizations rely on manual testing because it’s “easier”, but it has some serious drawbacks:

  • You can only change your system as fast as your team can manually test it – which is very slow.
  • Your testing is done by humans who make mistakes and don’t behave predictably so you get inaccurate results.
  • The # of tests will only grow over time, requiring either more humans or more time, or both. It doesn’t scale.

Over time the software quality gets lower, takes longer to test, and the test results become less reliable. This is a death spiral for many companies who eventually find it very hard to make changes due to fear & low confidence in testing.

Avoiding this requires developers spend more time up front writing automated tests. This means developers might spend 60-70% of their time developing tests vs. writing code – this is the cost of doing business if you want to produce high quality software.

That may seem excessive, but the tradeoffs are significant:

  • Much higher code quality which stays high (those tests are always run, so re-introduced bugs (regressions) get caught)
  • Faster developer onboarding, the tests describe how the code should behave and act as documentation.
  • Refactoring code becomes easier because you know the tests describe what it should do.
  • Each commit to the codebase is fully tested, allowing nearly immediate deployment to production if done right.
  • Problems that make it into production feed back into more tests & continually improve code quality.

Much of the time developing tests is spent thinking about how to solve the problem, but you are also writing code with the intent of making it testable. Code is often written differently when the developer knows tests need to pass vs. someone manually testing it. It’s much harder to come along later and write tests for existing code.

You will hear me talk about Continuous Deployment & Continuous Integration – I feel these practices are extremely important to driving the above “good” behaviors. If you strive for Continuous Deployment then everything else falls into place without much disagreement because it has to be that way. This has a lot of benefits beyond what’s listed above:

  • Value can be delivered to customers in days or hours instead of weeks or months
  • Developers can get immediate feedback on their change in production
  • New features can be tuned & tweaked while they are fresh in a developer’s mind
  • You can focus on making it fast to resolve defects, no matter how predictable they are, rather than trying to predict all the ways things might go wrong.
  • Most of the tools and behaviors that enable Continuous Deployment scale to very large teams & very frequent deployments. Amazon is a prime example of this, deploying something, somewhere, about every 11 seconds. Many companies that are in the 30-100 engineer size talk about deploying tens of times per day.
  • This also impacts how you hire QA/Testers. This is a longer discussion, but you want to hire folks who can help during the test planning phase & can help Developers write better tests. Ideally your testers are also developers & work in a way that’s similar to Operations, helping your Developers to be better at their jobs.

#3 If it moves, measure it

I mentioned above, two big advantages a SaaS organization has are the amount it can learn about how customers use the product & the ability to change things rapidly. Both of these require obsessive measurement of everything that is going on so that you know if things are better or worse. Some of these metrics are about user behavior & experience to understand how the service is being used. Other metrics are about system performance & behavior.

The ability to expose some % of your customer base to a new feature & measure their feedback to that is huge. Plenty of companies have perfected the art of A/B testing but at the heart of it is the ability to measure behavior. Similar to testing, the software has to be built in a way which allows this behavior to be measured.
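One common way to expose a fixed % of customers to a feature is to hash each customer into a stable bucket. This is a minimal sketch under that assumption, not a description of how any particular company does it; the in_rollout function and its parameter names are hypothetical.

```python
import hashlib


def in_rollout(customer_id: str, feature: str, percent: int) -> bool:
    """Deterministically place a customer in or out of a percentage rollout."""
    # Hashing feature and customer together gives each feature its own split.
    digest = hashlib.sha256(f"{feature}:{customer_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in the range 0..99
    return bucket < percent
```

Because the bucket depends only on the feature name and the customer id, the same customer gets the same answer on every request, and raising the percentage only ever adds customers to the exposed group. That stability is what makes it possible to compare the behavior of the two groups over time.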

System performance similarly requires a lot of instrumentation to identify changes in trends, to identify problem areas in the application & to verify when changes actually improve the situation.
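To make “instrumentation” a bit more concrete, here is a small sketch of code that exposes its own counters and timings. The Metrics class, the metric names and the handle_login function are all assumptions made for illustration; a real setup would send these values to whatever metrics backend you run rather than keep them in memory.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class Metrics:
    """Tiny in-process registry (hypothetical, for illustration only)."""

    def __init__(self):
        self.counters = defaultdict(int)
        self.timings = defaultdict(list)

    def incr(self, name: str, amount: int = 1) -> None:
        self.counters[name] += amount

    @contextmanager
    def timer(self, name: str):
        # Record elapsed wall-clock seconds even if the wrapped code raises.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.timings[name].append(time.perf_counter() - start)


metrics = Metrics()


def handle_login(user: str) -> bool:
    # The metric ships with the code change; no other team has to configure it.
    with metrics.timer("login.duration_seconds"):
        metrics.incr("login.attempts")
        return bool(user)
```

The point of the design is that adding a metric is just another line in the developer’s commit, so a trend change shows up as soon as the code is deployed.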

I’ve been at too many companies where they simply had no idea how the system was performing today compared to last week to understand if things were better or worse. At my last company I saw a much more mature approach to this measurement which worked pretty well, but it required investment. They had two people fully dedicated to performance analysis & customer behavior analysis.