Work-as-Imagined vs Work-as-Done

With engineering focus on reducing incidents and improving operational reliability, I frequently come back to the realization that humans are fallible and we should be learning ways to nudge people toward success rather than failure.

There are whole industries and research machines built around the study of Human Factors, and how to improve safety, reliability, and quality. One topic that strikes me as extremely useful to software engineering is the pair of concepts Work-as-Imagined (WAI) and Work-as-Done (WAD). Anytime you’ve heard “the system failed because someone executed a process differently than it was documented,” you may have been looking at a WAI vs WAD issue. This comes up a lot in healthcare, manufacturing, and transportation, where accidents can have horrible consequences.

Full disclosure: There are actually many varieties of human work, but WAI and WAD are good enough to make the point. Steven Shorrock covers the subject so well on his blog: Humanistic Systems 

Work-as-Imagined

When thinking about a process or set of tasks that make up work, we need to imagine the steps and work others must do to accomplish the tasks. We do this for many good reasons, like scheduling, planning, and forecasting. WAI is usually formed by past experiences of actually doing work. While this is a good starting point, it’s likely the situation, assumptions, and variables are not the same.

To a greater or lesser extent, all of these imaginations – or mental models – will be wrong; our imagination of others’ work is a gross simplification, is incomplete, and is also fundamentally incorrect in various ways, depending partly on the differences in work and context between the imaginer and the imagined. — The Varieties of Human Work

Work-as-Done

Work-as-Done is literally the work people do. It happens in the real world, under a variety of different conditions and variables. It’s hard to document WAD because of the unique situation in which the work was done and the specific adjustments and tradeoffs required to complete the work for a given situation.

In any normal day, a work-as-done day, people:

  • Adapt and adjust to situations and change their actions accordingly
  • Deal with unintended consequences and unexpected situations
  • Interpret policies and procedures and apply them to match the conditions
  • Detect and correct when something is about to go wrong and intervene to prevent it from happening

Mind the Gap

Monitoring the gap between the WAI and the WAD of a given task has been highlighted as an important practice for organizations aiming to achieve high reliability. The gap between WAI and WAD can result in “human error” conditions. We frequently hear about incidents and accidents that were caused by “human error” in a variety of situations:

  • Air traffic near misses at airports
  • Train derailments and accidents
  • Critical computer systems taken offline
  • Mistakes made during medical procedures

It’s natural for us to blame the problem on the gap — people didn’t follow the process — and try to improve reliability and reduce errors by focusing on stricter adherence to WAI. Perhaps unsurprisingly, this results in more rules and processes which can certainly slow down overall productivity, and even increase the gap between WAI and WAD.

Safety management must correspond to Work-As-Done and not rely on Work-As-Imagined. — Can We Ever Imagine How Work is Done?

In recent decades, there has been more focus on WAD: examining why the gap exists, working to align WAI more closely with WAD, and embracing the reality of how the work is done and formalizing it. Instead of optimizing for the way we imagine work is done, we acknowledge the way work is actually done.

Closing the Gap

In my work, production incidents that occur in software systems are an easy area to find WAI vs WAD happening. Incident management and postmortems have best practices that usually involve blameless reviews of what led to the incident. In many cases, the easiest answer to “how can we stop this incident from happening again?” is better documentation and more process.

Modern incident management is focusing more on learning from incidents and less on root-cause analysis. One reason is that incidents rarely happen the exact same way in the future. Focusing on fixing a specific incident yields less value than learning about how the system worked to create the incident in the first place. Learning about how your system works in production is harder, but yields more impact in discovering weak parts of the system.

This section could be an entire book, or at least several posts, so I’ll leave it to you to read some of the links.

Desire Paths

The whole WAI vs WAD discussion reminds me of desire paths, which visually show the difference between the planned and actual outcomes.

Desire paths typically emerge as convenient shortcuts where more deliberately constructed paths take a longer or more circuitous route, have gaps, or are non-existent

Tying together desire paths with WAI & WAD, some universities and cities have reportedly waited to see which routes people would take regularly before deciding where to pave additional pathways across their campuses and walking paths.

Information Flows in Organizations

I’ve had cause to look into research and ideas about the ways information flows within organizations. Discussions about transparency, decision making, empowering teams, and trust seem to intersect at organizational communication and information flows.

One of my favorite people to follow in this space is Gene Kim (Phoenix Project, DevOps Handbook, Accelerate, and DORA Reports). He has done a few podcasts that focused on relevant topics and concluded that you can predict whether an organization is a high performer or a low performer, just by looking at the communication paths of an organization, as well as their frequency and intensity. (Episode 16, @54 min)

Some of these ideas might resonate with you. There are generally two forms of information flows:

  • Slow flows where we need detailed granularity and accuracy of information. Leadership usually needs to be involved in these discussions so communication tends to escalate up and down hierarchies.
  • Fast flows where frequency and speed tend to be more important. These flows occur in the operational realm, where work is executed, and happen directly between teams using existing interfaces.

In the ideal case, a majority of the communication is happening within and between teams using fast flows. Forcing escalation up and down the hierarchy means getting people involved who probably don’t have a solid grasp of the details. Interactions are slow and likely lead to poor decisions. On the other hand, when teammates talk to each other or where there are sanctioned ways for teams to work with each other with a shared goal, integrated problem solving is very fast.

This doesn’t mean all information flows should be fast. There are two phases where slow flows are critical: Upfront planning and Retrospective assessment. Planning and preparation are the activities where we need leaders to be thoughtful about defining the goals and then defining responsibilities and the structures to support them. Later, slow communications come back when we assess and improve our performance and outcomes.

Thinking, Fast and Slow

I want to be clear that fast and slow information flows are different concepts than the fast and slow modes of thinking explored in Daniel Kahneman’s book Thinking, Fast and Slow. The book explores two systems of thinking that drive the way humans make decisions.

  • System 1 (Fast Thinking): This system is intuitive, automatic, and operates quickly with little effort or voluntary control. It’s responsible for quick decisions, habits, and reactions.
  • System 2 (Slow Thinking): This system is deliberate, analytical, and requires effortful mental activity. It’s used for complex computations, learning new information, and solving difficult problems.

Kahneman discusses how these two systems can work together, but sometimes lead to biases and errors in judgment. He talks about how these modes can affect decision-making and offers suggestions for how we can become more aware of these biases to make better decisions.

Obviously another area worth exploring to help understand how organizations can support people to create better outcomes.

Continuously Doing a Thing

Practice makes perfect — Anonymous Parent

A theme that keeps popping up in my world is the idea that how often an action is done correlates with how well it is done:

  • Deploying application and system code
  • Releasing application distributions
  • Triaging issues
  • Testing product behavior
  • Creating objectives
  • Running experiments
  • Executing migrations

A lot has been written about high-performing engineering teams. Accelerate is a great resource for exploring the behaviors of such teams. Frequent deployment is one of the leading indicators and was chosen by the authors as a key metric. With enough practice, deploys become low-risk and low-stress.

Small batches are another trait of successful teams. Performing a deployment more frequently usually means there are fewer changes happening each time. These small batches can actually improve overall quality because fewer changes happen in each cycle.

Rotating a large group of people through activity shifts, like handling issue triage or the application release process, allows the group to share the burden, but there are downsides too. If the activity isn’t part of the group’s primary deliverable, it’s likely not a priority. If there are long stretches of time between any given person taking on the activity there might only be enough time to just do the work, but never think about how to improve the process or tooling. There is no time to become good at the process.

The DevOps Handbook talks a lot about the benefits of shorter feedback loops across many different aspects of engineering organizations. In most situations, shorter feedback loops happen when an activity becomes a more continuous process.

If you have an area that could be improved, maybe you could ask yourself if the process could happen more often.

Being an Effective Engineering Leader

I often wonder if I’m being effective at my job. It might be related to my impostor syndrome, but in engineering management, the signals of effectiveness aren’t always clear. I have some basic, high-level criteria I try to think about every month or so to provide some insight.

Providing a clear direction

Lack of clear direction can sometimes be seen when teams are doing medium-term/quarterly planning. If the objectives aren’t aligned with upper management, it’s probably my fault for not creating clear direction and expected outcomes. Try not to be too prescriptive, but make sure the goals are clearly defined.

Try to have a good narrative for each of these levels:

  • Vision: How the team(s) create impact
  • Mission: Role of the team(s) within the company
  • Objectives: Think about the next year

Shipping what matters

Do the same problems keep coming up? Make sure we are prioritizing the right work. Make sure we are completing the work. Talk to senior engineers about problems that seem to be holding us back.

Focus on capabilities. Keep improving the operational capabilities of the company. Feature projects are built on capabilities.

Maintain a healthy mix of project sizes. Big projects can stall shipping momentum. Make sure big projects are broken into smaller milestones and iterations. Small projects might feel like low impact, but sometimes are just what people are asking to see.

Helping people grow

Have honest conversations about expectations and performance, and provide actionable feedback.

Make space for other people by getting out of the way. For projects and meetings where I’m getting invited as a point-of-contact, look for other people I can delegate the role to.

Surveys and Feedback Loops

Workplaces typically have company-wide engagement surveys to get feedback on many aspects. Those usually have a management section, and this feedback can be a gift. Interpreting feedback in a positive way and not as a personal attack might be a learned skill, but it’s worth learning.

Thanks to Nick DiStefano for a reminder that manager surveys are also a useful way to get regular feedback on how things are going. Manager surveys can happen more frequently than company-wide engagement surveys and are usually more focused at the team level.

Engineering Productivity: Being Actionable

There is a lot of information about engineering productivity out there. No one says it’s easy, but it can be downright difficult to turn the practices you hear about into plans you can put into action. What follows is an example of how we can create an actionable plan to increase our productivity.

Let’s define engineering productivity as how effectively your engineering team can get important and valuable work done.

  • How do you determine important and valuable work? Goals and objectives.
  • How do you effectively get work done? Remove wasted time and effort from the delivery cycle.

Goals, Planning, and Prioritizing

If productivity is an organizational goal, you need to make sure people understand why and how it affects them. You need to communicate the message over and over in as many venues as possible. The more developers understand the goals and the direction, the more engaged they’ll be with the work.

Engineering teams should try to set ambitious goals, focused through the lens of the team’s mission statement. We also try to create measures for defining success — yes, this is the OKR framework, but any goal or strategy planning can be used. We try to keep goals (objectives) and measures (key results) from being project to-do lists. Projects are tasks we can use to move the measures. Goals are bigger than projects. 

Engineers need to clearly understand the importance of their work. Large backlogs of work create decision fatigue about what work to prioritize. Without planning and prioritizing, we can end up with teams that aren’t aligned — the opposite of productivity. Use your goals, even the high-level organizational goals, as a guide to prioritize work.

Removing Wasted Time and Effort

A great resource for exploring engineering performance is the book Accelerate. Based on years of research and collected data (State of DevOps reports), the book sets out to find a way to measure software delivery performance — and what drives it. Some important measures include:

  • Lead Time: time it takes to go from a customer making a request to the request being satisfied.
  • Deployment Frequency: used as a proxy for batch size, since it is easy to measure and typically has low variability. In other words: smaller batches correlate with higher deploy frequency and higher quality.
  • Time to Restore: given software failures are expected, it makes more sense to measure how quickly teams recover from failure.
  • Change Fail Percentage: a proxy measure for quality throughout the process.

Each of these measures could be a goal we want to focus on and improve. Each measure has an impact on our ability to deliver software faster with better quality. Let’s also call out that these measures are somewhat overlapping and interdependent.
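To make the measures a little more concrete, here is a minimal Python sketch of how they might be computed from a list of deployment records. The `Deployment` record, its field names, and the `delivery_measures` function are all hypothetical, invented for illustration. They are not from Accelerate, and a real team would pull this data from its ticketing and deploy tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical record of one change reaching production."""
    requested_at: datetime                  # when the work was requested
    deployed_at: datetime                   # when the change was deployed
    failed: bool = False                    # did the change cause a failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def delivery_measures(deploys: list[Deployment], window_days: int) -> dict:
    """Summarize the four measures over a reporting window of `window_days` days."""
    if not deploys or window_days <= 0:
        raise ValueError("need at least one deployment and a positive window")

    lead_times = [d.deployed_at - d.requested_at for d in deploys]
    failures = [d for d in deploys if d.failed]
    # Only failures that have been resolved contribute to time to restore.
    restores = [d.restored_at - d.deployed_at for d in failures if d.restored_at]

    return {
        "median_lead_time": median(lead_times),
        "deploys_per_day": len(deploys) / window_days,
        "change_fail_pct": 100 * len(failures) / len(deploys),
        "median_time_to_restore": median(restores) if restores else None,
    }
```

The sketch uses medians rather than averages so a single unusually slow change doesn’t dominate the picture; that is a design choice for this example, not a rule.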

Creating an Action Plan

Picking a Goal

As an experiment, let’s take one, Lead Time, and see how we could brainstorm ways to improve it. In a different favorite book, The DevOps Handbook, we’re presented with ways to effect change in Lead Time. A short summary that does not do justice to the depth presented in the book:

  • Reduce toil with automation
  • Reduce number of hand-offs
  • Find and remove non-value time
  • Create fast and frequent feedback loops

Let’s think about what’s involved between filing a ticket to start work and delivering the work to the end user. Many different tasks and activities happen within this cycle. This becomes the scope we can work within. Some high-level things come to mind:

  • Designing
  • Coding
  • Reviewing
  • Testing
  • Bug Fixing
  • Ramping
  • Monitoring

Picking Measures

We should be thinking about ways to measure success and failure of these activities. This should be independent of the work we intend to undertake. We can draw upon the pain and stumbles that have happened in the past. Finding good measurements can be a very hard process itself. Let’s not be unrealistic about our expectations on manual processes — we’re only human and people make mistakes. Think about ways to make it easy to succeed and hard to fail:

  • Find more defects in pre-release than post-release: We’re always going to have bugs, but let’s try to find and fix more of them before releasing.
  • Reduce the times a project gets bumped to next release: This happens a lot and for many different reasons. We should be better at hitting the desired timeline.
  • Reduce the time it takes people to be exposed to a feature release: It can take days or weeks for people to “see” new features appear in the apps when ramping a feature flag. This also makes A/B testing painful.
  • Reduce the times a feature flag is rolled back: Finding problems after we ramp a feature in production is costly, painful, and slows the release of the feature.
  • Reduce time to detect and time to mitigate incidents: We’ll always have breaking incidents, but we need to minimize the disruptions to people using the product. Minutes, not days.
  • Reduce amount of non-value time: It’s hard to say “code should be reviewed in X minutes”, or “bugs should be found in Y hours”, but it’s easier to identify dead-time in those activities.
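As an illustration of the last measure, here is a small Python sketch that finds the dead-time between stages of a single ticket. It assumes each stage is recorded as a hypothetical `(name, start, end)` tuple; the function name and data shape are made up for this example.

```python
from datetime import datetime, timedelta

# Hypothetical shape: (stage name, when work started, when work ended)
Stage = tuple[str, datetime, datetime]


def wait_time_between_stages(stages: list[Stage]) -> dict[str, timedelta]:
    """Return the idle gap between the end of each stage and the start of the next."""
    ordered = sorted(stages, key=lambda stage: stage[1])  # order by start time
    gaps = {}
    for (prev_name, _, prev_end), (next_name, next_start, _) in zip(ordered, ordered[1:]):
        # Overlapping stages have no dead-time, so clamp negative gaps to zero.
        gaps[f"{prev_name} -> {next_name}"] = max(next_start - prev_end, timedelta(0))
    return gaps
```

Summing these gaps across many tickets would show which hand-off tends to sit idle the longest, without having to set a target for how long the work itself should take.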

Brainstorming Projects

With our objective and measures sketched out, let’s think about the activities and tasks we want to change. Some are manual. Many involve multiple teams. There are a lot of hand-offs. Let’s create smaller affinity groups based on the tasks and activities, using the four ways to improve Lead Time listed earlier as the framework.

Reduce toil with automation

  • Fast and continuous integration/UI testing
  • Canary monitoring and alerting
  • Simple hands-off deployments
  • Easy, low-risk feature ramping

Reduce number of hand-offs by keeping cross-functional teams informed and involved

  • Spec and requirement generation
  • Test plan generation and updates
  • Pre-release testing setup

Find and remove non-value time, usually the gaps between stages

  • Fast edit/build/test cycles for developers
  • Timely code reviews
  • All code merges ready for QA next day
  • File new defect tickets ASAP
  • Prioritized pre-release defect tickets
  • Merging green code

Create fast and frequent feedback loops

  • Timely code reviews
  • Fast and continuous integration/UI testing
  • All code merges ready for QA next day
  • File new defect tickets ASAP
  • Fast, short feature ramps
  • Canary monitoring and alerting

This level of grouping is perfect to start brainstorming actual project ideas. We’ve started at an organization-level objective (Increase engineering productivity), focused on a contributing factor (Lead time), and created a nice list of projects that could be used to affect the factor. This is important — we’re not focused on a single large project! We have many potential small, diverse projects. This dramatically increases the probability that we will succeed, to some degree. A single project is an all-or-nothing situation and lowers your probability of success. Most projects fail to complete, for one reason or another.
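The claim about probability can be checked with a little arithmetic. The Python sketch below assumes the projects succeed or fail independently and all have the same chance of success. Both are simplifying assumptions of mine, and the function name is hypothetical.

```python
def chance_of_any_success(p_each: float, n_projects: int) -> float:
    """Probability that at least one of n independent projects succeeds.

    Assumes every project has the same success probability `p_each`.
    """
    if not 0.0 <= p_each <= 1.0:
        raise ValueError("p_each must be between 0 and 1")
    if n_projects < 0:
        raise ValueError("n_projects must not be negative")
    # All n projects fail with probability (1 - p)^n; anything else is a win.
    return 1.0 - (1.0 - p_each) ** n_projects
```

Under those assumptions, if each project has a 30% chance of finishing, one project gives a 30% chance of any improvement, while five small projects give roughly an 83% chance that at least one lands.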

We also see that some idea groups appear multiple times. This allows us to leverage work to create impact in more ways.


If you take anything away from this post, I hope it’s that improving engineering productivity is an actionable goal. We can be systematic and measure results.

Accelerate and The DevOps Handbook cover a lot more than what I’ve presented here. The information on organizational culture and its effects on performance is also very enlightening. I’d recommend both books to anyone who wants to learn more about ways to improve engineering productivity.

Shipping Faster: The Hackday Mentality

We recently held our Summer Hackday at Tumblr and the results were impressive. I started to think about the mentality of a Hackday and how it differs from a more traditional product feature workflow.

It’s amazing what can be accomplished in a day.

Individuals or small groups start planning their projects in the days leading up to the event. On the day of the event, they’re off and running. They have 24 hours to get something working and demo it to the rest of the company.

Dead-end technical approaches are quickly discarded for alternatives, usually something simpler, because the clock is ticking. There is no time for complexity or grand schemes. Get the basics working so you can impress your coworkers.

(Image: the Tumblr Hackometer, which measures completeness & shipping potential)

After the demo presentations, there is always discussion about “how close is this project to shipping?” or “what’s left to do before we could release this project?” or some other notion that we could productize certain projects after a little clean-up work.

Productize — The death knell of the Hackday project. But why? I think it’s scope creep. Scope of the purpose, but also scope of the code.

The constraint of limited time is a gift, forcing or removing decisions which create a better environment for completing the project. Hackday projects are often more aligned with the core purpose of the product as well.

  • Focus on a singular purpose — try to be good at one thing.
  • No time or space for complexity — you can’t build whole new architectures.
  • Built on existing frameworks, patterns, and primitives — it fits into the existing product structure.

The Hackday mentality seems like a better process for building better products. It reminds me of the “fixed time, variable scope” principle from Basecamp’s Shape Up, a book describing their product process. They use six-week time-boxes for any project.

Constraints limit our options without requiring us to do any of the cognitive work. With fewer decisions involved when we’re constrained, we’re less prone to decision fatigue. Constraints can actually speed up development.

Ship faster.

Thoughts on Organizational Culture

I’ve been thinking a lot about why it seems so hard to effect change in organizations. The change I’m referring to could be related to product strategy, processes, or improving engineering / operational excellence.

I’ve come to realize that in many situations our efforts and plans don’t always align with the organization’s culture. When that happens, change is difficult.

I’m using culture here to mean something deeper than espresso machines, foosball tables, and edgy office decor, which are only the visible parts of an organization’s culture. I’m talking about an organization’s beliefs, values, and basic assumptions: the things people take for granted and that guide decisions. These may have started from the founders, but they’ve evolved over time as we praise and recognize specific behavior.

From Edgar Schein’s “Organizational Culture and Leadership”:

The only thing of real importance that leaders do is to create and manage culture. If you do not manage culture, it manages you, and you may not even be aware of the extent to which this is happening.

We need to become aware of the organization’s culture and learn to manage it in the direction of our desired outcomes.

From Schein’s framework for changing culture:

Change creates learning anxiety. The higher the learning anxiety, the stronger the resistance.

  • The only way to overcome resistance is to reduce the learning anxiety by making the learner feel “psychologically safe”.
  • The change goal must be defined concretely in terms of the specific problem you are trying to fix, not as culture change.

I find myself trying to learn what people value about an existing behavior and how it relates to a purpose or mission. If I want to change to a different behavior, I must show a higher value in the new behavior. This can sometimes be easier if I can create an association to our existing basic assumptions.

From a Kellan Elliott-McCrea post:

Culture is what you celebrate. Rituals are the tools you use to shape culture.

Celebrate work and actions that align with strategy. We need to reinforce what we think is important. Reinforcement requires consistent messaging.

  • Create a brief and to the point mission or high-level purpose
  • Establish a few simple & crisp principles that support the mission

Use these as a framework to scope & define objectives and strategy. They also provide a foundation for shaping culture.

Random Thoughts on Team Structure


I’ve written previously about my thoughts on team structure. I’m a fan of product-centric teams — multidisciplinary teams that embed members from functional groups on the same team, all working together to create and ship a software product.

Team Evolution

At some point, a team might grow large enough that you want to split into smaller groups, each with a primary focus. You’re still building a single product, but now you have a collection of product-centric teams working on specific features. How did you get here?

  • Teams get harder to manage and coordinate as they grow in size.
  • Product-drivers feel like it’s a struggle to get development focus on their features.

I’ve been able to work in both situations: single product-centric team, and multiple feature-based teams. My preference is still the single product-based team. The downsides of feature-based teams outweigh the advantages.

  • Teams become silos and stop focusing on the product as a whole.
  • Issues without a clear owner become someone else’s problem.
  • Cross-team communication becomes more difficult as more groups are created.
  • Individual team ambitions inadvertently dilute the primary focus of the product.

Conway’s Law tells us that organizations tend to build products based on the organization’s structure. Using several small teams, with a focus on specific features, will have an effect on the final product. It might not be a desired effect.

Mindful Divisions

I’m not suggesting teams grow beyond 7 to 10 people. There is plenty of literature, and experience, that tells us that would be bad and less efficient. But how you divide teams is important. Some divisions are more natural than others:

  • By platform (Desktop, Android, iOS, Web): Make sure there is some product consistency across platforms.
  • By front-end / back-end: Make sure both sides are part of defining the interaction APIs.
  • By application / UI widgets: Make sure both sides are part of defining the component APIs.

These separations are clean and easier to identify.

Feature Survival

Single product-centric teams bring us back to the issue of product-drivers fighting for development focus. I think this is a good thing.

When features are implemented through a single team, you need to be good at prioritizing. It shouldn’t be easy to add every little feature to the product. By making all features compete for priority, you make sure the best features get the attention.

I believe this makes the product stronger.

Guiding Teams to Outcomes


You work in an organization that sets some high-level goals. Your team might be accountable for some of those goals. However, to hit the goals, you’ll need cooperation from groups outside of your team.

What do you do? How do you get everyone on the path to finishing the shared outcomes?

Situations like this happen a lot. Some ideas:

  • Make sure the path is clearly marked.
  • Make it easy for people to stay on the path.
  • Make it hard for people to go off the path.
  • Be the voice of encouragement.
  • Be the voice of recognition.
  • Assume people want to be on the path, but they might also be busy with other problems.

Managing “friction” can be a useful technique in getting everyone working toward the goals. Try to reduce friction on anything that positively affects getting to the outcomes, but add friction to those things that are negative.

  • Centralize documentation for checklist processes. Better yet, automate as many of the steps as possible. Even better might be to add the manual steps to your automated steps so you only have one true list.
  • Do more checks in your continuous integration (CI) system, especially adding automated tests (unit, integration and performance). Stop regressions ASAP.
  • Make sure the output of your process is being measured and is clearly visible to everyone. Put up monitors with charts and graphs in your open office spaces. Showing progress and trends helps to reinforce the importance of everyone’s role in hitting goals.
  • Add anomaly detection to the measurement data. Don’t count on people to find the problems in real-time.
  • Don’t be surprised if you need to keep repeating the plan.
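As one example of the anomaly detection idea above, here is a minimal Python sketch that flags a data point when it sits far outside the recent history of a metric. The function name, the 7-point window, and the 3-standard-deviation threshold are all assumptions chosen for illustration. Real monitoring tools offer far more robust methods.

```python
from statistics import mean, stdev


def find_anomalies(values: list[float], window: int = 7, threshold: float = 3.0) -> list[int]:
    """Return indexes of values that deviate sharply from the preceding `window` values."""
    if window < 2:
        raise ValueError("window must be at least 2 to compute a standard deviation")

    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: treat any change at all as an anomaly.
            if values[i] != mu:
                anomalies.append(i)
        elif abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies
```

A check like this could run on each new data point and post to a team channel, so nobody has to be watching the chart when something goes wrong.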

Random Management: Unblocking Technical Leadership

I’ve been an Engineering Manager for a while now, but for many years I filled a Developer role. I have done a lot of coding over the years. I still try to do a little coding every now and then. Because of my past as a developer, I could be oppressive to senior developers on my teams. When making decisions, I found myself providing both the management viewpoint and the technical viewpoint. This usually meant I was keeping a perfectly qualified technical person from participating at a higher level of responsibility. This creates an unhealthy technical organization with limited career growth opportunities.

As a manager with a technical background, I found it difficult to separate the two roles, but admitting there was a problem was a good first step. Over the last few years, I have been trying to get better at creating more room for technical people to grow on my teams. It seems to be more about focusing on outcomes for them to target, finding opportunities for them to tackle, listening to what they are telling me, and generally staying out of the way.

Another thing to keep in mind: it’s not just an issue with management. The technical growth track is a lot like a ladder: Keep developers climbing or everyone can get stalled. We need to make sure Senior Developers are working on suitable challenges or they end up taking work away from Junior Developers.

I mentioned this previously, but it’s important to create a path for technical leadership. With that in mind, I’m really happy about the recently announced Firefox Technical Architects Group. It creates challenges for our technical leadership, and roles with more responsibility and visibility. I’m also interested to see if we get more developers climbing the ladder.