Work-as-Imagined vs Work-as-Done

With an engineering focus on reducing incidents and improving operational reliability, I frequently come back to the realization that humans are fallible and that we should be learning ways to nudge people toward success rather than failure.

There are whole industries and research machines built around the study of Human Factors and how to improve safety, reliability, and quality. One concept that struck me as extremely useful to software engineering is Work-as-Imagined (WAI) versus Work-as-Done (WAD). Any time you’ve heard “the system failed because someone executed a process differently than it was documented,” you could have been looking at a WAI vs WAD issue. This comes up a lot in healthcare, manufacturing, and transportation — where accidents can have horrible consequences.

Full disclosure: There are actually many varieties of human work, but WAI and WAD are good enough to make the point. Steven Shorrock covers the subject so well on his blog, Humanistic Systems.

Work-as-Imagined

When thinking about a process or set of tasks that make up work, we need to imagine the steps and work others must do to accomplish the tasks. We do this for many good reasons, like scheduling, planning, and forecasting. WAI is usually formed by past experiences of actually doing work. While this is a good starting point, it’s likely the situation, assumptions, and variables are not the same.

To a greater or lesser extent, all of these imaginations – or mental models – will be wrong; our imagination of others’ work is a gross simplification, is incomplete, and is also fundamentally incorrect in various ways, depending partly on the differences in work and context between the imaginer and the imagined. — The Varieties of Human Work

Work-as-Done

Work-as-Done is literally the work people do. It happens in the real world, under a variety of different conditions and variables. It’s hard to document WAD because each instance of the work happens in a unique situation, with specific adjustments and tradeoffs required to complete it.

On any normal day, a work-as-done day, people:

  • Adapt and adjust to situations and change their actions accordingly
  • Deal with unintended consequences and unexpected situations
  • Interpret policies and procedures and apply them to match the conditions
  • Detect and correct when something is about to go wrong and intervene to prevent it from happening

Mind the Gap

Monitoring the gap between the WAI and the WAD of a given task has been highlighted as an important practice for organizations aiming to achieve high reliability. The gap between WAI and WAD can result in “human error” conditions. We frequently hear about incidents and accidents that were caused by “human error” in a variety of situations:

  • Air traffic near misses at airports
  • Train derailments and accidents
  • Critical computer systems taken offline
  • Mistakes made during medical procedures

It’s natural for us to blame the problem on the gap — people didn’t follow the process — and try to improve reliability and reduce errors by focusing on stricter adherence to WAI. Perhaps unsurprisingly, this results in more rules and processes, which can slow down overall productivity and even increase the gap between WAI and WAD.

Safety management must correspond to Work-As-Done and not rely on Work-As-Imagined. — Can We Ever Imagine How Work is Done?

In recent decades, there has been more focus on WAD: examining why the gap exists, working to align WAI more closely with WAD, and embracing the reality of how the work is done so it can be formalized. Instead of optimizing for the way we imagine work is done, we acknowledge the way work is actually done.

Closing the Gap

In my work, production incidents in software systems are an easy place to find WAI vs WAD in action. Incident management and postmortems have best practices that usually involve blameless reviews of what led to the incident. In many cases, the easiest answer to “how can we stop this incident from happening again?” is better documentation and more process.

Modern incident management focuses more on learning from incidents and less on root-cause analysis. One reason is that incidents rarely happen the exact same way in the future. Focusing on fixing a specific incident yields less value than learning about how the system worked to create the incident in the first place. Learning how your systems work in production is harder, but yields more impact in discovering the weak parts of those systems.

This section could be an entire book, or at least several posts, so I’ll leave it to you to read some of the links.

Desire Paths

The whole WAI vs WAD discussion reminds me of desire paths, which visually show the difference between the planned and actual outcomes.

Desire paths typically emerge as convenient shortcuts where more deliberately constructed paths take a longer or more circuitous route, have gaps, or are non-existent.

Tying desire paths together with WAI and WAD, some universities and cities have reportedly waited to see which routes people take regularly before deciding where to pave permanent walkways across their campuses and public spaces.

Tracking Work is Fundamental

“Developers should only need Github Issues and Pull Requests to do their job” — Why should anyone need more than that to track work?

Small companies and startups have small engineering teams. The amount of effort required to understand ongoing and planned work is low, simply because a small team can’t take on too much and still succeed. Failure weeds out the companies that take on too much, too soon.

Companies succeed and grow, and so do the engineering teams. At some point, multiple engineering teams are created or evolve. Ideally, these teams are self-sufficient and isolated from each other, creating modular and decoupled output. This ideal state rarely lasts and soon cross-team projects start to appear. Teams continue to evolve into product and platform functions, creating more opportunities for cross-team dependencies.

At this point, work can no longer be tracked at the developer level alone. Success requires collaboration and coordination. Companies without a cohesive work-tracking system that spans individual teams start to slow down. Requirements and dependencies become difficult to track and often fail to meet expectations, which leads to rework and churn. Deliverables miss their guesstimated timelines and drag on.

Making work visible is a core attribute of many different methodologies and processes, even the ad-hoc ones. If you don’t have a bird’s-eye view of the engineering work happening at your company, what can you say about your situation? Very little. Try to ascertain the status of a given cross-team project without asking someone. If it takes you longer than 5 minutes, you’re in trouble, and the people you would have asked don’t really know either. All of the work required to figure out a project’s status is wasting people’s time.

Work tracking isn’t hard to introduce and quickly provides value. It doesn’t require any extra work from developers, and it also starts to provide value to team leads, project managers, and senior leadership.

“Not Jira!” The cry goes out across engineering. It doesn’t need to be Jira, but don’t hate a tool for being successful at what it does. Just because most companies don’t put enough effort into running Jira well doesn’t make work tracking tools bad in general. Pick something else — except fucking spreadsheets!

“You die a hero lightweight tool, or you live long enough to become the villain bloated enterprise-ready system.”


See also: Merits of Bug Tracking

Continuously Doing a Thing

Practice makes perfect — Anonymous Parent

A theme that keeps popping up in my world is that how often an action is done correlates with how well the action is done:

  • Deploying application and system code
  • Releasing application distributions
  • Triaging issues
  • Testing product behavior
  • Creating objectives
  • Running experiments
  • Executing migrations

A lot has been written about high-performing engineering teams. Accelerate is a great resource for exploring the behaviors of such teams. Deploy frequency is one of the leading indicators and was chosen by the authors as a key metric. With enough practice, deploys become low-risk and low-stress.

Small batches are another trait of successful teams. Deploying more frequently usually means fewer changes in each deployment, and these small batches can actually improve overall quality, since a small set of changes is easier to review, test, and troubleshoot.

Rotating a large group of people through activity shifts, like handling issue triage or the application release process, allows the group to share the burden, but there are downsides too. If the activity isn’t part of the group’s primary deliverable, it’s likely not a priority. If there are long stretches of time between any given person taking on the activity, there might only be enough time to do the work, but never enough to think about how to improve the process or tooling. There is no time to become good at the process.

The DevOps Handbook talks a lot about the benefits of shorter feedback loops across many different aspects of engineering organizations. In most situations, shorter feedback loops happen when an activity becomes a more continuous process.

If you have an area that could be improved, maybe you could ask yourself if the process could happen more often.

Being an Effective Engineering Leader

I often wonder if I’m being effective at my job. It might be related to my impostor syndrome, but in engineering management, the signals of effectiveness aren’t always clear. I have some basic, high-level criteria I try to think about monthly or so to provide some insight.

Providing a clear direction

Lack of clear direction can sometimes be seen when teams are doing medium-term/quarterly planning. If the objectives aren’t aligned with upper management, it’s probably my fault for not creating clear direction and expected outcomes. Try not to be too prescriptive, but make sure the goals are clearly defined.

Try to have a good narrative for each of these levels:

  • Vision: How the team(s) create impact
  • Mission: The role the team(s) play within the company
  • Objectives: What to focus on over the next year

Shipping what matters

Do the same problems keep coming up? Make sure we are prioritizing the right work. Make sure we are completing the work. Talk to senior engineers about problems that seem to be holding us back.

Focus on capabilities. Keep improving the operational capabilities of the company. Feature projects are built on capabilities.

Maintain a healthy mix of project sizes. Big projects can stall shipping momentum. Make sure big projects are broken into smaller milestones and iterations. Small projects might feel like low impact, but sometimes are just what people are asking to see.

Helping people grow

Have honest conversations about expectations and performance, and provide actionable feedback.

Make space for other people by getting out of the way. For projects and meetings where I’m getting invited as a point-of-contact, look for other people I can delegate the role to.

Surveys and Feedback Loops

Workplaces typically have company-wide engagement surveys to get feedback on many aspects of the organization. Those usually have a management section, and this feedback can be a gift. Interpreting feedback in a positive way, and not as a personal attack, might be a learned skill, but it’s worth learning.

Thanks to Nick DiStefano for a reminder that manager surveys are also a useful way to get regular feedback on how things are going. Manager surveys can happen more frequently than company-wide engagement surveys and are usually more focused at the team-level.

Stability: Smarter Monitoring of Application Crashes

I had posted about the way Tumblr uses time-series monitoring to alert on crash spikes in the Android and iOS applications. Since then, we’ve done a lot of work to reduce the overall volume of crashes. As a result, we created a new problem: it was possible for a handful of people, caught in crash cycles, to cause our stability alerts to trigger.

Once the stability alert is triggered, we typically start looking in the crash logging systems, like Crashlytics or Sentry, to find more information about the crash. We found an increasing number of occurrences where no particular crash could be easily identified as causing the spike.

Getting paged at 2am because of a stability alert is not great, but not finding a crash was even worse.

The problem was the way we were monitoring. Simply watching all crash events wasn’t good enough. We had to start normalizing the events across users. Thankfully, we collect events and not simple ticks. We have rich data in each crash event, including a way to group events coming from the same device.

Here is an example of the two styles of monitoring over the last week:

[Chart: the raw count shows a large spike, but the unique device count shows a normal trend]

That after-midnight Raw Count spike on Friday would have paged our on-call person if we hadn’t changed to alert on the Unique Device Count instead. We still use the Raw Counts to identify issues and investigate, but we don’t alert on them. We can use the high-cardinality events to zero in on the cause of the spike. In this case, two (2) people were having a bad experience using their Cubot Echo devices.
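As a rough illustration of the switch (a minimal sketch, not our actual pipeline; the event fields, window, and threshold below are made up), the difference between the two aggregations looks something like this:

```python
from datetime import timedelta

def crash_metrics(events, window_start, window_end):
    """Return (raw_count, unique_device_count) for crash events in a window.

    Each event is assumed to look like:
    {"device_id": "abc123", "timestamp": datetime(...), ...}
    """
    in_window = [e for e in events
                 if window_start <= e["timestamp"] < window_end]
    raw_count = len(in_window)
    unique_devices = len({e["device_id"] for e in in_window})
    return raw_count, unique_devices

def should_page(events, now, threshold=50):
    """Alert on distinct crashing devices, not raw crash volume, so a few
    devices stuck in a crash loop can't page the on-call person."""
    _, unique_devices = crash_metrics(events, now - timedelta(minutes=10), now)
    return unique_devices > threshold
```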

Since moving to the new alerting metric, we’ve had far fewer after-hours pages, while still being able to focus on the stability of the applications across our user base.

Engineering Productivity: Being Actionable

There is a lot of information about engineering productivity out there. No one says it’s easy, but it can be downright difficult to turn the practices you hear about into plans you can put into action. What follows is an example of how we can create an actionable plan to increase our productivity.

Let’s define engineering productivity as how effectively your engineering team can get important and valuable work done.

  • How do you determine important and valuable work? — goals and objectives.
  • How do you effectively get work done? — remove wasted time and effort from the delivery cycle.

Goals, Planning, and Prioritizing

If productivity is an organizational goal, you need to make sure people understand why and how it affects them. You need to communicate the message over and over in as many venues as possible. The more developers understand the goals and the direction, the more engaged they’ll be with the work.

Engineering teams should try to set ambitious goals, focused through the lens of the team’s mission statement. We also try to create measures for defining success — yes, this is the OKR framework, but any goal or strategy planning can be used. We try to keep goals (objectives) and measures (key results) from being project to-do lists. Projects are tasks we can use to move the measures. Goals are bigger than projects. 

Engineers need to clearly understand the importance of their work. Large backlogs of work create decision fatigue about what work to prioritize. Without planning and prioritizing, we can end up with teams that aren’t aligned — the opposite of productivity. Use your goals, even the high-level organizational goals, as a guide to prioritize work.

Removing Wasted Time and Effort

A great resource for exploring engineering performance is the book Accelerate. Based on years of research and collected data (the State of DevOps reports), the book sets out to find a way to measure software delivery performance — and what drives it. Some important measures, with a toy computation sketch after the list, include:

  • Lead Time: time it takes to go from a customer making a request to the request being satisfied.
  • Deployment Frequency: frequency works as a proxy for batch size since it is easy to measure and typically has low variability. In other words: smaller batches correlate with higher deploy frequency and higher quality.
  • Time to Restore: given software failures are expected, it makes more sense to measure how quickly teams recover from failure.
  • Change Fail Percentage: a proxy measure for quality throughout the process.
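As a toy example (my own sketch, not from the book; the records below are hypothetical), two of these measures could be computed from a list of change records like so:

```python
from datetime import datetime, timedelta

# Hypothetical change records; in practice these would come from your
# ticketing and deployment systems.
changes = [
    {"requested": datetime(2023, 1, 2), "deployed": datetime(2023, 1, 5), "failed": False},
    {"requested": datetime(2023, 1, 3), "deployed": datetime(2023, 1, 4), "failed": True},
    {"requested": datetime(2023, 1, 6), "deployed": datetime(2023, 1, 6), "failed": False},
]

# Lead Time: customer request -> request satisfied (approximated here as deploy time).
lead_times = [c["deployed"] - c["requested"] for c in changes]
avg_lead_time = sum(lead_times, timedelta()) / len(lead_times)

# Change Fail Percentage: share of changes that caused a failure in production.
change_fail_pct = 100 * sum(c["failed"] for c in changes) / len(changes)

print(f"Average lead time: {avg_lead_time}, change fail rate: {change_fail_pct:.0f}%")
```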

Each of these measures could be a goal we want to focus on and improve. Each measure has an impact on our ability to deliver software faster with better quality. Let’s also call out that these measures are somewhat overlapping and interdependent.

Creating an Action Plan

Picking a Goal

As an experiment, let’s take one, Lead Time, and see how we could brainstorm ways to improve it. Another favorite book, The DevOps Handbook, presents ways to effect change in Lead Time. A short summary that does not do justice to the depth presented in the book:

  • Reduce toil with automation
  • Reduce number of hand-offs
  • Find and remove non-value time
  • Create fast and frequent feedback loops

Let’s think about what’s involved between filing a ticket to start work and delivering the work to the end user. Many different tasks and activities happen within this cycle. This becomes the scope we can work within. Some high-level things come to mind:

  • Designing
  • Coding
  • Reviewing
  • Testing
  • Bug Fixing
  • Ramping
  • Monitoring

Picking Measures

We should be thinking about ways to measure the success and failure of these activities, independent of the work we intend to undertake. We can draw upon the pain and stumbles that have happened in the past. Finding good measurements can be a very hard process itself. Let’s not be unrealistic in our expectations of manual processes — we’re only human and people make mistakes. Think about ways to make it easy to succeed and hard to fail:

  • Find more defects in pre-release than post-release: We’re always going to have bugs, but let’s try to find and fix more of them before releasing.
  • Reduce the times a project gets bumped to next release: This happens a lot and for many different reasons. We should be better at hitting the desired timeline.
  • Reduce the time it takes people to be exposed to a feature release: It can take days or weeks for people to “see” new features appear in the apps when ramping a feature flag. This also makes A/B testing painful.
  • Reduce the times a feature flag is rolled back: Finding problems after we ramp a feature in production is costly, painful, and slows the release of the feature.
  • Reduce time to detect and time to mitigate incidents: We’ll always have breaking incidents, but we need to minimize the disruptions to people using the product. Minutes, not days.
  • Reduce amount of non-value time: It’s hard to say “code should be reviewed in X minutes”, or “bugs should be found in Y hours”, but it’s easier to identify dead-time in those activities.

Brainstorming Projects

With our objective and measures sketched out, let’s think about the activities and tasks we want to change. Some are manual. Many involve multiple teams. There are a lot of hand-offs. Let’s create smaller affinity groups of these tasks and activities using the framework from The DevOps Handbook above.

Reduce toil with automation

  • Fast and continuous integration/UI testing
  • Canary monitoring and alerting
  • Simple hands-off deployments
  • Easy low risk feature ramping

Reduce number of hand-offs by keeping cross-functional teams informed and involved

  • Spec and requirement generation
  • Test plan generation and updates
  • Pre-release testing setup

Find and remove non-value time, usually the gaps between stages

  • Fast edit/build/test cycles for developers
  • Timely code reviews
  • All code merges ready for QA next day
  • File new defect tickets ASAP
  • Prioritized pre-release defect tickets
  • Merging green code

Create fast and frequent feedback loops

  • Timely code reviews
  • Fast and continuous integration/UI testing
  • All code merges ready for QA next day
  • File new defect tickets ASAP
  • Fast, short feature ramps
  • Canary monitoring and alerting

This level of grouping is perfect for starting to brainstorm actual project ideas. We’ve started with an organization-level objective (increase engineering productivity), focused on a contributing factor (lead time), and created a nice list of projects that could be used to move that factor. This is important — we’re not focused on a single large project! We have many potential small, diverse projects, which dramatically increases the probability that we will succeed to some degree. A single project is an all-or-nothing situation and lowers your probability of success; most projects fail to complete, for one reason or another.

We also see that some idea groups appear multiple times. This lets us leverage one piece of work to create impact in more than one way.


If you take anything away from this post, I hope it’s that improving engineering productivity is an actionable goal. We can be systematic and measure results.

Accelerate and The DevOps Handbook cover a lot more than what I’ve presented here. The information on organizational culture and its effects on performance is also very enlightening. I’d recommend both books to anyone who wants to learn more about ways to improve engineering productivity.

Integration Testing: Time to Reboot

I tried to push a plan for bootstrapping an automated integration test system for our Android and iOS applications. The plan was based on similar strategies I’d used, or seen used, at other companies. It didn’t fit well with the current situation and workflows. I failed to take those differences into account and the initiative failed.

Developers never saw the value of spending their time on the automated integration tests or changing their workflow for them. Test engineers were overwhelmed by the amount of manual regression testing required for each release. Even with the manual tests, we have gaps in regression coverage, which has led to some severe defects being shipped to users. We are wasting valuable manual testing time on hundreds of manual regression tests that rarely break, when we should be focusing those people on new feature and exploratory testing.

Looking Forward

We want to be able to do automated testing on our iOS and Android client apps, from the simplest “does the application start” smoke test to more complicated tests around critical features and functionality. More test automation means:

  1. Finding bugs faster
  2. Focusing manual testing on high value tasks (new features and exploratory testing)
  3. Shipping releases faster with higher quality

Test engineering is highly motivated to do more integration testing as a way to reduce the number of manual regression test cases. Though they don’t have development experience, those folks want to start creating the tests, so we want to keep the barrier to writing tests very low. As we automate regression tests, we want to focus manual testing on new features, exploratory testing, and ad-hoc edge cases.

Objectives for our automated integration testing reboot:

  • Doesn’t require knowledge of building applications or the languages used to develop the applications.
  • Requires little knowledge of the structure used to build the UI of the applications.
  • Reuse integration testing framework, code, and knowledge across all application platforms.
  • Reduce the amount of manual integration testing as much as possible.

Approach

We intend to use a black-box approach to installing, launching, and driving the applications. The plan includes:

  • Using Python-based Appium scripts as the framework for integration tests (a minimal test sketch follows this list). Python is a good entry-level programming language, and Appium can black-box test both Android and iOS clients, so we can leverage the same language and framework for both mobile platforms.
  • Using emulators and simulators to run smoke and integration tests. They are easy to set up and run locally, while also capable in CI.
  • Running the tests several times a day using CI, but not on each PR. The focus is on reducing manual regression testing while not adding friction to developer workflows.
  • Only sending consistently failing tests to QA for manual verification and ticket filing.
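To make the approach concrete, here’s a minimal sketch of what one of these Python Appium tests might look like (assuming a recent Appium Python client; the capabilities, app path, and element ID are placeholders, not our actual configuration):

```python
from appium import webdriver
from appium.options.android import UiAutomator2Options
from appium.webdriver.common.appiumby import AppiumBy

def test_app_launches():
    """Smoke test: install and launch the app, then check a known screen appears."""
    options = UiAutomator2Options()
    options.device_name = "emulator-5554"      # placeholder emulator
    options.app = "/path/to/app-debug.apk"     # placeholder build artifact

    driver = webdriver.Remote("http://127.0.0.1:4723", options=options)
    try:
        driver.implicitly_wait(10)
        # Hypothetical accessibility ID; real tests target stable identifiers in the UI.
        home = driver.find_element(AppiumBy.ACCESSIBILITY_ID, "home_feed")
        assert home.is_displayed()
    finally:
        driver.quit()
```

Test engineers can copy a test like this and change only the identifiers and steps, which is what keeps the barrier to writing new tests low.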

Milestones

Milestone 1 is about creating a solid foundation for the approach. We’ve completed the proof of concept:

  • Test engineers are building scripts using cross-platform tools — and learning to code.
  • Developers have added a few testing hooks into the clients to allow faster, more robust tests.
  • Python scripts have been created for over 200 integration tests using Appium.
  • Tests are running in CI several times a day.
  • Test engineering created a simple system to send consistent failures to QA (a hypothetical sketch follows this list).
  • Reliability is better than the previous Espresso/XCUITest test suite.
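A hypothetical version of that consistency filter (not the actual system) might only escalate a test to QA after it fails several scheduled runs in a row:

```python
def consistently_failing(run_results, n=3):
    """Return the set of tests that failed in each of the last n CI runs.

    run_results: list of dicts mapping test name -> "pass" or "fail",
    ordered oldest to newest.
    """
    recent = run_results[-n:]
    if len(recent) < n:
        return set()
    failing = None
    for results in recent:
        failed = {name for name, status in results.items() if status == "fail"}
        failing = failed if failing is None else failing & failed
    return failing

# Example: "test_login" failed three runs in a row, "test_feed" was flaky.
runs = [
    {"test_login": "fail", "test_feed": "fail"},
    {"test_login": "fail", "test_feed": "pass"},
    {"test_login": "fail", "test_feed": "fail"},
]
print(consistently_failing(runs))  # {'test_login'}
```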

We’ve already saved several tester-hours a day from manual regression testing.

Milestone 2 is really just expanding the test coverage from only high-priority test cases to medium and even low-priority test cases. We’re also expanding the tooling to support running the Appium tests on both the Alpha and Beta channels, as well as adding self-service support for running on pull requests. Some additional tasks:

  • Get better at controlling feature flags for more deterministic test flows
  • Start mocking API responses for faster testing and fewer variations due to live data
  • Intercept outgoing requests to track and verify more analytics
  • Create a smaller, faster suite for PR testing