Tracking Work is Fundamental

“Developers should only need GitHub Issues and Pull Requests to do their job” — Why should anyone need more than that to track work?

Small companies and startups have small engineering teams. Understanding the ongoing and planned work takes little effort, because a small team simply can’t take on too much and still succeed. Failure weeds out the companies that take on too much, too soon.

Companies succeed and grow, and so do the engineering teams. At some point, multiple engineering teams are created or evolve. Ideally, these teams are self-sufficient and isolated from each other, creating modular and decoupled output. This ideal state rarely lasts and soon cross-team projects start to appear. Teams continue to evolve into product and platform functions, creating more opportunities for cross-team dependencies.

At this point, work can no longer be tracked at the developer level alone. Success requires collaboration and coordination. Companies without a cohesive work tracking system that spans individual teams start to slow down. Requirements and dependencies become difficult to track and often fail to meet expectations, which leads to rework and churn. Deliverables miss their guesstimated timelines and drag on.

Making work visible is a core attribute of many different methodologies and processes, even the ad-hoc ones. If you don’t have a bird’s-eye view of the engineering work happening at your company, what can you say about your situation? Very little. Try to ascertain the status of a given cross-team project without asking someone. If it takes you longer than 5 minutes, you’re in trouble, and the people you would have asked don’t really know either. All of the work required to figure out a project’s status is wasting people’s time.

Work tracking isn’t hard to introduce and it provides value. It doesn’t require adding extra work for developers, and it also starts providing value to team leads, project managers, and senior leadership.

“Not Jira!” The cry goes out across engineering. It doesn’t need to be Jira, but don’t hate a tool for being successful at what it does. Just because most companies don’t put enough effort into running Jira well doesn’t make work tracking tools bad in general. Pick something else — except fucking spreadsheets!

“You die a hero lightweight tool, or you live long enough to become the villain bloated enterprise-ready system.”


See also: Merits of Bug Tracking

Continuously Doing a Thing

Practice makes perfect — Anonymous Parent

A theme that keeps popping up in my world is that how often an action is done correlates with how well it is done:

  • Deploying application and system code
  • Releasing application distributions
  • Triaging issues
  • Testing product behavior
  • Creating objectives
  • Running experiments
  • Executing migrations

A lot has been written about high-performing engineering teams. Accelerate is a great resource for exploring the behaviors of such teams. Deployment frequency is one of the leading indicators and was chosen by the authors as a key metric. With enough practice, deploys become low-risk and low-stress.

Small batches are another trait of successful teams. Performing a deployment more frequently usually means there are fewer changes in each one. These small batches can actually improve overall quality, because each change is easier to review, test, and roll back when something goes wrong.

Rotating a large group of people through activity shifts, like handling issue triage or the application release process, allows the group to share the burden, but there are downsides too. If the activity isn’t part of the group’s primary deliverable, it’s likely not a priority. If there are long stretches of time between any given person taking on the activity, there might only be enough time to do the work, but never to think about how to improve the process or tooling. There is no time to become good at the process.

The DevOps Handbook talks a lot about the benefits of shorter feedback loops across many different aspects of engineering organizations. In most situations, shorter feedback loops happen when an activity becomes a more continuous process.

If you have an area that could be improved, ask yourself whether the process could happen more often.

Being an Effective Engineering Leader

I often wonder if I’m being effective at my job. It might be related to my impostor syndrome, but in engineering management the signals of effectiveness aren’t always clear. I have some basic, high-level criteria I try to think about monthly or so to provide some insight.

Providing a clear direction

Lack of clear direction can sometimes be seen when teams are doing medium-term/quarterly planning. If the objectives aren’t aligned with upper management, it’s probably my fault for not creating clear direction and expected outcomes. Try not to be too prescriptive, but make sure the goals are clearly defined.

Try to have a good narrative for each of these levels:

  • Vision: How the team(s) create impact
  • Mission: Role of the team(s) within the company
  • Objectives: Think about the next year

Shipping what matters

Do the same problems keep coming up? Make sure we are prioritizing the right work. Make sure we are completing the work. Talk to senior engineers about problems that seem to be holding us back.

Focus on capabilities. Keep improving the operational capabilities of the company. Feature projects are built on capabilities.

Maintain a healthy mix of project sizes. Big projects can stall shipping momentum. Make sure big projects are broken into smaller milestones and iterations. Small projects might feel low-impact, but sometimes they are exactly what people are asking to see.

Helping people grow

Have honest conversations about expectations and performance, and provide actionable feedback.

Make space for other people by getting out of the way. For projects and meetings where I’m getting invited as a point-of-contact, look for other people I can delegate the role to.

Surveys and Feedback Loops

Workplaces typically have company-wide engagement surveys to get feedback on many aspects of the organization. Those usually have a management section, and this feedback can be a gift. Interpreting feedback in a positive way, and not as a personal attack, might be a learned skill, but it’s worth learning.

Thanks to Nick DiStefano for a reminder that manager surveys are also a useful way to get regular feedback on how things are going. Manager surveys can happen more frequently than company-wide engagement surveys and are usually more focused at the team-level.

Stability: Smarter Monitoring Application Crashes

I had posted about the way Tumblr uses time-series monitoring to alert on crash spikes in the Android and iOS applications. Since then, we’ve done a lot of work to reduce the overall volume of crashes. As a result, we created a new problem: it was possible for a handful of people, caught in crash cycles, to cause our stability alerts to trigger.

Once the stability alert is triggered, we typically start looking in the crash logging systems, like Crashlytics or Sentry, to find more information about the crash. We found an increasing number of occurrences where no particular crash could be easily identified as causing the spike.

Getting paged at 2am because of a stability alert is not great, but not finding a crash was even worse.

The problem was the way we were monitoring. Simply watching all crash events wasn’t good enough. We had to start normalizing the events across users. Thankfully, we collect events and not simple ticks. We have rich data in each crash event, including a way to group events coming from the same device.
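
As a rough sketch of the difference between the two alerting styles (the event shape and field name here are hypothetical, not our actual schema), counting unique devices instead of raw events keeps a couple of crash-looping devices from paging anyone:

```python
# Minimal sketch: compute both alert metrics from a window of crash events.
# Assumes each event is a dict with a "device_id" field (hypothetical name).
def stability_metrics(crash_events):
    raw_count = len(crash_events)
    unique_devices = len({event["device_id"] for event in crash_events})
    return raw_count, unique_devices

# Two devices stuck in a crash loop generate a big raw spike...
events = [{"device_id": "device-a"}] * 300 + [{"device_id": "device-b"}] * 200
raw, unique = stability_metrics(events)
print(raw, unique)  # 500 raw events, but only 2 unique devices -> no page
```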

Here is an example of the two styles of monitoring over the last week:

raw count shows a large spike, but the unique device count shows a normal trend

That after-midnight Raw Count spike on Friday would have paged our on-call person if we hadn’t changed to alert on the Unique Device Count instead. We still use the Raw Counts to identify issues and investigate, but we don’t alert on them. We can use the high-cardinality events to zero in on the cause of the spike. In this case, two people were having a bad experience using their Cubot Echo devices.

Since moving to the new alerting metric, we’ve had far fewer after-hours pages, while still being able to focus on the stability of the applications across our user base.

Engineering Productivity: Being Actionable

There is a lot of information about engineering productivity out there. No one says it’s easy, but it can be downright difficult to turn the practices you hear about into plans you can put into action. What follows is an example of how we can create an actionable plan to increase our productivity.

Let’s define engineering productivity as how effectively your engineering team can get important and valuable work done.

  • How do you determine important and valuable work? — goals and objectives.
  • How do you effectively get work done? — remove wasted time and effort from the delivery cycle.

Goals, Planning, and Prioritizing

If productivity is an organizational goal, you need to make sure people understand why and how it affects them. You need to communicate the message over and over in as many venues as possible. The more developers understand the goals and the direction, the more engaged they’ll be with the work.

Engineering teams should try to set ambitious goals, focused through the lens of the team’s mission statement. We also try to create measures for defining success — yes, this is the OKR framework, but any goal or strategy planning can be used. We try to keep goals (objectives) and measures (key results) from being project to-do lists. Projects are tasks we can use to move the measures. Goals are bigger than projects. 

Engineers need to clearly understand the importance of their work. Large backlogs of work create decision fatigue about what work to prioritize. Without planning and prioritizing, we can end up with teams that aren’t aligned — the opposite of productivity. Use your goals, even the high-level organizational goals, as a guide to prioritize work.

Removing Wasted Time and Effort

A great resource for exploring engineering performance is the book Accelerate. Based on years of research and collected data (State of DevOps reports), the book sets out to find a way to measure software delivery performance — and what drives it. Some important measures include:

  • Lead Time: time it takes to go from a customer making a request to the request being satisfied.
  • Deployment Frequency: frequency as a proxy for batch size, since it is easy to measure and typically has low variability. In other words: smaller batches correlate with higher deploy frequency and higher quality.
  • Time to Restore: given software failures are expected, it makes more sense to measure how quickly teams recover from failure.
  • Change Fail Percentage: a proxy measure for quality throughout the process.

Each of these measures could be a goal we want to focus on and improve. Each measure has an impact on our ability to deliver software faster with better quality. Let’s also call out that these measures are somewhat overlapping and interdependent.

Creating an Action Plan

Picking a Goal

As an experiment, let’s take one, Lead Time, and see how we could brainstorm ways to improve it. In a different favorite book, The DevOps Handbook, we’re presented with ways to effect change in Lead Time. A short summary that does not do justice to the depth presented in the book:

  • Reduce toil with automation
  • Reduce number of hand-offs
  • Find and remove non-value time
  • Create fast and frequent feedback loops

Let’s think about everything that happens from filing a ticket to start work to delivering the work to the end user. Many different tasks and activities happen within this cycle. This becomes the scope we can work within. Some high-level things come to mind:

  • Designing
  • Coding
  • Reviewing
  • Testing
  • Bug Fixing
  • Ramping
  • Monitoring

Picking Measures

We should be thinking about ways to measure the success and failure of these activities. This should be independent of the work we intend to undertake. We can draw upon the pain and stumbles that have happened in the past. Finding good measurements can be a very hard process itself. Let’s be realistic about our expectations of manual processes — we’re only human and people make mistakes. Think about ways to make it easy to succeed and hard to fail:

  • Find more defects in pre-release than post-release: We’re always going to have bugs, but let’s try to find and fix more of them before releasing.
  • Reduce the times a project gets bumped to next release: This happens a lot and for many different reasons. We should be better at hitting the desired timeline.
  • Reduce the time it takes people to be exposed to a feature release: It can take days or weeks for people to “see” new features appear in the apps when ramping a feature flag. This also makes A/B testing painful.
  • Reduce the times a feature flag is rolled back: Finding problems after we ramp a feature in production is costly, painful, and slows the release of the feature.
  • Reduce time to detect and time to mitigate incidents: We’ll always have breaking incidents, but we need to minimize the disruptions to people using the product. Minutes, not days.
  • Reduce amount of non-value time: It’s hard to say “code should be reviewed in X minutes”, or “bugs should be found in Y hours”, but it’s easier to identify dead-time in those activities.

Brainstorming Projects

With our objective and measures sketched out, let’s think about the activities and tasks we want to change. Some are manual. Many involve multiple teams. There are a lot of hand-offs. Let’s create smaller affinity groups based on the tasks and activities, using the framework above.

Reduce toil with automation

  • Fast and continuous integration/UI testing
  • Canary monitoring and alerting
  • Simple hands-off deployments
  • Easy low risk feature ramping

Reduce number of hand-offs by keeping cross-functional teams informed and involved

  • Spec and requirement generation
  • Test plan generation and updates
  • Pre-release testing setup

Find and remove non-value time, usually the gaps between stages

  • Fast edit/build/test cycles for developers
  • Timely code reviews
  • All code merges ready for QA next day
  • File new defect tickets ASAP
  • Prioritized pre-release defect tickets
  • Merging green code

Create fast and frequent feedback loops

  • Timely code reviews
  • Fast and continuous integration/UI testing
  • All code merges ready for QA next day
  • File new defect tickets ASAP
  • Fast, short feature ramps
  • Canary monitoring and alerting

This level of grouping is perfect to start brainstorming actual project ideas. We’ve started at an organization-level objective (Increase engineering productivity), focused on a contributing factor (Lead time), and created a nice list of projects that could be used to affect the factor. This is important — we’re not focused on a single large project! We have many potential small, diverse projects. This dramatically increases the probability that we will succeed, to some degree. A single project is an all-or-nothing situation and lowers your probability of success. Most projects fail to complete, for one reason or another.

We also see that some idea groups appear multiple times. This allows us to leverage work to create impact in more ways.


If you take anything away from this post, I hope it’s that improving engineering productivity is an actionable goal. We can be systematic and measure results.

Accelerate and The DevOps Handbook cover a lot more than what I’ve presented here. The information on organizational culture and its effects on performance is also very enlightening. I’d recommend both books to anyone who wants to learn more about ways to improve engineering productivity.

Integration Testing: Time to Reboot

I tried to push a plan for bootstrapping an automated integration test system for our Android and iOS applications. The plan was based on similar strategies I’d used, or seen used, at other companies. It didn’t fit well with the current situation and workflows. I failed to take those differences into account and the initiative failed.

Developers never saw the value in spending their time and changing their workflow for the automated integration tests. Test engineers were overwhelmed by the amount of manual regression testing required for each release. Even with the manual tests, we have gaps in regression coverage, leading to some severe defects being shipped to users. We are wasting valuable manual testing time on hundreds of manual regression tests that rarely break, when we should be focusing those people on new feature and exploratory testing.

Looking Forward

We want to be able to do automated testing on our iOS and Android client apps. From the simplest type of “does the application start” smoke test to more complicated tests around critical features and functionality. More test automation means:

  1. Finding bugs faster
  2. Focusing manual testing on high value tasks (new features and exploratory testing)
  3. Shipping releases faster with higher quality

Test engineering is highly motivated to do more integration testing as a way to reduce the number of manual regression test cases. Though they don’t have development experience, those folks want to start creating the tests, so we want to keep the barrier to writing tests very low. As we automate regression tests, we want to focus manual testing on new features, exploratory testing, and ad-hoc edge cases.

Objectives for our automated integration testing reboot:

  • Doesn’t require knowledge of building applications or the languages used to develop the applications.
  • Requires little knowledge of the structure used to build the UI of the applications.
  • Reuse integration testing framework, code, and knowledge across all application platforms.
  • Reduce the amount of manual integration testing as much as possible.

Approach

We intend to use a black-box approach to installing, launching, and driving the applications. The plan includes:

  • Using python-based Appium scripts as the framework for integration tests. Python is a good entry-level programming language and Appium has capabilities to black-box test Android and iOS clients. We’re leveraging the same language and framework for both mobile platforms (see the sketch after this list).
  • Using emulators & simulators to run smoke and integration tests. Easy to set up and run locally, while also capable in CI.
  • Run the tests several times a day using CI, but not on each PR. The focus is on reducing manual regression testing, while not adding friction to developer workflows.
  • Only send consistently failing tests to QA for manual verification and ticket filing.
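
As a sketch of what these tests look like, here is a minimal “does the application start” smoke test. It assumes the Appium Python client (2.x-style capability dict), an Appium server on localhost:4723, and a hypothetical accessibility ID — none of these details come from our actual suite:

```python
# Minimal Appium smoke test sketch. Paths, capabilities, and element IDs
# are placeholders, not values from our real test suite.
from appium import webdriver
from appium.webdriver.common.appiumby import AppiumBy

caps = {
    "platformName": "Android",
    "automationName": "UiAutomator2",
    "app": "/path/to/app-debug.apk",  # placeholder build path
}

driver = webdriver.Remote("http://localhost:4723/wd/hub", caps)
try:
    driver.implicitly_wait(10)
    # Smoke test: the app launched if a known element is on screen.
    dashboard = driver.find_element(AppiumBy.ACCESSIBILITY_ID, "dashboard")  # hypothetical ID
    assert dashboard.is_displayed()
finally:
    driver.quit()
```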

Milestones

Milestone 1 is about creating a solid foundation for the approach. We’ve completed the proof of concept:

  • Test engineers are building scripts using cross-platform tools — and learning to code.
  • Developers have added a few testing hooks into the clients to allow faster, more robust tests.
  • Python scripts have been created for over 200 integration tests using Appium.
  • Tests are running in CI several times a day.
  • Test engineering created a simple system to send consistent failures to QA.
  • Reliability is better than the previous Espresso/XCUITest test suite.

We’ve already saved several tester-hours a day from manual regression testing.

Milestone 2 is really just expanding the test coverage from only high priority test cases to medium and even low priority test cases. We’re also expanding the tooling to support running the Appium tests on both Alpha and Beta channels, as well as self-service support for running on pull requests. Some additional tasks:

  • Get better at controlling feature flags for more deterministic test flows
  • Start mocking API responses for faster testing and fewer variations due to live data (a minimal mock-server sketch follows this list)
  • Intercept outgoing requests to track and verify more analytics
  • Create a smaller, faster suite for PR testing
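
Mocking API responses in a black-box setup usually means pointing the app build at a local server that returns canned payloads. A minimal sketch, assuming Flask and a hypothetical endpoint (the real API and payloads will differ):

```python
# Minimal mock API sketch for deterministic test data. The endpoint and
# payload are hypothetical; the app under test would be pointed at this
# server via a debug setting.
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/api/dashboard")
def dashboard():
    # Canned response so the UI renders the same data on every run.
    return jsonify({"posts": [{"id": 1, "title": "fixture post"}]})

if __name__ == "__main__":
    app.run(port=5000)
```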

Shipping Faster: The Hackday Mentality

We recently held our Summer Hackday at Tumblr and the results were impressive. I started to think about the mentality of a Hackday and how it differs from a more traditional product feature workflow.

It’s amazing what can be accomplished in a day.

Individuals or small groups start planning their projects in the days leading up to the event. On the day of the event, they’re off and running. They have 24 hours to get something working and demo it to the rest of the company.

Dead-end technical approaches are quickly discarded for alternatives, usually something simpler — the clock is ticking. There is no time for complexity or grand schemes. Get the basics working so you can impress your coworkers.

Tumblr Hackometer measures completeness & shipping potential

After the demo presentations, there is always discussion about “how close is this project to shipping?” or “what’s left to do before we could release this project?” or some other notion that we could productize certain projects after a little clean-up work.

Productize — The death knell of the Hackday project. But why? I think it’s scope creep. Scope of the purpose, but also scope of the code.

The constraint of limited time is a gift, forcing or removing decisions and creating a better environment for completing the project. Hackday projects are often more aligned with the core purpose of the product as well.

  • Focus on a singular purpose — try to be good at one thing.
  • No time or space for complexity — you can’t build whole new architectures.
  • Built on existing frameworks, patterns, and primitives — it fits into the existing product structure.

The Hackday mentality seems like a better process for building better products. It reminds me of the “fixed time, variable scope” principle from Basecamp’s Shape Up, a book describing their product process. They use six week time-boxes for any project.

Constraints limit our options without requiring us to do any of the cognitive work. With fewer decisions involved when we’re constrained, we’re less prone to decision fatigue. Constraints can actually speed up development.

Ship faster.

Thoughts on Organizational Culture

I’ve been thinking a lot about why it seems so hard to effect change in organizations. The change I’m referring to could be related to product strategy, processes, or improving engineering / operational excellence.

I’ve come to realize that in many situations our efforts and plans don’t always align with the organization’s culture. When that happens, change is difficult.

I’m using culture here to mean something deeper than espresso machines, foosball tables, and edgy office decor — the visible parts of an organization’s culture. I’m talking about an organization’s beliefs, values, and basic assumptions — the things people take for granted and that guide decisions. These may have started with the founders, but they’ve evolved over time as we praise and recognize specific behaviors.

From Edgar Schein’s “Organizational Culture and Leadership”:

The only thing of real importance that leaders do is to create and manage culture. If you do not manage culture, it manages you, and you may not even be aware of the extent to which this is happening.

We need to become aware of the organization’s culture and learn to manage it in the direction of our desired outcomes.

From Schein’s framework for changing culture:

Change creates learning anxiety. The higher the learning anxiety, the stronger the resistance.

  • The only way to overcome resistance is to reduce the learning anxiety by making the learner feel “psychologically safe”.
  • The change goal must be defined concretely in terms of the specific problem you are trying to fix, not as culture change.

I find myself trying to learn what people value about an existing behavior and how it relates to a purpose or mission. If I want to change to a different behavior, I must show a higher value in the new behavior. This can sometimes be easier if I can create an association to our existing basic assumptions.

From a Kellan Elliott-McCrea post:

Culture is what you celebrate. Rituals are the tools you use to shape culture.

Celebrate work and actions that align with strategy. We need to reinforce what we think is important. Reinforcement requires consistent messaging.

  • Create a brief and to the point mission or high-level purpose
  • Establish a few simple & crisp principles that support the mission

Use these as a framework to scope & define objectives and strategy. They also provide a foundation for shaping culture.

 

Thoughts on Dependency Injection

I hear some strong opinions on dependency injection (DI). I’ve never really thought too much about DI specifically, but it is part of an Inversion of Control strategy, which I think about a lot.

Focus on the developer experience, low-friction maintenance and code health outcomes. What’s important to me:

  • Loosely coupled code
  • Easy to test code
  • Simple code
  • Easy to maintain code

Many folks seem to focus on constructor- or method-based DI. I agree the approach works great for shallow code hierarchies, and I’d argue that loosely coupled, easily testable code requires constructor/method DI. But trying to inject everything across deep call stacks gets painful the deeper you go. It creates friction for developers trying to update code, possibly inhibiting code-health refactors.

Singletons are usually considered pure evil — hiding code details, creating global state, and making it difficult to test code. That said, they work nicely for accessing basic services and configuration from anywhere.

Service locators can sit somewhere in between the pure DI and singletons. Pretty easy to swap concrete and mock services, but you are adding a single dependency wherever it’s used. TBH, I think of things like Dagger as annotation-based service locator tools.
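
To make the contrast concrete, here is a rough Python illustration (the class and registry names are hypothetical): constructor injection makes the dependency explicit and easy to swap in tests, while a service locator is a single registry dependency you ask for everything else, instead of threading objects through deep call stacks.

```python
# Rough sketch contrasting constructor injection with a service locator.
from typing import Protocol

class Logger(Protocol):
    def log(self, msg: str) -> None: ...

class ConsoleLogger:
    def log(self, msg: str) -> None:
        print(msg)

# Constructor injection: the dependency is explicit, so tests can pass a mock.
class Uploader:
    def __init__(self, logger: Logger) -> None:
        self.logger = logger

    def upload(self, path: str) -> None:
        self.logger.log(f"uploading {path}")

# Service locator: one registry dependency instead of injecting everything
# through every layer of a deep call stack.
class Services:
    _registry: dict = {}

    @classmethod
    def register(cls, key: str, impl: object) -> None:
        cls._registry[key] = impl

    @classmethod
    def get(cls, key: str):
        return cls._registry[key]

Services.register("logger", ConsoleLogger())
Uploader(ConsoleLogger()).upload("photo.png")       # explicit injection
Services.get("logger").log("located via registry")  # located on demand
```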

The tl;dr is that code gets complicated, and instead of being too idealistic about the implementation details, know when to be pragmatic and focus on the higher-level objectives. Consider the pros & cons of different approaches. Legacy code is not an ideal situation, but it is one you need to handle. It’s a rare treat when you get to work in brand new code. This means you need to make pragmatic choices.

  • Make the best compromise to increase the testability of your code.
  • Avoid thousand line code changes to add logging or a network check in one place.
  • Keep the code simple and clean, focusing on a code-maintenance point of view.

Don’t blindly follow idealistic dogma. Make choices that deliver the best impact.

Web ADB: Simple Web-based Access to Devices

I’ve had a number of occasions where I needed direct access to an Android device that wasn’t connected to the computer in front of me. I can usually SSH into the remote host machine and use ADB to try to debug the situation. If the simple stuff doesn’t work, I eventually start using ADB screencap to get a look at what’s on the device. If I’m lucky, I can remote desktop to the host. If not, I end up copying the images back to my machine to view them.

Connecting to a remote host with some Android devices attached to it

Surely there must be an easier way.

There is! I found the OpenSTF project, which basically gives you web-based control of Android and iOS devices. Just install the system on the host machine and install an agent on the Android devices. It looks pretty cool, but it always seemed like overkill when I was in a remote debug situation.

So I decided I’d start hacking together a really simple system in Python. I started with the simplest Python API server I could find. Then I added a fairly basic webapp front-end. The result is Web ADB.

It’s a very minimal Python API server, which also serves up a basic single-page webapp. The approach is pretty simple: run ADB commands via Python, parse the output, send the results back through the API response.
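
The core loop looks roughly like this sketch. Flask is an assumption here (the project just needs any small API server), and the routes are illustrative rather than the actual Web ADB endpoints:

```python
# Sketch of the run-adb, parse-output, return-JSON idea behind Web ADB.
import subprocess
from flask import Flask, jsonify

app = Flask(__name__)

def adb(*args):
    # Run an adb command and capture its stdout.
    return subprocess.run(["adb", *args], capture_output=True, text=True).stdout

@app.route("/devices")
def devices():
    # `adb devices` prints a header line, then "<serial>\t<state>" per device.
    lines = adb("devices").strip().splitlines()[1:]
    return jsonify([line.split("\t")[0] for line in lines if line.strip()])

@app.route("/devices/<serial>/reboot", methods=["POST"])
def reboot(serial):
    adb("-s", serial, "reboot")
    return jsonify({"ok": True})

if __name__ == "__main__":
    app.run(port=8000)
```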

The API supports getting attached devices, getting a screenshot of a device, sending key presses and screen taps, and even rebooting a device. The webapp just uses the API to make something useful. Maybe the only cool feature is that clicking on a screenshot sends a tap to the device and then updates the screenshot. I have some ideas for other features, as time permits.