On the Merits of Bug Tracking

Yes, I wrote a whole blog post about bugs. Bugs are boring and managing bugs can be mind-numbing. However, all software has bugs and managing those bugs helps you understand the health and quality of your software, helps you understand the risk associated with new features, and helps you figure out if you’re ready to ship or not.

At some point in their careers, most developers want to fix all the bugs before releasing. This might work for small projects, or in situations where you don’t have a lot of testing. For larger projects, especially as they mature, it’s just not possible to fix all the bugs before releasing a new version, so it’s time to manage your backlog. Creating good bug reports helps reduce the time it takes to manage and fix bugs.

  • Summary – Be explicit and contextual. This is the text that shows up in bug lists. Something too vague, like “Crash when posting”, will force people to open the details every time to figure out the context.
  • Steps to reproduce – Be clear and precise. What was the expected behavior versus the actual behavior? Does it happen all the time, or is it intermittent and hard to reproduce?
  • Description – Is the bug a crash, broken behavior, a performance problem, or a regression? Make sure you add these details. For UI-related issues, add a screenshot or video. Can you provide a minimal test case for the issue?

You can’t fix them all, so it’s time to triage. Bug debt contributes to the risk of shipping, so you need to manage the set of bugs like many other aspects of your development process. Don’t be frightened by a large quantity of bugs in your backlog. It just means people are testing your software, which is a good thing.

Bug triage is the process of going through the list to find bugs that need assistance, escalation, or follow-up. This is usually done in a group, but sometimes individually to clean incoming bugs. Through this process the nastiest, riskiest bugs are identified.

  • Prioritize – Don’t guess. Use a decision tree, or some other system, to determine a real priority.
  • Estimate – Don’t guess. If it’s too hard to figure out, you should break the work up into smaller tasks. Link those sub-tasks back to the original bug.
  • Adjust – Bug metadata is not set in stone. Situations change over time, and so can the bug priority.

Bugs have their own social networks. New code always spawns bugs so link those regressions back to the source feature or fix. Link duplicates back to the original issue. Sometimes those are not 100% duplicates and it’s good to verify all the duplicates are really fixed. Link code landings back to bugs. Code archeology is a real thing so make it easier by creating a map of bugs to code. You should be able to start with a line of code and easily find out why/when it was added. You should also be able to start with a bug and find the code used to fix the issue.

The bug metadata should be factual, but separate from the decision to ship. Don’t lower a bug priority just to make the decision to ship easier for someone else. Let those people own the decision to ship with a known level of quality.

Triage helps keep bug status up to date, which is how real-time roadmaps are created. In a time-oriented release schedule, release roadmaps can change because some features aren’t ready to ship. When enough code lands and the regressions that need to be fixed are fixed, a feature is ready to ship. Triaging bugs and managing feature status frequently allows you to be proactive about changes to roadmaps, not reactive.

Pitching Ideas – It’s Not About Perfect

I realized a long time ago that I was not the type of person who could create, build & polish ideas all by myself. I need collaboration with others to hone and build ideas. More often than not, I’m not the one who starts the idea. I pick up something from someone else – bend it, twist it, and turn it into something different.

Like many others, I have a problem with ‘fear of rejection’, which kept me from shepherding my ideas from beginning to shipped. If I couldn’t finish the idea myself or share it within my trusted circle, the idea would likely die. I had the most success when sharing ideas with others. I have been working to increase the size of the trusted circle, but it still has limits.

Some time last year, Mozilla was doing some annual planning for 2016 and Mark Mayo suggested creating informal pitch documents for new ideas, and we’d put those into the planning process. I created a simple template and started turning ideas into pitches, sending the documents out to a large (it felt large to me) list of recipients. To people who were definitely outside my circle.

The world didn’t end. In fact, it’s been a very positive experience, thanks in large part to the quality of the people I work with. I don’t worry anymore that an idea isn’t ready for others to see. I get to collaborate at a larger scale.

Writing the ideas into pitches also forces me to sharpen the message and define objectives & outcomes. I have 1:1s with a variety of folks during the week, and we end up talking about the idea, allowing me to further build and hone the document before sending it out to a larger group.

I’m hooked! These days, I send out pitches quite often. Maybe too often?

Firefox on Mobile: A/B Testing and Staged Rollouts

We have decided to start running A/B testing in Firefox for Android. These experiments are intended to optimize specific outcomes, as well as inform our long-term design decisions. We want to create the best Firefox experience we can, and these experiments will help.

The system will also allow us to throttle the release of features, called staged rollout or feature toggles, so we can monitor new features in a controlled manner across a large user base and a fragmented device ecosystem. If we need to roll back a feature for some reason, we’d have the ability to do that quickly, without needing people to update software.
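
As a rough sketch of what that looks like from the client’s point of view (the API and experiment name here are invented for illustration, not the real Switchboard interface):

```javascript
// A hypothetical sketch of a client-side feature toggle.
const experiments = {
  // In the real system this list comes from the server; flipping the
  // server config removes the name on the next fetch, disabling the
  // feature without shipping an application update.
  active: new Set(["search-suggestions"]),
  isInExperiment(name) {
    return this.active.has(name);
  },
};

if (experiments.isInExperiment("search-suggestions")) {
  console.log("Feature enabled for this client");
} else {
  console.log("Feature disabled; no app update needed to turn it off");
}
```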

Technical details:

  • Mozilla Switchboard is used to control experiment segmenting and staged rollout.
  • UI Telemetry is used to collect metrics about an experiment.
  • Unified Telemetry is used to track active experiments so we can correlate to application usage.

What is Mozilla Switchboard?

Mozilla Switchboard is based on Switchboard, an open source SDK for doing A/B testing and staged rollouts from the folks at KeepSafe. It connects to a server component, which maintains a list of active experiments.

The SDK creates a UUID, which is stored on the device. The UUID is sent to the server, which uses it to “bucket” the client, but the UUID is never stored on the server. In fact, the server does not store any data. The server we are using was ported from PHP to Node and is being hosted by Mozilla.
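
Here’s a minimal sketch of the bucketing idea, assuming the server hashes the UUID into one of N buckets and compares it to each experiment’s rollout percentage; this illustrates the concept, not Switchboard’s actual implementation:

```javascript
// Deterministic bucketing: the same UUID always lands in the same bucket,
// so a client stays in (or out of) an experiment across requests.
const crypto = require("crypto");

function bucketForUUID(uuid, numBuckets = 100) {
  const hash = crypto.createHash("sha256").update(uuid).digest();
  // Use the first 4 bytes of the hash as an integer, mod into a bucket.
  return hash.readUInt32BE(0) % numBuckets;
}

function isInExperiment(uuid, rolloutPercent) {
  // A client is in the experiment if its bucket falls below the
  // percentage of the population the experiment is enabled for.
  return bucketForUUID(uuid) < rolloutPercent;
}

// Example: enable an experiment for 25% of clients.
console.log(isInExperiment("9f1c2a34-5b6d-4e7f-8a90-123456789abc", 25));
```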

We decided to start using Switchboard because it’s simple, open source, has client code for Android and iOS, stores no data on the server, and can be hosted by Mozilla.

Planning Experiments

The Mobile Product and UX teams are the primary drivers for creating experiments, but as is common on the Mobile team, ideas can come from anywhere. We have been working with the Mozilla Growth team, getting a better understanding of how to design the experiments and analyze the metrics. UX researchers also have input into the experiments.

Once Product and UX complete the experiment design, Development would land code in Firefox to implement the desired variations of the experiment. Development would also land code in the Switchboard server to control the configuration of the experiment: On what channels is it active? How are the variations distributed across the user population?
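
To make that concrete, here’s a hypothetical sketch of what an experiment configuration could look like; the format and field names are invented, not the actual Switchboard server format:

```javascript
// Invented configuration illustrating the knobs described above:
// which channels an experiment is active on, and how variations are
// distributed across the user population.
const experimentConfig = {
  "onboarding-cards": {
    active: true,
    channels: ["nightly", "aurora", "beta"], // where the experiment runs
    buckets: { min: 0, max: 50 },            // 50% of the population
    variations: {
      control: { min: 0, max: 25 },          // existing behavior
      "new-cards": { min: 25, max: 50 },     // the variation under test
    },
  },
};

console.log(JSON.stringify(experimentConfig, null, 2));
```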

Since we use Telemetry to collect metrics on the experiments, the Beta channel is likely the best place to run experiments. Telemetry is on by default on Nightly, Aurora, and Beta; and Beta has the largest user base of those three channels.

Once we decide which variation of the experiment is the “winner”, we’ll change the Switchboard server configuration for the experiment so that 100% of the user base will flow through the winning variation.

Yes, a small percentage of the Release channel has Telemetry enabled, but it might be too small to be useful for experimentation. Time will tell.

What’s Happening Now?

We are trying to be very transparent about active experiments and staged rollouts. We have a few active experiments right now.

  • Onboarding A/B experiment with several variants.
  • Easy entry points for accessing History and Bookmarks on the main menu.
  • Experimenting with the awesomescreen behavior when displaying the search results page.

You can always look at the Mozilla Switchboard configuration to see what’s happening. Over time, we’ll be adding support to Firefox for iOS as well.

An Engineer’s Guide to App Metrics

Building and shipping a successful product takes more than raw engineering. I have been posting a bit about using Telemetry to learn about how people interact with your application so you can optimize use cases. There are other types of data you should consider too. Being aware of these metrics can help provide a better focus for your work and, hopefully, have a bigger impact on the success of your product.

Active Users

This includes daily active users (DAUs) and monthly active users (MAUs). How many people are actively using the product within a given time span? At Mozilla, we’ve been using these for a long time. From what I’ve read, these metrics seem less important than some of the others, but they do provide a somewhat easy-to-measure indicator of activity.

These metrics don’t give a good indication of how much people use the product, though. I have seen a variation called DAU/MAU (daily divided by monthly), which gives something like a retention or engagement rate. DAU/MAU rates of 50% are seen as very good.
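
For example (with made-up numbers):

```javascript
// A quick sketch of the DAU/MAU engagement ratio.
const dailyActiveUsers = 500000;    // unique users active today
const monthlyActiveUsers = 2000000; // unique users active in the last 30 days

const dauMauRatio = dailyActiveUsers / monthlyActiveUsers;
console.log((dauMauRatio * 100).toFixed(1) + "%"); // "25.0%"
// A ratio around 50% would be considered very good.
```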

Engagement

This metric focuses on how much people really use the product, typically tracking session length or time spent using the application. The amount of time people spend in the product is an indication of stickiness. Engagement can also help increase retention. Mozilla collects data on session length now, but we need to start associating metrics like this with our experiments to see if certain features improve stickiness and keep people using the application.

We look for differences across various facets like locales and releases, and hopefully soon, across A/B experiments.

Retention / Churn

Based on what I’ve seen, this is the most important category of metrics. There are variations in how these metrics can be defined, but they cover the same goal: Keep users coming back to use your product. Again, looking across facets, like locales, can provide deeper insight.

  • Rolling Retention: % of new users who return in the next day, week, or month.
  • Fixed Retention: % of this week’s new users still engaged with the product over successive weeks.
  • Churn: % of users who leave, divided by the total number of users.

Most analysis tools, like iTunes Connect and Google Analytics, use Fixed Retention. Mozilla uses Fixed Retention with our internal tools.
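
Here’s a minimal sketch of computing fixed retention and churn, using made-up data:

```javascript
// Assume we know the set of users who were new in week 0 and the set
// of users active in week 1. The data is invented for illustration.
const newUsersWeek0 = new Set(["a", "b", "c", "d", "e"]);
const activeWeek1 = new Set(["a", "b", "e", "f"]);

// Fixed retention: % of week 0's new users still active in week 1.
const retained = [...newUsersWeek0].filter((u) => activeWeek1.has(u));
const fixedRetention = retained.length / newUsersWeek0.size;
console.log(`1-week fixed retention: ${fixedRetention * 100}%`); // 60%

// Churn: % of week 0's users who did not come back in week 1.
const churn = 1 - fixedRetention;
console.log(`1-week churn: ${churn * 100}%`); // 40%
```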

I found some nominal guidance (grain of salt required):
  • 1-week churn: 80% bad, 40% good, 20% phenomenal
  • 1-week retention: 25% baseline, 45% good, 65% great

Cost per Install (CPI)

I have also seen this called Customer Acquisition Cost (CAC), but it’s basically the cost (mostly marketing or pay-to-play pre-installs) of getting a person to install a product. I have seen this in two forms: blended – where ‘installs’ are both organic and from campaigns, and paid – where ‘installs’ are only those that come from campaigns. It seems like paid CPI is the better metric.
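
A quick sketch of the difference, with made-up numbers:

```javascript
// Blended vs. paid CPI.
const campaignSpend = 50000;    // marketing spend in dollars
const paidInstalls = 40000;     // installs attributed to campaigns
const organicInstalls = 160000; // installs that cost nothing

const paidCPI = campaignSpend / paidInstalls;                        // $1.25
const blendedCPI = campaignSpend / (paidInstalls + organicInstalls); // $0.25
// Blended CPI looks flattering because organic installs are free,
// which is why paid CPI is the more honest metric.
console.log({ paidCPI, blendedCPI });
```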

Lower CPI is better, and Mozilla has been using Adjust with various ad networks and marketing campaigns to figure out the right channel and the right messaging to get Firefox the most installs for the lowest cost.

Lifetime Value (LTV)

I’ve seen this defined as the total value of a customer over the life of that customer’s relationship with the company. It helps determine the long-term value of the customer and can help provide a target for reasonable CPI. It’s weird thinking of “customers” and “value” when talking about people who use Firefox, but we do spend money developing and marketing Firefox. We also get revenue, maybe indirectly, from those people.

LTV works hand-in-hand with churn, since the length of the relationship is inversely proportional to the churn. The longer we keep a person using Firefox, the higher the LTV. If CPI is higher than LTV, we are losing money on user acquisition efforts.
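
A back-of-the-envelope sketch, assuming a deliberately simple model where expected lifetime is the inverse of the churn rate (numbers invented):

```javascript
// Simple LTV model: lifetime ~ 1 / churn.
const monthlyRevenuePerUser = 0.10; // average revenue per user per month
const monthlyChurn = 0.05;          // 5% of users leave each month

const expectedLifetimeMonths = 1 / monthlyChurn;                 // 20 months
const lifetimeValue = monthlyRevenuePerUser * expectedLifetimeMonths;
console.log(lifetimeValue); // 2.0 -- keep CPI below this to stay profitable
```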

Total Addressable Market (TAM)

We use this metric to describe the size of a potential opportunity. Obviously, the bigger the TAM, the better. For example, we feel the TAM (people with kids who use Android tablets) for Family Friendly Browsing is large enough to justify doing the work to ship the feature.

Net Promoter Score (NPS)

We have seen this come up in some surveys and user research. It’s supposed to show how satisfied your customers are with your product. This metric has its detractors, though. Many people consider it a poor measure, but it’s still used quite a lot.

NPS can be as low as -100 (everybody is a detractor) or as high as +100 (everybody is a promoter). An NPS that is positive (higher than zero) is felt to be good, and an NPS of +50 is excellent.
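
The score itself comes from a standard formula: the percentage of promoters (scores of 9 or 10) minus the percentage of detractors (scores of 0 through 6). A quick sketch with invented survey data:

```javascript
// Standard NPS calculation over 0-10 survey scores.
const scores = [10, 9, 9, 8, 7, 6, 5, 10, 3, 9];

const promoters = scores.filter((s) => s >= 9).length;  // 5
const detractors = scores.filter((s) => s <= 6).length; // 3
const nps = ((promoters - detractors) / scores.length) * 100;
console.log(nps); // 20
```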

Go Forth!

If you don’t track any of these metrics for your applications, you should. There are a lot of off-the-shelf tools to help get you started. Level-up your engineering game and make a bigger impact on the success of your application at the same time.

Random Management: Unblocking Technical Leadership

I’ve been an Engineering Manager for a while now, but for many years I filled a Developer role. I have done a lot of coding over the years, and I still try to do a little every now and then. Because of my past as a developer, I could be oppressive to senior developers on my teams. When making decisions, I found myself providing both the management viewpoint and the technical viewpoint. This usually meant I was keeping a perfectly qualified technical person from participating at a higher level of responsibility, which creates an unhealthy technical organization with limited career growth opportunities.

As a manager with a technical background, I found it difficult to separate the two roles, but admitting there was a problem was a good first step. Over the last few years, I have been trying to get better at creating more room for technical people to grow on my teams. It seems to be more about focusing on outcomes for them to target, finding opportunities for them to tackle, listening to what they are telling me, and generally staying out of the way.

Another thing to keep in mind: it’s not just an issue with management. The technical growth track is a lot like a ladder: keep developers climbing or everyone can get stalled. We need to make sure Senior Developers are working on suitable challenges, or they end up taking work away from Junior Developers.

I mentioned this previously, but it’s important to create a path for technical leadership. With that in mind, I’m really happy about the recently announced Firefox Technical Architects Group. It creates challenges for our technical leadership, along with roles that carry more responsibility and visibility. I’m also interested to see if we get more developers climbing the ladder.

Random Thoughts on Engineering Management

I have ended up managing people at the last three places I’ve worked, over the last 18 years. I can honestly say that only in the last few years have I really started to embrace the job of managing. Here’s a collection of thoughts and observations:

Growth: Ideas and Opinions and Failures

Expose your team to new ideas and help them create their own voice. When people get bored or feel they aren’t growing, they’ll look elsewhere. Give people time to explore new concepts, while trying to keep results and outcomes relevant to the project.

Opinions are not bad. A team without opinions is bad. Encourage people to develop opinions about everything. Encourage them to evolve their opinions as they gain new experiences.

“Good judgement comes from experience, and experience comes from bad judgement” – Frederick P. Brooks

Create an environment where differing viewpoints are welcomed, so people can learn multiple ways to approach a problem.

Failures are not bad. Failing means trying, and you want people who try to accomplish work that might be a little beyond their current reach. It’s how they grow. Your job is keeping the failures small, so they can learn from the failure, but not jeopardize the project.

Creating Paths: Technical versus Management

It’s important to have an opinion about the ways a management track is different than a technical track. Create a path for managers. Create a different path for technical leaders.

Management tracks have highly visible promotion paths. Organization structure changes, company-wide emails, and being included in more meetings and decision making. Technical track promotions are harder to notice if you don’t also increase the person’s responsibilities and decision making role.

Moving up either track means more responsibility and more accountability. Find ways to delegate decision making to leaders on the team. Make those leaders accountable for outcomes.

Train your engineers to be successful managers. There is a tradition in software development of filling management openings with the most senior engineer. This is wrong. Look for people who have a proclivity for working with people. Give those people management-like challenges and opportunities. Once they (and you) are confident in them taking on management, promote them.

Snowflakes: Each Engineer is Different

Engineers, even great ones, have strengths and weaknesses. As a manager, you need to learn these for each person on your team. Some people are very strong at starting new projects, building something from nothing. Others are great at finishing, making sure the work is ready to release. Some excel at user-facing code; others love writing back-end services. Leverage your team’s strengths to efficiently ship products.

“A 1:1 is your chance to perform weekly preventive maintenance while also understanding the health of your team” – Michael Lopp (rands)

The better you know your team, the less likely you are to create bored, passionless drones. Don’t treat engineers as fungible, swappable resources. Set them, and the team, up for success. Keep people engaged and passionate about the work.

Further Reading

The Role of a Senior Developer
On Being A Senior Engineer
Want to Know the Difference Between a CTO and a VP of Engineering?
Thoughts on the Technical Track
The Update, The Vent, and The Disaster
Bored People Quit
Strong Opinions, Weakly Held

Firefox for Android: Collecting and Using Telemetry

Firefox 31 for Android is the first release where we collect telemetry data on user interactions. We created a simple “event” and “session” system, built on top of the current telemetry system that has been shipping in Firefox for many releases. The existing telemetry system is focused more on the platform features and tracking how various components are behaving in the wild. The new system is really focused on how people are interacting with the application itself.

Collecting Data

The basic system consists of two types of telemetry probes:

  • Events: A telemetry probe triggered when the user takes an action. Examples include tapping a menu, loading a URL, sharing content, or saving content for later. An Event is also tagged with a Method (how the Event was triggered) and an optional Extra tag (extra context for the Event).
  • Sessions: A telemetry probe triggered when the application starts a short-lived scope or situation. Examples include showing a Home panel, opening the awesomebar or starting a reading viewer. Each Event is stamped with zero or more Sessions that were active when the Event was triggered.

We add the probes into any part of the application that we want to study, which is most of the application.
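
To make the shape of the system concrete, here’s a minimal sketch of the Event/Session model; the function names and signatures are invented for illustration, not the actual Firefox for Android telemetry API:

```javascript
// Sessions are short-lived scopes; Events are stamped with whichever
// Sessions are active when they fire.
const activeSessions = new Set();

function startSession(name) { activeSessions.add(name); }
function stopSession(name)  { activeSessions.delete(name); }

function sendEvent(event, method, extras) {
  // Each Event records what happened, how it was triggered, optional
  // extra context, and the Sessions active at that moment.
  console.log({ event, method, extras, sessions: [...activeSessions] });
}

// The awesomebar opens (a Session scope)...
startSession("awesomescreen");
// ...and the user taps a search suggestion (an Event with a Method).
sendEvent("loadurl", "suggestion", "engine=default");
stopSession("awesomescreen");
```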

Visualizing Data

The raw telemetry data is processed into summaries, one for Events and one for Sessions. In order to visualize the telemetry data, we created a simple dashboard (source code). It’s built using a great little library called PivotTable.js, which makes it easy to slice and dice the summary data. The dashboard has several predefined tables so you can start digging into various aspects of the data quickly. You can drag and drop the fields into the column or row headers to reorganize the table. You can also add filters to any of the fields, even those not used in the row/column headers. It’s a pretty slick library.
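
For the curious, wiring up PivotTable.js is roughly this simple, assuming jQuery and PivotTable.js are loaded on the page; the summary fields below are invented for illustration:

```javascript
// Summary rows shaped like the processed telemetry data.
const summaries = [
  { event: "loadurl", method: "listitem", channel: "nightly", count: 120 },
  { event: "reload",  method: "menu",     channel: "nightly", count: 95 },
  { event: "reload",  method: "menu",     channel: "aurora",  count: 80 },
];

// pivotUI() renders an interactive table; drag fields between the
// row/column headers to reorganize, or add filters on any field.
$("#dashboard").pivotUI(summaries, {
  rows: ["event", "method"],
  cols: ["channel"],
  aggregatorName: "Integer Sum",
  vals: ["count"],
});
```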

[Screenshot: the UI Telemetry dashboard]

Acting on Data

Now that we are collecting and studying the data, the goal is to find patterns that are unexpected or might warrant a closer inspection. Here are a few of the discoveries:

Page Reload: Even in our Nightly channel, people seem to be reloading the page quite a bit. Way more than we expected. It’s one of the Top 2 actions. Our current thinking includes several possibilities:

  1. Page gets stuck during a load and a Reload gets it going again
  2. Networking error of some kind, with a “Try again” button on the page. If the button does not solve the problem, a Reload might be attempted.
  3. Weather or some other updatable page, where a Reload shows the current information.

We have started projects to explore the first two issues. The third issue might be fine as-is, or maybe we could add a feature to make updating pages easier? You can still see high uses of Reload (reload) on the dashboard.

Remove from Home Pages: The History page, primarily, and the Top Sites page see high use of Remove (home_remove) to delete browsing information from the Home pages. People do this a lot; again, it’s one of the Top 2 actions. People will also do this repeatedly, over and over, clearing the entire list manually. Firefox has a Clear History feature, but it must not be very discoverable. We also see people asking for easier ways to clear history in our feedback, but it wasn’t until we saw the telemetry data that we understood how badly this was needed. This led us to add some features:

  1. Since the History page was the predominant source of the Removes, we added a Clear History button right on the page itself.
  2. We added a way to Clear History when quitting the application. This was a bit tricky since Android doesn’t really promote “Quitting” applications, but if a person wants to enable this feature, we add a Quit menu item to make the action explicit and in their control.
  3. With so many people wanting to clear their browsing history, we assumed they didn’t know that Private Browsing existed. No history is saved when using Private Browsing, so we’re adding some contextual hinting about the feature.

These features are included in Nightly and Aurora versions of Firefox. Telemetry is showing a marked decrease in Remove usage, which is great. We hope to see the trend continue into Beta next week.

External URLs: People open a lot of URLs from external applications, like Twitter, into Firefox. This wasn’t totally unexpected; it’s a common pattern on Android. But the degree to which it happens versus opening the browser directly was somewhat unexpected. Close to 50% of the URLs loaded into Firefox are from external applications. It’s less on Nightly, Aurora, and Beta, but even those channels are almost 30%. We have started looking into ideas for making the process of opening URLs into Firefox a better experience.

Saving Images: An unexpected discovery was how often people save images from web content (web_save_image). We haven’t spent much time considering this one. We think we are doing the “right thing” with the images as far as Android conventions are concerned, but there might be new features waiting to be implemented here as well.

Take a look at the data. What patterns do you see?

Here is the obligatory UI heatmap, also available from the dashboard:
[Image: UI interaction heatmap]

Patterns of Effective Teams

I have been lucky to build software products for a few different companies, each with a distinct culture. It’s helped me form opinions about the people, tools, and processes that make teams effective at shipping software products.

Who makes up a software product team? Mileage may vary, but I like to include:

  • Developers
  • Testers
  • UX Designers
  • Project Managers
  • Product Managers
  • Support

Lots of companies organize people into functional groups: All the developers in a group, all the testers in a group, all the designers in a group… and so on. This doesn’t make it easy to ship software. It can create walls and make it harder to communicate. You also lose the “team” feeling, as well as the focus and drive that comes from that.

Product-centric teams seem to be more effective at shipping. These multidisciplinary teams embed members from the various groups on the team, all working together to create and ship a software product.

Over the years, I’ve seen productive teams using a few basic concepts. Some are process related, some can be aided by tools, but most deal with relationships between people:

  • Trust each other: Each member has a role, and members need to trust in each other’s ability to perform.
  • Talk to each other: Lots of open communication is important. The team is a safe place, so there are no stupid questions. Meet as a group often to discuss progress.
  • Support each other: You win and lose together. Help others, even if not asked directly.
  • Be passionate: The team needs to be passionate about succeeding and hungry to ship a great product. There will be rough spots on the way. There always are, but the team needs that passion to be able to power through.
  • Move as a single, focused group: Speed is important. Decisions, implementation, feedback – all need to happen ASAP. Distractions kill speed.
  • Plan work as a group and document the plan: If everyone is part of the planning, everyone is committed to the plan. Keeps the team focused.
  • Create a roadmap: You need a Big Picture too. What’s the vision and strategy? It helps set the tone for everything else.
  • Break work into small tasks and track the tasks: Small tasks are manageable and trackable. They are easy to scope and keep the team focused. Watch out for scope creep.
  • Create milestones and track progress: Deadlines are a good thing, even if just internal. Forward progress is essential for shipping and milestones are great for tracking progress.
  • Adjust as needed: Don’t be afraid to adjust anything: schedule, milestones, tasks. You are collecting data every day. Use it to make informed decisions ASAP. Triage your work often.

I like to keep things lightweight. This includes tools and processes. Focus more on your product and the work at hand. Processes and tools can be distractions. The best ones are those that stay out of your way.

Update: Taras reminded me indirectly about the importance of passion, so I added it to the list.

Add-ons: Binary Components and js-ctypes

Lots of discussion going on recently about the effect of 6-week development cycles on binary XPCOM components in add-ons. I don’t want to re-hash those discussions, but Daniel Glazman brought up an interesting point in a comment on Wladimir’s post. Wladimir was suggesting binary component developers start moving to js-ctypes. Daniel pointed out that there are two classes of binary XPCOM components:

  • XPCOM wrappers around 3rd-party binary libraries: We use this model for exposing external binary functionality into JavaScript so add-ons and applications can access the libraries. Using js-ctypes should provide a simple, non-breaking way to expose the libraries. You create a simple JavaScript wrapper in a JavaScript XPCOM component. We need more examples of using js-ctypes to do this, but it works.
  • Pure binary XPCOM components built only using the Mozilla platform: Sometimes the functionality you want to expose is actually locked away in the Mozilla platform itself. Maybe there is no public nsIXxx interface, or the existing interface has a [noscript] attribute on a property or method. This model shouldn’t be required anymore, in my opinion. Mozilla is pushing JavaScript-based components and we should be exposing as much as possible to chrome JavaScript. I would encourage add-on developers to file bugs and lobby to expose binary-only parts of the Mozilla platform to chrome JavaScript.

JavaScript ctypes has come a long, long way since it was started back in 2007. Let’s start leveraging it more. The ctypes model has been used quite effectively in other languages.
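
As a small example of the first model, here’s a sketch of exposing a function from a native library with js-ctypes inside a chrome-privileged JavaScript component; the library name is platform-specific:

```javascript
// Wrap a native library function so chrome JavaScript can call it.
Components.utils.import("resource://gre/modules/ctypes.jsm");

// "libm.so" on Linux; "libm.dylib" on Mac, "msvcrt.dll" on Windows.
let libm = ctypes.open("libm.so");

// declare(name, ABI, return type, argument types...)
let cos = libm.declare("cos", ctypes.default_abi,
                       ctypes.double, ctypes.double);

console.log(cos(0)); // 1 -- the native function is now callable from JS
libm.close();
```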

Zippity – Using the Crowd to Collect Performance Data

If you’re serious about improving your application’s performance, you need to collect data. At Mozilla, we use Talos. The Mobile team has been using Talos to track performance on Maemo (using N810 and N900) devices for a few years and recently started tracking Android (using Tegras) as well. Here are a few of the metrics we track:

  • Startup (Ts)
  • Pageload (Tp)
  • HTML Rendering (Tdhtml)
  • Sunspider (Tss)

We build the application every time code is checked into the source repository. After each build, we run these performance tests and watch for changes. Regressions happen and need to be fixed as soon as possible.

One limitation of the current Talos system is the lack of hardware variety. For Firefox Mobile, we run tests on two types of hardware: the N900 (Maemo) and the NVidia Tegra (Android). One problem is that this doesn’t map to the real world very well. Don’t get me wrong – the current system minimizes measurement noise and we can see regressions fairly easily. However, without testing on a variety of devices, we just don’t get a good picture of how Firefox Mobile runs on your device.

Hello Zippity!

One idea that has been talked about is getting the community involved in testing performance. If we could leverage many different people, using many different phones, we might get a better picture of how Firefox Mobile performs in the real world.

I created a simple add-on (Test Harness) that can be installed in any Firefox 4 for Mobile release and can run a series of tests. It then posts the results to a public data server (Zippity), where anyone can view them. When a test result is posted to Zippity, we capture a few bits of information:

  • Test Type: Pageload, startup – whatever the test might be tracking.
  • Test Manifest: What was the actual sequence performed? What series of pages did you load? This is important for minimizing noise in the results: comparing only test runs that used the same input data helps reduce noise and makes trends easier to see.
  • User Key: In case you want to only see results submitted by a particular user. The user key is completely optional. It can be used to help filter extraneous results.
  • Device Metadata: Information like device type, OS, OS version. Again, helpful for filtering.
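
For illustration, a submitted result might look something like this as JSON; the field names and endpoint are invented, not the actual Zippity format:

```javascript
// A hypothetical result payload covering the fields described above.
const result = {
  testtype: "tp",                  // pageload test
  manifest: "standard-pageset-v1", // which sequence of pages was run
  userkey: "my-nexus-one",         // optional, for filtering
  device: {
    model: "Nexus One",
    os: "Android",
    osversion: "2.2",
  },
  values: [1250, 1198, 1274],      // per-page load times in ms
};

// POST the JSON payload to the public data server (endpoint invented).
fetch("https://zippity.example.org/submit", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify(result),
});
```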

I have been running Firefox Mobile nightly builds on several Android devices using a static set of webpages, hosted from a local webserver. Very controlled, to minimize noise and try to see small changes over time. You can see the results here.

Anyone can run the same set of live, real-world pages using the add-on. Because of differences in devices, networking latency, and changes to the actual website content, the results won’t be the same. That’s OK! The real-world results are just as meaningful and provide just as much valuable insight.

Currently, the Test Harness add-on only sends pageload (Tp) data to Zippity, but plans are underway to extend the test types. If you want to play along, just install the Test Harness add-on. It is already configured to run the pre-defined pageset and submit the results to Zippity. Just push the button.

Zippity is still evolving and doesn’t have a lot of features. The source code should be in my Mercurial user repo soon. I’ll post more about Zippity and the Test Harness add-on as things improve.