If you cannot measure it, you cannot improve it. – Lord Kelvin.
There is so much information out there on ways to improve the performance of your mobile application or website. You probably feel you can just dive in and start making changes. But if you’re not measuring your application’s performance, you don’t know if anything is really helping or hurting. How do you know what effect any changes will have on the performance? Most applications are complex enough that we can’t assume our simplistic reasoning accurately reflects the code’s behavior. You need to measure.
You need measurements from before and after any changes are made. Your application has development phases, and so should your measurement plan. Measure in CI to find improvements and regressions as soon as they happen. Measure in the real world to find out how variations like network conditions, device fragmentation, and unpredictable user behavior show up in performance.
Measuring in CI
The point of measuring performance in CI is to control variability and watch for relative differences on each change. Try to reduce noisy variables, like network calls and background services, to create a fast and consistent surface on which you can monitor performance changes reliably and repeatably. The purpose is not to determine the performance your users will encounter. There are too many variables in the real world, and you can’t control them well enough for apples-to-apples comparisons.
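Use real devices when measuring performance, not emulators or simulators running on host hardware. Emulators and simulators share the host machine’s CPU, memory, and GPU, so their timings say little about how the application performs on real devices.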
You can try to create reliable and repeatable simulations of some real-world situations. Network connection speed is one example: you can use a network simulator, like Facebook’s Augmented Traffic Control, to simulate Wi-Fi and mobile network conditions. This is especially useful if your application is designed to react differently under different network conditions. You can also use different types of content in the tests to mimic some of the high-level differences your users might encounter.
If you’re measuring data in CI, you should be storing and displaying it as well. Try to get CI to alert on regressions and fail the offending changes before they make it into the product.
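A minimal sketch of a CI regression gate, assuming each metric is collected over several repeated runs on the same physical device and that lower values are better (as with launch time or memory). It compares medians, which are less sensitive to a single noisy run than means. The function name and the 5% threshold are illustrative assumptions, not a recommendation.

```python
import statistics


def check_regression(baseline, candidate, max_regression=0.05):
    """Compare the median of candidate runs against the median of baseline runs.

    Assumes lower is better (e.g. launch time in ms, memory in MB).
    Returns (passed, relative_change), where relative_change is
    (candidate_median - baseline_median) / baseline_median.
    """
    base = statistics.median(baseline)
    cand = statistics.median(candidate)
    change = (cand - base) / base
    return change <= max_regression, change
```

A CI job could call this with the stored baseline samples for the target branch and the samples from the change under test, then exit non-zero when `passed` is false so the change is blocked. For example, baseline launch times of 812, 798, 805, 820 and 801 ms (median 805) against candidate times of 861, 849, 872, 855 and 858 ms (median 858) give a change of about +6.6%, which fails a 5% budget.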
Some common things to measure in CI:
- Launch time to show UI
- Launch time to interactive UI
- Scroll performance (janky frames)
- Time to load content
- Memory usage (startup, after content loads, after scrolling content)
Remember to use multiple (physical) devices and a variety of content types.
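One simplified way to turn the scroll-performance item above into a number, assuming the test harness can export per-frame render durations in milliseconds: count the frames that took longer than one refresh interval (about 16.7 ms at 60 Hz). The function name is hypothetical, and platform tooling may define jank differently (for example, relative to per-frame deadlines), so treat this as a sketch rather than a standard metric.

```python
def count_janky_frames(frame_durations_ms, refresh_rate_hz=60.0):
    """Return (janky_count, janky_ratio) for a list of frame durations in ms.

    A frame is treated as janky if it took longer than one refresh
    interval, so it likely missed its display deadline.
    """
    if not frame_durations_ms:
        return 0, 0.0
    budget_ms = 1000.0 / refresh_rate_hz
    janky = sum(1 for d in frame_durations_ms if d > budget_ms)
    return janky, janky / len(frame_durations_ms)
```

Reporting the ratio as well as the count keeps results comparable when scroll tests render different numbers of frames, and passing the device’s actual refresh rate matters on 90 Hz or 120 Hz displays.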
Measuring in the Real World
While CI measurements come from only a handful of tests and situations, the real world presents many, many more. Depending on the number of active users, you could have millions of data points covering thousands of unique situations. Collecting data from real users at a large scale allows you to investigate how factors like global region, network conditions, and even user type can affect the performance of the application.
There are many third-party systems you can integrate into your code to collect real-world data easily and efficiently. It’s not uncommon for companies to grow their own systems as well. In either case, real-world data is messy, so vet both the collection systems and the data itself. Look for problems like payload corruption, clock skew, range errors, or other oddities.
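A minimal validation sketch for incoming real-world measurements, assuming each record is a dict with hypothetical fields `event`, `start_ms`, `end_ms` (client epoch milliseconds) and `response_bytes`, and that the server stamps its own receive time. It flags the kinds of problems listed above: corrupt or incomplete payloads, range errors, and client clocks running ahead of the server. The thresholds are placeholders to tune against your own data.

```python
REQUIRED_FIELDS = ("event", "start_ms", "end_ms", "response_bytes")


def validate(record, server_received_ms,
             max_skew_ms=5 * 60 * 1000, max_duration_ms=10 * 60 * 1000):
    """Return a list of problems found in one measurement record (empty if clean)."""
    # Corrupt or truncated payloads: required fields missing or of the wrong type.
    missing = [f for f in REQUIRED_FIELDS if f not in record]
    if missing:
        return ["missing fields: " + ", ".join(missing)]
    start, end, size = record["start_ms"], record["end_ms"], record["response_bytes"]
    if not all(isinstance(v, (int, float)) for v in (start, end, size)):
        return ["non-numeric timing or size field"]

    problems = []
    # Range errors: negative or implausibly long durations, negative sizes.
    if end < start:
        problems.append("end before start")
    elif end - start > max_duration_ms:
        problems.append("duration out of range")
    if size < 0:
        problems.append("negative response size")
    # Clock skew: the client claims the request ended well after the server
    # received the report. Late-arriving (batched) records are not flagged.
    if end - server_received_ms > max_skew_ms:
        problems.append("client clock ahead of server")
    return problems
```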
Create automated queries and reports, and send them out broadly for people to review. Remember to go deeper than high-level summaries: some of the most interesting discoveries happen when you split the data across different dimensions.
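A small sketch of the kind of automated report described above, assuming measurements are dicts with a numeric metric field and one or more dimension fields (the names `connection` and `load_ms` used below are hypothetical). It groups records by a single dimension and reports the count, median (p50) and p90 using the nearest-rank percentile method, so a slow tail in one group is not hidden by an overall average.

```python
import math
from collections import defaultdict


def percentile(values, p):
    """Nearest-rank percentile of a non-empty list, with p in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def report_by(records, dimension, metric):
    """Group records by one dimension and summarize a metric per group."""
    groups = defaultdict(list)
    for record in records:
        groups[record.get(dimension, "unknown")].append(record[metric])
    return {
        key: {"count": len(vals), "p50": percentile(vals, 50), "p90": percentile(vals, 90)}
        for key, vals in sorted(groups.items())
    }
```

For example, with five Wi-Fi samples (100 to 180 ms) and four cellular samples (300, 350, 400 and 900 ms), the cellular group’s p90 of 900 ms stands out even though its median is 350 ms.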
Some common things to measure in the real world:
- Network usage, including start time, end time, content type, and response size. If possible, get detailed connection timing, such as DNS lookup and SSL/TLS handshake durations.
  - For API endpoints, this is useful for tracking latency and payload size.
  - For media loading, this gives a ballpark metric for how long people are staring at an empty box, waiting for an image to load.
- Event, session, and error state data. This can be used to track impressions of critical content, and also to learn how people use the application.
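A sketch of how the network timing in the first item might be recorded and broken into phases, assuming the client can report a timestamp for each phase from a single monotonic clock. Platform networking stacks often expose comparable metrics, but the class and field names here are hypothetical. The DNS and handshake phases are optional because cached lookups and reused connections skip them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestTiming:
    """One network request; timestamps are ms from the same monotonic clock."""
    start: float
    response_start: float  # first byte of the response received
    response_end: float    # last byte of the response received
    response_bytes: int
    content_type: str
    dns_start: Optional[float] = None  # None when DNS was cached or skipped
    dns_end: Optional[float] = None
    tls_start: Optional[float] = None  # SSL/TLS handshake; None on a reused connection
    tls_end: Optional[float] = None

    def phases(self):
        """Derive per-phase durations in ms; unavailable phases are None."""
        def span(begin, finish):
            if begin is None or finish is None:
                return None
            return finish - begin

        return {
            "dns_ms": span(self.dns_start, self.dns_end),
            "tls_ms": span(self.tls_start, self.tls_end),
            "ttfb_ms": self.response_start - self.start,
            "total_ms": self.response_end - self.start,
        }
```

For an image request, `total_ms` approximates how long the placeholder stays empty (ignoring decode and display time), while for an API call `ttfb_ms` and `response_bytes` track latency and payload size.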
Remember to include some common metadata in each measurement so you can split the data across different dimensions. Details like a non-PII identifier, a coarse geo-location or region, device specs, and connection type or speed can help you drill down into the data and look for trends.
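One way to attach the same metadata to every measurement, sketched under the assumption that measurements are flat dicts. The field and function names are hypothetical. The install identifier is a random UUID that the app would generate once and persist, so it is not derived from account or hardware identifiers, and the region is kept coarse (for example, a country code).

```python
import uuid
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CommonMetadata:
    install_id: str       # random per-install ID, not linked to account or hardware IDs
    region: str           # coarse region such as a country code, not precise location
    device_model: str
    os_version: str
    connection_type: str  # e.g. "wifi" or "cellular"


def new_install_id() -> str:
    """Generate a random identifier; the app would call this once and persist it."""
    return str(uuid.uuid4())


def with_metadata(measurement: dict, meta: CommonMetadata) -> dict:
    """Return a new flat record combining a measurement with the common metadata."""
    meta_fields = asdict(meta)
    overlap = meta_fields.keys() & measurement.keys()
    if overlap:
        raise ValueError(f"measurement fields collide with metadata: {sorted(overlap)}")
    return {**measurement, **meta_fields}
```

Keeping the record flat means a dimension such as `connection_type` or `region` can be used directly as a grouping key in reports, and the collision check avoids silently overwriting a measurement field.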
It’s also polite to allow people to opt out of this type of data collection.