Smart Tapping in Mobile Firefox

It’s now well understood that, when designing for a touchscreen, there are certain minimum usable sizes for touchable targets. While the amount you can display on a screen is increasing with higher resolutions, human finger sizes aren’t changing, and fingertips are much larger than a mouse pointer. As a result, most UI recommendations for touch-target sizes on mobile devices range from around 7mm to 9mm.

It’s relatively easy to take these minimum sizes into account when designing your own interface. But what about when you’re displaying something that someone else has designed, and that wasn’t built with fingers in mind? This is the situation mobile browsers encounter in most web pages: links, fields, and buttons are often much smaller than you’d want them to be in order to tap on them, but making them bigger would interfere with the designer’s intended layout.

An approach that a number of touch-oriented OSes take is to make a small target’s tap-sensitive area invisibly larger than the visible target itself. This approach has been cleverly referred to as using “iceberg buttons” because the visible part of the target is much smaller than what’s lurking below. In fact, the iPhone does this with its keyboard as well, dynamically changing the invisible button target size based on which letters it predicts are most likely to come next.

Given how central link-tapping is to a browser, and how frustrating it is to tap the wrong link or not be able to tap at all, we decided to build our own approach.

Introducing SmartTap

In Firefox Mobile 1.1, we’ve added a smart-tapping scheme with the goal of allowing for accurate and easy tapping on links, form widgets and other focusable targets in web content. The main concepts of the approach are:

  • Using a region, not just a point, to define the tap location
  • Creating a list of focusable element candidates in the region of the tap
  • Weighting the elements by z-order
  • Weighting the elements by distance from the actual touch point
  • Weighting the links by number of visits

The result of the algorithm should be the element you were most likely trying to tap. Initial results show that tapping elements in Firefox Mobile 1.1 is much easier than in previous versions. From a user’s perspective, taps just seem to work as they should.

Implementation Details

The code to support SmartTap was added to both the Mozilla platform and the Firefox front-end, and it’s very flexible. The platform already exposes an elementFromPoint API to chrome and web content. The new API, nsIDOMWindowUtils.nodesFromRect, is very similar, but is only available to chrome content. Given a point and a region around it, established by top/right/bottom/left, the method returns a list of candidate DOM nodes.

The returned list is sorted in z-order. The Firefox front-end code then applies additional heuristics to the node list to find the most likely candidate. The code filters the list down to “focusable” elements, weights each node by its distance from the touch point and, finally, weights visited links higher than other elements.
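The heuristics above can be sketched roughly as follows. This is a minimal illustration in Python, not the actual Firefox front-end code: the Candidate class, the distance_to_rect and pick_target helpers, and the scoring formula (distance to the element's rectangle, scaled down by a hypothetical visited_weight factor for visited links) are all assumptions made for the example.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    """Hypothetical stand-in for a DOM node returned by the platform."""
    name: str
    rect: Tuple[float, float, float, float]  # left, top, right, bottom
    focusable: bool = True
    visited: bool = False


def distance_to_rect(x: float, y: float,
                     rect: Tuple[float, float, float, float]) -> float:
    """Distance from a point to the nearest edge of a rectangle (0 if inside)."""
    left, top, right, bottom = rect
    dx = max(left - x, 0.0, x - right)
    dy = max(top - y, 0.0, y - bottom)
    return hypot(dx, dy)


def pick_target(candidates: List[Candidate], x: float, y: float,
                visited_weight: float = 0.5) -> Optional[Candidate]:
    """Return the most likely tap target, or None if nothing is focusable.

    `candidates` is assumed to be in z-order, topmost first. A lower score
    is better; the strict comparison keeps the topmost element on ties.
    """
    best = None
    best_score = float("inf")
    for candidate in candidates:
        if not candidate.focusable:
            continue  # plain text, images, etc. cannot win
        score = distance_to_rect(x, y, candidate.rect)
        if candidate.visited:
            score *= visited_weight  # assumed formula: visited links look "closer"
        if score < best_score:
            best, best_score = candidate, score
    return best


# A title link, non-focusable author text, and a second link below.
nodes = [
    Candidate("title", (10, 10, 200, 30)),
    Candidate("author", (10, 35, 100, 50), focusable=False),
    Candidate("other", (10, 60, 200, 80)),
]
print(pick_target(nodes, 50, 42).name)  # title
```

In this sketch the tap lands on the author text, which is skipped because it is not focusable, and the title link wins because it is nearer than the link below. Marking the lower link as visited would let it win instead, since its distance of 18 is halved to 9.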

The region passed to nodesFromRect can be controlled via preferences (browser.ui.touch.top, browser.ui.touch.right, browser.ui.touch.bottom, browser.ui.touch.left). The weighting used for the visited links can also be adjusted using the browser.ui.touch.weight.visited preference.

Firefox Mobile uses a region that extends farther above the touch point than below it. This has the effect of favoring elements above the touch point, a choice based on our observation that people tend to “tap” below elements. Here’s a simple illustration:


[Illustration: the touch point and the region passed to nodesFromRect]

The red dot is the actual touch point. The red box is the region passed to nodesFromRect. The title link will end up being “clicked” even though the author’s name text was actually “tapped.”
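To make the asymmetric region concrete, here is a small Python sketch. The Rect type, the touch_region and intersects helpers, and the offset values are invented for illustration; they are not the defaults of the browser.ui.touch.* preferences, and the real lookup is done by nodesFromRect on the platform side.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    left: float
    top: float
    right: float
    bottom: float


def touch_region(x: float, y: float, top: float, right: float,
                 bottom: float, left: float) -> Rect:
    """Expand a touch point into a region using per-side offsets."""
    return Rect(x - left, y - top, x + right, y + bottom)


def intersects(a: Rect, b: Rect) -> bool:
    """True if the two rectangles overlap."""
    return (a.left < b.right and b.left < a.right
            and a.top < b.bottom and b.top < a.bottom)


# Hypothetical offsets: the region reaches 12 units up but only 4 down.
region = touch_region(50, 40, top=12, right=8, bottom=4, left=8)

title_link = Rect(10, 10, 200, 30)  # ends 10 units above the touch point
lower_link = Rect(10, 46, 200, 60)  # starts 6 units below the touch point

print(intersects(region, title_link))  # True
print(intersects(region, lower_link))  # False
```

Although the lower link is actually closer to the touch point, only the title link falls inside the region, so only it becomes a candidate.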

Another bit of intelligence in this system is based on the same insight that drives the Firefox awesomebar: that people tend to visit the same pages over and over again. Links are given higher weightings if they’ve been visited before, so visited links are more likely to “win” if a tap target is ambiguous because multiple small links are very close together. In practice, from the user’s perspective, tapping on the intended link just seems to work more often.

Thanks to Madhava Enros, Vivien Nicolas and Felipe Gomes for contributing to this post.

9 Comments

  1. David Baron said,

    May 13, 2010 @ 1:48 am

    Rather than weighting the elements by z-order, might it be better to take a sample of points in the region around the touch, find the topmost element at each point, and then weight the elements by distance-from-touch and frequency within that sample? It doesn’t seem like z-order should matter for elements that don’t overlap (and might cause bias in the results, requiring touching higher on the display to avoid being too close to the next line); and sampling the topmost element at different points ought to account for z-order when it does matter.

  2. jmdesp said,

    May 13, 2010 @ 5:19 am

    I think what you need truly is “hit and slide”.
    The thing that’s really hard to do is to, with a high precision, watch the screen, see an element and be able to hit exactly on that element on first try. But after you have hit the screen, if something shows you where you really are, you will be able to correct your position with a much higher resolution.

    So what the interface should do is after you hit the screen, give you a very visible return on what you have hit, and *not* act until you *raise* your finger (or stay for some delay on the element), and letting you *slide* on the surface to select something else if you realized you hit the wrong element, still giving you as you slide a visible return on what the current hit is.

  3. Kadir said,

    May 13, 2010 @ 7:52 am

    Wow, awesome improvement, can’t wait to try this out on my iPhone. Oh, wait, right :/

  4. voracity said,

    May 13, 2010 @ 8:08 am

    “The platform already exposes an elementFromPoint API to chrome and web content.”

    Wouldn’t this be a privacy issue, similar to the :visited one? (Albeit, a lot harder to take advantage of.)

  5. Mark Finkle said,

    May 13, 2010 @ 9:50 am

    @voracity – elementFromPoint does not do anything with visited links, it merely returns the element at the point. The new API, nodesFromRect, doesn’t do anything directly with visited links either. The Firefox front-end code applies the weighting separately. Nothing is leaked to content, that I can think of anyway.

  6. Matt Brubeck said,

    May 13, 2010 @ 3:59 pm

    jmdesp: Fennec does give visual feedback when you first press the screen, and you can cancel your touch by moving your finger away before lifting it.

    You can’t shift focus to a different element by sliding with your finger still pressed, because Fennec scrolls the page when you do that.

  7. Dave Hulbert said,

    May 14, 2010 @ 2:40 am

    This sounds great, but shouldn’t offsetting the tap area up a few pixels be done by the OS instead of the program?

  8. Tomas said,

    May 14, 2010 @ 11:39 am

    Dave: Some OSes offset the tap, some do not, so it makes sense to make it configurable as the post suggests. Also, if Firefox wants to do in-browser rotation (i.e. rotation of the content is performed by the application itself instead of the OS) at some point, it’s important that Firefox is able to correct the tap offset in the right direction as well.

  9. Gordon P. Hemsley said,

    June 7, 2010 @ 12:41 am

    This all sounds reasonable, but there is one unmentioned caveat that could quickly become a major annoyance: If there are a number of links close together, and you accidentally tap on the wrong one the first time you try, future taps in that same area will keep giving you the same wrong link (since visited links are favored).

    Just something to keep in mind.
