When the folks at Testlio approached me to review their “State of App Testing 2020” report, I had some mixed feelings. These reports can sometimes be too broad; they describe what is happening without helping a team decide where to go. Still, it would only cost me five minutes to see if it was worth writing about, and it came highly recommended, so I took a look. I am glad I did. Today, I’ll analyze the report itself, hit some of the highlights from the survey results, and provide some analysis to help you decide if the information is relevant for you—and what you should do about it.
First of all, the report is short. This is a feature. You won’t have to dig through 68 pages of information—six pages just doesn’t leave room for “fluff.” There is no long opening editorial by someone with an impressive title, and no list of hypothetical things for survey respondents to pick between that will “become important” next year. Instead, Testlio simply anonymized its client data and used it to compute averages. This replaces the multiple-choice survey (likely filled out by someone abstracted from the work) with the actual data of the work. Testlio knows how often its customers’ applications go to production; it knows the scores of those applications in the Google and Apple stores. From that, the company can draw inferences about the relationship between shipping speed and quality. Finally, the report is grounded in a specific domain—that is, retail and business applications. The distinction between domains is clear enough that readers won’t walk away trying something that won’t apply to their industry.
Let’s talk about what the report says.
In addition to Testlio’s data, the report borrows from other sources to draw a picture. One of the most important observations is the connection between quality, speed, and customer adoption.
Quality Matters
I was amazed to read that 50% of users will not download an application with a 3-star rating, and 85% won’t download one with two stars. Then again, it shouldn’t have been a huge surprise: I was recently brought into a project rescue for an application with 2.2 stars (and yes, thousands of ratings) in the Apple store. If that data holds, the application was reaching far less than half of its potential users—and the core reason for the app was not to sell a product but to promote brand loyalty.
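To see how quickly those thresholds eat into an addressable market, here is a minimal sketch. The step function and the tiers beyond the report’s two data points are my own assumptions, not Testlio’s model, and the audience size is hypothetical:

```python
# Illustrative only: a step function built from the report's two data points
# (50% of users skip a 3-star app, 85% skip a 2-star app); the other tiers
# are my own assumptions, not Testlio's model.
def reachable_fraction(rating: float) -> float:
    """Estimate the fraction of potential users willing to download."""
    if rating >= 4.0:
        return 1.00  # assumption: a 4-star-plus rating deters almost no one
    if rating >= 3.0:
        return 0.50  # report: half of users skip a 3-star app
    if rating >= 2.0:
        return 0.15  # report: 85% skip a 2-star app
    return 0.05      # assumption: below 2 stars, almost nobody downloads

potential_users = 1_000_000  # hypothetical addressable audience
print(int(reachable_fraction(2.2) * potential_users))  # 150000
```

Run against a 2.2-star app like the one above, the sketch suggests only around 15% of the potential audience remains reachable.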
That data creates a strong argument for investing in quality in mobile software. Yet an app store rating is a lagging indicator: it measures poor quality after the fact. Put differently, app store reviews are an “aww, shoot” metric. We put the software into the wild and hope it scores well. If the score is good, we have a party. If it is bad, we say “aww, shoot.” Instead of lagging metrics (the result), my interest is in leading metrics (the cause). What actions can we take to improve the scores?
That’s when things get really interesting.
Release Speed Matters
The folks at Testlio grouped mobile applications into two categories: one group that released three times per month and another that released less often. (The average for the study was 2.4 releases per month.) For contrast, I recently worked with a team that used something like the Scaled Agile Framework (SAFe) to manage deploys about once a quarter, with perhaps a patch in the middle.
As it turns out, the more frequently released software had higher review scores, about 7% higher on average. That’s a counterintuitive result worth examining. I am not suggesting you ship twice as often. However, it does make sense that teams that ship more frequently have a smaller amount of change between releases, and that change is likely to be more localized. A large program with teams of teams that tries to integration-test at the end—say, the last sprint out of six—is likely to have a lot of uncertainty. The developer who fixes a bug is unlikely to be the one who created it; the “fix” may have unintended consequences that are difficult to test for. At the very least, be very careful about slowing down release schedules in the name of quality, as the results may be lose-lose.
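If you keep your own release and rating data, the report’s comparison is easy to reproduce. Here is a minimal sketch with invented records and a hypothetical three-releases-per-month split; only the method mirrors the report:

```python
# Invented records for illustration: (releases per month, store rating).
# None of these numbers are Testlio's; only the grouping method is.
apps = [
    (4.0, 4.4), (3.5, 4.5), (3.0, 4.6),  # frequent releasers
    (2.0, 4.1), (1.0, 4.2), (0.5, 4.3),  # infrequent releasers
]

def mean(xs):
    return sum(xs) / len(xs)

frequent = [rating for cadence, rating in apps if cadence >= 3.0]
infrequent = [rating for cadence, rating in apps if cadence < 3.0]

lift = (mean(frequent) - mean(infrequent)) / mean(infrequent)
print(f"frequent: {mean(frequent):.2f}  infrequent: {mean(infrequent):.2f}  lift: {lift:.0%}")
```

With these made-up numbers the sketch prints a 7% lift, echoing the report’s figure; your own data will of course tell its own story.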
Let’s move on to device coverage.
Device Coverage Matters
Although the two stores use an identical rating system, Testlio found that Android applications across the board ranked slightly lower than their Apple iOS cousins. That is, the average score for the top 30 commercial and retail apps is 4.6 on iOS and 4.3 on Android. Personally, I think it is fantastic to have these hard numbers at the top of the field rather than opinions from a small group who responded to a survey. The challenge with this data, as with the release cadences, is that you have to infer a reason. Testlio proposed three: that acceptance into the Apple App Store itself is more difficult (Apple has internal tests); that, as a premier brand, Apple enjoys a “halo” effect; and that Android devices are simply too fragmented. With 24,000 different Android devices as of 2015 and too many to count now, it is likely that some older devices have problems that could not be tested for; thus, a few bad reviews from rare models pull down the scores.
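The fragmentation explanation is really just a weighted-average argument, which a few invented numbers make concrete:

```python
# Invented numbers illustrating the fragmentation argument: a modest share of
# reviews from rare, untested devices can drag down an otherwise strong average.
segments = [
    # (share of reviews, average rating within that segment)
    (0.90, 4.6),  # mainstream devices the team actually tested on
    (0.10, 1.6),  # rare/older devices with untested, device-specific bugs
]

blended = sum(share * rating for share, rating in segments)
print(f"blended store rating: {blended:.2f}")  # 4.30
```

In this hypothetical, a 4.6 average among tested devices blends down to an Android-like 4.3 once a tenth of the reviews come from devices that were never on the test matrix.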
One thing Testlio didn’t see shrinking was the size of the test group. The top 30 clients averaged 18 testers per week, and over the course of the survey had at least eight and as many as 38 testers. These are the people who work across all the different devices checking for compatibility, in addition to doing the human testing that is not, or should not be, automated. What is changing is how those testers are deployed: more visual inspection, flexing with project needs, instead of a defined group that will be a bottleneck some of the time and over capacity at others. Personally, I’m a fan of the Agile, whole-team approach, but the report makes a case for test augmentation with flexibility in the mobile application space, which reminds me of Jon Bach’s chapter in the book “How to Reduce the Cost of Software Testing.” As an editor on that book, I’ll be the first to admit we picked a terrible title. At the time, we were working through the ideas that would eventually come to be known as Lean Software Testing, which seems to be what Testlio is suggesting.
App Testing Conclusions
There’s a fair bit more to the survey, including the amount of device testing occurring, locations for test sourcing worldwide, and how distributed testing is working, especially as remote work has become the norm.
As I said, I was pleased to see some hard data for once, and the challenge will be figuring out what we as an industry can make of it.
What do you think? Share your thoughts below.