Why You May Be Dropping Key Mobile Data From Your Observability Solution

If you have a business-critical mobile app, you may be surprised to learn that your observability solution is dropping a large percentage of your mobile app observability data. That is because there is frequently a delay between when data is collected client-side on a mobile device and when it is ingested server-side. But why exactly, and how does it affect observability in your mobile experiences?

The Reality of Collecting Data From Mobile Devices

When collecting data from mobile devices, you must accept that — unlike backend observability — you don’t control the whole system. The devices you are collecting data from are out in the wild in people’s hands. As such, they are a heterogeneous data source with widely varying usage patterns and network connectivity.

Backend and web monitoring assume that the monitored system has a connection to the observability service, and some observability tools explicitly expect near-constant connectivity. That assumption does not hold for mobile apps, where people’s connectivity varies wildly.

This brings us to one of the harsh realities of mobile observability — data is frequently delayed. Data collected on mobile devices can take hours, or even days, to reach backend systems.

To put this in perspective, most DevOps teams would consider a server being offline for a single day, and not delivering observability data, to be very much out of the norm. With some mobile apps, that is the norm rather than the exception.

What Contributes to Data Delays in Mobile Apps?

One key way that mobile apps vary in their data delay profiles is their usage patterns. Consider health and wellness apps used in remote areas for navigating trails or internal productivity apps for use in warehouses, agricultural areas or oil rigs. Wherever connectivity is strained, you will see larger data delays.

However, a key benefit of mobile apps is that they can be used anywhere. So, even location-agnostic apps such as shopping apps and mobile games can see heavy usage under poor connectivity. People play mobile games while riding the metro or bus, and they place mobile food orders while driving to a nearby fast-food restaurant.

As a further example, our customer base includes mobile apps across every industry and category, and only a small percentage of those apps have no data that is delayed by a day or more.

Beyond usage patterns, the iOS and Android ecosystems also affect the data delay. When an app crashes on iOS, the crash is not reported until the app is relaunched. On Android, the situation is a little better: most crashes that happen in Java or Kotlin code can be reported with limited delay if there is an internet connection. However, crashes that occur in native code are not reported until the next app launch. Thus, end users’ behavior, especially on iOS, determines when you receive the data that indicates you have a stability problem in your app.
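The report-on-next-launch pattern can be sketched in a few lines. This is an illustrative model only, not how any particular iOS or Android SDK is implemented: `CrashSpool`, `record_crash`, `flush_on_launch` and the `send` callback are hypothetical names, and an in-memory list stands in for durable on-device storage.

```python
from datetime import datetime, timezone


class CrashSpool:
    """In-memory stand-in for durable on-device storage (hypothetical)."""

    def __init__(self):
        self.pending = []

    def record_crash(self, message, occurred_at):
        # The process is about to die, so only persist the report here.
        self.pending.append({"message": message, "occurred_at": occurred_at})

    def flush_on_launch(self, send, now):
        """Upload reports saved by earlier sessions; keep any that fail to send."""
        sent, still_pending = [], []
        for report in self.pending:
            # Carry both timestamps so the backend can tell them apart.
            payload = dict(report, reported_at=now)
            if send(payload):
                sent.append(payload)
            else:
                still_pending.append(report)
        self.pending = still_pending
        return sent


spool = CrashSpool()
spool.record_crash("fatal error", datetime(2024, 3, 1, 12, tzinfo=timezone.utc))

# The user relaunches three days later; send() is assumed to succeed.
sent = spool.flush_on_launch(
    send=lambda payload: True,
    now=datetime(2024, 3, 4, 12, tzinfo=timezone.utc),
)
delay = sent[0]["reported_at"] - sent[0]["occurred_at"]
print(delay)
```

The point of the sketch is that the delay is set entirely by when the user next opens the app, which the observability backend cannot control.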

Why Do These Data Delays Matter?

Let’s start with the obvious: You want a complete picture of what your mobile app is doing. If your backend system discards any data that arrives three hours after it was collected on a device, you will have a large visibility gap. In the graph below, you can see how much visibility you would lose from discarding delayed data.

The above data is pulled from a customer that has above-average data delays. Note how 25% of the data takes at least two days to arrive after it is collected on the mobile device, and it takes approximately one week for 100% of the data to arrive.
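To make the effect of a cutoff concrete, here is a minimal sketch that computes the share of events that survive a given maximum accepted delay. The delay values are made up for illustration and are not the customer data described above; `fraction_retained` is a hypothetical helper, and delay is assumed to mean the time received minus the time collected.

```python
from datetime import timedelta


def fraction_retained(delays, cutoff):
    """Share of events whose delay (received - collected) is within cutoff."""
    if not delays:
        return 1.0
    kept = sum(1 for d in delays if d <= cutoff)
    return kept / len(delays)


# Hypothetical delays for eight events (illustrative only).
delays = [
    timedelta(seconds=5),
    timedelta(minutes=2),
    timedelta(minutes=40),
    timedelta(hours=2),
    timedelta(hours=6),
    timedelta(hours=30),
    timedelta(days=2, hours=4),
    timedelta(days=6),
]

three_hours = fraction_retained(delays, timedelta(hours=3))
one_week = fraction_retained(delays, timedelta(days=7))
print(f"kept with 3-hour cutoff: {three_hours:.0%}")
print(f"kept with 7-day cutoff: {one_week:.0%}")
```

With these made-up values, a three-hour cutoff keeps only half of the events, while a seven-day cutoff keeps all of them. Running the same calculation over your own collected and received timestamps shows how much a given cutoff would cost you.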

Why Would You Ever Discard Delayed Data?

So, if this is typical behavior for mobile applications, it raises the question: why would you ever discard delayed data? The answer is that delayed data is not typical in backend observability. If your system was designed to handle data from backend applications, supporting delayed data was probably not a concern. It may not be that challenging to modify a system to handle delayed data, or that costly to operate it, when data volumes are moderate. Once you get to larger data volumes, however, you will start to experience serious challenges.

At Embrace, we have focused on mobile observability, so we built our system from day one with the expectation that a substantial amount of data would arrive late. We chose building blocks that let us store delayed data efficiently, and we used a data schema that accommodates storing and querying delayed data, so that the larger time windows that must be considered do not lead to excessive penalties in query performance.

If mobile is a growing priority within your organization, you should already be planning for how you will address issues with delayed data at scale.

What Can You Do Today About Delayed Data Issues?

A simple solution is to use the time the data is reported to the observability service as the time of the event. However, this tends to lead to more confusion. Let us consider how this would impact tracking crashes in a new app version that you just released.

Your team witnesses a spike in crashes, so you launch an investigation to track down the root cause and then release a new version. The crash rate goes down, and all is good.

But what happens when users on the previous version that crashed — who were too frustrated to relaunch your app — have finally decided to give it another go? They relaunch the app, which sends a crash report from the device. If your observability tool marks those crashes as having just occurred, you might think the issue is still ongoing, even though you released a fix.
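The scenario above can be sketched by counting the same crash reports two ways: by when each crash was received and by when it occurred. The reports, dates and the `FIX_RELEASED` date below are hypothetical values chosen for illustration.

```python
from collections import Counter
from datetime import date, datetime

# Hypothetical crash reports for the broken version: (occurred_at, received_at).
reports = [
    (datetime(2024, 3, 1, 9), datetime(2024, 3, 1, 9)),
    (datetime(2024, 3, 1, 10), datetime(2024, 3, 1, 11)),
    (datetime(2024, 3, 1, 12), datetime(2024, 3, 5, 8)),   # relaunched days later
    (datetime(2024, 3, 1, 15), datetime(2024, 3, 5, 19)),  # relaunched days later
]

FIX_RELEASED = date(2024, 3, 3)  # hypothetical release date of the fixed version


def crashes_per_day(reports, use_event_time):
    """Count crashes per calendar day, keyed by event time or receive time."""
    counts = Counter()
    for occurred_at, received_at in reports:
        ts = occurred_at if use_event_time else received_at
        counts[ts.date()] += 1
    return dict(counts)


def count_since(counts, day):
    """Total crashes attributed to `day` or later."""
    return sum(n for d, n in counts.items() if d >= day)


by_received = crashes_per_day(reports, use_event_time=False)
by_occurred = crashes_per_day(reports, use_event_time=True)

print("after fix, by receive time:", count_since(by_received, FIX_RELEASED))
print("after fix, by event time:", count_since(by_occurred, FIX_RELEASED))
```

Keyed by receive time, two crashes appear to happen after the fix shipped. Keyed by event time, all four fall on the day of the original spike and none appear after the fix.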

So why not do the right thing and map the data to the time that the events occurred? You will pay the price for doing so at either ingestion or query time, or sometimes both. Most commonly used databases that support scale at a cost-effective price point have tradeoffs that make ingesting delayed data non-trivial. If you take the simpler approach to ingestion though, you will end up paying the price at query time: You may be querying an order of magnitude more data, depending on how much delay you have decided to support, than you would under normal circumstances for non-delayed data.
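As a rough illustration of the query-time cost, assume a store that is partitioned by the day the data arrived (one possible "simpler approach to ingestion"; this layout is an assumption and is not described in the article). A query for events that occurred in a given window must then scan every arrival-day partition from the start of the window through the end of the window plus the maximum supported delay. `partitions_to_scan` is a hypothetical helper.

```python
from datetime import date, timedelta


def partitions_to_scan(event_start, event_end, max_delay_days):
    """Arrival-day partitions that may hold events from [event_start, event_end]."""
    last = event_end + timedelta(days=max_delay_days)
    n_days = (last - event_start).days + 1
    return [event_start + timedelta(days=i) for i in range(n_days)]


one_day = date(2024, 3, 1)

# Query a single day of events with and without support for late arrivals.
no_delay = partitions_to_scan(one_day, one_day, max_delay_days=0)
week_delay = partitions_to_scan(one_day, one_day, max_delay_days=7)

print(len(no_delay), "partition without delay support")
print(len(week_delay), "partitions with a 7-day delay window")
```

Under this assumed layout, supporting a seven-day delay turns a one-partition query into an eight-partition query, which is the kind of growth in scanned data described above.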

Closing Thoughts

Given the value generated by mobile apps and how critical they are for many businesses today, operating on a subset of observability data is not a sound strategy. Traditional observability solutions cannot monitor mobile applications effectively for many reasons, and the delayed nature of mobile data is a critical one.

You are not alone in realizing the challenge of getting the full picture of mobile observability data. The good news is that there are solutions today, from mobile-first approaches to configuring existing backend systems, to account for the sharp corners of mobile data capture. As mobile growth accelerates, we see open-source communities and governing groups rethinking what mobile telemetry standards should be. It will be exciting to see what mobile observability looks like in the near future.