By now, you’ve opened your GA4 instance in search of answers only to realize there seems to be missing data. You’ve bounced back and forth from GA3 to GA4 to verify. You’ve scoured your admin panel looking for an incorrect setting, all to no avail. Then you see it, the tiny little hazard triangle. You click on it and read that Google has applied thresholding to your data. But what is thresholding, and how do we fix it?
What is thresholding?
Data thresholding is a feature of Google Analytics 4 that prevents anyone viewing a report from identifying individual users based on the data Google Analytics collects. Thresholding is applied to reports and explorations when the number of users or events is small. This is done to protect the privacy of users.
When thresholding is applied, the data for the affected rows is hidden. This can make it difficult to analyze data, especially when trying to track trends or identify patterns. Which data triggers the thresholding is dependent on what kind of report or exploration you are looking at. You can explore more on that here.
What is triggering this action?
Thresholding is triggered when you have Google Signals turned on in an account. Google Signals is a feature of Google Analytics that allows you to collect data about users across multiple devices. This data can be used to track user journeys, create more targeted audiences, and measure the effectiveness of your marketing campaigns.
Google Signals works by associating data from your website or app with data from Google’s advertising services. This association is done anonymously, so Google does not know who the data belongs to.
The data that is collected through Google Signals includes:
- Device ID: The unique identifier for the device that the user is using.
- User ID: The unique identifier for the user if the user has signed in to Google.
- Session data: Data about the user’s interactions with your website or app, such as the pages they visited, the actions they took, and the time they spent on your site.
What are the effects of thresholding on my reports and explorations?
When you view a report in Google Analytics 4 and the property contains data from Google Signals, Google Analytics may hide rows with small user counts. The exact number of users that triggers this is unknown, but it is likely below 50.
For example, if you view a Traffic Acquisition report and some traffic sources generated fewer than 50 users in a given timeframe, Google Analytics may hide those rows from the report. The data is still stored in the database but will not be displayed in the report.
This is done to protect the privacy of users. By hiding data from reports with small user numbers, Google Analytics makes it more difficult for someone to identify individual users based on their data.
In this example, you’ll notice that although the site has 67 different sources/mediums feeding it, GA4 is hiding/removing all but the top 16.
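The row-hiding behavior described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Google implements it: the real cutoff is not known, so `ASSUMED_THRESHOLD` uses this post's guess of 50, and the source names and user counts are made-up sample data.

```python
# Illustrative sketch only: the actual threshold GA4 uses is not known.
ASSUMED_THRESHOLD = 50  # the guess used in this post, not an official value


def apply_thresholding(rows, threshold=ASSUMED_THRESHOLD):
    """Split report rows into those shown and those withheld."""
    shown = [row for row in rows if row["users"] >= threshold]
    withheld = [row for row in rows if row["users"] < threshold]
    return shown, withheld


# Made-up sample rows for a Traffic Acquisition style report.
report_rows = [
    {"source_medium": "google / organic", "users": 1200},
    {"source_medium": "(direct) / (none)", "users": 640},
    {"source_medium": "newsletter / email", "users": 38},
    {"source_medium": "partner-site / referral", "users": 7},
]

shown, withheld = apply_thresholding(report_rows)
print("Shown:", [row["source_medium"] for row in shown])
print("Withheld:", [row["source_medium"] for row in withheld])
```

The withheld rows still exist in the underlying data. They are simply left out of what the report displays.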
What can I do about this?
If you are concerned about data thresholding, there are a few things you can do:
- Increase your data set size. You can do this by extending the date range or using fewer filters in your report or exploration.
- Disable Google Signals. This stops Google Analytics from collecting the Google Signals data that can trigger thresholding.
- Use a different reporting identity. Google Analytics offers three reporting identities: Device-based, Observed, and Blended.
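To see why a longer date range can help, here is a rough Python sketch. It assumes thresholding depends on the number of distinct users in the selected range and reuses this post's guessed cutoff of 50. The function name, the data layout, and the sample numbers are all hypothetical.

```python
from datetime import date

ASSUMED_THRESHOLD = 50  # this post's guess; the real value is not known


def visible_sources(daily_users, start, end, threshold=ASSUMED_THRESHOLD):
    """Return sources whose distinct users between start and end meet the threshold."""
    users_by_source = {}
    for day, sources in daily_users.items():
        if start <= day <= end:
            for source, user_ids in sources.items():
                # Union the sets so a returning user is only counted once.
                users_by_source.setdefault(source, set()).update(user_ids)
    return sorted(s for s, ids in users_by_source.items() if len(ids) >= threshold)


# Made-up data: a small referral source sends 30 distinct users each day.
daily_users = {
    date(2023, 6, 1): {"partner-site / referral": set(range(0, 30))},
    date(2023, 6, 2): {"partner-site / referral": set(range(30, 60))},
}

one_day = visible_sources(daily_users, date(2023, 6, 1), date(2023, 6, 1))
two_days = visible_sources(daily_users, date(2023, 6, 1), date(2023, 6, 2))
print(one_day)   # 30 users is under the assumed cutoff, so the source is withheld
print(two_days)  # 60 distinct users across both days clears it
```

Removing filters works the same way: each filter you drop leaves more users in every row.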
Why do I want to use Google Signals?
- Improved accuracy: Google Signals can help to improve the accuracy of your GA4 reports by providing additional data about your users. This data includes things like demographics, interests, and device type.
- Better cross-platform tracking: Google Signals can help you to track users across multiple devices and platforms. This can be helpful for understanding how users interact with your content and products across different devices.
- More insights: Google Signals can provide you with more insights into your users’ behavior. This can help you to improve your marketing campaigns and create a better user experience.
- Better audiences: Google Signals can improve the accuracy and effectiveness of audience building.
- Increased reach: Google Signals can help you reach more users with your marketing campaigns by allowing you to target users across multiple devices and platforms.
- Improved targeting: Google Signals can help you target your marketing campaigns more effectively by providing more data about your users, such as their demographics, interests, and device type.
- Increased ROI: Google Signals can help you increase the ROI of your marketing campaigns by helping you reach more relevant users and target them more effectively.
How do I change my reporting identity?
Navigate to your admin panel, then click on the option that is labeled Reporting Identity.
Device-based reporting identity is the most basic reporting identity. It uses the device ID to identify users. This means that users who use different devices to access your website or app will be seen as different users. This can make it difficult to track user behavior across devices.
Observed reporting identity is more advanced than device-based reporting identity. It uses User ID if available, followed by Google Signals if enabled, and then Device ID. This means that users who sign in to your website or app (User ID), or who are signed in to a Google account (Google Signals), will be seen as the same user, even if they use different devices. This can make it easier to track user behavior across devices.
Blended reporting identity is the most comprehensive reporting identity. It uses a combination of User ID, Google Signals, Device ID, and Modeled data if nothing else is available. This means that users will be seen as the same user, even if they do not have a Google account and have not enabled Google Signals. This can make it easier to track user behavior across devices and platforms.
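The fallback order of the three reporting identities can be sketched as a simple lookup. This is a simplified illustration of the order described above, not GA4's actual implementation, and the field names (`user_id`, `google_signals_id`, `device_id`) are hypothetical labels chosen for this example.

```python
from typing import Optional

# Fallback order per reporting identity, as described in this post.
# The field names are hypothetical labels, not GA4 identifiers.
IDENTITY_ORDER = {
    "device-based": ["device_id"],
    "observed": ["user_id", "google_signals_id", "device_id"],
    "blended": ["user_id", "google_signals_id", "device_id", "modeled"],
}


def resolve_identifier(hit: dict, reporting_identity: str) -> Optional[str]:
    """Return the name of the first identifier available for this hit."""
    for field in IDENTITY_ORDER[reporting_identity]:
        # Modeled data is the last resort, so treat it as always available.
        if field == "modeled" or hit.get(field):
            return field
    return None


# Hypothetical hit: the visitor is signed in to the site on a known device.
hit = {"user_id": "u-123", "google_signals_id": None, "device_id": "d-456"}

for identity in IDENTITY_ORDER:
    print(identity, "->", resolve_identifier(hit, identity))
```

For the same hit, device-based identity only ever looks at the device, while observed and blended prefer the User ID. That difference is why switching identities changes how users are counted.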
Keep in mind that when you change from blended identity to device-based identity in Google Analytics 4, your historical data may be affected in the following ways:
- Data loss: Some data may be lost if users have cleared their cookies or used a VPN since their first interaction with your website or app. This is because device-based identity relies on users having a unique device ID, and if this ID changes, the user will be tracked as a new user.
- Changes in user behavior: The way that user behavior is tracked may change. For example, if a user uses multiple devices to access your website or app, device-based identity will count each device as a separate user. This can affect the accuracy of reports that track user behavior, such as how often users visit your website or app, what pages they view, and how long they spend on each page.
- Decreased accuracy: The accuracy of your historical data may decrease. This is because device-based identity is less comprehensive than blended identity, so it may not be able to track all of your users or track them accurately.
Final thoughts
Thresholding is an annoyance that I really can’t see a good reason for. I do not see a reasonable way that you would derive a person’s identity from the demographics collected, not even when pulling the data from BigQuery directly. It is certainly a feature that I wish was not part of GA4.
As for what to do about it, while the data coming in with Google Signals is arguably better than without, you should consider your needs when trying to decide. It is possible that you may not need the extra data or desire to use the audience-building capabilities, in which case you can choose not to use Google Signals. Evaluate each site and the client’s needs as you are going through the setup process. And if you do need it, remember that you can swap between reporting identities quickly and easily. Just keep in mind that the change may affect how the numbers are calculated.
John Paul Strong
John Paul Strong combines his two decades of automotive marketing experience with a team of more than 150 professionals as owner and CEO of Strong Automotive.