2017 Guide to Blocking and Removing Google Analytics Spam
Analytics spam isn’t a new problem, but recently it’s become much more pervasive. In this post, I’ll explain the different types of analytics spam and how they impact your site, and I’ll share some actionable tips for removing and filtering out fake traffic data to improve reporting accuracy.
The first thing you’ll need to know is that analytics spam essentially comes in two different forms: bot spam and ghost spam. Each one impacts your analytics data differently, so the way you go about removing it depends on which type you’re dealing with.
What is Bot Spam?
Bot traffic is more common than you might think. In fact, recent studies suggest that bots account for nearly 52 percent of all web-based traffic. Not all of them are bad, though. Google, Bing, and other major search engines rely on their own search bots to crawl and index content on the web, as do commercial crawlers (SEMrush, Pinterest), feed fetchers (FeedBurner, Twitter), and monitoring bots (WordPress, Uptime Robot). Spam bots, however, tend to be a bit more nefarious.
Spam bots (a.k.a. crawler spam) are programs designed to automate a variety of shady tasks, such as data and content theft, server jacking, comment spam, phishing links, and DDoS attacks. They usually fall into one of four different categories: impersonators, hacking tools, scrapers, and spammers.
- Impersonators use false identities to imitate natural user behavior and characteristics to bypass security measures. They’re commonly used to deploy DDoS attacks.
- Hacking tools are typically used to distribute different types of malware. In 2015, Google reported a 180 percent increase in hacked sites since 2014.
- Scrapers are used to steal content and user data. In many cases, the stolen content will be republished on other domains.
- Spammers use bots to spread promotional content, usually through comments and phishing links.
Aside from the obvious security risks, one thing that these bots all have in common is that they litter your analytics account with fake traffic data.
How to Block Bot Spam
There are several different ways to block and remove bot spam. I’ve included a few of the methods that I’ve found to be most effective.
Automatic Bot/Spider Filtering Feature
A few years ago, Google introduced a Google Analytics feature that allows users to check a box to block traffic from known bots and spiders. The feature automatically filters out all spiders and bots included on the IAB/ABC International Spiders & Bots List.
Although the auto filter feature may not be the most effective defense against bot spam, since some bots may slip beneath the radar, it’s undoubtedly the easiest method to implement. But before you start adding filters (be it auto filters or custom filters), I recommend creating an additional view within your Google Analytics account, so that you always have an unfiltered view of the raw data to use as a baseline.
To do this, click on the “Admin” tab towards the bottom left corner of the page. Then in the “View” column on the far right, select “Create new view.”
Next, in the “View Settings” section, you’ll want to scroll down and check the box that says “Exclude all hits from known bots and spiders” and then click “Save.” Now the filters you select will only be applied to the new view that you created.
Block Bots Using .htaccess
Although the auto filter feature may help to remove the bot visits from your analytics data, it won’t do anything to prevent the bots from accessing your server. To block the bots altogether, you’ll need to block the offending URLs in your website’s .htaccess file.
Jared Gardner wrote an extremely thorough post on how to block spam bots using .htaccess, so instead of reinventing the wheel, I’ll just leave that link here. Please be advised that website owners should only make changes to their .htaccess file if they know what they’re doing. As Jared pointed out in his article, one misplaced character can crash your entire site.
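To give a rough idea of what such rules can look like, here’s a minimal Python sketch that generates a referrer-blocking block for .htaccess. It isn’t necessarily the approach from Jared’s post: I’m assuming an Apache server with mod_rewrite enabled, that you’re blocking by the Referer header, and the function name and the two .example domains are placeholders I made up for illustration.

```python
def build_htaccess_block(referrer_domains):
    """Return mod_rewrite directives that send a 403 to any request whose
    Referer header matches one of the given domains (hypothetical helper)."""
    domains = [d.strip().lower() for d in referrer_domains if d.strip()]
    if not domains:
        return ""

    lines = ["<IfModule mod_rewrite.c>", "RewriteEngine On"]
    for i, domain in enumerate(domains):
        # Escape dots so they match a literal "." instead of any character.
        pattern = domain.replace(".", r"\.")
        # [NC] ignores case; [OR] chains every condition except the last one.
        flags = "[NC,OR]" if i < len(domains) - 1 else "[NC]"
        lines.append(f"RewriteCond %{{HTTP_REFERER}} {pattern} {flags}")
    # "-" leaves the URL untouched; [F] answers with 403 Forbidden.
    lines.append("RewriteRule .* - [F,L]")
    lines.append("</IfModule>")
    return "\n".join(lines)


# Placeholder domains; swap in the referrers you actually see in your reports.
print(build_htaccess_block(["spammer.example", "badbot.example"]))
```

Generating the rules from a list keeps the dot escaping and the [OR] flags consistent, which is exactly where a hand-typed character tends to go wrong. Test the output on a staging copy before it goes anywhere near a live site.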
Block Bots Using Custom Filters in Google Analytics
Custom filters give you more granular control over which bot visits show up in Google Analytics. Keep in mind that view filters, like the tactics listed above, only affect data collected after the filter is created; to strip bot visits out of historical reports, you’d apply the same conditions in a custom segment instead. One way to filter bot traffic is by excluding specific countries. For instance, if you see a lot of spam traffic coming in from Russia, you could create an exclusion filter that would prevent Russian traffic from showing up in Google Analytics.
You can do this by selecting your filtered view (the view you created to apply the auto filter), and clicking on “Filters” under the “View” column on the far right.
Next, click “Add Filter.” Once you name your new filter, select “Custom” as the filter type.
Make sure that the “Exclude” option is selected, and then choose “Country” as the filter field (if you don’t see “Country,” you can type it into the box and it should appear as an option).
You would then enter the country name(s) as the filter pattern. When filtering multiple countries, you’ll need to separate each country name with a “pipe” symbol (|). You can then use the “Verify this filter” feature, which shows how the filter parameters would have affected data over the past seven days, to ensure that your filter is working properly.
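If you’d like to sanity-check a pattern before pasting it into the filter, here’s a small Python sketch. The function names are my own, “Country B” is just a placeholder, and Python’s re.search only approximates how Google Analytics evaluates a filter pattern, so treat the results as a rough preview rather than a guarantee.

```python
import re


def country_filter_pattern(countries, anchored=False):
    """Join country names with the pipe symbol (hypothetical helper).
    With anchored=True the pattern only matches a whole country name."""
    joined = "|".join(name.strip() for name in countries if name.strip())
    return f"^({joined})$" if anchored else joined


def would_exclude(pattern, country):
    """Approximate the filter with a partial regex match."""
    return re.search(pattern, country) is not None


pattern = country_filter_pattern(["Russia", "Country B"])
print(pattern)                            # the string to paste into the filter
print(would_exclude(pattern, "Russia"))   # matched by the pattern
print(would_exclude(pattern, "Germany"))  # not matched

# With partial matching, a short name can catch more than intended:
print(would_exclude("Niger", "Nigeria"))
print(would_exclude(country_filter_pattern(["Niger"], anchored=True), "Nigeria"))
```

The last two lines show why it can be worth anchoring the pattern with ^ and $ when one country name is contained in another.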
This is just one of many examples of how filters can be used to remove bot traffic from your analytics data. You could also add filters that target other attributes, such as campaign source, hostname, and IP. The attributes you choose will ultimately depend on your website and audience. That said, website owners should use caution when adding filters, to ensure that they don’t filter out real visitors. For instance, it may not be a good idea to filter out an entire country if you’re an international brand. For more information on creating and managing filters, you can check out this Google Analytics support doc.
What is Ghost Spam?
Ghost spam (a.k.a. referrer spam) is basically “second generation” bot spam. Unlike bot visits, ghost spam never actually interacts with your site, hence the name. Instead, it bypasses your website by exploiting a vulnerability within Google’s Measurement Protocol and sends data directly to Google Analytics’ servers, leaving behind fake traffic data.
The way ghost spam works is simple. Ghost spammers choose their victims arbitrarily by randomly generating Google Analytics tracking codes. Then they use these tracking codes to inject URLs into a variety of reports, such as referrals, events, language, and even organic traffic.
The goal is to lure website owners to promotional websites, such as businesses selling SEO and online marketing services, affiliate marketing sites, or plugins asking you to insert a potentially dangerous script on your site (such as sharebutton.to).
Ghost spam isn’t nearly as malicious as bot spam, but it tends to be much more common, and it typically requires some extra finesse to scrub it from your analytics data.
How to Block Ghost Spam
Since ghost spam doesn’t access your site, it’s impossible to block ghost visits using your .htaccess file or the built-in bot-filtering feature. Instead, you’ll need to use custom filters.
Over the past couple of years, I’ve come across several different methods for filtering out ghost spam on various blogs and forums. I’ve found many of these methods to be highly ineffective. Speaking of which, Georgi Georgiev wrote a great post explaining why many of these proposed solutions are flawed.
One of the major challenges when dealing with referrer spam is that the spam itself is constantly evolving. For instance, shortly after the 2016 presidential election, Google Analytics users started seeing the following message in the language section of their Google Analytics accounts: “Secret.ɢoogle.com You are invited! Enter only with this ticket URL. Copy it. Vote for Trump!” This new breed of referrer spam is referred to as “language spam,” and it’s been creating quite a nuisance for online marketers over the past few months.
Currently, the best way to block and remove ghost spam is to use custom filters. Your first steps would be creating a new view in Google Analytics, and creating a new custom filter under that view. Make sure that the “Exclude” option is selected, and then choose “Campaign Source” as the filter field.
You can then use Regex to create your filter pattern. Like the bot filters, you’ll use the “pipe” symbol to filter multiple campaign sources, and you’ll also want to use the backslash (\) to escape dots (.) in the campaign source domains. Your expression should look something like this: domain1\.|domain2\.|domain3\.
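Here’s a short Python sketch of how you might build and test that kind of expression before adding it to a filter. The helper names and the .example domains are placeholders of mine, and re.search is only a stand-in for how Google Analytics applies the pattern, so use it as a quick check rather than the final word.

```python
import re


def source_filter_pattern(domains):
    """Escape the dots in each domain and join them with the pipe symbol
    (hypothetical helper for building a Campaign Source filter pattern)."""
    escaped = [d.strip().lower().replace(".", r"\.") for d in domains if d.strip()]
    return "|".join(escaped)


def is_spam_source(pattern, source):
    """Approximate the exclude filter with a case-insensitive partial match."""
    return re.search(pattern, source.lower()) is not None


# Placeholder domains; replace them with the sources polluting your reports.
pattern = source_filter_pattern(["spammer.example", "badbot.example"])
print(pattern)

for source in ["spammer.example", "www.badbot.example", "google", "spammerxexample"]:
    print(source, is_spam_source(pattern, source))
```

The last test source shows why the backslash matters: without it, the dot would match any character, so a pattern meant for one domain could also catch unrelated sources.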
Although this has been the most effective method for removing referrer spam in my experience, it’s important to remember that this isn’t necessarily a long-term solution. Keeping up with spammers can be exhausting, and a massive waste of time and resources. Unfortunately, until Google fixes the vulnerabilities within the Google Analytics tracking functionality, we’re left to sort through the mess on our own.
Not All Traffic Anomalies are the Result of Spam
The growing diversity of our online ecosystem, such as mobile, social media, voice search, and messaging apps, has led to a huge increase in dark traffic. In addition to weeding out fake traffic sources, website owners should also be on the lookout for misattributed traffic that could be skewing their data.
Since most of us rely on Google Analytics data to help guide our digital marketing strategies, data integrity should be considered a top priority. If you see anything unusual in your reports, your first step should be digging a bit deeper to determine where the traffic is coming from.
From there you can determine whether it’s the result of spam, misattribution, or even a random reporting glitch. It’s impossible to ensure that your reporting is accurate 100 percent of the time, but there’s a lot that can be done, with minimal effort, to actively monitor and safeguard your data. By taking the steps outlined in this article, users can significantly improve the accuracy of their reports, which enables them to make more well-informed marketing decisions.
If you have any questions about the solutions provided in this post, or if you’ve discovered a more effective method for removing and blocking Google Analytics spam, feel free to leave a comment below!