Nobody likes spam. Fortunately, through years of innovation and work by the world's top scientists, we've become pretty good at filtering it out - at least when it comes to email. You may have thought the top scientists in the world would be figuring out how we can live on Mars, but nope - the world's top men and women have gotten very good at figuring out how to filter your emails. Every time I open my Gmail inbox, everything is filtered nicely into piles - promotions, coupons and junk email are all in separate folders, allowing me to see the emails that actually matter at a glance. Unfortunately, when it comes to Google Analytics, things aren't so neatly organized.

While Google Analytics does a pretty good job of sorting our default channels (Referral, Organic, Paid, Social and Direct), and we can add custom channels and campaigns through the URL builder and event tracking, we sadly sometimes get spam mixed into these default channels. This spam traffic causes quite an issue for marketing managers: it undermines your data quality by skewing both your overall traffic numbers and your channel reports. Fortunately, there are ways we can remedy this situation and clean up our Google Analytics data.

What is Referral Spam?

As a quick refresher in Analytics 101, referral traffic is traffic referred to your website from another website. Search engines don't count, and social media like Facebook may show up here on occasion, but in general it should fall into its own category of "Social" in your Google Analytics Channel report. Common examples of referral traffic include:

  1. Someone clicking through to a restaurant's website from Yelp.com
  2. Someone clicking through to a handyman's website from a site such as Angie's List
  3. Someone clicking through to our website from a link they found on Search Engine Land as part of their daily blogroll
  4. Someone clicking through to a client's site as a result of a content piece we created and got placed on a site like Entrepreneur.com

All of these are good traffic sources that we want to be able to measure, so that a quick glance tells us how much engagement we're getting from these types of channels. But referral spam gets in the way and clouds the data. The example below comes from a small law firm looking at its referral sources over the course of one month:

[Figure: An example of referral spam to a small law firm's website]

No. 3, Attorneys.lawinfo would be a legitimate referral, as would no. 5, avvo.com. Both are sites with listings of legal professionals. But almost 80% of this site's referral traffic is taken up by two sources - Semalt and Buttons-For-Website, two well-known sources of referral spam. I'm not even going to give them the privilege of linking to their websites: all you need to know is that Semalt is a garbage company that tries to position itself as an SEO tool - their website even has the worst call to action I've ever seen:

Dear visitors… Your website is not in Google TOP? You have weak conversion? Lack of sales?

I would hope that if you were trying to convert me into a customer, you'd at least bother with proper English. Buttons-For-Website.com is similarly scammy, claiming "Get Free Buttons, Drive Traffic To Website"... well, they do drive traffic, technically, so they're not lying. But the traffic is worthless garbage coming from their own bots, taking advantage of those not in the know and of people who are looking to say "Well look, boss, look at the traffic" without considering the quality of the traffic and the conversions it creates. There are more crappy variants out there - best-seo-offer.com, googlsucks.com, social-share-buttons.com - all of it garbage. It's pretty easy to spot in your Channels > Referral Traffic > Sources report - and luckily, just as simple to rid yourself of it.
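To put a number on how much of a referral report is spam, the check can be sketched in a few lines of Python. This is a minimal sketch, not an official Google Analytics feature: it assumes you have exported the Sources report as (source, sessions) pairs, the session counts below are invented for illustration, and the spam list contains only the domains named in this post.

```python
# Spam domains named in this post; extend the set as new ones appear.
KNOWN_SPAM = {
    "semalt.com",
    "buttons-for-website.com",
    "best-seo-offer.com",
    "googlsucks.com",
    "social-share-buttons.com",
}


def is_spam_source(source, spam_domains=KNOWN_SPAM):
    """True if source is a listed spam domain or a subdomain of one."""
    host = source.lower().strip()
    return any(host == d or host.endswith("." + d) for d in spam_domains)


def spam_share(rows, spam_domains=KNOWN_SPAM):
    """Fraction of all referral sessions that come from spam sources."""
    total = sum(sessions for _, sessions in rows)
    spam = sum(s for source, s in rows if is_spam_source(source, spam_domains))
    return spam / total if total else 0.0


# Hypothetical export of a referral report: (source, sessions).
rows = [
    ("semalt.semalt.com", 410),
    ("buttons-for-website.com", 380),
    ("legal-directory.example", 95),
    ("local-news.example", 70),
    ("avvo.com", 45),
]

print(f"{spam_share(rows):.0%} of referral sessions are spam")  # 79%
```

Matching on subdomains matters because spam sources often show up with a prefix in front of the main domain, which an exact-match check would miss.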

How Do I Block Referral Spam?

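In your Google Analytics account, go to the Admin tab, then in the Property column go to Tracking Info -> Referral Exclusion List and add the offending domains to the list. This traffic will no longer appear in your referral traffic reports, making analyzing your data much easier. Keep in mind that an exclusion changes how those sessions are attributed rather than deleting the hits, so check afterward whether the same volume has simply moved into Direct.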

[Figure: Blocking referral spam]

If you have quite a big referrer spam issue and you're not sure where to start, the lovely folks at Piwik Open Source Analytics have a full list of blacklisted referral spam sites you can check out on Piwik's GitHub - and if you have the development knowledge, you can go one step further and block these at the server level so the traffic never even runs through the Google Analytics script. Several companies are working on automating referral spam removal, including XSE, a Swiss company dealing in digital marketing. You can try out their automated tool here, which claims to automatically add offending spam domains to your referral exclusion list.
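For developers, the server-level idea boils down to comparing each request's Referer header against the blacklist before the page is served. The sketch below is one possible way to do that check in Python, not the method used by any particular tool. It assumes the blacklist is a plain-text file with one domain per line, and the function names `load_blacklist` and `should_block` are made up for this example.

```python
from urllib.parse import urlparse


def load_blacklist(lines):
    """Parse a one-domain-per-line list, skipping blank lines and # comments."""
    return {
        line.strip().lower()
        for line in lines
        if line.strip() and not line.lstrip().startswith("#")
    }


def should_block(referer_header, blacklist):
    """True if the Referer's host is a listed domain or a subdomain of one."""
    if not referer_header:
        return False
    host = (urlparse(referer_header).hostname or "").lower()
    parts = host.split(".")
    # Check the full host and every parent domain (a.b.com, then b.com, then com).
    return any(".".join(parts[i:]) in blacklist for i in range(len(parts)))


# Sample list contents; in practice these lines would be read from the file.
blacklist = load_blacklist(["# sample entries", "semalt.com", "", "buttons-for-website.com"])

print(should_block("http://semalt.semalt.com/crawler.php", blacklist))  # True
print(should_block("https://www.avvo.com/", blacklist))                 # False
```

A request that fails this check could be answered with an error response instead of the page. Note that this only helps with bots that actually visit the site; it cannot stop hits that are sent straight to Google Analytics without ever touching your server.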

I'd recommend starting with the first approach - manual exclusion - and then only moving into these advanced methods if referral spam is still an issue after you've added the offending domains. Spammers are coming up with new domains for their crummy services all the time, so be sure to revisit your referral reports every month or so and exclude newly found spam domains as needed.

What is Direct Spam?

[Figure: Indicators of direct spam in Google Analytics]

Analytics 101 Refresher, Part 2: generally speaking, direct traffic comes from visitors who reach your website directly, for example by typing the URL into the browser's address bar. Direct spam is indicated by an unusually high number of direct visits to your website. Here's an example of a small business that went from about 10-20 direct visits per day... to 2,000-8,000 per day. This is clearly indicative of a direct spam problem. A high amount of direct traffic can also be an indicator of a DDoS attack, where a malicious person or firm attempts to send so much traffic to a website that it overloads the server and takes the website offline. Direct spam, according to this report, is 90% of the total traffic to the website - obviously skewing our analytics data quite a bit, and possibly eating up our bandwidth as well. So how do we go about fixing this? Well, it's not as simple as eliminating referral spam - but there are actions we can take to try to find the cause of the issue and ultimately eliminate it.
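A jump like that is obvious on a chart, but the same check can be automated against exported daily numbers. The following is a rough sketch under stated assumptions: the daily figures are invented to mirror the example above, and the rule of thumb used here (flag any day above ten times the median of a baseline period) is an arbitrary choice, not a Google Analytics threshold.

```python
from statistics import median


def find_spike_days(daily_visits, baseline_days=14, factor=10):
    """Return indexes of days after the baseline period whose visits
    exceed `factor` times the median of the baseline period."""
    baseline = median(daily_visits[:baseline_days])
    threshold = max(baseline, 1) * factor
    return [
        day
        for day, visits in enumerate(daily_visits)
        if day >= baseline_days and visits > threshold
    ]


# Hypothetical direct visits per day: a quiet week, then a sudden flood.
visits = [12, 15, 10, 18, 20, 14, 11, 2400, 7800, 3100]

print(find_spike_days(visits, baseline_days=7))  # [7, 8, 9]
```

Using the median rather than the average keeps one odd day in the baseline period from hiding a real spike.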

How Do I Block Direct Spam?

To eliminate direct spam, we have to start with diagnosing the source. Here are a few places we would check to try to find the source of direct spam.

Hostname: In Google Analytics, while viewing Channels -> Direct Traffic, one of the primary dimension options is hostname, which shows the hostname of the site where the visit was recorded. Almost all of this traffic should be recorded on your domain or the domain of your hosting provider (e.g. Acquia or BigCommerce). If you see something that isn't a hosting provider or your domain name, it's likely spam, and you should reach out to your developer to see if there's a way to block that source through changes to your .htaccess file.

[Figure: How to block direct spam with hostname reports in Google Analytics]
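The hostname check above amounts to comparing every hostname in the report against a short list of hostnames you expect. Here is a minimal sketch, assuming the hostname report has been exported as (hostname, sessions) pairs. The domain names and session counts are placeholders, and `flag_unexpected_hostnames` is a name invented for this example.

```python
def flag_unexpected_hostnames(rows, allowed):
    """Return (hostname, sessions) rows whose hostname is not on the
    allowed list, largest first."""
    allowed = {host.lower() for host in allowed}
    flagged = [(host, sessions) for host, sessions in rows if host.lower() not in allowed]
    return sorted(flagged, key=lambda row: row[1], reverse=True)


# Hostnames you expect: your own domain plus your hosting provider's, if any.
allowed = ["example-shop.com", "www.example-shop.com"]

# Hypothetical export of the hostname report for direct traffic.
rows = [
    ("www.example-shop.com", 320),
    ("spam-host.example", 2900),
    ("(not set)", 4100),
    ("example-shop.com", 40),
]

for host, sessions in flag_unexpected_hostnames(rows, allowed):
    print(host, sessions)
# (not set) 4100
# spam-host.example 2900
```

Anything this prints is worth raising with your developer before deciding what to block.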

Geolocation: In Google Analytics, still viewing through the Direct Traffic filter, navigate over to Audience -> Geo -> Location. Excessive direct traffic is often the result of spam attacks from Russia and other European/Asian countries. Compare this report to previous months - are any countries spiking in their direct traffic, aside from the country your business operates in? Check the statistics for the country in question - does the session duration or bounce rate seem odd? Looking at the report below, it looks like Russia in particular could be the source of our direct traffic issue. Speak with your developer about ways you can limit access from these countries to resolve the direct spam issue.

[Figure: How to block direct spam with geolocation reports]
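The month-over-month comparison described above can also be sketched in code. This is an illustration only: it assumes two exported Location reports turned into country-to-sessions mappings, the numbers are invented, and the cutoffs (at least 100 sessions and at least five times last month's figure) are arbitrary starting points you would tune for your own site.

```python
def country_spikes(previous, current, home_country, min_ratio=5.0, min_sessions=100):
    """Return (country, sessions_before, sessions_now) for countries other
    than the home country whose direct sessions jumped sharply."""
    spikes = []
    for country, now in current.items():
        if country == home_country or now < min_sessions:
            continue
        before = previous.get(country, 0)
        # Treat zero sessions last month as 1 to avoid dividing by zero.
        if now / max(before, 1) >= min_ratio:
            spikes.append((country, before, now))
    return sorted(spikes, key=lambda row: row[2], reverse=True)


# Hypothetical direct sessions by country for two consecutive months.
previous = {"United States": 420, "Russia": 6, "Canada": 30}
current = {"United States": 450, "Russia": 5200, "Canada": 35, "Brazil": 12}

print(country_spikes(previous, current, home_country="United States"))
# [('Russia', 6, 5200)]
```

A flagged country is a lead, not proof: check its bounce rate and session duration before asking your developer to restrict access.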

Other Factors: Hostname and Geolocation are the first two factors we check to see if direct spam is actually happening, but there are other dimensions we look at in an extended investigation. All of these factors can be found under Audience -> Technology in Google Analytics.

  • Browser Type
  • Screen Resolution
  • Operating System
  • Network
  • Screen Colors

Any unusual data found in these dimensions can help confirm direct spam attacks, which you should discuss with your developer and marketing team to determine the best way to remove them. It's also worth talking to your hosting provider to see how much bandwidth is being consumed and what options they have available to secure your site and prevent direct spam.

In Conclusion

We recommend removing referral spam when it's discovered, as it's a fairly simple process that can make your data cleaner and make reporting easier. Direct spam is a bit more of a beast to handle, but the key is to gather as much data as possible to show evidence of a problem, and then work with technology professionals to find the best path to resolution - often, adjusting the .htaccess file or changing how security is configured on the website can help fix the problem. Spam is just one of the many frustrations that we face as marketers - both on the agency and in-house side. Another big frustration is black-hat tactics that provide short-term gains while destroying the potential for long-term growth. Check out our guide on Black Hat SEO tactics to learn the red flags of bad SEO and ensure you avoid long-term damage to your website and brand.
