Filter Referrer Spam

What is Referrer Spam?

On smaller websites especially, you may have noticed the swarm of annoying referral spam that has been plaguing Google Analytics in recent months. In fact, if you’ve just launched a new website and aren’t yet pushing for much traffic, sorting your traffic by Bounce Rate might show something a little like this:

Spam Traffic in Google Analytics

Gross, right? An eighth of the website’s traffic is from a single spammy website. Just sitting there, screwing up my analytics. Of course, you could exclude them right there and then as you’re browsing. But they’ll keep popping up, infecting your data with more fake traffic… You need to get rid of them once and for all.

Those figures above seem small, but they add up and can skew your website data massively, depending on how much traffic your website receives. You end up with the wrong figures for things such as conversion rates, bounce rates, and a whole load of other metrics in addition to just pageviews and visits.

WARNING!

Referrer spam can be dangerous. You may see some suspicious referrals coming from websites containing the word “blackhat” or even websites that seem to be offering free tools or services. Recently I’ve even seen referral spam coming from howtostopreferralspam.eu – DO NOT VISIT! If you are curious, you can google the website and see what it’s about from a different source or forum. Website domains may even imitate popular websites using a misspelled variant of the popular domain name, so that you click through to the website to see where your website was mentioned.

These websites exist and spam your Google Analytics like this because they want you to visit them, whether it’s to download adware or malware, or to show you a whole bunch of ads which make them money. They aren’t spamming your analytics for fun; it’s their business.


How To Filter Referrer Spam in Google Analytics Using REGEX

There are two main types of this Google Analytics spam: the annoying web crawlers which you can block from accessing your website (and therefore your analytics) using .htaccess, and ghost referrals. Ghost referrals don’t actually visit your website in the first place, so blocking them in your .htaccess will do nothing.

This solution will only prevent referral spam from appearing in your Google Analytics. Annoying, attention-seeking web crawlers will still be able to access your website but their visit won’t be recorded in Analytics, unless you block them in your .htaccess too.

1) Create a Master View – Don’t create new filters on your main All Website Data view!

Google highly recommends that webmasters always maintain one view that collects all website data, with no filters to alter the data. (In fact, the advice to always keep one unfiltered view even comes up in the Google Analytics Certification exam!) This is so that you always have access to all incoming traffic and data, should you need it.

Filter Internal IP and Referrer Spam in Google Analytics

Creating a new “Master View” (by which I mean the view that you will most likely be checking every day) is easy: click on Create new view and give your view a name. That’s all there is to it. Once you’ve named your view, it’s right there for you to do with it what you please. By default, it will just be unfiltered. Filters do not work retroactively, so I recommend creating the filter as soon as you have your view, before any spam gets in.

2) Create your Fake Referral Spam filter

In the account’s Admin, once you have selected your new View, you can click on Filters and then + New Filter, where you will see the Add Filter To View page:

Add spam filter in Analytics

You can name your Filter whatever you want to help you remember what the filter actually does. You want to set your Filter Type to Custom, so that you can Exclude by Campaign Source in the Filter Field.

And then in the Filter Pattern, you can type out or paste in the REGEX pattern to block your spam. The one in this example will block all of the main ones that pop up, so feel free to copy and paste this:

Fake Referrer Filter REGEX Example:

darodar\.|buttons-for.*?website|blackhatworth|ilovevitaly|prodvigator|cenokos\.|ranksonic\.|adcash\.|share.?buttons\.|social.?buttons\.|hulfingtonpost\.|best-seo-(solution|offer|service)|free.*traffic|buy-cheap-online|100dollars-seo
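Before saving the filter, it can help to try the pattern against a few source names offline. Below is a minimal Python sketch of that idea. It assumes the filter does a case-insensitive partial match (roughly what `re.search` with `re.IGNORECASE` does); Google Analytics uses its own regex engine, so treat this as an approximation only. The function name `is_spam_source` and the sample sources are illustrative and not part of Analytics.

```python
import re

# The example filter pattern from above, copied verbatim.
SPAM_PATTERN = re.compile(
    r"darodar\.|buttons-for.*?website|blackhatworth|ilovevitaly|prodvigator|"
    r"cenokos\.|ranksonic\.|adcash\.|share.?buttons\.|social.?buttons\.|"
    r"hulfingtonpost\.|best-seo-(solution|offer|service)|free.*traffic|"
    r"buy-cheap-online|100dollars-seo",
    re.IGNORECASE,  # assumption: the filter is not set to be case sensitive
)


def is_spam_source(source: str) -> bool:
    """Return True if this source name would be caught by the exclude pattern."""
    return SPAM_PATTERN.search(source) is not None


if __name__ == "__main__":
    # Two spam referrers from this article and two ordinary sources.
    for source in ["darodar.com", "free-share-buttons.com", "google", "twitter.com"]:
        verdict = "excluded" if is_spam_source(source) else "kept"
        print(f"{source}: {verdict}")
```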

REGEX Tips for Google Analytics Filters:

.    Any single character
\    Escape the following character
*    0 or more of the previous character
?    1 or 0 of the previous character
|    “Or”
()    Groups characters together


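You can add to or replace the REGEX above with your most common spam referrers. Make sure that every dot (.) that is part of a domain name has a backslash (\) in front of it, as you literally do mean a . and not “any character” as a dot would usually signify in REGEX. The backslash “escapes” the following character. The dots followed by * or ? in the example (as in free.*traffic) are left unescaped on purpose, because there they are meant to match any character.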
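To see why the backslash matters, here is a small Python sketch comparing an escaped and an unescaped dot. The hostname `darodarxcom.example` is made up purely for illustration, and Python's `re` module only approximates the regex engine that Analytics uses.

```python
import re

unescaped = re.compile(r"darodar.com")   # the dot matches any single character
escaped = re.compile(r"darodar\.com")    # the dot matches only a literal "."

real_spammer = "darodar.com"
lookalike = "darodarxcom.example"  # hypothetical hostname, not a real referrer


def matches(pattern: re.Pattern, text: str) -> bool:
    """Partial match, i.e. the pattern may occur anywhere in the text."""
    return pattern.search(text) is not None


print(matches(unescaped, real_spammer))  # True
print(matches(escaped, real_spammer))    # True
print(matches(unescaped, lookalike))     # True: the "." matched the "x"
print(matches(escaped, lookalike))       # False
```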

If you frequently get hit by any other fake referral websites, you should add them to the pattern or create a new filter for them. You may need more than one filter to cover all these websites, as there is a character limit of 255 for filter patterns.
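If you keep your own list of spam domains, splitting it into patterns that fit under the limit can be scripted. The following Python sketch is one way to do it. `build_filter_patterns` and `escape_dots` are hypothetical helper names, and the sketch assumes your list contains plain domain names (letters, digits, hyphens and dots), so that dots are the only characters needing a backslash.

```python
MAX_PATTERN_LENGTH = 255  # the filter pattern limit mentioned above


def escape_dots(domain: str) -> str:
    """Put a backslash in front of every dot so it is matched literally."""
    return domain.replace(".", r"\.")


def build_filter_patterns(domains, limit=MAX_PATTERN_LENGTH):
    """Join domains with "|" into as few patterns as fit within the limit."""
    patterns = []
    current = ""
    for domain in domains:
        piece = escape_dots(domain.lower())
        if len(piece) > limit:
            raise ValueError(f"{domain!r} alone exceeds the {limit}-character limit")
        candidate = piece if not current else current + "|" + piece
        if len(candidate) <= limit:
            current = candidate
        else:
            # The current pattern is full: store it and start a new one.
            patterns.append(current)
            current = piece
    if current:
        patterns.append(current)
    return patterns


if __name__ == "__main__":
    # A tiny limit is used here only to make the splitting visible.
    sample = ["darodar.com", "priceg.com", "semaltmedia.com"]
    for pattern in build_filter_patterns(sample, limit=30):
        print(len(pattern), pattern)
```

Each string in the returned list would then go into the Filter Pattern box of its own filter.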

3) Edit your Fake Referral Spam Filter for any new spam sites

New websites will in time pop up, and it’s important to keep on top of this. You can add to or edit your filters whenever you want, or simply create a new one.

Referral Spam Filtered View

Spam Referrers to Filter:

There are too many to list… but these guys seem to pop up the most!

  • buttons-for-website.com
  • buttons-for-your-website.com
  • darodar.com
  • priceg.com
  • blackhatworth.com
  • hulfingtonpost.com
  • bestwebsitesawards.com
  • o-o-6-o-o.com
  • ilovevitaly.com
  • semaltmedia.com
  • free-share-buttons.com
  • social-buttons.com
  • best-seo-solution.com
  • best-seo-offer.com
  • Get-Free-Traffic-Now.com
  • googlsucks.com
  • theguardlan.com
  • 100dollars-seo.com
  • howtostopreferralspam.eu
  • sitevaluation.org
  • free-social-buttons.com

As you can see, these referrer spam domain names often disguise themselves as popular websites (e.g. hulfingtonpost.com and theguardlan.com), making you think that one of the top names in the media industry has linked to you somewhere, and encouraging you to click through to see where you were linked from. Or they make you think you’ve won some sort of website award, so you want to see what award you’ve won (e.g. bestwebsitesawards.com). But most of the time, the domain name tries to offer you something for “free” or something that is the “best” (such as SEO, traffic, social sharing buttons, or some other tool or service).
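Note that the example pattern from step 2 was not written to cover every domain in this list. A quick Python sketch like the one below can show which of the listed domains would slip past it and so need adding. It assumes a case-insensitive partial match, which only approximates how Analytics applies the filter; `LISTED_DOMAINS` and `uncovered` are just illustrative names.

```python
import re

# The example filter pattern from step 2, copied verbatim.
EXAMPLE_PATTERN = re.compile(
    r"darodar\.|buttons-for.*?website|blackhatworth|ilovevitaly|prodvigator|"
    r"cenokos\.|ranksonic\.|adcash\.|share.?buttons\.|social.?buttons\.|"
    r"hulfingtonpost\.|best-seo-(solution|offer|service)|free.*traffic|"
    r"buy-cheap-online|100dollars-seo",
    re.IGNORECASE,  # assumption: the filter is not set to be case sensitive
)

# The spam referrers listed above.
LISTED_DOMAINS = [
    "buttons-for-website.com", "buttons-for-your-website.com", "darodar.com",
    "priceg.com", "blackhatworth.com", "hulfingtonpost.com",
    "bestwebsitesawards.com", "o-o-6-o-o.com", "ilovevitaly.com",
    "semaltmedia.com", "free-share-buttons.com", "social-buttons.com",
    "best-seo-solution.com", "best-seo-offer.com", "Get-Free-Traffic-Now.com",
    "googlsucks.com", "theguardlan.com", "100dollars-seo.com",
    "howtostopreferralspam.eu", "sitevaluation.org", "free-social-buttons.com",
]

# Domains the example pattern does not match anywhere in the name.
uncovered = [d for d in LISTED_DOMAINS if not EXAMPLE_PATTERN.search(d)]

print(f"{len(uncovered)} of {len(LISTED_DOMAINS)} listed domains are not matched:")
for domain in uncovered:
    print(" ", domain)
```

The domains it prints can then be appended to the pattern (with their dots escaped) or placed in a second filter.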

There’s a huge blacklist available to help you block known spammy HTTP referrers in your .htaccess too. You can also pick entries from it to filter in your analytics. For obvious reasons, some of the URLs on that list are a bit… NSFW perhaps. Also, be warned that you should NOT, under any circumstances, just copy and paste the whole thing into your website’s .htaccess. And don’t forget: blocking in .htaccess will only affect the referrer spambots that actually visit the website, not the ghost referral bots.

5 Comments

  1. Hi Ria, I enjoyed reading another one of your great articles. I will certainly take another look at how I can improve my GA reports a bit further.

    Have you noticed that Ghost Referrals always seem to originate from a Hostname of ‘(not set)’? You can see this for yourself if you set Secondary Dimensions to ‘Hostname’ under Referrals.

    I use a filter similar to this:
    Sessions > Include > Hostname > matches regex
    .*dgsupplyline.*|.*googleusercontent\.com|.*silkstream.*

    It’s important to have it set to Include, otherwise it doesn’t work.

    My reports now only display genuine referral traffic. As you mentioned in point 1, I only experiment with filters on either a new view or a Segment before committing them to the Main View. That way it doesn’t matter if you make a mistake. I learnt the hard way recently and lost nearly a week’s worth of data! eek!

    1. Great tip, Dave!

      It’s annoying that we have to find workarounds like this….

      As for analytics data loss, I do understand Google’s recommendation of keeping one main view completely unfiltered. Comforting to know that everything is there, spam ‘n’ all!, should I need to take a look.

    2. Hi again, Dave.

      Taking a look at Hostname as a Secondary Dimension now, it seems that the majority of spam referrals targeting the sites I have in Google Analytics are spoofing their hostname? So, whilst a couple of them are (not set), it wouldn’t get rid of all the spam for me.

      Looks like the best solution to really clean everything up would probably be a combination of filters. And even then, it’s not really water tight :/

      Thanks again for the comment!

      1. You’re right, the Include hosts method doesn’t work perfectly for every website. Fortunately it works well for the DG Supplyline site (for now).

        I’ve been looking into Google Tag Manager recently, but that’s another kettle of fish entirely ;)

  2. This is a good solution if you manage a couple of websites, but managing dozens or hundreds is a completely different story. Applying & updating the filters becomes a nightmare. Thus we’ve developed a fully-automated, set-and-forget tool to do this for you: https://www.analytics-toolkit.com/auto-spam-filters/ . @Ria – if you want to try it out for free just email me (free trial of this tool is usually not available to trial accounts due to quota constraints).