Command-line utility for automating the fight against Google Analytics referral spam
Google Analytics referrer spam is a pain. There are hundreds of known referrer spam domains, and every other day a new one pops up. The only way to keep the spammers from skewing your web analytics reports is to block these spam domain names one by one.
ga-spam-control is a small command-line utility that keeps your Google Analytics spam filters up-to-date, automatically.
ga-spam-control creates filters for your Google Analytics accounts that block known referrer spam domains from your analytics reports, and it keeps these filters up-to-date.
To protect your analytics reports from annoying false entries at all times, ga-spam-control combines multiple community-maintained lists of known spam domains:
- ddofborgs' Analytics Ghost Spam List
- Stevie Ray's apache-nginx-referral-spam-blacklist
- Piwik Referrer spam blacklist
This gives you the ability to completely automate your spam protection process. Just let ga-spam-control check your Google Analytics accounts daily for new spam and update your filters whenever it detects any.
The command line utility provides the following actions.
Spam Control Filter Actions
To protect your Google Analytics account from spam, ga-spam-control creates filters that block known referrer spam domains from your analytics reports. These commands help you review and update your spam filters:
- show-status: Display the spam-control status of all your accounts or of a specific account
- update-filters: Create or update the spam-control filters for a specific account
- remove-filters: Remove all spam-control filters from an account
Referrer Spam Domains Actions
The basis for the spam filters is an up-to-date list of known referrer spam domains. With these commands you can review and update the spam-domain lists:
- list-spam-domains: Print a list of all currently known referrer spam domains
- update-spam-domains: Update the list of referrer spam domain names
- find-spam-domains: Manually review the last n days of analytics data and mark domain names as spam
The domains that are currently considered spam are stored in ~/.ga-spam-control/spam-domains/community.txt and ~/.ga-spam-control/spam-domains/personal.txt.
ga-spam-control <command> [<args> ...]
ga-spam-control --help
Display the current spam-control status for all accounts that you have access to:
ga-spam-control show-status
Display the spam-control status in a parseable format:
ga-spam-control show-status --quiet
Display the current spam-control status for a specific Google Analytics account:
ga-spam-control show-status <accountID>
Update the spam-control filters for a specific Google Analytics account:
ga-spam-control update-filters <accountID>
Remove the spam-control filters for a specific Google Analytics account:
ga-spam-control remove-filters <accountID>
The find-spam-domains action displays referrer domain names from the last n days of analytics data for you to review.
ga-spam-control find-spam-domains <accountID> <numberOfDaysToLookBack>
By default, ga-spam-control uses the last 90 days of analytics data. If you want to review fewer or more days, you can specify the number of days yourself.
Authentication
The first time you perform an action, you will be shown an OAuth authorization dialog.
If you grant the requested permissions, the authentication token will be stored in your home directory (~/.ga-spam-control/credentials.json).
To sign out you can either delete the file or de-authorize the "Google Analytics Spam Control" app in your Google App Permissions at https://security.google.com/settings/security/permissions.
The command-line package is github.com/andreaskoch/ga-spam-control/cli. You can clone the repository or install it with go get github.com/andreaskoch/ga-spam-control and then run the make.go script:
go run make.go -test
go run make.go -install
go run make.go -crosscompile
Or with make:
make test
make install
make crosscompile
ga-spam-control is licensed under the Apache License, Version 2.0. See LICENSE for the full license text.
Ideally, Google would just build spam protection into Google Analytics. Until then, here are some ideas for additional features and possible improvements:
- Make remote spam domain providers configurable
- Populate my own list of known referrer spam domains with the results from the find-spam-domains action.
  - Automatic daily upload from the ga-spam-control clients
  - Review of the additions by trusted community members or by a tool which checks the listed website
- Create a "No Referrer Spam" segment and update it during the normal update process. Unfortunately, this requires Google to add create and update support for segments to the Google Analytics API (see: analytics-issues - Issue 174: Create Advanced Segment and Customized Report Through API).
  - Until Google supports segment creation via the API, ga-spam-control can at least print the necessary segment content to support manual editing of spam segments.
- Use machine learning to automatically identify new referrer spam. Earlier versions of ga-spam-control already used a machine learning model. But unfortunately I could only train the model to detect new referrer spam for a single website - the model did not work well enough when I applied it to websites with different usage patterns.
- Other options for detecting referrer spam automatically
  - Correlate analytics data with web server logs to identify referrer spam
  - Do a word analysis of the referrer site and use common e-mail spam detection techniques to identify spam sites
Let me know if you have other ideas, or if you want one of the features implemented next.
There are multiple curated lists of referrer spam domains out there that you can use to create filters for your analytics accounts.
- Analytics Ghost Spam List
- Stevie Ray: apache-nginx-referral-spam-blacklist
- Piwik Referrer spam blacklist
- Referrer Spam Blocker Blacklist
- My own list of referral spam domains
ga-spam-control is not the first and not the only tool that helps you to block referrer spam from your Google Analytics accounts.
Filters prevent referrer spam from getting into your Google Analytics accounts, but they don't help with referrer spam that has already reached your reports. To remove that spam, you can use segments that filter out the spammy traffic.
Google Analytics has a setting to block bots and spiders from your Google Analytics reports.
- Go to Google Analytics > Admin > Account > Property > View > View Settings
- Go to Bot Filtering
- Check Exclude all hits from known bots and spiders
This feature is not advertised much by Google. The only official mention is in a Google Plus post: Google Analytics - Introducing Bot and Spider Filtering.
I am not yet sure if this flag does the trick. One would assume that it would be easy for Google to exclude all referrer spam and block the spammers once and for all.