To view Part One: Ghost and Crawler Spam click here.
To view Part Three: Blocking Crawler Spam click here.
Welcome back to Edifice Automotive’s series about removing Ghost and Crawler Spam from your Google Analytics. Last week we started off our series with an explanation of the different types of spam and how they work. If you missed that blog post you can find it here.
This week we will show you how to deal with Ghost Spam and remove it from your Google Analytics reporting. Before we do that, however, we need to talk about setting up views.
Setting Up Views
Views are the level in a Google Analytics account where you can access reports and analytics tools. Views are important for organizing your Google Analytics because data cannot be altered or deleted once it has been processed, and it cannot be recovered if it was never recorded in the first place. In other words, if you lose part of your data due to a wrongly configured filter or setting, it will be gone forever. For this reason we use different Views to test out settings and apply different filters while still receiving all of the website’s data. Google Analytics automatically creates a view for each of the websites you are monitoring with the name “All Website Data.” To protect your data and make sure that you do not lose anything, you will not make any changes to this view other than verifying that the time and date are correct. Instead, we will show you how to create three new views named “Main,” “Test,” and “Unfiltered Data.”
To make things faster we will simply copy the original view, “All Website Data.”
4. Change the name of the new view to Main
5. Click Copy View to confirm the copy.
6. Repeat these steps for the Test and Unfiltered views.
From now on, all changes to your Google Analytics account will be made in these views. It is best to use the Test view when setting up new filters or changing settings that could cause you to lose data. Once you have confirmed in the Test view that a filter or setting does what you intended, you can apply it to the Main view. As its name suggests, the Unfiltered view should not have any filters applied to it, so that you can always see the raw data your site has collected.
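The reasoning behind the three views can be illustrated with a small Python sketch. This is only a toy model, not the Google Analytics API: the `View` class, the deliberately broken filter and the sample hits are all made up for illustration. It shows that every view receives the same hits, and that a badly written filter only damages the view it is applied to.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Hit = Dict[str, str]
Filter = Callable[[Hit], bool]  # returns True when the hit should be kept


@dataclass
class View:
    """Toy stand-in for a Google Analytics view (hypothetical, not a real API)."""
    name: str
    filters: List[Filter] = field(default_factory=list)
    recorded: List[Hit] = field(default_factory=list)

    def receive(self, hit: Hit) -> None:
        # A hit that fails any filter is never recorded and cannot be recovered later.
        if all(keep(hit) for keep in self.filters):
            self.recorded.append(hit)


def broken_filter(hit: Hit) -> bool:
    # Deliberately wrong: the typo "yoururl.cmo" rejects every real visit.
    return hit["hostname"] == "yoururl.cmo"


views = [View("Main"), View("Test", filters=[broken_filter]), View("Unfiltered Data")]

# Invented sample hits with a placeholder hostname.
sample_hits = [
    {"hostname": "yoururl.com", "page": "/"},
    {"hostname": "yoururl.com", "page": "/inventory"},
]

for hit in sample_hits:
    for view in views:
        view.receive(hit)

for view in views:
    print(f"{view.name}: {len(view.recorded)} hits recorded")
```

In this model the mistake costs the Test view all of its data, while Main and Unfiltered Data keep every hit, which is the reason for trying filters in Test first.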
Now that we have set up views for testing, it’s time to make a filter to take care of Ghost Spam.
Last week we discussed how Ghost Spam shows up in your Google Analytics without ever visiting your site by injecting itself directly into the Google Analytics servers through the Measurement Protocol. Since the bot has never visited your site, the Source and Hostname, along with all of the other data that Google Analytics usually collects, are fake or simply not set.
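To make this concrete, here is a minimal Python sketch that inspects two hit payloads of the kind the Measurement Protocol accepts. It assumes the Universal Analytics parameter names `tid` (tracking ID), `dh` (document hostname) and `dr` (document referrer); the payloads, the placeholder tracking ID and the referrer domain are invented examples. The point is only that a hit sent straight to the server carries whatever Hostname the sender chose, or none at all.

```python
from urllib.parse import parse_qs

# Invented example payloads; UA-XXXXX-Y is a placeholder tracking ID.
real_visit = "v=1&tid=UA-XXXXX-Y&cid=555&t=pageview&dh=yoururl.com&dp=%2Finventory"
ghost_hit = "v=1&tid=UA-XXXXX-Y&cid=777&t=pageview&dr=http%3A%2F%2Fspam-referrer.example&dp=%2F"


def hostname_of(payload: str) -> str:
    """Return the hostname in a hit payload, or "(not set)" when dh is missing."""
    params = parse_qs(payload)
    return params.get("dh", ["(not set)"])[0]


for label, payload in [("real visit", real_visit), ("ghost hit", ghost_hit)]:
    print(f"{label}: hostname = {hostname_of(payload)}")
```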
Therefore, by only allowing the entrance of visits that have a valid Hostname, you will block the entrance of all Ghost Spam. To do this you will need to find all of your Hostnames. It is imperative that you select all valid Hostnames for your site so you do not exclude any valid traffic.
To get a list of your Hostnames:
From here you will be brought to a table of Hostnames. From this list you will select all of the valid Hostnames that should be included in your Google Analytics reporting. At the very least you should see one valid Hostname: your website’s URL or Main Domain. Other Hostnames will include any services you use on your website, such as external payment and translation services. Make a list of all your valid Hostnames. You can even go further and take this opportunity to leave out Hostnames that are not spam but still generate irrelevant traffic, like DEV or test environments.
An INVALID Hostname is essentially any other Hostname that you don’t recognize. You may see:
5. Once you gather all your valid Hostnames, you should create a Regular Expression or REGEX that includes these Hostnames. Used in an Include filter, this REGEX tells Google Analytics to ignore any visits without one of the specified Hostnames.
Here are some important formatting rules for writing a Regular Expression:
Your REGEX should look something like this:
yoururl\.com|translatingservice\.com|webcacheservice\.com|videoservice\.com|206\.190\.45\.150|shoppingcart\.com|cdn\-service\.com
It is crucial that you add all the relevant Hostnames. Otherwise, you run the risk of losing valid data.
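If you keep your valid Hostnames in a list, the expression can be assembled and checked before you paste it anywhere. The Python sketch below is one way to do that, not an official Google tool. It assumes the filter matches the pattern anywhere in the Hostname and ignores case, so it uses `re.search` with `re.IGNORECASE`; the Hostnames are the placeholder ones from the example above, and `spam-host.example` is an invented invalid one.

```python
import re

# Placeholder hostnames taken from the example expression above.
valid_hostnames = [
    "yoururl.com",
    "translatingservice.com",
    "webcacheservice.com",
    "videoservice.com",
    "206.190.45.150",
    "shoppingcart.com",
    "cdn-service.com",
]

# re.escape puts a backslash before dots and hyphens; "|" means "or".
hostname_regex = "|".join(re.escape(host) for host in valid_hostnames)
pattern = re.compile(hostname_regex, re.IGNORECASE)


def is_valid(hostname: str) -> bool:
    """True when the hostname contains one of the valid hostnames."""
    return pattern.search(hostname) is not None


print(hostname_regex)
for host in ["www.yoururl.com", "shoppingcart.com", "spam-host.example", "(not set)"]:
    print(f"{host}: {'keep' if is_valid(host) else 'exclude'}")
```

Under the partial-match assumption, `www.yoururl.com` is kept without listing the `www` version separately. If you add a new service later, append it to the list and rebuild the expression.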
Once you make sure the expression is correct, it’s time to create the filter.
6. Go to the Admin tab and select the Test view.
7. Select Filters in the last column, “View.”
8. Select + New Filter
9. Select Create New Filter and enter Include Valid Hostnames as the name.
10. In Filter Type select Custom.
11. Make sure you choose INCLUDE (you may need to scroll down a little) and select Hostname from the dropdown menu.
12. Finally, paste the Hostname REGEX that you built previously in the Filter Pattern Box.
13. Select Verify this filter. A table will appear with sample data from before and after applying the filter. (You should only see invalid Hostnames in the left column.)
14. After making sure that no valid traffic is excluded, you can save the filter.
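The before-and-after comparison in step 13 can also be rehearsed offline. The sketch below is a rough Python imitation of that preview, not what Google Analytics runs internally: the Hostname rows and session counts are invented, and the shortened expression stands in for your own. It splits the rows into those the Include filter would keep and those it would drop, so you can read through the dropped list for anything that should have stayed.

```python
import re

# Shortened stand-in for your own expression (hypothetical hostnames).
hostname_regex = r"yoururl\.com|shoppingcart\.com"

# Invented rows in the shape of the Hostname table: (hostname, sessions).
report_rows = [
    ("yoururl.com", 1240),
    ("shoppingcart.com", 85),
    ("(not set)", 310),
    ("spam-host.example", 97),
]

# Assumes the filter matches case-insensitively anywhere in the hostname.
pattern = re.compile(hostname_regex, re.IGNORECASE)

kept = [(host, n) for host, n in report_rows if pattern.search(host)]
dropped = [(host, n) for host, n in report_rows if not pattern.search(host)]

print("Kept by the Include filter:")
for host, sessions in kept:
    print(f"  {host}: {sessions} sessions")

print("Dropped by the Include filter (check for valid Hostnames here):")
for host, sessions in dropped:
    print(f"  {host}: {sessions} sessions")
```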
And that’s it! You have successfully eliminated Ghost Spam from your Google Analytics Reporting. While this solution does not require much maintenance, it is important to remember that every time you add your tracking ID to a new service you want to track, you need to add that service’s Hostname to your REGEX. You are now one step closer to unlocking Google Analytics’ true potential and having the most accurate information about your website.
We hope that this post has given you a better understanding of how to eliminate Ghost Spam once and for all. Next week we will discuss how to block Crawlers from your Google Analytics. In the meantime, if you have any questions about Ghost and Crawler Spam or any of Edifice Automotive’s services, fill out the contact form below and we will be happy to assist you.
Words to Know
Links
Spammers, Crawlers and Bot Lists:
http://www.robotstxt.org/db.html
http://www.botsvsbrowsers.com/
Google Analytics Resources:
Special thanks to Carlos Escalara for his information on Ghost and Crawler Spam.
https://www.ohow.co/what-is-referrer-spam-how-stop-it-guide/#gs.zFg7TxY