Part One: What is Ghost and Crawler Spam?

To view Part Two: Setting Up Views and Blocking Ghost Spam click here.

Anyone who runs a website that is being monitored by Google Analytics knows that eventually you must address the issue of spam.

Referral spam from social buttons, adult sites, and many other sources can badly skew your traffic data and lead you to believe your website is performing better or worse than it really is. This spam can also be used by advertising agencies to oversell the success of their marketing efforts, because they appear to be generating large amounts of website traffic. Edifice Automotive prides itself on its ability to generate REAL website traffic, and we want our customers to see what traffic we are generating for them, as well as be able to differentiate between real and “fake” traffic.

To make sure that you are getting what you are paying for, and to determine whether your website is set up to generate the highest number of conversions, it is important to review accurate data. The simple answer to this problem is to filter out all the spam, but this is easier said than done. Spam bots are constantly changing and using different approaches to get around spam filters, and website owners are becoming overwhelmed by all the filters they set up to manage the useless data they receive.

The good news is that there is no need to panic. Over the next couple of weeks we will be delving into the topic of spam and how to make sure you are getting the most accurate data about your website.

This week we will begin by laying the foundation with an understanding of the different types of spam, how they work, and common misconceptions surrounding spam and Google Analytics.

First we’ll discuss the different types of spam.

Ghost Spam 

The vast majority of spam is Ghost Spam. This type of spam gets its name from the distinguishing trait that it never accesses your website. While this makes the information harder to track, it also provides a way to exclude it from Google Analytics reporting.

How Does Ghost Spam Work

First, to understand how Ghost Spam works, it is important to know what a Source and a Hostname are. Every visit to your page has both a Hostname and a Source.

The Source is how the viewer got to your page. The three types of Sources are:

  • Referrals – a link from another website not including search engines
  • Organic – a link from an unpaid search engine listing, such as a Google search
  • Direct – a visit straight to your website by typing in your URL

The Hostname is where the visit arrives on your website and should be the same as your domain.

Because Ghost Spam never visits your website, all of this information is either faked or, more often, not set at all.
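
To make the Hostname idea concrete, here is a minimal Python sketch of how a faked or missing Hostname can give a ghost hit away. The hostnames, the sample hits, and the function name are all made up for illustration; the only detail taken from Google Analytics itself is that an unset hostname is reported as "(not set)".

```python
from typing import Optional

# Hypothetical hostnames that legitimately serve your tracking code.
VALID_HOSTNAMES = {"example.com", "www.example.com"}


def looks_like_ghost_hit(hostname: Optional[str]) -> bool:
    """Return True when a hit's hostname is missing or is not one of ours."""
    if hostname is None:
        return True
    cleaned = hostname.strip().lower()
    # Google Analytics reports an unset hostname as "(not set)".
    if cleaned in ("", "(not set)"):
        return True
    return cleaned not in VALID_HOSTNAMES


# Made-up sample hits, as they might appear in an exported report.
hits = [
    {"hostname": "www.example.com", "source": "google"},
    {"hostname": "(not set)", "source": "free-social-buttons.example"},
    {"hostname": "some-other-site.example", "source": "spam-referrer.example"},
]

suspects = [hit for hit in hits if looks_like_ghost_hit(hit["hostname"])]
for hit in suspects:
    print("Suspicious hit:", hit)
```

In this sample, only the first hit arrives on a valid Hostname, so the other two are flagged.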

So if Google Analytics’ main purpose is to track visits to your website, why does Ghost Spam show up in the reporting in the first place?

This is because it uses the Measurement Protocol, which allows people to send data directly to Google Analytics’ servers. Using this method, and most likely randomly generated tracking codes, the spammers leave a “visit” with fake data, without even knowing who they are hitting. This is very important because it greatly affects the reporting on your website. Page views no longer reflect the accurate numbers you need to assess the success of your marketing efforts, and average bounce rates and average page view times will be heavily affected by these “fake” views. In addition, if you are paying for a marketing plan with a PPM or pay-per-impression model, you will be paying for fake visits.
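
To see why a “visit” can be recorded without anyone loading your page, it may help to look at what a Measurement Protocol hit roughly consists of. The sketch below only builds the payload of a pageview hit and never sends it. The parameter names (v, tid, cid, t, dh, dp, dr) are the ones we understand the Universal Analytics version of the protocol to use; the tracking ID, hostname, and referrer are placeholders, and the function name is our own.

```python
import uuid
from urllib.parse import parse_qs, urlencode


def build_pageview_payload(tracking_id, hostname, page, referrer):
    """Build (but do not send) the payload of a pageview hit."""
    params = {
        "v": "1",                  # protocol version
        "tid": tracking_id,        # tracking ID of the property being hit
        "cid": str(uuid.uuid4()),  # anonymous client ID, freely chosen by the sender
        "t": "pageview",           # hit type
        "dh": hostname,            # hostname, entirely under the sender's control
        "dp": page,                # page path
        "dr": referrer,            # referrer, also under the sender's control
    }
    return urlencode(params)


# Placeholder values only; nothing here refers to a real property.
payload = build_pageview_payload(
    "UA-XXXXX-Y", "example.com", "/", "http://spam-referrer.example/"
)
fields = parse_qs(payload)
print(fields["dh"], fields["dr"])
```

Every field, including the Hostname and the referrer, is simply whatever the sender types in, which is why ghost hits so often carry a fake or missing Hostname.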

Crawler Spam

The second type of spam is Crawlers or Web Crawlers.

Web Crawlers are spiders or bots that systematically browse the web and perform automated tasks on websites, such as checking links to other websites, validating HTML code to check for errors, and archiving or storing site contents. They are also one of the driving forces behind search engines that scan pages for relevant terms.

Some bots, however, are used for malicious purposes. These spam bots crawl your pages, ignoring rules like those found in robots.txt, otherwise known as the Robots Exclusion Standard or Robots Exclusion Protocol. This standard is used by websites to tell web crawlers and other web robots what parts of the website not to process or scan. Spammers and other robots with malicious intent often ignore these rules and will even start with the parts of the website they have been told not to access. They do this to search your website for email addresses or forms that can be used to spam you, or even to look for security risks within your code. When they exit your site, they leave a record on your reports that appears similar to a legitimate visit.
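
For comparison, here is a small sketch of what a well-behaved crawler does before requesting a page, using Python's standard urllib.robotparser module. The robots.txt contents, the bot name, and the example.com URLs are assumptions made up for this illustration.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt that asks every crawler to stay out of two folders.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(path, user_agent="ExampleBot"):
    """Ask the parsed rules whether this (hypothetical) bot may fetch the path."""
    return parser.can_fetch(user_agent, "https://example.com" + path)


print(may_crawl("/blog/first-post"))  # allowed by the rules above
print(may_crawl("/admin/settings"))   # disallowed by the rules above
```

Nothing forces a bot to run a check like this. The rules are a request, not a lock, which is why malicious crawlers can skip it entirely.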

Crawlers are harder to identify because they know their targets and use real data. But it is also true that new ones seldom appear. So if you detect a referral in your analytics that looks suspicious, researching it on Google or checking it against lists available online might help you answer the question of whether or not it is spam. (See links for lists below.)
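
Checking a suspicious referral against a list can be sketched in a few lines of Python. The blocklist entries below are invented placeholders; in practice you would fill the set from one of the published lists, and the function names are our own.

```python
from urllib.parse import urlparse

# Hypothetical blocklist; real entries would come from a published list.
KNOWN_SPAM_DOMAINS = {"spam-referrer.example", "free-buttons.example"}


def referrer_domain(referrer):
    """Extract the lowercase hostname from a referrer URL or bare domain."""
    # urlparse only recognizes the host when the value has a scheme or "//".
    if "//" not in referrer:
        referrer = "//" + referrer
    return (urlparse(referrer).hostname or "").lower()


def is_known_spam(referrer):
    """Return True if the referrer is a listed domain or one of its subdomains."""
    domain = referrer_domain(referrer)
    return any(
        domain == bad or domain.endswith("." + bad)
        for bad in KNOWN_SPAM_DOMAINS
    )


print(is_known_spam("http://www.spam-referrer.example/offer"))
print(is_known_spam("news-site.example"))
```

Matching on the domain and its subdomains, rather than on the full URL, keeps the check from being dodged by a changed path or a "www." prefix.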

Now that we have a better understanding of how spam works, it is time to figure out how to deal with it. Before we do that, however, let’s talk about some of the mistakes that website owners often make. These mistakes are important to address because, while they may seem like a logical fix, they can lead to a bigger problem than the one you started with.

What Not To Do

Mistake #1. Blocking ghost spam from the .htaccess file

One of the biggest mistakes people make is trying to block Ghost Spam from the .htaccess file.

For those who are not familiar with this file, one of its main functions is to allow/block access to your site. Now we know that ghosts never reach your site, so adding them here won’t have any effect and will only add useless lines to your .htaccess file.

Ghost spam usually shows up for a few days and then disappears. As a result, sometimes people see the decreased activity and think that they have successfully solved their problem by blocking it in the .htaccess file, when really it’s just a coincidence of timing.

When the spammers later return, the website owner gets worried because the solution is not working anymore, and they think the spammer has somehow bypassed the barriers they set up.

The .htaccess file can only effectively block crawlers and other spammers that directly access your website. Most spam can’t be blocked using this method, so there is no other option than using filters.

Mistake #2. Using the referral exclusion list to stop spam

Another error is trying to use the referral exclusion list to stop the spam. The name may be confusing, but this list is not intended to exclude referrals as a way to reduce spam. It is used for an entirely different purpose.

For example, when a customer buys something, sometimes they get redirected to a third-party page for payment. After making a payment, they’re redirected back to your website, and Google Analytics records that as a new referral. It is appropriate to use the referral exclusion list to prevent this from happening.

If you try to use the referral exclusion list to manage spam, however, the indicator that the page view is a referral will be stripped. As a result, a direct visit will be recorded, and you will have a bigger problem than the one you started with. You will still have spam, and direct visits are harder to track.
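
The effect described above can be modeled with a toy Python function. This is a deliberately simplified model of the behavior, not how Google Analytics is actually implemented; the function name and the domain are hypothetical.

```python
def reported_source(referrer_domain, exclusion_list):
    """Toy model: return the source a hit would be reported under."""
    if referrer_domain is None:
        return "(direct)"
    if referrer_domain in exclusion_list:
        # The referral indicator is stripped, so the hit is logged as direct.
        return "(direct)"
    return referrer_domain


exclusions = {"spam-referrer.example"}

# The spam hit is still counted; it has only changed its label.
print(reported_source("spam-referrer.example", exclusions))
print(reported_source("partner-site.example", exclusions))
```

In this model, the excluded spam hit does not disappear from the totals. It is just reported as a direct visit, where it is harder to spot.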

Mistake #3. Worrying that bounce rate changes will affect rankings

When people see that their bounce rate changes drastically because of the newly filtered spam, they start worrying about the impact that it will have on their rankings in the Search Engine Results Pages or SERPs.

This is a common misconception. The fact of the matter is that Google doesn’t use Google Analytics metrics as a ranking factor. However, it is important to have an accurate view of your bounce rate, as it shows whether or not you are linking potential customers to relevant material and whether your website is set up intuitively. With accurate numbers you can see if you are on the right track with your setup, or if changes need to be made to win more customers.
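
A quick worked example shows how much fake sessions can move the bounce rate. The session counts below are invented purely for illustration, and we assume every spam session counts as a bounce.

```python
def bounce_rate(bounced_sessions, total_sessions):
    """Bounce rate as a percentage of all sessions."""
    return 100.0 * bounced_sessions / total_sessions


# Invented numbers: 800 real sessions, 320 of which bounced.
real_total, real_bounced = 800, 320
# Assumption: 200 spam sessions, every one of them a single-hit bounce.
spam_total, spam_bounced = 200, 200

polluted = bounce_rate(real_bounced + spam_bounced, real_total + spam_total)
clean = bounce_rate(real_bounced, real_total)

print(f"With spam: {polluted:.1f}%")    # 520 / 1000 sessions
print(f"Without spam: {clean:.1f}%")    # 320 / 800 sessions
```

With these made-up numbers the reported bounce rate drops from 52% to 40% once the spam is filtered out, even though nothing about the real visitors has changed.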

In Closing

Since we have established that rankings and security will not be affected, the only thing left to worry about is your data. The fake trail that spam leaves behind pollutes your reports and can have a major impact on your marketing efforts.

Invalid traffic means inaccurate reports, and inaccurate reports can cost you money.

We hope that this post has helped you better understand how spam operates and how it can affect your website. Next week we will discuss how to go about blocking Ghost Spam once and for all. In the meantime, if you have any questions about Ghost and Crawler Spam or any of Edifice Automotive’s services, fill out the contact form below and we will be happy to assist you.

Resources

Words to Know

  • Ghost Spam – a kind of spam that sends fake hits directly to Google Analytics without ever visiting your website, often using a fake referrer URL that points to the site the spammer wishes to advertise
  • Crawler Spam – a type of spam generated by internet bots that browse websites and log information
  • Hostname – where a visitor arrives on your website; it should be the same as your domain name
  • Source – how a visitor gets to your website, made up of three different types:
    • Referrals – a link from another website not including search engines
    • Organic – a link from an unpaid search engine listing, such as a Google search
    • Direct – a visit straight to your website by typing in your URL
  • .htaccess file – a directory-level configuration file supported by several web servers, used to configure site-access issues such as URL redirection, URL shortening, access-security control (for different webpages and files), and more
  • Robots Exclusion Standard – also known as robots.txt or the Robots Exclusion Protocol; used by websites to tell web crawlers and other web robots what parts of the website not to process or scan
  • Referral Exclusion List – a list used to prevent duplicate referrals from third party services

Links

Spammers, Crawlers and Bot Lists:

http://www.user-agents.org/

http://www.robotstxt.org/db.html

http://www.botsvsbrowsers.com/

Google Analytics Resources:

https://support.google.com/analytics/?hl=en#topic=3544906

Special thanks to Carlos Escalara for his information on Ghost and Crawler Spam.

https://www.ohow.co/what-is-referrer-spam-how-stop-it-guide/#gs.zFg7TxY