
Le Nguyen The Dat

~ Data Science & Engineering




If you are using or planning to use Google Analytics to track visitors on your website, you may want to take a closer look and reconsider which platform to choose in the end, especially if you run a production system and want the best insights you can get.

A simple search for “Google Analytics referral spam” returns millions of results about it: what phantom traffic is, why it happens, and how to detect and remove it.

This post, however, is not about solving this problem. It’s about me ranting and being annoyed. Below is what I saw when I logged into the Google Analytics dashboard earlier today:


See that phantom traffic? All you can do about it is remove each new source through the interface whenever one shows up. The whole process is extremely manual and painful:

Note: It’s even worse if you try to do it retrospectively to fix your historical tracking data.

Imho, one of the most essential characteristics of a mature tracking platform is that it can eliminate spam effectively on its own. Technically, it’s not even a hard problem to begin with: there are plenty of ways to detect spammy traffic and eliminate it with a high level of confidence. In this case, a simple Decision Tree or Naive Bayes classifier would probably do the job.
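To make the Naive Bayes idea a bit more concrete, here is a minimal sketch in plain Python. Everything specific in it is an assumption for illustration only: the three features (whether the hit's hostname is your own, whether the session bounced, and a coarse session-duration bucket), the labels, and the handful of made-up training rows. It is not how Google Analytics works, and a real filter would need real labelled traffic and far more features.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Categorical Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.total = len(labels)
        # feature_counts[label][feature_index][value] -> number of rows
        self.feature_counts = defaultdict(lambda: defaultdict(Counter))
        # distinct values seen for each feature, used for smoothing
        self.values = defaultdict(set)
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.feature_counts[label][i][value] += 1
                self.values[i].add(value)
        return self

    def log_scores(self, row):
        """Unnormalized log-probability of each class for one hit."""
        scores = {}
        for label, n in self.class_counts.items():
            score = math.log(n / self.total)  # class prior
            for i, value in enumerate(row):
                count = self.feature_counts[label][i][value]
                score += math.log((count + 1) / (n + len(self.values[i])))
            scores[label] = score
        return scores

    def predict(self, row):
        scores = self.log_scores(row)
        return max(scores, key=scores.get)


# Hypothetical, hand-made training rows: (hostname, bounce, duration bucket).
# These are NOT real measurements, just a toy set to exercise the model.
rows = [
    ("own_host", "no_bounce", "long"),
    ("own_host", "bounce", "short"),
    ("own_host", "no_bounce", "short"),
    ("own_host", "no_bounce", "long"),
    ("other_host", "bounce", "zero"),
    ("not_set", "bounce", "zero"),
    ("other_host", "bounce", "zero"),
    ("not_set", "bounce", "short"),
]
labels = ["real", "real", "real", "real", "spam", "spam", "spam", "spam"]

model = NaiveBayes().fit(rows, labels)

# Classify two unseen hits.
print(model.predict(("not_set", "bounce", "zero")))
print(model.predict(("own_host", "no_bounce", "long")))
```

The point of the sketch is only that the model is tiny: it counts how often each feature value appears in spam versus real traffic and picks the more likely class. Smoothing keeps a feature value that was never seen for one class from zeroing out the whole score.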

It’s also important to note that this is a 10-year-old Google product, and with all the great engineers they have, it should not take more than a few days (maybe a week?) to fix it properly. Why haven’t they fixed it already? Well, I have no idea, but since I’m using it for free, what can I say?