Question

Is there a way to prevent faked Google Analytics statistics when using PhantomJS and/or a Ruby crawler like Anemone?

Our monitoring tool (which is based on both of them) crawls our clients' sites and updates the status of each link in a specific domain.

The problem is that this crawling shows up in the statistics as huge amounts of traffic.

Is there a way to say something like "I'm a robot, don't track me" with a cookie, header or something?

( adding crawler IPs to Google Analytics [as a filter] may not be the best solution )

Thanks in advance


Solution 2

I found a quick solution for this specific problem. The easiest way to exclude a crawler that executes JavaScript (like PhantomJS) from all Google Analytics statistics is to simply block the Google Analytics domains through /etc/hosts:

127.0.0.1    www.google-analytics.com
127.0.0.1    google-analytics.com

It's the easiest way to prevent fake data. This way, you don't have to add a filter to all your clients.
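If editing /etc/hosts on every crawler host is awkward, PhantomJS can also drop the tracking requests itself via its `page.onResourceRequested` callback, whose second argument provides `abort()` (PhantomJS 1.9+). Below is a sketch: the URL predicate is plain JavaScript, and the PhantomJS wiring (which only runs inside `phantomjs`) is shown in a comment. The file name `crawl.js` and the example URL are just placeholders.

```javascript
// Predicate deciding whether a requested URL belongs to Google Analytics.
// Plain JavaScript, so it works both inside and outside PhantomJS.
function isAnalyticsRequest(url) {
  // Extract the host part of the URL (scheme://host/...), then check
  // whether it is google-analytics.com or any subdomain of it.
  var host = (url.match(/^[a-z]+:\/\/([^\/]+)/i) || [])[1] || '';
  return /(^|\.)google-analytics\.com$/.test(host);
}

/* Inside a PhantomJS script (run with `phantomjs crawl.js`), the predicate
   can be wired into onResourceRequested so the GA beacon never leaves
   the crawler:

var page = require('webpage').create();
page.onResourceRequested = function (requestData, networkRequest) {
  if (isAnalyticsRequest(requestData.url)) {
    networkRequest.abort();   // drop the tracking request
  }
};
page.open('http://example.com/');
*/
```

Unlike the /etc/hosts approach, this keeps the blocking logic inside the crawler itself, so it travels with the tool instead of the host configuration.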

( thanks for other answers )

OTHER TIPS

Try setting up an advanced exclude filter: use the field "Browser" and, in "Filter Pattern", enter the user agent name of your PhantomJS crawler (or any other user agent; look up the desired name in your Technology -> Browser and OS report).


IP filtering might not be sufficient, but filtering by the user agent string (which can be set to an arbitrary value in PhantomJS) should work. That would be the "Browser" field in the filters.
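For that filter to match reliably, the crawler should announce a distinctive user agent. In PhantomJS this is set through `page.settings.userAgent`; the bot name below is purely an example, so pick your own token and filter on it in the Browser field. A minimal sketch (the PhantomJS wiring only runs inside `phantomjs`):

```javascript
// Hypothetical bot identifier -- choose any distinctive token and use it
// as the pattern of the Google Analytics advanced exclude filter.
var CRAWLER_USER_AGENT = 'LinkMonitorBot/1.0 (+http://example.com/bot)';

/* In a PhantomJS script (run with `phantomjs crawl.js`):

var page = require('webpage').create();
page.settings.userAgent = CRAWLER_USER_AGENT;
page.open('http://example.com/');
*/
```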

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow