Defend against fake Google bots

I can think of some reasons why folks might use the Googlebot user agent on their non-Google bots, but I can’t think of any good, upstanding reasons to do it.

Here’s how one might find some fine folks who would do such a thing.

As of right now (May 2018), all valid Googlebot source IPs start with the same prefix, 66.249. This may change in the future, so if Google is having trouble crawling your site, check that you’re not blocking a new range they have started using. OK, here’s the nitty-gritty.
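As a rough illustration of the idea, here is a minimal Python sketch that counts requests claiming to be Googlebot from addresses outside the 66.249 prefix. It assumes Apache's "combined" log format (client IP first, user agent as the last quoted field), and the log path in the example at the bottom is hypothetical; adjust both to match your setup.

```python
import glob
import gzip
import re
from collections import Counter

# Prefix quoted in the post (May 2018); it may be out of date.
GOOGLE_PREFIX = "66.249."

# Assumes the "combined" log format: the client IP is the first field
# and the user agent is the last double-quoted field on the line.
LINE_RE = re.compile(r'^(\S+) .*"([^"]*)"\s*$')


def fake_googlebot_ips(lines, prefix=GOOGLE_PREFIX):
    """Count hits per IP where the user agent mentions Googlebot
    but the client IP does not start with the expected prefix."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that do not look like combined-format entries
        ip, user_agent = match.groups()
        if "googlebot" in user_agent.lower() and not ip.startswith(prefix):
            counts[ip] += 1
    return counts


def scan_gzipped_logs(paths):
    """Run the check over a list of gzipped log files and merge the counts."""
    total = Counter()
    for path in paths:
        with gzip.open(path, "rt", errors="replace") as handle:
            total.update(fake_googlebot_ips(handle))
    return total


if __name__ == "__main__":
    # Hypothetical location of rotated logs; change to suit your server.
    suspects = scan_gzipped_logs(glob.glob("/var/log/apache2/access.log.*.gz"))
    for ip, hits in suspects.most_common():
        print(hits, ip)
```

To scan a current, uncompressed log instead, pass an ordinary open file to `fake_googlebot_ips` directly.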

Interestingly, a few of the IPs in my logs showed that Facebook in Ireland is using the Googlebot user agent. Naughty! Anyway, if you want to check that you’re not blocking a valid Google address, do an IP lookup on some of these groups of addresses. And of course you can modify the above to scan the current log files instead of the archived gzipped files.
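One way to do that IP lookup programmatically is the reverse-then-forward DNS check that Google's own guidance describes: resolve the IP to a hostname, confirm the hostname sits under a Google crawler domain, then resolve the hostname back and confirm it returns the same IP. The sketch below is a minimal version under stated assumptions: the function name and the suffix list are mine, the suffixes should be checked against Google's current documentation, and the forward lookup as written handles IPv4 only.

```python
import socket

# Assumed list of acceptable hostname suffixes; verify against
# Google's current documentation before relying on it.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse=socket.gethostbyaddr,
                          forward=socket.gethostbyname_ex):
    """Return True if `ip` passes a reverse and forward DNS check.

    `reverse` and `forward` default to the standard library resolvers
    and can be swapped for stubs when testing offline.
    """
    try:
        hostname = reverse(ip)[0]  # reverse (PTR) lookup
    except OSError:
        return False  # no reverse record, or the lookup failed

    if not hostname.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False  # hostname is not under a Google crawler domain

    try:
        addresses = forward(hostname)[2]  # forward lookup (IPv4 only)
    except OSError:
        return False

    # The hostname must resolve back to the address we started with.
    return ip in addresses
```

Calling `is_verified_googlebot` on a handful of the addresses flagged in your logs should show which ones belong to Google and which are impostors. DNS lookups are slow, so this is better suited to spot checks than to per-request filtering.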

Here’s how I’m blocking the baddies (this isn’t original; I searched and found a version of it). This goes in your Apache config or .htaccess file:
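A minimal sketch of this kind of rule, assuming mod_rewrite is enabled and that 66.249 is still the only valid Googlebot prefix. Treat it as an illustration of the approach rather than a definitive rule set, and test it before deploying:

```apache
# Sketch only: assumes mod_rewrite is available and that the 66.249
# prefix is still the only range Googlebot crawls from.
RewriteEngine On

# The request claims to be Googlebot (case-insensitive match)...
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
# ...but does not come from an address starting with 66.249.
RewriteCond %{REMOTE_ADDR} !^66\.249\.
# Refuse it with 403 Forbidden.
RewriteRule ^ - [F]
```

If Google starts crawling from a new range, add a further `RewriteCond %{REMOTE_ADDR}` line for it, otherwise the real Googlebot will be refused as well.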

Defend against fake google bots is original content from devolve.