No response from Google as of yet. I have been looking through Reduce the Googlebot crawl rate and did file a special request to reduce the crawl rate with the hope that someone would notice and take action. My expectations are very low for any quick action or reply from them.
This is very irritating because on my main board Google's crawl stats show that on April 15th (the last stats currently available) there were over 1M crawl requests made by GoogleOther. The board has about 700K posts and is already well indexed by Googlebot. I am totally mystified by this bot's new behavior.
Digging deeper and re-reading Reduce Googlebot Crawl Rate | Google Search Central | Documentation | Google for Developers:

"If you need to urgently reduce the crawl rate for a short period of time (for example, a couple of hours, or 1-2 days), then return a 500, 503, or 429 HTTP response status code instead of 200 to the crawl requests. Googlebot reduces your site's crawling rate when it encounters a significant number of URLs with 500, 503, or 429 HTTP response status codes."

Looking up these status codes, I discovered 429 Too Many Requests - HTTP | MDN:

"The HTTP 429 Too Many Requests response status code indicates the user has sent too many requests in a given amount of time ("rate limiting"). A Retry-After header might be included with this response, indicating how long to wait before making a new request."

and RFC 6585 - Additional HTTP Status Codes, which gives this example:

Code:
HTTP/1.1 429 Too Many Requests
Content-Type: text/html
Retry-After: 3600

<html>
   <head>
      <title>Too Many Requests</title>
   </head>
   <body>
      <h1>Too Many Requests</h1>
      <p>I only allow 50 requests per hour to this Web site per
         logged in user. Try again soon.</p>
   </body>
</html>

That looks promising as a direction to investigate. It would involve using the server's .htaccess file to check the HTTP_USER_AGENT variable against the problematic GoogleOther user-agent string and, if they match, reply with the 429 status code. I'm working on testing the implementation details. Unfortunately my boards run on shared hosting, so I don't have access to the Apache rate-limiting settings and controls.
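As a rough sketch of that direction — assuming the shared host allows mod_rewrite, mod_setenvif, and mod_headers directives in .htaccess, which is common but not guaranteed — something like the following should match the GoogleOther user-agent and answer every request with a 429 plus a Retry-After header. The one-hour Retry-After value is purely illustrative:

Code:
# Sketch only: answer GoogleOther with 429 Too Many Requests.
# Assumes mod_rewrite, mod_setenvif and mod_headers are usable in .htaccess.
<IfModule mod_rewrite.c>
    RewriteEngine On
    # Match the GoogleOther user-agent string (case-insensitive)
    RewriteCond %{HTTP_USER_AGENT} GoogleOther [NC]
    # The R flag accepts non-3xx codes; the substitution is discarded
    # and Apache returns a plain 429 response
    RewriteRule ^ - [R=429,L]
</IfModule>
<IfModule mod_setenvif.c>
    SetEnvIfNoCase User-Agent "GoogleOther" throttle_googleother
</IfModule>
<IfModule mod_headers.c>
    # Suggest retrying in an hour (value is illustrative)
    Header always set Retry-After "3600" env=throttle_googleother
</IfModule>

A quick way to check it from the command line is to spoof the user-agent, e.g. curl -s -o /dev/null -w "%{http_code}" -A "GoogleOther" against the board's URL, and confirm it prints 429 while a normal browser user-agent still gets 200.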
Statistics: Posted by P_I — Fri Apr 19, 2024 2:06 pm