Channel: phpBB.com

[3.3.x] Support Forum • Re: GoogleOther misbehaving? Hundreds of board guest sessions active

No response from Google yet. I have been looking through Google's "Reduce the Googlebot crawl rate" documentation and filed a special request to reduce the crawl rate, in the hope that someone would notice and take action. My expectations for any quick action or reply from them are very low.

This is very irritating because, on my main board, Google's crawl stats show that on April 15th (the latest stats currently available) there were over 1M crawl requests made by GoogleOther. The board has about 700K posts and is already well indexed by Googlebot. I am totally mystified by this new behavior from this bot.

Digging deeper and re-reading "Reduce Googlebot Crawl Rate" (Google Search Central documentation, Google for Developers):

"If you need to urgently reduce the crawl rate for short period of time (for example, a couple of hours, or 1-2 days), then return 500, 503, or 429 HTTP response status code instead of 200 to the crawl requests. Googlebot reduces your site's crawling rate when it encounters a significant number of URLs with 500, 503, or 429 HTTP response status codes"

Looking up these status codes, I discovered "429 Too Many Requests" on MDN:

"The HTTP 429 Too Many Requests response status code indicates the user has sent too many requests in a given amount of time ("rate limiting").

A Retry-After header might be included to this response indicating how long to wait before making a new request."

and RFC 6585, "Additional HTTP Status Codes", which gives this example code:

Code:

HTTP/1.1 429 Too Many Requests
Content-Type: text/html
Retry-After: 3600

<html>
   <head>
      <title>Too Many Requests</title>
   </head>
   <body>
      <h1>Too Many Requests</h1>
      <p>I only allow 50 requests per hour to this Web site per
         logged in user.  Try again soon.</p>
   </body>
</html>

That looks like a promising direction to investigate. It would involve using the server's .htaccess file to check the HTTP_USER_AGENT variable against the problematic GoogleOther user-agent string and, if they match, replying with the 429 status code.

I'm still testing the implementation details. Unfortunately, my boards run on shared hosting, so I don't have access to Apache's rate limiting settings and controls.
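
As a starting point, the .htaccess check described above might look roughly like the sketch below. It is untested and rests on several assumptions that may not hold on a given host: Apache 2.4 (2.4.10 or later for the expr= condition), with mod_rewrite and mod_headers both enabled and permitted in .htaccess. The 3600-second Retry-After value is simply borrowed from the RFC 6585 example, not a recommendation.

Code:

# Assumption: Apache 2.4 with mod_rewrite usable from .htaccess
<IfModule mod_rewrite.c>
    RewriteEngine On
    # Match any request whose user-agent string contains "GoogleOther"
    RewriteCond %{HTTP_USER_AGENT} GoogleOther
    # Leave the URL untouched ("-") and answer with 429 Too Many Requests
    RewriteRule ^ - [R=429,L]
</IfModule>

# Assumption: mod_headers with expr= support (Apache 2.4.10+)
<IfModule mod_headers.c>
    # "always" so the header is also added to error responses such as the 429
    Header always set Retry-After "3600" "expr=%{HTTP_USER_AGENT} =~ /GoogleOther/"
</IfModule>

The rewrite block would presumably need to sit above phpBB's own rewrite rules in .htaccess so that it is evaluated first. Since the Google documentation quoted above describes error responses as a short-term measure (a couple of hours, or 1-2 days), a rule like this should probably be removed once the crawl rate drops.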

Statistics: Posted by P_I — Fri Apr 19, 2024 2:06 pm


