Introduction to the Problem
Google’s John Mueller answered a question about a site that received millions of Googlebot requests for pages that don’t exist, with one non-existent URL alone receiving over two million hits, essentially DDoS-level page requests. The publisher’s concerns about crawl budget and rankings were seemingly realized, as the site subsequently experienced a drop in search visibility.
Understanding 410 Gone Server Response Code
The 410 Gone server response code belongs to the family of 400-series response codes that indicate a resource is not available. Unlike the 404 status code, a 410 signals to the browser or crawler that the resource is missing intentionally and that any links to it should be removed. The person asking the question had about 11 million URLs that should not have been discoverable, so they removed them entirely and began serving a 410 Gone response code.
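To make the distinction concrete, here is a minimal sketch of serving 410 Gone for deliberately removed URLs. It assumes a Python Flask application and a hypothetical "/old-feature/" URL prefix; neither the framework nor the path comes from the article, and a real implementation would match the site’s actual removed URLs.

    # Minimal sketch: return 410 Gone (instead of the default 404) for URLs
    # under a prefix that was removed on purpose. Flask and the prefix are
    # assumptions for illustration only.
    from flask import Flask, Response, request

    app = Flask(__name__)

    REMOVED_PREFIX = "/old-feature/"  # hypothetical pattern for removed pages

    @app.before_request
    def serve_410_for_removed_urls():
        # Returning a response here short-circuits normal routing.
        if request.path.startswith(REMOVED_PREFIX):
            # 410 tells crawlers the removal is intentional; 404 only says
            # the resource was not found.
            return Response("Gone", status=410)

    @app.route("/")
    def home():
        return "Live page"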
Rankings Loss Due to Excessive Crawling
Three weeks later, things had not improved, and the person posted a follow-up question noting that they had received over five million requests for pages that don’t exist. They shared an example URL in their question; in total, Googlebot had sent approximately 5.4 million requests, with around 2.4 million of those directed at one specific URL. The person also noticed a significant drop in their visibility on Google during this period and wondered whether there was a connection.
Google’s Explanation
Google’s John Mueller confirmed that it’s normal behavior for Google to keep returning to check whether a missing page has come back. This is meant to be a helpful feature for publishers who might remove a web page unintentionally. Mueller added that disallowing crawling with robots.txt is also fine if the requests annoy the publisher.
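If blocking is the preferred route, a robots.txt disallow rule is the mechanism Mueller refers to. The sketch below is illustrative only: the URL pattern is hypothetical and would need to match the site’s actual unwanted URLs (Googlebot supports the * wildcard in Disallow rules).

    # Illustrative robots.txt sketch; the pattern below is hypothetical.
    User-agent: Googlebot
    Disallow: /*?unwanted-param=

As the next section notes, a rule like this should only go live after confirming that the blocked URLs are not needed to render pages that should remain indexed.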
Technical Considerations
Mueller cautioned that the proposed solution of adding a robots.txt disallow rule could inadvertently break rendering for pages that aren’t supposed to be missing. He advised the person to double-check that the ?feature= URLs are not being used at all in any frontend code or JSON payloads that power important pages. He also suggested using Chrome DevTools to simulate what happens if those URLs are blocked, and monitoring Search Console for Soft 404s to spot any unintended impact on pages that should be indexed.
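A quick way to start that double-check is to scan the HTML that important pages actually serve for references to the ?feature= URLs. The sketch below is a rough first pass, not Mueller’s method: the page list is hypothetical, it relies on the third-party requests library, and it only inspects the initial HTML and any inline JSON, so it will not catch URLs fetched later by JavaScript, which is exactly what the Chrome DevTools request-blocking test is for.

    # Rough first-pass check: do key pages reference ?feature= URLs in the
    # HTML (including inline JSON payloads) they serve? Page URLs are
    # hypothetical placeholders.
    import requests

    IMPORTANT_PAGES = [
        "https://example.com/",
        "https://example.com/category/widgets",
    ]

    for url in IMPORTANT_PAGES:
        html = requests.get(url, timeout=10).text
        if "?feature=" in html:
            print(f"WARNING: {url} references a ?feature= URL")
        else:
            print(f"OK: no ?feature= reference found on {url}")

Any warning here means a robots.txt block could affect how that page renders or is understood, which is the kind of unintended impact Mueller warned about.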
Diagnostic Approach
John Mueller suggested a deeper diagnostic to rule out errors on the part of the publisher. After all, a publisher error started the chain of events that led to the indexing of pages against the publisher’s wishes, so it’s reasonable to ask whether a more plausible cause might account for the loss of search visibility. This is a classic situation where the obvious explanation is not necessarily the correct one.
Conclusion
In conclusion, John Mueller’s answers provide useful insight into how Google handles missing pages and why heavy crawling of non-existent URLs does not automatically explain a drop in rankings. The case highlights the importance of checking for unintended consequences, such as a robots.txt rule breaking the rendering of pages that should remain indexed, before implementing technical SEO fixes. By following Mueller’s advice to verify how the blocked URLs are used and to look for a more plausible cause of the visibility loss, publishers can address the unwanted crawling without harming pages that should stay visible in search.