WordPress Robots.txt Essentials

Introduction to Robots.txt

The humble robots.txt file often sits quietly in the background of a WordPress site, but the default version is basic and contains none of the customized directives you may want to adopt. This post applies only to WordPress installations in the root directory of a domain or subdomain, e.g., domain.com or example.domain.com.

Where to Find the WordPress Robots.txt File

By default, WordPress generates a virtual robots.txt file. You can see it by visiting /robots.txt on your install, for example: https://yoursite.com/robots.txt. This default file is generated on request and doesn’t exist as a physical file on your server. To use a custom robots.txt file, upload one to the root folder of the install; a physical file takes precedence over the virtual one. You can do this with an FTP application or with a plugin, such as Yoast SEO, that includes a robots.txt editor in the WordPress admin area.

The Default WordPress Robots.txt

If you don’t manually create a robots.txt file, WordPress’ default output looks like this:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

While this is safe, it’s not optimal. Let’s go further.
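To see why the default rules keep admin-ajax.php crawlable even though its parent directory is disallowed, here is a minimal sketch of the "longest matching rule wins" evaluation that Google documents and RFC 9309 describes. The function name is_allowed and the DEFAULT_RULES list are illustrative assumptions, not part of WordPress, and the sketch handles plain path prefixes only (no wildcards).

```python
def is_allowed(rules, url_path):
    """Return True if url_path may be crawled under the given rules.

    rules is a list of (directive, path) pairs for one user-agent group.
    The longest matching path wins; on a tie, Allow beats Disallow.
    Illustrative sketch: plain prefix matching only, no '*' or '$'.
    """
    best_len = -1
    allowed = True  # no matching rule means the URL may be crawled
    for directive, path in rules:
        if url_path.startswith(path):
            is_allow = directive.lower() == "allow"
            if len(path) > best_len or (len(path) == best_len and is_allow):
                best_len = len(path)
                allowed = is_allow
    return allowed


# The two rules from the default WordPress output shown above.
DEFAULT_RULES = [
    ("Disallow", "/wp-admin/"),
    ("Allow", "/wp-admin/admin-ajax.php"),
]

print(is_allowed(DEFAULT_RULES, "/wp-admin/options.php"))     # False
print(is_allowed(DEFAULT_RULES, "/wp-admin/admin-ajax.php"))  # True
print(is_allowed(DEFAULT_RULES, "/blog/hello-world/"))        # True
```

Note that not every parser behaves this way. Python's built-in urllib.robotparser, for instance, applies the first matching rule in file order, so it would report admin-ajax.php as blocked for this file.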

Including Your XML Sitemap(s)

Make sure that all XML sitemaps are explicitly listed, as this helps search engines discover all relevant URLs. For example:

Sitemap: https://example.com/sitemap_index.xml
Sitemap: https://example.com/sitemap2.xml
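
A quick way to confirm that your sitemap lines are actually being picked up is to parse the file and list them. The sketch below uses Python's standard urllib.robotparser (site_maps() requires Python 3.8 or later); the ROBOTS_TXT string and the helper name listed_sitemaps are assumptions made for illustration, and in practice you would paste in your own file's contents.

```python
from urllib.robotparser import RobotFileParser

# Example file contents, reusing the placeholder URLs from above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://example.com/sitemap_index.xml
Sitemap: https://example.com/sitemap2.xml
"""


def listed_sitemaps(robots_txt: str) -> list[str]:
    """Return every Sitemap URL declared in the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


print(listed_sitemaps(ROBOTS_TXT))
```

An empty list here is a hint that a Sitemap line is missing or malformed.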

Things Not to Block

Some now-dated guides suggest disallowing core WordPress directories like /wp-includes/, /wp-content/plugins/, or even /wp-content/uploads/. Don’t! Here’s why you shouldn’t block them:

Google is smart enough to ignore irrelevant files. Blocking CSS and JavaScript can hurt renderability and cause indexing issues.
You may unintentionally block valuable images/videos/other media, especially those loaded from /wp-content/uploads/, which contains all uploaded media that you definitely want crawled.

Managing Staging Sites

It’s advisable to ensure that staging sites are not crawled, for both SEO and general security purposes. I always advise disallowing the entire site. You should still use the noindex meta tag as well, so that a second layer is covered. If you navigate to Settings > Reading, you can tick the option “Discourage search engines from indexing this site.” In older WordPress versions, this wrote the following to the virtual robots.txt file; since WordPress 5.3, the option adds a noindex robots meta tag instead, so add the rule yourself if you want both layers.

User-agent: *
Disallow: /

Google may still index pages if it discovers links elsewhere (usually caused by calls to staging from production when a migration isn’t perfect). Important: when you move to production, double-check this setting and revert any disallowing or noindexing.
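
Because a leftover blanket disallow is easy to miss at launch, a small pre-launch check can help. The following is a rough sketch using Python's standard urllib.robotparser; the function name blocks_everything and the probe paths are hypothetical choices for illustration, and the check only covers robots.txt, not the noindex meta tag.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical sample paths; swap in URLs that matter on your own site.
PROBE_PATHS = ("/", "/sample-page/", "/wp-content/uploads/logo.png")


def blocks_everything(robots_txt: str, probe_paths=PROBE_PATHS) -> bool:
    """Return True if none of the probe paths may be crawled by a generic bot."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not any(parser.can_fetch("*", path) for path in probe_paths)


STAGING = "User-agent: *\nDisallow: /\n"
PRODUCTION = (
    "User-agent: *\n"
    "Disallow: /wp-admin/\n"
    "Allow: /wp-admin/admin-ajax.php\n"
)

print(blocks_everything(STAGING))     # True
print(blocks_everything(PRODUCTION))  # False
```

Running this against the live robots.txt contents right after a migration gives a quick signal that the staging rule did not travel to production.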

Cleaning Up Non-Essential Core WordPress Paths

Not everything should be blocked, but many default paths add no SEO value, such as the following:

Disallow: /trackback/
Disallow: /comments/feed/
Disallow: */feed/
Disallow: */embed/
Disallow: /cgi-bin/
Disallow: /wp-login.php
Disallow: /wp-json/

Disallowing Specific Query Parameters

Sometimes, you’ll want to stop search engines from crawling URLs with known low-value query parameters, like tracking parameters, comment responses, or print versions. Here’s an example:

User-agent: *
Disallow: /*?replytocom=
Disallow: /*?print=

Google Search Console’s URL Parameters tool has been retired, so use the Page indexing and Crawl Stats reports to monitor parameter-driven crawling patterns and decide whether additional disallows are worth adding.
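
Wildcard rules are easy to get subtly wrong, so it can help to test them against real URLs before deploying. Here is a minimal sketch of how a pattern such as /*?replytocom= is matched, assuming the widely supported semantics where * matches any run of characters and a trailing $ anchors the end of the URL. The helper names pattern_to_regex and matches are illustrative, not from any library.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    Everything else is treated literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def matches(pattern: str, path_and_query: str) -> bool:
    """Return True if the rule pattern matches from the start of the URL path."""
    return pattern_to_regex(pattern).match(path_and_query) is not None


# The rule catches the parameter when it comes first in the query string...
print(matches("/*?replytocom=", "/my-post/?replytocom=42"))               # True
# ...but not when another parameter precedes it.
print(matches("/*?replytocom=", "/my-post/?utm_source=x&replytocom=42"))  # False
# A broader (hypothetical) pattern covers both positions.
print(matches("/*?*replytocom=", "/my-post/?utm_source=x&replytocom=42")) # True
```

The second result is the one to watch: a rule written as /*?param= only catches URLs where that parameter appears directly after the question mark.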

Disallowing Low-Value Taxonomies and SERPs

If your WordPress site includes tag archives, paginated archives, or internal search results pages that offer no added value, you can block them too:

User-agent: *
Disallow: /tag/
Disallow: /page/
Disallow: /?s=

As always, weigh this against your specific content strategy. If tag archive pages are part of the content you want crawled and indexed, skip these rules; in most cases, though, they add little value.

Monitoring Crawl Stats

Once your robots.txt is in place, monitor crawl stats via Google Search Console:

Look at Crawl Stats under Settings to see if bots are wasting resources.
Use the URL Inspection Tool to confirm whether a blocked URL is indexed or not.
Check Sitemaps and make sure they only reference pages you actually want crawled and indexed.
In addition, some server management tools, such as Plesk, cPanel, and Cloudflare, can provide extremely detailed crawl statistics beyond Google.

Conclusion

While WordPress is a great CMS, its default robots.txt is far from ideal and isn’t set up with crawl optimization in mind. A few lines of code and less than 30 minutes of your time can save your site thousands of unnecessary crawl requests for URLs that aren’t worth discovering at all, and can head off a potential scaling issue in the future. By following these steps and customizing your robots.txt file, you can improve your website’s crawlability and overall SEO performance.
