
15 Jan Best Robots.txt for WordPress & WooCommerce in 2022
Need a template for the best robots.txt for WordPress and WooCommerce? Check out the code below, explained line by line. It suits any use of WordPress and/or WooCommerce.
The Robots.txt File
Below is the raw text, ready to paste. You can also check the live file we use on this site.
Our file is here: https://wpwebdesign.ie/robots.txt. Your file must be live and at the root of your domain. HTTPS is assumed, since nearly every server now uses it.
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /thanks/
Disallow: /*page/
Disallow: /tag/
Disallow: /feed/
Disallow: /author/
Disallow: /category/
Disallow: /my-account/
Disallow: /cart/
Disallow: /checkout/
Disallow: /portfolio_page/
Sitemap: https://domain.com/sitemap_index.xml
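Since the file only counts when it sits at the root of the domain, here is a minimal Python sketch for working out where robots.txt should live for any page on a site. The function name `robots_url` and the sample page path are my own placeholders, not part of the template.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the root-level robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, swap the path for /robots.txt,
    # and drop any query string or fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page path, used only for illustration.
print(robots_url("https://wpwebdesign.ie/blog/some-post/?utm=1"))
# prints https://wpwebdesign.ie/robots.txt
```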
The Code: Line by Line
Not every line here is needed. Read below to find out more:
1. This asks all crawlers to obey the same set of rules. You can target specific crawlers, but that is generally not very helpful.
2. This excludes the admin area from being crawled, globally.
3. This line is an exception to #2. It keeps admin-ajax.php, the endpoint that front-end JavaScript needs in order to load, open to crawlers. Google claims to render your site, but really it just crawls your JavaScript.
4. This is an example of a page that a form sends a user to once they convert. If a user sends me a message, the form redirects them to /thanks/. This page, and any other page not in your menu, should be hidden.
5. This line blocks all pagination: shop, blog and so on. Paginated URLs are bad and generate a list of pointless pages. Users don't need to see them; they need to see the main archive page. Where you can avoid generating these kinds of URLs, you should, by using lazy load, infinite scroll and other non-page-based approaches.
6. Tags should be avoided or disabled in all facets of WordPress and WooCommerce; nothing generates bloat like tags. I add this line to be sure they won't index if live, but if you link to tags you will get a really messy SERP, diluted with very low-value pages.
7. /feed/ is a URL I see GSC (Google Search Console) wasting time on, so it is a recent addition to this template. It blocks all the RSS output WP generates, all of which is a little bit antique now. Either way, feeds should not index.
8. This blocks author links. They can also be disabled via the Yoast plugin, one of the great things Yoast can do once de-bloated.
9. Categories are similar to tags: it is highly unlikely your blog categories should be indexed. There are very niche use cases, such as news sites, where this may not be true.
10. A WooCommerce-specific line that blocks the account section. It is not a page you expect to index, but by default it is set up to do so.
11. Similar to #10, this is part of the eCommerce journey, so it is not required to index. These pages are dynamic and contain no content.
12. The checkout is not accessible to a user without a product in the cart, and this line hides it from search. If you don't use WooCommerce, it does no harm to keep these lines in case it is ever installed and indexes on your live site.
13. This blocks the portfolio on my website. It may not be needed on your site, but you may find a similar URL structure that you need to block.
14. The final line is always a link to your single sitemap, or to your sitemap index if you have more than one.
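To see how these rules play out against real paths, here is a rough Python sketch of the matching behaviour Google documents: `*` matches any run of characters, the longest matching rule wins, and an Allow beats a Disallow of the same length. It is a hand-rolled approximation for testing your own file, not Google's parser, and the names `RULES`, `to_regex` and `is_allowed` are my own.

```python
import re

# The Allow/Disallow lines from the template above, in file order.
RULES = [
    ("disallow", "/wp-admin/"),
    ("allow", "/wp-admin/admin-ajax.php"),
    ("disallow", "/thanks/"),
    ("disallow", "/*page/"),
    ("disallow", "/tag/"),
    ("disallow", "/feed/"),
    ("disallow", "/author/"),
    ("disallow", "/category/"),
    ("disallow", "/my-account/"),
    ("disallow", "/cart/"),
    ("disallow", "/checkout/"),
    ("disallow", "/portfolio_page/"),
]


def to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")  # a trailing '$' pins the end of the URL
    if anchored:
        pattern = pattern[:-1]
    # '*' becomes '.*'; everything else is matched literally.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_allowed(path, rules=RULES):
    """Return True if a crawler obeying these rules may fetch the path."""
    best_len, best_kind = -1, "allow"  # no matching rule means allowed
    for kind, pattern in rules:
        if to_regex(pattern).match(path):
            longer = len(pattern) > best_len
            tie_for_allow = len(pattern) == best_len and kind == "allow"
            if longer or tie_for_allow:
                best_len, best_kind = len(pattern), kind
    return best_kind == "allow"


# Hypothetical paths, used only for illustration.
for path in ["/wp-admin/admin-ajax.php", "/wp-admin/options.php",
             "/shop/page/2/", "/landing-page/", "/blog/my-post/"]:
    print(path, "allowed" if is_allowed(path) else "blocked")
```

One thing this makes visible: `/*page/` catches any path containing `page/`, so a slug like /landing-page/ is blocked along with the pagination URLs. Check your own slugs before pasting that line.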
NB! “Warning: Don’t use a robots.txt file as a means to hide your web pages from Google search results. If other pages point to your page with descriptive text, Google could still index the URL without visiting the page. If you want to block your page from search results, use another method such as password protection or noindex.”
Google is rarely this direct, and this is the main takeaway from the Google guidance quoted above, saving you a click and a read 🙂 You should also be blocking all of these pages via meta tags. This is the real way to stop them indexing. Any tag or page that is already indexed can be removed via GSC. I will have a post on this to help you out with that task.
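If you want to confirm that a page really carries a noindex meta tag, a small Python sketch like the one below can check a page's HTML. It uses only the standard library; the class and function names and the sample markup are my own placeholders. Keep in mind that a crawler can only read a noindex tag on a page it is allowed to fetch.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None when the attribute has no value.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def has_noindex(html):
    """Return True if the HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical markup, used only for illustration.
sample = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(sample))  # prints True
```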