A noindex tag informs search engines not to index a page or website, excluding it from appearing in search engine results.
While much of search engine optimization (SEO) focuses on how to get your quality content indexed and seen, there are pages that you may not wish to show up in the search engine results pages (SERPs). Using the noindex tag tells search engine web crawlers not to include a page or website in their index, effectively removing it from appearing in the SERPs.
A noindex robots meta tag is an HTML value that tells search engines not to include a page in the index of search results. Utilizing a noindex tag helps to distinguish valuable, curated content from other pages that exist to enhance the user experience. This is a helpful tool for pages that you do not necessarily wish to be available in a SERP.
Inevitably, there may be some pages that are not helpful to your rankings or are not intended for the general eyes of the public. For example:
Pages that include user-submitted posts, such as forums. Forums may contain answers, comments, and threads without any authority.
The time it takes to notice a noindex error, whether from misuse or from accidentally noindexing a page or an entire site, can vary. The decline in organic traffic that follows removal from the index can be immediate and drastic, or it can take months to become apparent.
It is important to closely monitor for major changes or declines in organic traffic. This can be done through auditing content performance, Google Analytics, and Google Search Console.

Recovering Pages from Noindex Errors
Noindex errors can have severely negative consequences. Recovering from a noindex error can be done in a few short steps:
To noindex a page, add an HTML meta tag in the head section of the page, or add a directive to the site's robots.txt file. The head section of code on any given page describes and lists information about the webpage's properties. This may include the title, meta tags, links to external files, and code.
A general header with a noindex tag will look like:
<meta name="robots" content="noindex">
<title>Don't index this</title>
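As a quick way to check a page of your own, the sketch below (an illustration, not an official tool; the class name and sample HTML are assumptions) uses Python's standard-library html.parser to detect a robots noindex meta tag:

```python
from html.parser import HTMLParser

class NoindexDetector(HTMLParser):
    """Flags pages whose HTML contains <meta name="robots" content="...noindex...">."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = dict(attrs)
        if attr.get("name", "").lower() != "robots":
            return
        # The content value is a comma-separated list of directives.
        directives = {d.strip().lower() for d in attr.get("content", "").split(",")}
        if "noindex" in directives:
            self.noindex = True

detector = NoindexDetector()
detector.feed('<head><meta name="robots" content="noindex"><title>Don\'t index this</title></head>')
print(detector.noindex)  # True
```

Feeding a page without the tag leaves the flag set to False, so the same class can be run across a crawl of your own site.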
The directive can be restricted so that it targets only specific bots by changing the value of the "name" attribute in the meta tag. For example, to block Google's bots, the code would look like this:
<meta name="googlebot" content="noindex">
The noindex tag can also be used as an element of the HTTP header response for any given URL, by utilizing the X-Robots-Tag:
HTTP/1.1 200 OK
Date: Wed, 12 Feb 2020 13:26:32 GMT
X-Robots-Tag: noindex
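To illustrate how such a response header might be checked, here is a small Python sketch; the header dictionary is a made-up example, and header names are matched case-insensitively as HTTP requires:

```python
def is_noindexed(headers):
    """Return True if an X-Robots-Tag header carries a noindex directive."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "x-robots-tag":
            # The header value is a comma-separated list of directives.
            directives = {d.strip().lower() for d in value.split(",")}
            if "noindex" in directives:
                return True
    return False

# Hypothetical response headers for a noindexed URL.
headers = {
    "Date": "Wed, 12 Feb 2020 13:26:32 GMT",
    "X-Robots-Tag": "noindex",
}
print(is_noindexed(headers))  # True
```

In practice the headers would come from an HTTP client rather than a literal dictionary, but the check is the same.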
A domain may also include a noindex tag in its robots.txt file. A robots.txt file helps bots and search engines understand the structure and content of a website. This element of technical SEO doesn't usually have a user experience impact but acts like a guide or map for a bot crawling a site. The robots.txt file can typically be located by appending /robots.txt to the root domain, for example: https://www.example.com/robots.txt
An example of noindex in a robots.txt file may look like this (the paths here are illustrative):

User-agent: *
Noindex: /example-page/
Noindex: /forum/

When used in a robots.txt file, the noindex directive can command crawlers not to index a single page, as seen in the first line above after the user-agent is defined. It can also be used to command crawlers not to index an entire section of a website, such as all of the pages generated on a website's forum. Note that Google announced in 2019 that it no longer supports the noindex directive in robots.txt, so the meta tag and X-Robots-Tag methods are the more reliable options.
While a noindex tag tells a bot or crawler not to add a page to the index of the search results, a disallow directive tells search engines not to crawl the page at all. This must be done through the robots.txt file and is sometimes used in tandem with noindex.
While the disallow tag is a helpful tool, it is important to be extremely cautious when using a disallow directive. By disallowing a page, you are essentially removing it from your site with regard to search, and you are also removing its ability to pass PageRank, the value given to a webpage by a search engine that helps determine where it appears in the SERPs. Accidentally disallowing the wrong page, such as a page that drives traffic to your site, can have disastrous effects on traffic and your SEO tactics.
Disallowing pages that have no reader or SEO value can make your site quicker for bots to crawl and index. An example would be the search function on an e-commerce site. While the search function provides value to the user, the various results pages it generates are not necessarily pages that add SEO value to your site.
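Python's standard library ships a robots.txt parser, which makes it easy to sanity-check a disallow rule before deploying it; the rules and URLs below are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt blocking an e-commerce site's internal search results.
rules = """\
User-agent: *
Disallow: /internal-search/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# A compliant crawler would skip the internal search results entirely...
print(parser.can_fetch("*", "https://example.com/internal-search/shoes"))  # False
# ...but remains free to crawl normal product pages.
print(parser.can_fetch("*", "https://example.com/products/shoes"))  # True
```

Running checks like this against the pages that actually drive your traffic is a cheap safeguard against the accidental-disallow scenario described above.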
If there are external links or canonical tags — tags that tell bots which page from a group of similar pages should be indexed — pointing to a page that has been disallowed, it could still be indexed and ranked, even though it cannot be crawled. This means that it could still show up in the SERPs.
To apply both directives, add them both to the robots.txt file. For example, with an illustrative path:

User-agent: *
Disallow: /example-page/
Noindex: /example-page/
A nofollow tag is used to tell search engines not to evaluate the merit of the links (or a specific link) that exist on a page. Nofollow meta directives also tell bots not to discover more URLs within the site by setting all of the links to "nofollow"; by default, all links on a page are set to be followed. You can either add a nofollow tag to individual links or blanket nofollow them via a robots meta tag in the page's HTML head. Nofollow links can be used as an SEO tactic: they let a site link to pages it wishes to provide to the reader without the bot or crawler associating those pages with its own.
For example, a single nofollowed link might look like:
<a href="https://example.com/" rel="nofollow">
While a nofollow meta tag in the page's head would look like this:
<meta name="robots" content="nofollow">
Nofollow tags are useful when applied to links that you may not directly control, such as links in comment sections, inorganic or non-relevant paid links, guest posts, links to something off-topic to the website or page, or an embed such as a widget or infographic.
Adding a nofollow tag to a link won’t prevent the linked page from being crawled or indexed, though it prevents an association or passing of authority between the linked pages.
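For auditing which outbound links carry the attribute, a short Python sketch using the standard library's html.parser can sort a page's links into nofollowed and followed buckets (the class name and sample HTML are assumptions for illustration):

```python
from html.parser import HTMLParser

class LinkAudit(HTMLParser):
    """Sorts a page's links into nofollowed and followed buckets."""

    def __init__(self):
        super().__init__()
        self.nofollow = []
        self.followed = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if href is None:
            return
        # rel is a space-separated list of link types.
        rel = {r.lower() for r in (attr.get("rel") or "").split()}
        (self.nofollow if "nofollow" in rel else self.followed).append(href)

audit = LinkAudit()
audit.feed('<a href="https://example.com/" rel="nofollow">sponsor</a><a href="/about">About</a>')
print(audit.nofollow)  # ['https://example.com/']
print(audit.followed)  # ['/about']
```

A sweep like this over comment sections or sponsored content is a quick way to confirm that the links you intended to nofollow actually carry the attribute.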
To simultaneously command bots not to index a page or follow the links on it, combine the noindex and nofollow directives into one meta tag. For example:
<meta name="robots" content="noindex, nofollow">
If you do not want Google to crawl the page at all, you will still need to disallow it in the robots.txt file.