What is a robots.txt file?
The robots.txt file sits in the root folder of your website and contains directives that tell search engines and other web crawlers what they are allowed to do on your site, including which sections they cannot access, how fast they may crawl, and more.
What is the use of a robots.txt file in SEO?
The robots.txt file is used in SEO to mark areas of the site that you do not want search engines to access, crawl, or index. The usual reasons are:
- The content is private (possibly even personal information) and should not be in the search indices;
- The content is low-value and the SEO does not want the search engines to waste their time on that section of the site.
At its most basic, a robots.txt file consists of a User-agent line followed by one or more Disallow and Allow directives.
User-agent declares which crawler the directives that follow apply to, such as Googlebot. If you want to address all crawlers, use User-agent: *
Disallow is used to exclude certain parts of the site or specific pages from being crawled.
Allow can be used to permit specific pages within areas that are otherwise disallowed. By default, every page and section of your website is allowed.
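To see how these three directives interact, here is a minimal sketch using Python's standard-library urllib.robotparser. The rules, the example.com domain, and the /private/ paths are all made up for illustration. One caveat: Python's parser applies the first rule that matches, which is why the Allow line is listed before the Disallow line here; to my understanding, Googlebot instead prefers the most specific matching rule, so treat this as an approximation rather than an exact model of any one search engine.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block /private/ for every crawler,
# but carve out a single page inside it.
RULES = """\
User-agent: *
Allow: /private/press-kit.html
Disallow: /private/
"""


def build_parser(robots_text):
    """Parse robots.txt content held in a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(RULES)

# The Allow rule carves the press kit out of the blocked section.
print(parser.can_fetch("Googlebot", "https://example.com/private/press-kit.html"))

# Everything else under /private/ is disallowed.
print(parser.can_fetch("Googlebot", "https://example.com/private/report.html"))

# Paths that match no rule fall back to the default: allowed.
print(parser.can_fetch("Googlebot", "https://example.com/blog/"))
```

The three checks line up with the explanation above: an explicit Allow inside a disallowed area, a plain Disallow, and the allowed-by-default case.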
The robots.txt file should not be confused with meta noindex tags or canonical tags. The robots.txt is a sledgehammer that will completely block whole sections of websites, whereas canonicals and meta noindex tags are more of a scalpel that can be used in one-off areas to control things like duplicate content.
The most common mistake I see with robots.txt is blocking the entire website by accident. All it takes is this simple directive:
User-agent: *
Disallow: /
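Because this mistake is so easy to make, it can be worth checking for it automatically before a robots.txt goes live. The sketch below is one rough way to do that with Python's standard-library urllib.robotparser; the function name blocks_entire_site is hypothetical, and testing only the root path "/" is a heuristic, not a full audit of the file.

```python
from urllib.robotparser import RobotFileParser


def blocks_entire_site(robots_text, user_agent="*"):
    """Return True if the given crawler may not fetch the site root.

    Heuristic only: a rule that blocks "/" is a prefix match,
    so it normally blocks every other path as well.
    """
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return not parser.can_fetch(user_agent, "/")


# The accidental sitewide block described above.
risky = "User-agent: *\nDisallow: /\n"

# An empty Disallow value blocks nothing.
harmless = "User-agent: *\nDisallow:\n"

print(blocks_entire_site(risky))
print(blocks_entire_site(harmless))
```

A check like this could run as part of a deployment script, so that a staging robots.txt that blocks everything does not slip onto the live site unnoticed.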