What is robots.txt

Robots.txt is a text file that tells search engines how to access your website. Inside the file are directives that specify which pages of the site search engine bots are allowed or disallowed to access.

Setting the file up correctly is crucial for good SEO results in the long run. Below, we talk about robots.txt more in-depth.

Robots.txt in terms of SEO

You may think of search engine robots as automated systems that crawl and document (aka index) the websites that they visit.

If you’re somewhat familiar with SEO, you may have heard these robots called crawlers, spiders or simply bots. The terms all mean the same thing, but each search engine’s bot behaves a little differently.

In order to “communicate” with them and to point them to the content you want to get out there, you need to create a tidy robots.txt file, which will let the bots know which pages you want or don’t want to be crawled.

Using the robots.txt file

In order to know how to use the robots.txt file, we first need to familiarize ourselves with the syntax in which we will create it.

1. User agent definition

As mentioned, each search engine bot has a unique name, or more specifically, a user agent. You will need to state the name of the robot (be it Yahoo’s or Google’s) that you are addressing.
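
For example, a group of rules starts by naming the crawler it addresses; an asterisk is a wildcard that addresses all bots at once:

  # Rules below apply to Google's crawler only
  User-agent: Googlebot

  # Rules below apply to every crawler
  User-agent: *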

2. Disallowing access

The “disallow” directive lets you define which parts of the website you want to block from bot crawling. Typically, URL paths are listed here.
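
As a small sketch (the paths here are hypothetical), blocking a directory and a single page for all bots looks like this:

  User-agent: *
  # Block an entire directory
  Disallow: /private/
  # Block a single page
  Disallow: /checkout.html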

3. Allowing access

The “allow” directive is typically used to unblock a certain part of a website within a blocked parent directory. Here, you enter the path of the subdirectory or page that should remain accessible.
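
For instance, assuming a hypothetical /media/ directory that is blocked except for one public subfolder:

  User-agent: *
  Disallow: /media/
  # Re-open one subdirectory inside the blocked parent
  Allow: /media/public/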

In other words, with robots.txt you tell the bots which parts of your site you want crawled, and which you do not.
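
Putting it all together, a minimal robots.txt with separate rules for different bots might look like this (all paths hypothetical):

  # Rules for Google's crawler
  User-agent: Googlebot
  Disallow: /drafts/

  # Rules for every other crawler
  User-agent: *
  Disallow: /drafts/
  Disallow: /tmp/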

The importance of robots.txt

You may look at it and think to yourself: “Why would I want to block access to pages on my website? After all, isn’t it important to have a well-indexed website?”

And you would be right. However, there are some instances where blocking access is crucial to getting everything right.

1. Sensitive information

A good example of sensitive website information would be administrative and system directories. You would probably want to block the following on your site:

  • /wp-admin/
  • /scripts/
  • /cgi-bin/

And so on.
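
For instance, blocking those directories for every crawler would look like this. Note that WordPress sites often re-allow admin-ajax.php, since some front-end features depend on it:

  User-agent: *
  Disallow: /wp-admin/
  # Commonly re-opened on WordPress sites
  Allow: /wp-admin/admin-ajax.php
  Disallow: /scripts/
  Disallow: /cgi-bin/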

2. Pages that have low-quality content

Google has stated on a number of occasions that site quality matters a great deal for rankings. Thin pages that don’t bring anything to the table will only hinder that performance.

3. Duplicates

Preventing duplicates from being crawled is another crucial thing to keep in mind. Let’s say you let people view a “print” version of a webpage. You probably don’t want Google scanning the same content twice if you expect to rank high.
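
If, for the sake of illustration, print versions lived under a /print/ path or behind a ?print=1 query parameter, the rules might look like this. Keep in mind that wildcards like * are an extension honored by Google and most major crawlers, not part of the original standard:

  User-agent: *
  # Hypothetical directory holding print-friendly duplicates
  Disallow: /print/
  # Hypothetical query parameter that switches a page to its print version
  Disallow: /*?print=1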

These are just a few examples of the pages you want to disallow when creating your robots.txt file.

Allow and Disallow formats for robots.txt

The principle behind setting up the file correctly is fairly simple: all you need to do is specify which pages crawlers may and may not access.

Use one disallow line per path you want to block, listing every page or directory you don’t want Google or others to crawl. Meanwhile, the allow directive is only needed when you want a specific page crawled even though its parent directory is disallowed.
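
A short sketch of that pattern, again with hypothetical paths:

  User-agent: *
  # One disallow line per blocked path
  Disallow: /archive/
  Disallow: /search/
  # Allow carves an exception out of a blocked parent
  Allow: /archive/best-of/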

Setting up robots.txt on your site

First, you will need to create the text file itself. Here, you will enter all of the directives for the search bots to follow. Afterwards, upload the file to the root directory of your domain, for example via cPanel’s file manager.

The file should always be accessible right after your domain name in the URL, for example “yourdomain.eu/robots.txt”.

Otherwise, the bots won’t even bother looking for your specified directives.

It’s also important to know what to do if you have subdomains. For each one, you need to create a separate robots.txt file for the bots to follow.
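
For example, assuming a hypothetical blog subdomain, each host serves its own file and neither covers the other:

  https://yourdomain.eu/robots.txt        (covers yourdomain.eu only)
  https://blog.yourdomain.eu/robots.txt   (covers blog.yourdomain.eu only)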

Testing the file

Google offers a free tool for checking how your file is working. Just head to Google Search Console and open the robots.txt Tester.

All in all

We’ve talked about robots.txt and its importance in website setup and SEO. Not only is it a great way to control what gets crawled, it will also keep serving you as you make changes and your website grows.