Robots.txt file basics

05.02.2013 by Ilia Raiskin

When a web crawler visits a website, it first reads a file named robots.txt, which sits at the root of the domain. In this file you can specify whether and how crawlers may visit the site, which gives us the possibility to deny specific search engines access to our website. Note that robots.txt cannot be used to keep files secret, because anyone can still open them in a browser. Some search engines even show such "hidden" pages in their search results, just without descriptions.
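Since robots.txt always lives at the root of the domain, its location can be derived from any page URL. A minimal sketch using Python's standard urllib.parse (the URL is only an illustrative example):

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url):
    # robots.txt always sits at the root of the host,
    # regardless of how deep the page itself is nested
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("https://example.com/blog/post.html"))
# https://example.com/robots.txt
```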


The first line of a block names the crawler, or User-agent, addressed by the rules that follow. There is no limit on the number of these blocks. A crawler applies only the most specific block that matches it, so blocks for individual crawlers should be placed before the generic User-agent: * block. You can also add one-row comments to a robots.txt; they simply start with # and are ignored by crawlers, which makes them useful for describing your settings.

Possible Statements

User-Agent:
    Examples: User-Agent: * | User-Agent: Googlebot
    Specifies the web crawler being addressed. * selects all crawlers.

Disallow:
    Examples: Disallow: / | Disallow: /images/ | Disallow: /test.html
    Forbids reading of the given files or directories.

Allow:
    Examples: Allow: / | Allow: /free/ | Allow: /public.html
    Allows reading of the given files or directories.

Crawl-delay:
    Example: Crawl-delay: 100
    Limits the readout speed. With this example, a crawler may open a new page for reading only every 100 seconds.

Sitemap:
    Example: Sitemap: https://example.com/sitemap.xml
    Tells crawlers where the sitemap can be found. This only works for crawlers that support it, such as Googlebot, Yahoo! Slurp and msnbot.
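Python's standard urllib.robotparser can read these statements back out, which is a handy way to check a file. A small sketch (the sitemap URL is only an illustrative example; note that Crawl-delay is a nonstandard extension that not all crawlers honor, and site_maps() requires Python 3.8+):

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Crawl-delay: 100
Sitemap: https://example.com/sitemap.xml
"""

rp = RobotFileParser()
rp.parse(RULES.splitlines())  # parse from a string instead of fetching

print(rp.crawl_delay("*"))  # 100
print(rp.site_maps())       # ['https://example.com/sitemap.xml']
```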


User-agent: Googlebot
Allow: /public/
Disallow: /not-google.html

User-agent: *
Disallow: /images/
Disallow: /privates/
Disallow: /intern_file.html

#I am a one-row comment
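You can verify how a crawler would interpret rules like the ones above with Python's standard urllib.robotparser. A minimal sketch: Googlebot follows only its own block, every other crawler falls back to the User-agent: * block.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /public/
Disallow: /not-google.html

User-agent: *
Disallow: /images/
Disallow: /privates/
Disallow: /intern_file.html
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Googlebot uses its own block: /not-google.html is off-limits,
# but /images/ is not mentioned there, so it stays readable
print(rp.can_fetch("Googlebot", "/not-google.html"))   # False
print(rp.can_fetch("Googlebot", "/images/logo.png"))   # True

# any other crawler falls back to the User-agent: * block
print(rp.can_fetch("SomeOtherBot", "/images/logo.png"))  # False
print(rp.can_fetch("SomeOtherBot", "/index.html"))       # True
```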

More Information

You'll find more information at

About the author

Ilia Raiskin

Ilia Raiskin is a web designer, web developer, blogger and founder of