Robots.txt file basics
When a web crawler visits a website, it first reads a file named
robots.txt, which is located at the root of the domain. In this file you can specify whether and how crawlers may access the site, which makes it possible to deny specific search engines access to your website. Note that
robots.txt cannot be used to keep files secret, because anyone can still open them directly in a browser.
Some search engines even list such "hidden" pages in their search results, just without descriptions.
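For example, a minimal robots.txt that shuts one specific crawler out of the whole site could look like this (the name "BadBot" is just a placeholder):

```
User-agent: BadBot
Disallow: /
```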
The first line of a rule block names the crawler, or User-agent, that the following rules apply to. There is no limit on the number of such
"blocks". A crawler stops reading the file after it has processed a block that starts with
User-agent: *, so blocks for specific crawlers must be placed at the beginning of the file, as in the sketch below.
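A sketch of this structure, with the block for a specific crawler placed before the catch-all block ("ExampleBot" is a placeholder name):

```
# Rules for one specific crawler come first:
User-agent: ExampleBot
Disallow: /no-examplebot/

# The catch-all block for all other crawlers comes last:
User-agent: *
Disallow: /privates/
```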
You can add one-row comments to a robots.txt file. They simply start with
#. One-row comments
can be helpful for documenting your settings in the robots.txt. Crawlers ignore them.
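For instance:

```
# Keep all crawlers out of the image directory
User-agent: *
Disallow: /images/
```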
|Directive|Description|
|---|---|
|User-agent|Specifies the web crawler the following rules apply to.|
|Disallow|Forbids reading of the listed files or directories.|
|Allow|Permits reading of the listed files or directories.|
|Crawl-delay|Sets the readout speed. With `Crawl-delay: 100`, a new page may be opened for reading only every 100 seconds.|
|Sitemap|States the URL where the sitemap can be found. This only works for the following crawlers: Googlebot, Yahoo! Slurp, msnbot, Ask.com.|
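As an illustration, a block that slows crawlers down and points them to a sitemap could look like this (the sitemap URL is just a placeholder):

```
User-agent: *
Crawl-delay: 100
Sitemap: http://www.example.com/sitemap.xml
```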
Putting it all together, a complete robots.txt might look like this:

```
User-agent: Googlebot
Allow: /public/
Disallow: /not-google.html

User-agent: *
Disallow: /images/
Disallow: /privates/
Disallow: /intern_file.html
# I am a one-row comment
```
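If you want to check how these rules are interpreted, Python's standard library ships a robots.txt parser. The following sketch feeds it the example above and tests a few URLs (the host www.example.com is just a placeholder):

```python
from urllib.robotparser import RobotFileParser

# The example rules from above, as individual lines.
rules = """\
User-agent: Googlebot
Allow: /public/
Disallow: /not-google.html

User-agent: *
Disallow: /images/
Disallow: /privates/
Disallow: /intern_file.html
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# Googlebot may read /public/, but not the page blocked for it.
print(parser.can_fetch("Googlebot", "http://www.example.com/public/index.html"))   # True
print(parser.can_fetch("Googlebot", "http://www.example.com/not-google.html"))     # False

# Every other crawler falls under the "User-agent: *" block.
print(parser.can_fetch("SomeOtherBot", "http://www.example.com/images/logo.png"))  # False
```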
You'll find more information at http://www.robotstxt.org