Robots.txt - Protect Web Servers From Search Indexing
Internet search services such as Google can be very comprehensive, indexing servers down to the last file and often offering cached copies of the indexed files. The Ohio State University also has its own local Google search appliance. Because some documents stored on OSU web servers are sensitive, this indexing is not always desirable. To discourage it, you can use a 'robots.txt' file, which tells robots (also known as spiders, or search engine indexing programs) which parts of a site they should and shouldn't index. Well-behaved robots honor these instructions, but the file itself does not restrict access to the documents.
The most comprehensive protection is also the most straightforward. Create a file named 'robots.txt' and place the following lines in it:
User-agent: *
Disallow: /
Save it as a plain text file in the root directory of your web server, so that it is served at /robots.txt. If you want some private sections blocked and others left searchable, you can write more granular rules. You can find examples here:
The Web Robots FAQ
For an overview and more information, visit www.robotstxt.org
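Before deploying a robots.txt file, it can help to check how a compliant robot would read it. The following is a minimal sketch using Python's standard urllib.robotparser module. It tests the block-everything rules shown above alongside a more granular rule set. The directory names (/private/, /drafts/, /public/) and the robot name "ExampleBot" are illustrative assumptions, not paths or crawlers from any actual OSU server.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# The block-everything rules from this article.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# A more granular rule set. The directory names are hypothetical examples.
GRANULAR = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/
"""

block_all = build_parser(BLOCK_ALL)
granular = build_parser(GRANULAR)

# "ExampleBot" is a made-up robot name; the "*" rules apply to any robot.
print(block_all.can_fetch("ExampleBot", "/index.html"))          # False
print(granular.can_fetch("ExampleBot", "/private/report.pdf"))   # False
print(granular.can_fetch("ExampleBot", "/public/index.html"))    # True
```

Each Disallow line matches by path prefix, so anything under a listed directory is excluded and everything else remains open to indexing.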
Current Record: 3211
Create Date: 10-27-2006
Last Reviewed: 09-06-2007
