What Is A Robots.txt File?
Author: Chuck Lasker
Search engines look at millions of Web pages to come up with search results. They do this with what we call "search engine spiders." This makes sense - spiders crawling around on the Web. But another word for them is "robots" because they are simply unmanned programs gathering data automatically.
In the beginning, these robots spidered every page and every file they could reach on the Web. This caused problems for both the search engines and the people using them. Pages that really weren't worth looking at, such as header files meant to be included in every page on a site, were being spidered and showing up in search results. Have you ever searched on Google and gotten a partial page as a result?
The solution was for Google and other search engines to begin looking for a robots.txt file in the root folder of each site (http://www.mydomain.com/robots.txt) to determine what should and shouldn't be spidered. This convention is called the Robots Exclusion Standard. This simple text file, created with Notepad or another plain text editor, lets you tell robots not to spider certain folders or pages in your site. Keep in mind that it is a request, not a lock: well-behaved robots such as the major search engines honor it, but nothing forces a robot to obey. The result is happier visitors who come to your site from search engines and get only the full pages you want them to see, not the partial, test or script pages you don't.
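To make the "root folder" point concrete, here is a small Python sketch of how a robot might work out where to look for the file, no matter which page of the site it starts from. The function name robots_url and the sample page address are my own, made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt address for the site that hosts page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host: robots.txt is looked up at the site root,
    # never inside a subfolder.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page deep inside the example site.
print(robots_url("http://www.mydomain.com/products/widgets.htm"))
# prints http://www.mydomain.com/robots.txt
```

A robots.txt placed anywhere else, such as inside /products/, would simply never be requested.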
Let's look at some examples to get started:
This allows all spiders to spider all pages on your site. The * is a wildcard that means "all spiders."
User-agent: *
Disallow:
This is the opposite of the above example. This one tells all spiders to NOT spider your whole site. You might want this if you have a test site, for example, that is not live yet.
User-agent: *
Disallow: /
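If you want to see the difference between these first two files without waiting for a real spider, Python's standard library includes a robots.txt reader, urllib.robotparser. The sketch below feeds it both files as text. The robot name "AnyBot", the page address and the helper build_parser are placeholders I made up, and real search engines may read a file slightly differently than this module does.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Load robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# An empty Disallow line blocks nothing.
allow_all = build_parser("User-agent: *\nDisallow:\n")
# A Disallow of "/" blocks every path on the site.
block_all = build_parser("User-agent: *\nDisallow: /\n")

url = "http://www.mydomain.com/index.htm"  # hypothetical page
print(allow_all.can_fetch("AnyBot", url))  # prints True
print(block_all.can_fetch("AnyBot", url))  # prints False
```

The only difference between the two files is a single slash, which is why it pays to type carefully.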
This example tells all robots to stay out of the cgi-bin and images folders.
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
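Here is a quick way to check which pages the folder rules above would keep robots out of, again using Python's urllib.robotparser as a stand-in for a well-behaved spider. The robot name "ExampleBot", the helper allowed and the sample file names are invented for the demonstration.

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())


def allowed(path: str) -> bool:
    """Ask whether a hypothetical robot may fetch this path on the example site."""
    return parser.can_fetch("ExampleBot", "http://www.mydomain.com" + path)


print(allowed("/cgi-bin/form.pl"))   # prints False: inside a blocked folder
print(allowed("/images/logo.gif"))   # prints False: inside a blocked folder
print(allowed("/about.htm"))         # prints True: no rule matches
```

Each Disallow line works as a prefix match, so everything under the listed folder is covered, including subfolders.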
This example tells only the WebFerret robot not to spider the page ferret.htm. It's only an example. I have nothing against WebFerret. The user agent name for Google's robot is Googlebot.
User-agent: WebFerret
Disallow: /ferret.htm
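To see that a rule aimed at one robot leaves the others alone, the sketch below runs the WebFerret rule through Python's urllib.robotparser. The path is written with a leading slash (/ferret.htm), since paths in robots.txt are given relative to the site root. The second page, index.htm, is just a made-up comparison page.

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: WebFerret
Disallow: /ferret.htm
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

page = "http://www.mydomain.com/ferret.htm"
other_page = "http://www.mydomain.com/index.htm"  # hypothetical second page

print(parser.can_fetch("WebFerret", page))        # prints False: the rule names this robot
print(parser.can_fetch("WebFerret", other_page))  # prints True: only ferret.htm is blocked
print(parser.can_fetch("Googlebot", page))        # prints True: no rule applies to Googlebot
```

Because the file has no "User-agent: *" section, any robot other than WebFerret is free to spider everything.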
It is important that the file is a plain text file named robots.txt, so do not use Microsoft Word to create it. And be careful how you type: follow the format of the examples above, with one directive per line, a colon after each field name, and paths beginning with a slash. Paths are case-sensitive, so /Images/ and /images/ are different folders to a robot. A poorly done robots.txt file could harm your site more than help it.
About the author:
An e-commerce consultant for over five years and a Web designer for over ten, Chuck Lasker has been helping individuals and organizations use the Internet in almost every arena. Chuck's latest focus has been training and writing about ecommerce, social media and affiliate marketing.