In one of our first discussions on search engines, I said that robots were the algorithms that indexed your site. In this discussion, I’ll tell you how to keep them off of certain pages.
Did you know that a robot will follow your links to index the site? What would you do if one of your links went to a page that you didn’t want in the search engine index? How would you keep the spiders away from that page?
Remember our discussion of the meta tags? Well, there is another meta tag that helps you get indexed fully or partially. It’s called the “robots” tag. It goes in the head of a page and tells the search engines whether to index that page and whether to follow its links. To keep spiders away from whole groups of pages, you can also put a robots.txt file at the root of your server; spiders look for that file on their own and read its indexing instructions before they crawl. If you were to go to a website that had a robots tag (and most meta tag analyzers recommend that you have one), it would usually say either “index, follow” or some combination of “noindex” and “nofollow”. Here is a very good resource: http://www.searchengineworld.com/robots/robots_tutorial.htm . I suggest this page because it gives you a very good step-by-step layout that you can use to write your own robots.txt file.
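To make the robots tag concrete, here is a minimal sketch in Python that reads the tag out of a page. The sample page, the class name RobotsMetaReader and the function read_robots_meta are my own illustrations, not part of any search engine’s tooling; only the standard html.parser module is used.

```python
from html.parser import HTMLParser

# Hypothetical page that asks spiders not to index it or follow its links.
PAGE = """\
<html>
<head>
<title>Subscribers only</title>
<meta name="robots" content="noindex, nofollow">
</head>
<body>Members area</body>
</html>
"""


class RobotsMetaReader(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if tag == "meta" and name == "robots":
            content = attributes.get("content") or ""
            # Directives are comma separated, e.g. "noindex, nofollow".
            for part in content.split(","):
                if part.strip():
                    self.directives.append(part.strip().lower())


def read_robots_meta(html):
    """Return the list of robots directives declared in the page."""
    reader = RobotsMetaReader()
    reader.feed(html)
    return reader.directives


directives = read_robots_meta(PAGE)
print(directives)
```

A page with no robots tag gives back an empty list, which spiders generally treat the same as “index, follow”.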
It is very interesting to know that the search engine spiders actually have different names (Googlebot, T-Rex, Scooter). That gives you the flexibility to keep one robot out of a certain page or group of pages while keeping another robot out of a completely different set of pages. You end up with a good deal of control over what is indexed by each search engine.
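Here is a rough sketch of what per-robot rules can look like, checked with Python’s standard urllib.robotparser module. The robot names come from the list above; the /drafts/ and /subscribers/ folders and the page names are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: each robot gets its own group of rules.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /drafts/

User-agent: Scooter
Disallow: /subscribers/

User-agent: *
Disallow:
"""


def build_parser(text):
    """Parse robots.txt text that is already in memory."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Googlebot is kept out of /drafts/ but may read /subscribers/.
googlebot_drafts = parser.can_fetch("Googlebot", "/drafts/new-page.html")
googlebot_subscribers = parser.can_fetch("Googlebot", "/subscribers/issue1.html")

# Scooter gets the opposite treatment.
scooter_drafts = parser.can_fetch("Scooter", "/drafts/new-page.html")
scooter_subscribers = parser.can_fetch("Scooter", "/subscribers/issue1.html")

# Any other robot falls under the "*" group, where an empty Disallow allows everything.
other_drafts = parser.can_fetch("SomeOtherBot", "/drafts/new-page.html")

print(googlebot_drafts, googlebot_subscribers)
print(scooter_drafts, scooter_subscribers)
print(other_drafts)
```

Note that a robot with its own group follows only that group, so Googlebot here is not affected by the rule written for Scooter.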
Another feature of the robots file is its ability to tell the robot to slow down. Certain spiders index a site very quickly, almost too quickly, and can bog down your server while they do it. By setting a “speed limit,” so to speak, you can help make sure your site is indexed completely.
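The “speed limit” is usually written as a Crawl-delay line. This is a sketch only: Crawl-delay is not part of the original robots.txt convention, and not every search engine honors it. The 10-second value, the robot name ExampleBot and the helper seconds_between_requests are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking every robot to wait 10 seconds between requests.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def seconds_between_requests(parser, robot_name, default=1):
    """Return the requested delay for a robot, or a default if none is set."""
    delay = parser.crawl_delay(robot_name)
    return default if delay is None else delay


delay = seconds_between_requests(parser, "ExampleBot")
print(delay)
```

A well-behaved robot would pause that many seconds between page requests; one that ignores the line will simply crawl at its own pace.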
I personally tell the robots to index all of my pages because anything that I need to hide is taken care of with passwords. But the situation is different if your site is a work in progress or has pages that are for e-mail subscribers only.
There is some speculation, though, that some of the robots are ignoring webmasters’ requests and indexing everything. Following the robots file is voluntary on the robot’s part, so that is possible. Although the evidence for it is thin, it’s always best to still make the request known.
It’s also not required to have a robots.txt file or to tell the robots to index all pages; it does help, though, to cover all the bases.
Hope that helps. In my next lesson, I’ll talk to you about how to make money from your website through promotional ads and affiliate links.