17 February 2009

Robots.txt file

robots.txt is a file that tells search engine spiders which pages of a website they may crawl. It must be named robots.txt and placed in the root directory of the site. The file is normally used to keep spiders away from unfinished pages while a website is under development. Many webmasters also use it to keep unwanted bots out, although only well-behaved crawlers obey it. How to create and use a robots.txt file is shown below:

Robots.txt Creation:

To keep all robots out
User-agent: *
Disallow: /

To block a page from all crawlers (replace "page name" with the real path)
User-agent: *
Disallow: /page name/

To block a page from a specific crawler
User-agent: Googlebot
Disallow: /page name/

To block a specific image crawler from the whole site
User-agent: Googlebot-Image
Disallow: /

To allow all robots
User-agent: *
Disallow:

Finally, some crawlers, most notably Google's, now support an additional field called "Allow:".

To disallow all crawlers from your site EXCEPT Google:
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /
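
Before uploading a robots.txt file, it can help to check that the rules do what you expect. The following is a minimal sketch using Python's standard urllib.robotparser module. The helper name `allowed`, the bot name "SomeOtherBot" and the example.com URL are assumptions made up for illustration, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def allowed(rules: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines, so no network access is needed.
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


# The "everyone except Google" rules from above, with a blank line between records.
EXCEPT_GOOGLE = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /
"""

# Hypothetical URL, used only for the check.
PAGE = "http://www.example.com/page.html"

print(allowed(EXCEPT_GOOGLE, "Googlebot", PAGE))     # True
print(allowed(EXCEPT_GOOGLE, "SomeOtherBot", PAGE))  # False
```

The same helper can be pointed at any of the other rule sets above by swapping the text passed in as `rules`.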


"robots" meta tag

If you want a page indexed but do not want any of the links on the page to be followed, you can use the following instead:
<meta name="robots" content="index,nofollow"/>

If you don't want a page indexed but want all links on the page to be followed, you can use the following instead:
<meta name="robots" content="noindex,follow"/>

If you want a page indexed and all the links on the page to be followed, you can use the following instead:
<meta name="robots" content="index,follow"/>

If you want a page neither indexed nor its links followed, you can use the following instead:
<meta name="robots" content="noindex,nofollow"/>

To invite robots to index the page and follow all of its links (same as "index,follow")
<meta name="robots" content="all"/>

To stop robots from indexing the page and following its links (same as "noindex,nofollow")
<meta name="robots" content="none"/>
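
To see how a crawler might read these tags, here is a rough sketch built on Python's standard html.parser module. The names `RobotsMetaParser`, `robots_directives`, `may_index` and `may_follow` are made up for this example, and the sketch assumes the usual default that a page with no robots meta tag may be indexed and followed. Real search engines support more directives than the ones covered here.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html: str) -> set:
    """Return the set of robots directives found in the HTML text."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


def may_index(directives: set) -> bool:
    # "none" is shorthand for "noindex,nofollow".
    return not ({"noindex", "none"} & directives)


def may_follow(directives: set) -> bool:
    return not ({"nofollow", "none"} & directives)


page = '<html><head><meta name="robots" content="noindex,follow"/></head></html>'
found = robots_directives(page)
print(may_index(found), may_follow(found))  # False True
```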