Posted on: 5/17/2011 5:47:33 AM EST
Control access of web crawlers to your site: robots.txt, Googlebot, spam bots
There are many reasons to control how web robots, or crawlers, access your site. As much as you want Googlebot to visit your site, you don't want spam bots to come and collect your private information. This post describes how to control web robots' access to your site with a simple 'robots.txt' file.
What are web robots or spiders?
Web robots (also known as bots, web spiders, web crawlers, or ants) are programs that traverse the Internet automatically. Search engines use web crawlers to index web pages so that their search results stay up to date.
Why use a 'robots.txt' file?
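Googlebot crawls your site to provide better search results. However, other bots, such as spam bots, may be collecting personal information like email addresses for spamming purposes. Use the 'robots.txt' file to control which web crawlers can access which parts of your site. Keep in mind that robots.txt is a voluntary convention: well-behaved crawlers such as Googlebot honor it, but malicious bots can simply ignore it, so it should not be your only protection for private information.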
Create the 'robots.txt' file using any text editor.
A simple robots.txt file uses the following three fields:
User-agent: the web robot that the rules below it apply to.
Disallow: the URL path you want to block the robot from accessing.
Allow: the URL path you want to allow the robot to access.
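To illustrate how the three fields fit together, here is a minimal Python sketch that groups the Allow and Disallow rules under the User-agent lines they follow. `parse_groups` is a hypothetical helper written for this post, not part of any library; it is deliberately simplified (it strips '#' comments and skips unknown fields, but does not handle wildcards or other extensions). For real crawling code, Python's standard `urllib.robotparser` module is the safer choice.

```python
from __future__ import annotations


def parse_groups(text: str) -> dict[str, list[tuple[str, str]]]:
    """Hypothetical helper: map each User-agent to its (field, path) rules."""
    groups: dict[str, list[tuple[str, str]]] = {}
    current_agents: list[str] = []
    last_was_agent = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not last_was_agent:
                current_agents = []  # a User-agent after rules starts a new group
            current_agents.append(value)
            groups.setdefault(value, [])
            last_was_agent = True
        elif field in ("allow", "disallow"):
            # Consecutive User-agent lines share the rules that follow them.
            for agent in current_agents:
                groups[agent].append((field, value))
            last_was_agent = False
    return groups


example = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /
"""
print(parse_groups(example))
# {'*': [('disallow', '/')], 'Googlebot': [('allow', '/')]}
```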
Examples
The following will stop ALL robots from crawling your site ('*' means all robots and '/' is the root directory):
User-agent: *
Disallow: /
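You can check a rule set before uploading it. The sketch below uses Python's standard `urllib.robotparser` module to parse the "block everything" file from a string and ask whether a few crawlers may fetch pages. The agent name `SomeSpamBot` and the URLs are placeholders. The parser only reports what the rules say; it cannot force a bot to obey them.

```python
from urllib.robotparser import RobotFileParser

# The "block all robots" file, one field per line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Every user agent is refused every path, including the home page.
for agent in ("Googlebot", "Bingbot", "SomeSpamBot"):
    print(agent, parser.can_fetch(agent, "https://www.yoursite.com/"))  # False each time
```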
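The following will stop all robots from crawling the '/private' directory. Rules match by prefix, so this also blocks paths such as '/private-notes.html'; use 'Disallow: /private/' to block only the contents of the directory.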
User-agent: *
Disallow: /private
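The prefix behavior can be seen with the same standard-library parser. In this sketch, `allowed` is a hypothetical helper and `ExampleBot` a placeholder agent name; it compares `Disallow: /private` with `Disallow: /private/`.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, url: str, agent: str = "ExampleBot") -> bool:
    """Hypothetical helper: may `agent` fetch `url` under these rules?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


PREFIX_RULE = "User-agent: *\nDisallow: /private\n"  # any path starting with /private
DIR_RULE = "User-agent: *\nDisallow: /private/\n"    # only paths inside /private/

site = "https://www.yoursite.com"
print(allowed(PREFIX_RULE, site + "/private-notes.html"))  # False: prefix match
print(allowed(DIR_RULE, site + "/private-notes.html"))     # True: not inside /private/
print(allowed(DIR_RULE, site + "/private/report.html"))    # False
```

Python's parser is a simplified implementation, so treat it as a quick sanity check rather than an exact reproduction of how every search engine reads the file.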
The following stops Googlebot-Image from crawling your images for Google Image Search. Use this to save bandwidth if you don't want your images to be available in Google Image Search.
User-agent: Googlebot-Image
Disallow: /
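A sketch, again using Python's standard `urllib.robotparser`, showing that a group for Googlebot-Image does not affect the main Googlebot crawler: since no group names Googlebot and there is no '*' group, Googlebot stays allowed everywhere. The image URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

image_url = "https://www.yoursite.com/images/logo.png"
print(parser.can_fetch("Googlebot-Image", image_url))  # False: image crawler blocked
print(parser.can_fetch("Googlebot", image_url))        # True: no group applies to it
```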
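The following will block all robots from crawling your site except Googlebot. A robot follows the group that names it most specifically, so Googlebot obeys its own group and ignores the '*' group: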
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /
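The sketch below checks the two-group file with Python's standard `urllib.robotparser`, assuming each field is written on its own line with a blank line between groups. `SomeSpamBot` is a placeholder name for any other crawler.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://www.yoursite.com/page.html"
for agent in ("Googlebot", "SomeSpamBot"):
    # Googlebot matches its own group; every other agent falls back to '*'.
    print(agent, parser.can_fetch(agent, url))
```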
Put the robots.txt file in the root directory of your website. For example, the file should be reachable at www.yoursite.com/robots.txt, not in a sub-directory such as www.yoursite.com/sub-directory/robots.txt, because crawlers only look for it at the root. In most cases the root directory is the 'public_html' directory of your site.
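Crawlers look for the file only at the root of the host, whatever page they start from. The hypothetical `robots_url` helper below, built on Python's standard `urllib.parse` functions, derives that location from any page URL by keeping only the scheme and host.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Hypothetical helper: the robots.txt location for the host serving page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host (including any port); replace path, query, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://www.yoursite.com/sub-directory/page.html"))
# https://www.yoursite.com/robots.txt
```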