Web spiders are software agents that traverse the Internet gathering, filtering, and potentially aggregating information for a user. Using common scripting languages and their collections of Web modules, you can easily develop a Web spider yourself. Spiders make searching the Internet efficient and easy: they crawl pages on the Web, return their content, and index it. What a spider looks for in a page is relevant text content. A spider bot can only scan text and follow links, so images and graphics in a Web page mean nothing to a search engine bot's index.
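As a rough illustration of that text-and-links-only view, here is a minimal sketch in Python using only the standard library. It is not a production crawler; it just shows the two things a spider extracts from a page: the text it could index and the hrefs it could follow next (the `PageScanner`/`scan`/`crawl` names are mine, for illustration):

```python
from html.parser import HTMLParser
from urllib.request import urlopen

# What a spider "sees": only text and links. Tags such as <img> produce
# no indexable data, which is why graphics mean nothing to the bot.
class PageScanner(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []   # hrefs the spider could crawl next
        self.text = []    # text content the engine could index

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())

def scan(html):
    """Return (links, text) extracted from one page of HTML."""
    scanner = PageScanner()
    scanner.feed(html)
    return scanner.links, scanner.text

def crawl(url):
    """Fetch one URL and scan it; real spiders loop this over the links."""
    with urlopen(url) as resp:
        return scan(resp.read().decode("utf-8", errors="replace"))
```

Note that the parsing step is kept separate from the network fetch, so the interesting part can be tested on any HTML string.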
Spider detection:
Google and Yahoo spiders can be recognized by their User-Agent string. After the User-Agent has been detected, the next step is to check the IP address. Only if both match can you be sure that this is a real search engine spider. That is why it is good to know the IP ranges of the search engine spiders: anyone can fake a User-Agent string.
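The two-step check above can be sketched like this, assuming DNS lookups via the stdlib `socket` module. The resolver functions are injectable (an assumption I add for testability; a real deployment would use the defaults), and the double reverse-then-forward lookup guards against forged reverse DNS records:

```python
import socket

def is_real_googlebot(ip, user_agent,
                      reverse=lambda ip: socket.gethostbyaddr(ip)[0],
                      forward=socket.gethostbyname):
    # Step 1: the User-Agent must claim to be Googlebot at all.
    if "Googlebot" not in user_agent:
        return False
    try:
        # Step 2: reverse DNS on the IP must land in googlebot.com.
        host = reverse(ip)
        if not host.endswith(".googlebot.com"):
            return False
        # Step 3: forward DNS on that hostname must return the same IP;
        # otherwise the reverse record could have been forged.
        return forward(host) == ip
    except OSError:
        # Lookup failed: treat as not verified.
        return False
```

The same shape works for Yahoo or MSN by swapping in their hostname suffixes from the list below.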
Some IPs and resolved hostnames you can use to detect search engine web spiders:
Google: 66.249.64.* to 66.249.95.* , crawl-66-249-* , *.googlebot.com
Yahoo: 72.30.* , 74.6.* , 67.195.* , 66.196.* , *.crawl.yahoo.net , *.inktomisearch.com
MSN/Live/Bing: 65.54.* , 65.55.* , msnbot.msn.com , *.search.live.com
Fake Google spiders have been spotted coming from 66.249.16.*, which falls outside Google's real 66.249.64.* to 66.249.95.* range.
These IPs are examples only; for better detection use a longer prefix, e.g. 65.55.252.* for MSN, to be sure it is not some other spider. Best is to check WhoIs to get the exact IP range.
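A convenient way to match an address against ranges like these is Python's stdlib `ipaddress` module. The table below is my own illustrative mapping of the wildcards above to CIDR blocks (for example, 66.249.64.0/19 covers exactly 66.249.64.* through 66.249.95.*); verify the blocks against WhoIs before relying on them:

```python
import ipaddress

# Illustrative CIDR blocks derived from the wildcard list above;
# confirm the real, current ranges via WhoIs before using in production.
SPIDER_NETS = {
    "google": ["66.249.64.0/19"],           # 66.249.64.* - 66.249.95.*
    "yahoo":  ["72.30.0.0/16", "74.6.0.0/16",
               "67.195.0.0/16", "66.196.0.0/16"],
    "msn":    ["65.54.0.0/16", "65.55.0.0/16"],
}

def spider_from_ip(ip):
    """Return the engine name whose block contains ip, or None."""
    addr = ipaddress.ip_address(ip)
    for engine, nets in SPIDER_NETS.items():
        if any(addr in ipaddress.ip_network(net) for net in nets):
            return engine
    return None
```

Note that the fake-Googlebot range 66.249.16.* mentioned above falls outside 66.249.64.0/19, so this check rejects it.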
Credits to: