These notes cover part of the webbot command line syntax, taken from the W3C documentation.
Robots.txt and HTML META tags
There are situations where you may not want the robot to behave as a robot but rather as a link checker, in which case you may consider using these options:
-norobotstxt
If for some reason you don't want the robot to check for a robots.txt file, add this command line option.
-nometatags
If for some reason you don't want the robot to check for robots-related HTML META tags, add this command line option.
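As an illustration of combining the two options above, here is a minimal Python sketch that builds (but does not run) a link-checker style invocation. The executable name `webbot`, the placement of the start URI after the options, and the helper name `link_checker_command` are assumptions made for this example; only the `-norobotstxt` and `-nometatags` options come from the text above.

```python
import shlex


def link_checker_command(start_uri, check_robots_txt=False, check_meta_tags=False):
    """Build an argument list for running the robot as a link checker.

    Hypothetical helper: it only assembles the arguments, it does not
    start any process.
    """
    args = ["webbot"]  # assumption: the robot executable is called "webbot"
    if not check_robots_txt:
        args.append("-norobotstxt")  # skip the robots.txt check
    if not check_meta_tags:
        args.append("-nometatags")  # skip the robots-related META tag check
    args.append(start_uri)  # assumption: the start URI follows the options
    return args


command = link_checker_command("http://www.example.com/")
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run` once the actual executable name and argument order have been confirmed for your installation.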
Distribution and Statistics Features
Note that if you are using SQL-based logging, a much larger set of statistics can be drawn directly from the database.
-charset [ file ]
Specifies a log file of which charsets (content type parameter) were encountered in the run and their distribution.
-format [ file ]
Specifies a log file of which media types (content types) were encountered in the run and their distribution.
-hit [ file ]
Specifies a log file of URIs sorted by how many times they were referenced in the run.
-lm [ file ]
Specifies a log file of URIs sorted by last modified date. This gives a good overview of the dynamics of the web site that you are checking.
-rellog [ file ]
Specifies a log file of any link relationship found in the HTML LINK tag (either the REL or the REV attribute) that has the relation specified in the -relation parameter (all relations are modelled by libwww as "forward"). For example, "-rellog stylesheets-logfile.txt -relation stylesheet" will produce a log file of all link relationships of type "stylesheet". The format of the log file is
"from-URI --> to-URI"
meaning that the from-URI has the forward relationship with to-URI.
-title [ file ]
Specifies a log file of URIs sorted by any title found either in an HTTP header or in the HTML.
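To show how the statistics options above might be combined in one run, here is a small Python sketch that assembles the argument list. As before, the executable name `webbot`, the position of the start URI, the helper name `statistics_command`, and the log file names are assumptions for illustration; the option names are the ones listed above.

```python
import shlex

# Statistics options described above; each one takes a log file name.
STATISTICS_OPTIONS = ("-charset", "-format", "-hit", "-lm", "-title")


def statistics_command(start_uri, log_files, rellog=None, relation=None):
    """Build an argument list that enables the requested statistics logs.

    Hypothetical helper: log_files maps an option name such as "-hit"
    to the log file it should write to.
    """
    args = ["webbot"]  # assumption: the robot executable is called "webbot"
    for option in STATISTICS_OPTIONS:
        if option in log_files:
            args += [option, log_files[option]]
    if rellog is not None:
        args += ["-rellog", rellog]
        if relation is not None:
            # -relation selects which link relationship ends up in the rellog file
            args += ["-relation", relation]
    args.append(start_uri)  # assumption: the start URI follows the options
    return args


stats = statistics_command(
    "http://www.example.com/",
    {"-hit": "hits.txt", "-lm": "last-modified.txt"},  # example file names
    rellog="stylesheets-logfile.txt",
    relation="stylesheet",
)
print(shlex.join(stats))
```

Keeping the options in a fixed tuple makes the generated command line deterministic regardless of the order in which the log files were requested.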
- References
- Henrik Frystyk Nielsen, Webbot Command Line Syntax, retrieved on 04/05/1999
- URL : http://www.w3.org/Robot/User/CommandLine.html