Member "checkbot-1.80/README" (15 Oct 2008, 6101 Bytes) of package /linux/www/old/checkbot-1.80.tar.gz:
Checkbot -- a WWW link verifier

Checkbot is a perl5 script which can verify links within a region of
the World Wide Web. It checks all pages within an identified region,
and all links within that region. After checking all links within the
region, it will also check all links which point outside of the
region, and then stop.
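
The behaviour described above depends on deciding whether a link lies
inside or outside of the region. The following is a minimal shell
sketch of that decision, assuming the region is a plain URL prefix.
Checkbot itself is a Perl script and its own matching may well differ;
the function name classify_link and the example.org URLs are made up
for illustration.

```shell
#!/bin/sh
# Hypothetical helper, not part of Checkbot: classify a link as
# "internal" (inside the region) or "external" (outside of it).
# Assumption: the region is given as a plain URL prefix.
classify_link() {
    region=$1
    link=$2
    case "$link" in
        "$region"*) echo internal ;;
        *)          echo external ;;
    esac
}

# Print the classification for a few sample links.
for link in \
    http://www.example.org/docs/index.html \
    http://www.example.org/docs/faq.html \
    http://other.example.net/page.html
do
    printf '%s %s\n' \
        "$(classify_link http://www.example.org/docs/ "$link")" "$link"
done
```

In this sketch the first two sample links are reported as internal and
the third as external.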

Checkbot regularly writes reports on its findings, including all
servers found in the region, and all links with problems on those
servers.

Checkbot was originally written to check a number of servers at
once. This influenced some design decisions, so you might want to
keep that in mind when making suggestions. Speaking of which, be sure
to check the to-do file on the website for things which have been
suggested for Checkbot.

INSTALLATION

Making and installing Checkbot is easy:

    perl Makefile.PL
    make
    make install

You will need to have the following Perl modules installed in order to
properly install Checkbot:

    LWP
    URI
    HTML::Parser
    MIME::Base64
    Net::FTP
    Mail::Send (optional, contained in the MailTools package)
    Time::Duration (optional, used for additional info in report)
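
One way to see which of the modules listed above are already available
is to ask perl to load each of them. This is only a sketch: the
function name check_perl_modules is made up, and it assumes that the
perl found on your PATH is the one Checkbot will run under.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with Checkbot: report which of the
# given Perl modules can be loaded by the perl found on PATH.
check_perl_modules() {
    missing=0
    for module in "$@"; do
        if perl -M"$module" -e 1 >/dev/null 2>&1; then
            echo "ok       $module"
        else
            echo "missing  $module"
            missing=1
        fi
    done
    return "$missing"
}

# Required modules; a non-zero status means at least one is missing.
check_perl_modules LWP URI HTML::Parser MIME::Base64 Net::FTP ||
    echo "Install the missing modules before running 'perl Makefile.PL'."

# Optional modules; these are only reported, never treated as an error.
check_perl_modules Mail::Send Time::Duration || true
```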


WHERE TO FIND IT

Checkbot is distributed at: http://degraaff.org/checkbot/

Problems, bug reports, and feature enhancements are welcome at
http://sourceforge.net/projects/checkbot/

There is an announcement mailing list to which announcements of new
versions are posted. You can sign up for the list at
https://lists.sourceforge.net/lists/listinfo/checkbot-announce

Hans de Graaff <hans@degraaff.org>


RECENT CHANGES

Changes in version 1.80 (15-Oct-2008)

  * Fix handling of nofollow robots tag.
  * Require newer version of LWP for better handling of character
    encodings.
  * Ignore mms scheme.
  * Minor clarification in output.

Changes in version 1.79 (3-Feb-2007)

  * Correctly parse documents to avoid problems with UTF-8
    documents. This avoids the "Parsing of undecoded UTF-8 will give
    garbage when decoding entities" messages.
  * Allow regular expressions in the suppression file, and complain if
    the suppression file is not a proper file.
  * More robust handling of HTTP and FTP servers that have problems
    responding to HEAD requests.
  * Use the original URL to report problems.
  * Ensure XHTML compliance.
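
Several entries in this list mention servers that do not answer HEAD
requests properly. As a rough illustration of the general idea only,
the sketch below tries HEAD first and retries with GET. Checkbot does
its checking in Perl with LWP, so this is not its implementation; the
use of curl, the function name http_status, and the rule "retry with
GET on anything other than a 2xx or 3xx status" are all assumptions.

```shell
#!/bin/sh
# Illustration only, not Checkbot code: print the HTTP status for a
# URL, trying HEAD first and falling back to GET when HEAD fails or
# returns something other than a 2xx or 3xx status.
http_status() {
    url=$1
    # -I sends a HEAD request; -w prints just the numeric status code.
    status=$(curl -s -o /dev/null -I -w '%{http_code}' "$url") || status=000
    case "$status" in
        2??|3??) ;;   # HEAD worked, keep its answer
        *) status=$(curl -s -o /dev/null -w '%{http_code}' "$url") || status=000 ;;
    esac
    echo "$status"
}

# Check the URL given on the command line, if any.
if [ "$#" -gt 0 ]; then
    http_status "$1"
fi
```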

Changes in version 1.78 (3-May-2006)

  * Don't throw errors for links that cannot be expected to be valid
    all the time (e.g. the classid attribute of an object element)
  * Better fallbacks for some cases where the HEAD request does not
    work
  * Add more classes and ids to allow more styling of results pages
    (including example CSS file)
  * Ensure XHTML compliance
  * Better checks for optional dependencies

Changes in version 1.77 (28-Jul-2005)

  * Fix silly build-related problem that prevented checkbot 1.76 from
    running at all.
  * Check for presence of robots meta tag and act on it.

Changes in version 1.76 (25-Jul-2005)

  * Error reports now include the page title for easier identification.
  * javascript: links are now ignored because they cannot be checked.
  * Documentation updates.

Changes in version 1.75 (22-Apr-2004)

  * New --cookies option to accept cookies from servers while checking.
  * New --noproxy option indicates which domains should not be
    passed through the proxy.
  * New error code for unknown domains; only known non-checkable
    schemes are ignored now.
  * Minor bug fixes.
  * Documentation updates.

Changes in version 1.74 (17-Dec-2003)

  * New --suppress option allows Response code/URL combinations not
    to be reported as problems.
  * Checkbot warnings are now handled as pseudo-HTTP status messages
    so that they can make use of all Checkbot features such as
    --dontwarn.
  * Option --allow-simple-hosts is deprecated due to this change.
  * More robust handling of (lack of) status messages.
  * Checkbot now requires LWP 5.70 due to bugfixes in this release,
    although it should still also work with older LWP versions.
  * Documentation fixes.

Changes in version 1.73 (31-Aug-2003)

  * Checkbot now tries to produce valid XHTML 1.1
  * URLs matching the --ignore option are now completely ignored;
    they used to be checked but not reported.
  * Proxy support works again, but --proxy now applies to all links
  * Documentation fixes

Changes in version 1.72 (04-May-2003)

  * URLs with query strings are now checked by default; the
    --exclude option can be used to revert to the previous behavior
  * The server results page contains shortcut links to each section
  * Removed warning for unqualified hostnames for news: URLs
  * Handling of signals such as SIGINT
  * Bug and documentation fixes

Changes in version 1.71 (29-Dec-2002)

  * New --filter option allows rewriting of URLs before they will be checked
  * Problematic links are now reported for each page on which they occur
  * New statistics which should work correctly
  * Much simplified storage of information on problem links
  * Duplicate links are now properly detected and not checked twice
  * Rewritten internals for link checking; as a consequence, internal
    and external links are checked at the same time now, not in two
    passes like before
  * Rewritten internals for message output
  * A simple test case for 'make test'
  * Minor cleanups of the code

Version 1.70 was only released for testing purposes

Changes in version 1.69

  * Improved makefile and packaging
  * Better default for --match argument
  * Additional instance of using GET instead of HEAD added
  * Bug fixes in printing of web server feedback
160 Changes in version 1.68
161
162 * Add --allow-simple-hosts which doesn't check for unqualified hosts
163 * Mention --style option in help and added example style file
164 * Change --sleep implementation so that fractional seconds can be used
165 * Fix a bug with handling <base> tags
166 * Tighten checks for http and https schemes
167 * Remove harmless warnings