"Fossies" - the Fresh Open Source Software Archive

Member "checkbot-1.80/README" (15 Oct 2008, 6101 Bytes) of package /linux/www/old/checkbot-1.80.tar.gz:


Checkbot -- a WWW link verifier

Checkbot is a perl5 script which can verify links within a region of
the World Wide Web. It checks all pages within an identified region,
and all links within that region. After checking all links within the
region, it will also check all links which point outside of the
region, and then stop.

Checkbot regularly writes reports on its findings, including all
servers found in the region, and all links with problems on those
servers.

Checkbot was written originally to check a number of servers at
once. This has shaped some design decisions, so you might want to
keep that in mind when making suggestions. Speaking of which, be sure
to check the to-do file on the website for things which have been
suggested for Checkbot.

INSTALLATION

Making and installing Checkbot is easy:

    perl Makefile.PL
    make
    make install

You will need to have the following Perl modules installed in order to
properly install Checkbot:

    LWP
    URI
    HTML::Parser
    MIME::Base64
    Net::FTP
    Mail::Send (optional, contained in the MailTools package)
    Time::Duration (optional, used for additional info in report)
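
To see which of these modules are already present before running
perl Makefile.PL, a small shell check along the following lines may
help. This is only a sketch: the function name missing_perl_modules is
made up for this example and is not part of Checkbot, and it assumes a
POSIX shell with perl on the PATH.

```shell
# Hypothetical helper (not part of Checkbot): print every module from the
# argument list that perl cannot load.
missing_perl_modules() {
    for module in "$@"; do
        # "perl -MName -e 1" exits non-zero when the module cannot be loaded.
        if ! perl -M"$module" -e 1 >/dev/null 2>&1; then
            printf '%s\n' "$module"
        fi
    done
}

# Required modules first, then the optional ones named in this README.
missing_perl_modules LWP URI HTML::Parser MIME::Base64 Net::FTP
missing_perl_modules Mail::Send Time::Duration
```

Any module name that is printed still has to be installed. Nothing is
printed when all of the listed modules load.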


WHERE TO FIND IT

Checkbot is distributed at: http://degraaff.org/checkbot/

Problems, bug reports, and feature enhancements are welcome at
http://sourceforge.net/projects/checkbot/

There is an announcement mailing list to which announcements of new
versions are posted. You can sign up for the list at
https://lists.sourceforge.net/lists/listinfo/checkbot-announce

Hans de Graaff <hans@degraaff.org>


RECENT CHANGES

Changes in version 1.80 (15-Oct-2008)

    * Fix handling of nofollow robots tag.
    * Require newer version of LWP for better handling of character
      encodings.
    * Ignore mms scheme.
    * Minor clarification in output.

Changes in version 1.79 (3-Feb-2007)

    * Correctly parse documents to avoid problems with UTF-8
      documents. This avoids the "Parsing of undecoded UTF-8 will give
      garbage when decoding entities" messages.
    * Allow regular expressions in the suppression file, and complain if
      the suppression file is not a proper file.
    * More robust handling of HTTP and FTP servers that have problems
      responding to HEAD requests.
    * Use the original URL to report problems.
    * Ensure XHTML compliance.
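
The HEAD-related item above concerns servers that answer HEAD requests
badly even though the page itself is fine. A common way to cope with
that in a link checker is to retry with GET before reporting a problem.
The shell sketch below illustrates that general idea with curl. It is
not Checkbot's own code (Checkbot is a Perl script built on LWP); the
function name check_link and the choice to retry on any status other
than 2xx or 3xx are assumptions made for this example.

```shell
# Hypothetical helper (not part of Checkbot): print "<status> <url>",
# trying HEAD first and falling back to GET if HEAD does not succeed.
check_link() {
    url=$1
    # -I sends a HEAD request; -w prints only the numeric status code.
    status=$(curl -s -o /dev/null -I -w '%{http_code}' "$url")
    case "$status" in
        2??|3??)
            # HEAD worked, nothing more to do.
            ;;
        *)
            # Some servers reject or mishandle HEAD, so ask again with GET.
            status=$(curl -s -o /dev/null -w '%{http_code}' "$url")
            ;;
    esac
    printf '%s %s\n' "$status" "$url"
}
```

Calling check_link with a URL prints the final status code followed by
the URL, so a link is only treated as a problem if GET fails as well.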

Changes in version 1.78 (3-May-2006)

    * Don't throw errors for links that cannot be expected to be valid
      all the time (e.g. the classid attribute of an object element)
    * Better fallbacks for some cases where the HEAD request does not
      work
    * Add more classes and ids to allow more styling of results pages
      (including example CSS file)
    * Ensure XHTML compliance
    * Better checks for optional dependencies

Changes in version 1.77 (28-Jul-2005)

    * Fix silly build-related problem that prevented checkbot 1.76 from
      running at all.
    * Check for presence of robots meta tag and act on it.

Changes in version 1.76 (25-Jul-2005)

    * Error reports now include the page title for easier identification.
    * javascript: links are now ignored because they cannot be checked.
    * Documentation updates.

Changes in version 1.75 (22-Apr-2004)

    * New --cookies option to accept cookies from servers while checking.
    * New --noproxy option indicates which domains should not be
      passed through the proxy.
    * New error code for unknown domains; only known non-checkable
      schemes are ignored now.
    * Minor bug fixes.
    * Documentation updates.

Changes in version 1.74 (17-Dec-2003)

    * New --suppress option allows Response code/URL combinations not
      to be reported as problems.
    * Checkbot warnings are now handled as pseudo-HTTP status messages
      so that they can make use of all Checkbot features such as
      --dontwarn.
    * Option --allow-simple-hosts is deprecated due to this change.
    * More robust handling of (lack of) status messages.
    * Checkbot now requires LWP 5.70 due to bugfixes in this release,
      although it should still also work with older LWP versions.
    * Documentation fixes.

Changes in version 1.73 (31-Aug-2003)

    * Checkbot now tries to produce valid XHTML 1.1
    * URLs matching the --ignore option are now completely ignored;
      they used to be checked but not reported.
    * Proxy support works again, but --proxy now applies to all links
    * Documentation fixes

Changes in version 1.72 (04-May-2003)

    * URLs with query strings are now checked by default; the
      --exclude option can be used to revert to the previous behavior
    * The server results page contains shortcut links to each section
    * Removed warning for unqualified hostnames for news: URLs
    * Handling of signals such as SIGINT
    * Bug and documentation fixes

Changes in version 1.71 (29-Dec-2002)

    * New --filter option allows rewriting of URLs before they are
      checked
    * Problematic links are now reported for each page on which they occur
    * New statistics which should work correctly
    * Much simplified storage of information on problem links
    * Duplicate links are now properly detected and not checked twice
    * Rewritten internals for link checking; as a consequence, internal
      and external links are now checked at the same time, not in two
      passes like before
    * Rewritten internals for message output
    * A simple test case for 'make test'
    * Minor cleanups of the code

Version 1.70 was only released for testing purposes.

Changes in version 1.69

    * Improved makefile and packaging
    * Better default for --match argument
    * Additional instance of using GET instead of HEAD added
    * Bug fixes in printing of web server feedback

Changes in version 1.68

    * Add --allow-simple-hosts which doesn't check for unqualified hosts
    * Mention --style option in help and added example style file
    * Change --sleep implementation so that fractional seconds can be used
    * Fix a bug with handling <base> tags
    * Tighten checks for http and https schemes
    * Remove harmless warnings
  167     * Remove harmless warnings