"Fossies" - the Fresh Open Source Software Archive

Member "websec-1.9.0/ignore.list" (31 May 2003, 3797 Bytes) of package /linux/www/old/websec-1.9.0.tar.gz:



[General]
all rights reserved
an error occurred
click here
comments
copyright
daily articles for
details
discussion forum
downloads
in issues
last modified
last updated
maintained
posted
posted at
previous cartoon
search by
special offer
the current week
total votes
visits
votes

[Date_Time]
\d+ Jan(uary)? \d+
\d+ Feb(ruary)? \d+
\d+ Mar(ch)? \d+
\d+ Apr(il)? \d+
\d+ May \d+
\d+ June? \d+
\d+ July? \d+
\d+ Aug(ust)? \d+
\d+ Sep(tember)? \d+
\d+ Oct(ober)? \d+
\d+ Nov(ember)? \d+
\d+ Dec(ember)? \d+
# 28-03-2005 28/03/2005 28.3.2005 2005-03-28
\d+[\/\-.]\d+[\/\-.]\d+
# 02:24 PST
\d{2}:\d{2} [A-Z]{3}

[Adverts]
http://www.news.com/cgi-bin/acc_clickthru
http://ads2.zdnet.com/adverts/
http://doublclick4.net

[VIM]
[\d,]+ scripts, [\d,]+ downloads
[\d,]+ tips, [\d,]+ tip views

[cvsweb]
\d+ (years?|months?|weeks?|days?|hours?|minutes?)

[Slashdot]
\d+ of \d+

__END__

=head1 NAME

ignore.list - websec url monitoring configuration

=head1 DESCRIPTION

=head2 IGNORE KEYWORDS

When determining which parts of a web page have changed, you may want to
skip paragraphs that contain certain predefined words. For example, pages
such as InfoWorld, PC Magazine and PC Week often show the current date and
time whether or not there is new or changed content. In such cases, you can
use IGNORE KEYWORDS to skip the paragraphs that contain date/time
information.
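As an illustration only, the paragraph-skipping idea can be sketched in Python. This is not websec's own code, which is written in Perl, and both function names are hypothetical. The sketch assumes that pages are already split into paragraphs and that each keyword is matched as a case-insensitive regular expression anywhere in a paragraph.

```python
import re


def drop_ignored_paragraphs(paragraphs, keywords):
    """Keep only the paragraphs that match none of the ignore keywords.

    Assumption: each keyword is a regular expression matched
    case-insensitively anywhere in the paragraph.
    """
    patterns = [re.compile(k, re.IGNORECASE) for k in keywords]
    return [p for p in paragraphs if not any(pat.search(p) for pat in patterns)]


def changed_paragraphs(old, new, keywords):
    """Return paragraphs of the new page that are absent from the old page,
    after ignored paragraphs have been removed from both."""
    seen = set(drop_ignored_paragraphs(old, keywords))
    return [p for p in drop_ignored_paragraphs(new, keywords) if p not in seen]


old_page = ["Welcome to the site.", "Last updated 28 May 2003"]
new_page = ["Welcome to the site.", "Last updated 31 May 2003", "New article online."]

# Only the genuinely new paragraph is reported; the changed date line is ignored.
print(changed_paragraphs(old_page, new_page, ["last updated"]))
# ['New article online.']
```

Without the "last updated" keyword, the date line would also be reported as a change on every visit.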

Ignore keywords are stored in a file called "ignore.list" in the same
directory as websec. Like the URL list, the ignore keywords are partitioned
into sections, each with a user-defined name. An example is shown below:

        [General]
        all rights reserved
        an error occurred
        click here
        comments
        copyright

        [Date_Time]
        January\s+\d{1,2}
        February\s+\d{1,2}
        March\s+\d{1,2}
        April\s+\d{1,2}
        May\s+\d{1,2}

In the example above, there are two sections: "General" and "Date_Time".
You can use them in the URL list as follows:

    Ignore = General

You can also use several sections at once:

    Ignore = General,Date_Time
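A minimal sketch of how the sections, and an Ignore value such as General,Date_Time, could be turned into a single keyword list. The function names are hypothetical. Several details are assumptions about the format, not documented behavior:

- blank lines and lines starting with "#" are skipped, as with the "# 02:24 PST" comments in the shipped file;
- section names consist of word characters;
- parsing stops at __END__.

```python
import re

# A line that is exactly "[Name]"; names are assumed to be word characters,
# so keyword regexes such as "[\d,]+ tips, ..." are not mistaken for headers.
SECTION_RE = re.compile(r"^\[(\w+)\]$")


def parse_ignore_list(text):
    """Parse ignore.list-style text into {section name: [keyword, ...]}.

    Assumed rules: blank lines and '#' lines are skipped, and parsing
    stops at a line reading '__END__' (where the documentation starts).
    """
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "__END__":
            break
        if not line or line.startswith("#"):
            continue
        header = SECTION_RE.match(line)
        if header:
            current = sections.setdefault(header.group(1), [])
        elif current is not None:
            current.append(line)
    return sections


def resolve_ignore(value, sections):
    """Expand a value such as 'General,Date_Time' into one keyword list."""
    keywords = []
    for name in value.split(","):
        keywords.extend(sections.get(name.strip(), []))
    return keywords


sample = r"""[General]
click here
copyright

[Date_Time]
# 02:24 PST
\d{2}:\d{2} [A-Z]{3}
"""

sections = parse_ignore_list(sample)
print(resolve_ignore("General", sections))  # ['click here', 'copyright']
print(len(resolve_ignore("General,Date_Time", sections)))  # 3
```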

If you use certain ignore keywords regularly, you might want to add them to
a defaults section in the URL list.

Ignore keywords can contain regular expressions. For example, the ignore
keyword "January\s+\d{1,2}" tells websec to look for the string "January",
followed by one or more whitespace characters, followed by one or two
digits.
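The example keyword can be tried out with Python's re module, whose syntax agrees with Perl's for these constructs. This sketch assumes the keyword is searched for anywhere in the text rather than anchored. In that case "one or two digits" limits only the match itself, not the text that follows it: "January 2003" also matches, because \d{1,2} matches its first two digits.

```python
import re

# The example keyword, written as a Python raw string.
keyword = re.compile(r"January\s+\d{1,2}")

print(bool(keyword.search("Updated January 5")))  # True: one space, one digit
print(bool(keyword.search("January\t31, 2003")))  # True: \s also matches a tab
print(bool(keyword.search("Jan 5")))              # False: "January" is literal
print(bool(keyword.search("January 2003")))       # True: \d{1,2} matches "20"
```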

Several sections of ignore keywords are supplied in this distribution.
"General" contains some general ignore keywords which you may want to use.
"Date_Time" contains date/time detectors coded using regular expressions.
Feel free to add your own!
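To see what the shipped "Date_Time" detectors catch, a few of them can be run over sample strings. They are copied from the [Date_Time] section at the top of this file. The helper name is hypothetical. Python's re stands in for websec's Perl regex engine; for these ASCII samples the constructs behave the same in both.

```python
import re

# A few of the detectors from the [Date_Time] section of this file.
DATE_TIME = [
    r"\d+ May \d+",
    r"\d+ June? \d+",
    r"\d+[\/\-.]\d+[\/\-.]\d+",
    r"\d{2}:\d{2} [A-Z]{3}",
]


def looks_like_date_time(text):
    """True if any detector matches somewhere in text (hypothetical helper)."""
    return any(re.search(p, text) for p in DATE_TIME)


for sample in ["Posted 31 May 2003", "5 Jun 2003", "28.3.2005",
               "2005-03-28", "02:24 PST", "Issue 12 of 40"]:
    print(sample, looks_like_date_time(sample))
```

The last sample is not caught by these detectors. A string like "12 of 40" is what the site-specific [Slashdot] section targets instead.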


=head2 IGNORE URLS

Many advertisements in web pages take the following form:

        <A HREF="http://page.url.com/advert/cgi-bin/" ...>
        <IMG SRC="advert.animated.gif" ...>
        Click here for free beer!
        </A>

Ignore URLs let you skip such advertisements when running webdiff.
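A rough sketch of the idea: find each <A HREF="...">...</A> block and drop it when its URL matches an ignore URL. The code is illustrative and not taken from webdiff, and the function name is hypothetical. A regular expression stands in for a real HTML parser, which is an assumption that only suits simple, well-formed markup like the advertisement example above.

```python
import re

# <A ...HREF="url"...> ... </A>, case-insensitive, spanning lines.
# A regex is a simplification; it only suits simple, well-formed markup.
ANCHOR_RE = re.compile(r'<a\s[^>]*href="([^"]*)"[^>]*>.*?</a>',
                       re.IGNORECASE | re.DOTALL)


def strip_ignored_links(html, ignore_urls):
    """Remove link blocks whose HREF matches any ignore URL (hypothetical helper)."""
    patterns = [re.compile(u) for u in ignore_urls]

    def drop_if_ignored(match):
        href = match.group(1)
        if any(p.search(href) for p in patterns):
            return ""              # drop the whole <A>...</A> block
        return match.group(0)      # keep other links unchanged

    return ANCHOR_RE.sub(drop_if_ignored, html)


page = '''<p>News</p>
<A HREF="http://page.url.com/advert/cgi-bin/" TARGET="_top">
<IMG SRC="advert.animated.gif">
Click here for free beer!
</A>
<a href="http://example.org/story">Story</a>'''

print(strip_ignored_links(page, ["page.url.com/advert/cgi-bin/"]))
```

Dropping the whole block, rather than only the tag, also removes the advert's image and "Click here" text, so none of it shows up as a change.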

Ignore URLs are also stored in "ignore.list". Each entry is all or part of
the URL in an <A HREF> tag whose link you want to ignore. An example is
shown below:

        [Adverts]
        page.url.com/advert/cgi-bin/

Use the "Adverts" section in the URL list as follows:

    IgnoreURL = Adverts

You can also use several sections at once:

    IgnoreURL = Adverts1,Adverts2

If you use certain ignore URLs regularly, you might want to add them
to a defaults section in the URL list.

Like ignore keywords, ignore URLs can contain regular expressions.
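Because entries are regular expressions, an unescaped "." matches any character. A plain entry such as page.url.com is therefore slightly looser than it looks. Escaping the dots makes it exact, and constructs like \d* let one entry cover several hosts. The zdnet pattern below is a hypothetical example. Python's re.search stands in for websec's matching, which is assumed to be an unanchored search.

```python
import re

# Hypothetical entry: any "ads" or "adsN" host under zdnet.com, dots escaped.
ads = re.compile(r"^http://ads\d*\.zdnet\.com/adverts/")

print(bool(ads.search("http://ads2.zdnet.com/adverts/banner")))  # True
print(bool(ads.search("http://ads.zdnet.com/adverts/banner")))   # True
print(bool(ads.search("http://www.zdnet.com/adverts/banner")))   # False

# Unescaped, each "." matches any character at all.
print(bool(re.search("page.url.com", "pageXurlYcom")))           # True
print(bool(re.search(r"page\.url\.com", "pageXurlYcom")))        # False
```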

An "Adverts" section is supplied in this distribution. Feel free to add your
own!


=head1 SEE ALSO

L<url.list(5)>


=head1 AUTHOR

Baruch Even <websec@ev-en.org> is maintaining this program.

=cut