"Fossies" - the Fresh Open Source Software Archive 
Member "websec-1.9.0/ignore.list" (31 May 2003, 3797 Bytes) of package /linux/www/old/websec-1.9.0.tar.gz:
[General]
all rights reserved
an error occurred
click here
comments
copyright
daily articles for
details
discussion forum
downloads
in issues
last modified
last updated
maintained
posted
posted at
previous cartoon
search by
special offer
the current week
total votes
visits
votes
copyright

[Date_Time]
\d+ Jan(uary)? \d+
\d+ Feb(ruary)? \d+
\d+ Mar(ch)? \d+
\d+ Apr(il)? \d+
\d+ May \d+
\d+ June? \d+
\d+ July? \d+
\d+ Aug(ust)? \d+
\d+ Sep(tember)? \d+
\d+ Oct(ober)? \d+
\d+ Nov(ember)? \d+
\d+ Dec(ember)? \d+
# 28-03-2005 28/03/2005 28.3.2005 2005-03-28
\d+[\/\-.]\d+[\/\-.]\d+
# 02:24 PST
\d{2}:\d{2} [A-Z]{3}

[Adverts]
http://www.news.com/cgi-bin/acc_clickthru
http://ads2.zdnet.com/adverts/
http://doublclick4.net

[VIM]
[\d,]+ scripts, [\d,]+ downloads
[\d,]+ tips, [\d,]+ tip views

[cvsweb]
\d+ (years?|months?|weeks?|days?|hours?|minutes?)

[Slashdot]
\d+ of \d+

__END__

=head1 NAME

ignore.list - websec URL monitoring configuration

=head1 DESCRIPTION

=head2 IGNORE KEYWORDS

When determining which parts of a web page have changed, you may want to
skip paragraphs that contain certain predefined words. For example, pages
like InfoWorld, PC Magazine and PC Week often contain the current date/time
regardless of whether there is new or changed content. In such cases, you
can use IGNORE KEYWORDS to skip the paragraphs that contain date/time
information.

Ignore keywords are stored in a file called "ignore.list" in the same
directory as websec. Like the URL list, the ignore keywords are partitioned
into different sections. Each section has a user-defined name. An example is
shown below:

  [General]
  all rights reserved
  an error occurred
  click here
  comments
  copyright

  [Date_Time]
  January\s+\d{1,2}
  February\s+\d{1,2}
  March\s+\d{1,2}
  April\s+\d{1,2}
  May\s+\d{1,2}

In the example above, there are two sections: "General" and "Date_Time".
You can use them in the URL list as follows:

  Ignore = General

You can also use multiple sections at one go:

  Ignore = General,Date_Time

If you use certain ignore keywords regularly, you might want to add them to
a defaults section in the URL list.

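The sectioned layout described above can be sketched in a few lines of
Python. This is only an illustration, not websec's own parser (websec itself
is a Perl program). The names parse_ignore_list and select_patterns are
hypothetical, and the treatment of "#" lines as comments, of "__END__" as the
end of the data, and of whole-line "[Name]" headers is an assumption based on
how the supplied file is laid out.

```python
import re

# Hypothetical sample in the layout this page describes.
SAMPLE = r"""
[General]
all rights reserved
click here

[Date_Time]
# 02:24 PST
\d{2}:\d{2} [A-Z]{3}
January\s+\d{1,2}

__END__
=head1 NAME
"""

# Assumption: a header is a whole line of the form [Name], where Name is
# made of word characters. This keeps patterns such as "[\d,]+ tips"
# from being mistaken for section headers.
SECTION_RE = re.compile(r"^\[(\w+)\]$")


def parse_ignore_list(text):
    """Return a dict mapping each section name to its list of patterns."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "__END__":
            break  # assumed: everything after this marker is documentation
        if not line or line.startswith("#"):
            continue  # assumed: blank lines and "#" lines carry no pattern
        header = SECTION_RE.match(line)
        if header:
            current = header.group(1)
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
    return sections


def select_patterns(sections, ignore_value):
    """Collect the patterns for a value such as 'General,Date_Time'."""
    patterns = []
    for name in ignore_value.split(","):
        patterns.extend(sections.get(name.strip(), []))
    return patterns


sections = parse_ignore_list(SAMPLE)
print(select_patterns(sections, "General,Date_Time"))
```

Matching the header against the whole line is a deliberate choice in this
sketch: several supplied patterns begin with "[", so a looser test would
split a section in the wrong place.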
Ignore keywords can contain regular expressions. For example, the ignore
keyword "January\s+\d{1,2}" tells websec to look for the string "January",
followed by one or more whitespace characters, followed by one or two
digits.

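To make the effect of such a keyword concrete, here is a small Python sketch
that drops every paragraph matched by an ignore keyword. It is not websec's
implementation. The function name is_ignored and the sample paragraphs are
made up, and matching case-insensitively anywhere in the paragraph is an
assumption.

```python
import re

# Patterns taken from the Date_Time example above.
keywords = [r"January\s+\d{1,2}", r"February\s+\d{1,2}"]

# Assumption: matching ignores case and may occur anywhere in the paragraph.
compiled = [re.compile(k, re.IGNORECASE) for k in keywords]


def is_ignored(paragraph):
    """Return True if any ignore keyword matches somewhere in the paragraph."""
    return any(rx.search(paragraph) for rx in compiled)


# Hypothetical paragraphs extracted from a monitored page.
paragraphs = [
    "Page generated on January 7 at noon.",
    "New release of the editor is available.",
    "Archive for January.",
]

# Keep only the paragraphs that no ignore keyword matches.
kept = [p for p in paragraphs if not is_ignored(p)]
print(kept)
```

Because the pattern is not anchored, it also matches longer digit runs: in
"January 2005" the "\d{1,2}" part simply matches the first two digits.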
Several sections of ignore keywords are supplied in this distribution.
"General" contains some general ignore keywords which you may want to use.
"Date_Time" contains date/time detectors coded using regular expressions.
Other sections, such as "VIM", "cvsweb" and "Slashdot", are also included.
Feel free to add your own!


=head2 IGNORE URLS

Most advertisements in webpages are of the following form:

  <A HREF="http://page.url.com/advert/cgi-bin/" ...>
  <IMG SRC="advert.animated.gif" ...>
  Click here for free beer!
  </A>

Such advertisements can be ignored by using ignore URLs when running
webdiff.

Ignore URLs are also stored in "ignore.list". They contain all or part of
the URL referred to by the <A HREF> tag which you want to ignore. An example
is shown below:

  [Adverts]
  page.url.com/advert/cgi-bin/

Use the "Adverts" section in the URL list as follows:

  IgnoreURL = Adverts

You can also use multiple sections at one go:

  IgnoreURL = Adverts1,Adverts2

If you use certain ignore URLs regularly, you might want to add them
to a defaults section in the URL list.

Like ignore keywords, ignore URLs can contain regular expressions.

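The following Python sketch shows one way ignore URLs could be applied to
the links of a page. It is an illustration under stated assumptions rather
than a description of what webdiff does internally. The class LinkCollector,
the function is_ignored_url and the example.org link are hypothetical, and
each entry is assumed to be searched for anywhere inside the HREF value.

```python
import re
from html.parser import HTMLParser

# Entry from the "Adverts" example above, treated as a regular expression.
ignore_urls = [r"page.url.com/advert/cgi-bin/"]
compiled = [re.compile(p) for p in ignore_urls]


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def is_ignored_url(href):
    """Return True if any ignore URL pattern matches part of the href."""
    return any(rx.search(href) for rx in compiled)


# Hypothetical page fragment: one advert link and one ordinary link.
html = (
    '<A HREF="http://page.url.com/advert/cgi-bin/">Click here for free beer!</A>'
    '<A HREF="http://example.org/news.html">News</A>'
)

collector = LinkCollector()
collector.feed(html)
kept = [h for h in collector.hrefs if not is_ignored_url(h)]
print(kept)
```

Since entries are regular expressions, an unescaped "." matches any
character. Write "\." where only a literal dot should match.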
An "Adverts" section is supplied in this distribution. Feel free to add your
own!


=head1 SEE ALSO

L<url.list(5)>


=head1 AUTHOR

Baruch Even <websec@ev-en.org> is maintaining this program.

=cut
