"Fossies" - the Fresh Open Source Software Archive 
Member "tin-2.6.2/doc/filtering" (24 Aug 2021, 12253 Bytes) of package /linux/misc/tin-2.6.2.tar.xz:
Filtering in tin

0. Status

This is an overview of the new filtering capabilities of tin. This
document will be absorbed in the main documentation at some point.


1. Introduction

Tin's filtering mechanism has changed significantly since version 1.3beta.
Originally there were only two possibilities:

    1) kill an article matching a rule.
    2) hot-select an article matching a rule.

This led to constant confusion, as it seemed important which rule
came first in the filter file, but it wasn't. Then if an article was
selected for whatever reason it couldn't be killed even if it was Craig
Shergold telling you how to make money fast in a crosspost to alt.test.
This binary concept isn't modern anyway, so a much more up-to-date fuzzy
mechanism was necessary: scoring.

When using tin's new scoring mechanism you assign a "score" to each
filter rule. The scores of rules matching the current article are added
and the final score of the article decides if it is regular, marked hot
or killed.
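
To make the summation concrete, here is a small Python sketch. It is not
tin's code: the Rule class, the restriction to subject patterns and the use
of Python's shell-style fnmatch as a stand-in for wildmat are all
assumptions made for illustration.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class Rule:
    """Hypothetical stand-in for one filter rule (subject pattern only)."""
    subj: str   # shell-style pattern, approximating tin's wildmat
    score: int  # weight added when the rule matches


def article_score(subject: str, rules: list[Rule]) -> int:
    """Add up the scores of every rule whose pattern matches the subject."""
    return sum(r.score for r in rules if fnmatchcase(subject, r.subj))


rules = [Rule("*$$$*", -100), Rule("*tin*", 100), Rule("*FAQ*", 50)]
print(article_score("tin FAQ", rules))           # two positive rules match
print(article_score("$$$ tin for sale", rules))  # -100 and +100 cancel out
```

The order of the rules plays no role in the result, which is the point of
the scoring approach.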

The standard "kill" and standard "select" already in your filter-file have
the score "score_kill" and "score_select" respectively (see section 4).


2. Changes to the filter-file format

Tin now understands the additional "score" command in the filter-file.

Old style rule:

    scope=*
    type=0
    case=0
    subj=*$$$*

New style rule:

    group=*
    case=0
    score=-100
    subj=*$$$*
    #####

So you can give each individual rule a weight, based on your opinion
about the rule. E.g. if you want to be sure to never read a certain
individual again, you may give the rule a score of -9000.

If you want only "classical" filtering and don't want to mess around
with score values, you can use the magic words "kill" and "hot" as score
values in your filter file. Example:

    group=*
    case=0
    score=kill
    subj=*$$$*
    #####

These are handled as default values at program initialization time and
may be somewhat easier to remember.
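
The handling of the magic words can be pictured with the following Python
sketch. The function name parse_score is made up for this example, and the
two constants simply use the values recommended in section 4; tin's real
parser may differ in details such as whitespace or case handling.

```python
SCORE_KILL = -100    # recommended default, see section 4
SCORE_SELECT = 100   # recommended default, see section 4


def parse_score(value: str) -> int:
    """Translate the value of a score= line into a number."""
    word = value.strip()
    if word == "kill":
        return SCORE_KILL
    if word == "hot":
        return SCORE_SELECT
    return int(word)  # plain numeric score such as -9000


print(parse_score("kill"), parse_score("hot"), parse_score("-9000"))
```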

You might have noticed from the examples above that tin now inserts a line
of hashes between two rules. This is *not* required; it just improves
readability.


3. Changes in the filter menu

The on-screen filter menu is now more compact and fits easily on small
terminals such as a small xterm or a 640x200 CON: window. It has been
enhanced to allow you to enter a score for the rule you are adding. The
score should be in the range from 1 to SCORE_MAX; otherwise it will default
to "score_select" for select filter rules and "score_kill" for kill filter
rules (see section 4).


4. Internal defaults and config options

There are some constants defined in tin.h and tinrc:

SCORE_MAX is the maximum score an article can reach. Any value
above this is cut to SCORE_MAX; negative scores are limited in the
same way.
    recommended: 10000

"score_kill" is the default score given for any kill rule, if no other
is specified.
    recommended: -100

"score_select" is the default score for any auto-selection rule, if no
other is specified.
    recommended: 100

"score_limit_kill" and "score_limit_select" are the limits that must be
crossed to mark an article as killed or selected.
    recommended when used with values given above: -50/+50.

"score_kill", "score_select", "score_limit_kill" and "score_limit_select"
are config options. You can find them in tin's configuration file
(~/.tin/tinrc). They can also be changed at runtime in the config menu.
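
How these values interact can be sketched in a few lines of Python, using
the recommended numbers from above. The function names are invented for
this example, and whether a score that lands exactly on a limit already
counts as "crossed" is an assumption here, not something this document
states.

```python
SCORE_MAX = 10000
SCORE_LIMIT_KILL = -50
SCORE_LIMIT_SELECT = 50


def clamp(score: int) -> int:
    """Cut a score to the range -SCORE_MAX .. SCORE_MAX."""
    return max(-SCORE_MAX, min(SCORE_MAX, score))


def classify(score: int) -> str:
    """Decide what happens to an article with the given total score."""
    score = clamp(score)
    if score >= SCORE_LIMIT_SELECT:   # assumption: reaching the limit counts
        return "hot"
    if score <= SCORE_LIMIT_KILL:     # assumption: reaching the limit counts
        return "killed"
    return "regular"


for total in (100, 0, -100, 25000):
    print(total, clamp(total), classify(total))
```

With these settings a single default kill rule (-100) is enough to kill an
article, while one kill rule plus one select rule leaves it regular.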


5. Overview of "filter"-commands

Everything here is also described in the file ~/.tin/filter, albeit more
concisely.

All lines are of the form:

    command=value

Valid "command"s are:

add a comment to the following rule:

    comment= a short text

    Multiple comment lines may be used; comment lines _must_ come right
    before the scope selection.

scope selection:

    group=newsgroup_pattern_list

    newsgroup_pattern_list is a comma-separated list of newsgroup_patterns.

    A newsgroup_pattern can be a pattern (wildmat-style) or !pattern,
    negating the match of pattern. This is the same format used for the
    AUTO(UN)SUBSCRIBE environment variables.

    Tin doesn't rework your filter file; the new pattern matching is only
    used when you enter new entries by hand.
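
The following Python sketch shows one way to read such a pattern list. It
is only an approximation: fnmatch stands in for wildmat, the function name
is made up, and the rule that a later matching pattern overrides an earlier
one is an assumption about the evaluation order.

```python
from fnmatch import fnmatchcase


def group_in_scope(group: str, pattern_list: str) -> bool:
    """Return True if group is selected by a comma-separated pattern list."""
    selected = False
    for pattern in pattern_list.split(","):
        pattern = pattern.strip()
        negate = pattern.startswith("!")
        if negate:
            pattern = pattern[1:]
        if pattern and fnmatchcase(group, pattern):
            # Assumption: the last matching pattern decides.
            selected = not negate
    return selected


scope = "news.software.*,!news.software.readers"
print(group_in_scope("news.software.nntp", scope))
print(group_in_scope("news.software.readers", scope))
```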

additional info:

    case=num     num: 0=case sensitive, 1=case insensitive
    score=num    num: score value of rule, can now also be one of the magic
                 words "kill" or "hot", which are equivalent to SCORE_KILL
                 and SCORE_SELECT respectively.
    time=num     num: time_t value; when rule expires. When tin writes the
                 filter file it adds the time in human readable form as a
                 comment in parentheses after the numeric value. When
                 reading the file tin uses _only_ the numeric value, not
                 the human readable form.
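
A short Python sketch of the time= convention: write the number followed by
a readable comment, and read back only the number. The helper names and the
exact layout of the text in parentheses are assumptions; only the "numeric
value first, comment ignored" behaviour is taken from the description
above.

```python
import time


def format_expiry(expires: int) -> str:
    """Build a time= value with a human readable comment appended."""
    return f"{expires} ({time.ctime(expires)})"


def parse_expiry(value: str) -> int:
    """Return the numeric time_t, ignoring anything in parentheses."""
    return int(value.split("(", 1)[0].strip())


line = format_expiry(1700000000)
print(line)
print(parse_expiry(line))
```

Editing the text in parentheses by hand therefore has no effect; only the
number decides when the rule expires.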

matches:                   matched to:

    subj=pattern           Subject:
    from=pattern           From:
                           Tin converts the contents of the From-header to
                           an old-style e-mail address, i.e.
                           "some@body.example (John Doe)" instead of
                           "John Doe <some@body.example>", before trying to
                           match the patterns in the filter rule. That way
                           a rule tailored to match the full from header
                           "jsmith@ac.example (John Smith)" will still work
                           when John posts with a different newsreader
                           which uses "John Smith <jsmith@ac.example>".
    msgid=pattern          Message-Id: *AND* full References:
    msgid_last=pattern     Message-Id: and last References: entry only
    msgid_only=pattern     Message-Id:
    refs_only=pattern      References: line (e.g. <123@example.net>)
                           without Message-Id:
    lines=num              Lines: ; <num matches less than, >num matches
                           more than.
    gnksa=[<>]?NUM         GNKSA parse_from() return code
    xref=pattern           Xref: ; filter crossposts to groups matching
                           pattern
    path=pattern           Path: ; filter server names matching pattern
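
The From: conversion mentioned in the table can be imitated with Python's
standard library. This is a rough equivalent for illustration, not tin's
own parser; the function name oldstyle_from is made up.

```python
from email.utils import parseaddr


def oldstyle_from(header: str) -> str:
    """Render a From: value as 'address (Real Name)'."""
    name, addr = parseaddr(header)
    return f"{addr} ({name})" if name else addr


print(oldstyle_from("John Smith <jsmith@ac.example>"))
print(oldstyle_from("jsmith@ac.example (John Smith)"))
```

Both spellings end up as the same string, so a single from= pattern written
against the old-style form covers them.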

When you are using wildmat pattern-matching, patterns in ~/.tin/filter
should be delimited with "*"; verbatim wildcards in patterns must be
escaped with "\". When using the built-in filter-file functions, tin tries
to take care of this by itself, except when you are entering text in the
built-in kill/hot-menu. There you have to quote manually because tin
doesn't know if e.g. "\[" is already quoted or not.
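
As an illustration of the quoting rule, this Python sketch turns a piece of
literal text into a delimited wildmat pattern. The helper is hypothetical,
and the set of characters treated as special (backslash, "*", "?", "[" and
"]") is an assumption based on common wildmat syntax.

```python
WILDMAT_SPECIAL = "\\*?[]"  # assumed set of wildmat metacharacters


def wildmat_quote(text: str) -> str:
    """Escape verbatim wildcards and delimit the result with '*'."""
    escaped = "".join("\\" + ch if ch in WILDMAT_SPECIAL else ch for ch in text)
    return f"*{escaped}*"


print(wildmat_quote("[OT] why?"))
print(wildmat_quote("$$$"))
```

Feeding already quoted text such as "\[" through the helper would escape the
backslash a second time, which is the ambiguity described above.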

GNKSA return codes: these are the return codes of the From:-address
parser, enabling you to filter on certain kinds of syntactical and
semantical errors present in that header. For an up-to-date list see the
definitions in extern.h and the parser source code in misc.c; the
following is just a short introduction.

0-99: internal codes
    code  error description
       0  no error, valid address
       1  internal error, should not happen (blame me)

100-199: general syntactical errors
    code  error description
     100  left angle bracket ("<") missing in route address
     101  left parenthesis ("(") missing in oldstyle address (realname comment)
     102  right parenthesis (")") missing in oldstyle address (realname comment)
     103  at-sign ("@") missing in mail address
     104  right angle bracket (">") missing in route address

200-299: right hand side (FQDN part) of address, syntax and semantics
    code  error description
     200  right hand side (RHS) of address is a single component
     201  RHS has an unknown top level domain (3 or more characters)
     202  RHS has a malformed top level domain
     203  RHS has an unknown country code as top level domain
     204  illegal character in RHS
     205  leading or trailing dot or two consecutive dots in RHS
     206  RHS has a component longer than 63 characters
     207  RHS has a component with leading or trailing hyphen ("-")
     208  RHS has a component starting with a digit (with ENFORCE_RFC1034 only)
     209  RHS is not a valid IP address
     210  RHS is an IP address from private IP space (see RFC1918) or loopback
     211  brackets ("[", "]") around IP address missing in RHS

300-399: syntactical errors left hand side (localpart) of address
    code  error description
     300  there was no localpart found at all in address
     301  localpart contains illegal characters
     302  localpart has leading, trailing or consecutive dots

400-499: syntactical errors in realname part
    code  error description
     400  illegal character in unquoted word in realname part
     401  illegal character in quoted word in realname part
     402  illegal character in encoded word in realname part
     403  bad syntax in encoded word in realname part
     404  illegal character in oldstyle realname part (one of "()<>\")
     405  illegal character in realname part
     406  missing realname part
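
The code ranges above lend themselves to a small lookup, and the
gnksa=[<>]?NUM syntax to a simple comparison. The Python sketch below shows
both. It is illustrative only: the function names are invented, and reading
a bare number as "equal to" (with "<" and ">" working as for lines=) is an
assumption.

```python
import re

GNKSA_CLASSES = [
    (0, 99, "internal codes"),
    (100, 199, "general syntactical errors"),
    (200, 299, "right hand side (FQDN part) of address"),
    (300, 399, "left hand side (localpart) of address"),
    (400, 499, "realname part"),
]


def gnksa_class(code: int) -> str:
    """Name the range a return code falls into."""
    for low, high, name in GNKSA_CLASSES:
        if low <= code <= high:
            return name
    return "unknown"


def gnksa_matches(spec: str, code: int) -> bool:
    """Check a return code against a gnksa=[<>]?NUM value."""
    m = re.fullmatch(r"([<>]?)(\d+)", spec.strip())
    if m is None:
        raise ValueError(f"not a valid gnksa value: {spec!r}")
    op, num = m.group(1), int(m.group(2))
    if op == "<":
        return code < num
    if op == ">":
        return code > num
    return code == num  # assumption: a bare number means "equal to"


print(gnksa_class(203), gnksa_matches(">0", 203))
```

Under these assumptions a rule with gnksa=>0 would match every article whose
From: header fails the check in any way.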


Path:-filter
    Restrictions - this will only work if:
    - reading from local spool and
      -- without access to local NOV-files and OVERVIEW.FMT or
      -- local NOV files provide Path data
    - or reading via NNTP and
      -- NOV files provide Path data or
      -- server supports HDR and announces "Path" in LIST HEADERS RANGE or
      -- server does not support HDR but XHDR and returns "Path" data if
         requested or
      -- server supports XPAT and returns "Path" data if requested
    Side effects:
      When using a Path:-filter tin _may_ rebuild locally cached overview
      data if cache_overview_files=ON is set to get the Path data into the
      local cache of the group where the filter is active. This may cause
      additional NNTP traffic once.

6. EXAMPLES

6.1 WILDMAT EXAMPLES

none given, too simple, find out yourself ,-)

6.2 REGEXP EXAMPLES

Be sure to change the wildcard setting from WILDMAT (default) to REGEX to
make the following examples work properly. This can be done using the
internal configuration menu or in the file ~/.tin/tinrc.

comment= this kills all articles about CNews, DNEWS or diablo
comment= in news.software.* but not in news.software.readers
group=news.software.*,!news.software.readers
case=1
score=kill
subj=([cd]news|diablo)


comment= this should mark all articles about tin, rtin, tind, ktin or cdtin
comment= as hot
group=*
case=1
score=hot
subj=\b(cd|[rk]?)?tin(d|pre)?[-.0-9]*\b
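
To get a feeling for what the subject pattern of the last rule catches, it
can be tried out in Python. Python's re module is used here as a stand-in
for tin's regex engine; for a pattern this simple the two should agree, but
that is an assumption. The sample subjects are made up.

```python
import re

# case=1 in the rule means case insensitive matching
TIN_SUBJECT = re.compile(r"\b(cd|[rk]?)?tin(d|pre)?[-.0-9]*\b", re.IGNORECASE)

subjects = [
    "tin 2.6.2 released",
    "Problems with rtin",
    "ktin",
    "cdtin",
    "tind",
    "Martin's sting operation",
]
for subject in subjects:
    print(bool(TIN_SUBJECT.search(subject)), subject)
```

The word boundaries (\b) keep words that merely contain "tin", such as
"Martin" or "sting", from being marked hot.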


comment= mark own articles and followups to own articles as hot in all groups
comment= except local ones
comment= match From: (a bit complex) and/or
comment= Message-ID: (I'm the only user who's posting on this server)
group=*,!akk.*,!tin.*
case=1
score=hot
from=urs@(.*\.)?((akk\.uni-karlsruhe|arbeitsen)\.de|(karlsruhe|tin|akk)\.org|ka\.nu)
msgid=@akk3(?:-dmz)?\.akk\.uni-karlsruhe\.de>


comment= stupid ppl. sometimes read control.cancel to see if there are any
comment= forged cancels around... the next rule helps you a bit
comment= ignore known despammers and net.* cancels
group=control.cancel
case=1
score=kill
from=(news@news\.msfc\.nasa\.gov|clewis@ferret\.ocunix\.on\.ca|jem@xpat\.com|(jeremy|lysander)@exit109\.com|howardk@iswest\.com|cosmo.roadkill.*rauug\.mil\.wi\.us|spamless@pacbell\.net|cwilkins@.*\.clark\.net)
msgid_only=<net-monitor-cancel


comment= this might help when reading alt.*
comment= ignore all postings with $$$ or *** or !!!
comment= ignore all postings shorter than 3 lines
comment= ignore all postings crossposted into more than 10 groups
comment= if an article has less than 3 lines AND e.g. !!!
comment= in the subject it gets a score of -400
group=alt.*
case=1
score=-200
subj=[$*!]{3,}
lines=<3
xref=([^,]+,){10,}

comment= mark own articles and direct replies based on message-id
comment= use 2*hot as score to unkill otherwise killed articles
group=*
case=1
score=200
msgid_last=doeblitz\.ts\.rz\.tu-bs\.de

comment= unmark own articles based on message-id
comment= -> only f'ups to own articles keep marked hot
group=*
case=1
score=-200
msgid_only=doeblitz\.ts\.rz\.tu-bs\.de
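
The interplay of the last two rules is easier to see with numbers. The
Python sketch below replays them for three invented articles; the
message-ids are hypothetical and the function is a simplification that
covers only these two rules.

```python
import re

OWN = re.compile(r"doeblitz\.ts\.rz\.tu-bs\.de", re.IGNORECASE)


def score(message_id: str, references: list[str]) -> int:
    """Apply the msgid_last (+200) and msgid_only (-200) rules."""
    total = 0
    # msgid_last: Message-Id: and the last References: entry only
    last_ref = references[-1:]  # empty if the article has no references
    if any(OWN.search(header) for header in [message_id, *last_ref]):
        total += 200
    # msgid_only: Message-Id: alone
    if OWN.search(message_id):
        total -= 200
    return total


own = "<1@doeblitz.ts.rz.tu-bs.de>"
print(score(own, []))                            # own article
print(score("<2@example.net>", [own]))           # direct reply to it
print(score("<3@example.org>", [own, "<2@example.net>"]))  # reply to a reply
```

The own article ends at 0 because both rules hit and cancel out, the direct
reply keeps its +200, and anything further down the thread is not touched.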


comment= kill all articles which do not have your message-id
comment= as last reference _if_ article has any references
group=de.newusers.questions
case=1
score=-100
refs_only=.*<[^@\s]+@\S+(?<!akk3\.akk\.uni-karlsruhe\.de)>$

comment= Kill all articles from John Smith, who writes under different
comment= addresses at ac.example, e.g. john@ac.example and boss@ac.example
group=*
case=1
score=kill
from=@ac\.example\s\(John\sSmith\)$

comment= Kill all articles which have news.example.org
comment= in the Path: header
group=*
case=1
score=kill
path=news\.example\.org

7. TODO

- make the time value in the filter file more human readable.
- rewrite filtering order to get optimal performance
- filtering on arbitrary header lines
- move docu to tin.5