Member "tin-2.6.2/doc/filtering" (24 Aug 2021, 12253 Bytes) of package /linux/misc/tin-2.6.2.tar.xz:


Filtering in tin

0. Status

This is an overview of the new filtering capabilities of tin. This
document will be absorbed in the main documentation at some point.


1. Introduction

Tin's filtering mechanism has changed significantly since version 1.3beta.
Originally there were only two possibilities:

1) kill an article matching a rule.
2) hot-select an article matching a rule.

This led to constant confusion, as it seemed to matter which rule came
first in the filter file, although it didn't. Moreover, once an article
had been selected for whatever reason, it could not be killed, even if it
was Craig Shergold telling you how to make money fast in a crosspost to
alt.test. This binary concept is outdated anyway, so a more flexible,
fuzzy mechanism was needed: scoring.

When using tin's new scoring mechanism you assign a "score" to each
filter rule. The scores of all rules matching an article are added up,
and the article's final score decides whether it is shown as a regular
article, marked hot, or killed.
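
The summation can be illustrated with a minimal Python sketch. It is not
tin's C code: the Rule type, the lambda-based matching and the sample
scores are hypothetical, and clamping and the kill/select limits from
section 4 are left out.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-ins: an article is a dict of header values, and each
# rule pairs a match predicate with the score it contributes.
Article = dict[str, str]


@dataclass
class Rule:
    matches: Callable[[Article], bool]
    score: int


def article_score(article: Article, rules: list[Rule]) -> int:
    """Add up the scores of every rule that matches the article.

    Addition is commutative, so the result does not depend on the order
    in which the rules appear.
    """
    return sum(rule.score for rule in rules if rule.matches(article))


rules = [
    Rule(lambda a: "$$$" in a["subject"], -100),
    Rule(lambda a: a["from"].startswith("jsmith@"), -100),
    Rule(lambda a: "tin" in a["subject"], 100),
]
spam = {"subject": "Make $$$ fast", "from": "jsmith@ac.example"}
```

Here the spam article collects -200 from two matching rules; reversing the
rule list gives the same total, so rule order does not matter.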

Standard "kill" and "select" rules already in your filter file are given
the scores "score_kill" and "score_select" respectively (see section 4).


2. Changes to the filter-file format

Tin now understands the additional "score" command in the filter file.

Old style rule:

scope=*
type=0
case=0
subj=*$$$*

New style rule:

group=*
case=0
score=-100
subj=*$$$*
#####

This lets you give each rule a weight, based on how much it matters to
you. For example, if you want to be sure never to read a certain
individual again, you may give the rule a score of -9000.

If you want only "classical" filtering and don't want to mess around
with score values, you can use the magic words "kill" and "hot" as score
values in your filter file. Example:

group=*
case=0
score=kill
subj=*$$$*
#####

At program start-up tin treats these words as the default kill and select
scores ("score_kill" and "score_select", see section 4); they may be
somewhat easier to remember than numbers.
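
A sketch of how a score= value could be resolved, written in Python for
illustration. The function name and the constants (set to the recommended
values from section 4) are assumptions; tin's own parser may differ in
details such as whitespace handling.

```python
# Hypothetical configuration values: the recommended defaults from section 4.
SCORE_KILL = -100    # tinrc: score_kill
SCORE_SELECT = 100   # tinrc: score_select


def parse_score(value: str, score_kill: int = SCORE_KILL,
                score_select: int = SCORE_SELECT) -> int:
    """Turn the value of a score= line into a number.

    The magic words "kill" and "hot" map to the configured defaults;
    anything else is read as a signed integer.
    """
    word = value.strip()
    if word == "kill":
        return score_kill
    if word == "hot":
        return score_select
    return int(word)
```

Because the words are looked up against the current defaults, changing
score_kill or score_select changes every rule that uses "kill" or "hot".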

You might have noticed in the examples above that tin now inserts a line
of hashes between two rules. This is *not* required; it just improves
readability.


3. Changes in the filter menu

The on-screen filter menu is now more compact and fits easily on small
terminals such as a small xterm or a 640x200 CON: window. It has been
enhanced to allow you to enter a score for the rule you are adding. The
score should be in the range from 1 to SCORE_MAX; otherwise it defaults
to "score_select" for select filter rules and "score_kill" for kill
filter rules (see section 4).
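
The fallback rule can be sketched in Python as below. The function and its
parameters are hypothetical, and the sketch additionally assumes, which the
text above does not state, that the menu asks for a positive magnitude and
that the sign follows the rule type, so a kill rule ends up negative.

```python
SCORE_MAX = 10000    # recommended value from section 4
SCORE_KILL = -100    # tinrc: score_kill (recommended)
SCORE_SELECT = 100   # tinrc: score_select (recommended)


def menu_rule_score(entered: str, kill_rule: bool) -> int:
    """Score for a rule added via the filter menu (illustrative only).

    ASSUMPTION: the user types a positive magnitude and the sign follows
    the rule type (negative for kill rules, positive for select rules).
    """
    try:
        value = int(entered)
    except ValueError:
        value = 0  # not a number: treat it as out of range
    if not 1 <= value <= SCORE_MAX:
        # Out of range: fall back to the configured default.
        return SCORE_KILL if kill_rule else SCORE_SELECT
    return -value if kill_rule else value
```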


4. Internal defaults and config options

Some related constants are defined in tin.h, and some options in tinrc:

SCORE_MAX is the maximum score an article can reach. Any value above
this is cut to SCORE_MAX; likewise, any value below -SCORE_MAX is cut to
-SCORE_MAX.
recommended: 10000

"score_kill" is the default score given to any kill rule if no other
score is specified.
recommended: -100

"score_select" is the default score for any auto-selection rule if no
other score is specified.
recommended: 100

"score_limit_kill" and "score_limit_select" are the limits an article's
score must cross for the article to be marked as killed or selected,
respectively.
recommended when used with the values given above: -50/+50.

"score_kill", "score_select", "score_limit_kill" and "score_limit_select"
are config options. You can find them in tin's configuration file
(~/.tin/tinrc). They can also be changed at runtime in the config menu.
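
Putting the pieces together, the final classification might look like the
Python sketch below, using the recommended values. Whether a score exactly
at a limit counts as having crossed it is not specified above; the sketch
assumes it does (<= and >=). The function names are hypothetical.

```python
SCORE_MAX = 10000          # tin.h constant, recommended value
SCORE_LIMIT_KILL = -50     # tinrc: score_limit_kill (recommended)
SCORE_LIMIT_SELECT = 50    # tinrc: score_limit_select (recommended)


def clamp_score(total: int) -> int:
    """Cut a summed score to the range [-SCORE_MAX, SCORE_MAX]."""
    return max(-SCORE_MAX, min(SCORE_MAX, total))


def classify(total: int) -> str:
    """Map a summed score to "killed", "hot" or "regular".

    ASSUMPTION: reaching a limit counts as crossing it.
    """
    score = clamp_score(total)
    if score <= SCORE_LIMIT_KILL:
        return "killed"
    if score >= SCORE_LIMIT_SELECT:
        return "hot"
    return "regular"
```

With these values one kill rule (-100) and one select rule (+100) cancel
out to a regular article, whereas under the old scheme the selection
would have won.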


5. Overview of "filter"-commands

Everything here is also described in the file ~/.tin/filter, albeit more
concisely.

All lines are of the form:
command=value

Valid "command"s are:

add a comment to the following rule:

comment= a short text

Multiple comment lines may be used; comment lines _must_ come directly
before the scope selection.

scope selection:

group=newsgroup_pattern_list

newsgroup_pattern_list is a comma-separated list of newsgroup_patterns.

Each newsgroup_pattern is either a pattern (wildmat-style) or !pattern,
which negates the match of pattern. This is the same format used for the
AUTO(UN)SUBSCRIBE environment variable.

Tin doesn't rework your existing filter file; the new pattern matching is
only used when you enter new entries by hand.
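
A rough Python model of how a newsgroup_pattern_list could be evaluated.
It approximates wildmat with the standard fnmatch.fnmatchcase function
(which has no backslash escapes) and assumes that the last matching
pattern decides, as in INN-style wildmat lists; tin's actual evaluation
order is not described here. The function name is hypothetical.

```python
from fnmatch import fnmatchcase


def group_matches(group: str, pattern_list: str) -> bool:
    """Evaluate a comma-separated newsgroup_pattern_list for one group.

    ASSUMPTIONS: fnmatchcase stands in for wildmat, and the last pattern
    that matches decides (a matching "!pattern" rejects the group).
    """
    accepted = False
    for pattern in pattern_list.split(","):
        pattern = pattern.strip()
        negate = pattern.startswith("!")
        if negate:
            pattern = pattern[1:]
        if fnmatchcase(group, pattern):
            accepted = not negate
    return accepted
```

For "news.software.*,!news.software.readers" this accepts
news.software.nntp but rejects news.software.readers.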

additional info:

case=num    num: 0=case sensitive, 1=case insensitive
score=num   num: score value of rule, can now also be one of the magic words
                 "kill" or "hot", which are equivalent to the configured
                 "score_kill" and "score_select" values respectively.
time=num    num: time_t value at which the rule expires. When tin writes
                 the filter file it adds the time in human-readable form as
                 a comment in parentheses after the numeric value. When
                 reading the file tin uses _only_ the numeric value, not
                 the human-readable form.
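
As an illustration, a time= value could be read and checked like this in
Python. The exact human-readable comment tin writes is not reproduced
here; the sketch simply keeps the leading integer and ignores whatever
follows it, as described above. The function names are hypothetical.

```python
import re
import time
from typing import Optional


def parse_expiry(value: str) -> int:
    """Return the numeric time_t from a time= value.

    Any trailing human-readable comment in parentheses is ignored;
    only the leading integer is used.
    """
    match = re.match(r"\s*(\d+)", value)
    if match is None:
        raise ValueError(f"no numeric time value in {value!r}")
    return int(match.group(1))


def rule_expired(value: str, now: Optional[float] = None) -> bool:
    """True if the rule's expiry time lies in the past."""
    if now is None:
        now = time.time()
    return parse_expiry(value) < now
```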

matches:            matched to:

subj=pattern        Subject:
from=pattern        From:
                    Tin converts the contents of the From: header to an
                    old-style e-mail address, i.e. "some@body.example (John
                    Doe)" instead of "John Doe <some@body.example>",
                    before trying to match the patterns in the filter rule.
                    That way a rule tailored to match the full From:
                    header "jsmith@ac.example (John Smith)" will still work
                    when John posts with a different newsreader which uses
                    "John Smith <jsmith@ac.example>".
msgid=pattern       Message-Id: *AND* full References:
msgid_last=pattern  Message-Id: and last References: entry only
msgid_only=pattern  Message-Id:
refs_only=pattern   References: line (e.g. <123@example.net>) without Message-Id:
lines=num           Lines: ; <num matches less than, >num matches more than.
gnksa=[<>]?NUM      GNKSA parse_from() return code
xref=pattern        Xref: ; filter crossposts to groups matching pattern
path=pattern        Path: ; filter server names matching pattern
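
The From: normalisation described above can be approximated in Python as
follows. This is a deliberately simple, regex-based sketch with a
hypothetical function name; tin's real address parser handles far more
cases than this route-address-to-old-style conversion.

```python
import re

# "Real Name <user@host>", optionally with a quoted real name.
ROUTE_ADDR = re.compile(r'^\s*"?(?P<name>[^"<]*?)"?\s*<(?P<addr>[^<>]+)>\s*$')


def to_oldstyle(from_header: str) -> str:
    """Rewrite 'John Doe <some@body.example>' as 'some@body.example (John Doe)'.

    Headers that are not in route-address form are returned unchanged
    (apart from surrounding whitespace).
    """
    match = ROUTE_ADDR.match(from_header)
    if match is None:
        return from_header.strip()
    name = match.group("name").strip()
    addr = match.group("addr").strip()
    return f"{addr} ({name})" if name else addr
```

After this conversion both spellings of John Smith's address look the
same, which is what lets a single from= pattern ending in
"(John Smith)" match either form.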

When you are using wildmat pattern matching, patterns in ~/.tin/filter
should be enclosed in "*" (a wildmat pattern must match the whole header,
so a leading and trailing "*" turns it into a substring match), and
literal wildcard characters in patterns must be escaped with "\". When
using the built-in filter-file functions, tin tries to take care of this
itself, except when you are entering text in the built-in kill/hot menu.
Then you have to quote manually, because tin cannot know whether e.g.
"\[" is already quoted or not.
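
For text you type into the kill/hot menu, the quoting rule can be sketched
as a small Python helper. The function is hypothetical, and the set of
characters it escapes (*, ?, [, ] and \) is an assumption based on common
wildmat syntax.

```python
# ASSUMPTION: these are the characters wildmat treats specially.
WILDMAT_SPECIALS = set("*?[]\\")


def wildmat_substring(text: str) -> str:
    """Escape wildmat metacharacters in text and wrap it in '*...*'."""
    escaped = "".join("\\" + ch if ch in WILDMAT_SPECIALS else ch for ch in text)
    return f"*{escaped}*"
```

wildmat_substring("$$$") yields the *$$$* pattern used in section 2. The
helper must only be applied once: run over already-quoted text, "\["
becomes "\\\[", which is exactly the ambiguity mentioned above.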

GNKSA return codes: these are the return codes of the From:-address
parser, which let you filter on certain kinds of syntactic and semantic
errors in that header. For an up-to-date list see the definitions in
extern.h and the parser source code in misc.c; the following is just a
short introduction.

   0-99: internal codes
code   error description
   0   no error, valid address
   1   internal error, should not happen (blame me)

   100-199: general syntactical errors
code   error description
 100   left angle bracket ("<") missing in route address
 101   left parenthesis ("(") missing in oldstyle address (realname comment)
 102   right parenthesis (")") missing in oldstyle address (realname comment)
 103   at-sign ("@") missing in mail address
 104   right angle bracket (">") missing in route address

   200-299: right hand side (FQDN part) of address, syntax and semantics
code   error description
 200   right hand side (RHS) of address is a single component
 201   RHS has an unknown top level domain (3 or more characters)
 202   RHS has a malformed top level domain
 203   RHS has an unknown country code as top level domain
 204   illegal character in RHS
 205   leading or trailing dot or two consecutive dots in RHS
 206   RHS has a component longer than 63 characters
 207   RHS has a component with leading or trailing hyphen ("-")
 208   RHS has a component starting with a digit (with ENFORCE_RFC1034 only)
 209   RHS is not a valid IP address
 210   RHS is an IP address from private IP space (see RFC1918) or loopback
 211   brackets ("[", "]") around IP address missing in RHS

   300-399: syntactical errors in left hand side (localpart) of address
code   error description
 300   there was no localpart found at all in address
 301   localpart contains illegal characters
 302   localpart has leading, trailing or consecutive dots

   400-499: syntactical errors in realname part
code   error description
 400   illegal character in unquoted word in realname part
 401   illegal character in quoted word in realname part
 402   illegal character in encoded word in realname part
 403   bad syntax in encoded word in realname part
 404   illegal character in oldstyle realname part (one of "()<>\")
 405   illegal character in realname part
 406   missing realname part
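
The [<>]?NUM syntax of gnksa= (and, for < and >, of lines=) can be
modelled in Python as below. Treating a bare number as an exact match is
an assumption, as is the helper that maps a code to its group from the
tables above; neither function is part of tin.

```python
import operator


def numeric_rule_matches(spec: str, value: int) -> bool:
    """Evaluate a "<NUM", ">NUM" or "NUM" spec against a number.

    ASSUMPTION: a bare NUM means "equal to NUM".
    """
    spec = spec.strip()
    comparisons = {"<": operator.lt, ">": operator.gt}
    if spec[:1] in comparisons:
        return comparisons[spec[0]](value, int(spec[1:]))
    return value == int(spec)


# Hundred-ranges of GNKSA codes, taken from the tables above.
GNKSA_GROUPS = {
    0: "internal codes",
    1: "general syntactical errors",
    2: "right hand side (FQDN part) of address",
    3: "left hand side (localpart) of address",
    4: "realname part",
}


def gnksa_group(code: int) -> str:
    """Name the hundred-range a GNKSA code belongs to."""
    return GNKSA_GROUPS.get(code // 100, "unknown")
```

Under this model gnksa=>99 matches every From: header with a reported
address error, since all error codes are 100 or higher.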


Path:-filter

Restrictions - this will only work if:
- reading from the local spool and
-- without access to local NOV files and OVERVIEW.FMT, or
-- local NOV files provide Path data
- or reading via NNTP and
-- NOV files provide Path data, or
-- the server supports HDR and announces "Path" in LIST HEADERS RANGE, or
-- the server does not support HDR but supports XHDR and returns "Path"
   data if requested, or
-- the server supports XPAT and returns "Path" data if requested

Side effects:
When using a Path:-filter, tin _may_ rebuild locally cached overview
data (if cache_overview_files=ON is set) to get the Path data into the
local cache of the group where the filter is active. This may cause
additional NNTP traffic once.
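
The restrictions can be restated as a single boolean expression, sketched
in Python below. The Source fields are hypothetical names for the
conditions listed above, and reading "without access to local NOV files
and OVERVIEW.FMT" as one flag is an interpretation.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """Hypothetical summary of how tin gets its article data."""
    via_nntp: bool
    local_nov_and_overview_fmt: bool = True  # spool: NOV files and OVERVIEW.FMT accessible
    nov_has_path: bool = False               # NOV data contains Path
    has_hdr: bool = False                    # server supports HDR
    hdr_announces_path: bool = False         # "Path" listed in LIST HEADERS RANGE
    xhdr_returns_path: bool = False          # XHDR returns Path data
    xpat_returns_path: bool = False          # XPAT returns Path data


def path_filter_works(src: Source) -> bool:
    """Restate the Path:-filter restrictions as one boolean expression."""
    if not src.via_nntp:
        # Local spool: no NOV/OVERVIEW.FMT access, or NOV files carry Path.
        return not src.local_nov_and_overview_fmt or src.nov_has_path
    return (src.nov_has_path
            or (src.has_hdr and src.hdr_announces_path)
            or (not src.has_hdr and src.xhdr_returns_path)
            or src.xpat_returns_path)
```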


6. EXAMPLES

6.1 WILDMAT EXAMPLES

none given, too simple, find out yourself ,-)

6.2 REGEXP EXAMPLES

Be sure to change the Wildcard setting from WILDMAT (default) to REGEX for
the following examples to work properly. This can be done in the internal
configuration menu or in the file ~/.tin/tinrc.

comment= this kills all articles about CNews, DNEWS or diablo
comment= in news.software.* but not in news.software.readers
group=news.software.*,!news.software.readers
case=1
score=kill
subj=([cd]news|diablo)


comment= this should mark all articles about tin, rtin, tind, ktin or cdtin
comment= as hot
group=*
case=1
score=hot
subj=\b(cd|[rk]?)?tin(d|pre)?[-.0-9]*\b
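
Before adding a regular-expression rule, it can help to try the pattern on
a few sample subjects. The Python sketch below does that for the rule
above, assuming that Python's re module behaves like tin's regex engine
for this pattern and that case=1 corresponds to re.IGNORECASE; the sample
subjects are made up.

```python
import re

# subj= pattern from the example rule; case=1 means case insensitive.
TIN_SUBJECT = re.compile(r"\b(cd|[rk]?)?tin(d|pre)?[-.0-9]*\b", re.IGNORECASE)


def is_hot(subject: str) -> bool:
    """True if the subject would match the example "hot" rule."""
    return TIN_SUBJECT.search(subject) is not None
```

The \b anchors keep words such as "Martin" or "Latin" from matching,
while "TIN 2.6.2", "rtin" and "tin-2.6.2" do.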


comment= mark own articles and followups to own articles as hot in all groups
comment= except local ones
comment= match From: (a bit complex) and/or
comment= Message-ID: (I'm the only user who's posting on this server)
group=*,!akk.*,!tin.*
case=1
score=hot
from=urs@(.*\.)?((akk\.uni-karlsruhe|arbeitsen)\.de|(karlsruhe|tin|akk)\.org|ka\.nu)
msgid=@akk3(?:-dmz)?\.akk\.uni-karlsruhe\.de>


comment= stupid ppl. sometimes read control.cancel to see if there are any
comment= forged cancels around... the next rule helps you a bit
comment= ignore known despammers and net.* cancels
group=control.cancel
case=1
score=kill
from=(news@news\.msfc\.nasa\.gov|clewis@ferret\.ocunix\.on\.ca|jem@xpat\.com|(jeremy|lysander)@exit109\.com|howardk@iswest\.com|cosmo.roadkill.*rauug\.mil\.wi\.us|spamless@pacbell\.net|cwilkins@.*\.clark\.net)
msgid_only=<net-monitor-cancel


comment= this might help when reading alt.*
comment= ignore all postings with $$$ or *** or !!!
comment= ignore all postings shorter than 3 lines
comment= ignore all postings crossposted into more than 10 groups
comment= if an article has less than 3 lines AND e.g. !!!
comment= in the subject it gets a score of -400
group=alt.*
case=1
score=-200
subj=[$*!]{3,}
lines=<3
xref=([^,]+,){10,}
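
The -400 in the comment implies that every match line in a rule adds the
rule's score separately. A small Python model of that arithmetic for the
alt.* rule follows, assuming Python's re agrees with tin's regex engine for
these patterns; the alt_rule_score function is hypothetical.

```python
import re

RULE_SCORE = -200  # score=-200 from the example rule


def alt_rule_score(subject: str, lines: int, xref: str) -> int:
    """Add the rule score once for every match line that matches."""
    hits = [
        re.search(r"[$*!]{3,}", subject, re.IGNORECASE) is not None,  # subj=
        lines < 3,                                                      # lines=<3
        re.search(r"([^,]+,){10,}", xref, re.IGNORECASE) is not None,  # xref=
    ]
    return RULE_SCORE * sum(hits)
```

A two-line article with "!!!" in the subject hits two match lines and
scores -400; the same subject in a longer article scores -200.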

comment= mark own articles and direct replies based on message-id
comment= use 2*hot as score to unkill otherwise killed articles
group=*
case=1
score=200
msgid_last=doeblitz\.ts\.rz\.tu-bs\.de

comment= unmark own articles based on message-id
comment= -> only f'ups to own articles keep marked hot
group=*
case=1
score=-200
msgid_only=doeblitz\.ts\.rz\.tu-bs\.de
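
How the two rules combine can be checked with a little arithmetic. The
Python sketch below follows the field descriptions from section 5
(msgid_last sees the Message-Id and the last References entry, msgid_only
only the Message-Id); treating msgid_last as "either field matches" is an
assumption, and the function is hypothetical.

```python
import re

OWN = re.compile(r"doeblitz\.ts\.rz\.tu-bs\.de", re.IGNORECASE)


def own_rules_score(message_id: str, references: list[str]) -> int:
    """Combined score of the msgid_last (+200) and msgid_only (-200) rules."""
    last_ref = references[-1] if references else ""
    score = 0
    # msgid_last: Message-Id and last References: entry
    if OWN.search(message_id) or OWN.search(last_ref):
        score += 200
    # msgid_only: Message-Id only
    if OWN.search(message_id):
        score -= 200
    return score
```

Your own articles end up at 0 (+200 - 200), while direct follow-ups to
them keep +200, above the recommended score_limit_select of +50.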


comment= kill all articles which do not have your message-id
comment= as last reference _if_ article has any references
group=de.newusers.questions
case=1
score=-100
refs_only=.*<[^@\s]+@\S+(?<!akk3\.akk\.uni-karlsruhe\.de)>$
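
The negative lookbehind (?<!...) is the tricky part of this rule: it
refuses the match when the last reference ends in your own domain. A quick
Python check, assuming Python's re treats this fixed-width lookbehind the
same way as tin's regex library; the References: values are made up.

```python
import re

# refs_only= pattern from the example rule (case=1: case insensitive).
NOT_OWN_LAST_REF = re.compile(
    r".*<[^@\s]+@\S+(?<!akk3\.akk\.uni-karlsruhe\.de)>$", re.IGNORECASE)


def rule_applies(references: str) -> bool:
    """True if the rule's -100 applies to an article with these References."""
    return NOT_OWN_LAST_REF.search(references) is not None
```

An empty References: line never matches, which is how the rule spares
articles that have no references at all.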

comment= Kill all articles from John Smith, who writes under different
comment= addresses at ac.example, e.g. john@ac.example and boss@ac.example
group=*
case=1
score=kill
from=@ac\.example\s\(John\sSmith\)$

comment= Kill all articles which have news.example.org
comment= in the Path: header
group=*
case=1
score=kill
path=news\.example\.org


7. TODO

- make the time value in the filter file more human readable.
- rewrite filtering order to get optimal performance
- filtering on arbitrary header lines
- move documentation to tin.5