Member "tin-2.4.1/doc/filtering" (28 Aug 2013, 11269 Bytes) of archive /linux/misc/tin-2.4.1.tar.gz:


Filtering in tin

0. Status

This is an overview of the new filtering capabilities of tin. This
document will be absorbed in the main documentation at some point.


1. Introduction

Tin's filtering mechanism has changed significantly since version 1.3beta.
Originally there were only two possibilities:

1) kill an article matching a rule.
2) hot-select an article matching a rule.

This led to constant confusion: the order of the rules in the filter file
looked important, but it wasn't. Also, once an article was selected for
whatever reason it could no longer be killed, even if it was Craig
Shergold telling you how to make money fast in a crosspost to alt.test.
This binary concept isn't modern anyway, so a much more up-to-date fuzzy
mechanism was necessary: scoring.

When using tin's new scoring mechanism you assign a "score" to each
filter rule. The scores of all rules matching the current article are
added up, and the final score of the article decides whether it is
regular, marked hot or killed.

The standard "kill" and standard "select" rules already in your filter
file have the scores "score_kill" and "score_select" respectively (see
section 4).


2. Changes to the filter-file format

Tin now understands the additional "score" command in the filter file.

Old style rule:

scope=*
type=0
case=0
subj=*$$$*

New style rule:

group=*
case=0
score=-100
subj=*$$$*
#####

This lets you give each rule a weight that reflects how strongly you
feel about it. For example, if you want to be sure never to read a
certain individual again, you may give the rule a score of -9000.

If you want only "classical" filtering and don't want to mess around
with score values, you can use the magic words "kill" and "hot" as score
values in your filter file. Example:

group=*
case=0
score=kill
subj=*$$$*
#####

These are mapped to the default score values at program initialization
time and may be somewhat easier to remember.
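
As a rough illustration of the magic words, the following Python sketch
resolves the text after "score=" to a number. The function name and the
two constants are assumptions made for this example (the values mirror
the recommended defaults from section 4); this is not tin's own code.

```python
# Hypothetical defaults, mirroring the recommended values in section 4.
DEFAULT_SCORE_KILL = -100
DEFAULT_SCORE_SELECT = 100


def resolve_score(value,
                  score_kill=DEFAULT_SCORE_KILL,
                  score_select=DEFAULT_SCORE_SELECT):
    """Turn the text after "score=" into an integer score."""
    word = value.strip()
    if word == "kill":
        return score_kill
    if word == "hot":
        return score_select
    # Anything else is expected to be a plain (possibly negative) number.
    return int(word)
```

With these defaults, resolve_score("kill") and resolve_score("-100")
give the same result, which is why the two example rules above behave
alike.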

You might have noticed from the examples above that tin now inserts a
line of hashes between two rules. This is *not* required; it just
improves readability.


3. Changes in the filter menu

The on-screen filter menu is now more compact and fits easily on small
terminals such as a small xterm or a 640x200 CON: window. It has been
enhanced to allow you to enter a score for the rule you are adding. The
score should be in the range from 1 to SCORE_MAX; otherwise it will
default to "score_select" for select filter rules and "score_kill" for
kill filter rules (see section 4).


4. Internal defaults and config options

There are some constants defined in tin.h and tinrc:

SCORE_MAX is the maximum score an article can reach. Any value
above this is cut to SCORE_MAX; the same goes for negative scores
(below -SCORE_MAX).
recommended: 10000

"score_kill" is the default score given to any kill rule if no other
is specified.
recommended: -100

"score_select" is the default score for any auto-selection rule if no
other is specified.
recommended: 100

"score_limit_kill" and "score_limit_select" are the limits that must be
crossed to mark an article as killed or selected.
recommended when used with the values given above: -50/+50.

"score_kill", "score_select", "score_limit_kill" and "score_limit_select"
are config options. You can find them in tin's configuration file
(~/.tin/tinrc). They can also be changed at runtime in the config menu.
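
How these values interact can be sketched in a few lines of Python. This
is only an illustration using the recommended values; the function names
are made up, and treating a score exactly equal to a limit as "crossed"
is an assumption, not something this document states.

```python
SCORE_MAX = 10000
SCORE_LIMIT_KILL = -50
SCORE_LIMIT_SELECT = 50


def clamp(score, score_max=SCORE_MAX):
    """Cut a score to the range -score_max .. +score_max."""
    return max(-score_max, min(score_max, score))


def classify(matching_scores,
             limit_kill=SCORE_LIMIT_KILL,
             limit_select=SCORE_LIMIT_SELECT):
    """Add up the scores of all matching rules and decide the article's fate."""
    total = clamp(sum(matching_scores))
    if total <= limit_kill:      # assumption: reaching the limit counts
        return "killed"
    if total >= limit_select:    # assumption: reaching the limit counts
        return "hot"
    return "regular"
```

An article matched by one default kill rule (-100) and one default
select rule (100) ends up at 0 and stays regular; a rule with score 200
on top of that would make it hot.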


5. Overview of "filter"-commands

Everything here is also described in the file ~/.tin/filter, albeit more
concisely.

All lines are of the form:
command=value
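
A minimal Python sketch of reading such lines is shown here. It assumes
that empty lines and lines starting with "#" (such as the row of hashes
between rules) carry no meaning, and it splits at the first "=" only, so
that patterns may themselves contain "=". The function name is
hypothetical.

```python
def parse_filter_lines(text):
    """Return a list of (command, value) pairs from filter-file text."""
    pairs = []
    for raw in text.splitlines():
        if not raw.strip() or raw.lstrip().startswith("#"):
            continue  # blank line, comment or "#####" separator
        command, sep, value = raw.partition("=")
        if not sep:
            continue  # not of the form command=value
        pairs.append((command.strip(), value))
    return pairs
```

Grouping the pairs into rules is left out on purpose, since that depends
on details not covered at this point.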

Valid "command"s are:

add a comment to the following rule:

comment= a short text

Multiple comment lines may be used; comment lines _must_ come right
before the scope selection.

scope selection:

group=newsgroup_pattern_list

newsgroup_pattern_list is a comma-separated list of newsgroup_patterns.

A newsgroup_pattern can be a pattern (wildmat-style) or !pattern,
which negates the match of pattern. This is the same format as used for
the AUTO(UN)SUBSCRIBE environment variable.
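
The following Python sketch shows one plausible reading of such a list.
Two assumptions are made: Python's fnmatch is used as a stand-in for
wildmat (the two are similar for "*" and "?" but not identical), and the
last pattern in the list that matches the group decides the outcome.
Neither is taken from tin's source.

```python
from fnmatch import fnmatchcase


def group_matches(group, pattern_list):
    """Check a newsgroup name against a comma-separated pattern list."""
    accepted = False
    for item in pattern_list.split(","):
        item = item.strip()
        if not item:
            continue
        negate = item.startswith("!")
        pattern = item[1:] if negate else item
        if fnmatchcase(group, pattern):
            # Assumption: a later matching pattern overrides earlier ones.
            accepted = not negate
    return accepted
```

With the list "news.software.*,!news.software.readers" this accepts
news.software.nntp but rejects news.software.readers.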

Tin doesn't rework your filter file; the new pattern matching is only
used when you enter new entries by hand.

additional info:

case=num    num: 0=case-sensitive, 1=case-insensitive
score=num   num: score value of rule, can now also be one of the magic words
                 "kill" or "hot", which are equivalent to
                 SCORE_KILL and SCORE_SELECT respectively.
time=num    num: time_t value; when rule expires. When tin writes the filter
                 file it adds the time in human readable form as a comment in
                 parentheses after the numeric value. When reading the file
                 tin uses _only_ the numeric value, not the human readable form.
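
To illustrate the time value, here is a small Python sketch that takes
the text after "time=", ignores the parenthesised comment and compares
the number with the current time. The function names and the sample
comment text are made up for this example, and treating a rule as
expired only once its time lies strictly in the past is an assumption.

```python
import time


def parse_expiry(value):
    """Return the numeric time_t found after "time=", ignoring the comment."""
    # Everything from the first "(" onwards is the human readable comment.
    return int(value.split("(", 1)[0].strip())


def is_expired(value, now=None):
    """True if the rule's expiry time lies in the past."""
    if now is None:
        now = time.time()
    return parse_expiry(value) < now
```

For example, parse_expiry("1377648000 (Wed Aug 28 00:00:00 2013)")
returns 1377648000 regardless of what the comment says.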

matches:            matched to:

subj=pattern        Subject:
from=pattern        From:
                    Tin converts the contents of the From-header to an
                    old-style e-mail address, i.e. "some@body.example (John
                    Doe)" instead of "John Doe <some@body.example>",
                    before trying to match the patterns in the filter rule.
                    That way a rule tailored to match the full From
                    header "jsmith@ac.example (John Smith)" will still work
                    when John posts with a different newsreader which uses
                    "John Smith <jsmith@ac.example>".
msgid=pattern       Message-Id: *AND* full References:
msgid_last=pattern  Message-Id: and last entry of References: only
msgid_only=pattern  Message-Id:
refs_only=pattern   References: line (e.g. <123@ether.net>) without Message-Id:
lines=num           Lines: ; <num matches less than, >num matches more than.
gnksa=[<>]?NUM      GNKSA parse_from() return code
xref=pattern        Xref: ; filter crossposts to groups matching pattern
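
The numeric forms used by "lines" and "gnksa" can be sketched in Python
as follows. The function name is hypothetical, and reading a bare number
as "equal to" is an assumption suggested by the optional "[<>]?" in the
gnksa syntax.

```python
def numeric_match(spec, actual):
    """Compare a header's number against a spec like "<3", ">99" or "100"."""
    spec = spec.strip()
    if spec.startswith("<"):
        return actual < int(spec[1:])
    if spec.startswith(">"):
        return actual > int(spec[1:])
    # Assumption: no prefix means the values must be equal.
    return actual == int(spec)
```

So "lines=<3" matches an article with 2 lines but not one with 3, and
"gnksa=>99" matches every parser return code from 100 upwards.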

When you are using wildmat pattern-matching, patterns in ~/.tin/filter
should be delimited with "*", and verbatim wildcards in patterns must be
escaped with "\". When using the built-in filter-file functions, tin tries
to take care of this itself, except when you are entering text in the
built-in kill/hot menu. There you have to quote manually because tin
doesn't know whether e.g. "\[" is already quoted or not.
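
Quoting by hand amounts to something like the following Python sketch.
The set of characters treated as wildcards here (backslash, "*", "?",
"[" and "]") is an assumption based on common wildmat implementations,
and both function names are invented for the example.

```python
# Assumed set of wildmat special characters.
WILDMAT_SPECIALS = "\\*?[]"


def quote_wildmat(text):
    """Escape every special character so that it matches itself."""
    return "".join("\\" + ch if ch in WILDMAT_SPECIALS else ch for ch in text)


def as_substring_pattern(text):
    """Build a pattern matching any header that contains text verbatim."""
    return "*" + quote_wildmat(text) + "*"
```

Note that running quote_wildmat() twice doubles the backslashes, which
is exactly the problem tin faces when it cannot tell whether the text
you typed is already quoted.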

GNKSA return codes: these are the return codes of the From:-address
parser, enabling you to filter on certain kinds of syntactic and
semantic errors present in that header. For an up-to-date list see the
definitions in extern.h and the parser source code in misc.c; the
following is just a short introduction.

   0-99: internal codes
code   error description
   0   no error, valid address
   1   internal error, should not happen (blame me)

   100-199: general syntactical errors
code   error description
 100   left angle bracket ("<") missing in route address
 101   left parenthesis ("(") missing in oldstyle address (realname comment)
 102   right parenthesis (")") missing in oldstyle address (realname comment)
 103   at-sign ("@") missing in mail address

   200-299: right hand side (FQDN part) of address, syntax and semantics
code   error description
 200   right hand side (RHS) of address is a single component
 201   RHS has an unknown top level domain (3 or more characters)
 202   RHS has a malformed top level domain
 203   RHS has an unknown country code as top level domain
 204   illegal character in RHS
 205   leading or trailing dot or two consecutive dots in RHS
 206   RHS has a component longer than 63 characters
 207   RHS has a component with leading or trailing hyphen ("-")
 208   RHS has a component starting with a digit (with ENFORCE_RFC1034 only)
 209   RHS is not a valid IP address
 210   RHS is an IP address from private IP space (see RFC1918) or loopback
 211   brackets ("[", "]") around IP address missing in RHS

   300-399: syntactical errors left hand side (localpart) of address
code   error description
 300   there was no localpart found at all in address
 301   localpart contains illegal characters
 302   localpart has leading, trailing or consecutive dots

   400-499: syntactical errors in realname part
code   error description
 400   illegal character in unquoted word in realname part
 401   illegal character in quoted word in realname part
 402   illegal character in encoded word in realname part
 403   bad syntax in encoded word in realname part
 404   illegal character in oldstyle realname part (one of "()<>\")
 405   illegal character in realname part


6. EXAMPLES

6.1 WILDMAT EXAMPLES

none given, too simple, find out yourself ,-)

6.2 REGEXP EXAMPLES

Be sure to change the Wildcard setting from WILDMAT (default) to REGEX to
make the following examples work properly. This can be done using the
internal configuration menu or in the file ~/.tin/tinrc.

comment= this kills all articles about CNews, DNEWS or diablo
comment= in news.software.* but not in news.software.readers
group=news.software.*,!news.software.readers
case=1
score=kill
subj=([cd]news|diablo)


comment= this should mark all articles about tin, rtin, tind, ktin or cdtin
comment= as hot
group=*
case=1
score=hot
subj=\b(cd|[rk]?)?tin(d|pre)?[-.0-9]*\b
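
If you want to try such a pattern outside of tin, a quick check along
the following lines may help. It uses Python's re module as a stand-in
for tin's regex engine, which is an assumption: the two agree on this
pattern, but not necessarily on every construct. The helper name is made
up.

```python
import re

# The subj= pattern from the rule above; case=1 means ignore case.
TIN_SUBJECT = re.compile(r"\b(cd|[rk]?)?tin(d|pre)?[-.0-9]*\b", re.IGNORECASE)


def mentions_tin(subject):
    """True if the subject would be matched by the rule above."""
    return TIN_SUBJECT.search(subject) is not None
```

The word boundaries are what keep subjects such as "Testing 1 2 3" or
"Martin says hi" from being marked hot.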


comment= mark own articles and followups to own articles as hot in all groups
comment= except local ones
comment= match From: (a bit complex) and/or
comment= Message-ID: (I'm the only user who's posting on this server)
group=*,!akk.*,!tin.*
case=1
score=hot
from=urs@(.*\.)?((akk\.uni-karlsruhe|arbeitsen)\.de|(karlsruhe|tin|akk)\.org|ka\.nu)
msgid=@akk3(?:-dmz)?\.akk\.uni-karlsruhe\.de>


comment= stupid ppl. sometimes read control.cancel to see if there are any
comment= forged cancels around... the next rule helps you a bit
comment= ignore known despammers and net.* cancels
group=control.cancel
case=1
score=kill
from=(news@news\.msfc\.nasa\.gov|clewis@ferret\.ocunix\.on\.ca|jem@xpat\.com|(jeremy|lysander)@exit109\.com|howardk@iswest\.com|cosmo.roadkill.*rauug\.mil\.wi\.us|spamless@pacbell\.net|cwilkins@.*\.clark\.net)
msgid_only=<net-monitor-cancel


comment= this might help when reading alt.*
comment= ignore all postings with $$$ or *** or !!!
comment= ignore all postings shorter than 3 lines
comment= ignore all postings crossposted into more than 10 groups
comment= if an article has fewer than 3 lines AND e.g. !!!
comment= in the subject it gets a score of -400
group=alt.*
case=1
score=-200
subj=[$*!]{3,}
lines=<3
xref=([^,]+,){10,}
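
The arithmetic behind the -400 mentioned in the comments can be
reproduced with a short Python sketch. It assumes, as those comments
imply, that every matching line of the rule adds the rule's score once,
and that the Xref pattern is applied to a comma-separated list of group
names (which is what the pattern suggests). Python's re module stands in
for tin's regex engine, and the function name is invented.

```python
import re

RULE_SCORE = -200
SUBJ = re.compile(r"[$*!]{3,}")
XREF = re.compile(r"([^,]+,){10,}")


def alt_rule_score(subject, line_count, groups):
    """Score contributed by the alt.* rule above to a single article."""
    score = 0
    if SUBJ.search(subject):
        score += RULE_SCORE
    if line_count < 3:                    # lines=<3
        score += RULE_SCORE
    if XREF.search(",".join(groups)):     # ten commas need eleven groups
        score += RULE_SCORE
    return score
```

A two-line posting with "!!!" in the subject collects -200 twice, hence
-400; an ordinary posting collects nothing.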

comment= mark own articles and direct replies based on message-id
comment= use 2*hot as score to unkill otherwise killed articles
group=*
case=1
score=200
msgid_last=doeblitz\.ts\.rz\.tu-bs\.de

comment= unmark own articles based on message-id
comment= -> only f'ups to own articles stay marked hot
group=*
case=1
score=-200
msgid_only=doeblitz\.ts\.rz\.tu-bs\.de


comment= kill all articles which do not have your message-id
comment= as last reference _if_ article has any references
group=de.newusers.questions
case=1
score=-100
refs_only=.*<[^@\s]+@\S+(?<!akk3\.akk\.uni-karlsruhe\.de)>$
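
The negative lookbehind in this rule is easy to get wrong, so here is a
small Python sketch for experimenting with it. Python's re module is
used as a stand-in for tin's regex engine (an assumption; both accept
this fixed-width lookbehind), and the function name is made up.

```python
import re

# The refs_only= pattern from the rule above; case=1 means ignore case.
REFS = re.compile(
    r".*<[^@\s]+@\S+(?<!akk3\.akk\.uni-karlsruhe\.de)>$", re.IGNORECASE)


def last_reference_is_foreign(references):
    """True if the last entry of References: is not one of your message-ids."""
    return REFS.search(references) is not None
```

Because "\S+" cannot cross the blank between two entries, only the last
entry of the References: line is examined. An empty References: line
never matches, so articles that start a thread are left alone.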

comment= Kill all articles from John Smith, who writes under different
comment= addresses at ac.example, e.g. john@ac.example and boss@ac.example
group=*
case=1
score=kill
from=@ac\.example\s\(John\sSmith\)$
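
This rule relies on the From: conversion described in section 5. The
effect can be imitated in Python as sketched here, with parseaddr() from
the standard library standing in for tin's own address parser. That
substitution is an assumption, as are the function names; tin's parser
may treat unusual addresses differently.

```python
import re
from email.utils import parseaddr

# The from= pattern from the rule above; case=1 means ignore case.
RULE = re.compile(r"@ac\.example\s\(John\sSmith\)$", re.IGNORECASE)


def to_oldstyle(from_header):
    """Rewrite a From: value as "address (Real Name)"."""
    name, address = parseaddr(from_header)
    return f"{address} ({name})" if name else address


def rule_matches(from_header):
    """True if the converted From: value is matched by the rule above."""
    return RULE.search(to_oldstyle(from_header)) is not None
```

Both "John Smith <boss@ac.example>" and "john@ac.example (John Smith)"
end up in the old-style form and are matched by the same pattern.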


7. TODO

- make the time value in the filter file more human readable.
- rewrite filtering order to get optimal performance
- filtering on arbitrary header lines
- move docu to tin.5