Member "tin-2.4.3/doc/filtering" (18 Feb 2018, 12168 Bytes) of package /linux/misc/tin-2.4.3.tar.xz:

Filtering in tin

0. Status

This is an overview of the new filtering capabilities of tin. This
document will be merged into the main documentation at some point.


1. Introduction

Tin's filtering mechanism has changed significantly since version 1.3beta.
Originally there were only two possibilities:

1) kill an article matching a rule.
2) hot-select an article matching a rule.

This led to constant confusion, as it seemed to matter which rule came
first in the filter file, but it didn't. Moreover, once an article was
selected for whatever reason, it could not be killed, even if it was Craig
Shergold telling you how to make money fast in a crosspost to alt.test.
This all-or-nothing concept was too rigid, so a more flexible, graded
mechanism was needed: scoring.

With tin's scoring mechanism you assign a "score" to each filter rule.
The scores of all rules matching an article are added up, and the
article's final score decides whether it is regular, marked hot, or
killed.
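
A minimal Python sketch of the idea follows. It is not tin's C code. Each rule is modelled as a hypothetical (predicate, score) pair, and an article's score is the sum of the scores of all matching rules. The sketch ignores the SCORE_MAX cap and the limits described in section 4.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical model: an article is a dict of header name -> value,
# and a rule is a (predicate, score) pair.
Article = Dict[str, str]
Rule = Tuple[Callable[[Article], bool], int]


def article_score(article: Article, rules: List[Rule]) -> int:
    """Sum the scores of all rules whose predicate matches the article."""
    return sum(score for matches, score in rules if matches(article))


rules: List[Rule] = [
    (lambda a: "$$$" in a["Subject"], -100),                # spammy subject
    (lambda a: "alt.test" in a["Newsgroups"], -50),         # test crosspost
    (lambda a: "tin" in a["Subject"].lower().split(), 100),  # topic of interest
]

spam = {"Subject": "Make $$$ fast", "Newsgroups": "alt.test,misc.misc"}
print(article_score(spam, rules))  # -150
```

Because the result is a plain sum, reordering the rules cannot change it. This removes the "which rule came first?" problem described above.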

The standard "kill" and "select" rules already in your filter file get the
scores "score_kill" and "score_select", respectively (see section 4).


2. Changes to the filter-file format

Tin now understands an additional "score" command in the filter file.

Old style rule:

scope=*
type=0
case=0
subj=*$$$*

New style rule:

group=*
case=0
score=-100
subj=*$$$*
#####

This lets you give each rule a weight that reflects how important the
rule is to you. E.g. if you want to be sure never to read a certain
individual again, you may give the rule a score of -9000.

If you want only "classical" filtering and don't want to mess around
with score values, you can use the magic words "kill" and "hot" as score
values in your filter file. Example:

group=*
case=0
score=kill
subj=*$$$*
#####

These words are replaced by the default scores ("score_kill" and
"score_select", see section 4) when tin starts up, and may be easier to
remember than numbers.
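
The following sketch shows how a rule's score value could be resolved to a number. It assumes the recommended defaults from section 4 (score_kill = -100, score_select = 100). For old-style rules without a score= line, it assumes the old format's type=0 means kill and type=1 means select. The helper name and the dict representation of a rule are hypothetical.

```python
from typing import Dict, Optional

SCORE_KILL = -100   # assumed value of the "score_kill" tinrc option
SCORE_SELECT = 100  # assumed value of the "score_select" tinrc option


def resolve_score(rule: Dict[str, str],
                  score_kill: int = SCORE_KILL,
                  score_select: int = SCORE_SELECT) -> Optional[int]:
    """Return the numeric score of a parsed rule, or None if it has none.

    `rule` maps filter commands to their raw values, e.g.
    {"group": "*", "score": "kill", "subj": "*$$$*"} (hypothetical format).
    """
    if "score" in rule:
        value = rule["score"].strip()
        if value == "kill":
            return score_kill
        if value == "hot":
            return score_select
        return int(value)
    # Old-style rule without score=: assumed type=0 is kill, type=1 select.
    old_type = rule.get("type")
    if old_type == "0":
        return score_kill
    if old_type == "1":
        return score_select
    return None


print(resolve_score({"group": "*", "score": "kill"}))  # -100
print(resolve_score({"scope": "*", "type": "0"}))      # -100
```

In this model the magic words are looked up against the configured defaults. Changing score_kill therefore reweights every `score=kill` rule at once.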

You may have noticed in the examples above that tin now writes a line of
hashes between two rules. This is *not* required; it just improves
readability.


3. Changes in the filter menu

The on-screen filter menu is now more compact and fits easily on small
terminals such as a small xterm or a 640x200 CON: window. It also lets
you enter a score for the rule you are adding. The score should be in
the range from 1 to SCORE_MAX; otherwise it defaults to "score_select"
for select rules and "score_kill" for kill rules (see section 4).
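
The menu rule can be read as a small validation step. This sketch assumes SCORE_MAX = 10000 and the recommended defaults. It also guesses that the number typed for a kill rule is a magnitude that gets negated. The text above does not state this, so treat the sign handling as an assumption. The helper name is hypothetical.

```python
SCORE_MAX = 10000   # assumed; see section 4
SCORE_KILL = -100   # assumed "score_kill"
SCORE_SELECT = 100  # assumed "score_select"


def menu_score(entered: str, is_kill_rule: bool) -> int:
    """Turn the score typed into the filter menu into a rule score.

    Values outside 1..SCORE_MAX, or non-numeric input, fall back to the
    configured default for the rule type.
    """
    try:
        value = int(entered)
    except ValueError:
        value = 0  # not a number: force the fallback below
    if not 1 <= value <= SCORE_MAX:
        return SCORE_KILL if is_kill_rule else SCORE_SELECT
    # Assumption: the menu asks for a magnitude and kill rules are negated.
    return -value if is_kill_rule else value
```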


4. Internal defaults and config options

Some of these defaults are constants defined in tin.h, others are
options in tinrc:

SCORE_MAX is the maximum score an article can reach. Any value above
it is cut to SCORE_MAX; likewise, any value below -SCORE_MAX is cut to
-SCORE_MAX.
recommended: 10000
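
This sketch of the cap assumes it is applied each time a matching rule's score is added (saturating addition), rather than once to the final sum. The two approaches can give different results when positive and negative rules are mixed, as the last line shows. Function names are hypothetical.

```python
from typing import Iterable

SCORE_MAX = 10000  # recommended value from this section


def add_score(total: int, rule_score: int, score_max: int = SCORE_MAX) -> int:
    """Add one rule's score, keeping the result in [-score_max, score_max]."""
    return max(-score_max, min(score_max, total + rule_score))


def capped_total(rule_scores: Iterable[int], score_max: int = SCORE_MAX) -> int:
    """Accumulate rule scores with saturating addition."""
    total = 0
    for score in rule_scores:
        total = add_score(total, score, score_max)
    return total


print(capped_total([9000, 9000]))         # 10000
print(capped_total([-9000, -9000]))       # -10000
print(capped_total([9000, 9000, -9000]))  # 1000 (capping once at the end gives 9000)
```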

"score_kill" is the default score for any kill rule that does not
specify one.
recommended: -100

"score_select" is the default score for any auto-selection rule that
does not specify one.
recommended: 100

"score_limit_kill" and "score_limit_select" are the thresholds an
article's total score must cross for the article to be marked as killed
or selected, respectively.
recommended with the values given above: -50/+50.
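
Classifying a final score against the two limits might look like this sketch. It assumes that reaching a limit counts as crossing it (<= and >=); the function name is hypothetical. With the recommended values, a single kill rule (-100) or a single select rule (+100) crosses its limit on its own. One of each cancels out to a regular article.

```python
SCORE_LIMIT_KILL = -50   # recommended "score_limit_kill"
SCORE_LIMIT_SELECT = 50  # recommended "score_limit_select"


def classify(score: int,
             limit_kill: int = SCORE_LIMIT_KILL,
             limit_select: int = SCORE_LIMIT_SELECT) -> str:
    """Map an article's final score to 'killed', 'hot' or 'regular'.

    Assumption: reaching a limit counts as crossing it.
    """
    if score <= limit_kill:
        return "killed"
    if score >= limit_select:
        return "hot"
    return "regular"


print(classify(-100), classify(100), classify(-100 + 100))  # killed hot regular
```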

"score_kill", "score_select", "score_limit_kill" and "score_limit_select"
are config options. You can find them in tin's configuration file
(~/.tin/tinrc), and they can also be changed at runtime in the config menu.


5. Overview of "filter"-commands

Everything here is also described in the file ~/.tin/filter, albeit more
concisely.

All lines are of the form:
command=value

Valid "command"s are:

add a comment to the following rule:

comment= a short text

Multiple comment lines may be used; comment lines _must_ come right
before the scope selection.
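
This minimal parsing sketch for the line format rests on several assumptions:

- each group= (or old-style scope=) line starts a new rule;
- comment= lines directly before it belong to that rule;
- lines starting with "#" (such as the "#####" separators) and blank lines are ignored;
- the value is everything after the first "=".

The function name and dict representation are hypothetical; this is not tin's parser.

```python
from typing import Dict, List


def parse_filter_file(text: str) -> List[Dict[str, object]]:
    """Split filter-file text into rules (hypothetical sketch)."""
    rules: List[Dict[str, object]] = []
    pending_comments: List[str] = []
    current: Dict[str, object] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and "#####" separators carry no data
        command, sep, value = line.partition("=")  # split at the first "="
        if not sep:
            continue  # not of the form command=value
        if command == "comment":
            pending_comments.append(value.strip())
        elif command in ("group", "scope"):
            # A scope selection starts a new rule and claims the comments
            # collected directly before it.
            current = {command: value, "comment": pending_comments}
            pending_comments = []
            rules.append(current)
        elif current:
            current[command] = value
    return rules
```

Splitting only at the first "=" matters for regex patterns that themselves contain "=".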

scope selection:

group=newsgroup_pattern_list

newsgroup_pattern_list is a comma-separated list of newsgroup_patterns.

Each newsgroup_pattern is either a wildmat-style pattern or !pattern,
which negates the match of pattern. This is the same format used for
the AUTO(UN)SUBSCRIBE environment variable.

Tin doesn't rework your filter file; the new pattern matching is only
used when you enter new entries by hand.
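
The sketch below evaluates a newsgroup_pattern_list such as news.software.*,!news.software.readers. It makes two assumptions. First, Python's fnmatch.fnmatchcase stands in for wildmat; the two are close but not identical, and fnmatch has no "\" escapes. Second, the last pattern in the list that matches decides the outcome, so a later !pattern can exclude groups an earlier pattern included.

```python
from fnmatch import fnmatchcase


def group_matches(group: str, pattern_list: str) -> bool:
    """Return True if `group` is selected by a comma-separated list of
    wildmat-style patterns, where "!pattern" negates a match.

    Assumption: the last matching pattern wins; a group matched by no
    pattern is not selected.
    """
    selected = False
    for pattern in pattern_list.split(","):
        pattern = pattern.strip()
        negate = pattern.startswith("!")
        if negate:
            pattern = pattern[1:]
        if fnmatchcase(group, pattern):
            selected = not negate
    return selected


scope = "news.software.*,!news.software.readers"
print(group_matches("news.software.nntp", scope))     # True
print(group_matches("news.software.readers", scope))  # False
print(group_matches("comp.lang.c", scope))            # False
```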

additional info:

case=num    num: 0=case sensitive, 1=case insensitive
score=num   num: score value of the rule; can also be one of the magic
                 words "kill" or "hot", which are equivalent to the
                 "score_kill" and "score_select" values, respectively.
time=num    num: time_t value at which the rule expires. When tin writes
                 the filter file it adds the time in human-readable form
                 as a comment in parentheses after the numeric value.
                 When reading the file tin uses _only_ the numeric value,
                 not the human-readable form.
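
This sketch reads a time= value the way described above: it uses only the leading number and ignores the parenthesised human-readable part. The exact text tin writes inside the parentheses is not shown in this document, so the sample value in the docstring is illustrative. Both helper names are hypothetical.

```python
import re
import time
from typing import Optional

_TIME_VALUE = re.compile(r"\s*(\d+)")


def expiry_time(value: str) -> int:
    """Return the time_t from a time= value such as
    "1521720000 (Thu Mar 22 12:00:00 2018)"; the comment is ignored."""
    match = _TIME_VALUE.match(value)
    if match is None:
        raise ValueError(f"no numeric time_t in {value!r}")
    return int(match.group(1))


def rule_expired(value: str, now: Optional[float] = None) -> bool:
    """True once the current time has reached the rule's expiry time."""
    if now is None:
        now = time.time()
    return now >= expiry_time(value)
```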

matches:            matched to:

subj=pattern        Subject:
from=pattern        From:
                    Tin converts the contents of the From: header to an
                    old-style e-mail address, i.e.
                    "some@body.example (John Doe)" instead of
                    "John Doe <some@body.example>", before trying to match
                    the patterns in the filter rule. That way a rule
                    tailored to match the full From: header
                    "jsmith@ac.example (John Smith)" will still work when
                    John posts with a different newsreader that uses
                    "John Smith <jsmith@ac.example>".
msgid=pattern       Message-Id: *AND* full References:
msgid_last=pattern  Message-Id: and last References: entry only
msgid_only=pattern  Message-Id:
refs_only=pattern   References: line (e.g. <123@ether.net>) without Message-Id:
lines=num           Lines: ; <num matches less than, >num matches more than.
gnksa=[<>]?NUM      GNKSA parse_from() return code
xref=pattern        Xref: ; filter crossposts to groups matching pattern
path=pattern        Path: ; filter server names matching pattern
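
The From: conversion can be sketched as follows. The sketch only handles the simple "Name <address>" form and input that is already old-style. It does not handle RFC 2047 encoded words, address groups or nested comments, and the helper name is hypothetical. It then checks the John Smith rule from section 6.2 in regex mode, with case=1 mirrored by re.IGNORECASE.

```python
import re

# "Name <addr>" or "\"Name\" <addr>" (route-address form), simplified.
_ROUTE_ADDR = re.compile(r'^\s*"?(?P<name>[^"<]*?)"?\s*<(?P<addr>[^>]+)>\s*$')


def to_oldstyle(from_header: str) -> str:
    """Rewrite 'John Doe <some@body.example>' as
    'some@body.example (John Doe)'; other forms are returned unchanged."""
    match = _ROUTE_ADDR.match(from_header)
    if match is None:
        return from_header.strip()
    name = match.group("name").strip()
    addr = match.group("addr").strip()
    return f"{addr} ({name})" if name else addr


rule = re.compile(r"@ac\.example\s\(John\sSmith\)$", re.IGNORECASE)
for header in ("John Smith <jsmith@ac.example>",
               "jsmith@ac.example (John Smith)",
               '"John Smith" <boss@ac.example>'):
    print(bool(rule.search(to_oldstyle(header))))  # True for all three
```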

When using wildmat pattern matching, patterns in ~/.tin/filter should
be delimited with "*", and literal wildcard characters in patterns must
be escaped with "\". When rules are written by the built-in filter-file
functions, tin tries to take care of this itself, except for text you
enter in the built-in kill/hot menu. There you have to quote manually,
because tin doesn't know whether e.g. "\[" is already quoted or not.
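
This sketch shows the quoting for text typed into the kill/hot menu. It escapes wildmat's special characters, assumed here to be *, ?, [, ] and \. It then wraps the result in "*" so that it matches anywhere in the header. For regex mode, Python's re.escape plays a similar role. The function name is hypothetical, and this is not tin's own quoting routine.

```python
import re

# Assumed wildmat metacharacters: * ? [ ] and the escape character itself.
_WILDMAT_SPECIALS = re.compile(r"([*?\[\]\\])")


def wildmat_quote(text: str) -> str:
    """Escape wildmat metacharacters and delimit the text with '*'."""
    return "*" + _WILDMAT_SPECIALS.sub(r"\\\1", text) + "*"


print(wildmat_quote("[ANN] tin 2.4.3"))  # *\[ANN\] tin 2.4.3*
```

Quoting text that is already quoted would double the backslashes. This is exactly why tin leaves menu input to you.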

GNKSA return codes: these are the return codes of the From: address
parser, enabling you to filter on certain kinds of syntactic and
semantic errors in that header. For an up-to-date list see the
definitions in extern.h and the parser source code in misc.c; the
following is just a short introduction.

   0-99: internal codes
code   error description
   0   no error, valid address
   1   internal error, should not happen (blame me)

   100-199: general syntactical errors
code   error description
 100   left angle bracket ("<") missing in route address
 101   left parenthesis ("(") missing in old-style address (realname comment)
 102   right parenthesis (")") missing in old-style address (realname comment)
 103   at-sign ("@") missing in mail address

   200-299: right hand side (FQDN part) of address, syntax and semantics
code   error description
 200   right hand side (RHS) of address is a single component
 201   RHS has an unknown top level domain (3 or more characters)
 202   RHS has a malformed top level domain
 203   RHS has an unknown country code as top level domain
 204   illegal character in RHS
 205   leading or trailing dot or two consecutive dots in RHS
 206   RHS has a component longer than 63 characters
 207   RHS has a component with leading or trailing hyphen ("-")
 208   RHS has a component starting with a digit (with ENFORCE_RFC1034 only)
 209   RHS is not a valid IP address
 210   RHS is an IP address from private IP space (see RFC1918) or loopback
 211   brackets ("[", "]") around IP address missing in RHS

   300-399: syntactical errors in left hand side (localpart) of address
code   error description
 300   there was no localpart found at all in address
 301   localpart contains illegal characters
 302   localpart has leading, trailing or consecutive dots

   400-499: syntactical errors in realname part
code   error description
 400   illegal character in unquoted word in realname part
 401   illegal character in quoted word in realname part
 402   illegal character in encoded word in realname part
 403   bad syntax in encoded word in realname part
 404   illegal character in old-style realname part (one of "()<>\")
 405   illegal character in realname part
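
Both gnksa= and lines= take an optional "<" or ">" in front of a number. The sketch below evaluates such a spec and names the code ranges from the table above. Two things are assumptions: that a bare number means an exact match, and the helper names. Under this reading, gnksa=>199 would catch every error in the right hand side, localpart and realname groups.

```python
from typing import List, Tuple


def matches_numeric(spec: str, value: int) -> bool:
    """Evaluate a "[<>]?NUM" spec: "<N" is less than N, ">N" is greater
    than N, and a plain "N" is (by assumption) equal to N."""
    spec = spec.strip()
    if spec.startswith("<"):
        return value < int(spec[1:])
    if spec.startswith(">"):
        return value > int(spec[1:])
    return value == int(spec)


GNKSA_GROUPS: List[Tuple[int, int, str]] = [
    (0, 99, "internal"),
    (100, 199, "general syntax"),
    (200, 299, "right hand side (FQDN)"),
    (300, 399, "localpart"),
    (400, 499, "realname"),
]


def gnksa_group(code: int) -> str:
    """Name the table group a GNKSA return code belongs to."""
    for low, high, name in GNKSA_GROUPS:
        if low <= code <= high:
            return name
    return "unknown"
```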


Path:-filter
Restrictions - this will only work if:
- reading from local spool and
-- without access to local NOV files and OVERVIEW.FMT or
-- local NOV files provide Path data
- or reading via NNTP and
-- NOV files provide Path data or
-- server supports HDR and announces "Path" in LIST HEADERS RANGE or
-- server does not support HDR but supports XHDR and returns "Path" data if requested or
-- server supports XPAT and returns "Path" data if requested
Side effects:
When using a Path:-filter, tin _may_ rebuild locally cached overview
data if cache_overview_files=ON is set, to get the Path data into the
local cache of the group where the filter is active. This may cause
additional NNTP traffic once.
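
The restriction list nests "and" and "or". Written out as a boolean expression, it reads as in the sketch below. All parameter names are hypothetical labels for the conditions in the list, not tin settings.

```python
def path_filter_possible(*, local_spool: bool,
                         has_local_nov_and_overview_fmt: bool,
                         nov_has_path: bool,
                         nntp: bool,
                         server_hdr: bool,
                         hdr_lists_path: bool,
                         server_xhdr_returns_path: bool,
                         server_xpat_returns_path: bool) -> bool:
    """Return True when Path: data is available for filtering."""
    via_spool = local_spool and (not has_local_nov_and_overview_fmt
                                 or nov_has_path)
    via_nntp = nntp and (
        nov_has_path
        or (server_hdr and hdr_lists_path)
        or (not server_hdr and server_xhdr_returns_path)
        or server_xpat_returns_path
    )
    return via_spool or via_nntp
```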

6. EXAMPLES

6.1 WILDMAT EXAMPLES

none given, too simple, find out yourself ,-)

6.2 REGEXP EXAMPLES

Be sure to change the wildcard setting from WILDMAT (default) to REGEX to
make the following examples work properly. This can be done in the
internal configuration menu or in the file ~/.tin/tinrc.

comment= this kills all articles about CNews, DNEWS or diablo
comment= in news.software.* but not in news.software.readers
group=news.software.*,!news.software.readers
case=1
score=kill
subj=([cd]news|diablo)


comment= this should mark all articles about tin, rtin, tind, ktin or cdtin
comment= as hot
group=*
case=1
score=hot
subj=\b(cd|[rk]?)?tin(d|pre)?[-.0-9]*\b
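
You can try the tin pattern in Python's re module to see what it actually catches before putting it into the filter file. Python's syntax is close to, but not identical to, the PCRE-style regular expressions used in these examples, so this is only a quick sanity check. case=1 is mirrored with re.IGNORECASE.

```python
import re

# case=1 (case insensitive) is mirrored by re.IGNORECASE.
TIN_HOT = re.compile(r"\b(cd|[rk]?)?tin(d|pre)?[-.0-9]*\b", re.IGNORECASE)

subjects = {
    "ANN: tin-2.4.3 released": True,
    "rtin vs. ktin": True,
    "TIND configuration": True,
    "Martin's question": False,  # "tin" inside a word: no \b before it
    "tinder": False,             # no word boundary after "tin" or "tind"
}
for subject, expected in subjects.items():
    assert bool(TIN_HOT.search(subject)) == expected, subject
print("all subjects behave as expected")
```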


comment= mark own articles and followups to own articles as hot in all groups
comment= except local ones
comment= match From: (a bit complex) and/or
comment= Message-ID: (I'm the only user who's posting on this server)
group=*,!akk.*,!tin.*
case=1
score=hot
from=urs@(.*\.)?((akk\.uni-karlsruhe|arbeitsen)\.de|(karlsruhe|tin|akk)\.org|ka\.nu)
msgid=@akk3(?:-dmz)?\.akk\.uni-karlsruhe\.de>


comment= some people read control.cancel to see if there are any
comment= forged cancels around... the next rule helps you a bit
comment= ignore known despammers and net.* cancels
group=control.cancel
case=1
score=kill
from=(news@news\.msfc\.nasa\.gov|clewis@ferret\.ocunix\.on\.ca|jem@xpat\.com|(jeremy|lysander)@exit109\.com|howardk@iswest\.com|cosmo.roadkill.*rauug\.mil\.wi\.us|spamless@pacbell\.net|cwilkins@.*\.clark\.net)
msgid_only=<net-monitor-cancel


comment= this might help when reading alt.*
comment= ignore all postings with $$$ or *** or !!!
comment= ignore all postings shorter than 3 lines
comment= ignore all postings crossposted into more than 10 groups
comment= if an article has fewer than 3 lines AND e.g. !!!
comment= in the subject it gets a score of -400
group=alt.*
case=1
score=-200
subj=[$*!]{3,}
lines=<3
xref=([^,]+,){10,}
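
The -400 in the last comment implies that each matching line of this rule (subj=, lines=, xref=) adds the rule's score separately. The sketch models the rule that way. xref_groups is a hypothetical comma-separated list of the groups the article was crossposted to, which is the shape the pattern ([^,]+,){10,} expects.

```python
import re

RULE_SCORE = -200
SUBJ = re.compile(r"[$*!]{3,}", re.IGNORECASE)      # subj=, case=1
XREF = re.compile(r"([^,]+,){10,}", re.IGNORECASE)  # xref=, case=1


def alt_rule_score(subject: str, lines: int, xref_groups: str) -> int:
    """Score from the alt.* example rule, assuming every matching line
    adds the rule score once."""
    hits = [
        SUBJ.search(subject) is not None,
        lines < 3,                             # lines=<3
        XREF.search(xref_groups) is not None,
    ]
    return RULE_SCORE * sum(hits)


print(alt_rule_score("FREE MONEY!!!", 2, "alt.test"))        # -400
print(alt_rule_score("Question about tin", 40, "alt.test"))  # 0
```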

comment= mark own articles and direct replies based on message-id
comment= use 2*hot as score to unkill otherwise killed articles
group=*
case=1
score=200
msgid_last=doeblitz\.ts\.rz\.tu-bs\.de

comment= unmark own articles based on message-id
comment= -> only follow-ups to own articles stay marked hot
group=*
case=1
score=-200
msgid_only=doeblitz\.ts\.rz\.tu-bs\.de
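
Adding up the two rules above for three kinds of articles shows why the first rule uses twice the hot score:

- own articles end up at 0;
- direct replies end up at +200;
- unrelated articles stay at 0.

With the recommended settings, +200 outweighs a single kill rule (-100) and still stays above score_limit_select (+50). The sketch is a simplified model of the msgid_last/msgid_only semantics from the table in section 5. The helper name and sample IDs are hypothetical.

```python
import re

OWN = re.compile(r"doeblitz\.ts\.rz\.tu-bs\.de", re.IGNORECASE)  # case=1


def msgid_rules_score(message_id: str, references: str) -> int:
    """Combined score of the two example rules (simplified model).

    msgid_last is modelled as "Message-ID or last References entry",
    msgid_only as "Message-ID only".
    """
    refs = references.split()
    last_ref = refs[-1] if refs else ""
    score = 0
    if OWN.search(message_id) or OWN.search(last_ref):
        score += 200  # msgid_last rule
    if OWN.search(message_id):
        score -= 200  # msgid_only rule
    return score


own = msgid_rules_score("<1@doeblitz.ts.rz.tu-bs.de>", "")
reply = msgid_rules_score("<9@other.example>",
                          "<0@else.example> <1@doeblitz.ts.rz.tu-bs.de>")
other = msgid_rules_score("<9@other.example>", "<0@else.example>")
print(own, reply, other)  # 0 200 0
```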


comment= kill all articles which do not have your message-id
comment= as last reference _if_ the article has any references
group=de.newusers.questions
case=1
score=-100
refs_only=.*<[^@\s]+@\S+(?<!akk3\.akk\.uni-karlsruhe\.de)>$

comment= Kill all articles from John Smith, who writes under different
comment= addresses at ac.example, e.g. john@ac.example and boss@ac.example
group=*
case=1
score=kill
from=@ac\.example\s\(John\sSmith\)$

comment= Kill all articles which have news.example.org
comment= in the Path: header
group=*
case=1
score=kill
path=news\.example\.org

7. TODO

- make the time value in the filter file more human-readable
- rewrite filtering order to get optimal performance
- filtering on arbitrary header lines
- move documentation to tin.5