"Fossies" - the Fresh Open Source Software Archive  

Source code changes of the file "doc/ocrad.info" between
ocrad-0.25-pre5.tar.gz and ocrad-0.25-pre6.tar.gz

About: GNU Ocrad is an OCR (Optical Character Recognition) program. Pre-release.

ocrad.info  (ocrad-0.25-pre5):ocrad.info  (ocrad-0.25-pre6)
File: ocrad.info, Node: Top, Next: Introduction, Up: (dir) File: ocrad.info, Node: Top, Next: Introduction, Up: (dir)
GNU Ocrad Manual GNU Ocrad Manual
**************** ****************
This manual is for GNU Ocrad (version 0.25-pre5, 8 January 2015). This manual is for GNU Ocrad (version 0.25-pre6, 19 January 2015).
* Menu: * Menu:
* Introduction:: Purpose and features of GNU Ocrad * Introduction:: Purpose and features of GNU Ocrad
* Character sets:: Input charsets and output formats * Character sets:: Input charsets and output formats
* Invoking ocrad:: Command line interface * Invoking ocrad:: Command line interface
* Filters:: Postprocessing the produced text * Filters:: Postprocessing the produced text
* Library version:: Checking library version * Library version:: Checking library version
* Library functions:: Descriptions of the library functions * Library functions:: Descriptions of the library functions
* Library error codes:: Meaning of codes returned by functions * Library error codes:: Meaning of codes returned by functions
skipping to change at line 220 skipping to change at line 220
Filters don't enable the recognition of characters, just filter them Filters don't enable the recognition of characters, just filter them
from the output. Use '--charset' to enable the recognition of a from the output. Use '--charset' to enable the recognition of a
character set different from the default ISO-8859-15. character set different from the default ISO-8859-15.
Ocrad provides both built-in filters and user-defined filters. Ocrad provides both built-in filters and user-defined filters.
4.1 User-defined filters 4.1 User-defined filters
======================== ========================
The format of a user-defined filter file (*note --user-filter::) is very The format of a user-defined filter file (*note --user-filter::) is very
simple. Each line contains a comma-separated list of quoted characters simple. Each line contains either a character conversion or a word that
specifies the default behaviour for unlisted characters.
A character conversion is a comma-separated list of quoted characters
('c'), character sets ([0-9A-Z]), character codes (U0063), or character ('c'), character sets ([0-9A-Z]), character codes (U0063), or character
ranges (U0000 - UFFFF), and an optional conversion (an equal sign (=) ranges (U0000 - UFFFF), and an optional conversion (an equal sign (=)
followed by a quoted character or a character code). The characters in followed by a quoted character or a character code). The characters in
the list are converted to the character in the conversion. If no the list are converted to the character in the conversion. If no
conversion is specified, the character is left unmodified (converted to conversion is specified, the character is left unmodified (converted to
itself). itself).
(ocrad-0.25-pre5)
Any character not appearing in the file, either by itself or
included in a set or range, will be discarded. The destination
character of a conversion is considered as listed by default. Every
character may be listed more than once, even as part of different
conversions. The last conversion affecting a given character is the one
that is performed.
(ocrad-0.25-pre6)
The default behaviour is to discard unlisted characters, i.e. those
characters not appearing in the file, either by themselves or included
in a set or range. If a line containing just the word 'leave' is found
in the file, unlisted characters are left unmodified. If the word is
'mark', unlisted characters are marked as unrecognized.
The destination character of a conversion is considered as listed by
default. Every character may be listed more than once, even as part of
different conversions. The last conversion affecting a given character
is the one that is performed.
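The pre6 rules for listed and unlisted characters can be modelled with a short Python sketch. This is not Ocrad's implementation: the function names, the `(sources, destination)` representation of a parsed filter line, and the `_` placeholder used for "marked as unrecognized" are all assumptions made for illustration. The test for spaces and control characters is also only an approximation.

```python
def build_table(conversions):
    """Map each listed character to its replacement.

    `conversions` is a list of (sources, destination) pairs in file order.
    `destination` is None when the line has no '=' part.
    This representation is hypothetical, not Ocrad's internal one.
    """
    table = {}
    for sources, dest in conversions:
        if dest is not None:
            # The destination counts as listed unless it is already mapped.
            table.setdefault(dest, dest)
        for ch in sources:
            # The last conversion affecting a character is the one performed.
            table[ch] = ch if dest is None else dest
    return table


def apply_filter(text, table, default="discard", mark_char="_"):
    """Apply the table; `default` is 'discard', 'leave', or 'mark'."""
    out = []
    for ch in text:
        if ch.isspace() or ord(ch) < 0x20:
            out.append(ch)          # spaces and control characters pass through
        elif ch in table:
            out.append(table[ch])   # listed character: convert
        elif default == "leave":
            out.append(ch)          # unlisted, left unmodified
        elif default == "mark":
            out.append(mark_char)   # unlisted, marked (placeholder is assumed)
        # default == "discard": the unlisted character is dropped
    return "".join(out)


# 'b' is listed twice; the later line (no conversion) wins, so 'b' stays 'b'.
table = build_table([("ab", "x"), ("b", None), ("c", "y")])
print(apply_filter("abcd", table))            # xby
print(apply_filter("abcd", table, "leave"))   # xbyd
print(apply_filter("abcd", table, "mark"))    # xby_
```

Processing the lines in file order and overwriting earlier entries is one simple way to get the "last conversion wins" behaviour described above.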
Character sets and quoted characters may contain escape sequences. Character sets and quoted characters may contain escape sequences.
The character '#' at the beginning of a line or after whitespace starts a The character '#' at the beginning of a line or after whitespace starts a
comment that extends to the end of the line. comment that extends to the end of the line.
Ranges of characters may be specified in character sets by writing Ranges of characters may be specified in character sets by writing
the starting and ending characters with a '-' between them. Thus, the starting and ending characters with a '-' between them. Thus,
'[A-Z]' matches any ASCII uppercase letter. '-' may be specified by '[A-Z]' matches any ASCII uppercase letter. '-' may be specified by
placing it first or last. ']' may be specified by placing it first. If placing it first or last. ']' may be specified by placing it first. If
skipping to change at line 260 skipping to change at line 267
capital letter y with diaeresis' is specified in a set as '[\xBE]', but capital letter y with diaeresis' is specified in a set as '[\xBE]', but
its code is 'U0178'. its code is 'U0178'.
Spaces and control characters are unaffected by filters, except that Spaces and control characters are unaffected by filters, except that
leading, trailing, and duplicate spaces produced by the removal of other leading, trailing, and duplicate spaces produced by the removal of other
characters will be themselves removed. characters will be themselves removed.
Here is an example user-defined filter file equivalent to the built-in Here is an example user-defined filter file equivalent to the built-in
filter 'numbers': filter 'numbers':
U0000 - U00FF # remove this line to get 'numbers_only' leave # remove this line to get 'numbers_only'
'D', 'O', 'Q', 'o' = '0' 'D', 'O', 'Q', 'o' = '0'
'I', 'L', 'l', '|' = '1' 'I', 'L', 'l', '|' = '1'
'Z', 'z' = '2' 'Z', 'z' = '2'
'3' '3'
'A', 'q' = '4' 'A', 'q' = '4'
'S', 's' = '5' 'S', 's' = '5'
'G', 'b', U00F3 = '6' # latin small letter o with acute 'G', 'b', U00F3 = '6' # latin small letter o with acute
'J', 'T' = '7' 'J', 'T' = '7'
'&', 'B' = '8' '&', 'B' = '8'
'g' = '9' 'g' = '9'
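To see what this example filter does to a line of text, the conversions can be transcribed into Python as below. This is an illustrative model only: the `numbers` function and its `leave` flag are hypothetical names. The space cleanup is a simplification that collapses every run of whitespace, not only the runs produced by removing characters.

```python
# Conversions copied from the example filter: (source characters, destination).
CONVERSIONS = [
    ("DOQo", "0"), ("ILl|", "1"), ("Zz", "2"), ("3", "3"),
    ("Aq", "4"), ("Ss", "5"), ("Gb\u00f3", "6"), ("JT", "7"),
    ("&B", "8"), ("g", "9"),
]

TABLE = {}
for sources, dest in CONVERSIONS:
    TABLE.setdefault(dest, dest)   # destination characters count as listed
    for ch in sources:
        TABLE[ch] = dest


def numbers(line, leave=True):
    """Model 'numbers' (leave=True) or 'numbers_only' (leave=False)."""
    out = []
    for ch in line:
        if ch.isspace():
            out.append(ch)         # spaces are unaffected by the conversions
        elif ch in TABLE:
            out.append(TABLE[ch])
        elif leave:
            out.append(ch)         # the 'leave' line keeps unlisted characters
        # otherwise the unlisted character is discarded
    result = "".join(out)
    if not leave:
        # Simplified cleanup of leading, trailing, and duplicate spaces.
        result = " ".join(result.split())
    return result


print(numbers("Il O3"))                # 11 03
print(numbers("x Z5 y"))               # x 25 y
print(numbers("x Z5 y", leave=False))  # 25
```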
 End of changes. 4 change blocks. 
9 lines changed or deleted 16 lines changed or added
