"Fossies" - the Fresh Open Source Software Archive  

Source code changes of the file "doc/ocrad.texi" between
ocrad-0.25-pre5.tar.gz and ocrad-0.25-pre6.tar.gz

About: GNU Ocrad is an OCR (Optical Character Recognition) program. Pre-release.

ocrad.texi  (ocrad-0.25-pre5):ocrad.texi  (ocrad-0.25-pre6)
\input texinfo @c -*-texinfo-*- \input texinfo @c -*-texinfo-*-
@c %**start of header @c %**start of header
@setfilename ocrad.info @setfilename ocrad.info
@documentencoding ISO-8859-15 @documentencoding ISO-8859-15
@settitle GNU Ocrad Manual @settitle GNU Ocrad Manual
@finalout @finalout
@c %**end of header @c %**end of header
@set UPDATED 8 January 2015 @set UPDATED 19 January 2015
@set VERSION 0.25-pre5 @set VERSION 0.25-pre6
@dircategory GNU Packages @dircategory GNU Packages
@direntry @direntry
* Ocrad: (ocrad). The GNU OCR program * Ocrad: (ocrad). The GNU OCR program
@end direntry @end direntry
@ifnothtml @ifnothtml
@titlepage @titlepage
@title GNU Ocrad @title GNU Ocrad
@subtitle The GNU OCR Program @subtitle The GNU OCR Program
skipping to change at line 253 skipping to change at line 253
Filters don't enable the recognition of characters, just filter them Filters don't enable the recognition of characters, just filter them
from the output. Use @samp{--charset} to enable the recognition of a from the output. Use @samp{--charset} to enable the recognition of a
character set different from the default ISO-8859-15. character set different from the default ISO-8859-15.
Ocrad provides both built-in filters and user-defined filters. Ocrad provides both built-in filters and user-defined filters.
@section User-defined filters @section User-defined filters
The format of a user-defined filter file (@pxref{--user-filter}) is very The format of a user-defined filter file (@pxref{--user-filter}) is very
simple. Each line contains a comma-separated list of quoted characters simple. Each line contains either a character conversion or a word that
specifies the default behaviour for unlisted characters.
A character conversion is a comma-separated list of quoted characters
('c'), character sets ([0-9A-Z]), character codes (U0063), or character ('c'), character sets ([0-9A-Z]), character codes (U0063), or character
ranges (U0000 - UFFFF), and an optional conversion (an equal sign (=) ranges (U0000 - UFFFF), and an optional conversion (an equal sign (=)
followed by a quoted character or a character code). The characters in followed by a quoted character or a character code). The characters in
the list are converted to the character in the conversion. If no the list are converted to the character in the conversion. If no
conversion is specified, the character is left unmodified (converted to conversion is specified, the character is left unmodified (converted to
itself). itself).
Any character not appearing in the file, either by itself or included in The default behaviour is to discard unlisted characters, i.e. those
a set or range, will be discarded. The destination character of a characters not appearing in the file, either by themselves or included
conversion is considered as listed by default. Every character may be in a set or range. If a line containing just the word @samp{leave} is
listed more than once, even as part of different conversions. The last found in the file, unlisted characters are left unmodified. If the word
conversion affecting a given character is the one that is performed. is @samp{mark}, unlisted characters are marked as unrecognized.
The destination character of a conversion is considered as listed by
default. Every character may be listed more than once, even as part of
different conversions. The last conversion affecting a given character
is the one that is performed.
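The behaviour described in the pre6 text can be modelled with a short Python sketch. This is not Ocrad's implementation. The function names `build_filter` and `apply_filter`, the tuple representation of a conversion, and the `'_'` placeholder used for a character marked as unrecognized are all assumptions made for illustration.

```python
# Hypothetical model of the user-defined filter semantics; not Ocrad code.
UNRECOGNIZED = "_"  # assumed placeholder for "marked as unrecognized"


def build_filter(conversions, default="discard"):
    """Build a lookup table from conversions given in file order.

    conversions: list of (sources, destination) pairs, where sources is a
                 string of listed characters and destination is a single
                 character or None (no conversion specified).
    default:     'discard', 'leave', or 'mark' for unlisted characters.
    """
    if default not in ("discard", "leave", "mark"):
        raise ValueError("unknown default behaviour: " + default)
    table = {}
    for sources, dest in conversions:
        for ch in sources:
            # Later lines overwrite earlier ones: the last conversion wins.
            # Without a conversion, the character is converted to itself.
            table[ch] = ch if dest is None else dest
        if dest is not None:
            # The destination counts as listed unless already listed.
            table.setdefault(dest, dest)
    return table, default


def apply_filter(text, table, default):
    """Apply the table to text; whitespace passes through unchanged."""
    out = []
    for ch in text:
        if ch.isspace():
            out.append(ch)
        elif ch in table:
            out.append(table[ch])
        elif default == "leave":
            out.append(ch)
        elif default == "mark":
            out.append(UNRECOGNIZED)
        # default == "discard": the unlisted character is dropped
    return "".join(out)


# Three of the conversions from a 'numbers'-style filter.
sample = [("DOQo", "0"), ("ILl|", "1"), ("3", None)]
for mode in ("discard", "leave", "mark"):
    table, default = build_filter(sample, mode)
    print(mode, repr(apply_filter("lO3x", table, default)))
```

In this sketch the unlisted `x` is dropped under `discard`, kept under `leave`, and replaced by the assumed placeholder under `mark`.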
Character sets and quoted characters may contain escape sequences. Character sets and quoted characters may contain escape sequences.
The character @samp{#} at the beginning of a line or after whitespace starts a The character @samp{#} at the beginning of a line or after whitespace starts a
comment that extends to the end of the line. comment that extends to the end of the line.
Ranges of characters may be specified in character sets by writing the Ranges of characters may be specified in character sets by writing the
starting and ending characters with a @samp{-} between them. Thus, starting and ending characters with a @samp{-} between them. Thus,
@samp{[A-Z]} matches any ASCII uppercase letter. @samp{-} may be @samp{[A-Z]} matches any ASCII uppercase letter. @samp{-} may be
specified by placing it first or last. @samp{]} may be specified by specified by placing it first or last. @samp{]} may be specified by
skipping to change at line 294 skipping to change at line 302
Spaces and control characters are unaffected by filters, except that Spaces and control characters are unaffected by filters, except that
leading, trailing, and duplicate spaces produced by the removal of other leading, trailing, and duplicate spaces produced by the removal of other
characters will themselves be removed. characters will themselves be removed.
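The space clean-up can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions: the hypothetical helper `squeeze_spaces` strips every leading, trailing, and duplicate space on a line, whereas the text above only covers spaces produced by the removal of other characters.

```python
def squeeze_spaces(line):
    """Drop leading, trailing, and duplicate spaces (simplified model)."""
    # split(" ") yields empty strings for each extra space; discard them.
    return " ".join(part for part in line.split(" ") if part)


print(repr(squeeze_spaces("  12   34 ")))
```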
@noindent @noindent
Here is an example user-defined filter file equivalent to the built-in Here is an example user-defined filter file equivalent to the built-in
filter @samp{numbers}: filter @samp{numbers}:
@example @example
U0000 - U00FF # remove this line to get @samp{numbers_only} leave # remove this line to get @samp{numbers_only}
'D', 'O', 'Q', 'o' = '0' 'D', 'O', 'Q', 'o' = '0'
'I', 'L', 'l', '|' = '1' 'I', 'L', 'l', '|' = '1'
'Z', 'z' = '2' 'Z', 'z' = '2'
'3' '3'
'A', 'q' = '4' 'A', 'q' = '4'
'S', 's' = '5' 'S', 's' = '5'
'G', 'b', U00F3 = '6' # latin small letter o with acute 'G', 'b', U00F3 = '6' # latin small letter o with acute
'J', 'T' = '7' 'J', 'T' = '7'
'&', 'B' = '8' '&', 'B' = '8'
'g' = '9' 'g' = '9'
 End of changes. 4 change blocks. 
9 lines changed or deleted 17 lines changed or added
