ocrad.texi (ocrad-0.25-pre5) | : | ocrad.texi (ocrad-0.25-pre6) | ||
---|---|---|---|---|
\input texinfo @c -*-texinfo-*- | \input texinfo @c -*-texinfo-*- | |||
@c %**start of header | @c %**start of header | |||
@setfilename ocrad.info | @setfilename ocrad.info | |||
@documentencoding ISO-8859-15 | @documentencoding ISO-8859-15 | |||
@settitle GNU Ocrad Manual | @settitle GNU Ocrad Manual | |||
@finalout | @finalout | |||
@c %**end of header | @c %**end of header | |||
@set UPDATED 8 January 2015 | @set UPDATED 19 January 2015 | |||
@set VERSION 0.25-pre5 | @set VERSION 0.25-pre6 | |||
@dircategory GNU Packages | @dircategory GNU Packages | |||
@direntry | @direntry | |||
* Ocrad: (ocrad). The GNU OCR program | * Ocrad: (ocrad). The GNU OCR program | |||
@end direntry | @end direntry | |||
@ifnothtml | @ifnothtml | |||
@titlepage | @titlepage | |||
@title GNU Ocrad | @title GNU Ocrad | |||
@subtitle The GNU OCR Program | @subtitle The GNU OCR Program | |||
skipping to change at line 253 | skipping to change at line 253 | |||
Filters don't enable the recognition of characters, just filter them | Filters don't enable the recognition of characters, just filter them | |||
from the output. Use @samp{--charset} to enable the recognition of a | from the output. Use @samp{--charset} to enable the recognition of a | |||
character set different from the default ISO-8859-15. | character set different from the default ISO-8859-15. | |||
Ocrad provides both built-in filters and user-defined filters. | Ocrad provides both built-in filters and user-defined filters. | |||
@section User-defined filters | @section User-defined filters | |||
The format of a user-defined filter file (@pxref{--user-filter}) is very | The format of a user-defined filter file (@pxref{--user-filter}) is very | |||
simple. Each line contains a comma-separated list of quoted characters | simple. Each line contains either a character conversion or a word that | |||
| | specifies the default behaviour for unlisted characters. | |||
| | A character conversion is a comma-separated list of quoted characters | |||
('c'), character sets ([0-9A-Z]), character codes (U0063), or character | ('c'), character sets ([0-9A-Z]), character codes (U0063), or character | |||
ranges (U0000 - UFFFF), and an optional conversion (an equal sign (=) | ranges (U0000 - UFFFF), and an optional conversion (an equal sign (=) | |||
followed by a quoted character or a character code). The characters in | followed by a quoted character or a character code). The characters in | |||
the list are converted to the character in the conversion. If no | the list are converted to the character in the conversion. If no | |||
conversion is specified, the character is left unmodified (converted to | conversion is specified, the character is left unmodified (converted to | |||
itself). | itself). | |||
Any character not appearing in the file, either by itself or included in | The default behaviour is to discard unlisted characters, i.e. those | |||
a set or range, will be discarded. The destination character of a | characters not appearing in the file, either by themselves or included | |||
conversion is considered as listed by default. Every character may be | in a set or range. If a line containing just the word @samp{leave} is | |||
listed more than once, even as part of different conversions. The last | found in the file, unlisted characters are left unmodified. If the word | |||
conversion affecting a given character is the one that is performed. | is @samp{mark}, unlisted characters are marked as unrecognized. | |||
| | The destination character of a conversion is considered as listed by | |||
| | default. Every character may be listed more than once, even as part of | |||
| | different conversions. The last conversion affecting a given character | |||
| | is the one that is performed. | |||
Character sets and quoted characters may contain escape sequences. | Character sets and quoted characters may contain escape sequences. | |||
The character @samp{#} at begin of line or after whitespace starts a | The character @samp{#} at begin of line or after whitespace starts a | |||
comment that extends to the end of the line. | comment that extends to the end of the line. | |||
Ranges of characters may be specified in character sets by writing the | Ranges of characters may be specified in character sets by writing the | |||
starting and ending characters with a @samp{-} between them. Thus, | starting and ending characters with a @samp{-} between them. Thus, | |||
@samp{[A-Z]} matches any ASCII uppercase letter. @samp{-} may be | @samp{[A-Z]} matches any ASCII uppercase letter. @samp{-} may be | |||
specified by placing it first or last. @samp{]} may be specified by | specified by placing it first or last. @samp{]} may be specified by | |||
skipping to change at line 294 | skipping to change at line 302 | |||
Spaces and control characters are unaffected by filters, except that | Spaces and control characters are unaffected by filters, except that | |||
leading, trailing, and duplicate spaces produced by the removal of other | leading, trailing, and duplicate spaces produced by the removal of other | |||
characters will be themselves removed. | characters will be themselves removed. | |||
@noindent | @noindent | |||
Here is an example user-defined filter file equivalent to the built-in | Here is an example user-defined filter file equivalent to the built-in | |||
filter @samp{numbers}: | filter @samp{numbers}: | |||
@example | @example | |||
U0000 - U00FF # remove this line to get @samp{numbers_only} | leave # remove this line to get @samp{numbers_only} | |||
'D', 'O', 'Q', 'o' = '0' | 'D', 'O', 'Q', 'o' = '0' | |||
'I', 'L', 'l', '|' = '1' | 'I', 'L', 'l', '|' = '1' | |||
'Z', 'z' = '2' | 'Z', 'z' = '2' | |||
'3' | '3' | |||
'A', 'q' = '4' | 'A', 'q' = '4' | |||
'S', 's' = '5' | 'S', 's' = '5' | |||
'G', 'b', U00F3 = '6' # latin small letter o with acute | 'G', 'b', U00F3 = '6' # latin small letter o with acute | |||
'J', 'T' = '7' | 'J', 'T' = '7' | |||
'&', 'B' = '8' | '&', 'B' = '8' | |||
'g' = '9' | 'g' = '9' | |||
End of changes. 4 change blocks. | ||||
9 lines changed or deleted | 17 lines changed or added |
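
The filter-file rules described in the pre6 text (conversions, the `leave` and `mark` words, "last conversion wins", and destinations counting as listed) can be illustrated with a small Python model. This is only a sketch of one reading of the manual text, not Ocrad's implementation: the names `parse_filter` and `apply_filter` are hypothetical, the `_` used for "marked as unrecognized" is an assumption, and escape sequences and the cleanup of leading, trailing, and duplicate spaces are not modelled.

```python
import re

# One list item: a quoted character, a character set, a code range, or a code.
ITEM = re.compile(
    r"\s*(?:'(?P<q>.)'"
    r"|\[(?P<set>[^\]]+)\]"
    r"|U(?P<lo>[0-9A-Fa-f]{4})\s*-\s*U(?P<hi>[0-9A-Fa-f]{4})"
    r"|U(?P<code>[0-9A-Fa-f]{4}))\s*"
)

def expand_set(body):
    """Expand a set body such as '0-9A-Z'; '-' first or last is literal."""
    chars, i = [], 0
    while i < len(body):
        if i + 2 < len(body) and body[i + 1] == "-":
            chars.extend(chr(c) for c in range(ord(body[i]), ord(body[i + 2]) + 1))
            i += 3
        else:
            chars.append(body[i])
            i += 1
    return chars

def item_chars(m):
    """Return the characters denoted by one matched list item."""
    if m.group("q") is not None:
        return [m.group("q")]
    if m.group("set") is not None:
        return expand_set(m.group("set"))
    if m.group("lo") is not None:
        lo, hi = int(m.group("lo"), 16), int(m.group("hi"), 16)
        return [chr(c) for c in range(lo, hi + 1)]
    return [chr(int(m.group("code"), 16))]

def parse_line(line):
    """Parse one conversion line into (source_chars, destination_or_None)."""
    pos, sources = 0, []
    while True:
        m = ITEM.match(line, pos)
        if not m:
            raise ValueError(f"bad item at column {pos}: {line!r}")
        sources.extend(item_chars(m))
        pos = m.end()
        if pos < len(line) and line[pos] == ",":
            pos += 1
            continue
        break
    if pos == len(line):
        return sources, None          # no conversion: characters map to themselves
    if line[pos] != "=":
        raise ValueError(f"expected ',' or '=' at column {pos}: {line!r}")
    m = ITEM.match(line, pos + 1)
    # The destination must be a single quoted character or character code.
    if not m or m.end() != len(line) or (m.group("q") is None and m.group("code") is None):
        raise ValueError(f"bad conversion target: {line!r}")
    return sources, item_chars(m)[0]

def parse_filter(text):
    """Return (table, default); default is 'discard', 'leave', or 'mark'."""
    table, default = {}, "discard"
    for raw in text.splitlines():
        # '#' at the beginning of a line or after whitespace starts a comment.
        line = re.sub(r"(^|\s)#.*", "", raw).strip()
        if not line:
            continue
        if line in ("leave", "mark"):
            default = line
            continue
        sources, dest = parse_line(line)
        for ch in sources:
            table[ch] = ch if dest is None else dest   # later lines override earlier ones
        if dest is not None:
            table.setdefault(dest, dest)               # destination is listed by default
    return table, default

def apply_filter(text, table, default="discard", mark="_"):
    """Filter text; `mark` is an assumed stand-in for 'unrecognized'."""
    out = []
    for ch in text:
        if ch.isspace() or ord(ch) < 32:
            out.append(ch)            # spaces and control characters are unaffected
        elif ch in table:
            out.append(table[ch])
        elif default == "leave":
            out.append(ch)
        elif default == "mark":
            out.append(mark)
        # default == "discard": drop the unlisted character
    return "".join(out)

NUMBERS = """\
leave   # remove this line to get numbers_only
'D', 'O', 'Q', 'o' = '0'
'I', 'L', 'l', '|' = '1'
'Z', 'z' = '2'
'3'
'A', 'q' = '4'
'S', 's' = '5'
'G', 'b', U00F3 = '6'   # latin small letter o with acute
'J', 'T' = '7'
'&', 'B' = '8'
'g' = '9'
"""

table, default = parse_filter(NUMBERS)
print(apply_filter("Zo1 SB x", table, default))  # 201 58 x
```

With the `leave` line in place the unlisted `x` survives; dropping that line makes the default `discard`, and replacing it with `mark` turns `x` into the placeholder. Using `setdefault` for the destination reflects the reading that a destination is only implicitly listed when no line maps it explicitly.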