
Member "utrac-0.3.2/src/utrac.1" (4 Jan 2009, 3351 Bytes) of package /linux/privat/old/utrac-0.3.2.tgz:





utrac - recognize and convert the charset and end-of-line type of text files


utrac [OPTION] [FILE]


Utrac is a tool (and a library) that recognizes the charset and the end-of-line type used in a text file. It can also convert the file to another charset or end-of-line type. For 8-bit charsets, recognition is not certain, so Utrac can also help the user choose the correct charset, for instance by filtering the text and displaying only the lines that matter.
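The distinction between certain and uncertain recognition can be illustrated with a minimal Python sketch. This is not Utrac's implementation; the function names `detect_eol` and `certain_charset` are hypothetical, and the logic assumes that pure 7-bit input is treated as ASCII and that any input which decodes cleanly as UTF-8 is treated as UTF-8.

```python
def detect_eol(data: bytes) -> str:
    """Return the most frequent end-of-line type: 'CRLF', 'LF', 'CR' or 'NONE'."""
    crlf = data.count(b"\r\n")
    # Subtract CRLF pairs so that lone LF and lone CR are counted separately.
    lf = data.count(b"\n") - crlf
    cr = data.count(b"\r") - crlf
    counts = {"CRLF": crlf, "LF": lf, "CR": cr}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else "NONE"


def certain_charset(data: bytes):
    """Return 'ASCII' or 'UTF-8' when one of them fits, else None.

    None means the input contains 8-bit data that is not valid UTF-8,
    so several 8-bit charsets remain possible and a guess is needed.
    """
    if all(b < 0x80 for b in data):
        return "ASCII"
    try:
        data.decode("utf-8")
        return "UTF-8"
    except UnicodeDecodeError:
        return None
```

A lone byte such as 0xE9 makes `certain_charset` return None: it is a valid character in many 8-bit charsets, which is why the remaining options exist to help the user decide.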


With no FILE, read standard input. With no OPTION, recognize and write converted text to standard output.
-p, --print-charset

Print the name of the charset that best suits the input file.

-P, --print-all-charset

Print a ranked list of charsets. The first column is the mark including the locale bonus (language and system), the second is the raw mark, the third is the checksum of all extended characters (to show which charsets produce the same results), and the fourth is the charset name. Charsets whose mark with bonus and checksum are identical are listed on the same line.
If the recognition is certain (ASCII or UTF-8), print only one name.
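The role of the checksum column can be sketched in Python as follows. This is an illustration only: the candidate list, the use of CRC-32, and the names `ext_checksum` and `group_by_result` are assumptions, and the sketch does not model Utrac's marks or bonuses. It only shows how charsets that decode the extended bytes to identical text end up grouped together.

```python
import zlib
from collections import defaultdict

# Hypothetical candidate list; Utrac reads its charsets from its own data file.
CANDIDATES = ["iso8859-1", "iso8859-15", "cp1252", "cp437"]


def ext_checksum(data: bytes, charset: str):
    """Checksum of the text produced by decoding only the bytes >= 0x80."""
    ext = bytes(b for b in data if b >= 0x80)
    try:
        text = ext.decode(charset)
    except UnicodeDecodeError:
        return None  # this charset has no character for one of the bytes
    return zlib.crc32(text.encode("utf-8"))


def group_by_result(data: bytes, candidates=CANDIDATES):
    """Group charsets that yield exactly the same extended characters."""
    groups = defaultdict(list)
    for name in candidates:
        key = ext_checksum(data, name)
        if key is not None:
            groups[key].append(name)
    return list(groups.values())
```

For the input `b"caf\xe9"`, the first three candidates all map 0xE9 to the same character and fall into one group, while cp437 maps it to a different character and forms its own group.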

-f, --from

Force input charset (disable recognition) and/or EOL. For instance, "UTF-8/CRLF".

-t, --to

Select output charset and/or EOL. See above.
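What a conversion between two "CHARSET/EOL" specifications involves can be sketched in Python. This is a hedged illustration, not Utrac's code: `parse_spec` and `convert` are hypothetical names, the only spec form taken from this page is "UTF-8/CRLF", and accepting a charset-only spec such as "ISO-8859-1" is an assumption of the sketch.

```python
EOL_TEXT = {"CRLF": "\r\n", "LF": "\n", "CR": "\r"}


def parse_spec(spec: str):
    """Split 'CHARSET/EOL' into (charset, eol); a missing part becomes None."""
    charset, _, eol = spec.partition("/")
    eol = eol.upper()
    if eol and eol not in EOL_TEXT:
        raise ValueError(f"unknown EOL type: {eol}")
    return (charset or None, eol or None)


def convert(data: bytes, src: str, dst: str) -> bytes:
    """Decode with the source charset, rewrite EOLs, encode with the target."""
    src_charset, _ = parse_spec(src)
    dst_charset, dst_eol = parse_spec(dst)
    if not src_charset or not dst_charset:
        raise ValueError("this sketch needs a charset on both sides")
    text = data.decode(src_charset)
    if dst_eol:
        # Normalise every EOL to LF first, then expand to the target type.
        text = text.replace("\r\n", "\n").replace("\r", "\n")
        text = text.replace("\n", EOL_TEXT[dst_eol])
    return text.encode(dst_charset)
```

For example, `convert(b"caf\xe9\r\n", "ISO-8859-1/CRLF", "UTF-8/LF")` decodes the Latin-1 input, replaces the CRLF ending with LF, and returns the UTF-8 bytes.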

-L, --language

Select language. All charsets that fit this language will get a bonus during recognition. If none is specified, the LC_* environment variables are used.

-S, --system

Select system. All charsets that fit this system will get a bonus during recognition.

-x, --ext-chars

Print lines with extended characters (try to print each extended character not more than once).
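The filtering idea, showing each extended character at most once, can be sketched in Python. This is an assumption about the general approach rather than Utrac's algorithm, and `lines_with_new_ext_chars` is a hypothetical name. The sketch works on raw bytes and treats any byte of 0x80 or above as extended.

```python
def lines_with_new_ext_chars(data: bytes):
    """Yield only the lines that contain an extended byte not yet shown."""
    seen = set()
    for line in data.splitlines():
        # Extended bytes on this line that no earlier yielded line contained.
        new = {b for b in line if b >= 0x80} - seen
        if new:
            seen |= new
            yield line
```

Given four lines where the first and third both contain 0xE9 and the fourth contains 0xEF, only the first and fourth lines are yielded, which keeps the output short enough for a user to inspect by eye.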

-d, --distribution

Print distribution, i.e. the count of each 8-bit character.
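A byte distribution of this kind is straightforward to compute. The Python sketch below is illustrative only; `distribution` is a hypothetical name, and whether Utrac counts every byte value or only those of 0x80 and above is not stated here, so the sketch counts all of them and leaves filtering to the caller.

```python
from collections import Counter


def distribution(data: bytes) -> Counter:
    """Count how many times each byte value (0-255) occurs in the input."""
    return Counter(data)


def extended_only(counts: Counter) -> dict:
    """Keep only the counts for byte values >= 0x80, sorted by byte value."""
    return {b: n for b, n in sorted(counts.items()) if b >= 0x80}
```

For the input `b"a\xe9\xe9b\xef"`, `extended_only(distribution(...))` gives `{0xE9: 2, 0xEF: 1}`.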

-a, --all-ext-chars

Print each extended character of the file in each different charset (UTF-8 output is recommended).

-c, --colors

(with -x or -a) Use color.

-b, --bar

Display a progress bar.

-i, --info

Print default/chosen parameters.

-l, --list

List the available charsets, EOL types, languages, and systems.

-h, --help

Print some help.

-v, --version

Print version.



This file should be located in /usr/local/share/utrac/ or /usr/share/utrac/. It contains information about charsets and their related charmaps. If you want to add new charsets (they must be 8-bit and ASCII-compatible), check the script merge.pl in the Utrac source directory.


Utrac is still a beta version, so you can expect to find some bugs. Please report them to <antoine@alliancemca.net>. If you have a text file that Utrac does not recognize correctly, please send it in to help improve the recognition algorithm.


Written by Antoine Calando <antoine@alliancemca.net>.


Copyright © 2004 Alliance MCA.
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


You can find more documentation at http://utrac.sourceforge.net