"Fossies" - the Fresh Open Source Software Archive  

Source code changes of the file "uniconv/uniconv.1" between
yudit-3.0.5.tar.gz and yudit-3.0.7.tar.gz

About: yudit is a Unicode plain-text editor that can do TrueType font rendering, printing, transliterated keyboard input and handwriting recognition with no dependencies on external engines.

uniconv.1  (yudit-3.0.5):uniconv.1  (yudit-3.0.7)
skipping to change at line 16 skipping to change at line 16
SYNOPSIS SYNOPSIS
uniconv -out output-file [ -decode input-encoding ] [ -encode output-encoding ] [ input-file ] [ -todos uniconv -out output-file [ -decode input-encoding ] [ -encode output-encoding ] [ input-file ] [ -todos
] [ -fromdos ] [ -tomac ] [ -frommac ] ] [ -fromdos ] [ -tomac ] [ -frommac ]
DESCRIPTION DESCRIPTION
uniconv program decodes scripts with a certain encoding and encodes them with some other encoding. The uniconv program decodes scripts with a certain encoding and encodes them with some other encoding. The
script is a 16-, 8- or 7-bit byte stream. The converted text will be sent to the standard output, even in script is a 16-, 8- or 7-bit byte stream. The converted text will be sent to the standard output, even in
case of 16-bit encodings, unless the output file is specified by the -out option. case of 16-bit encodings, unless the output file is specified by the -out option.
The -decode and -encode options are optional; the default converter is utf-8. The program reads the The -decode and -encode options are optional; the default converter is utf-8. The program reads the
Unicode map helper files (*.my) from the default directory /usr/share/data. Unicode map helper files (*.my) from the default directory /Applications/Yudit.app/Contents/MacOS/share/data.
Simple 1-to-1 encodings can be added on the fly by adding a my-file, or setting your Simple 1-to-1 encodings can be added on the fly by adding a my-file, or setting your
yudit.datapath property in ~/.yudit/yudit.properties or /usr/share/yudit/config/yudit.properties. yudit.datapath property in ~/.yudit/yudit.properties or /Applications/Yudit.app/Contents/MacOS/share/yudit/config/yudit.properties.
By default /usr/share/yudit/data is searched. By default /Applications/Yudit.app/Contents/MacOS/share/yudit/data is searched.
My-files can be created by a program called makeumap. My-files can be created by a program called makeumap.
The files can be converted between dos/unix/mac line-ending The files can be converted between dos/unix/mac line-ending
variants with the -fromdos, -frommac, -todos and -tomac options. The default (when none is specified) is Unix. variants with the -fromdos, -frommac, -todos and -tomac options. The default (when none is specified) is Unix.
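
The line-ending options described above can be pictured with a small Python sketch. This is not uniconv's implementation: the helper name `convert_line_endings` and the normalise-then-expand approach are assumptions made for illustration. Only the three conventions (dos, unix, mac) come from the man page.

```python
# Hypothetical helper, not part of uniconv: shows what converting between
# dos (CRLF), unix (LF) and classic mac (CR) line endings involves.
LINE_ENDINGS = {"unix": "\n", "dos": "\r\n", "mac": "\r"}

def convert_line_endings(text: str, target: str = "unix") -> str:
    # Normalise every variant to LF first; CRLF must be handled before bare CR,
    # otherwise each CRLF would turn into two line breaks.
    normalised = text.replace("\r\n", "\n").replace("\r", "\n")
    return normalised.replace("\n", LINE_ENDINGS[target])
```

Normalising to one convention first keeps the sketch correct for input that mixes line endings, which a single direct replacement would not handle.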
ENCODING ENCODING
If you received this program through the Yudit distribution, then as of today you can convert between If you received this program through the Yudit distribution, then as of today you can convert between
the encodings below. the encodings below.
utf-8 Yudit recommends this format for international information exchange. ASCII text will get utf-8 Yudit recommends this format for international information exchange. ASCII text will get
through intact, while other unicode characters will get their 8th bit set and the length of through intact, while other unicode characters will get their 8th bit set and the length of
the code will depend on how far away they are in the Unicode space. This is the only transformation the code will depend on how far away they are in the Unicode space. This is the only transformation
format that can encode both 16-bit (ucs-2) and 31-bit (ucs-4) unicode. format that can encode both 16-bit (ucs-2) and 31-bit (ucs-4) unicode.
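
The utf-8 properties described above (ASCII passes through intact, other characters get the 8th bit set, length grows with the code point) can be checked with Python's built-in codec. This is only a sketch: it uses Python rather than uniconv, and the helper name `utf8_profile` is invented for the example.

```python
# Uses Python's utf-8 codec, not uniconv, to illustrate the description above.
def utf8_profile(ch: str):
    """Return (number of bytes, whether every byte has the 8th bit set)."""
    encoded = ch.encode("utf-8")
    all_high = all(b & 0x80 for b in encoded)
    return len(encoded), all_high

# One sample character per encoded length: U+0041, U+00E9, U+20AC, U+1F600.
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    length, all_high = utf8_profile(ch)
    print(f"U+{ord(ch):04X}: {length} byte(s), 8th bit set on every byte: {all_high}")
```

With Python's codec the four sample characters take 1, 2, 3 and 4 bytes respectively, and only the ASCII letter keeps the 8th bit clear.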
utf-8-s utf-8-s
Hackers utf-8 format - it does not give an error message when a surrogate pair is decoded and it Hackers utf-8 format - it does not give an error message when a surrogate pair is decoded and it
can encode a surrogate pair 'as is'. This is not a recommended encoding format although this can encode a surrogate pair 'as is'. This is not a recommended encoding format although this
format is used to encode/decode clipboard data, in order to preserve input. format is used to encode/decode clipboard data, in order to preserve input.
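
As a rough analogy for the difference between utf-8 and utf-8-s, Python's strict utf-8 codec rejects surrogate code points, while its `surrogatepass` error handler lets them through unchanged. Whether uniconv's utf-8-s produces exactly these bytes is an assumption; the sketch only shows the general idea of preserving surrogates 'as is'.

```python
# Python's codecs, not uniconv. "surrogatepass" is similar in spirit to the
# behaviour described for utf-8-s; byte-for-byte equivalence is not claimed.
lone_surrogate = "\ud83d"  # a high surrogate with no partner

def encodes_strictly(text: str) -> bool:
    """True if the text is accepted by the strict utf-8 encoder."""
    try:
        text.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False

# The permissive path keeps the surrogate and round-trips it unchanged.
raw = lone_surrogate.encode("utf-8", errors="surrogatepass")
round_trip = raw.decode("utf-8", errors="surrogatepass")
```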
utf-16 Although 16 is bigger than 8 this is still a compromise required by OSes like Windows that can utf-16 Although 16 is bigger than 8 this is still a compromise required by OSes like Windows that can
not handle ucs-4 - this encoding produces 16-bit unicode streams. In addition to BMP it can convert not handle ucs-4 - this encoding produces 16-bit unicode streams. In addition to BMP it can convert
16 planes using the Unicode Surrogate Area. This encoding can not convert anything above 16 planes using the Unicode Surrogate Area. This encoding can not convert anything above
U+10FFFF (Plane 16). The input byte order is recognized by the first two bytes, the BOM U+10FFFF (Plane 16). The input byte order is recognized by the first two bytes, the BOM
(byte-order-mark) U+FEFF. This format is used in Windows NT for documents like notepad .txt files. (byte-order-mark) U+FEFF. This format is used in Windows NT for documents like notepad .txt files.
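
The surrogate mechanism and the byte-order detection described above follow standard UTF-16 arithmetic, sketched here in Python. The function names `to_surrogate_pair` and `detect_byte_order` are invented for this example and are not taken from uniconv.

```python
# Standard UTF-16 arithmetic in plain Python; this is not uniconv's code.
def to_surrogate_pair(code_point: int):
    """Split a code point from planes 1-16 into (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("outside the range that surrogates can encode")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low

def detect_byte_order(data: bytes) -> str:
    """U+FEFF serialises as FE FF (big endian) or FF FE (little endian)."""
    if data[:2] == b"\xfe\xff":
        return "big"
    if data[:2] == b"\xff\xfe":
        return "little"
    return "unknown"
```

The 20-bit offset is why the scheme stops at U+10FFFF: two 10-bit halves cover exactly 16 planes beyond the BMP.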
utf-16-be utf-16-be
Big endian utf-16 converter. Big endian utf-16 converter.
utf-16-le utf-16-le
Little endian utf-16 converter. Little endian utf-16 converter.
utf-7 This is the recommended format for international information exchange when only 7-bit can be utf-7 This is the recommended format for international information exchange when only 7-bit can be
used. It can only handle 16-bit (utf-16) unicode; for ucs-4 (above U+10FFFF) you should use utf-8 used. It can only handle 16-bit (utf-16) unicode; for ucs-4 (above U+10FFFF) you should use utf-8
encoding. encoding.
iso-8859-1 iso-8859-1
This is the ISO 8859-1 character encoding format. It is also known as "Latin-1" encoding. This is the ISO 8859-1 character encoding format. It is also known as "Latin-1" encoding.
iso-8859-2 iso-8859-2
This is the ISO 8859-2 character encoding format. It is also known as "Central European" This is the ISO 8859-2 character encoding format. It is also known as "Central European"
encoding. encoding.
skipping to change at line 84 skipping to change at line 85
This is the CP1251 cyrillic character encoding format. It is mainly used in Microsoft Windows and This is the CP1251 cyrillic character encoding format. It is mainly used in Microsoft Windows and
some web sites. some web sites.
iso-2022-jp iso-2022-jp
This is a Japanese character encoding format. It is a 7-bit encoding format. This is a Japanese character encoding format. It is a 7-bit encoding format.
iso-2022-jp-3 iso-2022-jp-3
This is a Japanese character encoding format. It is a 7-bit encoding format. It is based upon the JIS This is a Japanese character encoding format. It is a 7-bit encoding format. It is based upon the JIS
X 0213 standard. X 0213 standard.
euc-jp This is a Japanese character encoding format. It is an 8-bit encoding format. Mainly used in euc-jp This is a Japanese character encoding format. It is an 8-bit encoding format. Mainly used in
UNIX systems. UNIX systems.
euc-jp-3 euc-jp-3
The official name is EUC-JISX0213 - I just could not read this. This is a Japanese character The official name is EUC-JISX0213 - I just could not read this. This is a Japanese character
encoding format. It is an 8-bit encoding format. It is based upon the JIS X 0213 standard. encoding format. It is an 8-bit encoding format. It is based upon the JIS X 0213 standard.
shift-jis shift-jis
This is a Japanese character encoding format. It is an 8-bit encoding format. Mainly used in This is a Japanese character encoding format. It is an 8-bit encoding format. Mainly used in
MSDOS/Windows. MSDOS/Windows.
shift-jis-3 shift-jis-3
The official name is Shift_JISX0213 - I just could not read this. This is a Japanese character The official name is Shift_JISX0213 - I just could not read this. This is a Japanese character
encoding format. It is an 8-bit encoding format. Mainly used in MSDOS/Windows. encoding format. It is an 8-bit encoding format. Mainly used in MSDOS/Windows.
iso-2022-jp iso-2022-jp
This is a Japanese 7-bit character encoding format. The iso-2022-jp email messages can be This is a Japanese 7-bit character encoding format. The iso-2022-jp email messages can be
decoded/encoded in this format. decoded/encoded in this format.
iso-2022-x11 iso-2022-x11
This is a Japanese character encoding format. It is also known as "COMPOUND_TEXT" encoding for This is a Japanese character encoding format. It is also known as "COMPOUND_TEXT" encoding for
the X Window System. This is a 7-bit encoding format. It can be derived from the ISO 2022-JP the X Window System. This is a 7-bit encoding format. It can be derived from the ISO 2022-JP
format with some differences. format with some differences.
ksc-5601-x11 ksc-5601-x11
This is a Korean character encoding format used by the X window system (COMPOUND_TEXT encoding) This is a Korean character encoding format used by the X window system (COMPOUND_TEXT encoding)
to encode Korean (KS X 1001) and US-ASCII. This is a 7-bit encoding format compliant with the ISO-2022 to encode Korean (KS X 1001) and US-ASCII. This is a 7-bit encoding format compliant with the ISO-2022
specification for encoding of multiple character sets. Please note that this is DIFFERENT from specification for encoding of multiple character sets. Please note that this is DIFFERENT from
ISO-2022-KR (defined in IETF RFC 1557). ISO-2022-KR (defined in IETF RFC 1557).
euc-kr This is an 8-bit multibyte encoding for Korean. It encodes US-ASCII (7-bit) in single byte range euc-kr This is an 8-bit multibyte encoding for Korean. It encodes US-ASCII (7-bit) in single byte range
and characters in KS X 1001 (formerly KS C 5601) in double byte range with MSB on (8-bit). It's used and characters in KS X 1001 (formerly KS C 5601) in double byte range with MSB on (8-bit). It's used
in Unix and Internet. Korean versions of MS-DOS, MacOS and MS-Windows use a compatible (in most cases, in Unix and Internet. Korean versions of MS-DOS, MacOS and MS-Windows use a compatible (in most cases,
identical) variant of this encoding. identical) variant of this encoding.
johab This is a Korean encoding specified in KS X 1001 (KS C 5601-1992), Annex 3 as a supplementary johab This is a Korean encoding specified in KS X 1001 (KS C 5601-1992), Annex 3 as a supplementary
encoding. Widely used in Korean MS-DOS until the mid-1990s. It can encode all Hangul encoding. Widely used in Korean MS-DOS until the mid-1990s. It can encode all Hangul
syllables (11,172) of modern Korean as well as all the special symbols and Hanja (Chinese syllables (11,172) of modern Korean as well as all the special symbols and Hanja (Chinese
ideograms used in Korea) defined in KS X 1001. ideograms used in Korea) defined in KS X 1001.
uhc A variant of EUC-KR used in Korean MS-Windows 95/98 (proprietary encoding of Microsoft, uhc A variant of EUC-KR used in Korean MS-Windows 95/98 (proprietary encoding of Microsoft,
CP949). Its character repertoire includes all modern syllables of Hangul, the Korean script, CP949). Its character repertoire includes all modern syllables of Hangul, the Korean script,
as well as all the special symbols and Hanja (Chinese ideograms used in Korea) defined in KS X as well as all the special symbols and Hanja (Chinese ideograms used in Korea) defined in KS X
1001. 1001.
gb-18030 gb-18030
This is a Chinese character encoding format based upon GB 18030. It encodes the whole This is a Chinese character encoding format based upon GB 18030. It encodes the whole
U+0000..U+10FFFF range, while being compatible with gb-2312. U+0000..U+10FFFF range, while being compatible with gb-2312.
gb-2312-x11 gb-2312-x11
This is a Chinese character encoding format based upon GB 2312. It is a 7-bit encoding format. This is a Chinese character encoding format based upon GB 2312. It is a 7-bit encoding format.
gb-2312 gb-2312
This is a Chinese character encoding format based upon GB 2312. It is an 8-bit encoding format. This is a Chinese character encoding format based upon GB 2312. It is an 8-bit encoding format.
big-5 This is a Chinese character encoding format based upon BIG5 encoding. It is an 8-bit encoding big-5 This is a Chinese character encoding format based upon BIG5 encoding. It is an 8-bit encoding
format. format.
hz This is a Chinese character encoding format based upon "Hanzi" encoding. It is a 7-bit encoding hz This is a Chinese character encoding format based upon "Hanzi" encoding. It is a 7-bit encoding
format. format.
viscii This is a Vietnamese character encoding format. viscii This is a Vietnamese character encoding format.
ucs-2-be ucs-2-be
This converts 16-bit unicode (ucs-2) streams. The format takes care of the big-endian variant. Yudit This converts 16-bit unicode (ucs-2) streams. The format takes care of the big-endian variant. Yudit
does not recommend this format. does not recommend this format.
ucs-2-le ucs-2-le
This converts 16-bit unicode (ucs-2) streams. The format takes care of the little-endian variant. This converts 16-bit unicode (ucs-2) streams. The format takes care of the little-endian variant.
Yudit does not recommend this format. Yudit does not recommend this format.
ucs-2 This converts 16-bit unicode (ucs-2) streams. The input byte order is recognized by the first ucs-2 This converts 16-bit unicode (ucs-2) streams. The input byte order is recognized by the first
two bytes, the BOM (byte-order-mark) U+FEFF. Yudit does not recommend this format. two bytes, the BOM (byte-order-mark) U+FEFF. Yudit does not recommend this format.
java This converts \uxxxx character escapes. When encoding, all characters above U+0080 will be java This converts \uxxxx character escapes. When encoding, all characters above U+0080 will be
escaped with a string like '\u0080'. When decoding the same format is decoded but, in addition, escaped with a string like '\u0080'. When decoding the same format is decoded but, in addition,
utf-8 format is also recognized, so it can also be used to recover data accidentally saved with utf-8 format is also recognized, so it can also be used to recover data accidentally saved with
the wrong encoding. The U+10000..U+10FFFF area is converted to surrogates and vice versa. the wrong encoding. The U+10000..U+10FFFF area is converted to surrogates and vice versa.
java-s This converts \uxxxx character escapes. When encoding, all characters above U+0080 will be java-s This converts \uxxxx character escapes. When encoding, all characters above U+0080 will be
escaped with a string like '\u0080'. When decoding the same format is decoded but, in addition, escaped with a string like '\u0080'. When decoding the same format is decoded but, in addition,
utf-8 format is also recognized, so it can also be used to recover data accidentally saved with utf-8 format is also recognized, so it can also be used to recover data accidentally saved with
the wrong encoding. Surrogates are not treated specially during conversion - this is why it is the wrong encoding. Surrogates are not treated specially during conversion - this is why it is
not a recommended conversion. not a recommended conversion.
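
The encoding direction of the java converter described above can be sketched in Python. This only mimics the documented behaviour and is not uniconv's code: the function name `java_escape` is invented, and treating U+0080 itself as escaped (suggested by the '\u0080' example in the text) is an assumption.

```python
# Illustrative only: mimics the behaviour described for the "java" converter.
# Assumption: every code point at or above U+0080 is escaped.
def java_escape(text: str) -> str:
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                      # ASCII passes through
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04x}")          # single \uxxxx escape
        else:
            # U+10000..U+10FFFF is written as a surrogate pair of escapes.
            offset = cp - 0x10000
            high = 0xD800 + (offset >> 10)
            low = 0xDC00 + (offset & 0x3FF)
            out.append(f"\\u{high:04x}\\u{low:04x}")
    return "".join(out)
```

A java-s style variant would skip the surrogate split and escape whatever 16-bit units it is given, which is the difference the two entries describe.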
FILES FILES
~/.yudit/yudit.properties or /usr/share/yudit/config/yudit.properties ~/.yudit/yudit.properties or /Applications/Yudit.app/Contents/MacOS/share/yudit/config/yudit.properties
can have yudit.datapath property. This is where the map files are kept. can have yudit.datapath property. This is where the map files are kept.
By default /usr/share/yudit/data is searched. By default /Applications/Yudit.app/Contents/MacOS/share/yudit/data is searched.
SEE ALSO SEE ALSO
makeumap makeumap
AUTHOR AUTHOR
This program was written by gsinai@yudit.org (Gaspar Sinai), Tokyo, 2 January, 2001. This program was written by gsinai@yudit.org (Gaspar Sinai), Tokyo, 2 January, 2001.
LINUX COMMANDS Nov 5 1997 UNICONV(1) LINUX COMMANDS Nov 5 1997 UNICONV(1)
 End of changes. 25 change blocks. 
73 lines changed or deleted 76 lines changed or added
