"Fossies" - the Fresh Open Source Software Archive

Member "tin-2.6.2/doc/umlauts.txt" (23 Aug 2021, 4246 Bytes) of package /linux/misc/tin-2.6.2.tar.xz:


Charset and umlaut handling in tin
==================================

Umlauts when reading
--------------------

After reading a posting from the news server, tin checks whether a charset
has been declared in the header. If not, tin uses the undeclared_charset
value from the matching scope in the attributes file. If no scope supplies
one, tin assumes US-ASCII as the charset for this posting.
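
The lookup order described above can be sketched in Python. This is only an
illustration, not tin's actual code (tin is written in C). The function name
pick_charset and its parameters are hypothetical, and the sketch assumes
that "declared in the header" means the charset parameter of the MIME
Content-Type header.

```python
import re


def pick_charset(content_type, undeclared_charset=None):
    """Return the charset to assume for a posting (hypothetical helper).

    content_type: value of the Content-Type header, or None if absent.
    undeclared_charset: value from the attributes file, or None if unset.
    """
    if content_type:
        # Look for a charset parameter such as: text/plain; charset="UTF-8"
        m = re.search(r'charset\s*=\s*"?([^";\s]+)', content_type, re.IGNORECASE)
        if m:
            return m.group(1)
    if undeclared_charset:
        # No declaration in the header: fall back to the configured value.
        return undeclared_charset
    # Nothing declared and nothing configured: last resort.
    return "US-ASCII"
```

A declared charset always wins. The configured value is consulted only when
the header is silent, and US-ASCII is the final fallback.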

The posting is then converted to the local charset. This charset is taken
from the locale, which is normally set via environment variables (LANG,
LC_*). If the posting contains characters that are not part of the
determined charset (e.g. 8-bit characters in a US-ASCII posting), these
characters are replaced by a question mark. The same happens to characters
that cannot be displayed in the local charset. The converted posting is
then displayed.
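
The two places where question marks can appear may be easier to see in a
small Python sketch. It only mimics the behaviour described above using
Python's own codecs; the function name to_local is hypothetical and tin's
real conversion code is not shown here.

```python
def to_local(raw, source_charset, local_charset):
    """Convert raw posting bytes to text representable in the local charset.

    Bytes that are invalid in source_charset and characters that are
    missing from local_charset both end up as '?'.
    """
    # errors="replace" yields U+FFFD for undecodable bytes; map it to '?'.
    text = raw.decode(source_charset, errors="replace").replace("\ufffd", "?")
    # On encoding, errors="replace" substitutes '?' for unencodable characters.
    return text.encode(local_charset, errors="replace").decode(local_charset)
```

For example, the ISO 8859-1 bytes for "Grüße" treated as US-ASCII come out
as "Gr??e", while the same bytes treated as ISO 8859-1 survive a conversion
to a UTF-8 locale unchanged.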


Umlauts when writing
--------------------

If you reply to a posting, the converted posting is handed over to your
editor. The editor must of course be able to cope with characters in your
local charset. Joe, for example, has difficulties with UTF-8, so you
shouldn't use it in such an environment. Finish your response in the editor
as usual and exit it to get back to tin.

When you post the message, tin determines the charset you want to use. You
set this charset with the variable mm_network_charset, either in your
attributes file for the current group or globally in your tinrc file. The
latter can also be done from the menu (look for MM_NETWORK_CHARSET). Tin
then converts the posting (or the mail) from the local charset to that
charset. As when reading, you may have used characters locally that are not
available in the destination charset. In this case tin issues a warning so
that you can replace the offending characters. If you ignore the warning,
these characters are again replaced by question marks.
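
As a rough illustration of the check behind that warning, the following
Python sketch lists the characters that a destination charset cannot
represent and shows the question-mark fallback. Both function names are
hypothetical and the code is not taken from tin.

```python
def unencodable(text, network_charset):
    """Return the distinct characters of text that network_charset lacks."""
    bad = []
    for ch in text:
        try:
            ch.encode(network_charset)
        except UnicodeEncodeError:
            if ch not in bad:
                bad.append(ch)
    return bad


def encode_for_posting(text, network_charset):
    """Encode text for the network, replacing unavailable characters by '?'."""
    return text.encode(network_charset, errors="replace")
```

With ISO-8859-1 as the destination, a euro sign typed in a UTF-8 locale
would be reported by unencodable() and, if left in place, sent as "?".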


When you always see question marks
----------------------------------

First, make sure that tin knows which local charset to use for display. Tin
normally takes it from the locale. Enter `locale` on your console to find
out, or `echo $LANG, $LC_CTYPE`. You should get something like
"de_DE.ISO-8859-1": a language code (de in this case), an underscore, a
country code (DE), a dot, and a charset (ISO-8859-1).
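
The structure of such a locale name can be taken apart with a few lines of
Python. The helper split_locale is hypothetical and only illustrates the
language_COUNTRY.charset layout described above.

```python
def split_locale(name):
    """Split 'de_DE.ISO-8859-1' into (language, country, charset).

    Missing parts are returned as None. A modifier such as '@euro' is dropped.
    """
    name = name.split("@", 1)[0]
    lang_country, _, charset = name.partition(".")
    language, _, country = lang_country.partition("_")
    return language or None, country or None, charset or None
```

A value such as "C" or "de_DE" yields None for the charset part, which is a
hint that no charset has been configured explicitly.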

If you don't see a valid setting for your locale, you should set one up
yourself in the format described above. For example, in the US and for a
UTF-8 capable terminal you would use `LC_CTYPE=en_US.UTF-8; export LC_CTYPE`
in bash or ksh; the corresponding command for (t)csh is
`setenv LC_CTYPE en_US.UTF-8`.
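
When several of these variables are set, POSIX defines which one takes
effect for character handling: LC_ALL overrides LC_CTYPE, which overrides
LANG. The Python sketch below shows that order; the function name
effective_ctype is hypothetical.

```python
import os


def effective_ctype(env=None):
    """Return the locale name that governs character handling.

    POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    Unset or empty variables are skipped; the fallback is the "C" locale.
    """
    env = os.environ if env is None else env
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            return value
    return "C"
```

So setting LC_CTYPE has no visible effect as long as LC_ALL is also set to
something else.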

As the next step, configure the charset that tin assumes if no charset is
declared in the header of a posting. This is done via the
undeclared_charset variable in your attributes file (found in your .tin
directory):

scope=*
undeclared_charset=Windows-1252

This tells tin to assume the Windows-1252 charset. Windows is widely used,
Windows-1252 is its default charset for North America and Western Europe,
and it is mostly compatible with the widespread ISO 8859-1 charset, so this
should cover many postings. For particular newsgroups you should refine this
configuration by setting another charset in a different scope. For example,
the pl.* hierarchy mostly uses ISO 8859-2:

scope=pl.*,cz.*,hin.*,sk.*,hr.*
undeclared_charset=ISO-8859-2
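
How far "mostly compatible" goes for Windows-1252 and ISO 8859-1 can be
checked with Python's codecs. The sketch below uses Python's codec names
(cp1252, iso-8859-1), which need not match the names tin or iconv accept;
the function name differing_bytes is hypothetical.

```python
def differing_bytes(charset_a="cp1252", charset_b="iso-8859-1"):
    """Return the byte values that the two charsets decode differently.

    A byte that one charset leaves undefined also counts as a difference.
    """
    diff = []
    for value in range(256):
        raw = bytes([value])
        try:
            a = raw.decode(charset_a)
        except UnicodeDecodeError:
            a = None
        try:
            b = raw.decode(charset_b)
        except UnicodeDecodeError:
            b = None
        if a != b:
            diff.append(value)
    return diff
```

With Python's codec tables the two charsets differ only in the range 0x80 to
0x9F, where Windows-1252 places characters such as the euro sign and
typographic quotes. Umlauts and other accented letters sit at the same
positions in both.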

Especially in the Far East you may need further entries, for example:

scope=chinese.*,alt.chinese.text.big5,tw.*
undeclared_charset=Big5

scope=fj.*,jp.*,japan.*
undeclared_charset=ISO-2022-JP
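
To illustrate how such scopes select a charset for a given newsgroup, here
is a Python sketch built on the example entries above. It rests on two
assumptions that are not confirmed by this document: that later matching
scopes override earlier ones, and that shell-style wildcards (Python's
fnmatch) are close enough to tin's own pattern matching. The names SCOPES
and undeclared_charset_for are hypothetical.

```python
from fnmatch import fnmatchcase

# (scope patterns, charset) pairs in file order, taken from the examples above.
SCOPES = [
    ("*", "Windows-1252"),
    ("pl.*,cz.*,hin.*,sk.*,hr.*", "ISO-8859-2"),
    ("chinese.*,alt.chinese.text.big5,tw.*", "Big5"),
    ("fj.*,jp.*,japan.*", "ISO-2022-JP"),
]


def undeclared_charset_for(group, scopes=SCOPES):
    """Return the charset of the last scope that matches group, else None."""
    result = None
    for patterns, charset in scopes:
        if any(fnmatchcase(group, p) for p in patterns.split(",")):
            # Assumption: a later matching scope overrides an earlier one.
            result = charset
    return result
```

Under these assumptions the catch-all scope=* entry belongs at the top of
the file, so that the more specific hierarchies listed after it can
override it.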

If none of these settings help, it is likely that the system locales are
broken or not installed at all. In the latter case, you should install an
appropriate library (or have your administrator do it for you if you use a
pre-configured environment). Libiconv from Bruno Haible is a good choice.

If even this isn't possible, the last resort is to recompile tin. Invoke
`make distclean` and run configure with --disable-locale added to your
usual options. In this case tin assumes every posting to be in the local
charset. Note: This can screw up your terminal.