"Fossies" - the Fresh Open Source Software Archive

Member "texstudio-3.0.1/src/hunspell/README" (31 Aug 2020, 9887 Bytes) of package /linux/misc/texstudio-3.0.1.tar.gz:


# About Hunspell

Hunspell is a free spell checker and morphological analyzer library
and command-line tool, licensed under the LGPL/GPL/MPL tri-license.

Hunspell is used by the LibreOffice office suite, by free browsers such
as Mozilla Firefox and Google Chrome, and by other tools and OSes, such
as Linux distributions and macOS. It is also a command-line tool for
Linux, Unix-like and other OSes.

It is designed for quick and high-quality spell checking and
correction for languages with word-level writing systems,
including languages with rich morphology, complex word compounding
and character encoding.

Hunspell interfaces: an Ispell-like terminal interface using the Curses
library, an Ispell pipe interface, and C++/C APIs with a shared library;
language bindings also exist for other programming languages.

Hunspell's code base comes from OpenOffice.org's MySpell library,
developed by Kevin Hendricks (originally a from-scratch C++
reimplementation of the spell checking and affixation of Geoff Kuenning's
International Ispell, later extended with e.g. n-gram suggestions),
see http://lingucomponent.openoffice.org/MySpell-3.zip, and
its README, CONTRIBUTORS and license.readme (here: license.myspell) files.

Main features of the Hunspell library, developed by László Németh:

  - Unicode support
  - Highly customizable suggestions: word-part replacement tables and
    stem-level phonetic and other alternative transcriptions to recognize
    and fix all typical misspellings, avoid suggesting offensive words, etc.
  - Complex morphology: dictionary and affix homonyms; twofold affix
    stripping to handle inflectional and derivational morpheme groups for
    agglutinative languages, like Azeri, Basque, Estonian, Finnish, Hungarian,
    Turkish; 64 thousand affix classes with an arbitrary number of affixes;
    conditional affixes, circumfixes, fogemorphemes, zero morphemes,
    virtual dictionary stems, forbidden words to avoid overgeneration, etc.
  - Handling complex compounds (for example, for Finno-Ugric, German and
    Indo-Aryan languages): recognizing compounds made of an arbitrary
    number of words, handling affixation within compounds, etc.
  - Custom dictionaries with affixation
  - Stemming
  - Morphological analysis (in custom item and arrangement style)
  - Morphological generation
  - SPELLML XML API over the plain spell() API function for easier integration
    of stemming, morphological generation and custom dictionaries with affixation
  - Language-specific algorithms, like special casing of Azeri or Turkish
    dotted i and German sharp s, and special compound rules of Hungarian.

Main features of the Hunspell command-line tool, developed by László Németh:

  - Reimplementation of the quick interactive interface of Geoff Kuenning's Ispell
  - Parsing formats: text, OpenDocument, TeX/LaTeX, HTML/SGML/XML, nroff/troff
  - Custom dictionaries with optional affixation, specified by a model word
  - Multiple dictionary usage (for example `hunspell -d en_US,de_DE,de_medical`)
  - Various filtering options (bad or good words/lines)
  - Morphological analysis (option `-m`)
  - Stemming (option `-s`)

See `man hunspell`, `man 3 hunspell` and `man 5 hunspell` for the complete manual.

# Dependencies

Build-only dependencies:

    g++ make autoconf automake autopoint libtool

Runtime dependencies:

|               | Mandatory        | Optional         |
|---------------|------------------|------------------|
|libhunspell    |                  |                  |
|hunspell tool  | libiconv gettext | ncurses readline |

# Compiling on GNU/Linux and Unixes

First, install the dependencies. On Linux, `gettext` and
`libiconv` are part of the standard library. On other Unixes they
need to be installed manually.

For Ubuntu:

    sudo apt install autoconf automake autopoint libtool

Then run the following commands:

    autoreconf -vfi
    ./configure
    make
    sudo make install
    sudo ldconfig

For dictionary development, use the `--with-warnings` option of
configure.

For the interactive user interface of the Hunspell executable, use the
`--with-ui` option.

Optional developer packages:

  - ncurses (needed for `--with-ui`), e.g. libncursesw5 for UTF-8
  - readline (for fancy input line editing, configure parameter:
    `--with-readline`)

In Ubuntu, the packages are:

    libncurses5-dev libreadline-dev

# Compiling on OSX and macOS

On macOS, always use `clang` as the compiler and not `g++`, because Homebrew
dependencies are built with it.

    brew install autoconf automake libtool gettext
    brew link gettext --force

Then run autoreconf, configure and make, as described above.

# Compiling on Windows

## Compiling with Mingw64 and MSYS2

Download Msys2, update everything and install the following
packages:

    pacman -S base-devel mingw-w64-x86_64-toolchain mingw-w64-x86_64-libtool

Open the Mingw-w64 Win64 prompt and compile the same way as on Linux, see
above.

## Compiling in Cygwin environment

Download and install the Cygwin environment for Windows with the following
extra packages:

  - make
  - automake
  - autoconf
  - libtool
  - gcc-g++ development package
  - ncurses, readline (for user interface)
  - iconv (character conversion)

Then compile the same way as on Linux. Cygwin builds depend on
Cygwin1.dll.

# Debugging

It is recommended to install a debug build of the standard library:

    libstdc++6-6-dbg

For debugging, create a debug build and then start `gdb`:

    ./configure CXXFLAGS='-g -O0 -Wall -Wextra'
    make
    ./libtool --mode=execute gdb src/tools/hunspell

You can also pass the `CXXFLAGS` directly to `make` without calling
`./configure`, but we don't recommend this during long development
sessions.

If you would like to develop and debug with an IDE, see the documentation at
https://github.com/hunspell/hunspell/wiki/IDE-Setup

# Testing

Testing Hunspell (see tests in the tests/ subdirectory):

    make check

or with the Valgrind debugger:

    make check
    VALGRIND=[Valgrind_tool] make check

For example:

    make check
    VALGRIND=memcheck make check

# Documentation

Features and dictionary format:

    man 5 hunspell
    man hunspell
    hunspell -h

http://hunspell.github.io/

# Usage

After compiling and installing (see INSTALL) you can run the Hunspell
spell checker (compiled with user interface) with a Hunspell or MySpell
dictionary:

    hunspell -d en_US text.txt

or without interface:

    hunspell
    hunspell -d en_GB -l <text.txt
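
Because `-l` prints only the misspelled words, one per line, its output is easy to post-process with standard tools. The following is a minimal sketch, assuming a POSIX shell; `rank_misspellings` is a hypothetical helper name, not part of Hunspell. It counts how often each misspelled word occurs:

```shell
#!/bin/sh
# rank_misspellings: read one word per line on stdin (for example the
# output of `hunspell -l`) and print "COUNT WORD", most frequent first.
# Ties are ordered alphabetically. The function name is hypothetical.
rank_misspellings() {
    sort | uniq -c | sort -k1,1nr -k2 | awk '{ print $1, $2 }'
}
```

With Hunspell and an en_GB dictionary installed, it could be used as `hunspell -d en_GB -l <text.txt | rank_misspellings`.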

Dictionaries consist of an affix (.aff) and a dictionary (.dic) file. For
example, download the American English dictionary files of LibreOffice
(older version, but with stemming and morphological generation) with

    wget -O en_US.aff 'https://cgit.freedesktop.org/libreoffice/dictionaries/plain/en/en_US.aff?id=a4473e06b56bfe35187e302754f6baaa8d75e54f'
    wget -O en_US.dic 'https://cgit.freedesktop.org/libreoffice/dictionaries/plain/en/en_US.dic?id=a4473e06b56bfe35187e302754f6baaa8d75e54f'

With command line input and output, it's possible to check quickly how it works,
for example with the input words "example", "examples", "teached" and
"verybaaaaaaaaaaaaaaaaaaaaaad":

    $ hunspell -d en_US
    Hunspell 1.7.0
    example
    *

    examples
    + example

    teached
    & teached 9 0: taught, teased, reached, teaches, teacher, leached, beached

    verybaaaaaaaaaaaaaaaaaaaaaad
    # verybaaaaaaaaaaaaaaaaaaaaaad 0

In the output, `*` and `+` mean correct (accepted) words (`*` = dictionary stem,
`+` = affixed form of the following dictionary stem), and
`&` and `#` mean bad (rejected) words (`&` = with suggestions, `#` = without suggestions)
(see man hunspell).
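
The first character of each result line is enough to tell the four cases apart, so a script can interpret the output without parsing the whole line. The following is a minimal sketch, assuming a POSIX shell and the line layout shown in the session above; the function name `classify_result` and its labels are hypothetical, chosen only for illustration:

```shell
#!/bin/sh
# classify_result: map one line of Hunspell output to a short label,
# using the leading marker character (*, +, & or #).
# The function name and the label texts are hypothetical.
classify_result() {
    case "$1" in
        '*'*)  echo "correct: dictionary stem" ;;
        '+ '*) echo "correct: affixed form of ${1#"+ "}" ;;
        '& '*) echo "misspelled: suggestions available" ;;
        '# '*) echo "misspelled: no suggestions" ;;
        '')    echo "separator" ;;   # empty line between results
        *)     echo "other" ;;       # e.g. the version banner
    esac
}
```

For example, `classify_result '+ example'` prints `correct: affixed form of example`.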

Example for stemming:

    $ hunspell -d en_US -s
    mice
    mice mouse

Example for morphological analysis (very limited with this English dictionary):

    $ hunspell -d en_US -m
    mice
    mice  st:mouse ts:Ns

    cats
    cats  st:cat ts:0 is:Ns
    cats  st:cat ts:0 is:Vs
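
Each analysis line consists of the input word followed by `key:value` fields, with `st:` carrying the stem in the examples above. The following is a minimal sketch for pulling one field out of such a line, assuming a POSIX shell with `awk` and only the layout shown above; `morph_field` is a hypothetical helper name, not part of Hunspell:

```shell
#!/bin/sh
# morph_field KEY LINE: print the value of every "KEY:value" field found
# in one line of `hunspell -m` output, e.g. "cats  st:cat ts:0 is:Ns".
# The first column (the analyzed word itself) is skipped.
# The function name is hypothetical.
morph_field() {
    printf '%s\n' "$2" | awk -v key="$1" '{
        for (i = 2; i <= NF; i++) {
            n = index($i, ":")
            if (n > 0 && substr($i, 1, n - 1) == key)
                print substr($i, n + 1)
        }
    }'
}
```

For example, `morph_field st 'cats  st:cat ts:0 is:Ns'` prints `cat`.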

# Other executables

The src/tools directory contains the following executables after compiling.

  - The main executable:
      - hunspell: main program for spell checking and others (see
        manual)
  - Example tools:
      - analyze: example of spell checking, stemming and morphological
        analysis
      - chmorph: example of automatic morphological generation and
        conversion
      - example: example of spell checking and suggestion
  - Tools for dictionary development:
      - affixcompress: dictionary generation from large (millions of
        words) vocabularies
      - makealias: alias compression (Hunspell only, not backward compatible
        with MySpell)
      - wordforms: word generation (Hunspell version of unmunch)
      - hunzip: decompressor of hzip format
      - hzip: compressor of hzip format
      - munch (DEPRECATED, use affixcompress): dictionary generation
        from vocabularies (it needs an affix file, too).
      - unmunch (DEPRECATED, use wordforms): list all recognized words
        of a MySpell dictionary

Example for morphological generation:

    $ ~/hunspell/src/tools/analyze en_US.aff en_US.dic /dev/stdin
    cat mice
    generate(cat, mice) = cats
    mouse cats
    generate(mouse, cats) = mice
    generate(mouse, cats) = mouses

# Using Hunspell library with GCC

Including in your program:

    #include <hunspell.hxx>

Linking with the Hunspell library (list the library after the source file,
because the linker resolves symbols in command-line order):

    g++ example.cxx -lhunspell-1.7
    # or better, use pkg-config
    g++ example.cxx $(pkg-config --cflags --libs hunspell)

## Dictionaries

Hunspell (MySpell) dictionaries:

  - https://wiki.documentfoundation.org/Language_support_of_LibreOffice
  - http://cgit.freedesktop.org/libreoffice/dictionaries
  - http://extensions.libreoffice.org
  - http://extensions.openoffice.org
  - http://wiki.services.openoffice.org/wiki/Dictionaries

Aspell dictionaries (conversion: man 5 hunspell):

  - ftp://ftp.gnu.org/gnu/aspell/dict
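
Whatever the download source, a usable dictionary is always a pair of files, an affix (.aff) file and a dictionary (.dic) file with the same base name. The following is a minimal sketch of a pre-flight check, assuming a POSIX shell; `have_dictionary` and its argument order are hypothetical, not part of Hunspell:

```shell
#!/bin/sh
# have_dictionary DIR NAME: succeed only if both DIR/NAME.aff and
# DIR/NAME.dic exist as regular files.
# The function name and argument order are hypothetical.
have_dictionary() {
    dir=$1
    name=$2
    [ -f "$dir/$name.aff" ] && [ -f "$dir/$name.dic" ]
}
```

For example, `have_dictionary . en_US || echo "en_US is incomplete"` reports a missing half of the pair in the current directory.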

László Németh, nemeth at numbertext org