utf8proc  2.3.0
About: utf8proc is a clean C library for processing UTF-8 Unicode data: normalization, case-folding, graphemes, and other operations.

utf8proc


utf8proc is a small, clean C library that provides Unicode normalization, case-folding, and other operations for data in the UTF-8 encoding. It was initially developed by Jan Behrens and the rest of the Public Software Group, who deserve nearly all of the credit for this package. With the blessing of the Public Software Group, the Julia developers have taken over development of utf8proc, since the original developers have moved to other projects.

(utf8proc is used for basic Unicode support in the Julia language, and the Julia developers became involved because they wanted to add Unicode 7 support and other features.)

(The original utf8proc package also includes Ruby and PostgreSQL plug-ins. We removed those from utf8proc in order to focus exclusively on the C library for the time being, but plan to add them back in or release them as separate packages.)

The utf8proc package is licensed under the free/open-source MIT "expat" license (plus certain Unicode data governed by the similarly permissive Unicode data license); please see the included LICENSE.md file for more detailed information.

To compile the C library, run make.

The C library is found in this directory after successful compilation and is named libutf8proc.a (for the static library) and libutf8proc.so (for the dynamic library).

The Unicode version supported is 12.0.0.

For Unicode normalizations, the following options are used:

  • Normalization Form C: STABLE, COMPOSE
  • Normalization Form D: STABLE, DECOMPOSE
  • Normalization Form KC: STABLE, COMPOSE, COMPAT
  • Normalization Form KD: STABLE, DECOMPOSE, COMPAT

The documentation for the C library is found in the utf8proc.h header file. utf8proc_map is the function you will most likely use for mapping UTF-8 strings, unless you want to allocate memory yourself.

For planned work and open tasks, see the GitHub issues list.

Bug reports, feature requests, and other queries can be filed at the utf8proc issues page on GitHub.

An independent Lua translation of this library, lua-mojibake, is also available.