Member "elasticsearch-6.8.23/docs/plugins/analysis-phonetic.asciidoc" (29 Dec 2021, 2347 Bytes) of package /linux/www/elasticsearch-6.8.23-src.tar.gz:

Phonetic Analysis Plugin

The Phonetic Analysis plugin provides token filters which convert tokens to their phonetic representation using Soundex, Metaphone, and a variety of other algorithms.

Installation

This plugin can be installed using the plugin manager:

sudo bin/elasticsearch-plugin install analysis-phonetic

The plugin must be installed on every node in the cluster, and each node must be restarted after installation.
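
To check that every node picked up the plugin after its restart, the installed plugins can be listed with the cat plugins API. This is a minimal check, assuming a running cluster that is reachable from Console; the # line is a Console comment and can be dropped when using another client.

# Returns one row per node and installed plugin; every node should show analysis-phonetic
GET _cat/plugins?v

A node that has no analysis-phonetic row in the response still needs the plugin installed.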

This plugin can be downloaded for offline install from {plugin_url}/analysis-phonetic/analysis-phonetic-{version}.zip.

Removal

The plugin can be removed with the following command:

sudo bin/elasticsearch-plugin remove analysis-phonetic

The node must be stopped before removing the plugin.

phonetic token filter

The phonetic token filter takes the following settings:

encoder

Which phonetic encoder to use. Accepts metaphone (default), double_metaphone, soundex, refined_soundex, caverphone1, caverphone2, cologne, nysiis, koelnerphonetik, haasephonetik, beider_morse, daitch_mokotoff.

replace

Whether the original token should be replaced by the phonetic token. Accepts true (default) and false. This setting is not supported by the beider_morse encoder.

PUT phonetic_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_metaphone"
            ]
          }
        },
        "filter": {
          "my_metaphone": {
            "type": "phonetic",
            "encoder": "metaphone",
            "replace": false
          }
        }
      }
    }
  }
}

GET phonetic_sample/_analyze
{
  "analyzer": "my_analyzer",
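  "text": "Joe Bloggs"
}

This request returns the tokens J, joe, BLKS, and bloggs. Because replace is false, each original token is kept alongside its phonetic token.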

Double metaphone settings

If the double_metaphone encoder is used, then this additional setting is supported:

max_code_len

The maximum length of the emitted metaphone token. Defaults to 4.
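
As a sketch of where this setting goes in an index definition, the following request raises max_code_len above its default. The index name phonetic_dm_sample, the analyzer name my_dm_analyzer, the filter name my_double_metaphone, and the value 6 are illustrative choices, not names defined by the plugin.

# Illustrative names and value; only type, encoder, and max_code_len are plugin settings
PUT phonetic_dm_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_dm_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_double_metaphone"
            ]
          }
        },
        "filter": {
          "my_double_metaphone": {
            "type": "phonetic",
            "encoder": "double_metaphone",
            "max_code_len": 6
          }
        }
      }
    }
  }
}

The resulting tokens can be inspected with the same _analyze request shown earlier, pointed at this index and analyzer.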

Beider Morse settings

If the beider_morse encoder is used, then these additional settings are supported:

rule_type

Whether matching should be exact or approx (default).

name_type

Whether names are ashkenazi, sephardic, or generic (default).

languageset

An array of languages to check. If not specified, the language is guessed. Accepts: any, common, cyrillic, english, french, german, hebrew, hungarian, polish, romanian, russian, spanish.
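
The three settings above can be combined in a single filter definition. The following is a sketch only: the index name phonetic_bm_sample, the analyzer name my_bm_analyzer, the filter name my_beider_morse, and the particular values chosen are illustrative assumptions. The replace setting is left out because beider_morse does not support it.

# Illustrative names and values; rule_type, name_type, and languageset are the settings described above
PUT phonetic_bm_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_bm_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_beider_morse"
            ]
          }
        },
        "filter": {
          "my_beider_morse": {
            "type": "phonetic",
            "encoder": "beider_morse",
            "rule_type": "exact",
            "name_type": "generic",
            "languageset": [
              "english",
              "german"
            ]
          }
        }
      }
    }
  }
}

Restricting languageset to the languages expected in the data avoids relying on language guessing. Leaving it out keeps the default behavior.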