Phonetic Analysis Plugin
The Phonetic Analysis plugin provides token filters which convert tokens to their phonetic representation using Soundex, Metaphone, and a variety of other algorithms.
Installation
This plugin can be installed using the plugin manager:
sudo bin/elasticsearch-plugin install analysis-phonetic
The plugin must be installed on every node in the cluster, and each node must be restarted after installation.
This plugin can be downloaded for offline install from {plugin_url}/analysis-phonetic/analysis-phonetic-{version}.zip.
Removal
The plugin can be removed with the following command:
sudo bin/elasticsearch-plugin remove analysis-phonetic
The node must be stopped before removing the plugin.
phonetic token filter
The phonetic token filter takes the following settings:
encoder
- Which phonetic encoder to use. Accepts metaphone (default), double_metaphone, soundex, refined_soundex, caverphone1, caverphone2, cologne, nysiis, koelnerphonetik, haasephonetik, beider_morse, daitch_mokotoff.
replace
- Whether or not the original token should be replaced by the phonetic token. Accepts true (default) and false. Not supported by beider_morse encoding.
PUT phonetic_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_metaphone"
            ]
          }
        },
        "filter": {
          "my_metaphone": {
            "type": "phonetic",
            "encoder": "metaphone",
            "replace": false
          }
        }
      }
    }
  }
}
GET phonetic_sample/_analyze
{
  "analyzer": "my_analyzer",
  "text": "Joe Bloggs"
}
Because replace is set to false, the request returns both the phonetic codes and the original tokens: J, joe, BLKS, bloggs.
Double metaphone settings
If the double_metaphone encoder is used, then this additional setting is supported:
max_code_len
- The maximum length of the emitted metaphone token. Defaults to 4.
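As an illustration of the setting above, the following is a minimal sketch of index settings that define a double_metaphone filter with a longer maximum code length. It is meant to be sent as the body of a PUT request that creates an index, in the same way as the phonetic_sample example. The analyzer name my_dm_analyzer, the filter name my_double_metaphone, and the value 6 are illustrative assumptions, not recommendations.

```json
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_dm_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_double_metaphone"
            ]
          }
        },
        "filter": {
          "my_double_metaphone": {
            "type": "phonetic",
            "encoder": "double_metaphone",
            "max_code_len": 6
          }
        }
      }
    }
  }
}
```

The replace setting is omitted here, so it keeps its default of true and only the phonetic tokens are emitted.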
Beider Morse settings
If the beider_morse encoder is used, then these additional settings are supported:
rule_type
- Whether matching should be exact or approx (default).
name_type
- Whether names are ashkenazi, sephardic, or generic (default).
languageset
- An array of languages to check. If not specified, then the language will be guessed. Accepts: any, common, cyrillic, english, french, german, hebrew, hungarian, polish, romanian, russian, spanish.
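The Beider Morse settings above could be combined as in the following sketch, again written as the body of a PUT request that creates an index. The analyzer name my_bm_analyzer, the filter name my_beider_morse, and the chosen values (exact matching, generic names, English and German) are assumptions picked only to show where each setting goes.

```json
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_bm_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_beider_morse"
            ]
          }
        },
        "filter": {
          "my_beider_morse": {
            "type": "phonetic",
            "encoder": "beider_morse",
            "rule_type": "exact",
            "name_type": "generic",
            "languageset": [
              "english",
              "german"
            ]
          }
        }
      }
    }
  }
}
```

The replace setting is left out because it is not supported by the beider_morse encoding. Leaving out languageset as well would let the filter guess the language.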