A Trainable Method for the Phonetic Similarity Search in German Proper Names

Oliver Jokisch &
Horst-Udo Hain

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10458)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

Efficient methods for similarity search in word databases play a significant role in applications such as the robust search or indexing of names and addresses, spell-checking algorithms, and the monitoring of trademark rights. The underlying distance measures are tied to the users' similarity criteria, and phonetic search algorithms have been well established for decades. Nonetheless, rule-based phonetic algorithms have weak points, e.g. their strong language dependency, the search overhead caused by high tolerance or, conversely, the risk of missing valid matches, which in some cases results in merely pseudo-phonetic functionality. In contrast, we suggest a novel, adaptive method for similarity search in words, based on a trainable grapheme-to-phoneme (G2P) converter that generates the most likely and largely correct pronunciations. Only in a second step is the similarity search performed on the phonemic reference data, using a conventional string metric such as the Levenshtein distance (LD). The G2P algorithm achieves a string accuracy of up to 99.5% on a German pronunciation lexicon and can be trained for different languages or specific domains such as proper names. The similarity tolerance can be easily adjusted by parameters such as the admissible number or likelihood of pronunciation variants as well as by the phonemic or graphemic LD. As a proof of concept, we compare the G2P-based search method with similarity matches by the conventional Cologne phonetics (Kölner Phonetik, KP) algorithm on a German surname database and a telephone book including first name, surname and street name.
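
The two-step idea in the abstract (first convert graphemes to phonemes, then compare the phonemic strings with the Levenshtein distance) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the trained G2P converter is replaced by a hypothetical lookup table (`TOY_G2P`) with hand-written, SAMPA-like transcriptions, and the names `g2p`, `phonetic_search`, `max_variants` and `max_ld` are assumptions chosen for illustration only.

```python
from typing import Dict, List, Sequence, Tuple

Phonemes = Tuple[str, ...]


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Edit distance (insertions, deletions, substitutions) between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


# ASSUMPTION: hypothetical stand-in for a trained G2P converter. Each name maps
# to a ranked list of pronunciation variants; the transcriptions are
# illustrative and hand-written, not output of the converter in the paper.
TOY_G2P: Dict[str, List[Phonemes]] = {
    "Maier":   [("m", "aI", "6")],
    "Mayr":    [("m", "aI", "6")],
    "Meier":   [("m", "aI", "6")],
    "Meyer":   [("m", "aI", "6")],
    "Miller":  [("m", "I", "l", "6")],
    "Müller":  [("m", "Y", "l", "6")],
    "Schmidt": [("S", "m", "I", "t")],
    "Schmitt": [("S", "m", "I", "t")],
}


def g2p(name: str) -> List[Phonemes]:
    """Return ranked pronunciation variants (toy lookup, raises KeyError if unknown)."""
    return TOY_G2P[name]


def phonetic_search(query: str, names: List[str],
                    max_variants: int = 1, max_ld: int = 0) -> List[Tuple[str, int]]:
    """Step 1: G2P for query and references. Step 2: phonemic LD within a tolerance."""
    query_variants = g2p(query)[:max_variants]
    hits = []
    for name in names:
        best = min(levenshtein(q, r)
                   for q in query_variants
                   for r in g2p(name)[:max_variants])
        if best <= max_ld:
            hits.append((name, best))
    # Closest matches first, ties in alphabetical order.
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


if __name__ == "__main__":
    names = list(TOY_G2P)
    print(phonetic_search("Meyer", names, max_ld=0))
    print(phonetic_search("Müller", names, max_ld=1))
    print(levenshtein("Meyer", "Mayr"))  # graphemic LD, for comparison
```

In this toy setting, the spellings Maier, Mayr, Meier and Meyer share one phoneme sequence, so they match at a phonemic LD of 0 even though the graphemic LD between "Meyer" and "Mayr" is 2. Raising `max_ld` or `max_variants` widens the tolerance, which corresponds to the adjustable parameters mentioned in the abstract.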

Acknowledgements

We would like to thank Haya Hadidi and Tristan Münz from the Federal Network Agency of Germany (Bundesnetzagentur) for initiating this research and for their practical hints on AAV procedures. Further thanks go to Viktor Iaroshenko from HfT Leipzig and to Gabor Pintér from Kobe University in Japan for their project support and advice.

Author information

Authors and Affiliations

Leipzig University of Telecommunications (HfTL), Leipzig, Germany
Oliver Jokisch
GWT-TUD GmbH, Dresden, Germany
Horst-Udo Hain

Corresponding author

Correspondence to Oliver Jokisch.

Editor information

Editors and Affiliations

SPIIRAS, Saint Petersburg, Russia
Alexey Karpov
Moscow State Linguistic University, Moscow, Russia
Rodmonga Potapova
University of Hertfordshire, Hatfield, United Kingdom
Iosif Mporas

About this paper

Cite this paper

Jokisch, O., Hain, H.-U. (2017). A Trainable Method for the Phonetic Similarity Search in German Proper Names. In: Karpov, A., Potapova, R., Mporas, I. (eds) Speech and Computer. SPECOM 2017. Lecture Notes in Computer Science, vol 10458. Springer, Cham. https://doi.org/10.1007/978-3-319-66429-3_4

DOI: https://doi.org/10.1007/978-3-319-66429-3_4
Published: 13 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66428-6
Online ISBN: 978-3-319-66429-3
eBook Packages: Computer Science, Computer Science (R0)