Abstract
We present a disambiguation algorithm for weighted automata. The algorithm admits two main stages: a pre-disambiguation stage followed by a transition removal stage. We give a detailed description of the algorithm and the proof of its correctness. The algorithm is not applicable to all weighted automata but we prove sufficient conditions for its applicability in the case of the tropical semiring by introducing the weak twins property. In particular, the algorithm can be used with all acyclic weighted automata and more generally any determinizable weighted automata. While disambiguation can sometimes be achieved using determinization, our disambiguation algorithm in some cases can return a result that is exponentially smaller than any equivalent deterministic automaton. We also present some empirical evidence of the space benefits of disambiguation over determinization in speech recognition and machine translation applications.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
The removal of ambiguous transitions requires the following key property which is guaranteed by our \(\mathsf R\)-pre-disambiguation algorithm: after removal of ambiguous transitions, the weight of a remaining path must be precisely the same as the weight assigned to the string labeling that path by the original automaton. Let us also emphasize that the procedure of [14] is not a special instance of our algorithm and in particular does not benefit from the crucial use of the relation \(\mathsf R^*\).
- 2.
Our algorithms can be straightforwardly extended to the case of weakly left divisible left semirings [3].
- 3.
This condition can in fact be relaxed: it suffices that there exists a co-reachable state \((q_i, s_i)\) with \(i < j\) since it can be shown that in that case, there exists necessarily such a state with a a-transition to \((q_0, s_0)\).
- 4.
The lemma is stated as processing one list, but from the proof it is clear it applies to multiple lists.
- 5.
In [3], the authors use instead the terminology of cycle-unambiguous weighted automata, which coincides with that of polynomially ambiguous weighted automata.
References
Albert, J., Kari, J.: Digital image compression. In: Handbook of Weighted Automata. Springer, Heidelberg (2009)
Allauzen, C., Benson, E., Chelba, C., Riley, M., Schalkwyk, J.: Voice query refinement. In: Interspeech (2012)
Allauzen, C., Mohri, M.: Efficient algorithms for testing the twins property. J. Automata, Lang. Comb. 8(2), 117–144 (2003)
Allauzen, C., Riley, M., Schalkwyk, J., Skut, W., Mohri, M.: OpenFst Library (2007). http://www.openfst.org
Breuel, T.M.: The OCRopus open source OCR system. In: Proceedings of IS&T/SPIE 20th Annual Symposium (2008)
Durbin, R., Eddy, S.R., Krogh, A., Mitchison, G.J.: Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Camb. Univ. Press, Cambridge (1998)
Eilenberg, S.: Automata, Languages and Machines. Academic Press, New York (1974)
Eppstein, D.: Finding the \(k\) shortest paths. SIAM J. Comp. 28(2), 652–673 (1998)
Iglesias, G., Allauzen, C., Byrne, W., de Gispert, A., Riley, M.: Hierarchical phrase-based translation representations. In: Proceedings of EMNLP, pp. 1373–1383 (2011)
Kaplan, R.M., Kay, M.: Regular models of phonological rule systems. Comput. Linguist. 20(3), 331–378 (1994)
Kirsten, D.: A Burnside approach to the termination of Mohri’s algorithm for polynomially ambiguous min-plus-automata. ITA 42(3), 553–581 (2008)
Kirsten, D.: Decidability, undecidability, and pspace-completeness of the twins property in the tropical semiring. Theor. Comput. Sci. 420, 56–63 (2012)
Kirsten, D., Lombardy, S.: Deciding unambiguity and sequentiality of polynomially ambiguous min-plus automata. In: STACS, pp. 589–600 (2009)
Klimann, I., Lombardy, S., Mairesse, J., Prieur, C.: Deciding unambiguity and sequentiality from a finitely ambiguous max-plus automaton. Theor. Comput. Sci. 327(3), 349–373 (2004)
Kuich, W., Salomaa, A.: Semirings, Automata, Languages. EATCS Monographs on Theoretical Computer Science, vol. 5. Springer, Germany (1986)
Mohri, M.: Finite-state transducers in language and speech processing. Comput. Linguist. 23(2), 269–311 (1997)
Mohri, M.: On the disambiguation of finite automata and functional transducers. Int. J. Found. Comput. Sci. 24(6), 847–862 (2013)
Mohri, M., Pereira, F.C.N., Riley, M.: Speech recognition with weighted finite-state transducers. In: Handbook on Speech Proc. and Speech Comm. Springer, Heidelberg (2008)
Mohri, M., Riley, M.: An efficient algorithm for the n-best-strings problem. In Interspeech (2002)
Mohri, M., Riley, M.D.: On the disambiguation of weighted automata. ArXiv 1405.0500, May 2014
Schalkwyk, J., Beeferman, D., Beaufays, F., Byrne, B., Chelba, C., Cohen, M., Kamvar, M., Strope, B.: Your word is my command: Google search by voice: A case study. In: Advances in Speech Recognition, pp. 61–90. Springer, Heidelberg (2010)
Schmidt, E.M.: Succinctness of description of context-free, regular and unambiguous languages. Ph.D. thesis, Dept. of Comp. Sci., University of Aarhus (1978)
Acknowledgments
We thank Cyril Allauzen for discussions about the topic of this research. This work was partly funded by the NSF award IIS-1117591.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Mohri, M., Riley, M.D. (2015). On the Disambiguation of Weighted Automata. In: Drewes, F. (eds) Implementation and Application of Automata. CIAA 2015. Lecture Notes in Computer Science(), vol 9223. Springer, Cham. https://doi.org/10.1007/978-3-319-22360-5_22
Download citation
DOI: https://doi.org/10.1007/978-3-319-22360-5_22
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22359-9
Online ISBN: 978-3-319-22360-5
eBook Packages: Computer ScienceComputer Science (R0)