Random Indexing Explained with High Probability

Behrang QasemiZadeh^15,16 &
Siegfried Handschuh^15,16

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 9302))

Included in the following conference series:

International Conference on Text, Speech, and Dialogue

1851 Accesses

Abstract

Random indexing (RI) is an incremental method for constructing a vector space model (VSM) with a reduced dimensionality. Previously, the method has been justified using the mathematical framework of Kanerva’s sparse distributed memory. This justification, although intuitively plausible, fails to provide the information that is required to set the parameters of the method. In order to suggest criteria for the method’s parameters, the RI method is revisited and described using the principles of linear algebra and sparse random projections in Euclidean spaces. These simple mathematics are then employed to suggest criteria for setting the method’s parameters and to explain their influence on the estimated distances in the RI-constructed VSMs. The empirical results observed in an evaluation are reported to support the suggested guidelines in the paper.

The first three pages previously appeared in [1].

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic

£29.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: GBP 19.95; Price includes VAT (United Kingdom)

eBook: GBP 35.99; Price includes VAT (United Kingdom)

Softcover Book: GBP 44.99; Price includes VAT (United Kingdom)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Random Indexing Revisited

Random indexing of multidimensional data

Article Open access 07 December 2016

Language Geometry Using Random Indexing

References

QasemiZadeh, B.: Random indexing revisited. In: Biemann, C., Handschuh, S., Freitas, A., Meziane, F., Métais, E. (eds.) NLDB 2015. LNCS, vol. 9103, pp. 437–442. Springer, Heidelberg (2015)
Chapter Google Scholar
Turney, P.D., Pantel, P.: From frequency to meaning: vector space models of semantics. Journal of Artificial Intelligence Research 37(1), 141–188 (2010)
MathSciNet MATH Google Scholar
Salton, G., Wong, A., Yang, C.S.: A vector space model for automatic indexing. Communications of the ACM 18(11), 613–620 (1975)
Article MATH Google Scholar
Beyer, K., Goldstein, J., Ramakrishnan, R., Shaft, U.: When is nearest neighbor meaningful? In: Beeri, C., Bruneman, P. (eds.) ICDT 1999. LNCS, vol. 1540, pp. 217–235. Springer, Heidelberg (1998)
Chapter Google Scholar
Deerwester, S.C., Dumais, S.T., Landauer, T.K., Furnas, G.W., Harshman, R.A.: Indexing by latent semantic analysis. Journal of the American Society of Information Science 41(6), 391–407 (1990)
Article Google Scholar
Brand, M.: Fast low-rank modifications of the thin singular value decomposition. Linear Algebra and its Applications 415(1), 20–30 (2006). Special Issue on Large Scale Linear and Nonlinear Eigenvalue Problems
Article MathSciNet MATH Google Scholar
Johnson, W., Lindenstrauss, J.: Extensions of Lipschitz mappings into a Hilbert space. In: Beals, R., Beck, A., Bellow, A., Hajian, A. (eds.) Conference on Modern Analysis and Probability (1982: Yale University). Contemporary Mathematics, vol. 26. American Mathematical Society, pp. 189–206 (1984)
Google Scholar
Dasgupta, S., Gupta, A.: An elementary proof of a theorem of Johnson and Lindenstrauss. Random Structures and Algorithms 22(1), 60–65 (2003)
Article MathSciNet MATH Google Scholar
Achlioptas, D.: Database-friendly random projections. In: Proceedings of the Twentieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pp. 274–281. ACM, Santa Barbara, May 2001
Google Scholar
Li, P., Hastie, T.J., Church, K.W.: Very sparse random projections. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 287–296. ACM, New York (2006)
Google Scholar
Kanerva, P., Kristoferson, J., Holst, A.: Random indexing of text samples for latent semantic analysis. In: Gleitman, L.R., Josh, A.K. (eds.) Proceedings of the 22nd Annual Conference of the Cognitive Science Society, p. 1036. Erlbaum, Mahwah (2000)
Google Scholar
Sahlgren, M.: An introduction to random indexing. Technical report, Swedish ICT (SICS) (2005). Retrived from https://www.sics.se/~mange/papers/RI_intro.pdf
Lupu, M.: On the usability of random indexing in patent retrieval. In: Hernandez, N., Jäschke, R., Croitoru, M. (eds.) ICCS 2014. LNCS, vol. 8577, pp. 202–216. Springer, Heidelberg (2014)
Google Scholar
Polajnar, T., Clark, S.: Improving distributional semantic vectors through context selection and normalisation. In: Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, Gothenburg, pp. 230–238, April 2014
Google Scholar
Baroni, M., Bernardini, S., Ferraresi, A., Zanchetta, E.: The WaCky Wide Web: A collection of very large linguistically processed Web-crawled corpora. Language Resources and Evaluation 43(3), 209–226 (2009)
Article Google Scholar
Brinkman, B., Charikar, M.: On the impossibility of dimension reduction in L1. Journal of the ACM 52(5), 766–788 (2005)
Article MathSciNet MATH Google Scholar
Lapesa, G., Evert, S.: Evaluating neighbor rank and distance measures as predictors of semantic priming. In: Proceedings of the Fourth Annual Workshop on Cognitive Modeling and Computational Linguistics. Association for Computational Linguistics, Sofia, pp. 66–74, August 2013
Google Scholar
Indyk, P.: Stable distributions, pseudorandom generators, embeddings, and data stream computation. Journal of the ACM 53(3), 307–323 (2006)
Article MathSciNet MATH Google Scholar
Li, P., Samorodnitsk, G., Hopcroft, J.: Sign Cauchy projections and chi-square kernel. In Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, vol. 26, pp. 2571–2579. Curran Associates, Inc. (2013)
Google Scholar
Geva, S., De Vries, C.M.: TOPSIG: topology preserving document signatures. In: Berendt, B., de Vries, A., Fan, W., Macdonald, C., Ounis, I., Ruthven, I. (eds.) Proceedings of the 20th ACM International Conference on Information and Knowledge Management, pp. 333–338. ACM, Glasgow (2011)
Google Scholar
Zadeh, B.Q., Handschuh, S.: Random Manhattan integer indexing: incremental L1 normed vector space construction. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1713–1723. Association for Computational Linguistics, Doha (2014)
Google Scholar
Zadeh, B.Q., Handschuh, S.: Random Manhattan indexing. In: 25th International Workshop on Database and Expert Systems Applications (DEXA), pp. 203–208. IEEE, Munich (2014)
Google Scholar
Baroni, M., Lenci, A., Onnis, L.: ISA meets Lara: an incremental word space model for cognitively plausible simulations of semantic learning. In: Proceedings of the Workshop on Cognitive Aspects of Computational Language Acquisition, pp. 49–56. Association for Computational Linguistics, Prague, June 2007
Google Scholar

Download references

Author information

Authors and Affiliations

Digital Libraries and Web Information Systems, University of Passau, Passau, Germany
Behrang QasemiZadeh & Siegfried Handschuh
National University of Ireland, Galway, Ireland
Behrang QasemiZadeh & Siegfried Handschuh

Authors

Behrang QasemiZadeh
View author publications
You can also search for this author in PubMed Google Scholar
Siegfried Handschuh
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Behrang QasemiZadeh .

Editor information

Editors and Affiliations

University of West Bohemia, Pilsen, Czech Republic
Pavel Král
University of West Bohemia, Pilsen, Czech Republic
Václav Matoušek

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

QasemiZadeh, B., Handschuh, S. (2015). Random Indexing Explained with High Probability. In: Král, P., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2015. Lecture Notes in Computer Science(), vol 9302. Springer, Cham. https://doi.org/10.1007/978-3-319-24033-6_47

Download citation

DOI: https://doi.org/10.1007/978-3-319-24033-6_47
Published: 11 December 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24032-9
Online ISBN: 978-3-319-24033-6
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Random Indexing Explained with High Probability

Abstract

Access this chapter

Subscribe and save

Buy Now

Preview

Similar content being viewed by others

Random Indexing Revisited

Random indexing of multidimensional data

Language Geometry Using Random Indexing

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Subscribe and save

Buy Now

Navigation

Random Indexing Explained with High Probability

Abstract

Access this chapter

Subscribe and save

Buy Now

Preview

Similar content being viewed by others

Random Indexing Revisited

Random indexing of multidimensional data

Language Geometry Using Random Indexing

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation