DOI: 10.1145/2396761.2398499
short-paper

SemaFor: semantic document indexing using semantic forests

Published: 29 October 2012

Abstract

Traditional document indexing techniques store documents in easily accessible representations, such as inverted indices, which scale efficiently to large document sets. These structures offer scalable and efficient solutions for text document management tasks, but they omit the cornerstone of a document's purpose: meaning. They also neglect the semantic relations that bind terms into coherent fragments of text that convey messages. When semantic representations are employed, documents are mapped to the space of concepts, and the similarity measures are adapted to better fit the retrieval tasks. However, these methods can be slow at both indexing and retrieval time. In this paper we propose SemaFor, an indexing algorithm for text documents that uses semantic spanning forests constructed from lexical resources, such as Wikipedia and WordNet, together with spectral graph theory to represent documents for further processing.
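
The abstract gives no algorithmic detail, so the following is only a rough sketch of the general idea of a semantic spanning forest, not the SemaFor algorithm itself. It assumes a handful of made-up relatedness scores between a document's terms (in practice these might come from a resource such as WordNet or Wikipedia), keeps the strongest links with Kruskal's algorithm to form a maximum spanning forest, and builds the weighted graph Laplacian, which is the usual starting point for spectral analysis. The function names, the threshold, and all weights are hypothetical.

```python
def max_spanning_forest(nodes, weighted_edges, threshold=0.0):
    """Kruskal's algorithm on edges sorted by descending weight.

    Edges with weight <= threshold are ignored, so weakly related
    terms end up in separate trees of the forest.
    """
    parent = {n: n for n in nodes}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    forest = []
    for u, v, w in sorted(weighted_edges, key=lambda e: e[2], reverse=True):
        if w <= threshold:
            break  # all remaining edges are weaker still
        ru, rv = find(u), find(v)
        if ru != rv:  # adding the edge does not create a cycle
            parent[ru] = rv
            forest.append((u, v, w))
    return forest


def laplacian(nodes, edges):
    """Weighted graph Laplacian L = D - W as a list of lists."""
    index = {n: i for i, n in enumerate(nodes)}
    size = len(nodes)
    lap = [[0.0] * size for _ in range(size)]
    for u, v, w in edges:
        i, j = index[u], index[v]
        lap[i][i] += w
        lap[j][j] += w
        lap[i][j] -= w
        lap[j][i] -= w
    return lap


# Hypothetical terms and relatedness scores, for illustration only.
terms = ["bank", "loan", "money", "river", "water"]
relatedness = [
    ("bank", "loan", 0.9),
    ("bank", "money", 0.8),
    ("loan", "money", 0.7),
    ("river", "water", 0.85),
    ("bank", "river", 0.1),
]

forest = max_spanning_forest(terms, relatedness, threshold=0.2)
lap = laplacian(terms, forest)
```

With these toy scores the forest splits into two trees, one around the financial terms and one around the river terms. The eigenvalues of `lap` (obtainable with a numerical library, for example `numpy.linalg.eigvalsh`) would then give one possible spectral signature of the document.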

Published In

CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management
October 2012
2840 pages
ISBN: 9781450311564
DOI: 10.1145/2396761
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. document indexing
  2. semantic graphs
  3. text representation

Qualifiers

  • Short-paper

Conference

CIKM'12

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Cited By

  • (2015) A Combined Index for Mixed Structured and Unstructured Data. Proceedings of the 2015 12th Web Information System and Application Conference (WISA), pages 217-222. DOI: 10.1109/WISA.2015.36. Online publication date: 11-Sep-2015
  • (2015) A Method of Calculating Comment Text Similarity Based on Tree Structure. Proceedings of the 2015 7th International Conference on Intelligent Human-Machine Systems and Cybernetics - Volume 01, pages 220-223. DOI: 10.1109/IHMSC.2015.244. Online publication date: 26-Aug-2015
  • (2013) Understanding the diversity of tweets in the time of outbreaks. Proceedings of the 22nd International Conference on World Wide Web, pages 1335-1342. DOI: 10.1145/2487788.2488172. Online publication date: 13-May-2013
  • (2013) Ontology-Based Query Expansion for Supporting Information Retrieval in Agriculture. The 8th International Conference on Knowledge Management in Organizations, pages 299-311. DOI: 10.1007/978-94-007-7287-8_24. Online publication date: 6-Sep-2013
