
Article

Contextual search and name disambiguation in email using graphs

Authors:

William W. Cohen,

Andrew Y. Ng

SIGIR '06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval

Pages 27 - 34

https://doi.org/10.1145/1148170.1148179

Published: 06 August 2006

Abstract

Similarity measures for text have historically been an important tool for solving information retrieval problems. In many interesting settings, however, documents are closely connected to other documents as well as to non-textual objects: for instance, email messages are connected to other messages via header information. In this paper we consider extended similarity metrics for documents and other objects embedded in graphs, computed via a lazy graph walk. We provide a detailed instantiation of this framework for email data, where content, social networks and a timeline are integrated in a structural graph. The suggested framework is evaluated for two email-related problems: disambiguating names in email documents, and threading. We show that reranking schemes based on the graph-walk similarity measures often outperform baseline methods, and that further improvements can be obtained by use of appropriate learning methods.
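The abstract names the lazy graph walk but does not define it. As a rough illustration only, here is a minimal Python sketch of one way such a walk can turn a typed email graph into a similarity ranking. It assumes that at every step each node keeps a fixed fraction `gamma` of its probability mass and spreads the rest over its edge labels, then evenly over the neighbors reached by each label, for a fixed number of steps. The node types, the edge labels (`sent-from`, `has-term`, `in-file`), the class `EmailGraph`, and the functions `lazy_walk` and `rank_nodes` are hypothetical choices for this sketch, not definitions taken from the paper.

```python
from collections import defaultdict

# Hypothetical sketch: node types, edge labels and parameters below are
# illustrative assumptions, not the paper's exact definitions.


class EmailGraph:
    """Toy typed graph: nodes are (type, name) tuples, edges carry labels.

    Every edge is stored in both directions (with an inverse label), so a
    walk can move from messages to people and back again.
    """

    def __init__(self):
        # adj[node][label] -> list of neighbor nodes
        self.adj = defaultdict(lambda: defaultdict(list))

    def add_edge(self, x, label, y, inverse_label):
        self.adj[x][label].append(y)
        self.adj[y][inverse_label].append(x)


def lazy_walk(graph, start, steps=2, gamma=0.5, label_weight=None):
    """Distribution over nodes after `steps` steps of a lazy walk.

    Each step, a node keeps `gamma` of its probability mass; the other
    (1 - gamma) is split across its edge labels (uniformly unless
    `label_weight` says otherwise) and then evenly across the neighbors
    reached by each label. Nodes with no usable edges keep all their mass.
    """
    label_weight = label_weight or {}
    dist = dict(start)
    for _ in range(steps):
        nxt = defaultdict(float)
        for x, p in dist.items():
            weights = {lab: label_weight.get(lab, 1.0)
                       for lab, nbrs in graph.adj[x].items() if nbrs}
            total = sum(weights.values())
            if total <= 0:
                nxt[x] += p  # dead end: mass stays put
                continue
            nxt[x] += gamma * p  # the "lazy" part: stay with prob. gamma
            for lab, w in weights.items():
                nbrs = graph.adj[x][lab]
                for y in nbrs:
                    nxt[y] += (1 - gamma) * p * (w / total) / len(nbrs)
        dist = dict(nxt)
    return dist


def rank_nodes(dist, node_type):
    """Nodes of one type, highest walk probability first."""
    return sorted((n for n in dist if n[0] == node_type),
                  key=lambda n: dist[n], reverse=True)


# Usage on a tiny hypothetical corpus of two messages.
g = EmailGraph()
m1, m2 = ("file", "msg1"), ("file", "msg2")
andy, andrea = ("person", "Andy Smith"), ("person", "Andrea Jones")
g.add_edge(m1, "sent-from", andy, "sent-from-inv")
g.add_edge(m2, "sent-from", andrea, "sent-from-inv")
g.add_edge(m1, "has-term", ("term", "andy"), "in-file")
g.add_edge(m2, "has-term", ("term", "budget"), "in-file")

# Query: the ambiguous name "andy"; rank candidate people by walk mass.
dist = lazy_walk(g, {("term", "andy"): 1.0}, steps=2, gamma=0.5)
print(rank_nodes(dist, "person"))  # [('person', 'Andy Smith')]
```

Ranking `person` nodes by walk probability from a name term is one way a graph-walk similarity could be applied to name disambiguation; the learned reranking the abstract mentions is not shown here. Because each step only redistributes mass, the output remains a probability distribution, which makes scores from different queries comparable.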



Index Terms

Contextual search and name disambiguation in email using graphs
1. Information systems
  1. Information retrieval
    1. Information retrieval query processing
    2. Retrieval models and ranking


Information & Contributors

Information

Published In


SIGIR '06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval

August 2006

768 pages

ISBN:1595933697

DOI:10.1145/1148170

General Chair:
Efthimis N. Efthimiadis, University of Washington

Program Chairs:
Susan Dumais, Microsoft Research, Redmond
David Hawking, CSIRO ICT Centre, Canberra, Australia
Kalervo Järvelin, University of Tampere, Finland

Copyright © 2006 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


Publisher

Association for Computing Machinery

New York, NY, United States



Qualifiers

Article

Conference

SIGIR06

Sponsor:

SIGIR06: The 29th Annual International SIGIR Conference

August 6 - 11, 2006

Seattle, Washington, USA

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%


Bibliometrics & Citations

Article Metrics

Total Citations: 106
Total Downloads: 1,597
Downloads (last 12 months): 9
Downloads (last 6 weeks): 2

Reflects downloads up to 11 Dec 2024


Citations

Cited By

Mei Q, Li X (2024). Robust Chinese Short Text Entity Disambiguation Method Based on Feature Fusion and Contrastive Learning. Information 15:3 (139). Online publication date: 29-Feb-2024.
https://doi.org/10.3390/info15030139

Nafa Y, Chen Q, Hou B, Li Z (2024). Adaptive deep learning for entity disambiguation via knowledge-based risk analysis. Expert Systems with Applications: An International Journal 238:PE. Online publication date: 27-Feb-2024.
https://dl.acm.org/doi/10.1016/j.eswa.2023.122342

Elyashar A, Puzis R, Fire M (2021). It Runs in the Family: Unsupervised Algorithm for Alternative Name Suggestion Using Digitized Family Trees. IEEE Transactions on Knowledge and Data Engineering (1-1). Online publication date: 2021.
https://doi.org/10.1109/TKDE.2021.3096670

Zong C, Xia R, Zhang J (2021). Information Extraction. Text Data Mining (227-283). Online publication date: 21-Jan-2021.
https://doi.org/10.1007/978-981-16-0100-2_10

Arslan E, Orhan U, Tahiroğlu B (2019). Morphological Disambiguation of Turkish with Free-order Co-occurrence Statistics. Gümüşhane Üniversitesi Fen Bilimleri Enstitüsü Dergisi. Online publication date: 31-Jan-2019.
https://doi.org/10.17714/gumusfenbil.430034

Heindorf S, Scholten Y, Engels G, Potthast M (2019). Debiasing Vandalism Detection Models at Wikidata. The World Wide Web Conference (670-680). Online publication date: 13-May-2019.
https://dl.acm.org/doi/10.1145/3308558.3313507

Si H, Tong W, Kausar S (2018). A conditional random field model for name disambiguation in National Natural Science Foundation of China fund. Journal of Algorithms & Computational Technology 12:2 (91-100). Online publication date: 6-Feb-2018.
https://doi.org/10.1177/1748301817751481

Sheng F, Cao Q, Cai H, Yao J, Xie C (2018). GraPU. Proceedings of the ACM Symposium on Cloud Computing (301-312). Online publication date: 11-Oct-2018.
https://dl.acm.org/doi/10.1145/3267809.3267811

Wi Tay N, Yang S, Lee C, Kubota N (2018). Ontology-based Adaptive e-Textbook Platform for Student and Machine Co-Learning. 2018 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE) (1-7). Online publication date: Jul-2018.
https://doi.org/10.1109/FUZZ-IEEE.2018.8491480

Naeem M, Linggawa I, Mughal A, Lutteroth C, Weber G (2018). A Smart Email Client Prototype for Effective Reuse of Past Replies. IEEE Access 6 (69453-69471). Online publication date: 2018.
https://doi.org/10.1109/ACCESS.2018.2878523
