Article

Evaluating the novelty of text-mined rules using lexical knowledge

Authors:

Raymond J. Mooney,

Krupakar V. Pasupuleti,

Joydeep Ghosh

KDD '01: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining

Pages 233 - 238

https://doi.org/10.1145/502512.502544

Published: 26 August 2001

Abstract

In this paper, we present a new method of estimating the novelty of rules discovered by data-mining methods using WordNet, a lexical knowledge base of English words. We assess the novelty of a rule by the average semantic distance in a knowledge hierarchy between the words in the antecedent and the words in the consequent of the rule: the greater the average distance, the greater the novelty of the rule. The novelty of rules extracted by the DiscoTEX text-mining system from Amazon.com book descriptions was evaluated both by human subjects and by our algorithm. By computing correlation coefficients between pairs of human ratings and between human and automatic ratings, we found that the automatic scoring of rules based on our novelty measure correlates with human judgments about as well as human judgments correlate with one another.
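The idea of scoring a rule by the average distance between its antecedent and consequent words can be illustrated with a minimal sketch. The abstract does not give the exact distance definition, so everything below is an assumption made for illustration: the hierarchy `TOY_HYPERNYMS` is a small hypothetical is-a tree rather than real WordNet data, distance is taken to be the number of edges on the shortest path, and the function names `build_graph`, `path_distance`, and `novelty` are invented here and are not taken from the paper.

```python
from collections import deque
from itertools import product

# Hypothetical toy is-a hierarchy (child -> parent). This is NOT WordNet data;
# it only stands in for a lexical knowledge hierarchy.
TOY_HYPERNYMS = {
    "novel": "book",
    "textbook": "book",
    "book": "publication",
    "magazine": "publication",
    "publication": "entity",
    "dog": "animal",
    "animal": "entity",
}


def build_graph(hypernyms):
    """Turn child -> parent links into an undirected adjacency map."""
    graph = {}
    for child, parent in hypernyms.items():
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    return graph


def path_distance(graph, source, target):
    """Return the edge count of the shortest path, found by breadth-first search."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None  # the two words are not connected in the hierarchy


def novelty(graph, antecedent, consequent):
    """Average pairwise distance between antecedent and consequent words."""
    distances = [
        path_distance(graph, a, c) for a, c in product(antecedent, consequent)
    ]
    distances = [d for d in distances if d is not None]  # skip unconnected pairs
    return sum(distances) / len(distances) if distances else None


GRAPH = build_graph(TOY_HYPERNYMS)
close_rule = novelty(GRAPH, ["novel"], ["textbook"])        # siblings under "book"
far_rule = novelty(GRAPH, ["novel", "textbook"], ["dog"])   # different subtrees
print(close_rule, far_rule)
```

Under these assumptions, a rule linking sibling concepts scores lower than a rule linking concepts from distant subtrees, which matches the intuition that a larger average distance indicates a more novel rule. A version over real WordNet would also need to handle multiple word senses and words missing from the hierarchy, which this sketch ignores.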

Cited By

Durga B, Sanjana K, Baig Y, Tendulkar N, Mothukuri R, Vignesh T (2023). Information Extraction From Text Messages Using Natural Language Processing. 2023 International Conference on Computer Communication and Informatics (ICCCI) (1-6). Online publication date: 23-Jan-2023.
https://doi.org/10.1109/ICCCI56745.2023.10128641
Afolabi I, Badejo J, Adubi S, Odetunmibi O (2020). Identifying major civil engineering research influencers and topics using social network analysis. Cogent Engineering 7:1 (1835147). Online publication date: 26-Oct-2020.
https://doi.org/10.1080/23311916.2020.1835147
Chaudhuri N, Dhar D, Yammiyavar P (2020). A computational model for subjective evaluation of novelty in descriptive aptitude. International Journal of Technology and Design Education 32:2 (1121-1158). Online publication date: 22-Nov-2020.
https://doi.org/10.1007/s10798-020-09638-2

Index Terms

Evaluating the novelty of text-mined rules using lexical knowledge
1. Applied computing
  1. Document management and text processing
    1. Document capture
      1. Document analysis
2. Information systems
  1. Data management systems
    1. Database management system engines
      1. Triggers and rules

Information & Contributors

Information

Published In

KDD '01: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining

August 2001

493 pages

ISBN: 158113391X

DOI: 10.1145/502512

Conference Chair: Doheon Lee (Chonnam National University, Korea)
General Chair: Mario Schkolnick (SGI)
Program Chairs: Foster Provost (New York University), Ramakrishnan Srikant (IBM Almaden Research Center)

Copyright © 2001 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMOD: ACM Special Interest Group on Management of Data
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
AAAI: American Association for Artificial Intelligence

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

KDD01: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 26 - 29, 2001

San Francisco, California

Acceptance Rates

KDD '01 Paper Acceptance Rate: 31 of 237 submissions, 13%

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 28
Total Downloads: 829
Downloads (Last 12 months): 3
Downloads (Last 6 weeks): 1

Reflects downloads up to 01 Jan 2025

Citations

Sethi R, Shekar B (2018). Subjective Interestingness in Association Rule Mining: A Theoretical Analysis. Digital Business (375-389). Online publication date: 27-Jul-2018.
https://doi.org/10.1007/978-3-319-93940-7_15
Sinoara R, Antunes J, Rezende S (2017). Text mining and semantics: a systematic mapping study. Journal of the Brazilian Computer Society 23:1. Online publication date: 29-Jun-2017.
https://doi.org/10.1186/s13173-017-0058-7
Chen C, Song M (2017). Text Mining with Unstructured Text. Representing Scientific Knowledge (223-261). Online publication date: 28-Nov-2017.
https://doi.org/10.1007/978-3-319-62543-0_6
Diamantini C, Potena D, Storti E (2017). Data Semantics Meets Knowledge Discovery in Databases. A Comprehensive Guide Through the Italian Database Research Over the Last 25 Years (391-405). Online publication date: 31-May-2017.
https://doi.org/10.1007/978-3-319-61893-7_23
Gim J, Jung H, Jeong D (2015). XOnto-Apriori: An Effective Association Rule Mining Algorithm for Personalized Recommendation Systems. Computer Science and its Applications (1131-1138). Online publication date: 2015.
https://doi.org/10.1007/978-3-662-45402-2_160
Loukides G, Gkoulalas-Divanis A, Shao J (2013). Efficient and flexible anonymization of transaction data. Knowledge and Information Systems 36:1 (153-210). Online publication date: 1-Jul-2013.
https://dl.acm.org/doi/10.1007/s10115-012-0544-3
Watrous-deVersterre L, Wang C, Song M, Boughida K, Howard B, Nelson M, Van de Sompel H, Sølvberg I (2012). Concept chaining utilizing meronyms in text characterization. Proceedings of the 12th ACM/IEEE-CS joint conference on Digital Libraries (241-248). Online publication date: 10-Jun-2012.
https://dl.acm.org/doi/10.1145/2232817.2232862