DOI: 10.1145/502512.502544
Article

Evaluating the novelty of text-mined rules using lexical knowledge

Published: 26 August 2001

Abstract

In this paper, we present a new method of estimating the novelty of rules discovered by data-mining methods using WordNet, a lexical knowledge base of English words. We assess the novelty of a rule by the average semantic distance in a knowledge hierarchy between the words in the antecedent and the words in the consequent of the rule: the greater the average distance, the more novel the rule. The novelty of rules extracted by the DiscoTEX text-mining system from Amazon.com book descriptions was evaluated both by human subjects and by our algorithm. By computing correlation coefficients between pairs of human ratings and between human and automatic ratings, we found that the automatic scoring of rules based on our novelty measure correlates with human judgments about as well as human judgments correlate with one another.
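
The scoring idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact distance measure, so the code below assumes the simplest one, the number of edges on the path between two words in an IS-A tree. The `PARENT` table is a hypothetical toy hierarchy standing in for WordNet, and the function names `path_to_root`, `distance`, and `novelty` are illustrative, not taken from the paper.

```python
from itertools import product

# Hypothetical toy IS-A hierarchy (child -> parent); a stand-in for WordNet.
PARENT = {
    "entity": None,
    "object": "entity",
    "abstraction": "entity",
    "animal": "object",
    "artifact": "object",
    "dog": "animal",
    "cat": "animal",
    "book": "artifact",
    "emotion": "abstraction",
    "love": "emotion",
}


def path_to_root(word):
    """Return the chain of nodes from `word` up to the root, inclusive."""
    path = []
    while word is not None:
        path.append(word)
        word = PARENT[word]
    return path


def distance(a, b):
    """Edge count between a and b via their lowest common ancestor.

    Assumption: plain path length; the paper's measure may differ.
    """
    steps_from_a = {node: i for i, node in enumerate(path_to_root(a))}
    for steps_from_b, node in enumerate(path_to_root(b)):
        if node in steps_from_a:
            return steps_from_a[node] + steps_from_b
    raise ValueError("words share no ancestor")


def novelty(antecedent, consequent):
    """Average distance over all (antecedent word, consequent word) pairs."""
    pairs = list(product(antecedent, consequent))
    return sum(distance(a, c) for a, c in pairs) / len(pairs)


if __name__ == "__main__":
    # Words that sit close together in the hierarchy score low.
    print(novelty(["dog"], ["cat"]))
    # Words from distant branches score high.
    print(novelty(["dog"], ["love"]))
```

Under this assumption, a rule linking `dog` to `cat` scores lower than one linking `dog` to `love`, matching the abstract's claim that greater average distance means greater novelty.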

Published In

KDD '01: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
August 2001
493 pages
ISBN: 158113391X
DOI: 10.1145/502512
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. WordNet
  2. interesting rules
  3. knowledge hierarchy
  4. novelty
  5. semantic distance

Qualifiers

  • Article

Conference

KDD01

Acceptance Rates

KDD '01 Paper Acceptance Rate 31 of 237 submissions, 13%;
Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Bibliometrics

  • Downloads (last 12 months): 3
  • Downloads (last 6 weeks): 1

Reflects downloads up to 01 Jan 2025.

Cited By

  • (2023) Information Extraction From Text Messages Using Natural Language Processing. 2023 International Conference on Computer Communication and Informatics (ICCCI), pp. 1-6. DOI: 10.1109/ICCCI56745.2023.10128641. Online publication date: 23-Jan-2023
  • (2020) Identifying major civil engineering research influencers and topics using social network analysis. Cogent Engineering 7:1 (1835147). DOI: 10.1080/23311916.2020.1835147. Online publication date: 26-Oct-2020
  • (2020) A computational model for subjective evaluation of novelty in descriptive aptitude. International Journal of Technology and Design Education 32:2, pp. 1121-1158. DOI: 10.1007/s10798-020-09638-2. Online publication date: 22-Nov-2020
  • (2018) Subjective Interestingness in Association Rule Mining: A Theoretical Analysis. Digital Business, pp. 375-389. DOI: 10.1007/978-3-319-93940-7_15. Online publication date: 27-Jul-2018
  • (2017) Text mining and semantics: a systematic mapping study. Journal of the Brazilian Computer Society 23:1. DOI: 10.1186/s13173-017-0058-7. Online publication date: 29-Jun-2017
  • (2017) Text Mining with Unstructured Text. Representing Scientific Knowledge, pp. 223-261. DOI: 10.1007/978-3-319-62543-0_6. Online publication date: 28-Nov-2017
  • (2017) Data Semantics Meets Knowledge Discovery in Databases. A Comprehensive Guide Through the Italian Database Research Over the Last 25 Years, pp. 391-405. DOI: 10.1007/978-3-319-61893-7_23. Online publication date: 31-May-2017
  • (2015) XOnto-Apriori: An Effective Association Rule Mining Algorithm for Personalized Recommendation Systems. Computer Science and its Applications, pp. 1131-1138. DOI: 10.1007/978-3-662-45402-2_160. Online publication date: 2015
  • (2013) Efficient and flexible anonymization of transaction data. Knowledge and Information Systems 36:1, pp. 153-210. DOI: 10.1007/s10115-012-0544-3. Online publication date: 1-Jul-2013
  • (2012) Concept chaining utilizing meronyms in text characterization. Proceedings of the 12th ACM/IEEE-CS joint conference on Digital Libraries, pp. 241-248. DOI: 10.1145/2232817.2232862. Online publication date: 10-Jun-2012
