Text analysis and knowledge mining system

Published: 01 October 2001

Abstract

Large text databases potentially contain a great wealth of knowledge. However, text represents factual information (and information about the author's communicative intentions) in a complex, rich, and opaque manner. Consequently, unlike numerical and fixed field data, it cannot be analyzed by standard statistical data mining methods. Relying on human analysis results in either huge workloads or the analysis of only a tiny fraction of the database. We are working on text mining technology to extract knowledge from very large amounts of textual data. Unlike information retrieval technology that allows a user to select documents that meet the user's requirements and interests, or document clustering technology that organizes documents, we focus on finding valuable patterns and rules in text that indicate trends and significant features about specific topics. By applying our prototype system named TAKMI (Text Analysis and Knowledge Mining) to textual databases in PC help centers, we can automatically detect product failures; determine issues that have led to rapid increases in the number of calls and their underlying reasons; and analyze help center productivity and changes in customers' behavior involving a particular product, without reading any of the text. We have verified that our framework is also effective for other data such as patent documents.
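One of the analyses mentioned above, finding issues behind a rapid increase in the number of calls, can be illustrated with a very small sketch. This is not TAKMI's method, which the abstract does not describe in detail; it is a simplified stand-in that counts how many call records mention each term per period and flags terms whose count grew sharply. The function names, the `min_ratio` threshold, the add-one smoothing, and the sample records are all assumptions made for illustration.

```python
from collections import Counter

def term_counts(records, vocabulary):
    """Count, per period, how many records mention each vocabulary term."""
    counts = {}
    for period, text in records:
        words = set(text.lower().split())
        bucket = counts.setdefault(period, Counter())
        for term in vocabulary:
            if term in words:
                bucket[term] += 1
    return counts

def rising_terms(counts, earlier, later, min_ratio=2.0):
    """Return (term, before, now) for terms whose count grew by at least min_ratio."""
    rising = []
    before_counts = counts.get(earlier, Counter())
    for term, now in counts.get(later, Counter()).items():
        before = before_counts[term]
        # Add-one smoothing (an assumption) so a term absent earlier
        # does not cause a division by zero.
        ratio = (now + 1) / (before + 1)
        if ratio >= min_ratio:
            rising.append((term, before, now))
    return sorted(rising, key=lambda item: item[2], reverse=True)

# Hypothetical call records: (period, free-text note). Not data from the paper.
records = [
    ("2001-01", "customer asks how to install modem"),
    ("2001-01", "battery will not charge"),
    ("2001-02", "battery overheats during charge"),
    ("2001-02", "battery drains quickly"),
    ("2001-02", "battery swollen customer requests replacement"),
    ("2001-02", "modem driver question"),
]
vocabulary = {"battery", "modem"}

counts = term_counts(records, vocabulary)
print(rising_terms(counts, "2001-01", "2001-02"))
```

A real system would need linguistic analysis and a domain dictionary rather than whitespace tokenization; the sketch only shows the counting-and-comparison step.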

Published In

IBM Systems Journal, Volume 40, Issue 4
October 2001
196 pages

Publisher

IBM Corp.

United States

Qualifiers

  • Article
