DOI: 10.1145/1401890.1401962
research-article

Knowledge discovery of semantic relationships between words using nonparametric Bayesian graph model

Published: 24 August 2008

Abstract

We developed a nonparametric Bayesian model for automatically discovering semantic relationships between words in a corpus. The aim is to discover semantic knowledge about words in particular domains, which has become increasingly important with the growing use of text mining, information retrieval, and speech recognition. We take the subject-predicate structure as the syntactic structure, with a noun as the subject and a verb as the predicate, and regard the set of these structures as a graph. The generation of this graph is modeled using the hierarchical Dirichlet process and the Pitman-Yor process, which yields a probabilistic generative model for the graph of subject-predicate structures extracted from a corpus. Evaluating the model by its graph-clustering performance, measured against WordNet similarities, showed that it outperforms the baseline models.
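The abstract names the Pitman-Yor process as one building block but does not spell out the construction. As a rough illustration only, the following sketch samples from the standard Chinese restaurant representation of a Pitman-Yor process. It is not the authors' model: the function name `pitman_yor_crp` and the parameter names `discount` and `concentration` are assumptions made for this example, and the paper's graph model and its use of the hierarchical Dirichlet process are not reproduced here.

```python
import random
from collections import Counter


def pitman_yor_crp(n_customers, discount, concentration, seed=0):
    """Sample table assignments from a Pitman-Yor Chinese restaurant process.

    Hypothetical helper for illustration; requires 0 <= discount < 1 and
    concentration > -discount. Returns one table index per customer.
    """
    if not (0.0 <= discount < 1.0) or concentration <= -discount:
        raise ValueError("need 0 <= discount < 1 and concentration > -discount")
    rng = random.Random(seed)
    table_sizes = []   # table_sizes[k] = number of customers at table k
    assignments = []
    for n in range(n_customers):
        k = len(table_sizes)
        if n == 0:
            # The first customer always opens the first table.
            choice = 0
        else:
            # Unnormalised weights: an occupied table gets (size - discount),
            # a new table gets (concentration + discount * k).
            weights = [size - discount for size in table_sizes]
            weights.append(concentration + discount * k)
            choice = rng.choices(range(k + 1), weights=weights)[0]
        if choice == k:
            table_sizes.append(1)
        else:
            table_sizes[choice] += 1
        assignments.append(choice)
    return assignments


if __name__ == "__main__":
    seating = pitman_yor_crp(1000, discount=0.5, concentration=1.0)
    # Show the five largest tables and how many tables were opened in total.
    print(Counter(seating).most_common(5), len(set(seating)))
```

Setting `discount` to 0 reduces this to the Chinese restaurant process of an ordinary Dirichlet process. A larger discount tends to open more tables and gives a heavier-tailed distribution of table sizes, which is the usual motivation for using the Pitman-Yor process with word frequency data.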

Cited By

  • (2015) A method of designing a generic actor model for a professional social network. Human-centric Computing and Information Sciences 5:1. DOI: 10.1186/s13673-015-0042-1. Online publication date: 12-Aug-2015.
  • (2012) Learning semantics and selectional preference of adjective-noun pairs. Proceedings of the First Joint Conference on Lexical and Computational Semantics - Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation, pages 70-74. DOI: 10.5555/2387636.2387649. Online publication date: 7-Jun-2012.

    Published In

    KDD '08: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
    August 2008
    1116 pages
    ISBN: 9781605581934
    DOI: 10.1145/1401890
    General Chair: Ying Li
    Program Chairs: Bing Liu, Sunita Sarawagi
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. graph clustering
    2. nonparametric bayes
    3. probabilistic generative model
    4. text mining

    Qualifiers

    • Research-article

    Conference

    KDD08

    Acceptance Rates

    KDD '08 Paper Acceptance Rate 118 of 593 submissions, 20%;
    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
