research-article

DF-Miner

Published: 01 March 2015

Abstract

Organizing a set of domain-specific terms into a meaningful hierarchical structure is an essential task for faceted search and knowledge organization. In this paper, we present an automatic approach, called domain-specific facet (DF)-Miner, that discovers DFs from the hyperlink structure among Wikipedia article pages. Each article page corresponds to a domain-specific term, and the hyperlinks among article pages represent the connections among these terms. The community structure of the connections within a domain-specific term set reveals the facets of the domain, and the terms with more connections provide important clues for facet labeling. Accordingly, DF-Miner first constructs a domain-specific hyperlink graph from the Wikipedia article pages. It then extracts a tree structure from the Wikipedia category pages. Next, DF-Miner groups the terms of a domain into multiple facets based on the result of community detection. Finally, DF-Miner selects a meaningful label for each facet based on each term's number of connections and the tree structure extracted from the category pages. Two experiments were conducted with six real-world datasets to evaluate DF-Miner. The experimental results show that DF-Miner performs better than textual content-based approaches.
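
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which community detection algorithm DF-Miner uses, so a simple deterministic label propagation stands in for it, and the category-tree step is left out, so each facet is labeled only by its most connected term. The function names (build_graph, detect_communities, mine_facets) and the toy link data are hypothetical.

```python
from collections import defaultdict


def build_graph(links, terms):
    """Build an undirected hyperlink graph restricted to the domain's terms."""
    graph = {term: set() for term in terms}
    for src, dst in links:
        # Links that leave the term set, and self-links, are ignored.
        if src in graph and dst in graph and src != dst:
            graph[src].add(dst)
            graph[dst].add(src)
    return graph


def detect_communities(graph, max_rounds=20):
    """Deterministic label propagation (an assumed stand-in detector)."""
    labels = {node: node for node in graph}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(graph):
            counts = defaultdict(int)
            for neighbor in graph[node]:
                counts[labels[neighbor]] += 1
            if not counts:
                continue  # an isolated term keeps its own label
            top = max(counts.values())
            # Ties go to the alphabetically smallest label for determinism.
            new_label = min(lab for lab, c in counts.items() if c == top)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    groups = defaultdict(set)
    for node, label in labels.items():
        groups[label].add(node)
    return list(groups.values())


def mine_facets(links, terms):
    """Group terms into facets and label each by its most connected term."""
    graph = build_graph(links, terms)
    facets = {}
    for members in detect_communities(graph):
        # Highest degree wins; ties are broken alphabetically.
        label = min(members, key=lambda t: (-len(graph[t]), t))
        facets[label] = sorted(members)
    return facets


# Hypothetical hyperlinks among article pages of a "data structures" domain.
links = [
    ("AVL tree", "Binary tree"),
    ("Red-black tree", "Binary tree"),
    ("AVL tree", "Red-black tree"),
    ("Hash function", "Hash table"),
    ("Hash collision", "Hash table"),
    ("Hash collision", "Hash function"),
    ("Binary tree", "Hash table"),        # a single cross-facet link
    ("Binary tree", "Computer science"),  # target outside the term set
]
terms = [
    "AVL tree", "Binary tree", "Red-black tree",
    "Hash collision", "Hash function", "Hash table",
]

facets = mine_facets(links, terms)
for label, members in facets.items():
    print(label, "->", members)
```

On this toy input the sketch yields two facets, one labeled "Binary tree" and one labeled "Hash table", each containing its three terms. A real run would draw the links from Wikipedia article pages and would also consult the category tree when choosing labels, as the abstract describes.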

Published In

Knowledge-Based Systems, Volume 77, Issue C
March 2015
129 pages

Publisher

Elsevier Science Publishers B. V.

Netherlands

Author Tags

  1. Community structure
  2. Domain-specific facet mining
  3. Hyperlink structure
  4. Scale-free property
  5. Wikipedia

Qualifiers

  • Research-article

Article Metrics

  • Downloads (Last 12 months): 0
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 11 Dec 2024

Cited By

  • (2021) A Survey on Conversational Recommender Systems. ACM Computing Surveys 54:5 (1-36). DOI: 10.1145/3453154. Online publication date: 25-May-2021
  • (2021) Seamlessly Unifying Attributes and Items: Conversational Recommendation for Cold-start Users. ACM Transactions on Information Systems 39:4 (1-29). DOI: 10.1145/3446427. Online publication date: 31-Oct-2021
  • (2019) TF-Miner: Topic-Specific Facet Mining by Label Propagation. Database Systems for Advanced Applications (457-460). DOI: 10.1007/978-3-030-18590-9_66. Online publication date: 22-Apr-2019
  • (2017) Synchronization clustering based on central force optimization and its extension for large-scale datasets. Knowledge-Based Systems 118:C (31-44). DOI: 10.1016/j.knosys.2016.11.007. Online publication date: 15-Feb-2017
  • (2016) A new algorithm for approximate pattern mining in multi-graph collections. Knowledge-Based Systems 109:C (198-207). DOI: 10.1016/j.knosys.2016.07.003. Online publication date: 1-Oct-2016
