research-article

Confucius and its intelligent disciples: integrating social with search

Published: 01 September 2010

Abstract

Q&A sites continue to flourish as a large number of users rely on them as useful substitutes for incomplete or missing search results. In this paper, we present our experience with developing Confucius, a Google Q&A service launched in 21 countries and four languages by the end of 2009. Confucius employs six data mining subroutines to harness synergy between web search and social networks. We present these subroutines' design goals, algorithms, and their effects on service quality. We also describe techniques for and experience with scaling the subroutines to mine massive data sets.

Published In

Proceedings of the VLDB Endowment, Volume 3, Issue 1-2
September 2010
1658 pages

Publisher

VLDB Endowment

Cited By

  • (2021) Towards an efficient weighted random walk domination. Proceedings of the VLDB Endowment 14:4 (560-572). DOI: 10.14778/3436905.3436915. Online publication date: 22-Feb-2021
  • (2021) Familia: A Configurable Topic Modeling Framework for Industrial Text Engineering. Database Systems for Advanced Applications (516-528). DOI: 10.1007/978-3-030-73200-4_36. Online publication date: 11-Apr-2021
  • (2017) DeepQ. Proceedings of the 25th ACM international conference on Multimedia (1068-1068). DOI: 10.1145/3123266.3130875. Online publication date: 23-Oct-2017
  • (2016) Spatial Consensus Queries in a Collaborative Environment. ACM Transactions on Spatial Algorithms and Systems 2:1 (1-37). DOI: 10.1145/2829943. Online publication date: 30-Mar-2016
  • (2016) Data mining techniques in social media. Neurocomputing 214:C (654-670). DOI: 10.1016/j.neucom.2016.06.045. Online publication date: 19-Nov-2016
  • (2014) NCR. Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management (709-718). DOI: 10.1145/2661829.2661978. Online publication date: 3-Nov-2014
  • (2012) Simultaneous realization of page-centric communication and search. Proceedings of the 21st ACM international conference on Information and knowledge management (2719-2721). DOI: 10.1145/2396761.2398738. Online publication date: 29-Oct-2012
  • (2012) Question routing in community based QA. Proceedings of the ACM SIGKDD Workshop on Mining Data Semantics (1-8). DOI: 10.1145/2350190.2350195. Online publication date: 12-Aug-2012
  • (2012) A conversation with Dr. Edward Y. Chang. ACM SIGKDD Explorations Newsletter 13:2 (73-74). DOI: 10.1145/2207243.2207256. Online publication date: 1-May-2012
  • (2012) A classification-based approach to question routing in community question answering. Proceedings of the 21st International Conference on World Wide Web (783-790). DOI: 10.1145/2187980.2188201. Online publication date: 16-Apr-2012
