Abstract
In traditional pseudo feedback, the main cause of topic drift is the low quality of the feedback source. Clustering search results is an effective way to improve the quality of the feedback set. For XML data, how to cluster effectively and then identify good XML fragments from the clustering results is an intricate problem. This paper focuses mainly on the latter problem. Based on k-medoid clustering results, this work first proposes a cluster label extraction method to select candidate relevant clusters. Second, multiple ranking features are introduced to help identify relevant XML fragments from the candidate clusters. Finally, the top N fragments compose the high-quality pseudo feedback set. Experimental results on standard INEX test data show that, on the one hand, the proposed cluster label extraction method obtains proper cluster key terms and leads to appropriate candidate cluster selection; on the other hand, the presented ranking features are beneficial to identifying relevant XML fragments. The quality of the feedback set is thereby ensured.
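The pipeline summarized in the abstract (label each cluster, select candidate clusters, rank their fragments, keep the top N) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function names, the frequency-based labeling heuristic, the query-overlap selection rule, and the single overlap-count ranking feature are all hypothetical stand-ins, and the clusters are assumed to have been produced already by a k-medoid step.

```python
from collections import Counter


def cluster_label(cluster, k=3):
    """Return the k most frequent terms of a cluster as its label.

    Assumption: term frequency stands in for the paper's label extraction.
    A cluster is a list of fragments; a fragment is a list of terms.
    """
    counts = Counter(term for fragment in cluster for term in fragment)
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def select_candidate_clusters(clusters, query_terms, k=3):
    """Keep clusters whose label shares at least one term with the query."""
    query = set(query_terms)
    return [c for c in clusters if query & set(cluster_label(c, k))]


def top_n_fragments(clusters, query_terms, n=2, k=3):
    """Rank fragments of the candidate clusters and keep the top n.

    Assumption: one placeholder feature (number of distinct query terms
    in the fragment) replaces the paper's multiple ranking features.
    """
    query = set(query_terms)
    candidates = [
        fragment
        for cluster in select_candidate_clusters(clusters, query_terms, k)
        for fragment in cluster
    ]
    # sorted() is stable, so equally scored fragments keep their input order.
    return sorted(candidates, key=lambda f: -len(query & set(f)))[:n]


if __name__ == "__main__":
    # Toy data: two pre-computed clusters of tokenized fragments.
    clusters = [
        [["xml", "retrieval", "feedback"], ["xml", "feedback", "query"]],
        [["image", "pixel", "color"], ["image", "color", "filter"]],
    ]
    print(top_n_fragments(clusters, ["xml", "feedback"], n=1))
```

In this toy run only the first cluster survives candidate selection, because its label overlaps the query terms. A realistic system would replace each placeholder with the corresponding component described in the paper.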
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhong, M., Wan, C., Liu, D., Liao, S., Luo, S. (2013). Cluster Labeling Extraction and Ranking Feature Selection for High Quality XML Pseudo Relevance Feedback Fragments Set. In: Motoda, H., Wu, Z., Cao, L., Zaiane, O., Yao, M., Wang, W. (eds) Advanced Data Mining and Applications. ADMA 2013. Lecture Notes in Computer Science(), vol 8347. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-53917-6_38
DOI: https://doi.org/10.1007/978-3-642-53917-6_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-53916-9
Online ISBN: 978-3-642-53917-6
eBook Packages: Computer Science, Computer Science (R0)