Abstract
Patent prior art search must retrieve all relevant documents from a massive patent database, starting from query information that is dispersed and highly ambiguous. This challenging task consists of two parts: patent reduction and patent expansion. Existing studies on patent reduction ignore the relevance between technical characteristics and technical domains, which results in ambiguous queries. Work on patent expansion draws terms from external resources by selecting words with similar distributions or similar semantics. However, this separates the distributional relevance of the terms from their semantic relevance. Moreover, general-purpose repositories rarely meet the needs of patent expansion, which involves uncommon senses and unusual terms. To solve these problems, we first present a novel composite-domain perspective model, which maps the technical characteristics of a query patent to a specific composite classified domain and generates aspect queries. We then perform patent expansion with double consistency by combining distribution and semantics simultaneously. We also propose training semantic vector spaces via word embedding within specific classified domains, so as to provide domain-aware expansion resources. Finally, the multiple retrieval results for the same topic are merged based on perspective weight and on rank in the results. Experimental results on CLEF-IP 2010 demonstrate the effectiveness of our method: it achieves about a 5.43% improvement in recall and nearly a 12.38% improvement in PRES over the state of the art. Our method also achieves the best balance of performance across recall, MAP, and PRES.
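The abstract does not say how distribution and semantics are combined, so the following is only a minimal Python sketch of the "double consistency" idea under our own assumptions. The distributional signal is taken as the fraction of pseudo-relevant feedback documents that contain a candidate term. The semantic signal is its mean cosine similarity to the query terms in a domain-specific embedding space; toy vectors stand in here for a word2vec model trained on one classified domain. A candidate is kept only if it passes both thresholds. The function names, the thresholds `min_sem` and `min_dist`, and the product score are hypothetical and are not the authors' formulation.

```python
import math
from typing import Collection, Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def semantic_score(term: str, query_terms: Sequence[str],
                   embeddings: Dict[str, Vector]) -> float:
    """Mean cosine similarity between a candidate and the query terms,
    measured in a (hypothetical) domain-specific embedding space."""
    sims = [cosine(embeddings[term], embeddings[q])
            for q in query_terms if q in embeddings]
    return sum(sims) / len(sims) if sims else 0.0


def distribution_score(term: str,
                       feedback_docs: Sequence[Collection[str]]) -> float:
    """Assumed distributional signal: the fraction of pseudo-relevant
    feedback documents that contain the candidate term."""
    if not feedback_docs:
        return 0.0
    return sum(term in doc for doc in feedback_docs) / len(feedback_docs)


def expand_query(query_terms: Sequence[str], candidates: Sequence[str],
                 embeddings: Dict[str, Vector],
                 feedback_docs: Sequence[Collection[str]],
                 k: int = 2, min_sem: float = 0.5,
                 min_dist: float = 0.5) -> List[str]:
    """Keep only candidates that pass BOTH thresholds (our assumed reading
    of 'double consistency'), ranked by the product of the two scores."""
    scored = []
    for term in candidates:
        # Skip terms already in the query and terms with no vector.
        if term in query_terms or term not in embeddings:
            continue
        sem = semantic_score(term, query_terms, embeddings)
        dist = distribution_score(term, feedback_docs)
        if sem >= min_sem and dist >= min_dist:
            scored.append((sem * dist, term))
    # Highest combined score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored[:k]]


if __name__ == "__main__":
    # Toy vectors standing in for a word2vec model trained on one domain.
    embeddings = {
        "battery": [1.0, 0.0, 0.0],
        "electrode": [0.9, 0.1, 0.0],
        "anode": [0.8, 0.2, 0.0],
        "lithium": [0.95, 0.0, 0.05],
        "vehicle": [0.0, 1.0, 0.0],
    }
    # Each hypothetical feedback document is represented as a set of terms.
    feedback_docs = [
        {"battery", "electrode", "lithium"},
        {"battery", "anode", "vehicle"},
        {"electrode", "vehicle", "lithium"},
        {"battery", "electrode"},
    ]
    print(expand_query(["battery"], ["electrode", "anode", "lithium", "vehicle"],
                       embeddings, feedback_docs))
```

With this toy data, `anode` is semantically close to `battery` but appears in only one of the four feedback documents, and `vehicle` is frequent but semantically unrelated. Both are therefore filtered out, and the script prints `['electrode', 'lithium']`. Requiring both signals to pass is one simple way to avoid treating distribution and semantics separately. A weighted sum would be an equally plausible alternative.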
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant Nos. 61232002, 61572376), the Science and Technology Support Program of Hubei Province (2015BAA127) and the Wuhan Innovation Team Project (2014070504020237).
Additional information
Fei Wang is a PhD candidate at the School of Computer Science, Wuhan University, China. His current research interests include databases, complex data management, patent mining, information retrieval, and natural language processing. He received his ME degree in computer science from Chengdu University of Information Technology, China, in 2014.
Tieyun Qian is a professor at the State Key Laboratory of Software Engineering, Wuhan University, China. She received her BS degree in computer science from Wuhan University of Technology, China, in 1991, and her PhD degree in computer science from Huazhong University of Science and Technology, China, in 2006. Her current research interests include text mining, Web mining, and natural language processing. She has published over 30 papers in leading conferences such as ACL, EMNLP, and SIGIR. She is a member of CCF and ACM, and has served as a program committee member for many premier conferences, including WWW, COLING, DASFAA, WAIM, and APWeb.
Bin Liu is a lecturer at the School of Computer Science, Wuhan University, China. He received his BS, ME, and PhD degrees in computer science from Wuhan University, China. His current research interests include databases, data mining, complex data management, and natural language processing.
Zhiyong Peng received his BS and ME degrees in computer science from Wuhan University and Changsha Institute of Technology, China, respectively, and his PhD degree from Kyoto University, Japan, in 1995. He is a professor at Wuhan University. Before joining Wuhan University in 2000, he worked as a researcher at the Advanced Software Technology and Mechatronics Research Institute of Kyoto from 1995 to 1997 and was a member of the technical staff at Hewlett-Packard Laboratories, Japan, from 1997 to 2000. His current research interests include databases, trusted data management, and complex data management.
About this article
Cite this article
Wang, F., Qian, T., Liu, B. et al. Patent expanded retrieval via word embedding under composite-domain perspectives. Front. Comput. Sci. 13, 1048–1061 (2019). https://doi.org/10.1007/s11704-018-7056-6
DOI: https://doi.org/10.1007/s11704-018-7056-6