Abstract
Patents bring commercial value to technology companies in modern business operations. However, companies bear high costs in handling patent applications and infringement cases. A common yet expensive task in this work is analyzing the relevant patent literature: lengthy and technically complicated patents demand substantial human effort. This paper focuses on automatically analyzing the content shared between a patent and its relevant literature, specifically its relevant patents, to help experts review the similarities among them. We formulate this as a one-to-many document comparison problem, in which a comparative summary is generated for a given patent and its relevant patents. We extract essential technical features from semantic dependency trees built from the sentences in the claims, and construct a multi-relational graph to model the relevance between features and patents. The key to generating the comparative summary is selecting comparative essential technical features, which we formulate as an optimization problem and solve with a fast greedy algorithm. Experiments on real-world datasets and case studies demonstrate the effectiveness and efficiency of the proposed methods.
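As a rough illustration of the greedy selection step mentioned in the abstract, the sketch below picks features one at a time by marginal gain. The abstract does not state the objective function, so the weighted-coverage objective used here is an assumption, and the names `greedy_select`, `feature_patents`, `feature_weight`, and `budget` are hypothetical rather than taken from the paper.

```python
from typing import Dict, List, Set


def greedy_select(
    feature_patents: Dict[str, Set[str]],
    feature_weight: Dict[str, float],
    budget: int,
) -> List[str]:
    """Pick up to `budget` features by greedy marginal gain.

    Assumed objective (not from the paper): each feature scores its
    weight times the number of relevant patents it newly covers.
    """
    selected: List[str] = []
    covered: Set[str] = set()
    candidates = set(feature_patents)

    while candidates and len(selected) < budget:
        def gain(feature: str) -> float:
            # Marginal gain: only patents not yet covered count.
            return feature_weight[feature] * len(feature_patents[feature] - covered)

        # Sorting makes tie-breaking deterministic (alphabetical).
        best = max(sorted(candidates), key=gain)
        if gain(best) <= 0:
            break  # No remaining feature adds coverage.
        selected.append(best)
        covered |= feature_patents[best]
        candidates.remove(best)

    return selected
```

A coverage-style objective is a common choice for this kind of sketch because its gains shrink as more patents are covered, which is the setting in which simple greedy selection tends to behave well. The paper's actual formulation over the multi-relational graph may differ.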
Funding
Funding was provided by Nanjing University of Posts and Telecommunications (Grant No. NY219084).
Cite this article
Liu, Z., Zhang, J., Qin, T. et al. One-to-many comparative summarization for patents. Scientometrics 127, 1969–1993 (2022). https://doi.org/10.1007/s11192-022-04307-8
DOI: https://doi.org/10.1007/s11192-022-04307-8