
LiDetector: License Incompatibility Detection for Open Source Software

Published: 13 February 2023

Abstract

Open-source software (OSS) licenses dictate the conditions that must be followed to reuse, distribute, and modify software. Apart from widely used licenses such as the MIT License, developers may also write their own licenses (called custom licenses), whose descriptions are more flexible. The resulting variety of licenses makes it challenging to understand licenses and their compatibility. To avoid financial and legal risks, it is essential to ensure license compatibility when integrating third-party packages or reusing code accompanied by licenses. In this work, we propose LiDetector, an effective tool that extracts and interprets OSS licenses (both official and custom licenses) and detects incompatibility among them. Specifically, LiDetector introduces a learning-based method to automatically identify meaningful license terms from an arbitrary license, and employs a Probabilistic Context-Free Grammar (PCFG) to infer rights and obligations for incompatibility detection. Experiments demonstrate that LiDetector outperforms existing methods, with 93.28% precision for term identification and 91.09% accuracy for right and obligation inference, and can effectively detect incompatibility with a 10.06% FP rate and a 2.56% FN rate. Furthermore, with LiDetector, our large-scale empirical study on 1,846 projects reveals that 72.91% of the projects suffer from license incompatibility, including incompatibilities involving popular licenses such as the MIT License and the Apache License. We highlight lessons learned from the perspectives of different stakeholders and make all related data and the replication package publicly available to facilitate follow-up research.
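
To make the idea of incompatibility detection over inferred rights and obligations concrete, the following is a minimal sketch and not LiDetector's actual implementation. It assumes each license has already been reduced to a mapping from a term to an attitude; the attitude labels (`can`, `cannot`, `must`), the term names, the conflict rule, and both toy licenses are illustrative assumptions rather than details taken from the abstract.

```python
from typing import Dict, List, Tuple

# Attitudes a license may take toward a term (labels are illustrative assumptions).
CAN, CANNOT, MUST = "can", "cannot", "must"

License = Dict[str, str]  # term -> attitude


def find_conflicts(project: License, component: License) -> List[Tuple[str, str, str]]:
    """Return (term, project_attitude, component_attitude) for each conflicting term.

    Assumed rule: a project license must not be more permissive than the
    license of a component it integrates.
    """
    conflicts = []
    for term, comp_att in component.items():
        proj_att = project.get(term)
        if proj_att is None:
            continue  # the project license is silent on this term; no claim is made
        if comp_att == CANNOT and proj_att in (CAN, MUST):
            # The component forbids something the project grants or requires.
            conflicts.append((term, proj_att, comp_att))
        elif comp_att == MUST and proj_att in (CAN, CANNOT):
            # The component imposes an obligation the project does not impose.
            conflicts.append((term, proj_att, comp_att))
    return conflicts


# Hypothetical toy licenses, not transcriptions of any real license text.
project_license = {"distribute": CAN, "sublicense": CAN, "disclose_source": CAN}
component_license = {"distribute": CAN, "sublicense": CANNOT, "disclose_source": MUST}

conflicts = find_conflicts(project_license, component_license)
for term, proj_att, comp_att in conflicts:
    print(f"{term}: project says {proj_att!r}, component says {comp_att!r}")
```

In this sketch the hard part described in the abstract, namely extracting terms and inferring attitudes from free-form license text, is assumed to have happened already; only the final comparison step is shown.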


Information & Contributors

Information

Published In

ACM Transactions on Software Engineering and Methodology, Volume 32, Issue 1
January 2023
954 pages
ISSN: 1049-331X
EISSN: 1557-7392
DOI: 10.1145/3572890
Editor: Mauro Pezzè

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 February 2023
Online AM: 19 May 2022
Accepted: 14 February 2022
Revised: 15 December 2021
Received: 13 August 2021
Published in TOSEM Volume 32, Issue 1

Author Tags

  1. Open source software
  2. license
  3. incompatibility detection

Qualifiers

  • Research-article
  • Refereed

Funding Sources

  • National Natural Science Foundation of China
  • National Key Research Project of China


Cited By

  • (2024) The Software Genome Project: Unraveling Software Through Genetic Principles. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, 2319-2323. DOI: 10.1145/3691620.3695307. Online publication date: 27-Oct-2024
  • (2024) Catch the Butterfly: Peeking into the Terms and Conflicts Among SPDX Licenses. 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 477-488. DOI: 10.1109/SANER60148.2024.00056. Online publication date: 12-Mar-2024
  • (2024) LiScopeLens: An Open-Source License Incompatibility Analysis Tool Based on Scope Representation of License Terms. 2024 IEEE 35th International Symposium on Software Reliability Engineering (ISSRE), 13-24. DOI: 10.1109/ISSRE62328.2024.00013. Online publication date: 28-Oct-2024
  • (2023) Understanding and Remediating Open-Source License Incompatibilities in the PyPI Ecosystem. Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering, 178-190. DOI: 10.1109/ASE56229.2023.00175. Online publication date: 11-Nov-2023
  • (2023) The software heritage license dataset (2022 edition). Empirical Software Engineering 28:6. DOI: 10.1007/s10664-023-10377-w. Online publication date: 8-Nov-2023
