DOI: 10.1145/1858996.1859088
research-article

A sentence-matching method for automatic license identification of source code files

Published: 20 September 2010

Abstract

The reuse of free and open source software (FOSS) components is becoming more prevalent. One of the major challenges in finding the right component is finding one whose license is adequate for its intended use. The license of a FOSS component is determined by the licenses of its source code files. In this paper, we describe the challenges of identifying the license under which source code is made available, and propose a sentence-based matching algorithm to do so automatically. We demonstrate the feasibility of our approach by implementing a tool named Ninka. We performed an evaluation that shows that Ninka outperforms other methods of license identification in precision and speed. We also performed an empirical study on 0.8 million source code files of Debian that highlights interesting facts about the manner in which licenses are used by FOSS.
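
The abstract gives no implementation details, so the following is only a minimal sketch of what sentence-based license matching could look like, not Ninka's actual code. The names `LICENSE_RULES`, `normalize`, `split_sentences`, and `identify_licenses` are hypothetical, and the two-entry rule table is an illustrative assumption. The idea is to strip comment markers, break the header into sentences, normalize each sentence, and report a license when every sentence pattern in its rule is found.

```python
import re

# Hypothetical rule table (an assumption for illustration): each license name
# maps to patterns that must each match some normalized sentence in the header.
LICENSE_RULES = {
    "GPL-2.0-or-later": [
        re.compile(
            r"^this program is free software you can redistribute it and or "
            r"modify it under the terms of the gnu general public license as "
            r"published by the free software foundation either version 2 of "
            r"the license or at your option any later version$"
        ),
    ],
    "MIT": [
        re.compile(
            r"^permission is hereby granted free of charge to any person "
            r"obtaining a copy of this software"
        ),
    ],
}

# Leading comment markers such as "/*", "*/", "*", "//" or "#".
_MARKER = re.compile(r"^\s*(?:/\*+|\*+/|\*|//+|#+)")


def normalize(sentence: str) -> str:
    """Lowercase and keep only alphanumeric words, so layout does not matter."""
    return " ".join(re.findall(r"[a-z0-9]+", sentence.lower()))


def split_sentences(comment_text: str) -> list[str]:
    """Return normalized sentences; blank comment lines end a paragraph."""
    paragraphs, current = [], []
    for raw in comment_text.splitlines():
        line = _MARKER.sub("", raw).strip()
        if line:
            current.append(line)
        elif current:
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))

    sentences = []
    for paragraph in paragraphs:
        # Naive split: sentence-ending punctuation followed by whitespace.
        for part in re.split(r"(?<=[.!?])\s+", paragraph):
            norm = normalize(part)
            if norm:
                sentences.append(norm)
    return sentences


def identify_licenses(comment_text: str) -> list[str]:
    """Names of all rules whose patterns each match at least one sentence."""
    sentences = split_sentences(comment_text)
    return [
        name
        for name, patterns in LICENSE_RULES.items()
        if all(any(p.search(s) for s in sentences) for p in patterns)
    ]
```

Calling `identify_licenses` on a file header returns the list of matching rule names, or an empty list when no rule matches. Matching whole normalized sentences rather than raw text makes the check insensitive to line wrapping, comment style, and punctuation, which is the kind of variation a sentence-level approach is meant to absorb.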

    Published In

    ASE '10: Proceedings of the 25th IEEE/ACM International Conference on Automated Software Engineering
    September 2010
    534 pages
    ISBN:9781450301169
    DOI:10.1145/1858996
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    In-Cooperation

    • IEEE CS

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. automated license identification
    2. open source licenses
    3. software licenses

    Qualifiers

    • Research-article

    Conference

    ASE10

    Acceptance Rates

    Overall Acceptance Rate 82 of 337 submissions, 24%

    Cited By

    • (2024) An Exploratory Investigation into Code License Infringements in Large Language Model Training Datasets. Proceedings of the 2024 IEEE/ACM First International Conference on AI Foundation Models and Software Engineering, pages 74-85. DOI: 10.1145/3650105.3652298. Online publication date: 14-Apr-2024
    • (2024) ModelGo: A Practical Tool for Machine Learning License Analysis. Proceedings of the ACM Web Conference 2024, pages 1158-1169. DOI: 10.1145/3589334.3645520. Online publication date: 13-May-2024
    • (2024) LiScopeLens: An Open-Source License Incompatibility Analysis Tool Based on Scope Representation of License Terms. 2024 IEEE 35th International Symposium on Software Reliability Engineering (ISSRE), pages 13-24. DOI: 10.1109/ISSRE62328.2024.00013. Online publication date: 28-Oct-2024
    • (2023) Automating License Rule Generation to Help Maintain Rule-based OSS License Identification Tools. Journal of Information Processing, 31, pages 2-12. DOI: 10.2197/ipsjjip.31.2. Online publication date: 2023
    • (2023) Towards Automated Detection of Unethical Behavior in Open-Source Software Projects. Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pages 644-656. DOI: 10.1145/3611643.3616314. Online publication date: 30-Nov-2023
    • (2023) LiResolver: License Incompatibility Resolution for Open Source Software. Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, pages 652-663. DOI: 10.1145/3597926.3598085. Online publication date: 12-Jul-2023
    • (2023) LiDetector: License Incompatibility Detection for Open Source Software. ACM Transactions on Software Engineering and Methodology, 32(1), pages 1-28. DOI: 10.1145/3518994. Online publication date: 13-Feb-2023
    • (2023) An Empirical Study of License Conflict in Free and Open Source Software. Proceedings of the 45th International Conference on Software Engineering: Software Engineering in Practice, pages 495-505. DOI: 10.1109/ICSE-SEIP58684.2023.00050. Online publication date: 17-May-2023
    • (2023) FOSSLT: An Efficient Model for Automatic Finding Open Source Software License Texts. 2023 2nd International Conference on Cloud Computing, Big Data Application and Software Engineering (CBASE), pages 269-275. DOI: 10.1109/CBASE60015.2023.10439089. Online publication date: 3-Nov-2023
    • (2023) Understanding and Remediating Open-Source License Incompatibilities in the PyPI Ecosystem. Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering, pages 178-190. DOI: 10.1109/ASE56229.2023.00175. Online publication date: 11-Nov-2023
