feat: improve dpkg cataloger license recognition for "license agreements" · Pull Request #3888 · anchore/syft



Merged: 5 commits merged into main on May 14, 2025


@spiffcs (Contributor) commented May 14, 2025

Description

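The dpkg cataloger can miss licenses when parsing a copyright file from a reader, because it only recognizes two patterns: License: <VALUE> fields and /usr/share/common-licenses/<NAME> references.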

```go
import (
	"bufio"
	"io"
	"regexp"
	"sort"

	"github.com/scylladb/go-set/strset"
)

// For more information see: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/#license-syntax
var (
	licensePattern           = regexp.MustCompile(`^License: (?P<license>\S*)`)
	commonLicensePathPattern = regexp.MustCompile(`/usr/share/common-licenses/(?P<license>[0-9A-Za-z_.\-]+)`)
)

// parseLicensesFromCopyright returns the de-duplicated, sorted license values found in a copyright file.
// findLicenseClause is defined elsewhere in the cataloger package (not shown).
func parseLicensesFromCopyright(reader io.Reader) []string {
	findings := strset.New()
	scanner := bufio.NewScanner(reader)
	for scanner.Scan() {
		line := scanner.Text()
		// "License: <VALUE>" fields
		if value := findLicenseClause(licensePattern, "license", line); value != "" {
			findings.Add(value)
		}
		// references such as /usr/share/common-licenses/GPL-2
		if value := findLicenseClause(commonLicensePathPattern, "license", line); value != "" {
			findings.Add(value)
		}
	}
	results := findings.List()
	sort.Strings(results)
	return results
}
```

The code above follows this specification: it reads License: <VALUE> fields, plus any /usr/share/common-licenses/<NAME> references, from the copyright file.
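To see which lines the current parser picks up, here is a small sketch that runs the two patterns quoted above against a few sample lines. It is a simplified stand-in, not the cataloger's code: licensesInLine is a hypothetical helper that reads the named license group directly instead of calling findLicenseClause (whose body is not shown here), and the sample lines are made up.

```go
package main

import (
	"fmt"
	"regexp"
)

// The two patterns from the snippet above, copied verbatim.
var (
	licensePattern           = regexp.MustCompile(`^License: (?P<license>\S*)`)
	commonLicensePathPattern = regexp.MustCompile(`/usr/share/common-licenses/(?P<license>[0-9A-Za-z_.\-]+)`)
)

// licensesInLine is a hypothetical, simplified stand-in for calling
// findLicenseClause with both patterns: it returns the "license" capture
// group of every pattern that matches line.
func licensesInLine(line string) []string {
	var found []string
	for _, p := range []*regexp.Regexp{licensePattern, commonLicensePathPattern} {
		if m := p.FindStringSubmatch(line); m != nil {
			found = append(found, m[p.SubexpIndex("license")])
		}
	}
	return found
}

func main() {
	for _, line := range []string{
		"License: GPL-2+",
		"The full text is in /usr/share/common-licenses/Apache-2.0",
		"End User License Agreement",
		"--------------------------",
	} {
		fmt.Printf("%q -> %q\n", line, licensesInLine(line))
	}
}
```

Only the first two sample lines yield a value; the "End User License Agreement" heading and its underline produce nothing, which is the gap this PR addresses.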

What this PR Solves

Some popular licenses do not include a License: field and instead open with a heading such as:

End User License Agreement
--------------------------

This PR handles that heading case by building an exception list, so that these multi-line license agreements are surfaced as license strings and returned from the copyright reader.
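A minimal sketch of how such a heading might be recognized, assuming a heuristic that the PR does not spell out: a line mentioning "license agreement" (case-insensitive) immediately followed by a dashed underline. agreementHeadings and isUnderline are hypothetical names, and the PR's actual exception list may match differently.

```go
package main

import (
	"fmt"
	"strings"
)

// isUnderline reports whether line (ignoring surrounding whitespace) is a
// non-empty run of '-' characters, like the line under the heading above.
func isUnderline(line string) bool {
	line = strings.TrimSpace(line)
	return line != "" && strings.Trim(line, "-") == ""
}

// agreementHeadings returns each line that mentions "license agreement"
// (case-insensitive) and is directly followed by a dashed underline.
// Hypothetical heuristic for illustration only.
func agreementHeadings(text string) []string {
	lines := strings.Split(text, "\n")
	var headings []string
	for i := 0; i+1 < len(lines); i++ {
		heading := strings.TrimSpace(lines[i])
		if strings.Contains(strings.ToLower(heading), "license agreement") && isUnderline(lines[i+1]) {
			headings = append(headings, heading)
		}
	}
	return headings
}

func main() {
	doc := "End User License Agreement\n--------------------------\n\nBy using this software you agree to these terms.\n"
	fmt.Println(agreementHeadings(doc))
}
```

Requiring the underline keeps ordinary sentences that happen to mention a license agreement from being reported as license names.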

I followed the steps in the linked issue to confirm that this PR detects the NVIDIA license agreements. The snippet below shows two newly detected licenses along with their associated packages.

```sh
go run cmd/syft/main.go -qo json nvidia/cuda:12.5.1-cudnn-runtime-ubuntu20.04 | grant list -o json | jq -r '
  [.results[] |
    {
      label: (if .license.license_id != "" then .license.license_id else .license.name end),
      packages: (.packages | map(.name) | join(", "))
    }
  ]
  | sort_by(.label)
  | .[]
  | "\(.label): \(.packages)"
' | rg NVIDIA
```

```
LICENSE AGREEMENT FOR NVIDIA SOFTWARE DEVELOPMENT KITS: libcudnn9-cuda-12
NVIDIA Software License Agreement and CUDA Supplement to Software License Agreement: cuda-cudart-12-5, cuda-nvrtc-12-5, cuda-nvtx-12-5, cuda-toolkit-12-5-config-common, cuda-toolkit-12-config-common, cuda-toolkit-config-common, libcublas-12-5, libcufft-12-5, libcurand-12-5, libcusolver-12-5, libcusparse-12-5, libnpp-12-5, libnvfatbin-12-5, libnvjitlink-12-5, libnvjpeg-12-5
```
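For readers less familiar with jq, the filter above can be approximated in Go. The struct fields mirror only what the filter reads (.results[].license.license_id, .license.name, .packages[].name); they are inferred from the filter rather than from grant's documented output schema, and labelLines plus the sample input are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// grantReport mirrors only the fields the jq filter reads (inferred, not documented).
type grantReport struct {
	Results []struct {
		License struct {
			LicenseID string `json:"license_id"`
			Name      string `json:"name"`
		} `json:"license"`
		Packages []struct {
			Name string `json:"name"`
		} `json:"packages"`
	} `json:"results"`
}

// labelLines approximates the jq pipeline plus `rg NVIDIA`: label each result
// by license ID (falling back to the license name), join package names,
// sort by label, and keep only lines containing "NVIDIA".
func labelLines(data []byte) ([]string, error) {
	var report grantReport
	if err := json.Unmarshal(data, &report); err != nil {
		return nil, err
	}
	type row struct{ label, pkgs string }
	var rows []row
	for _, r := range report.Results {
		label := r.License.LicenseID
		if label == "" {
			label = r.License.Name
		}
		names := make([]string, 0, len(r.Packages))
		for _, p := range r.Packages {
			names = append(names, p.Name)
		}
		rows = append(rows, row{label, strings.Join(names, ", ")})
	}
	sort.SliceStable(rows, func(i, j int) bool { return rows[i].label < rows[j].label })
	var out []string
	for _, r := range rows {
		if line := fmt.Sprintf("%s: %s", r.label, r.pkgs); strings.Contains(line, "NVIDIA") {
			out = append(out, line)
		}
	}
	return out, nil
}

func main() {
	// Hypothetical sample input.
	sample := []byte(`{"results":[
		{"license":{"license_id":"MIT","name":"MIT"},"packages":[{"name":"libfoo"}]},
		{"license":{"license_id":"","name":"NVIDIA Example Agreement"},"packages":[{"name":"pkg-a"},{"name":"pkg-b"}]}]}`)
	lines, err := labelLines(sample)
	if err != nil {
		panic(err)
	}
	for _, l := range lines {
		fmt.Println(l)
	}
}
```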

Alternative License Discovery Methods Added by This PR

This PR also attempts to categorize licenses that do NOT have a License: field by passing their contents through the new scanner from #3876.

This creates a license whose value is derived from the SHA-256 digest of the contents. If the user sets the environment variable SYFT_LICENSE_CONTENT=unknown, the SBOM includes the full contents of these unknown licenses.
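The value format used for these licenses, LicenseRef-sha256: followed by 64 hex characters, can be reproduced with Go's standard library. This is a sketch under the assumption that the digest is taken over the raw license text; exactly which bytes syft hashes (for example, after any whitespace normalization) is not stated in this PR, and licenseRefForContents is a hypothetical name.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// licenseRefForContents derives a stable identifier for license text with no
// recognizable license ID, in the "LicenseRef-sha256:<hex>" shape seen in the
// SBOM output. Hashing the raw text as-is is an assumption.
func licenseRefForContents(contents string) string {
	sum := sha256.Sum256([]byte(contents))
	return "LicenseRef-sha256:" + hex.EncodeToString(sum[:])
}

func main() {
	fmt.Println(licenseRefForContents("Permission to use, copy, modify, and distribute this software."))
}
```

Because the identifier depends only on the text, identical unknown licenses shared by several packages collapse to the same value.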

Here is an example, with SYFT_LICENSE_CONTENT=unknown set, of a license that was not discovered previously and is now found by this alternative method:

```json
    {
      "id": "78752887a93235eb",
      "name": "libcom-err2",
      "version": "1.45.5-2ubuntu1.1",
      "type": "deb",
      "foundBy": "dpkg-db-cataloger",
      "locations": [...],
      "licenses": [
        {
          "value": "LicenseRef-sha256:9e3a4384b6d8d2358d44103f62bcd948328b3f8a63a1a6baa66abeb43302d581",
          "fullText": "",
          "spdxExpression": "",
          "type": "concluded",
          "urls": [],
          "locations": [...],
          "contents": "This is the Debian GNU/Linux prepackaged version of the Common Error\nDescription library. It is currently distributed together with the EXT2 file\nsystem utilities, which are otherwise packaged as \"e2fsprogs\".\n\nThis package was put together by Yann Dirson <dirson@debian.org>,\nfrom sources obtained from a mirror of:\n tsx-11.mit.edu:/pub/linux/packages/ext2fs/\n\nFrom the original distribution:\n\nCopyright 1987, 1988 by the Student Information Processing Board\n\tof the Massachusetts Institute of Technology\n\nPermission to use, copy, modify, and distribute this software\nand its documentation for any purpose and without fee is\nhereby granted, provided that the above copyright notice\nappear in all copies and that both that copyright notice and\nthis permission notice appear in supporting documentation,\nand that the names of M.I.T. and the M.I.T. S.I.P.B. not be\nused in advertising or publicity pertaining to distribution\nof the software without specific, written prior permission.\nM.I.T. and the M.I.T. S.I.P.B. make no representations about\nthe suitability of this software for any purpose.  It is\nprovided \"as is\" without express or implied warranty.\n"
        }
      ]
    }
```

Packages with no licenses for the given test image and why

Names and versions of packages that still have no licenses in the final SBOM, after parseLicensesFromCopyright found no values and the license classifier was run against their contents:

cuda-compat-12-5 555.42.06-1
cuda-libraries-12-5 12.5.1-1

Some of the CUDA packages have no copyright file under /usr/share/doc/<PKG>/, so neither of the methods above can be applied: there are no contents to pass to the classifier and no reader from which to extract License: fields.

Command used to list all packages that still have no licenses after this change:

```sh
go run cmd/syft/main.go -o json nvidia/cuda:12.5.1-cudnn-runtime-ubuntu20.04 | jq -r '
  .artifacts[]
  | select(.licenses == [] )
  | "\(.name) \(.version)"
'
```
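The same filter can be expressed in Go when post-processing syft JSON output programmatically. This sketch relies only on the artifacts[].name, version, and licenses fields that the jq filter uses; unlicensedPackages and the sample document are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// unlicensedPackages mirrors the jq filter: it returns "name version" for every
// artifact in a syft JSON document whose licenses list is empty. Unlike jq's
// `.licenses == []`, it also treats a missing or null licenses field as empty.
func unlicensedPackages(sbom []byte) ([]string, error) {
	var doc struct {
		Artifacts []struct {
			Name     string            `json:"name"`
			Version  string            `json:"version"`
			Licenses []json.RawMessage `json:"licenses"`
		} `json:"artifacts"`
	}
	if err := json.Unmarshal(sbom, &doc); err != nil {
		return nil, err
	}
	var out []string
	for _, a := range doc.Artifacts {
		if len(a.Licenses) == 0 {
			out = append(out, a.Name+" "+a.Version)
		}
	}
	return out, nil
}

func main() {
	// Hypothetical sample; in practice this would be syft's JSON output.
	sample := []byte(`{"artifacts":[
		{"name":"cuda-compat-12-5","version":"555.42.06-1","licenses":[]},
		{"name":"libcom-err2","version":"1.45.5-2ubuntu1.1","licenses":[{"value":"LicenseRef-example"}]}]}`)
	pkgs, err := unlicensedPackages(sample)
	if err != nil {
		panic(err)
	}
	for _, p := range pkgs {
		fmt.Println(p)
	}
}
```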

Type of change

  • New feature (non-breaking change which adds functionality)

Checklist:

  • I have added unit tests that cover changed behavior
  • I have tested my code in common scenarios and confirmed there are no regressions
  • I have added comments to my code, particularly in hard-to-understand sections

Signed-off-by: Christopher Phillips <32073428+spiffcs@users.noreply.github.com>
@spiffcs changed the title from feat: improve dpkg license recognition for "license agreements" to feat: improve dpkg cataloger license recognition for "license agreements" May 14, 2025
@spiffcs marked this pull request as ready for review May 14, 2025 00:39
spiffcs added 4 commits May 13, 2025 21:52
@spiffcs merged commit e5d7760 into main May 14, 2025
13 checks passed
@spiffcs deleted the 3090-dpkg-license-improvment-for-license-agreement branch May 14, 2025 12:41
Development

Successfully merging this pull request may close these issues.

feat: dpkg license improvement for non SPDX licenses
2 participants