improve robustness of linking to license on hosting website · Issue #73 · google/go-licenses · GitHub

improve robustness of linking to license on hosting website #73

Open
Bobgy opened this issue Jun 25, 2021 · 8 comments · May be fixed by #110 or #328

Comments

Collaborator
Bobgy commented Jun 25, 2021

In v2, I implemented some utilities that resolve the GitHub repo from the go-import meta tag (served at ?go-get=1) and use it to generate public, versioned links to detected licenses on the hosting website (for now, only GitHub).

I noticed some harder problems:

  1. distinguishing "major branch" and "major subdirectory" conventions

There is one problem: for a major version greater than 1, the templates for “major branch” and “major subdirectory” conventions differ. (See https://research.swtch.com/vgo-module for a discussion of these conventions.) To determine the right template, make a HEAD request for the go.mod file using each template, and select the one that succeeds. For example, for module github.com/a/b/v2 at version v2.3.4, probe both github.com/a/b/blob/v2.3.4/go.mod (the location of the go.mod file using the “major branch” convention) and github.com/a/b/blob/v2.3.4/v2/go.mod (its location using “major subdirectory”).

  2. support modules not at the root of a repo, for example https://github.com/Azure/go-autorest/tree/autorest/v0.9.0. Note that the tags also differ: a tag "autorest/v0.9.0" means version v0.9.0 of the module ROOT/autorest. https://github.com/googleapis/google-cloud-go/tree/master/storage is another example; its tags have a "storage/" prefix.
  3. support other source hosting websites
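The go.mod probe described under problem 1 could be sketched roughly as follows. This is only an illustration, not the pkgsite or go-licenses implementation: the names candidateGoModURLs, detectConvention, prober, and httpHead are hypothetical, and the URL layout assumes a GitHub-style /blob/<tag>/ path. The subdirectory location is probed first, on the assumption that a repo using the "major subdirectory" convention may also carry a go.mod for v0/v1 at its root.

```go
package main

import (
	"fmt"
	"net/http"
)

// prober reports whether a file exists at the given URL.
// Hypothetical type, introduced so the logic can be exercised without a network.
type prober func(url string) bool

// httpHead is one possible prober: a HEAD request that succeeds on any 2xx status.
func httpHead(url string) bool {
	resp, err := http.Head(url)
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode >= 200 && resp.StatusCode < 300
}

// candidateGoModURLs returns the go.mod location under each convention,
// e.g. repo "github.com/a/b", majorSuffix "v2", version "v2.3.4".
func candidateGoModURLs(repo, majorSuffix, version string) (majorBranch, majorSubdir string) {
	base := "https://" + repo + "/blob/" + version
	return base + "/go.mod", base + "/" + majorSuffix + "/go.mod"
}

// detectConvention probes the subdirectory location first, then the root.
func detectConvention(repo, majorSuffix, version string, probe prober) string {
	branch, subdir := candidateGoModURLs(repo, majorSuffix, version)
	switch {
	case probe(subdir):
		return "major subdirectory"
	case probe(branch):
		return "major branch"
	default:
		return "unknown"
	}
}

func main() {
	// Stub prober standing in for real HTTP: pretend only the v2/ go.mod exists.
	exists := map[string]bool{"https://github.com/a/b/blob/v2.3.4/v2/go.mod": true}
	probe := func(url string) bool { return exists[url] }
	fmt.Println(detectConvention("github.com/a/b", "v2", "v2.3.4", probe))
}
```

Passing httpHead instead of the stub would issue real HEAD requests; whether a given hosting site answers HEAD on such URLs is something to verify per host.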

Potential Solution

@wlynch pointed out the following references: pkgsite has an internal source package that does exactly what we need. It can figure out the hosting website of a Go import path and produce a public link to the source code. However, the package is internal, so we cannot import it directly.

I'll ask whether they are ready to make it public; otherwise I'll have to vendor it in some way.

EDIT: the reply is that we need to vendor it: golang/go#40477 (comment).

References

Bobgy changed the title from '[v2] properly handle both "major branch" and "major subdirectory" conventions' to '[v2] improve robustness of linking to license on hosting website' on Jun 25, 2021
Collaborator Author
Bobgy commented Jan 5, 2022

I noticed that problems 2 and 3 are mostly solved by the pkgsite/source package.
However, problem 1 (distinguishing the "major branch" and "major subdirectory" conventions) may still cause incorrect remote URLs.

We still need to keep this issue open.

Collaborator Author
Bobgy commented Jan 23, 2022

Here is a failing example for problem 2, "support modules not at root":

$ go-licenses csv cloud.google.com/go/storage
...
cloud.google.com/go/storage, https://github.com/googleapis/google-cloud-go/blob/storage/v1.10.0/storage/LICENSE, Apache-2.0
...

Note that the URL https://github.com/googleapis/google-cloud-go/blob/storage/v1.10.0/storage/LICENSE is broken; the correct URL is https://github.com/googleapis/google-cloud-go/blob/storage/v1.10.0/LICENSE. The problem arises because:

  • for a module in a subdirectory of a repo, when Go caches the module files and finds that the submodule has no LICENSE file, it "magically" copies the LICENSE file from the repo root into the submodule, e.g. https://github.com/googleapis/google-cloud-go/tree/storage/v1.10.0/storage
  • therefore, go-licenses finds a LICENSE file at the root of the submodule and guesses that its remote URL is also at the root of the submodule, while the actual LICENSE file is at the root of the repo

Note that adopting pkgsite/source allowed us to get the correct tag storage/v1.10.0 for this repo, but we still hit this LICENSE file path problem.
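To make the path mismatch concrete, here is a small sketch that builds both URLs for the storage example. The helper names fileURL and licenseURLs are hypothetical and the GitHub /blob/<tag>/<path> layout is assumed; this is not how go-licenses builds its links internally.

```go
package main

import (
	"fmt"
	"path"
)

// fileURL builds a GitHub-style blob URL from a repo URL, a tag, and a
// repo-relative file path. Hypothetical helper for illustration only.
func fileURL(repoURL, tag, relPath string) string {
	return repoURL + "/blob/" + tag + "/" + relPath
}

// licenseURLs returns the URL guessed from the module cache layout
// (LICENSE inside the module directory) and the URL of the LICENSE at
// the repo root, which is where a copied file actually came from.
func licenseURLs(repoURL, moduleDir, version string) (guessed, atRepoRoot string) {
	tag := moduleDir + "/" + version // submodule tags carry the directory prefix
	guessed = fileURL(repoURL, tag, path.Join(moduleDir, "LICENSE"))
	atRepoRoot = fileURL(repoURL, tag, "LICENSE")
	return guessed, atRepoRoot
}

func main() {
	guessed, atRepoRoot := licenseURLs("https://github.com/googleapis/google-cloud-go", "storage", "v1.10.0")
	fmt.Println("guessed:     ", guessed)
	fmt.Println("at repo root:", atRepoRoot)
}
```

The tag is right in both cases; only the file path differs, which matches the observation that fixing the tag alone does not fix the link.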

Collaborator Author
Bobgy commented Jan 23, 2022

Examples for problem 1: distinguishing "major branch" and "major subdirectory" conventions

Major branch (result is correct)

Major branch: a new major version is released on its own branch; the source code is at the root of the repo.
gopkg.in/yaml.v2
License: https://github.com/go-yaml/yaml/blob/v2.4.0/LICENSE

Major subdirectory (incorrect)

Major subdir: a new major version is released in a subdirectory on the same branch as v1; the source code for v2 is in the subdirectory ./v2/
github.com/googleapis/gax-go/v2
License: got https://github.com/googleapis/gax-go/blob/v2.1.1/v2/LICENSE, but should be https://github.com/googleapis/gax-go/blob/v2.1.1/LICENSE

Therefore, the root cause of this failing example is in fact the same as in #73 (comment): the guessed URL is incorrect for a module that is not at the root of its repo.

Collaborator Author
Bobgy commented Jan 24, 2022

Added a v2 proposal roadmap item: validate the license URL by fetching it. That way we can detect these failures and either turn the URL into unknown or try other locations, and finally verify that the remote file content is exactly the same as the local file. With these workarounds, we can mitigate the problem of users unknowingly getting an invalid URL.

Collaborator Author
Bobgy commented Feb 3, 2022

Furthermore, we can solve all of the broken cases above by:

  1. Infer the remote license URL as usual
  2. Fetch the raw license file from the remote and validate that it is the same as the locally found license file
  3. If step 2 fails, try and validate the LICENSE at the repo root
  4. If everything fails, return UNKNOWN

Bobgy changed the title from '[v2] improve robustness of linking to license on hosting website' to 'improve robustness of linking to license on hosting website' on Apr 11, 2022
dschmidt commented Sep 6, 2022

Could you export a (versioned) URL to the root of the repo as well?
Possibly a breaking change to add it to the CSV, but it could be added to the data available to templates.

I'm creating a licenses page in my web app and would like to link the package name to the respective github (or wherever) page.

Collaborator Author
Bobgy commented Sep 6, 2022

Possibly a breaking change to add it to the CSV

The CSV format is fixed; I would not modify it.

but it could be added to the data available to templates.

A PR is welcome; this isn't too hard.

dschmidt commented Sep 6, 2022

Okies, already started and have it basically working - unfortunately I won't have time to polish/finish it this/next week, but will do when I get to it.

gwatts added a commit to gwatts/go-licenses that referenced this issue May 9, 2025
resolves google#73, resolves google#186

For modules that are not at the root of the repo, Go will make a copy of
the LICENSE file from the repo root into the module's directory when
creating the .zip file (see https://go.dev/ref/mod#vcs-license)

This throws off URL generation, as FileURL will create a link to a repo
URL to the license that doesn't exist as the file was synthesized by Go.

Attempt to detect this case by examining the zip file that Go creates -
As it stands, it appends the LICENSE file to the zip's file directory;
we can generally assume that if we see such an addition, it's a result
of Go copying it there and therefore it's not a real file in the repo,
instead being copied from the root.

If we find that to be the case, generate a link to the original LICENSE
file at the root of the repo instead.

The complex e2e test here exercises this nicely (updated links didn't
work before, now they do).
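
One way to read the detection idea in this commit message is sketched below. It is a guess at the heuristic, not the code from the commit: licenseLooksSynthesized and buildZip are hypothetical names, and the check assumes the copied LICENSE is the last entry in the zip's directory and that entries are named module@version/path, as in Go module zips.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
)

// licenseLooksSynthesized reports whether the module-root LICENSE is the last
// entry in the zip's directory, which this sketch treats as a sign that Go
// appended it from the repo root. Heuristic only: a real LICENSE could also
// happen to be listed last.
func licenseLooksSynthesized(zr *zip.Reader, modPrefix string) bool {
	n := len(zr.File)
	if n == 0 {
		return false
	}
	return zr.File[n-1].Name == modPrefix+"/LICENSE"
}

// buildZip creates an in-memory zip with empty files in the given order,
// so the example does not depend on the module cache.
func buildZip(names ...string) (*zip.Reader, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for _, name := range names {
		if _, err := zw.Create(name); err != nil {
			return nil, err
		}
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return zip.NewReader(bytes.NewReader(buf.Bytes()), int64(buf.Len()))
}

func main() {
	prefix := "cloud.google.com/go/storage@v1.10.0"
	zr, err := buildZip(prefix+"/go.mod", prefix+"/storage.go", prefix+"/LICENSE")
	if err != nil {
		panic(err)
	}
	fmt.Println(licenseLooksSynthesized(zr, prefix))
}
```

When the check fires, the link would be generated against the LICENSE at the repo root rather than inside the module directory.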
gwatts linked a pull request May 9, 2025 that will close this issue