[AG-34] Search GitHub for package when license is not available and GitHub repository is not declared #25

Open
felipead opened this issue Jan 11, 2018 · 0 comments

Comments

@felipead
Contributor

felipead commented Jan 11, 2018

Some packages declare neither a license nor a GitHub URL in the package registry.

However, a web search can often find the GitHub repository associated with such a package. In many cases we can be certain the repository belongs to the correct package because at least some of the following facts hold:

  • the GitHub repository name matches the package name.
  • the GitHub username matches the maintainer username declared in the package descriptor.
  • the README contains a link back to the package in the package registry.
  • the repository's programming language on GitHub matches the platform (e.g. JavaScript or HTML for Node.js).

The more of these facts hold, the more reliable the license detection.
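One way to turn these facts into a number is to count how many of them hold for a candidate repository. The sketch below assumes a hypothetical `Candidate` record already populated from a search result, weights every fact equally, and uses an illustrative platform-to-language table; none of these names come from the existing codebase.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative (hypothetical) table of GitHub languages accepted per platform.
PLATFORM_LANGUAGES = {
    "npm": {"JavaScript", "TypeScript", "HTML"},
    "rubygems": {"Ruby"},
}


@dataclass
class Candidate:
    """A GitHub repository found by a search; the fields are assumptions."""
    owner: str
    name: str
    language: Optional[str]
    readme: str


def match_score(package: str, platform: str, maintainers: list[str],
                registry_url: str, repo: Candidate) -> float:
    """Return the fraction of the four facts (0.0 to 1.0) that hold for repo."""
    checks = [
        # The GitHub repository name matches the package name.
        repo.name.lower() == package.lower(),
        # The GitHub username matches a maintainer declared in the descriptor.
        repo.owner.lower() in {m.lower() for m in maintainers},
        # The README links back to the package in the registry.
        registry_url in repo.readme,
        # The repository language matches the platform.
        repo.language in PLATFORM_LANGUAGES.get(platform, set()),
    ]
    return sum(checks) / len(checks)
```

A caller would then accept a candidate only above some threshold; that threshold, like the equal weights, would need tuning against packages whose repositories are already known.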

Once the repository is identified, we can detect its license using the GitHub Licenses API or the technique described in AG-26.
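For the GitHub Licenses API route, GitHub exposes a "get the license for a repository" endpoint at `/repos/{owner}/{repo}/license`, whose response carries a `license` object with an `spdx_id`. The following is a minimal sketch using only the standard library; the helper names are hypothetical, requests are unauthenticated (so rate limits are low), and treating GitHub's `NOASSERTION` value as "not detected" is an assumption about how results should be handled.

```python
import json
import urllib.error
import urllib.request
from typing import Optional

GITHUB_API = "https://api.github.com"


def license_url(owner: str, repo: str) -> str:
    """URL of GitHub's "get the license for a repository" endpoint."""
    return f"{GITHUB_API}/repos/{owner}/{repo}/license"


def spdx_id_from_payload(payload: dict) -> Optional[str]:
    """Extract the SPDX identifier from a license API response, if any."""
    spdx = (payload.get("license") or {}).get("spdx_id")
    # GitHub reports "NOASSERTION" when a license file exists but is not recognized.
    return None if spdx in (None, "NOASSERTION") else spdx


def fetch_spdx_id(owner: str, repo: str) -> Optional[str]:
    """Query the API; returns None when GitHub detects no license."""
    request = urllib.request.Request(
        license_url(owner, repo),
        headers={"Accept": "application/vnd.github+json"},
    )
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return spdx_id_from_payload(json.load(response))
    except urllib.error.HTTPError as err:
        if err.code == 404:  # no license file detected in the repository
            return None
        raise
```

When this returns `None`, the repository can still be handed to the AG-26 technique.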

For example, the "region-flags" Node.js package is declared at http://registry.npmjs.org/region-flags. Its descriptor has neither GitHub URLs nor a license field.

However, the package descriptor does have a "maintainers" section:

    "maintainers": [
      {
        "name": "behnam",
        "email": "npm@behnam.es"
      }
    ]

From this entry we can derive the corresponding GitHub URL by combining the maintainer name with the package name: https://github.com/behnam/region-flags. The license is declared in that repository's COPYING file.
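The derivation for npm packages amounts to pairing each maintainer name with the package name. A minimal sketch, with a hypothetical function name; it only produces candidates, since an npm maintainer name is not guaranteed to be a GitHub username, and each URL still has to be verified against the facts listed earlier.

```python
import json


def npm_candidate_urls(descriptor: dict, package: str) -> list[str]:
    """Guess GitHub URLs by pairing each npm maintainer name with the package name."""
    urls = []
    for maintainer in descriptor.get("maintainers", []):
        name = maintainer.get("name") if isinstance(maintainer, dict) else None
        if name:
            urls.append(f"https://github.com/{name}/{package}")
    return urls


# Example: the maintainers section of the region-flags descriptor.
descriptor = json.loads('{"maintainers": [{"name": "behnam", "email": "npm@behnam.es"}]}')
print(npm_candidate_urls(descriptor, "region-flags"))
# ['https://github.com/behnam/region-flags']
```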

A similar case is the Ruby gem declared at https://rubygems.org/api/v1/gems/active_data.json. This gem also declares neither GitHub URLs nor a license field, but it does have an "authors" field:

    "authors": "pyromaniac",

From this we can build the GitHub URL https://github.com/pyromaniac/active_data. According to GitHub, this repository is licensed under MIT.
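For RubyGems, the "authors" value is a comma-separated string, so only entries that could plausibly be GitHub usernames are worth turning into URLs. The sketch below is hypothetical; its regular expression is an approximation of GitHub's username rules (alphanumerics and single hyphens, no leading or trailing hyphen, at most 39 characters), not an official validator.

```python
import re

# Approximation of GitHub's username rules.
GITHUB_USERNAME = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9]|-(?=[A-Za-z0-9])){0,38}")


def gem_candidate_urls(authors: str, gem: str) -> list[str]:
    """Build GitHub URLs from a RubyGems "authors" string, skipping full names."""
    urls = []
    for author in (a.strip() for a in authors.split(",")):
        # A name containing spaces (e.g. "Logan Koester") cannot be a username.
        if GITHUB_USERNAME.fullmatch(author):
            urls.append(f"https://github.com/{author}/{gem}")
    return urls


print(gem_candidate_urls("pyromaniac", "active_data"))
# ['https://github.com/pyromaniac/active_data']
```

A single-word real name would still slip through as a candidate, which is why the resulting URL must be verified rather than trusted.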

Other cases are less straightforward. For instance, consider the Ruby gem declared at https://rubygems.org/api/v1/gems/guard-rails_best_practices.json.

Its "authors" field does not contain anything that looks like a GitHub username. Instead, it declares a full name:

    "authors": "Logan Koester",

However, searching GitHub for the package name ("guard-rails_best_practices") yields the following match: https://github.com/logankoester/guard-rails_best_practices
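GitHub's repository search API (`/search/repositories`) accepts an `in:name` qualifier, so the search step could be sketched as below. Only URL building and result filtering are shown; sending the request, authentication, and rate limiting are left out, and the function names are hypothetical.

```python
from urllib.parse import urlencode

GITHUB_API = "https://api.github.com"


def repo_search_url(package: str) -> str:
    """Repository search URL restricted to matches in the repository name."""
    return f"{GITHUB_API}/search/repositories?" + urlencode({"q": f"{package} in:name"})


def exact_name_matches(payload: dict, package: str) -> list[dict]:
    """Keep only search results whose repository name equals the package name."""
    wanted = package.lower()
    return [item for item in payload.get("items", [])
            if item.get("name", "").lower() == wanted]
```

The surviving items carry fields such as `owner.login`, `language` and `homepage`, which can feed the remaining checks.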

We can safely assume that logankoester/guard-rails_best_practices is the repository we're looking for because:

  • The repository name is exactly the package name.
  • The programming language is Ruby.
  • At least one of the words "Logan" or "Koester" appears in the GitHub username ("logankoester").
  • Most importantly, the repository's URL field links back to RubyGems (https://rubygems.org/gems/guard-rails_best_practices).
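The two name-based checks in this list could be sketched as follows. The minimum word length of 3 is an assumption meant to keep initials from matching, and `repo_url_field` stands for whatever URL the repository exposes (for example the `homepage` value from the GitHub API), which is also an assumption; both function names are hypothetical.

```python
from typing import Optional


def author_matches_username(author: str, username: str, min_len: int = 3) -> bool:
    """True if any word of the author's full name occurs inside the GitHub username."""
    login = username.lower()
    # min_len skips initials such as "J." that would match almost any username.
    return any(len(word) >= min_len and word.lower() in login
               for word in author.split())


def links_back_to_rubygems(repo_url_field: Optional[str], gem: str) -> bool:
    """True if the repository's URL field points at the gem's RubyGems page."""
    if not repo_url_field:
        return False
    # Ignore the scheme and any trailing slash when comparing.
    address = repo_url_field.rstrip("/").split("://", 1)[-1]
    return address == f"rubygems.org/gems/{gem}"
```

For "Logan Koester" and "logankoester", both words match, so the first check passes.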

A more advanced case is the "globalize-accessors" Ruby gem (https://rubygems.org/api/v1/gems/globalize-accessors.json). Searching GitHub for this name, we find https://github.com/globalize/globalize-accessors. But how can we be sure this repository belongs to the package we're looking for? It belongs to the "globalize" organization, which is not mentioned anywhere in the gem's maintainers or authors fields. However, RubyGems lists the authors as:

    "authors": "Tomasz Stachewicz, Wojciech Pietrzak, Steve Verlinden, Robert Pankowecki, Chris Salzberg",

Querying the repository's contributor list on GitHub (https://github.com/globalize/globalize-accessors/graphs/contributors), we find all of those names among the committers. Thus we can safely conclude this is the repository we're looking for.
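The contributor check can be reduced to measuring what fraction of the gem's authors appear among the repository's contributors. This is a sketch with a hypothetical function name: GitHub's contributors endpoint (`/repos/{owner}/{repo}/contributors`) returns logins rather than full names, so it assumes the display names have already been resolved (for example from each user's profile), and exact, case-insensitive name matching is a simplification.

```python
def author_coverage(authors: str, contributor_names: list[str]) -> float:
    """Fraction of the gem's authors whose name appears among the contributors."""

    def normalize(name: str) -> str:
        # Compare case-insensitively and collapse repeated whitespace.
        return " ".join(name.lower().split())

    wanted = {normalize(a) for a in authors.split(",") if a.strip()}
    found = {normalize(n) for n in contributor_names}
    return len(wanted & found) / len(wanted) if wanted else 0.0
```

For globalize-accessors, every listed author is found, giving a coverage of 1.0; a partial overlap would yield a lower score that could be combined with the other signals.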
