feat: (multiple) attachments for license texts of type "expression" #554

Open
jkowalleck opened this issue Dec 14, 2024 · 12 comments

@jkowalleck
Member

jkowalleck commented Dec 14, 2024

Currently, we allow one text attachment per "named"/"spdx" license,
but we don't allow any text attachments for an SPDX license expression.

Request

allow multiple license text attachments per SPDX license expression

Discussion

why though?

short: not all SPDX licenses are templates.

Not all SPDX licenses are templates; some have qualified "placeholders" that need to be filled in by those applying them.
Therefore, it is important to carry the actual declared license texts of a component, even when using an SPDX license expression (like MIT or GPL-3.0-or-later).
And even for template texts (like Apache-2.0), it might be required to carry license amendment texts (like a NOTICE file for Apache-2.0).

This is why license texts are needed for SPDX expressions.

why multiple texts, why not a single text?

short: expressions might consist of multiple different licenses, each having an own text

expected outcome: the specification

Have an option to carry the text for each SPDX license-id and SPDX license-ref in an SPDX license expression

intended implementation

Use the existing structure of an attachment, but also have a field to tell which SPDX id or ref-name it applies to.
The SPDX id MUST use the existing enum that the CycloneDX spec uses for that matter. -- https://cyclonedx.org/docs/1.6/json/#components_items_licenses_oneOf_i0_items_license_id
The name is free text -- https://cyclonedx.org/docs/1.6/json/#components_items_licenses_oneOf_i0_items_license_name
As with the existing license spec -- EITHER name OR id (XSD <xs:choice> / JSON-schema oneOf - one, not both)
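
A minimal sketch of the either-or rule above, in Python. The function name `check_text_entry` and the tiny `KNOWN_SPDX_IDS` set are hypothetical stand-ins for illustration; a real check would use the full SPDX id enum that CycloneDX ships, and the field names follow the proposal in this ticket, not a released spec.

```python
# Hypothetical stand-in for the SPDX id enum used by CycloneDX; not the real list.
KNOWN_SPDX_IDS = {"MIT", "GPL-3.0-or-later", "Apache-2.0"}


def check_text_entry(entry: dict) -> list[str]:
    """Return a list of problems found in one proposed license-text entry."""
    problems = []
    has_id = "id" in entry
    has_name = "name" in entry
    # Mirror XSD <xs:choice> / JSON-schema oneOf: exactly one of id or name.
    if has_id == has_name:
        problems.append("entry must carry either 'id' or 'name' (one, not both)")
    # An id must come from the enum; a name is free text and is not checked.
    if has_id and entry["id"] not in KNOWN_SPDX_IDS:
        problems.append(f"'{entry['id']}' is not in the SPDX id enum")
    if not entry.get("content"):
        problems.append("entry has no 'content'")
    return problems
```

Under these assumptions, a `LicenseRef-*` value placed in `id` is reported as a problem, while the same value in `name` passes.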

possible results

{
  "expression": "MIT OR GPL-3.0-or-later OR LicenseRef-.amazon.com.-AmznSL-1.0",
  "acknowledgement": "declared",
  "texts": [
    {
      "id": "MIT",
      "content": "Copyright (c) 1984 Example org\n\n\nPermission is hereby granted, free of charge, to any person obtaining a copy of this software\n[...]"
    },
    {
      "id": "GPL-3.0-or-later",
      "content": "Example project\nCopyright (C) 1984 Example org\n\nThis program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.[...]"
    },
    {
      "name": "LicenseRef-.amazon.com.-AmznSL-1.0",
      "content": " Amazon Software License 1.0\\n\\n\\nThis Amazon Software License (\"License\") governs your use, reproduction, and\\n\\ndistribution of the accompanying software as specified below.\\n\\n\\n## 1. Definitions\\n\\n\\n  \"Licensor\" means any person or entity that distributes its Work.\n[...]"
    }
  ]
}
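
A consumer could use such a structure to spot licenses in the expression that have no attached text. The following is a rough Python sketch, assuming the `texts` layout from the example above; `licenses_in_expression` and `missing_texts` are hypothetical helper names, and the tokenizer is deliberately naive (it is not a full SPDX expression parser).

```python
import re

OPERATORS = {"AND", "OR", "WITH"}


def licenses_in_expression(expression: str) -> list[str]:
    """Naively list the license ids and refs named in an SPDX expression."""
    tokens = re.findall(r"[^\s()]+", expression)
    found = []
    skip_next = False
    for token in tokens:
        if skip_next:
            # The token after WITH names an exception, not a license.
            skip_next = False
            continue
        if token.upper() == "WITH":
            skip_next = True
            continue
        if token.upper() in OPERATORS:
            continue
        if token not in found:
            found.append(token)
    return found


def missing_texts(license_entry: dict) -> list[str]:
    """Return the licenses in the expression that have no entry in 'texts'."""
    covered = {t.get("id") or t.get("name") for t in license_entry.get("texts", [])}
    return [
        lic
        for lic in licenses_in_expression(license_entry["expression"])
        if lic not in covered
    ]
```

An attribution report generator could call `missing_texts` on each license entry and flag components whose declared expression is not fully backed by verbatim texts.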

original story:

Hi @jkowalleck ,

My impression is that with v1.5 we have a significant design flaw.

@Joerki , could you give a practical example for something that is not possible with today's design?

When separating between a list and a single expression, I see the following issues:

With an expression, I don't see how to include a license text for a particular item of the expression.
Licenses from the SPDX license list whose text has no placeholders are not a problem.
For a standard license this might be a problem if the license definition has placeholders for e.g. authors or a company in the text. My colleague who deals with legal aspects says that the use of such a "template" is not sufficient for a reference, we need a "verbatim" copy of the license text (which is stored in the public repo of the component). In an attribution report (that we generate from the SBOM) we must have texts that satisfy these legal requirements, so the text must be contained in the SBOM.
This could be a problem with 1.4 already if such a license is referenced in an expression.
SPDX allows creating a custom ID (LicenseRef-*). This is also declared as an "expression", like the compound expression given as an example in the CycloneDX spec. And again: where can I specify the license text that belongs to the non-standard ID?

Example: https://metadata.ftp-master.debian.org/changelogs//main/o/openssl/openssl_3.0.15-1~deb12u1_copyright
Please note that in these files you do not find standardized IDs. Therefore you have both IDs and texts. Texts might appear in a "Files" stanza or a dedicated "License" stanza (which makes sense when licenses appear multiple times). So I don't need a reference to content outside the copyright file.
I convert the IDs to SPDX IDs of standard licenses; so far this has allowed me to end up with a proper SPDX expression. I use the aboutcode.org license list repo and extend it for us.

My conclusion:
With the license list I have the chance to provide (almost) full information when several licenses need to be considered at the same time including license texts (X AND Y).

CycloneDX limits the use of SPDX expressions to cases where the creator has to make a conclusion for a multi-licensed component where he can choose between licenses (X OR Y) that have a known, standardized text that can be taken 1:1 from its original definition.

Originally posted by @Joerki in #349

@jkowalleck
Member Author

jkowalleck commented Dec 14, 2024

I will work on a solution, planned for milestone 1.7.
All discussion and every help is welcome 👋

@jkowalleck jkowalleck self-assigned this Dec 14, 2024
@Joerki

Joerki commented Dec 16, 2024

Hi @jkowalleck,

it's great to see this item for planning in the 1.7 milestone!

To prevent confusion, the example should show "GPL-3.0-or-later" (which is the proper SPDX identifier) instead of "GPL3+" (an ID you find in Debian copyright files).

I don't know what you (CycloneDX creators) had in mind with "SPDX expression" in the CycloneDX spec context. My impression is that, in contrast to the license ID/name list, you wanted to have a counterpart that supports a compound expression, which appears in declared licenses.

The definition of "SPDX expression" by SPDX is broader and covers "simple" and "compound" expressions, e.g. user-defined licenses with "LicenseRef-[idstring]" (also called "license-ref", in contrast to "license-ids", the items in the official license list), see

https://spdx.github.io/spdx-spec/v2.3/SPDX-license-expressions/
https://spdx.github.io/spdx-spec/v3.0.1/annexes/spdx-license-expressions/#overview

To make CycloneDX usable for license compliance (including OSS) I see the need to support SPDX expressions that fully support SPDX's definition.

A further suggestion:

To simplify attribution reports for humans that are generated from a CycloneDX SBOM, I suggest also adding an optional "name" field like we have for the "named" license. The license identifiers we have in the SPDX License List don't require it. They already have a full name. But in case the full "SPDX expression" definition is considered, including "license-refs", a name should be available in the specification to be consistent with the SPDX ID/name list for reports.

For compatibility and readability I suggest sticking with the "license ID/name list" and "SPDX expression" approach, but - as said - without a usage restriction.

BR,
Jörg

@jkowalleck
Member Author

To prevent confusion, the example should show "GPL-3.0-or-later" (which is the proper SPDX identifier) instead of "GPL3+" (an ID you find in Debian copyright files).

Thank you for pointing that out. I've edited my original feature request, fixed the GPL3+ to be the correct term GPL-3.0-or-later

@jkowalleck
Member Author

To simplify attribution reports for humans that are generated from a CycloneDX SBOM, I suggest also adding an optional "name" field like we have for the "named" license. The license identifiers we have in the SPDX License List don't require it. They already have a full name. But in case the full "SPDX expression" definition is considered, including "license-refs", a name should be available in the specification to be consistent with the SPDX ID/name list for reports.

great call! I've edited the original feature request example to reflect this.
please have a review.

@Joerki

Joerki commented Dec 19, 2024

Hi,

my example looks like this:

{
  "expression": "MIT OR GPL-3.0-or-later OR LicenseRef-.amazon.com.-AmznSL-1.0",
  "acknowledgement": "declared",
  "texts": [
    {
      "license-identifier": "MIT",
      "text": {
        "content": "Copyright (c) 1984 Example org\n\n\nPermission is hereby granted, free of charge, to any person obtaining a copy of this software\n[...]"
      }
    },
    {
      "license-identifier": "GPL-3.0-or-later",
      "text": {
        "content": "Example project\nCopyright (C) 1984 Example org\n\nThis program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.[...]"
      }
    },
    {
      "license-identifier": "LicenseRef-.amazon.com.-AmznSL-1.0",
      "name": "Amazon Software License",
      "text": {
        "content": "Amazon Software License 1.0\\n\\n\\nThis Amazon Software License (\"License\") governs your use, reproduction, and\\n\\ndistribution of the accompanying software as specified below.\\n\\n\\n## 1. Definitions\\n\\n\\n  \"Licensor\" means any person or entity that distributes its Work.\n[...]"
      }
    }
  ]
}

Another significant reference:
BSI-TR-03183-2 Version 2.0.0 (10.10.2024)
Federal Office for Information Security (BSI)
https://www.bsi.bund.de/SharedDocs/Downloads/EN/BSI/Publications/TechGuidelines/TR03183/BSI-TR-03183-2-2_0_0.html
Chapter 6.1 (License identifiers and expressions)
They refer to the SPDX annexes about usage, and they also recommend using the ScanCode LicenseDB from AboutCode!

My conclusion:

  • Expressions and IDs (SPDX IDs, custom IDs) should not be mixed up with names and text that give additional context.
  • (Human readable) names give additional help for identification of a license inside the SBOM (to have it in the external DB like Aboutcode is not enough)
  • (human readable) names get more helpful when no reference (LicenseRef-xyz) to another source of information exists (like with SPDX ID official list or Aboutcode DB with LicenseRef-scancode-* entries that come with human readable license names)
  • The BSI gives strict rules about the identification of licenses, which means that the .licenses.license lists with SPDX ID/names are not considered applicable anymore for companies that decide to implement and reference the BSI TR.
  • License information must be understandable for humans, supporting different concepts of license declaration
    • the license declaration from authors and the output of current tools is very often not sufficient for license compliance and requires manual effort for identification and conclusion
    • the transfer of copyright and licensing information we find in different ecosystems into CycloneDX needs to be manageable by software tools and humans in a flexible fashion
      • metadata contained in packages distributed by platforms (npm, pypi, nuget etc.)
      • copyright data distributed by Linux distributions (Debian machine-readable/not m.-r. copyright data)

@jkowalleck
Member Author

jkowalleck commented Dec 19, 2024

re: #554 (comment)

your suggestion

    {
      "id": "LicenseRef-.amazon.com.-AmznSL-1.0",
      "name": "Amazon Software License",
      "content": "..."
    }

The "id" is planned to be the usual CycloneDX enum, see https://cyclonedx.org/docs/1.6/json/#components_items_licenses_oneOf_i0_items_license_id
Therefore, it is not possible to use LicenseRef-* here. This is why my example looked this way.
Changing the spec, so id may be either enum value or an arbitrary string following [%s"DocumentRef-"(idstring)":"]%s"LicenseRef-"(idstring), is not in the scope of this very ticket. Please open an extra ticket for this matter, if needed.

Current spec allows either id or name for licenses. see https://cyclonedx.org/docs/1.6/json/#components_items_licenses_oneOf_i0_items_license - OneOf/Choice -> not both
Changing the spec, so name and id may exist at the same time, is not in the scope of this very ticket. Please open an extra ticket for this matter, if needed.
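
For readers unfamiliar with the license-ref grammar quoted above, here is a small Python sketch of what such an "arbitrary string" check could look like. It assumes the `idstring` rule from the SPDX license expression annex (letters, digits, "-" and "."); the helper name `is_license_ref` is hypothetical and nothing here is part of the CycloneDX spec.

```python
import re

# idstring = 1*(ALPHA / DIGIT / "-" / "."), per the SPDX license expression annex.
# The "DocumentRef-...:" prefix is optional, as in the quoted grammar.
LICENSE_REF = re.compile(r"(?:DocumentRef-[A-Za-z0-9.-]+:)?LicenseRef-[A-Za-z0-9.-]+")


def is_license_ref(value: str) -> bool:
    """Return True if the whole value is a syntactically valid license-ref."""
    return LICENSE_REF.fullmatch(value) is not None
```

With this sketch, `LicenseRef-.amazon.com.-AmznSL-1.0` is accepted as a license-ref, while a plain SPDX id such as `MIT` is not.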

@Joerki

Joerki commented Dec 20, 2024

re: #554 (comment)

your suggestion

{
  "id": "LicenseRef-.amazon.com.-AmznSL-1.0",
  "name": "Amazon Software License",
  "content": "..."
}

The "id" is planned to be the usual CycloneDX enum, see https://cyclonedx.org/docs/1.6/json/#components_items_licenses_oneOf_i0_items_license_id Therefore, it is not possible to use LicenseRef-* here.

Based on SPDX and BSI documentation referenced above, please give a proof of how CycloneDX including this approach fulfills license declaration based on legal license compliance requirements.
I do not see it with the CycloneDX spec and this enhancement.

@jkowalleck
Member Author

Based on SPDX and BSI documentation referenced above, please give a proof of how CycloneDX including this approach fulfills license declaration based on legal license compliance requirements.
I do not see it with the CycloneDX spec and this enhancement.

which aspects do you see as not feasible?

@Joerki

Joerki commented Jan 8, 2025

We are extending the existing CycloneDX specification and should take care that element names do not take on different meanings depending on their context.

We (already) have:

  • "id": An SPDX-License-Identifier of a license that is present in the official SPDX license list, e.g. "0BSD"
  • "name": The name of a license, e.g. "Acme Software License"
  • "expression": An SPDX expression (currently seen as SPDX compound expression)
  • A license identified by "id" and "name" may come with text that identifies the (exact) license text
  • No license texts can be associated to expression items

What we need additionally:

a) An attribute that holds the license texts for items that are used in an expression context
b) An identifier attribute representing an SPDX identifier that is not limited to the official "SPDX license list", but is compliant with the "simple-expression" and "simple-expression ( %s"WITH" / %s"with" ) addition-expression" definitions of the "SPDX license expression" annex
c) The attributes' names shall not overload the meaning of an already existing one in the spec, to prevent confusion
d) Dedicated counterparts to "id" and "name" and "text" are meaningful and should exist to achieve consistency with the existing definition

Point a)
The suggestion is to have a section "texts" with a "content" item representing the text.
We already have a "text" definition containing "contentType", "encoding" and "content".
I expect that the new element needs to be compatible with it.
An item of the list should contain the "text" in the same fashion as at other locations in the spec.
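
To illustrate point a), a small Python sketch that builds one list item reusing the existing "text" shape ("contentType", "encoding", "content"). The helper name `make_text_entry` is hypothetical, and "license-identifier" is the field name proposed in this comment, not something in the released spec. Base64 is chosen here only because it is the encoding value the existing attachment definition supports.

```python
import base64


def make_text_entry(license_identifier: str, license_text: str) -> dict:
    """Build one 'texts' item that reuses the existing attachment shape."""
    encoded = base64.b64encode(license_text.encode("utf-8")).decode("ascii")
    return {
        # Field name as proposed in this thread; an assumption, not in the spec.
        "license-identifier": license_identifier,
        "text": {
            "contentType": "text/plain",
            "encoding": "base64",
            "content": encoded,
        },
    }
```

A consumer would decode `text.content` with `base64.b64decode` to get back the verbatim license text.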

Point b)
I suggest "license-identifier", which is compatible with the current SPDX wording.

Point c)
In the "licenses.license" context, the name is a human-readable label of the license (CycloneDX's example: "Acme Software License"). Mixing this with an item that is ultimately an ID causes inconsistency and therefore carries potential risks for automated processing.

Point d)
To have a "name" makes sense for labelling a license like for existing licenses.license.name item.

@Joerki

Joerki commented Jan 9, 2025

Beyond the text, why shouldn't we support all other attributes licenses.license has, to give a true alternative to licenses.license?

@jkowalleck
Member Author

jkowalleck commented Jan 9, 2025

Beyond the text, why shouldn't we support all other attributes licenses.license has, to give a true alternative to licenses.license?

I am not against such a proposal per se 😄 ,
but please keep the scope of this very ticket: this ticket is about licence text attachments - and nothing more

@jkowalleck jkowalleck changed the title feat: multiple attachments for licenses of type "expression" feat: (multiple) attachments for license texts of type "expression" Jan 9, 2025
@Joerki

Joerki commented Jan 10, 2025

Beyond the text, why shouldn't we support all other attributes licenses.license has, to give a true alternative to licenses.license?

I am not against such a proposal per se 😄 , but please keep the scope of this very ticket: this ticket is about licence text attachments - and nothing more

Yes, but an alternative for "texts" is feasible to be more generic when extending the spec.
