Invoke known tools to gather build-time dependency information #1562

Open
4 tasks
kzantow opened this issue Feb 9, 2023 · 8 comments
Labels: enhancement (New feature or request); external-data (Data for cataloging that does not exist in packaging metadata (--with tools candidate)); planning (high level epic that should be broken into smaller tasks)

Comments

@kzantow
Contributor

kzantow commented Feb 9, 2023

What would you like to be added:
Add the ability to shell-out to known tools such as go and mvn in order to capture more accurate build-time dependency information.

Why is this needed:
To improve the build-time dependency support in Syft.

Additional context:
From a working document:

Creating higher quality SBOMs in Syft at build-time

Today, Syft's build-time static analysis of dependencies is limited. It could be improved by simulating what build systems do, but that approach is subject to drift and needs ongoing maintenance to keep up with changes in build system behavior.

One approach to resolving this issue is to call out to build systems to get that information instead. This introduces additional (optional) dependencies.

Syft as a build-time SBOM generator tool

Syft can be seen as a “build-time” SBOM generator tool, and as such it can start making use of build tooling. The following examples show what calling out to build tools could look like.

Golang

Instead of reading and trying to parse and resolve the go.mod file, the go mod graph command can be used to get a fully resolved dependency tree.

Java

Maven has the mvn dependency:tree command which shows the fully-resolved dependency graph.

NPM

npm has npm ls --all, which prints the full dependency tree.
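To illustrate, npm ls can also emit JSON (npm ls --all --json), where each node carries a version and a nested dependencies object keyed by package name. The sketch below assumes only those fields and flattens the tree into edges; npmNode and NpmEdges are made-up names, and deduplicated nodes (which npm may print without their children) are not treated specially.

```go
package main

import (
	"encoding/json"
	"sort"
)

// npmNode mirrors the subset of the JSON tree this sketch relies on.
type npmNode struct {
	Name         string             `json:"name"`
	Version      string             `json:"version"`
	Dependencies map[string]npmNode `json:"dependencies"`
}

// NpmEdges (hypothetical helper) returns sorted "parent -> child" edges,
// with every package written as name@version.
func NpmEdges(data []byte) ([]string, error) {
	var root npmNode
	if err := json.Unmarshal(data, &root); err != nil {
		return nil, err
	}
	var edges []string
	var walk func(parent string, n npmNode)
	walk = func(parent string, n npmNode) {
		for name, child := range n.Dependencies {
			id := name + "@" + child.Version
			edges = append(edges, parent+" -> "+id)
			walk(id, child)
		}
	}
	walk(root.Name+"@"+root.Version, root)
	sort.Strings(edges) // map iteration order is random, so sort for stable output
	return edges, nil
}
```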

Python

Python has pipdeptree, a third-party tool that prints installed packages as a dependency tree.

Considerations

  • When using external tooling, version and parameter information should be captured
  • Warnings on quality of what is being used to generate it can be made visible, as well as suggestions on how to obtain better SBOMs (e.g. dependency pinning)
@kzantow kzantow added the enhancement New feature or request label Feb 9, 2023
@kzantow
Contributor Author

kzantow commented Feb 9, 2023

cc: @lumjjb

@wagoodman
Contributor

This is most likely needed in order to achieve #1674 and #572 in a meaningful way.

This functionality should be opt-in; that is, by default syft should remain a static analysis tool. Executing other commands on the system should still not be allowed by default (again, unless the user opts in).

Considerations:

  • should these "external querying capabilities" be encapsulated into their own separate catalogers? For example, go-mod-file-cataloger stays as it is today and we allow for a new go-tooling-cataloger. In this way, opting in would mean adding a cataloger (or enabling a flag that automatically swaps out one cataloger for another)... or should we go in the direction of keeping the existing catalogers and having them behave differently based on configuration? (one assumption I'm making by going down this path is that the impl for the go.mod cataloging today is mutually exclusive with using the build tooling)

  • we probably don't want to find duplicate packages by doing both a static analysis and a tooling query; there should be an obvious mechanism for enforcing mutual exclusivity between the existing (static) analysis and the tooling analysis.

  • even if a new cataloger is not used to encapsulate this behavior it should be obvious to the user that this was found via a tooling query vs looking at just the go.mod contents (more than just the application configuration probably).
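The "swap one cataloger for another" option could be sketched roughly as follows. This is not Syft's actual cataloger selection code: SelectCatalogers and its boolean parameter are invented for illustration, and go-tooling-cataloger is only the name proposed above. Because the static cataloger is replaced rather than supplemented, the two can never both run, and the cataloger name itself tells the user how a package was found.

```go
package main

const (
	staticGoCataloger  = "go-mod-file-cataloger" // existing static analysis
	toolingGoCataloger = "go-tooling-cataloger"  // proposed, shells out to go
)

// SelectCatalogers (hypothetical) returns the catalogers to run. With
// allowTooling false the defaults are returned unchanged, so shelling out
// stays strictly opt-in. With it true, each static cataloger that has a
// tooling counterpart is swapped out, which keeps the pair mutually exclusive.
func SelectCatalogers(defaults []string, allowTooling bool) []string {
	swaps := map[string]string{staticGoCataloger: toolingGoCataloger}
	selected := make([]string, 0, len(defaults))
	for _, name := range defaults {
		if replacement, ok := swaps[name]; ok && allowTooling {
			name = replacement
		}
		selected = append(selected, name)
	}
	return selected
}
```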

@setchy

setchy commented Jun 30, 2023

would the same be true for npm, too?

@noqcks
Contributor

noqcks commented Sep 12, 2023

Instead of shelling out to cli tools, would you consider building parsers directly inside syft? It wouldn't require one to depend on the presence of local tooling, and I could envision that the tools might have different outputs depending on the installed version of the tooling.

Snyk, for example, has built a bunch of parsers for various ecosystems in js https://github.com/snyk/dotnet-deps-parser

I just implemented something similar in cdxgen for .NET [ref] and npm [ref] to determine direct/indirect deps in build files. Wondering if the same could work here but written in golang.

Add the ability to shell-out to known tools such as go and mvn in order to capture more accurate build-time dependency information.

Can you expand on what specifically would be more accurate? In my mind I can only imagine direct/indirect deps. But is there more?

I also noticed that syft doesn't generate a dependencies section for CycloneDX for different language specific files (go.mod, package-lock.json). Would the outcome of this issue be that this section would be filled?

@kzantow
Contributor Author

kzantow commented Sep 12, 2023

@noqcks we do already have lots of parsers for different ecosystems. This change, at least initially, would be an opt-in behavior to shell out to the tools. This would allow things like Go - which has a flat list of dependencies in the go.mod - to get the dependency graph and properly output it in different formats.

@noqcks
Contributor

noqcks commented Sep 12, 2023

I suppose what I meant is only shelling out to tools where necessary (in the case that go mod graph is truly the only way to see the dependency tree for go projects), and writing all other dep graph parsers directly into syft where possible.

I'd like to work on getting a real dependency graph for javascript projects inside syft, and wanted to write this dep parser inside syft instead of relying on an external npm cli.

Wanted to clarify whether this would be an appropriate avenue to pursue before I started the work.

@wagoodman
Contributor

Since this is a potentially large item that would affect multiple ecosystems I think a detailed plan is needed to move forward with this (how would this work within a single cataloger, what abstractions do we want to introduce (if any), would abstractions be generalizable to other ecosystem catalogers (if so, how), etc)

@wagoodman wagoodman added the external-data Data for cataloging that does not exist in packaging metadata (--with tools candidate) label Jan 2, 2025
@kzantow
Contributor Author

kzantow commented Jan 2, 2025

This continues to come up in discussion and also some open PRs: for mvn and for dotnet.

At the very least, once we start allowing shell-out behavior, we should probably integrate every option with the --enrich flag. --enrich all should probably enable all known enrichment, including shelling out, but there should be an option of some sort, like --enrich all,-shell, to disable only the shell-out sources independently of the other enrichment options, or --enrich network to enable only network enrichment without any shell-out options.
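A sketch of how such a value could be interpreted, assuming a comma-separated list processed left to right, where all turns on every known source and a leading - turns one off. The source names network and shell are taken from the examples above; ParseEnrich and knownSources are hypothetical and do not describe how the existing --enrich flag is implemented.

```go
package main

import "strings"

// knownSources is an assumed, illustrative list of enrichment sources.
var knownSources = []string{"network", "shell"}

// ParseEnrich (hypothetical) maps each mentioned source to enabled/disabled.
// Later tokens override earlier ones, so "all,-shell" ends with shell off.
func ParseEnrich(value string) map[string]bool {
	enabled := make(map[string]bool)
	for _, token := range strings.Split(value, ",") {
		token = strings.TrimSpace(token)
		switch {
		case token == "":
			continue
		case token == "all":
			for _, source := range knownSources {
				enabled[source] = true
			}
		case strings.HasPrefix(token, "-"):
			enabled[strings.TrimPrefix(token, "-")] = false
		default:
			enabled[token] = true
		}
	}
	return enabled
}
```

With this interpretation, a source that is never mentioned reads as false from the map, which matches the requirement that shelling out stays off unless the user asks for it.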

Projects
Status: Backlog

4 participants