Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent uncontrolled file tree recursion while validating configuration files #419

Merged
merged 4 commits into from
Nov 30, 2023

Conversation

per1234
Copy link
Contributor

@per1234 per1234 commented Nov 23, 2023

The assets and repository infrastructure validates various configuration file against their JSON schema.

The ajv-cli validator is used to perform the validation. ajv-cli supports the use of globs in the data file path argument. "globstars" are used to cause the tool to search the file tree in the repository recursively for data files to validate.

Previously these glob patterns were not carefully chosen, which made it possible for unintended data files to be validated. Searching and validating these files is inefficient at best and the cause of spurious failures at worst. That worst case scenario has now occurred as the unintended validation of a file discovered through uncontrolled recursive searching is failing validation and causing the spurious failure of the markdownlint configuration validation task and workflow:

https://github.com/arduino/tooling-project-assets/actions/runs/6961626319/job/18943637236#step:4:45

node_modules/markdownlint/schema/.markdownlint.yaml invalid

In this case, the file is externally maintained (meaning that the cause of a validation failure can not reasonably be fixed from inside this project) and not even the type of data being targeted (it is a JSON schema file for the markdownlint configuration rather than an actual markdownlint configuration file).

Various strategies are used here to ensure only intended files are validated. See the individual commit messages for details.

…iles

The project infrastructure validates the Dependabot configuration files against their JSON schema. In addition to the
repository's own Dependabot configuration file at the standard location, a reusable standardized "asset" configuration
file for GitHub Actions workflows is hosted in the repository. There is a need to eventually add additional "asset"
files to configure Dependabot for other dependency management frameworks.

Previously, in order to provide validation coverage for all the dependabot.yml files in the repository, a "globstar" was
used to cause the validator to recursively search the entire file tree under the repository. That approach is
problematic because the repository contains externally maintained files (e.g., the npm packages under the node_modules
folder). Searching and validating these files is inefficient at best and the cause of spurious failures at worst.

This is avoided by targeting the search. In order to support the addition of more "asset" Dependabot configuration files
in the future, a globstar is still used, but the recursion is limited to the folder dedicated as a container for those
files, which will never lead to unintended files being validated.
…on files

The project infrastructure validates the markdownlint configuration files against their JSON schema. In addition to the
repository's own markdownlint configuration file at the standard location, a reusable standardized "asset" configuration
file is hosted in the repository.

Previously, in order to provide validation coverage for all the .markdownlint.yml files in the repository, a "globstar"
was used to cause the validator to recursively search the entire file tree under the repository. That approach is
problematic because the repository contains externally maintained files (e.g., the npm packages under the node_modules
folder). Searching and validating these files is inefficient at best and the cause of spurious failures at worst. That
worst case scenario has now occurred as a file from a npm package dependency is being discovered by the validation
system and failing validation.

This is avoided by targeting the search. The chosen approach is to validate only the "asset" file. Even though this
causes the file in the root of the repository to no longer be validated directly, it continues to be indirectly
validated because the file in the root is an exact copy of the "asset" (and the repository infrastructure enforces the
sync).
The "Check npm" workflow checks for problems with the npm configuration files of a repository. Previously this workflow
assumed that a repository would only ever have these files in the root folder. However, a repository might also contain
multiple separate npm-managed projects in arbitrary subfolders.

Support for any repository structure is added to the workflow by using a job matrix to check the projects under an array
of arbitrary paths that can be configured for the specific repository the template is installed in.
The project infrastructure validates the package.json npm configuration files against their JSON schema.

Previously, in order to provide validation coverage for all package.json files in any locations in the repository, a
"globstar" was used to cause the validator to recursively search the entire file tree under the repository. That
approach is problematic because the repository contains externally maintained files (e.g., the npm packages under the
node_modules folder). Searching and validating these files is inefficient at best and the cause of spurious failures at
worst.

This is avoided by targeting the search. Support for a repository maintainer to configure any number of specific
locations of npm-managed projects in the "Check npm" workflow has been added, so this system is used to target the
validations. When the `npm:validate` task is ran by a contributor on their local clone, it defaults to the root of the
repository, but the path can be configured by setting the PROJECT_PATH taskfile variable via an argument to the task
invocation command.
@per1234 per1234 added type: imperfection Perceived defect in any part of project topic: code Related to content of the project itself topic: infrastructure Related to project infrastructure labels Nov 23, 2023
@per1234 per1234 self-assigned this Nov 23, 2023
Copy link
Contributor

@alessio-perugini alessio-perugini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😮 wow this was tricky. Good job

@per1234 per1234 merged commit 0e986a4 into arduino:main Nov 30, 2023
46 checks passed
@per1234 per1234 deleted the avoid-uncontrolled-recursion branch December 5, 2023 17:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic: code Related to content of the project itself topic: infrastructure Related to project infrastructure type: imperfection Perceived defect in any part of project
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants