Add codemod ID #35

Merged
merged 3 commits into from
Jul 22, 2024
Changes from 2 commits
14 changes: 13 additions & 1 deletion codetf.md
Expand Up @@ -6,14 +6,26 @@ This open format describes code changes or suggestions made by an automated tool

# The specification JSON

The [specification](codetf.json) is immature right now, only existing as a marked-up JSON file instead of a proper JSON schema. It's also not independently versioned outside of language-specific bindings (e.g., [Java binding](https://github.com/pixee/codetf-java-bindings)). We are avoiding more investment in ceremony, versioning, governance, etc., until we feel it has reached a more stable footing. Following [SARIF](https://docs.oasis-open.org/sarif/sarif/v2.1.0/csprd01/sarif-v2.1.0-csprd01.html) stylistically as a long-term goal makes sense, not only because it's a successful standard, but also because our results will be closely linked with SARIF, so we could have many users, consumers, and implementors in common.
The [specification](codetf.schema.json) is expressed in terms of [JSON Schema](https://json-schema.org/). The schema is currently not versioned. We are avoiding investment in ceremony, versioning, governance, etc., until we feel it has reached a more stable footing. Following [SARIF](https://docs.oasis-open.org/sarif/sarif/v2.1.0/csprd01/sarif-v2.1.0-csprd01.html) stylistically as a long-term goal makes sense, not only because it's a successful standard, but also because our results will be closely linked with SARIF, so we could have many users, consumers, and implementors in common.

Note that like SARIF, this format is not intended to be a replacement for a diagnostic log. It's not intended to have anything more than minimum diagnostics to help with reproducibility.

# Structure

It may help to understand the major components of CodeTF at a high level before exploring or attempting to implement the specification. The `results` and `changeset` fields can be seen as a series of patches against a project's directory. Each patch builds on any previous patches seen. Therefore, applying a patch from the middle of a `changeset` without the others may be invalid. Multiple locations can be changed in a single file within the scope of a single codemod and be represented by a single `changeset` array entry.

# Codemod IDs

Codemods are uniquely identified by an ID, which is represented in CodeTF as the `codemod` property of the `result` object.
drdavella marked this conversation as resolved.

IDs are descriptive and must conform to the following schema: `<origin>:<language>/<name>`

Each component of the ID has a particular meaning:

* `<origin>`: Origin describes the source of the analysis or transformation. For example, "find and fix" codemods provided by Pixee are labelled with the origin "pixee". Codemods that remediate issues found by a static analysis tool might be labelled with the origin corresponding to that tool name (e.g. "semgrep" or "codeql"). Implementers of custom codemods may use a unique identifier that is specific to their organization or tool.
Contributor

> Origin describes the source of the analysis or transformation

I feel like it should always be the source of the analysis and never the transformation.

Let's consider the custom codemod use case. If a user develops a custom Sonar codemod and wants to use it with our platform, the origin better be "sonar", or our platform will not consider it to be a Sonar codemod that gets access to Sonar results.

If you agree, then can we rename this to scanner()? I have come across places in our platform where we actually translate "origin" to "scanner", and I think the latter makes more sense.

Contributor

I don't like scanner because there's no reason codemodder couldn't take input from non-scanner sources of code information, like observability tools, IAST, etc.

Contributor

detector?

Member Author

I like detector. This fits with the way we talk about it in codemodder architecture.

Contributor

I love it when we use the same words everywhere 😊

drdavella marked this conversation as resolved.
Contributor

> Codemods that remediate issues found by a static analysis tool

Our docs use the term "fix-only codemods", but I don't think that has stuck.

* `<language>`: The language that is transformed by the codemod. This should be a short, unique identifier for the language. Valid languages include `java`, `python`, and `javascript`.
* `<name>`: The name of the codemod. This should be a short, unique identifier for the transformation that is performed. Individual words in the name should be separated by hyphens. For example: `remove-unused-imports`.
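
The ID shape described above can be checked mechanically. The following is a minimal sketch in Python, not part of the specification: `CODEMOD_ID`, `CodemodId`, and `parse_codemod_id` are hypothetical names, and the allowed character sets are assumptions, since the text only fixes the overall `<origin>:<language>/<name>` shape and the hyphen-separated words in `<name>`.

```python
import re
from typing import NamedTuple

# Assumption: each component is lowercase ASCII. The text above does not
# define the allowed characters, only the overall shape of the ID.
CODEMOD_ID = re.compile(
    r"(?P<origin>[a-z][a-z0-9-]*):"
    r"(?P<language>[a-z][a-z0-9]*)/"
    r"(?P<name>[a-z0-9]+(?:-[a-z0-9]+)*)"
)


class CodemodId(NamedTuple):
    """The three components of a codemod ID (hypothetical helper type)."""

    origin: str
    language: str
    name: str


def parse_codemod_id(value: str) -> CodemodId:
    """Split an ID into its components, or raise ValueError if malformed."""
    match = CODEMOD_ID.fullmatch(value)
    if match is None:
        raise ValueError(f"not a valid codemod ID: {value!r}")
    return CodemodId(**match.groupdict())


parsed = parse_codemod_id("pixee:python/remove-unused-imports")
print(parsed.origin, parsed.language, parsed.name)
```

A stricter consumer might additionally check `<language>` against a fixed list such as `java`, `python`, and `javascript`; the sketch leaves that open.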

# Notes
Note that the `changeset` array can have multiple entries for the same given file.
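
As an illustration of this note, here is a small Python sketch that groups `changeset` entries by file while keeping their original order, which matters because each patch builds on the ones before it. The sample document is hypothetical: only `results`, `codemod`, and `changeset` are named in the text above, while the `path` and `diff` keys and the `patches_by_file` helper are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical CodeTF fragment. The `path` and `diff` keys are assumptions;
# the diff bodies are placeholders, not real patches.
codetf = {
    "results": [
        {
            "codemod": "pixee:python/remove-unused-imports",
            "changeset": [
                {"path": "app/main.py", "diff": "<patch 1>"},
                {"path": "app/util.py", "diff": "<patch 2>"},
                {"path": "app/main.py", "diff": "<patch 3>"},
            ],
        }
    ]
}


def patches_by_file(document):
    """Map each path to (codemod, position) pairs in document order."""
    grouped = defaultdict(list)
    for result in document["results"]:
        for position, entry in enumerate(result["changeset"]):
            grouped[entry["path"]].append((result["codemod"], position))
    return dict(grouped)


for path, patches in patches_by_file(codetf).items():
    print(path, [position for _, position in patches])
```

Here `app/main.py` appears twice in the same `changeset`, so a consumer that assumed one entry per file would drop the third patch.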
