User-facing docs for secret disclosure API (pypi#17236)
* User-facing docs for secret disclosure API

* Add to the ToC

* Add details about keys & signatures
di authored Dec 5, 2024
1 parent 7ce8cee commit a3e7e6a
Showing 3 changed files with 135 additions and 46 deletions.
55 changes: 9 additions & 46 deletions docs/dev/development/token-scanning.rst
@@ -6,43 +6,13 @@ content managers run regexes to try and identify published secrets, and ideally
have them deactivated. PyPI has started integrating with such systems in order
to help secure packages.


How to recognize a PyPI secret
------------------------------

A PyPI API token is a string consisting of a prefix (``pypi``), a separator
(``-``) and a macaroon serialized with PyMacaroonv2, which means it's the
``base64`` of::

\x02\x01\x08pypi.org\x02\x01b

Thanks to this, we know that a PyPI token is bound to start with::

pypi-AgEIcHlwaS5vcmc[A-Za-z0-9-_]{70,}

A token can be arbitrary long because we may add arbitrary many caveats. For
more details on the token format, see `pypitoken
<https://pypitoken.readthedocs.io>`_.

GitHub Secret Scanning
----------------------

GitHub's "Secret Scanning" feature used to be called "Token Scanning", so you
may come across both names. GitHub scans public commits with the regex above
(in practice, matches must be at least 130 characters long). For all
tokens identified within a "push" event, they send us reports in bulk. The
format is explained thoroughly in `their doc
<https://docs.github.com/en/developers/overview/secret-scanning-partner-program>`_ as well as
in the `warehouse implementation ticket
<https://github.com/pypi/warehouse/issues/6051>`_.

In short: they send us a cryptographically signed payload describing each
leaked token, along with a public URL pointing to it.
User-facing documentation about this feature is available here:
`<https://docs.pypi.org/api/secrets/>`_.

How to test it manually
^^^^^^^^^^^^^^^^^^^^^^^

A fake github service is launched by Docker Compose. Head your browser to
A fake token reporting service is launched by Docker Compose. Head your browser to
``http://localhost:8964``. Create/reorder/... one or more public keys, make
sure one key is marked as current, then write your payload, using the following
format:
@@ -55,7 +25,7 @@ format:
"url": "https://example.com"
}]
Send your payload. It sends it to your local Warhouse. If a match is found, you
Send your payload. It sends it to your local Warehouse. If a match is found, you
should find that:

- the token you sent has disappeared from the user account page,
@@ -69,24 +39,17 @@ Content').
Whether it worked or not, a bunch of metrics will have been issued; you can see
them in the ``notdatadog`` container log.

GitLab Secret Detection
-----------------------

GitLab also has an equivalent mechanism, named "Secret Detection", not
implemented in Warehouse yet (see `#9280
<https://github.com/pypi/warehouse/issues/9280>`_).

PyPI token disclosure infrastructure
------------------------------------

The code is mainly in ``warehouse/integration/github``.
The code is mainly in ``warehouse/integrations/secrets/``.
There are 3 main parts in handling a token disclosure report:

- The Web view, which is the top-level glue but does not implement the logic
- Vendor specific authenticity check & loading. In the case of GitHub, we check
that the payload and the associated signature match with the public keys
available in their meta-API
- (Supposedly-)Vendor-independent disclosure analysis:
- Vendor specific authenticity check & loading. We check that the payload and
the associated signature match with the public keys available in their
meta-API
- Vendor-independent disclosure analysis:

- Each token is processed individually in its own celery task
- Token is analyzed, we check if its format is correct and if it
1 change: 1 addition & 0 deletions docs/mkdocs-user-docs.yml
@@ -95,3 +95,4 @@ nav:
- "api/stats.md"
- "api/bigquery.md"
- "api/feeds.md"
- "api/secrets.md"
125 changes: 125 additions & 0 deletions docs/user/api/secrets.md
@@ -0,0 +1,125 @@
# Secret reporting API

!!! warning "Not publicly available"

    Note that this API is only available on a case-by-case basis. Please
    contact admin@pypi.org if you would like to integrate with this API.

Third parties integrate with PyPI to find, identify and revoke API tokens that
are accidentally made public. The following partners currently report publicly
exposed API tokens to PyPI:

* https://github.com
* https://deps.dev

All PyPI users that use API tokens are opted into this by default, and no
action is necessary to benefit from this.

This API is for third parties who may find PyPI API tokens and wish to
report them to PyPI.

## Detecting the PyPI secret format

A PyPI API token is a string consisting of a prefix (`pypi`), a separator
(`-`) and a Macaroon serialized to `base64` with [PyMacaroon]:

    pypi-[A-Za-z0-9-_]{85,}

The `base64` string will not be shorter than 85 characters. A token can be
arbitrarily long because we may add arbitrary caveats to the serialized
Macaroon.
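
As an illustration, the pattern above can be applied with Python's standard
`re` module. This is a minimal sketch, not PyPI's own detection code; the
function name `find_candidate_tokens` is hypothetical.

```python
import re

# Pattern from the format described above: the "pypi-" prefix followed by
# at least 85 URL-safe base64 characters.
PYPI_TOKEN_RE = re.compile(r"pypi-[A-Za-z0-9\-_]{85,}")


def find_candidate_tokens(text: str) -> list:
    """Return every substring of `text` that looks like a PyPI API token."""
    return PYPI_TOKEN_RE.findall(text)
```

A match only means the string has the right shape. Whether it is a real,
active token is decided by PyPI once the match is reported.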

## Integrating

PyPI has adopted the [GitHub secret scanning reporting pattern].

### Public key identifier & signature

PyPI expects every request to this API to include two headers:

* A header containing a public key identifier
* A header containing a signature of the raw message body using this key

The names of these headers can be arbitrary and should be provided to PyPI at
integration time. They will be verified for every request.

PyPI assumes that the signature is an ECDSA signature, and that the digest is
SHA-256.

### Public key verification

PyPI expects to be able to verify the public key used to sign the request at a
URL provided at integration time. This URL structure is arbitrary but should
exist at a trusted domain.

Integrating parties should be prepared to provide P-256/384/521 keys, and use
SHA-256 only (not SHA-384 or SHA-512, despite those being common with P-384 and
P-521 respectively).

A GET request to this URL should return a JSON document with the following
example structure:

```json
{
  "public_keys": [
    {
      "key_identifier": "90a421169f0a406205f1563a953312f0be898d3c7b6c06b681aa86a874555f4a",
      "key": "-----BEGIN PUBLIC KEY-----\nMFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAE9MJJHnMfn2+H4xL4YaPDA4RpJqUq\nkCmRCBnYERxZanmcpzQSXs1X/AljlKkbJ8qpVIW4clayyef9gWhFbNHWAA==\n-----END PUBLIC KEY-----\n",
      "is_current": false
    },
    {
      "key_identifier": "bcb53661c06b4728e59d897fb6165d5c9cda0fd9cdf9d09ead458168deb7518c",
      "key": "-----BEGIN PUBLIC KEY-----\nMFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEYAGMWO8XgCamYKMJS6jc/qgvSlAd\nAjPuDPRcXU22YxgBrz+zoN19MzuRyW87qEt9/AmtoNP5GrobzUvQSyJFVw==\n-----END PUBLIC KEY-----\n",
      "is_current": true
    }
  ]
}
```

Note that providing more than one key is not necessary. PyPI will not
accept responses for keys that are not marked as current at the time of
disclosure.
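
An integrating party may want to check that the document it publishes exposes
the key it signs with. The following is a rough sketch of such a check, using
only the field names from the example document above; the helper name
`find_current_key` is hypothetical and this is not PyPI's verification code.

```python
import json
from typing import Optional


def find_current_key(document: str, key_identifier: str) -> Optional[str]:
    """Return the PEM key for `key_identifier` if it is marked as current."""
    data = json.loads(document)
    for entry in data.get("public_keys", []):
        # A key is only usable if the identifier matches and it is current.
        if entry.get("key_identifier") == key_identifier and entry.get("is_current"):
            return entry["key"]
    return None
```

With the example document above, looking up the second identifier would return
its PEM string, while the first one would yield `None` because it is not marked
as current.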

## Routes

### Reporting a secret

Route: `POST /_/secrets/disclose-token/`

Accepts a report of one or more arbitrary API tokens, with details on where
each was located. The message body is a JSON array that contains one or more
objects, with each object representing a single secret match.

The keys for each secret match are:

* `token`: The value of the secret match (required)
* `url`: The public URL where the match was found (optional)

Additional fields may be provided but will be ignored.

Example request:

```http
POST /_/secrets/disclose-token/ HTTP/1.1
Host: pypi.org
Some-Public-Key-Identifier: ...
Some-Public-Key-Signature: ...

[
  {
    "token": "pypi-NMIfyYncKcRALEXAMPLE...",
    "url": "https://github.com/octocat/Hello-World/blob/12345600b9cbe38a219f39a9941c9319b600c002/foo/bar.txt"
  }
]
```
```
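
A request like the one above could be assembled as follows. This is a sketch
under assumptions: the header names are the placeholders from the example
(the real ones are agreed at integration time), `build_report` is a
hypothetical helper, and `sign` stands in for whatever ECDSA/SHA-256 signing
routine the integrating party uses, including how the signature is encoded as
a header value.

```python
import json
from typing import Callable, Dict, List, Tuple


def build_report(
    matches: List[Dict[str, str]],
    key_identifier: str,
    sign: Callable[[bytes], str],
    id_header: str = "Some-Public-Key-Identifier",  # placeholder name
    signature_header: str = "Some-Public-Key-Signature",  # placeholder name
) -> Tuple[bytes, Dict[str, str]]:
    """Serialize the matches and return (raw body, headers) for the POST."""
    body = json.dumps(matches).encode("utf-8")
    headers = {
        id_header: key_identifier,
        # The signature covers the raw message body, so sign the exact bytes
        # that will be sent rather than a re-serialized copy.
        signature_header: sign(body),
    }
    return body, headers
```

The returned body and headers can then be sent with any HTTP client as a
`POST` to `/_/secrets/disclose-token/`.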

Status codes:

* `204 No Content` - We acknowledge the request but won't comment on the outcome.
* `400 Bad Request` - The request was in some way malformed and we are unable
to process the report. The token was not disclosed and should be
re-submitted.
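
A client might interpret these two status codes along the following lines.
This is a minimal sketch; `needs_resubmission` is a hypothetical helper, and
treating every other status as an error is an assumption, since no other codes
are documented here.

```python
def needs_resubmission(status_code: int) -> bool:
    """Tell whether a report must be sent again, based on the documented codes."""
    if status_code == 204:
        # Acknowledged: PyPI does not say whether the token matched anything.
        return False
    if status_code == 400:
        # Malformed request: the report was not processed.
        return True
    # Any other status is not documented above, so surface it to the caller.
    raise ValueError(f"undocumented status code: {status_code}")
```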

[PyMacaroon]: https://pymacaroons.readthedocs.io/
[GitHub secret scanning reporting pattern]: https://docs.github.com/en/code-security/secret-scanning/secret-scanning-partnership-program/secret-scanning-partner-program
