Requests 403 Client Error #11763

Open
bjlittle opened this issue Nov 13, 2024 · 23 comments
Labels
Needed: more information A reply from issue author is required Support Support question

Comments

@bjlittle

Details

Expected Result

We've successfully been using pooch to download various external assets required to build the sphinx-gallery of our documentation.

However, we're now getting a 403 Client Error. Has there been a very recent RTD server-side change that may be causing this?

Actual Result

For further details see https://readthedocs.org/projects/geovista/builds/26264004/

@humitos
Member

humitos commented Nov 13, 2024

It seems it's getting 403 when trying to download https://raw.githubusercontent.com/bjlittle/geovista-data/2024.10.2/assets/natural_earth/physical/ne_coastlines_10m.vtk.bz2. However, I'm able to download that file without issues.

I'd say it was a temporary error on the GitHub side. This doesn't seem related to Read the Docs.

@bjlittle
Author

Thanks @humitos for getting back so quickly 💯

I'm able to successfully wget this file and also download it with pooch.

How are you replicating this issue on your side? And what are you getting? 403 also?

@humitos
Member

humitos commented Nov 13, 2024

How are you replicating this issue on your side?

Just clicking on that link; it works and downloads the file.

@bjlittle
Author

Okay, I think this is a rate limiting related issue on the GH server side when pulling assets from GH to RTD.

I'm just going to close this issue, thanks again @humitos 🍻

@kmuehlbauer

This is a persisting issue and is affecting more packages that build docs on RTD and use pooch to retrieve assets. It's hard to debug if this only happens on RTD and not locally or in other setups. From what I can tell, it already fails on the first fetch of an asset, so I doubt the rate limiting theory.

Here are two more links to very recent issues:

@humitos How would we get to the bottom of this? Do you see a way to debug this from your side? I was assuming some other dependency issue but did not spot any recent changes so far. One problem for debugging from user side is the missing mamba environment listing (can we activate that somehow?).

@humitos
Member

humitos commented Nov 14, 2024

@kmuehlbauer I don't have a different way to debug this from my side. I would recommend that you first create a minimal reproducible example that generates the issue outside the environment of your project.

One problem for debugging from user side is the missing mamba environment listing (can we activate that somehow?).

What is "mamba environment list"? If I understand correctly you refer to the list of packages installed in the environment. If that's correct, you can run mamba list using https://docs.readthedocs.io/en/latest/build-customization.html#extend-the-build-process

@kmuehlbauer

Thanks @humitos, that's helpful. Would you mind opening the issue again, until the root cause is found?

@humitos humitos reopened this Nov 14, 2024
@kmuehlbauer

@humitos I've distilled the issue to just use https://github.com/readthedocs/tutorial-template and requests together with a readthedocs github resource.

Pull Request

https://github.com/kmuehlbauer/pooch_rtd_issue/pulls

Build logs

https://app.readthedocs.org/projects/pooch-rtd-issue/builds/26278637/

Code

import requests

output_file = open("output_file.nc", "w+b")
url = "https://github.com/readthedocs/readthedocs.org/raw/refs/heads/main/docs/dev/code-of-conduct.rst"
print("downloading: ", url)
try:
    response = requests.get(url, timeout=30, allow_redirects=True)
    # Raises requests.exceptions.HTTPError on a 4xx/5xx status, such as the 403 seen on RTD.
    response.raise_for_status()
    output_file.write(response.content)
finally:
    # Close the file whether or not the download succeeded.
    output_file.close()

Something is broken between RTD and GitHub and this is as far as I can get. I'd appreciate it if you could sort this out with GitHub, as this seems to be a problem of RTD builds. Thanks!

@ericholscher
Member

Hrm, I just tried to reproduce this on our build servers with a shell, and I got a 200:

docs@build-default-i-00923ec13a48b3b12(org):~/checkouts/readthedocs.org$ python
Python 3.10.12 (main, Sep 11 2024, 15:47:36) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import requests
>>> 
>>> output_file = open("output_file.nc", "w+b")
>>> url = "https://github.com/readthedocs/readthedocs.org/raw/refs/heads/main/docs/dev/code-of-conduct.rst"
>>> print("downloading: ", url)
downloading:  https://github.com/readthedocs/readthedocs.org/raw/refs/heads/main/docs/dev/code-of-conduct.rst
>>> try:
...     response = requests.get(url, timeout=30, allow_redirects=True)
...     response.raise_for_status()
...     output_file.write(response.content)
... finally:
...     output_file.close()
... 
4401
>>> print(response.status_code)
200

So it isn't something that's totally broken 🤔

@ericholscher
Member

It looks like other folks have been having a similar issue: https://stackoverflow.com/questions/39907742/github-api-is-responding-with-a-403-when-using-requests-request-function

Is it related to this perhaps?

@ericholscher
Member

I'm wondering if this was a temporary networking issue or something, since I can't seem to reproduce it on our build servers at all. Curious whether it still fails if you rebuild your test repo?

@ericholscher
Member

Hrm, I was able to reproduce it in the build... https://app.readthedocs.org/projects/eric-pooch-rtd-issue/builds/26282150/

@ericholscher
Member

I updated it to print out the error: https://app.readthedocs.org/projects/eric-pooch-rtd-issue/builds/26282177/

<h1>Access to this site has been restricted.</h1>

<p>
<br>
If you believe this is an error,
please contact <a href="https://support.github.com">Support</a>.
</p>

I guess we need to contact GitHub support 🙃
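
For anyone who wants to surface this kind of response body in their own build log, here is a minimal sketch. It uses `urllib` from the standard library rather than `requests`, which the examples in this thread use, so it is not the code from the linked build. The helper names `describe_http_error` and `fetch`, and the 500-character preview limit, are illustrative assumptions.

```python
import urllib.error
import urllib.request


def describe_http_error(err: urllib.error.HTTPError, limit: int = 500) -> str:
    """Return the status line followed by the start of the error body."""
    body = err.read().decode("utf-8", errors="replace")
    return f"{err.code} {err.reason}\n{body[:limit]}"


def fetch(url: str, timeout: float = 30.0) -> bytes:
    """Download url; if the server refuses, print its error page and re-raise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        # The body of a 4xx response often explains why the request was refused.
        print(describe_http_error(err))
        raise
```

Printing the body before re-raising keeps the build failing as before, but leaves the server's explanation in the log.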

@jrbourbeau

👋 I ran into the same problem over in the Dask docs (xref dask/dask#11522). Stack Overflow suggested setting a custom User-Agent in the header (which happened in dask/dask-sphinx-theme#91), and that seems to have fixed things (the docs build is passing again).

Though fixing things on the GitHub side would be much more convenient : )

@ericholscher
Member

ericholscher commented Nov 14, 2024

Yea, looks like the issue is the lack of a user agent. When I updated the example to pass a user agent, it works:

https://github.com/ericholscher/pooch_rtd_issue/blob/768198a654f68bedd85d30854a7a7a9af893ba4a/docs/source/conf.py#L42-L50

https://app.readthedocs.org/projects/eric-pooch-rtd-issue/builds/26282216/

Guessing this is GH getting hammered by AI bots, and restricting requests without agents, like the rest of us.
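
As a rough illustration of the workaround described above (not a copy of the linked `conf.py`), the sketch below passes an explicit `User-Agent` header to `requests.get`. The function names, the project name, the version, and the contact URL in the usage note are all hypothetical placeholders. `requests` is imported inside `download` so that the header helper can be used without any third-party dependency.

```python
def build_headers(project: str, version: str, contact: str) -> dict[str, str]:
    """Build request headers that identify the client making the request."""
    return {"User-Agent": f"{project}/{version} (+{contact})"}


def download(url: str, destination: str, headers: dict[str, str]) -> int:
    """Save url to destination and return the number of bytes written."""
    import requests  # third-party; imported lazily so build_headers stays dependency-free

    response = requests.get(url, headers=headers, timeout=30, allow_redirects=True)
    response.raise_for_status()
    with open(destination, "wb") as output_file:
        return output_file.write(response.content)
```

A call might look like `download(url, "output_file.nc", build_headers("my-docs-build", "1.0", "https://example.org/contact"))`. Which User-Agent values GitHub accepts is not documented in this thread, so treat the exact format as an assumption.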

@kmuehlbauer

Thanks @ericholscher for testing it and @jrbourbeau for the solution. It's a pity that RTD and GitHub have to take these countermeasures, but totally reasonable and understandable.

Thanks a bunch ❤️

@kmuehlbauer

I updated it to print out the error: https://app.readthedocs.org/projects/eric-pooch-rtd-issue/builds/26282177/

<h1>Access to this site has been restricted.</h1>

<p>
<br>
If you believe this is an error,
please contact <a href="https://support.github.com">Support</a>.
</p>

@ericholscher Would it make sense to come up with a solution to provide some auth token to the request? At least for those builds which have been triggered/authorized with GitHub this might be possible for requests which reach out to GitHub resources. Any thoughts?

@mathause

For pooch, you have to define a downloader:

downloader = pooch.HTTPDownloader(headers={"User-Agent": "agent"})


return REMOTE_RESSOURCE.fetch(name, downloader=downloader)
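
For context, the fragment above could be wrapped up roughly as follows. This is a sketch under assumptions: `USER_AGENT`, `make_downloader`, and `fetch_asset` are hypothetical names, and `registry` stands for whatever `pooch.Pooch` object the project already creates (for example with `pooch.create`). Only `pooch.HTTPDownloader` and the `downloader` argument of `fetch` come from the comment above.

```python
# Hypothetical identifier; use something that names your own project.
USER_AGENT = "my-docs-build/1.0"


def make_downloader(user_agent: str = USER_AGENT):
    """Create a pooch downloader that sends a custom User-Agent header."""
    import pooch  # third-party; imported lazily so this module loads without it

    # Extra keyword arguments are forwarded by pooch to the underlying HTTP request.
    return pooch.HTTPDownloader(headers={"User-Agent": user_agent})


def fetch_asset(registry, name: str, downloader=None) -> str:
    """Fetch name from a pooch registry, defaulting to the custom downloader."""
    if downloader is None:
        downloader = make_downloader()
    return registry.fetch(name, downloader=downloader)
```

Keeping the downloader as an optional argument means a project can swap in a different one, or a stub in tests, without touching the fetch logic.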

raphaelshirley pushed a commit to lephare-photoz/lephare that referenced this issue Nov 15, 2024
hub is requireing a custom user now to prevent getting spammed.

see readthedocs/readthedocs.org#11763
RondeauG added a commit to hydrologie/xhydro that referenced this issue Nov 20, 2024
### Pull Request Checklist:
- [x] This PR addresses an already opened issue (for bug fixes /
features)
  - This PR fixes #xyz
- [x] (If applicable) Documentation has been added / updated (for bug
fixes / features).
- [x] (If applicable) Tests have been added.
- [x] CHANGELOG.rst has been updated (with summary of main changes).
- [x] Link to issue (:issue:`number`) and pull request (:pull:`number`)
has been added.

### What kind of change does this PR introduce?

* Fixes an issue related to `pooch` in recent ReadTheDocs builds.

### Other information:

- See readthedocs/readthedocs.org#11763
Zeitsperre added a commit to Ouranosinc/xclim that referenced this issue Nov 20, 2024
### What kind of change does this PR introduce?

* Overloads the `fetch()` method of `nimbus()` to add a `UserAgent`,
thus preventing requests from ReadTheDocs from being forbidden by
GitHub.
* Fixes up the logic for fetching the `registry.txt` files and testing
data from non-`Ouranosinc/xclim-testdata` repositories that follow the
same conventions (forks, `xhydro-testdata`, etc.).

### Does this PR introduce a breaking change?

Not really. The `fetch` calls have been modified and the registry files
for non-`Ouranosinc/xclim-testdata` files are now saved to the testing
folder with the following convention:
`registry.{repo-name}.{branch-name}.txt`.

### Other information:

readthedocs/readthedocs.org#11763
jpmckinney added a commit to open-contracting/extension_registry.py that referenced this issue Nov 25, 2024
jpmckinney added a commit to open-contracting/sphinxcontrib-opencontracting that referenced this issue Nov 26, 2024
jpmckinney added a commit to open-contracting/sphinxcontrib-opencontracting that referenced this issue Nov 26, 2024
znichollscr added a commit to climate-resource/input4mips_validation that referenced this issue Nov 26, 2024
znichollscr added a commit to znichollscr/cf-xarray that referenced this issue Nov 26, 2024
Zeitsperre added a commit to hydrologie/xhydro that referenced this issue Nov 27, 2024
### What kind of change does this PR introduce?

* Adds a user-agent to the pooch call in order to deal with forbidden
remote requests
* Fixes the URL joining logic in `load_registry` and `deveraux`
* Removes an unused import in `conftest.py`

### Does this PR introduce a breaking change?

No.

### Other information:

readthedocs/readthedocs.org#11763
@ssbarnea

ssbarnea commented Dec 3, 2024

I think that something changed recently that is causing this failure. My RTD_TOKEN has not been updated or reset in 7 months, but it is now failing with a 400 error. https://github.com/ansible/ansible-lint/actions/runs/12146553571/job/33870743030#step:6:52

requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://readthedocs.org/api/v3/projects/ansible-lint/redirects/

Also filed at nextstrain/readthedocs-cli#8

@ericholscher
Member

@ssbarnea That is a different issue, so you should open a new issue with it. If you can share an example request, along with the data passed in, that would be really helpful in debugging it on our side.

@agjohnson
Contributor

@ssbarnea This might be closer to this issue:

However, I believe that if it were, you would get a 403 response, not a 400.

@ericholscher
Member

Yea, I looked at the logs and the 400 is coming from the origin, which would be our application. So it's likely that something changed in the validation logic there or something. I'm guessing there might have been a new required field added or something, but I don't remember anything obvious that we've changed recently.

@amotl

amotl commented Dec 9, 2024

Hi. We have also been tripped up by this problem, running a particular job that talks to the RTD HTTP API from an EOL Python environment using ancient dependencies.

Fortunately, considering those information bits and blueprints (thanks a stack, @tsibley and @joverlee521!),

we have been able to likewise update versions of the urllib3 and requests packages, which promptly also fixed the problem for us. ✨

9 participants