
Too many attempts to download http://www.bbc.co.uk/news/stories?print=true #2

Open · carlgieringer opened this issue on Nov 1, 2017 · 1 comment

@carlgieringer (Member) commented:

The scraper raises an exception while trying to download http://www.bbc.co.uk/news/stories?print=true. The failure doesn't seem to prevent the rest of the scraping run from completing.

$ less /tmp/newsdiffs_logging_errs
2017-11-01 06:48:12.591:ERROR:Unknown exception when updating http://www.bbc.co.uk/news/stories
2017-11-01 06:48:12.592:ERROR:Traceback (most recent call last):
  File "/Users/tech/code/newsdiffs/website/frontend/management/commands/scraper.py", line 414, in update_versions
    update_article(article)
  File "/Users/tech/code/newsdiffs/website/frontend/management/commands/scraper.py", line 321, in update_article
    parsed_article = load_article(article.url)
  File "/Users/tech/code/newsdiffs/website/frontend/management/commands/scraper.py", line 306, in load_article
    parsed_article = parser(url)
  File "/Users/tech/code/newsdiffs/website/frontend/management/commands/parsers/baseparser.py", line 117, in __init__
    self.html = grab_url(self._printableurl())
  File "/Users/tech/code/newsdiffs/website/frontend/management/commands/parsers/baseparser.py", line 45, in grab_url
    return grab_url(url, max_depth-1, opener)
  File "/Users/tech/code/newsdiffs/website/frontend/management/commands/parsers/baseparser.py", line 45, in grab_url
    return grab_url(url, max_depth-1, opener)
  File "/Users/tech/code/newsdiffs/website/frontend/management/commands/parsers/baseparser.py", line 45, in grab_url
    return grab_url(url, max_depth-1, opener)
  File "/Users/tech/code/newsdiffs/website/frontend/management/commands/parsers/baseparser.py", line 45, in grab_url
    return grab_url(url, max_depth-1, opener)
  File "/Users/tech/code/newsdiffs/website/frontend/management/commands/parsers/baseparser.py", line 45, in grab_url
    return grab_url(url, max_depth-1, opener)
  File "/Users/tech/code/newsdiffs/website/frontend/management/commands/parsers/baseparser.py", line 43, in grab_url
    raise Exception('Too many attempts to download %s' % url)
Exception: Too many attempts to download http://www.bbc.co.uk/news/stories?print=true
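
For context, the traceback implies a recursive retry scheme in baseparser.py: grab_url calls itself with max_depth - 1 after each failed download and raises once the retry budget is exhausted. A rough reconstruction, assuming the Python 2 / urllib2 stack the project appears to use (the function body, exception types, and defaults here are guesses from the traceback, not the actual source):

```python
import socket
import urllib2

def grab_url(url, max_depth=5, opener=None):
    # Hypothetical reconstruction of the retry logic the traceback implies:
    # retry on failure by recursing with a smaller budget, and raise once
    # the budget hits zero. With max_depth=5 this produces exactly the five
    # "line 45" frames plus one "line 43" frame seen in the log above.
    if opener is None:
        opener = urllib2.build_opener()
    try:
        return opener.open(url, timeout=5).read()
    except (urllib2.URLError, socket.timeout):
        if max_depth == 0:
            raise Exception('Too many attempts to download %s' % url)
        return grab_url(url, max_depth - 1, opener)
```

If the code looks roughly like this, the underlying error is swallowed on every retry, which would explain why the log only records the final "Too many attempts" exception rather than whatever actually went wrong with each attempt.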
@carlgieringer (Member, Author) commented:

One approach tried was increasing the download timeout from 5 to 10 seconds.
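
A minimal sketch of that change, assuming grab_url hard-codes a 5-second timeout in its opener.open call (the surrounding code is an assumption, not the actual baseparser.py):

```python
# Sketch only: if grab_url currently passes a hard-coded 5-second timeout
# to opener.open, the change would be to raise it to 10, giving slow
# bbc.co.uk responses more time before counting as a failed attempt
# against the max_depth retry budget.
response = opener.open(url, timeout=10)  # previously timeout=5
html = response.read()
```

A longer timeout only helps if the failures are slow responses; if the server rejects the ?print=true URL outright, retrying with more time wouldn't change the outcome.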
