
Too many attempts to download http://www.bbc.co.uk/news/stories?print=true #2

Open · carlgieringer opened this issue on Nov 1, 2017 · 1 comment

@carlgieringer (Member) commented:

The scraper raises an exception while trying to download http://www.bbc.co.uk/news/stories?print=true. The failure doesn't seem to prevent the rest of the scraping run from completing.

$ less /tmp/newsdiffs_logging_errs
2017-11-01 06:48:12.591:ERROR:Unknown exception when updating http://www.bbc.co.uk/news/stories
2017-11-01 06:48:12.592:ERROR:Traceback (most recent call last):
  File "/Users/tech/code/newsdiffs/website/frontend/management/commands/scraper.py", line 414, in update_versions
    update_article(article)
  File "/Users/tech/code/newsdiffs/website/frontend/management/commands/scraper.py", line 321, in update_article
    parsed_article = load_article(article.url)
  File "/Users/tech/code/newsdiffs/website/frontend/management/commands/scraper.py", line 306, in load_article
    parsed_article = parser(url)
  File "/Users/tech/code/newsdiffs/website/frontend/management/commands/parsers/baseparser.py", line 117, in __init__
    self.html = grab_url(self._printableurl())
  File "/Users/tech/code/newsdiffs/website/frontend/management/commands/parsers/baseparser.py", line 45, in grab_url
    return grab_url(url, max_depth-1, opener)
  File "/Users/tech/code/newsdiffs/website/frontend/management/commands/parsers/baseparser.py", line 45, in grab_url
    return grab_url(url, max_depth-1, opener)
  File "/Users/tech/code/newsdiffs/website/frontend/management/commands/parsers/baseparser.py", line 45, in grab_url
    return grab_url(url, max_depth-1, opener)
  File "/Users/tech/code/newsdiffs/website/frontend/management/commands/parsers/baseparser.py", line 45, in grab_url
    return grab_url(url, max_depth-1, opener)
  File "/Users/tech/code/newsdiffs/website/frontend/management/commands/parsers/baseparser.py", line 45, in grab_url
    return grab_url(url, max_depth-1, opener)
  File "/Users/tech/code/newsdiffs/website/frontend/management/commands/parsers/baseparser.py", line 43, in grab_url
    raise Exception('Too many attempts to download %s' % url)
Exception: Too many attempts to download http://www.bbc.co.uk/news/stories?print=true
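
For context, the traceback implies a recursive retry scheme in baseparser.py: grab_url calls itself with max_depth - 1 after each failed download and raises once the retry budget is exhausted. A rough reconstruction, assuming the Python 2 / urllib2 stack the project appears to use (the function body, exception types, and defaults here are guesses from the traceback, not the actual source):

```python
import socket
import urllib2

def grab_url(url, max_depth=5, opener=None):
    # Hypothetical reconstruction of the retry logic the traceback implies:
    # retry on failure by recursing with a smaller budget, and raise once
    # the budget hits zero. With max_depth=5 this produces exactly the five
    # "line 45" frames plus one "line 43" frame seen in the log above.
    if opener is None:
        opener = urllib2.build_opener()
    try:
        return opener.open(url, timeout=5).read()
    except (urllib2.URLError, socket.timeout):
        if max_depth == 0:
            raise Exception('Too many attempts to download %s' % url)
        return grab_url(url, max_depth - 1, opener)
```

If the code looks roughly like this, the underlying error is swallowed on every retry, which would explain why the log only records the final "Too many attempts" exception rather than whatever actually went wrong with each attempt.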
@carlgieringer (Member, Author) commented:

One approach tried was increasing the download timeout from 5 to 10 seconds.
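
A minimal sketch of that change, assuming grab_url hard-codes a 5-second timeout in its opener.open call (the surrounding code is an assumption, not the actual baseparser.py):

```python
# Sketch only: if grab_url currently passes a hard-coded 5-second timeout
# to opener.open, the change would be to raise it to 10, giving slow
# bbc.co.uk responses more time before counting as a failed attempt
# against the max_depth retry budget.
response = opener.open(url, timeout=10)  # previously timeout=5
html = response.read()
```

A longer timeout only helps if the failures are slow responses; if the server rejects the ?print=true URL outright, retrying with more time wouldn't change the outcome.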
