scraper creates "https/" articles directory #1

carlgieringer · 2017-11-01T12:58:16Z

Running the scraper from scratch creates a directory articles/https/. Some articles end up under this directory, and I don't think they are matched up in the browse view with the articles outside it. E.g. articles under articles/https//www.nytimes.com/ don't appear along with those under articles/www.nytimes.com.

carlgieringer · 2017-11-02T19:30:05Z

This is due to a legacy artifact in models.Article#filename:

elif ans.startswith('https://'):
    # Terrible hack for backwards compatibility from when https was stored incorrectly,
    # perpetuating the problem
    return 'https:/' + ans[len('https://'):]
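To make the effect of that branch concrete, here is a minimal sketch. Only the `https://` branch is taken from the snippet above; the function names `legacy_filename` and `scheme_free_filename` are hypothetical, and the scheme-stripping variant is just one possible fix under the assumption that articles should live under their host name. It is not the actual `models.Article#filename` implementation.

```python
def legacy_filename(url: str) -> str:
    """Mirror the quoted branch: 'https://' is collapsed to 'https:/'.

    Other branches of the real method are not shown in the issue,
    so anything that is not https is returned unchanged here (assumption).
    """
    if url.startswith('https://'):
        return 'https:/' + url[len('https://'):]
    return url


def scheme_free_filename(url: str) -> str:
    """Hypothetical alternative: drop the scheme so every article
    for a host ends up under the same top-level directory."""
    for scheme in ('https://', 'http://'):
        if url.startswith(scheme):
            return url[len(scheme):]
    return url


url = 'https://www.nytimes.com/some-article'  # made-up example URL

# The legacy path starts with a scheme-derived component, not the host.
print(legacy_filename(url).split('/')[0])
# The scheme-free path starts with the host.
print(scheme_free_filename(url).split('/')[0])
```

With the legacy branch, the first path component comes from the scheme rather than the host, which is why these articles are stored separately from the ones under articles/www.nytimes.com. Switching to a scheme-free path would presumably also require migrating the files already stored under the old directory.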
