Reject non-HTML instead of accepting only HTML
Trying to accept only files that end in .html causes problems when:

1. Links on a page don't end in a trailing slash (e.g. /foo/bar), and wget interprets the link as being of type "bar" and thus rejects it.
2. Long URLs get truncated when saved as files and thus don't end in .html. These get deleted by wget.

This change restores the old behavior, which provided an explicit reject list instead of accepting only HTML.

This is a little suboptimal: it would be nice not to have to maintain a potentially ever-growing list of file extensions, but I'm not sure of a better way to accomplish what we want.
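To illustrate why accepting only html drops these files while a reject list keeps them, here is a minimal Python sketch. It is a hypothetical, simplified model of suffix matching, not wget's actual implementation of `--accept` (`-A`) and `--reject` (`-R`). The filenames and the extensions in the reject list are made up for illustration.

```python
# Hypothetical, simplified model of suffix-based filtering.
# wget's real accept/reject matching is more involved than this.


def kept_by_accept_list(filename: str, accept: list[str]) -> bool:
    """Keep a file only if its name ends with one of the accepted extensions."""
    return any(filename.endswith("." + ext) for ext in accept)


def kept_by_reject_list(filename: str, reject: list[str]) -> bool:
    """Keep a file unless its name ends with one of the rejected extensions."""
    return not any(filename.endswith("." + ext) for ext in reject)


# Example saved filenames (hypothetical):
saved_files = [
    "index.html",           # ordinary page
    "bar",                  # saved from link /foo/bar (no trailing slash)
    "a-very-long-page-na",  # long URL truncated before ".html"
    "logo.png",             # asset we don't want
]

ACCEPT = ["html"]
REJECT = ["png", "jpg", "gif", "pdf", "zip"]  # hypothetical reject list

kept_accept = [f for f in saved_files if kept_by_accept_list(f, ACCEPT)]
kept_reject = [f for f in saved_files if kept_by_reject_list(f, REJECT)]

print(kept_accept)  # ['index.html']
print(kept_reject)  # ['index.html', 'bar', 'a-very-long-page-na']
```

The trade-off described in this commit shows up directly: the reject list keeps the extensionless and truncated pages, but every unwanted file type has to be named explicitly in `REJECT`.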