
Add option to download images #26

Open · 64bitpandas wants to merge 6 commits into main


64bitpandas

I'm using this tool to mirror some of my Substack posts to my website, and as part of that process I'd really like to host my own images instead of having them link to the Substack CDN!

In case this will help someone else, here's a PR 🙂

Here's a summary of the changes I made to support that:

  • Add an `--images` flag that downloads the images for all scraped posts into a `substack_images/` folder.
  • Add an option to download a single post by passing `--url` in the format `https://example.substack.com/p/postname`.
  • Substack wraps each image in a link to itself, like `[![alt](/path)](/path)`. When downloading images, unwrap these to just `![alt](/path)` so clicking an image doesn't link back to the same image.
  • Add some tests to prove to myself that the code works the way I expect.
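For anyone curious how the link unwrapping and the `--url` parsing can work, here is a minimal sketch in Python. It assumes the Markdown shape shown above (alt text without `]`, URLs without spaces or `)`), and `unwrap_linked_images` and `parse_post_url` are hypothetical helper names for illustration, not necessarily the functions in this PR.

```python
import re
from urllib.parse import urlparse

# Matches a Markdown image wrapped in a link: [![alt](src)](href).
# Assumption: alt text contains no "]" and URLs contain no ")" or whitespace,
# so images with a quoted title, e.g. ![a](src "t"), are left untouched.
LINKED_IMAGE = re.compile(r"\[!\[([^\]]*)\]\(([^)\s]+)\)\]\(([^)\s]+)\)")


def unwrap_linked_images(markdown: str) -> str:
    """Replace every [![alt](src)](href) with a plain ![alt](src)."""
    return LINKED_IMAGE.sub(r"![\1](\2)", markdown)


def parse_post_url(url: str) -> tuple[str, str]:
    """Split https://example.substack.com/p/postname into (base URL, slug)."""
    parsed = urlparse(url)
    # Drop empty segments so a trailing slash is tolerated.
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) != 2 or parts[0] != "p":
        raise ValueError(
            f"expected a URL like https://<name>.substack.com/p/<slug>, got {url!r}"
        )
    return f"{parsed.scheme}://{parsed.netloc}", parts[1]
```

For example, `unwrap_linked_images("[![alt](/path)](/path)")` gives `![alt](/path)`, and `parse_post_url("https://example.substack.com/p/postname")` gives `("https://example.substack.com", "postname")`. A regex keeps the rewrite to one pass over the text; a full Markdown parser would be more robust to unusual alt text or titles.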

As a bonus, the progress bars also track image downloads, since those can take a while. For example:

```
Scraping posts: 100%|██████████| 2/2 [00:30<00:00, 15.00s/post]
  Downloading images for test-post: 100%|██████████| 7/7 [00:14<00:00, 2.00s/image]
  Downloading images for another-post: 100%|██████████| 4/4 [00:08<00:00, 2.00s/image]
```
