Hi @neenkah, thanks for bringing this to our attention.
> Only the first 5000 artwork urls get saved to `works.txt`

Do you know what exactly happens when you hit the 5000-artwork limit while scraping? Does the right arrow disappear, is it no longer clickable, or is it something else entirely?
> Scraping all of these would be incredibly time demanding, so it might be nice to provide an adjustable limit.

What exactly do you mean? Do you mean that we should scroll through (far) fewer than 5000 artworks at a time?
> It might be nice to sort by date and scroll through every two years (with the url ending in: `&date=1954`).

Collecting the artworks by year is a good suggestion. However, an artist like Alfred Eisenstaedt, with ~200,000 artworks, might well have more than 5000 artworks in a single year during his most prolific period.
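A minimal Python sketch of the year-by-year idea, under stated assumptions: it builds one index URL per date window by appending a `date` query parameter (as in `&date=1954`), and it flags windows whose collected URL count reached the 5000 cap, since those are the ones that most likely lost artworks. The base URL and the names `date_window_urls`, `windows_at_cap` and `step` are hypothetical, not part of the existing scraper. The thread does not say how many years one `date` value covers, so `step` defaults to 1 to avoid skipping years.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlencode

# Cap reported in this issue: only the first 5000 artwork URLs get collected.
INDEX_CAP = 5000


def date_window_urls(
    base_url: str, first_year: int, last_year: int, step: int = 1
) -> List[Tuple[int, str]]:
    """Return (start_year, url) pairs, one per window of `step` years.

    Assumes the artworks index accepts a `date` query parameter, as in
    `&date=1954`. How many years one `date` value covers is not stated,
    so `step` is adjustable (default 1: no year is skipped).
    """
    # Append with "&" if the URL already has a query string, else with "?".
    sep = "&" if "?" in base_url else "?"
    return [
        (year, f"{base_url}{sep}{urlencode({'date': year})}")
        for year in range(first_year, last_year + 1, step)
    ]


def windows_at_cap(counts: Dict[int, int], cap: int = INDEX_CAP) -> List[int]:
    """Return start years of windows whose URL count reached the cap.

    Those windows were probably truncated and would need a different way
    of splitting.
    """
    return [year for year, count in counts.items() if count >= cap]
```

For example, `date_window_urls("https://example.org/artworks?artist=some-artist", 1950, 1954, step=2)` yields URLs ending in `&date=1950`, `&date=1952` and `&date=1954`; after scraping each one, passing the per-window URL counts to `windows_at_cap` shows which years still hit the limit.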
I have never watched the scraper reach the 5000th artwork, so I cannot provide any details about what happens at that point.
About the adjustable limit: yes, I meant scraping fewer artworks per artist, for the case where someone using the scraper would rather cover all artists, but not all of their artworks, within a limited scraping time. I think it could be a nice feature to have, but it might not be feasible to check this whilst scraping...
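On whether the limit can be checked while scraping: in principle it only needs a counter in the collection loop. A minimal Python sketch, assuming the scraper gathers artwork URLs in batches (one batch per scroll or page step); `collect_artwork_urls`, `load_next_batch` and `max_per_artist` are hypothetical names, not part of the existing scraper.

```python
from typing import Callable, List, Optional


def collect_artwork_urls(
    load_next_batch: Callable[[], List[str]],
    max_per_artist: Optional[int] = None,
) -> List[str]:
    """Collect artwork URLs batch by batch, stopping once the limit is reached.

    `load_next_batch` is a hypothetical stand-in for one scroll/page step of
    the scraper; it returns an empty list when nothing more can be loaded.
    `max_per_artist=None` means no limit.
    """
    urls: List[str] = []
    seen = set()  # guards against URLs repeated across overlapping batches
    while max_per_artist is None or len(urls) < max_per_artist:
        batch = load_next_batch()
        if not batch:
            break  # index exhausted: nothing more to scroll to
        for url in batch:
            if url in seen:
                continue
            seen.add(url)
            urls.append(url)
            if max_per_artist is not None and len(urls) >= max_per_artist:
                break  # limit reached; the while condition then ends the loop
    return urls


# Usage with a fake loader standing in for the scraper:
batches = iter([["a", "b"], ["b", "c", "d"], ["e"]])
print(collect_artwork_urls(lambda: next(batches, []), max_per_artist=3))
# prints ['a', 'b', 'c']
```

Because the check runs after every URL, a scraper built this way could stop scrolling as soon as the limit is reached instead of loading the whole index, which might also ease the slowdown mentioned in the original post.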
Hi @modhurita,
Only the first 5000 artwork urls get saved to `works.txt` for artists with many artworks (photographers) such as Gordon Parks, Alfred Eisenstaedt or Carl Mydans. Scraping all of these would be incredibly time demanding, so it might be nice to provide an adjustable limit.
I also noticed that when scrolling through the artworks index to gather the urls, the page gets incredibly slow because it loads all the artworks. It might be nice to sort by date and scroll through every two years (with the url ending in: `&date=1954`).