I'm very glad to learn that you're finding use for monolith! WARC should be simple to do, and I'll likely implement it around the same time as MHTML. Long story short, I'll make monolith first crawl the target document, download all assets into a store of sorts (a cache), and then build either a monolithic HTML, MHTML, or WARC file. This way it won't require too much redundant code, and the process will be essentially the same for every output format. The first step right now is to revamp the caching mechanism; I'll work on it ASAP.
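To make the plan above concrete, here is a rough sketch of the "crawl once into a store, then build any output format" idea. It is not monolith's actual code (monolith is written in Rust; Python is used here only for brevity), and every name in it (`Asset`, `AssetStore`, `build_monolithic_html`) is hypothetical. The URL substitution is a deliberately naive string replace, assumed only for illustration.

```python
import base64
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Asset:
    url: str
    media_type: str
    body: bytes


@dataclass
class AssetStore:
    """Hypothetical in-memory cache keyed by URL, filled during the crawl."""
    assets: Dict[str, Asset] = field(default_factory=dict)

    def fetch(self, url: str, download: Callable[[str], Asset]) -> Asset:
        # Download each URL at most once; later requests are served from the cache.
        if url not in self.assets:
            self.assets[url] = download(url)
        return self.assets[url]


def build_monolithic_html(store: AssetStore, root_url: str) -> str:
    """One possible output builder: inline every cached asset as a data URL.

    Assumes the root document is UTF-8 and references assets by absolute URL.
    """
    html = store.assets[root_url].body.decode("utf-8")
    for url, asset in store.assets.items():
        if url == root_url:
            continue
        encoded = base64.b64encode(asset.body).decode("ascii")
        html = html.replace(url, f"data:{asset.media_type};base64,{encoded}")
    return html
```

An MHTML or WARC builder would take the same `AssetStore` and only differ in how it serializes the cached assets, which is where the savings in redundant code would come from.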
Hi there @hugo-akaora, thank you for the link! It's in Python, but I'll use it as a reference; it seems like a straightforward format.
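For a sense of how straightforward the format is, below is a minimal sketch of serializing a single WARC 1.0 record. It is not taken from the linked project or from monolith; the function name `warc_record` is hypothetical, and the sketch assumes a `resource` record that stores the raw asset bytes rather than a full HTTP response.

```python
import uuid
from datetime import datetime, timezone


def warc_record(record_type: str, target_uri: str, content_type: str, block: bytes) -> bytes:
    """Serialize one WARC 1.0 record: version line, named fields, blank line, block."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    header_lines = [
        "WARC/1.0",
        f"WARC-Type: {record_type}",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {now}",
        f"WARC-Target-URI: {target_uri}",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(block)}",
    ]
    header = "\r\n".join(header_lines).encode("utf-8")
    # A blank line separates header from block; two CRLFs terminate the record.
    return header + b"\r\n\r\n" + block + b"\r\n\r\n"
```

A WARC file is then just such records concatenated one after another, for example `b"".join(warc_record("resource", url, media_type, body) for ...)` over whatever assets were downloaded.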
Hi @ll. I have been using monolith more and more for webpage capture but couldn't find a way to make downloads in WARC format (as documented at https://iipc.github.io/warc-specifications/specifications/warc-format/warc-1.0/). I believe such an option would greatly enhance the reach of monolith as a general purpose utility. Anyways thanks for your great work as it is. 😎