Update en.wikipedia-on-ipfs.org #61
Comments
I would love to be part of the collaborative cluster. Is there an ETA for that? |
Cool news! Thanks for the work! Maybe set up the cluster now? In the meantime, people can join the cluster while you work on an updated version of the data :) |
Let's track that in #68 |
Could this recreation job run on a monthly basis? Every month a cron job would do the scraping/downloading, push the result to IPFS, provide a new pin for the collaborative clusters, and update the DNSLink to match the new version. This also needs #71: if it's possible to de-duplicate unedited wiki pages between months, the new pin can stay up to date while requiring minimal extra data to be pulled. |
Yes, ideally. In the short term we would update every time a new snapshot is published by Kiwix. Unfortunately, the English version is blocked on the scraping/downloading step: the upstream Kiwix project is having trouble with generating |
Alright, I'll be keeping an eye on that issue as well 👍 |
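A minimal sketch of the two pieces of naming such a monthly job would need, assuming the snapshot URL pattern matches the 2020-06 torrent link posted in this thread and that the DNSLink is published as a standard `_dnslink` TXT record. The function names are hypothetical and are not part of this repository's scripts, and the CID in the usage line is a placeholder.

```python
from datetime import date

# Assumption: base path taken from the Kiwix torrent link shared in this thread.
KIWIX_BASE = "https://download.kiwix.org/zim/wikipedia"


def snapshot_torrent_url(lang: str, month: date, flavour: str = "maxi") -> str:
    """Build the torrent URL for a monthly snapshot (hypothetical helper).

    Assumes every snapshot follows the naming of the 2020-06 English one.
    """
    return f"{KIWIX_BASE}/wikipedia_{lang}_all_{flavour}_{month:%Y-%m}.zim.torrent"


def dnslink_record(domain: str, cid: str) -> tuple[str, str]:
    """Return the (name, value) pair of the DNSLink TXT record for a new root CID."""
    return f"_dnslink.{domain}", f"dnslink=/ipfs/{cid}"


if __name__ == "__main__":
    print(snapshot_torrent_url("en", date(2020, 6, 1)))
    # "<new-root-cid>" is a placeholder, not a real CID.
    print(dnslink_record("en.wikipedia-on-ipfs.org", "<new-root-cid>"))
```

The actual download, unpack, `ipfs add`, and DNS update steps are deliberately left out, since they depend on tooling and infrastructure not described here.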
https://download.kiwix.org/zim/wikipedia/wikipedia_en_all_maxi_2020-06.zim.torrent just landed (88GB). If anyone has the bandwidth and storage to download it and make a test build using instructions from (I may do it eventually, but I'm super thin on free time, so no ETA) |
@lidel openzim/mwoffliner#1043 seems to be fixed, would this issue be unblocked? |
@lidel Same remark here, what is still missing? |
I dropped the ball here :( I suspect by now we need to switch scripts from third-party ZIM decompressor @kelson42 if you feel updating old Turkish and English snapshots is worth the effort, we could try to do a manual one-off update. I'll try to allocate some time this week and see if it's feasible with updated tooling. But it is not sustainable long-term, for which we need to use ZIMs directly (#42). |
Now that #77 landed I will attempt to build English in the next two weeks and see how it goes. |
I suspected disk fs running out of inodes, but that's not the case 🤔
Most likely sed was generating some IO overhead, so to remove this brittle breaking point I've tweaked idempotency a bit and disabled debug logs for the redirect fix in 84a70b9. Restarted the build. 🤞 I also ordered a 1TB SSD, so if this fails again, I might be able to retry on a faster setup. |
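For anyone wanting to rule out inode exhaustion on their own build machine, here is a minimal sketch using `os.statvfs` (Unix only). The helper name `inode_usage` is hypothetical and is not part of the build scripts; it is only one way to run the same check as `df -i`.

```python
import os
from typing import Optional


def inode_usage(path: str = ".") -> Optional[float]:
    """Return the fraction of inodes in use on the filesystem holding `path`.

    Returns None when the filesystem does not report a fixed inode count
    (some filesystems report 0 total inodes).
    """
    st = os.statvfs(path)
    if st.f_files == 0:
        return None
    # f_files = total inodes, f_ffree = free inodes
    return 1 - st.f_ffree / st.f_files


if __name__ == "__main__":
    usage = inode_usage(".")
    if usage is None:
        print("filesystem does not report inode counts")
    else:
        print(f"inodes in use: {usage:.1%}")
```

A value close to 100% would point to inode exhaustion, which matters for builds that unpack millions of small files.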
Ok, it failed again after ~45h with the same error from sed 😿 |
The sed issue seems to be gone when running on SSD. |
Badger(?) datastore seems to have a problem with many small files, filed #85 to investigate specifics. Next:
|
Good news! I ended up switching to flatfs datastore with Quick notes:
@kelson42 any suggestion here for 💔 ? I want to understand if it makes sense to add code for fixing exceptions as a post-unpack step, as it will be a very expensive process given the size of English wiki, or if I should file an issue in https://github.com/openzim/zim-tools/issues for this. FYSA the way we would like to handle _exceptions like this is to move |
@lidel I have no clue why the article "Africa dance" is an exception, so there is no way to say whether this is legit or not. |
/ipfs/bafybeicarbywfeinwuwxcivurnle2mzwue42xr3c3dutrf2mngfun6pdum/wiki/Operating_system (linked to from /ipfs/bafybeicarbywfeinwuwxcivurnle2mzwue42xr3c3dutrf2mngfun6pdum/wiki/Software) does something weird too (might be the same issue as Africa Dance). It shows the standard "folder" view with a single file in it called |
Maybe |
I am sorry, I was a bit vague in the problem description. What I meant are issues with pages that conflict with articles that had
|
@lidel I think all of this is a problem we identified 10 days ago around the management of articles including a |
@kelson42 I've filed openzim/zim-tools#226 with some ideas, let's discuss there if it is feasible on your end. |
Quick update on updating English to 2021-02: I retried with fix for openzim/zim-tools#227 and unpacking step went ok. 👍 |
I cheated a bit and ran Anyway, generated a version with changes from #88:
Give it a try: https://bafybeiehlicfvvqhauuxyj7ghspu63uv7vlq224aqytcfv5frewve5jxoq.ipfs.dweb.link/wiki/ |
FYI I'm taking |
#89 is fixed. Give a new build a try: 👉 https://bafybeiaysi4s6lnjev27ln5icwm6tueaw2vdykrtjkwiphwekaywqhcjze.ipfs.dweb.link/wiki/ Lmk if you find any broken articles or unexpected behaviors. 🙏 |
en.wikipedia-on-ipfs.org
(Create DNSLink for en.wikipedia-on-ipfs.org infra#491) This could be done manually or as part of #58