Recommended alternative? #28
Comments
https://covid.ourworldindata.org/data/owid-covid-data.csv
https://github.com/CSSEGISandData has some data split out by province/state (for Canada, Australia, China, etc.), which I didn't want. I suppose I could code something to come up with totals for those countries.
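For anyone wanting to try the "come up with totals" route, here is a minimal sketch of summing province/state rows into one row per country. The column names (`Province/State`, `Country/Region`, `Lat`, `Long`, then one column per date) and the sample figures are assumptions made up for illustration; check them against the actual file before relying on this.

```python
import csv
import io
from collections import defaultdict

# Made-up sample in an assumed wide layout: one row per province, one column per date.
SAMPLE = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20
Alberta,Canada,53.9,-116.5,0,1
Ontario,Canada,51.2,-85.3,2,3
,France,46.2,2.2,5,8
"""

# Columns that describe the row rather than hold counts (assumed names).
META_COLUMNS = {"Province/State", "Country/Region", "Lat", "Long"}


def country_totals(csv_text):
    """Sum all province/state rows into a single set of totals per country."""
    reader = csv.DictReader(io.StringIO(csv_text))
    date_cols = [c for c in reader.fieldnames if c not in META_COLUMNS]
    totals = defaultdict(lambda: dict.fromkeys(date_cols, 0))
    for row in reader:
        for d in date_cols:
            # Treat an empty cell as zero rather than failing on int("").
            totals[row["Country/Region"]][d] += int(row[d] or 0)
    return {country: dict(counts) for country, counts in totals.items()}


totals = country_totals(SAMPLE)
print(totals["Canada"])
```

Countries that already have a single row (France in the sample) pass through unchanged, so the same function can be run over the whole file.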
I think I'm going to use this source, in particular probably https://datahub.io/core/covid-19/r/countries-aggregated.csv, since I don't need province/state data. I had issues with Johns Hopkins data before, which is why I switched to this repo; my suggestion claims to have cleaned up the messy bits in the JH data. I need to update my loader program and deal with possible country name issues today. Cheers, Ian
FWIW, I made a (very simple) converter, https://github.com/kallewoof/covid19-csv-converter, between the old format (Johns Hopkins, IIRC) and this one. I will probably add another mode for the covid.ourworldindata.org variant soon, since this one also seems to have gone under.
Looks like even after their clean-up, there are still strange bumps in the data. Guess I will just have to live with it. Cheers, Ian
@iandoug I'm not super happy with the owid dataset, so I am probably going to switch to the datasets one. Could you work around the strange bumps by using this dataset and appending only the missing data?
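The "append only the missing data" idea could look roughly like the sketch below: keep every row from the trusted set and take a row from the second set only when the trusted set has nothing for that date and country. The `(date, country)` key, the dict-of-counts shape, and the sample numbers are all assumptions for illustration, not the format of either repo.

```python
def append_missing(primary, secondary):
    """Return primary plus any (date, country) rows that only secondary has.

    Rows present in both sets keep the primary value, so the secondary
    source can never overwrite data that was already trusted.
    """
    merged = dict(primary)
    for key, value in secondary.items():
        merged.setdefault(key, value)
    return merged


# Made-up example: the primary set stops on 2020-12-01, the secondary continues.
primary = {
    ("2020-11-30", "Canada"): 100,
    ("2020-12-01", "Canada"): 110,
}
secondary = {
    ("2020-12-01", "Canada"): 999,  # conflicting value, ignored
    ("2020-12-02", "Canada"): 120,  # missing from primary, appended
}

merged = append_missing(primary, secondary)
print(sorted(merged.items()))
```

This only fills gaps; it does nothing about the two sets counting a day differently, so there can still be a visible step at the point where the sources change.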
Mmmnnn.. that's an idea I didn't think of. I'm a bit reluctant though, because THIS repo used end-of-day around midnight GMT (or maybe 2am; I never could figure it out, and I fetched at 4am GMT), while datasets/Johns Hopkins uses (I think) midnight Eastern Standard Time as its cut-off point. So "cases on 2020-xx-yy" is going to differ between the two sets, making a merge tricky. I see "datasets" has not updated since yesterday, and there are several closed tickets on their repo about it NOT updating in the past, so that's a bit worrying in terms of reliability. I switched from JH data long ago because they had so many issues and kept changing their file layouts etc. Regarding the bumps, given the number of sites using datasets data, you'd think they would have sorted it out by now. :-( Let me ponder your idea a bit more. I had to fix these country names between this repo, datasets, and my names; your fix list may be similar or not.
Cheers, Ian
Hi Ian, Yeah, I think I followed your exact footsteps. It's still a rough proof of concept, but I have a tool to convert between these here: https://github.com/kallewoof/csvman To get the github.com/datasets/covid-19.git data set into the ulklc format, clone the above, then:
It's still a WIP, but yeah, it supports fixing names and such manually. I've got some of the ones you listed and will add the others. Also, I'm not sure what you mean by the dates being 1 off. Are the actual dates in the file showing one day earlier/later depending on the set? Edit: I don't see several of the country name differences that you are listing (e.g. both this repo and the datasets/covid-19 one use "China", "Kyrgyzstan", "Kazakhstan", ...).
It depends on when countries release their figures, and when the various sites process the numbers. Site 1: day ends at midnight GMT, so figures released at 2am GMT are going to be on different days in each data set. The datasets data is a mess around 13-14 December because the Turkey figure is wrong. I did raise it as an issue, but it looks like can't fix/won't fix because that's what they get from JH, which is exactly the kind of reason I stopped using JH in the first place. What also bothers me is the huge discrepancy between their numbers and Worldometer, e.g. yesterday WoM 89,343,183, datasets 88,860,500, about half a million less. There used to be around 10-40k difference before, which I accepted as end-of-day differences. Still hoping ulklc will resurface. Cheers, Ian
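To make the cut-off point concrete, here is a small sketch of how the same release time lands on different calendar days depending on where a site draws its end-of-day line. It assumes one feed cuts off at midnight GMT and the other at midnight Eastern Standard Time (a fixed UTC-5 offset, ignoring daylight saving); the release time is a made-up example.

```python
from datetime import datetime, timedelta, timezone

GMT = timezone.utc
EST = timezone(timedelta(hours=-5))  # assumed fixed offset, no daylight saving


def reporting_date(released_at, cutoff_tz):
    """Calendar day a figure is booked to when the day ends at midnight in cutoff_tz."""
    return released_at.astimezone(cutoff_tz).date()


# Hypothetical figure released at 02:00 GMT on 10 January 2021.
released = datetime(2021, 1, 10, 2, 0, tzinfo=GMT)

gmt_day = reporting_date(released, GMT)
est_day = reporting_date(released, EST)
print(gmt_day, est_day)
```

Under these assumptions the GMT-based feed books the figure to 10 January and the EST-based feed books it to 9 January, which is the one-day shift that makes a row-by-row merge of the two sets unreliable.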
Hi
Anyone able to recommend an alternative data feed?
Thanks, Ian