This repository has been archived by the owner on Feb 1, 2022. It is now read-only.

Handle parsing error #70

Open
dasmur opened this issue Jun 19, 2020 · 2 comments

Comments

@dasmur
Contributor

dasmur commented Jun 19, 2020

Component
crawler

Problem
I tried to get an overview of the crawler's current status, i.e. how many districts can currently be parsed successfully. By running a few scripts at random, I have already noticed several that can no longer extract the current case numbers (probably because the corresponding websites changed their structure). To identify failing scripts, it would be useful to have a common way of signalling errors.

Suggestion
My first idea is based on UNIX exit codes: a parser simply exits with code 1 if it cannot extract the data.

I have already applied this approach to one script, which I will link to this issue.
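A minimal sketch of what the script side could look like, assuming a parser that pulls the case count out of the page HTML with a regex. The pattern, the German label `Infizierte:` and the function name `parse_cases` are hypothetical, not taken from the linked commit; each real script targets its own district website.

```python
import re
import sys

# Hypothetical pattern; real scripts match their own district's page layout.
CASES_PATTERN = re.compile(r"Infizierte:\s*(\d+)")


def parse_cases(html: str) -> int:
    match = CASES_PATTERN.search(html)
    if match is None:
        # Instead of crashing on match.group(), report the problem and
        # signal a parsing error to the caller via exit code 1.
        print("parsing error: case count not found", file=sys.stderr)
        sys.exit(1)
    return int(match.group(1))


if __name__ == "__main__":
    print(parse_cases("Infizierte: 42"))
```

Because `sys.exit(1)` raises `SystemExit`, the caller sees the same exit code 1 that an uncaught exception would produce, but the stderr message says which value could not be found.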

dasmur added a commit to dasmur/corona_landkreis_fallzahlen_scraping that referenced this issue Jun 19, 2020
The current implementation simply crashes if
the regex does not match anything. This commit checks the regex match
(at least for `status`) and uses the UNIX exit code to signal
parsing errors to the caller (e.g. `run.py`).

See also: corona-zahlen-landkreis#70
@dasmur
Contributor Author

dasmur commented Jun 19, 2020

Of course, parsing errors could still occur in later parts of this script, but it should be enough to illustrate the idea.

If this were introduced in all scripts, it would be easy to get an overview.
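A sketch of how such an overview could be collected, assuming every parser is a standalone Python script that signals failure through a non-zero exit code. The helper `check_scripts`, the 120-second timeout and the command-line usage are hypothetical and not part of the repository's `run.py`.

```python
import subprocess
import sys
from pathlib import Path


def check_scripts(scripts, timeout=120):
    """Run each parser script and split them by exit code (0 = success)."""
    passed, failed = [], []
    for script in scripts:
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,  # keep the parsers' own output out of the summary
                text=True,
                timeout=timeout,
            )
            ok = result.returncode == 0
        except subprocess.TimeoutExpired:
            # A hanging download is treated like any other failure
            ok = False
        (passed if ok else failed).append(script)
    return passed, failed


if __name__ == "__main__":
    # Hypothetical usage: python check_scripts.py scripts/*.py
    scripts = [Path(arg) for arg in sys.argv[1:]]
    passed, failed = check_scripts(scripts)
    print(f"{len(passed)} of {len(scripts)} scripts exited with code 0")
    for script in failed:
        print(f"FAILED: {script}")
```

Treating any non-zero exit code (not just 1) as a failure also catches scripts that crash with an uncaught exception or cannot be started at all.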

@dasmur
Contributor Author

dasmur commented Jun 24, 2020

OK, while my suggestion (using UNIX exit codes to exit failing parsers early) might improve the overall coding style, it is not actually necessary to answer the question:

Q: How many scripts are currently able to extract district case numbers?

The answer: 23 of the 62 scripts run without errors (i.e. they exit with code 0); in other words, 39 parsers currently fail with exit code 1.
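As a quick consistency check, the two counts together cover all 62 scripts, which corresponds to a success rate of roughly 37 %:

```latex
\underbrace{23}_{\text{exit code } 0} + \underbrace{39}_{\text{exit code } 1} = 62,
\qquad \frac{23}{62} \approx 0.371 = 37.1\,\%
```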
