Only keep records that have been indexed to a catchment/flowline ID. #220
Will tackle this issue this week. The prerequisite is getting a fresh copy of the demo database, to be sure I'm working against the current standard schema and content.
As we add configuration options, it may be worth discussing how the crawler is invoked. Right now, it is run from the Linux command line, with …
I think I may have misunderstood what has been happening with the crawler source table. I just pulled a fresh copy of the … Of specific interest to me are the suffixes and the crawler source ID integers. Were those integers going to be re-ranged starting from 1 with no skips?
I have ported the logic from the Java crawler into Python. Mostly, this is just arranging different framing around the SQL lifted directly from the Java repo. Allowing "raw" SQL to execute is a minor security risk (injection concerns), but this code gives users very limited ability to affect the variables, so it is reasonably insulated from such attacks.

In terms of testing: I was only able to match three features from source 11 (geoconnex contribution demo sites) against the NHD data in the NHDplus artifact at https://github.com/internetofwater/nldi-db/releases/download/artifacts-2.0.0/ Looking for domain experts to help me understand whether that is the expected result. @dblodgett-usgs

I drop all ingested features with COMID=0 after the crawl. This is not optional (yet).
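To illustrate the injection concern mentioned above, here is a minimal sketch (with hypothetical table and column names, not the actual NLDI schema) of how SQL lifted into Python stays insulated when user-affected values are bound as parameters rather than interpolated into the statement. `sqlite3` stands in for the real Postgres driver; the same pattern applies with `psycopg2`'s `%s` placeholders.

```python
import sqlite3

def insert_feature(conn, source_id, identifier, name):
    # The ? placeholders keep user-controlled values out of the SQL text,
    # so they are stored as data and never executed as SQL.
    conn.execute(
        "INSERT INTO feature (crawler_source_id, identifier, name) "
        "VALUES (?, ?, ?)",
        (source_id, identifier, name),
    )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE feature "
    "(crawler_source_id INTEGER, identifier TEXT, name TEXT)"
)
# A malicious-looking value is inserted as plain text, not executed:
insert_feature(conn, 11, "site-001", "Demo'); DROP TABLE feature;--")
count = conn.execute("SELECT COUNT(*) FROM feature").fetchone()[0]
print(count)  # 1
```

The table survives and the hostile string is just a stored name, which is why the limited user influence over these variables keeps the risk low.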
Currently, the crawler keeps all records that are read in, whether or not they get indexed. The crawler should keep only data that indexes to a COMID: when a crawl finishes, no rows with NULL COMIDs should remain in the NLDI database. This could be made configurable, but should default to dropping un-indexed features.
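The requested behavior could be sketched as a post-crawl cleanup step with the drop-by-default switch. This is a sketch under assumed names (a `feature` table with a `comid` column, and a `finish_crawl` helper), not the actual crawler code; `sqlite3` stands in for the NLDI Postgres database.

```python
import sqlite3

def finish_crawl(conn, drop_unindexed=True):
    """Remove features that never matched a catchment/flowline.

    Returns the number of rows deleted (0 if the option is disabled).
    """
    if not drop_unindexed:
        return 0
    # Treat both NULL and 0 as "not indexed to a COMID".
    cur = conn.execute("DELETE FROM feature WHERE comid IS NULL OR comid = 0")
    conn.commit()
    return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE feature (identifier TEXT, comid INTEGER)")
conn.executemany(
    "INSERT INTO feature VALUES (?, ?)",
    [("a", 1234), ("b", None), ("c", 0)],
)
print(finish_crawl(conn))  # 2 un-indexed rows dropped
remaining = conn.execute("SELECT COUNT(*) FROM feature").fetchone()[0]
print(remaining)  # 1
```

Defaulting `drop_unindexed=True` matches the issue's request, while keeping a path to retain raw records for debugging a crawl.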