Dedicated repository to store examples of pre-compiled datasets #5
A lot of the reference files were already created at https://github.com/HXL-CPLP and https://github.com/EticaAI/HXL-Data-Science-file-formats/tree/main/ontologia, except that those reference files started to get too big to store in HXL-Data-Science-file-formats. Another major point (which is actually not about code at all) is deciding how to assign a numeric identifier to codes which do not have one. Such a reversible algorithm would actually be a pretty common need, but that is a future issue.
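As a rough illustration only (this is not the project's chosen algorithm, and every name here is hypothetical), one reversible mapping treats an uppercase alphanumeric code as a base-36 integer, with a guard digit so leading zeros survive the round trip:

```typescript
const ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";

// Hypothetical sketch of a reversible code <-> number mapping.
function codeToNumber(code: string): bigint {
  let n = 1n; // leading guard digit so leading zeros survive the round trip
  for (const ch of code.toUpperCase()) {
    const v = ALPHABET.indexOf(ch);
    if (v < 0) throw new Error(`unsupported character: ${ch}`);
    n = n * 36n + BigInt(v);
  }
  return n;
}

function numberToCode(n: bigint): string {
  let digits = "";
  while (n > 0n) {
    digits = ALPHABET[Number(n % 36n)] + digits;
    n /= 36n;
  }
  return digits.slice(1); // drop the guard digit
}

// numberToCode(codeToNumber("AO")) === "AO"
```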
Repository https://github.com/EticaAI/n-data was renamed to https://github.com/EticaAI/lsf-cache
Even if we do not do something such as precompiling all public UN P-Codes (here the focus is not on GIS, but on their metadata, such as names), we would still need data tables.
These data tables (even the most basic ones) would start to bloat the history of this main repository (which is more focused on documentation and reference code). So an alternative would be to create a different repository, share some short URL, and then use that repository as the base.
Advantages
Makes the transition between online and offline easier
By storing the data in another repository, we can already make some minimal checks in the main interface to detect whether the user is loading something such as http://localhost/numerordinatio instead of https://numerordinatio.etica.ai. Not really sure how to handle the files loaded from CDNs (such as the Bootstrap CSS and JavaScript), but if loading from localhost, then we could try searching for the datasets with relative paths, along the lines of the sketch below.
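A minimal sketch, assuming the interface is plain browser TypeScript/JavaScript; the base paths, the CDN URL, and the file name are illustrative assumptions, not the project's confirmed layout:

```typescript
// Pick a dataset base path depending on where the interface is served from.
// Both constants are hypothetical placeholders.
const DATASET_REMOTE_BASE = "https://raw.githubusercontent.com/EticaAI/lsf-cache/main";
const DATASET_LOCAL_BASE = "../lsf-cache"; // relative path for localhost/offline use

function datasetBaseUrl(): string {
  const host = window.location.hostname;
  const isLocal = host === "localhost" || host === "127.0.0.1";
  return isLocal ? DATASET_LOCAL_BASE : DATASET_REMOTE_BASE;
}

// Example: resolve one pre-compiled dataset file (file name is illustrative).
const datasetUrl = `${datasetBaseUrl()}/1603/47/1603.47.tsv`;
```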
With the data already in a dedicated repository, it becomes easier to download everything and put it on a USB stick or the like. Also, the users best placed to compile new work in the future may already get help from others who deliver most of the data pre-packaged.
"offline" access not just for privacy
One reason to have a localhost alternative is not even mere privacy or going fully offline, but actually reducing internet usage. Depending on how well optimized the interface becomes, every time a user forces a reload, it could easily keep re-downloading several small files from the internet. For example, I'm not fully aware of how many megabytes the P-Codes for the entire world (without geometry) would take, but this could easily waste a lot of bandwidth.
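One possible mitigation, sketched here as an assumption rather than anything already in the interface, is the browser Cache API, so a forced reload can serve dataset files from local storage instead of the network (the cache name is hypothetical):

```typescript
// Fetch a dataset file, serving from a local cache when possible so that a
// force reload does not re-download every small file.
async function fetchDataset(url: string): Promise<Response> {
  const cache = await caches.open("numerordinatio-datasets-v1");
  const hit = await cache.match(url);
  if (hit) {
    return hit; // serve from the local cache, no network round trip
  }
  const response = await fetch(url);
  if (response.ok) {
    await cache.put(url, response.clone()); // store a copy for next time
  }
  return response;
}
```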
Another potential advantage of this approach is that, for tables which are no longer automated, if a user needs to edit something, they can do so with a code editor (such as VSCode, opening the folder with all the datasets) and then reload the main interface to see if the abstract syntax tree still makes sense.
Disclaimer: on the history of the dedicated repository
The dedicated repository is mostly a simplified form of free static file hosting. Its GitHub history may be cleaned from time to time to save space.
Also, even operations which are not yet automated (in addition to automated ones, such as using GitHub Actions to pull data from other places) are likely to be committed under a bot account such as @eticaaibot.