Releases: gleanerio/gleaner
IGSN Sprint Release
This is a release of Gleaner based on updates made during the IGSN2040 multi-week sprint. The sprint is documented elsewhere; it focused on testing the structured-data-on-the-web pattern for the IGSN PID architecture.
Some improvements here include:
- gzipped sitemap support (by popular request)
- improved headless support for dynamic JSON-LD injection (headless support is still rather implementation-specific and needs to be more generic)
- improved performance for large resource counts
- better object store layout
- now based on JSON-LD 1.1, for better support of more advanced context patterns
- thread count now controlled from the config file
- delay for indexing calls can be set in the config file
- SHACL service URL now set in the config file (allows use of cloud-based SHACL services)
- general performance improvements along the way (fixed some bad code loops) ;)
Note: this version will not work with older config file formats. Be sure to add the thread and delay parameters to your config file. I'll fix that in a later version.
Onebucket MkIII
Notes for the screencast
The screencast video is at: https://www.youtube.com/watch?v=12figImXgDk
I made a few small changes to some directory/bucket locations and file names, so there are a few small differences between the video and this release.
Get the files we need from the releases section of the GitHub repo:
https://github.com/earthcubearchitecture-project418/gleaner/releases
If you are using the Docker containers, create a directory for their data volume and set DATAVOL to point at it.
For example:
mkdir /home/tmp/dv
export DATAVOL=/home/tmp/dv
Grab any context files we use.
For now that is just schema.org. This step is not required, but it is highly recommended.
curl -L -H "Accept: application/ld+json" -H "Content-Type: application/ld+json" https://schema.org > jsonldcontext.jsonld
Get the MinIO client (or use your web browser):
Ref: https://docs.min.io/docs/minio-client-complete-guide.html
wget https://dl.min.io/client/mc/release/linux-amd64/mc
chmod 755 mc
./mc config host add minio http://0.0.0.0:9000 gleaneraccess gleanersecret --api S3v4
After running Gleaner, you can find the output graphs in the bucket and load the data into Jena:
./mc cat minio/gleaner/results/runid/samplesearth_graph.nq | curl -X POST --header "Content-Type:application/n-quads" --data-binary @- http://localhost:3030/demo/data
The great burn MkII
Picks up some edits needed for the demo that I missed.
It's been a while...
This release is the result of an effort to start REMOVING things from Gleaner. Gleaner has been a bit of a playground for me, and as a result it started to bloat and, worse, to break. I'm moving my experimenting elsewhere and getting Gleaner back to focusing on just harvesting and validating JSON-LD data graphs from the web.
This is a release from the onebucket branch, where I am changing Gleaner to work with a single bucket (S3, MinIO, Google Cloud, etc.) and to organize content with object prefixes inside it.
I haven't merged this yet, as there is still work to do to remove deprecated code and resolve some code duplication introduced during this process.
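With a single bucket, what used to be separate buckets become key prefixes inside one bucket. The gleaner/results/runid/... path used with mc in the Onebucket MkIII notes is one example. The Go sketch below illustrates the idea: keys are built by joining prefix segments, and listing a prefix works like an S3-style list call with a prefix filter. The summoned prefix, the hash-style file names, and the helper names are hypothetical, and an in-memory slice stands in for the object store.

```go
package main

import (
	"fmt"
	"path"
	"sort"
	"strings"
)

// objectKey joins a stage prefix, a group (source name or run ID), and an
// object name into a single key inside the one bucket.
func objectKey(stage, group, name string) string {
	return path.Join(stage, group, name)
}

// listPrefix returns the keys under prefix, sorted, much like an S3-style
// list call with a prefix filter. A trailing "/" is enforced so that
// "results/run1" does not also match "results/run10".
func listPrefix(keys []string, prefix string) []string {
	if !strings.HasSuffix(prefix, "/") {
		prefix += "/"
	}
	var out []string
	for _, k := range keys {
		if strings.HasPrefix(k, prefix) {
			out = append(out, k)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	bucket := []string{
		objectKey("summoned", "samplesearth", "a1b2c3.jsonld"),
		objectKey("summoned", "samplesearth", "d4e5f6.jsonld"),
		objectKey("results", "runid", "samplesearth_graph.nq"),
	}
	fmt.Println(listPrefix(bucket, "summoned/samplesearth"))
	fmt.Println(listPrefix(bucket, "results"))
}
```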
The great burn
It's been a while...
This release is the result of an effort to start REMOVING things from Gleaner. Gleaner has been a bit of a playground for me, and as a result it started to bloat and, worse, to break. I'm moving my experimenting elsewhere and getting Gleaner back to focusing on just harvesting and validating JSON-LD data graphs from the web.
This is a release from the onebucket branch, where I am changing Gleaner to work with a single bucket (S3, MinIO, Google Cloud, etc.) and to organize content with object prefixes inside it.
I haven't merged this yet, as there is still work to do to remove deprecated code and resolve some code duplication introduced during this process.
The Old Dutch Church release
This release updates the code to address some issues with indexed sites that dynamically place the JSON-LD into the page DOM.
These sites use JavaScript to call back to a server, obtain the JSON-LD, and then update the DOM with it. To process these pages, a rendering service must be in place that lets Gleaner render the page, run the JavaScript, and get the DOM updated with the JSON-LD.
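Rendering is only half of the job. Once the page's JavaScript has run, the JSON-LD still has to be pulled out of the resulting HTML. Below is a simplified Go sketch of that extraction step. It assumes the rendered HTML is already available as a string (the rendering call itself is not shown), finds each <script type="application/ld+json"> block, and keeps the ones that parse as JSON. The regular expression is a stand-in for a proper HTML parser, and extractJSONLD is a hypothetical name, not Gleaner's code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
	"strings"
)

// ldScript matches <script ... type="application/ld+json" ...>BODY</script>,
// case-insensitively and across newlines. A simplification of real HTML parsing.
var ldScript = regexp.MustCompile(`(?is)<script[^>]*type\s*=\s*["']application/ld\+json["'][^>]*>(.*?)</script>`)

// extractJSONLD returns every syntactically valid JSON-LD block in the
// (already rendered) HTML, in document order.
func extractJSONLD(html string) []string {
	var out []string
	for _, m := range ldScript.FindAllStringSubmatch(html, -1) {
		body := strings.TrimSpace(m[1])
		if json.Valid([]byte(body)) {
			out = append(out, body)
		}
	}
	return out
}

func main() {
	// Before rendering, the JSON-LD has not been injected yet.
	static := `<html><head><title>Dataset</title></head><body></body></html>`
	rendered := `<html><head><title>Dataset</title>
<script type="application/ld+json">{"@context":"https://schema.org/","@type":"Dataset","name":"Example"}</script>
</head><body></body></html>`
	fmt.Println(len(extractJSONLD(static)), "block(s) before rendering")
	fmt.Println(len(extractJSONLD(rendered)), "block(s) after rendering")
}
```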
An error in the Docker Compose file, a change to Chrome sandboxing, and a "bug" left over from the P418 days of the code all combined to break this. Actually, each one alone was enough to break it; I just had several issues at once for redundancy in failure.
This is an updated release that I hope resolves all of these.
Pilot take 2
I forgot to update the zip file with the new compose file and caught it during filming. So, take 2.
Hollywood Pilot Edition
This is a roll-up of some of the updates made during a CODATA meeting. It is also the basis for the first draft of the getting started documentation.
Been a while.. regression fix
Sorry for the long gap since the last release. I've been working on a major code reduction: removing a lot of code and replacing some parts with better community libraries. For example, I now use Viper for config file management. I have also been looking at replacing the sitemap code with a community package, but I need to verify that it can handle some edge cases we have with sitemaps and robots files.
I'm also looking to add back the ability to read config files from the object store, so I can run the gleaner binary as a CLI from a Docker image. You can do this now, but passing a config file to a container from the command line is a bit tricky, so I want to make that easier. At that point we would be able to deploy the entire system with a single Docker Compose file.
It will NOT be long until the next release. I plan to push releases out more regularly going forward.
Are We There Yet?
An updated version I'm using to help build out the Gleaner demo for the EarthCube Annual Meeting in Denver in June 2019.
The "not quite ready" release
This release is "runnable" outside my own setup. However, it lacks the documentation anyone else would need to understand it, so I guess it's "not quite ready".
This release also tests the release process as a step toward that. 2.0.4 should be a first cut.
The basic steps to run it will be:
- Use Docker to bring up the supporting images
- Set your environment variables for connecting to those containers
- Ensure the needed buckets and config file are ready and present (the code will do a sanity check and help with that)
- Download the binary from the release and run it with the required flags
We're almost there!