Calculate statistics for use in a Service Description or Void file

DDMAL/void-generator

 
 

A general introduction to "VoID-generator"

  1. VoID stands for "Vocabulary of Interlinked Datasets". It is a specification for describing the metadata of RDF datasets. For example, our LinkedMusic datalake holds various RDF graphs in one SPARQL endpoint, and VoID can describe the metadata of the different graphs and their interrelationships. This is useful for managing the datalake, and it also plays a role in the workflow that facilitates NLQ2SPARQL using LLMs, particularly for generating federated SPARQL queries (queries across different databases).
  2. This repository was transferred from https://github.com/JervenBolleman/void-generator, which is referenced in the paper LLM-based SPARQL Query Generation from Natural Language over Federated Knowledge Graphs (https://arxiv.org/abs/2410.06062).

[Junjun Cao (junjun.cao@mail.mcgill.ca) commented]

A Detailed Statistics generator for a VoID file

This CLI tool automatically generates a VoID description as RDF for a public endpoint, given its URL.

Build

Building requires Java 17 and Maven 3.6+:

mvn package

Alternatively, you can use one of the pre-built releases.

How to use

  1. Always get permission from the operator of the endpoint before running this program.
  2. Always run it locally and directly against the endpoint, with no cache or proxy servers in between.
  3. If you are using Virtuoso, connecting via JDBC is much faster than connecting via the HTTP SPARQL protocol.
  4. Check the help output for the exact options.

The tool runs on a Java 17+ JDK. See the help output for usage:

java -jar target/void-generator-*uber.jar --help

Running against a local Virtuoso endpoint

# Note the localhost and "isql-t" port in the JDBC URL.
java -jar target/void-generator-*uber.jar \
    --user dba \
    --password dba \
    --virtuoso-jdbc=jdbc:virtuoso://localhost:1111/charset=UTF-8 \
    -r "https://YOUR_SPARQL_ENDPOINT/sparql" \
    -s void-file-locally-stored.ttl \
    -i "https://YOUR_SPARQL_ENDPOINT/.well-known/void"

The IRI (-i) should be the URL at which the VoID file/service description is published.

Running against a regular SPARQL endpoint

This works for any endpoint that supports the SPARQL protocol.

The example below uses the WikiPathways SPARQL endpoint:

java -jar target/void-generator-*-uber.jar -r https://sparql.wikipathways.org/sparql \
   -p https://sparql.wikipathways.org/sparql \
   --void-file void-wikipathway.ttl \
   --iri-of-void 'https://rdf.wikipathway.org/.well-known/void#' \
   -g http://rdf.wikipathways.org/

What about all the options?

The command-line options exist to turn off certain queries/VoID features that may not run in a reasonable time on specific endpoints. By default, you should not need any of them.

Structure of a void file

General advice

When generating shapes or code from a VoID file, the main things to look for are the void:classPartitions. For most shape languages and programming languages, you would want to generate one shape or object-oriented class for each of them.

These void:classPartitions will have void:propertyPartitions and void-ext:datatypePartitions. The property partitions lead to other resources (objects denoted by an IRI or BNode), while the datatype partitions lead to literal values.

The void:propertyPartitions appear as objects of triples whose predicate is void:subjectsTarget. The subject of such a triple is also of type void:Linkset, and its void:objectsTarget points to a different resource that is the object of a void:classPartition triple.
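
As an illustration of the structure described above, the following sketch writes a small hand-made VoID fragment to a file and lists the classes that have a class partition. The fragment is not output of the tool: it is modelled on the triple patterns used by the SPARQL queries in this README, and the ex: IRIs, the file name example-void.ttl and all counts are made up.

```shell
# Hand-written illustration only: the ex: IRIs and the counts are invented.
cat > example-void.ttl <<'EOF'
@prefix void: <http://rdfs.org/ns/void#> .
@prefix void-ext: <http://ldf.fi/void-ext#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex: <http://example.org/> .

ex:graph void:classPartition ex:cpPerson , ex:cpCity .

# One class partition per class: a candidate shape or object-oriented class.
ex:cpPerson void:class ex:Person ;
    void:entities 2 ;
    void:propertyPartition ex:ppName , ex:ppBornIn .

ex:cpCity void:class ex:City ;
    void:entities 1 .

# A property that leads to literal values: a candidate field with a datatype.
ex:ppName void:property ex:name ;
    void-ext:datatypePartition [ void-ext:datatype xsd:string ] .

# A property that leads to other resources: a candidate reference to a class.
ex:ppBornIn void:property ex:bornIn ;
    void:classPartition [ void:class ex:City ; void:triples 2 ] .
EOF

# List the classes that would each become one shape or class.
grep -o 'void:class ex:[A-Za-z]*' example-void.ttl | sort -u
```

In this sketch ex:Person would become a class with a string field for ex:name and a reference to ex:City for ex:bornIn. A real generator would read the file with an RDF library rather than grep.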

Logging for debugging

-Dorg.slf4j.simpleLogger.defaultLogLevel=debug -Dorg.slf4j.simpleLogger.showDateTime=true -Dorg.slf4j.simpleLogger.log.org.apache=info
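
The flags above are JVM system properties for the SLF4J simple logger, so they have to appear before -jar. Anything after the jar path is passed to the tool as a program argument instead. A minimal sketch, where run_void_generator_debug is a hypothetical helper name and the jar path is the one used in the examples above:

```shell
# Hypothetical helper: run the tool with debug logging enabled.
# All arguments are passed through to void-generator unchanged.
run_void_generator_debug() {
    java \
        -Dorg.slf4j.simpleLogger.defaultLogLevel=debug \
        -Dorg.slf4j.simpleLogger.showDateTime=true \
        -Dorg.slf4j.simpleLogger.log.org.apache=info \
        -jar target/void-generator-*uber.jar "$@"
}
```

For example, run_void_generator_debug --help prints the help text with debug logging switched on.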

SPARQL queries to retrieve VoID descriptions

Once the VoID description Turtle file has been generated, you can upload it to your endpoint and retrieve its information with the SPARQL queries below.

Without subject/objects count:

PREFIX up: <http://purl.uniprot.org/core/>
PREFIX void: <http://rdfs.org/ns/void#>
PREFIX void-ext: <http://ldf.fi/void-ext#>
SELECT DISTINCT ?subjectClass ?prop ?objectClass ?objectDatatype
WHERE {
  {
    ?cp void:class ?subjectClass ;
        void:propertyPartition ?pp .
    ?pp void:property ?prop .
    OPTIONAL {
        {
            ?pp  void:classPartition [ void:class ?objectClass ] .
        	
        } UNION {
            ?pp void-ext:datatypePartition [ void-ext:datatype ?objectDatatype ] .
        }
    }
  } UNION {
    ?linkset void:subjectsTarget [ void:class ?subjectClass ] ;
      void:linkPredicate ?prop ;
      void:objectsTarget [ void:class ?objectClass ] .
  }
}

With subject/objects count:

PREFIX up: <http://purl.uniprot.org/core/>
PREFIX void: <http://rdfs.org/ns/void#>
PREFIX void-ext: <http://ldf.fi/void-ext#>
SELECT DISTINCT ?subjectsCount ?subjectClass ?prop ?objectClass ?objectsCount ?objectDatatype
WHERE {
  {
    ?cp void:class ?subjectClass ;
        void:entities ?subjectsCount ;
        void:propertyPartition ?pp .
    ?pp void:property ?prop .
    OPTIONAL {
        {
            ?pp  void:classPartition [ void:class ?objectClass ; void:triples ?objectsCount ] .
        	
        } UNION {
            ?pp void-ext:datatypePartition [ void-ext:datatype ?objectDatatype ] .
        }
    }
  } UNION {
    ?linkset void:subjectsTarget [ void:class ?subjectClass ; void:entities ?subjectsCount ] ;
      void:linkPredicate ?prop ;
      void:objectsTarget [ void:class ?objectClass ; void:entities ?objectsCount ] .
  }
}
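
Either query can be sent to the endpoint over the SPARQL protocol. The following is a minimal sketch using curl, assuming the query text has been saved to a file and that the endpoint accepts GET requests with a query parameter and can return JSON results. The helper name query_void is hypothetical.

```shell
# Hypothetical helper: run a SPARQL query stored in a file against an endpoint.
#   $1 = SPARQL endpoint URL
#   $2 = path to a file containing the query text
query_void() {
    # --get appends the URL-encoded "query" parameter to the endpoint URL;
    # --fail makes curl return a non-zero exit code on HTTP errors.
    curl --silent --show-error --fail --get \
        --header 'Accept: application/sparql-results+json' \
        --data-urlencode "query@$2" \
        "$1"
}
```

For example, with the first query saved as void-classes.rq (an assumed file name), query_void "https://YOUR_SPARQL_ENDPOINT/sparql" void-classes.rq prints the result bindings as JSON.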
