This repository contains the vocabulary mappings that are used for dereferencing in Metis. The structure of this repository is described below.
The file src/main/resources/directory.yml is a YAML file containing an index of all vocabularies.
It consists of a list of objects, each consisting of a metadata file and a mapping file. One such
object could be:
# YSO
- metadata: vocabularies/concept/yso.yml
  mapping: vocabularies/concept/yso.xsl
These values all refer to file locations relative to this directory file. Developers are free to choose any directory structure they like, and this setup also allows multiple vocabulary declarations to use the same mapping XSLT. However, a metadata file may not be reused: each such file must occur only once in the directory. Some uniqueness constraints on the metadata will be enforced or assumed, and the file location will be used as the vocabulary ID.
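The reuse rules above can be sketched as a small check. This is only an illustration, not the repository's actual validation code: it assumes the directory file has already been parsed into a list of dictionaries, and the second entry (other.yml) is a made-up example.

```python
from collections import Counter

# Hypothetical in-memory form of directory.yml: one dict per vocabulary,
# with locations given relative to the directory file.
directory = [
    {"metadata": "vocabularies/concept/yso.yml",
     "mapping": "vocabularies/concept/yso.xsl"},
    {"metadata": "vocabularies/concept/other.yml",  # made-up second entry
     "mapping": "vocabularies/concept/yso.xsl"},    # reusing a mapping is allowed
]


def find_duplicate_metadata(entries):
    """Return the metadata locations that occur more than once, sorted."""
    counts = Counter(entry["metadata"] for entry in entries)
    return sorted(path for path, n in counts.items() if n > 1)


def vocabulary_ids(entries):
    """Use each metadata file location as the vocabulary ID."""
    duplicates = find_duplicate_metadata(entries)
    if duplicates:
        raise ValueError(f"metadata files reused: {duplicates}")
    return [entry["metadata"] for entry in entries]
```

Calling vocabulary_ids(directory) on the entries above returns the two metadata locations, while a directory that lists the same metadata file twice is rejected.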
We have chosen to keep all metadata in separate files so as to better compartmentalize any changes that developers might make to vocabularies, and to allow for adding more information to them in the future without the risk of bloating this directory file.
Each vocabulary in the directory file has exactly one unique metadata file. This YAML file contains the following information:
- name (String value): the unique human-readable name of the vocabulary, by which you can recognise the vocabulary. This field is obligatory.
- types (list of String values): the type(s) of the vocabulary, i.e. the kind(s) of conceptual classes that are generated by this mapping. The possible values are AGENT, CONCEPT, PLACE and TIMESPAN. This field is obligatory and at least one value must be given.
- paths (list of String values): the path(s) in the entity's ID value (rdf:about or equivalent) that this vocabulary will apply to. These values function as prefixes, so they must at least include the scheme and the host. This should be as precise as possible so that the vocabulary is not triggered (and no expensive transformations are performed) unless strictly necessary. Multiple values can be given, meaning that the vocabulary will be considered if an entity's ID starts with any one of the given paths. At least one path must be given. These paths may not collide with each other or with paths from other vocabularies, in the sense that one is not allowed to be a substring of another. This guarantees that for any entity ID (rdf:about) there is always at most one vocabulary that matches it.
- suffix (String value): the suffix to be applied to the entity's ID value (rdf:about) in order to obtain a workable download URL. Common values are .edm or .rdf, but other values can be set. This field is optional (with the empty String as default value).
- parentIterations (Integer value): the number of times that we will resolve/dereference parent entity references (and include the entities in the dereference result). So this determines the maximum remoteness of parents (skos:broader or dc:isPartOf) that will be included. This value can be 0 to disable this behavior, but it cannot be negative. This field is optional (with 0 as default value).
- examples (list of String values): an optional list of examples of record IDs that should be supported by this vocabulary. This may be used for testing purposes (to check that applying this vocabulary to the given entity ID returns an object of the given type).
- counterExamples (list of String values): an optional list of examples of existing record IDs that should not be supported by this vocabulary. This may be used for testing purposes (to check that applying this vocabulary to the given entity ID neither fails nor returns a result).
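The path and suffix rules above could be sketched as follows. This is a minimal illustration under stated assumptions, not the dereference service's implementation: the function names are hypothetical, the second vocabulary (example.org) is invented, and the suffix is assumed to be appended directly to the entity ID.

```python
def paths_collide(path_a, path_b):
    """Two paths collide when one is a substring of the other."""
    return path_a in path_b or path_b in path_a


def find_collisions(vocabularies):
    """Return (path, path) pairs that collide, within or across vocabularies."""
    all_paths = [p for vocab in vocabularies for p in vocab["paths"]]
    return [
        (a, b)
        for i, a in enumerate(all_paths)
        for b in all_paths[i + 1:]
        if paths_collide(a, b)
    ]


def match_vocabulary(entity_id, vocabularies):
    """Return the vocabulary with a path that prefixes the ID, or None."""
    matches = [
        vocab for vocab in vocabularies
        if any(entity_id.startswith(p) for p in vocab["paths"])
    ]
    # With collision-free paths there can be at most one match.
    return matches[0] if matches else None


def download_url(entity_id, vocabulary):
    """Append the optional suffix (default: empty string) to the entity ID.

    Plain concatenation is an assumption made for this sketch.
    """
    return entity_id + vocabulary.get("suffix", "")


yso = {"name": "YSO", "paths": ["http://www.yso.fi/onto/yso/"]}
# Invented vocabulary, used only to show a non-empty suffix.
example = {"name": "Example", "paths": ["http://example.org/vocab/"],
           "suffix": ".rdf"}
vocabularies = [yso, example]
```

A lookup such as match_vocabulary("http://www.yso.fi/onto/yso/p5007", vocabularies) selects the YSO entry, and an ID that no path prefixes yields None, so no transformation would be attempted for it.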
Here follows an example metadata file:
name: YSO
types:
- CONCEPT
paths:
- http://www.yso.fi/onto/yso/
parentIterations: 0
examples:
- http://www.yso.fi/onto/yso/p5007
- http://www.yso.fi/onto/yso/p1808
- http://www.yso.fi/onto/yso/p4818
counterExamples:
- http://www.yso.fi/onto/yso/p105081
- http://www.yso.fi/onto/yso/p105069
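The field rules described earlier can be turned into a simple check over a parsed metadata file. The sketch below assumes the YAML has already been loaded into a dictionary; the function name and the messages are hypothetical, and the check that every example ID starts with a declared path is an inference from the prefix rule rather than a documented requirement.

```python
ALLOWED_TYPES = {"AGENT", "CONCEPT", "PLACE", "TIMESPAN"}


def validate_metadata(metadata):
    """Return a list of problems found in one metadata dict (empty if none)."""
    problems = []
    if not metadata.get("name"):
        problems.append("name is obligatory")
    types = metadata.get("types") or []
    if not types:
        problems.append("at least one type must be given")
    problems += [f"unknown type: {t}" for t in types if t not in ALLOWED_TYPES]
    paths = metadata.get("paths") or []
    if not paths:
        problems.append("at least one path must be given")
    # parentIterations is optional and defaults to 0.
    if metadata.get("parentIterations", 0) < 0:
        problems.append("parentIterations cannot be negative")
    # Assumption: an example ID should start with one of the declared paths.
    for example in metadata.get("examples", []):
        if not any(example.startswith(p) for p in paths):
            problems.append(f"example not covered by paths: {example}")
    return problems


# The YSO example metadata, written out as a Python dict.
yso_metadata = {
    "name": "YSO",
    "types": ["CONCEPT"],
    "paths": ["http://www.yso.fi/onto/yso/"],
    "parentIterations": 0,
    "examples": [
        "http://www.yso.fi/onto/yso/p5007",
        "http://www.yso.fi/onto/yso/p1808",
        "http://www.yso.fi/onto/yso/p4818",
    ],
    "counterExamples": [
        "http://www.yso.fi/onto/yso/p105081",
        "http://www.yso.fi/onto/yso/p105069",
    ],
}
```

Counter-examples are deliberately left out of the prefix check: in the YSO file they share the vocabulary's path and differ only in that dereferencing them should yield no result.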
Additionally, the file name (including the relative path) is guaranteed to be unique and may therefore be used as a unique identifier for the vocabulary. This means that it is not recommended to change the name or location of these files without emptying the caches in the dereference service.
Note: several old fields have been removed from this format.
- The url and rules fields have been merged into the paths field, to make things clearer.
- The typeRules field has been abandoned. Any behavior here can be specified more precisely and in a less error-prone way by modifying the XSLT mapping file.
One example of a so-called typeRule converted to an XSLT condition is the following:
<xsl:if test="rdf:type/@rdf:resource[.='http://www.yso.fi/onto/yso-meta/Concept']">
<!-- Mapping goes here -->
</xsl:if>
Each vocabulary in the directory file has exactly one mapping file. These are XSLT files that should comply with all rules governing this file type. Additionally, every mapping file:
- must declare a parameter by the name of targetId, and
- cannot declare any other parameters.
Note that a mapping is not required to use the targetId parameter. It is, however, recommended, and sometimes necessary, to do so in order to ensure that the data we receive is accurate. Such a mapping could look like this:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#" version="1.0">
  <xsl:param name="targetId"/>
  <xsl:output indent="yes" encoding="UTF-8"/>
  <xsl:template match="/rdf:RDF">
    <!-- Only process the entity whose rdf:about equals the requested ID -->
    <xsl:for-each select="./*[@rdf:about=$targetId]">
      <xsl:if test="rdf:type/@rdf:resource[.='http://www.yso.fi/onto/yso-meta/Concept']">
        <skos:Concept>
          <xsl:copy-of select="@rdf:about"/>
          <!-- Data mappings go here -->
        </skos:Concept>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
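The parameter rule for mapping files (exactly one parameter, named targetId) could be checked along the following lines. This is a rough sketch using only the Python standard library, not the check the service itself performs; the function names are hypothetical, and only top-level xsl:param declarations (direct children of the stylesheet element) are assumed to count.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"


def declared_parameters(xslt_text):
    """Return the names of the top-level xsl:param declarations."""
    root = ET.fromstring(xslt_text)
    # findall with a plain tag name only looks at direct children of the root.
    return [param.get("name") for param in root.findall(f"{{{XSL_NS}}}param")]


def has_valid_parameters(xslt_text):
    """True when targetId is the one and only declared parameter."""
    return declared_parameters(xslt_text) == ["targetId"]


# A stripped-down stylesheet, used only to exercise the check.
mapping = """<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:param name="targetId"/>
  <xsl:template match="/"/>
</xsl:stylesheet>"""
```

Because the stylesheet is parsed as XML first, a file that is not well-formed (for example one with a mismatched closing tag) fails with a parse error before the parameter check is reached.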