conversion:Enhancement
csv2rdf4lod automates the invocation of a converter that is controlled using explicit specifications described by a conversion vocabulary whose namespace is http://purl.org/twc/vocab/conversion/. This is done to minimize human error, increase consistency and quality of the resulting RDF representation, and enable transparency and accessibility using provenance. It also provides interpretation metadata that can be efficiently queried to increase discoverability and [reapplied](Reusing enhancement parameters for multiple versions or datasets) to other datasets that share similar structures.
These enhancement parameters have been refined over the past four years as they have been applied to hand-curate hundreds of datasets from dozens of source organizations. Our experience indicates that a small set of RDFS-inspired principles applies across widely varying table structures and provides significant advantages for both the curator and subsequent data consumers.
Although order does not matter for the predicates on a conversion:Enhancement, the following order is suggested because it aligns with the part of the asserted triple that each enhancement affects. This order is FULLY conceptual; the order in which enhancements are specified (and applied) is irrelevant. It could, however, guide the design of a GUI for constructing enhancement parameters.
- ov:csvCol - references the column that this enhancement affects.
- conversion:fromCol / conversion:toCol - shorthand for referencing more than one column on a single enhancement.
- conversion:property_name - an alternative way to reference the column via its resulting predicate's local name. NOTE: not implemented.
- ov:csvHeader - PURELY an (OWL) annotation property; a "poor-man's provenance" retrieved from the CSV header to aid identification between 1) the original data file, 2) the enhancements modifying it, and 3) its resulting instance data. The converter does not look for this value, nor does it behave differently with or without it, or when it changes. It exists only for human reference. See conversion:label.
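As a rough illustration of the conversion:fromCol / conversion:toCol shorthand, the sketch below expands one range enhancement into per-column enhancements. The dict representation and the `expand_column_range` name are hypothetical (this is not the converter's code), and the sketch assumes both bounds are inclusive and 1-based, like ov:csvCol.

```python
def expand_column_range(enhancement):
    """Expand a fromCol/toCol shorthand into one enhancement per column.

    `enhancement` is a hypothetical dict whose keys mirror the vocabulary's
    local names. Bounds are assumed to be inclusive and 1-based, like
    ov:csvCol. An enhancement without fromCol is returned unchanged.
    """
    if "fromCol" not in enhancement:
        return [dict(enhancement)]
    first = enhancement["fromCol"]
    last = enhancement.get("toCol", first)
    if last < first:
        raise ValueError("toCol must not precede fromCol")
    # Copy every other parameter onto each single-column enhancement.
    shared = {k: v for k, v in enhancement.items() if k not in ("fromCol", "toCol")}
    return [dict(shared, csvCol=col) for col in range(first, last + 1)]


# Prints one dict per column: 3, 4 and 5, each carrying range xsd:integer.
for e in expand_column_range({"fromCol": 3, "toCol": 5, "range": "xsd:integer"}):
    print(e)
```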
Enhancements that affect the subject of the triple produced:
- a conversion:{Omitted, Only_if_column, DataStartRow} - aborts the assertion of a triple 1) always or 2) if the cell in the current column is empty, respectively.
conversion:bundled_by - changes the "location" from which to draw the subject of the triples instantiated by the current column.
- conversion:name_template - to name an implicit bundle.
- a conversion: conversion:name_template - to make the implicit bundle a bnode.
- conversion:type_name - to type the implicit bundle.
- a conversion:{ExampleResource, SubjectAnnotation} - flags a row as containing an exemplary resource that should become a void:exampleResource (TODO: describe subject annotation).
- conversion:domain_template - changes the URI used to name the subject (see DEPRECATING: conversion:domain_template).
- conversion:domain_name - to specify a class name to type resources created for the rows of the table. A local class URI is constructed from this label.
- conversion:subject - to specify triple patterns using [context-free templates](Using template variables to construct new values).
- conversion:predicate - (when ov:csvCol 0) provides an arbitrary predicate for an additional description of the subject resource created.
- conversion:object - (when ov:csvCol 0) provides an arbitrary object for an additional description of the subject resource created.
- conversion:object_search - annotates the subject by searching the object literal.
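The two omission behaviours can be sketched as a per-row filter. This follows the descriptions on this page: a column typed conversion:Omitted never yields a triple, and (as in the structural listing below) an empty cell in a conversion:Only_if_column column suppresses the row. The `triples_for_row` and `cell` functions and their column-spec dicts are hypothetical, and the converter's default handling of other empty cells is deliberately left out.

```python
def cell(row, col):
    """Return the value at 1-based column `col`, or "" if the row is short."""
    return row[col - 1] if col - 1 < len(row) else ""


def triples_for_row(subject, row, columns):
    """Return (subject, predicate, object) tuples for one table row.

    `columns` maps a 1-based ov:csvCol to a hypothetical spec dict with a
    local "predicate" name and an optional set of enhancement "types".
    """
    # An empty cell in any Only_if_column column suppresses the whole row.
    for col, spec in columns.items():
        if "Only_if_column" in spec.get("types", ()) and not cell(row, col).strip():
            return []
    triples = []
    for col in sorted(columns):
        spec = columns[col]
        if "Omitted" in spec.get("types", ()):
            continue  # Omitted columns never produce a triple.
        triples.append((subject, spec["predicate"], cell(row, col)))
    return triples


columns = {
    1: {"predicate": "name"},
    2: {"predicate": "code", "types": {"Only_if_column"}},
    3: {"predicate": "note", "types": {"Omitted"}},
}
print(triples_for_row("row1", ["Alice", "A1", "x"], columns))
print(triples_for_row("row2", ["Bob", "", "y"], columns))  # []
```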
Enhancements that affect the predicate of the triple produced:
- conversion:equivalent_property - used to specify an external URI for this column, without the local-external redundancy of subproperty_of.
- conversion:label - will become the rdfs:label of the predicate created for the triples instantiated by the current column.
- conversion:comment - will become the rdfs:comment of the predicate created for the triples instantiated by the current column.
- conversion:subproperty_of - identifies additional predicates to use for the triples instantiated by the current column.
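A minimal sketch of how these predicate enhancements could shape the output, assuming (from the descriptions above) that conversion:equivalent_property replaces the local predicate outright, that each conversion:subproperty_of URI gets its own copy of the data triple plus an rdfs:subPropertyOf assertion, and that conversion:label and conversion:comment describe only a locally created predicate. The `describe_column` function, its dict keys, and the example.org URIs are hypothetical.

```python
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def describe_column(local_predicate, enhancement, subject, value):
    """Return (s, p, o) tuples for one cell and for the predicate it uses.

    `enhancement` is a hypothetical dict keyed by the vocabulary's local names.
    """
    external = enhancement.get("equivalent_property")
    # equivalent_property: use the external URI instead of a local predicate.
    predicate = external or local_predicate
    triples = [(subject, predicate, value)]
    # subproperty_of: repeat the value under each additional predicate.
    for super_property in enhancement.get("subproperty_of", []):
        triples.append((subject, super_property, value))
        triples.append((predicate, RDFS + "subPropertyOf", super_property))
    if external is None:
        # label/comment describe the locally created predicate.
        if "label" in enhancement:
            triples.append((predicate, RDFS + "label", enhancement["label"]))
        if "comment" in enhancement:
            triples.append((predicate, RDFS + "comment", enhancement["comment"]))
    return triples


for t in describe_column(
    "http://example.org/vocab/population",
    {"label": "population", "subproperty_of": ["http://example.org/ext/pop"]},
    "http://example.org/id/1",
    "42",
):
    print(t)
```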
Enhancements that affect the object of the triple produced:
- conversion:eg - gives an example value from a cell in the column. Present only for human reference.
- a conversion:{Repeat_previous_if_empty_column, LargeValue}
- conversion:repeat_previous
- conversion:interpret
- conversion:pattern - specify how to parse an input cell value into a date or dateTime.
- conversion:delimits_object - specifies a delimiter regex to parse the input value into multiple objects.
- conversion:object - provides the template for the up-value in cell based conversions.
- conversion:range_template - changes the name of the object.
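Two of the parsing enhancements above can be sketched with the standard library: `re.split` stands in for conversion:delimits_object and `datetime.strptime` for conversion:pattern. `strptime` uses Python's `%`-style directives, which are a stand-in here and not necessarily the pattern syntax the converter accepts; trimming whitespace and dropping empty pieces are also assumptions of this sketch, and the function names are hypothetical.

```python
import re
from datetime import datetime


def split_objects(cell, delimiter_regex):
    """Split one cell into several object values (cf. conversion:delimits_object).

    Assumes whitespace around each piece is trimmed and empty pieces dropped.
    """
    pieces = (piece.strip() for piece in re.split(delimiter_regex, cell))
    return [piece for piece in pieces if piece]


def to_xsd_date(value, strptime_format):
    """Parse a cell (cf. conversion:pattern) into an xsd:date lexical form."""
    return datetime.strptime(value, strptime_format).date().isoformat()


values = [to_xsd_date(v, "%m/%d/%Y") for v in split_objects("01/31/2012 | 02/29/2012", r"\|")]
print(values)  # ['2012-01-31', '2012-02-29']
```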
Enhancements that affect the descriptions of the object of the triple produced:
- conversion:range - {rdfs:Literal, rdfs:Resource, xsd:integer, xsd:decimal}
- a conversion:Unlabeled - suppresses rdfs:labels on resources promoted from a particular column.
- conversion:multiplier - a factor applied to the cell value for any numeric conversion:range.
- conversion:range_name - to specify a class name to type resources promoted from cell values. A local class URI is constructed from this label.
- conversion:links_via - cites lod-link graphs that can be used to assert owl:sameAs from the subject or object of a triple created during conversion.
- conversion:subject_of - specifies the predicate in the lod-link file that should behave as an owl:InverseFunctionalProperty. If specified, it overrides the default predicates dcterms:title and dcterms:identifier.
- conversion:keys - specifies additional attributes that the link-via resources must exhibit in order to infer identity.
- a conversion:IncludesLODLinks - transcludes the links-via graph as part of the conversion output.
- conversion:predicate - (when ov:csvCol > 0) provides an arbitrary predicate for an additional description of the object resource created.
- conversion:object - (when ov:csvCol > 0) provides an arbitrary object for an additional description of the object resource created.
- conversion:object_label_property - specifies additional properties to assert for the label of a promoted resource object (in addition to rdfs:label and dcterms:identifier).
- conversion:class_name - cites the label of a local class created by another enhancement; that class becomes an rdfs:subClassOf the class cited by conversion:subclass_of.
- conversion:subclass_of - the URI or template citing the superclass of a local class.
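For numeric ranges, conversion:range and conversion:multiplier can be pictured as producing a typed literal. This sketch is hypothetical: it handles only xsd:integer and xsd:decimal, uses `Decimal` to avoid floating-point rounding, and assumes the multiplier scales the parsed value before it is written out. An xsd:integer result that is no longer a whole number is rejected rather than truncated, which is a design choice of the sketch, not documented converter behaviour.

```python
from decimal import Decimal

XSD = "http://www.w3.org/2001/XMLSchema#"


def typed_literal(value, range_type, multiplier=None):
    """Return (lexical form, datatype URI) for a cell with a numeric range."""
    if range_type == "xsd:integer":
        number = Decimal(int(value))
    elif range_type == "xsd:decimal":
        number = Decimal(value)
    else:
        raise ValueError("this sketch only handles xsd:integer and xsd:decimal")
    if multiplier is not None:
        # str() keeps e.g. 0.001 exact instead of its binary float expansion.
        number *= Decimal(str(multiplier))
    if range_type == "xsd:integer":
        if number != number.to_integral_value():
            raise ValueError("multiplied value is not an integer")
        return str(int(number)), XSD + "integer"
    return str(number), XSD + "decimal"


print(typed_literal("12.5", "xsd:decimal", 1000))  # ('12500.0', 'http://www.w3.org/2001/XMLSchema#decimal')
print(typed_literal("7", "xsd:integer", 1000))     # ('7000', 'http://www.w3.org/2001/XMLSchema#integer')
```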
- Structural conversion:Enhancements:
- conversion:charset - to specify the character encoding of the input file.
- conversion:HeaderRow - to specify the row that contains header data (or [dimensional values](Converting with cell based subjects)).
- conversion:DataStartRow - to specify the first (inclusive) row that contains data.
- conversion:delimits_cell - to specify the character that terminates a cell.
- conversion:Only_if_column - to omit processing a row if a certain column's value is missing.
- conversion:Repeat_previous_if_empty_column - to "downfill" an empty cell with the value from above.
- conversion:repeat_previous - to specify a value that indicates repetition (instead of just an empty value).
- conversion:Omitted - to specify a column to omit.
- conversion:DataEndRow - to specify the last (inclusive) row that contains data.
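Several structural enhancements can be pictured as a pre-processing pass over the raw file. In this hypothetical sketch the parameters mirror conversion:charset, conversion:HeaderRow, conversion:DataStartRow, conversion:DataEndRow, conversion:delimits_cell and conversion:Repeat_previous_if_empty_column; row and column numbers are assumed to be 1-based with inclusive row bounds, and Python's `csv` module stands in for the converter's own parser.

```python
import csv
import io


def read_table(raw, header_row, data_start_row, data_end_row=None,
               charset="utf-8", delimiter=",", downfill_cols=()):
    """Return (header, data_rows) from the raw bytes of a delimited file."""
    text = raw.decode(charset)  # conversion:charset
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))  # delimits_cell
    header = rows[header_row - 1]  # conversion:HeaderRow
    end = len(rows) if data_end_row is None else data_end_row
    data = []
    previous = {}
    # DataStartRow and DataEndRow are both treated as inclusive.
    for row in rows[data_start_row - 1:end]:
        row = list(row)
        for col in downfill_cols:  # Repeat_previous_if_empty_column
            i = col - 1
            if i >= len(row):
                continue
            if row[i].strip():
                previous[col] = row[i]
            else:
                row[i] = previous.get(col, "")  # "downfill" from the row above
        data.append(row)
    return header, data


raw = b"region,site,count\nA,s1,10\n,s2,20\nB,s3,30\nTOTAL,,60\n"
header, data = read_table(raw, header_row=1, data_start_row=2, data_end_row=4, downfill_cols=(1,))
print(header)  # ['region', 'site', 'count']
print(data)    # [['A', 's1', '10'], ['A', 's2', '20'], ['B', 's3', '30']]
```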
- Vocabulary conversion:Enhancements
- Enhancements that provide owl:sameAs triples
- conversion:links_via - cites lod-link files that can be used to assert owl:sameAs from the subject or object.
- conversion:subject_of - identifies the predicate in the lod-link file that should behave as an owl:InverseFunctionalProperty.
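The owl:sameAs behaviour of conversion:links_via and conversion:subject_of can be sketched as an index lookup: the identifying predicate is treated as inverse functional, so a resource in the lod-link graph that shares a literal value with a promoted cell value is asserted to be the same. The function name and example.org URIs are hypothetical, matching is assumed to be exact string equality, and conversion:keys is not modelled.

```python
DCTERMS = "http://purl.org/dc/terms/"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_links(local_resources, link_triples, subject_of=None):
    """Return owl:sameAs tuples linking local resources to lod-link resources.

    local_resources: maps a local URI to the cell value it was promoted from.
    link_triples: (subject, predicate, literal) tuples from a lod-link graph.
    subject_of: optional predicate that overrides dcterms:title/identifier.
    """
    keys = {subject_of} if subject_of else {DCTERMS + "title", DCTERMS + "identifier"}
    # Treat the key predicates as inverse functional: a shared literal value
    # identifies the same resource.
    index = {}
    for s, p, o in link_triples:
        if p in keys:
            index.setdefault(o, set()).add(s)
    return [
        (local, OWL_SAME_AS, external)
        for local, value in local_resources.items()
        for external in sorted(index.get(value, ()))
    ]


links = [
    ("http://example.org/ext/NY", DCTERMS + "identifier", "NY"),
    ("http://example.org/ext/VT", DCTERMS + "title", "Vermont"),
]
local = {
    "http://example.org/local/state/NY": "NY",
    "http://example.org/local/state/VT": "Vermont",
}
for triple in same_as_links(local, links):
    print(triple)
```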
conversion:includes should be placed in the position corresponding to the types of enhancements it is including. For example, if it is including conversion:symbol/conversion:interpretation pairs, place it at the position that conversion:interpret would go.
Enhancement parameters is a less organized listing of the same enhancements shown here.
- Enhancement Parameters Reference was the original one-stop shop for the enhancements that could be performed, but its MediaWiki syntax does not render well on GitHub, so I'm breaking it up into this page and pages for each enhancement.