
Enhancing a CSV that describes another CSV's headers

Timothy Lebo edited this page Feb 14, 2012 · 18 revisions
csv2rdf4lod-automation is licensed under the [Apache License, Version 2.0](https://github.com/timrdf/csv2rdf4lod-automation/wiki/License)

Sometimes one CSV stores metadata about the headers of another CSV.

For example, manual/enviro-reports-and-indicators.csv contains the data:

ID No.,Title,Organization,Year
16,City of Bowie State of the Environment Report,Department of Planning and Economic Development,2009

and manual/definitions-of-fields.csv describes its column headings:

Column Heading,Definition
ID No.,A unique reference number to facilitate the identification of resources.

When the data CSV is converted, its first column becomes the predicate

http://logd.tw.rpi.edu/source/epa-gov-mcmahon-ethan/dataset/environmental-reports/vocab/enhancement/1/id_no

while the first row of the metadata CSV describes that same header (and therefore the resulting predicate) of the data CSV.

The objective is to name the subject of each metadata CSV row so that it matches the predicate created when the data CSV is converted. The following enhancement does this (see Using template variables to construct new values):

conversion:domain_template "[/sd]vocab/enhancement/[e]/[#1]";

This names the subject of the first metadata CSV row:

http://logd.tw.rpi.edu/source/epa-gov-mcmahon-ethan/dataset/environmental-reports/vocab/enhancement/1/ID_No
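
The expansion above can be imitated with a short Python sketch. It is not csv2rdf4lod's implementation: the helper names `to_local_name` and `expand_domain_template` are hypothetical, and the rule used to make a cell value URI-safe (runs of non-alphanumeric characters become an underscore, with leading and trailing underscores trimmed) is an assumption chosen only because it reproduces the URI shown on this page.

```python
import re
from typing import Sequence

# Values taken from the URIs shown on this page.
SOURCE_DATASET_BASE = (
    "http://logd.tw.rpi.edu/source/epa-gov-mcmahon-ethan"
    "/dataset/environmental-reports/"
)
ENHANCEMENT_ID = "1"


def to_local_name(value: str) -> str:
    """Assumed URI-safe form: non-alphanumeric runs become '_', edges trimmed."""
    return re.sub(r"[^A-Za-z0-9]+", "_", value).strip("_")


def expand_domain_template(template: str, row: Sequence[str]) -> str:
    """Expand [/sd], [e] and [#N] in a template string (illustrative only)."""
    result = template.replace("[/sd]", SOURCE_DATASET_BASE)
    result = result.replace("[e]", ENHANCEMENT_ID)
    # [#N] stands for the value in column N (1-based) of the current row.
    return re.sub(
        r"\[#(\d+)\]",
        lambda m: to_local_name(row[int(m.group(1)) - 1]),
        result,
    )


# First row of manual/definitions-of-fields.csv.
row = ["ID No.", "A unique reference number to facilitate the identification of resources."]

subject = expand_domain_template("[/sd]vocab/enhancement/[e]/[#1]", row)
predicate = SOURCE_DATASET_BASE + "vocab/enhancement/1/id_no"

print(subject)
print("exact match:", subject == predicate)
print("match ignoring case:", subject.lower() == predicate.lower())
```

Under these assumptions the subject and the predicate agree once case is ignored but are not identical strings, so they still denote different resources in RDF.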

The only remaining mismatch is letter case (ID_No versus id_no), so we can use the Codebook Enhancement to reinterpret the input value:

conversion:enhance [
   ov:csvCol         1;
   conversion:interpret [
      conversion:symbol         "ID No.";
      conversion:interpretation "id_no";
   ];
];
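
The effect of the codebook entry can be sketched in Python as a simple lookup. This is an illustration rather than csv2rdf4lod's code: the names `CODEBOOK` and `interpret` are hypothetical, and the sketch assumes the interpretation replaces the cell value before the subject URI is built.

```python
# Namespace taken from the URIs shown on this page.
BASE = (
    "http://logd.tw.rpi.edu/source/epa-gov-mcmahon-ethan"
    "/dataset/environmental-reports/vocab/enhancement/1/"
)

# Codebook for column 1: the symbol as it appears in the metadata CSV,
# mapped to the interpretation that should be used in its place.
CODEBOOK = {"ID No.": "id_no"}


def interpret(cell: str) -> str:
    """Return the codebook interpretation, or the cell unchanged if no symbol matches."""
    return CODEBOOK.get(cell, cell)


# Rows of manual/definitions-of-fields.csv (only the row shown on this page).
metadata_rows = [
    ("ID No.", "A unique reference number to facilitate the identification of resources."),
]

# Assumption: the interpreted value is what ends up in the subject URI.
subjects = [BASE + interpret(heading) for heading, _definition in metadata_rows]
data_predicate = BASE + "id_no"

print(subjects[0])
print("matches data CSV predicate:", subjects[0] == data_predicate)
```

In this sketch a heading with no codebook entry passes through unchanged, so each heading whose converted name differs from its original spelling would presumably need its own symbol and interpretation pair.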

Additional uses of this pattern

This pattern can also be used to describe datasets, as can be seen when enhancing data.gov's Dataset 92 (their data catalog for all other datasets).
