Tim L edited this page Jun 19, 2013 · 74 revisions
csv2rdf4lod-automation is licensed under the [Apache License, Version 2.0](https://github.com/timrdf/csv2rdf4lod-automation/wiki/License)

Q: memory error!

    laptop:~/research/source/scraperwiki-com/uk-offshore-oil-wells/version/2011-Jan-24$ ./convert-uk-offshore-oil-wells.sh 
    --------------------------------------------------------------------------------
    uk-offshore-oil-wells.csv
    convert-uk-offshore-oil-wells.sh converting newlines of source/uk-offshore-oil-wells.csv
    10720 rows in source/uk-offshore-oil-wells.csv
    RAW CONVERSION
    Error occurred during initialization of VM
    Could not reserve enough space for object heap
    Could not create the Java virtual machine.
    cat: automatic/uk-offshore-oil-wells.csv.raw.ttl: No such file or directory
       convert.sh done
    convert-aggregate.sh delaying publishing until an enhancement is available.
      To publish with only raw, set CSV2RDF4LOD_PUBLISH_DELAY_UNTIL_ENHANCED="false" in $CSV2RDF4LOD_HOME/my-csv2rdf4lod-source-me.sh.
      To publish raw with enhanced, add enhancement to manual/uk-offshore-oil-wells.csv.e1.params.ttl and rerun convert...wells.sh
      To force publishing now, run publish/bin/publish.sh
    ===========================================================================================

By default, csv2rdf4lod-automation requests 3GB of memory for csv2rdf4lod. To reduce that, in your [my-csv2rdf4lod-source-me.sh](Script: source me.sh), change:

    export CSV2RDF4LOD_CONVERTER=""

to:

    export CSV2RDF4LOD_CONVERTER="java edu.rpi.tw.data.csv.CSVtoRDF"

If you want to adjust the amount of memory to use (say, to 4GB):

    export CSV2RDF4LOD_CONVERTER="java -Xmx4096m edu.rpi.tw.data.csv.CSVtoRDF"
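For instance (a sketch assuming bash, and that your machine has roughly 1 GB to spare), you could cap the heap at 1 GB:

```shell
# Cap the converter's JVM heap at 1 GB instead of the 3 GB default.
# The class name is the same one used throughout this page.
export CSV2RDF4LOD_CONVERTER="java -Xmx1024m edu.rpi.tw.data.csv.CSVtoRDF"
echo "$CSV2RDF4LOD_CONVERTER"
```

Put the export in my-csv2rdf4lod-source-me.sh so it takes effect every time the environment is sourced.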

Q: Do you start counting at 0 or 1?

I think there is a 0th column in your example:

    conversion:enhance [
       ov:csvCol            0;
       conversion:predicate rdfs:seeAlso;
       conversion:object    <https://www.og.decc.gov.uk/pls/wons/wdep0100.qryWell>;
    ];

The first column is column 1. We're humans, not computers. Any enhancement with ov:csvCol 0 is referring to the row.
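So, as a minimal sketch reusing the properties from the example above, an enhancement that targets the values in the first column (rather than the row itself) would use ov:csvCol 1:

    conversion:enhance [
       ov:csvCol            1;
       conversion:predicate rdfs:seeAlso;
    ];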

See the descriptions at conversion:Enhancement and conversion:predicate.

Q: How do I install csv2rdf4lod-automation?

Q: What dependencies does csv2rdf4lod-automation have?

Q: What version of csv2rdf4lod.jar do I have?

    bash-3.2$ export CLASSPATH=$CLASSPATH`$CSV2RDF4LOD_HOME/bin/util/cr-situate-classpaths.sh`
    bash-3.2$ java edu.rpi.tw.data.csv.CSVtoRDF --version
    CSVtoRDF: version 2012-Sep-06

Q: The date failed

    DATE FAILED: "1989-10-02" !~ 0 templates @ :thing_2 :completion_date

See conversion:date_pattern.
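As a sketch only (the exact property names and pattern syntax should be checked against the conversion:date_pattern page; this assumes Java SimpleDateFormat-style patterns and a hypothetical column 2), telling the converter how to read values like "1989-10-02" might look like:

    conversion:enhance [
       ov:csvCol          2;
       conversion:range   xsd:date;
       conversion:pattern "yyyy-MM-dd";
    ];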

Q: Who is using csv2rdf4lod?

Q: How do I grab the headers of a file?

    $ java edu.rpi.tw.data.csv.impl.CSVHeaders
    usage: CSVHeaders <file> [--header-line headerLineNumber] [--delimiter delimiter]

    java edu.rpi.tw.data.csv.impl.CSVHeaders manual/current.csv

If the headers are the second line and the file is pipe delimited:

    java edu.rpi.tw.data.csv.impl.CSVHeaders manual/current.csv --header-line 2 --delimiter \|
    java edu.rpi.tw.data.csv.impl.CSVHeaders source/gene2ensembl --header-line 1 --delimiter '\t'
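A related trick, a plain-shell sketch that works without csv2rdf4lod on the classpath, is to number the header fields yourself; the line numbers match the 1-based ov:csvCol values:

```shell
# Number the header fields of a comma-delimited file so each column name
# can be matched to its 1-based ov:csvCol value. Shown on a sample file;
# substitute manual/current.csv in a real conversion cockpit.
printf 'well_id,name,completion_date\n' > /tmp/sample.csv
head -1 /tmp/sample.csv | tr ',' '\n' | nl
```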

Q: what is the difference between x.e1.params.ttl and x.global.e1.params.ttl?

When in a conversion cockpit, both manual/x.e1.params.ttl and manual/x.global.e1.params.ttl can appear (where x is some file name). The more common of the two, manual/x.e1.params.ttl, is created by cr-create-convert-sh.sh when the conversion trigger is pulled for the first time. If, however, [global parameters](Reusing enhancement parameters for multiple versions or datasets) are present, manual/x.global.e1.params.ttl is [generated from the global parameters](Generating enhancement parameters) (e.g., ../x.e1.params.ttl or ../e1.params.ttl) each time the conversion trigger is pulled. As the comments in manual/x.global.e1.params.ttl indicate, it should NOT be edited by hand. The page about the retrieval phase of the conversion process provides a good introduction to the directory structure conventions used and can help you understand where [global parameters](Reusing enhancement parameters for multiple versions or datasets) need to be situated. Once in place, the global parameters are recognized during automated creation of a new Versioned Dataset.
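To see which parameter files are in play for a dataset (a plain-shell sketch; paths follow the cockpit conventions described above), list them from the dataset's root:

```shell
# List every enhancement-parameter file under the current directory,
# including any generated *.global.e1.params.ttl copies.
find . -name '*params.ttl' | sort
```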

Q: Maximum column length of 100,000 exceeded?

The following exception can be handled using conversion:LargeValue (though make sure that your input file actually does contain values longer than 100,000 characters!):

    Processed 52913 rows in 0 min. Flushing 735726 + 158733 triples as ttl.
    exception on row : 80595
    java.io.IOException: Maximum column length of 100,000 exceeded in
    column 3 in record 80,595. Set the SafetySwitch property to false if
    you're expecting column lengths greater than 100,000 characters to
    avoid this error.
        at com.csvreader.CsvReader.readRecord(Unknown Source)
        at edu.rpi.tw.data.csv.CSVParser.visitRecords(CSVParser.java:274)
        at edu.rpi.tw.data.csv.CSVtoRDF.toRDF(CSVtoRDF.java:1189)
        at edu.rpi.tw.data.csv.CSVtoRDF.main(CSVtoRDF.java:429)
    java.io.IOException: This instance of the CsvReader class has already
    been closed.
        at com.csvreader.CsvReader.checkClosed(Unknown Source)
        at com.csvreader.CsvReader.readRecord(Unknown Source)
        at edu.rpi.tw.data.csv.CSVParser.visitRecords(CSVParser.java:301)
        at edu.rpi.tw.data.csv.CSVtoRDF.toRDF(CSVtoRDF.java:1189)
        at edu.rpi.tw.data.csv.CSVtoRDF.main(CSVtoRDF.java:429)
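Before reaching for conversion:LargeValue, it is worth confirming that the long values are real. A plain-awk sketch (naive about quoted commas, so treat it as a first check only) that reports any field longer than 100,000 characters:

```shell
# Print row number, column number, and length for any field over
# 100,000 characters. Naive CSV splitting: quoted commas will mislead it.
# Shown on a generated sample file; substitute your source CSV.
printf 'a,b\nshort,%s\n' "$(head -c 120000 /dev/zero | tr '\0' 'x')" > /tmp/wide.csv
awk -F',' '{for (i = 1; i <= NF; i++) if (length($i) > 100000) print NR, i, length($i)}' /tmp/wide.csv
# → 2 2 120000
```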

Historical note

FAQs elsewhere that should be consolidated to this page:
