# FAQ
```
laptop:~/research/source/scraperwiki-com/uk-offshore-oil-wells/version/2011-Jan-24$ ./convert-uk-offshore-oil-wells.sh
--------------------------------------------------------------------------------
uk-offshore-oil-wells.csv
convert-uk-offshore-oil-wells.sh converting newlines of source/uk-offshore-oil-wells.csv
10720 rows in source/uk-offshore-oil-wells.csv
RAW CONVERSION
Error occurred during initialization of VM
Could not reserve enough space for object heap
Could not create the Java virtual machine.
cat: automatic/uk-offshore-oil-wells.csv.raw.ttl: No such file or directory
convert.sh done
convert-aggregate.sh delaying publishing until an enhancement is available.
To publish with only raw, set CSV2RDF4LOD_PUBLISH_DELAY_UNTIL_ENHANCED="false" in $CSV2RDF4LOD_HOME/my-csv2rdf4lod-source-me.sh.
To publish raw with enhanced, add enhancement to manual/uk-offshore-oil-wells.csv.e1.params.ttl and rerun convert...wells.sh
To force publishing now, run publish/bin/publish.sh
```
---

By default, csv2rdf4lod-automation requests 3 GB of memory for the converter, which is what triggers the "Could not reserve enough space for object heap" error above on machines that cannot provide it. To reduce the request, change the following in your [my-csv2rdf4lod-source-me.sh](Script: source me.sh):

```
export CSV2RDF4LOD_CONVERTER=""
```

to:

```
export CSV2RDF4LOD_CONVERTER="java edu.rpi.tw.data.csv.CSVtoRDF"
```

which lets the JVM use its default heap size. To request a specific amount of memory instead (say, 4 GB):

```
export CSV2RDF4LOD_CONVERTER="java -Xmx4096m edu.rpi.tw.data.csv.CSVtoRDF"
```
> I think there is a 0th column in your example:

```
conversion:enhance [
   ov:csvCol 0;
   conversion:predicate rdfs:seeAlso;
   conversion:object
      <https://www.og.decc.gov.uk/pls/wons/wdep0100.qryWell>;
];
```
The first column is column 1; we're humans, not computers. Any enhancement with `ov:csvCol 0` refers to the row itself, not to a column. See the descriptions at conversion:Enhancement and conversion:predicate, and the sketch below.
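A minimal sketch contrasting the two uses (the column number and label here are hypothetical, not from the original example):

```
# ov:csvCol 1 targets the first data column of each row.
conversion:enhance [
   ov:csvCol        1;
   conversion:label "Well Name";   # hypothetical column label
];

# ov:csvCol 0 targets the row resource itself, as in the example above.
conversion:enhance [
   ov:csvCol            0;
   conversion:predicate rdfs:seeAlso;
   conversion:object    <https://www.og.decc.gov.uk/pls/wons/wdep0100.qryWell>;
];
```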
- Dependencies are listed in Installing csv2rdf4lod automation - complete
```
bash-3.2$ export CLASSPATH=$CLASSPATH`$CSV2RDF4LOD_HOME/bin/util/cr-situate-classpaths.sh`
bash-3.2$ java edu.rpi.tw.data.csv.CSVtoRDF --version
CSVtoRDF: version 2012-Sep-06
```
An error like the following means that the cell value matched none of the date patterns declared for that column:

```
DATE FAILED: "1989-10-02" !~ 0 templates @ :thing_2 :completion_date
```
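A minimal sketch of a fix, assuming the failing value lives in a hypothetical column 7 and that the enhancement's conversion:pattern takes a Java SimpleDateFormat pattern:

```
conversion:enhance [
   ov:csvCol          7;            # hypothetical column number
   conversion:range   xsd:date;
   conversion:pattern "yyyy-MM-dd"; # matches values like "1989-10-02"
];
```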
- Well, I guess just Tetherless World :-/
- Please let us know if you know of any other adoptions.
- See List of SPARQL endpoints containing datasets produced by csv2rdf4lod
Running CSVHeaders without arguments prints its usage:

```
$ java edu.rpi.tw.data.csv.impl.CSVHeaders
usage: CSVHeaders <file> [--header-line headerLineNumber] [--delimiter delimiter]
```

To list the headers of manual/current.csv:

```
java edu.rpi.tw.data.csv.impl.CSVHeaders manual/current.csv
```

If the headers are on the second line and the file is pipe-delimited:

```
java edu.rpi.tw.data.csv.impl.CSVHeaders manual/current.csv --header-line 2 --delimiter \|
```

If the file is tab-delimited with headers on the first line:

```
java edu.rpi.tw.data.csv.impl.CSVHeaders source/gene2ensembl --header-line 1 --delimiter '\t'
```
When in a conversion cockpit, both manual/x.e1.params.ttl and manual/x.global.e1.params.ttl can appear (where x is some file name). The more common of the two, manual/x.e1.params.ttl, is created by cr-create-convert-sh.sh when the conversion trigger is pulled for the first time. If, however, [global parameters](Reusing enhancement parameters for multiple versions or datasets) are present, manual/x.global.e1.params.ttl is [generated from the global parameters](Generating enhancement parameters) (e.g., ../x.e1.params.ttl or ../e1.params.ttl) each time the conversion trigger is pulled. As the comments in manual/x.global.e1.params.ttl state, it should NOT be edited by hand. The page about the retrieval phase of the conversion process gives a good introduction to the directory structure conventions used and can help you understand where [global parameters](Reusing enhancement parameters for multiple versions or datasets) need to be situated; a sketch is also given below. Once in place, the global parameters are recognized during the automated creation of a new Versioned Dataset.
The following exception can be handled using conversion:LargeValue (though first make sure that your input file really does have values longer than 100,000 characters!); a sketch follows the stack trace.
```
Processed 52913 rows in 0 min. Flushing 735726 + 158733 triples as ttl.
exception on row : 80595
java.io.IOException: Maximum column length of 100,000 exceeded in
column 3 in record 80,595. Set the SafetySwitch property to false if
you're expecting column lengths greater than 100,000 characters to
avoid this error.
        at com.csvreader.CsvReader.readRecord(Unknown Source)
        at edu.rpi.tw.data.csv.CSVParser.visitRecords(CSVParser.java:274)
        at edu.rpi.tw.data.csv.CSVtoRDF.toRDF(CSVtoRDF.java:1189)
        at edu.rpi.tw.data.csv.CSVtoRDF.main(CSVtoRDF.java:429)
java.io.IOException: This instance of the CsvReader class has already
been closed.
        at com.csvreader.CsvReader.checkClosed(Unknown Source)
        at com.csvreader.CsvReader.readRecord(Unknown Source)
        at edu.rpi.tw.data.csv.CSVParser.visitRecords(CSVParser.java:301)
        at edu.rpi.tw.data.csv.CSVtoRDF.toRDF(CSVtoRDF.java:1189)
        at edu.rpi.tw.data.csv.CSVtoRDF.main(CSVtoRDF.java:429)
```
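A minimal sketch of the enhancement, assuming conversion:LargeValue is declared as the range of the offending column (the column number comes from the error message above; consult the conversion:LargeValue documentation for the authoritative usage):

```
conversion:enhance [
   ov:csvCol        3;                       # the column named in the error message
   conversion:range conversion:LargeValue;   # assumption: LargeValue is applied via conversion:range
];
```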
FAQs elsewhere that should be consolidated into this page:
- http://logd.tw.rpi.edu/lab/faq/what_conventions_should_we_use_when_choosing_conversionsource_identifier_conversiondataset_identifier_and_conversionversion_identifier
- http://logd.tw.rpi.edu/lab/faq/how_do_we_identify_versions_abstract_dataset
- http://logd.tw.rpi.edu/lab/faq/why_csv2rdf4lods_cr-create-convert-shsh_not_creating_script_instead_just_printing_it_out