Data Conversion
This is the first functionality to be implemented within the service. It lets the user create Linked Data from his own raw data.
Unfortunately, we are not yet able to generate Linked Data automatically, so the user has to help the service convert the generic data to Linked Data. Below is an illustration of the process of converting data to Linked Data.
A quick explanation of the steps in the image above:
- Load data set: the application needs to be able to interact with the data
- Classify columns: the application needs to know which columns contain URIs and which contain literals
- Link data: the application needs to know the relations between columns/classes
- Download result: the user can download and/or publish his data set
In short, each sub-component gathers the information/instructions needed for the next step of the process.
Compared with the original software (OpenRefine with the Google Refine extension), which puts classification and linking in a single interface, this application divides them into separate steps. The reason for this split is to enforce that literals cannot be the subject of a statement within an ontology.
OpenRefine is designed for handling and transforming large amounts of data. The Google Refine extension provides a way to create an RDF skeleton, but within this view the links must be created in a row-based structure. A linked graph is a better way to represent this structure.
The data conversion component consists of one main component and four sub-components.
- DataCreation
The DataCreation component is responsible for managing the data that needs to persist between steps and for converting that data into the formats its sub-components use.
The DataCreation component consists of a tab view populated with the four steps the user takes. The user cannot click on a tab to navigate between stages, because if the data changes, the application cannot guarantee that the following steps are still valid. The tabs only show the user how far along he is in the process.
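The tab gating described above can be sketched as a small state machine in which only "next" and "previous" change the step, and going back discards the results of later steps. This is a minimal sketch: the step names, the state shape ({ step, results }) and the nextStep/previousStep names are hypothetical, not fields of the actual component.

```javascript
// Hypothetical names for the four steps shown as tabs.
const STEPS = ['load', 'classify', 'link', 'download'];

// The tabs themselves are not clickable; only next/previous change the step.
function nextStep(state) {
  return { ...state, step: Math.min(state.step + 1, STEPS.length - 1) };
}

// Going back keeps the results of the steps up to and including the one the
// user returns to, and discards the results of every later step, because they
// may no longer be valid once earlier data changes.
function previousStep(state) {
  const step = Math.max(state.step - 1, 0);
  const results = {};
  for (const name of STEPS.slice(0, step + 1)) {
    if (name in state.results) {
      results[name] = state.results[name];
    }
  }
  return { ...state, step, results };
}
```

For example, stepping back from the link step to the classify step keeps the classifications but drops the edges drawn in the graph.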
name | type | required | description |
---|---|---|---|
executeQuery | function | ✔️ | Function that runs a query on the datastore. It must be called with the query and a callback function |
The second step of the data conversion is to classify the data types. Once the user has done this, we know which columns contain URIs and which contain literal values.
- As a user, I want to be able to assign classes to my data in order to let the computer know what we are 'talking' about.
- As a user, I want to specify a base URI if my data doesn't contain URIs yet, in order to create valid Linked Data.
- As a user, I want to use common vocabularies to classify my columns in order to create better linkable data.
- As a user, I want a way to undo my classification if I made an error.
- A Literal cannot have a base URI or a type
- When a base URI is specified, a new column containing these URIs needs to be created
- At least one column must be classified as a URI before the user is able to continue
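The rules above can be checked with two small predicates. This is a sketch that assumes classification objects shaped like the ones setData builds further below ({ columnName, exampleValue, class, uri }) plus an optional baseUri field; treating any class other than Literal as "a type" is an assumption, and the function names are hypothetical.

```javascript
// Rule: a literal column cannot have a base URI or a type.
// Assumption: "having a type" means a class other than the default Literal.
function isValidClassification(c) {
  if (c.uri) return true;
  const hasType = c.class && c.class.name !== 'Literal';
  return !c.baseUri && !hasType;
}

// Rule: the user may continue only when at least one column is a URI
// and every column obeys the literal rule.
function canContinue(classifications) {
  return classifications.some(c => c.uri) &&
    classifications.every(isValidClassification);
}
```

The continue button in the interaction bar would then be disabled whenever canContinue returns false.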
This component consists of three parts:
- Interaction bar
- Table representation
- Classification Dialog
The interaction bar only contains the continue button, which is disabled until the user has classified at least one column as a URI.
Each row of the table represents one column of the data set and contains:
- Column header
- First value of the column
- URI check box
- Current classification
- Base URI
- Reset button
The user's main interaction is clicking a URI check box, which opens the classification dialog. When the user confirms the classification of the column, the check box is checked and disabled, and the reset button appears. Clicking the reset button restores the column's data to its original state.
The purpose of this dialog is to let the user specify a base URI and a classification for a column.
The dialog consists of two steps. In the first step, the user can specify a base URI. When the base URI field is filled in, a new column containing the generated URIs will be created.
The second step requires the user to pick a classification from the vocabulary library. The user types a search term; pressing Enter or clicking the search button fires a query to the API, which returns at most ten options. The user then commits the classification and the dialog closes.
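The search step can be sketched as follows. searchVocabularies is a hypothetical stand-in for the vocabulary API call, injected so that no specific endpoint has to be assumed; it is written synchronously for brevity, whereas a real API call would be asynchronous. Skipping empty search terms is also an assumption.

```javascript
const MAX_OPTIONS = 10; // The dialog shows at most ten options

// Called when the user presses Enter or clicks the search button.
function runSearch(term, searchVocabularies) {
  const query = term.trim();
  if (query === '') return []; // Don't fire a query for an empty search term
  return searchVocabularies(query).slice(0, MAX_OPTIONS);
}

// Keyboard handler for the search field: only Enter triggers a search.
function onSearchKeyDown(event, term, searchVocabularies) {
  return event.key === 'Enter' ? runSearch(term, searchVocabularies) : null;
}
```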
This component only requires part of the data from the previous step. To create the structure it needs, the application uses the following function in the DataCreation component:
```javascript
setData(data, filename) {
  let dataClassifications; // One classification object per column
  if (data.length > 1) { // A header row plus at least one row of data
    dataClassifications = data[0].map((column, index) => ({ // For each column header (first row of data)
      columnName: column, // Name it after the header value
      exampleValue: data[1][index], // The first actual value (second row)
      class: { name: 'Literal' }, // Every column is a literal by default
      uri: false, // ...so it is not flagged as a URI
    }));
  } else { // No data rows were found
    dataClassifications = []; // So there are no columns or example values
  }
  this.setState({ // Update the state
    data,
    dataClassifications,
    filename,
  });
}
```
name | type | required | description |
---|---|---|---|
data | Array | ✔️ | An array of data description objects, as built in the code example above |
setClass | function | ✔️ | Function called with the column index and the new class object of this column |
setURI | function | ✔️ | Function called with the column index and the new state of the URI flag of the data object |
setBaseUri | function | ✔️ | Function called with the column index and the new base URI of the object |
nextPage | function | ✔️ | Callback function called when the user clicks continue |
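The props in the table above could be backed by immutable updates of the classification array. In this sketch the real props' (index, value) signature gets the current array as an extra first argument so the functions are pure instead of calling this.setState; updateAt and resetClassification are hypothetical helpers, and resetting to the defaults from setData is an assumption about what the reset button restores.

```javascript
// Generic immutable update of one column's classification.
function updateAt(classifications, index, changes) {
  return classifications.map((c, i) => (i === index ? { ...c, ...changes } : c));
}

// Pure versions of the props; the real props take (index, value) and update state.
const setClass = (classifications, index, newClass) =>
  updateAt(classifications, index, { class: newClass });
const setURI = (classifications, index, uri) =>
  updateAt(classifications, index, { uri });
const setBaseUri = (classifications, index, baseUri) =>
  updateAt(classifications, index, { baseUri });

// Reset button: restore the defaults assigned by setData and drop the base URI.
function resetClassification(classifications, index) {
  return classifications.map((c, i) => {
    if (i !== index) return c;
    const { baseUri, ...rest } = c; // Copy everything except the base URI
    return { ...rest, class: { name: 'Literal' }, uri: false };
  });
}
```

Returning new arrays keeps the original state untouched, which fits React's expectation that state is replaced rather than mutated.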
This component is used for creating relations between the now-classified columns of data.
- As a user, I want to be able to create relations between my classes in order to link my data
- As a user, I want the relations to use existing vocabularies in order to make my data understandable to other users
- As a user, I want to be able to see context about the data I'm linking in order to make better decisions
- As a user, I want a relationship between a label and its URI to be created automatically when a new URI column is generated
- As a user, I want to easily differentiate between nodes that represent URIs and nodes that represent literals
- Relationships cannot be subjects within an ontology
This page contains one main component: a graph built with React-Digraph.
The user can draw edges between nodes with a shift-drag motion and delete them by selecting an edge and pressing Delete on the keyboard. Functions for creating and deleting nodes are not needed: the user cannot create new data, and nodes without relations do not end up in the output of the subsequent conversion algorithm.
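A minimal sketch of what the edge callbacks could do, including a check that rejects edges starting at a literal node, since literals cannot be subjects. The edge shape ({ source, target, predicate }), the node type values 'uri' and 'literal', and the pure (edges, ...) signatures are assumptions for illustration, not the component's actual code.

```javascript
// Returns a new edge list with the edge added, or the unchanged list
// if the edge is not allowed.
function pushEdge(edges, nodes, edge) {
  const source = nodes.find(n => n.id === edge.source);
  const target = nodes.find(n => n.id === edge.target);
  if (!source || !target) return edges; // Unknown node
  if (source.type === 'literal') return edges; // Literals cannot be subjects
  return [...edges, edge];
}

// Called when the user selects an edge and presses Delete.
function deleteEdge(edges, edge) {
  return edges.filter(e => !(
    e.source === edge.source &&
    e.target === edge.target &&
    e.predicate === edge.predicate
  ));
}
```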
The page also contains a status bar where the user can go back or forward in the process. However, going back discards all progress made on this screen, because changes in the Data Classify View invalidate the state of the Data Link View.
The graph view needs a set of nodes before the user can build the relation structure. Using the information gathered in the previous step, the application determines which types of nodes need to be displayed:
```javascript
function nodeCreation(data, classifications) {
  let nodes = [];
  const edges = [];
  classifications.forEach((classification, index) => { // For each column
    if (classification.baseUri) { // The column has a base URI specified
      nodes.push( // Push a node for the original column
        {
          ...
        });
      nodes.push( // Push a node for the newly generated URI column
        {
          ...
        });
      edges.push( // Push the relation between the two columns
        {
          ...
        },
      );
      let newRow = data.map((dataRow, rowIndex) => { // Build the new column, one value per row
        // Column header
        if (rowIndex === 0) {
          return `${classification.class.name}_uri`; // Name the column
        }
        // Empty cell: return an empty value if there is no original value
        if (!dataRow[index]) {
          return '';
        }
        // Look up the first row with the same value (this row always matches itself)
        const like = data.filter(rowData => rowData[index] === dataRow[index]);
        if (like.length > 0) {
          return like[0][index];
        }
        return '';
      });
      newRow = newRow.map((item, idx) => { // Transform the values into URIs
        let baseUri = classification.baseUri;
        if (idx === 0 || !item) {
          return item; // Don't change the header or empty cells
        }
        if (baseUri.startsWith('www')) { // Complete the base URI if it has no scheme
          baseUri = `http://${baseUri}`;
        } else if (!baseUri.startsWith('http')) {
          baseUri = `http://www.${baseUri}`;
        }
        if (!baseUri.endsWith('/')) { // Make sure it ends with a slash
          baseUri += '/';
        }
        return baseUri + item;
      });
      data.forEach((dataRow, idx) => dataRow.push(newRow[idx])); // Append the new column to the data matrix (mutates data)
    } else if (classification.class.uri) { // The column contains URIs
      nodes.push(
        {
          ...
        });
    } else { // Otherwise it is a literal
      nodes.push(
        {
          ...
        });
    }
  });
  // Distribution algorithm
  nodes = distribute(nodes); // Distribute the nodes within a grid
  return ({
    data, // The new data with possibly added columns
    edges, // The edges
    nodes, // The nodes
  });
}
```
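The base URI handling inside nodeCreation can be isolated into two small functions, which makes the normalization rules easy to test on their own. This sketch follows the same completion rules as the code above (prefix http:// or http://www. when the scheme is missing, ensure a trailing slash), leaves empty cells empty, and additionally percent-encodes each value with encodeURIComponent, which the code above does not do. normalizeBaseUri and mintUriColumn are hypothetical names.

```javascript
// Complete a user-supplied base URI using the rules from nodeCreation.
function normalizeBaseUri(input) {
  let baseUri = input;
  if (baseUri.startsWith('www')) {
    baseUri = `http://${baseUri}`;
  } else if (!baseUri.startsWith('http')) {
    baseUri = `http://www.${baseUri}`;
  }
  return baseUri.endsWith('/') ? baseUri : `${baseUri}/`;
}

// Build the new URI column: a header first, then one URI per data row.
// Values are percent-encoded so that spaces and similar characters
// still produce valid URIs (an addition to the original behavior).
function mintUriColumn(data, columnIndex, className, baseUriInput) {
  const baseUri = normalizeBaseUri(baseUriInput);
  return data.map((row, rowIndex) => {
    if (rowIndex === 0) return `${className}_uri`;
    const value = row[columnIndex];
    return value ? baseUri + encodeURIComponent(value) : '';
  });
}
```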
The design of this component is one of the key features that sets this service apart from the alternatives. Using a graph to interact with the data structure represents the relations between columns better than a row-based view.
In the original design, the user would drag the nodes onto the canvas and toggle between a mouse mode and a line-drawing mode. However, this added an unnecessarily complicated step, since nodes without relations are filtered out of the data in the next conversion anyway. One unfortunate limitation of React-Digraph is that such controls cannot be added to it.
To make it easier for the user to tell URI nodes apart from literal nodes, the two kinds of nodes are drawn with different shapes.
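The nodeCreation code calls a distribute helper that places the nodes within a grid, and URI and literal nodes are drawn with different shapes. The sketch below shows one possible way to do both; it is not the helper used by the service. The node fields (id, title, type, x, y) follow the shape React-Digraph nodes commonly use, with the type mapped to a shape through the graph's node type configuration, but the grid size, the spacing and the 'uri'/'literal' type names are assumptions.

```javascript
const GRID_COLUMNS = 4;   // Hypothetical number of nodes per grid row
const GRID_SPACING = 200; // Hypothetical distance between nodes in pixels

// Create a node for a classified column; its type selects the shape.
function toNode(classification, index) {
  return {
    id: String(index),
    title: classification.columnName,
    type: classification.uri ? 'uri' : 'literal',
  };
}

// Place nodes on a grid, left to right and top to bottom.
function distribute(nodes) {
  return nodes.map((node, i) => ({
    ...node,
    x: (i % GRID_COLUMNS) * GRID_SPACING,
    y: Math.floor(i / GRID_COLUMNS) * GRID_SPACING,
  }));
}
```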
name | type | required | description |
---|---|---|---|
nodes | Array | ✔️ | An array of nodes |
links | Array | ✔️ | An array of edges |
getExampleData | function | ✔️ | function to retrieve the first ten values of a column |
pushEdge | function | ✔️ | function called when the user creates an edge |
deleteEdge | function | ✔️ | function called when the user removes an edge |
previousPage | function | ✔️ | Callback function called when the user clicks previous |
nextPage | function | ✔️ | Callback function called when the user clicks continue |
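The getExampleData prop supplies the context shown next to the graph. A minimal sketch, assuming the same row matrix as elsewhere (row 0 holds the headers); skipping empty cells and the optional limit parameter are assumptions.

```javascript
// Return the first ten non-empty values of a column, skipping the header row.
function getExampleData(data, columnIndex, limit = 10) {
  return data
    .slice(1) // Skip the header row
    .map(row => row[columnIndex])
    .filter(value => value !== undefined && value !== '')
    .slice(0, limit);
}
```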
The download view component is responsible for generating four different output types:
- Turtle (an RDF graph notation)
- JSON-LD (a JSON-based RDF graph notation)
- N-Triples (a raw, line-based textual representation of an RDF graph)
- SPARQL (a SPARQL query that inserts these graphs)
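To illustrate how two of these outputs relate, the sketch below serializes a list of triples to N-Triples and wraps the same lines in a SPARQL INSERT DATA update that targets a named graph. The triple shape ({ subject, predicate, object, literal }) and the function names are assumptions; literals are written as plain strings without datatypes or language tags, and the escaping covers the characters that may not appear unescaped in an N-Triples string (backslash, double quote, line feed, carriage return).

```javascript
// Escape a literal for N-Triples (backslash first, so later escapes survive).
function escapeLiteral(value) {
  return value
    .replace(/\\/g, '\\\\')
    .replace(/"/g, '\\"')
    .replace(/\n/g, '\\n')
    .replace(/\r/g, '\\r');
}

// One N-Triples line per triple; objects are IRIs unless flagged as literals.
function toNTriples(triples) {
  return triples
    .map(({ subject, predicate, object, literal }) => {
      const obj = literal ? `"${escapeLiteral(object)}"` : `<${object}>`;
      return `<${subject}> <${predicate}> ${obj} .`;
    })
    .join('\n');
}

// Wrap the same triples in a SPARQL update that inserts them into a named graph.
function toSparqlInsert(graphUri, triples) {
  return `INSERT DATA {\n  GRAPH <${graphUri}> {\n${toNTriples(triples)}\n  }\n}`;
}
```

Because N-Triples lines are also valid SPARQL triple patterns, the SPARQL output can reuse the N-Triples serializer.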
- As a user, I want to download the newly generated data set
- As a user, I want to publish my data set online
- As a user, I want to have a choice of output format
- As a user, I want to be notified when an upload succeeds
- As a developer, I want all graphs to follow the same naming structure so that they can be identified in an open graph
- As a user, I want my graph to have metadata so that I know which graph is mine
- Data cannot be downloaded or published when there is no data to download/publish
The component consists of a download button, a publish button, an output type selector and the generated output.
The download button uses an href containing a data URI to download the file. Because React re-renders the component on every update, the href always reflects the current output. Below is the implementation of the download button.
```jsx
<RaisedButton
  label="download"
  href={`data:${this.state.dataType};charset=utf-8,${encodeURIComponent(
    this.state.displayText)}`}
  download={`${this.props.filename}${this.state.text}`}
  disabled={this.props.processing}
/>
```
The output type selector is a select field with the four options described above. Besides switching the format of the data available for download and of the data visible on the screen, it also changes the file type:
Format | File-type |
---|---|
Turtle | .turtle |
JSON-LD | .json |
N-Triples | .txt |
SPARQL | .txt |
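The table above can be kept as a single lookup that drives both the file name and the data URI used by the download button. The extensions come from the table; the MIME types are the registered media types for each format and are an assumption about what the component's dataType state holds. FORMATS and buildDownload are hypothetical names, not part of the original component.

```javascript
// Format lookup: extensions from the table, MIME types assumed.
const FORMATS = {
  turtle: { extension: '.turtle', mimeType: 'text/turtle' },
  jsonld: { extension: '.json', mimeType: 'application/ld+json' },
  ntriples: { extension: '.txt', mimeType: 'application/n-triples' },
  sparql: { extension: '.txt', mimeType: 'application/sparql-update' },
};

// Returns the href and download attributes for the download button.
// An unknown format throws, because FORMATS[format] is undefined.
function buildDownload(format, text, filename) {
  const { extension, mimeType } = FORMATS[format];
  return {
    href: `data:${mimeType};charset=utf-8,${encodeURIComponent(text)}`,
    download: `${filename}${extension}`,
  };
}
```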
For the output viewer, React-Highlight, a React implementation of Highlight.js, is used. It provides styling for various languages and makes the output easier to read.
When the user clicks the publish button, a dialog appears in which the user must fill in metadata about the data set. The dialog contains four fields:
Name | Description | editable |
---|---|---|
Dataset URI | The URI the graph will receive when published | ❌ |
Title | The title of the data set | ✔️ |
Description | A short description of what the data is about | ✔️ |
Date | The date of creation of this data set | ❌ |
The dataset URI is generated programmatically from the URL of the page, with the title appended.
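A minimal sketch of generating the two non-editable fields, assuming the title is turned into a URL-safe slug before it is appended to the page URL, and that the date is stored as an ISO 8601 calendar date in UTC. The slug rules, slugify and buildMetadata are hypothetical; the document only states that the title is appended to the page URL.

```javascript
// Turn a title into a URL-safe path segment (hypothetical slug rules).
function slugify(title) {
  return title
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // Collapse other characters into single hyphens
    .replace(/^-+|-+$/g, '');    // Strip leading and trailing hyphens
}

// Build the metadata shown in the publish dialog.
function buildMetadata(pageUrl, title, description, now = new Date()) {
  const base = pageUrl.endsWith('/') ? pageUrl : `${pageUrl}/`;
  return {
    datasetUri: base + slugify(title),   // Not editable: page URL plus title
    title,                               // Editable
    description,                         // Editable
    date: now.toISOString().slice(0, 10), // Not editable: creation date (UTC)
  };
}
```

Deriving every dataset URI the same way gives all published graphs a uniform naming structure.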
RDF-PAQT is the result of the bachelor thesis of Gerwin Bosch commissioned by the Kadaster