This repository has been archived by the owner on Aug 18, 2023. It is now read-only.

Data Conversion

Gerwin Bosch edited this page Sep 15, 2017 · 13 revisions

This is the first functionality to be implemented within the service. It lets users create Linked Data from their own raw data.

Process

We are not yet able to generate Linked Data automatically, so the user has to help the service convert generic data to Linked Data. The illustration below shows the conversion process.

Sequence diagram Converting data

A quick explanation of the reasoning behind the diagram above:

Load data set - The application needs to be able to interact with the data.
Classify columns - The application needs to know which columns contain URIs and which contain literals.
Link data - The application needs to know the relations between columns/classes.
Download result - The user is able to download and/or publish the data set.

In short, each sub-component gathers the information and instructions needed for the next step of the process.

In the original software, OpenRefine with the Google Refine extension, classification and linking are combined in one interface. This application splits them into separate steps so that it can enforce that literals cannot be a subject within an ontology.

OpenRefine interface

OpenRefine is developed for handling and transforming large amounts of data. The Google Refine extension gives you a way to create an RDF skeleton. Within this view, you need to create links in a row-based structure. A linked graph is a better way to represent that structure.

Structure

The data conversion component consists of one main component and four sub-components.

The DataCreation component is responsible for handling the data that needs to persist and for converting it to the formats its sub-components use.

Design

The Data Creation component consists of a tab view populated with the four steps the user takes. The user cannot click on a tab to jump between stages, because if the data changes the application cannot guarantee that the later steps are still valid. The tabs give the user an idea of how far along they are in the process.
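The navigation rule can be sketched as a few pure functions. This is only an illustration: the step names come from the process description, while the function names and the index-based state are assumptions and not the component's actual code.

```javascript
// Step names taken from the process description; using array indices as the
// active-step state is an assumption made for this sketch.
const STEPS = ['Load data set', 'Classify columns', 'Link data', 'Download result'];

// Moving forward happens one step at a time (for example via a continue button).
function nextStep(currentIndex) {
  return Math.min(currentIndex + 1, STEPS.length - 1);
}

// Moving back also happens one step at a time.
function previousStep(currentIndex) {
  return Math.max(currentIndex - 1, 0);
}

// Clicking a tab never changes the active step; tabs only show progress.
function onTabClick(currentIndex, clickedIndex) {
  return currentIndex;
}
```

Keeping the tab click a no-op means a later step can never be shown with data from an earlier step that has since changed.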

The viewable part of the components

Variables

name type required description
executeQuery function ✔️ Function which runs a query on the datastore. It needs to be called with the query and a callback function.
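As a rough illustration of this calling convention, the sketch below passes a query string and a callback to a stand-in `executeQuery`. The in-memory stand-in, the `(error, result)` callback arguments, and the query text are all assumptions; the real datastore client may behave differently.

```javascript
// Hypothetical stand-in for the datastore: it only records the queries it receives.
const receivedQueries = [];

// Assumed shape: executeQuery(query, callback), where the callback receives
// (error, result). The real callback arguments are not documented on this page.
function executeQuery(query, callback) {
  if (typeof query !== 'string' || query.trim() === '') {
    callback(new Error('Query must be a non-empty string'), null);
    return;
  }
  receivedQueries.push(query);
  callback(null, { ok: true });
}

// Usage: the caller supplies both the query and the callback.
let lastResult = null;
executeQuery(
  'INSERT DATA { <http://example.org/s> <http://example.org/p> "o" }',
  (error, result) => {
    lastResult = error ? { ok: false, message: error.message } : result;
  },
);
```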


Data classification

The second part of the data conversion is classifying the data types. Once the user has done this, we know which columns contain URIs and which columns contain literal values.

Use Cases

  • As a user, I want to be able to assign classes to my data in order to let the computer know what we are 'talking' about.
  • As a user, I want to specify a base URI if my data doesn't contain URIs yet in order to create valid Linked Data.
  • As a user, I want to use common vocabularies to classify my columns in order to create better linkable data.
  • As a user, I want a way to undo my classification if I made an error.

Rules

  • A Literal cannot have a base URI or a type
  • When a base URI is specified, a new column of data needs to be created containing these URIs
  • At least one column must be classified as a URI before the user is able to continue
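The first and third rule can be expressed as small checks on the column descriptions. This is a minimal sketch, assuming each description has the `uri`, `class` and `baseUri` fields used elsewhere on this page; the function names are hypothetical. The second rule (creating the new URI column) is not covered here.

```javascript
// The user may continue once at least one column is classified as a URI.
function canContinue(classifications) {
  return classifications.some(c => c.uri === true);
}

// Returns the rule violations for one column description (empty when valid).
function validateClassification(c) {
  const problems = [];
  const isLiteral = !c.uri;
  if (isLiteral && c.baseUri) {
    problems.push('A literal cannot have a base URI');
  }
  if (isLiteral && c.class && c.class.name !== 'Literal') {
    problems.push('A literal cannot have a type');
  }
  return problems;
}
```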

Structure

This component consists of three sub-components:

  • Interaction bar
  • Table representation
  • Classification Dialog

Interaction bar

The interaction bar contains only the continue button, which is disabled by default until the user has classified at least one column as a URI.

Table representation

Each row of the table describes one column of the data set and contains the following:

  • Column header
  • First value of the column
  • URI check box
  • Current classification
  • Base URI
  • reset

The user interacts with the table by clicking a URI check box, which opens the dialog. When the user confirms the classification of the column, the check box is checked and disabled, and the reset button appears. Clicking the reset button resets the column to its original state.
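The confirm and reset behaviour can be sketched as two pure functions on a column description. This assumes the description shape used elsewhere on this page (`columnName`, `exampleValue`, `class`, `uri`, `baseUri`); the function names are hypothetical and the real component may update its state differently.

```javascript
// Applied when the user confirms the dialog: the column becomes a URI column.
function confirmClassification(description, classObject, baseUri) {
  return { ...description, class: classObject, uri: true, baseUri };
}

// Applied when the user clicks reset: back to the default literal state.
function resetClassification(description) {
  return {
    columnName: description.columnName,
    exampleValue: description.exampleValue,
    class: { name: 'Literal' },
    uri: false,
  };
}
```

Both functions return a new object instead of mutating the input, which fits the way React state is normally updated.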

Dialog

The purpose of this dialog is to let the user specify a base URI and a classification for a column. The dialog consists of two steps. In the first step, the user can specify a base URI. When the base URI field is filled in, a new column is created with the URI data.
The second step requires the user to pick a classification from the vocabulary library. The user types a search term; hitting enter or clicking the search button fires a query to the API. The user then gets a maximum of ten options. After picking one, the user can commit the classification and the dialog closes.

Data

This component requires only some data from the previous step. To create the structure it needs, the application uses the following function in the DataCreation component:

  setData(data, filename) {
    let dataClassifications; // Will hold one description object per column
    if (data.length > 1) { // A header row and at least one data row are present
      dataClassifications = data[0].map((column, index) => ({ // For each column header (first row of data)
        columnName: column, // Name it after the header of the column
        exampleValue: data[1][index], // The first actual value (second row)
        class: { name: 'Literal' }, // Everything is a literal by default
        uri: false, // So no column is flagged as a URI yet
      }));
    } else { // If no data was found
      dataClassifications = []; // There are no columns to describe
    }
    this.setState({ // Update the state
      data,
      dataClassifications,
      filename,
    });
  }
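To make the resulting structure concrete, here is the same mapping as a stand-alone function applied to a small sample. The function name `buildClassifications` and the sample data are illustrative and do not appear in the application.

```javascript
// Stand-alone version of the mapping used in setData (no React state involved).
function buildClassifications(data) {
  if (data.length <= 1) {
    return []; // A header row plus at least one data row is required
  }
  return data[0].map((column, index) => ({
    columnName: column,            // Header of the column
    exampleValue: data[1][index],  // First actual value of the column
    class: { name: 'Literal' },    // Literal by default
    uri: false,
  }));
}

// Sample data: the first row holds the column headers.
const sample = [
  ['name', 'city'],
  ['Alice', 'Enschede'],
  ['Bob', 'Utrecht'],
];

const classifications = buildClassifications(sample);
// classifications[0] is
// { columnName: 'name', exampleValue: 'Alice', class: { name: 'Literal' }, uri: false }
```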

Design

Default screen

Default Classify screen

Dialog

Classify Dialog step 1 Classify Dialog step 2

Variables

name type required description
data Array ✔️ An array of data description objects, as seen in the code example above
setClass function ✔️ function called with the column index and the new class object of this column
setURI function ✔️ function called with the column index and the new state of the URI flag of the data-object
setBaseUri function ✔️ function called with the column index and the new base-URI of the object
nextPage function ✔️ A Callback function which is called when a user clicks on continue

Data Link View

This component is used for creating relations between the now classified columns of data.

Use cases

  • As a user, I want to be able to create relations between my classes in order to link my data
  • As a user, I want the relations to use existing vocabularies in order to make my data understandable to other users
  • As a user, I want to be able to see context about the data I'm linking in order to make better decisions
  • As a user, I want to automatically have a relationship between a label and URI if a newly generated column is created
  • As a user, I want to easily differentiate between which nodes represent URIs and which represent literals

Rules

  • Relationships can not be subjects within an ontology

Structure

This page contains one main component, a React-Digraph implementation. The user can draw edges between nodes with a shift-drag motion and delete them by selecting them and hitting delete on the keyboard. The create-node and delete-node functions are not needed: the user cannot create data, and nodes without relations are left out of the data by the next conversion step.
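Because the application wants to enforce that literals are never the subject of a relation, a new edge could be checked before it is stored. The sketch below is one way to do that; the node shape (`id`, `isUri`), the edge shape passed to `pushEdge`, and the function names are assumptions, since the actual node objects are not shown on this page.

```javascript
// Hypothetical node shape: { id, isUri }.
// An edge runs from subject (source) to object (target), so only URI nodes
// may be used as the source.
function canCreateEdge(sourceNode, targetNode) {
  if (!sourceNode || !targetNode) {
    return false;
  }
  return sourceNode.isUri === true;
}

// Calls pushEdge only for valid edges; the argument shape is an assumption.
function tryPushEdge(sourceNode, targetNode, pushEdge) {
  if (!canCreateEdge(sourceNode, targetNode)) {
    return false;
  }
  pushEdge({ source: sourceNode.id, target: targetNode.id });
  return true;
}
```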

The page also contains a status bar where the user can go back and forward in the process. However, going back discards all progress within this screen, because changes in the Data Classify View invalidate the state of the Data Link View.

Data

The graph view requires at least nodes to let users create their relation structure. Using the information gathered in the previous step, we can determine which types of nodes need to be displayed.

function nodeCreation(data, classifications) {
  let nodes = [];
  const edges = [];
  classifications.forEach((classification, index) => { // For each column
    if (classification.baseUri) { // If a column has a baseUri specified
      nodes.push( // Push its own column
        {
          ...
        });
      nodes.push( // Push the new column
        {
          ...
        });
      edges.push( // Push the relation between them
        {
          ...
        },
      );
      let newRow = data.map((dataRow, rowIndex) => { // Create the new column as a copy of the original column
        // Column header
        if (rowIndex === 0) {
          return `${classification.class.name}_uri`; // Name the column
        }
        // Data is empty
        if (!dataRow[index]) { // Return empty value if no original value is there
          return '';
        }
        // Find the rows holding the same value
        const like = data.filter(rowData => rowData[index] === dataRow[index]); // Always includes the current row
        if (like.length > 0) {
          return like[0][index];
        }
        return '';
      });
      if (classification.baseUri) { // Transform the values into uri's
        newRow = newRow.map((item, idx) => {
          let baseUri = classification.baseUri;
          if (idx === 0) {
            return item; // Don't change the header
          }
          if (baseUri.startsWith('www')) { // If the base URI lacks a scheme, complete it
            baseUri = `http://${baseUri}`;
          } else if (!baseUri.startsWith('http')) {
            baseUri = `http://www.${baseUri}`;
          }
          if (classification.baseUri[classification.baseUri.length - 1] !== '/') { // Check if it ends with /
            baseUri += '/'; // If not add it
          }
          return baseUri + item;
        });
      }
      data.forEach((dataRow, idx) => dataRow.push(newRow[idx])); // Add the new column to the data matrix
    } else if (classification.class.uri) {// If a node is a URI
      nodes.push(
        {
           ...
        });
    } else { // Otherwise it is a literal
      nodes.push(
        {
           ...
        });
    }
  });
  // Distribution algorithm
  nodes = distribute(nodes); // Distribute the nodes within a grid
  return ({
    data, // the new data with possibly added columns
    edges, // the edges
    nodes, // the nodes
  });
}
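The base URI handling in the function above can be isolated into a helper, which makes the behaviour easier to see. This is a sketch, not the application's code: the names `normalizeBaseUri` and `buildUriColumn` are hypothetical. One deliberate difference is that empty cells stay empty here, whereas in the function above an empty cell ends up as the bare base URI. As in the original, values are appended without percent-encoding.

```javascript
// Completes a base URI the same way the function above does:
// add a scheme (and "www." when missing) and make sure it ends with "/".
function normalizeBaseUri(rawBaseUri) {
  let baseUri = rawBaseUri;
  if (baseUri.startsWith('www')) {
    baseUri = `http://${baseUri}`;
  } else if (!baseUri.startsWith('http')) {
    baseUri = `http://www.${baseUri}`;
  }
  if (!baseUri.endsWith('/')) {
    baseUri += '/';
  }
  return baseUri;
}

// Builds the new URI column for one source column (header row included).
function buildUriColumn(data, columnIndex, className, rawBaseUri) {
  const baseUri = normalizeBaseUri(rawBaseUri);
  return data.map((row, rowIndex) => {
    if (rowIndex === 0) {
      return `${className}_uri`; // Header of the new column
    }
    if (!row[columnIndex]) {
      return ''; // Assumption: keep empty cells empty
    }
    return baseUri + row[columnIndex];
  });
}

// normalizeBaseUri('example.org')            returns 'http://www.example.org/'
// normalizeBaseUri('www.example.org/people') returns 'http://www.example.org/people/'
// normalizeBaseUri('https://example.org/')   returns 'https://example.org/'
```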

Design

The design of this component is one of the key features of this service compared to alternatives. Using a graph to interact with the data structure lets us represent that structure better than a row-based view does.

Layout

In the original design, the user would drag nodes onto the canvas and toggle between mouse mode and line-drawing mode. However, this added an unnecessarily complicated step, as nodes without relations are filtered out of the data by the next conversion anyway. One unfortunate limitation of React-Digraph is that the controls cannot be added.

Comparison between the implementation and the design

Items

To make it easier for the user to tell URI nodes from literal nodes, the two are drawn with different shapes.

Variables

name type required description
nodes Array ✔️ An array of nodes
links Array ✔️ An array of edges
getExampleData function ✔️ function to retrieve the first ten values of a column
pushEdge function ✔️ function called when the user creates an edge
deleteEdge function ✔️ function called when the user removes an edge
previousPage function ✔️ A Callback function which is called when a user clicks previous
nextPage function ✔️ A Callback function which is called when a user clicks on continue
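A function matching the `getExampleData` description could look like the sketch below. It assumes the data matrix keeps the column headers in its first row, as in `setData`; the actual signature used by the application is not shown on this page.

```javascript
// Returns up to `limit` values of one column, skipping the header row.
function getExampleData(data, columnIndex, limit = 10) {
  return data
    .slice(1, 1 + limit)            // Skip the header, take the next `limit` rows
    .map(row => row[columnIndex]);  // Keep only the requested column
}
```

With fewer than ten data rows the function simply returns all available values.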

Download view

The download view component is responsible for generating four different output types:

  • Turtle (an RDF graph notation)
  • JSON-LD (a JSON RDF graph notation)
  • N-triples (a raw textual representation of an RDF graph)
  • SPARQL (a SPARQL query where these graphs are inserted)
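To give an idea of what two of these outputs look like, the sketch below serialises a list of triples as N-Triples and wraps them in a SPARQL `INSERT DATA` update. The triple shape (`subject`, `predicate`, `object`, `objectIsUri`) and the function names are assumptions made for this example; the application's own generator may work differently, and language tags and datatypes are left out.

```javascript
// Escapes the characters that must not appear raw inside a quoted literal.
function escapeLiteral(value) {
  return String(value)
    .replace(/\\/g, '\\\\')
    .replace(/"/g, '\\"')
    .replace(/\n/g, '\\n')
    .replace(/\r/g, '\\r');
}

// One line per triple: <subject> <predicate> <object-or-"literal"> .
function toNTriples(triples) {
  return triples
    .map((t) => {
      const object = t.objectIsUri ? `<${t.object}>` : `"${escapeLiteral(t.object)}"`;
      return `<${t.subject}> <${t.predicate}> ${object} .`;
    })
    .join('\n');
}

// Wraps the same triples in an INSERT DATA update for a named graph.
function toSparqlInsert(triples, graphUri) {
  const body = toNTriples(triples)
    .split('\n')
    .map(line => `    ${line}`)
    .join('\n');
  return `INSERT DATA {\n  GRAPH <${graphUri}> {\n${body}\n  }\n}`;
}

const exampleTriples = [
  {
    subject: 'http://example.org/Alice',
    predicate: 'http://xmlns.com/foaf/0.1/name',
    object: 'Alice "Al"',
    objectIsUri: false,
  },
];
// toNTriples(exampleTriples) returns
// <http://example.org/Alice> <http://xmlns.com/foaf/0.1/name> "Alice \"Al\"" .
```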

Use cases

  • As a user, I want to download the newly generated dataset.
  • As a user, I want to publish my data set online
  • As a user, I want to have a choice of output format
  • As a user, I want to be notified when an upload succeeds
  • As a developer, I want all graphs to have the same name structure in order to identify them in an open graph
  • As a user, I want my graph to have metadata in order to know which graph is mine

Rules

  • Data cannot be downloaded or published when there is no data to download/publish

Structure

The component consists of a download button, a publish button, an output-type selector and the generated output.

Download button

The download button uses an href to download the file. As React re-renders the page with every update, all the references are updated as well. Below is the implementation of the download button.

<RaisedButton
  label="download"
  href={`data:${this.state.dataType};charset=utf-8,${encodeURIComponent(
    this.state.displayText)}`}
  download={`${this.props.filename}${this.state.text}`}
  disabled={this.props.processing}
/>
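The `href` above is a data URI. Outside of JSX, the same string can be built with a small helper, shown here as a sketch; the function name is hypothetical and `text/turtle` is only an example media type.

```javascript
// Builds a data URI that embeds the text itself, percent-encoded.
function buildDownloadHref(dataType, text) {
  return `data:${dataType};charset=utf-8,${encodeURIComponent(text)}`;
}

const href = buildDownloadHref('text/turtle', '<a> <b> "c d" .');
// href starts with 'data:text/turtle;charset=utf-8,' followed by the encoded text
```

Because the content lives in the link itself, no server round trip is needed to download the file.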

The output type selector is a select field with the four options described above. Besides switching the format of the data available for download and visible on the screen, it also changes the file type.

Format File-type
Turtle .turtle
JSON-LD .json
N-triples .txt
SPARQL .txt
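The table above maps directly onto a lookup object. The sketch below shows one way to derive the download name from it; the constant and function names are hypothetical.

```javascript
// Extensions as listed in the table above.
const FILE_EXTENSIONS = {
  Turtle: '.turtle',
  'JSON-LD': '.json',
  'N-triples': '.txt',
  SPARQL: '.txt',
};

// Combines the data set's file name with the extension of the chosen format.
function downloadName(filename, format) {
  const extension = FILE_EXTENSIONS[format];
  if (extension === undefined) {
    throw new Error(`Unknown output format: ${format}`);
  }
  return `${filename}${extension}`;
}
```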

Output viewer

The output viewer uses React-highlight, a React implementation of Highlight.js. It provides styling for various languages and makes the output easier to read.

Publish button

When the user clicks the publish button, a dialog shows up in which the user is required to fill in metadata about the data set. The dialog contains four fields:

Name Description editable
Dataset URI The URI the graph will receive when published
Title The title of the data set ✔️
Description A short description of what the data is about ✔️
Date The date of creation of this data set

The dataset URI is generated programmatically from the URL of the page, with the title added.
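How the title is combined with the page URL is not specified here. The sketch below shows one plausible scheme, in which the query string and fragment are dropped, whitespace in the title becomes underscores, and the result is percent-encoded. All of these choices, and the function name, are assumptions.

```javascript
// Derives a dataset URI from the page URL and the title entered in the dialog.
function buildDatasetUri(pageUrl, title) {
  // Drop any query string or fragment, then any trailing slashes.
  const base = pageUrl.split(/[?#]/)[0].replace(/\/+$/, '');
  // Assumption: whitespace becomes underscores, the rest is percent-encoded.
  const slug = encodeURIComponent(title.trim().replace(/\s+/g, '_'));
  return `${base}/${slug}`;
}

// buildDatasetUri('http://example.org/app/', 'My data set')
// returns 'http://example.org/app/My_data_set'
```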

Data

Design

Variables
