Alexander Lex edited this page Jul 30, 2014 · 30 revisions

UpSet is a visualization technique for set-based data. This page provides documentation for developers and advanced users.

Please refer to http://vcg.github.io/upset/about for general information and to http://vcg.github.io/upset/ for the visualization technique itself.

You can currently provide data to UpSet through a publicly available dataset and a simple description file, which also must be publicly available.

Data Format

UpSet uses a binary encoding for the sets. Here is a simple example, with Sets A, B and C, and three elements in the rows (R1, R2, R3):

Row;A;B;C
R1;1;0;0
R2;0;1;0
R3;0;0;1

You can download this file here.
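If your data is stored as lists of members per set rather than as a matrix, it has to be converted into this binary encoding first. The following is a minimal Python sketch of such a conversion, not part of UpSet itself; the `memberships` dictionary and the `to_binary_csv` helper are hypothetical names introduced for illustration.

```python
import csv
import io

# Hypothetical input: set name -> elements that belong to that set.
memberships = {
    "A": {"R1"},
    "B": {"R2"},
    "C": {"R3"},
}


def to_binary_csv(memberships, id_header="Row", separator=";"):
    """Return CSV text with one row per element and one 0/1 column per set."""
    set_names = list(memberships)
    # Every element that occurs in at least one set becomes a row.
    elements = sorted(set().union(*memberships.values()))

    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=separator, lineterminator="\n")
    writer.writerow([id_header] + set_names)
    for element in elements:
        flags = [int(element in memberships[name]) for name in set_names]
        writer.writerow([element] + flags)
    return buffer.getvalue()


csv_text = to_binary_csv(memberships)
print(csv_text)
```

For the input shown, the printed text matches the example file above.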

To make UpSet understand this data format, you have to provide a simple JSON file. The configuration file for the above dataset is as simple as this:

{
	"file": "http://vcg.github.io/upset/data/test/test.csv",
	"name": "Test",
	"header": 0,
	"separator": ";",
	"skip": 0,
	"meta": [
		{ "type": "id", "index": 0, "name": "Name" }
	],
	"sets": [
		{ "format": "binary", "start": 1, "end": 3 }
	]
}

You can download this file here and also look at it in UpSet.

The properties in this file have the following meanings:

  • file describes the path to the data file. This path should typically be a globally accessible URL, unless you run UpSet locally, in which case you can use relative paths.
  • name is a custom name that you can give to your dataset, as it will appear in UpSet.
  • header defines the row in the dataset where your column IDs (the names of the sets and the attributes) are stored.
  • separator defines which symbol is used to separate the cells in the matrix. Common separators are semicolon ;, comma ,, and tab \t.
  • skip is currently not in use but will provide the ability to skip rows at the beginning of the file.
  • meta is an array of metadata that specifies the id column and attribute columns. The above example defines the first column in the file (the column with index 0) to contain the identifiers for the elements. The name of the identifiers is "Name". meta is also used for attributes, discussed later.
  • sets defines the sets in the dataset. It is specified in an array to allow multiple ranges of sets within a file. Here only one range of sets is defined, from the start index 1 (the second column) to the end index 3 (the fourth column).
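To check a description file against your data before loading it, you can interpret the properties yourself. The sketch below follows the semantics described in the list above (a zero-based header row, an inclusive end index for each set range). It is an illustration in Python, not UpSet's own parser, and `read_sets` is a hypothetical helper. The data is inlined as a string so the example runs without network access.

```python
import csv
import io
import json

# The description file from the example above, inlined as text.
config = json.loads("""
{
    "file": "http://vcg.github.io/upset/data/test/test.csv",
    "name": "Test",
    "header": 0,
    "separator": ";",
    "skip": 0,
    "meta": [{ "type": "id", "index": 0, "name": "Name" }],
    "sets": [{ "format": "binary", "start": 1, "end": 3 }]
}
""")

# Contents of the example data file, inlined instead of downloaded.
csv_text = "Row;A;B;C\nR1;1;0;0\nR2;0;1;0\nR3;0;0;1\n"


def read_sets(config, csv_text):
    """Return a dict that maps each set name to the IDs of its elements."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=config["separator"])
    rows = list(reader)
    header = rows[config["header"]]
    data = rows[config["header"] + 1:]

    # The meta entry of type "id" tells us which column names the elements.
    id_index = next(m["index"] for m in config["meta"] if m["type"] == "id")

    sets = {}
    for block in config["sets"]:
        # "end" is inclusive: start 1 and end 3 cover columns 1, 2 and 3.
        for column in range(block["start"], block["end"] + 1):
            sets[header[column]] = [
                row[id_index] for row in data if row[column] == "1"
            ]
    return sets


sets = read_sets(config, csv_text)
print(sets)
```

If a set comes back empty or a column name looks wrong, the start and end indices in the description file are the first thing to check.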

Attributes

UpSet can visualize attributes in addition to sets. Here is the movie dataset we use for examples with attributes:

Name;ReleaseDate;Action;Adventure;Children;Comedy;Crime;Documentary;Drama;Fantasy;Noir;Horror;Musical;Mystery;Romance;SciFi;Thriller;War;Western;AvgRating;Watches
Toy Story (1995);1995;0;0;1;1;0;0;0;0;0;0;0;0;0;0;0;0;0;4.15;2077
Jumanji (1995);1995;0;1;1;0;0;0;0;1;0;0;0;0;0;0;0;0;0;3.2;701
Grumpier Old Men (1995);1995;0;0;0;1;0;0;0;0;0;0;0;0;1;0;0;0;0;3.02;478
Waiting to Exhale (1995);1995;0;0;0;1;0;0;1;0;0;0;0;0;0;0;0;0;0;2.73;170
Father of the Bride Part II (1995);1995;0;0;0;1;0;0;0;0;0;0;0;0;0;0;0;0;0;3.01;296

As you can see, the first column (index 0) contains the name of the movie, which serves as the element ID. The second column (index 1) contains the release year, which is an attribute. Columns 2 to 18 contain the sets, i.e., the movie genres, and the last two columns (indices 19 and 20) again contain attributes.

Here is the JSON definition for this file:

{
	"file": "https://dl.dropboxusercontent.com/u/36962787/UpSet/movies.csv",
	"name": "Online Movies Genres ",
	"header": 0,
	"separator": ";",
	"skip": 0,
	"meta": [
		{ "type": "id", "index": 0, "name": "Name" },
		{ "type": "integer", "index": 1, "name": "Release Date" },
		{ "type": "float", "index": 19, "name": "Average Rating", "min": 1, "max": 5 },
		{ "type": "integer", "index": 20, "name": "Times Watched" }
	],
	"sets": [
		{ "format": "binary", "start": 2, "end": 18 }
	]
}

It is very similar to the definition we used before, with the exception of the additional attributes in the meta array. Here we see three attribute definitions: Release Date as an integer attribute, Average Rating as a float (a real number) attribute, and Times Watched as another integer attribute. In addition, UpSet can handle text, defined with the type string.
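To see how the meta entries map onto a data row, the following Python sketch converts the cells of the first movie row into typed values. It is an illustration only: the `CONVERTERS` table and the `parse_attributes` helper are hypothetical, and treating min and max as a range check is an assumption about their meaning, not documented UpSet behavior.

```python
# The meta array from the movie description file above.
meta = [
    {"type": "id", "index": 0, "name": "Name"},
    {"type": "integer", "index": 1, "name": "Release Date"},
    {"type": "float", "index": 19, "name": "Average Rating", "min": 1, "max": 5},
    {"type": "integer", "index": 20, "name": "Times Watched"},
]

# Rebuild the "Toy Story (1995)" row: 17 genre flags between the attributes.
genres = ("Action;Adventure;Children;Comedy;Crime;Documentary;Drama;Fantasy;"
          "Noir;Horror;Musical;Mystery;Romance;SciFi;Thriller;War;Western").split(";")
flags = ["1" if genre in ("Children", "Comedy") else "0" for genre in genres]
line = ";".join(["Toy Story (1995)", "1995"] + flags + ["4.15", "2077"])

# Assumed mapping from the type names used in the description file to Python types.
CONVERTERS = {"id": str, "string": str, "integer": int, "float": float}


def parse_attributes(line, meta, separator=";"):
    """Return a dict of typed attribute values for one data row."""
    cells = line.split(separator)
    record = {}
    for entry in meta:
        value = CONVERTERS[entry["type"]](cells[entry["index"]])
        # Assumption: "min" and "max" give the expected value range.
        if "min" in entry and value < entry["min"]:
            raise ValueError(f"{entry['name']} is below {entry['min']}: {value}")
        if "max" in entry and value > entry["max"]:
            raise ValueError(f"{entry['name']} is above {entry['max']}: {value}")
        record[entry["name"]] = value
    return record


record = parse_attributes(line, meta)
print(record)
```

A conversion error here usually means that an index in the meta array points at the wrong column.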

Data Import

Once you have brought your data into the right format and written the JSON file, you must put both files on a globally accessible web server. Unfortunately, not every web server will work, due to security and configuration reasons, but an option that will definitely work is to use the Public folder of your Dropbox (note that public links to files in other Dropbox folders don't work).

To make this work, simply move both your JSON file and your CSV file to your Public Dropbox folder. Now put the public link to your CSV file into your JSON file. You can obtain the public link by right-clicking on any file and choosing "Get public link" in the Dropbox web interface.
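If you prefer to generate the JSON file rather than edit it by hand, a small script can fill in the public link and the column indices. This is a minimal Python sketch under stated assumptions: `build_description` is a hypothetical helper, the URL is a placeholder for your own public link, and the ID column is assumed to be the first column.

```python
import json


def build_description(csv_url, name, header, first_set, last_set, separator=";"):
    """Build a description for a file whose ID is in column 0 (an assumption)."""
    return {
        "file": csv_url,
        "name": name,
        "header": 0,
        "separator": separator,
        "skip": 0,
        "meta": [{"type": "id", "index": 0, "name": header[0]}],
        "sets": [
            {
                "format": "binary",
                # Look up the column indices of the first and last set by name.
                "start": header.index(first_set),
                "end": header.index(last_set),
            }
        ],
    }


# Placeholder link; replace it with the public link to your own CSV file.
csv_url = "https://example.com/public/test.csv"
header = "Row;A;B;C".split(";")

description = build_description(csv_url, "Test", header, "A", "C")
json_text = json.dumps(description, indent=2)
print(json_text)
```

Save the printed text as your JSON file and place it next to the CSV file in the public folder.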

Next, open UpSet and click on Load Data in the upper right corner. You will see a text field where you must paste the link to your JSON file. The field is pre-filled with the link to the movie example above.

Once you hit submit you will see your dataset appear in UpSet! Congratulations!
