DataSetter

DataSetter helps you create datasets with Python and serve them through an HTTP API.

A dataset is a product based on a homogeneous set of data. It is not only a table of data, but also its associated metadata:

  • Name: A unique name that identifies the product.
  • Description: A short explanation of the product: what it contains, how it was collected, what it can (or cannot) be used for...
  • Columns: A description of the product's components, with column names, types and descriptions.
  • Facets: The list of columns that can be used to filter and/or aggregate data. In large datasets, it can be a strict subset of the columns.
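The relationship between columns and facets can be pictured with a short, library-independent sketch. It does not use DataSetter itself: the variable names below are hypothetical, and whether DataSetter performs such a check is an assumption, not something this README states.

```python
# Hypothetical illustration only; this is plain Python, not DataSetter code.
column_names = ["letter", "greek", "number"]   # every described column
facets = ["letter", "greek"]                   # columns usable as filters

# Every facet must name an existing column...
facets_are_columns = set(facets) <= set(column_names)

# ...and here the facets form a strict subset: "number" is not a facet.
facets_are_strict_subset = set(facets) < set(column_names)

print(facets_are_columns, facets_are_strict_subset)
```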

For example, you can define a dataset based on a pandas.DataFrame object:

>>> import pandas as pd
>>> from datasetter.pandas_dataset import PandasDataset
>>>
>>> dataframe = pd.DataFrame([
...     ['A', 'alpha', 1],
...     ['A', 'beta', 13],
...     ['A', 'gamma', 8],
...     ['B', 'alpha', 1],
...     ['B', 'beta', 31],
...     ['C', 'gamma', 9],
...     ['C', 'alpha', 2],
...     ['D', 'beta', 21],
...     ['D', 'gamma', 0],
...     ], columns=['letter', 'greek', 'number'])
>>>
>>> dataset = PandasDataset(
...     dataframe,
...     name="Random letters",
...     description="A simple dataset with letters, greek letters and integers.",
...     columns=[
...         {"name": "letter", "type": "string", "description": "A column with letters."},
...         {"name": "greek", "type": "string", "description": "A column with greek letters."},
...         {"name": "number", "type": "integer", "description": "A column with numbers."},
...         ],
...     facets=['letter', 'greek'])

Then access its methods in a standard way:

>>> dataset.count()
9

>>> dataset.count(letter="A")
3

>>> dataset.sample(2, greek="gamma")
  letter  greek  number
2      A  gamma       8
5      C  gamma       9

>>> dataset.count_by('greek')
alpha    3
beta     3
gamma    3
Name: greek, dtype: int64

>>> dataset.metadata()
{'name': 'Random letters',
 'description': 'A simple dataset with letters, greek letters and integers.',
 'columns': [{'name': 'letter',
   'type': 'string',
   'description': 'A column with letters.'},
  {'name': 'greek',
   'type': 'string',
   'description': 'A column with greek letters.'},
  {'name': 'number',
   'type': 'integer',
   'description': 'A column with numbers.'}],
 'facets': ['letter', 'greek']}
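
To make the results above easier to interpret, here is a rough sketch of what count and count_by appear to compute, written in plain pandas. This is not DataSetter's implementation: the helper functions below are hypothetical and only inferred from the outputs shown in this README.

```python
import pandas as pd

# Same rows as the example dataframe above.
dataframe = pd.DataFrame([
    ['A', 'alpha', 1], ['A', 'beta', 13], ['A', 'gamma', 8],
    ['B', 'alpha', 1], ['B', 'beta', 31], ['C', 'gamma', 9],
    ['C', 'alpha', 2], ['D', 'beta', 21], ['D', 'gamma', 0],
], columns=['letter', 'greek', 'number'])


def count(df, **filters):
    """Hypothetical helper: count rows matching every column=value filter."""
    mask = pd.Series(True, index=df.index)
    for column, value in filters.items():
        mask &= (df[column] == value)
    return int(mask.sum())


def count_by(df, facet):
    """Hypothetical helper: number of rows for each value of one facet."""
    return df.groupby(facet).size()


print(count(dataframe))              # 9
print(count(dataframe, letter="A"))  # 3
print(count_by(dataframe, 'greek').to_dict())
```

Filters are combined with a logical AND, which matches the keyword-argument style of the calls above. How DataSetter combines several filters internally is an assumption here.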