Skip to content
/ sinagot Public
forked from YannBeauxis/sinagot

Python library to manage data processing on a dataset.

License

Notifications You must be signed in to change notification settings

GHFC/sinagot

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Sinagot

Sinagot is a Python library to manage data processing with scripts on a dataset. Sinagot is able to batch scripts runs with a simple API. Parallelization of data processing is possible with Dask.distributed.

Installation

Sinagot is available on PyPi :

pip install sinagot

Full Documentation

https://sinagot.readthedocs.io/en/latest/

Concept

Dataset are structured around some core concept : record, subset, task, modality and script. A record, identified by its unique ID, correspond to a recording session where experimental tasks are performed generating data of various modalities. Raw data of a record are processed with scripts to generate more useful data.

The idea of Sinagot emerged for the data management of an EEG platform called SoNeTAA. For documentation purpose SoNeTAA dataset structure will be used as example.

On SoNeTAA, a record with an ID with timestamp info in this format REC-[YYMMDD]-[A-Z], for example "REC-200331-A".

For a record, 3 tasks are performed : "RS", "MMN" and "HDC", 2 main modalities handle data for every tasks: "EEG" and "clinical", and a third one "behavior" exists only for HDC.

Demo with SoNeTAA example

Create a Dataset instance

Import Dataset class

>>> from sinagot import Dataset

A Dataset instance need 3 things :

  • A config file in toml format.
  • A folder containing the dataset
  • A folder containing all the scripts

To instantiate a dataset use the config file path as argument :

>>> ds = Dataset('/path/to/conf')
>>> ds
<Dataset instance | task: None, modality: None>

Be sure that dataset path and scripts path are correctly set in the config file

Explore records

You can list all records ids :

>>> for id in ds.ids():
...     print(id)
...
REC-200331-A
REC-200331-B

Create a Record instance. For a specific record :

>>> rec = ds.get('REC-200331-A')
>>> rec
<Record instance | id: REC-200331-A, task: None, modality: None>

Or the first record found :

>>> ds.first()
<Record instance | id: REC-200331-B, task: None, modality: None>

Records are not sort by their ids.

run scripts

You can run all scripts for each record of the dataset :

>>> ds.run()
2020-03-31 16:03:58,869 : Begin step run
...
2020-03-31 16:03:58,869 : Step run finished

Or for a single record :

>>> rec.run()
2020-03-31 16:06:57,313 : Begin step run
...
2020-03-31 16:06:57,314 : Step run finished

Explore by task or modality

Each dataset or record has subscopes corresponding to their tasks and modalities simply accessible by self attributes with the scope name.

For example to select only the task RS of the dataset :

>>> ds.RS
<Subset instance | task: RS, modality: None>

A dataset subscope is a subset.

Or the EEG modality of a record :

>>> rec.EEG
<Record instance | id: REC-200331-A, task: None, modality: EEG>

You can select a specific couple of task and modality (called unit) :

>>> ds.RS.EEG
<Subset instance | task: RS, modality: EEG>
>>> ds.EEG.RS
<Subset instance | task: RS, modality: EEG>

About

Python library to manage data processing on a dataset.

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Python 100.0%