organizing interface+service abstractions and testing it with input_output tasks
sensein · Apr 12, 2024 · d56f9c5
1 parent d7227da
Showing 83 changed files with 2,419 additions and 873 deletions.
diff --git a/FEATURES.md b/FEATURES.md
diff --git a/FEATURES.tmp.md b/FEATURES.tmp.md
@@ -0,0 +1,38 @@
+# Functionalities 
+This file is here just as a support for development. 
+
+AUDIO
+
+[TODO]: 
+- speech to text
+    1. to transcribe speech into text
+        - INPUT: 
+            1. a datasets object with the audio recordings in the "audio" column
+            2. the audio column (default = "audio")
+            3. the speech-to-text service to use (its name, version, and revision, plus, for services that support it, the optional language of the transcription model)
+        - PREPROCESSING:
+            1. adapt the language to the service format
+            2. organize the dataset into batches
+        - PROCESSING:
+            1. transcribe the dataset
+        - POSTPROCESSING: 
+            1. formatting the transcripts to follow a standard organization
+        - OUTPUT:
+            1. a new dataset including only the transcripts of the audios in a standardized json format (plus an index?)
+        - TESTS:
+            1. test input errors (a field is missing, the audio column exists and contains audio objects, params missing)
+            2. test the transcript of a test file is ok
+            3. test the language is supported (and the tool handles errors)
+
+    2. to compute word error rate
+        - INPUT: 
+            1. a dataset object with the "transcript" and the "groundtruth" columns 
+            2. a service with a name (default is jitter)
+        - PROCESSING:
+            1. computing the per-row WER between the 2 columns
+        - OUTPUT: 
+            1. a dataset with the "WER" column
+        - TESTS:
+            1. test input errors (one or more fields are missing, the 2 columns don't contain strings)
+            2. test output is ok
+
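The per-row WER step described in FEATURES.tmp.md can be sketched as follows. This is only an illustration and not code from the commit: it uses a plain-Python word-level edit distance instead of the named service, and a dict of lists stands in for the dataset object. The function names `word_error_rate` and `add_wer_column` are hypothetical.

```python
from typing import Dict, List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j]: edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


def add_wer_column(
    rows: Dict[str, List[str]],
    transcript_column: str = "transcript",
    groundtruth_column: str = "groundtruth",
) -> Dict[str, list]:
    """Return a copy of `rows` with a per-row "WER" column (hypothetical helper)."""
    # Input checks mirroring the TESTS list: missing fields, non-string values.
    for column in (transcript_column, groundtruth_column):
        if column not in rows:
            raise KeyError(f"missing column: {column}")
        if not all(isinstance(value, str) for value in rows[column]):
            raise TypeError(f"column {column} must contain only strings")
    wer = [
        word_error_rate(truth, transcript)
        for truth, transcript in zip(rows[groundtruth_column], rows[transcript_column])
    ]
    return {**rows, "WER": wer}
```

For example, `add_wer_column({"transcript": ["hello world"], "groundtruth": ["hello there world"]})` yields a "WER" column holding one value, the single deleted word divided by the three reference words.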
diff --git a/README.md b/README.md
@@ -10,19 +10,32 @@
 
 [![pages](https://img.shields.io/badge/api-docs-blue)](https://sensein.github.io/pipepal)
 
-Welcome to the ```pipepal``` repo! This is a Python package for doing incredible stuff with speech and voice.
+Welcome to the ```pipepal``` repo! This is a Python package for streamlining the processing and analysis of behavioral data, such as voice and speech patterns, with robust and reproducible methodologies. 
 
 **Caution:** this package is still under development and may change rapidly over the next few weeks.
 
 ## Features
-- A few
-- Cool
-- Things
-- These may include a wonderful CLI interface.
+- **Modular design**: Utilize a variety of task-specific transformations that can be easily integrated or used standalone, allowing for flexible data manipulation and analysis strategies.
 
-## Installation
-Install this package via :
+- **Pre-built pipelines**: Access pre-configured pipelines combining multiple transformations tailored for common analysis tasks, which help in reducing setup time and effort.
+
+- **Reproducibility**: Ensures consistent outputs through the use of fixed seeds and version-controlled processing steps, making your results verifiable and easily comparable.
+
+- **Easy integration**: Designed to fit into existing workflows with minimal configuration, `pipepal` can be used alongside other data analysis tools and frameworks seamlessly.
+
+- **Extensible**: Open to modifications and contributions, the package can be expanded with custom transformations and pipelines to meet specific research needs. <u>Do you want to contribute? Please reach out!</u>
+
+- **Comprehensive documentation**: Comes with detailed documentation for all features and modules, including examples and guides on how to extend the package for other types of behavioral data analysis.
+
+- **Performance optimized**: Efficiently processes large datasets with optimized code and algorithms, ensuring quick turnaround times even for complex analyses.
 
+- **Interactive examples**: Includes Jupyter notebooks that provide practical examples of how `pipepal` can be implemented to derive insights from real-world datasets.
+
+Whether you're researching speech disorders, analyzing customer service calls, or studying communication patterns, `pipepal` provides the tools and flexibility needed to extract meaningful conclusions from your data.
+
+
+## Installation
+Install this package via:
 
 ```sh
 pip install pipepal
@@ -42,5 +55,20 @@ hello_world()
 ```
 
 ## To do:
-- [ ] A
-- [ ] lot
+- [ ] Integrating more audio tasks and moving functions from b2aiprep package:
+    - [ ] data_augmentation 
+    - [ ] data_representation
+    - [x] example_task
+    - [x] input_output
+    - [ ] raw_signal_processing
+    - [ ] speaker_diarization
+    - [ ] speech emotion recognition
+    - [ ] speech enhancement
+    - [ ] speech_to_text
+    - [ ] text_to_speech
+    - [ ] voice conversion
+- [ ] Integrating more video tasks:
+    - [x] input_output
+
+- [ ] Preparing some pipelines with pydra
+- [ ] Populating the CLI
diff --git a/data_for_testing/audio_48khz_mono_16bits.wav b/data_for_testing/audio_48khz_mono_16bits.wav
diff --git a/data_for_testing/audio_48khz_stereo_16bits.wav b/data_for_testing/audio_48khz_stereo_16bits.wav
diff --git a/data_for_testing/output_dataset/data-00000-of-00001.arrow b/data_for_testing/output_dataset/data-00000-of-00001.arrow
diff --git a/data_for_testing/output_dataset/dataset_info.json b/data_for_testing/output_dataset/dataset_info.json
@@ -0,0 +1,16 @@
+{
+  "citation": "",
+  "description": "",
+  "features": {
+    "pokemon": {
+      "dtype": "string",
+      "_type": "Value"
+    },
+    "type": {
+      "dtype": "string",
+      "_type": "Value"
+    }
+  },
+  "homepage": "",
+  "license": ""
+}
diff --git a/data_for_testing/output_dataset/state.json b/data_for_testing/output_dataset/state.json
@@ -0,0 +1,13 @@
+{
+  "_data_files": [
+    {
+      "filename": "data-00000-of-00001.arrow"
+    }
+  ],
+  "_fingerprint": "57821607a631abce",
+  "_format_columns": null,
+  "_format_kwargs": {},
+  "_format_type": null,
+  "_output_all_columns": false,
+  "_split": null
+}
diff --git a/data_for_testing/video_48khz_stereo_16bits.mp4 b/data_for_testing/video_48khz_stereo_16bits.mp4