-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding utility functions #48
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## audio_abstract_dtype #48 +/- ##
=========================================================
+ Coverage 41.62% 59.38% +17.75%
=========================================================
Files 25 36 +11
Lines 627 943 +316
=========================================================
+ Hits 261 560 +299
- Misses 366 383 +17 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks mostly good to me. I have two suggestions that might be helpful in the future that can be minor tasks for others later:
- In speech_to_text_evaluation I think it would be good to give more concrete definitions on what some of these mean or reference say the jiwer docs since I haven't heard of/am not sure the definitions of say WIP or WIL.
- In cca_cka we should create an enum for the types of kernels that are allowed as a better software practice
thank you, @wilke0818, for your feedback. For now, I have created an enum for the kernel types. I will address documentation issues more calmly if that sounds reasonable. |
""" | ||
down_mixed_audios = [] | ||
for audio in audios: | ||
if audio.waveform.dim() != 2 or audio.waveform.size(0) < 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this is needed since Pydantic should be handling this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fair. i have removed this
if audio.waveform.dim() != 2 or audio.waveform.size(0) < 1: | ||
raise ValueError("waveform should have shape (num_channels, num_samples)") | ||
|
||
down_mixed_audio = audio.copy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
think we need to change this to audio.model_copy(deep=True)
but curious if what's better is this or just creating a new Audio instance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i am not sure there is a big difference here. But can easily switch to creating a new Audio object so that we are consistent in coding style plus it becomes clearer how we create a new entity/item
raise ValueError("waveform should have shape (num_channels, num_samples)") | ||
|
||
down_mixed_audio = audio.copy() | ||
down_mixed_audio.waveform = audio.waveform.mean(dim=0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I worry slightly about this but maybe your test cases handle this, but does this guarantee that waveform is shape (1, num_samples)? I'm not 100% sure that the field_validator runs in this case but if it does then this is fine. If it doesn't, then use audio.waveform.mean(dim=0, keepdim=True)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it was managed anyway, but i have now added keepdim=True
to make it clearer
Shape: (num_channels, num_samples). | ||
|
||
Returns: | ||
List[Audio]: The list of audio objects with a mono waveform averaged from all channels. Shape: (num_samples). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this gets into a later comment I made, but at least in the Audio class as of right now, we standardize the waveform field to be (num_channels, num_samples) and perhaps to keep consistency we keep it as (1, num_samples)?
return down_mixed_audios | ||
|
||
|
||
def select_channel_from_audios(audios: List[Audio], channel_index: int) -> List[Audio]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
basically see all of my comments for the above function
src/senselab/utils/tasks/eer.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we've broken up a lot of the tasks into individual files (e.g. individual modules) and I'm wondering if we should consolidate a bit to make it easier to use. Not super opinionated on this though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a good point. I think we can proceed like this for this first release and next week we do some restructure
I have added some utility functions for computing cosine similarity, eer, wer, cca, cka, cross correlation