
read_raw_xdf() can I use the nominal sampling rate? #436

Open
behinger opened this issue Sep 27, 2024 · 13 comments

@behinger

Hi!
Thanks a ton for the XDF importer. I stumbled today because I want to use my EEG amp as the "main clock", that is, change all timestamps to match the nominal sampling rate provided by the EEG stream (I trust the amp more than my recording laptop). If I don't do this, every subject has a slightly different sampling rate (1000.0001 Hz vs. 999.9998 Hz, etc.), which is annoying to keep working with.

As I understand it, I could specify fs_new, but that would also resample my EEG dataset; I don't need resampling for that stream, just dropping the timestamps and recalculating them from the nominal sampling rate (also, we sometimes record at different sampling rates, so it would be nice to just be able to specify the nominal one).

I think this is the default behavior of the MATLAB XDF importer, but I don't have much experience with that one either.

Maybe this is already possible; if not, I wonder how people are dealing with this issue right now? I'm willing to spend some time on a PR as well, but asking first seems appropriate :)

Cheers, Bene

@cbrnr
Owner

cbrnr commented Sep 30, 2024

Hi @behinger! What a happy coincidence, I have just started working on refactoring (and improving) the XDF importer! I'm mentioning this because it is related to your question about nominal vs. effective sampling frequency.

In summary, we decided to use the effective sampling frequency as the default, since currently we do not have any reason to believe that amplifier clocks are more accurate than computer clocks (see the original discussion for reference). Yes, this means that recordings will not be associated with "nice" frequencies like 1000 Hz, but you will see something like 1000.0001 Hz. In practice, this difference is absolutely negligible, so we (I) decided to use the effective sampling frequency as "the ground truth" in my implementation.

One of the reasons why I chose the effective sampling frequency is that it is derived directly from the time stamps (by default, pyxdf.load_xdf() also applies a dejittering algorithm to smooth out variations in the intervals between samples). And here's what I'm currently working on: if there are gaps in the data, these are automatically reflected in the time stamps, whereas assuming a constant (nominal) sampling frequency will lead to errors in the timing of the signals. Unfortunately, this is what currently happens, see #385 for reference. My solution is to treat time stamps as a non-uniformly sampled time series and interpolate them onto a regular grid (e.g., at the nominal sampling frequency). This is very similar to the current behavior (which effectively resamples to the nominal sampling frequency), except that currently any gaps in the data are not accounted for.
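The approach described above could be sketched roughly like this (a minimal sketch using NumPy with made-up jittered timestamps; an actual implementation would likely use a better interpolation method than linear for EEG):

```python
import numpy as np

# Made-up example: jittered timestamps around a nominal 1000 Hz
rng = np.random.default_rng(0)
ts = np.cumsum(rng.uniform(0.0009, 0.0011, size=1000))  # irregular intervals
x = np.sin(2 * np.pi * 10 * ts)  # a 10 Hz sine sampled at those times

# Treat (ts, x) as a non-uniformly sampled series and interpolate
# onto a regular grid at the nominal sampling frequency
fs_nom = 1000.0
t_new = np.arange(ts[0], ts[-1], 1 / fs_nom)
x_new = np.interp(t_new, ts, x)  # linear interpolation onto the grid
```

A gap in `ts` would simply mean a stretch of the regular grid with no nearby samples, which is where gap handling (e.g., NaN filling instead of interpolation) would come in.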

However, I can still see the value in wanting to treat the data as if it had been sampled with the nominal sampling frequency without any resampling/interpolation. I think we could handle this use case by maybe adding/changing a parameter, but I don't have a good idea yet. In addition, I don't know how we could handle gaps with this approach, which I think is pretty important. One option could be to let the user decide to explicitly disable any gap handling when treating the data as sampled with the nominal sampling frequency. Otherwise, I don't think there's a way around resampling/interpolation, as that's just how XDF works.

I'm very open to input and ideas of course!

@behinger
Author

behinger commented Sep 30, 2024

Thanks for the detailed response. I didn't remember that I had posted in that other thread before.

  • I try to avoid the need for downsampling. E.g., I once downsampled data from 1024 Hz to 500 Hz and got the weirdest frequency artifacts (at 48/24 Hz), which turned out to be introduced by the non-divisor downsampling. I now understand that this shouldn't happen with linear interpolation, but either way I need to figure out fs_new (as the nominal rate).

  • The dejittering etc. is independent of which time stamps to use. I imagine resampling is only necessary if there are multiple regularly sampled streams in the XDF file, and then only n-1 of them need to be interpolated/resampled.

  • The gap issue: AFAIK the MATLAB importer has a gap detector, maybe that could be used for this case? But if you do not resample and just modify the timestamps to follow the nominal rate, this shouldn't be a problem. I imagine the calculation like this:

sf_eff = 1001
sf_nom = 1000
t_lsl = range(0, step=1/sf_eff, length=100)  # original LSL timestamps
t_new = t_lsl .* sf_eff ./ sf_nom            # new spacing is exactly 1/sf_nom

edit: haha, this is indeed how I did it in Julia
https://github.com/s-ccs/LslTools.jl/blob/a21a45661022b2637c646a7ef14da3418f5e2504/src/LslTools.jl#L12
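For reference, the same rescaling as a Python/NumPy sketch (note that the conversion factor must be sf_eff / sf_nom so that the rescaled spacing comes out as exactly 1 / sf_nom):

```python
import numpy as np

sf_eff = 1001.0  # effective (measured) sampling frequency
sf_nom = 1000.0  # nominal sampling frequency

t_lsl = np.arange(100) / sf_eff  # original LSL timestamps, spacing 1/sf_eff
t_new = t_lsl * sf_eff / sf_nom  # rescaled timestamps, spacing 1/sf_nom
```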

@cbrnr
Owner

cbrnr commented Sep 30, 2024

If done correctly, resampling should not introduce any weird artifacts. I'm trying to do it correctly this time 😄.

Resampling is primarily necessary if there are multiple streams, yes, but as I've mentioned, I currently also use it even when there's only a single stream and I do not want the effective sampling frequency (but the nominal one for example).

I don't know if pyxdf is able to detect gaps, but even if it does, I still need to create a regularly sampled 2D NumPy array, where gaps are filled with NaNs.
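Building such a NaN-filled regular array could look roughly like this (a sketch, not pyxdf's or MNELAB's actual implementation; `to_regular` and the round-to-nearest-grid-index approach are my assumptions):

```python
import numpy as np

def to_regular(ts, data, fs):
    """Place samples on a regular grid at rate fs, filling gaps with NaN."""
    ts = np.asarray(ts, dtype=float)
    data = np.atleast_2d(np.asarray(data, dtype=float))
    if data.shape[0] != ts.size:  # make samples run along axis 0
        data = data.T
    idx = np.round((ts - ts[0]) * fs).astype(int)  # nearest grid index
    out = np.full((idx[-1] + 1, data.shape[1]), np.nan)
    out[idx] = data
    return out

# Made-up example: a 2 ms gap between the third and fourth sample
ts = [0.0, 0.001, 0.002, 0.005, 0.006]
x = [1.0, 2.0, 3.0, 4.0, 5.0]
grid = to_regular(ts, x, 1000.0)  # shape (7, 1), rows 3 and 4 are NaN
```

This snaps each sample to its nearest grid position instead of interpolating, which only makes sense if the jitter is small relative to the sampling interval.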

As a workaround, you can do exactly what you suggest. I'm just not sure how to integrate this in the reader.

If there are multiple streams, there is no way around interpolation/resampling, right?

If there is only a single stream, there are two options: no resampling (which currently uses the effective sampling frequency) and resampling. I guess we could let the users decide whether to use the effective or the nominal sampling frequency in the first case.

One idea to change the API would be to remove the fs_new parameter and instead introduce a new parameter fs with possible values "effective", "nominal", or a float. All three values would be possible with a single stream (of which "effective" and "nominal" would not involve resampling), whereas multiple streams require a float (a new sampling frequency).
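A rough sketch of how that dispatch might look (hypothetical names and checks, not an actual MNELAB API):

```python
def read_raw_xdf(fname, stream_ids, fs="effective"):
    """Hypothetical sketch of the proposed fs parameter.

    - "effective": derive the rate from the time stamps (single stream
      only, no resampling).
    - "nominal": trust the rate declared in the stream header (single
      stream only, no resampling).
    - float: resample/interpolate all streams onto this common rate
      (the only option when loading multiple streams).
    """
    if fs not in ("effective", "nominal") and not isinstance(fs, (int, float)):
        raise ValueError(f"invalid fs: {fs!r}")
    if len(stream_ids) > 1 and isinstance(fs, str):
        raise ValueError("multiple streams require a numeric fs (resampling)")
    ...  # actual loading logic would go here
```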

WDYT? I have to ponder this a little more, maybe there is a better solution.

@behinger
Author

(the resampling was via EEGLAB... never looked into it in detail, no time!)

  • Gaps could be added in MNE with some kind of break event (I don't know MNE well enough; in EEGLAB this would be a boundary event), or maybe MNE doesn't support that?
  • Multiple streams: if you only want to apply the nominal sampling rate, there is no need to interpolate. If you need to have them on the same sampling rate and "grid", then yes, at least n-1 need interpolation, agreed.
  • Your idea could work; another idea: a selection option for the "main_clock", which could be "lsl" by default (resulting in effective sampling rates) or a stream name, which would result in a factor k = sf_eff / sf_nom being applied to all timestamps. Maybe this is more transparent? It detaches the resampling need from the main clock.
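The main-clock idea from the last bullet could be sketched like this (hypothetical names; `streams` is an assumed dict of timestamp arrays, and the factor is sf_eff / sf_nom of the main stream so that its spacing becomes exactly 1 / sf_nom):

```python
import numpy as np

def apply_main_clock(timestamps, sf_nom, sf_eff):
    """Hypothetical sketch: rescale all streams' timestamps by a single
    factor k = sf_eff / sf_nom (taken from the chosen main stream), so
    that the main stream's sample spacing becomes exactly 1 / sf_nom
    while relative timing between streams is preserved."""
    k = sf_eff / sf_nom
    return {name: np.asarray(ts) * k for name, ts in timestamps.items()}

sf_eff, sf_nom = 1000.5, 1000.0  # made-up example values
streams = {
    "eeg": np.arange(100) / sf_eff,       # main stream, measured spacing
    "markers": np.array([0.010, 0.050]),  # marker onsets in LSL time
}
rescaled = apply_main_clock(streams, sf_nom, sf_eff)
```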

@behinger
Author

behinger commented Jan 13, 2025

This dropped off my radar, but it came up again just now (I'm currently battling a weird "Stream 1: Calculated effective sampling rate -2067.3343 Hz is different from specified rate 1000.0000 Hz." error; very strange, something to do with the dejittering algorithm).

Any update / plans right now?

@cbrnr
Owner

cbrnr commented Jan 13, 2025

Can you share the file? It seems like this error is emitted by pyxdf and not by MNELAB, and if that's the case we'd have to dig a little deeper.

@behinger
Author

I noticed this is completely unrelated to pyxdf/MNELAB, but the initial issue still stands ;)

@cbrnr
Owner

cbrnr commented Jan 15, 2025

Alright, and yes, I agree that there should be some option to avoid resampling while still being able to override the sampling rate. Of course, this will only be possible for a single stream, and currently you can already get the actual samples as is (without resampling), but the sampling rate will then be the measured one and not the nominal one. This is what you were suggesting, right?

@behinger
Author

No, the samples of e.g. markers would need to be modified according to the new "main clock".

@cbrnr
Owner

cbrnr commented Jan 15, 2025

So you do want to resample markers but not the data stream?

@behinger
Author

I guess markers and other streams, just not the "main" stream

@cbrnr
Owner

cbrnr commented Jan 15, 2025

I don't know if this makes sense. Essentially, you are saying that you don't trust the XDF (LSL) timestamps, but all recorded streams have gone through the entire synchronization business, including marker streams. If that's what you want, I don't know if XDF/LSL is the right tool for you. You might be better off recording markers directly with your EEG amplifier, but of course this won't work if you also record other devices.

@behinger
Author

behinger commented Jan 15, 2025

Mh, I think we are talking past each other. I absolutely need XDF/LSL for the synchronization, but ultimately I want to use the EEG amplifier as the master clock. Then I need to reinterpolate one dataset fewer (the EEG one), but I still need to do everything for all the other streams.
