Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Randomly when playing a sound (wav) a very large memory allocation happens, on Linux Raspberry Pi #606

Open
FinitelyFailed opened this issue Aug 26, 2024 · 10 comments

Comments

@FinitelyFailed
Copy link

Sometimes when my program is going to play an audio file (wav), my program crashes with the error:
memory allocation of 5404319552844632832 bytes failed

After that happens I have to reboot the computer to make audio work again. I have searched and found nothing on this issue.

Unfortunately I do not have a small code example, and I have it happened only when my program is running on a Raspberry Pi . I have not been able to reproduce it on my laptop.

I would love to get a hint on where I should start looking for a solution.

@dvdsk
Copy link
Collaborator

dvdsk commented Aug 26, 2024

5404319552844632832 is about 2^62 the code is trying to allocate/use a huge amount of memory in one go. It sounds to me like there might be an underflow somewhere.

Regarding it only happening on your pi, there can be a ton of reasons. If the program contains unsafe code that can trigger wildly different behavior depending on the system.

I can not really help without a minimal example or the code. I do not mind browsing through it?

I need to know if it is a problem caused by:

  • a bug in rodio's code (that would be pretty bad)
  • a mistake in how rodio is used (then we might be able to improve the docs or api of rodio to prevent that from happening to anyone else)
  • fault in the program you wrote unrelated to rodio

There are tree things you can try:

  • Try to remove parts of your code and see if you can find a minimized example that still produces the error on the Pi.
  • Fill your program with print statements (tip: use dbg!("")) and see where the error happens.
  • Run your program under a debugger like gdb or even better valgrind, that will probably point you to the exact part where things go wrong. Note that because of how they work the issue might dissapear when you use those tools, we call those bugs heisenbugs... they are the worst.

@FinitelyFailed
Copy link
Author

FinitelyFailed commented Aug 27, 2024

This is the code for playing an audio file on a specific audio device at a specific volume.

I had some problems where "sleep_until_end" would end up getting a poll error from alsa, I think. So my log was full of: "an error occurred on output stream: A backend-specific error has occurred: alsa::poll() returned POLLERR". Which is way I added some code which kills the playback when the duration of the audio file has run (+ 50 ms to be sure). Or if there aren't any duration in the meta data the playback will run until "sleep_until_end" is done or until a specified time has run out. All to prevent the error to hang the playback forever.

    async fn play_sound_on_device(device: String,
        volume: f32,
        file_path: String,
        default_sound_duration: u64) -> anyhow::Result<()> {

        // IMPORTANT STUFF!
        // Note that there are functions used here that uses std io functions and thread-blocking
        // functions, i.e. NOT using Tokio tasks.
        // Which means that other tasks running on the same thread will be blocked.
        // The solution to this is to put them all in a task where it is ok to block the thread,
        // ie. by using spawn_blocking.
        let mut set = tokio::task::JoinSet::new();
        set.spawn_blocking(move || {

            // TODO check file exists first
             let file = match std::fs::File::open(&file_path) {
                Ok(file) => {

                    let metadata_string = if let Ok(metadata) = file.metadata() {
                        format!("{:?}", metadata)
                    } else {
                        "Unkown".to_owned()
                    };

                    info!("Successfully read file to play, path: '{file_path}', metadata: {metadata_string}");
                    file
                },
                Err(e) => {
                    warn!("Failed to open file, file path: '{file_path}', error: {e}");
                    bail!("Failed to open file, error: {e}");
                }
            };

            let decoder = match rodio::Decoder::new(BufReader::new(file)) {
                Ok(decoder) => {
                    info!("Successfully created decoder");
                    decoder
                },
                Err(e) => {
                    error!("Failed to create decoder, file path: '{file_path}', error: {e}");
                    bail!("Failed to create decoder, file path: '{file_path}', error: {e}");
                }
            };

            debug!("Source has {} channels", &decoder.channels());

            // Keep _stream on the stack, as it is used during playback and should only be dropped when leaving the scope.
            let (_stream, device_handle) = match Self::get_output_stream(&device) {
                Ok((_stream, device_handle)) => {
                    info!("Successfully created stream and device handle");
                    (_stream, device_handle)
                },
                Err(e) => {
                    error!("Failed to get output stream, device: '{device}', error: {e}");
                    bail!("Failed to get output stream, device: '{device}', error: {e}");
                }
            };

            let sink = match rodio::Sink::try_new(&device_handle) {
                Ok(sink) => {
                    info!("Successfully created sink");
                    sink
                },
                Err(e) => {
                    error!("Failed to create new sink, device: '{device}', error: {e}");
                    bail!("Failed to create new sink, device: '{device}', error: {e}");
                }
            };
            
            let duration = decoder.total_duration();
           
            sink.set_volume(volume);
            sink.append(decoder);

            match duration {
                Some(mut dur) => {
                    debug!("Will play sound '{file_path}' on device '{device}' for a duration of {} ms, at volume: {volume}", dur.as_millis());
                    dur = dur + Duration::from_millis(50);
                    thread::sleep(dur);
                    if sink.empty() {
                        info!("Played sound '{file_path}' on device '{device}' for {} ms, at volume: {volume}", dur.as_millis());
                    } else {
                        error!("The callout should have been done by now, will stop it, number of sounds left: {}", sink.len());
                        sink.stop();
                    }
                },
                None => {
                    warn!("Will play sound '{file_path}' on device '{device}'. Didn't find an duration, will shut it down after a maximum of {default_sound_duration} s");
                    sink.sleep_until_end(); 
                }
            }
            
            Ok(())
        });

        set.spawn(async move {
            tokio::time::sleep(tokio::time::Duration::from_millis(default_sound_duration)).await;
            bail!("Sound playing reach maximum duration of {default_sound_duration} s, will shut down tasks")
        });

        while let Some(res) = set.join_next().await {
            return res?;
        }
        Ok(())
    }

@dvdsk
Copy link
Collaborator

dvdsk commented Aug 27, 2024

How often is this function called? If a lot of sinks are created that might explain the ALSA error.

I do not think the spawn_blocking code is actually aborted when the JoinSet is dropped. Its rather hidden in their docs:

Be aware that tasks spawned using spawn_blocking cannot be aborted because they are not async.

Still this function as is should never create that large a memory allocation. If it does then thats on rodio. Could you run your program using gdb (this might be a reasonable tutorial)?

@FinitelyFailed
Copy link
Author

FinitelyFailed commented Aug 27, 2024

How often is this function called? If a lot of sinks are created that might explain the ALSA error.

At most at an 2 min interval. More commonly like every 10 minutes.

I do not think the spawn_blocking code is actually aborted when the JoinSet is dropped. Its rather hidden in their docs

Be aware that tasks spawned using spawn_blocking cannot be aborted because they are not async.

When you are using jointset, I think you can abort blocking, but I have to run some tests to actually see if it works.

Still this function as is should never create that large a memory allocation. If it does then thats on rodio. Could you run your program using gdb (this might be a reasonable tutorial)?

I'll try to set up gdb, the problem is that the error happens sporadic.

@dvdsk
Copy link
Collaborator

dvdsk commented Aug 27, 2024

I think you can abort blocking, but I have to run some tests to actually see if it works.

Do run the test, however reading the source code of tokio it seems they can not be aborted. JoinSet::spawn_blocking(f) just calls runtime::spawn_blocking(f) just like task::spawn_blocking does. It then puts the JoinHandle returned by spawn blocking in a set.

Joinset is just a Set in which the JoinHandles are put with the added benefit that it calls abort on all the handles when the set drops

@dvdsk
Copy link
Collaborator

dvdsk commented Aug 27, 2024

I'll try to set up gdb, the problem is that the error happens sporadic.

You can see if you can change something to your code to make it happen all the time. But that is really hard. At this point I would just rewrite it a bit and hope the error goes away.

@FinitelyFailed
Copy link
Author

I think you can abort blocking, but I have to run some tests to actually see if it works.

Do run the test, however reading the source code of tokio it seems they can not be aborted. JoinSet::spawn_blocking(f) just calls runtime::spawn_blocking(f) just like task::spawn_blocking does. It then puts the JoinHandle returned by spawn blocking in a set.

Joinset is just a Set in which the JoinHandles are put with the added benefit that it calls abort on all the handles when the set drops

Thanks for pointing that out for me. I wrote a test and you are correct. It is just really weird that the tokio documentation is that contradictory ... I'll rewrite my code, to handle it differently.

@FinitelyFailed
Copy link
Author

FinitelyFailed commented Aug 27, 2024

I managed to get some more logs via journalctl:

Aug 27 14:11:08 raspberrypi4-64 myprogram[9416]: 2024-08-27T14:11:08.014471931+00:00 INFO libs/modules/callout_module/src/application_state/utils/sound_player.rs:133 - Successfully read file to play, path: '/data/gpt/fe/audio/83537078-9a4d-469f-b50f-e6dfd5876514', metadata: Metadata { file_type: FileType(FileType { mode: 33188 }), is_dir: false, is_file: true, permissions: Permissions(FilePermissions { mode: 33188 }), modified: Ok(SystemTime { tv_sec: 1724766468, tv_nsec: 45997281 }), accessed: Ok(SystemTime { tv_sec: 1724764557, tv_nsec: 193258474 }), created: Ok(SystemTime { tv_sec: 1723705373, tv_nsec: 10650076 }), .. }
Aug 27 14:11:08 raspberrypi4-64 myprogram[9416]: 2024-08-27T14:11:08.014960579+00:00 INFO libs/modules/callout_module/src/application_state/utils/sound_player.rs:144 - Successfully created decoder
Aug 27 14:11:08 raspberrypi4-64 device-service[335]: {"Timestamp":"2024-08-27T14:11:07.030546671Z","Level":"Information","RenderedMessage":"Successfully updated 'Flags'","Properties":{"BlockGid":"9041005900300633","ApplicationNetwork":"69366fd4-baa9-460a-b9ee-0e0d0f37b234","BlockRef":"9041005900300633","SessionId":"8b573bbd-92d4-4887-99cd-23789df05327","File":"libs/modules/inner_signage_module/src/data_handler.rs","VehicleRef":"9031005900306034","ApplicationName":"PassengerInformationService","DeviceId":"1d93d2c2-2173-4ba1-a457-02172efd15ae-gptedge","Target":"inner_signage_module::data_handler","VehicleId":"9031005900306034","JourneyRef":"9015005000300133","ModulePath":"inner_signage_module::data_handler","TripId":"55700000076131905","EventId":"5749357a-331a-4618-a6aa-85ccbfab845a","Line":"148","MACAddress":"e45f0126601d","TripGid":"9015005000300133","Hostname":"raspberrypi4-64"}}
Aug 27 14:11:08 raspberrypi4-64 myprogram[9416]: ALSA lib ../../alsa-plugins-1.2.1/oss/pcm_oss.c:377:(_snd_pcm_oss_open) Unknown field port
Aug 27 14:11:08 raspberrypi4-64 myprogram[9416]: ALSA lib ../../alsa-plugins-1.2.1/oss/pcm_oss.c:377:(_snd_pcm_oss_open) Unknown field port
Aug 27 14:11:08 raspberrypi4-64 myprogram[9416]: ALSA lib ../../alsa-plugins-1.2.1/usb_stream/pcm_usb_stream.c:508:(_snd_pcm_usb_stream_open) Unknown field hint
Aug 27 14:11:08 raspberrypi4-64 myprogram[9416]: ALSA lib ../../alsa-plugins-1.2.1/usb_stream/pcm_usb_stream.c:508:(_snd_pcm_usb_stream_open) Unknown field hint
Aug 27 14:11:08 raspberrypi4-64 myprogram[9416]: ALSA lib ../../alsa-plugins-1.2.1/usb_stream/pcm_usb_stream.c:508:(_snd_pcm_usb_stream_open) Unknown field hint
Aug 27 14:11:08 raspberrypi4-64 myprogram[9416]: ALSA lib ../../alsa-plugins-1.2.1/usb_stream/pcm_usb_stream.c:508:(_snd_pcm_usb_stream_open) Unknown field hint
Aug 27 14:11:08 raspberrypi4-64 myprogram[9416]: memory allocation of 5404319552844632832 bytes failed
Aug 27 14:11:08 raspberrypi4-64 audit[9416]: ANOM_ABEND auid=4294967295 uid=0 gid=0 ses=4294967295 pid=9416 comm="cpal_alsa_out" exe="/usr/bin/myprogram" sig=6 res=1
Aug 27 14:11:08 raspberrypi4-64 myprogram[9416]: 2024-08-27T14:11:08.157546733+00:00 INFO libs/modules/callout_module/src/application_state/utils/sound_player.rs:158 - Successfully created stream and device handle
Aug 27 14:11:08 raspberrypi4-64 myprogram[9416]: 2024-08-27T14:11:08.157714492+00:00 INFO libs/modules/callout_module/src/application_state/utils/sound_player.rs:169 - Successfully created sink
Aug 27 14:11:08 raspberrypi4-64 kernel: audit: type=1701 audit(1724767868.152:328): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=9416 comm="cpal_alsa_out" exe="/usr/bin/myprogram" sig=6 res=1
Aug 27 14:11:08 raspberrypi4-64 kernel[222]: [43852.532479] audit: type=1701 audit(1724767868.152:328): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=9416 comm="cpal_alsa_out" exe="/usr/bin/myprogram" sig=6 res=1
Aug 27 14:11:08 raspberrypi4-64 mosquitto[292]: 1724767868: Client myprograme45f0126601d closed its connection.
Aug 27 14:11:08 raspberrypi4-64 systemd[1]: myprogram: Main process exited, code=killed, status=6/ABRT
Aug 27 14:11:08 raspberrypi4-64 systemd[1]: myprogram: Failed with result 'signal'.

Which seems to point to cpal?

@dvdsk
Copy link
Collaborator

dvdsk commented Aug 27, 2024

Which seems to point to cpal?

I see nothing suspicious here, could you point it out for me?

@FinitelyFailed
Copy link
Author

FinitelyFailed commented Aug 27, 2024

These three lines:

Aug 27 14:11:08 raspberrypi4-64 audit[9416]: ANOM_ABEND auid=4294967295 uid=0 gid=0 ses=4294967295 pid=9416 comm="cpal_alsa_out" exe="/usr/bin/myprogram" sig=6 res=1
Aug 27 14:11:08 raspberrypi4-64 kernel: audit: type=1701 audit(1724767868.152:328): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=9416 comm="cpal_alsa_out" exe="/usr/bin/myprogram" sig=6 res=1
Aug 27 14:11:08 raspberrypi4-64 kernel[222]: [43852.532479] audit: type=1701 audit(1724767868.152:328): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=9416 comm="cpal_alsa_out" exe="/usr/bin/myprogram" sig=6 res=1

I do not know what "kernel: audit" means, but both lines mentions "comm="cpal_alsa_out".

dvdsk added a commit to dvdsk/tokio that referenced this issue Aug 27, 2024
I noticed some confusion when users interact with JoinSet. While
task::spawn_blocking notes that blocking code can not be aborted
that same warning is not present in the JoinSet.

While this makes perfect sense when you know how async works in Rust
it is difficult for users less experienced with the inner workings of
async.

Example: RustAudio/rodio#606 (comment)
@github-staff github-staff deleted a comment from Lxx-c Oct 23, 2024
@github-staff github-staff deleted a comment from Lxx-c Oct 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants
@dvdsk @FinitelyFailed and others