Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: Fix audio file with cover image treated as video file #91

Open
wants to merge 8 commits into
base: 90-bug-audio-file-treated-as-a-video-file
Choose a base branch
from

Conversation

Rushi1109
Copy link

fixes the bug raised in #90

The ffprobe tool is used to check if any stream contains codec_name as mjpeg.

If mjpeg is present, the media is treated as Audio, other media is treated as Video

@omeryusufyagci
Copy link
Owner

Hey @Rushi1109, thanks for the PR. Frankly, I wasn't familiar with the full output set of ffprobe, but from what I see they provide disposition.attached_pic as a field in itself. Could you please get the json output which should give you a list of streams, with each stream providing you this field. From what I briefly checked, that should literally be what we're looking for.

Thanks for checking!

@omeryusufyagci omeryusufyagci added bug Something isn't working core Media processing component, the core C++ binary. iteration Requires further iteration for approval labels Dec 24, 2024
@Rushi1109
Copy link
Author

Hey @omeryusufyagci,

When looking at the raw data of the media I see DISPOSITION:attached_pic = 1 for stream index 2 which is the video stream.

But, When I tried to get the information about disposition.attached_pic from ffprobe tool, I am not able to get it's value. I tried extracting from json but still it's not showing attached_pic for that file. Do you have any idea about why it's getting filtered out, by ffmpeg?

@Rushi1109
Copy link
Author

Moreover @omeryusufyagci,

I have researched a bit and got to know that If the video stream contains a static image, the average frame rate for that stream will be 0/0. My current implementation is using the avg_frame_rate from video stream and decides the media type.

Do you see any issue with this approach?

@omeryusufyagci
Copy link
Owner

Hey @omeryusufyagci,

When looking at the raw data of the media I see DISPOSITION:attached_pic = 1 for stream index 2 which is the video stream.

But, When I tried to get the information about disposition.attached_pic from ffprobe tool, I am not able to get it's value. I tried extracting from json but still it's not showing attached_pic for that file. Do you have any idea about why it's getting filtered out, by ffmpeg?

Please make sure you include disposition after the -show_entries flag, e.g. -show_entries stream=index,<other_stuff>,disposition.

Could you try like that, and if there's still an issue please provide the command you used and the output you received.

I have researched a bit and got to know that If the video stream contains a static image, the average frame rate for that stream will be 0/0. My current implementation is using the avg_frame_rate from video stream and decides the media type.

Do you see any issue with this approach?

Thanks for figuring this out! Good to know that'd be a valid fix. The disposition.attached_image would be a more robust solution, as it explicitly identifies cover images and aligns better with the intent of filtering out such streams.

Could you please verify this and let me know if it resolves the issue? Thank you!

@Rushi1109
Copy link
Author

Please make sure you include disposition after the -show_entries flag, e.g. -show_entries stream=index,<other_stuff>,disposition

Yes I included disposition. I am using the same command of ffprobe but instead of avg_frame_rate, I added disposition and removed that awk command. But, I am not seeing any output for disposition.

Can you check it on your side? Do you see disposition in output?

@omeryusufyagci
Copy link
Owner

omeryusufyagci commented Dec 29, 2024

Please make sure you include disposition after the -show_entries flag, e.g. -show_entries stream=index,<other_stuff>,disposition

Yes I included disposition. I am using the same command of ffprobe but instead of avg_frame_rate, I added disposition and removed that awk command. But, I am not seeing any output for disposition.

Can you check it on your side? Do you see disposition in output?

HI @Rushi1109, you're right. I was able to reproduce the same behavior. It depends on what's bundled with the metadata, and we won't be able to count on the existence of that field. That means we'll have to check for this ourselves.

You're also correct, that a cover image is reported as a video stream that has 0/0 avg frame rate. It may also be reported as 'Cover' in a comment field, but that again is dependent on the metadata, whereas the 0/0 avg frame rate seems to be always available.

In this case, I propose to get the same ffprobe output as json (fyi, we're already using nlohmann/json in the project), parse the avg frame rate into a vector, and check if a given file has only 1 video stream and that video stream has 0/0 avg frame rate. Meaning, as soon as we iterate through a second video stream, we should conclude our check to avoid wasting time there.

Could you update this PR to implement this? Thanks

@Rushi1109
Copy link
Author

HI @Rushi1109, you're right. I was able to reproduce the same behavior. It depends on what's bundled with the metadata, and we won't be able to count on the existence of that field. That means we'll have to check for this ourselves.

You're also correct, that a cover image is reported as a video stream that has 0/0 avg frame rate. It may also be reported as 'Cover' in a comment field, but that again is dependent on the metadata, whereas the 0/0 avg frame rate seems to be always available.

In this case, I propose to get the same ffprobe output as json (fyi, we're already using nlohmann/json in the project), parse the avg frame rate into a vector, and check if a given file has only 1 video stream and that video stream has 0/0 avg frame rate. Meaning, as soon as we iterate through a second video stream, we should conclude our check to avoid wasting time there.

Could you update this PR to implement this? Thanks

Sure. I'll look into that today.

@Rushi1109
Copy link
Author

Hello @omeryusufyagci,

I have made required changes. I have tested it manually and it is working as expected.

Can you test it on your side and let me know if any change is required?

@Rushi1109 Rushi1109 changed the title check if the stream contains mjpeg as codec_name, then treat media as audio Bug: Fix audio file with cover image treated as video file Dec 31, 2024
Copy link
Owner

@omeryusufyagci omeryusufyagci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes @Rushi1109 !

Here's some comments for you:

About the parseFrameRate() addition:
The MediaProcessor::Utils NS is normally meant for generic utility. In this case, the frame rate helper is intrinsically coupled with getMediaType(). So, we should move this under the Engine class as a private member function. Also, for expressiveness, let's rename it to hasZeroFrameRate().

Regarding the implementation of this helper, I believe we can make it simpler. We want to know if an otherwise audio file has a video stream that has exactly "0/0" for the avg frame rate. So, we can simply and do a full string comparison, instead of parsing each char in the field. Pass a const ref of the stream object, and it can be as simple as a 1-line return statement that checks if it matches the expected case. We should however, make sure to cover the cases where the field is missing or invalid.

Updates on getMediaType():
Good start on this with the json parsing, but we can also simplify this similar to above. We should also encapsulate the logic where we're determining if we have a valid video stream. Otherwise, it makes it needlessly difficult understand what this function does at a glance.

Here's roughly what I have in mind:

for (const auto& stream : streams) {
    if (hasValidVideoStream(stream)) {
        return MediaType::Video;
    }
    if (hasValidAudioStream(stream)) {
        hasAudioStream = true; // define this flag false before the loop
    }
}

// if we haven't returned for Video and have found a valid audio stream
if (hasAudioStream) {
    return MediaType::Audio;
}

// otherwise
return MediaType::Unsupported;
}

So please encapsulate both checks (hasValid...) into individual functions. I realize the one for audio will likely be a 1-line return statement, but I'd still prefer to have it for the sake of consistency.
Within the hasValidVideoStream(), you'd call hasZeroFrameRate() with a const ref of the json stream for the avg frame rate field.
The other 2 helpers should also be called with const refs of the stream and all 3 new functions should be const themselves too.

This approach should make it easier to understand what getMediaType() does, and in general simplify the deduction logic to see if we have a cover image. Could you please update this PR accordingly?

Please let me know if you have any questions along the way. Many thanks!

@Rushi1109 Rushi1109 changed the base branch from main to 90-bug-audio-file-treated-as-a-video-file January 2, 2025 15:43
@Rushi1109
Copy link
Author

Updated the PR.

Copy link
Owner

@omeryusufyagci omeryusufyagci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Rushi1109, thanks for the updates. Definitely moving in the right direction! Could you please take a look and update the PR? Thanks!

Comment on lines 30 to 31
mutable bool hasAudioStream;
mutable int videoStreamCount;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason to add these as member variables is not clear to me. They do not represent persistent state for the Engine class. Their meanings persist through a function call of getMediaType() and do not really make sense outside of that.

Although not critical here as we won't keep them, but for the future: before adding mutable-specified variables, we should take a good look at why we intend to that. It could point to a bad design choice that I'd rather address at source than to go the mutable route.

Please refactor these variables to be local to the respective function, where they're used.

Comment on lines 87 to 94
// If we haven't returned for Video and have found a valid audio stream
if (hasAudioStream) {
return MediaType::Audio;
} else {
throw std::runtime_error("Unsupported media type detected.");
}

// otherwise
return MediaType::Unsupported;
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 2 lines of comments here were more for providing clarity in the discussion, but typically we refrain to add comments that state otherwise obvious operations. So in this case, let's remove the comments here

Comment on lines 79 to 85
for (const auto& stream : streamData["streams"]) {
if (hasValidVideoStream(stream)) {
return MediaType::Video;
} else if (hasValidAudioStream(stream)) {
hasAudioStream = true;
}
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

l.82 here the else if is redundant as we're returning for valid video above. It should be a separate if statement

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

l.82 here the else if is redundant as we're returning for valid video above. It should be a separate if statement

How is it redundant? I don't understand. Can you please explain it?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two conditional checks have no logical coupling or dependency. Since we already return if a valid video stream is found, the code following that if statement will only execute if the condition was false. So, the else if is redundant, as the second check is independent and can simply be a separate if.

Comment on lines 56 to 63
/**
* @brief Checks if the provided stream is valid video stream
*
* @param stream json data of stream
*
* @return True if the stream is valid. False otherwise.
*/
bool hasValidVideoStream(const nlohmann::json& stream) const;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does valid mean here? We should describe what exactly that means here


bool Engine::hasValidVideoStream(const nlohmann::json& stream) const {
if (stream["codec_type"] == "video") {
++videoStreamCount;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incrementing this here (even after the refactor to make it a local var to getMediaType) would make it more difficult to understand the workflow. This should be done at the call side.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incrementing this here (even after the refactor to make it a local var to getMediaType) would make it more difficult to understand the workflow. This should be done at the call side.

Removing this variable. As we will return from getMediaType() as soon as we will get an actual video stream.(Video stream that is not cover image.)

In case when we encounter video stream with cover image, we will return false from here. In next iteration, when an actual video stream is computed, we will return true after checking avg_frame_rate of that stream. I do suggest to check for avg_frame_rate for that stream as well.

I don't think we should return just by video stream count being more than 1. We should check for avg_frame_rate. What do you think about this? let me know.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've checked the changes.

First off, we're now saying any stream that has the avg frame rate field will be treated as an invalid video stream. What if ffprobe returns sensible values for actual videos with that field? which wouldn't be surprising
Checking if the frame rate is exactly 0/0 and invalidating then, is logical. To determine if a video stream is valid, perhaps you can check non-zero duration in addition.

Also, I don't see the point of refactoring the function name to isActualVideoStream.

Regarding your question: As soon as we have even one single valid video stream, that would classify the media file as a video and we should return Video. There is no longer a need to check each of the remaining streams.
On the contrary, to determine if an audio file only has a cover attached, we have to ensure that media file has only 1 video stream whose avg frame rate is exactly 0/0.

I hope this answers your question, and please let me know otherwise.

Copy link
Author

@Rushi1109 Rushi1109 Jan 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First off, we're now saying any stream that has the avg frame rate field will be treated as an invalid video stream. What if ffprobe returns sensible values for actual videos with that field? which wouldn't be surprising
Checking if the frame rate is exactly 0/0 and invalidating then, is logical. To determine if a video stream is valid, perhaps you can check non-zero duration in addition.

The current implementation works as described below:

  1. It checks for the codec_type field. It should be video.
  2. It then ensures the avg_frame_rate field exists.
  3. And then we check if the avg_frame_rate is equal to 0/0 which indicates it is a static media.

In all the above cases we return false.

  1. Otherwise we return true as the stream is now actual video stream.

The thing with the duration field would be problematic, because when we checked earlier, the ffmpeg tool outputs the duration of static media same as audio duration. So, I don't think we should rely on duration field.

The current implementation look good to me. As there is early return when we encounter first ever valid video stream. (We terminate the for loop from that point)

But, we do iterate when the first stream is not valid video stream (static media, cover image).

Let me know if there's something I've misunderstood.

}

bool Engine::hasValidVideoStream(const nlohmann::json& stream) const {
if (stream["codec_type"] == "video") {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also adding to my comment below, this should be an early return. If it's not a video stream to begin with, we should immediately return false.

Comment on lines 100 to 102
if (videoStreamCount > 1 || !hasZeroFrameRate(stream["avg_frame_rate"])) {
return true;
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not cover the inexistence of the field, and may cause a crash. Please ensure the field is available and return false otherwise. Similarly, do a return false, instead of nesting under a conditional level

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working core Media processing component, the core C++ binary. iteration Requires further iteration for approval
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants