
Add support for Real-ESRGAN, plus various fixes (hangs, async video, etc.) #1133

Closed · wants to merge 18 commits

Conversation

arximboldi

@arximboldi arximboldi commented Jun 29, 2024

This picks up the work in #1102 by @aa-ko to introduce Real-ESRGAN support, adding support for all models exposed by realesrgan-ncnn-py and testing it outside of a Docker container.

Additionally, I have fixed a few issues that I found. I wasn't sure whether you prefer multiple smaller PRs or to review everything at once. The work is separated into multiple commits, so I can still split it into multiple PRs if you prefer.

These are the highlights from this PR:

There are also some smaller changes that I'm not sure you will agree with, but that I think are quite convenient:

  • A shell.nix environment that allows installing all the system dependencies in an isolated environment by simply running nix-shell. This was crucial for me to be able to run and test the program locally outside of a Docker container.
  • Minor improvements to the logging system (proper acknowledgement of the -l flag, and introduction of a new -L flag).

Thank you @aa-ko for starting the work on integrating Real-ESRGAN. Its realesr-animevideov3 model is giving me incredible results, with anime image quality comparable to or better than realcugan's, yet much faster!

Thank you @k4yt3x for this incredible tool. Looking into the code has shown me how much love has gone into it, and it is working super well for me now!

aa-ko and others added 4 commits February 21, 2024 15:50
Choosing realesrgan ignores the noise flag for now and always uses the realesrgan-x4plus model.
A few scalers output a lot of crap like:
   10%
   25%
   50%
   ...

That messes with the progress bar indicator.
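The filtering described above can be sketched roughly like this (a minimal sketch; the regex and helper name are illustrative assumptions, not the PR's actual code):

```python
import re

# Bare percentage lines ("10%", " 25,3% ", ...) that some ncnn scalers
# print on stderr; left alone, they corrupt the progress bar drawn on
# the same terminal.
PERCENT_LINE = re.compile(r"^\s*\d+([.,]\d+)?%\s*$")

def filter_scaler_output(lines):
    """Drop progress-percentage lines, keep everything else."""
    return [line for line in lines if not PERCENT_LINE.match(line)]
```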
@arximboldi arximboldi changed the title Add support for Real-ESRGAN, and fix hang and async videos, other minor improvements Add support for Real-ESRGAN, plus various fixes (hangs, async video, etc.) Jun 29, 2024
This provides a reproducible way to get all the needed dependencies and
allows running the software in NixOS environments.
This solves a couple of issues:

1. The log level passed with -l was not properly applied to loguru; it
   always used `debug`, the default.

2. When actually passing `-l debug`, ffmpeg would flood us with too
   much information; it is better to have a separate option for
   debugging ffmpeg issues.
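The two fixes can be sketched as follows (a hedged sketch using stdlib logging to illustrate the idea; the PR itself configures loguru, and the returned ffmpeg arguments are illustrative):

```python
import logging
import sys

def configure_logging(level: str, ffmpeg_debug: bool = False) -> list:
    # 1. Actually honor the -l level (the PR does the analogous thing
    #    with loguru's logger.remove()/logger.add()).
    logging.basicConfig(stream=sys.stderr, level=getattr(logging, level.upper()))
    # 2. Keep ffmpeg's verbosity behind a separate -L switch, so that
    #    `-l debug` alone does not flood the log with ffmpeg output.
    return ["-loglevel", "debug" if ffmpeg_debug else "error"]
```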
Omitting this can actually lead to corrupted videos, or to videos with
bad synchronization or broken seeking.  This option seems to do the
right thing here and is innocuous in our case.
...this would cause the whole application to hang during the teardown
sequence.
Since these are now only printed when passing -l debug, I think this
is an acceptable compromise, and it can help debug the various hanging
problems the application has had.
This problem used to be more severe, but has become less frequent with
our fix for k4yt3x#1132

The problem happens because we used to take `frame_count` as an
absolute truth, and we would iterate until we had processed that many
frames.  However, as we've learned, it is just an estimate: the
`Decoder` thread can finish before we hit that frame count.

With this change, we detect that scenario, and gracefully finish the
rendering process, making sure that all pending frames have been read
and processed and get written in the final stream.
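The graceful-finish logic described above can be sketched like this (the names and the None sentinel are hypothetical, not the PR's actual code):

```python
import queue

def render(frames, frame_count, process_frame, write_frame):
    """Process frames until frame_count is reached OR the decoder ends.

    frame_count is treated as an estimate: a None sentinel from the
    decoder queue means end-of-stream, and we stop gracefully after
    flushing whatever was still pending.
    """
    processed = 0
    while processed < frame_count:
        try:
            frame = frames.get(timeout=1.0)
        except queue.Empty:
            continue  # decoder still running, just slow
        if frame is None:
            # Decoder finished early: frame_count was only an estimate.
            break
        write_frame(process_frame(frame))
        processed += 1
    return processed
```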
Some anime files in particular like to include custom fonts and
similar data in these streams. I think it is useful to keep them, so
as to keep the generated file as close to the original as possible.
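A minimal sketch of what keeping those streams could look like on the ffmpeg side (the helper and mappings are illustrative assumptions, not the PR's actual encoder code; `1:t?` selects attachment streams such as embedded fonts, and the `?` makes each mapping optional):

```python
def encoder_args(original, upscaled, output):
    """Build an ffmpeg command that muxes the upscaled video with the
    original file's audio, subtitle, and attachment streams."""
    return [
        "ffmpeg",
        "-i", upscaled,
        "-i", original,
        "-map", "0:v",   # upscaled video
        "-map", "1:a?",  # original audio, if any
        "-map", "1:s?",  # original subtitles, if any
        "-map", "1:t?",  # original attachments, e.g. embedded fonts
        "-c:a", "copy",
        "-c:s", "copy",
        output,
    ]
```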
@twardoch

twardoch commented Sep 1, 2024

@arximboldi

class KillableListener(Listener, KillableThreadMixin):
    """
    A killable version of pynput.keyboard.Listener, as joining()
    seems to hang on some systems even after properly calling close().
    """
    pass

in https://github.com/arximboldi/video2x/blob/realesrgan/video2x/video2x.py#L131C1-L136C9 causes a failure when pynput is not available, because Python tries to subclass Listener at import time. You should define that class in a scoped way.
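One scoped alternative might look like this (a hedged sketch; the mixin stub and the fallback class are stand-ins for illustration, not the PR's actual implementation):

```python
try:
    from pynput.keyboard import Listener
except Exception:  # pynput missing, or no display available
    Listener = None

class KillableThreadMixin:
    """Stand-in for the PR's mixin; only illustrates the shape."""
    def kill(self):
        pass

if Listener is not None:
    class KillableListener(Listener, KillableThreadMixin):
        """Killable pynput.keyboard.Listener; join() can hang on some
        systems even after the listener is stopped."""
        pass
else:
    class KillableListener(KillableThreadMixin):
        """No-op stub used when pynput is unavailable (e.g. headless)."""
        def start(self):
            pass
        def stop(self):
            pass
```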

@twardoch

twardoch commented Sep 1, 2024

@arximboldi Also when I try running with realesrgan on Google Colab, I get

Unrecognized option 'fps_mode'. Error splitting the argument list: Option not found

(which seems to be referenced in https://github.com/arximboldi/video2x/blob/realesrgan/video2x/decoder.py )
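A possible workaround sketch: `-fps_mode` was introduced in ffmpeg 5.1 as the successor of `-vsync`, so the decoder could probe the installed binary and fall back on older builds (the helper names here are hypothetical):

```python
import subprocess

def pick_fps_flag(help_text):
    """-fps_mode exists since ffmpeg 5.1; older builds only know -vsync."""
    return "-fps_mode" if "fps_mode" in help_text else "-vsync"

def fps_mode_args(mode="passthrough"):
    # Probe the installed ffmpeg's full help text for the new option.
    help_text = subprocess.run(
        ["ffmpeg", "-hide_banner", "-h", "full"],
        capture_output=True, text=True,
    ).stdout
    return [pick_fps_flag(help_text), mode]
```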

@twardoch

twardoch commented Sep 1, 2024

Turns out that 'fps_mode' is a relatively new option in ffmpeg, and the standard Google Colab environment installs an older ffmpeg. But I was able to work around it with:

# Install system dependencies
!apt-get update
!apt-get install -y --no-install-recommends \
    python3-pip libvulkan-dev glslang-dev glslang-tools \
    build-essential swig ninja-build nvidia-driver-535 \
    mpv xserver-xorg-video-dummy xvfb libomp5
!sudo curl -L https://github.com/BtbN/FFmpeg-Builds/releases/download/latest/ffmpeg-master-latest-linux64-gpl.tar.xz -o /usr/local/bin/ffmpeg.tar.xz

%cd /usr/local/bin/
!7z e /usr/local/bin/ffmpeg.tar.xz -aoa
!7z e /usr/local/bin/ffmpeg.tar -aoa
!sudo chmod a+rx /usr/local/bin/ffmpeg

@twardoch

twardoch commented Sep 1, 2024

Similarly, I was able to work around the poor integration of Listener in Google Colab by providing:

%env DISPLAY=:0
class Listener:
    pass

from video2x import Video2X
video2x = Video2X()

But that’s not how it should work :)

@k4yt3x
Owner

k4yt3x commented Oct 8, 2024

Thank you all @arximboldi @aa-ko @twardoch for the amazing work here. I've been very busy with work and several other non-FOSS projects over the last half of this year. Sorry I missed that this PR came in; I've been getting way too many messages from GitHub and didn't notice it was submitted. I only saw it after I completed the 6.0.0 rewrite.

It has always been my goal to introduce RealESRGAN in Video2X. I managed to do it in the rewrite. Take a look if you're still interested.

Unfortunately I can't merge this without destroying the commit history, so I'll just have to close this. Sorry again for not responding to this in time. I really appreciate your amazing work!

@k4yt3x k4yt3x closed this Oct 8, 2024
@aa-ko

aa-ko commented Oct 8, 2024

@k4yt3x Wow, you actually just casually dropped the C++ rewrite, I am speechless 😮

As far as I can tell, this looks exactly like what I was hoping for. I'll try 6.0.0 ASAP, maybe I can contribute something this time lmao

Thanks for all the hard work, cheers! 🎉

@twardoch

twardoch commented Oct 8, 2024

I’ll try it ASAP on Colab

@k4yt3x
Owner

k4yt3x commented Oct 8, 2024

@twardoch I haven't updated the Colab playbook yet. You'll need to compile it yourself if you wanna try Colab.

@twardoch

twardoch commented Oct 8, 2024

I guess so. I wonder if you could provide simple instructions on how to build the stuff. I can adapt them to macOS and will PR. So far I had installed various libs on macOS via brew, and then `make` failed on realesrgan not being available. I will look at your GitHub Action for how you build on Linux and will try to adapt it for macOS.

@k4yt3x
Owner

k4yt3x commented Oct 8, 2024

@twardoch if you can actually make it work for Mac, that'll be amazing. I've never had a Mac in my life, so it'll be hard for me to make that work. As for steps to build, here's the one for Debian/Ubuntu:

video2x/Makefile, lines 36 to 54 at 411cca4:

debian:
	apt-get update
	apt-get install -y --no-install-recommends \
		build-essential cmake clang pkg-config \
		libavcodec-dev \
		libavdevice-dev \
		libavfilter-dev \
		libavformat-dev \
		libavutil-dev \
		libswscale-dev \
		libvulkan-dev \
		glslang-tools \
		libomp-dev
	cmake -B /tmp/build -S . -DUSE_SYSTEM_NCNN=OFF \
		-DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ \
		-DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/tmp/install \
		-DINSTALL_BIN_DESTINATION=. -DINSTALL_INCLUDE_DESTINATION=include \
		-DINSTALL_LIB_DESTINATION=. -DINSTALL_MODEL_DESTINATION=.
	cmake --build /tmp/build --config Release --target install --parallel

Another one is for Arch in PKGBUILD.

@twardoch

twardoch commented Oct 8, 2024

The Homebrew package installer for macOS does include "ncnn", but I'm not sure whether that means anything will have to be adapted from Vulkan to another backend.

Right now the macOS situation for AI is very complicated because drastically different backends are used for the new Apple Silicon Macs (where there are more inference acceleration possibilities) vs. the older Intel Macs (which often can do only CPU inference).

@twardoch

twardoch commented Oct 8, 2024

P.S. Many local AI packages only exist for the Apple Silicon Macs because the older Apple Intel hardware is basically completely unsuitable for local AI inference.

For example Topaz Video AI runs like 50x faster on my MacBook Air M3 than on a beefy MacBook Pro Intel that's less than 3 years older.

So it's running time of 3 minutes vs. 2 hours to complete the same task.

@mirh

mirh commented Oct 9, 2024

That sounds like a Mac problem; I'm skeptical that even without an NPU a properly accelerated Coffee Lake could be so far behind.

Anyhow, is RESRGAN supposed to be the only supported driver now?

@k4yt3x
Owner

k4yt3x commented Oct 9, 2024

@mirh there's also libplacebo, which renders Anime4K v4 now, but it should be compatible with any mpv-compatible GLSL shader.

RealESRGAN has both an anime model and a real-life model. From their paper it also looks like the performance is better than RealSR (?), so I didn't bother adding RealSR. The other ones are kinda old so I didn't bother either.

@mirh

mirh commented Oct 10, 2024

Their paper speaks truth, and all things considered I would probably also recommend it as the "overall" default.
But IIRC from my ancient tests, while RealSR may generally be blurrier (though still retaining the most detail on the market, bar W2xEX's "photo-conservative" mode), I also found it less foolhardy. At least with my 360p video, whatever exaggeration it produced still felt "natural" (as in, smoothly transitioning with the rest of the picture), unlike some of RESRGAN's oversharpening blunders.

Similarly with an anime DVD: while it doesn't "wildly make stuff up" (when the camera pans across a scene with a corrugated iron roof and a metal fence, they seem to be dancing with CUGAN), you can still slightly tell in certain places that it is trying too hard (for this reason waifu2x seems a solid "safe ground").
Presumably temporal stability is nowhere near as much of a concern if you are working with higher-resolution content.

And last but not least, just a few months ago I found that SRMD (which I had always thought strictly inferior to the alternatives, at least as far as quality was concerned) could somehow give me the best results bar none with a 369x207 crop of a picture from my phone.

P.S. Anime4K seems a bit pointless, if anything. Precisely because it is as lightweight as it is fairly low quality, I don't think it's the kind of upscaler people would use in a complex tool like this.

@k4yt3x
Owner

k4yt3x commented Oct 10, 2024

I've been away long enough that I no longer know what the best options are. In addition to collecting opinions, I'd also like the decision of what to add to be backed by testing results like VMAF (mostly because adding support for new solutions is time-consuming). It's a bit beyond my ability to do it all by myself. Ideally we'd create an environment where people can discuss and vote... perhaps try bringing this up in the Telegram group or something?
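For the VMAF part, ffmpeg's libvmaf filter (available in builds compiled with --enable-libvmaf) can already produce per-frame scores; a minimal command-builder sketch (the helper name is illustrative):

```python
def vmaf_command(distorted, reference, log_path="vmaf.json"):
    """Build an ffmpeg command that scores `distorted` against
    `reference` with libvmaf and writes a JSON report."""
    return [
        "ffmpeg",
        "-i", distorted,
        "-i", reference,
        "-lavfi", "libvmaf=log_fmt=json:log_path=" + log_path,
        "-f", "null", "-",
    ]
```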

@mirh

mirh commented Oct 10, 2024

I'm also very much out of the loop, but honestly there hasn't been that much activity in the last few years (I assume there's only so much you can do with tiny generic networks?). At most there might be slightly differently trained models (that is a madhouse, tbf)?

As for "objective testing®", I cannot recommend FFMetrics enough (and then maybe this?).

@k4yt3x
Owner

k4yt3x commented Oct 11, 2024

Both look interesting. Perhaps I should set up some kind of feature poll in the discussions for people to discuss and vote on any new model that should be implemented. My specialty is in neither AI nor CV, so I would really appreciate it if people actually from those fields could bring up and discuss what's worth implementing.

@mirh

mirh commented Oct 12, 2024

Having eyes (and probably a lot of time to waste) is probably more important here than knowledge... like, papers do it too.

One super dope thing that is missing, if anything, is some easy automatic way to test/compare different solutions (a bit like this perhaps, but for models rather than compression settings... even though those would be interesting too, now that I think about it).
What I used to do was use FFMetrics to find the most different timestamps (not just versus the source, but also between each other), then seek to them and compare the different MPC windows.

@k4yt3x
Owner

k4yt3x commented Oct 14, 2024

@twardoch I opened a new issue for mac support. If you make any progress could you please post them there? Thanks.
#1189
