
Add support for Real-ESRGAN, plus various fixes (hangs, async video, etc.) #1133

Closed · wants to merge 18 commits

Conversation

arximboldi

@arximboldi arximboldi commented Jun 29, 2024

This picks up the work in #1102 by @aa-ko to introduce Real-ESRGAN support, adding support for all models exposed by realesrgan-ncnn-py and testing it outside of a Docker container.

Additionally, I have fixed a few issues that I found. I wasn't sure whether you prefer multiple smaller PRs or to review everything at once. The work is separated into multiple commits, so I can still split it into multiple PRs if you prefer.

These are the highlights from this PR:

There are also some smaller changes that I'm not sure you will agree with, but that I think are quite convenient:

  • A shell.nix environment that allows installing all the system dependencies in an isolated environment by simply running nix-shell. This was crucial for me to be able to run and test the program locally outside of a Docker container.
  • Minor improvements to the logging system (proper acknowledgement of the -l flag, and introduction of a new -L flag).

Thank you @aa-ko for starting the work on integrating Real-ESRGAN. Its realesr-animevideov3 model is giving me incredible results, with anime image quality comparable to or better than realcugan's, yet much faster!

Thank you @k4yt3x for this incredible tool. Looking into the code has shown me how much love has gone into it, and it is working super well for me now!

aa-ko and others added 4 commits February 21, 2024 15:50
Choosing realesrgan ignores the noise flag for now and always uses the realesrgan-x4plus model.
A few scalers output a lot of crap like:
   10%
   25%
   50%
   ...

That messes with the progress bar indicator.
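The filtering described above can be sketched roughly like this (a minimal sketch; the regex and helper name are illustrative assumptions, not the PR's actual code):

```python
import re

# Bare percentage lines ("10%", " 25,3% ", ...) that some ncnn scalers
# print on stderr; left alone, they corrupt the progress bar drawn on
# the same terminal.
PERCENT_LINE = re.compile(r"^\s*\d+([.,]\d+)?%\s*$")

def filter_scaler_output(lines):
    """Drop progress-percentage lines, keep everything else."""
    return [line for line in lines if not PERCENT_LINE.match(line)]
```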
@arximboldi arximboldi changed the title Add support for Real-ESRGAN, and fix hang and async videos, other minor improvements Add support for Real-ESRGAN, plus various fixes (hangs, async video, etc.) Jun 29, 2024
This provides a reproducible way to get all the needed dependencies and
allows running the software in NixOS environments.
This solves a couple of issues:

1. The log level passed with -l was not properly applied to loguru; it
   always used `debug`, the default.

2. When actually passing `-l debug`, ffmpeg would flood us with too
   much information; it is better to have a separate option for
   debugging ffmpeg issues.
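The two fixes can be sketched as follows (a hedged sketch using stdlib logging to illustrate the idea; the PR itself configures loguru, and the returned ffmpeg arguments are illustrative):

```python
import logging
import sys

def configure_logging(level: str, ffmpeg_debug: bool = False) -> list:
    # 1. Actually honor the -l level (the PR does the analogous thing
    #    with loguru's logger.remove()/logger.add()).
    logging.basicConfig(stream=sys.stderr, level=getattr(logging, level.upper()))
    # 2. Keep ffmpeg's verbosity behind a separate -L switch, so that
    #    `-l debug` alone does not flood the log with ffmpeg output.
    return ["-loglevel", "debug" if ffmpeg_debug else "error"]
```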
Omitting this can actually lead to corrupted videos, or to videos with
bad synchronization or broken seeking.  This option seems to do the
right thing here and is innocuous in our case.
...this would cause the whole application to hang during the teardown
sequence.
Since these are now only printed when passing -l debug, I think this
is an acceptable compromise, and it can help debug the various hanging
problems the application has had.
This problem used to be more severe, but has become less frequent with
our fix for k4yt3x#1132

The problem happens because we used to take `frame_count` as an
absolute truth, and we would iterate until we had processed that many
frames.  However, as we've learned, it is just an estimate: the
`Decoder` thread can finish before we hit that frame count.

With this change, we detect that scenario, and gracefully finish the
rendering process, making sure that all pending frames have been read
and processed and get written in the final stream.
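The graceful-finish logic described above can be sketched like this (the names and the None sentinel are hypothetical, not the PR's actual code):

```python
import queue

def render(frames, frame_count, process_frame, write_frame):
    """Process frames until frame_count is reached OR the decoder ends.

    frame_count is treated as an estimate: a None sentinel from the
    decoder queue means end-of-stream, and we stop gracefully after
    flushing whatever was still pending.
    """
    processed = 0
    while processed < frame_count:
        try:
            frame = frames.get(timeout=1.0)
        except queue.Empty:
            continue  # decoder still running, just slow
        if frame is None:
            # Decoder finished early: frame_count was only an estimate.
            break
        write_frame(process_frame(frame))
        processed += 1
    return processed
```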
Some anime files in particular like to include custom fonts and
similar data in these streams. I think it is useful to keep them, so
as to keep the generated file as close to the original as possible.
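A minimal sketch of what keeping those streams could look like on the ffmpeg side (the helper and mappings are illustrative assumptions, not the PR's actual encoder code; `1:t?` selects attachment streams such as embedded fonts, and the `?` makes each mapping optional):

```python
def encoder_args(original, upscaled, output):
    """Build an ffmpeg command that muxes the upscaled video with the
    original file's audio, subtitle, and attachment streams."""
    return [
        "ffmpeg",
        "-i", upscaled,
        "-i", original,
        "-map", "0:v",   # upscaled video
        "-map", "1:a?",  # original audio, if any
        "-map", "1:s?",  # original subtitles, if any
        "-map", "1:t?",  # original attachments, e.g. embedded fonts
        "-c:a", "copy",
        "-c:s", "copy",
        output,
    ]
```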
@twardoch

twardoch commented Sep 1, 2024

@arximboldi

class KillableListener(Listener, KillableThreadMixin):
    """
    A killable version of pynput.keyboard.Listener, as joining()
    seems to hang on some systems even after properly calling close().
    """
    pass

in https://github.com/arximboldi/video2x/blob/realesrgan/video2x/video2x.py#L131C1-L136C9 causes a failure when pynput is not available, because Python tries to subclass Listener at import time. You should define that class in a scoped way.
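One scoped alternative might look like this (a hedged sketch; the mixin stub and the fallback class are stand-ins for illustration, not the PR's actual implementation):

```python
try:
    from pynput.keyboard import Listener
except Exception:  # pynput missing, or no display available
    Listener = None

class KillableThreadMixin:
    """Stand-in for the PR's mixin; only illustrates the shape."""
    def kill(self):
        pass

if Listener is not None:
    class KillableListener(Listener, KillableThreadMixin):
        """Killable pynput.keyboard.Listener; join() can hang on some
        systems even after the listener is stopped."""
        pass
else:
    class KillableListener(KillableThreadMixin):
        """No-op stub used when pynput is unavailable (e.g. headless)."""
        def start(self):
            pass
        def stop(self):
            pass
```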

@twardoch

twardoch commented Sep 1, 2024

@arximboldi Also when I try running with realesrgan on Google Colab, I get

Unrecognized option 'fps_mode'. Error splitting the argument list: Option not found

(which seems to be referenced in https://github.com/arximboldi/video2x/blob/realesrgan/video2x/decoder.py )
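A possible workaround sketch: `-fps_mode` was introduced in ffmpeg 5.1 as the successor of `-vsync`, so the decoder could probe the installed binary and fall back on older builds (the helper names here are hypothetical):

```python
import subprocess

def pick_fps_flag(help_text):
    """-fps_mode exists since ffmpeg 5.1; older builds only know -vsync."""
    return "-fps_mode" if "fps_mode" in help_text else "-vsync"

def fps_mode_args(mode="passthrough"):
    # Probe the installed ffmpeg's full help text for the new option.
    help_text = subprocess.run(
        ["ffmpeg", "-hide_banner", "-h", "full"],
        capture_output=True, text=True,
    ).stdout
    return [pick_fps_flag(help_text), mode]
```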

@twardoch

twardoch commented Sep 1, 2024

Turns out that 'fps_mode' is a relatively new option in ffmpeg, and the standard Google Colab environment installs an older ffmpeg. But I was able to work around it with:

# Install system dependencies
!apt-get update
!apt-get install -y --no-install-recommends \
    python3-pip libvulkan-dev glslang-dev glslang-tools \
    build-essential swig ninja-build nvidia-driver-535 \
    mpv xserver-xorg-video-dummy xvfb libomp5
!sudo curl -L https://github.com/BtbN/FFmpeg-Builds/releases/download/latest/ffmpeg-master-latest-linux64-gpl.tar.xz -o /usr/local/bin/ffmpeg.tar.xz

%cd /usr/local/bin/
!7z e /usr/local/bin/ffmpeg.tar.xz -aoa
!7z e /usr/local/bin/ffmpeg.tar -aoa
!sudo chmod a+rx /usr/local/bin/ffmpeg

@twardoch

twardoch commented Sep 1, 2024

Similarly, I was able to work around the poor integration of Listener in Google Colab by providing:

%env DISPLAY=:0
class Listener:
    pass

from video2x import Video2X
video2x = Video2X()

But that’s not how it should work :)

@k4yt3x
Owner

k4yt3x commented Oct 8, 2024

Thank you all @arximboldi @aa-ko @twardoch for the amazing work here. I've been very busy with work and several other non-FOSS projects over the last half of this year. Sorry I missed that this PR came in; I've been getting way too many messages from GitHub and didn't notice it was submitted. I only saw it after I completed the 6.0.0 rewrite.

It has always been my goal to introduce RealESRGAN in Video2X. I managed to do it in the rewrite. Take a look if you're still interested.

Unfortunately I can't merge this without destroying the commit history, so I'll just have to close this. Sorry again for not responding to this in time. I really appreciate your amazing work!

@k4yt3x k4yt3x closed this Oct 8, 2024
@aa-ko

aa-ko commented Oct 8, 2024

@k4yt3x Wow, you actually just casually dropped the C++ rewrite, I am speechless 😮

As far as I can tell, this looks exactly like what I was hoping for. I'll try 6.0.0 ASAP, maybe I can contribute something this time lmao

Thanks for all the hard work, cheers! 🎉

@twardoch

twardoch commented Oct 8, 2024

I’ll try it ASAP on Colab

@k4yt3x
Owner

k4yt3x commented Oct 8, 2024

@twardoch I haven't updated the Colab playbook yet. You'll need to compile it yourself if you wanna try Colab.

@twardoch

twardoch commented Oct 8, 2024

I guess so. I wonder if you could provide simple instructions on how to build the stuff. I can adapt them to macOS and will PR. So far I had installed various libs on macOS via brew, and then `make` failed on realesrgan not being available. I will look at your GitHub Action for how you build on Linux and will try to adapt it for macOS.

@k4yt3x
Owner

k4yt3x commented Oct 8, 2024

@twardoch if you can actually make it work for Mac, that'll be amazing. I've never had a Mac in my life, so it'll be hard for me to make that work. As for steps to build, here's the one for Debian/Ubuntu:

video2x/Makefile, lines 36 to 54 at 411cca4:

debian:
	apt-get update
	apt-get install -y --no-install-recommends \
		build-essential cmake clang pkg-config \
		libavcodec-dev \
		libavdevice-dev \
		libavfilter-dev \
		libavformat-dev \
		libavutil-dev \
		libswscale-dev \
		libvulkan-dev \
		glslang-tools \
		libomp-dev
	cmake -B /tmp/build -S . -DUSE_SYSTEM_NCNN=OFF \
		-DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ \
		-DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/tmp/install \
		-DINSTALL_BIN_DESTINATION=. -DINSTALL_INCLUDE_DESTINATION=include \
		-DINSTALL_LIB_DESTINATION=. -DINSTALL_MODEL_DESTINATION=.
	cmake --build /tmp/build --config Release --target install --parallel

Another one is for Arch in PKGBUILD.

@twardoch

twardoch commented Oct 8, 2024

The Homebrew package installer for macOS does include "ncnn", but I'm not sure whether that means anything will have to be adapted from Vulkan to another backend.

Right now the macOS situation for AI is very complicated because drastically different backends are used for the new Apple Silicon Macs (where there are more inference acceleration possibilities) vs. the older Intel Macs (which often can do only CPU inference).

@twardoch

twardoch commented Oct 8, 2024

P.S. Many local AI packages only exist for the Apple Silicon Macs because the older Apple Intel hardware is basically completely unsuitable for local AI inference.

For example Topaz Video AI runs like 50x faster on my MacBook Air M3 than on a beefy MacBook Pro Intel that's less than 3 years older.

So it's running time of 3 minutes vs. 2 hours to complete the same task.

@mirh

mirh commented Oct 9, 2024

That sounds like a Mac problem; I'm skeptical that even without an NPU a properly accelerated Coffee Lake could be so far behind.

Anyhow, is RESRGAN supposed to be the only supported driver now?

@k4yt3x
Owner

k4yt3x commented Oct 9, 2024

@mirh there's also libplacebo, which renders Anime4K v4 now, but it should be compatible with any mpv-compatible GLSL shader.

RealESRGAN has both an anime model and a real-life model. From their paper it also looks like the performance is better than RealSR (?), so I didn't bother adding RealSR. The other ones are kinda old so I didn't bother either.

@mirh

mirh commented Oct 10, 2024

Their paper speaks truth, and all things considered I would probably also recommend it as the "overall" default.
But IIRC from my ancient tests, while RealSR may generally be blurrier (though still retaining the most detail on the market, bar W2xEX's "photo-conservative" mode), I also found it less foolhardy. At least with my 360p video, whatever exaggeration it produced still felt "natural" (as in, smoothly transitioning with the rest of the picture), unlike some of RESRGAN's oversharpening blunders.

Similarly with an anime DVD: while it doesn't "wildly make stuff up" (when the camera pans across a scene with a corrugated iron roof and a metal fence, they seem to be dancing with CUGAN), you can still slightly tell in certain places that it is trying too hard (for this reason waifu2x seems a solid "safe ground").
Presumably temporal stability is nowhere near as much of a concern if you are working with higher-resolution content.

And last but not least, just a few months ago I found that SRMD (which I had always thought strictly inferior to the alternatives, at least as far as quality was concerned) could somehow give me the best results bar none with a 369x207 crop of a picture from my phone.

P.S. Anime4K seems a bit pointless, if anything. Precisely because it is as lightweight as it is fairly low quality, I don't think it's the kind of upscaler people would use in a complex tool like this.

@k4yt3x
Owner

k4yt3x commented Oct 10, 2024

I've been away long enough that I no longer know what the best options are. In addition to collecting opinions, I'd also like the decision of what to add to be backed by testing results like VMAF (mostly because adding support for new solutions is time-consuming). It's a bit beyond my ability to do it all by myself. Ideally we'd create an environment where people can discuss and vote... perhaps try bringing this up in the Telegram group or something?
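For the VMAF part, ffmpeg's libvmaf filter (available in builds compiled with --enable-libvmaf) can already produce per-frame scores; a minimal command-builder sketch (the helper name is illustrative):

```python
def vmaf_command(distorted, reference, log_path="vmaf.json"):
    """Build an ffmpeg command that scores `distorted` against
    `reference` with libvmaf and writes a JSON report."""
    return [
        "ffmpeg",
        "-i", distorted,
        "-i", reference,
        "-lavfi", "libvmaf=log_fmt=json:log_path=" + log_path,
        "-f", "null", "-",
    ]
```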

@mirh

mirh commented Oct 10, 2024

I'm also very much out of the loop, but honestly there hasn't been that much activity in the last few years (I assume there's only so much you can do with tiny generic networks?). At most there might be slightly differently trained models (that is a madhouse, tbf)?

As for "objective testing®", I cannot recommend FFMetrics enough (and then maybe this?).

@k4yt3x
Owner

k4yt3x commented Oct 11, 2024

Both look interesting. Perhaps I should set up some kind of feature poll in the discussions for people to discuss and vote on any new model that should be implemented. My specialty is in neither AI nor CV, so I would really appreciate it if people actually from those fields could bring up and discuss what's worth implementing.

@mirh

mirh commented Oct 12, 2024

Having eyes (and probably a lot of time to waste) is probably more important here than knowledge... like, papers do it too.

One super dope thing that is missing, if anything, is some easy automatic way to test/compare different solutions (a bit like this perhaps, but for models rather than compression settings... even though those would be interesting too, now that I think about it).
What I used to do was use FFMetrics to find the most different timestamps (not just versus the source, but also between each other), then seek to them and compare the different MPC windows.

@k4yt3x
Owner

k4yt3x commented Oct 14, 2024

@twardoch I opened a new issue for mac support. If you make any progress could you please post them there? Thanks.
#1189
