Discord? Discuss ONNX implementation #72

Open
catselectro opened this issue Jun 8, 2024 · 10 comments

Comments

@catselectro

Hi,

I just found this project and the repository at https://github.com/instant-high/wav2lip-onnx-HQ. I think that combining both may make this even faster.

I ran some quick tests using the ONNX model from this repository (with the help of ChatGPT), and it seems I get about 15%-20% faster generation times. However, I've never implemented something like this before, so I might be doing something wrong.
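For anyone who wants to reproduce this kind of comparison, a minimal timing sketch could look like the following. It is not the method used above; `run_pth` and `run_onnx` are hypothetical placeholders standing in for one full generation with each model, and the sleep calls exist only so the example runs on its own.

```python
import time


def time_call(fn, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` runs of fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def percent_faster(baseline_s, candidate_s):
    """Percentage reduction in run time relative to the baseline."""
    return 100.0 * (baseline_s - candidate_s) / baseline_s


if __name__ == "__main__":
    # Hypothetical stand-ins: replace these with real generation runs.
    def run_pth():
        time.sleep(0.05)

    def run_onnx():
        time.sleep(0.04)

    baseline = time_call(run_pth)
    candidate = time_call(run_onnx)
    print(f"ONNX run was {percent_faster(baseline, candidate):.1f}% faster")
```

Taking the best of several runs reduces noise from model loading and disk caching, which can easily account for a difference of this size on a single run.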

Your Discord link seems to be down, so I couldn't contact you there. Do you have another link or another way to chat?

Thanks for this awesome project.

Best.

@anothermartz
Owner

anothermartz commented Jun 8, 2024

Interesting, I'll have to give this ONNX project a try, and then perhaps I can implement an easy install/GUI for it.

I'm spending less time at my computer at the moment, though, because I'm about to move home and there's a lot of planning keeping me busy.

Here's the DeepFaceLab discord with a wav2lip channel that's good for discussing all this stuff:

https://discord.com/invite/9scUkmcf8V

@catselectro
Author

Thanks, I'll take a look. If you want my quick implementation, I can send you the file. I just changed inference.py to use the ONNX model. Good luck with your projects!

@Echolink50

Are there any other improvements besides the speed increase? Thanks

@catselectro
Author

I noticed a slight increase in VRAM usage when using the ONNX model, from 0.3 GB to 0.7 GB, so there's no improvement in that aspect. The model's file size is reduced to a quarter of the original. There might be potential for further improvements in VRAM, but I'm not sure.
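A rough way to check figures like these is sketched below. It assumes an NVIDIA GPU with `nvidia-smi` on the PATH; the helper names are made up for illustration. Note that `nvidia-smi` reports memory used on the whole GPU, not per process, so other applications should be closed while measuring.

```python
import os
import subprocess


def parse_used_mib(smi_output):
    """Parse nvidia-smi CSV output: one integer (MiB used) per GPU line."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip()]


def query_used_vram_mib():
    """Ask nvidia-smi for the memory currently used on each GPU, in MiB."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_used_mib(result.stdout)


def size_ratio(new_path, old_path):
    """Size of the new model file as a fraction of the old one."""
    return os.path.getsize(new_path) / os.path.getsize(old_path)
```

Calling `query_used_vram_mib()` once before loading the model and once during generation, then subtracting, gives an approximate footprint for each variant. `size_ratio("wav2lip.onnx", "Wav2Lip.pth")` would be expected to come out near 0.25 if the file is a quarter of the original size, as described above.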

@Echolink50

OK, the VRAM increase is not too bad. Did you also use the "new" face detection and alignment mentioned, or any of the "new" face enhancers? Any improvements in the quality of the lip sync? Thanks

@catselectro
Author

I used all the functionality in this repo. I just swapped the model for the ONNX version from the repo I cited, so the quality is the same, and I used the "improved" method in this repo.

@Echolink50

Oh ok. I saw that the onnx repo had some other features like different face restoration models and different detection and alignment. I will check it out. Thanks

@anothermartz
Owner

> I used all the functionality in this repo. I just swapped the model for the ONNX version from the repo I cited, so the quality is the same, and I used the "improved" method in this repo.

So you mean you just used the wav2lip.onnx file instead of the Wav2Lip.pth file?

I see no difference in speed between the two in my own tests, but I think GPEN is faster than GFPGAN, at least according to the tests I did using the ONNX project.

I'm more interested in the improved face tracking and also the cool little crop feature where you select the face location to make things faster that way.

But making an easy installer for that project would take more work than I'm willing to do at the moment. It's still wav2lip after all, so while there are improvements, they're not groundbreaking enough for me to adopt at this time.

@Echolink50

Can you release the onnx implementation and the new features you tested for manual install? Thanks

@catselectro
Author

> So you mean you just used the wav2lip.onnx file instead of the Wav2Lip.pth file?

Yes, I modified inference.py to load it instead of the .pth file. I only noticed the slight speed improvement when using the ONNX model inside this project, not when running the ONNX project on its own. I didn't test that extensively, though, because the mouth quality in this project seems better, at least for the example I was testing. This is how I modified inference.py: https://gist.github.com/catselectro/90627227b93c92eb0909d2392fa1239a#file-inference_onnx_new-py
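For readers who cannot open the gist, the general idea of loading an ONNX file in place of a PyTorch checkpoint can be sketched with ONNX Runtime as below. This is not the code from the gist. The class name `OnnxWav2Lip` and the helper `pick_providers` are made up for illustration, and the sketch assumes the exported model takes the mel batch as its first input and the frame batch as its second, which should be verified against the actual file.

```python
def pick_providers(available):
    """Prefer the CUDA execution provider, falling back to CPU."""
    preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    return [p for p in preferred if p in available]


class OnnxWav2Lip:
    """Thin wrapper so an ONNX Runtime session can be called like a model."""

    def __init__(self, session):
        self.session = session
        self.input_names = [i.name for i in session.get_inputs()]
        if len(self.input_names) < 2:
            raise ValueError("expected a model with two inputs (mel, frames)")

    @classmethod
    def from_file(cls, model_path):
        # Imported lazily so the module loads without onnxruntime installed.
        import onnxruntime as ort

        providers = pick_providers(ort.get_available_providers())
        return cls(ort.InferenceSession(model_path, providers=providers))

    def __call__(self, mel_batch, img_batch):
        # Assumption: input 0 is the mel batch, input 1 is the frame batch,
        # both NumPy arrays already in the layout the model expects.
        feeds = {
            self.input_names[0]: mel_batch.astype("float32"),
            self.input_names[1]: img_batch.astype("float32"),
        }
        # Passing None as the output list returns all outputs; take the first.
        return self.session.run(None, feeds)[0]
```

With something like `model = OnnxWav2Lip.from_file("wav2lip.onnx")`, a call of the form `model(mel_batch, img_batch)` can stay largely as it is in a PyTorch inference loop, except that the inputs and the returned prediction are NumPy arrays rather than torch tensors, so any `.to(device)` or `.cpu().numpy()` steps around the call would need adjusting.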
