TPS loss under bot #8

Open
sfxworks opened this issue Apr 9, 2023 · 5 comments

sfxworks commented Apr 9, 2023

Running python3 server.py --wbits 4 --model ozcur_alpaca-native-4bit --verbose --listen --gpu-memory 5 --groupsize 128 and using the UI, I get about 20 tokens per second:

Output generated in 9.68 seconds (20.56 tokens/s, 199 tokens, context 42)

Under the bot with the same flags, I get only about 2 tokens per second:
python3 bot.py --wbits 4 --model ozcur_alpaca-native-4bit --verbose --listen --gpu-memory 5 --groupsize 128
Output generated in 10.30 seconds (2.04 tokens/s, 21 tokens, context 170)

Is the context the issue? Adding more context decreased the rate to about 17 tokens/s:

Output generated in 7.73 seconds (17.20 tokens/s, 133 tokens, context 70)

Can the input for this bot be optimized?

xNul (Owner) commented Apr 9, 2023

Thanks, I've been able to reproduce on my end.

I made everything deterministic to see if there was a parameter I was missing. With the same parameters, context, and sequence of inputs, I was able to produce the exact same response on both the webui and the bot; the only difference was that the webui generated tokens at 5.98 tokens/s while the bot generated them at 2.30 tokens/s. Let me see where this thread goes.
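
A minimal sketch of that kind of deterministic setup, assuming Hugging Face transformers-style generation parameters (the webui's actual parameter names may differ): greedy decoding plus a fixed seed means both runs must produce identical text, so any remaining difference is purely speed.

```python
import random

import torch

# Fix the sources of randomness so the webui and the bot produce identical output.
random.seed(0)
torch.manual_seed(0)

# Greedy decoding removes sampling randomness entirely; with the same prompt and
# context, the generated tokens are then fully determined by the model.
generation_kwargs = dict(
    do_sample=False,
    max_new_tokens=200,
)
```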

xNul (Owner) commented Apr 9, 2023

I removed all the async code, the Discord bot logic, and any other code not needed to run the prompt, and called the API directly. Now I'm getting 6.15 tokens/s, so the slowdown likely has something to do with the async code.
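
A rough sketch of what that standalone comparison looks like, with no discord.py or asyncio involved; generate_reply here is a stand-in for the webui's streaming generation call, not its real API:

```python
import time

def time_generation(prompt, generate_reply):
    """Time a bare, synchronous streaming call with nothing else in the loop.

    `generate_reply` is assumed to be a generator that yields the text produced
    so far, which is roughly how the webui streams output.
    """
    start = time.time()
    text = ""
    for text in generate_reply(prompt):
        pass  # consume the stream as fast as it is produced
    elapsed = time.time() - start
    tokens = len(text.split())  # crude token count, good enough for a comparison
    print(f"Output generated in {elapsed:.2f} seconds "
          f"({tokens / elapsed:.2f} tokens/s, {tokens} tokens)")
```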

xNul (Owner) commented Apr 10, 2023

I found the issue. This line of code, which streams the generated text to Discord, blocks the token generation and slows it down. Since Python's asyncio is concurrent but single-threaded, simply throwing the Message.edit calls into async tasks won't work. The only option is to run Message.edit in a separate process, which means doing some work with multiprocessing or a message queue. I'm looking into the different options.
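
A simplified sketch of the pattern being described (not the bot's actual code): the token stream and the Message.edit calls share one coroutine, so every edit's round-trip to Discord is time the generation loop spends waiting.

```python
# Sketch only: `generate_stream` is a hypothetical async generator yielding the
# partial reply; the real streaming code in the bot is structured differently.
async def stream_reply(message, generate_stream, prompt):
    async for partial_text in generate_stream(prompt):
        # Each edit is an HTTP round-trip to Discord. The next chunk of tokens
        # is not requested until it returns, which is what drags tokens/s down.
        await message.edit(content=partial_text)
```

Scheduling the edit with asyncio.create_task keeps it on the same single-threaded event loop, which is why, as noted above, making it "more async" doesn't recover the throughput.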

xNul (Owner) commented Apr 10, 2023

Since the Client object in discord.py can't be serialized, it can't be moved to another process and used to make edits there. That means that, in order to keep both performance and response streaming, I'll need to move all of the LLM logic into another process instead. From that process I can send the partial text back over IPC to the Discord process, which then makes the message edits for streaming.

Oh boy, I didn't realize this was going to be such a headache. I'm working on something else at the moment, so I'm going to put this on the back burner for a week or two. If you prefer performance over streaming, just remove the line I mentioned and you'll get the same performance as the webui.
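
A rough sketch of the separate-process plan described above, assuming a multiprocessing.Queue for the IPC; names like generate_reply are placeholders rather than the bot's real functions. The LLM runs in its own process and only plain strings cross the process boundary, so the discord.py Client never has to be serialized.

```python
import asyncio
import multiprocessing as mp

def generate_reply(prompt):
    # Stand-in for the webui's streaming generation call: yields the text so far.
    text = ""
    for word in prompt.split():
        text += word + " "
        yield text

def llm_worker(prompt, queue):
    # Runs in a separate process, so token generation never competes with the
    # Discord event loop for the same thread.
    for partial_text in generate_reply(prompt):
        queue.put(partial_text)
    queue.put(None)  # sentinel: generation finished

async def stream_to_discord(message, prompt):
    queue = mp.Queue()
    proc = mp.Process(target=llm_worker, args=(prompt, queue))
    proc.start()
    loop = asyncio.get_running_loop()
    while True:
        # queue.get() blocks, so hand it to the default executor to keep the
        # Discord event loop responsive while waiting for the next chunk.
        partial_text = await loop.run_in_executor(None, queue.get)
        if partial_text is None:
            break
        await message.edit(content=partial_text)  # a real bot would also rate-limit these edits
    proc.join()
```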

sfxworks (Author) commented

I appreciate your diligence in looking into this issue!
