Fix wrong token latency when batch size is greater than 1 #4708

Triggered via pull request November 21, 2024 10:43
Status Success
Total duration 34m 39s

causal_lm_cpp.yml

on: pull_request
Matrix: cpp-beam_search_causal_lm-ubuntu
cpp-multinomial-greedy_causal_lm-ubuntu: 13m 22s
cpp-greedy_causal_lm-windows: 24m 30s
cpp-greedy_causal_lm-Qwen-7B-Chat: 10m 18s
cpp-beam_search_causal_lm-Qwen1_5-7B-Chat: 33m 14s
cpp-beam_search_causal_lm-Phi-2: 16m 5s
cpp-beam_search_causal_lm-notus-7b-v1: 14m 36s
cpp-speculative_decoding_lm-ubuntu: 12m 13s
cpp-prompt_lookup_decoding_lm-ubuntu: 14m 57s
cpp-Phi-1_5: 7m 42s
cpp-greedy_causal_lm-redpajama-3b-chat: 15m 51s
cpp-chat_sample-ubuntu: 14m 19s
visual_language_chat_sample-ubuntu-minicpm_v2_6: 7m 56s
visual_language_chat_sample-ubuntu-llava_1_5 / visual_language_chat_sample-ubuntu-llava: 31m 22s
visual_language_chat_sample-ubuntu-llava_next / visual_language_chat_sample-ubuntu-llava: 18m 30s
visual_language_chat_sample-ubuntu-internvl2: 14m 8s
cpp-continuous-batching-ubuntu: 15m 1s
cpp-continuous-batching-windows: 25m 18s
cpp-continuous-batching-macos: 19m 16s
ci/gha_overall_status_causal_lm: 0s
Annotations

20 warnings
The following actions use a deprecated Node.js version and will be forced to run on node20: actions/setup-python@v4. For more info: https://github.blog/changelog/2024-03-07-github-actions-all-actions-will-run-on-node20-instead-of-node16-by-default/

The same warning was reported for each of these jobs:
cpp-Phi-1_5
visual_language_chat_sample-ubuntu-minicpm_v2_6
cpp-greedy_causal_lm-Qwen-7B-Chat
cpp-speculative_decoding_lm-ubuntu
cpp-chat_sample-ubuntu
cpp-multinomial-greedy_causal_lm-ubuntu
visual_language_chat_sample-ubuntu-internvl2
cpp-beam_search_causal_lm-notus-7b-v1
cpp-greedy_causal_lm-redpajama-3b-chat
cpp-continuous-batching-ubuntu
cpp-prompt_lookup_decoding_lm-ubuntu
cpp-beam_search_causal_lm-Phi-2
cpp-beam_search_causal_lm-ubuntu (./build/samples/cpp/beam_search_causal_lm/beam_search_causal_lm)
cpp-beam_search_causal_lm-ubuntu (python ./samples/python/beam_search_causal_lm/beam_search_causa...
cpp-continuous-batching-macos
visual_language_chat_sample-ubuntu-llava_next / visual_language_chat_sample-ubuntu-llava
cpp-greedy_causal_lm-windows
cpp-continuous-batching-windows
visual_language_chat_sample-ubuntu-llava_1_5 / visual_language_chat_sample-ubuntu-llava
cpp-beam_search_causal_lm-Qwen1_5-7B-Chat