Rejection Sampling on GSM8k #78

AlexPiche · 2024-10-31T02:42:45Z

Rejection Sampling on GSM8k

Implement a simple reasoning architecture: COTMathAgent
Fix the OOM when getting logprobs from the vLLM reference model (#77)
Do not include empty user message in nodes.py
Launch training jobs with accelerate for better multi-GPU inference and training in the RL example (#76). Known issue: NCCL timeout on multi-GPU for large gradient accumulation passes (#104)
Get dataset stats (max, min, mean length) in finetune.py
Fix finetune to start from the latest model instead of the base one.
Move finetune/rl helper functions into finetune/rl/utils.py
Add an implicit KL penalty, as in Eq. 3 of https://arxiv.org/abs/2402.14740
Clean up the RL orchestrator: remove discounting and max steps
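
For readers unfamiliar with the term in the title, here is a minimal sketch of rejection sampling for a math dataset: sample several attempts per problem and keep only those whose final answer matches the reference. The `Attempt` class, `rejection_sample` function and the toy data are hypothetical illustrations, not the code in this PR.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    # Hypothetical container for one sampled solution; not a TapeAgents type.
    problem_id: str
    completion: str
    predicted_answer: str


def rejection_sample(attempts, reference_answers):
    """Keep only the attempts whose predicted answer matches the reference."""
    return [
        a for a in attempts
        if a.predicted_answer.strip() == reference_answers[a.problem_id].strip()
    ]


# Toy data (made up for illustration).
reference_answers = {"q1": "18", "q2": "3"}
attempts = [
    Attempt("q1", "16 - 3 - 4 = 9, 9 * 2 = 18", "18"),
    Attempt("q1", "16 - 3 = 13, 13 * 2 = 26", "26"),
    Attempt("q2", "2 + 1 = 3", "3 "),
]
accepted = rejection_sample(attempts, reference_answers)
print([a.completion for a in accepted])
```

The accepted attempts would then form the fine-tuning set for the next iteration; exact string matching is a simplification, and a real answer checker would likely normalize numbers first.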

Tape Browser PR from @rizar

add a script to launch the tape browser
save tapes in a single JSON file, a format that the default tape browser can load
add a script to gather legacy tapes into a single file so that they can be viewed in the browser
load LLM calls 100 at a time to speed up tape browser loading (the speedup was huge while the browser did not know where to look for LLM calls; now that browser.py searches for llm_calls.sqlite, loading takes some time again)
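
The idea of loading LLM calls 100 at a time can be sketched with the standard `sqlite3` module. This is only an illustration: the table name `llm_calls`, the `payload` column and the `iter_llm_calls` helper are assumptions, not the actual schema or code used by the tape browser.

```python
import sqlite3
from typing import Iterator


def iter_llm_calls(conn: sqlite3.Connection, batch_size: int = 100) -> Iterator[list]:
    """Yield rows from the (hypothetical) llm_calls table in fixed-size batches."""
    cursor = conn.execute("SELECT rowid, payload FROM llm_calls ORDER BY rowid")
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        yield batch


# Demo on an in-memory database with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE llm_calls (payload TEXT)")
conn.executemany(
    "INSERT INTO llm_calls (payload) VALUES (?)",
    [(f"call-{i}",) for i in range(250)],
)
conn.commit()

batch_sizes = [len(batch) for batch in iter_llm_calls(conn, batch_size=100)]
print(batch_sizes)
```

Batching keeps memory bounded and lets a UI render the first rows before the rest are read.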

Examples:
run tape browser like this: python -m examples.rl_gsm8k.browse outputs/yolo2_4gpu_lr1e-6/tapes/train/0/all

Reasoning Architecture

Learning curve

python examples/rl_gsm8k/orchestrate_rl.py finetune.rl.algo=reinforce finetune.train_batch_size=4 finetune.gradient_accumulation_passes=1 finetune.rl.implicit_kl_coef=0.0 finetune.rl.kl_coef=0.0 finetune.rl.use_advantages=false +finetune.rl.relu_weights=true use_rejection_sampling=true test_every_n_iterations=5 finetune.learning_rate=0.000001 finetune.gradient_clipping_threshold=1.0 finetune.save_checkpoint_steps=8 finetune.weight_decay=0.1 max_agent_forks=5000 attempts=8
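
One plausible reading of `finetune.rl.algo=reinforce` with `use_advantages=false` and `relu_weights=true` is a REINFORCE loss in which each sample's log-probability is weighted by its reward clipped at zero, so that only positively rewarded samples contribute. The sketch below illustrates that reading in plain Python; it is an assumption about the flags' meaning, and `reinforce_relu_loss` is a hypothetical name, not a function from this repository.

```python
def reinforce_relu_loss(rewards, logprobs):
    """REINFORCE-style loss with ReLU-clipped reward weights (assumed semantics)."""
    if len(rewards) != len(logprobs):
        raise ValueError("rewards and logprobs must have the same length")
    weights = [max(r, 0.0) for r in rewards]  # ReLU: negative rewards get weight 0
    weighted = [w * lp for w, lp in zip(weights, logprobs)]
    return -sum(weighted) / len(weighted)


# Made-up per-sample rewards and sequence log-probabilities.
rewards = [1.0, -1.0, 0.0, 1.0]
logprobs = [-0.5, -2.0, -1.0, -1.5]
loss = reinforce_relu_loss(rewards, logprobs)
print(loss)
```

Under this reading, the objective reduces to maximizing the likelihood of accepted samples, which is how it connects to rejection sampling.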

Reproducing GSM8k

rizar · 2024-10-31T12:39:36Z

will address #77 and #76

rizar

Looks mostly good! We should not traverse the whole training set every time the script is launched though.

Note for the re-review:

check ReLU weights citation
check running aggregation of input lengths
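
As a reference point for the second item, a running aggregation of input lengths can be kept with a few counters, so that min, max and mean are available without a separate pass over the whole dataset. This is a minimal sketch; `RunningLengthStats` is a hypothetical helper, not the code in finetune.py.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningLengthStats:
    # Hypothetical helper: aggregates sequence lengths one value at a time.
    count: int = 0
    total: int = 0
    min_len: float = math.inf
    max_len: float = -math.inf

    def update(self, length: int) -> None:
        self.count += 1
        self.total += length
        self.min_len = min(self.min_len, length)
        self.max_len = max(self.max_len, length)

    @property
    def mean(self) -> float:
        # Report 0.0 before any value has been seen.
        return self.total / self.count if self.count else 0.0


stats = RunningLengthStats()
for batch in [[12, 40], [7], [25, 16]]:  # made-up token counts per batch
    for length in batch:
        stats.update(length)

print(stats.min_len, stats.max_len, stats.mean)
```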

tapeagents/finetune/checkpoints.py

tapeagents/finetune/finetune.py

tapeagents/finetune/rl/__init__.py

tapeagents/observe.py

Co-authored-by: Dzmitry Bahdanau <dzmitry.bahdanau@servicenow.com>

rizar

lgtm!

run 70b

0044902

AlexPiche changed the base branch from main to grpo_wild_chat October 31, 2024 02:42

AlexPiche added 3 commits October 31, 2024 02:45

clean up debug code

09f7fb2

conf_dir as str

0537e02

chunked prefill vllm

4b9407a

Base automatically changed from grpo_wild_chat to main October 31, 2024 13:28

AlexPiche added 23 commits October 31, 2024 14:08

iteration as str

04279b0

no testing when test_every_n_iterations is -1

537f9a4

no testing with -1

819a161

print finetuning output

b772d97

rm seq length tokens

4317fcb

fp8 quant

d781b74

use deepspeed

8657767

better vllm logging

d050cf4

deepspeed training

6b0bc26

update accelerate version

2a3f0ee

negative rewards for too many steps

1503884

discounting

5600f85

try to get lora working

743a3e6

step norm

5e11b32

penalize 20 steps tape

b42a785

step discount

e8872c3

discoutn and max steps

7be219b

fix discount typo

f46c0ea

merge 8b branch

5d87a2a

use adafactor with accelerate

4e79c5f

implicit kl

1dfe78d

implicit kl

7a2961c

better variable names

c737e72

Merge remote-tracking branch 'origin/main' into llama70b_gsm8k

8e19489

AlexPiche changed the title from "RL training of Llama 3.1 8b on multi-gpus" to "Rejection Sampling on GSM8k" Nov 18, 2024

AlexPiche added 3 commits November 18, 2024 19:29

Merge remote-tracking branch 'origin/fix_test' into llama70b_gsm8k

3b09425

rm run training in process

1303602

rm if rl start from base model

73ee354

AlexPiche changed the base branch from main to fix_test November 19, 2024 15:13

AlexPiche added 4 commits November 19, 2024 16:13

clean up

35cb007

typo

03174da

clean up

6ffc6b0

better docs

397e0b8

AlexPiche changed the base branch from fix_test to main November 19, 2024 16:28

clean up

8b60ba6

AlexPiche requested a review from rizar November 19, 2024 16:40

rizar requested changes Nov 19, 2024


AlexPiche and others added 13 commits November 19, 2024 12:48

Update tapeagents/observe.py

0200482

Co-authored-by: Dzmitry Bahdanau <dzmitry.bahdanau@servicenow.com>

dima changes

4c4e71a

clean up

5618143

update readme

6268098

reverse change

9dd4cf7

improve doc

a59aaad

improve doc

251ce78

rm debug code

cbae73f

fix logging of max min dataset len

94109b0

fix min seq length logging

6ef2ea3

fix naming of variables

2d698cd

typo

18b1b02

fix naming

1291be2

rizar approved these changes Nov 20, 2024


hf dataset

90b080f

AlexPiche merged commit 74948b6 into main Nov 20, 2024
2 checks passed
