Add Llama2-70b sparsecore collective model to trillium configs #1042

Obliviour · 2024-11-15T20:06:33Z

Description

Adds a new llama2-70b config with sparsecore offloading; it has been scaled up to 32 pods so far. To run it, follow the steps below and adjust the --num_slices argument in the command under Tests:

1. Build the maxtext dependencies.
2. Have xpk available at ~/xpk/xpk.py.
3. Run the command below.
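The xpk prerequisite can be checked before launching anything. The following is a minimal sketch, not part of this PR: the helper name `xpk_available` is hypothetical, and the only fact it takes from the description is the expected location `~/xpk/xpk.py`.

```python
from pathlib import Path


def xpk_available(xpk_path=None):
    """Return True if xpk.py exists at the given path.

    Hypothetical helper; defaults to ~/xpk/xpk.py, the location
    named in the PR description.
    """
    if xpk_path is not None:
        path = Path(xpk_path)
    else:
        path = Path.home() / "xpk" / "xpk.py"
    return path.is_file()


if xpk_available():
    print("xpk found")
else:
    print("xpk missing: expected ~/xpk/xpk.py")
```

Passing an explicit path makes the check usable when xpk is cloned somewhere else, though the PR itself only mentions the home-directory location.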

Tests

Training ran successfully on TPUs with the following command:

python3 benchmarks/benchmark_runner.py --project=${PROJECT} --zone=${ZONE} --device_type=v6e-256 --num_slices=1 --cluster_name=${CLUSTER_NAME} --base_output_directory=${OUTPUT_DIR} \
  --model_name="llama2_70b_4096_sc" --libtpu_version=20241106 --base_docker_image=maxtext_base_image
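Since the description asks the reader to adjust `--num_slices`, the command above can be assembled programmatically. This is a rough sketch under stated assumptions: `build_benchmark_cmd` is a hypothetical helper that is not in the repository, the project, zone, cluster, and bucket values are placeholders, and every flag is copied from the command in this PR rather than from the runner's full option list.

```python
import shlex


def build_benchmark_cmd(project, zone, cluster_name, output_dir, num_slices=1):
    """Assemble the benchmark_runner.py invocation shown in the Tests section.

    Hypothetical helper: only num_slices and the environment-specific
    values vary; the remaining flags are fixed to the values used in the PR.
    """
    if num_slices < 1:
        raise ValueError("num_slices must be at least 1")
    return [
        "python3",
        "benchmarks/benchmark_runner.py",
        f"--project={project}",
        f"--zone={zone}",
        "--device_type=v6e-256",
        f"--num_slices={num_slices}",
        f"--cluster_name={cluster_name}",
        f"--base_output_directory={output_dir}",
        "--model_name=llama2_70b_4096_sc",
        "--libtpu_version=20241106",
        "--base_docker_image=maxtext_base_image",
    ]


# Placeholder values for illustration only.
cmd = build_benchmark_cmd(
    project="my-project",
    zone="my-zone",
    cluster_name="my-cluster",
    output_dir="gs://my-bucket/output",
    num_slices=2,
)
# Print a shell-safe version of the command for inspection before running it.
print(shlex.join(cmd))
```

Keeping the command as a list of arguments avoids shell quoting problems if it is later handed to `subprocess.run`; the printed string is only for review.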

Checklist

Before submitting this PR, please make sure (put X in square brackets):

I have performed a self-review of my code.
I have necessary comments in my code, particularly in hard-to-understand areas.
I have run end-to-end tests and provided workload links above if applicable.
I have made or will make corresponding changes to the doc if needed.

Obliviour added 2 commits November 15, 2024 15:06

Update maxtext_trillium_model_configs.py

9c91541

Update respective flags, and cmds.

e1966b7

Obliviour changed the title ~~Update maxtext_trillium_model_configs.py~~ Add Llama2-70b sparsecore collective model to trillium configs Nov 15, 2024

Obliviour marked this pull request as ready for review November 20, 2024 18:02

Obliviour requested review from gobbleturk, jonb377, khatwanimohit, bvandermoon and vipannalla as code owners November 20, 2024 18:02

Merge branch 'main' into Add-Llama2-70b-offload-config

01ef569

khatwanimohit approved these changes Nov 20, 2024

View reviewed changes

Obliviour added 2 commits December 2, 2024 17:56

Merge branch 'main' into Add-Llama2-70b-offload-config

7b41fc3

Merge branch 'main' into Add-Llama2-70b-offload-config

05d3525

Obliviour added pull ready and removed pull ready labels Dec 5, 2024
