feat(trainer_builder): refactor trainer_builder and preserve optional callable for custom model dispatch function in isp mode #293

season0528 · 2024-08-06T11:32:31Z

Motivation

Improve the readability of trainer_builder, i.e., make the code self-documenting.
Preserve an optional callable for a custom model_dispatch_func, which may be useful when integrating Hugging Face models with ISP.
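
To illustrate the idea of an optional dispatch callable, here is a minimal sketch. It is not the code from this PR: `build_trainer`, `default_dispatch`, `hf_dispatch`, and `ToyModel` are hypothetical names, and only the parameter name `model_dispatch_func` comes from the description above. The assumed behavior is that a caller-supplied function replaces the built-in dispatch step, and omitting it keeps the default path.

```python
from typing import Any, Callable, Optional

# Hypothetical signature: a dispatch function receives the model and
# rewires it in place (for example, for ISP communication).
DispatchFunc = Callable[[Any], None]


def default_dispatch(model: Any) -> None:
    """Stand-in for a framework's built-in dispatch step (hypothetical)."""
    model.dispatched_by = "default"


def build_trainer(model: Any, model_dispatch_func: Optional[DispatchFunc] = None) -> Any:
    """Apply the caller's dispatch function if given, else the default one."""
    if model_dispatch_func is not None and not callable(model_dispatch_func):
        raise TypeError("model_dispatch_func must be callable or None")
    dispatch = model_dispatch_func if model_dispatch_func is not None else default_dispatch
    dispatch(model)
    return model


class ToyModel:
    """Placeholder for a real model; only records which path dispatched it."""
    dispatched_by = None


def hf_dispatch(model: Any) -> None:
    """Hypothetical custom dispatch, e.g. for a Hugging Face model."""
    model.dispatched_by = "custom"


default_model = build_trainer(ToyModel())
custom_model = build_trainer(ToyModel(), model_dispatch_func=hf_dispatch)
```

Keeping the argument optional means existing callers are unaffected, while a model that does not follow the framework's own layer layout can supply its own dispatch logic.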

Modification

internlm/core/trainer_builder.py
internlm/core/parallel/comm/isp.py
internlm/model/builder.py
internlm/model/ops/attention.py
internlm/train/pipeline.py

BC-breaking (Optional)

None

Use cases (Optional)

None

Checklist

Before PR:

Pre-commit or other linting tools are used to fix the potential lint issues.
Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
The modification is covered by complete unit tests. If not, please add more unit tests to ensure correctness.
The documentation has been modified accordingly, like docstring or example tutorials.

After PR:

If the modification has potential influence on downstream or other related projects, this PR should be tested with those projects.
CLA has been signed and all committers have signed the CLA in this PR.

season0528 · 2024-08-11T01:50:10Z

Experimental results for acc/loss alignment:

Code base:
InternEvo-HFModels 6bfd957
InternEvo 7696aac

Hugging Face InternLM2, pack mode, with vs. without ISP (800 steps, perfectly aligned, diff on the order of 1e-3 to 1e-2).

Hugging Face InternLM2, unpack mode, with vs. without ISP (700 steps, perfectly aligned, diff on the order of 1e-3 to 1e-2).

Hugging Face InternLM1, pack mode, with vs. without ISP (500 steps, perfectly aligned, diff on the order of 1e-3 to 1e-2).

Hugging Face InternLM1, unpack mode, with vs. without ISP (700 steps, might not be perfectly aligned, diff on the order of 1e-2 to 1e-1).
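
One possible way to quantify such a diff is the largest per-step absolute difference between two loss curves. The sketch below assumes that metric; the comment does not say how the diff was actually measured, and the loss values are made-up placeholders, not results from these runs.

```python
from typing import Sequence


def max_abs_diff(losses_a: Sequence[float], losses_b: Sequence[float]) -> float:
    """Largest per-step absolute difference between two loss curves."""
    if len(losses_a) != len(losses_b):
        raise ValueError("loss curves must cover the same number of steps")
    if not losses_a:
        raise ValueError("loss curves must not be empty")
    return max(abs(a - b) for a, b in zip(losses_a, losses_b))


# Placeholder values for illustration only (not measured data).
with_isp = [2.5, 2.0, 1.75]
without_isp = [2.5, 2.25, 1.75]
diff = max_abs_diff(with_isp, without_isp)
```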

season0528 · 2024-08-11T06:15:23Z

ISP adaption code: https://github.com/InternLM/InternEvo-HFModels/commit/6bfd9576005817e74302000ffa35567aca8260b4
ISP adaption docs: https://github.com/InternLM/InternEvo-HFModels/blob/main/huggingface_model/README.md

For other Hugging Face models, refer to the adaption code/docs for Hugging Face InternLM1 and InternLM2; only a few lines of code need to be modified.

season0528 · 2024-08-11T06:24:41Z

Status (Done.):

ISP adaption code.
ISP adaption docs.

season0528 · 2024-08-14T11:53:56Z

Weight parallel is also enabled, with acc/loss aligned: https://github.com/InternLM/InternEvo-HFModels/tree/enable_wp

season0528 · 2024-08-27T03:56:56Z

Closing this PR, since we have chosen to drop the repo https://github.com/InternLM/InternEvo-HFModels

refactor trainer_builder and make it more readable

1b3431a

mm-assistant bot assigned yhcc Aug 6, 2024

typo fix

efe0efc

sunpengsdu requested a review from sallyjunjun August 8, 2024 02:16

season0528 and others added 4 commits August 8, 2024 10:49

Merge branch 'InternLM:develop' into refactor_trainer_builder

02b063e

debug isp

1fc507a

add isp and pack support for internlm2

e7390b4

add callable for mdoel dispatch

ed01b8a

season0528 changed the title ~~feat(trainer_builder): refactor trainer_builder and make it more readable~~ feat(trainer_builder): refactor trainer_builder and preserve optional callable for model dispatch function Aug 8, 2024

season0528 closed this Aug 9, 2024

fix

36f831b

season0528 reopened this Aug 10, 2024

season0528 added 3 commits August 10, 2024 23:56

Merge branch 'develop' into refactor_trainer_builder

00a7c50

fix pylint

92e37db

fix pylint

4e7ad07

add assertion

7d92689

season0528 changed the title ~~feat(trainer_builder): refactor trainer_builder and preserve optional callable for model dispatch function~~ feat(trainer_builder): refactor trainer_builder and preserve optional callable for custom model dispatch function in isp mode Aug 11, 2024

fix pylint

6a5b667

refine trainer_builder

7696aac

sallyjunjun force-pushed the refactor_trainer_builder branch from 519aed7 to 925637f Compare August 14, 2024 11:30

add isp split ckpt weight

ab6265c

sallyjunjun force-pushed the refactor_trainer_builder branch from 925637f to ab6265c Compare August 15, 2024 06:23

season0528 closed this Aug 19, 2024

season0528 reopened this Aug 20, 2024

season0528 closed this Aug 21, 2024

season0528 mentioned this pull request Aug 27, 2024

feat(usability): Attempt for easier usability #304

Merged

6 tasks
