feat(trainer_builder): refactor trainer_builder and preserve optional callable for custom model dispatch function in isp mode #293

Closed
wants to merge 14 commits into from

season0528
Collaborator

@season0528 season0528 commented Aug 6, 2024

Motivation

  1. Improve the readability of trainer_builder, i.e., make the code self-documenting.
  2. Preserve an optional callable for a custom model_dispatch_func, which may be useful when integrating Hugging Face models with ISP.
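
As a rough illustration of item 2, the sketch below shows the general "optional callable" pattern: a builder accepts a custom dispatch function and falls back to a default when none is given. All names here (`build_trainer`, `default_dispatch`, `hf_dispatch`, and the dict standing in for a model) are hypothetical and are not taken from the InternEvo code base; only the parameter name `model_dispatch_func` comes from this PR.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical stand-in: a plain dict plays the role of a model object.
Model = Dict[str, Any]
ModelDispatchFunc = Callable[[Model], None]


def default_dispatch(model: Model) -> None:
    """Stand-in for the built-in dispatch logic used for native models."""
    model["dispatched_by"] = "default"


def build_trainer(
    model: Model,
    model_dispatch_func: Optional[ModelDispatchFunc] = None,
) -> Model:
    """Apply the custom dispatch function if one is given, else the default."""
    dispatch = model_dispatch_func if model_dispatch_func is not None else default_dispatch
    dispatch(model)
    return model


def hf_dispatch(model: Model) -> None:
    """Hypothetical custom hook, e.g. for a model whose layers are wired differently."""
    model["dispatched_by"] = "custom"


native = build_trainer({})                # falls back to default_dispatch
external = build_trainer({}, hf_dispatch)  # uses the caller-supplied hook
```

Keeping the parameter optional means existing callers are unaffected, while integrations that need different wiring can pass their own function.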

Modification

internlm/core/trainer_builder.py
internlm/core/parallel/comm/isp.py
internlm/model/builder.py
internlm/model/ops/attention.py
internlm/train/pipeline.py

BC-breaking (Optional)

None

Use cases (Optional)

None

Checklist

Before PR:

  • Pre-commit or other linting tools are used to fix the potential lint issues.
  • Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
  • The modification is covered by complete unit tests. If not, please add more unit tests to ensure correctness.
  • The documentation has been modified accordingly, like docstring or example tutorials.

After PR:

  • If the modification has potential influence on downstream or other related projects, this PR should be tested with those projects.
  • CLA has been signed and all committers have signed the CLA in this PR.

@sunpengsdu sunpengsdu requested a review from sallyjunjun August 8, 2024 02:16
@season0528 season0528 changed the title from "feat(trainer_builder): refactor trainer_builder and make it more readable" to "feat(trainer_builder): refactor trainer_builder and preserve optional callable for model dispatch function" Aug 8, 2024
@season0528 season0528 closed this Aug 9, 2024
@season0528 season0528 reopened this Aug 10, 2024
@season0528
Collaborator Author

season0528 commented Aug 11, 2024

Experimental results for acc/loss alignment:

Code base:
InternEvo-HFModels 6bfd957
InternEvo 7696aac

  1. Hugging Face InternLM2, pack mode, with and without ISP. (800 steps, perfectly aligned, diff is on the order of 1e-3 to 1e-2)
[image]
  2. Hugging Face InternLM2, unpack mode, with and without ISP. (700 steps, perfectly aligned, diff is on the order of 1e-3 to 1e-2)
[image]
  3. Hugging Face InternLM1, pack mode, with and without ISP. (500 steps, perfectly aligned, diff is on the order of 1e-3 to 1e-2)
[image]
  4. Hugging Face InternLM1, unpack mode, with and without ISP. (700 steps, might not be perfectly aligned, diff is on the order of 1e-2 to 1e-1)
[image]
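
For readers who want to reproduce this kind of comparison, here is a minimal sketch of an alignment check: it takes two per-step loss sequences (for example, one run with ISP and one without) and reports the largest absolute difference. The function names, the tolerance, and the sample values are illustrative assumptions only; they are not the script or the data behind the results above.

```python
from typing import Sequence


def max_abs_diff(baseline: Sequence[float], candidate: Sequence[float]) -> float:
    """Return the largest per-step absolute difference between two loss curves."""
    if len(baseline) != len(candidate):
        raise ValueError("loss curves must have the same number of steps")
    if not baseline:
        raise ValueError("loss curves must not be empty")
    return max(abs(a - b) for a, b in zip(baseline, candidate))


def is_aligned(baseline: Sequence[float], candidate: Sequence[float], tol: float = 1e-2) -> bool:
    """Treat two runs as aligned when every step differs by at most `tol`."""
    return max_abs_diff(baseline, candidate) <= tol


# Made-up values purely for illustration, not measurements from this PR.
loss_without_isp = [2.500, 2.100, 1.800]
loss_with_isp = [2.505, 2.095, 1.803]

aligned_loose = is_aligned(loss_without_isp, loss_with_isp, tol=1e-2)
aligned_strict = is_aligned(loss_without_isp, loss_with_isp, tol=1e-3)
```

With these sample values the runs pass the looser tolerance and fail the stricter one, which mirrors how the choice of threshold decides whether a run counts as "aligned".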

@season0528 season0528 changed the title from "feat(trainer_builder): refactor trainer_builder and preserve optional callable for model dispatch function" to "feat(trainer_builder): refactor trainer_builder and preserve optional callable for custom model dispatch function in isp mode" Aug 11, 2024
@season0528
Collaborator Author

season0528 commented Aug 11, 2024

ISP adaptation code: https://github.com/InternLM/InternEvo-HFModels/commit/6bfd9576005817e74302000ffa35567aca8260b4
ISP adaptation docs: https://github.com/InternLM/InternEvo-HFModels/blob/main/huggingface_model/README.md

For other Hugging Face models, refer to the adaptation code and docs for Hugging Face InternLM1 and InternLM2; only a few lines of code need to be modified.

@season0528
Collaborator Author

season0528 commented Aug 11, 2024

Status (done):

  • ISP adaptation code.
  • ISP adaptation docs.

@sallyjunjun sallyjunjun force-pushed the refactor_trainer_builder branch from 519aed7 to 925637f Compare August 14, 2024 11:30
@season0528
Collaborator Author

Weight parallel is also enabled, and acc/loss are aligned: https://github.com/InternLM/InternEvo-HFModels/tree/enable_wp

@sallyjunjun sallyjunjun force-pushed the refactor_trainer_builder branch from 925637f to ab6265c Compare August 15, 2024 06:23
@season0528 season0528 closed this Aug 19, 2024
@season0528 season0528 reopened this Aug 20, 2024
@season0528 season0528 closed this Aug 21, 2024
@season0528
Collaborator Author

Closing this PR, since we have chosen to drop the repo https://github.com/InternLM/InternEvo-HFModels

3 participants