01/12 22:37:29 - mmengine - WARNING - Failed to search registry with scope "mmengine" in the "builder" registry tree. As a workaround, the current "builder" registry in "xtuner" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmengine" is a correct scope, or whether the registry is initialized.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:07<00:00, 1.85s/it]
You are using an old version of the checkpointing format that is deprecated (We will also silently ignore `gradient_checkpointing_kwargs` in case you passed it).Please update to the new format on your modeling file. To use the new format, you need to completely remove the definition of the method `_set_gradient_checkpointing` in your model.
Processing zero checkpoint '/home/chou/.cache/huggingface/hub/models--OpenFace-CQUPT--Human_LLaVA'
Traceback (most recent call last):
  File "/home/chou/deep/HumanVLM/xtuner/xtuner/tools/train.py", line 364, in <module>
    main()
  File "/home/chou/deep/HumanVLM/xtuner/xtuner/tools/train.py", line 353, in main
    runner = Runner.from_cfg(cfg)
  File "/home/chou/miniconda3/envs/humancaption/lib/python3.8/site-packages/mmengine/runner/runner.py", line 462, in from_cfg
    runner = cls(
  File "/home/chou/miniconda3/envs/humancaption/lib/python3.8/site-packages/mmengine/runner/runner.py", line 429, in __init__
    self.model = self.build_model(model)
  File "/home/chou/miniconda3/envs/humancaption/lib/python3.8/site-packages/mmengine/runner/runner.py", line 836, in build_model
    model = MODELS.build(model)
  File "/home/chou/miniconda3/envs/humancaption/lib/python3.8/site-packages/mmengine/registry/registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/home/chou/miniconda3/envs/humancaption/lib/python3.8/site-packages/mmengine/registry/build_functions.py", line 232, in build_model_from_cfg
    return build_from_cfg(cfg, registry, default_args)
  File "/home/chou/miniconda3/envs/humancaption/lib/python3.8/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "/home/chou/deep/HumanVLM/xtuner/xtuner/model/llava.py", line 109, in __init__
    pretrained_state_dict = guess_load_checkpoint(pretrained_pth)
  File "/home/chou/deep/HumanVLM/xtuner/xtuner/model/utils.py", line 313, in guess_load_checkpoint
    state_dict = get_state_dict_from_zero_checkpoint(
  File "/home/chou/deep/HumanVLM/xtuner/xtuner/utils/zero_to_any_dtype.py", line 617, in get_state_dict_from_zero_checkpoint
    return _get_state_dict_from_zero_checkpoint(ds_checkpoint_dir,
  File "/home/chou/deep/HumanVLM/xtuner/xtuner/utils/zero_to_any_dtype.py", line 228, in _get_state_dict_from_zero_checkpoint
    optim_files = get_optim_files(ds_checkpoint_dir)
  File "/home/chou/deep/HumanVLM/xtuner/xtuner/utils/zero_to_any_dtype.py", line 103, in get_optim_files
    return get_checkpoint_files(checkpoint_dir, '*_optim_states.pt')
  File "/home/chou/deep/HumanVLM/xtuner/xtuner/utils/zero_to_any_dtype.py", line 96, in get_checkpoint_files
    raise FileNotFoundError(
FileNotFoundError: can't find *_optim_states.pt files in directory '/home/chou/.cache/huggingface/hub/models--OpenFace-CQUPT--Human_LLaVA'
Problem description
While fine-tuning the HumanVLM model, I ran into the following error:
FileNotFoundError: can't find *_optim_states.pt files in directory '/home/chou/.cache/huggingface/hub/models--OpenFace-CQUPT--Human_LLaVA'
From the error message, the optimizer state files appear to be missing: the downloaded model directory contains no such files (e.g. *_optim_states.pt), so fine-tuning cannot start.
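For reference, the check that raises this error can be sketched as follows (a minimal approximation of `get_checkpoint_files` in xtuner's `zero_to_any_dtype.py`; the actual implementation may differ in detail):

```python
import glob
import os


def get_checkpoint_files(checkpoint_dir, glob_pattern):
    # A DeepSpeed ZeRO checkpoint directory is expected to contain
    # optimizer-state shards matching a pattern like '*_optim_states.pt'.
    # An ordinary Hugging Face model directory (safetensors/bin weight
    # shards only) has none, so the glob comes back empty and this raises.
    ckpt_files = sorted(glob.glob(os.path.join(checkpoint_dir, glob_pattern)))
    if not ckpt_files:
        raise FileNotFoundError(
            f"can't find {glob_pattern} files in directory '{checkpoint_dir}'")
    return ckpt_files
```

In other words, the hub cache directory for Human_LLaVA holds inference weights, not a DeepSpeed training checkpoint, which is why the glob finds nothing.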
My steps and a detailed description of the problem follow.
Steps to reproduce
1. Clone the HumanVLM project and install the dependencies.
2. Prepare the pretrained model and data.
3. Run the fine-tuning command:
xtuner train HumanVLM/human_llama3_8b_instruct_siglip_so400m_large_p14_384_lora_e1_gpu8_finetune.py
4. The error above occurs.
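Before re-running step 3, it may help to verify whether the path passed as `pretrained_pth` is actually a DeepSpeed ZeRO checkpoint. The helper below is a diagnostic sketch (the directory path is the one from the traceback; `is_zero_checkpoint` is a hypothetical name, not part of xtuner):

```python
import glob
import os


def is_zero_checkpoint(path):
    # xtuner's guess_load_checkpoint falls back to DeepSpeed's
    # zero-to-fp32 conversion when given a directory; that only works
    # if optimizer-state shards exist somewhere under the path.
    pattern = os.path.join(path, '**', '*_optim_states.pt')
    return bool(glob.glob(pattern, recursive=True))


# Directory from the traceback above -- expected to print False here,
# since it is a Hugging Face hub cache of weight shards.
ckpt_dir = os.path.expanduser(
    '~/.cache/huggingface/hub/models--OpenFace-CQUPT--Human_LLaVA')
print(is_zero_checkpoint(ckpt_dir))
```

If this prints False, the config's `pretrained_pth` should point at a real training checkpoint directory (e.g. one produced by a previous xtuner pretraining run), not at the downloaded inference weights.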
Additional information: