Commit

fix doc
sallyjunjun committed Aug 26, 2024
1 parent 1a392b8 commit f41bfc0
Showing 9 changed files with 1,076 additions and 638 deletions.
128 changes: 71 additions & 57 deletions doc/code-docs/locales/en/LC_MESSAGES/checkpoint.po
@@ -7,7 +7,7 @@ msgid ""
msgstr ""
"Project-Id-Version: InternLM \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2024-01-23 18:01+0800\n"
"POT-Creation-Date: 2024-08-26 16:29+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: en\n"
@@ -16,7 +16,7 @@ msgstr ""
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.12.1\n"
"Generated-By: Babel 2.15.0\n"

#: ../../source/checkpoint.rst:2
msgid "模型加载与保存"
@@ -54,64 +54,66 @@ msgstr ""
"file. Currently, the relevant parameters are as follows"

#: ../../source/checkpoint.rst:13
msgid "``enable_save_ckpt``: 是否开启检查点存储功能(不影响检查点加载)。参数类型 ``bool``,必选参数。"
msgid "``enable_save_ckpt`` : 是否开启检查点存储功能(不影响检查点加载)。参数类型 ``bool`` ,必选参数。"
msgstr ""
"``enable_save_ckpt``: Whether to enable checkpoint storage functionality "
"(does not affect checkpoint loading). Parameter type: ``bool``. This is a "
"required parameter."

#: ../../source/checkpoint.rst:15
msgid "``save_ckpt_folder``: 检查点存储路径,参数类型 ``str``,默认为: ``None``,在开启检查点存储功能时为必选参数。"
msgid ""
"``save_ckpt_folder`` : 检查点存储路径,参数类型 ``str`` ,默认为: ``None`` "
",在开启检查点存储功能时为必选参数。"
msgstr ""
"``save_ckpt_folder``: Checkpoint storage path. Parameter type: ``str``. "
"Default is ``None``. This is a required parameter when enabling "
"checkpoint storage functionality."

#: ../../source/checkpoint.rst:17
msgid "``checkpoint_every``: 检查点存储频率,参数类型 ``int``,默认为: ``50``。"
msgid "``checkpoint_every`` : 检查点存储频率,参数类型 ``int`` ,默认为: ``50`` 。"
msgstr ""
"``checkpoint_every``: Checkpoint storage frequency. Parameter type: "
"``int``."
"``checkpoint_every``: Checkpoint storage frequency. Parameter type: "
"``int``. Default is ``50``."

#: ../../source/checkpoint.rst:19
msgid ""
"``load_ckpt_folder``: 初始化检查点/权重加载路径。参数类型 ``str``,默认为: ``None``,详见 :ref"
":`load-ckpt-folder`。"
"``load_ckpt_info`` : 初始化检查点/权重加载信息。参数类型 ``dict`` ,默认为: ``None`` ,详见 :ref:"
" `load-ckpt-info` 。"
msgstr ""
"``load_ckpt_folder``: Initialization checkpoint/weight loading path. "
"Parameter type: ``str``. Default is ``None``. :ref:`load-ckpt-folder`"
"``load_ckpt_info``: Initialization checkpoint/weight loading information. "
"Parameter type: ``dict``. Default is ``None``. See :ref:`load-ckpt-info`."

#: ../../source/checkpoint.rst:21
msgid "``async_upload``: 是否开启异步上传,默认值为:``False``,详见 :ref:`asyncupload`。"
msgid "``async_upload`` : 是否开启异步上传,默认值为: ``False`` ,详见 :ref: `asyncupload` 。"
msgstr ""
"``async_upload``: Whether to enable asynchronous uploading. See "
"documentation for more details :ref:`asyncupload`"
"``async_upload``: Whether to enable asynchronous uploading. Default is "
"``False``. See :ref:`asyncupload` for more details."

#: ../../source/checkpoint.rst:23
msgid "``async_upload_tmp_folder``: 异步上传临时存储路径。"
msgid "``async_upload_tmp_folder`` : 异步上传临时存储路径。"
msgstr ""
"``async_upload_tmp_folder``: Temporary storage path for asynchronous "
"uploading."

#: ../../source/checkpoint.rst:25
msgid ""
"``oss_snapshot_freq``: 快照存储频率,默认值为:``checkpoint_every``的一半。详见 "
":ref:`snapshot`。"
"``oss_snapshot_freq`` : 快照存储频率,默认值为: ``checkpoint_every`` 的一半。详见 :ref: "
"`snapshot` 。"
msgstr ""
"``oss_snapshot_freq``: Snapshot storage frequency. Default is half of "
"``checkpoint_every``. See :ref:`snapshot` for more details."

#: ../../source/checkpoint.rst:27
msgid "``auto_resume``: 是否开启检查点自动恢复,默认值为:``True``,详见 :ref:`autoresume`。"
msgid "``auto_resume`` : 是否开启检查点自动恢复,默认值为: ``True`` ,详见 :ref: `autoresume` 。"
msgstr ""
"``auto_resume``: Whether to enable automatic checkpoint resume. See "
"documentation for more details :ref:`autoresume`."
"``auto_resume``: Whether to enable automatic checkpoint resume. Default "
"is ``True``. See :ref:`autoresume` for more details."

#: ../../source/checkpoint.rst:29
msgid "``stop_file_path`` : 检查点存储控制文件的路径,默认值为:``None``,详见 :ref:`stopfile`。"
msgid "``stop_file_path`` : 检查点存储控制文件的路径,默认值为: ``None`` ,详见 :ref: `stopfile` 。"
msgstr ""
"``stop_file_path``: Path to the checkpoint storage control file. See "
"documentation for more details :ref:`stopfile`."
"``stop_file_path``: Path to the checkpoint storage control file. Default "
"is ``None``. See :ref:`stopfile` for more details."

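A minimal sketch of how the parameters listed above might be combined into a ``ckpt`` dictionary, using only the defaults stated in the entries. The helper name ``build_ckpt_config`` is hypothetical and is not part of InternEvo; treating "half of ``checkpoint_every``" as integer division is also an assumption.

```python
def build_ckpt_config(enable_save_ckpt, save_ckpt_folder=None, **overrides):
    """Hypothetical helper: fill in the defaults documented above."""
    cfg = dict(
        enable_save_ckpt=enable_save_ckpt,  # required, bool
        save_ckpt_folder=save_ckpt_folder,  # required when saving is enabled
        checkpoint_every=50,
        load_ckpt_info=None,
        async_upload=False,
        async_upload_tmp_folder=None,       # left unset in this sketch
        oss_snapshot_freq=None,             # resolved below
        auto_resume=True,
        stop_file_path=None,
    )
    cfg.update(overrides)

    if cfg["enable_save_ckpt"] and cfg["save_ckpt_folder"] is None:
        raise ValueError("save_ckpt_folder is required when enable_save_ckpt is True")

    # Assumption: "half of checkpoint_every" is rounded down.
    if cfg["oss_snapshot_freq"] is None:
        cfg["oss_snapshot_freq"] = cfg["checkpoint_every"] // 2
    return cfg
```

Calling ``build_ckpt_config(True, "/path/to/ckpts", checkpoint_every=100)`` would leave ``oss_snapshot_freq`` at half of the overridden ``checkpoint_every`` under these assumptions.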
#: ../../source/checkpoint.rst:32
msgid "下面给出config文件的参数设置例子:"
@@ -170,14 +172,14 @@ msgstr ""
" storage speed."

#: ../../source/checkpoint.rst:79
msgid "(2) 模型加载(load_ckpt_folder)格式约定"
msgstr "(2) Model loading format conventions (load_ckpt_folder)."
msgid "(2) 模型加载(load_ckpt_info)格式约定"
msgstr "(2) Model loading format conventions (load_ckpt_info)"

#: ../../source/checkpoint.rst:81
msgid "load_ckpt_folder 由三个字段组成, ``path`` 、 ``content`` 和 ``ckpt_type`` 。"
msgid "load_ckpt_info 由三个字段组成, ``path`` 、 ``content`` 和 ``ckpt_type`` 。"
msgstr ""
"``load_ckpt_folder`` consists of three fields: ``path``, ``content``, and"
" ``ckpt_type``."
"``load_ckpt_info`` consists of three fields: ``path``, ``content``, and "
"``ckpt_type``."

#: ../../source/checkpoint.rst:83
msgid "``path``:给出了检查点/初始化模型权重的加载路径(path的格式见下小节)"
@@ -221,18 +223,26 @@ msgstr ""
"currently supported fields include:"

#: ../../source/checkpoint.rst:95
msgid "``internlm``:internevo约定的checkpoint存储格式。"
msgstr "``internlm``: Checkpoint storage format as per InternEvo conventions."
msgid "``internevo``:internevo约定的checkpoint存储格式。"
msgstr "``internevo``: Checkpoint storage format specified by InternEvo."

#: ../../source/checkpoint.rst:96
msgid "``llama``:huggingface llama约定的checkpoint存储格式。"
msgstr "``llama``: Checkpoint storage format specified by huggingface llama."

#: ../../source/checkpoint.rst:97
msgid "``hf``:huggingface 模型约定的checkpoint存储格式。"
msgstr "``hf``: Checkpoint storage format specified by huggingface model."
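
The three ``load_ckpt_info`` fields and the supported ``ckpt_type`` values can be checked with a small validator. This is only a sketch: ``validate_load_ckpt_info`` is a hypothetical name, not an InternEvo function, and the ``path`` and ``content`` values in the usage note are placeholders whose real formats are defined elsewhere in the documentation.

```python
# ckpt_type values listed in the entries above.
SUPPORTED_CKPT_TYPES = {"internevo", "llama", "hf"}
REQUIRED_FIELDS = ("path", "content", "ckpt_type")


def validate_load_ckpt_info(info):
    """Hypothetical check that a load_ckpt_info dict has the documented shape."""
    missing = [key for key in REQUIRED_FIELDS if key not in info]
    if missing:
        raise KeyError(f"load_ckpt_info is missing fields: {missing}")
    if info["ckpt_type"] not in SUPPORTED_CKPT_TYPES:
        raise ValueError(f"unsupported ckpt_type: {info['ckpt_type']!r}")
    return info
```

For example, ``validate_load_ckpt_info(dict(path="/path/to/ckpt", content=("model",), ckpt_type="internevo"))`` returns the dictionary unchanged, while a ``ckpt_type`` outside the three listed values raises ``ValueError``.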

#: ../../source/checkpoint.rst:99
msgid "下面给出两个例子:"
msgstr "Here are two examples:"

#: ../../source/checkpoint.rst:111
#: ../../source/checkpoint.rst:112
msgid "异步上传"
msgstr "Asynchronous upload"

#: ../../source/checkpoint.rst:113
#: ../../source/checkpoint.rst:114
msgid ""
"异步上传会先同步的将模型存储到 ``async_upload_tmp_folder`` "
"中,再异步的写入远端存储(OSS/NFS)中。从而避免存储ckpt阻塞训练过长时间。"
@@ -242,18 +252,18 @@ msgstr ""
"storage (OSS/NFS). This helps prevent blocking training for extended "
"periods while storing checkpoints."

#: ../../source/checkpoint.rst:115 ../../source/checkpoint.rst:133
#: ../../source/checkpoint.rst:149 ../../source/checkpoint.rst:164
#: ../../source/checkpoint.rst:116 ../../source/checkpoint.rst:134
#: ../../source/checkpoint.rst:150 ../../source/checkpoint.rst:165
msgid "config.ckpt 中相关的参数:"
msgstr "The relevant parameters in ``config.ckpt`` are:"

#: ../../source/checkpoint.rst:117
#: ../../source/checkpoint.rst:118
msgid "``async_upload``: 是否开启异步上传。参数类型 ``bool/None``,默认为 ``False``。"
msgstr ""
"``async_upload``: Whether to enable asynchronous upload. Parameter type: "
"``bool/None``. Default is ``False``."

#: ../../source/checkpoint.rst:119
#: ../../source/checkpoint.rst:120
msgid ""
"``async_upload_tmp_folder``: 异步上传临时存储路径。参数类型 ``str/None``, 默认值为 "
"``/dev/shm/{JOB_NAME}_tmp_ckpt/``。"
@@ -262,14 +272,14 @@ msgstr ""
"upload. Parameter type: ``str/None``. Default value is "
"``/dev/shm/{JOB_NAME}_tmp_ckpt/``."

#: ../../source/checkpoint.rst:121
#: ../../source/checkpoint.rst:122
msgid "需要注意的是,异步上传功能仅在backend为非local时才会有效果,bcakend为local时只支持同步存储。"
msgstr ""
"It's important to note that asynchronous upload only takes effect when "
"the backend is not \"local\". When the backend is \"local\", only "
"synchronous storage is supported."

#: ../../source/checkpoint.rst:123
#: ../../source/checkpoint.rst:124
msgid ""
"``async_upload_tmp_folder`` "
"设置的的原则为尽量设置为计算节点的local目录,这样才可以获得最佳的异步上传速度,一般来说建议为 ``/dev/shm`` 或 "
@@ -281,11 +291,11 @@ msgstr ""
"or ``/nvme``. If you use synchronous upload, this path does not need "
"to be given."

#: ../../source/checkpoint.rst:129
#: ../../source/checkpoint.rst:130
msgid "快照检查点"
msgstr "Snapshot Checkpoint"

#: ../../source/checkpoint.rst:131
#: ../../source/checkpoint.rst:132
msgid ""
"快照检查点是一种特殊的检查点,其是为了减少模型因为训练崩溃(ECC error, NCCL error, "
".etc)等问题导致训练任务崩溃而损失的训练进度。其采用交替覆盖写的策略,所占用的存储大小为两个step的检查点所需的空间。配合上异步的检查点写入,在不影响训练速度和存储容量的条件下极大的增大了检查点的存储频率。"
@@ -299,13 +309,13 @@ msgstr ""
"frequency of checkpoints without affecting training speed and storage "
"capacity."

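The alternating-overwrite strategy described for snapshot checkpoints can be sketched as a choice between two fixed slots, so that storage never exceeds two checkpoints. The function name and the slot numbering are illustrative assumptions, not InternEvo's actual implementation.

```python
def snapshot_slot(step, oss_snapshot_freq):
    """Return the slot (0 or 1) overwritten at this step, or None if no snapshot is due.

    Assumption: a snapshot is taken every `oss_snapshot_freq` steps and
    consecutive snapshots alternate between two slots.
    """
    if step <= 0 or step % oss_snapshot_freq != 0:
        return None
    return (step // oss_snapshot_freq) % 2
```

With ``oss_snapshot_freq=25``, steps 25, 50, 75 map to slots 1, 0, 1, so each new snapshot overwrites the older of the two stored copies.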
#: ../../source/checkpoint.rst:135
#: ../../source/checkpoint.rst:136
msgid "``oss_snapshot_freq``: 快照存储频率。参数类型 ``int/None``,默认为 ``50``。"
msgstr ""
"``oss_snapshot_freq``: Snapshot storage frequency. Parameter type: "
"``int/None``. Default is ``50``."

#: ../../source/checkpoint.rst:137
#: ../../source/checkpoint.rst:138
msgid ""
"``oss_snapshot_freq`` 可以根据模型每step时间酌情设置,一般快照频率在1小时以下,半小时以上为怡/不给(默认值是 "
"``checkpoint_every`` 的二分之一)。"
@@ -315,11 +325,11 @@ msgstr ""
"is appropriate for more than half an hour (the default value is one-half of "
"``checkpoint_every``)."

#: ../../source/checkpoint.rst:143
#: ../../source/checkpoint.rst:144
msgid "检查点自动恢复"
msgstr "Automatic checkpoint recovery"

#: ../../source/checkpoint.rst:145
#: ../../source/checkpoint.rst:146
msgid ""
"检查点自动加载功能的目的是在resume训练时,自动加载 ``save_ckpt_folder`` "
"路径下最新的检查点(包括snapshot检查点)。配合上自动重启机制,可以实现无人干预的任务自动恢复。"
@@ -330,36 +340,37 @@ msgstr ""
"automatic restart mechanism, tasks can be automatically restored without "
"human intervention."

#: ../../source/checkpoint.rst:147
#: ../../source/checkpoint.rst:148
msgid ""
"该功能默认开启,所以要注意如果需要加载 ``load_ckpt_folder`` 路径下的模型权重,要将 ``auto_resume`` 设置为 "
"该功能默认开启,所以要注意如果需要加载 ``load_ckpt_info`` 路径下的模型权重,要将 ``auto_resume`` 设置为 "
"False,否则可能会产生预期外的行为。"
msgstr ""
"This function is enabled by default, so please note that if you need to "
"load the model weights under the ``load_ckpt_folder`` path, you must set "
"load the model weights under the ``load_ckpt_info`` path, you must set "
"``auto_resume`` to ``False``, otherwise unexpected behavior may occur."

#: ../../source/checkpoint.rst:151
#: ../../source/checkpoint.rst:152
msgid "``auto_resume``: 是否开启检查点自动恢复。参数类型 ``bool``,默认为 ``True``。"
msgstr ""
"``auto_resume``: Whether to enable automatic checkpoint recovery. "
"Parameter type: ``bool``. Default is ``True``."

#: ../../source/checkpoint.rst:153
#: ../../source/checkpoint.rst:154
msgid ""
"``auto_resume`` 如果为True,则尝试从 ``save_ckpt_folder`` "
"路径中自动加载最新的ckpt,如果找不到,则从step 0开始训练。如果为False,则尝试从 ``load_ckpt_folder`` "
"路径中自动加载最新的ckpt,如果找不到,则从step 0开始训练。如果为False,则尝试从 ``load_ckpt_info`` "
"中加载模型参数。"
msgstr ""
"``auto_resume`` If True, attempts to save_ckpt_folder`Automatically load "
"the latest ckpt in the path. If not found, training will start from step "
"0. If False, try to load model parameters from ``load_ckpt_folder``"
"``auto_resume``: If ``True``, attempts to automatically load the latest "
"checkpoint from the path specified in ``save_ckpt_folder``. If none is "
"found, training starts from step 0. If ``False``, attempts to load model "
"parameters from ``load_ckpt_info``."

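The ``auto_resume`` decision described above can be sketched as a small function. The name ``choose_checkpoint_source`` and its return values are hypothetical, and falling back to training from scratch when ``auto_resume`` is ``False`` and ``load_ckpt_info`` is ``None`` is an assumption not stated in the source.

```python
def choose_checkpoint_source(auto_resume, latest_saved_ckpt, load_ckpt_info):
    """Hypothetical sketch of where training state is loaded from.

    latest_saved_ckpt: newest checkpoint found under save_ckpt_folder, or None.
    load_ckpt_info: the load_ckpt_info dict from the config, or None.
    """
    if auto_resume:
        if latest_saved_ckpt is not None:
            return ("resume", latest_saved_ckpt)
        # Nothing to resume from: start at step 0.
        return ("from_scratch", None)
    if load_ckpt_info is not None:
        return ("load", load_ckpt_info["path"])
    # Assumption: with no load_ckpt_info, training also starts from scratch.
    return ("from_scratch", None)
```

Note that when ``auto_resume`` is ``True`` this sketch never consults ``load_ckpt_info``, which is why the entry above advises setting ``auto_resume`` to ``False`` when specific weights must be loaded.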
#: ../../source/checkpoint.rst:159
#: ../../source/checkpoint.rst:160
msgid "手动控制检查点存储"
msgstr "Manual control of checkpoint storage"

#: ../../source/checkpoint.rst:161
#: ../../source/checkpoint.rst:162
msgid ""
"在模型距离下一次检查点存储还有很长时间,这时如果希望立刻停止一个任务,又不希望丢失目前训练进度时可以使用手动控制检查点存储功能。通过向一个位于NFS上的"
" ``stop_file_path`` 文件中写入希望任务停止的step步数,Global Rank "
@@ -375,14 +386,14 @@ msgstr ""
"all training processes, and it is agreed that each process will store a "
"checkpoint when training reaches this step, and choose whether to exit."

#: ../../source/checkpoint.rst:166
#: ../../source/checkpoint.rst:167
msgid "``stop_file_path``:检查点存储控制文件的路径,参数类型 ``str/None``,默认为 ``None``,表示关闭该功能。"
msgstr ""
"``stop_file_path``: The path of the checkpoint storage control file. "
"Parameter type: ``str/None``. Default is ``None``, which disables this "
"feature."

#: ../../source/checkpoint.rst:168
#: ../../source/checkpoint.rst:169
msgid "下面给出一个写入 ``stop_file_path`` 的例子:"
msgstr "An example of writing to ``stop_file_path`` is given below:"

@@ -422,3 +433,6 @@ msgstr "An example of writing to ``stop_file_path`` is given below:"
#~ msgid "Save checkpoint to the given folder path."
#~ msgstr ""

#~ msgid "``hf_model``:适用于加载huggingface所有模型的checkpoint存储格式。"
#~ msgstr ""
