From 0f87f47644813737794312d19d43363b187959b9 Mon Sep 17 00:00:00 2001 From: jiaxingli <43110891+li126com@users.noreply.github.com> Date: Tue, 16 Jul 2024 19:17:45 +0800 Subject: [PATCH] Fix(docker): update docker image and dockerfile for new version (#200) --- README-zh-Hans.md | 6 +++--- doc/en/install.md | 24 ++++++++++++++++-------- doc/install.md | 21 ++++++++++++++------- docker.Makefile | 24 ++++++++++-------------- docker/Dockerfile-centos | 15 +++++++++------ docker/Dockerfile-ubuntu | 15 +++++++++------ experiment/Dockerfile-centos | 23 +++++++++++++---------- experiment/Dockerfile-ubuntu | 23 +++++++++++++---------- experiment/README-CN.md | 12 +++--------- experiment/README-EN.md | 12 +++--------- 10 files changed, 93 insertions(+), 82 deletions(-) diff --git a/README-zh-Hans.md b/README-zh-Hans.md index 10768d65..237a50e2 100644 --- a/README-zh-Hans.md +++ b/README-zh-Hans.md @@ -17,9 +17,9 @@ [![使用文档](https://readthedocs.org/projects/internevo/badge/?version=latest)](https://internevo.readthedocs.io/zh_CN/latest/?badge=latest) [![license](./doc/imgs/license.svg)](./LICENSE) -[📘使用教程](./doc/en/usage.md) | -[🛠️安装指引](./doc/en/install.md) | -[📊框架性能](./doc/en/train_performance.md) | +[📘使用教程](./doc/usage.md) | +[🛠️安装指引](./doc/install.md) | +[📊框架性能](./doc/train_performance.md) | [🤔问题报告](https://github.com/InternLM/InternEvo/issues/new) [English](./README.md) | diff --git a/doc/en/install.md b/doc/en/install.md index 304d110a..eae4a12c 100644 --- a/doc/en/install.md +++ b/doc/en/install.md @@ -78,7 +78,10 @@ cd ../../../../ Install Apex (version 23.05): ```bash cd ./third_party/apex -pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./ +# if pip >= 23.1 (ref: https://pip.pypa.io/en/stable/news/#v23-1) which supports multiple `--config-settings` with the same key... 
+pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./ +# otherwise +pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --global-option="--cpp_ext" --global-option="--cuda_ext" ./ cd ../../ ``` @@ -88,31 +91,36 @@ pip install git+https://github.com/databricks/megablocks@v0.3.2 # MOE need ``` ### Environment Image -Users can use the provided dockerfile combined with docker.Makefile to build their own images, or obtain images with InternEvo runtime environment installed from https://hub.docker.com/r/internlm/internlm. +Users can use the provided dockerfile combined with docker.Makefile to build their own images, or obtain images with InternEvo runtime environment installed from https://hub.docker.com/r/internlm/internevo/tags. #### Image Configuration and Build The configuration and build of the Dockerfile are implemented through the docker.Makefile. To build the image, execute the following command in the root directory of InternEvo: ``` bash make -f docker.Makefile BASE_OS=centos7 ``` -In docker.Makefile, you can customize the basic image, environment version, etc., and the corresponding parameters can be passed directly through the command line. For BASE_OS, ubuntu20.04 and centos7 are respectively supported. +In docker.Makefile, you can customize the basic image, environment version, etc., and the corresponding parameters can be passed directly through the command line. The default is the recommended environment version. For BASE_OS, ubuntu20.04 and centos7 are respectively supported. 
#### Pull Standard Image The standard image based on ubuntu and centos has been built and can be directly pulled: ```bash # ubuntu20.04 -docker pull internlm/internlm:torch1.13.1-cuda11.7.1-flashatten1.0.5-ubuntu20.04 +docker pull internlm/internevo:torch2.1.0-cuda11.8.0-flashatten2.2.1-ubuntu20.04 # centos7 -docker pull internlm/internlm:torch1.13.1-cuda11.7.1-flashatten1.0.5-centos7 +docker pull internlm/internevo:torch2.1.0-cuda11.8.0-flashatten2.2.1-centos7 ``` #### Run Container For the local standard image built with dockerfile or pulled, use the following command to run and enter the container: ```bash -docker run --gpus all -it -m 500g --cap-add=SYS_PTRACE --cap-add=IPC_LOCK --shm-size 20g --network=host --name myinternlm internlm/internlm:torch1.13.1-cuda11.7.1-flashatten1.0.5-centos7 bash +docker run --gpus all -it -m 500g --cap-add=SYS_PTRACE --cap-add=IPC_LOCK --shm-size 20g --network=host --name internevo_centos internlm/internevo:torch2.1.0-cuda11.8.0-flashatten2.2.1-centos7 bash +``` + +#### Start Training +The default directory in the container is `/InternEvo`, please start training according to the [Usage](./usage.md). The default 7B model starts the single-machine with 8-GPU training command example as follows: +```bash +torchrun --nproc_per_node=8 --nnodes=1 train.py --config configs/7B_sft.py --launcher torch ``` -The default directory in the container is `/InternLM`, please start training according to the [Usage](./usage.md). ## Environment Installation (NPU) For machines with NPU, the version of the installation environment can refer to that of GPU. Use Ascend's torch_npu instead of torch on NPU machines. Additionally, Flash-Attention and Apex are no longer supported for installation on NPU. The corresponding functionalities have been internally implemented in the InternEvo codebase. The following tutorial is only for installing torch_npu. 
@@ -135,4 +143,4 @@ pip3 install pyyaml pip3 install setuptools wget https://gitee.com/ascend/pytorch/releases/download/v6.0.rc1-pytorch2.1.0/torch_npu-2.1.0.post3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl pip install torch_npu-2.1.0.post3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl -``` \ No newline at end of file +``` diff --git a/doc/install.md b/doc/install.md index b894f8fa..f1493472 100644 --- a/doc/install.md +++ b/doc/install.md @@ -78,7 +78,10 @@ cd ../../../../ 安装 Apex (version 23.05): ```bash cd ./third_party/apex -pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./ +# if pip >= 23.1 (ref: https://pip.pypa.io/en/stable/news/#v23-1) which supports multiple `--config-settings` with the same key... +pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./ +# otherwise +pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --global-option="--cpp_ext" --global-option="--cuda_ext" ./ cd ../../ ``` @@ -88,32 +91,36 @@ pip install git+https://github.com/databricks/megablocks@v0.3.2 # MOE相关 ``` ### 环境镜像 -用户可以使用提供的 dockerfile 结合 docker.Makefile 来构建自己的镜像,或者也可以从 https://hub.docker.com/r/internlm/internlm 获取安装了 InternEvo 运行环境的镜像。 +用户可以使用提供的 dockerfile 结合 docker.Makefile 来构建自己的镜像,或者也可以从 https://hub.docker.com/r/internlm/internevo/tags 获取安装了 InternEvo 运行环境的镜像。 #### 镜像配置及构造 dockerfile 的配置以及构造均通过 docker.Makefile 文件实现,在 InternEvo 根目录下执行如下命令即可 build 镜像: ``` bash make -f docker.Makefile BASE_OS=centos7 ``` -在 docker.Makefile 中可自定义基础镜像,环境版本等内容,对应参数可直接通过命令行传递。对于 BASE_OS 分别支持 ubuntu20.04 和 centos7。 +在 docker.Makefile 中可自定义基础镜像,环境版本等内容,对应参数可直接通过命令行传递,默认为推荐的环境版本。对于 BASE_OS 分别支持 ubuntu20.04 和 centos7。 #### 镜像拉取 基于 ubuntu 和 centos 的标准镜像已经 build 完成也可直接拉取使用: ```bash # ubuntu20.04 -docker pull 
internlm/internlm:torch1.13.1-cuda11.7.1-flashatten1.0.5-ubuntu20.04 +docker pull internlm/internevo:torch2.1.0-cuda11.8.0-flashatten2.2.1-ubuntu20.04 # centos7 -docker pull internlm/internlm:torch1.13.1-cuda11.7.1-flashatten1.0.5-centos7 +docker pull internlm/internevo:torch2.1.0-cuda11.8.0-flashatten2.2.1-centos7 ``` #### 容器启动 对于使用 dockerfile 构建或拉取的本地标准镜像,使用如下命令启动并进入容器: ```bash -docker run --gpus all -it -m 500g --cap-add=SYS_PTRACE --cap-add=IPC_LOCK --shm-size 20g --network=host --name myinternlm internlm/internlm:torch1.13.1-cuda11.7.1-flashatten1.0.5-centos7 bash +docker run --gpus all -it -m 500g --cap-add=SYS_PTRACE --cap-add=IPC_LOCK --shm-size 20g --network=host --name internevo_centos internlm/internevo:torch2.1.0-cuda11.8.0-flashatten2.2.1-centos7 bash ``` -容器内默认目录即 `/InternLM`,根据[使用文档](./usage.md)即可启动训练。 +#### 训练启动 +容器内默认目录即 `/InternEvo`,参考[使用文档](./usage.md)可获取具体使用方法。默认7B模型启动单机8卡训练命令样例: +```bash +torchrun --nproc_per_node=8 --nnodes=1 train.py --config configs/7B_sft.py --launcher torch +``` ## 环境安装(NPU) 在搭载NPU的机器上安装环境的版本可参考GPU,在NPU上使用昇腾torch_npu代替torch,同时Flash-Attention和Apex不再支持安装,相应功能已由InternEvo代码内部实现。以下教程仅为torch_npu安装。 diff --git a/docker.Makefile b/docker.Makefile index 7cfd55af..2bcbbae0 100644 --- a/docker.Makefile +++ b/docker.Makefile @@ -1,12 +1,11 @@ DOCKER_REGISTRY ?= docker.io -DOCKER_ORG ?= my -DOCKER_IMAGE ?= internlm +DOCKER_ORG ?= internlm +DOCKER_IMAGE ?= internevo DOCKER_FULL_NAME = $(DOCKER_REGISTRY)/$(DOCKER_ORG)/$(DOCKER_IMAGE) -CUDA_VERSION = 11.7.1 -GCC_VERSION = 10.2.0 - +CUDA_VERSION = 11.8.0 CUDNN_VERSION = 8 + BASE_RUNTIME = # ubuntu20.04 centos7 BASE_OS = centos7 @@ -17,9 +16,10 @@ CUDA_CHANNEL = nvidia INSTALL_CHANNEL ?= pytorch PYTHON_VERSION ?= 3.10 -PYTORCH_VERSION ?= 1.13.1 -TORCHVISION_VERSION ?= 0.14.1 -TORCHAUDIO_VERSION ?= 0.13.1 +PYTORCH_TAG ?= 2.1.0 +PYTORCH_VERSION ?= 2.1.0+cu118 +TORCHVISION_VERSION ?= 0.16.0+cu118 +TORCHAUDIO_VERSION ?= 2.1.0+cu118 BUILD_PROGRESS ?= auto TRITON_VERSION ?= GMP_VERSION ?= 6.2.1 
@@ -28,18 +28,14 @@ MPC_VERSION ?= 1.2.1 GCC_VERSION ?= 10.2.0 HTTPS_PROXY_I ?= HTTP_PROXY_I ?= -FLASH_ATTEN_VERSION ?= 1.0.5 +FLASH_ATTEN_VERSION ?= 2.2.1 FLASH_ATTEN_TAG ?= v${FLASH_ATTEN_VERSION} BUILD_ARGS = --build-arg BASE_IMAGE=$(BASE_IMAGE) \ --build-arg PYTHON_VERSION=$(PYTHON_VERSION) \ - --build-arg CUDA_VERSION=$(CUDA_VERSION) \ - --build-arg CUDA_CHANNEL=$(CUDA_CHANNEL) \ --build-arg PYTORCH_VERSION=$(PYTORCH_VERSION) \ --build-arg TORCHVISION_VERSION=$(TORCHVISION_VERSION) \ --build-arg TORCHAUDIO_VERSION=$(TORCHAUDIO_VERSION) \ - --build-arg INSTALL_CHANNEL=$(INSTALL_CHANNEL) \ - --build-arg TRITON_VERSION=$(TRITON_VERSION) \ --build-arg GMP_VERSION=$(GMP_VERSION) \ --build-arg MPFR_VERSION=$(MPFR_VERSION) \ --build-arg MPC_VERSION=$(MPC_VERSION) \ @@ -98,7 +94,7 @@ all: devel-image .PHONY: devel-image devel-image: BASE_IMAGE := $(BASE_DEVEL) -devel-image: DOCKER_TAG := torch${PYTORCH_VERSION}-cuda${CUDA_VERSION}-flashatten${FLASH_ATTEN_VERSION}-${BASE_OS} +devel-image: DOCKER_TAG := torch${PYTORCH_TAG}-cuda${CUDA_VERSION}-flashatten${FLASH_ATTEN_VERSION}-${BASE_OS} devel-image: $(DOCKER_BUILD) diff --git a/docker/Dockerfile-centos b/docker/Dockerfile-centos index 9a8f8e5b..7b2a0fd0 100644 --- a/docker/Dockerfile-centos +++ b/docker/Dockerfile-centos @@ -107,18 +107,18 @@ ENV CXX=${GCC_HOME}/bin/c++ ############################################################################## -# Install InternLM development environment, including flash-attention and apex +# Install InternEvo development environment, including flash-attention and apex ############################################################################## FROM dep as intrenlm-dev -COPY . /InternLM -WORKDIR /InternLM +COPY . 
/InternEvo +WORKDIR /InternEvo ARG https_proxy ARG http_proxy ARG TORCH_CUDA_ARCH_LIST="8.0;8.6+PTX" RUN git submodule update --init --recursive \ && /opt/conda/bin/pip --no-cache-dir install -r requirements/torch.txt \ && /opt/conda/bin/pip --no-cache-dir install -r requirements/runtime.txt \ - && cd /InternLM/third_party/flash-attention \ + && cd /InternEvo/third_party/flash-attention \ && /opt/conda/bin/python setup.py install \ && cd ./csrc \ && cd fused_dense_lib && /opt/conda/bin/pip install -v . \ @@ -127,6 +127,9 @@ RUN git submodule update --init --recursive \ && cd ../layer_norm && /opt/conda/bin/pip install -v . \ && cd ../../../../ \ && cd ./third_party/apex \ - && /opt/conda/bin/pip --no-cache-dir install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./ \ + && /opt/conda/bin/pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./ \ + && /opt/conda/bin/pip install pytorch-extension \ && /opt/conda/bin/pip cache purge \ - && rm -rf ~/.cache/pip + && rm -rf ~/.cache/pip \ + && /opt/conda/bin/conda init \ + && . ~/.bashrc diff --git a/docker/Dockerfile-ubuntu b/docker/Dockerfile-ubuntu index da16f560..8c429381 100644 --- a/docker/Dockerfile-ubuntu +++ b/docker/Dockerfile-ubuntu @@ -88,18 +88,18 @@ ENV CXX=${GCC_HOME}/bin/c++ ############################################################################## -# Install InternLM development environment, including flash-attention and apex +# Install InternEvo development environment, including flash-attention and apex ############################################################################## FROM dep as intrenlm-dev -COPY . /InternLM -WORKDIR /InternLM +COPY . 
/InternEvo +WORKDIR /InternEvo ARG https_proxy ARG http_proxy ARG TORCH_CUDA_ARCH_LIST="8.0;8.6+PTX" RUN git submodule update --init --recursive \ && /opt/conda/bin/pip --no-cache-dir install -r requirements/torch.txt \ && /opt/conda/bin/pip --no-cache-dir install -r requirements/runtime.txt \ - && cd /InternLM/third_party/flash-attention \ + && cd /InternEvo/third_party/flash-attention \ && /opt/conda/bin/python setup.py install \ && cd ./csrc \ && cd fused_dense_lib && /opt/conda/bin/pip install -v . \ @@ -108,6 +108,9 @@ RUN git submodule update --init --recursive \ && cd ../layer_norm && /opt/conda/bin/pip install -v . \ && cd ../../../../ \ && cd ./third_party/apex \ - && /opt/conda/bin/pip --no-cache-dir install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./ \ + && /opt/conda/bin/pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./ \ + && /opt/conda/bin/pip install pytorch-extension \ && /opt/conda/bin/pip cache purge \ - && rm -rf ~/.cache/pip + && rm -rf ~/.cache/pip \ + && /opt/conda/bin/conda init \ + && . ~/.bashrc diff --git a/experiment/Dockerfile-centos b/experiment/Dockerfile-centos index 4ac9c64e..e967a1c6 100644 --- a/experiment/Dockerfile-centos +++ b/experiment/Dockerfile-centos @@ -106,11 +106,11 @@ ENV CXX=${GCC_HOME}/bin/c++ ############################################################################## -# Install InternLM development environment, including flash-attention and apex +# Install InternEvo development environment, including flash-attention and apex ############################################################################## FROM dep as intrenlm-dev -COPY . /InternLM -WORKDIR /InternLM +COPY . 
/InternEvo +WORKDIR /InternEvo ARG https_proxy ARG http_proxy ARG PYTORCH_VERSION @@ -134,11 +134,11 @@ RUN /opt/conda/bin/pip --no-cache-dir install \ torch-scatter \ pyecharts \ py-libnuma \ - -f https://data.pyg.org/whl/torch-${PYTORCH_VERSION}+cu117.html \ + -f https://data.pyg.org/whl/torch-${PYTORCH_VERSION}.html \ && /opt/conda/bin/pip --no-cache-dir install \ - --extra-index-url https://download.pytorch.org/whl/cu117 \ - torch==${PYTORCH_VERSION}+cu117 \ - torchvision==${TORCHVISION_VERSION}+cu117 \ + --extra-index-url https://download.pytorch.org/whl/cu118 \ + torch==${PYTORCH_VERSION} \ + torchvision==${TORCHVISION_VERSION} \ torchaudio==${TORCHAUDIO_VERSION} ARG https_proxy @@ -147,7 +147,7 @@ ARG TORCH_CUDA_ARCH_LIST="8.0;8.6+PTX" ARG FLASH_ATTEN_TAG RUN git submodule update --init --recursive \ - && cd /InternLM/third_party/flash-attention \ + && cd /InternEvo/third_party/flash-attention \ && git checkout ${FLASH_ATTEN_TAG} \ && /opt/conda/bin/python setup.py install \ && cd ./csrc \ @@ -157,6 +157,9 @@ RUN git submodule update --init --recursive \ && cd ../layer_norm && /opt/conda/bin/pip install -v . \ && cd ../../../../ \ && cd ./third_party/apex \ - && /opt/conda/bin/pip --no-cache-dir install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./ \ + && /opt/conda/bin/pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./ \ + && /opt/conda/bin/pip install pytorch-extension \ && /opt/conda/bin/pip cache purge \ - && rm -rf ~/.cache/pip + && rm -rf ~/.cache/pip \ + && /opt/conda/bin/conda init \ + && . 
~/.bashrc diff --git a/experiment/Dockerfile-ubuntu b/experiment/Dockerfile-ubuntu index 055f9a62..79945702 100644 --- a/experiment/Dockerfile-ubuntu +++ b/experiment/Dockerfile-ubuntu @@ -87,11 +87,11 @@ ENV CXX=${GCC_HOME}/bin/c++ ############################################################################## -# Install InternLM development environment, including flash-attention and apex +# Install InternEvo development environment, including flash-attention and apex ############################################################################## FROM dep as intrenlm-dev -COPY . /InternLM -WORKDIR /InternLM +COPY . /InternEvo +WORKDIR /InternEvo ARG https_proxy ARG http_proxy ARG PYTORCH_VERSION @@ -115,11 +115,11 @@ RUN /opt/conda/bin/pip --no-cache-dir install \ torch-scatter \ pyecharts \ py-libnuma \ - -f https://data.pyg.org/whl/torch-${PYTORCH_VERSION}+cu117.html \ + -f https://data.pyg.org/whl/torch-${PYTORCH_VERSION}.html \ && /opt/conda/bin/pip --no-cache-dir install \ - --extra-index-url https://download.pytorch.org/whl/cu117 \ - torch==${PYTORCH_VERSION}+cu117 \ - torchvision==${TORCHVISION_VERSION}+cu117 \ + --extra-index-url https://download.pytorch.org/whl/cu118 \ + torch==${PYTORCH_VERSION} \ + torchvision==${TORCHVISION_VERSION} \ torchaudio==${TORCHAUDIO_VERSION} ARG https_proxy @@ -128,7 +128,7 @@ ARG TORCH_CUDA_ARCH_LIST="8.0;8.6+PTX" ARG FLASH_ATTEN_TAG RUN git submodule update --init --recursive \ - && cd /InternLM/third_party/flash-attention \ + && cd /InternEvo/third_party/flash-attention \ && git checkout ${FLASH_ATTEN_TAG} \ && /opt/conda/bin/python setup.py install \ && cd ./csrc \ @@ -138,6 +138,9 @@ RUN git submodule update --init --recursive \ && cd ../layer_norm && /opt/conda/bin/pip install -v . 
\ && cd ../../../../ \ && cd ./third_party/apex \ - && /opt/conda/bin/pip --no-cache-dir install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./ \ + && /opt/conda/bin/pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./ \ + && /opt/conda/bin/pip install pytorch-extension \ && /opt/conda/bin/pip cache purge \ - && rm -rf ~/.cache/pip + && rm -rf ~/.cache/pip \ + && /opt/conda/bin/conda init \ + && . ~/.bashrc diff --git a/experiment/README-CN.md b/experiment/README-CN.md index 7fee559b..de56039b 100644 --- a/experiment/README-CN.md +++ b/experiment/README-CN.md @@ -2,24 +2,18 @@ 本模块用于测试新版本环境,默认测试新环境 torch=2.0.1,flash-attention=2.1.0。新环境可能具有不稳定性,标准环境安装请参考:[安装文档](../doc/install.md) ### 镜像构建及拉取 -构建镜像时请于 InternLM 根目录下执行 docker.Makefile,该文件与标准环境镜像共用,所使用的 Dockerfile 位于 experiment 目录下。也可直接从 https://hub.docker.com/r/internlm/internlm 拉取镜像,命令如下: +构建镜像时请于 InternEvo 根目录下执行 docker.Makefile,该文件与标准环境镜像共用,所使用的 Dockerfile 位于 experiment 目录下。也可直接从 https://hub.docker.com/r/internlm/internevo/tags 拉取镜像,命令如下: ```bash # 构建镜像 # ubuntu20.04 make -f docker.Makefile BASE_OS=ubuntu20.04 DOCKERFILE_PATH=./experiment/Dockerfile-ubuntu PYTORCH_VERSION=2.0.1 TORCHVISION_VERSION=0.15.2 TORCHAUDIO_VERSION=2.0.2 FLASH_ATTEN_VERSION=2.1.0 # centos7 make -f docker.Makefile BASE_OS=centos7 DOCKERFILE_PATH=./experiment/Dockerfile-centos PYTORCH_VERSION=2.0.1 TORCHVISION_VERSION=0.15.2 TORCHAUDIO_VERSION=2.0.2 FLASH_ATTEN_VERSION=2.1.0 - -# 拉取镜像 -# ubuntu20.04 -docker pull internlm/internlm:experiment-torch2.0.1-flashatten2.1.0-ubuntu20.04 -# centos7 -docker pull internlm/internlm:experiment-torch2.0.1-flashatten2.1.0-centos7 ``` ### 容器启动 对于使用 dockerfile 构建或拉取的本地标准镜像,使用如下命令启动并进入容器: ```bash -docker run --gpus all -it -m 500g --cap-add=SYS_PTRACE --cap-add=IPC_LOCK --shm-size 20g --network=host --name myinternlm 
internlm/internlm:experiment-torch2.0.1-flashatten2.1.0-centos7 bash +docker run --gpus all -it -m 500g --cap-add=SYS_PTRACE --cap-add=IPC_LOCK --shm-size 20g --network=host --name myinternlm internlm/internevo:experiment-torch2.0.1-flashatten2.1.0-centos7 bash ``` -容器内默认目录即 `/InternLM`,根据[使用文档](../doc/usage.md)即可启动训练。 +容器内默认目录即 `/InternEvo`,根据[使用文档](../doc/usage.md)即可启动训练。 diff --git a/experiment/README-EN.md b/experiment/README-EN.md index f68efc86..8f4daf6a 100644 --- a/experiment/README-EN.md +++ b/experiment/README-EN.md @@ -2,24 +2,18 @@ This module is used to test the new version environment, the default test new environment is torch=2.0.1, flash-attention=2.1.0. The new environment may be unstable, for the standard environment installation please refer to: [installation guide](../doc/en/install.md) ### Build and Pull Image -When building the image, please make docker.Makefile in the InternLM root directory. This Makefile is shared with the standard environment image, and the Dockerfile used is located in the experiment directory. You can also pull the image directly from https://hub.docker.com/r/internlm/internlm, the command is as follows: +When building the image, please make docker.Makefile in the InternEvo root directory. This Makefile is shared with the standard environment image, and the Dockerfile used is located in the experiment directory.
You can also pull the image directly from https://hub.docker.com/r/internlm/internevo/tags, the command is as follows: ```bash # Build Image # ubuntu20.04 make -f docker.Makefile BASE_OS=ubuntu20.04 DOCKERFILE_PATH=./experiment/Dockerfile-ubuntu PYTORCH_VERSION=2.0.1 TORCHVISION_VERSION=0.15.2 TORCHAUDIO_VERSION=2.0.2 FLASH_ATTEN_VERSION=2.1.0 # centos7 make -f docker.Makefile BASE_OS=centos7 DOCKERFILE_PATH=./experiment/Dockerfile-centos PYTORCH_VERSION=2.0.1 TORCHVISION_VERSION=0.15.2 TORCHAUDIO_VERSION=2.0.2 FLASH_ATTEN_VERSION=2.1.0 - -# Pull Image -# ubuntu20.04 -docker pull internlm/internlm:experiment-torch2.0.1-flashatten2.1.0-ubuntu20.04 -# centos7 -docker pull internlm/internlm:experiment-torch2.0.1-flashatten2.1.0-centos7 ``` ### Run Container For the local standard image built with dockerfile or pulled, use the following command to run and enter the container: ```bash -docker run --gpus all -it -m 500g --cap-add=SYS_PTRACE --cap-add=IPC_LOCK --shm-size 20g --network=host --name myinternlm internlm/internlm:experiment-torch2.0.1-flashatten2.1.0-centos7 bash +docker run --gpus all -it -m 500g --cap-add=SYS_PTRACE --cap-add=IPC_LOCK --shm-size 20g --network=host --name myinternlm internlm/internevo:experiment-torch2.0.1-flashatten2.1.0-centos7 bash ``` -The default directory in the container is `/InternLM`, please start training according to the [Usage](../doc/en/usage.md). +The default directory in the container is `/InternEvo`, please start training according to the [Usage](../doc/en/usage.md).