📚 Question

In the multi-GPU parallel documentation, the example DISTRIBUTE_CONFIG_FILE for multi-node inference is given as follows:
{ "llama13B_2A10_PCIE_1_inference_part0": { "name": "llama13B_2A10_PCIE_1_inference_part0", "ip": "11.161.48.116", "port": 10000 }, "llama13B_2A10_PCIE_1_inference_part1": { "name": "llama13B_2A10_PCIE_1_inference_part1", "ip": "11.161.48.116", "port": 20000 } }
The model is split and deployed across two nodes, yet the ip field is identical in both entries of this example, which is misleading. In actual testing, for a multi-node deployment the ip field should be set to each node's own IP address.
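For illustration only, a corrected version of the example might look like the following, where 192.168.0.1 and 192.168.0.2 are hypothetical addresses standing in for the actual IP of each node:

```json
{
  "llama13B_2A10_PCIE_1_inference_part0": {
    "name": "llama13B_2A10_PCIE_1_inference_part0",
    "ip": "192.168.0.1",
    "port": 10000
  },
  "llama13B_2A10_PCIE_1_inference_part1": {
    "name": "llama13B_2A10_PCIE_1_inference_part1",
    "ip": "192.168.0.2",
    "port": 20000
  }
}
```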
P.S. There also seems to be no documentation describing the environment variables that must be set when launching a multi-node deployment.