Installing PyTorch 1.10 with DTK 22.04 at the Kunshan Center

1) Local whl directory (this tutorial uses dtk-22.04.1 as the example)

/public/software/apps/DeepLearning/whl/dtk-22.04.1 
or
/public/software/apps/DeepLearning/whl/dtk-22.04.2

2) Add environment variables

Add the MIOpen variables:

export MIOPEN_SYSTEM_DB_PATH=/temp/pytorch-miopen-2.8 
export MIOPEN_DEBUG_DISABLE_FIND_DB=1
export MIOPEN_DEBUG_CONV_WINOGRAD=0 
export MIOPEN_DEBUG_CONV_IMPLICIT_GEMM=0
export HSA_USERPTR_FOR_PAGED_MEM=0
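
Because these exports only last for the current shell, it can help to confirm they are still set before launching a job. The following is a minimal sketch; `check_miopen_env` is a hypothetical helper name, not something provided by DTK or MIOpen, and it only reports whether the five variables from this step have a value.

```shell
# Hypothetical helper: report which of the variables from this step are set.
# Returns 0 when all five are set, 1 when at least one is missing.
check_miopen_env() {
    missing=0
    for name in MIOPEN_SYSTEM_DB_PATH MIOPEN_DEBUG_DISABLE_FIND_DB \
                MIOPEN_DEBUG_CONV_WINOGRAD MIOPEN_DEBUG_CONV_IMPLICIT_GEMM \
                HSA_USERPTR_FOR_PAGED_MEM; do
        # Indirect lookup; safe here because the names are fixed literals.
        eval "value=\${$name-}"
        if [ -z "$value" ]; then
            echo "unset: $name"
            missing=1
        else
            echo "$name=$value"
        fi
    done
    return "$missing"
}
```

Calling `check_miopen_env` after logging in to a compute node shows whether the exports need to be repeated there.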

3) Create a conda environment (Python 3.7 is used as the example)

conda create -n pytorch_1.10-dtk_22.04 python=3.7

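Add the conda environment's lib directory to the library search path (replace username with your own account name):

export LD_LIBRARY_PATH=/public/home/username/miniconda3/envs/pytorch_1.10-dtk_22.04/lib:$LD_LIBRARY_PATH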
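
Running the export repeatedly keeps growing LD_LIBRARY_PATH with duplicate entries. One way to avoid that is sketched below; `prepend_ld_path` is a hypothetical helper, and the sketch assumes miniconda is installed under `$HOME/miniconda3`, matching the path layout used in this step.

```shell
# Hypothetical helper: prepend a directory to LD_LIBRARY_PATH,
# skipping it when it is already listed.
prepend_ld_path() {
    dir=$1
    case ":${LD_LIBRARY_PATH-}:" in
        *":$dir:"*) ;;  # already present, nothing to do
        *) LD_LIBRARY_PATH="$dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
    export LD_LIBRARY_PATH
}

# Assumption: miniconda lives in $HOME/miniconda3, as in the path above.
prepend_ld_path "$HOME/miniconda3/envs/pytorch_1.10-dtk_22.04/lib"
```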

4) Install PyTorch 1.10 in the conda environment (Python 3.7 with PyTorch 1.10 is used as the example)

conda activate pytorch_1.10-dtk_22.04

pip install /public/software/apps/DeepLearning/whl/dtk-22.04.1/torch-1.10.0a0+git450cdd1.dtk22.4-cp37-cp37m-linux_x86_64.whl
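
The cp37 part of the wheel file name is its Python tag, and pip refuses a wheel whose tag does not match the interpreter in the active environment. The sketch below extracts that tag from a file name so it can be compared with the environment's Python. Both function names are hypothetical, and the comparison assumes `python` on PATH is the conda environment's interpreter.

```shell
# Print the Python tag (e.g. cp37) encoded in a wheel file name.
# Wheel names end in -<python tag>-<abi tag>-<platform tag>.whl.
wheel_python_tag() {
    base=${1##*/}        # strip any directory part
    base=${base%.whl}    # strip the extension
    base=${base%-*}      # drop the platform tag
    base=${base%-*}      # drop the ABI tag
    echo "${base##*-}"   # the last remaining field is the Python tag
}

# Hypothetical helper: succeed when the wheel's tag matches `python` on PATH.
wheel_matches_python() {
    want=$(wheel_python_tag "$1")
    have=$(python -c 'import sys; print("cp%d%d" % sys.version_info[:2])')
    [ "$want" = "$have" ]
}
```

For the wheel used in this step, `wheel_python_tag` prints cp37, which corresponds to the Python 3.7 environment created in step 3.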

5) Install dependencies

pip install numpy -i https://pypi.tuna.tsinghua.edu.cn/simple

6) Verify the installation (check that the DCU can be used)

List the available queues:

whichpartition

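Request a node, then log in to the compute node to run the test:

salloc -p <queue-name> -N 1 --gres=dcu:2

Log in to the node (use the name of the node you were allocated):

ssh <node-name>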

Switch the ROCm compiler version:

module switch compiler/rocm/dtk-22.04.1

7) Create a file named pytorch_env.sh in your home directory and add the environment variables

vi ~/pytorch_env.sh

Add the following line to the file and save it:

export LD_LIBRARY_PATH=/public/software/apps/DeepLearning/PyTorch/lib:/public/software/apps/DeepLearning/PyTorch/lmdb-0.9.24-build/lib:/public/software/apps/DeepLearning/PyTorch/opencv-2.4.13.6-build/lib:/public/software/apps/DeepLearning/PyTorch/openblas-0.3.7-build/lib:$LD_LIBRARY_PATH

Then load the file into the current shell:

source ~/pytorch_env.sh
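
If an editor is inconvenient on the compute node, the same file can be produced with a here-document. This is only a sketch; `write_pytorch_env` is a hypothetical helper that writes the export line from this step to the path given as its argument.

```shell
# Hypothetical helper: write the library-path line to the given file.
# The quoted 'EOF' keeps $LD_LIBRARY_PATH unexpanded until the file is sourced.
write_pytorch_env() {
    cat > "$1" <<'EOF'
export LD_LIBRARY_PATH=/public/software/apps/DeepLearning/PyTorch/lib:/public/software/apps/DeepLearning/PyTorch/lmdb-0.9.24-build/lib:/public/software/apps/DeepLearning/PyTorch/opencv-2.4.13.6-build/lib:/public/software/apps/DeepLearning/PyTorch/openblas-0.3.7-build/lib:$LD_LIBRARY_PATH
EOF
}
```

A typical use would be `write_pytorch_env ~/pytorch_env.sh` followed by `source ~/pytorch_env.sh`.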

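Activate the pytorch_1.10-dtk_22.04 environment (logging in to the compute node drops the previously activated environment, so it has to be activated again):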

conda activate pytorch_1.10-dtk_22.04

Inside the environment, run the following in order; the last call returns True when PyTorch can see a DCU:

python
import torch
torch.cuda.is_available()
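
The same check can be run without an interactive session, which is convenient inside a job script. The sketch below assumes this PyTorch build exposes DCUs through the `torch.cuda` API, as the interactive check implies. `dcu_check` and the `PYTHON` override variable are hypothetical names introduced for illustration.

```shell
# Hypothetical helper: run the availability check non-interactively.
# Exits with status 0 when a device is visible and non-zero otherwise.
# Set PYTHON to use an interpreter other than `python` on PATH.
dcu_check() {
    "${PYTHON:-python}" - <<'EOF'
import sys
import torch

available = torch.cuda.is_available()
print("torch version:", torch.__version__)
print("device available:", available)
if available:
    print("device count:", torch.cuda.device_count())
    # Run a tiny computation on the device to confirm it actually works.
    x = torch.ones(2, 2, device="cuda")
    print("sum on device:", float(x.sum()))
sys.exit(0 if available else 1)
EOF
}
```

In a job script, `dcu_check || exit 1` stops the job early when no device is visible.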
