PyTorch 1.8 / 1.9: custom installation with conda
1. Install conda
1.1 Copy the Anaconda installer
The Anaconda installer is located in the shared directory:
/public/software/apps/DeepLearning/whl/Anaconda/Anaconda3-2021.05-Linux-x86_64.sh
Run the following command (replace username with your own user name):
cp -rf /public/software/apps/DeepLearning/whl/Anaconda/Anaconda3-2021.05-Linux-x86_64.sh /public/home/username/
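The copy step can also be written so that the user name does not have to be edited by hand. The following is a minimal sketch, assuming that $HOME resolves to your /public/home/<user name> directory on this cluster; copy_installer is a hypothetical helper name used only for illustration.

```shell
# Path of the installer in the shared directory (taken from the text above).
installer=/public/software/apps/DeepLearning/whl/Anaconda/Anaconda3-2021.05-Linux-x86_64.sh

# copy_installer SRC DEST_DIR: copy SRC into DEST_DIR and fail with a
# message if SRC does not exist. (Hypothetical helper, illustration only.)
copy_installer() {
    if [ ! -f "$1" ]; then
        echo "installer not found: $1" >&2
        return 1
    fi
    cp "$1" "$2/"
}

# Assumption: $HOME is your /public/home/<user name> directory.
copy_installer "$installer" "$HOME" || echo "copy skipped"
```

Checking for the file first gives a clear message when the shared directory is not mounted, instead of a bare cp error.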
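1.2 Create the installation directory and run the installer
Run these commands from the directory that contains the copied installer:
mkdir -p ~/anaconda3/
bash Anaconda3-2021.05-Linux-x86_64.sh -b -f -p "$HOME/anaconda3"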
rm -rf Anaconda3-2021.05-Linux-x86_64.sh
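One detail of the -p argument is worth checking: the shell does not expand ~ inside double quotes, so a quoted prefix such as "~/anaconda3/" reaches the installer as a literal ~ rather than as your home directory. A minimal sketch of the difference (the variable names are illustrative):

```shell
# A tilde inside double quotes stays a literal "~" character.
quoted="~/anaconda3"
# Unquoted, the shell expands it to the home directory.
unquoted=~/anaconda3
# $HOME is expanded inside double quotes, so it is safe to quote.
safe="$HOME/anaconda3"

echo "quoted:   $quoted"
echo "unquoted: $unquoted"
echo "safe:     $safe"
```

Using either an unquoted ~/anaconda3 or a quoted "$HOME/anaconda3" makes the install prefix match the directory created by mkdir.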
1.3 Initialize the conda environment
~/anaconda3/bin/conda init
source ~/.bashrc
2. Install PyTorch (using pytorch-1.9 as the example)
2.1 Create and activate a Python 3.6 environment
conda create -n pytorch-1.9 python=3.6
conda activate pytorch-1.9
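Before installing packages, it can help to confirm that the new environment is really the active one. A minimal sketch, relying on the CONDA_DEFAULT_ENV variable that conda activate sets; in_env is a hypothetical helper name.

```shell
# in_env NAME: succeed only if the active conda environment is NAME.
# (Hypothetical helper; it only inspects CONDA_DEFAULT_ENV.)
in_env() {
    [ "${CONDA_DEFAULT_ENV:-}" = "$1" ]
}

if in_env pytorch-1.9; then
    echo "pytorch-1.9 is active"
else
    echo "pytorch-1.9 is not active; run: conda activate pytorch-1.9"
fi
```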
2.2 Install pytorch-1.9 (for rocm-4.0.1 and later)
The PyTorch 1.8 and PyTorch 1.9 wheel packages are located in the shared directory:
/public/software/apps/DeepLearning/whl/rocm-4.0.1/
Install pytorch_1.9-rocm_4.0.1 (using the Tsinghua PyPI mirror):
pip install /public/software/apps/DeepLearning/whl/rocm-4.0.1/torch-1.9.0+rocm4.0.1-cp36-cp36m-linux_x86_64.whl -i https://pypi.tuna.tsinghua.edu.cn/simple/
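The cp36 part of the wheel file name is its Python tag, which is why the environment is created with python=3.6: pip rejects a wheel whose tag does not match the interpreter. A minimal sketch that reads the tag from a file name, assuming the standard wheel naming scheme (name-version-python-abi-platform.whl); wheel_python_tag is a hypothetical helper name.

```shell
# wheel_python_tag FILE: print the Python tag of a wheel file name.
# The tag is the third field from the end when splitting on "-".
wheel_python_tag() {
    basename "$1" | awk -F- '{ print $(NF-2) }'
}

wheel=/public/software/apps/DeepLearning/whl/rocm-4.0.1/torch-1.9.0+rocm4.0.1-cp36-cp36m-linux_x86_64.whl
echo "python tag: $(wheel_python_tag "$wheel")"
```

Counting from the end keeps the result correct even if a wheel carries an optional build tag after the version.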
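Copy the torchvision package from the shared directory into the site-packages directory of your conda environment (replace username in the destination path with your own user name):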
cp -r /public/software/apps/DeepLearning/whl/rocm-4.0.1/torchvision-0.10-pytorch1.9-rocm-4.0.1-py36/torchvision/ /public/home/username/anaconda3/envs/pytorch-1.9/lib/python3.6/site-packages/
cp -r /public/software/apps/DeepLearning/whl/rocm-4.0.1/torchvision-0.10-pytorch1.9-rocm-4.0.1-py36/torchvision-0.10.0a0+cde7ff0.dist-info/ /public/home/username/anaconda3/envs/pytorch-1.9/lib/python3.6/site-packages/
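The destination of the two copies can also be derived instead of typed out. The sketch below assumes the environment layout shown in the paths above (lib/python3.6/site-packages under the environment root) and uses CONDA_PREFIX, which conda activate sets to that root; site_packages_dir is a hypothetical helper name.

```shell
# site_packages_dir [ENV_ROOT]: print the site-packages directory of a
# Python 3.6 conda environment. Defaults to the active environment.
site_packages_dir() {
    prefix="${1:-${CONDA_PREFIX:-}}"
    if [ -z "$prefix" ]; then
        echo "no environment root given and CONDA_PREFIX is not set" >&2
        return 1
    fi
    printf '%s/lib/python3.6/site-packages\n' "${prefix%/}"
}

# Example with an explicit root; with the environment active, calling
# site_packages_dir without arguments gives the same kind of path.
site_packages_dir "$HOME/anaconda3/envs/pytorch-1.9"
```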
Install the dependencies (the Tsinghua mirror can be used):
pip install numpy pillow -i https://pypi.tuna.tsinghua.edu.cn/simple/
3. Set the MIOpen environment variables in the Slurm script
export MIOPEN_DEBUG_DISABLE_FIND_DB=1
export MIOPEN_DEBUG_CONV_WINOGRAD=0
export MIOPEN_DEBUG_CONV_IMPLICIT_GEMM=0
export HSA_USERPTR_FOR_PAGED_MEM=0
export GLOO_SOCKET_IFNAME=ib0,ib1,ib2,ib3
export MIOPEN_SYSTEM_DB_PATH=/temp/pytorch-miopen-2.8
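For orientation, the exports above might sit in a batch script roughly as follows. This is only a sketch: the job name and node count are placeholder assumptions, and the partition, GPU request and training command depend on the site and are not shown.

```shell
#!/bin/bash
# Placeholder job name and node count (assumptions, adjust for your job).
#SBATCH --job-name=pytorch-miopen
#SBATCH --nodes=1

# MIOpen and communication settings from the list above.
export MIOPEN_DEBUG_DISABLE_FIND_DB=1
export MIOPEN_DEBUG_CONV_WINOGRAD=0
export MIOPEN_DEBUG_CONV_IMPLICIT_GEMM=0
export HSA_USERPTR_FOR_PAGED_MEM=0
export GLOO_SOCKET_IFNAME=ib0,ib1,ib2,ib3
export MIOPEN_SYSTEM_DB_PATH=/temp/pytorch-miopen-2.8

# Log the settings so they appear in the job output.
env | grep -E '^(MIOPEN|HSA|GLOO)_' | sort

# The training command would follow here; it is site-specific and not shown.
```

Printing the variables at the start of the job makes it easy to confirm from the job log that the settings were in effect.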
4. Add the library path to ~/.bashrc (replace username with your own user name)
vi ~/.bashrc
export LD_LIBRARY_PATH=/public/home/username/anaconda3/bin/../lib/:$LD_LIBRARY_PATH
source ~/.bashrc
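Because ~/.bashrc is read again by every new shell and every source, a plain export adds the same directory to LD_LIBRARY_PATH each time. The sketch below is one possible alternative that adds the directory only once; prepend_ld_path is a hypothetical helper name, and $HOME/anaconda3/lib is assumed to be the same directory as the anaconda3/bin/../lib path used above.

```shell
# prepend_ld_path DIR: put DIR at the front of LD_LIBRARY_PATH unless it
# is already listed. (Hypothetical helper, illustration only.)
prepend_ld_path() {
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$1:"*) ;;  # already present, nothing to do
        *) LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
    export LD_LIBRARY_PATH
}

# Assumption: $HOME is your /public/home/<user name> directory.
prepend_ld_path "$HOME/anaconda3/lib"
echo "$LD_LIBRARY_PATH"
```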
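5. If a MIOpen error occurs
In your home directory, run ls -a to locate the hidden directories .cache and .config, then go into each one and delete the MIOpen files.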
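To see what would be deleted before removing anything, the MIOpen entries can be listed first. This is a minimal sketch that assumes the entries have "miopen" somewhere in their name (matched case-insensitively); list_miopen_dirs is a hypothetical helper name.

```shell
# list_miopen_dirs [HOME_DIR]: print MIOpen-related entries directly
# under HOME_DIR/.cache and HOME_DIR/.config. Nothing is deleted.
list_miopen_dirs() {
    home_dir="${1:-$HOME}"
    for d in "$home_dir/.cache" "$home_dir/.config"; do
        if [ -d "$d" ]; then
            find "$d" -maxdepth 1 -iname '*miopen*'
        fi
    done
    return 0
}

# Review this list, then remove the listed entries by hand with rm -r.
list_miopen_dirs
```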