On AMD CPUs, some programs run noticeably slower than on Intel. Rebuilding NumPy, SciPy, and PyTorch against OpenBLAS can bring a significant performance improvement.
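If you run into a similar problem, first profile your program with a tool such as PyCharm and confirm that the time is really being spent in NumPy, SciPy, or PyTorch. Only then consider recompiling; otherwise you may spend the time and see no speedup at all.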
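If you would rather check this from a script than from PyCharm, the following is a rough sketch using the standard-library `cProfile` and `pstats` modules. The helper name `time_share_by_package` and the substring-matching rule are my own assumptions, not part of any library, and the attribution is approximate: time spent in operators such as `a @ b` is charged to the calling function rather than to NumPy.

```python
import cProfile
import pstats


def time_share_by_package(func, packages=("numpy", "scipy", "torch")):
    """Profile func() and return the fraction of self-time per package.

    Hypothetical helper: a profiled function is attributed to a package
    when the package name appears in its file path or function name.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()

    # stats maps (filename, lineno, funcname) -> (cc, nc, tt, ct, callers),
    # where tt is the time spent in the function itself.
    stats = pstats.Stats(profiler).stats
    total = sum(entry[2] for entry in stats.values())

    shares = {name: 0.0 for name in packages}
    for (filename, _lineno, funcname), entry in stats.items():
        for name in packages:
            if name in filename or name in funcname:
                shares[name] += entry[2]
                break

    if total > 0:
        shares = {name: value / total for name, value in shares.items()}
    return shares
```

Calling `time_share_by_package(my_workload)` on your own entry point gives a quick first impression; if all three shares are close to zero, recompiling is unlikely to help.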
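Creating a virtual environment is optional (real men don't use virtual environments, hahaha). Replace ~~~~~~~ with an environment name of your choice.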
conda create -n ~~~~~~~ python=3.8
conda activate ~~~~~~~
# Install gfortran
sudo apt-get install gfortran
# Adapted from: https://stackoverflow.com/questions/11443302/compiling-numpy-with-openblas-integration
# Download OpenBLAS
git clone https://github.com/xianyi/OpenBLAS
# Enter the directory and build
cd OpenBLAS && make FC=gfortran
# Install, then set up the environment
sudo make PREFIX=/opt/OpenBLAS install
export LD_LIBRARY_PATH=/opt/OpenBLAS/lib:$LD_LIBRARY_PATH
sudo ldconfig
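Before moving on, it can be worth confirming that the installed library actually loads. The sketch below uses `ctypes` to call OpenBLAS's `openblas_get_config()` function; the helper name `openblas_config` is my own, and the default path assumes the `PREFIX=/opt/OpenBLAS` install used above.

```python
import ctypes


def openblas_config(lib_path="/opt/OpenBLAS/lib/libopenblas.so"):
    """Return the OpenBLAS build configuration string, or None on failure.

    Hypothetical helper; lib_path assumes the PREFIX used in this post.
    """
    try:
        lib = ctypes.CDLL(lib_path)
        # openblas_get_config() returns a C string describing the build
        # (version, target CPU, threading model).
        lib.openblas_get_config.restype = ctypes.c_char_p
        return lib.openblas_get_config().decode()
    except (OSError, AttributeError):
        # OSError: the library could not be loaded.
        # AttributeError: the symbol is missing from the library.
        return None
```

If `openblas_config()` returns `None`, check the install step and `LD_LIBRARY_PATH` before building NumPy.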
(If another version of NumPy is already installed, uninstall it cleanly first.)
# Clone NumPy
git clone https://github.com/numpy/numpy.git
cd numpy
Next, create a site.cfg that points to the OpenBLAS location. A template is provided (site.cfg.example):
cp site.cfg.example site.cfg
nano site.cfg
Uncomment the following lines:
....
[openblas]
libraries = openblas
library_dirs = /opt/OpenBLAS/lib
include_dirs = /opt/OpenBLAS/include
....
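As an alternative to editing the template by hand, the same section can be written from a script. This is only a sketch under the assumption that a minimal site.cfg containing just the [openblas] section is enough for your NumPy version; the helper name `write_openblas_site_cfg` is hypothetical.

```python
import configparser


def write_openblas_site_cfg(path="site.cfg", prefix="/opt/OpenBLAS"):
    """Write a minimal site.cfg with only the [openblas] section.

    Hypothetical helper; prefix assumes the install location used above.
    """
    config = configparser.ConfigParser()
    config["openblas"] = {
        "libraries": "openblas",
        "library_dirs": f"{prefix}/lib",
        "include_dirs": f"{prefix}/include",
    }
    with open(path, "w") as handle:
        config.write(handle)
```

Run it from inside the numpy checkout so that site.cfg lands next to setup.py.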
Then run the configuration check (newer versions of NumPy no longer seem to support this check; that does not matter, just skip it):
python setup.py config
If all goes well, you should see openblas rather than MKL:
FOUND:
  libraries = ['openblas', 'openblas']
  library_dirs = ['/opt/OpenBLAS/lib']
  language = c
  define_macros = [('HAVE_CBLAS', None)]
  runtime_library_dirs = ['/opt/OpenBLAS/lib']

FOUND:
  libraries = ['openblas', 'openblas']
  library_dirs = ['/opt/OpenBLAS/lib']
  language = c
  define_macros = [('HAVE_CBLAS', None)]
  runtime_library_dirs = ['/opt/OpenBLAS/lib']
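If MKL shows up instead (as below), either OpenBLAS was not installed correctly or MKL is already installed in the conda environment. In that case, repeat the steps above in a clean conda environment.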
# Failure case: MKL shows up
FOUND:
  libraries = ['mkl_rt', 'pthread']
  library_dirs = ['/home/fuqingxu/anaconda3/lib']
  define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
  include_dirs = ['/usr/local/include', '/usr/include', '/home/fuqingxu/anaconda3/include']
Finally, all clear: you can build and install directly, and it could not be simpler:
pip install . --verbose
Check whether the installation succeeded:
cd ..
python
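If the runtime libraries shown are all openblas, the build succeeded. SciPy is compiled in exactly the same way as NumPy, as long as you clone the right repository, so I won't repeat the steps here.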
>>> import numpy
>>> numpy.__config__.show()
blas_mkl_info:
  NOT AVAILABLE
blis_info:
  NOT AVAILABLE
openblas_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/opt/OpenBLAS/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
    runtime_library_dirs = ['/opt/OpenBLAS/lib']
blas_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/opt/OpenBLAS/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
    runtime_library_dirs = ['/opt/OpenBLAS/lib']
lapack_mkl_info:
  NOT AVAILABLE
openblas_lapack_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/opt/OpenBLAS/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
    runtime_library_dirs = ['/opt/OpenBLAS/lib']
lapack_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/opt/OpenBLAS/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
    runtime_library_dirs = ['/opt/OpenBLAS/lib']
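The config output only shows what NumPy was linked against; to see whether the rebuild actually pays off, time the same BLAS-heavy operation before and after. Below is a minimal sketch, with helper names (`best_of`, `benchmark_matmul`) and a matrix size that are my own choices rather than anything from the original setup.

```python
import time


def best_of(func, repeats=5):
    """Return the shortest wall-clock time (seconds) over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return best


def benchmark_matmul(n=2000, repeats=5):
    """Time an n x n float64 matrix product, which is dispatched to BLAS."""
    import numpy as np  # imported here so best_of stays usable without NumPy

    rng = np.random.default_rng(0)
    a = rng.standard_normal((n, n))
    b = rng.standard_normal((n, n))
    return best_of(lambda: a @ b, repeats)
```

Run `benchmark_matmul()` in the old environment and again in the new one, and compare the two numbers on the same machine.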
3. Compiling PyTorch
Compiling PyTorch is harder. If your network connection is poor, first see whether you can even clone PyTorch, haha:
# Get the PyTorch source
git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
# If you are updating an existing checkout
git submodule sync
git submodule update --init --recursive
If the clone fails, the only option is to set up a proxy:
git config --global http.proxy socks5://127.0.0.1:port????
git config --global https.proxy socks5://127.0.0.1:port????
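Install the remaining dependencies, leaving out MKL. Make sure numpy is not among them: if it is, it will overwrite the NumPy you built earlier, and that build will have been for nothing.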
conda install ninja pyyaml setuptools cmake cffi typing_extensions future six requests dataclasses
conda install -c pytorch magma-cuda110
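Set the environment variables needed for the build. USE_CUDA enables GPU support (of course you want it; why use PyTorch without a GPU?). MAX_JOBS is the number of parallel build jobs and depends on how powerful your CPU is. BLAS=OpenBLAS selects OpenBLAS. USE_MKLDNN is switched off as well, since we use cuDNN.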
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
export USE_OPENMP=1 # not sure what this one does
export BLAS=OpenBLAS
export USE_CUDA=1
export USE_CUDNN=1
export MAX_JOBS=128
export USE_MKLDNN=0
# export USE_NUMPY=1 # I'm not sure about this one; if anyone knows what it does, please leave a comment
export USE_MPI=1 # not sure what this one does
export BUILD_BINARY=1
export LD_LIBRARY_PATH=/opt/OpenBLAS/lib:$LD_LIBRARY_PATH
export OpenBLAS_HOME=/opt/OpenBLAS
export CUDA_HOME=/usr/local/cuda-11.0 # depends on where CUDA is installed and its version
export CUDNN_LIB_DIR=/usr/local/cuda-11.0/lib64 # depends on where CUDA is installed and its version
export CUDNN_INCLUDE_DIR=/usr/local/cuda-11.0/include # depends on where CUDA is installed and its version
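A PyTorch build takes a long time, so a wrong path is cheaper to catch up front. Here is a small sketch that checks whether the directory-valued variables above point at real directories; the helper name `missing_build_paths` and the choice of which variables to check are my own assumptions.

```python
import os

# Variables from the export list above whose values should be directories.
PATH_VARIABLES = ("OpenBLAS_HOME", "CUDA_HOME", "CUDNN_LIB_DIR", "CUDNN_INCLUDE_DIR")


def missing_build_paths(env=None, names=PATH_VARIABLES):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper: env defaults to the current process environment.
    """
    env = os.environ if env is None else env
    problems = []
    for name in names:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not os.path.isdir(value):
            problems.append(f"{name} points to a missing directory: {value}")
    return problems
```

Run it in the same shell session as the exports, for example with `python -c` or from a short script, and fix anything it reports before starting the build.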
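Of course, CUDA and cuDNN also have to be installed. Pay close attention to versions: they change quickly, so you need to work out the matching versions of each component yourself.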
Finally, build and install:
# Execute the build
python setup.py install
If the build fails, clean up the leftovers from the previous attempt before building again:
git clean -xfd && git submodule foreach git clean -xfd
A quick test shows that MKL has been replaced:
cd ..
python
>>> import torch
>>> torch.cuda.is_available()
True
>>> torch.__config__.show()
'PyTorch built with:
- GCC 9.3
- C++ Version: 201402
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 11.0
- NVCC architecture flags: -gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_75,code=sm_75; -gencode;arch=compute_80,code=compute_80
- CuDNN 8.1.1 (built against CUDA 11.2)
- Magma 2.5.2
- Build settings: BLAS_INFO=open, BUILD_TYPE=Release, CUDA_VERSION=11.0, CUDNN_VERSION=8.1.1, CXX_COMPILER=/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, FORCE_FALLBACK_CUDA_MPI=1, LAPACK_INFO=open, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.9.0, USE_CUDA=1, USE_CUDNN=1, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKLDNN=OFF, USE_MPI=1, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=1,
'
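The Build settings line is long and easy to misread, so here is a small sketch that pulls it apart into a dictionary, making the relevant entries (BLAS_INFO, LAPACK_INFO, USE_MKLDNN) easy to check. The helper name `parse_build_settings` is my own, and the parsing assumes the comma-separated KEY=VALUE layout shown in the output above.

```python
def parse_build_settings(config_text):
    """Extract KEY=VALUE pairs from the 'Build settings:' line.

    Hypothetical helper; assumes no value contains a comma, as in the
    output shown in this post.
    """
    settings = {}
    for line in config_text.splitlines():
        _before, marker, rest = line.partition("Build settings:")
        if not marker:
            continue
        for item in rest.split(","):
            # Split on the first '=' only, so flags such as
            # -Werror=return-type stay inside the CXX_FLAGS value.
            key, separator, value = item.strip().partition("=")
            if separator and key:
                settings[key] = value.strip()
    return settings
```

For example, `parse_build_settings(torch.__config__.show())["BLAS_INFO"]` should be `open` for a build like the one above.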
Thanks for reading this far. If this helped, please leave a like; much appreciated!