Has anyone used TensorFlow-DirectML? How does its performance compare with CUDA or ROCm?
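A note up front: a while ago I happened to run a simple test with PyTorch-DirectML, which should be of some reference value. Since it wasn't a dedicated benchmark, though, I didn't keep the test data. Sorry about that.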
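Background: I built a small new machine with an AMD 5700G. At first I just wanted to explore what it could be used for, and deep learning came to mind.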
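Since ROCm supports neither Windows nor integrated graphics, I installed PyTorch-DirectML in WSL just to play around, allocated 4 GB to the iGPU in the BIOS, and ran Microsoft's official sample script, resnet50 (PyTorch 1.13) train.py. To my surprise, it actually ran! Checking the device confirmed that the GPU was being used. Later I added more RAM and allocated 8 GB to the iGPU, and the test ran nearly twice as fast. Increasing the allocation beyond that, however, brought little additional speedup.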
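The benchmark logs in the update below print which device the model was moved to: `cuda:0` for the CUDA script and `privateuseone:0` for DirectML. A minimal sketch of picking the device in your own script, assuming the `torch-directml` package (imported as `torch_directml`) is installed; the helper name `pick_device` and the DirectML-then-CUDA-then-CPU preference order are my own choices, not taken from Microsoft's sample.

```python
def pick_device():
    """Hypothetical helper: prefer DirectML, then CUDA, else CPU."""
    try:
        # Assumption: installed via `pip install torch-directml`
        import torch_directml
        return torch_directml.device()  # a torch.device such as privateuseone:0
    except ImportError:
        pass  # DirectML backend not available; fall through
    try:
        import torch
    except ImportError:
        return "cpu"  # torch itself is missing; return a plain string
    return torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")


if __name__ == "__main__":
    # Usage: model = model.to(pick_device())
    print("Using device:", pick_device())
```

Putting DirectML first mirrors the iGPU setup described above; on an NVIDIA machine you would likely reorder it so CUDA wins.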
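Encouraged by this unexpected result, I also tested on a laptop with a discrete GTX 1060 (mobile) with 6 GB of VRAM, and it actually failed with an out-of-memory error. Only after I removed the part of the test script that upscales (resizes) the images did the script run. I then ran the test with CUDA on Windows and with DirectML in WSL: DirectML was still noticeably slower than CUDA.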
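What surprised me most in this quick test was that AMD GPUs can finally do deep learning on Windows (albeit through Microsoft's implementation). The other surprise was that the test which would not run on the 1060 with 6 GB of VRAM did run on the 5700G's iGPU with just 4 GB allocated. It was slower, but running slowly is far better than not running at all! This means some code no longer requires buying an NVIDIA card with lots of VRAM. I haven't looked into why this happens yet; feel free to test it yourselves.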
##### Update #####
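Since ROCm still doesn't support Windows and DirectML only supports Windows, ROCm and DirectML can't be compared directly. Today I found another machine with an NVIDIA GPU that finally had enough VRAM to run the sample. I tested CUDA and DirectML on Windows, and CUDA and DirectML in WSL, still using the same PyTorch/1.13/resnet50 sample. The results are as follows: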
========== windows11/cuda ==========
(py38-pytorch) microsoft-directml\DirectML> python .\PyTorch\1.13\resnet50_cuda\train.py
Finished moving resnet50 to device: cuda:0 in 0.0s.
Epoch 1
-------------------------------
loss: 2.241352 [ 3200/50000] in 20.780781s
loss: 2.591142 [ 6400/50000] in 19.960362s
loss: 2.236478 [ 9600/50000] in 20.559404s
loss: 1.929659 [12800/50000] in 20.624541s
loss: 1.965856 [16000/50000] in 20.960731s
loss: 1.553656 [19200/50000] in 20.729742s
loss: 2.126348 [22400/50000] in 20.868487s
loss: 1.810576 [25600/50000] in 21.010355s
loss: 1.865492 [28800/50000] in 20.838489s
loss: 1.903866 [32000/50000] in 20.901460s
loss: 1.843706 [35200/50000] in 21.045890s
loss: 1.797519 [38400/50000] in 20.916061s
loss: 2.328907 [41600/50000] in 20.994241s
loss: 1.936519 [44800/50000] in 21.063764s
loss: 1.926670 [48000/50000] in 20.989604s
current highest_accuracy: 0.34470000863075256
Test Error:
Accuracy: 34.5%, Avg loss: 1.784137
Epoch 2
-------------------------------
loss: 1.712744 [ 3200/50000] in 19.781586s
loss: 2.029347 [ 6400/50000] in 19.756735s
loss: 2.078274 [ 9600/50000] in 19.971937s
========== windows11/directml ==========
(py38-pytorch-dml) microsoft-directml\DirectML> conda activate py38-pytorch-dml
Finished moving resnet50 to device: privateuseone:0 in 0.0s.
Epoch 1
-------------------------------
loss: 2.540365 [ 3200/50000] in 45.464760s
loss: 2.216681 [ 6400/50000] in 42.757590s
loss: 1.781944 [ 9600/50000] in 41.206029s
loss: 2.342978 [12800/50000] in 43.580112s
loss: 1.937655 [16000/50000] in 44.562225s
loss: 1.745151 [19200/50000] in 41.561499s
loss: 2.086621 [22400/50000] in 41.219541s
loss: 1.989843 [25600/50000] in 43.179621s
loss: 2.191449 [28800/50000] in 43.027872s
loss: 1.858849 [32000/50000] in 44.147829s
loss: 2.104559 [35200/50000] in 42.561446s
loss: 2.535424 [38400/50000] in 45.391162s
loss: 2.133829 [41600/50000] in 46.196628s
loss: 1.941975 [44800/50000] in 45.158385s
loss: 1.919041 [48000/50000] in 45.482952s
current highest_accuracy: 0.3734999895095825
Test Error:
Accuracy: 37.3%, Avg loss: 1.776877
Epoch 2
-------------------------------
loss: 1.930406 [ 3200/50000] in 32.436144s
loss: 2.082932 [ 6400/50000] in 31.731541s
loss: 1.850514 [ 9600/50000] in 31.988396s
========= WSL2(windows11)/cuda =========
(py38-pytorch) xxx@yyy:~/python/microsoft-directml/DirectML$ python ./PyTorch/1.13/resnet50_cuda/train.py
Finished moving resnet50 to device: cuda:0 in 7.152557373046875e-07s.
Epoch 1
-------------------------------
loss: 2.422694 [ 3200/50000] in 19.412009s
loss: 2.225193 [ 6400/50000] in 18.882695s
loss: 2.421190 [ 9600/50000] in 19.305523s
loss: 2.512413 [12800/50000] in 19.701946s
loss: 2.253301 [16000/50000] in 19.812532s
loss: 2.219946 [19200/50000] in 19.914915s
loss: 2.053805 [22400/50000] in 19.917545s
loss: 2.081501 [25600/50000] in 19.903334s
loss: 2.022675 [28800/50000] in 19.920985s
loss: 2.315096 [32000/50000] in 19.943515s
loss: 2.019898 [35200/50000] in 19.931639s
loss: 2.224422 [38400/50000] in 20.033506s
loss: 1.937939 [41600/50000] in 19.977067s
loss: 2.025831 [44800/50000] in 19.967999s
loss: 1.814435 [48000/50000] in 19.981438s
current highest_accuracy: 0.3939000070095062
Test Error:
Accuracy: 39.4%, Avg loss: 1.687911
Epoch 2
-------------------------------
loss: 1.909725 [ 3200/50000] in 18.882461s
loss: 1.752400 [ 6400/50000] in 18.920463s
loss: 2.160418 [ 9600/50000] in 18.971684s
========= WSL2(windows11)/directml ======
(py38-pytorch-dml) xxx@yyy:~/python/microsoft-directml/DirectML$ python ./PyTorch/1.13/resnet50/train.py
Finished moving resnet50 to device: privateuseone:0 in 1.6689300537109375e-06s.
Epoch 1
-------------------------------
loss: 2.329326 [ 3200/50000] in 52.835301s
loss: 3.163958 [ 6400/50000] in 42.266013s
loss: 2.050293 [ 9600/50000] in 44.762950s
loss: 1.932261 [12800/50000] in 45.753648s
loss: 2.291736 [16000/50000] in 45.594762s
loss: 2.215965 [19200/50000] in 46.115555s
loss: 2.110554 [22400/50000] in 43.269897s
loss: 1.973630 [25600/50000] in 41.979957s
loss: 1.768795 [28800/50000] in 43.764015s
loss: 1.764418 [32000/50000] in 43.815332s
loss: 1.876074 [35200/50000] in 43.976614s
loss: 1.982149 [38400/50000] in 43.792736s
loss: 1.973091 [41600/50000] in 43.983851s
loss: 1.956055 [44800/50000] in 42.120856s
loss: 1.841768 [48000/50000] in 43.683074s
current highest_accuracy: 0.3562999963760376
Test Error:
Accuracy: 35.6%, Avg loss: 1.788380
Epoch 2
-------------------------------
loss: 1.732707 [ 3200/50000] in 31.911953s
loss: 1.812796 [ 6400/50000] in 32.767786s
loss: 1.767053 [ 9600/50000] in 32.925835s
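To put numbers on the comparison, here is a small sketch for summarizing these logs. It assumes the time on each `loss:` line is the time spent on the last 3200 samples (the progress counter advances by 3200 per line); the function names are hypothetical and not part of the DirectML samples.

```python
import re
from statistics import mean

# Matches progress lines such as "loss: 2.241352 [ 3200/50000] in 20.780781s"
LINE_RE = re.compile(r"loss:\s*([\d.]+)\s*\[\s*(\d+)/(\d+)\]\s*in\s*([\d.]+)s")


def interval_times(log_text):
    """Return the elapsed seconds reported on every progress line."""
    return [float(m.group(4)) for m in LINE_RE.finditer(log_text)]


def summarize(log_text, samples_per_interval=3200):
    """Return (mean seconds per interval, images per second)."""
    avg = mean(interval_times(log_text))
    return avg, samples_per_interval / avg


# Epoch 2 lines copied from the Windows 11 CUDA and DirectML runs above
cuda_log = """loss: 1.712744 [ 3200/50000] in 19.781586s
loss: 2.029347 [ 6400/50000] in 19.756735s
loss: 2.078274 [ 9600/50000] in 19.971937s"""
dml_log = """loss: 1.930406 [ 3200/50000] in 32.436144s
loss: 2.082932 [ 6400/50000] in 31.731541s
loss: 1.850514 [ 9600/50000] in 31.988396s"""

cuda_avg, cuda_ips = summarize(cuda_log)
dml_avg, dml_ips = summarize(dml_log)
print(f"CUDA:     {cuda_avg:.2f}s per 3200 images, {cuda_ips:.1f} img/s")
print(f"DirectML: {dml_avg:.2f}s per 3200 images, {dml_ips:.1f} img/s")
print(f"DirectML / CUDA time ratio: {dml_avg / cuda_avg:.2f}x")
```

Paste a whole run's output into `summarize` to average over every interval instead of a few lines. Note that the DirectML runs are visibly faster in epoch 2 than in epoch 1, so averaging per epoch gives a fairer picture.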