Has anyone used Tensorflow-DirectML? How does its performance compare with CUDA or ROCm?


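Up front: I happened to run a simple test with PyTorch-DirectML a while ago, which should be of some reference value. It was not a dedicated benchmark, so I did not keep the measurements from that first round. Apologies for that.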

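Background: I built a small new machine around an AMD 5700G. At first I simply wanted to find potential uses for it, which led me to deep learning workloads.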

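Because ROCm supports neither Windows nor integrated GPUs, I tried installing PyTorch-DirectML in WSL just for fun. I allocated 4 GB to the integrated GPU in the BIOS and ran Microsoft's official sample script, resnet50 (PyTorch 1.13) train.py. To my surprise it actually ran, and checking the device confirmed that the GPU was being used. Later I added more RAM and gave the integrated GPU 8 GB, and the test ran nearly twice as fast. Increasing the allocation beyond that brought little further speedup.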
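To check which device a script will actually use, a minimal sketch along these lines may help. It assumes the `torch-directml` package is installed next to PyTorch; the helper name `select_device` and the CUDA-first preference order are my own illustration, not part of Microsoft's sample. A DirectML device prints as `privateuseone:0`, as the logs further down show.

```python
def select_device():
    """Return a torch device, preferring CUDA, then DirectML, then CPU.

    Returns None when PyTorch itself is not installed.
    The preference order is an assumption made for this sketch.
    """
    try:
        import torch
    except ImportError:
        return None

    # CUDA build of PyTorch with a usable NVIDIA GPU.
    if torch.cuda.is_available():
        return torch.device("cuda:0")

    # DirectML backend (works on Windows and in WSL).
    try:
        import torch_directml
    except ImportError:
        return torch.device("cpu")

    if torch_directml.is_available():
        # Shows up as "privateuseone:0" when printed.
        return torch_directml.device()
    return torch.device("cpu")


if __name__ == "__main__":
    print("selected device:", select_device())
```

The model and tensors are then moved with the usual `.to(device)` call.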

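Encouraged by this unexpected result, I ran the same test on a laptop. Its discrete GPU is a 1060 (mobile) with 6 GB of VRAM, and the script failed with an out-of-memory error. Only after I removed the part of the test script that resizes (upscales) the images did it run. I then tested CUDA on Windows and DirectML in WSL. DirectML is still noticeably slower than CUDA.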
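Why dropping the resize step helps with memory can be sketched with some rough arithmetic. The numbers here are assumptions for illustration only: a native image size of 32x32, a resize target of 224x224, and a batch size of 32. None of them are confirmed by the measurements above. The input tensor grows with the pixel count, and convolutional feature maps grow at roughly the same rate.

```python
def input_batch_bytes(batch_size, height, width, channels=3, bytes_per_value=4):
    """Size in bytes of one float32 input batch of shape (N, C, H, W)."""
    return batch_size * channels * height * width * bytes_per_value


BATCH = 32    # hypothetical batch size
SMALL = 32    # assumed native image side length
LARGE = 224   # assumed side length after the resize step

small = input_batch_bytes(BATCH, SMALL, SMALL)
large = input_batch_bytes(BATCH, LARGE, LARGE)
ratio = large / small

print(f"input batch without resize: {small / 2**20:.2f} MiB")
print(f"input batch with resize:    {large / 2**20:.2f} MiB")
print(f"pixel-count ratio:          {ratio:.0f}x")
```

The input batch itself is small either way. The point is the ratio: under these assumptions every activation in the network is about 49 times larger with the resize step, which is what pushes a 6 GB card over its limit.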

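What surprised me most in this simple test is that AMD GPUs can finally do deep learning on Windows (albeit through Microsoft's implementation). The other surprise is that a test the 1060 could not run with 6 GB of VRAM ran to completion on the 5700G's integrated GPU with 4 GB allocated. It is slower, but finishing beats not running at all! This means some code no longer strictly requires buying an NVIDIA card with a lot of VRAM. I have not looked into exactly why this works, so feel free to test it yourselves.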

##### Update #####

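Because ROCm still does not support Windows, while DirectML only supports Windows (including WSL), ROCm and DirectML cannot be compared directly. Today I found another machine with an NVIDIA GPU that finally has enough VRAM to run the sample. I tested CUDA and DirectML on Windows, and CUDA and DirectML in WSL, again using the PyTorch/1.13/resnet50 sample. The results are as follows: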

========== windows11/cuda ==========

(py38-pytorch) microsoft-directml\DirectML> python .\PyTorch\1.13\resnet50_cuda\train.py

Finished moving resnet50 to device: cuda:0 in 0.0s.

Epoch 1

-------------------------------

loss: 2.241352 [ 3200/50000] in 20.780781s

loss: 2.591142 [ 6400/50000] in 19.960362s

loss: 2.236478 [ 9600/50000] in 20.559404s

loss: 1.929659 [12800/50000] in 20.624541s

loss: 1.965856 [16000/50000] in 20.960731s

loss: 1.553656 [19200/50000] in 20.729742s

loss: 2.126348 [22400/50000] in 20.868487s

loss: 1.810576 [25600/50000] in 21.010355s

loss: 1.865492 [28800/50000] in 20.838489s

loss: 1.903866 [32000/50000] in 20.901460s

loss: 1.843706 [35200/50000] in 21.045890s

loss: 1.797519 [38400/50000] in 20.916061s

loss: 2.328907 [41600/50000] in 20.994241s

loss: 1.936519 [44800/50000] in 21.063764s

loss: 1.926670 [48000/50000] in 20.989604s

current highest_accuracy: 0.34470000863075256

Test Error:

Accuracy: 34.5%, Avg loss: 1.784137


Epoch 2

-------------------------------

loss: 1.712744 [ 3200/50000] in 19.781586s

loss: 2.029347 [ 6400/50000] in 19.756735s

loss: 2.078274 [ 9600/50000] in 19.971937s



========== windows11/directml ==========

(py38-pytorch-dml) microsoft-directml\DirectML> conda activate py38-pytorch-dml

Finished moving resnet50 to device: privateuseone:0 in 0.0s.

Epoch 1

-------------------------------

loss: 2.540365 [ 3200/50000] in 45.464760s

loss: 2.216681 [ 6400/50000] in 42.757590s

loss: 1.781944 [ 9600/50000] in 41.206029s

loss: 2.342978 [12800/50000] in 43.580112s

loss: 1.937655 [16000/50000] in 44.562225s

loss: 1.745151 [19200/50000] in 41.561499s

loss: 2.086621 [22400/50000] in 41.219541s

loss: 1.989843 [25600/50000] in 43.179621s

loss: 2.191449 [28800/50000] in 43.027872s

loss: 1.858849 [32000/50000] in 44.147829s

loss: 2.104559 [35200/50000] in 42.561446s

loss: 2.535424 [38400/50000] in 45.391162s

loss: 2.133829 [41600/50000] in 46.196628s

loss: 1.941975 [44800/50000] in 45.158385s

loss: 1.919041 [48000/50000] in 45.482952s

current highest_accuracy: 0.3734999895095825

Test Error:

Accuracy: 37.3%, Avg loss: 1.776877


Epoch 2

-------------------------------

loss: 1.930406 [ 3200/50000] in 32.436144s

loss: 2.082932 [ 6400/50000] in 31.731541s

loss: 1.850514 [ 9600/50000] in 31.988396s



========= WSL2(windows11)/cuda =========

(py38-pytorch) xxx@yyy:~/python/microsoft-directml/DirectML$ python ./PyTorch/1.13/resnet50_cuda/train.py

Finished moving resnet50 to device: cuda:0 in 7.152557373046875e-07s.

Epoch 1

-------------------------------

loss: 2.422694 [ 3200/50000] in 19.412009s

loss: 2.225193 [ 6400/50000] in 18.882695s

loss: 2.421190 [ 9600/50000] in 19.305523s

loss: 2.512413 [12800/50000] in 19.701946s

loss: 2.253301 [16000/50000] in 19.812532s

loss: 2.219946 [19200/50000] in 19.914915s

loss: 2.053805 [22400/50000] in 19.917545s

loss: 2.081501 [25600/50000] in 19.903334s

loss: 2.022675 [28800/50000] in 19.920985s

loss: 2.315096 [32000/50000] in 19.943515s

loss: 2.019898 [35200/50000] in 19.931639s

loss: 2.224422 [38400/50000] in 20.033506s

loss: 1.937939 [41600/50000] in 19.977067s

loss: 2.025831 [44800/50000] in 19.967999s

loss: 1.814435 [48000/50000] in 19.981438s

current highest_accuracy: 0.3939000070095062

Test Error:

Accuracy: 39.4%, Avg loss: 1.687911


Epoch 2

-------------------------------

loss: 1.909725 [ 3200/50000] in 18.882461s

loss: 1.752400 [ 6400/50000] in 18.920463s

loss: 2.160418 [ 9600/50000] in 18.971684s



========= WSL2(windows11)/directml ======

(py38-pytorch-dml) xxx@yyy:~/python/microsoft-directml/DirectML$ python ./PyTorch/1.13/resnet50/train.py

Finished moving resnet50 to device: privateuseone:0 in 1.6689300537109375e-06s.

Epoch 1

-------------------------------

loss: 2.329326 [ 3200/50000] in 52.835301s

loss: 3.163958 [ 6400/50000] in 42.266013s

loss: 2.050293 [ 9600/50000] in 44.762950s

loss: 1.932261 [12800/50000] in 45.753648s

loss: 2.291736 [16000/50000] in 45.594762s

loss: 2.215965 [19200/50000] in 46.115555s

loss: 2.110554 [22400/50000] in 43.269897s

loss: 1.973630 [25600/50000] in 41.979957s

loss: 1.768795 [28800/50000] in 43.764015s

loss: 1.764418 [32000/50000] in 43.815332s

loss: 1.876074 [35200/50000] in 43.976614s

loss: 1.982149 [38400/50000] in 43.792736s

loss: 1.973091 [41600/50000] in 43.983851s

loss: 1.956055 [44800/50000] in 42.120856s

loss: 1.841768 [48000/50000] in 43.683074s

current highest_accuracy: 0.3562999963760376

Test Error:

Accuracy: 35.6%, Avg loss: 1.788380


Epoch 2

-------------------------------

loss: 1.732707 [ 3200/50000] in 31.911953s

loss: 1.812796 [ 6400/50000] in 32.767786s

loss: 1.767053 [ 9600/50000] in 32.925835s
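
The four logs above can be compared with a small script such as the following sketch. It copies only the three Epoch 2 lines of each run from the logs, and it assumes that the time printed on each line covers the 3200 samples processed since the previous line. The names `interval_seconds` and `EPOCH2` are made up for this example.

```python
import re
from statistics import mean

# Matches lines like: "loss: 1.712744 [ 3200/50000] in 19.781586s"
LINE_RE = re.compile(r"loss:\s*[\d.]+\s*\[\s*\d+/\d+\]\s*in\s*([\d.]+)s")

# Assumption: each printed time covers the 3200 samples since the last line.
SAMPLES_PER_INTERVAL = 3200

# Epoch 2 lines copied from the logs above.
EPOCH2 = {
    "windows11/cuda": """
        loss: 1.712744 [ 3200/50000] in 19.781586s
        loss: 2.029347 [ 6400/50000] in 19.756735s
        loss: 2.078274 [ 9600/50000] in 19.971937s
    """,
    "windows11/directml": """
        loss: 1.930406 [ 3200/50000] in 32.436144s
        loss: 2.082932 [ 6400/50000] in 31.731541s
        loss: 1.850514 [ 9600/50000] in 31.988396s
    """,
    "WSL2/cuda": """
        loss: 1.909725 [ 3200/50000] in 18.882461s
        loss: 1.752400 [ 6400/50000] in 18.920463s
        loss: 2.160418 [ 9600/50000] in 18.971684s
    """,
    "WSL2/directml": """
        loss: 1.732707 [ 3200/50000] in 31.911953s
        loss: 1.812796 [ 6400/50000] in 32.767786s
        loss: 1.767053 [ 9600/50000] in 32.925835s
    """,
}


def interval_seconds(log_text):
    """Extract the per-interval times (in seconds) from a training log."""
    return [float(m.group(1)) for m in LINE_RE.finditer(log_text)]


# Mean seconds per logging interval for each configuration.
summary = {name: mean(interval_seconds(text)) for name, text in EPOCH2.items()}

for name, sec in summary.items():
    print(f"{name:20s} {sec:6.2f} s/interval  {SAMPLES_PER_INTERVAL / sec:6.1f} img/s")

win_ratio = summary["windows11/directml"] / summary["windows11/cuda"]
wsl_ratio = summary["WSL2/directml"] / summary["WSL2/cuda"]
print(f"DirectML/CUDA time ratio: Windows {win_ratio:.2f}, WSL2 {wsl_ratio:.2f}")
```

On these Epoch 2 lines, DirectML needs roughly 1.6 times (Windows) to 1.7 times (WSL2) as long per interval as CUDA. The Epoch 1 intervals are slower for DirectML, above 40 s each against about 20 s for CUDA, so the gap is wider there.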

Edited on 2023-02-11 13:49 ・IP location: Beijing