RuntimeError: CUDA error: the launch timed out and was terminated
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
[W CUDAGuardImpl.h:113] Warning: CUDA warning: the launch timed out and was terminated (function destroyEvent)
terminate called after throwing an instance of 'c10::CUDAError'
what(): CUDA error: the launch timed out and was terminated
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Exception raised from create_event_internal at ../c10/cuda/CUDACachingAllocator.cpp:1230 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f62be2c17d2 in /home/cheyuxuanll/.local/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x239de (0x7f62f6ea69de in /home/cheyuxuanll/.local/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x22d (0x7f62f6ea857d in /home/cheyuxuanll/.local/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #3: <unknown function> + 0x300568 (0x7f63736d9568 in /home/cheyuxuanll/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #4: c10::TensorImpl::release_resources() + 0x175 (0x7f62be2aa005 in /home/cheyuxuanll/.local/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #5: std::vector<c10d::Reducer::Bucket, std::allocator<c10d::Reducer::Bucket> >::~vector() + 0x2e9 (0x7f62fa8ca5e9 in /home/cheyuxuanll/.local/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #6: c10d::Reducer::~Reducer() + 0x205 (0x7f62fa8bcd25 in /home/cheyuxuanll/.local/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #7: std::_Sp_counted_ptr<c10d::Reducer*, (__gnu_cxx::_Lock_policy)2>::_M_dispose() + 0x12 (0x7f6373bb7212 in /home/cheyuxuanll/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #8: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x46 (0x7f63735c7506 in /home/cheyuxuanll/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #9: <unknown function> + 0x7e182f (0x7f6373bba82f in /home/cheyuxuanll/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #10: <unknown function> + 0x1f5b20 (0x7f63735ceb20 in /home/cheyuxuanll/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #11: <unknown function> + 0x1f6cce (0x7f63735cfcce in /home/cheyuxuanll/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #12: /usr/bin/python3() [0x5d1ec4]
frame #13: /usr/bin/python3() [0x5a958d]
frame #14: /usr/bin/python3() [0x5ed1a0]
frame #15: /usr/bin/python3() [0x544188]
frame #16: /usr/bin/python3() [0x5441da]
frame #17: /usr/bin/python3() [0x5441da]
frame #18: PyDict_SetItemString + 0x538 (0x5ce7c8 in /usr/bin/python3)
frame #19: PyImport_Cleanup + 0x79 (0x685179 in /usr/bin/python3)
frame #20: Py_FinalizeEx + 0x7f (0x68040f in /usr/bin/python3)
frame #21: Py_RunMain + 0x32d (0x6b7a1d in /usr/bin/python3)
frame #22: Py_BytesMain + 0x2d (0x6b7c8d in /usr/bin/python3)
frame #23: __libc_start_main + 0xf3 (0x7f6378be40b3 in /lib/x86_64-linux-gnu/libc.so.6)