Manually specifying the direction allows for some additional error checking by the CUDA runtime. For example, if you specify HostToDevice, the CUDA runtime can check that the destination pointer you have passed is valid for use on the device. If you passed Default and mistakenly used two host pointers, you will simply get a host->host copy of data, with no indication that anything is wrong. – Robert Crovella Apr 2, 2019 at 15:20

Good point. I think it's even better if there is static type checking to prevent mixing of host and device pointers. I have wrappers for device pointers and the memory API. These ensure at compile time that such invalid mixing does not happen. The documentation recommends using cudaMemcpyDefault over manually specifying, but does not explain why. – Yashas Apr 2, 2019 at 15:24

This strikes me as a different question than the one you asked in your question. In your question you asked if there was an advantage to manual specification, which I responded to. I'm not able to reveal undocumented information. If you would like to see an improvement in CUDA, you can file a bug report (in this case against the documentation) at developer.nvidia.com – Robert Crovella Apr 2, 2019 at 15:51
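The error-checking difference in the first comment can be sketched with a small (hypothetical) program: the explicit kind lets the runtime reject a mismatched destination pointer, while cudaMemcpyDefault silently infers a host-to-host copy from the two host addresses. The exact error code returned can vary by platform and toolkit version.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    float host_src[4] = {1.f, 2.f, 3.f, 4.f};
    float host_dst[4];  // oops: this was meant to be a device pointer

    // Explicit direction: the runtime can notice that host_dst is not
    // usable as a device pointer and report an error (typically
    // cudaErrorInvalidValue).
    cudaError_t err = cudaMemcpy(host_dst, host_src, sizeof(host_src),
                                 cudaMemcpyHostToDevice);
    printf("explicit kind: %s\n", cudaGetErrorString(err));

    // cudaMemcpyDefault: both addresses are host addresses, so the
    // runtime infers H2H and quietly performs a host-to-host copy --
    // no indication that anything is wrong.
    err = cudaMemcpy(host_dst, host_src, sizeof(host_src),
                     cudaMemcpyDefault);
    printf("default kind:  %s\n", cudaGetErrorString(err));
    return 0;
}
```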

tl;dr: Almost certainly no advantage.

cudaMemcpyDefault was added, IIRC, when GPUs started becoming capable of easily identifying the memory space of a pointer by inspecting its address ("unified virtual addressing"). Before that, you had to specify the direction. See, for example, the CUDA 3 documentation. Look for cudaMemcpyKind in the API reference - no Default, just H2H, H2D, D2H and D2D.
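That address-inspection capability is exposed directly through the runtime API, which is presumably the same mechanism cudaMemcpyDefault relies on to infer the direction. A minimal sketch, assuming a CUDA 10+ toolkit (where the attribute field is named `type`; older toolkits used `memoryType`):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    float *dev_ptr;
    cudaMalloc(&dev_ptr, 4 * sizeof(float));

    // With UVA, the runtime can classify any pointer by its address alone.
    cudaPointerAttributes attr;
    cudaPointerGetAttributes(&attr, dev_ptr);

    // attr.type is cudaMemoryTypeDevice for a cudaMalloc'd pointer;
    // a plain host pointer would come back as cudaMemoryTypeUnregistered.
    printf("pointer is device memory: %s\n",
           attr.type == cudaMemoryTypeDevice ? "yes" : "no");

    cudaFree(dev_ptr);
    return 0;
}
```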

When this changed, I guess it made sense to nVIDIA not to overload the function or name it differently, but just add a different constant value for the new capability.

I'm not 100% certain there's no difference, it's just very reasonable; and speaking from anecdotal personal experience, I've not seen any advantage/difference. Certainly the copying is not faster.

[...] Passing cudaMemcpyDefault is recommended, in which case the type of transfer is inferred from the pointer values. However, cudaMemcpyDefault is only allowed on systems that support unified virtual addressing. [...]

Therefore, if you have a GPU that allows unified virtual addressing, use cudaMemcpyDefault; otherwise you have no option but to be explicit.

You can query whether your system supports it with cudaGetDeviceProperties(), checking the device property cudaDeviceProp::unifiedAddressing.
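Putting the two pieces together, a minimal sketch of that query and the resulting choice of copy kind might look like this:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, /*device=*/0);

    float host_src[4] = {1.f, 2.f, 3.f, 4.f};
    float *dev_dst;
    cudaMalloc(&dev_dst, sizeof(host_src));

    if (prop.unifiedAddressing) {
        // UVA available: the runtime can infer H2D from the addresses.
        cudaMemcpy(dev_dst, host_src, sizeof(host_src), cudaMemcpyDefault);
    } else {
        // No UVA: the direction must be spelled out.
        cudaMemcpy(dev_dst, host_src, sizeof(host_src),
                   cudaMemcpyHostToDevice);
    }

    cudaFree(dev_dst);
    return 0;
}
```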

@talonmies well, as you know, it is not documented there, like plenty of other things in CUDA that don't seem to be documented. If the docs recommend it, I assume there is some reason why it's better (otherwise I'd argue they are badly written). I do not know it. If you do, please answer the question, that way more of us can learn ;) – Ander Biguri Apr 2, 2019 at 15:16
