c - CUDA driver API equivalent for cudaSetDevice

Collectives™ on Stack Overflow

Find centralized, trusted content and collaborate around the technologies you use most.
Learn more about Collectives
Teams
Q&A for work
Connect and share knowledge within a single location that is structured and easy to search.
Learn more about Teams
What is the CUDA driver's API equivalent for the runtime API function cudaSetDevice ?
I was looking into the driver API and cannot find an equivalent function. What I can do is
cuDeviceGet(&cuDevice, device_no);
cuCtxCreate(&cuContext, 0, cuDevice);
which is not equivalent since beside setting the device it also creates a context. The runtime API cudaSetDevice does not create a context per se. In the runtime API the CUDA context is created implicitly with the first CUDA call that requires state on the device.
Background for this question: CUDA-aware MPI (MVAPICH2 1.8/9) initialization requires the CUDA device to be set before MPI_init is called. Using the CUDA runtime API this can be done with 
cudaSetDevice(device_no);
MPI_init();
However, I don't want to use the call to the CUDA runtime since the rest of my application is purely using the driver API and I'd like to avoid linking also to the runtime.
What's wrong in creating the context already before MPI is initialized? In principle nothing. Just wondering if there is an equivalent call in the driver API.
                Thinking of it again. I think there is no equivalent call, since in the driver API the device is not set. Instead it's as I posted: One creates a handle to a device and creates a context (implicitly setting the device). With this MVAPICH2 is happy.
– ritter
                Aug 19, 2013 at 19:29
                In other words: Using the driver API one has to create the context when using CUDA-aware MPI.
– ritter
                Aug 19, 2013 at 19:30
                In the runtime API, as of CUDA 4.0, cudaSetDevice does create a context, if one is not already in existence on the device in question.
– talonmies
                Aug 19, 2013 at 20:23
You can find information about this in the Programming Guide Appendix about the Driver API, but the short version is this:
cuCtxCreate acts as the first cudaSetDevice call (that is it creates a context on the driver context stack)
The cuCtxPushCurrent() and cuCtxPopCurrent() pair (or cuCtxSetCurrent depending on which API version you are using) acts as any subsequent cudaSetDevice call (that is it pushes or selects a previously created context to be the active context for all subsequent API calls until the context is popped off the driver context stack or deselected)
                Not to forget cuCtxSetCurrent() which replaces cuCtxPush/PopCurrent since Cuda 4.x (not sure about the exact version). It also comes closer to cudaSetDevice(), except that it takes a context as argument and not the deviceID, and doesn't create a new context on first call.
– kunzmi
                Aug 19, 2013 at 21:03
                You are right. cudaSetDevice creates a context. I checked this by looking at the amount of virtual memory assigned to the process.
– ritter
                Aug 19, 2013 at 22:52
                Why then is it possible to do this: cudaSetDevice(0); cudaMalloc(ptr,16); cudaSetDevice(1) on a 2 GPU machine? (All commands return no error; checked)
– ritter
                Aug 19, 2013 at 22:53
                I remember setting the device on a context which holds state to barf. Has this become more flexible? What happens to the allocated memory? (Using CUDA 5.5 here)
– ritter
                Aug 19, 2013 at 22:54
                A call to cudaSetDevice(deviceId) creates a new context, if for ‘deviceID’ no context exists. Following calls to cudaMalloc or kernel launches happen on the device which was last set using cudaSetDevice; so what happens in your previous comment is, that ptr gets allocated on device 0, following calls would go to device 1. If you would use ptr now in a kernel, this would fail (if no unified virtual addressing is enabled) as it is allocated on another device. But if you call again cudaSetDevice(0), no context is created this time, you have again access to ptr.
– kunzmi
                Aug 20, 2013 at 0:07
Actually, cudaSetDevice() isn't exactly like creating to retrieving a context as though cuCtxCreate() was called. It's very similar, but there is a special context which the CUDA runtime API uses. This context is called the device's primary context. There are specific driver API functions for working with this special context:
CUresult cuDevicePrimaryCtxGetState ( CUdevice dev, unsigned int* flags, int* active );
CUresult cuDevicePrimaryCtxRelease ( CUdevice dev );
CUresult cuDevicePrimaryCtxReset ( CUdevice dev );
CUresult cuDevicePrimaryCtxRetain ( CUcontext* pctx, CUdevice dev );
CUresult cuDevicePrimaryCtxSetFlags ( CUdevice dev, unsigned int  flags );
So, why you want to achieve the equivane of cudaSetDevice(), that would be involve (ignoring error checking) something like:
CUcontext* primary_context;
cuDevicePrimaryCtxRetain(&primary_context, device_id);
cuCtxSetCurrent(primary_context);
Notes:
You should call the Release function at some point to reduce the reference count; but - don't do it without setting another current context.
You can either replace-replace-replace the current context, or push and then finally pop the current context from a stack of context. The replace works on the top of the stack (so it's like pop, then push).
        Thanks for contributing an answer to Stack Overflow!
Please be sure to answer the question. Provide details and share your research!
But avoid …
Asking for help, clarification, or responding to other answers.
Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.