CUDA Driver Entry Point Access: An Analysis

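1. Introduction

The Driver Entry Point Access APIs provide a way to retrieve the address of a CUDA driver function. Starting with CUDA 11.3, users can call any available CUDA driver API through function pointers obtained from these APIs.

These APIs offer functionality similar to their counterparts, dlsym on POSIX platforms and GetProcAddress on Windows. They allow users to: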

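Retrieve the address of a driver function using the CUDA Driver API.

Retrieve the address of a driver function using the CUDA Runtime API.

Request the per-thread default stream version of a CUDA driver function. For more details, see section 3.3, Retrieving Per-Thread Default Stream Versions.

Access new CUDA features on an older toolkit with a newer driver.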

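2. Driver Function Typedefs

To help retrieve the CUDA driver API entry points, the CUDA Toolkit provides headers containing the function pointer definitions for all CUDA driver APIs. These headers are installed with the CUDA Toolkit and are available in the toolkit's include/ directory. The table below summarizes the header that contains the typedefs for each CUDA API header file; for example, the typedefs for cuda.h are in cudaTypedefs.h.

These headers do not define actual function pointers; they define the typedefs for function pointers. For example, cudaTypedefs.h has the following typedefs for the driver API cuMemAlloc:

typedef CUresult (CUDAAPI *PFN_cuMemAlloc_v3020)(CUdeviceptr_v2 *dptr, size_t bytesize);
typedef CUresult (CUDAAPI *PFN_cuMemAlloc_v2000)(CUdeviceptr_v1 *dptr, unsigned int bytesize);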

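CUDA driver symbols have a version-based naming scheme with a _v* extension in their name, except for the first version. When the signature or the semantics of a specific CUDA driver API changes, the version number of the corresponding driver symbol is incremented. In the case of the cuMemAlloc driver API, the first driver symbol name is cuMemAlloc and the next symbol name is cuMemAlloc_v2. The typedef for the first version, which was introduced in CUDA 2.0 (2000), is PFN_cuMemAlloc_v2000. The typedef for the next version, which was introduced in CUDA 3.2 (3020), is PFN_cuMemAlloc_v3020.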
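The numeric suffix in each typedef name uses the same encoding as the CUDA_VERSION macro: 1000 x major + 10 x minor. As a small illustration, the sketch below converts between the two forms. The names CudaVersion, decodeCudaVersion, and encodeCudaVersion are made up for this example and are not part of any CUDA header.

```cpp
#include <cassert>

// Hypothetical helper type: a CUDA release split into its two parts.
struct CudaVersion {
    int majorNumber;
    int minorNumber;
};

// Decode a number such as 3020 or 11040, which uses the
// 1000 * major + 10 * minor encoding of CUDA_VERSION.
inline CudaVersion decodeCudaVersion(int encoded) {
    assert(encoded >= 0);
    return CudaVersion{encoded / 1000, (encoded % 1000) / 10};
}

// Inverse of decodeCudaVersion.
inline int encodeCudaVersion(int majorNumber, int minorNumber) {
    assert(majorNumber >= 0 && minorNumber >= 0 && minorNumber < 100);
    return majorNumber * 1000 + minorNumber * 10;
}
```

For instance, decodeCudaVersion(3020) gives major 3 and minor 2, which is why PFN_cuMemAlloc_v3020 refers to the version introduced in CUDA 3.2.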

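The typedefs can be used to more easily define a function pointer of the appropriate type in code:

PFN_cuMemAlloc_v3020 pfn_cuMemAlloc_v2;
PFN_cuMemAlloc_v2000 pfn_cuMemAlloc_v1;

This approach is preferable when a specific version of the API is needed. In addition, the headers predefine macros for the latest version of every driver symbol that was available when the installed CUDA Toolkit was released; these typedefs have no _v* suffix. For the CUDA 11.3 Toolkit, cuMemAlloc_v2 is the latest version, so its function pointer can also be defined as follows:

PFN_cuMemAlloc pfn_cuMemAlloc;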
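To make the difference between a typedef and an actual function pointer concrete, here is a minimal sketch that compiles without the CUDA Toolkit. Everything in it (Status, DemoPtr, the PFN_demoAlloc names, demoAlloc_v2, allocThroughPointer) is a hypothetical stand-in that only mimics the pattern used in cudaTypedefs.h; none of it is a real CUDA name.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for CUresult and CUdeviceptr.
typedef int Status;
typedef unsigned long long DemoPtr;

// Versioned typedef, in the style of PFN_cuMemAlloc_v3020.
typedef Status (*PFN_demoAlloc_v3020)(DemoPtr *dptr, std::size_t bytesize);

// Unversioned macro naming the latest version, in the style of PFN_cuMemAlloc.
#define PFN_demoAlloc PFN_demoAlloc_v3020

// Mock "driver" function whose signature matches the typedef.
inline Status demoAlloc_v2(DemoPtr *dptr, std::size_t bytesize) {
    assert(dptr != nullptr);
    *dptr = 0x1000 + bytesize;  // fake address, for illustration only
    return 0;
}

// The typedef only names a type; the pointer itself still has to be
// declared and assigned before it can be called.
inline DemoPtr allocThroughPointer(std::size_t bytesize) {
    PFN_demoAlloc pfn = demoAlloc_v2;
    DemoPtr p = 0;
    Status s = pfn(&p, bytesize);
    assert(s == 0);
    (void)s;  // silence unused-variable warnings when NDEBUG is defined
    return p;
}
```

In real code the assignment to pfn would come from cuGetProcAddress or cudaGetDriverEntryPoint rather than from a function defined in the same file.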

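3. Driver Function Retrieval

Using the Driver Entry Point Access APIs and the appropriate typedef, a function pointer to any CUDA driver API can be obtained.

3.1. Using the Driver API

The driver API requires the CUDA version as an argument in order to return the ABI-compatible version of the requested driver symbol. CUDA driver APIs have a per-function ABI denoted by the _v* extension. For example, consider the following versions of cuStreamBeginCapture in cuda.h and their corresponding typedefs in cudaTypedefs.h:

// cuda.h
CUresult CUDAAPI cuStreamBeginCapture(CUstream hStream);
CUresult CUDAAPI cuStreamBeginCapture_v2(CUstream hStream, CUstreamCaptureMode mode);

// cudaTypedefs.h
typedef CUresult (CUDAAPI *PFN_cuStreamBeginCapture_v10000)(CUstream hStream);
typedef CUresult (CUDAAPI *PFN_cuStreamBeginCapture_v10010)(CUstream hStream, CUstreamCaptureMode mode);

In the typedefs of the code snippet above, the version suffixes _v10000 and _v10010 indicate that these APIs were introduced in CUDA 10.0 and CUDA 10.1 respectively.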

#include <cudaTypedefs.h>

// Declare the entry points for cuStreamBeginCapture
PFN_cuStreamBeginCapture_v10000 pfn_cuStreamBeginCapture_v1;
PFN_cuStreamBeginCapture_v10010 pfn_cuStreamBeginCapture_v2;

// Receives the per-symbol query status
CUdriverProcAddressQueryResult driverStatus;

// Get the function pointer to the cuStreamBeginCapture driver symbol
cuGetProcAddress("cuStreamBeginCapture", (void **)&pfn_cuStreamBeginCapture_v1, 10000, CU_GET_PROC_ADDRESS_DEFAULT, &driverStatus);

// Get the function pointer to the cuStreamBeginCapture_v2 driver symbol
cuGetProcAddress("cuStreamBeginCapture", (void **)&pfn_cuStreamBeginCapture_v2, 10010, CU_GET_PROC_ADDRESS_DEFAULT, &driverStatus);

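Referring to the code snippet above, to retrieve the address of the _v1 version of the driver API cuStreamBeginCapture, the CUDA version argument should be exactly 10.0 (10000). Likewise, the CUDA version for retrieving the address of the _v2 version should be 10.1 (10010). Specifying a higher CUDA version to retrieve a specific version of a driver API might not always be portable. For example, using 11030 here would still return the _v2 symbol, but if a hypothetical _v3 version were released in CUDA 11.3, cuGetProcAddress would start returning the newer _v3 symbol when paired with a CUDA 11.3 driver. Since the ABI and function signatures of the _v2 and _v3 symbols might differ, calling the _v3 function through the _v10010 typedef intended for the _v2 symbol would exhibit undefined behavior.

To retrieve the latest version of a driver API for a given CUDA Toolkit, CUDA_VERSION can also be specified as the version argument, with the unversioned typedef used to define the function pointer. Since _v2 is the latest version of the driver API cuStreamBeginCapture in CUDA 11.3, the code snippet below shows a different way of retrieving it.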

// Assuming we are using CUDA 11.3 Toolkit
#include <cudaTypedefs.h>

// Declare the entry point
PFN_cuStreamBeginCapture pfn_cuStreamBeginCapture_latest;
CUdriverProcAddressQueryResult driverStatus;

// Initialize the entry point. Specifying CUDA_VERSION will give the function pointer to the
// cuStreamBeginCapture_v2 symbol since it is the latest version on CUDA 11.3.
cuGetProcAddress("cuStreamBeginCapture", (void **)&pfn_cuStreamBeginCapture_latest, CUDA_VERSION, CU_GET_PROC_ADDRESS_DEFAULT, &driverStatus);

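Note that requesting a driver API with an invalid CUDA version returns the error CUDA_ERROR_NOT_FOUND. In the code examples above, passing a version lower than 10000 (CUDA 10.0) would be invalid.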
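The version-selection rule described in this section can be modeled in a few lines. The sketch below is not the driver's implementation; it is a toy lookup with hypothetical names (SymbolVersion, streamBeginCaptureVersions, resolveVersion) that assumes the driver returns the newest version of a symbol introduced at or before the requested CUDA version.

```cpp
#include <cassert>
#include <vector>

// Hypothetical record: one ABI version of a driver symbol and the CUDA
// release (encoded like CUDA_VERSION) that introduced it.
struct SymbolVersion {
    int abiVersion;    // 1 for the unsuffixed symbol, 2 for _v2, ...
    int introducedIn;  // e.g. 10000 for CUDA 10.0
};

// Versions of cuStreamBeginCapture as listed in this section.
inline std::vector<SymbolVersion> streamBeginCaptureVersions() {
    return {{1, 10000}, {2, 10010}};
}

// Return the ABI version selected for requestedVersion, or 0 if the request
// predates every known version (the case reported as CUDA_ERROR_NOT_FOUND).
inline int resolveVersion(const std::vector<SymbolVersion> &versions,
                          int requestedVersion) {
    assert(requestedVersion >= 0);
    int best = 0;
    int bestIntroduced = -1;
    for (const SymbolVersion &v : versions) {
        if (v.introducedIn <= requestedVersion && v.introducedIn > bestIntroduced) {
            best = v.abiVersion;
            bestIntroduced = v.introducedIn;
        }
    }
    return best;
}
```

Under this model a request for 11030 resolves to _v2 today, but would resolve to a hypothetical _v3 as soon as an entry {3, 11030} is added to the table, which is the portability concern raised above.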

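3.2. Using the Runtime API

The runtime API uses the CUDA runtime version to get the ABI-compatible version of the requested driver symbol. In the code snippet below, the minimum CUDA runtime version required is CUDA 11.2, because cuMemAllocAsync was introduced in that release.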

#include <cudaTypedefs.h>

// Declare the entry point and the variable that receives the query status
PFN_cuMemAllocAsync pfn_cuMemAllocAsync;
enum cudaDriverEntryPointQueryResult driverStatus;

// Initialize the entry point. Assuming CUDA runtime version >= 11.2
cudaGetDriverEntryPoint("cuMemAllocAsync", (void **)&pfn_cuMemAllocAsync, cudaEnableDefault, &driverStatus);

// Call the entry point
if (driverStatus == cudaDriverEntryPointSuccess && pfn_cuMemAllocAsync) {
    pfn_cuMemAllocAsync(...);
}

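3.3. Retrieving Per-Thread Default Stream Versions

Some CUDA driver APIs can be configured to have default stream or per-thread default stream semantics. Driver APIs with per-thread default stream semantics have the suffix _ptsz or _ptds in their name. For example, cuLaunchKernel has a per-thread default stream variant named cuLaunchKernel_ptsz. With the Driver Entry Point Access APIs, users can request the per-thread default stream version of the driver API cuLaunchKernel instead of the default stream version. Configuring a CUDA driver API for default stream or per-thread default stream semantics affects its synchronization behavior. More details can be found in the CUDA documentation on stream synchronization behavior.

The default stream or per-thread default stream version of a driver API can be obtained in one of the following ways: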

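Use the compilation flag --default-stream per-thread, or define the macro CUDA_API_PER_THREAD_DEFAULT_STREAM, to get per-thread default stream behavior.

Force default stream or per-thread default stream behavior using the flags CU_GET_PROC_ADDRESS_LEGACY_STREAM/cudaEnableLegacyStream or CU_GET_PROC_ADDRESS_PER_THREAD_DEFAULT_STREAM/cudaEnablePerThreadDefaultStream, respectively.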
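The interaction between the build setting and the per-call flags can be summarized as a small decision function. This is only a sketch of the selection rule as described above, assuming the default flag follows the build setting; the enum StreamFlag and the function selectVariant are hypothetical and merely stand in for the real flags CU_GET_PROC_ADDRESS_DEFAULT, CU_GET_PROC_ADDRESS_LEGACY_STREAM, and CU_GET_PROC_ADDRESS_PER_THREAD_DEFAULT_STREAM.

```cpp
#include <cassert>
#include <string>

// Hypothetical mirror of the three stream-related lookup flags.
enum class StreamFlag { Default, LegacyStream, PerThreadDefaultStream };

// Pick the symbol variant to look up.
//   base           : symbol name, e.g. "cuLaunchKernel"
//   suffix         : per-thread suffix used by this API, "_ptsz" or "_ptds"
//   flag           : flag passed by the caller
//   builtPerThread : true if built with --default-stream per-thread or with
//                    CUDA_API_PER_THREAD_DEFAULT_STREAM defined
inline std::string selectVariant(const std::string &base,
                                 const std::string &suffix,
                                 StreamFlag flag,
                                 bool builtPerThread) {
    assert(suffix == "_ptsz" || suffix == "_ptds");
    bool perThread = builtPerThread;  // Default follows the build setting
    if (flag == StreamFlag::LegacyStream) perThread = false;           // forced
    if (flag == StreamFlag::PerThreadDefaultStream) perThread = true;  // forced
    return perThread ? base + suffix : base;
}
```

With this model, selectVariant("cuLaunchKernel", "_ptsz", StreamFlag::PerThreadDefaultStream, false) yields "cuLaunchKernel_ptsz" even though the build did not enable per-thread default streams.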

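3.4. Access New CUDA Features

Installing the latest CUDA Toolkit is always recommended for access to new CUDA driver features. If for some reason a user does not want to update, or does not have access to the latest toolkit, the API can be used to access new CUDA features with only an updated CUDA driver. For discussion, assume the user is on CUDA 11.3 and wants to use a new driver API, cuFoo, available in the CUDA 12.0 driver. The code snippet below illustrates this use case: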

#include <assert.h>
#include <stdio.h>
#include <cuda.h>

int main()
{
    // Assuming we have CUDA 12.0 driver installed.

    // Manually define the prototype as cudaTypedefs.h in CUDA 11.3 does not have the cuFoo typedef
    typedef CUresult (CUDAAPI *PFN_cuFoo)(...);
    PFN_cuFoo pfn_cuFoo = NULL;
    CUdriverProcAddressQueryResult driverStatus;

    // Get the address for cuFoo API using cuGetProcAddress. Specify CUDA version as
    // 12000 since cuFoo was introduced then or get the driver version dynamically
    // using cuDriverGetVersion
    int driverVersion;
    cuDriverGetVersion(&driverVersion);
    CUresult status = cuGetProcAddress("cuFoo", (void **)&pfn_cuFoo, driverVersion, CU_GET_PROC_ADDRESS_DEFAULT, &driverStatus);

    if (status == CUDA_SUCCESS && pfn_cuFoo) {
        pfn_cuFoo(...);
    }
    else {
        printf("Cannot retrieve the address to cuFoo - driverStatus = %d. Check if the latest driver for CUDA 12.0 is installed.\n", driverStatus);
        assert(0);
    }

    // rest of code here
}

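4. Potential Implications with cuGetProcAddress

Below is a set of concrete and theoretical examples of potential issues with cuGetProcAddress and cudaGetDriverEntryPoint.

4.1. Implications with cuGetProcAddress vs Implicit Linking

cuDeviceGetUuid was introduced in CUDA 9.2. A newer revision of this API (cuDeviceGetUuid_v2) was introduced in CUDA 11.4. To preserve minor version compatibility, cuDeviceGetUuid will not be version bumped to cuDeviceGetUuid_v2 in cuda.h until CUDA 12.0. This means that calling it through a function pointer obtained via cuGetProcAddress might behave differently from calling it directly. Example using the API directly: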

#include <cuda.h>

CUuuid uuid;
CUdevice dev;
CUresult status;

status = cuDeviceGet(&dev, 0); // Get device 0
// handle status
status = cuDeviceGetUuid(&uuid, dev); // Get uuid of device 0

In this example, assume the user is compiling with CUDA 11.4. Note that this executes the behavior of cuDeviceGetUuid, not the _v2 version. Now an example using cuGetProcAddress:

#include <cudaTypedefs.h>

CUuuid uuid;
CUdevice dev;
CUresult status;
CUdriverProcAddressQueryResult driverStatus;

status = cuDeviceGet(&dev, 0); // Get device 0
// handle status

PFN_cuDeviceGetUuid pfn_cuDeviceGetUuid;
status = cuGetProcAddress("cuDeviceGetUuid", (void **)&pfn_cuDeviceGetUuid, CUDA_VERSION, CU_GET_PROC_ADDRESS_DEFAULT, &driverStatus);
if (CUDA_SUCCESS == status && pfn_cuDeviceGetUuid) {
    // pfn_cuDeviceGetUuid points to ???
}

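In this example, assume the user is compiling with CUDA 11.4. This obtains the function pointer of cuDeviceGetUuid_v2. Calling the function pointer then invokes the new _v2 function, not the same cuDeviceGetUuid as shown in the previous example.

4.2. Compile Time vs Runtime Version Usage in cuGetProcAddress

Let's take the same issue and make one small tweak. The last example used the compile-time constant CUDA_VERSION to determine which function pointer to obtain. More complications arise if the user queries the driver version dynamically with cuDriverGetVersion or cudaDriverGetVersion and passes the result to cuGetProcAddress. Example: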

#include <cudaTypedefs.h>

CUuuid uuid;
CUdevice dev;
CUresult status;
int cudaVersion;
CUdriverProcAddressQueryResult driverStatus;

status = cuDeviceGet(&dev, 0); // Get device 0
// handle status

status = cuDriverGetVersion(&cudaVersion);
// handle status

PFN_cuDeviceGetUuid pfn_cuDeviceGetUuid;
status = cuGetProcAddress("cuDeviceGetUuid", (void **)&pfn_cuDeviceGetUuid, cudaVersion, CU_GET_PROC_ADDRESS_DEFAULT, &driverStatus);
if (CUDA_SUCCESS == status && pfn_cuDeviceGetUuid) {
    // pfn_cuDeviceGetUuid points to ???
}

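In this example, assume the user is compiling with CUDA 11.3. The user would debug, test, and deploy this application with the known behavior of getting cuDeviceGetUuid (not the _v2 version). Since CUDA guarantees ABI compatibility between minor versions, the same application is expected to run after the driver is upgraded to CUDA 11.4 (without updating the toolkit and runtime), with no recompilation required. This will have undefined behavior though, because the typedef PFN_cuDeviceGetUuid will still be a signature for the original version, but since cudaVersion would now be 11040 (CUDA 11.4), cuGetProcAddress would return the function pointer to the _v2 version.

Note that in this case the original (not the _v2 version) typedef looks like:

typedef CUresult (CUDAAPI *PFN_cuDeviceGetUuid_v9020)(CUuuid *uuid, CUdevice_v1 dev);

But the _v2 version typedef looks like:

typedef CUresult (CUDAAPI *PFN_cuDeviceGetUuid_v11040)(CUuuid *uuid, CUdevice_v1 dev);

So in this case the API/ABI is the same, and the runtime API call will likely not cause issues beyond potentially returning an unexpected uuid. Section 4.6, Implications to API/ABI, discusses a more problematic case of API/ABI compatibility.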

4.3. API Version Bumps with Explicit Version Checks

The previous example was a concrete one. Now consider a theoretical example that still has compatibility issues across driver versions:

CUresult cuFoo(int bar); // Introduced in CUDA 11.4
CUresult cuFoo_v2(int bar); // Introduced in CUDA 11.5
CUresult cuFoo_v3(int bar, void* jazz); // Introduced in CUDA 11.6

typedef CUresult (CUDAAPI *PFN_cuFoo_v11040)(int bar);
typedef CUresult (CUDAAPI *PFN_cuFoo_v11050)(int bar);
typedef CUresult (CUDAAPI *PFN_cuFoo_v11060)(int bar, void* jazz);

Note that the API has been modified twice since its original creation in CUDA 11.4, and the latest version in CUDA 11.6 also modified the API/ABI interface of the function. The usage in user code compiled against CUDA 11.5 is:

#include <cuda.h>
#include <cudaTypedefs.h>

CUresult status;
int cudaVersion;
CUdriverProcAddressQueryResult driverStatus;

status = cuDriverGetVersion(&cudaVersion);
// handle status

PFN_cuFoo_v11040 pfn_cuFoo_v11040;
PFN_cuFoo_v11050 pfn_cuFoo_v11050;
if (cudaVersion < 11050) {
    // We know to get the CUDA 11.4 version
    status = cuGetProcAddress("cuFoo", (void **)&pfn_cuFoo_v11040, cudaVersion, CU_GET_PROC_ADDRESS_DEFAULT, &driverStatus);
    // Handle status and validate pfn_cuFoo_v11040
}
else {
    // Assume >= CUDA 11.5 version we can use the second version
    status = cuGetProcAddress("cuFoo", (void **)&pfn_cuFoo_v11050, cudaVersion, CU_GET_PROC_ADDRESS_DEFAULT, &driverStatus);
    // Handle status and validate pfn_cuFoo_v11050
}

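In this example, unless the application is updated for the new typedef in CUDA 11.6 and recompiled with the new typedefs and case handling, it will get the cuFoo_v3 function pointer returned when run against a CUDA 11.6 driver, and any use of that function would then cause undefined behavior. The point of this example is to illustrate that even explicit version checks for cuGetProcAddress may not safely cover minor version bumps within a CUDA major release.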
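One way to see the hazard is to model it. The toy resolver below (all names hypothetical: resolveFoo, typedefVersionFor) assumes, as in section 3.1, that a lookup returns the newest cuFoo version introduced at or before the requested CUDA version. It contrasts passing the queried driver version with passing the exact version that matches the typedef the application was compiled with.

```cpp
#include <cassert>

// Toy model of the three cuFoo versions from the example above.
// Returns 1 for cuFoo, 2 for cuFoo_v2, 3 for cuFoo_v3, 0 if unavailable.
inline int resolveFoo(int requestedVersion, int driverVersion) {
    assert(requestedVersion > 0 && driverVersion > 0);
    // A driver cannot return a version newer than itself.
    int limit = requestedVersion < driverVersion ? requestedVersion : driverVersion;
    if (limit >= 11060) return 3;
    if (limit >= 11050) return 2;
    if (limit >= 11040) return 1;
    return 0;
}

// Version of the typedef that code compiled against CUDA 11.5 would use:
// PFN_cuFoo_v11040 for older drivers, PFN_cuFoo_v11050 otherwise.
inline int typedefVersionFor(int driverVersion) {
    return driverVersion < 11050 ? 11040 : 11050;
}
```

With a CUDA 11.6 driver, resolveFoo(11060, 11060) yields 3, which no typedef in the 11.5 build matches, whereas resolveFoo(typedefVersionFor(11060), 11060) yields 2 and stays consistent with PFN_cuFoo_v11050.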

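4.4. Issues with Runtime API Usage

The examples above focused on issues with using the driver API to obtain function pointers to driver APIs. Now we discuss the potential issues with using the runtime API cudaGetDriverEntryPoint.

We start by using the runtime API in a way similar to the above.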

#include <cuda.h>
#include <cudaTypedefs.h>
#include <cuda_runtime.h>

CUresult status;
cudaError_t error;
int driverVersion, runtimeVersion;
enum cudaDriverEntryPointQueryResult runtimeStatus;

// Ask the runtime for the function
PFN_cuDeviceGetUuid pfn_cuDeviceGetUuidRuntime;
error = cudaGetDriverEntryPoint("cuDeviceGetUuid", (void **)&pfn_cuDeviceGetUuidRuntime, cudaEnableDefault, &runtimeStatus);
if (cudaSuccess == error && pfn_cuDeviceGetUuidRuntime) {
    // pfn_cuDeviceGetUuidRuntime points to ???
}

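The function pointer in this example is even more complicated than in the driver-only examples above, because there is no control over which version of the function is obtained; it will always get the API for the current CUDA runtime version. See the following table for more information:

                             Static Runtime Version Linkage
Installed Driver Version     V11.3        V11.4
V11.3                        v1           v1x
V11.4                        v1           v2

V11.3 => 11.3 CUDA Runtime and Toolkit (includes header files cuda.h and cudaTypedefs.h)
V11.4 => 11.4 CUDA Runtime and Toolkit (includes header files cuda.h and cudaTypedefs.h)
v1 => cuDeviceGetUuid
v2 => cuDeviceGetUuid_v2
x => Implies the typedef function pointer won't match the returned
     function pointer. In these cases, the typedef at compile time
     using a CUDA 11.4 runtime would match the _v2 version, but the
     returned function pointer would be the original (non _v2) function.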

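The problem in the table arises with a newer CUDA 11.4 runtime and toolkit combined with an older driver (CUDA 11.3), labeled v1x above. With this combination the driver returns the pointer to the older function (non _v2), but the typedef used in the application is for the new function pointer.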
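The combinations discussed in this section can also be enumerated mechanically. The following sketch is a simplified model, not NVIDIA code: it assumes the typedef follows the runtime/toolkit version, while the function actually returned is limited by the older of the runtime and the installed driver. The names uuidVersionAt and entryPointCell are hypothetical.

```cpp
#include <cassert>
#include <string>

// 1 => cuDeviceGetUuid, 2 => cuDeviceGetUuid_v2 (added in CUDA 11.4).
inline int uuidVersionAt(int cudaVersion) {
    return cudaVersion >= 11040 ? 2 : 1;
}

// Label for one combination of installed driver and statically linked
// runtime, using the notation of this section: "v1", "v2", or "v1x" when
// the compile-time typedef does not match the returned function.
inline std::string entryPointCell(int driverVersion, int runtimeVersion) {
    assert(driverVersion > 0 && runtimeVersion > 0);
    int typedefVersion = uuidVersionAt(runtimeVersion);  // fixed at compile time
    int older = driverVersion < runtimeVersion ? driverVersion : runtimeVersion;
    int returnedVersion = uuidVersionAt(older);          // what the lookup yields
    std::string cell = "v" + std::to_string(returnedVersion);
    if (returnedVersion != typedefVersion) cell += "x";
    return cell;
}
```

Only the pairing of a CUDA 11.3 driver (11030) with a CUDA 11.4 runtime (11040) produces the mismatch label in this model; the other three pairings agree with their typedef.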

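4.5. Issues with Runtime API and Dynamic Versioning

More complications arise when considering different combinations of the CUDA version with which an application is compiled, the CUDA runtime version, and the CUDA driver version that the application dynamically links against.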

#include <cuda.h>
#include <cudaTypedefs.h>
#include <cuda_runtime.h>

CUresult status;
cudaError_t error;
int driverVersion, runtimeVersion;
CUdriverProcAddressQueryResult driverStatus;
enum cudaDriverEntryPointQueryResult runtimeStatus;

PFN_cuDeviceGetUuid pfn_cuDeviceGetUuidDriver;
status = cuGetProcAddress("cuDeviceGetUuid", (void **)&pfn_cuDeviceGetUuidDriver, CUDA_VERSION, CU_GET_PROC_ADDRESS_DEFAULT, &driverStatus);
if (CUDA_SUCCESS == status && pfn_cuDeviceGetUuidDriver) {
    // pfn_cuDeviceGetUuidDriver points to ???
}

// Ask the runtime for the function
PFN_cuDeviceGetUuid pfn_cuDeviceGetUuidRuntime;
error = cudaGetDriverEntryPoint("cuDeviceGetUuid", (void **)&pfn_cuDeviceGetUuidRuntime, cudaEnableDefault, &runtimeStatus);
if (cudaSuccess == error && pfn_cuDeviceGetUuidRuntime) {
    // pfn_cuDeviceGetUuidRuntime points to ???
}

// Ask the driver for the function based on the driver version (obtained via runtime)
error = cudaDriverGetVersion(&driverVersion);
PFN_cuDeviceGetUuid pfn_cuDeviceGetUuidDriverDriverVer;
status = cuGetProcAddress("cuDeviceGetUuid", (void **)&pfn_cuDeviceGetUuidDriverDriverVer, driverVersion, CU_GET_PROC_ADDRESS_DEFAULT, &driverStatus);
if (CUDA_SUCCESS == status && pfn_cuDeviceGetUuidDriverDriverVer) {
    // pfn_cuDeviceGetUuidDriverDriverVer points to ???
}

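The following function pointer matrix is expected:

tX -> Typedef version used at compile time
vX -> Version returned/used at runtime

If the application is compiled against CUDA version 11.3, it has the typedef for the original function, but if compiled against CUDA version 11.4, it has the typedef for the _v2 function. Because of that, notice the number of cases where the typedef does not match the actual version returned/used.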

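4.6. Implications to API/ABI

In the examples above using cuDeviceGetUuid, the implications of the mismatched API are minimal and may not be entirely noticeable to many users, because the _v2 version was added to support Multi-Instance GPU (MIG) mode. On a system without MIG, the user might not even realize they are getting a different API.

More problematic is an API that changes its application signature (and hence its ABI), such as cuCtxCreate. The _v2 version, introduced in CUDA 3.2, is currently used as the default cuCtxCreate when using cuda.h, but a newer version (cuCtxCreate_v3) was introduced in CUDA 11.4. The API signature was modified as well and now takes extra arguments. So, in some of the cases above, where the typedef of the function pointer does not match the returned function pointer, there is a chance of a non-obvious ABI incompatibility that leads to undefined behavior.

For example, assume the following code is compiled against a CUDA 11.3 toolkit with a CUDA 11.4 driver installed: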

PFN_cuCtxCreate cuUnknown;
CUdriverProcAddressQueryResult driverStatus;

status = cuGetProcAddress("cuCtxCreate", (void**)&cuUnknown, cudaVersion, CU_GET_PROC_ADDRESS_DEFAULT, &driverStatus);
if (CUDA_SUCCESS == status && cuUnknown) {
    status = cuUnknown(&ctx, 0, dev);
}

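Running this code with cudaVersion set to anything >= 11040 (indicating CUDA 11.4) could have undefined behavior, because the call does not supply all the parameters required by the _v3 version of the cuCtxCreate API.

5. Determining cuGetProcAddress Failure Reasons

There are two types of errors with cuGetProcAddress: (1) API/usage errors and (2) the inability to find the requested driver API. The first error type returns error codes from the API via the CUresult return value; examples include passing NULL as the pfn variable or passing invalid flags.

The second error type is encoded in CUdriverProcAddressQueryResult *symbolStatus and can be used to help distinguish potential issues with the driver not being able to find the requested symbol. Take the following example: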

// cuDeviceGetExecAffinitySupport was introduced in release CUDA 11.4
#include <cuda.h>

CUdriverProcAddressQueryResult driverStatus;
cudaVersion = ...;
status = cuGetProcAddress("cuDeviceGetExecAffinitySupport", &pfn, cudaVersion, 0, &driverStatus);
if (CUDA_SUCCESS == status) {
    if (CU_GET_PROC_ADDRESS_VERSION_NOT_SUFFICIENT == driverStatus) {
        printf("We can use the new feature when you upgrade cudaVersion to 11.4, but CUDA driver is good to go!\n");
        // Indicating cudaVersion was < 11.4 but run against a CUDA driver >= 11.4
    }
    else if (CU_GET_PROC_ADDRESS_SYMBOL_NOT_FOUND == driverStatus) {
        printf("Please update both CUDA driver and cudaVersion to at least 11.4 to use the new feature!\n");
        // Indicating driver is < 11.4 since string not found, doesn't matter what cudaVersion was
    }
    else if (CU_GET_PROC_ADDRESS_SUCCESS == driverStatus && pfn) {
        printf("You're using cudaVersion and CUDA driver >= 11.4, using new feature!\n");
        pfn();
    }
}

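The first case, with the return code CU_GET_PROC_ADDRESS_VERSION_NOT_SUFFICIENT, indicates that the symbol was found when searching in the CUDA driver but was added later than the cudaVersion supplied. In the example, specifying cudaVersion as anything 11030 or less while running against a CUDA driver >= CUDA 11.4 would give this result of CU_GET_PROC_ADDRESS_VERSION_NOT_SUFFICIENT. This is because cuDeviceGetExecAffinitySupport was added in CUDA 11.4 (11040).

The second case, with the return code CU_GET_PROC_ADDRESS_SYMBOL_NOT_FOUND, indicates that the symbol was not found when searching in the CUDA driver. This can have several causes, such as an unsupported CUDA function due to an older driver, or simply a typo. For the latter, similar to the last example, if the user had entered the symbol as CUDeviceGetExecAffinitySupport (note the capitalized CU at the start of the string), cuGetProcAddress would not be able to find the API because the string does not match. For the former, an example is a user developing an application against a CUDA driver that supports the new API and deploying it against an older CUDA driver. Using the last example, if the developer developed against CUDA 11.4 or later but deployed against a CUDA 11.3 driver, cuGetProcAddress would have succeeded during development, but when the application runs against the CUDA 11.3 driver, the call no longer works and CU_GET_PROC_ADDRESS_SYMBOL_NOT_FOUND is returned in driverStatus.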
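The three outcomes can be summarized as a decision rule. The sketch below is a simplified model with hypothetical names (QueryResult, classifyLookup); it is not how the driver is implemented, and it treats the symbol as a single-version API characterized only by the CUDA release that introduced it.

```cpp
#include <cassert>
#include <string>

// Hypothetical mirror of the three CUdriverProcAddressQueryResult values.
enum class QueryResult { Success, SymbolNotFound, VersionNotSufficient };

// Classify a lookup of `requested` against a driver that knows one symbol,
// `known`, introduced in CUDA release `introducedIn` (e.g. 11040).
inline QueryResult classifyLookup(const std::string &requested,
                                  const std::string &known,
                                  int introducedIn,
                                  int cudaVersion,
                                  int driverVersion) {
    assert(introducedIn > 0 && cudaVersion > 0 && driverVersion > 0);
    // Names are matched exactly, so a misspelling and a driver that predates
    // the symbol both report "not found".
    if (requested != known || driverVersion < introducedIn)
        return QueryResult::SymbolNotFound;
    // The driver has the symbol, but the caller asked for an older release.
    if (cudaVersion < introducedIn)
        return QueryResult::VersionNotSufficient;
    return QueryResult::Success;
}
```

For cuDeviceGetExecAffinitySupport (introduced in 11040), a request with cudaVersion 11030 against an 11.4 driver is classified as VersionNotSufficient, while any request against an 11.3 driver, or a request for the misspelled CUDeviceGetExecAffinitySupport, is classified as SymbolNotFound.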
