Compiling Models - Qualcomm® AI Hub Documentation

Compiling PyTorch to TensorFlow Lite

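To compile a PyTorch model, we must first generate a TorchScript model in memory using PyTorch's torch.jit.trace method. Once the model is traced, you can compile it with the submit_compile_job() API.
TensorFlow Lite models can run on the CPU, the GPU (using the GPU delegate), or the NPU (using the QNN delegate).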
import torch
import torchvision
import qai_hub as hub
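# Using pre-trained MobileNet (ImageNet weights; replaces the deprecated pretrained=True argument)
torch_model = torchvision.models.mobilenet_v2(weights=torchvision.models.MobileNet_V2_Weights.IMAGENET1K_V1)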
torch_model.eval()
# Trace model
input_shape: tuple[int, ...] = (1, 3, 224, 224)
example_input = torch.rand(input_shape)
pt_model = torch.jit.trace(torch_model, example_input)
# Compile model on a specific device
compile_job = hub.submit_compile_job(
    pt_model,
    name="MobileNet_V2",
    device=hub.Device("Samsung Galaxy S24 (Family)"),
    input_specs=dict(image=input_shape),
)
# Download the optimized compiled model
compile_job.download_target_model("MobileNet_V2.tflite")
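The input_specs argument maps each model input name to its shape. As we understand the qai_hub API, a value may also be given as a (shape, dtype) pair; treat that as an assumption and confirm it against the submit_compile_job reference. The helper below, make_input_specs, is hypothetical (not part of qai_hub) and only sketches one way to normalize specs into that pair form and reject invalid shapes before a job is submitted:

```python
# Hypothetical helper (not part of qai_hub): normalize input specs so every
# input is written as a (shape, dtype) pair. Accepting that pair form is an
# assumption about submit_compile_job's input_specs argument.
from typing import Dict, Tuple, Union

Shape = Tuple[int, ...]
SpecValue = Union[Shape, Tuple[Shape, str]]


def make_input_specs(default_dtype: str = "float32", **inputs: SpecValue) -> Dict[str, Tuple[Shape, str]]:
    """Return {name: (shape, dtype)}, filling in default_dtype where none is given."""
    specs: Dict[str, Tuple[Shape, str]] = {}
    for name, value in inputs.items():
        # A (shape, dtype) pair has a tuple first and a string second.
        if len(value) == 2 and isinstance(value[0], tuple) and isinstance(value[1], str):
            shape, dtype = value
        else:
            shape, dtype = value, default_dtype
        # Every dimension must be a positive integer.
        if not all(isinstance(d, int) and d > 0 for d in shape):
            raise ValueError(f"input {name!r} has an invalid shape: {shape!r}")
        specs[name] = (tuple(shape), dtype)
    return specs


# Same shape as the tracing example above.
print(make_input_specs(image=(1, 3, 224, 224)))
# {'image': ((1, 3, 224, 224), 'float32')}
```

Under that assumption, the result could be passed as input_specs=make_input_specs(image=input_shape) in the call above; plain shape tuples, as used throughout this page, remain the simplest form.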
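If you already have a traced or scripted torch model saved with torch.jit.save, you can submit it directly. We will use mobilenet_v2.pt as an example. In this example, we also profile the compiled model: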
import qai_hub as hub
# Compile a model
compile_job = hub.submit_compile_job(
    model="mobilenet_v2.pt",
    device=hub.Device("Samsung Galaxy S24 (Family)"),
    input_specs=dict(image=(1, 3, 224, 224)),
)
# Profile the compiled model
profile_job = hub.submit_profile_job(
    model=compile_job.get_target_model(),
    device=hub.Device("Samsung Galaxy S24 (Family)"),
)
# Download the optimized compiled model
compile_job.download_target_model("MobileNet_V2.tflite")
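Compiling a PyTorch model to a QNN Model Library
Qualcomm® AI Hub supports compiling and profiling PyTorch models as QNN model libraries. In this example, we use mobilenet_v2.pt and compile it to a QNN model library (.so file) for the ARM64 Android platform (aarch64_android).
A model library is an OS-specific deployment mechanism that is SoC-agnostic. Note that the Qualcomm® AI Engine Direct SDK does not guarantee ABI compatibility of model libraries across all SDK versions. This means a model compiled with one SDK version is not guaranteed to run with another SDK version. For details, see Qualcomm® AI Engine Direct Options.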
import qai_hub as hub
# Compile a model to a QNN Model Library
compile_job = hub.submit_compile_job(
    model="mobilenet_v2.pt",
    device=hub.Device("Samsung Galaxy S23 (Family)"),
    options="--target_runtime qnn_lib_aarch64_android",
    input_specs=dict(image=(1, 3, 224, 224)),
)
assert isinstance(compile_job, hub.CompileJob)
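The return value is an instance of CompileJob. See this example to learn how to profile this model on the Snapdragon® Neural Processing Unit (NPU).
Compiling a PyTorch model to a QNN DLC
Qualcomm® AI Hub supports compiling and profiling PyTorch models as QNN DLCs. In this example, we use mobilenet_v2.pt and compile it to a QNN DLC (.dlc file).
DLCs are hardware-agnostic. The Qualcomm® AI Engine Direct SDK guarantees that DLCs remain compatible with newer SDK versions. This means a DLC compiled with one SDK version can run on newer versions of the SDK. For details, see Qualcomm® AI Engine Direct Options.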
import qai_hub as hub
# Compile a model to QNN DLC
compile_job = hub.submit_compile_job(
    model="mobilenet_v2.pt",
    device=hub.Device("Samsung Galaxy S24 (Family)"),
    options="--target_runtime qnn_dlc",
    input_specs=dict(image=(1, 3, 224, 224)),
)
assert isinstance(compile_job, hub.CompileJob)
The return value is an instance of CompileJob. See this example to learn how to profile this model on the Snapdragon® Neural Processing Unit (NPU).
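The SDK compatibility rules for QNN model libraries and QNN DLCs can be summarized as a small predicate. This is a sketch only: guaranteed_compatible is a hypothetical helper, SDK versions are modeled as integer tuples (an assumption about the versioning scheme), and a model library is treated as safe only on the exact SDK version it was compiled with, because the documentation gives no cross-version ABI guarantee for it:

```python
# Hypothetical helper encoding the documented compatibility rules:
#   - QNN DLC: guaranteed to run on the same or a newer SDK version.
#   - QNN model library: no cross-version ABI guarantee, so only an exact
#     version match is treated as guaranteed.
# Versions are modeled as integer tuples such as (major, minor, patch).
from typing import Tuple

Version = Tuple[int, ...]


def guaranteed_compatible(artifact: str, compiled_with: Version, runtime: Version) -> bool:
    """Return True only when the documented rules guarantee the artifact will load."""
    if artifact == "qnn_dlc":
        # Tuples compare element by element, so (2, 28, 0) >= (2, 26, 0).
        return runtime >= compiled_with
    if artifact == "qnn_lib":
        return runtime == compiled_with
    raise ValueError(f"unknown artifact type: {artifact!r}")
```

For example, guaranteed_compatible("qnn_dlc", (2, 26, 0), (2, 28, 0)) returns True while the same call with "qnn_lib" returns False; the version numbers are illustrative, not real SDK releases.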
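Compiling a PyTorch model to a QNN context binary
Qualcomm® AI Hub supports compiling and profiling PyTorch models as QNN context binaries. In this example, we use mobilenet_v2.pt and compile it to a QNN context binary optimized for a specific device. Because context binaries are optimized for specific hardware, they can only be compiled for a single device.
A context binary is an SoC-specific deployment mechanism. When a model is compiled for a device, it is expected to be deployed to that same device. The format is OS-agnostic, so the same model can be deployed on Android, Linux, or Windows. Context binaries are designed for the NPU only.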
import qai_hub as hub
# Compile a model to QNN context binary
compile_job = hub.submit_compile_job(
    model="mobilenet_v2.pt",
    device=hub.Device("Samsung Galaxy S24 (Family)"),
    options="--target_runtime qnn_context_binary",
    input_specs=dict(image=(1, 3, 224, 224)),
)
assert isinstance(compile_job, hub.CompileJob)
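The return value is an instance of CompileJob. See this example to learn how to profile this model on the Snapdragon® Neural Processing Unit (NPU).
QNN context binaries can also be embedded in an ONNX model.
Compiling to precompiled QNN ONNX
Qualcomm® AI Hub supports compiling and profiling precompiled ONNX Runtime models. Such a model is compatible with ONNX Runtime and contains a precompiled QNN binary that can be run on Snapdragon devices with ONNX Runtime. See this documentation for more details.
Advantages of using precompiled QNN ONNX:
Easy deployment: works on Android, Linux, or Windows.
Improved performance: equivalent to a QNN context binary.
Simple inference code: ONNX Runtime runs inference on the compiled model using the QNN Execution Provider.
Large models: suitable for large models (>1GB) such as LLMs, Stable Diffusion, etc.
Note that QNN context binaries are OS-agnostic but device-specific. In addition, context binaries run only on the NPU. In this example, suppose we want to target the Snapdragon® 8 Elite: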
import qai_hub as hub
# Compile a model to precompiled QNN ONNX
compile_job = hub.submit_compile_job(
    model="mobilenet_v2.pt",
    device=hub.Device("Snapdragon 8 Elite QRD"),
    options="--target_runtime precompiled_qnn_onnx",
    input_specs=dict(image=(1, 3, 224, 224)),
)
assert isinstance(compile_job, hub.CompileJob)
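The compiled model is a directory with a .onnx extension (which can be zipped) containing one ONNX file and one QNN context binary file. If you upload your own precompiled ONNX Runtime model, it should follow this folder structure: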
<modeldir>.onnx
   ├── <model>.onnx
   └── <model>.bin
Note that the ONNX model refers to the QNN context binary by a relative path, so keep that reference in mind if you rename or move the .bin file.
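Before uploading your own precompiled ONNX Runtime model, it can help to check the folder structure locally. The function below is a hypothetical sketch (not an AI Hub API) that checks only the layout described above; it does not verify that the relative reference from the .onnx file to the .bin file is valid:

```python
# Hypothetical layout check (not an AI Hub API) for a precompiled QNN ONNX
# model directory: <modeldir>.onnx containing one .onnx file and one .bin file.
from pathlib import Path
from typing import List


def check_precompiled_qnn_onnx(modeldir: str) -> List[str]:
    """Return a list of layout problems; an empty list means the layout looks right."""
    root = Path(modeldir)
    if root.suffix != ".onnx" or not root.is_dir():
        return [f"{root} should be a directory with a .onnx extension"]
    problems: List[str] = []
    files = [p for p in root.iterdir() if p.is_file()]
    onnx_files = [p for p in files if p.suffix == ".onnx"]
    bin_files = [p for p in files if p.suffix == ".bin"]
    if len(onnx_files) != 1:
        problems.append(f"expected exactly one .onnx file, found {len(onnx_files)}")
    if len(bin_files) != 1:
        problems.append(f"expected exactly one .bin file, found {len(bin_files)}")
    # Anything else in the directory is not part of the documented structure.
    extra = sorted(p.name for p in files if p.suffix not in (".onnx", ".bin"))
    if extra:
        problems.append(f"unexpected files: {extra}")
    return problems
```

An empty list only means the two expected files are present; whether the model actually loads is still decided by ONNX Runtime and the QNN Execution Provider.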
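Compiling a PyTorch model for ONNX Runtime
Qualcomm® AI Hub supports compiling PyTorch models for ONNX Runtime. In this example, we use mobilenet_v2.pt and compile it to an ONNX model. This model can be profiled with ONNX Runtime.
ONNX Runtime supports execution on the CPU, the GPU (using the DML Execution Provider), or the NPU (using the QNN Execution Provider):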
import qai_hub as hub
# Compile a model to an ONNX model
compile_job = hub.submit_compile_job(
    model="mobilenet_v2.pt",
    device=hub.Device("Samsung Galaxy S23 (Family)"),
    options="--target_runtime onnx",
    input_specs=dict(image=(1, 3, 224, 224)),
)
# Download the optimized compiled model
compile_job.download_target_model("MobileNet_V2.onnx")
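When you run the compiled ONNX model yourself, ONNX Runtime picks hardware through the providers list passed to onnxruntime.InferenceSession, and onnxruntime.get_available_providers() reports what the installed build supports. The helper below is a stdlib-only sketch that orders those provider names; the preference order (NPU, then GPU, then CPU) is an assumption, and the QNN provider typically also needs provider options (such as the QNN backend library path) that are not shown here:

```python
# Sketch: order ONNX Runtime execution providers by preference.
# `available` would normally come from onnxruntime.get_available_providers();
# the preference order below is an assumption, not an AI Hub recommendation.
from typing import Iterable, List

PREFERRED = ("QNNExecutionProvider", "DmlExecutionProvider", "CPUExecutionProvider")


def choose_providers(available: Iterable[str], preferred: Iterable[str] = PREFERRED) -> List[str]:
    """Keep the available providers in preference order, always ending with CPU."""
    available_set = set(available)
    chosen = [p for p in preferred if p in available_set]
    # The CPU provider is the usual fallback, so make sure it is always last.
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen
```

In real code this would look like onnxruntime.InferenceSession("MobileNet_V2.onnx", providers=choose_providers(onnxruntime.get_available_providers())). Keeping the CPU provider last lets the session still load on a build without NPU or GPU support.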
Compiling an ONNX model to TensorFlow Lite or QNN
Qualcomm® AI Hub also supports compiling ONNX models to TensorFlow Lite, a QNN model library, or a QNN DLC. We will use mobilenet_v2.onnx as an example.
import qai_hub as hub
# Compile a model to TensorFlow Lite
compile_job = hub.submit_compile_job(
    model="mobilenet_v2.onnx",
    device=hub.Device("Samsung Galaxy S23 (Family)"),
)
compile_job.download_target_model("MobileNet_V2.tflite")
# Compile a model to a QNN Model Library
compile_job = hub.submit_compile_job(
    model="mobilenet_v2.onnx",
    device=hub.Device("Samsung Galaxy S23 (Family)"),
    options="--target_runtime qnn_lib_aarch64_android",
)
compile_job.download_target_model("MobileNet_V2.so")
# Compile a model to a QNN DLC
compile_job = hub.submit_compile_job(
    model="mobilenet_v2.onnx",
    device=hub.Device("Samsung Galaxy S23 (Family)"),
    options="--target_runtime qnn_dlc",
)
compile_job.download_target_model("MobileNet_V2.dlc")
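Each compile example on this page differs mainly in its options string and download filename. The helpers below are hypothetical (not part of qai_hub) and derive both from a target runtime name. The extension table mirrors the examples on this page, except that the tflite runtime name and the .bin extension for qnn_context_binary are assumptions:

```python
# Hypothetical helpers (not part of qai_hub): build the `options` string for
# submit_compile_job and pick a matching filename for download_target_model.
from typing import Dict, Optional

EXTENSIONS: Dict[Optional[str], str] = {
    None: ".tflite",  # no --target_runtime given: TensorFlow Lite, as in the examples
    "tflite": ".tflite",  # assumption: explicit runtime name for TensorFlow Lite
    "qnn_lib_aarch64_android": ".so",
    "qnn_dlc": ".dlc",
    "qnn_context_binary": ".bin",  # assumption
    "onnx": ".onnx",
    "precompiled_qnn_onnx": ".onnx",  # a directory with a .onnx extension
}


def build_options(target_runtime: Optional[str] = None, **flags: str) -> str:
    """Join --target_runtime and any extra --flag value pairs into one string."""
    parts = []
    if target_runtime is not None:
        parts += ["--target_runtime", target_runtime]
    for name, value in flags.items():
        parts += [f"--{name}", str(value)]
    return " ".join(parts)


def target_filename(stem: str, target_runtime: Optional[str] = None) -> str:
    """Append the extension expected for the given runtime."""
    return stem + EXTENSIONS[target_runtime]
```

For instance, build_options("qnn_dlc", quantize_full_type="int8") returns "--target_runtime qnn_dlc --quantize_full_type int8", the same string used in the AIMET example, and target_filename("MobileNet_V2", "qnn_lib_aarch64_android") returns "MobileNet_V2.so".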
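Note that an ONNX model may be unquantized (as in the compilation examples above) or quantized (as described in Quantization). If the source model is quantized, its quantization parameters are respected to produce quantized deployable assets.
An ONNX model directory can also be used for ONNX models with external weights. This directory (with a .onnx extension), which may optionally be zipped, must contain one .onnx file and one weights file with a .data extension. It should follow this folder structure: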
<modeldir>.onnx
   ├── <model>.onnx
   └── <model>.data
where <modeldir> and <model> can be any names. If your ONNX model does not follow this structure, use the following code to make it conform:
# if you have an ONNX model "file.onnx" which uses external weights,
# but does not adhere to Qualcomm AI Hub's required format, use this
# code to make it adhere
import onnx
model = onnx.load("file.onnx")
onnx.save(model, "new_file.onnx", save_as_external_data=True, location="new_file.data")
# place both "new_file.onnx" and "new_file.data" in a new directory with
# a .onnx extension, without any other files and upload that directory
# to Qualcomm AI Hub, either as is or as a .zip file
Note that the ONNX model refers to the weights file by a relative path, so keep that reference in mind if you rename or move the weights file.
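The last step of the conversion code (placing the two files in a new directory and optionally zipping it) can be scripted with the standard library. This is a sketch with a hypothetical helper name; it keeps the original file names so the relative reference to the weights file is not broken, and the zip layout (the directory itself at the archive root) is an assumption about what the upload accepts:

```python
# Sketch (hypothetical helper, not an AI Hub API): copy an .onnx file and its
# external-weights .data file into a new <modeldir>.onnx directory, optionally
# zipping it for upload.
import shutil
from pathlib import Path


def package_onnx_with_weights(onnx_file: str, data_file: str, modeldir: str, make_zip: bool = False) -> Path:
    """Return the created directory, or the .zip archive if make_zip is True."""
    onnx_path, data_path = Path(onnx_file), Path(data_file)
    target = Path(modeldir)
    if target.suffix != ".onnx":
        raise ValueError("the model directory name must end with .onnx")
    if onnx_path.suffix != ".onnx" or data_path.suffix != ".data":
        raise ValueError("expected one .onnx file and one .data weights file")
    target.mkdir(parents=True, exist_ok=False)  # refuse to reuse an existing directory
    # Keep the original file names so the relative weights reference still resolves.
    shutil.copy2(onnx_path, target / onnx_path.name)
    shutil.copy2(data_path, target / data_path.name)
    if make_zip:
        # Writes <modeldir>.zip (e.g. mobilenet.onnx.zip) with the directory at its root.
        archive = shutil.make_archive(str(target), "zip", root_dir=target.parent, base_dir=target.name)
        return Path(archive)
    return target
```

For example, package_onnx_with_weights("new_file.onnx", "new_file.data", "upload/mobilenet.onnx", make_zip=True) would produce upload/mobilenet.onnx.zip, assuming the upload directory does not already contain mobilenet.onnx.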
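Compiling AIMET-quantized models to TensorFlow Lite or QNN
AI Model Efficiency Toolkit (AIMET) is an open-source library that provides advanced model quantization and compression techniques for trained neural network models. AIMET's QuantizationSimModel can be exported as an ONNX model (.onnx) together with an encodings file (.encodings) containing the quantization parameters.
To use such a model, create a directory with the .aimet extension. It should contain the .onnx model and the corresponding encodings file: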
<modeldir>.aimet
   ├── <model>.onnx
   ├── <model>.data (optional)
   └── <encodings>.encodings
where <modeldir>, <model>, and <encodings> can be any names. <model>.data is required only if the ONNX model has external weights.
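A quick local check of the .aimet layout can catch missing or stray files before a compile job is submitted. classify_aimet_dir is a hypothetical helper (not an AI Hub API) that enforces the structure above: exactly one .onnx file, exactly one .encodings file, and at most one optional .data file. It does not inspect the contents of the encodings file:

```python
# Hypothetical pre-upload check (not an AI Hub API) for a <modeldir>.aimet
# directory. Raises ValueError on any deviation from the documented layout and
# returns the file names grouped by extension otherwise.
from pathlib import Path
from typing import Dict, List


def classify_aimet_dir(modeldir: str) -> Dict[str, List[str]]:
    root = Path(modeldir)
    if root.suffix != ".aimet" or not root.is_dir():
        raise ValueError(f"{root} is not a directory with a .aimet extension")
    groups: Dict[str, List[str]] = {".onnx": [], ".data": [], ".encodings": []}
    for p in sorted(root.iterdir()):
        if p.suffix not in groups:
            raise ValueError(f"unexpected file in {root.name}: {p.name}")
        groups[p.suffix].append(p.name)
    if len(groups[".onnx"]) != 1 or len(groups[".encodings"]) != 1:
        raise ValueError("need exactly one .onnx file and one .encodings file")
    # The .data file is optional; it is only present for external weights.
    if len(groups[".data"]) > 1:
        raise ValueError("at most one .data weights file is allowed")
    return groups
```

Raising on the first problem keeps the check strict; a tolerant variant could collect all problems into a list instead.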
Let's use mobilenet_v2_onnx.aimet.zip as an example. After unzipping it into the mobilenet_v2_onnx.aimet directory, we can submit a compile job as follows:
import qai_hub as hub
# Compile to TensorFlow Lite
compile_job = hub.submit_compile_job(
    model="mobilenet_v2_onnx.aimet",
    device=hub.Device("Samsung Galaxy S24 (Family)"),
)
compile_job.download_target_model("MobileNet_V2.tflite")
# Compile to a QNN DLC
compile_job = hub.submit_compile_job(
    model="mobilenet_v2_onnx.aimet",
    device=hub.Device("Samsung Galaxy S24 (Family)"),
    options="--target_runtime qnn_dlc --quantize_full_type int8",
)
compile_job.download_target_model("MobileNet_V2.dlc")