How are trained deep learning models deployed?

FightingCV

Source: https://www.zhihu.com/question/329372124

Author: 田子宸

The short answer first: how you deploy depends on your requirements.

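Requirement 1: a simple demo where you only need to show the result, such as a demo presentation at school

Pick any framework (Caffe, TensorFlow, PyTorch, and so on), switch the model to test mode, and run it from Python. While you are at it, write a simple GUI to display the results.

For something a bit fancier, wrap the model in an interface layer through CPython and call it from a C++ project.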

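Requirement 2: it has to run on a server, but with no real throughput or latency requirement; frankly, this is still a bit of a toy setup

Pick any framework (Caffe, TensorFlow, PyTorch, and so on), follow its official deployment tutorial, and dutifully deploy in C++. For example, export a PyTorch model with the provided tooling and run it under libtorch (there is an official tutorial, and it is simple).

This approach still depends on the framework. Many features kept for the convenience of training are not stripped out, so performance is not optimal.

In addition, these frameworks target either CPUs or NVIDIA GPUs, so they constrain the hardware platform and are inflexible.

On top of that, the frameworks are really big: they take up memory (TensorFlow also takes up GPU memory) and disk space.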
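One concrete case of a training-time feature that an inference engine can strip out is a separate batch-normalization step, which can be folded into the preceding layer once its statistics are frozen. The answer does not name this technique, so treat the following as an illustrative sketch only; the scalar parameters are hypothetical values chosen for the demonstration.

```python
import math


def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-time batch normalization with frozen statistics."""
    return gamma * (y - mean) / math.sqrt(var + eps) + beta


def fold_batch_norm(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold the batch-norm scale and shift into the preceding layer's weight and bias."""
    factor = gamma / math.sqrt(var + eps)
    return weight * factor, (bias - mean) * factor + beta


# Hypothetical single-channel parameters, for illustration only.
weight, bias = 2.0, 0.5
gamma, beta, mean, var = 1.5, -0.2, 0.3, 4.0

folded_weight, folded_bias = fold_batch_norm(weight, bias, gamma, beta, mean, var)

x = 3.0
two_step = batch_norm(weight * x + bias, gamma, beta, mean, var)  # layer, then BN
one_step = folded_weight * x + folded_bias                        # single fused layer
print(two_step, one_step)
```

Both paths compute the same value up to floating-point rounding, but the fused version needs one multiply-add per output instead of a layer followed by a normalization pass.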

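Requirement 3: run on a server, with throughput and latency requirements (throughput above all)

This kind of deployment is most common at internet companies, typically as the back-end AI compute behind an internet product: face verification, speech services, deep-learning-based recommendation, and so on.

Because these deployments are usually large-scale, you have to consider not only throughput and latency but also power consumption and cost. So effort goes into hardware as well as software, for example by using inference-specific cards such as the NVIDIA P4 or the Cambricon MLU100. Compared with desktop graphics cards, these inference cards draw less power, deliver more compute per unit of energy, and have a hardware architecture better suited to high-throughput workloads.

On the software side, people generally do not serve straight from a deep learning framework. For NVIDIA products, TensorRT is the usual choice for acceleration. (As I recall, NVIDIA also has something called TensorRT Inference Server, though I am not sure of the exact name; it not only accelerates the forward pass but also handles scheduling for you.) TensorRT builds on CUDA and cuDNN and adds graph optimization, FP16, INT8 quantization, and more. In short, just go with NVIDIA's full hardware and software stack.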
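To make the INT8 quantization mentioned above more concrete, here is a minimal sketch of symmetric per-tensor quantization in plain Python. It is not TensorRT's actual calibration procedure; the function names and the sample weights are assumptions made for illustration.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integer codes in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


# Hypothetical weights, for illustration only.
weights = [0.50, -1.27, 0.03, 0.90]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)
print(codes, restored)
```

Each weight is stored in one byte instead of four, and the reconstruction error per weight is bounded by half the scale, which is the accuracy-for-speed trade that INT8 inference makes.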

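Requirement 4: run on an NVIDIA embedded platform, with latency as the priority

For example PX2, TX2, or Xavier. See the previous case (just use the full NVIDIA stack); it only costs a bit more.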

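Requirement 5: run on some other embedded platform, with latency as the priority

On the hardware side, choose a suitable embedded platform based on the model's compute load and latency target, weighed against cost and power constraints.

For a compute-heavy model, you may need an SoC with a GPU, programmed through OpenCL, OpenGL, or Vulkan. You can also try an NPU, but NPUs currently support few operators, so networks with many custom ops may not deploy at all.

For small models, or when the frame-rate requirement is low, a CPU may be enough, though it usually takes some optimization (pruning, quantization, SIMD, assembly, Winograd, and so on).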
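Of the CPU optimizations listed, Winograd is the least self-explanatory, so here is a small sketch of the one-dimensional F(2,3) case: two outputs of a 3-tap filter computed with 4 multiplications instead of 6. Real mobile frameworks use the 2-D variant with precomputed filter transforms; the function names here are assumptions made for illustration.

```python
def conv_direct(d, g):
    """Direct 3-tap correlation over 4 inputs: 2 outputs, 6 multiplications."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def conv_winograd_f23(d, g):
    """Winograd F(2,3): the same 2 outputs using only 4 multiplications."""
    # Filter transform; in practice this is computed once per filter and cached.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]
    # The four element-wise multiplications on transformed inputs.
    m1 = (d[0] - d[2]) * u0
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * u3
    # Output transform.
    return [m1 + m2 + m3, m2 - m3 - m4]


signal = [1.0, 2.0, 3.0, 4.0]
kernel = [1.0, 0.0, -1.0]
direct = conv_direct(signal, kernel)
fast = conv_winograd_f23(signal, kernel)
print(direct, fast)
```

The saving comes from trading multiplications for additions, which pays off because the filter transform is amortized over every tile of the input.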

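Incidentally, deploying deep learning models on phones also falls into this category, except that you do not get to choose the hardware: whatever phone the user has is the phone you deploy on, lol. Deploying to old phones is the biggest headache.

Much of the deployment and optimization software work described above has already been done in open-source mobile frameworks. You can usually take one, adapt it a little, and get good performance.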

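Requirement 6: none of the above deployment options meets my needs

For example, if the open-source mobile frameworks are not fast enough, write your own. Companies such as SenseTime, Megvii, and Momenta all have their own inference frameworks, which presumably outperform the open-source ones. Writing your own takes considerable time and effort, though, and without experience you may well sink a lot of work into it and still not get it right.

Beyond that, you just have to deal with problems as they come. Good luck to the asker.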

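Author: 倪静风

Ideally, rewrite the model in C++; frameworks such as TensorFlow, Caffe, PyTorch, and MXNet can be compiled directly into binaries. In C++, matrix, tensor, and image operations and parallel computation can be accelerated with OpenCV, OpenMP, OpenCL, OpenGL, CUDA, cuDNN, and so on.

On a mobile device you need a dedicated accelerator chip and should prune the weights; the model may also have to be recompiled.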
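As a rough illustration of weight pruning, the sketch below zeroes out the smallest-magnitude weights. This is only one simple pruning scheme (unstructured magnitude pruning), picked here as an assumption; the answer does not say which method is meant, and real pipelines usually fine-tune the model afterwards.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest absolute value."""
    num_to_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest magnitude.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:num_to_prune]:
        pruned[i] = 0.0
    return pruned


# Hypothetical weights, for illustration only.
weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02]
pruned = prune_by_magnitude(weights, sparsity=0.5)
print(pruned)
```

The zeros only translate into speed or size gains when the runtime or storage format can exploit sparsity, which is one reason a pruned model may need to be recompiled for the target device.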

On servers, you can set up distributed load balancing.
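The simplest form of load balancing is round-robin dispatch across model replicas. The sketch below shows only the selection logic; the class name and the backend addresses are hypothetical, and in production this role is normally played by a dedicated proxy such as nginx rather than application code.

```python
import itertools


class RoundRobinBalancer:
    """Hand out backend addresses in a fixed rotating order."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self._cycle = itertools.cycle(list(backends))

    def pick(self):
        """Return the backend that should receive the next request."""
        return next(self._cycle)


# Hypothetical replica addresses, for illustration only.
balancer = RoundRobinBalancer([
    "http://10.0.0.1:5000",
    "http://10.0.0.2:5000",
    "http://10.0.0.3:5000",
])
picks = [balancer.pick() for _ in range(4)]
print(picks)
```

Round-robin assumes every replica has similar capacity and every request costs about the same; when that does not hold, least-connections or weighted schemes tend to work better.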

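If you are not chasing performance, you can containerize with Docker, run the cluster with Kubernetes, and build a microservice with Python's Flask. The example below calls Keras from a Flask microservice to predict image classes; ResNet50 is a deep residual network pre-trained on the ImageNet dataset. On a production cluster you can combine Flask, nginx, and gunicorn to serve the microservice.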

# USAGE
# Start the server:
# python app.py
# Submit a request via cURL:
# curl -X POST -F image=@dog.jpg 'http://localhost:5000/predict'
# import the necessary packages
from keras.applications import ResNet50
from keras.preprocessing.image import img_to_array
from keras.applications import imagenet_utils
from PIL import Image
import numpy as np
import flask
import io
import tensorflow as tf
# initialize our Flask application and the Keras model
app = flask.Flask(__name__)
model = None
def load_model():
    # load the pre-trained Keras model (here we are using a model
    # pre-trained on ImageNet and provided by Keras, but you can
    # substitute in your own networks just as easily)
    global model
    model = ResNet50(weights="imagenet")
    # keep a handle to the graph so request threads can reuse it
    # (tf.get_default_graph is the TensorFlow 1.x API)
    global graph
    graph = tf.get_default_graph()
def prepare_image(image, target):
    # if the image mode is not RGB, convert it
    if image.mode != "RGB":
        image = image.convert("RGB")
    # resize the input image and preprocess it
    image = image.resize(target)
    image = img_to_array(image)
    image = np.expand_dims(image, axis=0)
    image = imagenet_utils.preprocess_input(image)
    # return the processed image
    return image
@app.route("/predict", methods=["POST"])
def predict():
    # initialize the data dictionary that will be returned from the
    # view
    data = {"success": False}
    # ensure an image was properly uploaded to our endpoint
    if flask.request.method == "POST":
        if flask.request.files.get("image"):
            # read the image in PIL format
            image = flask.request.files["image"].read()
            image = Image.open(io.BytesIO(image))
            # preprocess the image and prepare it for classification
            image = prepare_image(image, target=(224, 224))
            # classify the input image and then initialize the list
            # of predictions to return to the client
            with graph.as_default():
                preds = model.predict(image)
                results = imagenet_utils.decode_predictions(preds)
                data["predictions"] = []
                # loop over the results and add them to the list of
                # returned predictions
                for (imagenetID, label, prob) in results[0]:
                    r = {"label": label, "probability": float(prob)}
                    data["predictions"].append(r)
                # indicate that the request was a success
                data["success"] = True
    # return the data dictionary as a JSON response
    return flask.jsonify(data)
# if this is the main thread of execution first load the model and