十分钟完成 PP-OCRv3 识别全流程实战

项目地址:PaddleOCR github 地址: https://github.com/PaddlePaddle/PaddleOCR

PaddleOCR是百度开源的超轻量级OCR模型库,提供了数十种文本检测、识别模型,旨在打造一套丰富、领先、实用的文字检测、识别模型/工具库,助力使用者训练出更好的模型,并应用落地。同时PaddleOCR也几经更新, 🔥在2022.5.9 发布最新版本PaddleOCR release/2.5

  • 发布 PP-OCRv3 ,速度可比情况下,中文场景效果相比于PP-OCRv2再提升5%,英文场景提升11%,80语种多语言模型平均识别准确率提升5%以上;
  • 发布半自动标注工具 PPOCRLabelv2 :新增表格文字图像、图像关键信息抽取任务和不规则文字图像的标注功能;
  • 发布OCR产业落地工具集:打通22种训练部署软硬件环境与方式,覆盖企业90%的训练部署环境需求;
  • 发布交互式OCR开源电子书 《动手学OCR》 ,覆盖OCR全栈技术的前沿理论与代码实践,并配套教学视频。

本教程旨在帮助使用者快速了解PP-OCRv3识别,并掌握其使用方式,包括:

  1. PP-OCR3识别快速使用
    1. 十分钟完成文本识别模型的训练和预测方式
  • 最后带来PP-OCRv3直播预告,敬请期待!

    1 PP-OCRv3识别快速使用

    本节介绍如何使用PaddleOCR的轻量级模型完成文本识别的任务。

    1.1 准备运行环境

    首先,安装PaddleOCR的依赖库。

    In [ ]

    import os
    # 修改代码运行的默认目录为 /home/aistudio/
    os.chdir("/home/aistudio")
    # 如果git clone方式下载速度慢,您可直接在github中下载PaddleOCR的dygraph分支的zip压缩文件,然后上传到工作环境中解压使用
    #!unzip PaddleOCR-dygraph.zip
    !git clone -b dygraph https://github.com/PaddlePaddle/PaddleOCR.git

    In [ ]

    # 安装依赖库
    os.chdir("/home/aistudio/PaddleOCR")
    !pip install -r requirements.txt -i https://mirror.baidu.com/pypi/simple

    1.2. 快速预测文字内容

    测试图片:

    In [8]

    import os os.chdir('/home/aistudio/PaddleOCR') # 也可安装paddleocr whl包进行快速使用 # !pip install paddleocr from paddleocr import PaddleOCR ocr = PaddleOCR() # need to run only once to download and load model into memory img_path = '/home/aistudio/PaddleOCR/doc/imgs_words/en/word_1.png' result = ocr.ocr(img_path, det=False) for line in result: print(line)
      0%|          | 0.00/3.67M [00:00<?, ?iB/s]
    download https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar to /home/aistudio/.paddleocr/whl/det/ch/ch_PP-OCRv3_det_infer/ch_PP-OCRv3_det_infer.tar
    
    100%|██████████| 3.67M/3.67M [00:00<00:00, 6.36MiB/s]
      0%|          | 0.00/11.9M [00:00<?, ?iB/s]
    download https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar to /home/aistudio/.paddleocr/whl/rec/ch/ch_PP-OCRv3_rec_infer/ch_PP-OCRv3_rec_infer.tar
    
    100%|██████████| 11.9M/11.9M [00:00<00:00, 42.0MiB/s]
     19%|█▉        | 279k/1.45M [00:00<00:00, 2.67MiB/s]
    download https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar to /home/aistudio/.paddleocr/whl/cls/ch_ppocr_mobile_v2.0_cls_infer/ch_ppocr_mobile_v2.0_cls_infer.tar
    
    100%|██████████| 1.45M/1.45M [00:00<00:00, 4.66MiB/s]
    [2022/05/05 14:53:56] ppocr DEBUG: Namespace(alpha=1.0, benchmark=False, beta=1.0, cls_batch_num=6, cls_image_shape='3, 48, 192', cls_model_dir='/home/aistudio/.paddleocr/whl/cls/ch_ppocr_mobile_v2.0_cls_infer', cls_thresh=0.9, cpu_threads=10, crop_res_save_dir='./output', det=True, det_algorithm='DB', det_db_box_thresh=0.6, det_db_score_mode='fast', det_db_thresh=0.3, det_db_unclip_ratio=1.5, det_east_cover_thresh=0.1, det_east_nms_thresh=0.2, det_east_score_thresh=0.8, det_fce_box_type='poly', det_limit_side_len=960, det_limit_type='max', det_model_dir='/home/aistudio/.paddleocr/whl/det/ch/ch_PP-OCRv3_det_infer', det_pse_box_thresh=0.85, det_pse_box_type='quad', det_pse_min_area=16, det_pse_scale=1, det_pse_thresh=0, det_sast_nms_thresh=0.2, det_sast_polygon=False, det_sast_score_thresh=0.5, draw_img_save_dir='./inference_results', drop_score=0.5, e2e_algorithm='PGNet', e2e_char_dict_path='./ppocr/utils/ic15_dict.txt', e2e_limit_side_len=768, e2e_limit_type='max', e2e_model_dir=None, e2e_pgnet_mode='fast', e2e_pgnet_score_thresh=0.5, e2e_pgnet_valid_set='totaltext', enable_mkldnn=False, fourier_degree=5, gpu_mem=500, help='==SUPPRESS==', image_dir=None, ir_optim=True, label_list=['0', '180'], lang='ch', layout=True, layout_label_map=None, layout_path_model='lp://PubLayNet/ppyolov2_r50vd_dcn_365e_publaynet/config', max_batch_size=10, max_text_length=25, min_subgraph_size=15, mode='structure', ocr=True, ocr_version='PP-OCRv3', output='./output', precision='fp32', process_id=0, rec=True, rec_algorithm='CRNN', rec_batch_num=6, rec_char_dict_path='/home/aistudio/PaddleOCR/ppocr/utils/ppocr_keys_v1.txt', rec_image_shape='3, 32, 320', rec_model_dir='/home/aistudio/.paddleocr/whl/rec/ch/ch_PP-OCRv3_rec_infer', save_crop_res=False, save_log_path='./log_output/', scales=[8, 16, 32], show_log=True, structure_version='PP-STRUCTURE', table=True, table_char_dict_path=None, table_max_len=488, table_model_dir=None, total_process_num=1, type='ocr', use_angle_cls=False, use_dilation=False, use_gpu=True, use_mp=False, use_onnx=False, use_pdserving=False, use_space_char=True, use_tensorrt=False, vis_font_path='./doc/fonts/simfang.ttf', warmup=False)
    
    [2022/05/05 14:53:59] ppocr WARNING: Since the angle classifier is not initialized, the angle classifier will not be uesd during the forward process
    ('JOINT', 0.9179949760437012)
    

    2. 训练文字识别模型

    本节提供了PaddleOCR文本识别任务的全流程指南,包括数据准备、模型训练、调优、评估、预测,各个阶段的详细说明:

    2.1. 数据准备

    2.1.1 自定义数据集

    下面以通用数据集为例, 介绍如何准备数据集:

    建议将训练图片放入同一个文件夹,并用一个txt文件(rec_gt_train.txt)记录图片路径和标签,txt文件里的内容如下:

    注意: txt文件中默认请将图片路径和图片标签用 \t 分割,如用其他方式分割将造成训练报错。

    " 图像文件名                 图像标注信息 "
    train_data/rec/train/word_001.jpg   简单可依赖
    train_data/rec/train/word_002.jpg   用科技让复杂的世界更简单
    

    最终训练集应有如下文件结构:

    |-train_data
      |-rec
        |- rec_gt_train.txt
        |- train
            |- word_001.png
            |- word_002.jpg
            |- word_003.jpg
            | ...
    

    除上述单张图像为一行格式之外,PaddleOCR也支持对离线增广后的数据进行训练,为了防止相同样本在同一个batch中被多次采样,我们可以将相同标签对应的图片路径写在一行中,以列表的形式给出,在训练中,PaddleOCR会随机选择列表中的一张图片进行训练。对应地,标注文件的格式如下。

    ["11.jpg", "12.jpg"]   简单可依赖
    ["21.jpg", "22.jpg", "23.jpg"]   用科技让复杂的世界更简单
    3.jpg   ocr
    

    上述示例标注文件中,"11.jpg"和"12.jpg"的标签相同,都是简单可依赖,在训练的时候,对于该行标注,会随机选择其中的一张图片进行训练。

    同训练集类似,验证集也需要提供一个包含所有图片的文件夹(test)和一个rec_gt_test.txt,验证集的结构如下所示:

    |-train_data
      |-rec
        |- rec_gt_test.txt
        |- test
            |- word_001.jpg
            |- word_002.jpg
            |- word_003.jpg
            | ...
    

    2.1.2 数据下载

    • ICDAR2015

    若您本地没有数据集,可以在官网下载 ICDAR2015 数据,用于快速验证。也可以参考DTRB ,下载 benchmark 所需的lmdb格式数据集。

    如果你使用的是icdar2015的公开数据集,PaddleOCR 提供了一份用于训练 ICDAR2015 数据集的标签文件,通过以下方式下载:

    # 训练集标签
    wget -P ./train_data/ic15_data  https://paddleocr.bj.bcebos.com/dataset/rec_gt_train.txt
    # 测试集标签
    wget -P ./train_data/ic15_data  https://paddleocr.bj.bcebos.com/dataset/rec_gt_test.txt
    

    PaddleOCR 也提供了数据格式转换脚本,可以将ICDAR官网 label 转换为PaddleOCR支持的数据格式。 数据转换工具在 ppocr/utils/gen_label.py, 这里以训练集为例:

    # 将官网下载的标签文件转换为 rec_gt_label.txt
    python gen_label.py --mode="rec" --input_path="{path/of/origin/label}" --output_label="rec_gt_label.txt"
    

    数据样式格式如下,(a)为原始图片,(b)为每张图片对应的 Ground Truth 文本文件: 

    我们在 ~/data/data34824/ 目录下准备了数据集,可以使用如下指令解压数据文件。

    In [9]

    !mkdir train_data && cd ./train_data/ && mkdir -p ic15_data && cd ic15_data && cp ~/data/data34824/ic15_rec.zip ./ && unzip -o -q ic15_rec.zip && tar xf ic15.tar

    2.2. 开始训练

    PaddleOCR提供了训练脚本、评估脚本和预测脚本,本节将以 PP-OCRv3中文识别模型为例:

    首先下载pretrain model,您可以下载训练好的模型在 icdar2015 数据上进行finetune

    In [10]

    !wget -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_train.tar
    # 解压模型参数
    !cd ./pretrain_models/ && tar -xf ch_PP-OCRv3_rec_train.tar && rm -rf ch_PP-OCRv3_rec_train.tar
    --2022-05-05 14:54:16--  https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_train.tar
    正在解析主机 paddleocr.bj.bcebos.com (paddleocr.bj.bcebos.com)... 182.61.200.229, 182.61.200.195, 2409:8c04:1001:1002:0:ff:b001:368a
    正在连接 paddleocr.bj.bcebos.com (paddleocr.bj.bcebos.com)|182.61.200.229|:443... 已连接。
    已发出 HTTP 请求,正在等待回应... 200 OK
    长度: 287467520 (274M) [application/x-tar]
    正在保存至: “./pretrain_models/ch_PP-OCRv3_rec_train.tar”
    ch_PP-OCRv3_rec_tra 100%[===================>] 274.15M  47.9MB/s    in 8.6s    
    2022-05-05 14:54:24 (31.9 MB/s) - 已保存 “./pretrain_models/ch_PP-OCRv3_rec_train.tar” [287467520/287467520])
    

    2.2.1 启动训练

    需将configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml中的训练和评估数据集路径修改为ic15的数据集路径:

    Train:
      dataset:
        name: SimpleDataSet
        data_dir: ./train_data/ic15_data/
        ext_op_transform_idx: 1
        label_file_list: ["./train_data/ic15_data/rec_gt_train.txt"]
        ......
    Eval:   
      dataset:
        name: SimpleDataSet
        data_dir: ./train_data/ic15_data
        label_file_list: ["./train_data/ic15_data/rec_gt_test.txt"]
    

    如果您安装的是cpu版本,请将配置文件中的 use_gpu 字段修改为false

    启动训练命令很简单,指定好配置文件即可。另外在命令行中可以通过 -o 修改配置文件中的参数值。启动训练命令如下所示

    • Global.pretrained_model: 加载的预训练模型路径
    • Global.character_dict_path : 字典路径(这里只支持26个小写字母+数字)
    • Global.eval_batch_step : 评估频率
    • Global.epoch_num: 总训练轮数

    如果训练速度慢,可去掉数据增强,但是当数据量较少,应用场景复杂时,建议保留数据增强,可提高模型泛化性和精度。

    Train:
      dataset:
        name: SimpleDataSet
        data_dir: ./train_data/ic15_data/
        ext_op_transform_idx: 1
        label_file_list: ["./train_data/ic15_data/rec_gt_train.txt"]
        transforms:
        - DecodeImage:
            img_mode: BGR
            channel_first: false
        # - RecConAug:
        #     prob: 0.5
        #     ext_data_num: 2
        #     image_shape: [48, 320, 3]
        # - RecAug:
    

    In [ ]

    # 由于预训练模型提供的是蒸馏模型,需先将Student模型的参数提取出
    import paddle
    params = paddle.load('./pretrain_models/ch_PP-OCRv3_rec_train/best_accuracy' + '.pdparams')
    new_state_dict = {}
    for k1 in params.keys():
        if 'Student.' in k1:
            new_state_dict[k1.replace('Student.','')] = params[k1]
            # print(k1)
    paddle.save(new_state_dict, './pretrain_models/ch_PP-OCRv3_rec_train/best_accuracy'+'_new.pdparams')
    # CPU 训练
    # 训练icdar15英文数据 训练日志会自动保存为 "{save_model_dir}" 下的train.log
    # !python3 tools/train.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml \
    #  -o Global.pretrained_model=./pretrain_models/ch_PP-OCRv3_rec_train/best_accuracy_new Global.use_gpu=False \
    #     Global.character_dict_path=ppocr/utils/en_dict.txt \
    #     Global.eval_batch_step=[0,200] \
    #     Global.epoch_num=40
    # GPU训练 支持单卡,多卡训练
    #单卡训练
    !python3 tools/train.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml \
     -o Global.pretrained_model=./pretrain_models/ch_PP-OCRv3_rec_train/best_accuracy_new\
        Global.character_dict_path=ppocr/utils/en_dict.txt \
        Global.eval_batch_step=[0,200] \
        Global.epoch_num=40
    #多卡训练,通过--gpus参数指定卡号
    #!python3 -m paddle.distributed.launch --gpus '0,1,2,3'  tools/train.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o Global.pretrained_model=./pretrain_models/ch_PP-OCRv3_rec_train/best_accuracy

    2.3. 模型评估与预测

    2.3.1 评估

    训练中模型参数默认保存在Global.save_model_dir目录下。在评估指标时,需要设置Global.checkpoints指向保存的参数文件。评估数据集可以通过 configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml 修改Eval中的 label_file_path 设置。

    In [22]

    # GPU 评估 !python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o Global.checkpoints=./output/rec_ppocr_v3/best_accuracy Global.character_dict_path=ppocr/utils/en_dict.txt # CPU 评估, Global.checkpoints 为待测权重 # !python3 tools/eval.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o Global.checkpoints=./output/rec_ppocr_v3/best_accuracy Global.use_gpu=False Global.character_dict_path=ppocr/utils/en_dict.txt
    -----------  Configuration Arguments -----------
    backend: auto
    elastic_server: None
    force: False
    gpus: 0
    heter_devices: 
    heter_worker_num: None
    heter_workers: 
    host: None
    http_port: None
    ips: 127.0.0.1
    job_id: None
    log_dir: log
    np: None
    nproc_per_node: None
    run_mode: None
    scale: 0
    server_num: None
    servers: 
    training_script: tools/eval.py
    training_script_args: ['-c', 'configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml', '-o', 'Global.checkpoints=./output/rec_ppocr_v3/best_accuracy', 'Global.character_dict_path=ppocr/utils/en_dict.txt']
    worker_num: None
    workers: 
    ------------------------------------------------
    WARNING 2022-05-05 16:08:53,089 launch.py:423] Not found distinct arguments and compiled with cuda or xpu. Default use collective mode
    launch train in GPU mode!
    INFO 2022-05-05 16:08:53,095 launch_utils.py:528] Local start 1 processes. First process distributed environment info (Only For Debug): 
        +=======================================================================================+
        |                        Distributed Envs                      Value                    |
        +---------------------------------------------------------------------------------------+
        |                       PADDLE_TRAINER_ID                        0                      |
        |                 PADDLE_CURRENT_ENDPOINT                 127.0.0.1:55453               |
        |                     PADDLE_TRAINERS_NUM                        1                      |
        |                PADDLE_TRAINER_ENDPOINTS                 127.0.0.1:55453               |
        |                     PADDLE_RANK_IN_NODE                        0                      |
        |                 PADDLE_LOCAL_DEVICE_IDS                        0                      |
        |                 PADDLE_WORLD_DEVICE_IDS                        0                      |
        |                     FLAGS_selected_gpus                        0                      |
        |             FLAGS_selected_accelerators                        0                      |
        +=======================================================================================+
    INFO 2022-05-05 16:08:53,095 launch_utils.py:532] details abouts PADDLE_TRAINER_ENDPOINTS can be found in log/endpoints.log, and detail running logs maybe found in log/workerlog.0
    launch proc_id:10834 idx:0
    [2022/05/05 16:08:54] ppocr INFO: Architecture : 
    [2022/05/05 16:08:54] ppocr INFO:     Backbone : 
    [2022/05/05 16:08:54] ppocr INFO:         last_conv_stride : [1, 2]
    [2022/05/05 16:08:54] ppocr INFO:         last_pool_type : avg
    [2022/05/05 16:08:54] ppocr INFO:         name : MobileNetV1Enhance
    [2022/05/05 16:08:54] ppocr INFO:         scale : 0.5
    [2022/05/05 16:08:54] ppocr INFO:     Head : 
    [2022/05/05 16:08:54] ppocr INFO:         head_list : 
    [2022/05/05 16:08:54] ppocr INFO:             CTCHead : 
    [2022/05/05 16:08:54] ppocr INFO:                 Head : 
    [2022/05/05 16:08:54] ppocr INFO:                     fc_decay : 1e-05
    [2022/05/05 16:08:54] ppocr INFO:                 Neck : 
    [2022/05/05 16:08:54] ppocr INFO:                     depth : 2
    [2022/05/05 16:08:54] ppocr INFO:                     dims : 64
    [2022/05/05 16:08:54] ppocr INFO:                     hidden_dims : 120
    [2022/05/05 16:08:54] ppocr INFO:                     name : svtr
    [2022/05/05 16:08:54] ppocr INFO:                     use_guide : True
    [2022/05/05 16:08:54] ppocr INFO:             SARHead : 
    [2022/05/05 16:08:54] ppocr INFO:                 enc_dim : 512
    [2022/05/05 16:08:54] ppocr INFO:                 max_text_length : 25
    [2022/05/05 16:08:54] ppocr INFO:         name : MultiHead
    [2022/05/05 16:08:54] ppocr INFO:     Transform : None
    [2022/05/05 16:08:54] ppocr INFO:     algorithm : SVTR
    [2022/05/05 16:08:54] ppocr INFO:     model_type : rec
    [2022/05/05 16:08:54] ppocr INFO: Eval : 
    [2022/05/05 16:08:54] ppocr INFO:     dataset : 
    [2022/05/05 16:08:54] ppocr INFO:         data_dir : ./train_data/ic15_data
    [2022/05/05 16:08:54] ppocr INFO:         label_file_list : ['./train_data/ic15_data/rec_gt_test.txt']
    [2022/05/05 16:08:54] ppocr INFO:         name : SimpleDataSet
    [2022/05/05 16:08:54] ppocr INFO:         transforms : 
    [2022/05/05 16:08:54] ppocr INFO:             DecodeImage : 
    [2022/05/05 16:08:54] ppocr INFO:                 channel_first : False
    [2022/05/05 16:08:54] ppocr INFO:                 img_mode : BGR
    [2022/05/05 16:08:54] ppocr INFO:             MultiLabelEncode : None
    [2022/05/05 16:08:54] ppocr INFO:             RecResizeImg : 
    [2022/05/05 16:08:54] ppocr INFO:                 image_shape : [3, 48, 320]
    [2022/05/05 16:08:54] ppocr INFO:             KeepKeys : 
    [2022/05/05 16:08:54] ppocr INFO:                 keep_keys : ['image', 'label_ctc', 'label_sar', 'length', 'valid_ratio']
    [2022/05/05 16:08:54] ppocr INFO:     loader : 
    [2022/05/05 16:08:54] ppocr INFO:         batch_size_per_card : 128
    [2022/05/05 16:08:54] ppocr INFO:         drop_last : False
    [2022/05/05 16:08:54] ppocr INFO:         num_workers : 4
    [2022/05/05 16:08:54] ppocr INFO:         shuffle : False
    [2022/05/05 16:08:54] ppocr INFO: Global : 
    [2022/05/05 16:08:54] ppocr INFO:     cal_metric_during_train : True
    [2022/05/05 16:08:54] ppocr INFO:     character_dict_path : ppocr/utils/en_dict.txt
    [2022/05/05 16:08:54] ppocr INFO:     checkpoints : ./output/rec_ppocr_v3/best_accuracy
    [2022/05/05 16:08:54] ppocr INFO:     debug : False
    [2022/05/05 16:08:54] ppocr INFO:     distributed : False
    [2022/05/05 16:08:54] ppocr INFO:     epoch_num : 500
    [2022/05/05 16:08:54] ppocr INFO:     eval_batch_step : [0, 2000]
    [2022/05/05 16:08:54] ppocr INFO:     infer_img : doc/imgs_words/ch/word_1.jpg
    [2022/05/05 16:08:54] ppocr INFO:     infer_mode : False
    [2022/05/05 16:08:54] ppocr INFO:     log_smooth_window : 20
    [2022/05/05 16:08:54] ppocr INFO:     max_text_length : 25
    [2022/05/05 16:08:54] ppocr INFO:     pretrained_model : None
    [2022/05/05 16:08:54] ppocr INFO:     print_batch_step : 10
    [2022/05/05 16:08:54] ppocr INFO:     save_epoch_step : 3
    [2022/05/05 16:08:54] ppocr INFO:     save_inference_dir : None
    [2022/05/05 16:08:54] ppocr INFO:     save_model_dir : ./output/rec_ppocr_v3
    [2022/05/05 16:08:54] ppocr INFO:     save_res_path : ./output/rec/predicts_ppocrv3.txt
    [2022/05/05 16:08:54] ppocr INFO:     use_gpu : True
    [2022/05/05 16:08:54] ppocr INFO:     use_space_char : True
    [2022/05/05 16:08:54] ppocr INFO:     use_visualdl : False
    [2022/05/05 16:08:54] ppocr INFO: Loss : 
    [2022/05/05 16:08:54] ppocr INFO:     loss_config_list : 
    [2022/05/05 16:08:54] ppocr INFO:         CTCLoss : None
    [2022/05/05 16:08:54] ppocr INFO:         SARLoss : None
    [2022/05/05 16:08:54] ppocr INFO:     name : MultiLoss
    [2022/05/05 16:08:54] ppocr INFO: Metric : 
    [2022/05/05 16:08:54] ppocr INFO:     ignore_space : False
    [2022/05/05 16:08:54] ppocr INFO:     main_indicator : acc
    [2022/05/05 16:08:54] ppocr INFO:     name : RecMetric
    [2022/05/05 16:08:54] ppocr INFO: Optimizer : 
    [2022/05/05 16:08:54] ppocr INFO:     beta1 : 0.9
    [2022/05/05 16:08:54] ppocr INFO:     beta2 : 0.999
    [2022/05/05 16:08:54] ppocr INFO:     lr : 
    [2022/05/05 16:08:54] ppocr INFO:         learning_rate : 0.001
    [2022/05/05 16:08:54] ppocr INFO:         name : Cosine
    [2022/05/05 16:08:54] ppocr INFO:         warmup_epoch : 5
    [2022/05/05 16:08:54] ppocr INFO:     name : Adam
    [2022/05/05 16:08:54] ppocr INFO:     regularizer : 
    [2022/05/05 16:08:54] ppocr INFO:         factor : 3e-05
    [2022/05/05 16:08:54] ppocr INFO:         name : L2
    [2022/05/05 16:08:54] ppocr INFO: PostProcess : 
    [2022/05/05 16:08:54] ppocr INFO:     name : CTCLabelDecode
    [2022/05/05 16:08:54] ppocr INFO: Train : 
    [2022/05/05 16:08:54] ppocr INFO:     dataset : 
    [2022/05/05 16:08:54] ppocr INFO:         data_dir : ./train_data/ic15_data/
    [2022/05/05 16:08:54] ppocr INFO:         ext_op_transform_idx : 1
    [2022/05/05 16:08:54] ppocr INFO:         label_file_list : ['./train_data/ic15_data/rec_gt_train.txt']
    [2022/05/05 16:08:54] ppocr INFO:         name : SimpleDataSet
    [2022/05/05 16:08:54] ppocr INFO:         transforms : 
    [2022/05/05 16:08:54] ppocr INFO:             DecodeImage : 
    [2022/05/05 16:08:54] ppocr INFO:                 channel_first : False
    [2022/05/05 16:08:54] ppocr INFO:                 img_mode : BGR
    [2022/05/05 16:08:54] ppocr INFO:             MultiLabelEncode : None
    [2022/05/05 16:08:54] ppocr INFO:             RecResizeImg : 
    [2022/05/05 16:08:54] ppocr INFO:                 image_shape : [3, 48, 320]
    [2022/05/05 16:08:54] ppocr INFO:             KeepKeys : 
    [2022/05/05 16:08:54] ppocr INFO:                 keep_keys : ['image', 'label_ctc', 'label_sar', 'length', 'valid_ratio']
    [2022/05/05 16:08:54] ppocr INFO:     loader : 
    [2022/05/05 16:08:54] ppocr INFO:         batch_size_per_card : 128
    [2022/05/05 16:08:54] ppocr INFO:         drop_last : True
    [2022/05/05 16:08:54] ppocr INFO:         num_workers : 4
    [2022/05/05 16:08:54] ppocr INFO:         shuffle : True
    [2022/05/05 16:08:54] ppocr INFO: profiler_options : None
    [2022/05/05 16:08:54] ppocr INFO: train with paddle 2.2.2 and device CUDAPlace(0)
    [2022/05/05 16:08:54] ppocr INFO: Initialize indexs of datasets:['./train_data/ic15_data/rec_gt_test.txt']
    W0505 16:08:54.870386 10834 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
    W0505 16:08:54.875574 10834 device_context.cc:465] device: 0, cuDNN Version: 7.6.
    [2022/05/05 16:08:59] ppocr INFO: resume from ./output/rec_ppocr_v3/best_accuracy
    [2022/05/05 16:08:59] ppocr INFO: metric in ckpt ***************
    [2022/05/05 16:08:59] ppocr INFO: acc:0.5551275851943784
    [2022/05/05 16:08:59] ppocr INFO: norm_edit_dis:0.8100207578002598
    [2022/05/05 16:08:59] ppocr INFO: fps:1855.1658462248745
    [2022/05/05 16:08:59] ppocr INFO: best_epoch:36
    [2022/05/05 16:08:59] ppocr INFO: start_epoch:37
    eval model::   0%|          | 0/17 [00:00<?, ?it/s]
    eval model::   6%|▌         | 1/17 [00:00<00:14,  1.08it/s]
    eval model::  18%|█▊        | 3/17 [00:01<00:09,  1.49it/s]
    eval model::  29%|██▉       | 5/17 [00:01<00:05,  2.04it/s]
    eval model::  41%|████      | 7/17 [00:01<00:03,  2.73it/s]
    eval model::  53%|█████▎    | 9/17 [00:01<00:02,  3.59it/s]
    eval model::  65%|██████▍   | 11/17 [00:01<00:01,  4.72it/s]
    eval model::  76%|███████▋  | 13/17 [00:01<00:00,  6.07it/s]
    eval model::  88%|████████▊ | 15/17 [00:01<00:00,  7.59it/s]
    eval model:: 100%|██████████| 17/17 [00:02<00:00,  8.22it/s]
    [2022/05/05 16:09:01] ppocr INFO: metric eval ***************
    [2022/05/05 16:09:01] ppocr INFO: acc:0.5551275851943784
    [2022/05/05 16:09:01] ppocr INFO: norm_edit_dis:0.8100207578002598
    [2022/05/05 16:09:01] ppocr INFO: fps:2231.024193160108
    INFO 2022-05-05 16:09:05,136 launch.py:311] Local processes completed.
    

    2.3.2 测试识别效果

    使用 PaddleOCR 训练好的模型,可以通过以下脚本进行快速预测。

    默认预测图片存储在 infer_img 里,通过 -o Global.checkpoints 加载训练好的参数文件:

    根据配置文件中设置的的 save_model_dir 和 save_epoch_step 字段,会有以下几种参数被保存下来:

    output/rec/
    ├── best_accuracy.pdopt  
    ├── best_accuracy.pdparams  
    ├── best_accuracy.states  
    ├── config.yml  
    ├── iter_epoch_3.pdopt  
    ├── iter_epoch_3.pdparams  
    ├── iter_epoch_3.states  
    ├── latest.pdopt  
    ├── latest.pdparams  
    ├── latest.states  
    └── train.log
    

    其中 best_accuracy.* 是评估集上的最优模型;iter_epoch_x.* 是以 save_epoch_step 为间隔保存下来的模型;latest.* 是最后一个epoch的模型。

    In [23]

    # 预测英文结果
    # GPU预测
    !python3 tools/infer_rec.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o Global.pretrained_model=./output/rec_ppocr_v3/best_accuracy Global.character_dict_path=ppocr/utils/en_dict.txt Global.infer_img=doc/imgs_words/en/word_1.png
    [2022/05/05 16:09:44] ppocr INFO: Architecture : 
    [2022/05/05 16:09:44] ppocr INFO:     Backbone : 
    [2022/05/05 16:09:44] ppocr INFO:         last_conv_stride : [1, 2]
    [2022/05/05 16:09:44] ppocr INFO:         last_pool_type : avg
    [2022/05/05 16:09:44] ppocr INFO:         name : MobileNetV1Enhance
    [2022/05/05 16:09:44] ppocr INFO:         scale : 0.5
    [2022/05/05 16:09:44] ppocr INFO:     Head : 
    [2022/05/05 16:09:44] ppocr INFO:         head_list : 
    [2022/05/05 16:09:44] ppocr INFO:             CTCHead : 
    [2022/05/05 16:09:44] ppocr INFO:                 Head : 
    [2022/05/05 16:09:44] ppocr INFO:                     fc_decay : 1e-05
    [2022/05/05 16:09:44] ppocr INFO:                 Neck : 
    [2022/05/05 16:09:44] ppocr INFO:                     depth : 2
    [2022/05/05 16:09:44] ppocr INFO:                     dims : 64
    [2022/05/05 16:09:44] ppocr INFO:                     hidden_dims : 120
    [2022/05/05 16:09:44] ppocr INFO:                     name : svtr
    [2022/05/05 16:09:44] ppocr INFO:                     use_guide : True
    [2022/05/05 16:09:44] ppocr INFO:             SARHead : 
    [2022/05/05 16:09:44] ppocr INFO:                 enc_dim : 512
    [2022/05/05 16:09:44] ppocr INFO:                 max_text_length : 25
    [2022/05/05 16:09:44] ppocr INFO:         name : MultiHead
    [2022/05/05 16:09:44] ppocr INFO:     Transform : None
    [2022/05/05 16:09:44] ppocr INFO:     algorithm : SVTR
    [2022/05/05 16:09:44] ppocr INFO:     model_type : rec
    [2022/05/05 16:09:44] ppocr INFO: Eval : 
    [2022/05/05 16:09:44] ppocr INFO:     dataset : 
    [2022/05/05 16:09:44] ppocr INFO:         data_dir : ./train_data/ic15_data
    [2022/05/05 16:09:44] ppocr INFO:         label_file_list : ['./train_data/ic15_data/rec_gt_test.txt']
    [2022/05/05 16:09:44] ppocr INFO:         name : SimpleDataSet
    [2022/05/05 16:09:44] ppocr INFO:         transforms : 
    [2022/05/05 16:09:44] ppocr INFO:             DecodeImage : 
    [2022/05/05 16:09:44] ppocr INFO:                 channel_first : False
    [2022/05/05 16:09:44] ppocr INFO:                 img_mode : BGR
    [2022/05/05 16:09:44] ppocr INFO:             MultiLabelEncode : None
    [2022/05/05 16:09:44] ppocr INFO:             RecResizeImg : 
    [2022/05/05 16:09:44] ppocr INFO:                 image_shape : [3, 48, 320]
    [2022/05/05 16:09:44] ppocr INFO:             KeepKeys : 
    [2022/05/05 16:09:44] ppocr INFO:                 keep_keys : ['image', 'label_ctc', 'label_sar', 'length', 'valid_ratio']
    [2022/05/05 16:09:44] ppocr INFO:     loader : 
    [2022/05/05 16:09:44] ppocr INFO:         batch_size_per_card : 128
    [2022/05/05 16:09:44] ppocr INFO:         drop_last : False
    [2022/05/05 16:09:44] ppocr INFO:         num_workers : 4
    [2022/05/05 16:09:44] ppocr INFO:         shuffle : False
    [2022/05/05 16:09:44] ppocr INFO: Global : 
    [2022/05/05 16:09:44] ppocr INFO:     cal_metric_during_train : True
    [2022/05/05 16:09:44] ppocr INFO:     character_dict_path : ppocr/utils/en_dict.txt
    [2022/05/05 16:09:44] ppocr INFO:     checkpoints : None
    [2022/05/05 16:09:44] ppocr INFO:     debug : False
    [2022/05/05 16:09:44] ppocr INFO:     distributed : False
    [2022/05/05 16:09:44] ppocr INFO:     epoch_num : 500
    [2022/05/05 16:09:44] ppocr INFO:     eval_batch_step : [0, 2000]
    [2022/05/05 16:09:44] ppocr INFO:     infer_img : doc/imgs_words/en/word_1.png
    [2022/05/05 16:09:44] ppocr INFO:     infer_mode : False
    [2022/05/05 16:09:44] ppocr INFO:     log_smooth_window : 20
    [2022/05/05 16:09:44] ppocr INFO:     max_text_length : 25
    [2022/05/05 16:09:44] ppocr INFO:     pretrained_model : ./output/rec_ppocr_v3/best_accuracy
    [2022/05/05 16:09:44] ppocr INFO:     print_batch_step : 10
    [2022/05/05 16:09:44] ppocr INFO:     save_epoch_step : 3
    [2022/05/05 16:09:44] ppocr INFO:     save_inference_dir : None
    [2022/05/05 16:09:44] ppocr INFO:     save_model_dir : ./output/rec_ppocr_v3
    [2022/05/05 16:09:44] ppocr INFO:     save_res_path : ./output/rec/predicts_ppocrv3.txt
    [2022/05/05 16:09:44] ppocr INFO:     use_gpu : True
    [2022/05/05 16:09:44] ppocr INFO:     use_space_char : True
    [2022/05/05 16:09:44] ppocr INFO:     use_visualdl : False
    [2022/05/05 16:09:44] ppocr INFO: Loss : 
    [2022/05/05 16:09:44] ppocr INFO:     loss_config_list : 
    [2022/05/05 16:09:44] ppocr INFO:         CTCLoss : None
    [2022/05/05 16:09:44] ppocr INFO:         SARLoss : None
    [2022/05/05 16:09:44] ppocr INFO:     name : MultiLoss
    [2022/05/05 16:09:44] ppocr INFO: Metric : 
    [2022/05/05 16:09:44] ppocr INFO:     ignore_space : False
    [2022/05/05 16:09:44] ppocr INFO:     main_indicator : acc
    [2022/05/05 16:09:44] ppocr INFO:     name : RecMetric
    [2022/05/05 16:09:44] ppocr INFO: Optimizer : 
    [2022/05/05 16:09:44] ppocr INFO:     beta1 : 0.9
    [2022/05/05 16:09:44] ppocr INFO:     beta2 : 0.999
    [2022/05/05 16:09:44] ppocr INFO:     lr : 
    [2022/05/05 16:09:44] ppocr INFO:         learning_rate : 0.001
    [2022/05/05 16:09:44] ppocr INFO:         name : Cosine
    [2022/05/05 16:09:44] ppocr INFO:         warmup_epoch : 5
    [2022/05/05 16:09:44] ppocr INFO:     name : Adam
    [2022/05/05 16:09:44] ppocr INFO:     regularizer : 
    [2022/05/05 16:09:44] ppocr INFO:         factor : 3e-05
    [2022/05/05 16:09:44] ppocr INFO:         name : L2
    [2022/05/05 16:09:44] ppocr INFO: PostProcess : 
    [2022/05/05 16:09:44] ppocr INFO:     name : CTCLabelDecode
    [2022/05/05 16:09:44] ppocr INFO: Train : 
    [2022/05/05 16:09:44] ppocr INFO:     dataset : 
    [2022/05/05 16:09:44] ppocr INFO:         data_dir : ./train_data/ic15_data/
    [2022/05/05 16:09:44] ppocr INFO:         ext_op_transform_idx : 1
    [2022/05/05 16:09:44] ppocr INFO:         label_file_list : ['./train_data/ic15_data/rec_gt_train.txt']
    [2022/05/05 16:09:44] ppocr INFO:         name : SimpleDataSet
    [2022/05/05 16:09:44] ppocr INFO:         transforms : 
    [2022/05/05 16:09:44] ppocr INFO:             DecodeImage : 
    [2022/05/05 16:09:44] ppocr INFO:                 channel_first : False
    [2022/05/05 16:09:44] ppocr INFO:                 img_mode : BGR
    [2022/05/05 16:09:44] ppocr INFO:             MultiLabelEncode : None
    [2022/05/05 16:09:44] ppocr INFO:             RecResizeImg : 
    [2022/05/05 16:09:44] ppocr INFO:                 image_shape : [3, 48, 320]
    [2022/05/05 16:09:44] ppocr INFO:             KeepKeys : 
    [2022/05/05 16:09:44] ppocr INFO:                 keep_keys : ['image', 'label_ctc', 'label_sar', 'length', 'valid_ratio']
    [2022/05/05 16:09:44] ppocr INFO:     loader : 
    [2022/05/05 16:09:44] ppocr INFO:         batch_size_per_card : 128
    [2022/05/05 16:09:44] ppocr INFO:         drop_last : True
    [2022/05/05 16:09:44] ppocr INFO:         num_workers : 4
    [2022/05/05 16:09:44] ppocr INFO:         shuffle : True
    [2022/05/05 16:09:44] ppocr INFO: profiler_options : None
    [2022/05/05 16:09:44] ppocr INFO: train with paddle 2.2.2 and device CUDAPlace(0)
    W0505 16:09:44.179414 10923 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
    W0505 16:09:44.184576 10923 device_context.cc:465] device: 0, cuDNN Version: 7.6.
    [2022/05/05 16:09:48] ppocr INFO: load pretrain successful from ./output/rec_ppocr_v3/best_accuracy
    [2022/05/05 16:09:48] ppocr INFO: infer_img: doc/imgs_words/en/word_1.png
    [2022/05/05 16:09:48] ppocr INFO: 	 result: JOINT	0.9950313568115234
    [2022/05/05 16:09:48] ppocr INFO: success!
    

    预测图片:

    预测使用的配置文件必须与训练一致.

    测试文件夹下所有图像的文字识别效果

    In [24]

    # GPU预测
    !python3 tools/infer_rec.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o Global.pretrained_model=./output/rec_ppocr_v3/best_accuracy Global.character_dict_path=ppocr/utils/en_dict.txt Global.infer_img=./doc/imgs_words_en/
    [2022/05/05 16:09:59] ppocr INFO: Architecture : 
    [2022/05/05 16:09:59] ppocr INFO:     Backbone : 
    [2022/05/05 16:09:59] ppocr INFO:         last_conv_stride : [1, 2]
    [2022/05/05 16:09:59] ppocr INFO:         last_pool_type : avg
    [2022/05/05 16:09:59] ppocr INFO:         name : MobileNetV1Enhance
    [2022/05/05 16:09:59] ppocr INFO:         scale : 0.5
    [2022/05/05 16:09:59] ppocr INFO:     Head : 
    [2022/05/05 16:09:59] ppocr INFO:         head_list : 
    [2022/05/05 16:09:59] ppocr INFO:             CTCHead : 
    [2022/05/05 16:09:59] ppocr INFO:                 Head : 
    [2022/05/05 16:09:59] ppocr INFO:                     fc_decay : 1e-05
    [2022/05/05 16:09:59] ppocr INFO:                 Neck : 
    [2022/05/05 16:09:59] ppocr INFO:                     depth : 2
    [2022/05/05 16:09:59] ppocr INFO:                     dims : 64
    [2022/05/05 16:09:59] ppocr INFO:                     hidden_dims : 120
    [2022/05/05 16:09:59] ppocr INFO:                     name : svtr
    [2022/05/05 16:09:59] ppocr INFO:                     use_guide : True
    [2022/05/05 16:09:59] ppocr INFO:             SARHead : 
    [2022/05/05 16:09:59] ppocr INFO:                 enc_dim : 512
    [2022/05/05 16:09:59] ppocr INFO:                 max_text_length : 25
    [2022/05/05 16:09:59] ppocr INFO:         name : MultiHead
    [2022/05/05 16:09:59] ppocr INFO:     Transform : None
    [2022/05/05 16:09:59] ppocr INFO:     algorithm : SVTR
    [2022/05/05 16:09:59] ppocr INFO:     model_type : rec
    [2022/05/05 16:09:59] ppocr INFO: Eval : 
    [2022/05/05 16:09:59] ppocr INFO:     dataset : 
    [2022/05/05 16:09:59] ppocr INFO:         data_dir : ./train_data/ic15_data
    [2022/05/05 16:09:59] ppocr INFO:         label_file_list : ['./train_data/ic15_data/rec_gt_test.txt']
    [2022/05/05 16:09:59] ppocr INFO:         name : SimpleDataSet
    [2022/05/05 16:09:59] ppocr INFO:         transforms : 
    [2022/05/05 16:09:59] ppocr INFO:             DecodeImage : 
    [2022/05/05 16:09:59] ppocr INFO:                 channel_first : False
    [2022/05/05 16:09:59] ppocr INFO:                 img_mode : BGR
    [2022/05/05 16:09:59] ppocr INFO:             MultiLabelEncode : None
    [2022/05/05 16:09:59] ppocr INFO:             RecResizeImg : 
    [2022/05/05 16:09:59] ppocr INFO:                 image_shape : [3, 48, 320]
    [2022/05/05 16:09:59] ppocr INFO:             KeepKeys : 
    [2022/05/05 16:09:59] ppocr INFO:                 keep_keys : ['image', 'label_ctc', 'label_sar', 'length', 'valid_ratio']
    [2022/05/05 16:09:59] ppocr INFO:     loader : 
    [2022/05/05 16:09:59] ppocr INFO:         batch_size_per_card : 128
    [2022/05/05 16:09:59] ppocr INFO:         drop_last : False
    [2022/05/05 16:09:59] ppocr INFO:         num_workers : 4
    [2022/05/05 16:09:59] ppocr INFO:         shuffle : False
    [2022/05/05 16:09:59] ppocr INFO: Global : 
    [2022/05/05 16:09:59] ppocr INFO:     cal_metric_during_train : True
    [2022/05/05 16:09:59] ppocr INFO:     character_dict_path : ppocr/utils/en_dict.txt
    [2022/05/05 16:09:59] ppocr INFO:     checkpoints : None
    [2022/05/05 16:09:59] ppocr INFO:     debug : False
    [2022/05/05 16:09:59] ppocr INFO:     distributed : False
    [2022/05/05 16:09:59] ppocr INFO:     epoch_num : 500
    [2022/05/05 16:09:59] ppocr INFO:     eval_batch_step : [0, 2000]
    [2022/05/05 16:09:59] ppocr INFO:     infer_img : ./doc/imgs_words_en/
    [2022/05/05 16:09:59] ppocr INFO:     infer_mode : False
    [2022/05/05 16:09:59] ppocr INFO:     log_smooth_window : 20
    [2022/05/05 16:09:59] ppocr INFO:     max_text_length : 25
    [2022/05/05 16:09:59] ppocr INFO:     pretrained_model : ./output/rec_ppocr_v3/best_accuracy
    [2022/05/05 16:09:59] ppocr INFO:     print_batch_step : 10
    [2022/05/05 16:09:59] ppocr INFO:     save_epoch_step : 3
    [2022/05/05 16:09:59] ppocr INFO:     save_inference_dir : None
    [2022/05/05 16:09:59] ppocr INFO:     save_model_dir : ./output/rec_ppocr_v3
    [2022/05/05 16:09:59] ppocr INFO:     save_res_path : ./output/rec/predicts_ppocrv3.txt
    [2022/05/05 16:09:59] ppocr INFO:     use_gpu : True
    [2022/05/05 16:09:59] ppocr INFO:     use_space_char : True
    [2022/05/05 16:09:59] ppocr INFO:     use_visualdl : False
    [2022/05/05 16:09:59] ppocr INFO: Loss : 
    [2022/05/05 16:09:59] ppocr INFO:     loss_config_list : 
    [2022/05/05 16:09:59] ppocr INFO:         CTCLoss : None
    [2022/05/05 16:09:59] ppocr INFO:         SARLoss : None
    [2022/05/05 16:09:59] ppocr INFO:     name : MultiLoss
    [2022/05/05 16:09:59] ppocr INFO: Metric : 
    [2022/05/05 16:09:59] ppocr INFO:     ignore_space : False
    [2022/05/05 16:09:59] ppocr INFO:     main_indicator : acc
    [2022/05/05 16:09:59] ppocr INFO:     name : RecMetric
    [2022/05/05 16:09:59] ppocr INFO: Optimizer : 
    [2022/05/05 16:09:59] ppocr INFO:     beta1 : 0.9
    [2022/05/05 16:09:59] ppocr INFO:     beta2 : 0.999
    [2022/05/05 16:09:59] ppocr INFO:     lr : 
    [2022/05/05 16:09:59] ppocr INFO:         learning_rate : 0.001
    [2022/05/05 16:09:59] ppocr INFO:         name : Cosine
    [2022/05/05 16:09:59] ppocr INFO:         warmup_epoch : 5
    [2022/05/05 16:09:59] ppocr INFO:     name : Adam
    [2022/05/05 16:09:59] ppocr INFO:     regularizer : 
    [2022/05/05 16:09:59] ppocr INFO:         factor : 3e-05
    [2022/05/05 16:09:59] ppocr INFO:         name : L2
    [2022/05/05 16:09:59] ppocr INFO: PostProcess : 
    [2022/05/05 16:09:59] ppocr INFO:     name : CTCLabelDecode
    [2022/05/05 16:09:59] ppocr INFO: Train : 
    [2022/05/05 16:09:59] ppocr INFO:     dataset : 
    [2022/05/05 16:09:59] ppocr INFO:         data_dir : ./train_data/ic15_data/
    [2022/05/05 16:09:59] ppocr INFO:         ext_op_transform_idx : 1
    [2022/05/05 16:09:59] ppocr INFO:         label_file_list : ['./train_data/ic15_data/rec_gt_train.txt']
    [2022/05/05 16:09:59] ppocr INFO:         name : SimpleDataSet
    [2022/05/05 16:09:59] ppocr INFO:         transforms : 
    [2022/05/05 16:09:59] ppocr INFO:             DecodeImage : 
    [2022/05/05 16:09:59] ppocr INFO:                 channel_first : False
    [2022/05/05 16:09:59] ppocr INFO:                 img_mode : BGR
    [2022/05/05 16:09:59] ppocr INFO:             MultiLabelEncode : None
    [2022/05/05 16:09:59] ppocr INFO:             RecResizeImg : 
    [2022/05/05 16:09:59] ppocr INFO:                 image_shape : [3, 48, 320]
    [2022/05/05 16:09:59] ppocr INFO:             KeepKeys : 
    [2022/05/05 16:09:59] ppocr INFO:                 keep_keys : ['image', 'label_ctc', 'label_sar', 'length', 'valid_ratio']
    [2022/05/05 16:09:59] ppocr INFO:     loader : 
    [2022/05/05 16:09:59] ppocr INFO:         batch_size_per_card : 128
    [2022/05/05 16:09:59] ppocr INFO:         drop_last : True
    [2022/05/05 16:09:59] ppocr INFO:         num_workers : 4
    [2022/05/05 16:09:59] ppocr INFO:         shuffle : True
    [2022/05/05 16:09:59] ppocr INFO: profiler_options : None
    [2022/05/05 16:09:59] ppocr INFO: train with paddle 2.2.2 and device CUDAPlace(0)
    W0505 16:09:59.636674 10950 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
    W0505 16:09:59.641562 10950 device_context.cc:465] device: 0, cuDNN Version: 7.6.
    [2022/05/05 16:10:04] ppocr INFO: load pretrain successful from ./output/rec_ppocr_v3/best_accuracy
    [2022/05/05 16:10:04] ppocr INFO: infer_img: ./doc/imgs_words_en/word_10.png
    [2022/05/05 16:10:04] ppocr INFO: 	 result: PAIN	0.9976047277450562
    [2022/05/05 16:10:04] ppocr INFO: infer_img: ./doc/imgs_words_en/word_116.png
    [2022/05/05 16:10:04] ppocr INFO: 	 result: QBHOUSE	0.9709253311157227
    [2022/05/05 16:10:04] ppocr INFO: infer_img: ./doc/imgs_words_en/word_19.png
    [2022/05/05 16:10:04] ppocr INFO: 	 result: SLOW	0.9971550703048706
    [2022/05/05 16:10:04] ppocr INFO: infer_img: ./doc/imgs_words_en/word_201.png
    [2022/05/05 16:10:04] ppocr INFO: 	 result: HOUSE	0.9960419535636902
    [2022/05/05 16:10:04] ppocr INFO: infer_img: ./doc/imgs_words_en/word_308.png
    [2022/05/05 16:10:04] ppocr INFO: 	 result: LITTLE	0.9545474052429199
    [2022/05/05 16:10:04] ppocr INFO: infer_img: ./doc/imgs_words_en/word_336.png
    [2022/05/05 16:10:04] ppocr INFO: 	 result: SUPER	0.9802681803703308
    [2022/05/05 16:10:04] ppocr INFO: infer_img: ./doc/imgs_words_en/word_401.png
    [2022/05/05 16:10:04] ppocr INFO: 	 result: BURGE	0.827716052532196
    [2022/05/05 16:10:04] ppocr INFO: infer_img: ./doc/imgs_words_en/word_461.png
    [2022/05/05 16:10:04] ppocr INFO: 	 result: SPED	0.912112832069397
    [2022/05/05 16:10:04] ppocr INFO: infer_img: ./doc/imgs_words_en/word_52.png
    [2022/05/05 16:10:04] ppocr INFO: 	 result: Future	0.9685637354850769
    [2022/05/05 16:10:04] ppocr INFO: infer_img: ./doc/imgs_words_en/word_545.png
    [2022/05/05 16:10:04] ppocr INFO: 	 result: EORIT	0.9076364636421204
    [2022/05/05 16:10:04] ppocr INFO: success!
    

    2.4. 模型导出与预测

    inference 模型(paddle.jit.save保存的模型) 一般是模型训练,把模型结构和模型参数保存在文件中的固化模型,多用于预测部署场景。 训练过程中保存的模型是checkpoints模型,保存的只有模型的参数,多用于恢复训练等。 与checkpoints模型相比,inference 模型会额外保存模型的结构信息,在预测部署、加速推理上性能优越,灵活方便,适合于实际系统集成。

    识别模型转inference模型与检测的方式相同,如下:

    In [25]

    # -c 后面设置训练算法的yml配置文件
    # -o 配置可选参数
    # Global.pretrained_model 参数设置待转换的训练模型地址,不用添加文件后缀 .pdmodel,.pdopt或.pdparams。
    # Global.save_inference_dir参数设置转换的模型将保存的地址。
    !python3 tools/export_model.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o Global.pretrained_model=./output/rec_ppocr_v3/best_accuracy Global.character_dict_path=ppocr/utils/en_dict.txt Global.save_inference_dir=./inference/ch_PP-OCRv3_rec/
    
    W0505 16:10:41.224627 11024 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
    W0505 16:10:41.229585 11024 device_context.cc:465] device: 0, cuDNN Version: 7.6.
    [2022/05/05 16:10:45] ppocr INFO: load pretrain successful from ./output/rec_ppocr_v3/best_accuracy
    [2022/05/05 16:10:47] ppocr INFO: inference model is saved to ./inference/ch_PP-OCRv3_rec/inference
    

    **注意:**如果您是在自己的数据集上训练的模型,并且调整了中文字符的字典文件,请注意修改配置文件中的character_dict_path为自定义字典文件。

    转换成功后,在目录下有三个文件:

    inference/ch_PP-OCRv3_rec/
        ├── inference.pdiparams         # 识别inference模型的参数文件
        ├── inference.pdiparams.info    # 识别inference模型的参数信息,可忽略
        └── inference.pdmodel           # 识别inference模型的program文件
    
    • 自定义模型推理

      如果训练时修改了文本的字典,在使用inference模型预测时,需要通过--rec_char_dict_path指定使用的字典路径

    • 使用PP-OCRv3识别进行推理时,不需要使用--rec_algorithm指定算法名称,使用默认的推理方式即为PP-OCRv3识别的推理过程。

    In [27]

    # GPU预测
    !python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir=./inference/ch_PP-OCRv3_rec/ --rec_char_dict_path=ppocr/utils/en_dict.txt
    # CPU预测
    !python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir=./inference/ch_PP-OCRv3_rec/ --rec_char_dict_path=ppocr/utils/en_dict.txt --use_gpu=False
    [2022/05/05 16:13:50] ppocr INFO: Predicts of ./doc/imgs_words_en/word_336.png:('SUPER', 0.9802668690681458)
    [2022/05/05 16:13:53] ppocr INFO: Predicts of ./doc/imgs_words_en/word_336.png:('SUPER', 0.9802700281143188)
    

    推理预测的图片为:

    3. FAQ

    Q1: 训练模型转inference 模型之后预测效果不一致?

    A:此类问题出现较多,问题多是trained model预测时候的预处理、后处理参数和inference model预测的时候的预处理、后处理参数不一致导致

    Q2: 如何自定义字典、修改backbone、训练多语言模型?

    A:请参考PP-OCRv3识别详细教程

    4. 直播预告

    🔥2022.5.11~13 每晚8:30【超强OCR技术详解与产业应用实战】三日直播课

    • 11日:开源最强OCR系统PP-OCRv3揭秘
    • 12日:云边端全覆盖的PP-OCRv3训练部署实战
    • 13日:OCR产业应用全流程拆解与实战 赶紧扫码报名吧!
    转自AI Studio,原文链接:【官方】十分钟完成 PP-OCRv3 识别全流程实战 - 飞桨AI Studio十分钟完成 PP-OCRv3 识别全流程实战项目地址:PaddleOCR github 地址:https://github.com/PaddlePaddle/PaddleOCRPaddleOCR是百度开源的超轻量级OCR模型库,提供了数十种文本检测、识别模型,旨在打造一套丰富、领先、实用的文字检测、识别模型/工具库,助力使用者训练出更好的模型,并应用落地。同时PaddleOC.
    接上一篇:检测模型训练(二) PaddlePaddle环境的构建详见专栏内其他文章。 本文使用MobileNetV3_large_x0_5_pretrained预训练检测模型,评估该检测模型在icdar2015上的检测效果。 icdar2015检测数据集如上图所示。 首先修改配置文件,文件路径如下图所示 这是MobileNetV3_large_x0_5_pretrained模型的配置文件,如果用的是其他模型,请使用其他的.yml配置文件。 打开.yml配置文件,在Architecture标签下可以看到
    登录飞桨的官网下载最新的paddle,官网地址:飞桨PaddlePaddle-源于产业实践的开源深度学习平台 选择合适的CUDA版本,然后会在下面生成对应的命令。 然后,复制命令即可 conda install paddlepaddle-gpu==2.2.2 cudatoolkit=11.2 -c https
    智能驾驶 车牌检测和识别(五)《C++实现车牌检测和识别(可实时车牌识别)》:https://blog.csdn.net/guyuealian/article/details/128704276 更多项目《智能驾驶 车牌检测和识别》系列文章请参考: 智能驾驶 车牌检测和识别(一)《CCPD车牌数据集》:https://blog.csdn.net/guyuealian/article/details/128704181 智能驾驶 车牌检测和识别(二)《YOLOv5实现车牌检测(含车牌检测数据集和训练代码)》:https://blog.csdn.net/guyuealian/article/details/128704068 智能驾驶 车牌检测和识别(三)《CRNN和LPRNet实现车牌识别(含车牌识别数据集和训练代码)》:https://blog.csdn.net/guyuealian/article/details/128704209 智能驾驶 车牌检测和识别(四)《Android实现车牌检测和识别(可实时车牌识别)》:https://blog.csdn.net/guyueali
    1、本文基于上一篇文章:关于提高OCR识别准确率的一些优化(二)进行了一些优化,将图片方向识别准确率提升至96%。 2、在阅读这篇文章之前,建议先看上一篇,以便更好的理解 一、优化思路 1、在上一篇文章中,我们使用paddleocr的方向分类器直接判别图片方向,发现效果并不怎么好,而且效率也很低,识别一张图片平均耗时2s。 2、鉴于以上存在的问题,于是想出了一个新的优化方案: 使用paddleocr的文本矩形框检测得到所 远程传导的方式:在PC的终端中输入 scp -r <PC端文件路径> <ARM端用户名>@<ARMIP>:<ARM保存路径> 例如:scp -r G:/opencv ubuntu@192.168.233.1.3:/home/ubuntu/ (加一个-r是因为远程上传的是含有多级目录的文件夹) ARM端的环境变量需要添加我们的代码文件路径(因为用到了某些包在
    PP-OCR: A Practical Ultra Lightweight OCR System 论文地址:https://arxiv.org/abs/2009.09941 代码地址:https://github.com/PaddlePaddle/PaddleOCR PP-OCR是一个实用的超轻量中英文OCR系统,是针对中英文OCR问题,对最新的文本检测算法 Differentiable Binarization (DB) 和经典的文本识别算法CRNN的能力充分挖掘,虽然没有理论创新,但是从骨干.
    解决报错:pytesseract.pytesseract.TesseractError 安装后的默认文件路径为(这里使用的是Windows版本):C:\Program Files (x86)\Tesseract-OCR\ 然后将源码中的: tesseract_cmd = 'tesseract' tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'
    ch_pp-ocrv3_rec_train是一个用于中文文本识别训练的开源框架,它基于PyTorch实现,提供了多种预处理,数据增强和模型优化的方法,可以用于训练自己的中文OCR模型。其训练过程主要分为数据准备、模型定义、模型训练和模型评估几个步骤,能够构建出高精度的中文OCR模型,为OCR在实际应用中提供了有力的支持。 而ch_ppocr_mobile_v2.0_rec_pre是一个移动端中文文本识别预测模型,主要针对手机等移动端设备,采用了轻量化的模型结构和精简的参数,保证了高效的预测速度和较高的识别准确性。它支持的输入图像类型包括常见的jpg、png等格式,可以实现图片批量处理和在线图片预测等功能,适合于移动端OCR场景中的文字识别任务。 综上,ch_pp-ocrv3_rec_train和ch_ppocr_mobile_v2.0_rec_pre分别是中文OCR训练和预测的工具。ch_pp-ocrv3_rec_train可以用于训练自己的OCR模型,达到高精度的识别效果;ch_ppocr_mobile_v2.0_rec_pre则可以用于移动端OCR应用中,快速、准确地识别图片上的中文文字。