A Hands-On Walkthrough of the Full PP-OCRv3 Recognition Pipeline in Ten Minutes
Project page: PaddleOCR on GitHub:
https://github.com/PaddlePaddle/PaddleOCR
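PaddleOCR is Baidu's open-source library of ultra-lightweight OCR models. It provides dozens of text detection and recognition models and aims to be a rich, leading, and practical model and tool library for text detection and recognition, helping users train better models and put them into production. PaddleOCR has gone through several updates.
🔥 On 2022.5.9 the latest version, PaddleOCR release/2.5, was published:
- PP-OCRv3 was released: at comparable speed, accuracy improves by 5% over PP-OCRv2 in Chinese scenarios and by 11% in English scenarios, and the average recognition accuracy of the 80-language multilingual models improves by more than 5%;
- The semi-automatic annotation tool PPOCRLabelv2 was released: it adds annotation support for table text images, key information extraction from images, and irregular text images;
- An OCR industrial deployment toolkit was released: it covers 22 training and deployment hardware/software environments and methods, meeting 90% of enterprise training and deployment needs;
- The interactive open-source OCR e-book "Dive into OCR" (《动手学OCR》) was released: it covers cutting-edge theory and code practice across the full OCR stack and comes with teaching videos.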
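This tutorial helps users quickly get to know PP-OCRv3 recognition and learn how to use it, including:
- Quick start with PP-OCRv3 recognition
- Training a text recognition model and running prediction in ten minutes
It closes with a preview of the PP-OCRv3 livestream. Stay tuned!
1 Quick start with PP-OCRv3 recognition
This section shows how to use PaddleOCR's lightweight model for text recognition.
1.1 Preparing the runtime environment
First, install the libraries PaddleOCR depends on.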
In [ ]
import os
# Change the default working directory to /home/aistudio/
os.chdir("/home/aistudio")
# If git clone is slow, download the zip archive of PaddleOCR's dygraph branch from GitHub, upload it to the working environment, and unzip it
#!unzip PaddleOCR-dygraph.zip
!git clone -b dygraph https://github.com/PaddlePaddle/PaddleOCR.git
In [ ]
# Install the dependencies
os.chdir("/home/aistudio/PaddleOCR")
!pip install -r requirements.txt -i https://mirror.baidu.com/pypi/simple
1.2. Quick text prediction
Test image:
In [8]
import os
os.chdir('/home/aistudio/PaddleOCR')
# Alternatively, install the paddleocr whl package for quick use
# !pip install paddleocr
from paddleocr import PaddleOCR
ocr = PaddleOCR()  # need to run only once to download and load model into memory
img_path = '/home/aistudio/PaddleOCR/doc/imgs_words/en/word_1.png'
result = ocr.ocr(img_path, det=False)
for line in result:
    print(line)
0%| | 0.00/3.67M [00:00<?, ?iB/s]
download https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar to /home/aistudio/.paddleocr/whl/det/ch/ch_PP-OCRv3_det_infer/ch_PP-OCRv3_det_infer.tar
100%|██████████| 3.67M/3.67M [00:00<00:00, 6.36MiB/s]
0%| | 0.00/11.9M [00:00<?, ?iB/s]
download https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar to /home/aistudio/.paddleocr/whl/rec/ch/ch_PP-OCRv3_rec_infer/ch_PP-OCRv3_rec_infer.tar
100%|██████████| 11.9M/11.9M [00:00<00:00, 42.0MiB/s]
19%|█▉ | 279k/1.45M [00:00<00:00, 2.67MiB/s]
download https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar to /home/aistudio/.paddleocr/whl/cls/ch_ppocr_mobile_v2.0_cls_infer/ch_ppocr_mobile_v2.0_cls_infer.tar
100%|██████████| 1.45M/1.45M [00:00<00:00, 4.66MiB/s]
[2022/05/05 14:53:56] ppocr DEBUG: Namespace(alpha=1.0, benchmark=False, beta=1.0, cls_batch_num=6, cls_image_shape='3, 48, 192', cls_model_dir='/home/aistudio/.paddleocr/whl/cls/ch_ppocr_mobile_v2.0_cls_infer', cls_thresh=0.9, cpu_threads=10, crop_res_save_dir='./output', det=True, det_algorithm='DB', det_db_box_thresh=0.6, det_db_score_mode='fast', det_db_thresh=0.3, det_db_unclip_ratio=1.5, det_east_cover_thresh=0.1, det_east_nms_thresh=0.2, det_east_score_thresh=0.8, det_fce_box_type='poly', det_limit_side_len=960, det_limit_type='max', det_model_dir='/home/aistudio/.paddleocr/whl/det/ch/ch_PP-OCRv3_det_infer', det_pse_box_thresh=0.85, det_pse_box_type='quad', det_pse_min_area=16, det_pse_scale=1, det_pse_thresh=0, det_sast_nms_thresh=0.2, det_sast_polygon=False, det_sast_score_thresh=0.5, draw_img_save_dir='./inference_results', drop_score=0.5, e2e_algorithm='PGNet', e2e_char_dict_path='./ppocr/utils/ic15_dict.txt', e2e_limit_side_len=768, e2e_limit_type='max', e2e_model_dir=None, e2e_pgnet_mode='fast', e2e_pgnet_score_thresh=0.5, e2e_pgnet_valid_set='totaltext', enable_mkldnn=False, fourier_degree=5, gpu_mem=500, help='==SUPPRESS==', image_dir=None, ir_optim=True, label_list=['0', '180'], lang='ch', layout=True, layout_label_map=None, layout_path_model='lp://PubLayNet/ppyolov2_r50vd_dcn_365e_publaynet/config', max_batch_size=10, max_text_length=25, min_subgraph_size=15, mode='structure', ocr=True, ocr_version='PP-OCRv3', output='./output', precision='fp32', process_id=0, rec=True, rec_algorithm='CRNN', rec_batch_num=6, rec_char_dict_path='/home/aistudio/PaddleOCR/ppocr/utils/ppocr_keys_v1.txt', rec_image_shape='3, 32, 320', rec_model_dir='/home/aistudio/.paddleocr/whl/rec/ch/ch_PP-OCRv3_rec_infer', save_crop_res=False, save_log_path='./log_output/', scales=[8, 16, 32], show_log=True, structure_version='PP-STRUCTURE', table=True, table_char_dict_path=None, table_max_len=488, table_model_dir=None, total_process_num=1, type='ocr', use_angle_cls=False, 
use_dilation=False, use_gpu=True, use_mp=False, use_onnx=False, use_pdserving=False, use_space_char=True, use_tensorrt=False, vis_font_path='./doc/fonts/simfang.ttf', warmup=False)
[2022/05/05 14:53:59] ppocr WARNING: Since the angle classifier is not initialized, the angle classifier will not be uesd during the forward process
('JOINT', 0.9179949760437012)
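2. Training a text recognition model
This section is an end-to-end guide to the text recognition task in PaddleOCR, covering data preparation, model training, tuning, evaluation, and prediction, with details for each stage:
2.1. Data preparation
2.1.1 Custom datasets
The following uses a general-purpose dataset as an example of how to prepare data:
We recommend placing the training images in a single folder and recording image paths and labels in a txt file (rec_gt_train.txt) with content like the following:
Note: by default, the image path and the label in the txt file must be separated by \t; any other separator will cause training errors.
" image file name	image label "
train_data/rec/train/word_001.jpg	简单可依赖
train_data/rec/train/word_002.jpg	用科技让复杂的世界更简单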
The final training set should have the following file structure:
|-train_data
  |-rec
    |- rec_gt_train.txt
    |- train
        |- word_001.jpg
        |- word_002.jpg
        |- word_003.jpg
        | ...
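Because a wrong separator only shows up as an error once training starts, it can help to check the label file beforehand. The following is a minimal sketch, not part of PaddleOCR: the function names `parse_label_line` and `check_label_file` are hypothetical, and it assumes the one-image-per-line format described above, with blank lines ignored.

```python
from pathlib import Path
from typing import List, Tuple


def parse_label_line(line: str) -> Tuple[str, str]:
    """Split one label line into (image_path, label) on the first tab."""
    text = line.rstrip("\r\n")
    if "\t" not in text:
        raise ValueError(f"no tab separator in line: {text!r}")
    image_path, label = text.split("\t", 1)
    if not image_path or not label:
        raise ValueError(f"empty path or label in line: {text!r}")
    return image_path, label


def check_label_file(label_file: str) -> List[Tuple[int, str]]:
    """Return (line_number, error_message) for every malformed line."""
    problems = []
    with Path(label_file).open(encoding="utf-8") as f:
        for number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # assumption of this sketch: blank lines are ignored
            try:
                parse_label_line(line)
            except ValueError as err:
                problems.append((number, str(err)))
    return problems
```

Calling `check_label_file("train_data/rec/rec_gt_train.txt")` returns an empty list when every non-blank line contains a tab with text on both sides.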
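Besides the one-image-per-line format above, PaddleOCR also supports training on data that has been augmented offline. To keep the same sample from being drawn several times in one batch, the paths of all images that share a label can be written on one line as a list; during training PaddleOCR randomly picks one image from the list. The annotation file then looks like this.
["11.jpg", "12.jpg"]	简单可依赖
["21.jpg", "22.jpg", "23.jpg"]	用科技让复杂的世界更简单
3.jpg	ocr
In the sample annotation file above, "11.jpg" and "12.jpg" share the label 简单可依赖; for that line, one of the two images is picked at random each time it is used in training.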
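The list form can be illustrated with a small parser. This is a sketch under the assumption that the list is valid JSON, as in the sample above; `pick_image` and `parse_line` are hypothetical names, and the code illustrates the behaviour described in the text rather than reproducing PaddleOCR's own data loader.

```python
import json
import random
from typing import Optional, Tuple


def pick_image(path_field: str, rng: Optional[random.Random] = None) -> str:
    """Return the path itself, or a random entry if the field is a JSON list."""
    chooser = rng if rng is not None else random
    field = path_field.strip()
    if field.startswith("["):
        candidates = json.loads(field)  # e.g. ["11.jpg", "12.jpg"]
        return chooser.choice(candidates)
    return field


def parse_line(line: str, rng: Optional[random.Random] = None) -> Tuple[str, str]:
    """Split a label line on the first tab and resolve the image path."""
    path_field, label = line.rstrip("\r\n").split("\t", 1)
    return pick_image(path_field, rng), label


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only to make the demo repeatable
    print(parse_line('["11.jpg", "12.jpg"]\t简单可依赖', rng))
    print(parse_line("3.jpg\tocr", rng))
```

Each call on a list-form line may return a different image, which is what lets all augmented variants of a sample share one line in the annotation file.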
Like the training set, the validation set needs a folder containing all images (test) and a rec_gt_test.txt file. Its structure is as follows:
|-train_data
  |-rec
    |- rec_gt_test.txt
    |- test
        |- word_001.jpg
        |- word_002.jpg
        |- word_003.jpg
        | ...
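2.1.2 Downloading data
If you have no dataset locally, you can download the ICDAR2015 data from the official website for quick validation. You can also refer to DTRB to download the lmdb-format datasets needed for the benchmark.
If you use the public icdar2015 dataset, PaddleOCR provides label files for training on ICDAR2015, which can be downloaded as follows:
# Training set labels
wget -P ./train_data/ic15_data https://paddleocr.bj.bcebos.com/dataset/rec_gt_train.txt
# Test set labels
wget -P ./train_data/ic15_data https://paddleocr.bj.bcebos.com/dataset/rec_gt_test.txt
PaddleOCR also provides a data format conversion script that converts the labels from the ICDAR official website into the format PaddleOCR supports. The conversion tool is at ppocr/utils/gen_label.py. Taking the training set as an example:
# Convert the label file downloaded from the official website to rec_gt_label.txt
python gen_label.py --mode="rec" --input_path="{path/of/origin/label}" --output_label="rec_gt_label.txt"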
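To make the conversion concrete, here is a rough sketch of what such a step could look like. It is not the code of gen_label.py: it assumes that each line of the original label file has the form `word_1.png, "JOINT"` (file name, comma, quoted text), and the names `convert_line` and `convert_file` are hypothetical.

```python
def convert_line(line: str) -> str:
    """Turn 'word_1.png, "JOINT"' into 'word_1.png<TAB>JOINT'."""
    text = line.lstrip("\ufeff").rstrip("\r\n")
    # Split on the first comma only, so commas inside the label survive.
    image_name, label = text.split(",", 1)
    label = label.strip()
    if len(label) >= 2 and label[0] == '"' and label[-1] == '"':
        label = label[1:-1]  # drop the surrounding quotes
    return f"{image_name.strip()}\t{label}"


def convert_file(input_path: str, output_path: str) -> None:
    """Convert every non-blank line of input_path and write it to output_path."""
    with open(input_path, encoding="utf-8-sig") as src, \
            open(output_path, "w", encoding="utf-8") as dst:
        for line in src:
            if line.strip():
                dst.write(convert_line(line) + "\n")


if __name__ == "__main__":
    print(convert_line('word_1.png, "JOINT"'))
```

If the downloaded files use a different layout, the splitting rule above needs to be adapted; the provided script remains the reference tool.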
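The data looks as follows, where (a) is the original image and (b) is the Ground Truth text file for each image:
A dataset has been prepared under ~/data/data34824/; the following command unpacks it.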
In [9]
!mkdir train_data && cd ./train_data/ && mkdir -p ic15_data && cd ic15_data && cp ~/data/data34824/ic15_rec.zip ./ && unzip -o -q ic15_rec.zip && tar xf ic15.tar
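2.2. Starting training
PaddleOCR provides training, evaluation, and prediction scripts. This section uses the PP-OCRv3 Chinese recognition model as an example:
First download the pretrained model; you can download an already trained model and fine-tune it on the icdar2015 data.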
In [10]
!wget -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_train.tar
# Unpack the model parameters
!cd ./pretrain_models/ && tar -xf ch_PP-OCRv3_rec_train.tar && rm -rf ch_PP-OCRv3_rec_train.tar
--2022-05-05 14:54:16-- https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_train.tar
Resolving paddleocr.bj.bcebos.com (paddleocr.bj.bcebos.com)... 182.61.200.229, 182.61.200.195, 2409:8c04:1001:1002:0:ff:b001:368a
Connecting to paddleocr.bj.bcebos.com (paddleocr.bj.bcebos.com)|182.61.200.229|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 287467520 (274M) [application/x-tar]
Saving to: ‘./pretrain_models/ch_PP-OCRv3_rec_train.tar’
ch_PP-OCRv3_rec_tra 100%[===================>] 274.15M 47.9MB/s in 8.6s
2022-05-05 14:54:24 (31.9 MB/s) - ‘./pretrain_models/ch_PP-OCRv3_rec_train.tar’ saved [287467520/287467520]
2.2.1 Launching training
In configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml, change the training and evaluation dataset paths to the ic15 dataset paths:
Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/ic15_data/
    ext_op_transform_idx: 1
    label_file_list: ["./train_data/ic15_data/rec_gt_train.txt"]
    ......
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/ic15_data
    label_file_list: ["./train_data/ic15_data/rec_gt_test.txt"]
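A mismatch between data_dir and the paths in the label file is a common cause of failed runs, so a quick existence check can save time. The sketch below assumes that each image path in the label file is resolved relative to data_dir and that lines use the plain one-image-per-line format (list-form lines are not handled); `find_missing_images` is a hypothetical helper, not a PaddleOCR function.

```python
import os
from typing import List


def find_missing_images(data_dir: str, label_file: str) -> List[str]:
    """Return the full paths of labelled images that do not exist on disk."""
    missing = []
    with open(label_file, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\r\n")
            if not line:
                continue
            rel_path = line.split("\t", 1)[0]
            full_path = os.path.join(data_dir, rel_path)
            if not os.path.exists(full_path):
                missing.append(full_path)
    return missing


if __name__ == "__main__":
    data_dir = "./train_data/ic15_data/"
    label_file = "./train_data/ic15_data/rec_gt_train.txt"
    if os.path.exists(label_file):
        print(len(find_missing_images(data_dir, label_file)), "missing images")
```

An empty result suggests that data_dir and label_file_list point at a consistent dataset.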
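If you installed the CPU version, set the use_gpu field in the configuration file to false.
The training command is simple: just specify the configuration file. Parameter values in the configuration file can also be overridden on the command line with -o. The options overridden in the training command in this section are:
- Global.pretrained_model: path of the pretrained model to load
- Global.character_dict_path: path of the character dictionary (here the English dictionary ppocr/utils/en_dict.txt)
- Global.eval_batch_step: evaluation frequency
- Global.epoch_num: total number of training epochs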
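The idea behind -o is that a dotted key such as Global.epoch_num addresses a nested entry of the configuration. The following is a simplified sketch of that merging step, not PaddleOCR's implementation: `apply_overrides` and `parse_value` are hypothetical names, and values are parsed with Python literal syntax as a stand-in for YAML parsing.

```python
import ast
import copy
from typing import Any, Dict, List


def parse_value(raw: str) -> Any:
    """Parse numbers, lists, and booleans; fall back to the raw string."""
    try:
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return raw


def apply_overrides(config: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    """Return a copy of config with 'A.b=value' style overrides applied."""
    result = copy.deepcopy(config)
    for item in overrides:
        dotted_key, raw = item.split("=", 1)
        keys = dotted_key.split(".")
        node = result
        for key in keys[:-1]:
            node = node.setdefault(key, {})  # create missing levels
        node[keys[-1]] = parse_value(raw)
    return result


if __name__ == "__main__":
    base = {"Global": {"epoch_num": 500, "eval_batch_step": [0, 2000]}}
    print(apply_overrides(base, [
        "Global.epoch_num=40",
        "Global.eval_batch_step=[0,200]",
        "Global.character_dict_path=ppocr/utils/en_dict.txt",
    ]))
```

The original configuration is left untouched, which mirrors the fact that -o changes a single run without editing the yml file.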
If training is slow, data augmentation can be removed. However, when data is scarce and the application scenario is complex, keeping data augmentation is recommended, as it improves generalization and accuracy.
Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/ic15_data/
    ext_op_transform_idx: 1
    label_file_list: ["./train_data/ic15_data/rec_gt_train.txt"]
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    # - RecConAug:
    #     prob: 0.5
    #     ext_data_num: 2
    #     image_shape: [48, 320, 3]
    # - RecAug:
In [ ]
# The provided pretrained model is a distillation model, so the Student model's parameters must be extracted first
import paddle
params = paddle.load('./pretrain_models/ch_PP-OCRv3_rec_train/best_accuracy' + '.pdparams')
new_state_dict = {}
for k1 in params.keys():
    if 'Student.' in k1:
        new_state_dict[k1.replace('Student.','')] = params[k1]
        # print(k1)
paddle.save(new_state_dict, './pretrain_models/ch_PP-OCRv3_rec_train/best_accuracy'+'_new.pdparams')
# CPU training
# Train on the icdar15 English data; the training log is saved automatically as train.log under "{save_model_dir}"
# !python3 tools/train.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml \
# -o Global.pretrained_model=./pretrain_models/ch_PP-OCRv3_rec_train/best_accuracy_new Global.use_gpu=False \
# Global.character_dict_path=ppocr/utils/en_dict.txt \
# Global.eval_batch_step=[0,200] \
# Global.epoch_num=40
# GPU training supports single-card and multi-card training
# Single-card training
!python3 tools/train.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml \
    -o Global.pretrained_model=./pretrain_models/ch_PP-OCRv3_rec_train/best_accuracy_new \
    Global.character_dict_path=ppocr/utils/en_dict.txt \
    Global.eval_batch_step=[0,200] \
    Global.epoch_num=40
# Multi-card training; use the --gpus argument to specify the card IDs
#!python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o Global.pretrained_model=./pretrain_models/ch_PP-OCRv3_rec_train/best_accuracy
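The Student-extraction step in the cell above can be tried on a plain dictionary without loading any weights. This is a sketch only: the key names are made up for illustration, and unlike the cell, which tests for 'Student.' anywhere in the key, it matches the prefix at the start of the key.

```python
from typing import Any, Dict


def extract_student(params: Dict[str, Any], prefix: str = "Student.") -> Dict[str, Any]:
    """Keep only keys that start with prefix, with the prefix removed."""
    return {
        key[len(prefix):]: value
        for key, value in params.items()
        if key.startswith(prefix)
    }


if __name__ == "__main__":
    # Hypothetical key names; a real checkpoint holds tensors, not integers.
    toy_params = {
        "Student.backbone.conv1.weight": 1,
        "Student.head.fc.bias": 2,
        "Teacher.backbone.conv1.weight": 3,
    }
    print(extract_student(toy_params))
```

After the extraction the keys no longer carry the Student. prefix, so they line up with the parameter names of a model built without the distillation wrapper.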
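2.3. Model evaluation and prediction
2.3.1 Evaluation
During training, model parameters are saved under the Global.save_model_dir directory by default. To evaluate metrics, set Global.checkpoints to point to the saved parameter file. The evaluation dataset is set through label_file_list under Eval in configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml.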
In [22]
# GPU evaluation
!python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o Global.checkpoints=./output/rec_ppocr_v3/best_accuracy Global.character_dict_path=ppocr/utils/en_dict.txt
# CPU evaluation; Global.checkpoints is the weights to evaluate
# !python3 tools/eval.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o Global.checkpoints=./output/rec_ppocr_v3/best_accuracy Global.use_gpu=False Global.character_dict_path=ppocr/utils/en_dict.txt
----------- Configuration Arguments -----------
backend: auto
elastic_server: None
force: False
gpus: 0
heter_devices:
heter_worker_num: None
heter_workers:
host: None
http_port: None
ips: 127.0.0.1
job_id: None
log_dir: log
np: None
nproc_per_node: None
run_mode: None
scale: 0
server_num: None
servers:
training_script: tools/eval.py
training_script_args: ['-c', 'configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml', '-o', 'Global.checkpoints=./output/rec_ppocr_v3/best_accuracy', 'Global.character_dict_path=ppocr/utils/en_dict.txt']
worker_num: None
workers:
------------------------------------------------
WARNING 2022-05-05 16:08:53,089 launch.py:423] Not found distinct arguments and compiled with cuda or xpu. Default use collective mode
launch train in GPU mode!
INFO 2022-05-05 16:08:53,095 launch_utils.py:528] Local start 1 processes. First process distributed environment info (Only For Debug):
+=======================================================================================+
| Distributed Envs Value |
+---------------------------------------------------------------------------------------+
| PADDLE_TRAINER_ID 0 |
| PADDLE_CURRENT_ENDPOINT 127.0.0.1:55453 |
| PADDLE_TRAINERS_NUM 1 |
| PADDLE_TRAINER_ENDPOINTS 127.0.0.1:55453 |
| PADDLE_RANK_IN_NODE 0 |
| PADDLE_LOCAL_DEVICE_IDS 0 |
| PADDLE_WORLD_DEVICE_IDS 0 |
| FLAGS_selected_gpus 0 |
| FLAGS_selected_accelerators 0 |
+=======================================================================================+
INFO 2022-05-05 16:08:53,095 launch_utils.py:532] details abouts PADDLE_TRAINER_ENDPOINTS can be found in log/endpoints.log, and detail running logs maybe found in log/workerlog.0
launch proc_id:10834 idx:0
[2022/05/05 16:08:54] ppocr INFO: Architecture :
[2022/05/05 16:08:54] ppocr INFO: Backbone :
[2022/05/05 16:08:54] ppocr INFO: last_conv_stride : [1, 2]
[2022/05/05 16:08:54] ppocr INFO: last_pool_type : avg
[2022/05/05 16:08:54] ppocr INFO: name : MobileNetV1Enhance
[2022/05/05 16:08:54] ppocr INFO: scale : 0.5
[2022/05/05 16:08:54] ppocr INFO: Head :
[2022/05/05 16:08:54] ppocr INFO: head_list :
[2022/05/05 16:08:54] ppocr INFO: CTCHead :
[2022/05/05 16:08:54] ppocr INFO: Head :
[2022/05/05 16:08:54] ppocr INFO: fc_decay : 1e-05
[2022/05/05 16:08:54] ppocr INFO: Neck :
[2022/05/05 16:08:54] ppocr INFO: depth : 2
[2022/05/05 16:08:54] ppocr INFO: dims : 64
[2022/05/05 16:08:54] ppocr INFO: hidden_dims : 120
[2022/05/05 16:08:54] ppocr INFO: name : svtr
[2022/05/05 16:08:54] ppocr INFO: use_guide : True
[2022/05/05 16:08:54] ppocr INFO: SARHead :
[2022/05/05 16:08:54] ppocr INFO: enc_dim : 512
[2022/05/05 16:08:54] ppocr INFO: max_text_length : 25
[2022/05/05 16:08:54] ppocr INFO: name : MultiHead
[2022/05/05 16:08:54] ppocr INFO: Transform : None
[2022/05/05 16:08:54] ppocr INFO: algorithm : SVTR
[2022/05/05 16:08:54] ppocr INFO: model_type : rec
[2022/05/05 16:08:54] ppocr INFO: Eval :
[2022/05/05 16:08:54] ppocr INFO: dataset :
[2022/05/05 16:08:54] ppocr INFO: data_dir : ./train_data/ic15_data
[2022/05/05 16:08:54] ppocr INFO: label_file_list : ['./train_data/ic15_data/rec_gt_test.txt']
[2022/05/05 16:08:54] ppocr INFO: name : SimpleDataSet
[2022/05/05 16:08:54] ppocr INFO: transforms :
[2022/05/05 16:08:54] ppocr INFO: DecodeImage :
[2022/05/05 16:08:54] ppocr INFO: channel_first : False
[2022/05/05 16:08:54] ppocr INFO: img_mode : BGR
[2022/05/05 16:08:54] ppocr INFO: MultiLabelEncode : None
[2022/05/05 16:08:54] ppocr INFO: RecResizeImg :
[2022/05/05 16:08:54] ppocr INFO: image_shape : [3, 48, 320]
[2022/05/05 16:08:54] ppocr INFO: KeepKeys :
[2022/05/05 16:08:54] ppocr INFO: keep_keys : ['image', 'label_ctc', 'label_sar', 'length', 'valid_ratio']
[2022/05/05 16:08:54] ppocr INFO: loader :
[2022/05/05 16:08:54] ppocr INFO: batch_size_per_card : 128
[2022/05/05 16:08:54] ppocr INFO: drop_last : False
[2022/05/05 16:08:54] ppocr INFO: num_workers : 4
[2022/05/05 16:08:54] ppocr INFO: shuffle : False
[2022/05/05 16:08:54] ppocr INFO: Global :
[2022/05/05 16:08:54] ppocr INFO: cal_metric_during_train : True
[2022/05/05 16:08:54] ppocr INFO: character_dict_path : ppocr/utils/en_dict.txt
[2022/05/05 16:08:54] ppocr INFO: checkpoints : ./output/rec_ppocr_v3/best_accuracy
[2022/05/05 16:08:54] ppocr INFO: debug : False
[2022/05/05 16:08:54] ppocr INFO: distributed : False
[2022/05/05 16:08:54] ppocr INFO: epoch_num : 500
[2022/05/05 16:08:54] ppocr INFO: eval_batch_step : [0, 2000]
[2022/05/05 16:08:54] ppocr INFO: infer_img : doc/imgs_words/ch/word_1.jpg
[2022/05/05 16:08:54] ppocr INFO: infer_mode : False
[2022/05/05 16:08:54] ppocr INFO: log_smooth_window : 20
[2022/05/05 16:08:54] ppocr INFO: max_text_length : 25
[2022/05/05 16:08:54] ppocr INFO: pretrained_model : None
[2022/05/05 16:08:54] ppocr INFO: print_batch_step : 10
[2022/05/05 16:08:54] ppocr INFO: save_epoch_step : 3
[2022/05/05 16:08:54] ppocr INFO: save_inference_dir : None
[2022/05/05 16:08:54] ppocr INFO: save_model_dir : ./output/rec_ppocr_v3
[2022/05/05 16:08:54] ppocr INFO: save_res_path : ./output/rec/predicts_ppocrv3.txt
[2022/05/05 16:08:54] ppocr INFO: use_gpu : True
[2022/05/05 16:08:54] ppocr INFO: use_space_char : True
[2022/05/05 16:08:54] ppocr INFO: use_visualdl : False
[2022/05/05 16:08:54] ppocr INFO: Loss :
[2022/05/05 16:08:54] ppocr INFO: loss_config_list :
[2022/05/05 16:08:54] ppocr INFO: CTCLoss : None
[2022/05/05 16:08:54] ppocr INFO: SARLoss : None
[2022/05/05 16:08:54] ppocr INFO: name : MultiLoss
[2022/05/05 16:08:54] ppocr INFO: Metric :
[2022/05/05 16:08:54] ppocr INFO: ignore_space : False
[2022/05/05 16:08:54] ppocr INFO: main_indicator : acc
[2022/05/05 16:08:54] ppocr INFO: name : RecMetric
[2022/05/05 16:08:54] ppocr INFO: Optimizer :
[2022/05/05 16:08:54] ppocr INFO: beta1 : 0.9
[2022/05/05 16:08:54] ppocr INFO: beta2 : 0.999
[2022/05/05 16:08:54] ppocr INFO: lr :
[2022/05/05 16:08:54] ppocr INFO: learning_rate : 0.001
[2022/05/05 16:08:54] ppocr INFO: name : Cosine
[2022/05/05 16:08:54] ppocr INFO: warmup_epoch : 5
[2022/05/05 16:08:54] ppocr INFO: name : Adam
[2022/05/05 16:08:54] ppocr INFO: regularizer :
[2022/05/05 16:08:54] ppocr INFO: factor : 3e-05
[2022/05/05 16:08:54] ppocr INFO: name : L2
[2022/05/05 16:08:54] ppocr INFO: PostProcess :
[2022/05/05 16:08:54] ppocr INFO: name : CTCLabelDecode
[2022/05/05 16:08:54] ppocr INFO: Train :
[2022/05/05 16:08:54] ppocr INFO: dataset :
[2022/05/05 16:08:54] ppocr INFO: data_dir : ./train_data/ic15_data/
[2022/05/05 16:08:54] ppocr INFO: ext_op_transform_idx : 1
[2022/05/05 16:08:54] ppocr INFO: label_file_list : ['./train_data/ic15_data/rec_gt_train.txt']
[2022/05/05 16:08:54] ppocr INFO: name : SimpleDataSet
[2022/05/05 16:08:54] ppocr INFO: transforms :
[2022/05/05 16:08:54] ppocr INFO: DecodeImage :
[2022/05/05 16:08:54] ppocr INFO: channel_first : False
[2022/05/05 16:08:54] ppocr INFO: img_mode : BGR
[2022/05/05 16:08:54] ppocr INFO: MultiLabelEncode : None
[2022/05/05 16:08:54] ppocr INFO: RecResizeImg :
[2022/05/05 16:08:54] ppocr INFO: image_shape : [3, 48, 320]
[2022/05/05 16:08:54] ppocr INFO: KeepKeys :
[2022/05/05 16:08:54] ppocr INFO: keep_keys : ['image', 'label_ctc', 'label_sar', 'length', 'valid_ratio']
[2022/05/05 16:08:54] ppocr INFO: loader :
[2022/05/05 16:08:54] ppocr INFO: batch_size_per_card : 128
[2022/05/05 16:08:54] ppocr INFO: drop_last : True
[2022/05/05 16:08:54] ppocr INFO: num_workers : 4
[2022/05/05 16:08:54] ppocr INFO: shuffle : True
[2022/05/05 16:08:54] ppocr INFO: profiler_options : None
[2022/05/05 16:08:54] ppocr INFO: train with paddle 2.2.2 and device CUDAPlace(0)
[2022/05/05 16:08:54] ppocr INFO: Initialize indexs of datasets:['./train_data/ic15_data/rec_gt_test.txt']
W0505 16:08:54.870386 10834 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0505 16:08:54.875574 10834 device_context.cc:465] device: 0, cuDNN Version: 7.6.
[2022/05/05 16:08:59] ppocr INFO: resume from ./output/rec_ppocr_v3/best_accuracy
[2022/05/05 16:08:59] ppocr INFO: metric in ckpt ***************
[2022/05/05 16:08:59] ppocr INFO: acc:0.5551275851943784
[2022/05/05 16:08:59] ppocr INFO: norm_edit_dis:0.8100207578002598
[2022/05/05 16:08:59] ppocr INFO: fps:1855.1658462248745
[2022/05/05 16:08:59] ppocr INFO: best_epoch:36
[2022/05/05 16:08:59] ppocr INFO: start_epoch:37
eval model:: 0%| | 0/17 [00:00<?, ?it/s]
eval model:: 6%|▌ | 1/17 [00:00<00:14, 1.08it/s]
eval model:: 18%|█▊ | 3/17 [00:01<00:09, 1.49it/s]
eval model:: 29%|██▉ | 5/17 [00:01<00:05, 2.04it/s]
eval model:: 41%|████ | 7/17 [00:01<00:03, 2.73it/s]
eval model:: 53%|█████▎ | 9/17 [00:01<00:02, 3.59it/s]
eval model:: 65%|██████▍ | 11/17 [00:01<00:01, 4.72it/s]
eval model:: 76%|███████▋ | 13/17 [00:01<00:00, 6.07it/s]
eval model:: 88%|████████▊ | 15/17 [00:01<00:00, 7.59it/s]
eval model:: 100%|██████████| 17/17 [00:02<00:00, 8.22it/s]
[2022/05/05 16:09:01] ppocr INFO: metric eval ***************
[2022/05/05 16:09:01] ppocr INFO: acc:0.5551275851943784
[2022/05/05 16:09:01] ppocr INFO: norm_edit_dis:0.8100207578002598
[2022/05/05 16:09:01] ppocr INFO: fps:2231.024193160108
INFO 2022-05-05 16:09:05,136 launch.py:311] Local processes completed.
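The evaluation output reports acc and norm_edit_dis. As a rough guide to reading them, the sketch below computes both under the assumption that acc is the fraction of exactly matching strings and norm_edit_dis is one minus the mean edit distance normalized by the longer string; the function names are hypothetical, the sample strings are made up, and PaddleOCR's own metric code may differ in details such as space handling.

```python
from typing import Dict, List


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def recognition_metrics(predictions: List[str], targets: List[str]) -> Dict[str, float]:
    """Return exact-match accuracy and 1 - mean normalized edit distance."""
    if len(predictions) != len(targets) or not targets:
        raise ValueError("need two non-empty lists of equal length")
    correct = 0
    norm_dist_sum = 0.0
    for pred, target in zip(predictions, targets):
        if pred == target:
            correct += 1
        longest = max(len(pred), len(target))
        if longest:
            norm_dist_sum += levenshtein(pred, target) / longest
    n = len(targets)
    return {"acc": correct / n, "norm_edit_dis": 1 - norm_dist_sum / n}


if __name__ == "__main__":
    # Made-up predictions and labels, for illustration only.
    print(recognition_metrics(["JOINT", "HOUSE", "SLQW"], ["JOINT", "HOUSE", "SLOW"]))
```

Under this definition a prediction that is off by one character counts as fully wrong for acc but only slightly lowers norm_edit_dis, which is consistent with the gap between the two values in the output above.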
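2.3.2 Testing recognition results
A model trained with PaddleOCR can be used for quick prediction with the script below.
By default the image to predict is taken from infer_img, and the trained parameter file is loaded with -o Global.checkpoints (the commands in this section pass the same path through Global.pretrained_model):
Depending on the save_model_dir and save_epoch_step fields set in the configuration file, the following parameter files are saved: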
output/rec_ppocr_v3/
├── best_accuracy.pdopt
├── best_accuracy.pdparams
├── best_accuracy.states
├── config.yml
├── iter_epoch_3.pdopt
├── iter_epoch_3.pdparams
├── iter_epoch_3.states
├── latest.pdopt
├── latest.pdparams
├── latest.states
└── train.log
Here best_accuracy.* is the best model on the evaluation set; iter_epoch_x.* are models saved every save_epoch_step epochs; latest.* is the model from the last epoch.
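The commands in this tutorial pass a checkpoint as a path without extension, such as ./output/rec_ppocr_v3/best_accuracy. A small helper can list the prefixes available in a save directory. This is a sketch with hypothetical function names; it assumes the file naming shown in the listing above.

```python
import os
import re
from typing import List, Optional


def list_checkpoints(save_model_dir: str) -> List[str]:
    """Return checkpoint prefixes (path without extension) that have a .pdparams file."""
    prefixes = []
    for name in sorted(os.listdir(save_model_dir)):
        stem, ext = os.path.splitext(name)
        if ext == ".pdparams":
            prefixes.append(os.path.join(save_model_dir, stem))
    return prefixes


def latest_epoch_checkpoint(save_model_dir: str) -> Optional[str]:
    """Return the iter_epoch_x prefix with the largest x, or None if there is none."""
    best = None
    for prefix in list_checkpoints(save_model_dir):
        match = re.fullmatch(r"iter_epoch_(\d+)", os.path.basename(prefix))
        if match and (best is None or int(match.group(1)) > best[0]):
            best = (int(match.group(1)), prefix)
    return None if best is None else best[1]


if __name__ == "__main__":
    directory = "./output/rec_ppocr_v3"
    if os.path.isdir(directory):
        print(list_checkpoints(directory))
        print(latest_epoch_checkpoint(directory))
```

The returned prefix can be used directly as the value of Global.checkpoints or Global.pretrained_model.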
In [23]
# Predict English text
# GPU prediction
!python3 tools/infer_rec.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o Global.pretrained_model=./output/rec_ppocr_v3/best_accuracy Global.character_dict_path=ppocr/utils/en_dict.txt Global.infer_img=doc/imgs_words/en/word_1.png
[2022/05/05 16:09:44] ppocr INFO: Architecture :
[2022/05/05 16:09:44] ppocr INFO: Backbone :
[2022/05/05 16:09:44] ppocr INFO: last_conv_stride : [1, 2]
[2022/05/05 16:09:44] ppocr INFO: last_pool_type : avg
[2022/05/05 16:09:44] ppocr INFO: name : MobileNetV1Enhance
[2022/05/05 16:09:44] ppocr INFO: scale : 0.5
[2022/05/05 16:09:44] ppocr INFO: Head :
[2022/05/05 16:09:44] ppocr INFO: head_list :
[2022/05/05 16:09:44] ppocr INFO: CTCHead :
[2022/05/05 16:09:44] ppocr INFO: Head :
[2022/05/05 16:09:44] ppocr INFO: fc_decay : 1e-05
[2022/05/05 16:09:44] ppocr INFO: Neck :
[2022/05/05 16:09:44] ppocr INFO: depth : 2
[2022/05/05 16:09:44] ppocr INFO: dims : 64
[2022/05/05 16:09:44] ppocr INFO: hidden_dims : 120
[2022/05/05 16:09:44] ppocr INFO: name : svtr
[2022/05/05 16:09:44] ppocr INFO: use_guide : True
[2022/05/05 16:09:44] ppocr INFO: SARHead :
[2022/05/05 16:09:44] ppocr INFO: enc_dim : 512
[2022/05/05 16:09:44] ppocr INFO: max_text_length : 25
[2022/05/05 16:09:44] ppocr INFO: name : MultiHead
[2022/05/05 16:09:44] ppocr INFO: Transform : None
[2022/05/05 16:09:44] ppocr INFO: algorithm : SVTR
[2022/05/05 16:09:44] ppocr INFO: model_type : rec
[2022/05/05 16:09:44] ppocr INFO: Eval :
[2022/05/05 16:09:44] ppocr INFO: dataset :
[2022/05/05 16:09:44] ppocr INFO: data_dir : ./train_data/ic15_data
[2022/05/05 16:09:44] ppocr INFO: label_file_list : ['./train_data/ic15_data/rec_gt_test.txt']
[2022/05/05 16:09:44] ppocr INFO: name : SimpleDataSet
[2022/05/05 16:09:44] ppocr INFO: transforms :
[2022/05/05 16:09:44] ppocr INFO: DecodeImage :
[2022/05/05 16:09:44] ppocr INFO: channel_first : False
[2022/05/05 16:09:44] ppocr INFO: img_mode : BGR
[2022/05/05 16:09:44] ppocr INFO: MultiLabelEncode : None
[2022/05/05 16:09:44] ppocr INFO: RecResizeImg :
[2022/05/05 16:09:44] ppocr INFO: image_shape : [3, 48, 320]
[2022/05/05 16:09:44] ppocr INFO: KeepKeys :
[2022/05/05 16:09:44] ppocr INFO: keep_keys : ['image', 'label_ctc', 'label_sar', 'length', 'valid_ratio']
[2022/05/05 16:09:44] ppocr INFO: loader :
[2022/05/05 16:09:44] ppocr INFO: batch_size_per_card : 128
[2022/05/05 16:09:44] ppocr INFO: drop_last : False
[2022/05/05 16:09:44] ppocr INFO: num_workers : 4
[2022/05/05 16:09:44] ppocr INFO: shuffle : False
[2022/05/05 16:09:44] ppocr INFO: Global :
[2022/05/05 16:09:44] ppocr INFO: cal_metric_during_train : True
[2022/05/05 16:09:44] ppocr INFO: character_dict_path : ppocr/utils/en_dict.txt
[2022/05/05 16:09:44] ppocr INFO: checkpoints : None
[2022/05/05 16:09:44] ppocr INFO: debug : False
[2022/05/05 16:09:44] ppocr INFO: distributed : False
[2022/05/05 16:09:44] ppocr INFO: epoch_num : 500
[2022/05/05 16:09:44] ppocr INFO: eval_batch_step : [0, 2000]
[2022/05/05 16:09:44] ppocr INFO: infer_img : doc/imgs_words/en/word_1.png
[2022/05/05 16:09:44] ppocr INFO: infer_mode : False
[2022/05/05 16:09:44] ppocr INFO: log_smooth_window : 20
[2022/05/05 16:09:44] ppocr INFO: max_text_length : 25
[2022/05/05 16:09:44] ppocr INFO: pretrained_model : ./output/rec_ppocr_v3/best_accuracy
[2022/05/05 16:09:44] ppocr INFO: print_batch_step : 10
[2022/05/05 16:09:44] ppocr INFO: save_epoch_step : 3
[2022/05/05 16:09:44] ppocr INFO: save_inference_dir : None
[2022/05/05 16:09:44] ppocr INFO: save_model_dir : ./output/rec_ppocr_v3
[2022/05/05 16:09:44] ppocr INFO: save_res_path : ./output/rec/predicts_ppocrv3.txt
[2022/05/05 16:09:44] ppocr INFO: use_gpu : True
[2022/05/05 16:09:44] ppocr INFO: use_space_char : True
[2022/05/05 16:09:44] ppocr INFO: use_visualdl : False
[2022/05/05 16:09:44] ppocr INFO: Loss :
[2022/05/05 16:09:44] ppocr INFO: loss_config_list :
[2022/05/05 16:09:44] ppocr INFO: CTCLoss : None
[2022/05/05 16:09:44] ppocr INFO: SARLoss : None
[2022/05/05 16:09:44] ppocr INFO: name : MultiLoss
[2022/05/05 16:09:44] ppocr INFO: Metric :
[2022/05/05 16:09:44] ppocr INFO: ignore_space : False
[2022/05/05 16:09:44] ppocr INFO: main_indicator : acc
[2022/05/05 16:09:44] ppocr INFO: name : RecMetric
[2022/05/05 16:09:44] ppocr INFO: Optimizer :
[2022/05/05 16:09:44] ppocr INFO: beta1 : 0.9
[2022/05/05 16:09:44] ppocr INFO: beta2 : 0.999
[2022/05/05 16:09:44] ppocr INFO: lr :
[2022/05/05 16:09:44] ppocr INFO: learning_rate : 0.001
[2022/05/05 16:09:44] ppocr INFO: name : Cosine
[2022/05/05 16:09:44] ppocr INFO: warmup_epoch : 5
[2022/05/05 16:09:44] ppocr INFO: name : Adam
[2022/05/05 16:09:44] ppocr INFO: regularizer :
[2022/05/05 16:09:44] ppocr INFO: factor : 3e-05
[2022/05/05 16:09:44] ppocr INFO: name : L2
[2022/05/05 16:09:44] ppocr INFO: PostProcess :
[2022/05/05 16:09:44] ppocr INFO: name : CTCLabelDecode
[2022/05/05 16:09:44] ppocr INFO: Train :
[2022/05/05 16:09:44] ppocr INFO: dataset :
[2022/05/05 16:09:44] ppocr INFO: data_dir : ./train_data/ic15_data/
[2022/05/05 16:09:44] ppocr INFO: ext_op_transform_idx : 1
[2022/05/05 16:09:44] ppocr INFO: label_file_list : ['./train_data/ic15_data/rec_gt_train.txt']
[2022/05/05 16:09:44] ppocr INFO: name : SimpleDataSet
[2022/05/05 16:09:44] ppocr INFO: transforms :
[2022/05/05 16:09:44] ppocr INFO: DecodeImage :
[2022/05/05 16:09:44] ppocr INFO: channel_first : False
[2022/05/05 16:09:44] ppocr INFO: img_mode : BGR
[2022/05/05 16:09:44] ppocr INFO: MultiLabelEncode : None
[2022/05/05 16:09:44] ppocr INFO: RecResizeImg :
[2022/05/05 16:09:44] ppocr INFO: image_shape : [3, 48, 320]
[2022/05/05 16:09:44] ppocr INFO: KeepKeys :
[2022/05/05 16:09:44] ppocr INFO: keep_keys : ['image', 'label_ctc', 'label_sar', 'length', 'valid_ratio']
[2022/05/05 16:09:44] ppocr INFO: loader :
[2022/05/05 16:09:44] ppocr INFO: batch_size_per_card : 128
[2022/05/05 16:09:44] ppocr INFO: drop_last : True
[2022/05/05 16:09:44] ppocr INFO: num_workers : 4
[2022/05/05 16:09:44] ppocr INFO: shuffle : True
[2022/05/05 16:09:44] ppocr INFO: profiler_options : None
[2022/05/05 16:09:44] ppocr INFO: train with paddle 2.2.2 and device CUDAPlace(0)
W0505 16:09:44.179414 10923 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0505 16:09:44.184576 10923 device_context.cc:465] device: 0, cuDNN Version: 7.6.
[2022/05/05 16:09:48] ppocr INFO: load pretrain successful from ./output/rec_ppocr_v3/best_accuracy
[2022/05/05 16:09:48] ppocr INFO: infer_img: doc/imgs_words/en/word_1.png
[2022/05/05 16:09:48] ppocr INFO: result: JOINT 0.9950313568115234
[2022/05/05 16:09:48] ppocr INFO: success!
Predicted image:
The configuration file used for prediction must be the same one used for training.
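When many images are predicted, it can be handy to collect the recognized text and scores from the log. The sketch below pairs each "infer_img:" line with the "result:" line that follows it, assuming the log layout shown in the output above; `parse_infer_log` is a hypothetical helper, and the regular expressions are written for that layout only.

```python
import re
from typing import Iterable, List, Tuple

IMG_RE = re.compile(r"ppocr INFO:\s+infer_img:\s+(?P<path>\S+)\s*$")
RESULT_RE = re.compile(
    r"ppocr INFO:\s+result:\s+(?P<text>.*?)\s+"
    r"(?P<score>\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s*$"
)


def parse_infer_log(lines: Iterable[str]) -> List[Tuple[str, str, float]]:
    """Return (image_path, text, score) for each infer_img/result pair."""
    results = []
    current = None
    for line in lines:
        img = IMG_RE.search(line)
        if img:
            current = img.group("path")
            continue
        res = RESULT_RE.search(line)
        if res and current is not None:
            results.append((current, res.group("text"), float(res.group("score"))))
            current = None
    return results


if __name__ == "__main__":
    sample = [
        "[2022/05/05 16:09:48] ppocr INFO: infer_img: doc/imgs_words/en/word_1.png",
        "[2022/05/05 16:09:48] ppocr INFO: result: JOINT 0.9950313568115234",
    ]
    for path, text, score in parse_infer_log(sample):
        print(path, text, score)
```

The config dump line "infer_img : ..." is not matched because of the space before the colon, so only actual predictions are collected.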
Testing text recognition on all images in a folder
In [24]
# GPU prediction
!python3 tools/infer_rec.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o Global.pretrained_model=./output/rec_ppocr_v3/best_accuracy Global.character_dict_path=ppocr/utils/en_dict.txt Global.infer_img=./doc/imgs_words_en/
[2022/05/05 16:09:59] ppocr INFO: Architecture :
[2022/05/05 16:09:59] ppocr INFO: Backbone :
[2022/05/05 16:09:59] ppocr INFO: last_conv_stride : [1, 2]
[2022/05/05 16:09:59] ppocr INFO: last_pool_type : avg
[2022/05/05 16:09:59] ppocr INFO: name : MobileNetV1Enhance
[2022/05/05 16:09:59] ppocr INFO: scale : 0.5
[2022/05/05 16:09:59] ppocr INFO: Head :
[2022/05/05 16:09:59] ppocr INFO: head_list :
[2022/05/05 16:09:59] ppocr INFO: CTCHead :
[2022/05/05 16:09:59] ppocr INFO: Head :
[2022/05/05 16:09:59] ppocr INFO: fc_decay : 1e-05
[2022/05/05 16:09:59] ppocr INFO: Neck :
[2022/05/05 16:09:59] ppocr INFO: depth : 2
[2022/05/05 16:09:59] ppocr INFO: dims : 64
[2022/05/05 16:09:59] ppocr INFO: hidden_dims : 120
[2022/05/05 16:09:59] ppocr INFO: name : svtr
[2022/05/05 16:09:59] ppocr INFO: use_guide : True
[2022/05/05 16:09:59] ppocr INFO: SARHead :
[2022/05/05 16:09:59] ppocr INFO: enc_dim : 512
[2022/05/05 16:09:59] ppocr INFO: max_text_length : 25
[2022/05/05 16:09:59] ppocr INFO: name : MultiHead
[2022/05/05 16:09:59] ppocr INFO: Transform : None
[2022/05/05 16:09:59] ppocr INFO: algorithm : SVTR
[2022/05/05 16:09:59] ppocr INFO: model_type : rec
[2022/05/05 16:09:59] ppocr INFO: Eval :
[2022/05/05 16:09:59] ppocr INFO: dataset :
[2022/05/05 16:09:59] ppocr INFO: data_dir : ./train_data/ic15_data
[2022/05/05 16:09:59] ppocr INFO: label_file_list : ['./train_data/ic15_data/rec_gt_test.txt']
[2022/05/05 16:09:59] ppocr INFO: name : SimpleDataSet
[2022/05/05 16:09:59] ppocr INFO: transforms :
[2022/05/05 16:09:59] ppocr INFO: DecodeImage :
[2022/05/05 16:09:59] ppocr INFO: channel_first : False
[2022/05/05 16:09:59] ppocr INFO: img_mode : BGR
[2022/05/05 16:09:59] ppocr INFO: MultiLabelEncode : None
[2022/05/05 16:09:59] ppocr INFO: RecResizeImg :
[2022/05/05 16:09:59] ppocr INFO: image_shape : [3, 48, 320]
[2022/05/05 16:09:59] ppocr INFO: KeepKeys :
[2022/05/05 16:09:59] ppocr INFO: keep_keys : ['image', 'label_ctc', 'label_sar', 'length', 'valid_ratio']
[2022/05/05 16:09:59] ppocr INFO: loader :
[2022/05/05 16:09:59] ppocr INFO: batch_size_per_card : 128
[2022/05/05 16:09:59] ppocr INFO: drop_last : False
[2022/05/05 16:09:59] ppocr INFO: num_workers : 4
[2022/05/05 16:09:59] ppocr INFO: shuffle : False
[2022/05/05 16:09:59] ppocr INFO: Global :
[2022/05/05 16:09:59] ppocr INFO: cal_metric_during_train : True
[2022/05/05 16:09:59] ppocr INFO: character_dict_path : ppocr/utils/en_dict.txt
[2022/05/05 16:09:59] ppocr INFO: checkpoints : None
[2022/05/05 16:09:59] ppocr INFO: debug : False
[2022/05/05 16:09:59] ppocr INFO: distributed : False
[2022/05/05 16:09:59] ppocr INFO: epoch_num : 500
[2022/05/05 16:09:59] ppocr INFO: eval_batch_step : [0, 2000]
[2022/05/05 16:09:59] ppocr INFO: infer_img : ./doc/imgs_words_en/
[2022/05/05 16:09:59] ppocr INFO: infer_mode : False
[2022/05/05 16:09:59] ppocr INFO: log_smooth_window : 20
[2022/05/05 16:09:59] ppocr INFO: max_text_length : 25
[2022/05/05 16:09:59] ppocr INFO: pretrained_model : ./output/rec_ppocr_v3/best_accuracy
[2022/05/05 16:09:59] ppocr INFO: print_batch_step : 10
[2022/05/05 16:09:59] ppocr INFO: save_epoch_step : 3
[2022/05/05 16:09:59] ppocr INFO: save_inference_dir : None
[2022/05/05 16:09:59] ppocr INFO: save_model_dir : ./output/rec_ppocr_v3
[2022/05/05 16:09:59] ppocr INFO: save_res_path : ./output/rec/predicts_ppocrv3.txt
[2022/05/05 16:09:59] ppocr INFO: use_gpu : True
[2022/05/05 16:09:59] ppocr INFO: use_space_char : True
[2022/05/05 16:09:59] ppocr INFO: use_visualdl : False
[2022/05/05 16:09:59] ppocr INFO: Loss :
[2022/05/05 16:09:59] ppocr INFO: loss_config_list :
[2022/05/05 16:09:59] ppocr INFO: CTCLoss : None
[2022/05/05 16:09:59] ppocr INFO: SARLoss : None
[2022/05/05 16:09:59] ppocr INFO: name : MultiLoss
[2022/05/05 16:09:59] ppocr INFO: Metric :
[2022/05/05 16:09:59] ppocr INFO: ignore_space : False
[2022/05/05 16:09:59] ppocr INFO: main_indicator : acc
[2022/05/05 16:09:59] ppocr INFO: name : RecMetric
[2022/05/05 16:09:59] ppocr INFO: Optimizer :
[2022/05/05 16:09:59] ppocr INFO: beta1 : 0.9
[2022/05/05 16:09:59] ppocr INFO: beta2 : 0.999
[2022/05/05 16:09:59] ppocr INFO: lr :
[2022/05/05 16:09:59] ppocr INFO: learning_rate : 0.001
[2022/05/05 16:09:59] ppocr INFO: name : Cosine
[2022/05/05 16:09:59] ppocr INFO: warmup_epoch : 5
[2022/05/05 16:09:59] ppocr INFO: name : Adam
[2022/05/05 16:09:59] ppocr INFO: regularizer :
[2022/05/05 16:09:59] ppocr INFO: factor : 3e-05
[2022/05/05 16:09:59] ppocr INFO: name : L2
[2022/05/05 16:09:59] ppocr INFO: PostProcess :
[2022/05/05 16:09:59] ppocr INFO: name : CTCLabelDecode
[2022/05/05 16:09:59] ppocr INFO: Train :
[2022/05/05 16:09:59] ppocr INFO: dataset :
[2022/05/05 16:09:59] ppocr INFO: data_dir : ./train_data/ic15_data/
[2022/05/05 16:09:59] ppocr INFO: ext_op_transform_idx : 1
[2022/05/05 16:09:59] ppocr INFO: label_file_list : ['./train_data/ic15_data/rec_gt_train.txt']
[2022/05/05 16:09:59] ppocr INFO: name : SimpleDataSet
[2022/05/05 16:09:59] ppocr INFO: transforms :
[2022/05/05 16:09:59] ppocr INFO: DecodeImage :
[2022/05/05 16:09:59] ppocr INFO: channel_first : False
[2022/05/05 16:09:59] ppocr INFO: img_mode : BGR
[2022/05/05 16:09:59] ppocr INFO: MultiLabelEncode : None
[2022/05/05 16:09:59] ppocr INFO: RecResizeImg :
[2022/05/05 16:09:59] ppocr INFO: image_shape : [3, 48, 320]
[2022/05/05 16:09:59] ppocr INFO: KeepKeys :
[2022/05/05 16:09:59] ppocr INFO: keep_keys : ['image', 'label_ctc', 'label_sar', 'length', 'valid_ratio']
[2022/05/05 16:09:59] ppocr INFO: loader :
[2022/05/05 16:09:59] ppocr INFO: batch_size_per_card : 128
[2022/05/05 16:09:59] ppocr INFO: drop_last : True
[2022/05/05 16:09:59] ppocr INFO: num_workers : 4
[2022/05/05 16:09:59] ppocr INFO: shuffle : True
[2022/05/05 16:09:59] ppocr INFO: profiler_options : None
[2022/05/05 16:09:59] ppocr INFO: train with paddle 2.2.2 and device CUDAPlace(0)
W0505 16:09:59.636674 10950 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0505 16:09:59.641562 10950 device_context.cc:465] device: 0, cuDNN Version: 7.6.
[2022/05/05 16:10:04] ppocr INFO: load pretrain successful from ./output/rec_ppocr_v3/best_accuracy
[2022/05/05 16:10:04] ppocr INFO: infer_img: ./doc/imgs_words_en/word_10.png
[2022/05/05 16:10:04] ppocr INFO: result: PAIN 0.9976047277450562
[2022/05/05 16:10:04] ppocr INFO: infer_img: ./doc/imgs_words_en/word_116.png
[2022/05/05 16:10:04] ppocr INFO: result: QBHOUSE 0.9709253311157227
[2022/05/05 16:10:04] ppocr INFO: infer_img: ./doc/imgs_words_en/word_19.png
[2022/05/05 16:10:04] ppocr INFO: result: SLOW 0.9971550703048706
[2022/05/05 16:10:04] ppocr INFO: infer_img: ./doc/imgs_words_en/word_201.png
[2022/05/05 16:10:04] ppocr INFO: result: HOUSE 0.9960419535636902
[2022/05/05 16:10:04] ppocr INFO: infer_img: ./doc/imgs_words_en/word_308.png
[2022/05/05 16:10:04] ppocr INFO: result: LITTLE 0.9545474052429199
[2022/05/05 16:10:04] ppocr INFO: infer_img: ./doc/imgs_words_en/word_336.png
[2022/05/05 16:10:04] ppocr INFO: result: SUPER 0.9802681803703308
[2022/05/05 16:10:04] ppocr INFO: infer_img: ./doc/imgs_words_en/word_401.png
[2022/05/05 16:10:04] ppocr INFO: result: BURGE 0.827716052532196
[2022/05/05 16:10:04] ppocr INFO: infer_img: ./doc/imgs_words_en/word_461.png
[2022/05/05 16:10:04] ppocr INFO: result: SPED 0.912112832069397
[2022/05/05 16:10:04] ppocr INFO: infer_img: ./doc/imgs_words_en/word_52.png
[2022/05/05 16:10:04] ppocr INFO: result: Future 0.9685637354850769
[2022/05/05 16:10:04] ppocr INFO: infer_img: ./doc/imgs_words_en/word_545.png
[2022/05/05 16:10:04] ppocr INFO: result: EORIT 0.9076364636421204
[2022/05/05 16:10:04] ppocr INFO: success!
2.4. Model export and prediction
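An inference model (a model saved with paddle.jit.save) is a frozen model produced once training is done: both the network structure and the parameters are written to files, and it is mostly used for prediction and deployment. The models saved during training are checkpoints, which store only the parameters and are mostly used to resume training. Compared with a checkpoint, an inference model additionally stores the network structure, so it performs better in deployment and accelerated inference, is flexible and convenient, and suits integration into real systems.
A recognition model is converted to an inference model the same way as a detection model: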
In [25]
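# -c sets the yml config file of the training algorithm
# -o sets optional parameters
# Global.pretrained_model sets the path of the trained model to convert; do not add the file suffix (.pdmodel, .pdopt or .pdparams)
# Global.save_inference_dir sets the directory where the converted model is saved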
!python3 tools/export_model.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o Global.pretrained_model=./output/rec_ppocr_v3/best_accuracy Global.character_dict_path=ppocr/utils/en_dict.txt Global.save_inference_dir=./inference/ch_PP-OCRv3_rec/
W0505 16:10:41.224627 11024 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0505 16:10:41.229585 11024 device_context.cc:465] device: 0, cuDNN Version: 7.6.
[2022/05/05 16:10:45] ppocr INFO: load pretrain successful from ./output/rec_ppocr_v3/best_accuracy
[2022/05/05 16:10:47] ppocr INFO: inference model is saved to ./inference/ch_PP-OCRv3_rec/inference
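**Note:** If you trained the model on your own dataset and changed the Chinese character dictionary file, remember to set character_dict_path in the config file to your custom dictionary file.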
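The `-o` options in the export command above use dotted keys such as `Global.character_dict_path=...` to override values in the yml config. The following is a minimal sketch of that idea only, not PaddleOCR's own code: `apply_overrides` is a hypothetical helper, the starting config is a made-up fragment, and the `ppocr_keys_v1.txt` path in it is an assumed placeholder for a Chinese dictionary.

```python
def apply_overrides(config, overrides):
    """Apply 'Section.key=value' strings to a nested dict, in place.

    Values are kept as plain strings here; a real config loader may
    parse them into numbers or booleans.
    """
    for item in overrides:
        dotted_key, value = item.split("=", 1)
        keys = dotted_key.split(".")
        node = config
        # Walk down to the parent of the final key, creating levels as needed.
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = value
    return config


# Hypothetical fragment of a recognition config (placeholder values).
config = {
    "Global": {
        "character_dict_path": "ppocr/utils/ppocr_keys_v1.txt",
        "use_space_char": True,
    }
}

apply_overrides(
    config,
    [
        "Global.character_dict_path=ppocr/utils/en_dict.txt",
        "Global.save_inference_dir=./inference/ch_PP-OCRv3_rec/",
    ],
)
print(config["Global"]["character_dict_path"])
```

Passing the dictionary on the command line this way leaves the yml file untouched, which is convenient when one config is reused for several datasets.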
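After a successful conversion, the directory contains three files:
inference/ch_PP-OCRv3_rec/
├── inference.pdiparams # parameter file of the recognition inference model
├── inference.pdiparams.info # parameter info of the recognition inference model; can be ignored
└── inference.pdmodel # program file of the recognition inference model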
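Before pointing a prediction script at the exported directory, it can help to confirm that the export actually produced the files listed above. This is a small sketch using only the standard library; `missing_inference_files` is a hypothetical helper name, and it checks only the two files the listing does not mark as ignorable.

```python
from pathlib import Path

# The .pdiparams.info file is described above as ignorable, so it is not required here.
REQUIRED_FILES = ("inference.pdmodel", "inference.pdiparams")


def missing_inference_files(model_dir):
    """Return the names of required inference files absent from model_dir."""
    model_dir = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (model_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_inference_files("./inference/ch_PP-OCRv3_rec/")
    print("missing files:", missing or "none")
```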
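- When running inference with PP-OCRv3 recognition, you do not need to pass --rec_algorithm to specify the algorithm name; the default inference path is already the PP-OCRv3 recognition one.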
In [27]
# GPU prediction
!python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir=./inference/ch_PP-OCRv3_rec/ --rec_char_dict_path=ppocr/utils/en_dict.txt
# CPU prediction
!python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir=./inference/ch_PP-OCRv3_rec/ --rec_char_dict_path=ppocr/utils/en_dict.txt --use_gpu=False
[2022/05/05 16:13:50] ppocr INFO: Predicts of ./doc/imgs_words_en/word_336.png:('SUPER', 0.9802668690681458)
[2022/05/05 16:13:53] ppocr INFO: Predicts of ./doc/imgs_words_en/word_336.png:('SUPER', 0.9802700281143188)
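The two log lines above can be compared programmatically to confirm that both runs agree. The sketch below parses them with a regular expression written for the exact format shown here (an assumption; other versions may log differently). It also assumes the first line comes from the GPU run and the second from the CPU run, following the order of the commands. `parse_prediction` is a hypothetical helper.

```python
import ast
import re

# Matches "... Predicts of <path>:('<text>', <score>)" as printed in the log above.
LOG_PATTERN = re.compile(r"Predicts of (?P<path>.+?):(?P<result>\(.*\))\s*$")


def parse_prediction(line):
    """Return (image_path, text, score) from a log line, or None if it does not match."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    text, score = ast.literal_eval(match.group("result"))
    return match.group("path"), text, float(score)


gpu_line = "[2022/05/05 16:13:50] ppocr INFO: Predicts of ./doc/imgs_words_en/word_336.png:('SUPER', 0.9802668690681458)"
cpu_line = "[2022/05/05 16:13:53] ppocr INFO: Predicts of ./doc/imgs_words_en/word_336.png:('SUPER', 0.9802700281143188)"

_, gpu_text, gpu_score = parse_prediction(gpu_line)
_, cpu_text, cpu_score = parse_prediction(cpu_line)

# The recognized text should be identical; scores may differ slightly in floating point.
print(gpu_text == cpu_text)
print(abs(gpu_score - cpu_score) < 1e-4)
```

The 1e-4 tolerance is an arbitrary choice for illustration; the two scores in this log differ by only a few millionths.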
The image used for inference:
3. FAQ
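Q1: Why do prediction results differ after converting a trained model to an inference model?
A: This comes up often. In most cases the preprocessing and postprocessing parameters used when predicting with the trained model differ from those used when predicting with the inference model.
Q2: How do I customize the dictionary, change the backbone, or train a multilingual model?
A: See the detailed PP-OCRv3 recognition tutorial.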
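For Q1, one way to track down a mismatch is to list the preprocessing and postprocessing settings of both runs side by side and diff them. The sketch below does this with plain dictionaries. `find_mismatches` is a hypothetical helper, the training values are taken from the config dump and commands in this tutorial, and the inference values illustrate an assumed mistake (a prediction run that fell back to a different dictionary because --rec_char_dict_path was not passed).

```python
def find_mismatches(train_settings, infer_settings):
    """Return {key: (train_value, infer_value)} for every setting that differs."""
    mismatches = {}
    for key in sorted(set(train_settings) | set(infer_settings)):
        train_value = train_settings.get(key)
        infer_value = infer_settings.get(key)
        if train_value != infer_value:
            mismatches[key] = (train_value, infer_value)
    return mismatches


# Settings used for training, as shown earlier in this tutorial.
train_settings = {
    "character_dict_path": "ppocr/utils/en_dict.txt",
    "use_space_char": True,
    "image_shape": [3, 48, 320],
}

# Hypothetical inference run with a different dictionary (placeholder path).
infer_settings = {
    "character_dict_path": "ppocr/utils/ppocr_keys_v1.txt",
    "use_space_char": True,
    "image_shape": [3, 48, 320],
}

for key, (train_value, infer_value) in find_mismatches(train_settings, infer_settings).items():
    print(f"{key}: train={train_value!r} inference={infer_value!r}")
```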
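4. Livestream announcement
🔥 May 11 to 13, 2022, 8:30 pm every evening: three-day livestream course "In-Depth OCR Technology and Industrial Application Practice"
- May 11: Inside PP-OCRv3, the strongest open-source OCR system
- May 12: PP-OCRv3 training and deployment across cloud, edge and device
- May 13: Full walkthrough of industrial OCR applications
Scan the code to sign up now!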
Reposted from AI Studio. Original article: 【官方】十分钟完成 PP-OCRv3 识别全流程实战 - 飞桨AI Studio