Beijing University of Posts and Telecommunications, Beijing, China
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Noah's Ark Lab, Huawei Technologies, Shenzhen, China
How to estimate the quality of the network output is an important issue, and currently there is no effective solution in the field of human parsing. To address this problem, this work proposes a statistical method based on the output probability map to calculate the pixel classification quality, which is called the pixel score. In addition, the Quality-Aware Module (QAM) is proposed to fuse different quality information, with the purpose of estimating the quality of human parsing results. We combine QAM with a concise and effective network design to propose the Quality-Aware Network (QANet) for human parsing. Benefiting from the superiority of QAM and QANet, we achieve the best performance on three multiple and two single human parsing benchmarks, including CIHP, MHP-v2, Pascal-Person-Part, ATR and LIP. Without increasing the training and inference time, QAM improves the AP$^\text{r}$ criterion by more than 10 points in the multiple human parsing task. QAM can also be extended to other tasks that require good quality estimation, e.g., instance segmentation. Specifically, QAM improves Mask R-CNN by $\sim$1% mAP on the COCO and LVISv1.0 datasets. Based on the proposed QAM and QANet, our overall system wins 1st place in the CVPR2021 L2ID High-resolution Human Parsing (HRHP) Challenge and 2nd place in the CVPR2021 PIC Short-video Face Parsing (SFP) Challenge. Code and models are available at
https://github.com/soeaver/QANet
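
The abstract does not spell out how the pixel score is computed from the probability map or how QAM fuses the different quality cues; a minimal sketch of one plausible reading, in PyTorch, is given below. The function names `pixel_score` and `fused_quality`, the foreground threshold, and the geometric-mean fusion are illustrative assumptions, not the definitions used in the paper or the QANet repository.

```python
import torch


def pixel_score(prob_map: torch.Tensor, fg_threshold: float = 0.5) -> torch.Tensor:
    """Hypothetical pixel score: a statistic of the output probability map.

    Here it is taken as the mean of the per-pixel maximum class probability
    over pixels predicted as confident foreground (channel 0 is assumed to
    be background). prob_map: (C, H, W) softmax output of the parsing head.
    """
    max_prob, label = prob_map.max(dim=0)               # per-pixel confidence and class
    fg_mask = (label > 0) & (max_prob > fg_threshold)   # keep confident foreground pixels
    if not fg_mask.any():
        return prob_map.new_tensor(0.0)
    return max_prob[fg_mask].mean()


def fused_quality(prob_map: torch.Tensor, box_score: float) -> torch.Tensor:
    """Hypothetical QAM-style fusion: combine the statistics-based pixel score
    with the detector's box confidence into one score used to rank results."""
    return (pixel_score(prob_map) * box_score).sqrt()   # geometric mean of the two cues


if __name__ == "__main__":
    # Dummy 20-class parsing output for a single detected person.
    probs = torch.softmax(torch.randn(20, 64, 64), dim=0)
    print(float(fused_quality(probs, box_score=0.9)))
```

In this reading, the fused score would replace the raw detection confidence when ranking per-instance parsing results, which is how a quality estimate can raise the AP$^\text{r}$ criterion without changing the masks themselves.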