Heatmap-based methods dominate in the field of human pose estimation by modelling the output distribution through
likelihood heatmaps. In contrast, regression-based methods are more efficient but suffer from inferior performance. In
this work, we explore maximum likelihood estimation (MLE) to develop an efficient and effective regression-based
method. From the perspective of MLE, adopting a particular regression loss amounts to making a particular assumption
about the output density function. A density function closer to the true distribution leads to better regression performance. In
light of this, we propose a novel regression paradigm with Residual Log-likelihood Estimation (RLE) to capture the
underlying output distribution. Concretely, RLE learns the change of the distribution instead of the unreferenced
underlying distribution to facilitate the training process. With the proposed reparameterization design, our method is
compatible with off-the-shelf flow models. The proposed method is effective, efficient and flexible. We show its
potential in various human pose estimation tasks with comprehensive experiments. Compared to the conventional regression
paradigm, regression with RLE brings a 12.4 mAP improvement on MSCOCO without any test-time overhead. Moreover,
for the first time, our regression method outperforms heatmap-based methods, especially on multi-person pose
estimation.
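
As an illustration of the MLE view of regression losses, the following minimal sketch checks numerically that, with the scale fixed to 1, the negative log-likelihood of a Gaussian density equals the L2 loss up to a constant, and that of a Laplace density equals the L1 loss up to a constant. It is not the RLE method itself; the function names and the sample values are hypothetical and chosen only for illustration.

```python
import math


def gaussian_nll(x, mu, sigma=1.0):
    """Negative log-likelihood of x under a Gaussian N(mu, sigma^2)."""
    return 0.5 * ((x - mu) / sigma) ** 2 + math.log(sigma * math.sqrt(2.0 * math.pi))


def laplace_nll(x, mu, b=1.0):
    """Negative log-likelihood of x under a Laplace density with location mu, scale b."""
    return abs(x - mu) / b + math.log(2.0 * b)


def l2_loss(x, mu):
    """Half squared error, the usual L2 regression loss."""
    return 0.5 * (x - mu) ** 2


def l1_loss(x, mu):
    """Absolute error, the usual L1 regression loss."""
    return abs(x - mu)


# Hypothetical ground-truth coordinate and a few candidate predictions.
target = 0.3
predictions = [-1.0, 0.0, 0.25, 2.0]

# With unit scale, each NLL differs from its loss only by a constant that
# does not depend on the prediction, so both have the same minimiser.
gauss_offsets = [gaussian_nll(target, p) - l2_loss(target, p) for p in predictions]
laplace_offsets = [laplace_nll(target, p) - l1_loss(target, p) for p in predictions]

print(gauss_offsets)
print(laplace_offsets)
```

Because the offsets do not depend on the prediction, minimising the L2 (or L1) loss gives the same estimate as maximising the likelihood under a fixed-scale Gaussian (or Laplace) density, which is the sense in which choosing a loss is choosing an assumed output density.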