Data augmentation method for improving the accuracy of human pose estimation with cropped images
Introduction
Body poses are primarily used to understand human behavior. The availability of large-scale image datasets of human poses (e.g., [1] and [2]) has encouraged researchers and practitioners to develop neural networks for human pose estimation. However, the pose estimation task remains challenging, especially when a body is only partially visible; performance tends to deteriorate because the image offers fewer cues. Ruggero Ronchi and Perona [3] also reported that errors increase when an input image has few visible keypoints and that a large portion of the errors are false negatives, which indicates that the network fails to localize certain keypoints.
Data augmentation is a regularization method that improves the training performance of neural networks without modifying their architectures. It helps networks generalize by adding modified data to the training process. The affine transform-based data augmentation introduced by Cireşan et al. [4], Krizhevsky et al. [5] and Simonyan and Zisserman [6] has become an essential step in the human pose estimation domain. In addition to augmentations that change feature patterns within the region of interest (ROI), altering the ROI itself can reinforce the training data. When we modify the ROI that captures a human figure, the new ROI contains a different amount of information for human pose estimation.
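As a rough illustration of affine transform-based augmentation applied to pose data, the sketch below rotates, scales, and shifts keypoint coordinates about a center point. The function name `affine_augment` and its parameters are hypothetical and are not taken from [4], [5] or [6]; in practice the image would be warped with the same transform so that the labels stay aligned with the pixels.

```python
import math


def affine_augment(keypoints, center, angle_deg=0.0, scale=1.0, shift=(0.0, 0.0)):
    """Rotate and scale (x, y) keypoints about `center`, then translate them.

    Hypothetical helper for illustration only; it uses the standard 2D
    rotation matrix multiplied by a uniform scale factor.
    """
    theta = math.radians(angle_deg)
    a = math.cos(theta) * scale  # scaled cosine term
    b = math.sin(theta) * scale  # scaled sine term
    cx, cy = center
    out = []
    for x, y in keypoints:
        dx, dy = x - cx, y - cy  # coordinates relative to the center
        out.append((cx + a * dx - b * dy + shift[0],
                    cy + b * dx + a * dy + shift[1]))
    return out
```

Sampling `angle_deg`, `scale`, and `shift` at random for each training sample would yield a different view of the same annotated person on every pass.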
In the present study, we propose body-cropping augmentation (BCA) to improve the accuracy of human pose estimation. BCA generates new training data by selecting a human ROI in various ways. As a result, a neural network learns to localize keypoints using only partial features of the human body, which in turn improves the general performance of human pose estimation. BCA includes policies for compiling proper training data so that the augmented data do not contain overly small segments or highly similar images. In addition, we provide a learning strategy that maximizes the effectiveness of the proposed data augmentation. Using the results of various experiments involving state-of-the-art neural networks (e.g., Chen et al. [7], Xiao et al. [8] and Sun et al. [9]), we verify that the proposed BCA can improve the performance of a state-of-the-art network by an average of 1.08% on the val2017 dataset. Furthermore, using the results from the benchmarking tool of [3], we show that BCA contributes to the reduction of false negatives, especially when the input image has few keypoints.
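The general idea of generating cropped ROIs while filtering out overly small or near-duplicate crops can be sketched as follows. This is not the authors' implementation: the keypoint grouping, the thresholds (`min_keypoints`, `min_area_ratio`, `max_iou`), and the use of intersection over union as a similarity measure are all assumptions made for illustration, since the actual policies are not given in this excerpt.

```python
def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def body_crops(full_box, keypoints, groups, min_keypoints=3,
               min_area_ratio=0.1, max_iou=0.8):
    """Return one candidate crop box per keypoint group, after filtering.

    keypoints: dict of name -> (x, y, visible)
    groups:    dict of crop name -> list of keypoint names (assumed grouping)
    """
    full_area = (full_box[2] - full_box[0]) * (full_box[3] - full_box[1])
    kept = {}
    for crop_name, names in groups.items():
        pts = [keypoints[n][:2] for n in names
               if n in keypoints and keypoints[n][2]]
        if len(pts) < min_keypoints:
            continue  # too few visible keypoints to define a useful crop
        xs, ys = zip(*pts)
        # Tight box around the selected keypoints, clipped to the person box.
        box = (max(full_box[0], min(xs)), max(full_box[1], min(ys)),
               min(full_box[2], max(xs)), min(full_box[3], max(ys)))
        area = (box[2] - box[0]) * (box[3] - box[1])
        if area < min_area_ratio * full_area:
            continue  # crop is too small relative to the original ROI
        if any(iou(box, other) > max_iou
               for other in [full_box, *kept.values()]):
            continue  # crop is too similar to the original or an earlier crop
        kept[crop_name] = box
    return kept
```

Each surviving box would then be used to cut a new training sample from the original image, with the keypoint labels restricted to those that fall inside the crop.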
Section snippets
Keypoint detection for human pose estimation
Recently, various types of neural network architectures have been proposed to estimate the probability maps of keypoint locations from a single image. Research on human pose estimation has two main streams, namely, the top-down and the bottom-up approaches. Chen et al. [7], Xiao et al. [8], Newell et al. [10], He et al. [11], and Sun et al. [9] adopt top-down approaches, in which human pose estimation is performed after human detection, which provides the ROI of a target human. The accuracy of a
Body-cropping augmentation (BCA)
Cropping may not always be effective for data augmentation. For example, cropped images might contain too little information, which can distract from the original problem, or differ too slightly from the original to play any meaningful role in data augmentation. Meanwhile, there are various ways to use the augmented data in combination with the original data. In this section, we first introduce data cropping strategies to alleviate the possible problems caused by the cropping method (Section 3.1). Then, we
Dataset and experimental setting
Various datasets, such as those of [1], [21] and [2], are available for the study. Among them, we choose the COCO dataset of [2] because it has the largest volume of images with plentiful annotations such as bounding boxes, keypoints, and segmentation. The dataset also contains various wild cases. The term ‘wild’ here indicates that the dataset includes images with large variance without any constraints for human pose estimation. For example, the data include an image with a
Conclusion
We have proposed body-cropping augmentation (BCA), which includes a data collection method and a learning strategy, for enhancing human pose estimation. Using the COCO dataset, the proposed BCA improves the accuracy of state-of-the-art neural networks by approximately 1.08% and 0.92% on average on val2017 and test2017, respectively. In addition, BCA allows the networks to alleviate false negatives effectively, especially when an image has few keypoints. The result is even promising when we
Declaration of Competing Interest
- All authors have participated in (a) conception and design, or analysis and interpretation of the data; (b) drafting the article or revising it critically for important intellectual content; and (c) approval of the final version.
- This manuscript has not been submitted to, nor is under review at, another journal or other publishing venue.
- The authors have no affiliation with any organization with a direct or indirect financial interest in the subject matter discussed in the manuscript.
Acknowledgments
This work was supported by the Institute of Information and Communications Technology Planning and Evaluation (IITP) grant funded by the Korea government (MSIT) [R0118-19-1004, Development of Intelligent Interaction Technology based on Recognition of User’s State and Intention for Digital Life].
References (23)
- et al., 2D human pose estimation: new benchmark and state of the art analysis, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2014)
- et al., Microsoft COCO: common objects in context, Proceedings of the European Conference on Computer Vision (ECCV) (2014)
- et al., Benchmarking and error diagnosis in multi-instance pose estimation, Proceedings of IEEE International Conference on Computer Vision (ICCV) (2017)
- et al., Deep, big, simple neural nets for handwritten digit recognition, Neural Comput. (2010)
- et al., ImageNet classification with deep convolutional neural networks, NIPS (2012)
- et al., Very deep convolutional networks for large-scale image recognition, ICLR (2014)
- et al., Cascaded pyramid network for multi-person pose estimation, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)
- et al., Simple baselines for human pose estimation and tracking, Proceedings of the European Conference on Computer Vision (ECCV) (2018)
- K. Sun, B. Xiao, D. Liu, J. Wang, Deep high-resolution representation learning for human pose estimation,...
- et al., Stacked hourglass networks for human pose estimation, Proceedings of the European Conference on Computer Vision (ECCV) (2016)
- Mask R-CNN, Proceedings of IEEE International Conference on Computer Vision (ICCV)