Short communication
Robust regression with deep CNNs for facial age estimation: An empirical study
Introduction
Age estimation from face images has received a lot of attention in recent times (Angulu, Tapamo, & Adewumi, 2018). This problem has given rise to a significant amount of research since it has a broad spectrum of practical applications, such as age-oriented commercial advertisement, police investigation, security, soft biometrics, and age-specific Human-Computer Interaction. In the literature, there are two types of age estimation problems. The first one addresses real age estimation, which is to predict the biological age of a subject using his or her face image (Fernandez, Huerta, & Prati, 2014). The second one addresses age group estimation, where the objective is to predict the age group in which the person’s age falls (Levi & Hassner, 2015). Broadly speaking, there are two categories of approaches for estimating age from images, depending on the type of image representation: (i) hand-crafted methods (e.g., J. Wang, W. Yau, & Wang, 2009; Lu, Liong, & Zhou, 2015; Nguyen, Cho, & Park, 2014; Sai, Wang, & Teoh, 2015) and (ii) deep learning methods (e.g., Dehghan, Ortiz, Shu, & Masood; Dornaika, Arganda-Carreras, & Belver, 2019; Han, Jain, Wang, Shan, & Chen, 2018; Huerta, Fernández, Segura, Hernando, & Prati, 2015; Ranjan, Zhou, Chen, Kumar, Alavi, Patel, & Chellappa, 2015). In the first category, the approaches use a general-purpose image descriptor and then apply classification or regression to the resulting shallow image descriptors. In the second category, an end-to-end solution is provided by learning a set of linear and nonlinear functions directly from the raw images. Due to their superior performance in several computer vision tasks, deep learning methods have recently been proposed and used for facial image analysis (e.g., Liu, Lu, Feng, & Zhou, 2017; Rothe, Timofte, & Gool, 2018; Taheri & Toygar, 2019).
In Rothe et al. (2018), the authors introduced a deep learning method for age prediction. They generated a large database (IMDB-WIKI database), which is among the largest public age databases. Their solution adopted the use of age groups, but the final estimate is a real number since the expected age can be computed from the obtained probability distribution. In Shen et al. (2018), the authors introduced Deep Regression Forests (DRFs) and used them as an end-to-end solution for age prediction. These DRFs receive as input the output of a given deep CNN or any other image representation. The authors devised an algorithm that learns the CNN, the data clusters at the split nodes, and the data representation at the leaf nodes. In Taheri and Toygar (2019), the authors estimated ages by exploiting Directed Acyclic Graph Convolutional Neural Networks (DAG-CNNs). Their method uses multi-stage outputs from different layers of the CNN; the architecture exploits multi-scale features and fuses the scores associated with multiple classifiers. In Lou, Alnajar, Alvarez, Hu, and Gevers (2018), the authors proposed an expression-invariant age estimation method that simultaneously learns the expression and the age. They learn the relationship between expression and age by deploying a graphical model with a latent layer. In Liu, Lu, Feng, and Zhou (2018b), the authors introduced an ordinal deep learning scheme that jointly learns the face representation and the age predictor.
The present submission is not about proposing novel robust loss functions. It focuses instead on the practical aspects of using such robust loss functions for the particular problem of age estimation using deep CNNs. To the best of our knowledge, almost all deep regression networks for age estimation have adopted the Mean Square Error loss. This means that the loss function used for training the CNN is the squared ℓ2 norm of the age error. However, the ℓ2 norm is a non-robust estimator that can lead to poor generalization when aberrant data are present in the training set. Existing deep regression networks have not considered the influence of aberrant and outlier observations on the final model. In the domain of age prediction from face images, an aberrant image is one whose prediction error is large relative to that of the majority of training images. Since the regression model is unknown beforehand, one cannot decide in advance whether or not a face image is aberrant. Therefore, in the training phase, a robust loss function should reduce the effect of these unexpected errors on the whole CNN model that performs the age regression.
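As a small illustration of this sensitivity (not taken from the paper), consider fitting a single constant age to a set of labels. The constant that minimizes the sum of squared errors is the mean, while a constant that minimizes the sum of absolute errors is the median. The age values below are made up for the example.

```python
import numpy as np

# Hypothetical age labels: five plausible values plus one aberrant label.
ages = np.array([24.0, 25.0, 26.0, 27.0, 28.0, 90.0])

# Constant prediction minimizing the sum of squared errors (l2).
l2_fit = ages.mean()
# Constant prediction minimizing the sum of absolute errors (l1).
l1_fit = np.median(ages)

print(f"l2 fit (mean):   {l2_fit:.2f}")
print(f"l1 fit (median): {l1_fit:.2f}")
```

In this toy setting the single aberrant label pulls the squared-error fit outside the range of the five plausible labels, whereas the absolute-error fit stays among them.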
In this paper, we propose the use of robust loss functions for training deep regression networks for age estimation. By retraining a given deep CNN architecture with a robust loss function, we are able to improve the accuracy of age estimation as well as the convergence properties of the training process. More precisely, we explore the use of two robust loss functions: (i) the ℓ1 norm error, and (ii) the adaptive loss function. The adaptive loss function retains the advantages of both the ℓ1 and ℓ2 norms: it can handle data with small losses that follow a Gaussian distribution as well as outlier data that follow a Laplacian distribution.
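The following is a minimal sketch of how the three per-sample losses compare. This excerpt does not give the formula of the adaptive loss, so the form used here, (1 + σ)e² / (|e| + σ), is an assumption borrowed from the literature on functions that interpolate between ℓ1 and ℓ2 errors. The parameter name `sigma`, its default value, and the sample errors are hypothetical.

```python
import numpy as np

def l2_loss(err):
    """Squared error, the per-sample term of the Mean Square Error."""
    return np.square(np.asarray(err, dtype=float))

def l1_loss(err):
    """Absolute error."""
    return np.abs(np.asarray(err, dtype=float))

def adaptive_loss(err, sigma=1.0):
    """Assumed interpolating form: (1 + sigma) * e^2 / (|e| + sigma).

    Roughly quadratic when |e| << sigma and roughly linear when |e| >> sigma.
    `sigma` is a hypothetical tuning parameter for this sketch.
    """
    err = np.asarray(err, dtype=float)
    return (1.0 + sigma) * np.square(err) / (np.abs(err) + sigma)

# Made-up age errors in years, from small to clearly aberrant.
errors = np.array([0.5, 2.0, 10.0, 40.0])
for name, fn in [("l2", l2_loss), ("l1", l1_loss), ("adaptive", adaptive_loss)]:
    print(name, np.round(fn(errors), 2))
```

Under this assumed form, letting `sigma` tend to zero recovers the absolute error, and letting it grow recovers the squared error, so a single parameter controls how strongly large errors are down-weighted.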
Section snippets
Approach: deep CNNs with robust loss
In our work, we start from a given deep network architecture and proceed as follows. If the network architecture performs discrete classification (age groups), the corresponding layers are replaced by a linear regression function that estimates the age. The resulting architecture is then retrained using the robust loss functions described below. The proposed approach is illustrated in Fig. 1. For a given network and a given loss function, the retraining
Face preprocessing
In the face preprocessing stage, we detect the face and the positions of the eyes. The two eye positions are used to rectify the face in the image: a 2D similarity transformation is applied to the face image in order to align the face. Finally, after a suitable cropping, we obtain the face image that is fed to the CNNs.
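The alignment step can be sketched as follows. Two point correspondences determine a 2D similarity transformation (rotation, uniform scale, translation) exactly, so the detected eye centers can be mapped onto fixed canonical positions. The function names, the pixel coordinates, and the canonical eye positions are hypothetical; the excerpt does not state the values the authors used.

```python
import numpy as np

def eye_similarity_transform(left_eye, right_eye, target_left, target_right):
    """Return the 2x3 matrix of the similarity transform that maps the
    detected eye centers onto the target eye positions."""
    # Treat 2D points as complex numbers: z' = a * z + b.
    p1, p2 = complex(*left_eye), complex(*right_eye)
    q1, q2 = complex(*target_left), complex(*target_right)
    a = (q2 - q1) / (p2 - p1)  # encodes rotation and uniform scale
    b = q1 - a * p1            # translation
    return np.array([[a.real, -a.imag, b.real],
                     [a.imag,  a.real, b.imag]])

def apply_transform(matrix, point):
    """Apply a 2x3 affine matrix to a single (x, y) point."""
    x, y = point
    return matrix @ np.array([x, y, 1.0])

# Hypothetical detections (pixels) and hypothetical canonical eye positions.
M = eye_similarity_transform((120.0, 150.0), (180.0, 130.0),
                             (70.0, 90.0), (154.0, 90.0))
print(np.round(apply_transform(M, (120.0, 150.0)), 3))
print(np.round(apply_transform(M, (180.0, 130.0)), 3))
```

The resulting 2x3 matrix is in the form commonly accepted by image-warping routines, after which the aligned image can be cropped to the input size expected by the network.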
Deep networks
In our study, we re-train the following deep CNN architectures: VGG-Face, DEX-IMDB-WIKI, and DEX-ChaLearn.
VGG-Face net. This deep CNN contains 11 blocks. Each block has a linear
Conclusion
We introduced the use of robust loss functions for training deep CNNs in regression-based age estimation. We have shown that training deep CNNs with an adaptive loss function, which is robust to outliers, yields better performance and generalization than the standard Mean Square Error loss commonly used in regression-based age estimation. The use of such a loss function also makes the training process converge faster.
CRediT authorship contribution statement
F. Dornaika: Conceptualization, Formal analysis, Funding acquisition, Investigation, Methodology, Writing - original draft. SE. Bekhouche: Conceptualization, Data curation, Investigation, Software, Validation, Writing - original draft. I. Arganda-Carreras: Data curation, Funding acquisition, Software, Validation.
Declaration of Competing Interest
The authors declare that they do not have any financial or nonfinancial conflict of interests.
Acknowledgment
This work was funded by the Spanish Ministerio de Ciencia, Innovacion y Universidades, Programa Estatal de I+D+i Orientada a los Retos de la Sociedad, RTI2018-101045-B-C21.
References (39)
- et al. (2017). Pyramid multi-level features for facial demographic estimation. Expert Systems with Applications.
- et al. (2016). A cascaded convolutional neural network for age estimation of unconstrained faces. Biometrics theory, applications and systems (BTAS), 2016 IEEE 8th international conference on.
- et al. (2009). Age categorization via ECOC with fused Gabor and LBP features. 2009 Workshop on applications of computer vision (WACV).
- et al. (2018). Label-sensitive deep metric learning for facial age estimation. IEEE Transactions on Information Forensics and Security.
- et al. (2018). Joint estimation of age and expression by combining scattering and convolutional networks. TOMCCAP.
- et al. (2017). Apparent and real age estimation in still images with deep residual regressors on appa-real database. Automatic face & gesture recognition (FG2017), 2017 12th IEEE international conference on.
- et al. (2018). Age estimation via face images: A survey. EURASIP Journal on Image and Video Processing.
- et al. (2011). Ordinal hyperplanes ranker with cost sensitivities for age estimation. CVPR 2011.
- et al. (2013). Cumulative attribute space for age and crowd density estimation. Proceedings of the IEEE conference on computer vision and pattern recognition.
- et al. (2013). Cumulative attribute space for age and crowd density estimation. 2013 IEEE conference on computer vision and pattern recognition.
- A new robust function that smoothly interpolates between l1 and l2 error functions. Tech Report.
- Age estimation in facial images through transfer learning. Machine Vision and Applications.
- Faces - A database of facial expressions in young, middle-aged, and older women and men: Development and validation. Behavior Research Methods.
- Chalearn looking at people 2015: Apparent age and cultural event recognition datasets and results. Proceedings of the IEEE international conference on computer vision workshops.
- A comparative evaluation of regression learning algorithms for facial age estimation. FFER in conjunction with IEEE international conference on pattern recognition.
- Facial age estimation by learning from label distributions. IEEE Transactions on Pattern Analysis and Machine Intelligence.
- Image-based human age estimation by manifold learning and locally adjusted robust regression. IEEE Transactions on Image Processing.
- A study on human age estimation under facial expression changes. Computer vision and pattern recognition (CVPR), 2012 IEEE conference on.