Short communication
Robust regression with deep CNNs for facial age estimation: An empirical study

https://doi.org/10.1016/j.eswa.2019.112942

Highlights

  • The letter proposes deep CNN models for facial age estimation.

  • It opens the door for the use of robust loss functions in Deep CNNs.

  • Empirical evaluations are provided on four public datasets.

  • The adaptive loss function seems to be a promising loss function.

Abstract

Recent works have shown that deep Convolutional Neural Networks (CNNs) can be very effective for image-based age estimation. However, the proposed approaches significantly vary, and there are still some open problems. Almost all deep regression networks for age estimation have exploited the Mean Square Error loss only. These deep networks have not considered the influence of aberrant and outlier observations on the final model. In this letter, we introduce the use of robust loss functions in order to learn deep regression networks for age estimation. More precisely, we explore the use of two robust regression functions: (i) the ℓ1 norm error, and (ii) the adaptive loss function that retains the advantages of the ℓ1 and ℓ2 norms. Experimental results obtained on four public databases demonstrate that learning a deep CNN with robust losses can improve the age estimation.

Introduction

Age estimation from face images has received a lot of attention in recent years (Angulu, Tapamo, & Adewumi, 2018). The problem has given rise to a significant amount of research since it has a broad spectrum of practical applications, such as age-oriented commercial advertisement, police investigation, security, soft biometrics, and age-specific Human-Computer Interaction. In the literature, there are two types of age estimation problems. The first addresses real age estimation, which is to predict the biological age of a subject from his or her face image (Fernandez, Huerta, & Prati, 2014). The second addresses age group estimation, where the objective is to predict the age group in which the person’s age falls (Levi & Hassner, 2015). Broadly speaking, there are two categories of approaches for estimating age from images, depending on the type of image representation: (i) hand-crafted methods (e.g., J. Wang, W. Yau, Wang, 2009; Lu, Liong, Zhou, 2015; Nguyen, Cho, Park, 2014; Sai, Wang, Teoh, 2015) and (ii) deep learning methods (e.g., Dehghan, Ortiz, Shu, & Masood, 2017; Dornaika, Arganda-Carreras, Belver, 2019; Han, Jain, Wang, Shan, Chen, 2018; Huerta, Fernández, Segura, Hernando, Prati, 2015; Ranjan, Zhou, Chen, Kumar, Alavi, Patel, Chellappa, 2015). In the first category, the approaches use a general-purpose image descriptor and then apply classification or regression to the resulting shallow image descriptors. In the second category, an end-to-end solution is provided by learning a set of linear and nonlinear functions directly from the raw images. Due to their superior performance in several computer vision tasks, deep learning methods have recently been proposed for facial image analysis (e.g., Liu, Lu, Feng, Zhou, 2017; Rothe, Timofte, Gool, 2018; Taheri, Toygar, 2019).

In Rothe et al. (2018), the authors introduced a deep learning method for age prediction. They generated a large database (the IMDB-WIKI database), which is among the largest public age databases. Their solution adopts age groups, but the final estimate is a real number since the expected age can be computed from the obtained probability distribution. In Shen et al. (2018), the authors introduced Deep Regression Forests (DRFs) and used them as an end-to-end solution for age prediction. These DRFs receive as input the output of a given deep CNN or any other image representation. The authors devise an algorithm that learns the CNN, the data clusters at the split nodes and the data representation at the leaf nodes. In Taheri and Toygar (2019), the authors estimated ages by exploiting Directed Acyclic Graph Convolutional Neural Networks (DAG-CNNs). Their method uses multi-stage outputs from different layers of the CNN. The architecture they introduced uses multi-scale features and fuses the scores associated with multiple classifiers. In Lou, Alnajar, Alvarez, Hu, and Gevers (2018), the authors proposed an expression-invariant age estimation method that simultaneously learns the expression and the age. They learn the relationship between expression and age by deploying a graphical model with a latent layer. In Liu, Lu, Feng, and Zhou (2018b), the authors introduced an ordinal deep learning scheme in which the face representation and the age predictor are learned jointly.
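To make the expected-age idea attributed to Rothe et al. (2018) concrete, the following is a minimal sketch of how a real-valued age can be read off a distribution over discrete ages. The function names, the logits and the age bins are illustrative assumptions, not details taken from that work.

```python
import math


def softmax(logits):
    """Turn raw class scores into a probability distribution."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_age(probabilities, ages):
    """Expected value of the age under the given distribution."""
    if len(probabilities) != len(ages):
        raise ValueError("probabilities and ages must have the same length")
    return sum(p * a for p, a in zip(probabilities, ages))


# Hypothetical example: three age bins and a hand-written distribution.
example_ages = [20, 30, 40]
example_probs = [0.2, 0.5, 0.3]
example_estimate = expected_age(example_probs, example_ages)
```

Because the estimate is a probability-weighted average, it can fall between the discrete age labels even though the network itself outputs class scores.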

The present submission is not about proposing novel robust loss functions. It focuses instead on the practical aspects of using such robust loss functions for the particular problem of age estimation using deep CNNs. To the best of our knowledge, almost all deep regression networks for age estimation have adopted the Mean Square Error loss. This means that the loss function used for training the CNN is set to the ℓ2 norm of the age error. However, the ℓ2 norm is a non-robust estimator that can lead to poor generalization in cases where aberrant data are present in the training set. The existing deep regression networks have not considered the influence of aberrant and outlier observations on the final model. In the domain of age prediction from face images, an aberrant image is an image that has a large prediction error with respect to the majority of training images. Since the regression model is unknown, one cannot decide whether or not a face image is aberrant. Therefore, in the training phase, a robust loss function should reduce the effect of these unexpected errors on the whole CNN model that performs the age regression.
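The sensitivity of the ℓ2 loss to aberrant data can be illustrated with a toy case that is not taken from the paper: when a single constant is fitted to a set of ages, the squared error is minimized by the mean and the absolute error by the median. The age values below are made up for illustration.

```python
import statistics


def mean_squared_error(constant, targets):
    """Average squared error of a constant prediction."""
    return sum((constant - t) ** 2 for t in targets) / len(targets)


def mean_absolute_error(constant, targets):
    """Average absolute error of a constant prediction."""
    return sum(abs(constant - t) for t in targets) / len(targets)


# Hypothetical labels: four consistent ages and one aberrant value.
toy_ages = [20, 22, 24, 26, 90]

l2_minimizer = statistics.mean(toy_ages)    # pulled towards the aberrant label
l1_minimizer = statistics.median(toy_ages)  # stays with the majority
```

Here the ℓ2 minimizer lands above every one of the four consistent labels, while the ℓ1 minimizer stays among them, which is the kind of behaviour a robust loss is meant to provide.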

In this paper, we propose the use of robust loss functions in order to learn deep regression networks for age estimation. By retraining a given deep CNN architecture with a robust regression function, we are able to improve the accuracy of age estimation as well as the convergence properties of the training process. More precisely, we explore the use of two robust loss functions: (i) the ℓ1 norm error, and (ii) the adaptive loss function. The adaptive loss function retains the advantages of the ℓ1 and ℓ2 norms. It can handle the Gaussian distribution of data with small losses and the Laplacian distribution of outlier data.
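The three losses can be compared on a single age error e. The text available here does not state the formula of the adaptive loss, so the sketch below assumes the form (1 + σ)e² / (|e| + σ) found in the adaptive-loss literature; both this form and the parameter name `sigma` are assumptions rather than the authors' stated definition.

```python
def l2_loss(error):
    """Squared error, the per-sample term of the Mean Square Error."""
    return error * error


def l1_loss(error):
    """Absolute error."""
    return abs(error)


def adaptive_loss(error, sigma=1.0):
    """Assumed adaptive loss: (1 + sigma) * e^2 / (|e| + sigma).

    For |e| much smaller than sigma it behaves like a scaled squared error,
    and for |e| much larger than sigma like a scaled absolute error.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (1.0 + sigma) * error * error / (abs(error) + sigma)


# Under this assumed form, sigma interpolates between the two norms.
near_l1 = adaptive_loss(3.0, sigma=1e-9)  # close to |3| = 3
near_l2 = adaptive_loss(3.0, sigma=1e9)   # close to 3^2 = 9
```

With a moderate `sigma`, small errors are penalized quadratically while large errors grow only linearly, which matches the description of handling Gaussian-like small losses and Laplacian-like outliers.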

Section snippets

Approach: deep CNNs with robust loss

In our work, we start from a given deep network architecture and proceed as follows. If the network architecture provides discrete classification (age groups), its corresponding layers are replaced by a linear regression function that estimates the age. The resulting architecture is then retrained using the robust loss functions described below. The proposed approach is shown in Fig. 1. For a given network and a given loss function, the retraining
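As a rough illustration of a linear regression output trained with a robust loss, the toy sketch below fits a one-dimensional linear head with the ℓ1 loss by subgradient descent. It is only a stand-in: the scalar feature, the data, the learning rate and the number of steps are all hypothetical, and the actual retraining procedure for the CNNs is not described in the text available here.

```python
def sign(value):
    """Return -1, 0 or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def train_linear_head(features, ages, lr=0.5, steps=2000):
    """Fit age = w * feature + b by subgradient descent on the mean l1 loss."""
    w, b = 0.0, 0.0
    n = len(features)
    for _ in range(steps):
        # The subgradient of |prediction - age| is the sign of the residual.
        signs = [sign(w * x + b - y) for x, y in zip(features, ages)]
        grad_w = sum(s * x for s, x in zip(signs, features)) / n
        grad_b = sum(signs) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def mean_abs_error(w, b, features, ages):
    """Mean absolute age error of the linear head."""
    return sum(abs(w * x + b - y) for x, y in zip(features, ages)) / len(ages)


# Hypothetical scalar features standing in for a CNN representation.
toy_features = [0.0, 0.25, 0.5, 0.75, 1.0]
toy_targets = [20.0, 30.0, 40.0, 50.0, 60.0]

w_fit, b_fit = train_linear_head(toy_features, toy_targets)
```

In a real network the same loss would be applied to the output of the regression layer and the gradients propagated through all layers by the training framework.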

Face preprocessing

In the face preprocessing stage, we detect the face and the positions of the two eyes. The eye positions are used to rectify the face in the image: a 2D similarity transformation is applied to the face image in order to align the face. Finally, after a suitable cropping, we get a face image that will be used by the CNNs.
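A 2D similarity transformation (rotation, uniform scale and translation) is fully determined by two point correspondences, so the two eye positions are enough to define the alignment. The sketch below computes such a transform with complex arithmetic; the detected and canonical eye coordinates are hypothetical values chosen for illustration, and the face and eye detectors are outside its scope.

```python
def similarity_from_eyes(left_eye, right_eye, target_left, target_right):
    """Return (a, b) such that z -> a * z + b maps the detected eyes onto
    the target eyes, with image points treated as complex numbers x + iy."""
    p1, p2 = complex(*left_eye), complex(*right_eye)
    q1, q2 = complex(*target_left), complex(*target_right)
    if p1 == p2:
        raise ValueError("the two eye positions must be distinct")
    a = (q2 - q1) / (p2 - p1)  # encodes rotation and uniform scale
    b = q1 - a * p1            # translation
    return a, b


def apply_similarity(a, b, point):
    """Apply the similarity transform to an (x, y) point."""
    z = a * complex(*point) + b
    return (z.real, z.imag)


# Hypothetical detected eyes and canonical positions in the aligned crop.
detected_left, detected_right = (100.0, 120.0), (180.0, 140.0)
canonical_left, canonical_right = (70.0, 90.0), (154.0, 90.0)

a_coef, b_coef = similarity_from_eyes(
    detected_left, detected_right, canonical_left, canonical_right
)
aligned_left = apply_similarity(a_coef, b_coef, detected_left)
aligned_right = apply_similarity(a_coef, b_coef, detected_right)
```

The modulus of `a_coef` is the scale factor and its argument is the rotation angle; warping every pixel with the same transform and cropping around the canonical eye positions would give the aligned face image.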

Deep networks

In our study, we re-train the following deep CNN architectures: VGG-Face, DEX-IMDB-WIKI, and DEX-ChaLearn.

VGG-Face net. This deep CNN contains 11 blocks. Each block has a linear

Conclusion

We introduced the use of robust loss functions for training deep CNNs in regression-based age estimation. We have shown that training deep CNNs with an adaptive loss function, which is robust to outliers, yields better performance and generalization than the standard Mean Square Error loss commonly used in regression-based age estimation. The use of such a loss function also makes the training process converge faster.

CRediT authorship contribution statement

F. Dornaika: Conceptualization, Formal analysis, Funding acquisition, Investigation, Methodology, Writing - original draft. SE. Bekhouche: Conceptualization, Data curation, Investigation, Software, Validation, Writing - original draft. I. Arganda-Carreras: Data curation, Funding acquisition, Software, Validation.

Declaration of Competing Interest

The authors declare that they do not have any financial or nonfinancial conflict of interests.

Acknowledgment

This work was funded by the Spanish Ministerio de Ciencia, Innovacion y Universidades, Programa Estatal de I+D+i Orientada a los Retos de la Sociedad, RTI2018-101045-B-C21.

References (39)

  • Dehghan, A., Ortiz, E. G., Shu, G., & Masood, S. Z. (2017). Dager: Deep age, gender and emotion recognition using...
  • Ding, C. (2013). A new robust function that smoothly interpolates between l1 and l2 error functions. Tech Report.
  • Dornaika, F., et al. (2019). Age estimation in facial images through transfer learning. Machine Vision and Applications.
  • Ebner, N. C., et al. (2010). Faces—A database of facial expressions in young, middle-aged, and older women and men: Development and validation. Behavior Research Methods.
  • Escalera, S., et al. (2015). Chalearn looking at people 2015: Apparent age and cultural event recognition datasets and results. Proceedings of the IEEE international conference on computer vision workshops.
  • Fernandez, C., et al. (2014). A comparative evaluation of regression learning algorithms for facial age estimation. FFER in conjunction with IEEE international conference on pattern recognition.
  • Geng, X., et al. (2013). Facial age estimation by learning from label distributions. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  • Guo, G., et al. (2008). Image-based human age estimation by manifold learning and locally adjusted robust regression. IEEE Transactions on Image Processing.
  • Guo, G., et al. (2012). A study on human age estimation under facial expression changes. Computer vision and pattern recognition (CVPR), 2012 IEEE conference on.