Picking out the bad apples: unsupervised biometric data filtering for refined age estimation

  • Original article
  • The Visual Computer

Abstract

The introduction of large training datasets has been essential to the recent advancement and success of deep learning methods. Because biometric data are difficult to collect, facial image datasets with biometric trait labels are scarce and usually limited in size and sample diversity. Web-scraping approaches to automatic data collection can produce large amounts of data, but the data are weakly labeled and noisy. This work focuses on picking out the bad apples from web-scraped facial datasets by automatically removing erroneous samples that impair their usability. The unsupervised facial biometric data filtering method presented here greatly reduces label noise in web-scraped facial biometric data. Experiments on two large state-of-the-art web-scraped datasets demonstrate the effectiveness of the proposed method for real and apparent age estimation across five different age estimation methods. Furthermore, we apply the proposed method, together with a newly devised strategy for merging multiple datasets, to data collected from three major web-based data sources (IMDb, Wikipedia, and Google) and derive the new Biometrically Filtered Famous Figure Dataset, or B3FD. The proposed dataset, which is publicly available, enables considerable performance gains for all tested age estimation methods and age estimation tasks. This work highlights the importance of training data quality relative to data quantity and the choice of estimation method.

Data availability

The proposed dataset is available at https://github.com/kbesenic/B3FD.

Notes

  1. https://images.google.com/.

  2. www.flickr.com.

  3. www.imdb.com.

  4. https://en.wikipedia.org/.

  5. http://dlib.net.

  6. https://pjreddie.com/darknet/tiny-darknet/.

  7. https://github.com/kbesenic/B3FD.


Funding

The author K. Bešenić receives a Ph.D. scholarship from the company Visage Technologies.

Author information

Corresponding author

Correspondence to Krešimir Bešenić.

Ethics declarations

Conflict of interest

The author K. Bešenić is employed by Visage Technologies. Authors J. Ahlberg and I. S. Pandžić are members of Visage Technologies’ board of directors. All three authors own stock in the company.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Bešenić, K., Ahlberg, J. & Pandžić, I.S. Picking out the bad apples: unsupervised biometric data filtering for refined age estimation. Vis Comput 39, 219–237 (2023). https://doi.org/10.1007/s00371-021-02323-y

  • DOI: https://doi.org/10.1007/s00371-021-02323-y
