Pattern Recognition

Volume 124, April 2022, 108400

CR-GAN: Automatic craniofacial reconstruction for personal identification

https://doi.org/10.1016/j.patcog.2021.108400

Highlights

  • A deep adversarial learning technique is introduced to synthesize face images from skull images for automated craniofacial reconstruction.

  • An artificial neural network architecture and a one-to-many inference approach are designed.

  • Comprehensive automated face recognition tests of the reconstructed faces, using five deep face recognition algorithms, demonstrate the effectiveness of the proposed approach.

Abstract

Craniofacial reconstruction is applied to identify human remains in the absence of other identification data (e.g., fingerprints, dental records, radiological materials, or DNA) by predicting the likeness of the unidentified remains from the intrinsic relationship between the skull and the face. Conventional 3D methods are usually based on statistical models of limited capacity, which restricts their ability to describe such a complex relationship. Moreover, the high-quality data they require are difficult to collect. In this study, we present a novel craniofacial reconstruction paradigm that synthesizes craniofacial images from 2D computed tomography (CT) scans of the skull. The key idea is to recast craniofacial reconstruction as an image translation task whose goal is to generate the corresponding craniofacial image from a 2D skull image. To this end, we design an automatic skull-to-face transformation system based on deep generative adversarial nets. The system was trained on 4551 paired skull-face images obtained from 1780 CT head scans of the Han Chinese population; to the best of our knowledge, this is the only database of this magnitude in the literature. Finally, to evaluate the performance of the model accurately, a face recognition task employing five existing deep learning algorithms (FaceNet, SphereFace, CosFace, ArcFace, and MagFace) was conducted on 102 reconstruction cases in a face pool composed of 1744 CT-scan face images. The experimental results demonstrate that the proposed method can serve as an effective forensic tool.

Introduction

Rapid and accurate identification of human remains is of great importance to crime-scene and disaster victim identification (DVI) investigations. It is usually achieved through comparison of fingerprints, dental records [1], radiological materials [2], or DNA [3]. When such identification data are unavailable, craniofacial reconstruction (CR) is applied to the skull: by exploiting the intrinsic relationship between the skull and the face, it predicts a likeness of the unidentified remains, providing cues to real-life appearance that can support facial recognition.

The field of CR has developed significantly over the last century, from early manual reconstruction [4] to computerized CR [5]. Computer-assisted techniques have evolved alongside medical imaging using computed tomography (CT). A CT scan stores a three-dimensional (3D) digital copy of the skull, together with the corresponding facial surface, taken from a living person. Most computerized CR methods are based on a similar idea, namely, finding the mapping between the skull and the face. Therefore, developing a mathematical model of the assumed relationship between the face and the underlying skull substrate from CT scan data is the most critical task in computerized CR.

In recent decades, there has been increasing interest in using classical machine learning methods for CR, for example, least squares support vector regression [6], latent root regression [7], and partial least squares regression [8], [9]. These approaches represent the 3D geometry of the skull and craniofacial surfaces in two coarse shape parameter spaces and then fit the correspondence between these spaces with a statistical model. Unfortunately, such models have limited power to describe the relationship between the skull and the face exactly, for two reasons. First, the shape parameter space cannot fully parameterize the geometric representations derived from the 3D digital data, such as triangle meshes [8] or geodesic grids [9]. Second, the mechanism by which skull shape influences the face is extremely complex and therefore difficult to simulate with these models. To overcome these limitations, efforts have been made to build a probabilistic joint face-skull model that searches over a whole distribution of plausible faces [10], and to fit several local shapes and glue them together into one joint face [11]. However, before such models can be established, a registration process for constructing point-to-point correspondences is critical: to learn the shape variation from the samples, all of them must be rigorously registered to a reference shape. This process requires a fixed number of anatomical points to be determined on each shape, which in turn requires complete head scans. These approaches are therefore limited by the area covered by the head scan data.

Compared with the 3D skull and face, their 2D versions (i.e., frontal view images) greatly reduce redundant information. Moreover, a 2D frontal digital version of the face, bearing the same unique facial features as the 3D version, is well suited to face recognition for identifying an unknown person, which is the ultimate goal of CR. Recreating a precise 2D version of the human face from the bare skull is therefore of great value for the identification of unknown bodies. To the best of our knowledge, no comprehensive automated reconstruction technique has been proposed to directly recreate 2D images of the face from CT scan data.

One difficulty in achieving CR is the selection of a proper method for accuracy assessment. Resemblance ratings [12] and face-pool comparisons [13] have been commonly used in recent studies. For the former, a spatial map clearly depicts the quantitative error between the reconstructed surface and the ground-truth surface. However, it only measures the similarity of the reconstruction to the target individual and does not necessarily reflect the ability to identify that individual among a group of faces. For the latter, given a reconstructed face image and a set of candidate images extracted from a database, a human observer is asked to identify the face in the pool that most resembles the given reconstruction. In addition, the face pool has usually been small, with at most 100 samples. This process is therefore frequently criticized as subjective and unreliable.

With the aforementioned problems in mind, the prerequisites of our work can be formulated. (1) The proposed method should not require high-quality training data; that is, it should be able to learn from incomplete head CT scan data. (2) The model must have sufficient capacity to capture the correspondence between the skull and the craniofacial surface, so that the reconstructed face is very close to the ground truth. (3) There must be an efficient technique for evaluating the performance of the method.

Generative adversarial nets (GANs) [14] are characterized by training a pair of networks (i.e., a generator and a discriminator) in competition with each other, until the generated images are indistinguishable from real ones. Building on GANs, conditional GANs (cGANs) have made considerable progress in various contexts. The most prominent is the translation of images between visual domains. To establish a unified framework for image translation, Pix2Pix [15] introduced a U-Net structure and an additional pixel-wise constraint for the case where paired data are available. This approach was later extended to unpaired data through a cycle consistency constraint [16]. Following these studies, numerous works have recently emerged in medical image editing, enabling problems such as synthesizing CT images from corresponding MR images [17], generating high-resolution fundus images from binary vessel-tree images [18], and synthesizing liver PET images from CT data [19] to be tackled.
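
As a concrete reference point for the paired setting just described, the following is a minimal PyTorch sketch of the Pix2Pix-style objective (an adversarial term plus a pixel-wise L1 constraint). The modules are toy stand-ins for illustration only, not the architectures used in this paper.

```python
# Minimal sketch of a Pix2Pix-style [15] conditional GAN objective.
# G and PatchDiscriminator are illustrative stand-ins, not this paper's networks.
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Toy conditional discriminator: it sees the skull image and a face image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 64, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 1, 3, stride=1, padding=1))
    def forward(self, skull, face):
        return self.net(torch.cat([skull, face], dim=1))

bce = nn.BCEWithLogitsLoss()
l1 = nn.L1Loss()
LAMBDA_L1 = 100.0  # weight used in the Pix2Pix paper; an assumption here

def generator_loss(G, D, skull, face):
    fake = G(skull)
    pred = D(skull, fake)
    adv = bce(pred, torch.ones_like(pred))        # try to fool the discriminator
    return adv + LAMBDA_L1 * l1(fake, face), fake # plus pixel-wise fidelity

def discriminator_loss(D, skull, face, fake):
    pred_real = D(skull, face)                    # real pairs -> label 1
    pred_fake = D(skull, fake.detach())           # generated pairs -> label 0
    return 0.5 * (bce(pred_real, torch.ones_like(pred_real)) +
                  bce(pred_fake, torch.zeros_like(pred_fake)))
```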

Following these works, we propose a novel CR paradigm (CR-GAN) based on GANs to synthesize craniofacial images from 2D CT-scan skull images. We argue that CR from skull data can be recast as an image-to-image translation task [15], [20], with the goal of generating the corresponding facial image from a 2D skull image. This idea has two main advantages. First, such models have a tremendous capacity to capture the complicated relationship between the skull and the face. Second, the data need not come from scans of the entire head, which makes the amount of available data much larger.

When directly applying an image-to-image translation method (e.g., Pix2Pix [15] or CycleGAN [16]) to the CR task, we observe that the age and gender characteristics of the reconstructed craniofacial surfaces are almost random. Worse still, the identities of the reconstructed craniofacial images differ vastly from the ground truths. We found that the underlying reason is that the reconstruction network "ignores" the real cues and instead misuses facial-identity cues to generate age and gender characteristics. To tackle this issue, we leverage a conditional code as an additional input that aligns semantically with age and gender. Further, we propose a condition cyclic loss to ensure that the input condition information is not ignored by the reconstruction network. This solution not only guarantees that the cues our model uses to synthesize age and gender characteristics are correct, but also enables the model to reconstruct different versions of the input by varying the conditional code at inference time, thereby improving identification accuracy. To keep facial identity consistent under different conditions, we propose a penalty term that constrains facial landmarks. Moreover, guided by analysis of the experimental results, we carefully designed the network structure to improve the quality of the reconstructed face images. The model was trained on 4551 paired skull-face images obtained from 1780 CT head scans of the Han Chinese population. To the best of our knowledge, this is the only database of this magnitude in the literature.
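
This excerpt does not give the exact formulas for these two constraints, so the PyTorch sketch below illustrates only one plausible reading: a condition cyclic loss that recovers the condition code from the synthesized face, and a landmark consistency loss that ties landmarks across different condition codes. `cond_net` and `landmark_net` are hypothetical auxiliary networks, not components confirmed by the paper.

```python
# Hedged sketch of the two auxiliary constraints described above;
# cond_net and landmark_net are hypothetical stand-in networks.
import torch.nn.functional as F

def condition_cyclic_loss(cond_net, fake_face, cond):
    """Penalize the generator if the input condition code (age, gender)
    cannot be recovered from the reconstructed face, so the code is not ignored."""
    return F.l1_loss(cond_net(fake_face), cond)

def landmark_consistency_loss(landmark_net, G, skull, cond_a, cond_b):
    """Faces reconstructed from the same skull under different condition
    codes should share the same landmark geometry (the same identity)."""
    lm_a = landmark_net(G(skull, cond_a))
    lm_b = landmark_net(G(skull, cond_b))
    return F.mse_loss(lm_a, lm_b)
```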

Another contribution of this work is that we improve the face pool test, which was originally considered subjective and unreliable. Instead of human observers, we employ five deep learning-based face recognition algorithms: FaceNet [21], SphereFace [22], CosFace [23], ArcFace [24], and MagFace [25]. State-of-the-art deep learning-based face recognition algorithms share the same high-level idea: they directly learn a mapping from a face image to a compact embedding space in which distances correspond to a measure of face similarity. This robust representation makes the models resistant to interference from lighting, pose, and even modality changes [21]. To ensure the reliability of the test results, we conducted face recognition tests on 102 independent reconstruction cases in a face pool composed of 1744 CT-scan face images. The best top-1 (top-5) accuracy was 80.39% (94.12%) using CosFace, with an average rank-n score of 5.40. This efficiency gain highlights the importance of using deep neural networks to automate CR and reduce obstacles to this widely used technology. Moreover, utilizing existing, well-developed deep learning face recognition algorithms for testing allows our system to continuously improve its reliability through repeated test feedback.
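
Because the face-pool test is automated, it reduces to a nearest-neighbor search in the recognizer's embedding space. The following is a minimal sketch of how top-k accuracy over a face pool can be computed, assuming embeddings have already been produced by one of the five recognizers (the embedding step itself is omitted here).

```python
# Sketch of the automated face-pool test: rank the pool by cosine
# similarity to each reconstruction's embedding and report top-k accuracy.
import numpy as np

def top_k_accuracy(recon_emb, pool_emb, true_idx, k=5):
    """recon_emb: (Q, d) embeddings of the reconstructions; pool_emb: (P, d)
    embeddings of the face pool; true_idx[i] is the ground-truth pool index."""
    recon = recon_emb / np.linalg.norm(recon_emb, axis=1, keepdims=True)
    pool = pool_emb / np.linalg.norm(pool_emb, axis=1, keepdims=True)
    sims = recon @ pool.T                  # cosine similarity matrix (Q, P)
    ranks = np.argsort(-sims, axis=1)      # most similar pool face first
    hits = [true_idx[i] in ranks[i, :k] for i in range(len(true_idx))]
    return float(np.mean(hits))
```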

In summary, the major contributions of this work can be listed as follows:

  • 1. To the best of our knowledge, we are the first to present a fully automated and reliable CR framework specifically designed for 2D CT-scan skull images. It is friendly to data acquisition: even incomplete head CT scan data can be used for training.

  • 2. We propose a novel and effective one-to-many CR paradigm. To ensure that the input condition information is strongly related to age and gender yet independent of facial identity, we present the joint constraint of a condition cyclic loss and a landmark consistency loss.

  • 3. We introduce the Mod-Demod technique into the residual block, which indirectly adjusts the activations by scaling the convolution weights, avoiding artifacts in the reconstructed face images (see the sketch after this list). In addition, we introduce atrous spatial pyramid pooling (ASPP) into the discriminator network to improve the global coherence of the reconstructed face images.

  • 4. We improve the classical face-pool testing method in the CR field by introducing well-established face recognition algorithms, making it fully automated, objective, reproducible, and capable of handling large-scale data. The test results show that CR-GAN achieves the best performance compared with the baselines.
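
Contribution 3 refers to the weight modulation-demodulation ("Mod-Demod") operation known from StyleGAN2: a style vector scales the convolution weights per sample (modulation), and the weights are then renormalized (demodulation) so activation magnitudes stay controlled without normalizing the activations directly. Below is a hedged PyTorch sketch of that general mechanism, not the authors' exact residual block.

```python
# Generic Mod-Demod convolution in the StyleGAN2 style; an illustrative
# sketch, not the residual block used in CR-GAN.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModDemodConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, k, style_dim):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k))
        self.affine = nn.Linear(style_dim, in_ch)  # style -> per-channel scale
        self.pad = k // 2

    def forward(self, x, style):
        b, c, h, w = x.shape
        s = self.affine(style).view(b, 1, c, 1, 1)           # (B, 1, in, 1, 1)
        wgt = self.weight.unsqueeze(0) * s                   # modulate weights
        demod = torch.rsqrt((wgt ** 2).sum(dim=[2, 3, 4]) + 1e-8)
        wgt = wgt * demod.view(b, -1, 1, 1, 1)               # demodulate
        # Fold the batch into conv groups so each sample uses its own weights.
        wgt = wgt.view(-1, c, *self.weight.shape[2:])
        x = x.view(1, b * c, h, w)
        out = F.conv2d(x, wgt, padding=self.pad, groups=b)
        return out.view(b, -1, h, w)
```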

Section snippets

Data collection

Informed consent was obtained from all patients, and this study was approved by the Institutional Review Board (IRB) of the West China Hospital of Sichuan University. The training and test sets of head CT scans were obtained from the West China Hospital of Sichuan University. They were acquired with a multi-slice CT scanner (Siemens Sensation 16) under a clinical scanning protocol, with a slice thickness of 0.75 mm. All

Method

In this study, we are interested in the task of determining the mapping function between the skull data space $X$ and the face data space $Y$, given training sample pairs $\{(x_i, y_i)\}_{i=1}^{N}$, where $x_i \in X$, $y_i \in Y$, and $N$ is the number of data pairs. In addition, we have access to some intrinsic properties $c$ of $x$ and $y$, i.e., age and gender. We define facial identity as the individual's age- and gender-invariant facial features. Naturally, we get a prior belief that the identity of $y$ and
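
To make the setup concrete, the learning problem can be stated compactly as a conditional mapping. Since this excerpt is truncated, the objective below is a generic restatement under that assumption, not the paper's exact formulation.

```latex
% Generic restatement of the conditional reconstruction setup (an assumption;
% the paper's exact objective is not shown in this excerpt). G maps a skull
% image x and a condition code c (age, gender) to a face image.
\[
  G : X \times C \to Y, \qquad \hat{y}_i = G(x_i, c_i), \qquad
  \min_{G} \; \frac{1}{N} \sum_{i=1}^{N} \mathcal{L}\bigl(G(x_i, c_i),\, y_i\bigr)
\]
```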

Evaluation criteria

The ultimate goal of CR is to identify unknown deceased individuals; thus, the performance of reconstructed faces in recognition algorithms is crucial. In this work, we validate the effectiveness of our model on 102 independent test samples using synthesized frontal view face images. To ensure that face recognition algorithms have adequate information to extract the facial features, among all test cases, the scanned area at least covers the region extending from the position slightly below the
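
The identification metrics reported in this work (top-k accuracy and the average rank) admit standard definitions; the following is a conventional formulation consistent with the numbers quoted earlier, not copied from the paper.

```latex
% rank_q is the position of case q's ground-truth face in the pool, sorted
% by descending similarity to the reconstruction; Q is the number of test
% cases (here Q = 102).
\[
  \mathrm{Acc}@k = \frac{1}{Q} \sum_{q=1}^{Q} \mathbb{1}\!\left[\mathrm{rank}_q \le k\right],
  \qquad
  \overline{\mathrm{rank}} = \frac{1}{Q} \sum_{q=1}^{Q} \mathrm{rank}_q
\]
```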

Conclusions

In this work, an effective craniofacial reconstruction paradigm is proposed, which is based on 2D frontal view images of head scans. It is friendly to data acquisition, that is, incomplete head CT scan data can be used for training. Face recognition tests were conducted on 102 independent reconstructed cases in a face pool consisting of 1744 CT-scan face images, and our model finally achieved a top-1 (top-5) accuracy of 80.39% (94.12%). Experimental results suggest that the proposed method can

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (40)

  • Y. Li et al.

    Craniofacial reconstruction based on least square support vector regression

    IEEE International Conference on Systems, Man, and Cybernetics

    (2014)
  • W. Shui et al.

    A computerized craniofacial reconstruction method for an unidentified skull based on statistical shape models

    Multimed. Tools Appl.

    (2020)
  • D. Madsen et al.

    Probabilistic joint face-skull modelling for facial reconstruction

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2018)
  • C. Wilkinson et al.

    A blind accuracy assessment of computer-modeled forensic facial reconstruction using computed tomography data from live subjects

    Forensic Sci. Med. Pathol.

    (2006)
  • I.J. Goodfellow et al.

    Generative adversarial nets

    Advances in Neural Information Processing Systems: Annual Conference on Neural Information Processing Systems, December 8–13

    (2014)
  • P. Isola et al.

    Image-to-image translation with conditional adversarial networks

    IEEE Conference on Computer Vision and Pattern Recognition

    (2017)
  • J.-Y. Zhu et al.

    Unpaired image-to-image translation using cycle-consistent adversarial networks

    Proceedings of the IEEE International Conference on Computer Vision

    (2017)
  • D. Nie et al.

    Medical image synthesis with context-aware generative adversarial networks

    International Conference on Medical Image Computing and Computer-Assisted Intervention

    (2017)
  • P. Costa et al.

    End-to-end adversarial retinal image synthesis

    IEEE Trans. Med. Imaging

    (2017)
  • E. Klang et al.

    Virtual PET images from CT data using deep convolutional networks: initial results

    International Workshop on Simulation and Synthesis in Medical Imaging

    (2017)
  • Yuan Li received her M.S. and Ph.D. degrees from Sichuan University, China, in 2015 and 2019, respectively, both in forensic science. She is currently an assistant researcher at Sichuan University. Her research interests include forensic anthropology and forensic imaging.

    Jian Wang received his M.S. degree from Sichuan University, China, in 2020. He is currently a Ph.D. student at Sichuan University. His research interests include deep learning, computer vision, and multimodal learning.

    Weibo Liang received his B.S. and Ph.D. degrees from Sichuan University, China, in 2001 and 2006, respectively. He is currently a professor at Sichuan University. His research interest is forensic genetics.

    Hui Xue received her Ph.D. degree from Sichuan University, China, in 2010. She is currently a professor at Sichuan University. Her research interest is imaging anatomy.

    Zhenan He received his B.E. degree in automation from the University of Science and Technology Beijing (USTB) in 2008, and his M.S. and Ph.D. degrees in electrical engineering from Oklahoma State University, USA, in 2011 and 2014, respectively. He is now an associate professor at Sichuan University. His research interests include machine intelligence, evolutionary computation, robust optimization, and many-objective optimization.

    Jian Cheng Lv received the Ph.D. degree in computer science and engineering from the University of Electronic Science and Technology of China (UESTC), Chengdu, China, in 2006. He is currently a professor at the Machine Intelligence Laboratory, College of Computer Science, Sichuan University, Chengdu, China. Prior to joining the university, he was a research fellow in the Department of Electrical and Computer Engineering, National University of Singapore. His research interests include machine learning and learning in neural networks.

    Lin Zhang received the B.S. and M.S. degrees from Sichuan University, China, and the Ph.D. degree from the University of Mainz, Germany, in 1984, 1987, and 1995, respectively. He is currently a professor at Sichuan University. His research interests are immunology and forensic genetics.

    This work was supported by the National Natural Science Fund for Distinguished Young Scholar (Grant no. 61625204) and partially supported by the State Key Program of National Science Foundation of China (Grant nos. 61836006 and 61432014).

    1. Yuan Li and Jian Wang are joint first authors.
