1 Introduction

Medical data analysis provides convenient and high-quality services, which can be used for prescription recommendation, treatment planning, and online diagnosis for many diseases [1,2,3]. Developmental dysplasia of the hip (DDH) is one of the most common skeletal system diseases in infants and children, with a global incidence ranging from 0.16% to 2.85% [4]. It is a developmental disease of the hip joint and can be diagnosed by observing bone morphology on an X-ray image [5, 6]. Six landmarks on the X-ray image are the key to the quantitative analysis of DDH, as shown in Figure 1(a).

The acetabular index can be calculated from the included angles defined by these landmarks. The rules for evaluating the acetabular index are shown in Figure 1(b). It is a fundamental diagnostic metric for DDH in many respects [7, 8]: it measures the severity of DDH, is used for acetabular modeling, and serves as an indication for DDH treatment [9,10,11,12]. In addition, predicting the developmental age of the femoral head is significant because it provides auxiliary information for the diagnosis of DDH. As we can see in Figure 1(c), the calcification and development of the two femoral heads are different. Based on the developmental age of the femoral head and the patient’s actual age, we can further confirm the condition and give an appropriate treatment plan.

However, due to the shortage of professional doctors who can diagnose DDH, many children do not receive timely treatment. Most highly trained experts are located in large urban hospitals, while children in rural areas and small cities find it difficult to obtain a professional diagnosis [13]. Online automatic diagnosis for DDH has the potential to meet the challenge of insufficient professional treatment by performing medical image analysis. Despite this potential, the reliability and interpretability of online automatic diagnosis remain challenging. The traditional medical image analysis approach relies on hand-engineered features and multi-step landmark localization. Implementing and improving such an approach requires dedicated design and is computationally expensive [14].

Fig. 1

Figure (a) shows the six landmarks that need to be detected. Landmarks 1 to 6 represent tri-radiate cartilage center (right), tri-radiate cartilage center (right), acetabulum superolateral margin (right), tri-radiate cartilage center (left), tri-radiate cartilage center (left), and acetabulum superolateral margin (left), respectively. Figure (b) shows the schematic diagram [5] of the clinical DDH diagnosis. We need to detect four landmarks (1, 2, 3, 4) to draw the Hilgenreiner line and the Perkin line [5], which divide the image into the quadrants shown as I, II, III, and IV. Once landmarks 5 and 6 are detected, the degree of DDH is determined. The yellow arrows denote the acetabular index, which is of significance in diagnosis. Figure (c) shows delayed development and calcification of femoral heads due to hip dislocation. The red areas denote the size of the femoral heads.

Despite the difficulty in diagnosing DDH automatically and effectively, the development of convolutional neural networks has allowed for training end-to-end models to meet the challenges of medical images [15,16,17,18]. In this paper, we propose a multi-task hourglass network to detect the landmarks and predict the age on the hip X-ray image [19]. It is an encoder-decoder structure that can extract features effectively and accurately predict developmental age. First, our team constructs a professional DDH dataset of 9369 hip X-ray images with expert annotations. Second, we train and evaluate the proposed multi-task hourglass network on our dataset to detect six landmarks and predict the patients’ age. Finally, we deploy our method on a GPU server to provide services to the public over the network and to continue collecting DDH data. In this way, our approach can realize a stable and interpretable service for online automatic DDH diagnosis. Our work shows that a multi-task hourglass network can learn the accurate location of the landmarks in the X-ray images and distinguish between different stages of development. Moreover, with precise image analysis and fast GPU computing, our method can help overcome the shortage of medical resources and enable telehealth for DDH diagnosis. The experiments show that our method achieves a mean pixel error of 4.64 in landmark detection and an accuracy of 89% in age prediction. Our online service has successfully helped 112 children with DDH diagnosis.

In summary, the main contributions of this paper are as follows:

  • We propose a novel multi-task hourglass network to investigate online automatic diagnosis for DDH. With its powerful feature extraction capabilities and end-to-end framework, our method achieves accurate landmark detection and age prediction on X-ray images.

  • To overcome the shortage of medical resources, we build an online DDH diagnosis platform based on the GPU server. To the best of our knowledge, it is the first work that provides an online service for automatic DDH diagnosis.

  • We construct a DDH database with a considerable amount of X-ray images to train and evaluate our model. The success of online diagnosis demonstrates that our model can effectively learn the knowledge from the database.

2 Related work

2.1 Developmental dysplasia of the hip

In 1985, Tonnis first proposed the current standard method to diagnose DDH on an X-ray image [5]. The core of Tonnis’s approach is detecting six landmarks (see Figure 1(a)) to estimate the dislocation degrees (see Figure 1(b)) of DDH. According to these landmarks, Hilgenreiner first described the acetabular index in 1986. It provides a direct measurement referred to as the acetabular index angle to diagnose DDH [6].

Many researchers have studied automatic diagnosis methods due to the high incidence of DDH in children [20,21,22,23]. Al-Bashir et al. utilize Canny edge detection and the Hough transform to detect the landmarks on the X-ray image [20]. However, their method struggles to detect landmarks across different ages. Sahin et al. propose an object detection method that uses the landmark as a template to match the target image [21]. However, this method requires massive computation because it searches patch by patch for the best-fitting patch. Liu et al. [22] and Bier et al. [23] utilize convolutional neural networks to detect hip landmarks on the X-ray image. Yet, they ignore the significance of the age of the femoral head in assisting DDH diagnosis. Therefore, the above DDH analysis methods are insufficient to realize an online DDH diagnosis, which requires accurate and adequate analysis of the hip X-ray image.

In our work, we propose an end-to-end multi-task hourglass network to diagnose DDH effectively and provide service online, thereby realizing an earlier treatment for improved clinical outcomes.

2.2 Online diagnosis

The advances in computer networking enable online diagnosis as an alternative for medical treatment [13, 24,25,26,27,28]. Sublett et al. propose a distribution system that supports remote ultrasound examinations [13]. They achieve a real-time diagnosis under the limited bandwidth. Hollander et al. suggest that telemedical innovations play a pivotal role in medical service especially during the Covid-19 pandemic [25]. This is because online automatic diagnosis can not only save medical resources but also avoid overexposure. Esteva et al. propose a deep-learning-based method to diagnose skin cancer [27]. They deploy their method on a mobile device to extend the reach of dermatologists outside of the clinic. Long et al. build the AI system with a cloud-based platform for multihospital collaboration [28]. They design the platform to help patients with rare diseases and improve disease management for different hospitals. Their method demonstrates the capabilities of an online automatic diagnosis system to help both the patients and the hospitals.

To sum up, online diagnosis provides patients with a convenient and effective solution for health care. Our work proposes a novel multi-task hourglass network. Deployed on a GPU server, it supports an online service that automatically diagnoses DDH and is the first such work opened for public service.

2.3 Multi-task hourglass network

Recent years have witnessed the progress of the deep neural network, which has triggered significant changes in the medical field [3, 29,30,31,32,33,34]. Newell et al. propose an hourglass network with an encoder-decoder architecture to generate the heatmap for landmark detection [19]. The hourglass network has been successfully applied to many areas such as pose estimation, face recognition and object detection [35,36,37]. Cai et al. utilize a multi-task hourglass network to expand receptive fields and reconstruct a frame for fall detection [31]. Their method demonstrates the multi-task hourglass network can complete the detection task in complex scenes. Kordon et al. propose a deep-stacked multi-task hourglass network to jointly localize landmarks, predict a tangent line, and perform segmentation on the X-ray image [34]. Their work indicates that the multi-task hourglass network has sound performance in exploring bone morphology.

Here we propose a novel method based on the hourglass network, which can accurately detect landmarks to diagnose the degree of hip dislocation and predict the age of femoral head development. With a large-scale DDH dataset collected from Anhui Provincial Children’s Hospital, the hourglass network can be adequately trained and appropriately evaluated for diagnosing DDH. Meanwhile, the online service can continue to collect extensive DDH data to assess and refine the multi-task hourglass network.

3 Method

3.1 Ethical approval statement

The Ethics Review Committee of Anhui Province Children’s Hospital of China has passed the review of this project. We have submitted the ethics review report as Support Documents. The ethics review report can be found in supplementary material. All data and images used in this study have been desensitized. These data and images used for research do not contain any patient’s private information and do not contain information showing the patient’s identity. Moreover, these images and data are only used for academic research. This statement was presented by Yongdong Zhang and Jun Sun, who are the directors of this research at the University of Science and Technology of China, and director of the Ethics Committee of Anhui Province Children’s Hospital of China, respectively.

3.2 Confirming statement of parents or guardians

We have received written consent from parents or guardians of these children who participated in this study. These data and images used in this study have been desensitized, and this study does not reveal any private information about children. These data and images for research are not intended for any commercial activity.

3.3 Multi-task hourglass network

We propose a novel multi-task hourglass network for online automatic DDH diagnosis. Figure 2 illustrates the overall framework for landmark detection and age prediction.

3.3.1 Network architecture

Fig. 2

Illustration of the proposed multi-task hourglass network. The backbone is the default hourglass network [19]. The overall architecture mainly comprises two components, i.e. the feature extraction section and the landmark and age prediction section. We use a single network to predict the landmarks and the age of the femoral heads. Diagnostic results can be seen in the figure.

The proposed hourglass network is an encoder-decoder architecture, as shown in Figure 2. We follow the standard implementation of the hourglass network for the feature extraction section. Specifically, to build a multi-task hourglass network, we first utilize a convolutional block followed by a pooling layer to extract the features from images [38]. After four convolutional blocks and pooling layers, we obtain a small feature map with rich information:

$${f_{i + 1}} = Pool(Conv({f_{i}}))$$
(1)

where \(f_{i}\) denotes the features of the i-th stage in the encoder, and Conv and Pool represent the convolutional block and the pooling layer, respectively. The stacked convolutional layers enlarge the receptive field of the model and enable detection from a large perspective [39]. Then, following the network design of the hourglass network and the fully convolutional network [19, 40], the features are upsampled and added to the output of former layers by skip connections:

$${\hat{f}_{i + 1}} = Up(Conv({\hat{f}_{i}})) + Conv({f_{i + 1}})$$
(2)

where \(\hat {f}_{i}\) represents the features of the i-th stage in the decoder and Up denotes the upsampling layer. In this way, the model can combine the rich context information in the shallow layers with the adequate semantics in the deep layers for landmark detection. In the final stage, we use a 1 × 1 convolutional layer to generate the output for DDH diagnosis.

The proposed multi-task hourglass network has three types of output. First, the network outputs the heatmaps of landmarks 1 to 4. These heatmaps indicate the probability of the location of each landmark. Second, the model outputs the heatmaps of the femoral heads. Since the age of patients has an enormous impact on the morphology of the femoral head, we use four different heatmaps to represent each femoral head in different age periods. Thus, the heatmap with the highest value indicates the age period of the patient. The third type of output is six offset maps for revising the location of the landmarks. This design refines the location results by learning the difference between the heatmap peak and the ground-truth landmark.

3.3.2 Heatmap estimation

Based on the multi-task hourglass network, we utilize the heatmap to simultaneously predict the landmarks and the age of the femoral head on the DDH X-ray image. Thus, heatmap estimation plays a pivotal role in landmark detection and age prediction. Suppose the input image is \(I \in R^{W \times H}\). The proposed multi-task hourglass network predicts the heatmap \({{\hat {Y}}}\) from the image I. To generate the target heatmap from the position of the ground-truth landmark, we utilize the Gaussian kernel as follows:

$${Y_{xyc}} = \exp \left( - \frac{{{{(x - {{{l}_{x}}_{c}})}^{2}} + {{(y - {{{l}_{y}}_{c}})}^{2}}}}{{2\sigma^{2}}}\right)$$
(3)

where σ is a size-adaptive standard deviation, \(Y_{xyc}\) represents the heatmap of the c-th channel, and \({l_{x}}_{c}\) and \({l_{y}}_{c}\) represent the coordinates of the c-th landmark. If these Gaussian labels overlap, we select the element-wise maximum \({{M_{xyc}} = \underset {c = 1,2,...,C}{\max \limits } {Y_{xyc}}}\). As mentioned above, we divide the left femoral head into four categories (0.1-2, 2-6, 6-12, and > 12 years) according to age information, grouping the patients by the morphological characteristics of the femoral head at different ages. To decide the age period of the input X-ray image, the network outputs four heatmaps for the femoral head (landmarks 5 and 6) to represent the age periods. The heatmap with the maximum value gives the age period. The same operation is applied to the right femoral head. The other landmarks (1, 2, 3, 4) form four further categories. Thus, the model outputs 12 different heatmaps for each X-ray image. The proposed model predicts the landmarks by generating \({{\hat {Y}}}\) and the offset \({{\hat {O}}}\). The network can predict both the locations and the categories of landmarks, and we use the predicted category of a femoral head landmark to obtain the age of the femoral head.
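The target generation in Eq. (3) and the element-wise maximum can be illustrated with a minimal sketch. It assumes a single fixed `sigma` for all landmarks (the paper uses a size-adaptive value whose rule is not given here), and the function names are hypothetical.

```python
import math

def gaussian_heatmap(width, height, landmarks, sigma):
    """Build heat[c][y][x] as in Eq. (3), one channel per landmark.

    `sigma` is a fixed value here; this is an assumption, since the
    size-adaptive rule is not specified in the text.
    """
    heat = []
    for lx, ly in landmarks:
        channel = [
            [math.exp(-((x - lx) ** 2 + (y - ly) ** 2) / (2.0 * sigma ** 2))
             for x in range(width)]
            for y in range(height)
        ]
        heat.append(channel)
    return heat

def elementwise_max(heat):
    """Merge overlapping Gaussian labels by taking the maximum over channels."""
    height, width = len(heat[0]), len(heat[0][0])
    return [[max(channel[y][x] for channel in heat) for x in range(width)]
            for y in range(height)]

# Two toy landmarks on an 8 x 6 grid.
heat = gaussian_heatmap(8, 6, [(2, 3), (5, 1)], sigma=1.5)
merged = elementwise_max(heat)
```

Each channel equals 1 exactly at its landmark and decays smoothly around it, which is what lets the loss treat near misses more leniently than distant false positives.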

When our trained model diagnoses a new X-ray image, we extract the peaks in the heatmap for each category to obtain landmarks. We use the categories of these predicted peaks to obtain ages. All outputs are produced directly from the landmark estimation without the need for Non-Maximum Suppression or other post-processing. When using the trained model to detect landmarks and predict ages in an X-ray DDH image, the results are consistent and stable. In the end, we can derive the predicted landmark from the heatmap:

$$< {\hat{l}_{x}},{\hat{l}_{y}} > = \underset{x,y}{\arg \max} (\hat{Y}) + < {\hat{O}_{x}},{\hat{O}_{y}} >$$
(4)

where \({\hat {l}_{x}},{\hat {l}_{y}}\) represent the predicted landmark. The age of the femoral head can be derived by:

$$A = \underset{c}{\arg \max} (\hat{Y})$$
(5)

where A indicates the age range.
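As a rough illustration of Eqs. (4) and (5), the sketch below decodes one landmark from a heatmap and its offset maps, and picks the age period from four femoral head heatmaps. The function names and the nested-list layout are assumptions made for this example; coordinates are in heatmap space and would still need rescaling to the original image size.

```python
def decode_landmark(heatmap, offset_x, offset_y):
    """Eq. (4): location of the heatmap peak plus the offset predicted there."""
    best, bx, by = float("-inf"), 0, 0
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > best:
                best, bx, by = value, x, y
    return bx + offset_x[by][bx], by + offset_y[by][bx], best

def decode_age_period(age_heatmaps, labels=("0.1-2", "2-6", "6-12", "> 12")):
    """Eq. (5): the channel whose heatmap has the highest peak gives the age period."""
    peaks = [max(max(row) for row in hm) for hm in age_heatmaps]
    c = max(range(len(peaks)), key=peaks.__getitem__)
    return labels[c]

# Toy 3 x 2 heatmap with its peak at (x=1, y=1).
hm = [[0.1, 0.2, 0.1], [0.3, 0.9, 0.2]]
ox = [[0.0, 0.0, 0.0], [0.0, 0.25, 0.0]]
oy = [[0.0, 0.0, 0.0], [0.0, -0.5, 0.0]]
landmark = decode_landmark(hm, ox, oy)

# Four toy femoral head heatmaps, one per age period.
age = decode_age_period([[[0.2]], [[0.7]], [[0.4]], [[0.1]]])
```

Because each landmark is read from a single peak, no Non-Maximum Suppression step is needed, which matches the description above.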

3.3.3 Loss function

The loss function plays a pivotal role in optimizing our model. Our model is essentially a one-stage detector. Therefore, we utilize focal loss [41] to optimize the detector and generate the target heatmap in different bone morphologies. The training loss can be formulated as follows:

$${L_{l}} = - \frac{1}{N}\sum\limits_{xyc} {\psi_{xyc}}{(1 - {\hat{Y}}_{xyc})^{\alpha}}\log ({\hat{Y}}_{xyc}),$$
(6)

where

$${{\hat{Y}}_{xyc}} = \left\{ {\begin{array}{*{20}{l}} {{{\hat{Y}}_{xyc}}}\\ {1 - {{\hat{Y}}_{xyc}}} \end{array} } \right.\begin{array}{*{20}{l}} {{\text{if}}\quad {Y_{xyc}} = 1}\\ {{\text{otherwise}}} \end{array}$$
(7)

and

$${\psi_{xyc}} = \left\{ {\begin{array}{*{20}{l}} {1 }\\ {{{(1 - {M_{xyc}})}^{\beta} }} \end{array}} \right.\begin{array}{*{20}{c}} {{\text{if}}\quad {Y_{xyc}} = 1}\\ {{\text{otherwise.}}} \end{array}$$
(8)

The hyper-parameters α and β follow the focal loss [41], and N is the number of landmarks in an image I, which we expect to be 6. We adopt α = 2 and β = 4 [37, 42] in all experiments.
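To make Eqs. (6) to (8) concrete, the following sketch evaluates the loss on flattened lists of predictions, targets, and merged Gaussian labels. The clamping constant `eps` and the flat-list layout are assumptions added for numerical safety and brevity; they are not taken from the paper.

```python
import math

def focal_loss(pred, target, merged, n_landmarks, alpha=2.0, beta=4.0, eps=1e-12):
    """Penalty-reduced focal loss over flattened heatmap entries.

    pred    : predicted heatmap values in [0, 1]
    target  : ground-truth heatmap values Y (1 at landmark positions)
    merged  : element-wise maximum M of the Gaussian labels
    """
    total = 0.0
    for p, y, m in zip(pred, target, merged):
        if y == 1.0:
            p_t, psi = p, 1.0                    # Eq. (7) and (8), positive case
        else:
            p_t, psi = 1.0 - p, (1.0 - m) ** beta  # negatives near a landmark are down-weighted
        p_t = min(max(p_t, eps), 1.0)            # avoid log(0); an added safeguard
        total += psi * (1.0 - p_t) ** alpha * math.log(p_t)
    return -total / n_landmarks

good = focal_loss([0.9, 0.1], [1.0, 0.0], [1.0, 0.0], n_landmarks=1)
bad = focal_loss([0.5, 0.5], [1.0, 0.0], [1.0, 0.0], n_landmarks=1)
```

With α = 2, confident correct predictions contribute almost nothing, so training concentrates on the hard pixels; with β = 4, a false response right next to a landmark costs far less than one far away.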

To recover the error caused by downsampling, we predict a local offset \({\hat {O}}\) for each landmark. We adopt the L1 loss to optimize the offset map:

$${{\text{L}}_{o}} = \frac{1}{N}\sum\limits_{l} {\left[{{\hat{O}}_{\tilde{l}}} - \left( \frac{l}{S} - \tilde{l}\right)\right]_{1}}$$
(9)

Finally, the overall training loss of the model is

$${L = {L_{l}} + {\lambda_{o}}{L_{o}},}$$
(10)

where \(\lambda_{o}\) is a hyper-parameter that adjusts the weight of the L1 loss.
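The offset term in Eq. (9) and the combination in Eq. (10) can be sketched as follows. The symbols S and \(\tilde{l}\) are not defined in the text, so this example assumes the common reading in which S is the output stride of the heatmap and \(\tilde{l}\) is the landmark position rounded down on the low-resolution grid. The value of `lambda_o` is left as a free parameter because the paper does not report it.

```python
import math

def offset_target(landmark, stride):
    """Return the low-resolution cell and the fractional offset l/S - floor(l/S).

    Assumption: S is the output stride and the low-resolution location is the floor of l/S.
    """
    lx, ly = landmark
    cx, cy = math.floor(lx / stride), math.floor(ly / stride)
    return (cx, cy), (lx / stride - cx, ly / stride - cy)

def offset_l1_loss(pred_offsets, landmarks, stride):
    """Eq. (9): mean L1 distance between predicted and target offsets."""
    total = 0.0
    for (ox, oy), landmark in zip(pred_offsets, landmarks):
        _, (gx, gy) = offset_target(landmark, stride)
        total += abs(ox - gx) + abs(oy - gy)
    return total / len(landmarks)

def total_loss(l_landmark, l_offset, lambda_o):
    """Eq. (10): weighted sum of the heatmap loss and the offset loss."""
    return l_landmark + lambda_o * l_offset

cell, frac = offset_target((10, 7), stride=4)
```

In this reading, a landmark at pixel (10, 7) with stride 4 falls in cell (2, 1) and the offset map has to recover the lost fraction (0.5, 0.75).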

4 Experimental results and discussion

We test our trained model on the test dataset with 2119 DDH images and conduct a series of experiments to elaborate on the results of our method. These results contain predictions of landmarks, acetabular index angle, and age in detail.

4.1 Experimental setup

We apply the proposed method to the DDH dataset for landmark detection and age prediction. The network is trained using the PyTorch framework [43] on an Ubuntu workstation equipped with an Intel i7-9700 CPU and an 11GB Nvidia GeForce 1080Ti GPU. During training, the mini-batch size is set to 4. The Adam optimizer is used for updating with a learning rate of 1.25e-4 [44, 45]. We set the total number of training epochs to 30.

4.2 Dataset

Our dataset was collected in the clinical routine of Anhui Province Children’s Hospital of China and contains all common conditions in clinical cases from 2012 to 2019. The original medical images were collected from the PACS system of the hospital. We converted the original DICOM files to JPG files for labeling and extracted the children’s actual age information from these DICOM files. All landmarks of the dataset were labeled by fifteen professional doctors, who were divided into three groups. In each group of five, three doctors label the images and the other two check the images labeled by these three doctors. If any of the three labels for an image is inaccurate, the image is re-labeled until it passes the check. In this way, each image has been labeled three times, and we use the average of the coordinates of each landmark.

Table 1 Distribution of the training and test sets (9369 images) in our database

Patients are between 0.1 and 14 years old. The total number of DDH images is 9369, of which 7250 images are used for training and the remaining 2119 images are used for testing.

The distributions of the training and test datasets are shown in Table 1. Because we need to learn the relationship between age and the development of the femoral heads, the femoral heads in the training dataset are all in quadrant I. During the training process, we resize the input to a resolution of 512 × 512. At inference time, we map the output back to the original size for statistical analysis.

4.3 Landmark detection

We analyze the pixel error of the six detected landmarks on the entire test dataset of 2119 DDH X-ray images. The pixel error is calculated as \({error = \sqrt {{{({x_{i}} - {x_{j}})}^{2}} + {{({y_{i}} - {y_{j}})}^{2}}} }\), where (xi,yi) denotes a landmark labeled by doctors and (xj,yj) denotes the landmark predicted by our network. As shown in Figure 3, the horizontal axis represents pixel error, and we divide the pixel error range into 10 intervals (0-2, 2-4, 4-6, 6-8, 8-10, 10-12, 12-14, 14-16, 16-18, and > 18 pixels). We can see in the figure that the pixel error mainly concentrates in the 0-2, 2-4, and 4-6 intervals. In addition, the detection results of landmark 5 and landmark 6 are more concentrated in the low error intervals (0-2, 2-4) than those of landmarks 1 to 4, which indicates that the overall detection accuracy of landmark 5 and landmark 6 is higher than that of the other landmarks. Specifically, landmarks 1 to 4 have pixel errors of around five, while landmarks 5 and 6 have smaller pixel errors. This is because landmarks 5 and 6 are located at the center of the femoral head, which is more distinguishable. However, landmarks 5 and 6 have a greater SD for localization. This is because the femoral heads have different morphological shapes at different ages, which could affect the detection of the landmark.
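The pixel error and the ten intervals used for the histogram can be written down in a few lines. The handling of values that fall exactly on an interval boundary is an assumption of this sketch (each interval includes its lower edge, and exactly 18 is counted in 16-18), since the text does not specify it.

```python
import math

def pixel_error(labeled, predicted):
    """Euclidean distance between the labeled and the predicted landmark."""
    return math.hypot(labeled[0] - predicted[0], labeled[1] - predicted[1])

def error_interval(error, width=2, last_edge=18):
    """Map a pixel error to one of the ten intervals 0-2, ..., 16-18, > 18."""
    if error > last_edge:
        return "> 18"
    index = min(int(error // width), last_edge // width - 1)
    return f"{index * width}-{(index + 1) * width}"

err = pixel_error((100, 120), (103, 124))  # a 3-4-5 triangle
label = error_interval(err)
```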

Fig. 3

Distribution of the pixel error. The X-axis denotes the pixel error interval of each landmark in an image. The Y-axis represents the number of images. Six different colors denote the six landmarks. Best viewed in color.

Table 2 Mean±SD and median error of landmark (1, 2, 3, 4, 5, 6) detection in pixels

In the test dataset of 2119 images, based on the labels of the fifteen professional doctors, there are 1892 images whose left femoral head center (landmark 6) is located in quadrant I, 152 images in which it is in quadrant II, 61 images in which it is in quadrant III, and 14 images in which it is in quadrant IV. For the right femoral head center (landmark 5), 1927 images are in quadrant I, 118 in quadrant II, 53 in quadrant III, and 21 in quadrant IV. We compare the pixel error on these different test subsets, and the results are in Table 2. The comparison includes the Mean±SD and the median of the pixel error. We can see that the average pixel errors of the six landmarks are 5.5, 5.6, 4.6, 5.5, 3.4, and 3.4, respectively. The overall pixel error is relatively low across these test subsets.
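As a quick consistency check on these figures, the sketch below sums the quadrant counts and averages the six per-landmark means. The `summarize` helper shows how Mean±SD and median could be computed for a list of errors; whether the paper reports a population or a sample standard deviation is not stated, so the population form used here is an assumption.

```python
from statistics import mean, median, pstdev

# Counts and means quoted in the text.
left_counts = {"I": 1892, "II": 152, "III": 61, "IV": 14}    # landmark 6
right_counts = {"I": 1927, "II": 118, "III": 53, "IV": 21}   # landmark 5
per_landmark_mean = [5.5, 5.6, 4.6, 5.5, 3.4, 3.4]

def summarize(errors):
    """Return (mean, population SD, median) of a list of pixel errors."""
    return mean(errors), pstdev(errors), median(errors)

left_total = sum(left_counts.values())
right_total = sum(right_counts.values())
overall_mean = mean(per_landmark_mean)
```

Both quadrant splits add up to the 2119 test images, and the average of the six rounded means is about 4.67, in line with the overall mean pixel error of 4.64 once rounding of the per-landmark values is taken into account.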

The test dataset of 2119 images can be roughly divided into four categories based on the age information collected from Anhui Province Children’s Hospital of China. The results of landmark detection at different ages are in Table 2. On the whole, the pixel error is slightly larger for older patients. This is because the degree of bone calcification increases with age, mainly in the 6-12 and > 12 groups. The feature richness of the landmarks in the image is reduced at higher ages, which reduces the accuracy of landmark detection. Especially for patients older than 12, the pixel errors of landmark 1 and landmark 6 become larger, for two reasons. The first is that the degree of bone calcification is higher. The second is that the number of images of patients older than 12 is small in the training and test datasets.

4.4 Acetabular index detection

Fig. 4

Distribution of the acetabular index angle error on DDH X-ray images. The X-axis denotes the error interval of the acetabular index angle in an image. The Y-axis represents the number of images. Two different colors denote the left and the right acetabular index angle. Best viewed in color.

Based on the landmarks predicted by the proposed network, we calculate the left and right acetabular index angles for every image in the entire test dataset of 2119 images. The results are shown in Table 3. The average error between the predicted and labeled left acetabular index angle is 2.759, and the median error is 2.175. The average error between the predicted and labeled right acetabular index angle is 2.793, and the median error is 2.211. We analyze the error of the acetabular index angle prediction in detail, as shown in Figure 4. It suggests that the angle error mainly concentrates in the 0-2, 2-4, and 4-6 intervals. We also separately count the images whose left femoral head center (landmark 6) is in quadrant I, II, III, and IV. The numbers of images are 1892, 152, 61, and 14, respectively. The average and median errors of these four types of images are shown in Table 3. Similarly, we also analyze the angle error on images whose right femoral head center (landmark 5) is located in quadrant I, II, III, and IV.
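The angle computation itself reduces to the included angle between two lines through predicted landmarks. The sketch below is a generic helper; which landmarks define the Hilgenreiner line and the acetabular roof line on each side is an assumption in the toy call, because the landmark naming is not fully recoverable from the figure caption.

```python
import math

def included_angle_deg(a, b, c, d):
    """Acute angle in degrees between the line through a, b and the line through c, d."""
    t1 = math.atan2(b[1] - a[1], b[0] - a[0])
    t2 = math.atan2(d[1] - c[1], d[0] - c[0])
    diff = abs(math.degrees(t1 - t2)) % 180.0
    return min(diff, 180.0 - diff)

def angle_error(predicted_deg, labeled_deg):
    """Absolute difference between predicted and labeled acetabular index angles."""
    return abs(predicted_deg - labeled_deg)

# Hypothetical coordinates: a horizontal reference line and a roof line rising at 45 degrees.
reference = ((100.0, 200.0), (300.0, 200.0))
roof = ((100.0, 200.0), (60.0, 160.0))
angle = included_angle_deg(*reference, *roof)
```

Taking the acute angle makes the result independent of the order in which the two points of each line are given, which is convenient when the left and right hips mirror each other.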

Table 3 Mean±SD and median error of acetabular index angle in left and right hips
Table 4 Mean±SD and median error of acetabular index angle. 0.1-2 Year in table represents sub-dataset which only contains images whose real ages are in 0.1-2 years
Table 5 Analysis of sensitivity and specificity on test dataset with 2119 X-ray images

As children grow, their hips continue to develop, and the degree of bone calcification gradually increases. Children of different ages have different acetabular index angles. To detail the performance of angle prediction by the designed network, we analyze the acetabular index angle error at different ages in the test dataset, as shown in Table 4. Overall, for children of different ages, the error between the predicted and labeled acetabular index angle is about 3 to 4, which is relatively stable.

We use all the images in the test dataset for the analysis of sensitivity and specificity. We treat the cases in which landmark 5 (landmark 6) is in quadrant II, III, or IV as DDH in the right (left) hip, and the cases in which it is in quadrant I as non-DDH. The statistical results are shown in Table 5 and Figure 5.
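Under this grouping, sensitivity and specificity follow directly from the labeled and predicted quadrants of a femoral head center. The sketch below assumes quadrants are given as the strings "I" to "IV"; the data in the example are toy values, not results from the paper.

```python
def sensitivity_specificity(true_quadrants, pred_quadrants):
    """Treat quadrants II, III, IV as DDH (positive) and quadrant I as non-DDH."""
    ddh = {"II", "III", "IV"}
    tp = fn = tn = fp = 0
    for t, p in zip(true_quadrants, pred_quadrants):
        t_pos, p_pos = t in ddh, p in ddh
        if t_pos and p_pos:
            tp += 1
        elif t_pos:
            fn += 1
        elif p_pos:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity

# Toy labels for six images.
labeled = ["I", "I", "II", "III", "IV", "I"]
predicted = ["I", "II", "II", "I", "IV", "I"]
sens, spec = sensitivity_specificity(labeled, predicted)
```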

Fig. 5

Example landmark detection results for DDH. The landmarks and numbers represent the outputs of our method. The text at the top left of the image is the diagnostic report, which includes the left acetabular index angle, the right acetabular index angle, the quadrant of landmark 5, the quadrant of landmark 6, the age of the left femoral head, and the age of the right femoral head. Best viewed in color.

4.5 Age prediction

While predicting landmarks in the DDH X-ray image, we also predict the age of the femoral head. Under normal circumstances, the development of bilateral femoral heads in children is roughly the same. However, when the left (right) hip joint dislocates, the development of the left (right) femoral head will lag behind the average developmental age.

Fig. 6

Distribution of the age prediction of femoral heads on the test dataset (2119 images). The left panel shows the age prediction of the left femoral head, and the right panel shows that of the right femoral head. The numbers in the figure denote the number of images.

By predicting the developmental age of the femoral head, we can analyze whether there is a delay in the femoral head in children. When the ages of the femoral heads on both sides are inconsistent, it can also assist doctors in confirming the situation of hip dislocation. We analyze the accuracy of the prediction of the femoral head ages on the left and right sides in Table 6. When the age of the femoral head predicted by the proposed network is consistent with the actual age, the prediction is considered to be accurate. From Table 6, we can see that the accuracy of the predicted age of the left femoral head is 89.5%, and the accuracy of the right femoral head is 89.1%.

Similarly, we also analyze the accuracy on the test subsets defined by the quadrant of the femoral head center. Table 7 shows the accuracy for children of different ages. The accuracy of age prediction is highest for children aged 0.1-2, at 96.1% and 96.6% for the left and right sides, respectively. The age prediction accuracy for children aged 2-12 years is lower because the number of images in this age interval in the training dataset is relatively small. The accuracy for children older than 12 years is 0.0% and 12.5%, because images of patients older than 12 are scarce in the dataset.

Table 6 Accuracy of age prediction for both left and right femoral heads
Table 7 Accuracy of age prediction of left and right femoral heads in different age intervals

To further illustrate the results of our network’s age prediction, we compare the predicted age to the actual developmental age of the femoral head. The results of the age prediction of the left and right femoral heads are shown in Figure 6. Taking the prediction of the left femoral head as an example, we can find in Figure 6 that there are 1555 children with true ages between 0.1 and 2 years, of which 1495 are predicted as 0.1-2 years and 60 as 2-6 years. Of the 419 images with real ages of 2-6 years, 347 are predicted consistently with the real age, 56 are predicted as 0.1-2 years, and 16 as 6-12 years.
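The per-interval accuracies can be reproduced from these counts. In the sketch below, the confusion counts are the ones quoted above for the left femoral head, while the exact handling of boundary ages in `age_interval` (for example, whether a child aged exactly 2 falls in 0.1-2 or 2-6) is an assumption, since the text does not state it.

```python
def age_interval(age_years):
    """Map an age in years to one of the four age periods (boundary rule assumed)."""
    if age_years < 2:
        return "0.1-2"
    if age_years < 6:
        return "2-6"
    if age_years <= 12:
        return "6-12"
    return "> 12"

def interval_accuracy(predicted_counts, true_label):
    """Share of images in one true age period that were predicted as that period."""
    return predicted_counts.get(true_label, 0) / sum(predicted_counts.values())

# Left femoral head: true period -> counts per predicted period (from the text).
confusion_left = {
    "0.1-2": {"0.1-2": 1495, "2-6": 60},
    "2-6": {"0.1-2": 56, "2-6": 347, "6-12": 16},
}
acc_infant = interval_accuracy(confusion_left["0.1-2"], "0.1-2")
acc_2_to_6 = interval_accuracy(confusion_left["2-6"], "2-6")
```

The first value is 1495/1555, about 96.1%, which agrees with the left-side accuracy reported for children aged 0.1-2; the second is 347/419, about 82.8%.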

4.6 Compared to the traditional method

The traditional landmark detection method is represented by template matching [21]. This method can detect the landmarks, but it cannot predict bone age. Tables 8, 9, and 10 show the results of the template matching method in landmark detection and acetabular index detection. We can see that our method outperforms the traditional method by a large margin, which indicates the effectiveness of the deep-learning-based approach for hip dislocation detection. Three reasons contribute to this result. First, the multi-task hourglass network uses a large number of convolution kernels to extract features, which captures more details of the image. Second, the traditional method for landmark detection is unstable because the local diversity of the landmarks makes them challenging to describe with simple templates [46]. Third, deep neural networks are nonlinear and optimized with gradient descent, which can effectively fit the target function.

Table 8 Comparison between our proposed model and the template matching method of landmark (1, 2, 3) [21] on the test dataset (2119 images)
Table 9 Comparison between our proposed model and the template matching method of landmark (4, 5, 6) [21] on the test dataset (2119 images)
Table 10 Mean±SD and median error of acetabular index detection of our proposed model and template matching method [21] on the test dataset (2119 images)

Using deep learning to detect landmarks in X-ray images is a new way to address the shortage of professional doctors, especially in remote and poor areas. A deep-learning-based method for diagnosing DDH can help more children around the world. Typically, a professional doctor takes about 5 minutes to judge a DDH X-ray image, and the template matching method takes nearly 30 seconds to process one image. Our proposed method can process 20 X-ray images in 1 minute. With the deep learning method, detection is very fast, and labeling differences between doctors can be reduced.
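The timing figures translate into per-image costs and speed-up factors as follows. This is plain arithmetic on the approximate numbers quoted above (about 5 minutes, nearly 30 seconds, and 20 images per minute), so the resulting factors are rough estimates.

```python
# Approximate processing time per image, in seconds, from the figures in the text.
SECONDS_PER_IMAGE = {
    "doctor": 5 * 60,          # about 5 minutes per X-ray
    "template_matching": 30,   # nearly 30 seconds per image
    "proposed": 60 / 20,       # 20 images per minute
}

def speedup(baseline, method="proposed"):
    """How many times faster `method` is than `baseline`."""
    return SECONDS_PER_IMAGE[baseline] / SECONDS_PER_IMAGE[method]

vs_doctor = speedup("doctor")
vs_template = speedup("template_matching")
```

At roughly 3 seconds per image, the proposed method is on the order of 100 times faster than manual reading and 10 times faster than template matching.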

4.7 Performance of online service

Fig. 7

Illustration of the online service of our method. When potential patients upload their hip X-ray image for DDH diagnosis, the GPU server processes the image immediately. The multi-task hourglass network provides a comprehensive evaluation, and the image is saved in the database. If the result from our platform is phase I to IV, a notification is triggered and sent to the users. Thus, patients are informed of their condition and get an early diagnosis.

Table 11 Mean±SD and median error of landmark (1, 2, 3) detection of our proposed model on the test dataset (2119 images) and online diagnosis (112 images)
Table 12 Mean±SD and median error of acetabular index detection of our proposed model on the test dataset (2119 images) and online diagnosis (112 images)

The online service of automatic diagnosis is illustrated in Figure 7. Our online web system runs on a server with an Intel Xeon E5-2630 CPU @ 2.2GHz and a Titan XP 12GB GPU. The operating system is Ubuntu 18.04, and apache2 is used to serve the web application. The database has 1TB of storage. The workflow of online diagnosis is: (1) the user uploads the image to the server via the web address, (2) the server reads the image and sends it to the model, (3) the proposed multi-task hourglass network processes the image and outputs the results, (4) the database stores the image and the results, (5) the user gets the interpretable diagnostic results from the website.
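
The five-step workflow above can be sketched as follows. This is a minimal illustration under stated assumptions, not the deployed system: `run_model` is a hypothetical stand-in for the multi-task hourglass network, the `cases` table layout is invented for the example, and an in-memory SQLite database replaces the actual database, which the text does not name.

```python
import json
import sqlite3

def run_model(image_bytes):
    """Hypothetical stand-in for the multi-task hourglass network.

    Returns placeholder output: six dummy landmark coordinates and no age.
    """
    return {"landmarks": [[0, 0] for _ in range(6)], "femoral_head_age": None}

def diagnose(db, filename, image_bytes, model=run_model):
    """Steps (2)-(5): read the upload, run the model, store, and return results."""
    result = model(image_bytes)                        # step (3): inference
    db.execute(
        "INSERT INTO cases (filename, image, result) VALUES (?, ?, ?)",
        (filename, image_bytes, json.dumps(result)),   # step (4): store image and results
    )
    db.commit()
    return result                                      # step (5): returned to the user

# Assumed schema, for illustration only.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE cases (id INTEGER PRIMARY KEY, filename TEXT, image BLOB, result TEXT)"
)

# Step (1): an uploaded file, represented here by placeholder bytes.
report = diagnose(db, "hip.png", b"placeholder image bytes")
print(report)
```

Passing the model as a parameter keeps the storage and response logic independent of the network, so the stub can be swapped for the real model without changing the rest of the pipeline.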

We have collected 112 DDH cases from our website. Note that all of the collected data have been desensitized and are used only for academic purposes. We compare the performance on the images collected from our web service platform with that on the images from the test set, as shown in Table 11. Our method performs similarly on both sets, which demonstrates that the proposed hourglass network can generalize to new data.

Meanwhile, Table 12 shows the results of acetabular index detection for the online diagnosis. The mean angle error of our method increases by around 0.3 on the online data compared with the test set. Thus, our approach still performs well in measuring the acetabular index on online data.
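
The mean±SD and median summaries reported in Tables 11 and 12 can be computed from per-image errors as in the following minimal sketch. It assumes that the landmark error is the Euclidean distance in pixels between predicted and annotated positions and that SD is the sample standard deviation; the text states neither. The function names and all numbers are hypothetical placeholders, not values from our experiments.

```python
import math
import statistics

def pixel_error(predicted, annotated):
    """Euclidean distance in pixels between two (x, y) points (assumed metric)."""
    return math.dist(predicted, annotated)

def summarize(errors):
    """Mean, sample SD, and median of a list of per-image errors."""
    return {
        "mean": statistics.mean(errors),
        "sd": statistics.stdev(errors),    # sample SD (n - 1); an assumption
        "median": statistics.median(errors),
    }

# Placeholder (predicted, annotated) pairs for one landmark, NOT data from the paper.
test_set = [((10, 10), (13, 14)), ((20, 20), (20, 22)), ((5, 5), (5, 6))]
online_set = [((10, 10), (16, 18)), ((20, 20), (20, 23)), ((5, 5), (7, 5))]

for name, pairs in (("test", test_set), ("online", online_set)):
    stats = summarize([pixel_error(p, a) for p, a in pairs])
    print(f"{name}: {stats['mean']:.2f}±{stats['sd']:.2f}, median {stats['median']:.2f}")
```

Reporting the median alongside the mean is useful here because a few images with large errors can raise the mean while leaving the median almost unchanged.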

There are two potential social impacts of the proposed service. From a positive point of view, the online system could accelerate the diversification of diagnostic modes, since it provides another way for patients to learn about their health conditions. From a negative point of view, online diagnosis can also lead to misdiagnosis and may draw more criticism than a doctor would, because the online diagnosis system is open for public use, which could amplify the influence of a mistake. Nevertheless, the widespread use of our model in the future will continue to expand the DDH database. At the same time, we will continue refining and evaluating our method to improve the performance of online automatic diagnosis.

5 Conclusion

We propose a novel approach based on the multi-task hourglass network to detect hip dislocation. Online diagnosis of hip dislocation from X-ray images based on deep learning is a new attempt. We design a novel network that simultaneously predicts the locations of six landmarks in the X-ray image and the age of the femoral head. The detection of landmarks is of great significance for medical diagnosis, and accurate medical diagnosis requires precise landmark localization. The proposed network achieves an average pixel error of 4.64 over the six landmarks. The prediction of the developmental age of the femoral head is equally important. Under normal circumstances, the developmental age of the femoral head is nearly the same as the child’s actual age. Experiments show that the accuracy of the proposed network is 89.3% for the femoral heads.

For future work, some weaknesses in our algorithm need to be addressed.

  1) The pixel error should be small for accurate DDH diagnosis. Although the average pixel error of the six landmarks is small, the pixel error on some images is large. The reason is that bone morphology varies widely across these X-ray images.

  2) There are very few patients older than 12 years in the database. Therefore, the trained model cannot thoroughly learn the bone age information in these images. With the expansion of online data in the future, we will add images of children older than 12 years to improve the accuracy of age prediction.