Disaster classification net: A disaster classification algorithm on remote sensing imagery

Yuan, Jianye; Ding, Xinwang; Liu, Fangyuan; Cai, Xin

doi:10.3389/fenvs.2022.1095986

ORIGINAL RESEARCH article

Front. Environ. Sci., 06 January 2023
Sec. Environmental Informatics and Remote Sensing
Volume 10 - 2022 | https://doi.org/10.3389/fenvs.2022.1095986

Jianye Yuan¹, Xinwang Ding¹*, Fangyuan Liu², Xin Cai³

¹School of Electronic Information, Wuhan University, Wuhan, China
²The Second Clinical Medical College, Jinan University, Shenzhen, China
³School of Electrical Engineering, Xinjiang University, Urumqi, China

Natural disasters have a great impact on people’s lives and property, so disaster categories must be identified in a timely and effective manner. In light of this, we propose a new disaster classification network, D-Net (Disaster Classification Net), built by connecting the D-Conv, D-Linear, D-model, and D-Layer modules in tandem. In the experiments, we compared the proposed method with “CNN” and “Transformer” algorithms. Compared with the CNN algorithms, D-Net reduced Params by a factor of 26–608 and FLOPs by a factor of up to 21, and increased Precision by 1.6%–43.5%. Compared with the Transformer algorithms, D-Net reduced Params by a factor of 23–149 and FLOPs by a factor of 1.7–10, and increased Precision by 3.9%–25.9%. D-Net thus achieves SOTA (State-Of-The-Art) results on the disaster dataset. We then compared D-Net with MobileNet_v2, the best performer on the classification dataset, and with the CCT network on the fashion_mnist and CIFAR_100 public datasets; the results show that D-Net still achieves a state-of-the-art classification effect. Therefore, the proposed algorithm can be applied not only to disaster tasks but also to other classification tasks.

1 Introduction

Natural disasters are natural phenomena that can damage human production and life, including drought, high temperature, low temperature, cold waves, floods, and volcanic eruptions (Botzen et al., 2020). Traditional natural disaster detection techniques usually detect a single type of disaster in a particular situation and location, and overlook the many other disasters that occur in varied surroundings. For instance, Barmpoutis et al. (2020) proposed a system to detect forest fires in real time using remote sensing images; Wang and Xu (2010) proposed a method to track changes in the severity of forest damage following hurricane disasters using remote sensing images; Saad et al. (2021) proposed an earthquake monitoring framework based on deep learning, with an algorithm that can be used in four different seismic zones; Anusha and Bharathi (2020) monitor flood dangers in real time using wireless sensor networks, whereas Al Qundus et al. (2020) employ radar and optical data to detect and map flood hazards. In summary, traditional natural disaster monitoring targets a single hazard in a specific environment. Given the uncertainty and vast scale of natural disaster occurrence, classifying and monitoring the primary natural hazards in real time is essential.

According to the guideline for the loss of people’s life safety from natural disasters issued by the state, we selected the four types of natural disasters with the greatest impact for classification, including hurricanes, earthquakes, floods, and fires. Our contributions are listed as follows:

(1) The modules for the D-Layer and D-model are proposed;

(2) Four components, D-Conv, D-model, D-Layer, and D-Linear, are combined in tandem to form the D-Net disaster classification algorithm;

(3) Experiments are conducted with natural disaster datasets and two public datasets to demonstrate the effectiveness and generalization ability of our algorithm.

2 Convolutional neural networks and transformers

Deep learning has quickly gained popularity and has produced numerous promising outcomes in areas such as image segmentation (Zhu et al., 2022) and classification (Yuan et al., 2022). We adopt a deep learning method, and the later experiments mainly compare it with CNN (Convolutional Neural Network) (Sun et al., 2021) models and Transformer (Han et al., 2021) models. The CNN and Transformer algorithms are introduced in turn below.

2.1 Convolutional neural networks

CNN is a variant of MLP (Multilayer Perceptron) (Tolstikhin et al., 2021). It is a feedforward neural network model whose neurons have learnable weights and bias constants. For feature extraction, a common CNN includes a convolutional layer, a rectified linear unit (ReLU) layer, a pooling layer, and a fully-connected layer. The convolutional layer is a three-dimensional feature extractor: each filter produces one depth slice of the output, and multiple filters together learn multiple features. It is also characterized by weight and parameter sharing. The pooling layer performs downsampling to reduce the size of the feature maps, and common choices include Max Pooling, Mean Pooling (Zeng et al., 2019), and Gauss Pooling (Kobayashi, 2019). The fully-connected layer acts as the classifier of the whole CNN; it is equivalent to a 1 × 1 convolution over the output of the previous layer, so in practice it can be replaced by a convolutional layer. Like an MLP, a CNN is composed of an input layer, a hidden layer, and an output layer. The input layer and output layer each contain only one layer, while the hidden layer can be composed of multiple layers. Therefore, the simplest MLP is composed of three layers, as shown in Figure 1.

FIGURE 1. The simplest MLP model structure.
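
The convolution, ReLU, and pooling steps described above can be illustrated with a minimal pure-Python sketch. It is not taken from the paper: the 5 × 5 image and the vertical-edge kernel are made-up values, and the helper names `conv2d`, `relu`, and `max_pool` are illustrative only.

```python
def conv2d(image, kernel):
    """Valid cross-correlation of a single-channel image with one kernel.

    The same kernel weights are reused at every position (weight sharing).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(fmap):
    """Rectified linear unit applied element-wise."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    out_h, out_w = len(fmap) // size, len(fmap[0]) // size
    return [
        [
            max(fmap[i * size + a][j * size + b]
                for a in range(size) for b in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical 5 x 5 input and a 3 x 3 vertical-edge filter (illustration only).
image = [
    [1, 2, 0, 1, 3],
    [0, 1, 2, 3, 1],
    [1, 0, 1, 2, 2],
    [2, 1, 0, 1, 0],
    [0, 2, 1, 0, 1],
]
kernel = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]

conv_out = conv2d(image, kernel)          # 3 x 3 feature map
features = max_pool(relu(conv_out))       # 1 x 1 map after ReLU and 2 x 2 pooling
print(conv_out, features)
```

The kernel holds only nine weights regardless of image size, which is the parameter sharing the text refers to; a fully-connected mapping from the same 25 inputs to 9 outputs would need 225 weights.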

Modern neural networks mainly stem from the AlexNet (Zhu et al., 2021) model proposed in 2012, which led many scientists to use convolutional neural networks to solve image problems. The 3 × 3 convolution kernel adopted by VGG (Ding et al., 2021) reduced the running time of the model; GoogleNet (Ran et al., 2021) increased the complexity of the model by increasing its width; ResNet (Wightman et al., 2021), proposed in 2016, addressed vanishing and exploding gradients and further accelerated the development of neural networks. In 2018, the SENet (HermineMariette et al., 2021) model was proposed to make the model focus on important parts. With the proposal of EfficientNet (Tan and Le, 2019) and RegNet (Mahbub et al., 2021), deep learning has continued to develop in image processing tasks and has become a widely accepted technology among scientists.

2.2 Transformers

With the continuous development of CNN models, researchers studying the attention mechanism in CNNs found that the attention mechanism (Mormann and Russo, 2021) alone, without the other modular layer structures of CNN, can be used for machine translation, image classification, and other tasks. As a result, more and more researchers have turned to the attention mechanism, of which Transformer is the most prominent module. Transformer, a rising star of artificial intelligence, drew wide attention when Google’s BERT model (Tenney et al., 2019) was applied to NLP (Natural Language Processing) (KangCai et al., 2020) tasks, and it achieved remarkable results in machine translation. Its structure is shown in Figure 2.

FIGURE 2. Transformer model structure diagram.
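
The paper does not spell out the attention computation, so the following is only a sketch of the standard scaled dot-product attention used in Transformer models, softmax(QKᵀ/√d)·V, written in pure Python. The function names and the toy query, key, and value vectors are illustrative assumptions.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention for lists of equal-length vectors."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return outputs


# Toy example: identical keys give uniform weights, so the output is the mean value.
uniform = attention([[1.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]], [[1.0, 2.0], [3.0, 4.0]])
# A key aligned with the query pulls the output towards its value vector.
focused = attention([[10.0, 0.0]], [[10.0, 0.0], [0.0, 10.0]], [[1.0, 0.0], [0.0, 1.0]])
print(uniform, focused)
```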

With the continuous development of Transformer, many Transformer models have been proposed, such as ALBERT (GOH et al., 2021), BEiT (Lev et al., 2021), DeiT (Maurice et al., 2021), ViT (Yuan et al., 2021), and Swin (Liu et al., 2021). These models not only show satisfactory results in the field of NLP (Cambria and White, 2014) but also achieve satisfactory results in image classification. Therefore, we further optimized our algorithm in the experimental environment and compared it with Transformer models, and concluded that our D-Net algorithm has a better classification effect and portability.

3 Related work

In view of the continuous development of deep learning and the impact of disasters on people’s lives and property, we propose a new classification network, D-Net, for disaster classification tasks. D-Net is composed of D-Model, D-Conv, D-Layer, and D-Linear, of which D-Model is the most important part.

3.1 D-model

To improve the performance of the D-Net network, we propose the D-Model module, which is composed of six convolutional layers linked by two skip connections, as shown in Figure 3. The D-Model is divided into three parts: One-part, Two-part, and Three-part. One-part consists of two 1 × 1 convolutions, a 3 × 3 depthwise (deep) convolution (Guo et al., 2019), three BN (Batch Normalization) layers (Bjorck Gomes et al., 2018), and a ReLU6 activation function (Zou et al., 2020). The value X is added directly to the output M through a shortcut connection on the channel dimension, giving the output Y. Two-part, which is composed of two 1 × 1 convolutions with ReLU6 and Sigmoid activation functions (Hanna and Kaiser, 2021), takes Y as input; Y is added to the Two-part output Q on the channel dimension to obtain the output feature map N. Finally, Three-part is composed of a 1 × 1 depthwise convolution and a BN layer.

FIGURE 3. D-model module structure diagram.
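
To give a feel for why a block built from 1 × 1 and depthwise convolutions is light, the sketch below counts the convolution weights of one D-Model under simplifying assumptions that are not from the paper: every convolution keeps a constant, hypothetical width of 32 channels, biases and BN parameters are ignored, and `conv_params` is an illustrative helper. The actual channel widths are those of Table 1.

```python
def conv_params(kernel, in_ch, out_ch, groups=1):
    """Weight count of a bias-free 2-D convolution with a square kernel.

    With groups == in_ch == out_ch this is a depthwise convolution.
    """
    assert in_ch % groups == 0 and out_ch % groups == 0
    return kernel * kernel * (in_ch // groups) * out_ch


C = 32  # hypothetical channel width, chosen only for illustration

# One-part: two 1 x 1 convolutions and one 3 x 3 depthwise convolution.
one_part = [conv_params(1, C, C), conv_params(3, C, C, groups=C), conv_params(1, C, C)]
# Two-part: two 1 x 1 convolutions.
two_part = [conv_params(1, C, C), conv_params(1, C, C)]
# Three-part: one 1 x 1 depthwise convolution.
three_part = [conv_params(1, C, C, groups=C)]

total = sum(one_part + two_part + three_part)

# A standard 3 x 3 convolution at the same width, for comparison.
standard_3x3 = conv_params(3, C, C)
depthwise_3x3 = conv_params(3, C, C, groups=C)
print(total, standard_3x3, depthwise_3x3)
```

Under these assumptions the six convolutions of the module together hold fewer weights than a single standard 3 × 3 convolution at the same width, because the depthwise layers scale with C rather than C².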

3.2 D-Net

We build the network structure from the ground up. The D-Net network structure is shown in Table 1; it consists of one D-Conv, four D-Model modules, one D-Layer, and one D-Linear. The D-Layer consists of the One-part and Two-part of the D-Model. Dep-Conv denotes depthwise (deep) convolution, where the number of groups of each Dep-Conv is set equal to the number of input channels. Since the BN layer and activation function do not change the size of their input and output, Input Size and Output Size in Table 1 refer to the input and output sizes of the convolution layers. Since the D-Model contains six convolution layers and the D-Layer contains five, the D-Net network has a total of 30 layers, excluding Linear, the fully-connected layer. Subsequent experiments verify that the algorithm performs well on disaster datasets, so the proposed D-Net is suitable for disaster classification tasks.

TABLE 1. D-Net Network structure table.
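
The total of 30 layers follows from the module counts given in the text, assuming the single D-Conv contributes one convolution layer; the D-Layer's five convolutions are the three of One-part plus the two of Two-part:

```latex
\underbrace{1}_{\text{D-Conv}}
+ \underbrace{4 \times 6}_{\text{four D-Model modules}}
+ \underbrace{(3 + 2)}_{\text{D-Layer}}
= 1 + 24 + 5 = 30
```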

We also visualize the D-Net network structure in Figure 4, where each layer of the model is labeled with its module and output. Comparing Figure 4 with Table 1 shows that the D-Model and the D-Layer differ only in the last layer, “ConV/Sigmoid”. The algorithm takes an image as input and, after feature extraction by each module, outputs an image class. Figure 4 thus gives a clearer view of the flow of the D-Net network structure.

FIGURE 4. D-Net visualization network structure diagram.

4 Experiment procedure

4.1 Lab environment

Our experiment was carried out on an Ubuntu 20.04 system with CUDA version 10.1.243, PyTorch version 1.7.1, and Python version 3.10. The batch size was 32, and each iteration took .57 m. The dataset came from the Kaggle competition dataset (Cyclone, 2021) and the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing of Wuhan University, with a total of 9,792 images, of which 7,791 were in the training set and 2,001 in the test set. For each type of disaster, about 80% of the images were used for training and about 20% for testing. The horizontal and vertical resolution of the disaster images was 96 dpi and the bit depth was 24. The input image size was 224 × 224, the number of training iterations was 100, the optimizer was SGD (Woodworth et al., 2020), the initial learning rate was .01, and weight_decay was set to .0004.
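
The settings above can be collected into a small configuration sketch, with a quick check of the reported split. The values are those quoted in the text; the dictionary key names are illustrative, and reading each of the 100 iterations as one pass over the training set is an assumption.

```python
import math

# Hyperparameters quoted in the text; key names are illustrative only.
config = {
    "batch_size": 32,
    "input_size": (224, 224),
    "iterations": 100,        # assumed here to mean passes over the training set
    "optimizer": "SGD",
    "learning_rate": 0.01,
    "weight_decay": 0.0004,
}

train_images, test_images = 7791, 2001
total_images = train_images + test_images

train_fraction = train_images / total_images
test_fraction = test_images / total_images

# Number of mini-batches needed to see every training image once.
batches_per_pass = math.ceil(train_images / config["batch_size"])

print(f"total={total_images}, train={train_fraction:.1%}, test={test_fraction:.1%}")
print(f"batches per pass: {batches_per_pass}")
```

The split works out to roughly 79.6% training and 20.4% test images, consistent with the approximate 80/20 ratio stated above.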

4.2 Evaluation indicators

To evaluate the model and visualize its results, we selected the following evaluation indicators: confusion matrix (Chicco et al., 2021), accuracy curve, loss function curve, precision rate (top1, top3, and top5) (MitchellBillingsley et al., 2021), and recall rate (Zhong et al., 2021). The confusion matrix places all the predicted results and real results of the model in a single table, so that the numbers of correctly and incorrectly recognized samples in each class can be read off directly for a supervised learning method. As shown in Table 2, TP means that the real result is positive and the predicted result is positive; FN means that the real result is positive and the predicted result is negative; FP means that the real result is negative and the predicted result is positive; TN means that the real result is negative and the predicted result is negative. Each cell holds the number of samples in that case.

TABLE 2. Confusion matrix classification table.

The formulas for Accuracy, Precision, Recall, and the F1 value are given in Eqs. 1–4, respectively. Precision is the proportion of images assigned to a class that truly belong to it; Recall is the proportion of images truly belonging to a class that the algorithm detects; the F1 value is the harmonic mean of Precision and Recall.

\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (1)

\mathrm{Precision} = \frac{TP}{TP + FP} \quad (2)

\mathrm{Recall} = \frac{TP}{TP + FN} \quad (3)

F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \quad (4)
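
As a worked illustration of Eqs. 1–4, the sketch below derives TP, FP, FN, and TN for one class of a four-class confusion matrix in a one-vs-rest fashion and then evaluates the four formulas. The matrix values are made up for the example and are not the paper's results; the function names are illustrative.

```python
def one_vs_rest_counts(matrix, cls):
    """Return (TP, FP, FN, TN) for class index `cls`.

    Rows of `matrix` are true classes, columns are predicted classes.
    """
    total = sum(sum(row) for row in matrix)
    tp = matrix[cls][cls]
    fn = sum(matrix[cls]) - tp                   # true cls, predicted as another class
    fp = sum(row[cls] for row in matrix) - tp    # other class, predicted as cls
    tn = total - tp - fn - fp
    return tp, fp, fn, tn


def metrics(tp, fp, fn, tn):
    """Accuracy, Precision, Recall, and F1 as in Eqs. 1-4."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


labels = ["Cyclone", "Earthquake", "Flood", "Wildfire"]
# Hypothetical confusion matrix (illustration only, not the paper's Figure 7).
matrix = [
    [8, 1, 1, 0],
    [0, 7, 2, 1],
    [1, 1, 8, 0],
    [0, 0, 1, 9],
]

flood_counts = one_vs_rest_counts(matrix, labels.index("Flood"))
flood_metrics = metrics(*flood_counts)
print(flood_counts, flood_metrics)
```

For the hypothetical Flood row this gives TP = 8, FP = 4, FN = 2, TN = 26, so Precision is 8/12, Recall is 8/10, and F1 is 8/11.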

4.3 Experimental comparison

To demonstrate the effect of the D-Net algorithm, we compared it with CNN algorithms and Transformer algorithms in separate experiments.

4.3.1 Convolutional neural networks experimental comparison

First, Table 3 compares the proposed D-Net with the CNN models MobileNet_v2 (Zhang and Ding, 2020), ResNet152, MnasNet1_3 (Tan et al., 2019), SqueezeNet1_1 (Koonce, 2021), Efficient_b7 (Wang et al., 2021), and VovNet57a (Su et al., 2022). The FLOPs and Params of D-Net are much lower than those of the other models. The FLOPs of D-Net are close to those of MnasNet1_3, but its Params are 107 times smaller. In terms of Precision, MnasNet1_3 reaches only .508, which makes it unsuitable for disaster classification tasks. D-Net also leads clearly in Precision, scoring higher than all the other models. Therefore, we conclude that the proposed D-Net has better classification ability than the CNN models.

TABLE 3. Comparison of D-Net and CNN model data. MobileNet, MnasNet, SqueezeNet, and Efficient are lighter modeling methods, while ResNet and VovNet are among the better performing methods.

4.3.2 Transformer experimental comparison

To further verify the effect of the algorithm, D-Net was compared with the Transformer models ViT, DeepViT (Zhou et al., 2021), CaiT (Martín Sujo et al., 2021), CCT (Tang et al., 2021), PiT (Abdulai and Sharifzadeh, 2021), LeViT (Levit and Malenko, 2011), and CvT (Wu et al., 2021). Table 4 shows that D-Net has lower FLOPs and Params than the other models: its FLOPs are generally 5 to 10 times lower, and its Params are generally more than 20 times lower. Even compared with LeViT, which has the smallest FLOPs and Params among the Transformer models, D-Net is 435M lower in FLOPs and nearly 20 times lower in Params. In Precision, D-Net is about 5% higher than the Transformer models, and DeepViT and CCT reach only .686 and .675, which makes them unsuitable for disaster classification tasks. Therefore, the proposed D-Net is more suitable for disaster classification tasks than the Transformer models.

TABLE 4. Comparison of D-Net and Transformer model data. ViT, DeepViT, CaiT, and PiT are more cutting-edge approaches with superior experimental performance, while CCT, LeViT, and CvT are lighter approaches.

Figure 5 and Figure 6 show that the Accuracy and Loss of the D-Net algorithm become stable by 100 iterations, which supports setting the number of iterations to 100 in our experiments. In addition, the confusion matrix in Figure 7 shows that Flood has the highest number of correct identifications, followed by Wildfire, Earthquake, and Cyclone. The model misidentified Earthquake as Flood 33 times, Flood as Earthquake 23 times, Wildfire as Flood 22 times, and Flood as Wildfire 10 times; other misidentifications were less frequent. As can be seen from Figure 8, the ROC curve of our algorithm performs well, with an AUC of .993. This indicates that the model separates the classes well on this data and further supports the suitability of D-Net for disaster classification tasks.

FIGURE 5. D-Net algorithm Accuracy curve.

FIGURE 6. D-Net algorithm Loss curve.

FIGURE 7. D-Net algorithm confusion matrix diagram.

FIGURE 8. D-Net algorithm ROC and AUC curves.
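
The paper does not state how the AUC was computed for the four-class problem. As a hedged illustration of what the number measures, the sketch below computes a binary (one class versus the rest) AUC from its ranking definition: the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half. The scores and labels are made up.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    # Each positive/negative pair contributes 1 if ranked correctly, 0.5 if tied.
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical scores for one class treated as "positive" (illustration only).
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]

example_auc = auc(scores, labels)
perfect_auc = auc([0.9, 0.7, 0.2, 0.1], [1, 1, 0, 0])
print(example_auc, perfect_auc)
```

An AUC close to 1 therefore means that almost every positive sample is ranked above almost every negative sample.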

4.4 Public datasets

To further verify the generalization ability of the D-Net algorithm model, we used the fashion_mnist dataset (Khanday et al., 2021) and the cifar_100 dataset (Hirose et al., 2022) for further experiments.

4.4.1 fashion_mnist

The fashion_mnist dataset contains ten categories, with 60,000 images in the training set and 10,000 in the test set; each image is 28 × 28 pixels. All of the images are in “png” format and are categorized as follows: “T-shirt/top,” “Trouser,” “Pullover,” “Dress,” “Coat,” “Sandal,” “Shirt,” “Sneaker,” “Bag,” and “Ankle boot.” We compared D-Net with MobileNet_v2, which performed better in Table 3, and with the CCT algorithm model, which performed better in Table 4. Table 5 shows that, in terms of Recall and F1 values, D-Net and MobileNet_v2 are basically the same and slightly better than CCT; in terms of Top-1, Top-3, and Top-5 accuracy, D-Net is again basically the same as MobileNet_v2 and performs better than the CCT algorithm model. We conclude that the proposed D-Net generalizes well to the fashion_mnist dataset without abnormal behavior, which makes it suitable for other classification tasks.

TABLE 5. Data comparison of D-Net on fashion_mnist.
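
The Top-1, Top-3, and Top-5 figures count a prediction as correct when the true class is among the k highest-scoring classes. A minimal sketch of that computation, with made-up scores for a three-class toy problem and an illustrative function name, is:

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score, truncated to k.
        top = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top
    return hits / len(labels)


# Hypothetical per-class scores for three samples (illustration only).
scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
]
labels = [1, 1, 0]

top1 = top_k_accuracy(scores, labels, 1)
top2 = top_k_accuracy(scores, labels, 2)
top3 = top_k_accuracy(scores, labels, 3)
print(top1, top2, top3)
```

Top-k accuracy can only stay the same or increase as k grows, which is why Top-3 and Top-5 values are at least as high as Top-1.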

4.4.2 cifar_100

The cifar_100 dataset contains 100 categories of images, and each category has 600 three-channel color images of size 32 × 32 pixels, giving 50,000 images for the training set and 10,000 for the test set. We use the public cifar_100 dataset from the official website (http://www.cs.toronto.edu/~kriz/cifar.html). Table 6 shows that D-Net is basically the same as CCT in terms of Recall, F1 value, and Top-1, while CCT is about 4% higher in Top-3 and Top-5. Weighing performance against cost, D-Net is slightly better overall than the CCT algorithm model. Compared with MobileNet_v2, D-Net is basically the same in Recall, F1 value, Top-1, Top-3, and Top-5, and the differences are almost negligible. Considering the performance of D-Net on the public datasets, we conclude that D-Net does not make large abnormal classification errors. Therefore, the proposed D-Net has good stability and robustness and is suitable for other classification tasks.

TABLE 6. Data comparison of D-Net on cifar_100.

5 Conclusion

With the increasing number of natural disasters, it is very important to classify and deal with them effectively. Therefore, we propose a fast and efficient disaster classification network, D-Net. Compared with “CNN” networks, D-Net not only reduced FLOPs and Params by more than 100 times but also maintained a high classification accuracy; compared with “Transformer” networks, D-Net reduced FLOPs and Params by more than 20 times, and its Precision is about 5% higher. In addition, we conducted experiments on the public datasets fashion_mnist and cifar_100, comparing D-Net with MobileNet_v2 and CCT, the two networks that performed best on the disaster dataset, and found that D-Net still has a good classification effect. Therefore, we conclude that the D-Net network is suitable not only for disaster datasets but also for other classification tasks, with high generalization and portability.

Data availability statement

The original contributions presented in the study are included in the article/Supplementary Materials; further inquiries can be directed to the corresponding author.

Author contributions

JY: wrote the paper, programmed the paper code, etc. XD: provided funding support and reviewed the paper. FL: polished the language and modified the format of the paper. XC: reviewed the format of the paper, etc.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The handling editor YL declared a shared affiliation with the authors JY and XD at the time of review.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Abdulai, M., and Sharifzadeh, M. (2021). Probability methods for stability design of open pit rock slopes: An overview. J. Geosci. 11 (8), 319. doi:10.3390/geosciences11080319
