Deep Learning Combined with Hyperspectral Imaging Technology for Variety Discrimination of Fritillaria thunbergii

Kabir, Muhammad Hilal; Guindo, Mahamed Lamine; Chen, Rongqin; Liu, Fei; Luo, Xinmeng; Kong, Wenwen

doi:10.3390/molecules27186042

¹ College of Biosystems Engineering and Food Science, Zhejiang University, 866 Yuhangtang Road, Hangzhou 310058, China

² Department of Agricultural and Bio-Resource Engineering, Abubakar Tafawa Balewa University, Bauchi PMB 0248, Nigeria

³ College of Mathematics and Computer Science, Zhejiang A & F University, Hangzhou 311300, China

* Author to whom correspondence should be addressed.

Molecules 2022, 27(18), 6042; https://doi.org/10.3390/molecules27186042

Submission received: 23 August 2022 / Revised: 9 September 2022 / Accepted: 12 September 2022 / Published: 16 September 2022

(This article belongs to the Special Issue Analytical Chemistry in Agriculture Application)

Abstract:

Traditional Chinese herbal medicine (TCHM) plays an essential role in the international pharmaceutical industry due to its rich resources and unique curative properties. The flowers, stems, and leaves of Fritillaria contain a wide range of phytochemical compounds, including flavonoids, essential oils, saponins, and alkaloids, which may be useful for medicinal purposes. Fritillaria thunbergii Miq. bulbs are commonly used in traditional Chinese medicine as expectorants and antitussives. This paper presents a feasibility study on the use of hyperspectral imaging (HSI) integrated with a convolutional neural network (CNN) to distinguish twelve (12) Fritillaria varieties (n = 360). The performance of the CNN was compared with that of support vector machines (SVM) and partial least squares-discriminant analysis (PLS-DA). Principal component analysis (PCA) was used to assess the presence of cluster trends in the spectral data. Cross-validation was used to optimize the models. Among the discriminant models, CNN achieved the highest test accuracy, with 96.88% and 88.89% on the training and test sets, respectively, followed by PLS-DA with 92.59% and 81.94% and SVM with 99.65% and 79.17%. The results indicate that HSI in conjunction with deep learning can be used to classify Fritillaria thunbergii varieties rapidly and non-destructively.

Keywords:

convolutional neural network; flavonoids; essential oils; saponins; alkaloids; traditional Chinese herbal medicine; Fritillaria thunbergii

1. Introduction

The Fritillaria genus consists of several species, all of which have been domesticated in China since 3500 BC. There are three major species within this genus: Fritillaria thunbergii (Zhebeimu), Fritillaria chuanbeiiensis (Chuanbeimu), and Fritillaria ussuriensis (Pingbeimu). Besides being a valuable herb, Fritillaria is one of the most important economic crops for herb growers. It is estimated that Fritillaria is planted on over 6000 hectares in China, yields over 20,000 tons per year, and provides farmers with an income of CNY 700 million every year [1]. It is mentioned in the earliest Chinese herbal monograph, “Shen Nong’s Herbal Classic”, as a remedy for coughing, and the 2010 version retains this use. The Chinese Health Law [2002] No. 51 recognizes all three types of Fritillaria as edible due to their non-toxic nature [2]. Fritillaria is considered to promote lung dispersal, dissolve phlegm, relieve coughing, detoxify, and dissolve lumps and masses in the chest [1].

Hyperspectral imaging has been incorporated into several research fields, building on its use in remote sensing. It splits the electromagnetic spectrum into many narrow bands (typically hundreds), which provides high spectral resolution over a wide range of wavelengths. A hyperspectral image therefore represents a scene as a series of images, each corresponding to a narrow band of light, rather than as a single two-dimensional picture [3,4].

In addition, this method is non-destructive, rapid, and exhibits a high spectral resolution, thus enabling accurate identification of a variety of chemical compounds. The high spectral resolution allows one to identify unique absorption features in minerals as a result of the interaction between radiation and their crystalline structure [5]. Hyperspectral imaging allows the observation of a variety of wavelengths, including ultraviolet (UV), visible and near-infrared (Vis-NIR), shortwave infrared (SWIR), and longwave infrared (LWIR). Traditional Chinese medicine can be evaluated using hyperspectral imaging [6]. In recent years, spectroscopic techniques and spectral imaging have been widely used to identify agricultural product origins and analyze their quality as rapid, non-destructive testing methods [7,8,9,10,11,12,13,14,15,16,17].

Artificial intelligence (AI) techniques, such as deep learning (DL), enable machines to acquire knowledge from data autonomously [18]. A variety of deep learning models are available, and one of the most popular is the convolutional neural network (CNN). A CNN consists of three types of layers: convolutional layers, pooling layers, and fully connected layers, which perform feature extraction, compression, and classification, respectively. Combining several convolutional and pooling layers allows abstract features to be learned more effectively. In the field of computer vision, CNNs have shown remarkable performance in a variety of tasks. In hyperspectral image analysis, CNNs have been used to classify hyperspectral remote sensing images in two and three dimensions [1]. Over the past few years, different CNNs have been developed for specific spectral analysis tasks, such as identifying single rice seeds [17], rice seed varieties [19], hybrid seeds [10], and chrysanthemum varieties [9].

Only a limited number of studies have used deep learning to identify traditional Chinese medicine, and whether a CNN can discriminate between varieties of Fritillaria thunbergii has yet to be determined. The main objective of this study is to examine whether HSI combined with CNN can be used to discriminate Fritillaria thunbergii varieties. The specific objectives were: (1) to study the performance of SVM, PLS-DA, and CNN as a function of the number of training samples, (2) to evaluate the performance of the convolutional neural network (CNN) in comparison to support vector machine (SVM) and partial least squares-discriminant analysis (PLS-DA), and (3) to analyze the identification results for Fritillaria thunbergii varieties obtained with the best model.

2. Materials and Methods

2.1. Sample Preparation

The College of Biosystems Engineering and Food Science at Zhejiang University, China, provided 12 different varieties of Fritillaria for the study. The samples were coded from 1 to 12, as presented in Table 1, for later data processing. Each variety had 30 samples, giving 360 samples in total. The samples were not subjected to any additional processing. The data set of each variety was divided into training and test sets at a ratio of 4:1 (80:20), i.e., 24 training and 6 test samples per variety.
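The per-variety 4:1 split can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `stratified_split`, the random seed, and the use of integer sample indices are assumptions made for the example and are not taken from the study.

```python
import random

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split sample indices class by class so every variety keeps the same ratio."""
    rng = random.Random(seed)  # fixed seed (assumption) so the split is repeatable
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label][:]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return train, test

# 12 varieties coded 1..12 with 30 samples each (360 samples in total).
labels = [variety for variety in range(1, 13) for _ in range(30)]
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 288 72
```

Splitting within each variety, rather than over the pooled 360 samples, keeps exactly 24 training and 6 test samples for every class.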

2.2. Hyperspectral Image Acquisition and Correction

The hyperspectral images of Fritillaria thunbergii were obtained using a near-infrared HSI system. The system consisted of an imaging spectrograph (ImSpector N17E; Spectral Imaging Ltd., Oulu, Finland) with a spectral range of 874–1734 nm, a high-performance camera (OLES22) with 326 × 256 (spatial × spectral) pixels, two 150 W tungsten halogen lamps (3900e Light source; Illumination Technologies Inc.; West Elbridge, NY, USA) for illumination, and a conveyor belt driven by a stepper motor to move the samples. To produce clear, undistorted hyperspectral images, the distance between the lens and the conveyor belt, the camera exposure time, and the conveyor speed were set to 25 cm, 4 ms, and 19.5 mm/s, respectively. The acquired hyperspectral images of Fritillaria had 256 spectral channels and a spectral resolution of 5 nm. The raw hyperspectral images were corrected with white and dark reference images using Equation (1) to reduce the effect of dark current and to convert the raw intensities to reflectance.

I_{C} = \frac{I_{raw} - I_{dark}}{I_{white} - I_{dark}}

(1)

where I_C is the corrected image, I_raw is the raw hyperspectral image, I_white is the hyperspectral image of a white Teflon tile with nearly 100% reflectance, and I_dark is acquired by covering the camera lens with its opaque cap. I_raw, I_white, and I_dark are obtained under the same conditions during sample collection.
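As a small worked example of Equation (1), the sketch below applies the correction band by band to a single pixel spectrum. The function name `calibrate`, the zero-denominator guard, and the intensity values are illustrative assumptions, not values from the study.

```python
def calibrate(raw, white, dark, eps=1e-12):
    """Apply I_C = (I_raw - I_dark) / (I_white - I_dark) band by band."""
    corrected = []
    for r, w, d in zip(raw, white, dark):
        denominator = w - d
        # Guard (assumption): return 0.0 where the white and dark references coincide.
        corrected.append((r - d) / denominator if abs(denominator) > eps else 0.0)
    return corrected

# Hypothetical raw, white-reference, and dark-reference intensities for three bands.
raw = [1200.0, 2100.0, 3050.0]
white = [4000.0, 4100.0, 4050.0]
dark = [200.0, 100.0, 50.0]
reflectance = calibrate(raw, white, dark)
print([round(value, 4) for value in reflectance])  # [0.2632, 0.5, 0.75]
```

Because the dark reference is subtracted from both the numerator and the denominator, the result is a relative reflectance that is 0 for the dark reference and 1 for the white tile.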

2.3. Pretreatment and Extraction of Spectra

Before spectral analysis, each Fritillaria sample must be segmented from its black background. To obtain a binary mask, threshold segmentation was performed on the grayscale image at 1019 nm, which showed the maximum contrast between sample regions and the background. The grayscale images at the other wavelengths were then masked with this binary mask. Figure 1 illustrates both a binary and a raw color image. Instabilities in the hyperspectral imaging system produced random noise at the beginning and end of the spectral range, so only the middle range of 974–1634 nm was extracted from the region of interest (ROI) of each Fritillaria sample and analyzed. The pixel-wise spectra were smoothed using a wavelet transform (WT) with a decomposition scale of 3 and a wavelet basis function of 6. This reduction in spectral noise improved the signal-to-noise ratio. The pixel-wise spectra of each ROI were used to discriminate the different Fritillaria samples.
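The masking step can be illustrated with a toy example. The sketch below thresholds one high-contrast band to build a binary mask and then collects the spectra of the masked pixels; averaging them into one spectrum is shown only for illustration, since the study works with pixel-wise spectra. The threshold value, the function names, and the 2 × 2 × 3 toy cube are assumptions.

```python
def binary_mask(band_image, threshold):
    """Mark pixels brighter than the threshold as sample (True), the rest as background."""
    return [[value > threshold for value in row] for row in band_image]

def mean_roi_spectrum(cube, mask):
    """Average the spectra of all masked pixels; cube is indexed as [row][column][band]."""
    n_bands = len(cube[0][0])
    totals = [0.0] * n_bands
    count = 0
    for row, mask_row in zip(cube, mask):
        for spectrum, inside in zip(row, mask_row):
            if inside:
                count += 1
                for band, value in enumerate(spectrum):
                    totals[band] += value
    if count == 0:
        raise ValueError("mask selects no pixels")
    return [total / count for total in totals]

# Toy 2 x 2 image with 3 bands; band index 1 plays the role of the high-contrast band.
cube = [
    [[0.02, 0.03, 0.02], [0.40, 0.60, 0.50]],
    [[0.44, 0.64, 0.54], [0.01, 0.02, 0.03]],
]
contrast_band = [[pixel[1] for pixel in row] for row in cube]
mask = binary_mask(contrast_band, threshold=0.3)  # hypothetical threshold
spectrum = mean_roi_spectrum(cube, mask)
```

The same mask is reused for every band, which is why a single well-chosen band is enough to separate the sample from the dark background.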

2.4. Software

The Fritillaria samples in the hyperspectral images were cropped from irrelevant backgrounds using ENVI 4.6 (ITT Visual Information Solutions, Boulder, CO, USA). Spectra were extracted and pre-processed using MATLAB R2018a (The MathWorks, Natick, MA, USA), which was also used to implement PCA for pattern recognition between different varieties. The Python-based discriminant models, including SVM, PLS-DA, and CNN, were implemented in Spyder 3.2.6 (Anaconda, Austin, TX, USA) with scikit-learn (http://scikit-learn.org/stable/, accessed on 22 August 2022) and PyTorch (Facebook, Menlo Park, CA, USA). All software was run on a computer with an Intel(R) Core(TM) i5-8500 processor at 3.00 GHz and 8 GB of RAM.

2.5. Analysis of Chemometrics

2.5.1. CNN

Figure 2 illustrates the convolutional neural network (CNN) used in this study. The architecture was adapted from VGGNet [20] to accept one-dimensional spectra as input. Patterns in spectral curves resemble those in images: the peaks and valleys of a spectral curve are analogous to the edges of an image. VGGNet was chosen because of its high performance in image classification and because it is easy to modify and extend. The architecture comprises five main blocks (Figure 2). In each convolutional block, two convolutional layers are followed by a max-pooling layer. Deeper blocks have more convolutional filters (starting at 16 and ending at 128). The convolutional layers use a kernel size of 3, a stride of 1, and a padding of 1. A convolutional layer learns local patterns from its input through local connections. Chaining convolutional layers connects deeper layers to a larger portion of the input data, so that successive layers learn features at different levels from the raw input.
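To make the effect of these settings concrete, the following sketch traces the feature-map size through four convolutional blocks. It assumes that each block stacks two convolutions (kernel 3, stride 1, padding 1) and one pooling layer with pool size 2, that the filter count doubles from 16 to 128, and that the input has 200 bands; the pool size, the intermediate filter counts, and the input length are assumptions, not values reported in the text.

```python
def conv1d_length(length, kernel=3, stride=1, padding=1):
    """Output length of a 1D convolution."""
    return (length + 2 * padding - kernel) // stride + 1

def pool1d_length(length, kernel=2, stride=2):
    """Output length of a 1D pooling layer (pool size 2 is an assumption)."""
    return (length - kernel) // stride + 1

def trace_shapes(n_bands, filters=(16, 32, 64, 128), convs_per_block=2):
    """Return (channels, length) after each convolutional block."""
    shapes = []
    length = n_bands
    for channels in filters:
        for _ in range(convs_per_block):
            length = conv1d_length(length)  # kernel 3 with padding 1 keeps the length
        length = pool1d_length(length)      # pooling halves the length
        shapes.append((channels, length))
    return shapes

shapes = trace_shapes(n_bands=200)  # 200 bands is a hypothetical input length
for block, (channels, length) in enumerate(shapes, start=1):
    print(f"block {block}: {channels} channels x {length} points")
```

With a kernel of 3, a stride of 1, and a padding of 1, the convolutions leave the spectrum length unchanged, so only the pooling layers shrink it while the number of channels grows.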

The last block is a fully connected block (FC block), which learns combinations of the features obtained from the convolutional layers. It has two kinds of layers: dense and dropout layers [21]. The original VGGNet architecture used the rectified linear unit (ReLU) as its activation function. The exponential linear unit (ELU) has been shown to accelerate learning and to outperform ReLU in some cases [22], and ELU activation combined with batch normalization has performed better than ReLU activation [23]. Therefore, ELU was used in this architecture. The ELU function is defined as follows.

f(x) = \begin{cases} x & \text{if } x > 0 \\ \alpha (\exp(x) - 1) & \text{if } x \leq 0 \end{cases}

(2)
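Equation (2) can be written directly as a scalar function. The sketch below is illustrative only; the `relu` helper is included purely for comparison, and a scale of 1.0 is used as the default.

```python
import math

def elu(x, alpha=1.0):
    """Exponential linear unit: identity for x > 0, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def relu(x):
    """Rectified linear unit, shown only for comparison."""
    return max(0.0, x)

for value in (-3.0, -1.0, 0.0, 2.0):
    print(value, round(elu(value), 4), relu(value))
```

Unlike ReLU, ELU returns small negative values for negative inputs and saturates at -alpha, so its output is not cut off at zero.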

The CNN output is converted into classification confidence scores in the range [0, 1]. A classification loss is calculated from the samples’ confidence scores and their actual labels. The softmax and loss functions are defined as follows.

p_{ij} = \frac{e^{Z_{ij}}}{\sum_{k=1}^{K} e^{Z_{ik}}}, \quad j = 1, \ldots, K

(3)

Loss = -\sum_{i} \sum_{j} label_{ij} \log(p_{ij})

(4)

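where Z represents the input to the softmax function (the output of the last CNN layer), p_ij is the predicted probability that sample i belongs to class j, label_ij equals 1 if sample i belongs to class j and 0 otherwise, i indexes the samples, j indexes the classes, and K is the number of classes.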
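Equations (3) and (4) can be checked numerically with a few lines of plain Python. This is a sketch, not the study's implementation: labels are given as class indices (equivalent to one-hot labels), and subtracting the maximum score before exponentiation is a standard numerical-stability step added here as an assumption.

```python
import math

def softmax(scores):
    """Equation (3): turn the K scores of one sample into probabilities."""
    shift = max(scores)  # stability shift; does not change the result
    exps = [math.exp(z - shift) for z in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(batch_scores, batch_labels):
    """Equation (4) with one-hot labels: sum over samples of -log p of the true class."""
    loss = 0.0
    for scores, label in zip(batch_scores, batch_labels):
        probabilities = softmax(scores)
        loss -= math.log(probabilities[label])
    return loss

probs = softmax([2.0, 1.0, 0.1])
loss = cross_entropy([[2.0, 1.0, 0.1], [0.0, 0.0, 0.0]], [0, 1])
```

With one-hot labels, the inner sum over classes in Equation (4) reduces to a single term per sample, which is why the code only looks up the probability of the true class.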

Before CNN training, the data were standardized by subtracting the mean and dividing by the standard deviation. The means and standard deviations were calculated on the training data and then applied to the test data. The weights of the CNN were initialized following the procedure described in [21]. The softmax cross-entropy loss was minimized with the Adam algorithm [24]. The learning rate (η) was gradually decreased during training according to the following equation.

\eta = \frac{\eta_{0}}{1 + kt}

(5)

where η_0 is the initial learning rate, t is the number of epochs completed, and k controls how quickly the learning rate decreases.

A grid search was conducted to find the best combination of hyperparameters. The batch size was set to 256, the dropout ratio to 0.5, and the ELU α to 1.0. The CNN was trained for 800 epochs with η_0 = 0.0005 and k = 0.045.
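The decay in Equation (5) can be tabulated for the reported settings (initial learning rate 0.0005, k = 0.045, 800 epochs). The sketch assumes that t counts completed epochs starting from 0; the function name is illustrative.

```python
def learning_rate(epoch, eta0=0.0005, k=0.045):
    """Equation (5): inverse-time decay of the learning rate."""
    return eta0 / (1.0 + k * epoch)

schedule = [learning_rate(t) for t in range(800)]
for t in (0, 100, 400, 799):
    print(t, f"{schedule[t]:.3e}")
```

Under this assumption, the learning rate falls to roughly one thirty-seventh of its initial value by the end of training, since 1 + 0.045 × 800 = 37.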

2.5.2. PLS-DA

PLS-DA is a supervised technique that seeks the maximum discrimination between classes of samples [25]. The PLS-DA model was optimized with leave-one-out cross-validation. Discrimination accuracy in both the training and test sets was determined from the absolute difference between the actual class number and the predicted value.

2.5.3. SVM

SVM is an effective pattern recognition method built on Vapnik-Chervonenkis dimension theory and the structural risk minimization principle [12,26]. Unlike neural networks, SVM training converges to a global minimum and requires fewer training samples. The radial basis function (RBF) was used as the kernel function. A grid-search procedure was used to determine the penalty parameter (c) and the kernel function parameter (g).

2.5.4. Discrimination Models Accuracy Evaluation

The well-known F-score was used to assess the discrimination accuracy of the models [27,28,29,30,31,32,33]. This metric measures the quality of the variety discrimination relative to the reference classifications and combines the precision and recall values. Precision, recall, and F-score are defined as follows, and their values are presented in Table 2.

Precision = \frac{True positives}{True positives + False positives}

(6)

Recall = \frac{True positives}{True positives + False negatives}

(7)

F - Score = 2 \times \frac{Precision \times Recall}{Precision + Recall}

(8)
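As a worked example of these three metrics, the sketch below derives per-class precision, recall, and F-score from a confusion matrix. Recall uses false negatives in its denominator, following the standard definition. The 3 × 3 matrix is invented for illustration and is not taken from Table 2 or Figure 5.

```python
def per_class_scores(confusion):
    """confusion[i][j] counts samples of true class i that were predicted as class j."""
    n = len(confusion)
    scores = []
    for c in range(n):
        tp = confusion[c][c]
        fp = sum(confusion[r][c] for r in range(n)) - tp  # other classes predicted as c
        fn = sum(confusion[c]) - tp                       # class c predicted as others
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denominator = precision + recall
        f_score = 2 * precision * recall / denominator if denominator else 0.0
        scores.append((precision, recall, f_score))
    return scores

# Hypothetical confusion matrix for three classes with six test samples each.
confusion = [
    [5, 1, 0],
    [0, 6, 0],
    [1, 0, 5],
]
scores = per_class_scores(confusion)
for label, (p, r, f) in enumerate(scores, start=1):
    print(label, round(p, 4), round(r, 4), round(f, 4))
```

Precision is read down a column of the confusion matrix and recall along a row, so a class can score high on one and lower on the other.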

3. Results and Discussion

3.1. Spectral Features

Figure 3 shows that the valley and peak positions of the average spectra of the twelve (12) varieties were similar. Although the Fritillaria spectra were similar overall, slight differences were observed, which arise from the differing chemical and biological properties of the twelve varieties. The peaks of the spectral curves at 1100 and 1300 nm and the valleys at 1200 and 1460 nm could be used to discriminate among Fritillaria varieties. The features around 1200 nm are attributed to the second overtone of C-H stretching [16,34], and the valley at 1460 nm is associated with the first overtone of O-H stretching [9,34]. The spectral curves of varieties 5, 6, 7, 8, 9, 10, and 11 overlap strongly in the range of 950–1200 nm, indicating that the chemical compositions of these varieties are similar. The wavelengths between 1100 and 1300 nm might be associated with the second overtone of the C-H stretch [35]. Combination bands of C-H vibrations may be responsible for the wavelengths between 1300 and 1400 nm [36]. The wavelengths between 1400 and 1450 nm might be ascribed to water bands [36]. The wavelength around 1480 nm has been attributed to an overtone of the O-H stretch [37], and the wavelength at 1500 nm to CH2 stretching and non-stretching [38]. An overtone of the N-H stretch might account for the wavelengths between 1500 and 1530 nm [39], and an overtone of C-H stretching for the wavelength around 1610 nm [40]. The wavelength around 1630 nm has been attributed to the aromatic C-H band [11]. These wavelengths, which carry the category information, are closely related to the differences in chemical composition among Fritillaria varieties.

3.2. PCA

Principal component analysis (PCA) is commonly used for spectral data analysis, often as a qualitative method. PCA derives principal components (PCs) as linear combinations of the original variables. The PCs are mutually orthogonal and ordered by the amount of variance they explain: the first PC explains the largest share of the variance, followed by the second, the third, and so on. Generally, the first few PCs explain most of the variance [30,41]. This study used PCA to compare the twelve (12) varieties of Fritillaria. For the PCA, hyperspectral images of each variety were randomly selected from the test set. The first and second PCs reflect approximately 78.77% and 17.35% of the information in the original spectral data, respectively.

Together, the first two PCs accounted for 96.12% of the variance and thus contain virtually all of the spectral information. Figure 3 shows the mean spectra of the 12 varieties of Fritillaria; the reflectance curves resembled those reported for Fritillaria in [6,42]. As shown in Figure 4, there is little overlap between the twelve varieties, and the Fritillaria samples were roughly separated. According to the PCA, different varieties have different chemical compositions. Although a cluster trend could be observed in two dimensions, the samples could not be clearly distinguished. Consequently, discriminant algorithms were employed in this study [30,43,44].

3.3. CNN

A CNN was used as the discriminant model to classify the Fritillaria samples, and SVM and PLS-DA were introduced as comparison methods. Machine learning methods are needed to interpret the spectral data obtained with spectral imaging and other spectroscopy techniques. Deep learning is a major focus of artificial intelligence, and convolutional neural networks are among the most popular deep learning models. Deep learning methods are typically applied to two-dimensional images [17]. This study found that CNN can also perform well when applied to one-dimensional spectra: the CNN model outperformed the SVM and PLS-DA models. CNN therefore offers a new approach to handling spectral data. Varieties of Fritillaria may differ greatly in chemical composition due to environmental influences, cultivation management, and other factors. Qiu et al. [17] investigated whether hyperspectral imaging and convolutional neural networks can be used to distinguish rice seed varieties. Four rice seed varieties were imaged using hyperspectral imaging in the 380–1030 nm and 874–1734 nm spectral regions. In general, the CNN models outperformed the SVM and KNN models, showing the effectiveness of CNN in analyzing spectral data. Acquarelli et al. [45] proposed a CNN structure for analyzing vibrational spectroscopy data and employed PLS-DA, logistic regression, and KNN as comparison methods. The CNN model performed well; although it generally outperformed the other models, it was not the best option in every case. Liu et al. [46] classified pre-processed and non-pre-processed Raman spectra using a CNN architecture, which performed better than models based on KNN, SVM, gradient boosting, random forest, and correlation analysis. Together with the results presented in this paper, these studies indicate that CNN can be used to analyze one-dimensional spectral data.

The effect of the number of training samples on the results was also examined. In general, the performance of machine learning methods improves as the number of training samples increases. Models trained on too few samples do not perform well on the test set. Because of information redundancy within the training samples, however, the improvement may no longer be significant beyond a certain point. Additionally, collecting samples may take considerable time [17]. It is, therefore, important to strike a balance between model performance and cost. With more training samples, CNN outperformed the SVM and PLS-DA models. A deep learning method can learn features automatically, and more samples may enable a deeper exploration of potential feature combinations. In practical applications, models capable of identifying more Fritillaria varieties should be developed. To reach high model performance at a reasonable cost, it is advisable to keep a hold-out test set and gradually add training samples until the test accuracy no longer changes significantly.

3.4. PLS-DA

The confusion matrices of the various models for the 12 varieties of Fritillaria are shown in Figure 5. The results for individual varieties differed from model to model. As the number of Fritillaria varieties increased, the performance of the PLS-DA model decreased. PLS-DA is an efficient approach for linear classification [47]. In seed classification, neural networks and nonlinear models, such as support vector machines, have outperformed linear models [17,48,49,50]. The CNN discriminant model achieved recognition accuracies of 96.88% and 88.89% in the training and test sets, respectively; its test accuracy was superior to that achieved by SVM and PLS-DA for the twelve (12) varieties of Fritillaria thunbergii. The spectral data contained many deep features, which the CNN extracted effectively. The deep architecture of deep learning models allows more abstract and invariant features to be extracted from the data, resulting in higher performance than traditional shallow classifiers [51]. Figure 5 shows that the CNN and PLS-DA discriminant models maintained a relatively high level of performance compared with SVM.

3.5. SVM

SVM is a machine learning algorithm that classifies complex data sets using nonlinear decision boundaries [26]. As the number of Fritillaria varieties increased, the discriminant performance of the SVM decreased; its accuracy was 99.65% for the training set and 79.17% for the test set. Zhao et al. classified maize seeds using a radial basis function neural network, with calibration and prediction accuracies of 98.03% and 93.26%, respectively [13]. The large gap between training and test accuracy indicates that the SVM-based discriminant model was over-fitted. Overfitting of discriminant models has been reported with NIR hyperspectral imaging data [12,17,52]. A grid-search procedure was used to determine the penalty parameter (c), the kernel function parameter (g), and the best component, as shown in Figure 6. Discriminant models can be improved by using dropout and batch normalization [10]. Deep learning algorithms provide superior models for discriminant analysis [17,20,52]. Compared with the SVM and PLS-DA models, the CNN model performed considerably better as the number of Fritillaria varieties increased. In addition, the CNN-based discriminant model was optimized to minimize overfitting. The CNN can therefore be used to classify Fritillaria thunbergii varieties with hyperspectral imaging technology. Additional Fritillaria varieties need to be collected in order to develop an instrument for identifying Fritillaria varieties. Furthermore, comprehensive research will be required in the future to assess the quality of Fritillaria.

4. Conclusions

This study demonstrated the capability of hyperspectral imaging technology, combined with machine learning algorithms, to discriminate Fritillaria varieties. Three algorithms were used to classify the varieties. Each model was optimized through cross-validation (CV) by selecting the settings with the highest classification rate. The CNN model outperformed the SVM and PLS-DA models, with F-scores of 89.38% for CNN, 79.63% for SVM, and 82.63% for PLS-DA. Little research has examined deep learning algorithms for the classification of traditional Chinese medicine using HSI, which makes this study a useful contribution to the field. In the future, more robust models capable of handling regional and temporal variations need to be developed, and a large data set representing a wide range of variability (such as geographical origin, harvest period, and harvest year) should be collected. The results of the present study show that HSI in conjunction with deep learning can be used to classify Fritillaria thunbergii varieties rapidly and non-destructively.

Author Contributions

Conceptualization, F.L. and M.H.K.; data curation, M.H.K., M.L.G., R.C. and X.L.; formal analysis, M.H.K.; funding acquisition, F.L. and W.K.; investigation, M.H.K.; methodology, M.H.K. and M.L.G.; project administration, F.L.; resources, F.L. and W.K.; software, M.H.K. and M.L.G.; validation, F.L.; visualization, F.L.; writing—original draft, M.H.K.; writing—review and editing, F.L., M.H.K., M.L.G., R.C., X.L. and W.K. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the Science and Technology Department of Zhejiang Province grant number 2021C02023.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy.

Conflicts of Interest

The authors declare no conflict of interest.

Sample Availability

Not applicable.

References

Hu, W.; Huang, Y.; Wei, L.; Zhang, F.; Li, H. Deep convolutional neural networks for hyperspectral image classification. J. Sens. 2015, 2015, 258619. [Google Scholar] [CrossRef]
Xu, J.; Zhao, W.; Pan, L.; Zhang, A.; Chen, Q.; Xu, K.; Lu, H.; Chen, Y. Peimine, a main active ingredient of fritillaria, exhibits anti-inflammatory and pain suppression properties at the cellular level. Fitoterapia 2016, 111, 1–6. [Google Scholar] [CrossRef] [PubMed]
Lorente, D.; Aleixos, N.; Gómez-Sanchis, J.; Cubero, S.; García-Navarrete, O.L.; Blasco, J. Recent advances and applications of hyperspectral imaging for fruit and vegetable quality assessment. Food Bioprocess Technol. 2012, 5, 1121–1142. [Google Scholar] [CrossRef]
Manley, M.; du Toit, G.; Geladi, P. Tracking diffusion of conditioning water in single wheat kernels of different hardnesses by near infrared hyperspectral imaging. Anal. Chim. Acta 2011, 686, 64–75. [Google Scholar] [CrossRef] [PubMed]
Zhang, H.; Wu, T.; Zhang, L.; Zhang, P. Development of a portable field imaging spectrometer: Application for the identification of sun-dried and sulfur-fumigated chinese herbals. Appl. Spectrosc. 2016, 70, 879–887. [Google Scholar] [CrossRef] [PubMed]
He, J.; He, Y.; Zhang, A.C. Determination and visualization of peimine and peiminine content in Fritillaria thunbergii bulbi treated by sulfur fumigation using hyperspectral imaging with chemometrics. Molecules 2017, 22, 1402. [Google Scholar] [CrossRef] [PubMed]
Zhu, S.; Chao, M.; Zhang, J.; Xu, X.; Song, P.; Zhang, J.; Huang, Z. Identification of soybean seed varieties based on hyperspectral imaging technology. Sensors 2019, 19, 5225. [Google Scholar] [CrossRef]
Zhu, S.; Zhang, J.; Chao, M.; Xu, X.; Song, P.; Zhang, J.; Huang, Z. A Rapid and highly efficient method for the identification of soybean seed varieties: Hyperspectral images combined with transfer learning. Molecules 2019, 25, 152. [Google Scholar] [CrossRef]
Wu, N.; Zhang, C.; Bai, X.; Du, X.; He, Y. Discrimination of chrysanthemum varieties using hyperspectral imaging combined with a deep convolutional neural network. Molecules 2018, 23, 2831. [Google Scholar] [CrossRef]
Nie, P.; Zhang, J.; Feng, X.; Yu, C.; He, Y. Classification of hybrid seeds using near-infrared hyperspectral imaging technology combined with deep learning. Sens. Actuators B Chem. 2019, 296, 126630. [Google Scholar] [CrossRef]
Zhang, C.; Liu, F.; He, Y. Identification of coffee bean varieties using hyperspectral imaging: Influence of preprocessing methods and pixel-wise spectra analysis. Sci. Rep. 2018, 8, 82166. [Google Scholar] [CrossRef] [PubMed]
Yin, W.; Zhang, C.; Zhu, H.; Zhao, Y.; He, Y. Application of near-infrared hyperspectral imaging to discriminate different geographical origins of Chinese wolfberries. PLoS ONE 2017, 12, e0180534. [Google Scholar] [CrossRef] [PubMed]
Zhao, Y.; Zhu, S.; Zhang, C.; Feng, X.; Feng, L.; He, Y. Application of hyperspectral imaging and chemometrics for variety classification of maize seeds. RSC Adv. 2018, 8, 1337–1345. [Google Scholar] [CrossRef] [PubMed]
Yu, Z.; Fang, H.; Zhangjin, Q.; Mi, C.; Feng, X.; He, Y. Hyperspectral imaging technology combined with deep learning for hybrid okra seed identification. Biosyst. Eng. 2021, 212, 46–61. [Google Scholar] [CrossRef]
Carreiro Soares, S.F.; Medeiros, E.P.; Pasquini, C.; de Lelis Morello, C.; Harrop Galvão, R.K.; Ugulino Araújo, M.C. Classification of individual cotton seeds with respect to variety using near-infrared hyperspectral imaging. Anal. Methods 2016, 8, 8498–8505. [Google Scholar] [CrossRef]
Feng, X.; Peng, C.; Chen, Y.; Liu, X.; Feng, X.; He, Y. Discrimination of CRISPR/Cas9-induced mutants of rice seeds using near-infrared hyperspectral imaging. Sci. Rep. 2017, 7, 15934. [Google Scholar] [CrossRef]
Qiu, Z.; Chen, J.; Zhao, Y.; Zhu, S.; He, Y.; Zhang, C. Variety identification of single Rice seed using hyperspectral imaging combined with convolutional neural network. Appl. Sci. 2018, 8, 212. [Google Scholar] [CrossRef]
Jordan, M.I.; Mitchell, T.M. Machine learning: Trends, perspectives, and prospects. Science 2015, 349, 255–260. [Google Scholar] [CrossRef]
Jin, B.; Zhang, C.; Jia, L.; Tang, Q.; Gao, L.; Zhao, G.; Qi, H. Identification of rice seed varieties based on near-infrared hyperspectral imaging technology combined with deep learning. ACS Omega 2022, 7, 4735–4749. [Google Scholar] [CrossRef]
Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2015, arXiv:1409.1556. [Google Scholar]
He, K.; Zhang, X.; Ren, S.; Sun, J. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1026–1034. [Google Scholar]
Clevert, D.-A.; Unterthiner, T.; Hochreiter, S. Fast and accurate deep network learning by exponential linear units (ELUs). arXiv 2016, arXiv:1511.07289. [Google Scholar]
Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015. [Google Scholar]
Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
Almeida, M.R.; Fidelis, C.H.V.; Barata, L.E.S.; Poppi, R.J. Classification of Amazonian rosewood essential oil by Raman spectroscopy and PLS-DA with reliability estimation. Talanta 2013, 117, 305–311. [Google Scholar] [CrossRef]
Burges, C.J.C. A tutorial on support vector machines for pattern recognition. Data Min. Knowl. Discov. 1998, 2, 121–167. [Google Scholar] [CrossRef]
Pandiselvam, R.; Mahanti, N.K.; Manikantan, M.R.; Kothakota, A.; Chakraborty, S.K.; Ramesh, S.V.; Beegum, P.P.S. Rapid detection of adulteration in desiccated coconut powder: Vis-NIR spectroscopy and chemometric approach. Food Control 2022, 133, 108588. [Google Scholar] [CrossRef]
Mahanti, N.K.; Chakraborty, S.K. Application of chemometrics to identify artificial ripening in sapota (Manilkara zapota) using visible near infrared absorbance spectra. Comput. Electron. Agric. 2020, 175, 105539. [Google Scholar] [CrossRef]
Chakraborty, S.K.; Mahanti, N.K.; Mansuri, S.M.; Tripathi, M.K.; Kotwaliwale, N.; Jayas, D.S. Non-destructive classification and prediction of aflatoxin-B1 concentration in maize kernels using Vis–NIR (400–1000 nm) hyperspectral imaging. J. Food Sci. Technol. 2021, 58, 437–450. [Google Scholar] [CrossRef]
Kabir, M.H.; Guindo, M.L.; Chen, R.; Liu, F. Geographic origin discrimination of millet using Vis-NIR spectroscopy combined with machine learning techniques. Foods 2021, 10, 2767. [Google Scholar] [CrossRef]
Visentini, I.; Snidaro, L.; Foresti, G.L. Diversity-aware classifier ensemble selection via f-score. Inf. Fusion 2016, 28, 24–43. [Google Scholar] [CrossRef]
Kim, S.-W.; Gil, J.-M. Research paper classification systems based on TF-IDF and LDA schemes. Hum.-Cent. Comput. Inf. Sci. 2019, 9, 30. [Google Scholar] [CrossRef]
Barbosa, R.M.; de Paula, E.S.; Paulelli, A.C.; Moore, A.F.; Souza, J.M.O.; Batista, B.L.; Campiglia, A.D.; Barbosa, F. Recognition of organic rice samples based on trace elements and support vector machines. J. Food Compos. Anal. 2016, 45, 95–100. [Google Scholar] [CrossRef]
Serranti, S.; Cesare, D.; Marini, F.; Bonifazi, G. Classification of oat and groat kernels using NIR hyperspectral imaging. Talanta 2013, 103, 276–284. [Google Scholar] [CrossRef] [PubMed]
Liu, Y.; Chen, Y.R. Two-dimensional visible/near-infrared correlation spectroscopy study of thawing behavior of frozen chicken meats without exposure to air. Meat Sci. 2001, 57, 299–310. [Google Scholar] [CrossRef]
Vance, C.K.; Tolleson, D.R.; Kinoshita, K.; Rodriguez, J.; Foley, W.J. Near infrared spectroscopy in wildlife and biodiversity. J. Near Infrared Spectrosc. 2016, 24, 1–25. [Google Scholar] [CrossRef]
Restaino, E.; Fassio, A.; Cozzolino, D. Discrimination of meat patés according to the animal species by means of near infrared spectroscopy and chemometrics. CyTA-J. Food 2011, 9, 210–213. [Google Scholar]
Wilson, R.H.; Nadeau, K.P.; Jaworski, F.B.; Tromberg, B.J.; Durkin, A.J. Review of short-wave infrared spectroscopy and imaging methods for biological tissue characterization. J. Biomed. Opt. 2015, 20, 030901. [Google Scholar] [CrossRef]
Ribeiro, J.S.; Ferreira, M.M.C.; Salva, T.J.G. Chemometric models for the quantitative descriptive sensory analysis of Arabica coffee beverages using near infrared spectroscopy. Talanta 2011, 83, 1352–1358. [Google Scholar] [CrossRef]
Monrroy, M.; Gutiérrez, D.; Miranda, M.; Hernández, K.; Garcia, J. Determination of Brachiaria spp. forage quality by near-infrared spectroscopy and partial least squares regression. J. Chil. Chem. Soc. 2017, 62, 3472–3477. [Google Scholar] [CrossRef]
Luna, A.S.; da Silva, A.P.; Pinho, J.S.A.; Ferré, J.; Boqué, R. Rapid characterization of transgenic and non-transgenic soybean oils by chemometric methods using NIR spectroscopy. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2013, 100, 115–119. [Google Scholar] [CrossRef] [PubMed]
He, J.; Zhang, C.; He, Y. Application of near-infrared hyperspectral imaging to detect sulfur dioxide residual in the Fritillaria thunbergii bulbus treated by sulfur fumigation. Appl. Sci. 2017, 7, 77. [Google Scholar] [CrossRef]
Xu, L.; Sun, W.; Wu, C.; Ma, Y.; Chao, Z. Discrimination of Trichosanthis Fructus from different geographical origins using near infrared spectroscopy coupled with chemometric techniques. Molecules 2019, 24, 1550. [Google Scholar] [CrossRef] [PubMed]
Gok, S.; Severcan, M.; Goormaghtigh, E.; Kandemir, I.; Severcan, F. Differentiation of Anatolian honey samples from different botanical origins by ATR-FTIR spectroscopy using multivariate analysis. Food Chem. 2015, 170, 234–240. [Google Scholar] [CrossRef] [PubMed]
Acquarelli, J.; van Laarhoven, T.; Gerretzen, J.; Tran, T.N.; Buydens, L.M.C.; Marchiori, E. Convolutional neural networks for vibrational spectroscopic data analysis. Anal. Chim. Acta 2017, 954, 22–31. [Google Scholar] [CrossRef] [PubMed]
Liu, J.; Osadchy, M.; Ashton, L.; Foster, M.; Solomon, C.J.; Gibson, S.J. Deep convolutional neural networks for Raman spectrum recognition: A unified solution. Analyst 2017, 142, 4067–4074. [Google Scholar] [CrossRef]
Ballabio, D.; Consonni, V. Classification tools in chemistry. Part 1: Linear models. PLS-DA. Anal. Methods 2013, 5, 3790–3798. [Google Scholar] [CrossRef]
Yang, X.; Hong, H.; You, Z.; Cheng, F. Spectral and image integrated analysis of hyperspectral data for waxy corn seed variety classification. Sensors 2015, 15, 15578–15594. [Google Scholar] [CrossRef]
Kong, W.; Zhang, C.; Liu, F.; Nie, P.; He, Y. Rice seed cultivar identification using near-infrared hyperspectral imaging and multivariate data analysis. Sensors 2013, 13, 8916–8927. [Google Scholar] [CrossRef]
Wakholi, C.; Kandpal, L.M.; Lee, H.; Bae, H.; Park, E.; Kim, M.S.; Mo, C.; Lee, W.H.; Cho, B.-K. Rapid assessment of corn seed viability using short wave infrared line-scan hyperspectral imaging and chemometrics. Sens. Actuators B Chem. 2018, 255, 498–507. [Google Scholar] [CrossRef]
Chen, Y.; Lin, Z.; Zhao, X.; Wang, G.; Gu, Y. Deep learning-based classification of hyperspectral data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 2094–2107. [Google Scholar] [CrossRef]
Kang, X.; Li, C.; Li, S.; Lin, H. Classification of hyperspectral images by Gabor filtering based deep network. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 1166–1178. [Google Scholar] [CrossRef]

Figure 1. (a) Binary image; (b) raw colored image.

Figure 2. A brief overview of the CNN architecture.

Figure 3. The average spectra of Fritillaria samples of twelve (12) varieties.

Figure 4. (a) 2D distribution of the twelve (12) individual varieties; (b) loadings.

Figure 5. Confusion matrices of the different models.

Figure 6. (a) Grid-search result for the optimization of the radial basis function support vector machine classifier (RBF-SVC) model. The best combination of the RBF-SVC parameters is marked by ‘*’ (the red five-pointed star). (b) PLS-DA: best component = 20.4. The green line is the training accuracy and the black line is the validation accuracy.

Table 1. Information on different varieties of Fritillaria samples.

ID.	Variety	State	Origin	Supplier
1	TongRenTang	Flake	Zhejiang, China	Tongrentang (Sichuan) Health Pharmaceutical Co., Ltd.
2	MoYuan	Flake	Zhejiang, China	Anguo MedicineSource Trading Co., Ltd.
3	NiuEnTang	Flake	Zhejiang, China	Hebei NiuEntang Electronic Commerce Co., Ltd.
4	QiGuiTang	Flake	Zhejiang, China	Hebei Lingkang Trading Co., Ltd.
5	ZeXinTang	Flake	Zhejiang, China	Bozhou ZeXinTang Pharmaceutical Co., Ltd.
6	JiaQiTang	Flake	Zhejiang, China	Anguo Guangsheng Trading Co., Ltd.
7	FuXiTang	Flake	Zhejiang, China	Sichuan Haorui Gallium Biotechnology Co., Ltd. (Sichuan)
8	ZangXiTang	Flake	Zhejiang, China	Sichuan Zangxitang Biotechnology Co., Ltd.
9	NanBeiHang	Flake	Zhejiang, China	Guangzhou NanBeiHang Chinese Medicine Herb Co., Ltd.
10	ShenYue	Flake	Zhejiang, China	Tonghua Sanbao Ginseng Antler Trading Co., Ltd.
11	KangMei	Flake	Zhejiang, China	Kangmei Pharmaceutical Co., Ltd. (Guangdong)
12	YiLing	Flake	Zhejiang, China	Shijiazhuang Yiling Herbal Pieces Co., Ltd.

Table 2. Precision, Recall, and F-Score values of the three models.

Models	Data Set	Precision	Recall	F-Score
CNN	Training	0.9705	0.9688	0.9697
CNN	Testing	0.8988	0.8889	0.8938
SVM	Training	0.9967	0.9965	0.9965
SVM	Testing	0.8010	0.7917	0.7963
PLS-DA	Training	0.9267	0.9259	0.9263
PLS-DA	Testing	0.8333	0.8194	0.8263
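
The F-Score column of Table 2 can be cross-checked against the Precision and Recall columns. The sketch below is illustrative only and is not the authors' code: it assumes the F-score is the harmonic mean of precision and recall (the usual F1 definition), and the names `f_score` and `TABLE_2` are hypothetical. The values are copied from Table 2 as fractions. If the reported scores were averaged per class, small differences from the harmonic mean of the averaged precision and recall are to be expected.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1); returns 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (model, data set) -> (precision, recall, reported F-score), copied from Table 2.
TABLE_2 = {
    ("CNN", "Training"): (0.9705, 0.9688, 0.9697),
    ("CNN", "Testing"): (0.8988, 0.8889, 0.8938),
    ("SVM", "Training"): (0.9967, 0.9965, 0.9965),
    ("SVM", "Testing"): (0.8010, 0.7917, 0.7963),
    ("PLS-DA", "Training"): (0.9267, 0.9259, 0.9263),
    ("PLS-DA", "Testing"): (0.8333, 0.8194, 0.8263),
}

if __name__ == "__main__":
    # Print the recomputed value next to the reported one for each row.
    for (model, data_set), (p, r, reported) in TABLE_2.items():
        computed = f_score(p, r)
        print(f"{model:7s} {data_set:9s} computed={computed:.4f} reported={reported:.4f}")
```

Under this assumption, every recomputed value agrees with the reported F-score to within rounding of the four-decimal inputs.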

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

