Cluster analysis of polymers using laser-induced breakdown spectroscopy with K-means

Yangmin GUO; Yun TANG; Yu DU; Shisong TANG; Lianbo GUO; Xiangyou LI; Yongfeng LU; Xiaoyan ZENG

doi:10.1088/2058-6272/aaaade

1. Introduction

The rapid development of the economy and industry has greatly increased the consumption of polymers (plastics) [1]. Recycling waste industrial polymers has become an urgent issue, both to reduce environmental pollution and to protect precious resources such as air, soil, and water [2]. Classification is a crucial stage in the recycling of waste polymers. Several techniques, such as near-infrared (NIR) spectroscopy [3], x-ray fluorescence (XRF) spectroscopy [4], and Raman spectroscopy [5], have been investigated for the classification of polymers. However, each of these methods has limitations. NIR has been applied extensively in polymer recycling, but it cannot identify black polymers colored with carbon black pigments [6]. Raman spectroscopy is susceptible to fluorescence and to the parameters of the optical system. XRF performs poorly in detecting carbon, oxygen, hydrogen, and nitrogen, although these four elements are very important for differentiating polymers. Therefore, classifying a wide range of plastics effectively and efficiently has become a research focus.

Laser-induced breakdown spectroscopy (LIBS) is an effective and promising technique for elemental analysis [7–13]. Owing to its unique features, such as little or no sample preparation, multi-elemental analysis, and rapid online analysis capability, LIBS has been investigated for polymer classification in recent years. Boueri et al combined LIBS with an artificial neural network to identify eight kinds of polymers using 14 spectral lines, including lines of metallic and non-metallic elements [14]. Anzano et al identified six kinds of plastics using LIBS combined with simple statistical correlation methods, and the correct identification rate exceeded 80% [15]. Banaee and Tavassoli applied discriminant function analysis with 10 spectral lines to identify and classify six kinds of polymers, and the classification accuracy reached 99% [16]. Yu et al improved the identification accuracy for eleven kinds of polymers to nearly 100% by adjusting spectral weightings [17]. He et al compared three unsupervised learning methods (hierarchical clustering, K-means, and the Iterative Self-Organizing Data Analysis Technique) in the cluster analysis of four kinds of polymers [18]. Costa et al classified six kinds of polymer e-waste with k-nearest neighbors and soft independent modeling of class analogy, and the average accuracies were 98% and 92%, respectively [19]. The work described above demonstrated that classifying polymers using LIBS with supervised or unsupervised learning methods is feasible. However, some problems remain. Additives such as heavy metals, brominated flame retardants, and pigments are added during the production of industrial products, and these additives usually vary in species and content. Therefore, the classification accuracy will be limited if the characteristic spectral lines of additives are used. Furthermore, the number of plastic types must, in general, be specified in advance, but there is usually no prior knowledge about the dataset.

To address the problems described above, LIBS combined with the unsupervised learning algorithm K-means was applied to the classification of industrial polymers using only four characteristic spectral lines. In this study, the cluster analysis was carried out as an iterative process in which the average relative standard deviation (ARSD) values of the characteristic spectral lines were used as the criterion. The Davies–Bouldin (DB) index was used to determine the initial number of clusters [20].

2. Experiments and methods

2.1. Experimental setup and samples

A schematic diagram of the LIBS setup used in this study is shown in figure 1. A second-harmonic Q-switched Nd:YAG pulsed laser operating at 532 nm (pulse duration 6 ns, repetition rate 10 Hz, pulse energy 40 mJ) was used in ambient air. The laser beam was reflected by a dichroic mirror and focused perpendicularly onto the surface of the polymer samples by a lens with a focal length of 150 mm. The plasma emission was collected by a light collector and coupled into an echelle spectrometer (Andor Tech., Mechelle 5000, spectral range from 200 to 950 nm, resolution λ/Δλ = 5000) equipped with an intensified charge-coupled device camera (Andor Tech., iStar DH-334T). The sample was placed on a computer-controlled platform. The samples used in this study consisted of 20 kinds of industrial polymers, as listed in table 1.

Table 1. Experiment samples and their physical properties.

Samples	Abbr. (color) (#NO.)	Samples	Abbr. (color) (#NO.)
Acrylonitrile butadiene styrene	ABS1 (yellow)(#1)	Polymethyl-methacrylate	PMMA3 (gray)(#11)
	ABS2 (black)(#2)		PMMA4 (red)(#12)
Polyvinylchloride	PVC1 (white)(#3)		PMMA5 (blue^a)(#13)
	PVC2 (black)(#4)	Polycarbonate	PC (colorless^a)(#14)
	PVC3 (colorless^a)(#5)	Polyethylene	PE (beige)(#15)
Nylon	PA1 (beige)(#6)	Polyformaldehyde	POM (white)(#16)
	PA2 (white)(#7)	Polypropylene	PP (milky)(#17)
	PA3 (black)(#8)	Polystyrene	PS (colorless^a)(#18)
Polymethyl-methacrylate	PMMA1 (black)(#9)	Polytetrafluoroethylene	PTFE (white)(#19)
	PMMA2 (colorless^a)(#10)	Polyurethane	PU (yellow^a)(#20)

^aRefers to transparent.

2.2. Experimental method

The atomic spectral lines of carbon, hydrogen, and oxygen were used as analytical lines to optimize the experimental parameters. To obtain high spectral intensity and signal-to-background ratio, the gate delay and gate width were set to 2.5 and 2 μs, respectively. The focal point was placed 1.5 mm below the target surface to prevent air breakdown. To reduce the influence of laser energy fluctuation on the spectral intensity, each measured spectrum was accumulated over 30 laser pulses. For each sample, 50 spectra were obtained at different locations, giving 1000 spectra in total for the twenty kinds of polymers. The cluster analysis was implemented in MATLAB R2014a (MathWorks Corporation, USA).

2.3. Variables used for the analysis

Spectral lines used for analysis should be free of self-absorption and spectral interference. To classify industrial polymers under atmospheric conditions, the characteristic spectral lines C I 247.86 nm, H I 656.3 nm, and O I 777.3 nm, corresponding to major elements in polymers, were chosen. Unlike N I 746.9 nm, the intensity of the molecular band C–N (0, 0) 388.3 nm comes mainly from the nitrogen in the polymers [21]. The C–N (0, 0) 388.3 nm band is therefore more sensitive to the nitrogen content of polymers than N I 746.9 nm, so it was used instead of N I 746.9 nm. Because the metallic elements mainly come from additives, spectral lines of metallic elements were not used for analysis. Four spectral lines were thus used for the analysis, as shown in figure 2. The full-spectrum intensity is an important characteristic of measured materials [22, 23], so it was also used for the analysis. In summary, five variables in total were used for the analysis.
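As a rough illustration of how the five variables might be assembled from one measured spectrum, the Python sketch below reads the four line intensities and adds the full-spectrum intensity. The line positions are those given in the text. The function names, the ±0.2 nm window, the use of the peak value within that window, and the use of a plain sum for the full-spectrum intensity are assumptions made for this example, not details reported in the paper.

```python
# Line centres in nm, as listed in the text (C I, H I, O I, C-N (0, 0)).
LINES_NM = {"C I": 247.86, "H I": 656.3, "O I": 777.3, "C-N (0,0)": 388.3}


def line_intensity(wavelengths, intensities, centre, half_width=0.2):
    """Peak intensity within +/- half_width nm of centre.

    The window and the peak-picking rule are hypothetical choices.
    """
    window = [i for w, i in zip(wavelengths, intensities)
              if abs(w - centre) <= half_width]
    if not window:
        raise ValueError(f"no spectral samples near {centre} nm")
    return max(window)


def feature_vector(wavelengths, intensities):
    """Five variables: four line intensities plus full-spectrum intensity."""
    lines = [line_intensity(wavelengths, intensities, centre)
             for centre in LINES_NM.values()]
    full_spectrum = sum(intensities)  # assumed definition: plain sum
    return lines + [full_spectrum]
```

Each spectrum then becomes one five-component vector, which is the form of input a clustering routine expects.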

**Figure 2.** Spectrum from PC and spectral lines used for analysis, units: nm.

3. Clustering method

The K-means clustering algorithm has been widely applied in different fields because of its high efficiency [24]. However, the traditional K-means algorithm has two problems: (1) the number of clusters must be given in advance; (2) its resolution is limited, which means that clusters with high similarity are difficult to separate. To address these problems, the DB index was used to determine the initial number of clusters, and an iterative clustering process was introduced based on the ARSD value of the five variables. A cluster with an abnormally large ARSD value was considered to consist of two or more clusters. The DB index is defined as:

$$\mathrm{DB}=\frac{1}{k}\sum_{i=1}^{k}\max_{j\ne i}\left\{\frac{e_{i}+e_{j}}{d_{ij}}\right\},\quad j=1,\,2,\,\ldots,\,k, \tag{1}$$

where e_i (e_j) is the average Euclidean distance between each point in the ith (jth) cluster and the centroid of the ith (jth) cluster, and d_ij is the Euclidean distance between the centroids of the ith and jth clusters. The number of clusters with the minimum DB index was set as the initial value. The main steps involved in the algorithm are listed below:

(1) Determine the initial number of clusters k by calculating the DB index.
(2) Arbitrarily choose k spectral vectors from the spectra dataset as initial centroids and cluster the spectra dataset.
(3) Calculate the ARSD value of each cluster obtained in step (2).
(4) Identify any cluster with an abnormally large ARSD value using a t-test. Repeat steps (1)–(3) for the abnormal cluster until no abnormal cluster is found.
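Steps (1) and (2) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the MATLAB implementation used in the study: the function names, the fixed random seed, the handling of empty clusters, and the choice to run K-means once per candidate k are all assumptions made for this example. The DB index is computed as the average, over clusters, of the largest ratio (e_i + e_j)/d_ij taken over the other clusters j.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_vector(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points, k, seed=0, max_iter=100):
    """Step (2): Lloyd's K-means from k arbitrarily chosen starting vectors."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean_vector(members)
    return labels


def db_index(points, labels):
    """DB index over the non-empty clusters of a labelled dataset."""
    clusters = [[p for p, lab in zip(points, labels) if lab == c]
                for c in sorted(set(labels))]
    k = len(clusters)
    if k < 2:
        raise ValueError("DB index needs at least two non-empty clusters")
    cents = [mean_vector(c) for c in clusters]
    # e[i]: mean distance from the points of cluster i to its centroid
    e = [sum(dist(p, cents[i]) for p in c) / len(c)
         for i, c in enumerate(clusters)]
    worst = [max((e[i] + e[j]) / dist(cents[i], cents[j])
                 for j in range(k) if j != i)
             for i in range(k)]
    return sum(worst) / k


def choose_k(points, candidates, seed=0):
    """Step (1): the candidate k whose partition has the smallest DB index."""
    return min(candidates,
               key=lambda k: db_index(points, kmeans(points, k, seed)))
```

For example, `choose_k(vectors, range(2, 31))` would scan 2 to 30 clusters; the upper bound of 30 is an arbitrary illustrative choice. Because K-means depends on its starting centroids, a more careful version would repeat each run with several seeds.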

4. Results and discussion

4.1. Initial clustering process

To cluster the spectra dataset measured from industrial polymers without any prior knowledge, the initial number of clusters was first determined to be 17, as shown in figure 3. The corresponding clustering results are shown in figure 4. Each red dot represents a spectrum, giving 1000 red dots in total. After the initial clustering, fifteen kinds of polymers were classified correctly. The spectra of ABS2, PMMA2, and POM could not be differentiated from one another, nor could the spectra of PMMA4 and PMMA5.

**Figure 3.** The relationship between DB index and number of clusters.

**Figure 4.** The initial cluster analysis results.

The Euclidean distances between the spectra of the twenty kinds of polymers are shown in figure 5. The Euclidean distances among the spectra of ABS2 (spectra 51–100), PMMA2 (spectra 451–500), and POM (spectra 751–800) were clearly less than 1. Similar results were obtained between PMMA4 (spectra 551–600) and PMMA5 (spectra 601–650). The results demonstrated that the resolution of the initial cluster analysis in this study was about 1, as characterized by the Euclidean distance. Because of the interference from air and the similarity of the polymer samples, the results of the initial cluster analysis based on the traditional K-means algorithm were unsatisfactory.
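The spectrum ranges quoted above follow from the sample numbers in table 1 and the 50 spectra recorded per sample, provided the spectra are stored in sample order. That ordering is inferred from the quoted ranges rather than stated in the paper. A small Python sketch of the mapping, with a hypothetical function name:

```python
SPECTRA_PER_SAMPLE = 50  # from section 2.2


def spectrum_range(sample_no):
    """1-based, inclusive spectrum indices for sample #sample_no of table 1.

    Assumes the 1000 spectra are stored in sample order.
    """
    first = (sample_no - 1) * SPECTRA_PER_SAMPLE + 1
    last = sample_no * SPECTRA_PER_SAMPLE
    return first, last
```

With this mapping, sample #2 (ABS2) covers spectra 51–100, #10 (PMMA2) covers 451–500, and #16 (POM) covers 751–800, which matches the ranges given in the text.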

4.2. Iterative clustering process

To differentiate the samples with high similarity, the ARSD values of the five variables (C I 247.86 nm, H I 656.3 nm, O I 777.3 nm, C–N (0, 0) 388.3 nm, and the full-spectrum intensity) were used as the criterion. Because the polymers have similar properties, the intensity fluctuations of each spectral line were also similar under the same experimental conditions. Therefore, if the spectra of two or more different kinds of polymers were mixed together in one cluster, the ARSD value of that cluster would be abnormally large. The t-test is a method used for eliminating abnormal data [15, 25]. In this study, a t-test (p = 0.01) was applied to identify abnormal clusters by means of a critical value: if the ARSD value of a cluster is larger than the critical value, the cluster is judged to be abnormal. The ARSD and critical values are shown in table 2. Clusters 11 and 2 were judged to be abnormal in the first and second iterations, respectively. Each abnormal cluster identified was subjected to further cluster analysis, in the same way as in the initial clustering process. In the third iteration, the iterative clustering process was terminated because no abnormal cluster was found. As shown in figure 6, the ARSD values of the clusters were close to each other after the iterative clustering process.

Table 2. ARSD and critical values in the iteration process.

Iteration times	The cluster with the maximal ARSD value	ARSD (%)	Critical value (%)
1	11	13.69	7.60
2	2	7.05	6.55
3	10	5.88	6.39
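The decision rule behind table 2 can be sketched in Python as follows. The ARSD is taken here as the mean, over the five variables, of the relative standard deviation within a cluster. The use of the sample standard deviation and the function names are assumptions made for this example. The critical values are copied from table 2 rather than recomputed, because the text does not give the formula used to derive them from the t-test.

```python
import statistics


def arsd_percent(cluster):
    """ARSD of one cluster: mean over variables of 100 * std / mean.

    cluster is a list of feature vectors. The sample standard deviation
    is an assumption; the paper does not say which estimator was used.
    """
    rsds = []
    for column in zip(*cluster):  # one column per variable
        rsds.append(100.0 * statistics.stdev(column) / statistics.mean(column))
    return sum(rsds) / len(rsds)


def is_abnormal(arsd, critical):
    """A cluster is abnormal if its ARSD exceeds the critical value."""
    return arsd > critical


# (iteration, cluster with maximal ARSD, ARSD %, critical value %), table 2
TABLE_2 = [(1, 11, 13.69, 7.60), (2, 2, 7.05, 6.55), (3, 10, 5.88, 6.39)]
decisions = [is_abnormal(arsd, critical) for _, _, arsd, critical in TABLE_2]
```

Applied to the rows of table 2, the rule flags the clusters of iterations 1 and 2 as abnormal and leaves the cluster of iteration 3 alone, which is where the iteration stops.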

**Figure 6.** The ARSD values before (a) and after (b) iterative cluster analysis.

The final results of the cluster analysis for the twenty kinds of polymers are shown in figure 7. Of the 1000 spectra of the twenty kinds of polymers, only four were misclassified, giving a classification accuracy of 99.6%. The reasons for the high accuracy are as follows. The main differences among the twenty kinds of polymers are the contents of carbon, hydrogen, oxygen, and nitrogen, and the contents of these four elements were characterized by the intensities of the four spectral lines C I 247.86 nm, H I 656.3 nm, O I 777.3 nm, and C–N (0, 0) 388.3 nm. Furthermore, an iterative process was introduced to improve the resolution of the traditional K-means algorithm. As a result, the classification accuracy for the twenty kinds of polymers was improved to nearly 100% with the proposed approach.

5. Conclusions

A new approach was developed for the classification of polymers by LIBS coupled with the K-means clustering algorithm. Starting from the initial cluster analysis, the approach proceeds through an iterative process based on the ARSD value of each cluster. With the proposed approach, twenty kinds of polymers can be classified under atmospheric conditions without any prior knowledge. The classification accuracy for the twenty kinds of polymers reached 99.6%. The results therefore show that the proposed approach can be an effective and efficient method for on-line, real-time analysis of recycled polymers.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Nos. 61575073 and 51429501).
