Introduction

Bee pollen is a mixture of flower pollen, nectar and bee saliva. This beehive product is rich in essential nutrients and biologically active substances, such as phenolic compounds that can exhibit antioxidant, anti-inflammatory and anti-microbial activity. Thus, the consumption of bee pollen as a food supplement and a health product has increased in recent years [1]. However, depending on the geographical and botanical origin, bee pollen might be a potential hazard for human intake due to the presence of natural toxins like pyrrolizidine alkaloids (PAs) and their N-oxides (PANOs) [2].

PAs/PANOs are secondary metabolites produced by plants as a chemical defence against herbivorous insects. They are predominant in all genera of the Boraginaceae family, in the Senecioneae and Eupatorieae tribes (Asteraceae), but they are also present in the genus Crotalaria (Fabaceae) [3]. PAs/PANOs have been reported to cause toxicity in many animal species, including humans. Experimental data obtained from in vitro and in vivo studies suggest that chronic exposure to PAs/PANOs may lead to hepatotoxicity, genotoxicity, carcinogenicity and pulmonary lesions [4, 5]. However, the toxicity of PAs/PANOs is dependent on their chemical structure [6]. Only those compounds that contain a double bond at the 1,2-position of the necine base can be transformed into highly reactive pyrroles in the liver. Furthermore, PAs/PANOs can be classified according to their esterification level as cyclic diesters, open-chain diesters and monoesters, which are in decreasing order of toxicity [4,5,6]. Based on botanical origin and chemical structure, PAs/PANOs can be classified into different groups, such as lycopsamine, senecionine and heliotrine types [3]. The lycopsamine-type PAs/PANOs consist of monoesters (e.g., lycopsamine and its N-oxide, indicine N-oxide, intermedine) and open-chain diesters (e.g., echimidine and its N-oxide) produced by plants from the Boraginaceae and tribe Eupatorieae. The senecionine-type compounds are cyclic diesters (e.g., jacobine, retrorsine, senecionine, seneciphylline and their N-oxides, and senkirkine), particularly found in plants from the tribe Senecioneae, while the heliotrine-type PAs/PANOs are monoesters (e.g., heliotrine and its N-oxide) that occur in the genus Heliotropium (Boraginaceae) [3].

Although PAs/PANOs can cause toxic effects, it has not yet been possible to establish a limit in food and feed due to analytical uncertainties [7]. In addition, there is a lack of toxicological data relating to the PAs/PANOs found most frequently in food [8]. Some authorities, such as the European Food Safety Authority (EFSA), have performed independent risk assessments by applying the margin of exposure (MOE) approach, based on the benchmark dose lower confidence limit for a 10% excess cancer risk (BMDL10) derived from animal studies. In this approach, an MOE value of at least 10,000 is considered of low concern for carcinogenic effects [3]. In 2011, the EFSA proposed an orientation value of 0.007 µg/kg body weight (b.w)/day for the sum content of PAs/PANOs, based on a BMDL10 of 70 µg/kg b.w/day of lasiocarpine in male rats, and an MOE value of 10,000 [3, 9, 10]. In 2017, the EFSA updated its risk characterisation using the MOE approach and a new BMDL10 of 237 μg/kg b.w/day, derived from the incidence of liver haemangiosarcoma in female rats exposed to riddelliine [8]. Thus, an orientation value of 0.024 µg/kg b.w/day for the sum of PAs/PANOs could be considered of low concern for public health [11]. Regarding non-carcinogenic risks, adverse effects are not expected to occur at PA/PANO intakes lower than 0.1 µg/kg b.w/day [10]. This orientation value was obtained from a No Observed Adverse Effect Level (NOAEL) for riddelliine in rats of 10 μg/kg b.w/day, divided by an uncertainty factor of 100 [10]. Based on data published by EFSA [12], possible limits for PAs/PANOs in those foods that most contribute to human exposure through the diet are currently being discussed at European Union level [13]. Maximum PA/PANO levels of 400 and 500 μg/kg have been proposed for pollen products [14, 15].
Nevertheless, given that intake of even low PA/PANO amounts could increase the risk to health, especially if consumed frequently, the recommendation is still that the intake of these natural toxins should be minimised to the lowest level possible [16]. For this reason, the EFSA recommends the development of more sensitive methods to detect PAs/PANOs in food [8].
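
The orientation values quoted above follow from simple divisions: the BMDL10 divided by the MOE of 10,000, and the NOAEL divided by the uncertainty factor of 100. The short sketch below only reproduces that arithmetic with the figures given in the text; the function names are illustrative and are not taken from any EFSA document.

```python
# Arithmetic behind the orientation values quoted in the text.
# Function names are hypothetical helpers chosen for illustration only.

MOE_LOW_CONCERN = 10_000  # margin of exposure regarded as of low concern


def orientation_value(bmdl10, moe=MOE_LOW_CONCERN):
    """Daily intake (µg/kg b.w/day) corresponding to the given MOE."""
    return bmdl10 / moe


def non_carcinogenic_value(noael, uncertainty_factor=100):
    """Daily intake (µg/kg b.w/day) derived from a NOAEL."""
    return noael / uncertainty_factor


value_2011 = orientation_value(70)        # lasiocarpine, male rats
value_2017 = orientation_value(237)       # riddelliine, female rats
non_cancer = non_carcinogenic_value(10)   # riddelliine NOAEL in rats

print(round(value_2011, 3), round(value_2017, 3), round(non_cancer, 1))
```

The 2017 figure is 0.0237 µg/kg b.w/day before rounding, which is reported as 0.024 in the text.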

The main methods for analysis of PAs/PANOs are based on liquid chromatography coupled to tandem mass spectrometry (LC–MS/MS), which is highly sensitive, specific and reliable [17,18,19] but also time-consuming and expensive. Therefore, there is a need to develop fast, non-destructive, cost-effective and multi-modality analytical methods that support at-line control of natural toxins throughout the supply chain. Near-infrared (NIR) spectroscopy is a powerful technology that has already been proposed for monitoring food contaminants of concern to human health [20]. Hence, there is an ongoing increase in the application of NIR to detect mycotoxins [21, 22], allergens [23] and gluten [24] in food products, even at low levels (μg/kg). However, to the best of our knowledge, NIR spectroscopy has not yet been applied to the detection of PAs/PANOs in bee pollen or analogous food supplements. So far, only Carvalho et al. have reported the potential of NIR to predict the PA/PANO content in fresh and dried leaves of three Senecio species [25]. Furthermore, there is a lack of literature on the feasibility of applying miniaturised and portable spectroscopic devices, which could further speed up measurement and avoid the transportation of samples to the laboratory [26, 27].

In this context, this study first aimed to evaluate the distribution of PAs in bee pollen through univariate analysis across a classification criterion based on 400 µg/kg, which could be considered a potential threshold of compliance in line with the future policy of food safety authorities. Regardless of the compliance limit, there is a need for rapid detection of these natural toxins in food, a challenge that could be reliably met by NIR spectroscopy. Thus, the main goal of this trial was to assess the feasibility of two NIR systems by means of a statistical modelling approach based on targeted canonical discriminant analysis (CDA). To achieve these aims, 60 bee pollen samples were analysed by a validated LC–MS/MS method, suitable for detecting and quantifying 17 PAs/PANOs suggested by EFSA to be monitored in food.

Materials and methods

Sampling and experimental design

Sixty dehydrated bee pollen samples were purchased from stores and online shops in different countries (39 from Italy, 17 from other EU countries, 4 from non-EU countries). Based on standard guidelines and literature data, the samples were assumed to have a moisture level lower than 6% [28, 29]; therefore, they were kept in a dark, cool and dry place until analysis by LC–MS/MS and NIR spectroscopy. Approximately 50 g of each bee pollen sample was ground and homogenised by a GRINDOMIX GM 200 mill (Retsch, Italia, Torre Boldone, Italy) at 6000 rpm for 15 s. The NIR spectral data collection and LC–MS/MS analysis were carried out on ground bee pollen.

LC–MS/MS analysis

Standards, solvents and reagents

Analytical standards were obtained from different suppliers as follows: echimidine (purity 97%), echimidine N-oxide (purity 97%), heliotrine (purity 91%), heliotrine N-oxide (purity 91%), lycopsamine (purity 80%) and lycopsamine N-oxide (purity 80%) from PhytoLab GmbH & Co. KG (Vestenbergsgreuth, Germany); senecionine (purity 99%), senecionine N-oxide (purity 99%), seneciphylline (purity 94%) and seneciphylline N-oxide (purity 94%) from Carl Roth & Co. KG (Karlsruhe, Germany); indicine-N-oxide (purity 99%), intermedine (purity 99%), jacobine (purity 98%), jacobine N-oxide (purity 98%), retrorsine (purity 90%), retrorsine N-oxide (purity 96.0%) and senkirkine (purity 98%) from Phytoplan (Heidelberg, Germany); caffeine (purity 98%), used as an internal standard, was from Sigma-Aldrich (Steinheim, Germany).

Methanol (LC–MS grade) and sulphuric acid (98% purity, analytical grade) were from Carlo Erba reagents (Milan, Italy). Ammonia (28% purity, analytical grade) was from VWR Chemicals. Formic acid (98% purity, LC–MS grade) was from Sigma-Aldrich. Ultra-pure water was obtained from a water purification system (Purelab Classic, ELGA Lab Water, High Wycombe, UK).

Sample preparation and LC–MS/MS analysis

Sample preparation, PA/PANO extraction from bee pollen samples and LC–MS/MS analysis were performed as described by De Jesus Inacio et al. [30]. The validated LC–MS/MS method used in this work is suitable for detecting and quantifying 17 PAs/PANOs in bee pollen. Detailed information on the analytical parameters (specificity, linearity, apparent recovery, precision, absolute recovery and matrix effect) evaluated in the LC–MS/MS method validation for the target PAs/PANOs is reported in De Jesus Inacio et al. [30].

Briefly, the PAs/PANOs were extracted from 2.5 g of ground bee pollen samples using 15 mL of 0.05 M sulphuric acid solution. After 10 min of shaking, 10 mL of n-hexane was added and the samples were shaken for another 10 min and centrifuged at 3000 × g for 10 min. The organic phase was discarded and the aqueous extracts were then applied onto strong cation-exchange polymeric solid-phase cartridges (Bond Elut Plexa PCX, 200 mg/6 mL, Agilent), previously conditioned with 6 mL methanol and 6 mL 0.1% formic acid in water. After loading, the cartridges were washed with 3 mL methanol and then eluted with 6 mL of 5% ammonia in methanol. The eluates were dried under an air stream at 50 °C, re-suspended with 1 mL of caffeine (internal standard, 500 ng/mL) in 0.1% formic acid in methanol and 0.5% formic acid in water (20 : 80, v/v) and filtered with a syringe filter consisting of a 0.22-µm regenerated cellulose (RC) membrane [31]. The PA/PANO extracts were analysed using a high-performance liquid chromatography system consisting of an Accela 600 HPLC pump equipped with a CTC automatic injector (Thermo Fisher Scientific, San Jose, CA, USA) and coupled to an LTQ XL ion trap mass spectrometer (Thermo Fisher Scientific, San Jose, CA, USA) with a heated electrospray ionisation (HESI-II) probe. Five microlitres of each sample extract was loaded onto the analytical column (Hypersil GOLD 100 × 2.1 mm, 1.9 μm, Thermo Fisher Scientific, San Jose, CA, USA), and the PA/PANO separation was performed in a gradient of solvent A (water with 0.1% formic acid) and B (methanol with 0.1% formic acid) at a flow rate of 200 μL/min as follows: isocratic condition from 0 to 4 min (90% A and 10% B); from 10 to 15% (B) in 0.5 min; 15% (B) from 4.5 to 9 min; from 15 to 40% (B) in 5 min; from 40 to 80% (B) in 1 min; 80% (B) for 1.5 min; from 80 to 10% (B) in 0.5 min; from 17 to 20 min 10% (B) to re-equilibrate the column.
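
The gradient programme described above can be written as a table of time/composition breakpoints, which makes it easy to check that the steps add up to the 20-min run. The sketch below is illustrative only: it assumes linear ramps between breakpoints, and the names `GRADIENT` and `percent_b` are hypothetical rather than part of any instrument software.

```python
# Breakpoints (time in min, % solvent B) transcribed from the gradient description.
# Assumption: the composition changes linearly between consecutive breakpoints.
GRADIENT = [
    (0.0, 10.0), (4.0, 10.0),    # isocratic, 10% B
    (4.5, 15.0), (9.0, 15.0),    # ramp to 15% B, then hold
    (14.0, 40.0),                # 15 -> 40% B in 5 min
    (15.0, 80.0), (16.5, 80.0),  # 40 -> 80% B in 1 min, hold 1.5 min
    (17.0, 10.0), (20.0, 10.0),  # back to 10% B, re-equilibration
]


def percent_b(t_min):
    """Return % B at time t_min by linear interpolation between breakpoints."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time outside the 0-20 min programme")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise AssertionError("unreachable: breakpoints cover the whole run")


for t in (0.0, 4.25, 11.5, 16.0, 20.0):
    print(t, percent_b(t), 100.0 - percent_b(t))  # time, % B, % A
```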

The 17 PAs/PANOs were detected by the MS with the ESI source operating in positive-ion mode. The optimised operational LC–ESI(+)–MS/MS and detection conditions used for all the PAs/PANOs were as follows: sheath gas flow, 35 arbitrary units; auxiliary gas flow, 6 arbitrary units; ion spray voltage, 3.5 kV; capillary temperature, 350 °C; capillary voltage, 11 V; and tube lens, 60 V. The MS/MS conditions (collision energy, precursor and product ions) and the retention times obtained for each analyte are reported in De Jesus Inacio et al. [30].

The limit of quantification (LOQ) of each PA/PANO was set at 0.4 µg/kg, which is the lowest calibrator concentration on the calibration curve that could be quantified with a precision within 20% and a trueness between 80 and 120%, as reported by the European Commission Decision 2002/657/EC [30].
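
As an illustration of the acceptance criteria stated above, the following sketch checks a set of replicate measurements at the LOQ level. It assumes that precision is expressed as the relative standard deviation (RSD) and trueness as the mean recovery relative to the nominal concentration; the replicate values are invented for the example and are not data from this study.

```python
from statistics import mean, stdev


def meets_loq_criteria(measured, nominal, max_rsd=20.0, trueness=(80.0, 120.0)):
    """Check precision (RSD, %) and trueness (mean recovery, %) of replicates.

    Assumption: precision is taken as the RSD of the replicates and
    trueness as 100 * mean / nominal.
    """
    avg = mean(measured)
    rsd = 100.0 * stdev(measured) / avg
    recovery = 100.0 * avg / nominal
    return rsd <= max_rsd and trueness[0] <= recovery <= trueness[1]


# Hypothetical replicate results (µg/kg) at the 0.4 µg/kg calibrator.
acceptable = meets_loq_criteria([0.38, 0.42, 0.41, 0.36, 0.44], nominal=0.4)
too_variable = meets_loq_criteria([0.20, 0.50, 0.60], nominal=0.4)
print(acceptable, too_variable)
```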

The concentration of individual PAs/PANOs was calculated based on calibration curves prepared with a pool of blank bee pollen spiked with the 17 PAs/PANOs prior to the extraction to obtain the final concentration in the range 0.4–100 µg/kg.

NIR analysis

All bee pollen samples were analysed in triplicate using both a FOSS DS-2500 scanning monochromator (FOSS NIRSystem, Hillerød, Denmark) and portable NIR apparatus (PoliSPECNIR, ITPhotonics, Breganze, Italy). In the case of the lab-stationary system (referred to as Foss), scans were recorded in reflectance mode (850–2500 nm at 0.5-nm intervals) using a slurry cup with a quartz window (12.6 cm2 area) in 30 g aliquots. With regard to the portable system (referred to as Polispec), scans were also performed in reflectance mode (902–1670 nm at 2-nm intervals) using a quartz cylinder (9.1 cm2 area) in 5 g aliquots. Spectral data were recorded as absorbance (A) calculated as log(1/R), where R represents reflectance, using WinISI4 software V4.10.0.15326 (FOSS Analytical A/S, Hillerød, Denmark) for Foss and using poliDATA (ITPhotonics, Breganze, Italy) for Polispec. For both systems, to carry out the statistical analysis, spectra were exported to an Excel (Microsoft Office®, USA) spreadsheet and averaged before further chemometric modelling.
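
The conversion from reflectance to absorbance and the averaging of the triplicate scans can be sketched as follows. This is a minimal illustration of A = log(1/R), assuming a base-10 logarithm and a simple arithmetic mean across replicates; it does not reproduce the WinISI or poliDATA processing, and the reflectance values are invented.

```python
import math


def absorbance(reflectance):
    """Convert reflectance values R to absorbance A = log10(1/R)."""
    return [math.log10(1.0 / r) for r in reflectance]


def average_spectra(scans):
    """Average replicate spectra wavelength by wavelength."""
    n = len(scans)
    return [sum(values) / n for values in zip(*scans)]


# Three hypothetical replicate scans of one sample (reflectance at 4 wavelengths).
replicates = [
    [0.50, 0.40, 0.25, 0.10],
    [0.52, 0.41, 0.24, 0.10],
    [0.48, 0.39, 0.26, 0.10],
]
mean_absorbance = average_spectra([absorbance(scan) for scan in replicates])
print([round(a, 3) for a in mean_absorbance])
```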

Data and statistical analysis

The 60 bee pollen samples analysed were grouped into three quantitative (µg/kg) classes according to the sum of the 17 PAs/PANOs: < LOQ (0.4), low; 0.4–400, moderate; and > 400, high. As the PA/PANO data were not normally distributed, a Kruskal–Wallis test was carried out to analyse the effect of the distribution of these alkaloids within the three quantitative classes (XLSTAT, Addinsoft, release 2019, NY, USA). To evaluate the classifying effect, a multiple pairwise comparison was conducted using the Steel–Dwass–Critchlow–Fligner procedure (based on the averaged rank).
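
The class assignment and the Kruskal–Wallis statistic described above can be illustrated with a short standard-library sketch. It is not the XLSTAT procedure: the treatment of a sum of exactly 400 µg/kg as "moderate" is an assumption, the H statistic uses the usual tie correction, and the closed-form p value shown holds only for three groups (chi-square approximation with 2 degrees of freedom). The example data are invented.

```python
import math

LOQ = 0.4          # µg/kg
THRESHOLD = 400.0  # µg/kg


def pa_class(total):
    """Assign a sample to a class from its sum of PAs/PANOs (µg/kg)."""
    if total < LOQ:
        return "low"
    if total <= THRESHOLD:  # assumption: exactly 400 counts as moderate
        return "moderate"
    return "high"


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with tie correction (mid-ranks)."""
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    rank, ties, i = {}, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank[pooled[i]] = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        ties += (j - i) ** 3 - (j - i)
        i = j
    rank_term = sum(sum(rank[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12.0 / (n * (n + 1)) * rank_term - 3.0 * (n + 1)
    return h / (1.0 - ties / (n ** 3 - n))


def p_value_three_groups(h):
    """Chi-square survival function for 2 degrees of freedom."""
    return math.exp(-h / 2.0)


h_stat = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h_stat, 2), round(p_value_three_groups(h_stat), 4))
```

The statistic is undefined when all pooled values are identical, a case this sketch does not handle.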

The variable importance in projection (VIP) indices were calculated using the relevance of predictors according to the threshold criterion of ‘greater than one’ of the PLS-DA algorithm [32], by means of MATLAB R2017a software V9.2.0.538062 (The MathWorks Inc., Natick, MA, USA) and PLS Toolbox (PLS Toolbox V5.8.2.1, Eigenvector Research Inc., Manson, WA, USA).
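
For readers unfamiliar with VIP, the sketch below implements the commonly used definition, in which the score of variable j is the square root of p times the weighted sum of its squared normalised PLS weights, with weights given by the Y-variance explained by each component. It is a generic illustration and not the PLS Toolbox implementation; the inputs (`weights`, `ssy`) are hypothetical values that would normally come from a fitted PLS-DA model.

```python
import math


def vip_scores(weights, ssy):
    """VIP scores from PLS weights and explained Y sum of squares.

    weights[a][j]: weight of variable j in component a.
    ssy[a]: Y sum of squares explained by component a.
    """
    p = len(weights[0])
    total = sum(ssy)
    sq_norms = [sum(w * w for w in comp) for comp in weights]
    return [
        math.sqrt(
            p * sum(s * comp[j] ** 2 / nrm
                    for comp, s, nrm in zip(weights, ssy, sq_norms)) / total
        )
        for j in range(p)
    ]


# Hypothetical two-component model with three spectral variables.
vip = vip_scores(weights=[[0.8, 0.6, 0.0], [0.0, 0.6, 0.8]], ssy=[3.0, 1.0])
print([round(v, 3) for v in vip])
```

By construction the mean of the squared VIP scores equals one, which is why a value greater than one is used as the relevance threshold.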

To discriminate the PA/PANO classes, a supervised CDA was also adopted for each NIR spectrometer (SAS 9.4 software, SAS Institute Inc., Cary, NC, USA). The first step was processing of the Foss dataset to reduce the number of wavelengths. For this purpose, the absorbance (A) was averaged every 8-nm interval, and the mean value was assigned to the intermediate wavelength (i.e., λ854 is equal to the average of A from 850 to 858 nm, λ2494 is equal to the average of A from 2492 to 2500 nm). The second step was a stepwise feature selection based on analysis of variance (ANOVA) to select those significant (p < 0.05) spectral variables related to the PA/PANO classes. CDA was performed on the selected spectral variables (PROC CANDISC of SAS), to explain the total variance of the model in two main canonical functions (CAN 1 and CAN 2). The degree of dissimilarity among the three quantitative PA/PANO classes was measured by squared Mahalanobis distances (D2-Mahalanobis).
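
The wavelength-reduction step can be sketched as a simple binning of the spectrum. The code below is an assumption-laden illustration rather than the procedure actually run in SAS: it uses half-open 8-nm windows starting at the first wavelength and labels each averaged variable with the nominal window midpoint, so the handling of window edges may differ from the original processing.

```python
def bin_average(wavelengths, absorbances, width=8.0):
    """Average absorbance in consecutive windows of the given width (nm).

    Assumption: windows are half-open, [start, start + width), and each
    averaged variable is labelled with the nominal window midpoint.
    """
    start = wavelengths[0]
    sums, counts = {}, {}
    for wl, a in zip(wavelengths, absorbances):
        k = int((wl - start) // width)
        sums[k] = sums.get(k, 0.0) + a
        counts[k] = counts.get(k, 0) + 1
    return [(start + width * k + width / 2, sums[k] / counts[k])
            for k in sorted(sums)]


# Synthetic spectrum on the Foss grid (850-2500 nm at 0.5-nm intervals).
grid = [850 + 0.5 * i for i in range(3301)]
fake_absorbance = list(grid)  # placeholder values, not real data
binned = bin_average(grid, fake_absorbance)
print(len(binned), binned[0])
```

On this grid each full window contains 16 data points, and the final window is shorter than the others because 2500 nm falls 2 nm past the last full window.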

The reliability of the CDA model was assessed by a confusion matrix obtained by means of a cross-validation based on the leave-one-out criterion (PROC DISCRIM of SAS). As suggested by Bisutti et al. [33], the reliability of the related confusion matrix was evaluated by a set of statistical metrics: accuracy, precision, sensitivity, specificity and Matthews correlation coefficient (MCC).
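
The metrics listed above can be computed for each class from the confusion matrix in a one-vs-rest fashion. The sketch below shows the standard formulas; the matrix is a made-up example and does not reproduce Table 2, and the function name is illustrative.

```python
import math


def one_vs_rest_metrics(matrix, labels, target):
    """Metrics for one class; matrix[i][j] = actual class i predicted as j."""
    t = labels.index(target)
    tp = matrix[t][t]
    fn = sum(matrix[t]) - tp
    fp = sum(row[t] for row in matrix) - tp
    tn = sum(map(sum, matrix)) - tp - fn - fp
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Hypothetical confusion matrix (rows: actual, columns: predicted).
labels = ["low", "moderate", "high"]
matrix = [
    [9, 1, 0],
    [1, 7, 2],
    [0, 2, 8],
]
low_metrics = one_vs_rest_metrics(matrix, labels, "low")
print({k: round(v, 3) for k, v in low_metrics.items()})
```

Precision is undefined when no sample is predicted as the target class; this sketch does not guard against that case.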

Results and discussion

LC–MS/MS analysis and distribution of PAs/PANOs in bee pollen

LC–MS/MS analysis of the 60 dehydrated bee pollen samples was performed according to a validated method published in a previous study [30].

PAs/PANOs were found in 46 (77%) of the 60 bee pollen samples and the sum amounts of the detected compounds varied widely (from 2 to 3356 µg/kg); a detailed framework of the descriptive statistics is reported in Table 1. In 17 (37%) of the contaminated samples, the sum of PA/PANO concentration was over the value of 400 µg/kg, which has been considered as the threshold for further analyses performed in this study. Lycopsamine-type PAs/PANOs were found in the majority (85%) of the contaminated bee pollen samples, followed by senecionine-type (48%) and heliotrine-type PAs/PANOs (11%), which were rarely found. Figure 1 summarises the descriptive statistics of the main individual and total PAs/PANOs within the three quantitative classes through univariate non-parametric analysis. There was a predominance of echimidine, echimidine N-oxide and lycopsamine. The most noticeable feature is that there is high variability in the distribution of the PA/PANO concentration within the high class, while it is negligible for the moderate class. This is especially observed for lycopsamine (Fig. 1a), senecionine and seneciphylline N-oxides (Fig. 1b), while for echimidine and echimidine N-oxide (Fig. 1a), the range of variability is much lower and the distribution of samples is quite close to the median value.

Table 1 Descriptive statistics of the 17 pyrrolizidine alkaloids (PAs) and PA N-oxides (PANOs) and ΣPAs/PANOs (μg/kg) monitored in bee pollen
Fig. 1

Box–whisker plots of lycopsamine-type (a) and senecionine-type compounds (b) and total PAs/PANOs (c) according to the three quantitative (µg/kg) PA/PANO classes (< LOQ (0.4), low; 0.4–400, moderate; and > 400, high). The box plots represent the following descriptive statistics: median (bar in box), mean (+ , red cross), 25–75% quartile (bottom and top end of the box), minimum and maximum values (whiskers) except for outliers (°circles, distance to box 1.5–3.0 times interquartile range) and extreme values (*asterisks, distance to box > 3 times interquartile range). The significance (p values on the top) of the multiple comparisons among the three PA/PANO classes was determined by Kruskal–Wallis non-parametric test

The results of this trial are in agreement with those published by Mulder et al. [17] and Picron et al. [18], who also reported a similar pattern for PAs/PANOs in bee pollen, with echimidine and its N-oxide as the main contributors to the sum content of these natural toxins. Even though the LC–MS/MS methods used by these authors cover a higher number of analytes (28–30), most of the additional compounds, including lasiocarpine and its N-oxide, were not detected or were found at trace levels; therefore, such compounds do not contribute remarkably to the sum of PA/PANO concentration. Furthermore, the PA/PANO composition identified in the bee pollen samples is also in agreement with that reported for honeys from Europe [18, 34]. In addition, the data from this work underline the high variability of the presence of PAs/PANOs in bee pollen, as observed for lycopsamine (Fig. 1a) and senecionine N-oxide (Fig. 1b). This phenomenon is probably due to the fact that bees collect pollen from both plants that do not synthesise these toxins and PA-producing plants [18]. Indeed, both the PA/PANO concentration and composition depend on the botanical taxon, geographical origin [35] and developmental stage of the plants. Moreover, synthesis of PAs/PANOs by plants is influenced by many other agronomic and environmental factors such as soil fertility, water availability and climate conditions [36,37,38]. Since most bee pollen samples analysed were from Italy and other European countries, possible sources of the lycopsamine-type PAs/PANOs could be plants from the genus Echium (e.g., E. vulgare), which is known to produce high levels of echimidine and its N-oxide, and Borago officinalis and Eupatorium cannabinum, which synthesise lycopsamine, lycopsamine N-oxide and their isomers. Echium and Borago species are abundant in the Mediterranean region [39, 40], while Eupatorium cannabinum is widespread in Europe [40].
Regarding senecionine-type PAs/PANOs, their origin could be Senecio species, which are widely distributed in Europe and grow everywhere [25, 39]. Although these plants are abundant, if other more attractive pollen sources are available, bees may prefer them [40], and this could also explain the high variability of the PA/PANO concentration and frequency in bee pollen.

NIR spectral data and discriminant analysis of PAs/PANOs

The development of a rapid spectroscopic method in the detection of PAs/PANOs in food supplements was the main challenge of this trial. For food safety purposes, this challenge was to verify if NIR combined with a statistical modelling approach could be a feasible technique to discriminate bee pollen samples grouped into three classes according to their PA/PANO content.

NIR spectra

The NIR spectral data were recorded for ground bee pollen because the grinding process tends to improve the discriminative performance. As reported by Pasikitan et al. [41], NIR analysis is sensitive to the particle size and homogeneity of the matrix, both of which can affect the spectra and, consequently, the predictive performance. Thus, to guarantee a satisfactory precision in discriminative analyses, a grinding step is useful to obtain a more homogeneous matrix and reduce the light scattering effects that result in spectral noise. In the previous study of De Jesus Inacio et al. [30], which tested the capability of a visible spectrophotometer to predict the presence of PAs/PANOs in bee pollen, the highest correlation between PA/PANO amount and the spatial colour coordinate named lightness was observed in ground samples.

Figure 2 shows the averaged NIR spectra of the three quantitative PA/PANO classes. For both NIR systems, in the first part of the spectrum region (< 1450 nm), the absorbance (A) of the three classes overlapped, while the absorbance of the low class was noticeably separated from the other two classes at higher wavelengths (> 1500 nm). In the study of González-Martín et al. [42], which applied NIR spectroscopy to assess bee pollen quality parameters, the spectra seemed to be comparable to those of the Foss system, with a meaningful absorbance peak around 1950 nm and a subsequent increasing trend up to 2500 nm. Costa et al. [43] reported a similar absorbance profile, although they analysed samples with a Fourier transform (FT)-NIR system. With regard to the portable system, it was not possible to confirm the absorbance pattern with data from the literature. The main features of the spectra from this alternative NIR instrument are an increasing absorbance from 1450 nm and A values about half of the Foss ones.

Fig. 2

Absorbance spectra for the three quantitative classes of pyrrolizidine alkaloids (PAs) and their N-oxides (PANOs) in bee pollen, based on lab-stationary (Foss, upper panel) and portable (Polispec, lower panel) near-infrared (NIR) systems. Quantitative (µg/kg) PA/PANO classes: < LOQ (0.4), low; 0.4–400, moderate; > 400, high

The VIP scores chart obtained for each NIR spectrometer is shown in Fig. 3. The VIP pathway was characterised by intense peaks (VIP scores > 1) in the < 1000, 1000–1660 and > 2400 nm regions for Foss and only > 1500 nm for Polispec. The explicative VIP predictors showed an overlap between low and moderate classes throughout almost all the spectral range of the Foss system, meaning that the informative bands correlated with the smallest or medium PA/PANO content involved similar dominant wavelength regions. In contrast, the Foss-VIP pattern of the high class was almost distinguished from the other ones, and its explicative predictors (VIP scores > 1) belonged to the 900–1350-nm region. However, in the last part of the spectrum (> 2400 nm) there was a similar trend among the three classes. In the case of the portable NIR system, the VIP pathway was characterised by lower values (< 1.6) and slight differences among the PA/PANO groups. A potential relationship between VIP and PA/PANO detection could be due to a multiplicity of interferences that are discussed in the following section.

Fig. 3

Variable importance in projection (VIP) and predictive scores by partial least squares discriminant analysis (PLS-DA) and canonical discriminant analysis (CDA) models for the three quantitative classes of pyrrolizidine alkaloids (PAs) and their N-oxides (PANOs) in bee pollen, based on lab-stationary (Foss, upper panel) and portable (Polispec, lower panel) near-infrared (NIR) systems. VIP scores are presented as trends throughout the entire spectral range and the CDA selected predictive wavelengths as vertical bars according to the canonical structure correlation coefficient (CDA predictive scores). Quantitative (µg/kg) PA/PANO classes: < LOQ (0.4), low; 0.4–400, moderate; > 400, high

CDA based on NIR spectral data

The main purpose of this study was to apply a multivariate pattern recognition method to determine the functional relationship between spectral NIR data and the presence of PAs/PANOs in a set of bee pollen samples, simulating a rapid screening to enhance safety under on-market operating conditions. Among the supervised discriminating methods, CDA has already been proposed as a statistical model able to classify a sub-population of samples within a specific genetic, geographical or chemical class (e.g., PA/PANO amount) according to the similarity of a huge set of chemometric variables such as the NIR spectra [30]. Despite being less frequently applied, the CDA algorithm allows a reduction of data redundancy, so that the discriminating power is preserved in the first canonical discriminant function [44]. Even though CDA seems to improve the classification accuracy when spatial separation of the experimental groups is achieved by the use of a large number of closed-spectral signatures, the use of a large dataset with too many highly correlated variables could be inappropriate for selecting the more informative wavelengths while avoiding the irrelevant ones [32, 45]. Indeed, a data arrangement is required when a dataset has too many irrelevant variables, a high multicollinearity among instrumental signals and a number of predictors larger than the sample size [46]. Therefore, in the case of the Foss system, before the stepwise selection, the dataset was restricted to 207 averaged spectral variables so that each one represents an 8 nm wavelength range (from λ854 to λ2494). The portable NIR system presents a short window for spectral acquisition and a relatively longer measurement interval (2 nm); hence, there was no need to restrict the original dataset composed of 390 spectral variables.

For the lab-stationary NIR system (Foss), the stepwise procedure selected 18 significant averaged wavelengths: λ1162, λ1250, λ1330, λ1362, λ1418, λ1514, λ1682, λ1746, λ1850, λ1930, λ1994, λ2042, λ2074, λ2186, λ2218, λ2250, λ2322 and λ2474. These spectral variables were used to perform the CDA algorithm that defined two significant functions, CAN1 and CAN2 (Wilks’s λ = 0.19, approximate F value = 2.85, df1 = 30, df2 = 80, p < 0.001), which explained 78.9% and 21.1% of the total variability, respectively. Regarding the portable NIR system, 22 wavelengths were sorted as the most informative by the stepwise procedure: λ1146, λ1158, λ1172, λ1186, λ1200, λ1226, λ1270, λ1324, λ1332, λ1344, λ1368, λ1416, λ1442, λ1460, λ1484, λ1514, λ1534, λ1576, λ1588, λ1632, λ1652 and λ1680. The CDA algorithm defined two canonical functions (CAN1 and CAN2) that showed a high discriminative power (Wilks’s λ = 0.25, approximate F value = 1.59, df1 = 44, df2 = 72, p = 0.039), which accounted for 71.6% and 28.4% of the total variability, respectively. As shown in Fig. 4, for both NIR systems the CDA model highlighted the possibility of separating the 0.95 confidence circles for population centroids, especially the low class from the others. This discriminative capacity was confirmed by the significant D2-Mahalanobis values, which were equal to 9.5 (p < 0.001) between low and moderate classes, and 16.1 (p < 0.001) between low and high classes for Foss, and 7.7 (p = 0.023) between low and moderate classes, and 9.9 (p = 0.022), between low and high classes for Polispec.
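
To make the D2-Mahalanobis values more concrete, the sketch below computes the squared Mahalanobis distance between two class centroids using the pooled within-class covariance. It is a simplified two-variable illustration (for instance, scores on CAN1 and CAN2) with invented coordinates, not the computation performed by PROC CANDISC on the selected wavelengths.

```python
def centroid(points):
    """Mean of a list of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def pooled_covariance(groups):
    """Pooled within-class covariance (sxx, sxy, syy) for 2-D data."""
    sxx = sxy = syy = 0.0
    n_total = 0
    for pts in groups:
        mx, my = centroid(pts)
        for x, y in pts:
            sxx += (x - mx) ** 2
            sxy += (x - mx) * (y - my)
            syy += (y - my) ** 2
        n_total += len(pts)
    dof = n_total - len(groups)
    return sxx / dof, sxy / dof, syy / dof


def mahalanobis_sq(a, b, cov):
    """Squared Mahalanobis distance d' S^-1 d for a symmetric 2x2 S."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = a[0] - b[0], a[1] - b[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


# Hypothetical canonical scores for two classes.
class_a = [(0, 0), (2, 0), (0, 2), (2, 2)]
class_b = [(4, 0), (6, 0), (4, 2), (6, 2)]
cov = pooled_covariance([class_a, class_b])
d2 = mahalanobis_sq(centroid(class_a), centroid(class_b), cov)
print(round(d2, 2))
```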

Fig. 4

Biplots of the 0.95 confidence circles around the centroids for the three quantitative classes of pyrrolizidine alkaloids (PAs) and their N-oxides (PANOs) in bee pollen, based on lab-stationary (Foss, upper panel) and portable (Polispec, lower panel) near-infrared (NIR) systems. Quantitative (µg/kg) PA/PANO classes: < LOQ (0.4), low; 0.4–400, moderate; and > 400, high

Furthermore, a confusion matrix obtained by leave-one-out cross-validation confirmed that the CDA classification functions allowed the correct assignment of each sample to its actual PA/PANO class based on the restricted pool of the selected NIR features (Table 2). Overall, both NIR spectrometers showed an accurate prediction of the absence of PAs/PANOs because of the high values of predictive parameters for the low class, as summarised by the MCC values of 0.96 and 0.91 for Foss and Polispec, respectively. Conversely, assessment of the high class by the NIR-based algorithm highlighted a decrease in discriminant performance that was more relevant for the portable apparatus (MCC of 0.73 versus 0.59), even though no sample was misclassified as low class (Table 2). Considering the hypothesis of rapid spectroscopy-based screening for the absence (low class) and presence of PAs/PANOs (fusion of moderate and high classes), both NIR systems showed a false negative misclassification rate equal to 0.02 (1 out of 46), which refers to a sample from the moderate class that was recognised as from the low class. That value is lower than the 0.05 threshold for false negatives usually applied in toxicological screening tests [47]. For Polispec, there was also a false-positive misclassification rate of 0.07 (1 out of 14) related to wrong assignment of a sample from the low class to the moderate class, but this represents only an additional charge because it implies further confirmatory analysis by LC–MS/MS, which is the analytical technique recommended by food safety authorities.
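
The screening rates quoted above are plain proportions: one misclassified sample out of the 46 contaminated samples (false negatives) and, for Polispec, one out of the 14 samples of the low class (false positives). The sketch below reproduces only this arithmetic; the helper name is illustrative.

```python
FALSE_NEGATIVE_LIMIT = 0.05  # threshold for false negatives cited in the text


def misclassification_rate(misclassified, total):
    """Proportion of samples of a given true status that were misassigned."""
    return misclassified / total


# Presence (moderate + high, n = 46) versus absence (low, n = 14) of PAs/PANOs.
false_negative_rate = misclassification_rate(1, 46)  # both NIR systems
false_positive_rate = misclassification_rate(1, 14)  # Polispec only

print(round(false_negative_rate, 2), round(false_positive_rate, 2))
print(false_negative_rate < FALSE_NEGATIVE_LIMIT)
```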

Table 2 Confusion matrix and descriptive statistics in cross-validation (leave-one-out criterion) for the three quantitative classes of pyrrolizidine alkaloids (PAs) and their N-oxides (PANOs) in ground bee pollen based on lab-stationary (Foss) and portable (Polispec) near-infrared (NIR) systems

The outcomes of this study indicate the potential capability of NIR spectroscopy to perform reliable screening of bee pollen samples contaminated with PAs/PANOs. Indeed, these natural toxins significantly affected the spectral data, indicating that the NIR technology can be applied for a rapid evaluation of the presence of contaminants [48]. However, as underlined by the decrease in the sensitivity and specificity values (Table 2), the attempt to distinguish a moderate from a high level of PAs/PANOs seems to be partially predictable, probably due to the chemical structure and stereochemistry of these molecules, which influence their toxicity and physical–optical properties [49].

The slightly lower capability of the tested portable NIR instrument is in agreement with the literature. This phenomenon is probably due to worse optical properties and the negative interference of the field operative conditions compared to the lab-stationary apparatus [26, 50], even though its flexibility and the possibility of application at/in-line represent major advantages.

The CDA algorithm (by means of the canonical standardised coefficient) highlighted a relatively similar explicative power of all the predictors because their loading values (predictive scores) ranged from 0.30 to 0.50 and these selected wavelengths involved the spectral region > 1150 nm (Fig. 3). Among them, within the 1140–1370 nm region, λ1162, λ1250, λ1330 and λ1362 (Foss) and λ1146, λ1158, λ1172, λ1200, λ1226, λ1270, λ1324, λ1332, λ1344 and λ1368 (Polispec) could be related to C–H stretching second overtones [51]. The absorption bands λ1418 (Foss) and λ1416 (Polispec) can be related to O–H stretching first overtones of R–OH [52] that are present in the chemical structure of PAs/PANOs. The predictors λ1442 and λ1460 (Polispec) could be associated with the first overtones of cyclic amines [52], which compose the basic structure of all PAs/PANOs. In relation to these latter selected wavelengths, the 1400 nm has been already suggested as a shared region between tertiary-amines and N-oxides, where these molecules seemed to interact with electromagnetic radiation, resulting in vibrational spectroscopic signals [25]. Although the N-oxide spectrum is related to many wavelengths in the visible and infrared regions, the interpretation of the spectroscopic dynamic behaviour of the structure of PANOs is still lacking [53]. Another region that can also be affected by cyclic amines is within the 1900–2100 nm region [25], represented by λ1930, λ1994, λ2042 and λ2074 (Foss). The CDA feature stepwise procedure also sorted some informative wavelengths between 1500 and 1590 nm, including λ1514 (Foss) and λ1514, λ1534, λ1576 and λ1588 (Polispec). These wavelengths may be influenced by stretch overtones of O–H and combination bands of N–H with C=O, and C–N stretching of amine/amide structures [25]. The absorption peak at λ1850 (Foss) may be related to the second overtone of C=O stretching of carboxyl groups [52], which are also found in the PA/PANO chemical structure. 
Some of these alkaloids can also contain epoxide–amine structures that may influence the λ1632, λ1652 (Polispec) and λ2218 (Foss) regions [25]. In the case of the Foss spectrometer, the selected wavelengths of λ2186, λ2218, λ2250, λ2322 and λ2474 can be considered informative NIR variables because they correspond to combination bands of N–H from amines/amides and O–H from alcohols, and C–H and C–C stretching from –CH, –CH2 and –CH3 [43, 54].

Conclusions

The outcomes of this work highlighted that over 75% of the 60 bee pollen samples analysed were contaminated with PAs/PANOs, even though less than 30% had a concentration higher than the threshold of 400 µg/kg used for the NIR analyses. However, considerable variability was observed in the distribution of the 17 PAs/PANOs monitored, as confirmed by the large standard deviations within both the moderate and high classes, which group the contaminated samples below and above the threshold of 400 µg/kg, respectively.

The presence of the PAs/PANOs in bee pollen significantly affected the NIR spectra, highlighting a rising level of absorbance as their concentration increased. This outcome suggests that their chemical structure interferes with the light scattering throughout many band regions. The application of CDA provided a statistical modelling approach that demonstrates the predictive capacity of NIR systems to distinguish among the three quantitative PA/PANO classes, especially for detection of those samples belonging to the low class, which corresponds to safe samples.

In summary, both NIR systems have the potential to be applied for rapid and reliable identification of contaminated bee pollens in large-scale screening in the food supply chain, including at-line operation. However, the lab-stationary system might be considered the more feasible NIR tool to achieve this purpose. PA/PANO detection could be strengthened by applying a chemometric approach based on an additional informative dataset shared among the research community and food safety agencies.