Introduction

Tomato (Solanum lycopersicum L.) is a popular vegetable worldwide. To bridge the seasonal gap in production and to prevent rainfall and unfavorable temperatures from reducing yield and quality, most tomatoes are cultivated in greenhouses, which mitigate these adverse environmental impacts1. However, water management remains the main issue for farmers even under greenhouse conditions2,3.

The shortage of water resources is a major limiting factor for agricultural production in many regions. Additionally, the quality and yield of tomato are affected not only by genotype but also by water management. To improve crop quality and water use efficiency, water stress is induced by applying deficit irrigation or increasing the salinity of the nutrient solution during cultivation4,5. Unfortunately, water stress induced by underirrigation during the vegetative and reproductive periods of tomato leads to abnormal growth, aborted flowering, and fruit setting failure, which cause a significant reduction in yield and quality6,7. Under moderate water deficit, photosynthesis is limited, but it can recover shortly after irrigation. Conversely, if the water shortage continues, irrigation cannot restore photosynthesis8. Therefore, it is very important to apply water stress at an optimum level. Different genotypes or growth stages may respond differently to water stress9,10. Most crops are drought sensitive at various growth stages. The flowering and fruit setting stages are the most sensitive to water deficits in drip irrigated tomatoes11. To provide breeding material that resists drought stress, various species of tomato have been studied12,13,14. Wild tomato is more resilient against abiotic and biotic stress than domesticated tomato12,13. Tapia et al.14 found better morpho-physiological responses, such as tolerance to drought stress, in wild tomato than in domesticated tomato.

To achieve adequate water management, it is important to choose a suitable irrigation strategy, which relies on an accurate, reliable, and timely classification of the drought status of tomato15,16,17. The drought status of plants is mainly determined from the soil water content, morphological alterations, physiological responses, and gene expression7,18,19,20. Monitoring soil moisture with sensors has been criticized because the spatial heterogeneity of soils can make the measurements unrepresentative21,22,23,24. Gene expression profiling cannot reflect the instantaneous drought status in the greenhouse, and other kinds of stress may produce the same expression changes. In addition, these methods are time-consuming and labor-intensive, which limits the number of plants and the scale of the experiment19,25. In contrast, evaluating the drought status from variations in physiological responses such as stomatal conductance, transpiration rate, and leaf temperature, measured with instruments, is relatively efficient and effortless8,26,27,28.

When a plant is under drought stress, stomatal closure responds more sensitively and rapidly than the water potential and leaf water content29. Stomatal closure is one of the major factors limiting plant photosynthesis under mild or moderate water stress30. Medrano et al.8 found that light-saturated stomatal conductance (gsw) can be used as a reference parameter to reflect the degree of drought for C3 plants. Moreover, the relationship between gsw and photosynthesis appears to depend less on species than that of the leaf water potential and relative water content8. However, although gsw provides information about the water status of plants, current methods of gsw measurement require physical contact with leaves, which is suitable for manually measuring individual plants but not for large-scale and field-scale scenarios. Other common indicators of the water status of plants are leaf temperature and leaf-to-air vapor pressure deficit (VPD). When plants suffer from drought stress, stomatal closure reduces heat loss and gas efflux from leaves, leading to an increase in leaf temperature and VPD. Therefore, plant temperature and leaf–air temperature difference (Tdiff) can be used to indirectly assess the plant gsw31,32,33,34. The reported indicators for evaluating plant drought status by Tdiff are the stress degree day (SDD)35 and the crop water stress index (CWSI)36. However, temperature-based indicators are strongly influenced by the VPD and air temperature (Tair)37,38,39. Therefore, in the subsequent establishment of the drought status model, Tair and VPD are considered as independent variables in addition to Tdiff.

Logistic regression is a statistical method that can establish the relationship between predictive variables and binary (dichotomous) and/or ordinal dependent variables40. Logistic regression has been used in the analysis of plant disease risk factors and implemented in disease prediction models to classify wheat, oilseed rape, pyrethrum, and peanut plants as diseased or healthy40,41. In addition to logistic regression, the classification and regression tree (CART) is a non-parametric procedure developed by Breiman et al.42 in 1984. CART supports non-linear classification and is capable of handling collinearity between predictive variables43. Owing to its flexibility, interpretability, and broad applicability, CART has been widely used as a classification algorithm for multiclass problems in agriculture, environmental protection, biomedicine, and computer science44,45,46,47.

For automated irrigation management in greenhouses, most farmers use a timer to trigger irrigation at regular intervals or maintain a specific soil water content. However, these methods neglect the plant response15. Soil moisture may not accurately represent the plant water status, and different genotypes differ in their drought tolerance responses. If a traditional irrigation system is adopted, irrigation deficiencies or excesses often become unavoidable. Therefore, conducting water management on the basis of plant response is more appropriate and accurate18. In our previous studies, plant temperature was utilized to classify the drought status of greenhouse tomato to improve the irrigation system15,48. However, these studies did not consider the influences of different genotypes and growth stages.

To facilitate proper water management, the goal of this study is to develop a simple discriminant model that instantly determines the drought status of tomatoes and serves as a rule for irrigation decision-making based on plant responses. The seedlings of cherry type tomato ‘Tainan ASVEG No. 19’ were subjected to drought and regular irrigation treatments, and the net CO2 assimilation rate (An), gsw, VPD, Tair, and Tdiff parameters were collected each day after treatment. The drought status was divided into binary (WW: well-watered; WS: water deficit stress) and ordinal (L: low stress; M: medium stress; H: high stress) variables according to the value of gsw. The Tair, VPD, and Tdiff were used as explanatory variables to build the CART and logistic regression models for predicting the drought status of the tomato. In addition to the data collected from ‘Tainan ASVEG No. 19,’ data of the wild accession ‘LA2093’ (Lycopersicon pimpinellifolium) and the large fruit breeding line ‘108290’ were used to evaluate the model applicability for different genotypes and growth stages of the tomato.

Results and discussion

Relationship between gsw and An

The relationship between gsw and An was described by a logarithmic function, and the coefficients of determination (R2) were 0.79–0.94 (Fig. 1). Thus, approximately 79–94% of the variation in An can be explained by gsw. In addition, a strong correlation was observed between An and gsw, irrespective of the growth stage and genotype from which the data were collected; the Spearman correlation coefficients (ρ) were all above 0.77 (Table 1). Under drought stress, plants close their stomata to avoid excessive water loss, and this closure results in a lack of the CO2 required for photosynthesis. In addition, the lack of water causes the dehydration of photosynthetic tissues and eventually impedes the photosynthetic efficiency of plants49. A high degree of correlation between gsw and An has also been observed in field- and pot-grown plants8,50. These results support our subsequent establishment of the drought status model based on gsw.

Figure 1

Relationship between light-saturated stomatal conductance (gsw) and net CO2 assimilation rate (An) of (A) Tainan ASVEG No. 19 (2018–2019), (B) Tainan ASVEG No. 19 (2020), (C) breeding line 108290, and (D) LA2093.

Table 1 Spearman correlation coefficient between light saturation gsw and the parameters An, VPD, Tair, and Tdiff in different datasets.

Relationship of gsw with VPD, Tdiff, and Tair

VPD is one of the factors that induce stomatal changes in many plants51. Stomata close as the leaf-to-air VPD increases regardless of soil water conditions8,51. In this study, the ρ between gsw and VPD for all genotypes was −0.77 to −0.82 (Table 1). In addition to reducing the efficiency of photosynthesis, stomatal closure also hinders heat loss through leaf transpiration, thereby increasing the plant temperature27,52. Therefore, Tdiff should be negatively correlated with gsw. The results of this study indicated that the ρ between gsw and Tdiff for all genotypes was −0.68 to −0.89 (Table 1). Although gsw had a slight tendency to increase as Tair rose, the correlation between gsw and Tair was very weak (ρ = 0.05–0.26) (Table 1). Raschke53 summarized the stomatal feedback mechanism in which changes in temperature affect CO2 assimilation, and the opening or closure of stomata in response to temperature changes is influenced by the CO2 feedback. Urban et al.54 found that gsw increases with rising Tair. However, VPD is more important than Tair in driving changes in gsw55, which is consistent with the weak correlation between gsw and Tair. Nevertheless, when Tair, Tdiff, and VPD were entered together into the CART model or the logistic model, Tair was retained in the final model (Table 2; Figs. 2, 3). Thus, Tair may have some influence on the prediction of gsw.

Table 2 Logistic model parameters.
Figure 2

Classification and regression tree model for classifying binary drought status. At each intermediate node, a case goes to the left child node only if the condition is satisfied. Each node in the model has three values: the top value is the predicted class at this node, the second value is the probability that Y = 1 (water deficit stress), and the third value is the number of observations for this node as a percentage of the whole dataset. The predicted class is a (0,1) variable, where 0 represents well-watered and 1 represents water deficit stress.

Figure 3

Classification and regression tree model for classifying ordinal drought status. At each intermediate node, a case goes to the left child node only if the condition is satisfied. Each node in the model has three values: the top value is the predicted class at this node, the second value is the probability that Y = 1 (medium stress), and the third value is the number of observations for this node as a percentage of the whole dataset. The predicted classes 0, 1, and 2 represent low, medium, and high stress, respectively.

Classification ability of binary response models

In the study, 70% of the data of the Tainan ASVEG No. 19 (2018–2019) dataset were used to build the model. Next, the Tainan ASVEG No. 19 (2020), breeding line 108290, LA2093, and the remaining 30% data of Tainan ASVEG No. 19 (2018–2019) dataset were used as the testing sets for model validation.

The parameters of the logistic model in the model-building stage are shown in Table 2. When using the 30% data of Tainan ASVEG No. 19 (2018–2019) as the testing dataset, the logistic model achieved a sensitivity of 0.82, specificity of 0.96, geometric mean of 0.89, and accuracy of 93.10% (Table 3). For the other testing datasets, the logistic model also had an acceptable performance, with a sensitivity of 0.80–1.00, specificity of 0.79–0.85, geometric mean of 0.86–0.92, and accuracy of 80.23–89.74% (Table 3).

Table 3 Performance of model validation for binary logistic model. Values that meet acceptable standards are shown in bold.

The structure of the binary CART model is illustrated in Fig. 2. For the 30% data of Tainan ASVEG No. 19 (2018–2019) used as validation, the CART model achieved a sensitivity of 0.75, specificity of 0.97, geometric mean of 0.85, and accuracy of 92.18% (Table 4). The CART model showed a comparable or better performance than that of the logistic model on the other testing datasets, with a sensitivity of 0.87–1.00, specificity of 0.89–0.93, geometric mean of 0.90–0.94, and accuracy of 90.82–92.11% (Table 4).

Table 4 Performance of model validation for the classification and regression tree model (binary response). Values that meet acceptable standards are shown in bold.

Classification ability of ordinal response models

In the ordinal response models, the training and testing datasets were the same as those of the binary response models. The classification performance of the ordinal logistic model is shown in Table 5. When using 30% of the data of Tainan ASVEG No. 19 (2018–2019) as the testing dataset, the correctly classified percentages of L, M, and H statuses were 98.98%, 59.02%, and 54.00%, respectively, and the overall accuracy was 87.38%. For the Tainan ASVEG No. 19 (2020) testing dataset, the correctly classified percentages of L, M, and H statuses were 81.48%, 23.81%, and 100.00%, respectively, and the overall accuracy was 72.81%. The performances for the breeding line 108290 and LA2093 datasets were between those for Tainan ASVEG No. 19 (2018–2019) and Tainan ASVEG No. 19 (2020), with overall accuracies of 77.55% and 83.72%, respectively, for predicting the drought status under different genotypes and growth stages (Table 5).

Table 5 Confusion matrix for the ordinal logistic model. Values that meet acceptable standards are shown in bold.

The results and the structure of the multi-class CART model are presented in Table 6 and Fig. 3, respectively. When the CART model was applied to the 30% testing dataset of Tainan ASVEG No. 19 (2018–2019), the correctly classified percentages of L, M, and H statuses were 96.59%, 63.93%, and 58.00%, respectively, and the overall accuracy was 86.88%. For the Tainan ASVEG No. 19 (2020) testing dataset, the correctly classified percentages of L, M, and H statuses were 87.65%, 71.43%, and 75.00%, respectively, and the overall accuracy was 83.33%. For the breeding line 108290 and LA2093 datasets, the overall accuracies were 79.59% and 84.88%, respectively (Table 6).

Table 6 Confusion matrix for the classification and regression tree model (multi-class). Values that meet acceptable standards are shown in bold.

Models comparison and evaluation

The binary response models performed well for the four datasets (Tables 3, 4). It is worth noting that the data used to build the model were taken from ‘Tainan ASVEG No. 19’ at the seedling/vegetative growth stage, while our testing data included ‘Tainan ASVEG No. 19’, breeding line ‘108290’, and ‘LA2093’ at the flowering stage. The acceptable performance of the binary response models indicates that the logistic and CART models have the potential to classify the binary drought status of tomatoes across different genotypes and growth stages.

As for the multi-class models, both logistic regression and CART classified the L class well but performed poorly in classifying the M and H statuses (Tables 5, 6). The reason may be the class imbalance of the datasets used in this study, as the numbers of cases in the M and H categories were much lower than that in the L class (Table 7). Class imbalance may lead to poor recognition of the M and H categories by the models and contribute to the reduced classification capability56,57. The performance of the multi-class models could be further improved if the class distribution of the dataset were rebalanced using resampling methods58,59.

Table 7 Descriptions of the four datasets used in this study.

When comparing the performances of the logistic and CART models, the latter generally outperformed the former (Tables 3, 4, 5, 6). With highly class-imbalanced data, unsatisfactory performance of the logistic model has often been observed60. Even when the logistic model performs well, its classification process is difficult to interpret and visualize, in contrast to that of CART analysis60. In addition, the CART model makes fewer assumptions than logistic regression and can deal with complex interactions and nonlinearities61. These properties contribute to the capability of the CART model to handle class-imbalanced datasets42,60,62 and to outperform the logistic model60,63. After comprehensively considering the classification performance and the convenience of application, the CART models are recommended for predicting the drought status of tomatoes. In application, only air temperature, relative humidity, and plant temperature sensors need to be installed to obtain the values of the input variables required by the model, and the decision rules of CART need to be set in the control system. Taking the rightmost decision rule in Fig. 2 as an example, when Tdiff > 0.64 °C and VPD > 1.7 kPa, the tomato has a high probability (0.95) of being in a state of water shortage, indicating that it should be irrigated at this time.
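The rightmost rule of Fig. 2 described above can be sketched in Python as follows. The thresholds (0.64 °C, 1.7 kPa) and the probability (0.95) are taken from the text. Everything else is an assumption made for illustration: the function names are hypothetical, Tdiff is taken as leaf temperature minus air temperature, and the Tetens approximation is used to derive VPD from sensor readings, whereas the study obtained VPD directly from the LI-6800. The other branches of the tree are not reproduced here, so the function returns `None` when this rule does not apply.

```python
import math


def saturation_vapor_pressure_kpa(temp_c):
    """Saturation vapor pressure (kPa) from the Tetens approximation.

    Assumption: the paper does not state which formula a sensor-based
    system should use; Tetens is one common choice.
    """
    return 0.6108 * math.exp(17.27 * temp_c / (temp_c + 237.3))


def leaf_to_air_vpd_kpa(t_leaf_c, t_air_c, rh_percent):
    """Leaf-to-air VPD: saturation pressure at leaf temperature
    minus the actual vapor pressure of the air."""
    e_air = saturation_vapor_pressure_kpa(t_air_c) * rh_percent / 100.0
    return saturation_vapor_pressure_kpa(t_leaf_c) - e_air


def rightmost_rule(t_leaf_c, t_air_c, rh_percent):
    """Apply only the rightmost rule of the binary tree (Fig. 2).

    Returns ("WS", 0.95) when the rule fires, otherwise None because
    the remaining branches are not given in the text.
    """
    tdiff = t_leaf_c - t_air_c  # assumed sign convention: leaf minus air
    vpd = leaf_to_air_vpd_kpa(t_leaf_c, t_air_c, rh_percent)
    if tdiff > 0.64 and vpd > 1.7:
        return ("WS", 0.95)
    return None


if __name__ == "__main__":
    # Hypothetical readings: warm leaf, 30 degC air, 60% relative humidity.
    print(rightmost_rule(32.0, 30.0, 60.0))
    # Leaf cooler than air: the rule does not fire.
    print(rightmost_rule(29.0, 30.0, 60.0))
```

A control system would evaluate such a rule on each new set of sensor readings and trigger irrigation when the water deficit class is returned.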

Conclusions

The proposed CART model with Tair, VPD, and Tdiff as independent variables performed well in predicting tomato drought status. The performance of the CART model was generally better than that of the logistic model for both binary and ordinal responses. In addition, the results indicated that the CART model can classify the WW and WS statuses as well as the L, M, and H statuses for domesticated and wild tomato genotypes at different growth stages. Given the convenient measurement of its input variables, good classification performance, and intuitive visualization, the proposed CART model can be utilized as a simple and practical method to classify the drought status of diverse tomato genotypes at vegetative and reproductive stages. The proposed method requires only that air temperature, relative humidity, and plant temperature sensors be installed in the greenhouse and that the decision rules of CART be set in the system that controls the water supply. In the future, data on water shortage in the fruiting stage can be taken into consideration to further verify the reliability of the model. The performance of the proposed model can be further improved if the class imbalance problem is solved.

Methods

Experimental layout

To develop a drought stress detection method across different growth stages (vegetative and reproductive stages) and genotypes, two experiments were conducted in the 1# and 2# solar greenhouses at the Taiwan Agricultural Research Institute (TARI) located in Taichung City, Taiwan (latitude 24° 01′ N, longitude 120° 41′ E). In the 1# greenhouse, the cherry tomato cultivar ‘Tainan ASVEG No. 19’ was used between 2018 and 2019. Eight young seedlings with 6–8 fully expanded leaves were planted in baskets (50 cm × 40 cm × 30 cm) with 6D soil substrate (BVB, De Lier, The Netherlands). The experiment contained two irrigation treatments: regular watering and drought. In the regular watering treatment, tomato was irrigated daily until the field capacity was reached. For the drought treatment, no irrigation was applied after transplanting to mimic a progressive drought condition. The substrate volumetric water content (SVWC) was determined using WaterScout SM100 sensors (Spectrum Technologies, Aurora, IL, USA). Four digital sensors were inserted evenly into the substrate at a depth of 10 cm in each plastic basket. The SVWC was recorded every 30 min after the regular irrigation and drought treatments were applied to the tomato seedlings. In total, the experiment in the 1# greenhouse was performed seven times at different time points.

In the 2# greenhouse, in addition to ‘Tainan ASVEG No. 19’, the wild accession ‘LA2093’ and the large-fruit breeding line ‘108290’ were planted in peat moss during the summer of 2020. Tomatoes were planted at a density of approximately 27,900 plants/ha. Unlike in the 1# greenhouse, the irrigation treatments (regular watering and drought) in the 2# greenhouse lasted from the flowering stage to the fruit setting stage, because this period is the most sensitive to water deficits in drip irrigated tomatoes11. The study complies with relevant institutional, national, and international guidelines and legislation.

Environmental parameters and physiological data collection

For each tomato plant, 3–5 fully expanded leaves from the top of the plant were continuously measured. The leaf temperature, Tair, An, gsw, transpiration rate (E), and leaf-to-air VPD were measured using a LI-6800 portable photosynthesis system (LICOR Biosciences, Lincoln, NE, USA) at ambient air temperature (28.0–32.0 °C), air humidity (RH = 60%), reference CO2 concentration (400 μmol mol−1), and stable light intensity of 1200 μmol photons m−2 s−1 using an internal LED light source (red:blue = 9:1). Measurements were taken between 10:00 and 14:00. Data collection started when the tomato plants under the drought treatment showed clear signs of water shortage (Fig. S1), as judged visually. Clear symptoms of water shortage appeared about 2 to 3 weeks after the start of the drought treatment, when the SVWC was 7–12%. In this study, the numbers of observations for ‘Tainan ASVEG No. 19’ in 2018–2019, ‘Tainan ASVEG No. 19’ in 2020, breeding line ‘108290’, and ‘LA2093’ were 1238, 114, 86, and 98, respectively.

Relating the light-saturated stomatal conductance to environmental and physiological parameters

In this study, the relationship between the light-saturated gsw of the tomatoes and An was first established. Next, the parameters VPD, Tair, and Tdiff, which can affect or reflect stomatal closure, were related to the light-saturated gsw. To describe the relationship between light-saturated gsw and An, several models, i.e., linear regression, logarithmic curve, exponential curve, and polynomial regression, were compared using Excel 2016 to find the best one. Additionally, the Spearman correlation coefficients between light-saturated gsw and the parameters An, VPD, Tair, and Tdiff were calculated.
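The two calculations named in this subsection, a logarithmic fit and a Spearman correlation, can be sketched in plain Python as below. This is only an illustrative sketch: the study used Excel 2016 and the R `cor` function, and the function names and toy data here are hypothetical. The fit assumes the form y = a·ln(x) + b estimated by ordinary least squares.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def fit_logarithmic(x, y):
    """Least-squares fit of y = a*ln(x) + b; returns (a, b, r_squared)."""
    lx = [math.log(v) for v in x]
    mx, my = sum(lx) / len(lx), sum(y) / len(y)
    a = sum((u - mx) * (v - my) for u, v in zip(lx, y)) / sum((u - mx) ** 2 for u in lx)
    b = my - a * mx
    ss_res = sum((v - (a * u + b)) ** 2 for u, v in zip(lx, y))
    ss_tot = sum((v - my) ** 2 for v in y)
    return a, b, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical gsw and An values, for illustration only.
    gsw = [0.02, 0.05, 0.10, 0.20, 0.40]
    an = [1.0, 4.5, 8.0, 12.5, 15.0]
    print(spearman(gsw, an))
    print(fit_logarithmic(gsw, an))
```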

Data labeling

The tomato drought status was assessed using the thresholds for gsw defined by Medrano et al.8. The thresholds were defined as follows: for the binary response, WW, with gsw ≥ 0.15 mol H2O m−2 s−1, and WS, with gsw < 0.15 mol H2O m−2 s−1. For the ordinal response, L, with gsw ≥ 0.15 mol H2O m−2 s−1; M, with 0.05 ≤ gsw < 0.15 mol H2O m−2 s−1; and H, with gsw < 0.05 mol H2O m−2 s−1. The whole data were divided into four datasets according to the data sources: (1) Tainan ASVEG No. 19 (2018–2019), (2) Tainan ASVEG No. 19 (2020), (3) breeding line 108290, and (4) LA2093. The description of the four datasets is provided in Table 7. After labeling the drought statuses, the differences in physiological parameter values between drought statuses were examined. Because the Shapiro–Wilk test showed that the assumption of normality was violated, nonparametric methods were used for the comparison of different drought statuses. Mann–Whitney U and Kruskal–Wallis tests were used to examine the differences in E, An, and Tdiff values between drought statuses for binary and ordinal responses, respectively.

For the binary response, all physiological parameter values of the three genotypes differed between the WW and WS statuses. The values of E and An of WW plants were significantly higher than those of WS plants. Conversely, the values of Tdiff of WW plants were significantly lower than those of WS plants (Table S1). For the ordinal response, the values of E and An decreased with increasing drought levels, and the values of Tdiff increased with increasing drought levels. However, the physiological parameter values of the M and H statuses did not differ significantly, except for LA2093 (Table S2).
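The labeling rules of this subsection can be written as two small functions. The sketch below uses hypothetical function names; the thresholds (0.15 and 0.05 mol H2O m−2 s−1) come from the text, and the boundaries for the L and M classes are assumed to be inclusive at the lower end, in line with the binary definition of WW.

```python
def binary_status(gsw):
    """Binary drought label from gsw (mol H2O m-2 s-1)."""
    return "WW" if gsw >= 0.15 else "WS"


def ordinal_status(gsw):
    """Ordinal drought label from gsw (mol H2O m-2 s-1).

    Assumption: lower bounds of L and M are inclusive.
    """
    if gsw >= 0.15:
        return "L"
    if gsw >= 0.05:
        return "M"
    return "H"


if __name__ == "__main__":
    # Hypothetical gsw readings, for illustration only.
    for value in (0.30, 0.15, 0.10, 0.03):
        print(value, binary_status(value), ordinal_status(value))
```

By construction, every WW observation is also labeled L, and the WS class is split into M and H.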

Models building and validation

Logistic regression is a modeling approach that can be used to describe the relationship between predictor variables and a dichotomous or multi-category response variable. For the tomato drought status defined in the previous section, a logistic model with p − 1 independent variables was defined as follows:

$${\text{logit}}\left[ {\text{P}}\left( {\text{Y}} = 1 \right) \right] = \ln \left[ \frac{{\text{P}}\left( {\text{Y}} = 1 \right)}{1 - {\text{P}}\left( {\text{Y}} = 1 \right)} \right] = a + b_{1} {\text{X}}_{1} + b_{2} {\text{X}}_{2} + \cdots + b_{{\text{p}} - 1} {\text{X}}_{{\text{p}} - 1}$$
(1)

where P(Y = 1) is the probability of WS status, given the values of X1, …, Xp−1; a is an intercept; and b1, …, bp−1 are regression coefficients. Equivalently, P(Y = 1) = 1/[1 + exp(−a − b1X1 − b2X2 − ··· − bp−1Xp−1)] in the multiple logistic regression model. The logistic model is therefore linear in the predictors when expressed in logit form. For the final model, the threshold probability, i.e., the probability cutoff for assigning an observation to the WS status that gave the most accurate prediction result, was used as the classification criterion41.
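As a minimal sketch of Eq. (1), the snippet below converts a linear predictor into P(Y = 1) and applies a probability threshold. The function names, coefficients, input values, and threshold are hypothetical placeholders chosen for illustration; the fitted values of the study are those reported in Table 2 and are not reproduced here.

```python
import math


def ws_probability(intercept, coefficients, x):
    """P(Y = 1) = 1 / (1 + exp(-(a + b1*X1 + ... + b_{p-1}*X_{p-1})))."""
    linear_predictor = intercept + sum(b * xi for b, xi in zip(coefficients, x))
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def classify(probability, threshold):
    """Assign WS when the predicted probability reaches the threshold."""
    return "WS" if probability >= threshold else "WW"


if __name__ == "__main__":
    # Hypothetical intercept and coefficients for (Tair, VPD, Tdiff).
    a = -6.0
    b = [0.05, 1.5, 1.2]
    x = [30.0, 2.0, 1.0]  # hypothetical Tair (degC), VPD (kPa), Tdiff (degC)
    p = ws_probability(a, b, x)
    print(p, classify(p, threshold=0.5))
```

Taking the log-odds of the returned probability recovers the linear predictor, which is the sense in which the model is linear in logit form.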

For the k-class ordinal response data, one of the underlying assumptions of ordinal logistic regression is that the regression coefficient of each independent variable is identical across the k − 1 cumulative logits, whereas the intercepts differ (Eq. 2)41.

$$\begin{aligned} & {\text{logit}}[{\text{P}}({\text{Y}} \le j)] = \ln \left[ \frac{{\text{P}}({\text{Y}} \le j)}{1 - {\text{P}}({\text{Y}} \le j)} \right] = a_{j} + b_{1} {\text{X}}_{1} + b_{2} {\text{X}}_{2} + \cdots + b_{{\text{p}} - 1} {\text{X}}_{{\text{p}} - 1} ,\quad j = 1, 2, \ldots , k - 1; \\ & {\text{P}}({\text{Y}} \le k) = 1\;{\text{and}}\;{\text{P}}({\text{Y}} \le 0) = 0. \\ \end{aligned}$$
(2)

The probability of each drought status category (L, M, and H) is given by Eq. (3). The category with the highest predicted probability was taken as the predicted drought status41.

$${\text{P}}\left( {{\text{Y }} = j} \right) \, = {\text{ P}}({\text{Y}} \le j) - {\text{P}}({\text{Y}} \le j - 1) \, ,\quad j = { 1},{ 2},{ 3}$$
(3)
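Equations (2) and (3) can be combined into a short routine that turns the cumulative logits into category probabilities and picks the most probable class. This is a sketch under stated assumptions: the function names, intercepts, and coefficients are hypothetical, the intercepts are assumed to be increasing in j, and the mapping of j = 1, 2, 3 to L, M, H is assumed for illustration.

```python
import math


def category_probabilities(intercepts, coefficients, x):
    """P(Y = j) = P(Y <= j) - P(Y <= j - 1) for j = 1..k.

    intercepts holds a_1..a_{k-1} (assumed increasing); the shared
    coefficients are the same for every cumulative logit.
    """
    shared = sum(b * xi for b, xi in zip(coefficients, x))
    cumulative = [0.0]  # P(Y <= 0) = 0
    for a_j in intercepts:
        cumulative.append(1.0 / (1.0 + math.exp(-(a_j + shared))))
    cumulative.append(1.0)  # P(Y <= k) = 1
    return [cumulative[j] - cumulative[j - 1] for j in range(1, len(cumulative))]


def predicted_status(probabilities, labels=("L", "M", "H")):
    """Return the label with the highest probability."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return labels[best]


if __name__ == "__main__":
    # Hypothetical intercepts and a single hypothetical coefficient.
    probs = category_probabilities([-1.0, 1.0], [0.0], [0.0])
    print(probs, predicted_status(probs))
```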

The CART model is a common classifier that takes continuous or categorical variables as predictors to predict a categorical (or continuous) dependent variable, requires no distributional assumptions, and is simple to interpret43. It employs recursive partitioning, repeatedly using the predictor variables to split subsets of the dataset into two child nodes62. Starting with the entire dataset, i.e., the root node, the CART approach explores all possible values of the predictor variables to find the best predictor variable and split point for the node. The best partition is the one that minimizes the average impurity of the two child nodes. In this study, the Gini index of diversity was used to choose the best predictor at each node. The Gini index at node t, g(t), is expressed as follows:

$$g\left( t \right) \, = \mathop \sum \limits_{i \ne j} p\left( {j|t} \right)p\left( {i|t} \right)$$
(4)

where i and j are different categories of the dependent variable, and p(i|t) is the proportion of observations at node t that belong to category i.
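The Gini index of Eq. (4) and the weighted impurity of a candidate split can be computed as in the sketch below. The function names and the toy data are hypothetical, and the split convention (values at or below the threshold go to one child) is an assumption made for the example rather than the convention of the fitted trees.

```python
from collections import Counter


def gini(labels):
    """g(t) = sum over i != j of p(i|t) * p(j|t) for the labels at a node."""
    n = len(labels)
    if n == 0:
        return 0.0
    proportions = [count / n for count in Counter(labels).values()]
    return sum(
        p_i * p_j
        for i, p_i in enumerate(proportions)
        for j, p_j in enumerate(proportions)
        if i != j
    )


def split_impurity(values, labels, threshold):
    """Average Gini impurity of the two child nodes, weighted by node size."""
    first = [y for v, y in zip(values, labels) if v <= threshold]
    second = [y for v, y in zip(values, labels) if v > threshold]
    n = len(labels)
    return len(first) / n * gini(first) + len(second) / n * gini(second)


if __name__ == "__main__":
    # Hypothetical Tdiff values and drought labels.
    tdiff = [0.1, 0.2, 0.9, 1.0]
    status = ["WW", "WW", "WS", "WS"]
    print(gini(status))
    print(split_impurity(tdiff, status, threshold=0.5))
```

A tree-growing routine would evaluate `split_impurity` over all candidate predictors and thresholds and keep the split with the lowest value.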

Regardless of the method used to build the classification model, 70% of the Tainan ASVEG No. 19 (2018–2019) dataset was randomly selected as the training set, and Tair, VPD, and Tdiff were used as independent variables to build the model. The remaining 30% of the Tainan ASVEG No. 19 (2018–2019) dataset was used as the testing set for model validation. In addition, the Tainan ASVEG No. 19 (2020), breeding line 108290, and LA2093 datasets were used to validate the applicability of the models to different growth substrates, genotypes, and growth stages of the tomatoes. Since stress responses vary under different conditions64, this validation can clarify the generalizability of the proposed model65.
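A random 70/30 partition of this kind can be sketched as follows. The function name and the seed are hypothetical, and the exact sampling procedure used in the study (for example, whether it was stratified by class) is not stated, so simple random sampling is assumed.

```python
import random


def train_test_split(records, train_fraction=0.7, seed=0):
    """Shuffle the records and split them into training and testing sets."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


if __name__ == "__main__":
    # Indices standing in for the 1238 observations of the 2018-2019 dataset.
    train, test = train_test_split(range(1238))
    print(len(train), len(test))
```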

Discriminant ability of the models

For the binary class model, sensitivity, specificity, geometric mean, and accuracy were used to evaluate the model performance. These metrics are defined in Eqs. (5)–(8), respectively:

$${\text{Sensitivity }} = \frac{{{\text{TP}}}}{{{\text{TP}} + {\text{FN}}}}$$
(5)
$${\text{Specificity }} = \frac{{{\text{TN}}}}{{{\text{TN}} + {\text{FP}}}}$$
(6)
$${\text{Geometric mean }} = \sqrt {{\text{Sensitivity}} \times {\text{Specificity}}}$$
(7)
$${\text{Accuracy }} = \frac{{{\text{TP}} + {\text{TN}}}}{{{\text{TP}} + {\text{TN}} + {\text{FP}} + {\text{FN}}}}$$
(8)

where TN is true negative (when the true drought status of the tomato was “WW,” and the model also classified it as “WW”); FP is false positive (when the true drought status of the tomato was “WW,” but the model classified it as “WS”); FN is false negative (when the true drought status of the tomato was “WS,” but the model classified it as “WW”); and TP is true positive (when the true drought status of the tomato was “WS,” and the model also classified it as “WS”).
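Equations (5)–(8) translate directly into code. The sketch below uses a hypothetical function name and toy labels, treats WS as the positive class as defined above, and assumes that both classes occur among the true labels so that no denominator is zero.

```python
import math


def binary_metrics(true_labels, predicted_labels, positive="WS"):
    """Sensitivity, specificity, geometric mean, and accuracy (Eqs. 5-8)."""
    tp = fn = tn = fp = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == positive:
            if predicted == positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted == positive:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "geometric_mean": math.sqrt(sensitivity * specificity),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


if __name__ == "__main__":
    # Hypothetical true and predicted statuses.
    truth = ["WS", "WS", "WW", "WW", "WW"]
    predicted = ["WS", "WW", "WW", "WW", "WS"]
    print(binary_metrics(truth, predicted))
```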

The performance of the multi-class model was evaluated with the correctly classified percentage for each class and the overall accuracy. Assume that a dataset of size N includes k classes, that each class has ni instances (i = 1, 2, …, k), and that cij are the elements of the k × k confusion matrix, where i, j = 1, 2, …, k. The rows and columns of the matrix show the true and predicted classes, respectively. The correctly classified percentage for each class and the overall accuracy can then be defined using Eqs. (9)–(10):

$${\text{Correctly}}\;{\text{classified}}\;{\text{percentage}}\;{\text{for}}\;{\text{class}}\;i = \frac{{c_{ii} }}{{n_{i} }} ,\quad i = { 1},{ 2}, \, \ldots \, ,k$$
(9)
$${\text{Overall accuracy }} = \frac{{\mathop \sum \nolimits_{i = 1}^{k} c_{ii} }}{{\mathop \sum \nolimits_{i = 1}^{k} n_{i} }} = \frac{{\mathop \sum \nolimits_{i = 1}^{k} c_{ii} }}{N}$$
(10)

The range of the metrics described here is from 0 to 1. The closer the values of these metrics are to 1, the better the classification ability of the model. Model performance is considered acceptable if the sensitivity > 0.85, specificity > 0.85, geometric mean > 0.75, and accuracy > 90.00% for a binary response. The acceptable standard for the ordinal response model is that the overall accuracy is > 80.00%. These criteria are set based on the median (or close to the median) of previous water status classification studies7,15,19,48,66,67.
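Equations (9)–(10) and the acceptance criteria above can be sketched as follows. The function names and the example confusion matrix are hypothetical; the thresholds are those stated in the text, with accuracy expressed as a fraction rather than a percentage.

```python
def per_class_and_overall(confusion):
    """confusion[i][j] counts true class i predicted as class j.

    Returns the correctly classified fraction per class (Eq. 9)
    and the overall accuracy (Eq. 10).
    """
    per_class = [row[i] / sum(row) for i, row in enumerate(confusion)]
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return per_class, correct / total


def binary_acceptable(sensitivity, specificity, geometric_mean, accuracy):
    """Acceptance criteria for the binary response models."""
    return (
        sensitivity > 0.85
        and specificity > 0.85
        and geometric_mean > 0.75
        and accuracy > 0.90
    )


def ordinal_acceptable(overall_accuracy):
    """Acceptance criterion for the ordinal response models."""
    return overall_accuracy > 0.80


if __name__ == "__main__":
    # Hypothetical 3 x 3 confusion matrix for classes L, M, H.
    matrix = [[8, 2, 0], [1, 3, 1], [0, 1, 4]]
    per_class, overall = per_class_and_overall(matrix)
    print(per_class, overall, ordinal_acceptable(overall))
```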

Statistical analysis

The statistical analyses were implemented using the R software (version 4.1.3). Spearman correlation coefficients were calculated using the cor function. Binary logistic models were constructed using the glm function. The ordinal logistic model was built using the vglm function in the VGAM package (version 1.1-7). The CART model was implemented using the rpart function in the rpart package (version 4.1.16).