Introduction

Post-traumatic stress disorder (PTSD) is a mental disorder that can be caused by terrible life-threatening events either by witnessing or experiencing them. The traumatic events cause severe health effects that include poor life quality, worsening of physical fitness, early mortality rate as well as high psychological health comorbidity (Wallace and Sweetman 2021). PTSD is accompanied by substantial medical comorbidities which include a high risk of dementia, cardio metabolic disorders as well as chronic pain (Girgenti et al. 2021). The prevalence rate of PTSD in the United States ranges from 8 to 13% for women and 4 to 6% for men. The percentage rate of women is high due to diverse traumatic disorders like memory processing, genetics as well as emotional learning (Kaseda and Levine 2020). The symptoms of PTSD include severe anxiety, flashbacks, uncontrollable thinking about a particular event, sleeplessness, nightmare traumas, concentration lacking, emotional detachment, etc. The person affected by PTSD often evades them from various places, things and activities (Chang and Park 2020). The majority of the people experiencing traumatic events face troubles in adjusting with other persons and they don’t feel to eat and doing regular activities but with personal care and self-support, they can be recovered easily (Salehi et al. 2021).

The symptomatic principle of PTSD constitutes four symptom clusters namely the symptomatic principle of PTSD constitutes four symptom clusters namely avoidance, variation in reactivity and arousal, intrusion as well as an amendment in mood and cognition (Fenster et al. 2018). In the contemporary world, PTSD experiences a substantial tendency to enhance depending on social pressure, domestic abuse, assault, war, etc. PTSD is considered a clinical issue that influences numerous people from diverse trades all over the world. Meanwhile, PTSD also affects the children who experience stress due to severe trauma, death of close friends or a family member will be affected long duration. When the children develop such types of stress they may be diagnosed with PTSD (Lewis et al. 2019). The children who have experienced PTSD may have trouble staying organized, paying attention thereby feeling fidgety and restless. PTSD children probably consist of comorbid circumstances since the traumatic disorder occurs in diplomatic periods containing neurological consequences. The common comorbid conditions are depression, dysthymia, anxiety disorders, personality disorders, alcoholism, somatization, etc. (Lehavot et al. 2018).

Numerous research scholars recorded diverse complexities to estimate the severity of the PTSD symptoms in the patients. Therefore, diagnosing PTSD and obtaining accurate diagnosing techniques becomes a more complicated task. The accurate reasons for PTSD are difficult to comprehend but can be computed effectively by employing machine learning approaches that further help the medical practitioners to attain a quick accurate decision by enhancing the rate of accuracy, minimum cost and time related to the treatment of the patients (Haruvi-Lamdan et al. 2020). Machine learning is a domain originating from artificial intelligence that is capable of determining the most significant structures and non-obvious patterns in the data. In addition to this, the machine learning techniques provide effective results for certain issues that are difficult to model (Giannopoulou et al. 2021; Jose et al. 2021; Sundararaj and Selvi 2021; Sundararaj 2019; Sundararaj 2016; Rejeesh and Thejaswini 2020; Srinivasan and Madheswari 2018).

This paper develops a speech based post-traumatic stress disorder monitoring method and the significant objective of the proposed method is to determine if the patients are affected by PTSD. The significant contribution of this paper is illustrated below.

  • To develop a speech based post-traumatic stress disorder monitoring method to determine the PTSD affected patients.

  • Utilizing three different phases’ namely pre-emphasis phase, feature extraction phase as well as classification phase for identifying PTSD.

  • To implement XGBoost based Teamwork optimization (XGB-TWO) algorithm for classification purposes.

The rest of the paper is structured in the following section. In Sect. 2, the past literature works based on PTSD are presented. In Sect. 3, the problem definition is discussed and in Sect. 4, the proposed methodology constituting three different phases is explained. Finally, in Sect. 5, the performance evaluation and the comparative analysis are performed. In Sect. 6, the conclusion of the paper along with the future scope is presented.

Review of related works

Vinkers et al. (2021) demonstrated a successful treatment based on post-traumatic stress disorder using DNA Methylation analysis. In this paper, the data samples utilized were the extracted blood samples. Various evaluation metrics namely efficacy, time, PTSD score, mean value were employed to determine the effectiveness of the system. By evaluating the performances measures the longitudinal sampling was enhanced and the confounding risk was minimized. But there arises a consistent influence in peripheral blood cells.

The continuous detection and monitoring of the post-traumatic stress disorder (PTSD) triggering in between veterans was developed by McDonald et al. (2019). A supervised machine learning technique namely the support vector machine, random forest algorithms were employed in analyzing two different types of datasets namely post-traumatic stress disorder (PTSD). The performance rate obtained in terms of sensitivity and specificity was high. But due to high overfitting issues and fluctuations in heartbeat rate, this approach was ineffective.

Chen et al. (2021a, b) demonstrated a neural connectome prospectively encoding the risk of post-traumatic stress disorder (PTSD) symptoms during the COVID-19 pandemic situation. The datasets employed for evaluation are obtained from real-time COVID-19 epidemic data from Johns Hopkins University and Satellite Remote Sensing Image data. Moreover, the confidence interval was high with high accurate prediction. But the time consumption during the implementation process was high.

The changes in functional connectivity after theta-burst transcranial magnetic stimulation for post-traumatic stress disorder using machine learning techniques were developed by Zandvakili et al. (2021). An intermittent theta-burst stimulation technique was utilized to analyze certain performance measures like misclassification cost, classifier score and Z- scored coherence. The dataset was collected from EEG data from Providence VA Medical Center in Providence, RI, USA and the analysis was conducted to enhance the classification accuracy rate. Meanwhile, the sample size was modest with a sparse electrode system.

Moroni et al. (2021) utilized the stress-alexithymia hypothesis to investigate neuropsychiatric manifestations of systemic lupus erythematosus. Confidence interval, efficiency, mean value rate were the performance metrics employed for simulation. The experimental investigations were conducted and the analysis revealed that the quality of life was improved with high efficiency rate. But this approach failed to implement novel therapeutic approaches.

A low-cost neuro feedback-based wearable electroencephalography (EEG) for reducing the symptoms in chronic Post-Traumatic Stress Disorder was suggested by Du Bois et al. (2021). The database was collected from clinical datasets from Rwanda to evaluate the performance measures namely true positive rate, false-positive rate. The experimental results showed that the efficiency of this approach was high. On the other hand, improper treatment given to the patients was considered as a significant drawback.

Scott et al. (2021) developed a concept based on the association of traumatic brain injury, post-traumatic stress disorder as well as related synergistic factors with prodromal Parkinson’s disease. Here, a case study was conducted for 1.5 million veterans and the data for evaluation was obtained from a corporate data warehouse (CDW). The performance measures namely prevalence (%), synergy index and time duration were evaluated to obtain an efficient system with minimum cost. Meanwhile, this technique failed to suspect TBI associations.

Two different research systems namely Kleinberg’s burst detection algorithm as well as integrating unified medical language system based on post-traumatic disorder was developed by Xu et al. (2020). Cumulative frequency and burst weight were recorded by evaluating the PubMed database. The experimental investigations were carried out and the result analysis showed that the accuracy was high. But the time consumption required for implementing the system was high. The overview of the past literature works based on post traumatic stress disorder are summarized in Table 1.

Table 1 Comparative performances based on PTSD

Problem definition

Numerous research scholars recorded diverse complexities to estimate the severity of the PTSD symptoms in the patients. Hence, diagnosing PTSD and obtaining an accurate diagnosis becomes a complicated task. The features obtained from modalities apart from speech have been reviewed for detecting PTSD that includes pain prescription, physiological responses, clinical assessments, history of past trauma, etc. The modalities provide relevant data but they are complex to collect. The most significant challenges involved in speech-based PTSD diagnosing system are depicted below.

  • The sophisticated machine learning approaches and the capability of diverse categories of speech features have not been investigated fully. Various features like vocal tract features, prosodic features as well as excitation features are established for various applications based on emotional state recognition. On the other hand, it is unclear if the fusion of various feature categories would assist in diagnosing PTSD.

  • Due to limited speech corpora in PSTD, there occurs difficulty in training the diagnosing model. Also, the data regarding the patient affected by PTSD are complex to gather and hence the data required for training the complex model is unobtainable. Therefore, this paper proposes XGBoost based Teamwork optimization (XGB-TWO) algorithm to classify PTSD patients from speech signals.

Proposed methodology

This paper develops a speech based post-traumatic stress disorder monitoring method to determine if the patients are affected by PTSD. This monitoring method uses algorithms for monitoring the pattern of speech signals because PTSD perhaps reflects in a speech via depression and negative emotions. Initially, the patient’s speech signals are recorded and collected from various devices like phones, web cameras, mobile applications, etc. Figure 1 portrays the proposed framework architecture to process speech signals, diverse features thereby evaluating the patient's PTSD score. The proposed approach utilizes three different steps: pre-processing or pre-emphasis, feature extraction as well as classification to evaluate the patients affected by PTSD or not. The input speech signal is initially provided to the pre-processing phase where the speech gets segmented into frames. The speech frame is then extracted and classified using XGBoost based Teamwork optimization (XGB-TWO) algorithm.

Fig. 1
figure 1

Proposed framework to diagnose PTSD

Pre-emphasis phase or Pre-processing phase

Initially, the speech signals of the patients are divided into very short segments termed frames. Then, the speech signals are pre-processed to boost the high-frequency component. The pre-emphasis phase is an initial phase after collecting the speech signal data from the patients in order to diagnose PTSD. The pre-processing phase constitutes silence elimination, pre-emphasis, windowing as well as normalization.

Silence elimination

The speech signals collected from the patients comprise several parts of silence. These silent speech signals are not essential since it does not contain any major information. The two significant techniques employed in the removal of silence speech are short-term energy and zero-crossing rate.

Short-term energy (STE) STE refers to the energy associated with the speech signal which is time-varying in nature. The mathematical expression involved in evaluating the short-term energy \(S_{TE} \,\) with respect to the speech sample \(y(m)\) indicates the mth speech sample is represented in Eq. (1).

$$S_{TE} \, = \frac{1}{K}\,\sum\limits_{m = 1}^{K} {y(m)^{2} }$$
(1)

Zero crossing rate (ZCR) Zero-crossing rate refers to the measure of the number of times at a given time frame or interval in which the speech signal amplitude passes through the zero value.

Pre-emphasis

The speech signals constituting the pre-emphasis phase are the crucial step of pre-processing. Here in this phase, the magnitude of high signal frequency is enhanced with an increase in signal-to-noise ratio (SNR). Also, it is employed to obtain equivalent amplitude for all types of structures. A high pass filter (HPF) is used to pre-emphasize the speech signals \(y^{^{\prime}} (m)\) using Eq. (2).

$$y^{^{\prime}} (m) = y(m) - \delta \,y\,(m - 1)$$
(2)

From the above equation, \(\delta\) signifies the pre-emphasis factor ranging from \(\delta \to 0.9.\,\,to\,\,1\).\(y(m)\) indicates the mth sample speech (Ibrahim et al. 2017).

Windowing

The most commonly used windowing technique is the Hamming window that is capable of smoothening the edges and minimizing the side lobe effects. Usually, the time of hamming window is 25 ms that overlap for every 10 ms (Frid et al. 2014).

Normalization

A long-sentence speech sample of the patients comprises both high and low amplitude values and can be computed using Eq. (3).

$$N\, = \,\frac{{S_{TE} \, - \,Min\,(S_{TE} )}}{{Max\,(S_{TE} )\, - \,Min(S_{TE} )}}$$
(3)

From Eq. (3), the normalized signal obtained from the speech signal of a long sentence is denoted by \(N\)(Chen et al. 2021a, b).

Feature extraction phase

In this paper, we have employed three different features namely the prosodic feature, excitation feature as well as vocal tract feature. The extraction of such features has been evaluated to classify the style of speaking in human speech. Initially, the speech signal is subdivided into 25 ms that overlap for every 10 ms. The description of every segment is depicted below.

Prosodic features The prosodic features have been employed successfully in emotion communication as well as emotion recognition.

Excitation features The excitation features are used in recognizing genders, emotions as well as speaker identification.

Vocal tract features The vocal tract features are employed widely in recognizing voices, speeches, robust speaker recognition as well as gender recognition. The above-mentioned feature produces 54 features from every segment and the descriptions of features are explained in Table 2 (Islam et al. 2018).

Table 2 Description of speech features

Classification phase

The extracted speech features are then provided for classifying the speech signals thereby evaluating the patients affected by PTSD or not. This phase utilizes XGBoost based Teamwork optimization (XGB-TWO) algorithm to classify the PTSD patients from the speech signals. During the classification process, the speech signal datasets are trained and tested in the proportion of 80:20. The detailed description of XGBoost and Teamwork optimization algorithm to diagnose PTSD patients are described in the following sub-section.

Extreme Gradient Boosting (XGBoost) algorithm

XGBoost stands for eXtreme Gradient Boosting that recently has been dominating various machine learning algorithms and employed widely in Kaggle Higg sub-signal detection competitions (Song et al. 2020). The improved form of the gradient boost decision tree approach is the XGBoost technique modelled to enhance the performance and speed of the algorithm. In recent years, the XGBoost algorithm has created a wide focus because of high prediction accuracy and excellent efficiency. The XGBoost algorithm comprises numerous decision trees and is employed typically in various regression and classification domains. The architecture for XGBoost ensemble-based algorithm is presented in Fig. 2. On the other hand, the XGBoost algorithm differs from the gradient boost decision tree for two aspects.

  • Initially, the XGBoost algorithm includes 2nd order Tailor series, whereas the gradient algorithm employs only 1st order Tailor series with respect to the loss function.

  • Secondly, the XGBoost algorithm utilizes the normalization technique to minimize the model complexity as well as to avoid overfitting issues.

Fig. 2
figure 2

XGBoost framework

The mathematical formulation and model details of the XGBoost algorithm are illustrated in the following sub-section.

Mathematical model

Let us consider the dataset \(d = \left\{ {\left( {a_{j} ,b_{j} } \right)} \right\}\), where j = 1, 2, 3,…, N. The model is learned or trained by means of Q trees. The outcome of a model \(\left( {\hat{x}_{j} } \right)\) is expressed in Eq. (4)

$$\hat{x}_{j} = \Phi \left( {y_{j} } \right) = \sum\limits_{Q = 1}^{Q} {F_{Q} (y_{j} } ),F_{Q} \in J$$
(4)

From the above expression, hypothesis space is denoted as J and regression tree is signified as P(y).

From Eq. (1),

$$J = \left\{ {F(y) = \vartheta_{p(y)} } \right\}$$
(5)

Here, the leaf node is represented as \(\vartheta\) and the \(y^{th}\) sample leaf node is indicated as \(f(y)\). The predicted output of \(n^{th}\) iteration is mentioned in Eq. (6).

$$\hat{x}_{j}^{n} = \hat{x}_{j}^{n - 1} + F_{n} (y_{j} )$$
(6)

Then the mathematical expression based on fitness function is formulated in Eq. (7)

$$G(F_{n} ) = \sum\limits_{j = 1}^{N} \mu (x_{j} ,\hat{x}_{j}^{n - 1} ) + F_{n} (y_{j} ) + \Delta (F_{n} )$$
(7)

From the above equation, the loss function is \(\mu\) and the complexity of a model is \(\Delta (F_{n} )\) that contains score and leaf node denoted by \(\vartheta\) and R respectively. Thus,

$$\Delta (F_{n} ) = \lambda \cdot R_{n} + \gamma 1/2\sum\limits_{i = 1}^{R} {\vartheta_{i}^{2} }$$
(8)

Then the following formula is simplified using 2nd order Tailor series.

$$G(F_{n} ) = \sum\limits_{j = 1}^{N} {\left[ {V(x_{j} ,\hat{x}_{j}^{n - 1} ) + b_{j} F_{n} (y_{j} ) + 1/2\;r_{j} F_{n}^{2} (y_{j} )} \right]} + \Delta (F_{n} )$$
(9)

From the above equation, the loss function containing 1st order and 2nd order derivatives are \(b_{j}\) and \(r_{j}\) respectively.

From Eq. (9),

$$b_{j} = \frac{{\partial V(x_{j} ,\hat{x}_{j}^{n - 1} )}}{{\partial \hat{x}_{j}^{n - 1} }}$$
(10)
$$r_{j} = \frac{{\partial^{2} V(x_{j} ,\hat{x}_{j}^{n - 1} }}{{\partial \hat{x}_{j}^{n - 1} }}$$
(11)

In accordance with the analysis formulated above, the fitness function is expressed as,

$$G(F_{n} ) = \sum\limits_{j = 1}^{N} {\left[ {b_{j} \vartheta_{{p(y_{j} )}} + 1/2r_{j} \vartheta_{{p(y_{j} )}}^{2} } \right]} + \lambda R + \gamma 1/2\sum\limits_{j = 1}^{R} {\vartheta_{i}^{2} }$$
(12)

From Eq. (12), \(\vartheta_{{p(y_{j} )}}^{{}}\) signifies the training instant set in jth leaf.

The best leaf provided the structure of a current tree \(\vartheta_{i}^{ * }\) is expressed in Eq. (13).

$$\vartheta_{i}^{ * } = \frac{{ - \sum\nolimits_{{j \in K_{i} }} {b_{j} } }}{{\sum\nolimits_{{j \in K_{j} }} {r_{j} + \gamma } }}$$
(13)

The optimized fitness function containing the optimal solution is expressed in the following equation.

$$G(F_{n} ) = - 1/2\;\sum\limits_{i = 1}^{R} {\frac{{\left( {\sum\nolimits_{{j \in K_{i} }} {b_{j} } } \right)^{2} }}{{\sum\nolimits_{{j \in K_{i} }} {r_{j} + \gamma } }}} + \lambda R$$
(14)

Generally, the optimizers are the algorithms that are employed in varying the attribute of the ensemble technique (i.e. XGBoost algorithm) namely learning rate and weight to reduce the loss. Therefore, this paper utilized teamwork optimization (TWO) algorithm to optimize the weight function. The mathematical formulation and the steps involved in TWO algorithm are illustrated in the following section.

Team work optimization (TWO) algorithm

The TWO algorithm is modeled based on the basis of computer simulation of correlations and performance characteristics of team members accomplishing their duties and attaining a desirable target. The search agents in TWO algorithm is considered as team members and the relationship among the team members acts as a tool to transmit information. The mathematical formulation of TWO algorithm is illustrated in the following sub-section (Dehghani et al. 2021).

Mathematical model

In TWO algorithms, every team mate present in a team signifies an appropriate solution to the optimization issue. The matrix function containing the total number of members and rows is equivalent tothe total number of members and columns. Thus

$$Z = \left[ {\begin{array}{*{20}c} {Z_{1} } & {z_{1,1} } & \cdots & {z_{1,h} } & \cdots & {z_{1,p} } \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ {Z_{a} } & {z_{a,1} } & \cdots & {z_{a,h} } & \cdots & {z_{a,p} } \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ {Z_{T} } & {z_{T,1} } & \cdots & {z_{T,h} } & \cdots & {z_{T,p} } \\ \end{array} } \right]_{T \times p}$$
(15)

From Eq. (16), the population in matrix form is \(Z\). The ath team member in the population matrix is \(Z_{a}\). \(z_{a,h}\), \(T\) and p signifies the problem variable of ath team member, the total number of team members as well as problem variables respectively. Then the vector form of the objective function is expressed in Eq. (17).

$$J = \left[ {\begin{array}{*{20}c} {J_{1} } & {} & {J(Z_{1} )} \\ \vdots & {} & \vdots \\ {J_{a} } & | & {J(Z_{a} )} \\ \vdots & {} & \vdots \\ {J_{T} } & {} & {J(Z_{T} )} \\ \end{array} } \right]_{T \times 1}$$
(16)

The notation \(J\) and \(J_{a}\) signifies the objective function in vector form and the value of an objective function for ath team member. The following sub-section describes the performance behaviour among the team members to achieve a goal.

Supervisors

Supervisors are the members who guide and lead a team containing team members. The work performances of the supervisor will be more effective than the team members. The formula to update the supervisor step is obtained in Eq. (18).

$$Z_{a}^{U1} :z_{a,h}^{U1} = z_{a,h} + p \times (U_{h} - R \times z_{a,h} )$$
(17)
$$Z_{a} = \left\{ \begin{gathered} Z_{a}^{{U1}} ,J_{a}^{{U1}} < J_{a} \hfill \\ Z_{a} ,\;\;\;else \hfill \\ \end{gathered} \right.$$
(18)
$$U = rnd(1 + R)$$
(19)

From Eq. (20), \(Z_{a}^{U1}\) signifies the current status of the team member \(a\) in accordance with the guidance of the supervisors. The objective function value is and \(z_{a,h}^{U1}\) signifies the problem variable of ath team member in accordance with the guidance of the supervisor. \(J_{a}^{U1}\) indicates the value of an objective function. The update factor and the random numbers ranging from 0 to 1 are \(U\) and \(R\) respectively.

Team members

The members who perform their work ineffectively than the supervisors are considered as the team members.

Sharing of information

In this phase, every team member makes an attempt to enhance their performances by utilizing the experiences of other teammates to perform better than them. Therefore,

$$Z^{{m,a}} :z_{h}^{{m,a}} = \frac{{\sum\nolimits_{{i = 1}}^{{T_{a} }} {z_{{i,h}}^{{q,a}} } }}{{T_{a} }}$$
(20)
$$Z_{a}^{U2} :z_{a,h}^{U2} = z_{a,h} + p \times (z_{h}^{m,a} - R \times z_{a,h} ) \times \delta (J_{a} - J^{m,a} )$$
(21)
$$Z_{a} = \left\{ \begin{gathered} Z_{a}^{{U2}} ,J_{a}^{{U2}} < J_{a} \hfill \\ Z_{a} ,\;\;\;else \hfill \\ \end{gathered} \right.$$
(22)

From the above equations, the average of team member better than ath team member is \(Z^{m,a}\). \(J^{m,a}\) and \(z_{i,h}^{q,a}\) signifies the objective function as well as problem variable of ath team member in accordance with the team member. The current status of the team member \(a\) in this phase is \(Z_{a}^{U2}\).

Role of a supervisor on team member

Every team member tries to enhance the performance according to the guidelines and instructions are given by the supervisor.

Individual tasks

Every team member in a respective team makes an attempt to enhance their work performance using their personal effort thereby contributing more to the achievement of a particular team. Thus,

$$Z_{a}^{U3} :z_{a,h}^{U3} = z_{a,h} + ( - 0.01 + p \times 0.02) \times z_{a,h}$$
(23)
$$Z_{a} = \left\{ \begin{gathered} Z_{a}^{{U3}} ,J_{a}^{{U3}} < J_{a} \hfill \\ Z_{a} ,\;\;\;else \hfill \\ \end{gathered} \right.$$
(24)

From the above equations, the current status of the team member \(a\) in the individual task phase is \(Z_{a}^{U3}\). Figure 3 depicts the flow chart representation for the proposed XGB-TWO algorithm to diagnose PTSD.

Fig. 3
figure 3

Flow diagram for XGB-TWO approach

Results and discussions

To evaluate the performances of the speech based post-traumatic stress disorder monitoring method and to diagnose PTSD, several analyses are carried out. The proposed model is computed for diverse performance measures namely accuracy, specificity, sensitivity, and recall. Furthermore, the proposed model is compared with the existing PSTD model to determine the system efficiency. The proposed technique is implemented under the MATLAB platform.

Parameter description of the proposed approach

This section illustrates the description of parameters and their respective ranges used in proposed model experimentation. For further evaluation, the monitoring data samples are splitted into testing process and training process. During the classification process, the speech signal datasets are randomly partitioned in the proportion of 80:20 (i.e. 80% of the datasets are trained and 20% of the data is tested). The training set is employed for training the model thereby determining the hyperparameters and the testing set is utilized for evaluating the performances of the model. In addition to this, the testing set is capable of generalizing the unseen data. In the proposed model, the tuning of hyperparameters is employed to select an optimal value. Table 3 depicts the optimum ranges for certain parameters in the proposed technique.

Table 3 Proposed parameter descriptions

Evaluation metrics

Based on the evaluation of the proposed model to diagnose PTSD patients, various evaluation metrics namely accuracy, specificity, sensitivity, and recall are evaluated. The mathematical formulations involved in evaluating the simulation measures are discussed in the following section.

$$\begin{gathered} Accuracy\,A(k) = \frac{{T_{P} (k) + T_{N} (k)}}{{T_{P} (k) + T_{N} (k) + F_{P} (k) + F_{N} (k)}} \hfill \\ where\,\,k = 0,1 \hfill \\ \end{gathered}$$
(25)
$$Sensitivity\,s(k) = \frac{{T_{P} (k)}}{{T_{P} (k) + F_{N} (k)}}\,\,\,\,$$
(26)
$$Specificity\,\,S(k) = \frac{{T_{N} (k)}}{{T_{N} (k) + F_{P} (k)}}\,\,\,\,$$
(27)
$$Recall\,\,R(k) = \frac{{T_{P} (k)}}{{T_{P} (k) + F_{N} (k)}}\,\,\,\,$$
(28)

From Eq. (26) to (28), \(T_{P} ,\,\,T_{N} ,\,\,F_{P} \,\,and\,\,F_{N}\) indicates the true positive, true negative, false positive as well as false negative values correspondingly.

Dataset description

In this paper, we utilized two different types of datasets to evaluate and to classify the PSTD from the speech signals. The descriptions of two different datasets are depicted below.

Dataset 1: TIMIT dataset

The TIMIT is termed as an acoustic–phonetic continuous speech corpus which is a standard dataset employed for evaluating the automatic speech recognition system. The TIMIT dataset includes phonetic, time-aligned orthographic, and word transcriptions for every utterance.

Dataset 2: FEMH dataset

In order to detect the pathological voices and classify the disordered speech signals from the acoustic waveforms, the data are collected by Far Eastern Memorial Hospital (FEMH). Table 4 provides the description of both TIMIT and FEMH dataset.

Table 4 Dataset description

Confusion matrix

In general, the confusion matrix provides an extensive computation to describe the quality of an ensemble-based algorithm thereby solving various statistical classifications. In this article, the confusion matrix is utilized to evaluate the prediction model performances. The proposed model results are based on the true positive, true negative, false positive as well as false negative values. The true positive and negative values indicate the accurately predicted labels whereas a false positive and negative value signifies the wrongly predicted labels or mislabelled. Figure 4. depicts the confusion matrix for diagnosing PTSD. The higher the true values, the better the confusion matrix representing the accurate prediction values.

Fig. 4
figure 4

Confusion matrix

Performance analysis

Figure 5a, b illustrate the graphical analysis based on two different types of datasets namely TIMIT dataset and FEMH dataset. The experimental result for the TIMIT dataset with respect to the proposed approach is presented in Fig. 5a. The evaluation metrics namely the accuracy, specificity, sensitivity as well as recall are plotted and the performance rate of each metrics is noted. From the graph, it can be seen that the performance rate achieved in terms of accuracy is 97.5% and for specificity, sensitivity and recall, the performance rate achieved is 95.2%, 94.5% as well as 93.8% respectively.

Fig. 5
figure 5

Performance analysis of a TIMIT dataset and b FEMH dataset

Similarly, Fig. 5b presents the graphical evaluation based on FEHM dataset for the proposed XGB-TWO approach with respect to accuracy, specificity, sensitivity as well as recall. The experimental investigations are conducted and from the graphical analysis, it is shown that the proposed technique with respect to FEHM dataset achieved an accuracy rate of 96.29%, specificity rate of 93.53%, sensitivity rate of 92.69% as well as recall rate of about 90.4%. Therefore, from the evaluation results of both datasets, the performance rate of the TIMIT dataset achieved is high than the other dataset.

Figure 6a, b demonstrates the receiver operating characteristic (ROC) curve for the training and the testing datasets. The ROC curve indicates the graphical presentation for evaluating the binary classifiers. The ROC curve connects the ROC points independent of error cost and class distribution. In addition to this, the ROC curve demonstrates the classifier characteristics as well as the predictive performance at diverse probability levels. The area present under the ROC curve is referred to as area under curve (AUC). Figure 6a depicts the graphical plotting of a ROC curve for the training dataset with respect to the true positive and true negative values. The graphical curve is plotted and the result analysis emphasized that the AUC curve rate achieved is 0.98. Figure 6b deliberates the graphical analysis of a ROC curve for the testing dataset with respect to the true positive and true negative values. From the graphical analysis, it is demonstrated that the testing dataset covers a very wide range achieving a 0.99 AUC rate.

Fig. 6
figure 6

ROC evaluation results for a training, b testing datasets

Figure 7 presents the performance analysis of the proposed XGB-TWO approach for diverse performance measures namely accuracy, specificity, sensitivity as well as recall. The experimental investigation is carried out and the evaluation results demonstrated that the accuracy rate achieved for the proposed technique is 98.25%, specificity value obtained is 97.57%, sensitivity rate is 96.3% and recall value obtained is 94.18%. Thus from the experimental analysis, it is well known that the proposed technique achieved better results.

Fig. 7
figure 7

Proposed results for diverse approaches

Comparative results

In general, the ROC curve connects the points in the ROC space apart from error cost and class distribution. Figure 8 depicts the comparative graphical performances based on AUC rate for various techniques namely support vector machine (SVM) [13], deep belief neural network (DBN) (Banerjee et al. 2019), Logistic regression (Lenferink et al. 2020) as well as the proposed XGB-TWO approaches. The graph is plotted for true positive value and false positive value where the x-axis signifies the false positive value and the y-axis indicates the true positive value. The investigations are performed and the analysis revealed that the AUC rate of the proposed technique is better with 0.99 than other techniques.

Fig. 8
figure 8

ROC graphical representation for various techniques

Figure 9a–d depicts the graphical representation based on diverse evaluation metrics namely accuracy, specificity, sensitivity as well as recall for different techniques namely SVM, DBN, logistic regression as well as the proposed XGB-TWO techniques. The experimental analysis is carried out for each respective metric and from the resultant output, it is shown that the accuracy rate achieved for the proposed technique is 98.25%, specificity value obtained is 97.57%, sensitivity rate is 96.3% and recall value obtained is 94.18%. Thus, from the analysis, it is proven that the proposed technique performs well than other techniques.

Fig. 9
figure 9

Comparative results for a accuracy, b specificity, c sensitivity, d recall

The research limitations of the proposed approach consists of few imprecision and low confidence since the effect of suicidal intentions are not discussed and the findings are very low. The complications are not identified systematically in which the rate seems to be very low. The significant limitations of this paper are that the dataset containing small sample size is implemented and failed to evaluate large number of datasets. In addition to this, high dropout rate that influences the results and it should be taken into consideration.

Conclusion

This paper proposed a speech based post-traumatic stress disorder monitoring method to determine if the patients are affected by PTSD. Numerous research scholars recorded diverse complexities to estimate the severity of the PTSD symptoms in the patients. The accurate reasons for PTSD are difficult to comprehend but can be computed effectively by employing machine learning approaches. Also, the data regarding the patient affected by PTSD are complex to gather and hence the data required for training the complex model is unobtainable. The proposed approach utilizes three different steps: pre-processing or pre-emphasis, feature extraction as well as classification to evaluate the patients affected by PTSD or not. The pre-emphasis phase is an initial phase after collecting the speech signal data from the patients in order to diagnose PTSD. The pre-processing phase constitutes silence elimination, pre-emphasis, windowing as well as normalization. The extracted speech features are then provided for classifying the speech signals thereby evaluating the patients affected by PTSD or not. This phase utilizes XGBoost based Teamwork optimization (XGB-TWO) algorithm to classify the PTSD patients from the speech signals. Finally, the performances of the speech based post-traumatic stress disorder monitoring method and to diagnose PTSD, several analyses are carried out. The proposed model is computed for diverse performance measures namely accuracy, specificity, sensitivity, and recall. Finally, the comparative graphical performances based on diverse metrics for various techniques namely support vector machine (SVM), deep belief neural network (DBN), Logistic regression as well as the proposed XGB-TWO approaches are performed. The evaluation results demonstrated that the accuracy rate achieved for the proposed technique is 98.25% whereas the other techniques are comparatively low than the proposed approach. In this paper, we have computed the proposed approach on two small datasets and it will be extended in the future to validate on large datasets. In addition to this, a hybrid metaheuristic approach will be proposed to enhance the effectiveness of the system.