Computerized content analysis of speech plus speech recognition in the measurement of neuropsychiatric dimensions

https://doi.org/10.1016/j.cmpb.2004.08.002

Summary

The Psychiatric Content Analysis and Diagnosis (PCAD) program performs automated content analysis of machine-readable transcriptions of speech samples to measure the magnitude of neuropsychiatric states and traits. Computerized speech recognition may offer an alternative to labor-intensive manual transcription when preparing samples for PCAD processing. To test this hypothesis, 25 digitally recorded verbal samples were transcribed both manually and by a commercially available speech recognition software package, and both sets of transcriptions were scored by PCAD. The inter-correlations between scores derived from the two transcription methods were mixed, with values ranging from a high of 0.920 to a low of −0.119.
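As a rough illustration of the comparison described above, the agreement between the two transcription methods on a single scale can be expressed as a correlation coefficient across samples. The sketch below assumes a Pearson product-moment correlation, which this summary does not specify, and uses invented scores; the function name `pearson_r` and the variables `manual_scores` and `recognized_scores` are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and of cross-products.
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return sxy / math.sqrt(sxx * syy)


# Invented scores for one scale; NOT data from the study.
manual_scores = [1.0, 2.0, 3.0, 4.0]       # scored from manual transcripts
recognized_scores = [1.1, 1.9, 3.2, 3.8]   # scored from speech recognition transcripts

r = pearson_r(manual_scores, recognized_scores)
print(round(r, 3))
```

Repeating this calculation for each content analysis scale would give one coefficient per scale, which is the kind of range of values the summary reports.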

Introduction

A computerized method of content analysis of speech for measuring the magnitude of neuropsychiatric states and traits has been repeatedly validated and reported on by its developers and by other users [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13]. This procedure, which uses manually prepared machine-readable transcriptions of verbal samples, can provide measurements on a large number of neuropsychiatric dimensions, compare the scores obtained from any verbal sample with previously established norms for each content analysis scale, and indicate by how many standard deviations the score differs from these norms [7], [8], [9], [10], [11], [12], [13]. The norms were obtained from verbal samples provided by mentally and physically healthy adults and children [2], [9], [10]. The content analysis scales validated thus far include: total anxiety (and six subscales), hostility outward, hostility inward, ambivalent hostility, depression (and seven subscales), social alienation–personal disorganization, cognitive impairment, human relations, dependency strivings and frustration, hope, achievement strivings, health/sickness, and quality of life.
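The comparison with norms described above amounts to a standardized difference. A minimal sketch follows, assuming each scale's norm is summarized by a mean and a standard deviation; the function name and the numeric values are hypothetical and are not taken from the published norms.

```python
def deviations_from_norm(score: float, norm_mean: float, norm_sd: float) -> float:
    """Return how many standard deviations `score` lies from the normative mean."""
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    return (score - norm_mean) / norm_sd


# Invented values for illustration only; real norms come from the cited studies.
z = deviations_from_norm(score=2.5, norm_mean=1.5, norm_sd=0.5)
print(z)  # 2.0, i.e. two standard deviations above the hypothetical norm
```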

The Psychiatric Content Analysis and Diagnosis (PCAD) program relies on a large dictionary of English words (over 300,000) marked with syntactic information, such as part of speech and number (singular/plural), supplemented by a collection of idiomatic and slang (mostly American) English expressions. Some of the words and all of the idioms are marked as possible indicators of semantic content relevant to one or more of the scales.

Using syntactic information extracted from the dictionary, a parser [14] prepares an analysis of the structure of each input clause. The parser is specifically designed to be tolerant of input that is not strictly grammatical.

When a word or phrase in an input is found in the dictionary as a possible marker of an item from a scale, a partial score is added to a list of scoring candidates. This list of candidates is then examined by a set of scale-dependent procedures that consider the clause structure as well as the scoring candidates to decide the validity of each potential score. Candidate scores approved by this process are emitted as tallies applicable to the input clause, together with weighting as defined by the scale. The weighted scores for individual clauses are aggregated to provide an overall score for the sample.
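The flow described above (dictionary lookup, candidate list, scale-dependent validation, weighted tallies, aggregation) can be sketched in a few lines. This is a toy illustration under stated assumptions, not PCAD's implementation: the marker dictionary, the item names, the weights, and the validation rule in `is_valid` are all invented, and clauses are represented simply as lists of lowercase words rather than parsed structures.

```python
from dataclasses import dataclass

# Hypothetical marker dictionary: word -> (scale, item, weight). Not PCAD's data.
MARKERS = {
    "afraid": ("anxiety", "diffuse", 1.0),
    "died": ("anxiety", "death", 3.0),
}


@dataclass
class Candidate:
    scale: str
    item: str
    weight: float
    word: str


def find_candidates(clause):
    """Add a scoring candidate for every clause word found in the dictionary."""
    return [Candidate(*MARKERS[w], word=w) for w in clause if w in MARKERS]


def is_valid(candidate, clause):
    """Stand-in for a scale-dependent check that inspects clause structure.

    Arbitrary placeholder rule: reject candidates from one-word fragments.
    The real procedures are not described in enough detail to reproduce.
    """
    return len(clause) >= 2


def score_clause(clause):
    """Return the candidates approved for this clause."""
    return [c for c in find_candidates(clause) if is_valid(c, clause)]


def score_sample(clauses):
    """Aggregate weighted clause tallies into one total per scale."""
    totals = {}
    for clause in clauses:
        for c in score_clause(clause):
            totals[c.scale] = totals.get(c.scale, 0.0) + c.weight
    return totals


sample = [["i", "was", "afraid"], ["my", "dog", "died"], ["afraid"]]
print(score_sample(sample))  # {'anxiety': 4.0}
```

Keeping candidate detection separate from validation mirrors the two-stage description in the text: the dictionary only proposes scores, and a later step decides which of them stand.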

A human scorer of content analysis can distinguish semantic messages communicated in context across several grammatical clauses. The software program, by contrast, can at present analyze the meaning conveyed by context only within each grammatical clause. Thus, computer scoring may miss some aspects of the meaning being conveyed when two or more clauses are required to discern the semantics of the message.

By scoring large numbers of verbal samples both manually, using expert human scorers, and automatically, using the computer program, we have developed regression formulae that convert computer-derived content analysis scores into human-equivalent scores. These computer-to-human conversions provide some correction for the problem of missed scores. Since the norms we have obtained for the content analysis scale scores from adults and children are based on expert human scorers, the converted machine-derived scores may be compared to the human-derived norms whenever it is desirable to determine how many standard deviations from our norms any newly obtained content analysis scores lie.
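A conversion of this kind can be illustrated with an ordinary least-squares fit. The sketch below assumes a simple linear form (human-equivalent score = slope × computer score + intercept), which the text does not state, and the paired scores are invented; `fit_line` and `to_human_equivalent` are hypothetical names.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def to_human_equivalent(computer_score, slope, intercept):
    """Convert a computer-derived score using the fitted line."""
    return slope * computer_score + intercept


# Invented paired scores for one scale; NOT data from the study.
computer_scores = [1.0, 2.0, 3.0, 4.0]
human_scores = [1.5, 2.5, 3.5, 4.5]

slope, intercept = fit_line(computer_scores, human_scores)
print(to_human_equivalent(2.0, slope, intercept))
```

In this invented data the human scores run consistently higher than the computer scores, so the fitted intercept is positive, which is the direction of correction one would expect if the program misses some scorable content.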

The software system generates four distinct classes of output: (1) an interlinear listing of each grammatical clause and the scores assigned to it; (2) a scoring summary for each content analysis scale, including a comparison to established norms for each scale; (3) an analysis or interpretation, in textual form, of the scale scores; and (4) possible neuropsychiatric diagnostic classifications, taken from the fourth edition of the American Psychiatric Association's Diagnostic and Statistical Manual [15], that a clinician might consider in evaluating the subject or patient.

In ordinary use, recorded verbal samples are transcribed manually into computer-readable text for input to PCAD. The time and expense of this manual data preparation process place limits on the utility of the scoring program as a research tool. The current study investigates whether computerized speech recognition can be used to obtain such measurements and make them easier to achieve.

Methods

In a study funded by the National Institute on Drug Abuse to test the capacity of our computerized methodology to detect and measure cognitive impairment and comorbid neuropsychiatric dimensions in detoxified drug-abusing inpatients [12], digital recording equipment was used to collect verbal samples from these inpatients. The samples were given in response to purposely ambiguous instructions to talk for 5 min about any interesting or dramatic personal life experiences.

A staff member transcribed all recorded speech

Results

The inter-correlations between the content analysis scores derived from transcriptions by a human typist and by DNS are both encouraging and discouraging. The findings for some content analysis scales are encouraging in that they indicate that PCAD recognizes very similar and complicated mental constructs, whether transcribed from spoken human language by a person or by DNS. Promising correlations were found on the death anxiety, mutilation anxiety, separation anxiety, diffuse anxiety, total

Discussion

Perhaps too much was expected of the current speech recognition technology. There are several factors that could contribute to the differences seen in scores assigned to samples produced by the different transcription methods. The first factor is differences in word count. Assuming scorable content to be evenly distributed throughout a sample, a difference in length should lead to a corresponding difference in opportunities to detect scorable content instances. However, the numbers (see Table 3

Conclusions

On balance, our findings indicate that there are significant inter-correlations between computer content analysis scores derived from human transcriptions and those derived from DNS transcriptions for a considerable number of neurobiological dimensions. These observations favor pursuing more research of this kind. However, analysis of the transcripts themselves indicates that DNS is not particularly accurate in transcribing recorded verbal samples, leaving substantial room for
