Speech Communication

Volume 92, September 2017, Pages 1-12

A knowledge graph based speech interface for question answering systems

https://doi.org/10.1016/j.specom.2017.05.001

Abstract

Speech interfaces to conversational systems have been a focus in academia and industry for over a decade due to their applicability as a natural interface. Speech recognition and speech synthesis constitute the important input and output modules, respectively, for such spoken interface systems. In this paper, the speech recognition interface for question answering applications is reviewed, and existing limitations are discussed. Existing spoken question answering (QA) systems use an automatic speech recogniser by adapting acoustic and language models for the speech interface, and off-the-shelf language processing systems for question interpretation. In the process, the impact of recognition errors and language processing inaccuracies is neglected. A simple concatenation of a speech recogniser and a natural language processing system is a shallow method for building a speech interface; an effort beyond merely concatenating these two units is required to develop a successful spoken question answering system. This paper illustrates how a semantically rich knowledge graph can be used to solve problems specific to automatic speech recognition and language processing, and how knowledge graph based structured data can be used to build a unified system combining speech recognition and language understanding. This facilitates the use of a semantically rich data model for the speech interface.

Introduction

A question answering system can be defined as a system which searches a knowledge base for a suitable answer to a question posed by the user. The answer may be a single word, a sentence snippet, a well-constructed and meaningful sentence, or a collection of sentences with logical coherence. The answer type depends on the application for which the question answering system is developed. A question answering (QA) system can be built on different paradigms: database querying, information retrieval, and knowledge graphs. The database querying method typically involves developing a question-answer pair database for a specific domain and then fetching the answer depending on the user question. Information retrieval usually means finding unstructured data, i.e. text, that satisfies the user query from large collections of data, usually on the Web. The knowledge graph based approach involves a semantic analysis of the query followed by access to a structured database (Jurafsky and Martin, 2008). The database can be a fully relational database or a simple structured database such as a set of Resource Description Framework (RDF) triples. A speech interface to the above QA systems is generally called spoken question answering. A general pipeline of spoken question answering is illustrated in Fig. 1.
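To make the knowledge graph based paradigm concrete, the following is a minimal sketch of factoid QA over RDF-style triples. The toy graph and the helper name (answer_factoid) are illustrative assumptions, not part of the paper:

    # Minimal sketch of knowledge graph based factoid QA (illustrative only).
    # The knowledge graph is modelled as a set of (subject, predicate, object)
    # triples, mirroring the RDF data model mentioned above.
    TRIPLES = {
        ("Berlin", "capitalOf", "Germany"),
        ("Paris", "capitalOf", "France"),
        ("Germany", "population", "83000000"),
    }

    def answer_factoid(predicate, obj):
        """Return all subjects s such that (s, predicate, obj) is in the graph."""
        return [s for (s, p, o) in TRIPLES if p == predicate and o == obj]

    # "What is the capital of Germany?" maps, after semantic analysis of the
    # question, to the pattern (?x, capitalOf, Germany):
    print(answer_factoid("capitalOf", "Germany"))  # ['Berlin']

The semantic analysis step, i.e. mapping the natural language question to such a triple pattern, is exactly where recognition errors and language processing inaccuracies propagate into the answer.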

A speech interface to a question answering system makes the interaction more natural. To achieve this, the most common approach is to combine an automatic speech recognition and a speech synthesis unit with a question answering system. The development of automatic speech recognition (ASR) and speech synthesis (SS) modules is straightforward given a database of question-answer pairs. The complexity increases if a QA system is based on information retrieval or on a knowledge graph, since it involves a semantic analysis of the question. The challenges involved in developing a spoken QA system are reviewed in Rosso et al. (2012). The ASR and SS systems have their own dependencies and unsolved research problems, adding to the complexity of QA system development. In this paper, we address only the speech recognition part of the interface, which is the first step of a spoken QA process and also more complicated than speech synthesis because of the need to recognise multiple speakers; a speech synthesiser, in contrast, can be built around a single speaker. The core research challenges in ASR (e.g. multiple speakers, ambient noise, different dialects) are not the focus of this paper; the main focus is on the interface between a speech recogniser and a QA system. It has to be noted that many question answering systems which integrate speech recognition without speech synthesis in the overall architecture are also called "speech interfaced question answering" or "spoken question answering". We follow the same terminology, as our focus is only on the speech recognition interface. Spoken question answering can also be developed for spoken documents, but the focus of this paper is only on text documents.

A review of existing spoken question answering systems is presented in this paper, and their limitations are discussed. Existing spoken QA systems use ASR by adapting the acoustic and the language model to a pre-determined set of QA data. After the recognition step, language processing is carried out to interpret the recognised questions. In most of the methods discussed in the review section, the focus is on developing a spoken QA system rather than on addressing recognition errors or language processing faults. Language processing involves part-of-speech tagging, parsing, named entity extraction, relation extraction, sequence labelling, etc., and is integrated with the ASR output depending on the QA application. If the input to QA is an entity from a recognised sentence, then an off-the-shelf entity recogniser is integrated. The recognition accuracy and the efficiency of the language processing unit play an important role in the overall success of a spoken QA system. To address problems associated with the speech interface to QA, we use a knowledge graph and demonstrate with examples how it can solve problems related to automatic speech recognition and spoken language understanding (we refer to language understanding as spoken language understanding (SLU) here, since it processes the recognised utterance). The ASR-specific problems include rescoring to reduce the word error rate, handling out-of-vocabulary and rarely seen words, language model creation, and multi-domain speech recognition; the SLU-specific problems include slot filling and intent detection, entity relation extraction, and unsupervised data collection for training and testing. In the literature, the knowledge graph concept has been used for SLU (discussed in Section 7.2), but to our knowledge we are the first to demonstrate its use for ASR-specific problems; a simple rescoring sketch is given below. We also demonstrate how a knowledge graph can be used as a common data model to build a unified system combining speech recognition and language understanding, for which we use the term "Spoken Language Recognition and Understanding" (SLRU). A few works in the literature consider ASR and SLU as one unit (Yaman et al., 2008), but no existing method describes the speech interface within this framework. In SLRU we make use of the semantics in the knowledge graph to address recognition errors and to interpret the text.
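As a concrete illustration of the rescoring idea, the following sketch reranks an ASR N-best list by rewarding hypotheses that mention entities present in the knowledge graph. The entity set, the weight alpha, and the function names (kg_entity_bonus, rescore) are assumptions for illustration, not the paper's actual method:

    # Minimal sketch: knowledge graph based N-best rescoring (illustrative only).
    # Hypotheses that mention known KG entities get a score bonus, so a
    # semantically plausible hypothesis can overtake an acoustically better one.
    KG_ENTITIES = {"barack obama", "michelle obama", "white house"}

    def kg_entity_bonus(hypothesis):
        """Count KG entities mentioned in the hypothesis."""
        text = hypothesis.lower()
        return sum(1 for e in KG_ENTITIES if e in text)

    def rescore(nbest, alpha=2.0):
        """nbest: list of (hypothesis, asr_score) pairs; higher score is better."""
        return max(nbest, key=lambda h: h[1] + alpha * kg_entity_bonus(h[0]))

    nbest = [("who is the wife of barrack alabama", -10.2),  # ASR-preferred
             ("who is the wife of barack obama", -10.9)]     # KG-preferred
    print(rescore(nbest))  # ('who is the wife of barack obama', -10.9)

The same KG entity inventory also addresses the out-of-vocabulary problem, since entity names absent from the training text can still be injected into the recognition vocabulary and language model.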

The motivation for speech interfaced QA is presented in Section 2. A review of conventional spoken QA systems is conducted in Section 3. The motivation for a knowledge graph based speech interface is presented in Section 4. In Section 5, the definitions of speech recognition, spoken language understanding and knowledge graphs are presented. The SLRU design is described in Section 6. In Section 7, we illustrate how the knowledge graph helps to address specific problems in ASR and SLU. The conclusion is presented in Section 8.


Motivation and problem statement

The spoken interface to a QA system involves recognising and understanding the spoken utterance before forming a query to fetch an answer. It is not realistic to assume that the recognition system will produce text without any errors, yet the QA system is typically designed to process exactly such error-free text. On the other hand, even when a QA system is designed exclusively for speech input, little research effort has been made to improve the interface beyond plugging the recogniser into the QA system.

A review of spoken question answering systems

A brief review of existing spoken question answering systems is presented in this section. The main objective of the review is to present the design and development limitations of existing spoken QA systems.

Motivation for knowledge graph-based speech interfaces

State-of-the-art question answering systems are mostly successful for a fixed domain and for factoid questions. The task of question answering becomes more challenging when it is open-domain and the questions are complex, requiring description or logical reasoning. A recent research direction is towards solving the above problem with the use of structured data. The use of structured data has advantages for understanding user queries and generating the answer by analysing the relations between entities.

Speech recognition

Speech recognition can be described as a function mapping a sequence of observations to a sequence of words, where the observations are the features extracted from the audio signal.

Let $O = \{o_1, o_2, \ldots, o_t, \ldots, o_T\}$ be the sequence of observations (features), where $1 \le t \le T$ and $T$ is the length of the observation sequence, and let $W = \{w_1, w_2, \ldots, w_k, \ldots, w_K\}$ be the word sequence, where $1 \le k \le K$ and $K$ is the number of words.

The probability of the word sequence given the observation sequence is $P(W \mid O)$, which the recogniser maximises over all candidate word sequences.
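The snippet is cut off at this point; the standard continuation of this formulation (stated here as an assumption, since the rest of the section is not shown) is the noisy-channel decomposition used in virtually all ASR systems:

    \hat{W} = \arg\max_{W} P(W \mid O)
            = \arg\max_{W} \frac{P(O \mid W)\,P(W)}{P(O)}
            = \arg\max_{W} P(O \mid W)\,P(W)

where $P(O \mid W)$ is the acoustic model, $P(W)$ is the language model, and $P(O)$ can be dropped since it does not depend on $W$. A knowledge graph can enter this formulation through $P(W)$ (language model creation and adaptation) or through rescoring of the resulting N-best hypotheses.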

Spoken language recognition and understanding (SLRU) – a unified system

As described in Section 3, the existing spoken QA systems do not focus on the impact of recognition errors or on the efficiency of the language processing unit in the QA system. If the ASR system fails to recognise an important entity in the user utterance, the whole QA process is derailed. Similarly, if the information in a question is not interpreted correctly, then the whole meaning of the question is lost. In most of the methods, ASR and SLU are treated as two independent units, and this shallow coupling limits the overall system; a sketch of a unified alternative follows.
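As a rough illustration of what a unified SLRU step could look like, the sketch below scores (transcription, interpretation) pairs jointly against a shared knowledge graph instead of committing to the 1-best transcription first. All names and the scoring scheme are assumptions for illustration, not the architecture proposed in the paper:

    # Minimal sketch of a unified SLRU step (illustrative only): each ASR
    # hypothesis is interpreted against the knowledge graph, and the pair of
    # transcription and interpretation is scored jointly.
    TRIPLES = {("berlin", "capitalOf", "germany")}

    def interpret(hypothesis):
        """Toy SLU: return a (subject, predicate, object) query pattern, or None."""
        words = hypothesis.lower().split()
        if "capital" in words and words[-1] in {"germany", "france"}:
            return (None, "capitalOf", words[-1])  # None marks the unknown slot
        return None

    def kg_support(query):
        """1.0 if the KG can answer the query pattern, else 0.0."""
        if query is None:
            return 0.0
        _, p, o = query
        return 1.0 if any(t[1] == p and t[2] == o for t in TRIPLES) else 0.0

    def slru_decode(nbest):
        """Jointly choose the transcription and its interpretation."""
        return max(((h, interpret(h), score + kg_support(interpret(h)))
                    for h, score in nbest), key=lambda x: x[2])

    nbest = [("what is the capital of germanic", -3.1),
             ("what is the capital of germany", -3.4)]
    print(slru_decode(nbest))  # the KG-supported hypothesis wins

The design point is that the knowledge graph acts as the common data model: the same triples that answer the question also disambiguate the recognition.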

Illustration of knowledge graph use in SLRU

In this section we illustrate how the knowledge graph concept can be used to solve various problems in ASR and SLU. To our knowledge, we are the first to describe the use of a KG for ASR-specific problems; the several works in the literature that use a KG for SLU are also discussed here.
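One representative SLU use, deriving slot candidates from KG relations without hand-labelled data, could look like the following sketch; the relation inventory and function names are illustrative assumptions:

    # Minimal sketch (illustrative): unsupervised slot candidate generation from
    # knowledge graph relations. Each KG predicate becomes a candidate slot and
    # its objects become candidate slot values, so no hand-built schema is needed.
    TRIPLES = [
        ("Inception", "directedBy", "Christopher Nolan"),
        ("Inception", "releaseYear", "2010"),
        ("Interstellar", "directedBy", "Christopher Nolan"),
    ]

    def slot_candidates(triples):
        """Map each predicate (candidate slot) to its observed values."""
        slots = {}
        for _, p, o in triples:
            slots.setdefault(p, set()).add(o)
        return slots

    def tag_utterance(utterance, slots):
        """Tag substrings of the utterance that match known slot values."""
        return [(p, v) for p, vals in slots.items()
                for v in vals if v.lower() in utterance.lower()]

    slots = slot_candidates(TRIPLES)
    print(tag_utterance("movies directed by christopher nolan", slots))
    # [('directedBy', 'Christopher Nolan')]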

Conclusion

Speech interfaces to question answering (QA) systems have been a focus of research for over a decade because they are very convenient for users. Existing spoken QA systems plug the ASR module into the QA system by adapting the acoustic and the language model to a pre-determined set of questions; similarly, state-of-the-art language processing systems are used as a plug-in at the output of the ASR. In this setup, little attention is paid to recognition errors and language processing inaccuracies.

Acknowledgment

Parts of this work received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 642795 (WDAqua project).

References

  • A.L. Gorin et al.

How may I help you?

    Speech Commun.

    (1997)
  • T. Misu et al.

Dialogue strategy to clarify user's queries for document retrieval system with speech interface

    Speech Commun.

    (2006)
  • H. Schwenk

    Continuous space language models

    Comput. Speech Lang.

    (2007)
  • M. Akbacak et al.

    Rapid transition to new spoken dialogue domains: Language model training using knowledge from previous domain applications and web text resources

    Ninth European Conference on Speech Communication and Technology

    (2005)
  • M. Akbacak et al.

    Rapidly building domain-specific entity-centric language models using semantic web knowledge sources

    Fifteenth Annual Conference of the International Speech Communication Association

    (2014)
  • B. Arons

    Hyperspeech: Navigating in speech-only hypermedia

    Proceedings of the third annual ACM conference on Hypertext

    (1991)
  • Barbosa, L., Caseiro, D., Di Fabbrizio, G., Stent, A., 2011. Speechforms: From web to speech and...
  • A. Bhargava et al.

    Easy contextual intent prediction and slot detection

    Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on

    (2013)
  • F. Burkhardt et al.

“AskWiki”: Shallow semantic processing to query Wikipedia

    Signal Processing Conference (EUSIPCO), 2012 Proceedings of the 20th European

    (2012)
  • Celikyilmaz, A., Hakkani-Tür, D., Tür, G., 2011. Leveraging web query logs to learn user intent via bayesian discrete...
  • E. Chang et al.

    A system for spoken query information retrieval on mobile devices

    Speech Audio Process. IEEE Trans.

    (2002)
• Chen, Y.-N., Wang, W. Y., Rudnicky, A. I., Jointly modeling inter-slot relations by random walk on knowledge graphs...
  • C. González-Ferreras et al.

    A system for speech driven information retrieval

    Automatic Speech Recognition & Understanding, 2007. ASRU. IEEE Workshop on

    (2007)
  • C. González-Ferreras et al.

    Experiments in speech driven question answering

    Spoken Language Technology Workshop, 2008. SLT 2008. IEEE

    (2008)
• Gruenstein, A., Seneff, S., Wang, C., Scalable and portable web-based multimodal dialogue interaction with...
  • D. Hakkani-Tür et al.

    Probabilistic enrichment of knowledge graph entities for relation detection in conversational understanding

    Proceedings of INTERSPEECH

    (2014)
  • D. Hakkani-Tür et al.

    Exploiting query click logs for utterance domain detection in spoken language understanding

    Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on

    (2011)
• D. Hakkani-Tür et al.

    Using a knowledge graph and query click logs for unsupervised learning of relation detection

    Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on

    (2013)
  • S. Harabagiu et al.

    Open-domain voice-activated question answering

    Proceedings of the 19th international conference on Computational linguistics-Volume 1

    (2002)
  • L. Heck et al.

    Exploiting the semantic web for unsupervised spoken language understanding

    Spoken Language Technology Workshop (SLT), 2012 IEEE

    (2012)
  • Heck, L. P., Hakkani-Tür, D., 2013. Leveraging knowledge graphs for web-scale unsupervised semantic...
• Hu, G., Liu, D., Liu, Q., Wang, R., 2006. SpeechQoogle: An open-domain question answering system with speech...
  • R. Inoue et al.

    A question-and-answer classification technique for constructing and managing spoken dialog system

    Speech Database and Assessments (Oriental COCOSDA), 2011 International Conference on

    (2011)
  • D. Jurafsky et al.

    Speech and language processing

    (2008)
  • D. Kim et al.

Language models and dialogue strategy for a voice QA system

    18th international congress on acoustics (ICA2004)

    (2004)
  • Y. Kiyota et al.

Dialog Navigator: A spoken dialog QA system based on large text knowledge base

    Proceedings of the 41st Annual Meeting on Association for Computational Linguistics-Volume 2

    (2003)
  • M. Kohda et al.

    Speech recognition in the question-answering system operated by conversational speech

    Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP’76.

    (1976)
  • K. Komatani et al.

    Contextual constraints based on dialogue models in database search task for spoken dialogue systems.

    INTERSPEECH

    (2005)
  • R. Lau et al.

WebGalaxy: integrating spoken language and hypertext navigation.

    Eurospeech

    (1997)