A knowledge graph based speech interface for question answering systems
Introduction
A question answering system can be defined as a system which searches a knowledge base for a suitable answer to a question posed by the user. The answer may be a single word, a sentence snippet, a well-constructed and meaningful sentence, or a collection of sentences with logical coherence. The answer type depends on the application for which the question answering system is developed. A question answering (QA) system can be developed following different paradigms: database querying, information retrieval and knowledge-graph-based approaches. The database querying method typically involves building a question-answer pair database for a specific domain and then fetching the answer that matches the user question. Information retrieval usually means finding unstructured data, i.e. text, that satisfies the user query in large collections of data, usually on the Web. The knowledge-graph-based approach involves a semantic analysis of the query followed by access to a structured database (Jurafsky and Martin, 2008). The database can be a fully relational database or a simple structured database such as a set of Resource Description Framework (RDF) triples. A speech interface to the QA systems discussed above is generally called spoken question answering. A general pipeline of spoken question answering is illustrated in Fig. 1.
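To make the knowledge-graph-based paradigm concrete, the following is a minimal sketch of answering over a set of RDF triples with Python and rdflib; the toy graph, the example namespace and the hand-written SPARQL query are illustrative assumptions rather than material from the paper, in which the query would be produced by semantic analysis of the question.

```python
# Minimal sketch: answering "What is the capital of Germany?" over a toy RDF graph.
# The graph, namespace and SPARQL query are illustrative assumptions.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

EX = Namespace("http://example.org/")

g = Graph()
g.add((EX.Germany, RDF.type, EX.Country))        # Germany is a country
g.add((EX.Germany, EX.capital, EX.Berlin))       # its capital is Berlin
g.add((EX.Berlin, EX.label, Literal("Berlin")))  # human-readable label

# In a KG-based QA system this query would come from semantic analysis of the
# user's question; here it is written by hand.
query = """
    PREFIX ex: <http://example.org/>
    SELECT ?label WHERE {
        ex:Germany ex:capital ?city .
        ?city ex:label ?label .
    }
"""

for row in g.query(query):
    print(row.label)  # -> Berlin
```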
A speech interface to a question answering system makes the interaction more natural. To achieve this, the most common approach is to combine an automatic speech recognition unit and a speech synthesis unit with a question answering system. The development of automatic speech recognition (ASR) and speech synthesis (SS) modules is straightforward given a database of question-answer pairs. The complexity increases if a QA system is based on information retrieval or on a knowledge graph, since it involves a semantic analysis of the question. The challenges involved in developing a spoken QA system are reviewed in Rosso et al. (2012). The ASR and SS systems have their own dependencies and unsolved research problems, adding to the complexity of QA system development. In this paper, we address only the speech recognition part of the interface, which is the first step of a spoken QA process and also more complicated than speech synthesis because of the need to recognise multiple speakers; in contrast, a speech synthesiser can be developed for a single speaker. The core research challenges in ASR (e.g. multiple speakers, ambient noise, different dialects) are not the focus of this paper. The main focus is on the interface between a speech recogniser and a QA system. It has to be noted that many question answering systems which integrate speech recognition without speech synthesis in the overall architecture are also called “speech interfaced question answering” or “spoken question answering”. We follow the same terminology, as our focus is only on the speech recognition interface. It should be mentioned that spoken question answering can also be developed for spoken documents, but the focus of this paper is only on text documents.
A review of existing spoken question answering systems is presented in this paper, and their limitations are discussed. The existing spoken QA systems use ASR by adapting the acoustic and the language model to a pre-determined set of QA data. After the recognition step, language processing is carried out to interpret the recognised questions. In most of the methods discussed in the review section, the focus is on developing a spoken QA system rather than on addressing recognition errors or language processing faults. Language processing involves part-of-speech tagging, parsing, named entity extraction, relation extraction, sequence labelling, etc., and is integrated with the ASR output depending on the QA application. If the input to the QA system is an entity from a recognised sentence, then an off-the-shelf entity recogniser is integrated. The recognition accuracy and the efficiency of the language processing unit play an important role in the overall success of a spoken QA system. To address problems associated with the speech interface to QA, we will use a knowledge graph and demonstrate with examples how it could be used to solve problems related to automatic speech recognition and spoken language understanding (we refer to language understanding as spoken language understanding (SLU) here, since it is the processing of the recognised utterance). ASR-specific problems include rescoring to reduce the word error rate, handling out-of-vocabulary and rarely seen words, the creation of language models and multi-domain speech recognition, while SLU-specific problems include slot filling and intent detection, entity relation extraction, and unsupervised data collection for training/testing. In the literature, the knowledge graph concept has been used for SLU (discussed in Section 7.2), but to our knowledge we are the first to demonstrate its use to solve ASR-specific problems. We also demonstrate how a knowledge graph can be used as a common data model to build a unified system combining speech recognition and language understanding. We use the term “Spoken Language Recognition and Understanding” (SLRU) for such a unified system. There are a few works in the literature that consider ASR and SLU as one unit (Yaman et al., 2008), but no existing method describes the speech interface within this framework. In SLRU, we make use of the semantics in the knowledge graph to address recognition errors and to interpret the text.
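As one concrete illustration of the ASR-specific use sketched above, the snippet below shows how entities from a knowledge graph could be used to rescore an ASR n-best list so that hypotheses containing known entities are preferred; the entity set, the scores and the boost weight are hypothetical and only illustrate the idea, not the method detailed later in the paper.

```python
# Illustrative sketch (not the exact method of this paper): rescore an ASR
# n-best list by boosting hypotheses that mention knowledge-graph entities.
# Entity names, scores and the boost value are assumptions for illustration.

KG_ENTITIES = {"barack obama", "michelle obama", "white house"}

def kg_rescore(nbest, boost=2.0):
    """nbest: list of (hypothesis, asr_log_score) pairs.
    Returns the hypothesis with the best combined score."""
    def combined_score(item):
        hypothesis, asr_score = item
        bonus = sum(boost for entity in KG_ENTITIES if entity in hypothesis.lower())
        return asr_score + bonus
    return max(nbest, key=combined_score)

nbest = [
    ("who is barack alabama married to", -12.1),  # misrecognised entity, slightly better ASR score
    ("who is barack obama married to", -12.4),
]
print(kg_rescore(nbest)[0])  # -> "who is barack obama married to"
```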
The motivation for speech interfaced QA is presented in Section 2. A review of conventional spoken QA systems is conducted in Section 3. The motivation for a knowledge graph based speech interface is presented in Section 4. In Section 5, the definitions of speech recognition, spoken language understanding and knowledge graphs are presented. The SLRU design is described in Section 6. In Section 7, we illustrate how the knowledge graph helps to address specific problems in ASR and SLU. The conclusion is presented in Section 8.
Section snippets
Motivation and problem statement
The spoken interface to a QA system involves recognising and understanding the spoken utterance before forming a query to fetch an answer. It is not realistic to assume that the recognition system will produce a text without any errors and that the QA system will always receive the error-free text it is typically designed to process. On the other hand, even when the QA system is exclusively designed for speech input, not a lot of research effort is made to improve the
A review of spoken question answering systems
A brief review of existing spoken question answering systems is presented in this section. The main objective of the review is to present the design and developmental limitations in the existing spoken QA systems.
Motivation for knowledge graph-based speech interfaces
State-of-the-art question answering systems are mostly successful for a fixed domain and for factoid questions. The task of question answering becomes more challenging when it is open-domain and the questions are complex and involve description or logical reasoning. A recent research direction is towards solving the above problem with the use of structured data. The use of structured data has advantages for understanding user queries and generating the answer by analysing the relation between
Speech recognition
Speech recognition can be described as a function mapping a sequence of observations to a sequence of words. The observations are the features extracted from the audio signal.
Let $O = (o_1, o_2, \ldots, o_T)$ be the sequence of observations (features), where $1 \le t \le T$ and $T$ is the length of the observation sequence, and let $W = (w_1, w_2, \ldots, w_K)$ be the set of reference words, where $1 \le k \le K$ and $K$ is the number of words.
The probability of the word sequence given the observation sequence is $P(W \mid O)$, and the recognised word sequence is the one that maximises it: $\hat{W} = \arg\max_{W} P(W \mid O) = \arg\max_{W} P(O \mid W)\,P(W)$.
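A toy illustration of this decoding rule, with hypothetical log-probabilities for a small set of candidate word sequences (the hypotheses and values are invented for illustration):

```python
# Pick the word sequence W maximising P(O|W) * P(W), i.e. the sum of acoustic
# and language model log-probabilities. Hypotheses and values are illustrative.
hypotheses = {
    "recognise speech":   {"log_p_o_given_w": -4.2, "log_p_w": -2.0},
    "wreck a nice beach": {"log_p_o_given_w": -4.0, "log_p_w": -5.5},
}

best = max(hypotheses,
           key=lambda w: hypotheses[w]["log_p_o_given_w"] + hypotheses[w]["log_p_w"])
print(best)  # -> "recognise speech"
```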
Spoken language recognition and understanding (SLRU) – a unified system
As described in Section 3, the existing spoken QA systems do not focus on the impact of recognition errors and of the efficiency of the language processing unit on the QA system. If the ASR system fails to recognise an important entity in the user utterance, the whole QA process is altered. Similarly, if the information in a question is not interpreted correctly, then the whole meaning of the question is lost. In most of the methods, ASR and SLU are considered as two independent units and this
Illustration of knowledge graph use in SLRU
In this section we illustrate how the knowledge graph concept can be used to solve various problems in ASR and SLU. To our knowledge, we are the first to describe the use of a KG for ASR-specific problems. There are several works in the literature using a KG for SLU, which are also discussed here.
Conclusion
Speech interfaces to question answering (QA) systems have been a focus of research for about a decade because they are very convenient for users. The existing spoken QA systems plug the ASR module into the QA system by adapting the acoustic and the language model to a pre-determined set of questions. Similarly, state-of-the-art language processing systems are used as a plug-in at the output of the ASR. In this setup, less attention is paid to recognition errors and also language processing
Acknowledgment
Parts of this work received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 642795 (WDAqua project).
References (59)
- et al., How may I help you? Speech Commun. (1997)
- et al., Dialogue strategy to clarify user's queries for document retrieval system with speech interface. Speech Commun. (2006)
- Continuous space language models. Comput. Speech Lang. (2007)
- et al., Rapid transition to new spoken dialogue domains: language model training using knowledge from previous domain applications and web text resources. Ninth European Conference on Speech Communication and Technology (2005)
- et al., Rapidly building domain-specific entity-centric language models using semantic web knowledge sources. Fifteenth Annual Conference of the International Speech Communication Association (2014)
- Hyperspeech: navigating in speech-only hypermedia. Proceedings of the Third Annual ACM Conference on Hypertext (1991)
- Barbosa, L., Caseiro, D., Di Fabbrizio, G., Stent, A., 2011. Speechforms: from web to speech and...
- et al., Easy contextual intent prediction and slot detection. Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on (2013)
- et al., “AskWiki”: shallow semantic processing to query Wikipedia. Signal Processing Conference (EUSIPCO), 2012 Proceedings of the 20th European (2012)
- Celikyilmaz, A., Hakkani-Tür, D., Tür, G., 2011. Leveraging web query logs to learn user intent via Bayesian discrete...
- A system for spoken query information retrieval on mobile devices. IEEE Trans. Speech Audio Process.
- A system for speech driven information retrieval. Automatic Speech Recognition & Understanding (ASRU), 2007 IEEE Workshop on
- Experiments in speech driven question answering. Spoken Language Technology Workshop (SLT), 2008 IEEE
- Probabilistic enrichment of knowledge graph entities for relation detection in conversational understanding. Proceedings of INTERSPEECH
- Exploiting query click logs for utterance domain detection in spoken language understanding. Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on
- Using a knowledge graph and query click logs for unsupervised learning of relation detection. Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
- Open-domain voice-activated question answering. Proceedings of the 19th International Conference on Computational Linguistics, Volume 1
- Exploiting the semantic web for unsupervised spoken language understanding. Spoken Language Technology Workshop (SLT), 2012 IEEE
- A question-and-answer classification technique for constructing and managing spoken dialog system. Speech Database and Assessments (Oriental COCOSDA), 2011 International Conference on
- Speech and Language Processing
- Language models and dialogue strategy for a voice QA system. 18th International Congress on Acoustics (ICA 2004)
- Dialog Navigator: a spoken dialog QA system based on large text knowledge base. Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, Volume 2
- Speech recognition in the question-answering system operated by conversational speech. Acoustics, Speech, and Signal Processing, IEEE International Conference on (ICASSP '76)
- Contextual constraints based on dialogue models in database search task for spoken dialogue systems. INTERSPEECH
- WebGalaxy: integrating spoken language and hypertext navigation. Eurospeech