Learning in layered multimodal classifier architectures for cognitive technical systems

Date

2016-07-08

Publication Type

Dissertation

Abstract

Modern computer systems have fundamentally changed our way of living. They improve our effectiveness by assisting us in our work and daily tasks. However, current systems are limited to a direct input of commands. Furthermore, they are unable to take active decisions on behalf of the user, mostly because they lack information about the user. Cognitive technical systems (CTS) address these deficiencies by recognizing user states and the user’s environment with the help of sensor data. The derived information is collected in a knowledge base and further processed by the application and the dialog management to perform the decision making. In this thesis, new methods addressing sensor-based state recognition in the context of CTS in human-computer interaction are developed and empirically evaluated. The focus is on large multimodal and temporal multiple classifier systems. Furthermore, the work covers sequential classifiers, the handling of partially available information, and the integration of sub-symbolic and symbolic information for complex state recognition. The following approaches are presented in this work: ensemble Gaussian mixture model (EGMM), conditioned hidden Markov model (CHMM), fuzzy conditioned hidden Markov model (FCHMM), hidden Markov model using graph probability densities (HMM-GPD), Markov fusion network (MFN), Kalman filter for classifier fusion, and layered classifier architectures. The EGMM extends the classical GMM with the ensemble technique in order to achieve a more robust density estimation. The CHMM and the FCHMM extend the HMM by an additional causal sequence which influences the hidden states. The CHMM uses a sequence of discrete causes, whereas the FCHMM uses a sequence of causes with fuzzy memberships. Both approaches can further be used to integrate symbolic information. The HMM-GPD introduces graph probability densities as observations in HMMs.
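The ensemble idea behind the EGMM can be illustrated with a minimal sketch. It assumes one-dimensional data, ensemble members that have already been fitted, and a plain average of the member densities as the combination rule. The abstract does not state how the thesis combines its members, so the averaging, the function names `gmm_pdf` and `ensemble_pdf`, and the hand-picked parameters are all assumptions made for illustration.

```python
import math


def gmm_pdf(x, components):
    """Density of a 1-D GMM given as (weight, mean, std) triples."""
    return sum(
        w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
        for w, m, s in components
    )


def ensemble_pdf(x, members):
    """Average the member densities (assumed combination rule)."""
    return sum(gmm_pdf(x, c) for c in members) / len(members)


# Hypothetical, hand-picked members standing in for GMMs that were fitted
# with different numbers of components or on different resamples.
members = [
    [(1.0, 0.0, 1.0)],
    [(0.5, -1.0, 0.8), (0.5, 1.0, 0.8)],
    [(0.3, -1.5, 0.6), (0.4, 0.0, 0.7), (0.3, 1.5, 0.6)],
]

density_at_zero = ensemble_pdf(0.0, members)
```

Because each member is a valid density and the weights of the average sum to one, the averaged function is again a valid density. A single poorly chosen member then has only a limited effect on the estimate, which is one plausible reading of the robustness claimed for the EGMM.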
The MFN and the Kalman filter for classifier fusion are probabilistic algorithms for temporal and multimodal late fusion which are robust against sensor failures. Within this thesis, the unidirectional layered architecture (ULA) and the bidirectional layered architecture (BLA) are proposed. Both architectures recognize complex classes based on probabilistic logical rules and the temporal combination of basic patterns. Each layer recognizes patterns based on the class predictions of the underlying layer. Hence, upper layers recognize more complex patterns. The BLA additionally propagates information in the direction of the lower layers. The empirical evaluation of the proposed methods is performed on datasets for affective state and activity recognition, e.g. the Freetalk dataset, AVEC 2011, AVEC 2012, AVEC 2013 and UUlmMAD. The EGMM proved to be more robust and accurate than the conventional GMM approaches. It was also shown that selecting suitable parameters is considerably easier for the EGMM. Further evaluations showed that the multimodal late fusion using the CHMM outperformed the HMM on the Freetalk dataset. The HMM-GPD was studied in the field of activity recognition and showed a good view-invariant performance. The classification was performed on sequences of graphs extracted from partially occluded skeleton models. The MFN and the Kalman filter for classifier fusion were studied on the AVEC datasets and achieved good results in comparison to other approaches. Furthermore, it was shown that they outperformed classic point-wise and windowed fusion approaches. A comprehensive study analyzing the ULA showed that the FCHMM is well-suited to recognize states on different layers given unsegmented sequential data. A dynamic Markov logic network implemented the probabilistic logical rules in the uppermost layer. The thesis further presents a new dataset which was recorded in order to study the BLA.
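How a Kalman filter can fuse classifier decisions over time while tolerating sensor failures can be sketched as follows. This is a minimal illustration, not the formulation used in the thesis: it assumes a scalar latent class score with identity dynamics, one shared measurement noise `r` for all modalities, and `None` as the marker for a missing decision. The function name `kalman_fuse` and all parameter values are hypothetical.

```python
def kalman_fuse(decisions, q=0.01, r=0.1, x0=0.5, p0=1.0):
    """Fuse per-modality classifier scores over time with a scalar Kalman filter.

    decisions: list of time steps; each step is a list with one score per
               modality, or None where that modality delivered no decision.
    q, r:      assumed process and measurement noise variances.
    x0, p0:    assumed initial estimate and its variance.
    """
    x, p = x0, p0
    estimates, variances = [], []
    for step in decisions:
        # Predict: identity dynamics, so only the uncertainty grows.
        p = p + q
        for z in step:
            if z is None:
                continue  # missing decision: no update for this modality
            k = p / (p + r)        # Kalman gain
            x = x + k * (z - x)    # correct the estimate towards the score
            p = (1.0 - k) * p      # the update reduces the uncertainty
        estimates.append(x)
        variances.append(p)
    return estimates, variances


# Hypothetical scores from two modalities; the third step has no input at all.
decisions = [[0.9, 0.8], [None, 0.7], [None, None], [0.95, None]]
estimates, variances = kalman_fuse(decisions)
```

In the step where both modalities are missing, the estimate is carried over unchanged and only its variance increases. In this sketch, that is the mechanism that keeps the fused output defined during a sensor failure.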
The development of a CTS brings new challenges to the recognition of the user’s state and environment. The presented work identifies important properties in this area and proposes and evaluates methods tailored to it.

Faculties

Fakultät für Ingenieurwissenschaften, Informatik und Psychologie

Institutions

Institut für Neuroinformatik

DFG Project uulm

Keywords

Ensemble GMM, Probabilistic graphical model, Markov fusion network, Kalman filter for classifier fusion, Unidirectional layered architecture, Bidirectional layered architecture, Inequality constraint multi-class F2-support vector machine, Graph probability density, Conditional hidden Markov model, Fuzzy conditional hidden Markov model, Markov model, Data fusion, Multisensor, Graphical modelling, Kalman filtering, Multiple criteria decision making, Multisensor data fusion, DDC 000 / Computer science, information & general works