Cognitive Systems Research

Volume 66, March 2021, Pages 100-121

Cognitive Evaluation of Machine Learning Agents

https://doi.org/10.1016/j.cogsys.2020.11.003

Abstract

Advances in applying statistical Machine Learning (ML) have led to several claims of human-level or near-human performance on tasks such as image classification and speech recognition. Such claims are unscientific primarily for two reasons: (1) they incorrectly promote the notion that task-specific performance can be treated as a manifestation of General Intelligence, and (2) they are not verifiable, as there is currently no established benchmark for measuring human-like cognition in a machine learning agent. Moreover, an ML agent’s performance is influenced by the knowledge ingested into it by its human designers; its performance therefore may not reflect its true cognition. In this paper, we propose a framework that draws parallels from human cognition to measure a machine’s cognition. Human cognitive learning is well studied in developmental psychology, with frameworks and metrics in place to measure actual learning. To either believe or refute claims of human-level performance by a machine learning agent, we need a scientific methodology for measuring its cognition. Our framework formalizes the incremental implementation of human-like cognitive processes in ML agents, with the implicit goal of measuring that cognition. The framework offers guiding principles for measuring (1) task-specific machine cognition and (2) general machine cognition that spans tasks. It also provides guidelines for building domain-specific task taxonomies to cognitively profile tasks. We demonstrate the application of the framework with a case study in which two ML agents performing Vision and NLP tasks are cognitively evaluated.

Introduction

Advances in Artificial Intelligence (AI) are taking us closer to Artificial General Intelligence (AGI), which envisions a true match with the capabilities of human General Intelligence (GI). A closer look at today’s AI advances across diverse tasks such as image recognition, speech recognition, web search, autonomous driving, or playing strategy games such as Go (Silver et al., 2016) reveals that these narrow AI solutions are capable of performing specific tasks with human-level competency. But we have yet to come close to a single generic system that can do all these tasks with human-level competency. There have been a few recent attempts at having one machine learning model perform up to eight different tasks (Kaiser et al., 2017), but these are still not on par with human capabilities, where one human brain can juggle several unrelated tasks every day with remarkable success.

The paper is organized as follows. In Section 2, we briefly discuss Human Intelligence theories and their relevance to Human Learning; for readers with no background in Human Learning, Appendix A discusses learning theories at length. In Section 3, the adoption of Human Learning in current AI systems is discussed. In Section 4, we identify key differences between Human and Machine Learning. In Section 5, we propose the evaluation framework to systematically measure any ML system’s cognition. Section 6 presents the case study in which the proposed framework is applied to compare two ML agents based on cognitive capability. Section 7 concludes with the strengths and limitations of the framework and future directions for extending it.

Section snippets

Nature of Human Intelligence & Challenges in Measurement

To truly appreciate and compare the advances in AI, we need to scientifically measure the intelligence of machines. Most AI advances are narrow AI, with exceptions that try to mimic truly human-like General Intelligence (Pennachin & Goertzel, 2007) with a master machine learning algorithm (Domingos, 2015) that can learn any task. Intuitively, the benchmark for such AI should be natural human intelligence. However, measuring human intelligence itself is an unresolved problem, as we do not

Mapping of Human Learning Theories with AI Implementations

AI systems’ human designers have gained a few insights from the primary schools of human learning theory (Appendix A): Behaviorism, Cognitivism, Connectivism and Constructivism. However, it is often argued (Miller, 2003; Guerin, 2008; Hassabis et al., 2017) that findings and insights from non-computing disciplines, especially human learning, are not exploited in AI as much as they should be. To establish the ground truth of how many of these insights are adopted in AI agents, we attempt to

Measuring Human and Machine Learning

Currently, Machine Learning (ML) based AI is the most successful genre of AI algorithms, apart from Rules-based AI, Expert Systems, Knowledge Graphs and Symbolic AI, collectively called Good Old-Fashioned AI (GOFAI). It can be argued that GOFAI is not truly intelligent, as these systems simply execute their human designers’ learning and intuitions. ML-based AI is distinct from GOFAI because it is able to modify itself when presented with more data, and it is therefore less reliant on human designers to adapt
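To make this distinction concrete, here is a minimal sketch in Python (ours, not the paper’s; the toy sentiment task and the word-score scheme are illustrative assumptions). The GOFAI-style function executes fixed designer logic forever, while the ML-style class revises its own parameters as labeled data arrives.

    # Minimal sketch contrasting GOFAI and ML behavior (illustrative only).

    def gofai_sentiment(text: str) -> str:
        # GOFAI-style rule: fixed logic encoding the designer's intuition.
        # Its behavior never changes, no matter how much data it sees.
        return "positive" if "good" in text.lower() else "negative"

    class MLSentiment:
        # ML-style agent: keeps per-word scores and updates them from
        # labeled examples, so more data modifies its future behavior.
        def __init__(self) -> None:
            self.word_scores = {}

        def update(self, text: str, label: float) -> None:
            # label: +1.0 for a positive example, -1.0 for a negative one.
            for word in text.lower().split():
                self.word_scores[word] = self.word_scores.get(word, 0.0) + label

        def predict(self, text: str) -> str:
            score = sum(self.word_scores.get(w, 0.0) for w in text.lower().split())
            return "positive" if score >= 0 else "negative"

After a few update() calls, predict() can reverse its answer on the same input; gofai_sentiment never can. That capacity for self-modification is precisely the property this section ascribes to ML-based AI.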

Cognitive Evaluation Framework for Machine Learning Agents

There is a growing need to evaluate the cognition of ML agents (see Section 4.1) as they emerge as arguably the most powerful genre of AI paradigms. Traditional ML evaluation metrics are sensitive to factors such as (1) size and quality of training data, (2) nature of the target task, (3) computational resources for training models, (4) model complexity (architecture, hyper-parameters), (5) tractability of inference, and (6) ability to scale to large volumes of data. For a fair comparison of two ML
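As a hedged illustration of the problem (our sketch; the record fields simply mirror the six factors above, and comparable() is a hypothetical helper, not part of the paper’s framework), one could log every raw metric together with its confounds and permit metric-based comparison only when those confounds match:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class EvaluationRecord:
        # One agent's result on one task, annotated with the six
        # confounding factors listed above.
        agent: str
        task: str
        metric: float              # e.g., top-1 accuracy
        train_size: int            # (1) size & quality of training data
        train_quality: str
        task_nature: str           # (2) nature of the target task
        compute_budget: str        # (3) computational resources for training
        model_complexity: str      # (4) architecture, hyper-parameters
        tractable_inference: bool  # (5) tractability of inference
        scales: bool               # (6) ability to scale to large data volumes

    def comparable(a: EvaluationRecord, b: EvaluationRecord) -> bool:
        # A raw-metric comparison is fair only when the confounds match;
        # otherwise a qualitative, cognition-level evaluation is needed.
        return (a.task == b.task
                and a.train_size == b.train_size
                and a.train_quality == b.train_quality
                and a.compute_budget == b.compute_budget
                and a.model_complexity == b.model_complexity)

In practice, two published models almost never match on all six factors, which is why the framework argues for cognitive rather than purely metric-based comparison.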

Applying Cognition Evaluation Framework: Case Study to Compare ML Agents

To truly ascertain the usefulness of the proposed framework, we apply it to assess the task-specific and general cognitive capabilities of two machine learning agents. We profile the following two ML models; a hypothetical comparison sketch follows the list.

  1. NoisyStudent Model (Xie, Luong, Hovy, & Le, 2020) is the current State-Of-The-Art (SOTA) in the Image Classification task on the ImageNet dataset.

  2. “One Model To Learn Them All” (Kaiser et al., 2017), also called MultiModel, is a single deep-learning model that can simultaneously learn 8 tasks, including image
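To suggest what a cognitive comparison of these two agents might look like, here is a hypothetical Python sketch (ours; the dimension names and scores are placeholders, not the paper’s taxonomy or results) in which general cognition is aggregated across task-level profile dimensions:

    # Hypothetical cognitive profiles; dimensions and scores are
    # illustrative placeholders, not results from the paper.
    DIMENSIONS = ["perception", "memory", "transfer", "generalization"]

    profiles = {
        "NoisyStudent": {"perception": 3, "memory": 1, "transfer": 1, "generalization": 1},
        "MultiModel":   {"perception": 2, "memory": 2, "transfer": 3, "generalization": 2},
    }

    def general_cognition(profile: dict) -> float:
        # Toy aggregate: the mean over all dimensions, so breadth across
        # tasks is rewarded over a single-task peak.
        return sum(profile[d] for d in DIMENSIONS) / len(DIMENSIONS)

    for agent, profile in sorted(profiles.items()):
        print(f"{agent}: general cognition = {general_cognition(profile):.2f}")

Under such an aggregate, a single-task specialist like NoisyStudent can trail a multi-task learner like MultiModel on general cognition even while dominating it on task-specific accuracy.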

Conclusion and future work

This paper extensively reviewed the literature on human and machine learning theories and addressed the important issue of scientifically comparing Machine Learning (ML) agents based on cognitive capability. We proposed a framework that focuses on qualitatively assessing and comparing ML agents rather than on conventional performance-metric-based comparison. The key contribution of the framework is a systematic approach to modeling an ML system’s cognition that helps in objective assessment of current

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (95)

  • R.C. Anderson et al.

    Frameworks for Comprehending Discourse

    American Educational Research Journal

    (1977)
  • Ausubel, D. P., Novak, J. D., & Hanesian, H. (1968). Educational Psychology: A cognitive...
  • M.R. Ayers

    Locke versus Aristotle on natural kinds

    The Journal of Philosophy

    (1981)
  • B.S. Bloom et al.
    (1964)
  • Botchkarev, A. (2018). Performance Metrics (Error Measures) in Machine Learning Regression, Forecasting and...
  • D. Cole

    The Chinese Room Argument

  • Davis, P. M. (1991). Cognition and Learning: A Review of the Literature with Reference to Ethnolinguistic Minorities,...
  • A. Diamond

    Executive Functions

    Annual Review of Psychology

    (2013)
  • P. Domingos

    The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World

    (2015)
  • B. Draganski et al.

    Neuroplasticity: Changes in grey matter induced by training

    Nature

    (2004)
  • R.O. Duda et al.

    Pattern classification

    (2012)
  • R. Eberhart et al.

    A new optimizer using particle swarm theory

    MHS’95, Proceedings of the Sixth International Symposium on Micro Machine and Human Science

  • R.M. Gagne et al.

    Principles of Instructional Design

    (1974)
  • H. Gardner

    Frames of Mind: The Theory of Multiple Intelligences

    (2011)
  • H. Gardner et al.

    Educational Implications of The Theory of Multiple Intelligences

    Educational Researcher

    (1989)
  • Ghosh, S., Vinyals, O., Strope, B., Roy, S., Dean, T., & Heck, L. (2016). Contextual LSTM (CLSTM) models for Large...
  • J. Gläscher et al.

    Distributed neural system for general intelligence revealed by lesion mapping

    Proceedings of the National Academy of Sciences

    (2010)
  • D.E. Goldberg et al.

    Genetic Algorithms and Machine Learning

    Machine Learning

    (1988)
  • Graves, A., Wayne, G., Danihelka, I. (2014). Neural Turing Machines, CoRR...
  • F. Guerin

    Constructivism in AI: Prospects, Progress and Challenges

  • Gunderson, J., Gunderson, L. (2004). Intelligence ≠ autonomy ≠ capability, in: Proceedings of Performance Metrics...
  • D. Gunning

    Explainable Artificial Intelligence (XAI)

    Defense Advanced Research Projects Agency (DARPA), Web

    (2017)
  • D. Heckerman

    A Tutorial on Learning with Bayesian Networks

    (2008)
  • Henderson, P., Islam, R., Bachman, P., Pineau, J., Precup, D., Meger, D. (2017). Deep Reinforcement Learning that...
  • S. Hochreiter et al.

    Long short-term memory

    Neural Computation

    (1997)
  • Jing, L., Tian, Y. (2020). Self-supervised Visual Feature Learning with Deep Neural Networks: A Survey, IEEE...
  • D. Jurafsky et al.

    Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition

    (2000)
  • Kaiser, L., Gomez, A. N., Shazeer, N., Vaswani, A., Parmar, N., Jones, L., Uszkoreit, J. (2017). One Model To Learn...
  • D. Koller et al.

    Probabilistic Graphical Models: Principles and Techniques

    (2009)
  • Kotseruba, I., Tsotsos, J.K. (2018). 40 years of cognitive architectures: core cognitive abilities and practical...
  • D.R. Krathwohl

    A revision of Bloom’s taxonomy: An overview

    Theory into Practice

    (2002)
  • A. Krizhevsky et al.

    ImageNet Classification with Deep Convolutional Neural Networks

  • J.D. Lafferty et al.

    Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data

  • Y. LeCun et al.

    Deep learning

    Nature

    (2015)
  • H.J. Levesque et al.

    Expressiveness and tractability in knowledge representation and reasoning

    Computational Intelligence

    (1987)
  • Liu, F., Baldwin, T., Cohn, T. (2017). Capturing Long-range Contextual Dependencies with Memory-enhanced Conditional...
  • J.R. Lucas

    Minds, Machines and Gödel

    Philosophy

    (1961)