Cognitive Evaluation of Machine Learning Agents
Introduction
Advances in Artificial Intelligence (AI) are bringing us closer to Artificial General Intelligence (AGI), which envisions a true match with the capabilities of human General Intelligence (GI). A closer look at today’s AI advances across diverse tasks such as image recognition, speech recognition, web search, autonomous driving, or playing strategy games such as Go (Silver et al., 2016) reveals that these narrow AI solutions perform specific tasks with human-level competency. But we have yet to come close to a single generic system that can perform all of these tasks with human-level competency. There have been a few recent attempts to have one machine learning model perform up to 8 different tasks (Kaiser et al., 2017), but these are still not on par with human capabilities, where one human brain can juggle several unrelated tasks every day with remarkable success.
The paper is organized as follows. In Section 2, we briefly discuss Human Intelligence Theories and their relevance to Human Learning. For readers with no background in Human Learning, Appendix A discusses Learning theories at length. In Section 3, the adoption of Human Learning in current AI systems is discussed. In Section 4, we identify key differences between Human and Machine Learning. In Section 5, we propose an evaluation framework to systematically measure any ML system’s cognition. Section 6 presents a case study in which the proposed framework is applied to compare two computer vision agents based on cognitive capability. Section 7 concludes with the strengths and limitations of the framework and future directions to extend it.
Section snippets
Nature of Human Intelligence & Challenges in Measurement
To truly appreciate and compare the advances in AI, we need to scientifically measure the intelligence of machines. Most AI advances are narrow AI, with a few exceptions that try to mimic truly human-like General Intelligence (Pennachin & Goertzel, 2007) through a master machine learning algorithm (Domingos et al., 2015) that can learn any task. Intuitively, the benchmark for such AI should be natural human intelligence. However, measuring human intelligence is itself an unresolved problem, as we do not
Mapping of Human Learning Theories with AI Implementations
Human designers of AI systems have gained a few insights from the primary schools of human learning theory (Appendix A): Behaviorism, Cognitivism, Connectivism and Constructivism. However, it is often argued (Miller, 2003; Guerin, 2008; Hassabis et al., 2017) that findings and insights from non-computing disciplines, especially human learning, have not been exploited in AI as much as they should have been. To establish the ground truth of how many of these insights have been adopted in AI agents, we attempt to
Measuring Human and Machine Learning
Currently, Machine Learning (ML) based AI is the most successful genre of AI algorithms. The other genres, namely Rules-based AI, Expert Systems, Knowledge Graphs and Symbolic AI, are collectively called Good Old-Fashioned AI (GOFAI). GOFAI can be argued to be not truly intelligent, as these systems simply execute their human designers’ learning and intuitions. ML-based AI is distinct from GOFAI because it is able to modify itself when presented with more data and is therefore less reliant on human designers to adapt
Cognitive Evaluation Framework for Machine Learning Agents
There is a growing need to evaluate the cognition of ML agents (see Section 4.1) as they emerge as arguably the most powerful genre of AI paradigms. Traditional ML evaluation metrics are sensitive to factors such as (1) size and quality of training data, (2) nature of the target task, (3) computational resources for training models, (4) model complexity (architecture, hyper-parameters), (5) tractability of inference and (6) ability to scale to large volumes of data. For fair comparison of two ML
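As an illustration of the first factor only (size of training data), the following is a minimal sketch on synthetic data. It is not taken from the paper: the two Gaussian classes, the nearest-centroid classifier and every function name are assumptions chosen for brevity. It trains the same simple model on training sets of different sizes and reports how the mean and spread of test accuracy change.

```python
import random
import statistics


def make_data(n_per_class, rng):
    """Synthetic 1-D data (an assumption): class 0 ~ N(0, 1), class 1 ~ N(1.5, 1)."""
    class0 = [(rng.gauss(0.0, 1.0), 0) for _ in range(n_per_class)]
    class1 = [(rng.gauss(1.5, 1.0), 1) for _ in range(n_per_class)]
    return class0 + class1


def fit_centroids(train):
    """Nearest-centroid 'training': estimate one mean per class."""
    m0 = statistics.mean([x for x, y in train if y == 0])
    m1 = statistics.mean([x for x, y in train if y == 1])
    return m0, m1


def predict(x, centroids):
    """Assign x to the class whose estimated mean is closer."""
    m0, m1 = centroids
    return 1 if abs(x - m1) < abs(x - m0) else 0


def accuracy(centroids, test):
    """Fraction of test points classified correctly."""
    correct = sum(predict(x, centroids) == y for x, y in test)
    return correct / len(test)


def accuracy_by_train_size(sizes=(1, 10, 1000), trials=30, seed=0):
    """For each training size, return (mean, std) of accuracy over repeated trials."""
    rng = random.Random(seed)
    test = make_data(2000, rng)  # one fixed test set shared by all trials
    summary = {}
    for n in sizes:
        scores = [
            accuracy(fit_centroids(make_data(n, rng)), test)
            for _ in range(trials)
        ]
        summary[n] = (statistics.mean(scores), statistics.stdev(scores))
    return summary


if __name__ == "__main__":
    for n, (mean_acc, spread) in accuracy_by_train_size().items():
        print(f"train size per class={n:5d}  mean accuracy={mean_acc:.3f}  std={spread:.3f}")
```

The model and the task are identical across rows; only the amount of training data changes, yet the reported accuracy and its variability differ. A comparison of two agents based on such a metric alone therefore mixes properties of the agent with properties of the training setup.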
Applying Cognition Evaluation Framework: Case Study to Compare ML Agents
To ascertain the usefulness of the proposed framework, we apply it to assess the task-specific and general cognitive capabilities of two machine learning agents. We will profile the following two ML models.
1. The NoisyStudent Model (Xie, Luong, Hovy, & Le, 2020) is the current State-Of-The-Art (SOTA) for the Image Classification task on the ImageNet dataset.
2. “One Model To Learn Them All” (Kaiser et al., 2017), also called MultiModel, is a single deep-learning model that can simultaneously learn 8 tasks, including image
Conclusion and future work
This paper extensively reviewed the literature on human and machine learning theories and addressed the important issue of scientifically comparing Machine Learning (ML) agents based on cognitive capability. We proposed a framework that focuses on qualitatively assessing and comparing ML agents rather than on conventional comparison based on performance metrics. The key contribution of the framework is a systematic approach to modeling an ML system’s cognition that helps in objective assessment of current
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (95)
- et al. Intelligence assessment: Gardner multiple intelligence theory as an alternative. Learning and Individual Differences (2010)
- et al. An Integrated Theory of List Memory. Journal of Memory and Language (1998)
- et al. Human Memory: A Proposed System and its Control Processes
- et al. Interactions between attention and memory. Current Opinion in Neurobiology (2007)
- et al. Levels of processing: A framework for memory research. Journal of Verbal Learning and Verbal Behavior (1972)
- et al. Neuroscience-inspired Artificial Intelligence. Neuron (2017)
- The cognitive revolution: a historical perspective. Trends in Cognitive Sciences (2003)
- et al. On the cognitive process of human problem solving. Cognitive Systems Research (2010)
- et al. Rubrics: Tools for making learning goals and evaluation criteria explicit for both teachers and learners. CBE–Life Sciences Education (2006)
- et al. Educational Psychology: The Science of Instruction and Learning (1973)
- Frameworks for Comprehending Discourse. American Educational Research Journal
- Locke versus Aristotle on natural kinds. The Journal of Philosophy
- The Chinese Room Argument
- Executive Functions. Annual Review of Psychology
- The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World
- Neuroplasticity: Changes in grey matter induced by training. Nature
- Pattern classification
- A new optimizer using particle swarm theory, in: MHS’95
- Principles of Instructional Design
- Frames of Mind: The Theory of Multiple Intelligences
- Educational Implications of The Theory of Multiple Intelligences. Educational Researcher
- Distributed neural system for general intelligence revealed by lesion mapping. Proceedings of the National Academy of Sciences
- Genetic Algorithms and Machine Learning. Machine Learning
- Constructivism in AI: Prospects, Progress and Challenges
- Explainable Artificial Intelligence (XAI), Defense Advanced Research Projects Agency (DARPA)
nd Web
- A Tutorial on Learning with Bayesian Networks
- Long short-term memory. Neural Computation
- Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition
- Probabilistic Graphical Models: Principles and Techniques
- A revision of Bloom’s taxonomy: An overview. Theory into Practice
- ImageNet Classification with Deep Convolutional Neural Networks
- Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
- Deep learning. Nature
- Expressiveness and tractability in knowledge representation and reasoning 1. Computational Intelligence
- Minds, Machines and Gödel. Philosophy