Abstract
We explore a network architecture introduced by Elman (1990) for predicting successive elements of a sequence. The network uses the pattern of activation over a set of hidden units from time step t-1, together with element t, to predict element t+1. When the network is trained with strings from a particular finite-state grammar, it can learn to be a perfect finite-state recognizer for the grammar. When the network has a minimal number of hidden units, patterns on the hidden units come to correspond to the nodes of the grammar; however, this correspondence is not necessary for the network to act as a perfect finite-state recognizer. Next, we provide a detailed analysis of how the network acquires its internal representations. Through a probability analysis, we show that the network progressively encodes more and more temporal context. Finally, we explore the conditions under which the network can carry information about distant sequential contingencies across intervening elements. Such information is maintained with relative ease if it is relevant at each intermediate step; it tends to be lost when intervening elements do not depend on it. At first glance this may suggest that such networks are not relevant to natural language, in which dependencies may span indefinite distances. However, embeddings in natural language are not completely independent of earlier information. The final simulation shows that long-distance sequential contingencies can be encoded by the network even if only subtle statistical properties of embedded strings depend on the early information. The network encodes long-distance dependencies by shading the internal representations that are responsible for processing common embeddings in otherwise different sequences. This ability to represent similarities and differences between several sequences simultaneously relies on the graded nature of the representations used by the network, which contrasts with the finite states of traditional automata. For this reason, the network and other similar architectures may be called Graded State Machines.
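To make the architecture concrete, here is a minimal Python sketch of a simple recurrent network of the kind the abstract describes, trained on strings from a small finite-state grammar. The grammar below is the standard textbook layout of the Reber grammar; that layout, the 3-unit hidden layer, the learning rate, the softmax output, and the per-string context reset are illustrative assumptions rather than the paper's exact settings. At each step, the hidden pattern from time t-1 (held in a context layer) is combined with the current element t, and the weights are adjusted so the output predicts element t+1; error is propagated one step only, with the context treated as a fixed input, as in Elman's scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

# Standard textbook layout of the Reber grammar (assumed here for
# illustration): node -> [(symbol, next node), ...]; node 5 is terminal.
SYMBOLS = "BTSXVPE"
ARCS = {
    0: [("T", 1), ("P", 2)],
    1: [("S", 1), ("X", 3)],
    2: [("T", 2), ("V", 4)],
    3: [("X", 2), ("S", 5)],
    4: [("P", 3), ("V", 5)],
}

def reber_string():
    s, node = "B", 0
    while node != 5:
        sym, node = ARCS[node][rng.integers(2)]
        s += sym
    return s + "E"

def one_hot(c):
    v = np.zeros(len(SYMBOLS))
    v[SYMBOLS.index(c)] = 1.0
    return v

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

H, V, lr = 3, len(SYMBOLS), 0.1          # hidden size, vocabulary, learning rate
W_in  = rng.normal(0.0, 0.5, (H, V))     # input -> hidden
W_ctx = rng.normal(0.0, 0.5, (H, H))     # context (hidden at t-1) -> hidden
W_out = rng.normal(0.0, 0.5, (V, H))     # hidden -> next-symbol prediction

for _ in range(20000):
    s = reber_string()
    ctx = np.zeros(H)                    # context reset at string onset (assumption)
    for t in range(len(s) - 1):
        x, target = one_hot(s[t]), one_hot(s[t + 1])
        h = sigmoid(W_in @ x + W_ctx @ ctx)
        z = W_out @ h
        p = np.exp(z - z.max()); p /= p.sum()   # softmax over successors
        # One-step backpropagation: the context is treated as a fixed input,
        # so error is not propagated back through earlier time steps.
        d_out = p - target
        d_h = (W_out.T @ d_out) * h * (1.0 - h)
        W_out -= lr * np.outer(d_out, h)
        W_in  -= lr * np.outer(d_h, x)
        W_ctx -= lr * np.outer(d_h, ctx)
        ctx = h                          # copy hidden pattern into the context layer

# After "B T" the grammar is at node 1, so only S and X are legal successors;
# a trained network should concentrate its output activation on those two arcs.
ctx = np.zeros(H)
for c in "BT":
    ctx = sigmoid(W_in @ one_hot(c) + W_ctx @ ctx)
z = W_out @ ctx
p = np.exp(z - z.max()); p /= p.sum()
print(dict(zip(SYMBOLS, p.round(2))))
```

The point of the sketch is the copy-back connection: because the context layer holds a graded, continuous pattern rather than a discrete automaton state, the network can shade that pattern to carry information about earlier elements, which is the property the abstract summarizes with the term Graded State Machine.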
References
Allen, R.B. (1988). Sequential connectionist networks for answering simple questions about a microworld. Proceedings of the Tenth Annual Conference of the Cognitive Science Society.
Allen, R.B., & Riecksen, M.E. (1989). Reference in connectionist language users. In R. Pfeifer, Z. Schreter, F. Fogelman-Soulié, & L. Steels (Eds.), Connectionism in perspective. Amsterdam: North Holland.
Allen, R.B. (1990). Connectionist language users (TR-AR-90-402). Morristown, NJ: Bell Communications Research.
Cleeremans, A., Servan-Schreiber, D., & McClelland, J.L. (1989). Finite state automata and simple recurrent networks. Neural Computation, 1, 372-381.
Cottrell, G.W. (1985). Connectionist parsing. Proceedings of the Seventh Annual Conference of the Cognitive Science Society. Hillsdale, NJ: Erlbaum.
Elman, J.L. (1990). Finding structure in time. Cognitive Science, 14, 179-211.
Elman, J.L. (1990). Representation and structure in connectionist models. In G.T.M. Altmann (Ed.), Cognitive models of speech processing: Psycholinguistic and computational perspectives. Cambridge, MA: MIT Press.
Fanty, M. (1985). Context-free parsing in connectionist networks (TR 174). Rochester, NY: University of Rochester, Computer Science Department.
Hanson, S., & Kegl, J. (1987). PARSNIP: A connectionist network that learns natural language from exposure to natural language sentences. Proceedings of the Ninth Annual Conference of the Cognitive Science Society. Hillsdale, NJ: Erlbaum.
Hinton, G., McClelland, J.L., & Rumelhart, D.E. (1986). Distributed representations. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel distributed processing, I: Foundations. Cambridge, MA: MIT Press.
Jordan, M.I. (1986). Attractor dynamics and parallelism in a connectionist sequential machine. Proceedings of the Eighth Annual Conference of the Cognitive Science Society. Hillsdale, NJ: Erlbaum.
Luce, R.D. (1963). Detection and recognition. In R.D. Luce, R.R. Bush, & E. Galanter (Eds.), Handbook of mathematical psychology (Vol. I). New York: Wiley.
McClelland, J.L., & Rumelhart, D.E. (1988). Explorations in parallel distributed processing: A handbook of models, programs and exercises. Cambridge, MA: MIT Press.
Pollack, J. (in press). Recursive distributed representations. Artificial Intelligence.
Reber, A.S. (1976). Implicit learning of synthetic languages: The role of the instructional set. Journal of Experimental Psychology: Human Learning and Memory, 2, 88-94.
Rumelhart, D.E., & McClelland, J.L. (1986). Parallel distributed processing, I: Foundations. Cambridge, MA: MIT Press.
Rumelhart, D.E., Hinton, G., & Williams, R.J. (1986). Learning internal representations by error propagation. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel distributed processing, I: Foundations. Cambridge, MA: MIT Press.
Sejnowski, T.J., & Rosenberg, C. (1987). Parallel networks that learn to pronounce English text. Complex Systems, 1, 145-168.
Servan-Schreiber, D., Cleeremans, A., & McClelland, J.L. (1988). Encoding sequential structure in simple recurrent networks (Technical Report CMU-CS-183). Pittsburgh, PA: Carnegie Mellon University, School of Computer Science.
Servan-Schreiber, D., Cleeremans, A., & McClelland, J.L. (1989). Learning sequential structure in simple recurrent networks. In D.S. Touretzky (Ed.), Advances in neural information processing systems 1. San Mateo, CA: Morgan Kaufmann. [Collected papers of the IEEE Conference on Neural Information Processing Systems - Natural and Synthetic, Denver, Nov. 28-Dec. 1, 1988.]
St. John, M., & McClelland, J.L. (in press). Learning and applying contextual constraints in sentence comprehension. Artificial Intelligence.
Cite this article
Servan-Schreiber, D., Cleeremans, A. & McClelland, J.L. Graded State Machines: The Representation of Temporal Contingencies in Simple Recurrent Networks. Machine Learning 7, 161–193 (1991). https://doi.org/10.1023/A:1022647012398