Neurocomputing
Volume 7, Issue 3, April 1995, Pages 275-297
 
doi:10.1016/0925-2312(94)00027-P
Copyright © 1995 Published by Elsevier Science B.V.

Paper

A new approach to the design of reinforcement schemes for learning automata: Stochastic estimator learning algorithm

Athanasios V. Vasilakos (a, corresponding author) and Georgios I. Papadimitriou (b)

(a) Hellenic Air Force Academy, Department of Computer Science, P.O. Box 1010, Dekeleia-Attiki, Greece
(b) Department of Computer Engineering, University of Patras, 26500, Patras, Greece

Received 26 May 1992; accepted 7 March 1994; available online 7 April 2000.


Abstract

In this paper, a new approach to the design of S-model ergodic reinforcement learning algorithms is introduced. The new scheme utilizes a stochastic estimator and is able to operate in non-stationary environments with high accuracy and a high adaptation rate. Under the stochastic estimator scheme, which is the first attempt of its kind in the field, the estimates of the mean rewards of actions are computed stochastically, so they are not strictly dependent on the environmental responses. The dependence between the stochastic estimates and the deterministic estimator's contents is looser when the latter have not been updated recently. In this way, actions that have not been selected recently have the opportunity to be estimated as ‘optimal’, to increase their choice probability and, consequently, to be selected. Thus, the estimator is kept recently updated and is able to adapt to environmental changes. The performance of the presented Stochastic Estimator Learning Automaton (SELA) is superior to that of all previous well-known S-model ergodic schemes. Furthermore, it is proved that SELA is ε-optimal in every S-model random environment.

Author Keywords: Stochastic estimator; Learning window; Ergodic learning algorithm; Discretized learning algorithm; Probability slice
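
The idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' SELA scheme: the abstract gives no update equations, so everything concrete here is an assumption made for illustration. In particular, the class name and the parameters window, noise_rate, noise_max and step are hypothetical, the noise is assumed to be zero-mean Gaussian with a standard deviation that grows linearly with the time since an action was last selected, and the probability update is a simple fixed-step move toward the action with the highest stochastic estimate.

```python
import random
from collections import deque


class StochasticEstimatorSketch:
    """Illustrative sketch only; NOT the exact SELA update rules."""

    def __init__(self, n_actions, window=5, noise_rate=0.1, noise_max=0.3,
                 step=0.01, seed=None):
        self.n = n_actions
        self.rng = random.Random(seed)
        self.probs = [1.0 / n_actions] * n_actions
        # Deterministic estimator: last `window` rewards per action.
        self.windows = [deque(maxlen=window) for _ in range(n_actions)]
        # Steps since each action was last selected.
        self.age = [0] * n_actions
        self.noise_rate = noise_rate
        self.noise_max = noise_max
        self.step = step
        # Assumption: no probability drops below `step`, so every
        # action keeps being tried (ergodic behaviour).
        self.floor = step

    def choose(self):
        return self.rng.choices(range(self.n), weights=self.probs)[0]

    def stochastic_estimates(self):
        estimates = []
        for i in range(self.n):
            rewards = self.windows[i]
            mean = sum(rewards) / len(rewards) if rewards else 0.0
            # Assumed noise model: the staler the estimate, the wider
            # the noise, up to a cap.
            sigma = min(self.noise_rate * self.age[i], self.noise_max)
            estimates.append(mean + self.rng.gauss(0.0, sigma))
        return estimates

    def update(self, action, reward):
        """Record a reward in [0, 1] and shift probability mass."""
        self.windows[action].append(reward)
        for i in range(self.n):
            self.age[i] = 0 if i == action else self.age[i] + 1
        est = self.stochastic_estimates()
        best = max(range(self.n), key=est.__getitem__)
        # Move a fixed slice of probability from every other action
        # to the action currently estimated as best.
        for i in range(self.n):
            if i != best:
                moved = max(min(self.step, self.probs[i] - self.floor), 0.0)
                self.probs[i] -= moved
                self.probs[best] += moved
        return best


def run(automaton, mean_rewards, steps):
    """Drive the automaton with fixed (noise-free) rewards per action."""
    for _ in range(steps):
        action = automaton.choose()
        automaton.update(action, mean_rewards[action])
```

Calling run(automaton, [0.1, 0.2, 0.9], 2000) and then run(automaton, [0.9, 0.2, 0.1], 3000) on the same object imitates a non-stationary environment: after the switch, the noisy estimates of the neglected actions occasionally rank highest, so those actions regain probability, get selected, and have their reward windows refreshed.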
