
Neurocomputing

Volume 345, 14 June 2019, Pages 92-102

Robot skill acquisition in assembly process using deep reinforcement learning

https://doi.org/10.1016/j.neucom.2019.01.087

Abstract

Uncertain factors in the environment restrict the intelligence level of industrial robots. Based on deep reinforcement learning, a skill-acquisition method is proposed to handle the uncertainty of a complex assembly process. Within the framework of the Markov decision process, the assembly process is represented as a sequence of four-tuples. The reward function uses a trained classification model that recognizes whether the assembly is successful. The proposed skill-acquisition method is designed to enable robots to acquire assembly skills: the input of the model is the contact state of the assembly process, and the output is the robot action. The robot can complete the assembly by self-learning with little prior knowledge. To evaluate the performance of the proposed method, simulations and real-world experiments were performed on a low-voltage apparatus assembly. The assembly success rate increases with learning time; in the case of a random initial position and orientation, it exceeded 80% with little prior knowledge. The results show that the robot is capable of complex assembly through skill acquisition.

Introduction

The ability of data-driven autonomous learning has become an important feature of intelligent manufacturing technology. Traditional industrial robots mainly rely on teaching playback or programmed operation. When faced with uncertain factors in an unknown environment, their adaptability is poor. In complex assembly tasks, this problem is particularly prominent. Assembly is one of the most important challenges in industrial robotics because of the complicated environment, diverse objects, complex action types, and flexibility requirements. Uncertainty in the assembly process is particularly evident.

Some traditional methods have been presented to deal with uncertainties and make the assembly process more flexible. Different assembly stages have been identified to ensure smooth operation through analysis of the contact state [1], [2]. Compliance has been realized through either a flexible gripper [3] or an impedance-control approach [4], [5]. Most of these methods assume a known contact state and regular objects; the resulting assembly theories and strategies are therefore limited and not directly applicable to the assembly of complex parts. In recent years, machine learning has brought new opportunities [6]. In particular, deep reinforcement learning has been successfully applied to chess, Go, and other games [7], [8], [9], [10], [11]. Compared with traditional algorithms, deep reinforcement learning shows a great capability for adaptation to complex environments and for self-learning. Many researchers have applied it to other fields, such as grasping [12], door opening [13], [14], object tracking [15], and path planning [16]. Overall, however, most studies consider simple actions such as grasping, and these approaches might not be effective for assembly tasks with more uncertainty and variable parameters.

In contrast to previous work, we propose an assembly skill-acquisition method based on deep reinforcement learning to handle the uncertainty of a complex assembly process. Specifically, the contact state is described by force/torque measurements as a data-driven, dynamic description of the assembly process. Unlike the evaluation criteria of games such as Atari and Go, our reward system adopts a binary classification model to judge whether the process is completed successfully. The skill of orientation and pose adjustment is mastered by a model based on deep Q-learning with little prior knowledge, and the skill becomes more proficient through self-optimization. The experimental results show that the robot can still complete the assembly by skill acquisition when the position of the parts is uncertain. The skill-acquisition method has been verified in real experiments, and the industrial robot can handle uncertainty in the assembly process.
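As a concrete illustration of the classifier-based reward idea, the following is a minimal TensorFlow sketch. The network architecture, the six-dimensional force/torque input, and the ±1 reward mapping are illustrative assumptions rather than the authors' exact design.

```python
import numpy as np
import tensorflow as tf

def build_success_classifier(input_dim: int = 6) -> tf.keras.Model:
    """Binary 'assembly success' classifier used as the reward model.

    Input: a 6-D force/torque contact state [Fx, Fy, Fz, Tx, Ty, Tz];
    output: the probability that the episode ended in a successful assembly.
    """
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(input_dim,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # P(success)
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

def reward_from_classifier(model: tf.keras.Model,
                           contact_state: np.ndarray) -> float:
    """Map the classifier output to a terminal reward (the +/-1 values are assumptions)."""
    p_success = float(model.predict(contact_state[None, :], verbose=0)[0, 0])
    return 1.0 if p_success > 0.5 else -1.0
```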

The contributions of this paper are as follows:

  • A trained classification model is used to compute the reward.

  • Fasten assembly (FA) is completed by skill acquisition based on deep reinforcement learning.

  • The assembly performance is improved with little prior knowledge.

Most assembly strategies are based on modeling and can be divided into two categories, namely physical models and data models.

The physical model is derived through contact-force analysis and geometry. A three-point contact model has been built, and the pose misalignment between the peg and hole estimated by force and geometric analysis [17]. Zhang et al. proposed a novel method of modeling geometric errors for precision assembly [18]. David et al. [19] proposed a control scheme that includes learning the contact states during operation. Park et al. [1] proposed a robotic peg-in-hole strategy under positional uncertainty through analysis of the contact state without force feedback. Among data models, Jasim and Plapper [2] used Gaussian mixture models based on expectation maximization to identify the contact state along a spiral search path; a cylindrical shaft-hole assembly experiment on a KUKA robot validated the effectiveness of the algorithm. A robotic assembly parameter-optimization method [20], [21] was proposed to enable industrial robots to progress from unskilled to skilled assembly of a workpiece, similar to human learning behavior. Unlike modeling, impedance-control approaches [4], [5] have been used to deal with partially unstructured environments. Huang et al. [22] presented a visual compliance strategy for fast peg-and-hole alignment under large position and attitude uncertainty. Wan et al. taught robots to perform object assembly using multi-modal 3D vision [23].

The assembly objects mentioned above have regular shapes, such as circular or square holes. The circuit breakers considered in this paper are irregular parts, and the FA process of a circuit breaker is complex, so a data model is adopted. Both contact and position adjustment occur in three-dimensional space, so the contact theory and assembly strategy must be treated spatially. The FA mechanism of the circuit breaker is analyzed in detail in this paper. The force/torque measurements are taken as the state space to reflect the uncertainty of the assembly process, and Markov modeling is applied to the assembly process.
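As a minimal sketch of this force/torque state representation, the six-dimensional wrench reading can be normalized into a bounded state vector; the per-axis sensor limits below are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Assumed per-axis sensor limits for normalization (illustrative values only).
FORCE_LIMIT_N = 50.0    # max expected contact force per axis, in newtons
TORQUE_LIMIT_NM = 5.0   # max expected torque per axis, in newton-meters

def contact_state(wrench: np.ndarray) -> np.ndarray:
    """Normalize a raw F/T reading [Fx, Fy, Fz, Tx, Ty, Tz] into [-1, 1]."""
    limits = np.array([FORCE_LIMIT_N] * 3 + [TORQUE_LIMIT_NM] * 3)
    return np.clip(wrench / limits, -1.0, 1.0)
```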

Another line of work learns assembly skills through human demonstration. In [24], a robot was taught complex assembly skills, including recognition of the parts, tools, assembly actions, and assembly state, in a coherent way. A hidden Markov model [25] was used to analyze the hidden states in the peg-in-hole assembly process. To use such a model on a robot so that it can take various inputs, different control strategies must be designed. In addition, robots can master skills through programming [26].

Inspired by the success of deep neural networks in various vision tasks [27], [28], [29], this study uses a skill-acquisition method based on deep reinforcement learning to enable robots to master assembly skills. Zhu et al. [30] introduced a reinforcement learning model that generalizes across targets and scenes. Gu et al. [13] presented an asynchronous deep reinforcement learning approach that can learn complex robotic manipulation skills from scratch on real physical manipulators, and demonstrated that it can learn a complex door-opening task with only a few hours of training. End-to-end control of a three-joint arm has been realized with deep Q-network techniques [31]. Grasping tasks have been implemented using deep reinforcement learning [12], [32], [33]. Across these fields, deep reinforcement learning has enabled self-developed learning. In this work, we use deep reinforcement learning to deal with uncertainty in the assembly environment so that the robot can complete complex FA tasks with little prior knowledge.

The rest of the paper is organized as follows: Section 2 introduces the complex assembly process and formulates the problem to be solved. Section 3 presents the proposed method. The simulation and real-world experimental procedures and results are provided in Sections 4 and 5, respectively. Section 6 presents the conclusions and future work.

Section snippets

Complex assembly process

Fig. 1 shows an example of the complex assembly, in which the upper cover is inserted into the base of the circuit breaker; this operation is defined as FA. A circuit breaker is an on-load device that can interrupt load or short-circuit current. Circuit breakers are characterized by a compact structure, small size, many types of parts, complex shapes, and a variety of models. A series of complex mechanisms is packed into a width of approximately 18 mm, including two poles, arc extinguishing

Markov decision round of assembly process

The key ingredients of the Markov decision process (MDP) for FA are the environment, observations (state space), action space, and reward-system design. The Markov round of FA is illustrated in Fig. 4. The environment can be seen as a compliant assembly system actuated by a seven-degree-of-freedom manipulator (KUKA iiwa). The current state of the FA process, which is the input of the controller, is collected by the F/T sensor in the environment.

The state space is defined as the contact state measured by the F/T sensor.
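To make these ingredients concrete, the following skeleton sketches the FA environment as an MDP. The robot and ft_sensor interfaces, the set of ten discretized pose-adjustment actions, and the sparse terminal reward are all hypothetical stand-ins for the paper's actual design.

```python
import numpy as np

# Discretized pose-adjustment actions (an assumption): small translations along
# x/y/z and small rotations about x/y, each in both directions -- 10 actions.
ACTIONS = [("dx", +1), ("dx", -1), ("dy", +1), ("dy", -1),
           ("dz", +1), ("dz", -1), ("rx", +1), ("rx", -1),
           ("ry", +1), ("ry", -1)]

class FastenAssemblyEnv:
    """Minimal MDP skeleton of the FA task (interfaces are hypothetical)."""

    def __init__(self, robot, ft_sensor, success_classifier, max_steps=50):
        self.robot = robot                    # manipulator interface
        self.ft_sensor = ft_sensor            # force/torque sensor interface
        self.classifier = success_classifier  # trained binary reward model
        self.max_steps = max_steps
        self.steps = 0

    def reset(self) -> np.ndarray:
        """Start an episode from a random initial pose; return the 6-D state."""
        self.robot.move_to_random_initial_pose()
        self.steps = 0
        return self.ft_sensor.read()

    def step(self, action_idx: int):
        """Apply one discretized pose adjustment and observe the new state."""
        axis, direction = ACTIONS[action_idx]
        self.robot.adjust_pose(axis, direction)
        state = self.ft_sensor.read()
        self.steps += 1
        done = self.robot.insertion_complete() or self.steps >= self.max_steps
        reward = 0.0
        if done:  # sparse terminal reward from the trained classifier
            p = float(self.classifier.predict(state[None, :], verbose=0)[0, 0])
            reward = 1.0 if p > 0.5 else -1.0
        return state, reward, done
```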

Simulation system

In the Gazebo simulation system (see Fig. 7) in ROS, the friction of the assembly claw on the upper cover was neglected during motion. The initial position is set through the forward kinematics of the manipulator, and inverse kinematics is used during the assembly process. Control in the simulation was achieved through impedance control. We implemented our assembly algorithm in TensorFlow and trained it on NVIDIA GTX 1070 and NVIDIA K80 GPUs.

The steps of the simulation
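As a sketch of how such a simulation-based training procedure might be structured, the following is a generic deep Q-learning loop over the environment skeleton above. The network size, discount factor, exploration rate, and batch size are assumptions, not the paper's reported hyperparameters.

```python
import random
from collections import deque

import numpy as np
import tensorflow as tf

STATE_DIM, N_ACTIONS = 6, 10          # match the environment skeleton above
GAMMA, EPSILON, BATCH = 0.9, 0.1, 32  # hyperparameters are assumptions

def build_q_network() -> tf.keras.Model:
    """Small fully connected Q-network: state in, one Q-value per action out."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(STATE_DIM,)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(N_ACTIONS),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss="mse")
    return model

def train(env, episodes: int = 500) -> tf.keras.Model:
    """Deep Q-learning with epsilon-greedy exploration and experience replay."""
    q_net = build_q_network()
    replay = deque(maxlen=10_000)
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            if random.random() < EPSILON:  # explore
                action = random.randrange(N_ACTIONS)
            else:                          # exploit the current Q-estimates
                action = int(np.argmax(q_net.predict(state[None, :], verbose=0)))
            next_state, reward, done = env.step(action)
            replay.append((state, action, reward, next_state, done))
            state = next_state
            if len(replay) >= BATCH:       # one gradient step per transition
                s, a, r, s2, d = map(np.array, zip(*random.sample(replay, BATCH)))
                target = q_net.predict(s, verbose=0)
                q_next = q_net.predict(s2, verbose=0).max(axis=1)
                target[np.arange(BATCH), a] = r + GAMMA * q_next * (1 - d)
                q_net.fit(s, target, verbose=0)
    return q_net
```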

Real-world system

To validate the generalization of the method to real-world settings, we also built an experimental platform, shown in Fig. 15, using a KUKA iiwa7 R800.

A vacuum suction tool with four small iron props is used to pick up the upper cover. A computer is connected to the robot controller via Ethernet (TCP/IP). The KUKA force-control package is used to perform the assembly process. Unlike the simulation experiment, the evaluation is based on the assembly displacement (z-axis: 7.5 mm ± 0.1 mm).
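Based on the displacement criterion quoted above, a minimal success check might look as follows; the nominal depth and tolerance are taken directly from the text.

```python
TARGET_DEPTH_MM = 7.5   # nominal insertion depth along the z-axis
TOLERANCE_MM = 0.1      # allowed deviation, per the experimental setup

def assembly_succeeded(z_displacement_mm: float) -> bool:
    """Judge success from the measured insertion depth along the z-axis."""
    return abs(z_displacement_mm - TARGET_DEPTH_MM) <= TOLERANCE_MM
```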

Conclusions

This paper proposed a skill-acquisition method using deep reinforcement learning to solve uncertainty problems in assembly. The ability to acquire skills can be regarded as a behavioral increment of the industrial robot. The Markov round of the FA process was designed, including the assembly environment, contact state space, discretized action space, and reward system. The skill-acquisition model was applied to FA with little prior knowledge. The robot can adjust its position according to the

Acknowledgment

This work was supported by the Integration Fund Project of China NSF and Zhejiang Province, China (No. U1509212), the Key Research and Development Plan of Shandong Province, China (No. 2017CXGC0915), and the Major Program of the Shandong Province Natural Science Foundation, China (No. ZR2018ZC0437).


References (39)

  • V. Mnih et al., Human-level control through deep reinforcement learning, Nature (2015).

  • D. Silver et al., Mastering the game of Go with deep neural networks and tree search, Nature (2016).

  • D. Silver et al., Mastering the game of Go without human knowledge, Nature (2017).

  • Y.C. Wu et al., Master-slave curriculum design for reinforcement learning, Proceedings of the International Joint Conference on Artificial Intelligence (2018).

  • S. Levine et al., Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection, Int. J. Robot. Res. (2016).

  • S.X. Gu et al., Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates, Proceedings of the IEEE International Conference on Robotics and Automation (2017).

  • P.S. Thomas, E. Brunskill, Policy gradient methods for reinforcement learning with function approximation and...

  • W. Zhang et al., Coarse-to-fine UAV target tracking with deep reinforcement learning, IEEE Trans. Autom. Sci. Eng. (2018).

  • Y. Zhu et al., Target-driven visual navigation in indoor scenes using deep reinforcement learning, Proceedings of the IEEE International Conference on Robotics and Automation (2017).

Fengming Li received her B.S. degree in Automation from Weifang University, Weifang, China, in 2007, and her M.S. degree from the School of Control Science and Engineering, Shandong University, Jinan, China, in 2010. She is currently working toward a Ph.D. at the School of Control Science and Engineering, Shandong University, Jinan, China. Her research interests include Intelligent Control, Machine Learning, and Optimization Theory.

Qi Jiang received his Ph.D. from Tianjin University, Tianjin, China, in 2003. He is currently a professor at the School of Control Science and Engineering, Shandong University, Jinan, Shandong Province, China. His major research focuses on novel inspection methods and sensors, including FBG sensors.

Sisi Zhang received her B.S. degree in Electrical Engineering and Automation from Shandong Jianzhu University, China, in 2016. She is currently working toward an M.S. at the School of Control Science and Engineering, Shandong University, Jinan, China. Her research interests include Intelligent Control and Machine Learning.

Meng Wei received his B.S. degree in Measurement and Control Technology and Instruments from Yanshan University, China, in 2016. He is currently working toward an M.S. at the School of Control Science and Engineering, Shandong University, Jinan, China. His research interests include Intelligent Control and Machine Learning.

Rui Song received his B.E. degree in Industrial Automation in 1998 and his M.S. degree in Control Theory and Control Engineering in 2001 from Shandong University of Science and Technology, and his Ph.D. in Control Theory and Control Engineering from Shandong University in 2011. He is engaged in research on intelligent sensor networks, intelligent robot technology, and intelligent control systems. His research interests include medical robots, industrial robots, and quadruped robots. He is currently an Associate Professor at the School of Control Science and Engineering, Shandong University, Jinan, China, and one of the directors of the Center of Robotics of Shandong University.
