Direct-Vision-Based Reinforcement Learning

Summary
Many modern robots use visual sensors to obtain rich information about their environment. A visual sensor provides a huge number of sensor signals. Even for robots in which learning is a key feature, it is generally taken for granted that image processing is applied to the visual signals to extract useful pieces of information and to assign the current visual signals to one state in a state space. However, knowledge that is useful for solving a given task is often built into the image processing or other pre-processing. For example, in the work of Asada et al., when a soccer robot learned a shooting action, the position and size of the ball and the position, size, and orientation of the goal were extracted from the image captured by the robot. Yet noticing that such information is important for solving the task, and finding how it can be extracted from the image, is itself a highly intelligent process.

Direct-Vision-Based Reinforcement Learning (RL) is one way to apply RL to a robot-like system with sensors and motors while reducing the given knowledge as much as possible. Concretely, a layered neural network is employed; the raw sensor signals are the inputs of the network and the motor commands are its outputs. The main advantage is that RL is no longer limited to learning action planning, but can be extended to learning the whole process from sensors to motors, including recognition, memory, and so on. An abstract state representation suited to the purpose is formed in the neural network, which can be expected to lead to the emergence of higher-order functions.
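The sensor-to-motor mapping described above can be illustrated with a minimal sketch. It shows only the forward pass of a small layered network whose inputs are raw sensor signals and whose outputs are treated as motor commands. The layer sizes, the tanh activation, the random weights, and the names `make_layer` and `forward` are assumptions made for illustration; they are not taken from the papers listed below, and no learning rule is shown.

```python
import math
import random


def make_layer(n_in, n_out, rng, scale=0.1):
    """Return a weight matrix with n_out rows and n_in + 1 columns (last column = bias)."""
    return [[rng.uniform(-scale, scale) for _ in range(n_in + 1)] for _ in range(n_out)]


def forward(layers, signals):
    """Propagate raw sensor signals through every layer and return the final outputs."""
    activation = list(signals)
    for weights in layers:
        extended = activation + [1.0]  # constant input that feeds the bias weight
        activation = [
            math.tanh(sum(w * x for w, x in zip(row, extended)))
            for row in weights
        ]
    return activation


rng = random.Random(0)
N_SENSORS, N_HIDDEN, N_MOTORS = 16, 8, 2  # hypothetical sizes, chosen only for the sketch
layers = [
    make_layer(N_SENSORS, N_HIDDEN, rng),
    make_layer(N_HIDDEN, N_MOTORS, rng),
]

raw_signals = [rng.random() for _ in range(N_SENSORS)]  # stand-in for raw pixel values
motor_commands = forward(layers, raw_signals)           # one value per motor, in (-1, 1)
print(motor_commands)
```

No hand-designed feature extraction sits between `raw_signals` and `motor_commands`; any intermediate representation would have to form in the hidden layer during learning.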

It was confirmed that a real mobile robot with a CCD camera could learn appropriate actions to reach and push a lying box using only Direct-Vision-Based reinforcement learning (RL). No image processing, no control methods, and no task information are given in advance, even though the inputs consist of as many as 1536 monochrome visual signals and 4 infrared signals. The box-pushing task is more difficult than the reaching task because not only the center of gravity but also the direction, weight, and sliding characteristics of the box must be taken into account. Nevertheless, the robot could learn appropriate actions even though the reward was given only while the robot was pushing the box. It was also observed that the neural network acquired a global representation of the box location through learning.
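As a rough sketch of the setup just described, the code below assembles the 1536 visual signals and 4 infrared signals into one input vector and defines a reward that is nonzero only while the box is being pushed. The signal counts come from the text; everything else is an assumption for illustration: the 8-bit pixel range, the infrared scaling, the reward magnitude, the discount factor, and the one-step temporal-difference error, which is a standard RL quantity and is not confirmed by this summary as the rule used on the robot.

```python
N_VISUAL = 1536    # monochrome visual signals, as stated in the text
N_INFRARED = 4     # infrared signals, as stated in the text


def build_input(pixels, infrared, pixel_max=255.0, infrared_max=1.0):
    """Concatenate raw sensor readings into one vector scaled to [0, 1].

    The scaling constants are assumptions; no features are extracted.
    """
    if len(pixels) != N_VISUAL or len(infrared) != N_INFRARED:
        raise ValueError("unexpected number of sensor signals")
    return [p / pixel_max for p in pixels] + [s / infrared_max for s in infrared]


def reward(is_pushing_box):
    """Sparse reward: nonzero only while the robot is pushing the box (value assumed)."""
    return 1.0 if is_pushing_box else 0.0


def td_error(r, value, next_value, gamma=0.9):
    """One-step temporal-difference error (assumed learning signal, gamma assumed)."""
    return r + gamma * next_value - value


# Usage: a blank image and silent infrared sensors give a 1540-element input.
x = build_input([0] * N_VISUAL, [0.0] * N_INFRARED)
print(len(x))
```

Because the reward is zero everywhere except during pushing, any learning signal for the reaching phase has to propagate backward from later states, which is what a temporal-difference error of this kind provides.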

References
13. Katsunari Shibata & Masaru Iida:
Acquisition of Box Pushing by Direct-Vision-Based Reinforcement Learning,
SICE Annual Conf. 2003, 2003. 8 (to appear)
pdf File (6 pages, 644kB)

12. Masaru Iida, Masanori Sugisaka and Katsunari Shibata:
Application of Direct-Vision-Based Reinforcement Learning to a Real Mobile Robot with a CCD camera,
Proc. of AROB (Int'l Symp. on Artificial Life and Robotics) 8th, pp.86-89, 2003.1

11. Masaru Iida, Masanori Sugisaka and Katsunari Shibata:
Direct-Vision-Based Reinforcement Learning to a Real Mobile Robot,
Proc. of Int'l Conf. of Neural Information Processing Systems (ICONIP '02), Vol. 5, pp. 2556-2560, 2002.11
pdf File (5 pages, 640kB)

10. Masaru Iida, Masanori Sugisaka and Katsunari Shibata:
Direct-Vision-Based Reinforcement Learning in a Real Mobile Robot,
Proc. of AROB (Int'l Sympo. on Artificial Life and Robotics) 7th, pp. 42-45, 2002.1

9. Katsunari Shibata, Yoichi Okabe and Koji Ito:
Direct-Vision-Based Reinforcement Learning Using a Layered Neural Network - For the Whole Process from Sensors to Motors -,
Trans. of SICE (The Society of Instrument and Control Engineers), Vol.37, No.2, pp.168-177, 2001.2 (in Japanese)
柴田克成, 岡部洋一, 伊藤宏司:
ニューラルネットワークを用いたDirect-Vision-Based強化学習 - センサからモータまで -,
計測自動制御学会論文集, Vol.37, No.2, pp.168-177, 2001.2
pdf File (10 pages, 307kB)

8. Katsunari Shibata, Masanori Sugisaka and Koji Ito:
Fast and Stable Learning in Direct-Vision-Based Reinforcement Learning,
Proc. of AROB (Int'l Sympo. on Artificial Life and Robotics) 6th, Vol. 1, pp.200-203, 2001.1
[reinforcement learning, neural network, visual sensor, localization]
pdf File (4 pages, 150kB)

7. K. Shibata, K. Ito and Y. Okabe : PS File (8 pages, 318kB)
"Direct-Vision-Based Reinforcement Learning in "Going to a Target" Task with an Obstacle and with a Variety of Target Size"
Neurap'98 Marseilles , 1998.3

6. Katsunari Shibata, Yoichi Okabe:
Temporal Smoothing Learning,
Trans. IEE (The Institute of Electrical Engineers) of Japan, Vol. 117-C, No. 9, pp.1291-1299, 1997.9 (in Japanese)
柴田克成、岡部洋一 K. Shibata and Y. Okabe :
"時間軸スムージング学習" "Temporal Smoothing Learning"
電気学会論文誌C分冊 Trans. of IEEJ ,Vol. 117-C, No. 9, pp.1291-1299 (1997.9)

5. K. Shibata and Y. Okabe : PS File (5 pages, 236kB)
"Reinforcement Learning When Visual Sensory Signals are Directly Given as Inputs"
Proc. of ICNN'97, Vol. III, pp.1716-1720, 1997.6

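4. K. Shibata and Y. Okabe (柴田克成, 岡部洋一):
"Reinforcement Learning with Visual Input Using a Neural Network" (title translated from the Japanese "ニューラルネットによる視覚入力強化学習"),
Proc. of the 15th Annual Conf. of the Robotics Society of Japan (第15回日本ロボット学会学術講演会予稿集), Vol. 3, pp.897-898, 1997.9
(in Japanese)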

3. 柴田克成、岡部洋一 K. Shibata and Y. Okabe : PS File (2 pages, 109kB in Japanese)
"Delayed Reinforcement Learning when Visual Sensory Signals are given as Inputs"
"視覚センサ信号を入力とした遅延強化学習"
Proc. of JNNS'96 (7th Annual Conf. of the Japanese Neural Network Society, Nagoya; 日本神経回路学会第7回全国大会) (1996)
(in Japanese)

2. K. Shibata and Y. Okabe : PS File (6 pages, 157kB)
"A Robot that Learns an Evaluation Function for Acquiring of Appropriate Motions"
Proc. on WCNN '94 San Diego, vol.2, pp. II-29 - II-34 (1994)

1. K. Shibata and Y. Okabe PS File (8 pages, 164kB)
"Smoothing-Evaluation Method in Delayed Reinforcement Learning"
(1995)
