Acquisition of Memory through Reinforcement Learning

Summary As an approach to reinforcement learning in a POMDP (Partially Observable Markov Decision Process), a novel memory-based learning method named ``Q-learning for Memory'' is proposed. A state consists of the present state and the contents of a shift-register-like short-term memory. At each step, the agent decides whether or not to memorize the present state in this short-term memory. Memorization is treated as an action, and a Q value is assigned to it. Two implementations, named ``memory-Q'' and ``action-memory-Q'', are proposed.
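The state construction described above can be sketched as follows. This is only an illustrative sketch, not the author's implementation: the class name ShiftRegisterMemory, the memory length, and the use of None for empty cells are all assumptions made for the example.

```python
from collections import deque
from typing import Hashable, Tuple


class ShiftRegisterMemory:
    """Hypothetical fixed-length short-term memory.

    Memorizing a state pushes it in at the front and shifts the
    oldest entry out, like a shift register.
    """

    def __init__(self, length: int, empty: Hashable = None):
        # A bounded deque drops items from the opposite end when full.
        self.cells = deque([empty] * length, maxlen=length)

    def step(self, present_state: Hashable, memorize: bool) -> None:
        # The memorize flag is the agent's decision; the memory is
        # left unchanged when the agent chooses not to memorize.
        if memorize:
            self.cells.appendleft(present_state)

    def augmented_state(self, present_state: Hashable) -> Tuple:
        # State seen by the learner: present state plus memory contents.
        return (present_state, tuple(self.cells))


mem = ShiftRegisterMemory(length=2)
mem.step("A", memorize=True)
mem.step("B", memorize=False)  # "B" is not stored
mem.step("C", memorize=True)
state = mem.augmented_state("D")
```

Under these assumptions, the learner indexes its Q values by the tuple returned from augmented_state rather than by the present state alone.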

In the former, a Q value is assigned to each state-memorization pair, while a separate Q value for regular actions is assigned to each state-action pair. The Q value for memorization is trained from the maximum Q value over regular actions at the next time step. In the latter, a Q value is assigned to each state-action-memorization triple, and there is no separate Q value for regular actions; learning is the same as in regular Q-learning. Simulations showed that the agent came to memorize the necessary states and that appropriate actions could be obtained using only a small amount of memory after learning. The advantage over other memory-based approaches was confirmed. There was no conspicuous difference between the two proposed learning methods except in learning speed.
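The two update schemes might look roughly like the tabular sketch below. Several details are assumptions, since the summary does not give them: the learning rate ALPHA, the discount GAMMA, the action set, and in particular whether the reward and discount enter the target for the memorization Q value. Here the memorization Q value is simply trained toward the same target as the regular action Q value. The state arguments stand for the full state, that is, the present state together with the memory contents.

```python
from collections import defaultdict

# All constants below are hypothetical values chosen for illustration.
ALPHA = 0.1                  # learning rate
GAMMA = 0.9                  # discount factor
ACTIONS = ("left", "right")  # regular actions
MEMORIZE = (False, True)     # memorization decisions


def memory_q_update(q_act, q_mem, s, a, m, r, s_next):
    """memory-Q sketch: separate tables for actions and memorization."""
    # Maximum Q value over regular actions at the next time step.
    best_next = max(q_act[(s_next, a2)] for a2 in ACTIONS)
    target = r + GAMMA * best_next  # assumption: same target for both tables
    q_act[(s, a)] += ALPHA * (target - q_act[(s, a)])
    q_mem[(s, m)] += ALPHA * (target - q_mem[(s, m)])


def action_memory_q_update(q, s, a, m, r, s_next):
    """action-memory-Q sketch: one table over (state, action, memorization)."""
    best_next = max(q[(s_next, a2, m2)] for a2 in ACTIONS for m2 in MEMORIZE)
    q[(s, a, m)] += ALPHA * (r + GAMMA * best_next - q[(s, a, m)])


q_act, q_mem = defaultdict(float), defaultdict(float)
memory_q_update(q_act, q_mem, "s0", "left", True, 1.0, "s1")

q_joint = defaultdict(float)
q_joint[("s1", "right", False)] = 2.0
action_memory_q_update(q_joint, "s0", "left", True, 0.0, "s1")
```

In this sketch the joint table has one entry per action-memorization combination, whereas the two separate tables together have one entry per action plus one per memorization decision for each state.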

Reference
Not published yet.

