In the former, a Q value is assigned to each state-memorization pair, while a Q value for regular actions is also assigned to each state-action pair. The Q value for memorization is trained from the maximum Q value for regular actions at the next time step. In the latter, a Q value is assigned to each state-action-memorization combination, and there is no separate Q value for regular actions; learning is the same as in regular Q-learning. Simulations showed that the necessary states came to be memorized and that appropriate actions could be obtained with few memories after learning. The advantage over other memory-based approaches was confirmed. No conspicuous difference was found between the two proposed learning methods except in learning speed.
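The two update rules described above can be sketched in tabular form as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the learning rate `ALPHA`, the discount factor `GAMMA`, the inclusion of the reward in the memorization target, and the use of a regular Q-learning update for the action table in the first method are all assumptions. States, actions, and memorization choices are taken to be integer indices.

```python
ALPHA = 0.1  # learning rate (assumed value, not from the source)
GAMMA = 0.9  # discount factor (assumed value, not from the source)


def make_table(*shape):
    """Build a zero-initialised nested list with the given shape."""
    if len(shape) == 1:
        return [0.0] * shape[0]
    return [make_table(*shape[1:]) for _ in range(shape[0])]


def update_separate(q_act, q_mem, s, a, m, r, s_next):
    """First method: separate tables for regular actions and memorization.

    q_act[s][a] and q_mem[s][m] are both moved toward a target built from
    the maximum regular-action Q value at the next time step. Adding the
    reward and discount to that target is an assumption.
    """
    target = r + GAMMA * max(q_act[s_next])
    q_act[s][a] += ALPHA * (target - q_act[s][a])
    q_mem[s][m] += ALPHA * (target - q_mem[s][m])
    return target


def update_joint(q_joint, s, a, m, r, s_next):
    """Second method: a single table over (state, action, memorization).

    This is the ordinary Q-learning update, with the (action, memorization)
    combination playing the role of the action.
    """
    best_next = max(max(row) for row in q_joint[s_next])
    target = r + GAMMA * best_next
    q_joint[s][a][m] += ALPHA * (target - q_joint[s][a][m])
    return target


# Usage: 3 states, 2 regular actions, 2 memorization choices (memorize or not).
q_act = make_table(3, 2)
q_mem = make_table(3, 2)
update_separate(q_act, q_mem, s=0, a=1, m=1, r=1.0, s_next=2)

q_joint = make_table(3, 2, 2)
update_joint(q_joint, s=0, a=1, m=1, r=1.0, s_next=2)
```

In this sketch the first method keeps two small tables, while the second keeps one table whose size is the product of the action and memorization choices. That difference in table size is one plausible reason the two methods could differ in learning speed, though the source does not state the cause.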