Indoor Target-driven Visual Navigation

Deep reinforcement learning setups for reaching a goal usually embed the goal in the reward function. This ties what is learned to that one goal: the trained policy does not carry over to a new goal without retraining.

A different setup makes the learning more general, so that a single trained model can be directed to any goal. Zhu et al. [1] take this approach for indoor visual navigation.
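The contrast can be sketched with two reward functions, assuming a toy grid world whose states are (x, y) cells. The function names, the goal cell (3, 4), and the reward values are hypothetical and chosen only for illustration. The first reward has the goal hard-coded; the second takes the goal as an argument.

```python
from typing import Tuple

State = Tuple[int, int]  # hypothetical toy state: an (x, y) grid cell


def fixed_goal_reward(state: State) -> float:
    # Goal (3, 4) is baked in: a policy trained on this reward
    # only ever learns to reach (3, 4).
    return 1.0 if state == (3, 4) else -0.01


def goal_conditioned_reward(state: State, goal: State) -> float:
    # Goal is an input: the same reward covers every goal, provided the
    # learner also sees the goal as part of its input.
    return 1.0 if state == goal else -0.01
```

Passing the goal into the reward alone is not enough; the agent must also observe the goal, which is why the state representation itself has to change.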

For goal-independent learning, the state representation is changed: instead of the current observation alone, the agent receives the output of a Siamese network. This network processes the current-state image and the target-state image with shared weights and produces an embedding that captures how the current state differs from the target state.
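A minimal sketch of the Siamese idea in plain Python, assuming images are already flattened into equal-length lists of floats. The class name `SiameseEncoder`, the single dense layer with tanh, the random initialization, and the choice to return both a concatenated embedding and a Euclidean disagreement score are illustrative assumptions; a real navigation agent would use a convolutional encoder on raw images.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


class SiameseEncoder:
    """Hypothetical encoder: one dense layer + tanh shared by both branches."""

    def __init__(self, in_dim: int, out_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(in_dim)
        self.weights = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
                        for _ in range(out_dim)]
        self.bias = [0.0] * out_dim

    def encode(self, image: Vector) -> Vector:
        # The same weights are used whichever image is passed in;
        # this weight sharing is what makes the network "Siamese".
        return [math.tanh(sum(w * x for w, x in zip(row, image)) + b)
                for row, b in zip(self.weights, self.bias)]

    def forward(self, current: Vector, target: Vector) -> Tuple[Vector, float]:
        e_cur = self.encode(current)
        e_tgt = self.encode(target)
        # Concatenated embedding, intended as the actor-critic's input.
        joint = e_cur + e_tgt
        # Euclidean distance between branch outputs as a disagreement score.
        disagreement = math.sqrt(sum((a - b) ** 2 for a, b in zip(e_cur, e_tgt)))
        return joint, disagreement
```

Because both branches share `weights` and `bias`, identical images map to identical embeddings, so `SiameseEncoder(16, 4).forward(img, img)` returns a disagreement of exactly 0.0, while the 8-element `joint` vector is what the actor-critic would consume.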

With this setup, the actor-critic learns from these embeddings rather than from raw observations, and the reward function can simply be based on the disagreement between the current and target states: high disagreement is penalized, and low disagreement is rewarded, because it means the agent is getting close to the goal.
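The actor-critic side and the disagreement-based reward can be sketched as follows. Everything here is hypothetical: the `threshold` and `success_bonus` values, the linear policy and value heads, and the one-step advantage used as the learning signal are illustrative choices, not taken from Zhu et al. The `embedding` argument stands in for the joint vector a Siamese encoder would produce.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def disagreement_reward(disagreement: float, threshold: float = 0.1,
                        success_bonus: float = 10.0) -> float:
    # Hypothetical reward: a bonus once the current state is close enough
    # to the target, otherwise a penalty that grows with the disagreement.
    if disagreement < threshold:
        return success_bonus
    return -disagreement


class ActorCriticHeads:
    """Hypothetical linear policy (actor) and value (critic) heads."""

    def __init__(self, emb_dim: int, n_actions: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.policy_w = [[rng.gauss(0.0, 0.1) for _ in range(emb_dim)]
                         for _ in range(n_actions)]
        self.value_w = [rng.gauss(0.0, 0.1) for _ in range(emb_dim)]

    def forward(self, embedding: Vector) -> Tuple[Vector, float]:
        logits = [sum(w * x for w, x in zip(row, embedding))
                  for row in self.policy_w]
        m = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        probs = [e / total for e in exps]
        value = sum(w * x for w, x in zip(self.value_w, embedding))
        return probs, value


def one_step_advantage(reward: float, value: float, next_value: float,
                       gamma: float = 0.99, done: bool = False) -> float:
    # TD target r + gamma * V(s') (no bootstrap at episode end) minus V(s).
    target = reward + (0.0 if done else gamma * next_value)
    return target - value
```

In an actor-critic update, the advantage would scale the policy-gradient step for the chosen action and serve as the critic's regression error; both heads only ever see the embedding, so the same weights apply to any target image.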

References

  1. Zhu, Yuke, et al. "Target-driven visual navigation in indoor scenes using deep reinforcement learning." Robotics and Automation (ICRA), 2017 IEEE International Conference on. IEEE, 2017.