The original code demonstrates DQN with Keras and uses a pygame implementation of Flappy Bird.
The main idea was to see whether incorporating uncertainty estimates into bootstrapped DQN brings any benefit. We did get faster convergence with UCB1 estimates.
Surprisingly, however, vanilla bootstrap achieved the best max score, although bootstrap with UCB1 still did better than plain DQN. One possible reason is that the action space is too small (two actions: jump or do nothing). UCB1 may prove more effective with larger action spaces.
| Method | Max score achieved |
|---|---|
| bootstrap with majority voting | 740 |
| bootstrap with UCB1 | 81 |
| bootstrap with UCB1 and majority voting | 197 |
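
To make the difference between the action-selection rules above concrete, here is a minimal sketch in plain Python. It is not the code from this repository. It assumes each bootstrap head outputs one Q-value per action, that action index 0 means "do nothing" and 1 means "jump", and that the UCB-style bonus is the standard deviation across heads scaled by a hypothetical constant `c`. The original code may compute its UCB1 bonus differently (for example from visit counts), and the function names and sample Q-values are made up for illustration.

```python
from collections import Counter
from statistics import mean, pstdev


def greedy_action(q_values):
    """Index of the largest value (ties go to the lowest index)."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def majority_vote_action(q_heads):
    """Each head votes for its own greedy action; the most common vote wins."""
    votes = Counter(greedy_action(q) for q in q_heads)
    # Break ties toward the lowest action index so the result is deterministic.
    return min(votes, key=lambda a: (-votes[a], a))


def ucb_action(q_heads, c=1.0):
    """Pick the action maximising mean + c * std of Q-values across heads.

    Assumption: disagreement between heads stands in for uncertainty.
    """
    n_actions = len(q_heads[0])
    scores = []
    for a in range(n_actions):
        per_head = [q[a] for q in q_heads]
        scores.append(mean(per_head) + c * pstdev(per_head))
    return greedy_action(scores)


# Hypothetical Q-values from 3 heads for the 2 actions
# (index 0 = do nothing, index 1 = jump).
q_heads = [
    [1.0, 0.9],
    [1.0, 0.8],
    [0.2, 2.0],
]

print(majority_vote_action(q_heads))  # two of three heads prefer action 0
print(ucb_action(q_heads, c=1.0))     # action 1 has the higher mean and spread
```

In this toy example the two rules disagree: majority voting follows the two heads that slightly prefer action 0, while the UCB-style rule picks action 1 because one head values it highly and the heads disagree about it. With only two actions there is little room for such optimism to pay off, which is consistent with the guess that a larger action space might suit UCB1 better.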