Perhaps we could use uncertainty estimation to detect where the model may be wrong, and then correct for these potential errors without having to collect much more data. This uncertainty is useful in many respects. It may be that a certain action yields high expected reward, but there is a possibility (encoded in the uncertainty) that the agent steps too far and 'falls off the cliff'. A pessimistic agent may want to avoid this. Conversely, this uncertainty can guide exploration for an optimistic agent, as in UCB or Thompson sampling. If the agent only chooses actions for which it confidently believes it will obtain the highest expected reward, it can avoid falsely exploiting errors in the model.
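As a minimal sketch of these three attitudes toward uncertainty, suppose we quantify model uncertainty by the disagreement of an ensemble of learned value estimates (a common proxy; the ensemble values below are invented for illustration). A pessimistic agent acts on a lower confidence bound, a UCB-style optimist on an upper confidence bound, and Thompson sampling acts greedily with respect to a single randomly drawn ensemble member:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ensemble of 5 models estimating the value of 3 actions.
# Each row is one ensemble member's estimate; disagreement on action 1
# encodes the risk that it 'falls off the cliff'.
value_estimates = np.array([
    [1.0, 2.0, 0.5],
    [1.1, 0.2, 0.6],
    [0.9, 2.1, 0.4],
    [1.0, 1.9, 0.5],
    [1.2, 0.1, 0.6],
])

mean = value_estimates.mean(axis=0)
std = value_estimates.std(axis=0)  # ensemble disagreement as uncertainty proxy
beta = 1.0                         # caution/exploration coefficient

pessimistic = int(np.argmax(mean - beta * std))  # lower confidence bound
optimistic = int(np.argmax(mean + beta * std))   # UCB-style upper bound

# Thompson sampling: act greedily w.r.t. one randomly drawn ensemble member.
member = rng.integers(len(value_estimates))
thompson = int(np.argmax(value_estimates[member]))

print(pessimistic, optimistic, thompson)
# The pessimist picks the safe action 0; the optimist explores the
# uncertain but possibly lucrative action 1.
```

With these numbers, action 1 has the highest mean value but also the largest disagreement, so the pessimist avoids it while the optimist is drawn to it; this is exactly the trade-off described above.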