Can someone please explain what the discount factor means in the Value Iteration Algorithm for solving Markov Decision Processes?
I understand the equation, but I don’t understand why it requires the discount factor (gamma).
Here’s what I understand: the discount factor expresses a preference for short-term solutions over long-term ones.
For example, if I could earn $1 today, I’d value it more than $1 I could earn tomorrow, and much more than $1 I could earn on Jan 1, 2050, because random factors change the situation more and more as time passes. The discount factor shows how much more valuable today’s $1 is than tomorrow’s $1.
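To put numbers on the $1 example, here is a minimal sketch. It assumes the usual convention that a reward received t steps from now is weighted by gamma raised to the power t; the function name `discounted_value` and the value gamma = 0.9 are just illustrative choices, not anything from the question.

```python
def discounted_value(reward, gamma, steps):
    """Value today of a reward received `steps` time steps in the future.

    Assumes geometric discounting: each extra step multiplies by gamma.
    """
    return reward * gamma ** steps


if __name__ == "__main__":
    gamma = 0.9  # illustrative value, must lie between 0 and 1
    for steps in (0, 1, 10, 100):
        # The further away the dollar is, the less it counts today.
        print(steps, discounted_value(1.0, gamma, steps))
```

With gamma close to 1 the far future still matters a lot; with gamma close to 0 almost only the immediate reward counts.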
Since the whole algorithm is about making decisions whose outcomes partly depend on random inputs, which can drift over time and invalidate the initial decision, it makes sense to prefer decisions that are better as short-term solutions.
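To see how gamma changes the decision that value iteration prefers, here is a rough sketch on a made-up toy problem. Everything in it is an assumption for illustration: the state and action names, the rewards ($1 now versus $2 one step later), and the dictionary layout `transitions[state][action] = [(probability, next_state, reward), ...]`. It is one plain way to write the update, not a reference implementation.

```python
def action_value(outcomes, values, gamma):
    """Expected immediate reward plus discounted value of the next state."""
    return sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)


def value_iteration(transitions, gamma, tol=1e-9, max_iters=10_000):
    """Repeat the Bellman update until the values stop changing."""
    values = {s: 0.0 for s in transitions}
    for _ in range(max_iters):
        delta = 0.0
        for s, actions in transitions.items():
            if not actions:  # terminal state: no actions, value stays 0
                continue
            best = max(action_value(o, values, gamma) for o in actions.values())
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    return values


def greedy_action(transitions, values, gamma, state):
    """Action with the highest expected discounted return in `state`."""
    actions = transitions[state]
    return max(actions, key=lambda a: action_value(actions[a], values, gamma))


# Hypothetical toy problem: take $1 now, or wait one step and collect $2.
TOY_MDP = {
    "start": {
        "take_now": [(1.0, "end", 1.0)],
        "wait": [(1.0, "later", 0.0)],
    },
    "later": {"collect": [(1.0, "end", 2.0)]},
    "end": {},
}

if __name__ == "__main__":
    for gamma in (0.9, 0.4):
        values = value_iteration(TOY_MDP, gamma)
        print(gamma, greedy_action(TOY_MDP, values, gamma, "start"))
```

In this toy problem waiting is worth gamma times 2 as seen from the start, so a patient agent (gamma = 0.9) waits, while an impatient one (gamma = 0.4) takes the $1 immediately.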