A short note on Stanford CS234 Reinforcement Learning 2019 Lecture 1
How should an RL agent balance its actions?
- Exploration: trying new things that might enable the agent to make better decisions in the future
- Exploitation: choosing actions that are expected to yield good reward given past experience
Often there is an exploration-exploitation tradeoff: we may have to sacrifice reward now in order to explore and learn about a potentially better policy.
To make the idea concrete, suppose you go to a restaurant that serves several different dishes, and you want to end up eating the best one. The best strategy actually depends on how long you will be near that restaurant. If you are going to live there for a long time, the best strategy is to try them all; if instead this is your last visit, you should order the best dish you already know.
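The restaurant example can be sketched in a few lines of Python. This is only an illustration, not something from the lecture: the menu values, the function name `total_reward`, and the "try new dishes first, then commit" rule are all assumptions made here, and rewards are taken to be noiseless so that one tasting reveals a dish's true value.

```python
def total_reward(true_rewards, known_dish, horizon, explore_visits):
    """Total enjoyment over `horizon` visits (hypothetical toy model).

    The diner starts out knowing only `known_dish`. During the first
    `explore_visits` visits they order untried dishes in menu order;
    afterwards they always order the best dish tasted so far.
    Assumption: rewards are noiseless, so one tasting is enough.
    """
    tasted = {known_dish: true_rewards[known_dish]}
    untried = [d for d in range(len(true_rewards)) if d != known_dish]
    total = 0.0
    for visit in range(horizon):
        if visit < explore_visits and untried:
            dish = untried.pop(0)               # explore: order something new
            tasted[dish] = true_rewards[dish]
        else:
            dish = max(tasted, key=tasted.get)  # exploit: best dish so far
        total += true_rewards[dish]
    return total


# Hypothetical enjoyment per dish; dish 0 is the one the diner already knows.
menu = [5.0, 2.0, 9.0, 4.0]

for horizon in (1, 50):
    exploit_only = total_reward(menu, 0, horizon, explore_visits=0)
    explore_first = total_reward(menu, 0, horizon, explore_visits=3)
    print(horizon, exploit_only, explore_first)
# prints:
# 1 5.0 2.0
# 50 250.0 438.0
```

With a single visit, exploring costs reward (2.0 instead of 5.0). With 50 visits, the three exploratory meals pay for themselves many times over, because they uncover the dish worth 9.0.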
The underlying idea is fairly simple. Applied to human life, it means you should try different things while you are young and stick to whatever interests you as you get older.
It also suggests that "Treat every day as if it's your last" is actually a terrible strategy. If it really were your last day, you should always do whatever gives you the maximum pleasure, but if you have a future, you should spend more time on "exploration".