---
title: Exploration and Exploitation.
date: 2020-09-28 18:34
tags: ":behavior:algorithm:reinforcement-learning:literature-note:"
type: note
---

Exploration and Exploitation.

Challenge: How can an algorithm explore and exploit the solution space at the same time?
Exploration: The algorithm selects non-greedy actions, in other words, actions other than the one with the highest estimated value (reward).
Exploitation: The algorithm selects the greedy action, i.e., the action with the highest estimated value (reward).
Conflict: Should the algorithm explore, and have a chance to maximize future reward, or exploit, and maximize short-term reward?
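The sketch below shows both choices in Python. It assumes ε-greedy selection, a common method described by Sutton and Barto that this note does not itself name. With probability `epsilon` the agent explores by picking a random action. Otherwise it exploits by picking the action with the highest estimated value. The function name `epsilon_greedy` and the example values are illustrative only.

```python
import random
from typing import Sequence


def epsilon_greedy(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Return an action index: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        # Exploration: pick uniformly among all actions (this may include the greedy one).
        return rng.randrange(len(q_values))
    # Exploitation: pick the greedy action, breaking ties at random.
    best = max(q_values)
    greedy = [a for a, q in enumerate(q_values) if q == best]
    return rng.choice(greedy)


if __name__ == "__main__":
    rng = random.Random(0)
    q = [0.2, 1.5, 0.7]  # hypothetical estimated action values
    print(epsilon_greedy(q, epsilon=0.0, rng=rng))  # 1 (pure exploitation)
```

Setting `epsilon=0` gives a purely greedy agent. A small positive value such as 0.1 reserves a fraction of steps for exploration.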
RL algorithms use training information that evaluates the actions taken, rather than instructing by giving the correct actions.
- "Evaluative feedback indicates how good the action taken was, but not whether it was the best or the worst action possible. Purely instructive feedback, on the other hand, indicates the correct action to take, independently of the action actually taken." Sutton, Page 47
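To make the distinction concrete, here is a hypothetical 3-armed bandit in Python; the arm means are made up for illustration. Evaluative feedback returns only a noisy reward for the action taken. Instructive feedback returns the correct action regardless of which action was taken.

```python
import random

# Hypothetical 3-armed bandit: true mean reward of each action (illustrative numbers).
TRUE_MEANS = [0.2, 1.0, 0.5]


def evaluative_feedback(action: int, rng: random.Random) -> float:
    """Return a noisy reward for the action taken; says nothing about other actions."""
    return rng.gauss(TRUE_MEANS[action], 1.0)


def instructive_feedback(action: int) -> int:
    """Return the correct action; `action` is deliberately ignored."""
    return max(range(len(TRUE_MEANS)), key=lambda a: TRUE_MEANS[a])


if __name__ == "__main__":
    rng = random.Random(0)
    print(evaluative_feedback(0, rng))  # a single reward value for action 0
    print(instructive_feedback(0), instructive_feedback(2))  # 1 1
```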
How can a balance between exploration and exploitation be found?
Methods:

Provide feedback
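One way to balance the two is sketched below. It is offered as an assumption rather than as this note's own method. It follows the simple ε-greedy bandit algorithm in Sutton and Barto: actions are chosen ε-greedily, and each action's value estimate is updated with an incremental sample average, `Q += (R - Q) / N`. The Gaussian bandit, `run_bandit`, and its parameters are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def run_bandit(
    true_means: Sequence[float], epsilon: float, steps: int, seed: int = 0
) -> Tuple[List[float], float]:
    """Run an epsilon-greedy agent on a Gaussian bandit.

    Returns the final value estimates and the average reward per step.
    """
    rng = random.Random(seed)
    k = len(true_means)
    q = [0.0] * k  # estimated value Q(a) of each action
    n = [0] * k    # N(a): how many times each action was taken
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(k)  # explore: random action
        else:
            a = max(range(k), key=lambda i: q[i])  # exploit: greedy (lowest index on ties)
        r = rng.gauss(true_means[a], 1.0)  # reward with unit-variance noise
        n[a] += 1
        q[a] += (r - q[a]) / n[a]  # incremental sample average
        total += r
    return q, total / steps


if __name__ == "__main__":
    means = [0.2, 1.0, 0.5]  # hypothetical true action values
    for eps in (0.0, 0.1):
        q, avg = run_bandit(means, eps, steps=1000)
        print(f"epsilon={eps}: average reward {avg:.2f}, estimates {[round(x, 2) for x in q]}")
```

With `epsilon > 0`, every action keeps being sampled, so the estimates move toward the true means. With `epsilon = 0`, the agent can lock onto whichever action first looked good.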