Reward Plugin

Every reinforcement learning training program in this repository requires a Python source file called a reward plug-in. Users MUST supply one, and it allows rewards to be designed with great flexibility. The reward is one of the most important factors in determining what a model learns in reinforcement learning, so the freedom to design rewards is essentially the freedom to decide the direction in which the model is trained.

The only requirement for a reward plug-in is to define the get_reward function with the following signature:

from typing import Optional

import torch

def get_reward(
    sparse: torch.Tensor, numeric: torch.Tensor, progression: torch.Tensor,
    candidate: torch.Tensor, index: torch.Tensor, game_rank: Optional[int],
    game_score: Optional[int]) -> float:
    ...

The reinforcement learning training program calls this function once for each training example, that is, for each pair of decision-making points that represents a state transition, or for the last decision-making point of each game. For a state transition, the arguments are the features of the decision-making point immediately after the transition. For the last decision-making point of a game, the arguments are the features of that point together with the results of the game. The meaning of each parameter is as follows:

sparse: sparse feature of the decision-making point,
numeric: numeric feature of the decision-making point,
progression: progression feature of the decision-making point,
candidate: possible actions of the decision-making point,
index: actual action of the decision-making point,
game_rank: the rank at the end of the game if the function is called at the end of a game, None otherwise, and
game_score: the score at the end of the game if the function is called at the end of a game, None otherwise.

The return value of the function MUST be a real number, representing the reward at each state transition.
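
As an illustration of the contract described above, the following is a minimal sketch of a reward plug-in that gives no reward for intermediate state transitions and a rank-based reward at the end of each game. The reward table `_RANK_REWARDS` and its values are hypothetical, and the sketch assumes that game_rank is 0-based (0 for first place) in a four-player game. This page does not state either of these, so adjust them to match the actual data. Because this sketch never inspects the tensors, torch is imported for the type annotations only.

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Optional

if TYPE_CHECKING:
    # Needed only for the annotations; this sketch never inspects the tensors.
    import torch

# Hypothetical reward table, indexed by the final rank.
# ASSUMPTION: game_rank is 0-based (0 = first place) in a four-player game.
_RANK_REWARDS = (1.0, 0.5, -0.5, -1.0)


def get_reward(
    sparse: torch.Tensor, numeric: torch.Tensor, progression: torch.Tensor,
    candidate: torch.Tensor, index: torch.Tensor, game_rank: Optional[int],
    game_score: Optional[int]) -> float:
    # Not the end of a game: this sketch gives no intermediate reward.
    if game_rank is None:
        return 0.0
    # End of a game: look up the reward for the final rank.
    return _RANK_REWARDS[game_rank]
```

A reward that depends on the state itself would read the sparse, numeric, progression, candidate, or index tensors instead of ignoring them. This sketch also leaves game_score unused.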
