petroud/E-greedy_and_UCB_algorithms

A Python implementation comparing the performance of the ε-greedy and Upper-Confidence-Bound (UCB) algorithms on most-valuable-bandit selection.

-->Environment:
    - k stochastic bandits
    - Each bandit i has a reward that is uniformly distributed in [a_i, b_i].
    - a_i and b_i should be chosen randomly in [0,1] and must differ for each arm i.
    - For example, if a_i = 0.3 and b_i = 0.8 for arm i, then its reward at a given time can take any value in [0.3, 0.8] with equal probability ("uniform"), and its expected reward is mu_i = (a_i + b_i)/2 = 0.55 (a minimal environment sketch follows this list).
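A minimal sketch of this environment, assuming NumPy; the names (make_bandit, pull) are illustrative, not part of the assignment:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so runs are reproducible

def make_bandit(k):
    """Draw (a_i, b_i) for each of k arms, uniformly in [0,1] with a_i < b_i.
    Continuous draws make the pairs distinct across arms almost surely."""
    endpoints = np.sort(rng.uniform(0.0, 1.0, size=(k, 2)), axis=1)
    return endpoints[:, 0], endpoints[:, 1]

def pull(a, b, i):
    """Sample one reward for arm i: uniform on [a_i, b_i]."""
    return rng.uniform(a[i], b[i])

a, b = make_bandit(k=10)
mu = (a + b) / 2  # expected rewards, e.g. mu_i = 0.55 when [a_i, b_i] = [0.3, 0.8]
```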

-->Algorithms:
    1. ε-Greedy: assume ε_t is reduced over time according to the theorem in the slides.
    2. Upper Confidence Bound (UCB) algorithm (illustrative sketches of both algorithms follow this list).
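Hedged sketches of both algorithms under the environment above. The ε_t = min(1, k/t) decay and the sqrt(2 ln t / n_i) bonus are standard textbook choices; the theorem in the slides may prescribe different constants:

```python
import numpy as np

def eps_greedy(a, b, T, rng):
    """ε-greedy with a decaying exploration rate (assumed ~1/t schedule)."""
    k = len(a)
    counts = np.zeros(k)   # n_i: number of pulls of each arm
    means = np.zeros(k)    # empirical mean reward of each arm
    rewards = np.zeros(T)
    for t in range(T):
        eps_t = min(1.0, k / (t + 1))           # assumed decay; adapt to the slides
        if rng.random() < eps_t:
            i = rng.integers(k)                 # explore: uniformly random arm
        else:
            i = int(np.argmax(means))           # exploit: best empirical arm
        r = rng.uniform(a[i], b[i])
        counts[i] += 1
        means[i] += (r - means[i]) / counts[i]  # incremental mean update
        rewards[t] = r
    return rewards

def ucb(a, b, T, rng):
    """UCB1-style index: empirical mean plus sqrt(2 ln t / n_i) bonus."""
    k = len(a)
    counts = np.zeros(k)
    means = np.zeros(k)
    rewards = np.zeros(T)
    for t in range(T):
        if t < k:
            i = t                               # initialization: pull each arm once
        else:
            i = int(np.argmax(means + np.sqrt(2 * np.log(t + 1) / counts)))
        r = rng.uniform(a[i], b[i])
        counts[i] += 1
        means[i] += (r - means[i]) / counts[i]
        rewards[t] = r
    return rewards
```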

-->Measurement Tasks:
    1. Produce plots that prove or disprove the respective sublinear regret rates of each scheme (a regret-plot sketch follows this list).
    2. Compare the convergence/learning speed of the two algorithms for T = 1000, k = 10.
    3. Repeat (2) for two more scenarios with different T, k values and comment on the differences and similarities.
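A minimal sketch of the regret measurement for task 1, assuming the make_bandit, eps_greedy, and ucb sketches above. It plots empirical cumulative regret (mu_star * t minus accumulated reward); averaging over several runs would smooth the curves:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
T, k = 1000, 10
a, b = make_bandit(k)            # environment sketch above
mu_star = ((a + b) / 2).max()    # expected reward of the best arm

for algo in (eps_greedy, ucb):   # algorithm sketches above
    rewards = algo(a, b, T, rng)
    regret = mu_star * np.arange(1, T + 1) - np.cumsum(rewards)
    plt.plot(regret, label=algo.__name__)

plt.xlabel("round t")
plt.ylabel("cumulative regret")
plt.title("ε-greedy vs UCB (T = 1000, k = 10)")
plt.legend()
plt.show()
```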

Hand in:
    1. Your Python notebook with the code.
       It must execute correctly, producing the respective plots listed above,
       and it must be commented in detail (follow the example notebook I uploaded).
    2. A short report (1-2 pages max) with
       - your measurement plots and a short comment on each
         (all plots should include axis titles, legends, etc., to be readable).
