Progress stalled? Currently checking #112
Comments
maybe the structure 12x256 is not as good as 15x192? |
Going to try an experiment today. There will be quite a lot of fast and small matches to get some feedback: do not panic! :-) |
it seems we spotted a false positive (38a13344) just in time and avoided a false promotion, this gives me a few ideas on how SAI's promotion strategy could be improved, i'd like to know if you find some of them interesting :slight_smile:
what do you think about all this? |
About promotion strategy: we did many different experiments in 7x7 runs and three experiments in 9x9 runs, so that we could more or less get it right on the first attempt on 19x19. We are now using a weak gating strategy, tested with good results in one of the 9x9 runs: always promote a network after the same number of games (5120, currently), but test a small number of candidates and choose the best among those. The aim is to avoid picking a very bad network, not to always have a real improvement. Recently we have been hand-picking the networks a little, and doing early reference and comparison tests before promotion, to be sure that we are choosing correctly. In principle, this could be avoided completely: even if we let the procedure choose a network that is a step back in strength, it will be a small step back (no more than 50 Elo, I would say) and will be recovered quickly in the next generations.
The problem is that at this level of play, the final results of games do not hold enough information to discriminate between similar networks. Of course you can have big steps in strength when you increase blocks or do other major things, but generally similar nets will get around 50% wins against each other. You can see that there is not much difference in our results against LZ107 through LZ116. If you study a match game with a stronger AI, you can identify the key move that lost the game. If you then check again with the AI that made the mistake, you will see that the move was often chosen by chance, in the sense that if you ask two or three times it will often change its mind and choose the right move instead. So I believe that the result of match games at this level is 5% a difference in strength and 95% a fair coin toss. We will need to find a way to compare networks at a finer level of detail: move by move, instead of game by game. We have ideas, but it is a lot of work, so maybe in the future.
In conclusion, we are not changing the weak-gating strategy. In particular, every promoted network will play the same number of games and then be replaced. Apart from this, we will listen to suggestions, but remember that match game results are noisy and carry little information, and that it is not so important to always have a stronger net. Finally, I believe that while we can play with the number of visits in reference, panel and comparison games, for promotion matches we want to keep the number of visits equal to the official one for self-play, because in this way we are judging networks fairly. (Recall that v1 means using only the policy head, and the higher the visits, the more weight you put on the value head, so if you change the visits, you weight the two components differently.) |
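The weak-gating rule described above can be sketched in a few lines. This is only an illustration under stated assumptions, not SAI's actual server code: the `Candidate` class and the `promotion_due` and `pick_promotion` functions are hypothetical names, and ranking candidates by win rate in their test matches is an assumed selection criterion. Only the 5120-game generation length comes from the comment.

```python
from dataclasses import dataclass

# Generation length quoted in the comment above; everything else is illustrative.
GAMES_PER_GENERATION = 5120


@dataclass
class Candidate:
    """A candidate network and its (hypothetical) test-match record."""
    name: str
    wins: int
    games: int

    @property
    def win_rate(self) -> float:
        return self.wins / self.games if self.games else 0.0


def promotion_due(self_play_games: int, threshold: int = GAMES_PER_GENERATION) -> bool:
    """Weak gating promotes on a fixed schedule, not on match results."""
    return self_play_games >= threshold


def pick_promotion(candidates: list[Candidate]) -> Candidate:
    """Pick the best-scoring candidate, even if none reaches a 50% win rate.

    The goal is only to avoid a very bad network, so there is no
    minimum win-rate threshold as in strict gating.
    """
    if not candidates:
        raise ValueError("need at least one candidate network")
    return max(candidates, key=lambda c: c.win_rate)


candidates = [
    Candidate("net_a", wins=18, games=40),
    Candidate("net_b", wins=19, games=40),
    Candidate("net_c", wins=12, games=40),
]
chosen = pick_promotion(candidates)
```

In this example no candidate wins half of its games, yet one is still promoted once the generation has played its quota. That is the difference from strict gating, where a candidate below a fixed win-rate threshold would be rejected.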
yes, no gating is great too :)
good point
ah i wasn't aware of this, so the data serves many purpose, i see, good initiative indeed
yes, handpicking is generally something we'd want to avoid, and just let the AI training be autonomous for sustainability. i admit as a contributor that i'm always looking forward to promotions that increase strength, but i am not expecting it to always improve either (i know it can go up and down normally). still, i think it may be a letdown for some contributors to see it "regress" in their eyes (even though results show consistent if not rising global strength against lz networks). i think this point is something SAI would greatly benefit from communicating more about
some introduction like that in the website with SAI's own words would be super effective and appreciated i didnt finish reading but i'm commenting this first so i dont randomly lose it :) |
ok i understand it all, big thanks 💯 i also think SAI's weak gating approach is working fine and there is no reason to change it. it also seems i was a bit too focused on strength improvement at every promotion (all the more reason why SAI would greatly benefit from clarifying that on the website with a small introduction). if significance can be increased for promotion matches (for example by increasing the number of games a bit), i think it could only be a good thing, as long as it is not done at the expense of other things SAI needs. thanks for all this information 💯 |
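To get a feel for how many games "a bit more" would have to be, here is a rough back-of-the-envelope sketch. It assumes the standard logistic Elo model, independent games with no draws, and a normal approximation to the binomial; the function names are made up for illustration and are not part of any SAI tool.

```python
import math


def expected_score(elo_diff: float) -> float:
    """Expected win rate of the stronger side under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def win_rate_std_error(p: float, n_games: int) -> float:
    """Standard error of an observed win rate over n independent games."""
    return math.sqrt(p * (1.0 - p) / n_games)


def games_needed(elo_diff: float, z: float = 1.96) -> int:
    """Rough number of games for the expected win rate to sit z standard
    errors away from 50% (normal approximation, draws ignored)."""
    p = expected_score(elo_diff)
    margin = p - 0.5
    if margin == 0.0:
        raise ValueError("equal-strength networks cannot be separated")
    return math.ceil(z * z * p * (1.0 - p) / (margin * margin))


# Compare a 50 Elo gap (the "small step back" mentioned earlier) with a 10 Elo gap.
for diff in (50, 10):
    print(diff, round(expected_score(diff), 3), games_needed(diff))
```

Under these assumptions a 50 Elo gap corresponds to a win rate only a few points above 50%, and the number of games required grows roughly with the inverse square of the gap, so separating nearly equal networks by match results alone gets expensive quickly.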
The Elo ratings of recently promoted networks are not very exciting. We will run an exceptional number of promotion matches for the current generation, to see if we can understand what the problem may be.
Stay tuned.