BASE/BC_H13/IQL_S3 v20220301
Cryolite edited this page Oct 11, 2022 · 17 revisions
- Type: Transformer encoder layers (the same network structure as the one used for BERT<sub>BASE</sub>)
- Dimension: 768
- # of heads: 12
- Dimension of feedforward networks: 3072
- # of layers: 12
- Activation function: GELU
- Dropout rate in training: 0.1
- Initialization: Transferred from the trained encoder of BASE/BC_H13 v20220210
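As a rough size check, the encoder hyperparameters above can be turned into a per-head dimension and a parameter count. This is a back-of-the-envelope sketch, assuming the standard Transformer encoder layer layout (four attention projections with biases, a two-layer feedforward block, and two layer norms) and ignoring embeddings. `EncoderConfig` is a hypothetical name used only for illustration, not something taken from the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderConfig:
    """Hypothetical container for the encoder hyperparameters listed above."""

    dimension: int = 768
    num_heads: int = 12
    dim_feedforward: int = 3072
    num_layers: int = 12

    @property
    def head_dimension(self) -> int:
        # Each attention head works on an equal slice of the model dimension.
        assert self.dimension % self.num_heads == 0
        return self.dimension // self.num_heads

    @property
    def parameters_per_layer(self) -> int:
        d, f = self.dimension, self.dim_feedforward
        attention = 4 * (d * d + d)  # Q, K, V, and output projections (assumed layout)
        feedforward = (d * f + f) + (f * d + d)  # two linear maps with biases
        layer_norms = 2 * (2 * d)  # scale and shift, twice per layer
        return attention + feedforward + layer_norms

    @property
    def total_parameters(self) -> int:
        return self.num_layers * self.parameters_per_layer


config = EncoderConfig()
print(config.head_dimension)    # 64
print(config.total_parameters)  # 85054464
```

Under these assumptions the encoder stack alone holds roughly 85 million parameters, which is in line with the BERT-style layout the list refers to.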
- Type: Single-layer position-wise feedforward network
- Dimension: 3072
- Activation function: GELU
- Dropout rate in training: 0.1
- Initialization: Random
- Type: Transformer encoder layers (the same network structure as the one used for BERT<sub>BASE</sub>)
- Dimension: 768
- # of heads: 12
- Dimension of feedforward networks: 3072
- # of layers: 12
- Activation function: GELU
- Dropout rate in training: 0.1
- Initialization: Transferred from the trained encoder of BASE/BC_H13 v20220210
- Type: Dueling network with two single-layer position-wise feedforward networks
- Dimension: 3072
- Activation function: GELU
- Dropout rate in training: 0.1
- Initialization: Random
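A dueling network splits its output into a state-value stream and an advantage stream and then recombines them into Q-values. The page does not say how the two streams are combined, so the sketch below assumes the common mean-subtracted aggregation from the original dueling architecture. `dueling_q_values` is a hypothetical helper name.

```python
def dueling_q_values(state_value, advantages):
    """Combine a scalar state value with per-action advantages.

    Assumed aggregation: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a').
    """
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]


# One state with three candidate actions.
print(dueling_q_values(1.0, [0.5, -0.5, 0.0]))  # [1.5, 0.5, 1.0]
```

Subtracting the mean advantage keeps the decomposition identifiable: adding a constant to every advantage leaves the resulting Q-values unchanged.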
- Type: Implicit Q-learning (IQL)
- Reward: Game delta of grading points as a Saint 3 player in the Jade room
Crawled Game Records v202007_202107
100044800 samples randomly sampled from the crawled game records and shuffled.
- Discount factor (γ): 0.99
- Expectile (τ): 0.9
- Soft update (Polyak averaging) rate of target networks (α): 0.1
- Optimizer: LAMB
- Learning rate: 0.001
- ε: 1.0e-6
- Batch size: 4096
- # of training epochs: 1
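To make the roles of γ, τ, and α concrete, here is a minimal scalar sketch of the three ingredients as they are usually defined for IQL: the discounted TD target, the asymmetric expectile loss, and the Polyak update of a target network. The function names are hypothetical and the formulas follow the usual IQL formulation, not code from this project.

```python
def td_target(reward, next_state_value, gamma=0.99, terminal=False):
    """Discounted one-step target r + gamma * V(s'), with no bootstrap at episode end."""
    return reward if terminal else reward + gamma * next_state_value


def expectile_loss(diff, tau=0.9):
    """Asymmetric squared loss |tau - 1(diff < 0)| * diff**2.

    `diff` is Q_target(s, a) - V(s). With tau = 0.9, underestimating Q
    (diff > 0) costs nine times more than overestimating it.
    """
    weight = tau if diff > 0 else 1.0 - tau
    return weight * diff * diff


def polyak_update(target_params, source_params, alpha=0.1):
    """Move each target parameter a fraction alpha toward the source parameter."""
    return [(1.0 - alpha) * t + alpha * s for t, s in zip(target_params, source_params)]


print(expectile_loss(1.0), expectile_loss(-1.0))
print(polyak_update([0.0, 2.0], [1.0, 2.0]))
```

With τ = 0.9 the value function is pulled toward an upper expectile of the Q-values seen in the data, which is how IQL approximates a maximum over actions without querying actions outside the dataset.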
- Inverse temperature (β): 1.0
- Optimizer: LAMB
- Learning rate: 0.001
- ε: 1.0e-6
- Batch size: 4096
- # of training epochs: 1 (more than 1 epoch seems to result in overfitting)
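If the inverse temperature β plays the same role here as in the policy extraction step of IQL (an assumption, since the page does not spell it out), each sample's log-likelihood term is weighted by the exponentiated advantage. The sketch below shows that weighting for a single sample. The function names are hypothetical, and any weight clipping a real implementation might apply is left out.

```python
from math import exp, log


def advantage_weight(q_value, state_value, beta=1.0):
    """Weight exp(beta * (Q(s, a) - V(s))) used in advantage-weighted regression."""
    return exp(beta * (q_value - state_value))


def weighted_policy_loss(action_probability, q_value, state_value, beta=1.0):
    """Negative log-likelihood of the recorded action, scaled by its advantage weight."""
    return -advantage_weight(q_value, state_value, beta) * log(action_probability)


# An action with advantage +1 counts e times as much as one with advantage 0.
print(advantage_weight(2.0, 1.0) / advantage_weight(1.0, 1.0))
```

A larger β concentrates the policy on the highest-advantage actions in the data, while β close to 0 reduces the objective to plain behavioral cloning.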
Quantitative Comparison with BASE/BC_H13 v20220210 as the Baseline
Please refer to Methods and Metrics in Performance Comparison and Evaluation for the evaluation method and the meaning of each metric.
2500 sets of duplicate mahjong for the 1vs3 and 3vs1 styles, respectively, and 1667 sets for the 2vs2 style. All games are half-length.
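The set counts line up with the "# of games" rows in the tables if each set plays every seating assignment of the two models once. That reading is an assumption, but it reproduces the reported totals. `games_per_set` is a hypothetical helper.

```python
from math import comb


def games_per_set(num_seats_for_model, table_size=4):
    """Number of ways to choose which seats the evaluated model occupies."""
    return comb(table_size, num_seats_for_model)


for style, seats, sets in [("1vs3", 1, 2500), ("2vs2", 2, 1667), ("3vs1", 3, 2500)]:
    print(style, sets * games_per_set(seats))
# 1vs3 10000
# 2vs2 10002
# 3vs1 10000
```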
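| | average | variance (unbiased) | 99% CI LL | 95% CI LL | 95% CI UL | 99% CI UL |
|---|---|---|---|---|---|---|
| 1vs3 # of games | 10000 | | | | | |
| 1vs3 ranking | 2.457×10<sup>0</sup> | 1.306×10<sup>0</sup> | 2.428×10<sup>0</sup> | 2.435×10<sup>0</sup> | 2.479×10<sup>0</sup> | 2.486×10<sup>0</sup> |
| 1vs3 grading point | -1.631×10<sup>1</sup> | 2.526×10<sup>4</sup> | -2.040×10<sup>1</sup> | -1.943×10<sup>1</sup> | -1.319×10<sup>1</sup> | -1.222×10<sup>1</sup> |
| 1vs3 soul point | 1.488×10<sup>-2</sup> | 1.508×10<sup>-1</sup> | 4.875×10<sup>-3</sup> | 7.268×10<sup>-3</sup> | 2.249×10<sup>-2</sup> | 2.488×10<sup>-2</sup> |
| 1vs3 top rate | 2.760×10<sup>-1</sup> | 1.998×10<sup>-5</sup> | 2.645×10<sup>-1</sup> | 2.672×10<sup>-1</sup> | 2.848×10<sup>-1</sup> | 2.875×10<sup>-1</sup> |
| 1vs3 quinella rate | 5.198×10<sup>-1</sup> | 2.496×10<sup>-5</sup> | 5.069×10<sup>-1</sup> | 5.100×10<sup>-1</sup> | 5.296×10<sup>-1</sup> | 5.327×10<sup>-1</sup> |
| 1vs3 ranking diff | -5.733×10<sup>-2</sup> | 2.322×10<sup>0</sup> | N/A (one-sided) | N/A (one-sided) | -3.226×10<sup>-2</sup> | -2.188×10<sup>-2</sup> |
| 2vs2 # of games | 10002 | | | | | |
| 2vs2 ranking | 2.428×10<sup>0</sup> | 1.279×10<sup>0</sup> | 2.399×10<sup>0</sup> | 2.406×10<sup>0</sup> | 2.450×10<sup>0</sup> | 2.457×10<sup>0</sup> |
| 2vs2 grading point | -1.125×10<sup>1</sup> | 2.442×10<sup>4</sup> | -1.528×10<sup>1</sup> | -1.431×10<sup>1</sup> | -8.187×10<sup>0</sup> | -7.224×10<sup>0</sup> |
| 2vs2 soul point | 2.461×10<sup>-2</sup> | 1.480×10<sup>-1</sup> | 1.470×10<sup>-2</sup> | 1.707×10<sup>-2</sup> | 3.215×10<sup>-2</sup> | 3.452×10<sup>-2</sup> |
| 2vs2 top rate | 2.794×10<sup>-1</sup> | 1.007×10<sup>-5</sup> | 2.712×10<sup>-1</sup> | 2.732×10<sup>-1</sup> | 2.856×10<sup>-1</sup> | 2.876×10<sup>-1</sup> |
| 2vs2 quinella rate | 5.302×10<sup>-1</sup> | 1.245×10<sup>-5</sup> | 5.211×10<sup>-1</sup> | 5.233×10<sup>-1</sup> | 5.371×10<sup>-1</sup> | 5.393×10<sup>-1</sup> |
| 2vs2 ranking diff | -1.439×10<sup>-1</sup> | 1.658×10<sup>0</sup> | N/A (one-sided) | N/A (one-sided) | -1.227×10<sup>-1</sup> | -1.139×10<sup>-1</sup> |
| 3vs1 # of games | 10000 | | | | | |
| 3vs1 ranking | 2.482×10<sup>0</sup> | 1.260×10<sup>0</sup> | 2.453×10<sup>0</sup> | 2.460×10<sup>0</sup> | 2.504×10<sup>0</sup> | 2.511×10<sup>0</sup> |
| 3vs1 grading point | -1.749×10<sup>1</sup> | 2.458×10<sup>4</sup> | -2.153×10<sup>1</sup> | -2.056×10<sup>1</sup> | -1.442×10<sup>1</sup> | -1.345×10<sup>1</sup> |
| 3vs1 soul point | 6.140×10<sup>-3</sup> | 1.461×10<sup>-1</sup> | -3.707×10<sup>-3</sup> | -1.352×10<sup>-3</sup> | 1.363×10<sup>-2</sup> | 1.599×10<sup>-2</sup> |
| 3vs1 top rate | 2.572×10<sup>-1</sup> | 6.369×10<sup>-6</sup> | 2.507×10<sup>-1</sup> | 2.523×10<sup>-1</sup> | 2.621×10<sup>-1</sup> | 2.637×10<sup>-1</sup> |
| 3vs1 quinella rate | 5.084×10<sup>-1</sup> | 8.331×10<sup>-6</sup> | 5.010×10<sup>-1</sup> | 5.027×10<sup>-1</sup> | 5.141×10<sup>-1</sup> | 5.158×10<sup>-1</sup> |
| 3vs1 ranking diff | -7.067×10<sup>-2</sup> | 2.162×10<sup>0</sup> | N/A (one-sided) | N/A (one-sided) | -4.648×10<sup>-2</sup> | -3.646×10<sup>-2</sup> |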
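The confidence limits for the ranking rows can be reproduced from the average, the unbiased variance, and the number of games with a normal approximation. This is a sketch under that assumption, checked here only against the 1vs3 ranking and ranking diff rows. The linked Methods and Metrics page remains the authoritative description, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_ci(mean, variance, num_games, level):
    """Normal-approximation interval mean +/- z * sqrt(variance / n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * sqrt(variance / num_games)
    return mean - half_width, mean + half_width


def one_sided_upper_limit(mean, variance, num_games, level):
    """Upper limit of a one-sided interval, as used for the ranking diff rows."""
    z = NormalDist().inv_cdf(level)
    return mean + z * sqrt(variance / num_games)


# 1vs3 ranking: average 2.457, variance 1.306, 10000 games.
lower, upper = two_sided_ci(2.457, 1.306, 10000, 0.95)
print(f"{lower:.3f} {upper:.3f}")  # 2.435 2.479

# 1vs3 ranking diff: average -0.05733, variance 2.322, 10000 games.
print(one_sided_upper_limit(-0.05733, 2.322, 10000, 0.95))
```

A ranking diff whose one-sided upper limit stays below zero indicates that the evaluated model ranks better than the baseline at that confidence level.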
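| | average | variance (unbiased) | 99% CI LL | 95% CI LL | 95% CI UL | 99% CI UL |
|---|---|---|---|---|---|---|
| 1vs3 # of games | 10000 | | | | | |
| 1vs3 ranking | 2.618×10<sup>0</sup> | 1.254×10<sup>0</sup> | 2.589×10<sup>0</sup> | 2.596×10<sup>0</sup> | 2.640×10<sup>0</sup> | 2.647×10<sup>0</sup> |
| 1vs3 grading point | -3.647×10<sup>1</sup> | 2.588×10<sup>4</sup> | -4.061×10<sup>1</sup> | -3.962×10<sup>1</sup> | -3.332×10<sup>1</sup> | -3.233×10<sup>1</sup> |
| 1vs3 soul point | -3.940×10<sup>-2</sup> | 1.453×10<sup>-1</sup> | -4.922×10<sup>-2</sup> | -4.687×10<sup>-2</sup> | -3.193×10<sup>-2</sup> | -2.958×10<sup>-2</sup> |
| 1vs3 top rate | 2.162×10<sup>-1</sup> | 1.695×10<sup>-5</sup> | 2.056×10<sup>-1</sup> | 2.081×10<sup>-1</sup> | 2.243×10<sup>-1</sup> | 2.268×10<sup>-1</sup> |
| 1vs3 quinella rate | 4.588×10<sup>-1</sup> | 2.483×10<sup>-5</sup> | 4.460×10<sup>-1</sup> | 4.490×10<sup>-1</sup> | 4.686×10<sup>-1</sup> | 4.716×10<sup>-1</sup> |
| 1vs3 ranking diff | 1.568×10<sup>-1</sup> | 2.229×10<sup>0</sup> | N/A (one-sided) | N/A (one-sided) | 1.814×10<sup>-1</sup> | 1.915×10<sup>-1</sup> |
| 2vs2 # of games | 10002 | | | | | |
| 2vs2 ranking | 2.442×10<sup>0</sup> | 1.284×10<sup>0</sup> | 2.413×10<sup>0</sup> | 2.420×10<sup>0</sup> | 2.464×10<sup>0</sup> | 2.471×10<sup>0</sup> |
| 2vs2 grading point | -1.317×10<sup>1</sup> | 2.455×10<sup>4</sup> | -1.721×10<sup>1</sup> | -1.624×10<sup>1</sup> | -1.010×10<sup>1</sup> | -9.134×10<sup>0</sup> |
| 2vs2 soul point | 1.969×10<sup>-2</sup> | 1.485×10<sup>-1</sup> | 9.763×10<sup>-3</sup> | 1.214×10<sup>-2</sup> | 2.724×10<sup>-2</sup> | 2.962×10<sup>-2</sup> |
| 2vs2 top rate | 2.763×10<sup>-1</sup> | 9.996×10<sup>-6</sup> | 2.682×10<sup>-1</sup> | 2.701×10<sup>-1</sup> | 2.825×10<sup>-1</sup> | 2.844×10<sup>-1</sup> |
| 2vs2 quinella rate | 5.236×10<sup>-1</sup> | 1.247×10<sup>-5</sup> | 5.145×10<sup>-1</sup> | 5.167×10<sup>-1</sup> | 5.305×10<sup>-1</sup> | 5.327×10<sup>-1</sup> |
| 2vs2 ranking diff | -1.155×10<sup>-1</sup> | 1.667×10<sup>0</sup> | N/A (one-sided) | N/A (one-sided) | -9.426×10<sup>-2</sup> | -8.546×10<sup>-2</sup> |
| 3vs1 # of games | 10000 | | | | | |
| 3vs1 ranking | 2.469×10<sup>0</sup> | 1.266×10<sup>0</sup> | 2.440×10<sup>0</sup> | 2.447×10<sup>0</sup> | 2.491×10<sup>0</sup> | 2.498×10<sup>0</sup> |
| 3vs1 grading point | -1.600×10<sup>1</sup> | 2.451×10<sup>4</sup> | -2.003×10<sup>1</sup> | -1.907×10<sup>1</sup> | -1.293×10<sup>1</sup> | -1.197×10<sup>1</sup> |
| 3vs1 soul point | 1.058×10<sup>-2</sup> | 1.467×10<sup>-1</sup> | 7.123×10<sup>-4</sup> | 3.072×10<sup>-3</sup> | 1.809×10<sup>-2</sup> | 2.045×10<sup>-2</sup> |
| 3vs1 top rate | 2.626×10<sup>-1</sup> | 6.455×10<sup>-6</sup> | 2.561×10<sup>-1</sup> | 2.576×10<sup>-1</sup> | 2.676×10<sup>-1</sup> | 2.691×10<sup>-1</sup> |
| 3vs1 quinella rate | 5.139×10<sup>-1</sup> | 8.327×10<sup>-6</sup> | 5.065×10<sup>-1</sup> | 5.082×10<sup>-1</sup> | 5.196×10<sup>-1</sup> | 5.213×10<sup>-1</sup> |
| 3vs1 ranking diff | -1.225×10<sup>-1</sup> | 2.117×10<sup>0</sup> | N/A (one-sided) | N/A (one-sided) | -9.857×10<sup>-2</sup> | -8.865×10<sup>-2</sup> |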
(FIXME: Buggy results as of 2022/03/26)
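| | average | variance (unbiased) | 99% CI LL | 95% CI LL | 95% CI UL | 99% CI UL |
|---|---|---|---|---|---|---|
| 1vs3 # of games | 10000 | | | | | |
| 1vs3 ranking | 2.508×10<sup>0</sup> | 1.235×10<sup>0</sup> | 2.479×10<sup>0</sup> | 2.486×10<sup>0</sup> | 2.530×10<sup>0</sup> | 2.537×10<sup>0</sup> |
| 1vs3 grading point | -2.234×10<sup>1</sup> | 2.543×10<sup>4</sup> | -2.645×10<sup>1</sup> | -2.547×10<sup>1</sup> | -1.921×10<sup>1</sup> | -1.823×10<sup>1</sup> |
| 1vs3 soul point | -2.870×10<sup>-3</sup> | 1.498×10<sup>-1</sup> | -1.284×10<sup>-2</sup> | -1.046×10<sup>-2</sup> | 4.717×10<sup>-3</sup> | 7.101×10<sup>-3</sup> |
| 1vs3 top rate | 2.591×10<sup>-1</sup> | 1.920×10<sup>-5</sup> | 2.478×10<sup>-1</sup> | 2.505×10<sup>-1</sup> | 2.677×10<sup>-1</sup> | 2.704×10<sup>-1</sup> |
| 1vs3 quinella rate | 4.962×10<sup>-1</sup> | 2.500×10<sup>-5</sup> | 4.833×10<sup>-1</sup> | 4.864×10<sup>-1</sup> | 5.060×10<sup>-1</sup> | 5.091×10<sup>-1</sup> |
| 1vs3 ranking diff | 1.107×10<sup>-2</sup> | 2.303×10<sup>0</sup> | N/A (one-sided) | N/A (one-sided) | 3.603×10<sup>-2</sup> | 4.638×10<sup>-2</sup> |
| 2vs2 # of games | 10002 | | | | | |
| 2vs2 ranking | 2.483×10<sup>0</sup> | 1.271×10<sup>0</sup> | 2.454×10<sup>0</sup> | 2.461×10<sup>0</sup> | 2.505×10<sup>0</sup> | 2.512×10<sup>0</sup> |
| 2vs2 grading point | -1.810×10<sup>1</sup> | 2.483×10<sup>4</sup> | -2.216×10<sup>1</sup> | -2.119×10<sup>1</sup> | -1.501×10<sup>1</sup> | -1.404×10<sup>1</sup> |
| 2vs2 soul point | 5.884×10<sup>-3</sup> | 1.472×10<sup>-1</sup> | -3.999×10<sup>-3</sup> | -1.636×10<sup>-3</sup> | 1.340×10<sup>-2</sup> | 1.577×10<sup>-2</sup> |
| 2vs2 top rate | 2.595×10<sup>-1</sup> | 9.607×10<sup>-6</sup> | 2.515×10<sup>-1</sup> | 2.534×10<sup>-1</sup> | 2.656×10<sup>-1</sup> | 2.675×10<sup>-1</sup> |
| 2vs2 quinella rate | 5.083×10<sup>-1</sup> | 1.249×10<sup>-5</sup> | 4.992×10<sup>-1</sup> | 5.014×10<sup>-1</sup> | 5.152×10<sup>-1</sup> | 5.174×10<sup>-1</sup> |
| 2vs2 ranking diff | -3.369×10<sup>-2</sup> | 1.624×10<sup>0</sup> | N/A (one-sided) | N/A (one-sided) | -1.273×10<sup>-2</sup> | -4.042×10<sup>-3</sup> |
| 3vs1 # of games | 10000 | | | | | |
| 3vs1 ranking | 2.499×10<sup>0</sup> | 1.261×10<sup>0</sup> | 2.470×10<sup>0</sup> | 2.477×10<sup>0</sup> | 2.521×10<sup>0</sup> | 2.528×10<sup>0</sup> |
| 3vs1 grading point | -1.958×10<sup>1</sup> | 2.469×10<sup>4</sup> | -2.363×10<sup>1</sup> | -2.266×10<sup>1</sup> | -1.650×10<sup>1</sup> | -1.553×10<sup>1</sup> |
| 3vs1 soul point | 2.733×10<sup>-4</sup> | 1.462×10<sup>-1</sup> | -9.578×10<sup>-3</sup> | -7.222×10<sup>-3</sup> | 7.768×10<sup>-3</sup> | 1.012×10<sup>-2</sup> |
| 3vs1 top rate | 2.533×10<sup>-1</sup> | 6.304×10<sup>-6</sup> | 2.468×10<sup>-1</sup> | 2.484×10<sup>-1</sup> | 2.582×10<sup>-1</sup> | 2.598×10<sup>-1</sup> |
| 3vs1 quinella rate | 5.000×10<sup>-1</sup> | 8.334×10<sup>-6</sup> | 4.926×10<sup>-1</sup> | 4.943×10<sup>-1</sup> | 5.057×10<sup>-1</sup> | 5.074×10<sup>-1</sup> |
| 3vs1 ranking diff | -3.600×10<sup>-3</sup> | 2.162×10<sup>0</sup> | N/A (one-sided) | N/A (one-sided) | 2.059×10<sup>-2</sup> | 3.061×10<sup>-2</sup> |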