
BASE/BC_H13/IQL_S3 v20220301

Cryolite edited this page Oct 11, 2022 · 17 revisions

Model

Value Network V(s)

Encoder

  • Type: Transformer encoder layers (the same network structure as the one used for BERT-Base)
    • Dimension: 768
    • # of heads: 12
    • Dimension of feedforward networks: 3072
    • # of layers: 12
    • Activation function: GELU
    • Dropout rate in training: 0.1
    • Initialization: Transferred from the trained encoder of BASE/BC_H13 v20220210
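
As a rough sanity check on the size of this encoder, the hyperparameters above can be turned into a parameter count. This is a minimal sketch that assumes a standard Transformer encoder layer (four attention projections with biases, a two-layer feedforward block with biases, and two layer normalizations); input embeddings are not described here and are left out. The function name is hypothetical.

```python
def transformer_encoder_params(d_model=768, d_ff=3072, n_layers=12):
    """Count weights and biases in a stack of standard Transformer encoder layers."""
    # Self-attention: query, key, value and output projections,
    # each a d_model x d_model matrix plus a bias vector.
    attention = 4 * (d_model * d_model + d_model)
    # Position-wise feedforward network: d_model -> d_ff -> d_model, with biases.
    feedforward = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    # Two layer normalizations per layer, each with a scale and a shift vector.
    layer_norms = 2 * (2 * d_model)
    return n_layers * (attention + feedforward + layer_norms)


print(transformer_encoder_params())  # 85054464
```

The number of heads does not change the count, because the 12 heads split the same 768-dimensional projections.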

Decoder

  • Type: Single-layer position-wise feedforward network
    • Dimension: 3072
    • Activation function: GELU
    • Dropout rate in training: 0.1
    • Initialization: Random

Q Network Q(s, a)

Encoder

  • Type: Transformer encoder layers (the same network structure as the one used for BERT-Base)
    • Dimension: 768
    • # of heads: 12
    • Dimension of feedforward networks: 3072
    • # of layers: 12
    • Activation function: GELU
    • Dropout rate in training: 0.1
    • Initialization: Transferred from the trained encoder of BASE/BC_H13 v20220210

Decoder

  • Type: Dueling network with two single-layer position-wise feedforward networks
    • Dimension: 3072
    • Activation function: GELU
    • Dropout rate in training: 0.1
    • Initialization: Random
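
For readers unfamiliar with dueling networks, the sketch below shows the usual way the two decoder outputs are combined. It assumes one stream produces a scalar state value and the other produces per-action advantages, and that the mean-subtraction form from the original dueling architecture is used; whether this model follows that exact form, or masks illegal actions first, is not stated on this page. The function name is hypothetical.

```python
def dueling_q_values(state_value, advantages):
    """Combine a scalar V(s) and per-action advantages A(s, a) into Q(s, a)."""
    # Subtracting the mean advantage makes the decomposition identifiable:
    # the Q-values then average to state_value over the actions.
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]


q = dueling_q_values(0.5, [1.0, -1.0, 0.0])
print(q)  # [1.5, -0.5, 0.5]
```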

Objective

Data

Crawled Game Records

Crawled Game Records v202007_202107

Training Examples

100044800 samples randomly sampled from the crawled game records and shuffled.
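
The sample count is an exact multiple of the batch size of 4096 listed under Optimization, so one epoch consists of a whole number of full batches. A quick check:

```python
num_samples = 100_044_800
batch_size = 4096  # batch size listed under Optimization

# One epoch is exactly this many optimizer steps, with no partial batch.
steps, remainder = divmod(num_samples, batch_size)
print(steps, remainder)  # 24425 0
```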

Optimization

Implicit Q-learning (IQL)

  • Discount factor (γ): 0.99
  • Expectile (τ): 0.9
  • Soft update (Polyak averaging) rate of target networks (α): 0.1
  • Optimizer: LAMB
  • Learning rate: 0.001
  • ε: 1.0e-6
  • Batch size: 4096
  • # of training epochs: 1
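
The roles of τ and α can be illustrated with scalar versions of the two operations they control. This is a minimal sketch of the expectile loss from the IQL formulation and of a Polyak update, not the training code behind this model. The function names are hypothetical, and the update convention (α weights the online parameters) is an assumption.

```python
def expectile_loss(q_value, v_value, tau=0.9):
    """Asymmetric squared error that pulls V(s) toward an upper expectile of Q(s, a)."""
    diff = q_value - v_value
    # With tau = 0.9, underestimating Q costs nine times more than overestimating it.
    weight = tau if diff > 0 else 1.0 - tau
    return weight * diff * diff


def soft_update(target_params, online_params, alpha=0.1):
    """Polyak averaging: move each target parameter a fraction alpha toward the online one."""
    return [(1.0 - alpha) * t + alpha * o for t, o in zip(target_params, online_params)]


print(expectile_loss(1.0, 0.0))  # V below Q: weighted by tau
print(expectile_loss(0.0, 1.0))  # V above Q: weighted by 1 - tau
print(soft_update([0.0, 2.0], [1.0, 2.0]))
```

With τ = 0.9 the first loss is about nine times the second for the same absolute error, which is what biases V(s) toward the better actions in the data.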

Advantage Weighted Regression (AWR)

  • Inverse temperature (β): 1.0
  • Optimizer: LAMB
  • Learning rate: 0.001
  • ε: 1.0e-6
  • Batch size: 4096
  • # of training epochs: 1 (More than 1 epoch seems to result in overfitting)
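
The inverse temperature β controls how strongly policy extraction favors actions with a positive advantage. The sketch below uses the common AWR form, in which each logged action's log-likelihood is weighted by exp(β·(Q − V)). The function names are hypothetical, and any weight clipping or normalization used for this model is not documented here and is omitted.

```python
import math


def awr_weight(q_value, v_value, beta=1.0):
    """Exponentiated advantage used to weight the log-likelihood of a logged action."""
    return math.exp(beta * (q_value - v_value))


def awr_loss(log_probs, q_values, v_values, beta=1.0):
    """Mean advantage-weighted negative log-likelihood over a batch."""
    weights = [awr_weight(q, v, beta) for q, v in zip(q_values, v_values)]
    return -sum(w * lp for w, lp in zip(weights, log_probs)) / len(log_probs)


# A positive advantage is weighted more heavily as beta grows.
print(awr_weight(1.5, 1.0, beta=1.0) < awr_weight(1.5, 1.0, beta=2.0))  # True
```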

Quantitative Comparison with BASE/BC_H13 v20220210 as the Baseline

Please refer to "Methods and Metrics" in "Performance Comparison and Evaluation" for the evaluation method and the meaning of each metric.

2500 sets of duplicate mahjong were played for each of the 1vs3 and 3vs1 styles, and 1667 sets for the 2vs2 style. All games are half-length.
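
The game counts reported in the tables are consistent with each set playing every distinct seat assignment of the two models once. The sketch below works under that assumption, which is inferred from the counts rather than stated on this page; the function name is hypothetical.

```python
from math import comb


def games_per_set(num_challengers, num_seats=4):
    """Number of distinct ways to seat the challenger model at a 4-player table."""
    return comb(num_seats, num_challengers)


for style, challengers, sets in [("1vs3", 1, 2500), ("2vs2", 2, 1667), ("3vs1", 3, 2500)]:
    print(style, sets * games_per_set(challengers))
# 1vs3 10000
# 2vs2 10002
# 3vs1 10000
```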

| | Average | Variance (unbiased) | 99% CI LL | 95% CI LL | 95% CI UL | 99% CI UL |
|---|---|---|---|---|---|---|
| 1vs3 # of games | 10000 | | | | | |
| 1vs3 ranking | 2.457×10^0 | 1.306×10^0 | 2.428×10^0 | 2.435×10^0 | 2.479×10^0 | 2.486×10^0 |
| 1vs3 grading point | -1.631×10^1 | 2.526×10^4 | -2.040×10^1 | -1.943×10^1 | -1.319×10^1 | -1.222×10^1 |
| 1vs3 soul point | 1.488×10^-2 | 1.508×10^-1 | 4.875×10^-3 | 7.268×10^-3 | 2.249×10^-2 | 2.488×10^-2 |
| 1vs3 top rate | 2.760×10^-1 | 1.998×10^-5 | 2.645×10^-1 | 2.672×10^-1 | 2.848×10^-1 | 2.875×10^-1 |
| 1vs3 quinella rate | 5.198×10^-1 | 2.496×10^-5 | 5.069×10^-1 | 5.100×10^-1 | 5.296×10^-1 | 5.327×10^-1 |
| 1vs3 ranking diff | -5.733×10^-2 | 2.322×10^0 | N/A (one-sided) | N/A (one-sided) | -3.226×10^-2 | -2.188×10^-2 |
| 2vs2 # of games | 10002 | | | | | |
| 2vs2 ranking | 2.428×10^0 | 1.279×10^0 | 2.399×10^0 | 2.406×10^0 | 2.450×10^0 | 2.457×10^0 |
| 2vs2 grading point | -1.125×10^1 | 2.442×10^4 | -1.528×10^1 | -1.431×10^1 | -8.187×10^0 | -7.224×10^0 |
| 2vs2 soul point | 2.461×10^-2 | 1.480×10^-1 | 1.470×10^-2 | 1.707×10^-2 | 3.215×10^-2 | 3.452×10^-2 |
| 2vs2 top rate | 2.794×10^-1 | 1.007×10^-5 | 2.712×10^-1 | 2.732×10^-1 | 2.856×10^-1 | 2.876×10^-1 |
| 2vs2 quinella rate | 5.302×10^-1 | 1.245×10^-5 | 5.211×10^-1 | 5.233×10^-1 | 5.371×10^-1 | 5.393×10^-1 |
| 2vs2 ranking diff | -1.439×10^-1 | 1.658×10^0 | N/A (one-sided) | N/A (one-sided) | -1.227×10^-1 | -1.139×10^-1 |
| 3vs1 # of games | 10000 | | | | | |
| 3vs1 ranking | 2.482×10^0 | 1.260×10^0 | 2.453×10^0 | 2.460×10^0 | 2.504×10^0 | 2.511×10^0 |
| 3vs1 grading point | -1.749×10^1 | 2.458×10^4 | -2.153×10^1 | -2.056×10^1 | -1.442×10^1 | -1.345×10^1 |
| 3vs1 soul point | 6.140×10^-3 | 1.461×10^-1 | -3.707×10^-3 | -1.352×10^-3 | 1.363×10^-2 | 1.599×10^-2 |
| 3vs1 top rate | 2.572×10^-1 | 6.369×10^-6 | 2.507×10^-1 | 2.523×10^-1 | 2.621×10^-1 | 2.637×10^-1 |
| 3vs1 quinella rate | 5.084×10^-1 | 8.331×10^-6 | 5.010×10^-1 | 5.027×10^-1 | 5.141×10^-1 | 5.158×10^-1 |
| 3vs1 ranking diff | -7.067×10^-2 | 2.162×10^0 | N/A (one-sided) | N/A (one-sided) | -4.648×10^-2 | -3.646×10^-2 |
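
The interval columns for the ranking and ranking diff rows can be reproduced from the average, the variance, and the number of games with a normal approximation. This is a sketch under that assumption; the authoritative definition is the evaluation page referred to above, and the function names are hypothetical. The rate rows appear to report the variance of the estimate itself rather than a per-game variance, so they are not covered here.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_ci(mean, variance, n, level):
    """Normal-approximation confidence interval for a sample mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * sqrt(variance / n)
    return mean - half_width, mean + half_width


def one_sided_upper(mean, variance, n, level):
    """Upper limit of a one-sided normal-approximation confidence interval."""
    z = NormalDist().inv_cdf(level)
    return mean + z * sqrt(variance / n)


# 1vs3 ranking row: average 2.457, variance 1.306, 10000 games.
lo, hi = two_sided_ci(2.457, 1.306, 10000, 0.95)
print(round(lo, 3), round(hi, 3))  # 2.435 2.479

# 1vs3 ranking diff row: the result should be close to the tabulated 95% CI UL.
print(one_sided_upper(-5.733e-2, 2.322, 10000, 0.95))
```

A ranking diff whose one-sided upper limit is below zero indicates that this model ranks better than the baseline at that confidence level.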

Supplemental: AWR epoch = 2 (probably overfitting)

| | Average | Variance (unbiased) | 99% CI LL | 95% CI LL | 95% CI UL | 99% CI UL |
|---|---|---|---|---|---|---|
| 1vs3 # of games | 10000 | | | | | |
| 1vs3 ranking | 2.618×10^0 | 1.254×10^0 | 2.589×10^0 | 2.596×10^0 | 2.640×10^0 | 2.647×10^0 |
| 1vs3 grading point | -3.647×10^1 | 2.588×10^4 | -4.061×10^1 | -3.962×10^1 | -3.332×10^1 | -3.233×10^1 |
| 1vs3 soul point | -3.940×10^-2 | 1.453×10^-1 | -4.922×10^-2 | -4.687×10^-2 | -3.193×10^-2 | -2.958×10^-2 |
| 1vs3 top rate | 2.162×10^-1 | 1.695×10^-5 | 2.056×10^-1 | 2.081×10^-1 | 2.243×10^-1 | 2.268×10^-1 |
| 1vs3 quinella rate | 4.588×10^-1 | 2.483×10^-5 | 4.460×10^-1 | 4.490×10^-1 | 4.686×10^-1 | 4.716×10^-1 |
| 1vs3 ranking diff | 1.568×10^-1 | 2.229×10^0 | N/A (one-sided) | N/A (one-sided) | 1.814×10^-1 | 1.915×10^-1 |
| 2vs2 # of games | 10002 | | | | | |
| 2vs2 ranking | 2.442×10^0 | 1.284×10^0 | 2.413×10^0 | 2.420×10^0 | 2.464×10^0 | 2.471×10^0 |
| 2vs2 grading point | -1.317×10^1 | 2.455×10^4 | -1.721×10^1 | -1.624×10^1 | -1.010×10^1 | -9.134×10^0 |
| 2vs2 soul point | 1.969×10^-2 | 1.485×10^-1 | 9.763×10^-3 | 1.214×10^-2 | 2.724×10^-2 | 2.962×10^-2 |
| 2vs2 top rate | 2.763×10^-1 | 9.996×10^-6 | 2.682×10^-1 | 2.701×10^-1 | 2.825×10^-1 | 2.844×10^-1 |
| 2vs2 quinella rate | 5.236×10^-1 | 1.247×10^-5 | 5.145×10^-1 | 5.167×10^-1 | 5.305×10^-1 | 5.327×10^-1 |
| 2vs2 ranking diff | -1.155×10^-1 | 1.667×10^0 | N/A (one-sided) | N/A (one-sided) | -9.426×10^-2 | -8.546×10^-2 |
| 3vs1 # of games | 10000 | | | | | |
| 3vs1 ranking | 2.469×10^0 | 1.266×10^0 | 2.440×10^0 | 2.447×10^0 | 2.491×10^0 | 2.498×10^0 |
| 3vs1 grading point | -1.600×10^1 | 2.451×10^4 | -2.003×10^1 | -1.907×10^1 | -1.293×10^1 | -1.197×10^1 |
| 3vs1 soul point | 1.058×10^-2 | 1.467×10^-1 | 7.123×10^-4 | 3.072×10^-3 | 1.809×10^-2 | 2.045×10^-2 |
| 3vs1 top rate | 2.626×10^-1 | 6.455×10^-6 | 2.561×10^-1 | 2.576×10^-1 | 2.676×10^-1 | 2.691×10^-1 |
| 3vs1 quinella rate | 5.139×10^-1 | 8.327×10^-6 | 5.065×10^-1 | 5.082×10^-1 | 5.196×10^-1 | 5.213×10^-1 |
| 3vs1 ranking diff | -1.225×10^-1 | 2.117×10^0 | N/A (one-sided) | N/A (one-sided) | -9.857×10^-2 | -8.865×10^-2 |

Supplemental: AWR β = 2.0 (worse than β = 1.0)

(FIXME: Buggy results as of 2022/03/26)

| | Average | Variance (unbiased) | 99% CI LL | 95% CI LL | 95% CI UL | 99% CI UL |
|---|---|---|---|---|---|---|
| 1vs3 # of games | 10000 | | | | | |
| 1vs3 ranking | 2.508×10^0 | 1.235×10^0 | 2.479×10^0 | 2.486×10^0 | 2.530×10^0 | 2.537×10^0 |
| 1vs3 grading point | -2.234×10^1 | 2.543×10^4 | -2.645×10^1 | -2.547×10^1 | -1.921×10^1 | -1.823×10^1 |
| 1vs3 soul point | -2.870×10^-3 | 1.498×10^-1 | -1.284×10^-2 | -1.046×10^-2 | 4.717×10^-3 | 7.101×10^-3 |
| 1vs3 top rate | 2.591×10^-1 | 1.920×10^-5 | 2.478×10^-1 | 2.505×10^-1 | 2.677×10^-1 | 2.704×10^-1 |
| 1vs3 quinella rate | 4.962×10^-1 | 2.500×10^-5 | 4.833×10^-1 | 4.864×10^-1 | 5.060×10^-1 | 5.091×10^-1 |
| 1vs3 ranking diff | 1.107×10^-2 | 2.303×10^0 | N/A (one-sided) | N/A (one-sided) | 3.603×10^-2 | 4.638×10^-2 |
| 2vs2 # of games | 10002 | | | | | |
| 2vs2 ranking | 2.483×10^0 | 1.271×10^0 | 2.454×10^0 | 2.461×10^0 | 2.505×10^0 | 2.512×10^0 |
| 2vs2 grading point | -1.810×10^1 | 2.483×10^4 | -2.216×10^1 | -2.119×10^1 | -1.501×10^1 | -1.404×10^1 |
| 2vs2 soul point | 5.884×10^-3 | 1.472×10^-1 | -3.999×10^-3 | -1.636×10^-3 | 1.340×10^-2 | 1.577×10^-2 |
| 2vs2 top rate | 2.595×10^-1 | 9.607×10^-6 | 2.515×10^-1 | 2.534×10^-1 | 2.656×10^-1 | 2.675×10^-1 |
| 2vs2 quinella rate | 5.083×10^-1 | 1.249×10^-5 | 4.992×10^-1 | 5.014×10^-1 | 5.152×10^-1 | 5.174×10^-1 |
| 2vs2 ranking diff | -3.369×10^-2 | 1.624×10^0 | N/A (one-sided) | N/A (one-sided) | -1.273×10^-2 | -4.042×10^-3 |
| 3vs1 # of games | 10000 | | | | | |
| 3vs1 ranking | 2.499×10^0 | 1.261×10^0 | 2.470×10^0 | 2.477×10^0 | 2.521×10^0 | 2.528×10^0 |
| 3vs1 grading point | -1.958×10^1 | 2.469×10^4 | -2.363×10^1 | -2.266×10^1 | -1.650×10^1 | -1.553×10^1 |
| 3vs1 soul point | 2.733×10^-4 | 1.462×10^-1 | -9.578×10^-3 | -7.222×10^-3 | 7.768×10^-3 | 1.012×10^-2 |
| 3vs1 top rate | 2.533×10^-1 | 6.304×10^-6 | 2.468×10^-1 | 2.484×10^-1 | 2.582×10^-1 | 2.598×10^-1 |
| 3vs1 quinella rate | 5.000×10^-1 | 8.334×10^-6 | 4.926×10^-1 | 4.943×10^-1 | 5.057×10^-1 | 5.074×10^-1 |
| 3vs1 ranking diff | -3.600×10^-3 | 2.162×10^0 | N/A (one-sided) | N/A (one-sided) | 2.059×10^-2 | 3.061×10^-2 |