Pay attention to the hidden semanteme.
This repository provides the details of Hidden Bias Attention (HBA).
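As a rough intuition only (the authoritative implementation is in this repository's source code), the sketch below shows one generic way a learnable hidden bias can act as attention-style weights in a Mixer-style block. The module name, shapes, and final projection are illustrative assumptions, not the paper's exact HBA.

```python
# Toy illustration only -- NOT the paper's exact HBA. It shows the generic
# idea of a learnable bias turned into attention-style weights over tokens.
import torch
import torch.nn as nn

class ToyHiddenBiasAttention(nn.Module):
    def __init__(self, seq_len: int, hidden_dim: int):
        super().__init__()
        # Learnable bias over token positions (assumed shape, for illustration).
        self.bias = nn.Parameter(torch.zeros(seq_len))
        self.proj = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, hidden_dim)
        weights = torch.softmax(self.bias, dim=-1)   # attention-style weights
        return self.proj(weights.unsqueeze(-1) * x)  # reweight tokens, project
```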
To train a model, run:

```bash
python main.py -d=YOUR_DATASET -t=train -p=YOUR_MODEL
```

`YOUR_DATASET` must be one of the datasets configured in `configs/nlp/*.yml`. When training a model, `-p` is optional.
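For example, assuming a dataset key such as `sst-2` is defined in `configs/nlp/*.yml` (the exact key names depend on your config files), a training run might look like:

```bash
# Train on a dataset key from configs/nlp/*.yml; -p is omitted since it is optional.
python main.py -d=sst-2 -t=train
```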
To test a trained model, run:

```bash
python main.py -d=YOUR_DATASET -t=test -p=YOUR_MODEL
```

`-p` must point to a specific trained checkpoint in `trained-models/*.ckpt`.
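Similarly, a hypothetical evaluation run pointing at a saved checkpoint (the file name below is an assumption) could be:

```bash
# Evaluate a trained checkpoint from trained-models/.
python main.py -d=sst-2 -t=test -p=trained-models/sst-2.ckpt
```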
We provide nine datasets in total; results on them are reported in Tables 1–3 below.
Table 1: Results on text classification tasks (accuracy in %; parameter counts separated by "/" correspond to the datasets in column order).

| Model | AGNews | Amazon | DBpedia | Parameters (M) |
|---|---|---|---|---|
| XLNet | 95.55 | / | 99.40 | 240 |
| UDA | / | 96.50 | 98.91 | / |
| BERT Large | / | 97.37 | 99.36 | 340 |
| BERT-ITPT-FiT | 95.20 | / | 99.32 | / |
| pNLP-Mixer XS | 89.62 | 90.38 | 98.24 | 0.404 |
| pNLP-Mixer XL | 90.45 | 90.56 | 98.40 | 6.0 |
| HBA-Mixer-2 | 91.38 | 91.28 | 93.49 | 0.13 |
| MHBA-Mixer-64d | 91.68 | 91.17 | 98.11 | 0.10/0.10/0.13 |
| MHBA-Mixer-256d | 91.79 | 91.88 | 98.44 | 0.73/0.73/0.73 |
Table 2: Results on semantic analysis tasks (accuracy in %; parameter counts separated by "/" correspond to the datasets in column order).

| Model | Hyperpartisan | IMDb | Yelp-2 | Parameters (M) |
|---|---|---|---|---|
| RoBERTa | 87.40 | 95.30 | / | 125 |
| Longformer | 94.80 | 96.70 | / | 149 |
| XLNet | / | 96.21 | 98.63 | 240 |
| BERT Large | / | 95.49 | / | 340 |
| UDA | / | 95.80 | 97.95 | / |
| pNLP-Mixer XS | 89.80 | 81.90 | 84.05 | 2.2/1.2/0.403 |
| pNLP-Mixer XL | 89.20 | 82.90 | 84.05 | 8.4/6.8/4.9 |
| HBA-Mixer-2 | 77.86 | 86.79 | 92.81 | 8.5/2.2/0.12 |
| MHBA-Mixer-64d | / | 87.08 | 92.35 | -/0.10/0.10 |
| MHBA-Mixer-256d | 89.43 | 87.88 | 92.57 | 0.68/0.73/0.73 |
Table 3: Results on natural language inference tasks (accuracy in %; parameter counts separated by "/" correspond to the datasets in column order).

| Model | SST-2 | CoLA | QQP | Parameters (M) |
|---|---|---|---|---|
| RoBERTa | 96.70 | 67.80 | 90.20 | 125 |
| XLNet | 94.40 | 69.00 | 90.40 | 240 |
| BERT Large | 93.70 | 71.00 | 88.00 | 340 |
| gMLP Large | 94.80 | / | / | 365 |
| FNet Large | 95.00 | 71.00 | 88.00 | 238 |
| MobileBERT tiny | 91.70 | 46.70 | 68.90 | 15.1 |
| MobileBERT | 92.80 | 50.50 | 70.20 | 25.3 |
| MobileBERT w/s OPT | 92.60 | 51.10 | 70.50 | 25.3 |
| pNLP-Mixer XS | 79.70 | 69.45 | 83.70 | 0.403 |
| pNLP-Mixer XL | 80.90 | 69.94 | 84.90 | 5.3 |
| HyperMixer | 80.70 | / | 83.70 | 12.5 |
| HBA-Mixer-2 | 80.21 | 69.12 | 81.55 | 0.13/0.13/0.23 |
| MHBA-Mixer-64d | 83.21 | 69.23 | 81.96 | 0.10/0.10/0.09 |
| MHBA-Mixer-256d | 83.48 | 69.51 | 82.02 | 0.73/0.73/0.73 |
Table 4: Comparison between HBA-Mixer and MHBA-Mixers (Acc. in %, Param. in M; parenthesized differences are relative to HBA-Mixer-2).

| Dataset | HBA-Mixer-2 Acc. | HBA-Mixer-2 Param. | MHBA-Mixer-64d Acc. | MHBA-Mixer-64d Param. | MHBA-Mixer-256d Acc. | MHBA-Mixer-256d Param. |
|---|---|---|---|---|---|---|
| AGNews | 91.38 | 0.13 | 91.68 (+0.30) | 0.10 (-0.03) | 91.79 (+0.41) | 0.73 (+0.60) |
| Amazon-2 | 91.28 | 0.13 | / | / | 91.88 (+0.60) | 0.73 (+0.60) |
| DBpedia | 93.49 | 0.13 | 98.11 (+4.62) | 0.13 (+0.00) | 98.44 (+4.95) | 0.73 (+0.60) |
| Hyperpartisan | 77.86 | 8.50 | / | / | 89.43 (+11.57) | 0.70 (-7.80) |
| IMDb | 86.79 | 2.20 | 87.08 (+0.29) | 0.10 (-2.10) | 87.88 (+1.09) | 0.68 (-1.52) |
| Yelp-2 | 92.81 | 0.12 | 92.35 (-0.46) | 0.10 (-0.02) | 92.57 (-0.24) | 0.73 (+0.61) |
| SST-2 | 80.21 | 0.13 | 83.21 (+3.00) | 0.10 (-0.03) | 83.48 (+3.27) | 0.73 (+0.60) |
| CoLA | 69.12 | 0.13 | 69.23 (+0.11) | 0.10 (-0.03) | 69.51 (+0.39) | 0.73 (+0.60) |
| QQP | 81.55 | 0.23 | 81.96 (+0.41) | 0.09 (-0.14) | 82.02 (+0.47) | 0.73 (+0.50) |
Table 5: Main results of MHBA-Mixer with different hidden dimensions.

| Hidden Dimension | AGNews Acc. (%) | AGNews Param. (M) | IMDb Acc. (%) | IMDb Param. (M) | SST-2 Acc. (%) | SST-2 Param. (M) |
|---|---|---|---|---|---|---|
| 64 | 91.30 | 0.10 | 87.08 | 0.10 | 83.21 | 0.10 |
| 128 | 91.42 | 0.25 | 87.76 | 0.24 | 82.63 | 0.25 |
| 256 | 91.79 | 0.73 | 87.88 | 0.68 | 83.48 | 0.73 |
Please stay tuned for this series.
```bibtex
@article{TANG2023119076,
  title   = {Pay attention to the hidden semanteme},
  author  = {Huanling Tang and Xiaoyan Liu and Yulin Wang and Quansheng Dou and Mingyu Lu},
  journal = {Information Sciences},
  volume  = {640},
  pages   = {119076},
  year    = {2023},
  issn    = {0020-0255},
  doi     = {10.1016/j.ins.2023.119076},
  url     = {https://www.sciencedirect.com/science/article/pii/S0020025523006618},
}

@article{Liu2023,
  title   = {TCAMixer: {A} lightweight Mixer based on a novel triple concepts attention mechanism for {NLP}},
  author  = {Xiaoyan Liu and Huanling Tang and Jie Zhao and Quansheng Dou and Mingyu Lu},
  journal = {Eng. Appl. Artif. Intell.},
  volume  = {123},
  number  = {Part {C}},
  pages   = {106471},
  year    = {2023},
  doi     = {10.1016/J.ENGAPPAI.2023.106471},
}
```