
MHBA-Mixer

Pay attention to the hidden semanteme.

Architecture of MHBA-Mixer

(Figure: overall architecture of MHBA-Mixer)

Details of hidden bias attention (HBA)

(Figure: left, hidden bias attention (HBA); right, multi-head HBA with n heads)
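
The figures themselves are not reproduced here, so the sketch below is only a rough, assumption-laden reading of the caption: it splits the hidden representation into n heads and mixes tokens within each head through a softmax over a learnable bias matrix. The names (MultiHeadHBA, bias, proj) and the exact form of HBA are illustrative guesses, not the paper's or this repository's implementation.

```python
# Minimal PyTorch sketch of a multi-head hidden-bias-style mixing block.
# ASSUMPTION: HBA is modeled here as a softmax over a learnable per-head
# bias matrix of shape (seq_len, seq_len); the paper's exact formulation
# may differ. All names are illustrative.
import torch
import torch.nn as nn


class MultiHeadHBA(nn.Module):
    def __init__(self, seq_len: int, hidden_dim: int, n_heads: int):
        super().__init__()
        assert hidden_dim % n_heads == 0, "hidden_dim must divide evenly into heads"
        self.n_heads = n_heads
        self.head_dim = hidden_dim // n_heads
        # One learnable token-mixing bias per head (assumed form of HBA).
        self.bias = nn.Parameter(torch.zeros(n_heads, seq_len, seq_len))
        self.proj = nn.Linear(hidden_dim, hidden_dim)  # recombine heads

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, hidden_dim)
        b, s, d = x.shape
        h = x.view(b, s, self.n_heads, self.head_dim).transpose(1, 2)  # (b, n, s, hd)
        attn = torch.softmax(self.bias, dim=-1)                        # (n, s, s)
        h = torch.einsum("nst,bntd->bnsd", attn, h)                    # mix tokens per head
        h = h.transpose(1, 2).reshape(b, s, d)                         # merge heads
        return self.proj(h)


# Usage: a batch of 2 sequences, 128 tokens, hidden dimension 64, 4 heads.
block = MultiHeadHBA(seq_len=128, hidden_dim=64, n_heads=4)
out = block(torch.randn(2, 128, 64))  # -> shape (2, 128, 64)
```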

How to train

python main.py -d=YOUR_DATASET -t=train -p=YOUR_MODEL

YOUR_DATASET must be one of the datasets configured in configs/nlp/*.yml. When training a model, the -p argument is optional.
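
For example, to train on AGNews (assuming agnews is the matching dataset key under configs/nlp/; check the YAML file names there for the exact keys):

python main.py -d=agnews -t=train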

How to test

python main.py -d=YOUR_DATASET -t=test -p=YOUR_MODEL

For testing, -p must point to a specific trained model in trained-models/*.ckpt. We provide nine datasets in total; they are listed in Tables 1-3 below.
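
For example, to evaluate a trained AGNews model (the checkpoint filename below is illustrative; use whichever .ckpt you trained):

python main.py -d=agnews -t=test -p=trained-models/agnews.ckpt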

Experiments

Table 1: Results on text classification tasks (accuracy in %; slash-separated parameter counts are per dataset, AGNews/Amazon/DBpedia).

| Model | AGNews | Amazon | DBpedia | Parameters (M) |
|---|---|---|---|---|
| XLNet | 95.55 | / | 99.40 | 240 |
| UDA | / | 96.50 | 98.91 | / |
| BERT Large | / | 97.37 | 99.36 | 340 |
| BERT-ITPT-FiT | 95.20 | / | 99.32 | / |
| pNLP-Mixer XS | 89.62 | 90.38 | 98.24 | 0.404 |
| pNLP-Mixer XL | 90.45 | 90.56 | 98.40 | 6.0 |
| HBA-Mixer-2 | 91.38 | 91.28 | 93.49 | 0.13 |
| MHBA-Mixer-64d | 91.68 | 91.17 | 98.11 | 0.10/0.10/0.13 |
| MHBA-Mixer-256d | 91.79 | 91.88 | 98.44 | 0.73/0.73/0.73 |

Table 2: Results on sentiment analysis tasks (accuracy in %; slash-separated parameter counts are per dataset, Hyperpartisan/IMDb/Yelp-2).

| Model | Hyperpartisan | IMDb | Yelp-2 | Parameters (M) |
|---|---|---|---|---|
| RoBERTa | 87.40 | 95.30 | / | 125 |
| Longformer | 94.80 | 96.70 | / | 149 |
| XLNet | / | 96.21 | 98.63 | 240 |
| BERT Large | / | 95.49 | / | 340 |
| UDA | / | 95.80 | 97.95 | / |
| pNLP-Mixer XS | 89.80 | 81.90 | 84.05 | 2.2/1.2/0.403 |
| pNLP-Mixer XL | 89.20 | 82.90 | 84.05 | 8.4/6.8/4.9 |
| HBA-Mixer-2 | 77.86 | 86.79 | 92.81 | 8.5/2.2/0.12 |
| MHBA-Mixer-64d | / | 87.08 | 92.35 | -/0.10/0.10 |
| MHBA-Mixer-256d | 89.43 | 87.88 | 92.57 | 0.68/0.73/0.73 |

Table 3: Results on natural language inference (accuracy in %; slash-separated parameter counts are per dataset, SST-2/CoLA/QQP).

| Model | SST-2 | CoLA | QQP | Parameters (M) |
|---|---|---|---|---|
| RoBERTa | 96.70 | 67.80 | 90.20 | 125 |
| XLNet | 94.40 | 69.00 | 90.40 | 240 |
| BERT Large | 93.70 | 71.00 | 88.00 | 340 |
| gMLP Large | 94.80 | / | / | 365 |
| FNet Large | 95 | 71 | 88 | 238 |
| MobileBERT tiny | 91.70 | 46.70 | 68.90 | 15.1 |
| MobileBERT | 92.80 | 50.50 | 70.20 | 25.3 |
| MobileBERT w/s OPT | 92.60 | 51.10 | 70.50 | 25.3 |
| pNLP-Mixer XS | 79.70 | 69.45 | 83.70 | 0.403 |
| pNLP-Mixer XL | 80.90 | 69.94 | 84.90 | 5.3 |
| HyperMixer | 80.70 | / | 83.70 | 12.5 |
| HBA-Mixer-2 | 80.21 | 69.12 | 81.55 | 0.13/0.13/0.23 |
| MHBA-Mixer-64d | 83.21 | 69.23 | 81.96 | 0.10/0.10/0.09 |
| MHBA-Mixer-256d | 83.48 | 69.51 | 82.02 | 0.73/0.73/0.73 |

Table 4: Comparison between HBA-Mixer and MHBA-Mixers (values in parentheses are deltas relative to HBA-Mixer-2).

| Dataset | HBA-Mixer-2 Acc. (%) / Param. (M) | MHBA-Mixer-64d Acc. (%) / Param. (M) | MHBA-Mixer-256d Acc. (%) / Param. (M) |
|---|---|---|---|
| AGNews | 91.38 / 0.13 | 91.68 (+0.30) / 0.10 (-0.03) | 91.79 (+0.41) / 0.73 (+0.60) |
| Amazon-2 | 91.28 / 0.13 | / | 91.88 (+0.60) / 0.73 (+0.60) |
| DBpedia | 93.49 / 0.13 | 98.11 (+4.62) / 0.13 | 98.44 (+4.95) / 0.73 (+0.60) |
| Hyperpartisan | 77.86 / 8.50 | / | 89.43 (+11.57) / 0.70 (-7.80) |
| IMDb | 86.79 / 2.20 | 87.08 (+0.29) / 0.10 (-2.10) | 87.88 (+1.09) / 0.68 (-1.52) |
| Yelp-2 | 92.81 / 0.12 | 92.35 (-0.46) / 0.10 (-0.02) | 92.57 (-0.24) / 0.73 (+0.60) |
| SST-2 | 80.21 / 0.13 | 83.21 (+3.00) / 0.10 (-0.03) | 83.48 (+3.27) / 0.73 (+0.60) |
| CoLA | 69.12 / 0.13 | 69.23 (+0.11) / 0.10 (-0.03) | 69.51 (+0.39) / 0.73 (+0.60) |
| QQP | 81.55 / 0.23 | 81.96 (+0.41) / 0.09 (-0.14) | 82.02 (+0.47) / 0.73 (+0.50) |

Table 5: Main results of MHBA-Mixer with different hidden dimensions.

| Hidden Dimension | AGNews Acc. (%) / Param. (M) | IMDb Acc. (%) / Param. (M) | SST-2 Acc. (%) / Param. (M) |
|---|---|---|---|
| 64 | 91.30 / 0.10 | 87.08 / 0.10 | 83.21 / 0.10 |
| 128 | 91.42 / 0.25 | 87.76 / 0.24 | 82.63 / 0.25 |
| 256 | 91.79 / 0.73 | 87.88 / 0.68 | 83.48 / 0.73 |

Please stay tuned for this series. If you find this work helpful, please cite:

@article{TANG2023119076,
  author  = {Huanling Tang and Xiaoyan Liu and Yulin Wang and Quansheng Dou and Mingyu Lu},
  title   = {Pay attention to the hidden semanteme},
  journal = {Information Sciences},
  volume  = {640},
  pages   = {119076},
  year    = {2023},
  issn    = {0020-0255},
  doi     = {10.1016/j.ins.2023.119076},
  url     = {https://www.sciencedirect.com/science/article/pii/S0020025523006618},
}

@article{Liu2023,
  author  = {Xiaoyan Liu and Huanling Tang and Jie Zhao and Quansheng Dou and Mingyu Lu},
  title   = {TCAMixer: {A} lightweight Mixer based on a novel triple concepts attention mechanism for {NLP}},
  journal = {Eng. Appl. Artif. Intell.},
  volume  = {123},
  number  = {Part {C}},
  pages   = {106471},
  year    = {2023},
  doi     = {10.1016/j.engappai.2023.106471},
}
