Pay attention to the hidden semanteme.
This repository provides the details of Hidden Bias Attention (HBA).
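As a rough intuition only (the authoritative implementation is in this repository's source code), the sketch below shows one generic way a learnable hidden bias can act as attention-style weights in a Mixer-style block. The module name, shapes, and final projection are illustrative assumptions, not the paper's exact HBA.

```python
# Toy illustration only -- NOT the paper's exact HBA. It shows the generic
# idea of a learnable bias turned into attention-style weights over tokens.
import torch
import torch.nn as nn

class ToyHiddenBiasAttention(nn.Module):
    def __init__(self, seq_len: int, hidden_dim: int):
        super().__init__()
        # Learnable bias over token positions (assumed shape, for illustration).
        self.bias = nn.Parameter(torch.zeros(seq_len))
        self.proj = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, hidden_dim)
        weights = torch.softmax(self.bias, dim=-1)   # attention-style weights
        return self.proj(weights.unsqueeze(-1) * x)  # reweight tokens, project
```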
To train a model, run:

```bash
python main.py -d=YOUR_DATASET -t=train -p=YOUR_MODEL
```

`YOUR_DATASET` must be one of the datasets configured in `configs/nlp/*.yml`. When training a model, `-p` is optional.
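For example, assuming a dataset key such as `sst-2` is defined in `configs/nlp/*.yml` (the exact key names depend on your config files), a training run might look like:

```bash
# Train on a dataset key from configs/nlp/*.yml; -p is omitted since it is optional.
python main.py -d=sst-2 -t=train
```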
To test a trained model, run:

```bash
python main.py -d=YOUR_DATASET -t=test -p=YOUR_MODEL
```

`-p` must point to a specific trained checkpoint in `trained-models/*.ckpt`.
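Similarly, a hypothetical evaluation run pointing at a saved checkpoint (the file name below is an assumption) could be:

```bash
# Evaluate a trained checkpoint from trained-models/.
python main.py -d=sst-2 -t=test -p=trained-models/sst-2.ckpt
```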
We provide nine datasets in total; results on them are reported in Tables 1–3 below.
Table 1: Results on text classification tasks (accuracy in %; parameter counts separated by "/" correspond to the datasets in column order).

| Model | AGNews | Amazon | DBpedia | Parameters (M) |
|---|---|---|---|---|
| XLNet | 95.55 | / | 99.40 | 240 |
| UDA | / | 96.50 | 98.91 | / |
| BERT Large | / | 97.37 | 99.36 | 340 |
| BERT-ITPT-FiT | 95.20 | / | 99.32 | / |
| pNLP-Mixer XS | 89.62 | 90.38 | 98.24 | 0.404 |
| pNLP-Mixer XL | 90.45 | 90.56 | 98.40 | 6.0 |
| HBA-Mixer-2 | 91.38 | 91.28 | 93.49 | 0.13 |
| MHBA-Mixer-64d | 91.68 | 91.17 | 98.11 | 0.10/0.10/0.13 |
| MHBA-Mixer-256d | 91.79 | 91.88 | 98.44 | 0.73/0.73/0.73 |
Table 2: Results on semantic analysis tasks (accuracy in %; parameter counts separated by "/" correspond to the datasets in column order).

| Model | Hyperpartisan | IMDb | Yelp-2 | Parameters (M) |
|---|---|---|---|---|
| RoBERTa | 87.40 | 95.30 | / | 125 |
| Longformer | 94.80 | 96.70 | / | 149 |
| XLNet | / | 96.21 | 98.63 | 240 |
| BERT Large | / | 95.49 | / | 340 |
| UDA | / | 95.80 | 97.95 | / |
| pNLP-Mixer XS | 89.80 | 81.90 | 84.05 | 2.2/1.2/0.403 |
| pNLP-Mixer XL | 89.20 | 82.90 | 84.05 | 8.4/6.8/4.9 |
| HBA-Mixer-2 | 77.86 | 86.79 | 92.81 | 8.5/2.2/0.12 |
| MHBA-Mixer-64d | / | 87.08 | 92.35 | -/0.10/0.10 |
| MHBA-Mixer-256d | 89.43 | 87.88 | 92.57 | 0.68/0.73/0.73 |
Table 3: Results on natural language inference tasks (accuracy in %; parameter counts separated by "/" correspond to the datasets in column order).

| Model | SST-2 | CoLA | QQP | Parameters (M) |
|---|---|---|---|---|
| RoBERTa | 96.70 | 67.80 | 90.20 | 125 |
| XLNet | 94.40 | 69.00 | 90.40 | 240 |
| BERT Large | 93.70 | 71.00 | 88.00 | 340 |
| gMLP Large | 94.80 | / | / | 365 |
| FNet Large | 95.00 | 71.00 | 88.00 | 238 |
| MobileBERT tiny | 91.70 | 46.70 | 68.90 | 15.1 |
| MobileBERT | 92.80 | 50.50 | 70.20 | 25.3 |
| MobileBERT w/s OPT | 92.60 | 51.10 | 70.50 | 25.3 |
| pNLP-Mixer XS | 79.70 | 69.45 | 83.70 | 0.403 |
| pNLP-Mixer XL | 80.90 | 69.94 | 84.90 | 5.3 |
| HyperMixer | 80.70 | / | 83.70 | 12.5 |
| HBA-Mixer-2 | 80.21 | 69.12 | 81.55 | 0.13/0.13/0.23 |
| MHBA-Mixer-64d | 83.21 | 69.23 | 81.96 | 0.10/0.10/0.09 |
| MHBA-Mixer-256d | 83.48 | 69.51 | 82.02 | 0.73/0.73/0.73 |
Table 4: Comparison between HBA-Mixer and MHBA-Mixers (Acc. in %, Param. in M; parenthesized differences are relative to HBA-Mixer-2).

| Dataset | HBA-Mixer-2 Acc. | HBA-Mixer-2 Param. | MHBA-Mixer-64d Acc. | MHBA-Mixer-64d Param. | MHBA-Mixer-256d Acc. | MHBA-Mixer-256d Param. |
|---|---|---|---|---|---|---|
| AGNews | 91.38 | 0.13 | 91.68 (+0.30) | 0.10 (-0.03) | 91.79 (+0.41) | 0.73 (+0.60) |
| Amazon-2 | 91.28 | 0.13 | / | / | 91.88 (+0.60) | 0.73 (+0.60) |
| DBpedia | 93.49 | 0.13 | 98.11 (+4.62) | 0.13 (+0.00) | 98.44 (+4.95) | 0.73 (+0.60) |
| Hyperpartisan | 77.86 | 8.50 | / | / | 89.43 (+11.57) | 0.70 (-7.80) |
| IMDb | 86.79 | 2.20 | 87.08 (+0.29) | 0.10 (-2.10) | 87.88 (+1.09) | 0.68 (-1.52) |
| Yelp-2 | 92.81 | 0.12 | 92.35 (-0.46) | 0.10 (-0.02) | 92.57 (-0.24) | 0.73 (+0.61) |
| SST-2 | 80.21 | 0.13 | 83.21 (+3.00) | 0.10 (-0.03) | 83.48 (+3.27) | 0.73 (+0.60) |
| CoLA | 69.12 | 0.13 | 69.23 (+0.11) | 0.10 (-0.03) | 69.51 (+0.39) | 0.73 (+0.60) |
| QQP | 81.55 | 0.23 | 81.96 (+0.41) | 0.09 (-0.14) | 82.02 (+0.47) | 0.73 (+0.50) |
Table 5: Main results of MHBA-Mixer with different hidden dimensions.

| Hidden Dimension | AGNews Acc. (%) | AGNews Param. (M) | IMDb Acc. (%) | IMDb Param. (M) | SST-2 Acc. (%) | SST-2 Param. (M) |
|---|---|---|---|---|---|---|
| 64 | 91.30 | 0.10 | 87.08 | 0.10 | 83.21 | 0.10 |
| 128 | 91.42 | 0.25 | 87.76 | 0.24 | 82.63 | 0.25 |
| 256 | 91.79 | 0.73 | 87.88 | 0.68 | 83.48 | 0.73 |
Please stay tuned for this series.
```bibtex
@article{TANG2023119076,
  title   = {Pay attention to the hidden semanteme},
  author  = {Huanling Tang and Xiaoyan Liu and Yulin Wang and Quansheng Dou and Mingyu Lu},
  journal = {Information Sciences},
  volume  = {640},
  pages   = {119076},
  year    = {2023},
  issn    = {0020-0255},
  doi     = {10.1016/j.ins.2023.119076},
  url     = {https://www.sciencedirect.com/science/article/pii/S0020025523006618},
}

@article{Liu2023,
  title   = {TCAMixer: {A} lightweight Mixer based on a novel triple concepts attention mechanism for {NLP}},
  author  = {Xiaoyan Liu and Huanling Tang and Jie Zhao and Quansheng Dou and Mingyu Lu},
  journal = {Eng. Appl. Artif. Intell.},
  volume  = {123},
  number  = {Part {C}},
  pages   = {106471},
  year    = {2023},
  doi     = {10.1016/J.ENGAPPAI.2023.106471},
}
```