# Model Performance

This page summarizes the performance of our natively trained models and of the third-party models integrated into our library.
> **Note:** Certain packages (such as Trankit) are included for completeness when comparing model performance, but are not integrated into our library. These are marked with an asterisk (*).
> **Note:** Models from the Malaya package generally cannot be compared directly to other models, as the datasets used for both training and testing have been augmented and split at variable ratios. Furthermore, Malaya models can optionally be quantized, which reduces file size and speeds up loading and inference at the cost of a dip in performance.
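The trade-off behind Malaya's quantized variants can be illustrated with a minimal sketch of symmetric int8 post-training quantization (this is the general idea only, not Malaya's actual implementation, which quantizes whole model graphs): each float32 weight is stored in one byte instead of four, at the cost of a rounding error bounded by half the quantization scale.

```python
import struct

def quantize_int8(weights):
    """Symmetric linear quantization: map floats onto [-127, 127] integers."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    return [q * scale for q in quantized]

weights = [0.82, -1.54, 0.03, 1.27, -0.66]   # toy weight vector
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

# float32 needs 4 bytes per weight, int8 only 1 -> roughly 4x smaller on disk
fp32_bytes = len(struct.pack(f"{len(weights)}f", *weights))
int8_bytes = len(struct.pack(f"{len(quantized)}b", *quantized))
print(fp32_bytes, int8_bytes)  # 20 5
```

The restored weights differ from the originals by at most `scale / 2`, which is the "dip in performance" the note refers to, traded against the smaller file and faster loading.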
## Tokenization

Language | Name | Architecture | Test Dataset | F1 (%)
---|---|---|---|---
Indonesian | | SentencePiece + FFNN (XLM-R Large) | | 99.89
Indonesian | | 1D-CNN + Bi-LSTM | | 99.99
Indonesian | | Regex | ? | ?
Thai | | Deepcut (CNN + FFNN) | | 93.00
Thai | | Attacut (3-layer Dilated CNN) | | 91.00
Thai | | newmm (Dictionary-based) | | 67.00
Vietnamese | | SentencePiece + FFNN (XLM-R Base) | | 95.22
Vietnamese | | 1D-CNN + Bi-LSTM | | 87.25
Vietnamese | | SCRDR (Rule-based) | VLSP 2013 | 97.90
Vietnamese | | CRF + Regex | ? | ?
Vietnamese | | CRF | ? | 98.50
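The F1 scores above compare predicted token boundaries against gold ones (the exact evaluation scheme varies by package and dataset). A minimal sketch of span-level segmentation F1, using a hypothetical Indonesian example:

```python
def char_spans(tokens):
    """Map a token sequence onto (start, end) character offsets."""
    spans, i = set(), 0
    for token in tokens:
        spans.add((i, i + len(token)))
        i += len(token)
    return spans

def segmentation_f1(gold_tokens, pred_tokens):
    gold, pred = char_spans(gold_tokens), char_spans(pred_tokens)
    tp = len(gold & pred)                      # exactly matching spans
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)

# Hypothetical segmentations of the Indonesian string "dirumahsakit" ("at the hospital")
gold = ["di", "rumah", "sakit"]
pred = ["di", "rumahsakit"]
print(round(segmentation_f1(gold, pred), 2))  # 0.4
```

Only spans that match both boundaries count as true positives, which is why an under-segmented prediction is penalized on both precision and recall.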
## Sentence Segmentation

Language | Name | Architecture | Test Dataset | F1 (%)
---|---|---|---|---
Indonesian | | SentencePiece + FFNN (XLM-R Large) | | 95.54
Indonesian | | 1D-CNN + Bi-LSTM | | 93.78
Indonesian | | Regex | ? | ?
Thai | | CRFCut (CRF trained on TED dataset) | ORCHID | 87.00 [1]
Vietnamese | | SentencePiece + FFNN (XLM-R Large) | | 96.63
Vietnamese | | 1D-CNN + Bi-LSTM | | 93.15
Vietnamese | | ? | ? | ?

[1] Refer to the original CRFCut GitHub for more details on performance when trained and tested on different datasets.
## Part-of-speech Tagging

We only support UPOS tagging at the moment for natively trained models.

Language | Package | Model Name | Architecture | Size | Dataset | Accuracy (%) | F1 (%)
---|---|---|---|---|---|---|---
Indonesian | SEACoreNLP | pos-id-ud-xlmr-best | XLM-R (Base) + FFNN | 774.4MB | | 93.90 |
Indonesian | SEACoreNLP | pos-id-ud-xlmr | XLM-R (Base) + FFNN | 47KB (FFNN) | | 92.44 |
Indonesian | SEACoreNLP | pos-id-ud-indobert | IndoBERT (Base) + FFNN | 462.1MB | | 91.54 |
Indonesian | SEACoreNLP | pos-id-ud-bilstm | Embeddings (200) + Bi-LSTM | 16.3MB | | 90.19 |
Indonesian | | XLM-R Base | Embeddings + Adapters + FFNN | ? | | 93.57 |
Indonesian | | | word2vec/fastText + Bi-LSTM | 17.3MB | | 93.40 |
Indonesian | | XLNET | Transformer Embedding + CRF | 446.6MB | | 93.24 |
Indonesian | | BERT | Transformer Embedding + CRF | 426.4MB | | 93.18 |
Indonesian | | ALXLNET | Transformer Embedding + CRF | 46.8MB | | 92.82 |
Indonesian | | Tiny-BERT | Transformer Embedding + CRF | 57.7MB | | 92.70 |
Indonesian | | ALBERT | Transformer Embedding + CRF | 48.7MB | | 92.55 |
Indonesian | | Tiny-ALBERT | Transformer Embedding + CRF | 22.4MB | | 90.00 |
Thai | SEACoreNLP | pos-th-ud-xlmr-best | XLM-R (Base) + FFNN | 755.8MB | | 97.20 |
Thai | SEACoreNLP | pos-th-ud-xlmr | XLM-R (Base) + FFNN | 44KB (FFNN) | | 92.89 |
Thai | SEACoreNLP | pos-th-ud-bilstmcrf | Embeddings (100) + Bi-LSTM + CRF | 2.1MB | | 89.10 |
Thai | SEACoreNLP | pos-th-ud-bilstm | Embeddings (100) + Bi-LSTM | 2.1MB | | 88.48 |
Thai | | | Averaged Perceptron | ? | | 99.09 |
Thai | | | Unigram | ? | | 93.18 |
Thai | | RDRPOSTagger | RDR (Rule-based) | ? | | 93.18 |
Vietnamese | SEACoreNLP | pos-vi-ud-xlmr-best | XLM-R (Base) + FFNN | 755.1MB | | 93.07 |
Vietnamese | SEACoreNLP | pos-vi-ud-xlmr | XLM-R (Base) + FFNN | 41KB (FFNN) | | 91.90 |
Vietnamese | SEACoreNLP | pos-vi-ud-phobert | PhoBERT (Base) + FFNN | 438MB | | 92.92 |
Vietnamese | SEACoreNLP | pos-vi-ud-bilstm | Embeddings (256) + Bi-LSTM | 8.4MB | | 85.21 |
Vietnamese | | XLM-R Base | Embeddings + Adapters + FFNN | ? | | 89.70 |
Vietnamese | | | word2vec/fastText + Bi-LSTM | 18.1MB | | 79.50 |
Vietnamese | | | CRF | 2.68MB | ? | ? | ?
Vietnamese | | MarMoT | CRF | 28.3MB | VLSP 2013 | 95.88 |
## Named Entity Recognition

Language | Package | Model Name | Architecture | Size | Dataset | F1 (%)
---|---|---|---|---|---|---
Indonesian | SEACoreNLP | ner-id-nergrit-xlmr-best | XLM-R (Base) + Bi-LSTM + CRF | 797.7MB | | 79.85
Indonesian | SEACoreNLP | ner-id-nergrit-xlmr | XLM-R (Base) + Bi-LSTM + CRF | 9.3MB (BiLSTMCRF) | | 75.31
Indonesian | | XLNET | Transformer Embedding + CRF | 446.6MB | | 98.73
Indonesian | | BERT | Transformer Embedding + CRF | 425.4MB | | 98.54
Indonesian | | ALXLNET | Transformer Embedding + CRF | 46.8MB | | 98.34
Indonesian | | ALBERT | Transformer Embedding + CRF | 48.6MB | | 96.49
Indonesian | | Tiny-BERT | Transformer Embedding + CRF | 57.7MB | | 96.13
Indonesian | | Tiny-ALBERT | Transformer Embedding + CRF | 22.4MB | | 92.37
Thai | SEACoreNLP | ner-th-thainer-xlmr-best | XLM-R (Base) + Bi-LSTM + CRF | 790.8MB | | 89.49
Thai | SEACoreNLP | ner-th-thainer-xlmr | XLM-R (Base) + Bi-LSTM + CRF | 9.4MB (BiLSTMCRF) | | 87.07
Thai | SEACoreNLP | ner-th-thainer-scratch | Embeddings + Bi-LSTM + CRF | 12.3MB | | 80.11
Thai | | ThaiNER 1.3 | CRF | ? | | 87.00
Thai | | WangchanBERTa* | ? | ? | | 86.49
Thai | | | ? | ? | | 78.01
Vietnamese | | | CRF | 172KB | ? | ?
Vietnamese | | | Dynamic Feature Induction | 69.5MB | VLSP 2016 | 88.55
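NER F1 is typically computed over whole entities rather than individual tags: a prediction counts as a true positive only if its boundaries and its type both match a gold entity. A minimal sketch, assuming BIO-tagged input (decoding conventions vary across packages, and stray `I-` tags are ignored here for simplicity):

```python
def entities(tags):
    """Decode a BIO tag sequence into a set of (start, end, type) spans."""
    spans, start, etype = set(), None, None
    for i, tag in enumerate(tags + ["O"]):   # sentinel "O" flushes a trailing entity
        if start is not None and (tag == "O" or tag.startswith("B-")):
            spans.add((start, i, etype))
            start = None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans

def entity_f1(gold_tags, pred_tags):
    gold, pred = entities(gold_tags), entities(pred_tags)
    tp = len(gold & pred)
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)

gold = ["B-PER", "I-PER", "O", "B-LOC", "O"]
pred = ["B-PER", "I-PER", "O", "B-ORG", "O"]   # right span, wrong entity type
print(entity_f1(gold, pred))  # 0.5
```

Note that the mislabeled span costs both a false positive and a false negative, which is why entity-level F1 is stricter than per-tag accuracy.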
## Constituency Parsing

Language | Package | Model Name | Architecture | Size | Dataset | F1 (%)
---|---|---|---|---|---|---
Indonesian | SEACoreNLP | cp-id-kethu-benepar-xlmr-best | Benepar [2] | 825.9MB | Kethu [3] | 82.85 [4]
Indonesian | SEACoreNLP | cp-id-kethu-xlmr | AllenNLP [5] | 15.2MB (Classifier layers) | Kethu [3] | 77.05
Indonesian | | XLNET | | 498.0MB | | 83.31
Indonesian | | BERT | | 470.0MB | | 80.35
Indonesian | | ALBERT | | 180.0MB | | 79.01
Indonesian | | Tiny-BERT | | 125.0MB | | 76.79
Indonesian | | Tiny-ALBERT | | 56.7MB | | 70.84

[2] This architecture comprises embeddings (XLM-R Base in our case) followed by a variable number of self-attention layers. Refer to the original paper "Multilingual Constituency Parsing with Self-Attention and Pre-Training (2018)" or the Benepar GitHub for more details.
[3] Credits to Jessica Naraiswari Arwidarasti, Ika Alfina and Dr Adila Alfa Krisnadhi at Universitas Indonesia for their great work in producing this open-source constituency treebank for Indonesian. Please refer to their paper "Converting an Indonesian Constituency Treebank to the Penn Treebank Format (2019)" for more details.
[4] The previous state-of-the-art benchmark for constituency parsing in Indonesian was an F1 score of 70.90% using StanfordCoreNLP's Shift-Reduce Parser. Please refer to the paper "Converting an Indonesian Constituency Treebank to the Penn Treebank Format (2019)" for more details.
[5] This architecture comprises embeddings (XLM-R Base in our case) followed by a bi-directional LSTM, then a bi-directional span extraction layer, and ends with a feedforward neural network. Please refer to AllenNLP's original paper "Extending a Parser to Distant Domains Using a Few Dozen Partially Annotated Examples (2018)" for more details.
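Constituency parsing F1 compares the labeled brackets (constituent spans) of the predicted tree against the gold tree, in the style of evalb. A stripped-down sketch of the core computation (real evalb adds conventions such as ignoring punctuation, so treat this as illustrative only):

```python
def bracket_f1(gold, pred):
    """Labeled bracket F1 over sets of (label, start, end) constituent spans."""
    tp = len(gold & pred)                      # brackets matching in label and span
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)

# Hypothetical parses of a four-word sentence, e.g. "(S (NP w1 w2) (VP w3 w4))"
gold = {("S", 0, 4), ("NP", 0, 2), ("VP", 2, 4)}
pred = {("S", 0, 4), ("NP", 0, 2), ("NP", 2, 4)}   # wrong label on the second phrase
print(round(bracket_f1(gold, pred), 2))  # 0.67
```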
## Dependency Parsing

Language | Package | Model Name | Architecture | Size | Dataset | UAS (%) [7] | LAS (%)
---|---|---|---|---|---|---|---
Indonesian | SEACoreNLP | dp-id-ud-xlmr-best | Bi-LSTM + Deep Biaffine Attention [6] | 841.5MB | | 88.10 | 82.23
Indonesian | SEACoreNLP | dp-id-ud-xlmr | Bi-LSTM + Deep Biaffine Attention | 67.5MB (Classifier layers) | | 86.02 | 80.17
Indonesian | SEACoreNLP | dp-id-ud-indobert | Bi-LSTM + Deep Biaffine Attention | 67.5MB (Classifier layers) | | 86.67 | 81.04
Indonesian | SEACoreNLP | dp-id-ud-scratch | Bi-LSTM + Deep Biaffine Attention | 63.3MB | | 84.23 | 78.70
Indonesian | | XLM-R Base | Embeddings + Adapters + Deep Biaffine Attention | | | 86.55 | 80.28
Indonesian | | | Bi-LSTM + Deep Biaffine Attention | 95.3MB | | 85.17 | 79.19
Indonesian | | XLNET | | 450.2MB | | 93.10 | 92.50
Indonesian | | ALXLNET | | 50.0MB | | 89.40 | 88.60
Indonesian | | BERT | | 426.0MB | | 85.50 | 84.80
Indonesian | | ALBERT | | 50.0MB | | 81.10 | 79.30
Indonesian | | Tiny-BERT | | 59.5MB | | 71.80 | 69.40
Indonesian | | Tiny-ALBERT | | 24.8MB | | 70.80 | 67.30
Thai | SEACoreNLP | dp-th-ud-xlmr-best | Bi-LSTM + Deep Biaffine Attention | 823.7MB | | 89.74 | 82.30
Thai | SEACoreNLP | dp-th-ud-xlmr | Bi-LSTM + Deep Biaffine Attention | 67.9MB (Classifier layers) | | 88.33 | 82.39
Thai | SEACoreNLP | dp-th-ud-scratch | Bi-LSTM + Deep Biaffine Attention | 57.5MB | | 81.06 | 73.67
Thai | | | | 4.82MB | | ? | ?
Vietnamese | SEACoreNLP | dp-vi-ud-xlmr-best | Bi-LSTM + Deep Biaffine Attention | 822.5MB | | 77.79 | 71.03
Vietnamese | SEACoreNLP | dp-vi-ud-xlmr | Bi-LSTM + Deep Biaffine Attention | 67.3MB (Classifier layers) | | 77.37 | 73.65
Vietnamese | SEACoreNLP | dp-vi-ud-scratch | Bi-LSTM + Deep Biaffine Attention | 57.2MB | | 67.56 | 63.96
Vietnamese | | XLM-R Large | Embeddings + Adapters + Deep Biaffine Attention | | | 71.07 | 65.37
Vietnamese | | | Bi-LSTM + Deep Biaffine Attention | 93.1MB | | 53.63 | 48.16
Vietnamese | | | Bi-LSTM + Deep Biaffine Attention | ? | ? | ? | ?
Vietnamese | | | Transition-based Parser | 15.3MB | VnDT | 79.02 | 73.39

[6] Please refer to Timothy Dozat and Christopher Manning's original paper "Deep Biaffine Attention for Neural Dependency Parsing (2017)" for more details on this architecture.
[7] The scores displayed here under UAS and LAS for the Malaya models are reported as "Arc Accuracy" and "Types Accuracy" in the official Malaya documentation. We believe these correspond to UAS and LAS and have reported them as such in order to standardize our metrics, although it is unclear exactly what the author of that documentation meant by these terms.
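For reference, UAS is the fraction of tokens whose predicted head is correct, while LAS additionally requires the dependency label to match — the reading under which Malaya's "Arc Accuracy" and "Types Accuracy" correspond to UAS and LAS. A minimal sketch with a hypothetical four-token sentence:

```python
def uas_las(gold, pred):
    """gold and pred are parallel lists of (head_index, relation) per token."""
    n = len(gold)
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / n   # head only
    las = sum(g == p for g, p in zip(gold, pred)) / n         # head and label
    return uas, las

# Hypothetical annotations; head index 0 denotes the artificial root token
gold = [(2, "nsubj"), (0, "root"), (2, "obj"),  (2, "punct")]
pred = [(2, "nsubj"), (0, "root"), (2, "iobj"), (3, "punct")]
print(uas_las(gold, pred))  # (0.75, 0.5)
```

Because LAS counts only tokens that are correct on both criteria, it can never exceed UAS, which holds for every row in the table above.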