# Model Performance

This page summarizes the performance of our natively trained models, as well as that of the third-party models integrated into our library.

> **Note:** Certain packages (such as Trankit) are included for completeness of comparison but are not integrated into our library. These are marked with an asterisk (*).

> **Note:** Models from the Malaya package generally cannot be compared directly with other models, as the datasets used for both training and testing have been augmented and use variable split ratios. Furthermore, Malaya models can optionally be quantized, which reduces file size and speeds up loading and inference at the cost of a small dip in performance.
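Quantization of this kind typically stores weights as 8-bit integers plus a floating-point scale factor. The sketch below is a generic illustration of the trade-off, not Malaya's actual implementation: weights shrink roughly 4x relative to float32, while dequantization introduces a bounded rounding error.

```python
import struct

def quantize_int8(weights):
    # Symmetric linear quantization: map each float to an int in [-127, 127]
    # using a single scale factor derived from the largest magnitude.
    scale = max(abs(w) for w in weights) / 127
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    # Recover approximate float weights; rounding error is at most scale / 2.
    return [q * scale for q in quantized]

weights = [0.81, -1.27, 0.05, 0.33, -0.64]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

fp32_bytes = len(weights) * struct.calcsize("f")  # 4 bytes per float32 weight
int8_bytes = len(quantized)                       # 1 byte per int8 weight
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(fp32_bytes, int8_bytes, max_error <= scale / 2 + 1e-12)  # 20 5 True
```

In practice frameworks quantize per-layer or per-channel and may also quantize activations, but the storage saving and the error bound behave the same way.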

## Tokenization

| Language | Name | Architecture | Test Dataset | F1 (%) |
|---|---|---|---|---|
| Indonesian | Trankit* | SentencePiece + FFNN (XLM-R Large) | UD-ID-GSD | 99.89 |
| Indonesian | Stanza | 1D-CNN + Bi-LSTM | UD-ID-GSD | 99.99 |
| Indonesian | Malaya | Regex | ? | ? |
| Thai | PyThaiNLP | Deepcut (CNN + FFNN) | InterBEST | 93.00 |
| Thai | PyThaiNLP | Attacut (3-layer Dilated CNN) | InterBEST | 91.00 |
| Thai | PyThaiNLP | newmm (Dictionary-based) | InterBEST | 67.00 |
| Vietnamese | Trankit* | SentencePiece + FFNN (XLM-R Base) | UD-VI-VTB | 95.22 |
| Vietnamese | Stanza | 1D-CNN + Bi-LSTM | UD-VI-VTB | 87.25 |
| Vietnamese | VnCoreNLP* | SCRDR (Rule-based) | VLSP 2013 | 97.90 |
| Vietnamese | UnderTheSea | CRF + Regex | ? | ? |
| Vietnamese | PyVI* | CRF | ? | 98.50 |
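Tokenization F1 is commonly computed by aligning predicted and gold tokens via their character offsets: a predicted token counts as correct only if both its boundaries match a gold token exactly. A minimal sketch of this evaluation (the exact scoring scripts vary by benchmark):

```python
def char_spans(tokens):
    # Convert a token sequence into (start, end) character offsets,
    # assuming the tokens concatenate back into the original string.
    spans, pos = [], 0
    for tok in tokens:
        spans.append((pos, pos + len(tok)))
        pos += len(tok)
    return spans

def token_f1(gold_tokens, pred_tokens):
    gold = set(char_spans(gold_tokens))
    pred = set(char_spans(pred_tokens))
    correct = len(gold & pred)
    if correct == 0:
        return 0.0
    precision = correct / len(pred)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)

# Illustrative unsegmented input ("theboyran") with one boundary error:
gold = ["the", "boy", "ran"]
pred = ["theb", "oy", "ran"]
print(round(token_f1(gold, pred), 4))  # 0.3333
```

Note how a single misplaced boundary invalidates both adjacent tokens, which is why dictionary-based segmenters score so much lower than neural ones on Thai.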

## Sentence Segmentation

| Language | Name | Architecture | Test Dataset | F1 (%) |
|---|---|---|---|---|
| Indonesian | Trankit* | SentencePiece + FFNN (XLM-R Large) | UD-ID-GSD | 95.54 |
| Indonesian | Stanza | 1D-CNN + Bi-LSTM | UD-ID-GSD | 93.78 |
| Indonesian | Malaya | Regex | ? | ? |
| Thai | PyThaiNLP | CRFCut (CRF trained on TED dataset) | ORCHID | 87.00[^1] |
| Vietnamese | Trankit* | SentencePiece + FFNN (XLM-R Large) | UD-VI-VTB | 96.63 |
| Vietnamese | Stanza | 1D-CNN + Bi-LSTM | UD-VI-VTB | 93.15 |
| Vietnamese | UnderTheSea | ? | ? | ? |

[^1]: Refer to the original CRFCut GitHub repository for details on performance when trained and tested on different datasets.

## Part-of-speech Tagging

We currently support only UPOS tagging for natively trained models.
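The tables in this section report accuracy for some packages and F1 for others. For single-label tagging the two are more comparable than they look: micro-averaged F1 pooled over all tags reduces to plain accuracy (macro-averaged F1 does not). A minimal sketch, using illustrative UPOS tags:

```python
def accuracy(gold, pred):
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def micro_f1(gold, pred):
    # Micro-averaging pools true/false positives over all tags. In
    # single-label tagging every wrong tag is simultaneously a false
    # positive (for the predicted tag) and a false negative (for the
    # gold tag), so precision = recall = accuracy = micro-F1.
    tp = sum(g == p for g, p in zip(gold, pred))
    fp = fn = len(gold) - tp
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

gold = ["PRON", "VERB", "NOUN", "PUNCT"]
pred = ["PRON", "NOUN", "NOUN", "PUNCT"]
print(accuracy(gold, pred), micro_f1(gold, pred))  # 0.75 0.75
```

Per-tag (macro) F1 can still differ substantially when tag frequencies are skewed, so cross-package comparisons should be read with some care.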

| Language | Package | Model Name | Architecture | Size | Dataset | Accuracy / F1 (%) |
|---|---|---|---|---|---|---|
| Indonesian | SEACoreNLP | pos-id-ud-xlmr-best | XLM-R (Base) + FFNN | 774.4MB | UD-ID-GSD | 93.90 |
| Indonesian | SEACoreNLP | pos-id-ud-xlmr | XLM-R (Base) + FFNN | 47KB (FFNN) | UD-ID-GSD | 92.44 |
| Indonesian | SEACoreNLP | pos-id-ud-indobert | IndoBERT (Base) + FFNN | 462.1MB | UD-ID-GSD | 91.54 |
| Indonesian | SEACoreNLP | pos-id-ud-bilstm | Embeddings (200) + Bi-LSTM | 16.3MB | UD-ID-GSD | 90.19 |
| Indonesian | Trankit* | XLM-R Base | Embeddings + Adapters + FFNN | ? | UD-ID-GSD | 93.57 |
| Indonesian | Stanza | | word2vec/fastText + Bi-LSTM | 17.3MB | UD-ID-GSD | 93.40 |
| Indonesian | Malaya | XLNET | Transformer Embedding + CRF | 446.6MB | UD-ID-GSD | 93.24 |
| Indonesian | Malaya | BERT | Transformer Embedding + CRF | 426.4MB | UD-ID-GSD | 93.18 |
| Indonesian | Malaya | ALXLNET | Transformer Embedding + CRF | 46.8MB | UD-ID-GSD | 92.82 |
| Indonesian | Malaya | Tiny-BERT | Transformer Embedding + CRF | 57.7MB | UD-ID-GSD | 92.70 |
| Indonesian | Malaya | ALBERT | Transformer Embedding + CRF | 48.7MB | UD-ID-GSD | 92.55 |
| Indonesian | Malaya | Tiny-ALBERT | Transformer Embedding + CRF | 22.4MB | UD-ID-GSD | 90.00 |
| Thai | SEACoreNLP | pos-th-ud-xlmr-best | XLM-R (Base) + FFNN | 755.8MB | UD-TH-PUD | 97.20 |
| Thai | SEACoreNLP | pos-th-ud-xlmr | XLM-R (Base) + FFNN | 44KB (FFNN) | UD-TH-PUD | 92.89 |
| Thai | SEACoreNLP | pos-th-ud-bilstmcrf | Embeddings (100) + Bi-LSTM + CRF | 2.1MB | UD-TH-PUD | 89.10 |
| Thai | SEACoreNLP | pos-th-ud-bilstm | Embeddings (100) + Bi-LSTM | 2.1MB | UD-TH-PUD | 88.48 |
| Thai | PyThaiNLP | | Averaged Perceptron | ? | UD-TH-PUD | 99.09 |
| Thai | PyThaiNLP | | Unigram | ? | UD-TH-PUD | 93.18 |
| Thai | RDRPOSTagger | | RDR (Rule-based) | ? | UD-TH-PUD | 93.18 |
| Vietnamese | SEACoreNLP | pos-vi-ud-xlmr-best | XLM-R (Base) + FFNN | 755.1MB | UD-VI-VTB | 93.07 |
| Vietnamese | SEACoreNLP | pos-vi-ud-xlmr | XLM-R (Base) + FFNN | 41KB (FFNN) | UD-VI-VTB | 91.90 |
| Vietnamese | SEACoreNLP | pos-vi-ud-phobert | PhoBERT (Base) + FFNN | 438MB | UD-VI-VTB | 92.92 |
| Vietnamese | SEACoreNLP | pos-vi-ud-bilstm | Embeddings (256) + Bi-LSTM | 8.4MB | UD-VI-VTB | 85.21 |
| Vietnamese | Trankit* | XLM-R Base | Embeddings + Adapters + FFNN | ? | UD-VI-VTB | 89.70 |
| Vietnamese | Stanza | | word2vec/fastText + Bi-LSTM | 18.1MB | UD-VI-VTB | 79.50 |
| Vietnamese | UnderTheSea | | CRF | 2.68MB | ? | ? |
| Vietnamese | VnCoreNLP* | MarMoT | CRF | 28.3MB | VLSP 2013 | 95.88 |

## Named Entity Recognition

| Language | Package | Model Name | Architecture | Size | Dataset | F1 (%) |
|---|---|---|---|---|---|---|
| Indonesian | SEACoreNLP | ner-id-nergrit-xlmr-best | XLM-R (Base) + Bi-LSTM + CRF | 797.7MB | NERGrit | 79.85 |
| Indonesian | SEACoreNLP | ner-id-nergrit-xlmr | XLM-R (Base) + Bi-LSTM + CRF | 9.3MB (BiLSTMCRF) | NERGrit | 75.31 |
| Indonesian | Malaya | XLNET | Transformer Embedding + CRF | 446.6MB | Malaya | 98.73 |
| Indonesian | Malaya | BERT | Transformer Embedding + CRF | 425.4MB | Malaya | 98.54 |
| Indonesian | Malaya | ALXLNET | Transformer Embedding + CRF | 46.8MB | Malaya | 98.34 |
| Indonesian | Malaya | ALBERT | Transformer Embedding + CRF | 48.6MB | Malaya | 96.49 |
| Indonesian | Malaya | Tiny-BERT | Transformer Embedding + CRF | 57.7MB | Malaya | 96.13 |
| Indonesian | Malaya | Tiny-ALBERT | Transformer Embedding + CRF | 22.4MB | Malaya | 92.37 |
| Thai | SEACoreNLP | ner-th-thainer-xlmr-best | XLM-R (Base) + Bi-LSTM + CRF | 790.8MB | ThaiNER 1.3 | 89.49 |
| Thai | SEACoreNLP | ner-th-thainer-xlmr | XLM-R (Base) + Bi-LSTM + CRF | 9.4MB (BiLSTMCRF) | ThaiNER 1.3 | 87.07 |
| Thai | SEACoreNLP | ner-th-thainer-scratch | Embeddings + Bi-LSTM + CRF | 12.3MB | ThaiNER 1.3 | 80.11 |
| Thai | PyThaiNLP | ThaiNER 1.3 | CRF | ? | ThaiNER 1.3 | 87.00 |
| Thai | WangchanBERTa* | ? | ? | | ThaiNER 1.3 | 86.49 |
| Thai | WangchanBERTa* | ? | ? | | LST20 | 78.01 |
| Vietnamese | UnderTheSea | | CRF | 172KB | ? | ? |
| Vietnamese | VnCoreNLP* | | Dynamic Feature Induction | 69.5MB | VLSP 2016 | 88.55 |

## Constituency Parsing

| Language | Package | Model Name | Architecture | Size | Dataset | F1 (%) |
|---|---|---|---|---|---|---|
| Indonesian | SEACoreNLP | cp-id-kethu-benepar-xlmr-best | Benepar[^2] | 825.9MB | Kethu[^3] | 82.85[^4] |
| Indonesian | SEACoreNLP | cp-id-kethu-xlmr | AllenNLP[^5] | 15.2MB (Classifier layers) | Kethu | 77.05 |
| Indonesian | Malaya | XLNET | | 498.0MB | Augmented Kethu | 83.31 |
| Indonesian | Malaya | BERT | | 470.0MB | Augmented Kethu | 80.35 |
| Indonesian | Malaya | ALBERT | | 180.0MB | Augmented Kethu | 79.01 |
| Indonesian | Malaya | Tiny-BERT | | 125.0MB | Augmented Kethu | 76.79 |
| Indonesian | Malaya | Tiny-ALBERT | | 56.7MB | Augmented Kethu | 70.84 |

[^2]: This architecture comprises embeddings (XLM-R Base in our case) followed by a variable number of self-attention layers. Refer to the original paper "Multilingual Constituency Parsing with Self-Attention and Pre-Training (2018)" or the Benepar GitHub repository for more details.

[^3]: Credits to Jessica Naraiswari Arwidarasti, Ika Alfina and Dr Adila Alfa Krisnadhi at Universitas Indonesia for their great work in producing this open-source constituency treebank for Indonesian. Please refer to their paper "Converting an Indonesian Constituency Treebank to the Penn Treebank Format (2019)" for more details.

[^4]: The previous state-of-the-art benchmark for constituency parsing in Indonesian was an F1 score of 70.90%, achieved with StanfordCoreNLP's Shift-Reduce Parser. Please refer to the same paper, "Converting an Indonesian Constituency Treebank to the Penn Treebank Format (2019)", for more details.

[^5]: This architecture comprises embeddings (XLM-R Base in our case) followed by a bi-directional LSTM, a bi-directional span extraction layer, and a final feedforward neural network. Please refer to AllenNLP's original paper "Extending a Parser to Distant Domains Using a Few Dozen Partially Annotated Examples (2018)" for more details.
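The constituency F1 scores above are bracketing (PARSEVAL-style) F1: each labeled span in the predicted tree is matched against the gold tree, and precision/recall are computed over those brackets. A minimal sketch, ignoring evaluation details such as punctuation and root handling, with a toy Indonesian example ("Budi membaca buku"):

```python
from collections import Counter

def brackets(tree):
    # Collect labeled spans (label, start, end) from a nested tree of
    # (label, children) tuples, where leaves are plain token strings.
    spans = Counter()
    def walk(node, start):
        label, children = node
        pos = start
        for child in children:
            pos = pos + 1 if isinstance(child, str) else walk(child, pos)
        spans[(label, start, pos)] += 1
        return pos
    walk(tree, 0)
    return spans

def bracket_f1(gold_tree, pred_tree):
    gold, pred = brackets(gold_tree), brackets(pred_tree)
    correct = sum((gold & pred).values())  # multiset intersection
    if correct == 0:
        return 0.0
    p = correct / sum(pred.values())
    r = correct / sum(gold.values())
    return 2 * p * r / (p + r)

gold = ("S", [("NP", ["Budi"]), ("VP", ["membaca", ("NP", ["buku"])])])
pred = ("S", [("NP", ["Budi"]), ("VP", ["membaca"]), ("NP", ["buku"])])
print(bracket_f1(gold, pred))  # 0.75
```

Here the predicted tree attaches the object NP to S instead of the VP, so the VP bracket is wrong while the other three brackets still match.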

## Dependency Parsing

| Language | Package | Model Name | Architecture | Size | Dataset | UAS (%)[^7] | LAS (%) |
|---|---|---|---|---|---|---|---|
| Indonesian | SEACoreNLP | dp-id-ud-xlmr-best | Bi-LSTM + Deep Biaffine Attention[^6] | 841.5MB | UD-ID-GSD | 88.10 | 82.23 |
| Indonesian | SEACoreNLP | dp-id-ud-xlmr | Bi-LSTM + Deep Biaffine Attention | 67.5MB (Classifier layers) | UD-ID-GSD | 86.02 | 80.17 |
| Indonesian | SEACoreNLP | dp-id-ud-indobert | Bi-LSTM + Deep Biaffine Attention | 67.5MB (Classifier layers) | UD-ID-GSD | 86.67 | 81.04 |
| Indonesian | SEACoreNLP | dp-id-ud-scratch | Bi-LSTM + Deep Biaffine Attention | 63.3MB | UD-ID-GSD | 84.23 | 78.70 |
| Indonesian | Trankit* | XLM-R Base | Embeddings + Adapters + Deep Biaffine Attention | | UD-ID-GSD | 86.55 | 80.28 |
| Indonesian | Stanza | | Bi-LSTM + Deep Biaffine Attention | 95.3MB | UD-ID-GSD | 85.17 | 79.19 |
| Indonesian | Malaya | XLNET | | 450.2MB | Augmented UD | 93.10 | 92.50 |
| Indonesian | Malaya | ALXLNET | | 50.0MB | Augmented UD | 89.40 | 88.60 |
| Indonesian | Malaya | BERT | | 426.0MB | Augmented UD | 85.50 | 84.80 |
| Indonesian | Malaya | ALBERT | | 50.0MB | Augmented UD | 81.10 | 79.30 |
| Indonesian | Malaya | Tiny-BERT | | 59.5MB | Augmented UD | 71.80 | 69.40 |
| Indonesian | Malaya | Tiny-ALBERT | | 24.8MB | Augmented UD | 70.80 | 67.30 |
| Thai | SEACoreNLP | dp-th-ud-xlmr-best | Bi-LSTM + Deep Biaffine Attention | 823.7MB | UD-TH-PUD | 89.74 | 82.30 |
| Thai | SEACoreNLP | dp-th-ud-xlmr | Bi-LSTM + Deep Biaffine Attention | 67.9MB (Classifier layers) | UD-TH-PUD | 88.33 | 82.39 |
| Thai | SEACoreNLP | dp-th-ud-scratch | Bi-LSTM + Deep Biaffine Attention | 57.5MB | UD-TH-PUD | 81.06 | 73.67 |
| Thai | spaCy-Thai | | UDPipe | 4.82MB | UD-TH-PUD | ? | ? |
| Vietnamese | SEACoreNLP | dp-vi-ud-xlmr-best | Bi-LSTM + Deep Biaffine Attention | 822.5MB | UD-VI-VTB | 77.79 | 71.03 |
| Vietnamese | SEACoreNLP | dp-vi-ud-xlmr | Bi-LSTM + Deep Biaffine Attention | 67.3MB (Classifier layers) | UD-VI-VTB | 77.37 | 73.65 |
| Vietnamese | SEACoreNLP | dp-vi-ud-scratch | Bi-LSTM + Deep Biaffine Attention | 57.2MB | UD-VI-VTB | 67.56 | 63.96 |
| Vietnamese | Trankit* | XLM-R Large | Embeddings + Adapters + Deep Biaffine Attention | | UD-VI-VTB | 71.07 | 65.37 |
| Vietnamese | Stanza | | Bi-LSTM + Deep Biaffine Attention | 93.1MB | UD-VI-VTB | 53.63 | 48.16 |
| Vietnamese | UnderTheSea | | Bi-LSTM + Deep Biaffine Attention | ? | ? | ? | ? |
| Vietnamese | VnCoreNLP* | | Transition-based Parser | 15.3MB | VnDT | 79.02 | 73.39 |

[^6]: Please refer to Timothy Dozat and Christopher Manning's original paper "Deep Biaffine Attention for Neural Dependency Parsing (2017)" for more details on this architecture.
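At the core of this architecture is a biaffine scorer: for every (dependent, head) pair it combines the two token representations through a learned matrix, plus a bias term on the head, and the predicted head of each token is the argmax of its row of scores (label prediction uses a second biaffine over the same pairs). A minimal sketch with hypothetical toy weights and 2-dimensional representations:

```python
def matvec(M, v):
    # Matrix-vector product over plain Python lists.
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def dot(u, v):
    return sum(u_i * v_i for u_i, v_i in zip(u, v))

def biaffine_arc_scores(H_dep, H_head, U, b):
    # score(i, j) = h_dep[i]^T U h_head[j] + b^T h_head[j]
    # Row i holds the score of each candidate head j for dependent i;
    # the predicted head of token i is the argmax over row i.
    return [[dot(h_dep, matvec(U, h_head)) + dot(b, h_head)
             for h_head in H_head] for h_dep in H_dep]

# Toy parameters (purely illustrative, not trained values).
U = [[1.0, 0.0], [0.5, 1.0]]
b = [0.1, 0.2]
H_dep = [[1.0, 2.0]]                     # one dependent token
H_head = [[0.5, -1.0], [2.0, 1.0]]       # two candidate heads
scores = biaffine_arc_scores(H_dep, H_head, U, b)
```

In the real model `H_dep` and `H_head` are separate MLP projections of the Bi-LSTM (or transformer) outputs, and a spanning-tree decoder is applied over the score matrix rather than a plain per-row argmax.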

[^7]: The scores displayed under UAS and LAS for the Malaya models are reported as "Arc Accuracy" and "Types Accuracy" in the official Malaya documentation. We believe these correspond to UAS and LAS and report them as such to standardize our metrics, though the documentation does not define the terms precisely.
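For reference, UAS and LAS are straightforward token-level scores: UAS is the fraction of tokens assigned the correct head, and LAS additionally requires the correct dependency label. A minimal sketch with an illustrative three-token sentence (1-based head indices, 0 = root):

```python
def uas_las(gold, pred):
    # gold/pred: one (head_index, relation_label) pair per token.
    n = len(gold)
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / n
    las = sum(g == p for g, p in zip(gold, pred)) / n
    return uas, las

# "Anak makan nasi": the last token gets the right head, wrong label.
gold = [(2, "nsubj"), (0, "root"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "nmod")]
uas, las = uas_las(gold, pred)
print(uas, round(las, 4))  # 1.0 0.6667
```

Since LAS only adds a constraint on top of UAS, LAS ≤ UAS always holds, which is a quick sanity check when reading parser benchmarks.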