# Model Performance

This page summarizes the performance of our natively trained models, as well as that of the third-party models integrated into our library.

> **Note:** Certain packages (such as Trankit) are included for completeness of comparison but are not integrated into our library. These are marked with an asterisk (*).

> **Note:** Models from the Malaya package generally cannot be compared directly with other models, as the datasets used for both training and testing have been augmented and use variable split ratios. Furthermore, Malaya models can optionally be quantized, which reduces file size and speeds up loading and inference at the cost of a small dip in performance.
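Quantization of this kind typically stores weights as 8-bit integers plus a floating-point scale factor. The sketch below is a generic illustration of the trade-off, not Malaya's actual implementation: weights shrink roughly 4x relative to float32, while dequantization introduces a bounded rounding error.

```python
import struct

def quantize_int8(weights):
    # Symmetric linear quantization: map each float to an int in [-127, 127]
    # using a single scale factor derived from the largest magnitude.
    scale = max(abs(w) for w in weights) / 127
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    # Recover approximate float weights; rounding error is at most scale / 2.
    return [q * scale for q in quantized]

weights = [0.81, -1.27, 0.05, 0.33, -0.64]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

fp32_bytes = len(weights) * struct.calcsize("f")  # 4 bytes per float32 weight
int8_bytes = len(quantized)                       # 1 byte per int8 weight
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(fp32_bytes, int8_bytes, max_error <= scale / 2 + 1e-12)  # 20 5 True
```

In practice frameworks quantize per-layer or per-channel and may also quantize activations, but the storage saving and the error bound behave the same way.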

## Tokenization

| Language | Name | Architecture | Test Dataset | F1 (%) |
|---|---|---|---|---|
| Indonesian | Trankit* | SentencePiece + FFNN (XLM-R Large) | UD-ID-GSD | 99.89 |
| Indonesian | Stanza | 1D-CNN + Bi-LSTM | UD-ID-GSD | 99.99 |
| Indonesian | Malaya | Regex | ? | ? |
| Thai | PyThaiNLP | Deepcut (CNN + FFNN) | InterBEST | 93.00 |
| Thai | PyThaiNLP | Attacut (3-layer Dilated CNN) | InterBEST | 91.00 |
| Thai | PyThaiNLP | newmm (Dictionary-based) | InterBEST | 67.00 |
| Vietnamese | Trankit* | SentencePiece + FFNN (XLM-R Base) | UD-VI-VTB | 95.22 |
| Vietnamese | Stanza | 1D-CNN + Bi-LSTM | UD-VI-VTB | 87.25 |
| Vietnamese | VnCoreNLP* | SCRDR (Rule-based) | VLSP 2013 | 97.90 |
| Vietnamese | UnderTheSea | CRF + Regex | ? | ? |
| Vietnamese | PyVI* | CRF | ? | 98.50 |
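Tokenization F1 is commonly computed by aligning predicted and gold tokens via their character offsets: a predicted token counts as correct only if both its boundaries match a gold token exactly. A minimal sketch of this evaluation (the exact scoring scripts vary by benchmark):

```python
def char_spans(tokens):
    # Convert a token sequence into (start, end) character offsets,
    # assuming the tokens concatenate back into the original string.
    spans, pos = [], 0
    for tok in tokens:
        spans.append((pos, pos + len(tok)))
        pos += len(tok)
    return spans

def token_f1(gold_tokens, pred_tokens):
    gold = set(char_spans(gold_tokens))
    pred = set(char_spans(pred_tokens))
    correct = len(gold & pred)
    if correct == 0:
        return 0.0
    precision = correct / len(pred)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)

# Illustrative unsegmented input ("theboyran") with one boundary error:
gold = ["the", "boy", "ran"]
pred = ["theb", "oy", "ran"]
print(round(token_f1(gold, pred), 4))  # 0.3333
```

Note how a single misplaced boundary invalidates both adjacent tokens, which is why dictionary-based segmenters score so much lower than neural ones on Thai.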

## Sentence Segmentation

| Language | Name | Architecture | Test Dataset | F1 (%) |
|---|---|---|---|---|
| Indonesian | Trankit* | SentencePiece + FFNN (XLM-R Large) | UD-ID-GSD | 95.54 |
| Indonesian | Stanza | 1D-CNN + Bi-LSTM | UD-ID-GSD | 93.78 |
| Indonesian | Malaya | Regex | ? | ? |
| Thai | PyThaiNLP | CRFCut (CRF trained on TED dataset) | ORCHID | 87.00[^1] |
| Vietnamese | Trankit* | SentencePiece + FFNN (XLM-R Large) | UD-VI-VTB | 96.63 |
| Vietnamese | Stanza | 1D-CNN + Bi-LSTM | UD-VI-VTB | 93.15 |
| Vietnamese | UnderTheSea | ? | ? | ? |

[^1]: Refer to the original CRFCut GitHub repository for details on performance when trained and tested on different datasets.

## Part-of-speech Tagging

We currently support only UPOS tagging for natively trained models.
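The tables in this section report accuracy for some packages and F1 for others. For single-label tagging the two are more comparable than they look: micro-averaged F1 pooled over all tags reduces to plain accuracy (macro-averaged F1 does not). A minimal sketch, using illustrative UPOS tags:

```python
def accuracy(gold, pred):
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def micro_f1(gold, pred):
    # Micro-averaging pools true/false positives over all tags. In
    # single-label tagging every wrong tag is simultaneously a false
    # positive (for the predicted tag) and a false negative (for the
    # gold tag), so precision = recall = accuracy = micro-F1.
    tp = sum(g == p for g, p in zip(gold, pred))
    fp = fn = len(gold) - tp
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

gold = ["PRON", "VERB", "NOUN", "PUNCT"]
pred = ["PRON", "NOUN", "NOUN", "PUNCT"]
print(accuracy(gold, pred), micro_f1(gold, pred))  # 0.75 0.75
```

Per-tag (macro) F1 can still differ substantially when tag frequencies are skewed, so cross-package comparisons should be read with some care.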

| Language | Package | Model Name | Architecture | Size | Dataset | Accuracy / F1 (%) |
|---|---|---|---|---|---|---|
| Indonesian | SEACoreNLP | pos-id-ud-xlmr-best | XLM-R (Base) + FFNN | 774.4MB | UD-ID-GSD | 93.90 |
| Indonesian | SEACoreNLP | pos-id-ud-xlmr | XLM-R (Base) + FFNN | 47KB (FFNN) | UD-ID-GSD | 92.44 |
| Indonesian | SEACoreNLP | pos-id-ud-indobert | IndoBERT (Base) + FFNN | 462.1MB | UD-ID-GSD | 91.54 |
| Indonesian | SEACoreNLP | pos-id-ud-bilstm | Embeddings (200) + Bi-LSTM | 16.3MB | UD-ID-GSD | 90.19 |
| Indonesian | Trankit* | XLM-R Base | Embeddings + Adapters + FFNN | ? | UD-ID-GSD | 93.57 |
| Indonesian | Stanza | | word2vec/fastText + Bi-LSTM | 17.3MB | UD-ID-GSD | 93.40 |
| Indonesian | Malaya | XLNET | Transformer Embedding + CRF | 446.6MB | UD-ID-GSD | 93.24 |
| Indonesian | Malaya | BERT | Transformer Embedding + CRF | 426.4MB | UD-ID-GSD | 93.18 |
| Indonesian | Malaya | ALXLNET | Transformer Embedding + CRF | 46.8MB | UD-ID-GSD | 92.82 |
| Indonesian | Malaya | Tiny-BERT | Transformer Embedding + CRF | 57.7MB | UD-ID-GSD | 92.70 |
| Indonesian | Malaya | ALBERT | Transformer Embedding + CRF | 48.7MB | UD-ID-GSD | 92.55 |
| Indonesian | Malaya | Tiny-ALBERT | Transformer Embedding + CRF | 22.4MB | UD-ID-GSD | 90.00 |
| Thai | SEACoreNLP | pos-th-ud-xlmr-best | XLM-R (Base) + FFNN | 755.8MB | UD-TH-PUD | 97.20 |
| Thai | SEACoreNLP | pos-th-ud-xlmr | XLM-R (Base) + FFNN | 44KB (FFNN) | UD-TH-PUD | 92.89 |
| Thai | SEACoreNLP | pos-th-ud-bilstmcrf | Embeddings (100) + Bi-LSTM + CRF | 2.1MB | UD-TH-PUD | 89.10 |
| Thai | SEACoreNLP | pos-th-ud-bilstm | Embeddings (100) + Bi-LSTM | 2.1MB | UD-TH-PUD | 88.48 |
| Thai | PyThaiNLP | | Averaged Perceptron | ? | UD-TH-PUD | 99.09 |
| Thai | PyThaiNLP | | Unigram | ? | UD-TH-PUD | 93.18 |
| Thai | RDRPOSTagger | | RDR (Rule-based) | ? | UD-TH-PUD | 93.18 |
| Vietnamese | SEACoreNLP | pos-vi-ud-xlmr-best | XLM-R (Base) + FFNN | 755.1MB | UD-VI-VTB | 93.07 |
| Vietnamese | SEACoreNLP | pos-vi-ud-xlmr | XLM-R (Base) + FFNN | 41KB (FFNN) | UD-VI-VTB | 91.90 |
| Vietnamese | SEACoreNLP | pos-vi-ud-phobert | PhoBERT (Base) + FFNN | 438MB | UD-VI-VTB | 92.92 |
| Vietnamese | SEACoreNLP | pos-vi-ud-bilstm | Embeddings (256) + Bi-LSTM | 8.4MB | UD-VI-VTB | 85.21 |
| Vietnamese | Trankit* | XLM-R Base | Embeddings + Adapters + FFNN | ? | UD-VI-VTB | 89.70 |
| Vietnamese | Stanza | | word2vec/fastText + Bi-LSTM | 18.1MB | UD-VI-VTB | 79.50 |
| Vietnamese | UnderTheSea | | CRF | 2.68MB | ? | ? |
| Vietnamese | VnCoreNLP* | MarMoT | CRF | 28.3MB | VLSP 2013 | 95.88 |

## Named Entity Recognition

| Language | Package | Model Name | Architecture | Size | Dataset | F1 (%) |
|---|---|---|---|---|---|---|
| Indonesian | SEACoreNLP | ner-id-nergrit-xlmr-best | XLM-R (Base) + Bi-LSTM + CRF | 797.7MB | NERGrit | 79.85 |
| Indonesian | SEACoreNLP | ner-id-nergrit-xlmr | XLM-R (Base) + Bi-LSTM + CRF | 9.3MB (BiLSTMCRF) | NERGrit | 75.31 |
| Indonesian | Malaya | XLNET | Transformer Embedding + CRF | 446.6MB | Malaya | 98.73 |
| Indonesian | Malaya | BERT | Transformer Embedding + CRF | 425.4MB | Malaya | 98.54 |
| Indonesian | Malaya | ALXLNET | Transformer Embedding + CRF | 46.8MB | Malaya | 98.34 |
| Indonesian | Malaya | ALBERT | Transformer Embedding + CRF | 48.6MB | Malaya | 96.49 |
| Indonesian | Malaya | Tiny-BERT | Transformer Embedding + CRF | 57.7MB | Malaya | 96.13 |
| Indonesian | Malaya | Tiny-ALBERT | Transformer Embedding + CRF | 22.4MB | Malaya | 92.37 |
| Thai | SEACoreNLP | ner-th-thainer-xlmr-best | XLM-R (Base) + Bi-LSTM + CRF | 790.8MB | ThaiNER 1.3 | 89.49 |
| Thai | SEACoreNLP | ner-th-thainer-xlmr | XLM-R (Base) + Bi-LSTM + CRF | 9.4MB (BiLSTMCRF) | ThaiNER 1.3 | 87.07 |
| Thai | SEACoreNLP | ner-th-thainer-scratch | Embeddings + Bi-LSTM + CRF | 12.3MB | ThaiNER 1.3 | 80.11 |
| Thai | PyThaiNLP | ThaiNER 1.3 | CRF | ? | ThaiNER 1.3 | 87.00 |
| Thai | WangchanBERTa* | ? | ? | | ThaiNER 1.3 | 86.49 |
| Thai | WangchanBERTa* | ? | ? | | LST20 | 78.01 |
| Vietnamese | UnderTheSea | | CRF | 172KB | ? | ? |
| Vietnamese | VnCoreNLP* | | Dynamic Feature Induction | 69.5MB | VLSP 2016 | 88.55 |

## Constituency Parsing

| Language | Package | Model Name | Architecture | Size | Dataset | F1 (%) |
|---|---|---|---|---|---|---|
| Indonesian | SEACoreNLP | cp-id-kethu-benepar-xlmr-best | Benepar[^2] | 825.9MB | Kethu[^3] | 82.85[^4] |
| Indonesian | SEACoreNLP | cp-id-kethu-xlmr | AllenNLP[^5] | 15.2MB (Classifier layers) | Kethu | 77.05 |
| Indonesian | Malaya | XLNET | | 498.0MB | Augmented Kethu | 83.31 |
| Indonesian | Malaya | BERT | | 470.0MB | Augmented Kethu | 80.35 |
| Indonesian | Malaya | ALBERT | | 180.0MB | Augmented Kethu | 79.01 |
| Indonesian | Malaya | Tiny-BERT | | 125.0MB | Augmented Kethu | 76.79 |
| Indonesian | Malaya | Tiny-ALBERT | | 56.7MB | Augmented Kethu | 70.84 |

[^2]: This architecture comprises embeddings (XLM-R Base in our case) followed by a variable number of self-attention layers. Refer to the original paper "Multilingual Constituency Parsing with Self-Attention and Pre-Training (2018)" or the Benepar GitHub repository for more details.

[^3]: Credits to Jessica Naraiswari Arwidarasti, Ika Alfina and Dr Adila Alfa Krisnadhi at Universitas Indonesia for their great work in producing this open-source constituency treebank for Indonesian. Please refer to their paper "Converting an Indonesian Constituency Treebank to the Penn Treebank Format (2019)" for more details.

[^4]: The previous state-of-the-art benchmark for constituency parsing in Indonesian was an F1 score of 70.90%, achieved with StanfordCoreNLP's Shift-Reduce Parser. Please refer to the same paper, "Converting an Indonesian Constituency Treebank to the Penn Treebank Format (2019)", for more details.

[^5]: This architecture comprises embeddings (XLM-R Base in our case) followed by a bi-directional LSTM, a bi-directional span extraction layer, and a final feedforward neural network. Please refer to AllenNLP's original paper "Extending a Parser to Distant Domains Using a Few Dozen Partially Annotated Examples (2018)" for more details.
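The constituency F1 scores above are bracketing (PARSEVAL-style) F1: each labeled span in the predicted tree is matched against the gold tree, and precision/recall are computed over those brackets. A minimal sketch, ignoring evaluation details such as punctuation and root handling, with a toy Indonesian example ("Budi membaca buku"):

```python
from collections import Counter

def brackets(tree):
    # Collect labeled spans (label, start, end) from a nested tree of
    # (label, children) tuples, where leaves are plain token strings.
    spans = Counter()
    def walk(node, start):
        label, children = node
        pos = start
        for child in children:
            pos = pos + 1 if isinstance(child, str) else walk(child, pos)
        spans[(label, start, pos)] += 1
        return pos
    walk(tree, 0)
    return spans

def bracket_f1(gold_tree, pred_tree):
    gold, pred = brackets(gold_tree), brackets(pred_tree)
    correct = sum((gold & pred).values())  # multiset intersection
    if correct == 0:
        return 0.0
    p = correct / sum(pred.values())
    r = correct / sum(gold.values())
    return 2 * p * r / (p + r)

gold = ("S", [("NP", ["Budi"]), ("VP", ["membaca", ("NP", ["buku"])])])
pred = ("S", [("NP", ["Budi"]), ("VP", ["membaca"]), ("NP", ["buku"])])
print(bracket_f1(gold, pred))  # 0.75
```

Here the predicted tree attaches the object NP to S instead of the VP, so the VP bracket is wrong while the other three brackets still match.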

## Dependency Parsing

| Language | Package | Model Name | Architecture | Size | Dataset | UAS (%)[^7] | LAS (%) |
|---|---|---|---|---|---|---|---|
| Indonesian | SEACoreNLP | dp-id-ud-xlmr-best | Bi-LSTM + Deep Biaffine Attention[^6] | 841.5MB | UD-ID-GSD | 88.10 | 82.23 |
| Indonesian | SEACoreNLP | dp-id-ud-xlmr | Bi-LSTM + Deep Biaffine Attention | 67.5MB (Classifier layers) | UD-ID-GSD | 86.02 | 80.17 |
| Indonesian | SEACoreNLP | dp-id-ud-indobert | Bi-LSTM + Deep Biaffine Attention | 67.5MB (Classifier layers) | UD-ID-GSD | 86.67 | 81.04 |
| Indonesian | SEACoreNLP | dp-id-ud-scratch | Bi-LSTM + Deep Biaffine Attention | 63.3MB | UD-ID-GSD | 84.23 | 78.70 |
| Indonesian | Trankit* | XLM-R Base | Embeddings + Adapters + Deep Biaffine Attention | | UD-ID-GSD | 86.55 | 80.28 |
| Indonesian | Stanza | | Bi-LSTM + Deep Biaffine Attention | 95.3MB | UD-ID-GSD | 85.17 | 79.19 |
| Indonesian | Malaya | XLNET | | 450.2MB | Augmented UD | 93.10 | 92.50 |
| Indonesian | Malaya | ALXLNET | | 50.0MB | Augmented UD | 89.40 | 88.60 |
| Indonesian | Malaya | BERT | | 426.0MB | Augmented UD | 85.50 | 84.80 |
| Indonesian | Malaya | ALBERT | | 50.0MB | Augmented UD | 81.10 | 79.30 |
| Indonesian | Malaya | Tiny-BERT | | 59.5MB | Augmented UD | 71.80 | 69.40 |
| Indonesian | Malaya | Tiny-ALBERT | | 24.8MB | Augmented UD | 70.80 | 67.30 |
| Thai | SEACoreNLP | dp-th-ud-xlmr-best | Bi-LSTM + Deep Biaffine Attention | 823.7MB | UD-TH-PUD | 89.74 | 82.30 |
| Thai | SEACoreNLP | dp-th-ud-xlmr | Bi-LSTM + Deep Biaffine Attention | 67.9MB (Classifier layers) | UD-TH-PUD | 88.33 | 82.39 |
| Thai | SEACoreNLP | dp-th-ud-scratch | Bi-LSTM + Deep Biaffine Attention | 57.5MB | UD-TH-PUD | 81.06 | 73.67 |
| Thai | spaCy-Thai | | UDPipe | 4.82MB | UD-TH-PUD | ? | ? |
| Vietnamese | SEACoreNLP | dp-vi-ud-xlmr-best | Bi-LSTM + Deep Biaffine Attention | 822.5MB | UD-VI-VTB | 77.79 | 71.03 |
| Vietnamese | SEACoreNLP | dp-vi-ud-xlmr | Bi-LSTM + Deep Biaffine Attention | 67.3MB (Classifier layers) | UD-VI-VTB | 77.37 | 73.65 |
| Vietnamese | SEACoreNLP | dp-vi-ud-scratch | Bi-LSTM + Deep Biaffine Attention | 57.2MB | UD-VI-VTB | 67.56 | 63.96 |
| Vietnamese | Trankit* | XLM-R Large | Embeddings + Adapters + Deep Biaffine Attention | | UD-VI-VTB | 71.07 | 65.37 |
| Vietnamese | Stanza | | Bi-LSTM + Deep Biaffine Attention | 93.1MB | UD-VI-VTB | 53.63 | 48.16 |
| Vietnamese | UnderTheSea | | Bi-LSTM + Deep Biaffine Attention | ? | ? | ? | ? |
| Vietnamese | VnCoreNLP* | | Transition-based Parser | 15.3MB | VnDT | 79.02 | 73.39 |

[^6]: Please refer to Timothy Dozat and Christopher Manning's original paper "Deep Biaffine Attention for Neural Dependency Parsing (2017)" for more details on this architecture.
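At the core of this architecture is a biaffine scorer: for every (dependent, head) pair it combines the two token representations through a learned matrix, plus a bias term on the head, and the predicted head of each token is the argmax of its row of scores (label prediction uses a second biaffine over the same pairs). A minimal sketch with hypothetical toy weights and 2-dimensional representations:

```python
def matvec(M, v):
    # Matrix-vector product over plain Python lists.
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def dot(u, v):
    return sum(u_i * v_i for u_i, v_i in zip(u, v))

def biaffine_arc_scores(H_dep, H_head, U, b):
    # score(i, j) = h_dep[i]^T U h_head[j] + b^T h_head[j]
    # Row i holds the score of each candidate head j for dependent i;
    # the predicted head of token i is the argmax over row i.
    return [[dot(h_dep, matvec(U, h_head)) + dot(b, h_head)
             for h_head in H_head] for h_dep in H_dep]

# Toy parameters (purely illustrative, not trained values).
U = [[1.0, 0.0], [0.5, 1.0]]
b = [0.1, 0.2]
H_dep = [[1.0, 2.0]]                     # one dependent token
H_head = [[0.5, -1.0], [2.0, 1.0]]       # two candidate heads
scores = biaffine_arc_scores(H_dep, H_head, U, b)
```

In the real model `H_dep` and `H_head` are separate MLP projections of the Bi-LSTM (or transformer) outputs, and a spanning-tree decoder is applied over the score matrix rather than a plain per-row argmax.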

[^7]: The scores displayed under UAS and LAS for the Malaya models are reported as "Arc Accuracy" and "Types Accuracy" in the official Malaya documentation. We believe these correspond to UAS and LAS and report them as such to standardize our metrics, though the documentation does not define the terms precisely.
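For reference, UAS and LAS are straightforward token-level scores: UAS is the fraction of tokens assigned the correct head, and LAS additionally requires the correct dependency label. A minimal sketch with an illustrative three-token sentence (1-based head indices, 0 = root):

```python
def uas_las(gold, pred):
    # gold/pred: one (head_index, relation_label) pair per token.
    n = len(gold)
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / n
    las = sum(g == p for g, p in zip(gold, pred)) / n
    return uas, las

# "Anak makan nasi": the last token gets the right head, wrong label.
gold = [(2, "nsubj"), (0, "root"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "nmod")]
uas, las = uas_las(gold, pred)
print(uas, round(las, 4))  # 1.0 0.6667
```

Since LAS only adds a constraint on top of UAS, LAS ≤ UAS always holds, which is a quick sanity check when reading parser benchmarks.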