Corpus Tagsets

This section lists the tagsets used in various corpora. They are grouped by task, with links to relevant sources where applicable.

Part-of-speech Tagging

Universal Dependencies UPOS

Further details on the definition of these POS tags can be found on the Universal Dependencies website.

| Tag | POS |
| --- | --- |
| ADJ | Adjective |
| ADP | Adposition |
| ADV | Adverb |
| AUX | Auxiliary |
| CCONJ | Coordinating Conjunction |
| DET | Determiner |
| INTJ | Interjection |
| NOUN | Noun |
| NUM | Numeral |
| PART | Particle |
| PRON | Pronoun |
| PROPN | Proper Noun |
| PUNCT | Punctuation |
| SCONJ | Subordinating Conjunction |
| SYM | Symbol |
| VERB | Verb |
| X | Other |
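As an illustration of how this tagset might be used, the following minimal sketch checks a sequence of predicted tags against the 17 UPOS tags listed above. The names `UPOS_TAGS` and `find_invalid_upos` are hypothetical helpers introduced here for illustration and are not part of any library.

```python
# The 17 UPOS tags from the table above.
UPOS_TAGS = frozenset({
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
})


def find_invalid_upos(tags):
    """Return the tags, in input order, that are not valid UPOS tags."""
    return [tag for tag in tags if tag not in UPOS_TAGS]


# "CONJ" is not in the table above, so it is reported.
print(find_invalid_upos(["NOUN", "VERB", "CONJ", "PUNCT"]))  # ['CONJ']
```

A check like this can catch output from a tagger that uses a different tagset or an older UD release before the tags are consumed downstream.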

Thai - ORCHID Corpus XPOS

The following definitions were extracted from the original paper, "ORCHID: Thai Part-Of-Speech Tagged Corpus", published in 2009.

| Tag | POS |
| --- | --- |
| NPRP | Proper Noun |
| NLBL | Label Noun |
| NTTL | Title Noun |
| NCMN | Common Noun |
| PPRS | Personal Pronoun |
| PDMN | Demonstrative Pronoun |
| PNTR | Interrogative Pronoun |
| PREL | Relative Pronoun |
| VACT | Active Verb |
| VSTA | Stative Verb |
| VATT | Attributive Verb |
| XVBM | Pre-Verb Auxiliary (Before Negator) |
| XVAM | Pre-Verb Auxiliary (After Negator) |
| XVMM | Pre-Verb Auxiliary (Before/After Negator) |
| XVBB | Pre-Verb Auxiliary (Imperative) |
| XVAE | Post-Verb Auxiliary |
| NCNM | Cardinal Number |
| NONM | Ordinal Number |
| DDAN | Definite Determiner (No Classifier Between) |
| DDAC | Definite Determiner (Classifier Between) |
| DDBQ | Definite Determiner (Before Quantitative Expression) |
| DDAQ | Definite Determiner (After Quantitative Expression) |
| DIAC | Indefinite Determiner (Classifier Between) |
| DIBQ | Indefinite Determiner (Before Quantitative Expression) |
| DIAQ | Indefinite Determiner (After Quantitative Expression) |
| DCNM | Determiner (Cardinal Number) |
| DONM | Determiner (Ordinal Number) |
| ADVN | Adverb |
| ADVI | Adverb (Iterative) |
| ADVP | Adverb (Prefixed) |
| ADVS | Adverb (Sentential) |
| CNIT | Unit Classifier |
| CLTV | Collective Classifier |
| CMTR | Measurement Classifier |
| CVBL | Verbal Classifier |
| JCRG | Coordinating Conjunction |
| JCMP | Comparative Conjunction |
| JSBR | Subordinating Conjunction |
| RPRE | Preposition |
| INT | Interjection |
| FIXN | Nominal Prefix |
| FIXV | Adverbial Prefix |
| EAFF | Ending (Affirmative) |
| EITT | Ending (Interrogative) |
| NEG | Negator |
| PUNC | Punctuation |

Vietnamese XPOS (Underthesea)

The following is the XPOS tagset used by the underthesea package for its POS tagger. The package does not state explicitly which corpus the model was trained on, so we extracted the labels from the model itself.

XPOS tagsets vary from paper to paper, although they share many similarities. None of the papers we consulted uses exactly the tagset of the underthesea model. We therefore synthesized the tag descriptions ourselves by combing through a selection of papers on Vietnamese treebanks.

Some of the papers included are:

  1. Utilizing State-of-the-art Parsers to Diagnose Problems in Treebank Annotation for a Less Resourced Language

  2. From Treebank Conversion to Automatic Dependency Parsing for Vietnamese

| Tag | POS |
| --- | --- |
| N | Noun |
| Np | Proper Noun |
| Nc | Classifier Noun |
| Nu | Unit Noun |
| Ny | Abbreviated Noun |
| P | Pronoun |
| V | Verb |
| Vy | Abbreviated Verb |
| A | Adjective |
| R | Adverb |
| L | Determiner |
| M | Numeral |
| E | Preposition |
| C | Conjunction |
| I | Interjection |
| T | Auxiliary/Modal Words |
| Z | Bound Morphemes |
| FW | Foreign Word |
| X | Unknown |
| CH | Punctuation |
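Because these tags are short and easy to confuse, it can help to attach the descriptions above to tagger output. The sketch below assumes the tagger output is available as a list of `(word, tag)` pairs. The names `VI_XPOS` and `describe`, as well as the sample tokens and their tags, are hypothetical and only illustrate the lookup.

```python
# Tag descriptions copied from the table above.
VI_XPOS = {
    "N": "Noun", "Np": "Proper Noun", "Nc": "Classifier Noun",
    "Nu": "Unit Noun", "Ny": "Abbreviated Noun", "P": "Pronoun",
    "V": "Verb", "Vy": "Abbreviated Verb", "A": "Adjective",
    "R": "Adverb", "L": "Determiner", "M": "Numeral",
    "E": "Preposition", "C": "Conjunction", "I": "Interjection",
    "T": "Auxiliary/Modal Words", "Z": "Bound Morphemes",
    "FW": "Foreign Word", "X": "Unknown", "CH": "Punctuation",
}


def describe(tagged_tokens):
    """Attach a human-readable description to each (word, tag) pair."""
    return [
        (word, tag, VI_XPOS.get(tag, "(not in tagset)"))
        for word, tag in tagged_tokens
    ]


# Hand-written sample input, not real tagger output.
sample = [("Sài Gòn", "Np"), ("đẹp", "A"), (".", "CH")]
for word, tag, description in describe(sample):
    print(f"{word}\t{tag}\t{description}")
```

Tags that do not appear in the table are flagged rather than silently dropped, which makes it easier to spot a label the synthesized tagset does not yet cover.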

Indonesian - ICON Treebank XPOS

The following is the XPOS tagset used in the ICON Constituency Treebank.

| Tag | POS |
| --- | --- |
| NNO | Noun |
| NNP | Proper noun |
| PRN | Pronoun |
| PRK | Clitic pronoun |
| PRR | Relative pronoun |
| PRI | Interrogative pronoun |
| VBI | Intransitive verb |
| VBT | Transitive verb |
| VBP | Passive verb |
| VBL | Linking verb (copula) |
| TAME | Tense, Aspect, Modality, Evidentiality marker |
| CCN | Coordinating conjunction |
| CSN | Subordinating conjunction |
| PPO | Preposition |
| ADJ | Adjective |
| ADV | Adverb |
| NEG | Negation |
| NUM | Numeric value |
| KUA | Quantifier |
| ART | Article |
| PAR | Particle |
| INT | Interjection |
| SYM | Symbol |
| PUN | Punctuation |

Constituency Parsing

Please refer to the Penn Treebank Bracketing Guidelines for more information on the constituent tagsets.

Indonesian - ICON Treebank Constituents

The following is the constituent tagset used in the ICON Constituency Treebank.

| Tag | Definition |
| --- | --- |
| S | Main clause + Complete clause with final intonation |
| SINV | Inverted clause |
| CP | All types of complementizer phrases and clauses |
| RPN | Relative clause |
| SBARQ | Complete interrogative clause |
| SQ | Yes-or-no question |
| ADJP | Adjectival phrase |
| WHADJP | Adjectival phrase consisting of wh-premodifier and head is an adjective |
| ADVP | Adverbial phrase |
| WHADVP | Wh-adverbial phrase |
| NP | Noun phrase |
| WHNP | Wh-noun phrase |
| PP | Prepositional phrase |
| WHPP | Wh-prepositional phrase |
| VP | Verb phrase |
| CONJP | Conjunction spanning more than a single word |
| UCP | Unlike coordinated phrase |
| QP | Quantifier phrase |
| PNT | Parenthetical |
| INTJ | Interjection |
| FRAG | Fragmented sentence |
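To show how the constituent tagset might be applied, the sketch below pulls phrase-level labels out of a Penn-style bracketed tree and reports any label that is not in the table above. It assumes the trees are stored in bracketed form with one word per preterminal, which is an assumption about the data format rather than something stated here. The helper names and the example sentence are hypothetical.

```python
import re

# Constituent labels from the table above.
ICON_CONSTITUENTS = frozenset({
    "S", "SINV", "CP", "RPN", "SBARQ", "SQ", "ADJP", "WHADJP", "ADVP",
    "WHADVP", "NP", "WHNP", "PP", "WHPP", "VP", "CONJP", "UCP", "QP",
    "PNT", "INTJ", "FRAG",
})

# A phrase-level label follows "(" and is itself followed by another "(".
# Preterminals such as "(NNO buku)" are followed by a word and are skipped.
_PHRASE_LABEL = re.compile(r"\(\s*([^\s()]+)\s*(?=\()")


def constituent_labels(tree):
    """Return phrase-level labels in left-to-right order."""
    return _PHRASE_LABEL.findall(tree)


def unknown_constituents(tree):
    """Return the sorted labels that are not in the constituent tagset."""
    return sorted(set(constituent_labels(tree)) - ICON_CONSTITUENTS)


# Hand-written example tree, not taken from the treebank.
tree = "(S (NP (PRN Dia)) (VP (VBT membaca) (NP (NNO buku))) (PUN .))"
print(constituent_labels(tree))    # ['S', 'NP', 'VP', 'NP']
print(unknown_constituents(tree))  # []
```

The regular expression only separates phrase labels from POS tags. It does not validate bracket balance, so a full tree reader would still be needed for anything beyond a quick label audit.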

Dependency Parsing

Please refer to the Universal Dependencies website for more details on dependency relation tags.
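Universal Dependencies treebanks are distributed in the CoNLL-U format, where each token line has ten tab-separated columns and the dependency relation sits in the eighth (DEPREL). As a rough sketch of how the relation tags in a file could be inspected, the function below counts them. The name `deprel_counts` and the sample sentence are hypothetical and hand-written for illustration.

```python
from collections import Counter


def deprel_counts(conllu_text):
    """Count DEPREL values (column 8) over the token lines of CoNLL-U text."""
    counts = Counter()
    for line in conllu_text.splitlines():
        # Skip blank sentence separators and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue
        # Skip multiword token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if not cols[0].isdigit():
            continue
        counts[cols[7]] += 1
    return counts


# Hand-written sample; columns are ID, FORM, LEMMA, UPOS, XPOS, FEATS,
# HEAD, DEPREL, DEPS, MISC.
sample = "\n".join([
    "# text = Dogs bark.",
    "1\tDogs\tdog\tNOUN\tNNS\t_\t2\tnsubj\t_\t_",
    "2\tbark\tbark\tVERB\tVBP\t_\t0\troot\t_\tSpaceAfter=No",
    "3\t.\t.\tPUNCT\t.\t_\t2\tpunct\t_\t_",
])
print(deprel_counts(sample))
```

The same loop could read columns 4 and 5 instead to collect the UPOS and XPOS tags described in the part-of-speech sections.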