Corpus Tagsets
This section documents the tagsets used in various corpora. They are grouped by task, with links to the relevant sources where applicable.
Part-of-speech Tagging
Universal Dependencies UPOS
Further details on the definition of these POS tags can be found on the Universal Dependencies website.
Tag | POS |
---|---|
ADJ | Adjective |
ADP | Adposition |
ADV | Adverb |
AUX | Auxiliary |
CCONJ | Coordinating Conjunction |
DET | Determiner |
INTJ | Interjection |
NOUN | Noun |
NUM | Numeral |
PART | Particle |
PRON | Pronoun |
PROPN | Proper Noun |
PUNCT | Punctuation |
SCONJ | Subordinating Conjunction |
SYM | Symbol |
VERB | Verb |
X | Other |
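When consuming tagger output, it can help to keep the UPOS inventory in code so that tags can be expanded to readable names and unexpected labels can be caught early. The following is a minimal sketch, assuming plain Python with no NLP library; the names `UPOS`, `describe_upos`, and `unknown_tags` are hypothetical helpers introduced only for illustration, and the tag names are copied from the table above.

```python
# UPOS tags and their names, copied from the table above.
UPOS = {
    "ADJ": "Adjective",
    "ADP": "Adposition",
    "ADV": "Adverb",
    "AUX": "Auxiliary",
    "CCONJ": "Coordinating Conjunction",
    "DET": "Determiner",
    "INTJ": "Interjection",
    "NOUN": "Noun",
    "NUM": "Numeral",
    "PART": "Particle",
    "PRON": "Pronoun",
    "PROPN": "Proper Noun",
    "PUNCT": "Punctuation",
    "SCONJ": "Subordinating Conjunction",
    "SYM": "Symbol",
    "VERB": "Verb",
    "X": "Other",
}


def describe_upos(tag):
    """Return the readable name of a UPOS tag (hypothetical helper)."""
    try:
        return UPOS[tag]
    except KeyError:
        raise ValueError(f"not a UPOS tag: {tag!r}") from None


def unknown_tags(tags):
    """Return tags that are not in the UPOS inventory, in first-seen order."""
    seen = []
    for tag in tags:
        if tag not in UPOS and tag not in seen:
            seen.append(tag)
    return seen


print(describe_upos("CCONJ"))
# A language-specific XPOS tag such as NPRP is not a UPOS tag and is reported.
print(unknown_tags(["NOUN", "VERB", "NPRP", "PUNCT"]))
```

A check like `unknown_tags` is mostly useful for spotting cases where language-specific XPOS labels have been mixed into a column that is expected to hold UPOS tags.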
Thai - ORCHID Corpus XPOS
The following definitions were extracted from the original paper ORCHID: Thai Part-Of-Speech Tagged Corpus published in 2009.
Tag | POS |
---|---|
NPRP | Proper Noun |
NLBL | Label Noun |
NTTL | Title Noun |
NCMN | Common Noun |
PPRS | Personal Pronoun |
PDMN | Demonstrative Pronoun |
PNTR | Interrogative Pronoun |
PREL | Relative Pronoun |
VACT | Active Verb |
VSTA | Stative Verb |
VATT | Attributive Verb |
XVBM | Pre-Verb Auxiliary (Before Negator) |
XVAM | Pre-Verb Auxiliary (After Negator) |
XVMM | Pre-Verb Auxiliary (Before/After Negator) |
XVBB | Pre-Verb Auxiliary (Imperative) |
XVAE | Post-Verb Auxiliary |
NCNM | Cardinal Number |
NONM | Ordinal Number |
DDAN | Definite Determiner (No Classifier Between) |
DDAC | Definite Determiner (Classifier Between) |
DDBQ | Definite Determiner (Before Quantitative Expression) |
DDAQ | Definite Determiner (After Quantitative Expression) |
DIAC | Indefinite Determiner (Classifier Between) |
DIBQ | Indefinite Determiner (Before Quantitative Expression) |
DCNM | Determiner (Cardinal Number) |
DONM | Determiner (Ordinal Number) |
ADVN | Adverb |
ADVI | Adverb (Iterative) |
ADVP | Adverb (Prefixed) |
ADVS | Adverb (Sentential) |
CNIT | Unit Classifier |
CLTV | Collective Classifier |
CMTR | Measurement Classifier |
CVBL | Verbal Classifier |
JCRG | Coordinating Conjunction |
JCMP | Comparative Conjunction |
JSBR | Subordinating Conjunction |
RPRE | Preposition |
INT | Interjection |
FIXN | Nominal Prefix |
FIXV | Adverbial Prefix |
EAFF | Ending (Affirmative) |
EITT | Ending (Interrogative) |
NEG | Negator |
PUNC | Punctuation |
Vietnamese XPOS (Underthesea)
The following is the XPOS tagset used by the POS tagger in the underthesea package. The corpus their model was trained on is not stated explicitly, so we extracted the labels directly from the model.
Vietnamese XPOS tagsets vary from paper to paper, although they share many similarities. None of the papers we consulted lists the exact tagset used by the underthesea model. We therefore synthesized the tagset information ourselves by combing through a selection of papers on Vietnamese treebanks.
Some of the papers included are:
Tag | POS |
---|---|
N | Noun |
Np | Proper Noun |
Nc | Classifier Noun |
Nu | Unit Noun |
Ny | Abbreviated Noun |
P | Pronoun |
V | Verb |
Vy | Abbreviated Verb |
A | Adjective |
R | Adverb |
L | Determiner |
M | Numeral |
E | Preposition |
C | Conjunction |
I | Interjection |
T | Auxiliary/Modal Words |
Z | Bound Morphemes |
FW | Foreign Word |
X | Unknown |
CH | Punctuation |
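Because this tagset was reconstructed rather than taken from a single published source, it is worth checking tagger output against it. The sketch below is one way to do that, assuming the output is available as lists of `(token, tag)` pairs. The sample sentences are hand-written for illustration and are not real model output; `Nb` stands in for a hypothetical label that is absent from the table, and the names `VI_XPOS` and `label_inventory` are likewise introduced only for this example.

```python
from collections import Counter

# Vietnamese XPOS tags and names, copied from the table above.
VI_XPOS = {
    "N": "Noun", "Np": "Proper Noun", "Nc": "Classifier Noun",
    "Nu": "Unit Noun", "Ny": "Abbreviated Noun", "P": "Pronoun",
    "V": "Verb", "Vy": "Abbreviated Verb", "A": "Adjective",
    "R": "Adverb", "L": "Determiner", "M": "Numeral",
    "E": "Preposition", "C": "Conjunction", "I": "Interjection",
    "T": "Auxiliary/Modal Words", "Z": "Bound Morphemes",
    "FW": "Foreign Word", "X": "Unknown", "CH": "Punctuation",
}

# Hand-written illustrative sentences; NOT real tagger output.
tagged_sentences = [
    [("Tôi", "P"), ("đọc", "V"), ("sách", "N"), ("ở", "E"),
     ("Hà Nội", "Np"), (".", "CH")],
    # "Nb" is a hypothetical tag used only to demonstrate the check.
    [("Tôi", "P"), ("thích", "V"), ("cà phê", "Nb"), (".", "CH")],
]


def label_inventory(sentences):
    """Count how often each tag occurs across all tagged sentences."""
    return Counter(tag for sentence in sentences for _, tag in sentence)


counts = label_inventory(tagged_sentences)
# Tags that occur in the data but are not documented in the table.
undocumented = sorted(set(counts) - set(VI_XPOS))

print(counts.most_common(3))
print(undocumented)
```

Running the same count over a larger sample of real output would show which of the documented tags a model actually emits and whether it emits any that the table does not cover.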
Indonesian - ICON Treebank XPOS
The following is the XPOS tagset used in the ICON Constituency Treebank.
Tag | POS |
---|---|
NNO | Noun |
NNP | Proper noun |
PRN | Pronoun |
PRK | Clitic pronoun |
PRR | Relative pronoun |
PRI | Interrogative pronoun |
VBI | Intransitive verb |
VBT | Transitive verb |
VBP | Passive verb |
VBL | Linking verb (copula) |
TAME | Tense, Aspect, Modality, Evidentiality marker |
CCN | Coordinating conjunction |
CSN | Subordinating conjunction |
PPO | Preposition |
ADJ | Adjective |
ADV | Adverb |
NEG | Negation |
NUM | Numeric value |
KUA | Quantifier |
ART | Article |
PAR | Particle |
INT | Interjection |
SYM | Symbol |
PUN | Punctuation |
Constituency Parsing
Please refer to the Penn Treebank Bracketing Guidelines for more information on the constituent tagsets.
Indonesian - ICON Treebank Constituents
The following is the constituent tagset used in the ICON Constituency Treebank.
Tag | Definition |
---|---|
S | Main clause + Complete clause with final intonation |
SINV | Inverted clause |
CP | All types of complementizer phrases and clauses |
RPN | Relative clause |
SBARQ | Complete interrogative clause |
SQ | Yes-or-no question |
ADJP | Adjectival phrase |
WHADJP | Adjectival phrase consisting of wh-premodifier and head is an adjective |
ADVP | Adverbial phrase |
WHADVP | Wh-adverbial phrase |
NP | Noun phrase |
WHNP | Wh-noun phrase |
PP | Prepositional phrase |
WHPP | Wh-prepositional phrase |
VP | Verb phrase |
CONJP | Conjunction spanning more than a single word |
UCP | Unlike coordinated phrase |
QP | Quantifier phrase |
PNT | Parenthetical |
INTJ | Interjection |
FRAG | Fragmented sentence |
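To see how the constituent labels relate to the XPOS tags, the sketch below reads a Penn-style bracketed tree and separates phrase-level labels from preterminal POS tags. It assumes the parse is serialized in Penn Treebank bracket notation, which may not match every tool's output format. The example tree is hand-written for illustration and is not taken from the treebank, and `parse` and `phrase_labels` are hypothetical helper names.

```python
# Constituent labels, copied from the table above.
ICON_CONSTITUENTS = {
    "S", "SINV", "CP", "RPN", "SBARQ", "SQ", "ADJP", "WHADJP", "ADVP",
    "WHADVP", "NP", "WHNP", "PP", "WHPP", "VP", "CONJP", "UCP", "QP",
    "PNT", "INTJ", "FRAG",
}


def parse(text):
    """Parse one bracketed tree into nested (label, children) tuples.

    Leaves (words) are plain strings. Assumes Penn-style brackets.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != "(":
            raise ValueError("expected '('")
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while pos < len(tokens) and tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        if pos >= len(tokens):
            raise ValueError("missing ')'")
        pos += 1  # consume ")"
        return (label, children)

    tree = node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the tree")
    return tree


def phrase_labels(tree):
    """Yield labels of nodes that dominate a subtree (not preterminals)."""
    label, children = tree
    subtrees = [child for child in children if isinstance(child, tuple)]
    if subtrees:
        yield label
        for child in subtrees:
            yield from phrase_labels(child)


# Hand-written illustrative tree ("saya membaca buku"), not treebank data.
tree = parse("(S (NP (PRN saya)) (VP (VBT membaca) (NP (NNO buku))) (PUN .))")
labels = list(phrase_labels(tree))
print(labels)
# Every phrase-level label should come from the constituent tagset.
print(set(labels) <= ICON_CONSTITUENTS)
```

Preterminals such as `PRN`, `VBT`, `NNO`, and `PUN` are skipped because they sit directly above a word. They belong to the XPOS tagset rather than the constituent tagset.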
Dependency Parsing
Please refer to the Universal Dependencies website for more details on dependency relation tags.