##############
Corpus Tagsets
##############

This section documents the tagsets used in various corpora. They are grouped by task, with links to relevant sources where applicable.

**********************
Part-of-speech Tagging
**********************

Universal Dependencies UPOS
===========================

Further details on the definition of these POS tags can be found on the `Universal Dependencies website `_.

.. csv-table::
   :file: tables/ud-pos.csv
   :header-rows: 1

Thai - ORCHID Corpus XPOS
=========================

The following definitions were extracted from the original paper `ORCHID: Thai Part-Of-Speech Tagged Corpus `_ published in 1997.

.. csv-table::
   :file: tables/orchid-xpos.csv
   :header-rows: 1

Vietnamese XPOS (Underthesea)
=============================

The following is the XPOS tagset used by the ``underthesea`` package for its POS tagger. Although the corpus the model was trained on is not stated explicitly, we were able to extract the labels from the model itself.

XPOS tagsets vary from paper to paper, although they share many similarities. None of the papers we consulted uses the exact tagset of the ``underthesea`` model, so we synthesized the tagset information ourselves by combing through a selection of papers on Vietnamese treebanks. These include:

1. `Utilizing State-of-the-art Parsers to Diagnose Problems in Treebank Annotation for a Less Resourced Language `_
2. `From Treebank Conversion to Automatic Dependency Parsing for Vietnamese `_

.. csv-table::
   :file: tables/vi-xpos.csv
   :header-rows: 1

Indonesian - ICON Treebank XPOS
===============================

The following is the XPOS tagset used in the ICON Constituency Treebank.

.. csv-table::
   :file: tables/icon-xpos.csv
   :header-rows: 1

********************
Constituency Parsing
********************

Please refer to the `Penn Treebank Bracketing Guidelines `_ for more information on the constituent tagsets.

Indonesian - ICON Treebank Constituents
=======================================

The following is the constituent tagset used in the ICON Constituency Treebank.

.. csv-table::
   :file: tables/icon-cons.csv
   :header-rows: 1

******************
Dependency Parsing
******************

Please refer to the `Universal Dependencies website `_ for more details on dependency relation tags.