####################
Datasets for CoreNLP
####################

This section lists the datasets available for CoreNLP tasks in ASEAN languages.
The datasets are grouped by task, with links to the relevant repositories where
available.

**********************
Part-of-Speech Tagging
**********************

.. csv-table::
   :file: tables/pos-datasets.csv
   :header-rows: 1

************************
Named Entity Recognition
************************

.. csv-table::
   :file: tables/ner-datasets.csv
   :header-rows: 1

********************
Constituency Parsing
********************

.. csv-table::
   :file: tables/cp-datasets.csv
   :header-rows: 1

.. note::
   We are not aware of any Thai constituency treebanks.
   Because Thai is more amenable to analysis via dependency grammar,
   only dependency treebanks are available at the moment. If shallow parsing
   (chunking) is of interest, annotations for it are available in many of the
   open-source Thai datasets (e.g. LST20, ThaiNER).

******************
Dependency Parsing
******************

.. csv-table::
   :file: tables/dp-datasets.csv
   :header-rows: 1

**********************
Coreference Resolution
**********************

.. csv-table::
   :file: tables/coref-datasets.csv
   :header-rows: 1

.. note::
   Abbreviations used in the table: D = document, P = paragraph, S = sentence.