####################
Packages for CoreNLP
####################

In this section, we have consolidated packages (mostly Python) that are useful for core NLP tasks in Southeast Asian languages.

************************************
|:earth_asia:| Multilingual Packages
************************************

.. csv-table::
   :file: tables/multilingual-packages.csv
   :header-rows: 1

**Legend**

id: Indonesian | ms: Malay | ta: Tamil | th: Thai | vi: Vietnamese

.. note::

   Trankit and Stanza are both trained on the Universal Dependencies v2.5 datasets and therefore cover the same languages and tasks. Trankit reports better overall performance than Stanza; see each project's website for per-model results.

   Polyglot does not cover every task for every language shown above. Check its documentation to see which languages are supported for each task.

********************
Monolingual Packages
********************

.. csv-table::
   :file: tables/monolingual-packages.csv
   :header-rows: 1

**Legend**

| tk: Tokenization | ss: Sentence Segmentation | pos: Part-of-speech Tagging
| ner: Named Entity Recognition | cp: Constituency Parsing | dp: Dependency Parsing