Resources
This section lists the literature we consulted in our work.
Qi, P., Zhang, Y., Zhang, Y., Bolton, J., & Manning, C.D. (2020). Stanza: A Python Natural Language Processing Toolkit for Many Human Languages. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, July 2020 (pp. 101-108).
Nguyen, M.V., Lai, V.D., Veyseh, A.P.B., & Nguyen, T.H. (2021). Trankit: A Light-Weight Transformer-based Toolkit for Multilingual Natural Language Processing. Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations, April 2021 (pp. 80-90).
Vu, T., Nguyen, D.Q., Nguyen, D.Q., Dras, M., & Johnson, M. (2018). VnCoreNLP: A Vietnamese Natural Language Processing Toolkit. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations, June 2018 (pp. 56-60).
Chormai, P., Prasertsom, P., & Rutherford, A.T. (2019). AttaCut: A Fast and Accurate Neural Thai Word Segmenter. ArXiv, abs/1911.07056.
Maung, Z.M., & Mikami, Y. (2008). A Rule-based Syllable Segmentation of Myanmar Text. Proceedings of the IJCNLP-08 Workshop on NLP for Less Privileged Languages, January 2008 (pp. 51-58).
Kitaev, N., Cao, S., & Klein, D. (2019). Multilingual Constituency Parsing with Self-Attention and Pre-Training. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, July 2019 (pp. 3499-3505).
Joshi, V., Peters, M., & Hopkins, M. (2018). Extending a Parser to Distant Domains Using a Few Dozen Partially Annotated Examples. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), July 2018 (pp. 1190-1199).
Dozat, T., & Manning, C.D. (2017). Deep Biaffine Attention for Neural Dependency Parsing. ArXiv, abs/1611.01734.
Straka, M., Straková, J., & Hajič, J. (2019). UDPipe at SIGMORPHON 2019: Contextualized Embeddings, Regularization with Morphological Categories, Corpora Merging. Proceedings of the 16th Workshop on Computational Research in Phonetics, Phonology, and Morphology, August 2019 (pp. 95-103).
Conneau, A., Khandelwal, K., Goyal, N., Chaudhary, V., Wenzek, G., Guzmán, F., Grave, E., Ott, M., Zettlemoyer, L., & Stoyanov, V. (2020). Unsupervised Cross-lingual Representation Learning at Scale. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, July 2020 (pp. 8440-8451).
Wilie, B., Vincentio, K., Winata, G.I., Cahyawijaya, S., Li, X., Lim, Z.Y., Soleman, S., Mahendra, R., Fung, P., Bahar, S., & Purwarianti, A. (2020). IndoNLU: Benchmark and Resources for Evaluating Indonesian Natural Language Understanding. Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, December 2020 (pp. 843-857).
Lowphansirikul, L., Polpanumas, C., Jantrakulchai, N., & Nutanong, S. (2021). WangchanBERTa: Pretraining transformer-based Thai Language Models. ArXiv, abs/2101.09635.
Nguyen, D.Q., & Nguyen, A.T. (2020). PhoBERT: Pre-trained language models for Vietnamese. Findings of the Association for Computational Linguistics: EMNLP 2020, November 2020 (pp. 1037-1042).
Arwidarasti, J.N., Alfina, I., & Krisnadhi, A.A. (2019). Converting an Indonesian Constituency Treebank to the Penn Treebank Format. 2019 International Conference on Asian Language Processing (IALP), November 2019 (pp. 331-336).
Thu, Y.K., Pa, W.P., Utiyama, M., Finch, A., & Sumita, E. (2016). Introducing the Asian Language Treebank (ALT). Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), May 2016 (pp. 1574-1578).
Nguyen, Q., Nguyen, N., & Miyao, Y. (2013). Utilizing State-of-the-art Parsers to Diagnose Problems in Treebank Annotation for a Less Resourced Language. Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, August 2013 (pp. 19-27).
Nguyen, D.Q., Nguyen, D.Q., Pham, S.B., Nguyen, P.T., & Nguyen, M.L. (2014). From Treebank Conversion to Automatic Dependency Parsing for Vietnamese. Proceedings of the 19th International Conference on Application of Natural Language to Information Systems, June 2014 (pp. 196-207).
Nguyen, P.T., Vu, X.L., Nguyen, T.M.H., Nguyen, V.H., & Le, H.P. (2009). Building a Large Syntactically-Annotated Corpus of Vietnamese. Proceedings of the Third Linguistic Annotation Workshop (LAW III), August 2009 (pp. 182-185).
Jwalapuram, P., Joty, S., & Shen, S. (2020). Pronoun-Targeted Fine-tuning for NMT with Hybrid Losses. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, November 2020 (pp. 2267-2279).
Mohiuddin, T., Bari, M.S., & Joty, S. (2020). LNMap: Departures from Isomorphic Assumption in Bilingual Lexicon Induction Through Non-Linear Mapping in Latent Space. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, November 2020 (pp. 2712-2723).