####################
Reference Literature
####################

This section lists some of the literature we consulted in our work.

*****************
CoreNLP Pipelines
*****************

1. `Qi, P., Zhang, Y., Zhang, Y., Bolton, J., & Manning, C.D. (2020). Stanza: A Python Natural Language Processing Toolkit for Many Human Languages. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, July 2020 (pp. 101-108). `_

2. `Nguyen, M.V., Lai, V.D., Veyseh, A.P.B., & Nguyen, T.H. (2021). Trankit: A Light-Weight Transformer-based Toolkit for Multilingual Natural Language Processing. Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations, April 2021 (pp. 80-90). `_

3. `Vu, T., Nguyen, D.Q., Nguyen, D.Q., Dras, M., & Johnson, M. (2018). VnCoreNLP: A Vietnamese Natural Language Processing Toolkit. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations, June 2018 (pp. 56-60). `_

*******************
Model Architectures
*******************

Segmentation
============

1. `Chormai, P., Prasertsom, P., & Rutherford, A.T. (2019). AttaCut: A Fast and Accurate Neural Thai Word Segmenter. ArXiv, abs/1911.07056. `_

2. `Maung, Z.M., & Mikami, Y. (2008). A Rule-based Syllable Segmentation of Myanmar Text. Proceedings of the IJCNLP-08 Workshop on NLP for Less Privileged Languages, January 2008 (pp. 51-58). `_

Constituency Parsing
====================

1. `Kitaev, N., Cao, S., & Klein, D. (2019). Multilingual Constituency Parsing with Self-Attention and Pre-Training. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, July 2019 (pp. 3499-3505). `_

2. `Joshi, V., Peters, M., & Hopkins, M. (2018). Extending a Parser to Distant Domains Using a Few Dozen Partially Annotated Examples. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), July 2018 (pp. 1190-1199). `_

Dependency Parsing
==================

1. `Dozat, T., & Manning, C.D. (2017). Deep Biaffine Attention for Neural Dependency Parsing. ArXiv, abs/1611.01734. `_

2. `Straka, M., Straková, J., & Hajič, J. (2019). UDPipe at SIGMORPHON 2019: Contextualized Embeddings, Regularization with Morphological Categories, Corpora Merging. Proceedings of the 16th Workshop on Computational Research in Phonetics, Phonology, and Morphology, August 2019 (pp. 95-103). `_

******************
Pre-trained Models
******************

1. `Conneau, A., Khandelwal, K., Goyal, N., Chaudhary, V., Wenzek, G., Guzmán, F., Grave, E., Ott, M., Zettlemoyer, L., & Stoyanov, V. (2020). Unsupervised Cross-lingual Representation Learning at Scale. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, July 2020 (pp. 8440-8451). `_

2. `Wilie, B., Vincentio, K., Winata, G.I., Cahyawijaya, S., Li, X., Lim, Z.Y., Soleman, S., Mahendra, R., Fung, P., Bahar, S., & Purwarianti, A. (2020). IndoNLU: Benchmark and Resources for Evaluating Indonesian Natural Language Understanding. Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, December 2020 (pp. 843-857). `_

3. `Lowphansirikul, L., Polpanumas, C., Jantrakulchai, N., & Nutanong, S. (2021). WangchanBERTa: Pretraining transformer-based Thai Language Models. ArXiv, abs/2101.09635. `_

4. `Nguyen, D.Q., & Nguyen, A.T. (2020). PhoBERT: Pre-trained language models for Vietnamese. Findings of the Association for Computational Linguistics: EMNLP 2020, November 2020 (pp. 1037-1042). `_

******************
Datasets / Tagsets
******************

1. `Arwidarasti, J.N., Alfina, I., & Krisnadhi, A.A. (2019). Converting an Indonesian Constituency Treebank to the Penn Treebank Format. 2019 International Conference on Asian Language Processing (IALP), November 2019 (pp. 331-336). `_

2. `Thu, Y.K., Pa, W.P., Utiyama, M., Finch, A., & Sumita, E. (2016). Introducing the Asian Language Treebank (ALT). Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), May 2016 (pp. 1574-1578). `_

3. `Nguyen, Q., Nguyen, N., & Miyao, Y. (2013). Utilizing State-of-the-art Parsers to Diagnose Problems in Treebank Annotation for a Less Resourced Language. Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, August 2013 (pp. 19-27). `_

4. `Nguyen, D.Q., Nguyen, D.Q., Pham, S.B., Nguyen, P.T., & Nguyen, M.L. (2014). From Treebank Conversion to Automatic Dependency Parsing for Vietnamese. Proceedings of the 19th International Conference on Application of Natural Language to Information Systems, June 2014 (pp. 196-207). `_

5. `Nguyen, P.T., Vu, X.L., Nguyen, T.M.H., Nguyen, V.H., & Le, H.P. (2009). Building a Large Syntactically-Annotated Corpus of Vietnamese. Proceedings of the Third Linguistic Annotation Workshop (LAW III), August 2009 (pp. 182-185). `_

*******
General
*******

1. `Jwalapuram, P., Joty, S., & Shen, S. (2020). Pronoun-Targeted Fine-tuning for NMT with Hybrid Losses. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, November 2020 (pp. 2267-2279). `_

2. `Mohiuddin, T., Bari, M.S., & Joty, S. (2020). LNMap: Departures from Isomorphic Assumption in Bilingual Lexicon Induction Through Non-Linear Mapping in Latent Space. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, November 2020 (pp. 2712-2723). `_