####################
Reference Literature
####################

This section lists some of the literature we consulted in our work.

*****************
CoreNLP Pipelines
*****************

1. `Qi, P., Zhang, Y., Zhang, Y., Bolton, J., & Manning, C.D. (2020). Stanza: A Python Natural Language Processing Toolkit for Many Human Languages. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, July 2020 (pp. 101-108). <https://aclanthology.org/2020.acl-demos.14.pdf>`_
2. `Nguyen, M.V., Lai, V.D., Veyseh, A.P.B., & Nguyen, T.H. (2021). Trankit: A Light-Weight Transformer-based Toolkit for Multilingual Natural Language Processing. Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations, April 2021 (pp. 80-90). <https://aclanthology.org/2021.eacl-demos.10/>`_
3. `Vu, T., Nguyen, D.Q., Nguyen, D.Q., Dras, M., & Johnson, M. (2018). VnCoreNLP: A Vietnamese Natural Language Processing Toolkit. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations, June 2018 (pp. 56-60). <https://aclanthology.org/N18-5012.pdf>`_

*******************
Model Architectures
*******************

Segmentation
============

1. `Chormai, P., Prasertsom, P., & Rutherford, A.T. (2019). AttaCut: A Fast and Accurate Neural Thai Word Segmenter. ArXiv, abs/1911.07056. <https://arxiv.org/pdf/1911.07056.pdf>`_
2. `Maung, Z.M., & Mikami, Y. (2008). A Rule-based Syllable Segmentation of Myanmar Text. Proceedings of the IJCNLP-08 Workshop on NLP for Less Privileged Languages, January 2008 (pp. 51-58). <https://aclanthology.org/I08-3010.pdf>`_

Constituency Parsing
====================

1. `Kitaev, N., Cao, S., & Klein, D. (2019). Multilingual Constituency Parsing with Self-Attention and Pre-Training. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, July 2019 (pp. 3499-3505). <https://aclanthology.org/P19-1340.pdf>`_
2. `Joshi, V., Peters, M., & Hopkins, M. (2018). Extending a Parser to Distant Domains Using a Few Dozen Partially Annotated Examples. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), July 2018 (pp. 1190-1199). <https://aclanthology.org/P18-1110.pdf>`_

Dependency Parsing
==================

1. `Dozat, T., & Manning, C.D. (2017). Deep Biaffine Attention for Neural Dependency Parsing. ArXiv, abs/1611.01734. <https://arxiv.org/pdf/1611.01734.pdf>`_
2. `Straka, M., Straková, J., & Hajič, J. (2019). UDPipe at SIGMORPHON 2019: Contextualized Embeddings, Regularization with Morphological Categories, Corpora Merging. Proceedings of the 16th Workshop on Computational Research in Phonetics, Phonology, and Morphology, August 2019 (pp. 95-103). <https://aclanthology.org/W19-4212/>`_

******************
Pre-trained Models
******************

1. `Conneau, A., Khandelwal, K., Goyal, N., Chaudhary, V., Wenzek, G., Guzmán, F., Grave, E., Ott, M., Zettlemoyer, L., & Stoyanov, V. (2020). Unsupervised Cross-lingual Representation Learning at Scale. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, July 2020 (pp. 8440-8451). <https://aclanthology.org/2020.acl-main.747.pdf>`_
2. `Wilie, B., Vincentio, K., Winata, G.I., Cahyawijaya, S., Li, X., Lim, Z.Y., Soleman, S., Mahendra, R., Fung, P., Bahar, S., & Purwarianti, A. (2020). IndoNLU: Benchmark and Resources for Evaluating Indonesian Natural Language Understanding. Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, December 2020 (pp. 843-857). <https://aclanthology.org/2020.aacl-main.85.pdf>`_
3. `Lowphansirikul, L., Polpanumas, C., Jantrakulchai, N., & Nutanong, S. (2021). WangchanBERTa: Pretraining transformer-based Thai Language Models. ArXiv, abs/2101.09635. <https://arxiv.org/pdf/2101.09635.pdf>`_
4. `Nguyen, D.Q., & Nguyen, A.T. (2020). PhoBERT: Pre-trained language models for Vietnamese. Findings of the Association for Computational Linguistics: EMNLP 2020, November 2020 (pp. 1037-1042). <https://aclanthology.org/2020.findings-emnlp.92.pdf>`_

******************
Datasets / Tagsets
******************

1. `Arwidarasti, J.N., Alfina, I., & Krisnadhi, A.A. (2019). Converting an Indonesian Constituency Treebank to the Penn Treebank Format. 2019 International Conference on Asian Language Processing (IALP), November 2019 (pp. 331-336). <https://ieeexplore.ieee.org/document/9037723>`_
2. `Thu, Y.K., Pa, W.P., Utiyama, M., Finch, A., & Sumita, E. (2016). Introducing the Asian Language Treebank (ALT). Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), May 2016 (pp. 1574-1578). <https://aclanthology.org/L16-1249.pdf>`_
3. `Nguyen, Q., Nguyen, N., & Miyao, Y. (2013). Utilizing State-of-the-art Parsers to Diagnose Problems in Treebank Annotation for a Less Resourced Language. Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, August 2013 (pp. 19-27). <https://aclanthology.org/W13-2303.pdf>`_
4. `Nguyen, D.Q., Nguyen, D.Q., Pham, S.B., Nguyen, P.T., & Nguyen, M.L. (2014). From Treebank Conversion to Automatic Dependency Parsing for Vietnamese. Proceedings of the 19th International Conference on Applications of Natural Language to Information Systems, June 2014 (pp. 196-207). <https://www.researchgate.net/publication/279916415_From_Treebank_Conversion_to_Automatic_Dependency_Parsing_for_Vietnamese>`_
5. `Nguyen, P.T., Vu, X.L., Nguyen, T.M.H., Nguyen, V.H., & Le, H.P. (2009). Building a Large Syntactically-Annotated Corpus of Vietnamese. Proceedings of the Third Linguistic Annotation Workshop (LAW III), August 2009 (pp. 182-185). <https://aclanthology.org/W09-3035.pdf>`_

*******
General
*******

1. `Jwalapuram, P., Joty, S., & Shen, S. (2020). Pronoun-Targeted Fine-tuning for NMT with Hybrid Losses. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, November 2020 (pp. 2267-2279). <https://aclanthology.org/2020.emnlp-main.177.pdf>`_
2. `Mohiuddin, T., Bari, M.S., & Joty, S. (2020). LNMap: Departures from Isomorphic Assumption in Bilingual Lexicon Induction Through Non-Linear Mapping in Latent Space. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, November 2020 (pp. 2712-2723). <https://aclanthology.org/2020.emnlp-main.215.pdf>`_