What is SEACoreNLP?
SEACoreNLP is an initiative by NLPHub of AI Singapore that aims to provide a one-stop solution for Natural Language Processing (NLP) in Southeast Asia.
The raison d’être of SEACoreNLP lies in the fact that many of the languages used in Southeast Asia do not have adequate NLP resources, be it open-source datasets, models or tools. Industry demand for such capabilities is growing, yet there is currently little supply to meet it. SEACoreNLP hopes to spearhead projects and bring together like-minded entities across the region to build a livelier NLP ecosystem for Southeast Asia.
As the name suggests, SEACoreNLP focuses on “core” NLP tasks, such as part-of-speech tagging, syntactic parsing and semantic role labeling, as opposed to higher-level tasks such as machine translation or question answering. We believe that features engineered through such core tasks will be key to boosting the performance of downstream models for higher-level tasks, because the languages of the region are low-resource languages and cannot (as of now) rely on training huge language models on large amounts of data.
We hope to accomplish the following:
Provide an open-source Python library for core NLP tasks in the official ASEAN languages
Provide a one-stop information hub for progress in NLP in Southeast Asia
Build high-quality benchmark datasets for core NLP tasks in the relevant languages
Improve NLP capabilities for regional languages with core NLP, state-of-the-art models and multilingual pre-trained models
Core NLP Tasks
The core NLP tasks that we aim to cover are as follows:
Named Entity Recognition
Semantic Role Labeling
We have a demo that showcases the aforementioned core NLP tasks. Click here to check out the demo.
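To make the output of a core task such as Named Entity Recognition more concrete, the sketch below decodes a sequence of per-token tags into labeled entity spans. It assumes the common BIO tagging scheme (`B-` begins an entity, `I-` continues it, `O` is outside any entity); the function name `bio_to_spans`, the example sentence and its labels are hypothetical and are not taken from the SEACoreNLP library, whose actual output format may differ.

```python
from typing import List, Optional, Tuple

# (label, start index, end index exclusive); an illustrative type, not a SEACoreNLP type
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Convert BIO tags (an assumed scheme) into labeled token spans."""
    spans: List[Span] = []
    label: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags):
        # The current span continues only on an "I-" tag with the same label.
        continues = tag.startswith("I-") and label is not None and tag[2:] == label
        if continues:
            continue
        # Otherwise close the open span, if there is one.
        if label is not None:
            spans.append((label, start, i))
            label = None
        # "B-" opens a new span; a stray "I-" is treated leniently as a span start.
        if tag.startswith("B-") or tag.startswith("I-"):
            label, start = tag[2:], i
    # Close a span that runs to the end of the sentence.
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


# Hypothetical example sentence and tags.
tokens = ["Joko", "Widodo", "visited", "Kuala", "Lumpur", "."]
tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O"]

for label, start, end in bio_to_spans(tags):
    print(label, " ".join(tokens[start:end]))
# prints:
# PER Joko Widodo
# LOC Kuala Lumpur
```

Spans of this kind are the sort of engineered feature that a downstream model for a higher-level task could consume.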
In our SEACoreNLP library, we hope to provide users with an easy way to train, evaluate and perform inference with models for core NLP tasks in ASEAN languages.
Our library is a light wrapper over the AllenNLP library, which is itself a wrapper over Hugging Face and PyTorch. We use AllenNLP as the base for development because we believe that its framework allows for quick and easy experimentation with different architectures. Furthermore, it already supports all the core NLP tasks that we aim to cover.
We would like to thank the creators of the third-party libraries that we use in our package for their great work in furthering NLP in SEA.
malaya: Husein Zolkepli
pythainlp: Wannaphong Phatthiyaphaibun and his team
attacut: Pattarawat Chormai, Ponrawee Prasertsom and Prof. Attapol Rutherford
spacy-thai: Prof. Koichi Yasuoka
underthesea: Vu Anh
The SEACoreNLP package is released under the GPLv3 license.