##########
Quickstart
##########

SEACoreNLP currently supports six main tasks:

* Word Tokenization
* Sentence Segmentation
* Part-of-speech Tagging
* Named Entity Recognition
* Constituency Parsing
* Dependency Parsing

SEACoreNLP provides a class for each of these tasks.

*******
Classes
*******

Segmentation Tasks
==================

For word tokenization and sentence segmentation, we provide the ``Tokenizer`` and ``SentenceSplitter`` classes respectively.
We do not yet provide natively trained segmenters, so all available segmenters come from third-party
libraries.

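To instantiate a segmenter, use ``from_default`` with a language code such as ``'th'`` or ``'vi'`` to get the default
segmenter for that language, or use ``from_library`` to pick a specific third-party library, passing any
library-specific keyword arguments it needs (e.g. ``engine='newmm'``).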

.. code-block:: python

   from seacorenlp.data.tokenizers import Tokenizer, SentenceSplitter

   text = 'ผมอยากกินข้าว'

   # Default Tokenizer
   tokenizer = Tokenizer.from_default('th')
   tokenizer.tokenize(text)
   # Output: [ผม, อยาก, กิน, ข้าว]

   # Specific Tokenizer
   tokenizer = Tokenizer.from_library('pythainlp', engine='newmm')
   tokenizer.tokenize(text)
   # Output: [ผม, อยาก, กินข้าว]

   longer_text = 'Tôi muốn ăn cơm. Chị muốn đi du lịch.'

   # Default SentenceSplitter
   splitter = SentenceSplitter.from_default('vi')
   splitter.split_sentences(longer_text)
   # Output: ['Tôi muốn ăn cơm.', 'Chị muốn đi du lịch.']
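
Different segmenters can produce different outputs for the same input, as the two Thai tokenizers above show.
As a rough illustration only, the sketch below implements two simple baselines in plain Python: greedy
longest-match tokenization against a toy dictionary, and a regular-expression sentence splitter.
``longest_match_tokenize``, ``naive_split_sentences`` and ``TOY_DICT`` are hypothetical names that are not part
of SEACoreNLP, and the tokenizer is only a simplified relative of dictionary-based engines such as PyThaiNLP's
``newmm``, not a reimplementation of it.

.. code-block:: python

   import re

   # Hypothetical toy dictionary; real segmenters use far larger word lists.
   TOY_DICT = {'ผม', 'อยาก', 'กิน', 'ข้าว', 'กินข้าว'}


   def longest_match_tokenize(text, dictionary):
       """Greedy longest-match tokenization (illustrative sketch only)."""
       max_len = max(len(word) for word in dictionary)
       tokens = []
       i = 0
       while i < len(text):
           # Fall back to a single character if no dictionary word starts here.
           length = 1
           # Try the longest candidate first, then progressively shorter ones.
           for size in range(min(max_len, len(text) - i), 0, -1):
               if text[i:i + size] in dictionary:
                   length = size
                   break
           tokens.append(text[i:i + length])
           i += length
       return tokens


   def naive_split_sentences(text):
       """Split after '.', '!' or '?' followed by whitespace (illustrative sketch only)."""
       return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]


   print(longest_match_tokenize('ผมอยากกินข้าว', TOY_DICT))
   # Output: ['ผม', 'อยาก', 'กินข้าว']
   print(naive_split_sentences('Tôi muốn ăn cơm. Chị muốn đi du lịch.'))
   # Output: ['Tôi muốn ăn cơm.', 'Chị muốn đi du lịch.']

Removing ``'กินข้าว'`` from ``TOY_DICT`` changes the tokenizer output to ``['ผม', 'อยาก', 'กิน', 'ข้าว']``: with
a dictionary-based approach, the word list alone decides whether a compound is kept whole. The regex splitter is
only a baseline; it breaks on abbreviations such as "Dr." and does not help with Thai, which typically separates
sentences with spaces rather than full stops.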


Tagging & Parsing Tasks
=======================

For tagging (POS, NER) and parsing (constituency, dependency) tasks, we provide both natively trained
models and access to third-party models. Load a native model by name with the ``from_pretrained`` method,
or a third-party model with the ``from_library`` method.

.. code-block:: python

   from seacorenlp.tagging import POSTagger

   th_text = 'ผมอยากกินข้าว'

   # Native Models
   native_tagger = POSTagger.from_pretrained('pos-th-ud-xlmr')
   native_tagger.predict(th_text)
   # Output: [('ผม', 'PRON'), ('อยาก', 'VERB'), ('กิน', 'VERB'), ('ข้าว', 'NOUN')]

   # External Models
   # Pass library-specific keyword arguments as needed (see the documentation of the respective class)
   external_tagger = POSTagger.from_library('pythainlp', corpus='orchid')
   external_tagger.predict(th_text)
   # Output: [('ผม', 'PPRS'), ('อยาก', 'XVMM'), ('กิน', 'VACT'), ('ข้าว', 'NCMN')]

For the full list of models available, refer to `Model Performance <models.html>`_.
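
Both taggers in the first tagging example return a list of ``(token, tag)`` tuples, so their outputs can be
compared token by token even though they use different tagsets (Universal Dependencies tags for the native
model, ORCHID tags for the PyThaiNLP model). The helper ``align_taggings`` below is hypothetical and not part of
SEACoreNLP; it assumes both taggers tokenized the text identically and raises ``ValueError`` otherwise. The
tagged sequences are copied from the outputs of that example.

.. code-block:: python

   def align_taggings(first, second):
       """Pair two (token, tag) sequences as (token, first_tag, second_tag) triples."""
       first_tokens = [token for token, _ in first]
       second_tokens = [token for token, _ in second]
       if first_tokens != second_tokens:
           raise ValueError('The taggers produced different tokenizations')
       return [(token, tag_a, tag_b) for (token, tag_a), (_, tag_b) in zip(first, second)]


   # Outputs copied from the first tagging example: UD tags vs. ORCHID tags.
   native = [('ผม', 'PRON'), ('อยาก', 'VERB'), ('กิน', 'VERB'), ('ข้าว', 'NOUN')]
   external = [('ผม', 'PPRS'), ('อยาก', 'XVMM'), ('กิน', 'VACT'), ('ข้าว', 'NCMN')]

   for token, ud_tag, orchid_tag in align_taggings(native, external):
       print(token, ud_tag, orchid_tag, sep='\t')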

Here is an example for each task:

.. code-block:: python

   from seacorenlp.tagging import POSTagger, NERTagger
   from seacorenlp.parsing import ConstituencyParser, DependencyParser

   # POS Tagging

   pos_text = 'Tôi muốn ăn cơm.'
   pos_tagger = POSTagger.from_pretrained('pos-vi-ud-xlmr')
   pos_tagger.predict(pos_text)
   # Output: [('Tôi', 'PROPN'), ('muốn', 'VERB'), ('ăn', 'VERB'), ('cơm', 'NOUN'), ('.', 'PUNCT')]


   # NER

   ner_text = 'Thủ tướng Trung Quốc Ôn Gia Bảo đã đến thăm Việt Nam vào năm 2004.'
   ner_tagger = NERTagger.from_library('underthesea')
   ner_tagger.predict(ner_text)
   # Output:
   # [('Thủ tướng', 'O'),
   #  ('Trung Quốc', 'B-LOC'),
   #  ('Ôn', 'B-PER'),
   #  ('Gia Bảo', 'I-PER'),
   #  ('đã', 'O'),
   #  ('đến', 'O'),
   #  ('thăm', 'O'),
   #  ('Việt Nam', 'B-LOC'),
   #  ('vào', 'O'),
   #  ('năm', 'O'),
   #  ('2004', 'O'),
   #  ('.', 'O')]


   # Constituency Parsing

   const_text = 'Saya pergi ke sekolah'
   const_parser = ConstituencyParser.from_pretrained('cp-id-kethu-benepar-xlmr-best')
   trees = const_parser.predict(const_text)
   print(trees[0])
   # Output:
   # (TOP
   #  (S
   #    (NP-SBJ (PRP Saya))
   #    (VP (VB pergi) (PP (IN ke) (NP (NN sekolah))))))


   # Dependency Parsing

   dep_text = 'Saya pergi ke sekolah'
   dep_parser = DependencyParser.from_pretrained('dp-id-ud-xlmr')
   results = dep_parser.predict(dep_text)
   print(results[0])
   # Output: [('Saya', 2, 'nsubj'), ('pergi', 0, 'root'), ('ke', 4, 'case'), ('sekolah', 2, 'obl')]
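
The NER tagger returns one BIO tag per token: ``B-`` marks the first token of an entity, ``I-`` a continuation,
and ``O`` a token outside any entity. To obtain whole entities, adjacent tokens can be merged. The sketch below
does this with a hypothetical helper ``bio_to_spans`` (not part of SEACoreNLP), applied to the output shown in
the NER example. It joins tokens with a space, which suits Vietnamese but not languages written without spaces
between words such as Thai, and it treats an ``I-`` tag that does not continue an entity of the same type as the
start of a new entity.

.. code-block:: python

   def bio_to_spans(tagged):
       """Merge (token, BIO tag) pairs into (entity text, entity type) spans."""
       spans = []
       current_tokens, current_type = [], None
       for token, tag in tagged:
           starts_entity = tag.startswith('B-') or (
               tag.startswith('I-') and tag[2:] != current_type
           )
           if starts_entity:
               # Close any entity in progress, then open a new one.
               if current_tokens:
                   spans.append((' '.join(current_tokens), current_type))
               current_tokens, current_type = [token], tag[2:]
           elif tag.startswith('I-'):
               # Same-type continuation of the current entity.
               current_tokens.append(token)
           else:
               # 'O' ends the current entity, if any.
               if current_tokens:
                   spans.append((' '.join(current_tokens), current_type))
               current_tokens, current_type = [], None
       # Close an entity that runs to the end of the sentence.
       if current_tokens:
           spans.append((' '.join(current_tokens), current_type))
       return spans


   # Output copied from the NER example.
   ner_output = [
       ('Thủ tướng', 'O'), ('Trung Quốc', 'B-LOC'), ('Ôn', 'B-PER'),
       ('Gia Bảo', 'I-PER'), ('đã', 'O'), ('đến', 'O'), ('thăm', 'O'),
       ('Việt Nam', 'B-LOC'), ('vào', 'O'), ('năm', 'O'), ('2004', 'O'),
       ('.', 'O'),
   ]
   print(bio_to_spans(ner_output))
   # Output: [('Trung Quốc', 'LOC'), ('Ôn Gia Bảo', 'PER'), ('Việt Nam', 'LOC')]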
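
The constituency example prints the first tree in bracketed (Penn Treebank-style) notation. Because this page
does not document the type of object ``predict`` returns, the sketch below works only on the printed string:
``parse_bracketed`` turns it into nested ``(label, children)`` tuples, and ``leaves_with_tags`` recovers each word
with its preterminal (POS) label. Both helpers are hypothetical and not part of SEACoreNLP, and
``leaves_with_tags`` assumes every word sits alone under its own preterminal node, as in the example output.

.. code-block:: python

   import re


   def parse_bracketed(tree_string):
       """Parse a bracketed tree string into nested (label, children) tuples."""
       tokens = re.findall(r'\(|\)|[^\s()]+', tree_string)
       position = 0

       def parse_node():
           nonlocal position
           if tokens[position] != '(':
               raise ValueError(f'Expected "(" at token {position}')
           label = tokens[position + 1]
           position += 2
           children = []
           while tokens[position] != ')':
               if tokens[position] == '(':
                   children.append(parse_node())  # nested constituent
               else:
                   children.append(tokens[position])  # a word (leaf)
                   position += 1
           position += 1  # consume the closing ')'
           return (label, children)

       return parse_node()


   def leaves_with_tags(node):
       """Return (word, preterminal label) pairs from left to right."""
       label, children = node
       if len(children) == 1 and isinstance(children[0], str):
           return [(children[0], label)]
       pairs = []
       for child in children:
           pairs.extend(leaves_with_tags(child))
       return pairs


   # Tree copied from the constituency parsing example.
   tree = parse_bracketed(
       '(TOP (S (NP-SBJ (PRP Saya)) (VP (VB pergi) (PP (IN ke) (NP (NN sekolah))))))'
   )
   print(leaves_with_tags(tree))
   # Output: [('Saya', 'PRP'), ('pergi', 'VB'), ('ke', 'IN'), ('sekolah', 'NN')]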
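
Each item in the dependency output is a ``(word, head, relation)`` triple. The example appears to follow the
CoNLL-U convention of 1-based head indices with ``0`` denoting the root (``pergi`` has head ``0`` and relation
``root``). Under that assumption, the hypothetical helper ``to_arcs`` below (not part of SEACoreNLP) rewrites
each triple as a readable ``(head word, relation, dependent)`` arc.

.. code-block:: python

   def to_arcs(parse):
       """Rewrite (word, head, relation) triples as (head word, relation, dependent) arcs.

       Assumes 1-based head indices, with 0 denoting the artificial root.
       """
       arcs = []
       for word, head, relation in parse:
           head_word = 'ROOT' if head == 0 else parse[head - 1][0]
           arcs.append((head_word, relation, word))
       return arcs


   # Parse copied from the dependency parsing example.
   parse = [('Saya', 2, 'nsubj'), ('pergi', 0, 'root'), ('ke', 4, 'case'), ('sekolah', 2, 'obl')]
   for head_word, relation, dependent in to_arcs(parse):
       print(f'{relation}({head_word}, {dependent})')
   # Output:
   # nsubj(pergi, Saya)
   # root(ROOT, pergi)
   # case(sekolah, ke)
   # obl(pergi, sekolah)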