Command Line Interface (CLI)

SEACoreNLP provides a CLI for training, evaluation and inference. The exact arguments for each task are listed on that task's page under the Usage section of our documentation; this page gives an overview of how to use the CLI.

For reference, the four tasks currently supported by the SEACoreNLP CLI have the following aliases. Use these aliases when specifying the task (--task=TASK) in CLI commands:

  • Part-of-speech Tagging: pos

  • Named Entity Recognition: ner

  • Constituency Parsing: constituency

  • Dependency Parsing: dependency

Training Models

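To train a model, specify the paths to the training data (train_data_path) and validation data (validation_data_path), the arguments that define the model and its training hyperparameters, and the task (--task). The data, model and hyperparameter arguments are written as NAME=VALUE pairs in front of the seacorenlp command, and the task is passed to the train subcommand: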

[ARGUMENTS ...] train_data_path=PATH validation_data_path=PATH seacorenlp train --task=TASK
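In a POSIX shell, NAME=VALUE pairs written in front of a command are set as environment variables for that one command, so this is how the arguments reach seacorenlp. If you would rather launch training from Python (for example, from a larger experiment script), the same call can be sketched with the standard subprocess module. This is a minimal sketch, not part of SEACoreNLP: the helper names build_train_call and run_train are hypothetical, and it assumes seacorenlp is on your PATH and reads its arguments from the environment exactly as the shell form above implies.

```python
import os
import subprocess


def _to_env_value(value):
    # Render Python booleans the way the shell examples write them (true/false);
    # everything else is converted with str().
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def build_train_call(task, train_data_path, validation_data_path, **arguments):
    """Hypothetical helper returning (env, argv) that mirror:
    [ARGUMENTS ...] train_data_path=PATH validation_data_path=PATH seacorenlp train --task=TASK
    """
    overrides = {name: _to_env_value(value) for name, value in arguments.items()}
    overrides["train_data_path"] = str(train_data_path)
    overrides["validation_data_path"] = str(validation_data_path)
    # Start from the current environment so PATH and friends remain available.
    env = {**os.environ, **overrides}
    argv = ["seacorenlp", "train", f"--task={task}"]
    return env, argv


def run_train(task, train_data_path, validation_data_path, **arguments):
    env, argv = build_train_call(task, train_data_path, validation_data_path, **arguments)
    # check=True raises subprocess.CalledProcessError on a non-zero exit status.
    subprocess.run(argv, env=env, check=True)
```

For instance, run_train("dependency", "train.txt", "val.txt", use_pretrained=True, model_name="xlm-roberta-base", lr="1e-5") would correspond to the shell form with those three NAME=VALUE pairs. Passing values such as lr as strings keeps them spelled exactly as you would type them in the shell.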

For example, suppose we want to train a dependency parsing model with the following setup:

  • Use XLM-R Base embeddings and freeze their parameters during training

  • Concatenate 100-dimensional POS tag embeddings to the word embeddings

  • Use the Bi-LSTM configuration from Dozat and Manning’s paper

  • Train for up to 20 epochs, stopping early if validation performance does not improve for 3 epochs

  • Train with a batch size of 4 and a learning rate of 0.00001

The corresponding command is:

# Embeddings: XLM-R Base (frozen) + 100-dimensional POS tag embeddings
# Encoder: Bi-LSTM
# Training hyperparameters: 20 epochs, patience of 3, batch size 4, learning rate 1e-5
# Data: training and validation files
# Subcommand and task: train a dependency parser
use_pretrained=true model_name=xlm-roberta-base freeze=true pos_tag_embedding_dim=100 \
lstm_input_dim=868 lstm_hidden_dim=400 lstm_layers=3 lstm_dropout=0.3 \
num_epochs=20 patience=3 batch_size=4 lr=1e-5 \
train_data_path=train.txt validation_data_path=val.txt \
seacorenlp train --task=dependency
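In this example, lstm_input_dim=868 follows from the embedding setup: XLM-R Base produces 768-dimensional word representations, and concatenating the 100-dimensional POS tag embeddings gives 768 + 100 = 868 inputs to the Bi-LSTM. If you switch model_name or change pos_tag_embedding_dim, lstm_input_dim presumably has to be updated to match. The small sketch below computes it; the function name and lookup table are our own (hypothetical) and cover only the two XLM-R checkpoints, whose hidden sizes (768 and 1024) come from their published configurations.

```python
# Hidden sizes of the published XLM-R checkpoints.
HIDDEN_SIZES = {
    "xlm-roberta-base": 768,
    "xlm-roberta-large": 1024,
}


def lstm_input_dim(model_name: str, pos_tag_embedding_dim: int = 0) -> int:
    """Hypothetical helper: the Bi-LSTM receives word embeddings concatenated
    with POS tag embeddings, so its input size is the sum of the two widths."""
    try:
        hidden_size = HIDDEN_SIZES[model_name]
    except KeyError:
        raise ValueError(f"unknown hidden size for {model_name!r}; add it to HIDDEN_SIZES") from None
    return hidden_size + pos_tag_embedding_dim


print(lstm_input_dim("xlm-roberta-base", 100))  # 868
```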

By default, the trained model is saved in a folder named outputs in your current working directory.

Evaluating Models

To evaluate a model trained with SEACoreNLP, specify the path to the model (--archive_file), the path to the test data (--input_file) and the task (--task):

seacorenlp evaluate --archive_file=PATH_TO_MODEL --input_file=PATH_TO_TEST_DATA --task=TASK
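If you want scores for one model on several test files (say, one per domain), the evaluate command can simply be run once per file. The sketch below does this from Python with subprocess; the helper names build_evaluate_calls and evaluate_all are hypothetical, and only the three flags shown above are passed.

```python
import subprocess
from typing import List, Sequence


def build_evaluate_calls(archive_file: str, test_files: Sequence[str], task: str) -> List[List[str]]:
    """Hypothetical helper: one `seacorenlp evaluate` argv per test file."""
    return [
        [
            "seacorenlp", "evaluate",
            f"--archive_file={archive_file}",
            f"--input_file={test_file}",
            f"--task={task}",
        ]
        for test_file in test_files
    ]


def evaluate_all(archive_file: str, test_files: Sequence[str], task: str) -> None:
    for argv in build_evaluate_calls(archive_file, test_files, task):
        # Evaluations run one after another; check=True stops at the first failure.
        subprocess.run(argv, check=True)
```

For example, evaluate_all(PATH_TO_MODEL, [PATH_TO_TEST_DATA_A, PATH_TO_TEST_DATA_B], "dependency") runs two evaluations in sequence, each printing whatever seacorenlp evaluate reports.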

Inference

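To perform inference, specify the path to the model (--archive_file), the data to predict on (--input_file) and where to write the predictions (--output_file). The input data should be in the same format as the training data.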

seacorenlp predict --archive_file=PATH_TO_MODEL --input_file=PATH_TO_DATA --output_file=PATH_TO_OUTPUT
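To run prediction over many input files, you can derive an output path for each input and call predict once per file. This is a sketch under assumptions: the .txt input extension, the NAME.pred.txt output naming scheme and the helper names build_predict_calls and predict_all are all hypothetical; only the three flags shown above are passed.

```python
import subprocess
from pathlib import Path
from typing import List


def build_predict_calls(archive_file: str, input_dir: str, output_dir: str) -> List[List[str]]:
    """Hypothetical helper: one `seacorenlp predict` argv per .txt file in input_dir,
    mapping e.g. input_dir/a.txt to output_dir/a.pred.txt."""
    calls = []
    for input_file in sorted(Path(input_dir).glob("*.txt")):
        output_file = Path(output_dir) / f"{input_file.stem}.pred.txt"
        calls.append([
            "seacorenlp", "predict",
            f"--archive_file={archive_file}",
            f"--input_file={input_file}",
            f"--output_file={output_file}",
        ])
    return calls


def predict_all(archive_file: str, input_dir: str, output_dir: str) -> None:
    # Use an output_dir different from input_dir so that predictions are not
    # picked up as inputs on a later run.
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for argv in build_predict_calls(archive_file, input_dir, output_dir):
        subprocess.run(argv, check=True)
```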