Command Line Interface (CLI)
SEACoreNLP provides a CLI for training, evaluation and inference. This page gives an overview of how to use it; the exact arguments for each task are listed on that task's page under the Usage section of our documentation.
The SEACoreNLP CLI currently supports four tasks. Use the following aliases when specifying the task in CLI commands:
Part-of-speech Tagging: pos
Named Entity Recognition: ner
Constituency Parsing: constituency
Dependency Parsing: dependency
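When scripting around the CLI, it can help to reject an unknown alias before launching a long job. The following is a minimal sketch, assuming a hypothetical helper named is_supported_task that is not part of SEACoreNLP:

```shell
# Hypothetical helper (not part of SEACoreNLP): succeeds only for the
# four task aliases listed above.
is_supported_task() {
  case "$1" in
    pos|ner|constituency|dependency) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: guard a script before calling the CLI.
task=dependency
if is_supported_task "$task"; then
  echo "task alias ok: $task"
else
  echo "unknown task alias: $task" >&2
fi
```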
Training Models
To train a model, specify the paths to the training data (train_data_path) and validation data (validation_data_path), the arguments that define the model and training hyperparameters, and the task (--task).
[ARGUMENTS ...] train_data_path=PATH validation_data_path=PATH seacorenlp train --task=TASK
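The KEY=VALUE pairs placed before seacorenlp use the shell's prefix-assignment syntax: the shell passes them to that one command as environment variables and leaves the calling shell unchanged. The sketch below illustrates this behaviour with sh -c standing in for seacorenlp train, so it runs without SEACoreNLP installed. How SEACoreNLP consumes the values internally is not shown and is not assumed here.

```shell
# Start from a clean slate for the demonstration.
unset train_data_path validation_data_path

# `sh -c` stands in for `seacorenlp train`; it just echoes what it receives.
seen=$(train_data_path=train.txt validation_data_path=val.txt \
  sh -c 'echo "$train_data_path $validation_data_path"')
echo "command saw: $seen"

# The assignments applied only to that command, not to the calling shell.
after=${train_data_path:-unset}
echo "calling shell sees: $after"
```

One practical consequence is that the arguments must be written on the same logical command line as seacorenlp, joined with a trailing backslash when split across lines.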
For example, suppose we want to train a dependency parsing model with the following settings:
Use XLM-R Base embeddings and freeze its parameters for training
Concatenate POS tag embeddings of 100 dimensions to the word embeddings
Use the bi-LSTM configuration in Dozat and Manning’s paper
Train for 20 epochs with an early stopping patience of 3 epochs
Train with a batch size of 4 and learning rate of 0.00001
# The lines of the command below define, in order:
#   1. the embeddings to be used
#   2. the encoder to be used (bi-LSTM)
#   3. the training hyperparameters
#   4. the paths to the data
#   5. the train subcommand and task
# Comments are kept out of the command itself because a comment line
# after a trailing backslash would end the command early.
use_pretrained=true model_name=xlm-roberta-base freeze=true pos_tag_embedding_dim=100 \
lstm_input_dim=868 lstm_hidden_dim=400 lstm_layers=3 lstm_dropout=0.3 \
num_epochs=20 patience=3 batch_size=4 lr=1e-5 \
train_data_path=train.txt validation_data_path=val.txt \
seacorenlp train --task=dependency
By default, the trained model is saved to a folder named outputs in your current working directory.
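The value lstm_input_dim=868 in the example above follows from the concatenation step: XLM-R Base produces 768-dimensional embeddings, and the 100-dimensional POS tag embeddings are appended to them. A small sketch of that arithmetic, where word_embedding_dim is an illustrative variable name rather than a SEACoreNLP argument:

```shell
# Hidden size of xlm-roberta-base (illustrative variable, not a CLI argument).
word_embedding_dim=768
# Matches pos_tag_embedding_dim in the example.
pos_tag_embedding_dim=100

# The encoder input is the concatenation of both embeddings.
lstm_input_dim=$((word_embedding_dim + pos_tag_embedding_dim))
echo "lstm_input_dim=$lstm_input_dim"
```

If either embedding size changes, lstm_input_dim would need to change with it.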
Evaluating Models
To evaluate a model trained by SEACoreNLP, specify the path to the model, the path to the test data and the task involved.
seacorenlp evaluate --archive_file=PATH_TO_MODEL --input_file=PATH_TO_TEST_DATA --task=TASK
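Because evaluation needs both paths to exist, a wrapper script can check them before calling the CLI. The following is a minimal sketch, assuming a hypothetical function named evaluate_cmd (not part of SEACoreNLP) that only prints the command as a dry run instead of executing it. The flags are the ones shown above.

```shell
# Hypothetical dry-run helper (not part of SEACoreNLP): checks both paths,
# then prints the evaluate command instead of running it.
evaluate_cmd() {
  model=$1
  test_data=$2
  task=$3
  [ -e "$model" ] || { echo "model not found: $model" >&2; return 1; }
  [ -e "$test_data" ] || { echo "test data not found: $test_data" >&2; return 1; }
  echo "seacorenlp evaluate --archive_file=$model --input_file=$test_data --task=$task"
}

# Usage with placeholder files created just for the demonstration.
tmp=$(mktemp -d)
touch "$tmp/model" "$tmp/test.txt"
evaluate_cmd "$tmp/model" "$tmp/test.txt" ner
```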
Inference
To perform inference, provide input data in the same format as the training data, along with the path to the model and the path where the predictions should be written.
seacorenlp predict --archive_file=PATH_TO_MODEL --input_file=PATH_TO_DATA --output_file=PATH_TO_OUTPUT
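To run inference over several input files, the predict command can be wrapped in a loop. This is a sketch under stated assumptions: the function name predict_cmds and the .pred output suffix are illustrative choices, not SEACoreNLP conventions, and the function only prints the commands so they can be reviewed first.

```shell
# Hypothetical helper (not part of SEACoreNLP): prints one predict command
# per input file. The ".pred" output suffix is an arbitrary choice.
predict_cmds() {
  model=$1
  shift
  for input in "$@"; do
    echo "seacorenlp predict --archive_file=$model --input_file=$input --output_file=$input.pred"
  done
}

# Dry run: prints the commands without executing them.
predict_cmds PATH_TO_MODEL a.txt b.txt
```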