Command Line Interface (CLI)
SEACoreNLP provides a CLI for training, evaluation and inference. The exact arguments for each task are listed on that task's page under the Usage section of our documentation; this page gives an overview of how to use the CLI.
For reference, the four tasks currently supported by the SEACoreNLP CLI have the following aliases. Use these aliases when specifying the task in CLI commands:
Named Entity Recognition:
To train a model, specify the paths to the training data (train_data_path) and validation data (validation_data_path), the arguments that define the model and training hyperparameters, and the task (--task):
[ARGUMENTS ...] train_data_path=PATH validation_data_path=PATH seacorenlp train --task=TASK
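The KEY=VALUE pairs written in front of seacorenlp use ordinary shell syntax: each assignment is placed in the environment of that single command and does not persist in your shell afterwards. The sketch below illustrates this with a stand-in child process (sh -c) instead of seacorenlp; that SEACoreNLP reads its model and training arguments from these environment variables is an assumption inferred from the command syntax above.

```shell
#!/usr/bin/env bash
# Assignments written before a command are exported to that command's
# environment only. A child shell stands in for seacorenlp here.
child_sees=$(lr=1e-5 batch_size=4 sh -c 'echo "lr=$lr batch_size=$batch_size"')
echo "$child_sees"   # prints: lr=1e-5 batch_size=4

# The calling shell itself is unaffected once the command finishes.
echo "lr in parent shell: ${lr:-unset}"
```

Because the assignments are scoped to one command, several training runs with different hyperparameters can be launched from the same shell without the settings leaking between them.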
For example, if we wanted to train a dependency parsing model with the following details:
Use XLM-R Base embeddings and freeze its parameters for training
Concatenate POS tag embeddings of 100 dimensions to the word embeddings
Use the bi-LSTM configuration in Dozat and Manning’s paper
Train for 20 epochs, stopping early if validation performance does not improve for 3 epochs
Train with a batch size of 4 and learning rate of 0.00001
# One group of arguments per line: embeddings (frozen XLM-R Base + POS tags),
# encoder (Bi-LSTM), training hyperparameters, paths to data,
# and finally the train subcommand and task
use_pretrained=true model_name=xlm-roberta-base freeze=true pos_tag_embedding_dim=100 \
lstm_input_dim=868 lstm_hidden_dim=400 lstm_layers=3 lstm_dropout=0.3 \
num_epochs=20 patience=3 batch_size=4 lr=1e-5 \
train_data_path=train.txt validation_data_path=val.txt \
seacorenlp train --task=dependency
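The value lstm_input_dim=868 follows from the concatenation described above: xlm-roberta-base produces 768-dimensional hidden states, and the 100-dimensional POS tag embeddings are appended to them. If you change the embedding model or pos_tag_embedding_dim, the LSTM input size must change accordingly. A minimal sketch of that calculation (XLMR_BASE_HIDDEN is a hypothetical helper variable, not a SEACoreNLP argument):

```shell
#!/usr/bin/env bash
# Hypothetical helper variable: hidden size of xlm-roberta-base.
XLMR_BASE_HIDDEN=768
# Size of the POS tag embeddings concatenated to each word embedding.
pos_tag_embedding_dim=100
# The bi-LSTM consumes the concatenation, so its input size is the sum.
lstm_input_dim=$((XLMR_BASE_HIDDEN + pos_tag_embedding_dim))
echo "lstm_input_dim=$lstm_input_dim"   # prints: lstm_input_dim=868
```

For example, with pos_tag_embedding_dim=50 the same calculation gives 818.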
By default, the trained model is saved to a folder named outputs in the current working directory.
To evaluate a model trained by SEACoreNLP, specify the path to the model archive (--archive_file), the path to the test data (--input_file) and the task (--task).
seacorenlp evaluate --archive_file=PATH_TO_MODEL --input_file=PATH_TO_TEST_DATA --task=TASK
To perform inference, specify the path to the model archive, the input data and the output file. The input data should be in the same format as the training data.
seacorenlp predict --archive_file=PATH_TO_MODEL --input_file=PATH_TO_DATA --output_file=PATH_TO_OUTPUT
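Putting the evaluate and predict commands together, a run for the dependency parser trained above might look like the sketch below. All file names (outputs/model.tar.gz, test.txt, new_data.txt, predictions.txt) are hypothetical placeholders; in particular, the name of the model archive inside outputs is an assumption, so check that folder after training. The script only prints the commands unless DRY_RUN=0 is set, so they can be inspected before anything is executed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical paths: replace with your own model archive and data files.
MODEL=outputs/model.tar.gz
TEST_DATA=test.txt
NEW_DATA=new_data.txt
PREDICTIONS=predictions.txt

# Print commands by default; execute them for real with DRY_RUN=0.
DRY_RUN=${DRY_RUN:-1}
run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Evaluate on held-out test data (same format as the training data).
run seacorenlp evaluate --archive_file="$MODEL" --input_file="$TEST_DATA" --task=dependency

# Predict on new data, which must also follow the training-data format.
run seacorenlp predict --archive_file="$MODEL" --input_file="$NEW_DATA" --output_file="$PREDICTIONS"
```

Once the printed commands look right, rerun the script with DRY_RUN=0 in front of it to execute them.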