Model Training

Prepare model

First, load the pretrained GPT-2 model. This is done with the from_pretrained method defined on ConversationModel.

from convmodel import ConversationModel

model = ConversationModel.from_pretrained("gpt2")

To choose the device used for training, pass the device option.

# Load the model on GPU
model = ConversationModel.from_pretrained("gpt2", device="cuda")

# Load the model on CPU
model = ConversationModel.from_pretrained("gpt2", device="cpu")

If you do not specify device, a GPU is used if one is available.
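
A "GPU if available" default is commonly written in PyTorch-based code with torch.cuda.is_available(). The sketch below shows that idiom only; it assumes a PyTorch backend, is not taken from convmodel's source, and default_device is a hypothetical helper name. It falls back to "cpu" when PyTorch is not installed so that it runs anywhere.

```python
def default_device():
    """Return "cuda" when a CUDA device is usable, otherwise "cpu".

    Hypothetical helper: illustrates the common idiom, not convmodel's code.
    """
    try:
        import torch
    except ImportError:
        # Without PyTorch there is no GPU backend to query.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = default_device()
```

The resulting string could then be passed explicitly as the device option shown above.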

Training data

Before training, you also need to prepare training data.

convmodel provides the ConversationExample class, which represents a single conversation used in training. Prepare iterable objects for the training and validation data that yield one ConversationExample object per iteration.

from convmodel import ConversationExample

train_iterator = [
    ConversationExample(conversation=["Hello", "Hi, how are you?", "Good, thank you, how about you?", "Good, thanks!"]),
    ConversationExample(conversation=["I am hungry", "How about eating pizza?"]),
]
valid_iterator = [
    ConversationExample(conversation=["Tired...", "Let's have a break!", "Nice idea!"]),
]

Although the above example works, real data is usually too large to load into memory all at once. In that case, it is better to implement an iterator class that yields one example at a time.

The following example assumes that each line of a data file contains one conversation example. The file format is JSON Lines, and each line contains a list of strings representing one conversation.

# Assume that the training/validation data is located under the input directory.

# Training file: input/train.jsonl
$ head -n2 input/train.jsonl
["Hello", "Hi, how are you?", "Good, thank you, how about you?", "Good, thanks!"]
["I am hungry", "How about eating pizza?"]

# Validation file: input/valid.jsonl
$ head -n1 input/valid.jsonl
["Tired...", "Let's have a break!", "Nice idea!"]
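
If your conversations currently live in Python objects, a file in this format can be produced with the standard json module. The following is a minimal sketch; write_jsonl, read_jsonl, and the temporary path are illustrative names and are not part of convmodel.

```python
import json
import os
import tempfile


def write_jsonl(path, conversations):
    """Write each conversation (a list of strings) as one JSON array per line."""
    with open(path, "w", encoding="utf-8") as fd:
        for conversation in conversations:
            # ensure_ascii=False keeps non-ASCII text readable in the file.
            fd.write(json.dumps(conversation, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read the file back into a list of conversations, skipping blank lines."""
    with open(path, encoding="utf-8") as fd:
        return [json.loads(line) for line in fd if line.strip()]


conversations = [
    ["Hello", "Hi, how are you?", "Good, thank you, how about you?", "Good, thanks!"],
    ["I am hungry", "How about eating pizza?"],
]

# A temporary directory stands in for the input directory here.
demo_path = os.path.join(tempfile.mkdtemp(), "train.jsonl")
write_jsonl(demo_path, conversations)
loaded = read_jsonl(demo_path)
```

Because json.dumps escapes newline characters inside strings, each conversation stays on a single line even if an utterance contains a line break.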

You can implement your own iterator class that reads the file and yields one conversation example at a time, as follows.

import json


class JsonLinesIterator:
    """Json Lines data loader used in fit command"""
    def __init__(self, filename: str):
        self._filename = filename

    def __iter__(self):
        with open(self._filename) as fd:
            for line in fd:
                yield ConversationExample(conversation=json.loads(line))


train_iterator = JsonLinesIterator("input/train.jsonl")
valid_iterator = JsonLinesIterator("input/valid.jsonl")
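
One reason to use a class with __iter__ rather than a bare generator is that the object can be iterated again from the start. This matters if training makes more than one pass over the data, which is an assumption about how fit consumes the iterator rather than something stated here. The sketch below shows the difference using plain lists instead of ConversationExample so that it runs without convmodel installed; ReiterableLines, generator_lines, and the temporary file are illustrative.

```python
import json
import os
import tempfile


class ReiterableLines:
    """Hypothetical stand-in for JsonLinesIterator that yields plain lists."""

    def __init__(self, filename):
        self._filename = filename

    def __iter__(self):
        # The file is reopened on every call, so each pass starts at line one.
        with open(self._filename, encoding="utf-8") as fd:
            for line in fd:
                yield json.loads(line)


def generator_lines(filename):
    """A bare generator function: the returned generator can be consumed once."""
    with open(filename, encoding="utf-8") as fd:
        for line in fd:
            yield json.loads(line)


# Write a one-line demo file to a temporary location.
path = os.path.join(tempfile.mkdtemp(), "valid.jsonl")
with open(path, "w", encoding="utf-8") as fd:
    fd.write(json.dumps(["Tired...", "Let's have a break!", "Nice idea!"]) + "\n")

reiterable = ReiterableLines(path)
first_pass = list(reiterable)
second_pass = list(reiterable)  # yields the same items again

gen = generator_lines(path)
gen_first = list(gen)
gen_second = list(gen)  # empty, because the generator is already exhausted
```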

Training

Finally, start training by calling the fit method with the training and validation iterators.

model.fit(train_iterator=train_iterator, valid_iterator=valid_iterator)

Fit with model saving

You can save the model by calling save_pretrained on the model directly:

model.save_pretrained("model")

Alternatively, pass the directory to save to as the output_path parameter of the fit method.

model.fit(train_iterator=train_iterator, valid_iterator=valid_iterator, output_path="model")

When output_path is set, you can also pass save_best_model. With this option enabled, only the model with the best validation perplexity is saved in output_path.

model.fit(train_iterator=train_iterator, valid_iterator=valid_iterator, output_path="model", save_best_model=True)
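
For reference, perplexity is conventionally the exponential of the average per-token cross-entropy loss, so a lower value is better. The sketch below illustrates keeping only the best epoch under that definition. It is not convmodel's internal implementation, and the loss values and the pick_best_epoch helper are made up for illustration.

```python
import math


def perplexity(mean_loss):
    """Perplexity from a mean cross-entropy loss measured in nats."""
    return math.exp(mean_loss)


def pick_best_epoch(valid_losses):
    """Return (epoch_index, perplexity) for the lowest validation perplexity."""
    best_epoch, best_ppl = None, math.inf
    for epoch, loss in enumerate(valid_losses):
        ppl = perplexity(loss)
        if ppl < best_ppl:
            # A trainer would overwrite the saved checkpoint at this point.
            best_epoch, best_ppl = epoch, ppl
    return best_epoch, best_ppl


# Made-up validation losses for three epochs.
valid_losses = [3.2, 2.9, 3.0]
best_epoch, best_ppl = pick_best_epoch(valid_losses)
```

With these made-up losses, the second epoch (index 1) has the lowest loss and therefore the lowest perplexity, so it is the one that would be kept.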

Load trained model

Once model training is completed, you can load your trained model with the from_pretrained method.

model = ConversationModel.from_pretrained("model")