Model Architecture Overview

convmodel provides the ConversationModel class, which adopts the GPT2LMHeadModel architecture provided by the transformers library.

Although ConversationTokenizer is automatically initialized in the initializer of ConversationModel, let us first initialize ConversationTokenizer directly to see how it encodes a given context into model input. Assume that ConversationTokenizer is given the context ["Hello", "How are you"]. Then ConversationTokenizer encodes it as follows.

>>> from convmodel import ConversationTokenizer
>>> tokenizer = ConversationTokenizer.from_pretrained("gpt2")
>>> context = ["Hello", "How are you"]
>>> tokenizer(context)
{'input_ids': [50256, 15496, 50256, 2437, 389, 345, 50256], 'token_type_ids': [0, 0, 1, 1, 1, 1, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1]}
| position | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| word | \<sep> | Hello | \<sep> | How | are | you | \<sep> |
| input_ids | 50256 | 15496 | 50256 | 2437 | 389 | 345 | 50256 |
| token_type_ids | 0 | 0 | 1 | 1 | 1 | 1 | 0 |
| attention_mask | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
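
To make the encoding rule concrete, here is a minimal sketch of how such input could be built by hand with a plain Hugging Face tokenizer. This is not convmodel's actual implementation; it assumes the \<sep> token resolves to ID 50256, as in the table above.

```python
from transformers import AutoTokenizer

hf_tokenizer = AutoTokenizer.from_pretrained("gpt2")
sep_token_id = 50256  # assumed ID for <sep>, matching the table above

def encode_context(context):
    input_ids = []
    token_type_ids = []
    for i, utterance in enumerate(context):
        # Each utterance is preceded by <sep>; token_type_ids alternate
        # between 0 and 1 per utterance, including its leading <sep>.
        ids = [sep_token_id] + hf_tokenizer.encode(utterance)
        input_ids += ids
        token_type_ids += [i % 2] * len(ids)
    # A trailing <sep> marks where the response starts; its token type
    # continues the alternation (0 for a two-utterance context).
    input_ids.append(sep_token_id)
    token_type_ids.append(len(context) % 2)
    return {
        "input_ids": input_ids,
        "token_type_ids": token_type_ids,
        "attention_mask": [1] * len(input_ids),
    }

print(encode_context(["Hello", "How are you"]))
# {'input_ids': [50256, 15496, 50256, 2437, 389, 345, 50256],
#  'token_type_ids': [0, 0, 1, 1, 1, 1, 0],
#  'attention_mask': [1, 1, 1, 1, 1, 1, 1]}
```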

Note: if a tokenizer does not assign a value to sep_token_id, it is automatically set with a sep_token of <sep>.
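
For instance, the gpt2 tokenizer loaded above defines no sep token of its own, and the automatically assigned <sep> resolves to ID 50256 in the table. The check below assumes ConversationTokenizer exposes sep_token_id directly, which is an assumption rather than documented API.

```python
>>> tokenizer.sep_token_id  # assumed attribute; value matches the <sep> positions above
50256
```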

When ConversationModel is initialized, ConversationTokenizer is automatically initialized inside it. ConversationModel implements a generate method. In the generate method, an input context is first encoded as shown above. Then the encoded tensors are forwarded through the model to predict the following tokens until a <sep> token appears.

Note: Here we assume that the model directory contains a trained conversation model which was fine-tuned from the gpt2 model. We will see how to train our own conversation model later.

>>> from convmodel import ConversationModel
>>> model = ConversationModel.from_pretrained("model")
>>> model.generate(context, do_sample=True, top_p=0.95)
| position | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| word | \<sep> | Hello | \<sep> | How | are | you | \<sep> | Good | thank | you |
| input_ids | 50256 | 15496 | 50256 | 2437 | 389 | 345 | 50256 | 10248 | 5875 | 345 |
| token_type_ids | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| attention_mask | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| generated word | - | - | - | - | - | - | Good | thank | you | \<sep> |
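
The generation loop described above can be pictured with the following sketch. This is not convmodel's actual implementation: it assumes the tokenizer can also be loaded from the model directory, that ConversationTokenizer exposes sep_token_id, and it uses a simple greedy pick where ConversationModel.generate supports sampling options such as do_sample and top_p.

```python
import torch
from transformers import GPT2LMHeadModel
from convmodel import ConversationTokenizer

# Assumed: the fine-tuned model directory also holds the tokenizer files.
tokenizer = ConversationTokenizer.from_pretrained("model")
lm = GPT2LMHeadModel.from_pretrained("model")

def generate_response(context, max_new_tokens=20):
    encoded = tokenizer(context)
    input_ids = list(encoded["input_ids"])
    token_type_ids = list(encoded["token_type_ids"])
    # Generated tokens keep the token type of the trailing <sep> (0 in the table above).
    response_type_id = token_type_ids[-1]
    generated = []
    for _ in range(max_new_tokens):
        with torch.no_grad():
            out = lm(
                input_ids=torch.tensor([input_ids]),
                token_type_ids=torch.tensor([token_type_ids]),
            )
        # Greedy pick of the next token for simplicity.
        next_id = int(out.logits[0, -1].argmax())
        if next_id == tokenizer.sep_token_id:  # assumed attribute
            break
        input_ids.append(next_id)
        token_type_ids.append(response_type_id)
        generated.append(next_id)
    return generated  # decode these IDs with the underlying tokenizer to get the response text
```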