Model Architecture Overview

convmodel provides the ConversationModel class, which adopts the GPT2LMHeadModel architecture provided by the transformers library.

Although ConversationTokenizer is automatically initialized in the initializer of ConversationModel, let us first initialize ConversationTokenizer directly to see how it encodes a given context into model input. Assume that ConversationTokenizer is given the context ["Hello", "How are you"]. Then ConversationTokenizer encodes it as follows.

>>> from convmodel import ConversationTokenizer
>>> tokenizer = ConversationTokenizer.from_pretrained("gpt2")
>>> context = ["Hello", "How are you"]
>>> tokenizer(context)
{'input_ids': [50256, 15496, 50256, 2437, 389, 345, 50256], 'token_type_ids': [0, 0, 1, 1, 1, 1, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1]}
| position | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| word | \<sep> | Hello | \<sep> | How | are | you | \<sep> |
| input_ids | 50256 | 15496 | 50256 | 2437 | 389 | 345 | 50256 |
| token_type_ids | 0 | 0 | 1 | 1 | 1 | 1 | 0 |
| attention_mask | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
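
To make the encoding rule concrete, here is a minimal sketch of how such input could be built by hand with a plain Hugging Face tokenizer. This is not convmodel's actual implementation; it assumes the \<sep> token resolves to ID 50256, as in the table above.

```python
from transformers import AutoTokenizer

hf_tokenizer = AutoTokenizer.from_pretrained("gpt2")
sep_token_id = 50256  # assumed ID for <sep>, matching the table above

def encode_context(context):
    input_ids = []
    token_type_ids = []
    for i, utterance in enumerate(context):
        # Each utterance is preceded by <sep>; token_type_ids alternate
        # between 0 and 1 per utterance, including its leading <sep>.
        ids = [sep_token_id] + hf_tokenizer.encode(utterance)
        input_ids += ids
        token_type_ids += [i % 2] * len(ids)
    # A trailing <sep> marks where the response starts; its token type
    # continues the alternation (0 for a two-utterance context).
    input_ids.append(sep_token_id)
    token_type_ids.append(len(context) % 2)
    return {
        "input_ids": input_ids,
        "token_type_ids": token_type_ids,
        "attention_mask": [1] * len(input_ids),
    }

print(encode_context(["Hello", "How are you"]))
# {'input_ids': [50256, 15496, 50256, 2437, 389, 345, 50256],
#  'token_type_ids': [0, 0, 1, 1, 1, 1, 0],
#  'attention_mask': [1, 1, 1, 1, 1, 1, 1]}
```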

Note: if a tokenizer does not assign a value to sep_token_id, it is automatically set with a sep_token of <sep>.
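
For instance, the gpt2 tokenizer loaded above defines no sep token of its own, and the automatically assigned <sep> resolves to ID 50256 in the table. The check below assumes ConversationTokenizer exposes sep_token_id directly, which is an assumption rather than documented API.

```python
>>> tokenizer.sep_token_id  # assumed attribute; value matches the <sep> positions above
50256
```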

When ConversationModel is initialized, ConversationTokenizer is automatically initialized inside it. ConversationModel implements a generate method. In the generate method, an input context is first encoded as shown above. Then the encoded tensors are forwarded through the model to predict the following tokens until a <sep> token appears.

Note: Here we assume that the model directory contains a trained conversation model which was fine-tuned from the gpt2 model. We will see how to train our own conversation model later.

>>> from convmodel import ConversationModel
>>> model = ConversationModel.from_pretrained("model")
>>> model.generate(context, do_sample=True, top_p=0.95)
| position | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| word | \<sep> | Hello | \<sep> | How | are | you | \<sep> | Good | thank | you |
| input_ids | 50256 | 15496 | 50256 | 2437 | 389 | 345 | 50256 | 10248 | 5875 | 345 |
| token_type_ids | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| attention_mask | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| generated word | - | - | - | - | - | - | Good | thank | you | \<sep> |
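
The generation loop described above can be pictured with the following sketch. This is not convmodel's actual implementation: it assumes the tokenizer can also be loaded from the model directory, that ConversationTokenizer exposes sep_token_id, and it uses a simple greedy pick where ConversationModel.generate supports sampling options such as do_sample and top_p.

```python
import torch
from transformers import GPT2LMHeadModel
from convmodel import ConversationTokenizer

# Assumed: the fine-tuned model directory also holds the tokenizer files.
tokenizer = ConversationTokenizer.from_pretrained("model")
lm = GPT2LMHeadModel.from_pretrained("model")

def generate_response(context, max_new_tokens=20):
    encoded = tokenizer(context)
    input_ids = list(encoded["input_ids"])
    token_type_ids = list(encoded["token_type_ids"])
    # Generated tokens keep the token type of the trailing <sep> (0 in the table above).
    response_type_id = token_type_ids[-1]
    generated = []
    for _ in range(max_new_tokens):
        with torch.no_grad():
            out = lm(
                input_ids=torch.tensor([input_ids]),
                token_type_ids=torch.tensor([token_type_ids]),
            )
        # Greedy pick of the next token for simplicity.
        next_id = int(out.logits[0, -1].argmax())
        if next_id == tokenizer.sep_token_id:  # assumed attribute
            break
        input_ids.append(next_id)
        token_type_ids.append(response_type_id)
        generated.append(next_id)
    return generated  # decode these IDs with the underlying tokenizer to get the response text
```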