PyTorch Learning Series (9): Language Model (RNN-LM)

Train a word-level recurrent-neural-network language model on the Penn Treebank dataset.

Imports

# Packages
import torch
import torch.nn as nn
import numpy as np
from torch.nn.utils import clip_grad_norm_
from data_utils import Dictionary, Corpus

# data_utils.py is available at https://github.com/yunjey/pytorch-tutorial/blob/master/tutorials/02-intermediate/language_model/data_utils.py
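For reference, data_utils.py from that repository essentially builds a word-to-id dictionary and turns the training text into a single tensor of word ids. The following is a condensed, paraphrased sketch (see the URL above for the actual file); the important point is that get_data appends <eos> to every line and returns a tensor of shape (batch_size, num_tokens // batch_size).

# Condensed sketch of data_utils.py (paraphrased, not a verbatim copy)
class Dictionary(object):
    def __init__(self):
        self.word2idx = {}
        self.idx2word = {}
        self.idx = 0

    def add_word(self, word):
        if word not in self.word2idx:
            self.word2idx[word] = self.idx
            self.idx2word[self.idx] = word
            self.idx += 1

    def __len__(self):
        return len(self.word2idx)

class Corpus(object):
    def __init__(self):
        self.dictionary = Dictionary()

    def get_data(self, path, batch_size=20):
        # First pass: build the vocabulary, treating each line end as <eos>
        tokens = 0
        with open(path, 'r') as f:
            for line in f:
                words = line.split() + ['<eos>']
                tokens += len(words)
                for word in words:
                    self.dictionary.add_word(word)
        # Second pass: encode the whole file as a flat tensor of word ids
        ids = torch.LongTensor(tokens)
        token = 0
        with open(path, 'r') as f:
            for line in f:
                for word in line.split() + ['<eos>']:
                    ids[token] = self.dictionary.word2idx[word]
                    token += 1
        # Drop the tail that does not fit and reshape to (batch_size, -1)
        num_batches = ids.size(0) // batch_size
        ids = ids[:num_batches * batch_size]
        return ids.view(batch_size, -1)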

Parameter settings

# Device configuration
torch.cuda.set_device(1)  # select which GPU PyTorch runs on (only needed on multi-GPU machines)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Hyper-parameters
embed_size = 128
hidden_size = 1024
num_layers = 1
num_epochs = 5
num_samples = 1000     # number of words to be sampled
batch_size = 20
seq_length = 30
learning_rate = 0.002

The Penn Treebank dataset

corpus = Corpus()
ids = corpus.get_data('./data/Penn-Treebank/train.txt', batch_size)  # word ids, shape (batch_size, num_tokens // batch_size)
vocab_size = len(corpus.dictionary)
num_batches = ids.size(1) // seq_length

An RNN-based language model

class RNNLM(nn.Module):
    def __init__(self, vocab_size, embed_size, hidden_size, num_layers):
        super(RNNLM, self).__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, num_layers, batch_first=True)
        self.linear = nn.Linear(hidden_size, vocab_size)

    def forward(self, x, h):
        # Embed word ids to vectors
        x = self.embed(x)

        # Forward propagate LSTM
        out, (h, c) = self.lstm(x, h)

        # Reshape output to (batch_size*sequence_length, hidden_size)
        out = out.reshape(out.size(0)*out.size(1), out.size(2))

        # Decode hidden states of all time steps
        out = self.linear(out)
        return out, (h, c)

# Instantiate the model
model = RNNLM(vocab_size, embed_size, hidden_size, num_layers).to(device)
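As a quick sanity check of the tensor shapes (a sketch; dummy_input and dummy_state are hypothetical names, not part of the tutorial): with batch_first=True the embedded input is (batch_size, seq_length, embed_size), the LSTM output is (batch_size, seq_length, hidden_size), and after the reshape plus the linear layer the model returns one row of vocabulary logits per token.

# Shape sanity check (sketch)
dummy_input = torch.randint(0, vocab_size, (batch_size, seq_length)).to(device)
dummy_state = (torch.zeros(num_layers, batch_size, hidden_size).to(device),
               torch.zeros(num_layers, batch_size, hidden_size).to(device))
logits, _ = model(dummy_input, dummy_state)
print(logits.shape)  # (batch_size * seq_length, vocab_size), i.e. (600, vocab_size) here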

# Loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

# Truncated backpropagation through time:
# detach the hidden/cell states from the previous chunk's computation graph
def detach(states):
    return tuple(state.detach() for state in states)
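Why the detach? The hidden state is carried over from one 30-step chunk to the next, so without detaching, each backward() call would try to propagate through the computation graphs of all previous chunks as well (and fail, because those graphs are freed after their own backward pass). Detaching keeps the state values but cuts the graph at the chunk boundary, which is exactly truncated backpropagation through time. A minimal toy illustration of the effect (w and h below are made-up names, not part of the model):

# Toy illustration of truncated BPTT (sketch)
w = torch.randn(1, requires_grad=True)   # stands in for the model parameters
h = torch.zeros(1)                       # stands in for the carried-over hidden state
for chunk in range(3):
    h = h.detach()                       # cut the graph at the chunk boundary
    h = torch.tanh(w * h + 1.0)          # "recurrent" update inside the current chunk
    h.sum().backward()                   # gradients flow only through this chunk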

Train the model

for epoch in range(num_epochs):
    # Initialize the hidden state and cell state
    states = (torch.zeros(num_layers, batch_size, hidden_size).to(device),
              torch.zeros(num_layers, batch_size, hidden_size).to(device))

    for i in range(0, ids.size(1) - seq_length, seq_length):
        # Get mini-batch inputs and targets
        inputs = ids[:, i:i+seq_length].to(device)
        targets = ids[:, (i+1):(i+1)+seq_length].to(device)

        # Forward pass
        states = detach(states)  # truncate BPTT: carry the state values, drop the old graph
        outputs, states = model(inputs, states)
        loss = criterion(outputs, targets.reshape(-1))

        # Backward and optimize
        model.zero_grad()
        loss.backward()
        clip_grad_norm_(model.parameters(), 0.5)  # gradient clipping to prevent exploding gradients
        optimizer.step()

        step = (i+1) // seq_length
        if step % 100 == 0:
            print ('Epoch [{}/{}], Step[{}/{}], Loss: {:.4f}, Perplexity: {:5.2f}'.format(epoch+1, num_epochs, step, num_batches, loss.item(), np.exp(loss.item())))
Epoch [1/5], Step[0/1273], Loss: 9.2046, Perplexity: 9942.95
Epoch [1/5], Step[100/1273], Loss: 6.3046, Perplexity: 547.06
Epoch [1/5], Step[200/1273], Loss: 6.1742, Perplexity: 480.18
Epoch [1/5], Step[300/1273], Loss: 5.7204, Perplexity: 305.03
Epoch [1/5], Step[400/1273], Loss: 5.3257, Perplexity: 205.55
Epoch [1/5], Step[500/1273], Loss: 5.4088, Perplexity: 223.35
Epoch [1/5], Step[600/1273], Loss: 5.1798, Perplexity: 177.65
Epoch [1/5], Step[700/1273], Loss: 5.3890, Perplexity: 218.98
Epoch [1/5], Step[800/1273], Loss: 5.3407, Perplexity: 208.66
Epoch [1/5], Step[900/1273], Loss: 4.7600, Perplexity: 116.74
Epoch [1/5], Step[1000/1273], Loss: 5.4592, Perplexity: 234.90
Epoch [1/5], Step[1100/1273], Loss: 5.0615, Perplexity: 157.83
Epoch [1/5], Step[1200/1273], Loss: 5.2522, Perplexity: 190.99
Epoch [2/5], Step[0/1273], Loss: 5.2855, Perplexity: 197.46
Epoch [2/5], Step[100/1273], Loss: 4.8959, Perplexity: 133.73
Epoch [2/5], Step[200/1273], Loss: 4.9556, Perplexity: 141.96
Epoch [2/5], Step[300/1273], Loss: 4.5918, Perplexity: 98.67
Epoch [2/5], Step[400/1273], Loss: 4.2465, Perplexity: 69.86
Epoch [2/5], Step[500/1273], Loss: 4.5518, Perplexity: 94.80
Epoch [2/5], Step[600/1273], Loss: 4.3887, Perplexity: 80.54
Epoch [2/5], Step[700/1273], Loss: 4.3760, Perplexity: 79.52
Epoch [2/5], Step[800/1273], Loss: 4.5463, Perplexity: 94.28
Epoch [2/5], Step[900/1273], Loss: 3.9136, Perplexity: 50.08
Epoch [2/5], Step[1000/1273], Loss: 4.5064, Perplexity: 90.60
Epoch [2/5], Step[1100/1273], Loss: 4.2943, Perplexity: 73.28
Epoch [2/5], Step[1200/1273], Loss: 4.4425, Perplexity: 84.99
Epoch [3/5], Step[0/1273], Loss: 4.3017, Perplexity: 73.83
Epoch [3/5], Step[100/1273], Loss: 4.1055, Perplexity: 60.67
Epoch [3/5], Step[200/1273], Loss: 4.0845, Perplexity: 59.41
Epoch [3/5], Step[300/1273], Loss: 3.7977, Perplexity: 44.60
Epoch [3/5], Step[400/1273], Loss: 3.5935, Perplexity: 36.36
Epoch [3/5], Step[500/1273], Loss: 3.8631, Perplexity: 47.61
Epoch [3/5], Step[600/1273], Loss: 3.7142, Perplexity: 41.03
Epoch [3/5], Step[700/1273], Loss: 3.4882, Perplexity: 32.73
Epoch [3/5], Step[800/1273], Loss: 3.7708, Perplexity: 43.42
Epoch [3/5], Step[900/1273], Loss: 3.2133, Perplexity: 24.86
Epoch [3/5], Step[1000/1273], Loss: 3.6518, Perplexity: 38.54
Epoch [3/5], Step[1100/1273], Loss: 3.5773, Perplexity: 35.78
Epoch [3/5], Step[1200/1273], Loss: 3.6072, Perplexity: 36.86
Epoch [4/5], Step[0/1273], Loss: 3.4450, Perplexity: 31.34
Epoch [4/5], Step[100/1273], Loss: 3.4896, Perplexity: 32.77
Epoch [4/5], Step[200/1273], Loss: 3.4073, Perplexity: 30.18
Epoch [4/5], Step[300/1273], Loss: 3.1986, Perplexity: 24.50
Epoch [4/5], Step[400/1273], Loss: 3.1177, Perplexity: 22.59
Epoch [4/5], Step[500/1273], Loss: 3.2878, Perplexity: 26.78
Epoch [4/5], Step[600/1273], Loss: 3.2006, Perplexity: 24.55
Epoch [4/5], Step[700/1273], Loss: 2.9260, Perplexity: 18.65
Epoch [4/5], Step[800/1273], Loss: 3.1832, Perplexity: 24.12
Epoch [4/5], Step[900/1273], Loss: 2.6179, Perplexity: 13.71
Epoch [4/5], Step[1000/1273], Loss: 3.0230, Perplexity: 20.55
Epoch [4/5], Step[1100/1273], Loss: 3.0621, Perplexity: 21.37
Epoch [4/5], Step[1200/1273], Loss: 3.0464, Perplexity: 21.04
Epoch [5/5], Step[0/1273], Loss: 2.9219, Perplexity: 18.58
Epoch [5/5], Step[100/1273], Loss: 3.0574, Perplexity: 21.27
Epoch [5/5], Step[200/1273], Loss: 3.0433, Perplexity: 20.97
Epoch [5/5], Step[300/1273], Loss: 2.7347, Perplexity: 15.41
Epoch [5/5], Step[400/1273], Loss: 2.7218, Perplexity: 15.21
Epoch [5/5], Step[500/1273], Loss: 2.9437, Perplexity: 18.99
Epoch [5/5], Step[600/1273], Loss: 2.8465, Perplexity: 17.23
Epoch [5/5], Step[700/1273], Loss: 2.6238, Perplexity: 13.79
Epoch [5/5], Step[800/1273], Loss: 2.8717, Perplexity: 17.67
Epoch [5/5], Step[900/1273], Loss: 2.3243, Perplexity: 10.22
Epoch [5/5], Step[1000/1273], Loss: 2.6826, Perplexity: 14.62
Epoch [5/5], Step[1100/1273], Loss: 2.7305, Perplexity: 15.34
Epoch [5/5], Step[1200/1273], Loss: 2.7161, Perplexity: 15.12

Test the model (sampling)

with torch.no_grad():
    with open('sample.txt', 'w') as f:
        # Initialize the hidden state and cell state
        state = (torch.zeros(num_layers, 1, hidden_size).to(device),
                 torch.zeros(num_layers, 1, hidden_size).to(device))

        # Pick a random starting word id (uniformly over the vocabulary)
        prob = torch.ones(vocab_size)
        input = torch.multinomial(prob, num_samples=1).unsqueeze(1).to(device)

        for i in range(num_samples):
            # Forward propagate RNN 
            output, state = model(input, state)

            # Sample a word id
            prob = output.exp()  # unnormalized probabilities; torch.multinomial normalizes internally
            word_id = torch.multinomial(prob, num_samples=1).item()

            # Fill input with sampled word id for the next time step
            input.fill_(word_id)

            # File write
            word = corpus.dictionary.idx2word[word_id]
            word = '\n' if word == '<eos>' else word + ' '
            f.write(word)

            if (i+1) % 100 == 0:
                print('Sampled [{}/{}] words and save to {}'.format(i+1, num_samples, 'sample.txt'))
Sampled [100/1000] words and save to sample.txt
Sampled [200/1000] words and save to sample.txt
Sampled [300/1000] words and save to sample.txt
Sampled [400/1000] words and save to sample.txt
Sampled [500/1000] words and save to sample.txt
Sampled [600/1000] words and save to sample.txt
Sampled [700/1000] words and save to sample.txt
Sampled [800/1000] words and save to sample.txt
Sampled [900/1000] words and save to sample.txt
Sampled [1000/1000] words and save to sample.txt

Save the model

# Save the model checkpoint
torch.save(model.state_dict(), 'model.ckpt')
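To reuse the trained weights later, rebuild the model with the same hyper-parameters and restore the saved state dict (a minimal sketch):

# Restore the checkpoint into a freshly constructed model (sketch)
model = RNNLM(vocab_size, embed_size, hidden_size, num_layers).to(device)
model.load_state_dict(torch.load('model.ckpt', map_location=device))
model.eval()  # not strictly needed here (no dropout/batch-norm), but good practice before inference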