
Word-by-Word Text Generation Using Deep Learning

This example shows how to train a deep learning LSTM network to generate text word by word.

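To train a deep learning network for word-by-word text generation, train a sequence-to-sequence LSTM network to predict the next word in a sequence of words. To train the network to predict the next word, specify the responses to be the input sequences shifted by one time step.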
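As a minimal sketch of what "shifted by one time step" means, the following base MATLAB snippet builds the predictor and response sequences for a single toy document. The sample words are hypothetical; the "startOfText" and "EndOfText" token names follow the ones that appear later in this example.

```matlab
% Hypothetical toy document (not taken from the training data).
words = ["Alice" "was" "tired" "."];

% Predictors: the document with a "start of text" token prepended.
predictors = ["startOfText" words];

% Responses: the same words shifted by one time step, closed by an
% "end of text" token so both sequences have the same length.
responses = [words "EndOfText"];

% At time step t the network receives predictors(t) and is trained to
% output responses(t), which is the word that follows predictors(t).
disp([predictors; responses]')
```

Each row of the displayed array pairs an input word with the word the network should predict next.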

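This example reads text from a website. It reads and parses the HTML code to extract the relevant text, then uses a custom mini-batch datastore, documentGenerationDatastore, to input the documents to the network as mini-batches of sequence data. The datastore converts the documents to sequences of numeric word indices. The deep learning network is an LSTM network that contains a word embedding layer.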

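A mini-batch datastore is an implementation of a datastore with support for reading data in batches. You can use a mini-batch datastore as a source of training, validation, test, and prediction data sets for deep learning applications. Use mini-batch datastores to read out-of-memory data or to perform specific preprocessing operations when reading batches of data.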

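You can adapt the custom mini-batch datastore specified by documentGenerationDatastore.m to your data by customizing its functions. For an example showing how to create your own custom mini-batch datastore, see Develop Custom Mini-Batch Datastore (Deep Learning Toolbox).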

Load Training Data

Load the training data. Read the HTML code of Alice's Adventures in Wonderland by Lewis Carroll from Project Gutenberg.

url = "https://www.gutenberg.org/files/11/11-h/11-h.htm";
code = webread(url);

Parse HTML Code

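The HTML code contains the relevant text inside <p> (paragraph) elements. Extract the relevant text by parsing the HTML code using htmlTree and then finding all the elements with the element name "p".

tree = htmlTree(code);
selector = "p";
subtrees = findElement(tree,selector);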

Extract the text data from the HTML subtrees using extractHTMLText and view the first 10 paragraphs.

textData = extractHTMLText(subtrees);
textData(1:10)

ans = 10×1 string array
    ""
    ""
    ""
    ""
    ""
    ""
    "Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, ‘and what is the use of a book,’ thought Alice ‘without pictures or conversations?’ "
    "So she was considering in her own mind (as well as she could, for the hot day made her feel very sleepy and stupid), whether the pleasure of making a daisy-chain would be worth the trouble of getting up and picking the daisies, when suddenly a White Rabbit with pink eyes ran close by her. "
    "There was nothing so very remarkable in that; nor did Alice think it so very much out of the way to hear the Rabbit say to itself, ‘Oh dear! Oh dear! I shall be late!’ (when she thought it over afterwards, it occurred to her that she ought to have wondered at this, but at the time it all seemed quite natural); but when the Rabbit actually took a watch out of its waistcoat-pocket, and looked at it, and then hurried on, Alice started to her feet, for it flashed across her mind that she had never before seen a rabbit with either a waistcoat-pocket, or a watch to take out of it, and burning with curiosity, she ran across the field after it, and fortunately was just in time to see it pop down a large rabbit-hole under the hedge. "
    "In another moment down went Alice after it, never once considering how in the world she was to get out again. "

Remove the empty paragraphs and view the first 10 paragraphs.

textData(textData == "") = [];
textData(1:10)

ans = 10×1 string array
    "Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, ‘and what is the use of a book,’ thought Alice ‘without pictures or conversations?’ "
    "So she was considering in her own mind (as well as she could, for the hot day made her feel very sleepy and stupid), whether the pleasure of making a daisy-chain would be worth the trouble of getting up and picking the daisies, when suddenly a White Rabbit with pink eyes ran close by her. "
    "There was nothing so very remarkable in that; nor did Alice think it so very much out of the way to hear the Rabbit say to itself, ‘Oh dear! Oh dear! I shall be late!’ (when she thought it over afterwards, it occurred to her that she ought to have wondered at this, but at the time it all seemed quite natural); but when the Rabbit actually took a watch out of its waistcoat-pocket, and looked at it, and then hurried on, Alice started to her feet, for it flashed across her mind that she had never before seen a rabbit with either a waistcoat-pocket, or a watch to take out of it, and burning with curiosity, she ran across the field after it, and fortunately was just in time to see it pop down a large rabbit-hole under the hedge. "
    "In another moment down went Alice after it, never once considering how in the world she was to get out again. "
    "The rabbit-hole went straight on like a tunnel for some way, and then dipped suddenly down, so suddenly that Alice had not a moment to think about stopping herself before she found herself falling down a very deep well. "
    "Either the well was very deep, or she fell very slowly, for she had plenty of time as she went down to look about her and to wonder what was going to happen next. First, she tried to look down and make out what she was coming to, but it was too dark to see anything; then she looked at the sides of the well, and noticed that they were filled with cupboards and book-shelves; here and there she saw maps and pictures hung upon pegs. She took down a jar from one of the shelves as she passed; it was labelled ‘ORANGE MARMALADE’, but to her great disappointment it was empty: she did not like to drop the jar for fear of killing somebody, so managed to put it into one of the cupboards as she fell past it. "
    "‘Well!’ thought Alice to herself, ‘after such a fall as this, I shall think nothing of tumbling down stairs! How brave they’ll all think me at home! Why, I wouldn’t say anything about it, even if I fell off the top of the house!’ (Which was very likely true.) "
    "Down, down, down. Would the fall never come to an end! ‘I wonder how many miles I’ve fallen by this time?’ she said aloud. ‘I must be getting somewhere near the centre of the earth. Let me see: that would be four thousand miles down, I think-’ (for, you see, Alice had learnt several things of this sort in her lessons in the schoolroom, and though this was not a very good opportunity for showing off her knowledge, as there was no one to listen to her, still it was good practice to say it over) ‘-yes, that’s about the right distance-but then I wonder what Latitude or Longitude I’ve got to?’ (Alice had no idea what Latitude was, or Longitude either, but thought they were nice grand words to say.) "
    "Presently she began again. ‘I wonder if I shall fall right through the earth! How funny it’ll seem to come out among the people that walk with their heads downward! The Antipathies, I think-’ (she was rather glad there was no one listening, this time, as it didn’t sound at all the right word) ‘-but I shall have to ask them what the name of the country is, you know. Please, Ma’am, is this New Zealand or Australia?’ (and she tried to curtsey as she spoke-fancy curtseying as you’re falling through the air! Do you think you could manage it?) ‘And what an ignorant little girl she’ll think me for asking! No, it’ll never do to ask: perhaps I shall see it written up somewhere.’ "
    "Down, down, down. There was nothing else to do, so Alice soon began talking again. ‘Dinah’ll miss me very much to-night, I should think!’ (Dinah was the cat.) ‘I hope they’ll remember her saucer of milk at tea-time. Dinah my dear! I wish you were down here with me! There are no mice in the air, I’m afraid, but you might catch a bat, and that’s very like a mouse, you know. But do cats eat bats, I wonder?’ And here Alice began to get rather sleepy, and went on saying to herself, in a dreamy sort of way, ‘Do cats eat bats? Do cats eat bats?’ and sometimes, ‘Do bats eat cats?’ for, you see, as she couldn’t answer either question, it didn’t much matter which way she put it. She felt that she was dozing off, and had just begun to dream that she was walking hand in hand with Dinah, and saying to her very earnestly, ‘Now, Dinah, tell me the truth: did you ever eat a bat?’ when suddenly, thump! thump! down she came upon a heap of sticks and dry leaves, and the fall was over. "

Visualize the text data in a word cloud.

figure
wordcloud(textData);
title("Alice's Adventures in Wonderland")

Prepare Data for Training

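Create a datastore that contains the data for training using documentGenerationDatastore. To create the datastore, first save the custom mini-batch datastore documentGenerationDatastore.m to the path. For the predictors, this datastore converts the documents into sequences of word indices using a word encoding. The first word index of each document corresponds to a "start of text" token, which is given by the string "startOfText". For the responses, the datastore returns categorical sequences of the words shifted by one.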
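To make the idea of a word encoding concrete, the snippet below maps a toy token sequence to word indices using only base MATLAB. It is an illustrative sketch, not the datastore's implementation; the tokens and the resulting vocabulary are made up for this illustration.

```matlab
% Hypothetical tokens of one short document.
tokens = ["the" "cat" "saw" "the" "hat"];

% Build a vocabulary in order of first appearance.
vocabulary = unique(tokens,'stable');

% Look up the index of every token in the vocabulary.
[~,wordIndices] = ismember(tokens,vocabulary);

disp(vocabulary)    % "the" "cat" "saw" "hat"
disp(wordIndices)   % 1 2 3 1 4
```

Repeated words share one index, so the network input is a sequence of integers that the word embedding layer can map to vectors.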

Tokenize the text data using tokenizedDocument.

documents = tokenizedDocument(textData);

Create a document generation datastore using the tokenized documents.

ds = documentGenerationDatastore(documents);

To reduce the amount of padding added to the sequences, sort the documents in the datastore by sequence length.

ds = sort(ds);
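The effect of sorting on padding can be checked with a small calculation. The sketch below assumes that every sequence in a mini-batch is padded to the length of the longest sequence in that mini-batch; the document lengths and the mini-batch size are hypothetical values chosen for illustration.

```matlab
% Hypothetical document lengths (in words) and mini-batch size.
lengths = [3 50 4 48];
miniBatchSize = 2;

% Unsorted: each column of the reshaped array is one mini-batch.
batches = reshape(lengths,miniBatchSize,[]);
pad = max(batches,[],1) - batches;      % padding per sequence
paddingUnsorted = sum(pad(:));

% Sorted by length: similar lengths end up in the same mini-batch.
batches = reshape(sort(lengths),miniBatchSize,[]);
pad = max(batches,[],1) - batches;
paddingSorted = sum(pad(:));

fprintf("Padding unsorted: %d, sorted: %d\n",paddingUnsorted,paddingSorted)
```

For these values the unsorted batches need 91 padding steps and the sorted batches need 3, which is why the training options later keep the sorted order.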

Create and Train LSTM Network

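Define the LSTM network architecture. To input sequence data into the network, include a sequence input layer and set the input size to 1. Next, include a word embedding layer of dimension 100 with the same number of words as the word encoding. Next, include an LSTM layer with a hidden size of 100, followed by a dropout layer with probability 0.2. Finally, add a fully connected layer with the same size as the number of classes, a softmax layer, and a classification layer. The number of classes is the number of words in the vocabulary plus one extra class for the "end of text" class.

inputSize = 1;
embeddingDimension = 100;
numWords = numel(ds.Encoding.Vocabulary);
numClasses = numWords + 1;

layers = [
    sequenceInputLayer(inputSize)
    wordEmbeddingLayer(embeddingDimension,numWords)
    lstmLayer(100)
    dropoutLayer(0.2)
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer];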

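Specify the training options. Specify the solver to be 'adam'. Train for 300 epochs with a learning rate of 0.01. Set the mini-batch size to 32. To keep the data sorted by sequence length, set the 'Shuffle' option to 'never'. To monitor the training progress, set the 'Plots' option to 'training-progress'. To suppress verbose output, set 'Verbose' to false.

options = trainingOptions('adam', ...
    'MaxEpochs',300, ...
    'InitialLearnRate',0.01, ...
    'MiniBatchSize',32, ...
    'Shuffle','never', ...
    'Plots','training-progress', ...
    'Verbose',false);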

Train the network using trainNetwork.

net = trainNetwork(ds,layers,options);

Generate New Text

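Generate the first word of the text by sampling a word from a probability distribution according to the first words of the text in the training data. Generate the remaining words by using the trained LSTM network to predict the next time step from the current sequence of generated text. Keep generating words one by one until the network predicts the "end of text" word.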

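To make the first prediction using the network, input the index that represents the "start of text" token. Find the index by using the word2ind function with the word encoding used by the document datastore.

enc = ds.Encoding;
wordIndex = word2ind(enc,"startOfText")

wordIndex = 1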

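For the remaining predictions, sample the next word according to the prediction scores of the network. The prediction scores represent the probability distribution of the next word. Sample the words from the vocabulary given by the class names of the output layer of the network.

vocabulary = string(net.Layers(end).Classes);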
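This example uses datasample for the sampling step. As a rough illustration of what sampling according to the scores means, the sketch below draws one word by inverse-CDF sampling in base MATLAB. This is a swapped-in technique for illustration only, and the three-word vocabulary and its scores are hypothetical rather than network output.

```matlab
% Hypothetical vocabulary and prediction scores (probabilities).
vocabulary = ["cat" "hat" "EndOfText"];
wordScores = [0.5 0.3 0.2];

% Cumulative distribution, normalized so the last entry is exactly 1.
cdf = cumsum(wordScores) / sum(wordScores);
cdf(end) = 1;

% Draw a uniform random number and pick the first word whose
% cumulative probability reaches it.
r = rand;
idx = find(r <= cdf,1,'first');
newWord = vocabulary(idx);

disp(newWord)
```

Words with higher scores cover a wider interval of the cumulative distribution, so they are drawn more often, while low-scoring words remain possible.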

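Make predictions word by word using predictAndUpdateState. For each prediction, input the index of the previous word. Stop predicting when the network predicts the "end of text" word or when the generated text is 500 characters long. For large collections of data, long sequences, or large networks, predictions on the GPU are usually faster to compute than predictions on the CPU. Otherwise, predictions on the CPU are usually faster. For single time step predictions, use the CPU. To use the CPU for prediction, set the 'ExecutionEnvironment' option of predictAndUpdateState to 'cpu'.

generatedText = "";
maxLength = 500;
while strlength(generatedText) < maxLength
    % Predict the next word scores.
    [net,wordScores] = predictAndUpdateState(net,wordIndex,'ExecutionEnvironment','cpu');

    % Sample the next word.
    newWord = datasample(vocabulary,1,'Weights',wordScores);

    % Stop predicting at the end of text.
    if newWord == "EndOfText"
        break
    end

    % Add the word to the generated text.
    generatedText = generatedText + " " + newWord;

    % Find the word index for the next input.
    wordIndex = word2ind(enc,newWord);
end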

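The generation process introduces a whitespace character between each prediction, which means that some punctuation characters appear with unnecessary spaces before or after them. Reconstruct the generated text by removing the spaces before and after the appropriate punctuation characters.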

Remove the spaces that appear before the specified punctuation characters.

punctuationCharacters = ["." "," ")" ":" "?" "!"];
generatedText = replace(generatedText," " + punctuationCharacters,punctuationCharacters);

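Remove the spaces that appear after the specified punctuation characters.

punctuationCharacters = ["("];
generatedText = replace(generatedText,punctuationCharacters + " ",punctuationCharacters)

generatedText = "'Sure, it's a good Turtle!' said the Queen in a low, weak voice."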
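To see the two cleanup steps on a fixed input, the sketch below applies them to a short made-up string in which every token is preceded by a space, as the generation loop produces. The input string and the final strip call are assumptions added for illustration.

```matlab
% Hypothetical generated text with a space before every token.
generatedText = " Alice said , ( softly ) : hello !";

% Remove the space before closing punctuation.
before = ["." "," ")" ":" "?" "!"];
generatedText = replace(generatedText," " + before,before);

% Remove the space after opening punctuation.
after = "(";
generatedText = replace(generatedText,after + " ",after);

% Optionally trim the leading space left by the generation loop.
generatedText = strip(generatedText);

disp(generatedText)   % Alice said, (softly): hello!
```

The string concatenation " " + before builds one search pattern per punctuation character, so a single replace call handles the whole list.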

To generate multiple pieces of text, reset the network state between generations using resetState.

net = resetState(net);
