Modeling and Prediction

Develop predictive models using topic models and word embeddings

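To find clusters and extract features from high-dimensional text datasets, you can use machine learning techniques and models such as LSA, LDA, and word embeddings. You can combine features created with Text Analytics Toolbox™ with features from other data sources. With these features, you can build machine learning models that take advantage of textual, numeric, and other types of data.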
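
As a rough illustration of that workflow, the sketch below fits a small LDA model and joins the resulting topic features with a numeric column. The corpus, the `numericFeature` values, and the choice of two topics are all made-up assumptions for illustration, not recommended settings.

```matlab
% Toy corpus; the text and the numeric column are made-up illustration data.
str = [
    "the pump failed after the coolant leak"
    "coolant leak caused the pump to overheat"
    "the invoice was sent to the customer"
    "customer paid the invoice late"];
numericFeature = [3.2; 4.1; 0.5; 0.7];   % hypothetical non-text feature

% Tokenize the text and count word occurrences per document.
documents = tokenizedDocument(str);
bag = bagOfWords(documents);

% Fit an LDA model; two topics is an arbitrary choice for this sketch.
numTopics = 2;
mdl = fitlda(bag, numTopics, 'Verbose', 0);

% Per-document topic mixtures serve as low-dimensional text features.
topicFeatures = transform(mdl, documents);

% Combine text-derived and numeric features into one predictor matrix.
X = [topicFeatures numericFeature];
```

The matrix `X` has one row per document and could then be passed to any classifier or regression function that accepts a numeric predictor matrix.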

Functions

Word and N-Gram Counting

`bagOfWords`	Bag-of-words model
`bagOfNgrams`	Bag-of-n-grams model
`addDocument`	Add documents to bag-of-words or bag-of-n-grams model
`removeDocument`	Remove documents from bag-of-words or bag-of-n-grams model
`removeInfrequentWords`	Remove words with low counts from bag-of-words model
`removeInfrequentNgrams`	Remove infrequently seen n-grams from bag-of-n-grams model
`removeWords`	Remove selected words from documents or bag-of-words model
`removeNgrams`	Remove n-grams from bag-of-n-grams model
`removeEmptyDocuments`	Remove empty documents from tokenized document array, bag-of-words model, or bag-of-n-grams model
`topkwords`	Most important words in bag-of-words model or LDA topic
`topkngrams`	Most frequent n-grams
`encode`	Encode documents as matrix of word or n-gram counts
`tfidf`	Term Frequency–Inverse Document Frequency (tf-idf) matrix
`join`	Combine multiple bag-of-words or bag-of-n-grams models

Sentiment Analysis

`vaderSentimentScores`	Sentiment scores with VADER algorithm
`ratioSentimentScores`	Sentiment scores with ratio rule

Word Embeddings and Encodings

`fastTextWordEmbedding`	Pretrained fastText word embedding
`wordEncoding`	Word encoding model to map words to indices and back
`doc2sequence`	Convert documents to sequences for deep learning
`wordEmbeddingLayer`	Word embedding layer for deep learning networks
`word2vec`	Map word to embedding vector
`word2ind`	Map word to encoding index
`vec2word`	Map embedding vector to word
`ind2word`	Map encoding index to word
`isVocabularyWord`	Test if word is member of word embedding or encoding
`readWordEmbedding`	Read word embedding from file
`trainWordEmbedding`	Train word embedding
`writeWordEmbedding`	Write word embedding file
`wordEmbedding`	Word embedding model to map words to vectors and back

Document Summarization and Similarity

`extractSummary`	Extract summary from documents
`rakeKeywords`	Extract keywords using RAKE
`textrankKeywords`	Extract keywords using TextRank
`bleuEvaluationScore`	Evaluate translation or summarization with BLEU similarity score
`rougeEvaluationScore`	Evaluate translation or summarization with ROUGE similarity score
`bm25Similarity`	Document similarities with BM25 algorithm
`cosineSimilarity`	Document similarities with cosine similarity
`textrankScores`	Document scoring with TextRank algorithm
`lexrankScores`	Document scoring with LexRank algorithm
`mmrScores`	Document scoring with Maximal Marginal Relevance (MMR) algorithm

Topic Modeling and Dimension Reduction

`fitlda`	Fit latent Dirichlet allocation (LDA) model
`fitlsa`	Fit LSA model
`resume`	Resume fitting LDA model
`logp`	Document log-probabilities and goodness of fit of LDA model
`predict`	Predict top LDA topics of documents
`transform`	Transform documents into lower-dimensional space
`ldaModel`	Latent Dirichlet allocation (LDA) model
`lsaModel`	Latent semantic analysis (LSA) model

Visualization

`wordcloud`	Create word cloud chart from text, bag-of-words model, bag-of-n-grams model, or LDA model
`textscatter`	2-D scatter plot of text
`textscatter3`	3-D scatter plot of text

Topics

Classification and Modeling

Create Simple Preprocessing Function
This example shows how to create a function which cleans and preprocesses text data for analysis.
Create Simple Text Model for Classification
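This example shows how to train a simple text classifier on word frequency counts using a bag-of-words model.
Analyze Text Data Using Multiword Phrases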
This example shows how to analyze text using n-gram frequency counts.
Analyze Text Data Using Topic Models
This example shows how to use the Latent Dirichlet Allocation (LDA) topic model to analyze text data.
Choose Number of Topics for LDA Model
This example shows how to decide on a suitable number of topics for a latent Dirichlet allocation (LDA) model.
Compare LDA Solvers
This example shows how to compare latent Dirichlet allocation (LDA) solvers by comparing the goodness of fit and the time taken to fit the model.
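Visualize Document Clusters Using LDA Model
This example shows how to visualize the clustering of documents using a Latent Dirichlet Allocation (LDA) topic model and a t-SNE plot.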
Visualize LDA Topic Correlations
This example shows how to analyze correlations between topics in a Latent Dirichlet Allocation (LDA) topic model.
Visualize Correlations Between LDA Topics and Document Labels
This example shows how to fit a Latent Dirichlet Allocation (LDA) topic model and visualize correlations between the LDA topics and document labels.
Create Co-occurrence Network
This example shows how to create a co-occurrence network using a bag-of-words model.

Sentiment Analysis and Keyword Extraction

Analyze Sentiment in Text
This example shows how to use the Valence Aware Dictionary and sEntiment Reasoner (VADER) algorithm for sentiment analysis.
Generate Domain Specific Sentiment Lexicon
This example shows how to generate a lexicon for sentiment analysis using 10-K and 10-Q financial reports.
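Train a Sentiment Classifier
This example shows how to train a classifier for sentiment analysis using an annotated list of positive and negative sentiment words and a pretrained word embedding.
Extract Keywords from Text Data Using RAKE
This example shows how to extract keywords from text data using Rapid Automatic Keyword Extraction (RAKE).
Extract Keywords from Text Data Using TextRank
This example shows how to extract keywords from text data using TextRank.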

Deep Learning

Classify Text Data Using Deep Learning
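This example shows how to classify text data using a deep learning long short-term memory (LSTM) network.
Classify Text Data Using Convolutional Neural Network
This example shows how to classify text data using a convolutional neural network.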
Classify Out-of-Memory Text Data Using Deep Learning
This example shows how to classify out-of-memory text data with a deep learning network using a transformed datastore.
Sequence-to-Sequence Translation Using Attention
This example shows how to convert decimal strings to Roman numerals using a recurrent sequence-to-sequence encoder-decoder model with attention.
Multilabel Text Classification Using Deep Learning
This example shows how to classify text data that has multiple independent labels.
Generate Text Using Deep Learning (Deep Learning Toolbox)
This example shows how to train a deep learning long short-term memory (LSTM) network to generate text.
Pride and Prejudice and MATLAB
This example shows how to train a deep learning LSTM network to generate text using character embeddings.
Word-By-Word Text Generation Using Deep Learning
This example shows how to train a deep learning LSTM network to generate text word-by-word.
Classify Text Data Using Custom Training Loop
This example shows how to classify text data using a deep learning bidirectional long short-term memory (BiLSTM) network with a custom training loop.
Generate Text Using Autoencoders
This example shows how to generate text data using autoencoders.
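Define Text Encoder Model Function
This example shows how to define a text encoder model function.
Define Text Decoder Model Function
This example shows how to define a text decoder model function.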
Language Translation Using Deep Learning
This example shows how to train a German to English language translator using a recurrent sequence-to-sequence encoder-decoder model with attention.

Language Support

Language Considerations
Information on using Text Analytics Toolbox features for other languages.
Japanese Language Support
Information on Japanese support in Text Analytics Toolbox.
Analyze Japanese Text Data
This example shows how to import, prepare, and analyze Japanese text data using a topic model.
German Language Support
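Information on German support in Text Analytics Toolbox.
Analyze German Text Data
This example shows how to import, prepare, and analyze German text data using a topic model.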

Featured Examples

Classify Text Data Using Deep Learning

Classify text data using a deep learning long short-term memory (LSTM) network.

Analyze Text Data Using Multiword Phrases

Analyze text using n-gram frequency counts.

Analyze Text Data Using Topic Models

Use the Latent Dirichlet Allocation (LDA) topic model to analyze text data.