What Is Text Analytics Toolbox?
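Text Analytics Toolbox™ provides tools for extracting text from documents, preprocessing raw text, visualizing text, and performing machine learning on text data. The typical workflow begins by importing text data from documents, such as PDF and Microsoft® Word® files, and then extracting meaningful words from the data. After preprocessing the text, you can interact with the data in several ways, including converting the text to numeric representations and visualizing it with word clouds or scatter plots.
Features created with Text Analytics Toolbox can also be combined with features from other data sources to build machine learning models that use text, numeric, audio, and other types of data. You can import pretrained word embedding models, such as those in word2vec, fastText, and GloVe formats, to map the words in your dataset to their corresponding word vectors. You can also perform topic modeling and dimensionality reduction with machine learning algorithms such as latent Dirichlet allocation (LDA) and latent semantic analysis (LSA).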
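Mapping words to word vectors amounts to a dictionary lookup, after which vectors can be compared numerically. The following is a minimal plain-Python sketch of that idea, not the Toolbox's MATLAB API. The three-dimensional vectors in `EMBEDDINGS` are invented for illustration; a real pretrained model would be loaded from a word2vec, fastText, or GloVe file. Cosine similarity is one common way to compare the vectors, and skipping unknown words is a choice made only for this sketch.

```python
import math
from typing import Dict, List

# HYPOTHETICAL toy vectors, invented for illustration only. A real pretrained
# model (word2vec, fastText, or GloVe format) is loaded from a file and
# usually has hundreds of dimensions per word.
EMBEDDINGS: Dict[str, List[float]] = {
    "storm": [0.9, 0.1, 0.0],
    "hurricane": [0.8, 0.2, 0.1],
    "invoice": [0.0, 0.1, 0.9],
}


def word_vectors(tokens: List[str]) -> Dict[str, List[float]]:
    """Map each token to its vector; this sketch simply skips unknown words."""
    return {t: EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS}


def cosine_similarity(u: List[float], v: List[float]) -> float:
    """Cosine of the angle between u and v (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # Similarity is undefined for a zero vector; report 0.0.
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    vecs = word_vectors(["storm", "hurricane", "invoice", "tornado"])
    print(sorted(vecs))  # "tornado" is absent from the toy vocabulary
    print(cosine_similarity(vecs["storm"], vecs["hurricane"]))
    print(cosine_similarity(vecs["storm"], vecs["invoice"]))
```

With these toy numbers, "storm" scores far closer to "hurricane" than to "invoice". This is the kind of numeric relationship between words that embeddings are meant to capture.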
To get started transforming large sets of text data into meaningful insight, download a free trial of Text Analytics Toolbox.
You can use Text Analytics Toolbox to analyze data from sources such as maintenance reports, operations logs, financial documents, and web and social media sources.
You can extract raw text from a variety of sources, including Microsoft Word, Microsoft Excel, and PDF files. You can then use word clouds to view the relative frequency of words, and interactive scatter plots to understand the numeric relationships between words.
Text Analytics Toolbox provides functions for preprocessing raw text, such as removing common words and punctuation and tokenizing documents into individual words or phrases.
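The preprocessing steps described above (lowercasing, removing punctuation, tokenizing, and removing common words) can be sketched in a few lines of plain Python. This is an illustration of the general technique, not the Toolbox's MATLAB functions. The function name `preprocess` and the tiny `STOP_WORDS` list are hypothetical choices made for this example; real stop-word lists are much longer.

```python
import string
from typing import List

# HYPOTHETICAL, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "on", "is", "was"}


def preprocess(text: str) -> List[str]:
    """Lowercase, strip punctuation, tokenize on whitespace, drop stop words."""
    # Lowercase so "Storm" and "storm" are treated as the same word.
    text = text.lower()
    # Replace every ASCII punctuation character with a space.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    text = text.translate(table)
    # Split on runs of whitespace to get individual word tokens.
    tokens = text.split()
    # Remove common words that carry little meaning on their own.
    return [t for t in tokens if t not in STOP_WORDS]


if __name__ == "__main__":
    report = "The storm damaged power lines, and a tree fell on the road."
    print(preprocess(report))
```

Splitting on whitespace after replacing punctuation is the simplest possible tokenizer; it ignores cases such as contractions, hyphenated words, and URLs that a production tokenizer would need to handle.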
Once the text is preprocessed, converting it to numeric representations lets you perform further analysis and visualization to understand word frequencies, including:
- Histograms to compare word statistics
- Bag-of-words and n-gram models to enable efficient visualization and computation
- TF-IDF models for text mining and machine learning
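The numeric representations listed above can be illustrated with a short plain-Python sketch: a bag of words counts terms per document, n-grams group consecutive tokens, and TF-IDF down-weights terms that appear in many documents. This is not the Toolbox's MATLAB API. The helper names are hypothetical, and `tf_idf` uses the plain formulation idf(t) = log(N / df(t)); libraries differ in smoothing and normalization.

```python
import math
from collections import Counter
from typing import Dict, List


def ngrams(tokens: List[str], n: int) -> List[str]:
    """Join each run of n consecutive tokens into one space-separated n-gram."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(tokens: List[str]) -> Counter:
    """Count how many times each term occurs in one document."""
    return Counter(tokens)


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Weight each term by its raw count times log(N / document frequency).

    This is one unsmoothed TF-IDF variant; other tools may normalize or
    smooth the weights differently.
    """
    n_docs = len(docs)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = bag_of_words(doc)
        weights.append({
            term: count * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    docs = [
        ["storm", "damaged", "power", "lines"],
        ["storm", "flooded", "road"],
        ["power", "outage", "power", "lines"],
    ]
    print(bag_of_words(docs[2]))
    print(ngrams(docs[0], 2))
    # A bag of n-grams is just a bag of words built over n-gram tokens.
    print(bag_of_words(ngrams(docs[0], 2)))
    for weights in tf_idf(docs):
        print(weights)
```

A term that occurs in every document gets an IDF of log(1) = 0, so TF-IDF emphasizes words that distinguish one document from the others.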
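You can use statistics and machine learning algorithms with text analytics to perform topic modeling that identifies topics in documents, to classify documents, and to make predictions.
You can train machine learning models or use pretrained word embedding models such as word2vec, fastText, and GloVe.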
For example, the Latent Dirichlet Allocation (LDA) algorithm can be used to build a topic model with 60 topics from storm reports to identify damage and weather patterns.
When you have large sets of documents, you can also use deep learning algorithms to build accurate classifiers, and you can use parallel computing to speed up text processing and training.
For more information about Text Analytics Toolbox, see the product page.