join

Combine multiple bag-of-words or bag-of-n-grams models

Description

newBag = join(bag) combines the elements in the array bag by merging the frequency counts. The function combines the elements along the first dimension whose size is not equal to 1.

newBag = join(bag,dim) combines the elements in the array bag along the dimension dim.

Examples

Create an array of two bag-of-words models from tokenized documents.

str = [ ...
    "an example of a short sentence"
    "a second short sentence"];
documents = tokenizedDocument(str);
bag(1) = bagOfWords(documents(1));
bag(2) = bagOfWords(documents(2))
bag = 
  1x2 bagOfWords array with properties:
    Counts
    Vocabulary
    NumWords
    NumDocuments

Combine the bag-of-words models using join.

bag = join(bag)
bag = 
  bagOfWords with properties:
          Counts: [2x7 double]
      Vocabulary: ["an" "example" "of" "a" "short" ... ]
        NumWords: 7
    NumDocuments: 2
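The dim syntax described above can be tried on the same two documents. The following is a minimal sketch, assuming Text Analytics Toolbox is installed; the variable name newBag is chosen here for illustration. Because the array is 1-by-2, dimension 2 is the first dimension whose size is not 1, so passing dim = 2 explicitly should give the same result as the default call.

```matlab
% Build a 1-by-2 array of bag-of-words models, as in the example above.
str = [ ...
    "an example of a short sentence"
    "a second short sentence"];
documents = tokenizedDocument(str);
bag(1) = bagOfWords(documents(1));
bag(2) = bagOfWords(documents(2));

% Join explicitly along dimension 2 (the columns of the 1-by-2 array).
newBag = join(bag,2);

% The result has size 1 along the joined dimension and the same type as bag.
size(newBag)
class(newBag)
```

Joining along a dimension leaves a size of 1 in that dimension, so here the 1-by-2 array becomes a single bagOfWords object.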

If your text data is contained in multiple files in a folder, then you can import the text data and create a bag-of-words model in parallel using parfor. If you have Parallel Computing Toolbox™ installed, then the parfor loop runs in parallel; otherwise, it runs in serial. Use join to combine an array of bag-of-words models into one model.

Create a bag-of-words model from a collection of files. The example sonnets have file names "exampleSonnetN.txt", where N is the number of the sonnet. Get a list of the files and their locations using dir.

fileLocation = fullfile(matlabroot,'examples','textanalytics','data','exampleSonnet*.txt');
fileInfo = dir(fileLocation);

Initialize an empty bag-of-words model and then loop over the files and create an array of bag-of-words models.

bag = bagOfWords;
numFiles = numel(fileInfo);
parfor i = 1:numFiles
    f = fileInfo(i);
    filename = fullfile(f.folder,f.name);
    textData = extractFileText(filename);
    document = tokenizedDocument(textData);
    bag(i) = bagOfWords(document);
end
Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 4).

Combine the bag-of-words models using join.

bag = join(bag)
bag = 
  bagOfWords with properties:
          Counts: [4x276 double]
      Vocabulary: ["From" "fairest" "creatures" "we" ... ]
        NumWords: 276
    NumDocuments: 4

Input Arguments

Array of bag-of-words or bag-of-n-grams models, specified as a bagOfWords array or a bagOfNgrams array. If bag is a bagOfNgrams array, then each element to be joined must have the same value for the NgramLengths property.
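The NgramLengths requirement can be illustrated with a short sketch, assuming Text Analytics Toolbox is installed. The variable names ngramBag and newNgramBag and the choice of bigrams are illustrative only. Each model is created with the same 'NgramLengths' value so that the elements are compatible for joining.

```matlab
% Tokenize two short documents.
str = [ ...
    "an example of a short sentence"
    "a second short sentence"];
documents = tokenizedDocument(str);

% Create one bag-of-n-grams model per document, using bigrams in both
% so that the NgramLengths property matches across the array.
ngramBag(1) = bagOfNgrams(documents(1),'NgramLengths',2);
ngramBag(2) = bagOfNgrams(documents(2),'NgramLengths',2);

% Confirm that the n-gram lengths agree before joining.
sameLengths = isequal(ngramBag(1).NgramLengths,ngramBag(2).NgramLengths);

% Combine the models into a single bagOfNgrams object.
newNgramBag = join(ngramBag);
```

Checking the property before the call is a defensive step for arrays assembled from models built elsewhere, where the n-gram lengths may differ.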

Dimension along which to join models, specified as a positive integer. If dim is not specified, then the default is the first dimension with a size not equal to 1.

Output Arguments

Output model, returned as a bagOfWords object or a bagOfNgrams object. newBag has the same type as bag and has a size of 1 along the dimension being joined.

Version History

Introduced in R2018a