removeInfrequentWords

Remove words with low counts from bag-of-words model

Syntax

newBag = removeInfrequentWords(bag,count)

newBag = removeInfrequentWords(bag,count,'IgnoreCase',true)

Description

newBag = removeInfrequentWords(bag,count) removes the words that appear at most count times in total from the bag-of-words model bag. By default, the function is case sensitive.

newBag = removeInfrequentWords(bag,count,'IgnoreCase',true) removes the words that appear at most count times, ignoring case. If words differ only in case, their counts are added together before the comparison with count.
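
The difference between the two syntaxes can be illustrated with a small sketch. The documents below are hypothetical and chosen so that some words differ only in case; they are not part of the original reference page. How the retained words are cased in the case-insensitive result is not stated here, so the sketch simply displays both vocabularies for comparison.

```matlab
% Hypothetical documents in which some words differ only in case.
documents = tokenizedDocument([
    "An example"
    "an Example"
    "a short example"]);
bag = bagOfWords(documents);

count = 1;

% Case-sensitive (default): "An", "an", "Example", "a", and "short" each
% appear once, so only "example" (2 occurrences) stays above the threshold.
bagSensitive = removeInfrequentWords(bag,count);

% Case-insensitive: "An"/"an" total 2 and "example"/"Example" total 3,
% so neither group is at or below the threshold.
bagInsensitive = removeInfrequentWords(bag,count,'IgnoreCase',true);

% Display the surviving vocabularies side by side.
disp(bagSensitive.Vocabulary)
disp(bagInsensitive.Vocabulary)
```

In this sketch, the case-sensitive call should keep fewer words than the case-insensitive call, because the summed counts lift "an" and "example" above the threshold.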

Examples

Remove Infrequent Words

Remove the words that appear two times or fewer from a bag-of-words model.

Create a bag-of-words model from an array of tokenized documents.

documents = tokenizedDocument([
    "an example of a short sentence"
    "a second short sentence"
    "another example"
    "a short example"]);
bag = bagOfWords(documents)

bag = 
  bagOfWords with properties:

          Counts: [4x8 double]
      Vocabulary: ["an"    "example"    "of"    "a"    "short"    ...    ]
        NumWords: 8
    NumDocuments: 4

Remove the words that appear two times or fewer from the bag-of-words model.

count = 2;
newBag = removeInfrequentWords(bag,count)

newBag = 
  bagOfWords with properties:

          Counts: [4x3 double]
      Vocabulary: ["example"    "a"    "short"]
        NumWords: 3
    NumDocuments: 4
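
To see why only three words remain, the total count of each word can be inspected directly. The following is a minimal sketch, assuming that summing the Counts property over documents gives the per-word totals that the function compares against count; the variable names totals and expectedKept are introduced here for illustration only.

```matlab
% Rebuild the example model.
documents = tokenizedDocument([
    "an example of a short sentence"
    "a second short sentence"
    "another example"
    "a short example"]);
bag = bagOfWords(documents);
count = 2;

% Total occurrences of each vocabulary word across all documents.
% full() converts the result to a dense row vector if Counts is sparse.
totals = full(sum(bag.Counts,1));

% Words with a total above the threshold are the ones expected to remain.
expectedKept = bag.Vocabulary(totals > count);

% Compare with the vocabulary returned by removeInfrequentWords.
newBag = removeInfrequentWords(bag,count);
tf = isequal(sort(newBag.Vocabulary),sort(expectedKept))
```

Here "example", "a", and "short" each appear three times, while "sentence" appears twice and the remaining words once, so the threshold of 2 removes five of the eight words.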

Input Arguments

`bag` - Input bag-of-words model
`bagOfWords` object

Input bag-of-words model, specified as a bagOfWords object.

`count` - Count threshold for word removal
positive integer

Count threshold for word removal, specified as a positive integer. The function removes the words that appear count times or fewer in total.

Version History

Introduced in R2017b

See Also