Modeling and Prediction

Develop predictive models using topic models and word embeddings

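To find clusters and extract features from high-dimensional text datasets, you can use machine learning techniques and models such as LSA, LDA, and word embeddings. Features created with Text Analytics Toolbox™ can be combined with features from other data sources. With the combined features, you can build machine learning models that draw on text, numeric, and other types of data.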

Functions

Word and N-Gram Counting

`bagOfWords`	Bag-of-words model
`bagOfNgrams`	Bag-of-n-grams model
`addDocument`	Add documents to bag-of-words or bag-of-n-grams model
`removeDocument`	Remove documents from bag-of-words or bag-of-n-grams model
`removeInfrequentWords`	Remove words with low counts from bag-of-words model
`removeInfrequentNgrams`	Remove infrequently seen n-grams from bag-of-n-grams model
`removeWords`	Remove selected words from documents or bag-of-words model
`removeNgrams`	Remove n-grams from bag-of-n-grams model
`removeEmptyDocuments`	Remove empty documents from tokenized document array, bag-of-words model, or bag-of-n-grams model
`topkwords`	Most important words in bag-of-words model or LDA topic
`topkngrams`	Most frequent n-grams
`encode`	Encode documents as matrix of word or n-gram counts
`tfidf`	Term frequency-inverse document frequency (tf-idf) matrix
`join`	Combine multiple bag-of-words or bag-of-n-grams models
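
To make the counting functions listed above more concrete, here is a minimal Python sketch of the underlying ideas. It is not the toolbox implementation: it builds a bag-of-words count matrix from pre-tokenized documents and weights it with one common tf-idf variant (raw count times log(N / document frequency)). The helper names `bag_of_words` and `tf_idf` are hypothetical, and the toolbox functions may use different defaults and weighting options.

```python
import math
from collections import Counter


def bag_of_words(documents):
    """Return (vocabulary, counts) for a list of tokenized documents.

    counts[i][j] is how often vocabulary[j] occurs in documents[i].
    """
    vocabulary = sorted({word for doc in documents for word in doc})
    index = {word: j for j, word in enumerate(vocabulary)}
    counts = [[0] * len(vocabulary) for _ in documents]
    for i, doc in enumerate(documents):
        for word, n in Counter(doc).items():
            counts[i][index[word]] = n
    return vocabulary, counts


def tf_idf(counts):
    """Weight raw counts by log(N / df).

    N is the number of documents; df is the number of documents
    that contain the word at least once.
    """
    n_docs = len(counts)
    n_words = len(counts[0]) if counts else 0
    doc_freq = [sum(1 for row in counts if row[j] > 0) for j in range(n_words)]
    return [
        [row[j] * math.log(n_docs / doc_freq[j]) for j in range(n_words)]
        for row in counts
    ]


# Two tiny, already tokenized example documents (illustrative data only).
docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat", "sat"],
]
vocab, counts = bag_of_words(docs)
weights = tf_idf(counts)
```

In this sketch, words that occur in every document (here `the` and `sat`) receive a weight of zero, which is the usual motivation for preferring tf-idf weights over raw counts as model input.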

Sentiment Analysis

`vaderSentimentScores`	Sentiment scores with VADER algorithm
`ratioSentimentScores`	Sentiment scores with ratio rule

Word Embedding and Encoding

`fastTextWordEmbedding`	Pretrained fastText word embedding
`wordEncoding`	Word encoding model that maps words to indices
`doc2sequence`	Convert documents to sequences for deep learning
`wordEmbeddingLayer`	Word embedding layer for deep learning neural network
`word2vec`	Map word to embedding vector
`word2ind`	Map word to encoding index
`vec2word`	Map embedding vector to word
`ind2word`	Map encoding index to word
`isVocabularyWord`	Test if word is member of word embedding or encoding
`readWordEmbedding`	Read word embedding from file
`trainWordEmbedding`	Train word embedding
`writeWordEmbedding`	Write word embedding file
`wordEmbedding`	Word embedding model that maps words to vectors
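
As a rough illustration of what a word encoding does, the following Python sketch maps words to integer indices and back, then converts a tokenized document into a fixed-length index sequence. The names `build_encoding` and `to_sequence` are hypothetical, and the choices made here (1-based indices, dropping unknown words, left-padding with 0, keeping only the first `length` indices) are assumptions for the example rather than documented toolbox behavior.

```python
def build_encoding(documents):
    """Assign each distinct word a 1-based index in order of first appearance."""
    word_to_index = {}
    for doc in documents:
        for word in doc:
            if word not in word_to_index:
                word_to_index[word] = len(word_to_index) + 1
    index_to_word = {i: w for w, i in word_to_index.items()}
    return word_to_index, index_to_word


def to_sequence(doc, word_to_index, length):
    """Encode one document as a list of exactly `length` indices.

    Unknown words are dropped, long documents are cut to their first
    `length` known words, and short ones are left-padded with 0.
    """
    indices = [word_to_index[w] for w in doc if w in word_to_index][:length]
    return [0] * (length - len(indices)) + indices


# Illustrative data only.
training_docs = [["a", "b", "a"], ["c"]]
word_to_index, index_to_word = build_encoding(training_docs)

padded = to_sequence(["c", "a", "zzz"], word_to_index, 4)
truncated = to_sequence(["a", "b", "c", "a"], word_to_index, 2)
```

Index sequences of equal length are the kind of input that an embedding layer in a sequence model typically expects.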

Document Summarization and Similarity

`extractSummary`	Extract summary from documents
`rakeKeywords`	Extract keywords using RAKE
`textrankKeywords`	Extract keywords using TextRank
`bleuEvaluationScore`	Evaluate translation or summarization with BLEU similarity score
`rougeEvaluationScore`	Evaluate translation or summarization with ROUGE similarity score
`bm25Similarity`	Document similarities with BM25 algorithm
`cosineSimilarity`	Document similarities with cosine similarity
`textrankScores`	Document scoring with TextRank algorithm
`lexrankScores`	Document scoring with LexRank algorithm
`mmrScores`	Document scoring with Maximal Marginal Relevance (MMR) algorithm
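
Several of the similarity functions listed above compare documents as vectors. As a minimal sketch of the idea behind cosine similarity, the following Python function computes the cosine of the angle between two word-count vectors. The function name and the convention of returning 0.0 for an all-zero vector are assumptions made for this example, not a description of how the toolbox computes its scores.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors.

    Returns 0.0 if either vector is all zeros (a convention chosen here).
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Word-count vectors for two short documents over a shared vocabulary
# (illustrative data only).
doc_a = [1, 0, 1, 1]
doc_b = [0, 1, 2, 1]
similarity = cosine_similarity(doc_a, doc_b)
```

Because the score depends only on the direction of the vectors, a long document and a short one with the same word proportions are treated as highly similar.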

Topic Modeling and Dimensionality Reduction

`fitlda`	Fit latent Dirichlet allocation (LDA) model
`fitlsa`	Fit LSA model
`resume`	Resume fitting LDA model
`logp`	Document log-probabilities and goodness of fit of LDA model
`predict`	Predict top LDA topics of documents
`transform`	Transform documents into lower-dimensional space
`ldaModel`	Latent Dirichlet allocation (LDA) model
`lsaModel`	Latent semantic analysis (LSA) model

Visualization

`wordcloud`	Create word cloud chart from text, bag-of-words model, bag-of-n-grams model, or LDA model
`textscatter`	2-D scatter plot of text
`textscatter3`	3-D scatter plot of text

Topics

Classification and Modeling

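Create Simple Preprocessing Function
This example shows how to create a function that cleans and preprocesses text data for analysis.
Create Simple Text Model for Classification
This example shows how to train a simple text classifier on word frequency counts using a bag-of-words model.
Analyze Text Data Using Multiword Phrases
This example shows how to analyze text using n-gram frequency counts.
Analyze Text Data Using Topic Models
This example shows how to analyze text data using a latent Dirichlet allocation (LDA) topic model.
Choose Number of Topics for LDA Model
This example shows how to decide on a suitable number of topics for a latent Dirichlet allocation (LDA) model.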
Compare LDA Solvers
This example shows how to compare latent Dirichlet allocation (LDA) solvers by comparing the goodness of fit and the time taken to fit the model.
Visualize Document Clusters Using LDA Model
This example shows how to visualize the clustering of documents using a Latent Dirichlet Allocation (LDA) topic model and a t-SNE plot.
Visualize LDA Topic Correlations
This example shows how to analyze correlations between topics in a Latent Dirichlet Allocation (LDA) topic model.
Visualize Correlations Between LDA Topics and Document Labels
This example shows how to fit a Latent Dirichlet Allocation (LDA) topic model and visualize correlations between the LDA topics and document labels.
Create Co-occurrence Network
This example shows how to create a co-occurrence network using a bag-of-words model.

Sentiment Analysis and Keyword Extraction

Analyze Sentiment in Text
This example shows how to perform sentiment analysis using the Valence Aware Dictionary and sEntiment Reasoner (VADER) algorithm.
Generate Domain Specific Sentiment Lexicon
This example shows how to generate a lexicon for sentiment analysis using 10-K and 10-Q financial reports.
Train a Sentiment Classifier
This example shows how to train a classifier for sentiment analysis using an annotated list of positive and negative sentiment words and a pretrained word embedding.
Extract Keywords from Text Data Using RAKE
This example shows how to extract keywords from text data using Rapid Automatic Keyword Extraction (RAKE).
Extract Keywords from Text Data Using TextRank
This example shows how to extract keywords from text data using TextRank.

Deep Learning

Classify Text Data Using Deep Learning
This example shows how to classify text data using a deep learning long short-term memory (LSTM) network.
Classify Text Data Using Convolutional Neural Network
This example shows how to classify text data using a convolutional neural network.
Classify Out-of-Memory Text Data Using Deep Learning
This example shows how to classify out-of-memory text data with a deep learning network using a transformed datastore.
Sequence-to-Sequence Translation Using Attention
This example shows how to convert decimal strings to Roman numerals using a recurrent sequence-to-sequence encoder-decoder model with attention.
Multilabel Text Classification Using Deep Learning
This example shows how to classify text data that has multiple independent labels.
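Generate Text Using Deep Learning (Deep Learning Toolbox)
This example shows how to train a deep learning long short-term memory (LSTM) network to generate text.
Pride and Prejudice and MATLAB
This example shows how to train a deep learning LSTM network to generate text using character embeddings.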
Word-By-Word Text Generation Using Deep Learning
This example shows how to train a deep learning LSTM network to generate text word-by-word.
Classify Text Data Using Custom Training Loop
This example shows how to classify text data using a deep learning bidirectional long short-term memory (BiLSTM) network with a custom training loop.
Generate Text Using Autoencoders
This example shows how to generate text data using autoencoders.
Define Text Encoder Model Function
This example shows how to define a text encoder model function.
Define Text Decoder Model Function
This example shows how to define a text decoder model function.
Language Translation Using Deep Learning
This example shows how to train a German to English language translator using a recurrent sequence-to-sequence encoder-decoder model with attention.

Language Support

Language Considerations
Information on using Text Analytics Toolbox features for other languages.
Japanese Language Support
Information on Japanese support in Text Analytics Toolbox.
Analyze Japanese Text Data
This example shows how to import, prepare, and analyze Japanese text data using a topic model.
German Language Support
Information on German support in Text Analytics Toolbox.
Analyze German Text Data
This example shows how to import, prepare, and analyze German text data using a topic model.