topicmodels: Topic Models

Topic models.

> library(topicmodels)
> library(magrittr)  # for the %>% pipe used below
> data("AssociatedPress")

Version: 0.2.3


Function Description
AssociatedPress Associated Press data
CTM Correlated Topic Model
LDA Latent Dirichlet Allocation
TopicModel-class Virtual class "TopicModel"
TopicModelcontrol-class Different classes for controlling the estimation of topic models
build_graph Construct the adjacency matrix for a topic graph
distHellinger Compute Hellinger distance
ldaformat2dtm Transform data from and for use with the 'lda' package
logLik,TopicModel-method Methods for Function logLik
perplexity Methods for Function perplexity
posterior,TopicModel,missing-method Determine posterior probabilities
topics Extract most likely terms or topics.

AssociatedPress

> data("AssociatedPress")
> AssociatedPress %>% {
+   print(.)
+   class(.)
+ }
<<DocumentTermMatrix (documents: 2246, terms: 10473)>>
Non-/sparse entries: 302031/23220327
Sparsity           : 99%
Maximal term length: 18
Weighting          : term frequency (tf)
[1] "DocumentTermMatrix"    "simple_triplet_matrix"
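As the class output shows, a DocumentTermMatrix is stored as a simple_triplet_matrix from the slam package, so the 99%-sparse counts are never held as a dense matrix. A minimal sketch of inspecting it, densifying only a small slice (row_sums/col_sums are slam functions):

```r
library(topicmodels)
library(slam)
data("AssociatedPress")

# Densify only a small document/term slice for inspection
sub <- AssociatedPress[1:3, 1:5]
as.matrix(sub)

# Totals computed directly on the sparse representation
head(row_sums(AssociatedPress))  # tokens per document
head(col_sums(AssociatedPress))  # corpus frequency per term
```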

CTM

Correlated topic model.

Arguments

  • x ... document-term matrix
  • k ... number of topics
  • method ... estimation method (currently "VEM")
  • control ... list of control parameters
  • model ... fitted model to initialize from (optional)
  • ...
> ctm <- CTM(AssociatedPress[1:20,], k = 2)
> get_terms(ctm, 6)
     Topic 1   Topic 2   
[1,] "percent" "i"       
[2,] "year"    "police"  
[3,] "oil"     "bush"    
[4,] "noriega" "campaign"
[5,] "gas"     "magellan"
[6,] "peres"   "mrs"
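get_terms only reads off the top words; the full fitted distributions come from posterior() (listed in the function table). A sketch that re-fits the same small model so it is self-contained; estimation is iterative, so exact values vary between runs:

```r
library(topicmodels)
data("AssociatedPress")
ctm <- CTM(AssociatedPress[1:20, ], k = 2)

post <- posterior(ctm)
dim(post$terms)       # topics x terms matrix: 2 x 10473
dim(post$topics)      # documents x topics matrix: 20 x 2
rowSums(post$topics)  # each row is a probability distribution (sums to 1)
```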

LDA

Latent Dirichlet allocation.

Arguments

  • x ... document-term matrix
  • k ... number of topics
  • method ... "VEM" (variational EM) or "Gibbs" (Gibbs sampling)
  • control ... list of control parameters (e.g. alpha)
  • model ... fitted model to initialize from (optional)
  • ...
> (lda <- LDA(AssociatedPress[1:20,], control = list(alpha = 0.1), k = 2))
A LDA_VEM topic model with 2 topics.
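The fit can be assessed with logLik and perplexity from the function table: logLik gives the (approximate) log-likelihood, perplexity its per-word transform, where lower values on held-out documents indicate a better model. A sketch that re-fits the same model so it is self-contained:

```r
library(topicmodels)
data("AssociatedPress")
lda <- LDA(AssociatedPress[1:20, ], control = list(alpha = 0.1), k = 2)

logLik(lda)                                # approximate log-likelihood
perplexity(lda)                            # on the training documents
perplexity(lda, AssociatedPress[21:30, ])  # on held-out documents
```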

terms / get_terms

> LDA(AssociatedPress[1:20,], control = list(alpha = 0.1), k = 2) %>% terms(4)
     Topic 1    Topic 2  
[1,] "year"     "percent"
[2,] "bush"     "oil"    
[3,] "i"        "noriega"
[4,] "campaign" "i"
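terms() orders words by their probability in each topic-term distribution; distHellinger (also in the function table) quantifies how far apart those distributions are. A sketch, re-fitting the model so the snippet stands alone:

```r
library(topicmodels)
data("AssociatedPress")
lda <- LDA(AssociatedPress[1:20, ], control = list(alpha = 0.1), k = 2)

beta <- posterior(lda)$terms  # 2 x 10473 topic-term probabilities
# Pairwise Hellinger distances between the rows (topics):
# the diagonal is 0, the off-diagonal entries measure topic dissimilarity
distHellinger(beta, beta)
```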

topics / get_topics

> CTM(AssociatedPress[1:20,], k = 2) %>% topics()
 [1] 2 2 1 1 2 1 2 2 2 2 1 1 1 2 1 2 1 1 2 1
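topics() assigns each document its most probable topic, i.e. the argmax over the rows of the posterior document-topic matrix. A sketch checking that correspondence (the model is re-fitted here, so the labels may differ from the output above):

```r
library(topicmodels)
data("AssociatedPress")
ctm <- CTM(AssociatedPress[1:20, ], k = 2)

g <- posterior(ctm)$topics      # 20 x 2 document-topic probabilities
most <- apply(g, 1, which.max)  # argmax per document
all(most == topics(ctm))        # agrees with topics()
```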