topicmodels: Topic Models

Topic models

CRAN: http://cran.r-project.org/web/packages/topicmodels/index.html
URL: http://datacube.wu.ac.at
Vignettes:
- topicmodels: An R Package for Fitting Topic Models

> library(topicmodels)
> library(magrittr)  # provides the %>% pipe used in later examples
> data("AssociatedPress")

Version: 0.2.3

Function	Summary
`AssociatedPress`	Associated Press data
`CTM`	Correlated Topic Model
`LDA`	Latent Dirichlet Allocation
`TopicModel-class`	Virtual class "TopicModel"
`TopicModelcontrol-class`	Different classes for controlling the estimation of topic models
`build_graph`	Construct the adjacency matrix for a topic graph
`distHellinger`	Compute Hellinger distance
`ldaformat2dtm`	Transform data from and for use with the 'lda' package
`logLik,TopicModel-method`	Methods for Function logLik
`perplexity`	Methods for Function perplexity
`posterior,TopicModel,missing-method`	Determine posterior probabilities
`topics`	Extract most likely terms or topics.
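
The `logLik` and `perplexity` entries above can be tried on a small fit. The following is a minimal sketch, assuming a VEM-fitted LDA model; the names `ap_train`, `ap_test`, and `lda_fit`, the row ranges, and the seed value are illustrative choices, not part of the package.

```r
library(topicmodels)
data("AssociatedPress")

# Illustrative split: fit on 20 documents, evaluate on 20 others.
ap_train <- AssociatedPress[1:20, ]
ap_test  <- AssociatedPress[21:40, ]

# Fixed seed (arbitrary value) so that repeated runs are comparable.
lda_fit <- LDA(ap_train, k = 2, control = list(alpha = 0.1, seed = 1234))

ll       <- logLik(lda_fit)                        # log-likelihood of the fitted model
pp_train <- perplexity(lda_fit)                    # perplexity on the training data
pp_test  <- perplexity(lda_fit, newdata = ap_test) # perplexity on held-out documents

print(ll)
print(c(train = pp_train, test = pp_test))
```

Lower perplexity indicates a better fit; with so few documents the numbers are only useful as a smoke test.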

AssociatedPress

> data("AssociatedPress")
> AssociatedPress %>% {
+   print(.)
+   class(.)
+ }

<<DocumentTermMatrix (documents: 2246, terms: 10473)>>
Non-/sparse entries: 302031/23220327
Sparsity           : 99%
Maximal term length: 18
Weighting          : term frequency (tf)

[1] "DocumentTermMatrix"    "simple_triplet_matrix"
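
The examples in this document fit models on only the first 20 documents to keep run times short. A small sketch of how the matrix can be inspected and subset follows; `ap_small` is an illustrative name, and `slam::row_sums` is used on the assumption that the slam package (a dependency of topicmodels) is installed.

```r
library(topicmodels)
data("AssociatedPress")

# Dimensions of the full document-term matrix: documents x terms.
dim(AssociatedPress)

# Row subsetting keeps the full vocabulary and only reduces the documents.
ap_small <- AssociatedPress[1:20, ]
dim(ap_small)

# Total token count per document in the subset.
doc_lengths <- slam::row_sums(ap_small)
summary(doc_lengths)
```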

CTM

Correlated topic model

Arguments

x
k
method
control
model
...

> ctm <- CTM(AssociatedPress[1:20,], k = 2)
> get_terms(ctm, 6)

     Topic 1   Topic 2   
[1,] "percent" "i"       
[2,] "year"    "police"  
[3,] "oil"     "bush"    
[4,] "noriega" "campaign"
[5,] "gas"     "magellan"
[6,] "peres"   "mrs"

LDA

Latent Dirichlet allocation

Arguments

x
k
method: "VEM" (variational expectation-maximization) or "Gibbs" (Gibbs sampling)
control
model
...

> (lda <- LDA(AssociatedPress[1:20,], control = list(alpha = 0.1), k = 2))

A LDA_VEM topic model with 2 topics.
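
To illustrate the `method` and `control` arguments, here is a hedged sketch that fits the same data with both estimation methods. The seed, `burnin`, and `iter` values are arbitrary choices for a quick run, and `ap_small`, `lda_vem`, and `lda_gibbs` are illustrative names.

```r
library(topicmodels)
data("AssociatedPress")
ap_small <- AssociatedPress[1:20, ]

# VEM is the default method; a fixed seed is set so results can be compared across runs.
lda_vem <- LDA(ap_small, k = 2, method = "VEM",
               control = list(alpha = 0.1, seed = 1234))

# Gibbs sampling takes additional sampler settings (burn-in and iteration counts).
lda_gibbs <- LDA(ap_small, k = 2, method = "Gibbs",
                 control = list(alpha = 0.1, seed = 1234,
                                burnin = 100, iter = 200))

class(lda_vem)
class(lda_gibbs)
```

The class of the returned object reflects the estimation method, which matters because some functions behave differently for VEM and Gibbs fits.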

terms / get_terms

> LDA(AssociatedPress[1:20,], control = list(alpha = 0.1), k = 2) %>% terms(4)

     Topic 1    Topic 2  
[1,] "year"     "percent"
[2,] "bush"     "oil"    
[3,] "i"        "noriega"
[4,] "campaign" "i"
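
As a small check of what `terms` returns, the sketch below stores the result and compares it with `get_terms`. The seed value and the names `lda` and `top4` are illustrative.

```r
library(topicmodels)
data("AssociatedPress")

lda <- LDA(AssociatedPress[1:20, ], k = 2,
           control = list(alpha = 0.1, seed = 1234))

# A character matrix: one row per rank, one column per topic.
top4 <- terms(lda, 4)
dim(top4)
colnames(top4)

# get_terms is an alias for terms, so both calls give the same matrix.
identical(top4, get_terms(lda, 4))
```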

topics / get_topics

> CTM(AssociatedPress[1:20,], k = 2) %>% topics()

 [1] 2 2 1 1 2 1 2 2 2 2 1 1 1 2 1 2 1 1 2 1
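
The topic assignments returned by `topics` are derived from the per-document topic probabilities, which `posterior` exposes directly. A minimal sketch follows, using an LDA fit rather than CTM for simplicity; the seed and the names `lda`, `post`, and `by_hand` are illustrative.

```r
library(topicmodels)
data("AssociatedPress")

lda <- LDA(AssociatedPress[1:20, ], k = 2,
           control = list(alpha = 0.1, seed = 1234))

post <- posterior(lda)
dim(post$topics)  # documents x topics: per-document topic probabilities
dim(post$terms)   # topics x terms: per-topic term probabilities

# The most likely topic per document is the column with the largest probability.
by_hand <- apply(post$topics, 1, which.max)
all(unname(by_hand) == unname(topics(lda)))
```

Working from `post$topics` is useful when a hard assignment is too coarse, for example when a document is split almost evenly between two topics.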