topicmodels: Topic Models
Topic models
- CRAN: http://cran.r-project.org/web/packages/topicmodels/index.html
- URL: http://datacube.wu.ac.at
- Vignettes:
> library(topicmodels)
> library(magrittr) # provides the %>% pipe used in the examples
> data("AssociatedPress")
Version: 0.2.3
Function | Summary |
---|---|
AssociatedPress | Associated Press data |
CTM | Correlated Topic Model |
LDA | Latent Dirichlet Allocation |
TopicModel-class | Virtual class "TopicModel" |
TopicModelcontrol-class | Different classes for controlling the estimation of topic models |
build_graph | Construct the adjacency matrix for a topic graph |
distHellinger | Compute Hellinger distance |
ldaformat2dtm | Transform data from and for use with the 'lda' package |
logLik,TopicModel-method | Methods for Function logLik |
perplexity | Methods for Function perplexity |
posterior,TopicModel,missing-method | Determine posterior probabilities |
topics | Extract most likely terms or topics. |
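
Two entries in the table, `distHellinger` and `perplexity`, correspond to quantities with compact standard definitions. They are given here in their usual textbook form, for two discrete distributions $p$ and $q$ and for a corpus $w$ in which $n^{(jd)}$ is the count of term $j$ in document $d$. Scaling conventions for the Hellinger distance vary, so treating the package as using exactly this normalization is an assumption.

$$
H(p, q) = \sqrt{\frac{1}{2} \sum_{i} \left( \sqrt{p_i} - \sqrt{q_i} \right)^2}
$$

$$
\mathrm{Perplexity}(w) = \exp\left\{ -\frac{\log p(w)}{\sum_{d} \sum_{j} n^{(jd)}} \right\}
$$

Lower perplexity means the model assigns higher probability to the observed term counts.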
AssociatedPress
> data("AssociatedPress")
> AssociatedPress %>% {
+ print(.)
+ class(.)
+ }
<<DocumentTermMatrix (documents: 2246, terms: 10473)>>
Non-/sparse entries: 302031/23220327
Sparsity : 99%
Maximal term length: 18
Weighting : term frequency (tf)
[1] "DocumentTermMatrix" "simple_triplet_matrix"
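
Because the object is a sparse document-term matrix, it can be subset by row like an ordinary matrix, which is what the model examples in these notes do with `AssociatedPress[1:20,]`. A minimal sketch of inspecting it follows; the variable name `ap_small` is an assumption introduced here for illustration.

```r
library(topicmodels)
data("AssociatedPress")

# Number of documents (rows) and terms (columns)
dim(AssociatedPress)

# Keep only the first 20 documents to make model fitting fast
ap_small <- AssociatedPress[1:20, ]
dim(ap_small)

# Column names are the vocabulary terms
head(colnames(AssociatedPress))
```

Row subsetting keeps the full vocabulary, so `ap_small` still has one column per term.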
CTM
Correlated topic model
Arguments
- x
- k
- method
- control
- model
- ...
> ctm <- CTM(AssociatedPress[1:20,], k = 2)
> get_terms(ctm, 6)
Topic 1 Topic 2
[1,] "percent" "i"
[2,] "year" "police"
[3,] "oil" "bush"
[4,] "noriega" "campaign"
[5,] "gas" "magellan"
[6,] "peres" "mrs"
LDA
Latent Dirichlet allocation
Arguments
- x
- k
- method: VEM (variational EM) or Gibbs (Gibbs sampling)
- model
- ...
> (lda <- LDA(AssociatedPress[1:20,], control = list(alpha = 0.1), k = 2))
A LDA_VEM topic model with 2 topics.
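
The fit above uses the default VEM method. As a hedged sketch of the Gibbs alternative listed under `method`, the same subset can be fitted with a fixed seed so that repeated runs give the same result. The control values (`seed`, `burnin`, `iter`) and the names `ap_small` and `lda_gibbs` are illustrative assumptions, not tuned settings.

```r
library(topicmodels)
data("AssociatedPress")

ap_small <- AssociatedPress[1:20, ]

# Fit LDA by Gibbs sampling; the seed makes the sampler reproducible
lda_gibbs <- LDA(
  ap_small,
  k = 2,
  method = "Gibbs",
  control = list(seed = 1, burnin = 100, iter = 200)
)

# The fitted object's class reflects the estimation method
class(lda_gibbs)

# Most likely terms per topic, as with the VEM fit
terms(lda_gibbs, 4)
```

Term rankings from the Gibbs fit need not match those of the VEM fit, since the two methods estimate the model differently.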
terms / get_terms
> LDA(AssociatedPress[1:20,], control = list(alpha = 0.1), k = 2) %>% terms(4)
Topic 1 Topic 2
[1,] "year" "percent"
[2,] "bush" "oil"
[3,] "i" "noriega"
[4,] "campaign" "i"
topics / get_topics
> CTM(AssociatedPress[1:20,], k = 2) %>% topics()
[1] 2 2 1 1 2 1 2 2 2 2 1 1 1 2 1 2 1 1 2 1
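
`topics()` reports a single most likely topic per document. The underlying per-document topic probabilities are available through `posterior()`, listed in the function table above. The sketch below uses an LDA fit with an arbitrary seed to show how the two relate; the names `ap_small`, `post`, and `hard` are assumptions for illustration.

```r
library(topicmodels)
data("AssociatedPress")

ap_small <- AssociatedPress[1:20, ]
lda <- LDA(ap_small, k = 2, control = list(seed = 1))

# posterior() returns a list with two matrices:
#   $topics: one row per document, one column per topic
#   $terms:  one row per topic, one column per term
post <- posterior(lda)
dim(post$topics)

# Each document's topic probabilities sum to 1
rowSums(post$topics)

# Picking the highest-probability topic per document
# gives the same assignment that topics() reports
hard <- apply(post$topics, 1, which.max)
all(unname(hard) == unname(topics(lda)))
```

Keeping the full probabilities is useful when a document is split between topics, which the single label from `topics()` hides.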