lda: Collapsed Gibbs Sampling Methods for Topic Models
Topic models
> library(lda)
> data("cora.documents")
> data("cora.vocab")
> data("cora.cites")
> data("cora.titles")
Version: 1.4.2
Function | Description |
---|---|
cora | A subset of the Cora dataset of scientific documents. |
filter.words | Functions to manipulate text corpora in LDA format. |
lda-package | Collapsed Gibbs samplers and related utility functions for LDA-type models. |
lda.collapsed.gibbs.sampler | Functions to fit LDA-type models. |
lexicalize | Generate LDA documents from raw text. |
links.as.edgelist | Convert a set of links keyed on source to a single list of edges. |
newsgroup | A collection of newsgroup messages with classes. |
nubbi.collapsed.gibbs.sampler | Collapsed Gibbs sampling for the Networks Uncovered By Bayesian Inference (NUBBI) model. |
poliblog | A collection of political blogs with ratings. |
predictive.distribution | Compute predictive distributions for fitted LDA-type models. |
predictive.link.probability | Use the RTM to predict whether a link exists between two documents. |
read.documents | Read LDA-formatted document and vocabulary files. |
rtm.collapsed.gibbs.sampler | Collapsed Gibbs sampling for the Relational Topic Model (RTM). |
sampson | Sampson monk data. |
slda.predict | Predict the response variable of documents using an sLDA model. |
top.topic.words | Get the top words and documents in each topic. |
word.counts | Compute summary statistics of a corpus. |
cora
Metadata on scientific papers collected with the Cora search engine.
> data("cora.documents") # documents as word-index/count matrices
> data("cora.vocab")     # vocabulary
> data("cora.cites")     # citation links between papers
> data("cora.titles")    # paper titles
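The four `cora` objects fit together as follows; the sketch below assumes only the package's documented document format (each document is a 2-row integer matrix of 0-based vocabulary indices and counts):

```r
library(lda)
data("cora.documents")
data("cora.vocab")
data("cora.titles")

# Each document is a 2 x N integer matrix: row 1 holds 0-based indices
# into cora.vocab, row 2 holds the corresponding word counts.
doc1 <- cora.documents[[1]]
words1 <- cora.vocab[doc1[1, ] + 1]  # +1: convert to R's 1-based indexing
head(words1)    # the first few words of the first paper
cora.titles[1]  # its title
```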
lda.collapsed.gibbs.sampler
Fit models based on latent Dirichlet allocation (LDA)
Arguments
- documents
- network
- K
- vocab
- num.iterations
- num.e.iterations
- num.m.iterations
- alpha
- beta.prior
- eta
- initial
- burnin
- compute.log.likelihood
- annotations
- params
- variance
- logistic
- lambda
- regularise
- method
- trace
- MaxNWts
- freeze.topics
> lda.collapsed.gibbs.sampler(documents, K, vocab, num.iterations, alpha,
+ eta, initial = NULL, burnin = NULL, compute.log.likelihood = FALSE,
+ trace = 0L, freeze.topics = FALSE)
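Putting the signature above to work, this sketch fits the cora corpus and inspects the result with `top.topic.words`. `K = 10`, `alpha = 0.1`, `eta = 0.1`, and 25 iterations are illustrative choices, not recommendations from the package:

```r
library(lda)
data("cora.documents")
data("cora.vocab")

set.seed(1)  # collapsed Gibbs sampling is stochastic
fit <- lda.collapsed.gibbs.sampler(cora.documents, K = 10, cora.vocab,
                                   num.iterations = 25,
                                   alpha = 0.1, eta = 0.1,
                                   compute.log.likelihood = TRUE)

# fit$topics is a K x length(vocab) matrix of per-topic word counts;
# top.topic.words extracts the highest-scoring words for each topic.
top.topic.words(fit$topics, num.words = 5, by.score = TRUE)
```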
lexicalize
Generate LDA documents from character objects
Arguments
- doclines
- sep
- lower
- count
- vocab
> lexicalize("吾輩 は 猫 で ある 。 名前 は まだ ない 。")
$documents
$documents[[1]]
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
[1,] 0 1 2 3 4 5 6 1 7 8 5
[2,] 1 1 1 1 1 1 1 1 1 1 1
$vocab
[1] "吾輩" "は" "猫" "で" "ある" "。" "名前" "まだ" "ない"
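The list returned by `lexicalize` plugs directly into the package's other functions; for example, `word.counts` (listed in the table above) summarizes the corpus. A short sketch:

```r
library(lda)

# lexicalize() returns a list with $documents (index/count matrices)
# and $vocab (unique tokens), which word.counts() accepts directly.
corpus <- lexicalize("吾輩 は 猫 で ある 。 名前 は まだ ない 。")
wc <- word.counts(corpus$documents, corpus$vocab)
wc  # per-word counts across the corpus (e.g. "は" and "。" appear twice)
```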