lda: Collapsed Gibbs Sampling Methods for Topic Models

Topic models

> library(lda)
> data("cora.documents")
> data("cora.vocab")
> data("cora.cites")
> data("cora.titles")

Version: 1.4.2


Function  Description
cora A subset of the Cora dataset of scientific documents.
filter.words Functions to manipulate text corpora in LDA format.
lda-package Collapsed Gibbs Samplers and Related Utility Functions for LDA-type Models
lda.collapsed.gibbs.sampler Functions to Fit LDA-type models
lexicalize Generate LDA Documents from Raw Text
links.as.edgelist Convert a set of links keyed on source to a single list of edges.
newsgroup A collection of newsgroup messages with classes.
nubbi.collapsed.gibbs.sampler Collapsed Gibbs Sampling for the Networks Uncovered By Bayesian Inference (NUBBI) Model.
poliblog A collection of political blogs with ratings.
predictive.distribution Compute predictive distributions for fitted LDA-type models.
predictive.link.probability Use the RTM to predict whether a link exists between two documents.
read.documents Read LDA-formatted Document and Vocabulary Files
rtm.collapsed.gibbs.sampler Collapsed Gibbs Sampling for the Relational Topic Model (RTM).
sampson Sampson monk data
slda.predict Predict the response variable of documents using an sLDA model.
top.topic.words Get the Top Words and Documents in Each Topic
word.counts Compute Summary Statistics of a Corpus

cora

Metadata for scientific papers collected with the Cora search engine.

> data("cora.documents") # corpus
> data("cora.vocab") # corpus
> data("cora.cites")
> data("cora.titles")

lda.collapsed.gibbs.sampler

Fits latent Dirichlet allocation (LDA) type models by collapsed Gibbs sampling.

Arguments

  • documents
  • network
  • K
  • vocab
  • num.iterations
  • num.e.iterations
  • num.m.iterations
  • alpha
  • beta.prior
  • eta
  • initial
  • burnin
  • compute.log.likelihood
  • annotations
  • params
  • variance
  • logistic
  • lambda
  • regularise
  • method
  • trace
  • MaxNWts
  • freeze.topics
> lda.collapsed.gibbs.sampler(documents, K, vocab, num.iterations, alpha,
+ eta, initial = NULL, burnin = NULL, compute.log.likelihood = FALSE,
+   trace = 0L, freeze.topics = FALSE)
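A hedged fitting example on the cora corpus (the number of topics, hyperparameters, and iteration count below are arbitrary illustrative choices, not package defaults):

> set.seed(8675309)
> K <- 10  # number of topics; arbitrary choice for illustration
> result <- lda.collapsed.gibbs.sampler(cora.documents, K, cora.vocab,
+   num.iterations = 25, alpha = 0.1, eta = 0.1,
+   compute.log.likelihood = TRUE)
> # result$topics is a K x vocabulary count matrix; list the top words per topic
> top.topic.words(result$topics, 5, by.score = TRUE)

The returned list also carries document_sums (per-document topic counts) and, because compute.log.likelihood = TRUE here, the log likelihood trace across iterations.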

lexicalize

Generates LDA-formatted documents from raw character strings.

Arguments

  • doclines
  • sep
  • lower
  • count
  • vocab
> c("吾輩 は 猫 で ある 。 名前 は まだ ない 。") %>% lexicalize()
$documents
$documents[[1]]
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
[1,]    0    1    2    3    4    5    6    1    7     8     5
[2,]    1    1    1    1    1    1    1    1    1     1     1


$vocab
[1] "吾輩" "は"   "猫"   "で"   "ある" "。"   "名前" "まだ" "ない"