lda: Collapsed Gibbs Sampling Methods for Topic Models

Topic models

> library(lda)
> data("cora.documents")
> data("cora.vocab")
> data("cora.cites")
> data("cora.titles")

Version: 1.4.2


Function  Description
cora A subset of the Cora dataset of scientific documents.
filter.words Functions to manipulate text corpora in LDA format.
lda-package Collapsed Gibbs Samplers and Related Utility Functions for LDA-type Models
lda.collapsed.gibbs.sampler Functions to Fit LDA-type models
lexicalize Generate LDA Documents from Raw Text
links.as.edgelist Convert a set of links keyed on source to a single list of edges.
newsgroup A collection of newsgroup messages with classes.
nubbi.collapsed.gibbs.sampler Collapsed Gibbs Sampling for the Networks Uncovered By Bayesian Inference (NUBBI) Model.
poliblog A collection of political blogs with ratings.
predictive.distribution Compute predictive distributions for fitted LDA-type models.
predictive.link.probability Use the RTM to predict whether a link exists between two documents.
read.documents Read LDA-formatted Document and Vocabulary Files
rtm.collapsed.gibbs.sampler Collapsed Gibbs Sampling for the Relational Topic Model (RTM).
sampson Sampson monk data
slda.predict Predict the response variable of documents using an sLDA model.
top.topic.words Get the Top Words and Documents in Each Topic
word.counts Compute Summary Statistics of a Corpus

cora

Metadata for scientific papers collected with the Cora search engine.

> data("cora.documents") # corpus
> data("cora.vocab") # corpus
> data("cora.cites")
> data("cora.titles")

lda.collapsed.gibbs.sampler

Fits latent Dirichlet allocation (LDA) type models by collapsed Gibbs sampling.

Arguments

  • documents
  • network
  • K
  • vocab
  • num.iterations
  • num.e.iterations
  • num.m.iterations
  • alpha
  • beta.prior
  • eta
  • initial
  • burnin
  • compute.log.likelihood
  • annotations
  • params
  • variance
  • logistic
  • lambda
  • regularise
  • method
  • trace
  • MaxNWts
  • freeze.topics
> lda.collapsed.gibbs.sampler(documents, K, vocab, num.iterations, alpha,
+ eta, initial = NULL, burnin = NULL, compute.log.likelihood = FALSE,
+   trace = 0L, freeze.topics = FALSE)
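A hedged fitting example on the cora corpus (the number of topics, hyperparameters, and iteration count below are arbitrary illustrative choices, not package defaults):

> set.seed(8675309)
> K <- 10  # number of topics; arbitrary choice for illustration
> result <- lda.collapsed.gibbs.sampler(cora.documents, K, cora.vocab,
+   num.iterations = 25, alpha = 0.1, eta = 0.1,
+   compute.log.likelihood = TRUE)
> # result$topics is a K x vocabulary count matrix; list the top words per topic
> top.topic.words(result$topics, 5, by.score = TRUE)

The returned list also carries document_sums (per-document topic counts) and, because compute.log.likelihood = TRUE here, the log likelihood trace across iterations.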

lexicalize

Generates LDA-formatted documents from raw character strings.

Arguments

  • doclines
  • sep
  • lower
  • count
  • vocab
> c("吾輩 は 猫 で ある 。 名前 は まだ ない 。") %>% lexicalize()
$documents
$documents[[1]]
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
[1,]    0    1    2    3    4    5    6    1    7     8     5
[2,]    1    1    1    1    1    1    1    1    1     1     1


$vocab
[1] "吾輩" "は"   "猫"   "で"   "ある" "。"   "名前" "まだ" "ない"