tidytext: Text Mining using 'dplyr', 'ggplot2', and Other Tidy Tools
- CRAN: http://cran.r-project.org/web/packages/tidytext/index.html
- Vignettes:
- GitHub: http://github.com/juliasilge/tidytext
> library(tidytext)
> data("sentiments")
Version: 0.1.1
Function | Summary |
---|---|
bind_tf_idf | Bind the term frequency and inverse document frequency of a tidy text dataset to the dataset |
cast_sparse | Create a sparse matrix from row names, column names, and values in a table. |
cast_sparse_ | Standard-evaluation version of cast_sparse |
cast_tdm_ | Casting a data frame to a DocumentTermMatrix, TermDocumentMatrix, or dfm |
corpus_tidiers | Tidiers for a corpus object from the quanteda package |
dictionary_tidiers | Tidy dictionary objects from the quanteda package |
lda_tidiers | Tidiers for LDA objects from the topicmodels package |
pair_count | Count pairs of items that cooccur within a group |
parts_of_speech | Parts of speech for English words from the Moby Project |
sentiments | Sentiment lexicons from three sources |
stop_words | Various lexicons for English stop words |
tdm_tidiers | Tidy DocumentTermMatrix, TermDocumentMatrix, and related objects from the tm package |
tidy.Corpus | Tidy a Corpus object from the tm package |
tidy_triplet | Utility function to tidy a simple triplet matrix |
unnest_tokens | Split a column into tokens using the tokenizers package |
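As a rough illustration of the first entry in the table, the sketch below counts words per document and then calls bind_tf_idf. The data frame `docs` and its column names `doc` and `text` are made up for this example, and the arguments are passed positionally (term, document, count) on the assumption that this order is what the installed version expects.

```r
library(dplyr)
library(tidytext)

# Hypothetical two-document corpus; `doc` and `text` are illustrative names
docs <- data.frame(
  doc = c("a", "b"),
  text = c("the cat sat", "the dog ran"),
  stringsAsFactors = FALSE
)

# One row per (document, word) pair with its count in column `n`
word_counts <- docs %>%
  unnest_tokens(word, text) %>%
  count(doc, word)

# Adds the columns tf, idf and tf_idf to the counts
tf_idf <- word_counts %>%
  bind_tf_idf(word, doc, n)

print(tf_idf)
```

Because "the" occurs in both documents, its inverse document frequency is log(2 / 2) = 0, so its tf_idf is 0 as well, while words unique to one document get a positive score.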
sentiments
A sentiment polarity dataset made up of three lexicons.
> sentiments %>% dplyr::glimpse()
Observations: 23,165
Variables: 4
$ word <chr> "abacus", "abandon", "abandon", "abandon", "abandone...
$ sentiment <chr> "trust", "fear", "negative", "sadness", "anger", "fe...
$ lexicon <chr> "nrc", "nrc", "nrc", "nrc", "nrc", "nrc", "nrc", "nr...
$ score <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
> sentiments %>% dplyr::filter(lexicon == "AFINN") %$% range(score)
[1] -5 5
unnest_tokens
Splits a data frame containing text into tokens using the {tokenizers} package.
Arguments
- tbl
- output, output_col
- input, input_col
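- token... the tokenizer function to use, as listed under tokenizers:::basic-tokenizers (the default is words, which splits the text into words; characters, sentences and others can also be specified)
- to_lower... whether to convert the words or phrases in the added column to lowercase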
- drop
- collapse
- ...
> unnest_tokens
function (tbl, output, input, token = "words", to_lower = TRUE,
    drop = TRUE, collapse = NULL, ...)
{
    output_col <- col_name(substitute(output))
    input_col <- col_name(substitute(input))
    unnest_tokens_(tbl, output_col, input_col, token = token,
        to_lower = to_lower, drop = drop, collapse = collapse,
        ...)
}
<environment: namespace:tidytext>
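
The arguments listed above can be tried out with a minimal sketch like the following. The data frame `text_df` and its columns `line` and `text` are hypothetical and exist only for this illustration; only the tbl, output, input and to_lower arguments shown in the function definition are used.

```r
library(dplyr)
library(tidytext)

# Hypothetical input: one row per line of text
text_df <- data.frame(
  line = 1:2,
  text = c("The quick brown fox", "jumps over the lazy dog"),
  stringsAsFactors = FALSE
)

# Default settings: token = "words", to_lower = TRUE, drop = TRUE,
# so the result has one lowercase word per row and no `text` column
words <- text_df %>%
  unnest_tokens(word, text)

# Same split, but keeping the original capitalisation
words_cased <- text_df %>%
  unnest_tokens(word, text, to_lower = FALSE)

print(words)
print(words_cased)
```

The output column name (`word` here) is given unquoted, which matches the use of substitute() in the function body shown above.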