splitstackshape: Stack and Reshape Datasets After Splitting Concatenated Values
- CRAN: http://cran.r-project.org/web/packages/splitstackshape/index.html
- GitHub: https://github.com/mrdwab/splitstackshape
> library(splitstackshape)
> library(dplyr) # for %>% and tbl_df()
Version: 1.4.2
Function | Summary |
---|---|
FacsToChars | Convert All Factor Columns to Character Columns |
Names | Dataset Names as a Character Vector, Always |
NoSep | Split Basic Alphanumeric Strings Which Have No Separators |
Reshape | Reshape Wide Data Into a Semi-long Form |
Stacked | Stack Columns from a Wide Form to a Long Form |
cSplit | Split Concatenated Values into Separate Values |
cSplit_f | Split Concatenated Cells in a data.frame or a data.table |
charMat | Create a Binary Matrix from a List of Character Values |
concat.split | Split Concatenated Cells in a Dataset |
concat.split.compact | Split Concatenated Cells into a Condensed Format |
concat.split.expanded | Split Concatenated Values into their Corresponding Column Position |
concat.split.list | Split Concatenated Cells into a List Format |
concat.split.multiple | Split Concatenated Cells and Optionally Reshape the Output |
concat.test | Example Dataset with Concatenated Cells |
expandRows | Expand the Rows of a Dataset |
getanID | Add an "id" Variable to a Dataset |
listCol_l | Unlist a Column Stored as a List |
listCol_w | Flatten a Column Stored as a List |
merged.stack | Take a List of Stacked data.tables and Merge Them |
numMat | Create a Numeric Matrix from a List of Values |
othernames | Extract All Names From a Dataset Other Than the Ones Listed |
read.concat | Read Concatenated Character Vectors Into a data.frame |
splitstackshape-package | splitstackshape |
stratified | Take a Stratified Sample From a Dataset |
FacsToChars
Converts the columns of a data frame that are stored as factors into character columns.
> data.frame(title = c("title1", "title2", "title3"),
+ author = c("author1", "author2", "author3"),
+ customerID = c(1, 2, 1), stringsAsFactors = TRUE) %>% {
+ str(.)
+ splitstackshape:::FacsToChars(.) %>% str(.)
+ }
'data.frame': 3 obs. of 3 variables:
$ title : Factor w/ 3 levels "title1","title2",..: 1 2 3
$ author : Factor w/ 3 levels "author1","author2",..: 1 2 3
$ customerID: num 1 2 1
'data.frame': 3 obs. of 3 variables:
$ title : chr "title1" "title2" "title3"
$ author : chr "author1" "author2" "author3"
$ customerID: num 1 2 1
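The same conversion can be sketched in base R for comparison. This is a minimal sketch that assumes FacsToChars simply applies as.character() to every factor column and leaves the other columns alone, as its description says. facs_to_chars() is a hypothetical name and is not part of splitstackshape. Since R 4.0.0, data.frame() no longer converts strings to factors by default, so stringsAsFactors = TRUE is passed explicitly to start with factor columns.

```r
# Hypothetical base-R equivalent of FacsToChars (not the package's source):
# replace every factor column with its character representation.
facs_to_chars <- function(df) {
  is_fac <- vapply(df, is.factor, logical(1))    # which columns are factors
  df[is_fac] <- lapply(df[is_fac], as.character) # convert only those columns
  df
}

# stringsAsFactors = TRUE is needed in R >= 4.0.0 to get factor columns
df <- data.frame(title = c("title1", "title2", "title3"),
                 author = c("author1", "author2", "author3"),
                 customerID = c(1, 2, 1),
                 stringsAsFactors = TRUE)
out <- facs_to_chars(df)
str(out)
```

The numeric customerID column passes through unchanged because the logical index only selects factor columns.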
Reshape
Reshapes wide data into a semi-long form.
> data.frame(id_1 = 1:6, id_2 = c("A", "B"), varA.1 = sample(letters, 6),
+ varA.2 = sample(letters, 6), varA.3 = sample(letters, 6),
+ varB.2 = sample(10, 6), varB.3 = sample(10, 6),
+ varC.3 = rnorm(6)) %>% {
+ print(dim(.))
+ splitstackshape::Reshape(., id.vars = c("id_1", "id_2"),
+ var.stubs = c("varA", "varB", "varC")) %>% dim()
+ }
[1] 6 8
[1] 18 6
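The resulting shape can be cross-checked with base R's stats::reshape(). This is a comparison sketch only, not how Reshape() is implemented. The 18 × 6 result above suggests that Reshape() fills the stub/time combinations missing from the wide data (varB.1, varC.1, varC.2) with NA: 6 ids × 3 time points, plus the two id columns, a time index and one column per stub. stats::reshape() needs that padding done by hand.

```r
# Comparison sketch using stats::reshape() instead of splitstackshape::Reshape().
# Assumption: the missing stub/time combinations should become NA.
set.seed(1)  # the seed only affects the values, not the dimensions
wide <- data.frame(id_1 = 1:6, id_2 = c("A", "B"),
                   varA.1 = sample(letters, 6), varA.2 = sample(letters, 6),
                   varA.3 = sample(letters, 6),
                   varB.2 = sample(10, 6), varB.3 = sample(10, 6),
                   varC.3 = rnorm(6))

# Pad the unbalanced stubs so every stub exists at every time point
wide$varB.1 <- NA
wide$varC.1 <- NA
wide$varC.2 <- NA

long <- reshape(wide, direction = "long",
                idvar = c("id_1", "id_2"),
                varying = list(paste0("varA.", 1:3),   # one vector per stub,
                               paste0("varB.", 1:3),   # ordered by time
                               paste0("varC.", 1:3)),
                v.names = c("varA", "varB", "varC"),
                timevar = "time", times = 1:3)
dim(long)  # 6 ids x 3 time points = 18 rows; id_1, id_2, time, varA, varB, varC
```

Passing varying as a list of per-stub vectors avoids relying on reshape() to infer the column order from the names.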
concat.test
> concat.test %>% {
+ print(class(.))
+ dplyr::tbl_df(.)
+ }
[1] "data.frame"
Source: local data frame [48 x 4]
Name Likes Siblings Hates
(fctr) (fctr) (fctr) (fctr)
1 Boyd 1,2,4,5,6 Reynolds , Albert , Ortega 2;4;
2 Rufus 1,2,4,5,6 Cohen , Bert , Montgomery 1;2;3;4;
3 Dana 1,2,4,5,6 Pierce 2;
4 Carole 1,2,4,5,6 Colon , Michelle , Ballard 1;4;
5 Ramona 1,2,5,6 Snyder , Joann , 1;2;3;
6 Kelley 1,2,5,6 James , Roxanne , 1;4;
7 Toby 1,2,4,5,6 Henderson , Isabel , 2;
8 Marilyn 1,2,4,5,6 Little , Vicki , Phillips 1;2;3;4;
9 Derek 1,2,4,5,6 1;2;3;
10 Wilfred 1,2,5 Klein , Julius , Miles 1;2;3;4;
.. ... ... ... ...
cSplit_e
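Arguments
- data... the input dataset
- split.col... the column whose values are split
- sep... the delimiter to split on. Default is ","
- mode... what the new columns hold: "binary" (presence/absence) or "value" (the value itself)
- type
- drop
- fixed
- fill... the value used for positions with no entry (NA unless set; the third example below uses 0)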
> names(concat.test)
[1] "Name" "Likes" "Siblings" "Hates"
> cSplit_e(data = concat.test, split.col = "Likes") %>% head() # one new column per value found in the data
Name Likes Siblings Hates Likes_1 Likes_2
1 Boyd 1,2,4,5,6 Reynolds , Albert , Ortega 2;4; 1 1
2 Rufus 1,2,4,5,6 Cohen , Bert , Montgomery 1;2;3;4; 1 1
3 Dana 1,2,4,5,6 Pierce 2; 1 1
4 Carole 1,2,4,5,6 Colon , Michelle , Ballard 1;4; 1 1
5 Ramona 1,2,5,6 Snyder , Joann , 1;2;3; 1 1
6 Kelley 1,2,5,6 James , Roxanne , 1;4; 1 1
Likes_3 Likes_4 Likes_5 Likes_6
1 NA 1 1 1
2 NA 1 1 1
3 NA 1 1 1
4 NA 1 1 1
5 NA NA 1 1
6 NA NA 1 1
> cSplit_e(data = concat.test, split.col = "Likes", mode = "value") %>% head()
Name Likes Siblings Hates Likes_1 Likes_2
1 Boyd 1,2,4,5,6 Reynolds , Albert , Ortega 2;4; 1 2
2 Rufus 1,2,4,5,6 Cohen , Bert , Montgomery 1;2;3;4; 1 2
3 Dana 1,2,4,5,6 Pierce 2; 1 2
4 Carole 1,2,4,5,6 Colon , Michelle , Ballard 1;4; 1 2
5 Ramona 1,2,5,6 Snyder , Joann , 1;2;3; 1 2
6 Kelley 1,2,5,6 James , Roxanne , 1;4; 1 2
Likes_3 Likes_4 Likes_5 Likes_6
1 NA 4 5 6
2 NA 4 5 6
3 NA 4 5 6
4 NA 4 5 6
5 NA NA 5 6
6 NA NA 5 6
> cSplit_e(data = concat.test,
+ split.col = "Hates",
+ sep = ";",
+ fill = 0) %>% head()
Name Likes Siblings Hates Hates_1 Hates_2
1 Boyd 1,2,4,5,6 Reynolds , Albert , Ortega 2;4; 0 1
2 Rufus 1,2,4,5,6 Cohen , Bert , Montgomery 1;2;3;4; 1 1
3 Dana 1,2,4,5,6 Pierce 2; 0 1
4 Carole 1,2,4,5,6 Colon , Michelle , Ballard 1;4; 1 0
5 Ramona 1,2,5,6 Snyder , Joann , 1;2;3; 1 1
6 Kelley 1,2,5,6 James , Roxanne , 1;4; 1 0
Hates_3 Hates_4
1 0 1
2 1 1
3 0 0
4 0 1
5 1 0
6 0 1
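The difference between the "binary" and "value" modes, and the effect of fill, can be illustrated with a hypothetical base-R sketch. split_expanded() is an illustrative name and not a splitstackshape function, and this is not the package's implementation: it works on a single vector rather than a whole dataset and ignores the type, drop and fixed arguments. It assumes the split values are positive integers, so value k lands in column k, which matches how the Likes and Hates columns behave above.

```r
# Hypothetical sketch of cSplit_e-style expansion for one character vector.
# Assumes the pieces are positive integers: value k goes to column k.
split_expanded <- function(x, sep = ",", mode = c("binary", "value"),
                           fill = NA, prefix = "V") {
  mode <- match.arg(mode)
  # Split each cell, trim spaces, drop empty pieces, convert to integer
  vals <- lapply(strsplit(as.character(x), sep, fixed = TRUE), function(v) {
    v <- trimws(v)
    as.integer(v[nzchar(v)])
  })
  n <- max(unlist(vals))  # number of output columns = largest value seen
  rows <- lapply(vals, function(v) {
    row <- rep(fill, n)                          # start every position at fill
    row[v] <- if (mode == "binary") 1L else v    # presence flag or the value
    row
  })
  out <- do.call(rbind, rows)
  colnames(out) <- paste(prefix, seq_len(n), sep = "_")
  out
}

likes <- c("1,2,4,5,6", "1,2,5,6")
hates <- c("2;4;", "1;2;3;4;")
split_expanded(likes)                       # binary, absent positions NA
split_expanded(likes, mode = "value")       # the values themselves
split_expanded(hates, sep = ";", fill = 0)  # binary, absent positions 0
```

The trailing ";" in the Hates strings yields no extra column because empty pieces are dropped before the values are placed.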