splitstackshape: Stack and Reshape Datasets After Splitting Concatenated Values

> library(splitstackshape)
> library(magrittr)  # provides the %>% pipe used in the examples below

Version: 1.4.2

Function	Summary
`FacsToChars`	Convert All Factor Columns to Character Columns
`Names`	Dataset Names as a Character Vector, Always
`NoSep`	Split Basic Alphanumeric Strings Which Have No Separators
`Reshape`	Reshape Wide Data Into a Semi-long Form
`Stacked`	Stack Columns from a Wide Form to a Long Form
`cSplit`	Split Concatenated Values into Separate Values
`cSplit_f`	Split Concatenated Cells in a data.frame or a data.table
`charMat`	Create a Binary Matrix from a List of Character Values
`concat.split`	Split Concatenated Cells in a Dataset
`concat.split.compact`	Split Concatenated Cells into a Condensed Format
`concat.split.expanded`	Split Concatenated Values into their Corresponding Column Position
`concat.split.list`	Split Concatenated Cells into a List Format
`concat.split.multiple`	Split Concatenated Cells and Optionally Reshape the Output
`concat.test`	Example Dataset with Concatenated Cells
`expandRows`	Expand the Rows of a Dataset
`getanID`	Add an "id" Variable to a Dataset
`listCol_l`	Unlist a Column Stored as a List
`listCol_w`	Flatten a Column Stored as a List
`merged.stack`	Take a List of Stacked data.tables and Merge Them
`numMat`	Create a Numeric Matrix from a List of Values
`othernames`	Extract All Names From a Dataset Other Than the Ones Listed
`read.concat`	Read Concatenated Character Vectors Into a data.frame
`splitstackshape-package`	splitstackshape
`stratified`	Take a Stratified Sample From a Dataset

FacsToChars

Converts every column stored as a factor in a data frame to a character column.

> data.frame(title = c("title1", "title2", "title3"),
+            author = c("author1", "author2", "author3"),
+            customerID = c(1, 2, 1),
+            stringsAsFactors = TRUE) %>% {  # R >= 4.0 no longer creates factors by default
+              str(.)                                      # before: factor columns
+              splitstackshape:::FacsToChars(.) %>% str()  # after: character columns
+            }

'data.frame':    3 obs. of  3 variables:
 $ title     : Factor w/ 3 levels "title1","title2",..: 1 2 3
 $ author    : Factor w/ 3 levels "author1","author2",..: 1 2 3
 $ customerID: num  1 2 1
'data.frame':    3 obs. of  3 variables:
 $ title     : chr  "title1" "title2" "title3"
 $ author    : chr  "author1" "author2" "author3"
 $ customerID: num  1 2 1
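
The conversion shown above can also be written in base R. The following is a minimal sketch of the idea, not the package's own code; the helper name `factors_to_chars` and the `books` data frame are hypothetical and exist only for illustration.

```r
# Hypothetical helper (not part of splitstackshape): convert factor columns
# to character columns and leave every other column untouched.
factors_to_chars <- function(df) {
  df[] <- lapply(df, function(x) if (is.factor(x)) as.character(x) else x)
  df
}

# Made-up example data; stringsAsFactors = TRUE forces factor columns.
books <- data.frame(title = c("title1", "title2", "title3"),
                    customerID = c(1, 2, 1),
                    stringsAsFactors = TRUE)

converted <- factors_to_chars(books)
str(converted)
```

Assigning into `df[]` keeps the object a data frame, so row names and column order are preserved.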

Reshape

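Reshapes wide data into a long form. Each variable stub (`varA`, `varB`, `varC`) becomes a single column, and the stubs do not need to cover the same set of suffixes.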

> data.frame(id_1 = 1:6, id_2 = c("A", "B"), varA.1 = sample(letters, 6),
+            varA.2 = sample(letters, 6), varA.3 = sample(letters, 6),
+            varB.2 = sample(10, 6), varB.3 = sample(10, 6),
+            varC.3 = rnorm(6)) %>% {
+              print(dim(.))  # dimensions of the wide input
+              splitstackshape::Reshape(., id.vars = c("id_1", "id_2"),
+                                       var.stubs = c("varA", "varB", "varC")) %>% dim()
+            }

[1] 6 8

[1] 18  6
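
For comparison, the same kind of unbalanced input can be handled in base R by padding the missing columns with `NA` before calling `reshape()`. This is only a sketch of the idea on a small made-up data frame (the name `wide` and its values are assumptions for illustration); it is not a description of how `Reshape` is implemented.

```r
# Made-up unbalanced wide data: varA has suffixes 1 and 2, varB only 2.
wide <- data.frame(id_1 = 1:2, id_2 = c("A", "B"),
                   varA.1 = c("a", "b"), varA.2 = c("c", "d"),
                   varB.2 = c(10, 20),
                   stringsAsFactors = FALSE)

# Pad the missing varB.1 column so that every stub covers the same suffixes.
wide$varB.1 <- NA_real_

# Base R reshape to long form: one row per id and time point.
long <- reshape(wide, direction = "long",
                idvar = c("id_1", "id_2"),
                varying = list(c("varA.1", "varA.2"), c("varB.1", "varB.2")),
                v.names = c("varA", "varB"),
                timevar = "time", times = 1:2)
rownames(long) <- NULL

print(long)
```

The padded column shows up as `NA` in `varB` for `time == 1`, which is the price of forcing the unbalanced stubs into one rectangular result.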

concat.test

> concat.test %>% {
+   print(class(.))
+   dplyr::tbl_df(.)
+ }

[1] "data.frame"

Source: local data frame [48 x 4]

      Name     Likes                   Siblings    Hates
    (fctr)    (fctr)                     (fctr)   (fctr)
1     Boyd 1,2,4,5,6 Reynolds , Albert , Ortega     2;4;
2    Rufus 1,2,4,5,6  Cohen , Bert , Montgomery 1;2;3;4;
3     Dana 1,2,4,5,6                     Pierce       2;
4   Carole 1,2,4,5,6 Colon , Michelle , Ballard     1;4;
5   Ramona   1,2,5,6           Snyder , Joann ,   1;2;3;
6   Kelley   1,2,5,6          James , Roxanne ,     1;4;
7     Toby 1,2,4,5,6       Henderson , Isabel ,       2;
8  Marilyn 1,2,4,5,6  Little , Vicki , Phillips 1;2;3;4;
9    Derek 1,2,4,5,6                              1;2;3;
10 Wilfred     1,2,5     Klein , Julius , Miles 1;2;3;4;
..     ...       ...                        ...      ...

cSplit_e

Arguments

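data... the source data frame
split.col... the column whose values are split
sep... the delimiter to split on. Defaults to `,`
mode... the values written to the new columns: `binary` (presence/absence) or `value` (the value itself)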
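type... the type of the values being split, `numeric` or `character`
drop... whether to drop the original column after splitting
fixed... whether `sep` is matched as a fixed string rather than as a regular expression
fill... the value written to cells with no match; `NA` unless specified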

> names(concat.test)

[1] "Name"     "Likes"    "Siblings" "Hates"

> cSplit_e(data = concat.test, split.col = "Likes") %>% head() # one column per value found in the data

    Name     Likes                   Siblings    Hates Likes_1 Likes_2
1   Boyd 1,2,4,5,6 Reynolds , Albert , Ortega     2;4;       1       1
2  Rufus 1,2,4,5,6  Cohen , Bert , Montgomery 1;2;3;4;       1       1
3   Dana 1,2,4,5,6                     Pierce       2;       1       1
4 Carole 1,2,4,5,6 Colon , Michelle , Ballard     1;4;       1       1
5 Ramona   1,2,5,6           Snyder , Joann ,   1;2;3;       1       1
6 Kelley   1,2,5,6          James , Roxanne ,     1;4;       1       1
  Likes_3 Likes_4 Likes_5 Likes_6
1      NA       1       1       1
2      NA       1       1       1
3      NA       1       1       1
4      NA       1       1       1
5      NA      NA       1       1
6      NA      NA       1       1

> cSplit_e(data = concat.test, split.col = "Likes", mode = "value") %>% head()

    Name     Likes                   Siblings    Hates Likes_1 Likes_2
1   Boyd 1,2,4,5,6 Reynolds , Albert , Ortega     2;4;       1       2
2  Rufus 1,2,4,5,6  Cohen , Bert , Montgomery 1;2;3;4;       1       2
3   Dana 1,2,4,5,6                     Pierce       2;       1       2
4 Carole 1,2,4,5,6 Colon , Michelle , Ballard     1;4;       1       2
5 Ramona   1,2,5,6           Snyder , Joann ,   1;2;3;       1       2
6 Kelley   1,2,5,6          James , Roxanne ,     1;4;       1       2
  Likes_3 Likes_4 Likes_5 Likes_6
1      NA       4       5       6
2      NA       4       5       6
3      NA       4       5       6
4      NA       4       5       6
5      NA      NA       5       6
6      NA      NA       5       6
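
The difference between the two modes can be reproduced in a few lines of base R. The sketch below is an illustration of the positional expansion only, under the assumption that all values are positive integers; the helper `expand` and the vector `likes` are hypothetical names, not part of splitstackshape.

```r
# Three made-up rows in the same format as the Likes column.
likes <- c("1,2,4,5,6", "1,2,5,6", "1,2,5")

# Split each string and convert the pieces to integers.
parts <- lapply(strsplit(likes, ",", fixed = TRUE), as.integer)
n_cols <- max(unlist(parts))

# Hypothetical helper: place each value in the column matching its position.
expand <- function(mode) {
  out <- matrix(NA_integer_, nrow = length(parts), ncol = n_cols,
                dimnames = list(NULL, paste0("Likes_", seq_len(n_cols))))
  for (i in seq_along(parts)) {
    out[i, parts[[i]]] <- if (mode == "binary") 1L else parts[[i]]
  }
  out
}

binary <- expand("binary")  # 1 where the value is present, NA elsewhere
value  <- expand("value")   # the value itself where present, NA elsewhere

print(binary)
print(value)
```

In both modes the column index is the value, so a value that never occurs (3 here) yields a column made up entirely of `NA`.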

> cSplit_e(data = concat.test, 
+          split.col = "Hates", 
+          sep = ";", 
+          fill = 0) %>% head()

    Name     Likes                   Siblings    Hates Hates_1 Hates_2
1   Boyd 1,2,4,5,6 Reynolds , Albert , Ortega     2;4;       0       1
2  Rufus 1,2,4,5,6  Cohen , Bert , Montgomery 1;2;3;4;       1       1
3   Dana 1,2,4,5,6                     Pierce       2;       0       1
4 Carole 1,2,4,5,6 Colon , Michelle , Ballard     1;4;       1       0
5 Ramona   1,2,5,6           Snyder , Joann ,   1;2;3;       1       1
6 Kelley   1,2,5,6          James , Roxanne ,     1;4;       1       0
  Hates_3 Hates_4
1       0       1
2       1       1
3       0       0
4       0       1
5       1       0
6       0       1
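
The `Hates` values end with a trailing `;`, and the example above fills empty cells with 0 instead of `NA`. A base R sketch of both points follows; it illustrates the idea and is not the package's implementation, and the vector `hates` simply copies the first three rows shown above.

```r
# First three Hates values from the output above, each with a trailing ";".
hates <- c("2;4;", "1;2;3;4;", "2;")

# strsplit() does not return an empty string for a trailing separator,
# so no extra cleanup is needed before converting to integers.
parts <- lapply(strsplit(hates, ";", fixed = TRUE), as.integer)
n_cols <- max(unlist(parts))

# Start every row from the fill value (0) and mark the values that are present.
filled <- t(vapply(parts, function(p) {
  row <- integer(n_cols)
  row[p] <- 1L
  row
}, integer(n_cols)))
colnames(filled) <- paste0("Hates_", seq_len(n_cols))

print(filled)
```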