splitstackshape: Stack and Reshape Datasets After Splitting Concatenated Values

> library(splitstackshape)
> library(dplyr)   # for %>% and tbl_df() used in the examples below

Version: 1.4.2


Function Description
FacsToChars Convert All Factor Columns to Character Columns
Names Dataset Names as a Character Vector, Always
NoSep Split Basic Alphanumeric Strings Which Have No Separators
Reshape Reshape Wide Data Into a Semi-long Form
Stacked Stack Columns from a Wide Form to a Long Form
cSplit Split Concatenated Values into Separate Values
cSplit_f Split Concatenated Cells in a data.frame or a data.table
charMat Create a Binary Matrix from a List of Character Values
concat.split Split Concatenated Cells in a Dataset
concat.split.compact Split Concatenated Cells into a Condensed Format
concat.split.expanded Split Concatenated Values into their Corresponding Column Position
concat.split.list Split Concatenated Cells into a List Format
concat.split.multiple Split Concatenated Cells and Optionally Reshape the Output
concat.test Example Dataset with Concatenated Cells
expandRows Expand the Rows of a Dataset
getanID Add an "id" Variable to a Dataset
listCol_l Unlist a Column Stored as a List
listCol_w Flatten a Column Stored as a List
merged.stack Take a List of Stacked data.tables and Merge Them
numMat Create a Numeric Matrix from a List of Values
othernames Extract All Names From a Dataset Other Than the Ones Listed
read.concat Read Concatenated Character Vectors Into a data.frame
splitstackshape-package splitstackshape
stratified Take a Stratified Sample From a Dataset

FacsToChars

Converts the columns of a data frame that are stored as factors into character columns.

> data.frame(title = c("title1", "title2", "title3"),
+            author = c("author1", "author2", "author3"),
+            customerID = c(1, 2, 1),
+            stringsAsFactors = TRUE) %>% {   # R >= 4.0 no longer creates factors by default
+   str(.)                                      # before: factor columns
+   splitstackshape:::FacsToChars(.) %>% str(.) # after: character columns
+ }
'data.frame':    3 obs. of  3 variables:
 $ title     : Factor w/ 3 levels "title1","title2",..: 1 2 3
 $ author    : Factor w/ 3 levels "author1","author2",..: 1 2 3
 $ customerID: num  1 2 1
'data.frame':    3 obs. of  3 variables:
 $ title     : chr  "title1" "title2" "title3"
 $ author    : chr  "author1" "author2" "author3"
 $ customerID: num  1 2 1
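Conceptually, the conversion amounts to applying as.character() to every factor column and leaving the other columns alone. The base-R sketch below illustrates that idea only; facs_to_chars_sketch is a hypothetical name, and the package's own implementation may differ in details such as the class of the returned object.

```r
# Hypothetical helper for illustration; not the package's implementation.
facs_to_chars_sketch <- function(df) {
  is_fac <- vapply(df, is.factor, logical(1))    # which columns are factors?
  df[is_fac] <- lapply(df[is_fac], as.character) # convert only those columns
  df
}

books <- data.frame(title = c("title1", "title2", "title3"),
                    author = c("author1", "author2", "author3"),
                    customerID = c(1, 2, 1),
                    stringsAsFactors = TRUE)
converted <- facs_to_chars_sketch(books)
str(converted)  # title and author are now chr; customerID stays num
```

Because the data frame is modified inside the function, `books` itself keeps its factor columns.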

Reshape

Reshapes wide data into a semi-long form.

> data.frame(id_1 = 1:6, id_2 = c("A", "B"),
+            varA.1 = sample(letters, 6), varA.2 = sample(letters, 6),
+            varA.3 = sample(letters, 6),
+            varB.2 = sample(10, 6), varB.3 = sample(10, 6),
+            varC.3 = rnorm(6)) %>% {
+   print(dim(.))
+   splitstackshape::Reshape(., id.vars = c("id_1", "id_2"),
+                            var.stubs = c("varA", "varB", "varC")) %>% dim()
+ }
[1] 6 8
[1] 18  6
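The dimensions follow from stacking by suffix: the 6 input rows are repeated once for each of the 3 suffixes (.1, .2, .3), giving 18 rows, and the 8 columns collapse to the 2 id columns, presumably one column recording the suffix, and one column per stub (6 in total). Stubs with no column for a given suffix (e.g. varB.1, varC.1) have to be padded with NA. The base-R sketch below reproduces that shape under the assumption that measure columns are named stub.suffix; semi_long_sketch and its time column are hypothetical names, and Reshape's own output column names may differ.

```r
set.seed(1)  # only so the random columns are reproducible
wide <- data.frame(id_1 = 1:6, id_2 = c("A", "B"),
                   varA.1 = sample(letters, 6), varA.2 = sample(letters, 6),
                   varA.3 = sample(letters, 6),
                   varB.2 = sample(10, 6), varB.3 = sample(10, 6),
                   varC.3 = rnorm(6),
                   stringsAsFactors = FALSE)

# Hypothetical helper: stack the stub columns by their ".<n>" suffix.
semi_long_sketch <- function(df, id.vars, var.stubs) {
  measure <- setdiff(names(df), id.vars)
  # Suffix after the last "." (assumes names like "varA.1").
  times <- sort(unique(as.integer(sub("^.*\\.", "", measure))))
  n <- nrow(df)
  # Repeat the id columns once per suffix.
  out <- df[rep(seq_len(n), times = length(times)), id.vars, drop = FALSE]
  out$time <- rep(times, each = n)
  for (stub in var.stubs) {
    # Use the stub.suffix column when it exists, otherwise pad with NA.
    out[[stub]] <- unlist(lapply(times, function(t) {
      col <- paste0(stub, ".", t)
      if (col %in% names(df)) df[[col]] else rep(NA, n)
    }))
  }
  rownames(out) <- NULL
  out
}

long <- semi_long_sketch(wide, id.vars = c("id_1", "id_2"),
                         var.stubs = c("varA", "varB", "varC"))
dim(wide)  # [1] 6 8
dim(long)  # [1] 18  6
```

In this sketch varC is NA in the first 12 rows, because only varC.3 exists in the wide data.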

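concat.test

An example dataset bundled with the package whose cells contain concatenated values (48 rows, 4 columns).
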
> concat.test %>% {
+   print(class(.))
+   dplyr::tbl_df(.)
+ }
[1] "data.frame"
Source: local data frame [48 x 4]

      Name     Likes                   Siblings    Hates
    (fctr)    (fctr)                     (fctr)   (fctr)
1     Boyd 1,2,4,5,6 Reynolds , Albert , Ortega     2;4;
2    Rufus 1,2,4,5,6  Cohen , Bert , Montgomery 1;2;3;4;
3     Dana 1,2,4,5,6                     Pierce       2;
4   Carole 1,2,4,5,6 Colon , Michelle , Ballard     1;4;
5   Ramona   1,2,5,6           Snyder , Joann ,   1;2;3;
6   Kelley   1,2,5,6          James , Roxanne ,     1;4;
7     Toby 1,2,4,5,6       Henderson , Isabel ,       2;
8  Marilyn 1,2,4,5,6  Little , Vicki , Phillips 1;2;3;4;
9    Derek 1,2,4,5,6                              1;2;3;
10 Wilfred     1,2,5     Klein , Julius , Miles 1;2;3;4;
..     ...       ...                        ...      ...
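Note that the separators are not tidy: the Siblings cells have spaces around the commas and sometimes a trailing comma, and the Hates cells end with ";". The base-R sketch below shows what a clean split of such cells has to handle; split_cells is a hypothetical helper, and the strings are modelled on the printout (the exact whitespace in the real data may differ).

```r
# Cells modelled on the Siblings column shown above.
siblings <- c("Reynolds , Albert , Ortega", "Snyder , Joann , ", "Pierce")

# Hypothetical helper: split on a literal separator and tidy the pieces.
split_cells <- function(x, sep = ",") {
  lapply(strsplit(x, sep, fixed = TRUE), function(parts) {
    parts <- trimws(parts)  # drop the spaces around each separator
    parts[parts != ""]      # drop empty pieces left by trailing separators
  })
}

pieces <- split_cells(siblings)
lengths(pieces)  # [1] 3 2 1
```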

cSplit_e

Arguments

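  • data... the input data.frame or data.table
  • split.col... the column whose values are to be split
  • sep... the delimiter to split on; defaults to ","
  • mode... what to put in the new columns: "binary" (presence/absence) or "value" (the value itself)
  • type... "numeric" or "character"; how the split values are interpreted when the new columns are created (defaults to "numeric")
  • drop... logical; whether to drop the original column (defaults to FALSE, so the original column is kept)
  • fixed... logical; whether sep is matched literally rather than as a regular expression
  • fill... the value used where a value is absent; defaults to NA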
> names(concat.test)
[1] "Name"     "Likes"    "Siblings" "Hates"
> cSplit_e(data = concat.test, split.col = "Likes") %>% head() # spread the values in Likes into separate columns
    Name     Likes                   Siblings    Hates Likes_1 Likes_2
1   Boyd 1,2,4,5,6 Reynolds , Albert , Ortega     2;4;       1       1
2  Rufus 1,2,4,5,6  Cohen , Bert , Montgomery 1;2;3;4;       1       1
3   Dana 1,2,4,5,6                     Pierce       2;       1       1
4 Carole 1,2,4,5,6 Colon , Michelle , Ballard     1;4;       1       1
5 Ramona   1,2,5,6           Snyder , Joann ,   1;2;3;       1       1
6 Kelley   1,2,5,6          James , Roxanne ,     1;4;       1       1
  Likes_3 Likes_4 Likes_5 Likes_6
1      NA       1       1       1
2      NA       1       1       1
3      NA       1       1       1
4      NA       1       1       1
5      NA      NA       1       1
6      NA      NA       1       1
> cSplit_e(data = concat.test, split.col = "Likes", mode = "value") %>% head()
    Name     Likes                   Siblings    Hates Likes_1 Likes_2
1   Boyd 1,2,4,5,6 Reynolds , Albert , Ortega     2;4;       1       2
2  Rufus 1,2,4,5,6  Cohen , Bert , Montgomery 1;2;3;4;       1       2
3   Dana 1,2,4,5,6                     Pierce       2;       1       2
4 Carole 1,2,4,5,6 Colon , Michelle , Ballard     1;4;       1       2
5 Ramona   1,2,5,6           Snyder , Joann ,   1;2;3;       1       2
6 Kelley   1,2,5,6          James , Roxanne ,     1;4;       1       2
  Likes_3 Likes_4 Likes_5 Likes_6
1      NA       4       5       6
2      NA       4       5       6
3      NA       4       5       6
4      NA       4       5       6
5      NA      NA       5       6
6      NA      NA       5       6
> cSplit_e(data = concat.test, 
+          split.col = "Hates", 
+          sep = ";", 
+          fill = 0) %>% head()
    Name     Likes                   Siblings    Hates Hates_1 Hates_2
1   Boyd 1,2,4,5,6 Reynolds , Albert , Ortega     2;4;       0       1
2  Rufus 1,2,4,5,6  Cohen , Bert , Montgomery 1;2;3;4;       1       1
3   Dana 1,2,4,5,6                     Pierce       2;       0       1
4 Carole 1,2,4,5,6 Colon , Michelle , Ballard     1;4;       1       0
5 Ramona   1,2,5,6           Snyder , Joann ,   1;2;3;       1       1
6 Kelley   1,2,5,6          James , Roxanne ,     1;4;       1       0
  Hates_3 Hates_4
1       0       1
2       1       1
3       0       0
4       0       1
5       1       0
6       0       1
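With the default type = "numeric", the three calls above follow the same pattern: split each cell on sep, treat each piece as a column index, then write either 1 (mode = "binary") or the piece itself (mode = "value") into that column, leaving fill everywhere else. The base-R sketch below reproduces that logic for positive integer values only; expand_sketch and its prefix argument are hypothetical, it assumes non-missing cells, and it returns a plain matrix instead of appending columns to the data as cSplit_e does.

```r
# Hypothetical re-implementation for illustration; not the package code.
# Assumes every cell is non-missing and holds positive integers.
expand_sketch <- function(x, sep = ",", mode = c("binary", "value"),
                          fill = NA, prefix = "col") {
  mode <- match.arg(mode)
  # Split literally on sep and drop empty pieces (e.g. from "2;4;").
  vals <- lapply(strsplit(x, sep, fixed = TRUE),
                 function(v) as.integer(v[v != ""]))
  width <- max(unlist(vals))  # one column per integer from 1 to the maximum
  out <- matrix(fill, nrow = length(x), ncol = width,
                dimnames = list(NULL, paste0(prefix, "_", seq_len(width))))
  for (i in seq_along(vals)) {
    # Each value is its own column index.
    out[i, vals[[i]]] <- if (mode == "binary") 1 else vals[[i]]
  }
  out
}

likes <- c("1,2,4,5,6", "1,2,5,6")    # first two Likes cells
hates <- c("2;4;", "1;2;3;4;", "2;")  # first three Hates cells

b <- expand_sketch(likes, prefix = "Likes")                 # 1 / NA
v <- expand_sketch(likes, mode = "value", prefix = "Likes") # value / NA
h <- expand_sketch(hates, sep = ";", fill = 0, prefix = "Hates")  # 1 / 0
```

The rows of b, v and h line up with the first rows of the three cSplit_e outputs above, including the all-NA Likes_3 column.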