data.table: Extension of data.frame

A package that extends the functionality of data frames.

> library(data.table)
> library(magrittr) # provides the %>% pipe used in the examples below

Version: 1.9.7


Function Summary
:= Assignment by reference
IDateTime Integer based date class
J Creates a Join data table
address Address in RAM of a variable
all.equal Equality Test Between Two Data Tables
as.data.table.xts Efficient xts to as.data.table conversion
as.xts.data.table Efficient data.table to xts conversion
between Convenience function for range subset logic.
chmatch Faster match of character vectors
copy Copy an entire object
data.table-class S4 Definition for data.table
data.table-package Enhanced data.frame
dcast.data.table Fast dcast for data.table
duplicated Determine Duplicate Rows
foverlaps Fast overlap joins
frank Fast rank
fread Fast and friendly file finagler
last Last item of an object
like Convenience function for calling regexpr.
melt.data.table Fast melt for data.table
merge Merge Two Data Tables
na.omit.data.table Remove rows with missing values on columns specified
patterns Regex patterns to extract columns from data.table
rbindlist Makes one data.table from a list of many
setDF Convert a data.table to data.frame by reference
setDT Convert lists and data.frames to data.table by reference
setNumericRounding Change or turn off numeric rounding
setattr Set attributes to objects by reference
setcolorder Fast column reordering of a data.table by reference
setkey Create key on a data table
setorder Fast reordering of a data.table by reference
shift Fast lead/lag for vectors and lists
subset.data.table Subsetting data.tables
tables Display all objects of class 'data.table'
test.data.table Runs a set of tests.
timetaken Pretty print of time taken
transform.data.table Data table utilities
transpose Efficient transpose of list
truelength Over-allocation access
tstrsplit strsplit and transpose the resulting list efficiently

IDateTime

Integer-based date and time class objects.

> as.IDate("2001-01-01") %>% {
+   print(class(.))
+   print(.)
+ }
[1] "IDate" "Date" 
[1] "2001-01-01"
> identical(as.IDate("2001-01-01"), as("2001-01-01", "IDate"))
[1] TRUE
> as.ITime("10:45") %>% {
+   print(class(.))
+   print(.)
+ }
[1] "ITime"
[1] "10:45:00"
> # hour(x)   
> # yday(x)   
> # wday(x)   
> # mday(x)   
> # week(x)   
> # month(x)  
> # quarter(x)
> # year(x)
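
The accessor functions listed in the comments above can be applied directly to IDate and ITime values. The following is a minimal sketch; the variable names d, tm, and parts are illustrative and not from the original notes.

```r
library(data.table)

# An integer-based date and an integer-based time of day
d  <- as.IDate("2001-01-01")
tm <- as.ITime("10:45")

# Extract individual components with the accessor functions
parts <- list(
  year  = year(d),   # calendar year
  month = month(d),  # month of the year
  mday  = mday(d),   # day of the month
  hour  = hour(tm)   # hour of the day
)
print(parts)
```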

all.equal

> dt1 <- data.table(A = letters[1:10], X = 1:10, key = "A")
> identical(all.equal(dt1, dt1), TRUE)
[1] TRUE
> dt2 <- data.table(A = letters[5:14], Y = 1:10, key = "A")
> all.equal(dt1, dt2)
[1] "Different column names"
> is.character(all.equal(dt1, dt2))
[1] TRUE

data.table

An extension of base::data.frame.

Arguments

  • ...
  • keep.rownames
  • check.names... Defaults to FALSE. Whether to check the column names for duplicates; if duplicates are found, they are replaced with unique names.
  • key
  • x
  • i
  • j
  • by
  • keyby
  • with
  • nomatch
  • mult
  • roll
  • rollends
  • which
  • .SDcols
  • verbose
  • allow.cartesian
  • drop
  • on
> data.frame(x = rep(c("a", "b", "c"),
+                    each = 3), 
+            y = c(1, 3, 6), 
+            v = 1:9) %>% class()
[1] "data.frame"
> data.table(x = rep(c("a", "b", "c"),
+                    each = 3), 
+            y = c(1, 3, 6), 
+            v = 1:9) %>% {
+              class(.) %>% print(.) # data.table class
+              is.data.table(.)
+              }
[1] "data.table" "data.frame"
[1] TRUE
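
The i, j, and by arguments listed above are normally supplied inside square brackets as DT[i, j, by]. The following is a minimal sketch that reuses the columns from the example above; the names DT, res, and total are illustrative.

```r
library(data.table)

DT <- data.table(x = rep(c("a", "b", "c"), each = 3),
                 y = c(1, 3, 6),
                 v = 1:9)

# i  : keep only rows where v > 2
# j  : compute the sum of v
# by : do this separately for each value of x
res <- DT[v > 2, .(total = sum(v)), by = x]
print(res)
```

Here i filters rows before j is evaluated, so the group "a" contributes only its single remaining row.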

foverlaps

> x <- data.table(start=c(5,31,22,16), end=c(8,50,25,18), val2 = 7:10)
> y <- data.table(start=c(10, 20, 30), end=c(15, 35, 45), val1 = 1:3)
> setkey(y, start, end)
> foverlaps(x, y, type="any", which=TRUE)
   xid yid
1:   1  NA
2:   2   2
3:   2   3
4:   3   2
5:   4  NA
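
The type argument controls what counts as an overlap. As a hedged sketch using the same x and y tables, type = "within" should match only those x intervals that lie entirely inside a y interval; the name within_idx is illustrative.

```r
library(data.table)

x <- data.table(start = c(5, 31, 22, 16), end = c(8, 50, 25, 18), val2 = 7:10)
y <- data.table(start = c(10, 20, 30), end = c(15, 35, 45), val1 = 1:3)
setkey(y, start, end)  # foverlaps requires y to be keyed on its interval columns

# which = TRUE returns the row indices of the matches instead of the joined rows
within_idx <- foverlaps(x, y, type = "within", which = TRUE)
print(within_idx)
```

Only the third row of x, the interval from 22 to 25, sits fully inside a y interval (20 to 35), so the other rows are reported with NA.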

fread

Reads files quickly; faster than read.table.

Arguments

  • input... File name or string to read
  • sep
  • sep2
  • nrows... Number of rows to read
  • header
  • na.strings
  • stringsAsFactors
  • verbose
  • autostart
  • skip
  • select... Columns to read
  • drop... Columns to drop while reading. Setting them to NULL via the colClasses argument also works
  • colClasses... Explicitly specifies the class of each column
  • integer64
  • dec
  • col.names
  • check.names
  • encoding
  • strip.white
  • showProgress
  • data.table... Defaults to TRUE. Set to FALSE to return a data frame instead of a data table
> fread(input = "http://www.stats.ox.ac.uk/pub/datasets/csb/ch11b.dat",
+       nrows = 10) %>% dplyr::glimpse()
Observations: 10
Variables: 5
$ V1 <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
$ V2 <int> 307, 307, 307, 307, 307, 307, 307, 307, 307, 307
$ V3 <int> 930, 940, 950, 1000, 1010, 1020, 1030, 1040, 1050, 1100
$ V4 <dbl> 36.58, 36.73, 36.93, 37.15, 37.23, 37.24, 37.24, 36.90, 36....
$ V5 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
> # Specify the class of each column with the colClasses argument
> data <- "A,B,C,D\n1,3,5,7\n2,4,6,8\n"
> fread(data, colClasses = c(B = "character",
+                            C = "character",
+                            D = "character")) %>% dplyr::glimpse()
Observations: 2
Variables: 4
$ A <int> 1, 2
$ B <chr> "3", "4"
$ C <chr> "5", "6"
$ D <chr> "7", "8"
> fread(data, colClasses = list(character = c("B", "C"))) %>% dplyr::glimpse()
Observations: 2
Variables: 4
$ A <int> 1, 2
$ B <chr> "3", "4"
$ C <chr> "5", "6"
$ D <int> 7, 8
> fread(data, colClasses = list(character = 2:3)) %>% dplyr::glimpse()
Observations: 2
Variables: 4
$ A <int> 1, 2
$ B <chr> "3", "4"
$ C <chr> "5", "6"
$ D <int> 7, 8
> # Skip columns that are not needed.
> #   Use the drop argument, set the columns to NULL in colClasses, or pick only the needed columns with select
> fread(data, drop = c("B","C"))
   A D
1: 1 7
2: 2 8
> fread(data, colClasses = list(NULL = c("B","C")))
   A D
1: 1 7
2: 2 8
> fread(data, select = c("A", "D"))
   A D
1: 1 7
2: 2 8
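
The nrows and data.table arguments described in the list above are not exercised together in the transcript. The following is a minimal sketch on the same inline CSV text; the names csv_text and df are illustrative.

```r
library(data.table)

# Inline CSV text; fread treats input containing a newline as data, not a file name
csv_text <- "A,B,C,D\n1,3,5,7\n2,4,6,8\n"

# Read only the first data row and return a plain data.frame
df <- fread(csv_text, nrows = 1, data.table = FALSE)
print(class(df))
print(df)
```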

setattr / setnames

Sets attributes and names on objects by reference.

Arguments

  • x
  • name
  • value
  • old
  • new
> (i <- rep(1:2, 2))
[1] 1 2 1 2
> # Assign an attribute named attrib with the value 1
> (setattr(x = i, name = "attrib", value = 1L))
[1] 1 2 1 2
attr(,"attrib")
[1] 1
> j <- data.frame(A = 1:4)
> names(j)
[1] "A"
> setnames(x = j, old = "A", new = "B")
> names(j)
[1] "B"
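
Because these functions modify the object in place, renaming does not create a copy. The following is a minimal sketch that checks this with address(); the names DT, before, and after are illustrative.

```r
library(data.table)

DT <- data.table(A = 1:4)

before <- address(DT)               # memory address before renaming
setnames(DT, old = "A", new = "B")  # rename the column by reference
after <- address(DT)                # memory address after renaming

print(names(DT))
print(identical(before, after))
```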

setDF

Converts a data table to a data frame by reference.
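
This section has no transcript, so the following is a minimal sketch of the conversion; the name DT and its columns are illustrative.

```r
library(data.table)

DT <- data.table(a = 1:3, b = c("x", "y", "z"))
print(class(DT))  # starts as a data.table

# Convert in place; no assignment is needed
setDF(DT)
print(class(DT))  # now a plain data.frame
```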

setDT

Converts lists and data frames to data tables by reference.

> data.frame(a = c(1, 2, 3),
+            b = c("A", "b", "c")) %>% {
+   class(.) %>% print()
+   setDT(.) %>% class()
+            }
[1] "data.frame"
[1] "data.table" "data.frame"
> list(1:4, letters[1:4]) %>% setDT() %>% {
+   class(.) %>% print()
+   print(.)
+ }
[1] "data.table" "data.frame"
   V1 V2
1:  1  a
2:  2  b
3:  3  c
4:  4  d

setcolorder

Reorders the columns of a data.table.

> data.table(A = sample(3, 10, TRUE), 
+            B = sample(letters[1:3], 10, TRUE), 
+            C = sample(10)) %>% {
+              colnames(.) %>% print()
+              setcolorder(., c("C", "A", "B")) %>% colnames() %>% print()
+              }
[1] "A" "B" "C"
[1] "C" "A" "B"

setkey

Creates a key on a data table.

Arguments

  • x
  • ...
  • cols
  • value
  • verbose
  • physical
> (DT <- data.table(A = 3:1, B = letters[3:1]))
   A B
1: 3 c
2: 2 b
3: 1 a
> key(DT)
NULL
> setkey(DT, B)
> key(DT)
[1] "B"
> keycols <- c("A","B")
> setkeyv(DT, keycols)
> key(DT)
[1] "A" "B"
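
Setting a key also sorts the rows by the key columns and allows lookups by key value in i. The following is a minimal sketch on the same small table; the name hit is illustrative.

```r
library(data.table)

DT <- data.table(A = 3:1, B = letters[3:1])

# Setting the key reorders the rows by B, by reference
setkey(DT, B)
print(DT)

# With a key in place, a value passed in i is matched against the key column
hit <- DT["b"]
print(hit)
```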

tables

Lists the objects of class data.table. Note that plain data.frame objects are not included.

> tables()
     NAME NROW NCOL MB COLS           KEY      
[1,] DT      3    2  1 A,B            A,B      
[2,] dt1    10    2  1 A,X            A        
[3,] dt2    10    2  1 A,Y            A        
[4,] x       4    3  1 start,end,val2          
[5,] y       3    3  1 start,end,val1 start,end
Total: 5MB

timetaken

Displays the elapsed time.

> started.at <- proc.time()
> Sys.sleep(1)
> timetaken(started.at)
[1] "1.007sec"