data.table: Extension of data.frame
A package that extends the functionality of data.frame
- CRAN: http://cran.r-project.org/web/packages/data.table/index.html
- GitHub: https://github.com/Rdatatable/data.table
> library(data.table)
> library(magrittr) # provides the %>% pipe used in the examples below
Version: 1.9.7
Function | Summary |
---|---|
:= | Assignment by reference |
IDateTime | Integer based date class |
J | Creates a Join data table |
address | Address in RAM of a variable |
all.equal | Equality Test Between Two Data Tables |
as.data.table.xts | Efficient xts to as.data.table conversion |
as.xts.data.table | Efficient data.table to xts conversion |
between | Convenience function for range subset logic. |
chmatch | Faster match of character vectors |
copy | Copy an entire object |
data.table-class | S4 Definition for data.table |
data.table-package | Enhanced data.frame |
dcast.data.table | Fast dcast for data.table |
duplicated | Determine Duplicate Rows |
foverlaps | Fast overlap joins |
frank | Fast rank |
fread | Fast and friendly file finagler |
last | Last item of an object |
like | Convenience function for calling regexpr. |
melt.data.table | Fast melt for data.table |
merge | Merge Two Data Tables |
na.omit.data.table | Remove rows with missing values on columns specified |
patterns | Regex patterns to extract columns from data.table |
rbindlist | Makes one data.table from a list of many |
setDF | Convert a data.table to data.frame by reference |
setDT | Convert lists and data.frames to data.table by reference |
setNumericRounding | Change or turn off numeric rounding |
setattr | Set attributes to objects by reference |
setcolorder | Fast column reordering of a data.table by reference |
setkey | Create key on a data table |
setorder | Fast reordering of a data.table by reference |
shift | Fast lead/lag for vectors and lists |
subset.data.table | Subsetting data.tables |
tables | Display all objects of class 'data.table' |
test.data.table | Runs a set of tests. |
timetaken | Pretty print of time taken |
transform.data.table | Data table utilities |
transpose | Efficient transpose of list |
truelength | Over-allocation access |
tstrsplit | strsplit and transpose the resulting list efficiently |
IDateTime
Integer-based date and time class objects
> as.IDate("2001-01-01") %>% {
+ print(class(.))
+ print(.)
+ }
[1] "IDate" "Date"
[1] "2001-01-01"
> identical(as.IDate("2001-01-01"), as("2001-01-01", "IDate"))
[1] TRUE
> as.ITime("10:45") %>% {
+ print(class(.))
+ print(.)
+ }
[1] "ITime"
[1] "10:45:00"
> # hour(x)
> # yday(x)
> # wday(x)
> # mday(x)
> # week(x)
> # month(x)
> # quarter(x)
> # year(x)
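The accessor functions listed above extract calendar components from IDate and ITime values. A minimal sketch, using 2001-01-01 (a Monday) as the sample date and 10:45 as the sample time; note that wday() counts Sunday as 1, so a Monday gives 2.

```r
library(data.table)

d  <- as.IDate("2001-01-01")  # a Monday
tm <- as.ITime("10:45")

# Calendar components of the IDate
year(d)     # calendar year
quarter(d)  # quarter of the year (1-4)
month(d)    # month (1-12)
week(d)     # week of the year
yday(d)     # day of the year
mday(d)     # day of the month
wday(d)     # day of the week, Sunday = 1

# Hour component of the ITime
hour(tm)
```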
all.equal
> dt1 <- data.table(A = letters[1:10], X = 1:10, key = "A")
> identical(all.equal(dt1, dt1), TRUE)
[1] TRUE
> dt2 <- data.table(A = letters[5:14], Y = 1:10, key = "A")
> all.equal(dt1, dt2)
[1] "Different column names"
> is.character(all.equal(dt1, dt2))
[1] TRUE
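The data.table method of all.equal can also ignore row or column order. A minimal sketch, assuming the ignore.row.order and ignore.col.order arguments found in recent data.table releases (they may be missing in older versions such as 1.9.7; check ?all.equal.data.table for yours). The tables a, b and cc are hypothetical examples.

```r
library(data.table)

a  <- data.table(x = 1:3, y = c("a", "b", "c"))
b  <- a[3:1]         # same rows in reverse order
cc <- a[, .(y, x)]   # same columns in a different order

isTRUE(all.equal(a, b))                            # FALSE: row order differs
isTRUE(all.equal(a, b, ignore.row.order = TRUE))   # rows compared as a set
isTRUE(all.equal(a, cc, ignore.col.order = TRUE))  # columns matched by name
```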
data.table
An extension of base::data.frame
Arguments
- ...
- keep.rownames
- check.names... Default FALSE. Whether to check the column names for duplicates; if any are found, unique column names are generated.
- key
- x
- i
- j
- by
- keyby
- with
- nomatch
- mult
- roll
- rollends
- which
- .SDcols
- verbose
- allow.cartesian
- drop
- on
> data.frame(x = rep(c("a", "b", "c"),
+ each = 3),
+ y = c(1, 3, 6),
+ v = 1:9) %>% class()
[1] "data.frame"
> data.table(x = rep(c("a", "b", "c"),
+ each = 3),
+ y = c(1, 3, 6),
+ v = 1:9) %>% {
+ class(.) %>% print(.) # data.table class
+ is.data.table(.)
+ }
[1] "data.table" "data.frame"
[1] TRUE
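The arguments i, j and by listed above correspond to the DT[i, j, by] syntax, and := (from the function table) adds columns by reference. A minimal sketch reusing the table from the example above; the column names total and w are introduced here only for illustration.

```r
library(data.table)

DT <- data.table(x = rep(c("a", "b", "c"), each = 3),
                 y = c(1, 3, 6),
                 v = 1:9)

# i filters rows (v > 2), j computes a summary, by groups by x
res <- DT[v > 2, .(total = sum(v)), by = x]
res

# := adds (or updates) a column in place, without copying DT
DT[, w := v * 2L]
DT
```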
foverlaps
> x <- data.table(start=c(5,31,22,16), end=c(8,50,25,18), val2 = 7:10)
> y <- data.table(start=c(10, 20, 30), end=c(15, 35, 45), val1 = 1:3)
> setkey(y, start, end)
> foverlaps(x, y, type="any", which=TRUE)
xid yid
1: 1 NA
2: 2 2
3: 2 3
4: 3 2
5: 4 NA
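The type argument controls what counts as an overlap. A minimal sketch with the same x and y as above but type = "within", where an x interval matches only if it lies entirely inside a y interval. Only x row 3 ([22, 25]) lies inside y row 2 ([20, 35]); unmatched rows get NA under the default nomatch.

```r
library(data.table)

x <- data.table(start = c(5, 31, 22, 16), end = c(8, 50, 25, 18), val2 = 7:10)
y <- data.table(start = c(10, 20, 30), end = c(15, 35, 45), val1 = 1:3)
# y must be keyed; the last two key columns are the interval start and end
setkey(y, start, end)

# Return matching row indices instead of the joined columns
res <- foverlaps(x, y, type = "within", which = TRUE)
res
```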
fread
Reads files quickly; faster than read.table.
Arguments
- input... the file name or literal text to read
- sep
- sep2
- nrows... the number of rows to read
- header
- na.strings
- stringsAsFactors
- verbose
- autostart
- skip
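- select... the columns to read
- drop... the columns to skip when reading; giving them class NULL via the colClasses argument has the same effect
- colClasses... explicitly specifies the class of each column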
- integer64
- dec
- col.names
- check.names
- encoding
- strip.white
- showProgress
- data.table... Default TRUE. Set to FALSE to return a data.frame instead of a data.table.
> fread(input = "http://www.stats.ox.ac.uk/pub/datasets/csb/ch11b.dat",
+ nrows = 10) %>% dplyr::glimpse()
Observations: 10
Variables: 5
$ V1 <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
$ V2 <int> 307, 307, 307, 307, 307, 307, 307, 307, 307, 307
$ V3 <int> 930, 940, 950, 1000, 1010, 1020, 1030, 1040, 1050, 1100
$ V4 <dbl> 36.58, 36.73, 36.93, 37.15, 37.23, 37.24, 37.24, 36.90, 36....
$ V5 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
> # Specify the class of each column with the colClasses argument
> data <- "A,B,C,D\n1,3,5,7\n2,4,6,8\n"
> fread(data, colClasses = c(B = "character",
+ C = "character",
+ D = "character")) %>% dplyr::glimpse()
Observations: 2
Variables: 4
$ A <int> 1, 2
$ B <chr> "3", "4"
$ C <chr> "5", "6"
$ D <chr> "7", "8"
> fread(data, colClasses = list(character = c("B", "C"))) %>% dplyr::glimpse()
Observations: 2
Variables: 4
$ A <int> 1, 2
$ B <chr> "3", "4"
$ C <chr> "5", "6"
$ D <int> 7, 8
> fread(data, colClasses = list(character = 2:3)) %>% dplyr::glimpse()
Observations: 2
Variables: 4
$ A <int> 1, 2
$ B <chr> "3", "4"
$ C <chr> "5", "6"
$ D <int> 7, 8
> # Skip columns that are not needed:
> # use the drop argument, give them class NULL via colClasses, or keep only the needed columns with select
> fread(data, drop = c("B","C"))
A D
1: 1 7
2: 2 8
> fread(data, colClasses = list(NULL = c("B","C")))
A D
1: 1 7
2: 2 8
> fread(data, select = c("A", "D"))
A D
1: 1 7
2: 2 8
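The data.table and nrows arguments described above can be checked on the same inline data. A minimal sketch; the variable names df and first are illustrative.

```r
library(data.table)

data <- "A,B,C,D\n1,3,5,7\n2,4,6,8\n"

# A string containing a newline is read as the data itself
df <- fread(data, data.table = FALSE)  # return a plain data.frame
class(df)

first <- fread(data, nrows = 1)  # read only the first data row
first
```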
setattr / setnames
Sets attributes or names on an object by reference
Arguments
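- x... the object to modify
- name, value... (setattr) the attribute name and its value
- old, new... (setnames) the existing and replacement names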
> (i <- rep(1:2, 2))
[1] 1 2 1 2
> # Give i an attribute named "attrib" with the value 1
> (setattr(x = i, name = "attrib", value = 1L))
[1] 1 2 1 2
attr(,"attrib")
[1] 1
> j <- data.frame(A = 1:4)
> names(j)
[1] "A"
> setnames(x = j, old = "A", new = "B")
> names(j)
[1] "B"
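Because setnames() works by reference, the object is renamed in place rather than copied. A minimal sketch that checks this with address() (listed in the function table) by comparing the memory address before and after the rename.

```r
library(data.table)

DT <- data.table(A = 1:3)
before <- address(DT)

# setnames() renames the column in place, so DT is not copied
setnames(DT, old = "A", new = "B")
names(DT)
identical(address(DT), before)
```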
setDF
Converts a data.table to a data.frame by reference
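A minimal sketch of setDF(): the conversion happens in place, so no assignment is needed. The table DT is a hypothetical example.

```r
library(data.table)

DT <- data.table(a = 1:3, b = c("x", "y", "z"))
setDF(DT)   # converts in place; no assignment needed
class(DT)
```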
setDT
Converts lists and data.frames to data.tables by reference
> data.frame(a = c(1, 2, 3),
+ b = c("A", "b", "c")) %>% {
+ class(.) %>% print()
+ setDT(.) %>% class()
+ }
[1] "data.frame"
[1] "data.table" "data.frame"
> list(1:4, letters[1:4]) %>% setDT() %>% {
+ class(.) %>% print()
+ print(.)
+ }
[1] "data.table" "data.frame"
V1 V2
1: 1 a
2: 2 b
3: 3 c
4: 4 d
setcolorder
Reorders the columns of a data.table by reference
> data.table(A = sample(3, 10, TRUE),
+ B = sample(letters[1:3], 10, TRUE),
+ C = sample(10)) %>% {
+ colnames(.) %>% print()
+ setcolorder(., c("C", "A", "B")) %>% colnames() %>% print()
+ }
[1] "A" "B" "C"
[1] "C" "A" "B"
setkey
Creates a key on a data.table (this also sorts it by the key columns)
Arguments
- x
- ...
- cols
- value
- verbose
- physical
> (DT <- data.table(A = 3:1, B = letters[3:1]))
A B
1: 3 c
2: 2 b
3: 1 a
> key(DT)
NULL
> setkey(DT, B)
> key(DT)
[1] "B"
> keycols <- c("A","B")
> setkeyv(DT, keycols)
> key(DT)
[1] "A" "B"
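Once a key is set, the table is sorted by it and a character vector in i is joined against the first key column. A minimal sketch using the same DT as above.

```r
library(data.table)

DT <- data.table(A = 3:1, B = letters[3:1])
setkey(DT, B)

# setkey() sorts DT by the key column(s) in place
DT$B

# A character vector in i is matched against the first key column (B)
DT["b"]
```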
tables
Lists the objects of class data.table in memory. Note that plain data.frame objects are not listed.
> tables()
NAME NROW NCOL MB COLS KEY
[1,] DT 3 2 1 A,B A,B
[2,] dt1 10 2 1 A,X A
[3,] dt2 10 2 1 A,Y A
[4,] x 4 3 1 start,end,val2
[5,] y 3 3 1 start,end,val1 start,end
Total: 5MB
timetaken
Displays the elapsed time
> started.at <- proc.time()
> Sys.sleep(1)
> timetaken(started.at)
[1] "1.007sec"