readr: Read Tabular Data

A package for reading tabular data.

CRAN: http://cran.r-project.org/web/packages/readr/index.html
GitHub: http://github.com/hadley/readr
Vignettes:
- Column types
- Locales

> library(readr)

Version: 1.0.0

Function	Summary
`cols`	Create column specification
`cols_condense`	Examine the column specifications for a data frame
`count_fields`	Count the number of fields in each line of a file.
`date_names`	Create or retrieve date names
`guess_encoding`	Guess encoding of file.
`locale`	Create locales
`parse_atomic`	Parse a character vector into an atomic vector.
`parse_datetime`	Parse a character vector of dates or date times.
`parse_factor`	Parse a character vector into a factor
`parse_guess`	Parse a character vector into the "best" type.
`parse_number`	Extract numbers out of an atomic vector
`problems`	Retrieve parsing problems.
`read_delim`	Read a delimited file into a data frame.
`read_file`	Read a file into a string.
`read_fwf`	Read a fixed width file.
`read_lines`	Read lines from a file or string.
`read_log`	Read common/combined log file.
`read_rds`	Read object from RDS file.
`read_table`	Read text file where columns are separated by whitespace.
`readr_example`	Get path to readr example
`spec_delim`	Retrieve the column specification of a file.
`type_convert`	Re-convert character columns in existing data frame.
`write_delim`	Save a data frame to a delimited file.
`write_lines`	Write lines to a file
`write_rds`	Write a single R object to file

parse_atomic

Parse a character vector into a vector of the requested type.

> parse_integer(c("1", "2", "3"))

[1] 1 2 3

> parse_double(c("1", "2", "3.123"))

[1] 1.000 2.000 3.123

> parse_factor(c("a", "b"), letters)

[1] a b
Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z

> parse_number("$1,123,456.00")

[1] 1123456
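
The examples above all parse cleanly. As a minimal sketch of what happens when a value does not match the requested type, the following uses `problems()` (listed in the function table) to inspect the failure. The input vector is made up for illustration.

```r
library(readr)

# "x" is not a valid integer, so that element becomes NA and a warning is raised
x <- suppressWarnings(parse_integer(c("1", "x", "3")))
x

# problems() returns a data frame with one row per parsing failure
problems(x)
```

The parsed vector keeps its length, so failed elements can be located with `is.na()` or looked up in the `problems()` result.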

cols

> cols(a = col_integer())

cols(
  a = col_integer()
)

> cols_only(a = col_integer())

cols_only(
  a = col_integer()
)
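
A column specification is normally passed to a reader through `col_types`. The sketch below assumes a small inline CSV string (hypothetical data) to show the difference between `cols()` and `cols_only()`.

```r
library(readr)

# Inline CSV text; a string containing a newline is treated as literal data
csv <- "a,b,c\n1,x,2.5\n2,y,3.5\n"

# cols(): set the types of a and b explicitly; c is still guessed
all_cols <- read_csv(csv, col_types = cols(a = col_integer(), b = col_character()))
names(all_cols)

# cols_only(): read column a and drop every column that is not listed
only_a <- read_csv(csv, col_types = cols_only(a = col_integer()))
names(only_a)
```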

cols_condense
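
As a rough sketch of how this function is typically used, the following reads the `mtcars.csv` file bundled with readr, extracts its column specification with `spec()`, and condenses it. The printed result depends on the readr version, so no output is shown.

```r
library(readr)

df <- read_csv(readr_example("mtcars.csv"))

# spec() extracts the full column specification stored on the data frame
s <- spec(df)

# cols_condense() makes the most frequent type the default
# and lists only the columns whose type differs from it
condensed <- cols_condense(s)
condensed
```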

date_names

> date_names_lang("ja")

<date_names>
Days:   日曜日 (日), 月曜日 (月), 火曜日 (火), 水曜日 (水), 木曜日 (木),
        金曜日 (金), 土曜日 (土)
Months: 1月, 2月, 3月, 4月, 5月, 6月, 7月, 8月, 9月, 10月, 11月, 12月
AM/PM:  午前/午後

> date_names_lang("en")

<date_names>
Days:   Sunday (Sun), Monday (Mon), Tuesday (Tue), Wednesday (Wed),
        Thursday (Thu), Friday (Fri), Saturday (Sat)
Months: January (Jan), February (Feb), March (Mar), April (Apr), May
        (May), June (Jun), July (Jul), August (Aug), September
        (Sep), October (Oct), November (Nov), December (Dec)
AM/PM:  AM/PM

> date_names_lang("ko")

<date_names>
Days:   일요일 (일), 월요일 (월), 화요일 (화), 수요일 (수), 목요일 (목),
        금요일 (금), 토요일 (토)
Months: 1월, 2월, 3월, 4월, 5월, 6월, 7월, 8월, 9월, 10월, 11월, 12월
AM/PM:  오전/오후

locale

Arguments

date_names
date_format, time_format
decimal_mark, grouping_mark
tz
encoding
asciify

> locale()

<locale>
Numbers:  123,456.78
Formats:  %AD / %AT
Timezone: UTC
Encoding: UTF-8
<date_names>
Days:   Sunday (Sun), Monday (Mon), Tuesday (Tue), Wednesday (Wed),
        Thursday (Thu), Friday (Fri), Saturday (Sat)
Months: January (Jan), February (Feb), March (Mar), April (Apr), May
        (May), June (Jun), July (Jul), August (Aug), September
        (Sep), October (Oct), November (Nov), December (Dec)
AM/PM:  AM/PM

> locale("ja")

<locale>
Numbers:  123,456.78
Formats:  %AD / %AT
Timezone: UTC
Encoding: UTF-8
<date_names>
Days:   日曜日 (日), 月曜日 (月), 火曜日 (火), 水曜日 (水), 木曜日 (木),
        金曜日 (金), 土曜日 (土)
Months: 1月, 2月, 3月, 4月, 5月, 6月, 7月, 8月, 9月, 10月, 11月, 12月
AM/PM:  午前/午後
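
A locale object is usually handed to a parser or reader through its `locale` argument. The following sketch, with made-up input strings, shows two of the arguments listed above in use: `decimal_mark`/`grouping_mark` for numbers and the language code for month names.

```r
library(readr)

# Decimal comma with a dot as the grouping mark
num <- parse_number("1.234,56",
                    locale = locale(decimal_mark = ",", grouping_mark = "."))
num

# %B matches the full month name in the locale's language (French here)
d <- parse_date("1 janvier 2015", "%d %B %Y", locale = locale("fr"))
d
```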

parse_datetime

> parse_datetime("01/02/2010", "%d/%m/%Y")

[1] "2010-02-01 UTC"

> parse_datetime("2015年9月24日", "%Y年%m月%d日")

[1] "2015-09-24 UTC"
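
Both results above are in UTC, which is the default time zone of `locale()`. As a small sketch, assuming the input should be read as Tokyo wall-clock time, the time zone can be changed through the `tz` argument of `locale()`.

```r
library(readr)

# Interpret the string as Tokyo time instead of the default UTC
jst <- parse_datetime("2015-09-24 10:30", "%Y-%m-%d %H:%M",
                      locale = locale(tz = "Asia/Tokyo"))
jst

# The same instant expressed in UTC (Tokyo is nine hours ahead of UTC)
format(jst, tz = "UTC", usetz = TRUE)
```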

read_delim

Read a delimited file, such as comma-separated (.csv) or tab-separated (.tsv), into a data frame.

Arguments

file
delim
quote
escape_backslash
escape_double
col_names
col_types
locale
na
comment
skip
n_max
progress
trim_ws

> read_csv("https://github.com/hadley/readr/raw/master/inst/extdata/mtcars.csv") %>% head()

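Columns such as dates can be read in a specific format by passing a column specification, for example col_date(format = ...), to col_types.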
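
A minimal sketch of reading a date column with an explicit format follows. The inline data and the column names `id` and `day` are hypothetical.

```r
library(readr)

# Hypothetical inline data with dates written as year/month/day
csv <- "id,day\n1,2015/09/24\n2,2015/10/01\n"

df <- read_csv(csv, col_types = cols(
  id  = col_integer(),
  day = col_date(format = "%Y/%m/%d")  # same format codes as parse_datetime()
))
class(df$day)
```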

read_delim_chunked

Arguments

file
callback
chunk_size
delim
quote
escape_backslash
escape_double
col_names
col_types
locale
na
quoted_na
comment
trim_ws
skip
guess_max
progress

> f <- function(x, pos) subset(x, gear == 3)
> read_csv_chunked(readr_example("mtcars.csv"), DataFrameCallback$new(f), chunk_size = 5)

Parsed with column specification:
cols(
  mpg = col_double(),
  cyl = col_integer(),
  disp = col_integer(),
  hp = col_integer(),
  drat = col_double(),
  wt = col_double(),
  qsec = col_double(),
  vs = col_integer(),
  am = col_integer(),
  gear = col_integer(),
  carb = col_integer()
)

# A tibble: 15 × 11
     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
*  <dbl> <int> <int> <int> <dbl> <dbl> <dbl> <int> <int> <int> <int>
1   21.4     6   258   110  3.08 3.215 19.44     1     0     3     1
2   18.7     8   360   175  3.15 3.440 17.02     0     0     3     2
3   18.1     6   225   105  2.76 3.460 20.22     1     0     3     1
4   14.3     8   360   245  3.21 3.570 15.84     0     0     3     4
5   16.4     8    NA   180  3.07 4.070 17.40     0     0     3     3
6   17.3     8    NA   180  3.07 3.730 17.60     0     0     3     3
7   15.2     8    NA   180  3.07 3.780 18.00     0     0     3     3
8   10.4     8   472   205  2.93 5.250 17.98     0     0     3     4
9   10.4     8   460   215  3.00 5.424 17.82     0     0     3     4
10  14.7     8   440   230  3.23 5.345 17.42     0     0     3     4
11  21.5     4    NA    97  3.70 2.465 20.01     1     0     3     1
12  15.5     8   318   150  2.76 3.520 16.87     0     0     3     2
13  15.2     8   304   150  3.15 3.435 17.30     0     0     3     2
14  13.3     8   350   245  3.73 3.840 15.41     0     0     3     4
15  19.2     8   400   175  3.08 3.845 17.05     0     0     3     2
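
In the example above, `DataFrameCallback` row-binds whatever the callback returns for each chunk. As a complementary sketch, `SideEffectChunkCallback` runs the callback only for its side effects. The helper `count_rows` and the variable `total` are names made up for this illustration.

```r
library(readr)

total <- 0
count_rows <- function(x, pos) {
  # x is the current chunk as a data frame; pos is its position in the file
  total <<- total + nrow(x)
}

# Process the bundled example file ten rows at a time
read_csv_chunked(readr_example("mtcars.csv"),
                 SideEffectChunkCallback$new(count_rows),
                 chunk_size = 10)
total
```

Because only one chunk is held in memory at a time, this pattern suits files that are too large to read in one go.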

readr_example
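
As a brief sketch of what this helper is for: it returns the full path to one of the sample files installed with readr, which can then be passed to any reader.

```r
library(readr)

# Full path to a sample file shipped inside the installed readr package
path <- readr_example("mtcars.csv")
file.exists(path)

# The path works with any reader, for example read_lines()
first_lines <- head(read_lines(path), 3)
first_lines
```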

spec_delim
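
A minimal sketch, using `spec_csv()` (the comma-delimited counterpart of `spec_delim()`) on the bundled example file. The guessed types vary between readr versions, so no output is shown.

```r
library(readr)

# Guess the column types of the file without returning the data itself
s <- spec_csv(readr_example("mtcars.csv"))
s

# The specification can be reused as col_types when actually reading the file
df <- suppressWarnings(read_csv(readr_example("mtcars.csv"), col_types = s))
dim(df)
```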

type_convert

Re-convert the character columns of an existing data frame, optionally under a different locale.

Arguments

df
col_types
na
trim_ws
locale

> read_delim(file = "/Users/uri/git/hatena_blog/demo.txt", delim = "\t") %>% 
+   type_convert(locale = readr::locale(encoding = "cp932"))

Parsed with column specification:
cols(
  var1 = col_character(),
  var2 = col_character(),
  var3 = col_integer()
)

Parsed with column specification:
cols(
  var1 = col_character(),
  var2 = col_character()
)

# A tibble: 3 × 3
   var1  var2  var3
  <chr> <chr> <int>
1    あ     a     1
2    い     b     2
3    う     c    10
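
The `locale` argument also affects how numbers are re-guessed. The sketch below assumes a hypothetical data frame whose numeric column was stored as text with decimal commas.

```r
library(readr)

# Hypothetical data frame: numbers stored as text with decimal commas
df <- data.frame(
  price = c("1,5", "2,25"),
  label = c("a", "b"),
  stringsAsFactors = FALSE
)

# Re-guess the column types, telling readr that "," is the decimal mark
out <- type_convert(df, locale = locale(decimal_mark = ","))
str(out)
```

Only character columns are re-converted; columns that already have another type are left untouched.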

References

The difference between read.csv() and read.csv2() in R #rstatsj - Qiita