readr: Read Tabular Data

A package for reading tabular data.

> library(readr)

Version: 1.0.0


Function Summary
cols Create column specification
cols_condense Examine the column specifications for a data frame
count_fields Count the number of fields in each line of a file.
date_names Create or retrieve date names
guess_encoding Guess encoding of file.
locale Create locales
parse_atomic Parse character vector in an atomic vector.
parse_datetime Parse a character vector of dates or date times.
parse_factor Parse a character vector into a factor
parse_guess Parse a character vector into the "best" type.
parse_number Extract numbers out of an atomic vector
problems Retrieve parsing problems.
read_delim Read a delimited file into a data frame.
read_file Read a file into a string.
read_fwf Read a fixed width file.
read_lines Read lines from a file or string.
read_log Read common/combined log file.
read_rds Read object from RDS file.
read_table Read text file where columns are separated by whitespace.
readr_example Get path to readr example
spec_delim Retrieve the column specification of a file.
type_convert Re-convert character columns in existing data frame.
write_delim Save a data frame to a delimited file.
write_lines Write lines/ a file
write_rds Write a single R object to file

parse_atomic

Parses a character vector into a specific atomic type.

> parse_integer(c("1", "2", "3"))
[1] 1 2 3
> parse_double(c("1", "2", "3.123"))
[1] 1.000 2.000 3.123
> parse_factor(c("a", "b"), letters)
[1] a b
Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
> parse_number("$1,123,456.00")
[1] 1123456
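
Values that cannot be parsed are not silently dropped: they become NA, a warning is raised, and problems() (listed in the index above) returns the details. A minimal sketch, assuming readr is installed; the input "x" is just made-up bad data.

```r
library(readr)

# "x" is not an integer, so it becomes NA (the warning is suppressed here)
x <- suppressWarnings(parse_integer(c("1", "2", "x")))
x

# problems() returns one row per parsing failure (row, expected, actual, ...)
problems(x)
```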

cols

> cols(a = col_integer())
cols(
  a = col_integer()
)
> cols_only(a = col_integer())
cols_only(
  a = col_integer()
)
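
A column specification is normally passed to the col_types argument of the read_* functions. The sketch below assumes a small made-up CSV string; I() marks it as literal data rather than a file path. cols() fixes the listed columns and guesses the rest, cols_only() keeps only the listed columns, and a compact string gives one letter per column.

```r
library(readr)

csv <- "a,b,c\n1,x,2.5\n2,y,3.5"

# cols(): specify column a, let readr guess b and c
df1 <- read_csv(I(csv), col_types = cols(a = col_integer()))

# cols_only(): read only the listed columns (c is dropped)
df2 <- read_csv(I(csv), col_types = cols_only(a = col_integer(), b = col_character()))

# Compact form: i = integer, c = character, d = double
df3 <- read_csv(I(csv), col_types = "icd")
```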

cols_condense
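
spec() retrieves the column specification that readr attaches to a data frame it has read, and cols_condense() rewrites it so that the most common type becomes the default and only the exceptions are listed. A minimal sketch using the bundled mtcars.csv example file; the exact types printed depend on the readr version.

```r
library(readr)

df <- read_csv(readr_example("mtcars.csv"))

# Full specification: one entry per column
s <- spec(df)
s

# Condensed specification: most common type moved into .default
cols_condense(s)
```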

date_names

> date_names_lang("ja")
<date_names>
Days:   日曜日 (日), 月曜日 (月), 火曜日 (火), 水曜日 (水), 木曜日 (木),
        金曜日 (金), 土曜日 (土)
Months: 1月, 2月, 3月, 4月, 5月, 6月, 7月, 8月, 9月, 10月, 11月, 12月
AM/PM:  午前/午後
> date_names_lang("en")
<date_names>
Days:   Sunday (Sun), Monday (Mon), Tuesday (Tue), Wednesday (Wed),
        Thursday (Thu), Friday (Fri), Saturday (Sat)
Months: January (Jan), February (Feb), March (Mar), April (Apr), May
        (May), June (Jun), July (Jul), August (Aug), September
        (Sep), October (Oct), November (Nov), December (Dec)
AM/PM:  AM/PM
> date_names_lang("ko")
<date_names>
Days:   일요일 (일), 월요일 (월), 화요일 (화), 수요일 (수), 목요일 (목),
        금요일 (금), 토요일 (토)
Months: 1월, 2월, 3월, 4월, 5월, 6월, 7월, 8월, 9월, 10월, 11월, 12월
AM/PM:  오전/오후
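
The date names are used when parsing month or weekday names, such as with the %B (full month name) format. A minimal sketch, assuming the French ("fr") date names shipped with readr; date_names_langs() lists the available language codes.

```r
library(readr)

# "janvier" is matched against the French month names
d <- parse_date("1 janvier 2015", "%d %B %Y", locale = locale("fr"))
d

# Language codes accepted by date_names_lang() and locale()
head(date_names_langs())
```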

locale

Arguments

  • date_names
  • date_format, time_format
  • decimal_mark, grouping_mark
  • tz
  • encoding
  • asciify
> locale()
<locale>
Numbers:  123,456.78
Formats:  %AD / %AT
Timezone: UTC
Encoding: UTF-8
<date_names>
Days:   Sunday (Sun), Monday (Mon), Tuesday (Tue), Wednesday (Wed),
        Thursday (Thu), Friday (Fri), Saturday (Sat)
Months: January (Jan), February (Feb), March (Mar), April (Apr), May
        (May), June (Jun), July (Jul), August (Aug), September
        (Sep), October (Oct), November (Nov), December (Dec)
AM/PM:  AM/PM
> locale("ja")
<locale>
Numbers:  123,456.78
Formats:  %AD / %AT
Timezone: UTC
Encoding: UTF-8
<date_names>
Days:   日曜日 (日), 月曜日 (月), 火曜日 (火), 水曜日 (水), 木曜日 (木),
        金曜日 (金), 土曜日 (土)
Months: 1月, 2月, 3月, 4月, 5月, 6月, 7月, 8月, 9月, 10月, 11月, 12月
AM/PM:  午前/午後
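
decimal_mark and grouping_mark control how numbers are read. A minimal sketch for European-style numbers that use a decimal comma and a dot as the thousands separator; the sample strings are made up.

```r
library(readr)

# Decimal comma, dot as grouping mark
eu <- locale(decimal_mark = ",", grouping_mark = ".")

parse_double("1,23", locale = eu)
parse_number("1.234.567,89", locale = eu)
```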

parse_datetime

> parse_datetime("01/02/2010", "%d/%m/%Y")
[1] "2010-02-01 UTC"
> parse_datetime("2015年9月24日", "%Y年%m月%d日")
[1] "2015-09-24 UTC"
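
When no format is given, parse_datetime() expects ISO 8601 input, and the time zone is taken from the locale (UTC by default). A minimal sketch with made-up timestamps; "Asia/Tokyo" is just an example time zone name.

```r
library(readr)

# ISO 8601 input needs no format string
x <- parse_datetime("2015-09-24 10:00")

# Same clock time, interpreted in Tokyo time via the locale
y <- parse_datetime("2015-09-24 10:00", locale = locale(tz = "Asia/Tokyo"))

# parse_date() returns a Date rather than a date-time
d <- parse_date("2015/09/24", "%Y/%m/%d")
```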

read_delim

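Reads a delimited file (for example a comma-separated .csv or tab-separated .tsv file) into a data frame. read_csv() and read_tsv() are variants of read_delim() with the delimiter preset.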

Arguments

  • file
  • delim
  • quote
  • escape_backslash
  • escape_double
  • col_names
  • col_types
  • locale
  • na
  • comment
  • skip
  • n_max
  • progress
  • trim_ws
> library(magrittr)  # provides %>%
> read_csv("https://github.com/hadley/readr/raw/master/inst/extdata/mtcars.csv") %>% head()

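Columns such as dates can be read with an explicit format by specifying them in col_types (for example col_date(format = ...)).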
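
A minimal sketch of reading a date column with an explicit format, plus a tab-delimited read with read_delim(). The small CSV and TSV strings are made-up literal data, wrapped in I() so they are not treated as file paths.

```r
library(readr)

txt <- "date,value\n2015/09/24,1\n2015/09/25,2"
df <- read_csv(I(txt), col_types = cols(
  date  = col_date(format = "%Y/%m/%d"),  # explicit date format
  value = col_integer()
))

# The same idea with an explicit delimiter
tsv <- read_delim(I("a\tb\n1\t2"), delim = "\t")
```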

read_delim_chunked

Arguments

  • file
  • callback
  • chunk_size
  • delim
  • quote
  • escape_backslash
  • escape_double
  • col_names
  • col_types
  • locale
  • na
  • quoted_na
  • comment
  • trim_ws
  • skip
  • guess_max
  • progress
> f <- function(x, pos) subset(x, gear == 3)
> read_csv_chunked(readr_example("mtcars.csv"), DataFrameCallback$new(f), chunk_size = 5)
Parsed with column specification:
cols(
  mpg = col_double(),
  cyl = col_integer(),
  disp = col_integer(),
  hp = col_integer(),
  drat = col_double(),
  wt = col_double(),
  qsec = col_double(),
  vs = col_integer(),
  am = col_integer(),
  gear = col_integer(),
  carb = col_integer()
)
# A tibble: 15 × 11
     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
*  <dbl> <int> <int> <int> <dbl> <dbl> <dbl> <int> <int> <int> <int>
1   21.4     6   258   110  3.08 3.215 19.44     1     0     3     1
2   18.7     8   360   175  3.15 3.440 17.02     0     0     3     2
3   18.1     6   225   105  2.76 3.460 20.22     1     0     3     1
4   14.3     8   360   245  3.21 3.570 15.84     0     0     3     4
5   16.4     8    NA   180  3.07 4.070 17.40     0     0     3     3
6   17.3     8    NA   180  3.07 3.730 17.60     0     0     3     3
7   15.2     8    NA   180  3.07 3.780 18.00     0     0     3     3
8   10.4     8   472   205  2.93 5.250 17.98     0     0     3     4
9   10.4     8   460   215  3.00 5.424 17.82     0     0     3     4
10  14.7     8   440   230  3.23 5.345 17.42     0     0     3     4
11  21.5     4    NA    97  3.70 2.465 20.01     1     0     3     1
12  15.5     8   318   150  2.76 3.520 16.87     0     0     3     2
13  15.2     8   304   150  3.15 3.435 17.30     0     0     3     2
14  13.3     8   350   245  3.73 3.840 15.41     0     0     3     4
15  19.2     8   400   175  3.08 3.845 17.05     0     0     3     2
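
The NA values in disp above are presumably a type-guessing artifact: guess_max appears to default to chunk_size (here 5), the first five disp values in mtcars are whole numbers, so the column is guessed as integer and decimal values such as 275.8 or 120.1 fail to parse. A minimal sketch that avoids this by fixing the column types up front (raising guess_max is another option); declaring every column as double is an assumption that happens to suit mtcars.

```r
library(readr)

# Keep only the rows of each chunk where gear == 3
f <- function(x, pos) subset(x, gear == 3)

res <- read_csv_chunked(
  readr_example("mtcars.csv"),
  DataFrameCallback$new(f),  # row-binds the filtered chunks
  chunk_size = 5,
  # Declare types instead of guessing them from the first chunk
  col_types = cols(.default = col_double())
)
```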

readr_example
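
readr_example() returns the path to one of the sample files bundled with readr, which is convenient for trying out the read_* functions. A minimal sketch using mtcars.csv.

```r
library(readr)

path <- readr_example("mtcars.csv")
path

df <- read_csv(path)
```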

spec_delim
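
spec_delim() and its wrappers such as spec_csv() guess the column specification of a file without returning the data. The result can be inspected, edited if needed, and passed back as col_types. A minimal sketch using the bundled mtcars.csv.

```r
library(readr)

# Guess the specification only
s <- spec_csv(readr_example("mtcars.csv"))
s

# Reuse it so the types are fixed rather than guessed again
df <- read_csv(readr_example("mtcars.csv"), col_types = s)
```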

type_convert

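Re-converts the character columns of an existing data frame by guessing a more suitable type for each one. A locale can be supplied to control parsing; the example passes one with encoding = "cp932".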

Arguments

  • df
  • col_types
  • na
  • trim_ws
  • locale
> library(magrittr)  # provides %>%
> read_delim(file = "/Users/uri/git/hatena_blog/demo.txt", delim = "\t") %>% 
+   type_convert(locale = readr::locale(encoding = "cp932"))
Parsed with column specification:
cols(
  var1 = col_character(),
  var2 = col_character(),
  var3 = col_integer()
)
Parsed with column specification:
cols(
  var1 = col_character(),
  var2 = col_character()
)
# A tibble: 3 × 3
   var1  var2  var3
  <chr> <chr> <int>
1    あ     a     1
2    い     b     2
3    う     c    10
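
A self-contained version of the idea, without an external file: a data frame whose columns are all character is passed through type_convert(), which turns the numeric-looking column into a number and leaves the rest as character. The data is made up for illustration.

```r
library(readr)

# Every column starts out as character
df <- data.frame(x = c("1", "2", "10"), y = c("a", "b", "c"),
                 stringsAsFactors = FALSE)

out <- type_convert(df)
str(out)
```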
