readr: Read Tabular Data
A package for reading tabular data.
- CRAN: http://cran.r-project.org/web/packages/readr/index.html
- GitHub: http://github.com/hadley/readr
- Vignettes:
> library(readr)
Version: 1.0.0
Function | Summary |
---|---|
cols | Create column specification |
cols_condense | Examine the column specifications for a data frame |
count_fields | Count the number of fields in each line of a file. |
date_names | Create or retrieve date names |
guess_encoding | Guess encoding of file. |
locale | Create locales |
parse_atomic | Parse a character vector into an atomic vector. |
parse_datetime | Parse a character vector of dates or date times. |
parse_factor | Parse a character vector into a factor |
parse_guess | Parse a character vector into the "best" type. |
parse_number | Extract numbers out of an atomic vector |
problems | Retrieve parsing problems. |
read_delim | Read a delimited file into a data frame. |
read_file | Read a file into a string. |
read_fwf | Read a fixed width file. |
read_lines | Read lines from a file or string. |
read_log | Read common/combined log file. |
read_rds | Read object from RDS file. |
read_table | Read text file where columns are separated by whitespace. |
readr_example | Get path to readr example |
spec_delim | Retrieve the column specification of a file. |
type_convert | Re-convert character columns in existing data frame. |
write_delim | Save a data frame to a delimited file. |
write_lines | Write lines to a file |
write_rds | Write a single R object to file |
parse_atomic
Parses a character vector into an atomic vector of the requested type.
> parse_integer(c("1", "2", "3"))
[1] 1 2 3
> parse_double(c("1", "2", "3.123"))
[1] 1.000 2.000 3.123
> parse_factor(c("a", "b"), letters)
[1] a b
Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
> parse_number("$1,123,456.00")
[1] 1123456
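When a value cannot be parsed, the parsers return `NA` for that element and record the failure. The following is a minimal sketch of how such failures might be inspected with `problems()`, which appears in the function table above; the input vector is made up for illustration.

```r
library(readr)

# Hypothetical input: "abc" cannot be parsed as an integer.
# The parser emits a warning, so it is suppressed here to keep the output short.
x <- suppressWarnings(parse_integer(c("1", "abc", "3")))
x

# problems() returns one row per element that failed to parse.
problems(x)
```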
cols
> cols(a = col_integer())
cols(
a = col_integer()
)
> cols_only(a = col_integer())
cols_only(
a = col_integer()
)
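The specifications above only become useful once they are passed to a reader through `col_types`. The sketch below assumes a small temporary CSV file whose contents are invented for this example, and contrasts `cols()` with `cols_only()`.

```r
library(readr)

# Write a small sample file (hypothetical data, for illustration only).
path <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "1,x,2.5", "2,y,3.5"), path)

# cols(): column a is read as integer, every other column as character.
all_cols <- read_csv(
  path,
  col_types = cols(a = col_integer(), .default = col_character())
)

# cols_only(): only the listed column is read; the others are dropped.
only_a <- read_csv(path, col_types = cols_only(a = col_integer()))
names(only_a)
```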
cols_condense
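A minimal sketch of how `cols_condense()` might be used, assuming the bundled `mtcars.csv` example file: it takes the specification attached to a data frame and folds the most common column type into a default, which gives a shorter summary.

```r
library(readr)

# Read the bundled example file; the guessed specification is kept on the result.
df <- suppressMessages(read_csv(readr_example("mtcars.csv")))

# spec() extracts the full specification; cols_condense() shortens it.
full_spec <- spec(df)
short_spec <- cols_condense(full_spec)
short_spec
```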
date_names
> date_names_lang("ja")
<date_names>
Days: 日曜日 (日), 月曜日 (月), 火曜日 (火), 水曜日 (水), 木曜日 (木),
金曜日 (金), 土曜日 (土)
Months: 1月, 2月, 3月, 4月, 5月, 6月, 7月, 8月, 9月, 10月, 11月, 12月
AM/PM: 午前/午後
> date_names_lang("en")
<date_names>
Days: Sunday (Sun), Monday (Mon), Tuesday (Tue), Wednesday (Wed),
Thursday (Thu), Friday (Fri), Saturday (Sat)
Months: January (Jan), February (Feb), March (Mar), April (Apr), May
(May), June (Jun), July (Jul), August (Aug), September
(Sep), October (Oct), November (Nov), December (Dec)
AM/PM: AM/PM
> date_names_lang("ko")
<date_names>
Days: 일요일 (일), 월요일 (월), 화요일 (화), 수요일 (수), 목요일 (목),
금요일 (금), 토요일 (토)
Months: 1월, 2월, 3월, 4월, 5월, 6월, 7월, 8월, 9월, 10월, 11월, 12월
AM/PM: 오전/오후
locale
Arguments
- date_names
- date_format, time_format
- decimal_mark, grouping_mark
- tz
- encoding
- asciify
> locale()
<locale>
Numbers: 123,456.78
Formats: %AD / %AT
Timezone: UTC
Encoding: UTF-8
<date_names>
Days: Sunday (Sun), Monday (Mon), Tuesday (Tue), Wednesday (Wed),
Thursday (Thu), Friday (Fri), Saturday (Sat)
Months: January (Jan), February (Feb), March (Mar), April (Apr), May
(May), June (Jun), July (Jul), August (Aug), September
(Sep), October (Oct), November (Nov), December (Dec)
AM/PM: AM/PM
> locale("ja")
<locale>
Numbers: 123,456.78
Formats: %AD / %AT
Timezone: UTC
Encoding: UTF-8
<date_names>
Days: 日曜日 (日), 月曜日 (月), 火曜日 (火), 水曜日 (水), 木曜日 (木),
金曜日 (金), 土曜日 (土)
Months: 1月, 2月, 3月, 4月, 5月, 6月, 7月, 8月, 9月, 10月, 11月, 12月
AM/PM: 午前/午後
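The `decimal_mark` and `grouping_mark` arguments change how numbers are parsed. The sketch below uses a hypothetical locale object named `eu` for numbers written in the continental European style.

```r
library(readr)

# Hypothetical locale: "," is the decimal mark and "." groups thousands.
eu <- locale(decimal_mark = ",", grouping_mark = ".")

# The grouping mark is dropped and the decimal mark is honoured.
amount <- parse_number("1.234,56", locale = eu)
amount  # 1234.56

half <- parse_double("1,5", locale = eu)
half    # 1.5
```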
parse_datetime
> parse_datetime("01/02/2010", "%d/%m/%Y")
[1] "2010-02-01 UTC"
> parse_datetime("2015年9月24日", "%Y年%m月%d日")
[1] "2015-09-24 UTC"
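The results above are in UTC because that is the default locale's time zone. As a sketch, and assuming the `Asia/Tokyo` time zone is available on the system, a locale can be passed to interpret the same kind of input in another zone; `parse_date()` works the same way for date-only values.

```r
library(readr)

# Interpret a timestamp without an offset as Japan Standard Time.
jst <- parse_datetime("2015-09-24 10:00:00", locale = locale(tz = "Asia/Tokyo"))
attr(jst, "tzone")

# Date-only input with an explicit format string.
d <- parse_date("24/09/2015", format = "%d/%m/%Y")
d
```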
read_delim
Reads delimited files, such as comma-separated (.csv) and tab-separated (.tsv) files.
Arguments
- file
- delim
- quote
- escape_backslash
- escape_double
- col_names
- col_types
- locale
- na
- comment
- skip
- n_max
- progress
- trim_ws
> read_csv("https://github.com/hadley/readr/raw/master/inst/extdata/mtcars.csv") %>% head()
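Columns such as dates can be read in a specified format by supplying it through `col_types`, for example with `col_date(format = ...)`.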
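As a sketch of reading dates in a specific format, the example below writes a small temporary file (the file contents and the column names `day` and `value` are invented for illustration) and declares the date format in `col_types`.

```r
library(readr)

# Hypothetical sample data with day/month/year dates.
path <- tempfile(fileext = ".csv")
writeLines(c("day,value", "24/09/2015,1", "25/09/2015,2"), path)

# Declare the date format explicitly instead of relying on guessing.
df <- read_csv(
  path,
  col_types = cols(
    day = col_date(format = "%d/%m/%Y"),
    value = col_integer()
  )
)
df
```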
read_delim_chunked
Arguments
- file
- callback
- chunk_size
- delim
- quote
- escape_backslash
- escape_double
- col_names
- col_types
- locale
- na
- quoted_na
- comment
- trim_ws
- skip
- guess_max
- progress
> f <- function(x, pos) subset(x, gear == 3)
> read_csv_chunked(readr_example("mtcars.csv"), DataFrameCallback$new(f), chunk_size = 5)
Parsed with column specification:
cols(
mpg = col_double(),
cyl = col_integer(),
disp = col_integer(),
hp = col_integer(),
drat = col_double(),
wt = col_double(),
qsec = col_double(),
vs = col_integer(),
am = col_integer(),
gear = col_integer(),
carb = col_integer()
)
# A tibble: 15 × 11
mpg cyl disp hp drat wt qsec vs am gear carb
* <dbl> <int> <int> <int> <dbl> <dbl> <dbl> <int> <int> <int> <int>
1 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
2 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
3 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
4 14.3 8 360 245 3.21 3.570 15.84 0 0 3 4
5 16.4 8 NA 180 3.07 4.070 17.40 0 0 3 3
6 17.3 8 NA 180 3.07 3.730 17.60 0 0 3 3
7 15.2 8 NA 180 3.07 3.780 18.00 0 0 3 3
8 10.4 8 472 205 2.93 5.250 17.98 0 0 3 4
9 10.4 8 460 215 3.00 5.424 17.82 0 0 3 4
10 14.7 8 440 230 3.23 5.345 17.42 0 0 3 4
11 21.5 4 NA 97 3.70 2.465 20.01 1 0 3 1
12 15.5 8 318 150 2.76 3.520 16.87 0 0 3 2
13 15.2 8 304 150 3.15 3.435 17.30 0 0 3 2
14 13.3 8 350 245 3.73 3.840 15.41 0 0 3 4
15 19.2 8 400 175 3.08 3.845 17.05 0 0 3 2
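`DataFrameCallback` combines the result of each chunk into one data frame. When only a side effect is needed, `SideEffectChunkCallback` can be used instead. The sketch below counts rows chunk by chunk; the names `total` and `count_rows` are made up for this example.

```r
library(readr)

# Running total updated from inside the callback.
total <- 0

# Called once per chunk: x is the chunk, pos is the row where the chunk starts.
count_rows <- function(x, pos) {
  total <<- total + nrow(x)
}

suppressMessages(
  read_csv_chunked(
    readr_example("mtcars.csv"),
    SideEffectChunkCallback$new(count_rows),
    chunk_size = 5
  )
)
total
```

Because each chunk is discarded after the callback runs, this pattern keeps memory use bounded by `chunk_size` rather than by the size of the file.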
readr_example
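A brief sketch of what `readr_example()` returns: the full path to a sample file shipped with the package, here `mtcars.csv`, the file used in the chunked-reading example above.

```r
library(readr)

# Resolve the installed location of a bundled example file.
path <- readr_example("mtcars.csv")
basename(path)
file.exists(path)
```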
spec_delim
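A minimal sketch, assuming the bundled `mtcars.csv` file: `spec_csv()`, the comma-delimited counterpart of `spec_delim()`, returns the guessed column specification without keeping the data itself.

```r
library(readr)

# Guess the column specification of the file.
s <- suppressMessages(spec_csv(readr_example("mtcars.csv")))
s

# The specification lists one entry per column.
names(s$cols)
```

The printed specification can be copied into a script and passed to `col_types` so that later reads do not depend on guessing.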
type_convert
Re-converts the character columns of an existing data frame, optionally using a given locale.
Arguments
- df
- col_types
- na
- trim_ws
- locale
> read_delim(file = "/Users/uri/git/hatena_blog/demo.txt", delim = "\t") %>%
+ type_convert(locale = readr::locale(encoding = "cp932"))
Parsed with column specification:
cols(
var1 = col_character(),
var2 = col_character(),
var3 = col_integer()
)
Parsed with column specification:
cols(
var1 = col_character(),
var2 = col_character()
)
# A tibble: 3 × 3
var1 var2 var3
<chr> <chr> <int>
1 あ a 1
2 い b 2
3 う c 10
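The example above depends on a local file. As a self-contained sketch, the same re-conversion can be seen on a data frame built in memory; the data frame and its column names are invented for illustration.

```r
library(readr)

# Hypothetical data frame in which every column is stored as character.
df <- data.frame(
  x = c("1", "2", "3"),
  y = c("1.5", "2.5", "3.5"),
  z = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# type_convert() re-guesses the type of each character column.
out <- suppressMessages(type_convert(df))
str(out)
```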