tidyjson: A Grammar for Turning 'JSON' into Tidy Tables

Reshapes JSON files into tidy tables that are easier to work with

> library(tidyjson)

Version: 0.2.1


Function Description
[.tbl_json Extract subsets of a tbl_json object (not replace)
allowed_json_types Fundamental JSON types from http://json.org/, where I collapse 'true' and 'false' into 'logical'
append_values Appends all values with a specified type as a new column
append_values_factory Creates the append_values_* functions
append_values_type Gets a list of values from JSON
commits Commit data for the dplyr repo from github API
companies Startup company information for 1,000 companies
determine_types Determines the types of a list of parsed JSON
enter_object Dive into a specific object "key"
gather_array Stack a JSON array
gather_keys Stack a JSON "key": value object
issues Issue data for the dplyr repo from github API
jfactory Factory that creates the j* functions below
jfunctions Navigates nested objects to get at keys of a specific type, to be used as arguments to spread_values
json_lengths Add a column that contains the length of the JSON data
json_types Add a column that tells the 'type' of the data in the root of the JSON
list_path Recursively access a path
my_unlist Unlists while preserving NULLs and only unlisting lists with one value
prep_path Prepare a path from ...
read_json Reads JSON from an input uri (file, url, ...) and returns a tbl_json
replace_nulls Replace nulls with something else
spread_values Create new columns with JSON values
tbl_json Combines structured JSON (as a data.frame) with remaining JSON
tidyjson tidyjson.
worldbank Projects funded by the World Bank
wrap_dplyr_verb Wrapper for extending dplyr verbs to tbl_json objects

append_values_string / append_values_number / append_values_logical

Arguments

  • x
  • column.name
  • force
> '{"first": "bob", "last": "jones"}' %>%
+   gather_keys() %>%
+   append_values_string()
  document.id   key string
1           1 first    bob
2           1  last  jones
> '{"first": true, "last": false}' %>%
+   gather_keys() %>%
+   append_values_logical()
  document.id   key logical
1           1 first    TRUE
2           1  last   FALSE
> '{"first": 3, "last": 10}' %>%
+   gather_keys() %>%
+   append_values_number()
  document.id   key number
1           1 first      3
2           1  last     10

commits

enter_object

> c('{"name": "bob", "children": ["sally", "george"]}', '{"name": "anne"}') %>%
+   spread_values(parent.name = jstring("name")) %>%
+   enter_object("children") %>%
+   gather_array() %>%
+   append_values_string("children")
  document.id parent.name array.index children
1           1         bob           1    sally
2           1         bob           2   george

gather_array

Stacks a JSON array into a data frame, one row per array element

> '[1, "a", {"k": "v"}]' %>% gather_array() %>% json_types()
  document.id array.index   type
1           1           1 number
2           1           2 string
3           1           3 object

gather_keys

Arguments

  • x
  • column.name
> '{"name": "bob", "age": 32}' %>% gather_keys() %>% json_types()
  document.id  key   type
1           1 name string
2           1  age number
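
The `column.name` argument listed above sets the name of the column that receives the keys. A minimal sketch, assuming the signature `gather_keys(x, column.name = "key")`; the column names `field` and `value` are chosen here purely for illustration, and magrittr is loaded explicitly in case `%>%` is not already attached:

```r
library(tidyjson)
library(magrittr)  # provides %>%

fields <- '{"first": "bob", "last": "jones"}' %>%
  gather_keys("field") %>%          # keys go into a column named "field"
  append_values_string("value")     # values go into a column named "value"

print(fields)
```

Renaming the columns this way helps when several `gather_keys()` calls are chained and the default name `key` would otherwise collide.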

issues

json_types

> c('{"a": 1}', '[1, 2]', '"a"', '1', 'true', 'null') %>% json_types()
  document.id    type
1           1  object
2           2   array
3           3  string
4           4  number
5           5 logical
6           6    null

json_lengths

> c('[1, 2, 3]', '{"k1": 1, "k2": 2}', '1') %>% json_lengths()
  document.id length
1           1      3
2           2      2
3           3      1

read_json

Reads a JSON file from a file path or URL

> read_json(path, format = c("json", "jsonl", "infer"))
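
As a rough illustration of the `format` argument, the sketch below writes a small temporary file and reads it back. It assumes that `format = "jsonl"` treats each line of the file as a separate JSON document; the file contents and the variable names `path` and `people` are hypothetical.

```r
library(tidyjson)
library(magrittr)  # provides %>%

# Write two JSON documents, one per line, to a temporary .jsonl file
path <- tempfile(fileext = ".jsonl")
writeLines(c('{"name": "bob"}', '{"name": "anne"}'), path)

# Read the file as JSON Lines, then pull "name" out into a column
people <- read_json(path, format = "jsonl") %>%
  spread_values(name = jstring("name"))

print(people)
unlink(path)  # remove the temporary file
```

With `format = "json"` the whole file would instead be read as a single document.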

replace_nulls

Replaces NULL values with a value of your choice

> replace_nulls(l, replace)

spread_values

> '{"name": {"first": "bob", "last": "jones"}, "age": 32}' %>%
+   spread_values(first.name = jstring("name", "first"),
+                 age = jnumber("age"))
  document.id first.name age
1           1        bob  32
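
The `j*` helpers can be mixed within a single `spread_values()` call, each taking the path to a key of its own type. A minimal sketch, assuming `jlogical()` behaves like `jstring()` and `jnumber()` in the example above; the `"active"` key is made up for illustration:

```r
library(tidyjson)
library(magrittr)  # provides %>%

person <- '{"name": {"first": "bob", "last": "jones"}, "active": true}' %>%
  spread_values(
    last.name = jstring("name", "last"),  # nested path: name -> last
    active    = jlogical("active")        # top-level logical value
  )

print(person)
```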