broom: Convert Statistical Analysis Objects into Tidy Data Frames

Tidies the output objects of statistical analyses into a clean, consistent form

> library(broom)

Attaching package: 'broom'
The following object is masked from 'package:modelr':

    bootstrap
> library(dplyr)

Version: 0.4.2


Function Description
Arima_tidiers Tidying methods for ARIMA modeling of time series
aareg_tidiers Tidiers for aareg objects
acf_tidiers Tidying method for the acf function
anova_tidiers Tidying methods for anova and AOV objects
auc_tidiers Tidiers for objects from the AUC package
augment Augment data according to a tidied model
augment_columns add fitted values, residuals, and other common outputs to an augment call
biglm_tidiers Tidiers for biglm and bigglm object
binDesign_tidiers Tidy a binDesign object
binWidth_tidiers Tidy a binWidth object
boot_tidiers Tidying methods for bootstrap computations
bootstrap Set up bootstrap replicates of a dplyr operation
broom Convert Statistical Analysis Objects into Tidy Data Frames
btergm_tidiers Tidying method for a bootstrapped temporal exponential random graph model
cch_tidiers tidiers for case-cohort data
compact Remove NULL items in a vector or list
confint.geeglm Confidence interval for 'geeglm' objects
confint_tidy Calculate confidence interval as a tidy data frame
coxph_tidiers Tidiers for coxph object
cv.glmnet_tidiers Tidiers for glmnet cross-validation objects
data.frame_tidiers Tidiers for data.frame objects
ergm_tidiers Tidying methods for an exponential random graph model
felm_tidiers Tidying methods for models with multiple group fixed effects
finish_glance Add logLik, AIC, BIC, and other common measurements to a glance of a prediction
fitdistr_tidiers Tidying methods for fitdistr objects from the MASS package
fix_data_frame Ensure an object is a data frame, with rownames moved into a column
gam_tidiers Tidying methods for a generalized additive model (gam)
gamlss_tidiers Tidying methods for gamlss objects
geeglm_tidiers Tidying methods for generalized estimating equations models
glance Construct a single row summary "glance" of a model, fit, or other object
glm_tidiers Tidying methods for a glm object
glmnet_tidiers Tidiers for LASSO or elasticnet regularized fits
gmm_tidiers Tidying methods for generalized method of moments "gmm" objects
htest_tidiers Tidying methods for an htest object
inflate Expand a dataset to include all factorial combinations of one or more variables
insert_NAs insert a row of NAs into a data frame wherever another data frame has NAs
kappa_tidiers Tidy a kappa object from a Cohen's kappa calculation
kde_tidiers Tidy a kernel density estimate object from the ks package
kmeans_tidiers Tidying methods for kmeans objects
list_tidiers Tidiers for return values from functions that aren't S3 objects
lm_tidiers Tidying methods for a linear model
lme4_tidiers Tidying methods for mixed effects models
lmodel2_tidiers Tidiers for linear model II objects from the lmodel2 package
loess_tidiers Augmenting methods for loess models
matrix_tidiers Tidiers for matrix objects
mclust_tidiers Tidying methods for Mclust objects
mcmc_tidiers Tidying methods for MCMC (Stan, JAGS, etc.) fits
mle2_tidiers Tidy mle2 maximum likelihood objects
multcomp_tidiers tidying methods for objects produced by 'multcomp'
multinom_tidiers Tidying methods for multinomial logistic regression models
nlme_tidiers Tidying methods for mixed effects models
nls_tidiers Tidying methods for a nonlinear model
optim_tidiers Tidiers for lists returned from optim
orcutt_tidiers Tidiers for Cochrane Orcutt object
plm_tidiers Tidiers for panel regression linear models
poLCA_tidiers Tidiers for poLCA objects
prcomp_tidiers Tidying methods for principal components analysis via 'prcomp'
process_ergm helper function to process a tidied ergm object
process_geeglm helper function to process a tidied geeglm object
process_lm helper function to process a tidied lm object
process_rq Helper function for tidy.rq and tidy.rqs
pyears_tidiers Tidy person-year summaries
rcorr_tidiers Tidying methods for rcorr objects
ridgelm_tidiers Tidying methods for ridgelm objects from the MASS package
rlm_tidiers Tidying methods for an rlm (robust linear model) object
rowwise_df_tidiers Tidying methods for rowwise_dfs from dplyr, for tidying each row and recombining the results
rq_tidiers Tidying methods for quantile regression models
rstanarm_tidiers Tidying methods for an rstanarm model
sexpfit_tidiers Tidy an expected survival curve
smooth.spline_tidiers tidying methods for smooth.spline objects
sp_tidiers tidying methods for classes from the sp package.
sparse_tidiers Tidy a sparseMatrix object from the Matrix package
summary_tidiers Tidiers for summaryDefault objects
survfit_tidiers tidy survival curve fits
survreg_tidiers Tidiers for a parametric regression survival model
svd_tidiers Tidying methods for singular value decomposition
tidy Tidy the result of a test into a summary data.frame
tidy.NULL tidy on a NULL input
tidy.TukeyHSD tidy a TukeyHSD object
tidy.coeftest Tidying methods for coeftest objects
tidy.default Default tidying method
tidy.density tidy a density object
tidy.dist Tidy a distance matrix
tidy.ftable tidy an ftable object
tidy.manova tidy a MANOVA object
tidy.map Tidy method for map objects.
tidy.numeric Tidy atomic vectors
tidy.pairwise.htest tidy a pairwise hypothesis test
tidy.power.htest tidy a power.htest
tidy.spec tidy a spec object
tidy.table tidy a table object
tidy.ts tidy a ts timeseries object
unrowname strip rownames from an object
xyz_tidiers Tidiers for x, y, z lists suitable for persp, image, etc.
zoo_tidiers Tidying methods for a zoo object

augment

> mtcars %>% group_by(cyl) %>%
+   do(fit = lm(wt ~ mpg + qsec + gear, .)) %>% 
+   augment(fit) %>% 
+   head() %>% 
+   knitr::kable()
| cyl|    wt|  mpg|  qsec| gear|       .fitted|        .se.fit|          .resid|           .hat|         .sigma|        .cooksd|      .std.resid|
|---:|-----:|----:|-----:|----:|-------------:|--------------:|---------------:|--------------:|--------------:|--------------:|---------------:|
|   4| 2.320| 22.8| 18.61|    4| 2.46334473610| 0.141857133685| -0.143344736098| 0.197297497135| 0.338715701818| 0.015421799919| -0.500972686568|
|   4| 3.190| 24.4| 20.00|    4| 2.63356011079| 0.119955227097|  0.556439889210| 0.141077439790| 0.242723102392| 0.145126019805|  1.879969736352|
|   4| 3.150| 22.8| 22.90|    4| 3.39278059524| 0.298922321177| -0.242780595235| 0.876064146800| 0.199323867070| 8.240036829579| -2.159360232454|
|   4| 2.200| 32.4| 19.47|    4| 1.86408215293| 0.173846815730|  0.335917847066| 0.296314357955| 0.303757385696| 0.165508650040|  1.253872394196|
|   4| 1.615| 30.4| 18.52|    4| 1.82192616046| 0.147540125770| -0.206926160460| 0.213422161384| 0.331544809908| 0.036203130352| -0.730557079876|
|   4| 1.835| 33.9| 19.90|    4| 1.83449503768| 0.209860981278|  0.000504962315| 0.431799975737| 0.344955958298| 0.000000835906|  0.002097577191|
> library(ggplot2)
> bootnls_aug <- mtcars %>% bootstrap(100) %>%
+     do(augment(nls(mpg ~ k / wt + b, ., start=list(k=1, b=0)), .))
> 
> ggplot(bootnls_aug, aes(wt, mpg)) + geom_point() +
+     geom_line(aes(y=.fitted, group=replicate), alpha=.2)
> smoothspline_aug <- mtcars %>% bootstrap(100) %>%
+     do(augment(smooth.spline(.$wt, .$mpg, df=4), .))
> 
> ggplot(smoothspline_aug, aes(wt, mpg)) + geom_point() +
+     geom_line(aes(y=.fitted, group=replicate), alpha=.2)
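
The columns that augment appends in the first example of this section can be reproduced with base R extractors. The following is only a sketch of where the values come from, not broom's implementation; the name `aug_like` is made up for illustration, and the fit is restricted to the 4-cylinder cars to match the rows shown above.

```r
# Fit the same model as the cyl == 4 group in the grouped example
cars4 <- subset(mtcars, cyl == 4)
fit <- lm(wt ~ mpg + qsec + gear, data = cars4)

# Hypothetical stand-in for augment(): original columns plus per-row diagnostics
aug_like <- data.frame(
  cars4[, c("wt", "mpg", "qsec", "gear")],
  .fitted    = fitted(fit),                        # fitted values
  .se.fit    = predict(fit, se.fit = TRUE)$se.fit, # standard errors of fitted values
  .resid     = resid(fit),                         # residuals
  .hat       = hatvalues(fit),                     # leverage
  .sigma     = influence(fit)$sigma,               # leave-one-out residual sd
  .cooksd    = cooks.distance(fit),                # Cook's distance
  .std.resid = rstandard(fit)                      # standardized residuals
)
head(aug_like)
```

Each row of the original data gains one value per diagnostic, which is why the augmented output has as many rows as the data passed to the model.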

bootstrap

Bootstrap resampling: sets up bootstrap replicates of a dplyr operation

> set.seed(2014)
> mtcars %>% bootstrap(100) %>%
+     do(nls(mpg ~ k / wt + b, ., 
+            start = list(k = 1, b = 0)) %>% tidy())
Source: local data frame [200 x 6]
Groups: replicate [100]

   replicate  term       estimate     std.error       statistic
       <int> <chr>          <dbl>         <dbl>           <dbl>
1          1     k 46.63250189869 4.02601640314 11.582789842164
2          1     b  4.36063670161 1.53826660819  2.834773035050
3          2     k 54.18247616275 4.96497634197 10.912937430288
4          2     b  1.00486839665 1.89719556867  0.529659890229
5          3     k 43.25721216486 3.56485961800 12.134338178812
6          3     b  4.83351040793 1.29790905951  3.724074789762
7          4     k 48.53135108848 4.45688317168 10.889078582281
8          4     b  3.50934563498 1.68897933879  2.077790742843
9          5     k 52.60509851090 5.66269038595  9.289771279292
10         5     b  3.34437976627 2.30041096264  1.453818391838
# ... with 190 more rows, and 1 more variables: p.value <dbl>
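
Each replicate above is a refit of the model on rows drawn from the data with replacement. A minimal base R sketch of that idea follows; the helper `fit_once` and the object `boot_coefs` are hypothetical names, and the resampling will not reproduce the exact numbers printed above because the random draws are made differently.

```r
# Hypothetical helper: refit the model on one resample of the rows
fit_once <- function(data) {
  idx <- sample(nrow(data), replace = TRUE)  # draw row indices with replacement
  fit <- nls(mpg ~ k / wt + b, data = data[idx, ],
             start = list(k = 1, b = 0))
  coef(fit)                                  # named vector: k, b
}

set.seed(2014)
# One row per replicate, one column per parameter
boot_coefs <- t(replicate(100, fit_once(mtcars)))

# Percentile intervals for each parameter
apply(boot_coefs, 2, quantile, probs = c(0.025, 0.975))
```

The spread of the 100 estimates in each column gives a distribution-free impression of how uncertain `k` and `b` are.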

compact

Removes NULL elements from a list

> list("A", "B", NULL, "D") %>% broom:::compact()
[[1]]
[1] "A"

[[2]]
[1] "B"

[[3]]
[1] "D"
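
Since compact is not exported (hence the `:::` above), a base R equivalent can be handy. This is a sketch of the same behaviour, not broom's internal code:

```r
x <- list("A", "B", NULL, "D")

# Keep only the elements that are not NULL
compacted <- Filter(Negate(is.null), x)
length(compacted)
```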

data.frame_tidiers

glance

> mtcars %>% group_by(cyl) %>%
+   do(fit = lm(wt ~ mpg + qsec + gear, .)) %>% 
+   glance(fit) %>% 
+   head() %>% 
+   knitr::kable()
| cyl|      r.squared|  adj.r.squared|          sigma|      statistic|        p.value| df|          logLik|            AIC|            BIC|       deviance| df.residual|
|---:|--------------:|--------------:|--------------:|--------------:|--------------:|--:|---------------:|--------------:|--------------:|--------------:|-----------:|
|   4| 0.779913093309| 0.685590133298| 0.319367260098|  8.26853921061| 0.010597767245|  4| -0.566856603363|  11.1337132067|  13.1231895707| 0.713968127756|           7|
|   6| 0.969994653934| 0.939989307868| 0.087294251135| 32.32739431824| 0.008743763030|  4| 10.102267469856| -10.2045349397| -10.4749841944| 0.022860858844|           3|
|   8| 0.652127757744| 0.547766085067| 0.510687079831|  6.24872849407| 0.011614604664|  4| -8.101858384279|  26.2037167686|  29.3990034166| 2.608012935067|          10|
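
The one-row summary for a single group can be assembled by hand from standard extractors. The sketch below covers the cyl == 4 group and only some of the columns; `glance_like` is an illustrative name, and this is not how broom itself builds the row.

```r
fit <- lm(wt ~ mpg + qsec + gear, data = subset(mtcars, cyl == 4))
s <- summary(fit)

# Hypothetical stand-in for glance(): one row of model-level statistics
glance_like <- data.frame(
  r.squared     = s$r.squared,
  adj.r.squared = s$adj.r.squared,
  sigma         = s$sigma,               # residual standard error
  statistic     = unname(s$fstatistic["value"]),  # overall F statistic
  logLik        = as.numeric(logLik(fit)),
  AIC           = AIC(fit),
  BIC           = BIC(fit),
  deviance      = deviance(fit),         # residual sum of squares for lm
  df.residual   = df.residual(fit)
)
glance_like
```

Because every model collapses to exactly one row, the per-group results stack into a table that is easy to sort or filter by fit quality.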

insert_NAs
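
No example is recorded for this function. Going only by the description in the table above (a row of NAs is inserted wherever another data frame has NAs), the idea can be sketched in base R. The helper `pad_na_rows` is hypothetical and its argument order is an assumption; it is not broom's function.

```r
# Hypothetical helper: expand `x` to the row count of `original`,
# leaving all-NA rows at the positions where `original` has missing values
pad_na_rows <- function(x, original) {
  keep <- complete.cases(original)        # rows that a model would have used
  stopifnot(nrow(x) == sum(keep))
  idx <- rep(NA_integer_, nrow(original))
  idx[keep] <- seq_len(nrow(x))
  out <- x[idx, , drop = FALSE]           # indexing by NA yields an all-NA row
  rownames(out) <- NULL
  out
}

original <- data.frame(a = c(1, NA, 3), b = c(4, 5, 6))
fitted_rows <- data.frame(.fitted = c(10, 30))  # results for complete rows only
padded <- pad_na_rows(fitted_rows, original)
padded
```

This kind of padding keeps model output aligned row by row with the original data when incomplete rows were dropped during fitting.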

tidy

> Orange %>% group_by(Tree) %>% 
+   do(cor.test(.$age, .$circumference) %>% tidy())
Source: local data frame [5 x 9]
Groups: Tree [5]

   Tree       estimate     statistic            p.value parameter
  <ord>          <dbl>         <dbl>              <dbl>     <int>
1     3 0.988176587129 14.4118806470 0.0000290104593668         5
2     1 0.985467542479 12.9725812675 0.0000485190172612         5
3     5 0.987737642292 14.1468609718 0.0000317709260945         5
4     2 0.987362434577 13.9312907259 0.0000342504117564         5
5     4 0.984460969608 12.5357483679 0.0000573308969016         5
# ... with 4 more variables: conf.low <dbl>, conf.high <dbl>,
#   method <fctr>, alternative <fctr>
> Orange %>% group_by(Tree) %>% 
+   do(lm(age ~ circumference, data=.) %>% tidy())
Source: local data frame [10 x 6]
Groups: Tree [5]

    Tree          term         estimate       std.error       statistic
   <ord>         <chr>            <dbl>           <dbl>           <dbl>
1      3   (Intercept) -209.51232149301 85.268290402704 -2.457095369258
2      3 circumference   12.03888487911  0.835344475434 14.411880647031
3      1   (Intercept) -264.67343750000 98.620556898818 -2.683755251672
4      1 circumference   11.91924542683  0.918802910620 12.972581267502
5      5   (Intercept)  -54.48409709432 76.886278786109 -0.708632254734
6      5 circumference    8.78713197900  0.621136519012 14.146860971848
7      2   (Intercept) -132.43972525629 83.131414589342 -1.593136913530
8      2 circumference    7.79522500189  0.559547938180 13.931290725948
9      4   (Intercept)  -76.51367061555 88.294375729161 -0.866574682517
10     4 circumference    7.16984173775  0.571951632031 12.535748367901
# ... with 1 more variables: p.value <dbl>
> mtcars %>% group_by(am) %>% 
+   do(lm(wt ~ mpg + qsec + gear, .) %>% tidy())
Source: local data frame [8 x 6]
Groups: am [2]

     am        term         estimate       std.error        statistic
  <dbl>       <chr>            <dbl>           <dbl>            <dbl>
1     0 (Intercept)  4.9175462315902 1.3966567480290  3.5209411607613
2     0         mpg -0.1918891414413 0.0442832937022 -4.3332174596550
3     0        qsec  0.0919136106783 0.0983206685615  0.9348350862848
4     0        gear  0.1465375354790 0.3681936308209  0.3979904137730
5     1 (Intercept)  4.2830702774164 3.4585995805749  1.2383828129374
6     1         mpg -0.1009831979373 0.0294340870096 -3.4308248767678
7     1        qsec  0.0398316482863 0.1511213514406  0.2635739285459
8     1        gear -0.0228832969429 0.3487822570884 -0.0656091199534
# ... with 1 more variables: p.value <dbl>
> regressions <- mtcars %>% group_by(cyl) %>%
+     do(fit = lm(wt ~ mpg + qsec + gear, .))
> regressions %>% tidy(fit)
Source: local data frame [12 x 6]
Groups: cyl [3]

     cyl        term          estimate       std.error         statistic
   <dbl>       <chr>             <dbl>           <dbl>             <dbl>
1      4 (Intercept) -0.77266239762184 2.2278802640408 -0.34681504661315
2      4         mpg -0.08183156858565 0.0238178703631 -3.43572147040065
3      4        qsec  0.21665171541651 0.0759077293445  2.85414564876753
4      4        gear  0.26746961839298 0.2445417311863  1.09375858711481
5      6 (Intercept) -7.78580829042293 3.3548493219313 -2.32076243768252
6      6         mpg  0.04328328231298 0.0519672400109  0.83289553772489
7      6        qsec  0.42199831465506 0.0913681739538  4.61865763967518
8      6        gear  0.63832001855981 0.2052414340504  3.11009334695533
9      8 (Intercept)  0.00597157772867 4.2746051056959  0.00139698933141
10     8         mpg -0.17692456055530 0.0557085166832 -3.17589788938953
11     8        qsec  0.36940581266784 0.1930955273526  1.91307285949380
12     8        gear  0.14276241610120 0.3166412723714  0.45086483840854
# ... with 1 more variables: p.value <dbl>
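
For a linear model, the tidied output has the same content as the coefficient matrix from `summary()`, with the row names moved into a `term` column. A base R sketch of that reshaping, using the Tree 3 subset of Orange from the example above (`tidy_like` is an illustrative name, not part of broom):

```r
fit <- lm(age ~ circumference, data = subset(Orange, Tree == "3"))

# Coefficient matrix: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- as.data.frame(summary(fit)$coefficients)
names(coefs) <- c("estimate", "std.error", "statistic", "p.value")

# Move the row names into a regular column so each row is self-describing
tidy_like <- data.frame(term = rownames(coefs), coefs,
                        stringsAsFactors = FALSE)
rownames(tidy_like) <- NULL
tidy_like
```

With terms stored as a column rather than as row names, results from many models can be row-bound and grouped, which is what the `do(... %>% tidy())` pattern relies on.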