broom: Convert Statistical Analysis Objects into Tidy Data Frames

Converts the output objects of statistical analyses into a tidy form.

CRAN: http://cran.r-project.org/web/packages/broom/index.html
GitHub: https://github.com/dgrtwo/broom
Vignettes:

> library(broom)


Attaching package: 'broom'

The following object is masked from 'package:modelr':

    bootstrap

> library(dplyr)

Version: 0.4.2

Function	Description
`Arima_tidiers`	Tidying methods for ARIMA modeling of time series
`aareg_tidiers`	Tidiers for aareg objects
`acf_tidiers`	Tidying method for the acf function
`anova_tidiers`	Tidying methods for anova and AOV objects
`auc_tidiers`	Tidiers for objects from the AUC package
`augment`	Augment data according to a tidied model
`augment_columns`	add fitted values, residuals, and other common outputs to an augment call
`biglm_tidiers`	Tidiers for biglm and bigglm object
`binDesign_tidiers`	Tidy a binDesign object
`binWidth_tidiers`	Tidy a binWidth object
`boot_tidiers`	Tidying methods for bootstrap computations
`bootstrap`	Set up bootstrap replicates of a dplyr operation
`broom`	Convert Statistical Analysis Objects into Tidy Data Frames
`btergm_tidiers`	Tidying method for a bootstrapped temporal exponential random graph model
`cch_tidiers`	tidiers for case-cohort data
`compact`	Remove NULL items in a vector or list
`confint.geeglm`	Confidence interval for 'geeglm' objects
`confint_tidy`	Calculate confidence interval as a tidy data frame
`coxph_tidiers`	Tidiers for coxph object
`cv.glmnet_tidiers`	Tidiers for glmnet cross-validation objects
`data.frame_tidiers`	Tidiers for data.frame objects
`ergm_tidiers`	Tidying methods for an exponential random graph model
`felm_tidiers`	Tidying methods for models with multiple group fixed effects
`finish_glance`	Add logLik, AIC, BIC, and other common measurements to a glance of a prediction
`fitdistr_tidiers`	Tidying methods for fitdistr objects from the MASS package
`fix_data_frame`	Ensure an object is a data frame, with rownames moved into a column
`gam_tidiers`	Tidying methods for a generalized additive model (gam)
`gamlss_tidiers`	Tidying methods for gamlss objects
`geeglm_tidiers`	Tidying methods for generalized estimating equations models
`glance`	Construct a single row summary "glance" of a model, fit, or other object
`glm_tidiers`	Tidying methods for a glm object
`glmnet_tidiers`	Tidiers for LASSO or elasticnet regularized fits
`gmm_tidiers`	Tidying methods for generalized method of moments "gmm" objects
`htest_tidiers`	Tidying methods for an htest object
`inflate`	Expand a dataset to include all factorial combinations of one or more variables
`insert_NAs`	insert a row of NAs into a data frame wherever another data frame has NAs
`kappa_tidiers`	Tidy a kappa object from a Cohen's kappa calculation
`kde_tidiers`	Tidy a kernel density estimate object from the ks package
`kmeans_tidiers`	Tidying methods for kmeans objects
`list_tidiers`	Tidiers for return values from functions that aren't S3 objects
`lm_tidiers`	Tidying methods for a linear model
`lme4_tidiers`	Tidying methods for mixed effects models
`lmodel2_tidiers`	Tidiers for linear model II objects from the lmodel2 package
`loess_tidiers`	Augmenting methods for loess models
`matrix_tidiers`	Tidiers for matrix objects
`mclust_tidiers`	Tidying methods for Mclust objects
`mcmc_tidiers`	Tidying methods for MCMC (Stan, JAGS, etc.) fits
`mle2_tidiers`	Tidy mle2 maximum likelihood objects
`multcomp_tidiers`	tidying methods for objects produced by 'multcomp'
`multinom_tidiers`	Tidying methods for multinomial logistic regression models
`nlme_tidiers`	Tidying methods for mixed effects models
`nls_tidiers`	Tidying methods for a nonlinear model
`optim_tidiers`	Tidiers for lists returned from optim
`orcutt_tidiers`	Tidiers for Cochrane Orcutt object
`plm_tidiers`	Tidiers for panel regression linear models
`poLCA_tidiers`	Tidiers for poLCA objects
`prcomp_tidiers`	Tidying methods for principal components analysis via 'prcomp'
`process_ergm`	helper function to process a tidied ergm object
`process_geeglm`	helper function to process a tidied geeglm object
`process_lm`	helper function to process a tidied lm object
`process_rq`	Helper function for tidy.rq and tidy.rqs
`pyears_tidiers`	Tidy person-year summaries
`rcorr_tidiers`	Tidying methods for rcorr objects
`ridgelm_tidiers`	Tidying methods for ridgelm objects from the MASS package
`rlm_tidiers`	Tidying methods for an rlm (robust linear model) object
`rowwise_df_tidiers`	Tidying methods for rowwise_dfs from dplyr, for tidying each row and recombining the results
`rq_tidiers`	Tidying methods for quantile regression models
`rstanarm_tidiers`	Tidying methods for an rstanarm model
`sexpfit_tidiers`	Tidy an expected survival curve
`smooth.spline_tidiers`	tidying methods for smooth.spline objects
`sp_tidiers`	tidying methods for classes from the sp package.
`sparse_tidiers`	Tidy a sparseMatrix object from the Matrix package
`summary_tidiers`	Tidiers for summaryDefault objects
`survfit_tidiers`	tidy survival curve fits
`survreg_tidiers`	Tidiers for a parametric regression survival model
`svd_tidiers`	Tidying methods for singular value decomposition
`tidy`	Tidy the result of a test into a summary data.frame
`tidy.NULL`	tidy on a NULL input
`tidy.TukeyHSD`	tidy a TukeyHSD object
`tidy.coeftest`	Tidying methods for coeftest objects
`tidy.default`	Default tidying method
`tidy.density`	tidy a density object
`tidy.dist`	Tidy a distance matrix
`tidy.ftable`	tidy an ftable object
`tidy.manova`	tidy a MANOVA object
`tidy.map`	Tidy method for map objects.
`tidy.numeric`	Tidy atomic vectors
`tidy.pairwise.htest`	tidy a pairwise hypothesis test
`tidy.power.htest`	tidy a power.htest
`tidy.spec`	tidy a spec object
`tidy.table`	tidy a table object
`tidy.ts`	tidy a ts timeseries object
`unrowname`	strip rownames from an object
`xyz_tidiers`	Tidiers for x, y, z lists suitable for persp, image, etc.
`zoo_tidiers`	Tidying methods for a zoo object

augment

> mtcars %>% group_by(cyl) %>%
+   do(fit = lm(wt ~ mpg + qsec + gear, .)) %>% 
+   augment(fit) %>% 
+   head() %>% 
+   knitr::kable()

cyl	wt	mpg	qsec	gear	.fitted	.se.fit	.resid	.hat	.sigma	.cooksd	.std.resid
4	2.320	22.8	18.61	4	2.46334473610	0.141857133685	-0.143344736098	0.197297497135	0.338715701818	0.015421799919	-0.500972686568
4	3.190	24.4	20.00	4	2.63356011079	0.119955227097	0.556439889210	0.141077439790	0.242723102392	0.145126019805	1.879969736352
4	3.150	22.8	22.90	4	3.39278059524	0.298922321177	-0.242780595235	0.876064146800	0.199323867070	8.240036829579	-2.159360232454
4	2.200	32.4	19.47	4	1.86408215293	0.173846815730	0.335917847066	0.296314357955	0.303757385696	0.165508650040	1.253872394196
4	1.615	30.4	18.52	4	1.82192616046	0.147540125770	-0.206926160460	0.213422161384	0.331544809908	0.036203130352	-0.730557079876
4	1.835	33.9	19.90	4	1.83449503768	0.209860981278	0.000504962315	0.431799975737	0.344955958298	0.000000835906	0.002097577191
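
As a rough cross-check of what the dot-prefixed columns hold, most of them can be rebuilt with base R accessors. This is a sketch for the `cyl == 4` group only, not broom's implementation; `aug_like` is a name made up for this example.

```r
# Fit the same model as above on the 4-cylinder cars only.
d <- subset(mtcars, cyl == 4)
fit <- lm(wt ~ mpg + qsec + gear, data = d)

# Attach per-observation diagnostics as extra columns,
# using the same column names that augment() reports.
aug_like <- cbind(
  d[, c("wt", "mpg", "qsec", "gear")],
  .fitted    = fitted(fit),
  .resid     = residuals(fit),
  .hat       = hatvalues(fit),
  .cooksd    = cooks.distance(fit),
  .std.resid = rstandard(fit)
)
head(aug_like)
```

Compared with the `augment()` output above, this sketch leaves out `.se.fit` and `.sigma`.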

> library(ggplot2)
> bootnls_aug <- mtcars %>% bootstrap(100) %>%
+     do(augment(nls(mpg ~ k / wt + b, ., start=list(k=1, b=0)), .))
> 
> ggplot(bootnls_aug, aes(wt, mpg)) + geom_point() +
+     geom_line(aes(y=.fitted, group=replicate), alpha=.2)

> smoothspline_aug <- mtcars %>% bootstrap(100) %>%
+     do(augment(smooth.spline(.$wt, .$mpg, df=4), .))
> 
> ggplot(smoothspline_aug, aes(wt, mpg)) + geom_point() +
+     geom_line(aes(y=.fitted, group=replicate), alpha=.2)

bootstrap

Bootstrapping

> set.seed(2014)
> mtcars %>% bootstrap(100) %>%
+     do(nls(mpg ~ k / wt + b, ., 
+            start = list(k = 1, b = 0)) %>% tidy())

Source: local data frame [200 x 6]
Groups: replicate [100]

   replicate  term       estimate     std.error       statistic
       <int> <chr>          <dbl>         <dbl>           <dbl>
1          1     k 46.63250189869 4.02601640314 11.582789842164
2          1     b  4.36063670161 1.53826660819  2.834773035050
3          2     k 54.18247616275 4.96497634197 10.912937430288
4          2     b  1.00486839665 1.89719556867  0.529659890229
5          3     k 43.25721216486 3.56485961800 12.134338178812
6          3     b  4.83351040793 1.29790905951  3.724074789762
7          4     k 48.53135108848 4.45688317168 10.889078582281
8          4     b  3.50934563498 1.68897933879  2.077790742843
9          5     k 52.60509851090 5.66269038595  9.289771279292
10         5     b  3.34437976627 2.30041096264  1.453818391838
# ... with 190 more rows, and 1 more variables: p.value <dbl>
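
The same idea can be sketched in base R without the `bootstrap()` helper, assuming each replicate is a resample of the rows of `mtcars` with replacement (the usual bootstrap). The random draws will not match the output above, and `fit_coefs` and `boot_coefs` are illustrative names only.

```r
set.seed(2014)

# Refit the nonlinear model on one data set and return its coefficients.
fit_coefs <- function(d) {
  fit <- nls(mpg ~ k / wt + b, data = d, start = list(k = 1, b = 0))
  coef(fit)
}

# Draw 100 resamples of the rows (with replacement) and refit on each.
boot_coefs <- t(replicate(100, {
  idx <- sample(nrow(mtcars), replace = TRUE)
  fit_coefs(mtcars[idx, ])
}))

# Percentile intervals for k and b across the replicates.
apply(boot_coefs, 2, quantile, probs = c(0.025, 0.975))
```

The result is a plain 100 x 2 matrix with one row per replicate, rather than the long one-row-per-term layout that `tidy()` produces.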

compact

Removes NULL elements from a list.

> list("A", "B", NULL, "D") %>% broom:::compact()

[[1]]
[1] "A"

[[2]]
[1] "B"

[[3]]
[1] "D"
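
Since the example above reaches `compact()` through `:::`, a base R equivalent may be handy. The following is a minimal sketch that gives the same result for this input; `drop_nulls` is a hypothetical helper name, not part of broom.

```r
# Keep only the elements that are not NULL.
drop_nulls <- function(x) Filter(Negate(is.null), x)

drop_nulls(list("A", "B", NULL, "D"))
```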

data.frame_tidiers

glance

> mtcars %>% group_by(cyl) %>%
+   do(fit = lm(wt ~ mpg + qsec + gear, .)) %>% 
+   glance(fit) %>% 
+   head() %>% 
+   knitr::kable()

cyl	r.squared	adj.r.squared	sigma	statistic	p.value	df	logLik	AIC	BIC	deviance	df.residual
4	0.779913093309	0.685590133298	0.319367260098	8.26853921061	0.010597767245	4	-0.566856603363	11.1337132067	13.1231895707	0.713968127756	7
6	0.969994653934	0.939989307868	0.087294251135	32.32739431824	0.008743763030	4	10.102267469856	-10.2045349397	-10.4749841944	0.022860858844	3
8	0.652127757744	0.547766085067	0.510687079831	6.24872849407	0.011614604664	4	-8.101858384279	26.2037167686	29.3990034166	2.608012935067	10
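
To see where several of these columns come from, a one-row summary can be assembled by hand from base R accessors. This is a sketch for the `cyl == 4` group only, not how `glance()` is implemented; `one_row` is an illustrative name.

```r
# Same model as above, fitted on the 4-cylinder cars only.
fit <- lm(wt ~ mpg + qsec + gear, data = subset(mtcars, cyl == 4))
s <- summary(fit)

# Collect model-level statistics into a single-row data frame.
one_row <- data.frame(
  r.squared     = s$r.squared,
  adj.r.squared = s$adj.r.squared,
  sigma         = s$sigma,
  logLik        = as.numeric(logLik(fit)),
  AIC           = AIC(fit),
  BIC           = BIC(fit),
  deviance      = deviance(fit),
  df.residual   = df.residual(fit)
)
one_row
```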

insert_NAs

tidy

> Orange %>% group_by(Tree) %>% 
+   do(cor.test(.$age, .$circumference) %>% tidy())

Source: local data frame [5 x 9]
Groups: Tree [5]

   Tree       estimate     statistic            p.value parameter
  <ord>          <dbl>         <dbl>              <dbl>     <int>
1     3 0.988176587129 14.4118806470 0.0000290104593668         5
2     1 0.985467542479 12.9725812675 0.0000485190172612         5
3     5 0.987737642292 14.1468609718 0.0000317709260945         5
4     2 0.987362434577 13.9312907259 0.0000342504117564         5
5     4 0.984460969608 12.5357483679 0.0000573308969016         5
# ... with 4 more variables: conf.low <dbl>, conf.high <dbl>,
#   method <fctr>, alternative <fctr>

> Orange %>% group_by(Tree) %>% 
+   do(lm(age ~ circumference, data=.) %>% tidy())

Source: local data frame [10 x 6]
Groups: Tree [5]

    Tree          term         estimate       std.error       statistic
   <ord>         <chr>            <dbl>           <dbl>           <dbl>
1      3   (Intercept) -209.51232149301 85.268290402704 -2.457095369258
2      3 circumference   12.03888487911  0.835344475434 14.411880647031
3      1   (Intercept) -264.67343750000 98.620556898818 -2.683755251672
4      1 circumference   11.91924542683  0.918802910620 12.972581267502
5      5   (Intercept)  -54.48409709432 76.886278786109 -0.708632254734
6      5 circumference    8.78713197900  0.621136519012 14.146860971848
7      2   (Intercept) -132.43972525629 83.131414589342 -1.593136913530
8      2 circumference    7.79522500189  0.559547938180 13.931290725948
9      4   (Intercept)  -76.51367061555 88.294375729161 -0.866574682517
10     4 circumference    7.16984173775  0.571951632031 12.535748367901
# ... with 1 more variables: p.value <dbl>
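
For a single model, the coefficient table that `tidy()` returns corresponds closely to `coef(summary(fit))`. The sketch below rebuilds it by hand for tree 1 only; `tidy_like` is a made-up name, and this is not broom's own code.

```r
# Fit the regression for one tree.
fit <- lm(age ~ circumference, data = subset(Orange, Tree == "1"))

# coef(summary()) is a matrix with the terms as row names;
# move them into a column and rename to the tidy() column names.
co <- coef(summary(fit))
tidy_like <- data.frame(
  term      = rownames(co),
  estimate  = co[, "Estimate"],
  std.error = co[, "Std. Error"],
  statistic = co[, "t value"],
  p.value   = co[, "Pr(>|t|)"],
  row.names = NULL,
  stringsAsFactors = FALSE
)
tidy_like
```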

> mtcars %>% group_by(am) %>% 
+   do(lm(wt ~ mpg + qsec + gear, .) %>% tidy())

Source: local data frame [8 x 6]
Groups: am [2]

     am        term         estimate       std.error        statistic
  <dbl>       <chr>            <dbl>           <dbl>            <dbl>
1     0 (Intercept)  4.9175462315902 1.3966567480290  3.5209411607613
2     0         mpg -0.1918891414413 0.0442832937022 -4.3332174596550
3     0        qsec  0.0919136106783 0.0983206685615  0.9348350862848
4     0        gear  0.1465375354790 0.3681936308209  0.3979904137730
5     1 (Intercept)  4.2830702774164 3.4585995805749  1.2383828129374
6     1         mpg -0.1009831979373 0.0294340870096 -3.4308248767678
7     1        qsec  0.0398316482863 0.1511213514406  0.2635739285459
8     1        gear -0.0228832969429 0.3487822570884 -0.0656091199534
# ... with 1 more variables: p.value <dbl>

> regressions <- mtcars %>% group_by(cyl) %>%
+     do(fit = lm(wt ~ mpg + qsec + gear, .))
> regressions %>% tidy(fit)

Source: local data frame [12 x 6]
Groups: cyl [3]

     cyl        term          estimate       std.error         statistic
   <dbl>       <chr>             <dbl>           <dbl>             <dbl>
1      4 (Intercept) -0.77266239762184 2.2278802640408 -0.34681504661315
2      4         mpg -0.08183156858565 0.0238178703631 -3.43572147040065
3      4        qsec  0.21665171541651 0.0759077293445  2.85414564876753
4      4        gear  0.26746961839298 0.2445417311863  1.09375858711481
5      6 (Intercept) -7.78580829042293 3.3548493219313 -2.32076243768252
6      6         mpg  0.04328328231298 0.0519672400109  0.83289553772489
7      6        qsec  0.42199831465506 0.0913681739538  4.61865763967518
8      6        gear  0.63832001855981 0.2052414340504  3.11009334695533
9      8 (Intercept)  0.00597157772867 4.2746051056959  0.00139698933141
10     8         mpg -0.17692456055530 0.0557085166832 -3.17589788938953
11     8        qsec  0.36940581266784 0.1930955273526  1.91307285949380
12     8        gear  0.14276241610120 0.3166412723714  0.45086483840854
# ... with 1 more variables: p.value <dbl>