We’re excited to announce the release of broom 0.7.0 on CRAN!
broom is a package for summarizing statistical model objects in tidy tibbles. While several compatibility updates have been released in recent months, this is the first major update to broom in almost two years. This update includes many new tidier methods, bug fixes, improvements to existing tidier methods and their documentation, and improvements to maintainability and internal consistency. The full list of changes is available in the package release notes.
This release was made possible in part by the RStudio internship program, which has allowed one of us (Simon Couch) to work on broom full-time for the last month.
You can install the most recent broom update with the following code:
install.packages("broom")
Then attach it for use with:
library(broom)
We’ll outline some of the more notable changes below!
New Tidier Methods
This release includes support for several new model objects, and many of these additions came from first-time contributors to broom!
- anova objects from the car package
- pam objects from the cluster package
- drm objects from the drc package
- summary_emm objects from the emmeans package
- epi.2by2 objects from the epiR package
- fixest objects from the fixest package
- regsubsets objects from the leaps package
- lm.beta objects from the lm.beta package
- rma objects from the metafor package
- mfx, logitmfx, negbinmfx, poissonmfx, probitmfx, and betamfx objects from the mfx package
- lmrob and glmrob objects from the robustbase package
- sarlm objects from the spatialreg package
- speedglm objects from the speedglm package
- svyglm objects from the survey package
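As a quick illustration of one of the new tidiers, here is a minimal sketch using pam objects, chosen only because cluster ships with most R installations as a recommended package. It assumes broom 0.7.0 or later is installed; the exact columns returned may differ between broom versions.

```r
# Sketch: tidying a cluster::pam() fit with the pam tidiers.
# Assumes broom >= 0.7.0 and the 'cluster' package are installed.
library(broom)
library(cluster)

# Partition the four numeric iris measurements into three clusters.
measurements <- iris[, 1:4]
fit <- pam(measurements, k = 3)

# One row per cluster, with size and dissimilarity summaries.
cluster_summary <- tidy(fit)

# A one-row, model-level summary.
fit_summary <- glance(fit)

# The original observations plus a .cluster assignment column.
assignments <- augment(fit, data = measurements)

cluster_summary
fit_summary
head(assignments)
```

The same three verbs apply to the other model objects listed above, once their packages are installed.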
We have also restored a simplified version of glance.aov().
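A minimal sketch of the restored method on a one-way ANOVA fit with the built-in mtcars data. The set of columns returned by glance.aov() has changed across broom releases, so treat the printed output as version-dependent.

```r
# Sketch: glance() on an aov fit; assumes broom >= 0.7.0.
library(broom)

# One-way ANOVA of fuel efficiency across cylinder counts.
fit <- aov(mpg ~ factor(cyl), data = mtcars)

# Model-level summary as a one-row tibble.
aov_summary <- glance(fit)
aov_summary
```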
Improvements and Bug Fixes for Existing Tidiers
This update also features many bug fixes and improvements to existing tidiers. Some of the more notable ones:
- Many improvements to the consistency of augment.*() methods:
  - If you pass a dataset to augment() via the data or newdata arguments, you are now guaranteed that the augmented dataset will have exactly the same number of rows as the original dataset. This differs from previous behavior primarily when there are missing values: previously, augment() would drop rows containing NA, and this should no longer be the case. As a result, augment.*() methods no longer accept an na.action argument.
  - In previous versions, several augment.*() methods inherited the augment.lm() method, which required additions to augment.lm() itself to support them. We have shifted away from this approach in favor of re-implementing many augment.*() methods as standalone methods that make use of internal helper functions. As a result, augment.lm() and some related methods have deprecated arguments that were previously unused.
  - The .resid column in the output of augment.*() methods is now consistently defined as y - y_hat.
  - augment() tries to give an informative error when data isn't the original training data.
- Several glance.*() methods have been refactored to return a one-row tibble even when the model matrix is rank-deficient.
- glance() methods now return a nobs column, which contains the number of data points used to fit the model!
- Various warnings resulting from changes to the tidyr API in v1.0.0 have been fixed.
- Added options to provide additional columns in the outputs of glance.biglm(), tidy.felm(), tidy.lmsobj(), tidy.lmodel2(), tidy.polr(), tidy.prcomp(), tidy.zoo(), and tidy_optim().
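The augment() and glance() changes above can be checked with a short sketch on the built-in airquality data, which has missing values in Ozone and Solar.R. It assumes broom 0.7.0 or later; the model formula is an arbitrary choice for illustration.

```r
# Sketch: row preservation in augment(), the .resid definition, and nobs.
# Assumes broom >= 0.7.0.
library(broom)

fit <- lm(Ozone ~ Solar.R + Wind, data = airquality)

# With newdata supplied, rows with missing values are kept rather than
# dropped; .fitted is NA wherever a predictor is missing.
aug_new <- augment(fit, newdata = airquality)

# Without extra data, augment() works on the model frame used in fitting.
aug_fit <- augment(fit)

# .resid is observed minus fitted (y - y_hat).
resid_matches <- isTRUE(all.equal(
  unname(aug_fit$.resid),
  unname(aug_fit$Ozone - aug_fit$.fitted)
))

# nobs counts the observations actually used in the fit (complete cases),
# not nrow(airquality).
fit_summary <- glance(fit)

c(rows_in = nrow(airquality), rows_out = nrow(aug_new))
resid_matches
fit_summary$nobs
```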
Breaking Changes and Deprecations
This release also contains a number of breaking changes and deprecations meant to improve maintainability and internal consistency.
- We have changed how we report degrees of freedom for lm objects. This is especially important for instructors in statistics courses. Previously, the df column in glance.lm() reported the rank of the design matrix. Now it reports the numerator degrees of freedom for the overall F-statistic. This equals the rank of the model matrix minus one (unless you omit an intercept column), so the new df should be the old df minus one.
- We are moving away from supporting summary.*() objects. In particular, we have removed tidy.summary.lm() as part of a major overhaul of internals. Instead of calling tidy() on summary-like objects, please call tidy() directly on model objects moving forward.
- We have removed all support for the quick argument in tidy() methods. This simplifies internals and improves maintainability. We anticipate this will not affect many users, as few people seemed to use it. If this majorly cramps your style, let us know, as we are considering a new verb to return only model parameters. In the meantime, stats::coef() together with tibble::enframe() provides most of the functionality of tidy(..., quick = TRUE).
- All conf.int arguments now default to FALSE, and all conf.level arguments now default to 0.95. This primarily affects tidy.survreg(), which previously always returned confidence intervals, although a few other tidiers are affected as well.
- Tidiers for emmeans objects use the arguments conf.int and conf.level instead of relying on the argument names native to the emmeans::summary() methods (i.e., infer and level). Similarly, multcomp tidiers now include a call to summary(), as previous behavior was akin to setting the now-removed argument quick = TRUE. Both families of tidiers now use the adj.p.value column name when appropriate. Finally, emmeans, multcomp, and TukeyHSD tidiers now consistently use the column names contrast and null.value instead of comparison, level1 and level2, or lhs and rhs.
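Several of the breaking changes above can be seen side by side in a small sketch, assuming broom 0.7.0 or later and the built-in mtcars data. The model formulas are arbitrary, and tibble is assumed to be installed (broom depends on it).

```r
# Sketch of several breaking changes; assumes broom >= 0.7.0.
library(broom)

fit <- lm(mpg ~ wt + hp, data = mtcars)

# glance.lm() df is now the numerator df of the overall F-statistic:
# intercept + 2 predictors gives rank 3, so df = 3 - 1 = 2.
model_df <- glance(fit)$df
c(rank = fit$rank, df = model_df)

# Call tidy() on the model itself, not on summary(fit).
# conf.int now defaults to FALSE, so request intervals explicitly.
coefs <- tidy(fit)
coefs_ci <- tidy(fit, conf.int = TRUE, conf.level = 0.95)

# Stand-in for the removed tidy(..., quick = TRUE): coefficients only.
quick_coefs <- tibble::enframe(stats::coef(fit), name = "term", value = "estimate")

# TukeyHSD tidiers use the contrast / null.value / adj.p.value names.
tukey <- tidy(TukeyHSD(aov(mpg ~ factor(cyl), data = mtcars)))
names(tukey)
```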
This release of broom also deprecates several helper functions, as well as tidier methods for a number of non-model objects, each in favor of more principled approaches from other packages (outlined in the NEWS file). Notably, tidiers for data frames, rowwise data frames, vectors, and matrices have been deprecated. Further, we have moved forward with the planned transfer of tidiers for mixed models to broom.mixed.
Other Changes
Almost all unit testing for the package is now supported by the modeltests package!
Also, we have revised several vignettes and moved them to the tidymodels website. For backward compatibility, the existing vignettes will now simply link to the revised versions.
Finally, the package’s website has moved from its previous tidyverse domain to broom.tidymodels.org.
Looking Forward
Most notably, the broom dev team is changing the process for adding new tidying methods to the package. Going forward, we ask that issues and PRs requesting support for new model objects be directed to the model-owning package (i.e., the package that the model is exported from) rather than to broom. If the maintainers of those packages are unable or unwilling to provide tidying methods in the model-owning package, it might be possible to add the new tidier to broom. However, broom is near its limit of tidiers; adding more may make the package unsustainable.
For developers exporting tidying methods directly from model-owning packages, we are actively working to provide resources that both ease the process of writing new tidier methods and reduce the dependency burden of taking on broom generics and helpers. As for the first point, we recently posted an article on the tidymodels website with notes on best practices for writing tidiers. This article will be kept up to date as we develop new resources for easing the process of writing new tidier methods. As for the second, the r-lib/generics package provides the main broom generics as a lightweight dependency. We hope to soon provide a coherent suite of helper functions for use in external broom methods.
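For developers, here is a rough sketch of what exporting tidiers from a model-owning package might look like when building on generics instead of broom. The class mymodel, its constructor fit_mymodel(), and the chosen output columns are all hypothetical. A real package would register the methods with S3method() directives in its NAMESPACE rather than defining them in the global environment, and returning tibbles assumes tibble is available (it is installed alongside broom).

```r
# Sketch: tidier methods for a hypothetical "mymodel" class, written
# against the generics package rather than broom itself.
library(generics)

# Hypothetical constructor standing in for a model from another package.
fit_mymodel <- function(x, y) {
  coefficients <- stats::coef(stats::lm(y ~ x))
  structure(list(coefficients = coefficients, n = length(y)), class = "mymodel")
}

# tidy(): one row per model parameter.
tidy.mymodel <- function(x, ...) {
  tibble::tibble(
    term = names(x$coefficients),
    estimate = unname(x$coefficients)
  )
}

# glance(): a one-row summary of the whole model.
glance.mymodel <- function(x, ...) {
  tibble::tibble(nobs = x$n)
}

model <- fit_mymodel(mtcars$wt, mtcars$mpg)
tidy(model)
glance(model)
```

Because broom re-exports these generics, users who load broom would dispatch to the same methods.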
We anticipate that, looking forward, the most active development on the broom package will center on improving augment() methods. We are also hoping to change our CRAN release cycle to provide incremental updates every several months rather than major changes every couple of years.
Contributors
This release features work and input from over 140 contributors (over 50 of them for their first time) since the last major release. See the package release notes for more specific notes on contributions. Thank you all for your thoughtful comments, patience, and hard work!
@abbylsmith, @acoppock, @ajb5d, @aloy, @AndrewKostandy, @angusmoore, @anniew, @aperaltasantos, @asbates, @asondhi, @asreece, @atyre2, @bachmeil, @batpigandme, @bbolker, @benjbuch, @bfgray3, @BibeFiu, @billdenney, @BrianOB, @briatte, @bruc, @brunaw, @brunolucian, @bschneidr, @carlislerainey, @CGMossa, @CharlesNaylor, @ChuliangXiao, @cimentadaj, @crsh, @cwang23, @DavisVaughan, @dchiu911, @ddsjoberg, @dgrtwo, @dmenne, @dylanjm, @ecohen13, @economer, @EDiLD, @ekatko1, @ellessenne, @ethchr, @florencevdubois, @GegznaV, @gershomtripp, @grantmcdermott, @gregmacfarlane, @hadley, @haozhu233, @hasenbratan, @HenrikBengtsson, @hermandr, @hideaki, @hughjonesd, @iago-pssjd, @ifellows, @IndrajeetPatil, @Inferrator, @istvan60, @jamesmartherus, @JanLauGe, @jasonyang5, @jaspercooper, @jcfisher, @jennybc, @jessecambon, @jkylearmstrongibx, @jmuhlenkamp, @JulianMutz, @Jungpin, @jwilber, @jyuu, @karissawhiting, @karldw, @khailper, @krauskae, @kuriwaki, @kyusque, @KZARCA, @Laura-O, @ldlpdx, @ldmahoney, @lilymedina, @llendway, @lrose1, @ltobalina, @LukasWallrich, @lukesonnet, @lwjohnst86, @malcolmbarrett, @margarethannum, @mariusbarth, @MatthieuStigler, @mattle24, @mattpollock, @mattwarkentin, @mine-cetinkaya-rundel, @mkirzon, @mlaviolet, @Move87, @namarkus, @nlubock, @nmjakobsen, @ns-1m, @nt-williams, @oij11, @petrhrobar, @PirateGrunt, @pjpaulpj, @pkq, @poppymiller, @QuLogic, @randomgambit, @riinuots, @RobertoMuriel, @Roisin-White, @romainfrancois, @rsbivand, @serina-robinson, @shabbybanks, @Silver-Fang, @Sim19, @simonpcouch, @sjackson1236, @softloud, @stefvanbuuren, @strengejacke, @sushmitavgopalan16, @tcuongd, @thisisnic, @topepo, @tyluRp, @vincentarelbundock, @vjcitn, @vnijs, @weiyangtham, @william3031, @x249wang, @xieguagua, @yrosseel, and @zoews.