R's core strength is statistical analysis. Today you apply hypothesis tests, build regression models, and use the broom package to work with model outputs.
t.test() compares means: t.test(x, y) tests whether two groups have different means, using Welch's unequal-variance test by default. prop.test() compares proportions. chisq.test() tests independence of categorical variables. cor.test() tests whether a correlation differs from zero. The printed output includes the test statistic, degrees of freedom, and p-value, plus a confidence interval for the tests that estimate one (chisq.test() does not). R's output is verbose and human-readable but not tidy; that is what broom is for.
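The other three tests named above follow the same call-then-inspect pattern as t.test(). The sketch below is illustrative only: the conversion counts passed to prop.test() are made up, and the choice of mtcars columns is an assumption made to keep the example self-contained.

```r
# prop.test(): compare two proportions (successes out of trials per group).
# Hypothetical counts: 45 of 100 vs 30 of 100.
prop_result <- prop.test(x = c(45, 30), n = c(100, 100))
prop_result$p.value

# chisq.test(): independence of two categorical variables.
# cyl and am from the built-in mtcars data are treated as categories here.
cyl_by_am <- table(mtcars$cyl, mtcars$am)
chisq_result <- chisq.test(cyl_by_am)  # may warn about small expected counts
chisq_result$statistic
chisq_result$parameter                 # degrees of freedom

# cor.test(): is the correlation between two numeric variables non-zero?
cor_result <- cor.test(mtcars$wt, mtcars$mpg)
cor_result$estimate                    # Pearson correlation coefficient
cor_result$conf.int                    # 95% confidence interval
```

Each result is an object of class "htest", so its pieces can be pulled out with $ as shown, or passed to broom's tidy() to get a one-row data frame.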
lm(y ~ x1 + x2, data = df) fits a linear model. summary(model) shows coefficients, standard errors, p-values, and R-squared. Interaction terms: y ~ x1 * x2, which expands to x1 + x2 + x1:x2. Polynomial terms: y ~ poly(x, 2). predict(model, newdata) makes predictions. Regression diagnostics: plot(model) draws four plots by default: residuals vs fitted, normal Q-Q, scale-location, and residuals vs leverage (with Cook's distance contours). Check assumptions: linearity, homoscedasticity, normality of residuals.
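The formula variants and the diagnostic plots can be tried on the built-in mtcars data. This is a minimal sketch; the choice of predictors (wt, hp) is an assumption for illustration, not a recommended model.

```r
# Interaction: wt * hp expands to wt + hp + wt:hp
fit_inter <- lm(mpg ~ wt * hp, data = mtcars)
names(coef(fit_inter))  # "(Intercept)" "wt" "hp" "wt:hp"

# Polynomial: poly(wt, 2) adds linear and quadratic terms
# (orthogonal polynomials by default)
fit_poly <- lm(mpg ~ poly(wt, 2), data = mtcars)
summary(fit_poly)$r.squared

# Draw the four default diagnostic plots in a 2 x 2 grid,
# then restore the previous graphics settings
old_par <- par(mfrow = c(2, 2))
plot(fit_inter)
par(old_par)
```

Look for curvature in the residuals vs fitted panel (linearity), a funnel shape in scale-location (homoscedasticity), and departures from the diagonal in the Q-Q panel (normality of residuals).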
broom converts model output to tidy data frames. tidy(model) returns one row per coefficient with estimate, std.error, statistic, p.value. glance(model) returns one row of model-level statistics: R-squared, AIC, BIC, F-statistic. augment(model, data) adds fitted values and residuals to the original data. This makes it easy to plot model results with ggplot2 and compare multiple models with dplyr.
library(broom)
library(ggplot2)  # needed for the residual plot at the end
# t-test: do groups differ?
group_a <- c(82, 85, 88, 92, 95, 78, 90)
group_b <- c(74, 78, 82, 79, 85, 71, 83)
test_result <- t.test(group_a, group_b)
tidy(test_result) # tidy data frame output
# columns include estimate, statistic, p.value, conf.low, conf.high
# estimate is mean(group_a) - mean(group_b), about 8.29
# Linear regression
model <- lm(mpg ~ wt + hp + factor(cyl),
            data = mtcars)
tidy(model) # coefficient table
glance(model) # R2, AIC, F-statistic
# Predict new observations
new_cars <- data.frame(wt = c(2.5, 3.0, 3.5),
                       hp = c(100, 150, 200),
                       cyl = c(4, 6, 8))
predict(model, newdata = new_cars,
        interval = 'confidence')
# Visualize model
augment(model) |>
  ggplot(aes(.fitted, .resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = 'dashed') +
  labs(title = 'Residuals vs Fitted')
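Comparing multiple models with dplyr, mentioned earlier, might look like the sketch below. The three candidate formulas and the list names (wt_only, wt_hp, wt_hp_cyl) are chosen only for illustration, and the example assumes the broom and dplyr packages are installed.

```r
library(broom)
library(dplyr)

# Candidate models of increasing complexity (hypothetical choices)
models <- list(
  wt_only   = lm(mpg ~ wt, data = mtcars),
  wt_hp     = lm(mpg ~ wt + hp, data = mtcars),
  wt_hp_cyl = lm(mpg ~ wt + hp + factor(cyl), data = mtcars)
)

# glance() each model, stack the one-row summaries, and sort by AIC
comparison <- lapply(models, glance) |>
  bind_rows(.id = "model") |>
  select(model, r.squared, adj.r.squared, AIC, BIC) |>
  arrange(AIC)

comparison
```

Because glance() always returns one row with the same columns for a given model type, the summaries stack cleanly, and lower AIC or BIC suggests a better trade-off between fit and complexity.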
Before moving on, confirm understanding of these key concepts: