Generalizing Regression: t-tests and OLS
by Jacob Dichter
December 23, 2024
Regression analysis is one of the most foundational tools in statistics, enabling us to model and understand relationships between variables. I was surprised to learn that the t-test, a staple technique for comparing group means in statistics, is actually a simple case of Ordinary Least Squares (OLS) regression. Specifically, a t-test can be thought of as a linear regression where the predictor variable is binary (representing the two groups) and its coefficient represents the difference in means between the two groups.
Key Concepts
- t-tests: Used to compare the means of two groups.
- OLS Regression: A method for modeling the relationship between a dependent variable and one or more independent variables.
At its core, a t-test is designed to compare the means of two groups, determining whether the observed differences are statistically significant. On the other hand, OLS regression is a method for modeling the relationship between a dependent variable and one or more independent variables. While these two techniques may seem distinct, they are deeply interconnected. In fact, a t-test can be viewed as a simplified form of OLS regression, where the predictor variable is binary (e.g., representing two groups).
The Connection
When you perform a t-test, you’re essentially running an OLS regression model with a single binary predictor. Here’s how it works:
- The binary predictor (e.g., group A vs. group B) is coded as 0 or 1, where 0 represents one group and 1 represents the other.
- The regression coefficient represents the difference in means between the two groups.
- The t-statistic from the regression output is identical to the t-statistic from a traditional t-test.
This equivalence demonstrates that t-tests are a specific application of the broader framework of regression analysis.
Example
Let’s say we want to compare the average test scores of two groups: Group A and Group B. We can use either a t-test or OLS regression to answer this question.
Using a t-test (with made-up example scores):
from scipy.stats import ttest_ind
group_a_scores = [82, 75, 90, 68, 85, 79]  # hypothetical data for illustration
group_b_scores = [88, 92, 80, 95, 86, 91]
t_stat, p_value = ttest_ind(group_a_scores, group_b_scores)
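For comparison, here is a minimal sketch of the OLS route to the same answer, regressing the scores on a 0/1 group dummy via `scipy.stats.linregress`. The example scores are made up for illustration; the slope’s t-statistic and p-value coincide with the (equal-variance) t-test’s.

```python
from scipy.stats import linregress, ttest_ind

# Made-up example scores (hypothetical data for illustration)
group_a_scores = [82, 75, 90, 68, 85, 79]
group_b_scores = [88, 92, 80, 95, 86, 91]

# Stack the scores and build a 0/1 dummy for group membership
scores = group_a_scores + group_b_scores
group = [0] * len(group_a_scores) + [1] * len(group_b_scores)

# Regress scores on the dummy: the slope estimates mean(B) - mean(A)
fit = linregress(group, scores)
t_regression = fit.slope / fit.stderr

# Equal-variance t-test in the same direction (B minus A)
t_test, p_test = ttest_ind(group_b_scores, group_a_scores)

print(t_regression, t_test)  # same t-statistic
print(fit.pvalue, p_test)    # same p-value
```

Note the ordering: with the dummy coded 0 for Group A and 1 for Group B, the slope measures B minus A, so the matching t-test is `ttest_ind(group_b_scores, group_a_scores)`.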
By understanding the connection between t-tests and OLS regression, we gain a deeper appreciation for the flexibility and power of regression analysis. Whether you’re comparing means or modeling complex relationships, regression provides a unified framework for tackling a wide range of statistical problems. This insight not only simplifies your toolkit but also enhances your ability to interpret and apply statistical methods effectively.
Old Post
Is a t-Test Equivalent to Regression on a Group Dummy Variable?
Yes, running a t-test comparing two groups is mathematically equivalent to performing a simple linear regression where the outcome variable is regressed on a single binary (dummy) variable indicating group membership. Here’s why:
The t-Test
A t-test for two independent groups compares the means of a dependent variable (\(Y\)) between two groups (e.g., Group A and Group B). The null hypothesis is that the means of the two groups are equal \((H_0: \mu_A = \mu_B)\).
Simple Linear Regression
In simple linear regression, you can model the dependent variable \(Y\) as:
\[Y_i = \beta_0 + \beta_1 X_i + \epsilon_i\]
where:
- \(X_i\) is a binary variable (e.g., 0 for Group A and 1 for Group B),
- \(\beta_0\) is the mean of \(Y\) for Group A,
- \(\beta_1\) is the difference in means between Group B and Group A \((\mu_B - \mu_A)\),
- \(\epsilon_i\) is the error term.
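To make the coefficient interpretation concrete, here is a small numerical check with made-up scores, solving the OLS problem directly with NumPy’s least-squares routine: the fitted intercept recovers Group A’s mean, and the slope recovers the mean difference.

```python
import numpy as np

# Hypothetical scores for two groups (made up for illustration)
group_a = np.array([82.0, 75, 90, 68, 85, 79])
group_b = np.array([88.0, 92, 80, 95, 86, 91])

# Stack outcomes and build the 0/1 dummy (0 = Group A, 1 = Group B)
y = np.concatenate([group_a, group_b])
x = np.concatenate([np.zeros(len(group_a)), np.ones(len(group_b))])

# Design matrix with an intercept column; solve OLS by least squares
X = np.column_stack([np.ones_like(x), x])
(beta0, beta1), *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta0, group_a.mean())                   # beta0 = mean of Group A
print(beta1, group_b.mean() - group_a.mean())  # beta1 = mean difference
```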
Equivalence
- The t-statistic for the slope (\(\beta_1\)) in this regression is the same as the t-statistic from the t-test comparing the two groups.
- The regression’s \(p\)-value for testing whether \(\beta_1 = 0\) is identical to the \(p\)-value from the t-test for the null hypothesis that the two group means are equal.
- The estimated coefficients in the regression (\(\beta_0\) and \(\beta_1\)) correspond directly to the group means and the mean difference.
Assumptions
For the equivalence to hold, the assumptions underlying both methods must be satisfied:
- The dependent variable is approximately normally distributed within each group.
- The variances of the dependent variable in the two groups are equal (homoscedasticity).
- Observations are independent.
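When the equal-variance assumption is doubtful, a common alternative is Welch’s t-test (`equal_var=False` in SciPy’s `ttest_ind`), which relaxes homoscedasticity and then no longer matches the plain OLS output exactly. A quick sketch with made-up scores of visibly different spread and unequal group sizes:

```python
from scipy.stats import ttest_ind

# Made-up scores: Group A is more spread out and larger than Group B
group_a = [82, 75, 90, 68, 85, 79]
group_b = [89, 90, 88, 91]

# Student's t-test (pooled variance, the OLS-equivalent default)
t_pooled, p_pooled = ttest_ind(group_a, group_b)

# Welch's t-test (no equal-variance assumption)
t_welch, p_welch = ttest_ind(group_a, group_b, equal_var=False)

print(t_pooled, t_welch)  # differ when variances and group sizes differ
```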
Why Use One Over the Other?
- t-Test: Direct and simpler for comparing two group means.
- Regression: More flexible, especially when additional covariates are included (e.g., adjusting for confounders or interacting variables).
