Fisher's Exact Test
A statistical significance test used in the analysis of contingency tables, particularly famous for its accuracy with small sample sizes and rare events.
1. What is Fisher's Exact Test?
Fisher's exact test is a statistical test used to determine if there are nonrandom associations between two categorical variables. It is used to test the null hypothesis ($H_0$) that the two variables are completely independent of one another (i.e., the proportions of one variable do not depend on the value of the other variable).
Why "Exact"? Unlike many other statistical tests (like Pearson's Chi-square test) that rely on large-sample approximations, Fisher's test calculates the exact probability of the observed table under the null hypothesis, conditional on the row and column totals, using the hypergeometric distribution. No continuous distribution is used to approximate the discrete counts.
2. The Origin: The Lady Tasting Tea
The test is due to Ronald Fisher, who published it in the 1930s. The legend behind it goes that at a tea party in the 1920s, a colleague (Dr. Muriel Bristol) claimed she could tell whether the tea or the milk was poured into the cup first.
Fisher scoffed at the idea and devised an experiment on the spot. He prepared 8 cups of tea (4 milk-first, 4 tea-first) and asked her to identify them. She identified all 8 correctly. Fisher developed this exact test to determine the probability of her doing so by pure chance. (Spoiler: there are $\binom{8}{4} = 70$ ways to pick which 4 cups are milk-first, so the probability was $p = 1/70 \approx 0.014$, leading Fisher to conclude she likely did possess the skill!)
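The 1/70 figure can be checked with a few lines of arithmetic. This is a minimal sketch that assumes the taster knows exactly 4 cups are milk-first, so a pure guesser simply picks 4 of the 8 cups at random; the variable names are illustrative only.

```python
from math import comb

cups = 8        # total cups served
milk_first = 4  # cups prepared milk-first (assumed known to the taster)

# A pure guesser chooses which 4 cups to call "milk-first";
# every choice is equally likely and only one is fully correct.
ways_to_choose = comb(cups, milk_first)
p_all_correct = 1 / ways_to_choose

print(ways_to_choose)           # 70
print(round(p_all_correct, 3))  # 0.014
```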
3. When should a Data Scientist use it?
You should reach for Fisher's Exact Test when you are working with a 2x2 contingency table and specific conditions are met.
Ideal Scenarios in Data Science:
- Small Sample Sizes: When A/B testing a new feature on a very small, restricted beta group.
- Rare Events: Click-through rates on highly specific ad targeting, or rare medical diagnoses in healthcare analytics where counts might be 0, 1, or 2.
- Sparse Tables: When more than 20% of the expected cell counts are less than 5, or any expected cell count is less than 1 (in these cases the chi-square approximation is unreliable).
- Exactness Required: You want an exact p-value without relying on asymptotic distributions.
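The sparse-table condition can be checked before choosing a test. The sketch below computes expected cell counts under independence (row total times column total, divided by the grand total) and flags a table when any of them falls below 5. The helper names `expected_counts` and `looks_sparse` and the example counts are hypothetical, chosen only for illustration. In a 2x2 table a single cell is already 25% of the cells, so "more than 20%" reduces to "any cell".

```python
def expected_counts(table):
    """Expected cell counts under independence for a 2x2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    # Expected count = row total * column total / grand total
    return [[r * k / n for k in col_totals] for r in row_totals]


def looks_sparse(table, threshold=5):
    """True if any expected count falls below the threshold (default 5)."""
    return any(e < threshold for row in expected_counts(table) for e in row)


beta_test = [[1, 9], [6, 4]]  # made-up counts for a small beta group
print(expected_counts(beta_test))  # [[3.5, 6.5], [3.5, 6.5]]
print(looks_sparse(beta_test))     # True
```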
4. The Mathematical Formula
Consider a 2x2 contingency table representing two variables, each with two outcomes:
| | Success | Failure | Row Total |
|---|---|---|---|
| Group A | \( a \) | \( b \) | \( a + b \) |
| Group B | \( c \) | \( d \) | \( c + d \) |
| Col Total | \( a + c \) | \( b + d \) | \( n \) |
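Assuming the row and column totals are fixed, the probability of obtaining this specific configuration of data is given by the hypergeometric distribution:

$$p = \frac{\binom{a+b}{a}\binom{c+d}{c}}{\binom{n}{a+c}} = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{a!\,b!\,c!\,d!\,n!}$$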
Note: The formula above gives the probability of exactly the observed table. To find the standard two-tailed p-value, we sum the probabilities of the observed table and of all other possible tables that are as extreme or more extreme (i.e., those with a probability $\le$ the observed probability), while keeping the marginal totals constant.
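As a sanity check on the hypergeometric probability of a single table, the sketch below evaluates it in two equivalent ways: as a ratio of binomial coefficients, and as a ratio of factorials of the margins and cells. The function names are hypothetical, and the cell values reuse the example table that appears later in the implementation section.

```python
from math import comb, factorial, isclose


def table_probability(a, b, c, d):
    """Hypergeometric probability of one 2x2 table with fixed margins."""
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)


def table_probability_factorial(a, b, c, d):
    """Same probability written with factorials of margins and cells."""
    n = a + b + c + d
    numerator = (factorial(a + b) * factorial(c + d)
                 * factorial(a + c) * factorial(b + d))
    denominator = (factorial(a) * factorial(b) * factorial(c)
                   * factorial(d) * factorial(n))
    return numerator / denominator


p_binomial = table_probability(2, 8, 9, 3)
p_factorial = table_probability_factorial(2, 8, 9, 3)
print(f"{p_binomial:.6f}")              # 0.014034
print(isclose(p_binomial, p_factorial))  # True
```

This is the probability of one table only; it is not yet a p-value.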
5. Computing the Two-Tailed P-Value
The two-tailed p-value is computed exactly by enumerating every 2x2 table that shares the observed row and column totals and summing the hypergeometric probabilities of those that are as extreme as, or more extreme than, the observed table.
Note: High numbers (>150) may make a straightforward enumeration noticeably slower.
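A calculation of this kind can be sketched in plain Python. The function below is a hypothetical helper, not the code behind any particular calculator or library: it walks over every admissible value of the top-left cell, computes the hypergeometric probability of each table, and sums those that are no more probable than the observed one. The relative tolerance `rel_tol` is an assumption of this sketch, added so that tables tied with the observed one are not dropped because of floating-point rounding.

```python
from math import comb


def fisher_two_sided(a, b, c, d, rel_tol=1e-7):
    """Two-tailed Fisher p-value for a 2x2 table by full enumeration."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(k):
        # Probability that the top-left cell equals k, margins fixed
        return comb(row1, k) * comb(row2, col1 - k) / total

    p_observed = prob(a)
    # The top-left cell can only take values that keep all four cells >= 0
    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    # Sum every table that is no more probable than the observed one
    p_value = sum(p for p in map(prob, range(k_min, k_max + 1))
                  if p <= p_observed * (1 + rel_tol))
    return min(1.0, p_value)  # clamp tiny floating-point overshoot


p_value = fisher_two_sided(2, 8, 9, 3)
print(f"P-value: {p_value:.4f}")  # P-value: 0.0300
```

The loop visits at most `min(row1, col1) + 1` tables, so the cost grows with the size of the margins, which is why very large counts make this approach slower.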
6. Fisher's Exact vs. Pearson's Chi-Square
When deciding between the two, remember this rule of thumb:
Fisher's Exact Test
- Exact p-value
- Best for small samples or rare events
- Computationally intensive for very large tables
- Use when: more than 20% of expected frequencies are < 5.
Chi-Square Test
- Approximate p-value
- Best for large samples
- Very fast calculation
- Use when: all expected frequencies are $\ge$ 5.
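To see why the rule of thumb matters, the sketch below applies it to two made-up tables and prints both p-values side by side. Everything here is an illustrative assumption: the helper names, the counts, and the choice to compute the chi-square p-value without a continuity correction (for 1 degree of freedom the tail probability equals `erfc(sqrt(stat / 2))`). Only the standard library is used so the example is self-contained.

```python
from math import comb, erfc, sqrt


def chi_square_p(a, b, c, d):
    """Pearson chi-square p-value (1 df, no continuity correction)."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return erfc(sqrt(stat / 2))


def fisher_p(a, b, c, d):
    """Two-tailed Fisher p-value by enumerating tables with fixed margins."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)
    probs = [comb(row1, k) * comb(row2, col1 - k) / total
             for k in range(max(0, col1 - row2), min(row1, col1) + 1)]
    p_obs = comb(row1, a) * comb(row2, c) / total
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


def min_expected(a, b, c, d):
    """Smallest expected cell count under independence."""
    n = a + b + c + d
    return min(r * k / n for r in (a + b, c + d) for k in (a + c, b + d))


results = {}
for name, cells in [("small", (1, 9, 6, 4)), ("large", (40, 60, 55, 45))]:
    choice = "Fisher" if min_expected(*cells) < 5 else "chi-square"
    results[name] = (choice, fisher_p(*cells), chi_square_p(*cells))
    print(name, choice, round(results[name][1], 4), round(results[name][2], 4))
```

On the small table the rule picks Fisher's test, and the two p-values fall on opposite sides of 0.05, which is the kind of disagreement the expected-count threshold is meant to guard against.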
7. Implementation in Python & R
In practice, as a Data Scientist, you won't calculate this by hand. Here is how you perform Fisher's Exact Test using popular Data Science libraries.
Using Python (SciPy)
```python
import scipy.stats as stats

# Define the 2x2 contingency table
# [[Group A Success, Group A Failure], [Group B Success, Group B Failure]]
table = [[2, 8],
         [9, 3]]

# Perform Fisher's Exact Test; the result unpacks as (odds ratio, p-value)
odds_ratio, p_value = stats.fisher_exact(table, alternative='two-sided')
print(f"P-value: {p_value:.4f}")
print(f"Odds Ratio: {odds_ratio:.4f}")  # sample odds ratio (a*d) / (b*c)
```
Using R
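```r
# Define the 2x2 matrix (named tbl to avoid masking base R's table())
tbl <- matrix(c(2, 8,
                9, 3),
              nrow = 2,
              byrow = TRUE)

# Perform Fisher's Exact Test
result <- fisher.test(tbl, alternative = "two.sided")
print(result$p.value)
# Odds ratio: R reports the conditional maximum likelihood estimate,
# which can differ slightly from the sample odds ratio (a*d) / (b*c)
print(result$estimate)
```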
