Regression ethics

Author

Dr. Alexander Fisher

Libraries and data used in these notes:
library(tidyverse)
library(tidymodels)

A case study

Read the following article: Greenwood et al. (2020)

In particular, as you read, try to answer the following:

  • What is the outcome variable?
  • What are the predictors?
  • How was the data collected?
  • What is the model?
  • What are the results?

Further reading

Borjas and VerBruggen (2024)

The problem with “statistical significance”

Read Simmons, Nelson, and Simonsohn (2011) from the abstract through “Study 1” and read the results reported in Table 1.

Read the code below, and figure out what it’s doing. Chat with your neighbor. Explain what the result means.

set.seed(221)
total = 0
N = 1000
for (j in 1:N) {
  y = rnorm(30)
  x1 = runif(30)
  x2 = rnorm(30)
  model1 = lm(y ~ x1 + x2)
  model2 = lm(y ~ x1)
  model3 = lm(y ~ x2)
  model4 = lm(y ~ x1 * x2)
  model5 = lm(y ~ 0 + x1)
  model6 = lm(y ~ 0 + x2)
  model7 = lm(y ~ 1)
  
  hackable = 0
  for (i in 1:7) {
    m = get(paste0("model", i))
    indicator = 0
    if(sum(tidy(m)$p.value < 0.05) > 0) {
      indicator = 1
    }
    hackable = hackable + indicator 
  }
  total = total + hackable
}

total / N
[1] 0.607
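
One way to read this output: y is simulated independently of x1 and x2, so every coefficient (including each intercept) is truly zero and any small p-value is a false positive. The loop adds one for each of the seven models that shows at least one small p-value, so total / N is the average number of such models per simulated data set, not a proportion of data sets. The sketch below is a hedged variant and not part of the original notes. It uses base R's summary() in place of tidy(), and the names any_significant, n_hits, and at_least_one are illustrative assumptions. It records both the per-data-set count and whether at least one of the seven models would give an analyst something "significant" to report at alpha = 0.05, the conventional threshold discussed by Simmons, Nelson, and Simonsohn (2011).

```r
# Hedged sketch (not the notes' own code): separate "how many models look
# significant" from "does at least one model look significant".
set.seed(221)
n_sims <- 1000   # number of simulated data sets
n_obs  <- 30     # observations per data set
alpha  <- 0.05   # assumed significance threshold

# TRUE if any coefficient in a fitted lm has a p-value below alpha
any_significant <- function(model, alpha) {
  p_values <- summary(model)$coefficients[, "Pr(>|t|)"]
  any(p_values < alpha)
}

n_hits       <- integer(n_sims)  # models (out of 7) with a small p-value
at_least_one <- logical(n_sims)  # did any of the 7 models "work"?

for (j in seq_len(n_sims)) {
  # The outcome is pure noise: it is unrelated to either predictor
  y  <- rnorm(n_obs)
  x1 <- runif(n_obs)
  x2 <- rnorm(n_obs)

  # The same seven specifications as in the loop above
  models <- list(
    lm(y ~ x1 + x2), lm(y ~ x1), lm(y ~ x2), lm(y ~ x1 * x2),
    lm(y ~ 0 + x1), lm(y ~ 0 + x2), lm(y ~ 1)
  )

  hits <- vapply(models, any_significant, logical(1), alpha = alpha)
  n_hits[j]       <- sum(hits)
  at_least_one[j] <- any(hits)
}

mean(n_hits)        # average number of "significant" models per data set
mean(at_least_one)  # share of data sets with at least one "significant" model
```

The second quantity is the one closest to the researcher degrees of freedom argument: it estimates how often an analyst who is free to choose among these seven specifications could report some "significant" term even though no real relationship exists.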

References

Borjas, George J, and Robert VerBruggen. 2024. “Physician–Patient Racial Concordance and Newborn Mortality.” Proceedings of the National Academy of Sciences 121 (39): e2409264121. https://www.pnas.org/doi/10.1073/pnas.2409264121.
Greenwood, Brad N, Rachel R Hardeman, Laura Huang, and Aaron Sojourner. 2020. “Physician–Patient Racial Concordance and Disparities in Birthing Mortality for Newborns.” Proceedings of the National Academy of Sciences 117 (35): 21194–200. https://www.pnas.org/doi/10.1073/pnas.1913405117.
Simmons, Joseph P, Leif D Nelson, and Uri Simonsohn. 2011. “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant.” Psychological Science 22 (11): 1359–66. https://pubmed.ncbi.nlm.nih.gov/22006061/.