To advance the Kenan Scholars’ research journey and enhance their research skills, the Kenan Scholars Program held the third installment of its Research Workshop Series, Data Analytics for Business Research, on March 5.
At the workshop, David Fisher, a data scientist at the Kenan Institute of Private Enterprise, explained how to establish causality in social science research and how we can leverage analytical tools to do so.
As David noted, research is driven by one important factor: causality. Proving that a causal relationship exists between two variables while isolating other factors is at the core of any hypothesis-driven research process. So, how does one go about proving causality, and what are the implications of doing so?
So why does causality matter? At the surface level, any two things can seem correlated. For example, people who go to the hospital tend to be less healthy than those who do not. Does that mean that hospitals make people sick? Most likely not, but without establishing causality and accounting for other variables, that relationship may seem true at first glance.
Now, let’s take a look at one major threat to proving causation in research: endogeneity. Common sources of endogeneity include selection bias, omitted variables, and simultaneity. Mr. Fisher gave the example of a study trying to prove whether a certain after-school program helps kids succeed academically, where selection bias might unintentionally steer the best-performing kids into the program. In that case, the data would be skewed, making it appear that the program helps kids perform better in the classroom. This is just one example of endogeneity and why it is important to account for it when designing a research study.
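To make the selection problem concrete, here is a minimal simulation sketch (not from the workshop; the effect size, variable names, and enrollment rule are all illustrative) in which high-ability kids are more likely to enroll, so a naive comparison of enrolled versus non-enrolled students badly overstates the program’s true effect:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Each student has an underlying ability that drives test scores.
ability = rng.normal(0, 1, n)

# Selection bias: high-ability students are more likely to enroll.
enrolled = rng.random(n) < 1 / (1 + np.exp(-2 * ability))

# Suppose the program truly adds 0.2 points to test scores.
true_effect = 0.2
scores = ability + true_effect * enrolled + rng.normal(0, 1, n)

# A naive comparison of enrolled vs. not conflates the program's
# effect with the ability gap between the two groups.
naive_estimate = scores[enrolled].mean() - scores[~enrolled].mean()
print(f"true effect:    {true_effect:.2f}")
print(f"naive estimate: {naive_estimate:.2f}")  # far above 0.2
```

Because the enrolled group was already stronger, the raw gap in scores mixes the program’s effect with the ability gap.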
How can we avoid endogeneity? One of the most effective ways is to incorporate randomization into your research process. By selecting kids for the program completely at random, you neutralize the influence of other variables and endogeneity, so differences in outcomes can credibly be attributed to the program itself. The randomized controlled trial (RCT) design is key to making research verifiable; however, RCT studies are not always possible due to issues such as ethics.
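Continuing the same illustrative simulation (again a hedged sketch, not the workshop’s code), random assignment breaks the link between ability and enrollment, so the simple difference in means recovers the true effect:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
ability = rng.normal(0, 1, n)

# Randomized assignment: enrollment no longer depends on ability.
enrolled = rng.random(n) < 0.5

true_effect = 0.2
scores = ability + true_effect * enrolled + rng.normal(0, 1, n)

# With random assignment the two groups are comparable on average,
# so the simple difference in means recovers the true effect.
rct_estimate = scores[enrolled].mean() - scores[~enrolled].mean()
print(f"RCT estimate: {rct_estimate:.2f}")  # close to 0.20
```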
Nonetheless, Mr. Fisher discussed some specific ways we can infer causality in the absence of randomization. First, causality can be established through Regression Discontinuity Design, or RDD. He explained that RDD looks at differences in outcomes near an arbitrary cutoff point, reducing the influence of selection bias. RDD can potentially get around serious sample selection problems, provided you can convincingly argue that, within a narrow window around the cutoff, treated and untreated observations are practically the same.
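As a rough illustration of the idea (with an invented running variable, cutoff, and bandwidth, not an example from the talk), the sketch below fits a line on each side of the cutoff within a narrow window and reads off the jump at the threshold:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Running variable: e.g., an entrance-exam score that decides treatment.
score = rng.uniform(0, 100, n)
cutoff = 70.0
treated = score >= cutoff

# Outcome rises smoothly with the score, plus a jump at the cutoff
# equal to the true treatment effect.
true_effect = 5.0
outcome = 0.3 * score + true_effect * treated + rng.normal(0, 2, n)

# Local comparison: within a narrow bandwidth around the cutoff,
# fit a line on each side and compare intercepts at the cutoff.
h = 5.0  # bandwidth
left = (score >= cutoff - h) & (score < cutoff)
right = (score >= cutoff) & (score <= cutoff + h)

b_left = np.polyfit(score[left] - cutoff, outcome[left], 1)
b_right = np.polyfit(score[right] - cutoff, outcome[right], 1)

rdd_estimate = b_right[1] - b_left[1]  # intercept gap at the cutoff
print(f"RDD estimate: {rdd_estimate:.2f}")  # close to 5.0
```

The narrow bandwidth is doing the work here: just above and just below the cutoff, students are assumed to be practically the same, so the jump at the threshold can be read as the treatment effect.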
Another method Mr. Fisher discussed for inferring causality in the absence of a true experimental design is Difference in Differences, or DiD. This method is based on the idea that a treatment effect can be estimated by comparing changes over time between a treatment group and a control group: the two groups may start at different average levels, as long as they would have followed parallel trends absent the treatment.
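A minimal DiD sketch, again with made-up numbers: both groups share a common time trend but start at different levels, and subtracting the control group’s change from the treatment group’s change nets out that shared trend:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000

# Two groups with different baselines: DiD tolerates a level gap
# as long as both groups would have trended in parallel.
baseline = {"treatment": 60.0, "control": 50.0}
common_trend = 3.0   # change both groups experience over time
true_effect = 4.0    # extra change caused by the treatment

pre_t = baseline["treatment"] + rng.normal(0, 2, n)
post_t = baseline["treatment"] + common_trend + true_effect + rng.normal(0, 2, n)
pre_c = baseline["control"] + rng.normal(0, 2, n)
post_c = baseline["control"] + common_trend + rng.normal(0, 2, n)

# Difference in differences: the treatment group's change minus
# the control group's change isolates the treatment effect.
did = (post_t.mean() - pre_t.mean()) - (post_c.mean() - pre_c.mean())
print(f"DiD estimate: {did:.2f}")  # close to 4.0
```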
Wrapping up the session, Mr. Fisher described some ways to make our own research designs stronger. Important questions to ask include: Is your outcome supported by prior research? Do you have a good theoretical reason to expect the results you are seeing? A second strategy is to step back and scrutinize your methods as if they were completely independent of your work.
The workshop on causality and data analysis provided critical guidance for developing research and avoiding common pitfalls when trying to establish causal relationships, leaving the scholars with much to think about.