Using Public Policy for Social Change - Part 7

Research Designs and Concepts for Causal Inferences - Part 2

While the three research designs mentioned (simple pre-test post-test design, two-group comparison or difference-in-differences research design, and time-series design) can provide strong evidence for causal relationships, they each have their own limitations and assumptions that need to be carefully considered to ensure that causation is accurately demonstrated.

Simple Pre-test Post-test Design

Assumptions:

  1. Temporal Causality: The intervention must occur before the outcome.
  2. No Confounding Variables: The change in the outcome must be due to the intervention and not other factors.

Limitations:

  • Confounding Variables: If there are unobserved or uncontrolled variables that change over time, they can affect the outcome and lead to incorrect conclusions about causation.
  • Measurement Errors: Errors in measuring the intervention or outcome can distort the results.
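The following minimal Python sketch uses made-up numbers to show why unobserved trends matter. It assumes the outcome would have risen by a fixed secular trend even without the intervention. The function names `pre_post_change` and `simulate_outcomes` are hypothetical and exist only for this illustration. The naive pre-test post-test difference recovers the true effect plus the trend, and the data alone cannot separate the two.

```python
# Hypothetical illustration: all numbers are invented for this sketch.

def pre_post_change(pre, post):
    """Naive pre-test post-test estimate: mean(post) - mean(pre)."""
    return sum(post) / len(post) - sum(pre) / len(pre)

def simulate_outcomes(baseline, true_effect, secular_trend):
    """Post-test outcome = baseline + secular trend + intervention effect.

    The secular trend stands in for any unobserved factor that changes
    over time (an assumption of this sketch)."""
    pre = list(baseline)
    post = [y + secular_trend + true_effect for y in baseline]
    return pre, post

baseline = [10.0, 12.0, 14.0, 16.0]  # hypothetical pre-test scores
pre, post = simulate_outcomes(baseline, true_effect=2.0, secular_trend=3.0)
estimate = pre_post_change(pre, post)
print(estimate)  # 5.0: the true effect (2.0) plus the trend (3.0)
```

Because the estimate absorbs the trend, a pre-test post-test design on its own cannot tell a 2-point effect with a 3-point trend apart from a 5-point effect with no trend.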

Two-Group Comparison or Difference-in-Differences Research Design

Assumptions:

  1. Temporal Causality: The intervention must occur before the outcome.
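  2. No Time-Varying Confounders: Apart from the intervention, no factor that changes over time may affect one group differently from the other. Stable differences between the groups are removed by the design.
  3. Parallel Trends: In the absence of the intervention, the treatment and control groups would have followed the same trend in outcomes. Similar trends before the intervention are the usual evidence for this assumption.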

Limitations:

  • Selection Bias: If the groups are not randomly assigned, there could be differences in observed and unobserved characteristics that affect the outcome.
  • Differential Time Trends: The design removes time trends that affect both groups equally. However, if the groups would have followed different trends even without the intervention, the parallel-trends assumption fails and the estimated effect is biased.
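The two-group comparison reduces to simple arithmetic on four group means. The Python sketch below uses invented numbers for a hypothetical outcome such as weekly sugary-drink purchases. The change in the control group (here -1) is treated as the trend both groups share, and it is subtracted from the change in the treatment group. This step is valid only under the parallel-trends assumption. The function names are illustrative, not from any library.

```python
def mean(xs):
    return sum(xs) / len(xs)

def diff_in_diff(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Difference-in-differences: change in the treatment group
    minus change in the control group."""
    treat_change = mean(treat_post) - mean(treat_pre)
    ctrl_change = mean(ctrl_post) - mean(ctrl_pre)
    return treat_change - ctrl_change

# Hypothetical data: treatment group falls by 3, control group falls by 1.
effect = diff_in_diff(treat_pre=[10, 12], treat_post=[7, 9],
                      ctrl_pre=[9, 11], ctrl_post=[8, 10])
print(effect)  # -2.0
```

A naive pre-test post-test comparison in the treatment group alone would report -3. The difference-in-differences attributes only -2 to the intervention.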

Time-Series Design

Assumptions:

  1. Temporal Causality: The intervention must occur before the outcome.
  2. No Confounding Variables: The change in the outcome must be due to the intervention and not other factors.
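  3. Stationarity: The time series should be stationary, meaning that its statistical properties (such as its mean and variance) do not change over time, or it should be made stationary by modeling or removing trends and seasonal patterns.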

Limitations:

  • Confounding Variables: Unobserved or uncontrolled variables that change over time can affect the outcome.
  • Seasonality and Trends: Time series data often includes seasonal patterns and trends that need to be accounted for to isolate the effect of the intervention.
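One common way to account for a trend in a time-series design is to fit the pre-intervention trend and project it forward as the counterfactual. The sketch below makes two simplifying assumptions: a straight-line trend, and no seasonality or autocorrelation. It uses a hand-written least-squares fit so that it stays self-contained. The function names and the data are hypothetical.

```python
def ols_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

def its_effect(series, start):
    """Average gap between observed post-intervention values and the
    pre-intervention linear trend projected forward (the counterfactual)."""
    pre_t = list(range(start))
    a, b = ols_line(pre_t, series[:start])
    gaps = [series[t] - (a + b * t) for t in range(start, len(series))]
    return sum(gaps) / len(gaps)

# Hypothetical monthly outcome: rises by 1 per month, then drops by 4
# when the intervention starts at month 6.
series = [20, 21, 22, 23, 24, 25, 22, 23, 24, 25]
print(its_effect(series, start=6))  # -4.0
```

A simple before/after comparison of means would be distorted by the upward trend. Projecting the trend isolates the 4-point drop, provided the trend would have continued unchanged without the intervention.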

Establishing a Counterfactual

To establish a counterfactual, which is essential for demonstrating causation, researchers often use techniques such as:

  1. Matching: Matching the treatment and control groups based on observed characteristics to reduce selection bias.
  2. Regression Analysis: Using regression models to control for confounding variables and isolate the effect of the intervention.
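  3. Instrumental Variables: Using a variable (the instrument) that influences whether units receive the treatment but affects the outcome only through the treatment, so that the variation in treatment driven by the instrument can be used to estimate the causal effect.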
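Two of these techniques can be sketched in a few lines of Python: one-to-one nearest-neighbor matching on a single observed covariate, and the Wald estimator for a binary instrument. These are simplified textbook versions, assumed here for illustration only. Real analyses match on many covariates (or on a propensity score) and must justify that the instrument affects the outcome only through the treatment. All names and numbers are hypothetical.

```python
def nearest_neighbor_att(treated, controls):
    """Average effect on the treated via 1-nearest-neighbor matching
    (with replacement) on one covariate.

    treated, controls: lists of (covariate, outcome) pairs."""
    effects = []
    for x_t, y_t in treated:
        # Pair each treated unit with the control whose covariate is closest.
        x_c, y_c = min(controls, key=lambda c: abs(c[0] - x_t))
        effects.append(y_t - y_c)
    return sum(effects) / len(effects)

def wald_iv(z, d, y):
    """Wald IV estimate with a binary instrument z:
    (difference in mean outcome by z) / (difference in treatment uptake by z)."""
    def mean_where(values, flag):
        picked = [v for v, zi in zip(values, z) if zi == flag]
        return sum(picked) / len(picked)
    reduced_form = mean_where(y, 1) - mean_where(y, 0)
    first_stage = mean_where(d, 1) - mean_where(d, 0)
    return reduced_form / first_stage

# Hypothetical matching data: (age, outcome).
treated = [(30, 15.0), (50, 20.0)]
controls = [(29, 12.0), (40, 14.0), (52, 17.0)]
print(nearest_neighbor_att(treated, controls))  # 3.0

# Hypothetical IV data generated so that the true effect of d on y is 2.
z = [0, 0, 0, 0, 1, 1, 1, 1]
d = [0, 0, 0, 1, 1, 1, 1, 0]
y = [1 + 2 * di for di in d]
print(wald_iv(z, d, y))  # 2.0
```

Regression adjustment, the remaining technique in the list, usually relies on a statistics library and is not shown here.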

Confidence in Causal Relationships

While these designs can provide strong evidence for causal relationships, they are not foolproof. Each design has its own set of assumptions and limitations that need to be carefully addressed to ensure that the causal relationship is accurately captured. Therefore, it is crucial to:

  1. Validate Assumptions: Ensure that the assumptions underlying each design are met.
  2. Control for Confounders: Use appropriate methods to control for confounding variables.
  3. Sensitivity Analysis: Conduct sensitivity analyses to check the robustness of the findings to different assumptions and scenarios.
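As one example of validating an assumption, a placebo check for the two-group design applies the same difference-in-differences calculation to two periods that both fall *before* the intervention. Neither group has been treated yet, so a result far from zero suggests the groups were already on different trends. The sketch below is a simplified illustration. The `tolerance` threshold is an analyst's choice and is assumed here; it is not a formal significance test. The function names and data are hypothetical.

```python
def mean(xs):
    return sum(xs) / len(xs)

def did(t_before, t_after, c_before, c_after):
    """Change in the treatment group minus change in the control group."""
    return (mean(t_after) - mean(t_before)) - (mean(c_after) - mean(c_before))

def placebo_pre_trend(t_period1, t_period2, c_period1, c_period2, tolerance):
    """Placebo DiD across two pre-intervention periods.

    Returns the placebo estimate and whether it lies within the
    analyst-chosen tolerance of zero."""
    placebo = did(t_period1, t_period2, c_period1, c_period2)
    return placebo, abs(placebo) <= tolerance

# Hypothetical pre-intervention data: both groups rise by 1, so trends look parallel.
print(placebo_pre_trend([10, 12], [11, 13], [8, 10], [9, 11], tolerance=0.5))  # (0.0, True)
```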

When using a pre-test post-test design, factors other than the intervention itself can cause a change in the outcome between the first and second observation points. These factors are known as threats to internal validity. Here are some key threats to consider:

  • History: Any event occurring outside the study that could affect the outcome, such as changes in the weather, news events, or personal life events. This threat arises when another factor occurs at the same time as the intervention and could contribute to, or drive, the change observed between the pre- and post-intervention phases. Returning to our soda tax example, suppose that schools in the city adopted a new approach to nutrition in their lunch offerings, provided more opportunities for physical activity, or removed vending machines from school premises at the same time the soda tax took effect. Several related interventions may be happening at once, and our counterfactual would need to account for them.

  • Maturation: Natural changes that occur in participants over time, such as cognitive development in children or the effects of aging, can influence outcomes independently of any intervention. Participants may therefore show trends in behavior or performance that reflect developmental processes rather than the effect of the intervention being studied.
  • Testing Effect: Repeated testing can itself influence the outcome. For example, participants may improve on the post-test because they are familiar with the test rather than because of the intervention. Awareness of being observed or assessed can also change how participants respond or behave. This is particularly relevant in evaluations of educational interventions, where simply taking a pre-test can shift participants' knowledge, beliefs, or attitudes. Consequently, observed changes between the pre-test and post-test may reflect the testing process as well as the intervention.
  • Dropout Bias (Loss to Follow-up): Another significant threat to internal validity arises when participants do not complete the study, for example because they relocate, die, or fail to respond to follow-up surveys. Changes observed between the initial and subsequent measurements may then reflect a shift in who remains in the sample rather than an effect of the intervention. This makes it difficult to tell whether observed differences are due to the intervention or to changes in the participant pool.
  • Regression to the Mean: Extreme values in a dataset tend to move closer to the average when they are measured again, even in the absence of any intervention. This happens because an extreme score partly reflects chance factors, such as a good or bad day or measurement error, that are unlikely to recur to the same degree. The result can be an apparent improvement, or decline, that is mistaken for an intervention effect, especially when participants are selected because of extreme pre-test scores. Example in Education: Students who score extremely high on a test can be expected, on average, to score somewhat lower on a subsequent test, and students who score extremely low can be expected to score somewhat higher. This is not necessarily because they have learned less or more. Their initial scores were partly outliers that naturally regress toward the mean.
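Regression to the mean and dropout bias can both be demonstrated by simulation with no intervention at all. The Python sketch below rests on assumed conditions: each hypothetical student has a stable "true" ability, each test adds independent random noise, and lower-ability students are more likely to drop out before the post-test. The distributions, the dropout rule, and the function names are illustrative choices, not estimates from real data.

```python
import random
import statistics

def simulate_scores(n, seed=1):
    """Stable ability plus independent test noise; nothing happens between tests."""
    rng = random.Random(seed)
    ability = [rng.gauss(50, 10) for _ in range(n)]
    pre = [a + rng.gauss(0, 10) for a in ability]
    post = [a + rng.gauss(0, 10) for a in ability]
    return ability, pre, post

def extreme_group_means(pre, post, top=True, share=0.1):
    """Mean pre- and post-test scores for students selected on an extreme pre-test."""
    k = int(len(pre) * share)
    chosen = sorted(range(len(pre)), key=lambda i: pre[i], reverse=top)[:k]
    return (statistics.mean(pre[i] for i in chosen),
            statistics.mean(post[i] for i in chosen))

def attrition_means(ability, pre, post, seed=2):
    """Assumed dropout rule: probability of staying rises with ability.

    Returns the pre-test mean of everyone and the post-test mean of completers."""
    rng = random.Random(seed)
    stayed = [i for i, a in enumerate(ability)
              if rng.random() < min(1.0, max(0.0, a / 100))]
    return statistics.mean(pre), statistics.mean(post[i] for i in stayed)

ability, pre, post = simulate_scores(10_000)
print(extreme_group_means(pre, post, top=True))   # post-test mean falls back toward 50
print(extreme_group_means(pre, post, top=False))  # post-test mean rises toward 50
print(attrition_means(ability, pre, post))        # completers look "better" at post-test
```

In every case the apparent change comes from selection or chance, which is exactly what a pre-test post-test comparison without a control group cannot rule out.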


The two-group comparison or difference-in-differences (DID) research design is a powerful method for evaluating causal relationships, particularly in policy evaluation. Adding a comparison group helps mitigate threats to internal validity such as history, maturation, testing effects, and regression to the mean, because these threats would be expected to affect both groups similarly. Researchers can then infer that a significant difference between the change in the intervention group and the change in the comparison group is likely attributable to the intervention itself. This leads us to experimental designs.

Experimental designs, particularly Randomized Controlled Trials (RCTs), are considered the gold standard in research for establishing causal relationships. By randomly assigning participants to either the intervention group or the control group, RCTs aim to create two groups that are, on average, statistically equivalent in all respects except their exposure to the intervention. This design controls for many threats to internal validity and provides a clear framework for evaluating the effects of an intervention.

In their standard form, RCTs randomly allocate entities into two groups: one receives the new intervention or policy change, and the other serves as a control group. Because the groups differ systematically only in their exposure to the intervention, differences larger than chance would produce can be attributed to the intervention. This makes RCTs a powerful tool for establishing causality in research settings.

Components of RCTs:

Random Allocation
Participants, organizations, neighborhoods, communities, provinces, or other entities are randomly assigned to either the intervention group or the control group. This randomization process aims to minimize selection bias and ensure that the groups are comparable in all respects except for their exposure to the intervention.

Intervention Group
This group receives the new intervention or policy change. The goal is to measure the effect of this intervention on the outcome variables.

Control Group
This group does not receive the new intervention or policy change. The control group serves as a baseline to compare the outcomes of the intervention group, helping to isolate the effect of the intervention.
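Random allocation and a baseline balance check can be sketched with Python's standard `random` module. The example makes several assumptions: 2,000 hypothetical neighborhoods, a simulated baseline covariate (for example, median income in thousands), and a simple equal split. Real trials often add stratification or blocking. The function names are illustrative.

```python
import random

def randomize(units, seed=42):
    """Randomly split units into intervention and control groups of (nearly) equal size."""
    rng = random.Random(seed)  # fixed seed so the allocation can be reproduced and audited
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def mean_gap(group_a, group_b, covariate):
    """Difference in the mean of a baseline covariate between two groups."""
    mean_a = sum(covariate[u] for u in group_a) / len(group_a)
    mean_b = sum(covariate[u] for u in group_b) / len(group_b)
    return mean_a - mean_b

# Hypothetical units with a simulated baseline covariate.
data_rng = random.Random(7)
neighborhoods = list(range(2000))
baseline_income = {n: data_rng.gauss(50, 10) for n in neighborhoods}

intervention, control = randomize(neighborhoods)
gap = mean_gap(intervention, control, baseline_income)
print(len(intervention), len(control), round(gap, 2))  # gap should be close to 0
```

Because assignment ignores every unit characteristic, the baseline gap is small for this covariate and, in expectation, for unmeasured ones as well. That property is what makes the control group a credible counterfactual.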

Data Collection and Analysis in RCTs

Randomized Controlled Trials (RCTs) are structured to collect data on the characteristics of both groups before and after the intervention. 

Data Collection

  1. Pre-Intervention Data:
    • Characteristics: Data is collected on the characteristics of both the intervention and control groups at observation point 1 (pre-intervention).
    • Key Outcomes: Specific data points are recorded concerning the key outcomes of interest. This baseline measurement helps establish a clear understanding of the initial conditions.
  2. Post-Intervention Data:
    • Observation Point 2: Data is also gathered at observation point 2, which occurs after the intervention has been implemented in the treatment group.
    • Outcome Measures: The same key outcomes are measured again to assess any changes that may have occurred due to the intervention.

Statistical Analysis

  1. Evaluating Differences:
    • Statistical analysis is conducted to evaluate whether the change between observation points 1 and 2 in the treatment group differs from the change in the control group by more than would be expected by chance.
    • This comparison helps determine if the observed changes are due to the intervention itself or other factors.
  2. Counterfactual Analysis:
    • The control group serves as an effective counterfactual, representing what would happen in the absence of the intervention. By comparing outcomes between these two groups, researchers can infer that any significant differences are likely caused by the intervention.
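The comparison of changes described above can be sketched in Python. The sketch computes each unit's change score, takes the difference in mean change between groups, and asks how often randomly relabeling units would produce a difference at least that large. This permutation test is one standard way to judge "more than chance", used here as an assumed stand-in for whatever test a given study specifies. The data and function names are hypothetical.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def difference_in_mean_change(treat, control):
    """treat, control: lists of (pre, post) pairs.
    Mean change in the treatment group minus mean change in the control group."""
    return (mean([post - pre for pre, post in treat])
            - mean([post - pre for pre, post in control]))

def permutation_p_value(treat, control, n_perm=5000, seed=0):
    """Two-sided permutation test on the difference in mean change."""
    rng = random.Random(seed)
    observed = abs(difference_in_mean_change(treat, control))
    pooled = treat + control  # new list, so the inputs are not reordered
    n_t = len(treat)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(difference_in_mean_change(pooled[:n_t], pooled[n_t:]))
        if diff >= observed - 1e-12:  # small tolerance for floating-point ties
            extreme += 1
    return (extreme + 1) / (n_perm + 1)

# Hypothetical (pre, post) measurements at observation points 1 and 2.
treat = [(10, 7), (12, 8), (11, 8), (13, 10), (9, 6), (12, 9)]
control = [(10, 9), (12, 11), (11, 10), (13, 11), (9, 9), (12, 11)]
print(difference_in_mean_change(treat, control))  # about -2.17
print(permutation_p_value(treat, control))        # small: unlikely under chance alone
```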

Internal Validity and Counterfactual

  1. Internal Validity:
    • RCTs are renowned for their exceptional internal validity. Because assignment is random, confounding variables, whether observed or unobserved, are expected to be balanced across groups, which provides a reliable way to establish causality.
    • High internal validity means that, apart from chance imbalances, observed effects can be attributed to the intervention rather than to other factors.
  2. Counterfactual:
    • The control group acts as an ideal counterfactual because it represents what would happen if no intervention were applied. This allows researchers to isolate and measure the specific impact of the intervention on outcomes.

Challenges in RCTs

  1. High Costs:
    • Conducting RCTs can be resource-intensive, requiring significant financial investment. This is particularly true for large-scale studies involving many participants or communities.
  2. Time Requirements:
    • RCTs often necessitate a substantial amount of time to complete, from initial planning stages to final data analysis and publication. This prolonged duration can make it challenging to implement timely policy changes based on trial results.
  3. Ethical Considerations:
    • Ethical issues arise when random assignment is used, especially in public policy and program evaluations. For instance, assigning communities to different policy interventions without their consent raises concerns about fairness and equity.
    • Ensuring that participants are fully informed about the study and providing them with options for opting out is crucial for maintaining ethical standards.
  4. Balancing Rigor and Practicality:
    • While maintaining high internal validity is essential, it must be balanced with practical considerations such as feasibility, cost constraints, and ethical implications. Researchers often need to adapt their designs to accommodate these challenges while still striving for robust results.

RCTs are a powerful tool for evaluating interventions due to their exceptional internal validity and ability to provide a reliable counterfactual. However, they face significant challenges including high costs, time requirements, and ethical considerations. By acknowledging these limitations and adapting study designs accordingly, researchers can ensure that their findings are both rigorous and practically applicable in real-world settings.

In the realm of policy evaluation, randomized controlled trials (RCTs) are often considered the gold standard due to their ability to establish causation by controlling for confounding variables through random assignment. However, in practical applications, especially in real-world settings, researchers frequently resort to other research designs due to various constraints such as ethical considerations, feasibility, and resource limitations. Each of these alternative designs has its own strengths and weaknesses that must be recognized and addressed.

Common Alternative Research Designs

Quasi-Experimental Designs

  • Strengths: More feasible in real-world settings where random assignment is not possible.
  • Weaknesses: Greater risk of selection bias since participants are not randomly assigned.

Case Studies

  • Strengths: Provide in-depth insights into specific instances or contexts, allowing for a rich understanding of complex phenomena.
  • Weaknesses: Limited generalizability due to the focus on specific cases.