A physician leader should understand the major components of construct, internal, and external validity not only for their own leadership research, but also for research and data that they consume in their roles as leaders.
Many years ago, during my research training in healthcare quality and outcomes, a mentor asked me, “How come every time one of you quality guys puts up a slide, the trend line can go up or down over time, and you describe both outcomes as good?” At first, I laughed at the question, but when he didn’t let up, I realized he was serious and he was asking a legitimate question in need of a response.
In outcomes research, there are important safeguards to defining outcomes. Outcomes must be objective, easy to define and measure, and free from bias. This explains why, for my chosen specialty of critical care medicine, mortality is often designated as the dependent variable of interest. Mortality is objective; you’re either dead or not. It is easy to define and measure, because as physicians we are trained to identify and diagnose death. Because of that objectivity and ease of definition and measurement, it is essentially free from bias.
The discipline of outcomes research has expanded over time to include additional outcomes like quality of life. It is recognized that these broader outcomes may be less objective, more difficult to define and measure, and include several biases that threaten the validity of the research effort when measuring them. This does not mean that we should abandon the study of quality of life, but we should be aware of the threats to validity in our research by appropriately identifying them and controlling for them through statistical testing and analysis.
Important outcomes in leadership can be and have been defined. Over time, there has been an increasing effort to apply objective measures of leadership to gauge the performance of leaders themselves, the outcomes of the organizations and teams they lead, and the impact on patients.
Some of these outcomes, just like with the study of quality of life, may be less objective, harder to define and measure, and subject to important threats to validity that may cause bias, mislead, or compromise the results. Hence, leaders, like researchers, need to know what the common threats to validity are, how they can be measured and controlled for, and, most importantly, how they may be operative in subtle ways in the data we present.
Construct validity is an important consideration that gets to the heart of many leadership decisions. Construct validity in research reflects how well the chosen independent and dependent variables represent the concepts they are meant to capture.
Mortality, the outcome described above, is a dependent variable. Treatment options that are studied to determine whether they reduce mortality are independent variables. They also need to be clearly defined, “pure” in terms of their effect on the outcome, and free from interference by other factors that may influence the outcome under study.
Several objective and summative measures have been proposed to reflect the outcomes of leadership. We can think of these as dependent variables. Examples include Leapfrog scores, HCAHPS measures, US News and World Report rankings, and Glassdoor approval ratings, to name a few.
Leaders need to be fluent in how these measures are defined, measured, and biased so that they can appropriately frame their results, not only in the context of their performance, but also in terms of the measure’s validity. Does this measure what it is intended to measure, or are there alternative explanations that need to be considered as one interprets the findings?
Internal validity helps to ensure that the relationship between the variables is believable. While other effects may interfere with the relationship, in studies with good internal validity, these effects have been carefully considered, addressed, and controlled for effectively.
For example, the early administration of antibacterial agents in sepsis with shock is associated with a reduction in mortality from sepsis. To determine that a relationship between antibacterial timing and sepsis mortality exists, investigators needed to isolate the timing of antibacterial administration from other potentially common and confounding contributors to survival, such as antibacterial agent choice, vasopressor support, and mechanical ventilation.
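To make the idea of isolating one factor from a confounder concrete, here is a minimal sketch in Python. The counts are invented purely for illustration and do not come from any sepsis study; the stratum labels and the function names (`mortality`, `crude_mortality`, `stratum_mortality`) are likewise assumptions made for this example. It compares a crude mortality gap between late and early treatment with the gap seen within each stratum of a single confounder, here vasopressor support.

```python
# Hypothetical counts, invented only for illustration: (patients, deaths)
# keyed by (confounder stratum, timing of antibacterial administration).
counts = {
    ("no_vasopressors", "early"): (80, 8),
    ("no_vasopressors", "late"): (20, 3),
    ("vasopressors", "early"): (20, 8),
    ("vasopressors", "late"): (80, 40),
}


def mortality(cells):
    """Pooled mortality (deaths / patients) across a list of (patients, deaths) cells."""
    patients = sum(p for p, _ in cells)
    deaths = sum(d for _, d in cells)
    return deaths / patients


def crude_mortality(timing):
    """Mortality for one timing group, ignoring the confounder entirely."""
    return mortality([cell for (_, t), cell in counts.items() if t == timing])


def stratum_mortality(stratum, timing):
    """Mortality for one timing group within a single confounder stratum."""
    return mortality([counts[(stratum, timing)]])


# Crude comparison: late minus early, all patients pooled together.
crude_gap = crude_mortality("late") - crude_mortality("early")

# Stratified comparison: the same gap, computed within each stratum.
stratum_gaps = {
    stratum: stratum_mortality(stratum, "late") - stratum_mortality(stratum, "early")
    for stratum in ("no_vasopressors", "vasopressors")
}

print(f"crude gap: {crude_gap:.2f}")
for stratum, gap in stratum_gaps.items():
    print(f"{stratum} gap: {gap:.2f}")
```

In these made-up numbers, sicker patients are more likely to be treated late, so the pooled comparison overstates the gap seen within either stratum. Stratification is only one simple way to address a confounder; real studies typically rely on multivariable adjustment or randomization.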
In studies of leadership, we are already at a disadvantage for internal validity as compared to clinical studies. Experimental research designs tend to score well on internal validity, yet this type of study design is rarely used in studies on leadership. Nonexperimental designs like quasi-experimental and correlational research studies do not perform as well as experimental studies on internal validity but are useful in leadership effectiveness studies. Case studies are the poorest performing designs with respect to internal validity but are necessary to learn about potential topical opportunities for study with more rigorous methods.
It is important to recognize several threats to internal validity in leadership studies; many of them have to do with how subjects are selected for study. This involves not only inclusion criteria but also other important effects such as assignment of subjects to study groups, loss to follow-up, maturation of the subject over time, and regression to the mean, to name a few.
Consider, for example, the use of employee satisfaction scores, a commonly used measure in healthcare leadership. We can certainly see the relationship between leadership and employee satisfaction, but what matters is whose satisfaction is measured. If some of the most disenfranchised employees leave the organization, they may be lost to follow-up, and the scores may be artificially inflated because their data are not included.
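The arithmetic behind this inflation can be sketched in a few lines of Python. The scores, the 1-to-5 scale, and the `leave_threshold` parameter are hypothetical values chosen for illustration, not data from any survey. The sketch assumes that nobody's opinion changes between surveys and that only the least satisfied employees leave.

```python
from statistics import mean

# Hypothetical satisfaction scores on a 1-5 scale for ten employees.
baseline_scores = [1, 2, 2, 3, 3, 4, 4, 4, 5, 5]


def mean_after_attrition(scores, leave_threshold):
    """Mean score among employees who remain after everyone scoring
    at or below leave_threshold has left the organization."""
    stayers = [s for s in scores if s > leave_threshold]
    return mean(stayers)


baseline_mean = mean(baseline_scores)

# Assume employees scoring 1 or 2 leave before the next survey,
# and that no remaining employee changes their answer.
followup_mean = mean_after_attrition(baseline_scores, leave_threshold=2)

print(f"baseline mean:  {baseline_mean:.2f}")
print(f"follow-up mean: {followup_mean:.2f}")
```

The follow-up mean is higher than the baseline mean even though no individual became more satisfied; the change comes entirely from who is left to answer the survey.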
Short-term and long-term employees may have different levels of employee satisfaction as they mature in the organization. Even more importantly, employees may simply accommodate to the culture of the organization and become inliers rather than outliers in the measurement. This is especially true during repeated testing, when the same employee satisfaction surveys are used year after year with the same questions. Hence, knowing what to look for from the perspective of internal validity matters in real terms related to leader performance.
External validity is the ability to extrapolate the study’s findings to other people, settings, or institutions. In research studies, a distinction is often made between in vitro and in vivo experimentation. The results obtained under study in the laboratory may not be seen to the same degree in real-life circumstances. A common example of this is drug trials. Participation in a trial is bound by stringent criteria, guidelines, rules, and regulations when compared to the use of drugs in the broader population.
The findings of a study in leadership in a particular institution may not directly translate to other institutions, even if they are similar. For example, mentorship in leadership is broadly accepted as positive and valuable. The literature has several very robust examples of mentorship programs that have achieved substantial outcomes for both mentors and mentees.
However, mentorship programs do not directly translate from one organization to another. There may be subtle and unmeasured characteristics of the studied organization that contributed to the mentorship program’s success and that are not uniformly present in other organizations that want to implement these programs. This is where the threats to external validity become important.
The major threats to external validity include biases in the subjects, the researchers, or the setting. Using the example of studying a mentorship program, subjects in the study institution may have a selection bias for participation. Participants may be motivated by the prestige of participation and its visibility or be compensated either directly through additional pay or indirectly through the reduction in clinical time for the additional work that participation in the program brings.
In addition, there is the well-known “Hawthorne Effect,” in which participants modify their behavior by virtue of being observed and studied. From the researchers’ perspective, if researchers are not only observers but also teachers in the program, they may unintentionally cause bias in the participants through the way they present information. In addition, a common method of measuring outcomes uses a pre-post test design that can bias the findings because participants know what is coming and are less anxious about the post-test than they were about the pre-test.
Finally, the setting matters. Infrastructure, time of day, the importance of the mentorship program to the organization, the presence of other mentorship programs, and university support all matter in terms of the results of a program. Extrapolation from the program in which the study was performed to other institutions may depend heavily on this infrastructure support.
As physician leadership continues to mature as a profession, rigorous studies that demonstrate the benefits physicians bring to teams, patients, and organizations matter. A physician leader should understand the major components of construct, internal, and external validity not only for their own leadership research, but also for research and data that they consume in their roles as leaders.