How to increase the external validity of a study in ecology
The degree to which the results of a study can be generalized is an important measure of its quality, but achieving this is tricky in practice.
The external validity of a study is the degree to which results can be generalized. As such, this is a very important feature of experimental and sampling design. High external validity usually means a study is of greater interest to readers (including journal editors), since results are unlikely to be specific to a given case but probably apply more broadly.
The bad news: achieving high external validity pretty much always means more work. This is because your study needs to cover more cases.
Let’s say you’re interested in the effects of microplastic on soils. If you use one soil in your experiment, then your results pertain to this soil only; it is not clear to what degree they can be applied to other situations, so your external validity is pretty limited. To increase external validity, you need to include more soils in your experimental design. If you observe the same or similar patterns in different soils, then your result is robust to this factor and thus more broadly generalizable: voilà, high external validity. Of course, this means you had to include more soils, which is more work. And very likely the results will not be the same among the different soil types, but this is okay because you learned something, namely that the results are sensitive to soil type. Basically, it’s always good to study more cases.
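To make the "same or similar patterns in different soils" idea concrete, here is a minimal sketch. The soil names and all numbers are invented purely for illustration (they are not results from any study), and the sign comparison is only a crude descriptive check; a real analysis would more likely fit a statistical model with a treatment-by-soil interaction term.

```python
from statistics import mean

# Hypothetical measurements, invented for illustration only (arbitrary units).
# Each soil has replicate values for a control and a microplastic treatment.
data = {
    "soil_A": {"control": [10.0, 11.0, 9.0], "microplastic": [8.0, 9.0, 7.0]},
    "soil_B": {"control": [20.0, 21.0, 19.0], "microplastic": [18.0, 19.0, 17.0]},
    "soil_C": {"control": [15.0, 14.0, 16.0], "microplastic": [16.0, 15.0, 17.0]},
}


def effect_by_soil(measurements):
    """Return the treatment effect (treated mean minus control mean) per soil."""
    return {
        soil: mean(groups["microplastic"]) - mean(groups["control"])
        for soil, groups in measurements.items()
    }


def same_direction(effects):
    """True if all non-zero effects share the same sign (a crude robustness check)."""
    signs = {effect > 0 for effect in effects.values() if effect != 0}
    return len(signs) <= 1


effects = effect_by_soil(data)
for soil, effect in effects.items():
    print(f"{soil}: effect = {effect:+.1f}")
print("Consistent direction across soils:", same_direction(effects))
```

With these made-up numbers, two soils show a decrease and one shows an increase, so the effect is not consistent in direction: exactly the "results are sensitive to soil type" outcome described above, which a single-soil experiment could never have revealed.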
But when is it enough? That is the real question, and it is not easy to answer. You clearly can’t do it all, so you will need to pick factors in your experiment that you will represent more broadly. And picking the right factors to vary in your experimental design is an art. Which factors will be important for making general statements? What will most people find interesting? What do you think is most interesting? What is important for driving theory development? What is even logistically feasible? If there are several very important factors in response to these questions (as is often the case; in our example it could also be microplastic type or plant species), then just focus on one or very few for a study. Testing the other factors can be the focus of an entire research program, that is, a series of experiments rather than just an individual experiment.
And of course, it doesn’t stop with picking your factor or factors to represent. Let’s say you picked soil as the element to vary in your experimental design. So, how many instances of soil do you include? Two or three, or 10 or 50? Again, the choice will have immediate consequences in terms of cost and logistics. Sometimes it can help to have a clear vision of the parameter space about which you wish to make a statement in the end. If it’s German soils, then it may be good to include the three most important soil types (for example, for agriculture) in Germany. If in the end you want to say something about Europe, then of course you need to go broader, and so on. Again, you can’t do it all, and you need to make a reasonable choice. As you can imagine, many discussions in our lab are about just these points.
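The cost side of this choice is simple arithmetic, sketched below under some assumptions: a fully crossed (full-factorial) design, a hypothetical two-level microplastic treatment (control versus added), and a hypothetical eight replicates per combination. None of these numbers come from a real experiment; they only show how quickly the number of experimental units grows.

```python
from math import prod


def experimental_units(levels_per_factor, replicates):
    """Number of experimental units in a fully crossed design.

    levels_per_factor maps each factor name to its number of levels;
    replicates is the number of units per factor combination.
    """
    return prod(levels_per_factor.values()) * replicates


REPLICATES = 8  # hypothetical replicate number, chosen for illustration

# Hypothetical designs: control vs. microplastic, crossed with more and more soils.
three_soils = {"microplastic": 2, "soil": 3}
ten_soils = {"microplastic": 2, "soil": 10}
# Adding a second factor to represent broadly (e.g. three plant species).
soils_and_plants = {"microplastic": 2, "soil": 3, "plant_species": 3}

for name, design in [
    ("3 soils", three_soils),
    ("10 soils", ten_soils),
    ("3 soils x 3 plant species", soils_and_plants),
]:
    print(f"{name}: {experimental_units(design, REPLICATES)} experimental units")
```

Under these assumptions, three soils need 48 units, ten soils need 160, and crossing three soils with three plant species already needs 144, which is one way to see why it usually pays to vary only one or very few factors per experiment.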
In observational studies, as opposed to experiments, you can sample across a more diverse space, and similar thoughts apply; only here you have a pretty clear upper limit: global scale. You can sample locally, or across regions, countries, continents, or at the global scale, and the latter will offer the greatest external validity of your results — of course at an enormous cost in terms of money and logistics.
What are your thoughts, especially in terms of experiments? How do you make these kinds of decisions?
Experimental design is an art, and a specialized form of intuition guides the principal in choosing which variables to select. But design exists within a practical continuum of costs, editorial tastes, current research trends... that entire cumbersome sociology of big science and small. Large-scope experiments are exercises in almost military logistics; on the other hand, narrow-scope experiments are not fairly valued in the marketplace, but they're the bread and butter of scientific progress and are much easier to prosecute. There's Wagnerian Hero Science, and Drudgery Science. All hats off to the latter!
Absolutely! As a PhD student, I sometimes struggle to choose which factors to test, and I am even afraid of being questioned if I choose factors without a good reason. When I get comments from reviewers, the most common question is why I chose these factors rather than those. Most of the time, I just want to say that I simply like these factors, or that they are what my lab can support. But I cannot, because the reviewers want to know the scientific reason.