Reading scientific articles and evaluating them is a skill in and of itself. Many people in the sciences can take this skill for granted, and even in writing my own evaluations of medical studies, I can forget that not everyone knows every term. Even more important than knowing the terms is knowing the significance of the terms.
In this ongoing piece I’ll explain key terms used in scientific studies and medical trials, and also try to explain why each term matters. Many of these terms are “flags.” Not necessarily red flags, but definitely signifiers of whether a study is sound. A rigorous study need not mention each of these terms or use the practices related to them. But their mention and use do help a reader judge how reliable the study is.
The following is a short list of methods used by clinicians and scientists to help shape a reliable study. In the main, scientific studies want to accomplish three basic things, regardless of the outcome:
1. Repeatable Process: a truly rigorous study is conducted and written up in such a way that any team of scientists can read the report and repeat the study with (largely) the same results.
2. Reduction of Variables: in an ideal study, scientists have only one variable–the test variable. That means that scientists do everything they can to reduce differences between the two groups being studied.
3. Results, not Conclusions: studies only test for outcomes, not for conclusions. Like reducing variables, a quality study will only test for one or possibly two results, not overarching conclusions.
An example of all these principles in action would be a study with 100 participants who are all identical in every measurable way and who all perform the same tasks for 90 days. But one group of fifty takes aspirin twice a day, and the other group of fifty takes a placebo.
The study only tests before, during, and after for changes in blood pressure. Any other team with the resources could repeat this process.
In this study, the scientists may take other measurements, but only to isolate whether the change in blood pressure was aspirin-related or due to some other cause. The aspirin group may happen to ingest far less caffeine than the placebo group, which would change their blood pressure. Blood or saliva tests help the scientists reduce variables.
At the end of the study, the scientists don’t test for weight loss, for overall health, for happiness, or memory. They only test for blood pressure. They may, in their Discussion section, suggest correlations to other factors and results–and then suggest further study. But they don’t make blanket conclusions, like “We proved aspirin lowers blood pressure.” And they also focus on their one result–whether the aspirin did or didn’t change the blood pressure.
Now let’s explore some practices rigorous studies employ to reach repeatable, reduced variable, results-oriented outcomes.
Peer-review is perhaps the most important practice for any good study, and the one that enforces all the other practices, terms, and concepts we’ll discuss below. Peer-review comes after the study is concluded, but before it is published. Studies published without peer-review are questionable, when not outright false.
The peer-review process is slightly different for every journal, but it usually goes like this: a team of scientists submits their written study, as well as all other testing data, materials, and results. A panel of scientists both in that field and outside that field reviews all the documents to ensure, among other things, that the study followed best scientific practices. If it’s all on the up-and-up, the paper is published.
Sometimes the peers may require testing to see if the process is repeatable. They may ask the original team to rewrite their results to be less overarching. Or they may reject the entire study as not rigorous enough.
People outside of science sometimes have the wrong idea about peer-review, that it’s just a club of scientists all glad-handing one another. But that’s not at all the case. Scientists are competitors, competitors against each other. They compete for research funding, for prestige, for being the first person to find a result or test a new hypothesis.
If something survives peer-review it’s akin to an athlete getting perfect marks from the athletes she or he is competing against. Imagine how many fewer goals would be scored in a World Cup if that were the process. That should give you an idea how hard it is to get a paper published.
Many people know that a placebo is usually a sugar pill. But not everyone really knows why. One of the main reasons is something termed the placebo effect. I’ll have a separate entry for that later in this article. For now, it’s important to know that scientists are aware that simply getting a pill in a study, any pill, can affect the results.
This is a manner of reducing variables. It might sound odd, but if there are 100 people in the aspirin study, and fifty people don’t get a pill of any kind, then there’s a variable that is different, in and of itself, between the groups. Giving both groups a pill, even one that’s fake, reduces a variable.
Blinded and Double-Blinded
Related to the placebo effect, blinding and double blinding in a study help reduce variables. For ethical purposes, in general, participants must know every parameter of the study they’re in. This includes what the variable is (aspirin, in our example) and the outcome (blood pressure).
But after they’re made aware of these parameters, they sign a waiver allowing themselves to be “blinded.” This means the researchers will not tell the participants whether they’re getting that variable or a placebo. Going one step further is double-blinding. This is when the person handing the participant the pill doesn’t know if it’s an aspirin or a placebo.
This matters because if the doctor or nurse handing a participant the placebo acts any differently when they hand the person their pill, the person may pick up on that. If the participant knows they’re in a blood pressure study, and they know they’re getting a placebo that won’t help their blood pressure, they may get anxious or angry. Which will, of course, affect their blood pressure.
A third layer, called triple blinding, is when the analysts studying the results don’t even know which group is placebo and which is the variable. This type of study is much more difficult to perform, and far less common. But it is the most rigorous, and the analysis the most reliable.
A randomized study will start with a population of participants who are as similar as possible. If there is a target participant (people over 40, or left-handed people) then the scientists look for as many people who meet the criteria as possible. Then, just before the study, they will assign the people randomly to the test group and the placebo group.
This also reduces variables and helps the process with repeatability. We learn more about the human body every day, and there’s almost no way to know for sure what similarities might be accidentally assigned to each group if it were done by human selection.
As an example from our aspirin illustration, if scientists look at each participant and decide whether they’ll be in the placebo or test group, they may unconsciously decide based on eye color. Then they have a situation where all the aspirin people have brown eyes and all the placebo people have blue or green. The aspirin group ends up with lower blood pressure, but then ten years later a study finds out that brown-eyed people have naturally lower blood pressure.
This means the aspirin study is now useless. Randomization helps eliminate this possibility, and helps the researchers focus on their results.
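To make the idea concrete, here is a minimal sketch of random assignment for the 100-person aspirin example. The participant IDs, the `randomize` function, and the seed are all made up for illustration; real trials use dedicated randomization procedures, so treat this as a picture of the principle rather than how any particular study does it.

```python
import random

def randomize(participants, seed=None):
    """Randomly split participants into two groups of (near-)equal size."""
    rng = random.Random(seed)      # a fixed seed makes the assignment repeatable
    shuffled = list(participants)  # copy, so the original list is left untouched
    rng.shuffle(shuffled)          # chance, not a human, decides the order
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical participant IDs: P001 ... P100
participants = [f"P{i:03d}" for i in range(1, 101)]
aspirin_group, placebo_group = randomize(participants, seed=42)

print(len(aspirin_group), len(placebo_group))  # prints: 50 50
```

Because the shuffle knows nothing about eye color, age, or anything else, no trait can systematically steer people into one group. Recording the seed also means another team can reproduce the exact same assignment.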
I’ll spend less time on longitudinal studies than on the other concepts, because the idea is more self-explanatory. A longitudinal study, as its name suggests, takes place over a “long” time. Typically speaking, the longer the study, the more reliable the analysis. If doctors observe a population over a year they are able to draw some soft correlations based on outcomes. But if they study those same people over 50 years, they’re able to draw much stronger correlations.
Test-Variable versus Observation
These two concepts are sometimes discussed as opposing ideas, but that’s not always the case. They both have their uses and neither one is, on the face, “better” than the other.
In a test-variable study, there is a hypothesis and a variable. Scientists might hypothesize that aspirin lowers blood pressure. So they test it against no aspirin. Then they have results.
An observation study might sound like an odd category–aren’t all studies observed? But there is a distinction: in an observation study, the researchers don’t tell the participants to do anything differently. These studies are typically also longitudinal, but not always. In one example, an observation study might look at how people move through an intersection. In this study, researchers discreetly observe several intersections in a city and report their results.
Pros, Cons, and Overlaps
Both kinds of study have their uses. If a team wants to know the effects of Ashwagandha root, the only reliable data will come from giving one group of people the herb and another group a placebo. But if scientists want to know if there are any lifestyle similarities between healthy and unhealthy people, they can’t ask the people to do anything differently. They simply have to observe large numbers of people for long periods of time and then look for patterns.
One drawback of observational studies is the length of time and number of people needed for good data. (More on that under statistical significance.) One drawback of the variable trial is that it’s usually limited to several weeks of time–what happens after the trial? Did the results stick? Was there an unknown variable at play that couldn’t be isolated in such a short period?
These drawbacks are one reason the most rigorous studies overlap with one another. With a lot of funding, a team of researchers can test a variable and observe the results over years of time. I reference one such study in my Ginseng for Memory article. These studies are preciously rare, and their data are invaluable.
In Vitro versus In Vivo
Typically, these two terms are written in italics, to denote their Latin origins. In brief, in vitro refers to any test done on tissue or with chemicals in a Petri dish (in vitro literally means ‘in glass’). In vivo means the variable was introduced in a living body–animal or human.
The differences in results between the two can be quite stark. A drug or medication to reduce muscle swelling can work perfectly in vitro, when the muscle tissue is isolated. But then when the medication is given to a live person, it does nothing.
Various factors are involved, and failure in one or the other isn’t always failure of the treatment. It just means further research is needed. In general, though, for human treatments we’re always looking for in vivo human studies.
Human versus Animal
I won’t spend a lot of time here, but it is important to look in a study at whether they are testing a medication on animals (usually lab mice or rats) or on human participants. This isn’t to say that animal testing doesn’t bear at all on human efficacy, only that human trials are far more reliable.
This mostly has to do with human variables themselves. Person A and Person B differ from one another far more than Mouse A and Mouse B do. In addition to that, humans are simply different animals. No matter how similar our biology, a treatment in a mouse does not equal a treatment in a human.
So far we’ve focused on things scientists do. Now we’ll look at some terms and concepts that researchers use and discuss. These terms are important because they can give us clues as to how transparent the researchers are with their results, and whether we can fully trust and rely on their analysis.
This does not mean, except in cases where I mention it, that a study must use these terms or deal in these concepts. They are, however, good indicators to guide our readings.
The placebo effect has existed in medicine for almost as long as humans have treated other humans for illness. It’s also referred to as “mind over matter,” and “psychosomatic” (somatic means ‘of the body’). The principle is simple: the human mind can convince the body that it’s receiving medicine.
Simple, but unbelievable to the point of near-magic. Hard to believe as it is, the effect is well-documented and has even been tested in its own right. People receiving a sugar pill will often experience some form of relief from their “fake” treatment.
Even in our everyday lives we see the placebo effect. We all know someone who drinks a certain tea or coffee drink, takes a certain pill, or performs some other health ritual. And when they don’t do that thing, drink that beverage, look out. The effects go far beyond the particular caffeine amount or pill. It speaks to how our brains react, neurobiologically, to ritual and to believing the thing we’re taking is helping.
The placebo effect is taken into account in all rigorous studies. Which leads us directly into statistical significance.
Even in writing my own reviews of scientific studies I find myself using the phrase “statistical significance” an awful lot. But there’s a reason for that, and there’s really no other term that means quite the same thing. (And don’t worry, I’ll keep the actual math off of this page.)
Statistics get a bad rap, but that’s not a fault with the math discipline itself–it’s because of people misusing statistics. In this case, scientists and doctors aren’t using stats to tell us their study proves anything, only that their results weren’t an accident.
Statistical significance, in essence, is a measure of how likely it is that results like these would turn up by chance alone, even if the variable did nothing. If you’d like to get into the weeds (I have, but I enjoy statistics), significance is measured by something called the P-value: the probability of seeing a result at least this extreme if the variable had no effect.
The cutoffs are conventions, and there is still debate on what constitutes true significance. Classical thought is that any P-value lower than 0.05 is significant. Some researchers have taken to relying on stricter thresholds, with P-values closer to 0.001.
While statistical significance is important, it isn’t the Law of the Universe. Reaching statistical significance is an important threshold, but there are other measures to look at: standard deviations from the mean, regressions, and confidence versus compatibility. What we can take away from this is that statistical significance is the minimum we look for in a study’s results.
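For readers who do want to get into the weeds, here is one rough way to see where a P-value comes from. This sketch uses a permutation test, which is only one of several ways to get a P-value and not necessarily what any given study uses. The blood pressure numbers are invented for illustration, and `permutation_p_value` is a hypothetical helper, not part of any library.

```python
import random

def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate a two-sided P-value for the difference between two group means."""
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Pretend the labels "aspirin" and "placebo" mean nothing: reshuffle them.
        rng.shuffle(pooled)
        mean_a = sum(pooled[:n_a]) / n_a
        mean_b = sum(pooled[n_a:]) / (len(pooled) - n_a)
        if abs(mean_a - mean_b) >= observed:
            at_least_as_extreme += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)

# Invented data: change in systolic blood pressure (mmHg) after 90 days.
aspirin_change = [-8, -5, -7, -3, -6, -9, -4, -7, -5, -6]
placebo_change = [-1, 0, -2, 1, -3, 0, -1, 2, -2, -1]

p_value = permutation_p_value(aspirin_change, placebo_change)
print(f"P-value: {p_value:.4f}")
print("significant at 0.05" if p_value < 0.05 else "not significant at 0.05")
```

The logic is the definition in miniature: if the pill made no difference, shuffling who is labeled “aspirin” should produce gaps as big as the real one fairly often. When that almost never happens, the P-value is small and the result is unlikely to be an accident.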
Correlation versus Causation
Most of us probably use these terms interchangeably–but not scientists! I’ll keep this entry brief, but it’s an important note to make. If we read a report that suggests or highlights “correlation” between a treatment variable and an outcome, that doesn’t mean the variable caused the outcome.
In science, “correlation” only means that two things tend to show up together. For instance, if I observe that every time it rains I am wearing fuzzy socks, that doesn’t mean that my fuzzy socks are causing it to rain. This is a simplified example, but it’s apropos because a cursory glance at evidence can lead to startlingly bad conclusions.
Rigorous research will point out where its observations are clearly causal (thing A caused effect B), or where they may be simply correlated (thing A and effect B happened, but we don’t know why).
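The fuzzy socks example can be played out in a small simulation. Everything here is an assumption made for illustration: I’ve added cold weather as a hidden third factor that makes both rain and fuzzy socks more likely, and all the probabilities are invented. The `pearson` helper computes the standard Pearson correlation coefficient by hand.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length number lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(1)
days = []
for _ in range(10_000):
    cold = rng.random() < 0.5                     # hidden factor (assumed)
    rain = rng.random() < (0.7 if cold else 0.2)  # rain depends only on cold
    socks = rng.random() < (0.9 if cold else 0.1) # socks depend only on cold
    days.append((cold, rain, socks))

# Across all days, socks and rain move together.
overall_r = pearson([int(s) for _, _, s in days], [int(r) for _, r, _ in days])

# Look only at cold days, and the link between socks and rain disappears.
cold_days = [d for d in days if d[0]]
cold_only_r = pearson([int(s) for _, _, s in cold_days],
                      [int(r) for _, r, _ in cold_days])

print(f"all days:       r = {overall_r:.2f}")
print(f"cold days only: r = {cold_only_r:.2f}")
```

In this made-up world the socks never influence the rain, yet across all days the two are clearly positively correlated. Once the hidden factor is held fixed, the correlation falls to roughly zero. That is the kind of check researchers need before calling anything causal.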
Bias and Conflicts of Interest
Much is made in scientific circles about bias and conflicts of interest. A proper study will always discuss its own risks in these areas.
One well-known example is when doctors in the mid-20th century concluded that cigarettes were neither addictive nor harmful. Then we all found out that those doctors were on tobacco company payrolls. That’s a clear conflict of interest.
Another, far more common conflict is funding. Perhaps the doctors in a trial aren’t being directly paid by a company, but the entire wing of their university was funded by the pharmaceutical company whose medications they’re testing.
Bias can take many forms, too, and that’s another reason why peer-review is so important. A clear example of bias is a researcher who comes from a culture that despises herbal medicine. They’re not being paid to have an opinion, but are they really looking at their data objectively? When a study is submitted for peer-review, obvious and even non-obvious biases might be teased out, and the study rejected for publication.
This article will be updated based on questions and comments from readers, as well as by new science. As new studies come out, new techniques are developed. When that happens, I’ll research them and try to put them in plain language so everyone can understand them. Please leave comments and questions below for any terms you’d like to see defined.