Evidence-Based Medicine Part 1: How to Interpret an RCT
Dr Swapnil Pawar
3 Key areas –
Randomization eliminates selection bias.
It balances intervention and control arms regarding prognostic variables (known and unknown).
Randomization also provides a basis for statistical tests.
Simple randomization is equivalent to tossing a coin for each subject in the trial, e.g., heads = intervention, tails = placebo. In practice, a random number generator is generally used. The method is simple and easy to implement, and treatment assignment is totally unpredictable. However, imbalances between the arms may occur, particularly in smaller trials.
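The coin-toss scheme described above can be sketched in a few lines of Python. This is only an illustration: the function name, arm labels, and seed are assumptions made for the example, and a real trial would use a validated randomization service rather than an ad hoc script.

```python
import random

def simple_randomize(n_participants, seed=None):
    """Assign each participant independently, like tossing a fair coin."""
    rng = random.Random(seed)  # seeded generator so the list can be reproduced
    return [rng.choice(["intervention", "placebo"]) for _ in range(n_participants)]

# Hypothetical small trial of 20 participants
allocations = simple_randomize(20, seed=1)
n_intervention = allocations.count("intervention")
n_placebo = len(allocations) - n_intervention
# In a small trial the two counts are often unequal
print(f"intervention: {n_intervention}, placebo: {n_placebo}")
```

Because every assignment is independent, nothing forces the two counts to match, which is the imbalance problem noted for smaller trials.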
Simple randomization does not ensure that the arms stay balanced in numbers throughout the trial. This matters because, for instance, severity of illness may vary during the course of a trial, with sicker patients enrolled in the early phase. In block randomization, patients are therefore divided into blocks; a block may contain, for instance, twice the number of study arms (2 × 2 = 4 patients for a two-arm trial). Within each block of 4 patients, 2 are randomly allocated to one arm and 2 to the other. This ensures a balance of patients between the arms throughout the course of the trial.
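A minimal sketch of blocks of 4 for a two-arm trial is shown below. The function and parameter names are hypothetical, chosen only for illustration.

```python
import random

def block_randomize(n_participants, arms=("intervention", "control"),
                    per_arm_in_block=2, seed=None):
    """Allocate participants in shuffled blocks that are balanced across arms."""
    rng = random.Random(seed)
    allocations = []
    while len(allocations) < n_participants:
        # 2 arms x 2 slots each = a block of 4
        block = list(arms) * per_arm_in_block
        rng.shuffle(block)  # random order within the block
        allocations.extend(block)
    # If recruitment stops mid-block, the last block may be incomplete
    return allocations[:n_participants]

allocations = block_randomize(12, seed=7)
for start in range(0, len(allocations), 4):
    print(allocations[start:start + 4])
```

Every completed block contains exactly two participants per arm, so the arms can never differ by more than two at any point during recruitment.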
Stratified randomization is carried out to reduce imbalance between the arms in important subgroups. For instance, patients may be stratified by age group, gender, premorbid illness, etc., and randomized within each stratum so that each stratum contributes similar numbers of patients to each arm. If stratified randomization is not carried out, a particular subgroup of patients may end up predominantly in the intervention arm or predominantly in the control arm, leading to bias.
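One common way to implement stratification is to keep a separate blocked allocation list for each stratum. The sketch below assumes that approach, with a hypothetical age-based stratum label and blocks of 4; it is not the only way to stratify.

```python
import random
from collections import defaultdict

def stratified_randomize(participants, seed=None):
    """participants: iterable of (participant_id, stratum) pairs.

    Each stratum draws from its own shuffled block of 4, so the arms
    stay balanced within every stratum.
    """
    rng = random.Random(seed)
    pending = defaultdict(list)  # stratum -> unused slots of its current block
    allocation = {}
    for participant_id, stratum in participants:
        if not pending[stratum]:
            block = ["intervention", "control"] * 2
            rng.shuffle(block)
            pending[stratum] = block
        allocation[participant_id] = pending[stratum].pop()
    return allocation

# Hypothetical enrolment order: alternating age strata
participants = [(i, "age<65" if i % 2 else "age>=65") for i in range(16)]
allocation = stratified_randomize(participants, seed=0)
for stratum in ("age<65", "age>=65"):
    arms = [allocation[pid] for pid, s in participants if s == stratum]
    print(stratum, arms.count("intervention"), arms.count("control"))
```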
Randomization need not always be 1:1; it could be 2:1. Unequal randomization may be carried out to reduce cost by enrolling fewer patients in one arm; it may also help evaluate the safety profile of an experimental intervention by placing more patients in that arm. Besides, patients may be more willing to participate in a trial if their likelihood of being allocated to the study treatment is higher.
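A 2:1 ratio can be obtained, for example, by shuffling blocks of 3 that contain two intervention slots and one control slot. The names below are illustrative assumptions.

```python
import random

def unequal_block_randomize(n_blocks, ratio=(2, 1), seed=None):
    """Allocate in shuffled blocks holding ratio[0] intervention and ratio[1] control slots."""
    rng = random.Random(seed)
    allocations = []
    for _ in range(n_blocks):
        block = ["intervention"] * ratio[0] + ["control"] * ratio[1]
        rng.shuffle(block)
        allocations.extend(block)
    return allocations

allocations = unequal_block_randomize(10, seed=2)
print(allocations.count("intervention"), allocations.count("control"))
```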
Allocation concealment means that neither care providers, investigators, nor participants know whether the next eligible participant will receive the treatment or be in the control group. It ensures that the decision whether or not to enroll a participant is not influenced by knowledge of the upcoming assignment (e.g., the investigator decides not to enroll a patient to the study treatment, knowing that the outcome is likely to be poor). This is particularly important when blinding of the arms is not possible.
The validity of an RCT relies on both arms being treated in exactly the same way, apart from the intervention itself. There should be no known or unknown advantage to one arm or the other. If the investigators or participants are aware of who is getting what, information bias may occur. Blinding the participants (single blind) or both the investigators and the participants (double blind) helps to eliminate information bias. Whenever possible, blinding should be used in an RCT. However, many studies cannot be blinded for practical reasons; for instance, a study of tracheostomy vs. no tracheostomy.
The sample size calculation starts from the minimum clinically important difference: the smallest difference between the two groups that the trial must be able to detect and that would convince users of the information to adopt the intervention. First, one needs to know the baseline estimate of the outcome rate in the placebo/control arm, i.e., how many patients are expected to benefit from the control intervention. Second, it is important to have an estimate of the percentage of patients expected to benefit from the intervention. This number is usually derived from previous experiments or observations, previous trials, or consensus opinion.
This is where one needs to be careful not to overestimate the benefit (as this yields a smaller calculated sample size, and the trial may then be underpowered to detect the true, smaller effect) or to underestimate the benefit (as one may end up experimenting on more patients than necessary).
Type II (beta) error: false negative. The null hypothesis is accepted when it is not true.
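To see how the baseline rate, the expected benefit, alpha, and beta combine into a sample size, here is a rough sketch using the common normal-approximation formula for comparing two proportions. The formula choice, function name, and the 30%, 20%, and 25% event rates are assumptions for illustration only; a real trial should rely on a statistician and dedicated software.

```python
import math
from statistics import NormalDist

def sample_size_per_arm(p_control, p_intervention, alpha=0.05, power=0.80):
    """Approximate patients needed per arm for a two-sided test of two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for type I error
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - beta (type II error)
    variance = p_control * (1 - p_control) + p_intervention * (1 - p_intervention)
    delta = p_control - p_intervention             # difference the trial must detect
    return math.ceil((z_alpha + z_beta) ** 2 * variance / delta ** 2)

# Hypothetical control event rate of 30%
optimistic = sample_size_per_arm(0.30, 0.20)  # assumes a 10-point absolute benefit
realistic = sample_size_per_arm(0.30, 0.25)   # assumes a 5-point absolute benefit
print(optimistic, realistic)
```

Halving the assumed benefit roughly quadruples the required sample size, which is why an overoptimistic estimate of benefit leaves a trial underpowered.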
Different methods of analysis may lead to different results. Reporting per-protocol analysis rather than intention-to-treat analysis often results in overestimation of the effect of the intervention. It is important to keep protocol violations to a minimum.
Statistical significance is assessed with the P value: the probability of observing a difference between groups at least as large as the one seen if there were truly no difference, that is, by chance alone. If the P value is higher than the chosen alpha level (e.g., .05), the observed difference is assumed to be due to sampling variability. With a sufficiently large sample, a statistical test will almost always demonstrate a significant difference unless there is no effect whatsoever, that is, unless the effect size is exactly zero; however, small differences, even though statistically significant, are often meaningless in clinical practice. Thus, reporting only P values is inadequate to fully understand the results.
For example, with a large sample size of 10 000, a significant P value may be obtained even if the difference between groups is negligible. It is also important to remember that the level of significance does not predict effect size; for instance, a lower P value does not indicate a larger effect size. Unlike significance tests, effect size is independent of sample size. For this reason, P values are considered to be confounded, as they are sample size-dependent. A statistically significant result may only mean that an oversized sample was used.
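The dependence of the P value on sample size can be illustrated with a standard two-proportion z-test. In this sketch the event counts are invented for illustration: both hypothetical trials have the same 1 percentage point difference (10% vs. 11%), and only the number of patients changes.

```python
import math
from statistics import NormalDist

def two_proportion_p_value(events_a, n_a, events_b, n_b):
    """Two-sided P value from a pooled two-proportion z-test."""
    p_a, p_b = events_a / n_a, events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Same effect size (10% vs 11%), very different sample sizes
p_small = two_proportion_p_value(20, 200, 22, 200)            # 200 per arm
p_large = two_proportion_p_value(5000, 50000, 5500, 50000)    # 50 000 per arm
print(f"small trial P = {p_small:.3f}, large trial P = {p_large:.2e}")
```

The small trial is far from significant while the large one is highly significant, even though the effect size is identical in both.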
Let us consider the study of aspirin to prevent myocardial infarction (MI). More than 22 000 subjects were studied for an average duration of 5 years. Aspirin was shown to be associated with a reduction in MI with a high level of significance (P < .00001). However, cardiovascular mortality was not different. Aspirin was recommended for general prevention based on this study. Yet the effect size was very small: a risk difference of only 0.77%. Many subjects were advised to take aspirin although they were unlikely to benefit and were exposed to the risk of adverse effects. Later studies found even smaller effects, leading to modification of the recommendation to use aspirin.
Effect size and confidence interval
It is more meaningful to report the effect size (and its 95% confidence interval) than P values alone. The effect size provides more direct information about the magnitude of the effect. Effect size may be expressed as a risk difference, risk ratio, odds ratio, or correlation coefficient.
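As a sketch of how these measures are obtained from a 2 × 2 table, the function below computes the risk difference (with a simple Wald 95% confidence interval), the risk ratio, and the odds ratio. The function name and the example counts (20/100 events vs. 30/100 events) are hypothetical.

```python
import math
from statistics import NormalDist

def effect_sizes(events_tx, n_tx, events_ctl, n_ctl, confidence=0.95):
    """Risk difference (with Wald CI), risk ratio, and odds ratio from a 2 x 2 table."""
    risk_tx = events_tx / n_tx
    risk_ctl = events_ctl / n_ctl
    risk_difference = risk_tx - risk_ctl
    risk_ratio = risk_tx / risk_ctl
    odds_ratio = (events_tx * (n_ctl - events_ctl)) / (events_ctl * (n_tx - events_tx))
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = math.sqrt(risk_tx * (1 - risk_tx) / n_tx + risk_ctl * (1 - risk_ctl) / n_ctl)
    return {
        "risk_difference": risk_difference,
        "risk_difference_ci": (risk_difference - z * se, risk_difference + z * se),
        "risk_ratio": risk_ratio,
        "odds_ratio": odds_ratio,
    }

# Hypothetical trial: 20/100 events with treatment vs 30/100 with control
for name, value in effect_sizes(20, 100, 30, 100).items():
    print(name, value)
```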
Sampling: Conclusions are made regarding a whole population based on the findings of the study sample. Different samples will provide different results. This is called sampling error. A narrow confidence interval is obtained when the variation is minimal; when the variation is large, the confidence interval will be wide. When the sample is small, a wide confidence interval is obtained; the larger the sample, the narrower the confidence interval. A 95% confidence interval is a range of values constructed so that, if the study were repeated many times, about 95% of such intervals would contain the true population value; informally, one can be 95% confident that it contains the true value.
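The effect of sample size on the width of the interval can be shown with a simple normal-approximation (Wald) confidence interval for a proportion. The 20% event rate and the sample sizes below are made up for illustration.

```python
import math
from statistics import NormalDist

def proportion_ci(events, n, confidence=0.95):
    """Normal-approximation (Wald) confidence interval for an observed proportion."""
    p = events / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

# Same observed event rate (20%) in samples of increasing size
for events, n in [(10, 50), (100, 500), (1000, 5000)]:
    low, high = proportion_ci(events, n)
    print(f"n = {n}: {low:.3f} to {high:.3f} (width {high - low:.3f})")
```

With this approximation, the width shrinks in proportion to the square root of the sample size, so a 100-fold larger sample gives an interval about 10 times narrower.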
Power: The chance that the study will be able to demonstrate a significant difference, if one is truly present, is known as the power of the study. By convention, it is fixed at 0.8 to 0.95 (80–95%). The study may be designed to evaluate superiority, equivalence, or noninferiority; the power calculation is different for each type of design.
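For a superiority design comparing two proportions, power can be approximated as sketched below. The normal approximation, the function name, and the 30% vs. 20% event rates are assumptions for illustration; equivalence and noninferiority designs need different calculations.

```python
import math
from statistics import NormalDist

def power_two_proportions(p_control, p_intervention, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided test comparing two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    se = math.sqrt(
        (p_control * (1 - p_control) + p_intervention * (1 - p_intervention)) / n_per_arm
    )
    # Chance that the test statistic exceeds the critical value
    # (the negligible opposite tail is ignored)
    return NormalDist().cdf(abs(p_control - p_intervention) / se - z_alpha)

# Hypothetical event rates of 30% (control) vs 20% (intervention)
for n in (100, 200, 300, 400):
    print(n, round(power_two_proportions(0.30, 0.20, n), 2))
```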
External validity and generalizability
Is the study generalizable? Will other study populations get similar results if they follow the same line of treatment?
The observed incidence of the outcome may be lower than that assumed in the sample size calculation, making the trial underpowered, or higher, making it overpowered. Interim analysis is a useful way to make sure that the observed incidence is not too different from the expected incidence. However, interim analyses should be preplanned and stated in the protocol. When event rates are lower than anticipated or variability is larger than expected, methods for sample size re-estimation are available that do not require unblinding.
Interim analysis may sometimes show that the difference between the two groups is large and clearly favors the intervention. In this case, continuing the trial is unethical because the control group would be denied the clearly superior alternative.