Evidence-Based Medicine Part 1- How to Interpret an RCT

Dr Swapnil Pawar October 4, 2020


Interpreting an RCT

Blog written by – Dr Jose Chacko & Dr Swapnil Pawar

Three key areas –

  • the validity of the trial methodology;
  • the magnitude and precision of the treatment effect;
  • the applicability of the results to your patient or population.

Questions to consider when assessing an RCT

  • Did the study ask a clearly focused question?
  • Was the study an RCT, and was an RCT the appropriate design for the question?
  • Were participants appropriately allocated to intervention and control groups?
  • Were participants, staff, and study personnel blind to participants’ study groups?
  • Were all the participants who entered the trial accounted for at its conclusion?
  • Were participants in all groups followed up and data collected in the same way?
  • Did the study have enough participants to minimise the play of chance?
  • How are the results presented and what are the main results?
  • How precise are the results?
  • Were all important outcomes considered and can the results be applied to your local population?
  1. Why randomize: 

  • Randomization eliminates selection bias.
  • It balances the intervention and control arms with respect to prognostic variables (known and unknown).
  • Randomization provides the basis for statistical tests.

Simple randomization 

This method is equivalent to tossing a coin for each subject in the trial – for example, heads = intervention, tails = placebo. In practice, a random number generator is used. It is simple and easy to implement, and treatment assignment is totally unpredictable. However, imbalances between the arms may occur, particularly in smaller trials.
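As a minimal sketch (the arm labels and seed below are illustrative, not from any specific trial), simple randomization amounts to an independent coin toss per subject:

```python
import random

def simple_randomize(n_subjects, seed=None):
    """Toss a fair coin for each subject: heads = intervention, tails = placebo."""
    rng = random.Random(seed)
    return ["intervention" if rng.random() < 0.5 else "placebo"
            for _ in range(n_subjects)]

# With a small trial, the two arms can end up noticeably unbalanced
arms = simple_randomize(20, seed=1)
print(arms.count("intervention"), arms.count("placebo"))
```

Run it a few times with different seeds and the even 10/10 split often fails to appear – which is exactly the imbalance problem that block randomization addresses.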

Block randomization

Simple randomization does not ensure balance in numbers throughout the course of a trial. For instance, severity of illness may vary over time, with sicker patients in the early phase of the trial. Hence, patients are randomized in blocks. The block size is a multiple of the number of study arms – for a two-arm trial, a block of four (2×2) is common: within each block of 4 patients, 2 are randomly allocated to one arm and 2 to the other. This ensures that the arms remain balanced in numbers throughout the course of the trial.
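A short sketch of permuted-block randomization for a two-arm trial (the block size and arm labels are illustrative):

```python
import random

def block_randomize(n_subjects, block_size=4, seed=None):
    """Permuted-block randomization for two arms: within every block,
    exactly half the subjects go to each arm."""
    rng = random.Random(seed)
    assignments = []
    while len(assignments) < n_subjects:
        block = (["intervention"] * (block_size // 2)
                 + ["placebo"] * (block_size // 2))
        rng.shuffle(block)  # random order within the block, fixed composition
        assignments.extend(block)
    return assignments[:n_subjects]

# After every complete block the arms are exactly balanced
arms = block_randomize(12, seed=1)
print(arms.count("intervention"), arms.count("placebo"))  # 6 6
```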

Stratified randomization 

This method of randomization is carried out to reduce subgroup bias. For instance, patients may be stratified by age group, gender, premorbid illnesses, etc., to ensure comparable numbers of patients from each stratum in both arms. Without stratified randomization, a particular subgroup may end up over-represented in the intervention or the control group, leading to bias.
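One way to sketch this (the strata and patient IDs are invented for illustration) is to run a separate balanced randomization inside each stratum:

```python
import random

def stratified_randomize(patients, stratum_key, seed=None):
    """Randomize separately within each stratum (e.g. age group) so the
    arms stay balanced inside every subgroup."""
    rng = random.Random(seed)
    by_stratum = {}
    for p in patients:
        by_stratum.setdefault(p[stratum_key], []).append(p)
    allocation = {}
    for members in by_stratum.values():
        # Balanced list of arm labels for this stratum, then shuffled
        arms = (["intervention", "placebo"] * ((len(members) + 1) // 2))[:len(members)]
        rng.shuffle(arms)
        for patient, arm in zip(members, arms):
            allocation[patient["id"]] = arm
    return allocation

patients = [{"id": i, "age_group": "under 65" if i < 8 else "65 and over"}
            for i in range(16)]
allocation = stratified_randomize(patients, "age_group", seed=1)
```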

Unequal randomization 

Randomization need not always be 1:1; it could be, say, 2:1. Unequal randomization may be carried out to reduce cost by enrolling fewer patients in one arm; it may also help characterise the safety profile of an experimental intervention, since more patients receive it. Besides, patients may be more willing to participate in a trial if their likelihood of being allocated to the study treatment is higher.

  2. Blinding and allocation concealment – what these terms mean

Allocation concealment

This means that neither care providers, investigators, nor participants know whether the next eligible participant will be allocated to the treatment or the control group. Allocation concealment ensures that the decision whether or not to enroll a participant cannot be influenced by knowledge of the upcoming assignment (e.g., an investigator declining to enroll a patient with an expected poor outcome because the next allocation is to the study treatment). This is particularly important when blinding of the study arms is not possible.

Blinding

The validity of an RCT relies on both arms being treated in exactly the same way; there should be no known or unknown advantage to one arm or the other. If the investigators or participants are aware of who is getting what, information bias may occur. Blinding the participants (single blind) or both investigators and participants (double blind) helps to eliminate information bias. Whenever possible, blinding should be used in an RCT. However, it is not always feasible to blind the participants or investigators; for instance, a study of tracheostomy vs. no tracheostomy cannot be blinded for practical reasons.

  3. Sample size calculation and power

Sample size calculation begins with the minimum clinically important difference – the smallest difference between the two groups worth detecting, and large enough to convince users of the information to adopt the intervention. First, one needs a baseline estimate of the outcome rate in the placebo/control arm, i.e., how many patients are expected to experience the outcome with the control intervention. Second, one needs an estimate of the outcome rate expected with the intervention. These numbers are usually derived from previous experiments/observations, previous trials, or consensus opinion.

This is where one needs to be careful not to overestimate the benefit (which yields a smaller sample size and an underpowered trial) or to underestimate it (which inflates the sample size, so one may end up experimenting on more patients than necessary).
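To make this concrete, here is a sketch of the standard normal-approximation sample size formula for comparing two proportions; the 30% control and 20% intervention event rates below are purely illustrative:

```python
from statistics import NormalDist

def sample_size_two_proportions(p_control, p_treatment, alpha=0.05, power=0.8):
    """Per-arm sample size for detecting a difference between two proportions
    (normal-approximation formula, two-sided alpha)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_treatment) / 2
    numerator = (z_alpha * (2 * p_bar * (1 - p_bar)) ** 0.5
                 + z_beta * (p_control * (1 - p_control)
                             + p_treatment * (1 - p_treatment)) ** 0.5) ** 2
    return numerator / (p_control - p_treatment) ** 2

# 30% control event rate vs a hoped-for 20% on treatment, 80% power
print(round(sample_size_two_proportions(0.30, 0.20)))  # about 293 per arm
```

Note how optimism changes the answer: assuming a larger benefit (say 30% vs 15%) roughly halves the required sample, which is exactly how overestimated benefits produce underpowered trials.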

  4. Type I (alpha) error: false positive – the null hypothesis is rejected when it is in fact true.

Type II (beta) error: false negative – the null hypothesis is accepted when it is in fact false.

  5. Intention-to-treat analysis: outcomes of all participants randomized to an arm are analysed in that arm, even if they did not receive the allocated intervention.
  6. Per-protocol analysis: participants are analysed according to the treatment they actually received; this may introduce bias.

Different methods of analysis may lead to different results. Reporting a per-protocol analysis rather than an intention-to-treat analysis often results in overestimation of the effect of the intervention. It is important to keep protocol violations to a minimum.
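A toy illustration (the numbers are invented) of how the two analyses can diverge when the sicker patients stop treatment:

```python
# Hypothetical arm of 100 randomized patients: 20 stopped the drug
# (often the sicker ones), and 10 of those 20 had the outcome event.
# Intention-to-treat keeps them; per-protocol drops them.
events_all, n_all = 25, 100               # all randomized patients
events_completers, n_completers = 15, 80  # only those who completed treatment

itt_rate = events_all / n_all
per_protocol_rate = events_completers / n_completers
print(itt_rate, per_protocol_rate)  # 0.25 vs 0.1875 – per-protocol flatters the drug
```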

  7. p value vs confidence interval – which matters most?

Statistical significance refers to the likelihood that the observed difference between groups is due to chance. If the P value is higher than the chosen alpha level (e.g., .05), the observed difference is assumed to be due to sampling variability. With a sufficiently large sample, a statistical test will almost always demonstrate a significant difference unless the effect size is exactly zero; however, small differences, even though statistically significant, are often meaningless in clinical practice. Thus, reporting P values alone is inadequate to fully understand the results.

For example, with a large sample size of 10 000, a significant P value may be obtained even if the difference between groups is negligible. It is also important to remember that the level of significance does not predict effect size; a lower P value does not indicate a larger effect. Unlike significance tests, effect size is independent of sample size. For this reason, P values are considered confounded by sample size: a statistically significant result may only mean that an oversized sample was used.
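A quick numerical sketch (invented data) using a two-proportion z-test: a 2-percentage-point difference that few clinicians would act on is still "highly significant" at n = 10 000 per arm:

```python
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for a difference between two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = (p_pool * (1 - p_pool) * (1 / n1 + 1 / n2)) ** 0.5
    z = (p1 - p2) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# 52% vs 50% event rate – clinically trivial, statistically "significant"
p = two_proportion_z_test(5200, 10000, 5000, 10000)
print(p)  # ≈ 0.005, well below .05
```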

Let us consider the study of aspirin to prevent myocardial infarction (MI). More than 22 000 subjects were studied for an average duration of 5 years. Aspirin was shown to be associated with a reduction in MI with a high level of significance (P < .00001). However, cardiovascular mortality was not different, and aspirin was recommended for general prevention based on this study. Yet the effect size was very small: a risk difference of only 0.77%. Many subjects were advised aspirin although they were unlikely to benefit, while still being exposed to the risk of adverse effects. Later studies found even smaller effects, leading to modification of the recommendation to use aspirin.
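Translating that 0.77% risk difference into a number needed to treat (NNT) makes the smallness of the effect obvious:

```python
# Risk difference of 0.77% over ~5 years of aspirin use
risk_difference = 0.0077
nnt = 1 / risk_difference
print(round(nnt))  # ≈ 130: roughly 130 people treated to prevent one MI
```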

Effect size and confidence interval

It is more meaningful to report the effect size (and its 95% confidence interval) than p-values alone. The effect size provides more precise information about the magnitude of the effect. It may be expressed as a risk difference, risk ratio, odds ratio, or correlation coefficient.
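The common effect size measures for a 2×2 outcome table can be computed directly (the trial counts below are hypothetical):

```python
def effect_sizes(a, b, c, d):
    """Effect sizes from a 2x2 table:
                 event   no event
    treatment      a        b
    control        c        d
    """
    risk_treatment = a / (a + b)
    risk_control = c / (c + d)
    risk_difference = risk_treatment - risk_control
    risk_ratio = risk_treatment / risk_control
    odds_ratio = (a * d) / (b * c)
    return risk_difference, risk_ratio, odds_ratio

# Hypothetical trial: 30/100 events on treatment vs 40/100 on control
rd, rr, odds = effect_sizes(30, 70, 40, 60)
print(round(rd, 2), round(rr, 2), round(odds, 2))  # -0.1 0.75 0.64
```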

Confidence interval

Sampling: conclusions about a whole population are made based on the findings in a study sample, and different samples will provide different results – this variability is called sampling error. When variation is minimal, a narrow confidence interval is obtained; when variation is large, the confidence interval is wide. Likewise, a small sample yields a wide confidence interval, and the larger the sample, the narrower the confidence interval. A 95% confidence interval is a range of values that you can be 95% certain contains the true value in the population.
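A small sketch showing how the 95% confidence interval for an observed 30% event rate narrows as the sample grows (normal approximation; the sample sizes are illustrative):

```python
from statistics import NormalDist

def proportion_ci(p_hat, n, level=0.95):
    """Normal-approximation confidence interval for a proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - half_width, p_hat + half_width

# Same observed rate, different sample sizes: the CI narrows as n grows
for n in (50, 500, 5000):
    lo, hi = proportion_ci(0.30, n)
    print(n, round(lo, 3), round(hi, 3))
```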

Power: the probability that the study will demonstrate a significant difference if one truly exists is known as the power of the study. By convention it is fixed at 0.8 to 0.95 (80–95%). A study may be designed to evaluate superiority, equivalence, or noninferiority; the power calculation differs for each type of design.

External validity and generalizability

Is the study generalizable? Would other populations get similar results if they followed the same line of treatment?

Interim analysis 

The observed event rate may turn out lower than expected, making the trial underpowered, or higher, making it overpowered. Interim analysis is a useful way to make sure that the observed incidence is not too different from the expected incidence. However, interim analyses should be preplanned and stated in the protocol. When event rates are lower than anticipated, or variability is larger than expected, methods for sample size re-estimation are available that do not require unblinding.

Interim analysis may sometimes show that the difference between the two groups is large, with a clear advantage for the intervention. In this case, continuing the trial is unethical because the control group would be denied a clearly superior alternative.
