Statistics

Most scientific manuscripts include statistical analysis, and a study’s conclusions depend on the results of these analyses. If the data are analyzed or reported incorrectly, the manuscript will mislead readers. Therefore, as a scientist and a peer reviewer, you need a solid understanding of statistics, and you should carefully examine the statistical methods and reporting in the manuscripts you review. If you do not feel qualified to fully evaluate the statistics, say so in your comments to the editor so that they know to ask someone else to review them.

Some questions to ask as you review statistical analyses and results are:

Was the sample size appropriate and/or justified? Did the authors perform a power analysis as part of their study design?
Did the data meet the assumptions of the tests used, and were the tests appropriate? For example, many statistical tests assume that the data are normally distributed. Data such as proportions or counts of events are generally not normally distributed and must be either transformed or, preferably, analyzed with statistical models suited to these data types.
Are the individual data points statistically independent? If there were repeated measurements (for instance, multiple measurements on the same patient), have appropriate statistical models been used?
Have potential sources of bias (e.g. confounding variables) been considered and accounted for in the analysis?
When percentages are presented, are the numerator and denominator clear? E.g., “Of the 500 bee colonies, 200 (40%) were affected by the virus,” or, “Forty percent (200/500) of the bee colonies were affected by the virus.”
Are p-values reported where appropriate? Generally, a p-value should accompany all statistical comparisons mentioned in the text, figures and tables. The actual p-value should be stated (e.g. p = 0.049 and p = 0.0021 rather than p < 0.05 or p < 0.01). However, it is acceptable to state p < 0.0001 if the value is below this threshold. The Statistical Analysis section should also state the threshold for accepting significance, such as "Values of P < 0.05 were considered statistically significant".
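
Some of the checks above can be made concrete with a short Python sketch. It is illustrative only: the function names are hypothetical, the sample-size calculation assumes a two-sided comparison of two group means with a standardized effect size and uses the normal approximation (dedicated power-analysis software will give slightly larger numbers), and the formatting helpers simply follow the reporting conventions described in the list.

```python
import math
from statistics import NormalDist


def sample_size_per_group(effect_size: float, alpha: float = 0.05,
                          power: float = 0.80) -> int:
    """Approximate sample size per group for comparing two means.

    Assumption of this sketch: normal approximation,
    n = 2 * ((z_(1 - alpha/2) + z_power) / d) ** 2,
    where d is the standardized effect size.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


def format_p_value(p: float) -> str:
    """Report the actual p-value, except below the 0.0001 threshold."""
    if not 0 <= p <= 1:
        raise ValueError("p must lie between 0 and 1")
    if p < 0.0001:
        return "p < 0.0001"
    return f"p = {p:.2g}"  # two significant figures


def format_percentage(numerator: int, denominator: int) -> str:
    """Show a percentage together with its numerator and denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return f"{numerator}/{denominator} ({100 * numerator / denominator:.0f}%)"


print(sample_size_per_group(0.5))   # 63 per group for a medium effect
print(format_p_value(0.049))        # p = 0.049
print(format_p_value(0.00003))      # p < 0.0001
print(format_percentage(200, 500))  # 200/500 (40%)
```

When a manuscript reports a power analysis, a rough calculation like this can help a reviewer judge whether the stated sample size is plausible for the effect size the authors assumed.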

There are a number of common problems you might consider when reviewing the methods and statistical analysis of a study. These include:

Replication that is absent or inadequate. Replication is essential in order to minimize sampling error. If a study does not have enough replicates, general inferences cannot be made from it and the statistical analyses done on the data will have too little power. With low statistical power, real differences or treatment effects may go undetected.
Confounding. The problem of confounding means that differences due to experimental treatments cannot be separated from other factors that might be causing the observed difference. Confounding can be avoided by careful experimental design, such as proper replication, controls and randomization.
Poor sampling methods. In observational studies, random sampling is needed to make sure that the sample is representative of the whole population. If random sampling has not been used, check that the authors justify their sampling methods.
Lack of randomization. In experimental studies, “treatments” must be randomly allocated to experimental units (or vice versa), to make sure that the groups being compared are similar and factors that could confound interpretation of treatment effects are minimized.
Pseudoreplication. The sample size should reflect the number of different times that the effect of interest was independently tested. For instance, if there are repeated measurements on the same set of subjects, as might occur when measuring individuals repeatedly over a period of time, individual data points are not independent. In these cases, averages per individual, or appropriate statistical models that account for repeated measures (e.g. mixed effects models), should be used to analyze the data. If the statistics are not explained, pseudoreplication can often be spotted by looking at the degrees of freedom (essentially, the number of independent pieces of information) of the statistical tests.
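
The degrees-of-freedom check for pseudoreplication can be illustrated with a minimal Python sketch. The data and function names below are hypothetical, and the sketch assumes a simple pooled two-sample t-test, for which the degrees of freedom are n_a + n_b - 2. It shows how treating every repeated measurement as independent inflates the degrees of freedom compared with an analysis of one average per subject.

```python
from statistics import mean


def subject_means(measurements: dict[str, list[float]]) -> dict[str, float]:
    """Collapse repeated measurements to one average per subject."""
    return {subject: mean(values) for subject, values in measurements.items()}


def two_sample_df(n_a: int, n_b: int) -> int:
    """Degrees of freedom of a pooled two-sample t-test (assumed test)."""
    return n_a + n_b - 2


# Hypothetical data: 3 subjects per group, each measured 4 times.
control = {
    "c1": [5.1, 5.3, 4.9, 5.0],
    "c2": [4.8, 4.7, 5.0, 4.9],
    "c3": [5.4, 5.2, 5.5, 5.3],
}
treated = {
    "t1": [5.9, 6.1, 6.0, 5.8],
    "t2": [5.6, 5.5, 5.7, 5.8],
    "t3": [6.2, 6.0, 6.3, 6.1],
}

# Pseudoreplicated analysis: every measurement is counted as independent.
n_control_raw = sum(len(values) for values in control.values())
n_treated_raw = sum(len(values) for values in treated.values())
naive_df = two_sample_df(n_control_raw, n_treated_raw)

# Analysis on per-subject averages: the subject is the independent unit.
correct_df = two_sample_df(len(subject_means(control)),
                           len(subject_means(treated)))

print(naive_df)    # 22
print(correct_df)  # 4
```

If a manuscript describes three subjects per group but reports a test with about 22 degrees of freedom, that mismatch suggests the repeated measurements were treated as independent data points.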
