Statistical Reporting Guidelines

European Journal of Applied Physiology


Introduction:
The aim of these guidelines is to provide prospective authors with a succinct indication of the standard of statistical quality that the European Journal of Applied Physiology considers appropriate. They are written assuming familiarity with common statistical terminology and methods, and are advisory rather than mandatory. Authors in any doubt should consult a biostatistician at the planning stage of their study; further consultation during the analytical and/or interpretational stages may also be prudent.

General Policy Statement:
The European Journal of Applied Physiology’s policy on statistical reporting is to accept for publication only manuscripts whose experimental design, statistical analysis, and interpretation are both sound and appropriate to the hypothesis being posed. In particular, the design and methods should be described with sufficient detail and accuracy to enable the reader to repeat the study without ambiguity. Authors are encouraged to provide supporting citations for methods, using readily available pertinent original articles and/or reviews.

Experimental Design:
Good experimental design is a requirement of all manuscripts considered for publication. The overriding concern is that the design employed permits the aims of the study to be fully and efficiently achieved. The following list gives several of the more commonly employed good design features; many are intended to enhance statistical power through the control of extraneous variation.

  • Avoidance of confounding (assuring that treatment effects are separable)
  • Balance, counterbalance and crossover (negating temporal or learning effects)
  • Blocking (the use of smaller more homogeneous blocks of experimental units)
  • Use of covariate(s) (such as adjustment for differing initial or baseline values)
  • Randomisation of treatment allocation (often, but not necessarily restricted)
  • Single- or double-blinding (assuring independence of effects)
  • Inclusion of controls or placebos (for normative or comparative purposes)
  • Matched controls (for improved precision of comparisons) 
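Several of the features above (randomised allocation, balance, blocking) can be combined in practice. As a minimal sketch, the hypothetical function below allocates participants to two arms using randomised permuted blocks, which guarantees balanced group sizes while keeping individual allocations unpredictable:

```python
import random

def block_randomise(participants, block_size=4, seed=None):
    """Allocate participants to two arms using randomised permuted blocks.

    Each complete block contributes equal numbers to both arms, so the
    allocation stays balanced throughout recruitment.
    """
    rng = random.Random(seed)
    allocation = {}
    for start in range(0, len(participants), block_size):
        block = participants[start:start + block_size]
        # Half of each block goes to each arm; order is shuffled within the block
        arms = ["treatment"] * (block_size // 2) + ["control"] * (block_size // 2)
        rng.shuffle(arms)
        for person, arm in zip(block, arms):
            allocation[person] = arm
    return allocation

# Example: 16 hypothetical participants in blocks of 4
groups = block_randomise([f"P{i:02d}" for i in range(16)], seed=42)
```

The participant labels and block size here are illustrative; in a real trial the block size is often concealed from investigators to preserve allocation concealment.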

Sample Size (n) Considerations:
Sample sizes must always be clearly stated. They should be justifiable, and large enough to provide reliable results and inferences. In hierarchical designs, effective sample sizes at each level should be clearly evident. Case studies (n = 1) will be considered on their merits. Intervention studies with small sample size (n less than about 15) should be treated with appropriate caution. For example, in such situations, sensitivity may be compromised, inferences to a wider population may not be feasible, and nonparametric methods may be more appropriate. Other small sample studies will be assessed on their merits.
Larger sample sizes should be subject to an investigation of their cost-benefit aspects, though it is recognised that in the study of rare events, where incidence is low, large samples are frequently a necessity.

Significance Level (α), P-value and Power:
The threshold P-value for significance (the chosen α) should be stated and justified before any data are collected. The commonly adopted α = 0.05 is not necessarily always appropriate. In particular, it is important to recognise that when several tests are being performed (not just in multiple comparisons), the likelihood of rejecting at least one true null hypothesis (a Type I error) is increased. Consideration should therefore be given to downward adjustment of the chosen α level(s) a priori. In addition, after analysis, it is informative to readers if authors report the exact P-value for any important results.
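One common downward adjustment is the Bonferroni correction, which divides α by the number of tests performed. A minimal sketch, using hypothetical P-values:

```python
def bonferroni_alpha(alpha, n_tests):
    """Bonferroni-adjusted per-test significance threshold."""
    return alpha / n_tests

# Hypothetical P-values from four planned tests on the same data set
p_values = [0.003, 0.021, 0.048, 0.250]
alpha_adj = bonferroni_alpha(0.05, len(p_values))          # 0.05 / 4 = 0.0125
significant = [p for p in p_values if p < alpha_adj]       # only 0.003 survives
```

Note that P-values of 0.021 and 0.048, nominally significant at α = 0.05, are no longer declared significant once the family of four tests is accounted for. Bonferroni is conservative; less stringent procedures (e.g. Holm's step-down method) exist and may be preferable.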
Statistical power (the probability of rejecting a false null hypothesis, i.e. of not committing a Type II error) is affected by the chosen α level, sample size, inherent variability, and the real, but often unknown, value of the tested parameter under the alternative hypothesis (often referred to as the effect size). All of these should be considered. It is advisable to carry out at least an approximate power analysis for any investigation, where feasible. While it is not always necessary to report this in the manuscript itself, it is useful supportive information to include in the submission.
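An approximate power calculation can be done with the normal approximation; exact calculations use the noncentral t-distribution, so this is a sketch adequate for moderate sample sizes rather than a definitive computation:

```python
from math import sqrt
from statistics import NormalDist

def two_sample_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means.

    Uses the normal approximation: the noncentrality parameter is the
    standardised effect size (Cohen's d) scaled by sqrt(n/2).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    ncp = effect_size * sqrt(n_per_group / 2)
    # Probability the test statistic falls beyond either critical value
    return (1 - z.cdf(z_crit - ncp)) + z.cdf(-z_crit - ncp)
```

For example, detecting a large standardised effect (d = 0.8) with 26 participants per group at α = 0.05 yields power of roughly 0.82, consistent with the conventional 0.80 target. The same function illustrates how power rises with sample size for a fixed effect.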

Selection and Validity of Analytical Methods:
In most instances several analytical methods may be available for use, and it is not normally the case that one is correct and all others are wrong. One may be more appropriate, or more informative, than another, and the choice of which to use lies with the author. In some cases it may be helpful to indicate the reason(s) for selecting one method in preference to another. For analysis of data, the European Journal of Applied Physiology has no preferred statistical software. Authors are free to select whichever they prefer, but such selection should always be stated and referenced.
Most parametric analyses are based on a variety of assumptions about the distribution of the data (or of the residuals after modelling). Some techniques are sensitive to departures from these assumptions, while others are robust. Authors should note the relevant assumptions for the method(s) they employ, and confirm their validity. If there is a degree of departure, this should also be reported and appropriate remedial action taken. This may necessitate a transformation of the data, application of an adjustment or correction factor, or a change in method. Failure to account for any such departures usually invalidates the method and any subsequent inferences.
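As a minimal sketch of one such check and remedy, the hypothetical example below computes the moment coefficient of skewness for a right-skewed sample before and after a log transformation; formal tests (e.g. Shapiro-Wilk) and residual plots are the usual complements in practice:

```python
from math import log
from statistics import mean, pstdev

def skewness(xs):
    """Moment coefficient of skewness; values near 0 suggest symmetry."""
    m, s = mean(xs), pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)

raw = [1, 2, 2, 3, 3, 3, 4, 20]       # hypothetical right-skewed measurements
logged = [log(x) for x in raw]        # candidate remedial transformation
```

Here the log transformation substantially reduces the skewness, suggesting a normal-theory analysis of the transformed values may be more defensible than of the raw data. Any such transformation should itself be reported and its back-transformed results interpreted with care.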

Disclosure of Variability, Uncertainty and Error:
Authors are urged to disclose accurately measures of variability, uncertainty, measurement error, and the like. For variability of individual values, the standard deviation should be used; for variability of mean values, the standard error should be used. Uncertainty should normally be expressed using either confidence intervals or limits of agreement. The disclosure of measurement error, and the concomitant number of significant digits used when reporting measured values, is an important adjunct to interpretation of the results.
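The distinction above can be illustrated with a short sketch on hypothetical measurements; the t critical value used for the confidence interval is the tabulated value for 7 degrees of freedom:

```python
from math import sqrt
from statistics import mean, stdev

data = [5.1, 4.8, 5.6, 5.0, 4.7, 5.3, 5.2, 4.9]  # hypothetical measurements
n = len(data)

sd = stdev(data)          # standard deviation: variability of individual values
se = sd / sqrt(n)         # standard error: uncertainty of the sample mean

# 95% confidence interval for the mean; t critical value for df = 7 is ~2.365
t_crit = 2.365
ci = (mean(data) - t_crit * se, mean(data) + t_crit * se)
```

Reporting the standard error where the standard deviation is meant (or vice versa) misleads readers about whether the spread of individuals or the precision of the mean is being described, which is why both quantities should be labelled explicitly.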

References: 
Readers should consult appropriate texts and references for further information and more specific detail. The following list may be of assistance. 

Altman DG. Why we need confidence intervals. World J Surg. 2005; 29(5): 554-6. 

Altman DG. Statistics in medical journals: some recent trends. Stat Med. 2000; 19(23): 3275-89. 

Altman DG, Bland JM. Standard deviations and standard errors. BMJ. 2005; 331(7521): 903. 

Altman DG, Bland JM. Statistics notes: the normal distribution. BMJ. 1995; 310(6975): 298. 

Altman DG, Gore SM, Gardner MJ, Pocock SJ. Statistical guidelines for contributors to medical journals. Br Med J (Clin Res Ed). 1983; 286(6376): 1489-93. 

Altman DG, Moher D, Schulz KF. Peer review of statistics in medical research. Reporting power calculations is important. BMJ. 2002; 325(7362): 491; author reply 491. 

Atkinson G, Nevill AM. Statistical methods for assessing measurement error (reliability) in variables relevant to sports medicine. Sports Med. 1998; 26(4): 217-38. 

Bailar JC 3rd, Mosteller F. Guidelines for statistical reporting in articles for medical journals. Amplifications and explanations. Ann Intern Med. 1988; 108(2): 266-73. 

Bland JM, Altman DG. Applying the right statistics: analyses of measurement studies. Ultrasound Obstet Gynecol. 2003; 22(1): 85-93. 

Curran-Everett D, Benos DJ. Guidelines for reporting statistics in journals published by the American Physiological Society. J Appl Physiol. 2004; 97: 457-9. 

Curran-Everett D, Benos DJ. Guidelines for reporting statistics in journals published by the American Physiological Society: the sequel. Adv Physiol Educ. 2007; 31(4): 295-8. 

Daniel WW. Biostatistics: A Foundation for Analysis in the Health Sciences (7 Ed). Wiley. New York, 1999. 

Fidler F, Thomason N, Cumming G, Finch S, Leeman J. Editors can lead researchers to confidence intervals, but can't make them think: statistical reform lessons from medicine. Psychol Sci. 2004; 15(2): 119-26. 

Glantz SA. Biostatistics: how to detect, correct and prevent errors in the medical literature. Circulation. 1980; 61(1): 1-7. 

Johnson T. Statistical guidelines for medical journals. Stat Med. 1984; 3(2): 97-9. 

Kurichi JE, Sonnad SS. Statistical methods in the surgical literature. J Am Coll Surg. 2006; 202(3): 476-84. 

Mead R. The design of experiments: Statistical principles for practical application. Cambridge University Press, Cambridge, 1994.

Montgomery DC. Design and Analysis of Experiments (6 Ed). Wiley, New York, 2004. 

Morton RH. On repeated measures designs: hierarchical structures and time trends. J Sports Sci. 2005; 23(5): 549-57. 

Schriger DL, Cooper RJ. Achieving graphical excellence: suggestions and methods for creating high-quality visual displays of experimental data. Ann Emerg Med. 2001; 37(1): 75-87. 

Sterne JA. Teaching hypothesis tests--time for significant change? Stat Med. 2002; 21(7): 985-94; discussion 995-999, 1001. 

Walter SD. Methods of reporting statistical results from medical research studies. Am J Epidemiol. 1995; 141(10): 896-906. 

Young KD, Lewis RJ. What is confidence? Part 1: The use and interpretation of confidence intervals. Ann Emerg Med. 1997; 30(3): 307-10. 

Zar JH. Biostatistical Analysis (4 Ed). Prentice-Hall, Upper Saddle River, NJ, 1999.

European Journal of Applied Physiology
Editors-in-Chief: H. Westerblad; K.R. Westerterp
ISSN: 1439-6319 (print version)
ISSN: 1439-6327 (electronic version)
Journal no. 421