These are essential mathematical tests applied to statistics to determine their degree of certainty and their significance.
Nonparametric inferential statistical methods:
These are mathematical procedures for testing a statistical hypothesis which, unlike parametric statistics, make no assumption about the frequency distributions of the variables being assessed.
The level of measurement may be nominal or ordinal.
The sample does not have to be random.
The frequency distribution does not have to be normal.
They can be used with smaller samples.
Parametric inferential statistical methods:
These are mathematical procedures for testing a statistical hypothesis which assume that the distributions of the variables being assessed have certain characteristics.
The level of measurement must be ratio or interval.
The sample must be random.
The frequency distribution must be normal.
The variance must be similar across the groups being compared (homogeneity of variance).
Study factor | Nominal qualitative response (two categories) | Nominal qualitative response (>2 categories) | Ordinal qualitative response | Quantitative response
Qualitative (two groups), independent | Z test for comparison of proportions; chi-squared; Fisher’s exact test | Chi-squared | Mann-Whitney U test | Student’s t-test (Fisher); Welch test
Qualitative (two groups), paired | McNemar test; Fisher’s exact test | Cochran’s Q test | Sign test; Wilcoxon signed-rank test | Student’s t-test (Fisher) for paired data
Qualitative (more than two groups), independent | Chi-squared | Chi-squared | Kruskal-Wallis test | Analysis of variance
Qualitative (more than two groups), paired | Cochran’s Q test | Cochran’s Q test | Friedman test | Two-way analysis of variance
Quantitative | Student’s t-test (Fisher) | Analysis of variance | Spearman’s correlation; Kendall’s tau | Pearson’s correlation; linear regression
When the statistical tests applicable to quantitative variables do not meet the assumptions needed for their application, the corresponding tests should be used as if the response variable were ordinal (nonparametric tests).
KOLMOGOROV-SMIRNOV TEST
Nonparametric statistical significance test for contrasting the null hypothesis that the location parameters of the two groups are equal.
This contrast, which is only valid for continuous variables, compares the theoretical distribution function (accumulated probability) with the observed one and calculates a discrepancy value, usually represented as D. This value corresponds to the maximum discrepancy, in absolute value, between the observed distribution and the theoretical distribution. It thus provides a probability value P which, if we are verifying goodness of fit to the normal distribution, corresponds to the probability of obtaining a distribution that differs as much as the observed one had a random sample of size n really been drawn from a normal distribution.
If this probability is high, there will be no statistical reason to assume that our data do not come from such a distribution, whereas if it is very low, it will not be acceptable to assume this probability model for the data.
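By way of illustration (this sketch is not part of the original text, and the function names are our own), the D statistic can be computed directly by comparing the empirical distribution function with a theoretical CDF, here the normal:

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    # Theoretical (accumulated) probability of the normal distribution
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_statistic(sample, cdf=normal_cdf):
    # D = maximum absolute discrepancy between the empirical
    # distribution function and the theoretical one
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # the empirical CDF jumps from (i - 1)/n to i/n at x
        d = max(d, abs(i / n - f), abs((i - 1) / n - f))
    return d
```

The p-value is then obtained from the sampling distribution of D; statistical packages do this step for us.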
F-TEST
Statistical test which is used to compare variances.
The experimental F statistic is the test statistic in ANOVA and in other variance-comparison tests.
CHI-SQUARED TEST
The chi-squared test is any statistical hypothesis test in which the test statistic follows a chi-squared distribution if the null hypothesis is true.
It determines whether there is an association between qualitative variables.
If the p-value associated with the test statistic is less than the chosen significance level, the null hypothesis is rejected.
It is used to analyze contingency tables and to compare proportions in independent data.
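As a minimal sketch (not from the source text), the chi-squared statistic for a contingency table is obtained by comparing each observed count with the count expected under independence:

```python
def chi_squared_stat(table):
    # table: contingency table as a list of rows of observed counts
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n   # expected count under independence
            stat += (obs - exp) ** 2 / exp
    df = (len(table) - 1) * (len(table[0]) - 1)  # degrees of freedom
    return stat, df
```

The p-value is the upper-tail probability of this statistic under a chi-squared distribution with df degrees of freedom.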
FISHER’S EXACT TEST
It enables the effect of chance to be evaluated.
It is a statistical significance test used to analyze categorical data in small samples.
The Fisher test is needed when we have data classified into two categories in two different ways.
It is a statistical significance test used to compare proportions in contingency tables.
It is preferred to the chi-squared test when the sample size is small (fewer than 30 subjects).
It is the test of choice when the chi-squared test cannot be used because the sample size is too small.
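A minimal sketch of the exact calculation for a 2×2 table (not part of the original text): each possible table with the same margins has a hypergeometric probability, and the two-sided p-value sums the probabilities of all tables at least as extreme as the observed one.

```python
from math import comb

def fisher_exact_p(a, b, c, d):
    # Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]
    n = a + b + c + d
    r1, c1 = a + b, a + c          # first row and first column totals
    denom = comb(n, c1)
    def prob(k):
        # hypergeometric probability that the top-left cell equals k
        return comb(r1, k) * comb(n - r1, c1 - k) / denom
    p_obs = prob(a)
    lo, hi = max(0, c1 - (n - r1)), min(r1, c1)
    # sum over all admissible tables no more likely than the observed one
    return sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))
```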
McNEMAR TEST
Statistical test used to compare proportions in paired data.
Statistical significance test for testing the null hypothesis that there is no change in the proportion of subjects who experience an event when each individual is evaluated twice (under different conditions) and the data are paired.
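Only the discordant pairs enter the statistic. A minimal sketch (illustrative, not from the source; a continuity correction is often added in practice):

```python
def mcnemar_stat(b, c):
    # b and c are the discordant counts of the paired 2x2 table:
    # b = event only under the first condition, c = event only under the second.
    # The concordant cells do not contribute to the statistic, which is
    # compared with a chi-squared distribution with 1 degree of freedom.
    return (b - c) ** 2 / (b + c)
```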
BINOMIAL TEST
In statistics, the binomial test is an exact test of the statistical significance of deviations from a theoretically expected distribution of observations in two categories.
The most common use of the binomial test is in the case where the null hypothesis is that two categories are equally likely to occur.
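For that common case (two equally likely categories), the exact two-sided p-value can be sketched as follows (illustrative code, not from the source; this uses the "sum all outcomes no more likely than the observed one" convention):

```python
from math import comb

def binom_pmf(k, n, p):
    # Probability of exactly k successes in n trials
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binomial_test_p(k, n, p=0.5):
    # Two-sided exact p-value: add up the probabilities of all outcomes
    # whose probability does not exceed that of the observed outcome
    p_obs = binom_pmf(k, n, p)
    total = sum(binom_pmf(i, n, p) for i in range(n + 1)
                if binom_pmf(i, n, p) <= p_obs * (1 + 1e-9))
    return min(1.0, total)
```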
PEARSON’S CORRELATION TEST
This is used to study the association between a study factor and a quantitative response variable. It measures the degree of association between two variables, giving values between −1 and +1.
– Values close to +1 indicate a strong positive linear association.
– Values close to −1 indicate a strong negative linear association.
– Values close to 0 indicate no linear association, which does not mean that another type of association cannot exist.
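A minimal sketch of the calculation (not part of the original text): the coefficient is the covariance of the two variables divided by the product of their standard deviations.

```python
import math

def pearson_r(x, y):
    # Degree of linear association between two quantitative variables, in [-1, +1]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```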
A related goodness-of-fit use tests the null hypothesis that the relative frequencies of occurrence of the observed events follow a specified frequency distribution.
The events should be mutually exclusive.
Such a goodness-of-fit test establishes whether or not an observed frequency distribution differs from a theoretical distribution.
KAPPA COEFFICIENT
The kappa is a general index of agreement in inter-observer studies. It indicates the degree of inter-observer agreement.
It permits the level of inter-observer agreement to be quantified in order to reduce the subjectivity of the method used (mobility test) and to know whether the degree of agreement is due to chance.
The percentage of agreement, together with the kappa index, is used for qualitative variables.
The kappa coefficient is used for two therapists and the Fleiss coefficient for more than two therapists.
This coefficient ranges between 0 and 1: 0 corresponds to agreement identical to that expected by chance, and 1 to perfect agreement between the examinations.
Negative values usually indicate disagreement between the two therapists as to how to perform the method.
It is calculated as the proportion of agreement, beyond that expected by chance alone, observed between two repetitions of the same instrument (for example, a judgement carried out by two observers separately).
The maximum coefficient of agreement is 1.00.
A value of 0.00 indicates no agreement.
– Between 0.00 and 0.20: slight.
– Between 0.21 and 0.40: fair.
– Between 0.41 and 0.60: moderate.
– Between 0.61 and 0.80: substantial.
– Between 0.81 and 1.00: almost perfect.
A coefficient of 0.4 would be considered the limit of acceptable reliability of a test.
The kappa is a chance-corrected measure of agreement.
As a statistical test, the kappa can verify that the agreement exceeds the level expected by chance.
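The chance correction can be sketched directly (illustrative code, not from the source): observed agreement is the proportion of identical ratings, and chance agreement is computed from each rater's marginal proportions.

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    # Agreement between two raters, corrected for the agreement expected by chance
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    c1, c2 = Counter(rater1), Counter(rater2)
    # chance agreement: product of the two raters' marginal proportions per category
    expected = sum((c1[cat] / n) * (c2[cat] / n) for cat in set(c1) | set(c2))
    return (observed - expected) / (1 - expected)
```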
 | All the blocks | Block C2-C4 | Block C5-C6
Kappa value | K = 0.675 | K = 0.756 | K = 0.460
Specificity | 98% | 98% | 91%
Sensitivity | 74% | 78% | 55%
K = kappa coefficient, SE = standard error, Z = significance test of the statistic.
INTRACLASS CORRELATION COEFFICIENT (ICC)
The intraclass correlation coefficient (ICC) is used for quantitative variables.
Landis and Koch’s model 2 is used for inter-examiner reliability and model 3 for intra-examiner reliability (Landis RJ & Koch GG, 1977).
This index also ranges from 0 to 1.
– The value 1 corresponds to perfect reproducibility between measurements.
– The value 0 indicates that the same variance exists between the measurements taken in a single patient as between the measurements taken in different patients.
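As a simplified illustration (not from the source text, and simpler than the model 2 and model 3 variants mentioned above), a one-way ICC can be computed from the between-subject and within-subject mean squares of an ANOVA decomposition:

```python
def icc_oneway(ratings):
    # One-way ICC sketch. ratings: one row per subject,
    # each row holding that subject's k repeated measurements.
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    means = [sum(row) / k for row in ratings]
    ms_between = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    ms_within = sum((x - m) ** 2
                    for row, m in zip(ratings, means) for x in row) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```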
Test | ICC | Kappa
Height of iliac crests | 52 | 0.26
Height of EIPS | 75 | 0.54
SFFT | 82 | 0.62
SFFT | 63 | 0.26
Gillet | 60 | 0.18
Height, active leg extended | 93 | 0.81
Joint play | 75 | 0.61
Thigh thrust | 81 | 0.73
Separation | 58 | 0.17
Gaenslen | 80 | 0.51
Patrick | 80 | 0.65
Sacral thrust | 68 | 0.38
Sensitivity of SI ligament | 91 | 0.83
Compression | 85 | 0.59
SPEARMAN’S CORRELATION TEST
This is a nonparametric correlation measure. It assumes an arbitrary monotonic function to describe the relationship between two variables, without making any assumptions about the frequency distribution of the variables.
Unlike the Pearson’s coefficient test, it does not require the assumption that the relationship between variables is linear, nor that the variables are measured in interval scales; it can be used for variables measured at the ordinal level.
It is used if the conditions for applying the Pearson test are not met.
It is a variant of the Pearson correlation test. It is applied when each value in itself is not as important as its position relative to the other values.
Its values are interpreted exactly like those of the Pearson correlation coefficient.
The Spearman correlation measures the degree of association between two quantitative variables whose relationship tends always to increase or always to decrease (a monotonic tendency).
It is more general than Pearson’s correlation coefficient: the Spearman correlation can also be calculated for exponential or logarithmic relationships between the variables.
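A minimal sketch of the calculation (not from the source text): replace each value with its rank, averaging the ranks of tied values, and then compute Pearson's correlation on the ranks.

```python
import math

def average_ranks(values):
    # Rank the values in ascending order, giving tied values their average rank
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                        # extend over a run of tied values
        for idx in order[i:j + 1]:
            r[idx] = (i + j) / 2 + 1      # average rank of the tied run
        i = j + 1
    return r

def spearman_rho(x, y):
    # Pearson's correlation computed on the ranks instead of the raw values
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

Note that a perfectly monotonic but nonlinear relationship (for example y = x²) still gives rho = 1, which is what distinguishes it from Pearson's r.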
WILCOXON TEST
This contrasts the null hypothesis that the sample comes from a population in which the magnitude of the positive and negative differences between the values of the variables is the same.
Nonparametric statistical test for comparing two samples (two treatments).
The data distributions do not need to follow the normal distribution.
It is therefore a less restrictive test than the Student’s t-test.
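A minimal sketch of the signed-rank statistic for paired data (illustrative, not from the source; for brevity it assumes the nonzero absolute differences are all distinct, whereas ties would take average ranks):

```python
def wilcoxon_w(x, y):
    # Signed-rank statistic for paired samples x, y
    diffs = [b - a for a, b in zip(x, y) if b != a]   # zero differences are dropped
    ranked = sorted(diffs, key=abs)                   # rank by absolute magnitude
    w_plus = sum(i + 1 for i, d in enumerate(ranked) if d > 0)
    w_minus = sum(i + 1 for i, d in enumerate(ranked) if d < 0)
    return min(w_plus, w_minus)                       # the smaller rank sum is W
```

W is then compared with its exact or approximate null distribution to obtain the p-value.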
SHAPIRO-WILK TEST
Although this test is less well known, it is the one recommended for contrasting the goodness of fit of our data to a normal distribution, especially when the sample is small (n < 30).
It measures the goodness of fit of the sample to a straight line when drawn on normal probability paper.
STUDENT’S t-TEST (FISHER)
Used when two groups are compared with regard to a quantitative variable and the parametric assumptions are met.
In the opposite case, an equivalent nonparametric test is used, such as the Mann-Whitney U test.
It is used to compare the means of two independent normal populations.
Parametric statistical significance test for contrasting the null hypothesis regarding the difference between two means.
When the two means have been calculated from two completely independent observation samples (a situation which is very unlikely in practice, at least from a theoretical point of view), the test is described as unpaired.
When the two means have been extracted from consecutive observations of the same subjects in two different situations, the values of each individual are compared and a paired test is applied.
The Student’s t-test is a type of inferential statistics.
It is used to determine whether there is a significant difference between the means of two groups.
As with all inferential statistics, we assume that the dependent variable has a normal distribution.
We specify the level of probability (alpha level, level of significance, p) which we are willing to accept before data are collected (p < .05 is a commonly used value).
Notes about the Student’s t-test:
– When the difference between two population means is being researched, a t-test is used. That is, it is used when we want to compare two means (the scores should be measured on an interval or ratio scale).
– We would use a t-test, for example, to compare the reading performance of men and women.
– With a t-test, we have one independent variable and one dependent variable.
– The independent variable (gender in this case) can only have two levels (male and female).
– If the independent variable had more than two levels, we would use a one-way analysis of variance (ANOVA).
– The test statistic of the Student’s t-test is the t value. Conceptually, the t value represents the number of standard units separating the means of the two groups.
– With a t-test, the researcher wants to state with a certain degree of confidence that the difference obtained between the means of the sample groups is too large to be a chance event.
– If our t-test produces a t value with an associated probability of .01, we say that a difference as large as the one we found would be obtained by chance only 1 time out of 100.
Five factors contribute to indicating whether the difference between two group means can be considered significant:
– The bigger the difference between the two means, the greater the probability that a statistically significant difference exists.
– The amount of overlap between the groups (a function of the variation within the groups). The smaller the variation within the two groups, the greater the probability that a statistically significant difference exists.
– Sample size is extremely important in determining the significance of the difference between the means. If the sample size is increased, the means tend to be more stable and more representative.
– A higher alpha level requires a smaller difference between the means (p < .05).
– Whether a non-directional (two-tailed) or a directional (one-tailed) hypothesis is used.
Underlying assumptions of the t-test:
– The samples have been drawn randomly from their respective populations.
– The populations should be normally distributed:
– unimodal (one mode);
– symmetrical (the right and left halves are mirror images, with the same number of people above and below the mean);
– bell-shaped (maximum height, the mode, in the middle);
– mean, mode and median located in the centre;
– asymptotic (the further the curve moves away from the mean, the closer it comes to the X axis; but the curve should never touch the X axis).
– The populations should have the same variance (σ1² = σ2²). If this is not the case, another calculation is used for the standard error.
There are two types of Student’s t-test:
– t-test for paired differences (dependent groups, correlated t-test): df = n (number of pairs) − 1.
This refers to the difference between the mean scores of a single sample of individuals determined before and after treatment. It can also compare the mean scores of samples of individuals who are paired in a certain way (for example, brothers and sisters, mothers and daughters, or people matched on specific characteristics).
– t-test for independent samples.
This refers to the difference between the means of two populations.
Basically, the procedure compares the means of two samples selected independently of each other.
An example would be comparing the mathematics scores of an experimental group with those of a control group.
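The two variants can be sketched side by side (illustrative code, not from the source; in both cases the statistic is a mean difference divided by its standard error):

```python
import math

def t_independent(x, y):
    # Pooled-variance t statistic for two independent samples; df = nx + ny - 2
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((a - mx) ** 2 for a in x) / (nx - 1)
    vy = sum((b - my) ** 2 for b in y) / (ny - 1)
    sp2 = ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)   # pooled variance
    return (mx - my) / math.sqrt(sp2 * (1 / nx + 1 / ny)), nx + ny - 2

def t_paired(x, y):
    # Mean of the paired differences divided by its standard error;
    # df = number of pairs - 1
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    md = sum(d) / n
    sd = math.sqrt(sum((v - md) ** 2 for v in d) / (n - 1))
    return md / (sd / math.sqrt(n)), n - 1
```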
How do I decide which type of t-test to use?
Type I error:
– Rejecting a null hypothesis which is really true. The probability of making a Type I error depends on the alpha level chosen.
– If the alpha level was fixed at p < .05, there is a 5% chance of making a Type I error.
– The possibility of making a Type I error can be reduced by fixing a smaller alpha level (p < .01). The problem with doing this is that it increases the possibility of a Type II error.
Type II error:
– Failing to reject a null hypothesis which is false.
– The basic idea of calculating a Student’s t-test is to find the difference between the means of the two groups and divide it by the standard error (of the difference), that is, the standard deviation of the distribution of differences.
– A confidence interval for a two-tailed t-test is calculated by multiplying the critical value by the standard error and adding this to, and subtracting it from, the difference of the two means.
– The effect size is used to gauge the practical importance of a difference. If there are several thousand patients, it is easy to find a statistically significant difference.
– Knowing whether this difference is of practical importance is another question.
– In studies involving group differences, the effect size is the difference of the two means divided by the standard deviation of the control group (or the mean standard deviation of both groups if there is no control group).
– Generally, effect size is only important if there is statistical significance.
– An effect size of .2 is considered small, .5 medium and .8 large.
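The effect-size rule described above can be sketched directly (illustrative code, not from the source text):

```python
import math

def effect_size(treatment, control):
    # Difference of the two group means divided by the
    # control-group standard deviation (Cohen's d style measure)
    mt = sum(treatment) / len(treatment)
    mc = sum(control) / len(control)
    sd = math.sqrt(sum((v - mc) ** 2 for v in control) / (len(control) - 1))
    return (mt - mc) / sd
```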
MANN-WHITNEY TEST
The Mann-Whitney U test is one of the best-known significance tests.
It is appropriate when two independent observation samples are measured at an ordinal level, that is, when we can say which of two observations is the greater.
It determines whether the degree to which the two observed distributions coincide is lower than would be expected by chance under the null hypothesis that the two samples come from the same population.
Nonparametric statistical significance test of the null hypothesis that the location parameter (generally the median) is the same when two independent groups are compared, regardless of the type of distribution of the variable (normal or otherwise).
It is used when we want to compare two populations using independent samples, that is, it is an alternative to the t-test for comparing two means using independent samples.
The null hypothesis is that the medians of the two populations are equal, and the alternative hypothesis may be that the median of population 1 is greater than (less than, or different from) the median of population 2.
Mann-Whitney test for independent samples:
– If we have two sets of values of a continuous variable obtained in two independent samples, X1, X2, …, Xn and Y1, Y2, …, Ym, we put all the values together in ascending order and assign each its rank, correcting tied values with the average rank.
– We then calculate the rank sum of the observations of the first sample, Sx, and the rank sum of the second sample, Sy.
– If the values of the population from which the random X sample was drawn lie below the values of Y, then the X sample will probably have lower ranks, which will be reflected in a value of Sx lower than the theoretically expected one.
– If the smaller of the rank sums is excessively low, and therefore very unlikely if the null hypothesis were true, the null hypothesis is rejected.
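The steps above can be sketched as follows (illustrative code, not from the source; for brevity it assumes all pooled values are distinct, whereas tied values would take the average rank):

```python
def mann_whitney_u(x, y):
    # Rank the pooled values, then derive U from the rank sum of the first sample
    pooled = sorted(list(x) + list(y))
    rank = {v: i + 1 for i, v in enumerate(pooled)}
    sx = sum(rank[v] for v in x)               # rank sum of the first sample (Sx)
    ux = sx - len(x) * (len(x) + 1) / 2        # U statistic for x
    uy = len(x) * len(y) - ux                  # complementary U for y
    return min(ux, uy)                         # the smaller U is tested
```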
KRUSKAL-WALLIS TEST
Nonparametric statistical significance test for contrasting the null hypothesis that the location parameters of two or more groups are equal.
The Kruskal-Wallis test is an alternative to the F-test of the analysis of variance for one-way classification designs. In this case several groups are compared, but using the median of each of them instead of the means.
– H0: the medians of the k populations considered are equal.
– Ha: at least one of the populations has a median different from the others.
The test statistic is H = [12 / (n(n + 1))] Σ (Ri² / ni) − 3(n + 1), where n is the total number of observations, ni is the size of group i and Ri is the rank sum of group i.
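That formula can be sketched directly from the pooled ranks (illustrative code, not from the source; for brevity it assumes all pooled values are distinct, whereas ties would take average ranks):

```python
def kruskal_wallis_h(*groups):
    # H = 12 / (n(n+1)) * sum(Ri^2 / ni) - 3(n+1), from the pooled ranks
    pooled = sorted(v for g in groups for v in g)
    rank = {v: i + 1 for i, v in enumerate(pooled)}
    n = len(pooled)
    s = sum(sum(rank[v] for v in g) ** 2 / len(g) for g in groups)
    return 12.0 / (n * (n + 1)) * s - 3 * (n + 1)
```

H is then compared with a chi-squared distribution with k − 1 degrees of freedom.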
NONPARAMETRIC TESTS
The analysis of variance assumes that the underlying distributions are normal and that the variances of the distributions being compared are similar.
Pearson’s correlation coefficient assumes normality.
Although parametric techniques are robust (that is, they often have considerable power for detecting differences or similarities even when these assumptions are violated), some distributions violate the assumptions so severely that a nonparametric alternative is more desirable for detecting a difference or a similarity.
Nonparametric tests for related samples:

Test | Number of variables | Variables | Objective
McNemar | 2 | Qualitative: 2 values | To determine whether the difference between the frequency distributions of the values of the two variables is statistically significant.
Sign test | 2 | At least ordinal scale | To determine whether the difference between the number of times the value of one variable is greater than that of the other and the number of times it is less is statistically significant.
Wilcoxon | 2 | At least ordinal scale | To determine whether the difference between the magnitude of the positive differences between the values of the two variables and the magnitude of the negative differences is statistically significant.
Cochran’s Q | p > 2 | Qualitative: 2 values | To determine whether the differences between the frequency distributions of the values of the p variables are statistically significant.
Friedman’s F | p > 2 | At least ordinal scale | To determine whether the differences between the distributions of the p variables are statistically significant.
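The sign test from the table above is simple enough to sketch in full (illustrative code, not from the source text): it counts how often each variable exceeds the other and refers the smaller count to a binomial distribution with p = 0.5.

```python
from math import comb

def sign_test_p(x, y):
    # Count how often each paired variable exceeds the other; ties are discarded
    plus = sum(1 for a, b in zip(x, y) if b > a)
    minus = sum(1 for a, b in zip(x, y) if b < a)
    n, k = plus + minus, min(plus, minus)
    # two-sided exact p-value: double the binomial tail probability, capped at 1
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```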
CHOOSING THE APPROPRIATE STATISTICAL TECHNIQUE
With the elements defined in the earlier paragraphs, decision trees can be established to help choose the appropriate statistical test or technique.
There are more than 300 basic statistical tests, making it difficult to cover all of them exhaustively in this article.
Criterion | Description | Explanations
1 | Descriptive statistics | No statistical content, or only descriptive statistics
2 | Student’s t-tests, z-tests | For one sample or two samples (paired and/or independent)
3 | Bivariate tables | Chi-squared, Fisher’s exact test, McNemar test
4 | Nonparametric tests | Sign test, Mann-Whitney U test, Wilcoxon test
5 | Demo-epidemiological statistics | Relative risk, odds ratio, log odds, measures of association, sensitivity and specificity
6 | Pearson’s linear correlation | Classic correlation (linear correlation coefficient r)
7 | Pearson’s linear correlation | Classic correlation (linear correlation coefficient r)
8 | Simple regression | Least-squares regression with one predictor variable and one response
9 | Analysis of variance | ANOVA, analysis of covariance, F-tests
10 | Transformation of variables | Use of transformations (logarithmic…)
11 | Nonparametric correlation | Spearman’s rho, Kendall’s tau, trend tests
12 | Multiple regression | Includes polynomial regression and stepwise regression
13 | Multiple comparisons | Multiple comparisons
14 | Goodness-of-fit and standardisation | Standardisation of incidence and prevalence rates
15 | Multivariate tables | Mantel-Haenszel procedures, log-linear models
16 | Sample size and power | Determination of the sample size on the basis of a detectable difference
17 | Survival analysis | Includes life tables, survival regression and other survival analyses
18 | Cost-benefit analysis | Estimation of health costs for comparing alternative guidelines (cost-effectiveness)
19 | Other analyses | Tests not included in the preceding categories: sensitivity analysis, cluster analysis, discriminant analysis
Protocol designed by Emerson and Colditz and adapted by Mora, Rippoll et al. Reference levels for the analysis of accessibility.
THE FOLLOWING STEPS
Once the statistical analysis has been carried out, the following actions should be taken:
– Qualitative or quantitative analysis.
– Summary and final interpretation of all the data already analyzed.
– Writing up of the research report.