These are essential mathematical tests applied to statistical data to determine their degree of certainty and their significance.
Non-parametric inferential statistical methods: These are mathematical procedures for testing a statistical hypothesis which, unlike parametric statistics, make no assumption about the frequency distributions of the variables being measured. The level of measurement may be nominal or ordinal. The sample does not have to be random. The frequency distribution does not have to be normal. They can be used with smaller samples.
Parametric inferential statistical methods: These are mathematical procedures for testing a statistical hypothesis which assume that the distributions of the measured variables have certain characteristics. The level of measurement must be ratio or interval. The sample must be random. The frequency distribution must be normal. The variances of the groups being compared must be similar.
When quantitative variables do not meet the assumptions required by the applicable statistical tests, the corresponding tests that treat the response variable as ordinal (non-parametric tests) should be used.
KOLMOGOROV-SMIRNOV TEST
Non-parametric statistical significance test for contrasting the null hypothesis that the observed data follow a specified theoretical distribution (or, in its two-sample form, that two groups come from the same distribution). This contrast, which is only valid for continuous variables, compares the theoretical cumulative distribution function with the observed one and calculates a discrepancy value, usually denoted D. D is the maximum absolute difference between the observed and the theoretical distribution. From it a probability value P is obtained which, when goodness-of-fit to the normal distribution is being verified, is the probability of obtaining a distribution that differs at least as much as the observed one if a random sample of size n had really been drawn from a normal distribution. If this probability is high, there is no statistical reason to assume that the data do not come from a normal distribution; if it is very low, it is not acceptable to assume this probability model for the data.
F-TEST
Statistical test used to compare variances. The F-statistic is the contrast statistic in ANOVA and in other variance comparison tests.
CHI-SQUARED TEST
A chi-squared test is any statistical hypothesis test in which the test statistic has a chi-squared distribution if the null hypothesis is true. It determines whether there is an association between qualitative variables. If the p-value associated with the contrast statistic is less than the chosen significance level, the null hypothesis is rejected. It is used to analyze contingency tables and to compare proportions in independent data.
FISHER’S EXACT TEST (p.- 5%)
It enables the effect of chance to be evaluated.
It is a statistical significance test used to analyze categorical data in small samples. The Fisher test is needed when the data are classified into two categories in two different ways (a 2 x 2 contingency table). It is used to compare proportions in contingency tables. It is preferred to the χ² test when the sample size is small (fewer than 30 subjects), and it is the test of choice when the chi-squared test cannot be used because the sample is too small.
McNEMAR TEST
Statistical test used to compare proportions in paired data. It tests the null hypothesis that there is no change in the proportion of subjects who experience an event when each individual is evaluated twice (under different conditions) and the data are paired.
BINOMIAL TEST
In statistics, the binomial test is an exact test of the statistical significance of deviations from a theoretically expected distribution of observations in two categories. Its most common use is the case where the null hypothesis is that the two categories are equally likely to occur.
PEARSON’S CORRELATION TEST
This is used to study the association between a study factor and a quantitative response variable. It measures the degree of linear association between two variables, giving values between -1 and 1.
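As a rough illustration of the exact tests for two-category data described above (the binomial test and Fisher's exact test), the p-values can be computed directly from binomial coefficients. This is a minimal sketch using only the Python standard library. The function names and the two-sided rule (summing every outcome that is no more probable than the observed one) are assumptions made for this example, not something prescribed by the text.

```python
from math import comb


def binomial_test_two_sided(k, n, p=0.5):
    """Exact two-sided binomial test for k successes in n trials.

    Sums the probabilities of all outcomes that are no more likely
    than the observed count k under the null probability p.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance so that equal probabilities are not
    # excluded by floating-point rounding.
    total = sum(q for q in pmf if q <= observed * (1 + 1e-9))
    return min(1.0, total)


def fisher_exact_two_sided(a, b, c, d):
    """Exact two-sided Fisher test for the 2 x 2 table [[a, b], [c, d]].

    With the margins held fixed, the top-left cell follows a
    hypergeometric distribution; tables no more probable than the
    observed one are summed.
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    observed = prob(a)
    total = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= observed * (1 + 1e-9)
    )
    return min(1.0, total)


if __name__ == "__main__":
    print(binomial_test_two_sided(0, 10))      # 0 successes out of 10, p = 0.5
    print(fisher_exact_two_sided(3, 1, 1, 3))  # illustrative 2 x 2 table
```

The resulting p-value is then compared with the significance level chosen in advance, as for the chi-squared test.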
Test of the null hypothesis that the relative frequencies of occurrence of the observed events follow a specified frequency distribution. The events should be mutually exclusive. This is a goodness-of-fit test which establishes whether or not an observed frequency distribution differs from a theoretical distribution.
KAPPA COEFFICIENT
The Kappa is a generally accepted index in interobserver studies. It indicates the degree of interobserver agreement. It allows the level of interobserver agreement to be quantified in order to reduce the subjectivity of the method used (mobility test) and to know whether the degree of agreement is due to chance. The percentage of agreement, along with the Kappa index, is used for qualitative variables. The Kappa coefficient is used for two therapists and the Fleiss coefficient for more than two therapists. The coefficient usually ranges between 0 and 1: 0 corresponds to the agreement expected by chance and 1 to perfect agreement between the examinations. Negative values indicate agreement lower than that expected by chance, which usually reflects disagreement between the two therapists as to how to perform the method. It is calculated as the proportion of agreement, beyond that expected by chance alone, observed between two repetitions of the same instrument (for example, a judgement made by two observers separately). The maximum coefficient of agreement is 1.00. A value of 0.00 indicates no agreement beyond chance.
A coefficient of 0.4 is considered the lower limit of acceptable reliability for a test. The Kappa is “a corrector of the measure of agreement”. As a statistical test, the Kappa can verify that the agreement exceeds chance levels.
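The calculation described above (observed agreement corrected for the agreement expected by chance) can be sketched for two observers as follows. This is a minimal illustration using only the Python standard library; the function name and the example ratings are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Kappa for two raters who classified the same subjects.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    proportion of agreement and p_e the proportion expected by chance
    from each rater's marginal frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of subjects")
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / n**2
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical findings of two therapists on ten patients
    # (1 = restriction found, 0 = no restriction found).
    therapist_1 = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
    therapist_2 = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
    print(round(cohen_kappa(therapist_1, therapist_2), 2))
```

In this made-up example the raters agree on 8 of 10 patients, but half of that agreement would be expected by chance, so the coefficient is noticeably lower than the raw percentage of agreement.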
Z = K / SE
where K = Kappa coefficient, SE = standard error of Kappa, and Z = test statistic for the significance of Kappa.
INTRACLASS CORRELATION COEFFICIENT (ICC)
The intraclass correlation coefficient (ICC) is for quantitative variables. Use Landis and Koch’s model 2 for inter-examiner reliability, and model 3 for intra-examiner reliability (Landis RJ & Koch GG, 1977). This index also ranges from 0 to 1.
- The value 1 corresponds to perfect reproducibility between measurements.
- The value 0 indicates that the variance between measurements taken in a single patient is the same as the variance between measurements taken in different patients.
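A possible way to compute the ICC from a table of measurements is sketched below. It assumes that "model 2" and "model 3" above correspond to the single-measure two-way random-effects and two-way mixed-effects forms, often written ICC(2,1) and ICC(3,1); that mapping, the function name and the example scores are assumptions of this sketch. Both forms are obtained from the mean squares of a two-way analysis of variance (subjects by examiners).

```python
def icc_single_measure(ratings):
    """Return (icc_model_2, icc_model_3) for a subjects x examiners table.

    ratings: list of rows, one row per subject, one score per examiner.
    """
    n = len(ratings)       # number of subjects
    k = len(ratings[0])    # number of examiners
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with at least 2 rows and 2 columns")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA decomposition of the total sum of squares.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_examiners = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_subjects - ss_examiners

    ms_subjects = ss_subjects / (n - 1)
    ms_examiners = ss_examiners / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Model 2: examiners treated as a random sample, so systematic
    # differences between examiners lower the coefficient.
    icc2 = (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error + k * (ms_examiners - ms_error) / n
    )
    # Model 3: examiners treated as fixed, so their systematic
    # differences are ignored.
    icc3 = (ms_subjects - ms_error) / (ms_subjects + (k - 1) * ms_error)
    return icc2, icc3


if __name__ == "__main__":
    # Illustrative scores: 6 subjects (rows) rated by 4 examiners (columns).
    scores = [
        [9, 2, 5, 8],
        [6, 1, 3, 2],
        [8, 4, 6, 8],
        [7, 1, 2, 6],
        [10, 5, 6, 9],
        [6, 2, 4, 7],
    ]
    icc2, icc3 = icc_single_measure(scores)
    print(round(icc2, 2), round(icc3, 2))
```

In this illustrative table the examiners differ systematically in how high they score, which is why the model 2 value comes out well below the model 3 value.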
SPEARMAN’S CORRELATION TEST
This is a non-parametric correlation measure. It assumes that an arbitrary monotonic function describes the relationship between two variables, without making any assumptions about the frequency distribution of the variables. Unlike Pearson’s correlation test, it does not require the relationship between the variables to be linear, nor the variables to be measured on interval scales; it can be used for variables measured at the ordinal level. It is used when the conditions for applying the Pearson test are not met. It is a variant of the Pearson correlation test, applied to the ranks of the values, and is used when each value in itself is less important than its position relative to the other values. Its values are interpreted in the same way as those of the Pearson correlation coefficient. The Spearman correlation measures the degree of association between two quantitative variables which tend always to increase or decrease together. It is more general than Pearson’s correlation coefficient: it can also be calculated for exponential or logarithmic relationships between the variables.
WILCOXON TEST
This contrasts the null hypothesis that the sample comes from a population in which the magnitudes of the positive and negative differences between the values of the variables are the same. It is a non-parametric statistical test for comparing two samples (two treatments). The data do not need to follow the normal distribution, so it is a less restrictive test than the Student’s t-test.
SHAPIRO-WILK TEST
Although this test is less well known, it is the one recommended for contrasting the goodness-of-fit of the data to a normal distribution, especially when the sample is small (n < 30). It measures how well the sample fits a straight line when plotted on normal probability paper.
FISHER STUDENT’S t-TEST
Used when two groups are compared with regard to a quantitative variable.
When its assumptions are not met, an equivalent non-parametric test is used, such as the Mann-Whitney U test. The t-test is used to compare the means of two independent normal populations. It is a parametric statistical significance test for contrasting the null hypothesis with regard to the difference between two means. When the two means have been calculated from two completely independent samples of observations (a very unlikely situation in practice, at least from a theoretical point of view), the test is described as unpaired. When the two means have been obtained from consecutive observations of the same subjects in two different situations, the values of each individual are compared and a paired test is applied. The Student’s t-test is a type of inferential statistic. It is used to determine whether there is a significant difference between the means of two groups. As with all parametric inferential statistics, we assume that the dependent variable has a normal distribution. We specify the level of probability (alpha level, level of significance, p) which we are willing to accept before the data are collected (p < .05 is a commonly used value).
Notes about the Student’s t-test:
Five factors help to indicate whether the difference between the means of two groups can be considered significant:
Underlying assumptions of the t-test:
There are two types of Student’s t-test:
- t-test for paired samples
This refers to the difference between the mean scores of a single sample of individuals measured before and after the treatment. It can also compare the mean scores of samples of individuals who are paired in a certain way (for example, brothers and sisters, mothers and daughters, or people who are matched on specific characteristics).
- t-test for independent samples
This refers to the difference between the means of two populations. Basically, the procedure compares the means of two samples which were selected independently of each other. An example would be to compare the mathematics scores of an experimental group with those of a control group.
How do I decide which type of t-test to use?
Type-I error:
Type-II error:
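Returning to the two types of t-test described above, the difference between them can be illustrated by how the t statistic is built. The sketch below uses only the Python standard library and returns the statistic together with its degrees of freedom, to be compared with a t table at the chosen alpha level. The function names and the equal-variance (pooled) form of the independent test are assumptions of this example.

```python
from math import sqrt
from statistics import mean, stdev, variance


def paired_t(before, after):
    """t statistic and degrees of freedom for paired observations.

    Works on the per-subject differences (after - before). Raises
    ZeroDivisionError if all differences are identical.
    """
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need at least two matched pairs")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


def independent_t(group1, group2):
    """Pooled-variance t statistic and degrees of freedom for two
    independent groups (assumes similar variances in both groups)."""
    n1, n2 = len(group1), len(group2)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    pooled_var = (
        (n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)
    ) / (n1 + n2 - 2)
    t = (mean(group1) - mean(group2)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


if __name__ == "__main__":
    # Hypothetical scores for the same four subjects before and after treatment.
    print(paired_t([10, 12, 14, 16], [11, 14, 15, 18]))
    # Hypothetical scores for an experimental group and a control group.
    print(independent_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7]))
```

The paired version uses each subject as their own control, so only the spread of the differences matters; the independent version has to account for the spread within both groups.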
MANN-WHITNEY TEST
The Mann-Whitney U test is one of the best-known significance tests. It is appropriate when two independent samples of observations are measured at an ordinal level, that is, when we can say which of any two observations is the greater. It determines whether the degree of overlap between the two observed distributions is lower than would be expected by chance under the null hypothesis that the two samples come from the same population. It is a non-parametric statistical significance test of the null hypothesis that the location parameter (generally the median) is the same in two independent groups, regardless of the type of distribution of the variable (normal or otherwise). It is used to compare two populations using independent samples, that is, it is an alternative to the t-test for comparing two means using independent samples. The null hypothesis is that the medians of the two populations are equal; the alternative hypothesis may be that the median of population 1 is greater than, less than, or different from the median of population 2.
Mann-Whitney test for independent samples:
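A minimal sketch of the computation for two independent samples is given below, using only the Python standard library. The U statistic counts, over all pairs formed by one observation from each sample, how often the first sample's value is the greater (ties count one half). The two-sided p-value uses the large-sample normal approximation without tie or continuity correction, which is a simplification assumed here; for very small samples exact tables are normally used instead. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mann_whitney_u(x, y):
    """Return (u_x, u_y, p_approx) for two independent samples."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    # Count pairs in which the x value exceeds the y value; ties count 0.5.
    u_x = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    u_y = n1 * n2 - u_x
    u = min(u_x, u_y)
    # Mean and standard deviation of U under the null hypothesis.
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma          # z <= 0 because u is the smaller value
    p_approx = min(1.0, 2 * NormalDist().cdf(z))
    return u_x, u_y, p_approx


if __name__ == "__main__":
    # Hypothetical ordinal scores for two independent groups.
    print(mann_whitney_u([1, 2, 3], [4, 5, 6]))
```

A U value of 0 for one sample means that every one of its observations lies below every observation of the other sample, the most extreme separation possible.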
KRUSKAL-WALLIS TEST
Non-parametric statistical significance test for contrasting the null hypothesis that the location parameters of two or more groups are equal. The Kruskal-Wallis test is an alternative to the F-test of the analysis of variance for simple classification (one-way) designs. In this case, several groups are compared using their medians instead of their means.
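The usual form of the Kruskal-Wallis statistic, assuming no tied ranks, can be written as follows. The symbols H, k, n_i and R_i are notation introduced for this sketch: all n observations are ranked together, k is the number of groups, n_i is the number of observations in group i and R_i is the sum of the ranks in group i.

```latex
H = \frac{12}{n(n+1)} \sum_{i=1}^{k} \frac{R_i^{2}}{n_i} - 3(n+1)
```

Under the null hypothesis, and when the groups are not too small, H approximately follows a chi-squared distribution with k - 1 degrees of freedom.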
Where n is the total number of observations.
NON-PARAMETRIC TESTS
The analysis of variance assumes that the underlying distributions are normal and that the variances of the distributions being compared are similar. Pearson’s correlation coefficient assumes normality. Although parametric techniques are robust (that is, they often retain considerable power for detecting differences or similarities even when these assumptions are violated), some distributions violate the assumptions so severely that a non-parametric alternative is more desirable for detecting a difference or a similarity.
Non-parametric tests for related samples
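For related samples, the Wilcoxon test described earlier is one commonly used option. As a hedged illustration, its signed-rank statistic can be computed as below with plain Python. The function name, the convention of discarding zero differences and the use of average ranks for tied absolute differences are assumptions of this sketch.

```python
def wilcoxon_signed_rank(before, after):
    """Return (w_plus, w_minus): the sums of the ranks of the positive
    and of the negative paired differences (after - before)."""
    if len(before) != len(after):
        raise ValueError("samples must be paired and of equal length")
    # Zero differences carry no information about direction and are dropped.
    diffs = [a - b for a, b in zip(after, before) if a != b]
    ordered = sorted(diffs, key=abs)

    # Rank the absolute differences, giving tied values their average rank.
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        average_rank = (i + j) / 2 + 1   # mean of the 1-based ranks i+1 .. j+1
        for m in range(i, j + 1):
            ranks[m] = average_rank
        i = j + 1

    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return w_plus, w_minus


if __name__ == "__main__":
    # Hypothetical measurements on the same five subjects under two conditions.
    print(wilcoxon_signed_rank([10, 12, 14, 16, 18], [11, 14, 13, 20, 23]))
```

Under the null hypothesis the two rank sums should be similar; the smaller of the two is the value usually compared with the critical value from a Wilcoxon table.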
CHOOSING THE APPROPRIATE STATISTICAL TECHNIQUE
With the elements defined in the earlier paragraphs, decision trees can be established to help choose the appropriate statistical test or technique. There are more than 300 basic statistical tests, which makes it difficult to cover all of them exhaustively in this article.
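As an illustration of such a decision tree, the sketch below encodes a deliberately simplified set of rules that covers only tests named in this article. The function name, its parameters and the exact branching are hypothetical, and a real choice of test would also have to weigh sample size, study design and measurement scale in more detail.

```python
def suggest_test(outcome, groups=2, paired=False, normal=True, small_sample=False):
    """Suggest a test from a simplified decision tree.

    outcome: "quantitative" or "categorical"
    groups: number of groups (or repeated conditions) being compared
    paired: True when the same or matched subjects are measured twice
    normal: True when the quantitative variable can be assumed normal
    small_sample: True when a categorical sample is too small for chi-squared
    """
    if outcome == "categorical":
        if paired:
            return "McNemar test"
        return "Fisher's exact test" if small_sample else "Chi-squared test"
    if outcome != "quantitative":
        raise ValueError("outcome must be 'quantitative' or 'categorical'")
    if groups == 2:
        if paired:
            return "Student's t-test for paired samples" if normal else "Wilcoxon test"
        return (
            "Student's t-test for independent samples"
            if normal
            else "Mann-Whitney U test"
        )
    if groups > 2 and not paired:
        return "Analysis of variance (F-test)" if normal else "Kruskal-Wallis test"
    raise ValueError("combination not covered by this sketch")


if __name__ == "__main__":
    print(suggest_test("quantitative", groups=2, paired=False, normal=False))
    print(suggest_test("categorical", small_sample=True))
```

Writing the rules out in this way makes the branching questions explicit: the type of variable, the number of groups, whether the samples are related, and whether the parametric assumptions hold.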
Protocol designed by EMERSON and COLDTIZ and adapted by MORA, RIPPOLL et al. Reference levels for the analysis of accessibility.
THE FOLLOWING STEPS
Once the statistical analysis has been carried out, the following actions should be taken: