# Analysis of Variance (ANOVA)

Menu location: Analysis_Analysis of Variance

Basics

ANOVA is a set of statistical methods used mainly to compare the means of two or more samples. Estimates of variance are the key intermediate statistics calculated, hence the reference to variance in the title ANOVA. The different types of ANOVA reflect the different experimental designs and situations for which they have been developed.
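As a minimal sketch of how variance estimates lead to a comparison of means, the following computes the one way ANOVA F statistic as the ratio of the between-groups mean square to the within-groups mean square. The function name `one_way_anova` and the diet data are hypothetical and chosen for illustration only; this is not StatsDirect's implementation.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-groups sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    ms_between = ss_between / df_between  # variance estimate from the group means
    ms_within = ss_within / df_within     # pooled variance estimate within groups
    return ms_between / ms_within, df_between, df_within


# Hypothetical weight gains under three diets (illustrative numbers only).
diets = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [8.0, 9.0, 10.0]]
f_stat, df_b, df_w = one_way_anova(diets)
print(f_stat, df_b, df_w)
```

A large F indicates that the group means vary more than would be expected from the variation within groups; the P value is then read from the F distribution with the two degrees of freedom returned.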

Excellent accounts of ANOVA are given by Armitage and Berry (1994) and Kleinbaum et al. (1998). Nonparametric alternatives to ANOVA are discussed by Conover (1999) and Hollander and Wolfe (1999).

ANOVA and regression

ANOVA can be treated as a special case of general linear regression in which the independent/predictor variables are nominal categories or factors. Each value that a factor can take is referred to as a level. A factor with k levels (e.g. three different types of diet in a study of diet on weight gain) is coded not as a single column (e.g. of diet 1 to 3) but as k-1 dummy variables. The dependent/outcome variable in the regression consists of the study observations.
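The dummy variable coding can be sketched as follows. A three-level diet factor is expanded into two indicator columns, and an ordinary least squares fit then returns the mean of the reference level as the intercept and the differences of the other level means from it as the dummy coefficients. The helper names `dummy_code` and `least_squares` and the data are hypothetical; the small solver is for illustration only and is not how a statistics package would fit the model.

```python
def dummy_code(levels, reference):
    """Code a factor with k levels as k-1 indicator (dummy) columns."""
    others = [lv for lv in sorted(set(levels)) if lv != reference]
    rows = [[1.0 if lv == other else 0.0 for other in others] for lv in levels]
    return others, rows


def least_squares(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(p)
    ]
    for col in range(p):
        # Partial pivoting: bring the largest remaining entry into the pivot row.
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for r in range(p):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] for i in range(p)]


# Hypothetical data: diet level (1 to 3) and weight gain for nine subjects.
diet = [1, 1, 1, 2, 2, 2, 3, 3, 3]
gain = [4.0, 5.0, 6.0, 6.0, 7.0, 8.0, 8.0, 9.0, 10.0]

others, dummies = dummy_code(diet, reference=1)  # two dummy columns for three levels
X = [[1.0] + row for row in dummies]             # prepend the intercept column
coef = least_squares(X, gain)
print(others, coef)
```

Here diet 1 is treated as the reference level, so the two coefficients after the intercept estimate how far the mean gains for diets 2 and 3 lie from the diet 1 mean.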

General linear regression can be used in this way to build more complex ANOVA models than those described in this section; this is best done under expert statistical guidance.

Fixed vs. random effects

A fixed factor has only the levels used in the analysis (e.g. sex, age, blood group). A random factor has many possible levels, of which only some are used in the analysis (e.g. time periods, subjects, observers). Some factors that are usually treated as fixed may also be treated as random if the study regards their levels as a sample from a larger group (e.g. treatments, locations, tests).

Most general statistical texts arrange data for ANOVA into tables in which columns represent fixed factors, and the one way and two way analyses described are fixed factor methods.

Multiple comparisons

ANOVA gives an overall test for the difference between the means of k groups. StatsDirect enables you to compare all k(k-1)/2 possible pairs of means using methods that are designed to control the inflated type I error rate that would arise if you used two sample methods, such as the t test, for these comparisons. The multiple comparison/contrast methods offered by StatsDirect are Tukey(-Kramer), Scheffé, Newman-Keuls, Dunnett and Bonferroni (Armitage and Berry, 1994; Wallenstein, 1980; Liddell, 1983; Miller, 1981; Hsu, 1996; Kleinbaum et al., 1998). See multiple comparisons for more information.
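To illustrate why unadjusted pairwise tests are a problem, the sketch below counts the k(k-1)/2 pairs for four groups, estimates the chance of at least one false positive if every pair were tested at the same unadjusted significance level, and shows the per-comparison level under the Bonferroni adjustment. The function name `pairwise_comparisons` and the group labels are hypothetical, and the familywise figure assumes independent tests, which is a simplification because pairwise comparisons share data.

```python
from itertools import combinations


def pairwise_comparisons(group_names, alpha=0.05):
    """List all pairs of groups and the effect of testing each at level alpha."""
    pairs = list(combinations(group_names, 2))
    m = len(pairs)  # equals k(k-1)/2 for k groups

    # Chance of at least one type I error across m tests at unadjusted alpha,
    # assuming (as a simplification) that the tests are independent.
    familywise_unadjusted = 1 - (1 - alpha) ** m

    # Bonferroni: test each comparison at alpha / m instead.
    bonferroni_alpha = alpha / m
    return pairs, familywise_unadjusted, bonferroni_alpha


pairs, fwer, per_test_alpha = pairwise_comparisons(
    ["diet 1", "diet 2", "diet 3", "diet 4"]
)
print(len(pairs), round(fwer, 3), round(per_test_alpha, 4))
```

With four groups there are six comparisons, so under this assumption the chance of at least one spurious "significant" difference is roughly a quarter rather than the nominal 5%.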

Further methods

There are many possible ANOVA designs. StatsDirect covers the common designs in its ANOVA section and provides general tools (see general linear regression and dummy variables) for building more complex designs.

Other software, such as SAS and Genstat, provides further specific ANOVA designs. For example, balanced incomplete block design:

- With complete missing blocks, you should consider a balanced incomplete block design, provided the number of missing blocks does not exceed the number of treatments.

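| Block | Treatment 1 | Treatment 2 | Treatment 3 | Treatment 4 |
|---|---|---|---|---|
| A | x | x | x |   |
| B | x | x |   | x |
| C | x |   | x | x |
| D |   | x | x | x |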
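The balance of such a design can be checked with a short sketch. It assumes a layout of four blocks, each receiving three of four treatments and each omitting a different treatment; that assignment of treatments to blocks is an assumption made for illustration. The function name `bibd_parameters` is hypothetical. A design is treated as balanced here if all blocks have the same size, every treatment appears in the same number of blocks, and every pair of treatments occurs together in the same number of blocks.

```python
from collections import Counter
from itertools import combinations


def bibd_parameters(blocks):
    """Return (replications, concurrences) if the design is balanced, else None."""
    # How many blocks each treatment appears in.
    treatment_counts = Counter(t for members in blocks.values() for t in members)
    # How many blocks each pair of treatments shares.
    pair_counts = Counter(
        pair
        for members in blocks.values()
        for pair in combinations(sorted(members), 2)
    )
    treatments = sorted(treatment_counts)
    all_pairs = list(combinations(treatments, 2))

    block_sizes = {len(members) for members in blocks.values()}
    replications = set(treatment_counts.values())
    concurrences = {pair_counts[p] for p in all_pairs}  # absent pairs count as 0

    if len(block_sizes) == 1 and len(replications) == 1 and len(concurrences) == 1:
        return replications.pop(), concurrences.pop()
    return None


# Assumed layout: each block omits a different one of the four treatments.
design = {"A": {1, 2, 3}, "B": {1, 2, 4}, "C": {1, 3, 4}, "D": {2, 3, 4}}
print(bibd_parameters(design))
```

For the assumed layout, each treatment appears in three blocks and each pair of treatments appears together in two blocks, which is what makes the incomplete design balanced.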

Complex ANOVA should not be attempted without expert statistical guidance. Beware of situations where overly complex analysis is used to compensate for poor experimental design. There is no substitute for good experimental design.