The concept of degrees of freedom is central to estimating population statistics from samples. "Degrees of freedom" is commonly abbreviated to df.
Think of df as a mathematical restriction that needs to be put in place when estimating one statistic from an estimate of another.
Let us take an example of data that have been drawn at random from a normal distribution. Normal distributions need only two parameters (mean and standard deviation) for their definition; e.g. the standard normal distribution has a mean of 0 and standard deviation (sd) of 1. The population values of mean and sd are referred to as mu and sigma respectively, and the sample estimates are x-bar and s.
In order to estimate sigma, we must first have estimated mu. Thus, mu is replaced by x-bar in the formula for sigma: we work with the deviations from x-bar rather than the deviations from mu. This substitution imposes a restriction, because the n deviations from x-bar always sum to zero, so only n-1 of them are free to vary. Thus, the degrees of freedom are n-1 in the equation for s below:
Standard deviation in a population is:

σ = √( Σ(x − μ)² / n )

[x is a value from the population, μ is the mean of all x, n is the number of x in the population, Σ is the summation]
The estimate of population standard deviation calculated from a random sample is:

s = √( Σ(xᵢ − x̄)² / (n − 1) )

[xᵢ is the ith observation from a sample of the population, x̄ is the sample mean, n is the sample size, Σ is the summation]
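The two formulas can be checked with a short sketch in Python (using the standard library's statistics module; the sample values here are made up for illustration):

```python
import statistics

# A small hypothetical sample (any numbers will do for illustration)
sample = [4.2, 5.1, 3.8, 6.0, 4.9, 5.5]
n = len(sample)

x_bar = sum(sample) / n
deviations = [x - x_bar for x in sample]

# The deviations from x-bar must sum to zero: this is the
# restriction that costs one degree of freedom.
print(round(sum(deviations), 10))  # 0.0

# Population formula: divide by n
pop_sd = (sum(d**2 for d in deviations) / n) ** 0.5

# Sample estimate: divide by n - 1 (the degrees of freedom)
s = (sum(d**2 for d in deviations) / (n - 1)) ** 0.5

# statistics.pstdev and statistics.stdev implement these two formulas
print(abs(pop_sd - statistics.pstdev(sample)) < 1e-12)  # True
print(abs(s - statistics.stdev(sample)) < 1e-12)        # True
```

Note that s is always slightly larger than the population formula applied to the same data, since it divides by the smaller quantity n − 1.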
When this principle of restriction is applied to regression and analysis of variance, the general result is that you lose one degree of freedom for each parameter estimated prior to estimating the (residual) standard deviation.
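As a sketch of this rule, consider simple linear regression on made-up data: two parameters (intercept and slope) are estimated before the residual standard deviation, so that estimate divides by n − 2 degrees of freedom.

```python
# Made-up data for a simple linear regression y = a + b*x
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares estimates of slope and intercept
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
        / sum((xi - x_bar) ** 2 for xi in x)
intercept = y_bar - slope * x_bar

residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# One df is lost per estimated parameter, so n - 2 remain
# for the residual standard deviation.
residual_df = n - 2
residual_sd = (sum(e**2 for e in residuals) / residual_df) ** 0.5
print(residual_df)  # 4
```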
Another way of thinking about the restriction principle behind degrees of freedom is to imagine contingencies. For example, imagine you have four numbers (a, b, c and d) that must add up to a total of m; you are free to choose the first three numbers at random, but the fourth must be chosen so that it makes the total equal to m. Thus your degrees of freedom are three.
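The contingency example can be made concrete in a few lines of Python (the range of the random draws is arbitrary):

```python
import random

m = 100.0  # the required total (an arbitrary choice)

# The first three numbers are free to take any value
a = random.uniform(-10, 10)
b = random.uniform(-10, 10)
c = random.uniform(-10, 10)

# The fourth is forced by the constraint a + b + c + d == m,
# so only three of the four values were free: df = 3.
d = m - (a + b + c)

print(abs((a + b + c + d) - m) < 1e-9)  # True
```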