Grouped Linear Regression with Covariance Analysis
Menu location: Analysis_Regression and Correlation_Grouped Linear_Covariance.
This function compares the slopes and separations of two or more simple linear regression lines.
The method involves examination of regression parameters for a group of x,Y pairs in relation to a common fitted function. This provides an analysis of variance that shows whether or not there is a significant difference between the slopes of the individual regression lines as a whole. StatsDirect then compares all of the slopes individually. The vertical distance between each regression line is then examined using analysis of covariance, and the corrected means are given (Armitage and Berry, 1994).
Assumptions:
 Y replicates are a random sample from a normal distribution
 deviations from the regression line (residuals) follow a normal distribution
 deviations from the regression line (residuals) have uniform variance
This is just one facet of analysis of covariance; there are additional and alternative methods. For further information, see Kleinbaum et al. (1998) and Armitage and Berry (1994). Analysis of covariance is best carried out as part of a broader regression modelling exercise by a Statistician.
Technical Validation
Slopes of several regression lines are compared by analysis of variance as follows (Armitage and Berry, 1994):

SS_{common} = (Σ_{j} SxY_{j})² / (Σ_{j} Sxx_{j})
SS_{between} = Σ_{j} (SxY_{j}² / Sxx_{j}) - SS_{common}
SS_{total} = Σ_{j} SYY_{j}
SS_{residual} = SS_{total} - Σ_{j} (SxY_{j}² / Sxx_{j})

where the sums run over the j = 1 … k groups; SS_{common} is the sum of squares due to the common slope of the k regression lines (1 degree of freedom), SS_{between} is the sum of squares due to differences between the slopes (k - 1 degrees of freedom), SS_{total} is the total sum of squares within groups, and SS_{residual} is the residual sum of squares about the separate regression lines. Sxx_{j} is the sum of squares about the mean x observation in the jth group, SxY_{j} is the sum of products of the deviations of x,Y pairs from their means in the jth group, and SYY_{j} is the sum of squares about the mean Y observation in the jth group.
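This partition can be sketched in a few lines of Python (a minimal NumPy illustration, not StatsDirect's implementation; the function name and return values are my own):

```python
import numpy as np

def grouped_slope_anova(groups):
    """ANOVA comparing the slopes of k regression lines (after Armitage and Berry).
    groups: list of (x, y) pairs of 1-D NumPy arrays, one pair per line."""
    Sxx = np.array([np.sum((x - x.mean()) ** 2) for x, _ in groups])
    SxY = np.array([np.sum((x - x.mean()) * (y - y.mean())) for x, y in groups])
    SYY = np.array([np.sum((y - y.mean()) ** 2) for _, y in groups])
    n = sum(len(x) for x, _ in groups)
    k = len(groups)
    ss_common = SxY.sum() ** 2 / Sxx.sum()     # common slope, 1 df
    ss_separate = np.sum(SxY ** 2 / Sxx)       # regression SS with separate slopes
    ss_between = ss_separate - ss_common       # between slopes, k - 1 df
    ss_total = SYY.sum()                       # within-groups total, n - k df
    ss_resid = ss_total - ss_separate          # separate residuals, n - 2k df
    ms_resid = ss_resid / (n - 2 * k)
    vr_common = ss_common / ms_resid           # variance ratio (F) for common slope
    vr_between = (ss_between / (k - 1)) / ms_resid
    return ss_common, ss_between, ss_resid, vr_common, vr_between
```

Referring the two variance ratios to the F distribution with (1, n - 2k) and (k - 1, n - 2k) degrees of freedom gives the P values shown in the example output below.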
Vertical separation of slopes of several regression lines is tested by analysis of covariance as follows (Armitage and Berry, 1994):

SS = YY - (xY)² / xx

where SS is a corrected sum of squares formed from the uncorrected sums of squares (YY and xx) and products (xY), applied to the within-groups and total rows; the corrected between-groups sum of squares is obtained by subtracting the corrected within from the corrected total. The constituent sums of products and squares are partitioned between groups, within groups and total as above.
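A corresponding sketch of the covariance test, again a minimal NumPy illustration with hypothetical names rather than StatsDirect's code:

```python
import numpy as np

def ancova_separation(groups):
    """Analysis of covariance: test the vertical separation of k regression
    lines assuming a common slope. groups: list of (x, y) NumPy array pairs."""
    allx = np.concatenate([x for x, _ in groups])
    ally = np.concatenate([y for _, y in groups])
    # uncorrected within-group sums of squares and products, pooled over groups
    xx_w = sum(np.sum((x - x.mean()) ** 2) for x, _ in groups)
    xy_w = sum(np.sum((x - x.mean()) * (y - y.mean())) for x, y in groups)
    yy_w = sum(np.sum((y - y.mean()) ** 2) for _, y in groups)
    # uncorrected totals, ignoring the grouping
    xx_t = np.sum((allx - allx.mean()) ** 2)
    xy_t = np.sum((allx - allx.mean()) * (ally - ally.mean()))
    yy_t = np.sum((ally - ally.mean()) ** 2)
    k, n = len(groups), len(allx)
    b = xy_w / xx_w                            # common slope
    ss_within = yy_w - xy_w ** 2 / xx_w        # corrected within, n - k - 1 df
    ss_total = yy_t - xy_t ** 2 / xx_t         # corrected total, n - 2 df
    ss_between = ss_total - ss_within          # k - 1 df
    vr = (ss_between / (k - 1)) / (ss_within / (n - k - 1))
    # corrected (adjusted) group means of Y at the overall mean x
    corrected = [y.mean() - b * (x.mean() - allx.mean()) for x, y in groups]
    return b, ss_between, ss_within, vr, corrected
```

The variance ratio is referred to the F distribution with (k - 1, n - k - 1) degrees of freedom, and differences between corrected means give the vertical separations reported in the example output below.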
Data preparation
If there are equal numbers of replicate Y observations, or single Y observations for each x, then it is best to prepare and select your data using a group identifier variable. For example, with three replicates you would prepare five columns of data: group identifier, x, y1, y2 and y3. Remember to choose the "Groups by identifier" option in this case.
If there are unequal numbers of replicate Y observations for each x then you must prepare the x data in separate columns by group, and the Y data in separate columns by group and observation (i.e. Y for group 1 observation 1… r rows long, where r is the number of repeat observations). Remember to choose the "Groups by column" option in this case. This is done in the example below.
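The two layouts can be pictured with a toy dataset (all column names and numbers here are hypothetical, chosen only to show the shape of each layout):

```python
# Groups by identifier: one group column, one x column, and equal
# numbers of replicate columns y1..y3 -- five columns in total
groups_by_identifier = {
    "group": ["A", "A", "B", "B"],
    "x":     [0.5, 1.0, 0.5, 1.0],
    "y1":    [1.1, 2.0, 0.9, 1.8],
    "y2":    [1.3, 2.2, 1.0, 1.9],
    "y3":    [1.2, 2.1, 0.8, 2.0],
}

# Groups by column: one x column per group, plus one Y column per
# group and x level, so the numbers of replicates may differ
groups_by_column = {
    "x_A":   [0.5, 1.0],
    "y_A_1": [1.1, 1.3, 1.2],   # replicates at x = 0.5 for group A
    "y_A_2": [2.0, 2.2],        # replicates at x = 1.0 (fewer allowed)
    "x_B":   [0.5, 1.0],
    "y_B_1": [0.9, 1.0],
    "y_B_2": [1.8, 1.9, 2.0],
}
```

In the first layout every row is complete, so a single identifier column is enough; in the second, each Y column can have its own length, which is what the "Groups by column" option expects.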
Example
From Armitage and Berry (1994).
Test workbook (Regression worksheet: Log Dose_Std, BD 1_Std, BD 2_Std, BD 3_Std, Log Dose_I, BD 1_I, BD 2_I, BD 3_I, Log Dose_F, BD 1_F, BD 2_F, BD 3_F).
Three different preparations of Vitamin D are tested for their effect on bones by feeding them to rats that have an induced lack of mineral in their bones. X-ray methods are used to assess the remineralisation of bones in response to the Vitamin D.
For the standard preparation:
Log dose of Vit D
0.544  0.845  1.146
Bone density score
0      1.5    2
0      2.5    2.5
1      5      5
2.75   6      4
2.75   4.25   5
1.75   2.75   4
2.75   1.5    2.5
2.25   3      3.5
2.25          3
2.5           2
              3
              4
              4
For alternative preparation I:
Log dose of Vit D  
0.398  0.699  1.000  1.301  1.602 
Bone density score  
0  1  1.5  3  3.5 
1  1.5  1  3  3.5 
0  1.5  2  5.5  4.5 
0  1  3.5  2.5  3.5 
0  1  2  1  3.5 
0.5  0.5  0  2  3 
For alternative preparation F:
Log dose of Vit D  
0.398  0.699  1.000 
Bone density score  
2.75  2.5  3.75 
2  2.75  5.25 
1.25  2.25  6 
2  2.25  5.5 
0  3.75  2.25 
0.5  3.5 
To analyse these data in StatsDirect you must first enter them into 14 appropriately labelled columns in the workbook. The first column is just three rows long and contains the three log doses of vitamin D for the standard preparation. The next three columns contain the repeated measures of bone density for each of the three levels of log dose of vitamin D represented by the rows of the first column. This pattern is then repeated for the other two preparations. Alternatively, open the test workbook using the file open function of the file menu. Then select Covariance from the Grouped Linear section of the Regression and Correlation section of the Analysis menu. Select the columns marked "Log Dose_Std", "Log Dose_I" and "Log Dose_F" when you are prompted for the predictor (x) variables; these contain the log dose levels (logarithms are taken because, from previous research, the relationship between bone remineralisation and Vitamin D is known to be log-linear). Make sure that the "use Y replicates" option is checked when you are prompted for it. Then select the outcome (Y) variables that represent the replicates. You will have to select three, five and three columns in just three selection actions, because these are the numbers of corresponding dose levels in the x variables, in the order in which you selected them.
Alternatively, these data could have been entered in just three pairs of workbook columns representing the three preparations, with a log dose column and a column of the mean bone density score for each dose level. By accepting the more long-winded input of replicates, StatsDirect encourages you to run a test of linearity on your data.
For this example:
Grouped linear regression
Source of variation  SSq  DF  MSq  VR  P
Common slope  78.340457  1  78.340457  67.676534  P < 0.0001 
Between slopes  4.507547  2  2.253774  1.946984  P = 0.1501 
Separate residuals  83.34518  72  1.157572  
Within groups  166.193185  75 
Common slope is significant
Difference between slopes is NOT significant
Slope comparisons:
slope 1 (Log Dose_Std) v slope 2 (Log Dose_I) = 2.616751 v 2.796235
Difference (95% CI) = 0.179484 (-1.576065 to 1.935032)
t = 0.203808, P = 0.8391
slope 1 (Log Dose_Std) v slope 3 (Log Dose_F) = 2.616751 v 4.914175
Difference (95% CI) = 2.297424 (-0.245568 to 4.840416)
t = 1.800962, P = 0.0759
slope 2 (Log Dose_I) v slope 3 (Log Dose_F) = 2.796235 v 4.914175
Difference (95% CI) = 2.11794 (-0.135343 to 4.371224)
t = 1.873726, P = 0.065
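Each comparison above is a t test on the difference between two slopes, with a standard error built from the pooled residual mean square (1.157572 on 72 df in this example). A minimal sketch under that assumption, with names of my own choosing:

```python
import numpy as np

def slope_difference_t(x1, y1, x2, y2, ms_resid):
    """t statistic for the difference between two regression slopes,
    given the pooled residual mean square ms_resid."""
    def slope_and_sxx(x, y):
        x, y = np.asarray(x, float), np.asarray(y, float)
        sxx = np.sum((x - x.mean()) ** 2)
        b = np.sum((x - x.mean()) * (y - y.mean())) / sxx
        return b, sxx
    b1, sxx1 = slope_and_sxx(x1, y1)
    b2, sxx2 = slope_and_sxx(x2, y2)
    # SE of (b1 - b2) from the pooled residual variance
    se = np.sqrt(ms_resid * (1.0 / sxx1 + 1.0 / sxx2))
    return (b1 - b2) / se
```

The statistic is referred to the t distribution on the residual degrees of freedom to give the two-sided P values shown above.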
Covariance analysis
Uncorrected:
Source of variation  YY  xY  xx  DF 
Between groups  17.599283  -3.322801  0.988515  2 
Within  166.193185  25.927266  8.580791  8 
Total  183.792468  22.604465  9.569306  10 
Corrected:
Source of variation  SSq  DF  MSq  VR 
Between groups  42.543829  2  21.271915  1.694921 
Within  87.852727  7  12.55039  
Total  130.396557  9 
P = 0.251
Corrected Y means ± SE for baseline mean predictor of 0.884372:
Y' = 2.901917 ± 2.045389 (line 1, Log Dose_Std)
Y' = 1.533957 ± 1.590482 (line 2, Log Dose_I)
Y' = 3.398345 ± 2.057601 (line 3, Log Dose_F)
Line separations (common slope = 3.021547):
line 1 (Log Dose_Std) vs line 2 (Log Dose_I) Vertical separation = 1.367959
95% CI = -4.760348 to 7.496267
t = 0.527831, (7 df), P = 0.6139
line 1 (Log Dose_Std) vs line 3 (Log Dose_F) Vertical separation = -0.496428
95% CI = -7.354566 to 6.36171
t = -0.171164, (7 df), P = 0.8689
line 2 (Log Dose_I) vs line 3 (Log Dose_F) Vertical separation = -1.864388
95% CI = -8.042375 to 4.3136
t = -0.713594, (7 df), P = 0.4986
The common slope is highly significant, and the overall test for a difference between the slopes was non-significant. If our assumption of linearity holds true, we can conclude that these lines are reasonably parallel. Looking more closely at the individual slopes, preparation F came close to differing significantly from the other two, but this difference was not large enough to make the overall slope comparison significantly heterogeneous.
The analysis of covariance did not show any significant vertical separation of the three regression lines.