# Principal Components Analysis and Cronbach's Alpha Reliability Coefficient

Menu locations:

Analysis_Regression and Correlation_Principal Components

Analysis_Agreement_Reliability and Reducibility

This function provides principal components analysis (PCA), based upon correlation or covariance, and Cronbach's coefficient alpha for scale reliability.

See questionnaire design for more information on how to use these methods in designing questionnaires or other study methods with multiple elements.

Principal components analysis is most often used as a data reduction technique for selecting a subset of "highly predictive" variables from a larger group of variables. For example, in order to select a sample of questions from a thirty-question questionnaire you could use this method to find a subset that gave the "best overall summary" of the questionnaire (Johnson and Wichern, 1998; Armitage and Berry, 1994; Everitt and Dunn, 1991; Krzanowski, 1988).

There are problems with this approach, and principal components analysis is often wrongly applied and badly interpreted. Please consult a statistician before using this method.

PCA does not assume any particular distribution of your original data but it is very sensitive to variance differences between variables. These differences might lead you to the wrong conclusions. For example, you might be selecting variables on the basis of sampling differences and not their "real" contributions to the group. Armitage and Berry (1994) give an example of visual analogue scale results to which principal components analysis was applied after the data had been transformed to angles as a way of stabilising variances.
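The sensitivity to variance differences can be seen in a small sketch. The data below are made up for illustration, and NumPy is assumed to be available; this is not StatsDirect's own code. Two variables follow a similar pattern, but the second is recorded on a scale 100 times larger. With the covariance matrix the first component is almost entirely the large-scale variable, whereas with the correlation matrix both variables contribute equally.

```python
import numpy as np

# Made-up data: two variables with a similar pattern, but the second is
# recorded on a scale 100 times larger than the first.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0]) * 100.0
X = np.column_stack([x1, x2])


def first_component(matrix):
    """Return (share of total variance, loadings) for the largest eigenvalue."""
    eigenvalues, eigenvectors = np.linalg.eigh(matrix)  # ascending order
    return eigenvalues[-1] / eigenvalues.sum(), eigenvectors[:, -1]


cov_share, cov_loadings = first_component(np.cov(X, rowvar=False))
corr_share, corr_loadings = first_component(np.corrcoef(X, rowvar=False))

# Absolute loadings are printed because the sign of an eigenvector is arbitrary.
print("covariance: ", round(float(cov_share), 4), np.round(np.abs(cov_loadings), 3))
print("correlation:", round(float(corr_share), 4), np.round(np.abs(corr_loadings), 3))
```

Here the difference between the two analyses comes only from the measurement scale, which is the kind of artefact that a variance-stabilising transformation or a correlation-based analysis is meant to avoid.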

Another problem area with this method is the aim for an orthogonal or uncorrelated subset of variables. Consider the questionnaire problem again: it is fair to say that a pair of highly correlated questions are serving much the same purpose, thus one of them should be dropped. The component dropped is most often the one that has the lower correlation with the overall score. It is not reasonable, however, to seek optimal non-correlation in the selected subset of questions. There may be many "real world" reasons why particular questions should remain in your final questionnaire. It is almost impossible to design a questionnaire where all of the questions have the same importance to every subject studied. For these reasons you should cast a net of questions that cover what you are trying to measure as a whole. This sort of design requires strong knowledge of what you are studying combined with strong appreciation of the limitations of the statistical methods used.

Everitt and Dunn (1991) outline PCA and other multivariate methods. McDowell and Newell (1996) and Streiner and Norman (1995) offer practical guidance on the design and analysis of questionnaires.

__Factor analysis vs. principal components__

Factor analysis (FA) is a child of PCA, and the results of PCA are often wrongly labelled as FA. A factor is simply another word for a component. In short, PCA begins with observations and looks for components, i.e. working from data toward a hypothetical model, whereas FA works the other way around. Technically, FA is PCA with some rotation of axes. There are different types of rotations, e.g. varimax (axes are kept orthogonal/perpendicular during rotations) and oblique Procrustean (axes are allowed to form oblique patterns during rotations), and there is disagreement over which to use and how to implement them. Unsurprisingly, FA is misused a lot. There is usually a better analytical route that avoids FA; you should seek the advice of a statistician if you are considering it.

__Data preparation__

To prepare data for principal components analysis in StatsDirect you must first enter them in the workbook. Use a separate column for each variable (component) and make sure that each row corresponds to the observations from one subject. Missing data values in a row will cause that row / subject to be dropped from the analysis. You have the option of investigating either correlation or covariance matrices; most often you will need the correlation matrix. As discussed above, it might be appropriate to transform your data before applying this method.

For the example of 0 to 7 scores from a questionnaire you would enter your data in the workbook in the following format. You might want to transform these data first (Armitage and Berry, 1994).

| | Question 1 | Question 2 | Question 3 | Question 4 | Question 5 |
| --- | --- | --- | --- | --- | --- |
| subject 1: | 5 | 7 | 4 | 1 | 5 |
| subject 2: | 3 | 3 | 2 | 2 | 6 |
| subject 3: | 2 | 2 | 4 | 3 | 7 |
| subject 4: | 0 | 0 | 5 | 4 | 2 |
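The row-wise handling of missing data described above can be sketched as follows. The scores are those of the four subjects in the layout above, except that one value has been made missing purely for illustration; NumPy and the use of NaN as the missing-value marker are assumptions, not a description of how StatsDirect stores data.

```python
import numpy as np

# Scores for four subjects (rows) on five questions (columns).
# The NaN for subject 2, question 3 is hypothetical, added for illustration.
scores = np.array([
    [5, 7, 4, 1, 5],
    [3, 3, np.nan, 2, 6],
    [2, 2, 4, 3, 7],
    [0, 0, 5, 4, 2],
], dtype=float)

# Keep only the subjects with a complete set of answers.
complete = scores[~np.isnan(scores).any(axis=1)]
print(complete.shape)
```

A single missing answer removes that subject's whole row, so heavy missingness can shrink the analysed sample considerably.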

__Internal consistency and deletion of individual components__

Cronbach's alpha is a useful statistic for investigating the internal consistency of a questionnaire. If each variable selected for PCA represents test scores from an element of a questionnaire, StatsDirect gives the overall alpha and the alpha that would be obtained if each element in turn were dropped. If you are using weights then you should use the weighted scores. You should not enter the overall test score as this is assumed to be the sum of the elements you have specified. For most purposes alpha should be above 0.8 to support reasonable internal consistency. If the deletion of an element causes a considerable increase in alpha then you should consider dropping that element from the test. StatsDirect highlights increases of more than 0.1 but this must be considered along with the "real world" relevance of that element to your test. A standardised version of alpha is calculated by standardising all items in the scale so that their mean is 0 and variance is 1 before the summation part of the calculation is done (Streiner and Norman, 1995; McDowell and Newell, 1996; Cronbach, 1951). You should use standardised alpha if there are substantial differences in the variances of the elements of your test/questionnaire.
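As a rough illustration of these calculations, the sketch below computes alpha, the standardised alpha and the alpha obtained when each element is dropped in turn. It assumes the usual textbook form of alpha, k/(k-1) multiplied by (1 - sum of element variances / variance of the total score); the function names and the scores are made up, and this is not StatsDirect's implementation.

```python
from statistics import mean, variance


def cronbach_alpha(items):
    """items: list of columns, one list of scores per questionnaire element."""
    k = len(items)
    totals = [sum(row) for row in zip(*items)]  # overall score per subject
    item_variance = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance / variance(totals))


def standardise(col):
    """Rescale one element to mean 0 and variance 1."""
    m, sd = mean(col), variance(col) ** 0.5
    return [(x - m) / sd for x in col]


def alpha_if_dropped(items):
    """Alpha recomputed with each element left out in turn."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]


# Made-up scores: three elements (columns) for five subjects.
items = [[2, 4, 1, 5, 3], [3, 4, 2, 5, 3], [3, 5, 1, 4, 3]]

alpha = cronbach_alpha(items)
alpha_std = cronbach_alpha([standardise(col) for col in items])
changes = [a - alpha for a in alpha_if_dropped(items)]

print(round(alpha, 6), round(alpha_std, 6), [round(c, 6) for c in changes])
```

A positive entry in `changes` means that alpha rises when that element is removed; the 0.1 threshold mentioned above would be applied to these differences.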

__Technical Validation__

Singular value decomposition (SVD) is used to calculate the variance contribution of each component of a correlation or covariance matrix (Krzanowski, 1988; Chan, 1982):

The SVD of an n by m matrix **X** is **X = UΣV'**. **U** and **V** have orthonormal columns, i.e. **U'U = I** and **V'V = VV' = I**, where **V'** is the transpose of **V**. **U** is formed from m column vectors (n elements each) and **V** is formed from m column vectors (m elements each). **Σ** is a diagonal matrix with non-negative diagonal entries (the singular values) in non-increasing order. If **X** is a mean-centred, n by m matrix where n>m and rank r = m (i.e. full rank) then the first r columns of **V** define the first r principal components of **X**. The positive eigenvalues of **X'X** or **XX'** are the squares of the diagonal entries of **Σ**. The coefficients or latent vectors are contained in **V**.
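A minimal sketch of this calculation, assuming NumPy and made-up data, is given below. The columns are standardised so that the analysis corresponds to the correlation matrix; dividing the squared singular values by n - 1 is the step assumed here to turn eigenvalues of the cross-product matrix into component variances. It is an illustration of the technique, not StatsDirect's code.

```python
import numpy as np

# Made-up data: 6 subjects (rows) by 3 variables (columns).
X = np.array([
    [1.0, 2.0, 6.0],
    [2.0, 1.0, 5.0],
    [3.0, 4.0, 4.0],
    [4.0, 3.0, 3.0],
    [5.0, 6.0, 1.0],
    [6.0, 5.0, 2.0],
])
n = X.shape[0]

# Standardise the columns so the analysis is of the correlation matrix;
# subtract the means only for a covariance-based analysis.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

U, s, Vt = np.linalg.svd(Z, full_matrices=False)  # Z = U @ diag(s) @ Vt
eigenvalues = s ** 2 / (n - 1)                    # variance of each component
proportion = eigenvalues / eigenvalues.sum()
coefficients = Vt.T                               # one column per component

print(np.round(eigenvalues, 4), np.round(proportion, 4))
```

For a correlation-based analysis the eigenvalues sum to the number of variables, which gives a quick check on the output.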

Principal component scores are derived from **U** and **Σ**: the matrix of scores is **UΣ**, which is equal to **XV**. Keeping only the leading components gives the approximation **Y** of **X** that minimises trace{**(X-Y)(X-Y)'**}. For a correlation matrix, the principal component score is calculated for the standardized variable, i.e. the original datum minus the mean of the variable then divided by its standard deviation.
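The scores can be sketched in the same way, again assuming NumPy and made-up data. The sketch uses the standard identity that the scores from the SVD, U scaled by the singular values, equal the standardised data projected onto the coefficient vectors; it is not taken from StatsDirect's code.

```python
import numpy as np

# Made-up data: 6 subjects (rows) by 3 variables (columns).
X = np.array([
    [1.0, 2.0, 6.0],
    [2.0, 1.0, 5.0],
    [3.0, 4.0, 4.0],
    [4.0, 3.0, 3.0],
    [5.0, 6.0, 1.0],
    [6.0, 5.0, 2.0],
])
n = X.shape[0]

# Correlation-based analysis: centre each variable and divide by its
# standard deviation before decomposing.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
U, s, Vt = np.linalg.svd(Z, full_matrices=False)

scores = U * s              # same as U @ np.diag(s)
projected = Z @ Vt.T        # standardised data projected onto the coefficients
score_cov = np.cov(scores, rowvar=False)

print(np.allclose(scores, projected))
print(np.round(score_cov, 4))
```

The covariance matrix of the scores is diagonal, with the component variances on the diagonal, which reflects the fact that the components are uncorrelated.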

Scale reversal is detected by assessing the correlation between the input variables and the scores for the first principal component.
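One way this check might be sketched is shown below, assuming NumPy and made-up data in which the third variable runs in the opposite direction to the other two. Because the sign of a principal component is arbitrary, the sketch orients the first component so that most variables correlate positively with it; that orientation rule and the function name `reversed_scales` are assumptions for illustration, not StatsDirect's documented procedure.

```python
import numpy as np


def reversed_scales(X):
    """Flag variables that correlate negatively with the first component scores."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    first_scores = U[:, 0] * s[0]
    r = np.array([np.corrcoef(Z[:, j], first_scores)[0, 1]
                  for j in range(Z.shape[1])])
    # Assumed orientation rule: flip the component if most correlations
    # are negative, so that the minority are the ones flagged.
    if (r < 0).sum() > r.size / 2:
        r = -r
    return r < 0


# Made-up data: the third variable is scored in the opposite direction.
X = np.array([
    [1.0, 2.0, 6.0],
    [2.0, 1.0, 5.0],
    [3.0, 4.0, 4.0],
    [4.0, 3.0, 3.0],
    [5.0, 6.0, 1.0],
    [6.0, 5.0, 2.0],
])

flags = reversed_scales(X)
print(flags)
```

Flagged variables would have their sign reversed before alpha is calculated, as in the example output that reports which questions were reversed.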

A lower confidence limit for Cronbach's alpha is calculated using the sampling theory of Kristoff (1963) and Feldt (1965):

$$\alpha_{L} = 1 - (1 - \hat{\alpha})\, F_{n-1,\,(n-1)(k-1)}$$

where $\hat{\alpha}$ is the estimated alpha, F is the F distribution quantile for a 100(1-p)% confidence limit, k is the number of variables and n is the number of observations per variable.
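The arithmetic of the limit can be sketched as below, assuming the usual Feldt form in which the lower limit is 1 - (1 - alpha) multiplied by an F quantile with n - 1 and (n - 1)(k - 1) degrees of freedom. The function names are hypothetical. The F quantile is passed in rather than computed; it could be obtained from a statistics library, for example `scipy.stats.f.ppf`. The figures used are the raw and standardised results from the example on this page; since both use the same n and k, the F quantile implied by the raw result should reproduce the standardised limit.

```python
def feldt_lower_limit(alpha, f_quantile):
    """Lower confidence limit for alpha, given the relevant F quantile."""
    return 1.0 - (1.0 - alpha) * f_quantile


def implied_f(alpha, lower_limit):
    """F quantile implied by a reported alpha and its lower limit."""
    return (1.0 - lower_limit) / (1.0 - alpha)


# Raw-variable result from the example: alpha and its 95% lower limit.
f_raw = implied_f(0.54955, 0.370886)

# Apply the same quantile to the standardised alpha from the example.
lower_std = feldt_lower_limit(0.572704, f_raw)

print(round(f_raw, 4), round(lower_std, 6))
```

The recomputed limit agrees with the reported standardised limit of 0.403223 to within rounding, which is consistent with the assumed form of the calculation.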

__Example__

Test workbook (Agreement worksheet: Question 1, Question 2, Question 3, and Question 4). Note you need to click Run with the "Cronbach's alpha for deletions" option selected after the main results are shown.

__Principal components (correlation)__

Sign was reversed for: Question 3; Question 4

| Component | Eigenvalue (SVD) | Proportion | Cumulative |
| --- | --- | --- | --- |
| 1 | 1.92556 | 48.14% | 48.14% |
| 2 | 1.305682 | 32.64% | 80.78% |
| 3 | 0.653959 | 16.35% | 97.13% |
| 4 | 0.114799 | 2.87% | 100% |
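The Proportion and Cumulative columns follow directly from the eigenvalues, as this short sketch shows. It uses only the eigenvalues reported above and the Python standard library; the variable names are illustrative.

```python
from itertools import accumulate

# Eigenvalues as reported in the example output above.
eigenvalues = [1.92556, 1.305682, 0.653959, 0.114799]

total = sum(eigenvalues)  # equals the number of variables for a correlation matrix
proportions = [100 * e / total for e in eigenvalues]
cumulative = list(accumulate(proportions))

for i, (e, p, c) in enumerate(zip(eigenvalues, proportions, cumulative), start=1):
    print(f"{i}  {e:.6f}  {p:.2f}%  {c:.2f}%")
```

The first two components together account for about 81% of the total variance of the four standardised questions.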

With raw variables:

Scale reliability alpha = 0.54955 (95% lower confidence limit = 0.370886)

| Variable dropped | Alpha | Change |
| --- | --- | --- |
| Question 1 | 0.525396 | -0.024155 |
| Question 2 | 0.608566 | 0.059015 |
| Question 3 | 0.411591 | -0.13796 |
| Question 4 | 0.348084 | -0.201466 |

With standardized variables:

Scale reliability alpha = 0.572704 (95% lower confidence limit = 0.403223)

| Variable dropped | Alpha | Change |
| --- | --- | --- |
| Question 1 | 0.569121 | -0.003584 |
| Question 2 | 0.645305 | 0.072601 |
| Question 3 | 0.398328 | -0.174376 |
| Question 4 | 0.328003 | -0.244701 |

You can see from the results above that questions 3 and 4 seemed to have scales going in opposite directions to the other two questions, so they were reversed before the final analysis. Dropping question 2 improves the internal consistency of the overall set of questions (standardised alpha rises from 0.572704 to 0.645305), but this does not bring the standardised alpha coefficient to the conventionally acceptable level of 0.8 and above. It may be necessary to rethink this questionnaire.