
Performing a Principal Component Analysis (PCA)

Removing Categories (columns) with too little population

In order to perform PCA, we remove the low-weight categories A1 (column 2), A2 (column 3) and A14 (column 15), as well as the company ID in column 1 and the last columns (Status, RetDis and OldNew). This is done with the column selector applied to the data frame that is passed to the prcomp() function, here: c(4:14,15:17)
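As a complement to positional indexing, the same removal can be sketched by column name, which is less sensitive to column order. The data frame `toy` below is a hypothetical stand-in for OCC_wStatus with random values; only the column names are taken from the text, and their order is an assumption.

```r
# Toy stand-in for OCC_wStatus (hypothetical random values; names follow the text)
set.seed(1)
n <- 20
toy <- data.frame(ID = seq_len(n))
for (k in 1:14) toy[[paste0("A", k)]] <- runif(n)
toy$Status <- sample(c("S1", "S2"), n, replace = TRUE)
toy$RetDis <- factor(sample(c("D", "R"), n, replace = TRUE))
toy$OldNew <- sample(c("Old", "New"), n, replace = TRUE)

# Drop the ID, the low-weight categories and the class columns by name
drop_cols <- c("ID", "A1", "A2", "A14", "Status", "RetDis", "OldNew")
pca_input <- toy[, !(names(toy) %in% drop_cols)]

# prcomp() only accepts numeric columns, so check before calling it
stopifnot(all(sapply(pca_input, is.numeric)))
names(pca_input)
```

Printing `names(pca_input)` before running the PCA is a cheap way to confirm that exactly the intended categories remain.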

Calling the R PCA prcomp() function

PCA prcomp() function parameters:

occ.pca <- prcomp(OCC_wStatus[,c(4:14,15:17)], center = TRUE, scale. = TRUE) # center each column and scale it to unit variance
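To illustrate what the prcomp() result contains, here is a minimal sketch on the built-in USArrests data set, used only as a stand-in because OCC_wStatus is not reproduced here. It shows how the share of variance carried by each component can be read from the `sdev` element.

```r
# Stand-in data: USArrests ships with base R (not the article's data)
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Eigenvalues are the squared standard deviations of the components
eig <- pca$sdev^2

# Proportion and cumulative proportion of variance per component
var_prop <- eig / sum(eig)
var_cum <- cumsum(var_prop)

# summary() reports the same quantities in a table
summary(pca)
round(rbind(proportion = var_prop, cumulative = var_cum), 3)
```

With `scale. = TRUE` every variable enters with unit variance, so the eigenvalues sum to the number of variables. This helps when deciding which pairs of components are worth plotting.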

Displaying the bulk PCA results with contributions

BiPlot companies-variables 1-2 components

library(factoextra) # provides fviz_pca_biplot() and fviz_pca_ind()
fviz_pca_biplot(occ.pca, axes = c(1, 2),
col.ind = "cos2", # Color by the quality of representation
col.var = "contrib", # Color by contributions to the PC
gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07")
)
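The two coloring quantities, cos2 (quality of representation) and contrib (contribution to a component), can be computed by hand from a prcomp() result. The sketch below uses the built-in USArrests data as a stand-in and follows the usual definitions; that these match what factoextra reports exactly is an assumption.

```r
# Stand-in data: USArrests ships with base R (not the article's data)
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Variables: coordinates are the loadings scaled by each component's sdev
var_coord <- sweep(pca$rotation, 2, pca$sdev, `*`)
var_cos2 <- var_coord^2

# Contribution (%) of each variable to each component
var_contrib <- 100 * sweep(var_cos2, 2, colSums(var_cos2), `/`)

# Individuals: cos2 is the squared coordinate divided by the squared
# distance of the individual to the origin
ind_coord <- pca$x
ind_cos2 <- ind_coord^2 / rowSums(ind_coord^2)

round(var_contrib[, 1:2], 1)
round(head(ind_cos2[, 1:2]), 3)
```

An individual with a low cos2 on the plotted pair of components is poorly represented in that plane, so its position there should be read with caution.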

BiPlot companies-variables 2-3 components

fviz_pca_biplot(occ.pca, axes = c(2, 3), repel = TRUE, # repel = TRUE avoids overlapping labels
col.ind = "cos2", # Color by the quality of representation
col.var = "contrib", # Color by contributions to the PC
gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07")
)

Displaying PCA results with contributions for Dismissed-Retained classes

Point colors are set with the "palette" parameter. For more on colors in ggplot2, see:
ggplot2 – Essentials (http://www.sthda.com/english/wiki/ggplot2-essentials)

Plot of companies (individuals only) on components 2-3 with DisvAccept groups (D = Dismissed, R = Retained)

fviz_pca_ind(occ.pca, label = "none", # hide individual labels
axes = c(2,3),
habillage = OCC_wStatus$RetDis, # color by groups
palette = c("#FF99FF", "#003366")
)
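The grouping logic behind this plot can be sketched in base R: one palette color per factor level, applied to the scores on components 2 and 3. The example uses two species from the built-in iris data as a hypothetical stand-in for the Dismissed and Retained classes; it is not the article's data.

```r
# Stand-in data: two iris species play the role of the D and R classes
dat <- iris[iris$Species != "setosa", ]
groups <- droplevels(dat$Species) # plays the role of OCC_wStatus$RetDis
pca <- prcomp(dat[, 1:4], center = TRUE, scale. = TRUE)

# One color per factor level, in level order
pal <- c("#FF99FF", "#003366")
stopifnot(length(groups) == nrow(pca$x), nlevels(groups) == length(pal))
point_cols <- pal[as.integer(groups)]

# Scores on components 2 and 3, colored by group
plot(pca$x[, 2], pca$x[, 3], col = point_cols, pch = 19,
     xlab = "PC2", ylab = "PC3")
legend("topright", legend = levels(groups), col = pal, pch = 19)
```

The grouping vector must have one entry per row of the data passed to prcomp(), and the palette needs one color per group level; the `stopifnot()` line checks both.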
