Principal Component Analysis is an essential dimensionality reduction technique in machine learning. It is built from simple matrix operations in statistics and linear algebra. You use it to analyze the original data and project it onto fewer dimensions (or, if you wish, onto the same number of dimensions). This article will help you understand the concept of Principal Component Analysis and how to run the analysis in R and Python.

Principal Component Analysis

PCA, or Principal Component Analysis, reduces the dimensionality of massive data sets into simpler forms. This property makes PCA a dimensionality-reduction method. It works by transforming a large set of variables into a smaller one that still preserves most of the information in the original set.

Reducing the number of variables in a data set costs some accuracy. To simplify the data set, you accept a small loss of accuracy in exchange.

Smaller data sets are easier to visualize and explore, and a machine learning algorithm can analyze them more quickly while the data stays relevant. In simple terms, PCA reduces the number of variables while preserving the important information, so the data is easier to analyze.

Principal Component Analysis Example

Principal Component Analysis Example in 2D

You can understand the concept of principal component analysis with two dimensions, say height and weight. You plot the dataset as points in a plane. To tease out the variation, PCA identifies a new coordinate system in which every point has a new x and y value. The axes have no physical meaning; they are the principal components, each a combination of height and weight, chosen so that a single axis captures as much of the variation as possible.
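To make this concrete, here is a minimal sketch in Python (assuming scikit-learn and NumPy are installed; the height and weight numbers are made up for illustration) that finds the new coordinate system for 2D data:

from numpy import array
from sklearn.decomposition import PCA

# made-up height (cm) and weight (kg) pairs, for illustration only
X = array([[170, 65], [175, 72], [160, 55], [180, 80], [165, 62]])

# find the new coordinate system and re-express each point in it
pca = PCA(n_components=2)
scores = pca.fit_transform(X)

print(pca.components_)                # the new axes: combinations of height and weight
print(pca.explained_variance_ratio_)  # share of the variation along each axis
print(scores)                         # each point's value on the new axes

The first axis (PC1) soaks up most of the variation, which is exactly what makes dropping later axes safe.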

Principal Component Analysis Example in 3D

Principal Component Analysis becomes even more useful with three dimensions, because you can analyze the data from different angles. For instance, after plotting the data in 3D, you can view it in 2D. By rotating the camera angle, you can visualize the data from the best viewpoint. The PCA transformation ensures that:

  • The horizontal axis (PC1) captures the most variation.
  • The vertical axis (PC2) captures the second-most variation.
  • The third axis (PC3) captures the least variation.

This way, you can easily drop the third axis, as the sketch below illustrates. The data along that axis is not as important as what the horizontal and vertical axes capture in the plane.
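Here is a minimal sketch of that idea in Python (assuming scikit-learn and NumPy; the 3D data is randomly generated purely for illustration), fitting PCA and keeping only PC1 and PC2:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# illustrative 3D data whose variation lies mostly in a plane, plus small noise
X = rng.normal(size=(100, 2)) @ np.array([[3.0, 1.0, 0.5], [1.0, 2.0, 0.2]])
X += rng.normal(scale=0.1, size=(100, 3))

# keep the two most informative axes and drop the third
pca = PCA(n_components=2)
X2 = pca.fit_transform(X)

print(pca.explained_variance_ratio_)  # nearly all the variation sits in PC1 and PC2
print(X2.shape)                       # (100, 2): the data now lives in a plane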

Principal Component Analysis In R

You can compute a principal component analysis in R using the functions prcomp() and princomp(). Both functions make the analysis easy and straightforward; they differ in the method used to calculate the PCA.

The prcomp() Function to Calculate PCA

Choose this method for principal component analysis in R if you prefer numerical accuracy. It calculates the PCA by a singular value decomposition of the (centered) data matrix rather than by running eigen() on the covariance matrix.

The princomp() Function to Calculate PCA

This method uses eigen() on the covariance or correlation matrix, and is provided mainly for compatibility with S-PLUS results.
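The two routes arrive at the same components, up to sign. A quick illustration in Python (assuming NumPy; the data is random and purely illustrative) compares the prcomp-style SVD of the centered data with the princomp-style eigendecomposition of the covariance matrix:

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 3))       # illustrative data
Xc = X - X.mean(axis=0)            # center the columns

# SVD of the centered data (the prcomp-style route)
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)
var_svd = s**2 / (len(X) - 1)      # component variances from the singular values

# eigendecomposition of the covariance matrix (the princomp-style route)
vals, vecs = np.linalg.eig(np.cov(Xc.T))

print(np.round(var_svd, 6))              # the same variances ...
print(np.round(np.sort(vals)[::-1], 6))  # ... as the sorted eigenvalues

(Note that princomp() itself divides by n rather than n - 1 when forming the covariance matrix, which is part of why its numbers differ slightly from prcomp()'s.)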

The example below runs prcomp() on a pilots data set whose columns 2 through 7 hold six aptitude-test scores:

pilots.pca <- prcomp(pilots[,2:7])
pilots.pca
## Standard deviations (1, .., p=6):
## [1] 41.497499 29.637102 20.035932 16.157875 11.353640  7.097781
##
## Rotation (n x k) = (6 x 6):
##                                    PC1         PC2         PC3         PC4
## Intelligence                0.21165160 -0.38949336  0.88819049 -0.03082062
## Form.Relations             -0.03883125 -0.06379320  0.09571590  0.19128493
## Dynamometer                 0.08012946  0.06602004  0.08145863  0.12854488
## Dotting                     0.77552673  0.60795970  0.08071120 -0.08125631
## Sensory.Motor.Coordination -0.09593926 -0.01046493  0.01494473 -0.96813856
## Perservation                0.58019734 -0.68566916 -0.43426141 -0.04518327
##                                    PC5         PC6
## Intelligence               -0.04760343 -0.10677164
## Form.Relations             -0.14793191  0.96269790
## Dynamometer                 0.97505667  0.12379748
## Dotting                    -0.10891968  0.06295166
## Sensory.Motor.Coordination  0.10919120  0.20309559
## Perservation                0.03644629  0.03572141

You can also output the proportion of variance explained by each component using prcomp()'s summary method.

summary(pilots.pca)
## Importance of components:
##                            PC1     PC2     PC3      PC4      PC5     PC6
## Standard deviation     41.4975 29.6371 20.0359 16.15788 11.35364 7.09778
## Proportion of Variance  0.5003  0.2552  0.1166  0.07585  0.03745 0.01464
## Cumulative Proportion   0.5003  0.7554  0.8721  0.94792  0.98536 1.00000

Principal Component Analysis In Python

You can use the scikit-learn library to calculate the Principal Component Analysis of a dataset. This approach is beneficial because, once the projection has been calculated, you can apply it to new data again and again quite easily. You specify the number of components as a parameter when creating the class.

You first fit the class on a dataset by calling the fit() function; you can then apply it to the same or another dataset, in the chosen number of dimensions, by calling the transform() function. The principal components (eigenvectors) and eigenvalues are available through the components_ and explained_variance_ attributes. In the example below, you create the instance, fit it on a 3×2 matrix, gain access to the vectors and values of the projection, and finally transform the original data.
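A minimal sketch of that scikit-learn workflow (assuming scikit-learn is installed; the 3×2 matrix is the same one used in the from-scratch listing that follows) looks like this:

from numpy import array
from sklearn.decomposition import PCA

# define a 3x2 matrix
A = array([[4, 5], [6, 7], [8, 9]])

# create the PCA instance, asking for two components
pca = PCA(2)
# fit on the data
pca.fit(A)
# access the principal components (eigenvectors) and eigenvalues
print(pca.components_)
print(pca.explained_variance_)
# transform (project) the original data
B = pca.transform(A)
print(B)

The listing below computes the same result from scratch with NumPy, so you can see each step: center the columns, form the covariance matrix, take its eigendecomposition, and project the data.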

from numpy import array
from numpy import mean
from numpy import cov
from numpy.linalg import eig
# define a matrix
A = array([[4, 5], [6, 7], [8, 9]])
print(A)
# calculate the mean of each column
M = mean(A.T, axis=1)
print(M)
# center columns by subtracting column means
C = A - M
print(C)
# calculate covariance matrix of centered matrix
V = cov(C.T)
print(V)
# eigendecomposition of covariance matrix
values, vectors = eig(V)
print(vectors)
print(values)
# project data
P = vectors.T.dot(C.T)
print(P.T)

Running the example prints the original 3×2 matrix, then the column means, the centered data, the covariance matrix, the principal components (eigenvectors) with their eigenvalues, and finally the projection of the original matrix. Aside from minor floating-point differences and possible sign flips, this from-scratch method yields the same principal components, projections, and singular values as the scikit-learn approach.

[[4 5]
 [6 7]
 [8 9]]
[6. 7.]
[[-2. -2.]
 [ 0.  0.]
 [ 2.  2.]]
[[4. 4.]
 [4. 4.]]
[[ 0.70710678 -0.70710678]
 [ 0.70710678  0.70710678]]
[8. 0.]
[[-2.82842712  0.        ]
 [ 0.          0.        ]
 [ 2.82842712  0.        ]]

Conclusion

You can use any programming language, such as Python, R, or C++, to code the entire process by hand and apply Principal Component Analysis, or you can use libraries from different contributors and run the analysis directly. If the problem's complexity is not very high, coding the process by hand instead of using libraries helps you see what happens behind the scenes and understand problems more easily. When you use principal component analysis in R, you can rely on built-in functions such as prcomp() and princomp(), along with packages such as HSAUR, and apply them directly.