# PCA

Principal Component Analysis

PCA (Principal Component Analysis) is a statistical procedure for identifying patterns in data and extracting features that can be used to classify it. It works by finding the so-called principal components: orthogonal directions ordered by how much of the data's variance they capture. The direction with the largest variance is the first principal component (obviously!).
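For readers who prefer symbols, here is a sketch of the standard formulation for 2D points. The notation ($p_i$, $\mu$, $C$, $v_k$, $\lambda_k$, $\theta$) is my own choice for illustration and does not come from the original assignment.

$$
\mu = \frac{1}{n}\sum_{i=1}^{n} p_i, \qquad
C = \frac{1}{n-1}\sum_{i=1}^{n} (p_i - \mu)(p_i - \mu)^{T}, \qquad
C\,v_k = \lambda_k v_k, \quad \lambda_1 \ge \lambda_2
$$

$$
v_1 = \arg\max_{\lVert v \rVert = 1} v^{T} C\, v, \qquad
\theta = \operatorname{atan2}\left(v_{1,y},\, v_{1,x}\right) \bmod 180^{\circ}
$$

Here $p_i = (x_i, y_i)^T$ is one data point, $C$ is the $2 \times 2$ covariance matrix, and $v_1$ is the unit eigenvector with the largest eigenvalue, i.e. the first principal component. The angle is taken modulo 180° because $v_1$ and $-v_1$ describe the same direction.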

This was an assignment in my pattern recognition class, and it helped me grasp the idea behind PCA. Given the wrench in the picture below, we need to find the angle the wrench makes with the horizontal direction. The line passing through the middle of the wrench is clearly the direction of maximum variation, so all we need is the first principal component. From this component, the angle comes out to 142.8°.

After finding the locations of the wrench pixels, we plot them.

Next, we compute the final data by multiplying the transposed feature vector (the matrix whose columns are the eigenvectors) by the mean-shifted original data.

Next is the reduced-dimension plot, which keeps only the coordinate along the first principal component.

Finally, we recover the original data from the final data.

MATLAB code:
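```matlab
%% cleaning up
close all
clear all
clc
%% load the image and locate the wrench
% NOTE: the original listing never loads the image; 'wrench.png' is a placeholder filename.
imgg = imread('wrench.png');
imgg = imgg(:,:,1);            % keep one channel (2D grayscale)
[y,x] = find(imgg==0);         % wrench pixels are black (value 0)
y = 132-y;                     % flip the vertical axis (image is 132 pixels tall)
plot(x,y,'.')                  % plot the data
axis([1 233 1 132])            % set axes to the image range (233 x 132)
%% applying PCA
mux = mean(x);
muy = mean(y);
% shift to zero mean
x1 = x-mux;
y1 = y-muy;
% get the 2x2 covariance matrix
cov_mat = cov(x1,y1);
% get eigenvectors (columns of V) and eigenvalues (diagonal of D)
[V,D] = eig(cov_mat);
% sort according to eigenvalues, largest first
eigen_val = diag(D);
[eigen_val_sorted, index] = sort(eigen_val,'descend');
% use all eigenvectors for full reconstruction
feature_vector = V(:,index);
% angle of the first principal component in degrees; atan2 keeps the quadrant
% and mod 180 removes the sign ambiguity of the eigenvector
angle_deg = mod(atan2(feature_vector(2,1),feature_vector(1,1))*180/pi,180)
hold on
grid on
RowFeatureVector = feature_vector';
plotv(feature_vector(:,1))     % draws the vector from the origin (plotv needs Deep Learning Toolbox)
%% final data
RowZeroMeanData = [x1 y1]';
FinalData = (RowFeatureVector * RowZeroMeanData)';
figure
plot(FinalData(:,1),FinalData(:,2),'r*')
%% reduce dimension
% keep only the coordinate along the first principal component
figure
plot(FinalData(:,1),zeros(size(FinalData,1),1),'r*')
%% recovered data
[r,c] = size(FinalData);
OriginalMean = repmat([mux muy]',1,r);
% the feature vector is orthonormal, so pinv here equals the transpose
RowOriginalData = (pinv(RowFeatureVector)*FinalData') + OriginalMean;
ColOriginalData = RowOriginalData';
figure
plot(ColOriginalData(:,1),ColOriginalData(:,2),'r*')
```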
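
If you don't have the wrench image at hand, the angle computation can be sanity-checked on synthetic data. The following is only a minimal sketch and not part of the original assignment: the point cloud, its center (100, 60), and the variable names (`theta_true`, `theta_est`, `t`, `w`) are all made up for illustration. It builds an elongated cloud with a known orientation and checks that the first principal component recovers it.

```matlab
% Synthetic sanity check (NOT the original wrench image): an elongated
% point cloud with a known orientation, built deterministically.
theta_true = 142.8;                  % known angle in degrees (chosen to match the text)
t = linspace(-50, 50, 201)';         % position along the long axis
w = 2*cos(7*t);                      % small offset across the long axis
c = cos(theta_true*pi/180);
s = sin(theta_true*pi/180);
x = 100 + c*t - s*w;                 % rotate, then shift to an arbitrary center
y =  60 + s*t + c*w;

% PCA: eigen-decomposition of the 2x2 covariance matrix
C = cov([x y]);
[V, D] = eig(C);
[~, idx] = max(diag(D));             % index of the largest eigenvalue
v1 = V(:, idx);                      % first principal component

% atan2 keeps the quadrant; mod 180 removes the sign ambiguity of v1
theta_est = mod(atan2(v1(2), v1(1))*180/pi, 180);
fprintf('true angle: %.1f, estimated angle: %.1f\n', theta_true, theta_est);
```

The two printed angles should agree. The `mod(..., 180)` step matters because `eig` may return the eigenvector pointing either way along the line, and a plain `atan` of the ratio would report the same direction as a negative angle instead.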