Principal Component Analysis

PCA (Principal Component Analysis) is used to identify patterns in data and to find features for classifying the data at hand. It’s a statistical procedure based on finding the so-called principal components: mutually orthogonal directions ordered by how much of the data's variation they capture. The direction with the largest variation is the first principal component (obviously!).
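
To make "largest variation" concrete, here is the usual way of writing it down. The symbols are introduced here for illustration and do not come from the assignment: $C$ is the covariance matrix of the mean-centered data, $\mathbf{w}_1$ is the first principal component, and $\lambda_i$ are the eigenvalues of $C$.

```latex
\mathbf{w}_1 = \arg\max_{\lVert \mathbf{w} \rVert = 1} \mathbf{w}^{\top} C \,\mathbf{w},
\qquad
C\,\mathbf{w}_1 = \lambda_1 \mathbf{w}_1,
\qquad
\lambda_1 = \max_i \lambda_i
```

In other words, the unit direction with the largest projected variance is the eigenvector of $C$ with the largest eigenvalue.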

This was an assignment I had in my pattern recognition class, and it helped me grasp the idea behind PCA. Given the wrench in the picture below, we need to find the angle the wrench makes with the horizontal direction. We can easily see that the line passing through the middle of the wrench is the direction of maximum variation. That’s exactly the first principal component, so that is what we seek. From this component, the angle comes out to 142.8°.

Inline image 1
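
Before touching the real image, the idea can be tried on synthetic data. The following is only a minimal sketch: the points, the noise level, and names such as `true_angle`, `pc1` and `est_angle` are made up for illustration and are not part of the assignment. It scatters points along a line at a known angle and recovers that angle from the first principal component.

```matlab
% Synthetic stand-in for the wrench pixels: points scattered along a line
% at a known angle (all names and values here are illustrative).
rng(0);                                    % reproducible noise
true_angle = 142.8;                        % degrees, chosen to match the post
t = linspace(-50, 50, 500)';               % positions along the line
x = t*cosd(true_angle) + 0.5*randn(500,1);
y = t*sind(true_angle) + 0.5*randn(500,1);

% PCA: eigen-decomposition of the 2x2 covariance matrix
C = cov([x y]);
[V, D] = eig(C);
[~, idx] = max(diag(D));                   % position of the largest eigenvalue
pc1 = V(:, idx);                           % first principal component

% The sign of an eigenvector is arbitrary, so fold the angle into [0, 180)
est_angle = mod(atan2d(pc1(2), pc1(1)), 180);
fprintf('estimated angle: %.1f degrees\n', est_angle);
```

Folding the result into [0, 180) matters because a plain `atan` of the ratio returns a value between -90° and 90°, which describes the same line but not with the angle quoted above.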

After extracting the coordinates of the wrench pixels, we can plot them:

Next we compute the final data by multiplying the feature vector (the matrix whose rows are the eigenvectors, sorted by decreasing eigenvalue) by the mean-shifted data:

Inline image 2
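
Written out as a matrix product, this step looks as follows. The symbols are shorthand introduced here, not taken from the assignment: $W$ stands for `RowFeatureVector` (eigenvectors $\mathbf{w}_1, \mathbf{w}_2$ as rows), $X$ for `RowZeroMeanData` (one mean-shifted point per column, $r$ points in total), and $Y$ for the transposed `FinalData`.

```latex
Y = W X, \qquad
W = \begin{bmatrix} \mathbf{w}_1^{\top} \\ \mathbf{w}_2^{\top} \end{bmatrix}, \qquad
X = \begin{bmatrix}
      x_1 - \mu_x & \cdots & x_r - \mu_x \\
      y_1 - \mu_y & \cdots & y_r - \mu_y
    \end{bmatrix}
```

Because the eigenvectors of a symmetric covariance matrix can be chosen orthonormal, $W W^{\top} = I$, so the step can be undone with $X = W^{\top} Y$.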

Next is the reduced-dimension plot, where only the first principal component is kept:

Inline image 3

Finally, the recovered data:

Inline image 4
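
The reduction and recovery steps can be sketched on a handful of points. This is a sketch under assumptions, not the assignment's code: the ten data points are arbitrary, and the names `Z`, `W`, `k`, `ReducedData`, `Recovered` and `Full` are introduced here for illustration.

```matlab
% Illustrative data: a few 2-D points (not the wrench pixels)
x = [2.5 0.5 2.2 1.9 3.1 2.3 2.0 1.0 1.5 1.1]';
y = [2.4 0.7 2.9 2.2 3.0 2.7 1.6 1.1 1.6 0.9]';
mu = [mean(x) mean(y)];
Z = [x - mu(1), y - mu(2)];                % r-by-2 zero-mean data

% Eigenvectors of the covariance matrix, sorted by decreasing eigenvalue
[V, D] = eig(cov(Z));
[~, index] = sort(diag(D), 'descend');
W = V(:, index)';                          % rows are eigenvectors

FinalData = (W * Z')';                     % r-by-2 coordinates on both components

% Reduce dimension: keep only the first component
k = 1;
ReducedData = FinalData(:, 1:k);           % r-by-1

% Recover an approximation from the reduced data; W is orthogonal, so
% multiplying by its rows maps the coordinates back to x-y space
Recovered = ReducedData * W(1:k, :) + repmat(mu, numel(x), 1);

% Using all components reproduces the original points up to round-off
Full = FinalData * W + repmat(mu, numel(x), 1);
max_err = max(max(abs(Full - [x y])));
```

With `k = 1` the recovered points all lie on the line through the mean along the first principal component; with both components kept, nothing is lost.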

MATLAB code:
%%cleaning up
close all
clear all
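%%reading the image
imgg=imread('wrench.png'); %file name is an assumption; use the actual image path
imgg=imgg(:,:,1); %keep the first channel to get a 2D (grayscale) image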
[y,x]=find(imgg==0); %finding the wrench
y=132-y; %flipping the axis
plot(x,y,'.') %plotting data
axis([1 233 1 132]) %setting axes in image range
%applying PCA
mux = mean(x);
muy = mean(y);
% shift
x1 = x-mux;
y1 = y-muy;
% get covariance matrix
cov_mat = cov(x1,y1);
%get eigenvectors (columns of V) and eigenvalues (diagonal of D)
[V,D] = eig(cov_mat);
% sort according to eigenvalues
eigen_val = diag(D);
[eigen_val_sorted, index] = sort(eigen_val,'descend');
% use all eigenvectors for full reconstruction
feature_vector = V(:,index);
angle=mod(atan(feature_vector(2)/feature_vector(1))*180/pi,180) %angle in degrees, folded into [0,180)
hold on
grid on
RowFeatureVector = feature_vector';
%%final data
RowZeroMeanData = [x1 y1]';
FinalData = (RowFeatureVector * RowZeroMeanData)’;
%% reduce dimension
%% recovered data
r = numel(x); %number of data points
OriginalMean = repmat([mux muy]',1,r);
RowOriginalData = (pinv(RowFeatureVector)*FinalData') + OriginalMean;
ColOriginalData = RowOriginalData';