Principal Component Analysis

Principal Component Analysis (PCA) is used to identify patterns in data and to find features for classifying it. It is a statistical procedure that finds a set of orthogonal directions, the so-called principal components, ordered by how much of the variation in the data each one captures. The direction with the largest variation is called the first principal component (obviously!).
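For reference, the description above can be written compactly. This is the standard textbook formulation rather than anything specific to this post, and the symbols are notation assumed here: $p_i$ for the $n$ data points, $\mu$ for their mean, $C$ for the covariance matrix, and $v_k$, $\lambda_k$ for its eigenvectors and eigenvalues.

```latex
\mu = \frac{1}{n}\sum_{i=1}^{n} p_i, \qquad
C = \frac{1}{n-1}\sum_{i=1}^{n} (p_i - \mu)(p_i - \mu)^{\mathsf T}, \qquad
C\,v_k = \lambda_k v_k, \quad \lambda_1 \ge \lambda_2 \ge \dots
```

The unit vector $v_1$ is the first principal component, and $\lambda_1$ is the variance of the data along it.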

This was an assignment I had in my pattern recognition class, and it helped me grasp the idea behind PCA. Given the wrench in the picture below, we need to find the angle it makes with the horizontal. The line running along the middle of the wrench is clearly the direction of maximum variation, so what we are looking for is the first principal component. The angle of this component with the horizontal comes out as 142.8°.
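To make the link between the first principal component and the angle concrete, here is a minimal sketch on synthetic data. The points are a made-up stand-in for the wrench pixels (a thin band laid out at a known angle), and the variable names are chosen for this example only. It also shows why a plain arctangent of the component's slope is not enough: that result lies between -90° and 90°, so an orientation such as 142.8° has to be recovered by folding the angle into the range 0° to 180°.

```matlab
% Synthetic stand-in for the wrench pixels (NOT the real image data):
% points along a line at a known angle, with a small spread across it.
theta_true = 142.8;                 % degrees, chosen to match the value quoted above
t = linspace(-1, 1, 201)';          % position along the line
s = 0.05 * cos(40 * t);             % small offset across the line
x = t * cosd(theta_true) - s * sind(theta_true);
y = t * sind(theta_true) + s * cosd(theta_true);

% First principal component = eigenvector with the largest eigenvalue
[V, D] = eig(cov([x y]));
[~, k] = max(diag(D));
v1 = V(:, k);

angle_atan = atand(v1(2) / v1(1));            % lies between -90 and 90 degrees
angle_deg  = mod(atan2d(v1(2), v1(1)), 180);  % folded into [0, 180)

fprintf('atan: %.1f deg, folded: %.1f deg\n', angle_atan, angle_deg);
```

On this synthetic band the plain arctangent should come out near -37.2°, while the folded value recovers the 142.8° the points were generated with. The fold is also what makes the result independent of the sign eig happens to pick for the eigenvector.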

After locating the pixels that belong to the wrench, we can plot them:

Next we compute the final data by multiplying the transposed feature vector (the matrix whose columns are the sorted eigenvectors) by the original data after it has been shifted by the mean:
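In matrix form this step is a single product. The notation is assumed here for illustration: $X$ is the $2 \times n$ matrix with one point per column, $\mu$ is the mean point, $\mathbf{1}$ is a column of $n$ ones, and $V$ holds the eigenvectors as columns, sorted by decreasing eigenvalue.

```latex
Y = V^{\mathsf T}\left(X - \mu\,\mathbf{1}^{\mathsf T}\right)
```

In the MATLAB code, $V^{\mathsf T}$ corresponds to RowFeatureVector, $X - \mu\,\mathbf{1}^{\mathsf T}$ to RowZeroMeanData, and $Y^{\mathsf T}$ to FinalData.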

Next is the reduced-dimension plot, which keeps only the coordinate along the first principal component:
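As a rough illustration of what the reduction keeps and what it throws away, the sketch below projects a small made-up point cloud (not the wrench data) onto its first principal component and reports the share of the total variance that survives. All names here are hypothetical and chosen for the example.

```matlab
% Hypothetical 2-D data (not the wrench pixels): an elongated, tilted cloud
t = linspace(-1, 1, 101)';
pts = [3*t + 0.2*cos(25*t), 2*t - 0.3*cos(25*t)];   % n-by-2, one point per row

mu = mean(pts, 1);
centered = pts - repmat(mu, size(pts, 1), 1);

% Eigen-decomposition of the covariance, sorted by decreasing eigenvalue
[V, D] = eig(cov(centered));
[lambda, idx] = sort(diag(D), 'descend');
V = V(:, idx);

% Reduced data: one number per point, its coordinate along the first component
reduced = centered * V(:, 1);          % n-by-1

% Share of the total variance that survives the reduction
retained = lambda(1) / sum(lambda);
fprintf('variance retained by the first component: %.3f\n', retained);
```

The closer the retained share is to 1, the less is lost by dropping the second coordinate, which is the case for a long, thin shape such as a wrench.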

Finally, the recovered data, obtained by reversing the transformation and adding the mean back:
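The recovery step can be checked numerically. The following is a minimal sketch on a handful of made-up points (not the wrench pixels), with names chosen for the example. It compares recovery from both components, which should reproduce the input up to rounding, with recovery from the first component alone, which is lossy.

```matlab
% Hypothetical 2-D points (not the wrench pixels), one point per row
pts = [2 1; 3 4; 5 3; 6 7; 8 6; 9 9];
mu = mean(pts, 1);
centered = pts - repmat(mu, size(pts, 1), 1);

[V, D] = eig(cov(centered));
[~, idx] = sort(diag(D), 'descend');
V = V(:, idx);                         % columns = principal directions

scores = centered * V;                 % transformed ("final") data

% Full recovery: V is orthogonal, so its transpose undoes the rotation
recovered = scores * V' + repmat(mu, size(pts, 1), 1);

% Lossy recovery from the first component only
approx = scores(:, 1) * V(:, 1)' + repmat(mu, size(pts, 1), 1);

fprintf('max error, both components: %.2e\n', max(abs(recovered(:) - pts(:))));
fprintf('max error, first component: %.2e\n', max(abs(approx(:) - pts(:))));
```

Because the eigenvector matrix of a symmetric covariance matrix is orthogonal, its transpose acts as its inverse, so the pseudo-inverse used in the listing below gives the same result as a plain transpose.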

MATLAB code:

%%cleaning up

close all

clear all

clc

%%reading the image

imgg=imread('tool.png');

imgg=imgg(:,:,1); %keep only the first channel to get a 2-D image

[y,x]=find(imgg==0); %row (y) and column (x) indices of the black wrench pixels

y=132-y; %flip so that y increases upward (132 is the image height; rows count downward)

plot(x,y,'.') %plotting data

axis([1 233 1 132]) %setting axes in image range

%applying PCA

mux = mean(x);

muy = mean(y);

% shift

x1 = x-mux;

y1 = y-muy;

% get covariance matrix

cov_mat = cov(x1,y1);

% get eigenvectors (columns of V) and eigenvalues (diagonal of D)

[V,D]=eig(cov_mat);

% sort according to eigenvalues

eigen_val = diag(D);

[eigen_val_sorted, index] = sort(eigen_val,'descend');

% use all eigenvectors for full reconstruction

feature_vector = V(:,index);

angle=mod(atan2(feature_vector(2,1),feature_vector(1,1))*180/pi,180) %angle of the first component in degrees, folded into [0,180)

hold on

grid on

RowFeatureVector = feature_vector';

plotv(feature_vector(:,1)) %first principal direction, drawn from the origin (plotv is part of the Deep Learning Toolbox)

%%final data

RowZeroMeanData = [x1 y1]';

FinalData = (RowFeatureVector * RowZeroMeanData)';

figure

plot(FinalData(:,1),FinalData(:,2),'r*')

%% reduce dimension

figure

plot(FinalData(:,1),zeros(size(FinalData,1),1),'r*') %keep only the first coordinate, second set to zero

%% recovered data

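r = size(FinalData,1); %number of points

OriginalMean = repmat([mux muy]',1,r);

% RowFeatureVector is orthogonal, so pinv(RowFeatureVector) equals feature_vector
RowOriginalData = (pinv(RowFeatureVector)*FinalData') + OriginalMean;

ColOriginalData = RowOriginalData';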

figure

plot(ColOriginalData(:,1),ColOriginalData(:,2),'r*')