28 Dimensionality Reduction – II
Learning Objectives:
The learning objectives of this module are as follows:
• To explain the Fisher Linear Discriminant approach
• To understand the concept of Singular Value Decomposition
• To outline the computation of Singular Value Decomposition
28.1 Introduction
The main idea of the Fisher linear discriminant approach is to find a projection onto a line such that samples from different classes, when projected onto the line, are well separated (Figure 28.1).
Figure 28.1 Bad and Good Projections
28.2 The Basis of Fisher Discriminant
Suppose we have 2 classes and d-dimensional samples $x_1, x_2, \dots, x_n$, where $n_1$ samples come from the first class and $n_2$ samples come from the second class. Consider the projection onto a line whose direction is given by a unit vector $v$. The scalar $v^T x_i$ is the distance of the projection of $x_i$ from the origin; thus $v^T x_i$ is the projection of $x_i$ into a one-dimensional subspace (Figure 28.2).
Figure 28.2
The mean of the projected samples of class $i$ is $\tilde{\mu}_i = v^T \mu_i$, and their scatter is $\tilde{s}_i^2 = \sum_{y \in \text{class } i} (y - \tilde{\mu}_i)^2$. Thus scatter is just the sample variance multiplied by $n$: scatter measures the same concept as variance, the spread of the data around the mean, only on a different scale.
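To make the projection and scatter concrete, here is a minimal sketch (assuming NumPy; the sample points and direction are illustrative, not from the module):

```python
import numpy as np

# Illustrative 2-D samples and a unit projection direction v
X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0], [4.0, 5.0]])
v = np.array([1.0, 1.0]) / np.sqrt(2.0)   # unit vector

y = X @ v                    # projections v^T x_i (one scalar per sample)
mu = y.mean()                # projected mean
scatter = np.sum((y - mu) ** 2)

# Scatter is just the (biased) sample variance multiplied by n
n = len(y)
assert np.isclose(scatter, n * np.var(y))
```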
28.3 Fisher Linear Discriminant
We need to normalize by both the scatter of class 1 and the scatter of class 2. The Fisher linear discriminant therefore projects onto the line in the direction $v$ that maximizes

$J(v) = \dfrac{(\tilde{\mu}_1 - \tilde{\mu}_2)^2}{\tilde{s}_1^2 + \tilde{s}_2^2}$

(Figure 28.5). $J(v)$ is defined so that the projected means are far from each other while the scatter of each class is as small as possible; that is, the samples of each class should cluster tightly around their projected mean. If we find a $v$ that makes $J(v)$ large, we are guaranteed that the classes are well separated (Figure 28.6).
Figure 28.5 Definition of J(V)
Figure 28.6 Well Separated Projected Samples
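As a sketch of evaluating this criterion numerically, the following hypothetical helper computes $J(v)$ directly from the projected samples (the name fisher_J is my own, not from the module; NumPy assumed):

```python
import numpy as np

def fisher_J(v, X1, X2):
    """Fisher criterion: squared distance between the projected means
    divided by the sum of the projected scatters."""
    v = v / np.linalg.norm(v)          # work with a unit direction
    y1, y2 = X1 @ v, X2 @ v            # 1-D projections of each class
    m1, m2 = y1.mean(), y2.mean()      # projected means
    s1 = np.sum((y1 - m1) ** 2)        # projected scatter of class 1
    s2 = np.sum((y2 - m2) ** 2)        # projected scatter of class 2
    return (m1 - m2) ** 2 / (s1 + s2)
```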
All we need to do now is to express $J$ explicitly as a function of $v$ and maximize it, which is fairly straightforward but requires some linear algebra and calculus. We define separate class scatter matrices $S_1$ and $S_2$ for classes 1 and 2. These measure the scatter of the original samples $x_i$ (before projection):

$S_i = \sum_{x \in \text{class } i} (x - \mu_i)(x - \mu_i)^T$

where $\mu_i$ is the mean of class $i$.
Now let us consider samples belonging to two classes:
– Class 1 has 5 samples: c1 = [(1,2), (2,3), (3,3), (4,5), (5,5)]
– Class 2 has 6 samples: c2 = [(1,0), (2,1), (3,1), (3,2), (5,3), (6,5)]
Arranging the data in two separate matrices, one sample per row:

$X_1 = \begin{pmatrix} 1 & 2 \\ 2 & 3 \\ 3 & 3 \\ 4 & 5 \\ 5 & 5 \end{pmatrix}, \quad X_2 = \begin{pmatrix} 1 & 0 \\ 2 & 1 \\ 3 & 1 \\ 3 & 2 \\ 5 & 3 \\ 6 & 5 \end{pmatrix}$
It is worth noting that PCA performs very poorly on this data, because the direction of largest variance is not helpful for classification (Figure 28.7).
Figure 28.7 PCA based Dimensionality Reduction
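To see why PCA struggles here, one can pool both classes and inspect the top principal component; a sketch (assuming NumPy, using the class data above):

```python
import numpy as np

c1 = np.array([[1, 2], [2, 3], [3, 3], [4, 5], [5, 5]], dtype=float)
c2 = np.array([[1, 0], [2, 1], [3, 1], [3, 2], [5, 3], [6, 5]], dtype=float)

X = np.vstack([c1, c2])
Xc = X - X.mean(axis=0)                    # center the pooled data
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Vt[0]                                # direction of largest variance

# Projecting both classes onto pc1 leaves them heavily overlapped,
# because the largest-variance direction ignores the class labels.
print("class 1 projections:", np.sort(c1 @ pc1))
print("class 2 projections:", np.sort(c2 @ pc1))
```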
Now let us first compute the mean of each class: M1 = mean(c1) = [3, 3.6] and M2 = mean(c2) = [3.3, 2].
Based on these means, we compute the scatter matrices $S_1$ and $S_2$ for each class:

$S_1 = \begin{pmatrix} 10 & 8 \\ 8 & 7.2 \end{pmatrix}, \quad S_2 = \begin{pmatrix} 17.3 & 16 \\ 16 & 16 \end{pmatrix}$
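Maximizing $J(v)$ has the standard closed-form solution $v \propto (S_1 + S_2)^{-1}(\mu_1 - \mu_2)$, where $S_1 + S_2$ is the within-class scatter matrix. A sketch that reproduces the numbers above and computes this direction (assuming NumPy; the helper scatter_matrix is my own name):

```python
import numpy as np

c1 = np.array([[1, 2], [2, 3], [3, 3], [4, 5], [5, 5]], dtype=float)
c2 = np.array([[1, 0], [2, 1], [3, 1], [3, 2], [5, 3], [6, 5]], dtype=float)

m1, m2 = c1.mean(axis=0), c2.mean(axis=0)   # [3. 3.6] and [3.33 2.]

def scatter_matrix(X, m):
    """S = sum over samples of (x - m)(x - m)^T."""
    D = X - m
    return D.T @ D

S1 = scatter_matrix(c1, m1)   # [[10.  8.] [ 8.  7.2]]
S2 = scatter_matrix(c2, m2)   # [[17.33 16.] [16.  16.]]

Sw = S1 + S2                              # within-class scatter
v = np.linalg.solve(Sw, m1 - m2)          # Fisher direction (up to scale)
v /= np.linalg.norm(v)                    # normalize to a unit vector
print(v)                                  # approx [-0.67  0.75]
```

Only the direction of v matters; any rescaling gives the same separation of the projected classes.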
Notice that, as long as the line has the right direction, its exact position does not matter.
28.4 Singular Value Decomposition
Now let us discuss the final method of dimensionality reduction, namely Singular Value Decomposition (SVD). SVD can be viewed as a method for transforming correlated variables into a set of uncorrelated ones that better expose the various relationships among the original data items. It is a method for identifying and ordering the dimensions along which the data points exhibit the most variation. With SVD, it is possible to find the best approximation of the original data points using fewer dimensions; hence SVD is used for data reduction.
Singular Value Decomposition factorizes a real or complex matrix. For an M × N matrix A of rank r there exists a factorization (the Singular Value Decomposition, SVD)

$A = U \Sigma V^T$

where $U$ is an M × r matrix, $\Sigma$ is an r × r diagonal matrix of singular values $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0$, and $V$ is an N × r matrix.
Here the columns of $U$ are orthonormal eigenvectors of $AA^T$, the columns of $V$ are orthonormal eigenvectors of $A^T A$, and the nonzero eigenvalues $\lambda_1, \dots, \lambda_r$ of $AA^T$ are also the eigenvalues of $A^T A$; the singular values are $\sigma_i = \sqrt{\lambda_i}$. An illustration of the SVD dimensions and sparseness is given in Figure 28.9.
Figure 28.9 SVD Illustration
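As a minimal sketch of computing an SVD and using it for low-rank approximation (assuming NumPy; the matrix is illustrative, not from the module):

```python
import numpy as np

A = np.array([[ 3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])      # an illustrative 2x3 matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(s)                              # singular values, largest first

# Columns of U are eigenvectors of A A^T; rows of Vt are eigenvectors
# of A^T A; the nonzero eigenvalues of both are the squared singular values.
assert np.allclose(A, U @ np.diag(s) @ Vt)

# Rank-1 approximation: keep only the largest singular value.
k = 1
A1 = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(A1)                             # best rank-1 approximation of A
```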
Summary
- Explained the Fisher Linear Discriminant approach
- Outlined the concept of Singular Value Decomposition
- Discussed the computation of Singular Value Decomposition