Thursday, September 20, 2018


Linear Discriminant Analysis (LDA)

LDA is a way to reduce 'dimensionality' while at the same time preserving as much of the class discrimination information as possible.

How does it work?
Basically, LDA helps you find the 'boundaries' around clusters of classes. It projects your data points onto a line (or a lower-dimensional space) so that the clusters are as separated as possible, while each point stays close to the centroid of its own cluster.

What is it actually doing?
1. Computing the mean vector of each class (and the overall mean) across all dimensions.
2. Computing the between-class scatter, i.e. how far the class means lie from the overall mean (to measure separability).
3. Computing the within-class scatter, i.e. how spread out the points of each class are around their own class mean, which acts as a normalizer.
4. Finding the projection directions that maximize the between-class scatter relative to the within-class scatter, and projecting the data onto them.
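
To make these steps concrete, here is a minimal NumPy sketch (not from the original post; the toy data, variable names, and the choice of keeping two directions are illustrative assumptions):

import numpy as np

# Toy dataset: 3 classes of 50 points each in 3 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, size=(50, 3)) for c in (0, 2, 4)])
y = np.repeat([0, 1, 2], 50)

overall_mean = X.mean(axis=0)
n_features = X.shape[1]
S_W = np.zeros((n_features, n_features))  # within-class scatter
S_B = np.zeros((n_features, n_features))  # between-class scatter

for c in np.unique(y):
    Xc = X[y == c]
    mean_c = Xc.mean(axis=0)                     # step 1: class mean vector
    S_W += (Xc - mean_c).T @ (Xc - mean_c)       # step 3: within-class scatter
    diff = (mean_c - overall_mean).reshape(-1, 1)
    S_B += len(Xc) * (diff @ diff.T)             # step 2: between-class scatter

# Step 4: directions that maximize between-class vs. within-class scatter.
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
order = np.argsort(eigvals.real)[::-1]
W = eigvecs[:, order[:2]].real   # keep the top 2 discriminant directions
X_projected = X @ W

In practice you would rarely code this by hand; libraries such as scikit-learn wrap these steps in a single estimator.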


Let us say you have data represented by 100-dimensional feature vectors and you have 100,000 data points. You know that these data points belong to three different classes, but you are not sure which combination of features most affects their separation. The data you have is too large to perform any reasonable computation in a reasonable time. So you want to reduce these 100-dimensional feature vectors to, say, 50-dimensional feature vectors to let you learn from the data more efficiently.

Performing Principal Component Analysis (PCA) to reduce the number of features (dimensions) would give you the directions that affect your data the most, found by computing the leading eigenvalues and eigenvectors of the covariance matrix. But you are not satisfied: although you have obtained 50 new features, they do not distinguish the 3 classes as cleanly as they were distinguished in the original data.

You want to preserve the difference between the classes while reducing the dimensions. You look for a better alternative, and it leads you to Linear Discriminant Analysis, which reduces the number of features while also considering the separation between the classes.

LDA simply reduces the number of dimensions of the input feature vector while preserving the inter-class separation present in the original feature space.
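
As a hedged sketch of that workflow using scikit-learn (the library, the synthetic data from make_classification, and the parameter values are assumptions, not from the original post). One caveat: LDA itself can keep at most (number of classes − 1) discriminant directions, so with three classes the reduced representation has at most two dimensions rather than fifty.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Synthetic stand-in for the scenario above: 100-dimensional points, 3 classes.
X, y = make_classification(n_samples=1000, n_features=100, n_informative=10,
                           n_classes=3, random_state=0)

# LDA can keep at most (n_classes - 1) discriminant directions, so 2 here.
lda = LinearDiscriminantAnalysis(n_components=2)
X_reduced = lda.fit_transform(X, y)
print(X_reduced.shape)  # (1000, 2)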




                                   Figure 1: Three Class Feature Data

In Figure 1, a 3-dimensional input feature vector is reduced to a 1-dimensional feature vector while preserving the differences among the classes.

Let's talk about linear regression first. You may know that linear regression analysis tries to fit a line through the data points in an n-dimensional space, such that the distances between the points and the line are minimized.

Discriminant Analysis is, in a sense, the opposite of linear regression. Here, the task is to maximize the distance from the discriminating boundary (the discriminating line) to the data points on either side of it, while minimizing the distances between the points of the same class.

We know that the hypothesis equation is h(x) = wᵀx + c, where w is the weight vector and c is the bias term.

Discriminant analysis tries to find the optimal w and c such that the criterion described above holds.
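
A tiny sketch of what the fitted hypothesis looks like in code, assuming w and c have already been learned (the values below are made up purely for illustration):

import numpy as np

# Hypothetical learned parameters for a two-class problem in 3 dimensions.
w = np.array([0.8, -0.5, 1.2])   # weight vector (w)
c = -0.3                          # bias / threshold (c)

def h(x):
    """Linear discriminant: h(x) = w^T x + c."""
    return w @ x + c

x_new = np.array([1.0, 0.5, -0.2])
label = 1 if h(x_new) >= 0 else 0   # the sign of h(x) decides the class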

Linear discriminant analysis is a statistical method, developed by Fisher, for classifying an observation with p components into one of two groups. It splits the feature space into two regions separated by a line, which is what makes classification of the given data possible; the regions and the separating line are defined by the linear discriminant function. For more details, you may go through any reference book on Multivariate Analysis.

Logistic regression is a classification algorithm traditionally limited to only two-class classification problems.

If you have more than two classes, then Linear Discriminant Analysis is the preferred linear classification technique.

Limitations of Logistic Regression
Logistic regression is a simple and powerful linear classification algorithm. It also has limitations that suggest the need for alternative linear classification algorithms.
·Two-Class Problems: Logistic regression is intended for two-class or binary classification problems. It can be extended for multi-class classification but is rarely used for this purpose.

·Unstable With Well Separated Classes: Logistic regression can become unstable when the classes are well separated.

·Unstable With Few Examples: Logistic regression can become unstable when there are few examples from which to estimate the parameters.



                                          Figure 2: 2D mapping of Features

Linear Discriminant Analysis does address each of these points and is the go-to linear method for multi-class classification problems. Even with binary-classification problems, it is a good idea to try both logistic regression and linear discriminant analysis.
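
Following that suggestion, here is a minimal sketch that tries both classifiers on the same data and compares cross-validated accuracy (scikit-learn and the Iris dataset are assumptions for illustration, not part of the original post):

from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Fit and score both linear classifiers with 5-fold cross-validation.
for name, model in [("Logistic Regression", LogisticRegression(max_iter=1000)),
                    ("LDA", LinearDiscriminantAnalysis())]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")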

The LDA model consists of statistical properties of your data, calculated for each class. For a single input variable (x) these are the mean and the variance of the variable for each class. For multiple variables, these are the same properties calculated over the multivariate Gaussian distribution, namely the means and the covariance matrix.

These statistical properties are estimated from your data and plugged into the LDA equation to make predictions. These are the model values that you would save to a file for your model.

LDA makes predictions by estimating the probability that a new set of inputs belongs to each class. The class that gets the highest probability is the output class and a prediction is made.
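
For a single input variable, those per-class statistics are just a mean and a variance. Here is a minimal sketch of how they turn into a prediction, assuming equal class priors and the standard LDA assumption of one pooled variance shared by all classes (the toy numbers are made up):

import numpy as np

# Toy one-dimensional data: two classes with different means.
x0 = np.array([1.0, 1.2, 0.8, 1.1])   # class 0 samples
x1 = np.array([3.0, 2.8, 3.2, 3.1])   # class 1 samples

mu0, mu1 = x0.mean(), x1.mean()
# Pooled variance (LDA assumes each class shares the same variance).
pooled_var = (((x0 - mu0) ** 2).sum() + ((x1 - mu1) ** 2).sum()) / (len(x0) + len(x1) - 2)
prior0 = prior1 = 0.5                  # equal class priors for simplicity

def discriminant(x, mu, prior):
    """1-D LDA score: x*mu/var - mu^2/(2*var) + log(prior)."""
    return x * mu / pooled_var - mu ** 2 / (2 * pooled_var) + np.log(prior)

x_new = 2.6
scores = [discriminant(x_new, mu0, prior0), discriminant(x_new, mu1, prior1)]
predicted_class = int(np.argmax(scores))   # the class with the highest score wins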


How to Prepare Data for LDA

This section lists some suggestions you may consider when preparing your data for use with LDA.

·Classification Problems: This might go without saying, but LDA is intended for classification problems where the output variable is categorical. LDA supports both binary and multi-class classification.

·Gaussian Distribution: The standard implementation of the model assumes a Gaussian distribution of the input variables.

·Remove Outliers: Consider removing outliers from your data. These can skew the basic statistics used to separate classes in LDA, such as the mean and the standard deviation.

·Same Variance: LDA assumes that each input variable has the same variance. It is almost always a good idea to standardize your data before using LDA so that it has a mean of 0 and a standard deviation of 1 (a short pipeline sketch follows this list).
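
Here is that sketch of the standardization suggestion (scikit-learn, the Wine dataset, and the pipeline helpers are assumptions for illustration, not from the original post):

from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# Standardize each feature to mean 0 / std 1, then fit LDA on the scaled data.
model = make_pipeline(StandardScaler(), LinearDiscriminantAnalysis())
model.fit(X, y)
print(model.score(X, y))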


Remember one thing: apply LDA as a classifier only if your dataset is linearly separable; in that case you will get great results.

Difference between LDA and PCA

LDA is a method of dimensionality reduction. Another well-known one is Principal Component Analysis (PCA).

If you want to know more about PCA, please click here.

The difference is that PCA does not take into account the class information.

For the two clusters in Figure 2 above, PCA will try to find the direction that maximizes the variance and project the data onto that direction, which is along the y-axis in this case. This is clearly not ideal. We actually lose information, because the projections of the two clusters are no longer separable.

Without any math, LDA needs to accomplish two things: maximize the variance between the two clusters, and minimize the variance of the points within each cluster, after the projection. This results in two projected clusters that are clearly separated. Note that in this case we are actually using the fact that there are two clusters, i.e., the class information.

Mathematically, the two goals can be formulated as two scatter (covariance) matrices: the between-class scatter and the within-class scatter. You can read about them in more detail...
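
As an illustration of that contrast, here is a minimal sketch on a synthetic two-cluster dataset (scikit-learn and the made-up cluster parameters are assumptions, not the data behind Figure 2): PCA picks the direction of largest overall variance, while LDA picks the direction that best separates the two classes.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# Two elongated clusters: large variance along one axis, class separation along the other.
cov = [[0.2, 0.0], [0.0, 4.0]]
X = np.vstack([rng.multivariate_normal([0, 0], cov, 200),
               rng.multivariate_normal([3, 0], cov, 200)])
y = np.repeat([0, 1], 200)

pca_1d = PCA(n_components=1).fit_transform(X)                       # ignores class labels
lda_1d = LinearDiscriminantAnalysis(n_components=1).fit_transform(X, y)

# Compare how well the classes stay apart after each projection.
for name, proj in [("PCA", pca_1d), ("LDA", lda_1d)]:
    gap = abs(proj[y == 0].mean() - proj[y == 1].mean()) / proj.std()
    print(f"{name}: normalized gap between class means = {gap:.2f}")

On data shaped like this, the PCA projection mixes the two classes together, while the LDA projection keeps them clearly apart.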

Which model to use depends on your dataset. If the dataset has high variance, you need to reduce the number of features and add more data; after that, use a non-linear method for classification.

If the dataset has low variance, use a linear model.

In short: if the dataset is small and has low variance, use a linear model; otherwise, use a non-linear model.

Go Further!


I hope you enjoyed this post. This tutorial should give you the overall idea of Linear Discriminant Analysis, and the difference between LDA and PCA is highlighted at the end. Good luck!