Linear Discriminant Analysis (LDA)
LDA is a way to reduce 'dimensionality' while at the same
time preserving as much of the class discrimination information as possible.
How does it work?
Basically, LDA helps you find the 'boundaries' around
clusters of classes. It projects your data points onto a line (or a
lower-dimensional subspace) so that the clusters are as separated as possible,
with the points of each cluster staying close to their own centroid.
What is it actually doing?
1. Calculate the mean vector of each class across all
dimensions.
2. Calculate the between-class scatter, i.e., how far the
class means lie from the overall mean (this determines separability).
3. Calculate the within-class scatter, i.e., how much points
of the same class spread around their own class mean; it acts as a
normalizer for the between-class scatter.
4. Find the projection directions that maximize the between-class
scatter relative to the within-class scatter; these are the leading
eigenvectors of S_W⁻¹S_B. The sketch below makes these steps concrete.
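Here is a minimal from-scratch sketch of those four steps in Python with NumPy. The synthetic data and variable names are my own illustration, not from the original post:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: 3 classes, 4 features (illustrative only)
X = np.vstack([rng.normal(loc=m, size=(50, 4)) for m in (0.0, 2.0, 4.0)])
y = np.repeat([0, 1, 2], 50)

overall_mean = X.mean(axis=0)
S_W = np.zeros((4, 4))  # within-class scatter
S_B = np.zeros((4, 4))  # between-class scatter
for c in np.unique(y):
    Xc = X[y == c]
    mu_c = Xc.mean(axis=0)                    # step 1: class mean vector
    S_W += (Xc - mu_c).T @ (Xc - mu_c)        # step 3: scatter within class c
    diff = (mu_c - overall_mean).reshape(-1, 1)
    S_B += len(Xc) * (diff @ diff.T)          # step 2: scatter between classes

# Step 4: directions that maximize between-class vs. within-class scatter
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
order = np.argsort(eigvals.real)[::-1]
W = eigvecs.real[:, order[:2]]  # top C - 1 = 2 discriminant directions
X_lda = X @ W                   # projected data
```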
Let us say you have data represented by
100-dimensional feature vectors and you have 100,000 data points. You know that
these data points belong to three different classes, but you are not sure which
combination of features most affects their separation. The data you
have is too large to perform any reasonable computation in a reasonable time.
So you want to reduce these 100-dimensional feature vectors to, say,
50-dimensional feature vectors to allow you to learn from the data more efficiently.
Performing Principal Component Analysis (PCA) to reduce
the number of features (dimensions) would give you the directions along which
your data varies most, obtained from the leading eigenvalues (and eigenvectors)
of the data's covariance matrix. But you are not
satisfied: though you have obtained the 50 new features, they do not
distinguish the 3 classes as well as the original data did.
You want to preserve the differences between the
classes as well while reducing the dimensions. You look for a better
alternative, and it leads you to Linear Discriminant Analysis, which reduces the
number of features while also considering the inter-class separation.
In short, LDA reduces the number of dimensions of the input
feature vector while preserving the inter-class separation present in the
original feature space. (One caveat: LDA can produce at most C − 1 discriminant
components for C classes, so with 3 classes the projection has at most 2
dimensions.)
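To make this concrete, here is a minimal sketch of that 100-dimensional, 3-class scenario using scikit-learn. The synthetic data is my own construction, and note that with 3 classes LDA can keep at most 2 components:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(42)
n_per_class, n_features = 300, 100
# Three classes whose means differ along a few of the 100 features
means = np.zeros((3, n_features))
means[0, :5], means[1, :5], means[2, :5] = -2.0, 0.0, 2.0
X = np.vstack([rng.normal(m, 1.0, size=(n_per_class, n_features)) for m in means])
y = np.repeat([0, 1, 2], n_per_class)

lda = LinearDiscriminantAnalysis(n_components=2)  # at most C - 1 = 2
X_reduced = lda.fit_transform(X, y)
print(X_reduced.shape)  # (900, 2)
```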
In Figure 1, a 3-dimensional input feature vector is
reduced to a 1-dimensional feature vector while preserving the
differences among the classes.
Let's talk about linear regression first. You may know
that linear regression analysis tries to fit a line through the data points in
an n-dimensional space, such that the (squared) distances between the points
and the line are minimized.
Discriminant analysis is, in a sense, the opposite of linear
regression. Here, the task is to maximize the distance from the
discriminating line (the decision boundary) to the data points on either
side of it, while minimizing the distances between points of the same class.
We know that the hypothesis equation is h(x) = wᵀx + c.
Discriminant analysis tries to find the optimal w and
c, such that the criterion described above holds.
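For intuition, here is a tiny sketch of that hypothesis in NumPy; the weights w and offset c are made-up values for illustration, not fitted ones:

```python
import numpy as np

w = np.array([1.5, -0.5])  # hypothetical weight vector (illustrative values)
c = -1.0                   # hypothetical offset

def h(x):
    """Linear discriminant score: h(x) = w^T x + c."""
    return w @ x + c

x = np.array([2.0, 1.0])
print(h(x))                    # 1.5*2.0 - 0.5*1.0 - 1.0 = 1.5
label = 1 if h(x) >= 0 else 0  # side of the line decides the class
```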
Linear discriminant analysis is a statistical
method for classifying an observation with p components into one of two
groups. It was developed by Fisher. It divides the feature space into two
regions separated by a line, which is what classifies the given data; the
regions and the separating line are defined by the linear discriminant
function. For more details, you may go through any reference book on
multivariate analysis.
Logistic regression is a classification algorithm
traditionally limited to only two-class classification problems.
If you have more than two classes then Linear Discriminant
Analysis is the preferred linear classification technique.
Limitations of Logistic Regression
Logistic regression is a simple and powerful linear
classification algorithm. It also has limitations that suggest the need for
alternative linear classification algorithms.
·Two-Class Problems: Logistic regression is intended for
two-class or binary classification problems. It can be extended for multi-class
classification but is rarely used for this purpose.
·Unstable With Well-Separated Classes: Logistic
regression can become unstable when the classes are well separated.
·Unstable With Few Examples: Logistic regression can become
unstable when there are few examples from which to estimate the parameters.
Linear Discriminant Analysis does address each of these
points and is the go-to linear method for multi-class classification problems.
Even with binary-classification problems, it is a good idea to try both
logistic regression and linear discriminant analysis.
The LDA model consists of statistical properties of your data,
calculated for each class. For a single input variable (x), these are the mean and
the variance of the variable for each class. For multiple variables, the
same properties are calculated over the multivariate Gaussian: the class means
and the covariance matrix.
These statistical properties are estimated from your data
and plugged into the LDA equation to make predictions. They are the model values
that you would save to file for your model.
LDA makes predictions by estimating the probability that
a new set of inputs belongs to each class. The class that gets the highest
probability is the output class and a prediction is made.
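To make the prediction step concrete, here is a minimal sketch for a single input variable and two classes, assuming Gaussian class densities with a shared (pooled) variance; the toy numbers are my own:

```python
import numpy as np

# Toy training data: one input variable, two classes
x_train = np.array([1.0, 1.2, 0.8, 3.0, 3.2, 2.9])
y_train = np.array([0, 0, 0, 1, 1, 1])

# Per-class statistics -- these numbers ARE the saved model
means = np.array([x_train[y_train == k].mean() for k in (0, 1)])
priors = np.array([(y_train == k).mean() for k in (0, 1)])
var = np.mean([x_train[y_train == k].var() for k in (0, 1)])  # pooled variance

def predict(x):
    # Discriminant score per class: x*mu/var - mu^2/(2*var) + log(prior);
    # the class with the highest score wins
    scores = x * means / var - means**2 / (2 * var) + np.log(priors)
    return int(np.argmax(scores))

print(predict(1.1))  # -> 0
print(predict(2.7))  # -> 1
```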
How to Prepare Data for LDA
This section lists some suggestions you may consider when
preparing your data for use with LDA.
·Classification Problems: This might go without saying, but
LDA is intended for classification problems where the output variable is
categorical. LDA supports both binary and multi-class classification.
·Gaussian Distribution: The standard implementation of the
model assumes a Gaussian distribution of the input variables.
·Remove Outliers: Consider removing outliers from your data. These can skew
the basic statistics used to separate classes in LDA, such as the mean and the
standard deviation.
·Same Variance: LDA assumes that each input variable
has the same variance. It is almost always a good idea to standardize your data
before using LDA so that it has a mean of 0 and a standard deviation of 1
(see the sketch below).
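One way to follow that advice is to standardize inside a pipeline. This is a minimal scikit-learn sketch; the synthetic data, with deliberately mismatched feature scales, is my own:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# Synthetic 2-class data where the two features have very different scales
X = np.column_stack([rng.normal(0, 1, 200), rng.normal(0, 100, 200)])
X[100:] += [2.0, 200.0]  # shift class 1
y = np.repeat([0, 1], 100)

# Standardize to mean 0 / std 1, then fit LDA
model = make_pipeline(StandardScaler(), LinearDiscriminantAnalysis())
model.fit(X, y)
print(model.score(X, y))  # training accuracy
```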
One more thing to remember: LDA as a classifier works best when your dataset is
linearly separable; in that case, it will give you great results.
Difference between LDA and PCA
LDA is a method of dimensionality reduction. Another
well-known one is Principal Component Analysis (PCA).
The difference is that PCA does not take into account the
class information.
For the two clusters in the above Figure 2, PCA will try
to find the direction that maximizes the variance and project the data onto
it, which is along the y-axis in this case. This is clearly not ideal: we
actually lose information, because the projections of the two clusters are no
longer separable.
Without any math, LDA needs to accomplish two things:
maximize the variance between the two clusters, and minimize the variance of
the points within each cluster, after the projection. This results in two
projected clusters that are clearly separated. Note that in this case, we're
actually using the fact that there are two clusters, i.e., the class
information; the comparison sketch below shows exactly this contrast.
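The contrast is easy to reproduce in code. In this sketch (synthetic two-cluster data of my own construction), PCA picks the high-variance direction and mixes the classes, while LDA picks the direction that keeps them apart:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
# Two clusters separated along x, with much larger variance along y
X0 = rng.normal([0.0, 0.0], [0.5, 5.0], size=(200, 2))
X1 = rng.normal([3.0, 0.0], [0.5, 5.0], size=(200, 2))
X, y = np.vstack([X0, X1]), np.repeat([0, 1], 200)

z_pca = PCA(n_components=1).fit_transform(X).ravel()
z_lda = LinearDiscriminantAnalysis(n_components=1).fit_transform(X, y).ravel()

# Gap between projected class means, in units of overall spread
def separation(z):
    return abs(z[y == 0].mean() - z[y == 1].mean()) / z.std()

print(separation(z_pca))  # small: classes overlap after PCA
print(separation(z_lda))  # large: classes stay separated after LDA
```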
Mathematically, the two goals can be formulated in terms of two
scatter (covariance) matrices: the between-class scatter and the within-class
scatter. LDA maximizes the ratio of the former to the latter, as written out below.
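For reference, here is the standard formulation in LaTeX; the scatter-matrix definitions below are the usual textbook ones, filled in by me since the original post does not spell them out:

```latex
% Within-class scatter: spread of points around their own class mean
S_W = \sum_{c=1}^{C} \sum_{x_i \in \mathcal{D}_c} (x_i - \mu_c)(x_i - \mu_c)^{\top}

% Between-class scatter: spread of class means around the overall mean
S_B = \sum_{c=1}^{C} n_c \,(\mu_c - \mu)(\mu_c - \mu)^{\top}

% Fisher criterion: LDA seeks the projection w maximizing this ratio
J(w) = \frac{w^{\top} S_B \, w}{w^{\top} S_W \, w}
```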
Which kind of model you should use depends on your dataset. If the dataset has
high variance, you need to reduce the number of features and add more data;
after that, use a non-linear method for classification. If the dataset has low
variance, or is small, use a linear model; otherwise, use a non-linear model.
Go Further!
I hope you enjoyed this post. It should give you the overall idea of Linear Discriminant Analysis, with the difference between LDA and PCA highlighted at the end. Good luck!