Dimension Reduction: Data Compression

Data Compression:

What can you really do with data compression? Why learn it?
  • Reduce the data from 2D to 1D.
  • Or reduce it from 3D to 2D.
  • Visualizing 2D data is easier than visualizing 3D or higher-dimensional data, so high-dimensional data can be reduced to dimensions that are easy to visualize.

Principal Component Analysis (PCA):

Suppose, for example, we want to reduce the dimension from 2D to 1D. What we have to do is find a direction onto which to project the data so that the projection error, i.e. the sum of squared distances of the points from this line, is minimized.

More generally, to reduce the data from n dimensions to k dimensions, we find k vectors onto which to project the data, so as to minimize the projection error.
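The idea above can be sketched numerically. This is a minimal illustration (not from the post): synthetic 2D data roughly along a line, a helper that computes the projection error for a candidate direction, and a brute-force search over unit directions; all variable names are my own.

```python
import numpy as np

# Synthetic 2D data lying roughly along the direction (1, 0.5).
rng = np.random.default_rng(0)
t = rng.normal(size=50)
X = np.column_stack([t, 0.5 * t + 0.05 * rng.normal(size=50)])
X = X - X.mean(axis=0)  # center the data first

def projection_error(X, u):
    """Sum of squared distances from each point to the line through the
    origin with unit direction u (the quantity PCA minimizes)."""
    proj = np.outer(X @ u, u)   # projection of each point onto the line
    return np.sum((X - proj) ** 2)

# Brute-force search over unit directions on the half circle.
angles = np.linspace(0.0, np.pi, 1000)
errors = [projection_error(X, np.array([np.cos(a), np.sin(a)])) for a in angles]
best_angle = angles[int(np.argmin(errors))]
best = np.array([np.cos(best_angle), np.sin(best_angle)])
```

The best direction found this way lines up with the direction the data was generated along; the PCA algorithm below finds it directly, without any search.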


PCA algorithm:

Suppose the training set is $\{x^{(1)}, x^{(2)}, x^{(3)}, \dots, x^{(m)}\}$, where each $x^{(i)}$ is an $n$-dimensional vector.
First, we preprocess the data: apply mean normalization and, if the features are on different scales, feature scaling.
Now, in case we want to reduce the data from n dimensions to k dimensions:
Compute the "covariance matrix":
\[\Sigma = \frac{1}{m}\sum_{i=1}^{m} x^{(i)} \left(x^{(i)}\right)^T\]
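With the examples stacked as rows of an m × n matrix, the sum above collapses into one matrix product. A sketch (the data here is made up for illustration):

```python
import numpy as np

# Hypothetical data matrix: m = 100 examples (rows), n = 3 features.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
X = X - X.mean(axis=0)          # mean normalization

m = X.shape[0]
# (1/m) * sum_i x_i x_i^T, computed as one matrix product; shape (n, n).
Sigma = (X.T @ X) / m
```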

Now use the standard SVD function, np.linalg.svd(sigma), which gives us three results:
U, S, V = np.linalg.svd(sigma)
$U$ contains the $n$ eigenvectors as its columns; choose the first $k$ columns and store them in Ureduce to get the $k$ eigenvectors.
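This step can be sketched as follows (synthetic data again; since the covariance matrix is symmetric positive semi-definite, the columns of U from the SVD are its eigenvectors and S holds the eigenvalues in descending order):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5))
X = X - X.mean(axis=0)
Sigma = (X.T @ X) / X.shape[0]

# For a symmetric PSD matrix, the SVD doubles as an eigendecomposition:
# columns of U are eigenvectors, S the eigenvalues, sorted descending.
U, S, V = np.linalg.svd(Sigma)

k = 2
Ureduce = U[:, :k]              # first k columns: the top-k eigenvectors, (n, k)
```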

The reduced representation of an example x is z = Ureduce.T @ x, a k-dimensional vector.
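For a whole data matrix with one example per row, the per-example product z = Ureduce.T @ x is equivalent to a single matrix product, as this sketch shows:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(150, 4))
X = X - X.mean(axis=0)
U, S, V = np.linalg.svd((X.T @ X) / X.shape[0])
Ureduce = U[:, :2]              # (n, k) with n = 4, k = 2

# Per example: z = Ureduce.T @ x.  For the whole (m, n) matrix X, the
# same projection is one matrix product:
Z = X @ Ureduce                 # shape (m, k)
```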

Reconstruction from compressed representation:

The process is simple: \[X_{approx} = Z \, U_{reduce}^T\] (in code, Xapprox = Z @ Ureduce.T). Note the transpose: Z is m × k and Ureduce is n × k, so Z @ Ureduce.T maps the compressed data back into the original n-dimensional space.
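Putting compression and reconstruction together, here is a sketch on synthetic 3D data that lies near a 2D plane, so reducing to k = 2 loses almost nothing:

```python
import numpy as np

rng = np.random.default_rng(4)
# 3D data generated from 2 latent factors plus a little noise.
B = rng.normal(size=(200, 2))
X = B @ rng.normal(size=(2, 3)) + 0.01 * rng.normal(size=(200, 3))
X = X - X.mean(axis=0)

U, S, V = np.linalg.svd((X.T @ X) / X.shape[0])
Ureduce = U[:, :2]
Z = X @ Ureduce                 # compress: (m, 3) -> (m, 2)
Xapprox = Z @ Ureduce.T         # map back into the original 3D space
```

Because the data is nearly 2-dimensional to begin with, Xapprox reconstructs X with very small error.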

