Data Compression:
What can you really do with data compression, and why learn it?
- Reduce the data from 2D to 1D
- Or you can even reduce it from 3D to 2D.
- Visualizing 2D data is easier than visualizing 3D or higher-dimensional data, so high-dimensional data can be reduced to a number of dimensions that is easy to plot.
Principal Component Analysis (PCA):
We want to, for example, reduce the dimension of the data from 2D to 1D. What we have to do is find a direction onto which to project the data so that the projection error, i.e. the sum of squared distances of the points from this line, is minimized.
Another example: to reduce the data from n dimensions to k dimensions, we find k vectors onto which to project the data, so as to minimize the projection error.
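To make the projection-error idea concrete, here is a minimal sketch (mine, not part of the linked library) that projects 2D points onto a candidate unit direction and measures the sum of squared distances; the data and variable names are illustrative.

```python
import numpy as np

def projection_error(X, u):
    """Sum of squared distances from the points (rows of X) to the
    line through the origin with unit direction u."""
    u = u / np.linalg.norm(u)          # make sure u is a unit vector
    proj = (X @ u)[:, None] * u        # projection of each point onto u
    return np.sum((X - proj) ** 2)

# Toy 2D data lying roughly along the direction (1, 1)
rng = np.random.default_rng(0)
t = rng.normal(size=100)
X = np.column_stack([t, t + 0.1 * rng.normal(size=100)])
X = X - X.mean(axis=0)                 # mean-normalize first

good = projection_error(X, np.array([1.0, 1.0]))    # close to the data's direction
bad = projection_error(X, np.array([1.0, -1.0]))    # perpendicular direction
print(good, bad)                       # good is much smaller than bad
```

PCA picks the direction (or, in general, the k directions) minimizing exactly this quantity.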
Note: Here is the code for the PCA algorithm: https://github.com/geekRishabhjain/MLpersonalLibrary/blob/master/RlearnPy/PCA.py
PCA algorithm:
Suppose the training set is $\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}$, where each $x^{(i)}$ is an $n$-dimensional vector.
First, we preprocess the data by applying mean normalization and feature scaling.
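A minimal sketch of this preprocessing step, assuming zero-mean, unit-variance scaling (the helper name preprocess is illustrative):

```python
import numpy as np

def preprocess(X):
    """Mean-normalize each feature and scale it to unit standard deviation.
    X has one training example per row."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0            # avoid dividing by zero for constant features
    return (X - mu) / sigma, mu, sigma
```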
Now, suppose we want to reduce the data from n dimensions to k dimensions:
Compute the "covariance matrix":
\[\Sigma = \frac{1}{m}\sum_{i=1}^{m} x^{(i)} \left(x^{(i)}\right)^{T}\]
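In vectorized NumPy this sum is a single matrix product, assuming X holds one mean-normalized example per row:

```python
import numpy as np

def covariance_matrix(X):
    """Sigma = (1/m) * sum_i x_i x_i^T, vectorized.
    X is an (m, n) matrix of mean-normalized examples."""
    m = X.shape[0]
    return (X.T @ X) / m               # (n, n) covariance matrix
```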
Now use the standard SVD function as follows: U, S, V = np.linalg.svd(sigma). This gives us three results; U contains the n eigenvectors as its columns. Choose the first k columns and store them in Ureduce to get the k eigenvectors we need.
The reduced data is z = Ureduce.T @ x for each example x; with the examples stacked as the rows of a matrix X, this is Z = X @ Ureduce.
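Putting the SVD and projection steps together as a sketch (note that np.linalg.svd actually returns V transposed, hence Vt below; the function name pca_project is illustrative):

```python
import numpy as np

def pca_project(X, k):
    """Return the k-dimensional representation Z and the basis Ureduce.
    X is an (m, n) matrix of mean-normalized examples."""
    m = X.shape[0]
    Sigma = (X.T @ X) / m              # covariance matrix
    U, S, Vt = np.linalg.svd(Sigma)    # columns of U are the eigenvectors
    Ureduce = U[:, :k]                 # keep the first k eigenvectors
    Z = X @ Ureduce                    # (m, k) reduced representation
    return Z, Ureduce
```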
Reconstruction from compressed representation:
The process is simple: Xapprox = Z @ Ureduce.T (for a single example, xapprox = Ureduce @ z). Each reconstructed point is the closest point to the original that lies in the span of the chosen eigenvectors.
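A sketch of the reconstruction step, continuing the same row-wise convention (the helper name pca_reconstruct is illustrative):

```python
import numpy as np

def pca_reconstruct(Z, Ureduce):
    """Map the k-dimensional codes Z back into the original n-dimensional
    space; each row is the projection of the original example onto the
    span of the columns of Ureduce."""
    return Z @ Ureduce.T               # (m, n) approximation of X
```

Round-tripping with the earlier sketch, Z, U = pca_project(X, k) followed by Xapprox = pca_reconstruct(Z, U), yields the best rank-k approximation of the mean-normalized X in the least-squares sense.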