Data Compression:
What can you really do with data compression, and why learn it?
- Reduce the data from 2D to 1D
- Or you can even reduce it from 3D to 2D.
- Visualizing 2D data is easier than visualizing 3D or higher-dimensional data, so high-dimensional data can be reduced to a number of dimensions that is easy to plot.
Principal Component Analysis (PCA):
We want to, for example, reduce the dimension of the data from 2D to 1D. What we have to do is find a direction onto which to project the data so that the projection error, i.e. the sum of squared distances of the points from this line, is minimized.
Another example: to reduce the data from n dimensions to k dimensions, we find k vectors onto which to project the data, so as to minimize the projection error.
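To make the projection-error idea concrete, here is a minimal sketch (mine, not part of the linked library) that projects 2D points onto a candidate unit direction and measures the sum of squared distances; the data and variable names are illustrative.

```python
import numpy as np

def projection_error(X, u):
    """Sum of squared distances from the points (rows of X) to the
    line through the origin with unit direction u."""
    u = u / np.linalg.norm(u)          # make sure u is a unit vector
    proj = (X @ u)[:, None] * u        # projection of each point onto u
    return np.sum((X - proj) ** 2)

# Toy 2D data lying roughly along the direction (1, 1)
rng = np.random.default_rng(0)
t = rng.normal(size=100)
X = np.column_stack([t, t + 0.1 * rng.normal(size=100)])
X = X - X.mean(axis=0)                 # mean-normalize first

good = projection_error(X, np.array([1.0, 1.0]))    # close to the data's direction
bad = projection_error(X, np.array([1.0, -1.0]))    # perpendicular direction
print(good, bad)                       # good is much smaller than bad
```

PCA picks the direction (or, in general, the k directions) minimizing exactly this quantity.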
Note: Here is the code for the PCA algorithm: https://github.com/geekRishabhjain/MLpersonalLibrary/blob/master/RlearnPy/PCA.py
PCA algorithm:
Suppose the training set is $\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}$, where each $x^{(i)}$ is an $n$-dimensional vector.
First, we preprocess the data by applying mean normalization and feature scaling.
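A minimal sketch of this preprocessing step, assuming zero-mean, unit-variance scaling (the helper name preprocess is illustrative):

```python
import numpy as np

def preprocess(X):
    """Mean-normalize each feature and scale it to unit standard deviation.
    X has one training example per row."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0            # avoid dividing by zero for constant features
    return (X - mu) / sigma, mu, sigma
```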
Now, suppose we want to reduce the data from n dimensions to k dimensions:
Compute the "covariance matrix":
\[\Sigma = \frac{1}{m}\sum_{i=1}^{m} x^{(i)} \left(x^{(i)}\right)^{T}\]
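In vectorized NumPy this sum is a single matrix product, assuming X holds one mean-normalized example per row:

```python
import numpy as np

def covariance_matrix(X):
    """Sigma = (1/m) * sum_i x_i x_i^T, vectorized.
    X is an (m, n) matrix of mean-normalized examples."""
    m = X.shape[0]
    return (X.T @ X) / m               # (n, n) covariance matrix
```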
Now use the standard SVD function as follows: U, S, V = np.linalg.svd(sigma). This gives us three results; U contains the n eigenvectors as its columns. Choose the first k columns and store them in Ureduce to get the k eigenvectors we need.
The reduced data is z = Ureduce.T @ x for each example x; with the examples stacked as the rows of a matrix X, this is Z = X @ Ureduce.
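Putting the SVD and projection steps together as a sketch (note that np.linalg.svd actually returns V transposed, hence Vt below; the function name pca_project is illustrative):

```python
import numpy as np

def pca_project(X, k):
    """Return the k-dimensional representation Z and the basis Ureduce.
    X is an (m, n) matrix of mean-normalized examples."""
    m = X.shape[0]
    Sigma = (X.T @ X) / m              # covariance matrix
    U, S, Vt = np.linalg.svd(Sigma)    # columns of U are the eigenvectors
    Ureduce = U[:, :k]                 # keep the first k eigenvectors
    Z = X @ Ureduce                    # (m, k) reduced representation
    return Z, Ureduce
```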
Reconstruction from compressed representation:
The process is simple: Xapprox = Z @ Ureduce.T (for a single example, xapprox = Ureduce @ z). Each reconstructed point is the closest point to the original that lies in the span of the chosen eigenvectors.
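A sketch of the reconstruction step, continuing the same row-wise convention (the helper name pca_reconstruct is illustrative):

```python
import numpy as np

def pca_reconstruct(Z, Ureduce):
    """Map the k-dimensional codes Z back into the original n-dimensional
    space; each row is the projection of the original example onto the
    span of the columns of Ureduce."""
    return Z @ Ureduce.T               # (m, n) approximation of X
```

Round-tripping with the earlier sketch, Z, U = pca_project(X, k) followed by Xapprox = pca_reconstruct(Z, U), yields the best rank-k approximation of the mean-normalized X in the least-squares sense.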