Choosing the number of principal components in PCA:
Choosing k:
We typically choose k to be the smallest value such that:
\[\frac{\frac{1}{m}\sum_{i=1}^{m}\left\|x^{(i)}-x_{\text{approx}}^{(i)}\right\|^{2}}{\frac{1}{m}\sum_{i=1}^{m}\left\|x^{(i)}\right\|^{2}}\leq 0.01\]
The logic is simple: the numerator, $\frac{1}{m}\sum_{i=1}^{m}\|x^{(i)}-x_{\text{approx}}^{(i)}\|^{2}$, is the average squared projection error, and the denominator, $\frac{1}{m}\sum_{i=1}^{m}\|x^{(i)}\|^{2}$, is the total variation in the data.
When the ratio is 0.01, or 1%, we say that 99% of the variance is retained for this value of k; similarly, when it is 0.05, or 5%, we say that 95% of the variance has been retained.
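Rearranged, the condition simply says that the variance retained is at least 99%:

\[1-\frac{\frac{1}{m}\sum_{i=1}^{m}\left\|x^{(i)}-x_{\text{approx}}^{(i)}\right\|^{2}}{\frac{1}{m}\sum_{i=1}^{m}\left\|x^{(i)}\right\|^{2}}\geq 0.99\]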
Algorithm for choosing k:
- Try PCA with k=1
- Compute $U_{\text{reduce}}$, $z^{(1)}, z^{(2)}, \dots, z^{(m)}$, and $x_{\text{approx}}^{(1)}, \dots, x_{\text{approx}}^{(m)}$.
- Check the ratio shown above; if it is at most 0.01, stop; otherwise increment k and repeat (see the sketch after this list).
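A minimal sketch of this loop in Python/NumPy — the function name `choose_k` and the 0.01 threshold are illustrative, and it assumes the rows of `X` have already been mean-normalized:

```python
import numpy as np

def choose_k(X, threshold=0.01):
    """Return the smallest k whose average squared projection error
    is at most `threshold` times the total variance in the data.
    Assumes X (m x n) is already mean-normalized."""
    m, n = X.shape
    Sigma = (X.T @ X) / m            # covariance matrix (n x n)
    U, S, _ = np.linalg.svd(Sigma)   # columns of U are the principal directions
    total_variance = np.sum(np.linalg.norm(X, axis=1) ** 2) / m
    for k in range(1, n + 1):
        U_reduce = U[:, :k]          # keep the first k components
        Z = X @ U_reduce             # project: z(i) = U_reduce' * x(i)
        X_approx = Z @ U_reduce.T    # reconstruct x_approx(i)
        error = np.sum(np.linalg.norm(X - X_approx, axis=1) ** 2) / m
        if error / total_variance <= threshold:
            return k, U_reduce
    return n, U                      # no reduction met the threshold

```

In practice the repeated reconstructions can be skipped: with `U, S, _ = svd(Sigma)`, the ratio above equals `1 - S[:k].sum() / S.sum()`, so you can read off the smallest acceptable k directly from the singular values.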
One extra trick:
Supervised Learning speedup:
You can speed up supervised learning by reducing the number of features in the training set: simply extract the inputs $x^{(i)}$ and apply PCA to them to get the $z^{(i)}$.
Now the training set becomes $(z^{(i)}, y^{(i)})$. The same mapping can also be applied to the cross-validation and test sets; note that the mapping itself (the mean normalization parameters and $U_{\text{reduce}}$) should be learned from the training set only.
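A minimal sketch of that pipeline, assuming NumPy — `fit_pca`, `apply_pca`, and the toy shapes are illustrative names, and the key point is that the mean and $U_{\text{reduce}}$ are fit on the training inputs and then reused as-is:

```python
import numpy as np

def fit_pca(X_train, k):
    """Learn the PCA mapping (mean + U_reduce) from the training inputs only."""
    mu = X_train.mean(axis=0)
    Xc = X_train - mu                        # mean normalization
    Sigma = (Xc.T @ Xc) / Xc.shape[0]        # covariance matrix
    U, _, _ = np.linalg.svd(Sigma)
    return mu, U[:, :k]

def apply_pca(X, mu, U_reduce):
    """Apply the learned mapping to any set (train, CV, or test)."""
    return (X - mu) @ U_reduce

X_train = np.random.randn(500, 1000)         # toy stand-in for the real inputs
X_cv = np.random.randn(100, 1000)

mu, U_reduce = fit_pca(X_train, k=100)
Z_train = apply_pca(X_train, mu, U_reduce)   # train the classifier on (Z_train, y)
Z_cv = apply_pca(X_cv, mu, U_reduce)         # same mapping, never refit on CV/test
```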