If we have a normal equation that can do all that cost function and gradient descent do in one single line of code, then why should we use the latter and not former always?
Here are the benefits of each:
Normal Equation:
- You don't need to choose alpha, the learning rate.
- You don't need to iterate, as you would with gradient descent.
- But you do need to compute X.T @ X and then take its inverse, which is an O(n³) operation.
- Thus, if n, the number of features, is very large, the normal equation can be very slow.
Thus, ideally, if you don't have too many features, you should use the Normal Equation (see the sketch below).
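Here is a minimal sketch of the normal equation in NumPy, using a tiny made-up dataset (the data and the use of np.linalg.solve instead of an explicit inverse are my own illustrative choices):

import numpy as np

# Hypothetical data: m = 3 examples, n = 2 parameters
# (X already includes a bias column of ones).
X = np.array([[1.0, 2.0],
              [1.0, 3.0],
              [1.0, 5.0]])
y = np.array([5.0, 7.0, 11.0])

# Normal equation: theta = (X^T X)^(-1) X^T y
# Solving the linear system avoids forming the inverse explicitly
# and is more numerically stable, but the cost is still cubic in n.
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)  # approximately [1., 2.], i.e. y = 1 + 2x

One line of algebra gives the parameters directly, with no learning rate and no iterations, exactly as the list above describes.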
Gradient Descent:
- You need to choose the learning rate.
- You need to perform many iterations.
- But it works well even when n, the number of features, is large.
So you should use gradient descent when the number of features is very large.
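For comparison, here is a minimal sketch of batch gradient descent for the same linear-regression problem (the function name, the dataset, and the chosen alpha and iteration count are illustrative assumptions, not fixed values):

import numpy as np

def gradient_descent(X, y, alpha=0.05, num_iters=5000):
    """Batch gradient descent for linear regression (illustrative sketch)."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_iters):
        # Gradient of the mean-squared-error cost: (1/m) * X^T (X theta - y)
        grad = (X.T @ (X @ theta - y)) / m
        theta -= alpha * grad
    return theta

X = np.array([[1.0, 2.0],
              [1.0, 3.0],
              [1.0, 5.0]])
y = np.array([5.0, 7.0, 11.0])
print(gradient_descent(X, y))  # converges close to [1., 2.]

Note the trade-off: you have to pick alpha and run many iterations, but each iteration only costs matrix-vector products, so the method scales to a very large number of features where inverting X.T @ X would be impractical.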