Choosing the Thetas:
As discussed, we keep changing the thetas until J reaches a minimum. The critical question then is: how do we vary the thetas? Do we vary them randomly and pray to God that J converges?
You know it does not happen that way. Machine learning is maths, after all; nothing is left to God.
So the method we use here is called Gradient Descent. It is one of the best things you'll encounter, and unlike the cost function we defined for linear regression, it is used in many other machine learning models too.
Gradient Descent:
The steps we follow for finally deciding the thetas are as follows:
- Start with some initial theta0 and theta1; in practice we usually just set both to zero.
- Using gradient descent, keep changing the thetas until J converges to a minimum.
In gradient descent, what we do is repeatedly subtract from each theta the derivative of J with respect to that theta, multiplied by some parameter alpha:

theta0 := theta0 - alpha * (dJ/dtheta0)
theta1 := theta1 - alpha * (dJ/dtheta1)

Those who know calculus can work out the derivatives themselves; those who don't can just learn this formula and you'll be fine.
Note that theta0 and theta1 need to be updated simultaneously: as soon as you change theta0, h changes, so if you then used that new h to update theta1 you would be descending on something else. The correct way is to compute h with the current thetas and use that same h in both update formulas, as in the sketch below.
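To make the simultaneous update concrete, here is a minimal sketch in Python. It assumes the usual squared-error cost J (the one with the 1/2m factor), whose derivatives work out to the averages below; the data, the learning rate and the iteration count are made-up values purely for illustration.

```python
import numpy as np

def gradient_descent(x, y, alpha=0.01, iterations=5000):
    """Gradient descent for the two-theta hypothesis h(x) = theta0 + theta1 * x."""
    m = len(y)
    theta0, theta1 = 0.0, 0.0              # start with zeros, as in the steps above
    for _ in range(iterations):
        h = theta0 + theta1 * x            # predictions using the CURRENT thetas
        # Both derivatives use the same h -- this is the simultaneous update.
        grad0 = (1 / m) * np.sum(h - y)           # dJ/dtheta0
        grad1 = (1 / m) * np.sum((h - y) * x)     # dJ/dtheta1
        theta0 = theta0 - alpha * grad0
        theta1 = theta1 - alpha * grad1
    return theta0, theta1

# Made-up data that roughly follows y = 2x.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.1, 6.0, 8.2])
print(gradient_descent(x, y))              # ends up near theta0 = 0, theta1 = 2
```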
The parameter alpha is called the learning rate; it decides how big a step each update takes toward convergence, and you don't want it to be too small or too big.
If alpha is too small, gradient descent can be very slow, and if it is too large the updates may overshoot the minimum and fail to converge at all.
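A rough way to see this trade-off is to run the same update rule as above on made-up data and print the final cost J after 50 steps for a tiny, a moderate, and an overly large alpha:

```python
import numpy as np

def final_cost(x, y, alpha, iterations=50):
    """Run gradient descent and return the squared-error cost J at the end."""
    m = len(y)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        h = theta0 + theta1 * x
        theta0 -= alpha * (1 / m) * np.sum(h - y)
        theta1 -= alpha * (1 / m) * np.sum((h - y) * x)
    return (1 / (2 * m)) * np.sum((theta0 + theta1 * x - y) ** 2)

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])          # exactly y = 2x, so the minimum of J is 0
print(final_cost(x, y, alpha=0.001))        # too small: J has barely moved from its starting value
print(final_cost(x, y, alpha=0.05))         # reasonable: J is already close to 0
print(final_cost(x, y, alpha=1.0))          # too large: the updates overshoot and J blows up
```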