The cost function:
Whatever machine learning model you use, be it linear regression or anything else, the hypothesis function h will always have some parameters that we have to figure out from the given data.
Usually, these parameters are denoted by thetas: theta0, theta1, and so on.
Now the important question: how do we choose these thetas? We want thetas such that the resulting plot of h is the best fit for the given data.
As in the plot above, the red line is the best fit for the given data (the blue stars); any other line, say a horizontal or vertical one, would not be a good fit.
That was the intuition; now, in mathematical language, what counts as a good fit to the given data? A natural answer: the line that minimizes the cumulative (summed) distance of the data points from it. The larger this sum, the poorer the choice.
This sum can be expressed mathematically as follows:
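Writing it out in standard notation (with m training points and the hypothesis h(x) = theta0 + theta1*x described below), the cost function J is:

J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h(x^{(i)}) - y^{(i)} \right)^2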
Now, let's build some intuition for this equation.
Cost function: intuition
Let us say you want to calculate the error contributed by a single training point (x(i), y(i)) in the training set.
You calculate the prediction our hypothesis makes for x(i) by plugging x(i) into our h function, which here is theta0 + theta1*x.
This prediction is denoted h(x(i)). To calculate the error due to this point, you subtract the real output y(i) from it. This difference can be positive or negative, and the errors from different points should not cancel each other out, so we square it.
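For example (numbers made up for illustration): if the hypothesis predicts h(x(i)) = 5 while the real output is y(i) = 3, this point contributes (5 - 3)^2 = 4 to the sum. Without squaring, another point with error -2 would cancel it out, making the total error misleadingly small.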
Similarly, we calculate the error due to each point and sum them, then divide by 2m, where m is the number of points in the training set. Dividing by m gives the average error; the extra factor of 2 just simplifies later computation (it cancels when differentiating the square) and has no real significance.
The resulting sum is the total error for the chosen pair of thetas. If you vary the thetas, this sum varies, so the cost function J is a function of theta. Our original plan was to minimize the cost function J; to do this, we keep varying theta until J is minimized.
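To make this concrete, here is a minimal Python sketch (the dataset and theta values are made up for illustration). It computes J for a few choices of theta1 on a toy dataset whose points lie near y = 2x, and shows that the cost is smallest for the theta that fits best.

# Toy training set (made up for illustration): points lying near y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
m = len(xs)

def cost(theta0, theta1):
    """Squared-error cost J(theta0, theta1) over the training set."""
    total = 0.0
    for x, y in zip(xs, ys):
        prediction = theta0 + theta1 * x   # h(x(i)) = theta0 + theta1*x
        total += (prediction - y) ** 2     # squared error for this point
    return total / (2 * m)                 # divide by 2m, as in the formula

# Varying theta varies J; the best fit (theta1 near 2) gives the smallest cost.
for theta1 in [0.0, 1.0, 2.0, 3.0]:
    print(f"theta1 = {theta1}: J = {cost(0.0, theta1):.3f}")

Running this, J drops sharply as theta1 approaches 2 and grows again past it, which is exactly the behavior we exploit when minimizing J.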