today we will talk about the following points:

- model representation
- cost function
- gradient descent
- gradient descent for linear regression
- what's next
model representation
training set notation:

- m = number of training examples
- x's = input variable / features
- y's = output variable / target variable
- (x, y) = one training example
- (x^(i), y^(i)) = the i-th training example
the pipeline: feed the training set to the learning algorithm, which outputs a hypothesis h. the hypothesis maps an input x to a predicted output y; for linear regression with one variable, h(x) = theta0 + theta1 * x.
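this pipeline can be sketched in Python; the sample training set and the concrete theta values below are made up for illustration:

```python
# Training set: m made-up (x, y) pairs for illustration.
training_set = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]
m = len(training_set)  # m = number of training examples

def hypothesis(theta0, theta1, x):
    # The linear hypothesis h(x) = theta0 + theta1 * x.
    return theta0 + theta1 * x

# Prediction for the first training example, x^(1).
x1, y1 = training_set[0]
print(hypothesis(0.0, 2.0, x1))  # -> 2.0
```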
cost function
what is a cost function, and where does it come from?
now we have:

- hypothesis: h(x) = theta0 + theta1 * x
- parameters: theta0, theta1
- cost function: J(theta0, theta1) = (1 / 2m) * sum over i of (h(x^(i)) - y^(i))^2
- goal: minimize J(theta0, theta1) over theta0, theta1
why do we want to use it? the smaller the cost function, the better the hypothesis fits the training data. first consider a single parameter theta1:
then two parameters theta0, theta1:
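a minimal Python sketch of the squared-error cost function described above (the data points and theta values here are made up for illustration):

```python
def cost(theta0, theta1, training_set):
    # J(theta0, theta1) = (1 / 2m) * sum_i (h(x^(i)) - y^(i))^2
    m = len(training_set)
    total = 0.0
    for x, y in training_set:
        h = theta0 + theta1 * x  # hypothesis h(x)
        total += (h - y) ** 2
    return total / (2 * m)

data = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]  # points on the line y = x
print(cost(0.0, 1.0, data))  # perfect fit -> 0.0
```

a perfect fit drives J to zero; any mismatch between h(x^(i)) and y^(i) makes J larger, which is exactly why minimizing J gives a better hypothesis.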
gradient descent

given a training set, a learning algorithm and a hypothesis, we want to minimize the cost function J of theta. one way of finding the minimum of the cost function is gradient descent.
method:

- start with some theta0, theta1
- keep changing theta0, theta1 to reduce J of theta
- until we end up at a minimum

graphs:
problem: different start positions may lead to different local optima.
definition:

repeat until convergence: theta_j := theta_j - alpha * (d/d theta_j) J(theta0, theta1), for j = 0 and j = 1, where alpha is the learning rate. theta0 and theta1 must be updated simultaneously.
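the update rule can be sketched on a toy one-parameter cost; the function J(theta1) = (theta1 - 3)^2 below is my own example, not from the notes:

```python
# Toy cost J(theta1) = (theta1 - 3)^2, with derivative dJ/dtheta1 = 2 * (theta1 - 3).
def dJ(theta1):
    return 2.0 * (theta1 - 3.0)

theta1 = 0.0   # start with some theta1
alpha = 0.1    # learning rate
for _ in range(200):
    # theta1 := theta1 - alpha * dJ/dtheta1
    theta1 = theta1 - alpha * dJ(theta1)

print(round(theta1, 4))  # -> 3.0, the minimum of J
```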
intuition: let's just think about the single parameter theta1: whichever side of the minimum theta1 starts on, the derivative term has the sign that moves theta1 toward the minimum.
the problem of the learning rate: if alpha is too small, gradient descent is slow; if alpha is too large, it can overshoot the minimum and may fail to converge, or even diverge.
why can gradient descent converge with a fixed learning rate, without making alpha smaller over time? because as we approach a local minimum, the derivative gets smaller, so the update steps automatically get smaller.
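a quick numeric check of this claim, using a toy cost J(theta) = theta^2 (my example, not from the notes): with alpha held fixed, each step alpha * dJ/dtheta shrinks on its own because the derivative shrinks near the minimum:

```python
# Toy cost J(theta) = theta^2, derivative dJ/dtheta = 2 * theta.
theta = 4.0
alpha = 0.1  # fixed learning rate, never decreased
steps = []
for _ in range(5):
    step = alpha * 2.0 * theta  # size of this update
    steps.append(step)
    theta -= step

print(steps)  # each step is smaller than the one before
```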
gradient descent for linear regression

outcome: plugging the linear regression cost function into the update rule gives

- theta0 := theta0 - alpha * (1/m) * sum over i of (h(x^(i)) - y^(i))
- theta1 := theta1 - alpha * (1/m) * sum over i of (h(x^(i)) - y^(i)) * x^(i)

(update theta0 and theta1 simultaneously.) the cost function for linear regression is convex (bowl-shaped), so there are no local optima other than the global one.
batch gradient descent: "batch" means that each step of gradient descent uses all m training examples.
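putting the pieces together, a self-contained Python sketch of batch gradient descent for linear regression; the data, alpha and iteration count are made-up illustration choices:

```python
def batch_gradient_descent(data, alpha=0.1, iterations=5000):
    # Each step uses ALL m training examples ("batch").
    m = len(data)
    theta0, theta1 = 0.0, 0.0  # start with some theta0, theta1
    for _ in range(iterations):
        # Gradient of J: (1/m) * sum(h - y) and (1/m) * sum((h - y) * x).
        grad0 = sum(theta0 + theta1 * x - y for x, y in data) / m
        grad1 = sum((theta0 + theta1 * x - y) * x for x, y in data) / m
        # Simultaneous update of both parameters.
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1

data = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # made-up points on y = 2x + 1
theta0, theta1 = batch_gradient_descent(data)
print(round(theta0, 3), round(theta1, 3))  # close to 1.0 and 2.0
```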
two extensions:
- solve theta0, theta1 without iteration (the normal equation method)
- learn with a larger number of features, using subscripts to describe the features, e.g. x1, x2, x3, and using linear algebra (matrices and vectors)
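for the first extension, a sketch of the closed-form least-squares solution in the two-parameter case (pure Python; the course's normal equation generalizes this to many features via matrices and vectors):

```python
def fit_closed_form(data):
    # Least-squares theta0, theta1 without any iteration.
    m = len(data)
    x_mean = sum(x for x, _ in data) / m
    y_mean = sum(y for _, y in data) / m
    # theta1 = sum((x - x_mean)(y - y_mean)) / sum((x - x_mean)^2)
    theta1 = (sum((x - x_mean) * (y - y_mean) for x, y in data)
              / sum((x - x_mean) ** 2 for x, _ in data))
    theta0 = y_mean - theta1 * x_mean
    return theta0, theta1

print(fit_closed_form([(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]))  # -> (1.0, 2.0)
```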