
[Supervised ML] Gradient Descent - 3

ghwangbo 2023. 7. 27. 14:30

Gradient Descent

 

Definition


- Gradient descent is an algorithm for finding a minimum of the cost function.

- For linear regression, it finds the w and b in f(x) = wx + b that minimize the cost function J(w, b).

Goal: find w and b that minimize J(w, b).

Example)

Gradient Descent example

Problem: What is the fastest way to get down from a hill to the lowest ground?

How: At each position, take a step in the steepest downhill direction. Repeating this process is gradient descent.

 

Implementation


Gradient descent algorithm

Formula (repeat until convergence, updating w and b simultaneously):

w = w - α * ∂J(w, b)/∂w
b = b - α * ∂J(w, b)/∂b

α (alpha) represents the learning rate

 

The learning rate decides how big a step you take when going down the hill.
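As a rough sketch (my own example, not from the lecture), assume a simple quadratic cost J(w) = w², whose derivative is dJ/dw = 2w. The learning rate directly scales the size of one update step:

```python
# Toy cost J(w) = w**2, so dJ/dw = 2*w (assumed for illustration).
def gradient_step(w, alpha):
    dj_dw = 2 * w              # derivative of J at the current w
    return w - alpha * dj_dw   # one gradient descent update

w = 10.0
print(gradient_step(w, 0.01))  # small alpha -> small step (w moves from 10.0 to ~9.8)
print(gradient_step(w, 0.4))   # large alpha -> big step (w jumps to ~2.0)
```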

 

 

Algorithm Intuition

Intuition

If you draw the tangent line to J(w) at the current point, its slope is the derivative of J(w) there.

 

In Case 1 the slope is greater than 0, so the update w - (learning rate) * (slope of J(w)) decreases w.

In Case 2 the slope is negative, so the same update increases w.
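A tiny numeric check of the two cases (my own example, using an assumed cost J(w) = (w - 3)² with its minimum at w = 3):

```python
# Assumed toy cost J(w) = (w - 3)**2, so dJ/dw = 2*(w - 3); minimum at w = 3.
def step(w, alpha=0.1):
    slope = 2 * (w - 3)        # derivative of J at the current w
    return w - alpha * slope

# Case 1: w = 5 gives slope 4 > 0, so the update decreases w toward 3.
print(step(5.0))
# Case 2: w = 1 gives slope -4 < 0, so the update increases w toward 3.
print(step(1.0))
```

In both cases the sign of the slope pushes w in the right direction automatically.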

 

Learning Rate


Finding an adequate learning rate is crucial: if alpha is too small, gradient descent is slow; if it is too large, the updates can overshoot the minimum and fail to converge.

 

How do we know when we are approaching a local minimum of the cost function?

- Near a local minimum the derivative becomes smaller, so the update steps also become smaller.

Because of this, gradient descent can reach a minimum with a fixed learning rate alpha, without decreasing it.
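This can be seen in a short loop (my own sketch, again with the assumed toy cost J(w) = w²): as w nears the minimum the derivative shrinks, so each step taken with a fixed alpha is smaller than the last:

```python
# Toy cost J(w) = w**2 (assumed); fixed alpha, yet the steps shrink.
w, alpha = 8.0, 0.2
steps = []
for i in range(5):
    step_size = alpha * 2 * w   # alpha * dJ/dw, with dJ/dw = 2*w
    w -= step_size
    steps.append(step_size)
print(steps)  # each step is smaller than the one before
print(w)      # w keeps approaching the minimum at 0
```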

 

Gradient descent for Linear Regression


Gradient descent algorithm

Case 1: derivative of the cost function with respect to w

∂J(w, b)/∂w = (1/m) Σᵢ (f_w,b(x⁽ⁱ⁾) - y⁽ⁱ⁾) · x⁽ⁱ⁾

Case 2: derivative of the cost function with respect to b

∂J(w, b)/∂b = (1/m) Σᵢ (f_w,b(x⁽ⁱ⁾) - y⁽ⁱ⁾)

Simplified

Substituting these derivatives into the update rule, the gradient descent algorithm for linear regression reduces to the equations above.
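As a sanity check (my own, not from the course), the analytic derivatives can be compared against a finite-difference approximation of the cost on a small made-up dataset:

```python
# Squared-error cost and its analytic derivatives for f(x) = w*x + b.
def cost(x, y, w, b):
    m = len(x)
    return sum((w * x[i] + b - y[i]) ** 2 for i in range(m)) / (2 * m)

def analytic_gradient(x, y, w, b):
    m = len(x)
    dj_dw = sum((w * x[i] + b - y[i]) * x[i] for i in range(m)) / m
    dj_db = sum((w * x[i] + b - y[i]) for i in range(m)) / m
    return dj_dw, dj_db

# Hypothetical data and parameters, chosen only for the check.
x, y, w, b = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0], 0.5, 0.1
eps = 1e-6
# Central finite differences of the cost in each parameter.
num_dw = (cost(x, y, w + eps, b) - cost(x, y, w - eps, b)) / (2 * eps)
num_db = (cost(x, y, w, b + eps) - cost(x, y, w, b - eps)) / (2 * eps)
dj_dw, dj_db = analytic_gradient(x, y, w, b)
print(abs(dj_dw - num_dw) < 1e-5, abs(dj_db - num_db) < 1e-5)  # True True
```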

Code


Batch gradient descent: Each step of gradient descent uses all the training examples

 

Computing the derivatives of the cost function

# returns the derivatives of the cost function with respect to w and b
def compute_gradient(x, y, w, b):
    m = len(x)
    dj_dw = 0
    dj_db = 0

    for i in range(m):
        # prediction for this example
        f_wb_i = w * x[i] + b
        # partial derivatives of the cost for this example
        dj_db_i = f_wb_i - y[i]
        dj_dw_i = (f_wb_i - y[i]) * x[i]
        # add to the running totals
        dj_db = dj_db + dj_db_i
        dj_dw = dj_dw + dj_dw_i

    # divide by the number of examples
    dj_db = (1 / m) * dj_db
    dj_dw = (1 / m) * dj_dw

    return dj_dw, dj_db
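To tie it together, here is a minimal driver loop (my own sketch, not from the course; the inline gradient computation matches compute_gradient above, repeated here so the snippet runs on its own). The training data is hypothetical:

```python
def gradient_descent(x, y, w, b, alpha, num_iters):
    """Run batch gradient descent and return the final w and b."""
    m = len(x)
    for _ in range(num_iters):
        # same gradients as compute_gradient above, inlined
        dj_dw = sum((w * x[i] + b - y[i]) * x[i] for i in range(m)) / m
        dj_db = sum((w * x[i] + b - y[i]) for i in range(m)) / m
        # simultaneous update of w and b
        w -= alpha * dj_dw
        b -= alpha * dj_db
    return w, b

# Hypothetical training data with an exact fit at w = 200, b = 100.
x_train = [1.0, 2.0]
y_train = [300.0, 500.0]
w_final, b_final = gradient_descent(x_train, y_train, 0.0, 0.0, 1.0e-2, 10000)
print(w_final, b_final)  # approaches w = 200, b = 100
```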

 
