
[Advanced Learning Algorithms] Neural Network Training - 2

ghwangbo 2024. 1. 16. 20:23

1. Details of training


 

# Step 1: Specify how to compute the output given x and the parameters w, b (define the model)
# Math: f_w,b(x) = g(w . x + b)   (logistic regression)
z = np.dot(w, x) + b
f_x = 1 / (1 + np.exp(-z))

# Step 2: Specify the loss and cost
# Math: L(f_w,b(x), y) = -y * log(f_w,b(x)) - (1 - y) * log(1 - f_w,b(x))
#       J(w,b) = (1/m) * sum over i of L(f_w,b(x^(i)), y^(i))
loss = -y * np.log(f_x) - (1 - y) * np.log(1 - f_x)

# Step 3: Train on data to minimize J(w,b) (gradient descent)
w = w - alpha * dj_dw
b = b - alpha * dj_db
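
Putting the three steps together, here is a minimal NumPy sketch of the full training loop (the function name, learning rate, and iteration count are assumptions for illustration, not the course's exact code):

import numpy as np

def train_logistic_regression(X, y, alpha=0.1, num_iters=1000):
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    for _ in range(num_iters):
        # Step 1: model output f_w,b(x) = sigmoid(w . x + b)
        z = X @ w + b
        f_x = 1 / (1 + np.exp(-z))
        # Step 2: cost J(w,b) = average of the logistic losses (can be printed to monitor training)
        cost = np.mean(-y * np.log(f_x) - (1 - y) * np.log(1 - f_x))
        # Step 3: gradient descent update with dJ/dw and dJ/db
        dj_dw = (X.T @ (f_x - y)) / m
        dj_db = np.mean(f_x - y)
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
    return w, b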

 

If we were to express the same three steps for a neural network in TensorFlow, it would be the code below.

 

 

Code

import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.losses import BinaryCrossentropy

# Step 1: define the model
model = Sequential([
    Dense(units = 25, activation = 'sigmoid'),
    Dense(units = 15, activation = 'sigmoid'),
    Dense(units = 1, activation = 'sigmoid')])

# Step 2: specify the loss (binary cross-entropy, i.e. the logistic loss)
model.compile(loss = BinaryCrossentropy())

# Step 3: train on the data
# Uses back propagation to compute the derivatives, then gradient descent
model.fit(X, Y, epochs = 100)

 

Epochs: the number of full passes through the training data (how many iterations of the learning algorithm, such as gradient descent, to run).

 

Logistic loss is also known as binary cross-entropy (BinaryCrossentropy in Keras) since the target output is either 0 or 1.
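
After training, a prediction for a new example might look like the sketch below (x_new and the 0.5 threshold are assumptions for illustration):

probability = model.predict(x_new)        # sigmoid output, between 0 and 1
yhat = (probability >= 0.5).astype(int)   # threshold at 0.5 to get the 0/1 label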

2. Different Activation functions


2.1 Intro

Linear Activation function

- Often referred to as "no activation function": g(z) = z

Sigmoid (Logistic) Function

- g(z) = 1 / (1 + e^(-z)), used for classification

ReLU (Rectified Linear Unit) 

- g(z) = max(0, z)

On the ReLU graph, if z is less than 0, g(z) is 0; if z is greater than or equal to 0, g(z) equals z.

ReLU is the common choice for the hidden layer.

Why?

- ReLU is much faster to compute than the sigmoid function since it only computes max(0, z).
- Using the linear activation function for the hidden layers does not allow the network to compute complex features.
- The output of the previous layer, w1·x + b1, is the input of the next layer, so the combined expression becomes w2·(w1·x + b1) + b2.
- As this repeats layer after layer up to the output layer, the whole network reduces to the linear regression formula.
- If we use the sigmoid function in the output layer with linear activation functions for all the hidden layers, the resulting model is equivalent to plain logistic regression.

Conclusion: Do not use the linear activation function in the hidden layers.

2.2 Choosing activation functions

For the output layer

Binary classification (y is 0 or 1)
- Sigmoid
Regression (y can be negative or positive)
- Linear activation function
Regression (y >= 0)
- ReLU

For the hidden layers

- People usually use the ReLU function rather than sigmoid because it is much faster to compute and train, as in the sketch below.
- The sigmoid function has two regions where its slope goes flat (both tails), while ReLU has only one, so gradient descent converges faster.
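
A minimal Keras sketch of these recommendations for a binary classification problem (the layer sizes are assumptions for illustration):

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

# ReLU in the hidden layers, sigmoid only in the output layer
model = Sequential([
    Dense(units = 25, activation = 'relu'),
    Dense(units = 15, activation = 'relu'),
    Dense(units = 1, activation = 'sigmoid')])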

2.3 Why do we not use linear activation functions

A linear function of a linear function is still a linear function.

If the sigmoid function is used in the output layer while every hidden layer uses the linear activation function, the whole neural network computes nothing more than logistic regression.
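
A quick numerical check of the "linear of linear is linear" claim (the weights and input here are made up for illustration):

import numpy as np

# Two layers with linear activation: a1 = W1 @ x + b1, a2 = W2 @ a1 + b2
W1, b1 = np.array([[2.0, -1.0]]), np.array([0.5])   # 2 inputs -> 1 unit
W2, b2 = np.array([[3.0]]), np.array([-1.0])        # 1 unit  -> 1 output

x = np.array([1.0, 4.0])
a2 = W2 @ (W1 @ x + b1) + b2

# The same result from a single linear layer with W = W2 @ W1 and b = W2 @ b1 + b2
W = W2 @ W1
b = W2 @ b1 + b2
print(a2, W @ x + b)   # both print -5.5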

 


3. Multiclass Classification


3.1 Intro

Multiclass classification is a problem with more than two possible output categories.

For example, suppose we want to predict the genre of a movie. It could be adventure, fantasy, thriller, and so on.

Then how do we predict the genre of a movie?

 

Originally, the sigmoid function allowed us to classify into two categories, where the output is either 0 or 1.

By using the softmax regression algorithm, we can handle problems with more than two output categories.

3.2 Softmax Regression Algorithm

This algorithm is the generalization of the logistic regression algorithm to the multiclass classification setting.

Intuition

Logistic regression has 2 possible output values:

z = w · x + b

a1 = g(z) = 1 / (1 + e^(-z)) = P(y = 1 | x)         -> probability that y is 1 given x

a2 = 1 - g(z) = P(y = 0 | x) = 1 - a1               -> probability that y is 0 given x

 

But Softmax Regression has N outputs.

Let's say there are 4 possible outputs:

z1 = w1 * x + b1

z2 = w2 * x + b2

z3 = w3 * x + b3

z4 = w4 * x + b4

 

The probability of class 1 is a1 = e^z1 / (e^z1 + e^z2 + e^z3 + e^z4) = P(y = 1 | x), and in general aj = e^zj / (e^z1 + ... + e^zN).

Cost function for softmax regression

loss = { -log a1   if y = 1
         -log a2   if y = 2
         ...
         -log aN   if y = N }

If we were to graph the loss -log aj, it approaches 0 as aj approaches 1 and grows very large as aj approaches 0.
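
A minimal NumPy sketch of the softmax calculation and this loss (the logits below are made up for illustration):

import numpy as np

def softmax(z):
    # a_j = e^(z_j) / sum over k of e^(z_k)
    ez = np.exp(z - np.max(z))   # subtracting max(z) avoids overflow; the result is unchanged
    return ez / ez.sum()

z = np.array([2.0, 1.0, 0.5, -1.0])   # made-up values of z1..z4
a = softmax(z)                        # probabilities a1..a4, summing to 1

y = 1                                 # true class (1-indexed, as in the notes)
loss = -np.log(a[y - 1])              # loss = -log a_y
print(a, loss)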

Implementation of Softmax Function

There can be roundoff error depending on how you express the function. You can fix this issue for the logistic loss by passing an extra parameter.

model.compile(loss = BinaryCrossentropy())

# This version reduces the roundoff error
# logit = z, the raw output before the sigmoid is applied
model.compile(loss = BinaryCrossentropy(from_logits = True))

To do this, we have to give the output layer a linear activation function; the sigmoid is then applied inside the loss function.

 

We can apply the same method to the softmax function.

What we are basically doing is passing the raw value z (the logit) into the loss function instead of the already-computed probability; this lets TensorFlow rearrange the terms internally and compute the result more accurately. A sketch of the softmax version is below.
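
A minimal sketch of the numerically more accurate softmax setup, assuming a 10-class problem with integer labels (the layer sizes are assumptions for illustration; X, Y are the training data as in the earlier code):

import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.losses import SparseCategoricalCrossentropy

model = Sequential([
    Dense(units = 25, activation = 'relu'),
    Dense(units = 15, activation = 'relu'),
    Dense(units = 10, activation = 'linear')])   # linear output: the model outputs z, not probabilities

# from_logits = True tells the loss to apply the softmax itself, avoiding the roundoff error
model.compile(loss = SparseCategoricalCrossentropy(from_logits = True))
model.fit(X, Y, epochs = 100)

# To get probabilities at prediction time, apply the softmax to the logits
logits = model(X)
probs = tf.nn.softmax(logits)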

3.3 Classification with multiple outputs

Let's say we want to detect multiple objects in the same input: a car, a pedestrian, and a dog.

We can solve this problem with a single network whose output layer has 3 units, one per object, each with a sigmoid activation, so every unit answers its own yes/no question. A sketch is below.
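
A minimal sketch of such a multi-label output layer (the hidden-layer sizes are assumptions for illustration):

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.losses import BinaryCrossentropy

# 3 sigmoid output units: one yes/no answer each for car, pedestrian, dog
model = Sequential([
    Dense(units = 25, activation = 'relu'),
    Dense(units = 15, activation = 'relu'),
    Dense(units = 3, activation = 'sigmoid')])

model.compile(loss = BinaryCrossentropy())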
