
[Advanced Learning Algorithms] Machine Learning Development Process - 4

ghwangbo 2024. 1. 22. 19:49

1. Machine Learning Development Process


1.1  Cycle of ML process

Step 1: Choose architecture (model, data, etc.)

Step 2: Train Model

Step 3: Diagnostics (bias, variance, and error analysis)

Step 4: Deploy in production (deploy, monitor, and maintain the system)

 

1.2 Example: Spam Classification

Supervised Learning Algorithm

  • x = features of the email
  • y = whether the email is spam or not

How do we reduce the model's error?

  • Collect more data
  • Develop sophisticated features based on email routing
  • Define sophisticated features from email body
  • Design algorithms to detect misspellings

2. Methods to reduce the error of the model


2.1 Error Analysis

Problem:

  • 100 out of 500 cross-validation examples were misclassified

Analysis:

  • Manually examine the 100 misclassified examples and categorize them based on common traits
  • Collect more data on the traits that account for the most errors (a counting sketch is shown below)
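
A minimal sketch of this kind of tallying in Python, assuming each misclassified example has already been hand-tagged with its traits (the tag names and the misclassified_tags variable below are made up for illustration):

```python
from collections import Counter

# Hypothetical hand-assigned trait tags for each misclassified email,
# e.g. "pharma spam", "phishing", "misspellings", "unusual routing".
misclassified_tags = [
    ["pharma spam"],
    ["phishing", "misspellings"],
    ["unusual routing"],
    # ... one tag list per misclassified example
]

# Count how often each trait appears among the errors.
counts = Counter(tag for tags in misclassified_tags for tag in tags)

# The most common categories suggest where collecting more data
# (or engineering better features) is likely to pay off.
for tag, n in counts.most_common():
    print(f"{tag}: {n} errors")
```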

2.2 Adding data

  • Rather than adding more data of everything, add more data of the types where error analysis indicated it might help
  • Rather than collecting brand-new training examples, we can use a technique called Data Augmentation

2.3 Data Augmentation / Data Synthesis

Def: Modifying or augmenting existing data to create new training examples

 

Example: OCR model for Character recognition

  • We can shrink, rotate, or mirror an image of the letter "A" to create additional training examples (see the sketch below)
  • Adding purely random or meaningless noise to the data does not help
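
A small sketch of this kind of augmentation using the Pillow library (the file name letter_A.png is hypothetical, and the course does not prescribe a specific library):

```python
from PIL import Image, ImageOps

# Hypothetical path to a training image of the letter "A".
img = Image.open("letter_A.png")

# Create new training examples by applying label-preserving distortions.
augmented = [
    img.rotate(15),                                  # small rotation
    img.rotate(-15),
    img.resize((img.width // 2, img.height // 2)),   # shrink
    ImageOps.mirror(img),                            # horizontal flip
]

# Each distorted image keeps the label "A" and is added to the training set.
```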

2.4 Transfer Learning

Def: Using the data from a different task

 

Example: We have a 5-layer model that was trained to classify and detect objects

  • Reuse the trained parameters of those layers for a digit classification problem, replacing only the output layer
    • The object classification and detection model has already learned parameters that recognize basic components of an image, such as lines, edges, corners, and shapes.
  • Option 1: train only the output layer's parameters
  • Option 2: train all parameters

This method is also called Fine-Tuning.
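
A sketch of both options using Keras, assuming a hypothetical pretrained model saved as object_model.h5; reusing layers this way is one common approach, not the course's exact code:

```python
import tensorflow as tf

# Hypothetical pretrained model for object classification/detection.
pretrained = tf.keras.models.load_model("object_model.h5")

# Keep every layer except the original output layer, then attach a new
# output layer for the 10 digit classes.
base_layers = pretrained.layers[:-1]
model = tf.keras.Sequential(base_layers + [
    tf.keras.layers.Dense(10, activation="softmax")  # new output layer
])

# Option 1: freeze the transferred layers and train only the new output layer.
for layer in base_layers:
    layer.trainable = False

# Option 2: leave every layer trainable and fine-tune all parameters instead.
# for layer in base_layers:
#     layer.trainable = True

model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit(x_digits, y_digits, epochs=...)  # digit dataset assumed available
```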

2.5 Error Metrics for skewed datasets

2.5.1 Precision and recall

 

We are going to use error metrics to judge whether our algorithm actually performs well, especially on skewed datasets.

 

Ex) Rare disease classification example

We got 1% error on the test set, but only 0.5% of patients have the disease. A model that always predicts y = 0 would achieve 0.5% error, so plain test error is not a useful measure of performance on such a skewed dataset.

 

Then how are we going to measure the performance of the algorithm with the skewed dataset?

 

By using Precision and Recall

 

y = 1 if disease present

y = 0 otherwise 

 

Predicted \ Actual    1                   0
1                     True Positive       False Positive
0                     False Negative      True Negative

 

 

Precision

= of all patients for whom we predicted y = 1, what fraction actually have the rare disease

= True positives / Total predicted positives = TP / (TP + FP)

-> high precision means that when the algorithm predicts y = 1, it is likely to be correct

 

Recall

= of all patients who actually have the rare disease, what fraction did we correctly detect as having it

= True positives / Total actual positives = TP / (TP + FN)

 

Summary: Precision and recall together tell you whether your algorithm actually performs well on a skewed dataset.
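
A small NumPy sketch of computing precision and recall from binary predictions (the toy labels below are made up for illustration):

```python
import numpy as np

def precision_recall(y_true, y_pred):
    """Compute precision and recall for binary labels (1 = disease present)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))  # true positives
    fp = np.sum((y_pred == 1) & (y_true == 0))  # false positives
    fn = np.sum((y_pred == 0) & (y_true == 1))  # false negatives
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return float(precision), float(recall)

# Toy example with a skewed label distribution.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
print(precision_recall(y_true, y_pred))  # (0.5, 0.5)
```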

 

2.5.2 Trading off precision and recall

By adjusting the threshold of the logistic regression, we can trade off the algorithm's precision and recall.

 

Suppose we want to predict y = 1 only if very confident

-> higher precision, lower recall

 

Suppose we want to avoid missing too many cases of the rare disease

-> lower precision, higher recall

 

F1 Score

= helps you compare precision/recall trade-offs across thresholds (or algorithms), so that you can choose the best threshold value

= an average of sorts that pays more attention to whichever of the two is lower (the harmonic mean)

= 1 / ((1/2)(1/P + 1/R)) = 2PR / (P + R)
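
A minimal sketch of sweeping thresholds and keeping the one with the highest F1 score, assuming hypothetical predicted probabilities probs and labels y_true:

```python
import numpy as np

def f1(p, r):
    """Harmonic mean of precision and recall; low if either one is low."""
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0

# Hypothetical predicted probabilities from logistic regression and the
# matching ground-truth labels.
probs = np.array([0.05, 0.20, 0.35, 0.60, 0.80, 0.90])
y_true = np.array([0,    0,    1,    0,    1,    1])

best = None
for threshold in np.arange(0.1, 1.0, 0.1):
    y_pred = (probs >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    p = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    r = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    score = f1(p, r)
    if best is None or score > best[1]:
        best = (threshold, score)

print(f"best threshold = {best[0]:.1f}, F1 = {best[1]:.2f}")
```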

 

 
