[Advanced Learning Algorithms] Decision Trees - 5
1. Intuition
Ex) Cat Classification example
We have three different features (ear shape, face shape, whiskers) and want to determine whether a given animal is a cat or not.
Terminology
The decision tree algorithm in machine learning borrows its terminology from the tree data structure in computer science. The node at the top of the tree is the root node. The nodes at the very bottom of the tree are the leaf nodes. Nodes that are neither the root nor a leaf are decision nodes.
2. Building a Decision Tree
Then how are we going to build a decision tree to determine if an animal is a cat or not?
In other words, how are we going to efficiently build a decision tree to solve the classification problem with high accuracy?
First, we need a measurement that quantifies how well a split separates the dataset. We measure the purity of a node, the closeness of its data to a single category, by computing its entropy, a measure of impurity.
Step 1: Decide what feature to use for the root node
Step 2: Decide what features to use to split each resulting subset of the data
Decision 1: How to choose what feature to split on at each node?
- By choosing the split that maximizes purity (or, equivalently, minimizes impurity)
Purity: how close the data in a node is to being all cats or all non-cats
Decision 2: When do we stop splitting?
- When a node consists of a single class (100% purity)
- When splitting a node would result in the tree exceeding a maximum depth
  - We set a maximum depth to keep the tree from growing too deep and large, which can cause overfitting.
- When improvements in purity score are below a threshold
- When the number of examples in a node is below a threshold
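These stopping criteria correspond roughly to hyperparameters of scikit-learn's DecisionTreeClassifier. A minimal sketch (the tiny dataset below is made up just for illustration):

```python
from sklearn.tree import DecisionTreeClassifier

# Toy cat-classification data: [ear shape is pointy, face shape is round, whiskers present]
X = [[1, 1, 1], [0, 0, 1], [0, 1, 0], [1, 0, 0], [1, 1, 0], [0, 0, 0]]
y = [1, 1, 0, 0, 1, 0]  # 1 = cat, 0 = not cat

tree = DecisionTreeClassifier(
    criterion="entropy",         # split by entropy / information gain
    max_depth=3,                 # stop when the tree would exceed a maximum depth
    min_samples_split=2,         # stop when a node has too few examples to split
    min_impurity_decrease=0.01,  # stop when the improvement in purity is below a threshold
)
tree.fit(X, y)
print(tree.predict([[1, 0, 1]]))
```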
3. Decision Tree Learning
3.1 How to measure purity
Measuring Purity
- using the entropy function
- Let p1 be the fraction of examples that are cats, and p0 = 1 - p1 be the fraction of examples that are NOT cats.
H(p1) = -p1 * log2(p1) - p0 * log2(p0)
      = -p1 * log2(p1) - (1 - p1) * log2(1 - p1)
If we graph this function over p1 from 0 to 1, the entropy is 0 when p1 = 0 or p1 = 1 (a completely pure node) and reaches its maximum value of 1 when p1 = 0.5 (a 50/50 mix).
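A minimal Python sketch of this entropy function (the example values are arbitrary):

```python
import numpy as np

def entropy(p1):
    """H(p1) for a node where p1 is the fraction of examples that are cats."""
    if p1 == 0 or p1 == 1:
        return 0.0  # by convention, 0 * log2(0) is treated as 0
    return -p1 * np.log2(p1) - (1 - p1) * np.log2(1 - p1)

print(entropy(0.5))    # 1.0  -> most impure (half cats, half not cats)
print(entropy(5 / 6))  # ~0.65
print(entropy(1.0))    # 0.0  -> pure node (all cats)
```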
3.2 Choosing a split
We want to find the feature that gives us the lowest entropy, i.e., the lowest impurity. We can evaluate a split by taking a weighted average of the entropy of the resulting sub-branches. If we subtract this weighted average of the entropy at the sub-branches from the entropy at the root node, we get a value called information gain.
3.3 Information Gain:
= Measures a reduction in entropy
= H(p1_root) - (w_left * H(p1_left) + w_right * H(p1_right))
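A small sketch of this formula in Python; the split below is hypothetical: 10 examples at the root (5 cats), with 5 examples going to each branch (4 cats on the left, 1 cat on the right):

```python
import numpy as np

def entropy(p1):
    return 0.0 if p1 in (0, 1) else -p1 * np.log2(p1) - (1 - p1) * np.log2(1 - p1)

def information_gain(p1_root, w_left, p1_left, w_right, p1_right):
    # reduction in entropy: H(root) minus the weighted average entropy of the two branches
    return entropy(p1_root) - (w_left * entropy(p1_left) + w_right * entropy(p1_right))

# w_left and w_right are the fractions of the root's examples that go to each branch
print(information_gain(p1_root=0.5, w_left=0.5, p1_left=0.8, w_right=0.5, p1_right=0.2))  # ~0.28
```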
Decision Tree Learning Process
- Start with all examples at the root node
- Calculate information gain for all possible features, and pick the one with the highest information gain
- Split dataset according to selected feature, and create left and right branches of the tree
- Keep repeating the splitting process until a stopping criterion is met:
  - When a node is 100% one class
  - When splitting a node would result in the tree exceeding the maximum depth
  - When the information gain from additional splits is below a threshold
  - When the number of examples in a node is below a threshold
The key is to recursively find the feature to split on at each node by computing the information gain of each feature, as sketched below.
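A rough, simplified sketch of this recursion for binary (0/1) features; the helper names (best_split, build_tree) and the toy data are made up for illustration:

```python
import numpy as np

def entropy(y):
    p1 = np.mean(y) if len(y) else 0.0
    return 0.0 if p1 in (0.0, 1.0) else -p1 * np.log2(p1) - (1 - p1) * np.log2(1 - p1)

def best_split(X, y):
    """Return the feature index with the highest information gain (binary features only)."""
    best_feature, best_gain = None, 0.0
    for j in range(X.shape[1]):
        left, right = y[X[:, j] == 1], y[X[:, j] == 0]
        if len(left) == 0 or len(right) == 0:
            continue  # this feature does not actually split the node
        w_left, w_right = len(left) / len(y), len(right) / len(y)
        gain = entropy(y) - (w_left * entropy(left) + w_right * entropy(right))
        if gain > best_gain:
            best_feature, best_gain = j, gain
    return best_feature, best_gain

def build_tree(X, y, depth=0, max_depth=2):
    """Recursively split until a stopping criterion is met; returns a nested dict."""
    feature, gain = best_split(X, y)
    if feature is None or depth >= max_depth:
        return {"leaf": True, "predict_cat": bool(np.mean(y) >= 0.5)}
    mask = X[:, feature] == 1
    return {"leaf": False, "feature": feature,
            "left": build_tree(X[mask], y[mask], depth + 1, max_depth),
            "right": build_tree(X[~mask], y[~mask], depth + 1, max_depth)}

X = np.array([[1, 1, 1], [0, 0, 1], [0, 1, 0], [1, 0, 0]])  # toy features
y = np.array([1, 1, 0, 0])                                  # 1 = cat
print(build_tree(X, y))
```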
3.4 Features with more than two values
- We can represent such a feature using one-hot encoding: each possible value becomes its own binary (0/1) feature, so the data is represented as a vector.
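For example, with pandas we might one-hot encode a hypothetical "ear shape" feature with three possible values like this (a sketch, not the course's code):

```python
import pandas as pd

# Hypothetical "ear shape" feature with three values
df = pd.DataFrame({"ear_shape": ["pointy", "floppy", "oval", "pointy"]})
print(pd.get_dummies(df, columns=["ear_shape"]))
# Each value becomes its own 0/1 column: ear_shape_floppy, ear_shape_oval, ear_shape_pointy
```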
Then how do we handle a continuous feature?
To split on a continuous feature, we have to choose a threshold that separates the examples. We try candidate thresholds and pick the one that gives the highest information gain.
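A rough sketch of choosing such a threshold; the weights, labels, and the best_threshold helper below are made up for illustration:

```python
import numpy as np

def entropy(p1):
    return 0.0 if p1 in (0, 1) else -p1 * np.log2(p1) - (1 - p1) * np.log2(1 - p1)

def best_threshold(values, y):
    """Try the midpoints between sorted values and return the one with the highest information gain."""
    v = np.sort(values)
    best_t, best_gain = None, 0.0
    for t in (v[:-1] + v[1:]) / 2:  # candidate thresholds = midpoints between adjacent values
        left, right = y[values <= t], y[values > t]
        if len(left) == 0 or len(right) == 0:
            continue
        w_l, w_r = len(left) / len(y), len(right) / len(y)
        gain = entropy(np.mean(y)) - (w_l * entropy(np.mean(left)) + w_r * entropy(np.mean(right)))
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain

weights = np.array([7.2, 8.8, 9.2, 10.2, 11.0, 15.0])  # hypothetical animal weights
is_cat  = np.array([1,   1,   0,   1,    0,    0])
print(best_threshold(weights, is_cat))
```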
3.5 Regression with Decision Trees: Predicting a Number
Choosing a split
- Unlike classification problems, which use impurity (entropy) to choose the feature to split on, regression problems use the variance of the target values.
- We compute the weighted average variance of each candidate split's branches and choose the split that gives the greatest reduction in variance, as sketched below.
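A small sketch of the variance-reduction idea; the target values and the split below are hypothetical:

```python
import numpy as np

def variance_reduction(y, goes_left):
    """Variance at the root minus the weighted average variance of the two branches."""
    left, right = y[goes_left], y[~goes_left]
    w_left, w_right = len(left) / len(y), len(right) / len(y)
    return np.var(y) - (w_left * np.var(left) + w_right * np.var(right))

# Hypothetical target values (animal weights) split on "ear shape is pointy"
weights = np.array([7.2, 9.2, 8.4, 7.6, 10.2, 8.8])
pointy_ears = np.array([True, True, False, True, False, False])
print(variance_reduction(weights, pointy_ears))
```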
4. Tree Ensembles
4.1 Intuition
- A single decision tree is very sensitive to the training data. Changing a single example can change the entropy and information gain, and therefore the whole tree.
- We use multiple trees so that the model is less sensitive to the dataset, and we call this a tree ensemble.
4.2 Creating Tree Ensemble
Sampling with replacement
- This is a method for creating a new dataset.
- The intuition is that you randomly pick an example from the original dataset. When an example is picked, it is not excluded from being picked again next time. You repeat this until the new dataset is the same size as the original one. This sampling method is called sampling with replacement.
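A quick NumPy sketch of sampling with replacement (the ten "examples" here are just indices):

```python
import numpy as np

rng = np.random.default_rng(0)
dataset = np.arange(10)  # stand-in for 10 training examples

# Draw 10 samples with replacement: some examples repeat, others may never appear
new_dataset = rng.choice(dataset, size=len(dataset), replace=True)
print(new_dataset)
```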
Bagged tree algorithm
for b = 1 to B:
    Use sampling with replacement to create a new training set of size m
    Train a decision tree on the new dataset
- Even with sampling with replacement, the trees sometimes end up using the same feature at the root node and at a few of the nodes near the root.
- We can randomize the feature choice by allowing the algorithm to choose only from a random subset of the features at each node.
- This algorithm is called the random forest algorithm (see the sketch below).
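As a sketch of the difference, scikit-learn's BaggingClassifier implements bagged trees, while RandomForestClassifier additionally restricts each split to a random subset of the features (the synthetic data below is just for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Bagged trees: B = 100 trees, each trained on a bootstrap sample (sampling with replacement)
bagged = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=0).fit(X, y)

# Random forest: bagging plus a random subset of features considered at each split
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0).fit(X, y)

print(bagged.score(X, y), forest.score(X, y))
```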
Boosting tree algorithm
Intuition: learn more efficiently by still creating datasets with sampling with replacement, but making it more likely to pick examples that were misclassified by the previously trained trees, rather than picking uniformly at random.
Ex) XGBoost (eXtreme Gradient Boosting)
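A minimal sketch of XGBoost's scikit-learn-style interface, assuming the xgboost package is installed; the synthetic data and hyperparameter values are arbitrary:

```python
from sklearn.datasets import make_classification
from xgboost import XGBClassifier  # pip install xgboost

X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Boosted trees: each new tree focuses more on the examples the previous trees got wrong
model = XGBClassifier(n_estimators=100, max_depth=3, learning_rate=0.1)
model.fit(X, y)
print(model.score(X, y))
```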
5. Decision Tree vs Neural Networks
Decision Trees
- Work well on structured (tabular) data
- Not recommended for unstructured data (images, audio, text)
- Fast
- Small decision trees may be human interpretable
Neural Networks
- Work well on all types of data, including structured and unstructured data
- May be slower than a decision tree
- Work with transfer learning
- When building a system of multiple models working together, it might be easier to string together multiple neural networks