
CHAPTER 2. Fundamentals of Machine Learning

Study Notes/Introduction to Artificial Intelligence 2021. 7. 9. 18:58

2.1 Rule Based Machine Learning Overview

Definition of ML

experience E, task T, performance P → a program learns when its performance P on task T improves with experience E

Function Approximation

Instance X (an example)
- Features O (Sunny, Warm, ...)
- Label Y (Yes)
Training set D
Hypotheses H
- $h_i$ = <Sunny, Warm, ?, ?, ?, Same> → Yes
Target function C (the true function)
Goal: make H approximate C
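A minimal sketch of this hypothesis representation, assuming each constraint is a concrete value, `?` (any value), or `None` (standing in for NULL, which accepts no value). The function name `matches` is hypothetical:

```python
from typing import Optional, Sequence

# One constraint per feature:
#   "?"   -> any value is acceptable
#   None  -> no value is acceptable (the NULL constraint)
#   other -> exactly that value is required
Constraint = Optional[str]


def matches(h: Sequence[Constraint], x: Sequence[str]) -> bool:
    """Return True if hypothesis h labels instance x as positive (Yes)."""
    return all(c == "?" or c == v for c, v in zip(h, x))


h = ("Sunny", "Warm", "?", "?", "?", "Same")
print(matches(h, ("Sunny", "Warm", "High", "Strong", "Cool", "Same")))   # True
print(matches(h, ("Rainy", "Cold", "High", "Strong", "Warm", "Change")))  # False
```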

Graphical Representation of Function Approximation

Map X and H onto each other with a Venn diagram.
The more general a hypothesis h is, the more instances X it covers.
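To make the "more general covers more instances" picture concrete, here is a small sketch that checks the more-general-than-or-equal relation and counts coverage over a tiny, hypothetical two-feature instance space (the domains and function names are assumptions for illustration):

```python
from itertools import product


def more_general_or_equal(h1, h2) -> bool:
    """h1 >= h2: every instance that h2 covers is also covered by h1."""
    if any(c is None for c in h2):
        return True  # h2 has a NULL constraint, so it covers no instance at all
    return all(c1 == "?" or c1 == c2 for c1, c2 in zip(h1, h2))


# Hypothetical two-feature instance space, small enough to enumerate.
DOMAINS = [("Sunny", "Rainy"), ("Warm", "Cold")]


def coverage(h) -> int:
    """Number of instances in the tiny instance space that h labels positive."""
    return sum(all(c == "?" or c == v for c, v in zip(h, x)) for x in product(*DOMAINS))


for h in [("Sunny", "Warm"), ("Sunny", "?"), ("?", "?")]:
    print(h, coverage(h))  # 1, 2, 4 instances respectively
```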

2.2 Introduction to Rule Based Algorithm

Find-S Algorithm

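Start from the most specific hypothesis h = <NULL, NULL, NULL, NULL, NULL, NULL>, then for every instance X in D (the collection of instances):
- If X is positive:
  - For each feature whose value in X is not yet covered by h, generalize that feature just enough to include it (e.g., if only Strong has been seen so far and Light comes in, the feature becomes ?, covering both)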
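The Find-S loop above can be sketched as follows. The four training examples are modeled on the classic EnjoySport table and are an assumption, not data from these notes; `None` stands in for NULL:

```python
from typing import List, Optional, Sequence, Tuple

Hypothesis = List[Optional[str]]  # "?" = any value, None = NULL (no value)


def find_s(data: Sequence[Tuple[Sequence[str], bool]]) -> Hypothesis:
    """Find-S: the most specific hypothesis that covers every positive instance."""
    h: Hypothesis = [None] * len(data[0][0])  # most specific: <NULL, ..., NULL>
    for x, is_positive in data:
        if not is_positive:
            continue  # Find-S ignores negative instances entirely
        for i, value in enumerate(x):
            if h[i] is None:
                h[i] = value  # first positive instance: copy its value
            elif h[i] != value:
                h[i] = "?"    # conflicting values: generalize to "?"
    return h


# Hypothetical EnjoySport-style data: (Sky, AirTemp, Humidity, Wind, Water, Forecast)
DATA = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]
print(find_s(DATA))  # ['Sunny', 'Warm', '?', 'Strong', '?', '?']
```

Because negatives are skipped, Find-S cannot tell whether its result wrongly covers a negative instance, which is one motivation for the version space below.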

Version Space

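Collect many hypotheses and find the range of those consistent with the training data (this range is the version space).
General boundary G
Specific boundary S
VS = { h ∈ H | G ≥ h ≥ S }, where ≥ means "more general than or equal to"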
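For a tiny instance space the version space can simply be listed by brute force. This sketch uses the List-Then-Eliminate approach (enumerate every conjunctive hypothesis, keep the consistent ones), which is a swapped-in illustration rather than the lecture's method; the three features and three examples are hypothetical:

```python
from itertools import product

# Hypothetical 3-feature instance space (an assumption for illustration).
DOMAINS = [("Sunny", "Rainy"), ("Warm", "Cold"), ("Strong", "Light")]

DATA = [
    (("Sunny", "Warm", "Strong"), True),
    (("Sunny", "Warm", "Light"), True),
    (("Rainy", "Cold", "Strong"), False),
]


def matches(h, x):
    """h labels x positive if every constraint is '?' or equals x's value."""
    return all(c == "?" or c == v for c, v in zip(h, x))


def all_hypotheses(domains):
    """Every conjunctive hypothesis: each slot is a value or '?', plus all-NULL."""
    yield from product(*[d + ("?",) for d in domains])
    yield tuple(None for _ in domains)  # covers no instance


def version_space(domains, data):
    """List-Then-Eliminate: keep hypotheses that agree with every training label."""
    return [h for h in all_hypotheses(domains)
            if all(matches(h, x) == label for x, label in data)]


for h in version_space(DOMAINS, DATA):
    print(h)
```

Here the most specific survivor, ('Sunny', 'Warm', '?'), plays the role of S, and ('Sunny', '?', '?') and ('?', 'Warm', '?') form G. Enumeration grows exponentially with the number of features, which is why the boundary-based algorithm is used instead.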

Candidate Elimination Algorithm

To build the version space, start from the most specific and the most general hypotheses and narrow the range between them.
G0 = <?,?,?,?,?,?>, S0=<NULL,NULL,NULL,NULL,NULL,NULL>

Progress of Candidate Elimination Algorithm

For each instance X: if it is positive, generalize S (and drop members of G that reject X); if it is negative, specialize G (and drop members of S that accept X).
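A sketch of candidate elimination for conjunctive hypotheses, following the standard textbook formulation rather than code from the lecture. The attribute domains and the four EnjoySport-style training examples are assumptions; `None` stands in for NULL:

```python
def matches(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))


def more_general_or_equal(h1, h2):
    if any(c is None for c in h2):
        return True  # h2 covers nothing, so anything is at least as general
    return all(c1 == "?" or c1 == c2 for c1, c2 in zip(h1, h2))


def min_generalization(s, x):
    """Smallest change to s so that it also covers the positive instance x."""
    return tuple(v if c is None else (c if c in ("?", v) else "?") for c, v in zip(s, x))


def min_specializations(g, x, domains):
    """All one-slot specializations of g that exclude the negative instance x."""
    return [g[:i] + (v,) + g[i + 1:]
            for i, c in enumerate(g) if c == "?"
            for v in domains[i] if v != x[i]]


def candidate_elimination(data, domains):
    n = len(domains)
    G = {("?",) * n}   # G0 = <?, ..., ?>
    S = {(None,) * n}  # S0 = <NULL, ..., NULL>
    for x, positive in data:
        if positive:
            G = {g for g in G if matches(g, x)}
            # For conjunctive hypotheses S stays a single hypothesis.
            S = {s if matches(s, x) else min_generalization(s, x) for s in S}
            S = {s for s in S if any(more_general_or_equal(g, s) for g in G)}
        else:
            S = {s for s in S if not matches(s, x)}
            new_G = set()
            for g in G:
                if not matches(g, x):
                    new_G.add(g)
                    continue
                for h in min_specializations(g, x, domains):
                    if any(more_general_or_equal(h, s) for s in S):
                        new_G.add(h)
            # Keep only the maximally general members of G.
            G = {g for g in new_G
                 if not any(h != g and more_general_or_equal(h, g) for h in new_G)}
    return S, G


DOMAINS = [("Sunny", "Cloudy", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
           ("Strong", "Weak"), ("Warm", "Cool"), ("Same", "Change")]
DATA = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]
S, G = candidate_elimination(DATA, DOMAINS)
print("S:", S)
print("G:", G)
```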

How to classify the next instance?

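How should we classify an instance that falls between S and G, where the hypotheses in the version space disagree?
- This is hard to resolve, which makes the method difficult to use in other (real-world) domains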

Is this working?

It works, but only in a perfect world: it cannot handle noise.

2.3 Introduction to Decision Tree

Because we live with noise

Credit Approval Dataset

One of the UCI (benchmark) datasets: deciding whether to approve a credit card application
690 instances, 15 features, 2 classes
Suppose we build a tree that checks only one feature; splitting on A9 gives:
A9 (307+, 383-)
- t (284+, 77-)
- f (23+, 306-)

2.4 Entropy and Information Gain

Entropy

Which attribute is better to check? → The one that reduces uncertainty the most
Higher entropy = higher uncertainty
$H(X)=-\sum_x P(X=x)\log_b P(X=x)$
- Conditional entropy
  - $H(Y|X)=\sum_x P(X=x)H(Y|X=x)=\sum_x P(X=x)\{-\sum_y P(Y=y|X=x)\log_b P(Y=y|X=x)\}$
  - Here $P(X=x)$ acts like a prior, weighting the entropy of each branch
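The two formulas translate directly into code. This sketch assumes base 2 (bits) and treats $0 \log 0$ as 0; the example plugs in the A9 split from the credit approval data above:

```python
import math
from typing import Dict, Sequence


def entropy(probs: Sequence[float], base: float = 2.0) -> float:
    """H(X) = -sum_x P(x) log_b P(x); terms with P(x) = 0 contribute 0."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)


def conditional_entropy(p_x: Dict[str, float],
                        p_y_given_x: Dict[str, Sequence[float]],
                        base: float = 2.0) -> float:
    """H(Y|X) = sum_x P(X=x) * H(Y | X=x)."""
    return sum(p_x[x] * entropy(p_y_given_x[x], base) for x in p_x)


print(entropy([0.5, 0.5]))  # a fair coin: 1 bit of uncertainty
print(entropy([1.0]))       # a certain outcome: no uncertainty

# A9 split: 361 instances with t (284+, 77-), 329 with f (23+, 306-).
p_x = {"t": 361 / 690, "f": 329 / 690}
p_y_given_x = {"t": [284 / 361, 77 / 361], "f": [23 / 329, 306 / 329]}
h_y_given_a9 = conditional_entropy(p_x, p_y_given_x)
print(f"H(Y|A9) = {h_y_given_a9:.3f}")
```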

Information Gain

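Compare two entropies: before and after the split
Information gain = how much the entropy of Y decreases once we choose (split on) an attribute
- $IG(Y,A_i) = H(Y)-H(Y|A_i)$
- The attribute with the highest IG is used as the root when building the decision tree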
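As a worked example, the information gain of the A9 split can be computed straight from the class counts listed earlier. The helper names are hypothetical, and entropy is taken in bits:

```python
import math
from typing import Sequence


def entropy_from_counts(counts: Sequence[int]) -> float:
    """Entropy in bits of a class distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(parent: Sequence[int], children: Sequence[Sequence[int]]) -> float:
    """IG(Y, A) = H(Y) - H(Y|A); children holds the class counts per value of A."""
    total = sum(parent)
    h_cond = sum(sum(child) / total * entropy_from_counts(child) for child in children)
    return entropy_from_counts(parent) - h_cond


# A9 on the credit approval data, counts as (+, -).
ig_a9 = information_gain((307, 383), [(284, 77), (23, 306)])
print(f"H(Y) = {entropy_from_counts((307, 383)):.3f}, IG(Y, A9) = {ig_a9:.3f}")
```

Running the same function for every attribute and taking the largest value is exactly the root-selection rule above.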

Top-Down Induction Algorithm

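Algorithms include ID3, C4.5, CART, etc.
ID3
1. Create a root node
2. Put all instances into the root
3. Find the best variable to split on (via IG)
4. Sort the instances by that variable's value and put each group under the matching branch
5. Repeat steps 3-4 on each branch until all instances in a node belong to one class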
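A compact recursive sketch of the ID3 steps, assuming categorical features and a hypothetical 5-row toy dataset. The majority-vote fallback when no attributes remain is an addition of this sketch, and ties in IG go to the attribute listed first:

```python
import math
from collections import Counter


def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def info_gain(rows, labels, attr):
    total = len(rows)
    h_cond = 0.0
    for value in set(r[attr] for r in rows):
        subset = [y for r, y in zip(rows, labels) if r[attr] == value]
        h_cond += len(subset) / total * entropy(subset)
    return entropy(labels) - h_cond


def id3(rows, labels, attrs):
    """Return a leaf label (str) or a node (attr, {value: subtree})."""
    if len(set(labels)) == 1:
        return labels[0]                             # pure node: close it
    if not attrs:
        return Counter(labels).most_common(1)[0][0]  # no attributes left: majority vote
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))  # step 3
    branches = {}
    for value in sorted(set(r[best] for r in rows)):              # step 4
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        branches[value] = id3([rows[i] for i in idx], [labels[i] for i in idx],
                              [a for a in attrs if a != best])    # step 5
    return (best, branches)


def predict(tree, row):
    """Walk down the tree; raises KeyError for a value never seen in training."""
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[row[attr]]
    return tree


ROWS = [{"Sky": "Sunny", "Wind": "Strong"}, {"Sky": "Sunny", "Wind": "Light"},
        {"Sky": "Rainy", "Wind": "Strong"}, {"Sky": "Rainy", "Wind": "Light"},
        {"Sky": "Sunny", "Wind": "Strong"}]
LABELS = ["Yes", "Yes", "No", "Yes", "Yes"]
tree = id3(ROWS, LABELS, ["Sky", "Wind"])
print(tree)
print(predict(tree, {"Sky": "Rainy", "Wind": "Strong"}))
```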

If you want more

We could simply grow a bigger decision tree, but...

Problem of Decision Tree

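When we build a large decision tree (very fine-grained decision rules):
- It may be 100% correct on the current data, but what about new data? → It ends up behaving like an overfit model
- So decision trees have some limitations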

2.5 How to create a decision tree given a training dataset

How about a statistical approach?

UCI housing dataset
13 attributes (independent variables), 1 target value (dependent variable)
Let's build a linear estimation function
Represent the hypothesis as a function: $h:\hat{f}(x;\theta)=\sum^n_{i=0}\theta_ix_i$, where n = number of independent variables
- The problem reduces to choosing $\theta$ well

Finding THETA in Linear Regression

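Setting $x_0 = 1$ (so that $\theta_0$ acts as the intercept) lets us write $\hat{f} = X\theta$ in matrix form, where
- $X=\begin{pmatrix}1 & x_1^1 & \cdots & x_n^1\\ \vdots & \vdots & \ddots & \vdots\\ 1 & x_1^D & \cdots & x_n^D\end{pmatrix}$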
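In reality there is noise, so $f = X\theta + e = Y$ ($e$ = error).
Expanding $\hat{\theta}=\arg\min_\theta (f-\hat{f})^2 = \arg\min_\theta (Y-X\theta)^T(Y-X\theta)$ in matrix form gives $\arg\min_\theta(\theta^TX^TX\theta-2\theta^TX^TY)$. The $Y^TY$ term drops out along the way because it is a constant that does not depend on $\theta$.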
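The expansion can be sanity-checked numerically. This sketch uses random data (the sizes and seed are arbitrary assumptions) and confirms that the squared error equals the expanded form plus the constant $Y^TY$:

```python
import numpy as np

rng = np.random.default_rng(0)
D, n = 20, 3                                                # D instances, n features
X = np.hstack([np.ones((D, 1)), rng.normal(size=(D, n))])  # first column is x_0 = 1
Y = rng.normal(size=D)
theta = rng.normal(size=n + 1)

residual = Y - X @ theta
lhs = residual @ residual                                   # (Y - X theta)^T (Y - X theta)
rhs = theta @ X.T @ X @ theta - 2 * theta @ X.T @ Y + Y @ Y
print(np.isclose(lhs, rhs))                                 # True

# Y^T Y is the same for every theta, so it cannot change the argmin.
```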

Optimized THETA

As before, differentiate and use the extremum: setting the gradient $2X^TX\theta - 2X^TY$ to zero gives
$\theta = (X^TX)^{-1}X^TY$
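A minimal NumPy sketch of the normal equation. The housing data is not loaded here; instead it assumes synthetic data generated from a hypothetical `true_theta` with small Gaussian noise, so the fit can be checked against known values:

```python
import numpy as np


def fit_linear_regression(X_raw: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares theta from the normal equations (X^T X) theta = X^T y."""
    X = np.hstack([np.ones((X_raw.shape[0], 1)), X_raw])  # prepend x_0 = 1
    # Solving the linear system is numerically safer than forming the inverse.
    return np.linalg.solve(X.T @ X, X.T @ y)


rng = np.random.default_rng(0)
X_raw = rng.normal(size=(100, 3))
true_theta = np.array([2.0, -1.0, 0.5, 3.0])  # hypothetical ground truth
y = true_theta[0] + X_raw @ true_theta[1:] + rng.normal(scale=0.1, size=100)

theta_hat = fit_linear_regression(X_raw, y)
print(np.round(theta_hat, 2))  # close to [2.0, -1.0, 0.5, 3.0]
```

`np.linalg.solve` computes the same $\theta$ as $(X^TX)^{-1}X^TY$ without explicitly inverting the matrix.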

If you want more...

Define a function $\phi(x)$ that expands x into $x^2, x^3, x^4, \dots$ and solve for $\theta$ on the expanded features
- But is this a better fit? → Not necessarily (it may overfit)
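A small sketch of this basis expansion with $\phi(x) = [1, x, x^2, \dots, x^d]$. The target $\sin(\pi x)$, the noise level, and the 10 training points are assumptions chosen for illustration:

```python
import numpy as np


def poly_features(x: np.ndarray, degree: int) -> np.ndarray:
    """phi(x) = [1, x, x^2, ..., x^degree] for a 1-D input x."""
    return np.vander(x, degree + 1, increasing=True)


def train_mse(x, y, degree):
    """Fit theta by least squares on phi(x); return training MSE and theta."""
    Phi = poly_features(x, degree)
    theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return float(np.mean((Phi @ theta - y) ** 2)), theta


rng = np.random.default_rng(1)
x_train = np.linspace(-1, 1, 10)
y_train = np.sin(np.pi * x_train) + rng.normal(scale=0.2, size=10)
x_test = np.linspace(-1, 1, 101)
y_test = np.sin(np.pi * x_test)

for degree in (1, 3, 9):
    mse, theta = train_mse(x_train, y_train, degree)
    test_mse = np.mean((poly_features(x_test, degree) @ theta - y_test) ** 2)
    print(f"degree {degree}: train MSE {mse:.4f}, test MSE {test_mse:.4f}")
```

Training error can only go down as the degree grows (degree 9 passes through all 10 points), but that says nothing about new data; comparing the test column is how the "better fitting?" question should be judged.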

Too Brittle to Be Used Naively

The decision tree and linear regression covered so far are simple and widely used.
On the other hand, they have limitations when a lot of data comes in and we need to make predictions.

Quiz.

Pick the attribute with the highest IG
$-{4\over11}\log{4\over11}-{7\over11}\log{7\over11} \approx 0.95$
(The quiz image is not visible.)
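A quick arithmetic check of the quiz value, assuming the logarithm is base 2 (which is what reproduces ≈ 0.95):

```python
import math

p_pos, p_neg = 4 / 11, 7 / 11
h = -p_pos * math.log2(p_pos) - p_neg * math.log2(p_neg)
print(round(h, 3))  # 0.946
```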
