
# Supervised Learning: Discriminative Models

**Prepared by:** อรรถพล คงหวาน

---

# Outline

1. Introduction
2. Linear Regression
3. Logistic Regression
4. Perceptron
5. Generative vs Discriminative Models
6. Evaluation Metrics
7. Summary

---

# 1. Introduction

---

## Outline: Introduction

1.1 Overview of Discriminative Models

1.2 Historical Timeline

1.3 Position in the Machine Learning Landscape

---

## 1.1 Overview of Discriminative Models

**Discriminative models** learn the **decision boundary** between classes directly.

A model of this kind learns one of the following:
- the conditional probability **P(y|x)**, or
- a prediction function **f(x) → y**

**Generative models**, by contrast, learn **P(x|y)** and **P(y)**.

---

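## 1.2 Historical Timeline

```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#458588', 'primaryTextColor': '#ebdbb2', 'primaryBorderColor': '#ebdbb2', 'lineColor': '#ebdbb2', 'secondaryColor': '#689d6a', 'tertiaryColor': '#d79921', 'background': '#282828', 'mainBkg': '#282828', 'textColor': '#ebdbb2'}}}%%
flowchart TB
    subgraph era1["Early era: before the 1970s"]
        B["1958: Perceptron"]
    end

    subgraph era2["Development era 1970s-1990s"]
        C["1972: Logistic Regression"]
        D["1986: Backpropagation"]
        C --> D
    end

    subgraph era3["Modern era 2000s-Now"]
        E["2006: Deep Learning"]
        F["2012: AlexNet"]
        E --> F
    end

    era1 --> era2 --> era3
```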

---

## 1.3 Position in the Machine Learning Landscape

```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#458588', 'primaryTextColor': '#ebdbb2', 'primaryBorderColor': '#ebdbb2', 'lineColor': '#ebdbb2', 'secondaryColor': '#689d6a', 'tertiaryColor': '#d79921', 'background': '#282828', 'mainBkg': '#282828', 'textColor': '#ebdbb2'}}}%%
flowchart TB
    subgraph ML["Machine Learning"]
        subgraph SL["Supervised Learning"]
            subgraph GEN["Generative Models"]
                G1["Naive Bayes"]
                G2["GDA"]
                G3["HMM"]
            end
            subgraph DIS["Discriminative Models"]
                D1["Linear Regression"]
                D2["Logistic Regression"]
                D3["Perceptron"]
                D4["SVM"]
                D5["Neural Networks"]
            end
        end
        subgraph UL["Unsupervised Learning"]
            U1["K-Means"]
            U2["PCA"]
        end
    end
```

---

# 2. Linear Regression

---

## Outline: Linear Regression

2.1 Basic Concepts

2.2 Mathematical Formulation

2.3 Cost Function

2.4 Solution Methods

2.5 Worked Example

2.6 Comparing the Solution Methods

2.7 Using Scikit-learn

---

## 2.1 Basic Concepts

**Linear regression** is a statistical method for **predicting continuous values**.

**Key assumptions:**
- The relationship between X and y is linear.
- The errors are normally distributed.
- The errors have constant variance (homoscedasticity).
- The errors are independent of one another.

---

## 2.2.1 Simple Linear Regression

The basic equation of simple linear regression:

$$y = w_0 + w_1 x + \varepsilon$$

**Variables:**
- **y** = target (dependent variable)
- **x** = feature (independent variable)
- **w₀** = y-intercept (bias)
- **w₁** = slope (weight)
- **ε** = error term

---

## 2.2.2 Multiple Linear Regression

With several features:

$$y = w_0 + w_1 x_1 + w_2 x_2 + \cdots + w_n x_n + \varepsilon$$

Or in vector form:

$$y = \mathbf{w}^{T}\mathbf{x} + \varepsilon$$

---

## 2.2.2 Multiple Linear Regression (continued)

**Variables:**
- **w** = weight vector [w₀, w₁, w₂, ..., wₙ]ᵀ
- **x** = feature vector [1, x₁, x₂, ..., xₙ]ᵀ

---

## 2.3 Mean Squared Error (MSE)

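The **mean squared error (MSE)** cost measures the gap between predicted and true values:

$$J(\mathbf{w}) = \frac{1}{2m}\sum_{i=1}^{m}\left(\hat{y}_i - y_i\right)^2$$

The factor 1/2 is a convention that cancels when differentiating; it gives the (1/m) gradient used in the Gradient Descent code of section 2.5.

**Variables:**
- **J(w)** = value of the cost function
- **m** = number of training examples
- **yᵢ** = true value of example i
- **ŷᵢ** = predicted value of example i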
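
---

## 2.3 Sketch: Computing the Cost in Python

A minimal sketch of the cost on the house-price rows from section 2.5. It assumes the common convention J(w) = 1/(2m) Σ(ŷᵢ − yᵢ)², chosen here because it yields the (1/m) gradient used by the Gradient Descent code in that section. The helper names `predict` and `cost` are illustrative and not part of the original material.

```python
def predict(weights, features):
    """Linear prediction w0 + w1*x1 + ... + wn*xn for one example."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))


def cost(weights, X, y):
    """J(w) = 1/(2m) * sum((y_hat_i - y_i)^2), under the assumed 1/2 convention."""
    m = len(y)
    squared_errors = [(predict(weights, x) - target) ** 2 for x, target in zip(X, y)]
    return sum(squared_errors) / (2 * m)


# House-price rows from section 2.5: [area, bedrooms, age] -> price
X = [[50, 1, 10], [80, 2, 5], [100, 3, 2], [120, 3, 8]]
y = [2.0, 3.5, 5.0, 4.5]

# With all weights at zero every prediction is 0,
# so the cost reduces to sum(y_i^2) / (2m).
cost_at_zero = cost([0.0, 0.0, 0.0, 0.0], X, y)
```

Any weight vector that fits the data better than all zeros gives a smaller value, which is what both solution methods in section 2.4 exploit.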

---

## 2.4.1 Normal Equation

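The **Normal Equation** gives the solution in closed form, with no iteration:

$$\mathbf{w} = \left(X^{T}X\right)^{-1}X^{T}\mathbf{y}$$

**Variables:**
- **X** = data matrix (m × n), including the column of ones for the bias
- **y** = target vector (m × 1)
- **w** = weight vector to solve for (n × 1)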
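
---

## 2.4.1 Sketch: Solving the Normal Equation

A small sketch of the closed-form solution, assuming NumPy is available. It solves the linear system (XᵀX)w = Xᵀy with `np.linalg.solve` instead of forming the inverse explicitly, which is usually the more numerically stable choice. The function name `normal_equation` and the demo data are illustrative.

```python
import numpy as np


def normal_equation(X, y):
    """Return w satisfying (X^T X) w = X^T y; X must include the bias column."""
    return np.linalg.solve(X.T @ X, X.T @ y)


# Synthetic check: the targets follow y = 1 + 2x exactly,
# so the fitted weights should be close to [1, 2].
X_demo = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y_demo = np.array([1.0, 3.0, 5.0, 7.0])
w_demo = normal_equation(X_demo, y_demo)
```

If XᵀX is singular (for example with duplicated features), `np.linalg.solve` raises `LinAlgError`; `np.linalg.lstsq` is a common fallback in that case.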

---

## 2.4.2 Gradient Descent

**Gradient Descent** adjusts the weights iteratively:

$$w_j := w_j - \alpha\,\frac{\partial J}{\partial w_j}$$

**Variables:**
- **α** = learning rate
- **∂J/∂wⱼ** = partial derivative of the cost function with respect to weight wⱼ

---

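## 2.4.2 Gradient Descent for Linear Regression

$$w_j := w_j - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(\hat{y}_i - y_i\right)x_{ij}$$

Here x_{ij} is feature j of example i, with x_{i0} = 1 for the bias.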

---

## 2.5 Example Data: House Prices

| Example | Area (x₁), m² | Bedrooms (x₂) | House age (x₃), years | Price (y), million baht |
|----------|-------------------|--------------|------------------|------------------|
| 1        | 50                | 1            | 10               | 2.0              |
| 2        | 80                | 2            | 5                | 3.5              |
| 3        | 100               | 3            | 2                | 5.0              |
| 4        | 120               | 3            | 8                | 4.5              |

**Goal:** find the weights w₀, w₁, w₂, w₃

---

## 2.5 Target Equation

$$\hat{y} = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3$$

---

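## Normal Equation: Step 1

**Build the matrix X (add a column of ones for the bias)**

$$X = \begin{bmatrix} 1 & 50 & 1 & 10 \\ 1 & 80 & 2 & 5 \\ 1 & 100 & 3 & 2 \\ 1 & 120 & 3 & 8 \end{bmatrix}$$

---

## Normal Equation: Vector y

$$\mathbf{y} = \begin{bmatrix} 2.0 \\ 3.5 \\ 5.0 \\ 4.5 \end{bmatrix}$$

---

## Normal Equation: Step 2, Compute X^T

$$X^{T} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 50 & 80 & 100 & 120 \\ 1 & 2 & 3 & 3 \\ 10 & 5 & 2 & 8 \end{bmatrix}$$

---

## Normal Equation: Compute X^T X

$$X^{T}X = \begin{bmatrix} 4 & 350 & 9 & 25 \\ 350 & 33300 & 870 & 2060 \\ 9 & 870 & 23 & 50 \\ 25 & 2060 & 50 & 193 \end{bmatrix}$$

---

## Normal Equation: Step 3, Compute X^T y

$$X^{T}\mathbf{y} = \begin{bmatrix} 15 \\ 1420 \\ 37.5 \\ 83.5 \end{bmatrix}$$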

---

## Normal Equation: Python Code

```python
import numpy as np

X = np.array([
    [1, 50, 1, 10],
    [1, 80, 2, 5],
    [1, 100, 3, 2],
    [1, 120, 3, 8]
])
y = np.array([2.0, 3.5, 5.0, 4.5])

# Normal Equation: w = (X^T X)^(-1) X^T y
XtX = X.T @ X
XtX_inv = np.linalg.inv(XtX)
Xty = X.T @ y
w = XtX_inv @ Xty
```

---

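## Normal Equation: Results

```
Weights from the Normal Equation:
w₀ (Intercept) = 1.4500
w₁ (Area)      = -0.0100
w₂ (Bedrooms)  = 1.5500
w₃ (House age) = -0.0500
```

**Interpretation:**
- **w₁ = -0.0100**: each additional m² of area lowers the predicted price by 10,000 baht
- **w₂ = 1.5500**: each additional bedroom raises the predicted price by 1.55 million baht
- **w₃ = -0.0500**: each additional year of age lowers the predicted price by 0.05 million baht

With four examples and four weights, X is square and the model fits every training point exactly, so the sign of an individual weight (such as the negative weight on area) reflects this tiny sample and not a general relationship.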

---

## Gradient Descent: Feature Scaling

The features are on very different scales, so standardize them before running Gradient Descent:

```python
import numpy as np

X_features = np.array([[50, 1, 10], [80, 2, 5],
                       [100, 3, 2], [120, 3, 8]])
mean = X_features.mean(axis=0)  # [87.5, 2.25, 6.25]
std = X_features.std(axis=0)    # [25.86, 0.829, 3.031] (population std, ddof=0)
X_normalized = (X_features - mean) / std  # each column now has mean 0 and std 1
```

---

## Gradient Descent: Implementation

```python
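def gradient_descent(X, y, alpha=0.01, n_iterations=10000):
    """Batch gradient descent for linear regression.

    X must already contain the leading column of ones for the bias.
    """
    m, n = X.shape
    w = np.zeros(n)  # start from all-zero weights

    for iteration in range(n_iterations):
        y_pred = X @ w                  # predictions for all m examples
        error = y_pred - y              # residuals
        gradient = (1/m) * X.T @ error  # gradient of (1/2m) * sum(error^2)
        w = w - alpha * gradient        # step against the gradient

    return w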
```
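
---

## Gradient Descent: Sketch for Unscaling the Weights

Weights learned on standardized features are not directly comparable with the Normal Equation weights, which live on the raw feature scale. The sketch below shows one way to map them back, assuming each feature was scaled as (x − mean) / std. The function name `unscale_weights` and the demo data are illustrative; the demo uses a least-squares fit instead of Gradient Descent so that the expected answer is known exactly.

```python
import numpy as np


def unscale_weights(w_scaled, mean, std):
    """Map weights learned on (x - mean) / std back to raw-feature weights.

    w_scaled[0] is the intercept; w_scaled[1:] are the per-feature weights.
    """
    slopes = w_scaled[1:] / std
    intercept = w_scaled[0] - np.sum(slopes * mean)
    return np.concatenate([[intercept], slopes])


# Demo with a known answer: y = 3 + 2*x1 - x2 on the raw features.
X_raw = np.array([[1.0, 5.0], [2.0, 3.0], [4.0, 8.0], [7.0, 1.0]])
y = 3 + 2 * X_raw[:, 0] - X_raw[:, 1]

mean, std = X_raw.mean(axis=0), X_raw.std(axis=0)
X_scaled = np.column_stack([np.ones(len(y)), (X_raw - mean) / std])

# Fit on the scaled features, then convert back to the raw scale.
w_scaled, *_ = np.linalg.lstsq(X_scaled, y, rcond=None)
w_raw = unscale_weights(w_scaled, mean, std)
```

The same conversion applies to any weight vector returned by `gradient_descent` when it was trained on standardized inputs.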

---

## Gradient Descent: Results

```
Iteration 0: Cost = 4.562500
Iteration 2000: Cost = 0.011364
Iteration 4000: Cost = 0.010975
Iteration 6000: Cost = 0.010963
Iteration 8000: Cost = 0.010962

Weights from Gradient Descent (scaled):
w = [3.75, 0.62, 0.68, -0.17]
```

---

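## 2.5 Checking the Result

**Predict the price of a new house:** area 90 m², 2 bedrooms, 3 years old

Using the exact solution of the Normal Equation for this data, w = [1.45, -0.01, 1.55, -0.05]:

<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
  <mrow>
    <mover><mi>y</mi><mo>^</mo></mover>
    <mo>=</mo>
    <mn>1.45</mn><mo>−</mo><mn>0.01</mn><mo>×</mo><mn>90</mn><mo>+</mo><mn>1.55</mn><mo>×</mo><mn>2</mn><mo>−</mo><mn>0.05</mn><mo>×</mo><mn>3</mn>
    <mo>=</mo>
    <mn>3.50</mn>
    <mtext> million baht</mtext>
  </mrow>
</math>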

---

## 2.6 Comparing the Solution Methods

| Property | Normal Equation | Gradient Descent |
|-----------|-----------------|------------------|
| **Speed (small n)** | Faster | Slower |
| **Speed (large n)** | Slow, O(n³) | Faster, O(kmn) for k iterations over m examples |
| **Must choose α** | No | Yes |
| **Needs feature scaling** | No | Yes |
| **Invertibility** | XᵀX must be invertible | Not required |
| **Best suited to** | n < 10,000 | Very large n |

---

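## 2.6 Comparing the Results

| Method | w₀ | w₁ | w₂ | w₃ |
|---------|-----|-----|-----|-----|
| Normal Equation | 1.4500 | -0.0100 | 1.5500 | -0.0500 |
| Gradient Descent | -0.1344 | 0.0269 | 0.8460 | -0.0576 |

**Note:** Both methods minimize the same cost, so Gradient Descent approaches the Normal Equation weights as the number of iterations grows. On this dataset area and bedrooms are strongly correlated (r ≈ 0.96), which makes that convergence slow, so a gap can remain after 10,000 iterations.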

---

## 2.7 Scikit-learn: Linear Regression

```python
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Create and train the model
model = LinearRegression()
model.fit(X_train, y_train)

print(f"Intercept: {model.intercept_}")
print(f"Coefficients: {model.coef_}")
```

---

## 2.7 Scikit-learn: Results

```
Intercept (w₀): -0.2143
Coefficients:
  w₁ (Area):      0.0286
  w₂ (Bedrooms):  0.7857
  w₃ (House age): -0.0571

Evaluation:
  Train MSE: 0.0143
  Test MSE:  0.0200
  Train R²:  0.9857
  Test R²:   0.9750
```

---

## 2.7 Equation Learned by Scikit-learn

```
y = -0.2143 + 0.0286×Area + 0.7857×Bedrooms - 0.0571×House age
```

**Example prediction:**

House: 90 m², 2 bedrooms, 3 years old

Predicted price: **3.76 million baht** (-0.2143 + 0.0286×90 + 0.7857×2 - 0.0571×3)

---

# 3. Logistic Regression

---

## Outline: Logistic Regression

3.1 Basic Concepts

3.2 Sigmoid Function

3.3 Mathematical Formulation

3.4 Cost Function

3.5 Gradient Descent

3.6 Worked Example

3.7 Multi-class Logistic Regression

3.8 Using Scikit-learn

---

## 3.1 Basic Concepts

**Logistic regression** is an algorithm for **classification**.

**Strengths:**
- Outputs a probability between 0 and 1
- Easy to interpret
- Works well when the classes are linearly separable
- Forms the building block of neural networks

---

## 3.2 Sigmoid Function

The **sigmoid function** maps any real number into the interval (0, 1):

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

**Key properties:**
- σ(0) = 0.5
- As z → ∞, σ(z) → 1
- As z → -∞, σ(z) → 0
- Derivative: σ'(z) = σ(z)(1 - σ(z))
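
---

## 3.2 Sketch: Checking the Sigmoid Properties

The properties above can be checked numerically. This is a minimal sketch using only the standard library; the branch on the sign of z is one common way to avoid overflow and is an implementation choice, not something prescribed by the slides.

```python
import math


def sigmoid(z):
    """Logistic function, written to stay stable for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0, so exp(z) cannot overflow
    return e / (1.0 + e)


def sigmoid_derivative(z):
    """Closed form sigma'(z) = sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)


# Compare the closed-form derivative with a central finite difference at z = 0.7.
h = 1e-6
numeric = (sigmoid(0.7 + h) - sigmoid(0.7 - h)) / (2 * h)
analytic = sigmoid_derivative(0.7)
```

The two derivative estimates agree to many decimal places, and `sigmoid(0)` returns exactly 0.5.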

---

## 3.2 Sigmoid Function: Diagram

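```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#458588', 'primaryTextColor': '#ebdbb2', 'primaryBorderColor': '#ebdbb2', 'lineColor': '#ebdbb2', 'secondaryColor': '#689d6a', 'tertiaryColor': '#d79921', 'background': '#282828', 'mainBkg': '#282828', 'textColor': '#ebdbb2'}}}%%
flowchart TB
    subgraph INPUT["Input"]
        A["z = wᵀx"]
    end

    subgraph SIGMOID["Sigmoid Function"]
        B["σ(z) = 1/(1+e^(-z))"]
    end

    subgraph OUTPUT["Output"]
        C["P(y=1|x)"]
    end

    A --> B --> C
```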

---

## 3.3.1 Hypothesis Function

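$$h_{\mathbf{w}}(\mathbf{x}) = \sigma(\mathbf{w}^{T}\mathbf{x}) = \frac{1}{1 + e^{-\mathbf{w}^{T}\mathbf{x}}}$$

**Variables:**
- **h_w(x)** = prediction (the probability that y = 1)
- **w** = weight vector
- **x** = feature vector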

---

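## 3.3.2 Decision Rule

$$\hat{y} = \begin{cases} 1 & \text{if } h_{\mathbf{w}}(\mathbf{x}) \ge 0.5 \\ 0 & \text{otherwise} \end{cases}$$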

---

## 3.4 Binary Cross-Entropy Loss

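Logistic regression uses **binary cross-entropy (log loss)** instead of MSE:

$$J(\mathbf{w}) = -\frac{1}{m}\sum_{i=1}^{m}\left[y_i\log\hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\right]$$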

---

## 3.4.2 Interpreting the Cost Function

| When y = 1 | When y = 0 |
|-------------|-------------|
| Cost = -log(ŷ) | Cost = -log(1-ŷ) |
| If ŷ → 1, Cost → 0 | If ŷ → 0, Cost → 0 |
| If ŷ → 0, Cost → ∞ | If ŷ → 1, Cost → ∞ |
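
---

## 3.4 Sketch: Binary Cross-Entropy in Python

The table can be reproduced with a few lines of code. This sketch uses only the standard library; the clipping constant `eps` is an assumption added to keep `log` finite when a probability is exactly 0 or 1, and the example probabilities are made up for illustration.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all examples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# One positive and one negative example under three kinds of prediction.
confident_right = binary_cross_entropy([1, 0], [0.99, 0.01])
confident_wrong = binary_cross_entropy([1, 0], [0.01, 0.99])
uninformed = binary_cross_entropy([1, 0], [0.5, 0.5])  # equals ln 2
```

The uninformed case equals ln 2 ≈ 0.693, which is also the cost of a logistic model whose weights are all zero.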

---

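## 3.5 Gradient Descent for Logistic Regression

$$w_j := w_j - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_{\mathbf{w}}(\mathbf{x}_i) - y_i\right)x_{ij}$$

**Note:** the update has the same form as in linear regression, but h_w(x) differs: it is the linear output wᵀx there and the sigmoid σ(wᵀx) here.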

---

## 3.6 Example Data: Purchase Prediction

| Example | Age (x₁) | Income (x₂), 10,000 baht/month | Time viewing product (x₃), minutes | Purchased (y) |
|----------|-----------|------------------------|----------------------|----------|
| 1        | 25        | 2.5                    | 5                    | 0        |
| 2        | 35        | 4.0                    | 15                   | 1        |
| 3        | 45        | 5.5                    | 10                   | 1        |
| 4        | 22        | 2.0                    | 3                    | 0        |
| 5        | 38        | 4.5                    | 20                   | 1        |
| 6        | 30        | 3.0                    | 8                    | 0        |

---

## 3.6 Target Equation

$$P(y = 1 \mid \mathbf{x}) = \sigma(w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3)$$

---

## 3.6 Gradient Descent: Implementation

```python
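import numpy as np

def sigmoid(z):
    # Clip z so np.exp cannot overflow for extreme inputs
    return 1 / (1 + np.exp(-np.clip(z, -500, 500)))

def gradient_descent_logistic(X, y, alpha=0.1, n_iterations=5000):
    """Batch gradient descent for logistic regression (X includes the bias column)."""
    m, n = X.shape
    w = np.zeros(n)

    for iteration in range(n_iterations):
        z = X @ w                         # linear scores
        h = sigmoid(z)                    # predicted P(y=1|x)
        gradient = (1/m) * X.T @ (h - y)  # gradient of binary cross-entropy
        w = w - alpha * gradient

    return w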
```

---

## 3.6 Training Results

```
Iteration 0: Cost = 0.693147
Iteration 1000: Cost = 0.298721
Iteration 2000: Cost = 0.238456
Iteration 3000: Cost = 0.212893
Iteration 4000: Cost = 0.198654

Weights (scaled):
w₀ (Intercept)    = 0.4055
w₁ (Age)          = 0.8234
w₂ (Income)       = 0.9156
w₃ (Viewing time) = 1.2341
```

---

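## 3.6 Predicting for a New Customer

**New customer:** age 33, income 38,000 baht/month, 12 minutes viewing the product

**Compute z** (x′ denotes a feature standardized with the training mean and standard deviation, because the weights were learned on scaled data):

$$z = w_0 + w_1 x'_1 + w_2 x'_2 + w_3 x'_3$$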

---

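## 3.6 Prediction: Applying the Sigmoid

$$P(y = 1 \mid \mathbf{x}) = \sigma(z) = \frac{1}{1 + e^{-z}} \approx 0.7323$$

**Result:**
- Probability of purchase: **73.23%**
- Prediction: **buys** (the probability is above 50%)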

---

## 3.7 Multi-class: One-vs-Rest (OvR)

**Idea:** turn a multi-class problem into several binary classification problems.

For **K classes**, train **K classifiers**, each separating one class from all the others.

---

## 3.7 One-vs-Rest: Diagram

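```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#458588', 'primaryTextColor': '#ebdbb2', 'primaryBorderColor': '#ebdbb2', 'lineColor': '#ebdbb2', 'secondaryColor': '#689d6a', 'tertiaryColor': '#d79921', 'background': '#282828', 'mainBkg': '#282828', 'textColor': '#ebdbb2'}}}%%
flowchart TB
    subgraph INPUT["Input"]
        D["Data with classes A, B, C"]
    end

    subgraph CLASSIFIERS["Binary Classifiers"]
        C1["Classifier 1: A vs BC"]
        C2["Classifier 2: B vs AC"]
        C3["Classifier 3: C vs AB"]
    end

    subgraph OUTPUT["Output"]
        O["Pick the class with<br/>the highest probability"]
    end

    D --> C1
    D --> C2
    D --> C3
    C1 --> O
    C2 --> O
    C3 --> O
```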

---

## 3.7 OvR: Worked Example

**New data point:** x = [1.5, 0.7]

| Classifier | z | σ(z) | Probability |
|------------|---|------|---------------|
| A vs Rest | 1.45 | 0.810 | **81.0%** (highest) |
| B vs Rest | -1.1 | 0.250 | 25.0% |
| C vs Rest | -1.15 | 0.240 | 24.0% |

**Prediction: class A**
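
---

## 3.7 OvR: Sketch in Python

The worked example can be reproduced as follows. This is a minimal sketch that takes the three z values from the table as given (the underlying weights are not shown in the slides); the function name `ovr_predict` is illustrative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def ovr_predict(scores):
    """scores maps each class label to the z of its own binary classifier.

    Returns the winning label and the per-class probabilities.
    """
    probabilities = {label: sigmoid(z) for label, z in scores.items()}
    best = max(probabilities, key=probabilities.get)
    return best, probabilities


# z values taken from the table in the worked example
label, probs = ovr_predict({"A": 1.45, "B": -1.1, "C": -1.15})
```

Because each classifier is trained independently, the three probabilities are not constrained to sum to 1.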

---

## 3.7 Softmax Regression

**Softmax regression** learns the weights for all classes jointly in a single model.

**Softmax Function:**

```
                    e^(z_k)
P(y = k | x) = ─────────────────
                K
                Σ e^(z_j)
               j=1
```

**Properties:** every output lies between 0 and 1, and the outputs sum to 1.
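
---

## 3.7 Softmax: Sketch in Python

A minimal standard-library sketch of the softmax function. Subtracting the largest score before exponentiating is a common stability trick and is an implementation choice here; it does not change the result. The scores 7.55, 1.5 and 1.05 are the ones used in the worked example of this section.

```python
import math


def softmax(z):
    """Convert a list of scores into probabilities that sum to 1."""
    shift = max(z)                            # guards against overflow in exp()
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


# Scores for classes A, B, C from the worked example
probs = softmax([7.55, 1.5, 1.05])
```

The shift also makes very large scores safe: `softmax([1000.0, 1000.0])` evaluates without overflow.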

---

## 3.7 Softmax: Worked Example

**New data point:** x = [1, 1.5, 0.7]

| Step | Class A | Class B | Class C |
|---------|--------|--------|--------|
| z | 7.55 | 1.5 | 1.05 |
| e^z | 1900.74 | 4.48 | 2.86 |
| **Softmax** | **99.62%** | 0.23% | 0.15% |

**Sum:** 99.62% + 0.23% + 0.15% = 100%

**Prediction: class A** (with very high confidence)

---

## 3.7 Cross-Entropy Loss for Multi-class

**Categorical Cross-Entropy Loss:**

```
              1   m   K
J(W) = - ─── Σ   Σ  y_(i,k) · log(ŷ_(i,k))
              m  i=1 k=1
```

- **y_{i,k}** = 1 if example i belongs to class k, otherwise 0 (one-hot encoding)
- **ŷ_{i,k}** = P(y=k|xᵢ) from softmax
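
---

## 3.7 Categorical Cross-Entropy: Sketch in Python

A minimal sketch of the loss above, using only the standard library. The two examples and their predicted probabilities are made up for illustration, and `eps` is an added safeguard against log(0).

```python
import math


def categorical_cross_entropy(y_onehot, y_prob, eps=1e-12):
    """J = -(1/m) * sum_i sum_k y[i][k] * log(y_hat[i][k])."""
    m = len(y_onehot)
    total = 0.0
    for targets, probs in zip(y_onehot, y_prob):
        for t, p in zip(targets, probs):
            if t:  # with one-hot targets only the true class contributes
                total += -math.log(max(p, eps))
    return total / m


# Two examples, three classes (hypothetical softmax outputs)
y_onehot = [[1, 0, 0], [0, 1, 0]]
y_prob = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
loss = categorical_cross_entropy(y_onehot, y_prob)
```

Only the probability assigned to the correct class matters, so the loss here is the average of -log(0.7) and -log(0.8).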

---

## 3.7 Comparing OvR and Softmax

| Property | One-vs-Rest | Softmax |
|-----------|-------------|---------|
| Number of classifiers | K (independent) | 1 (joint) |
| Probabilities | Do not sum to 1 | Always sum to 1 |
| Complexity | Simpler | More complex |
| Training | Parallelizable | All classes trained jointly |
| Best suited to | Independent classes | Mutually exclusive classes |

---

## 3.8 Scikit-learn: Logistic Regression

```python
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(
    penalty='l2',           # Regularization
    C=1.0,                  # Inverse regularization strength
    solver='lbfgs',         # Algorithm
    max_iter=1000,
    random_state=42
)

model.fit(X_train_scaled, y_train)

# Predict class labels
y_pred = model.predict(X_test_scaled)

# Predict class probabilities
y_proba = model.predict_proba(X_test_scaled)
```

---

## 3.8 Scikit-learn: Results

```
Intercept (w₀): 0.4521
Coefficients:
  w₁ (Age):          0.7823
  w₂ (Income):       0.8956
  w₃ (Viewing time): 1.1234

Metrics:
  Accuracy:  1.0000
  Precision: 1.0000
  Recall:    1.0000
  F1 Score:  1.0000
```

---

## 3.8 Multi-class with Scikit-learn

```python
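from sklearn.linear_model import LogisticRegression

# Softmax (Multinomial)
# Note: the multi_class parameter is deprecated in scikit-learn 1.5 and later,
# where multinomial is already the default for multi-class targets.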
model_softmax = LogisticRegression(
    multi_class='multinomial',
    solver='lbfgs',
    max_iter=1000
)

# One-vs-Rest
model_ovr = LogisticRegression(
    multi_class='ovr',
    solver='lbfgs',
    max_iter=1000
)
```

---

# 4. Perceptron

---

## Outline: Perceptron

4.1 Basic Concepts

4.2 Perceptron Structure

4.3 Mathematical Formulation

4.4 Perceptron Learning Algorithm

4.5 Worked Example

4.6 Python Implementation

4.7 Limitations of the Perceptron

4.8 Using Scikit-learn

---

## 4.1 Basic Concepts

The **perceptron** was invented by **Frank Rosenblatt** in 1958.

**Key characteristics:**
- A binary linear classifier
- Uses a step function instead of the sigmoid
- Learns online (one example at a time)
- Guaranteed to converge if the data are linearly separable
- The forerunner of **neural networks**

---

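## 4.2 Perceptron Structure

```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#458588', 'primaryTextColor': '#ebdbb2', 'primaryBorderColor': '#ebdbb2', 'lineColor': '#ebdbb2', 'secondaryColor': '#689d6a', 'tertiaryColor': '#d79921', 'background': '#282828', 'mainBkg': '#282828', 'textColor': '#ebdbb2'}}}%%
flowchart TB
    subgraph INPUTS["Inputs"]
        X0["x₀ = 1 (bias)"]
        X1["x₁"]
        X2["x₂"]
    end

    subgraph SUM["Weighted sum"]
        S["Σ = w₀ + w₁x₁ + w₂x₂"]
    end

    subgraph ACTIVATION["Step Function"]
        A["f(Σ) = 1 if Σ ≥ 0<br/>f(Σ) = 0 if Σ < 0"]
    end

    subgraph OUTPUT["Output"]
        Y["ŷ ∈ {0, 1}"]
    end

    INPUTS --> SUM --> ACTIVATION --> OUTPUT
```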

---

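## 4.3.1 Step Function

$$f(z) = \begin{cases} 1 & \text{if } z \ge 0 \\ 0 & \text{if } z < 0 \end{cases}$$

---

## 4.3.2 Hypothesis

$$\hat{y} = f(\mathbf{w}^{T}\mathbf{x}) = f(w_0 + w_1 x_1 + \cdots + w_n x_n)$$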

---

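## 4.4.1 Weight Update Rule

$$\mathbf{w} := \mathbf{w} + \alpha\,(y - \hat{y})\,\mathbf{x}$$

**Variables:**
- **α** = learning rate (often set to 1)
- **(y - ŷ)** = prediction error
- The weights change only when the prediction is wrong.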

---

## 4.4.2 Update Table

| y (true) | ŷ (predicted) | y - ŷ | Update (α = 1) |
|----------|-----------|-------|-----------|
| 0        | 0         | 0     | No update |
| 0        | 1         | -1    | w := w - x |
| 1        | 0         | +1    | w := w + x |
| 1        | 1         | 0     | No update |

---

## 4.5 Example: Learning the OR Function

| x₁ | x₂ | y (OR) |
|----|-----|--------|
| 0  | 0   | 0      |
| 0  | 1   | 1      |
| 1  | 0   | 1      |
| 1  | 1   | 1      |

**Initial values:** w₀ = 0, w₁ = 0, w₂ = 0, α = 1

---

## 4.5 Epoch 1: Example 1

**x = [1, 0, 0], y = 0**

- z = 0×1 + 0×0 + 0×0 = 0
- ŷ = f(0) = 1 (because z ≥ 0)
- Error = 0 - 1 = -1
- Update: w = [0, 0, 0] + (-1)[1, 0, 0] = **[-1, 0, 0]**

---

## 4.5 Epoch 1: Example 2

**x = [1, 0, 1], y = 1**

- z = -1×1 + 0×0 + 0×1 = -1
- ŷ = f(-1) = 0
- Error = 1 - 0 = 1
- Update: w = [-1, 0, 0] + (1)[1, 0, 1] = **[0, 0, 1]**

---

## 4.5 Epoch 1: Examples 3-4

**Example 3:** x = [1, 1, 0], y = 1
- z = 0×1 + 0×1 + 1×0 = 0
- ŷ = f(0) = 1, Error = 0
- **No update**

**Example 4:** x = [1, 1, 1], y = 1
- z = 0×1 + 0×1 + 1×1 = 1
- ŷ = f(1) = 1, Error = 0
- **No update**

(Training continues until an epoch finishes with no updates.)
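
---

## 4.5 Sketch: Running the Trace to Convergence

The hand trace can be continued mechanically. This is a minimal pure-Python sketch of the same update rule with the same starting weights, α = 1, and the same example order; the function name `train_perceptron` is illustrative, and inputs already include the leading 1 for the bias.

```python
def step(z):
    return 1 if z >= 0 else 0


def train_perceptron(samples, targets, alpha=1, max_epochs=100):
    """Return (weights, epoch) where epoch is the first pass with no updates.

    epoch is None if no such pass occurs within max_epochs.
    """
    w = [0] * len(samples[0])
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for x, y in zip(samples, targets):
            y_hat = step(sum(wi * xi for wi, xi in zip(w, x)))
            if y_hat != y:
                # w := w + alpha * (y - y_hat) * x
                w = [wi + alpha * (y - y_hat) * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:
            return w, epoch
    return w, None


# OR function, each input written as [1, x1, x2]
OR_X = [[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]]
OR_Y = [0, 1, 1, 1]
weights, clean_epoch = train_perceptron(OR_X, OR_Y)
```

Swapping in XOR targets `[0, 1, 1, 0]` makes the function return `None` for the epoch, since no weight vector classifies all four points.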

---

## 4.6 Python Implementation

```python
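import numpy as np

class Perceptron:
    def __init__(self, learning_rate=1.0, n_iterations=100):
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations
        self.weights = None

    def _step_function(self, z):
        return np.where(z >= 0, 1, 0)

    def fit(self, X, y):
        # Prepend a column of ones so weights[0] acts as the bias
        X_b = np.column_stack([np.ones(X.shape[0]), X])
        self.weights = np.zeros(X_b.shape[1])

        for epoch in range(self.n_iterations):
            errors = 0
            for i in range(len(y)):
                y_pred = self._step_function(np.dot(self.weights, X_b[i]))
                error = y[i] - y_pred
                if error != 0:
                    # Update only on a mistake: w := w + alpha * error * x
                    self.weights += self.learning_rate * error * X_b[i]
                    errors += 1
            if errors == 0:
                break  # a full pass with no mistakes: converged
        return self

    def predict(self, X):
        X_b = np.column_stack([np.ones(X.shape[0]), X])
        return self._step_function(X_b @ self.weights)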
```

---

## 4.6 Results: Learning Logic Gates

```
=== OR Gate ===
Converged at epoch 4
Weights: [-1.  1.  1.]
Accuracy: 1.0000

=== AND Gate ===
Converged at epoch 6
Weights: [-3.  2.  1.]
Accuracy: 1.0000

=== XOR Gate ===
(did not converge)
Accuracy: 0.5000
```

---

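## 4.7 Limitation: The XOR Problem

```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#458588', 'primaryTextColor': '#ebdbb2', 'primaryBorderColor': '#ebdbb2', 'lineColor': '#ebdbb2', 'secondaryColor': '#689d6a', 'tertiaryColor': '#d79921', 'background': '#282828', 'mainBkg': '#282828', 'textColor': '#ebdbb2'}}}%%
flowchart TB
    subgraph NONLINEAR["Not Linearly Separable ✗"]
        N1["XOR: cannot be separated"]
    end

    subgraph SOLUTION["Solutions"]
        S1["Multi-Layer Perceptron"]
        S2["Feature Engineering"]
        S3["Kernel Methods (SVM)"]
    end

    NONLINEAR --> SOLUTION
```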

---

## 4.8 Scikit-learn: Perceptron

```python
from sklearn.linear_model import Perceptron

perceptron = Perceptron(
    penalty=None,           # Regularization
    alpha=0.0001,          # Regularization strength
    max_iter=1000,
    eta0=0.1,              # Learning rate
    tol=1e-3,              # Stopping criterion
    shuffle=True,
    random_state=42
)

perceptron.fit(X_train_scaled, y_train)
```

---

## 4.8 Comparing the Perceptron and Logistic Regression

| Property | Perceptron | Logistic Regression |
|-----------|------------|---------------------|
| Activation | Step Function | Sigmoid |
| Output | 0 or 1 | Probability (0-1) |
| Loss Function | Hinge-like | Cross-Entropy |
| Probability | No | Yes |
| Convergence | Only if linearly separable | Always |

---

# 5. Generative vs Discriminative Models

---

## Outline: Generative vs Discriminative

5.1 Basic Principles

5.2 Visual Comparison

5.3 Comparison Table

5.4 Strengths and Weaknesses

5.5 Example: Choosing a Model

---

## 5.1.1 Generative Models

**Generative models** learn **P(x, y)**, or equivalently **P(x|y)** and **P(y)**, then apply Bayes' theorem:

$$P(y \mid x) = \frac{P(x \mid y)\,P(y)}{P(x)}$$

**Examples:** Naive Bayes, GDA, HMM

---

## 5.1.2 Discriminative Models

**Discriminative models** learn **P(y|x) directly**, or learn the **decision boundary** itself.

**Examples:** Logistic Regression, Perceptron, SVM, Neural Networks

---

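## 5.2 Visual Comparison

```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#458588', 'primaryTextColor': '#ebdbb2', 'primaryBorderColor': '#ebdbb2', 'lineColor': '#ebdbb2', 'secondaryColor': '#689d6a', 'tertiaryColor': '#d79921', 'background': '#282828', 'mainBkg': '#282828', 'textColor': '#ebdbb2'}}}%%
flowchart TB
    subgraph GEN["Generative Approach"]
        G1["Learn P(x|y) and P(y)"]
        G2["Apply Bayes' theorem to get P(y|x)"]
        G1 --> G2
    end

    subgraph DIS["Discriminative Approach"]
        D1["Learn P(y|x) directly"]
        D2["Find the decision boundary"]
        D1 --> D2
    end
```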

---

## 5.3 Comparison Table

| Aspect | Generative | Discriminative |
|--------|------------|----------------|
| **Learns** | P(x,y), or P(x\|y) and P(y) | P(y\|x) directly |
| **Approach** | Models the data distribution | Finds the decision boundary |
| **Capability** | Can generate new data | Classification only |
| **Missing data** | Handles it better | Handles it worse |
| **Accuracy** | Lower (with large data) | Higher (with large data) |

---

## 5.4.1 Advantages of Generative Models

- Can generate new data
- Handle missing data well
- Give deeper insight into the structure of the data
- Make it easy to incorporate prior knowledge
- Work well when data are scarce

---

## 5.4.2 Advantages of Discriminative Models

- Higher classification accuracy (with large data)
- Faster to train
- No assumptions needed about the data distribution
- More flexible with complex features

---

## 5.5 Comparison Example

```python
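from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB      # Generative
from sklearn.linear_model import LogisticRegression  # Discriminative

# Generate synthetic data
X, y = make_classification(n_samples=1000, ...)
# Split settings are not given in the source; library defaults are assumed here
X_train, X_test, y_train, y_test = train_test_split(X, y)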

# Generative: Gaussian Naive Bayes
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# Discriminative: Logistic Regression
lr = LogisticRegression()
lr.fit(X_train, y_train)
```

---

## 5.5 Comparison Results

```
=== Small data (n=100) ===
Gaussian Naive Bayes: 0.8333
Logistic Regression:  0.8000
→ Generative wins

=== Large data (n=10000) ===
Gaussian Naive Bayes: 0.8510
Logistic Regression:  0.8823
→ Discriminative wins
```

---

# 6. Evaluation Metrics

---

## Outline: Evaluation Metrics

6.1 Confusion Matrix

6.2 Basic Metrics

6.3 Worked Example

6.4 ROC Curve and AUC

6.5 Regression Metrics

6.6 Python Implementation

6.7 Choosing the Right Metric

---

## 6.1 Confusion Matrix

A **confusion matrix** tabulates the predicted classes against the true classes:

| | Predicted Positive | Predicted Negative |
|---|----------------|----------------|
| **Actual Positive** | True Positive (TP) | False Negative (FN) |
| **Actual Negative** | False Positive (FP) | True Negative (TN) |
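
---

## 6.1 Confusion Matrix: Sketch in Python

A minimal sketch of how the four cells can be counted from label lists, assuming labels are coded as 1 (positive) and 0 (negative). The function name `confusion_counts` and the sample labels are illustrative.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FN, FP, TN) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fn, fp, tn


# Hypothetical labels for eight examples
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
tp, fn, fp, tn = confusion_counts(y_true, y_pred)
```

The four counts always add up to the number of examples, which is a quick sanity check on any confusion matrix.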

---

## 6.1 Confusion Matrix: Diagram

```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#458588', 'primaryTextColor': '#ebdbb2', 'primaryBorderColor': '#ebdbb2', 'lineColor': '#ebdbb2', 'secondaryColor': '#689d6a', 'tertiaryColor': '#d79921', 'background': '#282828', 'mainBkg': '#282828', 'textColor': '#ebdbb2'}}}%%
flowchart TB
    subgraph CM["Confusion Matrix"]
        subgraph POS["Actual Positive"]
            TP["TP: True Positive"]
            FN["FN: False Negative<br/>(Type II Error)"]
        end
        subgraph NEG["Actual Negative"]
            FP["FP: False Positive<br/>(Type I Error)"]
            TN["TN: True Negative"]
        end
    end
```

---

## 6.2.1 Accuracy

**Accuracy** = the fraction of all predictions that are correct

<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
  <mrow>
    <mtext>Accuracy</mtext>
    <mo>=</mo>
    <mfrac>
      <mrow><mi>TP</mi><mo>+</mo><mi>TN</mi></mrow>
      <mrow><mi>TP</mi><mo>+</mo><mi>TN</mi><mo>+</mo><mi>FP</mi><mo>+</mo><mi>FN</mi></mrow>
    </mfrac>
  </mrow>
</math>

**Limitation:** misleading on imbalanced data

---

## 6.2.2 Precision

**Precision** = of everything predicted Positive, the fraction that is actually Positive

<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
  <mrow>
    <mtext>Precision</mtext>
    <mo>=</mo>
    <mfrac>
      <mi>TP</mi>
      <mrow><mi>TP</mi><mo>+</mo><mi>FP</mi></mrow>
    </mfrac>
  </mrow>
</math>

**Use when:** false positives must be minimized (e.g. spam detection)

---

## 6.2.3 Recall (Sensitivity)

**Recall** = of everything that is actually Positive, the fraction predicted Positive

<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
  <mrow>
    <mtext>Recall</mtext>
    <mo>=</mo>
    <mfrac>
      <mi>TP</mi>
      <mrow><mi>TP</mi><mo>+</mo><mi>FN</mi></mrow>
    </mfrac>
  </mrow>
</math>

**Use when:** false negatives must be minimized (e.g. disease diagnosis)

---

## 6.2.4 F1-Score

**F1 score** = the harmonic mean of Precision and Recall

<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
  <mrow>
    <msub><mi>F</mi><mn>1</mn></msub>
    <mo>=</mo>
    <mn>2</mn>
    <mo>×</mo>
    <mfrac>
      <mrow><mtext>Precision</mtext><mo>×</mo><mtext>Recall</mtext></mrow>
      <mrow><mtext>Precision</mtext><mo>+</mo><mtext>Recall</mtext></mrow>
    </mfrac>
  </mrow>
</math>

**Use when:** you need a balance between Precision and Recall

---

## 6.3 Example Data

Suppose a model predicts whether a patient has a disease:

| | Predicted: disease | Predicted: no disease |
|---|---------------|----------------|
| **Actual: disease** | TP = 80 | FN = 20 |
| **Actual: no disease** | FP = 10 | TN = 90 |

---

## 6.3 Computing Accuracy

<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
  <mrow>
    <mtext>Accuracy</mtext>
    <mo>=</mo>
    <mfrac>
      <mrow><mn>80</mn><mo>+</mo><mn>90</mn></mrow>
      <mrow><mn>80</mn><mo>+</mo><mn>90</mn><mo>+</mo><mn>10</mn><mo>+</mo><mn>20</mn></mrow>
    </mfrac>
    <mo>=</mo>
    <mfrac>
      <mn>170</mn>
      <mn>200</mn>
    </mfrac>
    <mo>=</mo>
    <mn>85</mn>
    <mo>%</mo>
  </mrow>
</math>

---

## 6.3 Computing Precision

<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
  <mrow>
    <mtext>Precision</mtext>
    <mo>=</mo>
    <mfrac>
      <mn>80</mn>
      <mrow><mn>80</mn><mo>+</mo><mn>10</mn></mrow>
    </mfrac>
    <mo>=</mo>
    <mfrac>
      <mn>80</mn>
      <mn>90</mn>
    </mfrac>
    <mo>=</mo>
    <mn>88.9</mn>
    <mo>%</mo>
  </mrow>
</math>

---

## 6.3 Computing Recall

<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
  <mrow>
    <mtext>Recall</mtext>
    <mo>=</mo>
    <mfrac>
      <mn>80</mn>
      <mrow><mn>80</mn><mo>+</mo><mn>20</mn></mrow>
    </mfrac>
    <mo>=</mo>
    <mfrac>
      <mn>80</mn>
      <mn>100</mn>
    </mfrac>
    <mo>=</mo>
    <mn>80</mn>
    <mo>%</mo>
  </mrow>
</math>

---

## 6.3 Computing the F1 Score

$$F_1 = 2 \times \frac{0.889 \times 0.80}{0.889 + 0.80} \approx 84.2\%$$

---

## 6.4.1 ROC Curve

The **ROC curve** plots **TPR** against **FPR** as the classification threshold varies.

**True Positive Rate (TPR):**
<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
  <mrow>
    <mi>TPR</mi>
    <mo>=</mo>
    <mfrac>
      <mi>TP</mi>
      <mrow><mi>TP</mi><mo>+</mo><mi>FN</mi></mrow>
    </mfrac>
  </mrow>
</math>

**False Positive Rate (FPR):**
<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
  <mrow>
    <mi>FPR</mi>
    <mo>=</mo>
    <mfrac>
      <mi>FP</mi>
      <mrow><mi>FP</mi><mo>+</mo><mi>TN</mi></mrow>
    </mfrac>
  </mrow>
</math>

---

## 6.4.2 AUC (Area Under Curve)

**AUC** is the area under the ROC curve:

- **AUC = 1.0**: a perfect model
- **AUC = 0.5**: a random model (no discriminative power)
- **AUC < 0.5**: a model worse than random
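
---

## 6.4.2 AUC: Sketch in Python

AUC also equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it that way by comparing every positive/negative pair. It is meant for small illustrative inputs, and the labels and scores are made up.

```python
def auc_score(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and model scores
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = auc_score(y_true, scores)
```

For large datasets a sort-based implementation, or a library routine such as scikit-learn's `roc_auc_score`, is the usual choice.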

---

## 6.5 Regression Metrics

| Metric | Formula | Description |
|--------|------|----------|
| **MAE** | Σ\|y - ŷ\| / m | Mean Absolute Error |
| **MSE** | Σ(y - ŷ)² / m | Mean Squared Error |
| **RMSE** | √(MSE) | Root Mean Squared Error |
| **R²** | 1 - SS_res/SS_tot | Coefficient of Determination |
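
---

## 6.5 Regression Metrics: Sketch in Python

A minimal sketch of the four formulas in the table, using only the standard library. The true values are the house prices from section 2.5; the predictions are hypothetical numbers chosen for illustration.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R² as a dictionary."""
    m = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / m
    mse = sum(e * e for e in errors) / m
    mean_y = sum(y_true) / m
    ss_res = sum(e * e for e in errors)                # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)    # total sum of squares
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse), "R2": 1 - ss_res / ss_tot}


y_true = [2.0, 3.5, 5.0, 4.5]   # prices from section 2.5 (million baht)
y_pred = [2.5, 3.0, 5.0, 4.0]   # hypothetical predictions
result = regression_metrics(y_true, y_pred)
```

MAE and RMSE are in the same unit as the target (million baht here), while R² is unitless.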

---

## 6.6 Python Implementation

```python
import numpy as np

# All functions expect NumPy arrays of 0/1 labels.
def accuracy(y_true, y_pred):
    return np.mean(y_true == y_pred)

def precision(y_true, y_pred):
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    return tp / (tp + fp) if (tp + fp) > 0 else 0

def recall(y_true, y_pred):
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return tp / (tp + fn) if (tp + fn) > 0 else 0

def f1_score(y_true, y_pred):
    prec = precision(y_true, y_pred)
    rec = recall(y_true, y_pred)
    return 2 * (prec * rec) / (prec + rec) if (prec + rec) > 0 else 0
```
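
---

## 6.6 Sketch: Metrics from the Confusion Matrix Counts

The same metrics can be computed straight from the four counts. This standard-library sketch applies them to the disease example of section 6.3 (TP = 80, FN = 20, FP = 10, TN = 90); the function name `metrics_from_counts` is illustrative.

```python
def metrics_from_counts(tp, fn, fp, tn):
    """Accuracy, precision, recall and F1 from confusion matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Counts from the disease example in section 6.3
m = metrics_from_counts(tp=80, fn=20, fp=10, tn=90)
```

Rounded to one decimal place these values are 85.0%, 88.9%, 80.0% and 84.2%.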

---

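## 6.7 Choosing the Right Metric

```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#458588', 'primaryTextColor': '#ebdbb2', 'primaryBorderColor': '#ebdbb2', 'lineColor': '#ebdbb2', 'secondaryColor': '#689d6a', 'tertiaryColor': '#d79921', 'background': '#282828', 'mainBkg': '#282828', 'textColor': '#ebdbb2'}}}%%
flowchart TB
    A["Is the class distribution balanced?"]

    subgraph BALANCED["Balanced data"]
        B1["Accuracy is fine"]
    end

    subgraph IMBALANCED["Imbalanced data"]
        B2["Avoid Accuracy"]
        B3["Use F1, Precision, Recall"]
    end

    subgraph COST["Cost of errors"]
        C1["FP costly → emphasize Precision"]
        C2["FN costly → emphasize Recall"]
    end

    A --> BALANCED
    A --> IMBALANCED
    IMBALANCED --> COST
```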

---

## 6.7 Summary: Choosing a Metric

| Situation | Recommended metric | Example |
|-----------|----------------|----------|
| Balanced data | Accuracy | General image classification |
| Imbalanced data | F1, AUC | Fraud detection |
| False positives are costly | Precision | Spam filter |
| False negatives are costly | Recall | Disease diagnosis |
| A trade-off is needed | F1-Score | Many cases |
| Comparing models | AUC | Model selection |

---

# 7. Summary

---

## Outline: Summary

7.1 Key Takeaways

7.2 Summary Diagram

7.3 Further Study

7.4 Using Scikit-learn

---

## 7.1 Key Takeaways (1)

1. **Discriminative models** learn the decision boundary directly and tend to outperform generative models when plenty of data is available.

2. **Linear regression** predicts continuous values and can be solved either in closed form or by Gradient Descent.

---

## 7.1 Key Takeaways (2)

3. **Logistic regression** is used for classification; it combines the sigmoid function with the binary cross-entropy loss.

4. The **perceptron** is the forerunner of neural networks and is guaranteed to converge when the data are linearly separable.

5. **Evaluation:** choose a metric that fits the problem.

---

## 7.2 Summary Diagram

```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#458588', 'primaryTextColor': '#ebdbb2', 'primaryBorderColor': '#ebdbb2', 'lineColor': '#ebdbb2', 'secondaryColor': '#689d6a', 'tertiaryColor': '#d79921', 'background': '#282828', 'mainBkg': '#282828', 'textColor': '#ebdbb2'}}}%%
flowchart TB
    subgraph SUMMARY["Summary of Discriminative Models"]
        subgraph REGRESSION["Regression Task"]
            LR["Linear Regression"]
        end
        subgraph CLASSIFICATION["Classification Task"]
            LOG["Logistic Regression"]
            PER["Perceptron"]
        end
        subgraph EVAL["Evaluation"]
            E1["Regression: MSE, R²"]
            E2["Classification: Accuracy, F1, AUC"]
        end
        REGRESSION --> E1
        CLASSIFICATION --> E2
    end
```

---

## 7.3 Further Study

- **Neural Networks & Deep Learning:** Multi-Layer Perceptron, Backpropagation
- **Support Vector Machines:** Kernel Methods, Margin Maximization
- **Regularization:** L1, L2, Elastic Net
- **Advanced Topics:** Gradient Boosting, Random Forest

---

## 7.4 Scikit-learn Summary

| Model | Class | Key parameters |
|-------|-------|-----------------|
| Linear Regression | `LinearRegression` | `fit_intercept` |
| Logistic Regression | `LogisticRegression` | `C`, `penalty`, `solver`, `multi_class` |
| Perceptron | `Perceptron` | `penalty`, `alpha`, `max_iter`, `eta0` |

---

## 7.4 Common Usage Pattern

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# 1. Prepare the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# 2. Feature Scaling (fit on the training set only, then reuse on the test set)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# 3. Create and train the model
model = SomeModel(params)  # placeholder for any estimator from the table above
model.fit(X_train_scaled, y_train)

# 4. Predict
y_pred = model.predict(X_test_scaled)
```

---

## When to Use Which Model

| Situation | Recommended model |
|-----------|--------------|
| Predicting a continuous value | Linear Regression |
| Classification where probabilities are needed | Logistic Regression |
| Classification with clearly linearly separable data | Perceptron |
| A simple baseline | Linear/Logistic Regression |
| High-dimensional, sparse data | Perceptron or Logistic (with L1) |

---

## References

1. Bishop, C. M. (2006). *Pattern Recognition and Machine Learning*. Springer.
2. Murphy, K. P. (2012). *Machine Learning: A Probabilistic Perspective*. MIT Press.
3. Hastie, T., Tibshirani, R., & Friedman, J. (2009). *The Elements of Statistical Learning*. Springer.
4. Ng, A. (2018). Machine Learning Course. Stanford University / Coursera.
5. Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. *Psychological Review*, 65(6), 386-408.

---

## References (continued)

6. Cox, D. R. (1958). The regression analysis of binary sequences. *Journal of the Royal Statistical Society: Series B*, 20(2), 215-242.
7. Scikit-learn Documentation. https://scikit-learn.org/
8. Andrew Ng's CS229 Lecture Notes. Stanford University. http://cs229.stanford.edu/

---

# Questions
<img src="/revealjs/pics/Designer.png" width="55%" />