
# Artificial Neural Networks
**Prepared by:** อรรถพล คงหวาน

---

## Outline

1. Introduction to Artificial Neural Networks
2. Multi-Layer Perceptron (MLP)
3. Backpropagation Algorithm
4. Activation Functions
5. Introduction to Deep Learning
6. Convolutional Neural Networks (CNN) Basics
7. Overall Summary

---

# 1. Introduction to Artificial Neural Networks

---

## Outline: Introduction to Artificial Neural Networks

1.1 Background and Inspiration

1.2 History and Development

1.3 Basic Components of an Artificial Neuron

---

## 1.1 Background and Inspiration

An **Artificial Neural Network (ANN)** is a mathematical model inspired by how the nervous system in the brains of living organisms works.

It mimics the way interconnected nerve cells (neurons) learn from and process information.

---

## Biological Neuron

A biological neuron has four main parts:

- **Dendrites:** receive signals from other neurons
- **Cell body (soma):** processes the incoming signals
- **Axon:** carries the signal onward to other neurons
- **Synapse:** the junction between two neurons

---

## Biological vs Artificial Neuron

```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#458588', 'primaryTextColor': '#ebdbb2', 'primaryBorderColor': '#83a598', 'lineColor': '#a89984', 'secondaryColor': '#b16286', 'tertiaryColor': '#282828', 'background': '#282828', 'mainBkg': '#282828', 'textColor': '#ebdbb2'}}}%%
graph LR
    subgraph BIO["Biological Neuron"]
        D1["Dendrite"] --> CB["Cell body"]
        D2["Dendrite"] --> CB
        CB --> AX["Axon"]
        AX --> SY["Synapse"]
    end
```

---

## Biological vs Artificial Neuron (2)

```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#b16286', 'primaryTextColor': '#ebdbb2', 'primaryBorderColor': '#d3869b', 'lineColor': '#a89984', 'secondaryColor': '#458588', 'tertiaryColor': '#282828', 'background': '#282828', 'mainBkg': '#282828', 'textColor': '#ebdbb2'}}}%%
graph LR
    subgraph ART["Artificial Neuron"]
        X1["Input x₁"] -->|"w₁"| SUM["Σ"]
        X2["Input x₂"] -->|"w₂"| SUM
        X3["Input x₃"] -->|"w₃"| SUM
        SUM --> ACT["f"]
        ACT --> OUT["Output y"]
    end
```

---

## 1.2 History and Development

```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#d65d0e', 'primaryTextColor': '#ebdbb2', 'primaryBorderColor': '#fe8019', 'lineColor': '#a89984', 'background': '#282828', 'mainBkg': '#282828', 'textColor': '#ebdbb2'}}}%%
timeline
    title Evolution of Neural Networks
    1943 : McCulloch-Pitts first neuron model
    1958 : Rosenblatt Perceptron
    1969 : Minsky & Papert show the XOR limitation
    1986 : Backpropagation
    1989 : LeCun CNN
```

---

## History: The Pioneering Era (1943-1969)

- **1943: McCulloch-Pitts** - first mathematical model of a neuron
- **1958: Rosenblatt** - Perceptron
- **1969: Minsky & Papert** - the book *Perceptrons* points out the XOR limitation of the perceptron

This contributed to the first "AI Winter".

---

## History: The Revival Era (1980-1995)

- **1986: Rumelhart et al.** - Backpropagation Algorithm
- **1989: LeCun** - CNN for digit recognition
- **1995: Vapnik** - Support Vector Machines

Interest in Neural Networks returns.

---

## History: The Deep Learning Era (2006-present)

- **2006: Hinton** - Deep Belief Networks
- **2012: AlexNet** - ชนะ ImageNet
- **2014: GANs** - โดย Goodfellow
- **2017: Transformer** - Attention Is All You Need
- **2022+: LLMs** - GPT, Claude, etc.

---

## 1.3 Basic Components of an Artificial Neuron

An **Artificial Neuron**, or **Perceptron**, has four parts:

1. **Inputs:** x₁, x₂, ..., xₙ
2. **Weights:** w₁, w₂, ..., wₙ
3. **Bias:** b
4. **Activation Function:** f

---

## Basic Equation of an Artificial Neuron

**Weighted Sum:**

$$z = \sum_{i=1}^{n} w_i x_i + b$$

---

## Output Equation of an Artificial Neuron

**Output:**

$$y = f(z) = f\left(\sum_{i=1}^{n} w_i x_i + b\right)$$

---

## Variable Definitions

| Variable | Meaning |
|----------|---------|
| **z** | Weighted sum (pre-activation) |
| **xᵢ** | The i-th input |
| **wᵢ** | Weight of the i-th input |
| **b** | Bias |
| **f** | Activation function |
| **y** | Output of the neuron |

---
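
## Sketch: A Single Neuron in Python

The weighted sum z and the output y = f(z) defined on the previous slides can be sketched in a few lines. The function name `neuron_output`, the choice of sigmoid as f, and the numeric inputs are illustrative assumptions, not part of the original material.

```python
import math


def neuron_output(x, w, b, f):
    """Return y = f(z), where z = sum_i w_i * x_i + b."""
    z = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
    return f(z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical inputs, weights, and bias chosen only for illustration
x = [1.0, 2.0, 3.0]
w = [0.2, -0.1, 0.4]
b = 0.5
y = neuron_output(x, w, b, sigmoid)  # z = 0.2 - 0.2 + 1.2 + 0.5 = 1.7
print(y)
```

Passing f as an argument keeps the weighted sum separate from the activation, which mirrors the two equations above.

---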

# 2. Multi-Layer Perceptron (MLP)

---

## Outline: Multi-Layer Perceptron

2.1 Structure of an MLP

2.2 Forward Propagation

2.3 Worked Example

2.4 Code Implementation

2.5 Using MLPs with Scikit-learn

2.6 Universal Approximation Theorem

---

## 2.1 Structure of an MLP

A **Multi-Layer Perceptron (MLP)**, also called a **Feedforward Neural Network**, is an artificial neural network made up of several layers.

Data flows from the input layer to the output layer in one direction only (forward), with no loops.

---

## Layers in an MLP

- **Input Layer:** receives data from outside
- **Hidden Layers:** perform intermediate processing (there may be several)
- **Output Layer:** produces the final result

---

## MLP Structure

```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#458588', 'primaryTextColor': '#ebdbb2', 'primaryBorderColor': '#83a598', 'lineColor': '#a89984', 'background': '#282828', 'mainBkg': '#282828', 'textColor': '#ebdbb2'}}}%%
graph LR
    subgraph INPUT["Input Layer"]
        I1((x₁))
        I2((x₂))
        I3((x₃))
    end
    subgraph HIDDEN["Hidden Layer"]
        H1((h₁))
        H2((h₂))
        H3((h₃))
        H4((h₄))
    end
    subgraph OUTPUT["Output Layer"]
        O1((ŷ₁))
        O2((ŷ₂))
    end
    I1 --> H1 & H2 & H3 & H4
    I2 --> H1 & H2 & H3 & H4
    I3 --> H1 & H2 & H3 & H4
    H1 --> O1 & O2
    H2 --> O1 & O2
    H3 --> O1 & O2
    H4 --> O1 & O2
```

---

## 2.2 Forward Propagation

**Forward Propagation** computes the output from the input, one layer at a time, from the input layer to the output layer.

---

## Forward Propagation Equations

**Pre-activation:**

$$z^{(l)} = W^{(l)} a^{(l-1)} + b^{(l)}$$

**Activation:**

$$a^{(l)} = f\left(z^{(l)}\right)$$

---

## Forward Propagation: Variable Definitions

| Variable | Meaning |
|----------|---------|
| **z⁽ˡ⁾** | Pre-activation of layer l |
| **W⁽ˡ⁾** | Weight matrix of layer l (nₗ × nₗ₋₁) |
| **a⁽ˡ⁻¹⁾** | Activation from the previous layer |
| **b⁽ˡ⁾** | Bias vector of layer l |
| **f** | Activation function (element-wise) |

---

## 2.3 Worked Example

**A small MLP:**
- Input layer: 2 nodes
- Hidden layer: 2 nodes (ReLU)
- Output layer: 1 node (Sigmoid)

---

## Given Data

- Input: **x** = [1, 2]ᵀ
- Hidden-layer weights: **W⁽¹⁾** = [[0.5, 0.3], [0.2, 0.4]]
- Hidden-layer bias: **b⁽¹⁾** = [0.1, 0.1]ᵀ
- Output-layer weights: **W⁽²⁾** = [[0.6, 0.5]]
- Output-layer bias: **b⁽²⁾** = [0.2]

---

## Step 1: Compute the Hidden Layer

**Pre-activation:**
- z₁⁽¹⁾ = (0.5 × 1) + (0.3 × 2) + 0.1 = **1.2**
- z₂⁽¹⁾ = (0.2 × 1) + (0.4 × 2) + 0.1 = **1.1**

**Activation (ReLU):**
- a₁⁽¹⁾ = max(0, 1.2) = **1.2**
- a₂⁽¹⁾ = max(0, 1.1) = **1.1**

---

## Step 2: Compute the Output Layer

**Pre-activation:**
- z⁽²⁾ = (0.6 × 1.2) + (0.5 × 1.1) + 0.2 = **1.47**

**Activation (Sigmoid):**
- ŷ = 1 / (1 + e⁻¹·⁴⁷) = 1 / 1.230 ≈ **0.813**

---
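
## Sketch: Checking the Worked Example with NumPy

The two-step forward pass above can be reproduced with matrix operations. This is a minimal sketch that reuses the given x, W⁽¹⁾, b⁽¹⁾, W⁽²⁾, and b⁽²⁾; the variable names are illustrative choices.

```python
import numpy as np

# Values taken from the "Given Data" slide (column-vector convention)
x = np.array([[1.0], [2.0]])
W1 = np.array([[0.5, 0.3], [0.2, 0.4]])
b1 = np.array([[0.1], [0.1]])
W2 = np.array([[0.6, 0.5]])
b2 = np.array([[0.2]])

z1 = W1 @ x + b1                      # hidden pre-activation: 1.2 and 1.1
a1 = np.maximum(0.0, z1)              # ReLU
z2 = W2 @ a1 + b2                     # output pre-activation: 1.47
y_hat = 1.0 / (1.0 + np.exp(-z2))     # sigmoid

print(round(float(y_hat[0, 0]), 3))   # 0.813
```

---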

## 2.4 Code Implementation: MLP Class

```python
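import numpy as np


class MultiLayerPerceptron:
    def __init__(self, layer_sizes, activation='relu',
                 learning_rate=0.01):
        self.layer_sizes = layer_sizes
        self.num_layers = len(layer_sizes)
        self.learning_rate = learning_rate
        # Hidden-layer activation called by forward(); this sketch
        # handles only 'relu' (default) and 'tanh'
        self.activation = (np.tanh if activation == 'tanh'
                           else lambda z: np.maximum(0, z))
        self.weights = []
        self.biases = []
        # He initialization: std = sqrt(2 / fan_in); biases start at zero
        for i in range(1, self.num_layers):
            w = np.random.randn(layer_sizes[i],
                layer_sizes[i-1]) * np.sqrt(2.0/layer_sizes[i-1])
            self.weights.append(w)
            self.biases.append(np.zeros((layer_sizes[i], 1)))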

---

## MLP: Forward Method

```python
def forward(self, X):
    # X has shape (n_features, n_samples): one column per example
    self.activations = [X]
    self.z_values = []
    a = X
    # Hidden layers
    for i in range(len(self.weights) - 1):
        z = np.dot(self.weights[i], a) + self.biases[i]
        self.z_values.append(z)
        a = self.activation(z)  # ReLU
        self.activations.append(a)
    # Output layer with sigmoid
    z = np.dot(self.weights[-1], a) + self.biases[-1]
    self.z_values.append(z)  # keep the output z for the backward pass
    a = 1 / (1 + np.exp(-z))
    self.activations.append(a)
    return a

---

## 2.5 MLPClassifier (Scikit-learn)

```python
from sklearn.neural_network import MLPClassifier

mlp = MLPClassifier(
    hidden_layer_sizes=(64, 32),  # two hidden layers
    activation='relu',
    solver='adam',
    alpha=0.001,              # L2 regularization
    max_iter=500,
    early_stopping=True,
    random_state=42
)

mlp.fit(X_train, y_train)
y_pred = mlp.predict(X_test)
```

---

## MLPRegressor (Scikit-learn)

```python
from sklearn.neural_network import MLPRegressor

mlp_reg = MLPRegressor(
    hidden_layer_sizes=(100, 50, 25),  # three hidden layers
    activation='relu',
    solver='adam',
    alpha=0.01,
    max_iter=1000,
    early_stopping=True,
    random_state=42
)

mlp_reg.fit(X_train, y_train)
y_pred = mlp_reg.predict(X_test)
```

---

## 2.6 Universal Approximation Theorem

The **Universal Approximation Theorem** states:

An MLP with a single hidden layer, enough hidden nodes, and a suitable non-linear activation function can approximate any continuous function on a closed and bounded set to any desired accuracy.

---

## Practical Implications

- Neural Networks are able to learn complex patterns
- The theorem does not say how many nodes are needed or how to find the weights
- In practice, deep networks (many layers) often perform better than very wide shallow networks

---

# 3. Backpropagation Algorithm

---

## Outline: Backpropagation

3.1 Basic Idea

3.2 Chain Rule and Gradient Flow

3.3 Backpropagation Equations

3.4 Detailed Worked Example

3.5 Gradient Descent and Optimization

3.6 Advanced Optimizers

---

## 3.1 Basic Idea

**Backpropagation** (Backward Propagation of Errors) is the algorithm for computing the **Gradient** of the Loss function with respect to the network's weights and biases.

It applies the **Chain Rule** backward, from the output layer toward the input layer.

---

## The Forward and Backward Process

```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#98971a', 'primaryTextColor': '#eeeeee', 'primaryBorderColor': '#b8bb26', 'lineColor': '#a89984', 'background': '#282828', 'mainBkg': '#282828', 'textColor': '#ebdbb2'}}}%%
graph LR
    subgraph FORWARD["Forward Pass"]
        F1["Input x"] --> F2["Hidden layer h"]
        F2 --> F3["Output ŷ"]
        F3 --> F4["Loss L"]
    end
```

---

## The Backward Pass

```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#cc241d', 'primaryTextColor': '#ebdbb2', 'primaryBorderColor': '#fb4934', 'lineColor': '#a89984', 'background': '#282828', 'mainBkg': '#282828', 'textColor': '#ebdbb2'}}}%%
graph RL
    subgraph BACKWARD["Backward Pass"]
        B1["∂L/∂ŷ"] --> B2["∂L/∂h"]
        B2 --> B3["∂L/∂W"]
        B3 --> B4["Update W"]
    end
```

---

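## 3.2 Chain Rule

The **Chain Rule** from calculus: if y = f(g(x)), then

$$\frac{dy}{dx} = f'(g(x)) \cdot g'(x)$$

---

## Applying the Chain Rule in a Neural Network

For a weight $w$ that feeds a pre-activation $z$ with output $\hat{y} = f(z)$:

$$\frac{\partial L}{\partial w} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z} \cdot \frac{\partial z}{\partial w}$$

---

## 3.3 Definition of the Error Signal (δ)

The **Error Signal** is the gradient of the Loss with respect to the pre-activation:

$$\delta^{(l)} = \frac{\partial L}{\partial z^{(l)}}$$

---

## Error Signal of the Output Layer

For Cross-Entropy Loss with Sigmoid:

$$\delta^{(\text{out})} = \hat{y} - y$$

(predicted value - true value)

---

## Error Signal of a Hidden Layer

$$\delta^{(l)} = \left( (W^{(l+1)})^{T} \delta^{(l+1)} \right) \odot f'\left(z^{(l)}\right)$$

**⊙** = Hadamard product (element-wise multiplication)

---

## Gradients of the Weights and Biases

**Gradient of the weights:**

$$\frac{\partial L}{\partial W^{(l)}} = \delta^{(l)} \left(a^{(l-1)}\right)^{T}$$

**Gradient of the biases:**

$$\frac{\partial L}{\partial b^{(l)}} = \delta^{(l)}$$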

---

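## 3.4 Worked Backpropagation Example

**Problem:** a simple MLP
- Input: x = 0.5
- Weights: w₁ = 0.8, w₂ = 0.6
- Biases: b₁ = 0.2, b₂ = 0.3
- Target: y = 1
- Sigmoid in every layer, MSE Loss

---

## Step 1: Forward Pass

**Hidden layer:**
- z₁ = 0.8 × 0.5 + 0.2 = **0.6**
- a₁ = σ(0.6) = **0.6457**

**Output layer:**
- z₂ = 0.6 × 0.6457 + 0.3 = **0.6874**
- ŷ = σ(0.6874) = **0.6654**

---

## Step 2: Compute the Loss

Using the squared-error form whose derivative is -(y - ŷ), as in Step 3:

$$L = \tfrac{1}{2}(y - \hat{y})^2 = \tfrac{1}{2}(1 - 0.6654)^2 \approx 0.0560$$

---

## Step 3: Compute δ₂ (Output Layer)

∂L/∂ŷ = -(y - ŷ) = -(1 - 0.6654) = **-0.3346**

δ₂ = ∂L/∂ŷ × σ'(z₂)

δ₂ = -0.3346 × 0.6654 × (1 - 0.6654) = **-0.0745**

---

## Step 3: Compute the Gradient of w₂

$$\frac{\partial L}{\partial w_2} = \delta_2 \cdot a_1 = -0.0745 \times 0.6457 \approx -0.0481$$

---

## Step 3: Compute δ₁ (Hidden Layer)

δ₁ = (w₂ × δ₂) × σ'(z₁)

δ₁ = (0.6 × -0.0745) × 0.6457 × (1 - 0.6457) = **-0.0102**

---

## Step 3: Compute the Gradient of w₁

$$\frac{\partial L}{\partial w_1} = \delta_1 \cdot x = -0.0102 \times 0.5 = -0.0051$$

---

## Step 4: Update the Weights (η = 0.1)

**Update w₂:**

$$w_2 \leftarrow w_2 - \eta \frac{\partial L}{\partial w_2} = 0.6 - 0.1 \times (-0.0481) \approx 0.6048$$

**Update w₁:**

$$w_1 \leftarrow w_1 - \eta \frac{\partial L}{\partial w_1} = 0.8 - 0.1 \times (-0.0051) \approx 0.8005$$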

---
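
## Sketch: Reproducing the Backpropagation Example

The four steps of the worked example can be replayed in plain Python. This sketch assumes the loss is ½(y - ŷ)², which matches the derivative -(y - ŷ) used in Step 3; the variable names are illustrative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Values from the problem statement
x, y = 0.5, 1.0
w1, b1, w2, b2 = 0.8, 0.2, 0.6, 0.3
eta = 0.1

# Step 1: forward pass
z1 = w1 * x + b1
a1 = sigmoid(z1)                 # about 0.6457
z2 = w2 * a1 + b2
y_hat = sigmoid(z2)              # about 0.6654

# Step 2: loss, assumed to be 0.5 * (y - y_hat)^2
loss = 0.5 * (y - y_hat) ** 2

# Step 3: error signals and gradients (sigmoid'(z) = a * (1 - a))
delta2 = -(y - y_hat) * y_hat * (1.0 - y_hat)    # about -0.0745
grad_w2 = delta2 * a1
delta1 = w2 * delta2 * a1 * (1.0 - a1)           # about -0.0102
grad_w1 = delta1 * x

# Step 4: gradient-descent update
w2_new = w2 - eta * grad_w2
w1_new = w1 - eta * grad_w1
print(round(w1_new, 4), round(w2_new, 4))
```

Both gradients are negative, so the update increases both weights and pushes ŷ toward the target 1.

---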

## 3.5 Gradient Descent

**Gradient Descent** updates the weights in the direction that reduces the Loss:

$$\theta \leftarrow \theta - \eta \, \nabla_{\theta} L(\theta)$$

- **θ:** all parameters
- **η:** Learning Rate

---

## Types of Gradient Descent

| Type | Batch Size | Advantage | Disadvantage |
|------|------------|-----------|--------------|
| **Batch GD** | Entire dataset | Stable | Very slow |
| **SGD** | 1 example | Fast | Noisy |
| **Mini-batch** | 32-256 | Balanced | Batch size must be chosen |

---
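
## Sketch: Mini-batch Gradient Descent

To make the mini-batch row of the table concrete, the sketch below fits a one-variable linear model with mini-batches of 32. The toy data (y = 3x + 1 plus noise), the learning rate, and the epoch count are assumptions made only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: y = 3x + 1 plus small Gaussian noise
X = rng.uniform(-1.0, 1.0, size=(200, 1))
y = 3.0 * X[:, 0] + 1.0 + 0.01 * rng.standard_normal(200)

w, b = 0.0, 0.0               # parameters theta = (w, b)
eta, batch_size = 0.1, 32

for epoch in range(200):
    order = rng.permutation(len(X))            # reshuffle every epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        err = w * X[idx, 0] + b - y[idx]       # prediction error on the batch
        grad_w = np.mean(err * X[idx, 0])      # gradient of 0.5 * mean(err^2)
        grad_b = np.mean(err)
        w -= eta * grad_w                      # theta <- theta - eta * gradient
        b -= eta * grad_b

print(w, b)   # should end up close to 3 and 1
```

Setting `batch_size` to `len(X)` gives Batch GD, and setting it to 1 gives SGD.

---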

## 3.6 Advanced Optimizers

---

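## Adam Optimizer

**Adam** (Adaptive Moment Estimation) keeps running averages of the gradient $g_t$ and of its square:

$$m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t$$

$$v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2$$

---

## Adam: Parameter Update

$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}$$

$$\theta_t = \theta_{t-1} - \eta \, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$$

**Default values:**
- β₁ = 0.9, β₂ = 0.999, ε = 10⁻⁸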

---
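
## Sketch: One Adam Step in NumPy

The Adam update can be written as a small function. This is a sketch under the standard formulation with bias correction; the function name `adam_step` and the quadratic test problem are illustrative assumptions.

```python
import numpy as np


def adam_step(theta, grad, m, v, t, eta=0.001,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step counter."""
    m = beta1 * m + (1 - beta1) * grad           # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2      # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                 # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - eta * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v


# Hypothetical usage: minimise f(theta) = theta^2, whose gradient is 2 * theta
theta = np.array([1.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 2001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, eta=0.01)
print(theta)
```

On the very first step the bias-corrected ratio is close to ±1, so the parameter moves by roughly η regardless of the gradient's scale.

---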

# 4. Activation Functions

---

## Outline: Activation Functions

4.1 Why Activation Functions Matter

4.2 Types of Activation Functions

4.3 Comparison Table

4.4 Code Implementation

---

## 4.1 Why Activation Functions Matter

**Activation Functions** let Neural Networks learn non-linear relationships.

---

## Why Use a Non-linear Activation?

With only linear functions, any number of layers collapses into a single linear function:

$$W^{(2)}\left(W^{(1)} x + b^{(1)}\right) + b^{(2)} = \left(W^{(2)} W^{(1)}\right) x + \left(W^{(2)} b^{(1)} + b^{(2)}\right)$$

---
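
## Sketch: Two Linear Layers Collapse into One

The claim that stacked linear layers reduce to a single linear function can be checked numerically. The layer sizes and random parameters below are hypothetical and serve only as an illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two linear layers with hypothetical random parameters
W1, b1 = rng.standard_normal((4, 3)), rng.standard_normal((4, 1))
W2, b2 = rng.standard_normal((2, 4)), rng.standard_normal((2, 1))
x = rng.standard_normal((3, 1))

two_layers = W2 @ (W1 @ x + b1) + b2

# Equivalent single linear layer
W, b = W2 @ W1, W2 @ b1 + b2
one_layer = W @ x + b

print(np.allclose(two_layers, one_layer))  # True
```

---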

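## 4.2.1 Sigmoid Function

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

- **Range:** (0, 1)
- **Advantage:** can be read as a probability
- **Disadvantage:** Vanishing Gradient

---

## Sigmoid: Derivative

$$\sigma'(z) = \sigma(z)\,\bigl(1 - \sigma(z)\bigr)$$

When |z| is large → σ'(z) → 0 (**Vanishing Gradient**)

---

## 4.2.2 Hyperbolic Tangent (tanh)

$$\tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$$

- **Range:** (-1, 1)
- **Advantage:** Zero-centered
- **Disadvantage:** still suffers from Vanishing Gradient

---

## tanh: Derivative

$$\tanh'(z) = 1 - \tanh^2(z)$$

---

## 4.2.3 ReLU (Rectified Linear Unit)

$$\text{ReLU}(z) = \max(0, z)$$

- **Range:** [0, ∞)
- **Advantages:** cheap to compute; the gradient does not vanish for z > 0
- **Disadvantage:** Dead ReLU Problem

---

## ReLU: Derivative

$$\text{ReLU}'(z) = \begin{cases} 1 & \text{if } z > 0 \\ 0 & \text{if } z \le 0 \end{cases}$$

(The derivative is undefined at z = 0; the value 0 is used there by convention.)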
---

## 4.2.4 Leaky ReLU

<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
  <mi>LeakyReLU</mi>
  <mo>(</mo><mi>z</mi><mo>)</mo>
  <mo>=</mo>
  <mrow>
    <mo>{</mo>
    <mtable>
      <mtr><mtd><mi>z</mi></mtd><mtd><mtext> if </mtext><mi>z</mi><mo>></mo><mn>0</mn></mtd></mtr>
      <mtr><mtd><mi>α</mi><mi>z</mi></mtd><mtd><mtext> if </mtext><mi>z</mi><mo>≤</mo><mn>0</mn></mtd></mtr>
    </mtable>
  </mrow>
</math>

Typically α = 0.01 (addresses the Dead ReLU problem)

---
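
## Sketch: Leaky ReLU in NumPy

The piecewise definition above translates directly into `np.where`. The function name and the `derivative` flag follow the style of the other activation code in this deck but are otherwise illustrative assumptions.

```python
import numpy as np


def leaky_relu(z, alpha=0.01, derivative=False):
    """Leaky ReLU and its derivative; alpha is the slope for z <= 0."""
    z = np.asarray(z, dtype=float)
    if derivative:
        return np.where(z > 0, 1.0, alpha)
    return np.where(z > 0, z, alpha * z)


z = np.array([-2.0, 0.0, 3.0])
out = leaky_relu(z)                      # values: -0.02, 0.0, 3.0
grad = leaky_relu(z, derivative=True)    # values: 0.01, 0.01, 1.0
print(out, grad)
```

Because the slope for negative inputs is α rather than 0, a unit that receives only negative inputs still gets a small gradient.

---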

## 4.2.5 Softmax (Multi-class)

<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
  <mi>Softmax</mi>
  <msub>
    <mrow><mo>(</mo><mi mathvariant="bold">z</mi><mo>)</mo></mrow>
    <mi>i</mi>
  </msub>
  <mo>=</mo>
  <mfrac>
    <msup><mi>e</mi><msub><mi>z</mi><mi>i</mi></msub></msup>
    <mrow>
      <munderover>
        <mo>∑</mo>
        <mrow><mi>j</mi><mo>=</mo><mn>1</mn></mrow>
        <mi>K</mi>
      </munderover>
      <msup><mi>e</mi><msub><mi>z</mi><mi>j</mi></msub></msup>
    </mrow>
  </mfrac>
</math>

**Property:** the outputs sum to 1 (can be read as probabilities)

---

## 4.3 ตารางเปรียบเทียบ Activation Functions

| Activation | Range | Typical use |
|------------|---------|-----------|
| **Sigmoid** | (0,1) | Output (binary) |
| **tanh** | (-1,1) | Hidden (RNN) |
| **ReLU** | [0,∞) | Hidden (CNN) |
| **Leaky ReLU** | (-∞,∞) | Hidden layers |
| **Softmax** | (0,1), Σ=1 | Output (multi-class) |

---

## 4.4 Code Implementation

```python
import numpy as np


class ActivationFunctions:
    @staticmethod
    def sigmoid(z, derivative=False):
        # Clip z so that np.exp cannot overflow for large |z|
        sig = 1 / (1 + np.exp(-np.clip(z, -500, 500)))
        if derivative:
            return sig * (1 - sig)
        return sig

    @staticmethod
    def relu(z, derivative=False):
        if derivative:
            return np.where(z > 0, 1, 0).astype(float)
        return np.maximum(0, z)

---

## Code: tanh and Softmax

```python
    # Continuation of the ActivationFunctions class body
    @staticmethod
    def tanh(z, derivative=False):
        t = np.tanh(z)
        if derivative:
            return 1 - t**2
        return t

    @staticmethod
    def softmax(z):
        # Subtract the column-wise max for numerical stability
        exp_z = np.exp(z - np.max(z, axis=0, keepdims=True))
        return exp_z / np.sum(exp_z, axis=0, keepdims=True)

---

# 5. Deep Learning เบื้องต้น

---

## Outline: Deep Learning

5.1 What Deep Learning Is

5.2 Key Properties

5.3 Training Challenges

5.4 Key Techniques

5.5 Regularization Techniques

5.6 Loss Functions

---

## 5.1 What Deep Learning Is

**Deep Learning** is a subfield of Machine Learning that uses **Deep Neural Networks**.

These networks stack multiple hidden layers to learn **Hierarchical Representations**.

---

## Relationship Between AI, ML, and DL

```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#d65d0e', 'primaryTextColor': '#ebdbb2', 'primaryBorderColor': '#fe8019', 'lineColor': '#a89984', 'background': '#282828', 'mainBkg': '#282828', 'textColor': '#ebdbb2'}}}%%
graph TB
    subgraph AI["Artificial Intelligence"]
        subgraph ML["Machine Learning"]
            subgraph DL["Deep Learning"]
                CNN["CNN"]
                RNN["RNN"]
                TF["Transformer"]
            end
        end
    end
```

---

## 5.2 Hierarchical Feature Learning

Deep Networks learn features hierarchically:

| Layer depth | Feature type | Example (images) |
|-------------|--------------|------------------|
| Early layers | Low-level | Edges, corners, colors |
| Middle layers | Mid-level | Shapes, textures |
| Deep layers | High-level | Objects, faces |

---

## 5.3 Vanishing Gradient Problem

In a very deep network, the gradient shrinks progressively as it is backpropagated through many layers.

**Remedies:**
- Use ReLU instead of Sigmoid/tanh
- Batch Normalization
- Residual Connections (Skip Connections)
- Proper Weight Initialization

---

## 5.3 Exploding Gradient Problem

The gradient grows so large that the weight updates become unstable.

**Remedies:**
- Gradient Clipping
- Batch Normalization
- Careful Learning Rate Selection

---

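## 5.4.1 Weight Initialization

**Xavier/Glorot Initialization** (normal-distribution form):

$$W \sim \mathcal{N}\left(0, \frac{2}{n_{\text{in}} + n_{\text{out}}}\right)$$

**He Initialization (for ReLU):**

$$W \sim \mathcal{N}\left(0, \frac{2}{n_{\text{in}}}\right)$$

Here $n_{\text{in}}$ and $n_{\text{out}}$ are the numbers of inputs and outputs of the layer, and the second argument is the variance.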

---
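
## Sketch: Xavier and He Initialization

The two schemes differ only in the variance used to scale the random weights. The sketch below assumes the normal-distribution variants, with variance 2/(n_in + n_out) for Xavier/Glorot and 2/n_in for He; the function names and layer sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)


def xavier_normal(n_in, n_out):
    """Glorot/Xavier (normal form): Var(w) = 2 / (n_in + n_out)."""
    return rng.standard_normal((n_out, n_in)) * np.sqrt(2.0 / (n_in + n_out))


def he_normal(n_in, n_out):
    """He (normal form): Var(w) = 2 / n_in, intended for ReLU layers."""
    return rng.standard_normal((n_out, n_in)) * np.sqrt(2.0 / n_in)


# Hypothetical layer with 512 inputs and 256 outputs
W_xavier = xavier_normal(512, 256)
W_he = he_normal(512, 256)
print(W_xavier.var(), W_he.var())   # close to 2/768 and 2/512
```

---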

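## 5.4.2 Batch Normalization

Normalize the activations within each mini-batch:

$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad y_i = \gamma\, \hat{x}_i + \beta$$

**μ_B, σ_B²** = mean and variance of the mini-batch; **γ, β** = learnable parameters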

---
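
## Sketch: Batch Normalization Forward Pass

The normalize-then-scale-and-shift step can be sketched for a mini-batch stored as a (batch, features) array. This covers the training-time computation only; the running statistics used at inference time are left out, and the function name and data are illustrative assumptions.

```python
import numpy as np


def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalise each feature over the batch axis, then scale and shift."""
    mu = x.mean(axis=0)                      # per-feature mini-batch mean
    var = x.var(axis=0)                      # per-feature mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta


rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 4))   # hypothetical mini-batch
out = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0), out.std(axis=0))   # roughly 0 and 1 per feature
```

---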

## 5.4.3 Dropout

**Dropout** is a Regularization technique that randomly "switches off" neurons during training.

**Benefits:**
- Helps prevent Overfitting
- Implicitly creates an Ensemble Effect

---
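
## Sketch: Dropout as a Random Mask

Randomly switching off neurons amounts to multiplying the activations by a random 0/1 mask. The sketch below assumes the common "inverted dropout" variant, which rescales the surviving units during training so that nothing changes at inference time; the slides do not say which variant is intended, and the function name is illustrative.

```python
import numpy as np


def dropout_forward(a, p_drop, rng, training=True):
    """Inverted dropout: zero units with probability p_drop, rescale the rest."""
    if not training or p_drop == 0.0:
        return a                              # inference: pass through unchanged
    keep = 1.0 - p_drop
    mask = rng.random(a.shape) < keep         # True where the unit is kept
    return a * mask / keep


rng = np.random.default_rng(0)
a = np.ones((1000, 100))                      # hypothetical activations
out = dropout_forward(a, p_drop=0.5, rng=rng)
print(out.mean())   # close to 1.0: the expected activation is preserved
```

---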

## 5.5 Regularization Techniques

| Technique | Method |
|-----------|--------|
| **L2 Regularization** | Add λΣw² to the Loss |
| **L1 Regularization** | Add λΣ\|w\| to the Loss |
| **Dropout** | Randomly switch off neurons |
| **Data Augmentation** | Enlarge the dataset with transformations |
| **Early Stopping** | Stop when the validation loss stops decreasing |

---

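## 5.6.1 MSE (Regression)

$$L_{\text{MSE}} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

---

## 5.6.2 Binary Cross-Entropy

$$L_{\text{BCE}} = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i) \right]$$

---

## 5.6.3 Categorical Cross-Entropy

$$L_{\text{CCE}} = -\frac{1}{n} \sum_{i=1}^{n} \sum_{k=1}^{K} y_{ik} \log \hat{y}_{ik}$$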

---
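
## Sketch: The Three Loss Functions in NumPy

The three losses can be written as short NumPy functions. This sketch assumes the standard mean-over-samples definitions and adds clipping to avoid log(0); the function names and the `eps` parameter are illustrative.

```python
import numpy as np


def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)


def binary_cross_entropy(y, y_hat, eps=1e-12):
    y_hat = np.clip(y_hat, eps, 1.0 - eps)      # avoid log(0)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))


def categorical_cross_entropy(y_onehot, probs, eps=1e-12):
    # Rows are samples, columns are classes
    probs = np.clip(probs, eps, 1.0)
    return -np.mean(np.sum(y_onehot * np.log(probs), axis=1))


# Hypothetical targets and predictions
print(mse(np.array([1.0, 2.0]), np.array([1.0, 4.0])))          # 2.0
print(binary_cross_entropy(np.array([1.0, 0.0]), np.array([0.9, 0.2])))
print(categorical_cross_entropy(np.array([[0.0, 1.0, 0.0]]),
                                np.array([[0.2, 0.7, 0.1]])))
```

---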

# 6. Convolutional Neural Networks (CNN)

---

## Outline: CNN

6.1 Concept and Inspiration

6.2 Main Components

6.3 Convolution Operation

6.4 Hyperparameters

6.5 Pooling Layer

6.6 Convolutional vs Fully Connected

6.7 Key CNN Architectures

6.8-6.10 Code Implementation and Applications

---

## 6.1 Concept and Inspiration

**CNNs (ConvNets)** are designed for grid-structured data such as images.

**Inspired by the visual system:**
- Neurons in the visual cortex respond to specific features within a small region
- CNNs use **Local Connectivity** and **Parameter Sharing**

---

## 6.2 Main Components of a CNN

---

## 6.3 Convolution Operation

**Convolution** slides a Filter/Kernel across the image and computes a weighted sum at each position.

---

## Convolution Example

```
Input (3×3):        Kernel (2×2):
[1  2  3]           [1  0]
[4  5  6]           [0  1]
[7  8  9]
```

**Feature Map (2×2):**
- (0,0): 1×1 + 2×0 + 4×0 + 5×1 = **6**
- (0,1): 2×1 + 3×0 + 5×0 + 6×1 = **8**
- (1,0): 4×1 + 5×0 + 7×0 + 8×1 = **12**
- (1,1): 5×1 + 6×0 + 8×0 + 9×1 = **14**

---
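
## Sketch: The Convolution Example in NumPy

The 3×3 input and 2×2 kernel from the example can be run through a small sliding-window loop. As in the slide (and in most deep learning libraries), the kernel is not flipped, so strictly speaking this is cross-correlation. The function name `conv2d_valid` is an illustrative choice.

```python
import numpy as np


def conv2d_valid(image, kernel, stride=1):
    """Slide kernel over image with no padding; return the feature map."""
    k_h, k_w = kernel.shape
    out_h = (image.shape[0] - k_h) // stride + 1
    out_w = (image.shape[1] - k_w) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = np.sum(image[r:r + k_h, c:c + k_w] * kernel)
    return out


image = np.arange(1, 10).reshape(3, 3)     # [[1,2,3],[4,5,6],[7,8,9]]
kernel = np.array([[1, 0], [0, 1]])
feature_map = conv2d_valid(image, kernel)
print(feature_map)   # values 6, 8 in the first row and 12, 14 in the second
```

---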

## 6.4.1 Padding

**Padding** adds a border around the image to control the output size.

- **Valid Padding (No Padding):** no border is added → the output shrinks
- **Same Padding:** a border is added so that output size = input size (at stride 1)

---

## 6.4.2 Stride

**Stride** is the distance the kernel moves at each step.

- **Stride = 1:** moves 1 pixel at a time
- **Stride = 2:** moves 2 pixels at a time → the output is roughly half the size in each dimension

---

## 6.4.3 Output Size Formula

$$O = \left\lfloor \frac{I - K + 2P}{S} \right\rfloor + 1$$

- **I:** input size, **K:** kernel size
- **P:** padding, **S:** stride

**Example:** I=32, K=5, P=2, S=1 → Output = (32 - 5 + 4)/1 + 1 = 32

---
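
## Sketch: Output Size as a Function

The output-size formula, floor((I - K + 2P)/S) + 1, is easy to wrap in a helper for checking layer shapes. The function name is an illustrative assumption; the first call reproduces the example on the previous slide.

```python
def conv_output_size(i, k, p=0, s=1):
    """Output size along one dimension: floor((I - K + 2P) / S) + 1."""
    return (i - k + 2 * p) // s + 1


print(conv_output_size(32, 5, p=2, s=1))   # 32
print(conv_output_size(3, 2))              # 2
print(conv_output_size(28, 3, p=1, s=2))   # 14
```

---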

## 6.5 Pooling Layer

**Pooling** shrinks the Feature Map by summarizing the values in each small region.

| Type | Method |
|------|--------|
| **Max Pooling** | Takes the maximum value |
| **Average Pooling** | Takes the mean |
| **Global Average Pooling** | Takes the mean over the whole feature map |

---

## Max Pooling Example (2×2, Stride 2)

```
Input (4×4):           Output (2×2):
[1  3  2  4]           [5  6]
[5  2  1  6]   →       [8  9]
[7  8  3  2]
[4  5  9  1]
```

---
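
## Sketch: The Max Pooling Example in NumPy

The 4×4 example above can be reproduced with a short windowed loop. The function name `max_pool2d` and its default arguments are illustrative assumptions.

```python
import numpy as np


def max_pool2d(x, size=2, stride=2):
    """Take the maximum of each size x size window, moving by stride."""
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = x[r:r + size, c:c + size].max()
    return out


x = np.array([[1, 3, 2, 4],
              [5, 2, 1, 6],
              [7, 8, 3, 2],
              [4, 5, 9, 1]])
pooled = max_pool2d(x)
print(pooled)   # values 5, 6 in the first row and 8, 9 in the second
```

---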

## 6.6 Convolutional vs Fully Connected

| Property | Convolutional | Fully Connected |
|----------|---------------|-----------------|
| **Connectivity** | Local | Global |
| **Parameter Sharing** | Yes | No |
| **Number of Parameters** | Few | Many |
| **Best suited to** | Images | Tabular data |

---

## 6.7 Evolution of CNNs

---

## Key CNN Architectures

- **1998: LeNet-5** - handwritten digits (MNIST)
- **2012: AlexNet** - ImageNet Champion
- **2014: VGGNet** - เน้นความลึก
- **2014: GoogLeNet** - Inception Module
- **2015: ResNet** - Skip Connections
- **2020: Vision Transformer (ViT)** - Attention-based

---

## 6.8 Simple CNN Class

```python
import numpy as np


class Conv2D:
    def __init__(self, num_filters, kernel_size,
                 stride=1, padding=0):
        self.num_filters = num_filters
        self.kernel_size = kernel_size
        self.stride = stride
        self.padding = padding
        # He initialization
        self.filters = np.random.randn(
            num_filters, kernel_size, kernel_size
        ) * np.sqrt(2.0 / (kernel_size * kernel_size))
```

---

## Conv2D: Forward Method

```python
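def forward(self, X):
    # X: (batch_size, height, width), single input channel
    batch_size, in_h, in_w = X.shape
    p, k, s = self.padding, self.kernel_size, self.stride
    out_h = (in_h - k + 2 * p) // s + 1
    out_w = (in_w - k + 2 * p) // s + 1
    # Zero-pad height and width only, not the batch axis
    X_padded = np.pad(X, ((0, 0), (p, p), (p, p)))
    output = np.zeros((batch_size, self.num_filters,
                       out_h, out_w))

    for f in range(self.num_filters):
        for i in range(out_h):
            for j in range(out_w):
                h_start, w_start = i * s, j * s
                region = X_padded[:, h_start:h_start + k,
                                  w_start:w_start + k]
                output[:, f, i, j] = np.sum(
                    region * self.filters[f], axis=(1, 2))
    return output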

---

## 6.9 CNN with TensorFlow/Keras

```python
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3,3), activation='relu',
                  padding='same', input_shape=(28,28,1)),
    layers.BatchNormalization(),
    layers.MaxPooling2D((2,2)),

    layers.Conv2D(64, (3,3), activation='relu', padding='same'),
    layers.BatchNormalization(),
    layers.MaxPooling2D((2,2)),

    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(10, activation='softmax')
])
```

---

## Compile และ Train

```python
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',  # expects one-hot encoded labels
    metrics=['accuracy']
)

history = model.fit(
    X_train, y_train,
    batch_size=128,
    epochs=20,
    validation_split=0.1,
    callbacks=[
        tf.keras.callbacks.EarlyStopping(patience=5),
        tf.keras.callbacks.ReduceLROnPlateau(factor=0.5)
    ]
)
```

---

## Transfer Learning

```python
# Load the pre-trained MobileNetV2
base_model = tf.keras.applications.MobileNetV2(
    input_shape=(96, 96, 3),
    include_top=False,
    weights='imagenet'
)
base_model.trainable = False  # Freeze

# Add a new classification head
model = models.Sequential([
    base_model,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(10, activation='softmax')
])
```

---

## 6.10 Applications of CNNs

| Task | Examples |
|------|----------|
| **Image Classification** | Cat vs dog, digits 0-9 |
| **Object Detection** | YOLO, Faster R-CNN |
| **Semantic Segmentation** | U-Net, DeepLab |
| **Face Recognition** | FaceNet |
| **Medical Imaging** | Cancer detection |

---

# 7. Overall Summary

---

## Outline: Overall Summary

7.1 Key Takeaways

7.2 Skills Gained

7.3 Directions for Further Study

---

## 7.1 Key Takeaways (1)

**1. Artificial Neuron (Perceptron)**
- Takes inputs, multiplies them by weights, adds a bias, and applies an Activation Function
- Is the basic unit of the network

**2. Multi-Layer Perceptron (MLP)**
- Consists of several layers: Input, Hidden, Output
- Forward Propagation: computes from input to output

---

## 7.1 Key Takeaways (2)

**3. Backpropagation**
- The key algorithm for training the network
- Uses the Chain Rule to compute the Gradient

**4. Activation Functions**
- Let the network learn non-linear relationships
- ReLU is the popular choice for Hidden Layers

---

## 7.1 Key Takeaways (3)

**5. Deep Learning**
- Networks with many hidden layers
- Techniques: Batch Norm, Dropout, Skip Connections

**6. Convolutional Neural Networks (CNN)**
- Designed for image data
- Convolution + Pooling Layers
- Highly successful in Computer Vision tasks

---

## 7.2 Skills Gained

After studying this chapter, learners can:

1. Explain how Neural Networks work
2. Compute Forward/Backward Propagation
3. Choose an appropriate Activation Function
4. Identify problems in Deep Networks and their remedies
5. Explain the components and operation of a CNN
6. Apply Neural Networks to real problems

---

## 7.3 Directions for Further Study

- **Recurrent Neural Networks (RNN):** for sequential data
- **Transformer Architecture:** used in LLMs
- **Generative Models:** VAE, GAN, Diffusion Models
- **Reinforcement Learning:** learning by trial and error

---

## References (1)

1. Goodfellow, I., Bengio, Y., & Courville, A. (2016). *Deep Learning*. MIT Press.
2. LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. *Nature*.
3. Rumelhart, D. E., et al. (1986). Learning representations by back-propagating errors. *Nature*.
4. He, K., et al. (2015). Delving deep into rectifiers. *ICCV*.
5. Ioffe, S., & Szegedy, C. (2015). Batch normalization. *ICML*.

---

## References (2)

6. Srivastava, N., et al. (2014). Dropout. *JMLR*.
7. Kingma, D. P., & Ba, J. (2015). Adam. *ICLR*.
8. LeCun, Y., et al. (1998). Gradient-based learning. *Proceedings of the IEEE*.
9. Krizhevsky, A., et al. (2012). ImageNet classification with deep CNN. *NeurIPS*.
10. He, K., et al. (2016). Deep residual learning. *CVPR*.

---

# Questions
<img src="/revealjs/pics/Designer.png" width="55%" />