Neural Networks

1. Introduction to Neural Networks

1.1 Background and Inspiration

An artificial neural network (ANN) is a mathematical model inspired by the nervous systems of living organisms. It loosely imitates the way interconnected nerve cells (neurons) learn from and process information.

A biological neuron consists of the following main parts:

Dendrites: receive signals from other neurons
Cell body (soma): integrates the incoming signals
Axon: carries the signal away toward other neurons
Synapse: the junction between two neurons

graph LR
    subgraph BIO["Biological Neuron"]
        D1["Dendrites"] --> CB["Cell Body"]
        D2["Dendrites"] --> CB
        D3["Dendrites"] --> CB
        CB --> AX["Axon"]
        AX --> SY["Synapse"]
    end
    
    subgraph ART["Artificial Neuron"]
        X1["Input x₁"] --> |"Weight w₁"| SUM["Weighted sum Σ"]
        X2["Input x₂"] --> |"Weight w₂"| SUM
        X3["Input x₃"] --> |"Weight w₃"| SUM
        SUM --> ACT["Activation f"]
        ACT --> OUT["Output y"]
    end
    
    style BIO fill:#458588,stroke:#83a598,color:#ebdbb2
    style ART fill:#b16286,stroke:#d3869b,color:#ebdbb2
    style D1 fill:#689d6a,stroke:#8ec07c,color:#ebdbb2
    style D2 fill:#689d6a,stroke:#8ec07c,color:#ebdbb2
    style D3 fill:#689d6a,stroke:#8ec07c,color:#ebdbb2
    style CB fill:#d79921,stroke:#fabd2f,color:#282828
    style AX fill:#cc241d,stroke:#fb4934,color:#ebdbb2
    style SY fill:#98971a,stroke:#b8bb26,color:#282828
    style X1 fill:#689d6a,stroke:#8ec07c,color:#ebdbb2
    style X2 fill:#689d6a,stroke:#8ec07c,color:#ebdbb2
    style X3 fill:#689d6a,stroke:#8ec07c,color:#ebdbb2
    style SUM fill:#d79921,stroke:#fabd2f,color:#282828
    style ACT fill:#cc241d,stroke:#fb4934,color:#ebdbb2
    style OUT fill:#98971a,stroke:#b8bb26,color:#282828

1.2 History and Development

graph TB
    subgraph ERA1["Pioneering Era (1943-1969)"]
        A1["1943: McCulloch-Pitts
First neuron model"]
        A2["1958: Rosenblatt
Perceptron"]
        A3["1969: Minsky & Papert
Perceptrons (book)
(showed the XOR limitation)"]
        A1 --> A2 --> A3
    end
    
    subgraph ERA2["Revival Era (1980-1995)"]
        B1["1986: Rumelhart et al.
Backpropagation"]
        B2["1989: LeCun
CNN for digit recognition"]
        B3["1995: Vapnik
Support Vector Machines"]
        B1 --> B2 --> B3
    end
    
    subgraph ERA3["Deep Learning Era (2006-present)"]
        C1["2006: Hinton
Deep Belief Networks"]
        C2["2012: AlexNet
Wins ImageNet"]
        C3["2014: GANs
by Goodfellow"]
        C4["2017: Transformer
Attention Is All You Need"]
        C5["2022+: Large Language Models
GPT, Claude, etc."]
        C1 --> C2 --> C3 --> C4 --> C5
    end
    
    ERA1 --> ERA2 --> ERA3
    
    style ERA1 fill:#282828,stroke:#d65d0e,color:#ebdbb2
    style ERA2 fill:#282828,stroke:#458588,color:#ebdbb2
    style ERA3 fill:#282828,stroke:#b16286,color:#ebdbb2
    style A1 fill:#d65d0e,stroke:#fe8019,color:#282828
    style A2 fill:#d65d0e,stroke:#fe8019,color:#282828
    style A3 fill:#cc241d,stroke:#fb4934,color:#ebdbb2
    style B1 fill:#458588,stroke:#83a598,color:#ebdbb2
    style B2 fill:#458588,stroke:#83a598,color:#ebdbb2
    style B3 fill:#458588,stroke:#83a598,color:#ebdbb2
    style C1 fill:#b16286,stroke:#d3869b,color:#ebdbb2
    style C2 fill:#b16286,stroke:#d3869b,color:#ebdbb2
    style C3 fill:#b16286,stroke:#d3869b,color:#ebdbb2
    style C4 fill:#b16286,stroke:#d3869b,color:#ebdbb2
    style C5 fill:#98971a,stroke:#b8bb26,color:#282828

1.3 Basic Components of an Artificial Neuron

An artificial neuron (the perceptron is the classic example) has four main components:

Inputs: the data values fed into the neuron, written x₁, x₂, ..., xₙ
Weights: learnable values that set the importance of each input, written w₁, w₂, ..., wₙ
Bias: a learnable offset added to the weighted sum, which shifts the activation threshold, written b
Activation function: the function that maps the weighted sum to the output, written f

Basic equations of an artificial neuron:

z = \sum_{i = 1}^{n} w_{i} x_{i} + b = w^{T} x + b

y = f (z) = f (w^{T} x + b)

Where:

z: weighted sum, also called the pre-activation
xᵢ: the i-th input
wᵢ: the weight of the i-th input
b: bias
f: activation function
y: output of the neuron
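
To make the two equations concrete, here is a minimal NumPy sketch of a single neuron. The helper name `neuron_output` and the sample values are assumptions made for this illustration, and the sigmoid is only one possible choice of f.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, used here as one possible activation f."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(x, w, b, f=sigmoid):
    """Compute y = f(w^T x + b) for a single neuron (hypothetical helper)."""
    z = np.dot(w, x) + b   # weighted sum (pre-activation)
    return f(z)

# Illustrative values, assumed for this sketch
x = np.array([1.0, 2.0, 3.0])
w = np.array([0.2, -0.1, 0.4])
b = 0.5

z = np.dot(w, x) + b
y = neuron_output(x, w, b)
print(f"z = {z:.2f}, y = {y:.4f}")
```

Swapping `f` for another function (for example ReLU) changes only the last step; the weighted sum is computed the same way.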

2. Multi-Layer Perceptron (MLP)

2.1 Structure of an MLP

A Multi-Layer Perceptron (MLP), a type of feedforward neural network, is made up of several layers. Data flows in one direction only, from the input layer to the output layer, with no feedback loops.

Layers in an MLP:

Input layer: receives data from outside the network
Hidden layers: compute intermediate representations (there may be one or more)
Output layer: produces the final result

graph LR
    subgraph INPUT["Input Layer"]
        I1((x₁))
        I2((x₂))
        I3((x₃))
    end
    
    subgraph HIDDEN1["Hidden Layer 1"]
        H11((h₁⁽¹⁾))
        H12((h₂⁽¹⁾))
        H13((h₃⁽¹⁾))
        H14((h₄⁽¹⁾))
    end
    
    subgraph HIDDEN2["Hidden Layer 2"]
        H21((h₁⁽²⁾))
        H22((h₂⁽²⁾))
        H23((h₃⁽²⁾))
    end
    
    subgraph OUTPUT["Output Layer"]
        O1((ŷ₁))
        O2((ŷ₂))
    end
    
    I1 --> H11 & H12 & H13 & H14
    I2 --> H11 & H12 & H13 & H14
    I3 --> H11 & H12 & H13 & H14
    
    H11 --> H21 & H22 & H23
    H12 --> H21 & H22 & H23
    H13 --> H21 & H22 & H23
    H14 --> H21 & H22 & H23
    
    H21 --> O1 & O2
    H22 --> O1 & O2
    H23 --> O1 & O2
    
    style INPUT fill:#282828,stroke:#458588,color:#ebdbb2
    style HIDDEN1 fill:#282828,stroke:#b16286,color:#ebdbb2
    style HIDDEN2 fill:#282828,stroke:#d65d0e,color:#ebdbb2
    style OUTPUT fill:#282828,stroke:#98971a,color:#ebdbb2
    style I1 fill:#458588,stroke:#83a598,color:#ebdbb2
    style I2 fill:#458588,stroke:#83a598,color:#ebdbb2
    style I3 fill:#458588,stroke:#83a598,color:#ebdbb2
    style H11 fill:#b16286,stroke:#d3869b,color:#ebdbb2
    style H12 fill:#b16286,stroke:#d3869b,color:#ebdbb2
    style H13 fill:#b16286,stroke:#d3869b,color:#ebdbb2
    style H14 fill:#b16286,stroke:#d3869b,color:#ebdbb2
    style H21 fill:#d65d0e,stroke:#fe8019,color:#282828
    style H22 fill:#d65d0e,stroke:#fe8019,color:#282828
    style H23 fill:#d65d0e,stroke:#fe8019,color:#282828
    style O1 fill:#98971a,stroke:#b8bb26,color:#282828
    style O2 fill:#98971a,stroke:#b8bb26,color:#282828
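
As a rough check on the size of the network drawn above (3 inputs, hidden layers of 4 and 3 nodes, 2 outputs), the sketch below counts its trainable parameters. It assumes a fully connected network in which each layer has a weight matrix of shape (nodes in this layer, nodes in the previous layer) plus one bias per node; `count_parameters` is a hypothetical helper name.

```python
def count_parameters(layer_sizes):
    """Return (weight_shapes, total_parameters) for a fully connected MLP."""
    shapes = []
    total = 0
    for n_prev, n_curr in zip(layer_sizes[:-1], layer_sizes[1:]):
        shapes.append((n_curr, n_prev))      # weight matrix of this layer
        total += n_curr * n_prev + n_curr    # weights plus one bias per node
    return shapes, total

shapes, total = count_parameters([3, 4, 3, 2])
print(shapes)  # [(4, 3), (3, 4), (2, 3)]
print(total)   # 39
```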

2.2 Forward Propagation

Forward propagation computes the network's output for a given input, one layer at a time from the input layer to the output layer.

Equations for layer l:

z^{(l)} = W^{(l)} a^{(l - 1)} + b^{(l)}

a^{(l)} = f (z^{(l)})

Where:

z⁽ˡ⁾: pre-activation of layer l
W⁽ˡ⁾: weight matrix of layer l (size nₗ × nₗ₋₁)
a⁽ˡ⁻¹⁾: activation of the previous layer (a⁽⁰⁾ = x is the input)
b⁽ˡ⁾: bias vector of layer l
f: activation function (applied element-wise)
a⁽ˡ⁾: activation of layer l
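
The layer recurrence can be sketched as a short loop. This is a simplified illustration, not a full implementation: it assumes the same activation (ReLU) in every layer, column-vector inputs, and randomly drawn weights; the names `forward`, `sizes`, and `acts` are chosen for this example only.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def forward(x, weights, biases, f=relu):
    """Apply a = f(W a_prev + b) layer by layer and return every activation."""
    activations = [x]          # a^(0) = x
    a = x
    for W, b in zip(weights, biases):
        z = W @ a + b          # pre-activation of the current layer
        a = f(z)               # element-wise activation
        activations.append(a)
    return activations

rng = np.random.default_rng(0)
sizes = [3, 4, 2]              # assumed layer sizes for the demo
weights = [rng.standard_normal((n, m)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros((n, 1)) for n in sizes[1:]]
x = rng.standard_normal((3, 1))

acts = forward(x, weights, biases)
print([a.shape for a in acts])  # [(3, 1), (4, 1), (2, 1)]
```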

2.3 Worked Example of Forward Propagation

Consider a small MLP:

Input layer: 2 nodes
Hidden layer: 2 nodes (ReLU)
Output layer: 1 node (sigmoid)

Given:

Input: x = [1, 2]ᵀ

Hidden-layer weights: W⁽¹⁾ = [[0.5, 0.3], [0.2, 0.4]]

Hidden-layer bias: b⁽¹⁾ = [0.1, 0.1]ᵀ

Output-layer weights: W⁽²⁾ = [[0.6, 0.5]]

Output-layer bias: b⁽²⁾ = [0.2]

Calculation steps:

Step 1: Compute the hidden layer

z^{(1)} = W^{(1)} x + b^{(1)}

z₁⁽¹⁾ = (0.5 × 1) + (0.3 × 2) + 0.1 = 0.5 + 0.6 + 0.1 = 1.2

z₂⁽¹⁾ = (0.2 × 1) + (0.4 × 2) + 0.1 = 0.2 + 0.8 + 0.1 = 1.1

Apply ReLU: a⁽¹⁾ = ReLU(z⁽¹⁾) = max(0, z⁽¹⁾)

a₁⁽¹⁾ = max(0, 1.2) = 1.2

a₂⁽¹⁾ = max(0, 1.1) = 1.1

Step 2: Compute the output layer

z^{(2)} = W^{(2)} a^{(1)} + b^{(2)}

z⁽²⁾ = (0.6 × 1.2) + (0.5 × 1.1) + 0.2 = 0.72 + 0.55 + 0.2 = 1.47

Apply sigmoid: ŷ = σ(z⁽²⁾) = 1 / (1 + e⁻ᶻ)

ŷ = 1 / (1 + e⁻¹·⁴⁷) ≈ 1 / (1 + 0.230) = 1 / 1.230 ≈ 0.813
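
The hand calculation can be re-checked with a few lines of NumPy. The script below uses exactly the weights, biases, and input given in this example; only the variable names are chosen here.

```python
import numpy as np

# Values taken from the worked example
x = np.array([[1.0], [2.0]])
W1 = np.array([[0.5, 0.3], [0.2, 0.4]])
b1 = np.array([[0.1], [0.1]])
W2 = np.array([[0.6, 0.5]])
b2 = np.array([[0.2]])

z1 = W1 @ x + b1                      # hidden pre-activations
a1 = np.maximum(0.0, z1)              # ReLU
z2 = W2 @ a1 + b2                     # output pre-activation
y_hat = 1.0 / (1.0 + np.exp(-z2))     # sigmoid

print("z1 =", z1.ravel())
print("z2 =", round(z2.item(), 4))
print("y_hat =", round(y_hat.item(), 3))
```

Carrying full precision through the sigmoid avoids the small rounding drift that creeps in when intermediate values are truncated by hand.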

2.4 Code Implementation: MLP

import numpy as np

class MultiLayerPerceptron:
    """
    Multi-Layer Perceptron (MLP)

    Builds an MLP with a configurable number of layers and nodes.
    Supports training with backpropagation and gradient descent.
    """

    def __init__(self, layer_sizes, activation='relu', learning_rate=0.01):
        """
        Initialize the MLP.

        Parameters:
        -----------
        layer_sizes : list
            Number of nodes in each layer, e.g. [2, 4, 3, 1]
            means 2 inputs, 2 hidden layers (4 and 3 nodes), 1 output
        activation : str
            Hidden-layer activation function ('relu', 'sigmoid', 'tanh')
        learning_rate : float
            Learning rate
        """
        self.layer_sizes = layer_sizes
        self.num_layers = len(layer_sizes)
        self.learning_rate = learning_rate
        self.activation_name = activation

        # Initialize weights and biases
        self.weights = []
        self.biases = []

        for i in range(1, self.num_layers):
            # Scale the initial weights by the fan-in of the layer
            if activation == 'relu':
                # He initialization for ReLU (variance 2 / fan_in)
                w = np.random.randn(layer_sizes[i], layer_sizes[i-1]) * np.sqrt(2.0 / layer_sizes[i-1])
            else:
                # Xavier-style initialization for sigmoid/tanh (variance 1 / fan_in)
                w = np.random.randn(layer_sizes[i], layer_sizes[i-1]) * np.sqrt(1.0 / layer_sizes[i-1])

            b = np.zeros((layer_sizes[i], 1))

            self.weights.append(w)
            self.biases.append(b)
    
    def activation(self, z, derivative=False):
        """
        Evaluate the hidden-layer activation function.

        Parameters:
        -----------
        z : numpy.ndarray
            Pre-activation values
        derivative : bool
            If True, return the derivative instead

        Returns:
        --------
        numpy.ndarray
            Activation values (or their derivative)
        """
        if self.activation_name == 'relu':
            if derivative:
                return np.where(z > 0, 1, 0)
            return np.maximum(0, z)

        elif self.activation_name == 'sigmoid':
            sig = 1 / (1 + np.exp(-np.clip(z, -500, 500)))
            if derivative:
                return sig * (1 - sig)
            return sig

        elif self.activation_name == 'tanh':
            t = np.tanh(z)
            if derivative:
                return 1 - t**2
            return t

        # Fail loudly instead of silently returning None
        raise ValueError(f"Unknown activation: {self.activation_name}")
    
    def forward(self, X):
        """
        Run forward propagation.

        Parameters:
        -----------
        X : numpy.ndarray
            Input data (features × samples)

        Returns:
        --------
        numpy.ndarray
            Predicted probabilities
        """
        self.activations = [X]  # activations of every layer
        self.z_values = []      # pre-activations of every layer

        a = X
        for i in range(len(self.weights) - 1):
            # Pre-activation of hidden layer i
            z = np.dot(self.weights[i], a) + self.biases[i]
            self.z_values.append(z)

            # Apply the hidden-layer activation function
            a = self.activation(z)
            self.activations.append(a)

        # Output layer (always sigmoid, for binary classification)
        z = np.dot(self.weights[-1], a) + self.biases[-1]
        self.z_values.append(z)

        # Sigmoid on the output layer; clipping avoids overflow in exp
        a = 1 / (1 + np.exp(-np.clip(z, -500, 500)))
        self.activations.append(a)

        return a
    
    def compute_loss(self, y_true, y_pred):
        """
        Compute the binary cross-entropy loss.

        Parameters:
        -----------
        y_true : numpy.ndarray
            True labels
        y_pred : numpy.ndarray
            Predicted probabilities

        Returns:
        --------
        float
            Mean loss over all samples
        """
        # Clip predictions away from 0 and 1 to avoid log(0)
        epsilon = 1e-15
        y_pred = np.clip(y_pred, epsilon, 1 - epsilon)

        loss = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
        return loss
    
    def backward(self, X, y_true):
        """
        Run backpropagation. Must be called after forward().

        Parameters:
        -----------
        X : numpy.ndarray
            Input data
        y_true : numpy.ndarray
            True labels
        """
        m = X.shape[1]  # number of samples

        # Gradients, stored from the first layer to the last
        self.dW = []
        self.db = []

        # Output-layer gradient
        dz = self.activations[-1] - y_true  # derivative of cross-entropy + sigmoid

        dW = np.dot(dz, self.activations[-2].T) / m
        db = np.sum(dz, axis=1, keepdims=True) / m

        self.dW.insert(0, dW)
        self.db.insert(0, db)

        # Backpropagate through the hidden layers
        for i in range(len(self.weights) - 2, -1, -1):
            dz = np.dot(self.weights[i+1].T, dz) * self.activation(self.z_values[i], derivative=True)

            dW = np.dot(dz, self.activations[i].T) / m
            db = np.sum(dz, axis=1, keepdims=True) / m

            self.dW.insert(0, dW)
            self.db.insert(0, db)
    
    def update_parameters(self):
        """
        Update the weights and biases with gradient descent.
        """
        for i in range(len(self.weights)):
            self.weights[i] -= self.learning_rate * self.dW[i]
            self.biases[i] -= self.learning_rate * self.db[i]
    
    def train(self, X, y, epochs=1000, verbose=True):
        """
        Train the model with full-batch gradient descent.

        Parameters:
        -----------
        X : numpy.ndarray
            Input data (features × samples)
        y : numpy.ndarray
            Targets (1 × samples)
        epochs : int
            Number of training epochs
        verbose : bool
            Print progress every 100 epochs
        """
        history = {'loss': []}

        for epoch in range(epochs):
            # Forward propagation
            y_pred = self.forward(X)

            # Compute the loss
            loss = self.compute_loss(y, y_pred)
            history['loss'].append(loss)

            # Backward propagation
            self.backward(X, y)

            # Update the parameters
            self.update_parameters()

            if verbose and epoch % 100 == 0:
                print(f"Epoch {epoch}, Loss: {loss:.6f}")

        return history
    
    def predict(self, X, threshold=0.5):
        """
        Predict class labels.

        Parameters:
        -----------
        X : numpy.ndarray
            Input data
        threshold : float
            Decision threshold for classification

        Returns:
        --------
        numpy.ndarray
            Predicted labels (0 or 1)
        """
        y_pred = self.forward(X)
        return (y_pred >= threshold).astype(int)


# Usage example
if __name__ == "__main__":
    # Build a toy dataset (the XOR problem)
    np.random.seed(42)

    X = np.array([[0, 0, 1, 1],
                  [0, 1, 0, 1]])
    y = np.array([[0, 1, 1, 0]])  # XOR output

    # Build and train the MLP
    # Architecture: 2 inputs -> 1 hidden layer with 4 nodes -> 1 output
    mlp = MultiLayerPerceptron(
        layer_sizes=[2, 4, 1],
        activation='relu',
        learning_rate=0.5
    )

    print("=== Training the MLP on the XOR problem ===")
    history = mlp.train(X, y, epochs=1000, verbose=True)

    # Check the results
    print("\n=== Predictions ===")
    predictions = mlp.forward(X)
    for i in range(X.shape[1]):
        print(f"Input: [{X[0,i]}, {X[1,i]}] -> Prediction: {predictions[0,i]:.4f} -> Class: {int(predictions[0,i] >= 0.5)}")

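Example output (illustrative; the exact numbers depend on the random initialization, and a ReLU network this small can occasionally fail to learn XOR):

=== Training the MLP on the XOR problem ===
Epoch 0, Loss: 0.693147
Epoch 100, Loss: 0.234521
Epoch 200, Loss: 0.089234
...
Epoch 900, Loss: 0.012345

=== Predictions ===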
Input: [0, 0] -> Prediction: 0.0234 -> Class: 0
Input: [0, 1] -> Prediction: 0.9821 -> Class: 1
Input: [1, 0] -> Prediction: 0.9756 -> Class: 1
Input: [1, 1] -> Prediction: 0.0312 -> Class: 0

2.5 Using MLPs with Scikit-learn

Scikit-learn provides ready-to-use MLPClassifier and MLPRegressor estimators for classification and regression, respectively.

import numpy as np
from sklearn.neural_network import MLPClassifier, MLPRegressor
from sklearn.datasets import load_iris, load_diabetes, fetch_openml
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import (
    accuracy_score, classification_report, confusion_matrix,
    mean_squared_error, r2_score
)
import matplotlib.pyplot as plt

# ============================================================
# Example 1: MLPClassifier for classification
# ============================================================

def mlp_classification_example():
    """
    Example: classify Iris flowers with MLPClassifier.

    Dataset: Iris (150 samples, 4 features, 3 classes)
    """
    print("=" * 60)
    print("MLPClassifier example: Iris flower classification")
    print("=" * 60)

    # Load the Iris data
    iris = load_iris()
    X, y = iris.data, iris.target

    print(f"\nIris data:")
    print(f"  - Samples: {X.shape[0]}")
    print(f"  - Features: {X.shape[1]}")
    print(f"  - Feature names: {iris.feature_names}")
    print(f"  - Classes: {iris.target_names}")

    # Train/test split (80/20)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )

    # Standardization (important for neural networks);
    # fit on the training set only to avoid leaking test statistics
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)

    print(f"\nData split:")
    print(f"  - Training set: {X_train.shape[0]} samples")
    print(f"  - Test set: {X_test.shape[0]} samples")

    # Build the MLPClassifier
    mlp = MLPClassifier(
        hidden_layer_sizes=(64, 32),    # 2 hidden layers: 64 and 32 neurons
        activation='relu',               # hidden-layer activation
        solver='adam',                   # optimizer
        alpha=0.001,                     # L2 regularization strength
        batch_size='auto',               # mini-batch size
        learning_rate='adaptive',        # only used when solver='sgd'; no effect with 'adam'
        learning_rate_init=0.001,        # initial learning rate
        max_iter=500,                    # maximum number of epochs
        early_stopping=True,             # stop when the validation score stops improving
        validation_fraction=0.1,         # 10% of the training data held out for validation
        n_iter_no_change=20,             # epochs to wait before stopping
        random_state=42,
        verbose=False
    )

    print("\nMLP architecture:")
    print(f"  - Input Layer: {X_train.shape[1]} neurons")
    print(f"  - Hidden Layers: {mlp.hidden_layer_sizes}")
    print(f"  - Output Layer: {len(np.unique(y))} neurons")
    print(f"  - Activation: {mlp.activation}")
    print(f"  - Optimizer: {mlp.solver}")

    # Train the model
    print("\nTraining the model...")
    mlp.fit(X_train_scaled, y_train)

    print(f"\nTraining summary:")
    print(f"  - Iterations: {mlp.n_iter_}")
    print(f"  - Final Loss: {mlp.loss_:.6f}")
    print(f"  - Layers: {mlp.n_layers_}")

    # Predict and evaluate
    y_pred = mlp.predict(X_test_scaled)
    y_pred_proba = mlp.predict_proba(X_test_scaled)

    accuracy = accuracy_score(y_test, y_pred)

    print(f"\n{'='*40}")
    print("Evaluation (Test Set)")
    print(f"{'='*40}")
    print(f"Accuracy: {accuracy:.4f} ({accuracy*100:.2f}%)")

    print("\nClassification Report:")
    print(classification_report(y_test, y_pred, target_names=iris.target_names))

    print("Confusion Matrix:")
    cm = confusion_matrix(y_test, y_pred)
    print(cm)

    # Show a few sample predictions
    print("\nSample predictions (first 5 test samples):")
    print("-" * 60)
    for i in range(5):
        true_class = iris.target_names[y_test[i]]
        pred_class = iris.target_names[y_pred[i]]
        probs = y_pred_proba[i]
        print(f"  Sample {i+1}: true={true_class:12s} | predicted={pred_class:12s}")
        print(f"             Probabilities: {probs}")
    
    # Plot Loss Curve
    plt.figure(figsize=(10, 4))
    
    plt.subplot(1, 2, 1)
    plt.plot(mlp.loss_curve_, color='#458588', linewidth=2)
    plt.xlabel('Iterations')
    plt.ylabel('Loss')
    plt.title('Training Loss Curve')
    plt.grid(True, alpha=0.3)
    
    # Plot Confusion Matrix
    plt.subplot(1, 2, 2)
    im = plt.imshow(cm, cmap='Blues')
    plt.colorbar(im)
    plt.xlabel('Predicted')
    plt.ylabel('Actual')
    plt.title('Confusion Matrix')
    
    # Annotate each heatmap cell with its count
    for i in range(cm.shape[0]):
        for j in range(cm.shape[1]):
            plt.text(j, i, cm[i, j], ha='center', va='center', 
                    color='white' if cm[i, j] > cm.max()/2 else 'black')
    
    plt.tight_layout()
    plt.savefig('mlp_classification_results.png', dpi=150, bbox_inches='tight')
    plt.close()
    
    return mlp, accuracy


# ============================================================
# Example 2: MLPRegressor for regression
# ============================================================

def mlp_regression_example():
    """
    Example: predict diabetes disease progression with MLPRegressor.

    Dataset: Diabetes (442 samples, 10 features)
    """
    print("\n" + "=" * 60)
    print("MLPRegressor example: predicting diabetes progression")
    print("=" * 60)

    # Load the Diabetes data
    diabetes = load_diabetes()
    X, y = diabetes.data, diabetes.target

    print(f"\nDiabetes data:")
    print(f"  - Samples: {X.shape[0]}")
    print(f"  - Features: {X.shape[1]}")
    print(f"  - Target range: [{y.min():.1f}, {y.max():.1f}]")

    # Train/test split
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    # Standardization
    scaler_X = StandardScaler()
    scaler_y = StandardScaler()

    X_train_scaled = scaler_X.fit_transform(X_train)
    X_test_scaled = scaler_X.transform(X_test)

    # Scale the target as well (usually helps convergence)
    y_train_scaled = scaler_y.fit_transform(y_train.reshape(-1, 1)).ravel()

    # Build the MLPRegressor
    mlp_reg = MLPRegressor(
        hidden_layer_sizes=(100, 50, 25),  # 3 hidden layers
        activation='relu',
        solver='adam',
        alpha=0.01,                         # L2 regularization strength
        batch_size=32,
        learning_rate='adaptive',           # only used when solver='sgd'; no effect with 'adam'
        learning_rate_init=0.001,
        max_iter=1000,
        early_stopping=True,
        validation_fraction=0.1,
        n_iter_no_change=30,
        random_state=42,
        verbose=False
    )

    print(f"\nMLP architecture:")
    print(f"  - Hidden Layers: {mlp_reg.hidden_layer_sizes}")

    # Train the model
    print("\nTraining the model...")
    mlp_reg.fit(X_train_scaled, y_train_scaled)

    print(f"\nTraining summary:")
    print(f"  - Iterations: {mlp_reg.n_iter_}")
    print(f"  - Final Loss: {mlp_reg.loss_:.6f}")

    # Predict, then map predictions back to the original target scale
    y_pred_scaled = mlp_reg.predict(X_test_scaled)
    y_pred = scaler_y.inverse_transform(y_pred_scaled.reshape(-1, 1)).ravel()

    # Evaluate
    mse = mean_squared_error(y_test, y_pred)
    rmse = np.sqrt(mse)
    r2 = r2_score(y_test, y_pred)

    print(f"\n{'='*40}")
    print("Evaluation (Test Set)")
    print(f"{'='*40}")
    print(f"  MSE:  {mse:.4f}")
    print(f"  RMSE: {rmse:.4f}")
    print(f"  R²:   {r2:.4f}")
    
    # Plot results
    plt.figure(figsize=(12, 4))
    
    # Loss curve
    plt.subplot(1, 3, 1)
    plt.plot(mlp_reg.loss_curve_, color='#cc241d', linewidth=2)
    plt.xlabel('Iterations')
    plt.ylabel('Loss')
    plt.title('Training Loss Curve')
    plt.grid(True, alpha=0.3)
    
    # Actual vs Predicted
    plt.subplot(1, 3, 2)
    plt.scatter(y_test, y_pred, alpha=0.6, color='#458588')
    plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 
             'r--', linewidth=2, label='Perfect prediction')
    plt.xlabel('Actual Values')
    plt.ylabel('Predicted Values')
    plt.title(f'Actual vs Predicted (R² = {r2:.3f})')
    plt.legend()
    plt.grid(True, alpha=0.3)
    
    # Residuals
    plt.subplot(1, 3, 3)
    residuals = y_test - y_pred
    plt.scatter(y_pred, residuals, alpha=0.6, color='#98971a')
    plt.axhline(y=0, color='r', linestyle='--', linewidth=2)
    plt.xlabel('Predicted Values')
    plt.ylabel('Residuals')
    plt.title('Residual Plot')
    plt.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.savefig('mlp_regression_results.png', dpi=150, bbox_inches='tight')
    plt.close()
    
    return mlp_reg, r2


# ============================================================
# Example 3: Hyperparameter tuning with GridSearchCV
# ============================================================

def mlp_hyperparameter_tuning():
    """
    Example: search for good hyperparameters with GridSearchCV.
    """
    print("\n" + "=" * 60)
    print("Hyperparameter tuning example with GridSearchCV")
    print("=" * 60)

    # Load and prepare the data
    iris = load_iris()
    X, y = iris.data, iris.target

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )

    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)

    # Define the parameter grid (4 x 2 x 3 x 2 = 48 combinations)
    param_grid = {
        'hidden_layer_sizes': [(32,), (64,), (32, 16), (64, 32)],
        'activation': ['relu', 'tanh'],
        'alpha': [0.0001, 0.001, 0.01],
        'learning_rate_init': [0.001, 0.01]
    }

    print("\nParameter Grid:")
    for key, values in param_grid.items():
        print(f"  - {key}: {values}")

    # Build the base model
    mlp = MLPClassifier(
        solver='adam',
        max_iter=500,
        early_stopping=True,
        random_state=42,
        verbose=False
    )

    # GridSearchCV
    print("\nSearching hyperparameters (this may take a while)...")
    grid_search = GridSearchCV(
        mlp, 
        param_grid, 
        cv=5,                    # 5-Fold Cross Validation
        scoring='accuracy',
        n_jobs=-1,               # use all CPU cores
        verbose=0
    )

    grid_search.fit(X_train_scaled, y_train)

    print(f"\nSearch results:")
    print(f"  Best Parameters: {grid_search.best_params_}")
    print(f"  Best CV Score: {grid_search.best_score_:.4f}")

    # Evaluate the best model on the held-out test set
    best_model = grid_search.best_estimator_
    test_accuracy = best_model.score(X_test_scaled, y_test)
    print(f"  Test Accuracy: {test_accuracy:.4f}")

    # Show the five best-ranked combinations
    print("\nTop 5 Parameter Combinations:")
    print("-" * 60)
    
    results = grid_search.cv_results_
    indices = np.argsort(results['rank_test_score'])[:5]
    
    for i, idx in enumerate(indices, 1):
        print(f"  {i}. Score: {results['mean_test_score'][idx]:.4f} "
              f"(±{results['std_test_score'][idx]:.4f})")
        print(f"     Params: {results['params'][idx]}")
    
    return grid_search.best_estimator_


# ============================================================
# Example 4: MNIST digit classification
# ============================================================

def mlp_mnist_example():
    """
    Example: classify MNIST handwritten digits with MLPClassifier.

    Dataset: MNIST (70,000 samples, 784 features, 10 classes)
    """
    print("\n" + "=" * 60)
    print("MLPClassifier example: MNIST digit classification")
    print("=" * 60)

    # Load MNIST (downloaded from OpenML on first use)
    print("\nLoading MNIST...")
    mnist = fetch_openml('mnist_784', version=1, as_frame=False, parser='auto')
    X, y = mnist.data, mnist.target.astype(int)

    # Use a 10,000-sample subset for speed; a seeded generator keeps it reproducible
    n_samples = 10000
    rng = np.random.default_rng(42)
    indices = rng.choice(len(X), n_samples, replace=False)
    X = X[indices]
    y = y[indices]

    print(f"\nMNIST data (subset):")
    print(f"  - Samples: {X.shape[0]}")
    print(f"  - Image size: 28x28 = {X.shape[1]} pixels")
    print(f"  - Classes: 0-9")

    # Train/test split
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )

    # Normalize (0-255 -> 0-1)
    X_train = X_train / 255.0
    X_test = X_test / 255.0

    # Build the MLPClassifier
    mlp = MLPClassifier(
        hidden_layer_sizes=(256, 128, 64),  # 3 hidden layers
        activation='relu',
        solver='adam',
        alpha=0.0001,
        batch_size=128,
        learning_rate_init=0.001,
        max_iter=50,
        early_stopping=True,
        validation_fraction=0.1,
        random_state=42,
        verbose=True
    )

    print("\nMLP architecture:")
    print(f"  - Input: 784 neurons")
    print(f"  - Hidden: {mlp.hidden_layer_sizes}")
    print(f"  - Output: 10 neurons")

    print("\nTraining the model...")
    mlp.fit(X_train, y_train)

    # Evaluate
    y_pred = mlp.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)

    print(f"\n{'='*40}")
    print("Evaluation")
    print(f"{'='*40}")
    print(f"  Accuracy: {accuracy:.4f} ({accuracy*100:.2f}%)")
    print(f"  Iterations: {mlp.n_iter_}")

    # Show the confusion matrix
    print("\nConfusion Matrix:")
    cm = confusion_matrix(y_test, y_pred)
    print(cm)

    # Plot sample predictions
    fig, axes = plt.subplots(2, 5, figsize=(12, 5))
    
    for i, ax in enumerate(axes.flat):
        ax.imshow(X_test[i].reshape(28, 28), cmap='gray')
        ax.set_title(f'True: {y_test[i]}, Pred: {y_pred[i]}')
        ax.axis('off')
    
    plt.tight_layout()
    plt.savefig('mlp_mnist_predictions.png', dpi=150, bbox_inches='tight')
    plt.close()
    
    return mlp, accuracy


# ============================================================
# Main: run all examples
# ============================================================

if __name__ == "__main__":
    # Classification example
    mlp_clf, clf_acc = mlp_classification_example()

    # Regression example
    mlp_reg, reg_r2 = mlp_regression_example()

    # Hyperparameter tuning example
    best_mlp = mlp_hyperparameter_tuning()

    # MNIST example (takes longer; uncomment to run)
    # mlp_mnist, mnist_acc = mlp_mnist_example()

    print("\n" + "=" * 60)
    print("Summary of results")
    print("=" * 60)
    print(f"  Classification Accuracy: {clf_acc:.4f}")
    print(f"  Regression R² Score: {reg_r2:.4f}")
    print("\nPlots saved to:")
    print("  - mlp_classification_results.png")
    print("  - mlp_regression_results.png")

Example output (abridged and illustrative; exact numbers depend on the library version and random state):

============================================================
MLPClassifier example: Iris flower classification
============================================================

Iris data:
  - Samples: 150
  - Features: 4
  - Feature names: ['sepal length (cm)', 'sepal width (cm)', ...]
  - Classes: ['setosa' 'versicolor' 'virginica']

Evaluation (Test Set)
========================================
Accuracy: 1.0000 (100.00%)

Classification Report:
              precision    recall  f1-score   support
      setosa       1.00      1.00      1.00        10
  versicolor       1.00      1.00      1.00        10
   virginica       1.00      1.00      1.00        10

============================================================
Hyperparameter tuning example with GridSearchCV
============================================================

Search results:
  Best Parameters: {'activation': 'relu', 'alpha': 0.001, ...}
  Best CV Score: 0.9750
  Test Accuracy: 1.0000

2.6 Universal Approximation Theorem

The Universal Approximation Theorem states that an MLP with a single hidden layer, a sufficient number of hidden nodes, and a suitable non-linear activation function (for example a sigmoid, or more generally any non-polynomial activation) can approximate any continuous function on a closed and bounded (compact) set to any desired accuracy.

Practical implications:

Neural networks have the capacity to represent complex patterns.
The theorem is an existence result: it says neither how many nodes are needed nor how to find suitable weights.
Deep networks (many layers) often perform better than very wide shallow networks.
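The claim can be illustrated numerically. The sketch below is a simplification rather than a trained MLP: the hidden-layer weights are random and fixed, and only the linear output layer is fitted by least squares. The target sin(x), the interval [-π, π], the weight scale 2.0, and the helper name `fit_rmse` are illustrative assumptions. It reports how the approximation error of a single tanh hidden layer changes as hidden units are added.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: a continuous function on the compact interval [-pi, pi]
x = np.linspace(-np.pi, np.pi, 400).reshape(-1, 1)
y = np.sin(x).ravel()

# One hidden layer of tanh units with random, fixed weights and biases
# (assumption: only the output layer is fitted, not the hidden layer)
max_units = 200
W = rng.normal(0.0, 2.0, size=(1, max_units))
b = rng.normal(0.0, 2.0, size=max_units)
H = np.tanh(x @ W + b)  # hidden activations, shape (400, max_units)

def fit_rmse(n_units):
    """RMS error of the least-squares fit using the first n_units hidden units."""
    A = np.hstack([H[:, :n_units], np.ones((len(x), 1))])  # append output bias
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.sqrt(np.mean((A @ coef - y) ** 2)))

errors = {n: fit_rmse(n) for n in (2, 10, 50, 200)}
for n, err in errors.items():
    print(f"{n:4d} hidden units -> RMS error = {err:.2e}")
```

Because each larger model contains the hidden units of the smaller one, the fitted error cannot get worse as units are added (up to numerical precision). How many units a given accuracy requires is exactly what the theorem leaves open.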

3. Backpropagation Algorithm

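3.1 Basic Concepts

Backpropagation (Backward Propagation of Errors) is an algorithm for efficiently computing the gradient of the loss function with respect to every weight in the network. It applies the chain rule backwards, from the output layer toward the input layer, reusing intermediate results so that each layer's error signal is computed only once.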

graph LR
    subgraph FORWARD["Forward Pass"]
        direction LR
        F1["อินพุต x"] --> F2["ชั้นซ่อน h"]
        F2 --> F3["เอาต์พุต ŷ"]
        F3 --> F4["Loss L"]
    end
    
    subgraph BACKWARD["Backward Pass"]
        direction RL
        B1["∂L/∂ŷ"] --> B2["∂L/∂h"]
        B2 --> B3["∂L/∂W"]
        B3 --> B4["อัปเดต W"]
    end
    
    F4 -.-> B1
    
    style FORWARD fill:#282828,stroke:#98971a,color:#ebdbb2
    style BACKWARD fill:#282828,stroke:#cc241d,color:#ebdbb2
    style F1 fill:#458588,stroke:#83a598,color:#ebdbb2
    style F2 fill:#b16286,stroke:#d3869b,color:#ebdbb2
    style F3 fill:#d65d0e,stroke:#fe8019,color:#282828
    style F4 fill:#98971a,stroke:#b8bb26,color:#282828
    style B1 fill:#cc241d,stroke:#fb4934,color:#ebdbb2
    style B2 fill:#d65d0e,stroke:#fe8019,color:#282828
    style B3 fill:#b16286,stroke:#d3869b,color:#ebdbb2
    style B4 fill:#458588,stroke:#83a598,color:#ebdbb2

3.2 Chain Rule and Gradient Flow

The chain rule of calculus states that if y = f(g(x)), then:

\frac{d y}{d x} = \frac{d y}{d g} \cdot \frac{d g}{d x}

Application to neural networks:

For a loss function L that depends on a weight w through several intermediate variables, where z_{i}^{(l)} is the pre-activation (weighted sum) of node i in layer l:

\frac{\partial L}{\partial w_{ij}^{(l)}} = \frac{\partial L}{\partial z_{i}^{(l)}} \cdot \frac{\partial z_{i}^{(l)}}{\partial w_{ij}^{(l)}}

3.3 Backpropagation Equations

Definition of the error signal (δ):

δ_{i}^{(l)} = \frac{\partial L}{\partial z_{i}^{(l)}}

Error signal of the output layer (the superscript (L) denotes the last layer and is distinct from the loss L):

For cross-entropy loss with a sigmoid activation:

δ^{(L)} = a^{(L)} - y

Error signal of a hidden layer:

δ^{(l)} = ({W^{(l + 1)}}^{T} δ^{(l + 1)}) ⊙ f^{'} (z^{(l)})

Variables:

⊙: Hadamard product (element-wise multiplication)
f': derivative of the activation function
a^{(l)}: activation (output) of layer l, with a^{(0)} = x

Gradients of the weights and biases (for a single training example):

\frac{\partial L}{\partial W^{(l)}} = δ^{(l)} {a^{(l - 1)}}^{T}

\frac{\partial L}{\partial b^{(l)}} = δ^{(l)}
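A minimal NumPy sketch of these equations for a network with one hidden layer, assuming sigmoid activations and binary cross-entropy loss. The layer sizes, the random parameters, and the function names `forward`, `backward`, and `bce_loss` are illustrative assumptions. One analytic gradient entry is compared with a finite-difference estimate as a sanity check.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(params, x):
    """Forward pass for one column vector x; returns all intermediate values."""
    W1, b1, W2, b2 = params
    z1 = W1 @ x + b1
    a1 = sigmoid(z1)
    z2 = W2 @ a1 + b2
    a2 = sigmoid(z2)
    return z1, a1, z2, a2

def bce_loss(a2, y):
    """Binary cross-entropy, summed over the output units."""
    return float(-np.sum(y * np.log(a2) + (1 - y) * np.log(1 - a2)))

def backward(params, x, y):
    """Gradients [dW1, db1, dW2, db2] computed from the delta equations."""
    W1, b1, W2, b2 = params
    z1, a1, z2, a2 = forward(params, x)
    delta2 = a2 - y                            # output layer: a - y
    delta1 = (W2.T @ delta2) * a1 * (1 - a1)   # hidden layer: (W^T delta) * f'(z)
    return [delta1 @ x.T, delta1, delta2 @ a1.T, delta2]

rng = np.random.default_rng(0)
params = [rng.normal(size=(3, 2)), rng.normal(size=(3, 1)),
          rng.normal(size=(1, 3)), rng.normal(size=(1, 1))]
x = np.array([[0.5], [-1.0]])
y = np.array([[1.0]])

grads = backward(params, x, y)

# Finite-difference check of dL/dW1[0, 0]
eps = 1e-6
W1_plus = params[0].copy()
W1_plus[0, 0] += eps
W1_minus = params[0].copy()
W1_minus[0, 0] -= eps
loss_plus = bce_loss(forward([W1_plus] + params[1:], x)[3], y)
loss_minus = bce_loss(forward([W1_minus] + params[1:], x)[3], y)
numeric = (loss_plus - loss_minus) / (2 * eps)

print(f"analytic = {grads[0][0, 0]:.8f}, numeric = {numeric:.8f}")
```

Agreement between the two numbers is a quick way to catch sign or transpose mistakes when implementing the equations by hand.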

3.4 Detailed Backpropagation Example

Problem: consider a simple MLP with the following structure

Input: x = 0.5
Layer 1 weight: w₁ = 0.8
Layer 1 bias: b₁ = 0.2
Layer 2 weight: w₂ = 0.6
Layer 2 bias: b₂ = 0.3
Target: y = 1
Sigmoid activation in every layer
MSE loss

Step 1: Forward Pass

Hidden layer:

z₁ = w₁ · x + b₁ = 0.8 × 0.5 + 0.2 = 0.6
a₁ = σ(z₁) = 1/(1 + e⁻⁰·⁶) = 0.6457

Output layer:

z₂ = w₂ · a₁ + b₂ = 0.6 × 0.6457 + 0.3 = 0.6874
ŷ = σ(z₂) = 1/(1 + e⁻⁰·⁶⁸⁷⁴) = 0.6654

Step 2: Compute the Loss

L = \frac{1}{2} {(y - \hat{y})}^{2} = \frac{1}{2} {(1 - 0.6654)}^{2} = 0.0560

Step 3: Backward Pass

Compute δ₂ (error signal of the output layer):

\frac{\partial L}{\partial \hat{y}} = - (y - \hat{y}) = - (1 - 0.6654) = −0.3346

δ_{2} = \frac{\partial L}{\partial \hat{y}} \cdot σ^{'} (z_{2}) = - 0.3346 \times 0.6654 \times (1 - 0.6654) = −0.0745

Compute the gradient of w₂:

\frac{\partial L}{\partial w_{2}} = δ_{2} \cdot a_{1} = - 0.0745 \times 0.6457 = −0.0481

Compute δ₁ (error signal of the hidden layer):

δ_{1} = (w_{2} \cdot δ_{2}) \cdot σ^{'} (z_{1})

δ_{1} = (0.6 \times - 0.0745) \times 0.6457 \times (1 - 0.6457) = −0.0102

Compute the gradient of w₁:

\frac{\partial L}{\partial w_{1}} = δ_{1} \cdot x = - 0.0102 \times 0.5 = −0.0051

Step 4: Update the Weights (Learning Rate = 0.1)

w_{2}^{n e w} = w_{2} - η \cdot \frac{\partial L}{\partial w_{2}} = 0.6 - 0.1 \times (- 0.0481) = 0.6048

w_{1}^{n e w} = w_{1} - η \cdot \frac{\partial L}{\partial w_{1}} = 0.8 - 0.1 \times (- 0.0051) = 0.8005
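The same steps can be replayed in a few lines of Python as a check on the arithmetic. This is a sketch using only the standard library; the variable names are illustrative. Values are computed at full precision, so the last printed digit can differ from figures rounded by hand at each step.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Network and data from the example
x, y = 0.5, 1.0
w1, b1 = 0.8, 0.2
w2, b2 = 0.6, 0.3
eta = 0.1  # learning rate

# Step 1: forward pass
z1 = w1 * x + b1
a1 = sigmoid(z1)
z2 = w2 * a1 + b2
y_hat = sigmoid(z2)

# Step 2: loss (half squared error)
loss = 0.5 * (y - y_hat) ** 2

# Step 3: backward pass
dL_dyhat = -(y - y_hat)
delta2 = dL_dyhat * y_hat * (1 - y_hat)   # sigma'(z2) = y_hat * (1 - y_hat)
grad_w2 = delta2 * a1
delta1 = w2 * delta2 * a1 * (1 - a1)      # sigma'(z1) = a1 * (1 - a1)
grad_w1 = delta1 * x

# Step 4: gradient descent update
w2_new = w2 - eta * grad_w2
w1_new = w1 - eta * grad_w1

for name, value in [("a1", a1), ("y_hat", y_hat), ("loss", loss),
                    ("delta2", delta2), ("grad_w2", grad_w2),
                    ("delta1", delta1), ("grad_w1", grad_w1),
                    ("w2_new", w2_new), ("w1_new", w1_new)]:
    print(f"{name:8s} = {value:.4f}")
```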

3.5 Gradient Descent and Optimization

Gradient descent updates the weights in the direction that reduces the loss:

θ \leftarrow θ - η \cdot \nabla_{θ} L (θ)

Variables:

θ: all parameters (weights and biases)
η: learning rate
∇θL: gradient of the loss with respect to the parameters

Types of gradient descent:

Type	Batch Size	Advantages	Disadvantages
Batch GD	Entire dataset	Stable gradient; converges smoothly when the learning rate is small enough	Very slow per update; high memory use
Stochastic GD	1 example	Fast updates; noise can help escape local minima	Very noisy gradient
Mini-batch GD	32-256 examples	Balances the two extremes	Batch size must be chosen
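The three variants differ only in how many examples feed each update, which a single function can express. The sketch below uses a noise-free linear regression problem so that every variant can reach the true parameters; the data, the learning rate 0.1, the epoch count, and the name `minibatch_gd` are illustrative assumptions.

```python
import numpy as np

def minibatch_gd(X, y, batch_size, lr=0.1, epochs=200, seed=0):
    """Gradient descent on 0.5 * mean squared error with a given batch size."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(epochs):
        order = rng.permutation(n)              # reshuffle every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            residual = X[idx] @ theta - y[idx]
            grad = X[idx].T @ residual / len(idx)
            theta -= lr * grad                  # theta <- theta - eta * grad
    return theta

rng = np.random.default_rng(42)
X = rng.normal(size=(256, 2))
true_theta = np.array([2.0, -3.0])
y = X @ true_theta                              # noise-free targets

theta_batch = minibatch_gd(X, y, batch_size=len(X))  # Batch GD
theta_sgd = minibatch_gd(X, y, batch_size=1)         # Stochastic GD
theta_mini = minibatch_gd(X, y, batch_size=32)       # Mini-batch GD

print("batch:     ", theta_batch)
print("stochastic:", theta_sgd)
print("mini-batch:", theta_mini)
```

With noisy targets the stochastic and mini-batch variants would hover around the optimum instead of settling on it exactly, unless the learning rate is decayed.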

3.6 Advanced Optimizers

graph TB
    subgraph OPT["Optimizers"]
        GD["Gradient Descent
พื้นฐาน"]
        MOM["Momentum
เพิ่มโมเมนตัม"]
        NAG["Nesterov
มองไปข้างหน้า"]
        ADA["AdaGrad
ปรับ LR ตามประวัติ"]
        RMS["RMSprop
แก้ปัญหา AdaGrad"]
        ADAM["Adam
รวม Momentum + RMSprop"]
    end
    
    GD --> MOM --> NAG
    GD --> ADA --> RMS
    MOM --> ADAM
    RMS --> ADAM
    
    style OPT fill:#282828,stroke:#458588,color:#ebdbb2
    style GD fill:#cc241d,stroke:#fb4934,color:#ebdbb2
    style MOM fill:#d65d0e,stroke:#fe8019,color:#282828
    style NAG fill:#d79921,stroke:#fabd2f,color:#282828
    style ADA fill:#98971a,stroke:#b8bb26,color:#282828
    style RMS fill:#689d6a,stroke:#8ec07c,color:#ebdbb2
    style ADAM fill:#458588,stroke:#83a598,color:#ebdbb2

Adam Optimizer (Adaptive Moment Estimation):

\begin{matrix} m_{t} & = & β_{1} m_{t - 1} + (1 - β_{1}) g_{t} \\ v_{t} & = & β_{2} v_{t - 1} + (1 - β_{2}) g_{t}^{2} \\ {\hat{m}}_{t} & = & \frac{m_{t}}{1 - β_{1}^{t}} \\ {\hat{v}}_{t} & = & \frac{v_{t}}{1 - β_{2}^{t}} \\ θ & \leftarrow & θ - \frac{η}{\sqrt{{\hat{v}}_{t}} + ε} {\hat{m}}_{t} \end{matrix}

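Variables:

gₜ: gradient of the loss at step t
mₜ: first moment (exponential moving average of the gradients)
vₜ: second moment (exponential moving average of the squared gradients, i.e. the uncentered variance)
β₁, β₂: decay rates (default: 0.9, 0.999)
ε: small constant that prevents division by zero (default: 10⁻⁸)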
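The update rule translates almost line by line into NumPy. The sketch below applies it to a simple two-dimensional quadratic; the test function, the learning rate 0.1, the step count, and the name `adam_minimize` are illustrative assumptions, not part of any library API.

```python
import numpy as np

def adam_minimize(grad_fn, theta0, lr=0.1, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=500):
    """Minimize a function with Adam, given a function returning its gradient."""
    theta = np.asarray(theta0, dtype=float).copy()
    m = np.zeros_like(theta)   # first moment estimate
    v = np.zeros_like(theta)   # second moment estimate
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)   # bias correction
        v_hat = v / (1 - beta2 ** t)
        theta -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta

# f(theta) = (theta_0 - 3)^2 + 10 * (theta_1 + 1)^2, minimized at (3, -1)
target = np.array([3.0, -1.0])

def grad(theta):
    return np.array([2.0 * (theta[0] - 3.0), 20.0 * (theta[1] + 1.0)])

theta_final = adam_minimize(grad, theta0=[0.0, 0.0])
print(theta_final)
```

The two coordinates have curvatures that differ by a factor of ten, yet the per-coordinate scaling by the second moment lets one learning rate serve both.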

4. Activation Functions

4.1 Why Activation Functions Matter

Activation functions are the component that lets neural networks learn non-linear relationships.

Why use a non-linear activation?

With only linear functions, any number of layers collapses into a single linear function (shown here without biases; with biases the result is a single affine map):

W^{(2)} (W^{(1)} x) = (W^{(2)} W^{(1)}) x = W^{'} x
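A small numeric check of this identity, with hand-picked matrices (illustrative values only). It also shows that inserting a ReLU between the two layers breaks the equivalence.

```python
import numpy as np

W1 = np.array([[1.0, -1.0],
               [-1.0, 1.0]])
W2 = np.array([[2.0, 1.0]])
x = np.array([[1.0],
              [0.0]])

two_layer_linear = W2 @ (W1 @ x)             # two stacked linear layers
collapsed = (W2 @ W1) @ x                    # one equivalent linear layer
two_layer_relu = W2 @ np.maximum(0, W1 @ x)  # ReLU between the layers

print(two_layer_linear.item(), collapsed.item(), two_layer_relu.item())
# -> 1.0 1.0 2.0
```

Here W1 x = (1, -1); the ReLU zeroes the negative component, so the non-linear network outputs 2 where every purely linear stack outputs 1.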

4.2 ประเภทของ Activation Functions

4.2.1 Sigmoid Function

σ (z) = \frac{1}{1 + e^{- z}}

Derivative:

σ^{'} (z) = σ (z) \cdot (1 - σ (z))

Properties:

Range: (0, 1)
Suited to the output layer in binary classification
Drawback: vanishing gradient problem (the derivative is at most 0.25 and → 0 as |z| grows)

4.2.2 Hyperbolic Tangent (tanh)

\tanh (z) = \frac{e^{z} - e^{- z}}{e^{z} + e^{- z}}

Derivative:

\tanh^{'} (z) = 1 - \tanh^{2} (z)

Properties:

Range: (-1, 1)
Zero-centered (outputs are centered at 0)
Often works better than sigmoid in hidden layers
Drawback: still suffers from vanishing gradients

4.2.3 ReLU (Rectified Linear Unit)

ReLU (z) = \max (0, z) = \begin{cases} z & \text{if } z > 0 \\ 0 & \text{if } z \leq 0 \end{cases}

Derivative (the value at z = 0 is set to 0 by convention):

{ReLU}^{'} (z) = \begin{cases} 1 & \text{if } z > 0 \\ 0 & \text{if } z \leq 0 \end{cases}

Properties:

Range: [0, ∞)
Cheap to compute (no exponential)
No vanishing gradient for z > 0
Drawback: dead ReLU problem (a neuron whose z is negative for every input receives zero gradient and stops learning)

4.2.4 Leaky ReLU

LeakyReLU (z) = \begin{cases} z & \text{if } z > 0 \\ α z & \text{if } z \leq 0 \end{cases}

Typically α = 0.01

4.2.5 Softmax (สำหรับ Multi-class Classification)

Softmax {(z)}_{i} = \frac{e^{z_{i}}}{\sum_{j = 1}^{K} e^{z_{j}}}

Properties:

All outputs sum to 1 (so they can be interpreted as probabilities)
Usually paired with cross-entropy loss

4.3 Comparison of Activation Functions

Activation	Formula	Range	Advantages	Disadvantages	Typical Use
Sigmoid	1/(1+e⁻ᶻ)	(0,1)	Interpretable as a probability	Vanishing gradient	Output layer (binary)
tanh	(eᶻ-e⁻ᶻ)/(eᶻ+e⁻ᶻ)	(-1,1)	Zero-centered	Vanishing gradient	Hidden layers (RNN)
ReLU	max(0,z)	[0,∞)	Fast; no vanishing gradient for z > 0	Dead neurons	Hidden layers (CNN)
Leaky ReLU	max(αz,z)	(-∞,∞)	Fixes dead neurons	α must be chosen	Hidden layers
Softmax	eᶻⁱ/Σeᶻʲ	(0,1), Σ=1	Multi-class probabilities	More expensive to compute	Output layer (multi-class)

4.4 Code Implementation: Activation Functions

import numpy as np
import matplotlib.pyplot as plt

class ActivationFunctions:
    """
    คลาสรวมฟังก์ชันกระตุ้นและอนุพันธ์ต่างๆ
    
    ใช้สำหรับศึกษาและทดสอบ Activation Functions
    """
    
    @staticmethod
    def sigmoid(z, derivative=False):
        """
        ฟังก์ชัน Sigmoid
        
        Parameters:
        -----------
        z : numpy.ndarray
            ค่าอินพุต
        derivative : bool
            ถ้า True คืนค่าอนุพันธ์
            
        Returns:
        --------
        numpy.ndarray
            ผลลัพธ์ของฟังก์ชัน
        """
        # ป้องกัน overflow
        z = np.clip(z, -500, 500)
        sig = 1 / (1 + np.exp(-z))
        
        if derivative:
            return sig * (1 - sig)
        return sig
    
    @staticmethod
    def tanh(z, derivative=False):
        """
        ฟังก์ชัน Hyperbolic Tangent
        """
        t = np.tanh(z)
        
        if derivative:
            return 1 - t**2
        return t
    
    @staticmethod
    def relu(z, derivative=False):
        """
        ฟังก์ชัน ReLU (Rectified Linear Unit)
        """
        if derivative:
            return np.where(z > 0, 1, 0).astype(float)
        return np.maximum(0, z)
    
    @staticmethod
    def leaky_relu(z, alpha=0.01, derivative=False):
        """
        ฟังก์ชัน Leaky ReLU
        
        Parameters:
        -----------
        alpha : float
            ค่า slope สำหรับส่วนที่ z < 0
        """
        if derivative:
            return np.where(z > 0, 1, alpha).astype(float)
        return np.where(z > 0, z, alpha * z)
    
    @staticmethod
    def softmax(z):
        """
        ฟังก์ชัน Softmax
        
        ใช้สำหรับ Multi-class Classification
        """
        # ลบค่ามากที่สุดเพื่อป้องกัน overflow
        exp_z = np.exp(z - np.max(z, axis=0, keepdims=True))
        return exp_z / np.sum(exp_z, axis=0, keepdims=True)
    
    @staticmethod
    def elu(z, alpha=1.0, derivative=False):
        """
        ฟังก์ชัน ELU (Exponential Linear Unit)
        """
        if derivative:
            return np.where(z > 0, 1, alpha * np.exp(z))
        return np.where(z > 0, z, alpha * (np.exp(z) - 1))
    
    @staticmethod
    def swish(z, beta=1.0, derivative=False):
        """
        ฟังก์ชัน Swish (z * sigmoid(βz))
        
        Self-gated activation function ที่ใช้ใน EfficientNet
        """
        sig = 1 / (1 + np.exp(-beta * z))
        
        if derivative:
            return sig + z * sig * (1 - sig) * beta
        return z * sig


def visualize_activations():
    """
    Plot each activation function together with its derivative.
    """
    z = np.linspace(-5, 5, 1000)
    
    act = ActivationFunctions()
    
    # Gruvbox colors
    colors = {
        'func': '#458588',    # Blue: the function
        'deriv': '#cc241d',   # Red: its derivative
        'bg': '#282828',      # Dark background
        'fg': '#ebdbb2'       # Light foreground
    }
    
    # (panel title, function, legend name), laid out row by row on a 2x3 grid
    panels = [
        ('Sigmoid', act.sigmoid, 'σ'),
        ('Hyperbolic Tangent', act.tanh, 'tanh'),
        ('ReLU', act.relu, 'ReLU'),
        ('Leaky ReLU (α=0.01)', act.leaky_relu, 'LeakyReLU'),
        ('ELU (α=1.0)', act.elu, 'ELU'),
        ('Swish (β=1.0)', act.swish, 'Swish'),
    ]
    
    fig, axes = plt.subplots(2, 3, figsize=(14, 8))
    # Dark figure background so the light title and label color stays readable
    fig.patch.set_facecolor(colors['bg'])
    
    for ax, (title, func, name) in zip(axes.flat, panels):
        ax.plot(z, func(z), color=colors['func'], linewidth=2, label=f'{name}(z)')
        ax.plot(z, func(z, derivative=True), color=colors['deriv'],
                linewidth=2, linestyle='--', label=f"{name}'(z)")
        ax.set_title(title, color=colors['fg'])
        ax.set_xlabel('z', color=colors['fg'])
        ax.tick_params(colors=colors['fg'])
        ax.legend()
        ax.grid(True, alpha=0.3)
        ax.axhline(y=0, color='gray', linestyle='-', linewidth=0.5)
        ax.axvline(x=0, color='gray', linestyle='-', linewidth=0.5)
    
    # Limit the y-axis of the ReLU panel so the derivative stays visible
    axes[0, 2].set_ylim(-1, 5)
    
    plt.tight_layout()
    plt.savefig('activation_functions.png', dpi=150, bbox_inches='tight')
    plt.show()


# ตัวอย่างการใช้งาน
if __name__ == "__main__":
    act = ActivationFunctions()
    
    # ทดสอบ Sigmoid
    z = np.array([-2, -1, 0, 1, 2])
    print("=== ทดสอบ Sigmoid ===")
    print(f"z = {z}")
    print(f"sigmoid(z) = {act.sigmoid(z)}")
    print(f"sigmoid'(z) = {act.sigmoid(z, derivative=True)}")
    
    # ทดสอบ Softmax
    print("\n=== ทดสอบ Softmax ===")
    logits = np.array([[2.0], [1.0], [0.1]])
    probs = act.softmax(logits)
    print(f"Logits: {logits.flatten()}")
    print(f"Softmax: {probs.flatten()}")
    print(f"Sum: {probs.sum()}")  # ควรได้ 1.0
    
    # แสดงกราฟ
    visualize_activations()

5. Introduction to Deep Learning

5.1 What Is Deep Learning?

Deep Learning is a subfield of Machine Learning that uses deep neural networks, that is, networks with multiple hidden layers, to learn hierarchical representations of data.

graph TB
    subgraph AI["ปัญญาประดิษฐ์ (Artificial Intelligence)"]
        subgraph ML["การเรียนรู้ของเครื่อง (Machine Learning)"]
            subgraph DL["การเรียนรู้เชิงลึก (Deep Learning)"]
                CNN["CNN
ภาพ"]
                RNN["RNN
ลำดับ"]
                TF["Transformer
ภาษา"]
                GAN["GAN
สร้างภาพ"]
            end
            TRAD["ML แบบดั้งเดิม
SVM, Decision Tree"]
        end
        RULE["ระบบผู้เชี่ยวชาญ
Rule-based"]
    end
    
    style AI fill:#282828,stroke:#d65d0e,color:#ebdbb2
    style ML fill:#3c3836,stroke:#458588,color:#ebdbb2
    style DL fill:#504945,stroke:#b16286,color:#ebdbb2
    style CNN fill:#98971a,stroke:#b8bb26,color:#282828
    style RNN fill:#d79921,stroke:#fabd2f,color:#282828
    style TF fill:#cc241d,stroke:#fb4934,color:#ebdbb2
    style GAN fill:#689d6a,stroke:#8ec07c,color:#ebdbb2
    style TRAD fill:#458588,stroke:#83a598,color:#ebdbb2
    style RULE fill:#928374,stroke:#a89984,color:#282828

5.2 Key Properties of Deep Learning

Hierarchical Feature Learning:

Deep networks learn features layer by layer, from simple to abstract:

Layer Depth	Feature Type	Example (Images)
Early layers	Low-level	Edges, corners, colors
Middle layers	Mid-level	Shapes, textures
Deep layers	High-level	Objects, faces

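5.3 Challenges in Training Deep Networks

5.3.1 Vanishing Gradient Problem

In a very deep network the gradient is multiplied by one more factor (weights times activation derivative) at every layer it is backpropagated through. When these factors are smaller than 1, as with sigmoid whose derivative is at most 0.25, the gradient shrinks roughly exponentially with depth and the earliest layers learn very slowly.

Remedies:

Use ReLU instead of sigmoid/tanh
Batch Normalization
Residual Connections (Skip Connections)
Proper Weight Initialization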
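The size of the effect is easy to estimate. The sketch below multiplies the activation derivatives along a single path through the network, assuming for simplicity that every layer sees the same pre-activation z = 0.5 and ignoring the weight matrices; both are illustrative simplifications, as are the function names.

```python
import numpy as np

def sigmoid_prime(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)

def relu_prime(z):
    return (z > 0).astype(float)

depths = [1, 5, 10, 20]
z = 0.5  # assumed pre-activation at every layer

# Product of the activation derivatives over `depth` layers
sigmoid_factor = {d: float(sigmoid_prime(np.full(d, z)).prod()) for d in depths}
relu_factor = {d: float(relu_prime(np.full(d, z)).prod()) for d in depths}

for d in depths:
    print(f"depth {d:2d}: sigmoid factor = {sigmoid_factor[d]:.3e}, "
          f"ReLU factor = {relu_factor[d]:.1f}")
```

For active ReLU units the derivative is exactly 1, so the factor stays at 1 regardless of depth, which is one reason ReLU appears first in the list of remedies.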

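5.3.2 Exploding Gradient Problem

The gradient becomes so large, typically because repeated multiplication by large weights amplifies it layer after layer, that the weight updates become unstable.

Remedies:

Gradient Clipping
Batch Normalization
Careful Learning Rate Selection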
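Gradient clipping comes in several forms; the sketch below shows one common variant, clipping by global norm, in which all gradients are rescaled together whenever their combined L2 norm exceeds a threshold. The function name `clip_by_global_norm` and the example values are illustrative assumptions.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient arrays so their joint L2 norm is <= max_norm."""
    total_norm = float(np.sqrt(sum(np.sum(g ** 2) for g in grads)))
    scale = min(1.0, max_norm / (total_norm + 1e-12))
    return [g * scale for g in grads], total_norm

# Two gradient arrays with joint norm sqrt(3^2 + 4^2 + 12^2) = 13
grads = [np.array([3.0, 4.0]), np.array([[12.0]])]
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
norm_after = float(np.sqrt(sum(np.sum(g ** 2) for g in clipped)))

print(f"norm before = {norm_before:.1f}, norm after = {norm_after:.4f}")
```

Rescaling all arrays by the same factor preserves the direction of the update and changes only its length.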

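5.4 Key Techniques in Deep Learning

5.4.1 Weight Initialization

Xavier/Glorot Initialization (the second argument of N is the variance; n_{in} and n_{out} are the numbers of inputs and outputs of the layer):

W \sim N (0, \frac{2}{n_{in} + n_{out}})

He Initialization (for ReLU):

W \sim N (0, \frac{2}{n_{in}})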
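A minimal sketch of both schemes in NumPy, reading the second argument of N as the variance, so the standard deviation passed to the sampler is its square root. The function names and layer sizes are illustrative assumptions.

```python
import numpy as np

def xavier_normal(n_in, n_out, rng):
    """Xavier/Glorot: variance 2 / (n_in + n_out)."""
    std = np.sqrt(2.0 / (n_in + n_out))
    return rng.normal(0.0, std, size=(n_in, n_out))

def he_normal(n_in, n_out, rng):
    """He: variance 2 / n_in, intended for ReLU layers."""
    std = np.sqrt(2.0 / n_in)
    return rng.normal(0.0, std, size=(n_in, n_out))

rng = np.random.default_rng(0)
W_xavier = xavier_normal(512, 256, rng)
W_he = he_normal(512, 256, rng)

print(f"Xavier: empirical var = {W_xavier.var():.5f}, target = {2 / (512 + 256):.5f}")
print(f"He:     empirical var = {W_he.var():.5f}, target = {2 / 512:.5f}")
```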

5.4.2 Batch Normalization

Batch Normalization normalizes the activations within each mini-batch of m examples, then applies a learnable scale and shift:

\begin{matrix} μ_{B} & = & \frac{1}{m} \sum_{i = 1}^{m} x_{i} \\ σ_{B}^{2} & = & \frac{1}{m} \sum_{i = 1}^{m} {(x_{i} - μ_{B})}^{2} \\ {\hat{x}}_{i} & = & \frac{x_{i} - μ_{B}}{\sqrt{σ_{B}^{2} + ε}} \\ y_{i} & = & γ {\hat{x}}_{i} + β \end{matrix}

Variables:

μB: mean of the mini-batch
σB²: variance of the mini-batch
ε: small constant for numerical stability
γ, β: learnable scale and shift parameters
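The four equations map directly onto NumPy operations. The sketch below covers only the training-time forward pass for a 2D input of shape (batch, features); at inference time, implementations typically substitute running averages of the mean and variance, which is not shown. The function name and the value eps = 1e-5 are illustrative assumptions.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Training-mode batch normalization over axis 0 (the batch axis)."""
    mu = x.mean(axis=0)                     # mini-batch mean
    var = x.var(axis=0)                     # mini-batch variance (divides by m)
    x_hat = (x - mu) / np.sqrt(var + eps)   # normalize
    y = gamma * x_hat + beta                # scale and shift
    return y, x_hat

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 4))
gamma = np.ones(4)
beta = np.zeros(4)

y, x_hat = batch_norm_forward(x, gamma, beta)
print("mean per feature:", x_hat.mean(axis=0).round(6))
print("var per feature: ", x_hat.var(axis=0).round(6))
```

With γ = 1 and β = 0 the output equals the normalized activations; during training the network can learn other values and partially undo the normalization where that helps.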

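5.4.3 Dropout

Dropout is a regularization technique that randomly "switches off" a subset of neurons during training.

\tilde{h} = r ⊙ h, where r_{i} \sim Bernoulli (p)

Here p is the probability of keeping a neuron (r_{i} = 1). At inference time no neurons are dropped, so the activations are scaled by p, or equivalently the training-time activations are divided by p (inverted dropout), to keep their expected value unchanged.

Benefits:

Helps prevent overfitting
Produces an implicit ensemble effect
Forces the network to learn diverse features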
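A minimal sketch of the mask in NumPy, under two stated assumptions: p in the formula is read as the probability of keeping a neuron, and the common "inverted dropout" variant is used, which divides by that probability during training so that nothing needs rescaling at inference time. The function name and values are illustrative.

```python
import numpy as np

def dropout(h, keep_prob, rng, training=True):
    """Inverted dropout: zero units with probability 1 - keep_prob, rescale the rest."""
    if not training:
        return h                                # inference: identity
    mask = rng.random(h.shape) < keep_prob      # r_i ~ Bernoulli(keep_prob)
    return h * mask / keep_prob                 # rescale to preserve the expectation

rng = np.random.default_rng(0)
h = np.ones((1000, 100))

h_train = dropout(h, keep_prob=0.8, rng=rng)
h_eval = dropout(h, keep_prob=0.8, rng=rng, training=False)

print(f"fraction kept:        {np.mean(h_train > 0):.3f}")
print(f"mean during training: {h_train.mean():.3f}")
print(f"mean at inference:    {h_eval.mean():.3f}")
```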

5.5 Regularization Techniques

Technique	Method	Effect
L2 Regularization	Add λΣw² to the loss	Smaller weights
L1 Regularization	Add λΣ|w| to the loss	Some weights become exactly 0 (sparse)
Dropout	Randomly switch off neurons	Prevents co-adaptation
Data Augmentation	Generate extra data through transformations	More varied training data
Early Stopping	Stop when the validation loss stops decreasing	Prevents overfitting

5.6 Loss Functions for Different Tasks

5.6.1 Regression: Mean Squared Error (MSE)

MSE = \frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}

5.6.2 Binary Classification: Binary Cross-Entropy

BCE = - \frac{1}{n} \sum_{i = 1}^{n} [y_{i} \log ({\hat{y}}_{i}) + (1 - y_{i}) \log (1 - {\hat{y}}_{i})]

5.6.3 Multi-class Classification: Categorical Cross-Entropy

CCE = - \frac{1}{n} \sum_{i = 1}^{n} \sum_{c = 1}^{C} y_{i, c} \log ({\hat{y}}_{i, c})
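The three losses can be sketched in NumPy as follows. Two choices here are assumptions rather than part of the formulas: predictions are clipped by a small eps so that log(0) cannot occur, and the categorical version averages over the n samples in the same way as MSE and BCE (replace the mean with a sum for an unnormalized total).

```python
import numpy as np

def mse(y, y_hat):
    """Mean squared error."""
    return float(np.mean((y - y_hat) ** 2))

def binary_cross_entropy(y, y_hat, eps=1e-12):
    """Binary cross-entropy; y in {0, 1}, y_hat in (0, 1)."""
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return float(-np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)))

def categorical_cross_entropy(y_onehot, y_hat, eps=1e-12):
    """Categorical cross-entropy; rows are samples, columns are classes."""
    y_hat = np.clip(y_hat, eps, 1.0)
    return float(-np.mean(np.sum(y_onehot * np.log(y_hat), axis=1)))

mse_value = mse(np.array([1.0, 2.0]), np.array([1.0, 4.0]))
bce_value = binary_cross_entropy(np.array([1.0, 0.0]), np.array([0.5, 0.5]))
cce_value = categorical_cross_entropy(np.array([[1.0, 0.0, 0.0]]),
                                      np.array([[0.5, 0.25, 0.25]]))

print(mse_value, bce_value, cce_value)
```

In the two cross-entropy examples the true class receives probability 0.5, so both losses equal ln 2 ≈ 0.693.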

6. Convolutional Neural Networks (CNN) Basics

6.1 Concept and Inspiration

Convolutional Neural Networks (CNN), or ConvNets, are a neural network architecture designed specifically for data with a grid-like topology, such as images.

Inspiration from the visual system:

Neurons in the brain's visual cortex respond to specific features within small regions of the visual field
CNNs mimic this behavior through local connectivity and parameter sharing

6.2 Main Components of a CNN

graph LR
    subgraph CNN["สถาปัตยกรรม CNN"]
        INPUT["ภาพอินพุต
Input Image"] --> CONV1["Convolution
Layer 1"]
        CONV1 --> POOL1["Pooling
Layer 1"]
        POOL1 --> CONV2["Convolution
Layer 2"]
        CONV2 --> POOL2["Pooling
Layer 2"]
        POOL2 --> FLAT["Flatten"]
        FLAT --> FC1["Fully Connected
Layer 1"]
        FC1 --> FC2["Fully Connected
Layer 2"]
        FC2 --> OUTPUT["Output
Class Probabilities"]
    end
    
    style CNN fill:#282828,stroke:#458588,color:#ebdbb2
    style INPUT fill:#98971a,stroke:#b8bb26,color:#282828
    style CONV1 fill:#458588,stroke:#83a598,color:#ebdbb2
    style CONV2 fill:#458588,stroke:#83a598,color:#ebdbb2
    style POOL1 fill:#b16286,stroke:#d3869b,color:#ebdbb2
    style POOL2 fill:#b16286,stroke:#d3869b,color:#ebdbb2
    style FLAT fill:#d65d0e,stroke:#fe8019,color:#282828
    style FC1 fill:#d79921,stroke:#fabd2f,color:#282828
    style FC2 fill:#d79921,stroke:#fabd2f,color:#282828
    style OUTPUT fill:#cc241d,stroke:#fb4934,color:#ebdbb2

6.3 Convolution Operation

Convolution slides a filter (kernel) across the input image and computes a weighted sum at each position.

2D convolution equation (as in most deep learning libraries, the kernel is not flipped, so strictly speaking this is cross-correlation):

S (i, j) = (I * K) (i, j) = \sum_{m}^{} \sum_{n}^{} I (i + m, j + n) \cdot K (m, n)

Variables:

I: input image
K: filter/kernel
S: resulting feature map

Worked example:

Given an input (3×3) and a kernel (2×2):

Input I:          Kernel K:
[1  2  3]         [1  0]
[4  5  6]         [0  1]
[7  8  9]

Computing the feature map (2×2):

Position (0,0): (1×1) + (2×0) + (4×0) + (5×1) = 1 + 0 + 0 + 5 = 6
Position (0,1): (2×1) + (3×0) + (5×0) + (6×1) = 2 + 0 + 0 + 6 = 8
Position (1,0): (4×1) + (5×0) + (7×0) + (8×1) = 4 + 0 + 0 + 8 = 12
Position (1,1): (5×1) + (6×0) + (8×0) + (9×1) = 5 + 0 + 0 + 9 = 14

Feature Map S:
[6   8]
[12  14]
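The worked example can be reproduced with a direct implementation of the sum, assuming stride 1 and no padding. The function name `cross_correlate2d` is illustrative; it is named that way because the kernel is applied without flipping, as in the equation.

```python
import numpy as np

def cross_correlate2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding, no flipping)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum of the window whose top-left corner is (i, j)
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

I = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]], dtype=float)
K = np.array([[1, 0],
              [0, 1]], dtype=float)

S = cross_correlate2d(I, K)
print(S)
# [[ 6.  8.]
#  [12. 14.]]
```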

6.4 Hyperparameters of a Convolution Layer

6.4.1 Padding

Padding adds a border around the image to control the output size.

Valid Padding (No Padding): no border is added → the output is smaller than the input
Same Padding: enough border is added that the output has the same size as the input (for stride 1)

6.4.2 Stride

Stride is the distance the kernel moves at each step.

Stride = 1: moves 1 pixel at a time
Stride = 2: moves 2 pixels at a time → the output is roughly half as large along each dimension

6.4.3 Output Size Formula

O_{size} = \lfloor \frac{I - K + 2 P}{S} \rfloor + 1

Variables:

I: input size
K: kernel size
P: padding
S: stride
⌊·⌋: floor (round down), needed when the stride does not divide I - K + 2P evenly

Example: Input 32×32, Kernel 5×5, Padding 2, Stride 1

Output = (32 - 5 + 2×2) / 1 + 1 = 32 (Same size)
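As a quick check, the formula fits in a one-line helper. Integer (floor) division is used so that strides which do not divide the numerator evenly are handled; the function name and the extra cases are illustrative.

```python
def conv_output_size(input_size, kernel_size, padding=0, stride=1):
    """Output size along one spatial dimension of a convolution or pooling layer."""
    return (input_size - kernel_size + 2 * padding) // stride + 1

same_size = conv_output_size(32, 5, padding=2, stride=1)   # the example above
valid_size = conv_output_size(32, 5, padding=0, stride=1)  # no padding
strided = conv_output_size(28, 2, padding=0, stride=2)     # 2x2 window, stride 2
uneven = conv_output_size(7, 3, padding=0, stride=2)       # (7 - 3) / 2 + 1

print(same_size, valid_size, strided, uneven)
# -> 32 28 14 3
```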

6.5 Pooling Layer

Pooling reduces the size of a feature map by summarizing the values in small regions.

Types of pooling:

Type	Method	Advantage
Max Pooling	Takes the maximum value	Keeps the most prominent feature
Average Pooling	Takes the mean	Reduces noise
Global Average Pooling	Takes the mean over the entire feature map	Reduces parameters

Max pooling example (2×2, Stride 2):

Input (4×4):           Output (2×2):
[1  3  2  4]           [5  6]
[5  2  1  6]   →       [8  9]
[7  8  3  2]
[4  5  9  1]
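The example can be verified with a small loop-based implementation for a single 2D feature map. The function name `max_pool2d` is illustrative.

```python
import numpy as np

def max_pool2d(x, pool_size=2, stride=2):
    """Max pooling over a single 2D feature map."""
    out_h = (x.shape[0] - pool_size) // stride + 1
    out_w = (x.shape[1] - pool_size) // stride + 1
    out = np.zeros((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * stride:i * stride + pool_size,
                       j * stride:j * stride + pool_size]
            out[i, j] = window.max()   # keep the largest value in the window
    return out

feature_map = np.array([[1, 3, 2, 4],
                        [5, 2, 1, 6],
                        [7, 8, 3, 2],
                        [4, 5, 9, 1]])

pooled = max_pool2d(feature_map)
print(pooled)
# [[5 6]
#  [8 9]]
```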

6.6 Convolutional Layer vs Fully Connected Layer

Property	Convolutional	Fully Connected
Connectivity	Local (restricted to a region)	Global (every neuron)
Parameter Sharing	Yes (the same kernel is reused at every position)	No
Translation Equivariance	Yes (shifting the input shifts the feature map; pooling adds approximate invariance)	No
Number of Parameters	Few	Many
Best Suited For	Spatial data (images)	Tabular data

6.7 Important CNN Architectures

graph TB
    subgraph HISTORY["วิวัฒนาการของ CNN"]
        L1["1998: LeNet-5
ตัวเลข MNIST"]
        L2["2012: AlexNet
ImageNet Champion"]
        L3["2014: VGGNet
เน้นความลึก"]
        L4["2014: GoogLeNet
Inception Module"]
        L5["2015: ResNet
Skip Connections"]
        L6["2017: DenseNet
Dense Connections"]
        L7["2019: EfficientNet
Compound Scaling"]
        L8["2020: Vision Transformer
ViT"]
        
        L1 --> L2 --> L3 --> L4 --> L5 --> L6 --> L7 --> L8
    end
    
    style HISTORY fill:#282828,stroke:#458588,color:#ebdbb2
    style L1 fill:#cc241d,stroke:#fb4934,color:#ebdbb2
    style L2 fill:#d65d0e,stroke:#fe8019,color:#282828
    style L3 fill:#d79921,stroke:#fabd2f,color:#282828
    style L4 fill:#98971a,stroke:#b8bb26,color:#282828
    style L5 fill:#689d6a,stroke:#8ec07c,color:#ebdbb2
    style L6 fill:#458588,stroke:#83a598,color:#ebdbb2
    style L7 fill:#b16286,stroke:#d3869b,color:#ebdbb2
    style L8 fill:#cc241d,stroke:#fb4934,color:#ebdbb2

6.8 Code Implementation: Simple CNN

import numpy as np

class Conv2D:
    """
    A simple 2D convolutional layer.
    
    Supports multiple filters, multi-channel input, stride, and zero-padding.
    """
    
    def __init__(self, num_filters, kernel_size, stride=1, padding=0):
        """
        Set up the Conv2D layer.
        
        Parameters:
        -----------
        num_filters : int
            Number of filters
        kernel_size : int
            Kernel size (the kernel is assumed to be square)
        stride : int
            Step size of the sliding window
        padding : int
            Amount of zero-padding around the image
        """
        self.num_filters = num_filters
        self.kernel_size = kernel_size
        self.stride = stride
        self.padding = padding
        
        # Filters are created on the first forward pass, once the number of
        # input channels is known (same lazy pattern as Dense._init_weights)
        self.filters = None
        self.biases = np.zeros(num_filters)
    
    def _init_filters(self, in_channels):
        """Initialize the filters (He initialization, fan_in = C * k * k)"""
        fan_in = in_channels * self.kernel_size * self.kernel_size
        self.filters = np.random.randn(
            self.num_filters, in_channels, self.kernel_size, self.kernel_size
        ) * np.sqrt(2.0 / fan_in)
    
    def _pad_input(self, X):
        """Add zero-padding around the two spatial axes"""
        if self.padding == 0:
            return X
        return np.pad(
            X,
            ((0, 0), (0, 0),
             (self.padding, self.padding), (self.padding, self.padding)),
            mode='constant'
        )
    
    def _get_output_size(self, input_size):
        """Compute the output size along one spatial axis"""
        return (input_size - self.kernel_size + 2 * self.padding) // self.stride + 1
    
    def forward(self, X):
        """
        Forward pass of Conv2D
        
        Parameters:
        -----------
        X : numpy.ndarray
            Input of shape (batch_size, height, width) or
            (batch_size, channels, height, width)
            
        Returns:
        --------
        numpy.ndarray
            Feature maps of shape (batch_size, num_filters, out_h, out_w)
        """
        # A 3D input is treated as a batch of single-channel images
        if X.ndim == 3:
            X = X[:, np.newaxis, :, :]
        
        self.X = X
        batch_size, in_channels, in_h, in_w = X.shape
        
        if self.filters is None:
            self._init_filters(in_channels)
        
        # Add padding
        X_padded = self._pad_input(X)
        
        # Compute the output size
        out_h = self._get_output_size(in_h)
        out_w = self._get_output_size(in_w)
        
        # Allocate the output array
        output = np.zeros((batch_size, self.num_filters, out_h, out_w))
        
        # Convolve: each filter spans all input channels
        for f in range(self.num_filters):
            for i in range(out_h):
                for j in range(out_w):
                    # Extract the region under the kernel
                    h_start = i * self.stride
                    h_end = h_start + self.kernel_size
                    w_start = j * self.stride
                    w_end = w_start + self.kernel_size
                    
                    region = X_padded[:, :, h_start:h_end, w_start:w_end]
                    
                    # Weighted sum over channels and the kernel window
                    output[:, f, i, j] = np.sum(
                        region * self.filters[f], axis=(1, 2, 3)
                    ) + self.biases[f]
        
        return output


class MaxPool2D:
    """
    ชั้น Max Pooling 2D
    """
    
    def __init__(self, pool_size=2, stride=2):
        """
        Parameters:
        -----------
        pool_size : int
            ขนาดของ pooling window
        stride : int
            ระยะเลื่อน
        """
        self.pool_size = pool_size
        self.stride = stride
    
    def forward(self, X):
        """
        Forward pass ของ MaxPool2D
        
        Parameters:
        -----------
        X : numpy.ndarray
            Input ขนาด (batch_size, num_channels, height, width)
            
        Returns:
        --------
        numpy.ndarray
            Output ขนาดลดลงตาม pool_size และ stride
        """
        self.X = X
        batch_size, num_channels, in_h, in_w = X.shape
        
        out_h = (in_h - self.pool_size) // self.stride + 1
        out_w = (in_w - self.pool_size) // self.stride + 1
        
        output = np.zeros((batch_size, num_channels, out_h, out_w))
        
        for i in range(out_h):
            for j in range(out_w):
                h_start = i * self.stride
                h_end = h_start + self.pool_size
                w_start = j * self.stride
                w_end = w_start + self.pool_size
                
                region = X[:, :, h_start:h_end, w_start:w_end]
                output[:, :, i, j] = np.max(region, axis=(2, 3))
        
        return output


class Flatten:
    """
    ชั้น Flatten: แปลง multi-dimensional array เป็น 1D
    """
    
    def forward(self, X):
        """
        Parameters:
        -----------
        X : numpy.ndarray
            Input ขนาด (batch_size, ...)
            
        Returns:
        --------
        numpy.ndarray
            Output ขนาด (batch_size, flattened_size)
        """
        self.input_shape = X.shape
        batch_size = X.shape[0]
        return X.reshape(batch_size, -1)


class Dense:
    """
    ชั้น Fully Connected (Dense)
    """
    
    def __init__(self, output_size, activation='relu'):
        """
        Parameters:
        -----------
        output_size : int
            จำนวน neurons ใน layer นี้
        activation : str
            ฟังก์ชันกระตุ้น ('relu', 'sigmoid', 'softmax')
        """
        self.output_size = output_size
        self.activation = activation
        self.W = None
        self.b = None
    
    def _init_weights(self, input_size):
        """กำหนดค่าเริ่มต้นของน้ำหนัก"""
        if self.activation == 'relu':
            self.W = np.random.randn(input_size, self.output_size) * np.sqrt(2.0 / input_size)
        else:
            self.W = np.random.randn(input_size, self.output_size) * np.sqrt(1.0 / input_size)
        self.b = np.zeros(self.output_size)
    
    def _activate(self, z):
        """ใช้ activation function"""
        if self.activation == 'relu':
            return np.maximum(0, z)
        elif self.activation == 'sigmoid':
            return 1 / (1 + np.exp(-np.clip(z, -500, 500)))
        elif self.activation == 'softmax':
            exp_z = np.exp(z - np.max(z, axis=1, keepdims=True))
            return exp_z / np.sum(exp_z, axis=1, keepdims=True)
        return z
    
    def forward(self, X):
        """
        Forward pass ของ Dense layer
        """
        if self.W is None:
            self._init_weights(X.shape[1])
        
        self.X = X
        z = np.dot(X, self.W) + self.b
        return self._activate(z)


class SimpleCNN:
    """
    CNN แบบง่ายสำหรับการจำแนกภาพ
    
    สถาปัตยกรรม:
    Conv2D(8, 3x3) -> ReLU -> MaxPool(2x2) ->
    Conv2D(16, 3x3) -> ReLU -> MaxPool(2x2) ->
    Flatten -> Dense(64) -> ReLU -> Dense(10) -> Softmax
    """
    
    def __init__(self):
        """สร้างโครงสร้าง CNN"""
        self.layers = [
            Conv2D(num_filters=8, kernel_size=3, padding=1),
            MaxPool2D(pool_size=2, stride=2),
            Conv2D(num_filters=16, kernel_size=3, padding=1),
            MaxPool2D(pool_size=2, stride=2),
            Flatten(),
            Dense(64, activation='relu'),
            Dense(10, activation='softmax')
        ]
    
    def forward(self, X):
        """
        Forward pass ผ่านทุกชั้น
        
        Parameters:
        -----------
        X : numpy.ndarray
            ภาพอินพุต ขนาด (batch_size, height, width)
            
        Returns:
        --------
        numpy.ndarray
            ความน่าจะเป็นของแต่ละ class
        """
        out = X
        for layer in self.layers:
            out = layer.forward(out)
            
            # ใช้ ReLU สำหรับ Conv2D
            if isinstance(layer, Conv2D):
                out = np.maximum(0, out)
        
        return out
    
    def predict(self, X):
        """
        ทำนาย class
        
        Returns:
        --------
        numpy.ndarray
            Class labels ที่ทำนาย
        """
        probs = self.forward(X)
        return np.argmax(probs, axis=1)


# ตัวอย่างการใช้งาน
if __name__ == "__main__":
    # สร้างข้อมูลจำลอง (10 ภาพ ขนาด 28x28)
    np.random.seed(42)
    X = np.random.randn(10, 28, 28)
    
    # สร้าง CNN
    cnn = SimpleCNN()
    
    print("=== ทดสอบ Simple CNN ===")
    print(f"Input shape: {X.shape}")
    
    # Forward pass
    output = cnn.forward(X)
    print(f"Output shape: {output.shape}")
    print(f"Output (probabilities for first image):\n{output[0]}")
    
    # ทำนาย
    predictions = cnn.predict(X)
    print(f"Predictions: {predictions}")
    
    # ทดสอบแต่ละ layer
    print("\n=== ขนาด output ของแต่ละชั้น ===")
    out = X
    for i, layer in enumerate(cnn.layers):
        out = layer.forward(out)
        if isinstance(layer, Conv2D):
            out = np.maximum(0, out)
        print(f"Layer {i} ({type(layer).__name__}): {out.shape}")

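Expected output (the probabilities and predictions shown are illustrative; the exact numbers depend on the random weight initialization):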

=== ทดสอบ Simple CNN ===
Input shape: (10, 28, 28)
Output shape: (10, 10)
Output (probabilities for first image):
[0.098 0.102 0.095 0.103 0.099 0.101 0.097 0.104 0.099 0.102]

Predictions: [7 3 7 3 1 7 7 7 7 3]

=== ขนาด output ของแต่ละชั้น ===
Layer 0 (Conv2D): (10, 8, 28, 28)
Layer 1 (MaxPool2D): (10, 8, 14, 14)
Layer 2 (Conv2D): (10, 16, 14, 14)
Layer 3 (MaxPool2D): (10, 16, 7, 7)
Layer 4 (Flatten): (10, 784)
Layer 5 (Dense): (10, 64)
Layer 6 (Dense): (10, 10)
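
The layer shapes listed above follow from the usual output-size rule for convolution and pooling, out = floor((n + 2p - k) / s) + 1. The sketch below retraces them. It assumes 3x3 convolutions with padding 1 and 2x2 max pooling with stride 2, which is inferred from the printed shapes rather than taken from the SimpleCNN definition, and `conv_out_size` is a hypothetical helper name.

```python
def conv_out_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window."""
    return (n + 2 * padding - kernel) // stride + 1


size = 28               # input height and width
channels = [8, 16]      # filters in the two Conv2D layers (from the printed shapes)
shapes = []

for c in channels:
    # Assumed 3x3 convolution with padding 1: spatial size is preserved
    size = conv_out_size(size, kernel=3, stride=1, padding=1)
    shapes.append((c, size, size))
    # Assumed 2x2 max pooling with stride 2: spatial size is halved
    size = conv_out_size(size, kernel=2, stride=2)
    shapes.append((c, size, size))

# Flatten: channels * height * width
flat = channels[-1] * size * size

for s in shapes:
    print(s)
print("flattened features:", flat)
```

Under these assumptions the flattened size is 16 * 7 * 7 = 784, matching the Flatten row.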

6.9 Using CNNs with TensorFlow/Keras

TensorFlow and Keras are popular frameworks for building and training deep learning models, including CNNs.

import numpy as np
import matplotlib.pyplot as plt

# TensorFlow and Keras
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models, callbacks, regularizers
from tensorflow.keras.datasets import mnist, cifar10, fashion_mnist
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Check the TensorFlow version and GPU availability
print(f"TensorFlow Version: {tf.__version__}")
print(f"GPU Available: {tf.config.list_physical_devices('GPU')}")


# ============================================================
# Example 1: Basic CNN for MNIST
# ============================================================

def cnn_mnist_basic():
    """
    Build a basic CNN for classifying handwritten MNIST digits
    
    Dataset: MNIST (60,000 train, 10,000 test, 28x28 grayscale)
    """
    print("=" * 60)
    print("ตัวอย่าง CNN พื้นฐาน: การจำแนกตัวเลข MNIST")
    print("=" * 60)
    
    # Load the MNIST data
    (X_train, y_train), (X_test, y_test) = mnist.load_data()
    
    print(f"\nข้อมูล MNIST:")
    print(f"  Training set: {X_train.shape}")
    print(f"  Test set: {X_test.shape}")
    
    # Preprocessing
    # 1. Reshape to add a channel dimension: (28, 28) -> (28, 28, 1)
    X_train = X_train.reshape(-1, 28, 28, 1).astype('float32')
    X_test = X_test.reshape(-1, 28, 28, 1).astype('float32')
    
    # 2. Normalize pixel values from 0-255 to 0-1
    X_train = X_train / 255.0
    X_test = X_test / 255.0
    
    # 3. One-hot encode the labels
    y_train_cat = to_categorical(y_train, 10)
    y_test_cat = to_categorical(y_test, 10)
    
    print(f"\nหลัง Preprocessing:")
    print(f"  X_train shape: {X_train.shape}")
    print(f"  y_train shape: {y_train_cat.shape}")
    
    # Build the CNN with the Sequential API
    model = models.Sequential([
        # Convolutional Block 1
        layers.Conv2D(
            filters=32,                    # number of filters
            kernel_size=(3, 3),            # kernel size
            activation='relu',             # activation function
            padding='same',                # keep the output the same size as the input
            input_shape=(28, 28, 1),       # input shape
            name='conv1'
        ),
        layers.BatchNormalization(name='bn1'),
        layers.MaxPooling2D(pool_size=(2, 2), name='pool1'),
        
        # Convolutional Block 2
        layers.Conv2D(64, (3, 3), activation='relu', padding='same', name='conv2'),
        layers.BatchNormalization(name='bn2'),
        layers.MaxPooling2D((2, 2), name='pool2'),
        
        # Convolutional Block 3
        layers.Conv2D(128, (3, 3), activation='relu', padding='same', name='conv3'),
        layers.BatchNormalization(name='bn3'),
        
        # Flatten and Dense layers
        layers.Flatten(name='flatten'),
        layers.Dense(128, activation='relu', name='dense1'),
        layers.Dropout(0.5, name='dropout'),  # Regularization
        layers.Dense(10, activation='softmax', name='output')
    ])
    
    # Show the model architecture
    print("\nโครงสร้างโมเดล:")
    model.summary()
    
    # Compile the model
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=0.001),
        loss='categorical_crossentropy',
        metrics=['accuracy']
    )
    
    # Callbacks
    early_stop = callbacks.EarlyStopping(
        monitor='val_loss',
        patience=5,
        restore_best_weights=True
    )
    
    reduce_lr = callbacks.ReduceLROnPlateau(
        monitor='val_loss',
        factor=0.5,
        patience=3,
        min_lr=1e-6
    )
    
    # Train the model
    print("\nกำลังฝึกโมเดล...")
    history = model.fit(
        X_train, y_train_cat,
        batch_size=128,
        epochs=20,
        validation_split=0.1,
        callbacks=[early_stop, reduce_lr],
        verbose=1
    )
    
    # Evaluate on the test set
    test_loss, test_acc = model.evaluate(X_test, y_test_cat, verbose=0)
    
    print(f"\n{'='*40}")
    print("ผลการประเมิน (Test Set)")
    print(f"{'='*40}")
    print(f"  Loss: {test_loss:.4f}")
    print(f"  Accuracy: {test_acc:.4f} ({test_acc*100:.2f}%)")
    
    # Predict and show a few examples
    y_pred = model.predict(X_test[:10], verbose=0)
    y_pred_classes = np.argmax(y_pred, axis=1)
    
    print("\nตัวอย่างการทำนาย (10 ตัวอย่างแรก):")
    print(f"  True labels:      {y_test[:10]}")
    print(f"  Predicted labels: {y_pred_classes}")
    
    # Plot Training History
    fig, axes = plt.subplots(1, 2, figsize=(12, 4))
    
    # Accuracy
    axes[0].plot(history.history['accuracy'], label='Train', color='#458588')
    axes[0].plot(history.history['val_accuracy'], label='Validation', color='#cc241d')
    axes[0].set_xlabel('Epoch')
    axes[0].set_ylabel('Accuracy')
    axes[0].set_title('Model Accuracy')
    axes[0].legend()
    axes[0].grid(True, alpha=0.3)
    
    # Loss
    axes[1].plot(history.history['loss'], label='Train', color='#458588')
    axes[1].plot(history.history['val_loss'], label='Validation', color='#cc241d')
    axes[1].set_xlabel('Epoch')
    axes[1].set_ylabel('Loss')
    axes[1].set_title('Model Loss')
    axes[1].legend()
    axes[1].grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.savefig('cnn_mnist_history.png', dpi=150, bbox_inches='tight')
    plt.close()
    
    # Plot Predictions
    fig, axes = plt.subplots(2, 5, figsize=(12, 5))
    for i, ax in enumerate(axes.flat):
        ax.imshow(X_test[i].reshape(28, 28), cmap='gray')
        color = 'green' if y_pred_classes[i] == y_test[i] else 'red'
        ax.set_title(f'True: {y_test[i]}, Pred: {y_pred_classes[i]}', color=color)
        ax.axis('off')
    
    plt.tight_layout()
    plt.savefig('cnn_mnist_predictions.png', dpi=150, bbox_inches='tight')
    plt.close()
    
    return model, history


# ============================================================
# Example 2: CNN for CIFAR-10 (color images)
# ============================================================

def cnn_cifar10():
    """
    Build a CNN for classifying CIFAR-10 images
    
    Dataset: CIFAR-10 (50,000 train, 10,000 test, 32x32x3 RGB)
    Classes: airplane, automobile, bird, cat, deer, 
             dog, frog, horse, ship, truck
    """
    print("\n" + "=" * 60)
    print("ตัวอย่าง CNN: การจำแนกภาพ CIFAR-10")
    print("=" * 60)
    
    # Class names
    class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
                   'dog', 'frog', 'horse', 'ship', 'truck']
    
    # Load the data
    (X_train, y_train), (X_test, y_test) = cifar10.load_data()
    
    print(f"\nข้อมูล CIFAR-10:")
    print(f"  Training set: {X_train.shape}")
    print(f"  Test set: {X_test.shape}")
    print(f"  Classes: {class_names}")
    
    # Preprocessing
    X_train = X_train.astype('float32') / 255.0
    X_test = X_test.astype('float32') / 255.0
    
    y_train_cat = to_categorical(y_train, 10)
    y_test_cat = to_categorical(y_test, 10)
    
    # Data Augmentation
    datagen = ImageDataGenerator(
        rotation_range=15,          # rotate by up to ±15 degrees
        width_shift_range=0.1,      # shift horizontally by up to ±10%
        height_shift_range=0.1,     # shift vertically by up to ±10%
        horizontal_flip=True,       # random left-right flip
        zoom_range=0.1              # zoom by up to ±10%
    )
    datagen.fit(X_train)
    
    # Build the CNN with the Functional API
    inputs = layers.Input(shape=(32, 32, 3), name='input')
    
    # Block 1
    x = layers.Conv2D(32, (3, 3), padding='same', activation='relu')(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.Conv2D(32, (3, 3), padding='same', activation='relu')(x)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Dropout(0.25)(x)
    
    # Block 2
    x = layers.Conv2D(64, (3, 3), padding='same', activation='relu')(x)
    x = layers.BatchNormalization()(x)
    x = layers.Conv2D(64, (3, 3), padding='same', activation='relu')(x)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Dropout(0.25)(x)
    
    # Block 3
    x = layers.Conv2D(128, (3, 3), padding='same', activation='relu')(x)
    x = layers.BatchNormalization()(x)
    x = layers.Conv2D(128, (3, 3), padding='same', activation='relu')(x)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Dropout(0.25)(x)
    
    # Dense Layers
    x = layers.Flatten()(x)
    x = layers.Dense(256, activation='relu')(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(10, activation='softmax', name='output')(x)
    
    model = models.Model(inputs=inputs, outputs=outputs)
    
    print("\nโครงสร้างโมเดล:")
    model.summary()
    
    # Compile
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=0.001),
        loss='categorical_crossentropy',
        metrics=['accuracy']
    )
    
    # Callbacks
    early_stop = callbacks.EarlyStopping(
        monitor='val_accuracy',
        patience=10,
        restore_best_weights=True
    )
    
    reduce_lr = callbacks.ReduceLROnPlateau(
        monitor='val_loss',
        factor=0.5,
        patience=5,
        min_lr=1e-6
    )
    
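    # Train the model with data augmentation
    # Hold out the last 5,000 training images for validation so that the
    # test set is not used for early stopping or learning-rate scheduling
    X_val, y_val_cat = X_train[-5000:], y_train_cat[-5000:]
    X_fit, y_fit_cat = X_train[:-5000], y_train_cat[:-5000]
    
    print("\nกำลังฝึกโมเดล (ใช้ Data Augmentation)...")
    history = model.fit(
        datagen.flow(X_fit, y_fit_cat, batch_size=64),
        steps_per_epoch=len(X_fit) // 64,
        epochs=50,
        validation_data=(X_val, y_val_cat),
        callbacks=[early_stop, reduce_lr],
        verbose=1
    )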
    
    # Evaluate on the test set
    test_loss, test_acc = model.evaluate(X_test, y_test_cat, verbose=0)
    
    print(f"\n{'='*40}")
    print("ผลการประเมิน (Test Set)")
    print(f"{'='*40}")
    print(f"  Loss: {test_loss:.4f}")
    print(f"  Accuracy: {test_acc:.4f} ({test_acc*100:.2f}%)")
    
    return model, history


# ============================================================
# Example 3: Transfer Learning with a Pre-trained Model
# ============================================================

def cnn_transfer_learning():
    """
    Transfer learning example with MobileNetV2
    
    Uses a model pre-trained on ImageNet and fine-tunes it for a new task
    """
    print("\n" + "=" * 60)
    print("ตัวอย่าง Transfer Learning: MobileNetV2")
    print("=" * 60)
    
    # Load CIFAR-10; the images are resized to 96x96 below (MobileNetV2 requires inputs of at least 32x32)
    (X_train, y_train), (X_test, y_test) = cifar10.load_data()
    
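    # Preprocessing: keep pixel values in [0, 255] because
    # mobilenet_v2.preprocess_input (applied inside the model) expects that
    # range and rescales it to [-1, 1]; dividing by 255 first would squash
    # every pixel into a narrow band near -1
    X_train = X_train.astype('float32')
    X_test = X_test.astype('float32')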
    
    # Resize images (CIFAR-10 32x32 -> 96x96)
    X_train_resized = tf.image.resize(X_train, (96, 96))
    X_test_resized = tf.image.resize(X_test, (96, 96))
    
    y_train_cat = to_categorical(y_train, 10)
    y_test_cat = to_categorical(y_test, 10)
    
    print(f"\nข้อมูลหลัง Resize:")
    print(f"  X_train: {X_train_resized.shape}")
    print(f"  X_test: {X_test_resized.shape}")
    
    # Load the pre-trained MobileNetV2 (without the top Dense layers)
    base_model = keras.applications.MobileNetV2(
        input_shape=(96, 96, 3),
        include_top=False,           # exclude the classification head
        weights='imagenet'           # use ImageNet weights
    )
    
    # Freeze the base model (these layers are not trained)
    base_model.trainable = False
    
    print(f"\nBase Model: MobileNetV2")
    print(f"  Total layers: {len(base_model.layers)}")
    print(f"  Trainable: {base_model.trainable}")
    
    # Build the new model
    inputs = layers.Input(shape=(96, 96, 3))
    
    # MobileNetV2 preprocessing (rescales [0, 255] pixels to [-1, 1])
    x = keras.applications.mobilenet_v2.preprocess_input(inputs)
    
    # Base model
    x = base_model(x, training=False)
    
    # New classification head
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dense(256, activation='relu')(x)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(10, activation='softmax')(x)
    
    model = models.Model(inputs=inputs, outputs=outputs)
    
    print("\nโครงสร้างโมเดล Transfer Learning:")
    model.summary()
    
    # Compile
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=0.001),
        loss='categorical_crossentropy',
        metrics=['accuracy']
    )
    
    # Train only the classification head (feature extraction)
    print("\nPhase 1: Feature Extraction (ฝึกเฉพาะ Classification Head)")
    history1 = model.fit(
        X_train_resized, y_train_cat,
        batch_size=64,
        epochs=10,
        validation_data=(X_test_resized, y_test_cat),
        verbose=1
    )
    
    # Fine-tuning: unfreeze some layers of the base model
    print("\nPhase 2: Fine-tuning (ฝึกบางชั้นของ Base Model)")
    
    base_model.trainable = True
    
    # Keep the early layers (low-level features) frozen
    fine_tune_at = 100  # unfreeze from layer index 100 onward
    for layer in base_model.layers[:fine_tune_at]:
        layer.trainable = False
    
    print(f"  Trainable layers: {len([l for l in base_model.layers if l.trainable])}")
    
    # Recompile with a lower learning rate
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=0.0001),
        loss='categorical_crossentropy',
        metrics=['accuracy']
    )
    
    history2 = model.fit(
        X_train_resized, y_train_cat,
        batch_size=64,
        epochs=10,
        validation_data=(X_test_resized, y_test_cat),
        verbose=1
    )
    
    # Evaluate on the test set
    test_loss, test_acc = model.evaluate(X_test_resized, y_test_cat, verbose=0)
    
    print(f"\n{'='*40}")
    print("ผลการประเมิน (Test Set)")
    print(f"{'='*40}")
    print(f"  Loss: {test_loss:.4f}")
    print(f"  Accuracy: {test_acc:.4f} ({test_acc*100:.2f}%)")
    
    return model


# ============================================================
# Example 4: Custom CNN Layer
# ============================================================

class ResidualBlock(layers.Layer):
    """
    Custom Residual Block (custom layer)
    
    Uses a skip connection to help gradients flow more easily
    """
    
    def __init__(self, filters, kernel_size=3, **kwargs):
        """
        Parameters:
        -----------
        filters : int
            Number of filters
        kernel_size : int
            Kernel size
        """
        super(ResidualBlock, self).__init__(**kwargs)
        self.filters = filters
        self.kernel_size = kernel_size
        
        # Layers
        self.conv1 = layers.Conv2D(filters, kernel_size, padding='same')
        self.bn1 = layers.BatchNormalization()
        self.conv2 = layers.Conv2D(filters, kernel_size, padding='same')
        self.bn2 = layers.BatchNormalization()
        
        # Skip connection (a 1x1 conv is created in build() when the channel counts differ)
        self.skip_conv = None
        self.skip_bn = None
    
    def build(self, input_shape):
        """Create the projection layers once the input shape is known"""
        if input_shape[-1] != self.filters:
            self.skip_conv = layers.Conv2D(self.filters, 1, padding='same')
            self.skip_bn = layers.BatchNormalization()
        super().build(input_shape)
    
    def call(self, inputs, training=None):
        """Forward pass"""
        # Main path
        x = self.conv1(inputs)
        x = self.bn1(x, training=training)
        x = tf.nn.relu(x)
        
        x = self.conv2(x)
        x = self.bn2(x, training=training)
        
        # Skip connection
        if self.skip_conv is not None:
            skip = self.skip_conv(inputs)
            skip = self.skip_bn(skip, training=training)
        else:
            skip = inputs
        
        # Add skip connection
        x = x + skip
        x = tf.nn.relu(x)
        
        return x
    
    def get_config(self):
        """Return the layer configuration for serialization"""
        config = super().get_config()
        config.update({
            'filters': self.filters,
            'kernel_size': self.kernel_size
        })
        return config


def cnn_with_residual_blocks():
    """
    Build a CNN that uses the custom Residual Blocks
    """
    print("\n" + "=" * 60)
    print("ตัวอย่าง CNN with Custom Residual Blocks")
    print("=" * 60)
    
    # Load Fashion MNIST
    (X_train, y_train), (X_test, y_test) = fashion_mnist.load_data()
    
    class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
                   'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
    
    print(f"\nข้อมูล Fashion MNIST:")
    print(f"  Training set: {X_train.shape}")
    print(f"  Classes: {class_names}")
    
    # Preprocessing
    X_train = X_train.reshape(-1, 28, 28, 1).astype('float32') / 255.0
    X_test = X_test.reshape(-1, 28, 28, 1).astype('float32') / 255.0
    
    y_train_cat = to_categorical(y_train, 10)
    y_test_cat = to_categorical(y_test, 10)
    
    # Build the model with the custom Residual Blocks
    inputs = layers.Input(shape=(28, 28, 1))
    
    # Initial Conv
    x = layers.Conv2D(32, 3, padding='same')(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)
    
    # Residual Blocks
    x = ResidualBlock(32)(x)
    x = layers.MaxPooling2D(2)(x)
    
    x = ResidualBlock(64)(x)
    x = layers.MaxPooling2D(2)(x)
    
    x = ResidualBlock(128)(x)
    x = layers.GlobalAveragePooling2D()(x)
    
    # Classification Head
    x = layers.Dense(128, activation='relu')(x)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(10, activation='softmax')(x)
    
    model = models.Model(inputs=inputs, outputs=outputs)
    
    print("\nโครงสร้างโมเดล:")
    model.summary()
    
    # Compile and train
    model.compile(
        optimizer='adam',
        loss='categorical_crossentropy',
        metrics=['accuracy']
    )
    
    print("\nกำลังฝึกโมเดล...")
    history = model.fit(
        X_train, y_train_cat,
        batch_size=128,
        epochs=15,
        validation_split=0.1,
        verbose=1
    )
    
    # Evaluate on the test set
    test_loss, test_acc = model.evaluate(X_test, y_test_cat, verbose=0)
    
    print(f"\n{'='*40}")
    print("ผลการประเมิน (Test Set)")
    print(f"{'='*40}")
    print(f"  Loss: {test_loss:.4f}")
    print(f"  Accuracy: {test_acc:.4f} ({test_acc*100:.2f}%)")
    
    return model


# ============================================================
# Example 5: Visualizing CNN Filters and Feature Maps
# ============================================================

def visualize_cnn_features(model, X_test):
    """
    Visualize the filters and feature maps of a CNN
    
    Parameters:
    -----------
    model : keras.Model
        A trained CNN model
    X_test : numpy.ndarray
        Test data
    """
    print("\n" + "=" * 60)
    print("การแสดงผล CNN Filters และ Feature Maps")
    print("=" * 60)
    
    # Find the convolutional layers
    conv_layers = [layer for layer in model.layers 
                   if isinstance(layer, layers.Conv2D)]
    
    print(f"\nจำนวน Convolutional Layers: {len(conv_layers)}")
    
    if len(conv_layers) == 0:
        print("ไม่พบ Conv2D layers")
        return
    
    # 1. Show the filters of the first layer
    first_conv = conv_layers[0]
    filters = first_conv.get_weights()[0]
    
    print(f"\nFilters ของ {first_conv.name}:")
    print(f"  Shape: {filters.shape}")
    
    n_filters = min(filters.shape[3], 16)
    fig, axes = plt.subplots(2, n_filters // 2, figsize=(12, 4))
    
    for i, ax in enumerate(axes.flat):
        if i < n_filters:
            # Show the filter (first input channel only; for grayscale input this is the only channel)
            f = filters[:, :, 0, i]
            ax.imshow(f, cmap='gray')
            ax.set_title(f'Filter {i+1}')
            ax.axis('off')
    
    plt.suptitle(f'Learned Filters: {first_conv.name}')
    plt.tight_layout()
    plt.savefig('cnn_filters.png', dpi=150, bbox_inches='tight')
    plt.close()
    
    # 2. Show the feature maps
    # Build a model that outputs the feature maps
    layer_outputs = [layer.output for layer in conv_layers[:3]]
    activation_model = models.Model(inputs=model.input, outputs=layer_outputs)
    
    # Run a sample image through the model
    sample_image = X_test[0:1]
    activations = activation_model.predict(sample_image, verbose=0)
    
    # Show the feature maps of each layer
    for layer_idx, (layer, activation) in enumerate(zip(conv_layers[:3], activations)):
        n_features = min(activation.shape[3], 8)
        
        fig, axes = plt.subplots(2, n_features // 2, figsize=(12, 4))
        
        for i, ax in enumerate(axes.flat):
            if i < n_features:
                ax.imshow(activation[0, :, :, i], cmap='viridis')
                ax.axis('off')
        
        plt.suptitle(f'Feature Maps: {layer.name}')
        plt.tight_layout()
        plt.savefig(f'cnn_feature_maps_layer{layer_idx+1}.png', 
                    dpi=150, bbox_inches='tight')
        plt.close()
    
    print("\nบันทึกภาพไปที่:")
    print("  - cnn_filters.png")
    print("  - cnn_feature_maps_layer1.png")
    print("  - cnn_feature_maps_layer2.png")
    print("  - cnn_feature_maps_layer3.png")


# ============================================================
# Example 6: Model Saving and Loading
# ============================================================

def save_and_load_model(model, model_name='my_cnn_model'):
    """
    Save and load a model
    
    Parameters:
    -----------
    model : keras.Model
        The model to save
    model_name : str
        Base file name for the saved model
    """
    print("\n" + "=" * 60)
    print("การบันทึกและโหลดโมเดล")
    print("=" * 60)
    
    # Method 1: save the whole model (.keras format, recommended)
    model.save(f'{model_name}.keras')
    print(f"\n1. บันทึกโมเดลทั้งหมด: {model_name}.keras")
    
    # Load the model
    loaded_model = keras.models.load_model(f'{model_name}.keras')
    print(f"   โหลดสำเร็จ: {type(loaded_model)}")
    
    # Method 2: save only the weights
    model.save_weights(f'{model_name}_weights.weights.h5')
    print(f"\n2. บันทึก Weights: {model_name}_weights.weights.h5")
    
    # Method 3: save the architecture (JSON)
    model_json = model.to_json()
    with open(f'{model_name}_architecture.json', 'w') as f:
        f.write(model_json)
    print(f"\n3. บันทึก Architecture: {model_name}_architecture.json")
    
    # Load from architecture + weights
    from tensorflow.keras.models import model_from_json
    
    with open(f'{model_name}_architecture.json', 'r') as f:
        loaded_json = f.read()
    
    new_model = model_from_json(loaded_json)
    new_model.load_weights(f'{model_name}_weights.weights.h5')
    print("   โหลดจาก Architecture + Weights สำเร็จ")
    
    return loaded_model


# ============================================================
# Main: run the examples
# ============================================================

if __name__ == "__main__":
    # Set the random seeds
    np.random.seed(42)
    tf.random.set_seed(42)
    
    # Example 1: basic CNN on MNIST
    model_mnist, history_mnist = cnn_mnist_basic()
    
    # Visualize features
    (_, _), (X_test, _) = mnist.load_data()
    X_test = X_test.reshape(-1, 28, 28, 1).astype('float32') / 255.0
    visualize_cnn_features(model_mnist, X_test)
    
    # Save the model
    save_and_load_model(model_mnist, 'mnist_cnn')
    
    # Other examples (these take longer to run):
    # model_cifar, history_cifar = cnn_cifar10()
    # model_transfer = cnn_transfer_learning()
    # model_residual = cnn_with_residual_blocks()
    
    print("\n" + "=" * 60)
    print("สรุป")
    print("=" * 60)
    print("  ตัวอย่างทั้งหมดทำงานสำเร็จ!")
    print("\nไฟล์ที่สร้าง:")
    print("  - cnn_mnist_history.png")
    print("  - cnn_mnist_predictions.png")
    print("  - cnn_filters.png")
    print("  - cnn_feature_maps_*.png")
    print("  - mnist_cnn.keras")

Expected output (the loss and accuracy figures are illustrative and will vary from run to run):

============================================================
ตัวอย่าง CNN พื้นฐาน: การจำแนกตัวเลข MNIST
============================================================

ข้อมูล MNIST:
  Training set: (60000, 28, 28)
  Test set: (10000, 28, 28)

หลัง Preprocessing:
  X_train shape: (60000, 28, 28, 1)
  y_train shape: (60000, 10)

โครงสร้างโมเดล:
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 conv1 (Conv2D)              (None, 28, 28, 32)        320       
 bn1 (BatchNormalization)    (None, 28, 28, 32)        128       
 pool1 (MaxPooling2D)        (None, 14, 14, 32)        0         
 conv2 (Conv2D)              (None, 14, 14, 64)        18496     
 bn2 (BatchNormalization)    (None, 14, 14, 64)        256       
 pool2 (MaxPooling2D)        (None, 7, 7, 64)          0         
 conv3 (Conv2D)              (None, 7, 7, 128)         73856     
 bn3 (BatchNormalization)    (None, 7, 7, 128)         512       
 flatten (Flatten)           (None, 6272)              0         
 dense1 (Dense)              (None, 128)               802944    
 dropout (Dropout)           (None, 128)               0         
 output (Dense)              (None, 10)                1290      
=================================================================
Total params: 897,802
Trainable params: 897,354
Non-trainable params: 448

กำลังฝึกโมเดล...
Epoch 1/20
422/422 [==============================] - 15s - loss: 0.2156 - accuracy: 0.9342
...
Epoch 10/20
422/422 [==============================] - 14s - loss: 0.0234 - accuracy: 0.9921

========================================
ผลการประเมิน (Test Set)
========================================
  Loss: 0.0312
  Accuracy: 0.9912 (99.12%)

ตัวอย่างการทำนาย (10 ตัวอย่างแรก):
  True labels:      [7 2 1 0 4 1 4 9 5 9]
  Predicted labels: [7 2 1 0 4 1 4 9 5 9]
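
The Param # column in the model summary above can be checked by hand. The sketch below recomputes each count from the layer definitions: a Conv2D layer has k*k*c_in*c_out weights plus c_out biases, a Dense layer has n_in*n_out weights plus n_out biases, and a BatchNormalization layer keeps four values per channel (gamma and beta are trainable; the moving mean and variance are not). The helper names are hypothetical.

```python
def conv2d_params(k, c_in, c_out):
    """Weights (k x k x c_in x c_out) plus one bias per output channel."""
    return k * k * c_in * c_out + c_out


def batchnorm_params(c):
    """gamma, beta (trainable) and moving mean, moving variance (non-trainable)."""
    return 4 * c


def dense_params(n_in, n_out):
    """Weight matrix plus one bias per output unit."""
    return n_in * n_out + n_out


counts = {
    "conv1": conv2d_params(3, 1, 32),
    "bn1": batchnorm_params(32),
    "conv2": conv2d_params(3, 32, 64),
    "bn2": batchnorm_params(64),
    "conv3": conv2d_params(3, 64, 128),
    "bn3": batchnorm_params(128),
    "dense1": dense_params(7 * 7 * 128, 128),  # Flatten yields 6272 features
    "output": dense_params(128, 10),
}

total = sum(counts.values())
# Two non-trainable statistics per channel in each BatchNormalization layer
non_trainable = 2 * (32 + 64 + 128)
trainable = total - non_trainable

for name, n in counts.items():
    print(f"{name:8s} {n:>8,}")
print(f"total={total:,} trainable={trainable:,} non-trainable={non_trainable:,}")
```

Almost 90% of the parameters sit in `dense1`, which is why CNN designs often replace Flatten with global average pooling before the classifier.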

6.10 Applications of CNNs

Task	Description	Examples
Image Classification	Assign a category to an image	Cat vs. dog, digits 0-9
Object Detection	Detect objects and locate them in the image	YOLO, Faster R-CNN
Semantic Segmentation	Label the image pixel by pixel	U-Net, DeepLab
Face Recognition	Recognize faces	FaceNet
Medical Imaging	Analyze medical images	Cancer detection

7. Overall Summary

7.1 Summary of Key Content

Artificial neural networks are at the heart of deep learning and modern AI. Their main components and concepts are:

1. Artificial Neuron (Perceptron)

Takes inputs, multiplies them by weights, adds a bias, and passes the result through an activation function
The basic building block of a network
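
As a compact reminder, a single neuron computes y = f(w·x + b). The sketch below uses NumPy with a sigmoid activation; the input values, weights, and the function name `neuron` are chosen only for illustration.

```python
import numpy as np


def sigmoid(z):
    """Logistic activation: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def neuron(x, w, b, activation):
    """Weighted sum of the inputs plus bias, passed through the activation."""
    z = np.dot(w, x) + b
    return activation(z)


x = np.array([1.0, 2.0, 3.0])      # inputs
w = np.array([0.5, -0.25, 0.0])    # weights (illustrative values)
b = 0.0                            # bias

# z = 0.5*1 - 0.25*2 + 0*3 + 0 = 0, and sigmoid(0) = 0.5
y = neuron(x, w, b, sigmoid)
print(y)
```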

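2. Multi-Layer Perceptron (MLP)

Consists of several layers: Input, Hidden, Output
Forward Propagation: computes from the input through to the output
Universal Approximation: with a suitable non-linear activation and enough hidden units, an MLP can approximate any continuous function on a compact domain to arbitrary accuracy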

3. Backpropagation

The key algorithm for training neural networks
Uses the chain rule to compute gradients
Gradient Descent updates the weights to reduce the loss
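
A common way to check a hand-derived backward pass is to compare it with a numerical gradient. The sketch below does this for one sigmoid neuron with squared-error loss L = 0.5 (y - t)^2, then takes a single gradient descent step. The data, initial weights, and learning rate are illustrative assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def loss(w, b, x, t):
    """Squared-error loss of a single sigmoid neuron."""
    y = sigmoid(np.dot(w, x) + b)
    return 0.5 * (y - t) ** 2


def gradients(w, b, x, t):
    """Analytic gradients from the chain rule: dL/dy * dy/dz * dz/dw."""
    y = sigmoid(np.dot(w, x) + b)
    dz = (y - t) * y * (1.0 - y)   # dL/dy * dy/dz
    return dz * x, dz              # dz/dw = x, dz/db = 1


x = np.array([0.5, -1.0])
t = 1.0
w = np.array([0.1, 0.2])
b = 0.0

grad_w, grad_b = gradients(w, b, x, t)

# Numerical gradient by central differences, one weight at a time
eps = 1e-6
num_grad_w = np.zeros_like(w)
for i in range(len(w)):
    step = np.zeros_like(w)
    step[i] = eps
    num_grad_w[i] = (loss(w + step, b, x, t) - loss(w - step, b, x, t)) / (2 * eps)

# One gradient descent step: move against the gradient
lr = 0.5
w_new = w - lr * grad_w
b_new = b - lr * grad_b

print("analytic :", grad_w)
print("numerical:", num_grad_w)
print("loss before:", loss(w, b, x, t), "after:", loss(w_new, b_new, x, t))
```

If the two gradients disagree beyond rounding error, the derivation (or its implementation) has a mistake.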

4. Activation Functions

Let the network learn non-linear relationships
ReLU is a popular choice for hidden layers
Softmax is used for the output layer in multi-class classification
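
For reference, ReLU and softmax can each be written in a line or two of NumPy. The sketch below subtracts the row maximum before exponentiating, a standard trick that leaves the softmax result unchanged but avoids overflow for large logits. The example logits are illustrative.

```python
import numpy as np


def relu(z):
    """Element-wise max(0, z)."""
    return np.maximum(0.0, z)


def softmax(z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = z - np.max(z, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)


logits = np.array([
    [2.0, 1.0, 0.1],
    [1000.0, 1000.0, 1000.0],   # would overflow np.exp without the shift
])
probs = softmax(logits)

print(relu(np.array([-1.0, 0.0, 2.0])))
print(probs)
print(probs.sum(axis=1))
```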

5. Deep Learning

Networks with many hidden layers
Learn hierarchical features
Require additional techniques to train well: Batch Norm, Dropout, Skip Connections

6. Convolutional Neural Networks (CNN)

Designed for image data
Convolution Layer: extracts features with filters
Pooling Layer: reduces the spatial size and adds a degree of translation invariance
Highly successful in computer vision tasks
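
The two CNN building blocks can be illustrated on a tiny image. The sketch below applies a 3x3 vertical-edge filter without padding (strictly a cross-correlation, which is what deep learning libraries compute under the name "convolution") and then 2x2 max pooling. It is a naive loop-based illustration with hypothetical function names, not an efficient implementation.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def max_pool2x2(fmap):
    """Non-overlapping 2x2 max pooling (odd trailing rows/columns are dropped)."""
    h = fmap.shape[0] // 2 * 2
    w = fmap.shape[1] // 2 * 2
    trimmed = fmap[:h, :w]
    return trimmed.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


# 6x6 image: dark left half, bright right half
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Vertical-edge filter: responds where brightness increases left to right
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)

fmap = conv2d_valid(image, kernel)   # 4x4 feature map
pooled = max_pool2x2(fmap)           # 2x2 after pooling

print(fmap)
print(pooled)
```

The feature map is non-zero only in the columns that straddle the edge, and pooling halves each spatial dimension while keeping the strongest response in every 2x2 window.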

7.2 Skills Gained

After studying this chapter, learners will be able to:

Explain how neural networks work
Compute forward and backward propagation by hand
Choose an appropriate activation function
Identify problems that arise when training deep networks and how to address them
Explain the components and operation of a CNN
Apply neural networks to real-world problems

7.3 Directions for Further Study

Recurrent Neural Networks (RNN): for sequential data
Transformer Architecture: the architecture used in LLMs
Generative Models: VAE, GAN, Diffusion Models
Reinforcement Learning: learning through trial and error

8. References

Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press. https://www.deeplearningbook.org/
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533-536.
He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. ICCV.
Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. ICML.
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. JMLR, 15(1), 1929-1958.
Kingma, D. P., & Ba, J. (2015). Adam: A method for stochastic optimization. ICLR.
LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. NeurIPS.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. CVPR.