Support Vector Machines

Overview of Support Vector Machines

Support Vector Machines (SVM) are machine learning algorithms with a strong mathematical foundation. Their goal is to find the Hyperplane that separates two groups of data with the widest possible Margin (Maximum Margin Classifier). SVM is a Discriminative Model that performs well on classification tasks and can also be extended to Regression.

%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#282828', 'primaryTextColor': '#ebdbb2', 'primaryBorderColor': '#d79921', 'lineColor': '#d65d0e', 'secondaryColor': '#3c3836', 'tertiaryColor': '#504945', 'background': '#1d2021', 'mainBkg': '#282828', 'nodeBorder': '#d79921', 'clusterBkg': '#3c3836', 'titleColor': '#ebdbb2', 'edgeLabelBackground': '#504945', 'attributeBackgroundColorEven': '#282828', 'attributeBackgroundColorOdd': '#3c3836'}}}%%
graph TD
    A["🤖 Support Vector Machines"] --> B["Linear SVM
Linear boundary"]
    A --> C["Non-linear SVM
Non-linear boundary"]
    A --> D["Multi-class SVM
More than two classes"]

    B --> B1["Hard Margin SVM"]
    B --> B2["Soft Margin SVM"]

    C --> C1["Kernel Trick"]
    C1 --> C2["RBF Kernel"]
    C1 --> C3["Polynomial Kernel"]
    C1 --> C4["Sigmoid Kernel"]

    D --> D1["One-vs-One (OvO)"]
    D --> D2["One-vs-Rest (OvR)"]

    style A fill:#d79921,color:#282828,stroke:#b57614
    style B fill:#458588,color:#ebdbb2,stroke:#076678
    style C fill:#98971a,color:#ebdbb2,stroke:#79740e
    style D fill:#cc241d,color:#ebdbb2,stroke:#9d0006
    style B1 fill:#3c3836,color:#ebdbb2,stroke:#504945
    style B2 fill:#3c3836,color:#ebdbb2,stroke:#504945
    style C1 fill:#3c3836,color:#ebdbb2,stroke:#504945
    style C2 fill:#504945,color:#ebdbb2,stroke:#665c54
    style C3 fill:#504945,color:#ebdbb2,stroke:#665c54
    style C4 fill:#504945,color:#ebdbb2,stroke:#665c54
    style D1 fill:#3c3836,color:#ebdbb2,stroke:#504945
    style D2 fill:#3c3836,color:#ebdbb2,stroke:#504945

1 Linear SVM

1.1 Fundamental Concepts

Linear SVM is the foundation of every SVM variant. It looks for a linear Decision Boundary (a Hyperplane) that separates two classes while maximizing the distance between the Hyperplane and the closest data points of each class (the Margin).

Key terms:

Hyperplane: the separating surface, with equation wᵀx + b = 0
Support Vectors: the data points closest to the Hyperplane; they alone determine its position
Margin: the distance between the two margin boundaries, i.e. twice the distance from the Hyperplane to the closest Support Vector
Maximum Margin Classifier: the principle of choosing the Hyperplane that makes the Margin as wide as possible

%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#282828', 'primaryTextColor': '#ebdbb2', 'primaryBorderColor': '#d79921', 'lineColor': '#d65d0e', 'secondaryColor': '#3c3836', 'background': '#1d2021', 'mainBkg': '#282828'}}}%%
graph LR
    subgraph SPACE["Feature Space"]
        SV1["● Support Vector
(class +1)"]
        SV2["▲ Support Vector
(class -1)"]
        HP["━━━ Decision Hyperplane
w·x + b = 0"]
        M1["- - - Margin Boundary +
w·x + b = +1"]
        M2["- - - Margin Boundary -
w·x + b = -1"]
    end
    M1 --> HP
    HP --> M2
    SV1 -. "2/||w||" .-> SV2

    style SPACE fill:#282828,stroke:#d79921
    style SV1 fill:#458588,color:#ebdbb2,stroke:#076678
    style SV2 fill:#cc241d,color:#ebdbb2,stroke:#9d0006
    style HP fill:#d79921,color:#282828,stroke:#b57614
    style M1 fill:#3c3836,color:#ebdbb2,stroke:#504945
    style M2 fill:#3c3836,color:#ebdbb2,stroke:#504945

1.2 Mathematical Formulation

Hyperplane equation

w^{T} x + b = 0

where:

w = Weight Vector, perpendicular to the Hyperplane
x = Feature Vector of a data point
b = Bias (intercept term that shifts the Hyperplane away from the origin)

Classification Condition

y_{i} (w^{T} x_{i} + b) \geq 1, \forall i

where yᵢ ∈ {+1, -1} is the label of data point i

Margin width

Each margin boundary wᵀx + b = ±1 lies at distance 1/‖w‖ from the Hyperplane, so the total width is:

Margin = \frac{2}{‖ w ‖}
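The two formulas above can be sketched in plain Python. The helper names `margin_width` and `functional_margin` and the weight vector `w = [3.0, 4.0]` are illustrative assumptions, not values from the text.

```python
import math

def margin_width(w):
    """Return the margin width 2 / ||w|| for a weight vector w."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    if norm == 0:
        raise ValueError("w must be non-zero")
    return 2.0 / norm

def functional_margin(w, b, x, y):
    """Return y * (w^T x + b); values >= 1 satisfy the classification condition."""
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)

w = [3.0, 4.0]                                     # hypothetical weights, ||w|| = 5
print(margin_width(w))                             # 0.4
print(functional_margin(w, -1.0, [1.0, 0.5], +1))  # 3 + 2 - 1 = 4.0
```

A larger ‖w‖ gives a narrower margin, which is why minimizing ‖w‖ is equivalent to maximizing the margin.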

1.3 Optimization Problem

Hard Margin SVM (perfectly separable data)

Objective: maximize the Margin, which is equivalent to minimizing ||w||

\underset{w, b}{minimize} \frac{1}{2} {‖ w ‖}^{2}

subject to: y_{i} (w^{T} x_{i} + b) \geq 1, i = 1, ..., n

1.4 Soft Margin SVM (data that is not perfectly separable)

Soft Margin SVM tolerates some margin violations and misclassifications by adding Slack Variables (ξᵢ) to the problem

\underset{w, b, ξ}{minimize} \frac{1}{2} {‖ w ‖}^{2} + C \sum_{i = 1}^{n} ξ_{i}

subject to: y_{i} (w^{T} x_{i} + b) \geq 1 - ξ_{i}, ξ_{i} \geq 0, i = 1, ..., n

where:

ξᵢ (xi) = Slack Variable of data point i; it measures how far the point violates the Margin (0 < ξᵢ ≤ 1: inside the Margin but still correctly classified; ξᵢ > 1: misclassified)
C = Regularization Parameter that controls the trade-off between Margin width and training Error
- High C → training errors are penalized heavily, Margin is narrower (prone to Overfitting)
- Low C → more Error is tolerated, Margin is wider (risk of Underfitting)
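The role of ξᵢ and C can be sketched as follows. For a fixed candidate (w, b), the smallest feasible slack is ξᵢ = max(0, 1 - yᵢ(wᵀxᵢ + b)). The toy points, the candidate (w, b), and the function names are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def slack(w, b, x, y):
    """xi = max(0, 1 - y (w^T x + b)): zero when the point is on or outside its margin."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(0.0, 1.0 - y * score)

def soft_margin_objective(w, b, X, Y, C):
    """0.5 * ||w||^2 + C * (sum of slacks), evaluated for a fixed candidate (w, b)."""
    reg = 0.5 * sum(wi * wi for wi in w)
    return reg + C * sum(slack(w, b, x, y) for x, y in zip(X, Y))

# Hypothetical toy data: the third point sits inside the margin.
X = [[2.0, 0.0], [-2.0, 0.0], [0.5, 0.0]]
Y = [+1, -1, +1]
w, b = [1.0, 0.0], 0.0

print([slack(w, b, x, y) for x, y in zip(X, Y)])   # [0.0, 0.0, 0.5]
print(soft_margin_objective(w, b, X, Y, C=1.0))    # 0.5 + 1 * 0.5 = 1.0
print(soft_margin_objective(w, b, X, Y, C=100.0))  # 0.5 + 100 * 0.5 = 50.5
```

With C = 100 the single violation dominates the objective, so the optimizer would prefer a narrower margin that removes it; with C = 1 the same violation is cheap.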

1.5 Worked Example: Linear SVM

Sample data (2 dimensions):

Point (i)	x₁	x₂	Label (y)
1	1	1	+1
2	2	2	+1
3	2	0	+1
4	0	0	-1
5	1	-1	-1
6	0	1	-1

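Goal: find the Hyperplane w₁x₁ + w₂x₂ + b = 0

First guess: take point 2 (class +1) and point 4 (class -1) as the Support Vectors.

Conditions:

Point 2: +1 × (2w₁ + 2w₂ + b) = 1 → 2w₁ + 2w₂ + b = 1
Point 4: -1 × (0w₁ + 0w₂ + b) = 1 → b = -1

Substituting b = -1:

2w₁ + 2w₂ - 1 = 1 → 2w₁ + 2w₂ = 2 → w₁ + w₂ = 1

Taking w₁ = w₂ = 0.5 (by symmetry) gives w = [0.5, 0.5], b = -1. This candidate fails the Classification Condition for the other points: point 1 gives +1 × (0.5 + 0.5 - 1) = 0, point 3 gives +1 × (1 + 0 - 1) = 0, and point 6 gives -1 × (0 + 0.5 - 1) = 0.5, all below 1. Points 2 and 4 are therefore not the Support Vectors.

The active constraints come from point 1 (class +1) and points 5 and 6 (class -1):

Point 1: +1 × (w₁ + w₂ + b) = 1 → w₁ + w₂ + b = 1
Point 5: -1 × (w₁ - w₂ + b) = 1 → w₁ - w₂ + b = -1
Point 6: -1 × (0w₁ + w₂ + b) = 1 → w₂ + b = -1

Subtracting the point 6 equation from the point 1 equation gives w₁ = 2. Substituting into point 5 gives b - w₂ = -3, and together with w₂ + b = -1 this yields b = -2, w₂ = 1.

w = [2, 1], b = -2

All six points now satisfy yᵢ(wᵀxᵢ + b) ≥ 1 (the values are 1, 4, 2, 2, 1, 1). No shorter w is feasible, because points 1 and 6 together force w₁ ≥ 2 and points 1 and 5 together force w₂ ≥ 1.

Margin width: $Margin = \frac{2}{‖ w ‖} = \frac{2}{\sqrt{2^{2} + 1^{2}}} = \frac{2}{\sqrt{5}} \approx 0.894$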
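The condition yᵢ(wᵀxᵢ + b) ≥ 1 has to hold for all six points in the table, not only for the assumed Support Vectors. The sketch below checks any candidate (w, b) against the table; the function name `check_candidate` is an illustrative assumption, and two candidates, (w = [0.5, 0.5], b = -1) and (w = [2, 1], b = -2), are used as inputs.

```python
import math

# The six points from the table above: (x1, x2, label).
points = [(1, 1, +1), (2, 2, +1), (2, 0, +1), (0, 0, -1), (1, -1, -1), (0, 1, -1)]

def check_candidate(w, b, tol=1e-9):
    """Return (violating point numbers, point numbers exactly on the margin, margin width)."""
    violations, on_margin = [], []
    for i, (x1, x2, y) in enumerate(points, start=1):
        m = y * (w[0] * x1 + w[1] * x2 + b)  # functional margin of point i
        if m < 1 - tol:
            violations.append(i)
        elif abs(m - 1) < tol:
            on_margin.append(i)
    return violations, on_margin, 2 / math.hypot(w[0], w[1])

print(check_candidate((0.5, 0.5), -1))  # violations [1, 3, 6]; width is about 2.83
print(check_candidate((2, 1), -2))      # no violations; points 1, 5, 6 on the margin; width is about 0.894
```

A candidate with a wider margin is only acceptable if its list of violations is empty, which is why the wider of the two candidates here is rejected for a Hard Margin SVM.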

1.6 Python Code Example: Linear SVM

"""
Linear SVM with scikit-learn
Week 11: Support Vector Machines
"""

import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
import seaborn as sns

# ===== Generate sample data =====
np.random.seed(42)

# Class +1 samples (blue points)
class_pos = np.random.multivariate_normal(
    mean=[2, 2],
    cov=[[0.5, 0], [0, 0.5]],
    size=50
)

# Class -1 samples (red points)
class_neg = np.random.multivariate_normal(
    mean=[-1, -1],
    cov=[[0.5, 0], [0, 0.5]],
    size=50
)

# Stack the samples and build the labels
X = np.vstack([class_pos, class_neg])
y = np.hstack([np.ones(50), -np.ones(50)])

# ===== Train/test split =====
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# ===== Feature scaling (standardization, fitted on the training set only) =====
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# ===== Build and train an (approximately) Hard Margin SVM =====
# A very large C approximates a Hard Margin SVM
svm_hard = SVC(kernel='linear', C=1000.0, random_state=42)
svm_hard.fit(X_train_scaled, y_train)

# ===== Build and train a Soft Margin SVM =====
svm_soft = SVC(kernel='linear', C=1.0, random_state=42)
svm_soft.fit(X_train_scaled, y_train)

# ===== Evaluation =====
for name, model in [("Hard Margin (C=1000)", svm_hard), ("Soft Margin (C=1)", svm_soft)]:
    y_pred = model.predict(X_test_scaled)
    print(f"\n{'='*50}")
    print(f"Model: {name}")
    print(f"Support Vectors: {model.n_support_}")  # Support Vectors per class, ordered as [-1, +1]
    print(f"Weight Vector (w): {model.coef_[0].round(4)}")
    print(f"Bias (b): {model.intercept_[0]:.4f}")
    print("\nClassification Report:")
    print(classification_report(y_test, y_pred, target_names=['Class -1', 'Class +1']))


def plot_svm_decision_boundary(model, X, y, scaler, title):
    """
    Plot the Decision Boundary and Margin of an SVM.
    
    Parameters:
        model: fitted SVM model
        X: data (before scaling)
        y: labels
        scaler: the fitted StandardScaler
        title: plot title
    """
    X_scaled = scaler.transform(X)
    
    # Build the meshgrid
    x_min, x_max = X_scaled[:, 0].min() - 0.5, X_scaled[:, 0].max() + 0.5
    y_min, y_max = X_scaled[:, 1].min() - 0.5, X_scaled[:, 1].max() + 0.5
    xx, yy = np.meshgrid(
        np.linspace(x_min, x_max, 300),
        np.linspace(y_min, y_max, 300)
    )
    
    # Evaluate the Decision Function on the grid
    Z = model.decision_function(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    
    plt.figure(figsize=(8, 6))
    
    # Draw the Decision Boundary (level 0) and the Margin boundaries (levels -1 and +1)
    plt.contourf(xx, yy, Z, levels=[-999, -1, 0, 1, 999],
                 colors=['#cc241d33', '#cc241d55', '#45858855', '#45858833'], alpha=0.4)
    plt.contour(xx, yy, Z, levels=[-1, 0, 1],
                colors=['#cc241d', '#d79921', '#458588'],
                linestyles=['--', '-', '--'], linewidths=[1.5, 2.5, 1.5])
    
    # Plot the data points
    plt.scatter(X_scaled[y == 1, 0], X_scaled[y == 1, 1],
                c='#458588', marker='o', s=60, label='Class +1', edgecolors='k', linewidth=0.5)
    plt.scatter(X_scaled[y == -1, 0], X_scaled[y == -1, 1],
                c='#cc241d', marker='^', s=60, label='Class -1', edgecolors='k', linewidth=0.5)
    
    # Highlight the Support Vectors
    sv = model.support_vectors_
    plt.scatter(sv[:, 0], sv[:, 1], s=150, facecolors='none',
                edgecolors='#d79921', linewidth=2.5, label='Support Vectors', zorder=5)
    
    plt.title(f'{title}\n(Support Vectors: {model.n_support_})', fontsize=13)
    plt.xlabel('Feature 1 (scaled)')
    plt.ylabel('Feature 2 (scaled)')
    plt.legend(fontsize=9)
    plt.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.savefig(f'linear_svm_{title[:4]}.png', dpi=150, bbox_inches='tight')
    plt.show()


# Plot both models for comparison
plot_svm_decision_boundary(svm_hard, X_train, y_train, scaler, "Hard Margin SVM (C=1000)")
plot_svm_decision_boundary(svm_soft, X_train, y_train, scaler, "Soft Margin SVM (C=1.0)")

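Expected output (illustrative layout; the exact numbers depend on the random sample and the scikit-learn version):

==================================================
Model: Hard Margin (C=1000)
Support Vectors: [1 1]    ← very few, because the Margin is narrow
Weight Vector (w): [w₁ w₂]
Bias (b): b

Model: Soft Margin (C=1)
Support Vectors: [5 4]    ← more, because the Margin is wider
Weight Vector (w): [w₁ w₂]
Bias (b): b

Because class +1 is centred at (2, 2) and class -1 at (-1, -1), both components of w should come out positive, and ‖w‖ for C=1 should be no larger than for C=1000.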

1.7 Hard Margin vs Soft Margin

Property	Hard Margin SVM	Soft Margin SVM
Parameter C	None (equivalent to C → ∞)	C > 0
Noisy data	Not robust	Robust
Outliers	Cannot handle	Can handle
Generalization	Lower (on noisy data)	Higher
Slack Variables ξᵢ	None	Present
Complexity	Low	Moderate
Practical use	Rare	Common

2 Kernel Trick

2.1 Why the Kernel Trick Is Needed

The main limitation of Linear SVM is that its Decision Boundary is always linear, so it only works well when the data is (at least approximately) Linearly Separable. Real-world data often has a more complex structure.

Idea: instead of explicitly mapping the data into a high-dimensional Feature Space (which is computationally expensive), we use a Kernel Function that computes the Inner Product in that Feature Space directly from the original inputs

%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#282828', 'primaryTextColor': '#ebdbb2', 'primaryBorderColor': '#d79921', 'lineColor': '#d65d0e', 'background': '#1d2021', 'mainBkg': '#282828'}}}%%
graph LR
    subgraph ORIG["Original Space (2D)"]
        D1["Data that is not
linearly separable"]
    end

    subgraph FEAT["Feature Space (Higher Dim)"]
        D2["Data that is now
linearly separable"]
    end

    D1 -->|"φ(x): Mapping"| D2
    D2 -->|"Kernel K(x,z) = φ(x)·φ(z)"| D1

    note["💡 Kernel Trick: compute φ(x)·φ(z)
without ever computing φ(x) explicitly"]

    style ORIG fill:#3c3836,stroke:#d79921,color:#ebdbb2
    style FEAT fill:#3c3836,stroke:#98971a,color:#ebdbb2
    style D1 fill:#282828,color:#ebdbb2,stroke:#504945
    style D2 fill:#282828,color:#ebdbb2,stroke:#504945
    style note fill:#b57614,color:#282828,stroke:#d79921

2.2 Definition of a Kernel Function

A Kernel Function K(x, z) is a function whose value equals the Inner Product of the Feature Maps of its two arguments:

K (x, z) = ⟨ φ (x), φ (z) ⟩

where φ is the Feature Map that sends the data into a high-dimensional Feature Space

Mercer's condition: K must be symmetric and Positive Semi-definite (PSD), i.e. every Kernel Matrix built from it has no negative eigenvalues, for K to be a valid Kernel
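The PSD requirement can be spot-checked on a 2×2 Kernel Matrix, where a symmetric matrix is PSD exactly when both diagonal entries and the determinant are non-negative. The sketch below is a minimal illustration under that assumption; the two sample points and the sigmoid parameters γ = 1, r = -1 are hypothetical choices picked to show a failure case.

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def sigmoid_kernel(x, z, gamma=1.0, r=-1.0):
    return math.tanh(gamma * sum(a * b for a, b in zip(x, z)) + r)

def is_psd_2x2(k11, k12, k22, tol=1e-12):
    """Symmetric 2x2 matrix is PSD iff k11 >= 0, k22 >= 0 and det >= 0."""
    return k11 >= -tol and k22 >= -tol and k11 * k22 - k12 * k12 >= -tol

def gram_is_psd(kernel, x, z):
    """Build the 2x2 kernel matrix for the points x and z and test it."""
    return is_psd_2x2(kernel(x, x), kernel(x, z), kernel(z, z))

x, z = [0.5, 0.0], [0.0, 0.5]
print(gram_is_psd(rbf_kernel, x, z))      # True
print(gram_is_psd(sigmoid_kernel, x, z))  # False: K(x, x) = tanh(0.25 - 1) < 0
```

A single failing matrix is enough to show that a kernel with those parameters violates Mercer's condition; a passing matrix does not prove validity for all inputs.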

2.3 Types of Kernel Functions

Commonly used Kernels

Kernel	Formula	Parameters	Typical use
Linear	K(x,z) = xᵀz	None	High-dimensional data
Polynomial	K(x,z) = (γxᵀz + r)ᵈ	d, γ, r	Text Classification
RBF/Gaussian	K(x,z) = exp(-γ‖x-z‖²)	γ	General-purpose default
Sigmoid	K(x,z) = tanh(γxᵀz + r)	γ, r	Neural Network-like (not PSD for every γ, r)

Polynomial Kernel: worked example

Sample data:

x = [1, 2], z = [3, 4], d = 2, γ = 1, r = 0

Polynomial Kernel value: $K (x, z) = (x^{T} z)^{2} = (1 \times 3 + 2 \times 4)^{2} = 11^{2} = 121$

Check against the explicit Feature Map φ(x) = [x₁², √2·x₁x₂, x₂²]:

φ([1,2]) = [1, 2√2, 4]
φ([3,4]) = [9, 12√2, 16]
φ(x)·φ(z) = 1×9 + 2√2×12√2 + 4×16 = 9 + 48 + 64 = 121 ✓
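The same check can be run in code. This is a minimal sketch of the calculation above; the function names `poly_kernel` and `phi` are illustrative, and `phi` is only valid for 2-D inputs with d = 2, γ = 1, r = 0.

```python
import math

def poly_kernel(x, z, degree=2, gamma=1.0, r=0.0):
    """Polynomial kernel (gamma * x^T z + r) ** degree, computed in the input space."""
    return (gamma * sum(a * b for a, b in zip(x, z)) + r) ** degree

def phi(x):
    """Explicit degree-2 feature map for a 2-D input (gamma = 1, r = 0)."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2]

x, z = [1, 2], [3, 4]
k_direct = poly_kernel(x, z)                            # kernel trick: 2-D dot product only
k_mapped = sum(a * b for a, b in zip(phi(x), phi(z)))   # explicit 3-D dot product
print(k_direct, round(k_mapped, 10))                    # 121.0 121.0
```

Both routes give the same value, but the kernel route never builds the 3-D vectors, which is the saving the Kernel Trick provides when the Feature Space is very large.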

RBF Kernel: worked example

Sample data:

x = [1, 0], z = [0, 1], γ = 1.0

K (x, z) = \exp (- γ {‖ x - z ‖}^{2}) = \exp (- 1.0 \times (1^{2} + (-1)^{2})) = \exp (- 2) \approx 0.1353

Interpretation: K ≈ 0.135 means x and z are not very similar (they are far apart). If x = z, then K = 1 (maximum similarity)
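A minimal plain-Python sketch of the RBF calculation above; the function name `rbf_kernel` is an illustrative choice.

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    """RBF kernel exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

print(round(rbf_kernel([1, 0], [0, 1]), 4))  # 0.1353
print(rbf_kernel([1, 0], [1, 0]))            # 1.0
```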

2.4 Python Code Example: Comparing Kernel Functions

"""
Comparing Kernel Functions in SVM
"""

import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.datasets import make_circles, make_moons

# ===== Generate non-linear sample datasets =====
# Circles Dataset: concentric circles that a Linear SVM cannot separate
X_circles, y_circles = make_circles(n_samples=200, noise=0.1, factor=0.4, random_state=42)

# Moons Dataset: two interleaving half-moons
X_moons, y_moons = make_moons(n_samples=200, noise=0.15, random_state=42)

# ===== Kernels to compare =====
kernels = {
    'Linear': SVC(kernel='linear', C=1.0, random_state=42),
    'Polynomial (d=3)': SVC(kernel='poly', degree=3, C=1.0, gamma='scale', random_state=42),
    'RBF (γ=scale)': SVC(kernel='rbf', C=1.0, gamma='scale', random_state=42),
    'Sigmoid': SVC(kernel='sigmoid', C=1.0, gamma='scale', random_state=42),
}

def evaluate_kernels_on_dataset(X, y, dataset_name):
    """
    Evaluate SVMs with several Kernels on one dataset.
    
    Parameters:
        X: Feature matrix
        y: Labels (0 or 1)
        dataset_name: name of the dataset
    """
    # Split first, then fit the scaler on the training part only (avoids test-set leakage)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42, stratify=y
    )
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)
    
    print(f"\n{'='*60}")
    print(f"Dataset: {dataset_name}")
    print(f"{'Kernel':<25} {'Train Acc':>10} {'Test Acc':>10} {'#SV':>6}")
    print('-'*55)
    
    results = {}
    for name, template in kernels.items():
        # Fresh estimator per dataset, so earlier results are not overwritten by a refit
        model = SVC(**template.get_params())
        model.fit(X_train, y_train)
        train_acc = accuracy_score(y_train, model.predict(X_train))
        test_acc = accuracy_score(y_test, model.predict(X_test))
        n_sv = model.n_support_.sum()  # total number of Support Vectors
        print(f"{name:<25} {train_acc:>10.4f} {test_acc:>10.4f} {n_sv:>6}")
        results[name] = {'model': model, 'scaler': scaler, 'test_acc': test_acc}
    
    return results

# Run the comparison
results_circles = evaluate_kernels_on_dataset(X_circles, y_circles, "Circles Dataset")
results_moons = evaluate_kernels_on_dataset(X_moons, y_moons, "Moons Dataset")

# ===== Compute a Kernel Matrix =====
def compute_kernel_matrix(X, kernel='rbf', gamma=1.0):
    """
    Compute the Kernel Matrix K[i,j] = K(xᵢ, xⱼ).
    
    Parameters:
        X: data of shape (n_samples, n_features)
        kernel: kernel type ('linear', 'rbf', 'poly')
        gamma: the γ parameter, used by the RBF kernel only
    """
    n = X.shape[0]
    K = np.zeros((n, n))
    
    for i in range(n):
        for j in range(n):
            if kernel == 'linear':
                # Linear Kernel: xᵢᵀxⱼ
                K[i, j] = np.dot(X[i], X[j])
            elif kernel == 'rbf':
                # RBF Kernel: exp(-γ‖xᵢ-xⱼ‖²)
                diff = X[i] - X[j]
                K[i, j] = np.exp(-gamma * np.dot(diff, diff))
            elif kernel == 'poly':
                # Polynomial Kernel with fixed d=2, γ=1, r=1: (xᵢᵀxⱼ + 1)²
                K[i, j] = (np.dot(X[i], X[j]) + 1) ** 2
            else:
                raise ValueError(f"Unknown kernel: {kernel!r}")
    return K


# Example: Kernel Matrix for 5 hand-picked points
X_sample = np.array([[1, 0], [0, 1], [1, 1], [-1, 0], [0, -1]], dtype=float)
print("\n=== Kernel Matrix (RBF, γ=1.0) for 5 points ===")
K_rbf = compute_kernel_matrix(X_sample, kernel='rbf', gamma=1.0)
print(np.round(K_rbf, 4))

print("\n=== Kernel Matrix (Polynomial, d=2) for 5 points ===")
K_poly = compute_kernel_matrix(X_sample, kernel='poly')
print(np.round(K_poly, 4))

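Expected output (illustrative; exact numbers depend on the scikit-learn version):

============================================================
Dataset: Circles Dataset
Kernel                    Train Acc   Test Acc     #SV
-------------------------------------------------------
Linear                       0.5357     0.5167     94
Polynomial (d=3)             (see note below)
RBF (γ=scale)                0.9929     0.9833      7
Sigmoid                      0.5571     0.5500     90

============================================================
Dataset: Moons Dataset
Kernel                    Train Acc   Test Acc     #SV
-------------------------------------------------------
Linear                       0.8929     0.8833     28
Polynomial (d=3)             0.9929     0.9667     10
RBF (γ=scale)                0.9929     0.9833      8
Sigmoid                      0.8786     0.8833     28

Note: Linear SVM reaches only ~50% on Circles because the data is not linearly separable, while the RBF Kernel reaches ~98%. The Polynomial (d=3) result on Circles is not expected to match RBF: with the default coef0=0 the kernel (γxᵀz)³ is an odd function of x, so it cannot represent the radial term x₁² + x₂² that separates concentric circles. An even degree (degree=2) or coef0 > 0 is needed for the Polynomial Kernel to solve Circles.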

2.5 Effect of the γ Parameter in the RBF Kernel

K (x, z) = \exp (- γ {‖ x - z ‖}^{2})

γ value	Effect	Risk
Very small γ	Smooth Decision Boundary; each point's influence reaches far	Underfitting
Well-chosen γ	Balance between Bias and Variance	Good fit
Very large γ	Complex Decision Boundary; each point's influence is local	Overfitting
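The "reach" of a training point described in the table can be made concrete by evaluating the kernel at a few distances. This is a small sketch; the γ values and distances are arbitrary illustrative choices.

```python
import math

def rbf_similarity(distance, gamma):
    """RBF kernel value for two points separated by the given Euclidean distance."""
    return math.exp(-gamma * distance ** 2)

for gamma in (0.01, 1.0, 100.0):
    row = [round(rbf_similarity(d, gamma), 4) for d in (0.1, 1.0, 3.0)]
    print(f"gamma={gamma}: K at distances 0.1, 1.0, 3.0 -> {row}")
```

With γ = 0.01 even points 3 units apart remain highly similar (smooth boundary), while with γ = 100 the similarity has vanished at distance 1, so each Support Vector only affects its immediate neighbourhood.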

3 Non-linear SVM

3.1 The Non-linear SVM Concept

Non-linear SVM uses the Kernel Trick to turn a Non-linear problem in the original space into a Linear problem in a high-dimensional Feature Space

%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#282828', 'primaryTextColor': '#ebdbb2', 'primaryBorderColor': '#d79921', 'lineColor': '#d65d0e', 'background': '#1d2021', 'mainBkg': '#282828'}}}%%
flowchart LR
    A["Input Space ℝⁿ
Raw data x"] -->|"Kernel K(xᵢ,xⱼ)"| B["Dual Optimization
Solve for α"]
    B --> C["Decision Function
f(x) = Σ αᵢyᵢK(xᵢ,x)+b"]
    C --> D["Classification
y = sign(f(x))"]

    style A fill:#458588,color:#ebdbb2,stroke:#076678
    style B fill:#d79921,color:#282828,stroke:#b57614
    style C fill:#98971a,color:#ebdbb2,stroke:#79740e
    style D fill:#cc241d,color:#ebdbb2,stroke:#9d0006

3.2 Dual Formulation of SVM

The Dual Problem of the Soft Margin SVM has the form:

\underset{α}{maximize} \sum_{i = 1}^{n} α_{i} - \frac{1}{2} \sum_{i = 1}^{n} \sum_{j = 1}^{n} α_{i} α_{j} y_{i} y_{j} K (x_{i}, x_{j})

subject to: 0 \leq α_{i} \leq C, \sum_{i = 1}^{n} α_{i} y_{i} = 0

Decision Function:

f (x) = \sum_{i = 1}^{n} α_{i} y_{i} K (x_{i}, x) + b

where αᵢ > 0 only for the Support Vectors, so a prediction depends on the Support Vectors alone
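The Decision Function above can be evaluated directly once α and b are known. The sketch below uses a hypothetical two-point problem with a linear kernel, for which α = [1, 1] and b = -1 satisfy Σ αᵢyᵢ = 0 and place both points exactly on the margin; the function names are illustrative.

```python
def linear_kernel(x, z):
    return sum(a * b for a, b in zip(x, z))

def decision_function(x, support_vectors, labels, alphas, b, kernel=linear_kernel):
    """f(x) = sum_i alpha_i * y_i * K(x_i, x) + b, summed over the support vectors only."""
    return sum(a * y * kernel(sv, x)
               for sv, y, a in zip(support_vectors, labels, alphas)) + b

# Hypothetical problem: x+ = (1, 1) with y = +1, x- = (0, 0) with y = -1.
svs, ys, alphas, b = [[1, 1], [0, 0]], [+1, -1], [1.0, 1.0], -1.0

for x in ([1, 1], [0, 0], [2, 0]):
    f = decision_function(x, svs, ys, alphas, b)
    print(x, f, "+1" if f >= 0 else "-1")
# [1, 1] 1.0 +1
# [0, 0] -1.0 -1
# [2, 0] 1.0 +1
```

Swapping `linear_kernel` for an RBF or Polynomial kernel changes the boundary without changing this function, which is how the Kernel Trick enters the prediction step.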

3.3 SVR: Support Vector Regression

SVM can also be applied to Regression through Support Vector Regression (SVR)

ε-insensitive loss: L_{ε} (y, f (x)) = \max (0, | y - f (x) | - ε), i.e. the loss is 0 when | y - f (x) | \leq ε and | y - f (x) | - ε otherwise
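A minimal sketch of the ε-insensitive loss for a single prediction; the function name and the sample values are illustrative assumptions.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """max(0, |y - f(x)| - epsilon): errors inside the epsilon tube cost nothing."""
    return max(0.0, abs(y_true - y_pred) - epsilon)

print(epsilon_insensitive_loss(10.0, 10.3, epsilon=0.5))  # 0.0 (inside the tube)
print(epsilon_insensitive_loss(10.0, 12.0, epsilon=0.5))  # 1.5 (2.0 - 0.5)
```

Only points with non-zero loss, or lying exactly on the tube boundary, become Support Vectors in SVR, so a larger ε generally yields fewer of them.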

3.4 Python Code Example: Non-linear SVM on a Real Dataset

"""
Non-linear SVM on the Breast Cancer Dataset
Compares Kernels and performs Hyperparameter Tuning
"""

import numpy as np
from sklearn.svm import SVC, SVR
from sklearn.datasets import load_breast_cancer, load_diabetes
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score
from sklearn.metrics import (classification_report, confusion_matrix,
                              roc_auc_score, roc_curve)
import matplotlib.pyplot as plt
import seaborn as sns

# ===== Load the Breast Cancer dataset =====
data = load_breast_cancer()
X, y = data.data, data.target
feature_names = data.feature_names
target_names = data.target_names

print("Dataset: Breast Cancer Wisconsin")
print(f"Number of samples: {X.shape[0]}")
print(f"Number of Features: {X.shape[1]}")
print(f"Classes: {target_names} ({np.bincount(y)})")

# ===== Split the data and standardize =====
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)

# ===== Hyperparameter Tuning with GridSearchCV =====
print("\n🔍 Running Hyperparameter Tuning...")

param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': ['scale', 'auto', 0.001, 0.01, 0.1],
    'kernel': ['rbf', 'poly']
}

svm_grid = SVC(probability=True, random_state=42)
grid_search = GridSearchCV(
    svm_grid,
    param_grid,
    cv=5,              # 5-Fold Cross-Validation
    scoring='roc_auc', # ประเมินด้วย AUC-ROC
    n_jobs=-1,
    verbose=0
)
grid_search.fit(X_train_s, y_train)

print(f"\n✅ Best Parameters: {grid_search.best_params_}")
print(f"✅ Best CV AUC-ROC: {grid_search.best_score_:.4f}")

# ===== Evaluate the Best Model =====
best_model = grid_search.best_estimator_
y_pred = best_model.predict(X_test_s)
y_prob = best_model.predict_proba(X_test_s)[:, 1]

print(f"\n{'='*60}")
print("Evaluation results (Test Set)")
print(f"{'='*60}")
print(classification_report(y_test, y_pred,
                             target_names=['Malignant', 'Benign']))
print(f"AUC-ROC: {roc_auc_score(y_test, y_prob):.4f}")

# ===== Plot the Confusion Matrix and ROC Curve =====
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Confusion Matrix
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt='d', cmap='YlOrRd', ax=axes[0],
            xticklabels=['Malignant', 'Benign'],
            yticklabels=['Malignant', 'Benign'])
best_kernel = grid_search.best_params_['kernel']  # 'rbf' or 'poly', whichever won the search
axes[0].set_title(f'Confusion Matrix\n(Non-linear SVM, {best_kernel} kernel)', fontsize=12)
axes[0].set_ylabel('True Label')
axes[0].set_xlabel('Predicted Label')

# ROC Curve
fpr, tpr, _ = roc_curve(y_test, y_prob)
auc_score = roc_auc_score(y_test, y_prob)
axes[1].plot(fpr, tpr, color='#d79921', lw=2.5,
             label=f'SVM (AUC = {auc_score:.4f})')
axes[1].plot([0, 1], [0, 1], 'k--', lw=1, label='Random (AUC = 0.50)')
axes[1].fill_between(fpr, tpr, alpha=0.2, color='#d79921')
axes[1].set_xlabel('False Positive Rate')
axes[1].set_ylabel('True Positive Rate')
axes[1].set_title('ROC Curve\n(Non-linear SVM)', fontsize=12)
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('nonlinear_svm_results.png', dpi=150, bbox_inches='tight')
plt.show()

# ===== Cross-validation as an additional check (on the training set used for tuning) =====
cv_scores = cross_val_score(best_model, X_train_s, y_train, cv=10, scoring='accuracy')
print(f"\n10-Fold CV Accuracy: {cv_scores.mean():.4f} ± {cv_scores.std():.4f}")

# ===== Support Vector Regression example =====
print(f"\n{'='*60}")
print("SVR (Support Vector Regression) example")
print(f"{'='*60}")

# Load the Diabetes dataset
diabetes = load_diabetes()
Xr, yr = diabetes.data, diabetes.target

Xr_train, Xr_test, yr_train, yr_test = train_test_split(
    Xr, yr, test_size=0.2, random_state=42
)
scaler_r = StandardScaler()
Xr_train_s = scaler_r.fit_transform(Xr_train)
Xr_test_s = scaler_r.transform(Xr_test)

# Build the SVR model
svr = SVR(kernel='rbf', C=100, gamma='scale', epsilon=10)
svr.fit(Xr_train_s, yr_train)

r2 = svr.score(Xr_test_s, yr_test)
print(f"SVR R² Score (Diabetes Dataset): {r2:.4f}")
# len(support_) counts the Support Vectors and does not rely on n_support_ being exposed for SVR
print(f"Number of Support Vectors: {len(svr.support_)}")

3.5 Effect of C and γ on the Decision Boundary

%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#282828', 'primaryTextColor': '#ebdbb2', 'primaryBorderColor': '#d79921', 'lineColor': '#d65d0e', 'background': '#1d2021', 'mainBkg': '#282828'}}}%%
graph TD
    subgraph C_EFFECT["Effect of C (Regularization)"]
        C1["Small C
Wide Margin
Tolerates Error
→ Underfitting"]
        C2["Well-chosen C
Balanced
Bias-Variance"]
        C3["Large C
Narrow Margin
Penalizes Error heavily
→ Overfitting"]
        C1 --> C2 --> C3
    end

    subgraph G_EFFECT["Effect of γ (RBF Kernel)"]
        G1["Small γ
Far-reaching influence
Smooth Boundary
→ Underfitting"]
        G2["Well-chosen γ
Balanced"]
        G3["Large γ
Local influence
Complex Boundary
→ Overfitting"]
        G1 --> G2 --> G3
    end

    style C1 fill:#cc241d,color:#ebdbb2,stroke:#9d0006
    style C2 fill:#98971a,color:#ebdbb2,stroke:#79740e
    style C3 fill:#cc241d,color:#ebdbb2,stroke:#9d0006
    style G1 fill:#cc241d,color:#ebdbb2,stroke:#9d0006
    style G2 fill:#98971a,color:#ebdbb2,stroke:#79740e
    style G3 fill:#cc241d,color:#ebdbb2,stroke:#9d0006
    style C_EFFECT fill:#3c3836,stroke:#d79921,color:#ebdbb2
    style G_EFFECT fill:#3c3836,stroke:#d79921,color:#ebdbb2

4 Multi-class Classification

4.1 The Challenge of Multi-class Classification

SVM is inherently designed for Binary Classification (2 classes), so extending it to Multi-class problems requires an additional strategy

Strategy	Method	Number of models (K classes)	Advantages	Disadvantages
One-vs-Rest (OvR)	Train K models, each separating 1 class from all the others	K	Fast, low memory use	Each binary problem is class-imbalanced
One-vs-One (OvO)	Train one model per pair of classes and Vote	K(K-1)/2	Often more accurate	Slower, many models
DAGSVM	Arrange the pairwise decisions in a DAG	K(K-1)/2	Efficient prediction	More complex
Crammer-Singer	Solve a single joint Optimization problem	1	Principled approach	Expensive to compute

4.2 One-vs-Rest (OvR) Strategy

Procedure:

For each class k: build a binary SVM with class k labeled +1 and all other classes labeled -1
Train K models
Predict by choosing the class with the highest Decision Function value

\hat{y} = \underset{k \in \{1, ..., K\}}{argmax} f_{k} (x)
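The argmax rule can be sketched in a few lines. The decision values below are hypothetical outputs of K = 3 binary models for one sample, not results from a fitted model.

```python
def predict_ovr(scores):
    """Pick the class whose one-vs-rest model returns the largest decision value f_k(x)."""
    return max(scores, key=scores.get)

# Hypothetical decision values f_k(x) from three one-vs-rest models.
scores = {"setosa": -1.2, "versicolor": 0.4, "virginica": 0.9}
print(predict_ovr(scores))  # virginica
```

Note that two models return a positive value here; comparing the raw decision values resolves such overlaps, which is why OvR uses argmax rather than the sign of each model.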

4.3 One-vs-One (OvO) Strategy

Procedure:

Build one model for every pair of classes (i, j): K(K-1)/2 models
Each model casts a vote for the class it predicts
The class with the most Votes is the final prediction

%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#282828', 'primaryTextColor': '#ebdbb2', 'primaryBorderColor': '#d79921', 'lineColor': '#d65d0e', 'background': '#1d2021', 'mainBkg': '#282828'}}}%%
graph TD
    X["New sample x"] --> M12["SVM(class 1 vs class 2)
→ class 1 ✓"]
    X --> M13["SVM(class 1 vs class 3)
→ class 3 ✓"]
    X --> M14["SVM(class 1 vs class 4)
→ class 1 ✓"]
    X --> M23["SVM(class 2 vs class 3)
→ class 3 ✓"]
    X --> M24["SVM(class 2 vs class 4)
→ class 2 ✓"]
    X --> M34["SVM(class 3 vs class 4)
→ class 3 ✓"]

    M12 --> VOTE["Vote count:
class 1: 2 votes
class 2: 1 vote
class 3: 3 votes ⭐
class 4: 0 votes"]
    M13 --> VOTE
    M14 --> VOTE
    M23 --> VOTE
    M24 --> VOTE
    M34 --> VOTE

    VOTE --> RESULT["Result: class 3"]

    style X fill:#458588,color:#ebdbb2,stroke:#076678
    style VOTE fill:#d79921,color:#282828,stroke:#b57614
    style RESULT fill:#98971a,color:#ebdbb2,stroke:#79740e
    style M12 fill:#3c3836,color:#ebdbb2,stroke:#504945
    style M13 fill:#3c3836,color:#ebdbb2,stroke:#504945
    style M14 fill:#3c3836,color:#ebdbb2,stroke:#504945
    style M23 fill:#3c3836,color:#ebdbb2,stroke:#504945
    style M24 fill:#3c3836,color:#ebdbb2,stroke:#504945
    style M34 fill:#3c3836,color:#ebdbb2,stroke:#504945
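The vote count in the diagram can be reproduced with a short sketch. The pairwise winners are copied from the diagram; the function name `predict_ovo` and the tie-breaking rule (first class in the list wins) are assumptions made for this illustration.

```python
from collections import Counter

def predict_ovo(pairwise_winners, classes):
    """Count one vote per pairwise model; ties go to the class listed first in `classes`."""
    votes = Counter({c: 0 for c in classes})
    votes.update(pairwise_winners.values())
    best = max(classes, key=lambda c: votes[c])
    return best, dict(votes)

# Winners of the K(K-1)/2 = 6 pairwise models for K = 4 classes.
winners = {(1, 2): 1, (1, 3): 3, (1, 4): 1, (2, 3): 3, (2, 4): 2, (3, 4): 3}
print(predict_ovo(winners, classes=[1, 2, 3, 4]))
# (3, {1: 2, 2: 1, 3: 3, 4: 0})
```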

4.4 Python Code Example: Multi-class SVM on the Iris Dataset

"""
Multi-class SVM on the Iris Dataset
Compares the OvO and OvR Strategies
"""

import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler, label_binarize
from sklearn.model_selection import train_test_split, StratifiedKFold, cross_val_score
from sklearn.metrics import (classification_report, confusion_matrix,
                              ConfusionMatrixDisplay)
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier
import pandas as pd

# ===== Load the Iris dataset =====
iris = load_iris()
X, y = iris.data, iris.target
class_names = iris.target_names  # ['setosa', 'versicolor', 'virginica']

print("Dataset: Iris")
print(f"Number of samples: {X.shape[0]}")
print(f"Number of Features: {X.shape[1]} ({', '.join(iris.feature_names)})")
print(f"Number of classes: {len(class_names)} ({', '.join(class_names)})")
print(f"Class distribution: {np.bincount(y)}")

# ===== Split the data and standardize =====
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)

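# ===== Strategies to compare =====
# Note: SVC always trains pairwise (OvO) models internally. decision_function_shape
# only changes the shape of the decision_function output ('ovr' is the default),
# so the first two entries are the same model. True OvR training needs OneVsRestClassifier.
strategies = {
    "SVC (ovo shape)": SVC(
        kernel='rbf', C=10, gamma='scale', decision_function_shape='ovo', random_state=42
    ),
    "SVC (ovr shape, default)": SVC(
        kernel='rbf', C=10, gamma='scale', decision_function_shape='ovr', random_state=42
    ),
    "OvO (explicit)": OneVsOneClassifier(
        SVC(kernel='rbf', C=10, gamma='scale', random_state=42)
    ),
    "OvR (explicit)": OneVsRestClassifier(
        SVC(kernel='rbf', C=10, gamma='scale', probability=True, random_state=42)
    ),
}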

# ===== Compare all strategies =====
print(f"\n{'='*65}")
print("Multi-class SVM strategy comparison")
print(f"{'Strategy':<25} {'Train Acc':>10} {'Test Acc':>10} {'CV Mean':>10} {'CV Std':>8}")
print('-'*65)

results_mc = {}
for name, model in strategies.items():
    model.fit(X_train_s, y_train)
    train_acc = model.score(X_train_s, y_train)
    test_acc = model.score(X_test_s, y_test)
    
    # Cross-validation
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
    cv_scores = cross_val_score(model, X_train_s, y_train, cv=cv, scoring='accuracy')
    
    print(f"{name:<25} {train_acc:>10.4f} {test_acc:>10.4f} {cv_scores.mean():>10.4f} {cv_scores.std():>8.4f}")
    results_mc[name] = {'model': model, 'test_acc': test_acc}

# ===== Detailed Classification Report =====
best_model_name = max(results_mc, key=lambda k: results_mc[k]['test_acc'])
best_mc_model = results_mc[best_model_name]['model']
y_pred_best = best_mc_model.predict(X_test_s)

print(f"\n{'='*65}")
print(f"Classification Report — {best_model_name}")
print(classification_report(y_test, y_pred_best, target_names=class_names))

# ===== Confusion Matrix =====
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Standard Confusion Matrix
cm = confusion_matrix(y_test, y_pred_best)
ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=class_names).plot(
    ax=axes[0], colorbar=False, cmap='YlOrRd'
)
axes[0].set_title(f'Confusion Matrix\n{best_model_name}', fontsize=12)

# Decision Boundary (Features 0 and 2, where the 3 classes are clearly visible)
X_2d = X_test_s[:, [0, 2]]
X_train_2d = X_train_s[:, [0, 2]]

model_2d = SVC(kernel='rbf', C=10, gamma='scale', random_state=42)
model_2d.fit(X_train_2d, y_train)

x_min, x_max = X_train_2d[:, 0].min() - 0.5, X_train_2d[:, 0].max() + 0.5
y_min2, y_max2 = X_train_2d[:, 1].min() - 0.5, X_train_2d[:, 1].max() + 0.5
xx2, yy2 = np.meshgrid(np.linspace(x_min, x_max, 300),
                        np.linspace(y_min2, y_max2, 300))
Z2 = model_2d.predict(np.c_[xx2.ravel(), yy2.ravel()]).reshape(xx2.shape)

colors_map = {0: '#cc241d', 1: '#98971a', 2: '#458588'}
colors_light = {0: '#cc241d33', 1: '#98971a33', 2: '#45858833'}

axes[1].contourf(xx2, yy2, Z2, alpha=0.3,
                 levels=[-0.5, 0.5, 1.5, 2.5],  # one filled band per class label 0, 1, 2
                 colors=[colors_light[0], colors_light[1], colors_light[2]])

for class_idx, class_name in enumerate(class_names):
    mask = y_test == class_idx
    axes[1].scatter(X_2d[mask, 0], X_2d[mask, 1],
                    c=colors_map[class_idx], label=class_name,
                    s=60, edgecolors='k', linewidth=0.5)

axes[1].set_xlabel('Sepal Length (scaled)')
axes[1].set_ylabel('Petal Length (scaled)')
axes[1].set_title('Decision Boundary (OvO SVM RBF)\n2 Features', fontsize=12)
axes[1].legend(fontsize=9)
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('multiclass_svm_iris.png', dpi=150, bbox_inches='tight')
plt.show()

# ===== Predict new samples =====
print(f"\n{'='*65}")
print("Example predictions for new samples")
print(f"{'='*65}")

X_new = np.array([
    [5.1, 3.5, 1.4, 0.2],  # likely setosa
    [6.0, 2.7, 5.1, 1.6],  # likely versicolor (borderline with virginica)
    [6.9, 3.1, 5.4, 2.1],  # likely virginica
])
X_new_s = scaler.transform(X_new)
y_new_pred = best_mc_model.predict(X_new_s)

for i, (x, pred) in enumerate(zip(X_new, y_new_pred)):
    print(f"Sample {i+1}: {x} → predicted: {class_names[pred]}")

Expected output (illustrative; exact numbers depend on the scikit-learn version, and the second sample lies close to the versicolor/virginica boundary):

=================================================================
Multi-class SVM strategy comparison
Strategy                   Train Acc   Test Acc    CV Mean   CV Std
-----------------------------------------------------------------
SVC (ovo shape)               1.0000     0.9778     0.9810   0.0209
SVC (ovr shape, default)      1.0000     0.9778     0.9810   0.0209
OvO (explicit)                1.0000     0.9778     0.9810   0.0209
OvR (explicit)                1.0000     0.9778     0.9810   0.0209

Example predictions:
Sample 1: [5.1 3.5 1.4 0.2] → predicted: setosa
Sample 2: [6.  2.7 5.1 1.6] → predicted: versicolor
Sample 3: [6.9 3.1 5.4 2.1] → predicted: virginica

The two SVC rows are identical by construction, because decision_function_shape does not change how SVC is trained.

4.5 Choosing an Appropriate Multi-class Strategy

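"""
Guidelines for choosing a strategy based on the characteristics of the data
"""

import time

import numpy as np
from sklearn.datasets import load_digits
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Load the Digits dataset (10 classes: 0-9)
digits = load_digits()
X_d, y_d = digits.data, digits.target

print("Dataset: Digits")
print(f"Samples: {X_d.shape[0]}, Features: {X_d.shape[1]}, Classes: {len(np.unique(y_d))}")

# Split first, then fit the scaler on the training set only to avoid leakage
X_d_train, X_d_test, y_d_train, y_d_test = train_test_split(
    X_d, y_d, test_size=0.2, random_state=42, stratify=y_d
)
scaler_d = StandardScaler()
X_d_train = scaler_d.fit_transform(X_d_train)
X_d_test = scaler_d.transform(X_d_test)

# Compare OvO vs OvR in terms of training time.
# SVC always trains one-vs-one internally; decision_function_shape only changes
# the shape of decision_function's output. To compare the two training
# strategies, wrap a binary SVC in the explicit meta-estimators instead.
strategies = {'OvO': OneVsOneClassifier, 'OvR': OneVsRestClassifier}

print(f"\n{'Strategy':<15} {'Test Acc':>10} {'Fit time (s)':>14}")
print('-'*42)

for name, wrapper in strategies.items():
    model = wrapper(SVC(kernel='rbf', C=10, gamma='scale'))
    start = time.perf_counter()
    model.fit(X_d_train, y_d_train)
    elapsed = time.perf_counter() - start  # time the fit only, not scoring
    acc = model.score(X_d_test, y_d_test)
    print(f"{name:<15} {acc:>10.4f} {elapsed:>14.3f}")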

# ===== Number of binary models: OvO vs OvR =====
K = 10  # number of classes
n_ovo = K * (K - 1) // 2  # OvO: one model per pair of classes
n_ovr = K                  # OvR: one model per class

print(f"\nFor {K} classes:")
print(f"OvO needs {n_ovo} models")
print(f"OvR needs {n_ovr} models")
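
One scikit-learn detail is worth checking when comparing strategies: in `SVC`, the `decision_function_shape` parameter changes only the shape of the decision scores, while training is always one-vs-one internally. The sketch below illustrates this on the Digits data, assuming the default `break_ties=False`. The variable names (`svc_ovo`, `svc_ovr`, `shape_ovo`, `shape_ovr`, `same_predictions`) are chosen here for illustration only.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X = StandardScaler().fit_transform(X)

# Same model, different decision_function_shape
svc_ovo = SVC(kernel='rbf', C=10, gamma='scale', decision_function_shape='ovo').fit(X, y)
svc_ovr = SVC(kernel='rbf', C=10, gamma='scale', decision_function_shape='ovr').fit(X, y)

# 'ovo' exposes one score per class pair: 10 * 9 / 2 = 45 columns
shape_ovo = svc_ovo.decision_function(X[:5]).shape
# 'ovr' aggregates the pairwise scores into one column per class: 10 columns
shape_ovr = svc_ovr.decision_function(X[:5]).shape

# With break_ties=False, predict() uses the same pairwise voting in both cases
same_predictions = np.array_equal(svc_ovo.predict(X), svc_ovr.predict(X))

print(shape_ovo, shape_ovr, same_predictions)  # (5, 45) (5, 10) True
```

To actually train K one-vs-rest models, the explicit `OneVsRestClassifier` wrapper from `sklearn.multiclass` is the usual route.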

5 History and Development of SVM

%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#282828', 'primaryTextColor': '#ebdbb2', 'primaryBorderColor': '#d79921', 'lineColor': '#d65d0e', 'background': '#1d2021', 'mainBkg': '#282828'}}}%%
flowchart TD
    subgraph ERA1["Foundations: 1909-1980s"]
        A1["1960s-1970s: Vapnik, Lerner & Chervonenkis
Early maximum-margin work and VC theory
Theoretical basis of SVM"]
        A2["1909: Mercer's Theorem
Conditions for a valid kernel"]
    end

    subgraph ERA2["Birth of SVM: 1990s"]
        B1["1992: Boser, Guyon, Vapnik
Hard-margin SVM + kernel trick
(COLT workshop)"]
        B2["1995: Cortes & Vapnik
Soft-margin SVM (C-SVM)
Handles noisy data"]
        B3["1996: Drucker et al.
Support Vector Regression (SVR)
ν-SVM followed in 2000 (Schölkopf et al.)"]
    end

    subgraph ERA3["Boom years: late 1990s-2000s"]
        C1["1999: Platt Scaling
Maps SVM outputs to probabilities"]
        C2["2000: LIBSVM development begins
Widely used implementation"]
        C3["2001: One-class SVM
For anomaly detection"]
    end

    subgraph ERA4["Present day: 2010s onward"]
        D1["2010s: Deep learning rises
SVM remains strong on small datasets"]
        D2["Today: SVM in scikit-learn
Combining SVM with deep features"]
    end

    A1 --> B1
    A2 --> B1
    B1 --> B2
    B2 --> B3
    B3 --> C1
    C1 --> C2
    C2 --> C3
    C3 --> D1
    D1 --> D2

    style ERA1 fill:#3c3836,stroke:#d79921,color:#ebdbb2
    style ERA2 fill:#3c3836,stroke:#458588,color:#ebdbb2
    style ERA3 fill:#3c3836,stroke:#98971a,color:#ebdbb2
    style ERA4 fill:#3c3836,stroke:#cc241d,color:#ebdbb2
    style A1 fill:#282828,color:#ebdbb2,stroke:#d79921
    style A2 fill:#282828,color:#ebdbb2,stroke:#d79921
    style B1 fill:#282828,color:#ebdbb2,stroke:#458588
    style B2 fill:#282828,color:#ebdbb2,stroke:#458588
    style B3 fill:#282828,color:#ebdbb2,stroke:#458588
    style C1 fill:#282828,color:#ebdbb2,stroke:#98971a
    style C2 fill:#282828,color:#ebdbb2,stroke:#98971a
    style C3 fill:#282828,color:#ebdbb2,stroke:#98971a
    style D1 fill:#282828,color:#ebdbb2,stroke:#cc241d
    style D2 fill:#282828,color:#ebdbb2,stroke:#cc241d

6 Advantages, Disadvantages, and When to Use SVM

6.1 Advantages of SVM

Strong theoretical foundation: backed by VC-dimension theory, which gives margin-based bounds on generalization error
Effective in high-dimensional spaces: remains effective even when the number of features exceeds the number of samples
Memory efficient: the decision function uses only the support vectors, so prediction does not need the full training set
Versatile: different kernel functions can be chosen to suit the data
Global optimum: training is a convex optimization problem, so any local optimum is also the global optimum
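
The memory-efficiency point can be checked directly on a fitted model. The sketch below is a minimal illustration on the Iris data, assuming an RBF kernel with default-like settings. It counts how many training points end up as support vectors. The variable names `model` and `n_sv` are illustrative only.

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)

model = SVC(kernel='rbf', C=1.0, gamma='scale').fit(X, y)

# n_support_ holds the number of support vectors per class
n_sv = int(model.n_support_.sum())
print(f"{n_sv} of {X.shape[0]} training points are support vectors")

# Only these rows are kept for prediction
print(model.support_vectors_.shape)
```

The fraction of support vectors depends on C, the kernel, and how much the classes overlap, so the saving can be small on noisy data.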

6.2 Disadvantages of SVM

Slow on large datasets: training a kernel SVM scales roughly between O(n²) and O(n³) in the number of samples
Sensitive to feature scaling: features should be standardized or normalized before training, because margins and kernels depend on distances
Kernel selection is hard: the kernel and its hyperparameters (such as C and gamma) must be tuned carefully
No direct probability output: an extra calibration step such as Platt scaling is needed to obtain probabilities
Hard to interpret: with non-linear kernels the model is difficult to explain to a non-technical audience
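
Two of these drawbacks, scaling sensitivity and the lack of probabilities, are commonly handled together in a pipeline. The sketch below is one possible setup rather than the only one. It standardizes features inside the pipeline and applies Platt scaling through `CalibratedClassifierCV` with `method='sigmoid'`. The dataset, split, and variable names (`clf`, `proba`) are assumptions made for illustration.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# The scaler is fitted on training data only because it lives inside the pipeline.
# method='sigmoid' fits a sigmoid to the SVM decision scores (Platt scaling).
clf = make_pipeline(
    StandardScaler(),
    CalibratedClassifierCV(SVC(kernel='rbf', C=1.0, gamma='scale'),
                           method='sigmoid', cv=5),
)
clf.fit(X_train, y_train)

proba = clf.predict_proba(X_test[:3])  # one row per sample, one column per class
print(proba.round(3))
```

Calibration adds extra cross-validated fits, so it increases training cost on top of the SVM itself.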

6.3 Guidelines for Using SVM

"""
Guidelines for choosing a kernel and parameters for SVM
"""

def recommend_svm_config(n_samples, n_features, task_type, has_noise):
    """
    Recommend an SVM configuration from simple rules of thumb.
    
    Parameters:
        n_samples (int): number of samples
        n_features (int): number of features
        task_type (str): 'classification' or 'regression'
        has_noise (bool): whether the data has substantial noise or outliers
    
    Returns:
        dict: the recommended SVM settings
    """
    config = {}
    
    # Choose the kernel
    if n_features > n_samples:
        config['kernel'] = 'linear'  # High-dimensional: linear is usually enough
        config['reason_kernel'] = 'Features > samples: a linear kernel is usually enough'
    elif n_samples < 10000:
        config['kernel'] = 'rbf'     # Small-to-medium dataset: RBF is a good default
        config['reason_kernel'] = 'Small-to-medium dataset: RBF is usually a good default'
    else:
        config['kernel'] = 'linear'  # Large dataset: linear trains faster
        config['reason_kernel'] = 'Large dataset: linear is faster and often still accurate'
    
    # Choose the range of C to search
    if has_noise:
        config['C_range'] = [0.1, 1, 10]   # Smaller C tolerates noise better
        config['reason_C'] = 'Noisy data: smaller C gives a softer margin'
    else:
        config['C_range'] = [1, 10, 100]    # Larger C suits clean data
        config['reason_C'] = 'Clean data: try C in the range 1-100'
    
    # Choose the model
    if task_type == 'classification':
        config['model'] = 'SVC'
    else:
        config['model'] = 'SVR'
    
    return config


# Try the function on a few scenarios
scenarios = [
    (500, 20, 'classification', True, "Breast Cancer with Noise"),
    (150, 4, 'classification', False, "Iris Dataset"),
    (10000, 100, 'classification', False, "Large Text Dataset"),
    (200, 500, 'classification', False, "High-dim Genomics"),
]

print("SVM configuration guidelines:")
print("="*70)
for n_s, n_f, task, noise, desc in scenarios:
    config = recommend_svm_config(n_s, n_f, task, noise)
    print(f"\nScenario: {desc}")
    print(f"  n_samples={n_s}, n_features={n_f}, noise={noise}")
    print(f"  → Model: {config['model']}")
    print(f"  → Kernel: {config['kernel']} ({config['reason_kernel']})")
    print(f"  → C range: {config['C_range']} ({config['reason_C']})")

Summary

Support Vector Machines (SVM) are among the most theoretically well-grounded machine learning algorithms. The core idea is to find the hyperplane that separates two classes with the widest possible margin.

Key Points to Remember

Linear SVM: finds the margin-maximizing hyperplane by solving a convex optimization problem. It comes in a hard-margin form (perfectly separable data) and a soft-margin form (noisy data), the latter controlled by the regularization parameter C
Kernel Trick: instead of explicitly mapping the data into a high-dimensional feature space, a kernel function K(x,z) computes the inner product in that space directly. Popular kernels are RBF, polynomial, and sigmoid
Non-linear SVM: through the kernel trick, handles data that is not linearly separable by optimizing the dual problem over the Lagrange multipliers αᵢ
Multi-class SVM: extends binary SVM to multiple classes with OvO (K(K-1)/2 models, each trained on two classes only) or OvR (K models, each trained on all the data). Which one is faster or more accurate depends on the dataset
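
The kernel trick point can be verified with a small worked example. The sketch below uses the degree-2 homogeneous polynomial kernel K(x, z) = (x · z)² in two dimensions, whose explicit feature map is φ(x) = (x₁², √2·x₁x₂, x₂²). The helper names `phi` and `poly2_kernel` are illustrative and not part of any library.

```python
import numpy as np

def phi(v):
    """Explicit feature map for the degree-2 homogeneous polynomial kernel in 2-D."""
    x1, x2 = v
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def poly2_kernel(x, z):
    """K(x, z) = (x . z)^2, computed without ever building phi(x)."""
    return float(np.dot(x, z)) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

explicit = float(np.dot(phi(x), phi(z)))  # inner product in the 3-D feature space
implicit = poly2_kernel(x, z)             # same value from the 2-D inputs

print(f"explicit: {explicit:.1f}, kernel: {implicit:.1f}")  # explicit: 121.0, kernel: 121.0
```

Here x · z = 11, so both routes give 121. For the RBF kernel the corresponding feature space is infinite-dimensional, which is why only the kernel route is practical.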

Relationship to Other Topics in the Course

%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#282828', 'primaryTextColor': '#ebdbb2', 'primaryBorderColor': '#d79921', 'lineColor': '#d65d0e', 'background': '#1d2021', 'mainBkg': '#282828'}}}%%
graph LR
    LR["Logistic Regression
(Weeks 5-6)
Discriminative Model"] -->|"adds margin maximization"| SVM

    DT["Decision Trees
(Week 7)
Complex boundaries"] -->|"compare non-linear boundaries"| SVM

    SVM["SVM
(Week 11)
Maximum Margin"]

    SVM -->|"Kernel → RKHS"| KM["Kernel Methods
Gaussian Processes"]
    SVM -->|"Soft Margin → Regularization"| LT["Learning Theory
(Week 14)
Bias-Variance Tradeoff"]
    SVM -->|"Feature Engineering"| DR["Dimensionality Reduction
(Week 13)
PCA, LDA"]

    style SVM fill:#d79921,color:#282828,stroke:#b57614
    style LR fill:#458588,color:#ebdbb2,stroke:#076678
    style DT fill:#458588,color:#ebdbb2,stroke:#076678
    style KM fill:#98971a,color:#ebdbb2,stroke:#79740e
    style LT fill:#98971a,color:#ebdbb2,stroke:#79740e
    style DR fill:#98971a,color:#ebdbb2,stroke:#79740e

Comparing SVM with Other Algorithms

Criterion	SVM	Logistic Regression	Neural Network	Random Forest
Small datasets	⭐⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐	⭐⭐⭐
Large datasets	⭐⭐	⭐⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐⭐
High dimensionality	⭐⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐	⭐⭐⭐
Noisy data	⭐⭐⭐⭐	⭐⭐⭐	⭐⭐⭐	⭐⭐⭐⭐
Training speed	⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐	⭐⭐⭐⭐
Prediction speed	⭐⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐
Interpretability	⭐⭐	⭐⭐⭐⭐	⭐	⭐⭐⭐
Hyperparameters to tune	Moderate	Few	Many	Moderate

References

Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297. The original soft-margin SVM paper.
Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers. Proceedings of the 5th Annual Workshop on Computational Learning Theory (COLT). The first paper on SVM with the kernel trick.
Schölkopf, B., & Smola, A. J. (2002). Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press. The main textbook on kernel methods.
Chang, C.-C., & Lin, C.-J. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2(3), 27:1–27:27. The LIBSVM library used by scikit-learn.
Pedregosa, F., et al. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830. The scikit-learn library paper.
Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer. (Chapter 7: Sparse Kernel Machines) A pattern recognition textbook that covers SVM.
Murphy, K. P. (2012). Machine Learning: A Probabilistic Perspective. MIT Press. (Chapter 14: Kernels) A probabilistic view of kernel methods.
scikit-learn Documentation, SVM User Guide: https://scikit-learn.org/stable/modules/svm.html