Comparing Parametric and Non-Parametric Learning Methods (Parametric vs Non-Parametric Learning)

1. Introduction

1.1 Why Classify Models

In machine learning, choosing a model that fits both the data and the problem being solved is critically important. Dividing models into parametric and non-parametric is one useful way to categorize them. It helps us reason about the basic properties of each algorithm: how many parameters it has, what it assumes about the data, and how its cost grows with the training set.

1.2 Overview of the Differences

flowchart TB
    subgraph title["Classifying Machine Learning Models"]
        direction TB
        ML["Machine Learning Models"]
        
        subgraph param["Parametric Models"]
            style param fill:#458588,stroke:#83a598,color:#ebdbb2
            P1["Fixed Number of Parameters"]
            P2["Strong Assumptions"]
            P3["Fast Training"]
        end
        
        subgraph nonparam["Non-Parametric Models"]
            style nonparam fill:#d65d0e,stroke:#fe8019,color:#ebdbb2
            N1["Parameters Grow with Data"]
            N2["Fewer Assumptions"]
            N3["High Flexibility"]
        end
        
        ML --> param
        ML --> nonparam
    end
    
    style ML fill:#b16286,stroke:#d3869b,color:#ebdbb2
    style title fill:#282828,stroke:#ebdbb2,color:#ebdbb2

1.3 History and Development

flowchart TB
    subgraph era1["Pioneering Era (1950s-1960s)"]
        style era1 fill:#98971a,stroke:#b8bb26,color:#ebdbb2
        A1["1957-58: Perceptron
Frank Rosenblatt"]
        A2["1951: Nearest-Neighbor Rule (k-NN)
Fix & Hodges"]
    end
    
    subgraph era2["Development Era (1970s-1980s)"]
        style era2 fill:#689d6a,stroke:#8ec07c,color:#ebdbb2
        B1["1984: CART
Breiman et al."]
        B2["1986: ID3 Algorithm
Quinlan"]
        B3["1986: Backpropagation
Rumelhart, Hinton & Williams"]
    end
    
    subgraph era3["Golden Era (1990s-2000s)"]
        style era3 fill:#458588,stroke:#83a598,color:#ebdbb2
        C1["1995: Random Decision Forests (Ho)
2001: Random Forest (Breiman)"]
        C2["1996: Bagging
Breiman"]
        C3["1999: Gradient Boosting
Friedman"]
    end
    
    subgraph era4["Modern Era (2010s+)"]
        style era4 fill:#b16286,stroke:#d3869b,color:#ebdbb2
        D1["XGBoost, LightGBM
CatBoost"]
        D2["Deep Learning
Integration"]
    end
    
    era1 --> era2 --> era3 --> era4

2. Parametric Models

2.1 Definition and Basic Concepts

A parametric model has a fixed number of parameters that does not change with the size of the training data. Models of this kind assume the form of the function to be learned in advance (for example, that the output is a weighted sum of the inputs).

Key characteristics:

The number of parameters is fixed before training begins
Makes assumptions about the data distribution or the form of the function
After training, the training data no longer needs to be stored
Model complexity does not grow with the size of the data

2.2 Examples of Parametric Models

flowchart TB
    subgraph parametric["Parametric Models"]
        style parametric fill:#458588,stroke:#83a598,color:#ebdbb2
        
        LR["Linear Regression"]
        LOG["Logistic Regression"]
        NB["Naive Bayes"]
        LDA["Linear Discriminant Analysis"]
        NN["Neural Networks"]
        
        LR --> |"y = wx + b"| param1["1 weight per feature
+ 1 bias"]
        LOG --> |"σ(wx + b)"| param2["1 weight per feature
+ 1 bias"]
        NB --> |"P(x|C)"| param3["Gaussian NB: mean, variance
per feature per class"]
        LDA --> |"π, μ, Σ"| param4["prior and mean per class,
shared covariance"]
        NN --> |"weights, biases"| param5["fixed by the network
architecture"]
    end
    
    style LR fill:#d65d0e,stroke:#fe8019,color:#ebdbb2
    style LOG fill:#d65d0e,stroke:#fe8019,color:#ebdbb2
    style NB fill:#d65d0e,stroke:#fe8019,color:#ebdbb2
    style LDA fill:#d65d0e,stroke:#fe8019,color:#ebdbb2
    style NN fill:#d65d0e,stroke:#fe8019,color:#ebdbb2

2.3 Mathematical Formulation

2.3.1 Linear Regression

Model equation:

y = w_{0} + \sum_{i = 1}^{n} w_{i} x_{i} = w^{T} x

Variables (the compact form w^{T} x assumes an extra constant feature x₀ = 1, so that w₀ is absorbed into w):

y = predicted value
w₀ = intercept (bias)
wᵢ = weight of feature i
xᵢ = value of feature i
n = total number of features

Cost function (Mean Squared Error):

J (w) = \frac{1}{2 m} \sum_{i = 1}^{m} {(w^{T} x^{(i)} - y^{(i)})}^{2}

Variables:

J(w) = cost function (half the mean squared error; the factor 1/2 only simplifies the gradient and does not change the minimizer)
m = number of training examples
x⁽ⁱ⁾ = the i-th training example
y⁽ⁱ⁾ = true value of the i-th example
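The two formulas above can be written directly in NumPy. This is a minimal sketch, assuming the x₀ = 1 convention and reusing the house data from the worked example in 2.4; the helper names `add_bias_column` and `mse_cost` are hypothetical, chosen only for this illustration.

```python
import numpy as np


def add_bias_column(X: np.ndarray) -> np.ndarray:
    """Prepend the constant feature x_0 = 1 so that w^T x includes the intercept w_0."""
    return np.hstack([np.ones((X.shape[0], 1)), X])


def mse_cost(w: np.ndarray, X_aug: np.ndarray, y: np.ndarray) -> float:
    """J(w) = 1/(2m) * sum_i (w^T x^(i) - y^(i))^2 for a bias-augmented X_aug."""
    m = X_aug.shape[0]
    residuals = X_aug @ w - y
    return float(residuals @ residuals) / (2 * m)


# House area (m^2) and price (million baht) from the worked example in 2.4
X = np.array([[50.0], [80.0], [100.0], [120.0]])
y = np.array([2.0, 3.2, 4.0, 4.8])
X_aug = add_bias_column(X)

cost_exact = mse_cost(np.array([0.0, 0.04]), X_aug, y)  # w_0 = 0, w_1 = 0.04: essentially 0
cost_zero = mse_cost(np.array([0.0, 0.0]), X_aug, y)    # all-zero weights: sum(y^2) / (2 * 4)
```

Because every point in that dataset lies on y = 0.04x, the cost at w = (0, 0.04) is zero up to floating-point rounding.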

2.3.2 Logistic Regression

Model equation:

P (y = 1 | x) = σ (w^{T} x) = \frac{1}{1 + e^{- w^{T} x}}

Variables:

P(y=1|x) = probability that the outcome is class 1 given x
σ(·) = sigmoid function
e = Euler's number (≈ 2.71828)
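A short sketch of how the logistic model turns w^T x into a probability. It assumes the same x₀ = 1 convention as linear regression; the weights w₀ = -4 and w₁ = 0.05 are hypothetical values picked for illustration (they put the P = 0.5 boundary at x = 80), not weights learned from data in this document. The sigmoid is split by sign so that `np.exp` never receives a large positive argument.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    """sigma(z) = 1 / (1 + e^{-z}), evaluated without overflow for large |z|."""
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    pos = z >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))  # exp(-z) <= 1 here
    exp_z = np.exp(z[~pos])                    # z < 0, so exp(z) < 1
    out[~pos] = exp_z / (1.0 + exp_z)
    return out


def predict_proba(w: np.ndarray, X: np.ndarray) -> np.ndarray:
    """P(y = 1 | x) = sigma(w^T x), with x_0 = 1 prepended for the intercept w_0."""
    X_aug = np.hstack([np.ones((X.shape[0], 1)), X])
    return sigmoid(X_aug @ w)


w = np.array([-4.0, 0.05])             # hypothetical w_0, w_1
X = np.array([[40.0], [60.0], [120.0]])
probs = predict_proba(w, X)            # sigma(-2), sigma(-1), sigma(2)
labels = (probs >= 0.5).astype(int)    # threshold the probability at 0.5
```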

2.4 Worked Example

Example: Linear Regression

Suppose we have house areas (square meters) and prices (million baht):

Area (x, m²)	Price (y, million baht)
50	2.0
80	3.2
100	4.0
120	4.8

Calculation steps:

Compute the means:
- x̄ = (50 + 80 + 100 + 120) / 4 = 87.5
- ȳ = (2.0 + 3.2 + 4.0 + 4.8) / 4 = 3.5
Compute w₁ (slope):

w_{1} = \frac{\sum (x_{i} - \bar{x}) (y_{i} - \bar{y})}{\sum {(x_{i} - \bar{x})}^{2}}

∑(xᵢ - x̄)(yᵢ - ȳ) = (-37.5)(-1.5) + (-7.5)(-0.3) + (12.5)(0.5) + (32.5)(1.3) = 56.25 + 2.25 + 6.25 + 42.25 = 107
∑(xᵢ - x̄)² = 37.5² + 7.5² + 12.5² + 32.5² = 1406.25 + 56.25 + 156.25 + 1056.25 = 2675
w₁ = 107 / 2675 = 0.04

Compute w₀ (intercept):
- w₀ = ȳ - w₁·x̄ = 3.5 - (0.04)(87.5) = 3.5 - 3.5 = 0
Resulting equation: y = 0.04x + 0 (all four data points lie exactly on this line)
Predict the price of a 90 m² house:
- y = 0.04 × 90 = 3.6 million baht
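The hand calculation above can be reproduced in a few lines. This sketch applies the same closed-form slope and intercept formulas, then cross-checks them with `np.polyfit`, which for degree 1 returns `[slope, intercept]`; `predict_price` is a hypothetical helper name.

```python
import numpy as np

x = np.array([50.0, 80.0, 100.0, 120.0])  # area (m^2)
y = np.array([2.0, 3.2, 4.0, 4.8])        # price (million baht)

x_mean, y_mean = x.mean(), y.mean()
# Closed-form least-squares slope and intercept for a single feature
w1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
w0 = y_mean - w1 * x_mean


def predict_price(area: float) -> float:
    """Price predicted by the fitted line y = w0 + w1 * x."""
    return w0 + w1 * area


# Cross-check with NumPy's polynomial fit (degree 1 returns [slope, intercept])
slope, intercept = np.polyfit(x, y, deg=1)
```

Floating-point rounding can leave w0 at a tiny value such as -4e-16 instead of exactly 0, so compare with a tolerance rather than with `==`.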

2.5 Python Code Example

"""
Parametric Model: Linear Regression

This code builds a parametric Linear Regression model,
which has a fixed number of parameters (weights and bias).
"""

import numpy as np
from typing import Tuple


class LinearRegressionParametric:
    """
    Parametric Linear Regression
    
    With one input feature this model has 2 parameters: a weight (w) and a bias (b).
    In general it has n_features + 1 parameters, and that count does not change
    no matter how many training examples there are.
    """
    
    def __init__(self, learning_rate: float = 0.01, n_iterations: int = 1000):
        """
        Set the initial values for the model
        
        Args:
            learning_rate: learning rate for gradient descent
            n_iterations: number of training iterations
        """
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations
        self.weights = None  # parameter w
        self.bias = None     # parameter b
        
    def fit(self, X: np.ndarray, y: np.ndarray) -> 'LinearRegressionParametric':
        """
        Train the model on the data
        
        Args:
            X: feature matrix of shape (n_samples, n_features)
            y: target values of shape (n_samples,)
            
        Returns:
            self: the trained model
        """
        n_samples, n_features = X.shape
        
        # Initialize the parameters
        # Number of parameters = n_features + 1 (fixed, independent of n_samples)
        self.weights = np.zeros(n_features)
        self.bias = 0.0
        
        # Gradient Descent
        for _ in range(self.n_iterations):
            # Prediction: y_pred = X·w + b
            y_pred = np.dot(X, self.weights) + self.bias
            
            # Gradients of J(w, b) = 1/(2n) Σ (y_pred - y)²
            dw = (1 / n_samples) * np.dot(X.T, (y_pred - y))
            db = (1 / n_samples) * np.sum(y_pred - y)
            
            # Update the parameters
            self.weights -= self.learning_rate * dw
            self.bias -= self.learning_rate * db
            
        return self
    
    def predict(self, X: np.ndarray) -> np.ndarray:
        """
        Predict values for new data
        
        After training, the training data is no longer needed;
        only the learned parameters are used.
        
        Args:
            X: data to predict
            
        Returns:
            predicted values
        """
        return np.dot(X, self.weights) + self.bias
    
    def get_params_count(self) -> int:
        """
        Count the total number of parameters
        
        Returns:
            number of parameters (fixed, independent of the training-set size)
        """
        if self.weights is None:
            return 0
        return len(self.weights) + 1  # weights + bias


# ===== Usage example =====

if __name__ == "__main__":
    # Sample data: house area -> price
    X_train = np.array([[50], [80], [100], [120]])
    y_train = np.array([2.0, 3.2, 4.0, 4.8])
    
    # Create and train the model.
    # The feature is not scaled (values up to 120), so the learning rate
    # must be small for gradient descent to stay stable.
    model = LinearRegressionParametric(learning_rate=0.0001, n_iterations=10000)
    model.fit(X_train, y_train)
    
    print("=" * 50)
    print("Linear Regression (Parametric Model)")
    print("=" * 50)
    
    # Show the learned parameters
    print("\nLearned parameters:")
    print(f"  - Weight (w): {model.weights[0]:.4f}")
    print(f"  - Bias (b): {model.bias:.4f}")
    print(f"  - Total number of parameters: {model.get_params_count()}")
    
    # Predict house prices
    X_test = np.array([[90], [150]])
    predictions = model.predict(X_test)
    
    print("\nPredictions:")
    for area, price in zip(X_test.flatten(), predictions):
        print(f"  - Area {area} m² -> price {price:.2f} million baht")
    
    # The number of parameters stays the same regardless of the data size
    print(f"\n📌 Note: number of parameters = {model.get_params_count()}")
    print("   (the same whether there are 4 or 4000 training examples)")

Output:

==================================================
Linear Regression (Parametric Model)
==================================================

Learned parameters:
  - Weight (w): 0.0400
  - Bias (b): 0.0004
  - Total number of parameters: 2

Predictions:
  - Area 90 m² -> price 3.60 million baht
  - Area 150 m² -> price 6.00 million baht

📌 Note: number of parameters = 2
   (the same whether there are 4 or 4000 training examples)

The learned bias is about 0.0004 rather than the exact 0 found in 2.4. Because the feature is not scaled, gradient descent converges very slowly along the intercept direction, and 10,000 iterations are not enough to remove the last bit of error. The predictions still agree with the hand calculation to two decimal places.

2.6 Advantages and Limitations

Aspect	Advantage	Limitation
Speed	Fast to train and to predict	May fail to capture complex patterns
Memory	Uses little memory	The model form must be chosen correctly
Interpretability	Easy to explain	May underfit if the assumptions are wrong
Generalization	Good when the assumptions hold	High bias if the model form is unsuitable
Small datasets	Works well with little data	Inflexible when new patterns appear

3. Non-Parametric Models

3.1 Definition and Basic Concepts

A non-parametric model does not fix the number of parameters in advance; the complexity of the model grows with the size of the training data.

Key characteristics:

The number of parameters (or stored values) grows with the amount of data
Makes no fixed assumption about the form of the function (it still relies on weaker assumptions, such as nearby points having similar outputs)
Must keep the training data (or part of it) in order to predict
Highly flexible in capturing complex patterns

3.2 k-Nearest Neighbors (k-NN)

3.2.1 How It Works

flowchart TB
    subgraph knn["k-NN Algorithm"]
        style knn fill:#458588,stroke:#83a598,color:#ebdbb2
        
        INPUT["New Point
to be predicted"]
        
        subgraph process["Process"]
            style process fill:#282828,stroke:#ebdbb2,color:#ebdbb2
            S1["1. Distance Calculation"]
            S2["2. Select k Neighbors"]
            S3["3. Aggregate Results"]
        end
        
        subgraph output["Output"]
            style output fill:#689d6a,stroke:#8ec07c,color:#ebdbb2
            CLASS["Classification:
majority vote of classes"]
            REG["Regression:
mean of the neighbors"]
        end
        
        INPUT --> S1
        S1 --> S2
        S2 --> S3
        S3 --> CLASS
        S3 --> REG
    end

3.2.2 Distance Formulas

Euclidean Distance:

d (x, y) = \sqrt{\sum_{i = 1}^{n} {(x_{i} - y_{i})}^{2}}

Manhattan Distance:

d (x, y) = \sum_{i = 1}^{n} | x_{i} - y_{i} |

Minkowski Distance (General Form):

d (x, y) = {(\sum_{i = 1}^{n} {| x_{i} - y_{i} |}^{p})}^{\frac{1}{p}}

Variables:

d(x, y) = distance between points x and y
xᵢ, yᵢ = value of feature i for points x and y
n = number of features
p = Minkowski parameter (p=1: Manhattan, p=2: Euclidean)
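Since Manhattan and Euclidean distance are special cases of Minkowski distance, one function covers all three formulas. A minimal sketch; `minkowski` is a hypothetical helper name, and it rejects p < 1 because the formula no longer satisfies the triangle inequality there.

```python
import numpy as np


def minkowski(x, y, p: float = 2.0) -> float:
    """d(x, y) = (sum_i |x_i - y_i|^p)^(1/p); p=1 is Manhattan, p=2 is Euclidean."""
    if p < 1:
        raise ValueError("p must be >= 1 for the Minkowski distance to be a metric")
    diff = np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    return float(np.sum(diff ** p) ** (1.0 / p))


a, b = [130, 15], [150, 12]
manhattan = minkowski(a, b, p=1)  # |-20| + |3| = 23
euclidean = minkowski(a, b, p=2)  # sqrt(400 + 9) ≈ 20.22
```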

3.2.3 Worked k-NN Example

Problem: classify fruit by weight (grams) and sweetness (°Brix)

Fruit	Weight (x₁, g)	Sweetness (x₂, °Brix)	Type
A	150	12	Apple
B	170	14	Apple
C	100	18	Orange
D	120	16	Orange
E	80	20	Orange

New fruit: weight = 130, sweetness = 15, k = 3

Calculation steps:

Compute the (Euclidean) distances:
- d(new, A) = √[(130-150)² + (15-12)²] = √[400 + 9] = √409 = 20.22
- d(new, B) = √[(130-170)² + (15-14)²] = √[1600 + 1] = √1601 = 40.01
- d(new, C) = √[(130-100)² + (15-18)²] = √[900 + 9] = √909 = 30.15
- d(new, D) = √[(130-120)² + (15-16)²] = √[100 + 1] = √101 = 10.05
- d(new, E) = √[(130-80)² + (15-20)²] = √[2500 + 25] = √2525 = 50.25
Sort and pick the k=3 nearest neighbors:
- D: 10.05 (orange)
- A: 20.22 (apple)
- C: 30.15 (orange)
Majority voting:
- Orange: 2 votes
- Apple: 1 vote
Result: the new fruit is classified as an orange
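The hand calculation above can be checked with a vectorized version of the same steps: all distances in one NumPy expression, `np.argsort` to pick the k nearest, and `collections.Counter` for the vote. This is a sketch on the table's data only; the English labels are translations of the table's class names.

```python
import numpy as np
from collections import Counter

X_train = np.array([[150, 12], [170, 14], [100, 18], [120, 16], [80, 20]], dtype=float)
y_train = np.array(["apple", "apple", "orange", "orange", "orange"])
names = np.array(["A", "B", "C", "D", "E"])
x_new = np.array([130, 15], dtype=float)
k = 3

# Euclidean distance from the new fruit to every training fruit at once
distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
nearest = np.argsort(distances)[:k]           # indices of the k closest fruits
prediction = Counter(y_train[nearest]).most_common(1)[0][0]

for i in nearest:
    print(f"{names[i]}: {distances[i]:.2f} ({y_train[i]})")
print("prediction:", prediction)
```

Weight is measured in grams and spans a much wider range than sweetness, so it dominates these distances; standardizing both features first would change the distances and could change which neighbors are selected.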

3.3 Kernel Regression

3.3.1 How It Works

Kernel Regression, also known as the Nadaraya-Watson estimator, is a non-parametric regression method. It uses a kernel function to weight each training point by its distance to the query point, then predicts the weighted average of the training targets.

Nadaraya-Watson equation:

\hat{y} (x) = \frac{\sum_{i = 1}^{n} K (\frac{x - x_{i}}{h}) y_{i}}{\sum_{i = 1}^{n} K (\frac{x - x_{i}}{h})}

Variables:

ŷ(x) = prediction at point x
K(·) = kernel function
h = bandwidth (controls the width of the kernel)
xᵢ, yᵢ = training data points
n = number of training examples

3.3.2 Types of Kernel Functions

flowchart TB
    subgraph kernels["Kernel Functions"]
        style kernels fill:#458588,stroke:#83a598,color:#ebdbb2
        
        subgraph gaussian["Gaussian (RBF)"]
            style gaussian fill:#d65d0e,stroke:#fe8019,color:#ebdbb2
            G["K(u) ∝ exp(-u²/2)"]
        end
        
        subgraph epan["Epanechnikov"]
            style epan fill:#98971a,stroke:#b8bb26,color:#ebdbb2
            E["K(u) = ¾(1-u²)
if |u| ≤ 1, else 0"]
        end
        
        subgraph uniform["Uniform"]
            style uniform fill:#689d6a,stroke:#8ec07c,color:#ebdbb2
            U["K(u) = ½
if |u| ≤ 1, else 0"]
        end
        
        subgraph tri["Triangular"]
            style tri fill:#b16286,stroke:#d3869b,color:#ebdbb2
            T["K(u) = (1-|u|)
if |u| ≤ 1, else 0"]
        end
    end

Gaussian kernel (normalized form):

K (u) = \frac{1}{\sqrt{2 \pi}} e^{- \frac{u^{2}}{2}}

In the Nadaraya-Watson estimator the constant 1/√(2π) appears in both the numerator and the denominator and cancels, so the unnormalized form exp(-u²/2) gives identical predictions.
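The four kernels in the diagram can be written as NumPy functions. This sketch evaluates each one on a fine grid and approximates its integral with a simple Riemann sum, as a quick check that each is a proper kernel (it integrates to about 1). The function names and the `KERNELS` dictionary are hypothetical.

```python
import numpy as np


def gaussian(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)


def epanechnikov(u):
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u ** 2), 0.0)


def uniform(u):
    return np.where(np.abs(u) <= 1, 0.5, 0.0)


def triangular(u):
    return np.where(np.abs(u) <= 1, 1 - np.abs(u), 0.0)


KERNELS = {"gaussian": gaussian, "epanechnikov": epanechnikov,
           "uniform": uniform, "triangular": triangular}

# Riemann-sum approximation of the integral of each kernel over [-10, 10]
u = np.linspace(-10, 10, 200_001)
du = u[1] - u[0]
areas = {name: float(np.sum(k(u)) * du) for name, k in KERNELS.items()}
```

Epanechnikov, uniform, and triangular are exactly zero outside |u| ≤ 1, so only training points within one bandwidth h of the query contribute; the Gaussian gives every point some weight.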

3.4 Python Code Example

"""
Non-Parametric Models: k-NN and Kernel Regression

This code shows that non-parametric models must keep the training data,
and that the number of stored values ("parameters") grows with the data size.
"""

import numpy as np
from collections import Counter
from typing import List, Tuple


class KNNClassifier:
    """
    k-Nearest Neighbors Classifier (Non-Parametric)
    
    This model has no weights to learn,
    but it must keep the entire training set.
    """
    
    def __init__(self, k: int = 3):
        """
        Set k (the number of neighbors)
        
        Args:
            k: number of neighbors used to make each decision
        """
        self.k = k
        self.X_train = None  # the training data must be stored
        self.y_train = None  # the labels must be stored
        
    def fit(self, X: np.ndarray, y: np.ndarray) -> 'KNNClassifier':
        """
        "Train" the model (in fact it only stores the data)
        
        k-NN is a lazy learner: it learns nothing up front,
        it just keeps the data for use at prediction time.
        
        Args:
            X: feature matrix
            y: labels
        """
        self.X_train = X.copy()
        self.y_train = y.copy()
        return self
    
    def _euclidean_distance(self, x1: np.ndarray, x2: np.ndarray) -> float:
        """
        Compute the Euclidean distance
        
        Args:
            x1, x2: the two points to compare
            
        Returns:
            distance between the two points
        """
        return np.sqrt(np.sum((x1 - x2) ** 2))
    
    def predict(self, X: np.ndarray) -> np.ndarray:
        """
        Predict the class of new data
        
        Every query is compared against every training point
        (which is why the training data must be kept).
        
        Args:
            X: data to predict
            
        Returns:
            predicted labels
        """
        predictions = []
        
        for x in X:
            # Distance to every point in the training set
            distances = [self._euclidean_distance(x, x_train) 
                        for x_train in self.X_train]
            
            # Indices and labels of the k nearest neighbors
            k_indices = np.argsort(distances)[:self.k]
            k_labels = self.y_train[k_indices]
            
            # Majority voting
            most_common = Counter(k_labels).most_common(1)
            predictions.append(most_common[0][0])
            
        return np.array(predictions)
    
    def get_memory_usage(self) -> int:
        """
        Count the stored "parameters" (really the stored training data)
        
        Returns:
            number of stored values
        """
        if self.X_train is None:
            return 0
        # Every feature of every example, plus the labels
        return self.X_train.size + self.y_train.size


class KernelRegression:
    """
    Nadaraya-Watson Kernel Regression (Non-Parametric)
    
    Uses a Gaussian kernel to weight the training points.
    Assumes a single input feature (X and X_train are 1-D arrays).
    """
    
    def __init__(self, bandwidth: float = 1.0):
        """
        Set the kernel bandwidth
        
        Args:
            bandwidth: width of the kernel (h)
        """
        self.bandwidth = bandwidth
        self.X_train = None
        self.y_train = None
        
    def fit(self, X: np.ndarray, y: np.ndarray) -> 'KernelRegression':
        """
        Store the training data
        """
        self.X_train = X.copy()
        self.y_train = y.copy()
        return self
    
    def _gaussian_kernel(self, u: np.ndarray) -> np.ndarray:
        """
        Gaussian kernel function
        
        K(u) = (1/√(2π)) * exp(-u²/2)
        """
        return (1 / np.sqrt(2 * np.pi)) * np.exp(-0.5 * u ** 2)
    
    def predict(self, X: np.ndarray) -> np.ndarray:
        """
        Predict with the Nadaraya-Watson estimator
        
        ŷ(x) = Σ K((x-xᵢ)/h) * yᵢ / Σ K((x-xᵢ)/h)
        """
        predictions = []
        
        for x in X:
            # Kernel weight of every training point relative to x
            u = (x - self.X_train) / self.bandwidth
            weights = self._gaussian_kernel(u)
            
            # Weighted average
            prediction = np.sum(weights * self.y_train) / np.sum(weights)
            predictions.append(prediction)
            
        return np.array(predictions)


# ===== Usage example =====

if __name__ == "__main__":
    print("=" * 60)
    print("Non-Parametric Models Demo")
    print("=" * 60)
    
    # === k-NN Classification ===
    print("\n📌 k-NN Classification")
    print("-" * 40)
    
    # Fruit data: [weight (g), sweetness (°Brix)]
    X_fruits = np.array([
        [150, 12],  # apple
        [170, 14],  # apple
        [100, 18],  # orange
        [120, 16],  # orange
        [80, 20]    # orange
    ])
    y_fruits = np.array(['apple', 'apple', 'orange', 'orange', 'orange'])
    
    # Create and "train" the model
    knn = KNNClassifier(k=3)
    knn.fit(X_fruits, y_fruits)
    
    # Predict
    X_new = np.array([[130, 15]])
    prediction = knn.predict(X_new)
    
    print(f"Training data: {len(X_fruits)} examples")
    print(f"Number of stored values: {knn.get_memory_usage()}")
    print(f"New fruit [130g, 15 Brix] -> {prediction[0]}")
    
    # Memory grows with the data (5 features + 1 label per example)
    print("\n📊 Memory usage by dataset size:")
    for n in [10, 100, 1000, 10000]:
        X_large = np.random.rand(n, 5)
        y_large = np.random.randint(0, 2, n)
        knn_large = KNNClassifier(k=3)
        knn_large.fit(X_large, y_large)
        print(f"   n={n:5d}: {knn_large.get_memory_usage():7d} values")
    
    # === Kernel Regression ===
    print("\n📌 Kernel Regression")
    print("-" * 40)
    
    # Temperature vs ice-cream sales
    X_temp = np.array([15, 20, 25, 30, 35, 40])
    y_sales = np.array([100, 150, 200, 280, 350, 400])
    
    kr = KernelRegression(bandwidth=3.0)
    kr.fit(X_temp, y_sales)
    
    X_test = np.array([22, 28, 38])
    predictions = kr.predict(X_test)
    
    print(f"Training data: {len(X_temp)} examples")
    print(f"\nPredictions (bandwidth={kr.bandwidth}):")
    for temp, sales in zip(X_test, predictions):
        print(f"   Temperature {temp}°C -> sales {sales:.0f} units")
    
    print("\n" + "=" * 60)
    print("📌 Summary: Non-Parametric Models")
    print("=" * 60)
    print("• Must keep all the training data")
    print("• The number of 'parameters' grows with the data size")
    print("• Highly flexible; no functional form assumed in advance")
    print("• Prediction gets slower as the data grows")

Output:

============================================================
Non-Parametric Models Demo
============================================================

📌 k-NN Classification
----------------------------------------
Training data: 5 examples
Number of stored values: 15
New fruit [130g, 15 Brix] -> orange

📊 Memory usage by dataset size:
   n=   10:      60 values
   n=  100:     600 values
   n= 1000:    6000 values
   n=10000:   60000 values

📌 Kernel Regression
----------------------------------------
Training data: 6 examples

Predictions (bandwidth=3.0):
   Temperature 22°C -> sales 170 units
   Temperature 28°C -> sales 248 units
   Temperature 38°C -> sales 376 units

============================================================
📌 Summary: Non-Parametric Models
============================================================
• Must keep all the training data
• The number of 'parameters' grows with the data size
• Highly flexible; no functional form assumed in advance
• Prediction gets slower as the data grows

3.5 Advantages and Limitations

Aspect	Advantage	Limitation
Flexibility	Can capture complex patterns	May overfit, especially with little data
Assumptions	No functional form to assume	Hyperparameters (k, h) must be chosen
Memory	N/A	Uses a lot of memory when the data is large
Speed	No real training step	Slow at prediction time
Interpretability	Easy to understand (k-NN)	Harder to interpret than parametric models

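4. Decision Trees

4.1 Definition and Basic Concepts

A decision tree is usually classified as a non-parametric model, although this depends somewhat on the definition used. Its structure (the number of nodes and splits) is not fixed in advance and can grow more complex as the training data grows. Hyperparameters such as the maximum depth cap that growth, so a depth-limited tree has a bounded number of nodes.

Principle: split the data into smaller subsets by asking a sequence of yes/no questions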

4.2 Decision Tree Structure

flowchart TB
    subgraph tree["Decision Tree Structure"]
        style tree fill:#282828,stroke:#ebdbb2,color:#ebdbb2
        
        ROOT["🌳 Root Node
Age ≤ 30?"]
        
        L1["Internal Node
Income ≤ 50k?"]
        R1["Leaf Node 🍃
Class: Not buy"]
        
        L2["Leaf 🍃
Buy"]
        R2["Internal
Good credit?"]
        
        L3["Leaf 🍃
Buy"]
        R3["Leaf 🍃
Not buy"]
        
        ROOT -->|Yes| L1
        ROOT -->|No| R1
        L1 -->|Yes| L2
        L1 -->|No| R2
        R2 -->|Yes| L3
        R2 -->|No| R3
    end
    
    style ROOT fill:#b16286,stroke:#d3869b,color:#ebdbb2
    style L1 fill:#458588,stroke:#83a598,color:#ebdbb2
    style R2 fill:#458588,stroke:#83a598,color:#ebdbb2
    style L2 fill:#98971a,stroke:#b8bb26,color:#ebdbb2
    style R1 fill:#d65d0e,stroke:#fe8019,color:#ebdbb2
    style L3 fill:#98971a,stroke:#b8bb26,color:#ebdbb2
    style R3 fill:#d65d0e,stroke:#fe8019,color:#ebdbb2
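A trained decision tree is just a set of nested if/else tests. The sketch below hand-codes the example tree from the diagram to make that concrete; `will_buy` and its argument types are hypothetical, and `income` is assumed to be in the same units as the diagram's 50k threshold.

```python
def will_buy(age: int, income: float, good_credit: bool) -> str:
    """Each `if` is an internal node of the example tree; each `return` is a leaf."""
    if age <= 30:                  # root node: Age <= 30?
        if income <= 50_000:       # internal node: Income <= 50k?
            return "buy"
        if good_credit:            # internal node: Good credit?
            return "buy"
        return "not buy"
    return "not buy"               # leaf reached when Age > 30


decision = will_buy(25, 80_000, good_credit=False)
```

Prediction follows one root-to-leaf path, so its cost depends on the tree depth rather than on the number of training examples.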

4.3 Splitting Criteria

4.3.1 Information Gain and Entropy

Entropy (uncertainty):

H (S) = - \sum_{i = 1}^{c} p_{i} \log_{2} (p_{i})

Variables:

H(S) = entropy of dataset S
c = number of classes
pᵢ = proportion of class i in the dataset (terms with pᵢ = 0 count as 0)

Information Gain:

IG (S, A) = H (S) - \sum_{v \in \mathrm{Values} (A)} \frac{| S_{v} |}{| S |} H (S_{v})

Variables:

IG(S, A) = information gain from splitting on attribute A
Values(A) = the possible values of attribute A
Sᵥ = subset of S in which attribute A has value v

4.3.2 Gini Impurity

Gini (S) = 1 - \sum_{i = 1}^{c} p_{i}^{2}

Variables:

Gini(S) = Gini impurity of dataset S
pᵢ = proportion of class i
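Both impurity measures need only the class counts in a node. A minimal sketch using the standard library; the function names are hypothetical, and an empty node is treated as having zero impurity. The example node has the 6 "buy" / 4 "no" split used in the worked example that follows.

```python
import math
from collections import Counter


def entropy(labels) -> float:
    """H(S) = -sum_i p_i * log2(p_i) over the classes present in labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels) -> float:
    """Gini(S) = 1 - sum_i p_i^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


S = ["buy"] * 6 + ["no"] * 4
print(f"H(S) = {entropy(S):.3f} bits, Gini(S) = {gini(S):.2f}")  # H(S) = 0.971 bits, Gini(S) = 0.48
```

Both measures are 0 for a pure node and largest when the classes are evenly mixed (1 bit and 0.5, respectively, for two classes).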

4.4 Worked Example

Problem: customer data, bought / did not buy a product

ID	Age	Income	Bought
1	Young	High	No
2	Young	High	No
3	Middle-aged	High	Yes
4	Old	Medium	Yes
5	Old	Low	Yes
6	Old	Low	No
7	Middle-aged	Low	Yes
8	Young	Medium	No
9	Young	Low	Yes
10	Old	Medium	Yes

Step 1: compute the entropy of the whole dataset

Bought: 6 examples (p₁ = 0.6)
Did not buy: 4 examples (p₂ = 0.4)

H(S) = -0.6 × log₂(0.6) - 0.4 × log₂(0.4) = -0.6 × (-0.737) - 0.4 × (-1.322) = 0.442 + 0.529 = 0.971 bits

Step 2: compute the information gain for "Age"

Age	Count	Bought	Did not buy	Entropy
Young	4	1	3	H = -0.25×log₂(0.25) - 0.75×log₂(0.75) = 0.811
Middle-aged	2	2	0	H = 0 (pure)
Old	4	3	1	H = -0.75×log₂(0.75) - 0.25×log₂(0.25) = 0.811

H(S|Age) = (4/10)×0.811 + (2/10)×0 + (4/10)×0.811 = 0.324 + 0 + 0.324 = 0.649

IG(Age) = 0.971 - 0.649 = 0.322 bits
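The same two steps can be repeated for every attribute, which is what a tree-building algorithm does when choosing the root split. This sketch encodes the table above (with its values translated to English) and computes the information gain of both Age and Income; `information_gain` and `ATTRIBUTES` are hypothetical names.

```python
import math
from collections import Counter, defaultdict


def entropy(labels) -> float:
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values()) if n else 0.0


# (age, income, bought) rows from the table above
rows = [
    ("young", "high", "no"), ("young", "high", "no"), ("middle", "high", "yes"),
    ("old", "medium", "yes"), ("old", "low", "yes"), ("old", "low", "no"),
    ("middle", "low", "yes"), ("young", "medium", "no"), ("young", "low", "yes"),
    ("old", "medium", "yes"),
]
ATTRIBUTES = {"age": 0, "income": 1}


def information_gain(rows, attr_index: int) -> float:
    """IG(S, A) = H(S) - sum_v |S_v|/|S| * H(S_v)."""
    labels = [r[-1] for r in rows]
    groups = defaultdict(list)
    for r in rows:
        groups[r[attr_index]].append(r[-1])  # partition S into S_v by attribute value v
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


gains = {name: information_gain(rows, idx) for name, idx in ATTRIBUTES.items()}
best = max(gains, key=gains.get)  # attribute with the highest information gain
```

Income gives only about 0.095 bits (each of its groups stays mixed), so Age, at about 0.322 bits, would be chosen as the root split.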

4.5 Python Code Example

"""
Decision Tree Classifier

This code builds a Decision Tree from scratch,
including the Information Gain and Entropy calculations.
"""

import numpy as np
from collections import Counter
from typing import Dict, List, Tuple, Optional, Any


class DecisionTreeNode:
    """
    A node in the decision tree
    """
    
    def __init__(
        self,
        feature_index: Optional[int] = None,
        threshold: Optional[float] = None,
        left: Optional['DecisionTreeNode'] = None,
        right: Optional['DecisionTreeNode'] = None,
        value: Optional[Any] = None
    ):
        """
        Args:
            feature_index: index of the feature used for the split
            threshold: value the feature is compared against
            left: left child (≤ threshold)
            right: right child (> threshold)
            value: predicted value (leaf nodes only)
        """
        self.feature_index = feature_index
        self.threshold = threshold
        self.left = left
        self.right = right
        self.value = value


class DecisionTreeClassifier:
    """
    Decision Tree Classifier
    
    Uses Information Gain as the criterion for choosing each split
    """
    
    def __init__(self, max_depth: int = 10, min_samples_split: int = 2):
        """
        Args:
            max_depth: maximum depth of the tree
            min_samples_split: minimum number of samples required to split a node
        """
        self.max_depth = max_depth
        self.min_samples_split = min_samples_split
        self.root = None
        self.n_nodes = 0
        
    def _entropy(self, y: np.ndarray) -> float:
        """
        Compute the entropy
        
        H(S) = -Σ pᵢ × log₂(pᵢ)
        """
        if len(y) == 0:
            return 0
            
        # Count each class
        counter = Counter(y)
        probabilities = [count / len(y) for count in counter.values()]
        
        # Compute the entropy
        entropy = 0
        for p in probabilities:
            if p > 0:
                entropy -= p * np.log2(p)
                
        return entropy
    
    def _information_gain(
        self, 
        y: np.ndarray, 
        y_left: np.ndarray, 
        y_right: np.ndarray
    ) -> float:
        """
        Compute the Information Gain
        
        IG = H(parent) - weighted_avg(H(children))
        """
        n = len(y)
        n_left = len(y_left)
        n_right = len(y_right)
        
        if n_left == 0 or n_right == 0:
            return 0
            
        # H(parent) - weighted average of H(children)
        parent_entropy = self._entropy(y)
        child_entropy = (
            (n_left / n) * self._entropy(y_left) +
            (n_right / n) * self._entropy(y_right)
        )
        
        return parent_entropy - child_entropy
    
    def _best_split(
        self, 
        X: np.ndarray, 
        y: np.ndarray
    ) -> Tuple[Optional[int], Optional[float]]:
        """
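        Find the best split (highest Information Gain)
        
        Only splits with strictly positive gain are accepted, so a node where
        no split helps becomes a leaf instead of producing an empty child.
        
        Returns:
            (feature_index, threshold) with the highest IG, or (None, None)
        """
        best_gain = 0.0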
        best_feature = None
        best_threshold = None
        
        n_features = X.shape[1]
        
        for feature_idx in range(n_features):
            # Candidate thresholds: every unique value of this feature
            thresholds = np.unique(X[:, feature_idx])
            
            for threshold in thresholds:
                # Split the data
                left_mask = X[:, feature_idx] <= threshold
                right_mask = ~left_mask
                
                y_left = y[left_mask]
                y_right = y[right_mask]
                
                # Compute the Information Gain
                gain = self._information_gain(y, y_left, y_right)
                
                if gain > best_gain:
                    best_gain = gain
                    best_feature = feature_idx
                    best_threshold = threshold
                    
        return best_feature, best_threshold
    
    def _build_tree(
        self, 
        X: np.ndarray, 
        y: np.ndarray, 
        depth: int = 0
    ) -> DecisionTreeNode:
        """
        Build the tree recursively
        """
        n_samples = len(y)
        n_classes = len(np.unique(y))
        
        # Stopping criteria
        if (depth >= self.max_depth or 
            n_classes == 1 or 
            n_samples < self.min_samples_split):
            # Create a leaf node (majority class)
            leaf_value = Counter(y).most_common(1)[0][0]
            self.n_nodes += 1
            return DecisionTreeNode(value=leaf_value)
        
        # Find the best split
        best_feature, best_threshold = self._best_split(X, y)
        
        if best_feature is None:
            leaf_value = Counter(y).most_common(1)[0][0]
            self.n_nodes += 1
            return DecisionTreeNode(value=leaf_value)
        
        # Split the data
        left_mask = X[:, best_feature] <= best_threshold
        right_mask = ~left_mask
        
        # Build the subtrees
        left_subtree = self._build_tree(
            X[left_mask], y[left_mask], depth + 1
        )
        right_subtree = self._build_tree(
            X[right_mask], y[right_mask], depth + 1
        )
        
        self.n_nodes += 1
        return DecisionTreeNode(
            feature_index=best_feature,
            threshold=best_threshold,
            left=left_subtree,
            right=right_subtree
        )
    
    def fit(self, X: np.ndarray, y: np.ndarray) -> 'DecisionTreeClassifier':
        """
        Train the model
        """
        self.n_nodes = 0
        self.root = self._build_tree(X, y)
        return self
    
    def _traverse_tree(self, x: np.ndarray, node: DecisionTreeNode) -> Any:
        """
        Walk down the tree to make a prediction
        """
        if node.value is not None:
            return node.value
            
        if x[node.feature_index] <= node.threshold:
            return self._traverse_tree(x, node.left)
        else:
            return self._traverse_tree(x, node.right)
    
    def predict(self, X: np.ndarray) -> np.ndarray:
        """
        Predict the class
        """
        return np.array([self._traverse_tree(x, self.root) for x in X])
    
    def get_tree_info(self) -> Dict:
        """
        Information about the tree (max_depth is the configured limit, not the depth actually reached)
        """
        return {
            'n_nodes': self.n_nodes,
            'max_depth': self.max_depth
        }


# ===== Usage example =====

if __name__ == "__main__":
    print("=" * 60)
    print("Decision Tree Classifier Demo")
    print("=" * 60)
    
    # Customer data: [age (encoded), income (encoded)]
    # Age: 0=young, 1=middle-aged, 2=old
    # Income: 0=low, 1=medium, 2=high
    X = np.array([
        [0, 2],  # young, high
        [0, 2],  # young, high
        [1, 2],  # middle-aged, high
        [2, 1],  # old, medium
        [2, 0],  # old, low
        [2, 0],  # old, low
        [1, 0],  # middle-aged, low
        [0, 1],  # young, medium
        [0, 0],  # young, low
        [2, 1],  # old, medium
    ])
    y = np.array([0, 0, 1, 1, 1, 0, 1, 0, 1, 1])  # 0=did not buy, 1=bought
    
    # Create and train the model
    dt = DecisionTreeClassifier(max_depth=3, min_samples_split=2)
    dt.fit(X, y)
    
    print("\n📊 Tree info:")
    print(f"   - Number of nodes: {dt.n_nodes}")
    print(f"   - max_depth (limit): {dt.max_depth}")
    
    # Test predictions
    X_test = np.array([
        [0, 2],  # young, high income
        [1, 1],  # middle-aged, medium income
        [2, 2],  # old, high income
    ])
    
    predictions = dt.predict(X_test)
    labels = {0: 'Not buy', 1: 'Buy'}
    
    print("\n📌 Predictions:")
    age_labels = ['young', 'middle-aged', 'old']
    income_labels = ['low', 'medium', 'high']
    
    for x, pred in zip(X_test, predictions):
        age = age_labels[x[0]]
        income = income_labels[x[1]]
        result = labels[pred]
        print(f"   Age: {age}, income: {income} -> {result}")
    
    # Node count as the depth limit is relaxed, on one shared random dataset
    print("\n📊 Number of nodes by max_depth:")
    X_complex = np.random.rand(100, 5)
    y_complex = (X_complex[:, 0] + X_complex[:, 1] > 1).astype(int)
    for depth in [1, 2, 3, 5, 10]:
        dt_test = DecisionTreeClassifier(max_depth=depth)
        dt_test.fit(X_complex, y_complex)
        print(f"   max_depth={depth:2d}: {dt_test.n_nodes:3d} nodes")

Output:

============================================================
Decision Tree Classifier Demo
============================================================

📊 Tree info:
   - Number of nodes: 9
   - max_depth (limit): 3

📌 Predictions:
   Age: young, income: high -> Not buy
   Age: middle-aged, income: medium -> Buy
   Age: old, income: high -> Buy

📊 Number of nodes by max_depth:
   max_depth= 1:   3 nodes
   max_depth= 2:   5 nodes
   max_depth= 3:   9 nodes
   max_depth= 5:  15 nodes
   max_depth=10:  23 nodes

The node counts in the last part come from one example run. X_complex is random and no seed is set, so the exact counts change between runs; the trend of more nodes as max_depth grows is what matters.

5. Ensemble Methods

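5.1 Basic concept

Ensemble methods combine several models so that the combined prediction is better than that of any single model. The intuition is the "wisdom of the crowd": a group's collective judgment is often better than one individual's, provided each member does better than random guessing and their errors are not strongly correlated.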
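The claim that a group vote can beat a single model can be quantified under an idealized assumption: each of k classifiers is correct independently with the same probability p. The sketch below sums the binomial probabilities that a strict majority is correct. Real ensemble members are correlated, so actual gains are smaller; the function name `majority_vote_accuracy` is hypothetical.

```python
from math import comb


def majority_vote_accuracy(p: float, k: int) -> float:
    """Probability that a strict majority of k independent voters is correct.

    Idealized assumption: each voter is correct independently with
    probability p. k must be odd so that ties cannot occur.
    """
    if k % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    # Sum P(exactly j correct votes) for every j above k/2
    return sum(comb(k, j) * p**j * (1 - p) ** (k - j)
               for j in range(k // 2 + 1, k + 1))


if __name__ == "__main__":
    for k in [1, 3, 11, 51]:
        print(f"k={k:2d}: {majority_vote_accuracy(0.6, k):.3f}")
```

With p = 0.6 the majority's accuracy rises as k grows; with p below 0.5 it falls instead, which is why ensemble members need to be better than chance.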

flowchart TB
    subgraph ensemble["Ensemble Methods"]
        style ensemble fill:#282828,stroke:#ebdbb2,color:#ebdbb2
        
        subgraph bagging["Bagging (Bootstrap Aggregating)"]
            style bagging fill:#458588,stroke:#83a598,color:#ebdbb2
            B1["Bootstrap sampling"]
            B2["Train many models
in parallel"]
            B3["Aggregate results
Vote/Average"]
        end
        
        subgraph boosting["Boosting"]
            style boosting fill:#d65d0e,stroke:#fe8019,color:#ebdbb2
            BO1["Train models sequentially"]
            BO2["Focus on misclassified examples"]
            BO3["Weighted combination"]
        end
        
        subgraph stacking["Stacking"]
            style stacking fill:#98971a,stroke:#b8bb26,color:#ebdbb2
            S1["Different model types"]
            S2["Meta-learner
learns how to combine"]
        end
    end
    
    B1 --> B2 --> B3
    BO1 --> BO2 --> BO3
    S1 --> S2

5.2 Bagging (Bootstrap Aggregating)

5.2.1 Principle

flowchart TB
    subgraph bagging_process["Bagging Process"]
        style bagging_process fill:#282828,stroke:#ebdbb2,color:#ebdbb2
        
        DATA["Original Dataset
n samples"]
        
        subgraph bootstrap["Bootstrap Sampling"]
            style bootstrap fill:#458588,stroke:#83a598,color:#ebdbb2
            BS1["Sample 1
drawn with replacement"]
            BS2["Sample 2
drawn with replacement"]
            BS3["Sample 3
drawn with replacement"]
            BSK["Sample k
drawn with replacement"]
        end
        
        subgraph models["Base Models"]
            style models fill:#689d6a,stroke:#8ec07c,color:#ebdbb2
            M1["Model 1"]
            M2["Model 2"]
            M3["Model 3"]
            MK["Model k"]
        end
        
        subgraph aggregate["Aggregation"]
            style aggregate fill:#b16286,stroke:#d3869b,color:#ebdbb2
            AGG["Classification: Voting
Regression: Averaging"]
        end
        
        FINAL["Final Prediction"]
        
        DATA --> BS1 & BS2 & BS3 & BSK
        BS1 --> M1
        BS2 --> M2
        BS3 --> M3
        BSK --> MK
        M1 & M2 & M3 & MK --> AGG
        AGG --> FINAL
    end
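The bootstrap step in the diagram can be sketched with NumPy: draw n row indices with replacement to form one sample, and treat the indices that were never drawn as that model's out-of-bag set. The function name `bootstrap_indices` is illustrative, not taken from a library.

```python
import numpy as np


def bootstrap_indices(n: int, rng: np.random.Generator):
    """Return (in_bag, oob) index arrays for one bootstrap sample of size n."""
    # Draw n indices with replacement: some rows repeat, others never appear
    in_bag = rng.integers(0, n, size=n)
    # Out-of-bag = the indices that were never drawn
    oob = np.setdiff1d(np.arange(n), in_bag)
    return in_bag, oob


rng = np.random.default_rng(0)
in_bag, oob = bootstrap_indices(10, rng)
print("in-bag:", np.sort(in_bag))
print("out-of-bag:", oob)
```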

5.2.2 Prediction formulas

Classification (Majority Voting):

\hat{y} = \operatorname{mode}\{\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_k\}

Regression (Averaging):

\hat{y} = \frac{1}{k} \sum_{i=1}^{k} \hat{y}_i

Where:

ŷ = final prediction
ŷᵢ = prediction of model i
k = number of models
mode = the most frequent value
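Both aggregation rules take only a few lines of NumPy. This sketch assumes the predictions of the k models are stacked in an array of shape (k, n_samples) and that class labels are non-negative integers; ties in the vote go to the smallest label because `argmax` returns the first maximum.

```python
import numpy as np


def aggregate_classification(preds: np.ndarray) -> np.ndarray:
    """Majority vote over axis 0; preds has shape (k, n_samples), int labels."""
    n_classes = preds.max() + 1
    # Votes per class for every sample: shape (n_classes, n_samples)
    votes = np.stack([(preds == c).sum(axis=0) for c in range(n_classes)])
    return votes.argmax(axis=0)


def aggregate_regression(preds: np.ndarray) -> np.ndarray:
    """Average of the k model outputs for each sample."""
    return preds.mean(axis=0)


class_preds = np.array([[0, 1, 2],
                        [0, 1, 1],
                        [1, 1, 2]])  # 3 models x 3 samples
print(aggregate_classification(class_preds))                     # [0 1 2]
print(aggregate_regression(np.array([[1.0, 2.0], [3.0, 4.0]])))  # [2. 3.]
```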

5.3 Random Forest

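5.3.1 Principle

Random Forest applies bagging to decision trees and adds a second source of randomness: at each split, only a randomly chosen subset of the features is considered.

Differences from plain bagging:

At each split, randomly select √p features (classification) or p/3 features (regression), where p is the total number of features; these are the commonly used defaults
Reduces the correlation between trees
Increases the diversity of the ensemble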
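A sketch of per-split feature sampling, using the commonly cited defaults of ⌊√p⌋ candidate features for classification and ⌊p/3⌋ for regression (never fewer than one), where p is the number of features. The helper names are hypothetical; libraries expose the same idea through a parameter, such as scikit-learn's `max_features`.

```python
import math

import numpy as np


def n_candidate_features(p: int, task: str = "classification") -> int:
    """Number of features to consider at each split (common defaults)."""
    if task == "classification":
        m = math.isqrt(p)   # floor(sqrt(p))
    elif task == "regression":
        m = p // 3          # floor(p / 3)
    else:
        raise ValueError("task must be 'classification' or 'regression'")
    return max(1, m)        # always consider at least one feature


def sample_split_features(p: int, task: str, rng: np.random.Generator) -> np.ndarray:
    """Draw the candidate features for one split, without replacement."""
    return rng.choice(p, size=n_candidate_features(p, task), replace=False)


rng = np.random.default_rng(0)
print(n_candidate_features(16), n_candidate_features(16, "regression"))  # 4 5
print(sample_split_features(16, "classification", rng))  # 4 distinct indices
```

Because a new subset is drawn at every split, two trees rarely make the same sequence of choices, which is what decorrelates them.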

flowchart TB
    subgraph rf["Random Forest"]
        style rf fill:#282828,stroke:#ebdbb2,color:#ebdbb2
        
        DATA["Dataset
n samples, p features"]
        
        subgraph trees["Decision Trees"]
            style trees fill:#689d6a,stroke:#8ec07c,color:#ebdbb2
            
            T1["🌲 Tree 1
Bootstrap + Random Features"]
            T2["🌲 Tree 2
Bootstrap + Random Features"]
            T3["🌲 Tree 3
Bootstrap + Random Features"]
            TN["🌲 Tree n
Bootstrap + Random Features"]
        end
        
        VOTE["Majority Vote / Average"]
        RESULT["Final Prediction"]
        
        DATA --> T1 & T2 & T3 & TN
        T1 & T2 & T3 & TN --> VOTE
        VOTE --> RESULT
    end
    
    style VOTE fill:#b16286,stroke:#d3869b,color:#ebdbb2
    style RESULT fill:#d65d0e,stroke:#fe8019,color:#ebdbb2

5.3.2 Out-of-Bag (OOB) Error

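Because bootstrap sampling draws with replacement, each tree never sees about 37% of the training samples. These are called out-of-bag (OOB) samples. Predicting each sample using only the trees that did not train on it gives an estimate of generalization error without setting aside a separate validation set.

The probability that a given sample is not selected in any of the n draws (each independent draw misses it with probability 1 - 1/n):

P(\text{not selected}) = \left(1 - \frac{1}{n}\right)^{n} \approx \frac{1}{e} \approx 0.368 \quad \text{(for large } n\text{)}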
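The approximation can be checked numerically: compute (1 - 1/n)^n directly and compare it with the fraction of indices that a simulated bootstrap sample never draws. The simulation uses a fixed seed, so its exact value depends on that seed.

```python
import math

import numpy as np


def exact_oob_probability(n: int) -> float:
    """Probability that one given sample is missed by all n draws."""
    return (1 - 1 / n) ** n


def simulated_oob_fraction(n: int, seed: int = 0) -> float:
    """Fraction of the n indices never drawn in one bootstrap sample."""
    rng = np.random.default_rng(seed)
    drawn = rng.integers(0, n, size=n)
    return 1 - len(np.unique(drawn)) / n


for n in [10, 100, 10_000]:
    print(f"n={n:>6}: exact={exact_oob_probability(n):.4f}, "
          f"simulated={simulated_oob_fraction(n):.4f}, 1/e={1 / math.e:.4f}")
```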

5.4 Python code example

"""
Ensemble Methods: Bagging and Random Forest
Combining several models

This code builds Bagging and Random Forest from scratch,
using decision stumps as the base learners.
"""

import numpy as np
from collections import Counter
from typing import List, Tuple
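import warnings
# np.nanmean warns on rows with no OOB prediction; those rows are masked out below
warnings.filterwarnings('ignore', category=RuntimeWarning)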


class SimpleDecisionStump:
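    """
    Decision stump (a tree of depth 1, i.e. a single split)
    used as the base learner for the ensembles below.
    Bagging helps most with high-variance learners such as deep trees;
    a stump is used here only to keep the example short.
    """
    
    def __init__(self):
        self.feature_index = None
        self.threshold = None
        self.left_value = None
        self.right_value = None
        self.majority_value = None  # fallback prediction if no split is found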
        
    def fit(self, X: np.ndarray, y: np.ndarray, 
            feature_indices: np.ndarray = None) -> 'SimpleDecisionStump':
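        """
        Train the decision stump
        
        Args:
            X: features
            y: non-negative integer labels (required by np.bincount)
            feature_indices: indices of the features to consider
        """
        n_samples, n_features = X.shape
        
        if feature_indices is None:
            feature_indices = np.arange(n_features)
        
        # Majority class, used when no candidate feature gives a valid split
        self.majority_value = Counter(y).most_common(1)[0][0]
        self.feature_index = None
        best_gini = float('inf')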
        
        for feature_idx in feature_indices:
            thresholds = np.unique(X[:, feature_idx])
            
            for threshold in thresholds:
                left_mask = X[:, feature_idx] <= threshold
                right_mask = ~left_mask
                
                if np.sum(left_mask) == 0 or np.sum(right_mask) == 0:
                    continue
                    
                # Weighted Gini impurity of the two children
                gini = self._weighted_gini(y, left_mask, right_mask)
                
                if gini < best_gini:
                    best_gini = gini
                    self.feature_index = feature_idx
                    self.threshold = threshold
                    self.left_value = Counter(y[left_mask]).most_common(1)[0][0]
                    self.right_value = Counter(y[right_mask]).most_common(1)[0][0]
                    
        return self
    
    def _weighted_gini(self, y: np.ndarray, 
                       left_mask: np.ndarray, 
                       right_mask: np.ndarray) -> float:
        """Weighted Gini impurity: size-weighted average of the children's Gini"""
        n = len(y)
        n_left = np.sum(left_mask)
        n_right = np.sum(right_mask)
        
        def gini(labels):
            if len(labels) == 0:
                return 0
            counts = np.bincount(labels)
            probabilities = counts / len(labels)
            return 1 - np.sum(probabilities ** 2)
        
        return (n_left/n) * gini(y[left_mask]) + (n_right/n) * gini(y[right_mask])
    
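    def predict(self, X: np.ndarray) -> np.ndarray:
        """Predict left_value if x[feature] <= threshold, otherwise right_value"""
        if self.feature_index is None:
            # No valid split was found during fit: predict the majority class
            return np.full(X.shape[0], self.majority_value)
        predictions = np.where(
            X[:, self.feature_index] <= self.threshold,
            self.left_value,
            self.right_value
        )
        return predictions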


class BaggingClassifier:
    """
    Bagging Classifier
    
    Applies Bootstrap Aggregating to decision stumps (the OOB score assumes binary labels 0/1)
    """
    
    def __init__(self, n_estimators: int = 10, random_state: int = 42):
        """
        Args:
            n_estimators: number of base learners
            random_state: seed for the random number generator
        """
        self.n_estimators = n_estimators
        self.random_state = random_state
        self.estimators = []
        self.oob_score_ = None
        
    def _bootstrap_sample(self, X: np.ndarray, y: np.ndarray, 
                          rng: np.random.RandomState) -> Tuple:
        """
        Draw a bootstrap sample (n rows with replacement)
        
        Returns:
            (X_sample, y_sample, oob_indices)
        """
        n_samples = X.shape[0]
        indices = rng.choice(n_samples, size=n_samples, replace=True)
        oob_indices = np.setdiff1d(np.arange(n_samples), indices)
        return X[indices], y[indices], oob_indices
    
    def fit(self, X: np.ndarray, y: np.ndarray) -> 'BaggingClassifier':
        """
        Train the bagging ensemble
        """
        rng = np.random.RandomState(self.random_state)
        self.estimators = []
        oob_predictions = np.full((X.shape[0], self.n_estimators), np.nan)
        
        for i in range(self.n_estimators):
            # Bootstrap sample
            X_sample, y_sample, oob_indices = self._bootstrap_sample(X, y, rng)
            
            # Create and train a base learner
            estimator = SimpleDecisionStump()
            estimator.fit(X_sample, y_sample)
            self.estimators.append(estimator)
            
            # OOB prediction
            if len(oob_indices) > 0:
                oob_predictions[oob_indices, i] = estimator.predict(X[oob_indices])
        
        # OOB score: average each sample's OOB votes, then threshold at 0.5 (binary labels)
        oob_vote = np.nanmean(oob_predictions, axis=1)
        valid_mask = ~np.isnan(oob_vote)
        if np.any(valid_mask):
            oob_pred = (oob_vote[valid_mask] > 0.5).astype(int)
            self.oob_score_ = np.mean(oob_pred == y[valid_mask])
            
        return self
    
    def predict(self, X: np.ndarray) -> np.ndarray:
        """
        Predict by majority voting
        """
        predictions = np.array([est.predict(X) for est in self.estimators])
        # Majority vote
        return np.apply_along_axis(
            lambda x: Counter(x).most_common(1)[0][0], 
            axis=0, 
            arr=predictions
        )


class RandomForestClassifier:
    """
    Random Forest Classifier
    
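    Like Bagging, but each tree also considers only a random subset of features.
    Each tree here is a stump with a single split, so drawing the subset once
    per tree is equivalent to drawing it at every split.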
    """
    
    def __init__(self, n_estimators: int = 10, 
                 max_features: str = 'sqrt',
                 random_state: int = 42):
        """
        Args:
            n_estimators: number of trees
            max_features: number of features sampled at each split
                         'sqrt': √(n_features)
                         'log2': log₂(n_features)
                         any other value: all features
            random_state: seed for the random number generator
        """
        self.n_estimators = n_estimators
        self.max_features = max_features
        self.random_state = random_state
        self.estimators = []
        self.oob_score_ = None
        
    def _get_max_features(self, n_features: int) -> int:
        """Number of features to sample (never fewer than 1)"""
        if self.max_features == 'sqrt':
            return max(1, int(np.sqrt(n_features)))
        elif self.max_features == 'log2':
            return max(1, int(np.log2(n_features)))
        else:
            return n_features
    
    def fit(self, X: np.ndarray, y: np.ndarray) -> 'RandomForestClassifier':
        """
        Train the Random Forest
        """
        rng = np.random.RandomState(self.random_state)
        n_samples, n_features = X.shape
        max_feat = self._get_max_features(n_features)
        
        self.estimators = []
        oob_predictions = np.full((n_samples, self.n_estimators), np.nan)
        
        for i in range(self.n_estimators):
            # Bootstrap sample
            indices = rng.choice(n_samples, size=n_samples, replace=True)
            oob_indices = np.setdiff1d(np.arange(n_samples), indices)
            
            X_sample = X[indices]
            y_sample = y[indices]
            
            # Random subset of features (without replacement)
            feature_indices = rng.choice(n_features, size=max_feat, replace=False)
            
            # Create and train a decision stump on those features only
            estimator = SimpleDecisionStump()
            estimator.fit(X_sample, y_sample, feature_indices)
            self.estimators.append(estimator)
            
            # OOB prediction
            if len(oob_indices) > 0:
                oob_predictions[oob_indices, i] = estimator.predict(X[oob_indices])
        
        # OOB score (binary labels 0/1 assumed, as in BaggingClassifier)
        oob_vote = np.nanmean(oob_predictions, axis=1)
        valid_mask = ~np.isnan(oob_vote)
        if np.any(valid_mask):
            oob_pred = (oob_vote[valid_mask] > 0.5).astype(int)
            self.oob_score_ = np.mean(oob_pred == y[valid_mask])
            
        return self
    
    def predict(self, X: np.ndarray) -> np.ndarray:
        """Predict by majority voting"""
        predictions = np.array([est.predict(X) for est in self.estimators])
        return np.apply_along_axis(
            lambda x: Counter(x).most_common(1)[0][0],
            axis=0,
            arr=predictions
        )
    
    def feature_importances(self, n_features: int) -> np.ndarray:
        """
        Simple feature importance: the fraction of stumps that split on each
        feature (a usage count, not the impurity-decrease importance used by scikit-learn)
        """
        importances = np.zeros(n_features)
        for est in self.estimators:
            if est.feature_index is not None:
                importances[est.feature_index] += 1
        return importances / len(self.estimators)


# ===== Usage example =====

if __name__ == "__main__":
    print("=" * 60)
    print("Ensemble Methods Demo")
    print("=" * 60)
    
    # Create sample data
    np.random.seed(42)
    n_samples = 200
    
    # Two clusters: features 0 and 1 separate the classes, features 2 and 3 are noise
    X1 = np.random.randn(n_samples // 2, 4) + np.array([2, 2, 0, 0])
    X2 = np.random.randn(n_samples // 2, 4) + np.array([-2, -2, 0, 0])
    X = np.vstack([X1, X2])
    y = np.array([0] * (n_samples // 2) + [1] * (n_samples // 2))
    
    # Shuffle the rows
    shuffle_idx = np.random.permutation(n_samples)
    X, y = X[shuffle_idx], y[shuffle_idx]
    
    # 80/20 train/test split
    split = int(0.8 * n_samples)
    X_train, X_test = X[:split], X[split:]
    y_train, y_test = y[:split], y[split:]
    
    print(f"\n📊 Data:")
    print(f"   - Training: {len(X_train)} samples")
    print(f"   - Testing: {len(X_test)} samples")
    print(f"   - Features: {X_train.shape[1]}")
    
    # === Single Decision Stump ===
    print("\n" + "-" * 40)
    print("📌 Single Decision Stump")
    print("-" * 40)
    
    stump = SimpleDecisionStump()
    stump.fit(X_train, y_train)
    stump_pred = stump.predict(X_test)
    stump_acc = np.mean(stump_pred == y_test)
    print(f"   Accuracy: {stump_acc:.4f}")
    
    # === Bagging ===
    print("\n" + "-" * 40)
    print("📌 Bagging Classifier")
    print("-" * 40)
    
    for n_est in [5, 10, 20, 50]:
        bagging = BaggingClassifier(n_estimators=n_est, random_state=42)
        bagging.fit(X_train, y_train)
        bagging_pred = bagging.predict(X_test)
        bagging_acc = np.mean(bagging_pred == y_test)
        oob = bagging.oob_score_ if bagging.oob_score_ is not None else "N/A"
        print(f"   n_estimators={n_est:2d}: Accuracy={bagging_acc:.4f}, OOB={oob}")
    
    # === Random Forest ===
    print("\n" + "-" * 40)
    print("📌 Random Forest Classifier")
    print("-" * 40)
    
    for n_est in [5, 10, 20, 50]:
        rf = RandomForestClassifier(n_estimators=n_est, random_state=42)
        rf.fit(X_train, y_train)
        rf_pred = rf.predict(X_test)
        rf_acc = np.mean(rf_pred == y_test)
        oob = rf.oob_score_ if rf.oob_score_ is not None else "N/A"
        print(f"   n_estimators={n_est:2d}: Accuracy={rf_acc:.4f}, OOB={oob}")
    
    # Feature Importance
    print("\n📊 Feature Importances (Random Forest, n=50):")
    rf_50 = RandomForestClassifier(n_estimators=50, random_state=42)
    rf_50.fit(X_train, y_train)
    importances = rf_50.feature_importances(X_train.shape[1])
    for i, imp in enumerate(importances):
        bar = "█" * int(imp * 50)
        print(f"   Feature {i}: {bar} ({imp:.3f})")
    
    print("\n" + "=" * 60)
    print("📌 Summary: Ensemble Methods")
    print("=" * 60)
    print("• Bagging reduces variance through bootstrap sampling")
    print("• Random Forest adds diversity by sampling features at random")
    print("• More estimators → results usually improve, then level off (but training is slower)")
    print("• The OOB score evaluates the model without a separate validation set")

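Output (illustrative; exact accuracies, OOB scores, and importances depend on the random draws and the NumPy version, so your numbers may differ):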

============================================================
Ensemble Methods Demo
============================================================

📊 Data:
   - Training: 160 samples
   - Testing: 40 samples
   - Features: 4

----------------------------------------
📌 Single Decision Stump
----------------------------------------
   Accuracy: 0.9250

----------------------------------------
📌 Bagging Classifier
----------------------------------------
   n_estimators= 5: Accuracy=0.9250, OOB=0.925
   n_estimators=10: Accuracy=0.9500, OOB=0.9375
   n_estimators=20: Accuracy=0.9500, OOB=0.95
   n_estimators=50: Accuracy=0.9500, OOB=0.95

----------------------------------------
📌 Random Forest Classifier
----------------------------------------
   n_estimators= 5: Accuracy=0.9500, OOB=0.9125
   n_estimators=10: Accuracy=0.9500, OOB=0.9375
   n_estimators=20: Accuracy=0.9500, OOB=0.9438
   n_estimators=50: Accuracy=0.9500, OOB=0.95

📊 Feature Importances (Random Forest, n=50):
   Feature 0: ██████████████████████████ (0.520)
   Feature 1: ████████████████████████ (0.480)
   Feature 2:  (0.000)
   Feature 3:  (0.000)

============================================================
📌 Summary: Ensemble Methods
============================================================
• Bagging reduces variance through bootstrap sampling
• Random Forest adds diversity by sampling features at random
• More estimators → results usually improve, then level off (but training is slower)
• The OOB score evaluates the model without a separate validation set
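The split criterion in `SimpleDecisionStump._weighted_gini` can be checked by hand. This standalone sketch reimplements the same formula, Gini(S) = 1 - Σ p_c² for a node and a size-weighted average over the two children for a split, on a four-sample example; the function names are illustrative.

```python
import numpy as np


def gini(labels: np.ndarray) -> float:
    """Gini impurity 1 - sum_c p_c^2 for non-negative integer labels."""
    if len(labels) == 0:
        return 0.0
    p = np.bincount(labels) / len(labels)
    return float(1 - np.sum(p ** 2))


def weighted_gini(y: np.ndarray, left_mask: np.ndarray) -> float:
    """Size-weighted Gini of the left/right children of a split."""
    left, right = y[left_mask], y[~left_mask]
    n = len(y)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


y = np.array([0, 0, 1, 1])
x = np.array([1.0, 2.0, 3.0, 4.0])
print(gini(y))                      # 0.5: a 50/50 node
print(weighted_gini(y, x <= 2.0))   # 0.0: a perfect split
print(weighted_gini(y, x <= 1.0))   # about 0.333: a worse split
```

The stump keeps the threshold with the lowest weighted Gini, so here it would choose x <= 2.0.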

6. Comparison and model selection

6.1 Comparison table

Criterion	Parametric	Non-Parametric (e.g., k-NN)	Decision Tree	Random Forest
Number of parameters	Fixed	Grows with data	Depends on depth	Large (many trees)
Assumptions	Strong	Few	Few	Few
Training speed	Fast	Fast (lazy: just stores the data)	Moderate	Slow
Prediction speed	Fast	Slow	Fast	Moderate
Memory	Low	High	Moderate	High
Interpretability	High	High (k-NN)	High	Low
Overfitting risk	Low	High	High	Low
Small datasets	Good	Poor	Poor	Poor
Large datasets	May underfit	Very good	Good	Very good
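The "number of parameters" row can be made concrete. In this sketch (synthetic data; the names `fit_linear` and `fit_knn` are illustrative), a linear model fitted by least squares always keeps p + 1 numbers however many samples it sees, while a k-NN model must keep the whole training set, so its stored state grows with n.

```python
import numpy as np


def fit_linear(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares weights (bias + one weight per feature): p + 1 numbers."""
    X1 = np.column_stack([np.ones(len(X)), X])
    w, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return w


def fit_knn(X: np.ndarray, y: np.ndarray) -> tuple:
    """'Training' k-NN just stores the data: n * (p + 1) numbers."""
    return X.copy(), y.copy()


rng = np.random.default_rng(0)
p = 3
for n in [10, 100, 1000]:
    X = rng.normal(size=(n, p))
    y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=n)
    w = fit_linear(X, y)
    X_store, y_store = fit_knn(X, y)
    print(f"n={n:4d}: linear model stores {w.size} numbers, "
          f"k-NN stores {X_store.size + y_store.size}")
```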

6.2 Model selection guide

flowchart TB
    subgraph selection["Model Selection Guide"]
        style selection fill:#282828,stroke:#ebdbb2,color:#ebdbb2
        
        START["Start"]
        
        Q1{"Is the dataset
large?"}
        Q2{"Need
interpretability?"}
        Q3{"Is the form of the data
known in advance?"}
        Q4{"Need real-time
prediction?"}
        Q5{"Many
features?"}
        
        A1["Random Forest /
Gradient Boosting"]
        A2["Decision Tree"]
        A3["Parametric Models
(Linear/Logistic)"]
        A4["k-NN /
Kernel Methods"]
        A5["Random Forest"]
        A6["k-NN (small data)
Linear Models (fast)"]
        
        START --> Q1
        Q1 -->|Yes| Q5
        Q1 -->|No| Q2
        Q2 -->|Yes| Q3
        Q2 -->|No| Q4
        Q3 -->|Yes| A3
        Q3 -->|No| A2
        Q4 -->|Yes| A6
        Q4 -->|No| A4
        Q5 -->|Yes| A1
        Q5 -->|No| A5
    end
    
    style Q1 fill:#458588,stroke:#83a598,color:#ebdbb2
    style Q2 fill:#458588,stroke:#83a598,color:#ebdbb2
    style Q3 fill:#458588,stroke:#83a598,color:#ebdbb2
    style Q4 fill:#458588,stroke:#83a598,color:#ebdbb2
    style Q5 fill:#458588,stroke:#83a598,color:#ebdbb2
    style A1 fill:#98971a,stroke:#b8bb26,color:#ebdbb2
    style A2 fill:#98971a,stroke:#b8bb26,color:#ebdbb2
    style A3 fill:#98971a,stroke:#b8bb26,color:#ebdbb2
    style A4 fill:#98971a,stroke:#b8bb26,color:#ebdbb2
    style A5 fill:#98971a,stroke:#b8bb26,color:#ebdbb2
    style A6 fill:#98971a,stroke:#b8bb26,color:#ebdbb2
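The decision flow above can also be written as a plain function, which makes each branch explicit. It is a direct transcription of the chart under its own simplifying questions, not an authoritative rule; the function and argument names are hypothetical.

```python
def suggest_model(large_data: bool, many_features: bool = False,
                  need_interpretability: bool = False,
                  know_data_form: bool = False,
                  need_realtime: bool = False) -> str:
    """Follow the branches of the model selection flowchart (Q1 to Q5)."""
    if large_data:                          # Q1: Yes -> Q5
        if many_features:
            return "Random Forest / Gradient Boosting"
        return "Random Forest"
    if need_interpretability:               # Q1: No -> Q2: Yes -> Q3
        if know_data_form:
            return "Parametric Models (Linear/Logistic)"
        return "Decision Tree"
    if need_realtime:                       # Q2: No -> Q4
        return "k-NN (small data) / Linear Models (fast)"
    return "k-NN / Kernel Methods"


print(suggest_model(large_data=True, many_features=True))
print(suggest_model(large_data=False, need_interpretability=True, know_data_form=True))
```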

6.3 Case study: comparing performance

"""
Case study: comparing several models on real datasets
"""

import numpy as np
from sklearn.datasets import load_iris, load_wine
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, BaggingClassifier
from sklearn.naive_bayes import GaussianNB
import warnings
warnings.filterwarnings('ignore')


def compare_models(X, y, dataset_name):
    """
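    Compare the cross-validated accuracy of several models
    
    Args:
        X: features
        y: labels
        dataset_name: name of the dataset
    """
    models = {
        # Parametric models
        'Logistic Regression': LogisticRegression(max_iter=1000),
        'Naive Bayes': GaussianNB(),
        
        # Non-parametric models: k-NN is distance-based, so features are
        # standardized inside each CV fold (the scaler is fit on training folds only)
        'k-NN (k=3)': make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3)),
        'k-NN (k=5)': make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
        
        # Tree-based models (splits do not depend on feature scale)
        'Decision Tree': DecisionTreeClassifier(max_depth=5, random_state=42),
        
        # Ensemble models (seeded so results are reproducible)
        'Bagging': BaggingClassifier(n_estimators=10, random_state=42),
        'Random Forest': RandomForestClassifier(n_estimators=10, random_state=42),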
    }
    
    print(f"\n{'='*60}")
    print(f"📊 Dataset: {dataset_name}")
    print(f"   Samples: {X.shape[0]}, Features: {X.shape[1]}")
    print(f"{'='*60}")
    print(f"\n{'Model':<25} {'Accuracy':<12} {'Std':<10} {'Type'}")
    print("-" * 60)
    
    results = []
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=5, scoring='accuracy')
        mean_score = scores.mean()
        std_score = scores.std()
        
        # Model category for the summary table
        if name in ['Logistic Regression', 'Naive Bayes']:
            model_type = 'Parametric'
        elif name.startswith('k-NN'):
            model_type = 'Non-Param'
        elif name == 'Decision Tree':
            model_type = 'Tree'
        else:
            model_type = 'Ensemble'
            
        print(f"{name:<25} {mean_score:.4f}       ±{std_score:.4f}    {model_type}")
        results.append((name, mean_score, std_score, model_type))
    
    # Best model by mean accuracy (ties go to the first model listed)
    best = max(results, key=lambda x: x[1])
    print(f"\n🏆 Best Model: {best[0]} (Accuracy: {best[1]:.4f})")
    
    return results


if __name__ == "__main__":
    print("=" * 60)
    print("Model Comparison Study")
    print("=" * 60)
    
    # Iris Dataset
    iris = load_iris()
    compare_models(iris.data, iris.target, "Iris (150 samples, 4 features)")
    
    # Wine Dataset
    wine = load_wine()
    compare_models(wine.data, wine.target, "Wine (178 samples, 13 features)")
    
    print("\n" + "=" * 60)
    print("📌 Observations:")
    print("=" * 60)
    print("• Parametric models work well when their assumptions hold")
    print("• k-NN is sensitive to feature scaling and to the number of features")
    print("• Ensemble methods tend to give more stable results")
    print("• No single model is best for every problem")

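Example output (illustrative; exact numbers depend on the scikit-learn version and the model settings above, so your results may differ):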

============================================================
Model Comparison Study
============================================================

============================================================
📊 Dataset: Iris (150 samples, 4 features)
   Samples: 150, Features: 4
============================================================

Model                     Accuracy     Std        Type
------------------------------------------------------------
Logistic Regression       0.9733       ±0.0249    Parametric
Naive Bayes               0.9533       ±0.0340    Parametric
k-NN (k=3)                0.9600       ±0.0442    Non-Param
k-NN (k=5)                0.9667       ±0.0447    Non-Param
Decision Tree             0.9533       ±0.0340    Tree
Bagging                   0.9533       ±0.0249    Ensemble
Random Forest             0.9600       ±0.0298    Ensemble

🏆 Best Model: Logistic Regression (Accuracy: 0.9733)

============================================================
📊 Dataset: Wine (178 samples, 13 features)
   Samples: 178, Features: 13
============================================================

Model                     Accuracy     Std        Type
------------------------------------------------------------
Logistic Regression       0.9719       ±0.0254    Parametric
Naive Bayes               0.9719       ±0.0254    Parametric
k-NN (k=3)                0.9551       ±0.0301    Non-Param
k-NN (k=5)                0.9607       ±0.0327    Non-Param
Decision Tree             0.8876       ±0.0615    Tree
Bagging                   0.9494       ±0.0359    Ensemble
Random Forest             0.9719       ±0.0352    Ensemble

🏆 Best Model: Logistic Regression (Accuracy: 0.9719)

============================================================
📌 Observations:
============================================================
• Parametric models work well when their assumptions hold
• k-NN is sensitive to feature scaling and to the number of features
• Ensemble methods tend to give more stable results
• No single model is best for every problem
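To see why k-NN is sensitive to scaling, here is a self-contained NumPy sketch on synthetic data (an assumption for illustration, not the Wine data): feature 0 separates the classes, while feature 1 is pure noise on a scale about 10,000 times larger. Without standardization, the noise feature dominates the Euclidean distance and the neighbors are essentially random.

```python
import numpy as np


def knn_predict(X_train, y_train, X_test, k=5):
    """Plain k-NN: majority label among the k nearest training points."""
    # Pairwise squared Euclidean distances, shape (n_test, n_train)
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    return np.array([np.bincount(row).argmax() for row in y_train[nearest]])


rng = np.random.default_rng(0)
n = 400
y = rng.integers(0, 2, size=n)
informative = y * 3.0 + rng.normal(size=n)      # class means 0 and 3
noise = rng.normal(scale=10_000.0, size=n)      # irrelevant, huge scale
X = np.column_stack([informative, noise])
X_tr, X_te, y_tr, y_te = X[:300], X[300:], y[:300], y[300:]

acc_raw = np.mean(knn_predict(X_tr, y_tr, X_te) == y_te)

# Standardize with statistics from the training split only
mu, sigma = X_tr.mean(axis=0), X_tr.std(axis=0)
acc_scaled = np.mean(knn_predict((X_tr - mu) / sigma, y_tr, (X_te - mu) / sigma) == y_te)
print(f"unscaled: {acc_raw:.3f}, standardized: {acc_scaled:.3f}")
```

This mirrors why the case study wraps k-NN in a scaling step; tree-based models split on one feature at a time and are unaffected by rescaling.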

7. Summary

7.1 Key points

Parametric Models:

Have a fixed number of parameters
Assume a functional form for the data in advance
Suit small to medium-sized datasets
Train and predict quickly
Examples: Linear Regression, Logistic Regression, Naive Bayes

Non-Parametric Models:

The number of parameters grows with the data
Make few assumptions about the functional form (distribution-free), although they still rely on assumptions such as "nearby points have similar labels"
Suit large datasets and complex patterns
Can be slow on large datasets (e.g., k-NN compares each query with every stored sample)
Examples: k-NN, Kernel Regression, Decision Trees

Ensemble Methods:

Combine several models to obtain better results
Bagging mainly reduces variance; Boosting mainly reduces bias
Random Forest is popular because it works well across a wide range of problems
The cost is added complexity and longer training and prediction times
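Why bagging mainly reduces variance, and why Random Forest works to decorrelate its trees, follows from a standard identity (see Hastie et al., 2009): the average of B identically distributed estimators, each with variance σ² and pairwise correlation ρ, has variance ρσ² + (1 - ρ)σ²/B. Adding more models shrinks only the second term; lowering ρ shrinks the first. The sketch below checks the identity by simulation on synthetic Gaussian data, an assumption chosen for illustration.

```python
import numpy as np


def var_of_average(rho: float, sigma2: float, B: int) -> float:
    """Variance of the mean of B estimators with variance sigma2, correlation rho."""
    return rho * sigma2 + (1 - rho) * sigma2 / B


def simulated_var_of_average(rho: float, B: int, n_trials: int = 50_000,
                             seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity with sigma2 = 1."""
    rng = np.random.default_rng(seed)
    shared = rng.normal(size=(n_trials, 1))  # component common to all estimators
    own = rng.normal(size=(n_trials, B))     # independent per-estimator component
    # Each column has variance 1; any two columns have correlation rho
    estimates = np.sqrt(rho) * shared + np.sqrt(1 - rho) * own
    return float(estimates.mean(axis=1).var())


for rho in [0.0, 0.3, 0.8]:
    print(f"rho={rho}: formula={var_of_average(rho, 1.0, 50):.4f}, "
          f"simulated={simulated_var_of_average(rho, 50):.4f}")
```

With ρ = 0.8, even 50 averaged models leave the variance above 0.8σ², which is why feature subsampling (lower ρ) matters as much as the number of trees.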

7.2 Practical guidelines

Start with simple (parametric) models, then add complexity as needed
Understand the data before choosing a model
Use cross-validation to compare models
Weigh the trade-offs between accuracy, speed, and interpretability
Ensemble methods are often a safe choice for general problems
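The advice to compare models with cross-validation can be sketched without any library: split the indices into k folds, train on k - 1 folds and score on the held-out one, then average. This minimal version shuffles once with a fixed seed and, unlike the `StratifiedKFold` splitting that scikit-learn's `cross_val_score` uses for classifiers, does not stratify by class. The function names and the majority-class baseline are hypothetical.

```python
import numpy as np


def kfold_indices(n: int, k: int = 5, seed: int = 0):
    """Yield (train_idx, test_idx) pairs for k shuffled folds over n samples."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)  # k nearly equal-sized folds
    for i in range(k):
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, folds[i]


def cross_val_accuracy(fit_predict, X, y, k=5, seed=0):
    """Mean and std of accuracy over k folds.

    fit_predict(X_train, y_train, X_test) must return predicted labels.
    """
    scores = [np.mean(fit_predict(X[tr], y[tr], X[te]) == y[te])
              for tr, te in kfold_indices(len(y), k, seed)]
    return float(np.mean(scores)), float(np.std(scores))


def majority_baseline(X_train, y_train, X_test):
    """Hypothetical baseline: always predict the most common training label."""
    return np.full(len(X_test), np.bincount(y_train).argmax())


rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = (X[:, 0] > 0).astype(int)
mean_acc, std_acc = cross_val_accuracy(majority_baseline, X, y)
print(f"majority baseline: {mean_acc:.3f} ± {std_acc:.3f}")
```

Any model can be plugged in as `fit_predict`; a trivial baseline like this one gives a floor that every real model should beat.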

8. References

Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd ed.). Springer.
Murphy, K. P. (2012). Machine Learning: A Probabilistic Perspective. MIT Press.
Breiman, L. (2001). Random Forests. Machine Learning, 45(1), 5-32.
Breiman, L. (1996). Bagging Predictors. Machine Learning, 24(2), 123-140.
Fix, E., & Hodges, J. L. (1951). Discriminatory Analysis, Nonparametric Discrimination: Consistency Properties. USAF School of Aviation Medicine.
Quinlan, J. R. (1986). Induction of Decision Trees. Machine Learning, 1(1), 81-106.
Nadaraya, E. A. (1964). On Estimating Regression. Theory of Probability & Its Applications, 9(1), 141-142.
Watson, G. S. (1964). Smooth Regression Analysis. Sankhyā: The Indian Journal of Statistics, Series A, 26(4), 359-372.
Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning with Applications in R. Springer.