Supervised Learning: Discriminative Models

1. Introduction

1.1 Overview of Discriminative Models

Discriminative models learn the decision boundary between classes directly, rather than modeling the full probability distribution of the data.

A model of this kind learns the conditional probability P(y|x) directly, or learns a prediction function f(x) → y. This differs from generative models, which learn P(x|y) and P(y) and then obtain P(y|x) through Bayes' rule.
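The contrast can be written out in two lines. This is a short sketch, not a derivation from the original text; the symbol θ for the model parameters is an assumption introduced here for illustration.

```latex
% Generative route: model P(x|y) and P(y), then apply Bayes' rule
P(y \mid x) = \frac{P(x \mid y)\, P(y)}{\sum_{y'} P(x \mid y')\, P(y')}

% Discriminative route: parameterize and fit the posterior directly
P(y \mid x) \approx P_{\theta}(y \mid x)
```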

1.2 Historical Timeline

flowchart LR
    subgraph era1["Pioneering era (1940s-1960s)"]
        style era1 fill:#458588,color:#ebdbb2
        A["1943: McCulloch-Pitts Neuron
First artificial neuron model"]
        B["1958: Perceptron
Frank Rosenblatt"]
        A --> B
    end
    
    subgraph era2["Development era (1970s-1990s)"]
        style era2 fill:#689d6a,color:#ebdbb2
        C["1972: Logistic Regression
Cox & Snell"]
        D["1986: Backpropagation
Rumelhart et al."]
        C --> D
    end
    
    subgraph era3["Modern era (2000s-Now)"]
        style era3 fill:#d79921,color:#282828
        E["2006: Deep Learning
Hinton et al."]
        F["2012: AlexNet
ImageNet Breakthrough"]
        E --> F
    end
    
    era1 --> era2 --> era3

1.3 Position in the Machine Learning Landscape

flowchart TB
    subgraph ML["Machine Learning"]
        style ML fill:#282828,color:#ebdbb2
        
        subgraph SL["Supervised Learning"]
            style SL fill:#458588,color:#ebdbb2
            
            subgraph GEN["Generative Models"]
                style GEN fill:#689d6a,color:#ebdbb2
                G1["Naive Bayes"]
                G2["GDA"]
                G3["HMM"]
            end
            
            subgraph DIS["Discriminative Models"]
                style DIS fill:#d79921,color:#282828
                D1["Linear Regression"]
                D2["Logistic Regression"]
                D3["Perceptron"]
                D4["SVM"]
                D5["Neural Networks"]
            end
        end
        
        subgraph UL["Unsupervised Learning"]
            style UL fill:#b16286,color:#ebdbb2
            U1["K-Means"]
            U2["PCA"]
        end
    end

2. Linear Regression

2.1 Basic Concept

Linear regression is a statistical method for predicting continuous values. It models a linear relationship between the independent variables and the dependent variable.

Main assumptions:

The relationship between X and y is linear
The errors are normally distributed
The error variance is constant (homoscedasticity)
The errors are independent of one another

2.2 Mathematical Formulation

2.2.1 Simple Linear Regression

The basic equation of simple linear regression:

y = w_{0} + w_{1} x + ε

Variables:

y = dependent variable (target)
x = independent variable (feature)
w₀ = y-intercept (bias)
w₁ = slope (weight)
ε = error term

2.2.2 Multiple Linear Regression

With several features:

y = w_{0} + w_{1} x_{1} + w_{2} x_{2} + ... + w_{n} x_{n} + ε

Or in vector form:

y = w^{T} x + ε

Variables:

w = weight vector [w₀, w₁, w₂, ..., wₙ]ᵀ
x = feature vector [1, x₁, x₂, ..., xₙ]ᵀ (the leading 1 multiplies the intercept w₀)
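The vector form and the expanded sum give the same prediction once a constant 1 is prepended to the features. A minimal sketch, assuming made-up weights and a made-up sample (none of these numbers come from the text):

```python
import numpy as np

# Hypothetical weights [w0, w1, w2] and one sample with two features.
w = np.array([0.5, 2.0, -1.0])
x_raw = np.array([3.0, 4.0])

# Prepend the constant 1 so that w0 acts as the intercept.
x = np.concatenate(([1.0], x_raw))

# Vector form: y_hat = w^T x
y_hat_vector = w @ x

# Expanded form: y_hat = w0 + w1*x1 + w2*x2
y_hat_expanded = w[0] + w[1] * x_raw[0] + w[2] * x_raw[1]

print(y_hat_vector, y_hat_expanded)
```

Keeping the 1 inside x is what lets a single dot product cover the intercept as well as the slopes.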

2.3 Cost Function

2.3.1 Mean Squared Error (MSE)

The mean squared error measures the gap between predicted and actual values. The extra factor 1/2 is a convenience that cancels when the cost is differentiated:

J (w) = \frac{1}{2 m} \sum_{i = 1}^{m} {(y_{i} - {\hat{y}}_{i})}^{2}

Variables:

J(w) = value of the cost function
m = number of training examples
yᵢ = actual value of example i
ŷᵢ = predicted value of example i, ŷᵢ = wᵀxᵢ
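As a quick numeric check of the formula, the sketch below evaluates the cost for three made-up targets and predictions. The helper name `mse_cost` and the numbers are assumptions for illustration only.

```python
import numpy as np

def mse_cost(y_true, y_pred):
    """Half mean squared error: J = (1 / (2m)) * sum((y - y_hat)^2)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    m = y_true.shape[0]
    return np.sum((y_true - y_pred) ** 2) / (2 * m)

# Hypothetical targets and predictions for three samples.
y_true = [2.0, 3.5, 5.0]
y_pred = [2.5, 3.0, 5.0]

cost = mse_cost(y_true, y_pred)
print(cost)  # (0.25 + 0.25 + 0) / 6
```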

2.4 Solution Methods

2.4.1 Normal Equation (Closed-form Solution)

The normal equation gives the solution directly, with no iteration:

w = {(X^{T} X)}^{- 1} X^{T} y

Variables:

X = data matrix (m × (n+1)), including the leading column of ones for the intercept
y = target vector (m × 1)
w = weight vector to solve for ((n+1) × 1)
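In code, the explicit inverse is usually avoided. The sketch below solves the same system with `np.linalg.solve` and, as an alternative, with `np.linalg.lstsq`, which also copes with a singular XᵀX. The data are made up (generated from y = 1 + 2x) so the expected weights are known in advance.

```python
import numpy as np

# Hypothetical data generated from y = 1 + 2x (no noise).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones_like(x), x])

# Solve (X^T X) w = X^T y without forming the inverse explicitly.
w_solve = np.linalg.solve(X.T @ X, X.T @ y)

# Least-squares solver; also handles a singular X^T X.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(w_solve, w_lstsq)
```

Both calls should recover an intercept of 1 and a slope of 2 up to floating-point error.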

2.4.2 Gradient Descent

Gradient descent adjusts the weights iteratively, stepping against the gradient of the cost:

w_{j} := w_{j} - α \frac{\partial J}{\partial w_{j}}

For linear regression:

w_{j} := w_{j} - α \frac{1}{m} \sum_{i = 1}^{m} ({\hat{y}}_{i} - y_{i}) x_{i, j}

Variables:

α = learning rate
∂J/∂wⱼ = partial derivative of the cost function with respect to the weight wⱼ
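The linear-regression update follows from differentiating the MSE cost term by term, using ŷᵢ = wᵀxᵢ so that ∂ŷᵢ/∂wⱼ = x_{i,j}:

```latex
\frac{\partial J}{\partial w_{j}}
  = \frac{1}{2m} \sum_{i=1}^{m} 2\,(y_{i} - \hat{y}_{i}) \cdot \left(-\frac{\partial \hat{y}_{i}}{\partial w_{j}}\right)
  = \frac{1}{m} \sum_{i=1}^{m} (\hat{y}_{i} - y_{i})\, x_{i,j}
```

The factor 2 from the square cancels the 1/2 in the cost, which is why the update carries only 1/m.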

2.5 Worked Example: Multiple Linear Regression

Suppose house prices depend on several factors:

Example	Area (x₁), sq m	Bedrooms (x₂)	House age (x₃), years	Price (y), million baht
1	50	1	10	2.0
2	80	2	5	3.5
3	100	3	2	5.0
4	120	3	8	4.5

Goal: find the weights w₀, w₁, w₂, w₃ for the equation:

y = w_{0} + w_{1} x_{1} + w_{2} x_{2} + w_{3} x_{3}

Method 1: Normal Equation

The normal equation:

w = {(X^{T} X)}^{- 1} X^{T} y

Step 1: Build the matrix X (add a column of ones for the bias)

X = [\begin{matrix} 1 & 50 & 1 & 10 \\ 1 & 80 & 2 & 5 \\ 1 & 100 & 3 & 2 \\ 1 & 120 & 3 & 8 \end{matrix}]

The vector y:

y = [\begin{matrix} 2.0 \\ 3.5 \\ 5.0 \\ 4.5 \end{matrix}]

Step 2: Compute X^T X

X^{T} = [\begin{matrix} 1 & 1 & 1 & 1 \\ 50 & 80 & 100 & 120 \\ 1 & 2 & 3 & 3 \\ 10 & 5 & 2 & 8 \end{matrix}]

X^{T} X = [\begin{matrix} 4 & 350 & 9 & 25 \\ 350 & 33300 & 870 & 2060 \\ 9 & 870 & 23 & 50 \\ 25 & 2060 & 50 & 193 \end{matrix}]

Step 3: Compute X^T y

X^{T} y = [\begin{matrix} 1 (2.0) + 1 (3.5) + 1 (5.0) + 1 (4.5) \\ 50 (2.0) + 80 (3.5) + 100 (5.0) + 120 (4.5) \\ 1 (2.0) + 2 (3.5) + 3 (5.0) + 3 (4.5) \\ 10 (2.0) + 5 (3.5) + 2 (5.0) + 8 (4.5) \end{matrix}] = [\begin{matrix} 15.0 \\ 1420.0 \\ 37.5 \\ 83.5 \end{matrix}]

Step 4: Compute (X^T X)^(-1) X^T y

Inverting a 4×4 matrix by hand is tedious, so the result is computed numerically:

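import numpy as np

# Build the data matrix (the first column of ones is the bias term)
X = np.array([
    [1, 50, 1, 10],
    [1, 80, 2, 5],
    [1, 100, 3, 2],
    [1, 120, 3, 8]
])
y = np.array([2.0, 3.5, 5.0, 4.5])

# Normal Equation: w = (X^T X)^(-1) X^T y
XtX = X.T @ X
XtX_inv = np.linalg.inv(XtX)
Xty = X.T @ y
w = XtX_inv @ Xty

print("Weights from the normal equation:")
print(f"w₀ (Intercept) = {w[0]:.4f}")
print(f"w₁ (area)      = {w[1]:.4f}")
print(f"w₂ (bedrooms)  = {w[2]:.4f}")
print(f"w₃ (house age) = {w[3]:.4f}")

Result:

Weights from the normal equation:
w₀ (Intercept) = 1.4500
w₁ (area)      = -0.0100
w₂ (bedrooms)  = 1.5500
w₃ (house age) = -0.0500

Resulting equation:

y = 1.45 - 0.01 x_{1} + 1.55 x_{2} - 0.05 x_{3}

Interpretation:

With four examples and four unknown weights, the fitted plane passes through every training point exactly (substituting any row of the table reproduces its price), so the coefficients should be read with caution.
w₁ = -0.01: each additional sq m changes the predicted price by -0.01 million baht (-10,000 baht) with the other features held fixed. The negative sign is an artifact of the tiny sample, in which area and bedroom count are strongly correlated.
w₂ = 1.55: each additional bedroom raises the predicted price by 1.55 million baht
w₃ = -0.05: each additional year of age lowers the predicted price by 0.05 million baht (50,000 baht)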

Method 2: Gradient Descent

Weight update rule:

w_{j} := w_{j} - α \frac{1}{m} \sum_{i = 1}^{m} ({\hat{y}}_{i} - y_{i}) x_{i, j}

Step 1: Set the initial values

Initial weights: w = [0, 0, 0, 0]
Learning rate (α): 0.1
Number of iterations: 10,000

Step 2: Feature scaling (very important for gradient descent)

The features have very different scales (area: 50-120, bedrooms: 1-3, age: 2-10), so they are standardized first:

x_{norm} = \frac{x - μ}{σ}

# Compute the mean and (population) standard deviation of each feature
X_features = np.array([
    [50, 1, 10],
    [80, 2, 5],
    [100, 3, 2],
    [120, 3, 8]
])

mean = X_features.mean(axis=0)  # [87.5, 2.25, 6.25]
std = X_features.std(axis=0)    # [25.86, 0.829, 3.031]

X_normalized = (X_features - mean) / std

Step 3: Run gradient descent

def gradient_descent(X, y, alpha=0.01, n_iterations=10000):
    """
    Batch gradient descent for linear regression.
    """
    m, n = X.shape
    w = np.zeros(n)  # start from zero
    cost_history = []
    
    for iteration in range(n_iterations):
        # Predictions with the current weights
        y_pred = X @ w
        
        # Gradient of the cost
        error = y_pred - y
        gradient = (1/m) * X.T @ error
        
        # Weight update
        w = w - alpha * gradient
        
        # Record the cost (evaluated with the weights before this update)
        cost = (1/(2*m)) * np.sum(error**2)
        cost_history.append(cost)
        
        # Report progress every 2000 iterations
        if iteration % 2000 == 0:
            print(f"Iteration {iteration}: Cost = {cost:.6f}")
    
    return w, cost_history

# Add the bias column to the standardized features
X_b = np.column_stack([np.ones(4), X_normalized])

# Run gradient descent
w_gd, costs = gradient_descent(X_b, y, alpha=0.1, n_iterations=10000)

print("\nWeights from gradient descent (scaled):")
print(f"w = {w_gd}")

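Result:

Iteration 0: Cost = 7.687500

The cost at iteration 0 is the cost of the all-zero weights, (1/(2·4)) · (2.0² + 3.5² + 5.0² + 4.5²) = 7.6875. Because four examples can be fit exactly by four weights, the cost printed at the later checkpoints keeps shrinking toward 0, and the scaled weights approach the least-squares solution

w ≈ [3.75, -0.26, 1.29, -0.15]

The intercept 3.75 is the mean of y. Convergence is slow on this data because area and bedroom count are strongly correlated (correlation ≈ 0.96).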

Step 4: Convert the weights back to the original scale

# Convert the weights back to the original feature scale
w_original = np.zeros(4)
w_original[0] = w_gd[0] - np.sum(w_gd[1:] * mean / std)  # intercept
w_original[1:] = w_gd[1:] / std  # slopes

print("Weights (original scale):")
print(f"w₀ = {w_original[0]:.4f}")
print(f"w₁ = {w_original[1]:.4f}")
print(f"w₂ = {w_original[2]:.4f}")
print(f"w₃ = {w_original[3]:.4f}")

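Comparing the results

Method	w₀	w₁	w₂	w₃
Normal Equation	1.4500	-0.0100	1.5500	-0.0500
Gradient Descent	≈ 1.45	≈ -0.01	≈ 1.55	≈ -0.05

Observation: both methods target the same least-squares solution. The normal equation reaches it in one step; gradient descent approaches it and agrees to more decimal places as the number of iterations grows.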

Checking the result

Predict the price of a new house: 90 sq m, 2 bedrooms, 3 years old

\hat{y} = 1.45 - 0.01 (90) + 1.55 (2) - 0.05 (3)

\hat{y} = 1.45 - 0.9 + 3.1 - 0.15

\hat{y} = 3.50 million baht

2.6 Comparing the Solution Methods

Property	Normal Equation	Gradient Descent
Speed (small n)	Faster	Slower
Speed (large n)	Slow, O(n³) to solve the n × n system	Faster, O(kmn) for k iterations over m examples
Must choose α	No	Yes
Needs feature scaling	No	Yes
Invertibility	XᵀX must be invertible	Not required
Best suited to	n < 10,000	Very large n

2.7 Linear Regression with Scikit-learn

"""
Linear Regression with Scikit-learn
===================================
"""

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# ============================================
# Sample data: house prices
# ============================================

# Features: area (sq m), bedrooms, house age (years)
X = np.array([
    [50, 1, 10],
    [80, 2, 5],
    [100, 3, 2],
    [120, 3, 8],
    [65, 2, 7],
    [90, 2, 4],
    [110, 3, 6],
    [75, 2, 3]
])

# Target: price (million baht)
y = np.array([2.0, 3.5, 5.0, 4.5, 2.8, 3.8, 4.8, 3.2])

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

print("=" * 50)
print("LINEAR REGRESSION WITH SCIKIT-LEARN")
print("=" * 50)

# ============================================
# Method 1: no feature scaling
# ============================================

print("\n--- Method 1: No feature scaling ---")

# Create and fit the model
model_basic = LinearRegression()
model_basic.fit(X_train, y_train)

# Show the weights
print(f"\nIntercept (w₀): {model_basic.intercept_:.4f}")
print(f"Coefficients:")
print(f"  w₁ (area):      {model_basic.coef_[0]:.4f}")
print(f"  w₂ (bedrooms):  {model_basic.coef_[1]:.4f}")
print(f"  w₃ (house age): {model_basic.coef_[2]:.4f}")

# Predict and evaluate
y_pred_train = model_basic.predict(X_train)
y_pred_test = model_basic.predict(X_test)

print(f"\nEvaluation:")
print(f"  Train MSE: {mean_squared_error(y_train, y_pred_train):.4f}")
print(f"  Test MSE:  {mean_squared_error(y_test, y_pred_test):.4f}")
print(f"  Train R²:  {r2_score(y_train, y_pred_train):.4f}")
print(f"  Test R²:   {r2_score(y_test, y_pred_test):.4f}")

# ============================================
# Method 2: with feature scaling
# ============================================

print("\n--- Method 2: Feature scaling (StandardScaler) ---")

# Fit the scaler on the training set only, then apply it to the test set
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Create and fit the model
model_scaled = LinearRegression()
model_scaled.fit(X_train_scaled, y_train)

# Show the weights (in the scaled space)
print(f"\nIntercept: {model_scaled.intercept_:.4f}")
print(f"Coefficients (scaled):")
print(f"  w₁ (area):      {model_scaled.coef_[0]:.4f}")
print(f"  w₂ (bedrooms):  {model_scaled.coef_[1]:.4f}")
print(f"  w₃ (house age): {model_scaled.coef_[2]:.4f}")

# Predict and evaluate
y_pred_test_scaled = model_scaled.predict(X_test_scaled)
print(f"\nEvaluation (Test):")
print(f"  MSE: {mean_squared_error(y_test, y_pred_test_scaled):.4f}")
print(f"  R²:  {r2_score(y_test, y_pred_test_scaled):.4f}")

# ============================================
# Example: predicting a new house
# ============================================

print("\n" + "=" * 50)
print("Prediction example")
print("=" * 50)

# New house: 90 sq m, 2 bedrooms, 3 years old
new_house = np.array([[90, 2, 3]])

# Predict with the unscaled model
pred_basic = model_basic.predict(new_house)
print(f"\nHouse: 90 sq m, 2 bedrooms, 3 years old")
print(f"Predicted price: {pred_basic[0]:.2f} million baht")

# Predict with the scaled model (the new sample must be scaled first)
new_house_scaled = scaler.transform(new_house)
pred_scaled = model_scaled.predict(new_house_scaled)
print(f"Predicted price (scaled model): {pred_scaled[0]:.2f} million baht")

# ============================================
# Fitted equation
# ============================================

print("\n" + "=" * 50)
print("Fitted linear regression equation")
print("=" * 50)
print(f"\ny = {model_basic.intercept_:.4f} + "
      f"{model_basic.coef_[0]:.4f}×area + "
      f"{model_basic.coef_[1]:.4f}×bedrooms + "
      f"{model_basic.coef_[2]:.4f}×house_age")

Sample output (illustrative; the exact figures depend on the train/test split):

==================================================
LINEAR REGRESSION WITH SCIKIT-LEARN
==================================================

--- Method 1: No feature scaling ---

Intercept (w₀): -0.2143
Coefficients:
  w₁ (area):      0.0286
  w₂ (bedrooms):  0.7857
  w₃ (house age): -0.0571

Evaluation:
  Train MSE: 0.0143
  Test MSE:  0.0200
  Train R²:  0.9857
  Test R²:   0.9750

--- Method 2: Feature scaling (StandardScaler) ---

Intercept: 3.7000
Coefficients (scaled):
  w₁ (area):      0.5714
  w₂ (bedrooms):  0.5714
  w₃ (house age): -0.1429

Evaluation (Test):
  MSE: 0.0200
  R²:  0.9750

==================================================
Prediction example
==================================================

House: 90 sq m, 2 bedrooms, 3 years old
Predicted price: 3.76 million baht
Predicted price (scaled model): 3.76 million baht

==================================================
Fitted linear regression equation
==================================================

y = -0.2143 + 0.0286×area + 0.7857×bedrooms + -0.0571×house_age

3. Logistic Regression

3.1 Basic Concept

Logistic regression is a classification algorithm. Despite the word "regression" in its name, it is used to predict the probability that an example belongs to a given class.

Strengths:

Outputs a probability between 0 and 1
Easy to interpret
Works well when the classes are approximately linearly separable
Serves as a building block for neural networks

3.2 Sigmoid Function

The sigmoid (logistic) function maps any real value into the interval (0, 1):

σ (z) = \frac{1}{1 + e^{- z}}

Key properties:

σ(0) = 0.5
As z → ∞, σ(z) → 1
As z → -∞, σ(z) → 0
Derivative: σ'(z) = σ(z)(1 - σ(z))
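These properties are easy to confirm numerically. The sketch below uses only the standard library; the branch on the sign of z is one common way to avoid overflow in `math.exp` and is an implementation choice made here, not something the text prescribes.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def sigmoid_derivative(z):
    """Closed-form derivative: sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

# Compare the closed form against a central finite difference at z = 0.7.
z, h = 0.7, 1e-6
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)

print(sigmoid(0.0))                    # midpoint
print(sigmoid(40.0), sigmoid(-40.0))   # saturates toward 1 and 0
print(sigmoid_derivative(z), numeric)  # should agree closely
```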

flowchart LR
    subgraph INPUT["Input"]
        style INPUT fill:#458588,color:#ebdbb2
        A["z = w^T x + b
Linear Combination"]
    end
    
    subgraph SIGMOID["Sigmoid Function"]
        style SIGMOID fill:#d79921,color:#282828
        B["σ(z) = 1/(1+e^(-z))
Squash to (0,1)"]
    end
    
    subgraph OUTPUT["Output"]
        style OUTPUT fill:#689d6a,color:#ebdbb2
        C["P(y=1|x)
Probability"]
    end
    
    A --> B --> C

3.3 Mathematical Formulation

3.3.1 Hypothesis Function

h_{w} (x) = σ (w^{T} x) = \frac{1}{1 + e^{- w^{T} x}}

Where:

h_w(x) = predicted value (the probability that y = 1)
w = weight vector
x = feature vector

3.3.2 Decision Rule

\hat{y} = \begin{cases} 1 & \text{if } h_{w} (x) \geq 0.5 \\ 0 & \text{if } h_{w} (x) < 0.5 \end{cases}
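Because σ(0) = 0.5 and σ is increasing, thresholding the probability at 0.5 is the same as checking the sign of wᵀx. A minimal sketch with made-up weights (the helper name `predict_label` and all numbers are assumptions for illustration):

```python
import math

def predict_label(w, x, threshold=0.5):
    """Return 1 if sigmoid(w . x) >= threshold, else 0."""
    z = sum(wj * xj for wj, xj in zip(w, x))
    h = 1.0 / (1.0 + math.exp(-z))
    return 1 if h >= threshold else 0

# Hypothetical weights [w0, w1]; inputs are [1, x1] so w0 is the intercept.
w = [-2.0, 1.0]
labels = [predict_label(w, [1.0, x1]) for x1 in (0.0, 1.0, 2.0, 3.0)]
print(labels)  # the boundary sits at x1 = 2, where w . x = 0
```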

3.4 Cost Function

3.4.1 Binary Cross-Entropy Loss

MSE is a poor fit for logistic regression: composed with the sigmoid, it makes the cost function non-convex in w. Binary cross-entropy (log loss) is used instead:

J (w) = - \frac{1}{m} \sum_{i = 1}^{m} [y_{i} \log ({\hat{y}}_{i}) + (1 - y_{i}) \log (1 - {\hat{y}}_{i})]

Variables:

J(w) = value of the cost
m = number of examples
yᵢ = true label (0 or 1)
ŷᵢ = predicted value h_w(xᵢ)

3.4.2 Interpreting the Cost Function

When y = 1	When y = 0
Cost = -log(ŷ)	Cost = -log(1-ŷ)
If ŷ → 1, Cost → 0	If ŷ → 0, Cost → 0
If ŷ → 0, Cost → ∞	If ŷ → 1, Cost → ∞
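The table can be checked with a few numbers. This sketch computes the per-sample cost for a positive example at three confidence levels; the helper name `bce` and the clipping constant are assumptions chosen for illustration.

```python
import math

def bce(y, y_hat, eps=1e-15):
    """Per-sample binary cross-entropy with clipping to avoid log(0)."""
    y_hat = min(max(y_hat, eps), 1.0 - eps)
    return -(y * math.log(y_hat) + (1 - y) * math.log(1.0 - y_hat))

# True label y = 1: the cost grows as the prediction moves toward 0.
for y_hat in (0.99, 0.5, 0.01):
    print(f"y=1, y_hat={y_hat}: cost={bce(1, y_hat):.4f}")
```

A confident correct prediction costs almost nothing, a coin-flip costs ln 2, and a confident wrong prediction is penalized heavily.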

3.5 Gradient Descent for Logistic Regression

The weight update has the same form as in linear regression:

w_{j} := w_{j} - α \frac{1}{m} \sum_{i = 1}^{m} (h_{w} (x_{i}) - y_{i}) x_{i, j}

Note: the formulas look identical, but h_w(x) differs: it is the linear function wᵀx in linear regression and the sigmoid σ(wᵀx) here.
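The matching form is not a coincidence. Writing ŷᵢ = h_w(xᵢ) = σ(wᵀxᵢ) and using σ'(z) = σ(z)(1 - σ(z)), so that ∂ŷᵢ/∂wⱼ = ŷᵢ(1 - ŷᵢ) x_{i,j}, the cross-entropy gradient simplifies as follows:

```latex
\frac{\partial J}{\partial w_{j}}
  = -\frac{1}{m} \sum_{i=1}^{m}
      \left[ \frac{y_{i}}{\hat{y}_{i}} - \frac{1 - y_{i}}{1 - \hat{y}_{i}} \right]
      \hat{y}_{i} (1 - \hat{y}_{i})\, x_{i,j}
  = -\frac{1}{m} \sum_{i=1}^{m}
      \left[ y_{i} (1 - \hat{y}_{i}) - (1 - y_{i})\, \hat{y}_{i} \right] x_{i,j}
  = \frac{1}{m} \sum_{i=1}^{m} (\hat{y}_{i} - y_{i})\, x_{i,j}
```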

3.6 Worked Example: Multiple Logistic Regression

Suppose we want to predict whether a customer will buy a product, based on several factors:

Example	Age (x₁)	Income (x₂), 10,000 baht/month	Time viewing product (x₃), minutes	Bought (y)
1	25	2.5	5	0
2	35	4.0	15	1
3	45	5.5	10	1
4	22	2.0	3	0
5	38	4.5	20	1
6	30	3.0	8	0

Goal: find the weights w₀, w₁, w₂, w₃ for the equation:

P (y = 1 | x) = σ (w_{0} + w_{1} x_{1} + w_{2} x_{2} + w_{3} x_{3})

where the sigmoid function is:

σ (z) = \frac{1}{1 + e^{- z}}

Computation with Gradient Descent

Step 1: Prepare the data

import numpy as np

# Data
X = np.array([
    [25, 2.5, 5],
    [35, 4.0, 15],
    [45, 5.5, 10],
    [22, 2.0, 3],
    [38, 4.5, 20],
    [30, 3.0, 8]
])
y = np.array([0, 1, 1, 0, 1, 0])

# Standardize the features (np.std uses the population formula by default)
mean = X.mean(axis=0)  # [32.5, 3.58, 10.17]
std = X.std(axis=0)    # [7.80, 1.20, 5.81]
X_norm = (X - mean) / std

# Add the bias column
X_b = np.column_stack([np.ones(6), X_norm])

Step 2: Define the sigmoid and the cost function

Binary Cross-Entropy Loss:

J (w) = - \frac{1}{m} \sum_{i = 1}^{m} [y_{i} \log ({\hat{y}}_{i}) + (1 - y_{i}) \log (1 - {\hat{y}}_{i})]

def sigmoid(z):
    """Sigmoid function"""
    return 1 / (1 + np.exp(-np.clip(z, -500, 500)))

def compute_cost(X, y, w):
    """Binary Cross-Entropy Loss"""
    m = len(y)
    h = sigmoid(X @ w)
    epsilon = 1e-15
    h = np.clip(h, epsilon, 1 - epsilon)
    cost = -(1/m) * np.sum(y * np.log(h) + (1-y) * np.log(1-h))
    return cost

Step 3: Run gradient descent

Update rule:

w_{j} := w_{j} - α \frac{1}{m} \sum_{i = 1}^{m} (h_{w} (x_{i}) - y_{i}) x_{i, j}

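def gradient_descent_logistic(X, y, alpha=0.1, n_iterations=5000):
    m, n = X.shape
    w = np.zeros(n)
    cost_history = []
    
    for iteration in range(n_iterations):
        # Record the cost of the current weights (before the update),
        # so iteration 0 reports the cost at w = 0, which is ln 2
        cost = compute_cost(X, y, w)
        cost_history.append(cost)
        
        # Hypothesis
        h = sigmoid(X @ w)
        
        # Gradient of the cross-entropy cost
        gradient = (1/m) * X.T @ (h - y)
        
        # Weight update
        w = w - alpha * gradient
        
        if iteration % 1000 == 0:
            print(f"Iteration {iteration}: Cost = {cost:.6f}")
    
    return w, cost_history

# Run gradient descent
w, costs = gradient_descent_logistic(X_b, y, alpha=0.5, n_iterations=5000)

# Print the learned weights (in the scaled feature space)
print("\nWeights (scaled):")
labels = ["w₀ (Intercept)", "w₁ (age)", "w₂ (income)", "w₃ (viewing time)"]
for label, value in zip(labels, w):
    print(f"{label} = {value:.4f}")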

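Sample output (illustrative values):

Iteration 0: Cost = 0.693147
Iteration 1000: Cost = 0.298721
Iteration 2000: Cost = 0.238456
Iteration 3000: Cost = 0.212893
Iteration 4000: Cost = 0.198654

Weights (scaled):
w₀ (Intercept) = 0.4055
w₁ (age)       = 0.8234
w₂ (income)    = 0.9156
w₃ (viewing time) = 1.2341

The first line is exact when the cost is evaluated at w = 0 (ln 2 ≈ 0.693147). The training set in this example is linearly separable, so without regularization the cost keeps falling toward 0 and the weights keep growing as more iterations are run; treat the later figures as illustrative rather than as converged values.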

Step 4: Prediction

Predict for a new customer: age 33, income 38,000 baht/month, 12 minutes viewing the product

Step 4.1: Standardize the new sample (using the training mean and population standard deviation from Step 1)

x_{1, norm} = \frac{33 - 32.5}{7.80} \approx 0.064

x_{2, norm} = \frac{3.8 - 3.583}{1.205} \approx 0.180

x_{3, norm} = \frac{12 - 10.167}{5.814} \approx 0.315

Step 4.2: Compute z (with the illustrative weights 0.4055, 0.8234, 0.9156, 1.2341)

z = w_{0} + w_{1} x_{1, norm} + w_{2} x_{2, norm} + w_{3} x_{3, norm}

z = 0.4055 + 0.8234 (0.064) + 0.9156 (0.180) + 1.2341 (0.315)

z \approx 0.4055 + 0.0527 + 0.1648 + 0.3887 = 1.0117

Step 4.3: Apply the sigmoid

P (y = 1) = σ (1.0117) = \frac{1}{1 + e^{- 1.0117}} = \frac{1}{1 + 0.3636} \approx 0.733

# New customer
new_customer = np.array([33, 3.8, 12])

# Standardize with the training mean and std
new_norm = (new_customer - mean) / std

# Add the bias term and predict
new_b = np.array([1, *new_norm])
z = np.dot(w, new_b)
prob = sigmoid(z)

print(f"\nNew customer: age 33, income 38,000 baht, 12 minutes viewing")
print(f"z = {z:.4f}")
print(f"Probability of buying: {prob:.2%}")
print(f"Prediction: {'buy' if prob >= 0.5 else 'no purchase'}")

Sample output (with the illustrative weights 0.4055, 0.8234, 0.9156, 1.2341; hand-calculated values differ slightly because of rounding):

New customer: age 33, income 38,000 baht, 12 minutes viewing
z = 1.0121
Probability of buying: 73.34%
Prediction: buy

Step 5: Check the accuracy

# Predict for every training example
z_all = X_b @ w
probs = sigmoid(z_all)
predictions = (probs >= 0.5).astype(int)

accuracy = np.mean(predictions == y)
print(f"\nAccuracy on the training data: {accuracy:.2%}")
print(f"Predicted: {predictions}")
print(f"Actual:    {y}")

Result:

Accuracy on the training data: 100.00%
Predicted: [0 1 1 0 1 0]
Actual:    [0 1 1 0 1 0]

3.7 Multi-class Logistic Regression (in detail)

When there are more than two classes, there are two main approaches: One-vs-Rest (OvR) and Softmax Regression (Multinomial).

3.7.1 One-vs-Rest (OvR) Strategy

Basic idea

One-vs-Rest turns a multi-class problem into several binary classification problems by building a separate classifier for each class.

For a problem with K classes, K classifiers are built, and each one separates "its own class" from "all the other classes".

┌─────────────────────────────────────────────────────────────┐
│                  One-vs-Rest Strategy                       │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│           Data with 3 classes (A, B, C)                     │
│                         │                                   │
│        ┌────────────────┼────────────────┐                  │
│        ▼                ▼                ▼                  │
│  ┌────────────┐   ┌────────────┐   ┌────────────┐           │
│  │ Classifier │   │ Classifier │   │ Classifier │           │
│  │     1      │   │     2      │   │     3      │           │
│  │  A vs BC   │   │  B vs AC   │   │  C vs AB   │           │
│  └─────┬──────┘   └─────┬──────┘   └─────┬──────┘           │
│        └────────────────┼────────────────┘                  │
│                         ▼                                   │
│    Pick the class with the highest probability              │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Worked example: One-vs-Rest

Problem: classify flowers of 3 kinds (A, B, C) from 2 features

Example	x₁ (petal length)	x₂ (petal width)	Class
1	1.0	0.5	A
2	1.2	0.6	A
3	2.5	1.5	B
4	2.8	1.8	B
5	4.0	0.8	C
6	4.5	1.0	C

Step 1: Build 3 binary classifiers

Classifier 1: A vs (B, C)

Example	x₁	x₂	y (A=1, other=0)
1	1.0	0.5	1
2	1.2	0.6	1
3	2.5	1.5	0
4	2.8	1.8	0
5	4.0	0.8	0
6	4.5	1.0	0

Classifier 2: B vs (A, C)

Example	x₁	x₂	y (B=1, other=0)
1	1.0	0.5	0
2	1.2	0.6	0
3	2.5	1.5	1
4	2.8	1.8	1
5	4.0	0.8	0
6	4.5	1.0	0

Classifier 3: C vs (A, B)

Example	x₁	x₂	y (C=1, other=0)
1	1.0	0.5	0
2	1.2	0.6	0
3	2.5	1.5	0
4	2.8	1.8	0
5	4.0	0.8	1
6	4.5	1.0	1

Step 2: Train each classifier

Suppose that training logistic regression for each classifier yields the following weights:

Classifier A: w_A = [-3.0, 2.5, 1.0] (w₀, w₁, w₂)
Classifier B: w_B = [-4.0, 1.0, 2.0]
Classifier C: w_C = [-2.0, 0.8, -0.5]

Step 3: Predict for a new data point

New data point: x = [1.5, 0.7]

Compute z for each classifier:

z_A = -3.0 + 2.5(1.5) + 1.0(0.7)
    = -3.0 + 3.75 + 0.7
    = 1.45

z_B = -4.0 + 1.0(1.5) + 2.0(0.7)
    = -4.0 + 1.5 + 1.4
    = -1.1

z_C = -2.0 + 0.8(1.5) + (-0.5)(0.7)
    = -2.0 + 1.2 - 0.35
    = -1.15

Compute the probabilities with the sigmoid:

σ(z) = 1 / (1 + e^(-z))

P(A|x):

P(A|x) = σ(1.45) = 1 / (1 + e^(-1.45))
       = 1 / (1 + 0.2346)
       = 1 / 1.2346
       = 0.810 (81.0%)

P(B|x):

P(B|x) = σ(-1.1) = 1 / (1 + e^(1.1))
       = 1 / (1 + 3.004)
       = 1 / 4.004
       = 0.250 (25.0%)

P(C|x):

P(C|x) = σ(-1.15) = 1 / (1 + e^(1.15))
       = 1 / (1 + 3.158)
       = 1 / 4.158
       = 0.240 (24.0%)

Result:

Class	Probability
A	0.810 (highest)
B	0.250
C	0.240

Prediction: class A (it has the highest probability)

Note: in OvR the probabilities need not sum to 1, because each classifier is trained and evaluated independently (in this example 0.810 + 0.250 + 0.240 = 1.300).
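The three scores above can be reproduced in a few lines. The weights are the assumed ones from the example; the final rescaling step is an extra convention shown for comparison (some libraries rescale OvR scores so they sum to 1), not part of the worked example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Weights [w0, w1, w2] assumed in the worked example.
weights = {
    "A": [-3.0, 2.5, 1.0],
    "B": [-4.0, 1.0, 2.0],
    "C": [-2.0, 0.8, -0.5],
}
x = [1.0, 1.5, 0.7]  # the leading 1 multiplies the intercept

# One independent sigmoid score per classifier.
probs = {
    name: sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
    for name, w in weights.items()
}
predicted = max(probs, key=probs.get)

# Optional convention (an assumption here): rescale the scores to sum to 1.
total = sum(probs.values())
normalized = {name: p / total for name, p in probs.items()}

print({k: round(v, 3) for k, v in probs.items()}, predicted)
print({k: round(v, 3) for k, v in normalized.items()})
```

Rescaling changes the reported numbers but not the ranking, so the predicted class stays the same.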

3.7.2 Softmax Regression (Multinomial Logistic Regression)

Basic idea

Softmax regression learns the weights of all classes jointly in a single model. The softmax function turns the scores z of all classes into probabilities that sum to 1.

Softmax function

P (y = k | x) = \frac{e^{z_{k}}}{\sum_{j = 1}^{K} e^{z_{j}}}

where:

z_{k} = w_{k}^{T} x = w_{k, 0} + w_{k, 1} x_{1} + w_{k, 2} x_{2} + ...

Properties of softmax

Sums to 1: ΣP(y=k|x) = 1
Values lie between 0 and 1: 0 < P(y=k|x) < 1
Generalizes the sigmoid: for K=2, softmax is equivalent to the sigmoid applied to the difference of the two scores
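A short sketch of these properties, using only the standard library. Subtracting the maximum score before exponentiating is a common stability trick added here as an assumption; it leaves the probabilities unchanged because the common factor cancels in the ratio.

```python
import math

def softmax(z):
    """Softmax with the max subtracted first for numerical stability."""
    z_max = max(z)
    exps = [math.exp(v - z_max) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# K = 2: softmax([z1, z0])[0] equals sigmoid(z1 - z0).
z1, z0 = 1.3, -0.4
p_softmax = softmax([z1, z0])[0]
p_sigmoid = sigmoid(z1 - z0)

# Large scores do not overflow thanks to the max shift.
p_large = softmax([1000.0, 999.0, 998.0])

print(p_softmax, p_sigmoid)
print(p_large, sum(p_large))
```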

Worked example: Softmax Regression

Problem: use the same data as in the OvR example (3 classes: A, B, C)

Step 1: Weight matrix

Suppose that training yields the following weights (a matrix of size K×(n+1) = 3×3):

        ┌                    ┐
        │  w₀     w₁     w₂  │
    W = │ ───────────────────│
      A │  2.0    3.0    1.5 │
      B │ -1.0    0.5    2.5 │
      C │ -0.5    1.5   -1.0 │
        └                    ┘

Each row holds [w₀, w₁, w₂] for one class

Step 2: Compute z for the new data point

New data point: x = [1, 1.5, 0.7] (a leading 1 is added for the bias)

z_A = 2.0(1) + 3.0(1.5) + 1.5(0.7)
    = 2.0 + 4.5 + 1.05
    = 7.55

z_B = -1.0(1) + 0.5(1.5) + 2.5(0.7)
    = -1.0 + 0.75 + 1.75
    = 1.5

z_C = -0.5(1) + 1.5(1.5) + (-1.0)(0.7)
    = -0.5 + 2.25 - 0.7
    = 1.05

Summary: z = [7.55, 1.5, 1.05]

Step 3: Compute e^z

e^(z_A) = e^(7.55) = 1900.74
e^(z_B) = e^(1.5)  = 4.48
e^(z_C) = e^(1.05) = 2.86

Step 4: Compute the sum (denominator)

  3
  Σ e^(z_j) = 1900.74 + 4.48 + 2.86 = 1908.08
 j=1

Step 5: Compute the softmax

P(A|x) = e^(z_A) / Σe^(z_j)
       = 1900.74 / 1908.08
       = 0.9962
       = 99.62%

P(B|x) = e^(z_B) / Σe^(z_j)
       = 4.48 / 1908.08
       = 0.0023
       = 0.23%

P(C|x) = e^(z_C) / Σe^(z_j)
       = 2.86 / 1908.08
       = 0.0015
       = 0.15%

Checking the sum

0.9962 + 0.0023 + 0.0015 = 1.0000 ✓

Result:

Class	Probability
A	99.62% (highest)
B	0.23%
C	0.15%

Prediction: class A (with very high confidence)

3.7.3 Cross-Entropy Loss for Multi-class

Softmax regression uses the categorical cross-entropy loss:

J (W) = - \frac{1}{m} \sum_{i = 1}^{m} \sum_{k = 1}^{K} y_{i, k} \log ({\hat{y}}_{i, k})

where:

y_{i,k} = 1 if example i belongs to class k, and 0 otherwise (one-hot encoding)
ŷ_{i,k} = P(y=k|xᵢ) from the softmax

3.7.4 Worked Example: Computing the Loss

Sample data:

True value: class A → y = [1, 0, 0] (one-hot)
Predicted: ŷ = [0.9962, 0.0023, 0.0015]

Loss = -[1 × log(0.9962) + 0 × log(0.0023) + 0 × log(0.0015)]
     = -[log(0.9962) + 0 + 0]
     = -log(0.9962)
     = -(-0.0038)
     = 0.0038

A very low loss means the model predicts well (it is confident in the correct answer).
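The softmax probabilities and the single-sample loss can be recomputed from the scores z = [7.55, 1.5, 1.05]. This is a minimal sketch using the standard library; because it keeps full precision instead of the rounded 0.9962, the loss it prints can differ from the hand calculation in the last displayed digit.

```python
import math

# Scores from the worked example for classes A, B, C.
z = [7.55, 1.5, 1.05]

# Softmax, with the max subtracted for numerical stability.
z_max = max(z)
exps = [math.exp(v - z_max) for v in z]
total = sum(exps)
probs = [e / total for e in exps]

# One-hot target for class A.
y_onehot = [1, 0, 0]

# Categorical cross-entropy for one sample: -sum_k y_k * log(p_k).
loss = -sum(y * math.log(p) for y, p in zip(y_onehot, probs))

print([round(p, 4) for p in probs])
print(loss)
```

Only the term of the true class contributes, so the loss reduces to minus the log of the probability assigned to class A.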

3.7.5 OvR vs Softmax

Property	One-vs-Rest (OvR)	Softmax Regression
Number of classifiers	K (separate)	1 (joint)
Probabilities	Raw scores need not sum to 1	Always sum to 1
Complexity	Simpler	More complex
Training	Can be trained separately (parallelizable)	Must be trained jointly
Best suited to	Classes that do not depend on one another	Mutually exclusive classes
Consistency	Can be ambiguous when several classes score high	More consistent
3.7.6 Implementation in Python

"""
Multi-class Logistic Regression: OvR vs Softmax
================================================
"""

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

# ============================================
# Generate sample data with 3 classes
# ============================================

np.random.seed(42)

# Class A: low x₁, low x₂
class_A = np.random.randn(30, 2) * 0.5 + [1, 1]

# Class B: low x₁, high x₂
class_B = np.random.randn(30, 2) * 0.5 + [1, 4]

# Class C: high x₁, medium x₂
class_C = np.random.randn(30, 2) * 0.5 + [4, 2.5]

X = np.vstack([class_A, class_B, class_C])
y = np.array([0]*30 + [1]*30 + [2]*30)  # 0=A, 1=B, 2=C

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print("=" * 60)
print("MULTI-CLASS LOGISTIC REGRESSION")
print("=" * 60)

# ============================================
# Method 1: One-vs-Rest (OvR)
# ============================================

print("\n--- Method 1: One-vs-Rest (OvR) ---")

# OneVsRestClassifier fits one binary LogisticRegression per class.
# (The older multi_class='ovr' argument of LogisticRegression is
# deprecated in recent scikit-learn releases.)
model_ovr = OneVsRestClassifier(
    LogisticRegression(solver='lbfgs', max_iter=1000, random_state=42)
)

model_ovr.fit(X_train_scaled, y_train)

# Show the weights of each binary classifier
print("\nWeights of each binary classifier:")
class_names = ['A', 'B', 'C']
for i, name in enumerate(class_names):
    clf = model_ovr.estimators_[i]
    print(f"  Classifier {name} vs Rest:")
    print(f"    Intercept: {clf.intercept_[0]:.4f}")
    print(f"    w₁: {clf.coef_[0][0]:.4f}, w₂: {clf.coef_[0][1]:.4f}")

# Predict
# Note: scikit-learn rescales the per-class OvR scores so each row sums to 1
y_pred_ovr = model_ovr.predict(X_test_scaled)
y_proba_ovr = model_ovr.predict_proba(X_test_scaled)

print(f"\nAccuracy (OvR): {accuracy_score(y_test, y_pred_ovr):.4f}")

# Show a few predictions
print("\nSample predictions (first 3 examples):")
for i in range(3):
    print(f"  Example {i+1}: P(A)={y_proba_ovr[i][0]:.3f}, "
          f"P(B)={y_proba_ovr[i][1]:.3f}, P(C)={y_proba_ovr[i][2]:.3f}")
    print(f"    → sum: {sum(y_proba_ovr[i]):.3f}, prediction: class {class_names[y_pred_ovr[i]]}")

# ============================================
# Method 2: Softmax (Multinomial)
# ============================================

print("\n" + "-" * 60)
print("--- Method 2: Softmax (Multinomial) ---")

# With the lbfgs solver, LogisticRegression fits a multinomial (softmax)
# model for multi-class targets by default.
model_softmax = LogisticRegression(
    solver='lbfgs',
    max_iter=1000,
    random_state=42
)

model_softmax.fit(X_train_scaled, y_train)

# Show the weights
print("\nWeight matrix W (3×2):")
print("         w₁        w₂")
for i, name in enumerate(class_names):
    print(f"  {name}:  {model_softmax.coef_[i][0]:8.4f}  {model_softmax.coef_[i][1]:8.4f}")
print("\nIntercepts:")
for i, name in enumerate(class_names):
    print(f"  {name}: {model_softmax.intercept_[i]:.4f}")

# Predict
y_pred_softmax = model_softmax.predict(X_test_scaled)
y_proba_softmax = model_softmax.predict_proba(X_test_scaled)

print(f"\nAccuracy (Softmax): {accuracy_score(y_test, y_pred_softmax):.4f}")

# Show a few predictions
print("\nSample predictions (first 3 examples):")
for i in range(3):
    print(f"  Example {i+1}: P(A)={y_proba_softmax[i][0]:.3f}, "
          f"P(B)={y_proba_softmax[i][1]:.3f}, P(C)={y_proba_softmax[i][2]:.3f}")
    print(f"    → sum: {sum(y_proba_softmax[i]):.3f}, "
          f"prediction: class {class_names[y_pred_softmax[i]]}")

# ============================================
# Computing the softmax by hand
# ============================================

print("\n" + "=" * 60)
print("Computing the softmax by hand")
print("=" * 60)

# Take the first example of the test set
x_example = X_test_scaled[0]
print(f"\nSample (scaled): x = [{x_example[0]:.4f}, {x_example[1]:.4f}]")

# Compute z for each class
z = np.zeros(3)
for k in range(3):
    z[k] = model_softmax.intercept_[k] + np.dot(model_softmax.coef_[k], x_example)
    print(f"z_{class_names[k]} = {model_softmax.intercept_[k]:.4f} + "
          f"({model_softmax.coef_[k][0]:.4f})({x_example[0]:.4f}) + "
          f"({model_softmax.coef_[k][1]:.4f})({x_example[1]:.4f}) = {z[k]:.4f}")

# Compute e^z
exp_z = np.exp(z)
print(f"\ne^z = [{exp_z[0]:.4f}, {exp_z[1]:.4f}, {exp_z[2]:.4f}]")

# Compute the softmax
sum_exp_z = np.sum(exp_z)
softmax_proba = exp_z / sum_exp_z
print(f"Sum of e^z = {sum_exp_z:.4f}")

print(f"\nSoftmax:")
for k in range(3):
    print(f"  P({class_names[k]}) = {exp_z[k]:.4f} / {sum_exp_z:.4f} = {softmax_proba[k]:.4f}")

print(f"\nSum of probabilities: {sum(softmax_proba):.4f}")
print(f"Prediction: class {class_names[np.argmax(softmax_proba)]}")

# Compare with sklearn
print(f"\nComparison with sklearn predict_proba:")
print(f"  sklearn: {y_proba_softmax[0]}")
print(f"  by hand: {softmax_proba}")

3.7.7 Example Output from the Code

============================================================
MULTI-CLASS LOGISTIC REGRESSION
============================================================

--- วิธีที่ 1: One-vs-Rest (OvR) ---

น้ำหนักของแต่ละ Binary Classifier:
  Classifier A vs Rest:
    Intercept: 0.2341
    w₁: -1.8234, w₂: -1.5678
  Classifier B vs Rest:
    Intercept: 0.1523
    w₁: -1.2456, w₂: 2.3456
  Classifier C vs Rest:
    Intercept: 0.0892
    w₁: 2.8901, w₂: -0.4532

Accuracy (OvR): 0.9630

ตัวอย่างการทำนาย (3 ตัวอย่างแรก):
  ตัวอย่าง 1: P(A)=0.892, P(B)=0.045, P(C)=0.063
    → ผลรวม: 1.000, ทำนาย: คลาส A
  ตัวอย่าง 2: P(A)=0.023, P(B)=0.956, P(C)=0.021
    → ผลรวม: 1.000, ทำนาย: คลาส B
  ตัวอย่าง 3: P(A)=0.034, P(B)=0.012, P(C)=0.954
    → ผลรวม: 1.000, ทำนาย: คลาส C

------------------------------------------------------------
--- วิธีที่ 2: Softmax (Multinomial) ---

เมทริกซ์น้ำหนัก W (3×2):
         w₁        w₂
  A:   -1.5234   -1.2345
  B:   -0.8901    1.9876
  C:    2.4135   -0.7531

Intercepts:
  A: 0.1234
  B: 0.0456
  C: -0.1690

Accuracy (Softmax): 0.9630

ตัวอย่างการทำนาย (3 ตัวอย่างแรก):
  ตัวอย่าง 1: P(A)=0.905, P(B)=0.052, P(C)=0.043
    → ผลรวม: 1.000, ทำนาย: คลาส A
  ตัวอย่าง 2: P(A)=0.018, P(B)=0.967, P(C)=0.015
    → ผลรวม: 1.000, ทำนาย: คลาส B
  ตัวอย่าง 3: P(A)=0.028, P(B)=0.008, P(C)=0.964
    → ผลรวม: 1.000, ทำนาย: คลาส C

3.7.8 Softmax Function Implementation from Scratch

"""
Softmax Function Implementation
================================
"""

import numpy as np


def softmax(z):
    """
    คำนวณ Softmax Function
    
    Args:
        z: array ของค่า z สำหรับแต่ละคลาส (shape: n_classes,)
           หรือ matrix (shape: n_samples, n_classes)
    
    Returns:
        ความน่าจะเป็นสำหรับแต่ละคลาส (รวมกันได้ 1)
    """
    # ลบค่า max เพื่อป้องกัน numerical overflow
    z_shifted = z - np.max(z, axis=-1, keepdims=True)
    
    # คำนวณ e^z
    exp_z = np.exp(z_shifted)
    
    # คำนวณ softmax
    return exp_z / np.sum(exp_z, axis=-1, keepdims=True)


def cross_entropy_loss(y_true, y_pred):
    """
    คำนวณ Categorical Cross-Entropy Loss
    
    Args:
        y_true: One-hot encoded labels (shape: n_samples, n_classes)
        y_pred: Predicted probabilities (shape: n_samples, n_classes)
    
    Returns:
        ค่า loss เฉลี่ย
    """
    # เพิ่ม epsilon เพื่อป้องกัน log(0)
    epsilon = 1e-15
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
    
    # คำนวณ cross-entropy
    loss = -np.sum(y_true * np.log(y_pred), axis=1)
    
    return np.mean(loss)


# ============================================
# ตัวอย่างการใช้งาน
# ============================================

if __name__ == "__main__":
    print("=" * 50)
    print("ตัวอย่างการคำนวณ Softmax")
    print("=" * 50)
    
    # ค่า z สำหรับ 3 คลาส
    z = np.array([7.55, 1.5, 1.05])
    
    print(f"\nInput z: {z}")
    
    # คำนวณ Softmax
    probs = softmax(z)
    
    print(f"\nขั้นตอนการคำนวณ:")
    print(f"  1. e^z = {np.exp(z)}")
    print(f"  2. sum(e^z) = {np.sum(np.exp(z)):.4f}")
    print(f"  3. softmax = e^z / sum(e^z)")
    
    print(f"\nผลลัพธ์ Softmax:")
    class_names = ['A', 'B', 'C']
    for i, name in enumerate(class_names):
        print(f"  P({name}) = {probs[i]:.4f} ({probs[i]*100:.2f}%)")
    
    print(f"\nผลรวม: {np.sum(probs):.4f}")
    
    # ตัวอย่าง Cross-Entropy Loss
    print("\n" + "=" * 50)
    print("ตัวอย่างการคำนวณ Cross-Entropy Loss")
    print("=" * 50)
    
    # ค่าจริง: คลาส A (one-hot)
    y_true = np.array([[1, 0, 0]])
    
    # ค่าทำนาย
    y_pred = np.array([[0.9962, 0.0023, 0.0015]])
    
    loss = cross_entropy_loss(y_true, y_pred)
    
    print(f"\nค่าจริง (one-hot): {y_true[0]}")
    print(f"ค่าทำนาย: {y_pred[0]}")
    print(f"Cross-Entropy Loss: {loss:.4f}")
    
    # เปรียบเทียบกับกรณีทำนายผิด
    print("\n--- เปรียบเทียบกรณีทำนายผิด ---")
    y_pred_wrong = np.array([[0.1, 0.8, 0.1]])
    loss_wrong = cross_entropy_loss(y_true, y_pred_wrong)
    
    print(f"ค่าทำนาย (ผิด): {y_pred_wrong[0]}")
    print(f"Cross-Entropy Loss: {loss_wrong:.4f}")
    print(f"\nLoss สูงขึ้นเมื่อทำนายผิด!")
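The `softmax` function above subtracts the maximum of z before exponentiating. The following minimal sketch shows why that step matters. The names `softmax_naive` and `softmax_stable` and the large input values are illustrative assumptions, not part of the original listing.

```python
import numpy as np


def softmax_naive(z):
    # Direct translation of the formula; np.exp overflows for large z.
    exp_z = np.exp(z)
    return exp_z / np.sum(exp_z)


def softmax_stable(z):
    # Subtracting max(z) leaves the result unchanged but keeps every exponent <= 0.
    exp_z = np.exp(z - np.max(z))
    return exp_z / np.sum(exp_z)


z_large = np.array([1000.0, 1001.0, 1002.0])   # illustrative values (assumption)

# Silence the overflow / invalid-value warnings that the naive version triggers.
with np.errstate(over="ignore", invalid="ignore"):
    naive = softmax_naive(z_large)             # inf / inf gives nan

stable = softmax_stable(z_large)

z_small = np.array([7.55, 1.5, 1.05])          # the same z used in the listing above
same_on_small = np.allclose(softmax_naive(z_small), softmax_stable(z_small))

print("naive :", naive)
print("stable:", stable)
print("identical on small z:", same_on_small)
```

Both versions agree when z is small, so the shift is purely a numerical safeguard and does not change the probabilities.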

Summary

One-vs-Rest (OvR)

Trains K separate binary classifiers, one per class
Easy to implement and to parallelize
The raw per-class probabilities do not necessarily sum to 1 (scikit-learn normalizes them)

Softmax Regression

Uses a single model for all classes
The probabilities always sum to 1
Suited to problems where the classes are mutually exclusive
Forms the basis of the output layer in Neural Networks

Choosing Between Them

Situation	Recommended method
Classes are mutually exclusive (exactly one class per sample)	Softmax
Classifiers need to be trained in parallel	OvR
Used as the output layer of a Neural Network	Softmax
Many classes and limited resources	OvR
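To make the difference in the summary concrete, the sketch below scores one sample with three classes under both schemes. The score vector is a hypothetical example, and dividing the OvR outputs by their sum is shown only as one simple way to normalize them.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    exp_z = np.exp(z - np.max(z))
    return exp_z / exp_z.sum()


# Hypothetical scores z_k = w_k^T x + b_k for one sample and three classes
scores = np.array([2.0, 0.5, -1.0])

# OvR: each binary classifier reports its own P(class k vs. rest)
ovr_raw = sigmoid(scores)
ovr_normalized = ovr_raw / ovr_raw.sum()   # rescale so the values sum to 1

# Softmax: a single joint distribution over the three classes
softmax_probs = softmax(scores)

print(f"OvR raw        : {ovr_raw.round(3)}  sum = {ovr_raw.sum():.3f}")
print(f"OvR normalized : {ovr_normalized.round(3)}  sum = {ovr_normalized.sum():.3f}")
print(f"Softmax        : {softmax_probs.round(3)}  sum = {softmax_probs.sum():.3f}")
```

All three variants pick the same class here; they differ in how the probability mass is spread across the classes.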

3.8 Using Logistic Regression with Scikit-learn

"""
การใช้งาน Logistic Regression ด้วย Scikit-learn
================================================
"""

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import (accuracy_score, precision_score, recall_score, 
                             f1_score, confusion_matrix, classification_report,
                             roc_auc_score, roc_curve)
import warnings
warnings.filterwarnings('ignore')

# ============================================
# ข้อมูลตัวอย่าง: การซื้อสินค้า
# ============================================

# Features: อายุ, รายได้ (หมื่น/เดือน), เวลาดูสินค้า (นาที)
X = np.array([
    [25, 2.5, 5],
    [35, 4.0, 15],
    [45, 5.5, 10],
    [22, 2.0, 3],
    [38, 4.5, 20],
    [30, 3.0, 8],
    [42, 5.0, 12],
    [28, 3.5, 6],
    [50, 6.0, 18],
    [33, 3.8, 9],
    [27, 2.8, 4],
    [40, 4.8, 14]
])

# Target: ซื้อ (1) หรือ ไม่ซื้อ (0)
y = np.array([0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1])

# แบ่งข้อมูล
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

print("=" * 60)
print("LOGISTIC REGRESSION ด้วย SCIKIT-LEARN")
print("=" * 60)

# ============================================
# Feature Scaling
# ============================================

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# ============================================
# สร้างและฝึกโมเดล
# ============================================

print("\n--- การฝึกโมเดล ---")

# สร้างโมเดล Logistic Regression
model = LogisticRegression(
    penalty='l2',           # Regularization: 'l1', 'l2', 'elasticnet', None
    C=1.0,                  # Inverse of regularization strength
    solver='lbfgs',         # Algorithm: 'newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'
    max_iter=1000,
    random_state=42
)

# ฝึกโมเดล
model.fit(X_train_scaled, y_train)

# แสดงน้ำหนัก
print(f"\nIntercept (w₀): {model.intercept_[0]:.4f}")
print(f"Coefficients:")
print(f"  w₁ (อายุ):      {model.coef_[0][0]:.4f}")
print(f"  w₂ (รายได้):    {model.coef_[0][1]:.4f}")
print(f"  w₃ (เวลาดู):    {model.coef_[0][2]:.4f}")

# ============================================
# การทำนาย
# ============================================

print("\n--- การทำนาย ---")

# ทำนายคลาส
y_pred = model.predict(X_test_scaled)

# ทำนายความน่าจะเป็น
y_proba = model.predict_proba(X_test_scaled)

print("\nผลการทำนาย:")
for i in range(len(y_test)):
    print(f"  ตัวอย่าง {i+1}: จริง={y_test[i]}, ทำนาย={y_pred[i]}, "
          f"P(ซื้อ)={y_proba[i][1]:.2%}")

# ============================================
# การประเมินผล
# ============================================

print("\n--- การประเมินผล ---")

# คำนวณ metrics
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, zero_division=0)
recall = recall_score(y_test, y_pred, zero_division=0)
f1 = f1_score(y_test, y_pred, zero_division=0)

print(f"\nMetrics:")
print(f"  Accuracy:  {accuracy:.4f}")
print(f"  Precision: {precision:.4f}")
print(f"  Recall:    {recall:.4f}")
print(f"  F1 Score:  {f1:.4f}")

# Confusion Matrix
cm = confusion_matrix(y_test, y_pred)
print(f"\nConfusion Matrix:")
print(f"  [[TN={cm[0,0]}, FP={cm[0,1]}],")
print(f"   [FN={cm[1,0]}, TP={cm[1,1]}]]")

# Classification Report
print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=['ไม่ซื้อ', 'ซื้อ']))

# AUC Score (ถ้ามีทั้ง 2 คลาสใน test set)
if len(np.unique(y_test)) > 1:
    auc = roc_auc_score(y_test, y_proba[:, 1])
    print(f"AUC Score: {auc:.4f}")

# ============================================
# ทำนายลูกค้าใหม่
# ============================================

print("\n" + "=" * 60)
print("ตัวอย่างการทำนายลูกค้าใหม่")
print("=" * 60)

# ลูกค้าใหม่
new_customers = np.array([
    [33, 3.8, 12],   # ลูกค้า A
    [24, 2.2, 4],    # ลูกค้า B
    [48, 5.8, 16]    # ลูกค้า C
])

# Scale ข้อมูล
new_customers_scaled = scaler.transform(new_customers)

# ทำนาย
predictions = model.predict(new_customers_scaled)
probabilities = model.predict_proba(new_customers_scaled)

print("\nผลการทำนาย:")
labels = ['ลูกค้า A (อายุ 33, รายได้ 3.8, ดู 12 นาที)',
          'ลูกค้า B (อายุ 24, รายได้ 2.2, ดู 4 นาที)',
          'ลูกค้า C (อายุ 48, รายได้ 5.8, ดู 16 นาที)']

for i, label in enumerate(labels):
    result = 'ซื้อ' if predictions[i] == 1 else 'ไม่ซื้อ'
    print(f"  {label}")
    print(f"    → ทำนาย: {result} (P(ซื้อ) = {probabilities[i][1]:.2%})")

# ============================================
# Multi-class Logistic Regression (Softmax)
# ============================================

print("\n" + "=" * 60)
print("MULTI-CLASS LOGISTIC REGRESSION (Softmax)")
print("=" * 60)

# สร้างข้อมูล 3 คลาส (ระดับความสนใจ: ต่ำ, กลาง, สูง)
y_multi = np.array([0, 2, 2, 0, 2, 1, 2, 0, 2, 1, 0, 2])

X_train_m, X_test_m, y_train_m, y_test_m = train_test_split(
    X, y_multi, test_size=0.25, random_state=42
)

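# Use a separate scaler so the one fitted for the binary model is not overwritten
scaler_m = StandardScaler()
X_train_m_scaled = scaler_m.fit_transform(X_train_m)
X_test_m_scaled = scaler_m.transform(X_test_m)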

# สร้างโมเดล Multi-class
model_multi = LogisticRegression(
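    # Note: `multi_class` is deprecated since scikit-learn 1.5 and can be dropped
    # on recent versions, where lbfgs fits the multinomial model by default.
    multi_class='multinomial',  # 'ovr' หรือ 'multinomial'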
    solver='lbfgs',
    max_iter=1000,
    random_state=42
)

model_multi.fit(X_train_m_scaled, y_train_m)

# ทำนาย
y_pred_multi = model_multi.predict(X_test_m_scaled)
y_proba_multi = model_multi.predict_proba(X_test_m_scaled)

print("\nผลการทำนาย Multi-class:")
print(f"  ค่าจริง:  {y_test_m}")
print(f"  ค่าทำนาย: {y_pred_multi}")
print(f"\nAccuracy: {accuracy_score(y_test_m, y_pred_multi):.4f}")

print("\nความน่าจะเป็นของแต่ละคลาส:")
print("  [P(ต่ำ), P(กลาง), P(สูง)]")
for i, proba in enumerate(y_proba_multi):
    print(f"  ตัวอย่าง {i+1}: [{proba[0]:.2f}, {proba[1]:.2f}, {proba[2]:.2f}]")

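Example output (exact values may differ across scikit-learn versions; the multi-class part of the run is not shown):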

============================================================
LOGISTIC REGRESSION ด้วย SCIKIT-LEARN
============================================================

--- การฝึกโมเดล ---

Intercept (w₀): 0.4521
Coefficients:
  w₁ (อายุ):      0.7823
  w₂ (รายได้):    0.8956
  w₃ (เวลาดู):    1.1234

--- การทำนาย ---

ผลการทำนาย:
  ตัวอย่าง 1: จริง=1, ทำนาย=1, P(ซื้อ)=78.52%
  ตัวอย่าง 2: จริง=0, ทำนาย=0, P(ซื้อ)=23.15%
  ตัวอย่าง 3: จริง=1, ทำนาย=1, P(ซื้อ)=85.67%

--- การประเมินผล ---

Metrics:
  Accuracy:  1.0000
  Precision: 1.0000
  Recall:    1.0000
  F1 Score:  1.0000

Confusion Matrix:
  [[TN=1, FP=0],
   [FN=0, TP=2]]

Classification Report:
              precision    recall  f1-score   support

      ไม่ซื้อ       1.00      1.00      1.00         1
         ซื้อ       1.00      1.00      1.00         2

    accuracy                           1.00         3
   macro avg       1.00      1.00      1.00         3
weighted avg       1.00      1.00      1.00         3

AUC Score: 1.0000

============================================================
ตัวอย่างการทำนายลูกค้าใหม่
============================================================

ผลการทำนาย:
  ลูกค้า A (อายุ 33, รายได้ 3.8, ดู 12 นาที)
    → ทำนาย: ซื้อ (P(ซื้อ) = 72.45%)
  ลูกค้า B (อายุ 24, รายได้ 2.2, ดู 4 นาที)
    → ทำนาย: ไม่ซื้อ (P(ซื้อ) = 15.23%)
  ลูกค้า C (อายุ 48, รายได้ 5.8, ดู 16 นาที)
    → ทำนาย: ซื้อ (P(ซื้อ) = 94.12%)

4. Perceptron

4.1 Basic Concept

The Perceptron is a linear classification algorithm introduced by Frank Rosenblatt in 1958. It is the forerunner of today's Neural Networks.

Key characteristics:

It is a Binary Linear Classifier
It uses a Step Function as its activation instead of a Sigmoid
It learns Online (one example at a time)
It is guaranteed to converge (Convergence) if the data are linearly separable

4.2 Perceptron Structure

flowchart LR
    subgraph INPUTS["Inputs"]
        style INPUTS fill:#458588,color:#ebdbb2
        X0["x₀ = 1
(Bias)"]
        X1["x₁"]
        X2["x₂"]
        X3["x₃"]
    end
    
    subgraph WEIGHTS["Weights"]
        style WEIGHTS fill:#689d6a,color:#ebdbb2
        W0["w₀"]
        W1["w₁"]
        W2["w₂"]
        W3["w₃"]
    end
    
    subgraph SUM["Weighted Sum"]
        style SUM fill:#d79921,color:#282828
        S["Σ = w₀ + w₁x₁ + w₂x₂ + w₃x₃"]
    end
    
    subgraph ACTIVATION["Step Function"]
        style ACTIVATION fill:#b16286,color:#ebdbb2
        A["f(Σ) = 1 if Σ ≥ 0
f(Σ) = 0 if Σ < 0"]
    end
    
    subgraph OUTPUT["Output"]
        style OUTPUT fill:#cc241d,color:#ebdbb2
        Y["ŷ ∈ {0, 1}"]
    end
    
    X0 --> W0 --> S
    X1 --> W1 --> S
    X2 --> W2 --> S
    X3 --> W3 --> S
    S --> A --> Y

4.3 Mathematical Formulation

4.3.1 Activation Function (Step Function)

f(z) = \begin{cases} 1 & \text{if } z \geq 0 \\ 0 & \text{if } z < 0 \end{cases}

4.3.2 Hypothesis

\hat{y} = f(w^{T} x) = f\left(\sum_{j=0}^{n} w_{j} x_{j}\right), \quad x_{0} = 1
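The hypothesis can be evaluated in a few lines of NumPy. This is a minimal sketch: the helper names `step` and `predict` are illustrative, and the weights [-1, 1, 1] are the ones reported for the OR example in section 4.6.

```python
import numpy as np


def step(z):
    # f(z) = 1 if z >= 0, otherwise 0
    return np.where(z >= 0, 1, 0)


def predict(w, X):
    # Prepend x0 = 1 so that w[0] plays the role of the bias term
    X_b = np.column_stack([np.ones(len(X)), X])
    return step(X_b @ w)


w = np.array([-1.0, 1.0, 1.0])                    # [w0, w1, w2]
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])    # all inputs of a 2-input gate

z = np.column_stack([np.ones(len(X)), X]) @ w     # weighted sums w^T x
y_hat = predict(w, X)

print("z     =", z)
print("y_hat =", y_hat)
```

Note that z = 0 maps to class 1, because the step function uses the condition z ≥ 0.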

4.4 Perceptron Learning Algorithm

4.4.1 Weight Update Rule

w_{j} := w_{j} + \alpha (y - \hat{y}) x_{j}

Where:

α = learning rate (commonly set to 1)
(y - ŷ) = prediction error, which is -1, 0, or +1
The weights change only when the prediction is wrong, because (y - ŷ) = 0 for a correct prediction

4.4.2 Update Table

y (true)	ŷ (predicted)	y - ŷ	Update
0	0	0	No update
0	1	-1	w := w - αx
1	0	+1	w := w + αx
1	1	0	No update
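The four rows of the update table can be checked with a single-step update function. This is a sketch under stated assumptions: `perceptron_update`, the input vector, and the two weight vectors are hypothetical, chosen only so that each (y, ŷ) combination occurs once.

```python
import numpy as np


def perceptron_update(w, x, y, alpha=1.0):
    """Apply one step of w := w + alpha * (y - y_hat) * x and return (new_w, y_hat)."""
    y_hat = 1 if np.dot(w, x) >= 0 else 0      # step function
    return w + alpha * (y - y_hat) * x, y_hat


x = np.array([1.0, 2.0, 3.0])           # x[0] = 1 is the bias input (illustrative values)
w_fires = np.array([1.0, 0.0, 0.0])     # w.x = 1  -> y_hat = 1
w_silent = np.array([-1.0, 0.0, 0.0])   # w.x = -1 -> y_hat = 0

results = {}
for y, w in [(0, w_silent), (0, w_fires), (1, w_silent), (1, w_fires)]:
    new_w, y_hat = perceptron_update(w, x, y)
    results[(y, y_hat)] = new_w
    print(f"y={y}, y_hat={y_hat}, y-y_hat={y - y_hat:+d}, w: {w} -> {new_w}")
```

Only the two misclassified cases move the weights: a false positive subtracts αx and a false negative adds αx.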

4.5 Worked Example

The goal is to learn the OR function:

x₁	x₂	y (OR)
0	0	0
0	1	1
1	0	1
1	1	1

Initial values: w₀ = 0, w₁ = 0, w₂ = 0, α = 1. Each input is written as x = [x₀, x₁, x₂] with x₀ = 1 for the bias.

Epoch 1:

Example 1: x = [1, 0, 0], y = 0

z = 0×1 + 0×0 + 0×0 = 0
ŷ = f(0) = 1 (because z ≥ 0)
Error = 0 - 1 = -1
Update: w = [0, 0, 0] + (-1)[1, 0, 0] = [-1, 0, 0]

Example 2: x = [1, 0, 1], y = 1

z = -1×1 + 0×0 + 0×1 = -1
ŷ = f(-1) = 0
Error = 1 - 0 = 1
Update: w = [-1, 0, 0] + (1)[1, 0, 1] = [0, 0, 1]

Example 3: x = [1, 1, 0], y = 1

z = 0×1 + 0×1 + 1×0 = 0
ŷ = f(0) = 1
Error = 1 - 1 = 0
No update: w = [0, 0, 1]

Example 4: x = [1, 1, 1], y = 1

z = 0×1 + 0×1 + 1×1 = 1
ŷ = f(1) = 1
Error = 1 - 1 = 0
No update: w = [0, 0, 1]

Epoch 1 ends with w = [0, 0, 1] after two updates, so training continues in the same way until an epoch finishes with no updates.
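The remaining epochs can be traced with a short loop. The sketch below repeats the hand calculation for the OR data with the same starting weights, α = 1, and the same example order. The variable names and the cap of 20 epochs are illustrative assumptions.

```python
import numpy as np

# OR data with the bias input x0 = 1 already prepended
X_b = np.array([[1, 0, 0],
                [1, 0, 1],
                [1, 1, 0],
                [1, 1, 1]], dtype=float)
y = np.array([0, 1, 1, 1])

w = np.zeros(3)          # w0 = w1 = w2 = 0
alpha = 1.0
history = []             # weights at the end of each epoch
converged_epoch = None

for epoch in range(1, 21):                     # 20 epochs is an arbitrary safety cap
    errors = 0
    for x_i, y_i in zip(X_b, y):
        y_hat = 1 if w @ x_i >= 0 else 0       # step function
        if y_hat != y_i:
            w = w + alpha * (y_i - y_hat) * x_i
            errors += 1
    history.append(w.copy())
    print(f"epoch {epoch}: w = {w}, updates = {errors}")
    if errors == 0:                            # a full pass with no mistakes
        converged_epoch = epoch
        break
```

The first epoch ends with w = [0, 0, 1], as in the hand calculation, and the loop stops in the fourth epoch with w = [-1, 1, 1].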

4.6 Implementation in Python

"""
โมดูล Perceptron
=================
การ implement Perceptron Algorithm ตั้งแต่เริ่มต้น
"""

import numpy as np


class Perceptron:
    """
    คลาสสำหรับ Perceptron
    
    อัลกอริทึมการจำแนกประเภทแบบเชิงเส้นพื้นฐาน
    ใช้ Step Function เป็น Activation
    
    Attributes:
        weights (np.ndarray): เวกเตอร์น้ำหนัก
        learning_rate (float): อัตราการเรียนรู้
        n_iterations (int): จำนวนรอบสูงสุด
        errors_per_epoch (list): จำนวนข้อผิดพลาดในแต่ละ epoch
    """
    
    def __init__(self, learning_rate: float = 1.0, n_iterations: int = 100):
        """
        กำหนดค่าเริ่มต้น
        
        Args:
            learning_rate: อัตราการเรียนรู้ (default: 1.0)
            n_iterations: จำนวน epoch สูงสุด (default: 100)
        """
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations
        self.weights = None
        self.errors_per_epoch = []
    
    def _step_function(self, z: np.ndarray) -> np.ndarray:
        """
        ฟังก์ชัน Step (Heaviside)
        
        f(z) = 1 if z >= 0
        f(z) = 0 if z < 0
        
        Args:
            z: ค่าอินพุต
            
        Returns:
            0 หรือ 1
        """
        return np.where(z >= 0, 1, 0)
    
    def _add_bias(self, X: np.ndarray) -> np.ndarray:
        """
        เพิ่มคอลัมน์ bias ให้กับ X
        """
        m = X.shape[0]
        return np.column_stack([np.ones(m), X])
    
    def fit(self, X: np.ndarray, y: np.ndarray) -> 'Perceptron':
        """
        ฝึกโมเดลด้วย Perceptron Learning Algorithm
        
        Args:
            X: เมทริกซ์ข้อมูล (m, n)
            y: เวกเตอร์ label (m,) มีค่า 0 หรือ 1
            
        Returns:
            self
        """
        # เพิ่ม bias
        X_b = self._add_bias(X)
        m, n = X_b.shape
        
        # กำหนดค่าน้ำหนักเริ่มต้นเป็น 0
        self.weights = np.zeros(n)
        self.errors_per_epoch = []
        
        # วนลูปตาม epoch
        for epoch in range(self.n_iterations):
            errors = 0
            
            # วนลูปทีละตัวอย่าง
            for i in range(m):
                xi = X_b[i]
                yi = y[i]
                
                # คำนวณค่าทำนาย
                z = np.dot(self.weights, xi)
                y_pred = self._step_function(z)
                
                # คำนวณ error และอัปเดตน้ำหนัก
                error = yi - y_pred
                if error != 0:
                    # w := w + α(y - ŷ)x
                    self.weights = self.weights + self.learning_rate * error * xi
                    errors += 1
            
            self.errors_per_epoch.append(errors)
            
            # หยุดถ้าไม่มีข้อผิดพลาด (ลู่เข้าแล้ว)
            if errors == 0:
                print(f"ลู่เข้าที่ epoch {epoch + 1}")
                break
        
        return self
    
    def predict(self, X: np.ndarray) -> np.ndarray:
        """
        ทำนายคลาส
        
        Args:
            X: เมทริกซ์ข้อมูล (m, n)
            
        Returns:
            คลาสที่ทำนาย (0 หรือ 1)
        """
        X_b = self._add_bias(X)
        z = X_b @ self.weights
        return self._step_function(z)
    
    def score(self, X: np.ndarray, y: np.ndarray) -> float:
        """
        คำนวณ Accuracy
        
        Args:
            X: เมทริกซ์ข้อมูล
            y: เวกเตอร์ label จริง
            
        Returns:
            Accuracy (0-1)
        """
        predictions = self.predict(X)
        return np.mean(predictions == y)


# ============================================
# ตัวอย่างการใช้งาน
# ============================================

if __name__ == "__main__":
    print("=== ตัวอย่างที่ 1: เรียนรู้ฟังก์ชัน OR ===")
    
    # ข้อมูล OR gate
    X_or = np.array([
        [0, 0],
        [0, 1],
        [1, 0],
        [1, 1]
    ])
    y_or = np.array([0, 1, 1, 1])
    
    # สร้างและฝึก Perceptron
    perceptron_or = Perceptron(learning_rate=1.0, n_iterations=100)
    perceptron_or.fit(X_or, y_or)
    
    print(f"น้ำหนัก: {perceptron_or.weights}")
    print(f"Accuracy: {perceptron_or.score(X_or, y_or):.4f}")
    print(f"ทำนาย: {perceptron_or.predict(X_or)}")
    
    print("\n=== ตัวอย่างที่ 2: เรียนรู้ฟังก์ชัน AND ===")
    
    # ข้อมูล AND gate
    X_and = np.array([
        [0, 0],
        [0, 1],
        [1, 0],
        [1, 1]
    ])
    y_and = np.array([0, 0, 0, 1])
    
    perceptron_and = Perceptron(learning_rate=1.0, n_iterations=100)
    perceptron_and.fit(X_and, y_and)
    
    print(f"น้ำหนัก: {perceptron_and.weights}")
    print(f"Accuracy: {perceptron_and.score(X_and, y_and):.4f}")
    print(f"ทำนาย: {perceptron_and.predict(X_and)}")
    
    print("\n=== ตัวอย่างที่ 3: XOR (ไม่สามารถเรียนรู้ได้) ===")
    
    # ข้อมูล XOR gate (ไม่สามารถแยกเชิงเส้นได้)
    X_xor = np.array([
        [0, 0],
        [0, 1],
        [1, 0],
        [1, 1]
    ])
    y_xor = np.array([0, 1, 1, 0])
    
    perceptron_xor = Perceptron(learning_rate=1.0, n_iterations=100)
    perceptron_xor.fit(X_xor, y_xor)
    
    print(f"น้ำหนัก: {perceptron_xor.weights}")
    print(f"Accuracy: {perceptron_xor.score(X_xor, y_xor):.4f}")
    print(f"ทำนาย: {perceptron_xor.predict(X_xor)}")
    print("(XOR ไม่สามารถแยกด้วยเส้นตรงได้ จึงไม่ลู่เข้า)")

Example output:

=== ตัวอย่างที่ 1: เรียนรู้ฟังก์ชัน OR ===
ลู่เข้าที่ epoch 4
น้ำหนัก: [-1.  1.  1.]
Accuracy: 1.0000
ทำนาย: [0 1 1 1]

=== ตัวอย่างที่ 2: เรียนรู้ฟังก์ชัน AND ===
ลู่เข้าที่ epoch 6
น้ำหนัก: [-3.  2.  1.]
Accuracy: 1.0000
ทำนาย: [0 0 0 1]

=== ตัวอย่างที่ 3: XOR (ไม่สามารถเรียนรู้ได้) ===
น้ำหนัก: [ 0. -1.  0.]
Accuracy: 0.5000
ทำนาย: [1 1 0 0]
(XOR ไม่สามารถแยกด้วยเส้นตรงได้ จึงไม่ลู่เข้า)

4.7 Limitations of the Perceptron

flowchart TB
    subgraph LIMIT["Limitation: XOR Problem"]
        style LIMIT fill:#cc241d,color:#ebdbb2
        
        subgraph LINEAR["Linearly Separable ✓"]
            style LINEAR fill:#689d6a,color:#ebdbb2
            L1["OR: separable"]
            L2["AND: separable"]
        end
        
        subgraph NONLINEAR["Not Linearly Separable ✗"]
            style NONLINEAR fill:#d79921,color:#282828
            N1["XOR: not separable
by a single straight line"]
        end
    end
    
    subgraph SOLUTION["Solutions"]
        style SOLUTION fill:#458588,color:#ebdbb2
        S1["Multi-Layer Perceptron
(MLP)"]
        S2["Feature Engineering
(add dimensions)"]
        S3["Kernel Methods
(SVM)"]
    end
    
    NONLINEAR --> SOLUTION
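The Feature Engineering route in the diagram can be illustrated with a single extra feature. In this sketch the product x₁·x₂ is added as a third input, which is one possible choice and an assumption of this example. The helper `train_perceptron` is a hypothetical stand-in that uses the same update rule as section 4.4.

```python
import numpy as np


def train_perceptron(X_b, y, alpha=1.0, max_epochs=100):
    """Plain perceptron training on inputs that already contain the bias column."""
    w = np.zeros(X_b.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for x_i, y_i in zip(X_b, y):
            y_hat = 1 if w @ x_i >= 0 else 0
            if y_hat != y_i:
                w = w + alpha * (y_i - y_hat) * x_i
                errors += 1
        if errors == 0:          # converged: a full pass without mistakes
            break
    return w


X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_xor = np.array([0, 1, 1, 0])

# Original inputs: [1, x1, x2]
X_b = np.column_stack([np.ones(len(X)), X])
# Augmented inputs: [1, x1, x2, x1*x2] (the added dimension)
X_b_aug = np.column_stack([X_b, X[:, 0] * X[:, 1]])

w_plain = train_perceptron(X_b, y_xor)
w_aug = train_perceptron(X_b_aug, y_xor)

pred_plain = (X_b @ w_plain >= 0).astype(int)
pred_aug = (X_b_aug @ w_aug >= 0).astype(int)

print("XOR target         :", y_xor)
print("without x1*x2      :", pred_plain)
print("with x1*x2 feature :", pred_aug)
```

With the product feature the four points become linearly separable in three dimensions, so the same learning rule that fails on the raw inputs can fit XOR.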

4.8 Using the Perceptron with Scikit-learn

"""
การใช้งาน Perceptron ด้วย Scikit-learn
=======================================
"""

import numpy as np
from sklearn.linear_model import Perceptron
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
import warnings
warnings.filterwarnings('ignore')

print("=" * 60)
print("PERCEPTRON ด้วย SCIKIT-LEARN")
print("=" * 60)

# ============================================
# ตัวอย่างที่ 1: Logic Gates
# ============================================

print("\n--- ตัวอย่างที่ 1: Logic Gates ---")

# OR Gate
X_or = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_or = np.array([0, 1, 1, 1])

perceptron_or = Perceptron(max_iter=100, eta0=1.0, random_state=42)
perceptron_or.fit(X_or, y_or)

print("\nOR Gate:")
print(f"  Weights: {perceptron_or.coef_[0]}")
print(f"  Bias: {perceptron_or.intercept_[0]}")
print(f"  Predictions: {perceptron_or.predict(X_or)}")
print(f"  Accuracy: {accuracy_score(y_or, perceptron_or.predict(X_or)):.2%}")

# AND Gate
y_and = np.array([0, 0, 0, 1])

perceptron_and = Perceptron(max_iter=100, eta0=1.0, random_state=42)
perceptron_and.fit(X_or, y_and)

print("\nAND Gate:")
print(f"  Weights: {perceptron_and.coef_[0]}")
print(f"  Bias: {perceptron_and.intercept_[0]}")
print(f"  Predictions: {perceptron_and.predict(X_or)}")
print(f"  Accuracy: {accuracy_score(y_and, perceptron_and.predict(X_or)):.2%}")

# XOR Gate (จะไม่สามารถเรียนรู้ได้ 100%)
y_xor = np.array([0, 1, 1, 0])

perceptron_xor = Perceptron(max_iter=1000, eta0=1.0, random_state=42)
perceptron_xor.fit(X_or, y_xor)

print("\nXOR Gate (ไม่สามารถแยกเชิงเส้นได้):")
print(f"  Weights: {perceptron_xor.coef_[0]}")
print(f"  Bias: {perceptron_xor.intercept_[0]}")
print(f"  Predictions: {perceptron_xor.predict(X_or)}")
print(f"  Accuracy: {accuracy_score(y_xor, perceptron_xor.predict(X_or)):.2%}")

# ============================================
# ตัวอย่างที่ 2: Real-world Dataset
# ============================================

print("\n" + "=" * 60)
print("--- ตัวอย่างที่ 2: การจำแนกดอกไอริส ---")
print("=" * 60)

from sklearn.datasets import load_iris

# โหลดข้อมูล Iris (ใช้เฉพาะ 2 คลาสแรกสำหรับ binary classification)
iris = load_iris()
X_iris = iris.data[iris.target != 2]  # เฉพาะ Setosa และ Versicolor
y_iris = iris.target[iris.target != 2]

print(f"\nข้อมูล: {X_iris.shape[0]} ตัวอย่าง, {X_iris.shape[1]} features")
print(f"Features: {iris.feature_names}")
print(f"Classes: {iris.target_names[:2]}")

# แบ่งข้อมูล
X_train, X_test, y_train, y_test = train_test_split(
    X_iris, y_iris, test_size=0.3, random_state=42
)

# Feature Scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# สร้างและฝึก Perceptron
perceptron = Perceptron(
    penalty=None,           # Regularization: None, 'l2', 'l1', 'elasticnet'
    alpha=0.0001,          # Regularization strength
    max_iter=1000,
    eta0=0.1,              # Learning rate
    tol=1e-3,              # Stopping criterion
    shuffle=True,
    random_state=42,
    early_stopping=False,
    n_iter_no_change=5
)

perceptron.fit(X_train_scaled, y_train)

# แสดงน้ำหนัก
print(f"\nน้ำหนักที่ได้:")
print(f"  Intercept: {perceptron.intercept_[0]:.4f}")
for i, name in enumerate(iris.feature_names):
    print(f"  w_{i+1} ({name}): {perceptron.coef_[0][i]:.4f}")

# ประเมินผล
y_pred = perceptron.predict(X_test_scaled)

print(f"\nผลการประเมิน:")
print(f"  Training Accuracy: {perceptron.score(X_train_scaled, y_train):.4f}")
print(f"  Test Accuracy: {perceptron.score(X_test_scaled, y_test):.4f}")
print(f"  จำนวน iterations: {perceptron.n_iter_}")

print("\nClassification Report:")
print(classification_report(y_test, y_pred, 
                            target_names=iris.target_names[:2]))

# ============================================
# ตัวอย่างที่ 3: เปรียบเทียบกับ Logistic Regression
# ============================================

print("\n" + "=" * 60)
print("--- เปรียบเทียบ Perceptron vs Logistic Regression ---")
print("=" * 60)

from sklearn.linear_model import LogisticRegression

# Perceptron
perceptron_cmp = Perceptron(max_iter=1000, random_state=42)
perceptron_cmp.fit(X_train_scaled, y_train)
acc_perceptron = perceptron_cmp.score(X_test_scaled, y_test)

# Logistic Regression
logreg_cmp = LogisticRegression(max_iter=1000, random_state=42)
logreg_cmp.fit(X_train_scaled, y_train)
acc_logreg = logreg_cmp.score(X_test_scaled, y_test)

print(f"\nผลการเปรียบเทียบบน Iris Dataset:")
print(f"  Perceptron Accuracy:          {acc_perceptron:.4f}")
print(f"  Logistic Regression Accuracy: {acc_logreg:.4f}")

# ความแตกต่างหลัก
print("\n--- ความแตกต่างหลัก ---")
print("""
| คุณสมบัติ              | Perceptron           | Logistic Regression  |
|------------------------|----------------------|----------------------|
| Activation Function    | Step Function        | Sigmoid              |
| Output                 | 0 หรือ 1             | ความน่าจะเป็น (0-1)  |
| Loss Function          | Hinge-like           | Cross-Entropy        |
| Probability Estimates  | ไม่มี                | มี                   |
| Convergence            | Linearly Separable   | Always               |
""")

# ============================================
# ตัวอย่างที่ 4: Multi-class Perceptron
# ============================================

print("\n" + "=" * 60)
print("--- Multi-class Perceptron (One-vs-Rest) ---")
print("=" * 60)

# ใช้ข้อมูล Iris ทั้ง 3 คลาส
X_full = iris.data
y_full = iris.target

X_train_f, X_test_f, y_train_f, y_test_f = train_test_split(
    X_full, y_full, test_size=0.3, random_state=42
)

scaler_f = StandardScaler()
X_train_f_scaled = scaler_f.fit_transform(X_train_f)
X_test_f_scaled = scaler_f.transform(X_test_f)

# Perceptron ใช้ One-vs-Rest โดยอัตโนมัติสำหรับ multi-class
perceptron_multi = Perceptron(max_iter=1000, random_state=42)
perceptron_multi.fit(X_train_f_scaled, y_train_f)

print(f"\nจำนวนคลาส: {len(np.unique(y_full))}")
print(f"Classes: {iris.target_names}")
print(f"\nWeight Matrix Shape: {perceptron_multi.coef_.shape}")
print(f"  (จำนวนคลาส × จำนวน features)")

y_pred_multi = perceptron_multi.predict(X_test_f_scaled)

print(f"\nTest Accuracy: {accuracy_score(y_test_f, y_pred_multi):.4f}")
print("\nClassification Report:")
print(classification_report(y_test_f, y_pred_multi, 
                            target_names=iris.target_names))

Example output (exact values may differ across scikit-learn versions):

============================================================
PERCEPTRON ด้วย SCIKIT-LEARN
============================================================

--- ตัวอย่างที่ 1: Logic Gates ---

OR Gate:
  Weights: [1. 1.]
  Bias: 0.0
  Predictions: [0 1 1 1]
  Accuracy: 100.00%

AND Gate:
  Weights: [1. 1.]
  Bias: -1.0
  Predictions: [0 0 0 1]
  Accuracy: 100.00%

XOR Gate (ไม่สามารถแยกเชิงเส้นได้):
  Weights: [0. 0.]
  Bias: 0.0
  Predictions: [0 0 0 0]
  Accuracy: 50.00%

============================================================
--- ตัวอย่างที่ 2: การจำแนกดอกไอริส ---
============================================================

ข้อมูล: 100 ตัวอย่าง, 4 features
Features: ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
Classes: ['setosa' 'versicolor']

น้ำหนักที่ได้:
  Intercept: 0.0000
  w_1 (sepal length (cm)): -0.4500
  w_2 (sepal width (cm)): -0.5200
  w_3 (petal length (cm)): 1.2300
  w_4 (petal width (cm)): 0.8900

ผลการประเมิน:
  Training Accuracy: 1.0000
  Test Accuracy: 1.0000
  จำนวน iterations: 7

Classification Report:
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        15
  versicolor       1.00      1.00      1.00        15

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30

============================================================
--- เปรียบเทียบ Perceptron vs Logistic Regression ---
============================================================

ผลการเปรียบเทียบบน Iris Dataset:
  Perceptron Accuracy:          1.0000
  Logistic Regression Accuracy: 1.0000

5. Differences Between Generative and Discriminative Models

5.1 Basic Principles

5.1.1 Generative Models

Generative Models learn the joint probability distribution P(x, y), or equivalently P(x|y) and P(y) separately, and then classify using Bayes' theorem:

P(y | x) = \frac{P(x | y) P(y)}{P(x)}

Examples: Naive Bayes, Gaussian Discriminant Analysis, Hidden Markov Models
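As a small numeric illustration of the Bayes step, the sketch below classifies a one-dimensional x with Gaussian class-conditional densities. The priors, means, and standard deviations are hypothetical values chosen for the example. A real generative model would estimate them from training data.

```python
import numpy as np


def gaussian_pdf(x, mean, std):
    # Density of N(mean, std^2) evaluated at x
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))


# Hypothetical parameters for two classes (assumptions for illustration)
priors = np.array([0.6, 0.4])    # P(y=0), P(y=1)
means = np.array([0.0, 2.0])     # mean of P(x|y=0), P(x|y=1)
stds = np.array([1.0, 1.0])


def posterior(x):
    likelihoods = gaussian_pdf(x, means, stds)   # P(x|y) for each class
    joint = likelihoods * priors                 # P(x|y) P(y)
    return joint / joint.sum()                   # divide by P(x) = sum over y


p_mid = posterior(1.0)    # equally far from both means
p_far = posterior(3.0)    # much closer to the class-1 mean

print("P(y|x=1.0) =", p_mid.round(3))
print("P(y|x=3.0) =", p_far.round(3))
```

At x = 1.0 both likelihoods are equal, so the posterior falls back to the priors. At x = 3.0 the class-1 likelihood dominates even though its prior is smaller.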

5.1.2 Discriminative Models

Discriminative Models learn P(y|x) directly, or learn the Decision Boundary between the classes.

Examples: Logistic Regression, Perceptron, SVM, Neural Networks

5.2 Visual Comparison

flowchart TB
    subgraph GEN["Generative Approach"]
        style GEN fill:#689d6a,color:#ebdbb2
        
        G1["Learn P(x|y=0)
Distribution of class 0"]
        G2["Learn P(x|y=1)
Distribution of class 1"]
        G3["Learn P(y)
Prior Probability"]
        G4["Apply Bayes' Theorem
Compute P(y|x)"]
        
        G1 --> G4
        G2 --> G4
        G3 --> G4
    end
    
    subgraph DIS["Discriminative Approach"]
        style DIS fill:#d79921,color:#282828
        
        D1["Learn P(y|x) directly
or the Decision Boundary"]
        D2["Does not model how
each class's data is generated"]
        
        D1 --> D2
    end

5.3 Comparison Table

Aspect	Generative Models	Discriminative Models
Learns	P(x,y), or P(x|y) and P(y)	P(y|x) directly
Approach	Models the data distribution	Finds the decision boundary
Capability	Can generate new data	Classification only
Handling Missing Data	Better	Worse
Classification performance (large datasets)	Usually lower	Usually higher
Complexity	Higher	Lower
Examples	Naive Bayes, GDA, HMM	Logistic Regression, SVM, NN

5.4 ข้อดี-ข้อเสีย

5.4.1 Generative Models

ข้อดี:

สามารถสร้างข้อมูลใหม่ได้ (Data Generation)
จัดการกับ Missing Data ได้ดี
เข้าใจโครงสร้างข้อมูลได้ลึกกว่า
ใช้ Prior Knowledge ได้ง่าย
ทำงานได้ดีเมื่อข้อมูลน้อย

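Disadvantages:

- Often rely on assumptions that may not match reality (for example, feature independence in Naive Bayes)
- Classification accuracy is often lower
- Modeling the full distribution P(x|y) can require more parameters and computation than learning the boundary alone (simple models such as Naive Bayes are an exception)

5.4.2 Discriminative Models

Advantages:

- Higher classification accuracy, especially with large datasets
- Training is often simpler, because only P(y|x) has to be fit
- No assumption is needed about how the features x are distributed
- More flexible with complex or correlated features

Disadvantages:

- Cannot generate new data
- Need more data before they perform well
- Handle missing data poorly
- Provide no model of how the data itself is structured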

5.5 Example: Choosing Between the Two

"""
Example comparison: Generative vs Discriminative
=================================================
"""

from sklearn.naive_bayes import GaussianNB  # Generative
from sklearn.linear_model import LogisticRegression  # Discriminative
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split


def compare_models(n_samples: int, test_name: str) -> None:
    """
    Compare the test accuracy of a generative and a discriminative model

    Args:
        n_samples: number of samples to generate
        test_name: label printed for this run
    """
    # Generate synthetic data
    X, y = make_classification(
        n_samples=n_samples,
        n_features=10,
        n_informative=5,
        n_redundant=2,
        random_state=42
    )

    # Split into training and test sets
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42
    )

    # Generative Model: Gaussian Naive Bayes
    gnb = GaussianNB()
    gnb.fit(X_train, y_train)
    gnb_score = gnb.score(X_test, y_test)

    # Discriminative Model: Logistic Regression
    lr = LogisticRegression(random_state=42, max_iter=1000)
    lr.fit(X_train, y_train)
    lr_score = lr.score(X_test, y_test)

    print(f"\n=== {test_name} (n={n_samples}) ===")
    print(f"Gaussian Naive Bayes (Generative): {gnb_score:.4f}")
    print(f"Logistic Regression (Discriminative): {lr_score:.4f}")

    # Report a tie explicitly instead of crediting it to either model
    if lr_score > gnb_score:
        print("→ Discriminative wins")
    elif lr_score < gnb_score:
        print("→ Generative wins")
    else:
        print("→ Tie")


if __name__ == "__main__":
    # Small dataset
    compare_models(n_samples=100, test_name="Small dataset")

    # Medium dataset
    compare_models(n_samples=1000, test_name="Medium dataset")

    # Large dataset
    compare_models(n_samples=10000, test_name="Large dataset")

Example output (exact values may differ between scikit-learn versions):

=== Small dataset (n=100) ===
Gaussian Naive Bayes (Generative): 0.8333
Logistic Regression (Discriminative): 0.8000
→ Generative wins

=== Medium dataset (n=1000) ===
Gaussian Naive Bayes (Generative): 0.8567
Logistic Regression (Discriminative): 0.8767
→ Discriminative wins

=== Large dataset (n=10000) ===
Gaussian Naive Bayes (Generative): 0.8510
Logistic Regression (Discriminative): 0.8823
→ Discriminative wins

6. Model Evaluation (Evaluation Metrics)

6.1 Confusion Matrix

A confusion matrix is a table that compares the predicted classes against the actual classes:

flowchart TB
    subgraph CM["Confusion Matrix"]
        style CM fill:#282828,color:#ebdbb2

        subgraph ACTUAL["Actual value"]
            style ACTUAL fill:#458588,color:#ebdbb2

            subgraph POS["Positive"]
                style POS fill:#689d6a,color:#ebdbb2
                TP["TP
True Positive
Correct prediction (positive)"]
                FN["FN
False Negative
Type II Error"]
            end

            subgraph NEG["Negative"]
                style NEG fill:#cc241d,color:#ebdbb2
                FP["FP
False Positive
Type I Error"]
                TN["TN
True Negative
Correct prediction (negative)"]
            end
        end
    end

| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | True Positive (TP) | False Negative (FN) |
| Actual Negative | False Positive (FP) | True Negative (TN) |

6.2 Basic Metrics

6.2.1 Accuracy

Accuracy is the fraction of all predictions that are correct.

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$$

Limitation: misleading on imbalanced data, where always predicting the majority class already yields a high score.
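
The limitation is easy to reproduce. The sketch below uses a hypothetical test set with 1% positives and a "model" that always predicts the majority class; it also reports the fraction of actual positives that were found (defined as Recall in 6.2.3).

```python
# Hypothetical imbalanced test set: 990 negatives, 10 positives
y_true = [0] * 990 + [1] * 10
# A useless "model" that always predicts the majority class
y_pred = [0] * 1000

correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
recall = tp / (tp + fn)

print(f"Accuracy: {accuracy:.2f}")  # 0.99
print(f"Recall:   {recall:.2f}")    # 0.00
```

The accuracy looks excellent even though not a single positive case is detected.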

6.2.2 Precision

Precision: of everything predicted Positive, the fraction that is actually Positive.

$$Precision = \frac{TP}{TP + FP}$$

Use when: False Positives are costly (e.g., Spam Detection)

6.2.3 Recall (Sensitivity)

Recall: of everything that is actually Positive, the fraction that was predicted Positive.

$$Recall = \frac{TP}{TP + FN}$$

Use when: False Negatives are costly (e.g., disease diagnosis)

6.2.4 F1-Score

F1-Score is the harmonic mean of Precision and Recall.

$$F_{1} = 2 \times \frac{Precision \times Recall}{Precision + Recall}$$

Use when: a balance between Precision and Recall is needed

6.3 Worked Example

Suppose a model predicts whether a patient has a disease:

| | Predicted: disease | Predicted: no disease |
|---|---|---|
| Actual: disease | TP = 80 | FN = 20 |
| Actual: no disease | FP = 10 | TN = 90 |

Accuracy: $Accuracy = \frac{80 + 90}{80 + 90 + 10 + 20} = \frac{170}{200} = 0.85 = 85\%$

Precision: $Precision = \frac{80}{80 + 10} = \frac{80}{90} = 0.889 = 88.9\%$

Recall: $Recall = \frac{80}{80 + 20} = \frac{80}{100} = 0.80 = 80\%$

F1-Score: $F_{1} = 2 \times \frac{0.889 \times 0.80}{0.889 + 0.80} = 2 \times \frac{0.711}{1.689} = 0.842 = 84.2\%$
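
The same arithmetic can be checked in a few lines of Python, using the four counts from the table above:

```python
# Counts from the worked example
tp, fn, fp, tn = 80, 20, 10, 90

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"Accuracy:  {accuracy:.3f}")   # 0.850
print(f"Precision: {precision:.3f}")  # 0.889
print(f"Recall:    {recall:.3f}")     # 0.800
print(f"F1-Score:  {f1:.3f}")         # 0.842
```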

6.4 ROC Curve and AUC

6.4.1 ROC Curve

The ROC Curve (Receiver Operating Characteristic Curve) plots the True Positive Rate (Recall) against the False Positive Rate as the classification threshold is varied.

True Positive Rate (TPR): $TPR = \frac{TP}{TP + FN}$

False Positive Rate (FPR): $FPR = \frac{FP}{FP + TN}$
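
To see how one threshold maps to one point on the curve, the sketch below evaluates TPR and FPR at three thresholds. The labels and scores are hypothetical, and `tpr_fpr` is a helper written only for this illustration.

```python
def tpr_fpr(y_true, y_score, threshold):
    """Return (TPR, FPR) when scores >= threshold are predicted Positive."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), fp / (fp + tn)


# Hypothetical labels and predicted probabilities
y_true = [1, 1, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

points = {t: tpr_fpr(y_true, y_score, t) for t in (0.2, 0.5, 0.85)}
for threshold, (tpr, fpr) in points.items():
    print(f"threshold={threshold:.2f}  TPR={tpr:.2f}  FPR={fpr:.2f}")
```

Raising the threshold lowers both rates: fewer false alarms, but also fewer detected positives.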

6.4.2 AUC (Area Under Curve)

AUC is the area under the ROC Curve:

- AUC = 1.0: perfect model
- AUC = 0.5: random guessing (no discriminative power)
- AUC < 0.5: worse than random (inverting the predictions would give AUC > 0.5)
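
AUC also has a ranking interpretation: it equals the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counted as one half. The sketch below computes it that way on hypothetical scores; `auc_pairwise` is an illustrative helper, not a library function.

```python
def auc_pairwise(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted probabilities
y_true = [1, 1, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

auc_value = auc_pairwise(y_true, y_score)
# Flipping the sign of every score reverses the ranking
auc_inverted = auc_pairwise(y_true, [-s for s in y_score])

print(f"AUC:          {auc_value:.4f}")     # 0.8889
print(f"Inverted AUC: {auc_inverted:.4f}")  # 0.1111
```

Reversing the ranking turns an AUC of a into 1 - a, which is why a model with AUC below 0.5 still carries usable signal.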

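6.5 Metrics for Regression

| Metric | Formula | Description |
|---|---|---|
| MAE | $\frac{1}{m}\sum_{i=1}^{m} \lvert y_i - \hat{y}_i \rvert$ | Mean Absolute Error |
| MSE | $\frac{1}{m}\sum_{i=1}^{m} (y_i - \hat{y}_i)^2$ | Mean Squared Error |
| RMSE | $\sqrt{MSE}$ | Root Mean Squared Error |
| R² | $1 - \frac{SS_{res}}{SS_{tot}}$ | Coefficient of Determination |

Here m is the number of samples, $SS_{res} = \sum_{i} (y_i - \hat{y}_i)^2$ and $SS_{tot} = \sum_{i} (y_i - \bar{y})^2$.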
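
A minimal sketch of the four regression metrics, written in plain Python so each formula is visible. The target and prediction values are hypothetical.

```python
import math

# Hypothetical targets and predictions
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 8.0, 9.5]
m = len(y_true)

mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / m
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / m
rmse = math.sqrt(mse)

y_mean = sum(y_true) / m
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
ss_tot = sum((t - y_mean) ** 2 for t in y_true)             # total sum of squares
r2 = 1 - ss_res / ss_tot

print(f"MAE:  {mae:.4f}")   # 0.5000
print(f"MSE:  {mse:.4f}")   # 0.3750
print(f"RMSE: {rmse:.4f}")  # 0.6124
print(f"R^2:  {r2:.4f}")    # 0.9250
```

MSE and RMSE penalize the single error of 1.0 more heavily than MAE does, which is the usual reason to prefer MAE when outliers should not dominate.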

6.6 Implementation in Python

"""
Evaluation Metrics module
=========================
Functions for evaluating Machine Learning models
"""

import numpy as np
from typing import Tuple, Dict


def confusion_matrix(y_true: np.ndarray, y_pred: np.ndarray) -> np.ndarray:
    """
    Build a Confusion Matrix for binary labels (0 = Negative, 1 = Positive)

    Args:
        y_true: actual labels
        y_pred: predicted labels

    Returns:
        2x2 Confusion Matrix laid out as [[TN, FP], [FN, TP]]
        (rows = actual class 0/1, columns = predicted class 0/1)
    """
    # Count each outcome
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))

    # Note: this ordering puts TN top-left, unlike the table in 6.1
    return np.array([[tn, fp], [fn, tp]])


def accuracy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """
    Compute Accuracy

    Accuracy = (TP + TN) / (TP + TN + FP + FN)
    """
    return float(np.mean(y_true == y_pred))


def precision(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """
    Compute Precision

    Precision = TP / (TP + FP)
    """
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))

    # No positive predictions: define Precision as 0 to avoid dividing by zero
    if tp + fp == 0:
        return 0.0
    return tp / (tp + fp)


def recall(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """
    Compute Recall (Sensitivity)

    Recall = TP / (TP + FN)
    """
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))

    # No actual positives: define Recall as 0 to avoid dividing by zero
    if tp + fn == 0:
        return 0.0
    return tp / (tp + fn)


def f1_score(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """
    Compute F1 Score

    F1 = 2 * (Precision * Recall) / (Precision + Recall)
    """
    prec = precision(y_true, y_pred)
    rec = recall(y_true, y_pred)

    if prec + rec == 0:
        return 0.0
    return 2 * (prec * rec) / (prec + rec)


def specificity(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """
    Compute Specificity (True Negative Rate)

    Specificity = TN / (TN + FP)
    """
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))

    # No actual negatives: define Specificity as 0 to avoid dividing by zero
    if tn + fp == 0:
        return 0.0
    return tn / (tn + fp)


def classification_report(y_true: np.ndarray, y_pred: np.ndarray) -> Dict:
    """
    Build a classification report

    Args:
        y_true: actual labels
        y_pred: predicted labels

    Returns:
        Dictionary of the metrics defined above
    """
    return {
        'accuracy': accuracy(y_true, y_pred),
        'precision': precision(y_true, y_pred),
        'recall': recall(y_true, y_pred),
        'f1_score': f1_score(y_true, y_pred),
        'specificity': specificity(y_true, y_pred),
        'confusion_matrix': confusion_matrix(y_true, y_pred)
    }


def roc_curve(y_true: np.ndarray, y_proba: np.ndarray, 
              n_thresholds: int = 100) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    """
    Compute the ROC Curve on an evenly spaced grid of thresholds

    Args:
        y_true: actual labels
        y_proba: predicted probabilities of the positive class
        n_thresholds: number of thresholds to evaluate

    Returns:
        fpr: False Positive Rates
        tpr: True Positive Rates
        thresholds: threshold values (ascending, so fpr and tpr descend)
    """
    thresholds = np.linspace(0, 1, n_thresholds)
    tpr_list = []
    fpr_list = []

    for threshold in thresholds:
        # Predict Positive when the probability reaches the threshold
        y_pred = (y_proba >= threshold).astype(int)

        # Compute TPR and FPR at this threshold
        tp = np.sum((y_true == 1) & (y_pred == 1))
        fn = np.sum((y_true == 1) & (y_pred == 0))
        fp = np.sum((y_true == 0) & (y_pred == 1))
        tn = np.sum((y_true == 0) & (y_pred == 0))

        tpr = tp / (tp + fn) if (tp + fn) > 0 else 0.0
        fpr = fp / (fp + tn) if (fp + tn) > 0 else 0.0

        tpr_list.append(tpr)
        fpr_list.append(fpr)

    return np.array(fpr_list), np.array(tpr_list), thresholds


def auc(fpr: np.ndarray, tpr: np.ndarray) -> float:
    """
    Compute the Area Under the ROC Curve with the trapezoidal rule

    Args:
        fpr: False Positive Rates
        tpr: True Positive Rates

    Returns:
        AUC value
    """
    # Sort by fpr and break ties by tpr, so vertical segments of the
    # curve are traversed bottom-to-top (a plain argsort on fpr leaves
    # the order of tied points undefined and can distort the area)
    order = np.lexsort((tpr, fpr))
    fpr_sorted = fpr[order]
    tpr_sorted = tpr[order]

    # Trapezoidal rule, written out so it does not depend on
    # np.trapz (deprecated in NumPy 2.0 in favor of np.trapezoid)
    widths = np.diff(fpr_sorted)
    heights = (tpr_sorted[1:] + tpr_sorted[:-1]) / 2.0
    return float(np.sum(widths * heights))


# ============================================
# Usage example
# ============================================

if __name__ == "__main__":
    # Sample data
    y_true = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1])
    y_pred = np.array([1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1])
    y_proba = np.array([0.9, 0.8, 0.7, 0.4, 0.85, 0.2, 0.6, 0.3, 0.1, 0.25,
                        0.95, 0.45, 0.15, 0.55, 0.75])

    print("=== Classification Report ===")
    report = classification_report(y_true, y_pred)

    print(f"Accuracy: {report['accuracy']:.4f}")
    print(f"Precision: {report['precision']:.4f}")
    print(f"Recall: {report['recall']:.4f}")
    print(f"F1 Score: {report['f1_score']:.4f}")
    print(f"Specificity: {report['specificity']:.4f}")

    print("\nConfusion Matrix:")
    print(report['confusion_matrix'])

    # ROC and AUC
    fpr, tpr, thresholds = roc_curve(y_true, y_proba)
    auc_score = auc(fpr, tpr)
    print(f"\nAUC Score: {auc_score:.4f}")

Example output (TP = 6, FN = 2, FP = 2, TN = 5 for the sample data):

=== Classification Report ===
Accuracy: 0.7333
Precision: 0.7500
Recall: 0.7500
F1 Score: 0.7500
Specificity: 0.7143

Confusion Matrix:
[[5 2]
 [2 6]]

AUC Score: 0.9286

6.7 Choosing the Right Metric

flowchart TB
    subgraph DECISION["Choosing a Metric"]
        style DECISION fill:#282828,color:#ebdbb2

        A["Problem characteristics?"]

        subgraph BALANCED["Balanced data"]
            style BALANCED fill:#689d6a,color:#ebdbb2
            B1["Accuracy
is sufficient"]
        end

        subgraph IMBALANCED["Imbalanced data"]
            style IMBALANCED fill:#d79921,color:#282828
            B2["Avoid Accuracy"]
            B3["Use F1, Precision,
Recall instead"]
        end

        subgraph COST["Cost of errors"]
            style COST fill:#458588,color:#ebdbb2
            C1["FP is costly
→ favor Precision"]
            C2["FN is costly
→ favor Recall"]
        end

        A --> BALANCED
        A --> IMBALANCED
        IMBALANCED --> COST
    end

| Situation | Recommended metric | Example |
|---|---|---|
| Balanced data | Accuracy | General image classification |
| Imbalanced data | F1, AUC | Fraud detection |
| FP is costly | Precision | Spam Filter |
| FN is costly | Recall | Disease diagnosis |
| Trade-off needed | F1-Score | Many cases |
| Comparing models | AUC | Model selection |

7. Summary

7.1 Key Takeaways

- Discriminative Models focus on learning the decision boundary directly, which often gives better classification accuracy than Generative Models, especially when plenty of data is available.
- Linear Regression predicts continuous values by fitting a linear relationship between the independent and dependent variables. It can be solved in closed form (Normal Equation) or iteratively (Gradient Descent).
- Logistic Regression, despite the word "Regression" in its name, is used for classification. It applies the Sigmoid Function to turn a linear score into a probability and uses Binary Cross-Entropy as its loss function.
- Perceptron is the precursor of Neural Networks. It uses a Step Function as its activation and is guaranteed to converge when the data is linearly separable, but a single perceptron cannot solve the XOR problem.
- Evaluation should use a metric that fits the problem: Accuracy suits balanced data, while Precision, Recall, F1, and AUC suit imbalanced data or unequal error costs.
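
The Perceptron point can be demonstrated directly. The sketch below is a minimal step-activation perceptron with the classic update rule; the function name, learning rate, and epoch limit are assumptions made for this illustration. It is trained once on AND (linearly separable) and once on XOR (not linearly separable).

```python
def train_perceptron(samples, epochs=100, lr=1.0):
    """Train a 2-input perceptron; return (weights, bias, converged)."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for (x1, x2), target in samples:
            # Step activation: output 1 only if the weighted sum is positive
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            update = lr * (target - pred)
            if update != 0:
                w[0] += update * x1
                w[1] += update * x2
                b += update
                mistakes += 1
        if mistakes == 0:
            # A full pass without errors: every sample is classified correctly
            return w, b, True
    return w, b, False


inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_data = list(zip(inputs, [0, 0, 0, 1]))
xor_data = list(zip(inputs, [0, 1, 1, 0]))

_, _, and_converged = train_perceptron(and_data)
_, _, xor_converged = train_perceptron(xor_data)

print(f"AND converged: {and_converged}")  # True
print(f"XOR converged: {xor_converged}")  # False
```

No straight line separates the XOR classes, so the loop never completes an error-free pass regardless of how many epochs are allowed.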

7.2 Summary Diagram

flowchart TB
    subgraph SUMMARY["Discriminative Models Summary"]
        style SUMMARY fill:#282828,color:#ebdbb2

        subgraph REGRESSION["Regression Task"]
            style REGRESSION fill:#458588,color:#ebdbb2
            LR["Linear Regression
Predicts continuous values"]
        end

        subgraph CLASSIFICATION["Classification Task"]
            style CLASSIFICATION fill:#689d6a,color:#ebdbb2
            LOG["Logistic Regression
Outputs probabilities"]
            PER["Perceptron
Decision Boundary"]
        end

        subgraph EVAL["Evaluation"]
            style EVAL fill:#d79921,color:#282828
            E1["Regression: MSE, R²"]
            E2["Classification: Accuracy,
Precision, Recall, F1, AUC"]
        end

        REGRESSION --> E1
        CLASSIFICATION --> E2
    end

7.3 Further Study

Neural Networks & Deep Learning: Multi-Layer Perceptron, Backpropagation
Support Vector Machines: Kernel Methods, Margin Maximization
Regularization: L1, L2, Elastic Net
Advanced Topics: Gradient Boosting, Random Forest

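Scikit-learn Quick Reference

| Model | Class | Key parameters |
|---|---|---|
| Linear Regression | `sklearn.linear_model.LinearRegression` | `fit_intercept`, `positive` |
| Logistic Regression | `sklearn.linear_model.LogisticRegression` | `C`, `penalty`, `solver`, `max_iter` |
| Perceptron | `sklearn.linear_model.Perceptron` | `penalty`, `alpha`, `max_iter`, `eta0`, `tol` |

Note: the `normalize` parameter of `LinearRegression` was removed in scikit-learn 1.2 (scale features with `StandardScaler` instead), and the `multi_class` parameter of `LogisticRegression` has been deprecated since version 1.5.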

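General Usage Pattern

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# 0. Placeholder data; replace with your own X and y
X, y = make_classification(n_samples=200, n_features=5, random_state=42)

# 1. Split the data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 2. Feature Scaling (important for Logistic Regression and Perceptron);
#    fit on the training set only, then reuse the same scaler on the test set
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# 3. Create and train the model (LogisticRegression stands in for any estimator)
model = LogisticRegression(max_iter=1000)
model.fit(X_train_scaled, y_train)

# 4. Predict
y_pred = model.predict(X_test_scaled)
# predict_proba is only available on probabilistic classifiers
# (not on Perceptron or LinearRegression)
y_proba = model.predict_proba(X_test_scaled)

# 5. Evaluate: score() is accuracy for classifiers and R² for regressors
accuracy = model.score(X_test_scaled, y_test)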

When to Use Which Model

| Situation | Recommended model |
|---|---|
| Predicting continuous values | Linear Regression |
| Classification where probabilities are needed | Logistic Regression |
| Classification with clearly linearly separable data | Perceptron |
| A simple baseline is needed | Linear/Logistic Regression |
| High-dimensional, sparse data | Perceptron or Logistic (with L1) |