Probabilistic Reasoning: Handling Uncertainty

1. History and Overview

%%{init: {'theme': 'base', 'themeVariables': {
  'primaryColor': '#282828',
  'primaryTextColor': '#ebdbb2',
  'primaryBorderColor': '#504945',
  'lineColor': '#d79921',
  'secondaryColor': '#3c3836',
  'tertiaryColor': '#1d2021',
  'edgeLabelBackground': '#3c3836',
  'clusterBkg': '#32302f',
  'titleColor': '#ebdbb2'
}}}%%
flowchart TD
    subgraph E1["📜 Era 1: Foundations (1600s–1800s)"]
        A["1654: Pascal & Fermat - Foundations of probability theory"]
        B["1763: Thomas Bayes - Bayes' Theorem (published posthumously)"]
        C["1812: Pierre-Simon Laplace - Develops Bayesian methods - Laplace's Rule of Succession"]
    end

    subgraph E2["⚙️ Era 2: Mathematical Statistics (1900s–1950s)"]
        D["1913: Andrey Markov - Markov Chains - sequences of dependent random events"]
        E["1948: Claude Shannon - Information Theory"]
    end

    subgraph E3["🤖 Era 3: AI and ML (1980s–2000s)"]
        F["1985: Judea Pearl - Bayesian Networks"]
        G["1989: Rabiner's HMM tutorial - popularizes Baum-Welch HMM training for speech recognition"]
        H["1990s: Naive Bayes adopted for spam email filtering"]
    end

    subgraph E4["🚀 Era 4: Modern AI (2010s–present)"]
        I["2010s: Deep Probabilistic Models - Variational Autoencoders (VAE)"]
        J["2020s: LLMs use probabilistic sampling - Temperature, Top-p Sampling"]
    end

    A --> B --> C --> D --> E --> F --> G --> H --> I --> J

    style A fill:#458588,color:#ebdbb2
    style B fill:#689d6a,color:#282828
    style C fill:#689d6a,color:#282828
    style D fill:#d79921,color:#282828
    style E fill:#d79921,color:#282828
    style F fill:#cc241d,color:#ebdbb2
    style G fill:#cc241d,color:#ebdbb2
    style H fill:#cc241d,color:#ebdbb2
    style I fill:#b16286,color:#ebdbb2
    style J fill:#b16286,color:#ebdbb2

2. Probability Theory Basics

2.1 Core Concepts

A probability is a number in the interval [0, 1] that expresses how likely an event is to occur:

0 means the event is impossible (Impossible Event)
1 means the event is certain (Certain Event)
Values between 0 and 1 express degrees of uncertainty

Key terms:

Sample Space (Ω): the set of all possible outcomes
Event (E): a subset of the sample space
Random Variable (X): a function that maps each outcome to a number
Probability Distribution: a function that gives the probability of each value of a random variable

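2.2 Axioms of Probability

Kolmogorov's axioms:

P(E) \geq 0 \quad \text{for every event } E

P(Ω) = 1

P(A \cup B) = P(A) + P(B) \quad \text{when } A \cap B = \emptyset \text{ (and likewise for countably many disjoint events)}

Useful consequences derived from the axioms:

0 \leq P(E) \leq 1

P(A \cup B) = P(A) + P(B) - P(A \cap B) \quad \text{for any events } A, B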

2.3 Conditional Probability

Conditional probability is the probability that event A occurs, given that event B is known to have occurred:

P(A | B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) > 0

Where:

P(A|B) = probability of A given that B has occurred
P(A∩B) = probability that A and B both occur
P(B) = probability of B

Worked example:

Problem: Draw 1 card from a standard 52-card deck. Let A = "the card is red" and B = "the card is a Heart".

P(A) = 26/52 = 0.5
P(B) = 13/52 = 0.25
P(A∩B) = P(B) = 13/52 (every Heart is red)
P(A|B) = (13/52) / (13/52) = 1.0 (a Heart is certainly red)
P(B|A) = (13/52) / (26/52) = 13/26 = 0.5 (a red card has a 50% chance of being a Heart)
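You can check the same numbers by brute-force counting over the deck. This is a minimal sketch in plain Python, assuming a uniformly random draw. The helper names `prob` and `cond_prob` are illustrative and do not come from any library. `Fraction` keeps the results exact.

```python
from fractions import Fraction
from itertools import product

# Standard 52-card deck as (rank, suit) pairs
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
DECK = list(product(RANKS, SUITS))
RED_SUITS = {"hearts", "diamonds"}

def prob(event):
    """P(event) for one uniformly drawn card, as an exact fraction."""
    return Fraction(sum(1 for card in DECK if event(card)), len(DECK))

def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B); requires P(B) > 0."""
    p_b = prob(b)
    if p_b == 0:
        raise ValueError("P(B) must be > 0")
    return prob(lambda card: a(card) and b(card)) / p_b

is_red = lambda card: card[1] in RED_SUITS     # event A
is_heart = lambda card: card[1] == "hearts"    # event B

print(prob(is_red), prob(is_heart))     # P(A), P(B)
print(cond_prob(is_red, is_heart))      # P(A | B)
print(cond_prob(is_heart, is_red))      # P(B | A)
```

The two conditional probabilities differ (1 versus 1/2). This shows that P(A|B) and P(B|A) are generally not the same quantity.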

2.4 Bayes' Theorem

Bayes' theorem is the core of probabilistic reasoning:

P(H | E) = \frac{P(E | H) \cdot P(H)}{P(E)}

Where:

P(H|E) = Posterior: probability of hypothesis H after observing evidence E
P(E|H) = Likelihood: probability of observing evidence E if H is true
P(H) = Prior: probability of hypothesis H before seeing the evidence
P(E) = Evidence: total probability of E (the Normalizing Constant)

Worked example: medical testing

Problem: A rare disease X has a prevalence of 1% in the population. A test for it has:

Sensitivity 99%: if a person has the disease, the test is positive 99% of the time
Specificity 95%: if a person does not have the disease, the test is negative 95% of the time

Question: If the test result is positive, what is the probability that the person actually has the disease?

Given:

H = "has the disease", E = "test is positive"
P(H) = 0.01 (Prior: prevalence)
P(E|H) = 0.99 (Likelihood: sensitivity)
P(E|¬H) = 0.05 (False Positive Rate = 1 - specificity)

Compute P(E) with the Law of Total Probability:

P(E) = P(E | H) P(H) + P(E | \neg H) P(\neg H)

P(E) = (0.99 \times 0.01) + (0.05 \times 0.99) = 0.0099 + 0.0495 = 0.0594

P(H | E) = \frac{0.99 \times 0.01}{0.0594} = \frac{0.0099}{0.0594} \approx 0.1667 \approx 16.67\%

Observation: Even with a positive result, the probability of actually having the disease is only about 16.67%. The disease is so rare that false positives from the 99% healthy population (0.0495) outnumber true positives (0.0099) five to one. Answering "about 99%" ignores the prior. That mistake is the Base Rate Fallacy, and it matters greatly in medicine and in AI.
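Here is a short sketch of the same calculation, assuming the test characteristics above. The function name `posterior` is a hypothetical helper. The last lines reuse the posterior as the prior for a second positive test. That step is only valid under an extra assumption: the two test results are conditionally independent given disease status.

```python
def posterior(prior, sensitivity, specificity):
    """P(H | E) for a positive test result via Bayes' theorem."""
    false_positive_rate = 1.0 - specificity          # P(E | not H)
    # Law of total probability: P(E) = P(E|H)P(H) + P(E|not H)P(not H)
    p_evidence = sensitivity * prior + false_positive_rate * (1.0 - prior)
    return sensitivity * prior / p_evidence

p1 = posterior(prior=0.01, sensitivity=0.99, specificity=0.95)
print(f"after one positive test:  {p1:.4f}")

# Assumption: a second, conditionally independent positive test.
# The posterior after the first test becomes the prior for the second.
p2 = posterior(prior=p1, sensitivity=0.99, specificity=0.95)
print(f"after two positive tests: {p2:.4f}")
```

One positive result moves the probability from 1% to about 16.7%. A second positive result, under the independence assumption, raises it to about 79.8%.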

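3. Bayes Decision Theory

3.1 Principle

Bayes Decision Theory is a framework for statistically optimal decisions. Given an observation x, it chooses the action αᵢ that minimizes the Expected Risk: the loss averaged over the posterior probabilities of the classes.

R(α_{i} | x) = \sum_{j} λ(α_{i} | C_{j}) \, P(C_{j} | x)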

3.2 Components

%%{init: {'theme': 'base', 'themeVariables': {
  'primaryColor': '#282828',
  'primaryTextColor': '#ebdbb2',
  'primaryBorderColor': '#d79921',
  'lineColor': '#b8bb26',
  'secondaryColor': '#3c3836'
}}}%%
graph LR
    OB["🔍 Observation x"]
    PR["📊 Prior P(Cᵢ)"]
    LK["📈 Likelihood P(x|Cᵢ)"]
    PO["🎯 Posterior P(Cᵢ|x)"]
    LF["⚖️ Loss Function λ(αᵢ|Cⱼ)"]
    RI["📉 Expected Risk R(αᵢ|x)"]
    DE["✅ Decision - choose the αᵢ with minimum R"]

    OB --> PO
    PR --> PO
    LK --> PO
    PO --> RI
    LF --> RI
    RI --> DE

    style OB fill:#458588,color:#ebdbb2
    style PR fill:#689d6a,color:#282828
    style LK fill:#689d6a,color:#282828
    style PO fill:#d79921,color:#282828
    style LF fill:#cc241d,color:#ebdbb2
    style RI fill:#b16286,color:#ebdbb2
    style DE fill:#98971a,color:#ebdbb2
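The components above combine through the conditional risk R(αᵢ|x) = Σⱼ λ(αᵢ|Cⱼ) P(Cⱼ|x). The sketch below evaluates it with NumPy. The posterior values, the two actions and the loss matrix are hypothetical numbers chosen for illustration. The sketch shows that with an asymmetric loss, the minimum-risk action can differ from simply picking the most probable class. With a 0-1 loss, the rule reduces to MAP.

```python
import numpy as np

def bayes_decision(posteriors, loss):
    """Return (index of minimum-risk action, risk of every action).

    posteriors[j] = P(C_j | x); loss[i, j] = λ(α_i | C_j)
    risks[i] = Σ_j λ(α_i | C_j) P(C_j | x)
    """
    risks = loss @ posteriors
    return int(np.argmin(risks)), risks

# Hypothetical posterior for one email: [P(spam | x), P(not spam | x)]
posteriors = np.array([0.9, 0.1])
actions = ["move to spam folder", "deliver to inbox"]

# Hypothetical asymmetric loss (rows = actions, columns = true classes):
# hiding a legitimate email costs 10, letting spam through costs 1
loss = np.array([[0.0, 10.0],
                 [1.0, 0.0]])
best, risks = bayes_decision(posteriors, loss)
print("asymmetric loss:", risks, "->", actions[best])

# 0-1 loss: every mistake costs 1, so minimum risk = maximum posterior (MAP)
zero_one = np.array([[0.0, 1.0],
                     [1.0, 0.0]])
best01, risks01 = bayes_decision(posteriors, zero_one)
print("0-1 loss:", risks01, "->", actions[best01])
```

With the asymmetric loss, the risks are 1.0 and 0.9, so the email is delivered even though P(spam | x) = 0.9. Under 0-1 loss, the risks are 0.1 and 0.9, and the rule picks the most probable class.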

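3.3 Bayes Classifier

MAP (Maximum A Posteriori) decision rule:

C^{*} = \arg\max_{C \in \text{Classes}} P(C | x) = \arg\max_{C} P(x | C) \cdot P(C)

The second equality holds because P(x) is the same for every class, so it can be dropped. Under a 0-1 loss, where every misclassification costs 1, the MAP rule is exactly the minimum-expected-risk decision from 3.1.

Where:

C* = the chosen class (the classification result)
P(C|x) = posterior of class C given data x
P(x|C) = likelihood of data x under class C
P(C) = prior of class C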

3.4 Worked Example: Spam Email Classification

Sample data:

Category	Emails	Containing "free"	Containing "win"
Spam	100	80	70
Not Spam	400	40	20
Total	500	120	90

Question: A new email contains both "free" and "win". Is it spam?

Calculation steps:

Prior:
- P(Spam) = 100/500 = 0.20
- P(Not Spam) = 400/500 = 0.80
Likelihood (assuming the words are conditionally independent given the class):
- P("free"|Spam) = 80/100 = 0.80
- P("win"|Spam) = 70/100 = 0.70
- P("free"|Not Spam) = 40/400 = 0.10
- P("win"|Not Spam) = 20/400 = 0.05
Scores (without the normalizing constant):
- Score(Spam) = 0.80 × 0.70 × 0.20 = 0.112
- Score(Not Spam) = 0.10 × 0.05 × 0.80 = 0.004
Decision: classify as Spam, because 0.112 > 0.004. After normalizing, P(Spam | "free", "win") = 0.112 / 0.116 ≈ 0.966.
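The sketch below reproduces the scores from the counts in the table. The keys "free" and "win" are English stand-ins for the two words. Like the hand calculation, it only multiplies in the words that appear in the email. A full Bernoulli model would also include P(word absent | C) for the rest of the vocabulary.

```python
# Counts from the table: emails per class and emails containing each word
counts = {
    "spam":     {"total": 100, "free": 80, "win": 70},
    "not_spam": {"total": 400, "free": 40, "win": 20},
}
n_emails = sum(c["total"] for c in counts.values())

def score(cls, words):
    """Unnormalized posterior P(C) * Π P(word | C), assuming word independence."""
    c = counts[cls]
    s = c["total"] / n_emails            # prior P(C)
    for w in words:
        s *= c[w] / c["total"]           # likelihood P(word | C)
    return s

scores = {cls: score(cls, ["free", "win"]) for cls in counts}
total = sum(scores.values())             # normalizing constant (up to P(x))
for cls, s in scores.items():
    print(f"{cls}: score={s:.3f}, posterior={s / total:.4f}")

prediction = max(scores, key=scores.get)
print("prediction:", prediction)
```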

4. Naive Bayes Classifier

4.1 Concept

The Naive Bayes Classifier applies Bayes' theorem under the "naive" assumption that all features are conditionally independent of one another once the class is known.

The assumption is called "naive" because it rarely holds exactly in real data. For example, "free" and "win" tend to appear together in spam. Even so, the model performs well in practice on many problems, especially text classification.

4.2 Formula

P(C | x_{1}, x_{2}, \dots, x_{n}) \propto P(C) \cdot \prod_{i = 1}^{n} P(x_{i} | C)

Where:

C = the class to predict
x₁, x₂, …, xₙ = features 1 to n of the data point
P(C) = prior of class C
P(xᵢ|C) = likelihood of feature i under class C
∝ = "proportional to" (the normalizing constant P(x₁, …, xₙ) is omitted)

4.3 Types of Naive Bayes

Type	Feature data	Main use
Gaussian Naive Bayes	Continuous values (assumes a Normal distribution per class)	Classifying scientific/measurement data
Multinomial Naive Bayes	Counts (Count Data)	Text Classification, NLP
Bernoulli Naive Bayes	Binary values (present/absent)	Spam Detection
Complement Naive Bayes	Imbalanced data	Improved text classification on imbalanced classes

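4.4 Worked Example: Classifying Restaurant Reviews

Training Data:

Review	Tasty	Good service	Price	Impressed
R1	✓	✓	Expensive	Positive
R2	✗	✓	Cheap	Negative
R3	✓	✗	Cheap	Positive
R4	✓	✓	Cheap	Positive
R5	✗	✗	Expensive	Negative

Question: Should a new review {Tasty=✓, Service=✓, Price=Cheap} be classified as Positive or Negative?

Calculation:

P(Positive) = 3/5 = 0.60, P(Negative) = 2/5 = 0.40
P(Tasty=✓|Positive) = 3/3 = 1.00, P(Tasty=✓|Negative) = 0/2 = 0.00*
P(Service=✓|Positive) = 2/3 = 0.67, P(Service=✓|Negative) = 1/2 = 0.50
P(Price=Cheap|Positive) = 2/3 = 0.67, P(Price=Cheap|Negative) = 1/2 = 0.50

*Zero Probability problem: if a feature value never appears with a class in the training data, its estimate is 0. The whole product then becomes 0, whatever the other features say. The fix is Laplace Smoothing (Add-1 Smoothing).

Laplace Smoothing:

P(x_{i} | C) = \frac{count(x_{i}, C) + 1}{count(C) + |V|}

Where: count(x_{i}, C) = number of class-C examples with that feature value, count(C) = number of class-C examples, |V| = number of distinct values the feature can take

Apply Laplace Smoothing to every conditional probability so both classes are treated consistently. Here |V| = 2 for every feature, since each is binary:

P(Tasty=✓|Positive) = (3+1)/(3+2) = 0.80, P(Tasty=✓|Negative) = (0+1)/(2+2) = 0.25
P(Service=✓|Positive) = (2+1)/(3+2) = 0.60, P(Service=✓|Negative) = (1+1)/(2+2) = 0.50
P(Price=Cheap|Positive) = (2+1)/(3+2) = 0.60, P(Price=Cheap|Negative) = (1+1)/(2+2) = 0.50

Final scores:

Score(Positive) = 0.60 × 0.80 × 0.60 × 0.60 = 0.1728
Score(Negative) = 0.40 × 0.25 × 0.50 × 0.50 = 0.025

Result: classified as Positive ✓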
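Here is a from-scratch sketch of the smoothed calculation. It assumes the five reviews above, encoded as 0/1 features (tasty, good service, cheap) with illustrative English names. The smoothing term (count + alpha) / (n_c + 2·alpha) is the binary case of the Laplace formula with |V| = 2. To our understanding, scikit-learn's `BernoulliNB(alpha=1.0)` uses the same form, but confirm that against its documentation.

```python
from collections import Counter

# Training data from the table: (tasty, good_service, cheap) -> label
TRAIN = [
    ((1, 1, 0), "pos"),   # R1: tasty, good service, expensive
    ((0, 1, 1), "neg"),   # R2
    ((1, 0, 1), "pos"),   # R3
    ((1, 1, 1), "pos"),   # R4
    ((0, 0, 0), "neg"),   # R5
]

def train_nb(data, alpha=1.0, n_values=2):
    """Bernoulli-style Naive Bayes with add-alpha (Laplace) smoothing."""
    class_counts = Counter(label for _, label in data)
    n_features = len(data[0][0])
    priors = {c: n / len(data) for c, n in class_counts.items()}
    likelihood = {}   # likelihood[c][i] = smoothed P(x_i = 1 | c)
    for c, n_c in class_counts.items():
        ones = [sum(x[i] for x, label in data if label == c) for i in range(n_features)]
        likelihood[c] = [(k + alpha) / (n_c + alpha * n_values) for k in ones]
    return priors, likelihood

def nb_scores(x, priors, likelihood):
    """Unnormalized P(c) * Π P(x_i | c) for each class c."""
    out = {}
    for c, prior in priors.items():
        s = prior
        for xi, p1 in zip(x, likelihood[c]):
            s *= p1 if xi == 1 else 1.0 - p1   # P(x_i=0 | c) = 1 - P(x_i=1 | c)
        out[c] = s
    return out

priors, likelihood = train_nb(TRAIN)
result = nb_scores((1, 1, 1), priors, likelihood)   # tasty, good service, cheap
print(result)
print("prediction:", max(result, key=result.get))
```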

4.5 Python Code

"""
naive_bayes_restaurant.py
Example: a Naive Bayes classifier for restaurant reviews
"""

import numpy as np
import pandas as pd
from sklearn.naive_bayes import BernoulliNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# =============================================
# Part 1: Build sample data
# =============================================

# Restaurant review data (columns: tasty, good service, cheap, impressed)
data = {
    'อร่อย':    [1, 0, 1, 1, 0, 1, 0, 1, 1, 0],
    'บริการดี': [1, 1, 0, 1, 0, 1, 0, 1, 0, 1],
    'ราคาถูก':  [0, 1, 1, 1, 0, 0, 1, 1, 0, 1],  # 1=cheap, 0=expensive
    'ประทับใจ': [1, 0, 1, 1, 0, 1, 0, 1, 0, 0]   # 1=positive, 0=negative
}

df = pd.DataFrame(data)
X = df[['อร่อย', 'บริการดี', 'ราคาถูก']].values
y = df['ประทับใจ'].values

print("=" * 50)
print("Training Data")
print("=" * 50)
print(df.to_string())

# =============================================
# Part 2: Build and train the Naive Bayes model
# =============================================

# Split into train/test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Bernoulli Naive Bayes suits binary (0/1) features
model = BernoulliNB(alpha=1.0)  # alpha=1.0 is Laplace (add-1) smoothing
model.fit(X_train, y_train)

# =============================================
# Part 3: Predict a new review
# =============================================

# New review: tasty=yes, good service=yes, cheap=yes
รีวิวใหม่ = np.array([[1, 1, 1]])

# Predicted class and class probabilities (columns follow model.classes_ = [0, 1])
ผลทำนาย = model.predict(รีวิวใหม่)
ความน่าจะเป็น = model.predict_proba(รีวิวใหม่)

print("\n" + "=" * 50)
print("Prediction")
print("=" * 50)
print("New review: tasty=✓, good service=✓, cheap=✓")
print(f"Prediction: {'impressed (positive)' if ผลทำนาย[0] == 1 else 'not impressed (negative)'}")
print(f"P(negative): {ความน่าจะเป็น[0][0]:.4f}")
print(f"P(positive): {ความน่าจะเป็น[0][1]:.4f}")

# =============================================
# Part 4: Evaluate the model
# =============================================

y_pred = model.predict(X_test)

print("\n" + "=" * 50)
print("Model Evaluation")
print("=" * 50)
# labels=[0, 1] keeps the report valid even if the tiny test set lacks a class
print(classification_report(y_test, y_pred, labels=[0, 1],
                            target_names=['negative', 'positive'],
                            zero_division=0))

# =============================================
# Part 5: Show the learned probabilities
# =============================================

# class_log_prior_ stores log P(class); exponentiate to get P(class)
print("Prior probability of each class:")
for i, cls in enumerate(['negative', 'positive']):
    print(f"  P({cls}) = {np.exp(model.class_log_prior_[i]):.4f}")

# feature_log_prob_[c][i] stores log P(feature i = 1 | class c)
print("\nFeature probabilities P(feature=1 | class):")
features = ['อร่อย', 'บริการดี', 'ราคาถูก']
for i, feat in enumerate(features):
    p_pos = np.exp(model.feature_log_prob_[1][i])
    p_neg = np.exp(model.feature_log_prob_[0][i])
    print(f"  P({feat}|positive) = {p_pos:.4f}, P({feat}|negative) = {p_neg:.4f}")

5. Bayesian Networks

5.1 Concept

A Bayesian Network (Belief Network) is a Directed Acyclic Graph (DAG). Its nodes are random variables, and its edges encode direct dependencies between them, often causal ones. Each node has a Conditional Probability Table (CPT) that gives its distribution given its parents. The full joint distribution therefore factorizes as:

P(X_{1}, \dots, X_{n}) = \prod_{i = 1}^{n} P(X_{i} | \text{Parents}(X_{i}))

Components:

Node: a random variable
Edge: a direct dependency, often read as a cause-effect relationship
CPT: the conditional probability table of each node given its parents

5.2 Example: A Bayesian Network for Medical Diagnosis

%%{init: {'theme': 'base', 'themeVariables': {
  'primaryColor': '#1d2021',
  'primaryTextColor': '#ebdbb2',
  'primaryBorderColor': '#d79921',
  'lineColor': '#b8bb26',
  'edgeLabelBackground': '#3c3836',
  'fontFamily': 'monospace'
}}}%%
graph TD
    SMOKE["🚬 Smoking - P(S=T)=0.30"]
    POLL["🏭 Pollution - P(P=High)=0.40"]
    CANCER["🔬 Lung Cancer - P(C|S,P)"]
    XRAY["📷 X-Ray Result - P(X|C)"]
    BREATH["😮‍💨 Dyspnea (shortness of breath) - P(D|C)"]

    SMOKE --> CANCER
    POLL --> CANCER
    CANCER --> XRAY
    CANCER --> BREATH

    style SMOKE fill:#cc241d,color:#ebdbb2,stroke:#9d0006
    style POLL fill:#d65d0e,color:#ebdbb2,stroke:#af3a03
    style CANCER fill:#b16286,color:#ebdbb2,stroke:#8f3f71
    style XRAY fill:#458588,color:#ebdbb2,stroke:#076678
    style BREATH fill:#689d6a,color:#282828,stroke:#427b58

Conditional Probability Tables (CPT):

CPT for Cancer, P(C | Smoking, Pollution):

Smoking	Pollution	P(Cancer=True)
False	Low	0.01
False	High	0.05
True	Low	0.10
True	High	0.30

CPTs for X-Ray and Dyspnea:

Cancer	P(X-Ray=Positive)	P(Dyspnea=True)
False	0.05	0.10
True	0.90	0.70

5.3 Inference

Example: A patient smokes and the X-ray is positive. What is the probability of cancer?

Compute it by summing out the unobserved variables, which is the operation Variable Elimination performs. Dyspnea sums to 1 and drops out. P(S=T) is a common factor and cancels when normalizing.

P(C | S = T, X = T) \propto \sum_{P} P(C | S = T, P) \cdot P(X = T | C) \cdot P(P)

Step by step:

For Pollution=Low (P=L, P(Low)=0.60):

P(C=T|S=T,P=L) × P(X=T|C=T) × P(P=L) = 0.10 × 0.90 × 0.60 = 0.0540
P(C=F|S=T,P=L) × P(X=T|C=F) × P(P=L) = 0.90 × 0.05 × 0.60 = 0.0270

For Pollution=High (P=H, P(High)=0.40):

P(C=T|S=T,P=H) × P(X=T|C=T) × P(P=H) = 0.30 × 0.90 × 0.40 = 0.1080
P(C=F|S=T,P=H) × P(X=T|C=F) × P(P=H) = 0.70 × 0.05 × 0.40 = 0.0140

Totals:

Unnormalized P(C=T) = 0.0540 + 0.1080 = 0.1620
Unnormalized P(C=F) = 0.0270 + 0.0140 = 0.0410
Total = 0.1620 + 0.0410 = 0.2030
P(C=T | S=T, X=T) = 0.1620/0.2030 ≈ 0.798 ≈ 79.8%
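You can check the hand calculation with inference by enumeration. Build the joint distribution from the chain-rule factorization of the network, then sum over every unobserved variable. The sketch below is plain Python using the CPT values from the tables. The dictionary and function names are illustrative. Enumeration grows exponentially with the number of variables, which is why libraries use algorithms such as Variable Elimination. For these five variables it is cheap.

```python
from itertools import product

# CPTs from the tables (booleans for S, C, X, D; "low"/"high" for Pollution)
P_S = {True: 0.30, False: 0.70}
P_POL = {"high": 0.40, "low": 0.60}
P_C = {(False, "low"): 0.01, (False, "high"): 0.05,
       (True, "low"): 0.10, (True, "high"): 0.30}   # P(C=T | S, Pol)
P_X = {True: 0.90, False: 0.05}                     # P(X=positive | C)
P_D = {True: 0.70, False: 0.10}                     # P(D=T | C)

def bern(p_true, value):
    """Probability of a boolean value, given P(value=True)."""
    return p_true if value else 1.0 - p_true

def joint(s, pol, c, x, d):
    """Chain-rule factorization: P(S) P(Pol) P(C|S,Pol) P(X|C) P(D|C)."""
    return (P_S[s] * P_POL[pol] * bern(P_C[(s, pol)], c)
            * bern(P_X[c], x) * bern(P_D[c], d))

def query_cancer(**evidence):
    """P(C=T | evidence), summing the joint over all unobserved variables."""
    totals = {True: 0.0, False: 0.0}
    for s, pol, c, x, d in product([True, False], ["low", "high"],
                                   [True, False], [True, False], [True, False]):
        world = {"s": s, "pol": pol, "c": c, "x": x, "d": d}
        if all(world[k] == v for k, v in evidence.items()):
            totals[c] += joint(s, pol, c, x, d)
    return totals[True] / (totals[True] + totals[False])

print(f"P(C=T | S=T, X=T) = {query_cancer(s=True, x=True):.3f}")
print(f"P(C=T | D=T)      = {query_cancer(d=True):.3f}")
```

The first query reproduces the hand result of about 0.798. The second query is the one the pgmpy example in 5.4 runs, and it gives about 0.353.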

5.4 Python Code: Building the Bayesian Network

"""
bayesian_network_medical.py
Example: a Bayesian Network for lung-cancer diagnosis
using the pgmpy library
"""

# Newer pgmpy releases name the discrete model DiscreteBayesianNetwork;
# older releases call it BayesianNetwork. Try the newer name first.
try:
    from pgmpy.models import DiscreteBayesianNetwork as BayesianNetwork
except ImportError:
    from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

# =============================================
# Part 1: Define the graph structure
# =============================================

# Edges of the Bayesian Network as (parent, child) pairs
model = BayesianNetwork([
    ('Smoking', 'Cancer'),      # smoking → cancer
    ('Pollution', 'Cancer'),    # pollution → cancer
    ('Cancer', 'XRay'),         # cancer → X-ray
    ('Cancer', 'Dyspnea')       # cancer → shortness of breath
])

# =============================================
# Part 2: Define the CPT of every node
# =============================================

# CPT for Smoking (prior); state 0=False, 1=True
cpd_smoking = TabularCPD(
    variable='Smoking',
    variable_card=2,          # number of possible states
    values=[[0.70], [0.30]]   # P(False), P(True)
)

# CPT for Pollution (prior); state 0=Low, 1=High
cpd_pollution = TabularCPD(
    variable='Pollution',
    variable_card=2,
    values=[[0.60], [0.40]]   # P(Low), P(High)
)

# CPT for Cancer, conditioned on Smoking and Pollution
# Columns (last evidence variable changes fastest): S=F,P=L | S=F,P=H | S=T,P=L | S=T,P=H
cpd_cancer = TabularCPD(
    variable='Cancer',
    variable_card=2,
    values=[
        [0.99, 0.95, 0.90, 0.70],  # P(Cancer=False|...)
        [0.01, 0.05, 0.10, 0.30]   # P(Cancer=True|...)
    ],
    evidence=['Smoking', 'Pollution'],
    evidence_card=[2, 2]
)

# CPT for XRay; columns: Cancer=F | Cancer=T
cpd_xray = TabularCPD(
    variable='XRay',
    variable_card=2,
    values=[
        [0.95, 0.10],  # P(XRay=Neg|Cancer=F), P(XRay=Neg|Cancer=T)
        [0.05, 0.90]   # P(XRay=Pos|Cancer=F), P(XRay=Pos|Cancer=T)
    ],
    evidence=['Cancer'],
    evidence_card=[2]
)

# CPT for Dyspnea; columns: Cancer=F | Cancer=T
cpd_dyspnea = TabularCPD(
    variable='Dyspnea',
    variable_card=2,
    values=[
        [0.90, 0.30],  # P(Dyspnea=False|...)
        [0.10, 0.70]   # P(Dyspnea=True|...)
    ],
    evidence=['Cancer'],
    evidence_card=[2]
)

# =============================================
# Part 3: Add the CPTs to the model and validate
# =============================================

model.add_cpds(cpd_smoking, cpd_pollution, cpd_cancer, cpd_xray, cpd_dyspnea)
print(f"Model is valid: {model.check_model()}")  # raises an error if a CPT is inconsistent

# =============================================
# Part 4: Inference
# =============================================

inference = VariableElimination(model)

# Question: a smoker with a positive X-ray; what is the probability of cancer?
result = inference.query(
    variables=['Cancer'],
    evidence={'Smoking': 1, 'XRay': 1}  # 1 = True / Positive
)

print("\n" + "=" * 50)
print("Inference: P(Cancer | Smoking=T, XRay=Positive)")
print("=" * 50)
print(result)  # P(Cancer=1) should match the hand calculation in 5.3 (≈ 0.798)

# Question: a patient with shortness of breath; what is the probability of cancer?
result2 = inference.query(
    variables=['Cancer'],
    evidence={'Dyspnea': 1}
)
print("\nP(Cancer | Dyspnea=True):")
print(result2)

6. Markov Models

6.1 Markov Assumption

The Markov Property states that, given the present state, the future is independent of the past:

P(S_{t + 1} | S_{t}, S_{t - 1}, \dots, S_{1}) = P(S_{t + 1} | S_{t})

6.2 Markov Chain

A Markov Chain is a random process that moves between states over time. A Transition Matrix (A) collects the probabilities of moving from each state to every other state.

%%{init: {'theme': 'base', 'themeVariables': {
  'primaryColor': '#282828',
  'primaryTextColor': '#ebdbb2',
  'primaryBorderColor': '#689d6a',
  'lineColor': '#b8bb26'
}}}%%
stateDiagram-v2
    [*] --> Rain: start
    Rain --> Rain: 0.7 stays rainy
    Rain --> Sunny: 0.3 turns sunny
    Sunny --> Rain: 0.4 turns rainy
    Sunny --> Sunny: 0.6 stays sunny

Weather transition matrix (rows = today's state, columns = tomorrow's state, order: Rain, Sunny):

A = \begin{pmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{pmatrix}

6.3 Worked Example

Problem: It is raining today. What will the weather be 2 days from now?

Initial state vector (raining today):

π_{0} = [1.0, 0.0] (Rain = 100%, Sunny = 0%)

Day 1 (tomorrow):

π_{1} = π_{0} \times A = [1.0 \times 0.7 + 0.0 \times 0.4, 1.0 \times 0.3 + 0.0 \times 0.6] = [0.70, 0.30]

Day 2 (the day after tomorrow):

π_{2} = π_{1} \times A = [0.70 \times 0.7 + 0.30 \times 0.4, 0.70 \times 0.3 + 0.30 \times 0.6]

π_{2} = [0.490 + 0.120, 0.210 + 0.180] = [0.61, 0.39]

Result: 2 days from now, there is a 61% chance of rain and a 39% chance of sun.
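The step-by-step multiplication generalizes to πₙ = π₀ Aⁿ. Below is a NumPy sketch using the matrix above, with state order [rain, sun] and the row-vector convention. `propagate` is an illustrative helper name.

```python
import numpy as np

# Rows = today's state, columns = tomorrow's state; order [rain, sun]
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
pi0 = np.array([1.0, 0.0])   # it is raining today

def propagate(pi, A, n_steps):
    """Distribution after n steps: pi_n = pi_0 @ A^n."""
    return pi @ np.linalg.matrix_power(A, n_steps)

for day in range(1, 4):
    print(f"day {day}: {np.round(propagate(pi0, A, day), 4)}")
```

Days 1 and 2 reproduce [0.70, 0.30] and [0.61, 0.39]. Day 3 continues to [0.583, 0.417].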

6.4 Stationary Distribution

After many steps, the Markov Chain settles into an equilibrium called the Stationary Distribution π*. From then on, the distribution no longer changes from one step to the next:

π^{*} = π^{*} \times A

Solve: π*(Rain) = 0.7π*(Rain) + 0.4π*(Sunny), with π*(Rain) + π*(Sunny) = 1

The first equation gives 0.3π*(Rain) = 0.4π*(Sunny), so π*(Rain) = 4/7 ≈ 57.14% and π*(Sunny) = 3/7 ≈ 42.86%
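You can also find the stationary distribution numerically. The condition π* A = π* says π* is a left eigenvector of A with eigenvalue 1, i.e. an ordinary eigenvector of Aᵀ. The NumPy sketch below assumes the chain has a unique stationary distribution. That holds for this two-state chain because all transition probabilities are positive.

```python
import numpy as np

A = np.array([[0.7, 0.3],
              [0.4, 0.6]])   # order [rain, sun]

def stationary(A):
    """Solve pi = pi A with sum(pi) = 1 via an eigenvector of A.T for eigenvalue 1."""
    eigvals, eigvecs = np.linalg.eig(A.T)
    idx = np.argmin(np.abs(eigvals - 1.0))   # pick the eigenvalue closest to 1
    v = np.real(eigvecs[:, idx])
    return v / v.sum()                       # normalize (also fixes the sign)

pi_star = stationary(A)
print(np.round(pi_star, 4))                  # ≈ [4/7, 3/7]

# Cross-check: every row of A^n converges to pi_star as n grows
print(np.round(np.linalg.matrix_power(A, 50), 4))
```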

7. Hidden Markov Models (HMM)

7.1 Concept

A Hidden Markov Model (HMM) extends the Markov Chain. Its states are hidden and cannot be observed directly; we only see the observations (emissions) each state produces.

%%{init: {
  'theme': 'base',
  'themeVariables': {
    'primaryColor': '#282828',
    'primaryTextColor': '#ebdbb2',
    'primaryBorderColor': '#d79921',
    'lineColor': '#fabd2f'
  }
}}%%
graph LR
    subgraph HIDDEN["🔒 Hidden States"]
        S1["☀️ Sunny"]
        S2["🌧️ Rainy"]
        S3["☁️ Cloudy"]
    end

    subgraph OBS["👁️ Observations"]
        O1["🚶 Walk"]
        O2["🛍️ Shop"]
        O3["🧹 Clean"]
    end

    %% Transition Probabilities
    S1 -->|"a₁₁=0.6"| S1
    S1 -->|"a₁₂=0.3"| S2
    S1 -->|"a₁₃=0.1"| S3
    S2 -->|"a₂₁=0.2"| S1
    S2 -->|"a₂₂=0.5"| S2
    S2 -->|"a₂₃=0.3"| S3
    S3 -->|"a₃₁=0.4"| S1
    S3 -->|"a₃₂=0.3"| S2
    S3 -->|"a₃₃=0.3"| S3

    %% Emission Probabilities
    S1 -.->|"b₁: 0.6, 0.3, 0.1"| O1
    S1 -.-> O2
    S1 -.-> O3
    S2 -.->|"b₂: 0.1, 0.4, 0.5"| O1
    S2 -.-> O2
    S2 -.-> O3
    S3 -.->|"b₃: 0.3, 0.4, 0.3"| O1
    S3 -.-> O2
    S3 -.-> O3

    %% Styling
    style S1 fill:#d79921,color:#282828
    style S2 fill:#458588,color:#ebdbb2
    style S3 fill:#504945,color:#ebdbb2
    style O1 fill:#98971a,color:#ebdbb2
    style O2 fill:#b16286,color:#ebdbb2
    style O3 fill:#689d6a,color:#282828

7.2 HMM Parameters

An HMM is specified by λ = (A, B, π):

Parameter	Symbol	Meaning
Transition Matrix	A	aᵢⱼ = P(Sₜ=j | Sₜ₋₁=i), probability of moving from state i to state j
Emission Matrix	B	bᵢ(k) = P(Oₜ=k | Sₜ=i), probability of observing k in state i
Initial Distribution	π	πᵢ = P(S₁=i), probability of starting in state i

7.3 Three Core Problems of HMMs

Evaluation Problem: "How likely is this observation sequence?" → solved by the Forward Algorithm
Decoding Problem: "What is the most likely hidden state sequence?" → solved by the Viterbi Algorithm
Learning Problem: "How do we fit the parameters to data?" → solved by the Baum-Welch Algorithm (EM)
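For the Evaluation Problem, here is a minimal Forward Algorithm sketch in NumPy. It assumes the weather HMM from 7.1 and 7.4, with state order [sunny, rainy, cloudy] and observation order [walk, shop, clean]. It keeps a vector α of joint probabilities P(o₁…oₜ, Sₜ=j) and updates it once per observation, which is where the O(s²·T) cost comes from. For long sequences, real implementations work in log space or rescale α to avoid underflow; this sketch does not.

```python
import numpy as np

# Weather HMM: states [sunny, rainy, cloudy], observations [walk, shop, clean]
pi = np.array([0.5, 0.3, 0.2])
A = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.4, 0.3, 0.3]])
B = np.array([[0.6, 0.3, 0.1],
              [0.1, 0.4, 0.5],
              [0.3, 0.4, 0.3]])

def forward(obs, pi, A, B):
    """Evaluation problem: P(O | λ) via the forward algorithm."""
    alpha = pi * B[:, obs[0]]              # α_1(j) = π_j b_j(o_1)
    for o in obs[1:]:
        # α_t(j) = [Σ_i α_{t-1}(i) a_ij] * b_j(o_t)
        alpha = (alpha @ A) * B[:, o]
    return float(alpha.sum())              # P(O) = Σ_j α_T(j)

p = forward([0, 1, 2], pi, A, B)           # walk, shop, clean
print(f"P(O) = {p:.6f}")
```

For [walk, shop, clean], this gives P(O) ≈ 0.0392. In the hmmlearn example in 7.6, `model.score` should report the natural log of the same value.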

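7.4 Viterbi Algorithm: Finding the Best State Sequence

Example: We observe activities over 3 days: [Walk, Shop, Clean]. What was the weather on each day?

HMM data:

Initial State: π = [Sunny=0.5, Rainy=0.3, Cloudy=0.2]

Transition Probabilities A: as in the diagram in 7.1 (e.g., Sunny→Sunny=0.6, Rainy→Sunny=0.2, Cloudy→Sunny=0.4)

Emission Probabilities B:

Sunny: Walk=0.6, Shop=0.3, Clean=0.1
Rainy: Walk=0.1, Shop=0.4, Clean=0.5
Cloudy: Walk=0.3, Shop=0.4, Clean=0.3

Day 1 (O₁=Walk):

State	π × b(Walk)	δ₁
Sunny	0.5 × 0.6	0.300
Rainy	0.3 × 0.1	0.030
Cloudy	0.2 × 0.3	0.060

Day 2 (O₂=Shop):

For the Sunny state:

From Sunny: δ₁(Sunny) × a(Sunny→Sunny) × b(Sunny, Shop) = 0.300 × 0.6 × 0.3 = 0.0540
From Rainy: δ₁(Rainy) × a(Rainy→Sunny) × b(Sunny, Shop) = 0.030 × 0.2 × 0.3 = 0.0018
From Cloudy: δ₁(Cloudy) × a(Cloudy→Sunny) × b(Sunny, Shop) = 0.060 × 0.4 × 0.3 = 0.0072
δ₂(Sunny) = max(0.0540, 0.0018, 0.0072) = 0.0540 (best predecessor: Sunny)

Repeat this for every state at every time step, recording the best predecessor of each. Then backtrack from the most probable final state to recover the best path. Carrying the recursion through all three days gives δ₂ = (Sunny 0.054, Rainy 0.036, Cloudy 0.012) and δ₃ = (Sunny 0.00324, Rainy 0.009, Cloudy 0.00324). Backtracking from Rainy on day 3 yields the most likely sequence Sunny → Rainy → Rainy, with probability 0.009.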
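Here is a complete Viterbi sketch in NumPy for the same three days, assuming the parameters from 7.1 and 7.4. `delta[t, j]` holds δ, and `psi[t, j]` stores the best predecessor used for backtracking. The function name and its return values are illustrative.

```python
import numpy as np

STATES = ["sunny", "rainy", "cloudy"]

# Parameters from 7.1 / 7.4; observations ordered [walk, shop, clean]
pi = np.array([0.5, 0.3, 0.2])
A = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.4, 0.3, 0.3]])
B = np.array([[0.6, 0.3, 0.1],
              [0.1, 0.4, 0.5],
              [0.3, 0.4, 0.3]])

def viterbi(obs, pi, A, B):
    """Decoding problem: most likely hidden-state path and its probability."""
    n_states, T = len(pi), len(obs)
    delta = np.zeros((T, n_states))            # delta[t, j] = δ_t(j)
    psi = np.zeros((T, n_states), dtype=int)   # psi[t, j] = best predecessor of j
    delta[0] = pi * B[:, obs[0]]               # δ_1(j) = π_j b_j(o_1)
    for t in range(1, T):
        # cand[i, j] = δ_{t-1}(i) * a_ij for every predecessor i and target j
        cand = delta[t - 1][:, None] * A
        psi[t] = cand.argmax(axis=0)
        delta[t] = cand.max(axis=0) * B[:, obs[t]]
    # Backtrack from the most probable final state
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    path.reverse()
    return path, float(delta[-1].max()), delta

path, best_p, delta = viterbi([0, 1, 2], pi, A, B)   # walk, shop, clean
print([STATES[s] for s in path], round(best_p, 6))
print(np.round(delta, 5))
```

The second row of `delta` reproduces δ₂(Sunny) = 0.054 from the hand calculation. The returned path is Sunny → Rainy → Rainy.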

7.5 Applications of HMMs

%%{init: {'theme': 'base', 'themeVariables': {
  'primaryColor': '#282828',
  'primaryTextColor': '#ebdbb2',
  'primaryBorderColor': '#d79921',
  'lineColor': '#b8bb26'
}}}%%
mindmap
  root((HMM - Applications))
    Speech Recognition
      Hidden State = Phoneme
      Observation = Audio Signal
      Apple Siri / Google Voice
    NLP
      POS Tagging
      Named Entity Recognition
      Machine Translation
    Bioinformatics
      Gene Finding
      Protein Structure
      DNA Sequence Analysis
    Finance
      Market Regime Detection
      Stock Price Modeling
      Risk Analysis
    Computer Vision
      Gesture Recognition
      Activity Detection
      Video Segmentation

7.6 Python Code: HMM with hmmlearn

"""
hmm_weather.py
Example: building and using a Hidden Markov Model for the weather example
using the hmmlearn library
"""

import numpy as np
from hmmlearn import hmm
import warnings
warnings.filterwarnings('ignore')  # silence hmmlearn convergence warnings for a cleaner demo

# =============================================
# Part 1: Define the HMM parameters
# =============================================

# States: sunny=0, rainy=1, cloudy=2
# Activities: walk=0, shop=1, clean=2

# Categorical HMM with 3 hidden states (CategoricalHMM exists in recent hmmlearn releases)
model = hmm.CategoricalHMM(n_components=3, random_state=42)

# Initial state distribution π
model.startprob_ = np.array([0.5, 0.3, 0.2])  # [sunny, rainy, cloudy]

# Transition matrix A
model.transmat_ = np.array([
    [0.6, 0.3, 0.1],  # from sunny → sunny, rainy, cloudy
    [0.2, 0.5, 0.3],  # from rainy → sunny, rainy, cloudy
    [0.4, 0.3, 0.3],  # from cloudy → sunny, rainy, cloudy
])

# Emission matrix B
model.emissionprob_ = np.array([
    [0.6, 0.3, 0.1],  # sunny: walk, shop, clean
    [0.1, 0.4, 0.5],  # rainy: walk, shop, clean
    [0.3, 0.4, 0.3],  # cloudy: walk, shop, clean
])

# =============================================
# Part 2: Evaluate an observation sequence (Forward Algorithm)
# =============================================

# Observed activities: walk, shop, clean
observations = np.array([[0, 1, 2]]).T  # hmmlearn expects shape (n_samples, 1)

# score() returns the log-likelihood log P(observations | model)
log_prob = model.score(observations)
print(f"Log-Likelihood: {log_prob:.4f}")
print(f"Probability: {np.exp(log_prob):.6f}")

# =============================================
# Part 3: Viterbi decoding
# =============================================

# Most likely hidden state sequence and its log probability
log_prob_viterbi, hidden_states = model.decode(
    observations, algorithm='viterbi'
)

state_names = ['☀️ sunny', '🌧️ rainy', '☁️ cloudy']
obs_names = ['🚶 walk', '🛍️ shop', '🧹 clean']

print("\n" + "=" * 50)
print("Viterbi Decoding")
print("=" * 50)
print(f"{'Day':<6} {'Observed activity':<20} {'Inferred weather'}")
print("-" * 50)
for day, (obs, state) in enumerate(zip(observations.flatten(), hidden_states)):
    print(f"Day {day+1:<2} {obs_names[obs]:<22} {state_names[state]}")
print(f"Path probability: {np.exp(log_prob_viterbi):.6f}")

# =============================================
# Part 4: Generate synthetic data
# =============================================

print("\n" + "=" * 50)
print("Synthetic data for 10 days")
print("=" * 50)

# Sample an observation sequence together with its hidden states
generated_obs, generated_states = model.sample(10)

for day, (obs, state) in enumerate(zip(generated_obs, generated_states)):
    print(f"Day {day+1:<2}: {state_names[state]:<15} → {obs_names[obs[0]]}")

# =============================================
# Part 5: Learn parameters from data (Baum-Welch)
# =============================================

print("\n" + "=" * 50)
print("Training an HMM with the Baum-Welch Algorithm")
print("=" * 50)

# Generate a large training sequence from the known model
train_data, _ = model.sample(1000)

# Fit a fresh model with EM (Baum-Welch)
new_model = hmm.CategoricalHMM(
    n_components=3,
    n_iter=100,        # maximum number of EM iterations
    random_state=42
)
new_model.fit(train_data)

# Learned states may come out in a different order than the original
# (label switching), and EM only reaches a local optimum.
print(f"Training Log-Likelihood: {new_model.score(train_data):.4f}")
print("\nLearned Transition Matrix:")
print(np.round(new_model.transmat_, 3))
print("\nLearned Emission Matrix:")
print(np.round(new_model.emissionprob_, 3))

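8. Model Comparison

Property	Naive Bayes	Bayesian Network	Markov Model	HMM
Suitable data	i.i.d. classification	Data with complex dependencies	Sequences (states observed)	Sequences (states hidden)
Assumption	Features conditionally independent given the class	Dependencies encoded by the DAG (often causal)	Markov Property	Markov Property + emissions depend only on the current state
Complexity	O(nc), very low	Exact inference up to O(2ⁿ) in the worst case, depending on structure	O(s²) per step	O(s²·T)
Interpretability	Easy	Very good (causal)	Moderate	Hard
Applications	Spam, Sentiment	Medical, Risk	PageRank, predictive typing	Speech, NLP
Strengths	Fast, simple, works with little data	Flexible, Causal Reasoning	Easy to understand	Captures temporal patterns
Weaknesses	Independence Assumption	Structure is hard to design	States must be observable	Training is complex

(n = number of features or variables, c = number of classes, s = number of states, T = sequence length)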

9. Tools and Libraries

%%{init: {'theme': 'base', 'themeVariables': {
  'primaryColor': '#1d2021',
  'primaryTextColor': '#ebdbb2',
  'primaryBorderColor': '#504945',
  'lineColor': '#d79921'
}}}%%
graph LR
    subgraph CORE["🐍 Python Core"]
        NP["numpy - numerical computing"]
        SP["scipy - advanced statistics"]
        SK["scikit-learn - Naive Bayes, ML"]
    end

    subgraph BN["🕸️ Bayesian Networks"]
        PG["pgmpy - Probabilistic Graphical Models"]
        PM["pomegranate - Bayesian Networks, HMM"]
        PY["PyMC - Bayesian Inference"]
    end

    subgraph HMM_T["🔄 HMM Tools"]
        HL["hmmlearn - Gaussian & Categorical HMM"]
        POM2["pomegranate - Advanced HMM"]
    end

    subgraph VIZ["📊 Visualization"]
        MPL["matplotlib - Plotting"]
        PLT["networkx - Graph Visualization"]
        SB["seaborn - Statistical Plots"]
    end

    CORE --> BN
    CORE --> HMM_T
    CORE --> VIZ

    style NP fill:#458588,color:#ebdbb2
    style SP fill:#458588,color:#ebdbb2
    style SK fill:#458588,color:#ebdbb2
    style PG fill:#689d6a,color:#282828
    style PM fill:#689d6a,color:#282828
    style PY fill:#689d6a,color:#282828
    style HL fill:#d79921,color:#282828
    style POM2 fill:#d79921,color:#282828
    style MPL fill:#b16286,color:#ebdbb2
    style PLT fill:#b16286,color:#ebdbb2
    style SB fill:#b16286,color:#ebdbb2

Installation commands:

# Install the libraries used for probabilistic reasoning
pip install pgmpy pomegranate hmmlearn pymc networkx
pip install scikit-learn numpy scipy matplotlib seaborn

10. Summary

Probabilistic reasoning is a foundation of modern AI. It lets systems handle uncertainty in a mathematically principled way.

Key takeaways:

Bayes' Theorem is at the heart of it all: it updates beliefs as new evidence arrives and guards against the Base Rate Fallacy
The Naive Bayes Classifier performs very well in practice despite its "naive" assumption, e.g., in spam filtering and sentiment analysis
Bayesian Networks represent the dependency (often causal) structure of complex systems, e.g., in medical diagnosis and risk analysis
Markov Models suit sequential, time-ordered data and underlie Google's PageRank
Hidden Markov Models are a powerful tool for sequences whose states cannot be observed directly, e.g., in speech recognition and NLP

Connections to other weeks:

Weeks 6-7 (Logic): probabilistic logic bridges logic and uncertainty
Weeks 9-10 (ML): Naive Bayes as a baseline ML classifier; Bayesian Optimization
Week 15 (NLP): HMMs for POS tagging; language models use probabilistic sampling


This document was prepared for the Artificial Intelligence course, Computer Science program | Rajamangala University of Technology Srivijaya

Last updated: February 2569 B.E. (2026)