การค้นหาแบบแข่งขันและการเล่นเกม (Adversarial Search & Game Playing)

1. บทนำสู่การค้นหาแบบแข่งขัน

1.1 ความหมายและความสำคัญ

การค้นหาแบบแข่งขัน (Adversarial Search) คือเทคนิคการค้นหาที่ใช้ในสถานการณ์ที่มีตัวแทน (Agent) หลายฝ่ายที่มีเป้าหมายขัดแย้งกัน โดยทั่วไปจะพบในเกมที่มีผู้เล่นสองฝ่ายหรือมากกว่า ซึ่งการตัดสินใจของฝ่ายหนึ่งจะส่งผลกระทบต่อผลลัพธ์ของอีกฝ่ายหนึ่ง

ลักษณะสำคัญของการค้นหาแบบแข่งขัน:

สภาพแวดล้อมแบบหลายตัวแทน (Multi-agent Environment): มีผู้เล่นตั้งแต่ 2 ฝ่ายขึ้นไป
เป้าหมายที่ขัดแย้ง (Conflicting Goals): ชัยชนะของฝ่ายหนึ่งคือความพ่ายแพ้ของอีกฝ่าย
การตัดสินใจสลับกัน (Turn-based Decision): ผู้เล่นผลัดกันเลือกการกระทำ
ความไม่แน่นอน (Uncertainty): ไม่สามารถคาดเดาการตัดสินใจของคู่แข่งได้อย่างแน่นอน

1.2 ประเภทของเกมในปัญญาประดิษฐ์

graph TB
    subgraph GAMES["🎮 ประเภทของเกม (Game Types)"]
        A["เกมทั้งหมด
All Games"]
        
        subgraph DETERMINISTIC["เกมแบบกำหนดได้
Deterministic Games"]
            B["เกมข้อมูลสมบูรณ์
Perfect Information"]
            C["เกมข้อมูลไม่สมบูรณ์
Imperfect Information"]
        end
        
        subgraph STOCHASTIC["เกมแบบสุ่ม
Stochastic Games"]
            D["เกมมีโอกาส
Games with Chance"]
        end
        
        A --> DETERMINISTIC
        A --> STOCHASTIC
        
        B --> E["♟️ หมากรุก
Chess"]
        B --> F["⚫ โกะ
Go"]
        B --> G["⭕ Tic-Tac-Toe"]
        
        C --> H["🃏 โป๊กเกอร์
Poker"]
        C --> I["🎴 บริดจ์
Bridge"]
        
        D --> J["🎲 แบ็คแกมมอน
Backgammon"]
        D --> K["🎰 เกมลูกเต๋า
Dice Games"]
    end
    
    style A fill:#458588,stroke:#282828,color:#ebdbb2
    style B fill:#98971a,stroke:#282828,color:#282828
    style C fill:#d79921,stroke:#282828,color:#282828
    style D fill:#cc241d,stroke:#282828,color:#ebdbb2
    style E fill:#83a598,stroke:#282828,color:#282828
    style F fill:#83a598,stroke:#282828,color:#282828
    style G fill:#83a598,stroke:#282828,color:#282828
    style H fill:#fabd2f,stroke:#282828,color:#282828
    style I fill:#fabd2f,stroke:#282828,color:#282828
    style J fill:#fb4934,stroke:#282828,color:#282828
    style K fill:#fb4934,stroke:#282828,color:#282828

1.3 ประวัติศาสตร์และพัฒนาการ

graph TB
    subgraph ERA1["🕰️ ยุคบุกเบิก (1950-1970)"]
        A1["1950: Claude Shannon
เสนอแนวคิดหมากรุกคอมพิวเตอร์"]
        A2["1952: Arthur Samuel
โปรแกรมหมากฮอส"]
        A3["1958: Alpha-Beta
ถูกค้นพบ"]
    end
    
    subgraph ERA2["🖥️ ยุคพัฒนา (1970-1990)"]
        B1["1979: Backgammon
BKG 9.8 ชนะแชมป์โลก"]
        B2["1989: Deep Thought
เป็น Grandmaster"]
    end
    
    subgraph ERA3["🏆 ยุคแห่งชัยชนะ (1990-2010)"]
        C1["1997: Deep Blue
ชนะ Kasparov"]
        C2["2007: Checkers
ถูก Solved"]
    end
    
    subgraph ERA4["🤖 ยุค AI สมัยใหม่ (2010-ปัจจุบัน)"]
        D1["2016: AlphaGo
ชนะ Lee Sedol"]
        D2["2017: AlphaZero
เรียนรู้ด้วยตนเอง"]
        D3["2019: Pluribus
ชนะโป๊กเกอร์"]
    end
    
    A1 --> A2 --> A3 --> B1 --> B2 --> C1 --> C2 --> D1 --> D2 --> D3
    
    style A1 fill:#b16286,stroke:#282828,color:#ebdbb2
    style A2 fill:#b16286,stroke:#282828,color:#ebdbb2
    style A3 fill:#b16286,stroke:#282828,color:#ebdbb2
    style B1 fill:#689d6a,stroke:#282828,color:#282828
    style B2 fill:#689d6a,stroke:#282828,color:#282828
    style C1 fill:#d79921,stroke:#282828,color:#282828
    style C2 fill:#d79921,stroke:#282828,color:#282828
    style D1 fill:#458588,stroke:#282828,color:#ebdbb2
    style D2 fill:#458588,stroke:#282828,color:#ebdbb2
    style D3 fill:#458588,stroke:#282828,color:#ebdbb2

2. พื้นฐานทฤษฎีเกม (Game Theory Basics)

2.1 องค์ประกอบของเกม

ทฤษฎีเกม (Game Theory) เป็นสาขาของคณิตศาสตร์ที่ศึกษาการตัดสินใจเชิงกลยุทธ์ระหว่างผู้เล่นหลายฝ่าย องค์ประกอบหลักของเกมประกอบด้วย:

องค์ประกอบ	คำอธิบาย	ตัวอย่าง (หมากรุก)
ผู้เล่น (Players)	ตัวแทนที่ตัดสินใจในเกม	ขาว และ ดำ
สถานะ (States)	การจัดเรียงของเกมในขณะนั้น	ตำแหน่งตัวหมากบนกระดาน
การกระทำ (Actions)	ตัวเลือกที่ผู้เล่นสามารถทำได้	การเคลื่อนที่ของตัวหมาก
สถานะปลายทาง (Terminal States)	สถานะที่เกมจบลง	รุกจน, เสมอ
ฟังก์ชันผลตอบแทน (Utility Function)	ค่าที่บอกผลลัพธ์ของเกม	ชนะ=+1, แพ้=-1, เสมอ=0

2.2 เกมผลรวมเป็นศูนย์ (Zero-Sum Games)

เกมผลรวมเป็นศูนย์ คือเกมที่ผลรวมของผลตอบแทนของผู้เล่นทุกคนเท่ากับศูนย์เสมอ กล่าวคือ สิ่งที่ผู้เล่นคนหนึ่งได้ ผู้เล่นอีกคนจะเสียในปริมาณเท่ากัน

สมการผลรวมเป็นศูนย์:

\sum_{i = 1}^{n} U_{i} (s) = 0

โดยที่:

$U_{i} (s)$ = ผลตอบแทนของผู้เล่น i ในสถานะ s
n = จำนวนผู้เล่นทั้งหมด

สำหรับเกมสองผู้เล่น:

U_{1} (s) + U_{2} (s) = 0

หรือ

U_{1} (s) = - U_{2} (s)

2.3 ต้นไม้เกม (Game Tree)

ต้นไม้เกม คือโครงสร้างข้อมูลที่แสดงสถานะและการเคลื่อนไหวที่เป็นไปได้ทั้งหมดในเกม

graph TB
    subgraph GAMETREE["🌳 ต้นไม้เกม Tic-Tac-Toe (Game Tree)"]
        ROOT["สถานะเริ่มต้น
Initial State
กระดานว่าง"]
        
        subgraph LEVEL1["ระดับ 1: ตาของ X"]
            L1A["X เล่นกลาง"]
            L1B["X เล่นมุม"]
            L1C["X เล่นขอบ"]
        end
        
        subgraph LEVEL2["ระดับ 2: ตาของ O"]
            L2A["O ตอบโต้"]
            L2B["O ตอบโต้"]
        end
        
        subgraph TERMINAL["สถานะปลายทาง"]
            T1["X ชนะ
Utility: +1"]
            T2["O ชนะ
Utility: -1"]
            T3["เสมอ
Utility: 0"]
        end
        
        ROOT --> L1A
        ROOT --> L1B
        ROOT --> L1C
        L1A --> L2A
        L1B --> L2B
        L2A --> T1
        L2A --> T3
        L2B --> T2
    end
    
    style ROOT fill:#458588,stroke:#282828,color:#ebdbb2
    style L1A fill:#98971a,stroke:#282828,color:#282828
    style L1B fill:#98971a,stroke:#282828,color:#282828
    style L1C fill:#98971a,stroke:#282828,color:#282828
    style L2A fill:#d79921,stroke:#282828,color:#282828
    style L2B fill:#d79921,stroke:#282828,color:#282828
    style T1 fill:#b8bb26,stroke:#282828,color:#282828
    style T2 fill:#fb4934,stroke:#282828,color:#282828
    style T3 fill:#83a598,stroke:#282828,color:#282828

2.4 กลยุทธ์การเล่น

กลยุทธ์ (Strategy) คือแผนการที่ระบุว่าผู้เล่นจะเลือกการกระทำใดในแต่ละสถานะที่เป็นไปได้

ประเภทกลยุทธ์	คำอธิบาย
กลยุทธ์บริสุทธิ์ (Pure Strategy)	เลือกการกระทำเดียวแน่นอนในแต่ละสถานะ
กลยุทธ์ผสม (Mixed Strategy)	กำหนดความน่าจะเป็นให้กับการกระทำแต่ละอย่าง
กลยุทธ์เหนือกว่า (Dominant Strategy)	กลยุทธ์ที่ดีที่สุดไม่ว่าคู่แข่งจะเล่นอย่างไร

Nash Equilibrium:

จุดสมดุลแนช (Nash Equilibrium) คือสถานการณ์ที่ผู้เล่นทุกคนเลือกกลยุทธ์ที่ดีที่สุดสำหรับตนเอง โดยที่ไม่มีผู้เล่นคนใดต้องการเปลี่ยนกลยุทธ์หากผู้เล่นอื่นยังคงใช้กลยุทธ์เดิม

3. อัลกอริทึม Minimax (Minimax Algorithm)

3.1 หลักการทำงาน

อัลกอริทึม Minimax เป็นอัลกอริทึมแบบ recursive ที่ใช้ตัดสินใจในเกมผลรวมเป็นศูนย์สองผู้เล่น โดยสมมติว่าคู่แข่งจะเล่นอย่างดีที่สุดเสมอ

หลักการ:

ผู้เล่น MAX: พยายามเพิ่มค่า utility ให้มากที่สุด (ผู้เล่นฝ่ายเรา)
ผู้เล่น MIN: พยายามลดค่า utility ให้น้อยที่สุด (คู่แข่ง)

3.2 สมการ Minimax

ค่า Minimax ของโหนด:

minimax (s) = \{\begin{cases} utility (s) & ถ้า s เป็นสถานะปลายทาง \\ max_{a \in Actions (s)} minimax (Result (s, a)) & ถ้า s เป็นตาของ MAX \\ min_{a \in Actions (s)} minimax (Result (s, a)) & ถ้า s เป็นตาของ MIN \end{cases}

โดยที่:

s = สถานะปัจจุบัน
Actions(s) = เซตของการกระทำที่เป็นไปได้ในสถานะ s
Result(s, a) = สถานะใหม่หลังจากทำการกระทำ a ในสถานะ s
utility(s) = ค่าผลตอบแทนของสถานะปลายทาง s

3.3 ไดอะแกรมการทำงาน

graph TB
    subgraph MINIMAX["🎯 การทำงานของ Minimax"]
        MAX1["MAX
เลือก: max(3, 2) = 3"]
        
        MIN1["MIN
เลือก: min(3, 12, 8) = 3"]
        MIN2["MIN
เลือก: min(2, 4, 6) = 2"]
        
        T1["3"]
        T2["12"]
        T3["8"]
        T4["2"]
        T5["4"]
        T6["6"]
        
        MAX1 --> MIN1
        MAX1 --> MIN2
        
        MIN1 --> T1
        MIN1 --> T2
        MIN1 --> T3
        
        MIN2 --> T4
        MIN2 --> T5
        MIN2 --> T6
    end
    
    style MAX1 fill:#b8bb26,stroke:#282828,color:#282828
    style MIN1 fill:#fb4934,stroke:#282828,color:#282828
    style MIN2 fill:#fb4934,stroke:#282828,color:#282828
    style T1 fill:#83a598,stroke:#282828,color:#282828
    style T2 fill:#83a598,stroke:#282828,color:#282828
    style T3 fill:#83a598,stroke:#282828,color:#282828
    style T4 fill:#83a598,stroke:#282828,color:#282828
    style T5 fill:#83a598,stroke:#282828,color:#282828
    style T6 fill:#83a598,stroke:#282828,color:#282828

3.4 โค้ด Python: Minimax Algorithm

"""
อัลกอริทึม Minimax สำหรับเกม Tic-Tac-Toe
Minimax Algorithm for Tic-Tac-Toe Game

โมดูลนี้แสดงการทำงานของอัลกอริทึม Minimax
พร้อมตัวอย่างการใช้งานในเกม Tic-Tac-Toe
"""

from typing import List, Tuple, Optional
import math


class TicTacToe:
    """
    คลาสสำหรับเกม Tic-Tac-Toe พร้อม AI ที่ใช้ Minimax
    
    Attributes:
        board: กระดานเกมขนาด 3x3
        current_player: ผู้เล่นปัจจุบัน ('X' หรือ 'O')
    """
    
    def __init__(self):
        """เริ่มต้นเกมใหม่ด้วยกระดานว่าง"""
        # กระดาน 3x3 เริ่มต้นว่างเปล่า
        self.board: List[List[str]] = [[' ' for _ in range(3)] for _ in range(3)]
        # X เริ่มก่อนเสมอ
        self.current_player: str = 'X'
    
    def get_available_moves(self) -> List[Tuple[int, int]]:
        """
        หาตำแหน่งที่ยังว่างอยู่บนกระดาน
        
        Returns:
            รายการของ tuple (row, col) ที่สามารถเล่นได้
        """
        moves = []
        for i in range(3):
            for j in range(3):
                if self.board[i][j] == ' ':
                    moves.append((i, j))
        return moves
    
    def make_move(self, row: int, col: int) -> bool:
        """
        ทำการเล่นที่ตำแหน่งที่กำหนด
        
        Args:
            row: แถวที่ต้องการเล่น (0-2)
            col: คอลัมน์ที่ต้องการเล่น (0-2)
            
        Returns:
            True ถ้าเล่นสำเร็จ, False ถ้าตำแหน่งไม่ว่าง
        """
        if self.board[row][col] == ' ':
            self.board[row][col] = self.current_player
            # สลับผู้เล่น
            self.current_player = 'O' if self.current_player == 'X' else 'X'
            return True
        return False
    
    def undo_move(self, row: int, col: int):
        """
        ยกเลิกการเล่นที่ตำแหน่งที่กำหนด (สำหรับ backtracking)
        
        Args:
            row: แถวที่ต้องการยกเลิก
            col: คอลัมน์ที่ต้องการยกเลิก
        """
        self.board[row][col] = ' '
        # สลับผู้เล่นกลับ
        self.current_player = 'O' if self.current_player == 'X' else 'X'
    
    def check_winner(self) -> Optional[str]:
        """
        ตรวจสอบว่ามีผู้ชนะหรือยัง
        
        Returns:
            'X' หรือ 'O' ถ้ามีผู้ชนะ, None ถ้ายังไม่มี
        """
        # ตรวจสอบแถว
        for row in self.board:
            if row[0] == row[1] == row[2] != ' ':
                return row[0]
        
        # ตรวจสอบคอลัมน์
        for col in range(3):
            if self.board[0][col] == self.board[1][col] == self.board[2][col] != ' ':
                return self.board[0][col]
        
        # ตรวจสอบแนวทแยง
        if self.board[0][0] == self.board[1][1] == self.board[2][2] != ' ':
            return self.board[0][0]
        if self.board[0][2] == self.board[1][1] == self.board[2][0] != ' ':
            return self.board[0][2]
        
        return None
    
    def is_terminal(self) -> bool:
        """
        ตรวจสอบว่าเกมจบหรือยัง
        
        Returns:
            True ถ้าเกมจบ (มีผู้ชนะหรือเสมอ)
        """
        return self.check_winner() is not None or len(self.get_available_moves()) == 0
    
    def get_utility(self) -> int:
        """
        คำนวณค่า utility ของสถานะปัจจุบัน
        
        Returns:
            +1 ถ้า X ชนะ, -1 ถ้า O ชนะ, 0 ถ้าเสมอ
        """
        winner = self.check_winner()
        if winner == 'X':
            return 1
        elif winner == 'O':
            return -1
        return 0
    
    def minimax(self, is_maximizing: bool) -> int:
        """
        อัลกอริทึม Minimax แบบ recursive
        
        Args:
            is_maximizing: True ถ้าเป็นตาของ MAX (X), False ถ้าเป็นตาของ MIN (O)
            
        Returns:
            ค่า minimax ของสถานะปัจจุบัน
        """
        # Base case: ถ้าเกมจบแล้ว คืนค่า utility
        if self.is_terminal():
            return self.get_utility()
        
        if is_maximizing:
            # MAX พยายามหาค่าสูงสุด
            max_eval = -math.inf
            for row, col in self.get_available_moves():
                # ลองเล่น
                self.make_move(row, col)
                # คำนวณค่าของการเล่นนี้
                eval_score = self.minimax(False)
                # ยกเลิกการเล่น
                self.undo_move(row, col)
                # อัพเดทค่าสูงสุด
                max_eval = max(max_eval, eval_score)
            return max_eval
        else:
            # MIN พยายามหาค่าต่ำสุด
            min_eval = math.inf
            for row, col in self.get_available_moves():
                self.make_move(row, col)
                eval_score = self.minimax(True)
                self.undo_move(row, col)
                min_eval = min(min_eval, eval_score)
            return min_eval
    
    def get_best_move(self) -> Tuple[int, int]:
        """
        หาการเล่นที่ดีที่สุดสำหรับผู้เล่นปัจจุบัน
        
        Returns:
            tuple (row, col) ของตำแหน่งที่ดีที่สุด
        """
        best_move = None
        is_maximizing = (self.current_player == 'X')
        
        if is_maximizing:
            best_score = -math.inf
            for row, col in self.get_available_moves():
                self.make_move(row, col)
                score = self.minimax(False)
                self.undo_move(row, col)
                if score > best_score:
                    best_score = score
                    best_move = (row, col)
        else:
            best_score = math.inf
            for row, col in self.get_available_moves():
                self.make_move(row, col)
                score = self.minimax(True)
                self.undo_move(row, col)
                if score < best_score:
                    best_score = score
                    best_move = (row, col)
        
        return best_move
    
    def display(self):
        """แสดงกระดานเกมปัจจุบัน"""
        print("\n")
        for i, row in enumerate(self.board):
            print(f" {row[0]} | {row[1]} | {row[2]} ")
            if i < 2:
                print("-----------")
        print("\n")


# ตัวอย่างการใช้งาน
if __name__ == "__main__":
    print("=" * 50)
    print("ตัวอย่างการใช้งาน Minimax ใน Tic-Tac-Toe")
    print("=" * 50)
    
    # สร้างเกมใหม่
    game = TicTacToe()
    
    # จำลองการเล่น
    game.make_move(0, 0)  # X เล่นมุมบนซ้าย
    game.make_move(1, 1)  # O เล่นกลาง
    
    print("สถานะหลัง X เล่นมุมบนซ้าย และ O เล่นกลาง:")
    game.display()
    
    # หาการเล่นที่ดีที่สุดสำหรับ X
    best = game.get_best_move()
    print(f"การเล่นที่ดีที่สุดสำหรับ {game.current_player}: ตำแหน่ง {best}")
    
    # ทำการเล่น
    game.make_move(best[0], best[1])
    print(f"หลังจาก {game.current_player} เล่นตำแหน่งที่แนะนำ:")
    game.display()

ผลลัพธ์ตัวอย่าง:

==================================================
ตัวอย่างการใช้งาน Minimax ใน Tic-Tac-Toe
==================================================
สถานะหลัง X เล่นมุมบนซ้าย และ O เล่นกลาง:

 X |   |   
-----------
   | O |   
-----------
   |   |   

การเล่นที่ดีที่สุดสำหรับ X: ตำแหน่ง (0, 2)
หลังจาก X เล่นตำแหน่งที่แนะนำ:

 X |   | X 
-----------
   | O |   
-----------
   |   |

3.5 การวิเคราะห์ความซับซ้อน

ด้าน	ความซับซ้อน	คำอธิบาย
เวลา (Time)	O(b^m)	b = branching factor, m = ความลึกสูงสุด
พื้นที่ (Space)	O(bm)	เก็บ path ปัจจุบันใน stack

ตัวอย่างในหมากรุก:

Branching factor เฉลี่ย ≈ 35
ความลึกเฉลี่ย ≈ 80 ครึ่งตา
จำนวนโหนด ≈ 35⁸⁰ ≈ 10¹²³ (มากกว่าจำนวนอะตอมในจักรวาล!)

4. การตัดแต่งกิ่ง Alpha-Beta (Alpha-Beta Pruning)

4.1 แนวคิดหลัก

Alpha-Beta Pruning เป็นเทคนิคการปรับปรุง Minimax โดยการตัดกิ่งที่ไม่จำเป็นต้องค้นหาออก ทำให้ลดจำนวนโหนดที่ต้องสำรวจลงอย่างมาก โดยไม่กระทบต่อผลลัพธ์สุดท้าย

ตัวแปรสำคัญ:

Alpha (α): ค่าดีที่สุดที่ MAX รับประกันได้ (lower bound)
Beta (β): ค่าดีที่สุดที่ MIN รับประกันได้ (upper bound)

กฎการตัด:

Alpha cut-off: ที่โหนด MIN ถ้าค่า ≤ α ให้หยุดค้นหา
Beta cut-off: ที่โหนด MAX ถ้าค่า ≥ β ให้หยุดค้นหา

4.2 สมการ Alpha-Beta

alphabeta (s, α, β) = \{\begin{cases} utility (s) & ถ้า s เป็นสถานะปลายทาง \\ MAX พร้อม α-update และ β-cutoff & ถ้า s เป็นตาของ MAX \\ MIN พร้อม β-update และ α-cutoff & ถ้า s เป็นตาของ MIN \end{cases}

กฎการอัพเดท:

ที่โหนด MAX: $α = max (α, value)$

ที่โหนด MIN: $β = min (β, value)$

4.3 ไดอะแกรมการทำงาน

graph TB
    subgraph ALPHABETA["✂️ การตัดแต่งกิ่ง Alpha-Beta"]
        MAX1["MAX
α=-∞, β=+∞
ค่า: 3"]
        
        MIN1["MIN
α=-∞, β=+∞
ค่า: 3"]
        MIN2["MIN
α=3, β=+∞
❌ ถูกตัด!"]
        
        T1["3
α=3"]
        T2["12
ไม่ปรับปรุง"]
        T3["8
ไม่ปรับปรุง"]
        T4["2
β=2 ≤ α=3
⚡ Beta Cutoff!"]
        T5["❌ ไม่ถูกสำรวจ"]
        T6["❌ ไม่ถูกสำรวจ"]
        
        MAX1 --> MIN1
        MAX1 --> MIN2
        
        MIN1 --> T1
        MIN1 --> T2
        MIN1 --> T3
        
        MIN2 --> T4
        MIN2 -.-> T5
        MIN2 -.-> T6
    end
    
    style MAX1 fill:#b8bb26,stroke:#282828,color:#282828
    style MIN1 fill:#fb4934,stroke:#282828,color:#282828
    style MIN2 fill:#928374,stroke:#282828,color:#ebdbb2
    style T1 fill:#83a598,stroke:#282828,color:#282828
    style T2 fill:#83a598,stroke:#282828,color:#282828
    style T3 fill:#83a598,stroke:#282828,color:#282828
    style T4 fill:#fabd2f,stroke:#282828,color:#282828
    style T5 fill:#665c54,stroke:#282828,color:#ebdbb2
    style T6 fill:#665c54,stroke:#282828,color:#ebdbb2

4.4 โค้ด Python: Alpha-Beta Pruning

"""
อัลกอริทึม Alpha-Beta Pruning
Alpha-Beta Pruning Algorithm

การปรับปรุง Minimax ด้วยการตัดกิ่งที่ไม่จำเป็น
เพื่อเพิ่มประสิทธิภาพในการค้นหา
"""

from typing import Tuple, Optional
import math


class AlphaBetaGame:
    """
    คลาสแสดงการทำงานของ Alpha-Beta Pruning
    พร้อมตัวนับจำนวนโหนดที่สำรวจ
    """
    
    def __init__(self):
        """เริ่มต้นเกม Tic-Tac-Toe"""
        self.board = [[' ' for _ in range(3)] for _ in range(3)]
        self.current_player = 'X'
        # ตัวนับจำนวนโหนดที่สำรวจ
        self.nodes_explored = 0
    
    def get_available_moves(self):
        """หาตำแหน่งที่ว่าง"""
        return [(i, j) for i in range(3) for j in range(3) 
                if self.board[i][j] == ' ']
    
    def make_move(self, row: int, col: int):
        """ทำการเล่น"""
        if self.board[row][col] == ' ':
            self.board[row][col] = self.current_player
            self.current_player = 'O' if self.current_player == 'X' else 'X'
            return True
        return False
    
    def undo_move(self, row: int, col: int):
        """ยกเลิกการเล่น"""
        self.board[row][col] = ' '
        self.current_player = 'O' if self.current_player == 'X' else 'X'
    
    def check_winner(self) -> Optional[str]:
        """ตรวจสอบผู้ชนะ"""
        # ตรวจสอบแถว คอลัมน์ และแนวทแยง
        for i in range(3):
            if self.board[i][0] == self.board[i][1] == self.board[i][2] != ' ':
                return self.board[i][0]
            if self.board[0][i] == self.board[1][i] == self.board[2][i] != ' ':
                return self.board[0][i]
        if self.board[0][0] == self.board[1][1] == self.board[2][2] != ' ':
            return self.board[0][0]
        if self.board[0][2] == self.board[1][1] == self.board[2][0] != ' ':
            return self.board[0][2]
        return None
    
    def is_terminal(self) -> bool:
        """ตรวจสอบว่าเกมจบหรือยัง"""
        return self.check_winner() is not None or len(self.get_available_moves()) == 0
    
    def get_utility(self) -> int:
        """คำนวณค่า utility"""
        winner = self.check_winner()
        if winner == 'X':
            return 10
        elif winner == 'O':
            return -10
        return 0
    
    def minimax_no_pruning(self, is_maximizing: bool) -> int:
        """
        Minimax แบบไม่มี pruning (สำหรับเปรียบเทียบ)
        """
        self.nodes_explored += 1
        
        if self.is_terminal():
            return self.get_utility()
        
        if is_maximizing:
            max_eval = -math.inf
            for row, col in self.get_available_moves():
                self.make_move(row, col)
                eval_score = self.minimax_no_pruning(False)
                self.undo_move(row, col)
                max_eval = max(max_eval, eval_score)
            return max_eval
        else:
            min_eval = math.inf
            for row, col in self.get_available_moves():
                self.make_move(row, col)
                eval_score = self.minimax_no_pruning(True)
                self.undo_move(row, col)
                min_eval = min(min_eval, eval_score)
            return min_eval
    
    def alpha_beta(self, alpha: float, beta: float, is_maximizing: bool) -> int:
        """
        อัลกอริทึม Alpha-Beta Pruning
        
        Args:
            alpha: ค่าดีที่สุดที่ MAX รับประกันได้ (lower bound)
            beta: ค่าดีที่สุดที่ MIN รับประกันได้ (upper bound)
            is_maximizing: True ถ้าเป็นตาของ MAX
            
        Returns:
            ค่า minimax ของสถานะปัจจุบัน
        """
        # นับจำนวนโหนดที่สำรวจ
        self.nodes_explored += 1
        
        # Base case: ถ้าเกมจบแล้ว
        if self.is_terminal():
            return self.get_utility()
        
        if is_maximizing:
            # MAX node: พยายามเพิ่มค่าให้มากที่สุด
            max_eval = -math.inf
            for row, col in self.get_available_moves():
                # ลองเล่น
                self.make_move(row, col)
                # เรียก recursive
                eval_score = self.alpha_beta(alpha, beta, False)
                # ยกเลิกการเล่น
                self.undo_move(row, col)
                
                # อัพเดทค่าสูงสุด
                max_eval = max(max_eval, eval_score)
                # อัพเดท alpha
                alpha = max(alpha, eval_score)
                
                # Beta cutoff: ถ้าค่า >= beta ไม่ต้องดูต่อ
                # เพราะ MIN ที่ระดับบนจะไม่เลือก path นี้แน่นอน
                if beta <= alpha:
                    break  # ตัดกิ่ง!
            
            return max_eval
        else:
            # MIN node: พยายามลดค่าให้น้อยที่สุด
            min_eval = math.inf
            for row, col in self.get_available_moves():
                self.make_move(row, col)
                eval_score = self.alpha_beta(alpha, beta, True)
                self.undo_move(row, col)
                
                # อัพเดทค่าต่ำสุด
                min_eval = min(min_eval, eval_score)
                # อัพเดท beta
                beta = min(beta, eval_score)
                
                # Alpha cutoff: ถ้าค่า <= alpha ไม่ต้องดูต่อ
                # เพราะ MAX ที่ระดับบนจะไม่เลือก path นี้แน่นอน
                if beta <= alpha:
                    break  # ตัดกิ่ง!
            
            return min_eval
    
    def get_best_move_alphabeta(self) -> Tuple[int, int]:
        """
        หาการเล่นที่ดีที่สุดโดยใช้ Alpha-Beta Pruning
        
        Returns:
            tuple (row, col) ของตำแหน่งที่ดีที่สุด
        """
        best_move = None
        is_maximizing = (self.current_player == 'X')
        alpha = -math.inf
        beta = math.inf
        
        if is_maximizing:
            best_score = -math.inf
            for row, col in self.get_available_moves():
                self.make_move(row, col)
                score = self.alpha_beta(alpha, beta, False)
                self.undo_move(row, col)
                if score > best_score:
                    best_score = score
                    best_move = (row, col)
                alpha = max(alpha, score)
        else:
            best_score = math.inf
            for row, col in self.get_available_moves():
                self.make_move(row, col)
                score = self.alpha_beta(alpha, beta, True)
                self.undo_move(row, col)
                if score < best_score:
                    best_score = score
                    best_move = (row, col)
                beta = min(beta, score)
        
        return best_move
    
    def display(self):
        """แสดงกระดาน"""
        print("\n")
        for i, row in enumerate(self.board):
            print(f" {row[0]} | {row[1]} | {row[2]} ")
            if i < 2:
                print("-----------")
        print()


# ตัวอย่างการใช้งานและเปรียบเทียบประสิทธิภาพ
if __name__ == "__main__":
    print("=" * 60)
    print("เปรียบเทียบ Minimax vs Alpha-Beta Pruning")
    print("=" * 60)
    
    # ทดสอบจากกระดานว่าง
    game1 = AlphaBetaGame()
    game1.nodes_explored = 0
    _ = game1.minimax_no_pruning(True)
    nodes_minimax = game1.nodes_explored
    
    game2 = AlphaBetaGame()
    game2.nodes_explored = 0
    _ = game2.alpha_beta(-math.inf, math.inf, True)
    nodes_alphabeta = game2.nodes_explored
    
    print(f"\n📊 จำนวนโหนดที่สำรวจจากกระดานว่าง:")
    print(f"   Minimax:        {nodes_minimax:,} โหนด")
    print(f"   Alpha-Beta:     {nodes_alphabeta:,} โหนด")
    print(f"   ประหยัดได้:     {(1 - nodes_alphabeta/nodes_minimax)*100:.1f}%")
    
    # ทดสอบจากกระดานที่มีการเล่นแล้ว
    print("\n" + "-" * 60)
    game3 = AlphaBetaGame()
    game3.make_move(0, 0)  # X
    game3.make_move(1, 1)  # O
    game3.make_move(2, 2)  # X
    game3.display()
    
    game3.nodes_explored = 0
    _ = game3.minimax_no_pruning(False)  # ตาของ O
    nodes_minimax_2 = game3.nodes_explored
    
    game4 = AlphaBetaGame()
    game4.make_move(0, 0)
    game4.make_move(1, 1)
    game4.make_move(2, 2)
    game4.nodes_explored = 0
    _ = game4.alpha_beta(-math.inf, math.inf, False)
    nodes_alphabeta_2 = game4.nodes_explored
    
    print(f"📊 จำนวนโหนดจากสถานะนี้ (ตาของ O):")
    print(f"   Minimax:        {nodes_minimax_2:,} โหนด")
    print(f"   Alpha-Beta:     {nodes_alphabeta_2:,} โหนด")
    print(f"   ประหยัดได้:     {(1 - nodes_alphabeta_2/nodes_minimax_2)*100:.1f}%")
    
    # หาการเล่นที่ดีที่สุด
    best = game4.get_best_move_alphabeta()
    print(f"\n🎯 การเล่นที่ดีที่สุดสำหรับ O: ตำแหน่ง {best}")

ผลลัพธ์ตัวอย่าง:

============================================================
เปรียบเทียบ Minimax vs Alpha-Beta Pruning
============================================================

📊 จำนวนโหนดที่สำรวจจากกระดานว่าง:
   Minimax:        549,946 โหนด
   Alpha-Beta:     29,509 โหนด
   ประหยัดได้:     94.6%

------------------------------------------------------------

 X |   |   
-----------
   | O |   
-----------
   |   | X 

📊 จำนวนโหนดจากสถานะนี้ (ตาของ O):
   Minimax:        7,332 โหนด
   Alpha-Beta:     588 โหนด
   ประหยัดได้:     92.0%

🎯 การเล่นที่ดีที่สุดสำหรับ O: ตำแหน่ง (0, 2)

4.5 การวิเคราะห์ประสิทธิภาพ

สถานการณ์	ความซับซ้อนเวลา	คำอธิบาย
กรณีแย่ที่สุด	O(b^m)	เหมือน Minimax (ไม่มีการตัด)
กรณีดีที่สุด	O(b^m/2)	ตัดได้ครึ่งหนึ่งทุกระดับ
กรณีเฉลี่ย	O(b^3m/4)	ด้วยการเรียงลำดับแบบสุ่ม

การปรับปรุงด้วย Move Ordering:

เรียงลำดับการเคลื่อนไหวจากดีไปแย่
ทำให้ cutoff เกิดเร็วขึ้น
สามารถเข้าใกล้กรณีดีที่สุดได้

ผลกระทบในเกมจริง:

หมากรุก: สามารถค้นหาลึกขึ้นเกือบ 2 เท่า
โกะ: เพิ่มประสิทธิภาพอย่างมีนัยสำคัญ

5. การค้นหาต้นไม้แบบ Monte Carlo (Monte Carlo Tree Search - MCTS)

5.1 แนวคิดหลัก

Monte Carlo Tree Search (MCTS) เป็นอัลกอริทึมการค้นหาที่ใช้การสุ่มตัวอย่าง (Sampling) เพื่อประเมินค่าของการเคลื่อนไหว แทนที่จะค้นหาแบบครอบคลุมทั้งหมดเหมือน Minimax

ข้อดีของ MCTS:

ไม่ต้องการ heuristic evaluation function
เหมาะกับเกมที่มี branching factor สูง (เช่น โกะ)
สามารถหยุดได้ทุกเมื่อและให้คำตอบที่ดีที่สุดในขณะนั้น
ปรับตัวได้ดีกับเวลาที่มี

5.2 สี่ขั้นตอนหลักของ MCTS

graph LR
    subgraph MCTS["🎲 วงจร MCTS (MCTS Cycle)"]
        A["1️⃣ Selection
การเลือก
เลือก node ที่น่าสนใจ"]
        B["2️⃣ Expansion
การขยาย
เพิ่ม child node ใหม่"]
        C["3️⃣ Simulation
การจำลอง
เล่นแบบสุ่มจนจบ"]
        D["4️⃣ Backpropagation
การย้อนกลับ
อัพเดทค่าสถิติ"]
        
        A --> B --> C --> D --> A
    end
    
    style A fill:#458588,stroke:#282828,color:#ebdbb2
    style B fill:#98971a,stroke:#282828,color:#282828
    style C fill:#d79921,stroke:#282828,color:#282828
    style D fill:#cc241d,stroke:#282828,color:#ebdbb2

รายละเอียดแต่ละขั้นตอน:

Selection (การเลือก):
- เริ่มจาก root node
- ใช้ policy (เช่น UCB1) เลือก child node
- ทำซ้ำจนถึง node ที่ยังไม่ถูกขยายเต็มที่
Expansion (การขยาย):
- เพิ่ม child node ใหม่หนึ่งหรือหลาย node
- แทนการเคลื่อนไหวที่ยังไม่ถูกลอง
Simulation (การจำลอง):
- เล่นเกมแบบสุ่มจนจบ (random playout)
- หรือใช้ lightweight policy
Backpropagation (การย้อนกลับ):
- อัพเดทสถิติ (จำนวนครั้งที่เยือน, จำนวนชนะ)
- ย้อนกลับจาก leaf ไป root

5.3 สูตร UCB1 (Upper Confidence Bound)

UCB1 Formula:

UCB1 (v_{i}) = \frac{w_{i}}{n_{i}} + c \sqrt{\frac{ln N}{n_{i}}}

โดยที่:

$w_{i}$ = จำนวนครั้งที่ชนะหลังจากเลือก node i
$n_{i}$ = จำนวนครั้งที่เยือน node i
N = จำนวนครั้งที่เยือน parent node
c = ค่าคงที่สำรวจ (exploration constant) โดยทั่วไปใช้ √2

สองส่วนของ UCB1:

ส่วน	สูตร	ความหมาย
Exploitation	w_i/n_i	อัตราชนะเฉลี่ย (ค่าที่ดีที่สุดที่รู้)
Exploration	c√(ln N / n_i)	โบนัสสำหรับ node ที่ยังไม่ถูกสำรวจมาก

5.4 ไดอะแกรมการทำงานโดยละเอียด

graph TB
    subgraph MCTS_DETAIL["🌳 โครงสร้างต้นไม้ MCTS"]
        ROOT["Root
N=100, W=45"]
        
        A["Node A
n=40, w=22
UCB=0.55+0.48=1.03"]
        B["Node B
n=35, w=15
UCB=0.43+0.51=0.94"]
        C["Node C
n=25, w=8
UCB=0.32+0.57=0.89"]
        
        A1["A1
n=20, w=12"]
        A2["A2
n=15, w=8"]
        A3["🆕 ยังไม่ขยาย"]
        
        B1["B1
n=20, w=10"]
        B2["B2
n=15, w=5"]
        
        ROOT --> A
        ROOT --> B
        ROOT --> C
        
        A --> A1
        A --> A2
        A --> A3
        
        B --> B1
        B --> B2
    end
    
    style ROOT fill:#458588,stroke:#282828,color:#ebdbb2
    style A fill:#b8bb26,stroke:#282828,color:#282828
    style B fill:#83a598,stroke:#282828,color:#282828
    style C fill:#83a598,stroke:#282828,color:#282828
    style A1 fill:#fabd2f,stroke:#282828,color:#282828
    style A2 fill:#fabd2f,stroke:#282828,color:#282828
    style A3 fill:#d3869b,stroke:#282828,color:#282828
    style B1 fill:#fabd2f,stroke:#282828,color:#282828
    style B2 fill:#fabd2f,stroke:#282828,color:#282828

5.5 โค้ด Python: MCTS

"""
Monte Carlo Tree Search (MCTS)
การค้นหาต้นไม้แบบ Monte Carlo

อัลกอริทึมที่ใช้การสุ่มตัวอย่างเพื่อประเมินการเคลื่อนไหว
เหมาะสำหรับเกมที่มี branching factor สูงเช่น โกะ
"""

import math
import random
from typing import List, Optional, Tuple
from copy import deepcopy


class MCTSNode:
    """
    โหนดในต้นไม้ MCTS
    
    Attributes:
        state: สถานะของเกม ณ โหนดนี้
        parent: โหนดแม่
        children: รายการโหนดลูก
        visits: จำนวนครั้งที่เยือน
        wins: จำนวนครั้งที่ชนะ
        untried_moves: การเคลื่อนไหวที่ยังไม่ได้ลอง
        move: การเคลื่อนไหวที่นำมาสู่โหนดนี้
    """
    
    def __init__(self, state, parent=None, move=None):
        """
        สร้างโหนดใหม่
        
        Args:
            state: สถานะเกม (copy ของ game object)
            parent: โหนดแม่
            move: การเคลื่อนไหวที่นำมาสู่โหนดนี้
        """
        self.state = state
        self.parent = parent
        self.move = move
        self.children: List['MCTSNode'] = []
        self.visits: int = 0
        self.wins: float = 0.0
        self.untried_moves: List[Tuple[int, int]] = state.get_available_moves()
    
    def ucb1(self, exploration_constant: float = 1.414) -> float:
        """
        คำนวณค่า UCB1 ของโหนด
        
        Args:
            exploration_constant: ค่าคงที่ c สำหรับควบคุม exploration
            
        Returns:
            ค่า UCB1
        """
        if self.visits == 0:
            return float('inf')  # ยังไม่เคยเยือน ให้ค่าสูงสุด
        
        # Exploitation: อัตราชนะเฉลี่ย
        exploitation = self.wins / self.visits
        
        # Exploration: โบนัสสำหรับ node ที่ยังไม่ถูกสำรวจมาก
        exploration = exploration_constant * math.sqrt(
            math.log(self.parent.visits) / self.visits
        )
        
        return exploitation + exploration
    
    def select_child(self) -> 'MCTSNode':
        """
        เลือกโหนดลูกที่มีค่า UCB1 สูงสุด
        
        Returns:
            โหนดลูกที่ถูกเลือก
        """
        return max(self.children, key=lambda child: child.ucb1())
    
    def expand(self) -> 'MCTSNode':
        """
        ขยายต้นไม้โดยเพิ่มโหนดลูกใหม่
        
        Returns:
            โหนดลูกที่ถูกสร้างใหม่
        """
        # เลือกการเคลื่อนไหวที่ยังไม่ได้ลอง
        move = self.untried_moves.pop()
        
        # สร้างสถานะใหม่
        new_state = deepcopy(self.state)
        new_state.make_move(move[0], move[1])
        
        # สร้างโหนดลูก
        child = MCTSNode(new_state, parent=self, move=move)
        self.children.append(child)
        
        return child
    
    def is_fully_expanded(self) -> bool:
        """ตรวจสอบว่าโหนดถูกขยายเต็มที่หรือยัง"""
        return len(self.untried_moves) == 0
    
    def is_terminal(self) -> bool:
        """ตรวจสอบว่าเป็นสถานะปลายทางหรือไม่"""
        return self.state.is_terminal()
    
    def backpropagate(self, result: float):
        """
        ย้อนกลับอัพเดทสถิติจากโหนดนี้ไป root
        
        Args:
            result: ผลลัพธ์ของ simulation (1=ชนะ, 0=แพ้, 0.5=เสมอ)
        """
        node = self
        while node is not None:
            node.visits += 1
            # ผลลัพธ์จะสลับมุมมองในแต่ละระดับ
            node.wins += result
            # สลับผลลัพธ์สำหรับผู้เล่นคนถัดไป
            result = 1 - result
            node = node.parent


class MCTSGame:
    """
    เกม Tic-Tac-Toe สำหรับใช้กับ MCTS
    """
    
    def __init__(self):
        """เริ่มต้นเกมใหม่"""
        self.board = [[' ' for _ in range(3)] for _ in range(3)]
        self.current_player = 'X'
    
    def get_available_moves(self) -> List[Tuple[int, int]]:
        """หาตำแหน่งที่ว่าง"""
        return [(i, j) for i in range(3) for j in range(3) 
                if self.board[i][j] == ' ']
    
    def make_move(self, row: int, col: int) -> bool:
        """ทำการเล่น"""
        if self.board[row][col] == ' ':
            self.board[row][col] = self.current_player
            self.current_player = 'O' if self.current_player == 'X' else 'X'
            return True
        return False
    
    def check_winner(self) -> Optional[str]:
        """ตรวจสอบผู้ชนะ"""
        for i in range(3):
            if self.board[i][0] == self.board[i][1] == self.board[i][2] != ' ':
                return self.board[i][0]
            if self.board[0][i] == self.board[1][i] == self.board[2][i] != ' ':
                return self.board[0][i]
        if self.board[0][0] == self.board[1][1] == self.board[2][2] != ' ':
            return self.board[0][0]
        if self.board[0][2] == self.board[1][1] == self.board[2][0] != ' ':
            return self.board[0][2]
        return None
    
    def is_terminal(self) -> bool:
        """ตรวจสอบว่าเกมจบหรือยัง"""
        return self.check_winner() is not None or len(self.get_available_moves()) == 0
    
    def get_result(self, player: str) -> float:
        """
        คืนผลลัพธ์สำหรับผู้เล่นที่ระบุ
        
        Args:
            player: 'X' หรือ 'O'
            
        Returns:
            1.0 ถ้าชนะ, 0.0 ถ้าแพ้, 0.5 ถ้าเสมอ
        """
        winner = self.check_winner()
        if winner == player:
            return 1.0
        elif winner is None:
            return 0.5
        else:
            return 0.0
    
    def display(self):
        """แสดงกระดาน"""
        print()
        for i, row in enumerate(self.board):
            print(f" {row[0]} | {row[1]} | {row[2]} ")
            if i < 2:
                print("-----------")
        print()


def mcts_search(root_state: MCTSGame, num_simulations: int = 1000) -> Tuple[int, int]:
    """
    ทำ MCTS search เพื่อหาการเคลื่อนไหวที่ดีที่สุด
    
    Args:
        root_state: สถานะเริ่มต้นของเกม
        num_simulations: จำนวน simulations ที่จะทำ
        
    Returns:
        tuple (row, col) ของการเคลื่อนไหวที่ดีที่สุด
    """
    # สร้าง root node
    root = MCTSNode(deepcopy(root_state))
    
    # ทำ simulations
    for _ in range(num_simulations):
        node = root
        state = deepcopy(root_state)
        
        # 1. Selection: เลือก node จนถึงที่ยังไม่ถูกขยายเต็มที่
        while node.is_fully_expanded() and not node.is_terminal():
            node = node.select_child()
            if node.move:
                state.make_move(node.move[0], node.move[1])
        
        # 2. Expansion: ขยายต้นไม้
        if not node.is_terminal() and not node.is_fully_expanded():
            node = node.expand()
            if node.move:
                state.make_move(node.move[0], node.move[1])
        
        # 3. Simulation: เล่นแบบสุ่มจนจบ
        sim_state = deepcopy(state)
        while not sim_state.is_terminal():
            moves = sim_state.get_available_moves()
            move = random.choice(moves)
            sim_state.make_move(move[0], move[1])
        
        # 4. Backpropagation: อัพเดทสถิติ
        # ผู้เล่นที่เพิ่งเล่นคือคนก่อนหน้า current_player
        player_just_moved = 'O' if state.current_player == 'X' else 'X'
        result = sim_state.get_result(player_just_moved)
        node.backpropagate(result)
    
    # เลือกโหนดลูกที่ถูกเยือนมากที่สุด
    best_child = max(root.children, key=lambda c: c.visits)
    
    return best_child.move


def simulate_random_playout(state: MCTSGame) -> float:
    """
    จำลองการเล่นแบบสุ่มจนจบเกม
    
    Args:
        state: สถานะเริ่มต้น
        
    Returns:
        ผลลัพธ์ 1.0/0.5/0.0
    """
    sim_state = deepcopy(state)
    player = sim_state.current_player
    
    while not sim_state.is_terminal():
        moves = sim_state.get_available_moves()
        move = random.choice(moves)
        sim_state.make_move(move[0], move[1])
    
    return sim_state.get_result(player)


# ตัวอย่างการใช้งาน
if __name__ == "__main__":
    print("=" * 60)
    print("ตัวอย่างการใช้งาน Monte Carlo Tree Search")
    print("=" * 60)
    
    # สร้างเกมใหม่
    game = MCTSGame()
    
    # จำลองการเล่นไปบางส่วน
    game.make_move(0, 0)  # X
    game.make_move(1, 1)  # O
    
    print("\nสถานะปัจจุบัน (ตาของ X):")
    game.display()
    
    # ใช้ MCTS หาการเล่นที่ดีที่สุด
    print("กำลังค้นหาด้วย MCTS (1000 simulations)...")
    best_move = mcts_search(game, num_simulations=1000)
    print(f"🎯 การเล่นที่ดีที่สุดสำหรับ X: ตำแหน่ง {best_move}")
    
    # แสดงผลลัพธ์
    game.make_move(best_move[0], best_move[1])
    print("\nหลังจากเล่นตามคำแนะนำ:")
    game.display()
    
    # เปรียบเทียบจำนวน simulations
    print("\n" + "-" * 60)
    print("เปรียบเทียบผลลัพธ์ตามจำนวน simulations:")
    print("-" * 60)
    
    for num_sims in [10, 100, 500, 1000, 5000]:
        test_game = MCTSGame()
        test_game.make_move(0, 0)
        test_game.make_move(1, 1)
        
        move = mcts_search(test_game, num_simulations=num_sims)
        print(f"   {num_sims:5} simulations -> ตำแหน่ง {move}")

ผลลัพธ์ตัวอย่าง:

============================================================
ตัวอย่างการใช้งาน Monte Carlo Tree Search
============================================================

สถานะปัจจุบัน (ตาของ X):

 X |   |   
-----------
   | O |   
-----------
   |   |   

กำลังค้นหาด้วย MCTS (1000 simulations)...
🎯 การเล่นที่ดีที่สุดสำหรับ X: ตำแหน่ง (0, 2)

หลังจากเล่นตามคำแนะนำ:

 X |   | X 
-----------
   | O |   
-----------
   |   |   

------------------------------------------------------------
เปรียบเทียบผลลัพธ์ตามจำนวน simulations:
------------------------------------------------------------
      10 simulations -> ตำแหน่ง (2, 0)
     100 simulations -> ตำแหน่ง (0, 2)
     500 simulations -> ตำแหน่ง (0, 2)
    1000 simulations -> ตำแหน่ง (0, 2)
    5000 simulations -> ตำแหน่ง (0, 2)

5.6 เปรียบเทียบ MCTS กับ Minimax

ด้าน	Minimax + Alpha-Beta	MCTS
ต้องการ Evaluation Function	ใช่ (สำหรับเกมที่ลึก)	ไม่ (ใช้ random playouts)
การใช้หน่วยความจำ	O(bm)	O(จำนวน simulations)
เหมาะกับเกม	Branching factor ต่ำ-กลาง	Branching factor สูง
การหยุดก่อนเวลา	ต้องถึงความลึกที่กำหนด	หยุดได้ทุกเมื่อ
ตัวอย่างเกม	หมากรุก, Checkers	โกะ, เกมวางแผน
การรับประกันผลลัพธ์	Optimal (ถ้าค้นหาจนสุด)	Asymptotically optimal

6. การประยุกต์ใช้กับเกมต่าง ๆ (Game Applications)

6.1 การประยุกต์ใช้กับหมากรุก (Chess)

ความท้าทายของหมากรุก:

Branching factor เฉลี่ย ≈ 35
ความลึกเฉลี่ย ≈ 80 half-moves
State space ≈ 10⁴⁷

เทคนิคที่ใช้:

Alpha-Beta Pruning พร้อม Move Ordering
Iterative Deepening
Transposition Tables
Evaluation Functions ที่ซับซ้อน

"""
ตัวอย่างการใช้งาน python-chess library
สำหรับการพัฒนา Chess AI
"""

# หมายเหตุ: ต้องติดตั้ง python-chess ก่อน
# pip install python-chess

import chess
import chess.engine


def simple_chess_evaluation(board: chess.Board) -> int:
    """
    ฟังก์ชันประเมินค่ากระดานหมากรุกอย่างง่าย
    
    Args:
        board: กระดานหมากรุก
        
    Returns:
        ค่าประเมิน (บวก = ดีสำหรับขาว, ลบ = ดีสำหรับดำ)
    """
    # ค่าของตัวหมากแต่ละประเภท
    piece_values = {
        chess.PAWN: 100,
        chess.KNIGHT: 320,
        chess.BISHOP: 330,
        chess.ROOK: 500,
        chess.QUEEN: 900,
        chess.KING: 20000
    }
    
    score = 0
    
    # นับค่าตัวหมากทั้งหมด
    for piece_type in piece_values:
        # ตัวหมากขาว (บวก)
        white_pieces = len(board.pieces(piece_type, chess.WHITE))
        # ตัวหมากดำ (ลบ)
        black_pieces = len(board.pieces(piece_type, chess.BLACK))
        
        score += piece_values[piece_type] * (white_pieces - black_pieces)
    
    # โบนัสสำหรับการควบคุมกลางกระดาน
    center_squares = [chess.D4, chess.D5, chess.E4, chess.E5]
    for square in center_squares:
        piece = board.piece_at(square)
        if piece:
            if piece.color == chess.WHITE:
                score += 10
            else:
                score -= 10
    
    return score


def chess_minimax(board: chess.Board, depth: int, alpha: float, beta: float, 
                  is_maximizing: bool) -> float:
    """
    Minimax with Alpha-Beta Pruning สำหรับหมากรุก
    
    Args:
        board: กระดานหมากรุก
        depth: ความลึกที่เหลือ
        alpha: ค่า alpha
        beta: ค่า beta
        is_maximizing: True ถ้าเป็นตาของ MAX (ขาว)
        
    Returns:
        ค่าประเมินที่ดีที่สุด
    """
    if depth == 0 or board.is_game_over():
        return simple_chess_evaluation(board)
    
    if is_maximizing:
        max_eval = float('-inf')
        for move in board.legal_moves:
            board.push(move)
            eval_score = chess_minimax(board, depth - 1, alpha, beta, False)
            board.pop()
            max_eval = max(max_eval, eval_score)
            alpha = max(alpha, eval_score)
            if beta <= alpha:
                break
        return max_eval
    else:
        min_eval = float('inf')
        for move in board.legal_moves:
            board.push(move)
            eval_score = chess_minimax(board, depth - 1, alpha, beta, True)
            board.pop()
            min_eval = min(min_eval, eval_score)
            beta = min(beta, eval_score)
            if beta <= alpha:
                break
        return min_eval


def get_best_chess_move(board: chess.Board, depth: int = 3) -> chess.Move:
    """
    หาการเดินที่ดีที่สุดสำหรับผู้เล่นปัจจุบัน
    
    Args:
        board: กระดานหมากรุก
        depth: ความลึกในการค้นหา
        
    Returns:
        การเดินที่ดีที่สุด
    """
    best_move = None
    is_white = board.turn == chess.WHITE
    
    if is_white:
        best_score = float('-inf')
        for move in board.legal_moves:
            board.push(move)
            score = chess_minimax(board, depth - 1, float('-inf'), 
                                 float('inf'), False)
            board.pop()
            if score > best_score:
                best_score = score
                best_move = move
    else:
        best_score = float('inf')
        for move in board.legal_moves:
            board.push(move)
            score = chess_minimax(board, depth - 1, float('-inf'), 
                                 float('inf'), True)
            board.pop()
            if score < best_score:
                best_score = score
                best_move = move
    
    return best_move


# ตัวอย่างการใช้งาน
if __name__ == "__main__":
    print("=" * 50)
    print("Chess AI with Minimax + Alpha-Beta")
    print("=" * 50)
    
    # สร้างกระดานใหม่
    board = chess.Board()
    
    # แสดงกระดานเริ่มต้น
    print("\nกระดานเริ่มต้น:")
    print(board)
    
    # หาการเดินที่ดีที่สุดสำหรับขาว
    print("\nกำลังค้นหาการเดินที่ดีที่สุด (depth=3)...")
    best = get_best_chess_move(board, depth=3)
    print(f"การเดินที่ดีที่สุดสำหรับขาว: {best}")
    
    # ทำการเดิน
    board.push(best)
    print("\nหลังจากเดิน:")
    print(board)

6.2 การประยุกต์ใช้กับโกะ (Go)

ความท้าทายของโกะ:

Branching factor เฉลี่ย ≈ 250
กระดาน 19×19 = 361 ตำแหน่ง
State space ≈ 10¹⁷⁰
ยากต่อการสร้าง evaluation function

ทำไม MCTS ถึงเหมาะกับโกะ:

ไม่ต้องการ evaluation function ที่แม่นยำ
สามารถจัดการกับ branching factor สูงได้
ให้ผลลัพธ์ที่ดีแม้จะค้นหาไม่ลึก

AlphaGo และ AlphaZero:

ใช้ MCTS ร่วมกับ Deep Neural Networks
Policy Network: ทำนายการเดินที่ดี
Value Network: ประเมินค่าตำแหน่ง
Self-play: เรียนรู้จากการเล่นกับตัวเอง

graph TB
    subgraph ALPHAGO["🤖 สถาปัตยกรรม AlphaGo"]
        INPUT["สถานะกระดาน
Board State"]
        
        subgraph NETWORKS["Neural Networks"]
            POLICY["Policy Network
ทำนายการเดินที่ดี
P(a|s)"]
            VALUE["Value Network
ประเมินค่าตำแหน่ง
V(s)"]
        end
        
        MCTS_NODE["MCTS
ค้นหาและปรับปรุง"]
        
        OUTPUT["การเดินที่ดีที่สุด
Best Move"]
        
        INPUT --> POLICY
        INPUT --> VALUE
        POLICY --> MCTS_NODE
        VALUE --> MCTS_NODE
        MCTS_NODE --> OUTPUT
    end
    
    style INPUT fill:#458588,stroke:#282828,color:#ebdbb2
    style POLICY fill:#98971a,stroke:#282828,color:#282828
    style VALUE fill:#d79921,stroke:#282828,color:#282828
    style MCTS_NODE fill:#b16286,stroke:#282828,color:#ebdbb2
    style OUTPUT fill:#cc241d,stroke:#282828,color:#ebdbb2

6.3 ตารางเปรียบเทียบเกมและอัลกอริทึม

เกม	Branching Factor	State Space	อัลกอริทึมที่เหมาะ	หมายเหตุ
Tic-Tac-Toe	~4	5,478	Minimax	Solved game
Connect Four	~7	10¹³	Alpha-Beta	Solved (1988)
Checkers	~2.8	10²⁰	Alpha-Beta	Solved (2007)
Chess	~35	10⁴⁷	Alpha-Beta + NN	Deep Blue, Stockfish
Shogi	~80	10⁷¹	MCTS + NN	AlphaZero
Go	~250	10¹⁷⁰	MCTS + NN	AlphaGo, AlphaZero
Poker	Variable	Very large	CFR, RL	Libratus, Pluribus

6.4 การพัฒนา Game AI สมัยใหม่

"""
โครงสร้างพื้นฐานสำหรับ Game AI สมัยใหม่
ที่ผสมผสาน MCTS กับ Neural Networks
"""

from abc import ABC, abstractmethod
from typing import List, Tuple, Any
import numpy as np


class GameState(ABC):
    """
    Abstract class สำหรับสถานะเกมทั่วไป
    เป็น interface ที่ game AI ต้องการ
    """
    
    @abstractmethod
    def get_legal_actions(self) -> List[Any]:
        """คืนรายการ actions ที่ถูกกฎหมาย"""
        pass
    
    @abstractmethod
    def apply_action(self, action: Any) -> 'GameState':
        """คืนสถานะใหม่หลังทำ action"""
        pass
    
    @abstractmethod
    def is_terminal(self) -> bool:
        """ตรวจสอบว่าเกมจบหรือยัง"""
        pass
    
    @abstractmethod
    def get_reward(self, player: int) -> float:
        """คืน reward สำหรับผู้เล่นที่ระบุ"""
        pass
    
    @abstractmethod
    def get_current_player(self) -> int:
        """คืนผู้เล่นปัจจุบัน"""
        pass
    
    @abstractmethod
    def to_tensor(self) -> np.ndarray:
        """แปลงสถานะเป็น tensor สำหรับ neural network"""
        pass


class NeuralNetwork(ABC):
    """
    Abstract class สำหรับ Neural Network
    ใช้ร่วมกับ MCTS
    """
    
    @abstractmethod
    def predict(self, state: GameState) -> Tuple[np.ndarray, float]:
        """
        ทำนาย policy และ value
        
        Returns:
            Tuple ของ:
            - policy: probability distribution over actions
            - value: ค่าประเมินสถานะ
        """
        pass
    
    @abstractmethod
    def train(self, states: List[GameState], 
              policies: List[np.ndarray], 
              values: List[float]):
        """
        Train network จากข้อมูล self-play
        """
        pass


class MCTSWithNN:
    """
    MCTS ที่ใช้ Neural Network เป็น guide
    คล้ายกับ AlphaZero
    """
    
    def __init__(self, network: NeuralNetwork, 
                 num_simulations: int = 800,
                 c_puct: float = 1.0):
        """
        Args:
            network: Neural network สำหรับ policy และ value
            num_simulations: จำนวน simulations ต่อการค้นหา
            c_puct: ค่าคงที่สำหรับ exploration
        """
        self.network = network
        self.num_simulations = num_simulations
        self.c_puct = c_puct
    
    def search(self, root_state: GameState) -> List[float]:
        """
        ทำ MCTS search และคืน probability distribution
        
        Args:
            root_state: สถานะเริ่มต้น
            
        Returns:
            Probability distribution over actions
        """
        # สร้างโครงสร้าง tree
        root = self._create_node(root_state)
        
        # ทำ simulations
        for _ in range(self.num_simulations):
            node = root
            path = [node]
            
            # Selection
            while node.is_expanded and not node.state.is_terminal():
                action, node = self._select_child(node)
                path.append(node)
            
            # Expansion and Evaluation
            if not node.state.is_terminal():
                policy, value = self.network.predict(node.state)
                self._expand(node, policy)
            else:
                value = node.state.get_reward(root_state.get_current_player())
            
            # Backpropagation
            self._backpropagate(path, value)
        
        # คำนวณ probability distribution
        visits = [child.visit_count for child in root.children.values()]
        total = sum(visits)
        return [v / total for v in visits]
    
    def _create_node(self, state: GameState) -> 'MCTSNode':
        """สร้าง node ใหม่"""
        # Implementation details...
        pass
    
    def _select_child(self, node: 'MCTSNode') -> Tuple[Any, 'MCTSNode']:
        """เลือก child ด้วย PUCT formula"""
        best_score = float('-inf')
        best_action = None
        best_child = None
        
        for action, child in node.children.items():
            # PUCT formula (AlphaZero style)
            q_value = child.value_sum / child.visit_count if child.visit_count > 0 else 0
            u_value = self.c_puct * child.prior * np.sqrt(node.visit_count) / (1 + child.visit_count)
            score = q_value + u_value
            
            if score > best_score:
                best_score = score
                best_action = action
                best_child = child
        
        return best_action, best_child
    
    def _expand(self, node: 'MCTSNode', policy: np.ndarray):
        """ขยาย node ด้วย policy จาก network"""
        actions = node.state.get_legal_actions()
        for i, action in enumerate(actions):
            new_state = node.state.apply_action(action)
            child = self._create_node(new_state)
            child.prior = policy[i]
            node.children[action] = child
        node.is_expanded = True
    
    def _backpropagate(self, path: List['MCTSNode'], value: float):
        """อัพเดทค่าย้อนกลับ"""
        for node in reversed(path):
            node.visit_count += 1
            node.value_sum += value
            value = -value  # สลับมุมมอง


# ตัวอย่าง Self-Play Training Loop
def self_play_training():
    """
    ตัวอย่างการ train โดยใช้ self-play
    (Simplified version)
    """
    print("=" * 50)
    print("Self-Play Training Framework")
    print("=" * 50)
    
    print("""
    1. สร้าง Neural Network
    2. สร้าง MCTS ที่ใช้ Network
    3. วนซ้ำ:
       a. เล่นเกมด้วย MCTS (self-play)
       b. เก็บข้อมูล (state, policy, outcome)
       c. Train network จากข้อมูล
       d. ประเมินผลกับ version ก่อนหน้า
    4. ทำซ้ำจนแข็งแกร่งพอ
    """)
    
    print("นี่คือแนวคิดเบื้องหลัง AlphaZero!")


if __name__ == "__main__":
    self_play_training()

7. สรุปโดยรวม

7.1 สรุปเนื้อหา

ในบทนี้เราได้เรียนรู้เกี่ยวกับ การค้นหาแบบแข่งขัน (Adversarial Search) ซึ่งเป็นเทคนิคสำคัญในการพัฒนา AI สำหรับเกม โดยครอบคลุมหัวข้อหลักดังนี้:

1. พื้นฐานทฤษฎีเกม:

เข้าใจองค์ประกอบของเกม (ผู้เล่น, สถานะ, การกระทำ, ผลตอบแทน)
เกมผลรวมเป็นศูนย์และต้นไม้เกม
กลยุทธ์และจุดสมดุลแนช

2. อัลกอริทึม Minimax:

หลักการ MAX และ MIN
การค้นหาแบบ recursive บนต้นไม้เกม
ความซับซ้อน O(b^m)

3. Alpha-Beta Pruning:

การปรับปรุงประสิทธิภาพ Minimax
กลไกการตัดกิ่งด้วย α และ β
ลดจำนวนโหนดที่ต้องสำรวจได้ถึง 90%+

4. Monte Carlo Tree Search:

แนวคิดการสุ่มตัวอย่าง
สี่ขั้นตอน: Selection, Expansion, Simulation, Backpropagation
สูตร UCB1 สำหรับการสมดุล exploration/exploitation
เหมาะกับเกมที่มี branching factor สูง

5. การประยุกต์ใช้:

หมากรุก: Alpha-Beta กับ evaluation functions
โกะ: MCTS กับ neural networks
AlphaZero: self-play และ deep learning

7.2 ข้อเปรียบเทียบอัลกอริทึม

graph LR
    subgraph COMPARISON["📊 การเลือกใช้อัลกอริทึม"]
        START["เริ่มต้น"]
        
        Q1{"Branching Factor
สูงหรือไม่?"}
        Q2{"มี Evaluation
Function ดีหรือไม่?"}
        Q3{"ต้องการ
Optimal หรือไม่?"}
        
        A1["MCTS
(+ Neural Network)"]
        A2["Alpha-Beta
+ Iterative Deepening"]
        A3["Minimax
(เกมเล็ก)"]
        
        START --> Q1
        Q1 -->|"สูง (>100)"| A1
        Q1 -->|"ต่ำ-กลาง"| Q2
        Q2 -->|"ใช่"| A2
        Q2 -->|"ไม่"| A1
        Q1 -->|"ต่ำมาก (<10)"| Q3
        Q3 -->|"ใช่"| A3
        Q3 -->|"ไม่"| A2
    end
    
    style START fill:#458588,stroke:#282828,color:#ebdbb2
    style Q1 fill:#d79921,stroke:#282828,color:#282828
    style Q2 fill:#d79921,stroke:#282828,color:#282828
    style Q3 fill:#d79921,stroke:#282828,color:#282828
    style A1 fill:#98971a,stroke:#282828,color:#282828
    style A2 fill:#b16286,stroke:#282828,color:#ebdbb2
    style A3 fill:#cc241d,stroke:#282828,color:#ebdbb2

7.3 แนวทางการศึกษาเพิ่มเติม

สำหรับผู้ที่ต้องการศึกษาเพิ่มเติม แนะนำให้:

ฝึกปฏิบัติ: พัฒนา AI สำหรับเกมง่าย ๆ เช่น Tic-Tac-Toe, Connect Four
ศึกษา Chess Programming: เรียนรู้เทคนิคขั้นสูงเช่น Transposition Tables, Null Move Pruning
เรียนรู้ Deep RL: ศึกษา AlphaZero paper และลองทำ simplified version
ทดลองกับ Libraries: ใช้ python-chess, pachi-py เพื่อทดสอบอัลกอริทึม

8. เอกสารอ้างอิง

8.1 ตำราหลัก

Russell, S. & Norvig, P. (2020). Artificial Intelligence: A Modern Approach (4th ed.). Pearson.
- Chapter 5: Adversarial Search
Millington, I. & Funge, J. (2009). Artificial Intelligence for Games (2nd ed.). Morgan Kaufmann.

8.2 บทความวิจัยสำคัญ

Shannon, C.E. (1950). "Programming a Computer for Playing Chess." Philosophical Magazine, 41(314).
Knuth, D.E. & Moore, R.W. (1975). "An Analysis of Alpha-Beta Pruning." Artificial Intelligence, 6(4), 293-326.
Kocsis, L. & Szepesvári, C. (2006). "Bandit Based Monte-Carlo Planning." ECML.
Silver, D. et al. (2016). "Mastering the game of Go with deep neural networks and tree search." Nature, 529, 484-489.
Silver, D. et al. (2017). "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm." arXiv:1712.01815.

8.3 แหล่งเรียนรู้ออนไลน์

Chess Programming Wiki: https://www.chessprogramming.org/
Coursera - Games, Strategy, and Decision Making: https://www.coursera.org/
Stanford CS221 - Artificial Intelligence: https://stanford-cs221.github.io/

8.4 Libraries และ Tools

python-chess: https://python-chess.readthedocs.io/
Pygame: https://www.pygame.org/
OpenSpiel (DeepMind): https://github.com/deepmind/open_spiel