Regular Expressions in Python

Introduction

Regular expressions, usually shortened to regex, are a powerful tool for searching, matching, and processing text. This document covers regular expressions in Python, from the basics through practical applications.

%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#458588','primaryTextColor':'#ebdbb2','primaryBorderColor':'#83a598','lineColor':'#fabd2f','secondaryColor':'#b16286','tertiaryColor':'#689d6a','background':'#282828','mainBkg':'#3c3836','textColor':'#ebdbb2','fontSize':'16px'}}}%%
graph LR
    A["ข้อความ
Text Input"] -->|"ใช้ Regex Pattern"| B["โมดูล re
re Module"]
    B --> C{"ประเภทการทำงาน
Operation Type"}
    C -->|"ค้นหา
Search"| D["re.search()"]
    C -->|"จับคู่
Match"| E["re.match()"]
    C -->|"แทนที่
Replace"| F["re.sub()"]
    C -->|"แยก
Split"| G["re.split()"]
    D --> H["ผลลัพธ์
Result"]
    E --> H
    F --> H
    G --> H
    
    style A fill:#458588,stroke:#83a598,color:#ebdbb2
    style B fill:#b16286,stroke:#d3869b,color:#ebdbb2
    style C fill:#689d6a,stroke:#8ec07c,color:#ebdbb2
    style H fill:#d79921,stroke:#fabd2f,color:#282828

1. Regular Expression (Regex) Fundamentals

1.1 Definition and Uses

A regular expression (regex) is a sequence of characters that defines a search pattern. It can drive string operations such as searching, matching, replacing, and validation.

Main uses of regex:

Validation: check the format of data such as email addresses, phone numbers, and postal codes
Search and replace: find text that fits a pattern and replace it with new text
Data parsing: extract data from log files or documents
Data cleaning: filter data and normalize it into a standard format
String splitting: split strings on complex patterns

%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#458588','primaryTextColor':'#ebdbb2','primaryBorderColor':'#83a598','lineColor':'#fabd2f','secondaryColor':'#b16286','tertiaryColor':'#689d6a','background':'#282828','mainBkg':'#3c3836','textColor':'#ebdbb2','fontSize':'14px'}}}%%
mindmap
  root((Regular Expression
Regex))
    Applications
      การตรวจสอบ
Validation
      การค้นหา
Searching
      การแทนที่
Replacement
      การแยกวิเคราะห์
Parsing
    Components
      ตัวอักขระธรรมดา
Literals
      Metacharacters
      Quantifiers
      Groups
    Benefits
      ความยืดหยุ่น
Flexibility
      ประสิทธิภาพ
Efficiency
      มาตรฐาน
Standardization

1.2 The re Module in Python

Python ships with the built-in re module for working with regex, so no extra library needs to be installed.

import re

def demo_basic_regex():
    """
    ฟังก์ชันสาธิตการใช้งาน Regex พื้นฐาน
    
    แสดงการนำเข้าโมดูล re และการใช้ฟังก์ชันพื้นฐาน
    """
    # สตริงที่ต้องการค้นหา
    text = "ฉันรัก Python และ Regex มาก ๆ"
    
    # รูปแบบที่ต้องการค้นหา (ค้นหาคำว่า Python)
    pattern = r"Python"
    
    # ใช้ re.search() เพื่อค้นหารูปแบบ
    match = re.search(pattern, text)
    
    if match:
        print(f"พบ '{pattern}' ที่ตำแหน่ง: {match.start()}-{match.end()}")
        print(f"ข้อความที่จับคู่ได้: {match.group()}")
    else:
        print(f"ไม่พบ '{pattern}' ในข้อความ")

# ทดสอบฟังก์ชัน
demo_basic_regex()
# Output: พบ 'Python' ที่ตำแหน่ง: 7-13
#         ข้อความที่จับคู่ได้: Python

Main functions in the re module:

Function	Description	Typical use
`re.search()`	Finds the first match anywhere in the string	General searching
`re.match()`	Matches the pattern at the start of the string	Checking how a string begins
`re.findall()`	Finds all matches (returns a list)	Extracting many items
`re.finditer()`	Finds all matches (returns an iterator)	Processing results one at a time
`re.sub()`	Replaces the matched parts	Editing text
`re.split()`	Splits the string on a pattern	Dividing text
`re.compile()`	Compiles a pattern into a reusable object	Reusing one pattern across many calls
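The table lists re.compile() without showing it in use. The following is a minimal sketch, assuming a hypothetical log format and variable names invented for illustration: the pattern is compiled once and the resulting object is reused for every line.

```python
import re

# Hypothetical log lines, invented for this illustration.
log_lines = [
    "2024-12-25 ERROR disk full",
    "2024-12-26 INFO backup done",
    "2024-12-27 ERROR network down",
]

# Compile once; the pattern object offers search(), match(), findall(), etc.
error_pattern = re.compile(r"^(\d{4}-\d{2}-\d{2}) ERROR (.+)$")

errors = []
for line in log_lines:
    match = error_pattern.search(line)
    if match:
        # group(1) is the date, group(2) is the message
        errors.append((match.group(1), match.group(2)))

print(errors)
```

The module-level functions keep a cache of recently compiled patterns, so the speed gain from re.compile() is often modest. Its main benefit is giving a pattern one name and one definition.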

1.3 Basic Syntax

1.3.1 Literal Characters

A literal character matches exactly itself: the pattern Python matches the text "Python" and nothing else.

def literal_matching():
    """
    ตัวอย่างการจับคู่ตัวอักษรธรรมดา
    
    แสดงการค้นหาข้อความที่ตรงกันเป็นตัวอักษร
    """
    text = "Python 3.11 เป็นภาษาโปรแกรมที่ดี"
    
    # ค้นหาคำว่า "Python" (ตรงตัว)
    pattern1 = r"Python"
    match1 = re.search(pattern1, text)
    print(f"Pattern '{pattern1}': {match1.group() if match1 else 'ไม่พบ'}")
    
    # ค้นหาตัวเลข "3.11" (ต้อง escape จุดด้วย \.)
    pattern2 = r"3\.11"
    match2 = re.search(pattern2, text)
    print(f"Pattern '{pattern2}': {match2.group() if match2 else 'ไม่พบ'}")

literal_matching()
# Output: Pattern 'Python': Python
#         Pattern '3\.11': 3.11

1.3.2 Raw Strings (r"")

Write regex patterns as raw strings (r"") in Python so that backslashes reach the regex engine unchanged instead of being interpreted first as Python string escapes.

def raw_string_demo():
    """
    เปรียบเทียบการใช้ Raw String กับ String ธรรมดา
    
    แสดงความแตกต่างระหว่าง r"" และ ""
    """
    text = "Path: C:\\Users\\Documents\\file.txt"
    
    # ใช้ Raw String (แนะนำ)
    pattern_raw = r"C:\\Users"
    match_raw = re.search(pattern_raw, text)
    print(f"Raw String: {match_raw.group() if match_raw else 'ไม่พบ'}")
    
    # ใช้ String ธรรมดา (ต้อง escape backslash 2 ครั้ง)
    pattern_normal = "C:\\\\Users"
    match_normal = re.search(pattern_normal, text)
    print(f"Normal String: {match_normal.group() if match_normal else 'ไม่พบ'}")

raw_string_demo()
# Output: Raw String: C:\Users
#         Normal String: C:\Users
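The path example gives the same result either way, so it helps to see a case where leaving out the r prefix silently changes the pattern. This is a small sketch with an invented sample text: in a normal string "\b" is the backspace character, while in a raw string \b reaches the regex engine as a word boundary.

```python
import re

text = "cat concat cat."

# Normal string: "\b" becomes the backspace character (ASCII 8),
# so the regex engine never sees a word-boundary token.
plain_matches = re.findall("\bcat\b", text)

# Raw string: the backslash survives and \b means word boundary.
raw_matches = re.findall(r"\bcat\b", text)

print(plain_matches)  # []
print(raw_matches)    # ['cat', 'cat']
```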

1.3.3 Case-Insensitive Matching

Pass the flag re.IGNORECASE (short form re.I) to match without regard to upper or lower case.

def case_insensitive_demo():
    """
    ตัวอย่างการจับคู่โดยไม่สนใจตัวพิมพ์
    
    แสดงการใช้ re.IGNORECASE flag
    """
    text = "Python python PYTHON PyThOn"
    pattern = r"python"
    
    # จับคู่แบบ case-sensitive (ค่าเริ่มต้น)
    matches_sensitive = re.findall(pattern, text)
    print(f"Case-sensitive: {matches_sensitive}")
    
    # จับคู่แบบ case-insensitive
    matches_insensitive = re.findall(pattern, text, re.IGNORECASE)
    print(f"Case-insensitive: {matches_insensitive}")

case_insensitive_demo()
# Output: Case-sensitive: ['python']
#         Case-insensitive: ['Python', 'python', 'PYTHON', 'PyThOn']

%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#458588','primaryTextColor':'#ebdbb2','primaryBorderColor':'#83a598','lineColor':'#fabd2f','secondaryColor':'#b16286','tertiaryColor':'#689d6a','background':'#282828','mainBkg':'#3c3836','textColor':'#ebdbb2','fontSize':'14px'}}}%%
flowchart TD
    A["เริ่มต้น Regex
Start"] --> B{"ใช้ Raw String?
Use r''?"}
    B -->|"ใช่
Yes"| C["r'pattern'"]
    B -->|"ไม่
No"| D["'pattern'
(ต้อง escape \\)"]
    C --> E{"Case-sensitive?"}
    D --> E
    E -->|"ไม่สนใจตัวพิมพ์
Ignore Case"| F["เพิ่ม re.IGNORECASE"]
    E -->|"สนใจตัวพิมพ์
Case Sensitive"| G["ไม่ใช้ flag"]
    F --> H["ดำเนินการค้นหา
Execute Search"]
    G --> H
    
    style A fill:#689d6a,stroke:#8ec07c,color:#ebdbb2
    style H fill:#d79921,stroke:#fabd2f,color:#282828
    style B fill:#458588,stroke:#83a598,color:#ebdbb2
    style E fill:#458588,stroke:#83a598,color:#ebdbb2

2. Metacharacters and Escape Sequences

Metacharacters are characters with a special meaning in regex. They are the building blocks of more complex patterns.

2.1 Metacharacter Table

Character	Meaning	Example	Explanation
`.`	Any single character (except newline)	`a.c`	Matches "abc", "a9c", "a c"
`^`	Start of the string (or of each line with re.MULTILINE)	`^Hello`	Matches "Hello" at the start
`$`	End of the string (or of each line with re.MULTILINE)	`world$`	Matches "world" at the end
`*`	0 or more repetitions	`ab*c`	Matches "ac", "abc", "abbc"
`+`	1 or more repetitions	`ab+c`	Matches "abc", "abbc" (not "ac")
`?`	0 or 1 repetition	`colou?r`	Matches "color", "colour"
`{m}`	Exactly m repetitions	`a{3}`	Matches "aaa"
`{m,n}`	From m to n repetitions	`a{2,4}`	Matches "aa", "aaa", "aaaa"
`|`	OR (alternation)	`cat|dog`	Matches "cat" or "dog"
`[]`	Character set	`[aeiou]`	Matches a single vowel
`[^]`	Negated character set	`[^0-9]`	Matches any character that is not a digit
`()`	Grouping	`(ab)+`	Matches "ab", "abab", "ababab"
`\`	Escape character	`\.`	Matches a literal dot "."
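When the text to find comes from a variable, escaping each metacharacter by hand is error-prone. A minimal sketch using re.escape() from the standard library follows; the variable names and sample text are assumptions made for illustration.

```python
import re

user_input = "3.11"  # hypothetical text that should be matched literally
text = "Python 3.11 and build 3x11"

# Unescaped: "." is a metacharacter and matches any character.
unescaped = re.findall(user_input, text)

# Escaped: re.escape() puts a backslash before the metacharacters.
escaped = re.findall(re.escape(user_input), text)

print(unescaped)  # ['3.11', '3x11']
print(escaped)    # ['3.11']
```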

2.2 Dot (.) - Any Character

The dot (.) matches any single character except a newline. With the re.DOTALL flag it matches a newline as well.

def dot_metacharacter():
    """
    ตัวอย่างการใช้จุด (.) metacharacter
    
    จุดจับคู่กับอักขระใดก็ได้ 1 ตัว
    """
    text = "cat, bat, hat, rat, 123"
    
    # รูปแบบ: .at จับคู่อักขระใดก็ได้ตามด้วย "at"
    pattern = r".at"
    matches = re.findall(pattern, text)
    print(f"รูปแบบ '.at' พบ: {matches}")
    
    # รูปแบบ: ..t จับคู่อักขระใดก็ได้ 2 ตัวตามด้วย "t"
    pattern2 = r"..t"
    matches2 = re.findall(pattern2, text)
    print(f"รูปแบบ '..t' พบ: {matches2}")

dot_metacharacter()
# Output: รูปแบบ '.at' พบ: ['cat', 'bat', 'hat', 'rat']
#         รูปแบบ '..t' พบ: ['cat', 'bat', 'hat', 'rat']

2.3 Anchors (^, $) - Start and End

Anchors match a position in the string rather than a character.

def anchors_demo():
    """
    ตัวอย่างการใช้ Anchors (^ และ $)
    
    ^ = จุดเริ่มต้น, $ = จุดสิ้นสุด
    """
    lines = [
        "Python is great",
        "I love Python",
        "Python",
        "Learning Python programming"
    ]
    
    # รูปแบบ: ^Python (ขึ้นต้นด้วย Python)
    pattern_start = r"^Python"
    print("ขึ้นต้นด้วย 'Python':")
    for line in lines:
        if re.search(pattern_start, line):
            print(f"  ✓ {line}")
    
    # รูปแบบ: Python$ (ลงท้ายด้วย Python)
    pattern_end = r"Python$"
    print("\nลงท้ายด้วย 'Python':")
    for line in lines:
        if re.search(pattern_end, line):
            print(f"  ✓ {line}")
    
    # รูปแบบ: ^Python$ (ต้องเป็น Python อย่างเดียว)
    pattern_exact = r"^Python$"
    print("\nเป็น 'Python' อย่างเดียว:")
    for line in lines:
        if re.search(pattern_exact, line):
            print(f"  ✓ {line}")

anchors_demo()
# Output: ขึ้นต้นด้วย 'Python':
#           ✓ Python is great
#           ✓ Python
#         ลงท้ายด้วย 'Python':
#           ✓ I love Python
#           ✓ Python
#         เป็น 'Python' อย่างเดียว:
#           ✓ Python
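The example above tests one line at a time. When several lines sit in a single string, ^ and $ anchor only to the whole string unless re.MULTILINE is passed. A short sketch with an invented three-line text:

```python
import re

text = "Python is great\nI love Python\nPython"

# Default: ^ matches only at the very start of the string,
# $ only at the very end.
start_default = re.findall(r"^Python", text)
end_default = re.findall(r"Python$", text)

# re.MULTILINE: ^ and $ also match at the start and end of each line.
start_multiline = re.findall(r"^Python", text, re.MULTILINE)
end_multiline = re.findall(r"Python$", text, re.MULTILINE)

print(len(start_default), len(start_multiline))  # 1 2
print(len(end_default), len(end_multiline))      # 1 2
```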

2.4 The OR Operator (|)

The pipe (|) matches one of several alternatives.

def or_operator_demo():
    """
    ตัวอย่างการใช้ตัวดำเนินการ OR (|)
    
    จับคู่รูปแบบใดรูปแบบหนึ่ง
    """
    text = "ฉันมี cat และ dog แต่ไม่มี bird"
    
    # จับคู่ "cat" หรือ "dog"
    pattern = r"cat|dog"
    matches = re.findall(pattern, text)
    print(f"พบสัตว์: {matches}")
    
    # จับคู่หลายทางเลือก
    text2 = "สีที่ชอบคือ red, blue, yellow"
    pattern2 = r"red|blue|green|yellow"
    matches2 = re.findall(pattern2, text2)
    print(f"พบสี: {matches2}")

or_operator_demo()
# Output: พบสัตว์: ['cat', 'dog']
#         พบสี: ['red', 'blue', 'yellow']

2.5 Grouping with ()

Parentheses () serve two purposes:

Grouping: treat part of the pattern as a single unit
Capturing: save the matched text for later use

def grouping_demo():
    """
    ตัวอย่างการใช้วงเล็บสำหรับ Grouping และ Capturing
    
    () ใช้จัดกลุ่มและจับภาพข้อมูล
    """
    # ตัวอย่างที่ 1: Grouping กับ Quantifier
    text1 = "ababab abab ab"
    pattern1 = r"(ab)+"  # จับคู่ "ab" ซ้ำ 1 ครั้งขึ้นไป
    matches1 = re.findall(pattern1, text1)
    print(f"รูปแบบ '(ab)+' พบ: {matches1}")
    
    # ตัวอย่างที่ 2: Capturing Groups
    text2 = "วันที่: 25/12/2024"
    pattern2 = r"(\d{2})/(\d{2})/(\d{4})"  # จับภาพ วัน/เดือน/ปี
    match2 = re.search(pattern2, text2)
    if match2:
        print(f"วันที่เต็ม: {match2.group(0)}")
        print(f"วัน: {match2.group(1)}")
        print(f"เดือน: {match2.group(2)}")
        print(f"ปี: {match2.group(3)}")
    
    # ตัวอย่างที่ 3: Non-Capturing Group (?:...)
    text3 = "Mr. Smith, Mrs. Johnson"
    pattern3 = r"(?:Mr|Mrs)\. (\w+)"  # ไม่จับภาพ Mr/Mrs
    matches3 = re.findall(pattern3, text3)
    print(f"นามสกุล: {matches3}")

grouping_demo()
# Output: รูปแบบ '(ab)+' พบ: ['ab', 'ab', 'ab']
#         วันที่เต็ม: 25/12/2024
#         วัน: 25
#         เดือน: 12
#         ปี: 2024
#         นามสกุล: ['Smith', 'Johnson']
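Numbered groups become hard to read as a pattern grows. Python also supports named groups with the (?P<name>...) syntax. The sketch below reuses the date pattern from the example; the group names day, month, and year are chosen here for illustration.

```python
import re

text = "Date: 25/12/2024"
pattern = r"(?P<day>\d{2})/(?P<month>\d{2})/(?P<year>\d{4})"

match = re.search(pattern, text)

# Named groups can be read by name or by number.
day = match.group("day")
year_by_number = match.group(3)

# groupdict() returns all named groups as a dictionary.
parts = match.groupdict()

print(day, year_by_number)
print(parts)
```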

2.6 Character Sets with []

Square brackets [] define a set of acceptable characters. The set matches exactly one character drawn from it.

def character_sets_demo():
    """
    ตัวอย่างการใช้ Character Sets []
    
    [] ใช้จับคู่อักขระใดอักขระหนึ่งในชุด
    """
    text = "สีโปรด: red, blue, green, yellow"
    
    # จับคู่สระ a, e, i, o, u
    pattern1 = r"[aeiou]"
    vowels = re.findall(pattern1, text)
    print(f"สระที่พบ: {vowels}")
    
    # จับคู่ตัวเลข 0-9
    text2 = "รหัส: A1B2C3"
    pattern2 = r"[0-9]"
    digits = re.findall(pattern2, text2)
    print(f"ตัวเลขที่พบ: {digits}")
    
    # จับคู่ตัวอักษร a-z และ A-Z
    pattern3 = r"[a-zA-Z]+"
    letters = re.findall(pattern3, text2)
    print(f"ตัวอักษรที่พบ: {letters}")
    
    # การปฏิเสธด้วย [^...] (จับที่ไม่ใช่ตัวเลข)
    pattern4 = r"[^0-9]+"
    non_digits = re.findall(pattern4, text2)
    print(f"ที่ไม่ใช่ตัวเลข: {non_digits}")

character_sets_demo()
# Output: สระที่พบ: ['e', 'u', 'e', 'e', 'e', 'e', 'o']
#         ตัวเลขที่พบ: ['1', '2', '3']
#         ตัวอักษรที่พบ: ['A', 'B', 'C']
#         ที่ไม่ใช่ตัวเลข: ['รหัส: A', 'B', 'C']

2.7 Predefined Character Classes

Python regex provides commonly used shorthand character classes. For str patterns they are Unicode-aware by default, so the bracket equivalents below hold exactly only when the re.ASCII flag is set:

Shorthand	Meaning	Equivalent (with re.ASCII)	Example
`\d`	Digit	`[0-9]`	Matches 0-9; without re.ASCII also other Unicode digits such as Thai ๐-๙
`\D`	Non-digit	`[^0-9]`	Matches anything except a digit
`\w`	Word character	`[a-zA-Z0-9_]`	Letters, digits, underscore; without re.ASCII also Unicode letters such as Thai consonants
`\W`	Non-word character	`[^a-zA-Z0-9_]`	Punctuation, whitespace
`\s`	Whitespace	`[ \t\n\r\f\v]`	Space, tab, newline
`\S`	Non-whitespace	`[^ \t\n\r\f\v]`	Visible characters
`\b`	Word boundary	-	Position between a word character and a non-word character
`\B`	Non-boundary	-	Position inside a word

def predefined_classes_demo():
    r"""
    Example of the predefined character classes

    Demonstrates \d, \w, \s and their negations
    (raw docstring, so the backslashes are not read as string escapes)
    """
    text = "บัญชี: user_123, รหัส: ABC-456, ราคา: 1,250.50 บาท"
    
    # \d - จับคู่ตัวเลข
    digits = re.findall(r"\d+", text)
    print(f"ตัวเลข (\\d): {digits}")
    
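    # \w - match word characters; re.ASCII limits \w to [a-zA-Z0-9_],
    # otherwise the Thai letters in the text would match as well
    words = re.findall(r"\w+", text, re.ASCII)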
    print(f"อักขระคำ (\\w): {words}")
    
    # \s - จับคู่ช่องว่าง
    spaces = re.findall(r"\s", text)
    print(f"ช่องว่าง (\\s): {len(spaces)} ตัว")
    
    # \b - ขอบคำ (หาคำที่มีตัวเลข)
    words_with_digits = re.findall(r"\b\w*\d\w*\b", text)
    print(f"คำที่มีตัวเลข (\\b): {words_with_digits}")
    
    # ตัวอย่างการใช้ตัวตรงข้าม
    text2 = "Python3.11"
    non_digits = re.findall(r"\D+", text2)  # ที่ไม่ใช่ตัวเลข
    print(f"ที่ไม่ใช่ตัวเลข (\\D): {non_digits}")

predefined_classes_demo()
# Output: ตัวเลข (\d): ['123', '456', '1', '250', '50']
#         อักขระคำ (\w): ['user_123', 'ABC', '456', '1', '250', '50']
#         ช่องว่าง (\s): 6 ตัว
#         คำที่มีตัวเลข (\b): ['user_123', '456', '1', '250', '50']
#         ที่ไม่ใช่ตัวเลข (\D): ['Python', '.']
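Because str patterns are Unicode-aware by default, \d is broader than [0-9]. This matters for Thai text, which may contain Thai digits. A minimal sketch with an invented sample sentence:

```python
import re

text = "ปี ๒๕๖๗ หรือ 2024"

# Default (Unicode) mode: \d matches any Unicode decimal digit.
unicode_digits = re.findall(r"\d+", text)

# re.ASCII restricts \d to 0-9, the same as writing [0-9] explicitly.
ascii_digits = re.findall(r"\d+", text, re.ASCII)
range_digits = re.findall(r"[0-9]+", text)

print(unicode_digits)  # ['๒๕๖๗', '2024']
print(ascii_digits)    # ['2024']
print(range_digits)    # ['2024']
```

Choosing [0-9] or re.ASCII is a reasonable default when validating identifiers such as postal codes that must use Arabic numerals.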

%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#458588','primaryTextColor':'#ebdbb2','primaryBorderColor':'#83a598','lineColor':'#fabd2f','secondaryColor':'#b16286','tertiaryColor':'#689d6a','background':'#282828','mainBkg':'#3c3836','textColor':'#ebdbb2','fontSize':'14px'}}}%%
graph TB
    A["Character Classes
คลาสตัวอักขระ"] --> B["Predefined
กำหนดไว้"]
    A --> C["Custom
กำหนดเอง"]
    
    B --> D["\d (Digits)
ตัวเลข"]
    B --> E["\w (Word)
อักขระคำ"]
    B --> F["\s (Space)
ช่องว่าง"]
    
    C --> G["[aeiou]
ชุดที่กำหนด"]
    C --> H["[^0-9]
ชุดปฏิเสธ"]
    C --> I["[a-z]
ช่วง"]
    
    D --> J["ตรงข้าม: \D"]
    E --> K["ตรงข้าม: \W"]
    F --> L["ตรงข้าม: \S"]
    
    style A fill:#b16286,stroke:#d3869b,color:#ebdbb2
    style B fill:#458588,stroke:#83a598,color:#ebdbb2
    style C fill:#458588,stroke:#83a598,color:#ebdbb2
    style D fill:#689d6a,stroke:#8ec07c,color:#ebdbb2
    style E fill:#689d6a,stroke:#8ec07c,color:#ebdbb2
    style F fill:#689d6a,stroke:#8ec07c,color:#ebdbb2
    style G fill:#d79921,stroke:#fabd2f,color:#282828
    style H fill:#d79921,stroke:#fabd2f,color:#282828
    style I fill:#d79921,stroke:#fabd2f,color:#282828

3. Quantifiers

Quantifiers specify how many times the preceding element may repeat.

3.1 Quantifier Table

Quantifier	Meaning	Mode	Example
`*`	0 or more times	Greedy	`ab*` matches "a", "ab", "abb", "abbb"
`+`	1 or more times	Greedy	`ab+` matches "ab", "abb", "abbb" (not "a")
`?`	0 or 1 time	Greedy	`ab?` matches "a", "ab"
`{m}`	Exactly m times	Exact	`a{3}` matches "aaa"
`{m,}`	m or more times	Greedy	`a{2,}` matches "aa", "aaa", "aaaa", ...
`{m,n}`	From m to n times	Greedy	`a{2,4}` matches "aa", "aaa", "aaaa"
`*?`	0 or more times	Non-greedy	`ab*?` matches as few as possible
`+?`	1 or more times	Non-greedy	`ab+?` matches as few as possible
`??`	0 or 1 time	Non-greedy	`ab??` matches as few as possible
`{m,n}?`	From m to n times	Non-greedy	`a{2,4}?` matches as few as possible

3.2 Asterisk (*) - Zero or More

The asterisk (*) matches the preceding element zero or more times.

def asterisk_quantifier():
    """
    ตัวอย่างการใช้ * (0 ครั้งหรือมากกว่า)
    
    * จับคู่รูปแบบก่อนหน้า 0 ครั้งขึ้นไป
    """
    text = "a ab abb abbb abbbb"
    
    # รูปแบบ: ab* (a ตามด้วย b 0 ครั้งหรือมากกว่า)
    pattern = r"ab*"
    matches = re.findall(pattern, text)
    print(f"รูปแบบ 'ab*' พบ: {matches}")
    
    # ตัวอย่างการใช้กับ \d (ตัวเลข 0 ครั้งขึ้นไป)
    text2 = "ไฟล์: data.txt, report123.pdf, summary.doc"
    pattern2 = r"\w+\d*\.\w+"  # ชื่อไฟล์ที่อาจมีตัวเลข
    files = re.findall(pattern2, text2)
    print(f"ไฟล์ที่พบ: {files}")

asterisk_quantifier()
# Output: รูปแบบ 'ab*' พบ: ['a', 'ab', 'abb', 'abbb', 'abbbb']
#         ไฟล์ที่พบ: ['data.txt', 'report123.pdf', 'summary.doc']

3.3 Plus (+) - One or More

The plus sign (+) matches the preceding element one or more times.

def plus_quantifier():
    """
    ตัวอย่างการใช้ + (1 ครั้งหรือมากกว่า)
    
    + จับคู่รูปแบบก่อนหน้าอย่างน้อย 1 ครั้ง
    """
    text = "a ab abb abbb abbbb"
    
    # รูปแบบ: ab+ (a ตามด้วย b อย่างน้อย 1 ครั้ง)
    pattern = r"ab+"
    matches = re.findall(pattern, text)
    print(f"รูปแบบ 'ab+' พบ: {matches}")
    
    # ตัวอย่าง: ค้นหาคำที่มีตัวอักษรอย่างน้อย 1 ตัว
    text2 = "Hello 123 World 456"
    pattern2 = r"[a-zA-Z]+"  # ตัวอักษร 1 ตัวขึ้นไป
    words = re.findall(pattern2, text2)
    print(f"คำที่พบ: {words}")
    
    # ตัวอย่าง: ค้นหาตัวเลขอย่างน้อย 1 ตัว
    pattern3 = r"\d+"  # ตัวเลข 1 ตัวขึ้นไป
    numbers = re.findall(pattern3, text2)
    print(f"ตัวเลขที่พบ: {numbers}")

plus_quantifier()
# Output: รูปแบบ 'ab+' พบ: ['ab', 'abb', 'abbb', 'abbbb']
#         คำที่พบ: ['Hello', 'World']
#         ตัวเลขที่พบ: ['123', '456']

3.4 Question Mark (?) - Zero or One

The question mark (?) matches the preceding element zero or one time, which makes it optional.

def question_quantifier():
    """
    ตัวอย่างการใช้ ? (0 หรือ 1 ครั้ง)
    
    ? ทำให้รูปแบบก่อนหน้าเป็น optional
    """
    # ตัวอย่างที่ 1: color/colour
    text1 = "I like color and colour"
    pattern1 = r"colou?r"  # u เป็น optional
    matches1 = re.findall(pattern1, text1)
    print(f"รูปแบบ 'colou?r' พบ: {matches1}")
    
    # ตัวอย่างที่ 2: เบอร์โทรที่อาจมี + หน้า
    text2 = "โทร: 0812345678, +66812345678"
    pattern2 = r"\+?\d+"  # + เป็น optional
    phones = re.findall(pattern2, text2)
    print(f"เบอร์โทร: {phones}")
    
    # ตัวอย่างที่ 3: URL ที่อาจมี s ใน https
    text3 = "เว็บไซต์: http://example.com และ https://secure.com"
    pattern3 = r"https?://\S+"  # s เป็น optional
    urls = re.findall(pattern3, text3)
    print(f"URL: {urls}")

question_quantifier()
# Output: รูปแบบ 'colou?r' พบ: ['color', 'colour']
#         เบอร์โทร: ['0812345678', '+66812345678']
#         URL: ['http://example.com', 'https://secure.com']

3.5 Curly Braces {} - A Specified Count

Curly braces {} specify an exact number of repetitions or a range.

def curly_braces_quantifier():
    """
    ตัวอย่างการใช้ {} (จำนวนครั้งที่กำหนด)
    
    {m} = m ครั้งพอดี, {m,n} = m ถึง n ครั้ง
    """
    # ตัวอย่างที่ 1: {m} - จำนวนครั้งพอดี
    text1 = "a aa aaa aaaa aaaaa"
    pattern1 = r"a{3}"  # a พอดี 3 ตัว
    matches1 = re.findall(pattern1, text1)
    print(f"รูปแบบ 'a{{3}}' พบ: {matches1}")
    
    # ตัวอย่างที่ 2: {m,n} - ช่วง
    pattern2 = r"a{2,4}"  # a ระหว่าง 2-4 ตัว
    matches2 = re.findall(pattern2, text1)
    print(f"รูปแบบ 'a{{2,4}}' พบ: {matches2}")
    
    # ตัวอย่างที่ 3: รหัสไปรษณีย์ไทย (5 หลัก)
    text2 = "ที่อยู่: กรุงเทพฯ 10200, เชียงใหม่ 50000, ภูเก็ต 83000"
    pattern3 = r"\d{5}"  # ตัวเลข 5 ตัวพอดี
    postcodes = re.findall(pattern3, text2)
    print(f"รหัสไปรษณีย์: {postcodes}")
    
    # ตัวอย่างที่ 4: หมายเลขโทรศัพท์ (10 หลัก)
    text3 = "ติดต่อ: 0812345678, 021234567"
    pattern4 = r"\b\d{10}\b"  # ตัวเลข 10 ตัวพอดี
    phones = re.findall(pattern4, text3)
    print(f"เบอร์โทร 10 หลัก: {phones}")

curly_braces_quantifier()
# Output: รูปแบบ 'a{3}' พบ: ['aaa', 'aaa', 'aaa']
#         รูปแบบ 'a{2,4}' พบ: ['aa', 'aaa', 'aaaa', 'aaaa']
#         รหัสไปรษณีย์: ['10200', '50000', '83000']
#         เบอร์โทร 10 หลัก: ['0812345678']
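A pattern such as a{3} finds three consecutive characters wherever they occur, including inside a longer run. To accept only runs of exactly that length, the pattern needs boundaries around it, as the 10-digit phone example does with \b. A small sketch on the same sample text:

```python
import re

text = "a aa aaa aaaa aaaaa"

# a{3} alone also matches the first three letters of "aaaa" and "aaaaa".
anywhere = re.findall(r"a{3}", text)

# With word boundaries, only a standalone run of exactly three matches.
whole_runs = re.findall(r"\ba{3}\b", text)

print(anywhere)    # ['aaa', 'aaa', 'aaa']
print(whole_runs)  # ['aaa']
```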

3.6 Greedy vs Non-Greedy (Lazy) Matching

Greedy matching (the default) consumes as much text as possible while still letting the overall pattern match.
Non-greedy matching consumes as little text as possible.

Greedy: (pattern)*
Non-greedy: (pattern)*?

Where:

pattern: the element to repeat
*: the quantifier (any of *, +, ?, {m,n})
?: added after the quantifier to make it non-greedy

def greedy_vs_non_greedy():
    """
    เปรียบเทียบ Greedy และ Non-Greedy Matching
    
    แสดงความแตกต่างระหว่างการจับคู่แบบ Greedy และ Non-Greedy
    """
    html = "<div>Content 1</div><div>Content 2</div>"
    
    # Greedy: จับคู่มากที่สุด
    pattern_greedy = r"<div>.*</div>"
    match_greedy = re.search(pattern_greedy, html)
    print("Greedy (.*) :")
    print(f"  {match_greedy.group() if match_greedy else 'ไม่พบ'}")
    
    # Non-Greedy: จับคู่น้อยที่สุด
    pattern_non_greedy = r"<div>.*?</div>"
    matches_non_greedy = re.findall(pattern_non_greedy, html)
    print("Non-Greedy (.*?) :")
    for i, match in enumerate(matches_non_greedy, 1):
        print(f"  {i}. {match}")
    
    # ตัวอย่างเพิ่มเติม
    text = 'He said "Hello" and "Goodbye"'
    
    # Greedy: จับคู่คำพูดทั้งหมด
    greedy = re.findall(r'".*"', text)
    print(f"\nGreedy (\".*\"): {greedy}")
    
    # Non-Greedy: จับคู่แต่ละคำพูด
    non_greedy = re.findall(r'".*?"', text)
    print(f"Non-Greedy (\".*?\"): {non_greedy}")

greedy_vs_non_greedy()
# Output: Greedy (.*) :
#           <div>Content 1</div><div>Content 2</div>
#         Non-Greedy (.*?) :
#           1. <div>Content 1</div>
#           2. <div>Content 2</div>
#         Greedy (".*"): ['"Hello" and "Goodbye"']
#         Non-Greedy (".*?"): ['"Hello"', '"Goodbye"']
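A common alternative to a non-greedy quantifier is a negated character set that cannot cross the closing delimiter. The sketch below applies it to the quotation example. It is one possible approach, not a replacement for .*? in every case.

```python
import re

text = 'He said "Hello" and "Goodbye"'

# [^"]* matches any run of characters that are not a double quote,
# so each match must stop at the next closing quote.
quoted = re.findall(r'"[^"]*"', text)

# A capturing group returns only the text between the quotes.
inner = re.findall(r'"([^"]*)"', text)

print(quoted)  # ['"Hello"', '"Goodbye"']
print(inner)   # ['Hello', 'Goodbye']
```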

%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#458588','primaryTextColor':'#ebdbb2','primaryBorderColor':'#83a598','lineColor':'#fabd2f','secondaryColor':'#b16286','tertiaryColor':'#689d6a','background':'#282828','mainBkg':'#3c3836','textColor':'#ebdbb2','fontSize':'14px'}}}%%
flowchart LR
    A["Input: <div>A</div><div>B</div>"] --> B{"Quantifier Type"}
    
    B -->|"Greedy
<div>.*</div>"| C["จับคู่มากสุด
Max Match"]
    B -->|"Non-Greedy
<div>.*?</div>"| D["จับคู่น้อยสุด
Min Match"]
    
    C --> E["<div>A</div><div>B</div>
(ทั้งหมด)"]
    D --> F["<div>A</div>
(แต่ละส่วน)"]
    D --> G["<div>B</div>"]
    
    style A fill:#458588,stroke:#83a598,color:#ebdbb2
    style B fill:#b16286,stroke:#d3869b,color:#ebdbb2
    style C fill:#d79921,stroke:#fabd2f,color:#282828
    style D fill:#689d6a,stroke:#8ec07c,color:#ebdbb2
    style E fill:#cc241d,stroke:#fb4934,color:#ebdbb2
    style F fill:#98971a,stroke:#b8bb26,color:#282828
    style G fill:#98971a,stroke:#b8bb26,color:#282828

4. Core Functions of Python's re Module

The re module offers several functions for working with regular expressions, each with its own purpose and behavior.

4.1 re.match() - Match at the Start

re.match() checks whether the pattern matches at the start of the string. It does not require the match to extend to the end of the string.

def match_demo():
    """
    ตัวอย่างการใช้ re.match()
    
    match() จับคู่ที่จุดเริ่มต้นของสตริงเท่านั้น
    """
    # ตัวอย่างที่ 1: จับคู่สำเร็จ (อยู่ต้นสตริง)
    text1 = "Python is great"
    pattern = r"Python"
    match = re.match(pattern, text1)
    print(f"ข้อความ: '{text1}'")
    print(f"รูปแบบ: '{pattern}'")
    print(f"ผลลัพธ์: {match.group() if match else 'ไม่จับคู่'}\n")
    
    # ตัวอย่างที่ 2: จับคู่ไม่สำเร็จ (ไม่ได้อยู่ต้นสตริง)
    text2 = "I love Python"
    match2 = re.match(pattern, text2)
    print(f"ข้อความ: '{text2}'")
    print(f"รูปแบบ: '{pattern}'")
    print(f"ผลลัพธ์: {match2.group() if match2 else 'ไม่จับคู่'}\n")
    
    # ตัวอย่างที่ 3: ตรวจสอบรูปแบบอีเมล
    email = "user@example.com"
    email_pattern = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"
    email_match = re.match(email_pattern, email)
    print(f"อีเมล: '{email}'")
    print(f"ถูกต้อง: {'ใช่' if email_match else 'ไม่'}")

match_demo()
# Output: ข้อความ: 'Python is great'
#         รูปแบบ: 'Python'
#         ผลลัพธ์: Python
#
#         ข้อความ: 'I love Python'
#         รูปแบบ: 'Python'
#         ผลลัพธ์: ไม่จับคู่
#
#         อีเมล: 'user@example.com'
#         ถูกต้อง: ใช่
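Because re.match() anchors only the start, it accepts a string that begins with a valid email and continues with anything else. For validation, re.fullmatch() requires the whole string to fit the pattern. A minimal sketch with an invented candidate string:

```python
import re

email_pattern = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"
candidate = "user@example.com <script>"  # hypothetical bad input

# match() succeeds because the string merely starts with an email.
prefix_ok = re.match(email_pattern, candidate) is not None

# fullmatch() fails because trailing text remains.
whole_ok = re.fullmatch(email_pattern, candidate) is not None

# A clean address passes fullmatch().
clean_ok = re.fullmatch(email_pattern, "user@example.com") is not None

print(prefix_ok, whole_ok, clean_ok)  # True False True
```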

4.2 re.search() - Search Anywhere

re.search() scans the string and returns the first match, wherever in the string it occurs.

def search_demo():
    """
    ตัวอย่างการใช้ re.search()
    
    search() ค้นหาที่ใดก็ได้ในสตริงและคืนค่าการจับคู่แรก
    """
    text = "ฉันรัก Python และ Regex"
    
    # ค้นหา "Python" ที่ไหนก็ได้
    pattern1 = r"Python"
    match1 = re.search(pattern1, text)
    if match1:
        print(f"พบ '{pattern1}' ที่ตำแหน่ง {match1.start()}-{match1.end()}")
    
    # ค้นหาตัวเลข
    text2 = "Python 3.11 เวอร์ชันล่าสุด"
    pattern2 = r"\d+\.\d+"  # รูปแบบเลขเวอร์ชัน
    match2 = re.search(pattern2, text2)
    if match2:
        print(f"พบเวอร์ชัน: {match2.group()}")
    
    # ค้นหาที่ไม่พบ
    pattern3 = r"Java"
    match3 = re.search(pattern3, text)
    print(f"พบ '{pattern3}': {'ใช่' if match3 else 'ไม่'}")

search_demo()
# Output: พบ 'Python' ที่ตำแหน่ง 7-13
#         พบเวอร์ชัน: 3.11
#         พบ 'Java': ไม่

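4.3 re.findall() - All Matches (List)

re.findall() finds all non-overlapping matches and returns them as a list: a list of strings when the pattern has at most one capturing group, or a list of tuples when it has several.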

def findall_demo():
    """
    ตัวอย่างการใช้ re.findall()
    
    findall() คืนค่ารายการของการจับคู่ทั้งหมด
    """
    # ตัวอย่างที่ 1: หาคำทั้งหมด
    text1 = "Python is great, Python is fun"
    pattern1 = r"Python"
    matches1 = re.findall(pattern1, text1)
    print(f"พบ '{pattern1}' จำนวน: {len(matches1)} ครั้ง")
    print(f"รายการ: {matches1}\n")
    
    # ตัวอย่างที่ 2: หาตัวเลขทั้งหมด
    text2 = "ราคา: 100 บาท, ลด 20%, เหลือ 80 บาท"
    pattern2 = r"\d+"
    numbers = re.findall(pattern2, text2)
    print(f"ตัวเลขที่พบ: {numbers}\n")
    
    # ตัวอย่างที่ 3: หาอีเมลทั้งหมด
    text3 = "ติดต่อ: admin@example.com หรือ support@test.co.th"
    pattern3 = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"
    emails = re.findall(pattern3, text3)
    print(f"อีเมล: {emails}")
    
    # ตัวอย่างที่ 4: หาด้วย Groups (คืนค่าเป็น tuple)
    text4 = "วันที่: 25/12/2024, 01/01/2025"
    pattern4 = r"(\d{2})/(\d{2})/(\d{4})"
    dates = re.findall(pattern4, text4)
    print(f"\nวันที่ (tuples): {dates}")
    for day, month, year in dates:
        print(f"  วัน={day}, เดือน={month}, ปี={year}")

findall_demo()
# Output: พบ 'Python' จำนวน: 2 ครั้ง
#         รายการ: ['Python', 'Python']
#
#         ตัวเลขที่พบ: ['100', '20', '80']
#
#         อีเมล: ['admin@example.com', 'support@test.co.th']
#
#         วันที่ (tuples): [('25', '12', '2024'), ('01', '01', '2025')]
#           วัน=25, เดือน=12, ปี=2024
#           วัน=01, เดือน=01, ปี=2025

4.4 re.finditer() - All Matches (Iterator)

re.finditer() finds all non-overlapping matches and returns an iterator of Match objects, which gives access to positions as well as text.

def finditer_demo():
    """
    ตัวอย่างการใช้ re.finditer()
    
    finditer() คืนค่า iterator ของ Match objects
    """
    text = "Python 3.11, Java 17, Ruby 3.2"
    pattern = r"([A-Za-z]+)\s+(\d+(?:\.\d+)?)"
    
    # ใช้ finditer() เพื่อวนซ้ำผลลัพธ์
    print("ภาษาโปรแกรมและเวอร์ชัน:")
    for i, match in enumerate(re.finditer(pattern, text), 1):
        language = match.group(1)
        version = match.group(2)
        start = match.start()
        end = match.end()
        print(f"{i}. {language} v{version} (ตำแหน่ง: {start}-{end})")
    
    # เปรียบเทียบกับ findall()
    print("\nเปรียบเทียบ findall():")
    matches_list = re.findall(pattern, text)
    for i, (lang, ver) in enumerate(matches_list, 1):
        print(f"{i}. {lang} v{ver}")

finditer_demo()
# Output: ภาษาโปรแกรมและเวอร์ชัน:
#         1. Python v3.11 (ตำแหน่ง: 0-11)
#         2. Java v17 (ตำแหน่ง: 13-20)
#         3. Ruby v3.2 (ตำแหน่ง: 22-30)
#
#         เปรียบเทียบ findall():
#         1. Python v3.11
#         2. Java v17
#         3. Ruby v3.2

4.5 re.sub() - Substitution

re.sub() replaces every match with a new string, or with the return value of a function that is called on each Match object.

def sub_demo():
    """
    ตัวอย่างการใช้ re.sub()
    
    sub() แทนที่การจับคู่ด้วยสตริงใหม่
    """
    # ตัวอย่างที่ 1: แทนที่คำเดียว
    text1 = "I love Java and Java is great"
    pattern1 = r"Java"
    result1 = re.sub(pattern1, "Python", text1)
    print(f"เดิม: {text1}")
    print(f"ใหม่: {result1}\n")
    
    # ตัวอย่างที่ 2: ซ่อนหมายเลขบัตรเครดิต
    text2 = "บัตร: 1234-5678-9012-3456"
    pattern2 = r"\d{4}-\d{4}-\d{4}-(\d{4})"
    result2 = re.sub(pattern2, r"****-****-****-\1", text2)
    print(f"เดิม: {text2}")
    print(f"ซ่อน: {result2}\n")
    
    # ตัวอย่างที่ 3: ลบช่องว่างเกิน
    text3 = "Hello    World    !"
    pattern3 = r"\s+"
    result3 = re.sub(pattern3, " ", text3)
    print(f"เดิม: '{text3}'")
    print(f"ใหม่: '{result3}'\n")
    
    # ตัวอย่างที่ 4: ใช้ฟังก์ชันในการแทนที่
    def double_number(match):
        """เพิ่มค่าตัวเลขเป็น 2 เท่า"""
        return str(int(match.group()) * 2)
    
    text4 = "ราคา: 100, ลด 20, เหลือ 80"
    pattern4 = r"\d+"
    result4 = re.sub(pattern4, double_number, text4)
    print(f"เดิม: {text4}")
    print(f"x2:   {result4}")

sub_demo()
# Output: เดิม: I love Java and Java is great
#         ใหม่: I love Python and Python is great
#
#         เดิม: บัตร: 1234-5678-9012-3456
#         ซ่อน: บัตร: ****-****-****-3456
#
#         เดิม: 'Hello    World    !'
#         ใหม่: 'Hello World !'
#
#         เดิม: ราคา: 100, ลด 20, เหลือ 80
#         x2:   ราคา: 200, ลด 40, เหลือ 160
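Two related options are worth knowing: the count argument limits how many replacements re.sub() performs, and re.subn() also reports how many were made. A short sketch reusing the sample strings from the example above:

```python
import re

text = "I love Java and Java is great"

# count=1 replaces only the first match.
first_only = re.sub(r"Java", "Python", text, count=1)

# subn() returns a tuple: (new string, number of replacements).
cleaned, replacements = re.subn(r"\s+", " ", "Hello    World    !")

print(first_only)             # I love Python and Java is great
print(cleaned, replacements)  # Hello World ! 2
```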

4.6 re.split() - Splitting Strings

re.split() splits a string using a regex as the delimiter.

def split_demo():
    """
    ตัวอย่างการใช้ re.split()
    
    split() แยกสตริงตามรูปแบบที่กำหนด
    """
    # ตัวอย่างที่ 1: แยกด้วยช่องว่างหรือจุลภาค
    text1 = "apple, banana orange,pear  grape"
    pattern1 = r"[,\s]+"  # จุลภาคหรือช่องว่าง 1 ตัวขึ้นไป
    parts1 = re.split(pattern1, text1)
    print(f"ข้อความ: {text1}")
    print(f"แยกเป็น: {parts1}\n")
    
    # ตัวอย่างที่ 2: แยกด้วยเครื่องหมายต่าง ๆ
    text2 = "one;two:three,four"
    pattern2 = r"[;:,]"  # ; หรือ : หรือ ,
    parts2 = re.split(pattern2, text2)
    print(f"ข้อความ: {text2}")
    print(f"แยกเป็น: {parts2}\n")
    
    # ตัวอย่างที่ 3: จำกัดจำนวนการแยก
    text3 = "a-b-c-d-e"
    pattern3 = r"-"
    parts3_all = re.split(pattern3, text3)
    parts3_limited = re.split(pattern3, text3, maxsplit=2)
    print(f"ข้อความ: {text3}")
    print(f"แยกทั้งหมด: {parts3_all}")
    print(f"แยก 2 ครั้ง: {parts3_limited}")

split_demo()
# Output: ข้อความ: apple, banana orange,pear  grape
#         แยกเป็น: ['apple', 'banana', 'orange', 'pear', 'grape']
#
#         ข้อความ: one;two:three,four
#         แยกเป็น: ['one', 'two', 'three', 'four']
#
#         ข้อความ: a-b-c-d-e
#         แยกทั้งหมด: ['a', 'b', 'c', 'd', 'e']
#         แยก 2 ครั้ง: ['a', 'b', 'c-d-e']
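By default the delimiters are discarded. If the pattern contains a capturing group, re.split() keeps the captured delimiter text in the result, which helps when the separator itself carries meaning. A minimal sketch on the same sample text:

```python
import re

text = "one;two:three,four"

# Without a group: delimiters are dropped.
without_delimiters = re.split(r"[;:,]", text)

# With a capturing group: each delimiter appears between the parts.
with_delimiters = re.split(r"([;:,])", text)

print(without_delimiters)  # ['one', 'two', 'three', 'four']
print(with_delimiters)     # ['one', ';', 'two', ':', 'three', ',', 'four']
```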

%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#458588','primaryTextColor':'#ebdbb2','primaryBorderColor':'#83a598','lineColor':'#fabd2f','secondaryColor':'#b16286','tertiaryColor':'#689d6a','background':'#282828','mainBkg':'#3c3836','textColor':'#ebdbb2','fontSize':'14px'}}}%%
graph TD
    A["โมดูล re
re Module"] --> B["การค้นหา
Searching"]
    A --> C["การแก้ไข
Modification"]
    A --> D["การแยก
Splitting"]
    
    B --> E["re.match()
จุดเริ่มต้น"]
    B --> F["re.search()
ที่ใดก็ได้"]
    B --> G["re.findall()
ทั้งหมด (list)"]
    B --> H["re.finditer()
ทั้งหมด (iterator)"]
    
    C --> I["re.sub()
แทนที่"]
    
    D --> J["re.split()
แยกสตริง"]
    
    style A fill:#b16286,stroke:#d3869b,color:#ebdbb2
    style B fill:#458588,stroke:#83a598,color:#ebdbb2
    style C fill:#458588,stroke:#83a598,color:#ebdbb2
    style D fill:#458588,stroke:#83a598,color:#ebdbb2
    style E fill:#689d6a,stroke:#8ec07c,color:#ebdbb2
    style F fill:#689d6a,stroke:#8ec07c,color:#ebdbb2
    style G fill:#689d6a,stroke:#8ec07c,color:#ebdbb2
    style H fill:#689d6a,stroke:#8ec07c,color:#ebdbb2
    style I fill:#d79921,stroke:#fabd2f,color:#282828
    style J fill:#d79921,stroke:#fabd2f,color:#282828

5. The Match Object and Its Usage

A Match object is returned by re.search(), re.match(), re.fullmatch(), and by each iteration of re.finditer(). It carries the details of one successful match. When nothing matches, the first three return None, and re.finditer() yields nothing.

5.1 Structure of the Match Object

Python creates a Match object whenever a match succeeds. It exposes attributes and methods for reading the matched text, its position, and the captured groups.

%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#458588','primaryTextColor':'#ebdbb2','primaryBorderColor':'#83a598','lineColor':'#fabd2f','secondaryColor':'#b16286','tertiaryColor':'#689d6a','background':'#282828','mainBkg':'#3c3836','textColor':'#ebdbb2','fontSize':'13px'}}}%%
graph TD
    A["Match Object
วัตถุการจับคู่"] --> B["Attributes
คุณสมบัติ"]
    A --> C["Methods
เมธอด"]
    
    B --> D["string
สตริงต้นฉบับ"]
    B --> E["pos, endpos
ตำแหน่งค้นหา"]
    B --> F["lastindex
กลุ่มสุดท้าย"]
    B --> G["lastgroup
ชื่อกลุ่มสุดท้าย"]
    B --> H["re
Pattern object"]
    
    C --> I["การดึงข้อมูล
Data Extraction"]
    C --> J["การหาตำแหน่ง
Position Finding"]
    C --> K["การจัดการกลุ่ม
Group Management"]
    
    I --> L["group()
groups()
groupdict()"]
    J --> M["start()
end()
span()"]
    K --> N["expand()
__getitem__()"]
    
    style A fill:#b16286,stroke:#d3869b,color:#ebdbb2
    style B fill:#458588,stroke:#83a598,color:#ebdbb2
    style C fill:#458588,stroke:#83a598,color:#ebdbb2
    style I fill:#689d6a,stroke:#8ec07c,color:#ebdbb2
    style J fill:#689d6a,stroke:#8ec07c,color:#ebdbb2
    style K fill:#689d6a,stroke:#8ec07c,color:#ebdbb2
    style L fill:#d79921,stroke:#fabd2f,color:#282828
    style M fill:#d79921,stroke:#fabd2f,color:#282828
    style N fill:#d79921,stroke:#fabd2f,color:#282828

Attributes of the Match Object

Attribute	Type	Description	Example
`string`	`str`	The original string that was searched	`"Python 3.11"`
`pos`	`int`	Index at which the search started	`0`
`endpos`	`int`	Index at which the search stopped	`11`
`lastindex`	`int` or `None`	Number of the last capturing group that matched (`None` if no group matched)	`3`
`lastgroup`	`str` or `None`	Name of the last capturing group that matched (`None` if that group is unnamed or no group matched)	`'year'`
`re`	`Pattern`	The Pattern object that produced the match	`re.compile(...)`

Methods of the Match Object

1. Data extraction methods:

Method	Returns	Description
`group([g1, g2, ...])`	`str` or `tuple`	Text of one or more groups (group 0 is the whole match)
`groups([default])`	`tuple`	All capturing groups, excluding group 0
`groupdict([default])`	`dict`	Named groups as a dictionary
`expand(template)`	`str`	Fills the template with the captured groups

2. Position methods:

Method	Returns	Description
`start([group])`	`int`	Start index of the group
`end([group])`	`int`	End index of the group (index of its last character + 1)
`span([group])`	`tuple`	The pair (start, end)

3. Special methods:

Method	Returns	Description
`__getitem__(g)`	`str`	Access a group with `match[g]`

def match_object_structure():
    """
    สาธิตโครงสร้างและคุณสมบัติของ Match Object
    
    แสดงการเข้าถึง attributes และ methods ต่าง ๆ
    """
    text = "วันเกิด: 25/12/2024 เวลา 14:30"
    pattern = re.compile(r'(?P<day>\d{2})/(?P<month>\d{2})/(?P<year>\d{4})')
    match = re.search(pattern, text)
    
    if match:
        print("=" * 50)
        print("ATTRIBUTES (คุณสมบัติ)")
        print("=" * 50)
        print(f"string:     {match.string!r}")
        print(f"pos:        {match.pos}")
        print(f"endpos:     {match.endpos}")
        print(f"lastindex:  {match.lastindex}")
        print(f"lastgroup:  {match.lastgroup!r}")
        print(f"re:         {match.re.pattern}")
        
        print("\n" + "=" * 50)
        print("METHODS - การดึงข้อมูล")
        print("=" * 50)
        print(f"group(0):   {match.group(0)}")
        print(f"group(1):   {match.group(1)} (day)")
        print(f"group(2):   {match.group(2)} (month)")
        print(f"group(3):   {match.group(3)} (year)")
        print(f"groups():   {match.groups()}")
        print(f"groupdict(): {match.groupdict()}")
        
        print("\n" + "=" * 50)
        print("METHODS - การหาตำแหน่ง")
        print("=" * 50)
        print(f"start():    {match.start()}")
        print(f"end():      {match.end()}")
        print(f"span():     {match.span()}")
        print(f"start(1):   {match.start(1)} (day)")
        print(f"span(1):    {match.span(1)} (day)")
        
        print("\n" + "=" * 50)
        print("เข้าถึงผ่าน Index")
        print("=" * 50)
        print(f"match[0]:   {match[0]}")
        print(f"match[1]:   {match[1]}")
        print(f"match['day']: {match['day']}")

match_object_structure()
# Output: 
# ==================================================
# ATTRIBUTES (คุณสมบัติ)
# ==================================================
# string:     'วันเกิด: 25/12/2024 เวลา 14:30'
# pos:        0
# endpos:     30
# lastindex:  3
# lastgroup:  'year'
# re:         (?P<day>\d{2})/(?P<month>\d{2})/(?P<year>\d{4})
#
# ==================================================
# METHODS - การดึงข้อมูล
# ==================================================
# group(0):   25/12/2024
# group(1):   25 (day)
# group(2):   12 (month)
# group(3):   2024 (year)
# groups():   ('25', '12', '2024')
# groupdict(): {'day': '25', 'month': '12', 'year': '2024'}
#
# ==================================================
# METHODS - การหาตำแหน่ง
# ==================================================
# start():    9
# end():      19
# span():     (9, 19)
# start(1):   9 (day)
# span(1):    (9, 11) (day)
#
# ==================================================
# เข้าถึงผ่าน Index
# ==================================================
# match[0]:   25/12/2024
# match[1]:   25
# match['day']: 25

In-Depth Notes

1. Difference between pos/endpos and start()/end():

def pos_vs_start_demo():
    """
    อธิบายความแตกต่างระหว่าง pos/endpos และ start()/end()
    """
    text = "Hello World"
    pattern = r"World"
    
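    # Search only within indexes 3-11. pos and endpos are parameters of
    # Pattern.search(); the module-level re.search() does not accept them.
    match = re.compile(pattern).search(text, 3, 11)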
    
    if match:
        print("สตริง: 'Hello World'")
        print("ค้นหาในช่วง: ตำแหน่ง 3-11")
        print()
        print(f"pos (เริ่มค้นหา):     {match.pos}")
        print(f"endpos (หยุดค้นหา):   {match.endpos}")
        print(f"start() (จับคู่เริ่ม): {match.start()}")
        print(f"end() (จับคู่จบ):      {match.end()}")
        print()
        print("pos/endpos  = ขอบเขตการค้นหา")
        print("start/end   = ตำแหน่งที่จับคู่จริง")

pos_vs_start_demo()
# Output: สตริง: 'Hello World'
#         ค้นหาในช่วง: ตำแหน่ง 3-11
#         
#         pos (เริ่มค้นหา):     3
#         endpos (หยุดค้นหา):   11
#         start() (จับคู่เริ่ม): 6
#         end() (จับคู่จบ):      11
#         
#         pos/endpos  = ขอบเขตการค้นหา
#         start/end   = ตำแหน่งที่จับคู่จริง
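Passing pos is not the same as slicing the string first, which matters for patterns anchored with ^. The sketch below illustrates the difference on the same "Hello World" string; the variable names are illustrative only.

```python
import re

text = "Hello World"
anchored = re.compile(r"^World")

# pos=6 only moves the starting index; ^ still refers to the real start of the string.
with_pos = anchored.search(text, 6)

# Slicing creates a new string, so ^ matches at its first character.
with_slice = anchored.search(text[6:])

# Pattern.match() anchors at pos, so no ^ is needed here.
match_at_pos = re.compile(r"World").match(text, 6)

print(with_pos)             # None
print(with_slice.span())    # (0, 5)
print(match_at_pos.span())  # (6, 11)
```

Note that positions reported after slicing are relative to the slice, while positions reported with pos stay relative to the original string.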

2. Using expand() templates:

def expand_demo():
    """
    สาธิตการใช้ expand() method
    
    expand() ใช้แทนที่ template ด้วยกลุ่มที่จับได้
    """
    text = "John Doe, jane@example.com, 081-234-5678"
    pattern = r'(\w+)@(\w+)\.(\w+)'
    match = re.search(pattern, text)
    
    if match:
        print("อีเมล:", match.group(0))
        print()
        
        # ใช้ \g<n> หรือ \n ใน template
        template1 = r"User: \1, Domain: \2.\3"
        print("Template 1:", template1)
        print("ผลลัพธ์:", match.expand(template1))
        print()
        
        template2 = r"Email: \g<0>"
        print("Template 2:", template2)
        print("ผลลัพธ์:", match.expand(template2))

expand_demo()
# Output: อีเมล: jane@example.com
#         
#         Template 1: User: \1, Domain: \2.\3
#         ผลลัพธ์: User: jane, Domain: example.com
#         
#         Template 2: Email: \g<0>
#         ผลลัพธ์: Email: jane@example.com
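Templates can also refer to named groups with \g<name>, and re.sub() accepts the same template syntax for its replacement string. A small sketch, assuming a simplified e-mail pattern with the hypothetical group names user and domain:

```python
import re

text = "jane@example.com"
pattern = re.compile(r"(?P<user>\w+)@(?P<domain>[\w.]+)")
match = pattern.search(text)

# \g<name> refers to a named group inside the template.
summary = match.expand(r"\g<user> at \g<domain>")

# re.sub() understands the same template syntax.
rewritten = pattern.sub(r"\g<user> at \g<domain>", text)

print(summary)    # jane at example.com
print(rewritten)  # jane at example.com
```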

3. lastindex and lastgroup:

def lastindex_lastgroup_demo():
    """
    อธิบาย lastindex และ lastgroup
    
    แสดงว่ากลุ่มใดถูกจับคู่สุดท้าย
    """
    # กรณีที่ 1: ทุกกลุ่มถูกจับคู่
    text1 = "2024-12-25"
    pattern1 = r'(\d{4})-(\d{2})-(\d{2})'
    match1 = re.search(pattern1, text1)
    print("กรณีที่ 1: จับคู่ทุกกลุ่ม")
    print(f"  lastindex: {match1.lastindex}")
    print(f"  lastgroup: {match1.lastgroup}")
    print()
    
    # กรณีที่ 2: กลุ่มเป็น optional
    text2 = "Python"
    pattern2 = r'(Python)( \d+\.\d+)?'  # เวอร์ชันเป็น optional
    match2 = re.search(pattern2, text2)
    print("กรณีที่ 2: กลุ่มที่ 2 ไม่ถูกจับคู่")
    print(f"  groups: {match2.groups()}")
    print(f"  lastindex: {match2.lastindex}")
    print(f"  lastgroup: {match2.lastgroup}")
    print()
    
    # กรณีที่ 3: Named groups
    text3 = "2024-12-25"
    pattern3 = r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'
    match3 = re.search(pattern3, text3)
    print("กรณีที่ 3: Named groups")
    print(f"  lastindex: {match3.lastindex}")
    print(f"  lastgroup: {match3.lastgroup!r}")

lastindex_lastgroup_demo()
# Output: กรณีที่ 1: จับคู่ทุกกลุ่ม
#           lastindex: 3
#           lastgroup: None
#         
#         กรณีที่ 2: กลุ่มที่ 2 ไม่ถูกจับคู่
#           groups: ('Python', None)
#           lastindex: 1
#           lastgroup: None
#         
#         กรณีที่ 3: Named groups
#           lastindex: 3
#           lastgroup: 'day'
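A common practical use of lastgroup is a small tokenizer: each alternative of the pattern is a named group, and lastgroup reports which alternative matched. The sketch below is illustrative; the token names (NUMBER, NAME, OP) and the tokenize helper are assumptions, not part of the re module.

```python
import re

# Each alternative is a named group; only one of them participates in any match.
TOKEN_RE = re.compile(r"(?P<NUMBER>\d+)|(?P<NAME>[A-Za-z_]\w*)|(?P<OP>[+\-*/=])")

def tokenize(source):
    """Return (kind, text) pairs; characters that match no alternative are skipped."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(source)]

tokens = tokenize("total = price * 3")
print(tokens)
# [('NAME', 'total'), ('OP', '='), ('NAME', 'price'), ('OP', '*'), ('NUMBER', '3')]
```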

5.2 The .group() Method - Retrieving Matched Text

group() returns the text matched by the whole pattern or by a specific group.

$$
\mathrm{group}(n) =
\begin{cases}
\text{the entire match} & \text{if } n = 0 \\
\text{capturing group } n & \text{if } n > 0
\end{cases}
$$

where:

n = 0: returns the entire match (the default)
n > 0: returns the n-th capturing group

def group_method_demo():
    """
    ตัวอย่างการใช้เมธอด .group()
    
    แสดงการดึงข้อมูลจาก Match Object
    """
    text = "ติดต่อ: โทร 081-234-5678 อีเมล admin@example.com"
    
    # ตัวอย่างที่ 1: group(0) - การจับคู่ทั้งหมด
    pattern1 = r"\d{3}-\d{3}-\d{4}"
    match1 = re.search(pattern1, text)
    if match1:
        print("เบอร์โทร:")
        print(f"  group(0): {match1.group(0)}")  # หรือ match1.group()
    
    # ตัวอย่างที่ 2: Capturing Groups
    pattern2 = r"(\d{3})-(\d{3})-(\d{4})"
    match2 = re.search(pattern2, text)
    if match2:
        print("\nเบอร์โทร (แบ่งกลุ่ม):")
        print(f"  group(0): {match2.group(0)} (ทั้งหมด)")
        print(f"  group(1): {match2.group(1)} (ส่วนที่ 1)")
        print(f"  group(2): {match2.group(2)} (ส่วนที่ 2)")
        print(f"  group(3): {match2.group(3)} (ส่วนที่ 3)")
    
    # Example 3: retrieve several groups at once (guarded, because match2 may be None)
    if match2:
        print("\nดึงหลายกลุ่ม:")
        parts = match2.group(1, 2, 3)
        print(f"  groups (1,2,3): {parts}")

group_method_demo()
# Output: เบอร์โทร:
#           group(0): 081-234-5678
#
#         เบอร์โทร (แบ่งกลุ่ม):
#           group(0): 081-234-5678 (ทั้งหมด)
#           group(1): 081 (ส่วนที่ 1)
#           group(2): 234 (ส่วนที่ 2)
#           group(3): 5678 (ส่วนที่ 3)
#
#         ดึงหลายกลุ่ม:
#           groups (1,2,3): ('081', '234', '5678')
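Two edge cases of group() are worth knowing: a group that exists in the pattern but did not take part in the match returns None, while a group number that does not exist raises IndexError. A short sketch with an illustrative pattern:

```python
import re

match = re.search(r"(Python)( \d+\.\d+)?", "Python is fun")

# Group 2 is optional and did not participate, so it is None.
version = match.group(2)

# Group 3 is not defined in the pattern at all.
try:
    match.group(3)
    missing_group_error = False
except IndexError:
    missing_group_error = True

print(version)              # None
print(missing_group_error)  # True
```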

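5.3 The .groups() Method - Retrieving All Groups

groups() returns a tuple of all capturing groups (group 0 is not included). A group that did not participate in the match appears as None unless a default value is supplied.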

def groups_method_demo():
    """
    ตัวอย่างการใช้เมธอด .groups()
    
    แสดงการดึงกลุ่มทั้งหมดเป็น tuple
    """
    # ตัวอย่างที่ 1: วันที่
    text1 = "วันเกิด: 25/12/2024"
    pattern1 = r"(\d{2})/(\d{2})/(\d{4})"
    match1 = re.search(pattern1, text1)
    if match1:
        all_groups = match1.groups()
        print(f"กลุ่มทั้งหมด: {all_groups}")
        day, month, year = all_groups
        print(f"วัน={day}, เดือน={month}, ปี={year}")
    
    # ตัวอย่างที่ 2: URL
    text2 = "เว็บ: https://www.example.com:8080/path"
    pattern2 = r"(https?)://([^:]+):(\d+)(/.*)?"
    match2 = re.search(pattern2, text2)
    if match2:
        print(f"\nURL Groups: {match2.groups()}")
        protocol, domain, port, path = match2.groups()
        print(f"Protocol: {protocol}")
        print(f"Domain: {domain}")
        print(f"Port: {port}")
        print(f"Path: {path}")

groups_method_demo()
# Output: กลุ่มทั้งหมด: ('25', '12', '2024')
#         วัน=25, เดือน=12, ปี=2024
#
#         URL Groups: ('https', 'www.example.com', '8080', '/path')
#         Protocol: https
#         Domain: www.example.com
#         Port: 8080
#         Path: /path
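When some groups are optional, the default argument of groups() replaces None for the groups that did not participate. The sketch below uses a variant of the URL pattern in which the port and path are optional; this variant is an assumption made for illustration.

```python
import re

# Port and path are both optional in this illustrative variant.
url_pattern = re.compile(r"(https?)://([^:/]+)(?::(\d+))?(/.*)?")
match = url_pattern.search("https://www.example.com")

without_default = match.groups()
with_default = match.groups("")  # use "" instead of None for missing groups

print(without_default)  # ('https', 'www.example.com', None, None)
print(with_default)     # ('https', 'www.example.com', '', '')
```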

5.4 The .groupdict() Method - Named Groups

groupdict() returns a dictionary that maps each named capturing group, written (?P<name>...), to the text it matched.

def groupdict_demo():
    """
    ตัวอย่างการใช้เมธอด .groupdict()
    
    แสดงการใช้ Named Groups (?P<name>...)
    """
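    # Example 1: personal details
    text = "ชื่อ: สมชาย นามสกุล: ใจดี อายุ: 25 ปี"
    # \S+ is used for the names: in the re module \w does not match Thai
    # combining vowel and tone marks (such as the final "ี" in "ใจดี"),
    # so \w+ would stop early and the whole pattern would fail to match.
    pattern = r"ชื่อ: (?P<firstname>\S+) นามสกุล: (?P<lastname>\S+) อายุ: (?P<age>\d+)"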
    match = re.search(pattern, text)
    
    if match:
        # ดึงเป็น dictionary
        info = match.groupdict()
        print("ข้อมูล (dict):", info)
        print(f"ชื่อ: {info['firstname']}")
        print(f"นามสกุล: {info['lastname']}")
        print(f"อายุ: {info['age']} ปี")
        
        # ดึงทีละ field
        print(f"\nดึงทีละ field:")
        print(f"  ชื่อ: {match.group('firstname')}")
        print(f"  นามสกุล: {match.group('lastname')}")
    
    # ตัวอย่างที่ 2: Log entry
    log = "2024-12-25 14:30:45 [ERROR] Connection timeout"
    log_pattern = r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) \[(?P<level>\w+)\] (?P<message>.+)"
    log_match = re.search(log_pattern, log)
    
    if log_match:
        print(f"\nLog Entry:")
        for key, value in log_match.groupdict().items():
            print(f"  {key}: {value}")

groupdict_demo()
# Output: ข้อมูล (dict): {'firstname': 'สมชาย', 'lastname': 'ใจดี', 'age': '25'}
#         ชื่อ: สมชาย
#         นามสกุล: ใจดี
#         อายุ: 25 ปี
#
#         ดึงทีละ field:
#           ชื่อ: สมชาย
#           นามสกุล: ใจดี
#
#         Log Entry:
#           date: 2024-12-25
#           time: 14:30:45
#           level: ERROR
#           message: Connection timeout
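Every value in groupdict() is a string, so a typical next step is converting fields to richer types. The following sketch assumes the log format shown above; the parse_entry helper and the timestamp key are hypothetical names introduced for illustration.

```python
import re
from datetime import datetime

LOG_RE = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"\[(?P<level>\w+)\] (?P<message>.+)"
)

def parse_entry(line):
    """Return a dict with a datetime 'timestamp', or None if the line does not match."""
    match = LOG_RE.fullmatch(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Merge the two captured strings and convert them to a datetime object.
    stamp = f"{fields.pop('date')} {fields.pop('time')}"
    timestamp = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    return {"timestamp": timestamp, **fields}

entry = parse_entry("2024-12-25 14:30:45 [ERROR] Connection timeout")
print(entry["timestamp"].hour, entry["level"], entry["message"])
# 14 ERROR Connection timeout
```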

5.5 The .start(), .end(), and .span() Methods - Positions

These methods report where the match, or a specific group, is located in the string.

def position_methods_demo():
    """
    ตัวอย่างการใช้ .start(), .end(), .span()
    
    แสดงการหาตำแหน่งของการจับคู่
    """
    text = "Python 3.11 is the latest version"
    pattern = r"Python \d+\.\d+"
    match = re.search(pattern, text)
    
    if match:
        print(f"ข้อความ: '{text}'")
        print(f"จับคู่: '{match.group()}'")
        print(f"\nตำแหน่ง:")
        print(f"  start(): {match.start()} (ตำแหน่งเริ่มต้น)")
        print(f"  end():   {match.end()} (ตำแหน่งสิ้นสุด + 1)")
        print(f"  span():  {match.span()} (tuple ของ start-end)")
        
        # ใช้ตำแหน่งเพื่อแยกข้อความ
        start, end = match.span()
        before = text[:start]
        matched = text[start:end]
        after = text[end:]
        print(f"\nแยกข้อความ:")
        print(f"  ก่อน: '{before}'")
        print(f"  จับคู่: '{matched}'")
        print(f"  หลัง: '{after}'")
    
    # ตำแหน่งของแต่ละกลุ่ม
    text2 = "Date: 25/12/2024"
    pattern2 = r"(\d{2})/(\d{2})/(\d{4})"
    match2 = re.search(pattern2, text2)
    
    if match2:
        print(f"\nตำแหน่งแต่ละกลุ่ม:")
        for i in range(4):  # 0-3
            print(f"  group({i}): {match2.group(i)}")
            print(f"    ตำแหน่ง: {match2.start(i)}-{match2.end(i)}")

position_methods_demo()
# Output: ข้อความ: 'Python 3.11 is the latest version'
#         จับคู่: 'Python 3.11'
#
#         ตำแหน่ง:
#           start(): 0 (ตำแหน่งเริ่มต้น)
#           end():   11 (ตำแหน่งสิ้นสุด + 1)
#           span():  (0, 11) (tuple ของ start-end)
#
#         แยกข้อความ:
#           ก่อน: ''
#           จับคู่: 'Python 3.11'
#           หลัง: ' is the latest version'
#
#         ตำแหน่งแต่ละกลุ่ม:
#           group(0): 25/12/2024
#             ตำแหน่ง: 6-16
#           group(1): 25
#             ตำแหน่ง: 6-8
#           group(2): 12
#             ตำแหน่ง: 9-11
#           group(3): 2024
#             ตำแหน่ง: 12-16
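Positions from span() combine naturally with re.finditer(), for example to mark every match under a line of text. This is a minimal sketch; underline_matches is a hypothetical helper, and the alignment is only reliable for text in which every character occupies one column (plain ASCII here).

```python
import re

def underline_matches(text, pattern):
    """Return a marker line with '^' under every character covered by a match."""
    marks = [" "] * len(text)
    for match in re.finditer(pattern, text):
        start, end = match.span()
        marks[start:end] = "^" * (end - start)
    return "".join(marks).rstrip()

line = "Date: 25/12/2024 and 01/01/2025"
marker = underline_matches(line, r"\d{2}/\d{2}/\d{4}")
print(line)
print(marker)
```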

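6. Compiling Regular Expressions

re.compile() turns a regex pattern into a reusable Pattern object, which is useful when the same pattern is applied many times.

6.1 Benefits of Compiling

$$
\text{speedup} = \frac{t_{\text{module-level}}}{t_{\text{precompiled}}}
$$

where:

t_module-level: time taken when calling re.search(pattern, ...) directly. The re module caches recently compiled patterns, so the pattern is normally not recompiled on every call, but each call still pays for a cache lookup.
t_precompiled: time taken when calling compiled_pattern.search(...). The pattern is compiled once and the Pattern object is reused directly.

A speedup of about 2-3x is at most a rough figure for micro-benchmarks in which each search is very short; the actual ratio depends on the pattern, the input, and the Python version.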

import time

def compile_performance():
    """
    เปรียบเทียบประสิทธิภาพระหว่างคอมไพล์และไม่คอมไพล์
    
    แสดงความแตกต่างของเวลาประมวลผล
    """
    text = "Python 3.11 is great" * 10000
    pattern = r"Python \d+\.\d+"
    
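    # Module-level call: the pattern is looked up in the re cache on every call
    start = time.perf_counter()  # perf_counter() is the clock intended for timing
    for _ in range(1000):
        re.search(pattern, text)
    time_no_compile = time.perf_counter() - start
    print(f"ไม่คอมไพล์: {time_no_compile:.4f} วินาที")
    
    # Precompiled Pattern object: compiled once, reused directly
    compiled_pattern = re.compile(pattern)
    start = time.perf_counter()
    for _ in range(1000):
        compiled_pattern.search(text)
    time_compiled = time.perf_counter() - start
    print(f"คอมไพล์:    {time_compiled:.4f} วินาที")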
    
    # เปรียบเทียบ
    improvement = (time_no_compile / time_compiled)
    print(f"\nประสิทธิภาพดีขึ้น: {improvement:.2f}x")

# compile_performance()  # ใช้เวลาในการทดสอบ
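For a more repeatable measurement, the standard timeit module can run each variant many times. The sketch below is one possible setup, not a definitive benchmark: the constants and the two helper functions are illustrative, and the absolute numbers vary by machine and Python version, so no expected timings are shown.

```python
import re
import timeit

PATTERN = r"Python \d+\.\d+"
TEXT = "Python 3.11 is great"
COMPILED = re.compile(PATTERN)

def module_level():
    # The pattern string is resolved through the re cache on each call.
    return re.search(PATTERN, TEXT)

def precompiled():
    # The Pattern object is used directly.
    return COMPILED.search(TEXT)

# Both variants must find the same match; only the call overhead differs.
same_result = module_level().group() == precompiled().group()

t_module = timeit.timeit(module_level, number=50_000)
t_compiled = timeit.timeit(precompiled, number=50_000)
print(f"same result: {same_result}")
print(f"module-level: {t_module:.4f} s, precompiled: {t_compiled:.4f} s")
```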

6.2 Using re.compile()

def compile_usage():
    """
    ตัวอย่างการใช้ re.compile()
    
    แสดงการคอมไพล์และใช้งาน Pattern Object
    """
    # คอมไพล์รูปแบบ
    email_pattern = re.compile(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}')
    
    # ใช้งานเหมือนฟังก์ชันปกติ
    emails = [
        "valid@example.com",
        "invalid@",
        "another.valid@test.co.th",
        "not-an-email"
    ]
    
    print("ตรวจสอบอีเมล:")
    for email in emails:
        if email_pattern.search(email):
            print(f"  ✓ {email}")
        else:
            print(f"  ✗ {email}")
    
    # ใช้กับฟังก์ชันต่าง ๆ
    text = "ติดต่อ: admin@example.com, support@test.com"
    print(f"\nอีเมลที่พบ: {email_pattern.findall(text)}")
    
    # แทนที่อีเมล
    hidden = email_pattern.sub("[EMAIL]", text)
    print(f"ซ่อนอีเมล: {hidden}")

compile_usage()
# Output: ตรวจสอบอีเมล:
#           ✓ valid@example.com
#           ✗ invalid@
#           ✓ another.valid@test.co.th
#           ✗ not-an-email
#
#         อีเมลที่พบ: ['admin@example.com', 'support@test.com']
#         ซ่อนอีเมล: ติดต่อ: [EMAIL], [EMAIL]
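Because the pattern above has no anchors, search() reports success whenever an address occurs anywhere in the input. When the whole string must be an address, Pattern.fullmatch() is the stricter choice. A small sketch with the same pattern:

```python
import re

email_pattern = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

candidate = "see valid@example.com for details"

# search(): is there an address somewhere in the text?
found_inside = email_pattern.search(candidate) is not None

# fullmatch(): is the entire text an address?
is_exact = email_pattern.fullmatch(candidate) is not None

print(found_inside)  # True
print(is_exact)      # False
```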

6.3 Flags

Flags modify how a pattern is matched.

Flag	Full name	Meaning
`re.I`	`re.IGNORECASE`	Case-insensitive matching
`re.M`	`re.MULTILINE`	`^` and `$` also match at the start and end of each line
`re.S`	`re.DOTALL`	`.` also matches a newline
`re.X`	`re.VERBOSE`	Allows comments and whitespace inside the pattern
`re.A`	`re.ASCII`	`\w`, `\d`, `\s`, `\b` (and their negations) match ASCII only
`re.L`	`re.LOCALE`	Locale-dependent matching; valid only with bytes patterns

def flags_demo():
    """
    ตัวอย่างการใช้ Flags
    
    แสดงการใช้ตัวเลือกต่าง ๆ กับ Regex
    """
    # re.IGNORECASE (re.I)
    text1 = "Python PYTHON python PyThOn"
    pattern1 = re.compile(r"python", re.IGNORECASE)
    matches1 = pattern1.findall(text1)
    print(f"IGNORECASE: {matches1}\n")
    
    # re.MULTILINE (re.M): the searched word must begin a line for ^ to matter
    text2 = """Python: line 1
Java: line 2
Python: line 3"""
    pattern2_no_m = re.compile(r"^Python")  # ไม่มี MULTILINE
    pattern2_m = re.compile(r"^Python", re.MULTILINE)  # มี MULTILINE
    print("MULTILINE:")
    print(f"  ไม่มี flag: {len(pattern2_no_m.findall(text2))} รายการ")
    print(f"  มี flag:    {len(pattern2_m.findall(text2))} รายการ\n")
    
    # re.DOTALL (re.S)
    text3 = "Hello\nWorld"
    pattern3_no_s = re.compile(r"Hello.")  # ไม่มี DOTALL
    pattern3_s = re.compile(r"Hello.", re.DOTALL)  # มี DOTALL
    print("DOTALL (. จับคู่ newline):")
    print(f"  ไม่มี flag: {pattern3_no_s.search(text3)}")
    print(f"  มี flag:    {pattern3_s.search(text3).group()!r}\n")
    
    # re.VERBOSE (re.X)
    # อนุญาตให้เขียน Regex แบบมี comment
    phone_pattern = re.compile(r"""
        (\d{3})   # รหัสพื้นที่
        -         # ขีด
        (\d{3})   # 3 หลักถัดไป
        -         # ขีด
        (\d{4})   # 4 หลักสุดท้าย
    """, re.VERBOSE)
    
    text4 = "โทร: 081-234-5678"
    match4 = phone_pattern.search(text4)
    print(f"VERBOSE (อ่านง่าย):")
    print(f"  เบอร์: {match4.group() if match4 else 'ไม่พบ'}")
    
    # รวมหลาย flags ด้วย |
    pattern5 = re.compile(r"^python", re.IGNORECASE | re.MULTILINE)
    matches5 = pattern5.findall(text2)
    print(f"\nรวมหลาย flags: {matches5}")

flags_demo()
# Output: IGNORECASE: ['Python', 'PYTHON', 'python', 'PyThOn']
#
#         MULTILINE:
#           ไม่มี flag: 1 รายการ
#           มี flag:    2 รายการ
#
#         DOTALL (. จับคู่ newline):
#           ไม่มี flag: None
#           มี flag:    'Hello\n'
#
#         VERBOSE (อ่านง่าย):
#           เบอร์: 081-234-5678
#
#         รวมหลาย flags: ['Python', 'Python']
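Two further points about flags can be sketched briefly: re.ASCII changes what \d matches (relevant for text containing Thai digits), and a flag can be written inline at the start of a pattern, for example (?i) for re.IGNORECASE. The sample strings are illustrative.

```python
import re

text = "ราคา ๑๒๓ บาท หรือ 456 baht"

# By default \d matches any Unicode decimal digit, including Thai digits.
unicode_digits = re.findall(r"\d+", text)

# With re.ASCII, \d is restricted to 0-9.
ascii_digits = re.findall(r"\d+", text, re.ASCII)

# (?i) at the start of the pattern is the inline form of re.IGNORECASE.
inline_flag = re.findall(r"(?i)baht", "Baht BAHT baht")

print(unicode_digits)  # ['๑๒๓', '456']
print(ascii_digits)    # ['456']
print(inline_flag)     # ['Baht', 'BAHT', 'baht']
```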

%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#458588','primaryTextColor':'#ebdbb2','primaryBorderColor':'#83a598','lineColor':'#fabd2f','secondaryColor':'#b16286','tertiaryColor':'#689d6a','background':'#282828','mainBkg':'#3c3836','textColor':'#ebdbb2','fontSize':'14px'}}}%%
flowchart TD
    A["Regex Pattern"] --> B{"ต้องใช้หลายครั้ง?"}
    
    B -->|"ใช่
Yes"| C["re.compile(pattern)"]
    B -->|"ไม่
No"| D["ใช้โดยตรง
re.search()"]
    
    C --> E{"ต้องการ Flags?"}
    E -->|"ใช่"| F["เพิ่ม Flags
re.I, re.M, re.S"]
    E -->|"ไม่"| G["Pattern Object"]
    F --> G
    
    G --> H["ใช้งาน
.search()
.findall()
.sub()"]
    D --> I["ผลลัพธ์"]
    H --> I
    
    style A fill:#689d6a,stroke:#8ec07c,color:#ebdbb2
    style B fill:#458588,stroke:#83a598,color:#ebdbb2
    style C fill:#b16286,stroke:#d3869b,color:#ebdbb2
    style E fill:#458588,stroke:#83a598,color:#ebdbb2
    style G fill:#d79921,stroke:#fabd2f,color:#282828
    style I fill:#98971a,stroke:#b8bb26,color:#282828

7. Practical Use Cases

7.1 Data Validation

7.1.1 Email

def validate_email():
    """
    ตรวจสอบรูปแบบอีเมล
    
    ใช้ Regex ตรวจสอบว่าอีเมลถูกต้องหรือไม่
    """
    # รูปแบบอีเมลพื้นฐาน
    email_pattern = re.compile(
        r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    )
    
    test_emails = [
        "user@example.com",           # ถูกต้อง
        "user.name@example.co.th",    # ถูกต้อง
        "user+tag@example.com",       # ถูกต้อง
        "invalid.email@",             # ไม่ถูกต้อง
        "@example.com",               # ไม่ถูกต้อง
        "user@.com",                  # ไม่ถูกต้อง
        "user name@example.com"       # ไม่ถูกต้อง (มีช่องว่าง)
    ]
    
    print("ตรวจสอบอีเมล:")
    for email in test_emails:
        is_valid = bool(email_pattern.match(email))
        status = "✓" if is_valid else "✗"
        print(f"  {status} {email}")

validate_email()
# Output: ตรวจสอบอีเมล:
#           ✓ user@example.com
#           ✓ user.name@example.co.th
#           ✓ user+tag@example.com
#           ✗ invalid.email@
#           ✗ @example.com
#           ✗ user@.com
#           ✗ user name@example.com

7.1.2 Phone Number

def validate_phone():
    """
    ตรวจสอบหมายเลขโทรศัพท์ไทย
    
    รองรับหลายรูปแบบ: 081-234-5678, 0812345678, +66812345678
    """
    # รูปแบบเบอร์โทรไทย
    phone_patterns = {
        "มาตรฐาน": re.compile(r'^\d{10}$'),                           # 0812345678
        "มีขีด": re.compile(r'^\d{3}-\d{3}-\d{4}$'),                  # 081-234-5678
        "รหัสประเทศ": re.compile(r'^\+66\d{9}$'),                    # +66812345678
        "ครบทุกรูปแบบ": re.compile(r'^(\+66|0)\d{1,2}-?\d{3}-?\d{4}$')  # ทุกรูปแบบ
    }
    
    test_phones = [
        "0812345678",
        "081-234-5678",
        "+66812345678",
        "02-123-4567",
        "12345",
        "081 234 5678",
    ]
    
    print("ตรวจสอบเบอร์โทร (ครบทุกรูปแบบ):")
    pattern = phone_patterns["ครบทุกรูปแบบ"]
    for phone in test_phones:
        is_valid = bool(pattern.match(phone))
        status = "✓" if is_valid else "✗"
        print(f"  {status} {phone}")

validate_phone()
# Output: ตรวจสอบเบอร์โทร (ครบทุกรูปแบบ):
#           ✓ 0812345678
#           ✓ 081-234-5678
#           ✓ +66812345678
#           ✓ 02-123-4567
#           ✗ 12345
#           ✗ 081 234 5678

7.1.3 Thai Postal Code

def validate_postal_code():
    """
    ตรวจสอบรหัสไปรษณีย์ไทย
    
    รหัสไปรษณีย์ไทยมี 5 หลัก
    """
    postal_pattern = re.compile(r'^\d{5}$')
    
    test_codes = [
        "10200",   # กรุงเทพฯ
        "50000",   # เชียงใหม่
        "83000",   # ภูเก็ต
        "1020",    # ไม่ถูกต้อง (4 หลัก)
        "102000",  # ไม่ถูกต้อง (6 หลัก)
        "ABCDE",   # ไม่ถูกต้อง (ไม่ใช่ตัวเลข)
    ]
    
    print("ตรวจสอบรหัสไปรษณีย์:")
    for code in test_codes:
        is_valid = bool(postal_pattern.match(code))
        status = "✓" if is_valid else "✗"
        print(f"  {status} {code}")

validate_postal_code()
# Output: ตรวจสอบรหัสไปรษณีย์:
#           ✓ 10200
#           ✓ 50000
#           ✓ 83000
#           ✗ 1020
#           ✗ 102000
#           ✗ ABCDE
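Two subtleties of the pattern ^\d{5}$ are easy to miss: $ also matches just before a trailing newline, and \d accepts any Unicode decimal digit, including Thai digits. If only ASCII digits and no trailing newline should pass, a stricter variant can be sketched as follows; the names loose and strict are illustrative.

```python
import re

loose = re.compile(r"^\d{5}$")
strict = re.compile(r"[0-9]{5}")  # used with fullmatch(), so no anchors are needed

samples = ["10200", "10200\n", "๑๐๒๐๐"]

loose_results = [bool(loose.match(s)) for s in samples]
strict_results = [bool(strict.fullmatch(s)) for s in samples]

print(loose_results)   # [True, True, True]
print(strict_results)  # [True, False, False]
```

Whether Thai digits should be accepted is a requirements question; the point is only that the choice should be deliberate.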

7.2 Data Parsing

7.2.1 Log Files

def parse_log_file():
    """
    แยกวิเคราะห์ไฟล์บันทึก (Log)
    
    แยกข้อมูล timestamp, level, และ message
    """
    # รูปแบบ log: YYYY-MM-DD HH:MM:SS [LEVEL] message
    log_pattern = re.compile(
        r'(?P<date>\d{4}-\d{2}-\d{2}) '
        r'(?P<time>\d{2}:\d{2}:\d{2}) '
        r'\[(?P<level>\w+)\] '
        r'(?P<message>.+)'
    )
    
    sample_logs = """
2024-12-25 10:30:00 [INFO] Application started
2024-12-25 10:30:15 [DEBUG] Loading configuration
2024-12-25 10:31:00 [WARNING] Connection slow
2024-12-25 10:32:45 [ERROR] Failed to connect to database
2024-12-25 10:33:00 [INFO] Retrying connection
    """.strip().split('\n')
    
    print("การแยกวิเคราะห์ Log:")
    for log_line in sample_logs:
        match = log_pattern.match(log_line.strip())
        if match:
            log_data = match.groupdict()
            print(f"[{log_data['level']:7}] {log_data['time']} - {log_data['message']}")

parse_log_file()
# Output: การแยกวิเคราะห์ Log:
#         [INFO   ] 10:30:00 - Application started
#         [DEBUG  ] 10:30:15 - Loading configuration
#         [WARNING] 10:31:00 - Connection slow
#         [ERROR  ] 10:32:45 - Failed to connect to database
#         [INFO   ] 10:33:00 - Retrying connection
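Once log lines can be parsed, summarizing them is a short step. The sketch below counts entries per level with collections.Counter, assuming the same log format; LEVEL_RE is a simplified pattern written for this illustration.

```python
import re
from collections import Counter

# With re.MULTILINE, ^ matches at the start of every line of the log text.
LEVEL_RE = re.compile(r"^\S+ \S+ \[(?P<level>\w+)\]", re.MULTILINE)

logs = """2024-12-25 10:30:00 [INFO] Application started
2024-12-25 10:31:00 [WARNING] Connection slow
2024-12-25 10:32:45 [ERROR] Failed to connect to database
2024-12-25 10:33:00 [INFO] Retrying connection"""

level_counts = Counter(m["level"] for m in LEVEL_RE.finditer(logs))
print(level_counts["INFO"], level_counts["ERROR"], level_counts["DEBUG"])  # 2 1 0
```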

7.2.2 HTML Tags

def parse_html():
    """
    แยกวิเคราะห์ HTML tags
    
    ดึงข้อมูลจาก HTML (ควรใช้ BeautifulSoup สำหรับงานจริง)
    """
    html = """
    <div class="container">
        <h1>Title</h1>
        <p class="content">Paragraph 1</p>
        <p class="content">Paragraph 2</p>
        <a href="https://example.com">Link</a>
    </div>
    """
    
    # ดึง tag และเนื้อหา
    tag_pattern = re.compile(r'<(\w+)[^>]*>([^<]+)</\1>')
    matches = tag_pattern.findall(html)
    
    print("HTML Tags และเนื้อหา:")
    for tag, content in matches:
        print(f"  <{tag}>: {content.strip()}")
    
    # ดึง links
    link_pattern = re.compile(r'<a[^>]+href="([^"]+)"[^>]*>([^<]+)</a>')
    links = link_pattern.findall(html)
    
    print("\nLinks:")
    for url, text in links:
        print(f"  {text} -> {url}")

parse_html()
# Output: HTML Tags และเนื้อหา:
#           <h1>: Title
#           <p>: Paragraph 1
#           <p>: Paragraph 2
#           <a>: Link
#
#         Links:
#           Link -> https://example.com
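The regex for links above only works while the link text contains no nested tags. As the docstring notes, a real parser is the safer choice; BeautifulSoup is one option, and the standard library's html.parser is another. The following is a minimal sketch using html.parser; LinkCollector is a hypothetical class written for this illustration.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect (text, href) pairs for every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None

collector = LinkCollector()
collector.feed('<p>See <a href="https://example.com">the <b>docs</b></a>.</p>')
print(collector.links)  # [('the docs', 'https://example.com')]
```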

7.3 Data Cleaning

def clean_data():
    """
    ทำความสะอาดข้อมูล
    
    ลบอักขระพิเศษ, ช่องว่างเกิน, และจัดรูปแบบ
    """
    # ข้อมูลที่ไม่เรียบร้อย
    dirty_text = """
    Hello!!!   This    is   a   test...
    Email: user@EXAMPLE.com
    Phone: (081)  234-5678
    Price: $1,234.56
    """
    
    print("ก่อนทำความสะอาด:")
    print(dirty_text)
    
    # 1. ลบอักขระพิเศษซ้ำ
    cleaned = re.sub(r'[!.]{2,}', '.', dirty_text)
    
    # 2. ลบช่องว่างเกิน
    cleaned = re.sub(r'\s+', ' ', cleaned)
    
    # 3. แปลงอีเมลเป็นตัวพิมพ์เล็ก
    def lowercase_email(match):
        return match.group(1) + match.group(2).lower()
    
    cleaned = re.sub(r'(Email: )([^\s]+)', lowercase_email, cleaned)
    
    # 4. จัดรูปแบบเบอร์โทร
    cleaned = re.sub(r'\((\d{3})\)\s*(\d{3})-(\d{4})', r'\1-\2-\3', cleaned)
    
    # 5. ลบสัญลักษณ์เงิน
    cleaned = re.sub(r'\$', '', cleaned)
    
    print("\nหลังทำความสะอาด:")
    print(cleaned.strip())

clean_data()
# Output: ก่อนทำความสะอาด:
#         
#             Hello!!!   This    is   a   test...
#             Email: user@EXAMPLE.com
#             Phone: (081)  234-5678
#             Price: $1,234.56
#         
#         
#         หลังทำความสะอาด:
#         Hello. This is a test. Email: user@example.com Phone: 081-234-5678 Price: 1,234.56

7.4 Advanced Search and Replace

def advanced_replacement():
    """
    การแทนที่ที่ซับซ้อนด้วยฟังก์ชัน
    
    แสดงการใช้ฟังก์ชันใน re.sub()
    """
    text = "ราคาสินค้า: Item1=100, Item2=250, Item3=80"
    
    # แทนที่ด้วยการคำนวณ (ลด 10%)
    def apply_discount(match):
        """ลดราคา 10%"""
        price = int(match.group(1))
        discounted = int(price * 0.9)
        return str(discounted)
    
    result = re.sub(r'=(\d+)', lambda m: f"={apply_discount(m)}", text)
    print(f"เดิม: {text}")
    print(f"ลด 10%: {result}")
    
    # แปลงรูปแบบวันที่
    dates = "วันที่: 25/12/2024, 01/01/2025"
    
    def convert_date_format(match):
        """แปลง DD/MM/YYYY เป็น YYYY-MM-DD"""
        day, month, year = match.groups()
        return f"{year}-{month}-{day}"
    
    result_dates = re.sub(r'(\d{2})/(\d{2})/(\d{4})', convert_date_format, dates)
    print(f"\nเดิม: {dates}")
    print(f"ใหม่: {result_dates}")

advanced_replacement()
# Output: เดิม: ราคาสินค้า: Item1=100, Item2=250, Item3=80
#         ลด 10%: ราคาสินค้า: Item1=90, Item2=225, Item3=72
#
#         เดิม: วันที่: 25/12/2024, 01/01/2025
#         ใหม่: วันที่: 2024-12-25, 2025-01-01
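Another frequent use of a replacement function is a dictionary lookup, which replaces many different words in a single pass. The sketch below is illustrative; the ABBREVIATIONS table and the expand_abbreviations helper are assumptions.

```python
import re

ABBREVIATIONS = {"JS": "JavaScript", "PY": "Python", "TS": "TypeScript"}

# One alternation built from the keys; re.escape guards against special characters.
ABBR_RE = re.compile(r"\b(" + "|".join(map(re.escape, ABBREVIATIONS)) + r")\b")

def expand_abbreviations(text):
    """Replace every known abbreviation that appears as a whole word."""
    return ABBR_RE.sub(lambda m: ABBREVIATIONS[m.group(1)], text)

result = expand_abbreviations("PY and TS, but not PYTHON or JSON")
print(result)  # Python and TypeScript, but not PYTHON or JSON
```

The \b word boundaries keep "PY" from matching inside "PYTHON" and "JS" from matching inside "JSON".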

8. Tips and Best Practices

8.1 Always Use Raw Strings (r"")

def raw_string_best_practice():
    """
    เคล็ดลับ: ใช้ Raw String เพื่อหลีกเลี่ยงปัญหา Backslash
    
    r"" ทำให้ไม่ต้อง escape backslash
    """
    # ❌ ไม่ดี: ต้อง escape หลายครั้ง
    bad_pattern = "\\d{3}\\s\\w+"
    
    # ✓ ดี: ใช้ raw string
    good_pattern = r"\d{3}\s\w+"
    
    print(f"ไม่ใช้ raw string: {bad_pattern}")
    print(f"ใช้ raw string:    {good_pattern}")
    print(f"เหมือนกัน: {bad_pattern == good_pattern}")

raw_string_best_practice()
# Output: ไม่ใช้ raw string: \d{3}\s\w+
#         ใช้ raw string:    \d{3}\s\w+
#         เหมือนกัน: True
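The example above shows the verbose but correct alternative to a raw string. The more dangerous case is a backslash sequence that Python itself interprets, such as \b, which is a backspace character in a normal string. A short sketch:

```python
import re

text = "a cat sat"

# In a normal string "\b" is the backspace character, not a word boundary.
without_raw = re.search("\bcat\b", text)

# In a raw string the backslash reaches the regex engine, so \b means word boundary.
with_raw = re.search(r"\bcat\b", text)

print(without_raw)       # None
print(with_raw.group())  # cat
```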

8.2 Compile Patterns That Are Reused

def compile_best_practice():
    """
    เคล็ดลับ: คอมไพล์ Pattern เมื่อต้องใช้หลายครั้ง
    
    ช่วยเพิ่มประสิทธิภาพและทำให้โค้ดอ่านง่าย
    """
    # ❌ Less ideal: the pattern string is passed on every call, so the re cache is consulted each time
    def bad_approach(texts):
        results = []
        for text in texts:
            if re.match(r'^\d{5}$', text):
                results.append(text)
        return results
    
    # ✓ ดี: คอมไพล์ก่อนใช้
    def good_approach(texts):
        pattern = re.compile(r'^\d{5}$')
        results = []
        for text in texts:
            if pattern.match(text):
                results.append(text)
        return results
    
    test_data = ["10200", "ABC", "50000", "12345", "test"]
    print(f"ผลลัพธ์: {good_approach(test_data)}")

compile_best_practice()
# Output: ผลลัพธ์: ['10200', '50000', '12345']

8.3 Use Non-Greedy Quantifiers When Needed

def non_greedy_best_practice():
    """
    Tip: use non-greedy quantifiers (*?, +?, ??, {m,n}?) when matching HTML or XML.
    
    They stop a match from consuming more text than intended.
    """
    html = "<div>First</div><div>Second</div>"
    
    # ❌ ไม่ดี: Greedy (จับคู่มากเกินไป)
    bad_pattern = r"<div>.*</div>"
    bad_result = re.search(bad_pattern, html)
    print(f"Greedy: {bad_result.group()}")
    
    # ✓ ดี: Non-Greedy (จับคู่พอดี)
    good_pattern = r"<div>.*?</div>"
    good_results = re.findall(good_pattern, html)
    print(f"Non-Greedy: {good_results}")

non_greedy_best_practice()
# Output: Greedy: <div>First</div><div>Second</div>
#         Non-Greedy: ['<div>First</div>', '<div>Second</div>']
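When the closing delimiter is a single character, a negated character class is a common alternative to a lazy quantifier. The sketch below compares the three approaches on a made-up attribute string; the sample data and variable names are assumptions for illustration.

```python
import re

line = 'name="Ada" role="admin"'

# Greedy: .* runs to the last quote on the line.
greedy = re.findall(r'"(.*)"', line)

# Lazy: .*? stops at the first closing quote.
lazy = re.findall(r'"(.*?)"', line)

# Negated class: [^"]* cannot cross a quote at all.
negated = re.findall(r'"([^"]*)"', line)

print(greedy)   # ['Ada" role="admin']
print(lazy)     # ['Ada', 'admin']
print(negated)  # ['Ada', 'admin']
```

The lazy and negated-class forms give the same result here. The negated class states the intent ("anything except a quote") directly, while the lazy form is the only option when the terminator is longer than one character, as with `</div>`.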

8.4 Use Named Groups for Clarity

import re

def named_groups_best_practice():
    """
    Tip: use named groups (?P<name>...) to make code readable.

    The name documents what each group captures.
    """
    # ❌ Avoid: numbered groups (hard to read)
    date_text = "Date: 25/12/2024"
    bad_pattern = r"(\d{2})/(\d{2})/(\d{4})"
    bad_match = re.search(bad_pattern, date_text)
    print(f"Numbered: day={bad_match.group(1)}, month={bad_match.group(2)}")

    # ✓ Prefer: named groups (easy to read)
    good_pattern = r"(?P<day>\d{2})/(?P<month>\d{2})/(?P<year>\d{4})"
    good_match = re.search(good_pattern, date_text)
    print(f"Named: day={good_match.group('day')}, month={good_match.group('month')}")

named_groups_best_practice()
# Output: Numbered: day=25, month=12
#         Named: day=25, month=12
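Named groups also carry over to replacement strings and to dictionaries. The sketch below reuses the date pattern from this section; the helper names `to_iso` and `parse_date` are illustrative.

```python
import re

DATE = re.compile(r"(?P<day>\d{2})/(?P<month>\d{2})/(?P<year>\d{4})")

def to_iso(text):
    """Rewrite every dd/mm/yyyy date in text as yyyy-mm-dd."""
    # \g<name> refers to a named group inside a replacement string.
    return DATE.sub(r"\g<year>-\g<month>-\g<day>", text)

def parse_date(text):
    """Return the first date in text as a dict of ints, or None."""
    match = DATE.search(text)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}

print(to_iso("Date: 25/12/2024, 01/01/2025"))  # Date: 2024-12-25, 2025-01-01
print(parse_date("Date: 25/12/2024"))          # {'day': 25, 'month': 12, 'year': 2024}
```

Because the groups are addressed by name, reordering them in the pattern does not require touching the replacement string or the calling code.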

8.5 Test Regexes with Online Tools

Recommended regex testers:

Website	Features	URL
Regex101	Supports multiple regex flavors, detailed explanations	https://regex101.com
RegExr	Interactive, includes a cheat sheet	https://regexr.com
Pythex	Python-specific, quick to test with	https://pythex.org
RegexPal	Simple, easy to use	https://www.regexpal.com

import re

def testing_workflow():
    """
    Regex testing workflow.

    A suggested process for developing and testing a regex.
    """
    print("Regex testing workflow:")
    print("1. Draft the pattern on Regex101.com (Python flavor)")
    print("2. Test it against a variety of sample data")
    print("3. Check edge cases")
    print("4. Copy it into Python")
    print("5. Write unit tests")

    # Example unit test
    def test_email_pattern():
        email_pattern = re.compile(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$')

        # Test cases
        valid_emails = ["user@example.com", "test.name@example.co.th"]
        invalid_emails = ["invalid@", "@example.com", "user@"]

        for email in valid_emails:
            assert email_pattern.match(email), f"should be valid: {email}"

        for email in invalid_emails:
            assert not email_pattern.match(email), f"should be invalid: {email}"

        print("\n✓ All tests passed")

    test_email_pattern()

testing_workflow()
# Output: Regex testing workflow:
#         1. Draft the pattern on Regex101.com (Python flavor)
#         2. Test it against a variety of sample data
#         3. Check edge cases
#         4. Copy it into Python
#         5. Write unit tests
#
#         ✓ All tests passed
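For a project test suite, the same checks can be written with the standard `unittest` module so that each failing address is reported separately. This is a sketch under assumptions: the class and helper names are illustrative, and the pattern is the simplified email pattern from the example above, not a complete email validator.

```python
import re
import unittest

# Same simplified pattern as above; fullmatch() replaces the ^...$ anchors.
EMAIL_PATTERN = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

class EmailPatternTest(unittest.TestCase):
    def test_valid_addresses(self):
        for email in ["user@example.com", "test.name@example.co.th"]:
            # subTest reports every failing input instead of stopping at the first.
            with self.subTest(email=email):
                self.assertIsNotNone(EMAIL_PATTERN.fullmatch(email))

    def test_invalid_addresses(self):
        for email in ["invalid@", "@example.com", "user@", "user@example.com\n"]:
            with self.subTest(email=email):
                self.assertIsNone(EMAIL_PATTERN.fullmatch(email))

def run_tests():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(EmailPatternTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)

if __name__ == "__main__":
    run_tests()
```

The extra invalid case with a trailing newline is an edge case worth testing explicitly, since patterns anchored with `$` accept it while `fullmatch()` does not.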

8.6 Regex Cheat Sheet

%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#458588','primaryTextColor':'#ebdbb2','primaryBorderColor':'#83a598','lineColor':'#fabd2f','secondaryColor':'#b16286','tertiaryColor':'#689d6a','background':'#282828','mainBkg':'#3c3836','textColor':'#ebdbb2','fontSize':'13px'}}}%%
flowchart TB
    A["Regex Cheat Sheet"]
    
    A --> B["Metacharacters"]
    A --> C["Quantifiers"]
    A --> D["Character Classes"]
    A --> E["Groups"]
    A --> F["Anchors"]
    
    B --> B1[". ^ $ | [ ] ( ) { } \ + * ?"]
    C --> C1["* + ? {n} {m,n}"]
    D --> D1["\d \w \s \D \W \S"]
    E --> E1["(...) (?:...) (?P<name>...)"]
    F --> F1["\b \B ^ $"]
    
    style A fill:#b16286,stroke:#d3869b,color:#ebdbb2
    style B fill:#458588,stroke:#83a598,color:#ebdbb2
    style C fill:#458588,stroke:#83a598,color:#ebdbb2
    style D fill:#458588,stroke:#83a598,color:#ebdbb2
    style E fill:#458588,stroke:#83a598,color:#ebdbb2
    style F fill:#458588,stroke:#83a598,color:#ebdbb2
    style B1 fill:#689d6a,stroke:#8ec07c,color:#ebdbb2
    style C1 fill:#689d6a,stroke:#8ec07c,color:#ebdbb2
    style D1 fill:#689d6a,stroke:#8ec07c,color:#ebdbb2
    style E1 fill:#689d6a,stroke:#8ec07c,color:#ebdbb2
    style F1 fill:#689d6a,stroke:#8ec07c,color:#ebdbb2

Detailed Cheat Sheet Tables

1. Metacharacters

Character	Name	Meaning	Example	Matches
`.`	Dot	Any character (except `\n`)	`a.c`	`abc`, `a9c`, `a c`
`^`	Caret	Start of string (or of each line with `re.M`)	`^Hello`	`Hello World` ✓
`$`	Dollar	End of string (or of each line with `re.M`)	`end$`	`The end` ✓
`|`	Pipe	Alternation (OR)	`cat|dog`	`cat` or `dog`
`\`	Backslash	Escapes a special character	`\.`	A literal dot `.`
`[ ]`	Brackets	Character set	`[aeiou]`	Any one vowel
`( )`	Parentheses	Grouping and capturing	`(ab)+`	`ab`, `abab`, `ababab`
`{ }`	Braces	Explicit repetition count	`a{3}`	`aaa`

2. Quantifiers

Quantifier	Name	Repetitions	Greedy	Example	Matches
`*`	Star	0 or more	Yes	`ab*`	`a`, `ab`, `abb`, `abbb`
`+`	Plus	1 or more	Yes	`ab+`	`ab`, `abb`, `abbb`
`?`	Question	0 or 1	Yes	`colou?r`	`color`, `colour`
`{n}`	Exact	Exactly n	-	`\d{3}`	`123`, `456`
`{m,}`	At Least	m or more	Yes	`a{2,}`	`aa`, `aaa`, `aaaa`
`{m,n}`	Range	m to n	Yes	`a{2,4}`	`aa`, `aaa`, `aaaa`
`*?`	Lazy Star	0 or more (as few as possible)	No	`<.*?>`	`<div>` (not `<div>...</div>`)
`+?`	Lazy Plus	1 or more (as few as possible)	No	`.+?`	A single character
`??`	Lazy Question	0 or 1 (prefers 0)	No	`a??`	Skips the `a` when the rest of the pattern allows it

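3. Character Classes

Class	Name	Matches	Equivalent	Example
`\d`	Digit	A digit	`[0-9]`	`0`, `5`, `9`
`\D`	Non-digit	Anything but a digit	`[^0-9]`	`a`, `Z`, `@`
`\w`	Word	A word character	`[a-zA-Z0-9_]`	`a`, `B`, `5`, `_`
`\W`	Non-word	Anything but a word character	`[^a-zA-Z0-9_]`	`@`, `!`, space
`\s`	Whitespace	A whitespace character	`[ \t\n\r\f\v]`	space, `\t`, `\n`
`\S`	Non-whitespace	Anything but whitespace	`[^ \t\n\r\f\v]`	`a`, `1`, `@`
`[abc]`	Set	Any one character in the set	-	`a`, `b`, `c`
`[^abc]`	Negated Set	Any character not in the set	-	`d`, `1`, `@`
`[a-z]`	Range	A range of characters	-	`a` through `z`
`[a-zA-Z0-9]`	Multiple Ranges	Several ranges combined	-	Letters and digits

Note: the bracket equivalents for `\d`, `\w`, and `\s` (and their negations) are exact only with `re.ASCII` or for bytes patterns. For `str` patterns, Python 3 also matches Unicode digits, letters, and whitespace by default.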
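The effect of Unicode matching on `\d` and `\w` can be seen on a short mixed string. This is a small sketch; the sample text (an accented Latin word, a year in Thai digits, and a year in ASCII digits) is an assumption chosen for illustration and is written with escapes so the source stays unambiguous.

```python
import re

# "café", the Thai digits for 2567, and the ASCII digits 2024
text = "caf\u00e9 \u0e52\u0e55\u0e56\u0e57 2024"

# Default str behaviour: \d and \w follow Unicode character properties.
unicode_digits = re.findall(r"\d+", text)
unicode_words = re.findall(r"\w+", text)

# With re.ASCII, \d is [0-9] and \w is [a-zA-Z0-9_].
ascii_digits = re.findall(r"\d+", text, re.ASCII)
ascii_words = re.findall(r"\w+", text, re.ASCII)

print(len(unicode_digits), len(ascii_digits))  # 2 1
print(ascii_digits)                            # ['2024']
print(ascii_words)                             # ['caf', '2024']
```

If input may contain non-ASCII digits and only `0` to `9` should be accepted, either pass `re.ASCII` or write `[0-9]` explicitly.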

4. Groups

Pattern	Name	Description	Example	Access
`(...)`	Capturing Group	Groups and captures	`(\d{2})/(\d{2})`	`match.group(1)`, `match.group(2)`
`(?:...)`	Non-capturing Group	Groups without capturing	`(?:ab)+`	No group is created
`(?P<name>...)`	Named Group	A group with a name	`(?P<year>\d{4})`	`match.group('year')`
`(?P=name)`	Backreference (Named)	Refers back to a named group	`(?P<tag>\w+).*(?P=tag)`	Matches the same tag text again
`\1, \2`	Backreference (Numbered)	Refers back to a group by number	`(\w+) \1`	`word word`
`(?(id/name)yes|no)`	Conditional	Matches `yes` if the given group matched, otherwise `no`	`(<)?\w+(?(1)>)`	Depends on the condition
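Backreferences are easiest to understand on small inputs. The sketch below uses a numbered backreference to find doubled words and a named backreference to pair an opening tag with its own closing tag; the function names and sample strings are illustrative assumptions.

```python
import re

def find_repeated_words(text):
    """Return each word that is immediately repeated, e.g. 'is is'."""
    # \1 must match exactly the text captured by group 1.
    return [m.group(1) for m in re.finditer(r"\b(\w+) \1\b", text)]

def find_paired_tags(html):
    """Return (tag, body) for simple elements whose closing tag matches."""
    # (?P=tag) must match exactly the text captured by the group named tag.
    pattern = re.compile(r"<(?P<tag>\w+)>(?P<body>.*?)</(?P=tag)>")
    return [(m.group("tag"), m.group("body")) for m in pattern.finditer(html)]

print(find_repeated_words("this is is a test test"))
# ['is', 'test']
print(find_paired_tags("<b>bold</b> <i>italic</i> <b>broken</i>"))
# [('b', 'bold'), ('i', 'italic')]
```

The mismatched `<b>broken</i>` is skipped because the backreference requires the closing tag name to equal the opening one.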

5. Anchors and Boundaries

Anchor	Name	Matches at	Example	Description
`^`	Start	Start of string	`^Python`	Must start with Python
`$`	End	End of string	`end$`	Must end with end
`\b`	Word Boundary	A word boundary	`\bcat\b`	`cat` as a whole word
`\B`	Non-word Boundary	Anywhere except a word boundary	`\Bcat\B`	`cat` inside a word, as in `concatenate`
`\A`	Start (absolute)	Start of string only	`\APython`	Like `^`, but unaffected by MULTILINE
`\Z`	End (absolute)	End of string only	`end\Z`	Like `$`, but unaffected by MULTILINE and never matches before a trailing newline
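The difference between `^`/`$` and `\A`/`\Z` shows up with multi-line text and trailing newlines. A minimal sketch, with an illustrative two-line string:

```python
import re

text = "first line\nsecond line\n"

# With MULTILINE, ^ matches at the start of every line ...
caret_multiline = re.findall(r"^\w+", text, re.MULTILINE)
# ... while \A still matches only at the very start of the string.
absolute_start = re.findall(r"\A\w+", text, re.MULTILINE)

# $ matches at the end of the string or just before a final newline.
dollar = re.search(r"line$", text)
# \Z matches only at the very end, so the trailing newline blocks it.
absolute_end = re.search(r"line\Z", text)

print(caret_multiline)       # ['first', 'second']
print(absolute_start)        # ['first']
print(dollar is not None)    # True
print(absolute_end is None)  # True
```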

6. Lookahead and Lookbehind

Pattern	Name	Description	Example	Matches
`(?=...)`	Positive Lookahead	Followed by ...	`\d(?=px)`	`5` in `5px`
`(?!...)`	Negative Lookahead	Not followed by ...	`\d(?!px)`	`5` in `5em`
`(?<=...)`	Positive Lookbehind	Preceded by ...	`(?<=\$)\d+`	`100` in `$100`
`(?<!...)`	Negative Lookbehind	Not preceded by ...	`(?<!\$)\d+`	`100` in `€100`
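Negative lookbehind has a common pitfall: the engine can satisfy it by starting the match one character later. The sketch below reuses the currency example from the table; the variable names are illustrative.

```python
import re

prices = "Price: $100, €200, ¥300"

# Intended: numbers NOT priced in dollars.
# Pitfall: inside "$100" the match can start at the first "0",
# which is preceded by "1" rather than "$".
naive = re.findall(r"(?<!\$)\d+", prices)

# Also forbid a preceding digit, so a match must start a whole number.
strict = re.findall(r"(?<![\$\d])\d+", prices)

print(naive)   # ['00', '200', '300']
print(strict)  # ['200', '300']
```

Python's `re` module requires lookbehind patterns to have a fixed width; the single-character class `[\$\d]` satisfies that requirement.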

7. Flags

Flag	Full name	Description	Example usage
`re.I`	`re.IGNORECASE`	Case-insensitive matching	`re.search(r'python', text, re.I)`
`re.M`	`re.MULTILINE`	`^` and `$` match at each line	`re.findall(r'^Python', text, re.M)`
`re.S`	`re.DOTALL`	`.` also matches `\n`	`re.search(r'<.*>', html, re.S)`
`re.X`	`re.VERBOSE`	Allows comments and whitespace in the pattern	`re.compile(r'\d{3} # code', re.X)`
`re.A`	`re.ASCII`	`\w`, `\d`, `\s`, `\b` match ASCII only	`re.findall(r'\w+', text, re.A)`
`re.L`	`re.LOCALE`	Locale-dependent matching (bytes patterns only)	Rarely used
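Flags can be combined with `|`, and `re.VERBOSE` lets a longer pattern carry its own comments. The sketch below assumes a ten-digit phone format like the `081-234-5678` sample used elsewhere in this section; the names `PHONE`, `LANGUAGE`, and `normalize_phone` are illustrative.

```python
import re

# re.VERBOSE ignores whitespace and allows # comments inside the pattern.
PHONE = re.compile(
    r"""
    (?P<prefix>0\d{2})   # three-digit prefix, e.g. 081
    [-\s]?               # optional separator
    (?P<middle>\d{3})    # next three digits
    [-\s]?               # optional separator
    (?P<last>\d{4})      # final four digits
    """,
    re.VERBOSE,
)

# Several flags are combined with the | operator.
LANGUAGE = re.compile(r"^python$", re.IGNORECASE | re.MULTILINE)

def normalize_phone(text):
    """Return the first phone number in text as digits only, or None."""
    match = PHONE.search(text)
    if match is None:
        return None
    return "".join(match.group("prefix", "middle", "last"))

print(normalize_phone("Call me at 081-234-5678"))  # 0812345678
print(LANGUAGE.findall("Python\nJava\nPYTHON"))    # ['Python', 'PYTHON']
```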

8. Special Sequences

Sequence	Meaning	Example
`\n`	Newline	Line break
`\t`	Tab	Tab character
`\r`	Carriage Return	-
`\f`	Form Feed	-
`\v`	Vertical Tab	-
`\0`	Null	-
`\xHH`	Hex character	`\x41` = `A`
`\uHHHH`	Unicode (16-bit)	`\u0041` = `A`
`\UHHHHHHHH`	Unicode (32-bit)	`\U00000041` = `A`
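These escapes work in both raw and normal strings, but they are decoded by different layers. A short sketch, with illustrative variable names:

```python
import re

# Raw string: the re module itself decodes \x41, \u0042 and \U00000043.
by_re = re.fullmatch(r"\x41\u0042\U00000043", "ABC")

# Normal string: Python decodes the escapes first, so re only sees "ABC".
by_python = re.fullmatch("\x41\u0042\U00000043", "ABC")

# \t in a raw pattern is decoded by re and matches a real tab character.
fields = re.split(r"\t", "name\tage\tcity")

print(by_re is not None, by_python is not None)  # True True
print(fields)                                    # ['name', 'age', 'city']
```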

import re

def comprehensive_cheatsheet_demo():
    """
    Worked examples for the cheat sheet.

    Patterns that contain a backslash are evaluated before the print call,
    because a backslash inside an f-string expression is a SyntaxError
    before Python 3.12.
    """
    # 1. Metacharacters
    print("=== METACHARACTERS ===")
    text1 = "cat and dog, bird or fish"
    print(f"OR (|): {re.findall(r'cat|dog|bird', text1)}")
    print(f"Dot (.): {re.findall(r'c.t', 'cat cut cot')}")

    # 2. Quantifiers
    print("\n=== QUANTIFIERS ===")
    print(f"* (0+): {re.findall(r'ab*', 'a ab abb abbb')}")
    print(f"+ (1+): {re.findall(r'ab+', 'a ab abb abbb')}")
    print(f"? (0-1): {re.findall(r'colou?r', 'color colour')}")
    print(f"{{2,4}}: {re.findall(r'a{2,4}', 'a aa aaa aaaa aaaaa')}")

    # 3. Character Classes
    print("\n=== CHARACTER CLASSES ===")
    text2 = "Call me at 081-234-5678 or email@test.com"
    digits = re.findall(r'\d+', text2)
    words = re.findall(r'\w+', text2)
    print(f"\\d (digits): {digits}")
    print(f"\\w (word): {words}")
    print(f"[a-z]: {re.findall(r'[a-z]+', text2)}")

    # 4. Groups
    print("\n=== GROUPS ===")
    date = "2024-12-25"
    match = re.search(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})', date)
    print(f"Named groups: {match.groupdict()}")

    # 5. Anchors
    print("\n=== ANCHORS ===")
    print(f"^ (start): {bool(re.match(r'^Python', 'Python is great'))}")
    print(f"$ (end): {bool(re.search(r'great$', 'Python is great'))}")
    whole_words = re.findall(r'\bcat\b', 'cat cats concatenate')
    print(f"\\b (boundary): {whole_words}")

    # 6. Lookahead/Lookbehind
    print("\n=== LOOKAHEAD/LOOKBEHIND ===")
    text3 = "Price: $100, €200, ¥300"
    after_dollar = re.findall(r'(?<=\$)\d+', text3)        # digits preceded by $
    before_px = re.findall(r'\d+(?=px)', '5px 10em 15px')  # digits followed by px
    print(f"(?<=\\$): {after_dollar}")
    print(f"(?=px): {before_px}")

    # 7. Flags
    print("\n=== FLAGS ===")
    text4 = "Python\nJava\nPython"
    print(f"re.I: {re.findall(r'python', text4, re.I)}")
    print(f"re.M: {len(re.findall(r'^Python', text4, re.M))} matches")

comprehensive_cheatsheet_demo()
# Output: === METACHARACTERS ===
#         OR (|): ['cat', 'dog', 'bird']
#         Dot (.): ['cat', 'cut', 'cot']
#
#         === QUANTIFIERS ===
#         * (0+): ['a', 'ab', 'abb', 'abbb']
#         + (1+): ['ab', 'abb', 'abbb']
#         ? (0-1): ['color', 'colour']
#         {2,4}: ['aa', 'aaa', 'aaaa', 'aaaa']
#
#         === CHARACTER CLASSES ===
#         \d (digits): ['081', '234', '5678']
#         \w (word): ['Call', 'me', 'at', '081', '234', '5678', 'or', 'email', 'test', 'com']
#         [a-z]: ['all', 'me', 'at', 'or', 'email', 'test', 'com']
#
#         === GROUPS ===
#         Named groups: {'year': '2024', 'month': '12', 'day': '25'}
#
#         === ANCHORS ===
#         ^ (start): True
#         $ (end): True
#         \b (boundary): ['cat']
#
#         === LOOKAHEAD/LOOKBEHIND ===
#         (?<=\$): ['100']
#         (?=px): ['5', '15']
#
#         === FLAGS ===
#         re.I: ['Python', 'Python']
#         re.M: 2 matches

9. History and Evolution of Regular Expressions

%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#458588','primaryTextColor':'#ebdbb2','primaryBorderColor':'#83a598','lineColor':'#fabd2f','secondaryColor':'#b16286','tertiaryColor':'#689d6a','background':'#282828','mainBkg':'#3c3836','textColor':'#ebdbb2','fontSize':'13px'}}}%%
flowchart TD
    subgraph Era1["1940s-1950s
Foundation Era"]
        A["1943: Warren McCulloch
& Walter Pitts
Neural Networks"]
        B["1951: Stephen Kleene
Regular Sets Theory
Kleene Star (*)"]
        A --> B
    end
    
    subgraph Era2["1960s-1970s
Implementation Era"]
        C["1968: Ken Thompson
QED Text Editor
First Regex Implementation"]
        D["1970s: grep
Global Regular Expression Print
Unix Command"]
        B --> C
        C --> D
    end
    
    subgraph Era3["1980s-1990s
Standardization Era"]
        F["1987: Perl
Powerful Regex Engine
PCRE Foundation"]
        E["1992: POSIX.2
Regex Standardization
BRE & ERE"]
        G["1997: PCRE
Perl Compatible
Regular Expressions"]
        D --> F
        F --> E
        E --> G
    end
    
    subgraph Era4["2000s-Present
Modern Era"]
        H["Python re Module
Standard Library
Perl-style Syntax"]
        I["Unicode Support
Internationalization
\\u, \\U"]
        J["Modern Features
Named Groups
Lookahead/Lookbehind"]
        G --> H
        H --> I
        I --> J
    end
    
    style Era1 fill:#458588,stroke:#83a598,color:#ebdbb2
    style Era2 fill:#b16286,stroke:#d3869b,color:#ebdbb2
    style Era3 fill:#689d6a,stroke:#8ec07c,color:#ebdbb2
    style Era4 fill:#d79921,stroke:#fabd2f,color:#282828

Summary

Regular expressions (regex) are a powerful tool for working with text in many forms. Python's re module covers the full range of regex tasks: searching, matching, replacing, and parsing data.

Key points to remember:

Use raw strings (r"") to avoid backslash problems
Compile patterns that are used many times, for speed and readability
Use named groups to keep code readable and maintainable
Watch out for greedy matching; use non-greedy quantifiers (*?, +?, ??) when needed
Test a regex with an online tool before using it in real code

Cautions:

Regex is not suited to parsing complex HTML/XML (use a parser such as BeautifulSoup instead)
An overly complex regex makes code hard to read and hard to maintain
Write unit tests for important regexes
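The HTML caution can be illustrated with nested tags. The sketch below uses the standard library's `html.parser` in place of BeautifulSoup so that it runs without third-party packages; the class name, function name, and sample markup are assumptions for illustration.

```python
import re
from html.parser import HTMLParser

class DivTextCollector(HTMLParser):
    """Collect the text of each top-level <div>, including nested divs."""

    def __init__(self):
        super().__init__()
        self.depth = 0        # how many <div> elements are currently open
        self.current = []     # text pieces of the div being read
        self.results = []     # finished top-level div texts

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self.current))
                self.current = []

    def handle_data(self, data):
        if self.depth > 0:
            self.current.append(data)

def top_level_div_texts(html):
    parser = DivTextCollector()
    parser.feed(html)
    parser.close()
    return parser.results

nested = "<div>outer <div>inner</div> tail</div><div>second</div>"

# The lazy regex stops at the first </div>, cutting the outer element short.
regex_result = re.findall(r"<div>(.*?)</div>", nested)

print(regex_result)                 # ['outer <div>inner', 'second']
print(top_level_div_texts(nested))  # ['outer inner tail', 'second']
```

A regular expression has no way to count nesting depth, which is why a parser is the safer choice once elements can contain elements of the same kind.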

Learning and practicing regex makes text processing faster to write and helps you solve string manipulation problems effectively.

References

Python Official Documentation - re Module
- https://docs.python.org/3/library/re.html
Regular Expression Info
- https://www.regular-expressions.info/
Regex101 - Online Regex Tester
- https://regex101.com/
Python Regex Tutorial - Real Python
- https://realpython.com/regex-python/
Kleene, S.C. (1956) - "Representation of Events in Nerve Nets and Finite Automata"
Friedl, Jeffrey E.F. (2006) - "Mastering Regular Expressions" (3rd Edition), O'Reilly Media
PCRE - Perl Compatible Regular Expressions
- https://www.pcre.org/
Unicode Regular Expressions
- https://unicode.org/reports/tr18/

Note: This document was written for study and as a reference guide. For real use in complex projects, consult the official documentation and test thoroughly before deployment.