5. Data Manipulation

Data manipulation is at the core of working with Pandas. It is the process of transforming, adjusting, and reformatting data into a shape suitable for analysis and for presenting results. This step reshapes both the values and the structure of a dataset so it is ready for more advanced analysis.

graph TB
    Start["Raw Data"] --> Clean["Clean Data"]
    Clean --> Manipulate["Data Manipulation"]
    
    subgraph "Data Manipulation Process"
        Manipulate --> AddDel["Add/Drop Columns"]
        Manipulate --> Apply["Apply Functions"]
        Manipulate --> String["String Operations"]
        Manipulate --> Sort["Sorting"]
    end
    
    AddDel --> Ready["Ready for Analysis"]
    Apply --> Ready
    String --> Ready
    Sort --> Ready
    
    style Start fill:#458588,stroke:#1d2021,stroke-width:2px,color:#ebdbb2
    style Clean fill:#98971a,stroke:#1d2021,stroke-width:2px,color:#ebdbb2
    style Manipulate fill:#d79921,stroke:#1d2021,stroke-width:2px,color:#1d2021
    style AddDel fill:#689d6a,stroke:#1d2021,stroke-width:2px,color:#ebdbb2
    style Apply fill:#689d6a,stroke:#1d2021,stroke-width:2px,color:#ebdbb2
    style String fill:#689d6a,stroke:#1d2021,stroke-width:2px,color:#ebdbb2
    style Sort fill:#689d6a,stroke:#1d2021,stroke-width:2px,color:#ebdbb2
    style Ready fill:#b16286,stroke:#1d2021,stroke-width:2px,color:#ebdbb2

5.1 Adding and Dropping Columns

Adding and dropping columns are basic but essential data-manipulation operations. Adding a column lets us derive new variables from existing data. Dropping a column removes data that the DataFrame no longer needs.

5.1.1 Creating New Columns

Pandas offers several ways to create a new column. The most common is to assign values directly to a new column name.

Ways to create a new column:

Direct assignment - for constants or simple calculations
Calculation from existing columns - derive new values with arithmetic operations
Applying functions - apply more complex logic to the data
Conditional logic - create values based on specified conditions

import pandas as pd
import numpy as np

def create_new_columns_example():
    """
    Example of creating new columns in several ways.
    
    Shows how to derive new columns from existing data,
    including calculations and conditional logic.
    """
    # Create a sample DataFrame
    df = pd.DataFrame({
        'product': ['A', 'B', 'C', 'D', 'E'],
        'price': [100, 150, 200, 120, 180],
        'quantity': [10, 5, 8, 15, 12]
    })
    
    print("Original DataFrame:")
    print(df)
    print("\n" + "="*50 + "\n")
    
    # 1. Create a column by calculation (element-wise multiplication)
    df['total_value'] = df['price'] * df['quantity']
    print("Added column 'total_value' (price × quantity):")
    print(df)
    print("\n" + "="*50 + "\n")
    
    # 2. Create a column with a constant value (broadcast to every row)
    df['currency'] = 'THB'
    print("Added column 'currency' (constant):")
    print(df)
    print("\n" + "="*50 + "\n")
    
    # 3. Create a column from a single condition (if-else)
    df['price_category'] = np.where(
        df['price'] >= 150, 
        'Expensive', 
        'Affordable'
    )
    print("Added column 'price_category' (conditional):")
    print(df)
    print("\n" + "="*50 + "\n")
    
    # 4. Create a column from multiple conditions
    # np.select picks the choice of the first condition that is True for each row
    conditions = [
        (df['price'] < 130),
        (df['price'] >= 130) & (df['price'] < 180),
        (df['price'] >= 180)
    ]
    choices = ['Low', 'Medium', 'High']
    df['price_tier'] = np.select(conditions, choices, default='Unknown')
    
    print("Added column 'price_tier' (multiple conditions):")
    print(df)
    print("\n" + "="*50 + "\n")
    
    # 5. Create a column by combining data (string concatenation)
    df['product_info'] = df['product'] + ' (' + df['currency'] + ')'
    print("Added column 'product_info' (combined data):")
    print(df)
    
    return df

# Try it out
if __name__ == "__main__":
    result_df = create_new_columns_example()

The total value column is computed with the following formula:

$$V_{total} = P \times Q$$

where:

$V_{total}$ = total value
$P$ = price per unit
$Q$ = quantity
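The sketch below is a quick worked check of the formula. It reuses the sample prices and quantities from the example above and multiplies the two columns element-wise. `Series.mul()` is simply the method form of the `*` operator. The sum is printed only to show that the result is ordinary numeric data.

```python
import pandas as pd

# Same sample data as create_new_columns_example()
df = pd.DataFrame({
    'product': ['A', 'B', 'C', 'D', 'E'],
    'price': [100, 150, 200, 120, 180],
    'quantity': [10, 5, 8, 15, 12],
})

# V_total = P x Q, computed for every row at once (no Python loop)
df['total_value'] = df['price'] * df['quantity']

# Series.mul is the method form of the * operator and gives the same values
assert df['total_value'].equals(df['price'].mul(df['quantity']))

print(df['total_value'].tolist())  # [1000, 750, 1600, 1800, 2160]
print(df['total_value'].sum())     # 7310
```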

5.1.2 Dropping Columns

Dropping columns shrinks a DataFrame and removes unneeded data. This reduces memory use and can make later processing faster.

Ways to drop a column:

Method	Syntax	In-place	Use when
drop()	`df.drop('col', axis=1)`	No (returns a new DataFrame)	You want to keep the original DataFrame
drop() in-place	`df.drop('col', axis=1, inplace=True)`	Yes (modifies the original and returns None)	You want to modify the DataFrame directly
del statement	`del df['col']`	Yes	You want to remove a single column quickly
pop()	`df.pop('col')`	Yes	You also need the removed column (returned as a Series)
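The table uses the positional `axis=1` form of `drop()`. This minimal sketch uses a small toy DataFrame to show two related `drop()` options. The `columns=` keyword is equivalent to `axis=1`. Passing `errors='ignore'` skips labels that do not exist instead of raising `KeyError`.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6]})

# columns= is equivalent to passing the labels with axis=1
a = df.drop(columns=['B'])
b = df.drop('B', axis=1)
assert a.equals(b)

# By default, a label that does not exist raises KeyError
try:
    df.drop(columns=['Z'])
except KeyError as exc:
    print('KeyError:', exc)

# errors='ignore' silently skips labels that do not exist
c = df.drop(columns=['B', 'Z'], errors='ignore')
print(list(c.columns))   # ['A', 'C']

# None of the calls above modified the original DataFrame
print(list(df.columns))  # ['A', 'B', 'C']
```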

def drop_columns_example():
    """
    Example of dropping columns in several ways.
    
    Shows both in-place removal and removal
    that returns a new DataFrame.
    """
    # Create a sample DataFrame
    df = pd.DataFrame({
        'A': [1, 2, 3],
        'B': [4, 5, 6],
        'C': [7, 8, 9],
        'D': [10, 11, 12],
        'E': [13, 14, 15]
    })
    
    print("Original DataFrame:")
    print(df)
    print("\n" + "="*50 + "\n")
    
    # 1. Drop a single column (returns a new DataFrame)
    df_dropped = df.drop('E', axis=1)
    print("Dropped column 'E' (new DataFrame returned):")
    print(df_dropped)
    print("\nThe original DataFrame is unchanged:")
    print(df)
    print("\n" + "="*50 + "\n")
    
    # 2. Drop several columns at once
    df_dropped_multi = df.drop(['C', 'D'], axis=1)
    print("Dropped columns 'C' and 'D':")
    print(df_dropped_multi)
    print("\n" + "="*50 + "\n")
    
    # 3. Drop a column in place (on a copy, so df stays intact for the next steps)
    df_inplace = df.copy()
    df_inplace.drop('B', axis=1, inplace=True)  # returns None and modifies df_inplace
    print("Dropped column 'B' in place:")
    print(df_inplace)
    print("\n" + "="*50 + "\n")
    
    # 4. Use the del statement
    df_del = df.copy()
    del df_del['A']
    print("Dropped column 'A' with del:")
    print(df_del)
    print("\n" + "="*50 + "\n")
    
    # 5. Use pop() - also returns the removed column as a Series
    df_pop = df.copy()
    removed_column = df_pop.pop('C')
    print("Dropped column 'C' with pop():")
    print("DataFrame after removal:")
    print(df_pop)
    print("\nValues that were removed:")
    print(removed_column)
    
    return df_dropped, df_inplace

# Try it out
if __name__ == "__main__":
    result1, result2 = drop_columns_example()

5.1.3 Using assign() for Method Chaining

The assign() method is a convenient way to create several columns at once. Because it returns a new DataFrame instead of modifying the original, it can be used in method chaining.
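This short sketch uses a made-up price/quantity DataFrame to show two properties that make `assign()` chain-friendly. First, it returns a new DataFrame and leaves the original unchanged. Second, a later keyword argument can refer to a column created by an earlier one in the same call (supported since pandas 0.23 on Python 3.6+). The 7% VAT factor is only an illustrative assumption.

```python
import pandas as pd

df = pd.DataFrame({'price': [100, 200], 'quantity': [3, 5]})

result = df.assign(
    total=lambda d: d['price'] * d['quantity'],
    # A later keyword can use the column created just above (illustrative 7% VAT)
    total_with_vat=lambda d: d['total'] * 1.07,
)
print(result)

# assign() returned a new DataFrame; df itself still has only its original columns
print(list(df.columns))  # ['price', 'quantity']
```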

def assign_method_example():
    """
    Example of using assign() to create columns with method chaining.
    
    assign() is well suited to creating several columns at once
    and keeps a functional programming style (the original is not modified).
    """
    # Create a sample DataFrame
    df = pd.DataFrame({
        'temperature_celsius': [0, 10, 20, 30, 40],
        'pressure_pascal': [101325, 102000, 103000, 104000, 105000]
    })
    
    print("Original DataFrame:")
    print(df)
    print("\n" + "="*50 + "\n")
    
    # Use assign() to create several columns
    result = df.assign(
        temperature_fahrenheit=lambda x: (x['temperature_celsius'] * 9/5) + 32,
        temperature_kelvin=lambda x: x['temperature_celsius'] + 273.15,
        pressure_atm=lambda x: x['pressure_pascal'] / 101325,
        pressure_bar=lambda x: x['pressure_pascal'] / 100000
    )
    
    print("DataFrame after assign() (4 columns added at once):")
    print(result)
    print("\n" + "="*50 + "\n")
    
    # Continue with method chaining
    result_chained = (df
        .assign(temp_f=lambda x: (x['temperature_celsius'] * 9/5) + 32)
        # pd.cut bins are right-inclusive by default, e.g. (32, 50] -> 'Cold'
        .assign(temp_category=lambda x: pd.cut(
            x['temp_f'], 
            bins=[0, 32, 50, 68, 86, 200],
            labels=['Freezing', 'Cold', 'Cool', 'Warm', 'Hot']
        ))
        .assign(pressure_category=lambda x: np.where(
            x['pressure_pascal'] > 102000,
            'High Pressure',
            'Normal Pressure'
        ))
    )
    
    print("DataFrame after method chaining:")
    print(result_chained)
    
    return result, result_chained

# Try it out
if __name__ == "__main__":
    result1, result2 = assign_method_example()

Temperature conversion formulas:

$$T_{F} = \frac{9}{5} T_{C} + 32$$

$$T_{K} = T_{C} + 273.15$$

where:

$T_{F}$ = temperature in degrees Fahrenheit
$T_{C}$ = temperature in degrees Celsius
$T_{K}$ = temperature in kelvin
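The sketch below checks the formulas numerically. It converts the same Celsius values used in `assign_method_example()` and then converts the Fahrenheit results back to Celsius as a round-trip check. It is a plain vectorized calculation on a Series, used here only for illustration.

```python
import pandas as pd

t_c = pd.Series([0, 10, 20, 30, 40], name='temperature_celsius')

# T_F = (9/5) * T_C + 32
t_f = t_c * 9 / 5 + 32
# T_K = T_C + 273.15
t_k = t_c + 273.15

print(t_f.tolist())  # [32.0, 50.0, 68.0, 86.0, 104.0]
print(t_k.round(2).tolist())

# Round trip: inverting the Fahrenheit formula recovers the Celsius values
t_c_back = (t_f - 32) * 5 / 9
assert (t_c_back - t_c).abs().max() < 1e-9
```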

flowchart LR
    Original["Original DataFrame"] --> Assign["assign() Method"]
    
    subgraph "Column Creation Operations"
        Assign --> Col1["Column 1"]
        Assign --> Col2["Column 2"]
        Assign --> Col3["Column 3"]
    end
    
    Col1 --> Result["New DataFrame"]
    Col2 --> Result
    Col3 --> Result
    
    style Original fill:#458588,stroke:#1d2021,stroke-width:2px,color:#ebdbb2
    style Assign fill:#d79921,stroke:#1d2021,stroke-width:2px,color:#1d2021
    style Col1 fill:#689d6a,stroke:#1d2021,stroke-width:2px,color:#ebdbb2
    style Col2 fill:#689d6a,stroke:#1d2021,stroke-width:2px,color:#ebdbb2
    style Col3 fill:#689d6a,stroke:#1d2021,stroke-width:2px,color:#ebdbb2
    style Result fill:#b16286,stroke:#1d2021,stroke-width:2px,color:#ebdbb2

5.2 Applying Functions

Applying functions is a powerful way to transform data because it lets us run arbitrary logic on each row or each column. The trade-off is speed. A function applied this way runs in Python once per element or row, so it is usually slower than a built-in vectorized operation.

5.2.1 Using the apply() Method

The apply() method is the main tool for applying a function to a DataFrame or Series. On a DataFrame, apply() works along axis=0, passing each column to the function, or along axis=1, passing each row.

How apply() works:

axis=0 - apply the function to each column (default)
axis=1 - apply the function to each row
result_type - controls the shape of the result when axis=1 ('expand', 'reduce', 'broadcast')
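This minimal sketch uses a toy score table to illustrate the three settings listed above. With `axis=0` the function receives one column at a time. With `axis=1` it receives one row at a time. `result_type='expand'` only takes effect with `axis=1`, and it spreads a list returned for each row across new columns. The column names `lowest` and `highest` are arbitrary.

```python
import pandas as pd

df = pd.DataFrame({'math': [85, 92, 78], 'science': [90, 88, 85]})

# axis=0 (default): the function receives each column as a Series
col_range = df.apply(lambda col: col.max() - col.min())
print(col_range.tolist())  # [14, 5]

# axis=1: the function receives each row as a Series
row_total = df.apply(lambda row: row['math'] + row['science'], axis=1)
print(row_total.tolist())  # [175, 180, 163]

# result_type='expand': a list returned per row becomes separate columns
expanded = df.apply(lambda row: [row.min(), row.max()], axis=1, result_type='expand')
expanded.columns = ['lowest', 'highest']
print(expanded)
```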

def apply_method_example():
    """
    Example of using apply() in several ways.
    
    Shows apply() with regular functions and lambda functions,
    working along both axis=0 and axis=1.
    """
    # Create a sample DataFrame - students' exam scores
    df = pd.DataFrame({
        'student': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
        'math': [85, 92, 78, 95, 88],
        'science': [90, 88, 85, 92, 95],
        'english': [88, 85, 92, 90, 87]
    })
    
    print("Original DataFrame (exam scores):")
    print(df)
    print("\n" + "="*50 + "\n")
    
    # 1. Use apply() on a Series (a single column)
    df['math_grade'] = df['math'].apply(lambda x: 'A' if x >= 90 else ('B' if x >= 80 else 'C'))
    print("Added column 'math_grade' with a lambda function:")
    print(df)
    print("\n" + "="*50 + "\n")
    
    # 2. Use a custom function
    def calculate_grade(score):
        """Convert a score into a letter grade."""
        if score >= 90:
            return 'A'
        elif score >= 80:
            return 'B'
        elif score >= 70:
            return 'C'
        elif score >= 60:
            return 'D'
        else:
            return 'F'
    
    df['science_grade'] = df['science'].apply(calculate_grade)
    print("Added column 'science_grade' with a custom function:")
    print(df)
    print("\n" + "="*50 + "\n")
    
    # 3. Use apply() on rows (axis=1)
    # df[subject_cols].mean(axis=1) gives the same result and is vectorized
    subject_cols = ['math', 'science', 'english']
    df['average'] = df[subject_cols].apply(lambda row: row.mean(), axis=1)
    print("Computed the average with apply(axis=1):")
    print(df)
    print("\n" + "="*50 + "\n")
    
    # 4. Use apply() with more complex row logic
    def evaluate_student(row):
        """Evaluate a student's overall performance."""
        avg = (row['math'] + row['science'] + row['english']) / 3
        if avg >= 90:
            return f"{row['student']}: Outstanding"
        elif avg >= 80:
            return f"{row['student']}: Good"
        else:
            return f"{row['student']}: Pass"
    
    df['evaluation'] = df.apply(evaluate_student, axis=1)
    print("Evaluation from a more complex function:")
    print(df[['student', 'average', 'evaluation']])
    print("\n" + "="*50 + "\n")
    
    # 5. Use apply() to create several columns at once
    def calculate_stats(row):
        """Compute several statistics; returning a Series yields one column per key."""
        scores = [row['math'], row['science'], row['english']]
        return pd.Series({
            'max_score': max(scores),
            'min_score': min(scores),
            'score_range': max(scores) - min(scores)
        })
    
    stats_df = df.apply(calculate_stats, axis=1)
    result = pd.concat([df, stats_df], axis=1)
    print("Created several columns at once:")
    print(result[['student', 'max_score', 'min_score', 'score_range']])
    
    return df, result

# Try it out
if __name__ == "__main__":
    df_result, full_result = apply_method_example()

Formula for the mean:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_{i}$$

where:

$\bar{x}$ = mean (average)
$n$ = number of values
$x_{i}$ = the i-th value
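The formula maps directly onto pandas. This sketch uses the first three students' scores from the earlier example. It computes each row's mean by hand as a sum divided by $n$ and compares the result with the built-in `DataFrame.mean(axis=1)`, which avoids calling a Python function once per row.

```python
import pandas as pd

scores = pd.DataFrame({
    'math': [85, 92, 78],
    'science': [90, 88, 85],
    'english': [88, 85, 92],
}, index=['Alice', 'Bob', 'Charlie'])

# Direct translation of the formula: sum of the n values in each row, divided by n
n = scores.shape[1]
manual_mean = scores.sum(axis=1) / n

# Built-in row-wise mean (vectorized)
builtin_mean = scores.mean(axis=1)

assert (manual_mean - builtin_mean).abs().max() < 1e-12
print(builtin_mean.round(2))
```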

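5.2.2 Using map() and replace()

The map() method converts every value in a Series using a dictionary or a function. The replace() method substitutes only specific values. The two also differ in how they treat values missing from the dictionary: map() turns them into NaN, while replace() leaves them unchanged.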
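This minimal sketch shows how `map()` and `replace()` handle a value that has no entry in the dictionary. The `'OPS'` department code is a made-up example. Chaining `map()` with `fillna()` is one common way to keep unmapped values.

```python
import pandas as pd

codes = pd.Series(['IT', 'HR', 'OPS'])
mapping = {'IT': 'Information Technology', 'HR': 'Human Resources'}

# map(): a value with no key in the dict becomes NaN
mapped = codes.map(mapping)
print(mapped.tolist())

# replace(): a value with no key in the dict is left as it was
replaced = codes.replace(mapping)
print(replaced.tolist())  # ['Information Technology', 'Human Resources', 'OPS']

# map() followed by fillna() falls back to the original value
mapped_with_fallback = codes.map(mapping).fillna(codes)
```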

def map_replace_example():
    """
    Example of using map() and replace().
    
    map() - converts every value according to the given mapping
    replace() - substitutes only specific values
    """
    # Create a sample DataFrame - employee data
    df = pd.DataFrame({
        'employee': ['John', 'Sarah', 'Mike', 'Emma', 'David'],
        'department_code': ['IT', 'HR', 'IT', 'FN', 'HR'],
        'salary_level': [3, 2, 4, 3, 2],
        'status': ['active', 'active', 'inactive', 'active', 'active']
    })
    
    print("Original DataFrame:")
    print(df)
    print("\n" + "="*50 + "\n")
    
    # 1. Use map() with a dictionary
    department_map = {
        'IT': 'Information Technology',
        'HR': 'Human Resources',
        'FN': 'Finance',
        'MK': 'Marketing'
    }
    df['department'] = df['department_code'].map(department_map)
    print("Converted department codes with map():")
    print(df)
    print("\n" + "="*50 + "\n")
    
    # 2. Use map() with a dictionary keyed by numbers
    # (a level missing from salary_map would become NaN)
    salary_map = {1: 20000, 2: 30000, 3: 45000, 4: 60000, 5: 80000}
    df['salary'] = df['salary_level'].map(salary_map)
    print("Converted salary levels to amounts with map():")
    print(df)
    print("\n" + "="*50 + "\n")
    
    # 3. Use map() with a lambda (a function applied to each value)
    df['salary_formatted'] = df['salary'].map(lambda x: f'{x:,.0f} THB')
    print("Formatted salaries with map() and a lambda:")
    print(df[['employee', 'salary', 'salary_formatted']])
    print("\n" + "="*50 + "\n")
    
    # 4. Use replace() to substitute a single value
    df_replaced = df.copy()
    df_replaced['status'] = df_replaced['status'].replace('inactive', 'not_working')
    print("Replaced a single value with replace():")
    print(df_replaced[['employee', 'status']])
    print("\n" + "="*50 + "\n")
    
    # 5. Use replace() to substitute several values at once
    # (Thai labels: any string, including non-ASCII text, can be a replacement)
    df_replaced2 = df.copy()
    status_mapping = {
        'active': 'กำลังทำงาน',    # "working"
        'inactive': 'ไม่ได้ทำงาน'  # "not working"
    }
    df_replaced2['status_thai'] = df_replaced2['status'].replace(status_mapping)
    print("Replaced several values with replace():")
    print(df_replaced2[['employee', 'status', 'status_thai']])
    print("\n" + "="*50 + "\n")
    
    # 6. Use replace() with a regex: keep the first 2 characters and the domain
    df_regex = pd.DataFrame({
        'text': ['test@email.com', 'user@mail.com', 'admin@site.com']
    })
    df_regex['masked_email'] = df_regex['text'].replace(
        r'^(.{2}).*(@.*)$', 
        r'\1***\2', 
        regex=True
    )
    print("Masked email addresses with replace() and a regex:")
    print(df_regex)
    
    return df, df_replaced, df_regex

# Try it out
if __name__ == "__main__":
    df1, df2, df3 = map_replace_example()

5.2.3 Using applymap() on an Entire DataFrame

The applymap() method applies a function to every cell of a DataFrame. In pandas 2.1.0 it was renamed to DataFrame.map(), and applymap() is deprecated from that version on.
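The `applymap_example()` function in this section chooses between the two methods with `try`/`except AttributeError`. This alternative sketch uses a hypothetical helper named `elementwise` that checks for `DataFrame.map` once with `hasattr`. It also passes `na_action='ignore'`, which both `DataFrame.map` and `applymap` (since pandas 1.2) accept, so that NaN cells are left untouched instead of being passed to the function.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.234, np.nan], 'b': [5.678, 9.0]})

def elementwise(frame, func, **kwargs):
    """Hypothetical helper: use DataFrame.map when it exists (pandas >= 2.1), else applymap."""
    if hasattr(pd.DataFrame, 'map'):
        return frame.map(func, **kwargs)
    return frame.applymap(func, **kwargs)

# na_action='ignore' keeps NaN as NaN instead of calling the lambda on it
formatted = elementwise(df, lambda x: f'{x:.1f}', na_action='ignore')
print(formatted)
```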

def applymap_example():
    """
    Example of applymap() (DataFrame.map() in newer versions).
    
    Applies a function to every cell of a DataFrame;
    useful for reformatting values or simple data cleaning.
    """
    # Create a sample DataFrame - product prices at different stores
    df = pd.DataFrame({
        'Store_A': [100.5, 250.75, 150.25],
        'Store_B': [105.0, 245.50, 155.00],
        'Store_C': [98.75, 255.25, 148.50]
    }, index=['Product_1', 'Product_2', 'Product_3'])
    
    print("Original DataFrame (product prices):")
    print(df)
    print("\n" + "="*50 + "\n")
    
    # 1. Use map() (newer versions) or applymap() (older versions)
    # Round every value to an integer
    try:
        # For pandas >= 2.1.0
        df_rounded = df.map(lambda x: round(x))
    except AttributeError:
        # For pandas < 2.1.0 (applymap is deprecated from 2.1.0 onward)
        df_rounded = df.applymap(lambda x: round(x))
    
    print("Rounded every value to an integer:")
    print(df_rounded)
    print("\n" + "="*50 + "\n")
    
    # 2. Format as currency
    try:
        df_formatted = df.map(lambda x: f'฿{x:,.2f}')
    except AttributeError:
        df_formatted = df.applymap(lambda x: f'฿{x:,.2f}')
    
    print("Formatted as currency:")
    print(df_formatted)
    print("\n" + "="*50 + "\n")
    
    # 3. Convert to a percentage of the highest price
    max_price = df.max().max()
    try:
        df_percentage = df.map(lambda x: f'{(x/max_price)*100:.1f}%')
    except AttributeError:
        df_percentage = df.applymap(lambda x: f'{(x/max_price)*100:.1f}%')
    
    print(f"Converted to a percentage of the highest price ({max_price:.2f}):")
    print(df_percentage)
    print("\n" + "="*50 + "\n")
    
    # 4. A more complex function - price categories
    def price_category(price):
        """Assign a price category."""
        if price < 100:
            return '💰 Cheap'
        elif price < 200:
            return '💰💰 Moderate'
        else:
            return '💰💰💰 Expensive'
    
    try:
        df_category = df.map(price_category)
    except AttributeError:
        df_category = df.applymap(price_category)
    
    print("Converted to price categories:")
    print(df_category)
    
    return df, df_formatted, df_category

# Try it out
if __name__ == "__main__":
    df_original, df_format, df_cat = applymap_example()

graph TD
    Input["Input Data"] --> Choice{"Choose a Method"}
    
    Choice -->|"Single column"| Apply["Series.apply()"]
    Choice -->|"Row-wise logic"| ApplyRow["DataFrame.apply(axis=1)"]
    Choice -->|"Mapping values"| Map["Series.map()"]
    Choice -->|"Replace specific values"| Replace["replace()"]
    Choice -->|"Entire DataFrame"| ApplyMap["DataFrame.map() / applymap()"]
    
    Apply --> Output["Output"]
    ApplyRow --> Output
    Map --> Output
    Replace --> Output
    ApplyMap --> Output
    
    style Input fill:#458588,stroke:#1d2021,stroke-width:2px,color:#ebdbb2
    style Choice fill:#d79921,stroke:#1d2021,stroke-width:2px,color:#1d2021
    style Apply fill:#689d6a,stroke:#1d2021,stroke-width:2px,color:#ebdbb2
    style ApplyRow fill:#689d6a,stroke:#1d2021,stroke-width:2px,color:#ebdbb2
    style Map fill:#689d6a,stroke:#1d2021,stroke-width:2px,color:#ebdbb2
    style Replace fill:#689d6a,stroke:#1d2021,stroke-width:2px,color:#ebdbb2
    style ApplyMap fill:#689d6a,stroke:#1d2021,stroke-width:2px,color:#ebdbb2
    style Output fill:#b16286,stroke:#1d2021,stroke-width:2px,color:#ebdbb2

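Comparison of apply(), map(), replace(), and applymap():

Method	Works on	Accepts	Use when	Performance
apply()	DataFrame/Series	Function	You need complex logic	Usually slower, since the function runs in Python per element or row (but very flexible)
map()	Series (see the last row for DataFrames)	Dict/Series/Function	One-to-one value conversion	Typically fast with a dict or Series mapping
replace()	DataFrame/Series	Value/List/Dict/Regex	Substituting specific values	Typically fast
applymap() / DataFrame.map()	DataFrame	Function	Transforming every cell	Slow on large data (one Python call per cell); prefer vectorized operations when possible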
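Because `apply()` calls a Python function once per element, it is often worth looking for a vectorized alternative. As one example, this sketch takes the grade thresholds from `calculate_grade()` in 5.2.1 and reproduces them with `pd.cut()`. Setting `right=False` makes each bin include its lower edge, so a score of exactly 90 still maps to 'A'.

```python
import numpy as np
import pandas as pd

scores = pd.Series([95, 85, 72, 65, 40])

def calculate_grade(score):
    # Same thresholds as calculate_grade() in 5.2.1
    if score >= 90:
        return 'A'
    elif score >= 80:
        return 'B'
    elif score >= 70:
        return 'C'
    elif score >= 60:
        return 'D'
    return 'F'

# Row by row: one Python function call per element
graded_apply = scores.apply(calculate_grade)

# Vectorized binning: right=False gives bins like [90, inf) -> 'A'
graded_cut = pd.cut(
    scores,
    bins=[-np.inf, 60, 70, 80, 90, np.inf],
    labels=['F', 'D', 'C', 'B', 'A'],
    right=False,
)

assert graded_apply.tolist() == graded_cut.astype(str).tolist()
```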

5.3 String Operations

Working with string data is a common task in data analysis. Pandas provides the .str accessor, which gives convenient access to string-handling methods.

5.3.1 The String Accessor and Case Conversion

The string accessor (.str) is a special namespace of string methods. These methods mirror Python's built-in str methods, but they operate element-wise on an entire Series and propagate missing values instead of raising an error.
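This small sketch uses toy Series to show two behaviours of `.str` that often surprise people. Missing values pass through as missing. Accessing `.str` on a non-string (for example integer) Series raises `AttributeError`, so numbers need an explicit `astype(str)` first. The exact missing-value marker (`None` or `NaN`) may depend on the pandas version and dtype, so the sketch checks only `isna()`.

```python
import pandas as pd

names = pd.Series(['alice', None, 'Bob'])

# The missing entry stays missing instead of raising an error
upper = names.str.upper()
print(upper.tolist())
print(upper.isna().tolist())  # [False, True, False]

numbers = pd.Series([1, 2, 3])
try:
    numbers.str.upper()
except AttributeError as exc:
    # .str is only available for string-like values
    print('AttributeError:', exc)

# Convert explicitly before using string methods on numbers
padded = numbers.astype(str).str.zfill(3)
print(padded.tolist())  # ['001', '002', '003']
```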

def string_accessor_example():
    """
    Example of the string accessor and case conversion.
    
    The .str accessor lets us apply string methods
    to an entire Series at once.
    """
    # Create a sample DataFrame - customer data
    df = pd.DataFrame({
        'customer_id': [1, 2, 3, 4, 5],
        'name': ['john doe', 'JANE SMITH', 'Bob Wilson', 'alice BROWN', 'Charlie Davis'],
        'email': ['JOHN@EMAIL.COM', 'jane@email.com', 'Bob@Email.Com', 'alice@email.com', 'CHARLIE@EMAIL.COM'],
        'phone': ['081-234-5678', '082-345-6789', '083-456-7890', '084-567-8901', '085-678-9012']
    })
    
    print("Original DataFrame:")
    print(df)
    print("\n" + "="*50 + "\n")
    
    # 1. Case conversion - lower, upper, title, capitalize
    df['name_lower'] = df['name'].str.lower()
    df['name_upper'] = df['name'].str.upper()
    df['name_title'] = df['name'].str.title()
    df['name_capitalize'] = df['name'].str.capitalize()
    
    print("Case conversion:")
    print(df[['name', 'name_lower', 'name_upper', 'name_title', 'name_capitalize']])
    print("\n" + "="*50 + "\n")
    
    # 2. Lowercase all email addresses (standard practice)
    df['email_cleaned'] = df['email'].str.lower()
    print("Cleaned email addresses:")
    print(df[['email', 'email_cleaned']])
    print("\n" + "="*50 + "\n")
    
    # 3. Strip leading/trailing whitespace (strip, lstrip, rstrip)
    df_space = pd.DataFrame({
        'text': ['  hello  ', 'world  ', '  python', '  data science  ']
    })
    df_space['stripped'] = df_space['text'].str.strip()
    df_space['lstripped'] = df_space['text'].str.lstrip()
    df_space['rstripped'] = df_space['text'].str.rstrip()
    
    print("Whitespace stripping:")
    print(df_space)
    print("\n" + "="*50 + "\n")
    
    # 4. Replace text (regex=False treats '-' as a literal character)
    df['phone_cleaned'] = df['phone'].str.replace('-', '', regex=False)
    print("Removed '-' from phone numbers:")
    print(df[['phone', 'phone_cleaned']])
    print("\n" + "="*50 + "\n")
    
    # 5. Check case
    df_check = pd.DataFrame({
        'text': ['HELLO', 'hello', 'Hello', 'hElLo']
    })
    df_check['is_upper'] = df_check['text'].str.isupper()
    df_check['is_lower'] = df_check['text'].str.islower()
    df_check['is_title'] = df_check['text'].str.istitle()
    
    print("Case checks:")
    print(df_check)
    
    return df, df_space, df_check

# Try it out
if __name__ == "__main__":
    df1, df2, df3 = string_accessor_example()

5.3.2 Searching and Checking Strings

Searching and checking are key operations for filtering and analyzing text.
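Two `str.contains()` parameters matter when the result is used as a filter. In this sketch with made-up product codes, `regex=False` stops `.` from acting as a wildcard. Setting `na=False` turns missing values into `False`, so the Boolean mask can index the Series directly.

```python
import pandas as pd

codes = pd.Series(['ELEC-001', None, 'V1.0', 'V100'])

# As a regex, '.' matches any character, so 'V100' also matches 'V1.0'
as_regex = codes.str.contains('V1.0', na=False)
print(as_regex.tolist())    # [False, False, True, True]

# regex=False treats the pattern as plain text
as_literal = codes.str.contains('V1.0', regex=False, na=False)
print(as_literal.tolist())  # [False, False, True, False]

# With na=False the missing entry counts as "no match", so the mask is safe to use
print(codes[as_literal].tolist())  # ['V1.0']
```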

def string_search_example():
    """
    Example of searching and checking strings.
    
    Uses contains, startswith, endswith, len, and the is* checks
    to search and filter data.
    """
    # Create a sample DataFrame - product list
    df = pd.DataFrame({
        'product_code': ['ELEC-001', 'FOOD-002', 'ELEC-003', 'CLOTH-004', 'FOOD-005'],
        'product_name': ['Laptop Computer', 'Rice 5kg', 'Smartphone', 'T-Shirt Blue', 'Instant Noodles'],
        'description': ['High performance laptop', 'Premium quality rice', 
                       'Latest smartphone model', 'Cotton t-shirt', 'Spicy flavor noodles']
    })
    
    print("Original DataFrame:")
    print(df)
    print("\n" + "="*50 + "\n")
    
    # 1. Search with contains()
    electronics = df[df['product_code'].str.contains('ELEC')]
    print("Electronics products (ELEC):")
    print(electronics)
    print("\n" + "="*50 + "\n")
    
    # 2. Search with startswith()
    food_items = df[df['product_code'].str.startswith('FOOD')]
    print("Food products (codes starting with FOOD):")
    print(food_items)
    print("\n" + "="*50 + "\n")
    
    # 3. Search with endswith()
    df['ends_with_s'] = df['product_name'].str.endswith('s')
    print("Products whose name ends with 's':")
    print(df[['product_name', 'ends_with_s']])
    print("\n" + "="*50 + "\n")
    
    # 4. Search with a regex pattern (\d = any digit)
    df['has_number'] = df['product_name'].str.contains(r'\d', regex=True)
    print("Products with a digit in the name:")
    print(df[['product_name', 'has_number']])
    print("\n" + "="*50 + "\n")
    
    # 5. Case-insensitive search
    blue_items = df[df['product_name'].str.contains('blue', case=False)]
    print("Products containing 'blue' (case-insensitive):")
    print(blue_items)
    print("\n" + "="*50 + "\n")
    
    # 6. Search for several words at once (OR condition via the regex |)
    keywords = df[df['description'].str.contains('quality|latest', case=False, regex=True)]
    print("Products with 'quality' or 'latest' in the description:")
    print(keywords)
    print("\n" + "="*50 + "\n")
    
    # 7. Check string length
    df['name_length'] = df['product_name'].str.len()
    print("Product name lengths:")
    print(df[['product_name', 'name_length']])
    print("\n" + "="*50 + "\n")
    
    # 8. Check whether values are digits or letters
    # Note: '45.67' is not isnumeric() because '.' is not a numeric character
    test_df = pd.DataFrame({
        'value': ['123', 'abc', '45.67', 'test123', '  ']
    })
    test_df['is_numeric'] = test_df['value'].str.isnumeric()
    test_df['is_alpha'] = test_df['value'].str.isalpha()
    test_df['is_alphanumeric'] = test_df['value'].str.isalnum()
    test_df['is_space'] = test_df['value'].str.isspace()
    
    print("Checks on the type of text:")
    print(test_df)
    
    return df, test_df

# Try it out
if __name__ == "__main__":
    df_products, df_test = string_search_example()

5.3.3 Splitting and Joining Strings

Splitting and joining strings is an important technique for data that arrives as long, combined text fields, such as full names or addresses.
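`split(expand=True)` has one caveat. If rows have different numbers of parts, the result is ragged, and assigning a fixed list of column names fails. This minimal sketch assumes names with either two or three words. It uses `rsplit` with `n=1` to split only at the last space, then shows `.str.join()` on list-valued cells.

```python
import pandas as pd

names = pd.Series(['John Michael Doe', 'Jane Smith'])

# Splitting on every space gives 3 columns; the 2-word name gets a missing value
print(names.str.split(' ', expand=True))

# rsplit with n=1 splits only at the last space: [given names, last name]
parts = names.str.rsplit(' ', n=1, expand=True)
parts.columns = ['given_names', 'last_name']
print(parts)

# .str.join() joins the elements of list-valued cells with a separator
tags = pd.Series([['premium', 'loyal'], ['new']])
joined = tags.str.join('|')
print(joined.tolist())  # ['premium|loyal', 'new']
```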

def string_split_combine_example():
    """
    Example of splitting and joining strings.
    
    Uses split, .str indexing, cat, and + concatenation
    to split text apart and combine it again.
    """
    # Create a sample DataFrame - personal data
    df = pd.DataFrame({
        'full_name': ['John Michael Doe', 'Jane Elizabeth Smith', 'Robert James Wilson'],
        'address': ['123 Main St, New York, NY 10001', 
                   '456 Oak Ave, Los Angeles, CA 90001',
                   '789 Pine Rd, Chicago, IL 60601'],
        'tags': ['premium,loyal,active', 'new,trial', 'premium,inactive']
    })
    
    print("Original DataFrame:")
    print(df)
    print("\n" + "="*50 + "\n")
    
    # 1. Split the name into first, middle, and last name
    # (assumes every full_name has exactly three words)
    name_parts = df['full_name'].str.split(' ', expand=True)
    name_parts.columns = ['first_name', 'middle_name', 'last_name']
    df = pd.concat([df, name_parts], axis=1)
    
    print("Split the full name into first, middle, last:")
    print(df[['full_name', 'first_name', 'middle_name', 'last_name']])
    print("\n" + "="*50 + "\n")
    
    # 2. Split the address on ', '
    address_parts = df['address'].str.split(', ', expand=True)
    address_parts.columns = ['street', 'city', 'state_zip']
    df = pd.concat([df, address_parts], axis=1)
    
    print("Split the address:")
    print(df[['address', 'street', 'city', 'state_zip']])
    print("\n" + "="*50 + "\n")
    
    # 3. Split state and zip code
    state_zip = df['state_zip'].str.split(' ', expand=True)
    state_zip.columns = ['state', 'zip']
    df = pd.concat([df, state_zip], axis=1)
    
    print("Split state and zip code:")
    print(df[['state_zip', 'state', 'zip']])
    print("\n" + "="*50 + "\n")
    
    # 4. Split tags into a list (one list per row)
    df['tags_list'] = df['tags'].str.split(',')
    print("Split tags into a list:")
    print(df[['tags', 'tags_list']])
    print("\n" + "="*50 + "\n")
    
    # 5. Pick specific parts (indexing after split)
    df['first_tag'] = df['tags'].str.split(',').str[0]
    df['last_tag'] = df['tags'].str.split(',').str[-1]
    
    print("First and last tag:")
    print(df[['tags', 'first_tag', 'last_tag']])
    print("\n" + "="*50 + "\n")
    
    # 6. Join strings with cat()
    df['short_address'] = df['city'].str.cat(df['state'], sep=', ')
    print("Combined city and state:")
    print(df[['city', 'state', 'short_address']])
    print("\n" + "="*50 + "\n")
    
    # 7. Combine strings from several columns
    df['formatted_name'] = (df['first_name'] + ' ' + 
                           df['last_name'].str.upper())
    print("Reformatted names:")
    print(df[['first_name', 'last_name', 'formatted_name']])
    print("\n" + "="*50 + "\n")
    
    # 8. Use .str[0] (equivalent to .str.get(0)) to take the first character
    df['initial'] = (df['first_name'].str[0] + 
                    df['middle_name'].str[0] + 
                    df['last_name'].str[0])
    print("Created initials:")
    print(df[['full_name', 'initial']])
    
    return df

# Try it out
if __name__ == "__main__":
    result_df = string_split_combine_example()

5.3.4 Advanced String Handling with Regex

Regular expressions (regex) are a powerful tool for finding and manipulating complex text patterns.
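This sketch uses toy data to show two refinements to `extract`. Named groups written as `(?P<name>...)` become the column names of the result. `str.extractall()` returns every match instead of only the first, with one row per match. Extracted values are strings, so numeric groups need an explicit conversion.

```python
import pandas as pd

info = pd.Series(['Name: John Doe, Age: 30', 'Name: Jane Smith, Age: 25'])

# Named groups become column names directly
parts = info.str.extract(r'Name: (?P<name>[^,]+), Age: (?P<age>\d+)')
parts['age'] = parts['age'].astype(int)  # extracted values are strings
print(parts)

# extractall: one row per match, with an extra 'match' level in the index
text = pd.Series(['call 081-234-5678 or 082-345-6789'])
phones = text.str.extractall(r'(?P<phone>\d{3}-\d{3}-\d{4})')
print(phones['phone'].tolist())  # ['081-234-5678', '082-345-6789']
```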

import re

def string_regex_example():
    """
    Example of using regular expressions with strings.
    
    Uses extract, findall, replace, and match with regex patterns
    to handle complex text.
    """
    # Create a sample DataFrame - text containing patterns
    df = pd.DataFrame({
        'text': [
            'Contact: john@email.com, Phone: 081-234-5678',
            'Email me at jane.smith@company.co.th or call 082-345-6789',
            'Reach out: bob_wilson@mail.com, Mobile: 083-456-7890',
            'My email is alice.brown@domain.com and phone is 084-567-8901'
        ]
    })
    
    print("Original DataFrame:")
    print(df)
    print("\n" + "="*50 + "\n")
    
    # 1. Extract an email address with a regex
    # expand=False returns a Series because the pattern has a single group
    df['email'] = df['text'].str.extract(r'([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})', expand=False)
    print("Extracted email addresses:")
    print(df[['text', 'email']])
    print("\n" + "="*50 + "\n")
    
    # 2. Extract phone numbers
    df['phone'] = df['text'].str.extract(r'(\d{3}-\d{3}-\d{4})', expand=False)
    print("Extracted phone numbers:")
    print(df[['text', 'phone']])
    print("\n" + "="*50 + "\n")
    
    # 3. Extract every email found (findall returns a list per row)
    df['all_emails'] = df['text'].str.findall(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}')
    print("All email addresses (findall):")
    print(df[['all_emails']])
    print("\n" + "="*50 + "\n")
    
    # 4. Replace email addresses with [EMAIL]
    df['masked_text'] = df['text'].str.replace(
        r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}',
        '[EMAIL]',
        regex=True
    )
    print("Masked email addresses:")
    print(df[['masked_text']])
    print("\n" + "="*50 + "\n")
    
    # 5. Working with prices
    price_df = pd.DataFrame({
        'description': [
            'Product costs $29.99 only',
            'Price: THB 1,250.50',
            'Special offer $15.00',
            'Regular price: $99.95'
        ]
    })
    
    # Extract the price (thousands separators and decimals included);
    # the pattern starts with a digit so a stray comma is never captured
    price_df['price_raw'] = price_df['description'].str.extract(r'(\d[\d,]*(?:\.\d+)?)', expand=False)
    # Remove the commas and convert to float
    price_df['price_numeric'] = price_df['price_raw'].str.replace(',', '', regex=False).astype(float)
    
    print("Extracted prices from text:")
    print(price_df)
    print("\n" + "="*50 + "\n")
    
    # 6. Extract several groups at once (one column per group)
    contact_df = pd.DataFrame({
        'info': [
            'Name: John Doe, Age: 30',
            'Name: Jane Smith, Age: 25',
            'Name: Bob Wilson, Age: 35'
        ]
    })
    
    contact_parts = contact_df['info'].str.extract(r'Name: ([^,]+), Age: (\d+)')
    contact_parts.columns = ['name', 'age']
    contact_df = pd.concat([contact_df, contact_parts], axis=1)
    
    print("Extracted several groups (name and age):")
    print(contact_df)
    print("\n" + "="*50 + "\n")
    
    # 7. Validate a pattern with match() (anchored at the start of the string)
    email_list = pd.DataFrame({
        'email': ['valid@email.com', 'invalid@', 'another.valid@domain.co.th', '@notvalid.com']
    })
    
    # Check whether each value is a well-formed email address
    email_list['is_valid'] = email_list['email'].str.match(
        r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    )
    
    print("Email validity check:")
    print(email_list)
    
    return df, price_df, contact_df, email_list

# Try it out
if __name__ == "__main__":
    df1, df2, df3, df4 = string_regex_example()

Commonly used regex patterns:

Pattern	Meaning	Example
`\d`	A digit 0-9	`\d{3}` = 123
`\w`	A letter, digit, or underscore	`\w+` = hello123
`\s`	Whitespace (space, tab, newline)	`\s+` = a run of whitespace
`^`	Start of the string	`^Hello` = starts with Hello
`$`	End of the string	`world$` = ends with world
`.`	Any single character (except newline)	`a.c` = abc, adc
`*`	0 or more of the preceding item	`ab*` = a, ab, abb
`+`	1 or more of the preceding item	`ab+` = ab, abb
`?`	0 or 1 of the preceding item	`colou?r` = color, colour
`[]`	A set of characters	`[abc]` = a, b, or c
`|`	OR (alternation)	`cat|dog` = cat or dog
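The sketch below tries a few of these patterns. It uses `str.fullmatch()`, where the whole string must match, and `str.contains()`, where a match anywhere in the string counts. The sample words and codes are for illustration only.

```python
import pandas as pd

words = pd.Series(['color', 'colour', 'colr', 'cat', 'dog', 'bird'])

# `?` makes the preceding "u" optional: matches both color and colour
spelling = words.str.fullmatch(r'colou?r')

# `|` is alternation: the whole string must be either cat or dog
pets = words.str.fullmatch(r'cat|dog')

codes = pd.Series(['123', '12a', 'abc123'])

# Without anchors, three digits anywhere in the string are enough
has_three_digits = codes.str.contains(r'\d{3}')

# With ^ and $, the string must consist of exactly three digits
exactly_three_digits = codes.str.contains(r'^\d{3}$')

print(spelling.tolist(), pets.tolist())
print(has_three_digits.tolist(), exactly_three_digits.tolist())
```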

graph TB
    Input["Input Text"] --> StringOps["String Operations"]
    
    subgraph "String Methods"
        StringOps --> Basic["Basic Operations
upper/lower/strip"]
        StringOps --> Search["Search & Check
contains/startswith"]
        StringOps --> Split["Split & Join
split/join"]
        StringOps --> Regex["Regex Operations
extract/replace"]
    end
    
    Basic --> Clean["Cleaned Text"]
    Search --> Filter["Filtered Data"]
    Split --> Parts["Text Parts"]
    Regex --> Extracted["Extracted Info"]
    
    Clean --> Output["Output"]
    Filter --> Output
    Parts --> Output
    Extracted --> Output
    
    style Input fill:#458588,stroke:#1d2021,stroke-width:2px,color:#ebdbb2
    style StringOps fill:#d79921,stroke:#1d2021,stroke-width:2px,color:#1d2021
    style Basic fill:#689d6a,stroke:#1d2021,stroke-width:2px,color:#ebdbb2
    style Search fill:#689d6a,stroke:#1d2021,stroke-width:2px,color:#ebdbb2
    style Split fill:#689d6a,stroke:#1d2021,stroke-width:2px,color:#ebdbb2
    style Regex fill:#689d6a,stroke:#1d2021,stroke-width:2px,color:#ebdbb2
    style Clean fill:#b8bb26,stroke:#1d2021,stroke-width:2px,color:#1d2021
    style Filter fill:#b8bb26,stroke:#1d2021,stroke-width:2px,color:#1d2021
    style Parts fill:#b8bb26,stroke:#1d2021,stroke-width:2px,color:#1d2021
    style Extracted fill:#b8bb26,stroke:#1d2021,stroke-width:2px,color:#1d2021
    style Output fill:#b16286,stroke:#1d2021,stroke-width:2px,color:#ebdbb2

5.4 Sorting Data

Sorting is a key step in organizing and analyzing data. It helps reveal patterns, makes the maximum and minimum values easy to find, and prepares the data for the next stage of analysis.

5.4.1 Sorting by Values (sort_values)

The sort_values() method sorts data by the values in one or more columns of your choice.

import numpy as np
import pandas as pd

def sort_values_example():
    """
    Example of sorting by value with sort_values().
    
    Sorts by one column or several columns,
    with ascending/descending order set per column.
    """
    # Sample DataFrame: student records
    df = pd.DataFrame({
        'student_id': [101, 102, 103, 104, 105, 106, 107, 108],
        'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank', 'Grace', 'Henry'],
        'class': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'],
        'math_score': [85, 92, 78, 88, 95, 82, 90, 87],
        'science_score': [90, 85, 88, 92, 88, 90, 85, 95],
        'attendance': [95, 88, 92, 85, 98, 90, 87, 93]
    })
    
    print("Original DataFrame:")
    print(df)
    print("\n" + "="*50 + "\n")
    
    # 1. Sort by a single column (ascending, smallest to largest, is the default)
    sorted_math = df.sort_values('math_score')
    print("Sorted by math score (low -> high):")
    print(sorted_math[['name', 'math_score']])
    print("\n" + "="*50 + "\n")
    
    # 2. Sort descending (largest to smallest)
    sorted_math_desc = df.sort_values('math_score', ascending=False)
    print("Sorted by math score (high -> low):")
    print(sorted_math_desc[['name', 'math_score']])
    print("\n" + "="*50 + "\n")
    
    # 3. Sort by multiple columns, with a separate direction for each
    sorted_multi = df.sort_values(['class', 'math_score'], ascending=[True, False])
    print("Sorted by class (A->B), then by math score (high->low):")
    print(sorted_multi[['name', 'class', 'math_score']])
    print("\n" + "="*50 + "\n")
    
    # 4. Sort, then reset the index to 0..n-1
    sorted_reset = df.sort_values('attendance', ascending=False).reset_index(drop=True)
    print("Sorted by attendance percentage, with the index reset:")
    print(sorted_reset[['name', 'attendance']])
    print("\n" + "="*50 + "\n")
    
    # 5. Sort in place (modifies the DataFrame itself and returns None)
    df_copy = df.copy()
    df_copy.sort_values('name', inplace=True)
    print("Sorted by name in place:")
    print(df_copy[['name', 'math_score']])
    print("\n" + "="*50 + "\n")
    
    # 6. Handling NaN values
    df_with_nan = pd.DataFrame({
        'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
        'score': [85, np.nan, 92, 78, np.nan]
    })
    
    # NaN values go last by default (na_position='last'), whether ascending or descending
    sorted_nan_last = df_with_nan.sort_values('score')
    print("Sorted with NaN last (default):")
    print(sorted_nan_last)
    print("\n" + "="*50 + "\n")
    
    # Put NaN values first instead
    sorted_nan_first = df_with_nan.sort_values('score', na_position='first')
    print("Sorted with NaN first:")
    print(sorted_nan_first)
    print("\n" + "="*50 + "\n")
    
    # 7. Sort by a computed column (note: this adds total_score to df itself)
    df['total_score'] = df['math_score'] + df['science_score']
    sorted_total = df.sort_values('total_score', ascending=False)
    print("Sorted by total score:")
    print(sorted_total[['name', 'math_score', 'science_score', 'total_score']])
    
    return df, sorted_multi, sorted_total

# Try it out
if __name__ == "__main__":
    df_original, df_multi, df_total = sort_values_example()
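The example above does not show how ties are handled. When sorting on a single column, `sort_values()` defaults to `kind='quicksort'`. That algorithm does not guarantee that tied rows keep their original order. Passing `kind='stable'` does guarantee it. According to the pandas documentation, `kind` only applies when sorting on a single column or label. Here is a small sketch with made-up names:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Ann', 'Ben', 'Cat', 'Dan'],
    'score': [90, 85, 90, 85],
})

# kind='stable' keeps rows with equal scores in their original order:
# Ben before Dan (both 85), Ann before Cat (both 90)
stable = df.sort_values('score', kind='stable')
print(stable)
```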

Formula for the total score:

$$S_{total} = \sum_{i=1}^{n} S_{i}$$

where:

$S_{total}$ = total score across all subjects
$S_{i}$ = score in subject i
$n$ = number of subjects (in the example above, n = 2: math_score and science_score)
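The formula works for any number of subjects. The sketch below assumes that every subject column name ends in `_score`. The `english_score` column is hypothetical and is added only so that n = 3. A row-wise `sum(axis=1)` then computes $S_{total}$ for each student.

```python
import pandas as pd

# Hypothetical per-subject score columns (english_score is illustrative)
scores = pd.DataFrame({
    'name': ['Alice', 'Bob'],
    'math_score': [85, 92],
    'science_score': [90, 85],
    'english_score': [80, 75],
})

# Collect the S_i columns by naming convention (assumption: suffix "_score")
subject_cols = [c for c in scores.columns if c.endswith('_score')]

# S_total = S_1 + ... + S_n, computed row by row
scores['total_score'] = scores[subject_cols].sum(axis=1)
scores = scores.sort_values('total_score', ascending=False)
print(scores)
```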

5.4.2 Sorting by Index (sort_index)

The sort_index() method sorts a DataFrame by its index. This is useful when the index itself carries meaning, such as dates or sequential codes.

import pandas as pd

def sort_index_example():
    """
    Example of sorting by index with sort_index().
    
    Sorts by the row index or the column index;
    useful for data where the index is meaningful.
    """
    # Sample DataFrame: monthly sales
    df = pd.DataFrame({
        'product_A': [100, 150, 120, 180, 200],
        'product_B': [90, 110, 95, 130, 140],
        'product_C': [120, 140, 100, 160, 180]
    }, index=['May', 'Jan', 'Dec', 'Feb', 'Mar'])
    
    print("Original DataFrame (index not sorted):")
    print(df)
    print("\n" + "="*50 + "\n")
    
    # 1. Sort by the row index
    sorted_index = df.sort_index()
    print("Sorted by row index (alphabetical, not calendar order):")
    print(sorted_index)
    print("\n" + "="*50 + "\n")
    
    # 2. Sort by the row index, descending
    sorted_index_desc = df.sort_index(ascending=False)
    print("Sorted by row index (descending):")
    print(sorted_index_desc)
    print("\n" + "="*50 + "\n")
    
    # 3. Sort by the column labels (axis=1)
    sorted_columns = df.sort_index(axis=1)
    print("Sorted by column names:")
    print(sorted_columns)
    print("\n" + "="*50 + "\n")
    
    # 4. Example with a DatetimeIndex
    date_df = pd.DataFrame({
        'sales': [100, 150, 120, 180, 200, 160, 190],
        'orders': [10, 15, 12, 18, 20, 16, 19]
    }, index=pd.date_range('2024-01-15', periods=7, freq='D'))
    
    # Shuffle the rows so the dates are out of order
    date_df = date_df.sample(frac=1, random_state=42)
    
    print("DataFrame with a DatetimeIndex (unsorted):")
    print(date_df)
    print("\n" + "="*50 + "\n")
    
    # Sort chronologically by date
    date_sorted = date_df.sort_index()
    print("Sorted by date:")
    print(date_sorted)
    print("\n" + "="*50 + "\n")
    
    # 5. Example with a MultiIndex
    multi_idx = pd.MultiIndex.from_tuples([
        ('Q2', 'Apr'),
        ('Q1', 'Jan'),
        ('Q2', 'May'),
        ('Q1', 'Feb'),
        ('Q1', 'Mar'),
        ('Q2', 'Jun')
    ], names=['Quarter', 'Month'])
    
    multi_df = pd.DataFrame({
        'revenue': [1000, 800, 1100, 850, 900, 1200]
    }, index=multi_idx)
    
    print("DataFrame with a MultiIndex (unsorted):")
    print(multi_df)
    print("\n" + "="*50 + "\n")
    
    # Sort by level 0 (Quarter), then level 1 (Month); month names compare alphabetically
    multi_sorted = multi_df.sort_index()
    print("Sorted by MultiIndex:")
    print(multi_sorted)
    print("\n" + "="*50 + "\n")
    
    # Sort by level 1 (Month) first; the remaining level breaks ties (sort_remaining=True by default)
    multi_sorted_level1 = multi_df.sort_index(level=1)
    print("Sorted by level 1 (Month) first:")
    print(multi_sorted_level1)
    
    return df, sorted_index, date_sorted, multi_sorted

# Try it out
if __name__ == "__main__":
    df1, df2, df3, df4 = sort_index_example()
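`sort_index()` compares month names as strings, so the monthly sales example comes out in alphabetical order (Dec, Feb, Jan, Mar, May) rather than calendar order. One way to get calendar order is the `key` parameter of `sort_index()`, available since pandas 1.1, as sketched below. It maps each month abbreviation to its position in an explicit list. The `MONTHS` list and the `month_pos` lookup are illustrative helpers, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame(
    {'product_A': [100, 150, 120, 180, 200]},
    index=['May', 'Jan', 'Dec', 'Feb', 'Mar'],
)

# Explicit calendar order (avoids locale-dependent month names)
MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
month_pos = {m: i for i, m in enumerate(MONTHS)}

# key receives the Index and must return an Index of the same length;
# here each month name is replaced by its calendar position for sorting
chronological = df.sort_index(key=lambda idx: idx.map(month_pos))
print(chronological)
```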

5.4.3 Advanced Sorting Techniques

For more complex situations, several advanced sorting techniques are available.

import pandas as pd

def advanced_sorting_example():
    """
    Examples of advanced sorting.
    
    Sorting with a key function, a categorical sort order,
    and other special techniques.
    """
    # Sample DataFrame: employee records
    df = pd.DataFrame({
        'employee': ['John Smith', 'jane doe', 'BOB WILSON', 'Alice Brown', 'charlie DAVIS'],
        'department': ['IT', 'HR', 'IT', 'Finance', 'HR'],
        'salary': [50000, 45000, 55000, 60000, 48000],
        'experience': ['5 years', '2 years', '10 years', '8 years', '3 years'],
        'level': ['Senior', 'Junior', 'Senior', 'Manager', 'Junior']
    })
    
    print("Original DataFrame:")
    print(df)
    print("\n" + "="*50 + "\n")
    
    # 1. Sort by name length (the key function transforms the column before sorting)
    sorted_by_length = df.sort_values('employee', key=lambda x: x.str.len())
    print("Sorted by name length:")
    print(sorted_by_length[['employee', 'salary']])
    print("\n" + "="*50 + "\n")
    
    # 2. Case-insensitive sort
    sorted_case_insensitive = df.sort_values(
        'employee', 
        key=lambda x: x.str.lower()
    )
    print("Names sorted ignoring letter case:")
    print(sorted_case_insensitive[['employee', 'department']])
    print("\n" + "="*50 + "\n")
    
    # 3. Sort in a custom order (ordered Categorical)
    level_order = pd.CategoricalDtype(
        categories=['Junior', 'Senior', 'Manager'],
        ordered=True
    )
    df['level_cat'] = df['level'].astype(level_order)
    sorted_categorical = df.sort_values('level_cat')
    
    print("Sorted by level (Junior -> Senior -> Manager):")
    print(sorted_categorical[['employee', 'level', 'salary']])
    print("\n" + "="*50 + "\n")
    
    # 4. Sort by a transformed value (number extracted from a string)
    def extract_years(exp_str):
        """Extract the number of years from a string such as '5 years'."""
        return int(exp_str.split()[0])
    
    sorted_experience = df.sort_values(
        'experience',
        key=lambda x: x.apply(extract_years)
    )
    print("Sorted by experience (parsed from text):")
    print(sorted_experience[['employee', 'experience', 'level']])
    print("\n" + "="*50 + "\n")
    
    # 5. Sort with multiple priorities:
    # by level (categorical order) first, then by salary (descending)
    sorted_complex = df.sort_values(
        ['level_cat', 'salary'],
        ascending=[True, False]
    )
    print("Sorted by level, then salary (high->low):")
    print(sorted_complex[['employee', 'level', 'salary']])
    print("\n" + "="*50 + "\n")
    
    # 6. Using nlargest and nsmallest
    top3_salary = df.nlargest(3, 'salary')
    print("Top 3 highest salaries:")
    print(top3_salary[['employee', 'salary']])
    print("\n" + "="*50 + "\n")
    
    bottom2_salary = df.nsmallest(2, 'salary')
    print("Bottom 2 lowest salaries:")
    print(bottom2_salary[['employee', 'salary']])
    print("\n" + "="*50 + "\n")
    
    # 7. Ranking values (rank 1 = highest salary / most experience)
    df['salary_rank'] = df['salary'].rank(ascending=False)
    df['experience_years'] = df['experience'].apply(extract_years)
    df['exp_rank'] = df['experience_years'].rank(ascending=False)
    
    print("With ranking columns added:")
    print(df[['employee', 'salary', 'salary_rank', 'experience_years', 'exp_rank']])
    
    return df, sorted_categorical, top3_salary

# Try it out
if __name__ == "__main__":
    df_adv, df_cat, df_top = advanced_sorting_example()
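Example 7 calls `rank()` with its default `method='average'`. Under that method, tied values would receive fractional ranks such as 2.5. The sample salaries above happen to be all distinct, so no ties appear. The sketch below compares the common `method` options on made-up salaries that include a tie:

```python
import pandas as pd

salary = pd.Series([50000, 60000, 50000, 40000], index=['A', 'B', 'C', 'D'])

ranks = pd.DataFrame({
    'salary': salary,
    # default: tied values share the mean of their positions (2 and 3 -> 2.5)
    'average': salary.rank(ascending=False),
    # tied values share the lowest position; the next rank skips ahead
    'min': salary.rank(ascending=False, method='min'),
    # like 'min', but ranks stay consecutive after a tie
    'dense': salary.rank(ascending=False, method='dense'),
    # ties broken by order of appearance
    'first': salary.rank(ascending=False, method='first'),
})
print(ranks)
```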

Comparison of sorting methods:

Method	Use when	Key parameters	Time complexity (approx.)
sort_values()	Sorting by the values in one or more columns	by, ascending, key	O(n log n)
sort_index()	Sorting by the index	axis, level, ascending	O(n log n)
nlargest()	Finding the n largest values	n, columns, keep	Usually faster than a full sort when n is small
nsmallest()	Finding the n smallest values	n, columns, keep	Usually faster than a full sort when n is small
rank()	Computing ranks	method, ascending	O(n log n)
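The pandas documentation describes `nlargest(n, columns)` as equivalent to `sort_values(columns, ascending=False).head(n)` but more performant. The sketch below runs a quick check on a reduced version of the employee data. With tied values, the `keep` parameter ('first', 'last', or 'all') decides which rows are returned, so the two approaches may disagree when ties occur.

```python
import pandas as pd

df = pd.DataFrame({
    'employee': ['John', 'Jane', 'Bob', 'Alice', 'Charlie'],
    'salary': [50000, 45000, 55000, 60000, 48000],
})

# Selection-based: does not need to sort every row
top3_fast = df.nlargest(3, 'salary')

# Full sort, then take the first three rows
top3_sorted = df.sort_values('salary', ascending=False).head(3)

# No ties here, so both give the same rows in the same order
print(top3_fast)
print(top3_fast.equals(top3_sorted))
```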

graph TB
    Data["Unsorted Data"] --> Decision{"Choose a Sorting Method"}
    
    Decision -->|"By values"| SortVal["sort_values()
sort by values"]
    Decision -->|"By index"| SortIdx["sort_index()
sort by index"]
    Decision -->|"Top N"| NLarge["nlargest()
N largest values"]
    Decision -->|"Bottom N"| NSmall["nsmallest()
N smallest values"]
    
    subgraph "Sorting Options"
        SortVal --> Asc["Ascending
low -> high"]
        SortVal --> Desc["Descending
high -> low"]
        SortVal --> Multi["Multi-column"]
        SortVal --> Key["With Key Function
transform values before sorting"]
    end
    
    Asc --> Result["Sorted Data"]
    Desc --> Result
    Multi --> Result
    Key --> Result
    SortIdx --> Result
    NLarge --> Result
    NSmall --> Result
    
    style Data fill:#458588,stroke:#1d2021,stroke-width:2px,color:#ebdbb2
    style Decision fill:#d79921,stroke:#1d2021,stroke-width:2px,color:#1d2021
    style SortVal fill:#689d6a,stroke:#1d2021,stroke-width:2px,color:#ebdbb2
    style SortIdx fill:#689d6a,stroke:#1d2021,stroke-width:2px,color:#ebdbb2
    style NLarge fill:#689d6a,stroke:#1d2021,stroke-width:2px,color:#ebdbb2
    style NSmall fill:#689d6a,stroke:#1d2021,stroke-width:2px,color:#ebdbb2
    style Asc fill:#b8bb26,stroke:#1d2021,stroke-width:2px,color:#1d2021
    style Desc fill:#b8bb26,stroke:#1d2021,stroke-width:2px,color:#1d2021
    style Multi fill:#b8bb26,stroke:#1d2021,stroke-width:2px,color:#1d2021
    style Key fill:#b8bb26,stroke:#1d2021,stroke-width:2px,color:#1d2021
    style Result fill:#b16286,stroke:#1d2021,stroke-width:2px,color:#ebdbb2
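When `key` is combined with several sort columns, pandas applies the key function to each column in `by` separately. A single lambda can therefore make a whole multi-column sort case-insensitive. Here is a sketch with made-up names:

```python
import pandas as pd

people = pd.DataFrame({
    'last': ['smith', 'Smith', 'brown', 'Adams'],
    'first': ['bob', 'Alice', 'carol', 'dave'],
})

# The key is applied to 'last' and to 'first' independently,
# so both columns are compared in lowercase
by_name = people.sort_values(['last', 'first'], key=lambda col: col.str.lower())
print(by_name)
```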

Summary

Data manipulation and transformation is one of the most important skills when working with pandas. This chapter covered techniques for reshaping and managing data efficiently.

Key points to remember:

Adding and dropping columns - use direct assignment, assign(), drop(), or del as appropriate
Applying functions - choose apply(), map(), or replace() depending on the task
String operations - use the .str accessor for vectorized text handling
Sorting - use sort_values() and sort_index() to organize data

Best Practices:

Choose the method that fits the task, for both clarity and performance
Use method chaining when running several steps in sequence
Be careful with inplace=True: it modifies the original DataFrame and returns None, so it cannot be chained
Test code on a small sample before running it on large data
Prefer vectorized operations over Python loops whenever possible
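The method-chaining and `inplace=True` practices can be shown side by side. In the sketch below, which uses invented sample data, each step returns a new DataFrame. The pipeline therefore reads top to bottom and leaves `df` untouched. The final `inplace=True` call instead mutates `df` and returns `None`.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['  alice ', 'BOB', 'charlie  ', 'dave'],
    'math_score': [85, 92, 78, 88],
    'science_score': [90, 85, 88, 92],
})

# One readable pipeline: clean names, add a total, filter, sort
result = (
    df
    .assign(
        name=lambda d: d['name'].str.strip().str.title(),
        total_score=lambda d: d['math_score'] + d['science_score'],
    )
    .query('total_score >= 170')
    .sort_values('total_score', ascending=False)
    .reset_index(drop=True)
)
print(result)

# By contrast, inplace=True changes df itself and returns None,
# so nothing can be chained after it
returned = df.sort_values('math_score', inplace=True)
print(returned)
```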

Preparing for the next chapter:

Now that we can manipulate and transform data, the next chapter covers aggregation and grouping (Aggregation & Grouping). These let us summarize and analyze data in more depth using techniques such as groupby(), agg(), and pivot_table().

References

Official pandas documentation:

Pandas User Guide - Working with Text Data: https://pandas.pydata.org/docs/user_guide/text.html
Pandas API Reference - DataFrame.apply(): https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html
Pandas API Reference - DataFrame.sort_values(): https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sort_values.html

Recommended books and articles:

"Python for Data Analysis" by Wes McKinney (creator of pandas)
"Pandas Cookbook" by Theodore Petrou

Further learning resources:

Real Python - Python Pandas: Tricks & Features You May Not Know: https://realpython.com/python-pandas-tricks/
Towards Data Science - Advanced Pandas Techniques
Stack Overflow - Pandas Tag: https://stackoverflow.com/questions/tagged/pandas

Regular Expression Resources:

Python re module documentation: https://docs.python.org/3/library/re.html
RegexOne - Interactive Regex Tutorial: https://regexone.com/
Regex101 - Online Regex Tester: https://regex101.com/

Note: This document targets pandas 2.0+, which differs in some ways from older versions. For example, DataFrame.map() replaces DataFrame.applymap() for element-wise operations. DataFrame.map() was added, and applymap() deprecated, in pandas 2.1.
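Some code must run on pandas versions both before and after 2.1. One option is to check for `DataFrame.map` and fall back to `applymap()` when it is missing. Below is a minimal sketch using a hypothetical formatting function `fmt`:

```python
import pandas as pd

df = pd.DataFrame({'a': [1.234, 5.678], 'b': [9.1, 2.5]})

def fmt(x):
    """Format a single value with two decimal places (illustrative helper)."""
    return f'{x:.2f}'

# DataFrame.map exists from pandas 2.1; older versions only have applymap
if hasattr(pd.DataFrame, 'map'):
    formatted = df.map(fmt)
else:
    formatted = df.applymap(fmt)

print(formatted)
```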