9. Exporting and Visualization

After analyzing data with Pandas, the final step is to present the results and save the data so they can be shared or reused. This chapter covers both: building plots and exporting data to a range of formats.

%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#d79921','primaryTextColor':'#3c3836','primaryBorderColor':'#98971a','lineColor':'#458588','secondaryColor':'#b16286','tertiaryColor':'#689d6a','background':'#fbf1c7','mainBkg':'#fbf1c7','secondBkg':'#ebdbb2'}}}%%
graph TB
    A["DataFrame<br/>Analyzed data"] --> B["Visualization"]
    A --> C["Export"]
    
    B --> B1["Built-in Plots"]
    B --> B2["Matplotlib<br/>Advanced customization"]
    B --> B3["Seaborn<br/>Polished plots"]
    
    C --> C1["CSV/Excel<br/>Tabular data"]
    C --> C2["JSON/HTML<br/>Other formats"]
    C --> C3["Database"]
    
    style A fill:#d79921,stroke:#3c3836,stroke-width:2px,color:#3c3836
    style B fill:#458588,stroke:#3c3836,stroke-width:2px,color:#fbf1c7
    style C fill:#b16286,stroke:#3c3836,stroke-width:2px,color:#fbf1c7

9.1 Basic Plotting with Pandas

Pandas has built-in plotting that uses Matplotlib as its backend, so a chart is a single call to the .plot() method.

9.1.1 Line Plot

A line plot is the most basic chart type. It suits trends over time or over any ordered sequence.

import pandas as pd
import matplotlib.pyplot as plt

def create_line_plot_example():
    """
    สร้างตัวอย่างกราฟเส้นแสดงยอดขายรายเดือน
    
    Returns:
        None: แสดงกราฟบนหน้าจอ
    """
    # สร้างข้อมูลยอดขายรายเดือน
    data = {
        'เดือน': ['ม.ค.', 'ก.พ.', 'มี.ค.', 'เม.ย.', 'พ.ค.', 'มิ.ย.'],
        'ยอดขาย': [45000, 52000, 48000, 61000, 58000, 67000],
        'ค่าใช้จ่าย': [30000, 31000, 29000, 35000, 33000, 38000]
    }
    df = pd.DataFrame(data)
    df.set_index('เดือน', inplace=True)
    
    # สร้าง Line Plot แบบง่าย
    df.plot(kind='line', figsize=(10, 6), title='รายงานยอดขายและค่าใช้จ่าย')
    plt.ylabel('จำนวนเงิน (บาท)')
    plt.xlabel('เดือน')
    plt.grid(True, alpha=0.3)  # เพิ่ม grid เบาๆ
    plt.legend(['ยอดขาย', 'ค่าใช้จ่าย'])
    plt.show()

# ตัวอย่างการใช้งาน
create_line_plot_example()

Key parameters of .plot():

kind: chart type ('line', 'bar', 'barh', 'hist', 'box', 'kde', 'area', 'scatter', 'hexbin', 'pie')
figsize: figure size as a (width, height) tuple, in inches
title: chart title
xlabel, ylabel: labels for the X and Y axes
legend: whether to show the legend (True/False)
grid: whether to show grid lines
color: color of the lines or bars (str or list)
alpha: opacity, from 0 (fully transparent) to 1 (fully opaque)
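The parameters listed above can be combined in one call. The following is a minimal sketch, assuming a small made-up DataFrame (the column names `sales` and `cost` are illustrative only) and the headless `Agg` backend so that it runs without a display. It reads the settings back from the returned Axes object to show where each parameter ends up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend: no window is opened
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative data, not taken from the chapter's examples
df = pd.DataFrame(
    {"sales": [45000, 52000, 48000], "cost": [30000, 31000, 29000]},
    index=["Jan", "Feb", "Mar"],
)

# .plot() returns a Matplotlib Axes that can be inspected or customized further
ax = df.plot(
    kind="line",
    figsize=(8, 4),
    title="Sales vs. cost",
    xlabel="Month",
    ylabel="Amount (THB)",
    grid=True,
    legend=True,
    color=["#458588", "#d79921"],
    alpha=0.8,
)

plot_title = ax.get_title()
x_label = ax.get_xlabel()
y_label = ax.get_ylabel()
n_lines = len(ax.get_lines())  # one line per numeric column
plt.close(ax.figure)
```

Keeping the returned Axes in a variable is what makes later Matplotlib-level customization possible.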

9.1.2 Bar Chart

A bar chart compares values across categories. Bars can be drawn vertically or horizontally.

def create_bar_chart_example():
    """
    สร้างตัวอย่างกราฟแท่งเปรียบเทียบยอดขายของสาขา
    """
    # ข้อมูลยอดขายแต่ละสาขา
    sales_data = {
        'สาขา': ['กรุงเทพ', 'เชียงใหม่', 'ภูเก็ต', 'ขอนแก่น', 'หาดใหญ่'],
        'Q1': [120000, 85000, 95000, 78000, 88000],
        'Q2': [135000, 92000, 102000, 81000, 94000],
        'Q3': [145000, 98000, 115000, 85000, 99000]
    }
    df = pd.DataFrame(sales_data)
    df.set_index('สาขา', inplace=True)
    
    # สร้าง Bar Chart แนวตั้ง
    ax = df.plot(
        kind='bar',
        figsize=(12, 6),
        title='เปรียบเทียบยอดขายแต่ละไตรมาส',
        color=['#d79921', '#458588', '#98971a'],  # Gruvbox colors
        width=0.8,
        rot=45  # หมุนป้ายแกน X
    )
    
    plt.ylabel('ยอดขาย (บาท)')
    plt.xlabel('สาขา')
    plt.legend(title='ไตรมาส')
    plt.tight_layout()  # ปรับ layout ให้พอดี
    plt.show()
    
    # สร้าง Horizontal Bar Chart
    df.plot(kind='barh', figsize=(10, 6), title='ยอดขายแต่ละสาขา (แนวนอน)')
    plt.xlabel('ยอดขาย (บาท)')
    plt.show()

create_bar_chart_example()

Differences between bar and barh:

| Kind | Orientation | Best for |
|---|---|---|
| `bar` | Vertical | Comparisons across time periods, few categories |
| `barh` | Horizontal | Many categories, long labels, rankings |

9.1.3 Histogram

A histogram shows the distribution of quantitative data. Unlike a bar chart, its X axis is continuous and is divided into intervals (bins).

Relative frequency:

f_{i} = \frac{n_{i}}{N}

where:

f_i = relative frequency of bin i
n_i = number of observations in bin i
N = total number of observations
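The relative-frequency formula can be checked numerically. This is a small sketch that uses `np.histogram` on synthetic scores (the data and the choice of 10 bins are assumptions for illustration) to obtain the per-bin counts and divide them by the total.

```python
import numpy as np

# Synthetic exam scores, for illustration only
rng = np.random.default_rng(42)
scores = rng.normal(loc=65, scale=15, size=500)

# counts[i] is n_i, the number of observations that fall in bin i
counts, bin_edges = np.histogram(scores, bins=10)

# Relative frequency of each bin: n_i / N
relative_freq = counts / counts.sum()

for left, right, f in zip(bin_edges[:-1], bin_edges[1:], relative_freq):
    print(f"[{left:6.1f}, {right:6.1f}): {f:.3f}")
```

Because every observation lands in exactly one bin, the relative frequencies add up to 1.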

import numpy as np

def create_histogram_example():
    """
    สร้างฮิสโตแกรมแสดงการกระจายของคะแนนสอบ
    """
    # สร้างข้อมูลคะแนนสอบแบบสุ่ม (Normal Distribution)
    np.random.seed(42)
    scores = np.random.normal(loc=65, scale=15, size=500)  # mean=65, std=15
    df = pd.DataFrame({'คะแนน': scores})
    
    # สร้าง Histogram
    ax = df['คะแนน'].plot(
        kind='hist',
        bins=30,  # จำนวนช่วง (bins)
        figsize=(10, 6),
        title='การกระจายคะแนนสอบ',
        color='#458588',
        alpha=0.7,
        edgecolor='black'
    )
    
    # เพิ่มเส้นค่าเฉลี่ย
    mean_score = df['คะแนน'].mean()
    plt.axvline(mean_score, color='#d79921', linestyle='--', 
                linewidth=2, label=f'ค่าเฉลี่ย: {mean_score:.2f}')
    
    plt.xlabel('คะแนน')
    plt.ylabel('ความถี่ (Frequency)')
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.show()

create_histogram_example()

Key histogram parameters:

bins: number of bins (for example 10, 20, 30), or a list of bin edges
density: if True, plot the probability density instead of raw counts, so that the bar areas sum to 1
cumulative: whether to plot cumulative counts
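To make the effect of `density` and `cumulative` concrete, the sketch below reproduces the underlying arithmetic with NumPy rather than drawing anything. The data and the bin count are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=65, scale=15, size=1000)  # illustrative data

# Raw counts per bin (what a plain histogram shows)
counts, edges = np.histogram(data, bins=20)

# density=True rescales the bars so that sum(height * width) equals 1
density, _ = np.histogram(data, bins=20, density=True)
area = (density * np.diff(edges)).sum()

# A cumulative histogram shows the running total of the counts
cumulative = np.cumsum(counts)
```

The last cumulative value equals the number of observations, and the density bars integrate to 1 regardless of how many bins are used.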

9.1.4 Box Plot

A box plot (box-and-whisker plot) is a statistical chart that summarizes a distribution through its quartiles and flags outliers.

%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#d79921','primaryTextColor':'#3c3836','primaryBorderColor':'#98971a','lineColor':'#458588','secondaryColor':'#b16286','tertiaryColor':'#689d6a','background':'#fbf1c7','mainBkg':'#fbf1c7','secondBkg':'#ebdbb2'}}}%%
graph LR
    A["Minimum"] --> B["Q1<br/>First quartile<br/>25%"]
    B --> C["Q2/Median<br/>50%"]
    C --> D["Q3<br/>Third quartile<br/>75%"]
    D --> E["Maximum"]
    
    F["Outliers"] -.-> A
    F -.-> E
    
    style C fill:#d79921,stroke:#3c3836,stroke-width:3px,color:#3c3836
    style B fill:#458588,stroke:#3c3836,stroke-width:2px,color:#fbf1c7
    style D fill:#458588,stroke:#3c3836,stroke-width:2px,color:#fbf1c7
    style F fill:#cc241d,stroke:#3c3836,stroke-width:2px,color:#fbf1c7

Interquartile range (IQR):

IQR = Q_{3} - Q_{1}

Outlier bounds (values outside them are treated as outliers):

\text{Lower Bound} = Q_{1} - 1.5 \times IQR

\text{Upper Bound} = Q_{3} + 1.5 \times IQR
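The IQR rule can be applied directly with `Series.quantile`. The following sketch uses a small hand-made series (the values are assumptions for illustration) that contains one obviously extreme value.

```python
import pandas as pd

# Illustrative values; 120 is deliberately far from the rest
salaries = pd.Series([30, 32, 35, 36, 38, 40, 41, 43, 45, 120])

q1 = salaries.quantile(0.25)
q3 = salaries.quantile(0.75)
iqr = q3 - q1

lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

# Keep only the values that fall outside the two bounds
outliers = salaries[(salaries < lower_bound) | (salaries > upper_bound)]
print(f"IQR={iqr}, bounds=({lower_bound}, {upper_bound})")
print(outliers.tolist())
```

Note that `quantile` interpolates linearly between data points by default, so other tools may report slightly different quartiles for the same data.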

def create_boxplot_example():
    """
    สร้าง Box Plot เปรียบเทียบเงินเดือนตามแผนก
    """
    # สร้างข้อมูลเงินเดือนแต่ละแผนก
    np.random.seed(42)
    data = {
        'IT': np.random.normal(50000, 10000, 100),
        'Marketing': np.random.normal(45000, 8000, 100),
        'Sales': np.random.normal(48000, 12000, 100),
        'HR': np.random.normal(42000, 7000, 100)
    }
    df = pd.DataFrame(data)
    
    # สร้าง Box Plot
    ax = df.plot(
        kind='box',
        figsize=(10, 6),
        title='การเปรียบเทียบเงินเดือนตามแผนก',
        color={'boxes': '#458588', 'whiskers': '#98971a', 
               'medians': '#d79921', 'caps': '#3c3836'}
    )
    
    plt.ylabel('เงินเดือน (บาท)')
    plt.xlabel('แผนก')
    plt.grid(True, alpha=0.3, axis='y')
    plt.show()
    
    # แสดงสถิติ
    print("\n=== สถิติเบื้องต้นแต่ละแผนก ===")
    print(df.describe())

create_boxplot_example()

Advantages of a box plot:

Shows the spread of the data at a glance
Makes outliers easy to spot
Compares several groups side by side

9.1.5 Scatter Plot

A scatter plot shows the relationship between two variables and is a natural first check for correlation.

Pearson correlation coefficient:

r = \frac{\sum (x_{i} - \bar{x}) (y_{i} - \bar{y})}{\sqrt{\sum {(x_{i} - \bar{x})}^{2} \sum {(y_{i} - \bar{y})}^{2}}}

where:

r = correlation coefficient (from -1 to 1)
x_i, y_i = values of x and y for observation i
\bar{x}, \bar{y} = means of x and y
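As a sanity check on the formula, the sketch below computes r term by term and compares it with `Series.corr`, which uses Pearson's method by default. The five data points are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data with a strong positive linear relationship
x = pd.Series([10, 20, 30, 40, 50], dtype=float)
y = pd.Series([25, 41, 62, 79, 105], dtype=float)

# Deviations from the means
dx = x - x.mean()
dy = y - y.mean()

# Numerator: sum of products of deviations
# Denominator: square root of the product of the sums of squared deviations
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

r_pandas = x.corr(y)  # Pearson by default
print(f"manual={r_manual:.4f}, pandas={r_pandas:.4f}")
```

Both values agree up to floating-point rounding, which confirms that `corr` implements the same definition.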

def create_scatter_plot_example():
    """
    สร้าง Scatter Plot แสดงความสัมพันธ์ระหว่างงบโฆษณาและยอดขาย
    """
    # สร้างข้อมูลที่มีความสัมพันธ์เชิงบวก
    np.random.seed(42)
    ad_budget = np.random.uniform(10000, 100000, 50)
    sales = ad_budget * 2.5 + np.random.normal(0, 20000, 50)  # มีความสัมพันธ์เชิงเส้น
    
    df = pd.DataFrame({
        'งบโฆษณา': ad_budget,
        'ยอดขาย': sales
    })
    
    # สร้าง Scatter Plot
    ax = df.plot(
        kind='scatter',
        x='งบโฆษณา',
        y='ยอดขาย',
        figsize=(10, 6),
        title='ความสัมพันธ์ระหว่างงบโฆษณาและยอดขาย',
        color='#458588',
        s=100,  # ขนาดของจุด
        alpha=0.6
    )
    
    # คำนวณและแสดง Correlation
    correlation = df['งบโฆษณา'].corr(df['ยอดขาย'])
    
    # เพิ่ม Trend Line
    z = np.polyfit(df['งบโฆษณา'], df['ยอดขาย'], 1)
    p = np.poly1d(z)
    plt.plot(df['งบโฆษณา'], p(df['งบโฆษณา']), 
             color='#d79921', linestyle='--', linewidth=2, 
             label=f'Trend Line (r={correlation:.3f})')
    
    plt.xlabel('งบโฆษณา (บาท)')
    plt.ylabel('ยอดขาย (บาท)')
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.show()

create_scatter_plot_example()

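Interpreting the correlation coefficient (a common rule of thumb; exact thresholds vary by field):

| r | Interpretation |
|---|---|
| 0.9 to 1.0 | Very high positive correlation |
| 0.7 to 0.9 | High positive correlation |
| 0.5 to 0.7 | Moderate positive correlation |
| 0.3 to 0.5 | Low positive correlation |
| -0.3 to 0.3 | Little or no linear correlation |
| -0.5 to -0.3 | Low negative correlation |
| -0.7 to -0.5 | Moderate negative correlation |
| -0.9 to -0.7 | High negative correlation |
| -1.0 to -0.9 | Very high negative correlation |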
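The table can be turned into a small helper. `interpret_correlation` below is a hypothetical function written for this illustration, not part of Pandas; it assumes each band includes its lower edge, since the table leaves the boundary values ambiguous.

```python
def interpret_correlation(r: float) -> str:
    """Map a Pearson r to a label from the rule-of-thumb table (hypothetical helper)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be between -1 and 1")

    strength = abs(r)
    if strength < 0.3:
        return "negligible"

    # Assumption: each band includes its lower edge (e.g. 0.5 counts as moderate)
    if strength < 0.5:
        label = "low"
    elif strength < 0.7:
        label = "moderate"
    elif strength < 0.9:
        label = "high"
    else:
        label = "very high"

    direction = "positive" if r > 0 else "negative"
    return f"{label} {direction}"


print(interpret_correlation(0.95))   # very high positive
print(interpret_correlation(-0.6))   # moderate negative
print(interpret_correlation(0.1))    # negligible
```

A label like this is only a reading aid; a scatter plot should still be inspected, because r measures linear association only.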

9.1.6 Pie Chart

A pie chart shows each part's share of the whole. Use it only when there are no more than about 5 to 7 categories.

def create_pie_chart_example():
    """
    สร้าง Pie Chart แสดงสัดส่วนยอดขายตามประเภทสินค้า
    """
    # ข้อมูลยอดขายตามประเภทสินค้า
    data = {
        'ประเภทสินค้า': ['อิเล็กทรอนิกส์', 'เสื้อผ้า', 'อาหาร', 'ของใช้ในบ้าน', 'หนังสือ'],
        'ยอดขาย': [350000, 280000, 420000, 190000, 160000]
    }
    df = pd.DataFrame(data)
    df.set_index('ประเภทสินค้า', inplace=True)
    
    # สร้าง Pie Chart
    colors = ['#d79921', '#458588', '#98971a', '#b16286', '#689d6a']
    
    ax = df.plot(
        kind='pie',
        y='ยอดขาย',
        figsize=(10, 8),
        title='สัดส่วนยอดขายตามประเภทสินค้า',
        colors=colors,
        autopct='%1.1f%%',  # แสดงเปอร์เซ็นต์
        startangle=90,  # เริ่มที่ 90 องศา
        legend=False
    )
    
    plt.ylabel('')  # ลบ label แกน y
    plt.show()

create_pie_chart_example()

Cautions when using a pie chart:

Not suitable for many categories (more than about 7)
Slices with similar values are hard to compare
A bar chart is usually the better choice

9.2 Advanced Customization

Plotting through Pandas is convenient, but finer-grained control over a figure means working with Matplotlib directly.

9.2.1 Using Matplotlib with Pandas

def advanced_plot_customization():
    """
    ตัวอย่างการปรับแต่งกราฟขั้นสูงด้วย Matplotlib
    """
    # สร้างข้อมูลตัวอย่าง
    data = {
        'ปี': [2019, 2020, 2021, 2022, 2023, 2024],
        'รายได้': [1200000, 980000, 1350000, 1580000, 1820000, 2100000],
        'กำไร': [180000, 120000, 240000, 310000, 410000, 520000]
    }
    df = pd.DataFrame(data)
    df.set_index('ปี', inplace=True)
    
    # สร้าง Figure และ Axes แบบกำหนดเอง
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
    
    # กราฟที่ 1: Line Plot พร้อมหลาย Formatting
    df['รายได้'].plot(
        ax=ax1,
        kind='line',
        color='#458588',
        linewidth=3,
        marker='o',
        markersize=10,
        markerfacecolor='#d79921',
        markeredgecolor='#3c3836',
        markeredgewidth=2
    )
    
    ax1.set_title('แนวโน้มรายได้ 6 ปี', fontsize=16, fontweight='bold')
    ax1.set_xlabel('ปี', fontsize=12)
    ax1.set_ylabel('รายได้ (บาท)', fontsize=12)
    ax1.grid(True, linestyle='--', alpha=0.5)
    ax1.set_facecolor('#fbf1c7')  # สีพื้นหลัง Gruvbox
    
    # กราฟที่ 2: Area Plot แสดงสัดส่วนกำไร
    df['กำไร'].plot(
        ax=ax2,
        kind='area',
        color='#98971a',
        alpha=0.7
    )
    
    ax2.set_title('แนวโน้มกำไร 6 ปี', fontsize=16, fontweight='bold')
    ax2.set_xlabel('ปี', fontsize=12)
    ax2.set_ylabel('กำไร (บาท)', fontsize=12)
    ax2.grid(True, linestyle='--', alpha=0.5)
    ax2.set_facecolor('#fbf1c7')
    
    plt.tight_layout()
    plt.show()

advanced_plot_customization()

Key components of a Matplotlib figure:

%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#d79921','primaryTextColor':'#3c3836','primaryBorderColor':'#98971a','lineColor':'#458588','secondaryColor':'#b16286','tertiaryColor':'#689d6a','background':'#fbf1c7','mainBkg':'#fbf1c7','secondBkg':'#ebdbb2'}}}%%
graph TB
    A["Figure<br/>The whole image"] --> B["Axes<br/>Plotting area"]
    B --> C["Title"]
    B --> D["X-axis"]
    B --> E["Y-axis"]
    B --> F["Legend"]
    B --> G["Grid"]
    B --> H["Plot<br/>Lines/points/bars"]
    
    style A fill:#d79921,stroke:#3c3836,stroke-width:3px,color:#3c3836
    style B fill:#458588,stroke:#3c3836,stroke-width:2px,color:#fbf1c7

9.2.2 Creating Multiple Subplots

def create_multiple_subplots():
    """
    สร้างหลายกราฟในหน้าเดียวกัน (Subplots)
    """
    # สร้างข้อมูลตัวอย่าง
    np.random.seed(42)
    dates = pd.date_range('2024-01-01', periods=100, freq='D')
    df = pd.DataFrame({
        'วันที่': dates,
        'อุณหภูมิ': np.random.normal(30, 5, 100),
        'ความชื้น': np.random.normal(70, 10, 100),
        'ฝน': np.random.poisson(2, 100)
    })
    df.set_index('วันที่', inplace=True)
    
    # สร้าง Subplots 2x2
    fig, axes = plt.subplots(2, 2, figsize=(15, 10))
    fig.suptitle('Dashboard ข้อมูลสภาพอากาศ 100 วัน', 
                 fontsize=18, fontweight='bold')
    
    # กราฟที่ 1: อุณหภูมิ (Line)
    df['อุณหภูมิ'].plot(ax=axes[0, 0], color='#d79921', linewidth=2)
    axes[0, 0].set_title('อุณหภูมิรายวัน')
    axes[0, 0].set_ylabel('องศาเซลเซียส')
    axes[0, 0].grid(True, alpha=0.3)
    
    # กราฟที่ 2: ความชื้น (Area)
    df['ความชื้น'].plot(ax=axes[0, 1], kind='area', 
                         color='#458588', alpha=0.5)
    axes[0, 1].set_title('ความชื้นรายวัน')
    axes[0, 1].set_ylabel('เปอร์เซ็นต์')
    axes[0, 1].grid(True, alpha=0.3)
    
    # กราฟที่ 3: ฝน (Bar)
    df['ฝน'].plot(ax=axes[1, 0], kind='bar', color='#98971a', width=1.0)
    axes[1, 0].set_title('ปริมาณฝนรายวัน')
    axes[1, 0].set_ylabel('มิลลิเมตร')
    axes[1, 0].tick_params(axis='x', rotation=0, labelsize=6)
    
    # กราฟที่ 4: Histogram อุณหภูมิ
    df['อุณหภูมิ'].plot(ax=axes[1, 1], kind='hist', 
                         bins=20, color='#b16286', alpha=0.7)
    axes[1, 1].set_title('การกระจายอุณหภูมิ')
    axes[1, 1].set_xlabel('องศาเซลเซียส')
    axes[1, 1].set_ylabel('ความถี่')
    
    plt.tight_layout()
    plt.show()

create_multiple_subplots()

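Ways to create subplots:

plt.subplots(nrows, ncols) - create a regular grid of Axes in one call
plt.subplot(nrows, ncols, index) - create one Axes at a time
fig.add_subplot() - add Axes to an existing Figure as needed
GridSpec - define complex layouts, such as Axes that span several rows or columns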
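Of the four approaches, `GridSpec` is the least obvious. Here is a minimal sketch, assuming a 2x2 grid in which the top panel spans both columns; the plotted numbers are placeholders and the `Agg` backend is selected so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend: no window is opened
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec

fig = plt.figure(figsize=(10, 6))
gs = GridSpec(2, 2, figure=fig)  # 2 rows x 2 columns

# Slicing the GridSpec decides which cells each Axes occupies
ax_top = fig.add_subplot(gs[0, :])    # first row, both columns
ax_left = fig.add_subplot(gs[1, 0])   # second row, left cell
ax_right = fig.add_subplot(gs[1, 1])  # second row, right cell

# Placeholder content
ax_top.plot([1, 2, 3, 4], [30, 32, 29, 31])
ax_top.set_title("Wide panel")
ax_left.bar(["A", "B"], [3, 5])
ax_right.hist([1, 2, 2, 3, 3, 3], bins=3)

fig.tight_layout()
n_axes = len(fig.axes)
plt.close(fig)
```

Any of these Axes can be passed to a Pandas plot through the `ax=` argument, in the same way as the Axes returned by `plt.subplots`.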

9.2.3 Using Seaborn

Seaborn is a library built on top of Matplotlib that makes polished and more complex plots easier to produce.

import seaborn as sns

def seaborn_examples():
    """
    ตัวอย่างการใช้ Seaborn กับ Pandas
    """
    # ตั้งค่า Style ของ Seaborn
    sns.set_style("whitegrid")
    sns.set_palette("husl")
    
    # สร้างข้อมูลตัวอย่าง
    tips = pd.DataFrame({
        'total_bill': np.random.uniform(10, 50, 100),
        'tip': np.random.uniform(1, 10, 100),
        'day': np.random.choice(['Thu', 'Fri', 'Sat', 'Sun'], 100),
        'time': np.random.choice(['Lunch', 'Dinner'], 100),
        'size': np.random.randint(1, 6, 100)
    })
    
    # สร้างหลายกราฟ Seaborn
    fig, axes = plt.subplots(2, 2, figsize=(15, 12))
    
    # 1. Scatter Plot พร้อม Regression Line
    sns.regplot(data=tips, x='total_bill', y='tip', ax=axes[0, 0],
                scatter_kws={'alpha': 0.5}, line_kws={'color': '#d79921'})
    axes[0, 0].set_title('ความสัมพันธ์ระหว่างยอดบิลและทิป')
    
    # 2. Box Plot เปรียบเทียบวัน
    sns.boxplot(data=tips, x='day', y='total_bill', ax=axes[0, 1])
    axes[0, 1].set_title('การเปรียบเทียบยอดบิลแต่ละวัน')
    
    # 3. Violin Plot
    sns.violinplot(data=tips, x='time', y='tip', ax=axes[1, 0])
    axes[1, 0].set_title('การกระจายของทิปตามเวลา')
    
    # 4. Heatmap Correlation
    correlation = tips[['total_bill', 'tip', 'size']].corr()
    sns.heatmap(correlation, annot=True, cmap='YlOrRd', ax=axes[1, 1],
                fmt='.2f', linewidths=0.5)
    axes[1, 1].set_title('Correlation Matrix')
    
    plt.tight_layout()
    plt.show()

seaborn_examples()

Specialized Seaborn plots:

| Plot | Function | Best for |
|---|---|---|
| Violin Plot | `sns.violinplot()` | Showing the shape of a distribution in detail |
| Pair Plot | `sns.pairplot()` | Relationships between every pair of variables |
| Heatmap | `sns.heatmap()` | Displaying a correlation matrix |
| Joint Plot | `sns.jointplot()` | Scatter plot combined with marginal histograms |
| Count Plot | `sns.countplot()` | Counting occurrences of each category |
| Swarm Plot | `sns.swarmplot()` | Showing every point without overlap |

9.3 Saving Figures

Once a chart is built, it can be saved to a file for use in reports or presentations.

9.3.1 Saving with Matplotlib

def save_figure_examples():
    """
    ตัวอย่างการบันทึกกราฟในรูปแบบต่างๆ
    """
    # สร้างกราฟตัวอย่าง
    data = {
        'เดือน': ['ม.ค.', 'ก.พ.', 'มี.ค.', 'เม.ย.', 'พ.ค.', 'มิ.ย.'],
        'ยอดขาย': [45000, 52000, 48000, 61000, 58000, 67000]
    }
    df = pd.DataFrame(data)
    
    plt.figure(figsize=(10, 6))
    plt.plot(df['เดือน'], df['ยอดขาย'], marker='o', 
             linewidth=2, color='#458588')
    plt.title('ยอดขายรายเดือน', fontsize=16)
    plt.xlabel('เดือน')
    plt.ylabel('ยอดขาย (บาท)')
    plt.grid(True, alpha=0.3)
    
    # บันทึกในรูปแบบต่างๆ
    # 1. PNG - สำหรับใช้ทั่วไป
    plt.savefig('sales_chart.png', dpi=300, bbox_inches='tight')
    
    # 2. SVG - Vector format สำหรับการพิมพ์คุณภาพสูง
    plt.savefig('sales_chart.svg', format='svg', bbox_inches='tight')
    
    # 3. PDF - สำหรับรายงาน
    plt.savefig('sales_chart.pdf', bbox_inches='tight')
    
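    # 4. JPG - small file size. Recent Matplotlib releases no longer accept
    #    quality= in savefig(); JPEG options go through pil_kwargs instead.
    plt.savefig('sales_chart.jpg', dpi=150, pil_kwargs={'quality': 95},
                bbox_inches='tight')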
    
    print("บันทึกกราฟเรียบร้อยแล้ว!")

save_figure_examples()

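Key parameters of savefig():

dpi: resolution in dots per inch; 300 is recommended for print
bbox_inches='tight': trim surrounding whitespace to fit the content
transparent: transparent background (True/False)
facecolor: background color
format: file format ('png', 'pdf', 'svg', 'jpg')
pil_kwargs: options passed to Pillow when saving raster formats, for example {'quality': 95} for JPG; older Matplotlib releases took a quality argument directly, which has since been removed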
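`savefig()` also accepts a file-like object, which is handy when the image should go to a web response or an email attachment instead of the disk. The sketch below is one way to do that under stated assumptions: placeholder data, the headless `Agg` backend, and in-memory `io.BytesIO` buffers as targets.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend: no window is opened
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot([1, 2, 3], [45000, 52000, 48000], marker="o")  # placeholder data
ax.set_title("Monthly sales")

# Raster output: dpi controls the pixel resolution
png_buffer = io.BytesIO()
fig.savefig(png_buffer, format="png", dpi=150,
            bbox_inches="tight", transparent=True)

# Vector output: dpi has little effect, the image scales without loss
svg_buffer = io.BytesIO()
fig.savefig(svg_buffer, format="svg", bbox_inches="tight")

plt.close(fig)  # release the figure's memory once it has been saved

png_bytes = png_buffer.getvalue()
svg_text = svg_buffer.getvalue().decode("utf-8")
```

When writing to a buffer, `format` has to be given explicitly because there is no file extension to infer it from.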

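File format comparison:

| Format | Type | File size | Quality | Best for |
|---|---|---|---|---|
| PNG | Raster | Medium | High | Websites, presentations |
| JPG | Raster | Small | Medium (lossy) | Photos, websites |
| SVG | Vector | Small | Resolution-independent | Print, further editing |
| PDF | Vector | Medium | Resolution-independent | Reports, print |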

9.4 Exporting Data

Beyond plots, exporting the data itself matters for sharing results or feeding them into later work.

9.4.1 Exporting to CSV

CSV (Comma-Separated Values) is the most widely used format because almost every program can open it.

def export_to_csv():
    """
    ตัวอย่างการส่งออกเป็น CSV พร้อมตัวเลือกต่างๆ
    """
    # สร้างข้อมูลตัวอย่าง
    data = {
        'รหัสพนักงาน': ['E001', 'E002', 'E003', 'E004'],
        'ชื่อ': ['สมชาย', 'สมหญิง', 'ประเสริฐ', 'วิไล'],
        'แผนก': ['IT', 'Marketing', 'Sales', 'HR'],
        'เงินเดือน': [45000, 38000, 42000, 35000]
    }
    df = pd.DataFrame(data)
    
    # 1. บันทึกแบบพื้นฐาน
    df.to_csv('employees.csv', index=False)
    
    # 2. บันทึกโดยเลือกเฉพาะบาง Column
    df[['ชื่อ', 'แผนก']].to_csv('employees_simple.csv', index=False)
    
    # 3. บันทึกโดยกำหนด Separator เป็นอื่น
    df.to_csv('employees_tab.txt', sep='\t', index=False)  # Tab-delimited
    
    # 4. บันทึกโดย Encoding เป็น UTF-8 with BOM (เปิดใน Excel ไม่เพี้ยน)
    df.to_csv('employees_thai.csv', index=False, encoding='utf-8-sig')
    
    # 5. บันทึกเฉพาะบาง Row ตามเงื่อนไข
    high_salary = df[df['เงินเดือน'] > 40000]
    high_salary.to_csv('high_salary_employees.csv', index=False)
    
    # 6. เพิ่มข้อมูลต่อท้ายไฟล์เดิม (Append mode)
    new_employee = pd.DataFrame({
        'รหัสพนักงาน': ['E005'],
        'ชื่อ': ['ธนา'],
        'แผนก': ['Finance'],
        'เงินเดือน': [48000]
    })
    new_employee.to_csv('employees.csv', mode='a', header=False, index=False)
    
    print("ส่งออก CSV สำเร็จ!")

export_to_csv()

Key parameters of to_csv():

index: whether to write the index (True/False)
sep: field delimiter (for example ',', ';', '\t')
encoding: text encoding ('utf-8', 'utf-8-sig', 'cp874')
header: whether to write the column names
columns: subset of columns to write
mode: write mode ('w' = overwrite, 'a' = append)
na_rep: string used for missing values (for example 'NA', '', '0')
float_format: format string for floating-point numbers (for example '%.2f')
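Several of these parameters interact, so it helps to see the resulting text. When no path is given, `to_csv()` returns the CSV as a string, which the sketch below uses to inspect the output. The DataFrame and its column names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data with one missing value
df = pd.DataFrame({
    "name": ["Somchai", "Wilai"],
    "dept": ["IT", "HR"],
    "bonus": [1234.5678, np.nan],
})

csv_text = df.to_csv(
    index=False,                # do not write the row index
    sep=";",                    # semicolon instead of comma
    columns=["name", "bonus"],  # write only these columns, in this order
    na_rep="NA",                # how missing values are written
    float_format="%.2f",        # two decimal places for floats
)

lines = csv_text.splitlines()
for line in lines:
    print(line)
# name;bonus
# Somchai;1234.57
# Wilai;NA
```

Previewing the string this way is a cheap check before writing a large file with the same options.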

9.4.2 Exporting to Excel

Excel is widely used in organizations. Pandas can write .xlsx files, and the output can then be formatted further.

def export_to_excel():
    """
    ตัวอย่างการส่งออกเป็น Excel พร้อมการจัดรูปแบบ
    """
    # ต้อง install: pip install openpyxl
    
    # สร้างข้อมูลหลาย DataFrame
    sales_q1 = pd.DataFrame({
        'สาขา': ['กรุงเทพ', 'เชียงใหม่', 'ภูเก็ต'],
        'มกราคม': [120000, 85000, 95000],
        'กุมภาพันธ์': [125000, 88000, 97000],
        'มีนาคม': [135000, 92000, 102000]
    })
    
    sales_q2 = pd.DataFrame({
        'สาขา': ['กรุงเทพ', 'เชียงใหม่', 'ภูเก็ต'],
        'เมษายน': [138000, 94000, 105000],
        'พฤษภาคม': [142000, 97000, 108000],
        'มิถุนายน': [145000, 98000, 115000]
    })
    
    # 1. บันทึกแบบง่าย (1 Sheet)
    sales_q1.to_excel('sales_q1.xlsx', sheet_name='Q1', index=False)
    
    # 2. บันทึกหลาย Sheet ในไฟล์เดียว
    with pd.ExcelWriter('sales_report.xlsx', engine='openpyxl') as writer:
        sales_q1.to_excel(writer, sheet_name='Q1', index=False)
        sales_q2.to_excel(writer, sheet_name='Q2', index=False)
        
        # สร้าง Summary Sheet
        summary = pd.DataFrame({
            'ไตรมาส': ['Q1', 'Q2'],
            'ยอดขายรวม': [
                sales_q1[['มกราคม', 'กุมภาพันธ์', 'มีนาคม']].sum().sum(),
                sales_q2[['เมษายน', 'พฤษภาคม', 'มิถุนายน']].sum().sum()
            ]
        })
        summary.to_excel(writer, sheet_name='สรุป', index=False)
    
    print("ส่งออก Excel สำเร็จ!")

export_to_excel()

Advanced Excel formatting:

def export_to_excel_formatted():
    """
    ส่งออก Excel พร้อมการจัดรูปแบบขั้นสูง
    """
    from openpyxl import load_workbook
    from openpyxl.styles import Font, PatternFill, Alignment, Border, Side
    
    # สร้างข้อมูล
    df = pd.DataFrame({
        'สินค้า': ['Laptop', 'Mouse', 'Keyboard', 'Monitor'],
        'ยอดขาย': [50, 200, 150, 30],
        'ราคาต่อหน่วย': [25000, 500, 1200, 8000],
        'รายได้': [1250000, 100000, 180000, 240000]
    })
    
    # บันทึกเป็น Excel
    df.to_excel('products_formatted.xlsx', index=False)
    
    # โหลดไฟล์กลับมาเพื่อจัดรูปแบบ
    wb = load_workbook('products_formatted.xlsx')
    ws = wb.active
    
    # จัดรูปแบบหัวตาราง
    header_fill = PatternFill(start_color='458588', end_color='458588', 
                               fill_type='solid')
    header_font = Font(bold=True, color='FFFFFF', size=12)
    
    for cell in ws[1]:
        cell.fill = header_fill
        cell.font = header_font
        cell.alignment = Alignment(horizontal='center', vertical='center')
    
    # จัดรูปแบบตัวเลข
    for row in ws.iter_rows(min_row=2, min_col=4, max_col=4):
        for cell in row:
            cell.number_format = '#,##0'  # รูปแบบตัวเลขมีคอมม่า
    
    # ปรับความกว้างคอลัมน์
    ws.column_dimensions['A'].width = 15
    ws.column_dimensions['B'].width = 12
    ws.column_dimensions['C'].width = 15
    ws.column_dimensions['D'].width = 15
    
    wb.save('products_formatted.xlsx')
    print("จัดรูปแบบ Excel เรียบร้อย!")

export_to_excel_formatted()

9.4.3 Exporting to JSON

JSON (JavaScript Object Notation) is a common format for web APIs and data interchange.

def export_to_json():
    """
    ตัวอย่างการส่งออกเป็น JSON
    """
    # สร้างข้อมูลตัวอย่าง
    data = {
        'id': [1, 2, 3, 4],
        'name': ['Alice', 'Bob', 'Charlie', 'David'],
        'score': [85, 92, 78, 95],
        'grade': ['B', 'A', 'C', 'A']
    }
    df = pd.DataFrame(data)
    
    # 1. แบบ Records (Array of Objects) - นิยมที่สุด
    df.to_json('students_records.json', orient='records', indent=2, 
               force_ascii=False)
    # Output: [{"id": 1, "name": "Alice", ...}, ...]
    
    # 2. แบบ Index (Object of Objects)
    df.to_json('students_index.json', orient='index', indent=2)
    # Output: {"0": {"id": 1, ...}, "1": {...}, ...}
    
    # 3. แบบ Columns (Object of Objects: column -> index -> value)
    df.to_json('students_columns.json', orient='columns', indent=2)
    # Output: {"id": {"0": 1, "1": 2, ...}, "name": {"0": "Alice", ...}, ...}
    
    # 4. แบบ Values (2D Array)
    df.to_json('students_values.json', orient='values')
    # Output: [[1,"Alice",85,"B"], [2,"Bob",92,"A"], ...]
    
    # 5. แบบ Split (Separate columns, index, data)
    df.to_json('students_split.json', orient='split', indent=2)
    # Output: {"columns": [...], "index": [...], "data": [...]}
    
    print("ส่งออก JSON สำเร็จ!")

export_to_json()

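Orient options for to_json():

| Orient | Structure | Best for |
|---|---|---|
| `records` | List of dicts, one per row | REST APIs, JavaScript |
| `index` | Dict of dicts, keyed by index label | Lookups by ID |
| `columns` | Dict of dicts, keyed by column, then by index label | Column-wise access |
| `values` | 2D array of values only | Machine learning inputs |
| `split` | Separate `columns`, `index`, and `data` keys | Rebuilding the DataFrame in Pandas |
| `table` | JSON Table Schema | Data packages |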
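The orient structures are easiest to compare on a tiny DataFrame. In this sketch, `to_json()` is called without a path so that it returns a string, and `json.loads` turns that string back into Python objects for inspection. The two-row DataFrame is an assumption for illustration.

```python
import json

import pandas as pd

df = pd.DataFrame({"id": [1, 2], "name": ["Alice", "Bob"]})

# Without a path, to_json() returns the JSON text
records = json.loads(df.to_json(orient="records"))
index = json.loads(df.to_json(orient="index"))
columns = json.loads(df.to_json(orient="columns"))
values = json.loads(df.to_json(orient="values"))
split = json.loads(df.to_json(orient="split"))

print(records)  # [{'id': 1, 'name': 'Alice'}, {'id': 2, 'name': 'Bob'}]
print(index)    # {'0': {'id': 1, 'name': 'Alice'}, '1': {'id': 2, 'name': 'Bob'}}
print(columns)  # {'id': {'0': 1, '1': 2}, 'name': {'0': 'Alice', '1': 'Bob'}}
print(values)   # [[1, 'Alice'], [2, 'Bob']]
print(split)    # {'columns': ['id', 'name'], 'index': [0, 1], 'data': [[1, 'Alice'], [2, 'Bob']]}
```

Note that `columns` nests by index label rather than producing plain lists, and that index labels become strings wherever they are used as JSON object keys.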

9.4.4 Exporting to HTML

An HTML table suits presentation on a website or in an email.

def export_to_html():
    """
    ตัวอย่างการส่งออกเป็น HTML
    """
    # สร้างข้อมูลตัวอย่าง
    data = {
        'สินค้า': ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Headset'],
        'ยอดขาย': [50, 200, 150, 30, 80],
        'รายได้': [1250000, 100000, 180000, 240000, 320000]
    }
    df = pd.DataFrame(data)
    
    # 1. HTML แบบพื้นฐาน
    df.to_html('products_basic.html', index=False)
    
    # 2. HTML พร้อม CSS Styling
    html_string = """
    <html>
    <head>
        <title>รายงานยอดขาย</title>
        <style>
            body {{
                font-family: 'Sarabun', Arial, sans-serif;
                background-color: #fbf1c7;
                padding: 20px;
            }}
            h1 {{
                color: #3c3836;
                text-align: center;
            }}
            table {{
                border-collapse: collapse;
                width: 80%;
                margin: 0 auto;
                background-color: white;
                box-shadow: 0 2px 8px rgba(0,0,0,0.1);
            }}
            th {{
                background-color: #458588;
                color: white;
                padding: 12px;
                text-align: left;
            }}
            td {{
                padding: 10px;
                border-bottom: 1px solid #ddd;
            }}
            tr:hover {{
                background-color: #ebdbb2;
            }}
        </style>
    </head>
    <body>
        <h1>รายงานยอดขายสินค้า</h1>
        {table}
    </body>
    </html>
    """
    
    # สร้าง HTML Table และแทรกใน Template
    table_html = df.to_html(index=False, classes='data-table')
    full_html = html_string.format(table=table_html)
    
    with open('products_styled.html', 'w', encoding='utf-8') as f:
        f.write(full_html)
    
    print("ส่งออก HTML สำเร็จ!")

export_to_html()
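Numbers in an HTML report usually need display formatting. As a small sketch, the `formatters` argument of `to_html()` takes one function per column; the product data and the `data-table` class name here are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "product": ["Laptop", "Mouse"],
    "revenue": [1250000, 100000],
})

# Without a target file, to_html() returns the markup as a string
table_html = df.to_html(
    index=False,
    classes="data-table",  # added next to the default "dataframe" class
    formatters={"revenue": lambda v: f"{v:,}"},  # thousands separators
)

print(table_html)
```

Formatting at export time leaves the DataFrame's numeric values untouched, so they can still be used for further calculation.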

9.4.5 Exporting to a SQL Database

Pandas can write data directly to a database.

import sqlite3

def export_to_database():
    """
    ตัวอย่างการส่งออกไปยัง SQLite Database
    """
    # สร้างข้อมูลตัวอย่าง
    employees = pd.DataFrame({
        'emp_id': [1, 2, 3, 4, 5],
        'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
        'department': ['IT', 'HR', 'Sales', 'IT', 'Marketing'],
        'salary': [50000, 45000, 48000, 55000, 42000]
    })
    
    # เชื่อมต่อกับ Database (สร้างไฟล์ใหม่ถ้ายังไม่มี)
    conn = sqlite3.connect('company.db')
    
    # ส่งออกไป Database
    employees.to_sql('employees', conn, if_exists='replace', index=False)
    
    # ตรวจสอบว่าบันทึกสำเร็จ
    result = pd.read_sql('SELECT * FROM employees WHERE salary > 45000', conn)
    print("พนักงานที่เงินเดือนมากกว่า 45,000:")
    print(result)
    
    conn.close()
    print("\nส่งออกไป Database สำเร็จ!")

export_to_database()

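Key parameters of to_sql():

name: name of the table in the database
con: connection object (a sqlite3 connection, or an SQLAlchemy engine or connection)
if_exists: what to do if the table already exists
- 'fail': raise an error (default)
- 'replace': drop the existing table and create a new one
- 'append': add the rows to the existing table
index: whether to write the index as a column
dtype: data types for individual columns
method: insertion method (None for one row per INSERT, 'multi' for several rows per INSERT, or a callable)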
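The three `if_exists` modes behave quite differently, which the following sketch demonstrates against an in-memory SQLite database so that no file is created. The table and column names are assumptions for illustration.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")  # temporary database, discarded on close

first = pd.DataFrame({"emp_id": [1, 2], "salary": [50000, 45000]})
second = pd.DataFrame({"emp_id": [3], "salary": [48000]})

# 'replace' creates the table (dropping any existing one first)
first.to_sql("employees", conn, if_exists="replace", index=False)

# 'append' adds rows to the existing table
second.to_sql("employees", conn, if_exists="append", index=False)

# The default, 'fail', refuses to touch an existing table
try:
    second.to_sql("employees", conn, index=False)
    failed = False
except ValueError:
    failed = True

row_count = int(
    pd.read_sql("SELECT COUNT(*) AS n FROM employees", conn)["n"].iloc[0]
)
conn.close()

print(row_count, failed)  # 3 True
```

Because 'replace' drops the table, any indexes or constraints defined on it in the database are lost, so 'append' is the safer choice for tables managed elsewhere.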

9.5 Advanced Techniques and Best Practices

9.5.1 Choosing the Right File Format

%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#d79921','primaryTextColor':'#3c3836','primaryBorderColor':'#98971a','lineColor':'#458588','secondaryColor':'#b16286','tertiaryColor':'#689d6a','background':'#fbf1c7','mainBkg':'#fbf1c7','secondBkg':'#ebdbb2'}}}%%
graph TB
    A["Need to export data"] --> B{Data size}
    
    B -->|Small < 10MB| C{Who will open it?}
    B -->|Medium 10-100MB| D["Parquet<br/>or Feather"]
    B -->|Large > 100MB| E["Database<br/>or HDF5"]
    
    C -->|General users| F["CSV/Excel"]
    C -->|Programmers| G["JSON/CSV"]
    C -->|Websites| H["HTML/JSON"]
    C -->|Reports| I["PDF/Excel"]
    
    style A fill:#d79921,stroke:#3c3836,stroke-width:2px,color:#3c3836
    style F fill:#98971a,stroke:#3c3836,stroke-width:2px,color:#fbf1c7
    style G fill:#458588,stroke:#3c3836,stroke-width:2px,color:#fbf1c7
    style H fill:#b16286,stroke:#3c3836,stroke-width:2px,color:#fbf1c7

File format comparison:

| Format | Read speed | Write speed | File size | Preserves data types | Opens anywhere |
|---|---|---|---|---|---|
| CSV | Medium | Fast | Medium | ❌ | ✅✅✅ |
| Excel | Slow | Slow | Large | ✅ | ✅✅ |
| JSON | Medium | Medium | Large | ⚠️ | ✅✅ |
| Parquet | Very fast | Fast | Small | ✅✅ | ⚠️ |
| Feather | Very fast | Very fast | Small | ✅✅ | ❌ |
| HDF5 | Fast | Fast | Small | ✅✅ | ❌ |

9.5.2 Using Parquet and Feather

Parquet and Feather are binary columnar formats built for large datasets. Parquet leans toward compact storage, while Feather leans toward fast reads and writes.

def advanced_file_formats():
    """
    Benchmark CSV, Parquet, and Feather on the same DataFrame.

    Parquet and Feather require pyarrow: pip install pyarrow
    """
    import os
    import time

    import numpy as np

    # Build a large sample dataset (1,000,000 rows)
    n_rows = 1_000_000
    large_df = pd.DataFrame({
        'id': range(n_rows),
        'value': np.random.randn(n_rows),
        'category': np.random.choice(['A', 'B', 'C', 'D'], n_rows),
        'date': pd.date_range('2020-01-01', periods=n_rows, freq='min')
    })

    # Time CSV write and read
    start = time.perf_counter()
    large_df.to_csv('large_data.csv', index=False)
    csv_write_time = time.perf_counter() - start

    start = time.perf_counter()
    df_csv = pd.read_csv('large_data.csv')
    csv_read_time = time.perf_counter() - start

    # Time Parquet write and read
    start = time.perf_counter()
    large_df.to_parquet('large_data.parquet', compression='snappy')
    parquet_write_time = time.perf_counter() - start

    start = time.perf_counter()
    df_parquet = pd.read_parquet('large_data.parquet')
    parquet_read_time = time.perf_counter() - start

    # Time Feather write and read
    start = time.perf_counter()
    large_df.to_feather('large_data.feather')
    feather_write_time = time.perf_counter() - start

    start = time.perf_counter()
    df_feather = pd.read_feather('large_data.feather')
    feather_read_time = time.perf_counter() - start

    # Collect timings and file sizes into one comparison table
    comparison = pd.DataFrame({
        'Format': ['CSV', 'Parquet', 'Feather'],
        'Write Time (s)': [csv_write_time, parquet_write_time, feather_write_time],
        'Read Time (s)': [csv_read_time, parquet_read_time, feather_read_time],
        'File Size (MB)': [
            os.path.getsize('large_data.csv') / 1024 / 1024,
            os.path.getsize('large_data.parquet') / 1024 / 1024,
            os.path.getsize('large_data.feather') / 1024 / 1024
        ]
    })

    print("\n=== Performance comparison ===")
    print(comparison.to_string(index=False))

# advanced_file_formats()  # Uncomment to run

Advantages of Parquet:

- Compresses well
- Very fast to read
- Can read only the columns you need (columnar format)
- Preserves data types (dtypes) well
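
The last two points can be illustrated with a small round-trip test. This is a minimal sketch, not a benchmark: the helper names `csv_roundtrip` and `parquet_roundtrip` and the three-row sample data are made up for illustration, and the Parquet part assumes pyarrow (or fastparquet) is installed, so it is wrapped in a `try` block.

```python
import io

import pandas as pd


def csv_roundtrip(df):
    """Write df to an in-memory CSV and read it back."""
    buf = io.StringIO()
    df.to_csv(buf, index=False)
    buf.seek(0)
    return pd.read_csv(buf)


def parquet_roundtrip(df, columns=None):
    """Write df to an in-memory Parquet file and read back selected columns."""
    buf = io.BytesIO()
    df.to_parquet(buf)
    buf.seek(0)
    return pd.read_parquet(buf, columns=columns)


# Hypothetical sample data with a datetime column
df = pd.DataFrame({
    "id": [1, 2, 3],
    "value": [0.5, 1.5, 2.5],
    "date": pd.date_range("2020-01-01", periods=3, freq="D"),
})

# CSV stores everything as text, so the datetime column comes back as plain strings
print(csv_roundtrip(df).dtypes)

try:
    # Parquet stores the schema, so the date column is still a datetime on read
    print(parquet_roundtrip(df).dtypes)
    # Columnar layout: ask for two columns and only those are loaded
    print(parquet_roundtrip(df, columns=["id", "date"]).columns.tolist())
except ImportError:
    print("Parquet engine not installed (pip install pyarrow)")
```

With CSV you would need `parse_dates=["date"]` on every read to get the datetime back; with Parquet the type information travels with the file.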

Advantages of Feather:

- Typically the fastest to read and write of the formats compared here
- Works with both Python and R
- Designed for sharing data between languages

9.5.3 Building an Interactive Dashboard with Plotly

def create_interactive_dashboard():
    """
    Build an interactive dashboard with Plotly (requires: pip install plotly)
    """
    import plotly.graph_objects as go
    from plotly.subplots import make_subplots

    # Sample data
    months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
    revenue = [120000, 135000, 128000, 148000, 156000, 172000]
    cost = [80000, 88000, 84000, 96000, 102000, 110000]
    profit = [r - c for r, c in zip(revenue, cost)]

    # Create a 2x2 grid; each cell declares the trace type it will hold
    fig = make_subplots(
        rows=2, cols=2,
        subplot_titles=('Revenue and Costs', 'Net Profit',
                        'Profit Margin', 'Growth'),
        specs=[[{'type': 'scatter'}, {'type': 'bar'}],
               [{'type': 'indicator'}, {'type': 'scatter'}]]
    )

    # Chart 1: Line Chart
    fig.add_trace(go.Scatter(x=months, y=revenue, name='Revenue',
                             line=dict(color='#458588', width=3)), row=1, col=1)
    fig.add_trace(go.Scatter(x=months, y=cost, name='Costs',
                             line=dict(color='#d79921', width=3)), row=1, col=1)

    # Chart 2: Bar Chart
    fig.add_trace(go.Bar(x=months, y=profit, name='Profit',
                         marker_color='#98971a'), row=1, col=2)

    # Chart 3: Gauge (Indicator) showing the overall margin against a 30% target
    profit_margin = (sum(profit) / sum(revenue)) * 100
    fig.add_trace(go.Indicator(
        mode="gauge+number+delta",
        value=profit_margin,
        title={'text': "Profit Margin (%)"},
        delta={'reference': 30},
        gauge={'axis': {'range': [None, 50]},
               'bar': {'color': "#458588"},
               'steps': [
                   {'range': [0, 20], 'color': "#ebdbb2"},
                   {'range': [20, 35], 'color': "#d5c4a1"}],
               'threshold': {'line': {'color': "red", 'width': 4},
                             'thickness': 0.75, 'value': 30}}
    ), row=2, col=1)

    # Chart 4: Area Chart of month-over-month revenue growth (first month = 0)
    growth = [(revenue[i] - revenue[i-1]) / revenue[i-1] * 100
              if i > 0 else 0 for i in range(len(revenue))]
    fig.add_trace(go.Scatter(x=months, y=growth,
                             fill='tozeroy', name='% Growth',
                             line=dict(color='#b16286')), row=2, col=2)

    # Adjust the layout
    fig.update_layout(
        title_text="Business Analytics Dashboard",
        showlegend=True,
        height=800,
        template='plotly_white'
    )

    # Save as a standalone interactive HTML file
    fig.write_html('interactive_dashboard.html')
    print("Interactive dashboard created!")

# create_interactive_dashboard()  # Uncomment to run
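
The dashboard function derives profit, margin, and growth with plain Python lists. When the figures already live in a DataFrame, the same metrics can be computed with pandas before plotting. The sketch below reuses the sample revenue and cost figures from the function above; the column names and the English month labels are chosen here for illustration only.

```python
import pandas as pd

# Same sample figures as the dashboard example
df = pd.DataFrame(
    {
        "revenue": [120000, 135000, 128000, 148000, 156000, 172000],
        "cost": [80000, 88000, 84000, 96000, 102000, 110000],
    },
    index=["Jan", "Feb", "Mar", "Apr", "May", "Jun"],
)

# Profit per month
df["profit"] = df["revenue"] - df["cost"]

# Month-over-month revenue growth in percent.
# The first month has no previous value, so it is set to 0 as in the dashboard code.
df["growth_pct"] = df["revenue"].pct_change().fillna(0) * 100

# Overall profit margin in percent: total profit / total revenue
profit_margin = df["profit"].sum() / df["revenue"].sum() * 100

print(df)
print(f"Overall profit margin: {profit_margin:.1f}%")  # Overall profit margin: 34.8%
```

The resulting columns can be passed straight to the Plotly traces, for example `y=df["growth_pct"]`.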

9.6 Summary and Best Practices

9.6.1 Visualization Checklist

Choose an appropriate chart type
- Line Chart → trends over time
- Bar Chart → comparison across categories
- Scatter Plot → relationship between variables
- Box Plot → distribution and outliers
- Histogram → distribution of continuous data
- Pie Chart → proportions (use sparingly)
Principles of good chart design
- Clear title and axis labels
- Suitable colors (color-blind friendly)
- Not too much data in a single chart
- An easy-to-read legend
- A light grid to help read values
Choosing size and DPI
- For screens: 96 DPI, 10x6 inches
- For print: 300 DPI
- For presentations: 150 DPI, 12x8 inches
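
The size and DPI guidance translates directly into Matplotlib's `figsize` and `dpi` settings: the saved image is roughly width x DPI by height x DPI pixels. The following is a minimal sketch; the `PRESETS` dictionary and both helper functions are hypothetical names, and the 10x6 inch size for print is an assumption, since the checklist gives only the DPI for that case.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt

# Presets from the checklist; the print figsize is an assumption
PRESETS = {
    "screen": {"figsize": (10, 6), "dpi": 96},
    "presentation": {"figsize": (12, 8), "dpi": 150},
    "print": {"figsize": (10, 6), "dpi": 300},
}


def pixel_size(preset):
    """Return the (width, height) in pixels that a preset produces."""
    width_in, height_in = PRESETS[preset]["figsize"]
    dpi = PRESETS[preset]["dpi"]
    return round(width_in * dpi), round(height_in * dpi)


def save_with_preset(fig, target, preset):
    """Resize the figure to the preset and save it as PNG at the preset DPI."""
    cfg = PRESETS[preset]
    fig.set_size_inches(*cfg["figsize"])
    fig.savefig(target, dpi=cfg["dpi"], format="png")


fig, ax = plt.subplots()
ax.plot([1, 2, 3], [2, 4, 3])
ax.set_title("Sample chart")

buf = io.BytesIO()  # a file path such as 'chart.png' works the same way
save_with_preset(fig, buf, "screen")
plt.close(fig)

print(pixel_size("screen"))        # (960, 576)
print(pixel_size("presentation"))  # (1800, 1200)
print(pixel_size("print"))         # (3000, 1800)
```

Note that passing `bbox_inches='tight'` to `savefig` trims whitespace, so the final pixel size will then differ slightly from these values.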

9.6.2 Export Checklist

Before exporting:
- ✅ Check that the data is correct and complete
- ✅ Drop columns that are not needed
- ✅ Sort the data in a sensible order
- ✅ Handle missing values
Choosing a file format:
- Sharing with general users → CSV, Excel
- Working with programmers → JSON, Parquet
- Large datasets → Parquet, HDF5
- Displaying on the web → HTML, JSON
Naming files:
- Use meaningful names
- Add a date when needed: sales_report_2024_06_15.csv
- Avoid special characters
- Use underscores instead of spaces
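
The pre-export steps and the naming rules can be wrapped in two small helpers. This is a sketch under stated assumptions: `prepare_for_export` and `build_filename` are hypothetical names, the sample data is invented, and filling missing values with 0 is only one possible policy (dropping rows or using a column mean may suit your data better).

```python
import re
from datetime import date

import pandas as pd


def prepare_for_export(df, columns, sort_by, fill_value=0):
    """Keep only the needed columns, sort the rows, and fill missing values."""
    missing = [c for c in columns if c not in df.columns]
    if missing:
        raise KeyError(f"Missing columns: {missing}")
    out = df[columns].sort_values(sort_by).reset_index(drop=True)
    return out.fillna(fill_value)


def build_filename(name, on, ext):
    """Build a name like sales_report_2024_06_15.csv from a free-form title."""
    # Replace every run of non-alphanumeric characters (spaces, slashes, ...) with "_"
    slug = re.sub(r"[^A-Za-z0-9]+", "_", name).strip("_").lower()
    return f"{slug}_{on:%Y_%m_%d}.{ext}"


# Hypothetical sample data with an unneeded column and a missing value
raw = pd.DataFrame({
    "region": ["North", "South", "East"],
    "sales": [120.0, None, 95.0],
    "debug_flag": [1, 0, 1],
})

clean = prepare_for_export(raw, columns=["region", "sales"], sort_by="region")
print(clean)
print(build_filename("Sales Report", date(2024, 6, 15), "csv"))
# sales_report_2024_06_15.csv
```

The slug keeps only ASCII letters and digits, which is a deliberately conservative choice for file names that must work across operating systems.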

9.6.3 Code Template for an End-to-End Export

def complete_export_workflow(df, report_name='report'):
    """
    Template for exporting data, charts, and reports in one pass.

    Parameters:
        df: DataFrame to export
        report_name: Report name, used as the prefix of every output file

    The Excel output needs an Excel engine such as openpyxl.
    """
    from datetime import datetime

    # Build a timestamped base name shared by all output files
    now = datetime.now()
    timestamp = now.strftime('%Y%m%d_%H%M%S')
    base_name = f"{report_name}_{timestamp}"

    # 1. Save the data
    print("📊 Saving data...")
    df.to_csv(f"{base_name}.csv", index=False, encoding='utf-8-sig')
    df.to_excel(f"{base_name}.xlsx", index=False, sheet_name='Data')

    # 2. Create and save the charts
    print("📈 Creating charts...")
    fig, axes = plt.subplots(2, 2, figsize=(15, 10))
    fig.suptitle(f'Dashboard: {report_name}', fontsize=16, fontweight='bold')

    # Plot up to four numeric columns (adapt this to your data)
    numeric_cols = df.select_dtypes(include=['number']).columns[:4]

    for idx, col in enumerate(numeric_cols):
        ax = axes[idx // 2, idx % 2]
        df[col].plot(ax=ax, kind='line', title=col)
        ax.grid(True, alpha=0.3)

    # Hide panels left empty when there are fewer than four numeric columns
    for ax in axes.flat[len(numeric_cols):]:
        ax.set_visible(False)

    fig.tight_layout()
    fig.savefig(f"{base_name}_dashboard.png", dpi=300, bbox_inches='tight')
    plt.close(fig)

    # 3. Create the summary report
    print("📝 Creating summary report...")
    summary = df.describe()
    summary.to_excel(f"{base_name}_summary.xlsx", sheet_name='Summary')

    # 4. Create the HTML report
    html_content = f"""
    <html>
    <head>
        <title>{report_name}</title>
        <meta charset="utf-8">
        <style>
            body {{ font-family: Arial, sans-serif; margin: 20px; }}
            h1 {{ color: #3c3836; }}
            table {{ border-collapse: collapse; width: 100%; margin-top: 20px; }}
            th {{ background-color: #458588; color: white; padding: 10px; }}
            td {{ padding: 8px; border: 1px solid #ddd; }}
            img {{ max-width: 100%; height: auto; margin-top: 20px; }}
        </style>
    </head>
    <body>
        <h1>Report: {report_name}</h1>
        <p>Generated: {now.strftime('%Y-%m-%d %H:%M:%S')}</p>
        <h2>Summary statistics</h2>
        {summary.to_html()}
        <h2>Dashboard</h2>
        <img src="{base_name}_dashboard.png" alt="Dashboard">
    </body>
    </html>
    """

    with open(f"{base_name}_report.html", 'w', encoding='utf-8') as f:
        f.write(html_content)

    print("\n✅ Export complete! Files created:")
    print(f"   - {base_name}.csv")
    print(f"   - {base_name}.xlsx")
    print(f"   - {base_name}_summary.xlsx")
    print(f"   - {base_name}_dashboard.png")
    print(f"   - {base_name}_report.html")

# Example usage
# df = pd.read_csv('your_data.csv')
# complete_export_workflow(df, 'sales_analysis')

Chapter 9 Summary

In this chapter we covered visualization and exporting, the final step of the data analysis process. The key points:

Visualization:

- Pandas has built-in plotting that is easy to use through the .plot() method
- It can produce many chart types: Line, Bar, Histogram, Box, Scatter, Pie
- Advanced customization requires working with Matplotlib directly
- Seaborn makes attractive and complex charts easier to build
- The choice of chart type depends on the kind of data and the purpose

Export:

- CSV suits general sharing
- Excel works well for reports and presentations
- JSON suits web APIs
- Parquet/Feather are fast and space-efficient for large datasets
- HTML works well for display on the web

Best Practices:

- Choose chart types and file formats that suit the recipient
- Include a clear title, labels, and legend
- Name files meaningfully and systematically
- Check the data before every export
- Save in several formats for flexibility

References

Pandas Documentation - Plotting
https://pandas.pydata.org/docs/user_guide/visualization.html
Matplotlib Documentation
https://matplotlib.org/stable/contents.html
Seaborn Tutorial
https://seaborn.pydata.org/tutorial.html
Plotly Python Documentation
https://plotly.com/python/
Apache Parquet Format
https://parquet.apache.org/
Wes McKinney (2017). "Python for Data Analysis"
O'Reilly Media - Chapter 9: Plotting and Visualization
Jake VanderPlas (2016). "Python Data Science Handbook"
O'Reilly Media - Chapter 4: Visualization with Matplotlib

🎉 Congratulations! You have completed Chapter 9.

You have learned how to present the results of your data analysis clearly and effectively. From here on, you can build impressive reports and dashboards on your own! 💪📊✨