1. Introduction & Foundations

Before writing any code, it helps to understand the core structures of Pandas, which form the backbone of the library. Learning Pandas starts with two questions: why use it at all, and which basic data structures make data analysis easier.

1.1 What Pandas Is and Why Use It

Pandas (Python Data Analysis Library) is an open-source library for manipulating and analyzing tabular and time-series data. Wes McKinney began developing it in 2008. The name "Pandas" comes from "Panel Data", an econometrics term for multidimensional data.

Why use Pandas?

Pandas was designed to address the following problems:

Handling tabular data whose columns have different types (numbers, text, dates), which is awkward in plain NumPy
Working with real-world data, which often contains missing values, duplicates, or otherwise unclean entries
Analyzing time series, which requires non-trivial date and time handling
Reading and writing common data formats such as CSV, Excel, SQL, and JSON
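
The points above can be seen together in one small sketch. The sales records, column names, and the choice to fill the missing amount with 0.0 are all hypothetical and only serve the illustration:

```python
import pandas as pd

# Hypothetical records: mixed column types, missing amounts, one duplicated row
raw = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-02", "2024-01-02", "2024-01-03"],
    "product": ["pen", "book", "book", "pen"],
    "amount": [10.0, None, None, 12.5],
})

raw["date"] = pd.to_datetime(raw["date"])  # text -> datetime64
clean = raw.drop_duplicates()              # drop the repeated row
clean = clean.fillna({"amount": 0.0})      # fill the missing amount (an arbitrary choice here)

print(clean.dtypes)
print(clean)
```

Whether a missing value should be filled, dropped, or left alone depends on the data; the sketch only shows that each step is a single call.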

Differences Between NumPy and Pandas

To see why Pandas is useful, it helps to know how it differs from NumPy:

graph LR
    A["NumPy
Numeric arrays"] 
    B["Pandas
Tabular data structures"]
    
    A --> C["✓ Very fast
✓ Single data type
✓ Numeric computation"]
    B --> D["✓ Multiple data types
✓ Handles missing data
✓ Label-based indexing"]
    
    style A fill:#458588,stroke:#1d2021,stroke-width:2px,color:#fbf1c7
    style B fill:#b16286,stroke:#1d2021,stroke-width:2px,color:#fbf1c7
    style C fill:#689d6a,stroke:#1d2021,stroke-width:2px,color:#282828
    style D fill:#d79921,stroke:#1d2021,stroke-width:2px,color:#282828

NumPy vs Pandas comparison:

Aspect	NumPy	Pandas
Data types	One dtype for the whole array	Each column can have its own dtype
Structure	N-dimensional array	Series (1D) and DataFrame (2D)
Indexing	Position (integer-based)	Label-based and integer-based
Missing values	No dedicated support beyond np.nan	NaN/None with dedicated methods (isna, fillna, dropna)
Speed	Very fast (C-based)	Some overhead on top of NumPy (it is built on NumPy)
Typical use	Numeric computation, linear algebra	Data analysis, ETL, statistics

Comparison example

import numpy as np
import pandas as pd

# NumPy: every element of an array shares one dtype
numpy_array = np.array([1, 2, 3, 4, 5])
# If a string is mixed in, all elements are converted to strings
mixed_numpy = np.array([1, 2, 'three', 4, 5])
print(f"NumPy dtype: {mixed_numpy.dtype}")  # typically <U21 (Unicode string)

# Pandas: each column in the table keeps its own dtype
pandas_df = pd.DataFrame({
    'numbers': [1, 2, 3, 4, 5],
    'text': ['one', 'two', 'three', 'four', 'five'],
    'floats': [1.1, 2.2, 3.3, 4.4, 5.5]
})
print(pandas_df.dtypes)
# numbers      int64
# text        object   (strings; newer pandas versions may report a string dtype)
# floats     float64

History of Pandas

graph TB
    subgraph "Origins
2008-2012"
        A["2008
Wes McKinney
starts Pandas
at AQR Capital"]
        B["2009
Open Source
on GitHub"]
        C["2012
Pandas 0.8
Time Series
overhaul"]
    end
    
    subgraph "Growth
2013-2017"
        D["2013
Multiple Indexing
support"]
        E["2015
Integration
with Jupyter"]
        F["2017
Pandas 0.20
Major performance
improvements"]
    end
    
    subgraph "Modern era
2018-present"
        G["2020
Pandas 1.0
Stable API"]
        H["2023
Pandas 2.0
Copy-on-Write
Arrow Backend"]
        I["Today
Standard Tool
Data Science"]
    end
    
    A --> B --> C --> D --> E --> F --> G --> H --> I
    
    style A fill:#cc241d,stroke:#1d2021,stroke-width:2px,color:#fbf1c7
    style B fill:#cc241d,stroke:#1d2021,stroke-width:2px,color:#fbf1c7
    style C fill:#cc241d,stroke:#1d2021,stroke-width:2px,color:#fbf1c7
    style D fill:#d65d0e,stroke:#1d2021,stroke-width:2px,color:#fbf1c7
    style E fill:#d65d0e,stroke:#1d2021,stroke-width:2px,color:#fbf1c7
    style F fill:#d65d0e,stroke:#1d2021,stroke-width:2px,color:#fbf1c7
    style G fill:#98971a,stroke:#1d2021,stroke-width:2px,color:#282828
    style H fill:#98971a,stroke:#1d2021,stroke-width:2px,color:#282828
    style I fill:#98971a,stroke:#1d2021,stroke-width:2px,color:#282828

1.2 Installation and Import

Installing Pandas

Pandas can be installed in several ways; the recommended ones are pip and conda:

Method 1: pip (Python Package Installer)

# Install the latest version of Pandas
pip install pandas

# Install a specific version
pip install pandas==2.0.0

# Upgrade to the latest version
pip install --upgrade pandas

Method 2: conda (Anaconda/Miniconda)

# Install Pandas
conda install pandas

# Or from the conda-forge channel
conda install -c conda-forge pandas

Required dependencies

Installing Pandas also installs its required dependencies automatically. The exact set depends on the Pandas version; for the 2.x series it is:

NumPy - numeric computation
python-dateutil - date and time handling
pytz - timezone definitions
tzdata - IANA time zone data

Importing Pandas

The standard convention is to import Pandas under the alias pd:

"""
Standard way to import Pandas
"""
import pandas as pd
import numpy as np  # the two are usually imported together

# Check the installed versions
print(f"Pandas version: {pd.__version__}")
print(f"NumPy version: {np.__version__}")

# Example output:
# Pandas version: 2.0.3
# NumPy version: 1.24.3

Setting Display Options

Pandas has options that make printed output easier to read:

"""
Configure how Pandas displays data
"""
import pandas as pd

# Show 2 decimal places
pd.set_option('display.precision', 2)

# Show at most 100 rows
pd.set_option('display.max_rows', 100)

# Show at most 20 columns
pd.set_option('display.max_columns', 20)

# Limit each column to 50 characters wide
pd.set_option('display.max_colwidth', 50)

# Print descriptions of all available options
pd.describe_option()

# Reset all options to their defaults
pd.reset_option('all')
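
When an option should apply only to one piece of output, a temporary setting may be more convenient than set_option followed by a reset. A minimal sketch using pd.option_context, with a made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"value": [1.23456, 2.34567]})  # hypothetical data

before = pd.get_option("display.precision")

# Options given to option_context apply only inside the `with` block
with pd.option_context("display.precision", 2):
    inside = pd.get_option("display.precision")
    print(df)

after = pd.get_option("display.precision")
print(before, inside, after)
```

The previous value is restored when the block exits, so the rest of a notebook or script is unaffected.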

A First Example

"""
A first Pandas program:
build a table of student scores and compute each student's average
"""
import pandas as pd

# Build a DataFrame from a dictionary
# Column names: ชื่อ = name, คณิต = math, วิทย์ = science, อังกฤษ = English
students = pd.DataFrame({
    'ชื่อ': ['สมชาย', 'สมหญิง', 'สมศักดิ์', 'สมใจ'],
    'คณิต': [85, 92, 78, 88],
    'วิทย์': [90, 88, 85, 91],
    'อังกฤษ': [78, 95, 82, 87]
})

# Compute the average score per row (เฉลี่ย = average)
students['เฉลี่ย'] = students[['คณิต', 'วิทย์', 'อังกฤษ']].mean(axis=1)

# Display the result
print(students)

# Example output (averages rounded to 2 decimals, e.g. with display.precision = 2):
#        ชื่อ  คณิต  วิทย์  อังกฤษ    เฉลี่ย
# 0    สมชาย    85    90     78  84.33
# 1   สมหญิง    92    88     95  91.67
# 2  สมศักดิ์    78    85     82  81.67
# 3    สมใจ    88    91     87  88.67

1.3 The Series Data Structure (1-Dimensional)

A Series is the basic one-dimensional data structure in Pandas. It can be thought of as a single column of a table, or as an array with labels. A Series has two main parts:

Values - the actual data (a 1-D array)
Index - the label of each value

Structure of a Series

graph TB
    A["Pandas Series"] --> B["Index
Labels"]
    A --> C["Values
Data"]
    A --> D["dtype
Data type"]
    
    B --> B1["0, 1, 2, ...
or
custom labels"]
    C --> C1["Array-like
list, ndarray,
dict"]
    D --> D1["int64, float64,
object, datetime64,
category"]
    
    style A fill:#458588,stroke:#1d2021,stroke-width:3px,color:#fbf1c7
    style B fill:#b16286,stroke:#1d2021,stroke-width:2px,color:#fbf1c7
    style C fill:#d79921,stroke:#1d2021,stroke-width:2px,color:#282828
    style D fill:#689d6a,stroke:#1d2021,stroke-width:2px,color:#282828
    style B1 fill:#ebdbb2,stroke:#1d2021,stroke-width:1px,color:#282828
    style C1 fill:#ebdbb2,stroke:#1d2021,stroke-width:1px,color:#282828
    style D1 fill:#ebdbb2,stroke:#1d2021,stroke-width:1px,color:#282828

Series in Set Notation

A Series can be written as an ordered collection of (label, value) pairs:

$$S = \{(i_{0}, v_{0}), (i_{1}, v_{1}), \dots, (i_{n-1}, v_{n-1})\}$$

Where:

$S$ = the Series object
$i_{k}$ = the index (label) at position k
$v_{k}$ = the value at position k
$n$ = the total number of elements
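
The pair view can be checked directly in code. A small sketch with made-up labels and values; Series.items() yields the (label, value) pairs in order:

```python
import pandas as pd

s = pd.Series([85, 90, 78], index=["a", "b", "c"])  # hypothetical data

pairs = list(s.items())  # (label, value) pairs, in position order
n = len(s)               # number of elements

print(pairs)
print(n)
```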

Creating a Series

1. From a list:

"""
Creating a Series from a list
"""
import pandas as pd
import numpy as np

# Create a simple Series
scores = pd.Series([85, 90, 78, 92, 88])
print("Series from a list:")
print(scores)
print(f"\nData type: {scores.dtype}")

# Example output:
# 0    85
# 1    90
# 2    78
# 3    92
# 4    88
# dtype: int64

2. With a custom index:

"""
Specifying a custom index
"""
# Create a Series with labels (student names)
students_scores = pd.Series(
    data=[85, 90, 78, 92, 88],
    index=['สมชาย', 'สมหญิง', 'สมศักดิ์', 'สมใจ', 'สมปอง']
)
print(students_scores)

# Example output:
# สมชาย      85
# สมหญิง     90
# สมศักดิ์    78
# สมใจ       92
# สมปอง      88
# dtype: int64

3. From a dictionary:

"""
Creating a Series from a dictionary
The keys become the index automatically
"""
population = pd.Series({
    'กรุงเทพ': 8305218,
    'เชียงใหม่': 1725570,
    'นครราชสีมา': 2643551,
    'ขอนแก่น': 1791864
})
print(population)

# Example output:
# กรุงเทพ        8305218
# เชียงใหม่      1725570
# นครราชสีมา     2643551
# ขอนแก่น        1791864
# dtype: int64

4. From a NumPy array:

"""
Creating a Series from a NumPy array
"""
# Generate random data
random_data = pd.Series(
    data=np.random.randn(5),
    index=['A', 'B', 'C', 'D', 'E']
)
print(random_data)

# Example output (values differ on every run; shown rounded):
# A    0.47
# B   -1.19
# C    1.43
# D   -0.31
# E    0.74
# dtype: float64

Indexing and Selection

1. Integer-based indexing:

"""
Accessing data by position
"""
scores = pd.Series([85, 90, 78, 92, 88], 
                   index=['A', 'B', 'C', 'D', 'E'])

# Select a single position: use .iloc
print(scores.iloc[0]) # 85
# Avoid scores[0] here: with a non-integer index, plain [] falling back to
# positions is deprecated in recent pandas versions

# Select several positions
print(scores.iloc[0:3])  # positions 0, 1, 2
print(scores.iloc[[0, 2, 4]])  # positions 0, 2, 4

2. Label-based indexing:

"""
Accessing data by label
"""
# Select by label
print(scores['A'])       # 85
print(scores.loc['A'])   # 85 (.loc is recommended for labels)

# Select several labels
print(scores.loc[['A', 'C', 'E']])

# Slicing by label includes the end point
print(scores.loc['B':'D'])  # includes 'D'
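
The difference between the two slicing rules is easy to miss, so here is a side-by-side sketch on the same scores Series (re-created so the block runs on its own):

```python
import pandas as pd

scores = pd.Series([85, 90, 78, 92, 88], index=["A", "B", "C", "D", "E"])

by_position = scores.iloc[1:3]   # positions 1 and 2; the stop position is excluded
by_label = scores.loc["B":"C"]   # labels 'B' through 'C'; the stop label is included
wider = scores.loc["B":"D"]      # three elements: 'B', 'C', 'D'

print(by_position.tolist())
print(by_label.tolist())
print(wider.tolist())
```

Here iloc[1:3] and loc["B":"C"] select the same two elements, even though one slice names a stop of 3 and the other stops at the label for position 2.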

3. Boolean indexing:

"""
Filtering data with conditions
"""
# Scores greater than 85
high_scores = scores[scores > 85]
print(high_scores)

# Example output:
# B    90
# D    92
# E    88
# dtype: int64

# Several conditions (AND); each condition needs its own parentheses
medium_scores = scores[(scores >= 80) & (scores < 90)]

# Several conditions (OR)
extreme_scores = scores[(scores < 80) | (scores > 90)]

Basic Operations on a Series

1. Statistics:

"""
Basic statistical functions
"""
scores = pd.Series([85, 90, 78, 92, 88])

print(f"Mean: {scores.mean()}")                   # 86.6
print(f"Median: {scores.median()}")               # 88.0
print(f"Standard deviation: {scores.std():.2f}")  # 5.46 (sample std, ddof=1)
print(f"Min: {scores.min()}")                     # 78
print(f"Max: {scores.max()}")                     # 92
print(f"Sum: {scores.sum()}")                     # 433
print(f"Count: {scores.count()}")                 # 5

# All summary statistics at once
print(scores.describe())

2. Arithmetic:

"""
Vectorized operations:
computing on many values at once
"""
scores = pd.Series([85, 90, 78, 92, 88])

# Add a constant (5 bonus points for everyone)
bonus_scores = scores + 5
print(bonus_scores)

# Divide by a constant (convert scores out of 100 to a 0-1 fraction)
fraction = scores / 100
print(fraction)

# Series + Series (element-wise)
quiz1 = pd.Series([10, 15, 12, 14, 11])
quiz2 = pd.Series([13, 12, 15, 13, 14])
total_quiz = quiz1 + quiz2
print(total_quiz)

Element-wise addition as an equation:

$$S_{3} = S_{1} + S_{2} = \begin{bmatrix} s_{1, 0} + s_{2, 0} \\ s_{1, 1} + s_{2, 1} \\ \vdots \\ s_{1, n-1} + s_{2, n-1} \end{bmatrix}$$

Where:

$S_{1}$, $S_{2}$ = the input Series
$S_{3}$ = the resulting Series
$s_{i, j}$ = the value in Series i at position j
$n$ = the number of elements in each Series
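
One detail the position-based equation does not show: Pandas pairs elements by index label, not by position. When both Series use the default 0, 1, 2, ... index the two coincide. The sketch below uses made-up labels that only partly overlap to show what happens otherwise:

```python
import pandas as pd

quiz1 = pd.Series([10, 15, 12], index=["A", "B", "C"])
quiz2 = pd.Series([13, 12, 15], index=["B", "C", "D"])

aligned = quiz1 + quiz2                  # a label missing on either side gives NaN
filled = quiz1.add(quiz2, fill_value=0)  # treat the missing side as 0 instead

print(aligned)
print(filled)
```

Whether 0 is a sensible fill value depends on what the data means; for quiz scores it would count an absent quiz as zero points.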

3. Handling missing values:

"""
Working with missing values (NaN)
"""
# Create a Series containing NaN
data_with_nan = pd.Series([85, np.nan, 78, 92, np.nan])

# Detect NaN
print(data_with_nan.isna())  # Boolean Series
print(f"Number of NaN: {data_with_nan.isna().sum()}")  # 2

# Replace NaN with the mean of the non-missing values (85.0)
filled_mean = data_with_nan.fillna(data_with_nan.mean())
print(filled_mean)

# Drop the NaN entries
dropped = data_with_nan.dropna()
print(dropped)
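
Two side effects of NaN are worth seeing once: integer data becomes float64, and reductions skip NaN unless told otherwise. A short sketch on the same values:

```python
import numpy as np
import pandas as pd

data = pd.Series([85, np.nan, 78, 92, np.nan])

mean_skipping = data.mean()           # NaN is skipped by default (skipna=True)
mean_strict = data.mean(skipna=False) # NaN propagates when skipping is turned off

print(data.dtype)               # float64, because NaN is a float
print(mean_skipping, mean_strict)
print(data.count(), data.size)  # count() ignores NaN, size does not
```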

Important Series Attributes and Methods

Attribute/Method	Description	Example
`.values`	Values as a NumPy array (`.to_numpy()` is the recommended form)	`s.values`
`.index`	The Index object	`s.index`
`.dtype`	Data type	`s.dtype`
`.shape`	Shape, as (number of elements,)	`s.shape`
`.size`	Number of elements	`s.size`
`.name`	Name of the Series	`s.name = 'scores'`
`.head(n)`	First n entries	`s.head(3)`
`.tail(n)`	Last n entries	`s.tail(3)`
`.unique()`	Distinct values	`s.unique()`
`.value_counts()`	Count of each distinct value	`s.value_counts()`
`.sort_values()`	Sort by value	`s.sort_values()`
`.sort_index()`	Sort by index	`s.sort_index()`
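
A few of these attributes and methods on one small Series, as an illustrative sketch with made-up scores:

```python
import pandas as pd

scores = pd.Series([85, 90, 78, 90, 85, 90], name="scores")

distinct = scores.unique().tolist()     # distinct values, in order of first appearance
counts = scores.value_counts()          # how often each value occurs, most frequent first
ordered = scores.sort_values().tolist() # ascending order

print(scores.shape, scores.size)
print(distinct)
print(counts)
print(ordered)
```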

1.4 The DataFrame Data Structure (2-Dimensional)

A DataFrame is the two-dimensional tabular data structure at the heart of Pandas. It can be thought of as a collection of Series that share one index, or as the Python counterpart of an Excel sheet or a SQL table.
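
The "collection of Series sharing one index" view can be made concrete. In this sketch the Series names and student IDs are hypothetical:

```python
import pandas as pd

math_scores = pd.Series([85, 92], index=["S001", "S002"])
science_scores = pd.Series([90, 88], index=["S001", "S002"])

# Each Series becomes one column; the shared labels become the row index
df = pd.DataFrame({"math": math_scores, "science": science_scores})

col = df["math"]  # selecting one column gives back a Series

print(df)
print(type(col).__name__)
print(col.index.equals(df.index))
```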

Structure of a DataFrame

graph TB
    A["Pandas DataFrame"] --> B["Columns"]
    A --> C["Index
Row labels"]
    A --> D["Values
2D table data"]
    
    B --> B1["Series 1
dtype: int64"]
    B --> B2["Series 2
dtype: float64"]
    B --> B3["Series 3
dtype: object"]
    
    C --> C1["0, 1, 2, ...
or
Custom Index"]
    
    D --> D1["NumPy Array
2D Matrix"]
    
    style A fill:#458588,stroke:#1d2021,stroke-width:3px,color:#fbf1c7
    style B fill:#b16286,stroke:#1d2021,stroke-width:2px,color:#fbf1c7
    style C fill:#d79921,stroke:#1d2021,stroke-width:2px,color:#282828
    style D fill:#689d6a,stroke:#1d2021,stroke-width:2px,color:#282828
    style B1 fill:#ebdbb2,stroke:#1d2021,stroke-width:1px,color:#282828
    style B2 fill:#ebdbb2,stroke:#1d2021,stroke-width:1px,color:#282828
    style B3 fill:#ebdbb2,stroke:#1d2021,stroke-width:1px,color:#282828
    style C1 fill:#ebdbb2,stroke:#1d2021,stroke-width:1px,color:#282828
    style D1 fill:#ebdbb2,stroke:#1d2021,stroke-width:1px,color:#282828

DataFrame as a Matrix

A DataFrame with n rows and m columns can be written as a matrix:

$$DF = \begin{bmatrix} v_{0, 0} & v_{0, 1} & \dots & v_{0, m-1} \\ v_{1, 0} & v_{1, 1} & \dots & v_{1, m-1} \\ \vdots & \vdots & \ddots & \vdots \\ v_{n-1, 0} & v_{n-1, 1} & \dots & v_{n-1, m-1} \end{bmatrix}$$

Where:

$DF$ = the DataFrame object
$n$ = the number of rows
$m$ = the number of columns
$v_{i, j}$ = the value at row i, column j
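
In code, the row and column counts come from .shape, and a single cell can be read by zero-based position with .iat. A minimal sketch with made-up numbers:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})

n_rows, n_cols = df.shape
middle = df.iat[1, 0]                    # row 1, column 0
last = df.iat[n_rows - 1, n_cols - 1]    # last row, last column

print(n_rows, n_cols)
print(middle, last)
```

Because positions start at zero, the last valid row position is one less than the number of rows, and likewise for columns.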

Creating a DataFrame

1. From a dictionary of lists:

"""
The most common way: a dictionary whose keys are the column names
"""
import pandas as pd

# Build a DataFrame from a dictionary
# Column names: รหัส = ID, ชื่อ = name, อายุ = age, เกรดเฉลี่ย = GPA
students = pd.DataFrame({
    'รหัส': ['S001', 'S002', 'S003', 'S004'],
    'ชื่อ': ['สมชาย', 'สมหญิง', 'สมศักดิ์', 'สมใจ'],
    'อายุ': [20, 21, 19, 22],
    'เกรดเฉลี่ย': [3.45, 3.78, 3.12, 3.89]
})

print(students)

# Example output:
#     รหัส       ชื่อ  อายุ  เกรดเฉลี่ย
# 0  S001    สมชาย    20      3.45
# 1  S002   สมหญิง    21      3.78
# 2  S003  สมศักดิ์    19      3.12
# 3  S004    สมใจ    22      3.89

# Inspect the structure
print(f"\nShape: {students.shape}")  # (4, 4) = 4 rows, 4 columns
print(f"Column names: {students.columns.tolist()}")
print(f"Data types:\n{students.dtypes}")

2. From a list of dictionaries:

"""
Each dictionary = 1 row
"""
employees = pd.DataFrame([
    {'ชื่อ': 'นาย A', 'ตำแหน่ง': 'Manager', 'เงินเดือน': 50000},
    {'ชื่อ': 'นาง B', 'ตำแหน่ง': 'Engineer', 'เงินเดือน': 45000},
    {'ชื่อ': 'นาย C', 'ตำแหน่ง': 'Analyst', 'เงินเดือน': 40000}
])

print(employees)

3. From a list of lists:

"""
A list of lists plus explicit column names
"""
data = [
    ['Python', 95, 'A'],
    ['Java', 88, 'B+'],
    ['C++', 92, 'A'],
    ['JavaScript', 85, 'B+']
]

courses = pd.DataFrame(
    data,
    columns=['วิชา', 'คะแนน', 'เกรด']
)

print(courses)

4. From a NumPy array:

"""
Creating a DataFrame from a 2D NumPy array
"""
import numpy as np

# Generate random numbers
random_data = np.random.randn(5, 3)  # 5 rows, 3 columns

df_random = pd.DataFrame(
    random_data,
    columns=['A', 'B', 'C'],
    index=['Row1', 'Row2', 'Row3', 'Row4', 'Row5']
)

print(df_random)

5. Starting from an empty DataFrame and adding data later:

"""
Create an empty DataFrame, then add columns
"""
# Create an empty DataFrame
df_empty = pd.DataFrame()

# Add columns one at a time (ชื่อ = name, คะแนน = score, ผ่าน = passed)
df_empty['ชื่อ'] = ['คนที่ 1', 'คนที่ 2', 'คนที่ 3']
df_empty['คะแนน'] = [85, 90, 78]
df_empty['ผ่าน'] = df_empty['คะแนน'] >= 80  # Boolean column

print(df_empty)

# Example output:
#        ชื่อ  คะแนน     ผ่าน
# 0  คนที่ 1     85   True
# 1  คนที่ 2     90   True
# 2  คนที่ 3     78  False

Understanding Rows and Columns

graph TB
    A["DataFrame
2-D table"]
    
    A --> B["Column axis
Axis 1
Horizontal"]
    A --> C["Row axis
Axis 0
Vertical"]
    
    B --> B1["Each Column
= 1 Series"]
    B --> B2["One data type
per column"]
    B --> B3["Column Name
is the label"]
    
    C --> C1["Each Row
= 1 Record"]
    C --> C2["Mixed data types
within a row"]
    C --> C3["Index
is the label"]
    
    style A fill:#458588,stroke:#1d2021,stroke-width:3px,color:#fbf1c7
    style B fill:#d65d0e,stroke:#1d2021,stroke-width:2px,color:#fbf1c7
    style C fill:#98971a,stroke:#1d2021,stroke-width:2px,color:#282828
    style B1 fill:#ebdbb2,stroke:#1d2021,stroke-width:1px,color:#282828
    style B2 fill:#ebdbb2,stroke:#1d2021,stroke-width:1px,color:#282828
    style B3 fill:#ebdbb2,stroke:#1d2021,stroke-width:1px,color:#282828
    style C1 fill:#ebdbb2,stroke:#1d2021,stroke-width:1px,color:#282828
    style C2 fill:#ebdbb2,stroke:#1d2021,stroke-width:1px,color:#282828
    style C3 fill:#ebdbb2,stroke:#1d2021,stroke-width:1px,color:#282828

Rows vs Columns comparison:

Aspect	Rows	Columns
Axis	Axis 0 (runs down the rows)	Axis 1 (runs across the columns)
Unit	1 record / 1 observation	1 feature / 1 variable
Data	Can mix types within one row	One dtype per column
Label	Index	Column names
SQL analogy	1 row of a table	1 field of a table
Excel analogy	A horizontal row (1, 2, 3...)	A vertical column (A, B, C...)
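
The axis argument is a frequent source of confusion, so a small sketch may help. With hypothetical scores, axis=0 reduces down the rows (one result per column) and axis=1 reduces across the columns (one result per row):

```python
import pandas as pd

df = pd.DataFrame(
    {"math": [80, 90], "science": [70, 100]},
    index=["s1", "s2"],
)

per_column = df.mean(axis=0)  # collapses the rows: one mean per column
per_row = df.mean(axis=1)     # collapses the columns: one mean per row

print(per_column)
print(per_row)
```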

Accessing Columns

"""
Ways to select columns in a DataFrame
"""
# Column names: รหัส = ID, ชื่อ = name, คะแนน = score
students = pd.DataFrame({
    'รหัส': ['S001', 'S002', 'S003'],
    'ชื่อ': ['สมชาย', 'สมหญิง', 'สมศักดิ์'],
    'คะแนน': [85, 90, 78]
})

# Method 1: index by column name, like a dictionary (recommended)
print(students['ชื่อ'])  # returns a Series

# Method 2: dot notation (only for names that are valid Python identifiers
# and do not clash with an existing DataFrame attribute or method)
print(students.ชื่อ)  # returns a Series

# Select several columns (returns a DataFrame)
print(students[['ชื่อ', 'คะแนน']])

# Example output:
#          ชื่อ  คะแนน
# 0    สมชาย     85
# 1   สมหญิง     90
# 2  สมศักดิ์     78

Accessing Rows

"""
Ways to select rows in a DataFrame
"""
# Select a row by position (iloc)
first_row = students.iloc[0]  # first row, returned as a Series
print(first_row)

# Select several rows
first_two = students.iloc[0:2]  # rows 0 and 1
print(first_two)

# Select a row by label (loc), here with a custom index
students_indexed = students.set_index('รหัส')
print(students_indexed.loc['S001'])  # the row whose index is 'S001'

Accessing Rows and Columns Together

"""
Combined selection: rows and columns
"""
# .loc[rows, columns] - label-based
print(students.loc[0, 'ชื่อ'])  # row label 0, column 'ชื่อ'
print(students.loc[0:1, ['ชื่อ', 'คะแนน']])  # row labels 0 through 1 (inclusive), 2 columns

# .iloc[rows, columns] - integer-based
print(students.iloc[0, 1])  # row 0, column 1
print(students.iloc[0:2, 1:3])  # rows 0-1, columns 1-2
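
The row part of .loc also accepts a boolean condition, and the same syntax can be used to assign values. A sketch with English column names chosen for this example only; the rule of raising scores to a floor of 80 is likewise invented:

```python
import pandas as pd

students = pd.DataFrame({
    "id": ["S001", "S002", "S003"],
    "score": [85, 90, 78],
})

# Rows chosen by a condition, column chosen by label
passed_ids = students.loc[students["score"] >= 80, "id"]

# Assigning through .loc updates the selected cells of the DataFrame
students.loc[students["score"] < 80, "score"] = 80

print(passed_ids.tolist())
print(students)
```

Doing the selection and the assignment in a single .loc call avoids chained indexing such as students[mask]["score"] = 80, which may not modify the original DataFrame.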

A Practical Example

"""
A small student-records system
with calculations and analysis
"""
import pandas as pd

# Build the student data
students_df = pd.DataFrame({
    'รหัส': ['S001', 'S002', 'S003', 'S004', 'S005'],
    'ชื่อ': ['สมชาย', 'สมหญิง', 'สมศักดิ์', 'สมใจ', 'สมปอง'],
    'คณิต': [85, 92, 78, 88, 95],
    'วิทย์': [90, 88, 85, 91, 93],
    'อังกฤษ': [78, 95, 82, 87, 89],
    'เพศ': ['M', 'F', 'M', 'M', 'M']
})

# Compute each student's average score
students_df['เฉลี่ย'] = students_df[['คณิต', 'วิทย์', 'อังกฤษ']].mean(axis=1)

# Assign a grade
def calculate_grade(score):
    """Return a letter grade for an average score."""
    if score >= 90:
        return 'A'
    elif score >= 85:
        return 'B+'
    elif score >= 80:
        return 'B'
    elif score >= 75:
        return 'C+'
    else:
        return 'C'

students_df['เกรด'] = students_df['เฉลี่ย'].apply(calculate_grade)

# Display the results
print("All students:")
print(students_df)

print("\n=== Summary statistics ===")
print(f"Overall average: {students_df['เฉลี่ย'].mean():.2f}")
print(f"Highest average: {students_df['เฉลี่ย'].max():.2f}")
print(f"Lowest average: {students_df['เฉลี่ย'].min():.2f}")

print("\n=== Students with grade A ===")
grade_a_students = students_df[students_df['เกรด'] == 'A']
print(grade_a_students[['ชื่อ', 'เฉลี่ย', 'เกรด']])

# Example output of print(students_df), averages rounded to 2 decimals:
#     รหัส       ชื่อ  คณิต  วิทย์  อังกฤษ เพศ    เฉลี่ย เกรด
# 0  S001    สมชาย    85    90     78  M  84.33    B
# 1  S002   สมหญิง    92    88     95  F  91.67    A
# 2  S003  สมศักดิ์    78    85     82  M  81.67    B
# 3  S004    สมใจ    88    91     87  M  88.67   B+
# 4  S005   สมปอง    95    93     89  M  92.33    A
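
As an alternative to calling a Python function per row with apply, the same thresholds can be expressed as bins with pd.cut. This is a sketch of one possible approach, not the method used above; the averages are typed in by hand, rounded from the example:

```python
import numpy as np
import pandas as pd

averages = pd.Series([84.33, 91.67, 81.67, 88.67, 92.33])

# right=False makes each bin closed on the left: [75, 80), [80, 85), ...
grades = pd.cut(
    averages,
    bins=[-np.inf, 75, 80, 85, 90, np.inf],
    labels=["C", "C+", "B", "B+", "A"],
    right=False,
)

print(grades.tolist())
```

The result is a categorical Series whose categories keep the order given in labels, which can be convenient for sorting or grouping by grade later.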

Important DataFrame Attributes and Methods

Attribute/Method	Description	Example
`.shape`	(rows, columns)	`df.shape` → (5, 3)
`.columns`	Column labels	`df.columns`
`.index`	Row labels	`df.index`
`.dtypes`	Data type of each column	`df.dtypes`
`.values`	Data as a 2D NumPy array	`df.values`
`.head(n)`	First n rows	`df.head(3)`
`.tail(n)`	Last n rows	`df.tail(3)`
`.info()`	Summary of the structure	`df.info()`
`.describe()`	Basic statistics	`df.describe()`
`.T`	Transpose (swap rows and columns)	`df.T`
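
A quick sketch of a few of these on a small, made-up DataFrame. Note that describe() summarizes only the numeric columns by default:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "score": [85, 90, 78, 92],
    "age": [20, 21, 19, 22],
})

summary = df.describe()  # count, mean, std, min, quartiles, max per numeric column
transposed = df.T        # rows become columns and vice versa

print(df.shape)
print(summary.loc[["mean", "min", "max"]])
print(transposed.shape)
```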

Summary

This chapter covered the Pandas fundamentals that later data analysis builds on:

What we learned

What Pandas is: a library for tabular data that adds labels, mixed column types, and missing-data handling on top of NumPy, which makes it better suited to real-world data
Installation and import: installing with pip/conda and the standard import, import pandas as pd
Series (1-dimensional): the basic structure made of an Index and Values, suited to a single column of data
DataFrame (2-dimensional): a table made of Series that share an index, with rows and columns and the tools to manipulate more complex data

Key ideas to remember

Series = one column, DataFrame = several columns
The Index labels the rows; Columns label the columns
Use .loc[] for label-based indexing
Use .iloc[] for integer-based indexing
A DataFrame can hold a different data type in each column

Next steps

With the basics in place, the next steps are:

Loading data: reading CSV, Excel, and JSON files
Exploring data: getting an overview and basic statistics
Selecting and filtering data: Boolean indexing, query
Cleaning data: handling missing values and duplicates

Practice suggestions

Build a DataFrame from your own data
Try out different methods to explore the data
Practice selecting data with .loc[] and .iloc[]
Do a small project, such as analyzing exam scores or income and expenses

References

Pandas Official Documentation: https://pandas.pydata.org/docs/
Pandas User Guide: https://pandas.pydata.org/docs/user_guide/index.html
10 Minutes to Pandas: https://pandas.pydata.org/docs/user_guide/10min.html
Python for Data Analysis by Wes McKinney (the creator of Pandas)
Pandas Cookbook: https://pandas.pydata.org/docs/user_guide/cookbook.html
NumPy Documentation: https://numpy.org/doc/

Note: This document is part of a complete Pandas learning series. The next topic is "2. Data Loading & Inspection".