Excel (XLSX) Skill
Overview
This skill provides comprehensive capabilities for working with Excel spreadsheets programmatically using Python. It covers everything from basic file operations to advanced data analysis, formula management, chart creation, and formatting.
The primary library is openpyxl for full Excel file manipulation, supplemented by pandas for data analysis tasks.
Core Capabilities
1. File Operations
- Reading: Load .xlsx, .xlsm, and .csv files
- Writing: Create new Excel workbooks from scratch
- Editing: Modify existing workbooks while preserving formulas, formatting, and charts
- Converting: Transform between CSV, Excel, and other formats
2. Data Management
- Cell Operations: Read, write, and modify individual cells or ranges
- Formulas: Create and manage Excel formulas (SUM, VLOOKUP, INDEX/MATCH, etc.)
- Data Validation: Set dropdown lists, numeric ranges, date constraints
- Named Ranges: Define and use named cell ranges for easier formula management
- Cell Styling: Fonts, colors, borders, alignment, number formats
- Conditional Formatting: Apply rules-based formatting
- Row/Column Sizing: Set widths, heights, auto-fit
- Merge Cells: Combine cells for headers and labels
4. Charts & Visualizations
- Chart Types: Line, bar, column, pie, scatter, area, combo charts
- Chart Customization: Titles, legends, data labels, colors
- Multiple Series: Multi-dataset charts with secondary axes
- Chart Positioning: Place charts in specific locations
5. Multi-Worksheet Operations
- Sheet Management: Create, rename, delete, reorder worksheets
- Cross-Sheet Formulas: Reference data across multiple sheets
- Sheet Copying: Duplicate sheets with formatting intact
- Sheet Protection: Lock/unlock sheets and ranges
6. Data Analysis
- Filtering: Auto-filter data ranges
- Sorting: Multi-level sorting
- Pivot Tables: Programmatic pivot table creation
- Statistical Functions: Built-in and custom calculations
Installation
bash
1# Primary library
2pip install openpyxl
3
4# For data analysis
5pip install pandas openpyxl
6
7# Or with uv
8uv pip install openpyxl pandas
Essential Workflows
Workflow 1: Creating a New Workbook from Scratch
python
1from openpyxl import Workbook
2from openpyxl.styles import Font, PatternFill, Alignment
3from openpyxl.utils import get_column_letter
4
5# Create new workbook
6wb = Workbook()
7ws = wb.active
8ws.title = "Sales Report"
9
10# Add headers with formatting
11headers = ["Product", "Q1", "Q2", "Q3", "Q4", "Total"]
12header_fill = PatternFill(start_color="4472C4", end_color="4472C4", fill_type="solid")
13header_font = Font(color="FFFFFF", bold=True)
14
15for col, header in enumerate(headers, start=1):
16 cell = ws.cell(row=1, column=col, value=header)
17 cell.fill = header_fill
18 cell.font = header_font
19 cell.alignment = Alignment(horizontal="center")
20
21# Add data
22data = [
23 ["Product A", 1000, 1200, 1100, 1300],
24 ["Product B", 800, 900, 950, 1000],
25 ["Product C", 1500, 1400, 1600, 1700]
26]
27
28for row_idx, row_data in enumerate(data, start=2):
29 for col_idx, value in enumerate(row_data, start=1):
30 ws.cell(row=row_idx, column=col_idx, value=value)
31
32# Add formulas for totals
33for row in range(2, len(data) + 2):
34 formula = f"=SUM(B{row}:E{row})"
35 ws.cell(row=row, column=6, value=formula)
36
37# Adjust column widths
38for col in range(1, 7):
39 ws.column_dimensions[get_column_letter(col)].width = 12
40
41# Save workbook
42wb.save("sales_report.xlsx")
Workflow 2: Reading and Analyzing Existing Workbooks
python
1from openpyxl import load_workbook
2
3# Load existing workbook
4wb = load_workbook('data.xlsx', data_only=True) # data_only=True evaluates formulas
5ws = wb.active
6
7# Method 1: Iterate through all rows
8for row in ws.iter_rows(min_row=2, values_only=True):
9 print(row)
10
11# Method 2: Read specific cells
12value = ws['A1'].value
13value = ws.cell(row=2, column=2).value
14
15# Method 3: Read range
16for row in ws['B2':'D5']:
17 for cell in row:
18 print(cell.value, end=' ')
19 print()
20
21# Calculate statistics
22values = [cell.value for cell in ws['B'][1:] if isinstance(cell.value, (int, float))]
23if values:
24 print(f"Sum: {sum(values)}")
25 print(f"Average: {sum(values) / len(values):.2f}")
26
27wb.close()
python
1from openpyxl import load_workbook
2from openpyxl.styles import Font
3
4# Load workbook WITHOUT data_only to preserve formulas
5wb = load_workbook('existing_report.xlsx')
6ws = wb['Sales']
7
8# Update values (formulas will recalculate when opened in Excel)
9ws['B2'] = 1500
10ws['C2'] = 1650
11
12# Add new row with data and formulas
13new_row = ws.max_row + 1
14ws[f'A{new_row}'] = "Product D"
15ws[f'B{new_row}'] = 900
16ws[f'C{new_row}'] = 1000
17ws[f'D{new_row}'] = 1100
18ws[f'E{new_row}'] = 1200
19ws[f'F{new_row}'] = f"=SUM(B{new_row}:E{new_row})" # Add formula
20
21# Apply formatting to new row
22for col in range(1, 7):
23 cell = ws.cell(row=new_row, column=col)
24 if col == 1:
25 cell.font = Font(bold=True)
26
27# Save changes
28wb.save('existing_report.xlsx')
Workflow 4: Working with Pandas for Data Analysis
python
1import pandas as pd
2from openpyxl import load_workbook
3from openpyxl.styles import Font, PatternFill
4
5# Step 1: Read and analyze data with pandas
6df = pd.read_excel('sales_data.xlsx')
7
8# Perform analysis
9summary = df.groupby('Product').agg({
10 'Sales': ['sum', 'mean', 'count'],
11 'Profit': 'sum'
12}).round(2)
13
14summary.columns = ['Total Sales', 'Avg Sales', 'Transactions', 'Total Profit']
15
16# Step 2: Write results to new Excel file
17with pd.ExcelWriter('sales_analysis.xlsx', engine='openpyxl') as writer:
18 df.to_excel(writer, sheet_name='Raw Data', index=False)
19 summary.to_excel(writer, sheet_name='Summary')
20
21# Step 3: Enhance with openpyxl formatting
22wb = load_workbook('sales_analysis.xlsx')
23ws = wb['Summary']
24
25header_fill = PatternFill(start_color="366092", end_color="366092", fill_type="solid")
26header_font = Font(color="FFFFFF", bold=True)
27
28for cell in ws[1]:
29 cell.fill = header_fill
30 cell.font = header_font
31
32wb.save('sales_analysis.xlsx')
Key Principles
- Always use formulas for calculations instead of hardcoded values
- Formulas update automatically when source data changes
- Use named ranges for complex formulas to improve readability
- Don't use
data_only=True when loading files if you need to preserve formulas
- For large datasets: Write rows in bulk using
ws.append()
- For reading large files: Use
read_only=True mode
- For writing large files: Use
write_only=True mode
- Avoid cell-by-cell operations in nested loops
Memory Management
- Close workbooks after use:
wb.close()
- Use read_only/write_only modes for large files
- Process data in chunks for very large datasets
Error Handling
- Always use try/except blocks for file operations
- Check for empty cells before processing
- Validate data types before calculations
- Handle InvalidFileException for corrupted files
Date and Time
- Use
datetime objects for dates, not strings
- Apply proper number formats:
cell.number_format = 'mm/dd/yyyy'
- Excel stores dates as numbers internally
Quick Reference
Basic Operations
python
1from openpyxl import Workbook, load_workbook
2
3# Create workbook
4wb = Workbook()
5ws = wb.active
6
7# Read cell
8value = ws['A1'].value
9value = ws.cell(row=1, column=1).value
10
11# Write cell
12ws['A1'] = "Hello"
13ws.cell(row=1, column=1, value="Hello")
14
15# Write formula
16ws['C1'] = "=A1+B1"
17
18# Add row
19ws.append([1, 2, 3])
20
21# Save and close
22wb.save('output.xlsx')
23wb.close()
Common Imports
python
1from openpyxl import Workbook, load_workbook
2from openpyxl.styles import Font, PatternFill, Border, Side, Alignment
3from openpyxl.chart import LineChart, BarChart, PieChart, Reference
4from openpyxl.utils import get_column_letter
5from openpyxl.data_validation import DataValidation
6from openpyxl.formatting.rule import ColorScaleRule, CellIsRule
Common Use Cases
Creating Charts
Add visualizations to your spreadsheets. See examples/workflow-examples.md for complete chart creation workflow including line charts, bar charts, and pie charts.
Apply visual formatting based on cell values. See examples/workflow-examples.md for color scales, icon sets, and rule-based formatting.
Data Validation
Create dropdown lists and input constraints. See examples/workflow-examples.md for dropdown lists, numeric ranges, and date validation.
Multi-Sheet Workbooks
Work with multiple worksheets and cross-sheet formulas. See examples/workflow-examples.md for complete multi-sheet workflow.
Financial Reports
Create professional financial statements. See examples/financial-report.md for a complete income statement example with dynamic formulas.
Transform CSV data into formatted Excel reports. See examples/data-transformation.md for pandas integration and pivot table creation.
Dashboards
Build executive dashboards with multiple charts. See examples/dashboard-creation.md for comprehensive dashboard with KPIs and visualizations.
Helper Scripts
The scripts/ directory provides utility functions for common operations:
python
1from scripts.excel_helper import (
2 create_workbook,
3 read_excel_data,
4 add_chart,
5 apply_formatting,
6 add_formula,
7 auto_fit_columns
8)
9
10# Create new workbook with data
11wb, ws = create_workbook("Sales Report", headers=["Product", "Q1", "Q2"])
12
13# Read data from existing file
14data = read_excel_data("data.xlsx", sheet_name="Sheet1")
15
16# Add chart to worksheet
17add_chart(ws, chart_type="line", data_range="B2:D10", title="Sales Trend")
18
19# Apply formatting
20apply_formatting(ws, cell_range="A1:D1", bold=True, bg_color="4472C4")
21
22# Add formula to range
23add_formula(ws, cell="E2", formula="=SUM(B2:D2)", copy_down=10)
24
25# Auto-fit all columns
26auto_fit_columns(ws)
27
28wb.save("output.xlsx")
Additional Resources
Detailed Documentation
- Library Reference: See
references/library-reference.md for complete openpyxl, pandas, and xlsxwriter documentation
- Best Practices: See
references/best-practices.md for performance optimization, error handling, and common pitfalls
Complete Examples
- Workflow Examples:
examples/workflow-examples.md - Charts, conditional formatting, data validation, multi-sheet operations
- Financial Reports:
examples/financial-report.md - Income statement with dynamic formulas
- Data Transformation:
examples/data-transformation.md - CSV to Excel with pandas integration
- Dashboard Creation:
examples/dashboard-creation.md - Multi-chart dashboard with KPIs
External Links
Summary
This skill enables comprehensive Excel automation including:
- Creating complex spreadsheets with formulas and formatting
- Reading and analyzing existing workbooks
- Editing files while preserving formulas and styles
- Creating professional charts and visualizations
- Applying conditional formatting and data validation
- Working with multiple worksheets and cross-sheet formulas
- Integrating with pandas for advanced data analysis
- Handling large datasets efficiently
Use this skill for any task involving Excel files, from simple data entry to complex financial reports and dashboards.