SQL & Backend Guardrails
Full documentation: See SQL_BEST_PRACTICES.md for complete reference.
Purpose
Prevent SQL-related bugs and ensure v2 API contract compliance. This skill acts as a guardrail for all backend database operations.
Part 1: Mandatory Checks Before Writing SQL
When This Activates
- Writing or modifying SQL queries (raw or ORM)
- Creating or updating service functions that touch the database
- Adding new API endpoints
- Refactoring existing database logic
The "Must Do" Checklist
✅ EVERY SQL QUERY MUST:
├── Parameter Style
│ ├── Use :param bind parameters ONLY
│ ├── NO %(param)s (psycopg2-specific)
│ └── NO f-string interpolation
│
├── Date Parameters
│ ├── Pass Python date/datetime objects
│ ├── NO string dates ("2024-01-01")
│ └── NO ::date casting unless schema mismatch
│
├── Enum Values
│ ├── Use contract_schema.py methods
│ ├── SaleType.to_db() for DB values
│ └── NO hardcoded strings ('Resale', 'New Sale')
│
├── Outlier Filtering
│ └── Use COALESCE(is_outlier, false) = false
│
└── Location
├── SQL in services/*_service.py
├── Pure logic in *_compute.py
└── Routes: parse params → call service → return
Part 2: Forbidden Patterns
python
1# ❌ FORBIDDEN: psycopg2-style parameters
2WHERE sale_type = %(sale_type)s
3
4# ❌ FORBIDDEN: Mixed parameter styles
5WHERE date >= :date_from AND type = %(type)s
6
7# ❌ FORBIDDEN: f-string interpolation
8query = f"WHERE psf > {min_psf}"
9
10# ❌ FORBIDDEN: String dates
11params = {"date_from": "2024-01-01"}
12
13# ❌ FORBIDDEN: Hardcoded enum strings
14if sale_type == 'New Sale':
15
16# ❌ FORBIDDEN: OR IS NULL pattern
17WHERE is_outlier = false OR is_outlier IS NULL
Part 3: Correct Patterns
SQL Parameter Style
python
1# ✅ CORRECT
2query = text("""
3 SELECT project, COUNT(*) as count
4 FROM transactions
5 WHERE COALESCE(is_outlier, false) = false
6 AND sale_type = :sale_type
7 AND transaction_date >= :date_from
8 AND transaction_date <= :date_to
9 AND psf > :psf_min
10""")
11
12params = {
13 "sale_type": SaleType.to_db(SaleType.RESALE),
14 "date_from": date(2023, 1, 1),
15 "date_to": date.today(),
16 "psf_min": 500
17}
18
19result = db.session.execute(query, params)
Enum Handling
python
1# ✅ CORRECT - Backend
2from api.contracts.contract_schema import SaleType
3
4sale_type_db = SaleType.to_db(sale_type_enum) # → "Resale"
javascript
1// ✅ CORRECT - Frontend
2import { SaleType, isSaleType } from '../schemas/apiContract';
3
4const isNew = isSaleType.newSale(row.saleType);
Part 4: v2 API Compliance
Endpoint Requirements
| Requirement | Implementation |
|---|
Support ?schema=v2 | Check query param |
| Default: dual-mode | Both snake_case + camelCase |
| v2: strict mode | camelCase only, enums only |
Response Pattern
python
1# Default response (backward compatible)
2{
3 "median_psf": 1500, # v1 (deprecated)
4 "medianPsf": 1500, # v2
5}
6
7# With ?schema=v2
8{
9 "medianPsf": 1500 # v2 only
10}
Part 5: Pre-Commit Validation
Before Any Backend Change, Verify:
- Parameter Style: No
%(...)s in any SQL
- Date Types: All date params are Python objects
- Enum Usage: No hardcoded DB strings
- Outlier Filter: Uses
COALESCE pattern
- File Location: SQL in services, not routes
- v2 Compliance: New endpoints support
?schema=v2
Part 6: Placeholder/Param Validation
The Problem
SQL placeholders (:param) are strings. Python dict keys are strings. No compiler catches mismatches.
python
1# BUG: SQL uses :max_date but params dict doesn't have it
2sql = "WHERE date < :max_date_exclusive AND date > :max_date - INTERVAL '12 months'"
3params = {"max_date_exclusive": tomorrow} # ❌ Missing max_date!
The Fix: Validate Before Execute
python
1import re
2
3def validate_sql_params(sql: str, params: dict) -> None:
4 """Fail fast if SQL placeholders don't match params."""
5 placeholders = set(re.findall(r':([a-zA-Z_][a-zA-Z0-9_]*)', sql))
6 param_keys = set(params.keys())
7
8 missing = placeholders - param_keys
9 if missing:
10 raise ValueError(f"SQL placeholders missing from params: {missing}")
11
12 unused = param_keys - placeholders
13 if unused:
14 # Warning only - unused params are safe but wasteful
15 logger.warning(f"Unused params (not in SQL): {unused}")
Where to add: In your DB helper so ALL queries benefit automatically.
Part 6b: Common Mistakes Quick Reference
| Anti-Pattern | Symptom | Grep to Find | Fix |
|---|
%(param)s style | Works locally, breaks in prod | grep -rn "%(.*)" backend/ | Use :param style |
| String dates | Query returns wrong results | grep -rn '"20[0-9][0-9]-' backend/services/ | Use Python date() objects |
| Hardcoded enums | Filter mismatches | grep -rn "'Resale'|'New Sale'" backend/ | Use SaleType.to_db() |
| Missing COALESCE | Outliers included | grep -rn "is_outlier = false" backend/ | Use COALESCE(is_outlier, false) = false |
| Mixed date params | Silent data gaps | grep -rn "max_date.*max_date_exclusive" backend/ | Pick ONE convention per query |
| SQL in routes | Scattered logic | grep -rn "text\(" backend/routes/ | Move to services/ |
Quick Audit Commands
bash
1# Find psycopg2-style params (FORBIDDEN)
2grep -rn "%(" backend/services/ backend/routes/
3
4# Find hardcoded date strings
5grep -rn '"20[0-9][0-9]-[0-9][0-9]' backend/
6
7# Find hardcoded sale type strings
8grep -rn "'Resale'\|'New Sale'\|'Sub Sale'" backend/
9
10# Find missing COALESCE on outliers
11grep -rn "is_outlier = false" backend/ | grep -v COALESCE
12
13# Find f-string interpolation in SQL (DANGEROUS)
14grep -rn 'f".*WHERE\|f".*SELECT\|f".*FROM' backend/
Part 7: Testing Requirements
Required Tests for SQL Changes
python
1# Unit test: Pure computation
2def test_calculate_metrics():
3 result = calculate_metrics(sample_data)
4 assert result['median'] == expected
5
6# Integration test: Valid query
7def test_endpoint_returns_data():
8 response = client.get('/api/endpoint')
9 assert response.status_code == 200
10
11# Contract test: v2 shape
12def test_v2_response_is_camel_case():
13 response = client.get('/api/endpoint?schema=v2')
14 data = response.json()
15 assert 'medianPsf' in data
16 assert 'median_psf' not in data
Part 8: SQL Correctness Patterns
Stable Keys for Joins
sql
1-- ❌ WRONG - display name changes break joins
2SELECT * FROM transactions t
3JOIN projects p ON t.project_name = p.project_name
4
5-- ✅ CORRECT - stable identifier
6SELECT * FROM transactions t
7JOIN projects p ON t.project_id = p.project_id
Deterministic Aggregations
sql
1-- ❌ WRONG - non-deterministic (arbitrary tie-break)
2SELECT MODE() WITHIN GROUP (ORDER BY project_name) as canonical_name
3
4-- ❌ WRONG - unordered array
5SELECT array_agg(project_name) as names
6
7-- ✅ CORRECT - explicit ordering/tie-break
8SELECT project_name
9FROM projects
10ORDER BY transaction_count DESC, project_id -- id breaks ties
11LIMIT 1
12
13-- ✅ CORRECT - ordered array
14SELECT array_agg(project_name ORDER BY project_id) as names
Two-Phase Pattern (Invariant Truth)
sql
1-- ❌ WRONG - filter changes the definition of launch_date
2WITH filtered AS (
3 SELECT * FROM transactions WHERE district = 'D01'
4)
5SELECT project_id, MIN(transaction_date) as launch_date
6FROM filtered
7GROUP BY project_id -- Launch date shifts based on filter!
8
9-- ✅ CORRECT - compute globally, then filter membership
10WITH launch_dates AS (
11 -- Phase A: Compute truth globally (no filters)
12 SELECT project_id, MIN(transaction_date) as launch_date
13 FROM transactions
14 GROUP BY project_id
15),
16filtered_projects AS (
17 -- Phase B: Filter is membership, not redefinition
18 SELECT DISTINCT project_id
19 FROM transactions
20 WHERE district = 'D01'
21)
22SELECT ld.*
23FROM launch_dates ld
24WHERE ld.project_id IN (SELECT project_id FROM filtered_projects)
Static SQL with NULL Guards
sql
1-- ❌ FORBIDDEN - Dynamic string building in Python
2query = "SELECT * FROM txn WHERE 1=1"
3if bedroom:
4 query += f" AND bedroom = {bedroom}" -- Injection risk!
5
6-- ✅ CORRECT - Static SQL, all filters in one query
7SELECT * FROM transactions
8WHERE COALESCE(is_outlier, false) = false
9 AND (:bedroom IS NULL OR bedroom = :bedroom)
10 AND (:district IS NULL OR district = :district)
11 AND (:date_from IS NULL OR transaction_date >= :date_from)
12 AND (:date_to IS NULL OR transaction_date < :date_to)
DB Does Set Work (No N+1)
python
1# ❌ WRONG - O(n²) Python loop
2project_names = db.execute("SELECT DISTINCT project_name FROM txn").fetchall()
3for name in project_names:
4 units = db.execute("SELECT * FROM units WHERE project_name = :n", {"n": name})
5 # N queries for N projects!
6
7# ✅ CORRECT - Single bulk join
8result = db.execute("""
9 SELECT t.project_name, u.unit_info
10 FROM (SELECT DISTINCT project_name FROM transactions) t
11 LEFT JOIN units u ON t.project_name = u.project_name
12""")
Quick Reference Card
SQL GUARDRAILS CHECKLIST
[ ] :param style only (no %(param)s)
[ ] Python date objects (no strings)
[ ] Enums via contract_schema.py
[ ] COALESCE(is_outlier, false) = false
[ ] Parameterized numeric values
[ ] SQL in service files
[ ] Placeholders match params.keys()
[ ] Exclusive bounds only (:max_date_exclusive, not :max_date)
[ ] v2 endpoint support
[ ] Tests for v1, v2, edge cases
Sign-Off Template
Before marking SQL work as complete:
markdown
1## SQL Change Sign-Off
2
3### Change Summary
4[Brief description]
5
6### Guardrail Compliance
7- [x] Parameter style: :param only
8- [x] Date handling: Python objects
9- [x] Enum handling: contract_schema.py
10- [x] Outlier filter: COALESCE pattern
11- [x] SQL location: services/ directory
12
13### v2 Compliance
14- [x] Supports ?schema=v2
15- [x] Returns camelCase in v2 mode
16- [x] Tests pass for both modes
17
18Verified by: [name]
19Date: [date]