Django Error Handling and Logging Skill
Purpose
Define and enforce error handling and logging standards for enterprise Django 6 systems, ensuring consistent error responses, structured logging, audit trails, and observability.
Scope
- Exception hierarchy and custom exceptions
- Error handling in services, resolvers, and views
- GraphQL error standardization
- Structured logging configuration
- Audit logging
- Error monitoring and alerting
- Async error handling
Responsibilities
- ENFORCE structured, JSON-formatted logging in production.
- ENFORCE custom exception hierarchy for domain errors.
- ENFORCE standardized error responses in GraphQL.
- ENFORCE audit logging for sensitive operations.
- PREVENT stack trace leakage to clients.
- PREVENT silent error swallowing.
Mandatory Rules
ALWAYS
- ALWAYS use Python's `logging` module. Never use `print()`.
- ALWAYS use structured JSON logging in production:
```python
LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'json': {
            # Requires the python-json-logger package.
            '()': 'pythonjsonlogger.jsonlogger.JsonFormatter',
            'format': '%(asctime)s %(name)s %(levelname)s %(message)s %(pathname)s %(lineno)d',
        },
    },
    'handlers': {
        'console': {'class': 'logging.StreamHandler', 'formatter': 'json'},
    },
    'loggers': {
        # propagate=False keeps records from also reaching the root handler,
        # which would otherwise emit every line twice.
        'django': {'handlers': ['console'], 'level': 'WARNING', 'propagate': False},
        'notifications': {'handlers': ['console'], 'level': 'INFO', 'propagate': False},
        'audit': {'handlers': ['console'], 'level': 'INFO', 'propagate': False},
    },
    'root': {'handlers': ['console'], 'level': 'INFO'},
}
```
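As a rough illustration of what a JSON formatter does with each record, the sketch below uses only the standard library. `MiniJsonFormatter` and `emit_sample` are hypothetical names introduced here for demonstration; in production the `pythonjsonlogger` formatter from the configuration above would take this role.

```python
import io
import json
import logging


class MiniJsonFormatter(logging.Formatter):
    """Hypothetical stand-in for a JSON formatter (illustration only)."""

    FIELDS = ("name", "levelname", "pathname", "lineno")

    def format(self, record: logging.LogRecord) -> str:
        # Copy selected LogRecord attributes into a flat dict.
        payload = {field: getattr(record, field) for field in self.FIELDS}
        payload["asctime"] = self.formatTime(record)
        payload["message"] = record.getMessage()
        if record.exc_info:
            payload["exc_info"] = self.formatException(record.exc_info)
        return json.dumps(payload, default=str)


def emit_sample() -> dict:
    """Log one record through the formatter and return the parsed JSON."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(MiniJsonFormatter())
    demo_logger = logging.getLogger("notifications.demo")
    demo_logger.setLevel(logging.INFO)
    demo_logger.propagate = False  # keep the demo output out of other handlers
    demo_logger.addHandler(handler)
    try:
        demo_logger.info("Notification %s created", 42)
    finally:
        demo_logger.removeHandler(handler)
    return json.loads(stream.getvalue())
```

Each log line becomes one JSON object, which is what lets an aggregator index fields such as `levelname` and `name` without parsing free text.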
- ALWAYS define a custom exception hierarchy for domain errors:
```python
# core/exceptions.py
class AppError(Exception):
    """Base exception for all application errors."""

    def __init__(self, message: str, code: str = "INTERNAL_ERROR"):
        self.message = message
        self.code = code
        super().__init__(message)


class NotFoundError(AppError):
    def __init__(self, resource: str, identifier):
        super().__init__(f"{resource} not found: {identifier}", code="NOT_FOUND")


class PermissionDeniedError(AppError):
    def __init__(self, message: str = "Permission denied"):
        super().__init__(message, code="PERMISSION_DENIED")


# Shares its name with django.core.exceptions.ValidationError;
# import it explicitly from core.exceptions to avoid mixing the two up.
class ValidationError(AppError):
    def __init__(self, message: str, field: str | None = None):
        self.field = field
        super().__init__(message, code="VALIDATION_ERROR")


class ConflictError(AppError):
    def __init__(self, message: str):
        super().__init__(message, code="CONFLICT")
```
- ALWAYS catch specific exceptions. Never use bare `except:` or `except Exception:` without re-raising:
```python
# CORRECT
try:
    notification = Notification.objects.get(id=nid, recipient=user)
except Notification.DoesNotExist as exc:
    # Chain the original exception so the cause stays visible in server logs
    raise NotFoundError("Notification", nid) from exc

# WRONG
try:
    ...
except:
    pass
```
- ALWAYS log exceptions with full traceback at ERROR level:
```python
import logging

logger = logging.getLogger(__name__)

try:
    result = service.process(data)
except AppError:
    raise  # Domain errors propagate normally
except Exception as exc:
    logger.exception("Unexpected error in process()")  # Logs full traceback
    raise AppError("An unexpected error occurred") from exc
```
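The difference between `logger.error()` and `logger.exception()` inside an `except` block can be checked directly. The sketch below captures the output of both calls in memory; `capture` and the logger name `traceback.demo` are hypothetical names used only for this demonstration.

```python
import io
import logging


def capture(log_call) -> str:
    """Run log_call(logger) inside an except block and return what was written."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    demo_logger = logging.getLogger("traceback.demo")
    demo_logger.setLevel(logging.ERROR)
    demo_logger.propagate = False
    demo_logger.addHandler(handler)
    try:
        try:
            int("not-a-number")  # raises ValueError
        except ValueError:
            log_call(demo_logger)
    finally:
        demo_logger.removeHandler(handler)
    return stream.getvalue()


# logger.error() records only the message.
with_error = capture(lambda log: log.error("parse failed"))
# logger.exception() records the message plus the active traceback.
with_exception = capture(lambda log: log.exception("parse failed"))
```

Both calls log at ERROR level; only the second one attaches the traceback of the exception currently being handled.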
- ALWAYS return standardized error responses from GraphQL resolvers:
```python
@strawberry.mutation(permission_classes=[IsAuthenticated])
def mark_read(self, info: Info, notification_id: int) -> NotificationType | None:
    user = info.context["request"].user
    try:
        notification = notification_service.mark_as_read(notification_id, user)
        return NotificationType.from_model(notification)
    except (NotFoundError, PermissionDeniedError) as exc:
        # Expected domain errors: record them below ERROR level instead of swallowing them
        logger.warning("mark_read rejected: %s", exc.code, extra={"notification_id": notification_id})
        return None
    except Exception:
        # Lazy %-style arguments, consistent with the logging rules in this document
        logger.exception("Error marking notification %s as read", notification_id)
        return None
```
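One possible shape for a standardized response is an envelope that carries either data or an error code and message. The sketch below is framework-free and rests on assumptions: the `data`/`error` envelope, the `run_mutation` helper, and `GENERIC_MESSAGE` are hypothetical, and in a Strawberry schema the same idea would typically be expressed as a typed payload rather than a plain dict.

```python
import logging

logger = logging.getLogger(__name__)

GENERIC_MESSAGE = "An unexpected error occurred"  # hypothetical client-facing text


class AppError(Exception):
    """Minimal copy of the base domain error, so the sketch is self-contained."""

    def __init__(self, message: str, code: str = "INTERNAL_ERROR"):
        self.message = message
        self.code = code
        super().__init__(message)


class NotFoundError(AppError):
    def __init__(self, resource: str, identifier):
        super().__init__(f"{resource} not found: {identifier}", code="NOT_FOUND")


def run_mutation(operation) -> dict:
    """Call operation() and wrap the outcome in a uniform envelope."""
    try:
        return {"data": operation(), "error": None}
    except AppError as exc:
        # Domain errors carry a message that is safe to show to the client.
        return {"data": None, "error": {"code": exc.code, "message": exc.message}}
    except Exception:
        # Full details stay in the server log; the client sees a generic message.
        logger.exception("Unhandled error in resolver")
        return {"data": None, "error": {"code": "INTERNAL_ERROR", "message": GENERIC_MESSAGE}}


def _missing():
    raise NotFoundError("Notification", 7)


not_found = run_mutation(_missing)
unexpected = run_mutation(lambda: 1 / 0)
```

With this shape, clients branch on `error.code`, and the text of an unexpected exception (here a `ZeroDivisionError`) never leaves the server.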
- ALWAYS include contextual information in log messages:
```python
logger.info("Notification created", extra={
    "notification_id": notification.id,
    "recipient_id": notification.recipient_id,
    "notification_type": notification.notification_type,
    "actor_id": actor.id if actor else None,
})
```
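Keys passed through `extra` become attributes on the `LogRecord`, which is how a structured formatter can pick them up. The sketch below shows this with an in-memory handler, and also shows that keys colliding with built-in record attributes such as `message` are rejected with a `KeyError`. `ListHandler` and the logger name `context.demo` are hypothetical names for this demonstration.

```python
import logging


class ListHandler(logging.Handler):
    """Collects records in memory so they can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.records: list[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


demo_logger = logging.getLogger("context.demo")
demo_logger.setLevel(logging.INFO)
demo_logger.propagate = False


def log_with_context() -> logging.LogRecord:
    """Emit one record with extra fields and return it."""
    handler = ListHandler()
    demo_logger.addHandler(handler)
    try:
        demo_logger.info(
            "Notification created",
            extra={"notification_id": 7, "recipient_id": 3},
        )
    finally:
        demo_logger.removeHandler(handler)
    return handler.records[0]


def reserved_key_is_rejected() -> bool:
    """Return True if logging refuses an extra key named 'message'."""
    try:
        demo_logger.info("Notification created", extra={"message": "clash"})
    except KeyError:
        return True
    return False
```

Prefixing or namespacing context keys (for example `notification_id` rather than `name` or `message`) avoids such collisions.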
- ALWAYS use separate loggers for different concerns:
```python
import logging

logger = logging.getLogger(__name__)            # Module-level logger
audit_logger = logging.getLogger('audit')       # Audit events
perf_logger = logging.getLogger('performance')  # Performance metrics
```
- ALWAYS use `logger.exception()` (not `logger.error()`) when logging caught exceptions, because it includes the traceback.
- ALWAYS set appropriate log levels:
  - DEBUG: Detailed diagnostic info (dev only)
  - INFO: Routine operations (notification created, task started)
  - WARNING: Unexpected but recoverable situations (cache miss, retry)
  - ERROR: Failures requiring attention (DB error, external API failure)
  - CRITICAL: System-level failures (Redis down, DB connection lost)
NEVER
- NEVER use `print()` for logging or debugging.
- NEVER use bare `except:` or `except Exception: pass`.
- NEVER log sensitive data: passwords, tokens, API keys, full credit card numbers, SSNs.
- NEVER expose internal stack traces, file paths, or SQL queries in API responses.
- NEVER silently swallow exceptions without logging.
- NEVER use eager string formatting (f-strings, `%`, or `.format()`) in logger calls. Use lazy formatting:
```python
# CORRECT: lazy formatting, the message is built only if the log level is active
logger.info("User %s created notification %s", user_id, notification_id)
# ALSO CORRECT: extra dict
logger.info("Notification created", extra={"user_id": user_id, "notif_id": notification_id})
# WRONG: the f-string is always built, even if the log level is disabled
logger.debug(f"Processing {expensive_computation()}")
```
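The cost difference can be made visible by counting how often an object is rendered to a string. In the sketch below, `Expensive` and `count_renders` are hypothetical names, and the logger is set to INFO so that DEBUG calls are disabled.

```python
import logging


class Expensive:
    """Counts how many times it is converted to a string."""

    def __init__(self) -> None:
        self.renders = 0

    def __str__(self) -> str:
        self.renders += 1
        return "expensive-value"


demo_logger = logging.getLogger("lazy.demo")
demo_logger.setLevel(logging.INFO)  # DEBUG messages are disabled


def count_renders() -> tuple[int, int]:
    """Return (renders with f-string, renders with lazy arguments)."""
    eager, lazy = Expensive(), Expensive()
    demo_logger.debug(f"Processing {eager}")   # f-string is built before the call
    demo_logger.debug("Processing %s", lazy)   # formatting is skipped entirely
    return eager.renders, lazy.renders
```

Lazy formatting defers only the string interpolation. An argument expression such as a function call is still evaluated before the logger is invoked, so a genuinely expensive computation is better wrapped in an `if logger.isEnabledFor(logging.DEBUG):` guard.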
- NEVER log at ERROR level for expected business conditions (e.g., validation failures). Use WARNING or INFO.
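For the rule against logging sensitive data, a logging filter can act as a safety net in addition to code review. The sketch below masks known-sensitive keys passed via `extra`; `SENSITIVE_KEYS`, `RedactSensitiveFilter`, and the other names are assumptions introduced for illustration, and the key list would need to match the project's own field names.

```python
import logging

# Hypothetical list of field names that must never reach the logs in clear text.
SENSITIVE_KEYS = frozenset({"password", "token", "api_key", "card_number", "ssn"})


class RedactSensitiveFilter(logging.Filter):
    """Replaces sensitive extra fields on a record with a placeholder."""

    def filter(self, record: logging.LogRecord) -> bool:
        for key in SENSITIVE_KEYS:
            if key in record.__dict__:
                setattr(record, key, "[REDACTED]")
        return True  # never drop the record, only mask its fields


class ListHandler(logging.Handler):
    """Collects records in memory so they can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.records: list[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


def demo_redaction() -> logging.LogRecord:
    """Log a record containing a password and return it after filtering."""
    handler = ListHandler()
    handler.addFilter(RedactSensitiveFilter())
    demo_logger = logging.getLogger("redaction.demo")
    demo_logger.setLevel(logging.INFO)
    demo_logger.propagate = False
    demo_logger.addHandler(handler)
    try:
        demo_logger.info("Login attempt", extra={"user_id": 5, "password": "hunter2"})
    finally:
        demo_logger.removeHandler(handler)
    return handler.records[0]
```

This only covers structured fields. A secret interpolated into the message text itself is not caught, so the primary control remains not passing such values to the logger at all.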
Error Handling Layers
┌─────────────────────────────┐
│ GraphQL Resolver │ Catch AppError → return structured response
│ │ Catch Exception → log, return generic error
├─────────────────────────────┤
│ Service Layer │ Raise AppError subclasses for business errors
│ │ Let unexpected exceptions propagate
├─────────────────────────────┤
│ Model Layer │ Raise Django ValidationError in clean()
│ │ Raise IntegrityError on constraint violations
├─────────────────────────────┤
│ Database │ Raises OperationalError, IntegrityError
└─────────────────────────────┘
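The boundary between the lower layers and the service layer in the diagram can be sketched without Django as follows. Everything here is a stand-in: `IntegrityError` mimics `django.db.IntegrityError`, and `save_subscription` and `subscribe` are hypothetical model-layer and service-layer functions.

```python
class AppError(Exception):
    """Minimal copy of the base domain error, so the sketch is self-contained."""

    def __init__(self, message: str, code: str = "INTERNAL_ERROR"):
        self.message = message
        self.code = code
        super().__init__(message)


class ConflictError(AppError):
    def __init__(self, message: str):
        super().__init__(message, code="CONFLICT")


class IntegrityError(Exception):
    """Stand-in for django.db.IntegrityError so the sketch runs without Django."""


_rows: set[tuple[int, str]] = set()  # pretend table with a unique constraint


def save_subscription(user_id: int, topic: str) -> None:
    """Hypothetical model/database layer: raises on a constraint violation."""
    if (user_id, topic) in _rows:
        raise IntegrityError("UNIQUE constraint failed: subscription.user_id, subscription.topic")
    _rows.add((user_id, topic))


def subscribe(user_id: int, topic: str) -> None:
    """Hypothetical service layer: translates known low-level errors into domain errors."""
    try:
        save_subscription(user_id, topic)
    except IntegrityError as exc:
        # Business meaning is attached here; the SQL detail stays in __cause__.
        raise ConflictError("Already subscribed to this topic") from exc
    # Any other exception is deliberately not caught and propagates to the resolver.
```

The resolver then only needs to distinguish `AppError` from everything else, and the constraint text never appears in the client-facing message.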
Audit Logging
Log these events to the audit logger:
- Authentication: login, logout, failed login, password change
- Authorization: permission denied events
- Data mutations: create, update, delete of sensitive models
- Admin actions: role changes, user management
- Configuration changes: feature flags, settings updates
Format:
```python
from django.utils import timezone

# get_client_ip is a project-level helper, not part of Django
audit_logger.info("action_performed", extra={
    "timestamp": timezone.now().isoformat(),
    "user_id": user.id,
    "action": "MARK_ALL_READ",
    "resource_type": "Notification",
    "resource_count": count,
    "ip_address": get_client_ip(request),
})
```
Monitoring and Alerting
Alert Conditions
- ERROR rate > 1% of requests in 5-minute window
- CRITICAL log emitted
- Unhandled exception in resolver
- Celery task failure rate > 5%
- SSE connection error rate > 10%
- Database connection pool exhaustion
Tooling
- Use Sentry for exception tracking and alerting.
- Use structured logs with ELK/Grafana Loki for log aggregation.
- Use Prometheus metrics for request latency and error rates.
Async Error Handling
```python
import asyncio
import logging

logger = logging.getLogger(__name__)


async def sse_event_stream(user, connection_id):
    try:
        yield format_sse_event(event='connected', data={})
        async for message in pubsub.listen():
            yield format_sse_event(data=message)
    except asyncio.CancelledError:
        logger.info("SSE stream cancelled for user %s", user.id)
        raise  # Re-raise so the cancellation is not swallowed
    except Exception:
        logger.exception("SSE stream error for user %s", user.id)
        yield format_sse_event(event='error', data={"message": "Stream error"})
    finally:
        # Runs on normal completion, error, and cancellation alike
        await cleanup_connection(user.id, connection_id)
```
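The cancellation path of a stream like the one above can be exercised in isolation. The sketch below uses stand-ins throughout: `fake_pubsub` replaces `pubsub.listen()`, and the `cleaned_up` list replaces `cleanup_connection`. It shows that re-raising `asyncio.CancelledError` keeps the task marked as cancelled while the `finally` block still runs.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)
cleaned_up: list[str] = []  # stand-in for the effect of cleanup_connection()


async def fake_pubsub():
    """Hypothetical message source that yields forever."""
    n = 0
    while True:
        await asyncio.sleep(0.01)
        n += 1
        yield f"message-{n}"


async def event_stream(connection_id: str):
    try:
        async for message in fake_pubsub():
            yield message
    except asyncio.CancelledError:
        logger.info("stream %s cancelled", connection_id)
        raise  # propagate so the surrounding task ends as cancelled
    finally:
        cleaned_up.append(connection_id)


async def consume(connection_id: str, received: list[str]) -> None:
    async for message in event_stream(connection_id):
        received.append(message)


async def main() -> bool:
    """Start a consumer, cancel it, and report whether it ended as cancelled."""
    received: list[str] = []
    task = asyncio.create_task(consume("conn-1", received))
    await asyncio.sleep(0.05)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        return True
    return False
```

Running `asyncio.run(main())` drives the example. If the `raise` were removed, the generator would finish normally and the caller could no longer tell a cancelled stream from a completed one.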
Security Considerations
- Never expose internal error details to clients in production.
- Log full error details server-side for debugging.
- Use error codes (not messages) for client-side error handling.
- Sanitize user input in error messages to prevent log injection.
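For the log-injection point, a small sanitizer applied to user-supplied values before they are logged might look like the following. `sanitize_for_log` and its `max_length` default are hypothetical choices for illustration.

```python
def sanitize_for_log(value: str, max_length: int = 200) -> str:
    """Neutralize line breaks and control characters in user-supplied text."""
    # Make CR/LF visible instead of letting them start a forged log line.
    cleaned = value.replace("\r", "\\r").replace("\n", "\\n")
    # Replace any remaining non-printable character (tabs, escapes, etc.).
    cleaned = "".join(ch if ch.isprintable() else "?" for ch in cleaned)
    # Bound the length so one field cannot flood the log.
    return cleaned[:max_length]


forged = "bob\n2024-01-01 ERROR admin logged in"
safe = sanitize_for_log(forged)
```

JSON formatters already escape newlines inside string values, so this matters most for plain-text log formats and for values that end up in the message itself.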
Refusal Conditions
REFUSE to generate code that:
- Uses `print()` for logging.
- Uses bare `except:` without re-raising.
- Exposes stack traces to API clients.
- Logs sensitive data.
- Silently swallows exceptions.
- Uses eager string formatting in logger calls for debug/info levels.
Trade-off Handling
| Trade-off | Decision |
|---|---|
| Verbose vs Concise logging | Verbose in dev (DEBUG). Concise in prod (INFO+). |
| Log everything vs Log selectively | Log all mutations and errors. Skip high-volume reads at DEBUG level. |
| Inline error handling vs Middleware | Service errors in services. Global fallback in middleware. |
| Sentry vs Self-hosted | Sentry for simplicity. Self-hosted ELK for cost control at scale. |