Purpose: Systematically diagnose and fix pagination failures that prevent complete data import from APIs
Core Principles
1. Verify API Response Structure Before Assuming
Never assume pagination fields based on documentation or other endpoints. Always test actual responses:
```bash
curl -s API_ENDPOINT | jq 'keys'
```
Different API versions or endpoints may use different pagination patterns even within the same service.
2. Identify the Pagination Pattern
APIs use distinct pagination patterns that require different implementations:
- Cursor-based: `{nextPageCursor, results}` - use cursor param
- Page-based: `{page, total_pages, results}` - use page number param
- Offset-based: `{offset, limit, total}` - use offset/limit params
- Link-based: `{next, previous, results}` - follow next URL
Using the wrong pattern causes pagination to stop after the first page.
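The pattern can often be recognized from the top-level response keys alone. A minimal sketch, using the common field spellings listed above (real APIs may use other names):

```python
def detect_pagination(response_keys):
    """Guess the pagination pattern from top-level response keys."""
    keys = set(response_keys)
    if "nextPageCursor" in keys:
        return "cursor"          # cursor-based: pass pageCursor param
    if "page" in keys or "total_pages" in keys:
        return "page"            # page-based: pass page number param
    if "offset" in keys and "limit" in keys:
        return "offset"          # offset-based: pass offset/limit params
    if "next" in keys and "previous" in keys:
        return "link"            # link-based: follow the next URL
    return "unknown"

# e.g. detect_pagination(["count", "nextPageCursor", "results"]) -> "cursor"
```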
3. Optimize Page Size for Efficiency
Most APIs support configurable page sizes (e.g., 50-1000 items per page). Using maximum page_size:
- Reduces total API calls (20x fewer calls with 1000 vs 50)
- Decreases network overhead
- Minimizes rate limit exposure
- Speeds up bulk imports
4. Test Pagination Manually
Before implementing pagination logic:
- Fetch page 1 and inspect response structure
- Manually fetch page 2 to confirm field values
- Verify cursor/page advancement works correctly
- Check termination condition (null cursor, empty results, etc.)
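The checklist above can be automated for cursor-based APIs. A hedged sketch: `fetch_page` is a caller-supplied placeholder for your HTTP call, and the `nextPageCursor`/`results` field names are the cursor-based spellings assumed throughout this guide:

```python
def verify_cursor_advances(fetch_page):
    """Fetch two pages and confirm the cursor advances and results differ.

    `fetch_page(cursor=...)` must return the parsed JSON dict for one page.
    Returns False if there is no cursor field (wrong field name, or a
    single-page dataset) or if page 2 repeats page 1.
    """
    page1 = fetch_page(cursor=None)
    cursor = page1.get("nextPageCursor")
    if not cursor:
        return False  # missing/null cursor: wrong field name or one page only
    page2 = fetch_page(cursor=cursor)
    return (page2.get("results") != page1.get("results")
            and page2.get("nextPageCursor") != cursor)
```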
Systematic Debugging Workflow
Step 1: Reproduce the Issue
Symptoms of pagination failure:
- Import stops after exactly 1 page
- Returns same results repeatedly
- Status shows "completed_all_pages" but dataset incomplete
- Missing data compared to known totals
Example:
Expected: 74,386 highlights
Actual: 463 files (< 1% of total)
Status: "completed_all_pages" after 1 page
Step 2: Inspect Actual API Response
Don't trust assumptions - verify response structure:
```bash
# Fetch first page and check structure
curl -s -H "Authorization: Token $TOKEN" \
  "https://api.example.com/endpoint?page_size=50" | jq 'keys'

# Expected output reveals actual fields:
# ["count", "nextPageCursor", "results"]
# NOT ["count", "next", "previous", "results"]
```
Step 3: Compare Expected vs Actual Fields
Common mismatches:
| Expected (Wrong) | Actual (Correct) | Impact |
|---|---|---|
| `next` | `nextPageCursor` | Stops after page 1 |
| `page` parameter | `pageCursor` parameter | Repeats page 1 |
| Page number increment | Cursor advancement | Never progresses |
| `has_more` boolean | null cursor | Wrong termination check |
Step 4: Test Second Page Manually
Verify pagination actually works:
```bash
# Get page 1
PAGE1=$(curl -s -H "Authorization: Token $TOKEN" \
  "https://api.example.com/endpoint?page_size=50")

# Extract cursor
CURSOR=$(echo "$PAGE1" | jq -r '.nextPageCursor')

# Get page 2 using cursor
curl -s -H "Authorization: Token $TOKEN" \
  "https://api.example.com/endpoint?page_size=50&pageCursor=$CURSOR" \
  | jq '{count, nextPageCursor, results_count: (.results | length)}'
```
Expected results:
- Different `results` array contents
- New `nextPageCursor` value (or null if last page)
- Progress toward completion
Step 5: Fix the Implementation
Update the implementation to match the API design. For cursor-based pagination:
```python
# Initialize
cursor = None
page_num = 0

while True:
    page_num += 1

    # Build params
    params = {"page_size": 1000}  # Use maximum
    if cursor:
        params["pageCursor"] = cursor  # Use correct param name

    # Fetch page
    response = fetch_api(endpoint, params)
    results = response.get("results", [])

    if not results:
        break  # Empty results = done

    # Process results
    for item in results:
        process(item)

    # Get next cursor
    next_cursor = response.get("nextPageCursor")  # Use correct field name
    if not next_cursor:
        break  # No more pages

    cursor = next_cursor  # Advance cursor
```
For page-based pagination:
```python
# Initialize
page_num = 1

while True:
    # Build params
    params = {"page": page_num, "page_size": 1000}

    # Fetch page
    response = fetch_api(endpoint, params)
    results = response.get("results", [])

    if not results:
        break

    # Process results
    for item in results:
        process(item)

    # Check if more pages exist
    if page_num >= response.get("total_pages", page_num):
        break

    page_num += 1  # Increment page number
```
Step 6: Verify Fix with Logging
Add debug logging to confirm pagination works:
```python
logger.info(f"Page {page_num}: {len(results)} items, cursor={cursor}, next={next_cursor}")
```
Expected log output:
```
Page 1: 1000 items, cursor=None, next=55771679
Page 2: 1000 items, cursor=55771679, next=55114962
Page 3: 1000 items, cursor=55114962, next=54503291
...
Page 75: 386 items, cursor=12847563, next=null
```
Step 7: Optimize Page Size
Before optimization:
```python
params = {"page_size": 50}  # Small pages
# Result: 1,488 pages needed for 74,386 items
```
After optimization:
```python
params = {"page_size": 1000}  # Maximum supported
# Result: 75 pages needed for 74,386 items
# Improvement: 20x fewer API calls
```
Check API documentation for:
- Maximum page_size allowed
- Rate limits (larger pages = fewer calls)
- Response time vs page size tradeoffs
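The call-count arithmetic behind the tradeoff is just a ceiling division. Using the Readwise figures from this guide:

```python
import math

def api_calls_needed(total_items, page_size):
    """Number of requests needed to fetch total_items at page_size per request."""
    return math.ceil(total_items / page_size)

# 74,386 items at the default vs the maximum page size:
print(api_calls_needed(74_386, 50))    # -> 1488 calls
print(api_calls_needed(74_386, 1000))  # -> 75 calls (about 20x fewer)
```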
✅ REQUIRED Patterns
DO: Test actual API responses before implementing
Never rely on documentation alone. Always curl the endpoint and inspect response structure:
```bash
curl -s API_ENDPOINT | jq '.'
```
DO: Use maximum page_size supported by API
Default page sizes are often inefficient (50-100 items). Check API limits and use maximum:
```python
# Efficient
params = {"page_size": 1000}

# Inefficient
params = {"page_size": 50}  # 20x more API calls
```
DO: Match parameter names exactly
API field names are case-sensitive and specific:
```python
# CORRECT
params["pageCursor"] = cursor

# WRONG (will not work)
params["page_cursor"] = cursor  # Snake case instead of camelCase
params["cursor"] = cursor  # Missing "page" prefix
```
DO: Add pagination logging for diagnosis
Always log pagination progress:
```python
logger.info(f"Page {page}: {len(results)} items, next={next_cursor}")
```
DO: Verify termination conditions
Check both conditions to prevent infinite loops:
```python
# Check empty results
if not results:
    break

# AND check next cursor/page
if not next_cursor:  # or not has_more, or page >= total_pages
    break
```
❌ FORBIDDEN Patterns
DON'T: Assume pagination pattern from other endpoints
Different endpoints in same API may use different pagination:
```python
# WRONG: Assume v2 uses same pagination as v3
# v3 endpoint uses page numbers
# v2 endpoint uses cursors
```
DON'T: Check wrong field for continuation
```python
# WRONG
if not data.get("next"):  # Field doesn't exist
    break

# RIGHT
if not data.get("nextPageCursor"):  # Actual field name
    break
```
DON'T: Use inefficient page sizes
```python
# WRONG: Causes 20x more API calls
params = {"page_size": 50}

# RIGHT: Minimizes API calls
params = {"page_size": 1000}
```
DON'T: Increment page numbers for cursor-based APIs
```python
# WRONG: Page number ignored for cursor-based pagination
page_num = 1
while True:
    params = {"page": page_num}  # Repeats page 1 forever
    page_num += 1

# RIGHT: Use cursor advancement
cursor = None
while True:
    params = {"pageCursor": cursor} if cursor else {}
    response = fetch_api(endpoint, params)
    cursor = response.get("nextPageCursor")
    if not cursor:
        break
```
DON'T: Skip manual testing before implementation
```python
# WRONG: Implement without verifying
# Assume API uses page numbers, implement pagination
# Deploy and discover it uses cursors

# RIGHT: Test first
# curl endpoint | jq 'keys'
# Verify field names
# Test page 2 manually
# Then implement
```
Quick Decision Tree
Is pagination retrieving the complete dataset?
NO - stops after 1 page:
- Check actual API response structure (curl + jq)
- Compare field names (case-sensitive)
- Verify parameter names match API expectations
- Test page 2 manually
NO - returns duplicates:
- Check if using page number instead of cursor
- Verify cursor is advancing
- Check if parameter name is correct
YES - but slow:
- Check page_size value
- Increase to maximum supported
- Balance with rate limits
API returns `nextPageCursor` field:
→ Use cursor-based pagination with `pageCursor` parameter
API returns `next` URL:
→ Follow link-based pagination (use `next` URL directly)
API returns `page` and `total_pages`:
→ Use page-based pagination with `page` parameter
API returns `offset` and `total`:
→ Use offset-based pagination with `offset` and `limit` parameters
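The mapping above can be folded into a single driver. A sketch, not a definitive implementation: `fetch` is a placeholder for your HTTP call, and the field names follow the common spellings used throughout this guide:

```python
def paginate(fetch, pattern, page_size=1000):
    """Yield every item from a paginated API.

    `fetch(params)` is a caller-supplied function returning the parsed JSON
    dict for one page; `pattern` is "cursor", "page", or "offset".
    """
    cursor, page, offset = None, 1, 0
    while True:
        if pattern == "cursor":
            params = {"page_size": page_size}
            if cursor:
                params["pageCursor"] = cursor
        elif pattern == "page":
            params = {"page": page, "page_size": page_size}
        else:  # offset-based
            params = {"limit": page_size, "offset": offset}

        data = fetch(params)
        results = data.get("results", [])
        if not results:
            break  # empty page always terminates
        yield from results

        # Advance according to the pattern, checking its termination field
        if pattern == "cursor":
            cursor = data.get("nextPageCursor")
            if not cursor:
                break
        elif pattern == "page":
            if page >= data.get("total_pages", page):
                break
            page += 1
        else:
            offset += page_size
            if offset >= data.get("total", 0):
                break
```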
Common Mistakes
Mistake 1: Checking Non-Existent Field
Problem:
```python
if not data.get("next"):  # Field doesn't exist in response
    break
```
Solution:
```bash
# First, check actual response
curl API | jq 'keys'
# Output: ["count", "nextPageCursor", "results"]
```
Then use the correct field:
```python
if not data.get("nextPageCursor"):
    break
```
Mistake 2: Using Wrong Parameter Name
Problem:
```python
params["page"] = page_num  # API doesn't use page numbers
```
Solution:
```python
# Cursor-based APIs require cursor parameter
params["pageCursor"] = cursor  # Not "page"
```
Mistake 3: Small Page Size
Problem:
```python
params = {"page_size": 50}
# 74,386 items ÷ 50 = 1,488 API calls
```
Solution:
```python
params = {"page_size": 1000}  # Use maximum
# 74,386 items ÷ 1000 = 75 API calls
# 20x improvement
```
Examples
Context:
- Readwise MCP server stuck importing 463 highlights instead of 74,386
- Status: "completed_all_pages" after 1 page
- Using v2 export API endpoint
❌ WRONG - Assumed page-based pagination
```python
# Incorrect implementation
page_num = 1
while page_num < 1000:
    params = {"page": page_num, "page_size": 50}
    data = fetch_api("/export/", params, api_version="v2")

    # Wrong field check
    if not data.get("next"):  # This field doesn't exist
        break

    page_num += 1  # Never executed because break on page 1
```
Problem: The API uses cursor-based pagination, not page numbers, and the field is `nextPageCursor`, not `next`.
✅ RIGHT - Cursor-based pagination with correct fields
```python
# Correct implementation
cursor = None
page_num = 0

while page_num < 1000:
    page_num += 1

    # Use cursor parameter
    params = {"page_size": 1000}  # Increased from 50
    if cursor:
        params["pageCursor"] = cursor  # Correct parameter name

    data = fetch_api("/export/", params, api_version="v2")
    results = data.get("results", [])

    if not results:
        break

    # Process results...

    # Use correct field name
    next_cursor = data.get("nextPageCursor")  # Not "next"
    if not next_cursor:
        break

    cursor = next_cursor  # Advance cursor
```
Result:
- Before: 1 page, 463 highlights (< 1%)
- After: 75 pages, 74,386 highlights (100%)
- Efficiency: 20x fewer API calls (1000 vs 50 page_size)
Context:
- New API integration
- Documentation unclear about pagination
- Need to import complete dataset
Step-by-step debugging:
```bash
# Step 1: Test API response structure
curl -s -H "Authorization: Token $TOKEN" \
  "https://api.example.com/data?limit=10" | jq 'keys'

# Output: ["data", "pagination"]

# Step 2: Inspect pagination object
curl -s -H "Authorization: Token $TOKEN" \
  "https://api.example.com/data?limit=10" | jq '.pagination'

# Output:
# {
#   "total": 5000,
#   "offset": 0,
#   "limit": 10,
#   "has_more": true
# }

# Step 3: Test offset advancement
curl -s -H "Authorization: Token $TOKEN" \
  "https://api.example.com/data?limit=10&offset=10" | jq '.pagination'

# Output:
# {
#   "total": 5000,
#   "offset": 10,
#   "limit": 10,
#   "has_more": true
# }
```
Implementation:
```python
# Offset-based pagination identified
offset = 0
limit = 100  # Use larger limit

while True:
    params = {"limit": limit, "offset": offset}
    response = fetch_api("/data", params)

    items = response.get("data", [])
    if not items:
        break

    # Process items...

    pagination = response.get("pagination", {})
    if not pagination.get("has_more"):
        break

    offset += limit  # Advance offset
```
When to Use This Skill
This skill auto-activates when:
- Pagination stops after exactly 1 page despite more data existing
- Import status shows "completed_all_pages" but dataset incomplete
- API integration returns duplicate results repeatedly
- Implementing pagination for new API endpoint
- User mentions "pagination bug", "stuck at one page", or "not paginating"
- Debugging issues with cursor-based, page-based, or offset-based pagination
- Converting between pagination patterns (e.g., page numbers to cursors)
- Optimizing API call efficiency with page_size tuning
Don't use when:
- Pagination works correctly (complete dataset imported)
- API returns proper error messages (different debugging needed)
- Rate limiting is the issue (needs rate limit handling, not pagination fixes)
- Authentication problems (verify auth before debugging pagination)
Integration
Related Commands:
/readwise-import - Primary user of this debugging methodology
Related Vault Documents:
- [[0 Projects/2026 Draft Articles/Readwise Highlights Import Draft]] - Documented implementation of highlights import with pagination
- [[Readwise MCP Server Implementation]] (if exists) - Technical documentation
Technical Context:
- MCP server: `/Users/ngpestelos/src/readwise-mcp-server/server.py`
- State file: `.claude/state/readwise-import.json`
- Readwise API docs: https://readwise.io/api_deets
Key Takeaway
API pagination failures usually stem from field name mismatches or wrong pagination pattern assumptions. Always verify actual API response structure with curl/jq before implementing pagination logic, use maximum page_size for efficiency, and test page 2 manually to confirm advancement works. The pattern is: inspect response → identify pagination type → match implementation → optimize page size → verify with logging.
Discovered January 30, 2026 during Readwise highlights backfill debugging
Bug fix reduced 74,386 highlights import from theoretical 1,488 pages to actual 75 pages
Pattern applies to any cursor-based, page-based, or offset-based pagination implementation