dribl-crawling — community IDE skill for Claude Code, Cursor, and Windsurf

v1.0.0
GitHub

About this Skill

Ideal for web-scraping agents that need real browser automation with playwright-core to extract data from Cloudflare-protected websites. Part of the static website for Williamstown soccer club, powered by Next.js and Sanity.

dejanvasic85
Updated: 3/1/2026

Agent Capability Analysis

The dribl-crawling skill by dejanvasic85 is an open-source community AI agent skill for Claude Code and other IDE workflows, helping agents execute tasks with better context, repeatability, and domain-specific guidance.

Ideal Agent Persona

Ideal for web-scraping agents that need real browser automation with playwright-core to extract data from Cloudflare-protected websites.

Core Value

Empowers agents to extract and transform raw API data from dribl.com using a two-phase workflow. It leverages playwright-core for real browser automation, handles Cloudflare protection, and produces validated, merged data for seamless integration into websites such as the Williamstown soccer club's Next.js- and Sanity-powered platform.

Capabilities Granted for dribl-crawling

Extracting clubs and fixtures data from dribl.com
Automating data updates for sports websites
Transforming raw API data into validated and merged formats

Prerequisites & Limits

  • Requires playwright-core installation
  • Cloudflare protection may require additional handling
  • Specific to dribl.com API structure and data formats

dribl-crawling

Install dribl-crawling, an AI agent skill for agent workflows and automation. Works with Claude Code, Cursor, and Windsurf with one-command setup.

SKILL.md

Dribl Crawling

Overview

Extract clubs and fixtures data from https://fv.dribl.com/fixtures/ (SPA with Cloudflare protection) using real browser automation with playwright-core. Two-phase workflow: extraction (raw API data) → transformation (validated, merged data).

Purpose: Crawl dribl.com to maintain up-to-date clubs and fixtures data for Williamstown SC website.

Architecture

Data flow:

dribl API → data/external/fixtures/{team}/ (raw) → transform → data/matches/ (validated)
dribl API → data/external/clubs/ (raw) → transform → data/clubs/ (validated)

Two-phase pattern:

  1. Extraction: Playwright intercepts API requests, saves raw JSON
  2. Transformation: Read raw data, validate with Zod, transform, deduplicate, save

Key technologies:

  • playwright-core (real Chrome browser)
  • Zod validation schemas
  • TypeScript with tsx runner

Clubs Extraction

Reference: bin/crawlClubs.ts

Pattern:

```typescript
// Launch browser
const browser = await chromium.launch({
  headless: false,
  channel: 'chrome'
});

// Custom user agent (bypass detection)
const context = await browser.newContext({
  userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36...',
  viewport: { width: 1280, height: 720 }
});
const page = await context.newPage();

// Intercept API request
const [clubsResponse] = await Promise.all([
  page.waitForResponse((response) => response.url().startsWith(clubsApiUrl) && response.ok(), {
    timeout: 60_000
  }),
  page.goto(url, { waitUntil: 'domcontentloaded' })
]);

// Validate and save
const rawData = await clubsResponse.json();
const validated = externalApiResponseSchema.parse(rawData);
writeFileSync(outputPath, JSON.stringify(validated, null, '\t') + '\n');
```

API endpoint:

  • URL: https://mc-api.dribl.com/api/list/clubs?disable_paging=true
  • Response: JSON with data array of club objects
  • Validation: externalApiResponseSchema (src/types/matches.ts)

Output:

  • Path: data/external/clubs/clubs.json
  • Format: Single JSON file with all clubs

CLI args:

  • --url <fixtures-page-url> (optional, defaults to standard fixtures page)

Fixtures Extraction

Pattern (implemented in bin/crawlFixtures.ts):

Steps:

  1. Navigate to https://fv.dribl.com/fixtures/
  2. Wait for SPA to load (waitUntil: 'domcontentloaded')
  3. Apply filters (REQUIRED):
    • Season (e.g., "2025")
    • Competition (e.g., "FFV")
    • League (e.g., "state-league-2-men-s-north-west")
  4. Intercept /api/fixtures responses
  5. Handle pagination:
    • Detect "Load more" button in DOM
    • Click button to load next chunk
    • Wait for new API response
    • Repeat until no more data
  6. Save each chunk as chunk-{index}.json
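The pagination loop in steps 4–5 can be sketched independently of Playwright. Here `loadNext` is a hypothetical callback that clicks "Load more", waits for the next /api/fixtures response, and resolves to the parsed chunk (or null when no button remains); the real click/wait calls live in bin/crawlFixtures.ts.

```typescript
// Collect paginated chunks until the "Load more" source is exhausted.
// maxChunks is a safety cap so a broken selector cannot loop forever.
async function collectChunks<T>(
  loadNext: () => Promise<T | null>,
  maxChunks = 100
): Promise<T[]> {
  const chunks: T[] = [];
  for (let i = 0; i < maxChunks; i++) {
    const chunk = await loadNext();
    if (chunk === null) break; // no "Load more" button left
    chunks.push(chunk);
  }
  return chunks;
}
```

Each element of the returned array would then be written out as its own chunk file.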

API endpoint:

  • URL: https://mc-api.dribl.com/api/fixtures
  • Query params: season, competition, league (from filters)
  • Response: JSON with data array, links (next/prev), meta (cursors)
  • Validation: externalFixturesApiResponseSchema

Output:

  • Path: data/external/fixtures/{team}/chunk-0.json, chunk-1.json, etc.
  • Format: Multiple JSON files (one per "Load more" click)
  • Naming: chunk-{index}.json where index starts at 0

CLI args:

  • --team <slug> (required) - Team slug for output folder (e.g., "state-league-2-men-s-north-west")
  • --league <slug> (required) - League slug for filtering (e.g., "State League 2 Men's - North-West")
  • --season <year> (optional; defaults to the current year)
  • --competition <id> (optional; defaults to FFV)

Clubs Transformation

Reference: bin/syncClubs.ts

Pattern:

```typescript
// Load external data
const externalResponse = loadExternalData(); // from data/external/clubs/
const validated = externalApiResponseSchema.parse(externalResponse);

// Transform to internal format
const apiClubs = validated.data.map((externalClub) => transformExternalClub(externalClub));

// Load existing clubs
const existingClubs = loadExistingClubs(); // from data/clubs/

// Merge (deduplicate by externalId)
const clubsMap = new Map<string, Club>();
for (const club of existingClubs) {
  clubsMap.set(club.externalId, club);
}
for (const apiClub of apiClubs) {
  clubsMap.set(apiClub.externalId, apiClub); // update or add
}

// Sort by name
const mergedClubs = Array.from(clubsMap.values()).sort((a, b) => a.name.localeCompare(b.name));

// Save
writeFileSync(CLUBS_FILE_PATH, JSON.stringify({ clubs: mergedClubs }, null, '\t') + '\n');
```

Transform service: src/lib/clubService.ts

  • transformExternalClub(): Converts external club format to internal format
  • Maps fields: id→externalId, attributes.name→name/displayName, etc.
  • Normalizes address (combines address_line_1 + address_line_2)
  • Maps socials array (name→platform, value→url)
  • Validates output with clubSchema
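Two of the mappings above can be illustrated as plain functions (hypothetical names and shapes; the real field definitions live in src/types/matches.ts):

```typescript
type ExternalSocial = { name: string; value: string };

// Combine address_line_1 + address_line_2, skipping a missing second line
function normalizeAddress(line1: string, line2?: string | null): string {
  return [line1, line2]
    .filter((part): part is string => !!part && part.trim().length > 0)
    .join(', ');
}

// Map the socials array: name→platform, value→url
function mapSocials(socials: ExternalSocial[]): { platform: string; url: string }[] {
  return socials.map((s) => ({ platform: s.name, url: s.value }));
}
```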

Output:

  • Path: data/clubs/clubs.json
  • Format: { clubs: Club[] }

Fixtures Transformation

Reference: bin/syncFixtures.ts

Pattern:

```typescript
// Read all chunk files
const teamDir = path.join(EXTERNAL_DIR, team);
const files = await fs.readdir(teamDir);
const chunkFiles = files
  .filter((f) => f.match(/^chunk-\d+\.json$/))
  // numeric sort so chunk-10 comes after chunk-9 (plain .sort() is lexicographic)
  .sort((a, b) => parseInt(a.match(/\d+/)![0], 10) - parseInt(b.match(/\d+/)![0], 10));

// Load and validate each chunk
const responses: ExternalFixturesApiResponse[] = [];
for (const file of chunkFiles) {
  const content = await fs.readFile(path.join(teamDir, file), 'utf-8');
  const validated = externalFixturesApiResponseSchema.parse(JSON.parse(content));
  responses.push(validated);
}

// Transform all fixtures
const allFixtures = [];
for (const response of responses) {
  for (const externalFixture of response.data) {
    const fixture = transformExternalFixture(externalFixture);
    allFixtures.push(fixture);
  }
}

// Deduplicate (by round + homeTeamId + awayTeamId)
const seen = new Set<string>();
const deduplicated = allFixtures.filter((f) => {
  const key = `${f.round}-${f.homeTeamId}-${f.awayTeamId}`;
  if (seen.has(key)) return false;
  seen.add(key);
  return true;
});

// Sort by round, then date
const sorted = deduplicated.sort((a, b) => {
  if (a.round !== b.round) return a.round - b.round;
  return a.date.localeCompare(b.date);
});

// Calculate metadata
const totalRounds = Math.max(...sorted.map((f) => f.round), 0);

// Save
const fixtureData = {
  competition: 'FFV',
  season: 2025,
  totalFixtures: sorted.length,
  totalRounds,
  fixtures: sorted
};
writeFileSync(outputPath, JSON.stringify(fixtureData, null, '\t') + '\n');
```

Transform service: src/lib/matches/fixtureTransformService.ts

  • transformExternalFixture(): Converts external fixture format to internal format
  • Parses round number (e.g., "R1" → 1)
  • Formats date/time/day strings (ISO date, 24h time, weekday name)
  • Combines ground + field names for address
  • Finds club external IDs by matching team names/logos
  • Validates output with fixtureSchema
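The round and day-name steps above can be sketched as follows (hypothetical signatures; the actual implementations are in src/lib/matches/fixtureTransformService.ts):

```typescript
// Parse a round label: "R1" → 1, "R12" → 12
function parseRound(label: string): number {
  const match = label.match(/\d+/);
  if (!match) throw new Error(`Unrecognised round label: ${label}`);
  return parseInt(match[0], 10);
}

// ISO date → weekday name, e.g. "2025-04-12" → "Saturday"
function formatDayName(isoDate: string): string {
  return new Date(`${isoDate}T00:00:00Z`).toLocaleDateString('en-AU', {
    weekday: 'long',
    timeZone: 'UTC'
  });
}
```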

Output:

  • Path: data/matches/{team}.json
  • Format: { competition, season, totalFixtures, totalRounds, fixtures: Fixture[] }

CLI args:

  • --team <slug> (required) - Team slug to sync (e.g., "state-league-2-men-s-north-west")

Validation Schemas

Reference: src/types/matches.ts

External schemas (API responses):

  • externalApiResponseSchema: Clubs API response
  • externalClubSchema: Single club object
  • externalFixturesApiResponseSchema: Fixtures API response
  • externalFixtureSchema: Single fixture object

Internal schemas (transformed data):

  • clubSchema: Single club
  • clubsSchema: Clubs file ({ clubs: Club[] })
  • fixtureSchema: Single fixture
  • fixtureDataSchema: Fixtures file ({ competition, season, totalFixtures, totalRounds, fixtures })

Pattern: Always validate at boundaries (API → external schema, transform → internal schema)
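A minimal illustration of that boundary rule, written without Zod so the shape is visible (a real implementation would call externalApiResponseSchema.parse() instead):

```typescript
type ClubsResponse = { data: { id: string }[] };

// Parse untrusted input once at the edge; everything downstream gets a typed value.
function parseClubsResponse(raw: unknown): ClubsResponse {
  if (
    typeof raw !== 'object' ||
    raw === null ||
    !Array.isArray((raw as { data?: unknown }).data)
  ) {
    throw new Error('Invalid clubs response: expected { data: [...] }');
  }
  return raw as ClubsResponse;
}
```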

CI Integration

Reference: .github/workflows/crawl-clubs.yml

Linux setup (GitHub Actions):

```yaml
- name: Install Chrome
  run: npx playwright install --with-deps chrome

- name: Crawl clubs
  run: npm run crawl:clubs:ci -- ${{ inputs.url && format('--url "{0}"', inputs.url) || '' }}
```

Key points:

  • Use the xvfb-run prefix on Linux to give headed Chrome a virtual display (e.g., xvfb-run npm run crawl:clubs)
  • Install with --with-deps flag to get system dependencies
  • Set appropriate timeout (5 min for clubs, may need more for fixtures)
  • Upload artifacts for data files

Package.json scripts pattern:

```json
{
  "crawl:clubs": "tsx bin/crawlClubs.ts",
  "crawl:clubs:ci": "xvfb-run tsx bin/crawlClubs.ts",
  "sync:clubs": "tsx bin/syncClubs.ts",
  "sync:fixtures": "tsx bin/syncFixtures.ts"
}
```

Best Practices

Logging:

  • Use emoji logging for clarity:
    • ✓ / ✅ - Success
    • ❌ - Error
    • 📂 - File operations
    • 🔄 - Processing/transformation
  • Log counts and progress for large operations
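A tiny helper for that convention might look like this (hypothetical; the formatters are kept pure so call sites just pass the result to console.log/error):

```typescript
// Emoji log-line formatters matching the convention above.
const fmt = {
  success: (msg: string) => `✅ ${msg}`,
  error: (msg: string) => `❌ ${msg}`,
  file: (msg: string) => `📂 ${msg}`,
  progress: (done: number, total: number, what: string) => `🔄 ${done}/${total} ${what}`
};

console.log(fmt.success('Saved 42 clubs'));
```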

Error handling:

  • Try/catch at top level
  • Special handling for ZodError (print issues)
  • Exit with code 1 on failure
  • Close browser in finally block
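The top-level pattern above can be sketched like this. ZodError is duck-typed via its issues array so the sketch has no zod dependency, and the browser close would sit in the caller's finally block:

```typescript
// Run a crawl/sync task, print errors, and return the process exit code.
async function run(task: () => Promise<void>): Promise<number> {
  try {
    await task();
    return 0;
  } catch (err) {
    const issues =
      err && typeof err === 'object'
        ? (err as { issues?: { message: string }[] }).issues
        : undefined;
    if (Array.isArray(issues)) {
      // ZodError: print each validation issue
      for (const issue of issues) console.error('❌', issue.message);
    } else {
      console.error('❌', err);
    }
    return 1; // caller does process.exit(code)
  }
}
```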

File operations:

  • Always use mkdirSync(path, { recursive: true }) before writing
  • Format JSON with tabs: JSON.stringify(data, null, '\t')
  • Add newline at end of file: content + '\n'
  • Use absolute paths with resolve(__dirname, '../relative/path')
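Those rules combine naturally into a pair of helpers (a sketch; the skill's scripts inline these calls):

```typescript
import { mkdirSync, writeFileSync, readFileSync } from 'node:fs';
import { dirname } from 'node:path';

// Ensure the directory exists, format with tabs, end the file with a newline.
function writeJson(filePath: string, data: unknown): void {
  mkdirSync(dirname(filePath), { recursive: true });
  writeFileSync(filePath, JSON.stringify(data, null, '\t') + '\n');
}

// Read a JSON file back into a typed value.
function readJson<T>(filePath: string): T {
  return JSON.parse(readFileSync(filePath, 'utf-8')) as T;
}
```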

Data separation:

  • Keep raw external data in data/external/ (gitignored)
  • Keep transformed data in data/ (committed)
  • Never commit external API responses directly

Validation:

  • Validate immediately after receiving API data
  • Validate before writing transformed data
  • Use descriptive error messages with file paths

CLI arguments:

  • Use Commander library for consistent CLI parsing
  • Define options with .option() or .requiredOption()
  • Provide defaults for optional args
  • Commander auto-generates help text and validates required args
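The skill itself uses Commander, but the same pattern can be sketched dependency-free with Node's built-in util.parseArgs (shown here so the example runs without installing anything):

```typescript
import { parseArgs } from 'node:util';

// Parse --team (required), --season and --competition (with defaults),
// mirroring the fixtures CLI described earlier.
function parseCliArgs(argv: string[]) {
  const { values } = parseArgs({
    args: argv,
    options: {
      team: { type: 'string' },
      season: { type: 'string', default: String(new Date().getFullYear()) },
      competition: { type: 'string', default: 'FFV' }
    }
  });
  if (!values.team) throw new Error('Missing required option: --team <slug>');
  return values;
}
```

Commander adds auto-generated help and required-option validation on top of this; the parsing shape is the same.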

Common Patterns

Reading chunks:

```typescript
const files = await fs.readdir(dir);
const chunks = files
  .filter((f) => f.match(/^chunk-\d+\.json$/))
  .sort((a, b) => {
    const numA = parseInt(a.match(/\d+/)?.[0] || '0', 10);
    const numB = parseInt(b.match(/\d+/)?.[0] || '0', 10);
    return numA - numB;
  });
```

Deduplication:

```typescript
const seen = new Set<string>();
const unique = items.filter((item) => {
  const key = computeKey(item);
  if (seen.has(key)) return false;
  seen.add(key);
  return true;
});
```

Merge with existing:

```typescript
const map = new Map<string, T>();
existing.forEach((item) => map.set(item.id, item));
incoming.forEach((item) => map.set(item.id, item)); // update or add
const merged = Array.from(map.values());
```

Browser cleanup:

```typescript
let browser: Browser | undefined;
try {
  browser = await chromium.launch(...);
  // work
} finally {
  if (browser) await browser.close();
}
```

FAQ & Installation Steps

These questions and steps mirror the structured data on this page for better search understanding.

Frequently Asked Questions

What is dribl-crawling?

dribl-crawling is a community skill for web-scraping agents that need real browser automation with playwright-core to extract data from Cloudflare-protected websites. It is part of the Williamstown soccer club's static website, which is powered by Next.js and Sanity.

How do I install dribl-crawling?

Run the command: npx killer-skills add dejanvasic85/williamstownsc/dribl-crawling. It works with Cursor, Windsurf, VS Code, Claude Code, and 19+ other IDEs.

What are the use cases for dribl-crawling?

Key use cases include: Extracting clubs and fixtures data from dribl.com, Automating data updates for sports websites, Transforming raw API data into validated and merged formats.

Which IDEs are compatible with dribl-crawling?

This skill is compatible with Cursor, Windsurf, VS Code, Trae, Claude Code, OpenClaw, Aider, Codex, OpenCode, Goose, Cline, Roo Code, Kiro, Augment Code, Continue, GitHub Copilot, Sourcegraph Cody, and Amazon Q Developer. Use the Killer-Skills CLI for universal one-command installation.

Are there any limitations for dribl-crawling?

Requires playwright-core installation. Cloudflare protection may require additional handling. Specific to dribl.com API structure and data formats.

How To Install

  1. Open your terminal

    Open the terminal or command line in your project directory.

  2. Run the install command

    Run: npx killer-skills add dejanvasic85/williamstownsc/dribl-crawling. The CLI will automatically detect your IDE or AI agent and configure the skill.

  3. Start using the skill

    The skill is now active. Your AI agent can use dribl-crawling immediately in the current project.

Related Skills

Looking for an alternative to dribl-crawling or another community skill for your workflow? Explore these related open-source skills.


widget-generator (f)

f.k.a. Awesome ChatGPT Prompts. Share, discover, and collect prompts from the community. Free and open source — self-host for your organization with complete privacy.

flags (vercel)

flags is a Next.js feature management skill that enables developers to efficiently add or modify framework feature flags, streamlining React application development.

zustand (lobehub)

data-fetching (lobehub)