synthea: a Claude Code skill from llm-fhir-query-eval

v1.0.0

About this Skill

synthea is a Claude Code skill from the llm-fhir-query-eval project, which compares LLMs on how accurately they translate natural language into FHIR queries. The skill generates the synthetic FHIR test data used in that comparison.

Features

Create Synthea modules from phenotype data using Generic Module Framework
Generate synthetic patients with custom modules
Load generated FHIR bundles to FHIR servers
Process multiple phenotypes in batch mode
Show status of all phenotypes
List available phenotypes with modules

feordin
Updated: 3/12/2026

Synthea Test Data Generation Skill

Usage

/synthea <command> [options]

Commands

Command                   | Description
--------------------------|------------------------------------------------
create-module <phenotype> | Create Synthea module from phenotype data
generate <phenotype>      | Run Synthea to generate FHIR data
load <phenotype>          | Load generated data to FHIR server
full <phenotype>          | Create module + generate + load (full pipeline)
status                    | Show status of all phenotypes
list                      | List available phenotypes with modules
batch <phenotypes...>     | Process multiple phenotypes

Command Details

create-module

Creates a Synthea GMF (Generic Module Framework) module from phenotype data.

Inputs:

  • data/phekb-raw/<phenotype>/document_analysis.json - Extracted codes and criteria
  • data/phekb-raw/<phenotype>/description.txt - Phenotype description
  • test-cases/phekb/phekb-<phenotype>.json - Test case with required codes

Outputs:

  • synthea/modules/custom/phekb_<phenotype>.json - Positive case module
  • synthea/modules/custom/phekb_<phenotype>_control.json - Control module

generate

Runs Synthea with the custom module to generate synthetic patients.

Options:

  • --patients N - Number of positive cases (default: 20)
  • --controls N - Number of control cases (default: 20)
  • --seed N - Random seed for reproducibility

Outputs:

  • synthea/output/<phenotype>/positive/fhir/*.json - Positive patient bundles
  • synthea/output/<phenotype>/control/fhir/*.json - Control patient bundles

load

Loads generated FHIR bundles to the FHIR server.

Prerequisites:

  • FHIR server running (HAPI FHIR at http://localhost:8080/fhir or Azure FHIR at http://localhost:9080)
  • Generated data exists in synthea/output/<phenotype>/

Important: Infrastructure bundles (hospitals, practitioners) must load BEFORE patient bundles. The CLI fhir-eval load synthea command handles this automatically. If loading manually, load hospitalInformation*.json and practitionerInformation*.json files first.
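
For manual loading, the ordering rule above can be sketched in Python as follows. This is a minimal illustration, not the logic of the fhir-eval CLI: the function name `order_bundles` and the sample filenames are hypothetical, and the only facts taken from the text are the two filename prefixes.

```python
from pathlib import Path

# Filename prefixes from the note above; these are treated as infrastructure bundles.
INFRA_PREFIXES = ("hospitalInformation", "practitionerInformation")

def order_bundles(paths):
    """Return bundle paths with infrastructure bundles first, patient bundles after."""
    def sort_key(path):
        name = Path(path).name
        for rank, prefix in enumerate(INFRA_PREFIXES):
            if name.startswith(prefix):
                return (rank, name)
        return (len(INFRA_PREFIXES), name)
    return sorted(paths, key=sort_key)

# Hypothetical filenames, for illustration only.
files = [
    "fhir/Alice_Smith_1.json",
    "fhir/practitionerInformation1700000000000.json",
    "fhir/hospitalInformation1700000000000.json",
]
for path in order_bundles(files):
    print(path)
```

Sorting by an explicit rank avoids relying on shell glob order, which depends on locale and filename case.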

full

Runs the complete pipeline: create-module → generate → load

status

Shows which phenotypes have:

  • Document analysis data
  • Synthea modules created
  • Generated data
  • Data loaded to FHIR server

Instructions for Claude

When this skill is invoked, follow these instructions based on the command:

For create-module <phenotype>:

  1. Read the phenotype data:

    Read: data/phekb-raw/<phenotype>/document_analysis.json
    Read: test-cases/phekb/phekb-<phenotype>.json (if exists)
    
  2. Extract key information:

    • Diagnosis codes (ICD-9, ICD-10, SNOMED-CT)
    • Lab codes (LOINC) with thresholds from clinical_criteria
    • Medication codes (RxNorm)
    • Age requirements
    • Exclusion criteria
  3. Verify and enrich codes using the UMLS MCP server (CRITICAL):

    Do NOT trust codes from document_analysis.json blindly — they may be incomplete, outdated, or wrong. Use the UMLS MCP tools to get authoritative codes:

    a. For each diagnosis concept, search UMLS and get codes across systems:

    search_umls(query="<condition name>", search_type="exact")
    get_concept(cui="<CUI>")
    crosswalk_codes(source="SNOMEDCT_US", code="<SNOMED>", target_source="ICD10CM")
    crosswalk_codes(source="SNOMEDCT_US", code="<SNOMED>", target_source="ICD9CM")
    

    b. For each lab concept, find the correct LOINC code:

    search_umls(query="<lab name>", search_type="words")
    

    Look for results with semantic type "Laboratory Procedure" or "Clinical Attribute".

    c. For each medication, find the RxNorm code:

    search_umls(query="<medication name>", search_type="exact")
    crosswalk_codes(source="RXNORM", code="<CODE>", target_source="SNOMEDCT_US")
    

    d. Cross-check codes from the phenotype data against UMLS:

    get_source_concept(source="ICD10CM", id="<CODE>")
    

    Verify the display name matches the intended concept. Discard obsolete codes.

    e. If crosswalk returns empty (common for SNOMED→ICD-10), search UMLS for the term with search_type="words" and look for CUIs with atoms from the target system.

    See /umls skill for full details on UMLS MCP usage patterns and gotchas.

  4. Generate the Synthea module following GMF format:

    • Use the existing synthea/modules/custom/phekb_type_2_diabetes.json as a template
    • Include multiple code systems for each concept (SNOMED + ICD-10 + ICD-9)
    • Set realistic value ranges based on clinical_criteria
    • Create state machine: Initial → Age Guard → Condition Onset → Labs → Medications → Terminal
  5. Generate the control module:

    • Normal lab values (below diagnostic thresholds)
    • No phenotype-specific diagnosis codes
    • May include unrelated conditions for variety
  6. Write the modules:

    Write: synthea/modules/custom/phekb_<phenotype>.json
    Write: synthea/modules/custom/phekb_<phenotype>_control.json
    
  7. Validate JSON syntax by reading back and parsing
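
The read-back check in step 7 could look roughly like the sketch below. It goes slightly beyond a syntax check by also confirming that transition targets name existing states. The helper names are hypothetical, and only three transition forms (direct_transition, distributed_transition, conditional_transition) are inspected, so treat it as a partial check under those assumptions.

```python
import json

def transition_targets(state):
    """Collect target state names from the transition forms this sketch knows about."""
    targets = []
    if "direct_transition" in state:
        targets.append(state["direct_transition"])
    for key in ("distributed_transition", "conditional_transition"):
        for option in state.get(key, []):
            targets.append(option["transition"])
    return targets

def validate_module(text):
    """Parse module JSON and return a list of structural problems (empty if none)."""
    try:
        module = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"invalid JSON: {err}"]
    states = module.get("states", {})
    problems = []
    if "Initial" not in states:
        problems.append("missing Initial state")
    for name, state in states.items():
        for target in transition_targets(state):
            if target not in states:
                problems.append(f"{name} -> unknown state {target}")
    return problems
```

Calling `validate_module(Path(module_path).read_text())` right after writing each file would catch truncated output and misspelled state names before Synthea runs.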

For generate <phenotype>:

  1. Check prerequisites:

    • Synthea source build exists at C:\repos\synthea (or SYNTHEA_HOME env var)
    • Module exists: synthea/modules/custom/phekb_<phenotype>.json
  2. If Synthea not found, provide install instructions:

    bash
    git clone https://github.com/synthetichealth/synthea.git C:/repos/synthea
  3. IMPORTANT: Environment-specific issues to handle:

    Claude Code runs in a bash environment (git bash), NOT cmd.exe. This causes three issues:

    Issue 1: .bat files don't run natively in bash.

    • Do NOT call run_synthea.bat directly — it won't find gradlew.bat.
    • Instead, call ./gradlew (the Unix wrapper) directly from the Synthea directory.

    Issue 2: JAVA_HOME / Java not on PATH.

    • Git bash doesn't inherit Windows system PATH fully. Java may not be found.
    • Before running Synthea, set JAVA_HOME explicitly:
      bash
      export JAVA_HOME="/c/Program Files/Eclipse Adoptium/jdk-17.0.18.8-hotspot"
      export PATH="$JAVA_HOME/bin:$PATH"
    • Or auto-detect: check /c/Program Files/Eclipse Adoptium/, /c/Program Files/Java/, etc.

    Issue 3: Backslash paths break Gradle -Params.

    • Gradle's Groovy parser interprets \ as escape characters.
    • ALWAYS use forward slashes in all paths passed to ./gradlew.
    • Example: C:/repos/llm-fhir-query-eval/synthea/modules/custom (NOT C:\repos\...)
  4. Run the Python helper script (recommended):

    bash
    python synthea/generate_test_data.py --phenotype <phenotype> --patients 20 --controls 20

    The script auto-detects whether it's running in bash or cmd.exe and adjusts:

    • In bash: calls ./gradlew run -Params="[...]" directly with forward-slash paths
    • In cmd.exe: calls run_synthea.bat as before
    • Auto-detects JAVA_HOME from common Windows install locations
  5. Or run Synthea directly via gradlew (if script has issues):

    bash
    export JAVA_HOME="/c/Program Files/Eclipse Adoptium/jdk-17.0.18.8-hotspot"
    export PATH="$JAVA_HOME/bin:$PATH"

    cd C:/repos/synthea && ./gradlew run -Params="['-p','20','-m','phekb_<phenotype>','-d','C:/repos/llm-fhir-query-eval/synthea/modules/custom','--exporter.fhir.export','true','--exporter.fhir.use_us_core_ig','true','--exporter.baseDirectory','C:/repos/llm-fhir-query-eval/synthea/output/<phenotype>/positive','-s','42']"

    Then repeat with phekb_<phenotype>_control module and control output subdirectory.

  6. Report results: Count generated files and summarize
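
The forward-slash rule and the -Params list from the steps above can be sketched in Python. This is not the code of synthea/generate_test_data.py, which is not shown here. The function names are hypothetical, and the flags are copied from the direct gradlew command in step 5.

```python
def to_forward_slashes(path):
    """Replace backslashes so Gradle's Groovy parser does not treat them as escapes."""
    return path.replace("\\", "/")

def build_gradle_params(module, module_dir, output_dir, patients=20, seed=None):
    """Build the value for ./gradlew run -Params=... as a Groovy list literal."""
    args = [
        "-p", str(patients),
        "-m", module,
        "-d", to_forward_slashes(module_dir),
        "--exporter.fhir.export", "true",
        "--exporter.fhir.use_us_core_ig", "true",
        "--exporter.baseDirectory", to_forward_slashes(output_dir),
    ]
    if seed is not None:
        args += ["-s", str(seed)]
    return "[" + ",".join(f"'{arg}'" for arg in args) + "]"

params = build_gradle_params(
    "phekb_asthma",
    r"C:\repos\llm-fhir-query-eval\synthea\modules\custom",
    r"C:\repos\llm-fhir-query-eval\synthea\output\asthma\positive",
    seed=42,
)
print(f'./gradlew run -Params="{params}"')
```

Normalizing paths inside the builder means callers can pass Windows-style paths without having to remember the rule.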

For load <phenotype>:

  1. Check FHIR server is running:

    bash
    curl -s http://localhost:8080/fhir/metadata | head -5
  2. Load positive cases:

    bash
    for f in synthea/output/<phenotype>/positive/fhir/*.json; do
      curl -X POST http://localhost:8080/fhir \
        -H "Content-Type: application/fhir+json" \
        -d @"$f"
    done
  3. Load control cases:

    bash
    for f in synthea/output/<phenotype>/control/fhir/*.json; do
      curl -X POST http://localhost:8080/fhir \
        -H "Content-Type: application/fhir+json" \
        -d @"$f"
    done
  4. Verify loaded data by querying the server

  5. Update test case with expected resource IDs (optional)
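
One way to do the verification in step 4 is to ask the server for a count and compare it with the number of bundles loaded. The sketch below builds such a search URL and reads the total from a response body. The helper names are hypothetical, the sample response is made up for illustration, and `_summary=count` is the standard FHIR search parameter for requesting only a total.

```python
import json
from urllib.parse import urlencode

def count_url(base_url, resource_type):
    """Build a search URL that asks the server for a total only, not the resources."""
    return f"{base_url.rstrip('/')}/{resource_type}?{urlencode({'_summary': 'count'})}"

def parse_total(bundle_text):
    """Read Bundle.total from a searchset response body."""
    bundle = json.loads(bundle_text)
    if bundle.get("resourceType") != "Bundle":
        raise ValueError("expected a Bundle response")
    return bundle.get("total")

print(count_url("http://localhost:8080/fhir", "Patient"))

# Illustrative response body, not real server output.
sample = '{"resourceType": "Bundle", "type": "searchset", "total": 40}'
print(parse_total(sample))
```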

For full <phenotype>:

Execute in sequence:

  1. create-module <phenotype>
  2. generate <phenotype>
  3. load <phenotype>

Report overall success/failure.

For status:

  1. List all phenotypes from data/phekb-raw/*/

  2. For each, check:

    • Has document_analysis.json?
    • Has module in synthea/modules/custom/?
    • Has generated data in synthea/output/?
    • Count positive/control patients
  3. Display summary table:

    Phenotype            | Analysis | Module | Data (pos/ctrl) | Loaded
    ---------------------|----------|--------|-----------------|--------
    type-2-diabetes      | ✓        | ✓      | 20/20           | ✓
    asthma               | ✓        | ✗      | -               | -
    heart-failure        | ✓        | ✗      | -               | -
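
The file-system checks behind this table can be sketched as below. The directory layout comes from the paths listed earlier. The function names are hypothetical, and the hyphen-to-underscore mapping for module filenames is an assumption inferred from phekb_type_2_diabetes.json. The Loaded column is left out because it needs a server query.

```python
from pathlib import Path

INFRA_PREFIXES = ("hospitalInformation", "practitionerInformation")

def count_patients(fhir_dir):
    """Count patient bundles in a fhir/ output directory, skipping infrastructure files."""
    if not fhir_dir.is_dir():
        return 0
    return sum(
        1 for f in fhir_dir.glob("*.json")
        if not f.name.startswith(INFRA_PREFIXES)
    )

def phenotype_status(root, phenotype):
    """Summarize which pipeline artifacts exist on disk for one phenotype."""
    root = Path(root)
    # Assumption: module filenames replace hyphens with underscores.
    module_name = "phekb_" + phenotype.replace("-", "_") + ".json"
    out = root / "synthea" / "output" / phenotype
    return {
        "analysis": (root / "data" / "phekb-raw" / phenotype / "document_analysis.json").is_file(),
        "module": (root / "synthea" / "modules" / "custom" / module_name).is_file(),
        "positive": count_patients(out / "positive" / "fhir"),
        "control": count_patients(out / "control" / "fhir"),
    }
```

Running `phenotype_status(".", name)` for each directory under data/phekb-raw/ gives the rows of the summary table.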
    

For list:

List phenotypes that have Synthea modules ready:

bash
ls synthea/modules/custom/phekb_*.json | grep -v _control

For batch <phenotypes...>:

  1. Parse the phenotype list (comma or space separated)
  2. For each phenotype, run full <phenotype>
  3. Track successes and failures
  4. Report summary at end
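
The batch steps can be sketched in Python as follows. The `run_full` argument is a hypothetical caller-supplied callable standing in for the full pipeline. It is assumed to raise an exception on failure.

```python
import re

def parse_phenotypes(raw):
    """Split a comma- or space-separated phenotype list, dropping empty entries."""
    return [name for name in re.split(r"[,\s]+", raw.strip()) if name]

def run_batch(raw, run_full):
    """Run run_full(phenotype) for each phenotype and collect the outcomes."""
    succeeded, failed = [], {}
    for phenotype in parse_phenotypes(raw):
        try:
            run_full(phenotype)
            succeeded.append(phenotype)
        except Exception as err:  # keep going so one failure does not stop the batch
            failed[phenotype] = str(err)
    return succeeded, failed
```

Returning both lists keeps the final summary simple: print the successes, then each failure with its error message.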

Multi-Path Phenotype Modules

Key Learning: Phenotype Algorithms Have Multiple Paths

PheKB phenotype algorithms are multi-path decision trees, NOT simple code lookups. A single phenotype may identify patients through DIFFERENT combinations of clinical data:

  • Diagnosis codes only
  • Diagnosis codes + medications
  • Diagnosis codes + abnormal labs
  • Medications + abnormal labs (NO diagnosis code)
  • Complex temporal ordering rules

Generating Path-Specific Patients

When creating Synthea modules for phenotypes with multiple identification paths, generate DISTINCT patient groups:

  1. Analyze the algorithm document (usually a PDF in data/phekb-raw/<phenotype>/) to identify all paths
  2. Create separate state branches in the Synthea module for each path type
  3. Path 4-type patients (no diagnosis code): These patients should have MedicationRequest + Observation resources but NO Condition resource. This requires a separate branch in the module that:
    • Skips the ConditionOnset state
    • Still prescribes medications (MedicationOrder)
    • Still records abnormal lab values (Observation)
  4. Use distributed_transition to control the mix of patient types

Module Structure for Multi-Path Phenotypes

Initial → Age_Guard → Set_Diabetes_Flag → Wellness_Encounter → Path_Router
                                                                    ├→ Path_With_Diagnosis (70%)
                                                                    │     ├→ Diagnose_Condition
                                                                    │     ├→ Record_Labs
                                                                    │     ├→ Prescribe_Meds
                                                                    │     └→ End_Encounter
                                                                    └→ Path_No_Diagnosis (30%)
                                                                          ├→ Record_Abnormal_Labs (NO ConditionOnset!)
                                                                          ├→ Prescribe_Meds
                                                                          └→ End_Encounter
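
The Path_Router node in the diagram could be written as a GMF state with a distributed_transition, sketched here as a Python dictionary that serializes to module JSON. The 70/30 split and state names come from the diagram. Using a Simple state for the router and the helper name `path_router` are assumptions of this sketch.

```python
import json

def path_router(with_diagnosis=0.7):
    """Build a Simple state that splits patients between the two paths."""
    return {
        "type": "Simple",
        "distributed_transition": [
            {"distribution": with_diagnosis, "transition": "Path_With_Diagnosis"},
            {"distribution": round(1 - with_diagnosis, 4), "transition": "Path_No_Diagnosis"},
        ],
    }

print(json.dumps({"Path_Router": path_router()}, indent=2))
```

Keeping the split in one parameter makes it easy to shift the mix toward no-diagnosis patients when a test needs more of them.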

Lab Value Thresholds

When generating observation values, use thresholds from the phenotype algorithm document:

  • Case thresholds: Values that qualify as "abnormal" for case identification
  • Control thresholds: Values that must be normal for control patients (often stricter)

Example from T2DM:

Lab             | Case Threshold | Control Exclusion
----------------|----------------|------------------
HbA1c           | >= 6.5%        | >= 6.0%
Fasting glucose | >= 125 mg/dL   | >= 110 mg/dL
Random glucose  | > 200 mg/dL    | > 110 mg/dL
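
To make the two threshold columns concrete, the sketch below classifies a single lab value using the T2DM numbers above. The dictionary keys, labels, and function name are hypothetical. A value between the two thresholds is neither a case value nor acceptable for a control patient.

```python
import operator

# (case_op, case_threshold, control_op, control_threshold), values from the T2DM table.
THRESHOLDS = {
    "hba1c": (operator.ge, 6.5, operator.ge, 6.0),
    "fasting_glucose": (operator.ge, 125, operator.ge, 110),
    "random_glucose": (operator.gt, 200, operator.gt, 110),
}

def classify_lab(lab, value):
    """Return 'case', 'excluded_from_control', or 'control_ok' for one lab value."""
    case_op, case_limit, control_op, control_limit = THRESHOLDS[lab]
    if case_op(value, case_limit):
        return "case"
    if control_op(value, control_limit):
        return "excluded_from_control"
    return "control_ok"

print(classify_lab("hba1c", 6.2))
```

When choosing value ranges for a control module, only values that classify as control_ok are safe to generate.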

Dual Data Sets: Generic vs US Core

For each phenotype, plan to generate TWO Synthea module variants:

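Variant | Module Suffix            | Condition Codes    | Meds       | Categories         | When to Use
--------|--------------------------|--------------------|------------|--------------------|-----------------------------------
Generic | phekb_<name>.json        | SNOMED only        | RxNorm SCD | Base FHIR          | Tier 1 eval, basic testing
US Core | phekb_<name>_uscore.json | SNOMED + ICD-10-CM | RxNorm SCD | US Core categories | Tier 3 eval, profile-aware testing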

The US Core variant adds:

  • ICD-10-CM codes alongside SNOMED on ConditionOnset (US Core allows both)
  • Condition.category = problem-list-item (US Core requires this)
  • Observation.category = laboratory with proper US Core category coding
  • MedicationRequest.intent = order and .status = active (US Core requires these)
  • Note: US Core 8 removed ICD-9-CM from the condition valueset — don't include ICD-9 in US Core variant
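
The code-system rules for the two variants can be expressed as a small filter, sketched below. The variant keys and function name are hypothetical. The system names follow the Synthea naming used in the module template, and the sample codes use placeholders where the text gives no specific value.

```python
# Code systems each variant keeps on ConditionOnset, following the variant table.
VARIANT_SYSTEMS = {
    "generic": {"SNOMED-CT"},
    "uscore": {"SNOMED-CT", "ICD-10-CM"},
}

def condition_codes_for(variant, codes):
    """Filter a ConditionOnset code list down to the systems a variant allows."""
    allowed = VARIANT_SYSTEMS[variant]
    return [code for code in codes if code["system"] in allowed]

# Placeholder entries; "..." stands for codes verified through UMLS.
codes = [
    {"system": "SNOMED-CT", "code": "195967001", "display": "..."},
    {"system": "ICD-10-CM", "code": "...", "display": "..."},
    {"system": "ICD-9-CM", "code": "...", "display": "..."},
]
print([c["system"] for c in condition_codes_for("uscore", codes)])
```

Deriving both variants from one verified code list keeps them from drifting apart.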

Output directories:

synthea/output/<phenotype>/
├── generic/
│   ├── positive/fhir/
│   └── control/fhir/
└── uscore/
    ├── positive/fhir/
    └── control/fhir/

Medication Codes: Ingredient vs SCD

Synthea's FHIR exporter works best with SCD-level (Semantic Clinical Drug) RxNorm codes, not ingredient-level codes. Always:

  1. Check the algorithm doc for ingredient-level codes
  2. Use /umls to find the corresponding SCD codes
  3. Use Synthea's built-in modules as reference for which SCD codes to use

IMPORTANT: Test case expected queries and Synthea modules need DIFFERENT code levels:

  • Synthea modules: Use SCD codes (e.g., 860975 for "metformin 500 MG ER Tablet") because Synthea generates FHIR MedicationRequest resources with specific drug forms
  • Test case expected queries: May use EITHER ingredient OR SCD codes depending on what the FHIR server indexes. HAPI FHIR does NOT automatically resolve ingredient→SCD relationships, so queries must match the exact codes in the data
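
Because queries must match the exact codes in the data, an expected query for Synthea-generated medications would list the SCD codes explicitly. The sketch below builds such a search URL. The function name is hypothetical, the base URL is the local HAPI address used elsewhere in this document, and comma-separated token values are standard FHIR search syntax for OR.

```python
from urllib.parse import urlencode

RXNORM = "http://www.nlm.nih.gov/research/umls/rxnorm"

def medication_search_url(base_url, rxnorm_codes):
    """Build a MedicationRequest search matching any of the exact RxNorm codes."""
    tokens = ",".join(f"{RXNORM}|{code}" for code in rxnorm_codes)
    return f"{base_url.rstrip('/')}/MedicationRequest?{urlencode({'code': tokens})}"

# 860975 is the SCD code cited above for metformin 500 MG ER Tablet.
print(medication_search_url("http://localhost:8080/fhir", ["860975"]))
```

An ingredient-level code in the same query would return nothing unless the data also carries that code.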

Synthea GMF Critical Patterns (Lessons Learned)

These are hard-won lessons from debugging Synthea module generation:

  1. "wellness": true on Encounter states is REQUIRED. Without it, the module's ConditionOnset/Observation/MedicationOrder states will process but produce ZERO FHIR resources. Synthea only writes resources to output when they occur inside a lifecycle-managed encounter.

  2. ConditionOnset MUST be inside an encounter. Place it AFTER the Encounter state and BEFORE the EncounterEnd state. The old pattern of using target_encounter pointing to a future encounter state does NOT work reliably for custom modules.

  3. Use SetAttribute for disease flags, conditional_transition for branching. Match the pattern from Synthea's built-in metabolic_syndrome_disease.json + metabolic_syndrome_care.json. Disease modules set attributes; care/encounter modules check attributes and create resources.

  4. MedicationOrder reason field must reference an attribute name, not a state name. Use "reason": "t2dm_condition" where t2dm_condition was set via assign_to_attribute on a ConditionOnset. For Path 4 patients (no condition), omit the reason field.

  5. Infrastructure bundles must load first on HAPI FHIR. Synthea generates hospitalInformation*.json and practitionerInformation*.json files. These must be loaded before patient bundles, or HAPI returns 404 errors for Practitioner references.

  6. Patient count vs module filter. The -m flag restricts which modules Synthea runs; it does not by itself drop patients from the output. Combined with -p N, Synthea still generates N patients, so if the module has an Age_Guard, patients below the age threshold appear in the output without the phenotype's clinical data.
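
Lesson 4 lends itself to an automated check. The sketch below flags MedicationOrder states whose reason does not match any attribute assigned by a ConditionOnset. The function name is hypothetical, and it only inspects assign_to_attribute on ConditionOnset states, which is the pattern described above.

```python
def check_medication_reasons(module):
    """Return MedicationOrder state names whose reason is not an assigned attribute."""
    states = module["states"]
    attributes = {
        state["assign_to_attribute"]
        for state in states.values()
        if state.get("type") == "ConditionOnset" and "assign_to_attribute" in state
    }
    return [
        name
        for name, state in states.items()
        if state.get("type") == "MedicationOrder"
        and "reason" in state
        and state["reason"] not in attributes
    ]

module = {
    "states": {
        "Diagnose": {"type": "ConditionOnset", "assign_to_attribute": "t2dm_condition"},
        "Prescribe_OK": {"type": "MedicationOrder", "reason": "t2dm_condition"},
        "Prescribe_Bad": {"type": "MedicationOrder", "reason": "Diagnose"},
        "Prescribe_No_Reason": {"type": "MedicationOrder"},
    }
}
print(check_medication_reasons(module))
```

States without a reason field pass the check, which matches the guidance for Path 4 patients.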

FHIR Server Compatibility Notes

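Server      | Healthcheck  | Data Persistence           | Bundle Load Order | _has Support | Notes
------------|--------------|----------------------------|-------------------|--------------|---------------------
HAPI FHIR   | curl works   | Stable in-memory           | Infra files first | Yes          | Recommended for dev
fhir-candle | No curl/wget | Unstable (periodic resets) | Any order         | Limited      | NOT recommended
Azure FHIR  | TBD          | SQL-backed                 | Infra files first | Yes          | Requires SQL Server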

Synthea Module Template

When creating modules, use this structure:

json
{
  "name": "PheKB <Phenotype Name>",
  "remarks": [
    "Auto-generated from PheKB phenotype: <phenotype-id>",
    "Clinical criteria: ...",
    "..."
  ],
  "states": {
    "Initial": {
      "type": "Initial",
      "direct_transition": "Age_Guard"
    },
    "Age_Guard": {
      "type": "Guard",
      "allow": { "condition_type": "Age", "operator": ">=", "quantity": 18, "unit": "years" },
      "direct_transition": "..."
    },
    "Condition_Onset": {
      "type": "ConditionOnset",
      "codes": [
        { "system": "SNOMED-CT", "code": "...", "display": "..." },
        { "system": "ICD-10-CM", "code": "...", "display": "..." }
      ],
      "direct_transition": "..."
    },
    "Lab_Observation": {
      "type": "Observation",
      "category": "laboratory",
      "codes": [{ "system": "LOINC", "code": "...", "display": "..." }],
      "unit": "...",
      "range": { "low": ..., "high": ... },
      "direct_transition": "..."
    },
    "Medication_Order": {
      "type": "MedicationOrder",
      "codes": [{ "system": "RxNorm", "code": "...", "display": "..." }],
      "direct_transition": "..."
    },
    "Terminal": {
      "type": "Terminal"
    }
  },
  "gmf_version": 2
}

Code System URIs

System    | Synthea Name | FHIR URI
----------|--------------|--------------------------------------------
SNOMED CT | SNOMED-CT    | http://snomed.info/sct
ICD-10-CM | ICD-10-CM    | http://hl7.org/fhir/sid/icd-10-cm
ICD-9-CM  | ICD-9-CM     | http://hl7.org/fhir/sid/icd-9-cm
LOINC     | LOINC        | http://loinc.org
RxNorm    | RxNorm       | http://www.nlm.nih.gov/research/umls/rxnorm
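
When turning module codes into FHIR queries, the table above can be applied as a lookup. The sketch below maps a Synthea code entry to a system|code search token. The function name is hypothetical and the mapping is copied from the table.

```python
FHIR_SYSTEM_URIS = {
    "SNOMED-CT": "http://snomed.info/sct",
    "ICD-10-CM": "http://hl7.org/fhir/sid/icd-10-cm",
    "ICD-9-CM": "http://hl7.org/fhir/sid/icd-9-cm",
    "LOINC": "http://loinc.org",
    "RxNorm": "http://www.nlm.nih.gov/research/umls/rxnorm",
}

def to_search_token(synthea_code):
    """Turn a Synthea module code entry into a FHIR system|code search token."""
    return f"{FHIR_SYSTEM_URIS[synthea_code['system']]}|{synthea_code['code']}"

print(to_search_token({"system": "LOINC", "code": "4548-4", "display": "HbA1c"}))
```

An unknown system name raises KeyError, which surfaces typos in module code entries early.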

Note: Algorithm PDFs (e.g., Table 7) may list additional LOINC codes beyond the primary ones. Include all relevant codes in modules:

  • Random glucose: 2339-0, 2345-7
  • HbA1c: 4548-4, 17856-6, 4549-2, 17855-8

Example Session

User: /synthea create-module asthma

Claude: I'll create a Synthea module for the asthma phenotype.

[Reads document_analysis.json and test case]
[Extracts codes: ICD-9 493.x, SNOMED 195967001, LOINC for spirometry, RxNorm for albuterol]
[Creates module with age guard, condition onset, observations, medications]
[Writes phekb_asthma.json and phekb_asthma_control.json]

Created Synthea modules for asthma:
- synthea/modules/custom/phekb_asthma.json (positive cases)
- synthea/modules/custom/phekb_asthma_control.json (controls)

To generate test data: /synthea generate asthma

User: /synthea full heart-failure

Claude: Running full pipeline for heart-failure phenotype...

Step 1/3: Creating module...
[Creates module]

Step 2/3: Generating data...
[Runs Synthea - 20 positive, 20 control patients]

Step 3/3: Loading to FHIR server...
[Loads 40 patient bundles]

Complete! Generated and loaded 40 patients for heart-failure phenotype.
- 20 positive cases (should match phenotype query)
- 20 controls (should NOT match)

