Synthea Test Data Generation Skill
Usage
/synthea <command> [options]
Commands
| Command | Description |
|---|---|
| `create-module <phenotype>` | Create Synthea module from phenotype data |
| `generate <phenotype>` | Run Synthea to generate FHIR data |
| `load <phenotype>` | Load generated data to FHIR server |
| `full <phenotype>` | Create module + generate + load (full pipeline) |
| `status` | Show status of all phenotypes |
| `list` | List available phenotypes with modules |
| `batch <phenotypes...>` | Process multiple phenotypes |
Command Details
create-module
Creates a Synthea GMF (Generic Module Framework) module from phenotype data.
Inputs:
- `data/phekb-raw/<phenotype>/document_analysis.json` - Extracted codes and criteria
- `data/phekb-raw/<phenotype>/description.txt` - Phenotype description
- `test-cases/phekb/phekb-<phenotype>.json` - Test case with required codes
Outputs:
- `synthea/modules/custom/phekb_<phenotype>.json` - Positive case module
- `synthea/modules/custom/phekb_<phenotype>_control.json` - Control module
generate
Runs Synthea with the custom module to generate synthetic patients.
Options:
- `--patients N` - Number of positive cases (default: 20)
- `--controls N` - Number of control cases (default: 20)
- `--seed N` - Random seed for reproducibility
Outputs:
- `synthea/output/<phenotype>/positive/fhir/*.json` - Positive patient bundles
- `synthea/output/<phenotype>/control/fhir/*.json` - Control patient bundles
load
Loads generated FHIR bundles to the FHIR server.
Prerequisites:
- FHIR server running (HAPI FHIR at `http://localhost:8080/fhir` or Azure FHIR at `http://localhost:9080`)
- Generated data exists in `synthea/output/<phenotype>/`
Important: Infrastructure bundles (hospitals, practitioners) must load BEFORE patient bundles. The CLI command `fhir-eval load synthea` handles this automatically. If loading manually, load the `hospitalInformation*.json` and `practitionerInformation*.json` files first.
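For manual loads, the ordering rule can be sketched in Python as below. This is a minimal sketch, assuming all bundles for one run sit in a single `fhir/` directory and use Synthea's `hospitalInformation*` / `practitionerInformation*` filename prefixes; `load_order` is a hypothetical helper, not part of `fhir-eval`.

```python
from pathlib import Path

# Filename prefixes of Synthea's shared infrastructure bundles
# (organizations/locations and practitioners referenced by patient bundles).
INFRA_PREFIXES = ("hospitalInformation", "practitionerInformation")


def load_order(fhir_dir):
    """Return bundle paths with infrastructure bundles first, then patients.

    Hospitals come before practitioners; each group is sorted by file name
    so the order is deterministic across runs.
    """
    files = sorted(Path(fhir_dir).glob("*.json"))
    infra = []
    for prefix in INFRA_PREFIXES:
        infra.extend(f for f in files if f.name.startswith(prefix))
    patients = [f for f in files if not f.name.startswith(INFRA_PREFIXES)]
    return infra + patients
```

Each returned path can then be POSTed in turn, for example with the `curl` loop from the `load` instructions.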
full
Runs the complete pipeline: create-module → generate → load
status
Shows which phenotypes have:
- Document analysis data
- Synthea modules created
- Generated data
- Data loaded to FHIR server
Instructions for Claude
When this skill is invoked, follow these instructions based on the command:
For create-module <phenotype>:

1. Read the phenotype data:
   - `Read: data/phekb-raw/<phenotype>/document_analysis.json`
   - `Read: test-cases/phekb/phekb-<phenotype>.json` (if it exists)
2. Extract key information:
   - Diagnosis codes (ICD-9, ICD-10, SNOMED-CT)
   - Lab codes (LOINC) with thresholds from clinical_criteria
   - Medication codes (RxNorm)
   - Age requirements
   - Exclusion criteria
3. Verify and enrich codes using the UMLS MCP server (CRITICAL):
   Do NOT trust codes from document_analysis.json blindly; they may be incomplete, outdated, or wrong. Use the UMLS MCP tools to get authoritative codes:

   a. For each diagnosis concept, search UMLS and get codes across systems:
      - `search_umls(query="<condition name>", search_type="exact")`
      - `get_concept(cui="<CUI>")`
      - `crosswalk_codes(source="SNOMEDCT_US", code="<SNOMED>", target_source="ICD10CM")`
      - `crosswalk_codes(source="SNOMEDCT_US", code="<SNOMED>", target_source="ICD9CM")`

   b. For each lab concept, find the correct LOINC code with `search_umls(query="<lab name>", search_type="words")`. Look for results with semantic type "Laboratory Procedure" or "Clinical Attribute".

   c. For each medication, find the RxNorm code:
      - `search_umls(query="<medication name>", search_type="exact")`
      - `crosswalk_codes(source="RXNORM", code="<CODE>", target_source="SNOMEDCT_US")`

   d. Cross-check codes from the phenotype data against UMLS with `get_source_concept(source="ICD10CM", id="<CODE>")`. Verify that the display name matches the intended concept, and discard obsolete codes.

   e. If a crosswalk returns nothing (common for SNOMED→ICD-10), search UMLS for the term with `search_type="words"` and look for CUIs with atoms from the target system.

   See the `/umls` skill for full details on UMLS MCP usage patterns and gotchas.
4. Generate the Synthea module following the GMF format:
   - Use the existing `synthea/modules/custom/phekb_type_2_diabetes.json` as a template
   - Include multiple code systems for each concept (SNOMED + ICD-10 + ICD-9)
   - Set realistic value ranges based on clinical_criteria
   - Create the state machine: Initial → Age Guard → Condition Onset → Labs → Medications → Terminal
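A minimal Python sketch of that linear state machine, assuming the module is assembled as a dict and later written with `json.dump`. Following the GMF lessons further down, it wraps the clinical states in a wellness `Encounter` / `EncounterEnd` pair. `linear_module`, the attribute name `phenotype_condition`, the angle-bracket codes, and the `range` values are hypothetical placeholders, not values from the T2DM template.

```python
import json


def linear_module(name, chain, states):
    """Wire `states` into a GMF module that runs through `chain` in order.

    `states` maps each state name to its body without transitions; every
    state except the last gets a direct_transition to its successor.
    """
    wired = {}
    for current, nxt in zip(chain, chain[1:] + [None]):
        body = dict(states[current])  # shallow copy: leave the input untouched
        if nxt is not None:
            body["direct_transition"] = nxt
        wired[current] = body
    return {"name": name, "states": wired, "gmf_version": 2}


# Placeholders in angle brackets must be replaced with UMLS-verified codes.
CHAIN = ["Initial", "Age_Guard", "Wellness_Encounter", "Condition_Onset",
         "Lab_Observation", "Medication_Order", "End_Encounter", "Terminal"]
STATES = {
    "Initial": {"type": "Initial"},
    "Age_Guard": {"type": "Guard",
                  "allow": {"condition_type": "Age", "operator": ">=",
                            "quantity": 18, "unit": "years"}},
    "Wellness_Encounter": {"type": "Encounter", "wellness": True},
    "Condition_Onset": {"type": "ConditionOnset",
                        "assign_to_attribute": "phenotype_condition",
                        "codes": [{"system": "SNOMED-CT", "code": "<snomed>",
                                   "display": "<display>"}]},
    "Lab_Observation": {"type": "Observation", "category": "laboratory",
                        "codes": [{"system": "LOINC", "code": "<loinc>",
                                   "display": "<display>"}],
                        "unit": "<unit>",
                        "range": {"low": 0, "high": 1}},  # placeholder range
    "Medication_Order": {"type": "MedicationOrder",
                         "reason": "phenotype_condition",
                         "codes": [{"system": "RxNorm", "code": "<rxnorm-scd>",
                                    "display": "<display>"}]},
    "End_Encounter": {"type": "EncounterEnd"},
    "Terminal": {"type": "Terminal"},
}

module = linear_module("PheKB <Phenotype Name>", CHAIN, STATES)
module_json = json.dumps(module, indent=2)  # text to write to phekb_<phenotype>.json
```

Keeping transitions out of the per-state bodies means the order lives in one list, so inserting or removing a step cannot leave a dangling `direct_transition`.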
5. Generate the control module:
   - Normal lab values (below diagnostic thresholds)
   - No phenotype-specific diagnosis codes
   - May include unrelated conditions for variety
6. Write the modules:
   - `Write: synthea/modules/custom/phekb_<phenotype>.json`
   - `Write: synthea/modules/custom/phekb_<phenotype>_control.json`
7. Validate the JSON syntax by reading each file back and parsing it.
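The read-back check can go one step beyond syntax. This Python sketch (a hypothetical `validate_module`, not an existing project script) parses the file, which raises `json.JSONDecodeError` on bad JSON, and then reports transitions that point at undefined states. As an assumption to keep it short, it inspects only `direct_transition`, `distributed_transition`, and `conditional_transition`; `complex_transition` is not checked.

```python
import json
from pathlib import Path


def transition_targets(state):
    """Yield the state names a GMF state can transition to (simple forms only)."""
    if "direct_transition" in state:
        yield state["direct_transition"]
    for key in ("distributed_transition", "conditional_transition"):
        for option in state.get(key, []):
            yield option["transition"]


def validate_module(path):
    """Parse a module file and return a list of problems (empty if none)."""
    module = json.loads(Path(path).read_text(encoding="utf-8"))  # raises on bad JSON
    states = module.get("states", {})
    problems = []
    if "Initial" not in states:
        problems.append("missing Initial state")
    for name, state in states.items():
        for target in transition_targets(state):
            if target not in states:
                problems.append(f"{name} -> unknown state {target!r}")
    return problems
```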
For generate <phenotype>:

1. Check prerequisites:
   - Synthea source build exists at `C:\repos\synthea` (or the path in the `SYNTHEA_HOME` env var)
   - Module exists: `synthea/modules/custom/phekb_<phenotype>.json`
2. If Synthea is not found, provide install instructions:
   ```bash
   git clone https://github.com/synthetichealth/synthea.git C:/repos/synthea
   ```
3. IMPORTANT: Environment-specific issues to handle.
   Claude Code runs in a bash environment (Git Bash), NOT cmd.exe. This causes three issues:

   Issue 1: `.bat` files don't run natively in bash.
   - Do NOT call `run_synthea.bat` directly; it won't find `gradlew.bat`.
   - Instead, call `./gradlew` (the Unix wrapper) directly from the Synthea directory.

   Issue 2: JAVA_HOME / Java not on PATH.
   - Git Bash doesn't fully inherit the Windows system PATH, so Java may not be found.
   - Before running Synthea, set JAVA_HOME explicitly:
     ```bash
     export JAVA_HOME="/c/Program Files/Eclipse Adoptium/jdk-17.0.18.8-hotspot"
     export PATH="$JAVA_HOME/bin:$PATH"
     ```
   - Or auto-detect: check `/c/Program Files/Eclipse Adoptium/`, `/c/Program Files/Java/`, etc.
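The auto-detect option might look like this Python sketch. It assumes a Windows-native Python, hence the `C:/Program Files/...` form rather than Git Bash's `/c/...`, and it searches only the two roots named above. `find_java_home` is hypothetical and is not necessarily how `generate_test_data.py` does it; the folder choice is a reverse-alphabetical heuristic, not a true version sort.

```python
import os
from pathlib import Path

# Install roots named in the text, in Windows-native form (assumption).
CANDIDATE_ROOTS = (
    "C:/Program Files/Eclipse Adoptium",
    "C:/Program Files/Java",
)


def has_java(jdk):
    """True if the directory looks like a JDK (contains bin/java or bin/java.exe)."""
    return (jdk / "bin" / "java").exists() or (jdk / "bin" / "java.exe").exists()


def find_java_home(roots=CANDIDATE_ROOTS, env=None):
    """Return a usable JDK directory, or None if nothing is found.

    An existing JAVA_HOME wins if it contains a java binary; otherwise the
    first JDK folder in reverse-alphabetical order under the roots is used.
    """
    env = os.environ if env is None else env
    current = env.get("JAVA_HOME")
    if current and has_java(Path(current)):
        return Path(current)
    for root in map(Path, roots):
        if not root.is_dir():
            continue
        for jdk in sorted(root.iterdir(), reverse=True):
            if jdk.is_dir() and has_java(jdk):
                return jdk
    return None
```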
   Issue 3: Backslash paths break Gradle `-Params`.
   - Gradle's Groovy parser interprets `\` as an escape character.
   - ALWAYS use forward slashes in all paths passed to `./gradlew`.
   - Example: `C:/repos/llm-fhir-query-eval/synthea/modules/custom` (NOT `C:\repos\...`)
4. Run the Python helper script (recommended):
   ```bash
   python synthea/generate_test_data.py --phenotype <phenotype> --patients 20 --controls 20
   ```
   The script auto-detects whether it is running in bash or cmd.exe and adjusts:
   - In bash: calls `./gradlew run -Params="[...]"` directly with forward-slash paths
   - In cmd.exe: calls `run_synthea.bat`
   - Auto-detects JAVA_HOME from common Windows install locations
5. Or run Synthea directly via gradlew (if the script has issues):
   ```bash
   export JAVA_HOME="/c/Program Files/Eclipse Adoptium/jdk-17.0.18.8-hotspot"
   export PATH="$JAVA_HOME/bin:$PATH"

   cd C:/repos/synthea && ./gradlew run -Params="['-p','20','-m','phekb_<phenotype>','-d','C:/repos/llm-fhir-query-eval/synthea/modules/custom','--exporter.fhir.export','true','--exporter.fhir.use_us_core_ig','true','--exporter.baseDirectory','C:/repos/llm-fhir-query-eval/synthea/output/<phenotype>/positive','-s','42']"
   ```
   Then repeat with the `phekb_<phenotype>_control` module and the `control` output subdirectory.
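Because the `-Params` string is easy to get wrong by hand, here is a Python sketch that builds it with the same flags as the command above and forces forward slashes. `gradle_params` is a hypothetical helper; the flag list is copied from the command, not taken from Synthea's documentation.

```python
from pathlib import PureWindowsPath


def gradle_params(module, modules_dir, output_dir, patients=20, seed=42):
    """Build the value for `./gradlew run -Params="..."`.

    Windows paths are converted to forward slashes, since Gradle's Groovy
    parser treats backslashes as escape characters.
    """
    def fwd(path):
        return PureWindowsPath(path).as_posix()

    args = ["-p", str(patients), "-m", module, "-d", fwd(modules_dir),
            "--exporter.fhir.export", "true",
            "--exporter.fhir.use_us_core_ig", "true",
            "--exporter.baseDirectory", fwd(output_dir),
            "-s", str(seed)]
    return "[" + ",".join(f"'{a}'" for a in args) + "]"
```

Call it once for `phekb_<phenotype>` with the `positive` directory and once for `phekb_<phenotype>_control` with the `control` directory, then place each result inside `-Params="..."`.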
6. Report results: count the generated files and summarize.
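When counting, infrastructure bundles should not be reported as patients. A minimal Python sketch, assuming the `positive/fhir` and `control/fhir` layout listed under the generate outputs; `count_patient_bundles` is hypothetical.

```python
from pathlib import Path

# Synthea infrastructure bundles are not patients, so they are excluded.
INFRA_PREFIXES = ("hospitalInformation", "practitionerInformation")


def count_patient_bundles(output_root):
    """Count patient bundles per group under <output_root>/{positive,control}/fhir."""
    counts = {}
    for group in ("positive", "control"):
        fhir_dir = Path(output_root) / group / "fhir"
        files = fhir_dir.glob("*.json") if fhir_dir.is_dir() else []
        counts[group] = sum(1 for f in files if not f.name.startswith(INFRA_PREFIXES))
    return counts
```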
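For load <phenotype>:

1. Check that the FHIR server is running:
   ```bash
   curl -s http://localhost:8080/fhir/metadata | head -5
   ```
2. Load positive cases, infrastructure bundles first:
   ```bash
   dir=synthea/output/<phenotype>/positive/fhir
   for f in "$dir"/hospitalInformation*.json "$dir"/practitionerInformation*.json \
            $(ls "$dir"/*.json | grep -v -e hospitalInformation -e practitionerInformation); do
     curl -X POST http://localhost:8080/fhir \
       -H "Content-Type: application/fhir+json" \
       -d @"$f"
   done
   ```
3. Load control cases the same way:
   ```bash
   dir=synthea/output/<phenotype>/control/fhir
   for f in "$dir"/hospitalInformation*.json "$dir"/practitionerInformation*.json \
            $(ls "$dir"/*.json | grep -v -e hospitalInformation -e practitionerInformation); do
     curl -X POST http://localhost:8080/fhir \
       -H "Content-Type: application/fhir+json" \
       -d @"$f"
   done
   ```
4. Verify the loaded data by querying the server.
5. Update the test case with the expected resource IDs (optional).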
For full <phenotype>:
Execute in sequence:
1. `create-module <phenotype>`
2. `generate <phenotype>`
3. `load <phenotype>`
Report overall success/failure.
For status:
1. List all phenotypes from `data/phekb-raw/*/`
2. For each, check:
   - Has `document_analysis.json`?
   - Has a module in `synthea/modules/custom/`?
   - Has generated data in `synthea/output/`?
   - Count positive/control patients
3. Display a summary table:

| Phenotype | Analysis | Module | Data (pos/ctrl) | Loaded |
|---|---|---|---|---|
| type-2-diabetes | ✓ | ✓ | 20/20 | ✓ |
| asthma | ✓ | ✗ | - | - |
| heart-failure | ✓ | ✗ | - | - |
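The file-system part of the status check can be sketched in Python as follows. Assumptions: the repository layout named in this section, the `positive`/`control` output layout, and, inferred from `phekb_type_2_diabetes.json`, that hyphens in a phenotype id become underscores in its module name. The Loaded column needs a server query and is left out. `status_rows` and `render` are hypothetical.

```python
from pathlib import Path

INFRA_PREFIXES = ("hospitalInformation", "practitionerInformation")


def count_bundles(fhir_dir):
    """Count patient bundles in a fhir/ directory (0 if it does not exist)."""
    if not fhir_dir.is_dir():
        return 0
    return sum(1 for f in fhir_dir.glob("*.json")
               if not f.name.startswith(INFRA_PREFIXES))


def status_rows(root):
    """Build one status row per phenotype directory under data/phekb-raw/."""
    root = Path(root)
    rows = []
    raw = root / "data" / "phekb-raw"
    for pdir in sorted(p for p in raw.iterdir() if p.is_dir()):
        name = pdir.name
        # ASSUMPTION: "type-2-diabetes" -> "phekb_type_2_diabetes.json".
        module = root / "synthea" / "modules" / "custom" / f"phekb_{name.replace('-', '_')}.json"
        out = root / "synthea" / "output" / name
        pos = count_bundles(out / "positive" / "fhir")
        ctrl = count_bundles(out / "control" / "fhir")
        rows.append({
            "phenotype": name,
            "analysis": (pdir / "document_analysis.json").is_file(),
            "module": module.is_file(),
            "data": f"{pos}/{ctrl}" if pos or ctrl else "-",
        })
    return rows


def render(rows):
    """Render rows as a markdown table without the Loaded column."""
    mark = {True: "✓", False: "✗"}
    lines = ["| Phenotype | Analysis | Module | Data (pos/ctrl) |", "|---|---|---|---|"]
    for r in rows:
        lines.append(f"| {r['phenotype']} | {mark[r['analysis']]} | "
                     f"{mark[r['module']]} | {r['data']} |")
    return "\n".join(lines)
```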
For list:
List phenotypes that have Synthea modules ready:
```bash
ls synthea/modules/custom/phekb_*.json | grep -v _control
```
For batch <phenotypes...>:
- Parse the phenotype list (comma- or space-separated)
- For each phenotype, run `full <phenotype>`
- Track successes and failures
- Report a summary at the end
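The batch steps above can be sketched in Python. `parse_phenotypes` and `run_batch` are hypothetical. `run_full` stands in for whatever executes `full <phenotype>`, and any exception it raises is recorded as a failure. Dropping duplicate names is an added assumption, not stated in the skill.

```python
import re


def parse_phenotypes(arg):
    """Split a comma- and/or space-separated list, dropping empties and duplicates."""
    names = []
    for item in re.split(r"[,\s]+", arg.strip()):
        if item and item not in names:
            names.append(item)
    return names


def run_batch(phenotypes, run_full):
    """Run `run_full(p)` for each phenotype; return (successes, {failure: reason})."""
    ok, failed = [], {}
    for p in phenotypes:
        try:
            run_full(p)
            ok.append(p)
        except Exception as exc:  # keep going so the final summary covers every phenotype
            failed[p] = str(exc)
    return ok, failed
```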
Multi-Path Phenotype Modules
Key Learning: Phenotype Algorithms Have Multiple Paths
PheKB phenotype algorithms are multi-path decision trees, NOT simple code lookups. A single phenotype may identify patients through DIFFERENT combinations of clinical data:
- Diagnosis codes only
- Diagnosis codes + medications
- Diagnosis codes + abnormal labs
- Medications + abnormal labs (NO diagnosis code)
- Complex temporal ordering rules
Generating Path-Specific Patients
When creating Synthea modules for phenotypes with multiple identification paths, generate DISTINCT patient groups:
- Analyze the algorithm document (usually a PDF in `data/phekb-raw/<phenotype>/`) to identify all paths
- Create separate state branches in the Synthea module for each path type
- Path 4-type patients (medications + abnormal labs, no diagnosis code): these patients should have MedicationRequest + Observation resources but NO Condition resource. This requires a separate branch in the module that:
  - Skips the ConditionOnset state
  - Still prescribes medications (MedicationOrder)
  - Still records abnormal lab values (Observation)
- Use distributed_transition to control the mix of patient types
Module Structure for Multi-Path Phenotypes
Initial → Age_Guard → Set_Diabetes_Flag → Wellness_Encounter → Path_Router
├→ Path_With_Diagnosis (70%)
│ ├→ Diagnose_Condition
│ ├→ Record_Labs
│ ├→ Prescribe_Meds
│ └→ End_Encounter
└→ Path_No_Diagnosis (30%)
├→ Record_Abnormal_Labs (NO ConditionOnset!)
├→ Prescribe_Meds
└→ End_Encounter
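The 70/30 `Path_Router` split can be expressed as a GMF `Simple` state with a `distributed_transition`. This Python sketch is a hypothetical helper that builds that state and rejects distributions that do not sum to 1. The state names match the tree above, and the weights are the illustrative 70/30 mix.

```python
def path_router(weights):
    """Build a GMF 'Simple' state that routes patients across paths.

    `weights` maps target state name -> probability; they must sum to 1.
    """
    total = sum(weights.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"distribution sums to {total}, expected 1.0")
    return {
        "type": "Simple",
        "distributed_transition": [
            {"distribution": p, "transition": name} for name, p in weights.items()
        ],
    }


router = path_router({"Path_With_Diagnosis": 0.7, "Path_No_Diagnosis": 0.3})
```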
Lab Value Thresholds
When generating observation values, use thresholds from the phenotype algorithm document:
- Case thresholds: Values that qualify as "abnormal" for case identification
- Control thresholds: Values that must be normal for control patients (often stricter)
Example from T2DM:
| Lab | Case Threshold | Control Exclusion |
|---|---|---|
| HbA1c | >= 6.5% | >= 6.0% |
| Fasting glucose | >= 125 mg/dL | >= 110 mg/dL |
| Random glucose | > 200 mg/dL | > 110 mg/dL |
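One way to keep module ranges consistent with these thresholds is to check each Observation `range` against the table. The Python sketch below copies the T2DM values above; `range_ok` and the helper names are hypothetical. Because each test is monotone in the value, only one endpoint needs checking: `low` for case ranges and `high` for control ranges.

```python
# T2DM thresholds from the table above: (case test, control-exclusion test).
THRESHOLDS = {
    "HbA1c": (lambda v: v >= 6.5, lambda v: v >= 6.0),            # %
    "Fasting glucose": (lambda v: v >= 125, lambda v: v >= 110),  # mg/dL
    "Random glucose": (lambda v: v > 200, lambda v: v > 110),     # mg/dL
}


def qualifies_as_case(lab, value):
    """True if the value meets the case threshold."""
    return THRESHOLDS[lab][0](value)


def allowed_in_control(lab, value):
    """True if a control patient may have this value (below the exclusion)."""
    return not THRESHOLDS[lab][1](value)


def range_ok(lab, low, high, positive):
    """Check a module Observation range {low, high} against the thresholds."""
    return qualifies_as_case(lab, low) if positive else allowed_in_control(lab, high)
```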
Dual Data Sets: Generic vs US Core
For each phenotype, plan to generate TWO Synthea module variants:
| Variant | Module Suffix | Condition Codes | Meds | Categories | When to Use |
|---|---|---|---|---|---|
| Generic | phekb_<name>.json | SNOMED only | RxNorm SCD | Base FHIR | Tier 1 eval, basic testing |
| US Core | phekb_<name>_uscore.json | SNOMED + ICD-10-CM | RxNorm SCD | US Core categories | Tier 3 eval, profile-aware testing |
The US Core variant adds:
- ICD-10-CM codes alongside SNOMED on ConditionOnset (US Core allows both)
- `Condition.category` = `problem-list-item` (US Core requires this)
- `Observation.category` = `laboratory` with proper US Core category coding
- `MedicationRequest.intent` = `order` and `.status` = `active` (US Core requires these)
- Note: US Core 8 removed ICD-9-CM from the condition value set, so don't include ICD-9 in the US Core variant
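Generated US Core bundles can be spot-checked against the list above with a Python sketch like this. It is hypothetical, not a US Core validator. Assumptions: category codes are matched regardless of coding system; only Observations whose code is in `lab_loinc` are checked, because vital-sign Observations legitimately use another category; and `MedicationRequest.status` is only checked for presence, since generated orders may not all be active.

```python
ICD9_URI = "http://hl7.org/fhir/sid/icd-9-cm"


def category_codes(resource):
    """All category codes on a resource, regardless of coding system."""
    return {coding.get("code")
            for cat in resource.get("category", [])
            for coding in cat.get("coding", [])}


def uscore_problems(resource, lab_loinc=frozenset()):
    """List which US Core variant rules a single resource breaks."""
    kind = resource.get("resourceType")
    problems = []
    if kind == "Condition":
        if "problem-list-item" not in category_codes(resource):
            problems.append("Condition.category lacks problem-list-item")
        systems = {c.get("system") for c in resource.get("code", {}).get("coding", [])}
        if ICD9_URI in systems:
            problems.append("Condition.code contains ICD-9-CM")
    elif kind == "Observation":
        codes = {c.get("code") for c in resource.get("code", {}).get("coding", [])}
        if codes & set(lab_loinc) and "laboratory" not in category_codes(resource):
            problems.append("Observation.category lacks laboratory")
    elif kind == "MedicationRequest":
        if resource.get("intent") != "order":
            problems.append("MedicationRequest.intent is not order")
        if "status" not in resource:
            problems.append("MedicationRequest.status is missing")
    return problems


def check_bundle(bundle, lab_loinc=frozenset()):
    """Pair each problem with the type of the resource it came from."""
    return [(entry["resource"]["resourceType"], problem)
            for entry in bundle.get("entry", [])
            for problem in uscore_problems(entry["resource"], lab_loinc)]
```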
Output directories:
synthea/output/<phenotype>/
├── generic/
│ ├── positive/fhir/
│ └── control/fhir/
└── uscore/
├── positive/fhir/
└── control/fhir/
Medication Codes: Ingredient vs SCD
Synthea's FHIR exporter works best with SCD-level (Semantic Clinical Drug) RxNorm codes, not ingredient-level codes. Always:
- Check the algorithm doc for ingredient-level codes
- Use /umls to find the corresponding SCD codes
- Use Synthea's built-in modules as reference for which SCD codes to use
IMPORTANT: Test case expected queries and Synthea modules need DIFFERENT code levels:
- Synthea modules: use SCD codes (e.g., `860975` for "metformin 500 MG ER Tablet") because Synthea generates FHIR MedicationRequest resources with specific drug forms
- Test case expected queries: may use EITHER ingredient OR SCD codes, depending on what the FHIR server indexes. HAPI FHIR does NOT automatically resolve ingredient→SCD relationships, so queries must match the exact codes in the data
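Since queries must match the exact codes in the data, one practical step is to read the RxNorm codes from a generated bundle and build the expected query from them. This Python sketch is hypothetical. It assumes Synthea wrote the drug as `medicationCodeableConcept`, and it leaves the `|` separators unencoded for readability (URL-encode them when sending the request). A comma inside a FHIR token search means OR.

```python
RXNORM = "http://www.nlm.nih.gov/research/umls/rxnorm"


def rxnorm_codes_in_bundle(bundle):
    """Collect the RxNorm codes actually present on MedicationRequests."""
    codes = set()
    for entry in bundle.get("entry", []):
        res = entry.get("resource", {})
        if res.get("resourceType") != "MedicationRequest":
            continue
        for coding in res.get("medicationCodeableConcept", {}).get("coding", []):
            if coding.get("system") == RXNORM:
                codes.add(coding.get("code"))
    return codes


def medication_query(codes):
    """Build a MedicationRequest search that ORs the exact codes."""
    token = ",".join(f"{RXNORM}|{c}" for c in sorted(codes))
    return f"MedicationRequest?code={token}"
```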
Synthea GMF Critical Patterns (Lessons Learned)
These are hard-won lessons from debugging Synthea module generation:
1. `"wellness": true` on Encounter states is REQUIRED. Without it, the module's ConditionOnset/Observation/MedicationOrder states will process but produce ZERO FHIR resources. Synthea only writes resources to output when they occur inside a lifecycle-managed encounter.
2. ConditionOnset MUST be inside an encounter. Place it AFTER the Encounter state and BEFORE the EncounterEnd state. The old pattern of using `target_encounter` pointing to a future encounter state does NOT work reliably for custom modules.
3. Use `SetAttribute` for disease flags and `conditional_transition` for branching. Match the pattern from Synthea's built-in `metabolic_syndrome_disease.json` + `metabolic_syndrome_care.json`: disease modules set attributes; care/encounter modules check attributes and create resources.
4. The MedicationOrder `reason` field must reference an attribute name, not a state name. Use `"reason": "t2dm_condition"` where `t2dm_condition` was set via `assign_to_attribute` on a ConditionOnset. For Path 4 patients (no condition), omit the `reason` field.
5. Infrastructure bundles must load first on HAPI FHIR. Synthea generates `hospitalInformation*.json` and `practitionerInformation*.json` files. These must be loaded before patient bundles, or HAPI returns 404 errors for Practitioner references.
6. Patient count vs module filter. The `-m` flag in Synthea keeps only patients who enter the named module. Combined with `-p N`, it generates N total patients but only outputs those matching the module. If the module has an Age_Guard, young patients may pass the filter but lack clinical data.
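Several of these lessons can be caught before running Synthea. The Python sketch below is a hypothetical linter, not a Synthea tool, for the `wellness`, `target_encounter`, and `reason` rules above. It collects attribute names from `assign_to_attribute` and from `SetAttribute` states, and it does not trace state ordering, so the "inside an encounter" rule is only approximated by flagging `target_encounter`.

```python
def lint_module(module):
    """Return warnings for the wellness, target_encounter and reason rules."""
    states = module.get("states", {})
    attributes = set()
    for s in states.values():
        if "assign_to_attribute" in s:
            attributes.add(s["assign_to_attribute"])
        if s.get("type") == "SetAttribute":
            attributes.add(s.get("attribute"))
    warnings = []
    for name, s in states.items():
        kind = s.get("type")
        if kind == "Encounter" and s.get("wellness") is not True:
            warnings.append(f"{name}: Encounter without \"wellness\": true")
        if kind == "ConditionOnset" and "target_encounter" in s:
            warnings.append(f"{name}: uses target_encounter instead of sitting inside an encounter")
        if kind == "MedicationOrder" and "reason" in s and s["reason"] not in attributes:
            warnings.append(f"{name}: reason {s['reason']!r} is not an assigned attribute")
    return warnings
```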
FHIR Server Compatibility Notes
| Server | Healthcheck | Data Persistence | Bundle Load Order | _has Support | Notes |
|---|---|---|---|---|---|
| HAPI FHIR | curl works | Stable in-memory | Infra files first | Yes | Recommended for dev |
| fhir-candle | No curl/wget | Unstable (periodic resets) | Any order | Limited | NOT recommended |
| Azure FHIR | TBD | SQL-backed | Infra files first | Yes | Requires SQL Server |
Synthea Module Template
When creating modules, use this structure:
```json
{
  "name": "PheKB <Phenotype Name>",
  "remarks": [
    "Auto-generated from PheKB phenotype: <phenotype-id>",
    "Clinical criteria: ...",
    "..."
  ],
  "states": {
    "Initial": {
      "type": "Initial",
      "direct_transition": "Age_Guard"
    },
    "Age_Guard": {
      "type": "Guard",
      "allow": { "condition_type": "Age", "operator": ">=", "quantity": 18, "unit": "years" },
      "direct_transition": "..."
    },
    "Condition_Onset": {
      "type": "ConditionOnset",
      "codes": [
        { "system": "SNOMED-CT", "code": "...", "display": "..." },
        { "system": "ICD-10-CM", "code": "...", "display": "..." }
      ],
      "direct_transition": "..."
    },
    "Lab_Observation": {
      "type": "Observation",
      "category": "laboratory",
      "codes": [{ "system": "LOINC", "code": "...", "display": "..." }],
      "unit": "...",
      "range": { "low": ..., "high": ... },
      "direct_transition": "..."
    },
    "Medication_Order": {
      "type": "MedicationOrder",
      "codes": [{ "system": "RxNorm", "code": "...", "display": "..." }],
      "direct_transition": "..."
    },
    "Terminal": {
      "type": "Terminal"
    }
  },
  "gmf_version": 2
}
```
Code System URIs
| System | Synthea Name | FHIR URI |
|---|---|---|
| SNOMED CT | SNOMED-CT | http://snomed.info/sct |
| ICD-10-CM | ICD-10-CM | http://hl7.org/fhir/sid/icd-10-cm |
| ICD-9-CM | ICD-9-CM | http://hl7.org/fhir/sid/icd-9-cm |
| LOINC | LOINC | http://loinc.org |
| RxNorm | RxNorm | http://www.nlm.nih.gov/research/umls/rxnorm |
Note: Algorithm PDFs (e.g., Table 7) may list additional LOINC codes beyond the primary ones. Include all relevant codes in modules:
- Random glucose: 2339-0, 2345-7
- HbA1c: 4548-4, 17856-6, 4549-2, 17855-8
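The URI table and the extra LOINC codes combine naturally when writing expected queries. The Python sketch below maps a module code entry to a FHIR `Coding` and builds a token search that ORs several codes. `to_coding` and `token_search` are hypothetical helpers, and the `|` separators are left unencoded for readability.

```python
# Synthea system name -> FHIR system URI, copied from the table above.
SYSTEM_URIS = {
    "SNOMED-CT": "http://snomed.info/sct",
    "ICD-10-CM": "http://hl7.org/fhir/sid/icd-10-cm",
    "ICD-9-CM": "http://hl7.org/fhir/sid/icd-9-cm",
    "LOINC": "http://loinc.org",
    "RxNorm": "http://www.nlm.nih.gov/research/umls/rxnorm",
}


def to_coding(module_code):
    """Convert a module code entry {system, code, display} to a FHIR Coding."""
    return {"system": SYSTEM_URIS[module_code["system"]],
            "code": module_code["code"],
            "display": module_code.get("display", "")}


def token_search(resource_type, system, codes):
    """Build e.g. Observation?code=uri|c1,uri|c2 (comma = OR in FHIR search)."""
    uri = SYSTEM_URIS[system]
    return f"{resource_type}?code=" + ",".join(f"{uri}|{c}" for c in codes)


hba1c_query = token_search("Observation", "LOINC", ["4548-4", "17856-6", "4549-2", "17855-8"])
```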
Example Session
User: /synthea create-module asthma
Claude: I'll create a Synthea module for the asthma phenotype.
[Reads document_analysis.json and test case]
[Extracts codes: ICD-9 493.x, SNOMED 195967001, LOINC for spirometry, RxNorm for albuterol]
[Creates module with age guard, condition onset, observations, medications]
[Writes phekb_asthma.json and phekb_asthma_control.json]
Created Synthea modules for asthma:
- synthea/modules/custom/phekb_asthma.json (positive cases)
- synthea/modules/custom/phekb_asthma_control.json (controls)
To generate test data: /synthea generate asthma
User: /synthea full heart-failure
Claude: Running full pipeline for heart-failure phenotype...
Step 1/3: Creating module...
[Creates module]
Step 2/3: Generating data...
[Runs Synthea - 20 positive, 20 control patients]
Step 3/3: Loading to FHIR server...
[Loads 40 patient bundles]
Complete! Generated and loaded 40 patients for heart-failure phenotype.
- 20 positive cases (should match phenotype query)
- 20 controls (should NOT match)