Synthea is a specialized AI agent skill designed to compare different LLMs on their ability to accurately translate natural language to FHIR queries.

How do I install synthea?

Run the command: npx killer-skills add feordin/llm-fhir-query-eval/synthea. It works with Cursor, Windsurf, VS Code, Claude Code, and 19+ other IDEs.

Which IDEs are compatible with synthea?

This skill is compatible with Cursor, Windsurf, VS Code, Trae, Claude Code, OpenClaw, Aider, Codex, OpenCode, Goose, Cline, Roo Code, Kiro, Augment Code, Continue, GitHub Copilot, Sourcegraph Cody, and Amazon Q Developer. Use the Killer-Skills CLI for universal one-command installation.

Synthea Test Data Generation Skill

Name: synthea
Author: feordin

Usage

/synthea <command> [options]

Commands

| Command | Description |
|---------|-------------|
| `create-module <phenotype>` | Create Synthea module from phenotype data |
| `generate <phenotype>` | Run Synthea to generate FHIR data |
| `load <phenotype>` | Load generated data to FHIR server |
| `full <phenotype>` | Create module + generate + load (full pipeline) |
| `status` | Show status of all phenotypes |
| `list` | List available phenotypes with modules |
| `batch <phenotypes...>` | Process multiple phenotypes |

Command Details

create-module

Creates a Synthea GMF (Generic Module Framework) module from phenotype data.

Inputs:

data/phekb-raw/<phenotype>/document_analysis.json - Extracted codes and criteria
data/phekb-raw/<phenotype>/description.txt - Phenotype description
test-cases/phekb/phekb-<phenotype>.json - Test case with required codes

Outputs:

synthea/modules/custom/phekb_<phenotype>.json - Positive case module
synthea/modules/custom/phekb_<phenotype>_control.json - Control module

generate

Runs Synthea with the custom module to generate synthetic patients.

Options:

--patients N - Number of positive cases (default: 20)
--controls N - Number of control cases (default: 20)
--seed N - Random seed for reproducibility

Outputs:

synthea/output/<phenotype>/positive/fhir/*.json - Positive patient bundles
synthea/output/<phenotype>/control/fhir/*.json - Control patient bundles

load

Loads generated FHIR bundles to the FHIR server.

Prerequisites:

FHIR server running (HAPI FHIR at http://localhost:8080/fhir or Azure FHIR at http://localhost:9080)
Generated data exists in synthea/output/<phenotype>/

Important: Infrastructure bundles (hospitals, practitioners) must load BEFORE patient bundles. The CLI fhir-eval load synthea command handles this automatically. If loading manually, load hospitalInformation*.json and practitionerInformation*.json files first.
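
The load-order rule can be sketched in Python. This is a minimal sketch, not the `fhir-eval` implementation: the function name `order_bundles` is hypothetical, and it assumes infrastructure files can be recognized purely by the `hospitalInformation` and `practitionerInformation` filename prefixes mentioned above.

```python
from pathlib import Path

# Filename prefixes of infrastructure bundles, in the order they should load.
INFRA_PREFIXES = ("hospitalInformation", "practitionerInformation")


def order_bundles(paths):
    """Return bundle paths with infrastructure files first, then patient bundles."""
    def sort_key(path):
        name = Path(path).name
        for rank, prefix in enumerate(INFRA_PREFIXES):
            if name.startswith(prefix):
                return (rank, name)
        # Patient bundles sort after every infrastructure bundle.
        return (len(INFRA_PREFIXES), name)

    return sorted(paths, key=sort_key)
```

Passing the result to whatever uploads each file keeps Practitioner and Organization references resolvable by the time patient bundles arrive.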

full

Runs the complete pipeline: create-module → generate → load

status

Shows which phenotypes have:

Document analysis data
Synthea modules created
Generated data
Data loaded to FHIR server

Instructions for Claude

When this skill is invoked, follow these instructions based on the command:

For `create-module <phenotype>`:

Read the phenotype data:

Read: data/phekb-raw/<phenotype>/document_analysis.json
Read: test-cases/phekb/phekb-<phenotype>.json (if exists)

Extract key information:
- Diagnosis codes (ICD-9, ICD-10, SNOMED-CT)
- Lab codes (LOINC) with thresholds from clinical_criteria
- Medication codes (RxNorm)
- Age requirements
- Exclusion criteria

Verify and enrich codes using the UMLS MCP server (CRITICAL):

Do NOT trust codes from document_analysis.json blindly — they may be incomplete, outdated, or wrong. Use the UMLS MCP tools to get authoritative codes:

a. For each diagnosis concept, search UMLS and get codes across systems:
```
search_umls(query="<condition name>", search_type="exact")
get_concept(cui="<CUI>")
crosswalk_codes(source="SNOMEDCT_US", code="<SNOMED>", target_source="ICD10CM")
crosswalk_codes(source="SNOMEDCT_US", code="<SNOMED>", target_source="ICD9CM")
```
b. For each lab concept, find the correct LOINC code:
```
search_umls(query="<lab name>", search_type="words")
```
Look for results with semantic type "Laboratory Procedure" or "Clinical Attribute".

c. For each medication, find the RxNorm code:
```
search_umls(query="<medication name>", search_type="exact")
crosswalk_codes(source="RXNORM", code="<CODE>", target_source="SNOMEDCT_US")
```
d. Cross-check codes from the phenotype data against UMLS:
```
get_source_concept(source="ICD10CM", id="<CODE>")
```
Verify the display name matches the intended concept. Discard obsolete codes.

e. If crosswalk returns empty (common for SNOMED→ICD-10), search UMLS for the term with search_type="words" and look for CUIs with atoms from the target system.

See /umls skill for full details on UMLS MCP usage patterns and gotchas.

Generate the Synthea module following GMF format:
- Use the existing synthea/modules/custom/phekb_type_2_diabetes.json as a template
- Include multiple code systems for each concept (SNOMED + ICD-10 + ICD-9)
- Set realistic value ranges based on clinical_criteria
- Create state machine: Initial → Age Guard → Condition Onset → Labs → Medications → Terminal

Generate the control module:
- Normal lab values (below diagnostic thresholds)
- No phenotype-specific diagnosis codes
- May include unrelated conditions for variety

Write the modules:

Write: synthea/modules/custom/phekb_<phenotype>.json
Write: synthea/modules/custom/phekb_<phenotype>_control.json

Validate JSON syntax by reading back and parsing
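
A read-back check might look like the sketch below. It goes slightly beyond pure syntax validation: the checks for an Initial state, a Terminal state, and resolvable `direct_transition` targets are assumptions about what is worth catching early, not a substitute for Synthea's own module loading. The function name `validate_module` is hypothetical.

```python
import json


def validate_module(text):
    """Parse module JSON and return a list of structural problems (empty if none)."""
    try:
        module = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg} at line {exc.lineno}"]
    if not isinstance(module, dict):
        return ["top level is not a JSON object"]

    states = module.get("states", {})
    problems = []
    types = {state.get("type") for state in states.values()}
    for required in ("Initial", "Terminal"):
        if required not in types:
            problems.append(f"missing {required} state")
    # Every direct_transition must name a state that exists in the module.
    for name, state in states.items():
        target = state.get("direct_transition")
        if target is not None and target not in states:
            problems.append(f"{name}: unknown transition target {target!r}")
    return problems
```

Other transition kinds (`conditional_transition`, `distributed_transition`) are not inspected here; extending the loop to cover them follows the same pattern.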

For `generate <phenotype>`:

Check prerequisites:
- Synthea source build exists at C:\repos\synthea (or SYNTHEA_HOME env var)
- Module exists: synthea/modules/custom/phekb_<phenotype>.json

If Synthea not found, provide install instructions:

```bash
git clone https://github.com/synthetichealth/synthea.git C:/repos/synthea
```

IMPORTANT: Environment-specific issues to handle:

Claude Code runs in a bash environment (git bash), NOT cmd.exe. This causes three issues:

Issue 1: .bat files don't run natively in bash.
- Do NOT call run_synthea.bat directly — it won't find gradlew.bat.
- Instead, call ./gradlew (the Unix wrapper) directly from the Synthea directory.

Issue 2: JAVA_HOME / Java not on PATH.
- Git bash doesn't inherit Windows system PATH fully. Java may not be found.
- Before running Synthea, set JAVA_HOME explicitly:
```bash
export JAVA_HOME="/c/Program Files/Eclipse Adoptium/jdk-17.0.18.8-hotspot"
export PATH="$JAVA_HOME/bin:$PATH"
```
- Or auto-detect: check /c/Program Files/Eclipse Adoptium/, /c/Program Files/Java/, etc.

Issue 3: Backslash paths break Gradle -Params.
- Gradle's Groovy parser interprets \ as an escape character.
- ALWAYS use forward slashes in all paths passed to ./gradlew.
- Example: C:/repos/llm-fhir-query-eval/synthea/modules/custom (NOT C:\repos\...)

Run the Python helper script (recommended):
```bash
python synthea/generate_test_data.py --phenotype <phenotype> --patients 20 --controls 20
```
The script auto-detects whether it's running in bash or cmd.exe and adjusts:
- In bash: calls ./gradlew run -Params="[...]" directly with forward-slash paths
- In cmd.exe: calls run_synthea.bat as before
- Auto-detects JAVA_HOME from common Windows install locations
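
For illustration, JAVA_HOME auto-detection could be sketched as follows. This is not the code in `generate_test_data.py`; the function name `find_java_home` and the default root list are assumptions based on the install locations named earlier. The git-bash-style `/c/...` paths only resolve from a Python running inside that environment; a native Windows Python would need `C:/Program Files/...` instead.

```python
from pathlib import Path

# Candidate parent directories (assumed; adjust for the local machine).
DEFAULT_ROOTS = ("/c/Program Files/Eclipse Adoptium", "/c/Program Files/Java")


def find_java_home(roots=DEFAULT_ROOTS):
    """Return the first directory containing bin/java or bin/java.exe, else None."""
    for root in roots:
        root_path = Path(root)
        if not root_path.is_dir():
            continue
        # Reverse lexicographic order: a rough way to prefer higher version names.
        for candidate in sorted(root_path.iterdir(), reverse=True):
            bin_dir = candidate / "bin"
            if (bin_dir / "java").is_file() or (bin_dir / "java.exe").is_file():
                return str(candidate)
    return None
```

A caller would export the returned path as JAVA_HOME and prepend its `bin` directory to PATH before invoking `./gradlew`.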

Or run Synthea directly via gradlew (if script has issues):

```bash
export JAVA_HOME="/c/Program Files/Eclipse Adoptium/jdk-17.0.18.8-hotspot"
export PATH="$JAVA_HOME/bin:$PATH"

cd C:/repos/synthea && ./gradlew run -Params="['-p','20','-m','phekb_<phenotype>','-d','C:/repos/llm-fhir-query-eval/synthea/modules/custom','--exporter.fhir.export','true','--exporter.fhir.use_us_core_ig','true','--exporter.baseDirectory','C:/repos/llm-fhir-query-eval/synthea/output/<phenotype>/positive','-s','42']"
```

Then repeat with phekb_<phenotype>_control module and control output subdirectory.
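
The `-Params` value is easy to get wrong by hand, so here is a hedged sketch of building it programmatically. The helper names `to_forward_slashes` and `build_params` are hypothetical; the flags are the ones used in the command above, and the backslash replacement applies the forward-slash rule from Issue 3.

```python
def to_forward_slashes(path):
    """Replace backslashes so Gradle's Groovy parser sees no escape characters."""
    return path.replace("\\", "/")


def build_params(module, modules_dir, output_dir, patients=20, seed=None):
    """Return the value for ./gradlew run -Params=... as a single string."""
    args = [
        "-p", str(patients),
        "-m", module,
        "-d", to_forward_slashes(modules_dir),
        "--exporter.fhir.export", "true",
        "--exporter.fhir.use_us_core_ig", "true",
        "--exporter.baseDirectory", to_forward_slashes(output_dir),
    ]
    if seed is not None:
        args += ["-s", str(seed)]
    # Groovy list literal: each argument single-quoted, comma-separated.
    return "[" + ",".join(f"'{arg}'" for arg in args) + "]"
```

Calling it once with the positive module and once with the control module (and the matching output subdirectory) covers both runs.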

Report results: Count generated files and summarize

For `load <phenotype>`:

Check FHIR server is running:

```bash
curl -s http://localhost:8080/fhir/metadata | head -5
```

Load positive cases:

```bash
for f in synthea/output/<phenotype>/positive/fhir/*.json; do
  curl -X POST http://localhost:8080/fhir \
    -H "Content-Type: application/fhir+json" \
    -d @"$f"
done
```

Load control cases:

```bash
for f in synthea/output/<phenotype>/control/fhir/*.json; do
  curl -X POST http://localhost:8080/fhir \
    -H "Content-Type: application/fhir+json" \
    -d @"$f"
done
```

Verify loaded data by querying the server
Update test case with expected resource IDs (optional)

For `full <phenotype>`:

Execute in sequence:

create-module <phenotype>
generate <phenotype>
load <phenotype>

Report overall success/failure.

For `status`:

List all phenotypes from data/phekb-raw/*/
For each, check:
- Has document_analysis.json?
- Has module in synthea/modules/custom/?
- Has generated data in synthea/output/?
- Count positive/control patients

Display summary table:

Phenotype            | Analysis | Module | Data (pos/ctrl) | Loaded
---------------------|----------|--------|-----------------|--------
type-2-diabetes      | ✓        | ✓      | 20/20           | ✓
asthma               | ✓        | ✗      | -               | -
heart-failure        | ✓        | ✗      | -               | -
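
The file-based checks behind this table could be gathered with a sketch like the one below. Two things are assumptions: module filenames replace hyphens with underscores (inferred from `type-2-diabetes` versus `phekb_type_2_diabetes.json`), and patient counts exclude the infrastructure bundles Synthea writes alongside patient files. The "Loaded" column needs a server query and is not covered. `phenotype_status` is a hypothetical name.

```python
from pathlib import Path

INFRA_PREFIXES = ("hospitalInformation", "practitionerInformation")


def phenotype_status(repo_root, phenotype):
    """Collect the file-based status checks for one phenotype."""
    root = Path(repo_root)
    # Assumed naming rule: hyphens in the phenotype id become underscores.
    module_name = "phekb_" + phenotype.replace("-", "_") + ".json"
    output = root / "synthea" / "output" / phenotype

    def count(group):
        # Count patient bundles only; skip hospital/practitioner files.
        files = (output / group / "fhir").glob("*.json")
        return sum(1 for f in files if not f.name.startswith(INFRA_PREFIXES))

    analysis = root / "data" / "phekb-raw" / phenotype / "document_analysis.json"
    return {
        "analysis": analysis.is_file(),
        "module": (root / "synthea" / "modules" / "custom" / module_name).is_file(),
        "positive": count("positive"),
        "control": count("control"),
    }
```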

For `list`:

List phenotypes that have Synthea modules ready:

```bash
ls synthea/modules/custom/phekb_*.json | grep -v _control
```

For `batch <phenotypes...>`:

Parse the phenotype list (comma or space separated)
For each phenotype, run full <phenotype>
Track successes and failures
Report summary at end
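
A minimal sketch of these batch steps follows. The names `parse_phenotypes` and `run_batch` are hypothetical, `run_full` stands in for whatever executes the full pipeline, and continuing after a failure is an assumption (the steps above only say to track successes and failures).

```python
import re


def parse_phenotypes(raw):
    """Split a comma- or space-separated phenotype list, dropping empties."""
    return [name for name in re.split(r"[,\s]+", raw.strip()) if name]


def run_batch(raw, run_full):
    """Call run_full(phenotype) for each phenotype; return (succeeded, failed)."""
    succeeded, failed = [], {}
    for phenotype in parse_phenotypes(raw):
        try:
            run_full(phenotype)
            succeeded.append(phenotype)
        except Exception as exc:
            # Keep going so one failure does not stop the rest of the batch.
            failed[phenotype] = str(exc)
    return succeeded, failed
```

The returned pair maps directly onto the end-of-run summary: a list of phenotypes that completed and a dictionary of failures with their error messages.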

Multi-Path Phenotype Modules

Key Learning: Phenotype Algorithms Have Multiple Paths

PheKB phenotype algorithms are multi-path decision trees, NOT simple code lookups. A single phenotype may identify patients through DIFFERENT combinations of clinical data:

1. Diagnosis codes only
2. Diagnosis codes + medications
3. Diagnosis codes + abnormal labs
4. Medications + abnormal labs (NO diagnosis code)
5. Complex temporal ordering rules

Generating Path-Specific Patients

When creating Synthea modules for phenotypes with multiple identification paths, generate DISTINCT patient groups:

Analyze the algorithm document (usually a PDF in data/phekb-raw/<phenotype>/) to identify all paths
Create separate state branches in the Synthea module for each path type
Path 4-type patients (no diagnosis code): These patients should have MedicationRequest + Observation resources but NO Condition resource. This requires a separate branch in the module that:
- Skips the ConditionOnset state
- Still prescribes medications (MedicationOrder)
- Still records abnormal lab values (Observation)
Use distributed_transition to control the mix of patient types

Module Structure for Multi-Path Phenotypes

Initial → Age_Guard → Set_Diabetes_Flag → Wellness_Encounter → Path_Router
                                                                    ├→ Path_With_Diagnosis (70%)
                                                                    │     ├→ Diagnose_Condition
                                                                    │     ├→ Record_Labs
                                                                    │     ├→ Prescribe_Meds
                                                                    │     └→ End_Encounter
                                                                    └→ Path_No_Diagnosis (30%)
                                                                          ├→ Record_Abnormal_Labs (NO ConditionOnset!)
                                                                          ├→ Prescribe_Meds
                                                                          └→ End_Encounter
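
The routing state in this diagram can be generated rather than hand-written. The sketch below builds a GMF `Simple` state with a `distributed_transition`; the helper name `path_router` is hypothetical, and the check that weights sum to 1.0 is a guard added for this sketch rather than a documented Synthea requirement.

```python
import json


def path_router(branches):
    """Build a Simple state whose distributed_transition splits patients by weight."""
    total = sum(branches.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"branch weights must sum to 1.0, got {total}")
    return {
        "type": "Simple",
        "distributed_transition": [
            {"transition": target, "distribution": weight}
            for target, weight in branches.items()
        ],
    }


# The 70/30 split from the diagram above.
router = path_router({"Path_With_Diagnosis": 0.7, "Path_No_Diagnosis": 0.3})
print(json.dumps({"Path_Router": router}, indent=2))
```

The printed object can be merged into the module's `states` map under the `Path_Router` key.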

Lab Value Thresholds

When generating observation values, use thresholds from the phenotype algorithm document:

Case thresholds: Values that qualify as "abnormal" for case identification
Control thresholds: Values that must be normal for control patients (often stricter)

Example from T2DM:

| Lab | Case Threshold | Control Exclusion |
|-----|----------------|-------------------|
| HbA1c | >= 6.5% | >= 6.0% |
| Fasting glucose | >= 125 mg/dL | >= 110 mg/dL |
| Random glucose | > 200 mg/dL | > 110 mg/dL |
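
To make the two threshold sets concrete, the sketch below transcribes the T2DM table into Python and classifies a single lab value. The dictionary keys and the function name `classify` are hypothetical; the numbers and comparison operators come from the table above.

```python
import operator

# (comparison, limit) pairs transcribed from the T2DM table.
CASE_THRESHOLDS = {
    "hba1c": (operator.ge, 6.5),
    "fasting_glucose": (operator.ge, 125),
    "random_glucose": (operator.gt, 200),
}
CONTROL_EXCLUSIONS = {
    "hba1c": (operator.ge, 6.0),
    "fasting_glucose": (operator.ge, 110),
    "random_glucose": (operator.gt, 110),
}


def classify(lab, value):
    """Return 'case', 'excluded_from_controls', or 'control_ok' for one lab value."""
    compare, limit = CASE_THRESHOLDS[lab]
    if compare(value, limit):
        return "case"
    compare, limit = CONTROL_EXCLUSIONS[lab]
    if compare(value, limit):
        return "excluded_from_controls"
    return "control_ok"
```

Values in the middle band (for example an HbA1c of 6.2%) qualify for neither group, so control modules should draw their ranges strictly below the control exclusion limits.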

Dual Data Sets: Generic vs US Core

For each phenotype, plan to generate TWO Synthea module variants:

| Variant | Module Suffix | Condition Codes | Meds | Categories | When to Use |
|---------|---------------|-----------------|------|------------|-------------|
| Generic | `phekb_<name>.json` | SNOMED only | RxNorm SCD | Base FHIR | Tier 1 eval, basic testing |
| US Core | `phekb_<name>_uscore.json` | SNOMED + ICD-10-CM | RxNorm SCD | US Core categories | Tier 3 eval, profile-aware testing |

The US Core variant adds:

ICD-10-CM codes alongside SNOMED on ConditionOnset (US Core allows both)
Condition.category = problem-list-item (US Core requires this)
Observation.category = laboratory with proper US Core category coding
MedicationRequest.intent = order and .status = active (US Core requires these)
Note: US Core 8 removed ICD-9-CM from the condition valueset — don't include ICD-9 in US Core variant

Output directories:

synthea/output/<phenotype>/
├── generic/
│   ├── positive/fhir/
│   └── control/fhir/
└── uscore/
    ├── positive/fhir/
    └── control/fhir/

Medication Codes: Ingredient vs SCD

Synthea's FHIR exporter works best with SCD-level (Semantic Clinical Drug) RxNorm codes, not ingredient-level codes. Always:

Check the algorithm doc for ingredient-level codes
Use /umls to find the corresponding SCD codes
Use Synthea's built-in modules as reference for which SCD codes to use

IMPORTANT: Test case expected queries and Synthea modules need DIFFERENT code levels:

Synthea modules: Use SCD codes (e.g., 860975 for "metformin 500 MG ER Tablet") because Synthea generates FHIR MedicationRequest resources with specific drug forms
Test case expected queries: May use EITHER ingredient OR SCD codes depending on what the FHIR server indexes. HAPI FHIR does NOT automatically resolve ingredient→SCD relationships, so queries must match the exact codes in the data
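
Because matching is exact, it helps to compare the codes a query uses against the codes actually present in the generated data. The sketch below is one way to do that, assuming R4-style bundles in which `MedicationRequest.medicationCodeableConcept.coding` carries the RxNorm code; the function names are hypothetical.

```python
RXNORM = "http://www.nlm.nih.gov/research/umls/rxnorm"


def medication_codes(bundle):
    """Collect RxNorm codes from MedicationRequest resources in a FHIR Bundle dict."""
    codes = set()
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        if resource.get("resourceType") != "MedicationRequest":
            continue
        concept = resource.get("medicationCodeableConcept", {})
        for coding in concept.get("coding", []):
            if coding.get("system") == RXNORM and "code" in coding:
                codes.add(coding["code"])
    return codes


def unmatched_codes(bundle, query_codes):
    """Codes present in the data that an exact-match query would not find."""
    return medication_codes(bundle) - set(query_codes)
```

A non-empty result from `unmatched_codes` means the expected query needs those SCD codes added, since the server will not bridge from an ingredient code on its own.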

Synthea GMF Critical Patterns (Lessons Learned)

These are hard-won lessons from debugging Synthea module generation:

"wellness": true on Encounter states is REQUIRED. Without it, the module's ConditionOnset/Observation/MedicationOrder states will process but produce ZERO FHIR resources. Synthea only writes resources to output when they occur inside a lifecycle-managed encounter.
ConditionOnset MUST be inside an encounter. Place it AFTER the Encounter state and BEFORE the EncounterEnd state. The old pattern of using target_encounter pointing to a future encounter state does NOT work reliably for custom modules.
Use SetAttribute for disease flags, conditional_transition for branching. Match the pattern from Synthea's built-in metabolic_syndrome_disease.json + metabolic_syndrome_care.json. Disease modules set attributes; care/encounter modules check attributes and create resources.
MedicationOrder reason field must reference an attribute name, not a state name. Use "reason": "t2dm_condition" where t2dm_condition was set via assign_to_attribute on a ConditionOnset. For Path 4 patients (no condition), omit the reason field.
Infrastructure bundles must load first on HAPI FHIR. Synthea generates hospitalInformation*.json and practitionerInformation*.json files. These must be loaded before patient bundles, or HAPI returns 404 errors for Practitioner references.
Patient count vs module filter. The -m flag selects which modules Synthea runs; it does not filter patients (the separate -k flag keeps only patients that match a keep-module). Combined with -p N, Synthea therefore exports roughly N patients whether or not they progressed through the module. If the module has an Age_Guard, patients who are too young are still exported but lack the phenotype's clinical data.
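
Two of these patterns are mechanical enough to lint for. The sketch below flags Encounter states without `"wellness": true` and ConditionOnset states that still rely on `target_encounter`; both function names are hypothetical, and the checks encode only the rules stated above, not Synthea's own validation.

```python
def encounters_missing_wellness(module):
    """Return names of Encounter states that do not set "wellness": true."""
    return sorted(
        name
        for name, state in module.get("states", {}).items()
        if state.get("type") == "Encounter" and state.get("wellness") is not True
    )


def onsets_using_target_encounter(module):
    """Return names of ConditionOnset states that use target_encounter."""
    return sorted(
        name
        for name, state in module.get("states", {}).items()
        if state.get("type") == "ConditionOnset" and "target_encounter" in state
    )
```

Running both over a parsed module before generation catches the "zero resources" failure mode without a full Synthea run.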

FHIR Server Compatibility Notes

| Server | Healthcheck | Data Persistence | Bundle Load Order | `_has` Support | Notes |
|--------|-------------|------------------|-------------------|----------------|-------|
| HAPI FHIR | curl works | Stable in-memory | Infra files first | Yes | Recommended for dev |
| fhir-candle | No curl/wget | Unstable (periodic resets) | Any order | Limited | NOT recommended |
| Azure FHIR | TBD | SQL-backed | Infra files first | Yes | Requires SQL Server |

Synthea Module Template

When creating modules, use this structure:

```json
{
  "name": "PheKB <Phenotype Name>",
  "remarks": [
    "Auto-generated from PheKB phenotype: <phenotype-id>",
    "Clinical criteria: ...",
    "..."
  ],
  "states": {
    "Initial": {
      "type": "Initial",
      "direct_transition": "Age_Guard"
    },
    "Age_Guard": {
      "type": "Guard",
      "allow": { "condition_type": "Age", "operator": ">=", "quantity": 18, "unit": "years" },
      "direct_transition": "..."
    },
    "Condition_Onset": {
      "type": "ConditionOnset",
      "codes": [
        { "system": "SNOMED-CT", "code": "...", "display": "..." },
        { "system": "ICD-10-CM", "code": "...", "display": "..." }
      ],
      "direct_transition": "..."
    },
    "Lab_Observation": {
      "type": "Observation",
      "category": "laboratory",
      "codes": [{ "system": "LOINC", "code": "...", "display": "..." }],
      "unit": "...",
      "range": { "low": ..., "high": ... },
      "direct_transition": "..."
    },
    "Medication_Order": {
      "type": "MedicationOrder",
      "codes": [{ "system": "RxNorm", "code": "...", "display": "..." }],
      "direct_transition": "..."
    },
    "Terminal": {
      "type": "Terminal"
    }
  },
  "gmf_version": 2
}
```

Code System URIs

| System | Synthea Name | FHIR URI |
|--------|--------------|----------|
| SNOMED CT | `SNOMED-CT` | `http://snomed.info/sct` |
| ICD-10-CM | `ICD-10-CM` | `http://hl7.org/fhir/sid/icd-10-cm` |
| ICD-9-CM | `ICD-9-CM` | `http://hl7.org/fhir/sid/icd-9-cm` |
| LOINC | `LOINC` | `http://loinc.org` |
| RxNorm | `RxNorm` | `http://www.nlm.nih.gov/research/umls/rxnorm` |

Note: Algorithm PDFs (e.g., Table 7) may list additional LOINC codes beyond the primary ones. Include all relevant codes in modules:

Random glucose: 2339-0, 2345-7
HbA1c: 4548-4, 17856-6, 4549-2, 17855-8
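
When several codes identify the same lab, a FHIR token search can OR them together with commas. The sketch below maps Synthea system names to FHIR URIs using the table above and builds such a search URL; the function name `code_search_url` is hypothetical, and the base URL in the usage example is the local HAPI address used elsewhere in this document.

```python
from urllib.parse import urlencode

# Synthea system name -> FHIR system URI, from the table above.
FHIR_SYSTEM_URIS = {
    "SNOMED-CT": "http://snomed.info/sct",
    "ICD-10-CM": "http://hl7.org/fhir/sid/icd-10-cm",
    "ICD-9-CM": "http://hl7.org/fhir/sid/icd-9-cm",
    "LOINC": "http://loinc.org",
    "RxNorm": "http://www.nlm.nih.gov/research/umls/rxnorm",
}


def code_search_url(base, resource_type, synthea_system, codes):
    """Build a FHIR search URL matching any of the codes (comma means OR)."""
    system = FHIR_SYSTEM_URIS[synthea_system]
    tokens = ",".join(f"{system}|{code}" for code in codes)
    # urlencode percent-encodes the URI, pipe, and commas in the value.
    return f"{base}/{resource_type}?{urlencode({'code': tokens})}"


hba1c_url = code_search_url(
    "http://localhost:8080/fhir", "Observation", "LOINC",
    ["4548-4", "17856-6", "4549-2", "17855-8"],
)
```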

Example Session

User: /synthea create-module asthma

Claude: I'll create a Synthea module for the asthma phenotype.

[Reads document_analysis.json and test case]
[Extracts codes: ICD-9 493.x, SNOMED 195967001, LOINC for spirometry, RxNorm for albuterol]
[Creates module with age guard, condition onset, observations, medications]
[Writes phekb_asthma.json and phekb_asthma_control.json]

Created Synthea modules for asthma:
- synthea/modules/custom/phekb_asthma.json (positive cases)
- synthea/modules/custom/phekb_asthma_control.json (controls)

To generate test data: /synthea generate asthma

User: /synthea full heart-failure

Claude: Running full pipeline for heart-failure phenotype...

Step 1/3: Creating module...
[Creates module]

Step 2/3: Generating data...
[Runs Synthea - 20 positive, 20 control patients]

Step 3/3: Loading to FHIR server...
[Loads 40 patient bundles]

Complete! Generated and loaded 40 patients for heart-failure phenotype.
- 20 positive cases (should match phenotype query)
- 20 controls (should NOT match)
