investigate-stuck-messages — community skill from hyperlane-monorepo, for Claude Code, Cursor, and Windsurf

v1.0.0
GitHub

About this Skill

For relayer API agents that need stuck-message analysis and retry-count investigation. Part of hyperlane-monorepo, the home for Hyperlane core contracts, SDK packages, and other infrastructure.

hyperlane-xyz

Updated: 3/5/2026

Agent Capability Analysis

The investigate-stuck-messages skill by hyperlane-xyz is an open-source community AI agent skill for Claude Code and other IDE workflows, helping agents execute tasks with better context, repeatability, and domain-specific guidance.

Ideal Agent Persona

For relayer API agents that need advanced stuck-message analysis and retry-count investigation.

Core Value

Empowers agents to query the relayer API, investigate stuck messages, and analyze retry counts and error reasons using Hyperlane core contracts and SDK packages, enabling efficient debugging of relayer queue issues for specific app contexts.

Capabilities Granted for investigate-stuck-messages

Debugging stuck messages in prepare queue
Analyzing retry counts for specific app contexts
Investigating error reasons for high retry counts on specific chains

Prerequisites & Limits

  • Requires Relayer API access
  • Limited to Hyperlane core contracts and SDK packages
  • Alert-based triggers required for automated investigation

SKILL.md

Investigate Stuck Messages

Query the relayer API to investigate stuck messages, their retry counts, and error reasons.

When to Use

  1. Alert-based triggers:

    • Alert: "Known app context relayer queue length > 0 for 40m"
    • Any alert mentioning stuck messages in prepare queue
    • High retry counts for specific app contexts
  2. User request triggers:

    • "Why are messages stuck for [app_context]?"
    • "Investigate stuck messages on [chain]"
    • "What's causing the queue alert?"
    • Pasting a Grafana alert URL

Input Parameters

Option 1: Grafana Alert URL (recommended)

/investigate-stuck-messages https://abacusworks.grafana.net/alerting/grafana/cdg1ro5hi4vswb/view?tab=instances

Option 2: Manual specification

/investigate-stuck-messages app_context=EZETH/renzo-prod remote=linea
| Parameter | Required | Default | Description |
|-------------|----------|----------|-------------|
| alert_url | No | - | Grafana alert URL (extracts app_context/remote from firing instances) |
| app_context | No* | - | The app context (e.g., EZETH/renzo-prod, oUSDT/production) |
| remote | No* | - | Destination chain name (e.g., linea, ethereum, arbitrum) |
| environment | No | mainnet3 | Deployment environment |

*Either alert_url OR both app_context and remote must be provided.

Workflow

Step 1: Parse Input and Extract Alert Instances

If Grafana alert URL provided:

  1. Extract the alert UID from the URL (e.g., cdg1ro5hi4vswb from .../alerting/grafana/cdg1ro5hi4vswb/view)

  2. Query Prometheus directly for firing instances using mcp__grafana__query_prometheus:

    sum by (app_context, remote)(
        max_over_time(
            hyperlane_submitter_queue_length{
                queue_name="prepare_queue",
                app_context!~"Unknown|merkly_eth|merkly_erc20|helloworld|velo_message_module",
                hyperlane_context!~"rc|vanguard0|vanguard1|vanguard2|vanguard3|vanguard4|vanguard5",
                operation_status!~"Retry\\(ApplicationReport\\(.*\\)\\)|FirstPrepareAttempt",
                hyperlane_deployment="mainnet3",
            }[2m]
        )
    ) > 0
    
  3. Extract app_context and remote labels from each result.
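
When the firing instances are saved to a JSON file, the label extraction can be sketched as a small helper. The `.data.result[].metric` path is the standard Prometheus HTTP API response shape; the file-based workflow and the `firing_instances` name are illustrative, not part of the skill.

```bash
# Hypothetical helper: print "app_context remote" pairs from a saved
# Prometheus instant-query response. The .data.result[].metric path is
# the standard Prometheus HTTP API response layout.
firing_instances() {
  jq -r '.data.result[].metric | "\(.app_context) \(.remote)"' "$1"
}
```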

If manual app_context/remote provided:

Use the provided values directly.

Step 2: Setup Port-Forward to Relayer

Check if port 9090 is already in use:

```bash
lsof -i :9090
```

If not in use, start port-forward in background:

```bash
kubectl port-forward omniscient-relayer-hyperlane-agent-relayer-0 9090 -n mainnet3 &
```

Wait a few seconds for the port-forward to establish.

Step 3: Get Domain IDs for Chains

Look up domain IDs from the registry:

```bash
cat node_modules/.pnpm/@hyperlane-xyz+registry@*/node_modules/@hyperlane-xyz/registry/dist/chains/<chain>/metadata.json | jq '.domainId'
```

Common domain IDs:

  • ethereum: 1
  • optimism: 10
  • arbitrum: 42161
  • polygon: 137
  • base: 8453
  • unichain: 130
  • avalanche: 43114
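
The common IDs above can be wrapped in a quick lookup that falls back to the registry file for other chains. This is a sketch; the `domain_id` helper is not part of the skill, and the hardcoded values simply mirror the list above.

```bash
# Hypothetical lookup: hardcodes the common domain IDs listed above and
# falls back to the registry metadata file for any other chain name.
domain_id() {
  case "$1" in
    ethereum)  echo 1 ;;
    optimism)  echo 10 ;;
    arbitrum)  echo 42161 ;;
    polygon)   echo 137 ;;
    base)      echo 8453 ;;
    unichain)  echo 130 ;;
    avalanche) echo 43114 ;;
    *) jq '.domainId' node_modules/.pnpm/@hyperlane-xyz+registry@*/node_modules/@hyperlane-xyz/registry/dist/chains/"$1"/metadata.json ;;
  esac
}
```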

Step 4: Query Relayer API

For each destination chain, query the relayer API:

```bash
curl -s 'http://localhost:9090/list_operations?destination_domain=<DOMAIN_ID>' > /tmp/<chain>.json
```

The response contains operations with:

  • id: Message ID (H256)
  • operation.message.sender: Sender address
  • operation.message.recipient: Recipient address
  • operation.num_retries: Number of retries (higher = more stuck)
  • operation.status: Error status (e.g., {"Retry": "ErrorEstimatingGas"})
  • operation.message.origin: Origin domain ID
  • operation.message.destination: Destination domain ID
  • operation.app_context: App context name
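
To spot the most-stuck messages first, the saved response can be ranked by retry count. This sketch assumes the top-level JSON is an array of `{id, operation}` objects matching the field list above; the `rank_by_retries` name is illustrative.

```bash
# Hypothetical helper: rank operations in a saved list_operations response
# by retry count, highest first. Assumes a top-level array of
# {id, operation: {...}} objects, per the field list above.
rank_by_retries() {
  jq -r '.[] | "\(.operation.num_retries)\t\(.id)\t\(.operation.status|tostring)"' "$1" \
    | sort -rn
}
```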

Step 5: Filter Messages by App Context

Look up the app_context in rust/main/app-contexts/mainnet_config.json:

```bash
jq '.metricAppContexts[] | select(.name == "<APP_CONTEXT>")' rust/main/app-contexts/mainnet_config.json
```

Filter API results to only include messages where:

  • operation.message.recipient matches one of the recipientAddress values for that destination domain

Important: Addresses are padded to 32 bytes (H256 format).
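
When comparing a 20-byte recipientAddress against the API's H256 form, the padding can be done in plain shell. A minimal sketch (the `pad_to_h256` name is illustrative, not part of the skill):

```bash
# Hypothetical helper: left-pad a 20-byte EVM address to the 32-byte H256
# form used by operation.message.recipient (24 zero nibbles + 40 hex chars),
# lowercased so comparisons are case-insensitive.
pad_to_h256() {
  local addr=${1#0x}
  printf '0x%024d%s\n' 0 "$(printf '%s' "$addr" | tr '[:upper:]' '[:lower:]')"
}
```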

Step 6: Query GCP Logs for Actual Errors

Calculate log freshness based on retry count:

The relayer uses exponential backoff (see calculate_msg_backoff in rust/main/agents/relayer/src/msg/pending_message.rs):

| Retries | Backoff/retry | Cumulative Time | Freshness Flag |
|---------|---------------|-----------------|-----------------|
| 1-4 | 5s-1min | ~2min | --freshness=1h |
| 5-24 | 3min | ~1h | --freshness=3h |
| 25-39 | 5-26min | ~5h | --freshness=12h |
| 40-49 | 30min-1h | ~12h | --freshness=24h |
| 50-60 | 2-22h | ~35h | --freshness=3d |
| 60+ | 22h+ | 35h+ | --freshness=7d |
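
The retry-to-freshness mapping can be expressed as a small helper, with thresholds copied from the table above (the `freshness_for_retries` name is illustrative):

```bash
# Hypothetical helper: choose the gcloud --freshness flag from a retry
# count, using the thresholds in the table above.
freshness_for_retries() {
  local n=$1
  if   [ "$n" -le 4 ];  then echo "--freshness=1h"
  elif [ "$n" -le 24 ]; then echo "--freshness=3h"
  elif [ "$n" -le 39 ]; then echo "--freshness=12h"
  elif [ "$n" -le 49 ]; then echo "--freshness=24h"
  elif [ "$n" -le 60 ]; then echo "--freshness=3d"
  else                       echo "--freshness=7d"
  fi
}
```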

For each message ID, query GCP logs with calculated freshness:

```bash
gcloud logging read 'resource.type=k8s_container AND resource.labels.namespace_name=mainnet3 AND resource.labels.pod_name:omniscient-relayer AND jsonPayload.span.id:<MESSAGE_ID> AND jsonPayload.fields.error:*' --project=abacus-labs-dev --limit=1 --format='value(jsonPayload.fields.error)' --freshness=<CALCULATED_FRESHNESS>
```

Extract the human-readable error from the response using sed (macOS compatible):

```bash
echo "$raw_error" | sed -n 's/.*execution reverted: \([^"]*\)".*/\1/p' | head -1
```

Common error patterns:

  • "execution reverted: Nonce already used" → "Nonce already used"
  • "execution reverted: panic: arithmetic underflow" → "Arithmetic underflow"

Note: Do not use grep -P as it's not available on macOS.
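
The extraction step can be wrapped as a reusable function around the same sed expression (BSD sed compatible, no grep -P; the `extract_revert_reason` name is illustrative):

```bash
# Hypothetical wrapper around the sed expression above: pull the
# human-readable revert reason out of a raw log error string.
extract_revert_reason() {
  printf '%s\n' "$1" | sed -n 's/.*execution reverted: \([^"]*\)".*/\1/p' | head -1
}
```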

Step 7: Present Investigation Results

Output a detailed summary table with full message IDs and both error sources:

## Investigation Results for [APP_CONTEXT]

### Summary
- Total stuck messages: X
- Destinations affected: [list]
- Reprepare reasons: ErrorEstimatingGas (N), CouldNotFetchMetadata (M)

### Messages

| Message ID | Retries | Reprepare Reason | Error | Origin |
|------------|---------|------------------|-----------|--------|
| `0xaa18ebc1c79345e6d24984a0b9a5ab66c968d128d46b2357b641e56e71b8d30c` | 47 | ErrorEstimatingGas | Nonce already used | optimism |
| `0xd6aeef7c092a88aa23ad53227aeb834ae731d059b3ce749db8451e761f3f15ac` | 47 | ErrorEstimatingGas | Nonce already used | arbitrum |

**Important**: Always show the full 66-character message ID (0x + 64 hex chars). Do not truncate.

### Error Analysis
[Explain based on the actual log errors found]

### Next Steps
To denylist these messages, run:
/denylist-stuck-messages <message_ids> app_context=APP_CONTEXT

Column definitions:

  • Reprepare Reason: From operation.status in relayer API (e.g., ErrorEstimatingGas, CouldNotFetchMetadata)
  • Error: Actual revert reason from GCP logs (e.g., "Nonce already used", "Arithmetic underflow")

Step 8: Output Denylist Command

At the end of the investigation results, output the full denylist command:

### Next Steps
To denylist, run:
/denylist-stuck-messages 0xaa18ebc1c79345e6d24984a0b9a5ab66c968d128d46b2357b641e56e71b8d30c 0xd6aeef7c092a88aa23ad53227aeb834ae731d059b3ce749db8451e761f3f15ac app_context=APP_CONTEXT

Always use full message IDs, never truncated.

Error Status Reference

| Status | Meaning | Action |
|--------|---------|--------|
| ErrorEstimatingGas | Gas estimation failed (contract revert) | Usually denylist - contract won't accept |
| CouldNotFetchMetadata | Can't get ISM metadata | Check validators, may resolve itself |
| ApplicationReport(...) | App-specific error | Check the specific error message |
| GasPaymentNotFound | No IGP payment | May need manual relay with gas |

Error Handling

  • Port-forward fails: Check kubectl context: kubectl config current-context
  • No messages found: Queue may have cleared; alert may be stale
  • API returns error: Check relayer pod: kubectl get pods -n mainnet3 | grep relayer
  • App context not found: May be new/custom; ask user for sender/recipient addresses

Prerequisites

  • kubectl configured with access to mainnet cluster
  • Grafana MCP server connected (for alert URL parsing)


How To Install

  1. Open your terminal

    Open the terminal or command line in your project directory.

  2. Run the install command

    Run: npx killer-skills add hyperlane-xyz/hyperlane-monorepo/investigate-stuck-messages. The CLI will automatically detect your IDE or AI agent and configure the skill.

  3. Start using the skill

    The skill is now active. Your AI agent can use investigate-stuck-messages immediately in the current project.
