investigate-stuck-messages — community skill from hyperlane-monorepo, for Claude Code, Cursor, and Windsurf

v1.0.0
GitHub

About this Skill

For relayer API agents that need stuck-message analysis and retry-count investigation. Part of hyperlane-monorepo, the home for Hyperlane core contracts, SDK packages, and other infrastructure.

hyperlane-xyz

Updated: 3/5/2026

Agent Capability Analysis

The investigate-stuck-messages skill by hyperlane-xyz is an open-source community AI agent skill for Claude Code and other IDE workflows, helping agents execute tasks with better context, repeatability, and domain-specific guidance.

Ideal Agent Persona

For relayer API agents that need advanced stuck-message analysis and retry-count investigation.

Core Value

Empowers agents to query the relayer API, investigate stuck messages, and analyze retry counts and error reasons using Hyperlane core contracts and SDK packages, enabling efficient debugging of relayer queue issues for specific app contexts.

Capabilities Granted for investigate-stuck-messages

Debugging stuck messages in prepare queue
Analyzing retry counts for specific app contexts
Investigating error reasons for high retry counts on specific chains

Prerequisites & Limits

  • Requires Relayer API access
  • Limited to Hyperlane core contracts and SDK packages
  • Alert-based triggers required for automated investigation

SKILL.md

Investigate Stuck Messages

Query the relayer API to investigate stuck messages, their retry counts, and error reasons.

When to Use

  1. Alert-based triggers:

    • Alert: "Known app context relayer queue length > 0 for 40m"
    • Any alert mentioning stuck messages in prepare queue
    • High retry counts for specific app contexts
  2. User request triggers:

    • "Why are messages stuck for [app_context]?"
    • "Investigate stuck messages on [chain]"
    • "What's causing the queue alert?"
    • Pasting a Grafana alert URL

Input Parameters

Option 1: Grafana Alert URL (recommended)

/investigate-stuck-messages https://abacusworks.grafana.net/alerting/grafana/cdg1ro5hi4vswb/view?tab=instances

Option 2: Manual specification

/investigate-stuck-messages app_context=EZETH/renzo-prod remote=linea
| Parameter | Required | Default | Description |
|-------------|----------|----------|-------------|
| alert_url | No | - | Grafana alert URL (extracts app_context/remote from firing instances) |
| app_context | No* | - | The app context (e.g., EZETH/renzo-prod, oUSDT/production) |
| remote | No* | - | Destination chain name (e.g., linea, ethereum, arbitrum) |
| environment | No | mainnet3 | Deployment environment |

*Either alert_url OR both app_context and remote must be provided.

Workflow

Step 1: Parse Input and Extract Alert Instances

If Grafana alert URL provided:

  1. Extract the alert UID from the URL (e.g., cdg1ro5hi4vswb from .../alerting/grafana/cdg1ro5hi4vswb/view)

  2. Query Prometheus directly for firing instances using mcp__grafana__query_prometheus:

    sum by (app_context, remote)(
        max_over_time(
            hyperlane_submitter_queue_length{
                queue_name="prepare_queue",
                app_context!~"Unknown|merkly_eth|merkly_erc20|helloworld|velo_message_module",
                hyperlane_context!~"rc|vanguard0|vanguard1|vanguard2|vanguard3|vanguard4|vanguard5",
                operation_status!~"Retry\\(ApplicationReport\\(.*\\)\\)|FirstPrepareAttempt",
                hyperlane_deployment="mainnet3",
            }[2m]
        )
    ) > 0
    
  3. Extract app_context and remote labels from each result.
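
When the firing instances are saved to a JSON file, the label extraction can be sketched as a small helper. The `.data.result[].metric` path is the standard Prometheus HTTP API response shape; the file-based workflow and the `firing_instances` name are illustrative, not part of the skill.

```bash
# Hypothetical helper: print "app_context remote" pairs from a saved
# Prometheus instant-query response. The .data.result[].metric path is
# the standard Prometheus HTTP API response layout.
firing_instances() {
  jq -r '.data.result[].metric | "\(.app_context) \(.remote)"' "$1"
}
```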

If manual app_context/remote provided:

Use the provided values directly.

Step 2: Setup Port-Forward to Relayer

Check if port 9090 is already in use:

```bash
lsof -i :9090
```

If not in use, start port-forward in background:

```bash
kubectl port-forward omniscient-relayer-hyperlane-agent-relayer-0 9090 -n mainnet3 &
```

Wait a few seconds for the port-forward to establish.

Step 3: Get Domain IDs for Chains

Look up domain IDs from the registry:

```bash
cat node_modules/.pnpm/@hyperlane-xyz+registry@*/node_modules/@hyperlane-xyz/registry/dist/chains/<chain>/metadata.json | jq '.domainId'
```

Common domain IDs:

  • ethereum: 1
  • optimism: 10
  • arbitrum: 42161
  • polygon: 137
  • base: 8453
  • unichain: 130
  • avalanche: 43114
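
The common IDs above can be wrapped in a quick lookup that falls back to the registry file for other chains. This is a sketch; the `domain_id` helper is not part of the skill, and the hardcoded values simply mirror the list above.

```bash
# Hypothetical lookup: hardcodes the common domain IDs listed above and
# falls back to the registry metadata file for any other chain name.
domain_id() {
  case "$1" in
    ethereum)  echo 1 ;;
    optimism)  echo 10 ;;
    arbitrum)  echo 42161 ;;
    polygon)   echo 137 ;;
    base)      echo 8453 ;;
    unichain)  echo 130 ;;
    avalanche) echo 43114 ;;
    *) jq '.domainId' node_modules/.pnpm/@hyperlane-xyz+registry@*/node_modules/@hyperlane-xyz/registry/dist/chains/"$1"/metadata.json ;;
  esac
}
```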

Step 4: Query Relayer API

For each destination chain, query the relayer API:

```bash
curl -s 'http://localhost:9090/list_operations?destination_domain=<DOMAIN_ID>' > /tmp/<chain>.json
```

The response contains operations with:

  • id: Message ID (H256)
  • operation.message.sender: Sender address
  • operation.message.recipient: Recipient address
  • operation.num_retries: Number of retries (higher = more stuck)
  • operation.status: Error status (e.g., {"Retry": "ErrorEstimatingGas"})
  • operation.message.origin: Origin domain ID
  • operation.message.destination: Destination domain ID
  • operation.app_context: App context name
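
To spot the most-stuck messages first, the saved response can be ranked by retry count. This sketch assumes the top-level JSON is an array of `{id, operation}` objects matching the field list above; the `rank_by_retries` name is illustrative.

```bash
# Hypothetical helper: rank operations in a saved list_operations response
# by retry count, highest first. Assumes a top-level array of
# {id, operation: {...}} objects, per the field list above.
rank_by_retries() {
  jq -r '.[] | "\(.operation.num_retries)\t\(.id)\t\(.operation.status|tostring)"' "$1" \
    | sort -rn
}
```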

Step 5: Filter Messages by App Context

Look up the app_context in rust/main/app-contexts/mainnet_config.json:

```bash
jq '.metricAppContexts[] | select(.name == "<APP_CONTEXT>")' rust/main/app-contexts/mainnet_config.json
```

Filter API results to only include messages where:

  • operation.message.recipient matches one of the recipientAddress values for that destination domain

Important: Addresses are padded to 32 bytes (H256 format).
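
When comparing a 20-byte recipientAddress against the API's H256 form, the padding can be done in plain shell. A minimal sketch (the `pad_to_h256` name is illustrative, not part of the skill):

```bash
# Hypothetical helper: left-pad a 20-byte EVM address to the 32-byte H256
# form used by operation.message.recipient (24 zero nibbles + 40 hex chars),
# lowercased so comparisons are case-insensitive.
pad_to_h256() {
  local addr=${1#0x}
  printf '0x%024d%s\n' 0 "$(printf '%s' "$addr" | tr '[:upper:]' '[:lower:]')"
}
```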

Step 6: Query GCP Logs for Actual Errors

Calculate log freshness based on retry count:

The relayer uses exponential backoff (see calculate_msg_backoff in rust/main/agents/relayer/src/msg/pending_message.rs):

| Retries | Backoff/retry | Cumulative Time | Freshness Flag |
|---------|---------------|-----------------|-----------------|
| 1-4 | 5s-1min | ~2min | --freshness=1h |
| 5-24 | 3min | ~1h | --freshness=3h |
| 25-39 | 5-26min | ~5h | --freshness=12h |
| 40-49 | 30min-1h | ~12h | --freshness=24h |
| 50-60 | 2-22h | ~35h | --freshness=3d |
| 60+ | 22h+ | 35h+ | --freshness=7d |
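
The retry-to-freshness mapping can be expressed as a small helper, with thresholds copied from the table above (the `freshness_for_retries` name is illustrative):

```bash
# Hypothetical helper: choose the gcloud --freshness flag from a retry
# count, using the thresholds in the table above.
freshness_for_retries() {
  local n=$1
  if   [ "$n" -le 4 ];  then echo "--freshness=1h"
  elif [ "$n" -le 24 ]; then echo "--freshness=3h"
  elif [ "$n" -le 39 ]; then echo "--freshness=12h"
  elif [ "$n" -le 49 ]; then echo "--freshness=24h"
  elif [ "$n" -le 60 ]; then echo "--freshness=3d"
  else                       echo "--freshness=7d"
  fi
}
```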

For each message ID, query GCP logs with calculated freshness:

```bash
gcloud logging read 'resource.type=k8s_container AND resource.labels.namespace_name=mainnet3 AND resource.labels.pod_name:omniscient-relayer AND jsonPayload.span.id:<MESSAGE_ID> AND jsonPayload.fields.error:*' --project=abacus-labs-dev --limit=1 --format='value(jsonPayload.fields.error)' --freshness=<CALCULATED_FRESHNESS>
```

Extract the human-readable error from the response using sed (macOS compatible):

```bash
echo "$raw_error" | sed -n 's/.*execution reverted: \([^"]*\)".*/\1/p' | head -1
```

Common error patterns:

  • "execution reverted: Nonce already used" → "Nonce already used"
  • "execution reverted: panic: arithmetic underflow" → "Arithmetic underflow"

Note: Do not use grep -P as it's not available on macOS.
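
The extraction step can be wrapped as a reusable function around the same sed expression (BSD sed compatible, no grep -P; the `extract_revert_reason` name is illustrative):

```bash
# Hypothetical wrapper around the sed expression above: pull the
# human-readable revert reason out of a raw log error string.
extract_revert_reason() {
  printf '%s\n' "$1" | sed -n 's/.*execution reverted: \([^"]*\)".*/\1/p' | head -1
}
```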

Step 7: Present Investigation Results

Output a detailed summary table with full message IDs and both error sources:

## Investigation Results for [APP_CONTEXT]

### Summary
- Total stuck messages: X
- Destinations affected: [list]
- Reprepare reasons: ErrorEstimatingGas (N), CouldNotFetchMetadata (M)

### Messages

| Message ID | Retries | Reprepare Reason | Error | Origin |
|------------|---------|------------------|-----------|--------|
| `0xaa18ebc1c79345e6d24984a0b9a5ab66c968d128d46b2357b641e56e71b8d30c` | 47 | ErrorEstimatingGas | Nonce already used | optimism |
| `0xd6aeef7c092a88aa23ad53227aeb834ae731d059b3ce749db8451e761f3f15ac` | 47 | ErrorEstimatingGas | Nonce already used | arbitrum |

**Important**: Always show the full 66-character message ID (0x + 64 hex chars). Do not truncate.

### Error Analysis
[Explain based on the actual log errors found]

### Next Steps
To denylist these messages, run:
/denylist-stuck-messages <message_ids> app_context=APP_CONTEXT

Column definitions:

  • Reprepare Reason: From operation.status in relayer API (e.g., ErrorEstimatingGas, CouldNotFetchMetadata)
  • Error: Actual revert reason from GCP logs (e.g., "Nonce already used", "Arithmetic underflow")

Step 8: Output Denylist Command

At the end of the investigation results, output the full denylist command:

### Next Steps
To denylist, run:
/denylist-stuck-messages 0xaa18ebc1c79345e6d24984a0b9a5ab66c968d128d46b2357b641e56e71b8d30c 0xd6aeef7c092a88aa23ad53227aeb834ae731d059b3ce749db8451e761f3f15ac app_context=APP_CONTEXT

Always use full message IDs, never truncated.

Error Status Reference

| Status | Meaning | Action |
|--------|---------|--------|
| ErrorEstimatingGas | Gas estimation failed (contract revert) | Usually denylist - contract won't accept |
| CouldNotFetchMetadata | Can't get ISM metadata | Check validators, may resolve itself |
| ApplicationReport(...) | App-specific error | Check the specific error message |
| GasPaymentNotFound | No IGP payment | May need manual relay with gas |

Error Handling

  • Port-forward fails: Check kubectl context: kubectl config current-context
  • No messages found: Queue may have cleared; alert may be stale
  • API returns error: Check relayer pod: kubectl get pods -n mainnet3 | grep relayer
  • App context not found: May be new/custom; ask user for sender/recipient addresses

Prerequisites

  • kubectl configured with access to mainnet cluster
  • Grafana MCP server connected (for alert URL parsing)


How To Install

  1. Open your terminal

    Open the terminal or command line in your project directory.

  2. Run the install command

    Run: npx killer-skills add hyperlane-xyz/hyperlane-monorepo/investigate-stuck-messages. The CLI will automatically detect your IDE or AI agent and configure the skill.

  3. Start using the skill

    The skill is now active. Your AI agent can use investigate-stuck-messages immediately in the current project.
