# RLM CLI

Recursive Language Models (RLM) CLI - lets LLMs handle near-infinite context by recursively decomposing inputs and calling themselves over the parts. Supports files, directories, URLs, and stdin.
## Installation

```bash
pip install rlm-cli   # or: pipx install rlm-cli
uvx rlm-cli ask ...   # run without installing
```
Set an API key for your backend (openrouter is the default):

```bash
export OPENROUTER_API_KEY=...   # default backend
export OPENAI_API_KEY=...       # for --backend openai
export ANTHROPIC_API_KEY=...    # for --backend anthropic
```
## Commands

### ask - Query with context

```bash
rlm ask <inputs> -q "question"
```
Inputs (combinable):

| Type | Example | Notes |
|---|---|---|
| Directory | `rlm ask . -q "..."` | Recursive, respects .gitignore |
| File | `rlm ask main.py -q "..."` | Single file |
| URL | `rlm ask https://x.com -q "..."` | Auto-converts to markdown |
| stdin | `git diff \| rlm ask - -q "..."` | `-` reads from pipe |
| Literal | `rlm ask "text" -q "..." --literal` | Treat as raw text |
| Multiple | `rlm ask a.py b.py -q "..."` | Combine any types |
Options:

| Flag | Description |
|---|---|
| `-q "..."` | Question/prompt (required) |
| `--backend` | Provider: openrouter (default), openai, anthropic |
| `--model NAME` | Model override (format: provider/model or just model) |
| `--json` | Machine-readable output |
| `--output-format` | Output format: text, json, or json-tree |
| `--summary` | Show execution summary with depth statistics |
| `--extensions .py .ts` | Filter by extension |
| `--include/--exclude` | Glob patterns |
| `--max-iterations N` | Limit REPL iterations (default: 30) |
| `--max-depth N` | Recursive RLM depth (default: 1 = no recursion) |
| `--max-budget N.NN` | Spending limit in USD (requires OpenRouter) |
| `--max-timeout N` | Time limit in seconds |
| `--max-tokens N` | Total token limit (input + output) |
| `--max-errors N` | Consecutive error limit before stopping |
| `--no-index` | Skip auto-indexing |
| `--exa` | Enable Exa web search (requires EXA_API_KEY) |
| `--inject-file FILE` | Execute Python code between iterations |
JSON output structure:

```json
{"ok": true, "exit_code": 0, "result": {"response": "..."}, "stats": {...}}
```
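This envelope is easy to consume from a wrapper script; a minimal sketch, assuming only the shape shown above (the `stats` contents here are illustrative):

```python
import json

# Sample envelope matching the documented shape.
raw = '{"ok": true, "exit_code": 0, "result": {"response": "42 files"}, "stats": {"cost": 0.01}}'

payload = json.loads(raw)
if payload["ok"] and payload["exit_code"] == 0:
    answer = payload["result"]["response"]
    print(answer)  # the model's final response
```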
JSON-tree output (`--output-format=json-tree`) adds an execution tree showing nested RLM calls:

```json
{
  "result": {
    "response": "...",
    "tree": {
      "depth": 0,
      "model": "openai/gpt-4",
      "duration": 2.3,
      "cost": 0.05,
      "iterations": [...],
      "children": [...]
    }
  }
}
```
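Because `children` nests recursively, totals across all RLM calls take a short walk; a sketch assuming only the fields shown above (sample numbers are illustrative):

```python
def total_cost(node):
    """Sum `cost` over a tree node and all of its nested children."""
    return node.get("cost", 0.0) + sum(total_cost(c) for c in node.get("children", []))

tree = {
    "depth": 0, "model": "openai/gpt-4", "duration": 2.3, "cost": 0.05,
    "iterations": [],
    "children": [
        {"depth": 1, "cost": 0.01, "children": []},
        {"depth": 1, "cost": 0.02, "children": []},
    ],
}
print(f"${total_cost(tree):.2f}")  # -> $0.08
```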
Summary output (`--summary`) shows depth-wise statistics after completion:

- JSON mode: adds a `summary` field to `stats`
- Text mode: prints the summary to stderr

```text
=== RLM Execution Summary ===
Total depth: 2 | Nodes: 3 | Cost: $0.0054 | Duration: 17.38s
  Depth 0: 1 call(s) ($0.0047, 13.94s)
  Depth 1: 2 call(s) ($0.0007, 3.44s)
```
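The per-depth lines can be reproduced from a json-tree by grouping nodes on their `depth` field; a sketch with illustrative numbers (the function is not part of rlm-cli):

```python
from collections import defaultdict

def depth_stats(node, stats=None):
    """Group call count and cost by recursion depth, walking nested children."""
    if stats is None:
        stats = defaultdict(lambda: {"calls": 0, "cost": 0.0})
    stats[node["depth"]]["calls"] += 1
    stats[node["depth"]]["cost"] += node.get("cost", 0.0)
    for child in node.get("children", []):
        depth_stats(child, stats)
    return stats

tree = {"depth": 0, "cost": 0.0047,
        "children": [{"depth": 1, "cost": 0.0004, "children": []},
                     {"depth": 1, "cost": 0.0003, "children": []}]}
for depth, s in sorted(depth_stats(tree).items()):
    print(f"Depth {depth}: {s['calls']} call(s) (${s['cost']:.4f})")
```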
### complete - Query without context

```bash
rlm complete "prompt text"
rlm complete "Generate SQL" --json --backend openai
```
### search - Search indexed files

```bash
rlm search "query" [options]
```

| Flag | Description |
|---|---|
| `--limit N` | Max results (default: 20) |
| `--language python` | Filter by language |
| `--paths-only` | Output file paths only |
| `--json` | JSON output |

Auto-indexes on first use. Manual index: `rlm index .`
### index - Build search index

```bash
rlm index .              # Index current dir
rlm index ./src --force  # Force full reindex
```

### doctor - Check setup

```bash
rlm doctor        # Check config, API keys, deps
rlm doctor --json
```
## Workflows

Git diff review:

```bash
git diff | rlm ask - -q "Review for bugs"
git diff --cached | rlm ask - -q "Ready to commit?"
git diff HEAD~3 | rlm ask - -q "Summarize changes"
```

Codebase analysis:

```bash
rlm ask . -q "Explain architecture"
rlm ask src/ -q "How does auth work?" --extensions .py
```

Search + analyze:

```bash
rlm search "database" --paths-only
rlm ask src/db.py -q "How is connection pooling done?"
```

Compare files:

```bash
rlm ask old.py new.py -q "What changed?"
```
## Configuration

Precedence: CLI flags > env vars > config file > defaults

Config locations: `./rlm.yaml`, `./.rlm.yaml`, `~/.config/rlm/config.yaml`

```yaml
backend: openrouter
model: google/gemini-3-flash-preview
max_iterations: 30
```

Environment variables:

- `RLM_BACKEND` - Default backend
- `RLM_MODEL` - Default model
- `RLM_CONFIG` - Config file path
- `RLM_JSON=1` - Always output JSON
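The precedence order amounts to "first explicitly set value wins". A minimal sketch for one option (`resolve_backend` is illustrative, not part of rlm-cli):

```python
import os

def resolve_backend(cli_value=None, config=None):
    """CLI flag > RLM_BACKEND env var > config file > built-in default."""
    config = config or {}
    for candidate in (cli_value, os.environ.get("RLM_BACKEND"), config.get("backend")):
        if candidate is not None:
            return candidate
    return "openrouter"  # built-in default

print(resolve_backend())                              # "openrouter" unless RLM_BACKEND is set
print(resolve_backend(cli_value="anthropic",
                      config={"backend": "openai"}))  # CLI flag wins over config
```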
## Recursion and Budget Limits

### Recursive RLM (--max-depth)

Enables recursive `llm_query()` calls, where child RLMs process sub-tasks:

```bash
# 2 levels of recursion
rlm ask . -q "Research thoroughly" --max-depth 2

# With a budget cap
rlm ask . -q "Analyze codebase" --max-depth 3 --max-budget 0.50
```

### Budget Control (--max-budget)

Limits spending per completion; raises `BudgetExceededError` when the cap is exceeded:

```bash
# Cap at $1.00
rlm complete "Complex task" --max-budget 1.00

# Very low budget (will likely be exceeded)
rlm ask . -q "Analyze everything" --max-budget 0.001
```

Requires the OpenRouter backend (it returns cost data in its responses).
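The mechanism can be pictured as a running total checked after every completion; a toy sketch, not rlm-cli's internals (only the exception name comes from the docs above):

```python
class BudgetExceededError(Exception):
    pass

class BudgetTracker:
    """Accumulate per-completion cost and stop once the cap is hit."""
    def __init__(self, max_budget):
        self.max_budget = max_budget
        self.spent = 0.0

    def record(self, cost):
        self.spent += cost
        if self.spent > self.max_budget:
            raise BudgetExceededError(
                f"spent ${self.spent:.4f} > cap ${self.max_budget:.2f}")

tracker = BudgetTracker(max_budget=0.50)
tracker.record(0.30)       # under the cap, fine
try:
    tracker.record(0.30)   # pushes the total to $0.60
except BudgetExceededError as exc:
    print(exc)
```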
### Other Limits

Timeout (`--max-timeout`) - stop after N seconds:

```bash
rlm complete "Complex task" --max-timeout 30
```

Token limit (`--max-tokens`) - stop after N total tokens:

```bash
rlm ask . -q "Analyze" --max-tokens 10000
```

Error threshold (`--max-errors`) - stop after N consecutive code errors:

```bash
rlm complete "Write code" --max-errors 3
```
## Stop Conditions

RLM execution stops when any of these occur:

- Final answer - the LLM calls `FINAL_VAR("variable_name")` with the NAME of a variable (as a string)
- Max iterations - exceeds `--max-iterations` (exit code 0, graceful - forces a final answer)
- Max budget exceeded - spending > `--max-budget` (exit code 20, error)
- Max timeout exceeded - time > `--max-timeout` (exit code 20, error with partial answer)
- Max tokens exceeded - tokens > `--max-tokens` (exit code 20, error with partial answer)
- Max errors exceeded - consecutive errors > `--max-errors` (exit code 20, error with partial answer)
- User cancellation - Ctrl+C or SIGUSR1 (exit code 0, returns partial answer as success)
- Max depth reached - a child RLM at depth 0 cannot recurse further

FINAL_VAR usage (common mistake - pass the variable NAME, not the value):

```python
# CORRECT:
result = {"answer": "hello", "score": 42}
FINAL_VAR("result")  # pass the variable NAME as a string

# WRONG:
FINAL_VAR(result)  # passing the dict directly causes an AttributeError
```

Note on max iterations: this is a soft limit. When exceeded, RLM prompts the LLM one more time to provide a final answer. Modern LLMs typically complete in 1-2 iterations.

Partial answers: when a timeout, token, or error limit stops execution, the error includes `partial_answer` if any response was generated before stopping.

Early exit (Ctrl+C): pressing Ctrl+C (or sending SIGUSR1) returns the partial answer as success (exit code 0) with `early_exit: true` in the result.
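The name-not-value rule exists because FINAL_VAR looks the variable up in the REPL namespace. A toy reimplementation of that lookup (not rlm-cli's actual code) makes the failure mode concrete:

```python
def final_var(name, namespace):
    """Toy FINAL_VAR: the argument is a variable NAME, resolved in the REPL namespace."""
    if not isinstance(name, str):
        # Passing the value itself (e.g. a dict) is the documented mistake.
        raise TypeError(f"FINAL_VAR expects a variable name (str), got {type(name).__name__}")
    if name not in namespace:
        raise NameError(f"no variable named {name!r} in the REPL")
    return namespace[name]

repl_ns = {"result": {"answer": "hello", "score": 42}}
print(final_var("result", repl_ns))  # looked up by name, returns the dict
```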
## Inject File (--inject-file)

Update REPL variables mid-run by modifying an inject file:

```bash
# Create the inject file
echo 'focus = "authentication"' > inject.py

# Run with the inject file
rlm ask . -q "Analyze based on 'focus'" --inject-file inject.py

# In another terminal, update it mid-run
echo 'focus = "authorization"' > inject.py
```

The file is checked before each iteration and executed if it has been modified.
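The check-before-each-iteration behavior can be sketched as an mtime comparison plus `exec()` into the REPL namespace (a simplified model, not the CLI's internals; `maybe_inject` is illustrative):

```python
import os
import tempfile

def maybe_inject(path, last_mtime, namespace):
    """Execute the inject file into `namespace` if it changed since the last check."""
    mtime = os.path.getmtime(path)
    if mtime != last_mtime:
        with open(path) as fh:
            exec(fh.read(), namespace)
    return mtime

ns = {}
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as fh:
    fh.write('focus = "authentication"')
    path = fh.name

mtime = maybe_inject(path, None, ns)   # first check: file is "new", so it runs
print(ns["focus"])

# Simulate an edit between iterations.
with open(path, "w") as fh:
    fh.write('focus = "authorization"')
os.utime(path, (mtime + 1, mtime + 1))  # force a visible mtime change
mtime = maybe_inject(path, mtime, ns)
print(ns["focus"])
os.unlink(path)
```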
## Exit Codes

| Code | Meaning |
|---|---|
| 0 | Success |
| 2 | CLI usage error |
| 10 | Input error (file not found) |
| 11 | Config error (missing API key) |
| 20 | Backend/API error (includes budget exceeded) |
| 30 | Runtime error |
| 40 | Index/search error |
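A calling script can branch on these codes; a sketch of one possible policy (the mapping mirrors the table above, but `is_retryable` is an illustrative choice, not something rlm-cli prescribes):

```python
EXIT_MEANINGS = {
    0: "success",
    2: "CLI usage error",
    10: "input error",
    11: "config error",
    20: "backend/API error (includes budget exceeded)",
    30: "runtime error",
    40: "index/search error",
}

def is_retryable(code):
    """Illustrative policy: retry transient backend failures, give up on the rest."""
    return code == 20

print(EXIT_MEANINGS[20])
```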
## Search Tools

When `rlm ask` runs on a directory, the LLM gets search tools:

| Tool | Cost | Privacy | Use For |
|---|---|---|---|
| `rg.search()` | Free | Local | Exact patterns, function names, imports |
| `tv.search()` | Free | Local | Topics, concepts, related files |
| `exa.search()` | $ | API | Web search (requires --exa flag) |
| `pi.*` | $$$ | API | Hierarchical PDF/document navigation |

- `rg.search(pattern, paths, globs)` - ripgrep for exact patterns
- `tv.search(query, limit)` - Tantivy BM25 for concepts
### Exa Web Search (--exa flag, Costs Money)

⚠️ Opt-in: requires the --exa flag and the EXA_API_KEY environment variable.

Setup:

```bash
export EXA_API_KEY=...  # Get from https://exa.ai
```

Usage in REPL:

```python
from rlm_cli.tools_search import exa, web

# Basic search
results = exa.search(query="Python async patterns", limit=5)
for r in results:
    print(f"{r['title']}: {r['url']}")

# With highlights (relevant excerpts)
results = exa.search(
    query="error handling best practices",
    limit=3,
    include_highlights=True
)

# Semantic alias
results = web(query="machine learning tutorial", limit=5)

# Find similar pages
results = exa.find_similar(url="https://example.com/article", limit=5)
```
exa.search() parameters:

| Param | Default | Description |
|---|---|---|
| `query` | required | Search query |
| `limit` | 10 | Max results |
| `search_type` | "auto" | "auto", "neural", or "keyword" |
| `include_domains` | None | Only these domains |
| `exclude_domains` | None | Exclude these domains |
| `include_text` | False | Include full page text |
| `include_highlights` | True | Include relevant excerpts |
| `category` | None | "company", "research paper", "news", etc. |

When to use exa.search() / web():

- Finding external documentation, tutorials, articles
- Researching topics beyond the local codebase
- Finding similar pages to a reference URL
### PageIndex (pi.* - Opt-in, Costs Money)

⚠️ WARNING: PageIndex sends document content to LLM APIs and costs money.

Only use it when:

- The user explicitly requests document/PDF analysis
- The document has hierarchical structure (reports, manuals)
- The user accepts the cost/privacy tradeoffs

Prerequisites:

- OPENROUTER_API_KEY (or another backend key) must be set in the environment
- The PageIndex submodule must be initialized
- Run within rlm-cli's virtual environment (it has the required dependencies)

Setup (REQUIRED before any `pi.*` operation):

```python
import sys
sys.path.insert(0, "/path/to/rlm-cli/rlm")        # rlm submodule
sys.path.insert(0, "/path/to/rlm-cli/pageindex")  # pageindex submodule

from rlm.clients import get_client
from rlm_cli.tools_pageindex import pi

# Configure with the existing rlm backend
client = get_client(backend="openrouter", backend_kwargs={"model_name": "google/gemini-2.0-flash-001"})
pi.configure(client)
```
Indexing (costs $$$):

```python
# Build the tree index - THIS COSTS MONEY (no caching; re-indexes on each call)
tree = pi.index(path="report.pdf")
# Returns: a PITree object with doc_name, nodes, doc_description, raw
```

Viewing structure (free after indexing):

```python
# Display the table of contents
print(pi.toc(tree))

# Get a section by node_id (IDs are "0000", "0001", "0002", etc.)
section = pi.get_section(tree, "0003")
# Returns: a PINode with title, node_id, start_index, end_index, summary, children
# Returns: None if not found

if section:
    print(f"{section.title}: pages {section.start_index}-{section.end_index}")
```

Finding node IDs: node IDs are assigned sequentially ("0000", "0001", ...) in tree-traversal order. To see all node IDs, access the raw tree structure:

```python
import json
print(json.dumps(tree.raw["structure"], indent=2))
# Each node has: title, node_id, start_index, end_index
```
`pi.*` API reference:

| Method | Cost | Returns | Description |
|---|---|---|---|
| `pi.configure(client)` | Free | None | Set the rlm backend (REQUIRED first) |
| `pi.status()` | Free | dict | Check availability, config, warning |
| `pi.index(path=str)` | $$$ | PITree | Build a tree from a PDF |
| `pi.toc(tree, max_depth=3)` | Free | str | Formatted table of contents |
| `pi.get_section(tree, node_id)` | Free | PINode or None | Get a section by ID |
| `pi.available()` | Free | bool | Check if PageIndex is installed |
| `pi.configured()` | Free | bool | Check if a client is configured |

PITree attributes: doc_name, nodes (list of PINode), doc_description, raw (dict)

PINode attributes: title, node_id, start_index, end_index, summary (may be None), children (may be None)

Notes:

- `summary` is only populated if `add_summaries=True` in `pi.index()`
- `children` is None for leaf nodes (sections with no subsections)
- `tree.raw["structure"]` is a flat list; the hierarchy is in `PINode.children`
- PageIndex extracts document structure (TOC), not content. Use page numbers to locate sections in the original PDF.
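Since the hierarchy lives in `PINode.children`, listing every node ID takes a short recursive walk; a sketch using a stand-in dataclass that mirrors the documented PINode attributes (the `Node`/`walk_ids` names are illustrative):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    """Stand-in for PINode with the documented fields."""
    title: str
    node_id: str
    start_index: int
    end_index: int
    children: Optional[list] = None  # None for leaf nodes, per the notes above

def walk_ids(node, out=None):
    """Collect (node_id, title) pairs in traversal order."""
    if out is None:
        out = []
    out.append((node.node_id, node.title))
    for child in node.children or []:
        walk_ids(child, out)
    return out

root = Node("Financial Overview", "0001", 6, 20, children=[
    Node("Revenue", "0002", 6, 10),
    Node("Expenses", "0003", 11, 15),
])
for node_id, title in walk_ids(root):
    print(node_id, title)
```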
Example output from pi.toc():

```text
📄 annual_report.pdf
• Executive Summary (p.1-5)
• Financial Overview (p.6-20)
  • Revenue (p.6-10)
  • Expenses (p.11-15)
  • Projections (p.16-20)
• Risk Factors (p.21-35)
```