Drop prxy.monster in front of your existing model calls. Keep your SDKs and provider keys. Every call gets cost attribution, policy metadata, a signed receipt, and an outcome loop.
No provider-token markup · BYOK · hash-only by default · Ed25519 receipts · works with Anthropic, OpenAI, Google, Groq, Bedrock, OpenRouter
One base URL. Provider keys stay yours. Provider bills inference; prxy bills the control layer.
Built for what broke this month
| What broke | Where | The module |
|---|---|---|
| Auto-compaction regression dropping user intent mid-session | Issue #36068 · Mar 19, 2026 | Compaction Bridge |
| MCP tool definitions burning 67K–143K tokens before you type | Apideck post · Mar 16, 2026 | MCP Optimizer |
| Public reports of AI coding-tool budgets outrunning forecasts | Benzinga report · Apr 2026 | Cost Guard |
| Claude Code users reporting rapid rate-limit drain | MacRumors report · Mar 26, 2026 | Semantic + Exact Cache |
| Claude Code subscription availability/pricing tests | Public timeline · Apr 2026 | MIT self-host |
| Context rot after ~2 hours of session | Widely reported · Apr 2026 | IPC + Rehydrator |
New high-signal agent incidents become module candidates. Read the dated log at /monster-log/ or rerun the proof at /benchmarks/.
What you actually get
You send a request to api.prxy.monster with your existing Anthropic, OpenAI, or Bedrock key. The request flows through your configured module pipeline — caching, MCP optimization, pattern injection, cost guards — then hits your provider with your key. The response comes back the same way. Same wire format you already use.
STEP 01
curl -X POST https://api.prxy.monster/v1/messages \
  -H "Authorization: Bearer $PRXY_KEY" \
  -H "X-Provider-Key: $ANTHROPIC_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 200,
    "messages": [{"role":"user","content":"Hello, prxy."}]
  }'
SDK drop-in: just swap ANTHROPIC_BASE_URL
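The curl call in STEP 01 maps onto any HTTP client. Here is a minimal sketch using only Python's standard library. It assumes nothing beyond the endpoint, the two key headers, and the body shown in that call. The helper name `build_messages_request` and the placeholder key values are illustrative, and the request is built but not sent.

```python
import json
import urllib.request

PRXY_MESSAGES_URL = "https://api.prxy.monster/v1/messages"  # endpoint from the curl example


def build_messages_request(prxy_key: str, provider_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) the same POST that the curl example makes."""
    body = {
        "model": "claude-sonnet-4-6",
        "max_tokens": 200,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        PRXY_MESSAGES_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {prxy_key}",  # gateway key
            "X-Provider-Key": provider_key,         # your own provider key (BYOK)
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder keys for illustration only.
req = build_messages_request("prxy_example", "sk-ant-example", "Hello, prxy.")
print(req.get_method(), req.full_url)
```

With real keys, passing `req` to `urllib.request.urlopen` would send it. Nothing in the sketch depends on a particular SDK.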
STEP 02
→ mcp-optimizer     # prune tool defs
→ semantic-cache    # dedupe near-matches
→ patterns          # inject past solutions
→ cost-guard        # enforce budget ceiling
→ your provider     # your key, your bill
Toggle modules per key via PRXY_PIPE
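One way to picture the pipeline is as an ordered list of stages, where any stage may answer the request itself and skip everything after it. The sketch below is a mental model only. The stage functions, the dict-shaped request, the `exact-cache` stage name, and the comma-separated spec string are all hypothetical; the page names PRXY_PIPE but does not describe its format.

```python
from typing import Callable, Optional

# A stage returns a response to short-circuit, or None to pass the request on.
Stage = Callable[[dict], Optional[dict]]

_exact_cache: dict = {}


def exact_cache(req: dict) -> Optional[dict]:
    # Replay a stored response for an identical prompt, if one exists.
    return _exact_cache.get(req["prompt"])


def cost_guard(req: dict) -> Optional[dict]:
    # Refuse the call once spend has reached the ceiling.
    if req.get("spent_usd", 0.0) >= req.get("ceiling_usd", float("inf")):
        return {"status": 429, "text": "budget ceiling reached"}
    return None


STAGES = {"exact-cache": exact_cache, "cost-guard": cost_guard}


def run_pipeline(spec: str, req: dict, provider: Callable[[dict], dict]) -> dict:
    names = [n.strip() for n in spec.split(",") if n.strip()]
    for name in names:
        response = STAGES[name](req)
        if response is not None:
            return response  # a stage answered; the provider is never called
    response = provider(req)
    if "exact-cache" in names:
        _exact_cache[req["prompt"]] = response
    return response


calls = []


def fake_provider(req: dict) -> dict:
    calls.append(req["prompt"])
    return {"status": 200, "text": "hi"}


first = run_pipeline("exact-cache, cost-guard", {"prompt": "Hello"}, fake_provider)
second = run_pipeline("exact-cache, cost-guard", {"prompt": "Hello"}, fake_provider)
blocked = run_pipeline("cost-guard", {"prompt": "New", "spent_usd": 5.0, "ceiling_usd": 5.0}, fake_provider)
print(len(calls), blocked["status"])
```

The second identical request is served from the cache, and the over-budget request is rejected before it reaches the provider, so the fake provider runs once.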
STEP 03
{
  "id": "msg_01HZ...",
  "type": "message",
  "role": "assistant",
  "content": [{ "type": "text", "text": "Hi. How can I help?" }],
  "usage": { "input_tokens": 12, "output_tokens": 8 }
}
Cache hits skip repeated provider calls
prxy.monster does not bill you for tokens. Your provider bills you for tokens. We bill you for the gateway and the module pipeline. We never mark up inference.
Not an inference provider. Not a web proxy. Not a VPN. Not prxy.com.
How it works
For supported Anthropic Messages and OpenAI Chat Completions clients, change the base URL. Check the compatibility matrix for partial and planned APIs.
Every conversation forges patterns. Outcomes are tracked. Failures retire. Good solutions reinforce.
Patterns inject before each request. Context never resets. Your AI bill goes down over time.
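The reinforce-and-retire loop described above can be sketched in a few lines. This is a toy model under stated assumptions, not the product's algorithm. The `Pattern` fields, the retirement rule (three losses that outnumber wins), and the prompt wording are all hypothetical. It also ranks by score alone, with no relevance matching against the request.

```python
from dataclasses import dataclass


@dataclass
class Pattern:
    text: str
    wins: int = 0
    losses: int = 0
    retired: bool = False


def record_outcome(p: Pattern, succeeded: bool, max_losses: int = 3) -> None:
    """Reinforce on success; retire once losses reach max_losses and outnumber wins."""
    if succeeded:
        p.wins += 1
    else:
        p.losses += 1
        if p.losses >= max_losses and p.losses > p.wins:
            p.retired = True


def inject(system_prompt: str, patterns: list, top_k: int = 2) -> str:
    """Append the best-scoring live patterns to the system prompt."""
    live = sorted(
        (p for p in patterns if not p.retired),
        key=lambda p: p.wins - p.losses,
        reverse=True,
    )
    notes = "\n".join(f"- {p.text}" for p in live[:top_k])
    if not notes:
        return system_prompt
    return f"{system_prompt}\n\nPast solutions that worked:\n{notes}"


good = Pattern("Pin the lockfile before upgrading")
bad = Pattern("Delete node_modules and retry")
record_outcome(good, True)
for _ in range(3):
    record_outcome(bad, False)

prompt = inject("You are a coding agent.", [good, bad])
print(prompt)
```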
Featured modules
Survives the auto-compaction regression in #36068. Re-injects user intent on every compaction boundary so your agent doesn't drop the thread mid-session.
The 67K-tokens-of-MCP problem. Scores each tool against the request and ships only the relevant ones. The local benchmark suite reports 33.4% average tool-token reduction across synthetic 120-tool fixtures.
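To make "scores each tool against the request" concrete, here is a deliberately simple sketch that ranks tools by word overlap with the request and keeps the top few. The scoring method is an assumption for illustration, since the page does not say how the MCP Optimizer scores tools. The function name `prune_tools` and the sample tools are hypothetical.

```python
import re


def _words(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def prune_tools(request_text: str, tools: list, keep: int = 2) -> list:
    """Keep up to `keep` tools whose name + description share the most words with the request."""
    wanted = _words(request_text)
    scored = [(len(wanted & _words(t["name"] + " " + t["description"])), t) for t in tools]
    relevant = [(score, t) for score, t in scored if score > 0]  # drop tools with no overlap
    relevant.sort(key=lambda pair: pair[0], reverse=True)        # stable sort keeps input order on ties
    return [t for _, t in relevant[:keep]]


tools = [
    {"name": "git_commit", "description": "Create a git commit with a message"},
    {"name": "send_email", "description": "Send an email to a recipient"},
    {"name": "git_diff", "description": "Show the git diff for the working tree"},
]
kept = prune_tools("commit my git changes", tools)
print([t["name"] for t in kept])
```

Only the kept definitions would be forwarded upstream, which is where the token savings come from. A production scorer would likely use embeddings or usage history rather than raw word overlap.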
Sessions don't have to start from zero. Injects relevant past solutions into the system prompt. Forges new patterns from successful resolutions. Compounds over time.
Repeat questions don't repeat costs. Embeds the request, replays the cached response above similarity threshold, and skips the upstream call on a hit. Hit rate depends on repeated workload shape.
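The embed, compare, and replay flow can be sketched as follows. The bag-of-words "embedding" is a stand-in so the example stays self-contained, where a real semantic cache would use a learned embedding model. The 0.8 threshold and the class name `SemanticCache` are assumptions for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts. Stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs

    def get(self, prompt: str):
        """Return the cached response for the most similar prompt, or None below threshold."""
        vec = embed(prompt)
        best = max(self.entries, key=lambda e: cosine(vec, e[0]), default=None)
        if best is not None and cosine(vec, best[0]) >= self.threshold:
            return best[1]
        return None

    def put(self, prompt: str, response: str) -> None:
        self.entries.append((embed(prompt), response))


cache = SemanticCache()
cache.put("How do I reverse a list in Python?", "Use reversed() or list.reverse().")
hit = cache.get("how do i reverse a list in python")
miss = cache.get("What is the capital of France?")
print(hit, miss)
```

On a hit the upstream call is skipped entirely. On a miss the caller would forward the request and `put` the response for next time.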
AI coding-tool budgets outrunning forecasts. Per-key, per-day, per-month USD ceilings. 429 before the bill blows. Stops runaway agents in their tracks.
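A ceiling check of this kind might look like the sketch below. It assumes spend is tracked in USD per key per calendar day and per calendar month, and that a key at or above either ceiling receives a 429. The class and method names are hypothetical, and how the real Cost Guard attributes cost is not specified on this page.

```python
from datetime import date


class CostGuard:
    """Track spend per key and refuse calls once a daily or monthly USD ceiling is reached."""

    def __init__(self, daily_usd: float, monthly_usd: float):
        self.daily_usd = daily_usd
        self.monthly_usd = monthly_usd
        self.by_day = {}    # (key, date) -> USD spent
        self.by_month = {}  # (key, year, month) -> USD spent

    def check(self, key: str, today: date) -> int:
        """Return 429 if the key has hit a ceiling, otherwise 200."""
        if self.by_day.get((key, today), 0.0) >= self.daily_usd:
            return 429
        if self.by_month.get((key, today.year, today.month), 0.0) >= self.monthly_usd:
            return 429
        return 200

    def record(self, key: str, today: date, cost_usd: float) -> None:
        """Add the cost of a completed call to both running totals."""
        day_key = (key, today)
        month_key = (key, today.year, today.month)
        self.by_day[day_key] = self.by_day.get(day_key, 0.0) + cost_usd
        self.by_month[month_key] = self.by_month.get(month_key, 0.0) + cost_usd


guard = CostGuard(daily_usd=5.0, monthly_usd=50.0)
day_one = date(2026, 4, 1)
before = guard.check("key-a", day_one)
guard.record("key-a", day_one, 5.0)
after = guard.check("key-a", day_one)
next_day = guard.check("key-a", date(2026, 4, 2))
print(before, after, next_day)
```

The daily ceiling resets with the date while the monthly total keeps accumulating, so a runaway agent is stopped the same day without locking the key out for the rest of the month.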
pricing transparency
| | prxy.monster | OpenRouter | Portkey | Helicone | LiteLLM |
|---|---|---|---|---|---|
| BYOK provider invoice | Provider bills you directly | Direct BYOK with post-free-tier fee | Customer provider key / gateway billing | Customer provider key / gateway billing | Self-hosted provider key |
| Published hosted billing unit | YES | Model token prices / credits | Recorded logs / requests | Requests + storage usage | Open source self-host |
| Provider inference markup on API-key BYOK | No markup; no provider settlement | No token markup; BYOK fee after 1M requests | No published token markup | No published token markup | No hosted bill in OSS mode |
| Public itemized payment ledger | receipts.prxy.monster | Not a public ledger | Not a public ledger | Not a public ledger | N/A |
| MCP token optimization | YES | NO | NO | NO | NO |
| Infinite context (compressed) | YES | NO | NO | NO | NO |
| Pattern learning across sessions | YES | NO | NO | NO | NO |
| Semantic cache | YES | NO | YES | NO | NO |
| Self-host (MIT/Apache) | YES — MIT | NO | YES — Apache 2.0 | YES | YES |
| Composable modules | YES | NO | NO | NO | NO |
prxy.monster charges for the gateway pipeline, not provider inference on BYOK routes. Sources: OpenRouter FAQ, Portkey pricing, Helicone pricing, LiteLLM GitHub.
the proof layer
Receipts prove what happened. Outcomes prove what mattered. Patterns reuse what worked. The four headers below ride on every routed call — public JWKS at /.well-known/prxy-receipt-keys.json closes the loop.
Cost by model, project, and agent. Cache hit / miss. Policy decision. Module chain. Provider, status, latency. public_demo / public_minimal / public_redacted / public_full / private: your call.
receipts.prxy.monster/r/<id>
Per-project monthly budgets with off / warn / hard_fail enforcement. Hash-only payload capture by default; encrypted-at-rest is opt-in. BYOK + provider routing.
blocked receipt for the audit trail
none mode disables capture beyond receipt hashes
Ed25519 over RFC 8785 JCS-canonicalized receipt body. Public JWKS, key id prxy-receipt-2026-q2. Verify in browser, in CLI, or in your own code.
prxy-cli receipt verify <id> runs the full canonicalize-and-verify path
<VerifyBadge> on every public receipt page
Submit an outcome anchored on a receipt. Positive outcomes feed memory_candidates. A reviewer in lair promotes useful candidates into patterns. Patterns ride into future calls.
POST /v1/outcomes: succeeded / failed / partially_solved / +8 more
Plays nice
Same wire format for supported Anthropic Messages and OpenAI Chat Completions routes. Most integrations are a single base URL change.
AI Coding Tools
SDKs
Frameworks
Deploy
Hosted gateway. Zero ops. Account-scoped memory and cache.
Single local gateway. Private data volume. MIT licensed.
Dedicated deployment for teams that need their own account boundary.
Requests, not tokens. Your provider already charges you per token — we don't double-dip.
prxy_FREE
$0 forever
1,000 requests / month · hard cap
prxy_PRO
$20 / month
100,000 requests / month · then $0.20 per 1k
prxy_TEAM
$99 / month
1,000,000 requests / month · then $0.10 per 1k
One request = one HTTP call into our gateway. Streaming counts as one. Cached hits count as one. Failed-upstream calls don't count. API-key BYOK users pay providers directly at provider list rates; prxy.monster does not mark up that invoice. Managed MPP is separate: the $0.05 MPP price includes the upstream call when settlement is enabled.
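The plan arithmetic is easy to check by hand or in code. The sketch below uses the listed prices and assumes overage is billed pro rata per request rather than rounded up to whole blocks of 1,000, which the page does not specify. The function name `monthly_cost` is illustrative.

```python
from decimal import Decimal

# Prices as listed on this page; overage_per_1k is None where the plan is hard-capped.
PLANS = {
    "free": {"base": Decimal("0"), "included": 1_000, "overage_per_1k": None},
    "pro": {"base": Decimal("20"), "included": 100_000, "overage_per_1k": Decimal("0.20")},
    "team": {"base": Decimal("99"), "included": 1_000_000, "overage_per_1k": Decimal("0.10")},
}


def monthly_cost(plan: str, requests: int) -> Decimal:
    """Gateway cost in USD for one month; provider token charges are billed separately."""
    p = PLANS[plan]
    extra = max(0, requests - p["included"])
    if extra == 0:
        return p["base"]
    if p["overage_per_1k"] is None:
        raise ValueError("free plan is hard-capped at 1,000 requests / month")
    # Assumption: overage is charged pro rata, not per started block of 1,000.
    return p["base"] + p["overage_per_1k"] * Decimal(extra) / Decimal(1000)


pro_bill = monthly_cost("pro", 250_000)
team_bill = monthly_cost("team", 250_000)
print(pro_bill, team_bill)
```

At 250,000 requests, Pro costs 20 + 150 × 0.20 = $50 against Team's flat $99. Under the same assumption the two plans cost the same at 495,000 requests (20 + 395 × 0.20 = 99), and Team is cheaper beyond that.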
Create a free account, get a PRXY API key, register your provider key, then paste this in your terminal. Same Anthropic Messages shape.
Common questions
/.well-known/prxy-receipt-keys.json by anyone — no prxy code required.
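For readers who want to see what verifying "in your own code" involves, the first half of the path is canonicalization: both signer and verifier must hash or sign exactly the same bytes. The sketch below approximates RFC 8785 (JCS) output with the standard library for JSON whose keys are ASCII strings and whose values contain no floats, and rejects floats rather than guessing at their formatting. The receipt field names are hypothetical. The Ed25519 signature check itself is not shown, because it needs a third-party library plus the public key from the JWKS.

```python
import hashlib
import json


def _reject_unsupported(value) -> None:
    """Raise on inputs this sketch cannot canonicalize faithfully."""
    if isinstance(value, float):
        raise TypeError("floats need full RFC 8785 number formatting; not handled in this sketch")
    if isinstance(value, dict):
        for key, item in value.items():
            if not isinstance(key, str):
                raise TypeError("JSON object keys must be strings")
            _reject_unsupported(item)
    elif isinstance(value, list):
        for item in value:
            _reject_unsupported(item)


def canonicalize(value) -> bytes:
    """Approximate JCS: sorted keys, no insignificant whitespace, UTF-8 output."""
    _reject_unsupported(value)
    return json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


# Hypothetical receipt body; the real field set is not documented on this page.
receipt = {"status": 200, "provider": "anthropic", "module_chain": ["semantic-cache", "cost-guard"]}
canonical = canonicalize(receipt)
digest = hashlib.sha256(canonical).hexdigest()
print(canonical.decode("utf-8"))
print(digest)
```

Key order in the input does not affect the output bytes, so two parties holding the same receipt derive the same message to verify. A full verifier would then pass `canonical` and the receipt's signature to an Ed25519 implementation.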
memory_candidates queue. A reviewer in lair promotes useful candidates into patterns, which then ride into future calls via the patterns module.
hash_only is the default. We persist the four canonical sha-256 hashes plus receipt metadata. Plaintext request and response bodies are not stored unless you opt in to encrypted_at_rest, in which case they are sealed under your X25519 public key. Outcome notes are hashed server-side and never persist in raw form. Full matrix at /security/data-retention/.
module_chain tells you exactly which modules ran.
public_demo receipts you can click into. Do not paste secrets, private code, or customer data — sandbox receipts are public_demo by design.
prxy-monster-local is the MIT self-host edition. Same module API as cloud. No telemetry to prxy.monster, no hosted sync, no managed MPP. State (cache, patterns, archived context) lives in your local volume. Use it when policy or compliance forbids sending traffic through a hosted gateway.
/v1/agent/quote, and /v1/agent/sessions are live. Production Stripe SPT settlement is gated on Stripe Link SPT GA + production-payment credentials. Until that is configured, paid retries can return verification-failed. Managed MPP is $0.05 per call and includes the upstream model call. The protocol surface is wired end-to-end so you can build against it now.
prxy-monster-local, @prxy/module-sdk, and prxy-cli are MIT-licensed on npm. Self-host the entire pipeline on your own infrastructure for free. The hosted gateway, lair operator dashboard, and the receipts ledger surfaces are closed source.
Create your account, choose a plan, and continue through Stripe Checkout. When payment succeeds, your prxy_ API key is provisioned and emailed automatically.