Layers Partner API

Rate limits

Tiers, per-endpoint-class buckets, Retry-After, and how to stay under the line.


Every API key gets a rate-limit tier baked in at create time. The tier decides how many read-light, write-light, and long-running calls per minute you can make, plus how many jobs can be running at once. Hit the limit and you get 429 RATE_LIMITED with a Retry-After header — read it and sleep exactly that long.

Tiers

| Tier | Reads (rpm, read-light) | Writes (rpm, write-light) | Long-running starts (rpm) | Daily cap (write-light) |
|---|---|---|---|---|
| standard | 120 | 60 | 20 | 10,000 |
| pilot | 1,200 | 600 | 60 | 100,000 |
| partner | 6,000 | 3,000 | 300 | 500,000 |

standard is the default for any key created self-serve. pilot is negotiated headroom for design-partner accounts; partner is the GIC-shape tier for production integrations. An internal tier exists but is reserved for Layers-owned keys (ladmin, ops scripts); it will not appear on a partner-visible /whoami response.

Check your current tier with GET /v1/whoami:

{
  "organizationId": "2481fa5c-a404-...",
  "scopes": [],
  "rateLimitTier": "standard",
  "killSwitch": false
}
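The tier string in that response maps directly to the per-minute caps in the table above. A small client-side lookup (a hypothetical helper, not part of any official SDK) lets you assert a key is on the tier you expect before load-testing against it:

```python
# Per-minute caps per tier, transcribed from the tiers table above.
TIER_LIMITS = {
    "standard": {"read-light": 120, "write-light": 60, "long-running": 20},
    "pilot": {"read-light": 1200, "write-light": 600, "long-running": 60},
    "partner": {"read-light": 6000, "write-light": 3000, "long-running": 300},
}

def limits_for(whoami: dict) -> dict:
    """Return the per-minute caps for the tier reported by GET /v1/whoami."""
    return TIER_LIMITS[whoami["rateLimitTier"]]

# Example /v1/whoami body, shaped like the response shown above.
whoami = {
    "organizationId": "2481fa5c-a404-...",
    "scopes": [],
    "rateLimitTier": "standard",
    "killSwitch": False,
}

limits_for(whoami)["read-light"]   # 120
```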

Buckets are per endpoint class

One bucket for reads, one for writes, one for long-running starts. They're separate. A burst of content-generation POSTs cannot starve out your polling of /v1/jobs/:id. The endpoint class is fixed per route and we don't move routes between classes silently.

  • read-light — every GET. Polling jobs, reading metrics, listing scheduled posts.
  • write-light — PATCH, DELETE, and small POSTs that complete synchronously (approve, reject, create OAuth URL).
  • long-running — the POST that kicks off a job: ingest, generate, clone, create influencer.

The class is exposed on every response in the X-RateLimit-Endpoint-Class header so you know which bucket a call drew from.

Concurrent-jobs is a different dimension. It caps how many jobs can be in running state for your org at once, regardless of how fast you started them. Hitting it returns 429 on the POST that would exceed the cap. Wait for existing jobs to finish, then retry.
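A client-side pre-check for the concurrent-jobs cap can be sketched as below. The cap value and the set of running job IDs are things your client tracks; the server remains authoritative, so the job-starting POST can still return 429 if another worker got there first:

```python
def can_start_job(running_job_ids: set, concurrent_cap: int) -> bool:
    """Client-side pre-check before a job-starting POST.

    Advisory only: the server enforces the real cap, so a True here
    does not guarantee the POST will be accepted.
    """
    return len(running_job_ids) < concurrent_cap

running = {"job_a", "job_b"}
can_start_job(running, concurrent_cap=3)   # True: 2 running, room for 1 more
can_start_job(running, concurrent_cap=2)   # False: wait for a job to finish
```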

Response headers

Every response — success or error — carries the current state of your bucket. Log these if you want to see the limit approaching.

X-RateLimit-Endpoint-Class: read-light
X-RateLimit-Limit: 120
X-RateLimit-Remaining: 119
X-RateLimit-Reset: 1776572820
X-RateLimit-Tier: standard
  • X-RateLimit-Endpoint-Class — which bucket this call drew from (read-light, write-light, long-running).
  • X-RateLimit-Limit — your tier's per-minute cap for that class.
  • X-RateLimit-Remaining — tokens left in the current bucket.
  • X-RateLimit-Reset — Unix epoch seconds when the bucket refills.
  • X-RateLimit-Tier — your key's tier (standard, pilot, partner).

Every response reports exactly the bucket the call touched. To see the state of a different bucket, issue a request that hits it.
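A minimal parser for these headers, plus a low-water check. The header names are the ones documented above; the headers-as-a-dict-of-strings shape is an assumption about your HTTP client:

```python
def parse_bucket_state(headers: dict) -> dict:
    """Extract the rate-limit bucket state a response reports."""
    return {
        "endpoint_class": headers["X-RateLimit-Endpoint-Class"],
        "limit": int(headers["X-RateLimit-Limit"]),
        "remaining": int(headers["X-RateLimit-Remaining"]),
        "reset_epoch": int(headers["X-RateLimit-Reset"]),
        "tier": headers["X-RateLimit-Tier"],
    }

def nearly_exhausted(state: dict, threshold: float = 0.1) -> bool:
    """True when under 10% of the bucket remains -- time to back off."""
    return state["remaining"] < state["limit"] * threshold

# The example response headers from above.
example = {
    "X-RateLimit-Endpoint-Class": "read-light",
    "X-RateLimit-Limit": "120",
    "X-RateLimit-Remaining": "119",
    "X-RateLimit-Reset": "1776572820",
    "X-RateLimit-Tier": "standard",
}
state = parse_bucket_state(example)   # remaining 119 of 120: plenty left
```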

The 429 response

{
  "error": {
    "code": "RATE_LIMITED",
    "message": "Rate limit exceeded on write-light.",
    "requestId": "req_01HXZ9G7...",
    "details": {
      "endpointClass": "write-light",
      "retryAfterMs": 12400
    }
  }
}

Headers:

Retry-After: 13
X-RateLimit-Endpoint-Class: write-light
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1745000013

Sleep Retry-After seconds, then retry with the same Idempotency-Key. Don't jitter — the server staggered your bucket reset for you.
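The retry rule above can be reduced to one function: prefer the Retry-After header, fall back to details.retryAfterMs from the error body, and sleep exactly that long with no jitter. The helper name is illustrative:

```python
def retry_after_seconds(headers: dict, body: dict) -> float:
    """Seconds to sleep before retrying a 429, per the rule above.

    Prefer the Retry-After header; fall back to details.retryAfterMs
    from the error body. No jitter -- the server staggers bucket resets.
    """
    if "Retry-After" in headers:
        return float(headers["Retry-After"])
    return body["error"]["details"]["retryAfterMs"] / 1000.0

headers = {"Retry-After": "13"}
body = {"error": {"details": {"retryAfterMs": 12400}}}
retry_after_seconds(headers, body)   # 13.0, from the header
retry_after_seconds({}, body)        # 12.4, from the error body
```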

Don't build a fleet of workers that all watch one key. Ten workers hammering the same token bucket is ten workers serialized behind each other. Spread load across keys if you need real parallelism, or batch work before the call.

Kill switch

A per-key kill switch lets us (or you, via your Layers contact) flip a key off without revoking it. The key stays in the table; every request fails with 503 KILL_SWITCH.

{
  "error": {
    "code": "KILL_SWITCH",
    "message": "This API key has been temporarily disabled.",
    "requestId": "req_01HXZ9G7...",
    "details": { "scope": "key" }
  }
}

details.scope is one of:

  • key — your specific key is off. Contact us to turn it back on.
  • organization — your whole org is off. Same story.
  • global — the entire partner API is off. Incident only. Watch the status page and retry when we post recovery.
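Handling the three scopes reduces to a small dispatch on details.scope. The action strings here are illustrative summaries of the guidance above, not API values:

```python
def kill_switch_action(error_body: dict) -> str:
    """Map a 503 KILL_SWITCH error body to the operator action above."""
    scope = error_body["error"]["details"]["scope"]
    return {
        "key": "contact Layers to re-enable this key",
        "organization": "contact Layers to re-enable the organization",
        "global": "watch the status page and retry after recovery",
    }[scope]

# The example error body from above.
body = {
    "error": {
        "code": "KILL_SWITCH",
        "message": "This API key has been temporarily disabled.",
        "requestId": "req_01HXZ9G7...",
        "details": {"scope": "key"},
    }
}
kill_switch_action(body)   # "contact Layers to re-enable this key"
```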

We flip the kill switch for two reasons: a runaway client burning credits, or an upstream incident we need to contain. We tell you either way. It's not a silent fail.

Rate-limiter fallback

Our rate limiter has an in-memory fallback if its primary backend is unreachable. In that mode, the response carries X-RateLimit-Fallback: memory alongside the standard headers. This is intentional: we'd rather let you through than black-hole your traffic on our infrastructure problem. We flip the global kill switch if the downstream can't handle the flood.

You don't need to code for this. If the headers aren't there, treat it as "we don't know" and don't change your behavior.

Long-running jobs and rate limiting

Starting a job costs one token from the long-running bucket. Polling /v1/jobs/:id costs one read-light token per poll. A sensible poll loop at 2s intervals over a 60s job costs 30 reads — well under the standard tier's 120 rpm read cap.

Polling faster than once per second is a waste of your bucket. The job state doesn't advance faster than we can update it.
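The budget arithmetic above generalizes. A quick sketch that checks a chosen poll interval stays inside the read-light cap, including when several jobs poll in parallel (function names are illustrative):

```python
def polls_per_job(job_seconds: float, interval_seconds: float) -> int:
    """Read-light tokens one job's poll loop consumes over its lifetime."""
    return int(job_seconds // interval_seconds)

def fits_read_budget(interval_seconds: float, read_rpm: int,
                     concurrent_jobs: int) -> bool:
    """True when polling this many jobs at this interval stays under the cap."""
    polls_per_minute = (60 / interval_seconds) * concurrent_jobs
    return polls_per_minute <= read_rpm

polls_per_job(60, 2)          # 30 reads for a 60s job polled every 2s
fits_read_budget(2, 120, 1)   # True: 30 rpm against a 120 rpm cap
fits_read_budget(0.5, 120, 2) # False: 240 rpm blows the standard-tier cap
```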

Requesting an increase

Hit the standard ceiling and need room to test? Message your Layers contact with:

  • Your organizationId (from GET /v1/whoami).
  • Which bucket you're saturating (reads, writes, long-running, concurrent jobs).
  • A traffic shape — sustained rpm, burst ceiling, or both.

We read that and flip the tier. It takes minutes, not a ticket cycle.
