Layers Partner API

Rate limits

Tiers, per-endpoint-class buckets, Retry-After, and how to stay under the line.


Every API key gets a rate-limit tier baked in at create time. The tier decides how many read-light, write-light, and long-running calls per minute you can make, plus how many jobs can be running at once. Hit the limit and you get 429 RATE_LIMITED with a Retry-After header — read it and sleep exactly that long.

Tiers

| Tier | Reads (rpm, read-light) | Writes (rpm, write-light) | Long-running starts (rpm) | Daily cap (write-light) |
|---|---|---|---|---|
| standard | 120 | 60 | 20 | 10,000 |
| pilot | 1,200 | 600 | 60 | 100,000 |
| partner | 6,000 | 3,000 | 300 | 500,000 |

standard is the default for any key created self-serve. pilot is negotiated headroom for design-partner accounts; partner is the GIC-shape tier for production integrations. An internal tier exists but is reserved for Layers-owned keys (ladmin, ops scripts); it will not appear on a partner-visible /whoami response.

Check your current tier with GET /v1/whoami:

{
  "organizationId": "2481fa5c-a404-...",
  "scopes": [],
  "rateLimitTier": "standard",
  "killSwitch": false
}
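The tier string in that response maps directly to the per-minute caps in the table above. A small client-side lookup (a hypothetical helper, not part of any official SDK) lets you assert a key is on the tier you expect before load-testing against it:

```python
# Per-minute caps per tier, transcribed from the tiers table above.
TIER_LIMITS = {
    "standard": {"read-light": 120, "write-light": 60, "long-running": 20},
    "pilot": {"read-light": 1200, "write-light": 600, "long-running": 60},
    "partner": {"read-light": 6000, "write-light": 3000, "long-running": 300},
}

def limits_for(whoami: dict) -> dict:
    """Return the per-minute caps for the tier reported by GET /v1/whoami."""
    return TIER_LIMITS[whoami["rateLimitTier"]]

# Example /v1/whoami body, shaped like the response shown above.
whoami = {
    "organizationId": "2481fa5c-a404-...",
    "scopes": [],
    "rateLimitTier": "standard",
    "killSwitch": False,
}

limits_for(whoami)["read-light"]   # 120
```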

Buckets are per endpoint class

One bucket for reads, one for writes, one for long-running starts. They're separate. A burst of content-generation POSTs cannot starve out your polling of /v1/jobs/:id. The endpoint class is fixed per route and we don't move routes between classes silently.

  • read-light — every GET. Polling jobs, reading metrics, listing scheduled posts.
  • write-light — PATCH, DELETE, and small POSTs that complete synchronously (approve, reject, create OAuth URL).
  • long-running — the POST that kicks off a job: ingest, generate, clone, create influencer.

The class is exposed on every response in the X-RateLimit-Endpoint-Class header so you know which bucket a call drew from.

Concurrent-jobs is a different dimension. It caps how many jobs can be in running state for your org at once, regardless of how fast you started them. Hitting it returns 429 on the POST that would exceed the cap. Wait for existing jobs to finish, then retry.
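A client-side pre-check for the concurrent-jobs cap can be sketched as below. The cap value and the set of running job IDs are things your client tracks; the server remains authoritative, so the job-starting POST can still return 429 if another worker got there first:

```python
def can_start_job(running_job_ids: set, concurrent_cap: int) -> bool:
    """Client-side pre-check before a job-starting POST.

    Advisory only: the server enforces the real cap, so a True here
    does not guarantee the POST will be accepted.
    """
    return len(running_job_ids) < concurrent_cap

running = {"job_a", "job_b"}
can_start_job(running, concurrent_cap=3)   # True: 2 running, room for 1 more
can_start_job(running, concurrent_cap=2)   # False: wait for a job to finish
```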

Response headers

Every response — success or error — carries the current state of your bucket. Log these if you want to see the limit approaching.

X-RateLimit-Endpoint-Class: read-light
X-RateLimit-Limit: 120
X-RateLimit-Remaining: 119
X-RateLimit-Reset: 1776572820
X-RateLimit-Tier: standard
  • X-RateLimit-Endpoint-Class — which bucket this call drew from (read-light, write-light, long-running).
  • X-RateLimit-Limit — your tier's per-minute cap for that class.
  • X-RateLimit-Remaining — tokens left in the current bucket.
  • X-RateLimit-Reset — Unix epoch seconds when the bucket refills.
  • X-RateLimit-Tier — your key's tier (standard, pilot, partner).

Every response reports exactly the bucket the call touched. To see the state of a different bucket, issue a request that hits it.
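A minimal parser for these headers, plus a low-water check. The header names are the ones documented above; the headers-as-a-dict-of-strings shape is an assumption about your HTTP client:

```python
def parse_bucket_state(headers: dict) -> dict:
    """Extract the rate-limit bucket state a response reports."""
    return {
        "endpoint_class": headers["X-RateLimit-Endpoint-Class"],
        "limit": int(headers["X-RateLimit-Limit"]),
        "remaining": int(headers["X-RateLimit-Remaining"]),
        "reset_epoch": int(headers["X-RateLimit-Reset"]),
        "tier": headers["X-RateLimit-Tier"],
    }

def nearly_exhausted(state: dict, threshold: float = 0.1) -> bool:
    """True when under 10% of the bucket remains -- time to back off."""
    return state["remaining"] < state["limit"] * threshold

# The example response headers from above.
example = {
    "X-RateLimit-Endpoint-Class": "read-light",
    "X-RateLimit-Limit": "120",
    "X-RateLimit-Remaining": "119",
    "X-RateLimit-Reset": "1776572820",
    "X-RateLimit-Tier": "standard",
}
state = parse_bucket_state(example)   # remaining 119 of 120: plenty left
```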

The 429 response

{
  "error": {
    "code": "RATE_LIMITED",
    "message": "Rate limit exceeded on write-light.",
    "requestId": "req_01HXZ9G7...",
    "details": {
      "endpointClass": "write-light",
      "retryAfterMs": 12400
    }
  }
}

Headers:

Retry-After: 13
X-RateLimit-Endpoint-Class: write-light
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1745000013

Sleep Retry-After seconds, then retry with the same Idempotency-Key. Don't jitter — the server staggered your bucket reset for you.
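The retry rule above can be reduced to one function: prefer the Retry-After header, fall back to details.retryAfterMs from the error body, and sleep exactly that long with no jitter. The helper name is illustrative:

```python
def retry_after_seconds(headers: dict, body: dict) -> float:
    """Seconds to sleep before retrying a 429, per the rule above.

    Prefer the Retry-After header; fall back to details.retryAfterMs
    from the error body. No jitter -- the server staggers bucket resets.
    """
    if "Retry-After" in headers:
        return float(headers["Retry-After"])
    return body["error"]["details"]["retryAfterMs"] / 1000.0

headers = {"Retry-After": "13"}
body = {"error": {"details": {"retryAfterMs": 12400}}}
retry_after_seconds(headers, body)   # 13.0, from the header
retry_after_seconds({}, body)        # 12.4, from the error body
```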

Don't build a fleet of workers that all watch one key. Ten workers hammering the same token bucket is ten workers serialized behind each other. Spread load across keys if you need real parallelism, or batch work before the call.

Kill switch

A per-key kill switch lets us (or you, via your Layers contact) flip a key off without revoking it. The key stays in the table; every request fails with 503 KILL_SWITCH.

{
  "error": {
    "code": "KILL_SWITCH",
    "message": "This API key has been temporarily disabled.",
    "requestId": "req_01HXZ9G7...",
    "details": { "scope": "key" }
  }
}

details.scope is one of:

  • key — your specific key is off. Contact us to turn it back on.
  • organization — your whole org is off. Same story.
  • global — the entire partner API is off. Incident only. Watch the status page and retry when we post recovery.
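Handling the three scopes reduces to a small dispatch on details.scope. The action strings here are illustrative summaries of the guidance above, not API values:

```python
def kill_switch_action(error_body: dict) -> str:
    """Map a 503 KILL_SWITCH error body to the operator action above."""
    scope = error_body["error"]["details"]["scope"]
    return {
        "key": "contact Layers to re-enable this key",
        "organization": "contact Layers to re-enable the organization",
        "global": "watch the status page and retry after recovery",
    }[scope]

# The example error body from above.
body = {
    "error": {
        "code": "KILL_SWITCH",
        "message": "This API key has been temporarily disabled.",
        "requestId": "req_01HXZ9G7...",
        "details": {"scope": "key"},
    }
}
kill_switch_action(body)   # "contact Layers to re-enable this key"
```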

We flip the kill switch for two reasons: a runaway client burning credits, or an upstream incident we need to contain. We tell you either way. It's not a silent fail.

Rate-limiter fallback

Our rate limiter has an in-memory fallback if its primary backend is unreachable. In that mode, the response carries X-RateLimit-Fallback: memory alongside the standard headers. This is intentional: we'd rather let you through than black-hole your traffic on our infrastructure problem. We flip the global kill switch if the downstream can't handle the flood.

You don't need to code for this. If the headers aren't there, treat it as "we don't know" and don't change your behavior.

Long-running jobs and rate limiting

Starting a job costs one token from the long-running bucket. Polling /v1/jobs/:id costs one read-light token per poll. A sensible poll loop at 2s intervals over a 60s job costs 30 reads — well under the standard tier's 120 rpm read cap.

Polling faster than once per second is a waste of your bucket. The job state doesn't advance faster than we can update it.
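The budget arithmetic above generalizes. A quick sketch that checks a chosen poll interval stays inside the read-light cap, including when several jobs poll in parallel (function names are illustrative):

```python
def polls_per_job(job_seconds: float, interval_seconds: float) -> int:
    """Read-light tokens one job's poll loop consumes over its lifetime."""
    return int(job_seconds // interval_seconds)

def fits_read_budget(interval_seconds: float, read_rpm: int,
                     concurrent_jobs: int) -> bool:
    """True when polling this many jobs at this interval stays under the cap."""
    polls_per_minute = (60 / interval_seconds) * concurrent_jobs
    return polls_per_minute <= read_rpm

polls_per_job(60, 2)          # 30 reads for a 60s job polled every 2s
fits_read_budget(2, 120, 1)   # True: 30 rpm against a 120 rpm cap
fits_read_budget(0.5, 120, 2) # False: 240 rpm blows the standard-tier cap
```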

Requesting an increase

Hit the standard ceiling and need room to test? Message your Layers contact with:

  • Your organizationId (from GET /v1/whoami).
  • Which bucket you're saturating (reads, writes, long-running, concurrent jobs).
  • A traffic shape — sustained rpm, burst ceiling, or both.

We read that and flip the tier. It takes minutes, not a ticket cycle.
