# Rate limits (/docs/api/operational/rate-limits)



Every API key gets a rate-limit tier baked in at creation. The tier determines how many `read-light`, `write-light`, and `long-running` calls per minute you can make, plus how many jobs can be running at once. Hit a limit and you get `429 RATE_LIMITED` with a `Retry-After` header — read it and sleep exactly that long.

## Tiers [#tiers]

| Tier       | Reads (rpm, `read-light`) | Writes (rpm, `write-light`) | Long-running starts (rpm, `long-running`) | Daily cap (`write-light`) |
| ---------- | ------------------------- | --------------------------- | ----------------------------------------- | ------------------------- |
| `standard` | 120                       | 60                          | 20                                        | 10,000                    |
| `pilot`    | 1,200                     | 600                         | 60                                        | 100,000                   |
| `partner`  | 6,000                     | 3,000                       | 300                                       | 500,000                   |

`standard` is the default for any key created self-serve. `pilot` is negotiated headroom for design-partner accounts; `partner` is the GIC-shape tier for production integrations. An `internal` tier exists but is reserved for Layers-owned keys (ladmin, ops scripts); it will not appear on a partner-visible `/whoami` response.

Check your current tier with [`GET /v1/whoami`](/docs/api/reference/organizations/whoami):

```json
{
  "organizationId": "2481fa5c-a404-...",
  "scopes": [],
  "rateLimitTier": "standard",
  "killSwitch": false
}
```
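A minimal sketch of pulling the tier out of that response — `WHOAMI_BODY` is the example body above, stubbed as a string; in practice you'd use your HTTP client's parsed response:

```python
import json

# Example /v1/whoami body from the docs above, stubbed as a string.
WHOAMI_BODY = """
{
  "organizationId": "2481fa5c-a404-...",
  "scopes": [],
  "rateLimitTier": "standard",
  "killSwitch": false
}
"""

def current_tier(body: str) -> str:
    """Extract the rate-limit tier from a /v1/whoami response body."""
    return json.loads(body)["rateLimitTier"]

print(current_tier(WHOAMI_BODY))  # standard
```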

## Buckets are per endpoint class [#buckets-are-per-endpoint-class]

One bucket for reads, one for writes, one for long-running starts. They're separate. A burst of content-generation POSTs cannot starve out your polling of `/v1/jobs/:id`. The endpoint class is fixed per route and we don't move routes between classes silently.

* **`read-light`** — every `GET`. Polling jobs, reading metrics, listing scheduled posts.
* **`write-light`** — `PATCH`, `DELETE`, and small `POST`s that complete synchronously (approve, reject, create OAuth URL).
* **`long-running`** — `POST`s that kick off a job: ingest, generate, clone, create influencer.

The class is exposed on every response in the `X-RateLimit-Endpoint-Class` header so you know which bucket a call drew from.

The concurrent-jobs cap is a separate dimension. It limits how many jobs can be in the `running` state for your org at once, regardless of how fast you started them. Hitting it returns `429` on the POST that would exceed the cap. Wait for existing jobs to finish, then retry.
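One way to avoid eating those 429s is a client-side guard — a sketch using a semaphore, where `MAX_CONCURRENT` is an assumed figure (match it to your org's actual cap) and `post` is a stand-in for your HTTP client:

```python
import threading

# Assumed cap for illustration -- match this to your org's actual limit.
MAX_CONCURRENT = 3
_slots = threading.BoundedSemaphore(MAX_CONCURRENT)

def try_start_job(post, payload):
    """Start a job only if a local concurrency slot is free; return None otherwise.
    Call job_finished() once the job leaves `running` to free the slot."""
    if not _slots.acquire(blocking=False):
        return None  # at the cap locally; wait for a running job to finish
    try:
        return post("/v1/jobs", payload)  # the long-running POST
    except Exception:
        _slots.release()  # the start failed, so no job holds this slot
        raise

def job_finished():
    """Release a slot once a started job leaves the `running` state."""
    _slots.release()
```

The guard is advisory — the server still enforces the real cap — but it turns a 429 into a local `None` you can queue behind.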

## Response headers [#response-headers]

Every response — success or error — carries the current state of your bucket. Log these if you want to see the limit approaching.

```http
X-RateLimit-Endpoint-Class: read-light
X-RateLimit-Limit: 120
X-RateLimit-Remaining: 119
X-RateLimit-Reset: 1776572820
X-RateLimit-Tier: standard
```

* `X-RateLimit-Endpoint-Class` — which bucket this call drew from (`read-light`, `write-light`, `long-running`).
* `X-RateLimit-Limit` — your tier's per-minute cap for that class.
* `X-RateLimit-Remaining` — tokens left in the current bucket.
* `X-RateLimit-Reset` — Unix epoch seconds when the bucket refills.
* `X-RateLimit-Tier` — your key's tier (`standard`, `pilot`, `partner`).

Every response reports exactly the bucket the call touched. To see the state of a different bucket, issue a request that hits it.
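A sketch of reading those headers into something loggable — the header names are from the list above; the threshold and function names are illustrative:

```python
def bucket_state(headers: dict) -> dict:
    """Parse the rate-limit headers from a response; values arrive as strings."""
    return {
        "endpoint_class": headers["X-RateLimit-Endpoint-Class"],
        "limit": int(headers["X-RateLimit-Limit"]),
        "remaining": int(headers["X-RateLimit-Remaining"]),
        "reset_epoch": int(headers["X-RateLimit-Reset"]),
        "tier": headers["X-RateLimit-Tier"],
    }

def nearly_exhausted(headers: dict, threshold: float = 0.1) -> bool:
    """True when fewer than `threshold` of the bucket's tokens remain."""
    state = bucket_state(headers)
    return state["remaining"] < state["limit"] * threshold

# Headers from the example response above.
example = {
    "X-RateLimit-Endpoint-Class": "read-light",
    "X-RateLimit-Limit": "120",
    "X-RateLimit-Remaining": "119",
    "X-RateLimit-Reset": "1776572820",
    "X-RateLimit-Tier": "standard",
}
print(nearly_exhausted(example))  # False
```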

## The 429 response [#the-429-response]

```json
{
  "error": {
    "code": "RATE_LIMITED",
    "message": "Rate limit exceeded on write-light.",
    "requestId": "req_01HXZ9G7...",
    "details": {
      "endpointClass": "write-light",
      "retryAfterMs": 12400
    }
  }
}
```

Headers:

```http
Retry-After: 13
X-RateLimit-Endpoint-Class: write-light
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1745000013
```

Sleep `Retry-After` seconds, then retry with the same `Idempotency-Key`. Don't jitter — the server staggered your bucket reset for you.
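That retry loop can be sketched like this — `send` is a stand-in for your HTTP client (anything returning an object with `.status_code` and `.headers`), and the stub transport below exists only so the example runs:

```python
import time
from types import SimpleNamespace

def post_with_retry(send, url, body, idempotency_key, max_attempts=5):
    """Retry a write on 429, sleeping exactly Retry-After seconds (no jitter).
    The same Idempotency-Key is sent on every attempt so a retried write
    cannot double-apply."""
    for _ in range(max_attempts):
        resp = send(url, body, headers={"Idempotency-Key": idempotency_key})
        if resp.status_code != 429:
            return resp
        time.sleep(int(resp.headers.get("Retry-After", "1")))
    raise RuntimeError(f"rate limited after {max_attempts} attempts")

# Stub transport: rate-limited once, then succeeds. Retry-After is 0 so
# the demo doesn't actually sleep.
calls = []
def fake_send(url, body, headers):
    calls.append(headers["Idempotency-Key"])
    if len(calls) == 1:
        return SimpleNamespace(status_code=429, headers={"Retry-After": "0"})
    return SimpleNamespace(status_code=200, headers={})

resp = post_with_retry(fake_send, "/v1/posts/p_123/approve", {}, "idem_abc")
print(resp.status_code, calls)  # 200 ['idem_abc', 'idem_abc']
```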

<Callout type="warn">
  Don't build a fleet of workers that all watch one key. Ten workers hammering
  the same token bucket is ten workers serialized behind each other. Spread
  load across keys if you need real parallelism, or batch work before the call.
</Callout>

## Kill switch [#kill-switch]

A per-key kill switch lets us (or you, via your Layers contact) flip a key off without revoking it. The key stays in the table; every request fails with `503 KILL_SWITCH`.

```json
{
  "error": {
    "code": "KILL_SWITCH",
    "message": "This API key has been temporarily disabled.",
    "requestId": "req_01HXZ9G7...",
    "details": { "scope": "key" }
  }
}
```

`details.scope` is one of:

* `key` — your specific key is off. Contact us to turn it back on.
* `organization` — your whole org is off. Same story.
* `global` — the entire partner API is off. Incident only. Watch the status page and retry when we post recovery.
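The three scopes map to three distinct next steps, so a dispatch on `details.scope` is worth making explicit — a sketch over the error shape above, with illustrative message strings:

```python
def kill_switch_action(error: dict) -> str:
    """Map a KILL_SWITCH error body to the recommended next step."""
    scope = error["details"]["scope"]
    if scope == "key":
        return "contact Layers to re-enable this key"
    if scope == "organization":
        return "contact Layers to re-enable the organization"
    if scope == "global":
        return "watch the status page and retry after recovery is posted"
    raise ValueError(f"unknown kill-switch scope: {scope}")
```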

We flip the kill switch for two reasons: a runaway client burning credits, or an upstream incident we need to contain. We tell you either way. It's not a silent fail.

## Rate-limiter fallback [#rate-limiter-fallback]

Our rate limiter has an in-memory fallback if its primary backend is unreachable. In that mode, the response carries `X-RateLimit-Fallback: memory` alongside the standard headers. This is intentional: we'd rather let you through than black-hole your traffic on our infrastructure problem. We flip the global kill switch if the downstream can't handle the flood.

You don't need to code for this. If the headers aren't there, treat it as "we don't know" and don't change your behavior.

## Long-running jobs and rate limiting [#long-running-jobs-and-rate-limiting]

Starting a job costs one token from the `long-running` bucket. Polling `/v1/jobs/:id` costs one `read-light` token per poll. A sensible poll loop at 2s intervals over a 60s job costs 30 reads — well under the `standard` tier's 120 rpm read cap.

Polling faster than once per second is a waste of your bucket. The job state doesn't advance faster than we can update it.
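The poll loop above can be sketched as follows — `get_job` is a stand-in for whatever issues the `GET /v1/jobs/:id`, and the timeout is an illustrative default:

```python
import time

def poll_job(get_job, job_id, interval_s=2.0, timeout_s=120.0):
    """Poll a job until it leaves `running`. Each call costs one read-light
    token; 2 s between polls keeps a 60 s job at ~30 reads, well inside the
    standard tier's 120 rpm read cap."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        job = get_job(job_id)  # GET /v1/jobs/:id
        if job["status"] != "running":
            return job
        time.sleep(interval_s)
    raise TimeoutError(f"job {job_id} still running after {timeout_s}s")

# Stub for demonstration: running twice, then done (interval 0 so the
# demo doesn't actually sleep).
states = iter(["running", "running", "succeeded"])
job = poll_job(lambda _id: {"status": next(states)}, "job_1", interval_s=0)
print(job["status"])  # succeeded
```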

## Requesting an increase [#requesting-an-increase]

Hit the `standard` ceiling and need room to test? Message your Layers contact with:

* Your `organizationId` (from `GET /v1/whoami`).
* Which bucket you're saturating (reads, writes, long-running, concurrent jobs).
* A traffic shape — sustained rpm, burst ceiling, or both.

We read that and flip the tier. It takes minutes, not a ticket cycle.

## See also [#see-also]

* [Errors](/docs/api/operational/errors)
* [Jobs](/docs/api/concepts/jobs)
* [Idempotency](/docs/api/operational/idempotency)
