Rate Limiting and API Throttling in Go for Multi-Tenant SaaS

Rate limiting is how you protect your SaaS infrastructure from one tenant overloading the system for everyone else. Getting it wrong means either throttling legitimate users unnecessarily or letting a badly written API client bring down your database. This is how we implement per-tenant rate limiting in Go SaaS backends.

Why does multi-tenant rate limiting matter specifically?

In a single-tenant application, rate limiting protects your infrastructure from external abuse. In a multi-tenant SaaS, it protects tenants from each other.

A tenant with a badly configured integration might issue hundreds of API requests per second in a tight loop. Without per-tenant rate limiting, those requests consume database connections, saturate worker pools, and degrade API response times for every other tenant on the platform. The tenant generating the traffic may not even notice the problem, because their own requests keep succeeding.

For SaaS platforms serving Lebanese and Gulf enterprise customers, where API integrations are often built by in-house IT teams under time pressure, poorly written API clients are the norm rather than the exception. Rate limiting is not pessimism; it is operational hygiene.

Choosing the right rate limiting algorithm

Three algorithms cover most use cases:

Fixed window counts requests in a time window (e.g., 1000 requests per minute). Simple to implement, but has a boundary problem: a burst of 1000 requests at 12:00:59 followed by 1000 requests at 12:01:01 totals 2000 requests in two seconds while passing both window checks.

Sliding window computes the request count over a rolling window anchored to the current time, not to fixed clock boundaries. More accurate than fixed window, slightly more expensive to implement correctly in a distributed system.

Token bucket maintains a bucket of tokens that fills at a constant rate. Each request consumes one token. If the bucket is empty, the request is throttled. The bucket can hold a maximum number of tokens, which defines the burst capacity.

The token bucket is the right choice for most SaaS APIs because it allows short bursts above the average rate while enforcing the long-term rate limit. An integration that fires 50 requests in two seconds then is quiet for a minute is handled correctly: the burst is consumed from the bucket, and the bucket refills during the quiet period.

Implementing token bucket rate limiting in Go with Redis

The token bucket must be implemented in a way that is correct across multiple API server instances. A simple in-memory bucket per process is not correct: requests from the same tenant can hit different server instances, each instance enforces the full limit on its own, and a tenant spread across N instances can end up with as much as N times its quota.

Redis with a Lua script provides an atomic token bucket that is consistent across instances:

-- Token bucket in Redis Lua
-- KEYS[1]: bucket key
-- ARGV[1]: max_tokens, ARGV[2]: refill_rate (tokens/second)
-- ARGV[3]: cost (tokens per request), ARGV[4]: now (unix timestamp float)

local key = KEYS[1]
local max_tokens = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])
local cost = tonumber(ARGV[3])
local now = tonumber(ARGV[4])

local data = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(data[1]) or max_tokens
local last_refill = tonumber(data[2]) or now

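-- Refill tokens based on elapsed time. `now` comes from the caller's clock,
-- so never let it move backwards: clock skew between API servers must not
-- drain the bucket or rewind last_refill.
now = math.max(now, last_refill)
local elapsed = now - last_refill
local new_tokens = math.min(max_tokens, tokens + elapsed * refill_rate)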

if new_tokens < cost then
    -- Not enough tokens
    redis.call('HSET', key, 'tokens', new_tokens, 'last_refill', now)
    redis.call('EXPIRE', key, 3600)
    return {0, math.ceil((cost - new_tokens) / refill_rate * 1000)}
end

-- Consume tokens
new_tokens = new_tokens - cost
redis.call('HSET', key, 'tokens', new_tokens, 'last_refill', now)
redis.call('EXPIRE', key, 3600)
return {1, 0} -- allowed, retry_after_ms

The Lua script executes atomically in Redis, so there are no race conditions even with thousands of concurrent requests across multiple API server instances.

The Go wrapper:

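import (
    "context"
    "fmt"
    "time"

    // Assumed client library: Script.Run with a context matches go-redis v8 and v9.
    "github.com/redis/go-redis/v9"
)

// RateLimiter runs the token bucket Lua script against a shared Redis.
type RateLimiter struct {
    redis  *redis.Client
    script *redis.Script // the Lua script above, wrapped with redis.NewScript
}

// Allow consumes one token from the tenant's bucket. It reports whether the
// request may proceed and, if not, how long the caller should wait.
func (rl *RateLimiter) Allow(ctx context.Context, tenantID string, maxTokens, refillRate float64) (bool, time.Duration, error) {
    key := fmt.Sprintf("rate_limit:tenant:%s", tenantID)
    now := float64(time.Now().UnixNano()) / 1e9 // Unix time in fractional seconds

    result, err := rl.script.Run(ctx, rl.redis,
        []string{key},
        maxTokens, refillRate, 1, now,
    ).Int64Slice()
    if err != nil {
        return false, 0, err
    }
    if len(result) != 2 {
        return false, 0, fmt.Errorf("rate limit script returned %d values, want 2", len(result))
    }

    allowed := result[0] == 1
    retryAfter := time.Duration(result[1]) * time.Millisecond
    return allowed, retryAfter, nil
}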

Middleware integration

The rate limiter integrates as HTTP middleware that runs before any handler logic:

// Requires imports: fmt, log, math, net/http, strconv.
func RateLimitMiddleware(limiter *RateLimiter, limits TenantLimitResolver) func(http.Handler) http.Handler {
    return func(next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            tenantID := TenantFromContext(r.Context())
            maxTokens, refillRate := limits.Resolve(tenantID)

            allowed, retryAfter, err := limiter.Allow(r.Context(), tenantID, maxTokens, refillRate)
            if err != nil {
                // On Redis failure, fail open (allow request through),
                // but log it so the outage is visible.
                log.Printf("rate limiter unavailable, failing open: tenant=%s err=%v", tenantID, err)
                next.ServeHTTP(w, r)
                return
            }

            // Sent on both allowed and throttled responses.
            w.Header().Set("X-RateLimit-Limit", fmt.Sprintf("%.0f", maxTokens))

            if !allowed {
                // Round up: formatting 400ms as "0" would invite an immediate retry.
                seconds := int(math.Ceil(retryAfter.Seconds()))
                w.Header().Set("Retry-After", strconv.Itoa(seconds))
                http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
                return
            }

            next.ServeHTTP(w, r)
        })
    }
}

Notice the fail-open behavior when Redis is unavailable. Rate limiting is important, but it should not be a single point of failure. If Redis goes down, the API continues to serve requests rather than blocking all traffic.

Per-plan limits and tenant resolution

Different subscription plans warrant different rate limits. A free plan customer should not get the same rate limit as an enterprise customer on a dedicated contract.

The TenantLimitResolver interface allows limits to come from any source: database, environment variables, or a configuration service:

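// TenantLimitResolver is what the middleware depends on.
type TenantLimitResolver interface {
    Resolve(tenantID string) (maxTokens, refillRate float64)
}

type TenantLimits struct {
    MaxTokens  float64 // burst capacity
    RefillRate float64 // tokens per second
}

// Resolve returns the tenant's limits, reading through a 5-minute cache.
func (r *DatabaseLimitResolver) Resolve(tenantID string) (float64, float64) {
    limits, ok := r.cache.Get(tenantID)
    if !ok {
        limits = r.fetchFromDB(tenantID)
        r.cache.Set(tenantID, limits, 5*time.Minute)
    }
    return limits.MaxTokens, limits.RefillRate
}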

Caching limits for 5 minutes avoids a database lookup on every request; the trade-off is that a plan change can take up to 5 minutes to apply.

A reasonable limit structure for a SaaS API:

| Plan       | Burst (tokens) | Rate (req/second) |
|------------|----------------|-------------------|
| Free       | 20             | 2                 |
| Startup    | 100            | 10                |
| Growth     | 500            | 50                |
| Enterprise | 2000           | 200               |

Handling bursts from legitimate integrations

Not all burst traffic is abusive. An integration that processes end-of-day reconciliation data, a mobile app that syncs when it comes back online, or a restaurant POS system that flushes its offline queue after a network reconnect will generate burst traffic that is legitimate and expected.

Design the burst capacity to accommodate these patterns. An enterprise restaurant chain in Lebanon that goes offline during a power cut and reconnects with 30 minutes of queued transactions should not be throttled for doing exactly what the system was designed to handle.

For MENA SaaS platforms specifically, brief internet outages followed by reconnection bursts are common enough that burst capacity should be sized generously. A restaurant POS customer who cannot sync their queued transactions after a power outage is a support call, a chargeback request, and a cancellation risk.

Surfacing rate limit information to API consumers

API consumers cannot manage their integration correctly if they do not know they are being rate-limited or how close they are to the limit.

Conventional headers to include on every response:

X-RateLimit-Limit: 100
X-RateLimit-Remaining: 73
X-RateLimit-Reset: 1716000060
Retry-After: 12  (only on 429 responses)

Well-behaved API clients read these headers and back off proactively before hitting the limit. Poor API clients that ignore these headers and continue until they receive 429 errors are exactly the ones rate limiting protects you from.

Key lessons from production

Fail open on Redis failure. Rate limiting should not be a single point of failure for your API. An unavailable Redis instance should result in unthrottled requests, not 503 errors for all tenants.

Burst capacity is a feature, not a safety risk. SaaS customers legitimately need to burst above their average rate. The token bucket algorithm handles this correctly without requiring special-case code.

Log every throttled request with tenant_id and rate limit context. Throttling is a signal, not just a protection mechanism. Tenants consistently hitting their limit are candidates for plan upgrades, and the data should be visible in your analytics.

Test rate limiting explicitly in your integration test suite. A race condition in rate limit logic is one of the most dangerous bugs in a multi-tenant system because it can allow one tenant to starve another of API capacity.
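
The core assertion of such a test is that concurrent callers can never be admitted more often than the bucket allows. The sketch below shows the shape of that check against an in-memory stand-in with no refill; a real integration test would make the same assertion against the Redis-backed limiter. `safeBucket` and `CountAdmitted` are hypothetical names, and running the check under `go test -race` is advisable.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// safeBucket is an in-memory stand-in with no refill, so the number of
// admitted requests has exactly one correct value: the starting capacity.
type safeBucket struct {
	mu     sync.Mutex
	tokens int
}

// Allow takes one token under the lock, or reports false if none are left.
func (b *safeBucket) Allow() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.tokens == 0 {
		return false
	}
	b.tokens--
	return true
}

// CountAdmitted starts `callers` goroutines that each make one request
// and returns how many were admitted.
func CountAdmitted(capacity, callers int) int {
	b := &safeBucket{tokens: capacity}
	var admitted int64
	var wg sync.WaitGroup
	for i := 0; i < callers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if b.Allow() {
				atomic.AddInt64(&admitted, 1)
			}
		}()
	}
	wg.Wait()
	return int(atomic.LoadInt64(&admitted))
}
```

Removing the mutex from `Allow` turns this into a check-then-act race, which is the kind of bug that lets one tenant exceed its quota at everyone else's expense.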
