PostgreSQL Read Replicas for Multi-Tenant SaaS: Routing, Lag, and Connection Management

Q: What should you monitor for replica health?

Three metrics cover most replica operational concerns Replication lag in seconds: query pglastxactreplaytimestamp() on the replica every 30 seconds. Alert when lag exceeds 10 seconds for a database handling near-real-time data, or 60 seconds for an analytics replica where some lag is expected.

Adding a read replica to a PostgreSQL SaaS database sounds straightforward until you hit replication lag on a tenant who just created a record and immediately tries to read it back. This is the operational reality of read replicas in multi-tenant systems, and here is how we handle routing, lag, and connection management without introducing a new class of consistency bugs.

Adding a read replica to a PostgreSQL SaaS database sounds straightforward until you hit replication lag on a tenant who just created a record and immediately tries to read it back. The application reads stale data, the tenant reports a bug, and you spend a day convincing yourself the write actually succeeded. This is the operational reality of read replicas in multi-tenant systems, and here is how we handle routing, lag, and connection management without introducing a new class of consistency bugs.

When do read replicas actually help multi-tenant SaaS workloads?

Read replicas help when your SaaS workload is genuinely read-heavy and the reads are not immediately following writes. Reporting, analytics, list views, and search queries that run on data that is minutes or hours old are ideal for replica routing.

Read replicas do not help with write-heavy workloads. PostgreSQL streaming replication is asynchronous by default. Writes do not wait for replica acknowledgment, so adding a replica to a write-heavy database does not reduce primary load meaningfully.

For a typical SaaS application serving Lebanese and Gulf enterprise customers, the workload breakdown tends to be:

Dashboard and reporting: 60 to 70% of queries, read-only, data can be seconds old
List views and search: 20 to 25% of queries, read-only, acceptable replication lag
Write operations and immediate read-back: 10 to 15% of queries, must go to primary

Read replicas reduce primary load by routing that 60 to 70% to replicas, which also reduces connection pressure on the primary.

How do you route queries between primary and replica in Go?

The simplest and safest routing strategy is explicit per-query routing using two separate database pools: one pointing to the primary and one to the replica. The application code calls the right pool for the right query type:

type DB struct {
    primary *sqlx.DB
    replica *sqlx.DB
}

func (d *DB) Primary() *sqlx.DB { return d.primary }
func (d *DB) Replica() *sqlx.DB { return d.replica }

Repository code makes the routing decision explicitly:

func (r *OrderRepo) ListOrdersForReport(ctx context.Context, tenantID uuid.UUID) ([]Order, error) {
    // Dashboard report — stale data acceptable
    return r.db.Replica().SelectContext(ctx, ...)
}

func (r *OrderRepo) GetOrderAfterCreate(ctx context.Context, id uuid.UUID) (*Order, error) {
    // Follows a write — must go to primary
    return r.db.Primary().GetContext(ctx, ...)
}

This requires developers to think about routing at the call site, which is explicit, auditable, and prevents accidental replica reads after writes. An alternative is middleware-level routing based on HTTP method: GET requests use replica, POST/PUT/PATCH/DELETE use primary. This works for simple API surfaces but breaks down when GET endpoints follow writes (for example, a redirect-after-create that returns the new resource).

How do you handle the write-then-read consistency problem?

The write-then-read problem is the most common operational issue with read replicas. A tenant creates an order, the write commits on the primary, and before the WAL propagates to the replica, the client reloads the page and queries the replica for the new order. The order is not there yet.

Three approaches work in production:

Read-your-writes via context flag: after a write, mark the request context to require primary for subsequent reads within the same request:

type contextKey string
const mustUsePrimaryKey contextKey = "mustUsePrimary"

func (d *DB) ForRead(ctx context.Context) *sqlx.DB {
    if ctx.Value(mustUsePrimaryKey) == true {
        return d.primary
    }
    return d.replica
}

func MarkMustUsePrimary(ctx context.Context) context.Context {
    return context.WithValue(ctx, mustUsePrimaryKey, true)
}

Replication lag check: read the replica's replication lag before routing. If lag exceeds a threshold, fall back to primary:

func (d *DB) ForReadWithLagCheck(ctx context.Context, maxLagSeconds float64) *sqlx.DB {
    var lag float64
    d.replica.QueryRowContext(ctx,
        `SELECT EXTRACT(EPOCH FROM (NOW() - pg_last_xact_replay_timestamp()))`,
    ).Scan(&lag)
    if lag > maxLagSeconds {
        return d.primary
    }
    return d.replica
}

This adds one round-trip per read, but the lag check query is cheap and the overhead is acceptable for dashboard endpoints.

Accept eventual consistency explicitly: for features where stale reads are genuinely acceptable (reporting, read-only admin views), document this in the code and return appropriate cache-control headers. This is the cleanest approach when the feature semantics actually support it.

How do you configure connection pools for primary and replica?

Primary and replica pools need different tuning. The primary handles reads and writes with lock contention from writes limiting effective concurrency. The replica handles only reads and supports more concurrent connections without the same pressure.

A reasonable starting configuration on AWS RDS db.t4g.medium:

primaryDB.SetMaxOpenConns(20)
primaryDB.SetMaxIdleConns(5)
primaryDB.SetConnMaxLifetime(30 * time.Minute)

replicaDB.SetMaxOpenConns(40)
replicaDB.SetMaxIdleConns(10)
replicaDB.SetConnMaxLifetime(30 * time.Minute)

Monitor pg_stat_activity on both instances. If the primary consistently shows more than 15 active connections on a t4g.medium, you are running close to the limit and should add PgBouncer in front of the primary. A previous post in this series covers PgBouncer configuration in detail.

What should you monitor for replica health?

Three metrics cover most replica operational concerns:

Replication lag in seconds: query pg_last_xact_replay_timestamp() on the replica every 30 seconds. Alert when lag exceeds 10 seconds for a database handling near-real-time data, or 60 seconds for an analytics replica where some lag is expected.

Replica connections versus limit: track pg_stat_activity on the replica the same way as the primary. A replica running out of connections due to a dashboard query spike will start refusing read connections, forcing fallback to the primary.

WAL receiver status: pg_stat_wal_receiver shows whether the replica is actively receiving WAL from the primary. A stopped WAL receiver is an early signal of replica lag before it shows up in the lag metric.

Set up a Prometheus exporter (postgres_exporter) on both primary and replica. A Grafana dashboard with lag, active connections, and query rate per instance gives you the visibility to diagnose replica issues before tenants report them.

Key lessons from production

Instrument which database every query hits before adding a replica. Without query-level database tagging, you cannot verify that routing is working correctly, and you cannot debug write-then-read consistency issues when they appear.

Start routing only reporting and analytics queries to the replica. Do not attempt to route all reads immediately. Add categories incrementally, verify consistency in each batch, then move to the next.

Test replica failover. If your primary fails and AWS promotes the replica to primary, your application needs to reconnect to the new primary endpoint. Verify your connection retry logic handles this before it happens in production at 3am.

When do read replicas actually help multi-tenant SaaS workloads?

How do you route queries between primary and replica in Go?

How do you handle the write-then-read consistency problem?

How do you configure connection pools for primary and replica?

What should you monitor for replica health?

Key lessons from production

Not sure where to start?

Keep reading

Feature Flags in Production Go SaaS: Deploying Features Without Restarts

Production Audit Logs in Go SaaS: Design, Storage, and What Most Teams Get Wrong

Go API Testing in Production SaaS: The Patterns That Actually Catch Real Bugs