Structured Logging in Go Production: Context That Actually Helps

Logs that say "user not found" or "payment failed" without any surrounding context are the most expensive kind of noise in production. You find the log line but not the cause: which user, which request path, which tenant, what the system state was three hops earlier. Structured logging with consistent context propagation solves this, and Go 1.21's built-in slog package makes it the natural default for new services in Lebanon and across the MENA region.

What breaks without structured logging in production

The symptom shows up during incidents. An on-call engineer opens CloudWatch or Datadog, searches for an error string, and finds 400 log lines with no context. Each one says "database query failed" with a generic stack trace. Without a request ID, a tenant ID, or an operation name, the only way to trace one failure from HTTP entry to the error site is to line up timestamps by hand and hope the volume is low enough for that to work.

This is not a hypothetical. Every Go service that starts with fmt.Println or log.Printf ends here within six months of real production traffic.

How slog changes the model in Go 1.21

slog ships with Go 1.21 as the standard library's structured logger, imported as log/slog. It centers on three things: key-value attributes attached to individual log calls, a Logger that carries a persistent set of attributes across calls, and pluggable handlers that format output as JSON or text.
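As a minimal sketch of those three pieces, the following wires a JSON handler to a writer, attaches one persistent attribute with With, and adds per-call attributes on a single record. The names newJSONLogger, logOrderCreated, and demoRecord, and the "service" and "order_id" fields, are illustrative assumptions, not part of slog.

```go
package main

import (
	"bytes"
	"encoding/json"
	"io"
	"log/slog"
)

// newJSONLogger returns a logger that writes one JSON object per record to w.
// In a service, w would normally be os.Stdout.
func newJSONLogger(w io.Writer) *slog.Logger {
	return slog.New(slog.NewJSONHandler(w, nil))
}

// logOrderCreated shows both ways of attaching attributes.
func logOrderCreated(base *slog.Logger, orderID string, items int) {
	// Persistent attribute: every call made through svc carries it.
	svc := base.With(slog.String("service", "orders"))

	// Per-call attributes: attached to this record only.
	svc.Info("order created",
		slog.String("order_id", orderID),
		slog.Int("items", items),
	)
}

// demoRecord logs into a buffer and decodes the JSON line so the
// resulting fields can be inspected.
func demoRecord() map[string]any {
	var buf bytes.Buffer
	logOrderCreated(newJSONLogger(&buf), "ord_123", 3)

	var rec map[string]any
	if err := json.Unmarshal(buf.Bytes(), &rec); err != nil {
		panic(err)
	}
	return rec
}
```

Swapping slog.NewJSONHandler for slog.NewTextHandler changes only the output format; the calling code stays the same.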

The model that works for SaaS backends is a request-scoped logger: one logger per request, created at the HTTP or gRPC entry point, pre-populated with fields that identify the request, then threaded through context so downstream functions can log to it without knowing the request details themselves.

func NewRequestLogger(parent *slog.Logger, r *http.Request, tenantID, requestID string) *slog.Logger {
    return parent.With(
        slog.String("tenant_id", tenantID),
        slog.String("request_id", requestID),
        slog.String("method", r.Method),
        slog.String("path", r.URL.Path),
    )
}

Every log call made with this logger carries the four fields automatically. Middleware attaches the logger to the request context so any function downstream can retrieve it without receiving it as an explicit argument.

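// contextKey is an unexported type, so no other package can collide with this key.
type contextKey struct{}

// WithLogger returns a copy of ctx that carries the request-scoped logger.
func WithLogger(ctx context.Context, logger *slog.Logger) context.Context {
    return context.WithValue(ctx, contextKey{}, logger)
}

// LoggerFromCtx returns the logger stored in ctx, or slog.Default() when none
// was attached, so callers never receive a nil logger.
func LoggerFromCtx(ctx context.Context) *slog.Logger {
    if l, ok := ctx.Value(contextKey{}).(*slog.Logger); ok {
        return l
    }
    return slog.Default()
}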

Database functions, third-party API callers, and queue handlers all use the same request-correlated logger without any coupling to the HTTP layer.
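One way to wire these helpers together is a small middleware, sketched below. It assumes the tenant and request IDs arrive in X-Tenant-ID and X-Request-ID headers, which is a simplification for illustration; a real service would more likely resolve the tenant from authentication. LoggingMiddleware, loadInvoice, and demoRequest are hypothetical names.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"log/slog"
	"net/http"
	"net/http/httptest"
)

type contextKey struct{}

func WithLogger(ctx context.Context, logger *slog.Logger) context.Context {
	return context.WithValue(ctx, contextKey{}, logger)
}

func LoggerFromCtx(ctx context.Context) *slog.Logger {
	if l, ok := ctx.Value(contextKey{}).(*slog.Logger); ok {
		return l
	}
	return slog.Default()
}

// LoggingMiddleware builds the request-scoped logger and stores it in the
// request context. Reading the IDs from headers is an assumption of this sketch.
func LoggingMiddleware(base *slog.Logger, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		logger := base.With(
			slog.String("tenant_id", r.Header.Get("X-Tenant-ID")),
			slog.String("request_id", r.Header.Get("X-Request-ID")),
			slog.String("method", r.Method),
			slog.String("path", r.URL.Path),
		)
		next.ServeHTTP(w, r.WithContext(WithLogger(r.Context(), logger)))
	})
}

// loadInvoice stands in for a data-layer function. It sees only a context,
// never an *http.Request, yet its log line carries the request fields.
func loadInvoice(ctx context.Context, invoiceID string) {
	LoggerFromCtx(ctx).Info("invoice loaded", slog.String("invoice_id", invoiceID))
}

// demoRequest sends one request through the middleware and returns the
// decoded log record written by loadInvoice.
func demoRequest() map[string]any {
	var buf bytes.Buffer
	base := slog.New(slog.NewJSONHandler(&buf, nil))
	h := LoggingMiddleware(base, http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		loadInvoice(r.Context(), "inv_42")
	}))

	req := httptest.NewRequest(http.MethodGet, "/invoices/inv_42", nil)
	req.Header.Set("X-Tenant-ID", "acme")
	req.Header.Set("X-Request-ID", "req-1")
	h.ServeHTTP(httptest.NewRecorder(), req)

	var rec map[string]any
	if err := json.Unmarshal(buf.Bytes(), &rec); err != nil {
		panic(err)
	}
	return rec
}
```

The record produced inside loadInvoice contains tenant_id, request_id, method, and path even though the function only supplied invoice_id.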

What fields belong in every log line

In production MENA SaaS deployments, the field set that consistently makes incidents faster to resolve is:

tenant_id: required on every log line in a multi-tenant system. Without it, you cannot filter to one customer's data during an incident.
request_id: a UUID generated at the entry point. Correlates all log lines for one HTTP request across services if you propagate it as a header.
trace_id: if you run distributed tracing with OpenTelemetry, attach the trace ID so logs and traces are joinable.
user_id: where the request is authenticated. Never log session tokens or passwords.
operation: the named operation being performed, not just the HTTP method and path.
duration_ms: for any measurable operation. Logging duration at the site of the operation instead of only at middleware level lets you see which sub-operation inside a request is slow.

Keep the base context small. A log line with 20 fields is harder to read than one with 6 focused ones. Add fields at the call site only when they add information the base context does not carry.
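A hedged sketch of adding operation and duration_ms at the call site rather than in the base context: the helper below wraps a sub-operation, times it, and logs both fields on whatever logger it is given. The helper name timed and the operation name "invoice.render" are assumptions for illustration. The error is returned unlogged so the caller decides where to report it.

```go
package main

import (
	"bytes"
	"encoding/json"
	"log/slog"
	"time"
)

// timed runs fn and records how long it took under the given operation name.
// It does not log the error; it only returns it to the caller.
func timed(logger *slog.Logger, operation string, fn func() error) error {
	start := time.Now()
	err := fn()
	logger.Info("operation finished",
		slog.String("operation", operation),
		slog.Int64("duration_ms", time.Since(start).Milliseconds()),
	)
	return err
}

// demoTimed times a fake 5 ms operation and returns the decoded log record.
func demoTimed() map[string]any {
	var buf bytes.Buffer
	logger := slog.New(slog.NewJSONHandler(&buf, nil))

	_ = timed(logger, "invoice.render", func() error {
		time.Sleep(5 * time.Millisecond) // stand-in for real work
		return nil
	})

	var rec map[string]any
	if err := json.Unmarshal(buf.Bytes(), &rec); err != nil {
		panic(err)
	}
	return rec
}
```

Passing the request-scoped logger into timed means the duration line also carries tenant_id and request_id without the helper knowing about either.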

Handling errors in structured logs

The most useful pattern for error logs is to attach the error as a structured field rather than formatting it into the message string:

if err != nil {
    logger.Error("failed to charge payment",
        slog.String("payment_id", paymentID),
        slog.String("error", err.Error()),
    )
    return err
}

When errors wrap upstream errors with fmt.Errorf("%w", err), log the full chain at the point of handling, not at the point of origin. Logging at every return adds noise. Log at the boundary where the error stops being propagated.
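The following sketch illustrates that boundary rule under simple assumptions: the origin and the intermediate layer only return errors (the middle layer wraps with %w), and the outermost function logs once with the full chain. All function names, the sentinel error, and the status code mapping are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"log/slog"
)

var errCardDeclined = errors.New("card declined")

// chargeCard is the point of origin: it returns the error and does not log.
func chargeCard(paymentID string) error {
	return errCardDeclined
}

// processPayment adds context with %w and still does not log.
func processPayment(paymentID string) error {
	if err := chargeCard(paymentID); err != nil {
		return fmt.Errorf("charge payment %s: %w", paymentID, err)
	}
	return nil
}

// handleCheckout is the boundary: the error stops propagating here, so this
// is the single place where it is logged, with the whole wrapped chain.
func handleCheckout(logger *slog.Logger, paymentID string) int {
	if err := processPayment(paymentID); err != nil {
		logger.Error("checkout failed",
			slog.String("payment_id", paymentID),
			slog.String("error", err.Error()),
		)
		if errors.Is(err, errCardDeclined) {
			return 402
		}
		return 500
	}
	return 200
}

// demoCheckout returns the status, the number of log lines written, and the
// decoded record.
func demoCheckout() (int, int, map[string]any) {
	var buf bytes.Buffer
	logger := slog.New(slog.NewJSONHandler(&buf, nil))
	status := handleCheckout(logger, "pay_1")
	lines := bytes.Count(buf.Bytes(), []byte("\n"))

	var rec map[string]any
	if err := json.Unmarshal(buf.Bytes(), &rec); err != nil {
		panic(err)
	}
	return status, lines, rec
}
```

Three layers touch the error but only one log line is written, and errors.Is still sees the original sentinel through the wrapping.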

For errors from external systems like Stripe, AWS, or a third-party SMS gateway, include the external error code if the API provides one. This is the field that makes the difference between "payment failed" and "payment failed: Stripe code card_declined, decline_code insufficient_funds".
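A minimal sketch of surfacing provider codes as separate fields, assuming the service wraps provider failures in its own error type. GatewayError, errorAttrs, the provider name "examplepay", and the field names provider_code and decline_code are all hypothetical; no real SDK type is used here.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"log/slog"
)

// GatewayError is a hypothetical wrapper for failures reported by an
// external provider.
type GatewayError struct {
	Provider    string
	Code        string
	DeclineCode string
}

func (e *GatewayError) Error() string {
	return fmt.Sprintf("%s: %s", e.Provider, e.Code)
}

// errorAttrs returns the error message plus, when the chain contains a
// GatewayError, the provider codes as separate searchable fields.
func errorAttrs(err error) []any {
	attrs := []any{slog.String("error", err.Error())}
	var ge *GatewayError
	if errors.As(err, &ge) {
		attrs = append(attrs,
			slog.String("provider", ge.Provider),
			slog.String("provider_code", ge.Code),
		)
		if ge.DeclineCode != "" {
			attrs = append(attrs, slog.String("decline_code", ge.DeclineCode))
		}
	}
	return attrs
}

// demoGatewayError logs a wrapped provider error and returns the record.
func demoGatewayError() map[string]any {
	var buf bytes.Buffer
	logger := slog.New(slog.NewJSONHandler(&buf, nil))

	err := fmt.Errorf("charge payment: %w", &GatewayError{
		Provider:    "examplepay",
		Code:        "card_declined",
		DeclineCode: "insufficient_funds",
	})
	logger.Error("payment failed", errorAttrs(err)...)

	var rec map[string]any
	if err := json.Unmarshal(buf.Bytes(), &rec); err != nil {
		panic(err)
	}
	return rec
}
```

Keeping the codes in their own fields lets a log query filter on provider_code directly instead of pattern-matching the message.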

Log levels in production

Most SaaS teams overconfigure log levels. The model that works in production:

ERROR: something is broken and a human should look at it
WARN: something is abnormal but the system recovered or degraded gracefully
INFO: significant lifecycle events (request completed, job started, deployment event)
DEBUG: disabled in production; enabled in staging for specific components

Do not log at INFO for every database query or cache hit. At any meaningful request volume, INFO-level query logging drowns out the events that matter. Use metrics for counts and latencies; reserve logs for discrete events with meaningful state.
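One way to keep DEBUG off in production but switchable elsewhere is slog.LevelVar, sketched below. The helper names are assumptions, and reading the level name from an environment variable such as LOG_LEVEL is a convention you would add yourself, not something slog does.

```go
package main

import (
	"bytes"
	"io"
	"log/slog"
)

// parseLevel converts a name such as "debug" or "WARN" into a slog.Level,
// falling back to INFO when the name is empty or unknown.
func parseLevel(name string) slog.Level {
	var l slog.Level
	if err := l.UnmarshalText([]byte(name)); err != nil {
		return slog.LevelInfo
	}
	return l
}

// newLeveledLogger builds a JSON logger whose minimum level can be changed
// at runtime through the shared LevelVar.
func newLeveledLogger(w io.Writer, level *slog.LevelVar) *slog.Logger {
	return slog.New(slog.NewJSONHandler(w, &slog.HandlerOptions{Level: level}))
}

// demoLevels reports how many lines were written before and after DEBUG
// was enabled.
func demoLevels() (before, after int) {
	var buf bytes.Buffer
	level := new(slog.LevelVar) // zero value means INFO
	logger := newLeveledLogger(&buf, level)

	logger.Debug("cache miss") // dropped: below INFO
	before = bytes.Count(buf.Bytes(), []byte("\n"))

	level.Set(parseLevel("debug"))
	logger.Debug("cache miss") // now written
	after = bytes.Count(buf.Bytes(), []byte("\n"))
	return before, after
}
```

In a service, something like level.Set(parseLevel(os.Getenv("LOG_LEVEL"))) at startup would give staging DEBUG output while production stays at the INFO default.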

In ECS-hosted Go services configured with the awslogs log driver, stdout from each container is forwarded to CloudWatch Logs. CloudWatch Logs Insights discovers the fields in JSON-formatted slog output automatically, without additional parsing configuration.

Sampling high-volume logs

Some operations generate log lines at rates that become expensive. A SaaS product serving 500 requests per second with full INFO logging produces a volume that costs real money in CloudWatch or Datadog.

A simple in-process sampler:

func ShouldSampleInfo() bool {
    return rand.Intn(100) < 10 // 10% sample rate for routine INFO events
}

Apply this at the middleware level for routine request completions. Never sample errors or warnings. On high-volume services where routine completions dominate log volume, this can cut CloudWatch costs by 60 to 80 percent while keeping every error and warning.
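A sketch of how the sampler might sit in a completion-logging middleware. Server errors and client errors are always logged; only successful completions go through the sampler. Treating 4xx as WARN, the CompletionLogging and statusRecorder names, and passing the sampler as a function (so it can be replaced in tests) are assumptions of this example.

```go
package main

import (
	"bytes"
	"log/slog"
	"math/rand"
	"net/http"
	"net/http/httptest"
	"time"
)

// ShouldSampleInfo keeps roughly 10% of routine INFO events.
func ShouldSampleInfo() bool {
	return rand.Intn(100) < 10
}

// statusRecorder remembers the status code the handler wrote.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (r *statusRecorder) WriteHeader(code int) {
	r.status = code
	r.ResponseWriter.WriteHeader(code)
}

// CompletionLogging logs one line per request. Errors and warnings bypass
// the sampler; routine completions are logged only when sample returns true.
func CompletionLogging(logger *slog.Logger, sample func() bool, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
		start := time.Now()
		next.ServeHTTP(rec, r)

		attrs := []any{
			slog.Int("status", rec.status),
			slog.Int64("duration_ms", time.Since(start).Milliseconds()),
		}
		switch {
		case rec.status >= 500:
			logger.Error("request failed", attrs...)
		case rec.status >= 400:
			logger.Warn("request rejected", attrs...)
		case sample():
			logger.Info("request completed", attrs...)
		}
	})
}

// demoSampling serves one request that responds with the given status and
// returns the number of log lines written.
func demoSampling(sample func() bool, status int) int {
	var buf bytes.Buffer
	logger := slog.New(slog.NewJSONHandler(&buf, nil))
	h := CompletionLogging(logger, sample, http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(status)
	}))
	h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, "/", nil))
	return bytes.Count(buf.Bytes(), []byte("\n"))
}
```

In production the middleware would be constructed with ShouldSampleInfo as the sample argument. Because the switch checks status before calling the sampler, a failed request is never dropped.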

Correlating logs with traces

In a production setup with OpenTelemetry, inject the trace ID from the active span into every log line. This makes it possible to jump from a log line to the full distributed trace for the same request in one click.

func loggerWithTrace(ctx context.Context, base *slog.Logger) *slog.Logger {
    span := trace.SpanFromContext(ctx)
    if !span.SpanContext().IsValid() {
        return base
    }
    return base.With(
        slog.String("trace_id", span.SpanContext().TraceID().String()),
        slog.String("span_id", span.SpanContext().SpanID().String()),
    )
}

If you are not running distributed tracing yet, a random request ID that appears in both the log lines and the HTTP response header as X-Request-ID dramatically reduces the time to reproduce and investigate a customer-reported issue.
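A minimal sketch of that request ID flow, assuming an inbound X-Request-ID header is trusted and reused when present; a public-facing service may prefer to always generate its own. The ID here is 16 random bytes in hex rather than a formatted UUID, and RequestID, RequestIDFromCtx, and demoRequestID are hypothetical names.

```go
package main

import (
	"context"
	"crypto/rand"
	"encoding/hex"
	"net/http"
	"net/http/httptest"
)

type requestIDKey struct{}

// newRequestID returns 16 random bytes encoded as 32 hex characters.
func newRequestID() string {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		panic(err) // the system random source is unavailable
	}
	return hex.EncodeToString(b)
}

// RequestID reuses an inbound X-Request-ID or generates one, echoes it in
// the response header, and stores it in the request context.
func RequestID(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		id := r.Header.Get("X-Request-ID")
		if id == "" {
			id = newRequestID()
		}
		w.Header().Set("X-Request-ID", id)
		ctx := context.WithValue(r.Context(), requestIDKey{}, id)
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

// RequestIDFromCtx returns the stored ID, or "" when none is present.
func RequestIDFromCtx(ctx context.Context) string {
	id, _ := ctx.Value(requestIDKey{}).(string)
	return id
}

// demoRequestID returns the ID seen by the handler and the ID echoed to
// the client for one request.
func demoRequestID(inbound string) (seen, echoed string) {
	h := RequestID(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		seen = RequestIDFromCtx(r.Context())
	}))
	req := httptest.NewRequest(http.MethodGet, "/", nil)
	if inbound != "" {
		req.Header.Set("X-Request-ID", inbound)
	}
	rr := httptest.NewRecorder()
	h.ServeHTTP(rr, req)
	return seen, rr.Header().Get("X-Request-ID")
}
```

The value from RequestIDFromCtx is what would be passed as request_id when the request-scoped logger is built, so the ID a customer reads from the response header matches the one on every log line for that request.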

Key lessons from production

Structured logging pays for itself during the first production incident after it is implemented. The difference between 40 minutes of searching and 4 minutes is a tenant ID and a request ID on every log line.

The pattern that works: one request-scoped logger created at the HTTP entry, carried through context, with six core fields. Errors logged at the handling boundary with the full error chain. Log levels kept strict. High-volume INFO sampled at 10 percent. Trace IDs joined to the trace system where OpenTelemetry is in use.

Engineering teams in Lebanon and across MENA that adopt this pattern early spend dramatically less time on operational incidents as their systems scale.
