Production Audit Logs in Go SaaS: Design, Storage, and What Most Teams Get Wrong

Audit logs are among the most underinvested features in SaaS backends. Teams add them late, store them inefficiently, and discover the gaps only when a client asks "who changed this and when" and the answer is unknowable. Getting audit logging right from the beginning is cheaper than rebuilding it under pressure.

In RTYLR, Voxire's restaurant and retail platform serving businesses across Lebanon and the Gulf, audit logs are used daily. Restaurant operators query them to trace inventory discrepancies. Finance teams export them for compliance reviews. When something unexpected happens at a POS terminal at 11pm, the audit trail is the first place anyone looks.

What audit logs are actually for

Audit logs serve three distinct purposes that are often conflated:

Compliance records show that a system meets regulatory or contractual requirements. A restaurant chain needs to show that menu price changes were made by authorized users. A SaaS platform needs to show that data access patterns comply with GDPR requirements for EU-linked customers.

Operational investigation answers questions like "why does the inventory count not match the purchase order" or "who disabled this user account at 3am". This is the most frequent use case. It needs to be fast and searchable.

Customer transparency allows end users of your SaaS to see their own activity history: account changes, API key creation, and permission modifications. Some products surface this directly in the UI.

Each purpose has different query patterns, retention requirements, and performance constraints. Designing a single table to serve all three is possible, but mixing operational query indexes with compliance export indexes on one hot table causes problems at scale.

The table schema

A production-ready audit log table in PostgreSQL:

CREATE TABLE audit_logs (
  id          BIGSERIAL PRIMARY KEY,
  tenant_id   UUID NOT NULL,
  actor_id    UUID,
  actor_type  TEXT NOT NULL DEFAULT 'user',
  action      TEXT NOT NULL,
  resource    TEXT NOT NULL,
  resource_id TEXT,
  before_data JSONB,
  after_data  JSONB,
  metadata    JSONB,
  ip_address  INET,
  created_at  TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX audit_tenant_time_idx ON audit_logs (tenant_id, created_at DESC);
CREATE INDEX audit_resource_idx ON audit_logs (tenant_id, resource, resource_id, created_at DESC);
CREATE INDEX audit_actor_idx ON audit_logs (tenant_id, actor_id, created_at DESC);

A few design decisions worth explaining:

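BIGSERIAL instead of UUID for the primary key. Audit logs are append-only and queried in time order. A monotonically increasing BIGSERIAL key is half the size of a UUID and always inserts at the right-hand edge of the primary key B-tree, which keeps that index compact and cache-friendly. Random UUIDs scatter inserts across the index and cause page splits as the table grows. PostgreSQL heap tables are not physically ordered by primary key either way, so time-range scans rely on the created_at indexes defined with the table, not on the key type.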

before_data and after_data as JSONB. Storing both the before and after state of a resource means you can answer any question about what changed without querying the main tables. The cost is storage. For high-churn resources like inventory counts, consider storing only changed fields rather than the full record.

actor_type to distinguish human users from system processes. A scheduled job that reorders stock should appear in the audit log as an automated action, not as an anonymous user.

resource and resource_id use human-readable identifiers: "menu_item" and "item_abc123", not table names and integer IDs. When querying audit logs six months after an event, you want logs that are readable without a schema reference.
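
For the changed-fields-only option mentioned for before_data and after_data, a minimal sketch might look like the following. ChangedFields and toMap are hypothetical helper names, not part of RTYLR's code, and the comparison is limited to top-level JSON keys:

```go
package main

import (
	"encoding/json"
	"reflect"
)

// ChangedFields is a hypothetical helper. It converts both values to JSON
// objects and returns two maps that hold only the top-level keys whose
// values differ. A key present on one side only appears on that side.
func ChangedFields(before, after interface{}) (map[string]interface{}, map[string]interface{}, error) {
	b, err := toMap(before)
	if err != nil {
		return nil, nil, err
	}
	a, err := toMap(after)
	if err != nil {
		return nil, nil, err
	}
	beforeDiff := map[string]interface{}{}
	afterDiff := map[string]interface{}{}
	for k, bv := range b {
		av, ok := a[k]
		if !ok {
			beforeDiff[k] = bv // field was removed
			continue
		}
		if !reflect.DeepEqual(bv, av) {
			beforeDiff[k] = bv
			afterDiff[k] = av
		}
	}
	for k, av := range a {
		if _, ok := b[k]; !ok {
			afterDiff[k] = av // field was added
		}
	}
	return beforeDiff, afterDiff, nil
}

// toMap round-trips a value through JSON so structs and maps compare the
// same way. A nil input yields a nil map, which ranges as empty.
func toMap(v interface{}) (map[string]interface{}, error) {
	raw, err := json.Marshal(v)
	if err != nil {
		return nil, err
	}
	var m map[string]interface{}
	if err := json.Unmarshal(raw, &m); err != nil {
		return nil, err
	}
	return m, nil
}
```

The two returned maps can be passed as Before and After in place of the full records. Numbers come back as float64 because of the JSON round trip, which is harmless here since the values are re-marshaled to JSONB anyway.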

Writing audit logs in Go without boilerplate

The most common mistake is writing audit log entries manually inside every service method. This leads to audit log calls being skipped during refactors and inconsistent field naming across different parts of the codebase.

A better approach is a middleware-style auditor that hooks into your service layer:

type AuditLogger struct {
    db *sql.DB
}

type AuditEntry struct {
    TenantID   uuid.UUID
    ActorID    *uuid.UUID
    ActorType  string
    Action     string
    Resource   string
    ResourceID string
    Before     interface{}
    After      interface{}
    Metadata   map[string]interface{}
    IPAddress  string
}

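// marshalNullable encodes v as JSON and maps nil values to SQL NULL
// rather than storing the JSON literal null.
func marshalNullable(v interface{}) (interface{}, error) {
    b, err := json.Marshal(v)
    if err != nil {
        return nil, err
    }
    if string(b) == "null" {
        return nil, nil
    }
    return string(b), nil
}

func (a *AuditLogger) Log(ctx context.Context, entry AuditEntry) error {
    beforeJSON, err := marshalNullable(entry.Before)
    if err != nil {
        return fmt.Errorf("audit: marshal before: %w", err)
    }
    afterJSON, err := marshalNullable(entry.After)
    if err != nil {
        return fmt.Errorf("audit: marshal after: %w", err)
    }
    metaJSON, err := marshalNullable(entry.Metadata)
    if err != nil {
        return fmt.Errorf("audit: marshal metadata: %w", err)
    }
    actorType := entry.ActorType
    if actorType == "" {
        actorType = "user" // an explicit '' would bypass the column DEFAULT
    }
    var ip interface{} // stays nil (SQL NULL) when unset; ''::inet is rejected
    if entry.IPAddress != "" {
        ip = entry.IPAddress
    }

    _, err = a.db.ExecContext(ctx, `
        INSERT INTO audit_logs
          (tenant_id, actor_id, actor_type, action, resource, resource_id,
           before_data, after_data, metadata, ip_address)
        VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10::inet)
    `,
        entry.TenantID, entry.ActorID, actorType,
        entry.Action, entry.Resource, entry.ResourceID,
        beforeJSON, afterJSON, metaJSON, ip,
    )
    return err
}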

In service methods, the call becomes a single structured statement:

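func (s *MenuService) UpdatePrice(ctx context.Context, itemID uuid.UUID, newPrice decimal.Decimal) error {
    // Capture the state before the change so the audit entry can record it.
    before, err := s.db.GetMenuItem(ctx, itemID)
    if err != nil {
        return err
    }
    if err := s.db.UpdateMenuItemPrice(ctx, itemID, newPrice); err != nil {
        return err
    }
    // Build the after state from a copy so before is left untouched.
    after := *before
    after.Price = newPrice
    // The update and the audit write are separate statements here. For
    // actions that must never go unlogged, run both in one transaction.
    return s.auditor.Log(ctx, AuditEntry{
        TenantID:   s.tenantFromCtx(ctx),
        ActorID:    s.actorFromCtx(ctx),
        ActorType:  "user",
        Action:     "update_price",
        Resource:   "menu_item",
        ResourceID: itemID.String(),
        Before:     before,
        After:      after,
    })
}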

Asynchronous versus synchronous audit writes

Writing audit logs synchronously in the request path adds latency. For a price update that takes 5ms to execute, a synchronous audit write that takes 3ms increases total latency by 60%. At high request volume, every request also holds its database connection longer, which adds pressure on the connection pool.

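The tradeoff: async audit writes can be lost if the service restarts before the write completes. A synchronous write either commits before the request returns or surfaces an error the caller can handle. It is only guaranteed to match the data change when both run in the same transaction.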

For most SaaS applications, the right approach depends on the action type. Financial operations, permission changes, and data deletion should be logged synchronously. Activity logs for reads, UI navigation, and non-destructive operations can be async.
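
One way to express this split in code is a small routing function. This is a sketch under assumptions: WriteMode, ModeFor, and the action names in the allowlist are all hypothetical. It defaults to synchronous writes so that a newly added action is never silently downgraded to the lossy path:

```go
package main

// WriteMode says whether an audit entry must be persisted before the
// request returns.
type WriteMode int

const (
	WriteSync  WriteMode = iota // write in the request path
	WriteAsync                  // hand off to a background path
)

// asyncActions is a hypothetical allowlist of low-risk, non-destructive
// actions. The names are illustrative only.
var asyncActions = map[string]bool{
	"view_report": true,
	"open_page":   true,
	"list_orders": true,
}

// ModeFor returns WriteAsync only for allowlisted actions. Anything else,
// including actions nobody has classified yet, falls back to WriteSync.
func ModeFor(action string) WriteMode {
	if asyncActions[action] {
		return WriteAsync
	}
	return WriteSync
}
```

An allowlist for the async side is a deliberate choice here: forgetting to classify a financial or permission action costs a few milliseconds rather than a missing audit record.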

For async paths, the outbox pattern described in an earlier post is the appropriate implementation. Write the audit event to the outbox table within the main transaction, and let a background worker persist it to the audit logs table.

Retention, partitioning, and growth

Audit log tables will grow indefinitely without a retention policy. At RTYLR, a tenant running a busy restaurant can generate tens of thousands of audit entries per day across staff logins, menu changes, inventory adjustments, and order modifications.

PostgreSQL table partitioning by created_at makes retention manageable:

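CREATE TABLE audit_logs (
  id          BIGSERIAL,
  tenant_id   UUID NOT NULL,
  created_at  TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  -- other columns
  PRIMARY KEY (id, created_at)  -- a primary key on a partitioned table must include the partition key
) PARTITION BY RANGE (created_at);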

CREATE TABLE audit_logs_2026_05 PARTITION OF audit_logs
  FOR VALUES FROM ('2026-05-01') TO ('2026-06-01');

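Dropping a monthly partition to enforce a 12-month retention policy is a metadata operation rather than a row-by-row DELETE. It typically completes in milliseconds regardless of row count and leaves no dead tuples for VACUUM to clean up. The drop does take a brief ACCESS EXCLUSIVE lock on the parent table, so schedule it outside peak hours.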
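
Partitions have to be created ahead of time and dropped on a schedule. The following is a minimal sketch of helpers a maintenance job could use to build that DDL. The function names are hypothetical, the audit_logs_YYYY_MM naming follows the example above, and the bounds are plain dates that PostgreSQL interprets in the session time zone:

```go
package main

import (
	"fmt"
	"time"
)

// monthStart returns midnight UTC on the first day of the given month.
func monthStart(year int, month time.Month) time.Time {
	return time.Date(year, month, 1, 0, 0, 0, 0, time.UTC)
}

// PartitionName follows the audit_logs_YYYY_MM naming convention.
func PartitionName(year int, month time.Month) string {
	return fmt.Sprintf("audit_logs_%04d_%02d", year, int(month))
}

// CreatePartitionSQL builds the DDL for one monthly partition. The upper
// bound is exclusive, so it is the first day of the following month.
func CreatePartitionSQL(year int, month time.Month) string {
	start := monthStart(year, month)
	end := start.AddDate(0, 1, 0)
	return fmt.Sprintf(
		"CREATE TABLE IF NOT EXISTS %s PARTITION OF audit_logs FOR VALUES FROM ('%s') TO ('%s')",
		PartitionName(year, month), start.Format("2006-01-02"), end.Format("2006-01-02"))
}

// DropExpiredSQL builds the DDL that drops the newest partition lying
// entirely outside the retention window, given the current year and month.
// The partition exactly retentionMonths back still holds rows inside the
// window, so the one before it is the first that is safe to drop.
func DropExpiredSQL(year int, month time.Month, retentionMonths int) string {
	expired := monthStart(year, month).AddDate(0, -(retentionMonths + 1), 0)
	return fmt.Sprintf("DROP TABLE IF EXISTS %s", PartitionName(expired.Year(), expired.Month()))
}
```

The names are built from integers only, so string formatting is safe here. A monthly job would run CreatePartitionSQL for the upcoming month and DropExpiredSQL for the current one.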

Multi-tenant isolation in audit queries

Every query against the audit log table must include tenant_id in the WHERE clause. This is not optional and should be enforced at the query layer, not left to individual developers.

A wrapper that makes it impossible to query audit logs without specifying a tenant:

func (q *AuditQuery) ForTenant(tenantID uuid.UUID) *TenantAuditQuery {
    return &TenantAuditQuery{db: q.db, tenantID: tenantID}
}

func (q *TenantAuditQuery) ByResource(ctx context.Context, resource, resourceID string, limit int) ([]AuditLog, error) {
    // tenant_id always comes from the receiver, never from caller input.
    rows, err := q.db.QueryContext(ctx, `
        SELECT id, actor_id, actor_type, action, before_data, after_data, created_at
        FROM audit_logs
        WHERE tenant_id = $1 AND resource = $2 AND resource_id = $3
        ORDER BY created_at DESC
        LIMIT $4
    `, q.tenantID, resource, resourceID, limit)
    if err != nil {
        return nil, err
    }
    defer rows.Close()
    // ...
}

Cross-tenant audit queries should require an explicit admin flag and be logged themselves. An admin reading another tenant's audit trail is itself an auditable event.
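
A sketch of that guard, assuming a hypothetical Caller type that carries the admin flag and a Recorder interface standing in for the audit writer. None of these names come from the RTYLR codebase:

```go
package main

import "errors"

// ErrCrossTenantDenied is returned when a non-admin asks for another
// tenant's audit trail.
var ErrCrossTenantDenied = errors.New("cross-tenant audit access requires admin")

// Caller is a hypothetical description of who is issuing the query.
type Caller struct {
	ActorID  string
	TenantID string
	IsAdmin  bool
}

// AccessEvent is the audit record for reading another tenant's trail.
type AccessEvent struct {
	ActorID      string
	FromTenant   string
	TargetTenant string
	Action       string
}

// Recorder is anything that can persist an AccessEvent.
type Recorder interface {
	Record(AccessEvent) error
}

// MemoryRecorder is an in-memory Recorder, useful in tests.
type MemoryRecorder struct {
	Events []AccessEvent
}

// Record appends the event to the in-memory slice.
func (m *MemoryRecorder) Record(e AccessEvent) error {
	m.Events = append(m.Events, e)
	return nil
}

// AuthorizeAuditRead allows same-tenant reads without extra logging,
// rejects cross-tenant reads from non-admins, and records cross-tenant
// reads by admins before allowing them. If recording fails, the read is
// refused, so an unlogged cross-tenant read cannot happen.
func AuthorizeAuditRead(c Caller, targetTenant string, rec Recorder) error {
	if c.TenantID == targetTenant {
		return nil
	}
	if !c.IsAdmin {
		return ErrCrossTenantDenied
	}
	return rec.Record(AccessEvent{
		ActorID:      c.ActorID,
		FromTenant:   c.TenantID,
		TargetTenant: targetTenant,
		Action:       "read_audit_log_cross_tenant",
	})
}
```

Calling AuthorizeAuditRead before constructing the tenant-scoped query keeps the check in one place instead of spreading it across handlers.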

Key lessons from production

Three things surprised the RTYLR team after running audit logs in production across Lebanon and Gulf deployments for over a year:

First, before_data is queried far more often than expected. The most common operational question is not "what is the current state" but "what was the state before that change happened". Having before_data stored in the audit log eliminates the need to reconstruct history from individual change events.

Second, the action field naming matters more than expected. Inconsistent naming (price_updated vs update_price vs menu.price.change) makes audit logs unsearchable. Define an enumerated set of action names and enforce it at code review time.

Third, customers ask for audit log exports more often than anticipated. A simple CSV export endpoint, built in the first sprint, turned out to be one of the most-used features in RTYLR's enterprise tier.
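
The second and third lessons can be sketched together: a typed set of action names that rejects anything outside the list, and a CSV writer built on the standard library. The Action constants, ExportRow, and the function names are illustrative assumptions, not RTYLR's actual definitions:

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
	"io"
	"time"
)

// Action is a typed audit action name. Using the constants instead of raw
// strings keeps naming consistent across the codebase.
type Action string

const (
	ActionUpdatePrice Action = "update_price"
	ActionAdjustStock Action = "adjust_stock"
	ActionDisableUser Action = "disable_user"
)

var knownActions = map[Action]bool{
	ActionUpdatePrice: true,
	ActionAdjustStock: true,
	ActionDisableUser: true,
}

// ParseAction validates a raw string against the known set.
func ParseAction(s string) (Action, error) {
	if !knownActions[Action(s)] {
		return "", fmt.Errorf("unknown audit action %q", s)
	}
	return Action(s), nil
}

// ExportRow is a hypothetical flattened view of one audit log entry.
type ExportRow struct {
	CreatedAt  time.Time
	ActorID    string
	Action     Action
	Resource   string
	ResourceID string
}

// WriteCSV writes a header plus one line per row, with timestamps in UTC.
func WriteCSV(w io.Writer, rows []ExportRow) error {
	cw := csv.NewWriter(w)
	header := []string{"created_at", "actor_id", "action", "resource", "resource_id"}
	if err := cw.Write(header); err != nil {
		return err
	}
	for _, r := range rows {
		rec := []string{
			r.CreatedAt.UTC().Format(time.RFC3339),
			r.ActorID, string(r.Action), r.Resource, r.ResourceID,
		}
		if err := cw.Write(rec); err != nil {
			return err
		}
	}
	cw.Flush()
	return cw.Error()
}

// CSVString is a convenience wrapper that renders the export in memory.
func CSVString(rows []ExportRow) (string, error) {
	var buf bytes.Buffer
	if err := WriteCSV(&buf, rows); err != nil {
		return "", err
	}
	return buf.String(), nil
}
```

In an HTTP handler, WriteCSV can take the response writer directly, so large exports stream out without being buffered in memory. Validating with ParseAction at the write path moves part of the naming check from code review into the program itself.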