Load Testing Go APIs with k6 Before Production Cutover

Shipping a new Go service to production on ECS without any load testing history is a gamble. The code compiles and the unit tests pass, but you have no evidence the system holds under 200 concurrent users, 1,000 requests per second, or 10 minutes of sustained traffic. Startups in Lebanon and the Gulf that skip this step often discover their database connection pool saturates or their ECS task CPU spikes on the first real campaign or launch.

Why load testing is different from unit testing

Unit tests verify that individual functions produce correct outputs for specific inputs. Load tests verify that the system behaves correctly under concurrency, sustained load, and realistic usage patterns. A Go handler that passes every unit test can still stall under load if it holds a database connection across a blocking operation and starves the pool, saturate a downstream API's rate limit, or pile up blocked goroutines behind a misconfigured worker pool.

Load testing answers questions unit tests cannot: what is the 99th percentile response time at 500 concurrent users, at what load does the system start to degrade and does it degrade gracefully, what fails first when capacity is exceeded, and how long does recovery take after a spike.

Why k6 for Go and ECS services

k6 is a load testing tool written in Go, scripted in JavaScript. It produces clean structured output, integrates with Grafana and InfluxDB for real-time dashboards, and runs from CI pipelines or from a developer laptop without complex setup. For most Go and ECS teams it replaces the combination of JMeter, Locust, and custom shell scripts that older teams accumulated.

The k6 model fits small engineering teams: write the test in JavaScript, run it against a staging environment, iterate on the script as the API evolves, and add it to CI for regression detection before every deployment.

A minimal k6 test for a Go API:

import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '2m', target: 50 },   // ramp to 50 users
    { duration: '5m', target: 50 },   // hold at 50
    { duration: '2m', target: 200 },  // ramp to 200
    { duration: '5m', target: 200 },  // hold at 200
    { duration: '2m', target: 0 },    // ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500'],  // 95th percentile under 500ms
    http_req_failed: ['rate<0.01'],    // less than 1% errors
  },
};

export default function () {
  const res = http.get('https://api-staging.example.com/api/v1/products', {
    headers: { Authorization: `Bearer ${__ENV.API_TOKEN}` },
  });
  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time ok': (r) => r.timings.duration < 500,
  });
  sleep(1);
}

This runs a staged ramp test: 50 virtual users, then 200, with a 5-minute hold at each level. The thresholds block defines pass/fail criteria. If either threshold is violated, k6 exits with a non-zero code, failing the CI job automatically.

Scenarios to test before production cutover

Not all endpoints carry equal load risk. For a SaaS backend serving Lebanon or Gulf operators, the scenarios worth testing before cutover:

Authentication flow: login, token refresh, token validation under concurrent load. Auth is on the critical path of every request. A bottleneck here affects all users simultaneously.

Database-heavy reads: list endpoints, search endpoints, any endpoint with joins or aggregations. These saturate the database connection pool before anything else. Test the endpoint that returns paginated lists for a tenant with 100,000 rows.

Write-heavy flows: order submission, transaction recording, form submissions. These test write throughput and the behavior of your connection pool under concurrent writes.

Background job interaction: test what happens to response times when a heavy background job runs concurrently with live API traffic on the same instance. This is a common source of p99 spikes that only appear in production.

Concurrent requests from the same tenant: in multi-tenant SaaS, test multiple users from one tenant making requests simultaneously. This catches per-tenant rate limiting issues and locking contention on tenant-scoped rows.

Interpreting k6 output for Go services

The metrics that matter most for Go services on ECS:

http_req_duration: total HTTP request duration from the client perspective. If high, the bottleneck is somewhere in your handler chain, database, or external API.
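http_req_waiting (TTFB): time to first byte, measured from the end of sending the request to the first byte of the response. If it makes up most of the total duration, the time is going to server-side work (handler logic, database queries, downstream calls) rather than to network transfer.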
http_req_failed: percentage of requests that returned an error or failed to connect. Anything above 0.1% at stable load is a problem.

On the Go and ECS side, monitor in parallel during the test:

ECS task CPU: if CPU hits 80%, your instances are undersized or your handlers are CPU-bound
RDS connections: if connections approach the PostgreSQL max_connections limit, the combined pool size across all ECS tasks is too large or PgBouncer is not in front of the database
RDS CPU and IOPS: a write-heavy test that pins RDS CPU points to missing indexes or inefficient queries
ECS task memory: memory that keeps climbing under sustained load points to leaked goroutines, unbounded caches, or allocation outpacing the garbage collector

Setting thresholds from real business requirements

Threshold values should come from actual business requirements, not from generic benchmarks. Ask: what response time causes a user to abandon the operation? For an e-commerce checkout, a commonly cited rule of thumb is about 3 seconds of total page load, which means the API should respond in under 500ms to leave headroom for the frontend. For an internal operational dashboard used by restaurant staff, 2 seconds is often acceptable. For a real-time POS terminal sync, the threshold is much stricter.

The error rate threshold should reflect the cost of an error. For a payment API, even 0.1% errors at 1,000 requests per minute means one failed payment per minute, or 60 per hour, which is not acceptable. For a non-critical analytics endpoint, 1% errors under extreme load might be tolerable with a retry mechanism.
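
A quick way to sanity-check an error-rate threshold is to convert it into failed requests per hour at the traffic you expect. The Go sketch below does that arithmetic; the helper name failuresPerHour and the analytics traffic figure are illustrative assumptions, not values from a real system.

```go
package main

import "fmt"

// failuresPerHour converts a request rate and an error rate (0.001 = 0.1%)
// into the expected number of failed requests per hour.
func failuresPerHour(requestsPerMinute, errorRate float64) float64 {
	return requestsPerMinute * errorRate * 60
}

func main() {
	// Payment API figures from the text: 1,000 requests per minute at 0.1% errors.
	fmt.Printf("payment API: %.0f failed requests per hour\n", failuresPerHour(1000, 0.001))

	// Hypothetical analytics endpoint: 200 requests per minute at 1% errors.
	fmt.Printf("analytics:   %.0f failed requests per hour\n", failuresPerHour(200, 0.01))
}
```

Expressing the threshold as an absolute count makes it easier to discuss with the business owner of the endpoint than a bare percentage.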

Running k6 from GitHub Actions

Adding k6 to CI means load regressions cannot ship without detection:

name: Load test staging
on:
  push:
    branches: [main]

jobs:
  load-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install k6
        run: |
          sudo gpg --no-default-keyring \
            --keyring /usr/share/keyrings/k6-archive-keyring.gpg \
            --keyserver hkp://keyserver.ubuntu.com:80 \
            --recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D69
          echo "deb [signed-by=/usr/share/keyrings/k6-archive-keyring.gpg] https://dl.k6.io/deb stable main" \
            | sudo tee /etc/apt/sources.list.d/k6.list
          sudo apt-get update && sudo apt-get install -y k6
      - name: Run load test
        env:
          API_TOKEN: ${{ secrets.STAGING_API_TOKEN }}
        run: k6 run --env API_TOKEN="$API_TOKEN" tests/load/api.js

The k6 test runs on every push to main, against the staging environment. If the API degrades under the defined load, the CI job fails, and as long as the production deployment job depends on this one, the regression is caught before any deployment proceeds.

Common bottlenecks found during load testing

Across Go services deployed on ECS in MENA, the most common bottlenecks found during first load tests:

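Connection pool exhaustion: the default database/sql pool settings in Go are rarely right for production. MaxOpenConns defaults to 0, which means unlimited, so a traffic burst can open connections until PostgreSQL hits max_connections. MaxIdleConns defaults to 2, so under load connections are opened and closed constantly. Once a limit is set, goroutines queue for a connection whenever the pool is full, and response times climb sharply past that point. Set db.SetMaxOpenConns() and db.SetMaxIdleConns() deliberately, not by accepting the defaults.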

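N+1 queries: a list endpoint that queries the database once per item in a loop issues one query for the list plus one more per row, so a 50-row page costs 51 queries per request. That goes unnoticed at idle, but under concurrent load the database sees roughly 50 times the query rate it should. Load testing finds these because the database CPU spike is unmistakable.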
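
The difference in round trips can be sketched without a database. In the Go example below, fakeStore is a hypothetical stand-in that only counts queries; the SQL in the comments shows the kind of statement each method would represent in a real service.

```go
package main

import "fmt"

// fakeStore stands in for a database and counts round trips.
type fakeStore struct {
	queries int
}

// ListProductIDs models: SELECT id FROM products ORDER BY id LIMIT $1
func (s *fakeStore) ListProductIDs(limit int) []int {
	s.queries++
	ids := make([]int, 0, limit)
	for id := 1; id <= limit; id++ {
		ids = append(ids, id)
	}
	return ids
}

// PriceByID models: SELECT price FROM prices WHERE product_id = $1
func (s *fakeStore) PriceByID(id int) int {
	s.queries++
	return id * 100
}

// PricesByIDs models: SELECT product_id, price FROM prices WHERE product_id = ANY($1)
func (s *fakeStore) PricesByIDs(ids []int) map[int]int {
	s.queries++
	out := make(map[int]int, len(ids))
	for _, id := range ids {
		out[id] = id * 100
	}
	return out
}

// listNPlusOne fetches each price in a loop: 1 list query + N price queries.
func listNPlusOne(s *fakeStore, pageSize int) map[int]int {
	out := make(map[int]int, pageSize)
	for _, id := range s.ListProductIDs(pageSize) {
		out[id] = s.PriceByID(id)
	}
	return out
}

// listBatched fetches all prices in one query: 2 queries regardless of page size.
func listBatched(s *fakeStore, pageSize int) map[int]int {
	return s.PricesByIDs(s.ListProductIDs(pageSize))
}

func main() {
	loop, batch := &fakeStore{}, &fakeStore{}
	listNPlusOne(loop, 50)
	listBatched(batch, 50)

	fmt.Println("N+1 queries per request:    ", loop.queries)
	fmt.Println("batched queries per request:", batch.queries)
}
```

Both functions return the same data; only the number of round trips differs, and that number is what the database pays for under load.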

Missing indexes: queries that run in 2ms against a table with 1,000 rows run in 800ms against the same table with 1 million rows if the WHERE clause is not indexed. Load testing with realistic data volumes reveals this.

Synchronous external API calls: a handler that calls an SMS gateway or a payment processor synchronously blocks the goroutine for the duration of the external call. Under load, blocked goroutines pile up and hold memory and connections while they wait. Put a context deadline or client timeout on every outbound call.

Key lessons from production

Load testing is not a pre-launch checkbox. It is ongoing verification that the system behaves as you think it does under real conditions. The services that ship cleanly to production in Lebanon and Gulf markets are the ones where the engineering team can say: we ran 200 concurrent users for 10 minutes, we watched the connection pool, we know where the ceiling is and how the system behaves when it reaches it.

k6 is the right tool for Go and ECS teams because it is scriptable, CI-native, and produces metrics that map directly to what you monitor in CloudWatch. Run staged ramp tests, define thresholds from real business requirements, and fix the bottleneck before the traffic finds it for you.
