Introduction
Running Next.js on Kubernetes works well once it is configured correctly. But getting to that point usually means hitting a series of frustrating issues that are unique to the combination of Next.js, Docker containers, and Kubernetes orchestration. We have encountered and resolved these issues across dozens of client deployments. Here are the five most common problems and their fixes.
Issue 1: Missing CSS and Static Assets with Standalone Output
Symptoms
The application loads but without any CSS styling. The browser console shows 404 errors for /_next/static/* files.
Root Cause
When output: "standalone" is set in next.config.ts, the build output in .next/standalone does not include the .next/static directory or the public directory. These must be copied manually in the Dockerfile.
Fix
Ensure the Dockerfile copies both directories into the final image:
# In the runner stage
COPY --from=builder /app/.next/standalone ./
COPY --from=builder /app/.next/static ./.next/static
COPY --from=builder /app/public ./public
The order matters. The standalone directory is copied first, then static files are placed inside .next/static where the server expects them.
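For context, those COPY lines sit inside a multi-stage build. Here is a minimal end-to-end sketch, simplified to two stages and assuming npm and the node:22-alpine base image used elsewhere in this article; adjust the package manager, base image, and hardening (non-root USER, --chown flags) to your project:

```dockerfile
FROM node:22-alpine AS builder
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
RUN npm run build

FROM node:22-alpine AS runner
WORKDIR /app
ENV NODE_ENV=production
# Make the standalone server listen on all interfaces inside the container
ENV HOSTNAME=0.0.0.0
ENV PORT=3000
COPY --from=builder /app/.next/standalone ./
COPY --from=builder /app/.next/static ./.next/static
COPY --from=builder /app/public ./public
EXPOSE 3000
# The standalone output ships its own minimal server entry point
CMD ["node", "server.js"]
```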
Verification
After building and running the container locally:
docker build -t myapp:test .
docker run -p 3000:3000 myapp:test
# Open http://localhost:3000 and check that CSS loads
Issue 2: Image Optimization Failing in Containers
Symptoms
The Next.js Image component returns 500 errors or extremely slow responses. Logs show errors related to sharp not being found or incompatible binaries.
Root Cause
Next.js uses sharp for server-side image optimization. The sharp binary is platform-specific and must match the container's OS and architecture. If node_modules is copied in from the host (for example, a macOS or Windows machine via a missing .dockerignore entry), or built for a different architecture (arm64 on Apple silicon versus x64 in the cluster), the sharp binary will not work in the Linux container.
Fix
Option A: Install sharp in the container explicitly:
# In the deps stage, ensure sharp is installed for Linux
FROM node:22-alpine AS deps
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
# Force sharp to install platform-specific binaries
# (with sharp 0.33+ / npm 9.4+, use --os=linux --cpu=x64 instead,
# plus --libc=musl for Alpine images)
RUN npm install sharp --platform=linux --arch=x64
Option B: Use an external image optimization service and disable the built-in optimizer:
// next.config.ts
import type { NextConfig } from "next";

const nextConfig: NextConfig = {
  output: "standalone",
  images: {
    loader: "custom",
    loaderFile: "./lib/image-loader.ts",
  },
};

export default nextConfig;
// lib/image-loader.ts
export default function cloudflareLoader({
  src,
  width,
  quality,
}: {
  src: string;
  width: number;
  quality?: number;
}) {
  const params = [`width=${width}`, `quality=${quality || 75}`, "format=auto"];
  return `/cdn-cgi/image/${params.join(",")}/${src}`;
}
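As a quick sanity check, the loader can be exercised outside Next.js to confirm the URL shape Cloudflare's /cdn-cgi/image/ endpoint expects (the quality default of 75 is carried over from the loader above):

```typescript
// Standalone copy of the loader, for checking its output format
function cloudflareLoader({
  src,
  width,
  quality,
}: {
  src: string;
  width: number;
  quality?: number;
}) {
  const params = [`width=${width}`, `quality=${quality || 75}`, "format=auto"];
  return `/cdn-cgi/image/${params.join(",")}/${src}`;
}

console.log(cloudflareLoader({ src: "hero.png", width: 640 }));
// → /cdn-cgi/image/width=640,quality=75,format=auto/hero.png
```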
Option C: Skip optimization entirely for CDN-served images:
const nextConfig: NextConfig = {
  output: "standalone",
  images: {
    unoptimized: true,
  },
};
Issue 3: ISR (Incremental Static Regeneration) Not Working Across Pods
Symptoms
Pages regenerated by ISR show stale content on some requests. Different pods serve different versions of the same page.
Root Cause
By default, Next.js stores the ISR cache on the local filesystem of each pod. Since Kubernetes pods have independent filesystems, each pod maintains its own cache. When pod A regenerates a page, pods B and C still serve the old version until they independently regenerate.
Fix
Use a shared cache handler. Next.js (14.1 and later) supports custom cache handlers that can use Redis or another external store. The handler file is loaded by the standalone server at runtime, so it must be plain JavaScript (or compiled ahead of time):
// cache-handler.js
const Redis = require("ioredis");

const redis = new Redis(process.env.REDIS_URL);

module.exports = class RedisCacheHandler {
  async get(key) {
    const data = await redis.get(`next-cache:${key}`);
    return data ? JSON.parse(data) : null;
  }

  async set(key, data, ctx) {
    const ttl = ctx.revalidate || 3600;
    await redis.setex(`next-cache:${key}`, ttl, JSON.stringify(data));
    // Index keys by tag so revalidateTag can target them later
    for (const tag of ctx.tags || []) {
      await redis.sadd(`next-tag:${tag}`, key);
    }
  }

  async revalidateTag(tags) {
    const tagArray = Array.isArray(tags) ? tags : [tags];
    for (const tag of tagArray) {
      // Delete only the keys indexed under this tag, not the whole cache
      const keys = await redis.smembers(`next-tag:${tag}`);
      for (const key of keys) {
        await redis.del(`next-cache:${key}`);
      }
      await redis.del(`next-tag:${tag}`);
    }
  }
};
Configure it in next.config.ts. The handler path must resolve at runtime, so resolve it to an absolute path:
const nextConfig: NextConfig = {
  output: "standalone",
  cacheHandler: require.resolve("./cache-handler.js"),
  cacheMaxMemorySize: 0, // Disable in-memory cache
};
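The handler reads REDIS_URL from the environment, so the deployment has to provide it — typically from a Secret. A sketch (the Secret name and key here are hypothetical):

```yaml
containers:
  - name: myapp
    env:
      - name: REDIS_URL
        valueFrom:
          secretKeyRef:
            name: myapp-redis # hypothetical Secret holding the connection URL
            key: url
```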
Issue 4: Memory Leaks in Node.js Pods
Symptoms
Pods are OOMKilled periodically. Memory usage climbs steadily over hours or days. The Kubernetes HPA keeps scaling up pods without the load increasing.
Root Cause
Common sources of memory leaks in Next.js on Kubernetes:
- Unbounded in-memory caches (ISR cache growing without eviction).
- Server Components holding references to large data structures.
- Event listeners not being cleaned up in API routes.
- Third-party libraries with memory leaks.
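The event-listener case is easy to reproduce: registering a process-level listener inside a request handler adds a new listener on every request, and each one pins its closure in memory. A minimal illustration in plain Node (not a real route handler):

```typescript
// Leaky pattern: the handler registers a process-level listener on
// every invocation, so listeners accumulate across requests.
function leakyHandler() {
  process.on("SIGTERM", () => {
    /* cleanup work */
  });
}

// Simulate 1000 requests
for (let i = 0; i < 1000; i++) leakyHandler();

console.log(process.listenerCount("SIGTERM")); // grows linearly with requests
```

Node even emits a MaxListenersExceededWarning once the count passes 10, which is a useful early signal in pod logs. The fix is to register the listener once at module scope, or remove it when the request completes.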
Fix
Step 1: Set memory limits in the Kubernetes deployment and configure Node.js heap:
containers:
  - name: myapp
    resources:
      requests:
        memory: 256Mi
      limits:
        memory: 512Mi
    env:
      - name: NODE_OPTIONS
        value: "--max-old-space-size=384"
Set max-old-space-size to about 75% of the memory limit to leave room for non-heap memory.
Step 2: Disable or limit the in-memory ISR cache:
const nextConfig: NextConfig = {
  cacheMaxMemorySize: 0, // Disable in-memory ISR cache
};
Step 3: Profile memory in a staging pod:
# Port-forward to a pod
kubectl port-forward pod/myapp-xxx 9229:9229 -n myapp
# Start the pod with inspect flag
# Add to deployment env:
# NODE_OPTIONS: "--inspect=0.0.0.0:9229 --max-old-space-size=384"
# Connect Chrome DevTools to chrome://inspect
Step 4: If the leak cannot be found, a pragmatic solution is to configure Kubernetes to restart pods periodically:
livenessProbe:
  httpGet:
    path: /api/health
    port: 3000
  initialDelaySeconds: 10
  periodSeconds: 30
  failureThreshold: 3
Combined with a health endpoint that tracks uptime:
// app/api/health/route.ts
import { NextResponse } from "next/server";

const startTime = Date.now();
const MAX_UPTIME_MS = 12 * 60 * 60 * 1000; // 12 hours

export async function GET() {
  const uptime = Date.now() - startTime;
  if (uptime > MAX_UPTIME_MS) {
    return NextResponse.json(
      { status: "restart-requested", uptime },
      { status: 503 }
    );
  }
  return NextResponse.json({ status: "ok", uptime });
}
Issue 5: Requests Failing During Deployments (Graceful Shutdown)
Symptoms
During a rolling deployment, some requests return 502 or connection reset errors.
Root Cause
When Kubernetes sends SIGTERM to a pod, the Node.js process begins shutting down. But the Kubernetes Service may still route new requests to the terminating pod for a brief window before the endpoint is removed from the load balancer.
Fix
Three changes are required:
1. Add a preStop lifecycle hook to delay the SIGTERM:
lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "sleep 5"]
This gives Kubernetes 5 seconds to update the endpoint list before the application starts shutting down.
2. Handle SIGTERM gracefully in the Node.js process. The Next.js standalone server handles this by default, but if we have a custom server, we need to close connections properly:
// Custom server example (if not using the default standalone server)
// `server` is the http.Server returned by app.listen()
process.on("SIGTERM", () => {
  console.log("SIGTERM received, closing server...");
  server.close(() => {
    console.log("Server closed");
    process.exit(0);
  });
  // Force shutdown after 25 seconds (before terminationGracePeriodSeconds);
  // unref() so the timer does not keep the process alive on a clean close
  setTimeout(() => {
    console.log("Forcing shutdown");
    process.exit(1);
  }, 25000).unref();
});
3. Set maxUnavailable: 0 in the rolling update strategy to ensure new pods are ready before old ones are terminated.
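In the deployment manifest, that strategy looks like this (a surge of 1 is a common choice; tune it to your capacity):

```yaml
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 0 # never remove an old pod before its replacement is Ready
    maxSurge: 1       # allow one extra pod during the rollout
```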
Conclusion
These five issues account for the majority of problems we see when teams move Next.js to Kubernetes. The fixes are straightforward once we understand the root causes. Missing static assets and sharp binaries are Dockerfile issues. ISR cache inconsistency requires a shared cache backend. Memory leaks need monitoring and heap limits. Graceful shutdown requires a preStop hook and proper SIGTERM handling. With these solutions in place, Next.js runs reliably on Kubernetes in production.