Introduction
Running Next.js on Kubernetes works well once it is configured correctly. But getting to that point usually means hitting a series of frustrating issues that are unique to the combination of Next.js, Docker containers, and Kubernetes orchestration. We have encountered and resolved these issues across dozens of client deployments. Here are the five most common problems and their fixes.
Issue 1: Missing CSS and Static Assets with Standalone Output
Symptoms
The application loads but without any CSS styling. The browser console shows 404 errors for /_next/static/* files.
Root Cause
When output: "standalone" is set in next.config.ts, the build output in .next/standalone does not include the .next/static directory or the public directory. These must be copied manually in the Dockerfile.
Fix
Ensure the Dockerfile copies both directories into the final image:
# In the runner stage
COPY --from=builder /app/.next/standalone ./
COPY --from=builder /app/.next/static ./.next/static
COPY --from=builder /app/public ./public
The order matters. The standalone directory is copied first, then static files are placed inside .next/static where the server expects them.
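For context, those COPY lines sit inside a multi-stage build. Here is a minimal end-to-end sketch, simplified to two stages and assuming npm and the node:22-alpine base image used elsewhere in this article; adjust the package manager, base image, and hardening (non-root USER, --chown flags) to your project:

```dockerfile
FROM node:22-alpine AS builder
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
RUN npm run build

FROM node:22-alpine AS runner
WORKDIR /app
ENV NODE_ENV=production
# Make the standalone server listen on all interfaces inside the container
ENV HOSTNAME=0.0.0.0
ENV PORT=3000
COPY --from=builder /app/.next/standalone ./
COPY --from=builder /app/.next/static ./.next/static
COPY --from=builder /app/public ./public
EXPOSE 3000
# The standalone output ships its own minimal server entry point
CMD ["node", "server.js"]
```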
Verification
After building and running the container locally:
docker build -t myapp:test .
docker run -p 3000:3000 myapp:test
# Open http://localhost:3000 and check that CSS loads
Issue 2: Image Optimization Failing in Containers
Symptoms
The Next.js Image component returns 500 errors or extremely slow responses. Logs show errors related to sharp not being found or incompatible binaries.
Root Cause
Next.js uses sharp for server-side image optimization. The sharp binary is platform-specific and must match the container's OS and architecture. If node_modules is copied in from the host (for example, a macOS or Windows machine via a missing .dockerignore entry), or built for a different architecture (arm64 on Apple silicon versus x64 in the cluster), the sharp binary will not work in the Linux container.
Fix
Option A: Install sharp in the container explicitly:
# In the deps stage, ensure sharp is installed for Linux
FROM node:22-alpine AS deps
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
# Force sharp to install platform-specific binaries
# (with sharp 0.33+ / npm 9.4+, use --os=linux --cpu=x64 instead,
# plus --libc=musl for Alpine images)
RUN npm install sharp --platform=linux --arch=x64
Option B: Use an external image optimization service and disable the built-in optimizer:
// next.config.ts
import type { NextConfig } from "next";

const nextConfig: NextConfig = {
  output: "standalone",
  images: {
    loader: "custom",
    loaderFile: "./lib/image-loader.ts",
  },
};

export default nextConfig;
// lib/image-loader.ts
export default function cloudflareLoader({
  src,
  width,
  quality,
}: {
  src: string;
  width: number;
  quality?: number;
}) {
  const params = [`width=${width}`, `quality=${quality || 75}`, "format=auto"];
  return `/cdn-cgi/image/${params.join(",")}/${src}`;
}
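As a quick sanity check, the loader can be exercised outside Next.js to confirm the URL shape Cloudflare's /cdn-cgi/image/ endpoint expects (the quality default of 75 is carried over from the loader above):

```typescript
// Standalone copy of the loader, for checking its output format
function cloudflareLoader({
  src,
  width,
  quality,
}: {
  src: string;
  width: number;
  quality?: number;
}) {
  const params = [`width=${width}`, `quality=${quality || 75}`, "format=auto"];
  return `/cdn-cgi/image/${params.join(",")}/${src}`;
}

console.log(cloudflareLoader({ src: "hero.png", width: 640 }));
// → /cdn-cgi/image/width=640,quality=75,format=auto/hero.png
```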
Option C: Skip optimization entirely for CDN-served images:
const nextConfig: NextConfig = {
  output: "standalone",
  images: {
    unoptimized: true,
  },
};
Issue 3: ISR (Incremental Static Regeneration) Not Working Across Pods
Symptoms
Pages regenerated by ISR show stale content on some requests. Different pods serve different versions of the same page.
Root Cause
By default, Next.js stores the ISR cache on the local filesystem of each pod. Since Kubernetes pods have independent filesystems, each pod maintains its own cache. When pod A regenerates a page, pods B and C still serve the old version until they independently regenerate.
Fix
Use a shared cache handler. Next.js (14.1 and later) supports custom cache handlers that can use Redis or another external store. The handler file is loaded by the standalone server at runtime, so it must be plain JavaScript (or compiled ahead of time):
// cache-handler.js
const Redis = require("ioredis");

const redis = new Redis(process.env.REDIS_URL);

module.exports = class RedisCacheHandler {
  async get(key) {
    const data = await redis.get(`next-cache:${key}`);
    return data ? JSON.parse(data) : null;
  }

  async set(key, data, ctx) {
    const ttl = ctx.revalidate || 3600;
    await redis.setex(`next-cache:${key}`, ttl, JSON.stringify(data));
    // Index keys by tag so revalidateTag can target them later
    for (const tag of ctx.tags || []) {
      await redis.sadd(`next-tag:${tag}`, key);
    }
  }

  async revalidateTag(tags) {
    const tagArray = Array.isArray(tags) ? tags : [tags];
    for (const tag of tagArray) {
      // Delete only the keys indexed under this tag, not the whole cache
      const keys = await redis.smembers(`next-tag:${tag}`);
      for (const key of keys) {
        await redis.del(`next-cache:${key}`);
      }
      await redis.del(`next-tag:${tag}`);
    }
  }
};
Configure it in next.config.ts. The handler path must resolve at runtime, so resolve it to an absolute path:
const nextConfig: NextConfig = {
  output: "standalone",
  cacheHandler: require.resolve("./cache-handler.js"),
  cacheMaxMemorySize: 0, // Disable in-memory cache
};
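The handler reads REDIS_URL from the environment, so the deployment has to provide it — typically from a Secret. A sketch (the Secret name and key here are hypothetical):

```yaml
containers:
  - name: myapp
    env:
      - name: REDIS_URL
        valueFrom:
          secretKeyRef:
            name: myapp-redis # hypothetical Secret holding the connection URL
            key: url
```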
Issue 4: Memory Leaks in Node.js Pods
Symptoms
Pods are OOMKilled periodically. Memory usage climbs steadily over hours or days. The Kubernetes HPA keeps scaling up pods without the load increasing.
Root Cause
Common sources of memory leaks in Next.js on Kubernetes:
- Unbounded in-memory caches (ISR cache growing without eviction).
- Server Components holding references to large data structures.
- Event listeners not being cleaned up in API routes.
- Third-party libraries with memory leaks.
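The event-listener case is easy to reproduce: registering a process-level listener inside a request handler adds a new listener on every request, and each one pins its closure in memory. A minimal illustration in plain Node (not a real route handler):

```typescript
// Leaky pattern: the handler registers a process-level listener on
// every invocation, so listeners accumulate across requests.
function leakyHandler() {
  process.on("SIGTERM", () => {
    /* cleanup work */
  });
}

// Simulate 1000 requests
for (let i = 0; i < 1000; i++) leakyHandler();

console.log(process.listenerCount("SIGTERM")); // grows linearly with requests
```

Node even emits a MaxListenersExceededWarning once the count passes 10, which is a useful early signal in pod logs. The fix is to register the listener once at module scope, or remove it when the request completes.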
Fix
Step 1: Set memory limits in the Kubernetes deployment and configure Node.js heap:
containers:
  - name: myapp
    resources:
      requests:
        memory: 256Mi
      limits:
        memory: 512Mi
    env:
      - name: NODE_OPTIONS
        value: "--max-old-space-size=384"
Set max-old-space-size to about 75% of the memory limit to leave room for non-heap memory.
Step 2: Disable or limit the in-memory ISR cache:
const nextConfig: NextConfig = {
  cacheMaxMemorySize: 0, // Disable in-memory ISR cache
};
Step 3: Profile memory in a staging pod:
# Port-forward to a pod
kubectl port-forward pod/myapp-xxx 9229:9229 -n myapp
# Start the pod with inspect flag
# Add to deployment env:
# NODE_OPTIONS: "--inspect=0.0.0.0:9229 --max-old-space-size=384"
# Connect Chrome DevTools to chrome://inspect
Step 4: If the leak cannot be found, a pragmatic solution is to configure Kubernetes to restart pods periodically:
livenessProbe:
  httpGet:
    path: /api/health
    port: 3000
  initialDelaySeconds: 10
  periodSeconds: 30
  failureThreshold: 3
Combined with a health endpoint that tracks uptime:
// app/api/health/route.ts
import { NextResponse } from "next/server";

const startTime = Date.now();
const MAX_UPTIME_MS = 12 * 60 * 60 * 1000; // 12 hours

export async function GET() {
  const uptime = Date.now() - startTime;
  if (uptime > MAX_UPTIME_MS) {
    return NextResponse.json(
      { status: "restart-requested", uptime },
      { status: 503 }
    );
  }
  return NextResponse.json({ status: "ok", uptime });
}
Issue 5: Requests Failing During Deployments (Graceful Shutdown)
Symptoms
During a rolling deployment, some requests return 502 or connection reset errors.
Root Cause
When Kubernetes sends SIGTERM to a pod, the Node.js process begins shutting down. But the Kubernetes Service may still route new requests to the terminating pod for a brief window before the endpoint is removed from the load balancer.
Fix
Three changes are required:
1. Add a preStop lifecycle hook to delay the SIGTERM:
lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "sleep 5"]
This gives Kubernetes 5 seconds to update the endpoint list before the application starts shutting down.
2. Handle SIGTERM gracefully in the Node.js process. The Next.js standalone server handles this by default, but if we have a custom server, we need to close connections properly:
// Custom server example (if not using the default standalone server)
// `server` is the http.Server returned by app.listen()
process.on("SIGTERM", () => {
  console.log("SIGTERM received, closing server...");
  server.close(() => {
    console.log("Server closed");
    process.exit(0);
  });
  // Force shutdown after 25 seconds (before terminationGracePeriodSeconds);
  // unref() so the timer does not keep the process alive on a clean close
  setTimeout(() => {
    console.log("Forcing shutdown");
    process.exit(1);
  }, 25000).unref();
});
3. Set maxUnavailable: 0 in the rolling update strategy to ensure new pods are ready before old ones are terminated.
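In the deployment manifest, that strategy looks like this (a surge of 1 is a common choice; tune it to your capacity):

```yaml
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 0 # never remove an old pod before its replacement is Ready
    maxSurge: 1       # allow one extra pod during the rollout
```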
Conclusion
These five issues account for the majority of problems we see when teams move Next.js to Kubernetes. The fixes are straightforward once we understand the root causes. Missing static assets and sharp binaries are Dockerfile issues. ISR cache inconsistency requires a shared cache backend. Memory leaks need monitoring and heap limits. Graceful shutdown requires a preStop hook and proper SIGTERM handling. With these solutions in place, Next.js runs reliably on Kubernetes in production.