Every Next.js application will at some point ship a broken deploy. The question is not whether rollback will be needed, but how long it takes, how much traffic it affects, and whether the process is reliable enough to execute under pressure at 2 AM. This article covers the rollback strategies available for Next.js in order of increasing reliability, and the trade-offs of each.
Why Rollback Is Harder Than It Looks
A Next.js application in production has more state than its own code. A rollback that only reverts the application binary while leaving other state in place can produce a system that is broken in a different way than the original failure.
The state that a rollback must account for:
| State type | Examples | Rollback concern |
|---|---|---|
| Application code | Next.js bundle, server code | The obvious target |
| Environment variables | Feature flags, API keys | Old code may expect different keys |
| Database schema | Migrations run by the new deploy | Old code may not work with new schema |
| Cache contents | Full Route Cache, Data Cache | May contain data shaped for new code |
| Client-side cache | Browser Router Cache | Users may have cached the broken version |
A deployment process that does not account for all five can produce silent failures after rollback that are harder to diagnose than the original problem.
Strategy 1 — Manual Redeploy of Previous Version
The simplest rollback: identify the last working image tag or commit, build it again, and deploy it.
Procedure (CI/CD via GitLab CI / GitHub Actions):
```bash
# Find the last working tag
git log --oneline -10

# Trigger a deploy of a specific commit
git checkout <commit-hash>
git tag v1.2.3-rollback
git push origin v1.2.3-rollback
# CI pipeline picks up the tag and deploys
```
What this does well: Works on any infrastructure. No special tooling required.
What this does poorly: Time-to-recovery depends on how long the build takes. If builds take 8 minutes, you have at least 8 minutes of broken production before the rollback is live. Traffic continues hitting the broken version during the build.
When to use: Development environments, staging, or production deployments with very low traffic where 5-10 minutes of downtime is acceptable.
Strategy 2 — Tag-Based Image Rollback
If your CI pipeline produces and pushes Docker images tagged with the commit SHA or version, rollback is a matter of updating the running image without a build step.
```bash
# Kubernetes deployment update
kubectl set image deployment/nextjs-app \
  nextjs=registry.example.com/nextjs-app:v1.2.2 \
  --namespace=production

# Watch the rollout
kubectl rollout status deployment/nextjs-app --namespace=production
```
Kubernetes performs a rolling update by default — it brings up pods running the old image before terminating pods running the broken image, so traffic continues flowing throughout.
Time to recovery: 60-120 seconds, depending on pod startup time and the readinessProbe configuration.
Prerequisite: Images must be retained in the registry. A registry cleanup policy that deletes images older than N days will eventually remove the image you need. Retain at least the last 10 production images by tag.
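That retention rule can be enforced explicitly by a registry cleanup job. A minimal sketch of the selection logic — the SemVer-style tag scheme (`v1.2.3`) is an assumption; for SHA-based tags, sort on push date instead:

```typescript
// Sketch: given the tags currently in the registry, pick which to retain.
// Assumes SemVer-ish tags like "v1.2.3"; parsing is deliberately simple.
function tagsToRetain(tags: string[], keep = 10): string[] {
  const version = (t: string) => t.replace(/^v/, "").split(".").map(Number);
  return [...tags]
    .sort((a, b) => {
      const [a1 = 0, a2 = 0, a3 = 0] = version(a);
      const [b1 = 0, b2 = 0, b3 = 0] = version(b);
      return a1 - b1 || a2 - b2 || a3 - b3;
    })
    .slice(-keep); // keep the newest N; everything else is eligible for cleanup
}
```

Note that a plain lexical sort would get this wrong (`v1.2.10` sorts before `v1.2.2` as a string), which is why the comparator parses the numeric components.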
```yaml
# Kubernetes Deployment — key rollback-related settings
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # Never take a pod down before the new one is ready
      maxSurge: 1         # Allow one extra pod during the update
  template:
    spec:
      containers:
        - name: nextjs
          image: registry.example.com/nextjs-app:v1.2.3
          readinessProbe:
            httpGet:
              path: /api/health
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 3
```
With maxUnavailable: 0, Kubernetes will not terminate any running pod until the replacement passes its readiness probe. A broken deploy that makes the health check fail will halt the rollout — old pods stay up, and the deployment can be rolled back with:
```bash
kubectl rollout undo deployment/nextjs-app --namespace=production
```
This is the single most important Kubernetes configuration for Next.js deployments. Without a readiness probe tied to an actual application health check, Kubernetes considers a pod ready as soon as it starts — even if the application is crashing or returning 500s.
Strategy 3 — Blue/Green Deployment
Blue/green maintains two complete environments. At any given time, one environment (blue) is live and the other (green) is idle. Deployments go to green. Traffic is switched at the load balancer level. Rollback is a traffic switch, not a pod operation.
```yaml
# Simplified Kubernetes Service switching
apiVersion: v1
kind: Service
metadata:
  name: nextjs-app
spec:
  selector:
    app: nextjs-app
    slot: blue   # Change to "green" to switch traffic
  ports:
    - port: 80
      targetPort: 3000
```
Switching from green back to blue is a single kubectl patch:
```bash
kubectl patch service nextjs-app \
  -p '{"spec":{"selector":{"slot":"blue"}}}' \
  --namespace=production
```
Time to recovery: Under 10 seconds. The switch is atomic at the load balancer level — no pod restart, no build, no rolling update.
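The switch is simple enough to script. A minimal sketch of building the selector patch before shelling out to `kubectl` — the slot names mirror the Service manifest above; the surrounding automation is left out:

```typescript
// Sketch: build the JSON patch that flips the Service selector between slots.
// Constraining the input to the two known slot names prevents a typo from
// pointing production traffic at a selector that matches no pods.
type Slot = "blue" | "green";

function slotPatch(slot: Slot): string {
  return JSON.stringify({ spec: { selector: { slot } } });
}
```

`slotPatch("blue")` produces exactly the payload passed to `kubectl patch` above, so the same rollback runbook can be executed by a script or a human.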
Cost: Requires double the compute to keep both environments running simultaneously.
What this does not solve: Database schema changes. If the new deploy ran a migration that is not backward-compatible with the old code, switching back to blue will produce database errors. This is the primary reason to prefer additive migrations (add a column, keep the old one) over destructive ones (drop-and-recreate) in continuously deployed applications.
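The additive ("expand/contract") pattern can be made concrete with a pair of hypothetical statements — the table and column names here are invented for illustration:

```typescript
// Hypothetical expand/contract migration pair for a column rename.

// "Expand" is additive: safe to deploy before the new code, and safe to roll
// back across, because old code never references the new column.
const expand = `ALTER TABLE users ADD COLUMN display_name TEXT`;

// "Contract" is destructive: run it only after no deployed version still reads
// the old column — i.e., at least one release after the rename ships.
const contract = `ALTER TABLE users DROP COLUMN full_name`;
```

Splitting the migration this way means a blue/green switch back to the old code always lands on a schema the old code understands.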
Strategy 4 — Canary Rollback
Canary deployments send a percentage of traffic to the new version while the majority continues on the old version. If the canary shows errors, the traffic percentage is reduced to zero — equivalent to a rollback — without any pod operation on the stable version.
With Kubernetes and an ingress controller that supports traffic splitting (NGINX ingress with nginx.ingress.kubernetes.io/canary annotations, or Argo Rollouts):
```yaml
# Canary ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nextjs-app-canary
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "10"
spec:
  rules:
    - host: example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: nextjs-app-canary
                port:
                  number: 80
```
Setting canary-weight: "0" removes the canary from the traffic pool. The stable deployment continues without interruption.
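The weight itself is usually driven by a controller rather than edited by hand. A sketch of the decision step such a controller might run each interval — the threshold and ramp schedule are assumptions, not values from any particular tool:

```typescript
// Sketch: a progressive-delivery step function. If the canary's error rate
// exceeds the threshold, drop its weight to 0 (the instant rollback described
// above); otherwise ramp toward 100% in doubling steps.
function nextCanaryWeight(
  current: number,
  errorRate: number,
  threshold = 0.01 // 1% errors — tune to your traffic and SLO
): number {
  if (errorRate > threshold) return 0; // remove the canary from the traffic pool
  if (current === 0) return 5;         // re-enter at a small initial weight
  return Math.min(current * 2, 100);   // 5 → 10 → 20 → 40 → 80 → 100
}
```

Tools like Argo Rollouts implement this loop for you; the value of seeing it spelled out is that the rollback path is just "return 0" — there is no operation on the stable deployment at all.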
Time to recovery: Under 5 seconds.
When to use: High-traffic production environments where a broken deploy affecting even 1% of users is significant. Canary is the deployment strategy that catches the most regressions before they affect all users.
Health Check Endpoint
All rollback strategies above depend on a reliable health check. The Next.js application must expose an endpoint that returns a non-200 status when it is not ready to serve traffic.
```typescript
// app/api/health/route.ts
import { NextResponse } from "next/server";
import { db } from "@/lib/db"; // Prisma client — adjust the import path to your project

export async function GET() {
  try {
    // Check database connectivity with a trivial query
    await db.$queryRaw`SELECT 1`;
    return NextResponse.json({ status: "ok" }, { status: 200 });
  } catch (error) {
    return NextResponse.json(
      { status: "error", message: String(error) },
      { status: 503 }
    );
  }
}
```
A health check that always returns 200 is useless for Kubernetes readiness probes. The probe must verify that the application can actually serve requests — which means checking any critical dependency (database, cache) that would cause user-visible failures if unavailable.
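The probe must also be fast: a dependency check that hangs on a dead connection stalls until kubelet's own probe timeout, which is the slowest possible way to fail. One way to bound it — a sketch not tied to any particular ORM, with the timeout value as an assumption (keep it below the probe's configured timeout):

```typescript
// Sketch: bound an async dependency check so a hung connection fails the
// probe quickly instead of stalling until the kubelet gives up.
async function withTimeout<T>(check: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`health check timed out after ${ms}ms`)),
      ms
    );
  });
  try {
    return await Promise.race([check, timeout]);
  } finally {
    clearTimeout(timer); // avoid leaking the timer when the check wins the race
  }
}
```

In the route above this would wrap the connectivity query, e.g. `await withTimeout(db.$queryRaw\`SELECT 1\`, 2000)`, so a wedged database pool turns into a prompt 503 rather than a silent stall.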
Choosing a Strategy
| Strategy | Time to recovery | Requires Kubernetes | Zero downtime | Handles schema changes |
|---|---|---|---|---|
| Manual redeploy | 5-15 minutes | No | No | Depends |
| Image rollback | 60-120 seconds | Recommended | Yes (with readiness probe) | No |
| Blue/green | Under 10 seconds | Yes | Yes | No |
| Canary | Under 5 seconds | Yes | Yes | Partially (stable pods never change, but a shared schema migration still applies) |
For most Next.js applications, image rollback on Kubernetes with a properly configured readiness probe covers the majority of failure scenarios. Blue/green or canary is worth the additional complexity for applications deploying multiple times per day or serving high-value traffic where even a 60-second degradation is unacceptable.
Private DevOps configures image-based rollback with readiness probes and kubectl rollout undo as the baseline for every Next.js Kubernetes deployment, with canary deployments available for teams that want sub-10-second recovery. The configuration is documented in runbooks so any engineer on the team can execute a rollback without prior platform knowledge.