Introduction
We manage dozens of Next.js applications for our clients. Over time, we have developed a deployment pipeline that is fast, reliable, and achieves true zero-downtime deployments on K3s clusters. This article describes our actual production setup: from code push to live traffic, with every step automated.
This is not a theoretical guide. This is what we run today.
Architecture Overview
Developer pushes code
|
v
Deploy script (local machine or CI)
|
v
rsync source to build server
|
v
Docker build on remote server
|
v
kubectl set image (rolling update on K3s)
|
v
Cloudflare cache purge
|
v
Traffic served via Cloudflare Tunnel --> K3s --> Next.js pods
1. The Deploy Script
We use a single bash script that handles the entire deployment. No CI/CD platform required (though it integrates with one if needed):
#!/usr/bin/env bash
set -euo pipefail
# Configuration
APP_NAME="myapp"
REMOTE_HOST="build.example.com"
REMOTE_DIR="/opt/builds/$APP_NAME"
REGISTRY="registry.example.com"
K8S_NAMESPACE="myapp"
# Secrets come from the environment, never from the script itself
CF_ZONE_ID="${CF_ZONE_ID:?set CF_ZONE_ID in the environment}"
CF_API_TOKEN="${CF_API_TOKEN:?set CF_API_TOKEN in the environment}"
# Generate tag from git
TAG=$(git rev-parse --short HEAD)
IMAGE="$REGISTRY/$APP_NAME:$TAG"
echo "==> Syncing source to build server..."
rsync -az --delete \
--exclude='.git' \
--exclude='node_modules' \
--exclude='.next' \
./ "$REMOTE_HOST:$REMOTE_DIR/"
echo "==> Building Docker image remotely..."
ssh "$REMOTE_HOST" "cd $REMOTE_DIR && \
docker build \
--build-arg NEXT_PUBLIC_SITE_URL=https://myapp.example.com \
-t $IMAGE \
-t $REGISTRY/$APP_NAME:latest . && \
docker push $IMAGE && \
docker push $REGISTRY/$APP_NAME:latest"
echo "==> Updating Kubernetes deployment..."
ssh "$REMOTE_HOST" "kubectl set image \
deployment/$APP_NAME \
$APP_NAME=$IMAGE \
-n $K8S_NAMESPACE"
echo "==> Waiting for rollout..."
ssh "$REMOTE_HOST" "kubectl rollout status \
deployment/$APP_NAME \
-n $K8S_NAMESPACE \
--timeout=300s"
echo "==> Purging Cloudflare cache..."
curl -fsS -X POST \
"https://api.cloudflare.com/client/v4/zones/$CF_ZONE_ID/purge_cache" \
-H "Authorization: Bearer $CF_API_TOKEN" \
-H "Content-Type: application/json" \
--data '{"purge_everything":true}'
echo "==> Deployment of $TAG complete!"
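Because `kubectl rollout status` aborts the script on a failed rollout, rolling back is the natural next step. Kubernetes keeps the previous ReplicaSet around, so reverting is one command. The helper below is our own sketch (the `rollback` function and its arguments are hypothetical, not part of the script above):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical rollback helper: revert a deployment to its previous
# ReplicaSet and wait for the rollout to settle.
rollback() {
  local app="$1"
  local ns="${2:-$1}"   # namespace defaults to the app name, as in our manifests
  kubectl rollout undo "deployment/$app" -n "$ns"
  kubectl rollout status "deployment/$app" -n "$ns" --timeout=300s
}

# Only act when invoked with an app name, e.g.: ./rollback.sh myapp
if [ -n "${1:-}" ]; then
  rollback "$1" "${2:-}"
fi
```

Since rolling back flips the pods to the previously deployed image, the Cloudflare cache should be purged again afterward, exactly as the deploy script does.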
Why rsync Instead of Git Clone?
- Speed: rsync only transfers changed files. A typical deployment syncs a few MB instead of cloning the entire repository.
- No Git dependency on the build server.
- Works with uncommitted changes during development (we do not do this in production, but it is useful for staging).
Why Build Remotely?
Building Docker images on a developer's laptop is slow, especially on macOS, where Docker Desktop runs everything inside a VM. Our build server is a dedicated Linux machine with fast SSDs and native Docker, and it builds the same images 3-5x faster.
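The `docker build` in the deploy script assumes a Dockerfile at the repo root. Ours is a fairly standard multi-stage build; the sketch below is a minimal version, assuming `output: "standalone"` in next.config.js, npm as the package manager, and a `node:20-alpine` base image (all assumptions, adjust to taste):

```dockerfile
# Build stage: install dependencies and compile the app
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
ARG NEXT_PUBLIC_SITE_URL
ENV NEXT_PUBLIC_SITE_URL=$NEXT_PUBLIC_SITE_URL
RUN npm run build

# Runtime stage: only the standalone server output, no dev dependencies
FROM node:20-alpine
WORKDIR /app
ENV NODE_ENV=production
COPY --from=builder /app/.next/standalone ./
COPY --from=builder /app/.next/static ./.next/static
COPY --from=builder /app/public ./public
EXPOSE 3000
CMD ["node", "server.js"]
```

The standalone output keeps the runtime image small, which is most of why the image push stage stays in the 10-30 second range.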
2. K3s Cluster Configuration
Our K3s setup is deliberately simple:
# Install K3s without the default ingress controller
curl -sfL https://get.k3s.io | sh -s - \
--disable traefik \
--disable servicelb \
--write-kubeconfig-mode 644
We disable Traefik and the default service load balancer because all traffic arrives through Cloudflare Tunnel. There is no need for an ingress controller or external load balancer.
Deployment Manifest
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp
namespace: myapp
spec:
replicas: 3
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
selector:
matchLabels:
app: myapp
template:
metadata:
labels:
app: myapp
spec:
terminationGracePeriodSeconds: 30
containers:
- name: myapp
image: registry.example.com/myapp:latest
ports:
- containerPort: 3000
env:
- name: NODE_ENV
value: "production"
- name: NODE_OPTIONS
value: "--max-old-space-size=384"
resources:
requests:
cpu: 200m
memory: 256Mi
limits:
cpu: "1"
memory: 512Mi
readinessProbe:
httpGet:
path: /api/health
port: 3000
initialDelaySeconds: 5
periodSeconds: 10
livenessProbe:
httpGet:
path: /api/health
port: 3000
initialDelaySeconds: 15
periodSeconds: 30
lifecycle:
preStop:
exec:
command: ["/bin/sh", "-c", "sleep 5"]
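For the cloudflared tunnel (next section) to reach these pods, the Deployment needs a matching Service. A minimal ClusterIP sketch (the port numbers are our convention, not a requirement):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: myapp
  namespace: myapp
spec:
  type: ClusterIP
  selector:
    app: myapp
  ports:
    - name: http
      port: 80
      targetPort: 3000
```

During a rolling update, the Service's endpoints are what the readiness probe gates: a new pod receives traffic only after /api/health succeeds, and the preStop sleep gives in-flight requests time to drain before the old pod exits.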
3. Cloudflare Integration
Cloudflare Tunnel
We run cloudflared as a Kubernetes deployment with 2 replicas for redundancy:
apiVersion: apps/v1
kind: Deployment
metadata:
name: cloudflared
namespace: myapp
spec:
replicas: 2
selector:
matchLabels:
app: cloudflared
template:
metadata:
labels:
app: cloudflared
spec:
containers:
- name: cloudflared
image: cloudflare/cloudflared:latest
args:
- tunnel
- --no-autoupdate
- run
- --token
- $(TUNNEL_TOKEN)
env:
- name: TUNNEL_TOKEN
valueFrom:
secretKeyRef:
name: cf-tunnel-token
key: token
resources:
requests:
cpu: 50m
memory: 64Mi
limits:
cpu: 200m
memory: 128Mi
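With a dashboard-managed tunnel (the `--token` flow above), the hostname-to-service routing lives in the Cloudflare dashboard. For a locally managed tunnel, the equivalent config.yml would look roughly like this, assuming a ClusterIP Service named `myapp` exposing port 80 in the `myapp` namespace (tunnel ID and hostname are placeholders):

```yaml
# config.yml for a locally managed tunnel (sketch)
tunnel: <tunnel-id>
credentials-file: /etc/cloudflared/<tunnel-id>.json
ingress:
  - hostname: myapp.example.com
    service: http://myapp.myapp.svc.cluster.local:80
  - service: http_status:404   # catch-all rule, must come last
```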
CDN Caching Rules
We configure Cloudflare to cache static assets aggressively but bypass the cache for API routes and dynamic pages:
- /_next/static/*: Cache for 1 year (immutable hashed filenames).
- /images/*, /fonts/*: Cache for 30 days.
- /api/*: Bypass cache.
- Everything else: Cache for 1 hour with stale-while-revalidate.
Automatic Cache Purge After Deploy
The deploy script purges the entire Cloudflare cache after a successful rollout. For targeted purging (note that purge by prefix requires a Cloudflare Enterprise plan), we can purge by prefix:
curl -s -X POST \
"https://api.cloudflare.com/client/v4/zones/$CF_ZONE_ID/purge_cache" \
-H "Authorization: Bearer $CF_API_TOKEN" \
-H "Content-Type: application/json" \
--data '{"prefixes":["myapp.example.com/_next/"]}'
4. Monitoring with Prometheus
We run Prometheus and Grafana on the K3s cluster to monitor application and cluster health.
Key Metrics We Track
- Request latency (P50, P95, P99) via Node.js metrics
- Pod CPU and memory usage via kube-state-metrics
- Pod restart count (indicates OOM kills or crash loops)
- HTTP error rates (5xx responses)
- Deployment rollout duration
ServiceMonitor for Next.js
We expose metrics from the Next.js application using a custom endpoint:
// app/api/metrics/route.ts
// Hand-rolled Prometheus text format to avoid a dependency. Note that this
// counter is per pod, resets on restart, and only counts scrapes of this
// endpoint; for labeled request metrics (status codes, routes), a client
// library such as prom-client is a better fit.
import { NextResponse } from "next/server";

// Opt out of static optimization so every scrape returns fresh values.
export const dynamic = "force-dynamic";

let requestCount = 0;

export async function GET() {
  requestCount++;
  const memUsage = process.memoryUsage();
  const metrics = [
    `# HELP nodejs_heap_used_bytes Node.js heap used`,
    `# TYPE nodejs_heap_used_bytes gauge`,
    `nodejs_heap_used_bytes ${memUsage.heapUsed}`,
    `# HELP nodejs_heap_total_bytes Node.js heap total`,
    `# TYPE nodejs_heap_total_bytes gauge`,
    `nodejs_heap_total_bytes ${memUsage.heapTotal}`,
    `# HELP http_requests_total Total HTTP requests`,
    `# TYPE http_requests_total counter`,
    `http_requests_total ${requestCount}`,
  ].join("\n");
  // The exposition format expects a trailing newline after the last sample.
  return new NextResponse(metrics + "\n", {
    headers: { "Content-Type": "text/plain; version=0.0.4" },
  });
}
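The heading above promises a ServiceMonitor; assuming the Prometheus Operator CRDs are installed and the app's Service exposes a named port (we assume `http` here), the manifest is short:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp
  namespace: myapp
spec:
  selector:
    matchLabels:
      app: myapp
  endpoints:
    - port: http          # named port on the Service (assumption)
      path: /api/metrics
      interval: 30s
```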
Alerting Rules
groups:
- name: nextjs-alerts
rules:
- alert: HighMemoryUsage
expr: container_memory_usage_bytes{container="myapp"} / container_spec_memory_limit_bytes{container="myapp"} > 0.85
for: 5m
labels:
severity: warning
annotations:
summary: "Next.js pod memory above 85%"
- alert: PodRestartLoop
expr: increase(kube_pod_container_status_restarts_total{container="myapp"}[1h]) > 3
for: 0m
labels:
severity: critical
annotations:
summary: "Next.js pod restarting frequently"
- alert: HighErrorRate
expr: rate(http_requests_total{status=~"5.."}[5m]) / rate(http_requests_total[5m]) > 0.05
for: 2m
labels:
severity: critical
annotations:
summary: "Error rate above 5%"
5. Results
With this setup, our deployments typically complete in under 3 minutes from script execution to live traffic:
- rsync: 5-15 seconds
- Docker build: 60-120 seconds (cached layers)
- Image push: 10-30 seconds
- Rolling update: 30-60 seconds
- Cache purge: 2 seconds
Zero requests are dropped during the rolling update thanks to the readiness probe, preStop hook, and maxUnavailable: 0 configuration.
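Both the probes and the zero-drop behavior lean on the /api/health endpoint, which we have not shown. It can be trivial; a sketch using the standard Response object (route handlers need no Next.js-specific imports for this):

```typescript
// app/api/health/route.ts (sketch)
// Returns 200 once the process can serve requests. The readiness probe
// gates traffic on this response; the liveness probe restarts the pod
// if it stops answering.
export async function GET(): Promise<Response> {
  return new Response(JSON.stringify({ status: "ok" }), {
    status: 200,
    headers: { "Content-Type": "application/json" },
  });
}
```

If the app depends on a database or upstream API, the readiness check is the place to verify those connections; the liveness check should stay dependency-free so a flaky upstream does not trigger restart loops.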
Conclusion
This pipeline has been serving us and our clients reliably for over a year. It is intentionally simple: a bash script, rsync, Docker, K3s, and Cloudflare. There are no complex CI/CD platforms to maintain. Every step is transparent and debuggable. When something goes wrong, we can SSH into the build server and run the commands manually. Simplicity is the ultimate sophistication in production infrastructure.