Hands-on Kubernetes & DevOps — Part 2 of 5
January 22, 2026
Production Observability on Kubernetes: Prometheus, Grafana, and the Three Golden Signals
Instrument a Node.js app, deploy Prometheus and Grafana with Helm, and build dashboards around latency, traffic, and errors.
Quick navigation
- Why Observability Matters Before You Need It
- The Three Golden Signals
- Step 1 — Instrument Your App
- Step 2 — Deploy the Monitoring Stack with Helm
- Step 3 — Wire Prometheus to Your App
- Step 4 — PromQL: Querying Your Metrics
- Step 5 — Build the Dashboard
- The Production Reality Check
- What's in the /metrics Output
- Key Takeaways
- What's Next
Why Observability Matters Before You Need It
There's a rule in SRE: you don't get to be surprised by a production incident twice for the same reason. The first time, you scramble. The second time, you should already have a dashboard that shows you exactly what went wrong before users notice.
Most tutorials show you how to run Prometheus. This one shows you how to wire it to your actual app, write PromQL queries that answer real questions, and build Grafana dashboards you'd actually use during an on-call incident.
Stack: Node.js · prom-client · Prometheus · Grafana · kube-prometheus-stack · Helm · Kubernetes
The Three Golden Signals
Google's SRE book defines four golden signals for monitoring any service. Three of them apply directly to a web API:
Latency — how long are requests taking?
Traffic — how many requests per second?
Errors — what percentage are failing?
The fourth — saturation — is how full your system is (memory, CPU). Every dashboard you build for a production service should answer these four questions at a glance. Everything else is secondary.
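Saturation comes back later in this post as pod memory, and CPU saturation is worth a query of its own. Once the Prometheus stack from Step 2 is running, a minimal PromQL sketch looks like this (it assumes the cAdvisor and kube-state-metrics metrics that kube-prometheus-stack ships by default, and a hypothetical pod name prefix my-api-):

sum(rate(container_cpu_usage_seconds_total{pod=~"my-api-.*", container!=""}[5m]))
/
sum(kube_pod_container_resource_limits{pod=~"my-api-.*", resource="cpu"})

A result close to 1 means the pods are running at their CPU limit and will start getting throttled.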
Step 1 — Instrument Your App
Before Prometheus can scrape your app, your app needs to expose metrics. The standard is a /metrics endpoint that returns data in Prometheus's text format.
In Node.js, prom-client handles this:
npm install prom-client
Add instrumentation to your Express app:
const promClient = require('prom-client');

const register = new promClient.Registry();

// Default metrics — Node.js internals (event loop, memory, CPU, GC)
promClient.collectDefaultMetrics({ register });

// Counter — total requests, labeled by method, route, and status code
const httpRequestCounter = new promClient.Counter({
  name: 'http_requests_total',
  help: 'Total number of HTTP requests',
  labelNames: ['method', 'route', 'status_code'],
  registers: [register],
});

// Histogram — request duration with configurable buckets
// Buckets define the resolution of your latency data
const httpRequestDuration = new promClient.Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request duration in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1],
  registers: [register],
});

// Middleware — automatically records every request
app.use((req, res, next) => {
  const end = httpRequestDuration.startTimer();
  res.on('finish', () => {
    const labels = {
      method: req.method,
      route: req.path,
      status_code: res.statusCode,
    };
    httpRequestCounter.inc(labels);
    end(labels);
  });
  next();
});

// The endpoint Prometheus scrapes
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});
Why a Histogram instead of a Summary?
Histograms store raw bucket counts and let Prometheus calculate percentiles at query time. Summaries calculate percentiles in the app itself and can't be aggregated across multiple instances. With two pods running, a Histogram gives you the p99 across all pods combined. A Summary would give you two separate p99s with no way to merge them. Always use Histograms for latency.
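For contrast, this is roughly what the Summary version would look like. It's a sketch using prom-client's Summary type, not code from the app above; each pod computes its own quantiles locally, which is exactly the aggregation problem described:

// Summary: quantiles are pre-computed inside this single process
const httpRequestDurationSummary = new promClient.Summary({
  name: 'http_request_duration_summary_seconds',
  help: 'HTTP request duration in seconds (per-instance quantiles)',
  labelNames: ['method', 'route', 'status_code'],
  percentiles: [0.5, 0.9, 0.99],  // exported as ready-made quantiles
  registers: [register],
});
// Prometheus can only average these per-pod quantiles, and averaging p99s is not a real p99.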
Verify it works:
curl http://localhost:3000/metrics
You'll see output like:
# HELP http_requests_total Total number of HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",route="/health",status_code="200"} 42
# HELP http_request_duration_seconds HTTP request duration in seconds
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{le="0.001",...} 38
Your app is now speaking Prometheus's language.
Step 2 — Deploy the Monitoring Stack with Helm
The kube-prometheus-stack Helm chart is the industry standard way to run Prometheus on Kubernetes. It bundles Prometheus, Grafana, Alertmanager, and pre-built dashboards for cluster health in a single install.
helm repo add prometheus-community \
https://prometheus-community.github.io/helm-charts
helm repo update
kubectl create namespace monitoring
helm install kube-prometheus-stack \
prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--set prometheus.prometheusSpec.scrapeInterval=15s \
--set grafana.adminPassword=devops123
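If you'd rather keep these settings in Git, the same install works with a values file instead of --set flags. A minimal sketch; the file name is illustrative, and the plaintext password should really live in a secret:

# values-monitoring.yaml (hypothetical file name)
prometheus:
  prometheusSpec:
    scrapeInterval: 15s
grafana:
  adminPassword: devops123   # move this into a secret for real deployments

Pass it with -f values-monitoring.yaml in place of the two --set flags.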
Wait for all pods to reach Running state:
kubectl get pods -n monitoring --watch
You'll see pods for the operator, Prometheus itself, Grafana, Alertmanager, and node-exporter. The operator is key — it watches for ServiceMonitor resources and automatically updates Prometheus's scrape config without restarting anything.
Access Grafana:
kubectl port-forward -n monitoring \
service/kube-prometheus-stack-grafana 3001:80
Open http://localhost:3001 and log in with admin / devops123. You'll find pre-built dashboards for node CPU, memory, pod counts, and Kubernetes internals — all working immediately without any configuration. That's the power of the chart.
Step 3 — Wire Prometheus to Your App
Prometheus doesn't automatically scrape every service in your cluster. You tell it what to scrape using a ServiceMonitor — a custom Kubernetes resource that the operator watches.
First, add a named port to your Service:
apiVersion: v1
kind: Service
metadata:
  name: my-api-service
  labels:
    app: my-api
spec:
  selector:
    app: my-api
  ports:
    - name: http        # ServiceMonitor references this name
      port: 80
      targetPort: 3000
  type: ClusterIP
Then create the ServiceMonitor:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-api-monitor
  namespace: monitoring
  labels:
    release: kube-prometheus-stack   # must match the Helm release name
spec:
  namespaceSelector:
    matchNames:
      - default
  selector:
    matchLabels:
      app: my-api
  endpoints:
    - port: http
      path: /metrics
      interval: 15s
The release: kube-prometheus-stack label is the critical detail most tutorials miss. Without it, the Prometheus operator ignores your ServiceMonitor entirely. The operator only picks up ServiceMonitors that match its label selector — which defaults to the Helm release name.
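If you want to double-check which label the operator is looking for, it's recorded on the Prometheus custom resource itself. A sketch; the resource comes from the chart, so the output depends on your release name:

kubectl get prometheus -n monitoring \
  -o jsonpath='{.items[0].spec.serviceMonitorSelector}'
# should print a matchLabels selector containing release: kube-prometheus-stack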
Verify Prometheus found your app:
kubectl port-forward -n monitoring \
service/kube-prometheus-stack-prometheus 9090:9090
Open http://localhost:9090 → Status → Targets. Look for your app listed with State: UP.
Step 4 — PromQL: Querying Your Metrics
PromQL is Prometheus's query language. Three patterns cover 80% of what you'll use in production.
Request rate — traffic golden signal:
sum(rate(http_requests_total[5m])) by (route)
rate() calculates per-second rate over a time window. [5m] means "look at the last 5 minutes." sum() by (route) groups results by endpoint so you can see which routes are busiest.
Error rate — errors golden signal:
sum(rate(http_requests_total{status_code=~"5.."}[5m]))
/
sum(rate(http_requests_total[5m]))
* 100
Filter to 5xx responses, divide by total traffic, multiply by 100 for a percentage. This is your error rate SLI — the number you'd put in an SLO like "error rate below 0.1% for 99.9% of the month."
p99 latency — latency golden signal:
histogram_quantile(0.99,
sum(rate(http_request_duration_seconds_bucket[5m])) by (le, route)
)
histogram_quantile(0.99, ...) calculates the 99th percentile from bucket data. The le label is required — it's the "less than or equal" bucket boundary that defines the histogram shape. This tells you: 99% of requests finish faster than this value. It's what your slowest users experience.
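Swapping 0.99 for 0.50 in the same query gives the median, and the two make a useful side-by-side panel: if p50 stays flat while p99 climbs, only the tail is degrading.

histogram_quantile(0.50,
  sum(rate(http_request_duration_seconds_bucket[5m])) by (le, route)
)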
Pod memory:
container_memory_working_set_bytes{
  namespace="default",
  pod=~"my-api-.*"
}
working_set_bytes is the metric the kubelet uses for OOM decisions — it's the number that matters for memory limits. Watch this trend upward over time to catch memory leaks before they cause OOMKilled events.
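To turn this into a saturation signal, divide the working set by the pod's memory limit. A sketch assuming the pod has a memory limit set and kube-state-metrics (bundled with kube-prometheus-stack) is exporting resource limits:

sum by (pod) (container_memory_working_set_bytes{namespace="default", pod=~"my-api-.*", container!=""})
/
sum by (pod) (kube_pod_container_resource_limits{namespace="default", pod=~"my-api-.*", resource="memory"})

A ratio approaching 1 means the pod is close to its limit and the next spike is an OOMKill.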
Step 5 — Build the Dashboard
In Grafana, create a new dashboard with four panels. For each panel, use Time series visualization — it's the right choice for any metric that changes over time.
One setting that makes dashboards actually readable: set the Unit on each panel under Standard options:
- Latency panels → seconds (s) — Grafana auto-scales to ms
- Memory panels → bytes (IEC) — shows MiB/GiB instead of raw bytes
- Percentage panels → Percent (0-100)
Without units, you see 0.00312 instead of 3.12ms. The unit setting is the difference between a dashboard someone uses and one someone ignores.
The resulting dashboard answers the four golden signals at a glance:
┌─────────────────────┬─────────────────────┐
│ Request Rate │ Error Rate │
│ req/sec by route │ % of 5xx traffic │
├─────────────────────┼─────────────────────┤
│ p99 Latency │ Pod Memory │
│ ms by route │ MiB per pod │
└─────────────────────┴─────────────────────┘
Generate traffic to see it in action (this assumes the app's Service is port-forwarded to localhost:8080):
for i in {1..50}; do
curl -s http://localhost:8080/ > /dev/null
curl -s http://localhost:8080/health > /dev/null
done
Watch the request rate panel update in real time.
The Production Reality Check
Running this locally exposed a real-world issue worth documenting: minikube on Apple Silicon is prone to clock skew, typically after the node has gone NotReady. The symptom is Prometheus logs full of err="out of bounds" — it's rejecting its own samples as being from the future because the node's clock drifted.
The fix is a clean restart with the docker driver:
minikube stop && minikube delete
minikube start --driver=docker --memory=3500 --cpus=4
In production on real cloud nodes this doesn't happen — cloud VMs sync to NTP automatically. But on a laptop, if you ever see out of bounds errors in Prometheus logs, clock skew is the first thing to check.
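A quick way to confirm the skew before deleting anything is to compare the host clock with the node's clock; a sketch using minikube ssh:

date -u && minikube ssh -- date -u
# timestamps more than a few seconds apart confirm the node clock has drifted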
What's in the /metrics Output
After instrumenting your app, hitting /metrics returns more than just your custom metrics. collectDefaultMetrics() adds Node.js internals automatically:
- nodejs_eventloop_lag_seconds — event loop delay (>100ms = problem)
- nodejs_active_handles_total — open connections and timers
- process_heap_bytes — process heap size
- process_cpu_seconds_total — CPU time consumed
These default metrics often catch problems before your custom ones do. A memory leak shows up in process_heap_bytes trending upward long before your app starts failing health checks.
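The event loop metric pairs naturally with a query you can graph now and turn into an alert later. A sketch; the 0.1s threshold is the rule of thumb from the list above, and the pod label comes from Prometheus's Kubernetes service discovery:

# event loop lag per pod
nodejs_eventloop_lag_seconds{pod=~"my-api-.*"}

# only pods currently over the 100ms rule of thumb
nodejs_eventloop_lag_seconds{pod=~"my-api-.*"} > 0.1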
Key Takeaways
- Instrument first, dashboard second — Prometheus can't show you what your app doesn't expose
- Histograms over Summaries — histograms aggregate across pod replicas, summaries don't
- The release label on ServiceMonitor is not optional — without it the operator ignores you
- Set units on every Grafana panel — 0.00312 and 3.12ms are the same number; only one is usable
- working_set_bytes is the memory metric that matters — it's what the kubelet uses for OOM decisions
- p99 is what your worst users experience — p50 (median) hides tail latency problems
What's Next
The observability foundation is in place. The natural next step is alerting — right now you can see problems in Grafana, but only if you're looking. Alertmanager can page you before users notice. After that, GitOps with ArgoCD replaces manual kubectl apply with Git-driven deployments where the cluster automatically syncs to your repository.
The monitoring is running. Now make it proactive.
Part of a hands-on DevOps learning series. Code available at github.com/kaungmyathan22/golang-k8s-portfolio.