Hands-on Kubernetes & DevOps — Part 2 of 5
January 22, 2026
Production Observability on Kubernetes: Prometheus, Grafana, and the Three Golden Signals
Instrument a Node.js app, deploy Prometheus and Grafana with Helm, and build dashboards around latency, traffic, and errors.
Quick navigation
- Why Observability Matters Before You Need It
- The Three Golden Signals
- Step 1 — Instrument Your App
- Step 2 — Deploy the Monitoring Stack with Helm
- Step 3 — Wire Prometheus to Your App
- Step 4 — PromQL: Querying Your Metrics
- Step 5 — Build the Dashboard
- The Production Reality Check
- What's in the /metrics Output
- Key Takeaways
- What's Next
Why Observability Matters Before You Need It
There's a rule in SRE: you don't get to be surprised by a production incident twice for the same reason. The first time, you scramble. The second time, you should already have a dashboard that shows you exactly what went wrong before users notice.
Most tutorials show you how to run Prometheus. This one shows you how to wire it to your actual app, write PromQL queries that answer real questions, and build Grafana dashboards you'd actually use during an on-call incident.
Stack: Node.js · prom-client · Prometheus · Grafana · kube-prometheus-stack · Helm · Kubernetes
The Three Golden Signals
Google's SRE book defines four golden signals for monitoring any service. Three of them apply directly to a web API:
Latency — how long are requests taking?
Traffic — how many requests per second?
Errors — what percentage are failing?
The fourth — saturation — is how full your system is (memory, CPU). Every dashboard you build for a production service should answer these four questions at a glance. Everything else is secondary.
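Saturation comes back later in this post as pod memory, and CPU saturation is worth a query of its own. Once the Prometheus stack from Step 2 is running, a minimal PromQL sketch looks like this (it assumes the cAdvisor and kube-state-metrics metrics that kube-prometheus-stack ships by default, and a hypothetical pod name prefix my-api-):

sum(rate(container_cpu_usage_seconds_total{pod=~"my-api-.*", container!=""}[5m]))
/
sum(kube_pod_container_resource_limits{pod=~"my-api-.*", resource="cpu"})

A result close to 1 means the pods are running at their CPU limit and will start getting throttled.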
Step 1 — Instrument Your App
Before Prometheus can scrape your app, your app needs to expose metrics. The standard is a /metrics endpoint that returns data in Prometheus's text format.
In Node.js, prom-client handles this:
npm install prom-client
Add instrumentation to your Express app:
const promClient = require('prom-client');

const register = new promClient.Registry();

// Default metrics — Node.js internals (event loop, memory, CPU, GC)
promClient.collectDefaultMetrics({ register });

// Counter — total requests, labeled by method, route, and status code
const httpRequestCounter = new promClient.Counter({
  name: 'http_requests_total',
  help: 'Total number of HTTP requests',
  labelNames: ['method', 'route', 'status_code'],
  registers: [register],
});

// Histogram — request duration with configurable buckets
// Buckets define the resolution of your latency data
const httpRequestDuration = new promClient.Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request duration in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1],
  registers: [register],
});

// Middleware — automatically records every request
app.use((req, res, next) => {
  const end = httpRequestDuration.startTimer();
  res.on('finish', () => {
    const labels = {
      method: req.method,
      route: req.path,
      status_code: res.statusCode,
    };
    httpRequestCounter.inc(labels);
    end(labels);
  });
  next();
});

// The endpoint Prometheus scrapes
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});
Why a Histogram instead of a Summary?
Histograms store raw bucket counts and let Prometheus calculate percentiles at query time. Summaries calculate percentiles in the app itself and can't be aggregated across multiple instances. With two pods running, a Histogram gives you the p99 across all pods combined. A Summary would give you two separate p99s with no way to merge them. Always use Histograms for latency.
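For contrast, this is roughly what the Summary version would look like. It's a sketch using prom-client's Summary type, not code from the app above; each pod computes its own quantiles locally, which is exactly the aggregation problem described:

// Summary: quantiles are pre-computed inside this single process
const httpRequestDurationSummary = new promClient.Summary({
  name: 'http_request_duration_summary_seconds',
  help: 'HTTP request duration in seconds (per-instance quantiles)',
  labelNames: ['method', 'route', 'status_code'],
  percentiles: [0.5, 0.9, 0.99],  // exported as ready-made quantiles
  registers: [register],
});
// Prometheus can only average these per-pod quantiles, and averaging p99s is not a real p99.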
Verify it works:
curl http://localhost:3000/metrics
You'll see output like:
# HELP http_requests_total Total number of HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",route="/health",status_code="200"} 42
# HELP http_request_duration_seconds HTTP request duration in seconds
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{le="0.001",...} 38
Your app is now speaking Prometheus's language.
Step 2 — Deploy the Monitoring Stack with Helm
The kube-prometheus-stack Helm chart is the industry standard way to run Prometheus on Kubernetes. It bundles Prometheus, Grafana, Alertmanager, and pre-built dashboards for cluster health in a single install.
helm repo add prometheus-community \
https://prometheus-community.github.io/helm-charts
helm repo update
kubectl create namespace monitoring
helm install kube-prometheus-stack \
prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--set prometheus.prometheusSpec.scrapeInterval=15s \
--set grafana.adminPassword=devops123
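If you'd rather keep these settings in Git, the same install works with a values file instead of --set flags. A minimal sketch; the file name is illustrative, and the plaintext password should really live in a secret:

# values-monitoring.yaml (hypothetical file name)
prometheus:
  prometheusSpec:
    scrapeInterval: 15s
grafana:
  adminPassword: devops123   # move this into a secret for real deployments

Pass it with -f values-monitoring.yaml in place of the two --set flags.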
Wait for all pods to reach Running state:
kubectl get pods -n monitoring --watch
You'll see pods for the operator, Prometheus itself, Grafana, Alertmanager, and node-exporter. The operator is key — it watches for ServiceMonitor resources and automatically updates Prometheus's scrape config without restarting anything.
Access Grafana:
kubectl port-forward -n monitoring \
service/kube-prometheus-stack-grafana 3001:80
Open http://localhost:3001 and log in with admin / devops123. You'll find pre-built dashboards for node CPU, memory, pod counts, and Kubernetes internals — all working immediately without any configuration. That's the power of the chart.
Step 3 — Wire Prometheus to Your App
Prometheus doesn't automatically scrape every service in your cluster. You tell it what to scrape using a ServiceMonitor — a custom Kubernetes resource that the operator watches.
First, add a named port to your Service:
apiVersion: v1
kind: Service
metadata:
  name: my-api-service
  labels:
    app: my-api
spec:
  selector:
    app: my-api
  ports:
    - name: http        # ServiceMonitor references this name
      port: 80
      targetPort: 3000
  type: ClusterIP
Then create the ServiceMonitor:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-api-monitor
  namespace: monitoring
  labels:
    release: kube-prometheus-stack   # must match the Helm release name
spec:
  namespaceSelector:
    matchNames:
      - default
  selector:
    matchLabels:
      app: my-api
  endpoints:
    - port: http
      path: /metrics
      interval: 15s
The release: kube-prometheus-stack label is the critical detail most tutorials miss. Without it, the Prometheus operator ignores your ServiceMonitor entirely. The operator only picks up ServiceMonitors that match its label selector — which defaults to the Helm release name.
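If you want to double-check which label the operator is looking for, it's recorded on the Prometheus custom resource itself. A sketch; the resource comes from the chart, so the output depends on your release name:

kubectl get prometheus -n monitoring \
  -o jsonpath='{.items[0].spec.serviceMonitorSelector}'
# should print a matchLabels selector containing release: kube-prometheus-stack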
Verify Prometheus found your app:
kubectl port-forward -n monitoring \
service/kube-prometheus-stack-prometheus 9090:9090
Open http://localhost:9090 → Status → Targets. Look for your app listed with State: UP.
Step 4 — PromQL: Querying Your Metrics
PromQL is Prometheus's query language. Three patterns cover 80% of what you'll use in production.
Request rate — traffic golden signal:
sum(rate(http_requests_total[5m])) by (route)
rate() calculates per-second rate over a time window. [5m] means "look at the last 5 minutes." sum() by (route) groups results by endpoint so you can see which routes are busiest.
Error rate — errors golden signal:
sum(rate(http_requests_total{status_code=~"5.."}[5m]))
/
sum(rate(http_requests_total[5m]))
* 100
Filter to 5xx responses, divide by total traffic, multiply by 100 for a percentage. This is your error rate SLI — the number you'd put in an SLO like "error rate below 0.1% for 99.9% of the month."
p99 latency — latency golden signal:
histogram_quantile(0.99,
sum(rate(http_request_duration_seconds_bucket[5m])) by (le, route)
)
histogram_quantile(0.99, ...) calculates the 99th percentile from bucket data. The le label is required — it's the "less than or equal" bucket boundary that defines the histogram shape. This tells you: 99% of requests finish faster than this value. It's what your slowest users experience.
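Swapping 0.99 for 0.50 in the same query gives the median, and the two make a useful side-by-side panel: if p50 stays flat while p99 climbs, only the tail is degrading.

histogram_quantile(0.50,
  sum(rate(http_request_duration_seconds_bucket[5m])) by (le, route)
)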
Pod memory:
container_memory_working_set_bytes{
  namespace="default",
  pod=~"my-api-.*"
}
working_set_bytes is the metric the kubelet uses for OOM decisions — it's the number that matters for memory limits. Watch this trend upward over time to catch memory leaks before they cause OOMKilled events.
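To turn this into a saturation signal, divide the working set by the pod's memory limit. A sketch assuming the pod has a memory limit set and kube-state-metrics (bundled with kube-prometheus-stack) is exporting resource limits:

sum by (pod) (container_memory_working_set_bytes{namespace="default", pod=~"my-api-.*", container!=""})
/
sum by (pod) (kube_pod_container_resource_limits{namespace="default", pod=~"my-api-.*", resource="memory"})

A ratio approaching 1 means the pod is close to its limit and the next spike is an OOMKill.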
Step 5 — Build the Dashboard
In Grafana, create a new dashboard with four panels. For each panel, use Time series visualization — it's the right choice for any metric that changes over time.
One setting that makes dashboards actually readable: set the Unit on each panel under Standard options:
- Latency panels → seconds (s) — Grafana auto-scales to ms
- Memory panels → bytes (IEC) — shows MiB/GiB instead of raw bytes
- Percentage panels → Percent (0-100)
Without units, you see 0.00312 instead of 3.12ms. The unit setting is the difference between a dashboard someone uses and one someone ignores.
The resulting dashboard answers the four golden signals at a glance:
┌─────────────────────┬─────────────────────┐
│ Request Rate │ Error Rate │
│ req/sec by route │ % of 5xx traffic │
├─────────────────────┼─────────────────────┤
│ p99 Latency │ Pod Memory │
│ ms by route │ MiB per pod │
└─────────────────────┴─────────────────────┘
Generate traffic to see it in action (this assumes the app's Service is port-forwarded to localhost:8080):
for i in {1..50}; do
curl -s http://localhost:8080/ > /dev/null
curl -s http://localhost:8080/health > /dev/null
done
Watch the request rate panel update in real time.
The Production Reality Check
Running this locally exposed a real-world issue worth documenting: minikube on Apple Silicon is prone to clock skew, typically after the node has gone NotReady. The symptom is Prometheus logs full of err="out of bounds" — it's rejecting its own samples as being from the future because the node's clock drifted.
The fix is a clean restart with the docker driver:
minikube stop && minikube delete
minikube start --driver=docker --memory=3500 --cpus=4
In production on real cloud nodes this doesn't happen — cloud VMs sync to NTP automatically. But on a laptop, if you ever see out of bounds errors in Prometheus logs, clock skew is the first thing to check.
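A quick way to confirm the skew before deleting anything is to compare the host clock with the node's clock; a sketch using minikube ssh:

date -u && minikube ssh -- date -u
# timestamps more than a few seconds apart confirm the node clock has drifted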
What's in the /metrics Output
After instrumenting your app, hitting /metrics returns more than just your custom metrics. collectDefaultMetrics() adds Node.js internals automatically:
- nodejs_eventloop_lag_seconds — event loop delay (>100ms = problem)
- nodejs_active_handles_total — open connections and timers
- process_heap_bytes — process heap size
- process_cpu_seconds_total — CPU time consumed
These default metrics often catch problems before your custom ones do. A memory leak shows up in process_heap_bytes trending upward long before your app starts failing health checks.
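The event loop metric pairs naturally with a query you can graph now and turn into an alert later. A sketch; the 0.1s threshold is the rule of thumb from the list above, and the pod label comes from Prometheus's Kubernetes service discovery:

# event loop lag per pod
nodejs_eventloop_lag_seconds{pod=~"my-api-.*"}

# only pods currently over the 100ms rule of thumb
nodejs_eventloop_lag_seconds{pod=~"my-api-.*"} > 0.1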
Key Takeaways
- Instrument first, dashboard second — Prometheus can't show you what your app doesn't expose
- Histograms over Summaries — histograms aggregate across pod replicas, summaries don't
- The release label on ServiceMonitor is not optional — without it the operator ignores you
- Set units on every Grafana panel — 0.00312 and 3.12ms are the same number; only one is usable
- working_set_bytes is the memory metric that matters — it's what the kubelet uses for OOM decisions
- p99 is what your worst users experience — p50 (median) hides tail latency problems
What's Next
The observability foundation is in place. The natural next step is alerting — right now you can see problems in Grafana, but only if you're looking. Alertmanager can page you before users notice. After that, GitOps with ArgoCD replaces manual kubectl apply with Git-driven deployments where the cluster automatically syncs to your repository.
The monitoring is running. Now make it proactive.
Part of a hands-on DevOps learning series. Code available at github.com/kaungmyathan22/golang-k8s-portfolio.