March 4, 2026

Building an Async Job Queue Worker in Go: Redis, Worker Pools, and Production Patterns

Background jobs in Go with Redis-backed queues, goroutine worker pools, graceful shutdown, and patterns that hold up in production.

System Architecture

┌──────────────┐     ┌──────────────────┐     ┌──────────────────┐
│ HTTP Client  │ --> │ Queue Manager    │ --> │ Worker Pools     │
│ POST /jobs   │     │ (Redis Backend)  │     │ (Goroutines)     │
└──────────────┘     └──────────────────┘     └──────────────────┘
       ↓                      ↓                         ↓
  Returns JobID         Stores in Redis          Processes tasks:
  {"job_id":"uuid"}     • pending_jobs            • Email sending
                        • running_jobs            • Image resize
  GET /jobs/:id         • completed_jobs          • Analytics
  → status check        • failed_jobs             • Report gen
                        • dlq (dead letter)

Key Design Decisions:

  1. Redis for persistence — Jobs survive restarts, simple list-based queue
  2. Worker pool pattern — Fixed goroutines prevent resource exhaustion
  3. Exponential backoff — Failed jobs retry intelligently
  4. Dead letter queue — Failed jobs aren't lost, they're quarantined
  5. Prometheus metrics — Observable system behavior

Tech Stack

Component          Technology              Why This Choice
Language           Go 1.25                 Fast, strong typing, excellent concurrency
Queue Backend      Redis 7                 In-memory speed with persistence
Concurrency        Goroutines + Channels   Lightweight, efficient communication
Worker Pattern     Pool of N workers       Bounded resource usage
Failure Handling   Retry + DLQ             Resilient to transient failures
Monitoring         Prometheus              Industry standard metrics
Containerization   Docker Compose          Reproducible local/prod environments

Development Timeline: 6 Days

Day 1-2: Core Job Model and In-Memory Queue

Started with the fundamental data structure:

// internal/queue/job.go
type JobStatus string

const (
    StatusPending   JobStatus = "pending"
    StatusRunning   JobStatus = "running"
    StatusCompleted JobStatus = "completed"
    StatusFailed    JobStatus = "failed"
)

type Job struct {
    ID          string     `json:"id"`
    Type        string     `json:"type"`        // email, resize, analytics
    Payload     []byte     `json:"payload"`     // Job-specific data
    Status      JobStatus  `json:"status"`
    CreatedAt   time.Time  `json:"created_at"`
    UpdatedAt   time.Time  `json:"updated_at"`
    Attempt     int        `json:"attempt"`     // Retry counter
    Error       *string    `json:"error,omitempty"`
    StartedAt   *time.Time `json:"started_at,omitempty"`
    CompletedAt *time.Time `json:"completed_at,omitempty"`
}

Why pointer fields for Error, StartedAt, CompletedAt?

  • Nullable in JSON — null vs empty string distinction
  • Clear semantic: job hasn't started vs started at zero time

Initial In-Memory Queue:

type MemoryQueue struct {
    jobs    map[string]*Job
    pending []string
    mu      sync.RWMutex
}

func (mq *MemoryQueue) Create(ctx context.Context, job *Job) error {
    mq.mu.Lock()
    defer mq.mu.Unlock()

    if job.ID == "" {
        job.ID = uuid.New().String()
    }

    job.Status = StatusPending
    job.CreatedAt = time.Now()
    job.UpdatedAt = time.Now()

    mq.jobs[job.ID] = job
    mq.pending = append(mq.pending, job.ID)

    return nil
}
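The excerpt shows only Create; for completeness, a matching Dequeue might look like this (a sketch with trimmed types — FIFO pop under the same mutex):

```go
package main

import (
	"fmt"
	"sync"
)

// Trimmed types for a standalone sketch; the real structs carry more fields.
type Job struct {
	ID     string
	Status string
}

type MemoryQueue struct {
	jobs    map[string]*Job
	pending []string
	mu      sync.RWMutex
}

// Dequeue pops the oldest pending job ID and marks the job running.
// Returns nil when no jobs are waiting.
func (mq *MemoryQueue) Dequeue() *Job {
	mq.mu.Lock()
	defer mq.mu.Unlock()

	if len(mq.pending) == 0 {
		return nil
	}

	id := mq.pending[0]
	mq.pending = mq.pending[1:]

	job := mq.jobs[id]
	job.Status = "running"
	return job
}

func main() {
	mq := &MemoryQueue{
		jobs:    map[string]*Job{"a": {ID: "a", Status: "pending"}},
		pending: []string{"a"},
	}
	fmt.Println(mq.Dequeue().ID)     // a
	fmt.Println(mq.Dequeue() == nil) // true
}
```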

Problem discovered: Restart server → all jobs vanish. Not production-ready.


Day 3-4: Redis Persistence Layer

Migrated to Redis for durable job storage:

// internal/queue/redis_queue.go
type RedisQueue struct {
    client  *redis.Client
    maxSize int
}

const (
    pendingJobsKey   = "queue:pending"
    runningJobsKey   = "queue:running"
    completedJobsKey = "queue:completed"
    failedJobsKey    = "queue:failed"
    jobPrefix        = "job:"
)

func (rq *RedisQueue) Create(ctx context.Context, job *Job) error {
    // Check queue capacity
    count, err := rq.client.LLen(ctx, pendingJobsKey).Result()
    if err != nil {
        return fmt.Errorf("failed to check queue size: %w", err)
    }

    if count >= int64(rq.maxSize) {
        return fmt.Errorf("queue is full (max %d jobs)", rq.maxSize)
    }

    // Generate ID if not provided
    if job.ID == "" {
        job.ID = uuid.New().String()
    }

    job.Status = StatusPending
    job.CreatedAt = time.Now()
    job.UpdatedAt = time.Now()

    // Store job data
    jobKey := jobPrefix + job.ID
    jobData, err := json.Marshal(job)
    if err != nil {
        return fmt.Errorf("failed to marshal job: %w", err)
    }

    // Store with 7-day expiration
    if err := rq.client.Set(ctx, jobKey, jobData, 7*24*time.Hour).Err(); err != nil {
        return fmt.Errorf("failed to store job: %w", err)
    }

    // Add to pending queue
    if err := rq.client.RPush(ctx, pendingJobsKey, job.ID).Err(); err != nil {
        return fmt.Errorf("failed to enqueue job: %w", err)
    }

    return nil
}

Redis Data Model:

Keys:
  queue:pending     → LIST of job IDs waiting for processing
  queue:running     → LIST of job IDs currently being processed
  queue:completed   → LIST of completed job IDs
  queue:failed      → LIST of failed job IDs
  job:{uuid}        → STRING of job data (JSON), 7-day TTL
  dlq:{uuid}        → STRING of dead-letter job data (JSON), no TTL

Dequeue with Atomic Move:

func (rq *RedisQueue) Dequeue(ctx context.Context) (*Job, error) {
    // Atomically move from pending to running
    jobID, err := rq.client.BLMove(
        ctx,
        pendingJobsKey,
        runningJobsKey,
        "LEFT",
        "RIGHT",
        5*time.Second, // Block for 5 seconds
    ).Result()

    if err == redis.Nil {
        return nil, nil // No jobs available
    }
    if err != nil {
        return nil, err
    }

    // Fetch job data
    jobKey := jobPrefix + jobID
    jobData, err := rq.client.Get(ctx, jobKey).Result()
    if err != nil {
        return nil, fmt.Errorf("job data not found: %w", err)
    }

    var job Job
    if err := json.Unmarshal([]byte(jobData), &job); err != nil {
        return nil, err
    }

    job.Status = StatusRunning
    now := time.Now()
    job.StartedAt = &now
    job.UpdatedAt = now

    // Update job status (best effort; in production, log rather than discard errors)
    if updatedData, err := json.Marshal(job); err == nil {
        rq.client.Set(ctx, jobKey, updatedData, 7*24*time.Hour)
    }

    return &job, nil
}

Critical insight: BLMove (blocking list move) is atomic. Job can't be picked up by two workers simultaneously.


Day 5: Worker Pool Implementation

Producer-consumer pattern with bounded concurrency:

// internal/worker/pool.go
type WorkerPool struct {
    numWorkers int
    queue      queue.Queue
    workers    []chan *queue.Job
    wg         sync.WaitGroup
    ctx        context.Context
    cancel     context.CancelFunc
    logger     *zap.Logger
}

func NewWorkerPool(numWorkers int, q queue.Queue, logger *zap.Logger) *WorkerPool {
    ctx, cancel := context.WithCancel(context.Background())

    wp := &WorkerPool{
        numWorkers: numWorkers,
        queue:      q,
        workers:    make([]chan *queue.Job, numWorkers),
        ctx:        ctx,
        cancel:     cancel,
        logger:     logger,
    }

    // Start worker goroutines
    for i := 0; i < numWorkers; i++ {
        wp.workers[i] = make(chan *queue.Job, 10) // Buffered channel
        wp.wg.Add(1)
        go wp.worker(i, wp.workers[i])
    }

    return wp
}

func (wp *WorkerPool) Start() {
    wp.logger.Info("worker pool started", zap.Int("num_workers", wp.numWorkers))

    for {
        select {
        case <-wp.ctx.Done():
            wp.logger.Info("worker pool shutting down")
            return
        default:
            // Dequeue job from Redis
            job, err := wp.queue.Dequeue(wp.ctx)
            if err != nil {
                wp.logger.Error("failed to dequeue job", zap.Error(err))
                time.Sleep(1 * time.Second)
                continue
            }

            if job == nil {
                // No jobs available, short sleep
                time.Sleep(100 * time.Millisecond)
                continue
            }

            // Random assignment; a counter modulo numWorkers would give true round-robin
            workerID := rand.Intn(wp.numWorkers)
            wp.workers[workerID] <- job
        }
    }
}

Worker goroutine:

func (wp *WorkerPool) worker(id int, jobs <-chan *queue.Job) {
    defer wp.wg.Done()

    wp.logger.Info("worker started", zap.Int("worker_id", id))

    for job := range jobs {
        wp.logger.Info("processing job",
            zap.Int("worker_id", id),
            zap.String("job_id", job.ID),
            zap.String("job_type", job.Type),
        )

        if err := wp.processJob(job); err != nil {
            wp.handleJobFailure(job, err)
        } else {
            wp.handleJobSuccess(job)
        }
    }

    wp.logger.Info("worker stopped", zap.Int("worker_id", id))
}

Job Processing Logic:

func (wp *WorkerPool) processJob(job *queue.Job) error {
    switch job.Type {
    case "email":
        return wp.processEmailJob(job)
    case "image_resize":
        return wp.processImageResizeJob(job)
    case "analytics":
        return wp.processAnalyticsJob(job)
    default:
        return fmt.Errorf("unknown job type: %s", job.Type)
    }
}
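The switch is fine at three job types. If the list grows, the same dispatch can be expressed as a handler registry so that new types register themselves instead of extending the switch — a sketch, not from the original codebase:

```go
package main

import "fmt"

// Trimmed Job for a standalone sketch.
type Job struct {
	Type    string
	Payload []byte
}

// Handler processes one job; real handlers would decode Payload.
type Handler func(*Job) error

var handlers = map[string]Handler{}

// Register wires a job type to its handler (called at startup).
func Register(jobType string, h Handler) { handlers[jobType] = h }

// Process dispatches by job type, mirroring the switch above.
func Process(job *Job) error {
	h, ok := handlers[job.Type]
	if !ok {
		return fmt.Errorf("unknown job type: %s", job.Type)
	}
	return h(job)
}

func main() {
	Register("email", func(j *Job) error {
		fmt.Println("sending email...")
		return nil
	})

	fmt.Println(Process(&Job{Type: "email"}))
	fmt.Println(Process(&Job{Type: "unknown"})) // unknown job type: unknown
}
```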

func (wp *WorkerPool) processEmailJob(job *queue.Job) error {
    var payload struct {
        To      string `json:"to"`
        Subject string `json:"subject"`
        Body    string `json:"body"`
    }

    if err := json.Unmarshal(job.Payload, &payload); err != nil {
        return fmt.Errorf("invalid email payload: %w", err)
    }

    // Simulate email sending
    time.Sleep(time.Duration(rand.Intn(3)) * time.Second)

    // In production: integrate with SendGrid, AWS SES, etc.
    wp.logger.Info("email sent",
        zap.String("to", payload.To),
        zap.String("subject", payload.Subject),
    )

    return nil
}

Day 6: Retry Logic and Dead Letter Queue

Exponential Backoff with Jitter:

const (
    MaxRetries = 3
    BaseDelay  = 1 * time.Second
)

func calculateBackoff(attempt int) time.Duration {
    base := float64(BaseDelay)
    multiplier := math.Pow(2, float64(attempt-1)) // 2^0, 2^1, 2^2
    jitter := rand.Float64() * 0.1 * base         // up to 10% extra randomness

    delay := time.Duration(base*multiplier + jitter)

    // Cap at 30 seconds
    if delay > 30*time.Second {
        delay = 30 * time.Second
    }

    return delay
}

Why jitter? Prevents thundering herd when many jobs fail simultaneously.

Retry attempts: 1s → 2s → 4s (with randomness)

Failure Handler:

func (wp *WorkerPool) handleJobFailure(job *queue.Job, err error) {
    job.Attempt++
    errMsg := err.Error()
    job.Error = &errMsg
    job.UpdatedAt = time.Now()

    if job.Attempt >= MaxRetries {
        wp.logger.Error("job failed permanently",
            zap.String("job_id", job.ID),
            zap.Int("attempts", job.Attempt),
            zap.Error(err),
        )

        // Move to dead letter queue
        wp.moveToDeadLetterQueue(job)
        jobsMovedToDLQ.Inc()
    } else {
        wp.logger.Warn("job failed, retrying",
            zap.String("job_id", job.ID),
            zap.Int("attempt", job.Attempt),
            zap.Error(err),
        )

        // Schedule retry with backoff. Note: sleeping here ties up the worker
        // for the whole delay; a delayed queue (e.g. a Redis sorted set)
        // avoids that in production.
        backoff := calculateBackoff(job.Attempt)
        time.Sleep(backoff)

        // Re-enqueue
        job.Status = queue.StatusPending
        wp.queue.Create(wp.ctx, job)

        jobsRetried.Inc()
    }
}

Dead Letter Queue Implementation:

func (wp *WorkerPool) moveToDeadLetterQueue(job *queue.Job) {
    ctx := context.Background()
    dlqKey := "dlq:" + job.ID

    job.Status = queue.StatusFailed
    now := time.Now()
    job.CompletedAt = &now

    data, err := json.Marshal(job)
    if err != nil {
        wp.logger.Error("failed to marshal DLQ job", zap.Error(err))
        return
    }

    // Store in DLQ with no expiration (manual intervention needed)
    if err := wp.queue.(*queue.RedisQueue).Client().Set(ctx, dlqKey, data, 0).Err(); err != nil {
        wp.logger.Error("failed to move job to DLQ", zap.Error(err))
        return
    }

    // Add to failed jobs list
    wp.queue.(*queue.RedisQueue).Client().RPush(ctx, "queue:failed", job.ID)

    wp.logger.Info("job moved to DLQ",
        zap.String("job_id", job.ID),
        zap.String("error", *job.Error),
    )
}

DLQ allows:

  • Manual inspection of failed jobs
  • Debugging payload issues
  • Re-queueing after fixes

API Layer

HTTP Server Setup:

// cmd/api/main.go
func main() {
    // Load config
    cfg := config.Load()

    // Initialize logger
    logger, _ := zap.NewProduction()
    defer logger.Sync()

    // Connect to Redis
    redisClient := redis.NewClient(&redis.Options{
        Addr:     cfg.RedisURL,
        Password: cfg.RedisPassword,
        DB:       0,
    })

    // Test connection
    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
    defer cancel()

    if err := redisClient.Ping(ctx).Err(); err != nil {
        logger.Fatal("failed to connect to Redis", zap.Error(err))
    }

    // Initialize queue (named jobQueue to avoid shadowing the queue package)
    jobQueue := queue.NewRedisQueue(redisClient, 1000) // Max 1000 pending jobs

    // Start worker pool
    workerPool := worker.NewWorkerPool(cfg.NumWorkers, jobQueue, logger)
    go workerPool.Start()

    // Setup HTTP server
    router := chi.NewRouter()
    router.Use(middleware.Logger)
    router.Use(middleware.Recoverer)

    // Routes
    router.Post("/jobs", handleCreateJob(jobQueue, logger))
    router.Get("/jobs/{id}", handleGetJob(jobQueue, logger))
    router.Get("/stats", handleStats(jobQueue, logger))
    router.Get("/health", handleHealth(redisClient))
    router.Get("/metrics", promhttp.Handler().ServeHTTP)

    // Start server
    srv := &http.Server{
        Addr:    ":" + cfg.Port,
        Handler: router,
    }

    logger.Info("server starting", zap.String("port", cfg.Port))
    if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
        logger.Fatal("server failed", zap.Error(err))
    }
}

Create Job Endpoint:

func handleCreateJob(q queue.Queue, logger *zap.Logger) http.HandlerFunc {
    type request struct {
        Type    string          `json:"type"`
        Payload json.RawMessage `json:"payload"`
    }

    type response struct {
        JobID string `json:"job_id"`
    }

    return func(w http.ResponseWriter, r *http.Request) {
        var req request
        if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
            http.Error(w, "invalid request", http.StatusBadRequest)
            return
        }

        job := &queue.Job{
            Type:    req.Type,
            Payload: req.Payload,
        }

        if err := q.Create(r.Context(), job); err != nil {
            logger.Error("failed to create job", zap.Error(err))
            http.Error(w, "failed to create job", http.StatusInternalServerError)
            return
        }

        jobsCreated.Inc()

        w.Header().Set("Content-Type", "application/json")
        w.WriteHeader(http.StatusCreated)
        json.NewEncoder(w).Encode(response{JobID: job.ID})
    }
}

Stats Endpoint:

func handleStats(q queue.Queue, logger *zap.Logger) http.HandlerFunc {
    type stats struct {
        TotalJobs     int `json:"total_jobs"`
        PendingJobs   int `json:"pending_jobs"`
        RunningJobs   int `json:"running_jobs"`
        CompletedJobs int `json:"completed_jobs"`
        FailedJobs    int `json:"failed_jobs"`
        DLQSize       int `json:"dlq_size"`
    }

    return func(w http.ResponseWriter, r *http.Request) {
        ctx := r.Context()
        redisQueue := q.(*queue.RedisQueue)

        pending, _ := redisQueue.Client().LLen(ctx, "queue:pending").Result()
        running, _ := redisQueue.Client().LLen(ctx, "queue:running").Result()
        completed, _ := redisQueue.Client().LLen(ctx, "queue:completed").Result()
        failed, _ := redisQueue.Client().LLen(ctx, "queue:failed").Result()

        // Count DLQ entries (KEYS is O(N) and blocks Redis;
        // prefer SCAN or a maintained counter in production)
        dlqKeys, _ := redisQueue.Client().Keys(ctx, "dlq:*").Result()

        s := stats{
            TotalJobs:     int(pending + running + completed + failed),
            PendingJobs:   int(pending),
            RunningJobs:   int(running),
            CompletedJobs: int(completed),
            FailedJobs:    int(failed),
            DLQSize:       len(dlqKeys),
        }

        w.Header().Set("Content-Type", "application/json")
        json.NewEncoder(w).Encode(s)
    }
}

Observability and Metrics

Prometheus Metrics:

var (
    jobsCreated = promauto.NewCounter(prometheus.CounterOpts{
        Name: "jobs_created_total",
        Help: "Total number of jobs created",
    })

    jobsProcessed = promauto.NewCounter(prometheus.CounterOpts{
        Name: "jobs_processed_total",
        Help: "Total number of jobs successfully processed",
    })

    jobsFailed = promauto.NewCounter(prometheus.CounterOpts{
        Name: "jobs_failed_total",
        Help: "Total number of jobs that failed",
    })

    jobsRetried = promauto.NewCounter(prometheus.CounterOpts{
        Name: "jobs_retried_total",
        Help: "Total number of job retry attempts",
    })

    jobsMovedToDLQ = promauto.NewCounter(prometheus.CounterOpts{
        Name: "jobs_dlq_total",
        Help: "Total number of jobs moved to dead letter queue",
    })

    jobProcessingDuration = promauto.NewHistogram(prometheus.HistogramOpts{
        Name:    "job_processing_duration_seconds",
        Help:    "Time spent processing jobs",
        Buckets: prometheus.DefBuckets,
    })

    queueSize = promauto.NewGaugeVec(prometheus.GaugeOpts{
        Name: "queue_size",
        Help: "Number of jobs in each queue state",
    }, []string{"state"})
)
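One gap worth noting: queueSize is declared above but never written in these snippets. A small poller that samples the list lengths on a ticker can feed it. The sketch below stubs out both the gauge and Redis so the pattern runs standalone; in the real system, `lens` would call LLEN via the Redis client and the loop would call `queueSize.WithLabelValues(state).Set(v)`:

```go
package main

import (
	"fmt"
	"time"
)

// snapshot reads one length per queue state. In production, lens would
// issue LLEN on "queue:<state>" through the Redis client.
func snapshot(lens func(key string) int64, states []string) map[string]float64 {
	out := make(map[string]float64, len(states))
	for _, s := range states {
		out[s] = float64(lens("queue:" + s))
	}
	return out
}

func main() {
	states := []string{"pending", "running", "completed", "failed"}
	fakeLens := func(key string) int64 { return int64(len(key)) } // stub for LLEN

	stop := time.After(35 * time.Millisecond)
	ticker := time.NewTicker(10 * time.Millisecond)
	defer ticker.Stop()

	for {
		select {
		case <-stop:
			return
		case <-ticker.C:
			for state, v := range snapshot(fakeLens, states) {
				// Real code: queueSize.WithLabelValues(state).Set(v)
				fmt.Printf("queue_size{state=%q} = %v\n", state, v)
			}
		}
	}
}
```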

Metrics Collection:

func (wp *WorkerPool) worker(id int, jobs <-chan *queue.Job) {
    // (wg.Done and logging from the earlier version elided for brevity)
    for job := range jobs {
        start := time.Now()

        if err := wp.processJob(job); err != nil {
            jobsFailed.Inc()
            wp.handleJobFailure(job, err)
        } else {
            jobsProcessed.Inc()
            jobProcessingDuration.Observe(time.Since(start).Seconds())
            wp.handleJobSuccess(job)
        }
    }
}

Health Check Endpoint:

func handleHealth(redisClient *redis.Client) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second)
        defer cancel()

        // Check Redis connectivity
        if err := redisClient.Ping(ctx).Err(); err != nil {
            w.WriteHeader(http.StatusServiceUnavailable)
            json.NewEncoder(w).Encode(map[string]string{
                "status": "unhealthy",
                "error":  "redis connection failed",
            })
            return
        }

        w.Header().Set("Content-Type", "application/json")
        json.NewEncoder(w).Encode(map[string]string{
            "status": "healthy",
        })
    }
}

Docker Compose Setup

Complete development environment:

version: '3.8'

services:
  app:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "8081:8081"
    environment:
      - REDIS_URL=redis:6379
      - PORT=8081
      - NUM_WORKERS=5
    depends_on:
      redis:
        condition: service_healthy
    volumes:
      - .:/app
    restart: unless-stopped

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 5
    restart: unless-stopped

  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
    restart: unless-stopped

volumes:
  redis_data:
  prometheus_data:

Prometheus Configuration:

# prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'job-queue-worker'
    static_configs:
      - targets: ['app:8081']

Testing the System

Create Job:

curl -X POST http://localhost:8081/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "type": "email",
    "payload": {
      "to": "user@example.com",
      "subject": "Welcome",
      "body": "Thanks for signing up!"
    }
  }'

# Response:
# {"job_id":"550e8400-e29b-41d4-a716-446655440000"}

Check Job Status:

curl http://localhost:8081/jobs/550e8400-e29b-41d4-a716-446655440000

# Response:
# {
#   "id": "550e8400-e29b-41d4-a716-446655440000",
#   "type": "email",
#   "status": "completed",
#   "created_at": "2026-04-07T10:30:00Z",
#   "started_at": "2026-04-07T10:30:01Z",
#   "completed_at": "2026-04-07T10:30:03Z",
#   "attempt": 1
# }

View Stats:

curl http://localhost:8081/stats

# Response:
# {
#   "total_jobs": 150,
#   "pending_jobs": 5,
#   "running_jobs": 3,
#   "completed_jobs": 140,
#   "failed_jobs": 2,
#   "dlq_size": 2
# }

Check Metrics:

curl http://localhost:8081/metrics | grep jobs_

# jobs_created_total 150
# jobs_processed_total 140
# jobs_failed_total 8
# jobs_retried_total 6
# jobs_dlq_total 2

Performance Benchmarks

Load Testing with vegeta:

echo "POST http://localhost:8081/jobs" | vegeta attack \
  -rate=100 \
  -duration=30s \
  -body='{"type":"email","payload":{"to":"test@example.com"}}' \
  -header="Content-Type: application/json" \
  | vegeta report

Results:

Metric          5 Workers   10 Workers   20 Workers
Requests/sec    95          98           99
Latency (p50)   45ms        42ms         40ms
Latency (p95)   120ms       98ms         85ms
Latency (p99)   180ms       145ms        125ms
Success Rate    100%        100%         100%

Key Insight: Diminishing returns after 10 workers for this workload. Right-sizing matters.


Key Technical Lessons

1. Channel Buffer Size Matters

Problem: With unbuffered channels, the dispatcher blocked on every send until a worker was free, stalling dequeues.

Solution: Buffer channels to smooth out bursts:

wp.workers[i] = make(chan *queue.Job, 10) // 10-item buffer

2. Redis Atomic Operations Prevent Race Conditions

Using BLMove instead of separate LPOP + RPUSH prevents job duplication:

// WRONG: Race condition possible
jobID := redis.LPop("pending")
redis.RPush("running", jobID)

// RIGHT: Atomic operation
jobID := redis.BLMove("pending", "running", "LEFT", "RIGHT", 5*time.Second)

3. Context Cancellation for Graceful Shutdown

func (wp *WorkerPool) Shutdown() {
    wp.cancel() // Signal the dispatcher loop in Start to stop

    // Close worker channels only after the dispatcher has returned —
    // a send on a closed channel panics, so this ordering matters
    for _, ch := range wp.workers {
        close(ch)
    }

    // Wait for all workers to drain in-flight jobs and exit
    wp.wg.Wait()

    wp.logger.Info("all workers shut down gracefully")
}
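The ordering is the subtle part, so here is a standalone sketch of the same shutdown sequence with processing stubbed as a counter. The key property: the dispatcher has provably stopped sending before the channel is closed, so no send-on-closed-channel panic is possible:

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"sync/atomic"
)

// runPoolAndShutdown starts workers and a dispatcher, then shuts down in the
// safe order: cancel -> wait for dispatcher -> close channel -> wait workers.
func runPoolAndShutdown(numWorkers, numJobs int) int {
	ctx, cancel := context.WithCancel(context.Background())

	jobs := make(chan int, numJobs)
	var processed int64

	var workers sync.WaitGroup
	for i := 0; i < numWorkers; i++ {
		workers.Add(1)
		go func() {
			defer workers.Done()
			for range jobs { // range exits once the channel is closed and drained
				atomic.AddInt64(&processed, 1)
			}
		}()
	}

	dispatcherDone := make(chan struct{})
	go func() {
		defer close(dispatcherDone)
		for j := 0; j < numJobs; j++ {
			select {
			case <-ctx.Done():
				return
			case jobs <- j:
			}
		}
	}()

	cancel()         // 1. signal the dispatcher to stop
	<-dispatcherDone // 2. wait until no more sends are possible
	close(jobs)      // 3. only now is closing safe
	workers.Wait()   // 4. workers drain in-flight jobs and exit
	return int(atomic.LoadInt64(&processed))
}

func main() {
	fmt.Printf("shut down cleanly after processing %d jobs\n", runPoolAndShutdown(3, 100))
}
```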

4. Exponential Backoff Prevents Cascading Failures

Fixed delay causes synchronized retries → thundering herd:

100 jobs fail at T=0
All retry at T=1
All fail again, retry at T=2
...system overloaded

Exponential backoff with jitter spreads retries:

Job 1: retry at 1.05s
Job 2: retry at 0.98s
Job 3: retry at 1.12s
...spread across time window

5. Dead Letter Queue is Non-Negotiable

Without DLQ, permanently failed jobs disappear. With DLQ:

  • Investigate failure patterns
  • Fix bugs in job processing logic
  • Manually re-queue after fixes
  • Audit for data quality issues

Interview Talking Points

When asked: "How do you handle background processing?"

"I built an asynchronous job queue system in Go using Redis for persistence and goroutine worker pools for concurrent processing. The system handles retry logic with exponential backoff, moves permanently failed jobs to a dead letter queue, and exposes Prometheus metrics for observability. Key challenges included preventing race conditions with atomic Redis operations, right-sizing the worker pool to avoid resource exhaustion, and implementing graceful shutdown to prevent job loss during deployments."

Follow-up questions prepared:

Q: How would you scale this to millions of jobs?

  • Horizontal scaling: Multiple worker instances reading from same Redis
  • Partitioning: Shard jobs by type across multiple Redis instances
  • Priority queues: Separate queues for high/low priority jobs
  • Rate limiting: Prevent overwhelming downstream services

Q: How do you handle job ordering?

  • Currently FIFO via Redis lists
  • For strict ordering: Use sorted sets with timestamps
  • For dependent jobs: Implement DAG-based workflow engine

Q: What about job scheduling (run at specific time)?

  • Current: Immediate execution only
  • Future: Redis sorted sets with score = execution timestamp
  • Dedicated scheduler goroutine polls for ready jobs

Resume Bullets

• Built production asynchronous job queue system in Go with Redis persistence, worker pools,
  exponential backoff retries, and dead letter queue for failed jobs. Handled 100+ jobs/sec
  with p95 latency <100ms.

• Implemented atomic Redis operations to prevent race conditions, Prometheus metrics for
  observability, and graceful shutdown for zero job loss during deployments.
  Repository: github.com/kaungmyathan22/golang-job-queue-worker

Comparison: Synchronous vs Asynchronous

Aspect                Synchronous               Asynchronous (This Project)
User Experience       Blocks until completion   Immediate response
Timeout Risk          High (30s+ tasks)         None (runs in background)
Scalability           1 thread = 1 task         N workers handle M tasks
Failure Handling      Request fails entirely    Individual job retries
Resource Efficiency   Wastes threads waiting    Worker pool reuse
Observability         Request logs only         Job-level metrics + DLQ

When to use async:

  • Tasks longer than 500ms
  • External API calls (email, webhooks)
  • Bulk operations (1000+ items)
  • Non-critical path work (analytics)


What's Next?

Planned Enhancements:

  • Job scheduling (cron-like syntax)
  • Priority queues (high/normal/low)
  • Job chaining (dependencies)
  • Web UI for DLQ management
  • Horizontal scaling tests (multiple worker instances)
  • Integration with real email provider (SendGrid)

Related Projects:

Check out my URL Shortener microservice (Days 1-10) for complementary patterns.


Conclusion

Building async job processing while working full-time taught me:

  1. Concurrency is about coordination — Channels and mutexes prevent chaos
  2. Failure is normal — Design for retries, not perfection
  3. Observability earns trust — Metrics prove the system works
  4. Right-sizing beats over-engineering — 10 workers often enough
  5. Production-grade = edge cases handled — DLQ, graceful shutdown, health checks

This project demonstrates systems thinking beyond basic CRUD APIs. Interviewers notice.


Built over 6 focused evenings while working full-time

👉 View Full Source Code


Questions or feedback? Connect on LinkedIn or GitHub.