March 4, 2026
Building an Async Job Queue Worker in Go: Redis, Worker Pools, and Production Patterns
Background jobs in Go with Redis-backed queues, goroutine worker pools, graceful shutdown, and patterns that hold up in production.
Quick navigation
- System Architecture
- Tech Stack
- Development Timeline: 6 Days
- API Layer
- Observability and Metrics
- Docker Compose Setup
- Testing the System
- Performance Benchmarks
- Key Technical Lessons
- Interview Talking Points
- Resume Bullet
- Comparison: Synchronous vs Asynchronous
- Resources That Helped
- What's Next?
- Conclusion
System Architecture
┌──────────────┐      ┌──────────────────┐      ┌──────────────────┐
│ HTTP Client  │ -->  │  Queue Manager   │ -->  │   Worker Pools   │
│  POST /jobs  │      │ (Redis Backend)  │      │   (Goroutines)   │
└──────────────┘      └──────────────────┘      └──────────────────┘
        ↓                      ↓                         ↓
 Returns JobID          Stores in Redis          Processes tasks:
 {"job_id":"uuid"}      • pending_jobs           • Email sending
                        • running_jobs           • Image resize
 GET /jobs/:id          • completed_jobs         • Analytics
 → status check         • failed_jobs            • Report gen
                        • dlq (dead letter)
Key Design Decisions:
- Redis for persistence — Jobs survive restarts, simple list-based queue
- Worker pool pattern — Fixed goroutines prevent resource exhaustion
- Exponential backoff — Failed jobs retry intelligently
- Dead letter queue — Failed jobs aren't lost, they're quarantined
- Prometheus metrics — Observable system behavior
Tech Stack
| Component | Technology | Why This Choice |
|---|---|---|
| Language | Go 1.25 | Fast, strong typing, excellent concurrency |
| Queue Backend | Redis 7 | In-memory speed with persistence |
| Concurrency | Goroutines + Channels | Lightweight, efficient communication |
| Worker Pattern | Pool of N workers | Bounded resource usage |
| Failure Handling | Retry + DLQ | Resilient to transient failures |
| Monitoring | Prometheus | Industry standard metrics |
| Containerization | Docker Compose | Reproducible local/prod environments |
Development Timeline: 6 Days
Day 1-2: Core Job Model and In-Memory Queue
Started with the fundamental data structure:
// internal/queue/job.go
type JobStatus string

const (
	StatusPending   JobStatus = "pending"
	StatusRunning   JobStatus = "running"
	StatusCompleted JobStatus = "completed"
	StatusFailed    JobStatus = "failed"
)

type Job struct {
	ID          string     `json:"id"`
	Type        string     `json:"type"`    // email, resize, analytics
	Payload     []byte     `json:"payload"` // Job-specific data
	Status      JobStatus  `json:"status"`
	CreatedAt   time.Time  `json:"created_at"`
	UpdatedAt   time.Time  `json:"updated_at"`
	Attempt     int        `json:"attempt"` // Retry counter
	Error       *string    `json:"error,omitempty"`
	StartedAt   *time.Time `json:"started_at,omitempty"`
	CompletedAt *time.Time `json:"completed_at,omitempty"`
}
Why pointer fields for Error, StartedAt, CompletedAt?
- Nullable in JSON — preserves the null vs empty string distinction (see the sketch below)
- Clear semantics — "job hasn't started" is distinct from "started at the zero time"
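A minimal sketch of that distinction, using a hypothetical cut-down struct (jobExample is illustrative, not part of the project):
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// jobExample is a cut-down Job showing pointer vs value time fields.
type jobExample struct {
	ID        string     `json:"id"`
	StartedAt *time.Time `json:"started_at,omitempty"` // pointer: omitted entirely when nil
	SeenAt    time.Time  `json:"seen_at"`              // value: serializes as the zero time
}

func main() {
	b, _ := json.Marshal(jobExample{ID: "abc"})
	fmt.Println(string(b))
	// {"id":"abc","seen_at":"0001-01-01T00:00:00Z"}
	// started_at is absent; seen_at leaks a misleading zero timestamp.
}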
Initial In-Memory Queue:
type MemoryQueue struct {
	jobs    map[string]*Job
	pending []string
	mu      sync.RWMutex
}

func (mq *MemoryQueue) Create(ctx context.Context, job *Job) error {
	mq.mu.Lock()
	defer mq.mu.Unlock()

	if job.ID == "" {
		job.ID = uuid.New().String()
	}
	job.Status = StatusPending
	job.CreatedAt = time.Now()
	job.UpdatedAt = time.Now()

	mq.jobs[job.ID] = job
	mq.pending = append(mq.pending, job.ID)
	return nil
}
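A quick usage sketch (hypothetical; note the jobs map must be initialized before Create is called):
// Hypothetical usage of the in-memory queue.
mq := &MemoryQueue{jobs: make(map[string]*Job)}
job := &Job{Type: "email", Payload: []byte(`{"to":"user@example.com"}`)}
if err := mq.Create(context.Background(), job); err != nil {
	log.Fatal(err)
}
fmt.Println(job.ID, job.Status) // e.g. "550e8400-..." pending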
Problem discovered: Restart server → all jobs vanish. Not production-ready.
Day 3-4: Redis Persistence Layer
Migrated to Redis for durable job storage:
// internal/queue/redis_queue.go
type RedisQueue struct {
	client  *redis.Client
	maxSize int
}

const (
	pendingJobsKey   = "queue:pending"
	runningJobsKey   = "queue:running"
	completedJobsKey = "queue:completed"
	failedJobsKey    = "queue:failed"
	jobPrefix        = "job:"
)

func (rq *RedisQueue) Create(ctx context.Context, job *Job) error {
	// Check queue capacity.
	// Note: check-then-push is not atomic, so the cap is best-effort
	// under concurrent producers.
	count, err := rq.client.LLen(ctx, pendingJobsKey).Result()
	if err != nil {
		return fmt.Errorf("failed to check queue size: %w", err)
	}
	if count >= int64(rq.maxSize) {
		return fmt.Errorf("queue is full (max %d jobs)", rq.maxSize)
	}

	// Generate ID if not provided
	if job.ID == "" {
		job.ID = uuid.New().String()
	}
	job.Status = StatusPending
	job.CreatedAt = time.Now()
	job.UpdatedAt = time.Now()

	// Store job data
	jobKey := jobPrefix + job.ID
	jobData, err := json.Marshal(job)
	if err != nil {
		return fmt.Errorf("failed to marshal job: %w", err)
	}

	// Store with 7-day expiration
	if err := rq.client.Set(ctx, jobKey, jobData, 7*24*time.Hour).Err(); err != nil {
		return fmt.Errorf("failed to store job: %w", err)
	}

	// Add to pending queue
	if err := rq.client.RPush(ctx, pendingJobsKey, job.ID).Err(); err != nil {
		return fmt.Errorf("failed to enqueue job: %w", err)
	}
	return nil
}
Redis Data Model:
Keys:
  queue:pending   → LIST of job IDs waiting for processing
  queue:running   → LIST of job IDs currently being processed
  queue:completed → LIST of completed job IDs
  queue:failed    → LIST of failed job IDs
  job:{uuid}      → STRING of job data (JSON, stored with a 7-day TTL)
  dlq:{uuid}      → STRING of dead-letter job data (JSON, no TTL)
Dequeue with Atomic Move:
func (rq *RedisQueue) Dequeue(ctx context.Context) (*Job, error) {
	// Atomically move from pending to running
	jobID, err := rq.client.BLMove(
		ctx,
		pendingJobsKey,
		runningJobsKey,
		"LEFT",
		"RIGHT",
		5*time.Second, // Block for up to 5 seconds
	).Result()
	if err == redis.Nil {
		return nil, nil // No jobs available
	}
	if err != nil {
		return nil, err
	}

	// Fetch job data
	jobKey := jobPrefix + jobID
	jobData, err := rq.client.Get(ctx, jobKey).Result()
	if err != nil {
		return nil, fmt.Errorf("job data not found: %w", err)
	}

	var job Job
	if err := json.Unmarshal([]byte(jobData), &job); err != nil {
		return nil, err
	}

	job.Status = StatusRunning
	now := time.Now()
	job.StartedAt = &now
	job.UpdatedAt = now

	// Update stored job status
	updatedData, err := json.Marshal(job)
	if err != nil {
		return nil, fmt.Errorf("failed to marshal job update: %w", err)
	}
	if err := rq.client.Set(ctx, jobKey, updatedData, 7*24*time.Hour).Err(); err != nil {
		return nil, fmt.Errorf("failed to update job status: %w", err)
	}
	return &job, nil
}
Critical insight: BLMove (blocking list move) is atomic, so a job can't be picked up by two workers simultaneously.
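One consequence of the pending → running move: if a worker crashes mid-job, its ID stays stranded on queue:running. A hedged recovery sketch for a single-instance deployment (RecoverRunning is hypothetical, and would wrongly requeue live jobs if several instances share the queue):
// Hypothetical startup pass: drain stranded IDs from queue:running back to
// queue:pending so jobs from crashed workers get re-run. Single-instance only.
func (rq *RedisQueue) RecoverRunning(ctx context.Context) error {
	for {
		_, err := rq.client.LMove(ctx, runningJobsKey, pendingJobsKey, "LEFT", "RIGHT").Result()
		if err == redis.Nil {
			return nil // running list drained
		}
		if err != nil {
			return err
		}
	}
}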
Day 5: Worker Pool Implementation
Producer-consumer pattern with bounded concurrency:
// internal/worker/pool.go
type WorkerPool struct {
	numWorkers int
	queue      queue.Queue
	workers    []chan *queue.Job
	wg         sync.WaitGroup
	ctx        context.Context
	cancel     context.CancelFunc
	logger     *zap.Logger
}

func NewWorkerPool(numWorkers int, q queue.Queue, logger *zap.Logger) *WorkerPool {
	ctx, cancel := context.WithCancel(context.Background())
	wp := &WorkerPool{
		numWorkers: numWorkers,
		queue:      q,
		workers:    make([]chan *queue.Job, numWorkers),
		ctx:        ctx,
		cancel:     cancel,
		logger:     logger,
	}

	// Start worker goroutines
	for i := 0; i < numWorkers; i++ {
		wp.workers[i] = make(chan *queue.Job, 10) // Buffered channel
		wp.wg.Add(1)
		go wp.worker(i, wp.workers[i])
	}
	return wp
}
func (wp *WorkerPool) Start() {
	wp.logger.Info("worker pool started", zap.Int("num_workers", wp.numWorkers))
	for {
		select {
		case <-wp.ctx.Done():
			wp.logger.Info("worker pool shutting down")
			return
		default:
			// Dequeue job from Redis
			job, err := wp.queue.Dequeue(wp.ctx)
			if err != nil {
				wp.logger.Error("failed to dequeue job", zap.Error(err))
				time.Sleep(1 * time.Second)
				continue
			}
			if job == nil {
				// No jobs available, short sleep
				time.Sleep(100 * time.Millisecond)
				continue
			}

			// Assign the job to a randomly chosen worker
			workerID := rand.Intn(wp.numWorkers)
			wp.workers[workerID] <- job
		}
	}
}
Worker goroutine:
func (wp *WorkerPool) worker(id int, jobs <-chan *queue.Job) {
	defer wp.wg.Done()
	wp.logger.Info("worker started", zap.Int("worker_id", id))

	for job := range jobs {
		wp.logger.Info("processing job",
			zap.Int("worker_id", id),
			zap.String("job_id", job.ID),
			zap.String("job_type", job.Type),
		)
		if err := wp.processJob(job); err != nil {
			wp.handleJobFailure(job, err)
		} else {
			wp.handleJobSuccess(job)
		}
	}
	wp.logger.Info("worker stopped", zap.Int("worker_id", id))
}
Job Processing Logic:
func (wp *WorkerPool) processJob(job *queue.Job) error {
	switch job.Type {
	case "email":
		return wp.processEmailJob(job)
	case "image_resize":
		return wp.processImageResizeJob(job)
	case "analytics":
		return wp.processAnalyticsJob(job)
	default:
		return fmt.Errorf("unknown job type: %s", job.Type)
	}
}

func (wp *WorkerPool) processEmailJob(job *queue.Job) error {
	var payload struct {
		To      string `json:"to"`
		Subject string `json:"subject"`
		Body    string `json:"body"`
	}
	if err := json.Unmarshal(job.Payload, &payload); err != nil {
		return fmt.Errorf("invalid email payload: %w", err)
	}

	// Simulate email sending
	time.Sleep(time.Duration(rand.Intn(3)) * time.Second)

	// In production: integrate with SendGrid, AWS SES, etc.
	wp.logger.Info("email sent",
		zap.String("to", payload.To),
		zap.String("subject", payload.Subject),
	)
	return nil
}
Day 6: Retry Logic and Dead Letter Queue
Exponential Backoff with Jitter:
const (
	MaxRetries = 3
	BaseDelay  = 1 * time.Second
)

func calculateBackoff(attempt int) time.Duration {
	base := float64(BaseDelay)
	multiplier := math.Pow(2, float64(attempt-1)) // 2^0, 2^1, 2^2
	jitter := rand.Float64() * 0.1 * base         // up to +10% randomness
	delay := time.Duration(base*multiplier + jitter)

	// Cap at 30 seconds
	if delay > 30*time.Second {
		delay = 30 * time.Second
	}
	return delay
}
Why jitter? Prevents thundering herd when many jobs fail simultaneously.
Retry attempts: 1s → 2s → 4s (with randomness)
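A quick sanity check of that schedule (assuming the calculateBackoff above; output varies slightly with the jitter):
// Print the backoff for each attempt; jitter makes each run differ a little.
for attempt := 1; attempt <= MaxRetries; attempt++ {
	fmt.Printf("attempt %d: retry after %v\n", attempt, calculateBackoff(attempt))
}
// attempt 1: retry after ~1.0-1.1s
// attempt 2: retry after ~2.0-2.1s
// attempt 3: retry after ~4.0-4.1s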
Failure Handler:
func (wp *WorkerPool) handleJobFailure(job *queue.Job, err error) {
	job.Attempt++
	errMsg := err.Error()
	job.Error = &errMsg
	job.UpdatedAt = time.Now()

	if job.Attempt >= MaxRetries {
		wp.logger.Error("job failed permanently",
			zap.String("job_id", job.ID),
			zap.Int("attempts", job.Attempt),
			zap.Error(err),
		)
		// Move to dead letter queue
		wp.moveToDeadLetterQueue(job)
		jobsMovedToDLQ.Inc()
	} else {
		wp.logger.Warn("job failed, retrying",
			zap.String("job_id", job.ID),
			zap.Int("attempt", job.Attempt),
			zap.Error(err),
		)
		// Schedule retry with backoff.
		// Note: sleeping here ties up this worker for the backoff window.
		backoff := calculateBackoff(job.Attempt)
		time.Sleep(backoff)

		// Re-enqueue
		job.Status = queue.StatusPending
		wp.queue.Create(wp.ctx, job)
		jobsRetried.Inc()
	}
}
Dead Letter Queue Implementation:
func (wp *WorkerPool) moveToDeadLetterQueue(job *queue.Job) {
	ctx := context.Background()
	dlqKey := "dlq:" + job.ID

	job.Status = queue.StatusFailed
	now := time.Now()
	job.CompletedAt = &now

	data, err := json.Marshal(job)
	if err != nil {
		wp.logger.Error("failed to marshal DLQ job", zap.Error(err))
		return
	}

	// Store in DLQ with no expiration (manual intervention needed)
	if err := wp.queue.(*queue.RedisQueue).Client().Set(ctx, dlqKey, data, 0).Err(); err != nil {
		wp.logger.Error("failed to move job to DLQ", zap.Error(err))
		return
	}

	// Add to failed jobs list
	wp.queue.(*queue.RedisQueue).Client().RPush(ctx, "queue:failed", job.ID)

	wp.logger.Info("job moved to DLQ",
		zap.String("job_id", job.ID),
		zap.String("error", *job.Error),
	)
}
DLQ allows:
- Manual inspection of failed jobs
- Debugging payload issues
- Re-queueing after fixes (see the sketch below)
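A hedged sketch of that re-queue path (requeueFromDLQ is hypothetical; it assumes the Client() accessor used above and the Create method from the Redis section):
// Hypothetical operator helper: move a DLQ entry back onto the pending queue
// after the underlying bug has been fixed.
func requeueFromDLQ(ctx context.Context, rq *queue.RedisQueue, jobID string) error {
	data, err := rq.Client().Get(ctx, "dlq:"+jobID).Result()
	if err != nil {
		return fmt.Errorf("DLQ entry not found: %w", err)
	}
	var job queue.Job
	if err := json.Unmarshal([]byte(data), &job); err != nil {
		return err
	}
	job.Attempt = 0 // reset the retry counter for a fresh run
	job.Error = nil
	if err := rq.Create(ctx, &job); err != nil { // Create resets status to pending
		return err
	}
	return rq.Client().Del(ctx, "dlq:"+jobID).Err()
}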
API Layer
HTTP Server Setup:
// cmd/api/main.go
func main() {
	// Load config
	cfg := config.Load()

	// Initialize logger
	logger, _ := zap.NewProduction()
	defer logger.Sync()

	// Connect to Redis
	redisClient := redis.NewClient(&redis.Options{
		Addr:     cfg.RedisURL,
		Password: cfg.RedisPassword,
		DB:       0,
	})

	// Test connection
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	if err := redisClient.Ping(ctx).Err(); err != nil {
		logger.Fatal("failed to connect to Redis", zap.Error(err))
	}

	// Initialize queue (named q to avoid shadowing the queue package)
	q := queue.NewRedisQueue(redisClient, 1000) // Max 1000 pending jobs

	// Start worker pool
	workerPool := worker.NewWorkerPool(cfg.NumWorkers, q, logger)
	go workerPool.Start()

	// Setup HTTP server
	router := chi.NewRouter()
	router.Use(middleware.Logger)
	router.Use(middleware.Recoverer)

	// Routes
	router.Post("/jobs", handleCreateJob(q, logger))
	router.Get("/jobs/{id}", handleGetJob(q, logger))
	router.Get("/stats", handleStats(q, logger))
	router.Get("/health", handleHealth(redisClient))
	router.Get("/metrics", promhttp.Handler().ServeHTTP)

	// Start server
	srv := &http.Server{
		Addr:    ":" + cfg.Port,
		Handler: router,
	}
	logger.Info("server starting", zap.String("port", cfg.Port))
	if err := srv.ListenAndServe(); err != nil {
		logger.Fatal("server failed", zap.Error(err))
	}
}
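As written, main exits immediately on SIGTERM, killing in-flight jobs. A hedged sketch of signal-based draining (assuming os/signal and syscall imports, and the WorkerPool.Shutdown method shown in the lessons section below):
// Sketch: run the server in a goroutine, then drain on SIGINT/SIGTERM.
quit := make(chan os.Signal, 1)
signal.Notify(quit, syscall.SIGINT, syscall.SIGTERM)

go func() {
	if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
		logger.Fatal("server failed", zap.Error(err))
	}
}()

<-quit
shutdownCtx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
srv.Shutdown(shutdownCtx) // stop accepting new job submissions
workerPool.Shutdown()     // let in-flight jobs finish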
Create Job Endpoint:
func handleCreateJob(q queue.Queue, logger *zap.Logger) http.HandlerFunc {
	type request struct {
		Type    string          `json:"type"`
		Payload json.RawMessage `json:"payload"`
	}
	type response struct {
		JobID string `json:"job_id"`
	}

	return func(w http.ResponseWriter, r *http.Request) {
		var req request
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
			http.Error(w, "invalid request", http.StatusBadRequest)
			return
		}

		job := &queue.Job{
			Type:    req.Type,
			Payload: req.Payload,
		}
		if err := q.Create(r.Context(), job); err != nil {
			logger.Error("failed to create job", zap.Error(err))
			http.Error(w, "failed to create job", http.StatusInternalServerError)
			return
		}

		jobsCreated.Inc()
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(http.StatusCreated)
		json.NewEncoder(w).Encode(response{JobID: job.ID})
	}
}
Stats Endpoint:
func handleStats(q queue.Queue, logger *zap.Logger) http.HandlerFunc {
	type stats struct {
		TotalJobs     int `json:"total_jobs"`
		PendingJobs   int `json:"pending_jobs"`
		RunningJobs   int `json:"running_jobs"`
		CompletedJobs int `json:"completed_jobs"`
		FailedJobs    int `json:"failed_jobs"`
		DLQSize       int `json:"dlq_size"`
	}

	return func(w http.ResponseWriter, r *http.Request) {
		ctx := r.Context()
		redisQueue := q.(*queue.RedisQueue)

		pending, _ := redisQueue.Client().LLen(ctx, "queue:pending").Result()
		running, _ := redisQueue.Client().LLen(ctx, "queue:running").Result()
		completed, _ := redisQueue.Client().LLen(ctx, "queue:completed").Result()
		failed, _ := redisQueue.Client().LLen(ctx, "queue:failed").Result()

		// Count DLQ entries.
		// Note: KEYS is O(N) and blocks Redis; fine for a demo, but see the
		// SCAN-based variant below for anything busier.
		dlqKeys, _ := redisQueue.Client().Keys(ctx, "dlq:*").Result()

		s := stats{
			TotalJobs:     int(pending + running + completed + failed),
			PendingJobs:   int(pending),
			RunningJobs:   int(running),
			CompletedJobs: int(completed),
			FailedJobs:    int(failed),
			DLQSize:       len(dlqKeys),
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(s)
	}
}
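A sketch of the SCAN-based count mentioned in the comment (same go-redis client and handler scope; cursor-based, so it doesn't block Redis the way KEYS does):
// Count DLQ entries incrementally with SCAN instead of KEYS.
var dlqSize int
iter := redisQueue.Client().Scan(ctx, 0, "dlq:*", 100).Iterator()
for iter.Next(ctx) {
	dlqSize++
}
if err := iter.Err(); err != nil {
	logger.Warn("dlq scan failed", zap.Error(err))
}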
Observability and Metrics
Prometheus Metrics:
var (
	jobsCreated = promauto.NewCounter(prometheus.CounterOpts{
		Name: "jobs_created_total",
		Help: "Total number of jobs created",
	})
	jobsProcessed = promauto.NewCounter(prometheus.CounterOpts{
		Name: "jobs_processed_total",
		Help: "Total number of jobs successfully processed",
	})
	jobsFailed = promauto.NewCounter(prometheus.CounterOpts{
		Name: "jobs_failed_total",
		Help: "Total number of jobs that failed",
	})
	jobsRetried = promauto.NewCounter(prometheus.CounterOpts{
		Name: "jobs_retried_total",
		Help: "Total number of job retry attempts",
	})
	jobsMovedToDLQ = promauto.NewCounter(prometheus.CounterOpts{
		Name: "jobs_dlq_total",
		Help: "Total number of jobs moved to dead letter queue",
	})
	jobProcessingDuration = promauto.NewHistogram(prometheus.HistogramOpts{
		Name:    "job_processing_duration_seconds",
		Help:    "Time spent processing jobs",
		Buckets: prometheus.DefBuckets,
	})
	queueSize = promauto.NewGaugeVec(prometheus.GaugeOpts{
		Name: "queue_size",
		Help: "Number of jobs in each queue state",
	}, []string{"state"})
)
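The counters are incremented inline, but the queue_size gauge needs a collector of its own. A hedged sketch of a periodic updater (the goroutine is hypothetical; it assumes access to the same redisClient and the key names from the Redis data model above):
// Sketch: refresh the queue_size gauge every 10 seconds from Redis list lengths.
go func() {
	ticker := time.NewTicker(10 * time.Second)
	defer ticker.Stop()
	for range ticker.C {
		ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
		for state, key := range map[string]string{
			"pending":   "queue:pending",
			"running":   "queue:running",
			"completed": "queue:completed",
			"failed":    "queue:failed",
		} {
			if n, err := redisClient.LLen(ctx, key).Result(); err == nil {
				queueSize.WithLabelValues(state).Set(float64(n))
			}
		}
		cancel()
	}
}()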
Metrics Collection:
func (wp *WorkerPool) worker(id int, jobs <-chan *queue.Job) {
	defer wp.wg.Done() // kept from the original worker so Shutdown's Wait still returns

	for job := range jobs {
		start := time.Now()
		if err := wp.processJob(job); err != nil {
			jobsFailed.Inc()
			wp.handleJobFailure(job, err)
		} else {
			jobsProcessed.Inc()
			jobProcessingDuration.Observe(time.Since(start).Seconds())
			wp.handleJobSuccess(job)
		}
	}
}
Health Check Endpoint:
func handleHealth(redisClient *redis.Client) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second)
		defer cancel()

		w.Header().Set("Content-Type", "application/json")

		// Check Redis connectivity
		if err := redisClient.Ping(ctx).Err(); err != nil {
			w.WriteHeader(http.StatusServiceUnavailable)
			json.NewEncoder(w).Encode(map[string]string{
				"status": "unhealthy",
				"error":  "redis connection failed",
			})
			return
		}

		json.NewEncoder(w).Encode(map[string]string{
			"status": "healthy",
		})
	}
}
Docker Compose Setup
Complete development environment:
version: '3.8'

services:
  app:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "8081:8081"
    environment:
      - REDIS_URL=redis:6379
      - PORT=8081
      - NUM_WORKERS=5
    depends_on:
      redis:
        condition: service_healthy
    volumes:
      - .:/app
    restart: unless-stopped

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 5
    restart: unless-stopped

  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
    restart: unless-stopped

volumes:
  redis_data:
  prometheus_data:
Prometheus Configuration:
# prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'job-queue-worker'
    static_configs:
      - targets: ['app:8081']
Testing the System
Create Job:
curl -X POST http://localhost:8081/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "type": "email",
    "payload": {
      "to": "user@example.com",
      "subject": "Welcome",
      "body": "Thanks for signing up!"
    }
  }'
# Response:
# {"job_id":"550e8400-e29b-41d4-a716-446655440000"}
Check Job Status:
curl http://localhost:8081/jobs/550e8400-e29b-41d4-a716-446655440000
# Response:
# {
# "id": "550e8400-e29b-41d4-a716-446655440000",
# "type": "email",
# "status": "completed",
# "created_at": "2026-04-07T10:30:00Z",
# "started_at": "2026-04-07T10:30:01Z",
# "completed_at": "2026-04-07T10:30:03Z",
# "attempt": 1
# }
View Stats:
curl http://localhost:8081/stats
# Response:
# {
# "total_jobs": 150,
# "pending_jobs": 5,
# "running_jobs": 3,
# "completed_jobs": 140,
# "failed_jobs": 2,
# "dlq_size": 2
# }
Check Metrics:
curl http://localhost:8081/metrics | grep jobs_
# jobs_created_total 150
# jobs_processed_total 140
# jobs_failed_total 8
# jobs_retried_total 6
# jobs_dlq_total 2
Performance Benchmarks
Load Testing with vegeta:
echo "POST http://localhost:8081/jobs" | vegeta attack \
-rate=100 \
-duration=30s \
-body='{"type":"email","payload":{"to":"test@example.com"}}' \
-header="Content-Type: application/json" \
| vegeta report
Results:
| Metric | 5 Workers | 10 Workers | 20 Workers |
|---|---|---|---|
| Requests/sec | 95 | 98 | 99 |
| Latency (p50) | 45ms | 42ms | 40ms |
| Latency (p95) | 120ms | 98ms | 85ms |
| Latency (p99) | 180ms | 145ms | 125ms |
| Success Rate | 100% | 100% | 100% |
Key Insight: Diminishing returns after 10 workers for this workload. Right-sizing matters.
Key Technical Lessons
1. Channel Buffer Size Matters
Problem: Unbuffered channels made the dispatcher block on every hand-off, stalling dequeues whenever a worker was busy.
Solution: Buffer channels to smooth out bursts:
wp.workers[i] = make(chan *queue.Job, 10) // 10-item buffer
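A tiny illustration of the difference (hypothetical snippet; a send blocks only once the buffer is full):
ch := make(chan int, 2) // buffered channel with capacity 2
ch <- 1                 // returns immediately
ch <- 2                 // returns immediately
// ch <- 3 would block here until a receiver drains the buffer;
// with an unbuffered channel, even the first send blocks until received.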
2. Redis Atomic Operations Prevent Race Conditions
Using BLMove instead of separate LPOP + RPUSH means the job is always on exactly one list — a crash between two separate commands would drop it:
// WRONG: two separate commands — the job is lost if the worker crashes
// between the pop and the push
jobID, _ := client.LPop(ctx, "queue:pending").Result()
client.RPush(ctx, "queue:running", jobID)

// RIGHT: one atomic command — the job is always on exactly one list
jobID, err := client.BLMove(ctx, "queue:pending", "queue:running", "LEFT", "RIGHT", 5*time.Second).Result()
3. Context Cancellation for Graceful Shutdown
func (wp *WorkerPool) Shutdown() {
	wp.cancel() // Signal the dispatcher loop to stop

	// Close worker channels so each worker's range loop exits.
	// This must happen after the dispatcher has stopped sending;
	// a send on a closed channel would panic.
	for _, ch := range wp.workers {
		close(ch)
	}

	// Wait for in-flight jobs to finish
	wp.wg.Wait()
	wp.logger.Info("all workers shut down gracefully")
}
4. Exponential Backoff Prevents Cascading Failures
Fixed delay causes synchronized retries → thundering herd:
100 jobs fail at T=0
All retry at T=1
All fail again, retry at T=2
...system overloaded
Exponential backoff with jitter spreads retries:
Job 1: retry at 1.05s
Job 2: retry at 0.98s
Job 3: retry at 1.12s
...spread across time window
5. Dead Letter Queue is Non-Negotiable
Without DLQ, permanently failed jobs disappear. With DLQ:
- Investigate failure patterns
- Fix bugs in job processing logic
- Manually re-queue after fixes
- Audit for data quality issues
Interview Talking Points
When asked: "How do you handle background processing?"
"I built an asynchronous job queue system in Go using Redis for persistence and goroutine worker pools for concurrent processing. The system handles retry logic with exponential backoff, moves permanently failed jobs to a dead letter queue, and exposes Prometheus metrics for observability. Key challenges included preventing race conditions with atomic Redis operations, right-sizing the worker pool to avoid resource exhaustion, and implementing graceful shutdown to prevent job loss during deployments."
Follow-up questions prepared:
Q: How would you scale this to millions of jobs?
- Horizontal scaling: Multiple worker instances reading from same Redis
- Partitioning: Shard jobs by type across multiple Redis instances
- Priority queues: Separate queues for high/low priority jobs
- Rate limiting: Prevent overwhelming downstream services
Q: How do you handle job ordering?
- Currently FIFO via Redis lists
- For strict ordering: Use sorted sets with timestamps
- For dependent jobs: Implement DAG-based workflow engine
Q: What about job scheduling (run at specific time)?
- Current: Immediate execution only
- Future: Redis sorted sets with score = execution timestamp (sketched below)
- Dedicated scheduler goroutine polls for ready jobs
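A hedged sketch of that sorted-set approach (the queue:scheduled key, schedule, and promoteDueJobs are hypothetical, reusing the go-redis client from above):
// Hypothetical delayed-job support: score = unix time when the job is due.
const scheduledKey = "queue:scheduled"

// Schedule a job for later execution.
func schedule(ctx context.Context, client *redis.Client, jobID string, runAt time.Time) error {
	return client.ZAdd(ctx, scheduledKey, redis.Z{
		Score:  float64(runAt.Unix()),
		Member: jobID,
	}).Err()
}

// Poller: periodically move due jobs onto the pending list.
func promoteDueJobs(ctx context.Context, client *redis.Client) error {
	now := fmt.Sprintf("%d", time.Now().Unix())
	due, err := client.ZRangeByScore(ctx, scheduledKey, &redis.ZRangeBy{Min: "-inf", Max: now}).Result()
	if err != nil {
		return err
	}
	for _, id := range due {
		client.RPush(ctx, "queue:pending", id)
		client.ZRem(ctx, scheduledKey, id)
	}
	return nil
}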
Resume Bullet
• Built a production-grade asynchronous job queue system in Go with Redis persistence, worker pools, exponential backoff retries, and a dead letter queue for failed jobs; handled 100+ jobs/sec with p95 latency under 100ms.
• Implemented atomic Redis operations to prevent race conditions, Prometheus metrics for observability, and graceful shutdown for zero job loss during deployments.
Repository: github.com/kaungmyathan22/golang-job-queue-worker
Comparison: Synchronous vs Asynchronous
| Aspect | Synchronous | Asynchronous (This Project) |
|---|---|---|
| User Experience | Blocks until completion | Immediate response |
| Timeout Risk | High (30s+ tasks) | None (runs in background) |
| Scalability | 1 thread = 1 task | N workers handle M tasks |
| Failure Handling | Request fails entirely | Individual job retries |
| Resource Efficiency | Wastes threads waiting | Worker pool reuse |
| Observability | Request logs only | Job-level metrics + DLQ |
When to use async:
- Tasks longer than 500ms
- External API calls (email, webhooks)
- Bulk operations (1000+ items)
- Non-critical path work (analytics)
Resources That Helped
- Go Concurrency Patterns
- Redis Commands Reference
- Worker Pool Pattern
- Exponential Backoff Algorithm
- Prometheus Client Go
What's Next?
Planned Enhancements:
- Job scheduling (cron-like syntax)
- Priority queues (high/normal/low)
- Job chaining (dependencies)
- Web UI for DLQ management
- Horizontal scaling tests (multiple worker instances)
- Integration with real email provider (SendGrid)
Related Projects:
Check out my URL Shortener microservice (Days 1-10) for complementary patterns.
Conclusion
Building async job processing while working full-time taught me:
- Concurrency is about coordination — Channels and mutexes prevent chaos
- Failure is normal — Design for retries, not perfection
- Observability earns trust — Metrics prove the system works
- Right-sizing beats over-engineering — 10 workers often enough
- Production-grade = edge cases handled — DLQ, graceful shutdown, health checks
This project demonstrates systems thinking beyond basic CRUD APIs. Interviewers notice.
Built over 6 focused evenings while working full-time.