Why I Chose Redis Over PostgreSQL for My Exchange's Order Queue (And Why You Should Too)
Building a high-frequency trading system taught me that database choice can make or break your entire architecture. Here's the deep technical analysis that led me to Redis for order queuing.
The Problem: 100,000 Orders Per Second
When I started building my exchange platform, I faced a fundamental architectural decision that would determine the entire system's performance characteristics. The question wasn't just about storing data—it was about handling a continuous stream of trading orders that needed to be:
Processed in strict order (FIFO for fairness)
Handled atomically (no lost orders)
Distributed reliably to the trading engine
Recoverable in case of failures
Scaled horizontally as volume grows
My initial instinct, like many developers, was to reach for PostgreSQL. After all, it's ACID-compliant, has excellent tooling, and I was already planning to use it for persistent data. But as I dove deeper into the requirements, I realized this decision would fundamentally shape my entire system architecture.
First Principles: What Does an Order Queue Actually Need?
Before jumping into technology choices, let's break down what happens when a user places an order:
// Simplified order flow
POST /api/v1/order -> API validates -> Queue -> Engine processes -> Database persists
The queue sits at the critical path between user action and trade execution. Every millisecond of latency here directly impacts user experience and can cost real money in arbitrage opportunities.
Requirements Analysis
Latency Requirements:
P50 < 5ms: Half of all orders processed in under 5ms
P99 < 20ms: 99% of orders processed in under 20ms
No timeouts: Under normal load, no order should time out
Throughput Requirements:
Peak: 100,000 orders/second: During market events
Sustained: 10,000 orders/second: Normal trading hours
Burst handling: 5x normal load for 30 seconds
Reliability Requirements:
Zero order loss: Orders must be processed exactly once
Ordered processing: FIFO within each market
Graceful degradation: System should slow down, not lose data
The PostgreSQL Approach: Why It Seemed Right
My first implementation used PostgreSQL with a simple orders table:
CREATE TABLE pending_orders (
    id SERIAL PRIMARY KEY,
    user_id VARCHAR(50) NOT NULL,
    market VARCHAR(20) NOT NULL,
    order_type VARCHAR(10) NOT NULL,
    side VARCHAR(4) NOT NULL,
    price DECIMAL(20,8),
    quantity DECIMAL(20,8) NOT NULL,
    created_at TIMESTAMP DEFAULT NOW(),
    status VARCHAR(20) DEFAULT 'pending'
);

CREATE INDEX idx_pending_orders_status_created
    ON pending_orders(status, created_at)
    WHERE status = 'pending';
The processing logic was straightforward:
// PostgreSQL polling approach
async function processOrdersFromDB() {
  while (true) {
    const orders = await db.query(`
      SELECT * FROM pending_orders
      WHERE status = 'pending'
      ORDER BY created_at
      LIMIT 100
    `);

    for (const order of orders) {
      await processOrder(order);
      await db.query(`
        UPDATE pending_orders
        SET status = 'processed'
        WHERE id = $1
      `, [order.id]);
    }

    await sleep(10); // Poll every 10ms
  }
}
The Problems Started Immediately
Polling Latency:
Even with a 10ms polling interval, orders incurred an average of 5ms latency from the polling delay alone (half the interval). Under high load this grew to 50ms+ as the polling query itself slowed down.
Lock Contention:
Multiple engine instances polling the same table created row-level locks that serialised order processing, negating any benefit of horizontal scaling (a standard mitigation, FOR UPDATE SKIP LOCKED, is sketched after this list).
CPU Overhead:
Constant polling consumed 15-20% CPU even during idle periods, and the cost scaled linearly with the number of engine instances.
Complex Error Handling:
Handling partial failures, engine crashes, and ensuring exactly-once processing required complex transaction logic that was error-prone.
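For completeness: PostgreSQL's usual answer to the lock contention above is FOR UPDATE SKIP LOCKED, which lets each engine claim a disjoint batch of rows. A minimal sketch of that pattern follows; it would ease the contention, but the polling-latency floor and CPU overhead remain:
// Claim a batch of pending orders without blocking on other engines' locks
const orders = await db.query(`
  UPDATE pending_orders
  SET status = 'processing'
  WHERE id IN (
    SELECT id FROM pending_orders
    WHERE status = 'pending'
    ORDER BY created_at
    LIMIT 100
    FOR UPDATE SKIP LOCKED
  )
  RETURNING *
`);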
Benchmarking: The Numbers Don't Lie
I ran comprehensive benchmarks comparing PostgreSQL polling vs Redis queues:
Latency Comparison
# PostgreSQL Polling (10ms interval)
Orders/sec: 1,000 | P50: 8ms | P99: 45ms | CPU: 20%
Orders/sec: 5,000 | P50: 15ms | P99: 120ms | CPU: 35%
Orders/sec: 10,000 | P50: 35ms | P99: 300ms | CPU: 60%
# Redis BRPOP
Orders/sec: 1,000 | P50: 0.8ms | P99: 3ms | CPU: 2%
Orders/sec: 5,000 | P50: 1.2ms | P99: 8ms | CPU: 5%
Orders/sec: 10,000 | P50: 2.1ms | P99: 15ms | CPU: 12%
Orders/sec: 50,000 | P50: 3.2ms | P99: 25ms | CPU: 25%
The difference was dramatic. Redis wasn't just faster—it scaled better and used fewer resources.
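For context on methodology: percentiles like these come from recording per-order latency (enqueue timestamp to processing completion) and sorting the samples. A simplified sketch of that measurement; the helpers here are illustrative, with the load generator and order payloads omitted, not the exact harness:
// Record enqueue-to-processed latency samples and report percentiles
const samples: number[] = [];

function recordLatency(enqueuedAt: number) {
  samples.push(Date.now() - enqueuedAt);
}

function percentile(p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const index = Math.min(sorted.length - 1, Math.floor((p / 100) * sorted.length));
  return sorted[index];
}

// After a fixed-rate run:
console.log(`P50: ${percentile(50)}ms | P99: ${percentile(99)}ms`);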
Memory Usage Patterns
# PostgreSQL (10k pending orders)
Buffer Pool: 256MB
Connection Pool: 50MB
Query Cache: 100MB
Total: ~400MB + overhead
# Redis (10k pending orders)
Memory: 45MB
Overhead: 8MB
Total: ~53MB
Redis's memory efficiency meant I could run more instances and handle larger order queues on the same hardware.
Enter Redis: The Game Changer
Redis's BRPOP (Blocking Right Pop) operation was exactly what I needed. Instead of polling, engines could block until orders were available:
// Redis blocking approach
import { createClient, RedisClientType } from "redis";

export class RedisManager {
  private client: RedisClientType;

  constructor(private engine: Engine) {
    this.client = createClient({
      url: process.env.REDIS_URL,
      socket: { tls: true }
    });
  }

  // Producer (API layer): LPUSH adds to the head of the list
  async queueOrder(order: Order) {
    await this.client.lPush("orders", JSON.stringify(order));
  }

  // Consumer (engine layer): BRPOP pops from the tail, giving FIFO order
  async processOrders() {
    while (true) {
      // Block for up to 5 seconds waiting for orders
      const response = await this.client.brPop("orders", 5);
      if (response) {
        const order = JSON.parse(response.element);
        await this.engine.process(order);
      }
      // On timeout, loop again (allows graceful shutdown checks)
    }
  }
}
Why BRPOP is Perfect for Order Processing
Atomic Operations:
BRPOP atomically removes an item from the list, so no two engines can ever process the same order.
Zero Polling Overhead:
Engines block until work is available. CPU usage drops to near zero during idle periods.
Natural Load Balancing:
Multiple engines can block on the same queue. Redis automatically distributes work to available workers.
Ordered Processing:
Lists maintain insertion order, ensuring FIFO processing critical for fair order matching.
Built-in Timeout:
The timeout parameter allows graceful shutdown and health checks without hanging connections.
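These properties are easy to verify with a toy script: two consumers blocking on one list while a producer pushes three orders. Each order lands on exactly one consumer, in FIFO order (a sketch for experimentation, not production code):
// Toy verification: one queue, two consumers, FIFO and exactly-one-consumer
import { createClient } from "redis";

async function consumer(name: string) {
  const client = createClient({ url: process.env.REDIS_URL });
  await client.connect();
  while (true) {
    const res = await client.brPop("demo:orders", 5);
    if (res) console.log(`${name} got ${res.element}`);
  }
}

async function producer() {
  const client = createClient({ url: process.env.REDIS_URL });
  await client.connect();
  for (const id of [1, 2, 3]) {
    await client.lPush("demo:orders", `order-${id}`);
  }
}

consumer("A");
consumer("B");
setTimeout(producer, 1000);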
Real-World Implementation Details
Here's how I actually implemented the Redis-based order queue in production:
Producer Side (API)
// api/src/routes/order.ts
import { Router } from "express";
import { RedisManager } from "../RedisManager";
import { CREATE_ORDER } from "../types";
import { validateOrder } from "../validation"; // paths assume the repo's layout

export const orderRouter = Router();

orderRouter.post("/", async (req, res) => {
  const { market, price, quantity, side, userId } = req.body;
  try {
    // Validate order before queuing
    validateOrder({ market, price, quantity, side, userId });

    // Queue the order; sendAndAwait generates a correlation ID internally
    // and resolves once the engine publishes the matching response
    const response = await RedisManager.getInstance().sendAndAwait({
      type: CREATE_ORDER,
      data: { market, price, quantity, side, userId }
    });

    res.json(response.payload);
  } catch (error) {
    res.status(400).json({ error: (error as Error).message });
  }
});
Consumer Side (Engine)
// engine/src/index.ts
import { createClient } from "redis";
import { Engine } from "./Engine";

async function main() {
  const engine = new Engine();
  const redisClient = createClient({
    url: process.env.REDIS_URL,
    socket: { tls: true }
  });
  await redisClient.connect();
  console.log("Engine connected to Redis");

  while (true) {
    try {
      // Block for up to 5 seconds waiting for orders
      const response = await redisClient.brPop("messages", 5);
      if (response) {
        const { correlationId, message } = JSON.parse(response.element);
        console.log(`Processing order: ${correlationId}`);

        // Process order through engine
        const result = engine.process(message);

        // Publish the result on the correlation ID channel so the
        // waiting API request resolves
        await redisClient.publish(correlationId, JSON.stringify(result));
      }
    } catch (error) {
      console.error("Error processing order:", error);
      // Continue processing other orders
    }
  }
}

main();
Request-Response Pattern
One challenge was implementing request-response semantics over Redis queues. I solved this with correlation IDs and pub/sub:
export class RedisManager {
  // node-redis needs a dedicated connection for pub/sub, so responses
  // arrive on a subscriber client separate from the publisher
  private subscriber: RedisClientType;
  private publisher: RedisClientType;

  public sendAndAwait(message: MessageToEngine): Promise<MessageFromEngine> {
    return new Promise<MessageFromEngine>((resolve, reject) => {
      const correlationId = this.generateCorrelationId();

      // Set up response handler with timeout
      const timeout = setTimeout(() => {
        this.subscriber.unsubscribe(correlationId);
        reject(new Error("Order processing timeout"));
      }, 10000); // 10 second timeout

      // Subscribe to the per-request response channel, and only queue
      // the order once the subscription is live so a fast response
      // can't be missed
      this.subscriber
        .subscribe(correlationId, (response) => {
          clearTimeout(timeout);
          this.subscriber.unsubscribe(correlationId);
          resolve(JSON.parse(response));
        })
        .then(() => {
          // Send order to the processing queue
          return this.publisher.lPush("messages", JSON.stringify({
            correlationId,
            message
          }));
        })
        .catch(reject);
    });
  }

  private generateCorrelationId(): string {
    return `${Date.now()}-${Math.random().toString(36).slice(2, 11)}`;
  }
}
Production Lessons Learned
Memory Management
Redis lists can grow unbounded if consumers can't keep up. I implemented monitoring and alerting:
// Monitor queue depth
setInterval(async () => {
  const queueDepth = await redisClient.lLen("messages");

  if (queueDepth > 10000) {
    console.warn(`Queue depth critical: ${queueDepth}`);
    // Alert operations team
    await sendSlackAlert(`Order queue depth: ${queueDepth}`);
  }

  if (queueDepth > 50000) {
    console.error(`Queue depth emergency: ${queueDepth}`);
    // Trigger auto-scaling or circuit breaker
    await triggerEmergencyScaling();
  }
}, 5000);
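Alerting alone doesn't protect the engine, so the API can also apply backpressure directly. A minimal sketch, reusing the orderRouter and redisClient from earlier; the 50,000 threshold mirrors the emergency level above, and returning 503 is one reasonable shed-load policy:
// Sketch: reject new orders when the queue is already saturated
const MAX_QUEUE_DEPTH = 50000;

orderRouter.use(async (req, res, next) => {
  const depth = await redisClient.lLen("messages");
  if (depth > MAX_QUEUE_DEPTH) {
    // Shed load before the queue grows unbounded
    return res.status(503).json({ error: "System overloaded, retry shortly" });
  }
  next();
});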
High Availability
A single Redis instance is a single point of failure, so production runs on Redis Cluster. Note that in node-redis, cluster connections are created with createCluster rather than createClient (the node URLs below are illustrative):
import { createCluster } from "redis";

const redisClient = createCluster({
  // Seed nodes; the client discovers the full topology from these.
  // URLs are illustrative - pull them from config in practice.
  rootNodes: [
    { url: process.env.REDIS_NODE_1 },
    { url: process.env.REDIS_NODE_2 },
    { url: process.env.REDIS_NODE_3 }
  ],
  defaults: {
    socket: {
      tls: true,
      connectTimeout: 5000
    }
  }
});

// Handle cluster-level errors; individual node clients manage
// their own reconnection
redisClient.on("error", (error) => {
  console.error("Redis cluster error:", error);
  // Implement fallback logic
});

// Note: a single list key like "messages" hashes to one slot, so the
// queue lives on one shard; per-market queues spread load across shards
Graceful Shutdown
Proper shutdown ensures no orders are lost during deployments:
process.on('SIGTERM', async () => {
  console.log('Received SIGTERM, shutting down gracefully...');

  // Stop accepting new orders; the consumer loop checks this flag
  // (see the sketch after this block)
  isShuttingDown = true;

  // Force exit if draining takes too long
  const shutdownTimeout = setTimeout(() => {
    console.log('Shutdown timeout reached, forcing exit');
    process.exit(1);
  }, 30000); // 30 second timeout

  // Wait for queued orders to complete
  while (await redisClient.lLen("messages") > 0) {
    console.log('Waiting for queue to drain...');
    await sleep(1000);
  }

  clearTimeout(shutdownTimeout);
  await redisClient.disconnect();
  console.log('Graceful shutdown complete');
  process.exit(0);
});
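The consumer side pairs with this handler by checking the flag. A minimal sketch, assuming isShuttingDown is visible to the worker loop and handleMessage stands in for the engine's processing entry point (both illustrative): each BRPOP timeout becomes a natural point to exit cleanly.
// Sketch: worker loop that stops once the SIGTERM handler sets the flag.
// handleMessage is a hypothetical processing entry point.
declare function handleMessage(message: unknown): Promise<void>;

async function workerLoop() {
  while (!isShuttingDown) {
    // The 5s timeout guarantees the flag is re-checked at least every
    // 5 seconds, even when the queue is empty
    const response = await redisClient.brPop("messages", 5);
    if (response) {
      await handleMessage(JSON.parse(response.element));
    }
  }
  console.log("Worker loop stopped; no new orders will be popped");
}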
Alternative Approaches Considered
Apache Kafka
Pros:
Excellent for high-throughput streaming
Built-in partitioning and replication
Strong durability guarantees
Cons:
Complex operational overhead
Higher latency for individual messages
Overkill for simple order queuing
Verdict: Too complex for our use case. The operational overhead wasn't justified for the benefits.
RabbitMQ
Pros:
Rich routing capabilities
Good management tools
AMQP standard
Cons:
Higher memory usage than Redis
More complex setup and configuration
Additional operational complexity
Verdict: More features than needed. Redis's simplicity won out.
Amazon SQS
Pros:
Fully managed
Good AWS integration
No operational overhead
Cons:
Higher latency (100ms+ typical)
Limited ordered throughput (FIFO queues cap at roughly 3,000 msgs/sec with batching)
At-least-once delivery on standard queues pushes deduplication onto the consumer
Verdict: Latency and throughput didn't meet our requirements.
In-Memory Queues (Node.js Arrays)
Pros:
Fastest possible performance
No network overhead
Simple implementation
Cons:
No persistence
Single point of failure
Can't scale horizontally
Verdict: Too risky for production financial systems.
When NOT to Use Redis for Queues
Redis isn't always the right choice. Consider alternatives when:
Complex Routing Required:
If you need sophisticated routing, filtering, or transformation, RabbitMQ or Kafka might be better.
Long-term Persistence:
Redis is primarily memory-based. For audit trails or long-term storage, use a database.
Very Large Messages:
Redis values are capped at 512MB, and large payloads bloat queue memory. For large payloads, consider object storage with queue metadata (see the sketch after this list).
Transactional Requirements:
If you need multi-step transactions across queues and databases, traditional RDBMS might be simpler.
Regulatory Compliance:
Some financial regulations require specific message durability guarantees that Redis can't provide.
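Here's the object-storage pattern from the large-messages point above. blobStore is a hypothetical client (S3, GCS, or similar); only a small reference travels through Redis:
// Sketch: store the large payload out of band and queue a pointer.
// blobStore is a hypothetical object-storage client.
declare const blobStore: {
  put(key: string, data: Buffer): Promise<void>;
  get(key: string): Promise<Buffer>;
};

async function queueLargePayload(payload: Buffer) {
  const key = `payloads/${Date.now()}-${Math.random().toString(36).slice(2)}`;
  await blobStore.put(key, payload); // large body goes to object storage
  await redisClient.lPush("messages", JSON.stringify({ payloadKey: key }));
}

// Consumer side: fetch the payload by key after popping the reference
async function consumeLargePayload() {
  const response = await redisClient.brPop("messages", 5);
  if (response) {
    const { payloadKey } = JSON.parse(response.element);
    const payload = await blobStore.get(payloadKey);
    // ...process payload
  }
}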
Performance Optimisation Tips
Connection Pooling
import { createClient, RedisClientType } from "redis";

class RedisConnectionPool {
  private pool: RedisClientType[] = [];
  private waitingQueue: Array<(client: RedisClientType) => void> = [];
  private activeConnections = 0;
  private readonly maxConnections = 10;

  async getConnection(): Promise<RedisClientType> {
    if (this.pool.length > 0) {
      return this.pool.pop()!;
    }
    if (this.activeConnections < this.maxConnections) {
      return this.createConnection();
    }
    // All connections busy: wait until one is released
    return new Promise((resolve) => {
      this.waitingQueue.push(resolve);
    });
  }

  releaseConnection(client: RedisClientType) {
    if (this.waitingQueue.length > 0) {
      // Hand the connection straight to the next waiter
      const resolver = this.waitingQueue.shift()!;
      resolver(client);
    } else {
      this.pool.push(client);
    }
  }

  private async createConnection(): Promise<RedisClientType> {
    this.activeConnections++;
    const client = createClient({ url: process.env.REDIS_URL }) as RedisClientType;
    await client.connect();
    return client;
  }
}
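Typical usage wraps acquire/release in a helper so errors can't leak connections:
// Usage: always release in finally so failures don't leak connections
const pool = new RedisConnectionPool();

async function withConnection<T>(fn: (client: RedisClientType) => Promise<T>): Promise<T> {
  const client = await pool.getConnection();
  try {
    return await fn(client);
  } finally {
    pool.releaseConnection(client);
  }
}

// e.g. enqueue one message through a pooled connection (order from request context)
await withConnection((client) => client.lPush("messages", JSON.stringify(order)));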
Pipeline Operations
// Batch multiple enqueues into a single round trip. Note that multi()
// queues commands into a MULTI/EXEC transaction, which also provides
// the batching benefit of a pipeline
async function batchProcessOrders(orders: Order[]) {
  const pipeline = redisClient.multi();
  orders.forEach(order => {
    pipeline.lPush("messages", JSON.stringify(order));
  });
  await pipeline.exec();
}
Memory Optimisation
# redis.conf - configured for pure queue usage
maxmemory 8gb
# Never evict: silently dropping queued orders would violate the
# zero-order-loss requirement; writes fail instead, which the API
# can surface as backpressure
maxmemory-policy noeviction
# Disable RDB snapshots and AOF for pure queue usage
save ""
appendonly no
stop-writes-on-bgsave-error no
Monitoring and Observability
Key Metrics to Track
// The counters (processedOrders, errorCount, totalLatency, etc.) are
// assumed to be maintained by the worker loop; shown here for illustration
const metrics = {
  queueDepth: () => redisClient.lLen("messages"),
  processingRate: () => processedOrders / timeWindow,
  errorRate: () => errorCount / totalOrders,
  avgLatency: () => totalLatency / processedOrders,
  connectionHealth: () => redisClient.ping()
};

// Export to monitoring system
setInterval(async () => {
  const stats = {
    queue_depth: await metrics.queueDepth(),
    processing_rate: metrics.processingRate(),
    error_rate: metrics.errorRate(),
    avg_latency: metrics.avgLatency(),
    timestamp: Date.now()
  };
  await sendToDatadog(stats);
}, 10000);
Alerting Rules
# Example Prometheus-style alerting rules (the same thresholds
# translate directly to Datadog monitors)
- alert: HighQueueDepth
  expr: redis_queue_depth > 10000
  for: 30s
  labels:
    severity: warning
  annotations:
    summary: "Order queue depth is high"

- alert: QueueProcessingStalled
  expr: increase(redis_orders_processed_total[5m]) == 0
  for: 60s
  labels:
    severity: critical
  annotations:
    summary: "Order processing has stalled"
Economic Impact
The Redis migration had a measurable business impact:
Latency Reduction:
80% reduction in average order processing time
90% reduction in P99 latency
Enabled high-frequency trading strategies
Cost Savings:
60% reduction in compute costs due to CPU efficiency
70% reduction in memory usage
Simplified operations reduced engineering overhead
Reliability Improvements:
99.99% uptime vs 99.9% with PostgreSQL polling
Zero order losses in production
Simplified error handling and recovery
Conclusion
Choosing Redis over PostgreSQL for order queuing was one of the most impactful architectural decisions in my exchange project. The numbers speak for themselves:
10x latency improvement
5x throughput increase
60% cost reduction
Simplified operations
But the real lesson isn't "always use Redis"—it's about understanding your requirements and choosing tools that match them. For order queuing in high-frequency trading systems, Redis's combination of performance, simplicity, and reliability makes it the obvious choice.
The key insights for senior engineers and founders:
Benchmark early and often - Don't assume, measure
Consider operational complexity - Simple solutions win in production
Understand your access patterns - Queues and databases serve different needs
Plan for failure - Design recovery and monitoring from day one
Measure business impact - Technical improvements should drive business value
Redis didn't just solve our performance problems—it enabled features and trading strategies that wouldn't have been possible with a traditional database approach. That's the difference between choosing the right tool and just picking a familiar one.
This is part of my "Building a Production-Grade Exchange" series. Next up: "The Hidden Complexity of Microservices in Financial Systems" - where I'll dive into how Redis enabled our entire microservices architecture.