Production Patterns: Error Handling, Rate Limits, and Caching

Day 5 covers the production patterns that separate hobby projects from reliable apps: proper error handling, API rate limit management, response caching, and cost optimization.

~1 hour Hands-on Precision AI Academy

Today’s Objective

Day 5 covers the production patterns that separate hobby projects from reliable apps: proper error handling, API rate limit management, response caching, and cost optimization.

import Anthropic from '@anthropic-ai/sdk'; const client = new Anthropic(); async function callWithRetry(params, maxRetries = 3) { for (let attempt = 1; attempt <= maxRetries; attempt++) { try { return await client.messages.create(params); } catch (error) { if (error instanceof Anthropic.RateLimitError) { // Wait before retrying (exponential backoff) const wait = Math.pow(2, attempt) * 1000; console.log(`Rate limited. Waiting ${wait}ms...`); await new Promise(r => setTimeout(r, wait)); } else if (error instanceof Anthropic.APIError && error.status >= 500 && attempt < maxRetries) { // Server error, retry await new Promise(r => setTimeout(r, 1000 * attempt)); } else { throw error; // Don't retry auth errors or client errors } } } throw new Error('Max retries exceeded'); }

Simple Response Caching

In-Memory Cache

IN-MEMORY CACHE

import crypto from 'crypto';

const cache = new Map();
const CACHE_TTL = 60 * 60 * 1000; // 1 hour

function cacheKey(params) { return crypto.createHash('md5') .update(JSON.stringify(params)) .digest('hex');
}

async function cachedChat(params) { const key = cacheKey(params); const cached = cache.get(key); if (cached && Date.now() - cached.timestamp < CACHE_TTL) { console.log('Cache hit'); return cached.response; } const response = await client.messages.create(params); cache.set(key, { response, timestamp: Date.now() }); return response;
}

// Use for deterministic queries (FAQ, document analysis)
// Don't cache conversational messages

Token Budget and Cost Control

Token Estimation

TOKEN ESTIMATION

// Rough token estimate: ~4 chars per token
function estimateTokens(text) { return Math.ceil(text.length / 4);
}

// Track usage per session
function trackUsage(session, response) { if (!session.totalTokens) session.totalTokens = 0; session.totalTokens += response.usage.input_tokens + response.usage.output_tokens; // Warn if approaching budget if (session.totalTokens > 50000) { console.warn('Session token budget warning:', session.totalTokens); }
}

// Cost calculation (claude-opus-4-5 rates as of 2026)
function estimateCost(inputTokens, outputTokens) { const inputCost = inputTokens * 0.000003; // $3/M input tokens const outputCost = outputTokens * 0.000015; // $15/M output tokens return inputCost + outputCost;
}

Want to go deeper in 2 days?

Our in-person AI bootcamp covers advanced AI development, agentic systems, and production deployment. Five cities. $1,490.

Reserve Your Seat →

Day 5 Checkpoint

Before moving on, make sure you can answer these without looking:

What is the core concept introduced in this lesson, and why does it matter?
What problem does Production solve that simpler approaches cannot?
Can you trace through the main code example in this lesson and explain each step?
What are the most common mistakes made when first learning this concept?
How would you explain today’s topic to a colleague who has never seen it before?