Day 5 of 5
⏱ ~60 minutes
Build AI Apps with JavaScript — Day 5

Production Patterns: Error Handling, Rate Limits, and Caching

Day 5 covers the production patterns that separate hobby projects from reliable apps: proper error handling, API rate limit management, response caching, and cost optimization.

Error Handling for the Claude API

Robust API Call with Retry
import Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic();

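// Retries rate-limit and server errors; rethrows everything else immediately.
// Note: the SDK also retries some failures on its own (see the client's
// maxRetries option), so this wrapper adds to those attempts.
async function callWithRetry(params, maxRetries = 3) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await client.messages.create(params);
    } catch (error) {
      // Out of attempts: surface the original error instead of masking it
      if (attempt === maxRetries) throw error;

      if (error instanceof Anthropic.RateLimitError) {
        // Rate limited (HTTP 429): exponential backoff of 2s, 4s, 8s, ...
        const wait = Math.pow(2, attempt) * 1000;
        console.log(`Rate limited. Waiting ${wait}ms...`);
        await new Promise(r => setTimeout(r, wait));
      } else if (error instanceof Anthropic.APIError && error.status >= 500) {
        // Server error: linear backoff of 1s, 2s, ...
        await new Promise(r => setTimeout(r, 1000 * attempt));
      } else {
        throw error; // Don't retry auth errors or client errors
      }
    }
  }
  // Only reachable when maxRetries is less than 1
  throw new Error('maxRetries must be at least 1');
}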
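
When many clients are rate limited at once, fixed backoff delays can make them all retry at the same moment. A common mitigation is to randomize each delay ("jitter"). The sketch below is one way to do that; `backoffDelay` and its `baseMs`, `capMs`, and `random` options are hypothetical names introduced for illustration, not part of the Anthropic SDK.

```javascript
// Hypothetical helper: exponential backoff with "full jitter".
// The delay is a random value between 0 and min(capMs, baseMs * 2^attempt).
// `random` is injectable so the function can be tested deterministically.
function backoffDelay(attempt, { baseMs = 1000, capMs = 30000, random = Math.random } = {}) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Show the upper bound and one sampled delay for the first few attempts
for (let attempt = 1; attempt <= 4; attempt++) {
  const upperBound = Math.min(30000, 1000 * 2 ** attempt);
  console.log(`attempt ${attempt}: up to ${upperBound}ms, picked ${backoffDelay(attempt)}ms`);
}
```

To try it in callWithRetry, the rate-limit branch could compute `const wait = backoffDelay(attempt);` in place of the fixed `Math.pow(2, attempt) * 1000`.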

Simple Response Caching

In-Memory Cache
import crypto from 'crypto';

const cache = new Map();
const CACHE_TTL = 60 * 60 * 1000; // 1 hour

function cacheKey(params) {
  return crypto.createHash('md5')
    .update(JSON.stringify(params))
    .digest('hex');
}

async function cachedChat(params) {
  const key = cacheKey(params);
  const cached = cache.get(key);
  
  if (cached && Date.now() - cached.timestamp < CACHE_TTL) {
    console.log('Cache hit');
    return cached.response;
  }
  
  const response = await client.messages.create(params);
  cache.set(key, { response, timestamp: Date.now() });
  return response;
}

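// Use for repeatable queries where an identical answer is acceptable (FAQ, document analysis)
// Don't cache conversational messages: each turn depends on the history before it
// This Map lives in process memory: it is lost on restart and not shared between server instances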
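
One caveat with hashing `JSON.stringify(params)`: two requests with the same fields in a different key order produce different strings, so they miss each other in the cache. The sketch below shows one way to serialize with sorted keys. `stableStringify` is a hypothetical helper, and it assumes `params` holds only plain JSON-compatible data (objects, arrays, strings, numbers, booleans, null).

```javascript
// Hypothetical helper: JSON serialization with object keys sorted at every
// level, so equal requests yield identical strings regardless of key order.
function stableStringify(value) {
  if (Array.isArray(value)) {
    // Array order is meaningful (for example, message history), so keep it
    return `[${value.map((item) => stableStringify(item)).join(',')}]`;
  }
  if (value !== null && typeof value === 'object') {
    const entries = Object.keys(value)
      .filter((key) => value[key] !== undefined) // JSON.stringify also skips undefined
      .sort()
      .map((key) => `${JSON.stringify(key)}:${stableStringify(value[key])}`);
    return `{${entries.join(',')}}`;
  }
  // Primitives: defer to JSON.stringify; undefined becomes null, as in arrays
  return JSON.stringify(value) ?? 'null';
}

// 'example-model' is a placeholder, not a real model name
const requestA = { model: 'example-model', max_tokens: 10, messages: [{ role: 'user', content: 'hi' }] };
const requestB = { messages: [{ content: 'hi', role: 'user' }], max_tokens: 10, model: 'example-model' };
console.log(stableStringify(requestA) === stableStringify(requestB));
```

In cacheKey, the hash input could then be `stableStringify(params)` instead of `JSON.stringify(params)`.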

Token Budget and Cost Control

Token Estimation
// Rough token estimate for English text: ~4 characters per token.
// For actual counts, read response.usage after the call.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Track usage per session
function trackUsage(session, response) {
  if (!session.totalTokens) session.totalTokens = 0;
  session.totalTokens += response.usage.input_tokens + 
                         response.usage.output_tokens;
  
  // Warn if approaching budget
  if (session.totalTokens > 50000) {
    console.warn('Session token budget warning:', session.totalTokens);
  }
}

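// Cost calculation with example rates of $3/M input and $15/M output tokens.
// Rates differ by model and change over time: confirm them against
// Anthropic's current pricing for the model you call.
function estimateCost(inputTokens, outputTokens) {
  const inputCost = inputTokens * 0.000003;   // $3 per million input tokens
  const outputCost = outputTokens * 0.000015; // $15 per million output tokens
  return inputCost + outputCost;
}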
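
The tracking code above warns after the budget is passed. To act before a request is sent, the estimate and the session total can be combined into a cap on `max_tokens`. This is a minimal sketch under stated assumptions: `allowedMaxTokens` and its `budget` and `desiredMax` options are hypothetical names, and the 4-characters-per-token estimate is only approximate.

```javascript
// Same rough estimate as in the lesson, repeated so this sketch runs on its own
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Hypothetical helper: how many output tokens a request may use without
// pushing the session past its budget. Returns 0 when the prompt alone
// would use up what is left.
function allowedMaxTokens(session, promptText, { budget = 50000, desiredMax = 1024 } = {}) {
  const used = session.totalTokens ?? 0;
  const remaining = budget - used - estimateTokens(promptText);
  if (remaining <= 0) return 0;
  return Math.min(desiredMax, remaining);
}

// Example: a session that has already used most of its budget
const nearlySpent = { totalTokens: 49000 };
const limit = allowedMaxTokens(nearlySpent, 'Summarize this document in two sentences.');
console.log(limit === 0 ? 'Budget exhausted, reject the request' : `max_tokens: ${limit}`);
```

A server could pass the returned value as `max_tokens` and refuse the request when it is 0.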

Day 5 Exercise

Harden Your App for Production
  1. Replace all direct API calls in your server with callWithRetry().
  2. Add the response cache to your document analysis endpoint.
  3. Add token tracking to your session object — log usage after each exchange.
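  4. Set a max_tokens limit based on your budget, then confirm your app handles truncated responses gracefully. A response that hits the limit has stop_reason set to "max_tokens".
  5. Deploy to Railway or another cloud provider. Test the error handling by temporarily using an invalid API key: the request should fail immediately with an authentication error rather than being retried.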
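
For step 4, truncation can be detected from the response itself. The sketch below assumes the response shape returned by client.messages.create(): a `content` array of blocks and a `stop_reason` field. `summarizeResponse` is a hypothetical helper, and the sample object is hand-written for illustration rather than real API output.

```javascript
// Hypothetical helper: extract the text and flag truncated output.
// stop_reason is 'max_tokens' when generation stopped at the max_tokens limit.
function summarizeResponse(response) {
  const text = response.content
    .filter((block) => block.type === 'text')
    .map((block) => block.text)
    .join('');
  return { text, truncated: response.stop_reason === 'max_tokens' };
}

// Hand-written stand-in for an API response, for illustration only
const sampleResponse = {
  content: [{ type: 'text', text: 'The report covers three main' }],
  stop_reason: 'max_tokens',
};

const summary = summarizeResponse(sampleResponse);
console.log(summary.truncated ? `${summary.text} [response truncated]` : summary.text);
```

How to handle a truncated reply is a design choice: an app might show a notice, offer a "continue" action, or retry with a larger limit if the budget allows.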

Course Complete — Production AI in JavaScript

  • Retrying with backoff handles rate limits and transient server errors.
  • In-memory caching reduces API costs for repeated deterministic queries.
  • Token tracking lets you monitor costs per session or user.
  • You have the full stack: API calls, streaming, document processing, session management, error handling.
