Skip to main content

Production Checklist

Essential tasks before deploying to production.

Security Checklist

Secrets & Keys

  • JWT secrets generated - Use cryptographically secure random values (min 48 bytes)

    openssl rand -base64 48
  • Encryption key generated - 32-byte hex key for token encryption

    openssl rand -hex 32
  • Azure OpenAI key secured - Never commit to version control

  • Resend API key secured - Store in encrypted .env.resend.production.enc

  • Environment files excluded - .env in .gitignore

Authentication

  • JWT_ACCESS_EXPIRY set - Recommended: 15 minutes
  • JWT_REFRESH_EXPIRY set - Recommended: 7 days
  • Rate limiting configured - 10 login attempts per 15 minutes
  • Password requirements enforced - Min 8 chars, mixed case, numbers, symbols

Network Security

  • HTTPS enabled - TLS 1.3 for all connections
  • CORS configured - ALLOWED_ORIGINS with specific domains
  • Helmet headers enabled - CSP, HSTS, X-Frame-Options
  • WebSocket secured - WSS for Socket.io connections

Data Protection

  • Tenant isolation enforced - ENFORCE_TENANT_ISOLATION=true
  • Encryption at rest - MongoDB and Qdrant encryption
  • PII masking enabled - Output scanning for sensitive data
  • Audit logging enabled - Security events tracked

Infrastructure Checklist

Database

  • MongoDB replica set - For high availability
  • MongoDB authentication - Username/password or SCRAM
  • MongoDB indexes created - Performance optimization
    npm run db:indexes
  • Backup strategy defined - Automated daily backups

Redis

  • Redis password set - AUTH required
  • Redis persistence configured - RDB or AOF
  • Memory limits set - maxmemory policy

Qdrant

  • Qdrant API key set - Authentication enabled
  • Collection created - With correct dimensions
  • Backup strategy defined - Snapshot schedule

Container/Server

  • Non-root user - Containers run as unprivileged user
  • Resource limits - CPU and memory constraints
  • Health checks - Kubernetes/Docker health probes
  • Log rotation - Prevent disk exhaustion

Application Checklist

Configuration

  • NODE_ENV=production - Production mode enabled
  • Debug logging disabled - LOG_LEVEL=info or warn
  • Retrieval tracing disabled - LOG_RETRIEVAL_TRACE=false
  • Cache enabled - RAG_CACHE_ENABLED=true

LLM Configuration

  • Azure OpenAI configured - All required variables set
  • Timeouts configured - LLM and streaming timeouts
  • Guardrails enabled - Hallucination blocking active
  • Fallback messages - User-friendly error messages

Email Service

  • Resend API key configured - RESEND_API_KEY set in production env
  • Domain verified in Resend - DKIM, SPF, and DMARC DNS records configured for sender domain
  • From email set - RESEND_FROM_EMAIL matches a verified Resend domain
  • Email flows tested - Password reset, email verification, workspace invitation, welcome emails all sending correctly

Sync & Workers

  • Worker processes running - All BullMQ workers active (assessment, questionnaire, monitoring)
  • Stale job recovery - STALE_JOB_TIMEOUT_HOURS configured
  • Monitoring worker scheduled - MONITORING_INTERVAL_HOURS=24 set; repeatable job registered at startup
  • Institution name configured - INSTITUTION_NAME set for RoI export RT.01.01 sheet

Monitoring Checklist

Logging

  • Structured logging - JSON format for aggregation
  • Log aggregation - Centralized logging (ELK, Datadog, etc.)
  • Error tracking - Sentry or similar service
  • Request tracing - Correlation IDs for debugging

Metrics

  • Health endpoint - /health returning status
  • API metrics - Response times, error rates
  • LLM metrics - Token usage, latencies
  • Queue metrics - Job processing rates

Alerting

  • Error rate alerts - High error rate notifications
  • Latency alerts - Slow response warnings
  • Disk/memory alerts - Resource exhaustion warnings

Performance Checklist

Optimization

  • Embedding concurrency - EMBEDDING_MAX_CONCURRENCY tuned
  • Chunk filtering - ENABLE_CHUNK_FILTER=true
  • Re-ranking enabled - ENABLE_CROSS_ENCODER_RERANK=true
  • Context expansion - ENABLE_CONTEXT_EXPANSION=true

Caching

  • RAG cache enabled - Response caching active
  • Re-rank cache enabled - RERANK_CACHE_TTL configured
  • CDN configured - Static asset caching

Database Indexes

Ensure these indexes exist:

// User
db.users.createIndex({ email: 1 }, { unique: true });

// Conversation
db.conversations.createIndex({ workspaceId: 1, userId: 1 });
db.conversations.createIndex({ updatedAt: -1 });

// Message
db.messages.createIndex({ conversationId: 1, createdAt: 1 });
db.messages.createIndex({ createdAt: 1 }, { expireAfterSeconds: 7776000 }); // 90 days

Deployment Checklist

Pre-Deployment

  • Dependencies updated - npm audit passed
  • Tests passing - All unit and integration tests
  • Build successful - Production build completes
  • Environment validated - All required variables set

Deployment

  • Blue-green or canary - Zero-downtime deployment
  • Database migrations - Schema updates applied
  • Cache cleared - Stale cache invalidated
  • Workers restarted - Background jobs processing

Post-Deployment

  • Health check passing - /health returns 200
  • Smoke tests passed - Critical paths verified
  • Logs reviewed - No unexpected errors
  • Metrics baseline - Performance within expectations

Backup & Recovery

Backup Strategy

  • MongoDB backups - Daily snapshots, 30-day retention
  • Redis backups - RDB snapshots
  • Qdrant backups - Collection snapshots
  • Secrets backup - Encrypted, off-site storage

Recovery Plan

  • RTO defined - Recovery Time Objective documented
  • RPO defined - Recovery Point Objective documented
  • Recovery tested - Quarterly restore tests
  • Runbooks created - Step-by-step recovery procedures

Documentation

  • Architecture documented - System design docs
  • Runbooks created - Operational procedures
  • API documented - Swagger/OpenAPI specs
  • Incident response - Escalation procedures

Compliance

  • GDPR compliance - Data processing documented
  • Data retention - TTL policies implemented (Analytics auto-delete after 90 days)
  • Audit trail - Access logging enabled
  • Privacy policy - User-facing documentation
  • DORA knowledge base seeded - npm run seed:compliance run in production environment
  • RoI export tested - GET /api/v1/workspaces/roi-export returns valid XLSX with 4 sheets
  • Monitoring alerts tested - Email delivery confirmed for at least one alert type