catonline_ai/vw-agentic-rag/docs/deployment.md
2025-09-26 17:15:54 +08:00

🚀 Deployment Guide

This guide covers deploying the Agentic RAG system in production environments, including Docker containerization, cloud deployment, and infrastructure requirements.

Production Architecture

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Load Balancer │    │   Application    │    │   Database      │
│   (nginx/ALB)   │◄──►│   Containers     │◄──►│   (PostgreSQL)  │
│                 │    │                  │    │                 │
└─────────────────┘    └──────────────────┘    └─────────────────┘
         │                       │                       │
         ▼                       ▼                       ▼
    SSL Termination         FastAPI + Next.js      Session Storage
    Domain Routing          Auto-scaling            Managed Service
    Rate Limiting          Health Monitoring        Backup & Recovery

Infrastructure Requirements

Minimum Requirements

  • CPU: 2 vCPU cores
  • Memory: 4 GB RAM
  • Storage: 20 GB SSD
  • Network: 1 Gbps bandwidth

Recommended Requirements

  • CPU: 4+ vCPU cores
  • Memory: 8+ GB RAM
  • Storage: 50+ GB SSD (with backup)
  • Network: 10+ Gbps bandwidth
  • Auto-scaling: 2-10 instances

Database Requirements

  • PostgreSQL 13+
  • Storage: 10+ GB (depends on retention policy)
  • Connections: 100+ concurrent connections
  • Backup: Daily automated backups
  • SSL: Required for production

Docker Deployment

1. Dockerfile for Backend

Create Dockerfile in the project root:

# Multi-stage build for Python backend
FROM python:3.12-slim AS backend-builder

# Install system dependencies
RUN apt-get update && apt-get install -y \
    build-essential \
    libpq-dev \
    && rm -rf /var/lib/apt/lists/*

# Install uv
RUN pip install uv

# Set working directory
WORKDIR /app

# Copy dependency files
COPY pyproject.toml uv.lock ./

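# Install dependencies only (--frozen uses uv.lock as-is; --no-install-project
# skips building the project itself, whose source is not copied into this stage)
RUN uv sync --frozen --no-dev --no-install-project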

# Production stage
FROM python:3.12-slim AS backend

# Install runtime dependencies
RUN apt-get update && apt-get install -y \
    libpq5 \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Create non-root user
RUN useradd --create-home --shell /bin/bash app

# Set working directory
WORKDIR /app

# Copy installed dependencies from builder
COPY --from=backend-builder /app/.venv /app/.venv

# Copy application code
COPY service/ service/
COPY config.yaml .
COPY scripts/ scripts/

# Set permissions
RUN chown -R app:app /app

# Switch to non-root user
USER app

# Add .venv to PATH
ENV PATH="/app/.venv/bin:$PATH"

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:8000/health || exit 1

# Expose port
EXPOSE 8000

# Start command
CMD ["uvicorn", "service.main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "4"]

2. Dockerfile for Frontend

Create web/Dockerfile:

# Frontend build stage
FROM node:18-alpine AS frontend-builder

WORKDIR /app

# Copy package files
COPY package*.json ./
COPY pnpm-lock.yaml ./

# Install dependencies
RUN npm install -g pnpm
RUN pnpm install --frozen-lockfile

# Copy source code
COPY . .

# Build application
RUN pnpm run build

# Production stage
FROM node:18-alpine AS frontend

WORKDIR /app

# Create non-root user
RUN addgroup -g 1001 -S nodejs
RUN adduser -S nextjs -u 1001

# Copy built application (.next/standalone exists only when next.config.js sets output: "standalone")
COPY --from=frontend-builder /app/public ./public
COPY --from=frontend-builder /app/.next/standalone ./
COPY --from=frontend-builder /app/.next/static ./.next/static

# Set permissions
RUN chown -R nextjs:nodejs /app

USER nextjs

EXPOSE 3000

ENV PORT=3000
ENV HOSTNAME="0.0.0.0"

CMD ["node", "server.js"]

3. Docker Compose for Local Production

Create docker-compose.prod.yml:

version: '3.8'

services:
  postgres:
    image: postgres:15-alpine
    environment:
      POSTGRES_DB: agent_memory
      POSTGRES_USER: ${POSTGRES_USER:-agent}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql
    ports:
      - "5432:5432"
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER:-agent}"]
      interval: 30s
      timeout: 10s
      retries: 5

  backend:
    build: 
      context: .
      dockerfile: Dockerfile
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - RETRIEVAL_API_KEY=${RETRIEVAL_API_KEY}
      - DATABASE_URL=postgresql://${POSTGRES_USER:-agent}:${POSTGRES_PASSWORD}@postgres:5432/agent_memory
    depends_on:
      postgres:
        condition: service_healthy
    ports:
      - "8000:8000"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3

  frontend:
    build:
      context: ./web
      dockerfile: Dockerfile
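    environment:
      # NEXT_PUBLIC_* variables are inlined into the client bundle at build time.
      # If the browser calls this URL directly, it must be publicly reachable
      # (for example https://yourdomain.com/api) rather than the internal hostname.
      - NEXT_PUBLIC_LANGGRAPH_API_URL=http://backend:8000/api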
    depends_on:
      - backend
    ports:
      - "3000:3000"

  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
      - ./ssl:/etc/nginx/ssl
    depends_on:
      - frontend
      - backend

volumes:
  postgres_data:

4. Environment Configuration

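Create .env.prod and pass it to Compose explicitly, since only a file named .env is loaded automatically (for example: docker compose --env-file .env.prod -f docker-compose.prod.yml up -d):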

# Database
POSTGRES_USER=agent
POSTGRES_PASSWORD=your-secure-password
DATABASE_URL=postgresql://agent:your-secure-password@postgres:5432/agent_memory

# LLM API
OPENAI_API_KEY=your-openai-key
AZURE_OPENAI_API_KEY=your-azure-key
RETRIEVAL_API_KEY=your-retrieval-key

# Application
LOG_LEVEL=INFO
CORS_ORIGINS=["https://yourdomain.com"]
MAX_TOOL_LOOPS=5
MEMORY_TTL_DAYS=7

# Next.js
NEXT_PUBLIC_LANGGRAPH_API_URL=https://yourdomain.com/api
NODE_ENV=production
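The compose file interpolates POSTGRES_PASSWORD directly into DATABASE_URL, so a password containing characters such as @, : or / would produce a URL that parses incorrectly. A minimal sketch of assembling the URL with percent-encoded credentials follows; the helper name build_database_url is hypothetical and not part of the project.

```python
from urllib.parse import quote


def build_database_url(user: str, password: str, host: str, port: int, database: str) -> str:
    """Assemble a postgresql:// URL, percent-encoding user, password and database."""
    # safe="" forces reserved characters such as "@", ":" and "/" to be encoded too.
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}"
    )


# Example call (hypothetical password):
# build_database_url("agent", "p@ss:word/1", "postgres", 5432, "agent_memory")
```

The resulting string can be pasted into .env.prod as the DATABASE_URL value.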

Cloud Deployment

Azure Container Instances

# Create resource group
az group create --name agentic-rag-rg --location eastus

# Create container registry (admin user enabled so that az acr credential show works)
az acr create --resource-group agentic-rag-rg \
  --name agenticragacr --sku Basic --admin-enabled true

# Build and push images
az acr build --registry agenticragacr \
  --image agentic-rag-backend:latest .

# Create PostgreSQL database
az postgres flexible-server create \
  --resource-group agentic-rag-rg \
  --name agentic-rag-db \
  --admin-user agentadmin \
  --admin-password 'YourSecurePassword123!' \
  --sku-name Standard_B1ms \
  --tier Burstable \
  --public-access 0.0.0.0 \
  --storage-size 32

# Deploy container instance
az container create \
  --resource-group agentic-rag-rg \
  --name agentic-rag-backend \
  --image agenticragacr.azurecr.io/agentic-rag-backend:latest \
  --registry-login-server agenticragacr.azurecr.io \
  --registry-username agenticragacr \
  --registry-password "$(az acr credential show --name agenticragacr --query "passwords[0].value" -o tsv)" \
  --dns-name-label agentic-rag-api \
  --ports 8000 \
  --secure-environment-variables \
    OPENAI_API_KEY="$OPENAI_API_KEY" \
    DATABASE_URL="$DATABASE_URL"

AWS ECS Deployment

{
  "family": "agentic-rag-backend",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "1024",
  "memory": "2048",
  "executionRoleArn": "arn:aws:iam::account:role/ecsTaskExecutionRole",
  "taskRoleArn": "arn:aws:iam::account:role/ecsTaskRole",
  "containerDefinitions": [
    {
      "name": "backend",
      "image": "your-account.dkr.ecr.region.amazonaws.com/agentic-rag-backend:latest",
      "portMappings": [
        {
          "containerPort": 8000,
          "protocol": "tcp"
        }
      ],
      "environment": [
        {
          "name": "DATABASE_URL",
          "value": "postgresql://user:pass@rds-endpoint:5432/dbname"
        }
      ],
      "secrets": [
        {
          "name": "OPENAI_API_KEY",
          "valueFrom": "arn:aws:secretsmanager:region:account:secret:openai-key"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/agentic-rag",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "backend"
        }
      },
      "healthCheck": {
        "command": ["CMD-SHELL", "curl -f http://localhost:8000/health || exit 1"],
        "interval": 30,
        "timeout": 10,
        "retries": 3,
        "startPeriod": 60
      }
    }
  ]
}

Load Balancer Configuration

Nginx Configuration

Create nginx.conf:

events {
    worker_connections 1024;
}

http {
    upstream backend {
        server backend:8000;
    }

    upstream frontend {
        server frontend:3000;
    }

    # Rate limiting
    limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;
    limit_req_zone $binary_remote_addr zone=chat:10m rate=5r/s;

    server {
        listen 80;
        server_name yourdomain.com;
        return 301 https://$server_name$request_uri;
    }

    server {
        listen 443 ssl http2;
        server_name yourdomain.com;

        ssl_certificate /etc/nginx/ssl/cert.pem;
        ssl_certificate_key /etc/nginx/ssl/key.pem;
        ssl_protocols TLSv1.2 TLSv1.3;
        ssl_ciphers HIGH:!aNULL:!MD5;

        # Frontend
        location / {
            proxy_pass http://frontend;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
        }

        # API endpoints
        location /api/ {
            limit_req zone=api burst=20 nodelay;
            
            proxy_pass http://backend;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
            
            # SSE specific settings
            proxy_buffering off;
            proxy_cache off;
            proxy_set_header Connection '';
            proxy_http_version 1.1;
            chunked_transfer_encoding off;
        }

        # Chat endpoint with stricter rate limiting
        location /api/chat {
            limit_req zone=chat burst=10 nodelay;
            
            proxy_pass http://backend;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
            
            # SSE specific settings
            proxy_buffering off;
            proxy_cache off;
            proxy_read_timeout 300s;
            proxy_set_header Connection '';
            proxy_http_version 1.1;
            chunked_transfer_encoding off;
        }
    }
}
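To reason about what rate=10r/s with burst=20 nodelay admits in practice, the accounting can be modeled in a few lines. This is a simplified sketch of the leaky-bucket bookkeeping that limit_req performs (nginx itself tracks time in milliseconds and keeps one state per client key); the class name LimitReqZone is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LimitReqZone:
    """Simplified model of one client's state in an nginx limit_req zone."""
    rate: float                    # requests per second, e.g. 10 for rate=10r/s
    burst: int                     # extra requests tolerated above the rate
    excess: float = 0.0            # requests currently "ahead" of the allowed rate
    last: Optional[float] = None   # time of the last accepted request

    def allow(self, now: float) -> bool:
        if self.last is None:
            # The first request from a client is always accepted.
            self.last = now
            return True
        # The excess drains at `rate` per second; each request adds one.
        candidate = max(0.0, self.excess - self.rate * (now - self.last)) + 1.0
        if candidate > self.burst:
            return False           # rejected; state is left unchanged
        self.excess = candidate
        self.last = now
        return True


def count_accepted(zone: LimitReqZone, now: float, requests: int) -> int:
    """Send `requests` simultaneous requests at time `now` and count acceptances."""
    return sum(zone.allow(now) for _ in range(requests))
```

Under this model, a client that sends 30 requests at once to /api/ gets 21 through (the first request plus the burst of 20), and one second later 10 more slots have drained.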

Monitoring and Observability

Health Checks

Configure comprehensive health checks:

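# Enhanced health check endpoint
from datetime import datetime, timezone

@app.get("/health/detailed")
async def detailed_health():
    health_status = {
        "status": "healthy",
        "service": "agentic-rag",
        "version": "0.8.0",
        # datetime.utcnow() is deprecated since Python 3.12; use an aware UTC timestamp
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "components": {}
    }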
    
    # Database connectivity
    try:
        memory_manager = get_memory_manager()
        db_healthy = memory_manager.test_connection()
        health_status["components"]["database"] = {
            "status": "healthy" if db_healthy else "unhealthy",
            "type": "postgresql"
        }
    except Exception as e:
        health_status["components"]["database"] = {
            "status": "unhealthy",
            "error": str(e)
        }
    
    # LLM API connectivity
    try:
        config = get_config()
        # Only confirms that the LLM configuration loads; no request is sent to the provider
        health_status["components"]["llm"] = {
            "status": "healthy",
            "provider": config.provider
        }
    except Exception as e:
        health_status["components"]["llm"] = {
            "status": "unhealthy",
            "error": str(e)
        }
    
    # Overall status
    all_healthy = all(
        comp.get("status") == "healthy" 
        for comp in health_status["components"].values()
    )
    health_status["status"] = "healthy" if all_healthy else "degraded"
    
    return health_status

Logging Configuration

# logging.yaml
version: 1
disable_existing_loggers: false

formatters:
  standard:
    format: '%(asctime)s [%(levelname)s] %(name)s: %(message)s'
  json:
    format: '{"timestamp": "%(asctime)s", "level": "%(levelname)s", "logger": "%(name)s", "message": "%(message)s", "module": "%(module)s", "function": "%(funcName)s", "line": %(lineno)d}'

handlers:
  console:
    class: logging.StreamHandler
    level: INFO
    formatter: standard
    stream: ext://sys.stdout
  
  file:
    class: logging.handlers.RotatingFileHandler
    level: INFO
    formatter: json
    filename: /app/logs/app.log  # directory must exist and be writable by the app user
    maxBytes: 10485760  # 10MB
    backupCount: 5

loggers:
  service:
    level: INFO
    handlers: [console, file]
    propagate: false
  
  uvicorn:
    level: INFO
    handlers: [console]
    propagate: false

root:
  level: INFO
  handlers: [console, file]

Metrics Collection

# metrics.py
import time

from fastapi import Request, Response
from prometheus_client import (
    CONTENT_TYPE_LATEST, Counter, Gauge, Histogram, generate_latest,
)

# NOTE: `app` is the existing FastAPI instance (for example, the one in service/main.py).

# Metrics
REQUEST_COUNT = Counter('http_requests_total', 'Total HTTP requests', ['method', 'endpoint'])
REQUEST_DURATION = Histogram('http_request_duration_seconds', 'HTTP request duration')
ACTIVE_SESSIONS = Gauge('active_sessions_total', 'Number of active chat sessions')
TOOL_CALLS = Counter('tool_calls_total', 'Total tool calls', ['tool_name', 'status'])

@app.middleware("http")
async def metrics_middleware(request: Request, call_next):
    # perf_counter is monotonic, so durations are unaffected by clock adjustments
    start_time = time.perf_counter()
    response = await call_next(request)
    duration = time.perf_counter() - start_time

    # Raw paths can create many label values; consider the route template instead
    REQUEST_COUNT.labels(
        method=request.method,
        endpoint=request.url.path
    ).inc()
    REQUEST_DURATION.observe(duration)

    return response

@app.get("/metrics")
async def get_metrics():
    # Use the content type advertised by the Prometheus client library
    return Response(generate_latest(), media_type=CONTENT_TYPE_LATEST)

Security Configuration

Environment Variables Security

# Use a secrets management service in production
export OPENAI_API_KEY="$(aws secretsmanager get-secret-value --secret-id openai-key --query SecretString --output text)"
export DATABASE_PASSWORD="$(az keyvault secret show --vault-name MyKeyVault --name db-password --query value -o tsv)"

Network Security

# docker-compose.prod.yml security additions
services:
  backend:
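    networks:
      - backend-network   # private link to postgres
      - default           # an internal-only network has no outbound route to the LLM APIs, and nginx must reach the backend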
    deploy:
      resources:
        limits:
          memory: 2G
          cpus: '1.0'
        reservations:
          memory: 1G
          cpus: '0.5'

  postgres:
    networks:
      - backend-network
    # Only accessible from backend, not exposed publicly
    # (this also requires removing the "5432:5432" ports mapping from the postgres service)

networks:
  backend-network:
    driver: bridge
    internal: true  # Internal network only

SSL/TLS Configuration

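# Generate SSL certificates with Let's Encrypt
# (certbot writes fullchain.pem and privkey.pem under /etc/letsencrypt/live/yourdomain.com/)
certbot certonly --webroot -w /var/www/html -d yourdomain.com

# Or use existing certificates
# (with docker-compose.prod.yml, place them in ./ssl instead, which is mounted at /etc/nginx/ssl)
cp /path/to/your/cert.pem /etc/nginx/ssl/
cp /path/to/your/key.pem /etc/nginx/ssl/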

Deployment Checklist

Pre-deployment

  • Environment Variables: All secrets configured in secure storage
  • Database: PostgreSQL instance created and accessible
  • SSL Certificates: Valid certificates for HTTPS
  • Resource Limits: CPU/memory limits configured
  • Backup Strategy: Database backup schedule configured

Deployment

  • Docker Images: Built and pushed to registry
  • Load Balancer: Configured with health checks
  • Database Migration: Schema initialized
  • Configuration: Production config.yaml deployed
  • Monitoring: Health checks and metrics collection active

Post-deployment

  • Health Check: All endpoints responding correctly
  • Load Testing: System performance under load verified
  • Log Monitoring: Error rates and performance logs reviewed
  • Security Scan: Vulnerability assessment completed
  • Backup Verification: Database backup/restore tested

Troubleshooting Production Issues

Common Deployment Issues

1. Database Connection Failures

# Check PostgreSQL connectivity
psql -h your-db-host -U username -d database_name -c "SELECT 1;"

# Verify connection string format
echo $DATABASE_URL
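Echoing DATABASE_URL prints the password to the terminal and does not by itself show whether the URL is well formed. A minimal sketch of a check that parses the URL and masks the password is shown below; the function name describe_database_url and the set of checks are assumptions for illustration, not part of the project.

```python
from urllib.parse import urlsplit


def describe_database_url(url: str) -> dict:
    """Split a database URL into its parts, masking the password."""
    parts = urlsplit(url)
    problems = []
    if parts.scheme not in ("postgresql", "postgres"):
        problems.append(f"unexpected scheme: {parts.scheme!r}")
    if not parts.hostname:
        problems.append("missing host")
    if not parts.path.lstrip("/"):
        problems.append("missing database name")
    return {
        "user": parts.username,
        "password": "***" if parts.password else None,  # never echo the secret
        "host": parts.hostname,
        "port": parts.port or 5432,                     # PostgreSQL default port
        "database": parts.path.lstrip("/"),
        "problems": problems,
    }


# Example: describe_database_url(os.environ["DATABASE_URL"]) after `import os`
```

An empty problems list only means the URL is structurally plausible; connectivity still has to be confirmed with psql as above.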

2. Container Health Check Failures

# Check container logs
docker logs container-name

# Test health endpoint manually
curl -f http://localhost:8000/health
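For scripted checks (for example in a deploy pipeline), the same probe can be repeated with a retry budget similar to the container health check settings. The sketch below uses only the standard library; the function names http_ok and wait_until_healthy are hypothetical, and the defaults mirror the 30s interval and 3 retries used in this guide.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 10.0) -> bool:
    """Return True when the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts and 4xx/5xx responses all count as unhealthy.
        return False


def wait_until_healthy(
    probe: Callable[[], bool],
    retries: int = 3,
    interval: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `probe` up to `retries` times, sleeping `interval` seconds between attempts."""
    for attempt in range(1, retries + 1):
        if probe():
            return True
        if attempt < retries:
            sleep(interval)
    return False


# Example: wait_until_healthy(lambda: http_ok("http://localhost:8000/health"))
```

Passing the probe and sleep functions as arguments keeps the retry logic testable without a running container.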

3. SSL Certificate Issues

# Verify certificate validity
openssl x509 -in /etc/nginx/ssl/cert.pem -text -noout

# Check certificate expiration
openssl x509 -in /etc/nginx/ssl/cert.pem -noout -dates
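The notAfter line printed by the -dates command can be turned into a number of remaining days, which is convenient for alerting before a certificate expires. This is a sketch using the standard library's ssl.cert_time_to_seconds; the function name days_until_expiry is hypothetical, and str.removeprefix requires Python 3.9 or newer.

```python
import ssl
import time
from typing import Optional


def days_until_expiry(not_after_line: str, now: Optional[float] = None) -> float:
    """Days left before the given certificate expiry time (negative if already expired)."""
    # Accept either the raw date or the "notAfter=..." line printed by openssl.
    cert_time = not_after_line.strip().removeprefix("notAfter=")
    expires_at = ssl.cert_time_to_seconds(cert_time)  # seconds since the epoch
    current = time.time() if now is None else now
    return (expires_at - current) / 86400


# Example: days_until_expiry("notAfter=Dec 25 12:00:00 2025 GMT")
```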

4. High Memory Usage

# Monitor memory usage
docker stats

# Inspect per-process memory inside the container (growth over time suggests a leak)
docker exec -it container-name top

Performance Optimization

# Production optimizations in config.yaml
app:
  memory_ttl_days: 3  # Shorter retention for stored conversation memory
  max_tool_loops: 3   # Limit computation

postgresql:
  pool_size: 20       # Connection pooling
  max_overflow: 0     # Hard cap: no connections beyond pool_size

llm:
  rag:
    max_context_length: 32000  # Reduce context window if needed
    temperature: 0.1           # More deterministic responses
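The pool_size value interacts with the worker and instance counts used elsewhere in this guide. As a rough sizing sketch, assuming each uvicorn worker process opens its own pool (an assumption about the application, not something confirmed here), the worst-case number of database connections can be estimated as follows. Both function names are hypothetical.

```python
def total_pool_connections(
    instances: int, workers_per_instance: int, pool_size: int, max_overflow: int = 0
) -> int:
    """Worst-case connections if every worker process fills its own pool."""
    return instances * workers_per_instance * (pool_size + max_overflow)


def max_pool_size_per_worker(
    connection_limit: int, instances: int, workers_per_instance: int
) -> int:
    """Largest per-worker pool that stays within the database connection limit."""
    return connection_limit // (instances * workers_per_instance)


# With 4 workers per container and pool_size 20:
#   2 instances  -> total_pool_connections(2, 4, 20)
#   10 instances -> total_pool_connections(10, 4, 20)
```

With the figures in this guide (4 workers, pool_size 20, 2 to 10 instances), the estimate ranges from 160 to 800 connections, which exceeds a 100-connection database limit, so either the pool size or the database limit would need adjusting under that assumption.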

This deployment guide covers the essential aspects of running the Agentic RAG system in production. For specific cloud providers or deployment scenarios not covered here, consult the provider's documentation and adapt these configurations accordingly.