# 🚀 Deployment Guide
This guide covers deploying the Agentic RAG system in production environments, including Docker containerization, cloud deployment, and infrastructure requirements.
## Production Architecture
```
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│  Load Balancer  │    │   Application    │    │    Database     │
│   (nginx/ALB)   │◄──►│    Containers    │◄──►│  (PostgreSQL)   │
│                 │    │                  │    │                 │
└─────────────────┘    └──────────────────┘    └─────────────────┘
         │                       │                       │
         ▼                       ▼                       ▼
  SSL Termination        FastAPI + Next.js        Session Storage
  Domain Routing         Auto-scaling             Managed Service
  Rate Limiting          Health Monitoring        Backup & Recovery
```
## Infrastructure Requirements
### Minimum Requirements
- **CPU**: 2 vCPU cores
- **Memory**: 4 GB RAM
- **Storage**: 20 GB SSD
- **Network**: 1 Gbps bandwidth
### Recommended Production
- **CPU**: 4+ vCPU cores
- **Memory**: 8+ GB RAM
- **Storage**: 50+ GB SSD (with backup)
- **Network**: 10+ Gbps bandwidth
- **Auto-scaling**: 2-10 instances
### Database Requirements
- **PostgreSQL 13+**
- **Storage**: 10+ GB (depends on retention policy)
- **Connections**: 100+ concurrent connections
- **Backup**: Daily automated backups
- **SSL**: Required for production
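
The SSL requirement can be checked at startup instead of being left to convention. The following is a minimal sketch, assuming the backend receives a libpq-style `DATABASE_URL` and that `sslmode=require` or stricter is the intended policy; the function name `require_ssl` is hypothetical and not part of the project.

```python
from urllib.parse import parse_qs, urlsplit

# sslmode values that force an encrypted connection (libpq semantics).
SECURE_SSLMODES = {"require", "verify-ca", "verify-full"}


def require_ssl(database_url: str) -> None:
    """Raise ValueError unless the URL requests an encrypted connection."""
    query = parse_qs(urlsplit(database_url).query)
    # libpq falls back to "prefer" when sslmode is not given, which still
    # allows an unencrypted connection if the server does not offer SSL.
    sslmode = query.get("sslmode", ["prefer"])[0]
    if sslmode not in SECURE_SSLMODES:
        raise ValueError(f"sslmode={sslmode!r} does not enforce SSL")
```

Calling `require_ssl(os.environ["DATABASE_URL"])` before the connection pool is created would make a misconfigured production URL fail fast.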
## Docker Deployment
### 1. Dockerfile for Backend
Create `Dockerfile` in the project root:
```dockerfile
# Multi-stage build for Python backend
FROM python:3.12-slim AS backend-builder
# Install system dependencies
RUN apt-get update && apt-get install -y \
    build-essential \
    libpq-dev \
    && rm -rf /var/lib/apt/lists/*
# Install uv
RUN pip install uv
# Set working directory
WORKDIR /app
# Copy dependency files
COPY pyproject.toml uv.lock ./
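# Install dependencies only. The application source is not copied into this stage,
# so --no-install-project skips building the project itself; --frozen uses uv.lock as-is.
RUN uv sync --frozen --no-dev --no-install-project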
# Production stage
FROM python:3.12-slim AS backend
# Install runtime dependencies
RUN apt-get update && apt-get install -y \
    libpq5 \
    curl \
    && rm -rf /var/lib/apt/lists/*
# Create non-root user
RUN useradd --create-home --shell /bin/bash app
# Set working directory
WORKDIR /app
# Copy installed dependencies from builder
COPY --from=backend-builder /app/.venv /app/.venv
# Copy application code
COPY service/ service/
COPY config.yaml .
COPY scripts/ scripts/
# Set permissions
RUN chown -R app:app /app
# Switch to non-root user
USER app
# Add .venv to PATH
ENV PATH="/app/.venv/bin:$PATH"
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:8000/health || exit 1
# Expose port
EXPOSE 8000
# Start command
CMD ["uvicorn", "service.main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "4"]
```
### 2. Dockerfile for Frontend
Create `web/Dockerfile`:
```dockerfile
# Frontend build stage
FROM node:18-alpine AS frontend-builder
WORKDIR /app
# Copy package files
COPY package*.json ./
COPY pnpm-lock.yaml ./
# Install dependencies
RUN npm install -g pnpm
RUN pnpm install --frozen-lockfile
# Copy source code
COPY . .
# Build application
RUN pnpm run build
# Production stage
FROM node:18-alpine AS frontend
WORKDIR /app
# Create non-root user
RUN addgroup -g 1001 -S nodejs
RUN adduser -S nextjs -u 1001
# Copy built application
COPY --from=frontend-builder /app/public ./public
COPY --from=frontend-builder /app/.next/standalone ./
COPY --from=frontend-builder /app/.next/static ./.next/static
# Set permissions
RUN chown -R nextjs:nodejs /app
USER nextjs
EXPOSE 3000
ENV PORT=3000
ENV HOSTNAME="0.0.0.0"
CMD ["node", "server.js"]
```
### 3. Docker Compose for Local Production
Create `docker-compose.prod.yml`:
```yaml
version: '3.8'

services:
  postgres:
    image: postgres:15-alpine
    environment:
      POSTGRES_DB: agent_memory
      POSTGRES_USER: ${POSTGRES_USER:-agent}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql
    ports:
      - "5432:5432"
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER:-agent}"]
      interval: 30s
      timeout: 10s
      retries: 5

  backend:
    build:
      context: .
      dockerfile: Dockerfile
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - RETRIEVAL_API_KEY=${RETRIEVAL_API_KEY}
      - DATABASE_URL=postgresql://${POSTGRES_USER:-agent}:${POSTGRES_PASSWORD}@postgres:5432/agent_memory
    depends_on:
      postgres:
        condition: service_healthy
    ports:
      - "8000:8000"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3

  frontend:
    build:
      context: ./web
      dockerfile: Dockerfile
    environment:
      # NEXT_PUBLIC_* values are inlined into the client bundle at build time.
      # If the browser calls this URL directly, use the public address from .env.prod instead.
      - NEXT_PUBLIC_LANGGRAPH_API_URL=http://backend:8000/api
    depends_on:
      - backend
    ports:
      - "3000:3000"

  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
      - ./ssl:/etc/nginx/ssl
    depends_on:
      - frontend
      - backend

volumes:
  postgres_data:
```
### 4. Environment Configuration
Create `.env.prod`:
```bash
# Database
POSTGRES_USER=agent
POSTGRES_PASSWORD=your-secure-password
DATABASE_URL=postgresql://agent:your-secure-password@postgres:5432/agent_memory
# LLM API
OPENAI_API_KEY=your-openai-key
AZURE_OPENAI_API_KEY=your-azure-key
RETRIEVAL_API_KEY=your-retrieval-key
# Application
LOG_LEVEL=INFO
CORS_ORIGINS=["https://yourdomain.com"]
MAX_TOOL_LOOPS=5
MEMORY_TTL_DAYS=7
# Next.js
NEXT_PUBLIC_LANGGRAPH_API_URL=https://yourdomain.com/api
NODE_ENV=production
```
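
The variables above are easier to trust when the application validates them once at startup. The sketch below assumes the backend reads them from the process environment under the same names; `load_settings` and `REQUIRED_VARS` are hypothetical names, and which variables count as required is an assumption.

```python
import json
import os

# Variables without which the service cannot do useful work (assumed set).
REQUIRED_VARS = ("POSTGRES_PASSWORD", "OPENAI_API_KEY", "RETRIEVAL_API_KEY")


def load_settings(env=os.environ) -> dict:
    """Validate required variables and parse the typed ones."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {
        # CORS_ORIGINS is written as a JSON array in .env.prod.
        "cors_origins": json.loads(env.get("CORS_ORIGINS", "[]")),
        "max_tool_loops": int(env.get("MAX_TOOL_LOOPS", "5")),
        "memory_ttl_days": int(env.get("MEMORY_TTL_DAYS", "7")),
        "log_level": env.get("LOG_LEVEL", "INFO"),
    }
```

Note that `CORS_ORIGINS` only parses as JSON if the value reaches the process unchanged, as it does with a Compose `env_file`. Sourcing the file in a shell strips the inner double quotes.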
## Cloud Deployment
### Azure Container Instances
```bash
# Create resource group
az group create --name agentic-rag-rg --location eastus
# Create container registry (admin user enabled so the credential lookup below works)
az acr create --resource-group agentic-rag-rg \
  --name agenticragacr --sku Basic --admin-enabled true
# Build and push images
az acr build --registry agenticragacr \
--image agentic-rag-backend:latest .
# Create PostgreSQL database
az postgres flexible-server create \
  --resource-group agentic-rag-rg \
  --name agentic-rag-db \
  --admin-user agentadmin \
  --admin-password 'YourSecurePassword123!' \
  --sku-name Standard_B1ms \
  --tier Burstable \
  --public-access 0.0.0.0 \
  --storage-size 32
# Deploy container instance
az container create \
  --resource-group agentic-rag-rg \
  --name agentic-rag-backend \
  --image agenticragacr.azurecr.io/agentic-rag-backend:latest \
  --registry-login-server agenticragacr.azurecr.io \
  --registry-username agenticragacr \
  --registry-password "$(az acr credential show --name agenticragacr --query "passwords[0].value" -o tsv)" \
  --dns-name-label agentic-rag-api \
  --ports 8000 \
  --environment-variables \
    OPENAI_API_KEY="$OPENAI_API_KEY" \
    DATABASE_URL="$DATABASE_URL"
```
### AWS ECS Deployment
```json
{
  "family": "agentic-rag-backend",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "1024",
  "memory": "2048",
  "executionRoleArn": "arn:aws:iam::account:role/ecsTaskExecutionRole",
  "taskRoleArn": "arn:aws:iam::account:role/ecsTaskRole",
  "containerDefinitions": [
    {
      "name": "backend",
      "image": "your-account.dkr.ecr.region.amazonaws.com/agentic-rag-backend:latest",
      "portMappings": [
        {
          "containerPort": 8000,
          "protocol": "tcp"
        }
      ],
      "environment": [
        {
          "name": "DATABASE_URL",
          "value": "postgresql://user:pass@rds-endpoint:5432/dbname"
        }
      ],
      "secrets": [
        {
          "name": "OPENAI_API_KEY",
          "valueFrom": "arn:aws:secretsmanager:region:account:secret:openai-key"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/agentic-rag",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "backend"
        }
      },
      "healthCheck": {
        "command": ["CMD-SHELL", "curl -f http://localhost:8000/health || exit 1"],
        "interval": 30,
        "timeout": 10,
        "retries": 3,
        "startPeriod": 60
      }
    }
  ]
}
```
## Load Balancer Configuration
### Nginx Configuration
Create `nginx.conf`:
```nginx
events {
    worker_connections 1024;
}

http {
    upstream backend {
        server backend:8000;
    }

    upstream frontend {
        server frontend:3000;
    }

    # Rate limiting
    limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;
    limit_req_zone $binary_remote_addr zone=chat:10m rate=5r/s;

    server {
        listen 80;
        server_name yourdomain.com;
        return 301 https://$server_name$request_uri;
    }

    server {
        listen 443 ssl http2;
        server_name yourdomain.com;

        ssl_certificate /etc/nginx/ssl/cert.pem;
        ssl_certificate_key /etc/nginx/ssl/key.pem;
        ssl_protocols TLSv1.2 TLSv1.3;
        ssl_ciphers HIGH:!aNULL:!MD5;

        # Frontend
        location / {
            proxy_pass http://frontend;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
        }

        # API endpoints
        location /api/ {
            limit_req zone=api burst=20 nodelay;
            proxy_pass http://backend;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;

            # SSE specific settings
            proxy_buffering off;
            proxy_cache off;
            proxy_set_header Connection '';
            proxy_http_version 1.1;
            chunked_transfer_encoding off;
        }

        # Chat endpoint with stricter rate limiting
        location /api/chat {
            limit_req zone=chat burst=10 nodelay;
            proxy_pass http://backend;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;

            # SSE specific settings
            proxy_buffering off;
            proxy_cache off;
            proxy_read_timeout 300s;
            proxy_set_header Connection '';
            proxy_http_version 1.1;
            chunked_transfer_encoding off;
        }
    }
}
```
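
To get a feel for what `rate=10r/s` with `burst=20 nodelay` means for one client, the accounting behind `limit_req` can be approximated in a few lines. This is a simplified model for reasoning about the settings above, not nginx's implementation: nginx tracks the same quantity per key in fixed-point millisecond units. The function name `simulate_limit_req` is hypothetical.

```python
def simulate_limit_req(arrivals, rate, burst):
    """Return an accept/reject decision for each arrival time (in seconds).

    "excess" counts requests above the sustained rate. It drains at `rate`
    per second, and a request is rejected when it would push excess past burst.
    """
    decisions = []
    excess = 0.0
    last = None
    for t in arrivals:
        if last is None:
            candidate = 0.0  # first request seen from this client
        else:
            candidate = max(0.0, excess - rate * (t - last) + 1.0)
        if candidate > burst:
            decisions.append(False)  # nginx answers 503 by default
        else:
            decisions.append(True)
            excess, last = candidate, t
    return decisions


# 30 requests arriving at the same instant against the /api/ zone settings.
spike = simulate_limit_req([0.0] * 30, rate=10, burst=20)
print(sum(spike), "accepted,", spike.count(False), "rejected")
```

Under this model a client can send one request plus a burst of 20 at once; further requests are rejected until the excess drains at the configured rate.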
## Monitoring and Observability
### Health Checks
Configure comprehensive health checks:
```python
# Enhanced health check endpoint
# Assumes `app`, `get_memory_manager`, and `get_config` are already defined in this module.
from datetime import datetime, timezone


@app.get("/health/detailed")
async def detailed_health():
    health_status = {
        "status": "healthy",
        "service": "agentic-rag",
        "version": "0.8.0",
        # Timezone-aware UTC timestamp (datetime.utcnow() is deprecated in Python 3.12)
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "components": {}
    }

    # Database connectivity
    try:
        memory_manager = get_memory_manager()
        db_healthy = memory_manager.test_connection()
        health_status["components"]["database"] = {
            "status": "healthy" if db_healthy else "unhealthy",
            "type": "postgresql"
        }
    except Exception as e:
        health_status["components"]["database"] = {
            "status": "unhealthy",
            "error": str(e)
        }

    # LLM API connectivity
    try:
        config = get_config()
        # Test LLM connection
        health_status["components"]["llm"] = {
            "status": "healthy",
            "provider": config.provider
        }
    except Exception as e:
        health_status["components"]["llm"] = {
            "status": "unhealthy",
            "error": str(e)
        }

    # Overall status: "degraded" if any component is not healthy
    all_healthy = all(
        comp.get("status") == "healthy"
        for comp in health_status["components"].values()
    )
    health_status["status"] = "healthy" if all_healthy else "degraded"
    return health_status
```
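
A deployment script or external monitor can consume this payload to decide whether a rollout succeeded. The sketch below assumes the response shape shown above (`status` plus a `components` mapping); `unhealthy_components` and `fetch_health` are hypothetical helper names, and the URL is whatever address the backend is reachable at.

```python
import json
from urllib.request import urlopen


def unhealthy_components(payload: dict) -> list:
    """Return the sorted names of components whose status is not 'healthy'."""
    components = payload.get("components", {})
    return sorted(
        name for name, comp in components.items()
        if comp.get("status") != "healthy"
    )


def fetch_health(url: str, timeout: float = 10.0) -> dict:
    """Fetch and decode the detailed health payload from the given URL."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

For example, `unhealthy_components(fetch_health("http://localhost:8000/health/detailed"))` returns an empty list when every component reports healthy.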
### Logging Configuration
```yaml
# logging.yaml
version: 1
disable_existing_loggers: false

formatters:
  standard:
    format: '%(asctime)s [%(levelname)s] %(name)s: %(message)s'
  json:
    format: '{"timestamp": "%(asctime)s", "level": "%(levelname)s", "logger": "%(name)s", "message": "%(message)s", "module": "%(module)s", "function": "%(funcName)s", "line": %(lineno)d}'

handlers:
  console:
    class: logging.StreamHandler
    level: INFO
    formatter: standard
    stream: ext://sys.stdout
  file:
    class: logging.handlers.RotatingFileHandler
    level: INFO
    formatter: json
    filename: /app/logs/app.log
    maxBytes: 10485760 # 10MB
    backupCount: 5

loggers:
  service:
    level: INFO
    handlers: [console, file]
    propagate: false
  uvicorn:
    level: INFO
    handlers: [console]
    propagate: false

root:
  level: INFO
  handlers: [console, file]
```
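
The `json` formatter above builds JSON with a format string, so a log message that itself contains double quotes or newlines produces a line that no longer parses. One way around this is a small formatter class that serializes each record with `json.dumps`. This is a sketch under the assumption that the same fields are wanted; the class name `JsonFormatter` is hypothetical.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Serialize each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "module": record.module,
            "function": record.funcName,
            "line": record.lineno,
        }
        if record.exc_info:
            # Include the traceback as a single escaped string.
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)
```

In a `dictConfig`-style YAML file, a custom formatter like this can be referenced through the special `()` key with the class's import path in place of the `format` entry.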
### Metrics Collection
```python
# metrics.py
# Assumes `app` is the FastAPI application instance defined elsewhere in the service.
import time

from fastapi import Request, Response
from prometheus_client import CONTENT_TYPE_LATEST, Counter, Histogram, Gauge, generate_latest

# Metrics
REQUEST_COUNT = Counter('http_requests_total', 'Total HTTP requests', ['method', 'endpoint'])
REQUEST_DURATION = Histogram('http_request_duration_seconds', 'HTTP request duration')
ACTIVE_SESSIONS = Gauge('active_sessions_total', 'Number of active chat sessions')
TOOL_CALLS = Counter('tool_calls_total', 'Total tool calls', ['tool_name', 'status'])


@app.middleware("http")
async def metrics_middleware(request: Request, call_next):
    # Time every request and count it by method and path
    start_time = time.time()
    response = await call_next(request)
    duration = time.time() - start_time
    REQUEST_COUNT.labels(
        method=request.method,
        endpoint=request.url.path
    ).inc()
    REQUEST_DURATION.observe(duration)
    return response


@app.get("/metrics")
async def get_metrics():
    # Serve metrics in the Prometheus text exposition format
    return Response(generate_latest(), media_type=CONTENT_TYPE_LATEST)
```
## Security Configuration
### Environment Variables Security
```bash
# Use a secrets management service in production
export OPENAI_API_KEY=$(aws secretsmanager get-secret-value --secret-id openai-key --query SecretString --output text)
export DATABASE_PASSWORD=$(az keyvault secret show --vault-name MyKeyVault --name db-password --query value -o tsv)
```
### Network Security
```yaml
# docker-compose.prod.yml security additions
services:
  backend:
    networks:
      - backend-network
    deploy:
      resources:
        limits:
          memory: 2G
          cpus: '1.0'
        reservations:
          memory: 1G
          cpus: '0.5'

  postgres:
    networks:
      - backend-network
    # Only accessible from backend, not exposed publicly

networks:
  backend-network:
    driver: bridge
    internal: true # Internal network only
    # Note: a service attached only to an internal network has no outbound access.
    # Attach the backend to a second, non-internal network if it must reach external LLM APIs.
```
### SSL/TLS Configuration
```bash
# Generate SSL certificates with Let's Encrypt
certbot certonly --webroot -w /var/www/html -d yourdomain.com
# Or use existing certificates
cp /path/to/your/cert.pem /etc/nginx/ssl/
cp /path/to/your/key.pem /etc/nginx/ssl/
```
## Deployment Checklist
### Pre-deployment
- [ ] **Environment Variables**: All secrets configured in secure storage
- [ ] **Database**: PostgreSQL instance created and accessible
- [ ] **SSL Certificates**: Valid certificates for HTTPS
- [ ] **Resource Limits**: CPU/memory limits configured
- [ ] **Backup Strategy**: Database backup schedule configured
### Deployment
- [ ] **Docker Images**: Built and pushed to registry
- [ ] **Load Balancer**: Configured with health checks
- [ ] **Database Migration**: Schema initialized
- [ ] **Configuration**: Production config.yaml deployed
- [ ] **Monitoring**: Health checks and metrics collection active
### Post-deployment
- [ ] **Health Check**: All endpoints responding correctly
- [ ] **Load Testing**: System performance under load verified
- [ ] **Log Monitoring**: Error rates and performance logs reviewed
- [ ] **Security Scan**: Vulnerability assessment completed
- [ ] **Backup Verification**: Database backup/restore tested
## Troubleshooting Production Issues
### Common Deployment Issues
**1. Database Connection Failures**
```bash
# Check PostgreSQL connectivity
psql -h your-db-host -U username -d database_name -c "SELECT 1;"
# Verify connection string format
echo $DATABASE_URL
```
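
Printing `DATABASE_URL` as above also prints the password, which is easy to leak into shared terminals or log files. A small helper can mask it before the URL is displayed. This is a sketch using only the standard library; `redact_url` is a hypothetical name.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_url(url: str) -> str:
    """Return the URL with any password replaced by '***'."""
    parts = urlsplit(url)
    if parts.password is None:
        return url  # nothing to hide
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    netloc = f"{parts.username}:***@{host}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

The redacted form still shows the scheme, user, host, port, and database name, which is usually what needs checking when a connection string is malformed.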
**2. Container Health Check Failures**
```bash
# Check container logs
docker logs container-name
# Test health endpoint manually
curl -f http://localhost:8000/health
```
**3. SSL Certificate Issues**
```bash
# Verify certificate validity
openssl x509 -in /etc/nginx/ssl/cert.pem -text -noout
# Check certificate expiration
openssl x509 -in /etc/nginx/ssl/cert.pem -noout -dates
```
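
The expiration check can be turned into a number that a cron job or monitor can alert on. The sketch below assumes its input is the `notAfter=` line printed by `openssl x509 -noout -dates`; `days_until_expiry` is a hypothetical helper name, and the parsing relies on the standard library's `ssl.cert_time_to_seconds`.

```python
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before the certificate expires (negative if already expired).

    Accepts either 'notAfter=Jan  5 09:34:43 2018 GMT' or just the date part.
    """
    date_text = not_after.split("=", 1)[-1].strip()
    expires = ssl.cert_time_to_seconds(date_text)
    current = time.time() if now is None else now
    return (expires - current) / 86400
```

A renewal job could, for instance, trigger whenever the returned value drops below 30.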
**4. High Memory Usage**
```bash
# Monitor memory usage
docker stats
# Check for memory leaks
docker exec -it container-name top
```
### Performance Optimization
```yaml
# Production optimizations in config.yaml
app:
  memory_ttl_days: 3 # Reduce memory usage
  max_tool_loops: 3 # Limit computation

postgresql:
  pool_size: 20 # Connection pooling
  max_overflow: 0 # Never open connections beyond pool_size

llm:
  rag:
    max_context_length: 32000 # Reduce context window if needed
    temperature: 0.1 # More deterministic responses
```
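
Pool settings interact with the worker and instance counts used elsewhere in this guide. As a rough worked example, assuming every uvicorn worker process keeps its own pool (an assumption about the service, not something the configuration guarantees), the worst-case number of PostgreSQL connections can be estimated as follows. Both function names are hypothetical.

```python
def max_db_connections(instances: int, workers: int,
                       pool_size: int, max_overflow: int = 0) -> int:
    """Worst-case connections if each worker process holds a full pool."""
    return instances * workers * (pool_size + max_overflow)


def largest_pool_size(connection_limit: int, instances: int, workers: int) -> int:
    """Largest per-worker pool that stays within the database's connection limit."""
    return connection_limit // (instances * workers)


# 4 workers per container (see the backend Dockerfile), pool_size 20.
print(max_db_connections(instances=2, workers=4, pool_size=20))   # 160
print(max_db_connections(instances=10, workers=4, pool_size=20))  # 800
print(largest_pool_size(100, instances=10, workers=4))            # 2
```

Under that assumption, even two instances can exceed a 100-connection limit, so the database's `max_connections`, the pool size, and the auto-scaling ceiling are best sized together.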
---
This deployment guide covers the essential aspects of running the Agentic RAG system in production. For specific cloud providers or deployment scenarios not covered here, consult the provider's documentation and adapt these configurations accordingly.