Agentic RAG for Manufacturing Standards & Regulations

An advanced Agentic RAG (Retrieval-Augmented Generation) application that helps enterprises answer questions about manufacturing standards and regulations. The system combines LangGraph orchestration, streaming responses, and authoritative document retrieval to provide grounded answers with proper citations.

Overview

This project provides a complete AI-powered assistant solution for manufacturing standards and regulatory compliance queries. It features an autonomous agent workflow that can retrieve relevant information from multiple sources, synthesize comprehensive answers, and provide proper citations in real-time streaming responses.

The system consists of a FastAPI backend powered by LangGraph for agent orchestration, PostgreSQL for persistent session memory, and a modern Next.js frontend using assistant-ui components for an optimal user experience.

Features

Core Capabilities

  • 🤖 Multi-Intent Agentic Workflow: LangGraph v0.6-powered system with intelligent intent recognition and routing
  • 🧠 Dual Agent System: Specialized agents for standards/regulations and user manual queries
  • 📡 Real-time Streaming: Server-Sent Events (SSE) with token-by-token streaming and live tool execution updates
  • 🔍 Advanced Retrieval System: Two-phase search strategy with metadata and content chunk retrieval
  • 📚 Smart Citation Management: Automatic superscript citations [1] with dynamic source document mapping
  • 💾 Persistent Memory: PostgreSQL-based session storage with 7-day TTL and intelligent conversation trimming
  • 🎨 Modern Web UI: Next.js + assistant-ui components with responsive design and multi-language support

Intelligence Features

  • 🎯 Intent Classification: Automatic routing between different knowledge domains (standards vs. user manuals)
  • 🔄 Multi-Round Tool Execution: Autonomous multi-step reasoning with parallel tool execution
  • 🔗 Context-Aware Retrieval: Query rewriting and enhancement based on conversation history
  • 📊 Tool Progress Tracking: Real-time visual feedback for ongoing retrieval operations
  • 🌍 Multi-Language Support: Browser language detection with URL parameter override

Technical Features

  • 🔌 AI SDK Compatibility: Full support for AI SDK Data Stream Protocol and assistant-ui integration
  • 🌐 Framework Agnostic: RESTful API design compatible with any frontend framework
  • 🔒 Production Ready: Structured logging, comprehensive error handling, CORS support
  • 🧪 Comprehensive Testing: Unit tests, integration tests, and streaming response validation
  • 🚀 Easy Deployment: Docker support, environment-based configuration, health monitoring
  • Performance Optimized: Efficient PostgreSQL connection pooling and memory management

🏗️ Architecture

System Architecture

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Next.js Web   │    │   FastAPI        │    │  PostgreSQL     │
│ (assistant-ui)  │◄──►│   + LangGraph    │◄──►│  Session Store  │
│                 │    │   Backend        │    │                 │
└─────────────────┘    └──────────────────┘    └─────────────────┘
       │                        │                        │
       ▼                        ▼                        ▼
   User Interface         AI Agent Workflow         Persistent Memory
 - Thread Component     - Intent Recognition       - Conversation History
 - Tool UI Display      - Dual Agent System        - 7-day TTL
 - Streaming Updates    - Tool Orchestration       - Session Management
 - Citation Links       - Citation Generation      - Connection Pooling

Multi-Intent Agent Workflow

[User Query] → [Intent Recognition] → [Route Decision]
                        │                    │
                        ▼                    ▼
            [Standards/Regulation RAG]  [User Manual RAG]
                        │                    │
                        ▼                    ▼
         [Multi-Phase Retrieval]     [Manual Content Search]
                        │                    │
                        ▼                    ▼
              [Citation Generation]  [Direct Answer]
                        │                    │
                        └─────► [Post Process] ◄─────┘
                                     │
                                     ▼
                             [Streaming Response]

Enhanced Agent Workflow

The workflow is built from four nodes, each with a single responsibility:

  1. Intent Recognition Node: Classifies user queries into appropriate domains
  2. Standard/Regulation RAG Agent: Handles compliance and standards queries with two-phase retrieval
  3. User Manual RAG Agent: Processes system usage and documentation queries
  4. Post Processing Node: Formats final outputs with citations and tool summaries
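
The four nodes above can be sketched in plain Python as a classify, dispatch, post-process pipeline. This is a minimal illustration, not the project's LangGraph graph: the keyword-based classify_intent is a hypothetical stand-in for the LLM classifier, and the agent and post-processing functions are placeholders. Only the two intent labels come from this README.

```python
from typing import Callable, Dict

# Intent labels as named in the Multi-Intent System section of this README.
STANDARD_REGULATION = "Standard_Regulation_RAG"
USER_MANUAL = "User_Manual_RAG"


def classify_intent(query: str) -> str:
    """Hypothetical keyword heuristic standing in for the LLM classifier."""
    manual_hints = ("catonline", "user manual", "search features", "search results")
    lowered = query.lower()
    if any(hint in lowered for hint in manual_hints):
        return USER_MANUAL
    return STANDARD_REGULATION


def standards_agent(query: str) -> str:
    # Placeholder for two-phase retrieval plus citation generation.
    return f"[standards] {query}"


def user_manual_agent(query: str) -> str:
    # Placeholder for the documentation search path.
    return f"[manual] {query}"


def post_process(answer: str) -> str:
    # Placeholder for citation formatting and tool summaries.
    return answer.strip()


# Routing table: one agent per intent label.
AGENTS: Dict[str, Callable[[str], str]] = {
    STANDARD_REGULATION: standards_agent,
    USER_MANUAL: user_manual_agent,
}


def run_workflow(query: str) -> str:
    intent = classify_intent(query)      # 1. intent recognition
    answer = AGENTS[intent](query)       # 2./3. route to the matching agent
    return post_process(answer)          # 4. post processing
```

Keeping the routing table separate from the classifier means a new knowledge domain only needs a new label and a new entry in AGENTS.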

Configuration Management

  • Dual Configuration:
    • config.yaml: Core application settings (database, API, logging, retrieval endpoints)
    • llm_prompt.yaml: LLM parameters and specialized prompt templates for each agent
  • Environment Variables: Sensitive settings loaded from environment with fallback defaults
  • Type Safety: Pydantic models for configuration validation and runtime checks
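
As an illustration of the environment-variable and validation points above, the sketch below expands ${VAR} placeholders and checks a few settings at construction time. It uses only the standard library: the dataclass is a stand-in for the project's Pydantic models, the ${VAR:-default} fallback syntax is an assumption, and expand_env and AppSettings are hypothetical names.

```python
import os
import re
from dataclasses import dataclass

# Matches ${VAR} and, as an assumed extension, ${VAR:-default}.
_PLACEHOLDER = re.compile(r"\$\{([A-Z0-9_]+)(?::-([^}]*))?\}")


def expand_env(value: str, env=None) -> str:
    """Replace placeholders with environment values, or the inline default."""
    env = os.environ if env is None else env

    def _sub(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            return default
        raise KeyError(f"environment variable {name} is not set")

    return _PLACEHOLDER.sub(_sub, value)


@dataclass(frozen=True)
class AppSettings:
    """Mirrors the app section of the example config.yaml."""
    name: str
    port: int
    max_tool_rounds: int
    memory_ttl_days: int

    def __post_init__(self) -> None:
        # Fail fast on values that cannot work at runtime.
        if not 1 <= self.port <= 65535:
            raise ValueError(f"port out of range: {self.port}")
        if self.max_tool_rounds < 1:
            raise ValueError("max_tool_rounds must be at least 1")
        if self.memory_ttl_days < 1:
            raise ValueError("memory_ttl_days must be at least 1")
```

Raising on a missing variable with no default surfaces a forgotten secret at startup rather than at the first LLM call.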

Tool System Architecture

  • Modular Design: Tool definitions in service/graph/tools.py and service/graph/user_manual_tools.py
  • Parallel Execution: Multiple tools execute concurrently via asyncio.gather for optimal performance
  • Schema Generation: Automatic tool schema generation for LLM function calling
  • Error Handling: Robust error handling with detailed logging and graceful degradation
  • Context Injection: Tools receive conversation context for enhanced query understanding
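
The parallel-execution and error-handling points above can be sketched with asyncio.gather. This is a minimal illustration under stated assumptions: the shape of a tool call (id, name, args) follows the tool_start event in the API reference, while run_tools_in_parallel, the registry, and the two demo tools are hypothetical.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

ToolFn = Callable[..., Awaitable[Any]]


async def run_tools_in_parallel(
    calls: List[Dict[str, Any]], registry: Dict[str, ToolFn]
) -> List[Dict[str, Any]]:
    """Run every requested tool call concurrently and keep going on failure."""

    async def _run_one(call: Dict[str, Any]) -> Dict[str, Any]:
        try:
            tool = registry[call["name"]]
            result = await tool(**call["args"])
            return {"id": call["id"], "name": call["name"], "results": result}
        except Exception as exc:  # degrade gracefully: report, do not raise
            return {"id": call["id"], "name": call["name"], "error": str(exc)}

    # gather preserves the order of its inputs in the returned list.
    return await asyncio.gather(*(_run_one(c) for c in calls))


# Hypothetical tools used only for this demonstration.
async def fake_metadata_search(query: str) -> List[str]:
    await asyncio.sleep(0.01)
    return [f"metadata hit for {query!r}"]


async def failing_search(query: str) -> List[str]:
    raise RuntimeError("retrieval service unavailable")


def demo() -> List[Dict[str, Any]]:
    registry = {
        "retrieve_standard_regulation": fake_metadata_search,
        "retrieve_doc_chunk_standard_regulation": failing_search,
    }
    calls = [
        {"id": "tool_1", "name": "retrieve_standard_regulation",
         "args": {"query": "EV battery"}},
        {"id": "tool_2", "name": "retrieve_doc_chunk_standard_regulation",
         "args": {"query": "EV battery"}},
    ]
    return asyncio.run(run_tools_in_parallel(calls, registry))
```

Catching exceptions inside each task, rather than letting gather raise, means one failed retrieval still leaves the other results available to the agent.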

Key Components

  • 🎯 Intent Recognition Node: Intelligent classification of user queries into appropriate knowledge domains
  • 🤖 Standards/Regulation Agent: Autonomous agent with two-phase retrieval strategy and citation generation
  • 📖 User Manual Agent: Specialized agent for system documentation and usage guidance queries
  • 🔧 Advanced Retrieval Tools: HTTP wrappers for multiple search APIs with conversation context injection
  • 📝 Post Processing Node: Formats final outputs with citations, tool summaries, and system disclaimers
  • 💽 PostgreSQL Memory: Persistent session storage with connection pooling and automatic cleanup
  • 📊 Streaming Response: AI SDK compatible SSE events with comprehensive tool progress tracking
  • 🌍 Multi-Language UI: Browser language detection with URL parameter override and localized content
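
The memory component stores full conversation history, so the history has to be trimmed before it is sent to the LLM. A rough sketch of one possible trimming policy follows: keep system messages, then keep the newest turns that fit a token budget. The four-characters-per-token estimate and the function names are assumptions, and the project's message_trimmer.py may work differently.

```python
from typing import Dict, List

Message = Dict[str, str]


def estimate_tokens(text: str) -> int:
    """Rough heuristic (about four characters per token); not a real tokenizer."""
    return max(1, len(text) // 4)


def trim_messages(messages: List[Message], max_tokens: int) -> List[Message]:
    """Keep system messages plus the newest turns that fit in the budget."""
    system = [m for m in messages if m["role"] == "system"]
    others = [m for m in messages if m["role"] != "system"]

    budget = max_tokens - sum(estimate_tokens(m["content"]) for m in system)
    kept: List[Message] = []
    for message in reversed(others):  # walk from newest to oldest
        cost = estimate_tokens(message["content"])
        if cost > budget:
            break  # stop at the first turn that does not fit
        kept.append(message)
        budget -= cost

    # Restore chronological order for the turns that survived.
    return system + list(reversed(kept))
```

In practice the budget would come from a setting such as max_context_length in llm_prompt.yaml, minus headroom for the model's reply.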

📁 Codebase Structure

agentic-rag-4/
├── 📋 config.yaml              # Main application configuration
├── 🎯 llm_prompt.yaml          # LLM parameters and prompt templates
├── 🐍 pyproject.toml           # Python dependencies and project metadata
├── ⚙️ Makefile                 # Build automation and development commands
└── 📜 scripts/                 # Service management scripts
    ├── start_service.sh        # Service startup script
    ├── stop_service.sh         # Service shutdown script
    └── port_manager.sh         # Port management utilities

Backend (Python/FastAPI/LangGraph):
├── 🔧 service/                 # Main backend service
    ├── main.py                 # FastAPI application entry point
    ├── config.py               # Configuration management
    ├── ai_sdk_chat.py          # AI SDK compatible chat endpoint
    ├── ai_sdk_adapter.py       # Data Stream Protocol adapter
    ├── llm_client.py           # LLM provider abstractions
    ├── sse.py                  # Server-Sent Events utilities
    ├── 🧠 graph/               # LangGraph agent workflow
    │   ├── graph.py            # Multi-intent agent workflow definition
    │   ├── state.py            # Agent state management
    │   ├── intent_recognition.py # Query intent classification
    │   ├── tools.py            # Standard/regulation retrieval tools
    │   ├── user_manual_rag.py  # User manual agent workflow
    │   ├── user_manual_tools.py # User manual retrieval tools
    │   └── message_trimmer.py  # Conversation context management
    ├── 💾 memory/              # Session memory implementations
    │   ├── postgresql_memory.py # PostgreSQL session persistence
    │   └── store.py            # Memory store abstractions
    ├── 🔍 retrieval/           # Information retrieval tools
    │   └── agentic_retrieval.py # Enhanced search tools with context
    ├── 📋 schemas/             # Data models and validation
    │   └── messages.py         # Chat message schemas
    └── 🛠️ utils/               # Shared utilities
        ├── logging.py          # Structured logging
        ├── templates.py        # Prompt templates
        └── error_handler.py    # Error handling utilities

Frontend (Next.js/React/assistant-ui):
├── 🌐 web/                     # Next.js web application
    ├── src/app/                # App router structure
    │   ├── page.tsx            # Main chat interface with multi-language support
    │   ├── layout.tsx          # Application layout and metadata
    │   ├── globals.css         # Global styles + assistant-ui theming
    │   └── api/                # API routes (Server-side)
    │       ├── chat/route.ts   # Chat API proxy to backend
    │       └── langgraph/      # LangGraph API proxy for assistant-ui
    ├── public/                 # Static assets
    │   ├── legal-document.png  # Standard/regulation tool icon
    │   ├── search.png          # Content search tool icon
    │   └── user-guide.png      # User manual tool icon
    ├── package.json            # Frontend dependencies
    ├── tailwind.config.ts      # Tailwind + assistant-ui configuration
    └── next.config.ts          # Next.js configuration

Testing & Documentation:
├── 🧪 tests/                   # Test suite
    ├── unit/                   # Unit tests
    └── integration/            # Integration and E2E tests
└── 📚 docs/                    # Documentation
    ├── CHANGELOG.md            # Version history and changes
    ├── deployment.md           # Deployment guide
    ├── development.md          # Development setup
    └── testing.md              # Testing guide

🚀 Quick Start

Prerequisites

  • Python 3.12+ - Required for backend service
  • Node.js 18+ - Required for frontend development
  • uv - Rust-based Python package manager
  • npm/pnpm - Node.js package manager
  • PostgreSQL - Database for session persistence (Azure Database for PostgreSQL recommended)
  • LLM API Access - OpenAI API key or Azure OpenAI credentials
  • Retrieval API Access - Access to the manufacturing standards retrieval service

1. Installation

# Clone the repository
git clone <repository-url>
cd agentic-rag-4

# Install all dependencies (backend + frontend)
make install

# Alternative: Install manually
uv sync              # Backend dependencies
cd web && npm install # Frontend dependencies

2. Configuration

The application uses two main configuration files:

# Copy and edit configuration files
cp config.yaml config.local.yaml          # Main app configuration
cp llm_prompt.yaml llm_prompt.local.yaml  # LLM settings and prompts

# Required environment variables
export OPENAI_API_KEY="your-openai-api-key"
export RETRIEVAL_API_KEY="your-retrieval-api-key"

# For Azure OpenAI (optional)
export AZURE_OPENAI_API_KEY="your-azure-key"

Edit config.yaml (Application Configuration):

app:
  name: agentic-rag
  max_tool_rounds: 3
  memory_ttl_days: 7
  port: 8000

provider: openai  # or "azure"

openai:
  api_key: "${OPENAI_API_KEY}"
  base_url: "https://api.openai.com/v1"
  model: "gpt-4o"

retrieval:
  endpoint: "your-retrieval-endpoint"
  api_key: "${RETRIEVAL_API_KEY}"

search:
  standard_regulation_index: "index-standards"
  chunk_index: "index-chunks"
  chunk_user_manual_index: "index-manuals"

postgresql:
  host: "localhost"
  database: "agent_memory"
  username: "your-username"
  password: "your-password"
  ttl_days: 7

citation:
  base_url: "https://your-citation-base-url"

Edit llm_prompt.yaml (LLM Parameters & Prompts):

parameters:
  temperature: 0
  max_context_length: 100000

prompts:
  agent_system_prompt: |
    You are an Agentic RAG assistant for the CATOnline system...
    # Custom agent prompt for standards/regulations
    
  intent_recognition_system_prompt: |
    You are an intent classifier for the CATOnline system...
    # Intent classification prompt
    
  user_manual_system_prompt: |  
    You are a specialized assistant for CATOnline user manual queries...
    # User manual assistant prompt

3. Development Mode

# Option 1: Start both services simultaneously
make dev

# Option 2: Start services separately
make dev-backend     # Backend with auto-reload
make dev-web         # Frontend development server

# Check service status
make status
make health

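Service URLs: with the example configuration above, the backend API listens on port 8000 (app.port in config.yaml).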

4. Production Mode

# Start backend service
make start          # Foreground mode
make start-bg       # Background mode

# Stop service
make stop

# Restart service
make restart

# Build and serve frontend
cd web
npm run build
npm start

5. Testing & Validation

# Run all tests
make test

# Run specific test suites
make test-unit           # Unit tests
make test-integration    # Integration tests
make test-e2e           # End-to-end tests

# Check service health
make health

# View service logs
make logs

📡 API Reference

Chat Endpoints

Primary Chat API (SSE Format)

POST /api/chat

Traditional Server-Sent Events format for custom integrations:

{
  "session_id": "session_abc123_1640995200000",
  "messages": [
    {"role": "user", "content": "What are the vehicle safety testing standards for electric vehicles?"}
  ],
  "client_hints": {}
}
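
A client could assemble this request with the standard library as sketched below. The reading of the session id as session_<random>_<epoch milliseconds> is inferred from the example value, and the helper names and the http://localhost:8000 base URL are assumptions for illustration.

```python
import json
import time
import urllib.request
import uuid


def new_session_id() -> str:
    """Build an id shaped like the example: session_<random>_<epoch millis>."""
    return f"session_{uuid.uuid4().hex[:6]}_{int(time.time() * 1000)}"


def build_chat_body(question: str, session_id: str) -> str:
    payload = {
        "session_id": session_id,
        "messages": [{"role": "user", "content": question}],
        "client_hints": {},
    }
    # ensure_ascii=False keeps non-English queries readable in logs.
    return json.dumps(payload, ensure_ascii=False)


def make_chat_request(base_url: str, body: str) -> urllib.request.Request:
    """Prepare (but do not send) the POST request for /api/chat."""
    return urllib.request.Request(
        f"{base_url}/api/chat",
        data=body.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "text/event-stream",
        },
        method="POST",
    )
```

Passing the prepared request to urllib.request.urlopen returns a response object whose lines can be read one at a time as the server streams events.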

AI SDK Compatible API (Data Stream Protocol)

POST /api/ai-sdk/chat

Compatible with AI SDK and assistant-ui frontend:

{
  "messages": [
    {"role": "user", "content": "What are the vehicle safety testing standards for electric vehicles?"}
  ],
  "session_id": "session_abc123_1640995200000",
  "metadata": {
    "source": "assistant-ui",
    "version": "0.11.0",
    "timestamp": "2025-01-01T12:00:00Z"
  }
}

Response Format

SSE Events (/api/chat):

event: tool_start
data: {"id":"tool_123","name":"retrieve_standard_regulation","args":{"query":"vehicle safety testing standards electric vehicles"}}

event: tokens  
data: {"delta":"Based on the retrieved standards","tool_call_id":null}

event: tool_result
data: {"id":"tool_123","name":"retrieve_standard_regulation","results":[...],"took_ms":234}

event: agent_done
data: {"answer_done":true}

event: post_append_1
data: {"answer":"Vehicle safety testing for electric vehicles [1] involves...","citations_mapping_csv":"1,SRC-ISO26262\n2,SRC-UN38.3"}
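
A client can decode the event stream above with a small line-based parser. The sketch below assumes one JSON object per data field, as in the examples, and also shows one way to turn citations_mapping_csv into a lookup table. Both function names are hypothetical.

```python
import json
from typing import Dict, Iterable, Iterator, List, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (event_name, decoded_json_data) for each SSE message."""
    event = "message"  # default event name when no event: line is present
    data_parts: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_parts.append(line[len("data:"):].strip())
        elif line == "" and data_parts:  # blank line ends one message
            yield event, json.loads("\n".join(data_parts))
            event, data_parts = "message", []
    if data_parts:  # stream ended without a trailing blank line
        yield event, json.loads("\n".join(data_parts))


def parse_citation_mapping(csv_text: str) -> Dict[int, str]:
    """Turn 'number,source_id' rows into {number: source_id}."""
    mapping: Dict[int, str] = {}
    for row in csv_text.splitlines():
        if row.strip():
            number, source_id = row.split(",", 1)
            mapping[int(number)] = source_id.strip()
    return mapping
```

With the mapping in hand, a UI can resolve each [n] marker in the answer text to its source document id.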

Data Stream Protocol (/api/ai-sdk/chat):

0:{"id":"msg_001","role":"assistant","content":[{"type":"text","text":"Based on the retrieved standards"}]}
1:{"type":"tool_call","tool_call_id":"tool_123","name":"retrieve_standard_regulation","args":{"query":"vehicle safety testing"}}
2:{"type":"tool_result","tool_call_id":"tool_123","result":{"results":[...],"took_ms":234}}
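
Each line in the example above is a short prefix, a colon, and a JSON payload. A minimal reader for that shape is sketched below. It assumes only what the example shows and attaches no meaning to the prefix values, and the function names are hypothetical.

```python
import json
from typing import Any, Iterable, Tuple


def parse_stream_line(line: str) -> Tuple[str, Any]:
    """Split one '<prefix>:<json>' line into (prefix, decoded payload)."""
    # partition splits on the first colon only, so colons inside the JSON are safe.
    prefix, sep, payload = line.strip().partition(":")
    if not sep or not prefix:
        raise ValueError(f"not a data stream line: {line!r}")
    return prefix, json.loads(payload)


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate the text parts of all assistant content payloads."""
    chunks = []
    for line in lines:
        _, payload = parse_stream_line(line)
        if isinstance(payload, dict):
            for part in payload.get("content", []):
                if part.get("type") == "text":
                    chunks.append(part["text"])
    return "".join(chunks)
```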

Utility Endpoints

Health Check

GET /health

{
  "status": "healthy", 
  "service": "agentic-rag"
}

API Information

GET /

{
  "message": "Agentic RAG API for Manufacturing Standards & Regulations"
}

Available Tools

The system provides specialized tools for different knowledge domains:

Standards & Regulations Tools

  1. retrieve_standard_regulation - Search standard/regulation metadata and attributes
  2. retrieve_doc_chunk_standard_regulation - Search document content chunks

User Manual Tools

  1. retrieve_system_usermanual - Search CATOnline system documentation and user guides

Tool Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| query | string | | Search query text |
| conversation_history | string | | Previous conversation context |
| top_k | integer | | Maximum results (default: 10) |
| score_threshold | float | | Minimum relevance score |
| gen_rerank | boolean | | Enable reranking (default: true) |
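
The parameters above could be modelled on the client side as a small dataclass carrying the documented defaults. This is a sketch: treating query as the only required field and omitting score_threshold when it is unset are assumptions, and RetrievalArgs is a hypothetical name.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class RetrievalArgs:
    query: str                               # search query text
    conversation_history: str = ""           # previous conversation context
    top_k: int = 10                          # documented default
    score_threshold: Optional[float] = None  # no documented default; None means "not sent"
    gen_rerank: bool = True                  # documented default

    def to_payload(self) -> Dict[str, Any]:
        """Validate the arguments and drop unset optional fields."""
        if not self.query.strip():
            raise ValueError("query must not be empty")
        if self.top_k < 1:
            raise ValueError("top_k must be positive")
        return {k: v for k, v in asdict(self).items() if v is not None}
```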

Event Types Reference

| Event Type | Data Fields | Description |
|---|---|---|
| tokens | delta, tool_call_id | LLM token stream |
| tool_start | id, name, args | Tool execution begins |
| tool_result | id, name, results, took_ms | Tool execution complete |
| tool_error | id, name, error | Tool execution failed |
| agent_done | answer_done | Agent processing complete |
| intent_classification | intent, confidence | Query intent classification result |
| citations | citations_list | Final formatted citation list |
| tool_summary | summary | Tool execution summary |
| error | error, details | System error occurred |
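
One way for a client to consume these events is to fold them into a small state object that the UI renders from. The sketch below handles a subset of the event types listed above. ChatState and apply_event are hypothetical names, and the status strings are illustrative.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ChatState:
    answer: str = ""
    tools: Dict[str, str] = field(default_factory=dict)  # tool id -> status
    errors: List[str] = field(default_factory=list)
    done: bool = False


def apply_event(state: ChatState, event: str, data: Dict[str, Any]) -> ChatState:
    """Fold one streamed event into the UI-facing state."""
    if event == "tokens":
        state.answer += data.get("delta", "")
    elif event == "tool_start":
        state.tools[data["id"]] = "running"
    elif event == "tool_result":
        state.tools[data["id"]] = "done"
    elif event == "tool_error":
        state.tools[data["id"]] = "failed"
        state.errors.append(data.get("error", ""))
    elif event == "error":
        state.errors.append(data.get("error", ""))
    elif event == "agent_done":
        state.done = bool(data.get("answer_done"))
    # Other event types (for example intent_classification) are ignored here.
    return state
```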

Multi-Intent Workflow Events

The system now supports intent-based routing with specialized event streams:

  • Standards/Regulation Queries: Full tool execution with citation generation
  • User Manual Queries: Streamlined documentation search with direct answers
  • Intent Classification: Real-time feedback on query routing decisions

🧠 Multi-Intent System

The application features an intelligent intent recognition system that automatically routes user queries to specialized agents:

Intent Classification

The system analyzes user queries and conversation context to determine the appropriate processing path:

  1. Standard_Regulation_RAG: For compliance, standards, and regulatory queries

    • Two-phase retrieval strategy (metadata → content chunks)
    • Enhanced citation generation with document linking
    • Multi-round tool execution for comprehensive answers
  2. User_Manual_RAG: For system documentation and usage questions

    • Direct documentation search and retrieval
    • Streamlined processing for faster responses
    • Context-aware help and guidance
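
The two-phase strategy (metadata first, then content chunks) can be illustrated with a toy in-memory example. Everything here is made up for illustration: the sample records, the word-overlap score, and the assumption that phase two is restricted to the documents found in phase one. The real tools call remote search indexes.

```python
from typing import Dict, List

# Made-up sample data standing in for the metadata index and the chunk index.
METADATA = [
    {"doc_id": "DOC-A", "title": "Electric vehicle battery safety requirements"},
    {"doc_id": "DOC-B", "title": "Vehicle communication security requirements"},
]
CHUNKS = [
    {"doc_id": "DOC-A", "text": "Battery packs must pass a thermal propagation test."},
    {"doc_id": "DOC-B", "text": "Communication channels must be authenticated."},
    {"doc_id": "DOC-A", "text": "Battery charging performance is measured at 25 degrees."},
]


def _score(query: str, text: str) -> int:
    """Count shared lowercase words; a toy relevance score."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def search_metadata(query: str, top_k: int = 10) -> List[Dict[str, str]]:
    hits = [m for m in METADATA if _score(query, m["title"]) > 0]
    hits.sort(key=lambda m: _score(query, m["title"]), reverse=True)
    return hits[:top_k]


def search_chunks(query: str, doc_ids: List[str], top_k: int = 10) -> List[Dict[str, str]]:
    hits = [c for c in CHUNKS
            if c["doc_id"] in doc_ids and _score(query, c["text"]) > 0]
    hits.sort(key=lambda c: _score(query, c["text"]), reverse=True)
    return hits[:top_k]


def two_phase_retrieve(query: str) -> List[Dict[str, str]]:
    documents = search_metadata(query)           # phase 1: which documents?
    doc_ids = [d["doc_id"] for d in documents]
    return search_chunks(query, doc_ids)         # phase 2: which passages?
```

Narrowing to candidate documents first keeps the chunk search focused, at the cost of missing passages in documents whose metadata does not match the query.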

Query Examples

Standards/Regulation Queries:

  • "最新的电动汽车锂电池标准?" (Latest lithium battery standards for electric vehicles?)
  • "如何测试电动汽车的充电性能?" (How to test electric vehicle charging performance?)
  • "提供关于车辆通讯安全的法规" (Provide vehicle communication security regulations)

User Manual Queries:

  • "How do I use CATOnline system?"
  • "What are the search features available?"
  • "How to export search results?"

Enhanced Features

  • Context Preservation: Session memory maintained across intent switches
  • Language Detection: Automatic language handling for Chinese/English queries
  • Visual Feedback: Real-time UI updates showing intent classification and tool progress
  • Error Recovery: Graceful handling of classification uncertainties

📚 Documentation

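For detailed information, see the documentation in the docs/ directory:

  • docs/CHANGELOG.md - Version history and changes
  • docs/deployment.md - Deployment guide
  • docs/development.md - Development setup
  • docs/testing.md - Testing guide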

🤝 Contributing

We welcome contributions! Please see our Development Guide for details on:

  • Setting up the development environment
  • Code style and formatting guidelines
  • Running tests and quality checks
  • Submitting pull requests

Quick Contribution Setup

# Fork the repository and clone your fork
git clone https://github.com/your-username/agentic-rag-4.git
cd agentic-rag-4

# Install development dependencies
make install
uv sync --dev

# Run tests to ensure everything works
make test

# Create a feature branch
git checkout -b feature/amazing-feature

# Make your changes and test
make test
make lint

# Commit and push
git commit -m "Add amazing feature"
git push origin feature/amazing-feature

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙋‍♀️ Support


Built with ❤️ using FastAPI, LangGraph, Next.js, and assistant-ui