
Agentic RAG System Design Document

Overview

This document provides a comprehensive architectural overview of the Agentic RAG (Retrieval-Augmented Generation) system for manufacturing standards and regulations. The system combines LangGraph orchestration, streaming responses, and authoritative document retrieval to provide grounded answers with proper citations.

Design Philosophy

The Agentic RAG system is built on several key design principles:

  1. Intelligent Intent Recognition: The system automatically classifies user queries into different knowledge domains (standards/regulations vs. user manuals) to route them to specialized agents for optimal handling.

  2. Two-Phase Retrieval Strategy: For standards and regulations queries, the system first discovers relevant document metadata, then performs detailed content retrieval with enhanced query conditions based on the metadata findings.

  3. Streaming-First Architecture: All responses are delivered via Server-Sent Events (SSE) with real-time token streaming and tool execution progress, providing immediate feedback to users.

  4. Session-Aware Memory: Persistent conversation history stored in PostgreSQL enables context-aware multi-turn conversations while maintaining session isolation.

  5. Production-Ready Design: Comprehensive error handling, health monitoring, configuration management, and graceful fallback mechanisms ensure system reliability.

System Architecture

The Agentic RAG system employs a modern microservices architecture with clear separation of concerns across multiple layers. Each layer has specific responsibilities and communicates through well-defined interfaces.

Architecture Design Principles

  • Layered Architecture: Clear separation between presentation, business logic, data access, and external services
  • Asynchronous Processing: Non-blocking operations throughout the request pipeline for optimal performance
  • Horizontal Scalability: Stateless services that can be scaled independently based on load
  • Fault Tolerance: Graceful degradation and fallback mechanisms at every layer
  • Configuration-Driven: Environment-specific settings externalized for flexible deployment

High-Level Architecture

graph TB
    subgraph "Frontend Layer"
        UI[Next.js Web UI<br/>@assistant-ui/react]
        TR[Thread Component]
        TU[Tool UI Components]
        LS[Language Switcher]
    end
    
    subgraph "API Gateway Layer"
        NX[Next.js API Routes<br/>/api/chat]
        DP[Data Stream Protocol<br/>SSE Adapter]
    end
    
    subgraph "Backend Service Layer"
        FA[FastAPI Server<br/>Port 8000]
        AS[AI SDK Adapter]
        SC[SSE Controller]
    end
    
    subgraph "Agent Orchestration Layer"
        LG[LangGraph Workflow]
        IR[Intent Recognition]
        SA[Standards Agent]
        MA[Manual Agent]
        PP[Post Processor]
    end
    
    subgraph "Memory Layer"
        PG[(PostgreSQL<br/>Session Store)]
        CH[Checkpointer]
        MM[Memory Manager]
    end
    
    subgraph "Retrieval Layer"
        AZ[Azure AI Search]
        EM[Embedding Service]
        IDX[Search Indices]
    end
    
    subgraph "LLM Layer"
        LLM[LLM Provider<br/>OpenAI/Azure OpenAI]
        CF[Configuration]
    end
    
    UI --> NX
    TR --> NX
    TU --> NX
    LS --> UI
    NX --> DP
    DP --> FA
    FA --> AS
    AS --> SC
    SC --> LG
    LG --> IR
    IR --> SA
    IR --> MA
    SA --> PP
    MA --> PP
    LG --> CH
    CH --> PG
    MM --> PG
    SA --> AZ
    MA --> AZ
    AZ --> EM
    AZ --> IDX
    SA --> LLM
    MA --> LLM
    LLM --> CF

Component Architecture

The system is organized into several key component groups, each responsible for specific aspects of the application functionality:

Web Frontend Components:

  • Assistant Component: The main orchestrator that manages the overall chat experience
  • Thread UI: Handles conversation display and user interaction
  • Tool UIs: Specialized visualizations for different tool types (search, retrieval, analysis)
  • Language Support: Multi-language interface with automatic browser detection

Backend Core Components:

  • FastAPI Main: Central application server handling HTTP requests and responses
  • AI SDK Chat Endpoint: Specialized endpoint implementing the Data Stream Protocol for streaming responses
  • SSE Stream Controller: Manages Server-Sent Events for real-time communication
  • Configuration Manager: Centralized configuration loading and validation

Agent System Components:

  • LangGraph StateGraph: Core workflow engine managing agent execution
  • Intent Router: Intelligent classifier determining the appropriate agent for each query
  • Agent Nodes: Specialized processing units for different query types
  • Tool Nodes: Execution environment for retrieval and analysis tools
  • Memory System: Persistent storage and retrieval of conversation context

graph LR
    subgraph "Web Frontend"
        direction TB
        A1[Assistant Component]
        A2[Thread UI]
        A3[Tool UIs]
        A4[Language Support]
        A1 --> A2
        A1 --> A3
        A1 --> A4
    end
    
    subgraph "Backend Core"
        direction TB
        B1[FastAPI Main]
        B2[AI SDK Chat Endpoint]
        B3[SSE Stream Controller]
        B4[Configuration Manager]
        B1 --> B2
        B2 --> B3
        B1 --> B4
    end
    
    subgraph "Agent System"
        direction TB
        C1[LangGraph StateGraph]
        C2[Intent Router]
        C3[Agent Nodes]
        C4[Tool Nodes]
        C5[Memory System]
        C1 --> C2
        C2 --> C3
        C3 --> C4
        C1 --> C5
    end
    
    subgraph "Data Layer"
        direction TB
        D1[PostgreSQL Memory]
        D2[Azure AI Search]
        D3[LLM Services]
        D4[Configuration Store]
    end
    
    A1 -.-> B2
    B3 --> C1
    C4 --> D2
    C3 --> D3
    C5 --> D1
    B4 --> D4

Workflow Design

The Agentic RAG system implements sophisticated workflow patterns to handle different types of queries efficiently. The workflows are designed to be autonomous, adaptive, and optimized for the specific characteristics of each query type.

Agentic Workflow Architecture Advantages

Our system adopts the Agentic Workflow paradigm, which balances autonomy and control in AI system design. This approach combines the strengths of both traditional AI workflows and AI agents:

AI Workflow Patterns Comparison:

  1. AI Workflows: Deterministic, predesigned pipelines with highest predictability but lowest autonomy
  2. AI Agents: Reason-act loops that decide next steps with higher autonomy but variable reliability
  3. Agentic Workflows: Orchestrated graphs that embed one or more agents with guardrails, memory, and tools - delivering both autonomy and control

Our Agentic Workflow Benefits:

  • Controlled Autonomy: Agents can make autonomous decisions within well-defined guardrails and tool constraints
  • Predictable Behavior: LangGraph orchestration ensures reproducible workflows while allowing agent flexibility
  • Robust Error Handling: Built-in guardrails prevent agents from making unreliable or unsafe decisions
  • Memory-Aware Processing: Persistent session memory enables context-aware autonomous decision making
  • Tool-Constrained Intelligence: Agents operate within a curated set of tools, ensuring reliable and relevant outputs
  • Multi-Agent Coordination: Different specialized agents handle different query types with orchestrated handoffs
  • Adaptive Execution: Agents can autonomously decide on tool usage and multi-round execution while staying within system limits

Architectural Implementation:

  • LangGraph StateGraph: Provides the orchestrated graph structure with defined state transitions
  • Intent Recognition Router: Ensures queries reach the most appropriate specialized agent
  • Tool Round Limits: Guardrails prevent infinite loops while allowing autonomous multi-step reasoning
  • Session Memory: Enables context-aware decisions across conversation turns
  • Streaming Feedback: Real-time progress visibility provides user confidence in autonomous processing
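
The tool-round guardrail described above can be sketched as a plain loop. This is an illustrative stand-in, not the actual LangGraph implementation; `MAX_TOOL_ROUNDS`, `call_llm`, and `execute_tools` are hypothetical names introduced here for the example.

```python
# Illustrative sketch of the tool-round guardrail (names are hypothetical,
# not the system's actual API).
MAX_TOOL_ROUNDS = 3  # guardrail: hard cap on autonomous tool rounds

def run_agent_turn(query, call_llm, execute_tools):
    """Run one agent turn: the agent may request tools for up to
    MAX_TOOL_ROUNDS rounds before it must synthesize an answer."""
    context = [query]
    for _ in range(MAX_TOOL_ROUNDS):
        decision = call_llm(context)
        if not decision.get("tool_calls"):
            return decision["answer"]           # agent chose to finish early
        context.append(execute_tools(decision["tool_calls"]))
    # Guardrail reached: force a final synthesis without further tool use
    final = call_llm(context + ["<no more tool calls allowed>"])
    return final["answer"]
```

The same bound prevents infinite reason-act loops while still letting the agent stop early when it already has enough information.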

Workflow Design Principles

  1. Intent-Driven Routing: Automatic classification ensures queries are handled by the most appropriate specialized agent
  2. Multi-Round Tool Execution: Agents can autonomously decide to use multiple tools in sequence to gather comprehensive information
  3. Parallel Processing: Multiple retrieval operations can execute simultaneously to reduce response time
  4. Context Preservation: Conversation history is maintained and used to enhance subsequent queries
  5. Citation Generation: All responses include proper source attribution with automatic citation extraction
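
The routing contract from principle 1 can be illustrated with a minimal sketch. The real router is LLM-based; the keyword matching below is only a simplified stand-in, and the label names "standards" and "manual" are assumptions for the example.

```python
# Simplified stand-in for the LLM-based intent router: keyword matching
# here only illustrates the routing contract, not the production classifier.
MANUAL_HINTS = ("how do i", "user manual", "操作", "使用说明")

def route_intent(query: str) -> str:
    """Return which specialized agent should handle the query."""
    q = query.lower()
    if any(hint in q for hint in MANUAL_HINTS):
        return "manual"       # route to the User Manual RAG Agent
    return "standards"        # default: Standards/Regulations RAG Agent
```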

Agentic Workflow

The core workflow demonstrates the Agentic Workflow pattern with orchestrated agent execution, guardrails, and autonomous decision-making within controlled boundaries. Each specialized agent operates with autonomy while being constrained by system guardrails and tool limitations.

flowchart TD
    START([User Query]) --> IR{Intent Recognition}
    
    IR -->|User Manual| UMA[User Manual RAG Agent]
    IR -->|Standards/Regulations| SRA[Standards/Regulations RAG Agent]
    
    subgraph "Standards/Regulations RAG"
        SRA --> SRT{Need Tools?}
        SRT -->|Yes| STL[Standards/Regulations Retrieval Tools<br/>Parallel Execution]
        SRT -->|No| SRS[Answer Synthesis]
        STL --> STC{Continue?}
        STC -->|Yes| QR2[Query Enhancement/<br/>Refinement]
        QR2 --> SRT
        STC -->|No| SRS
    end
    
    subgraph "User Manual RAG"
        UMA --> UMT{Need Tools?}
        UMT -->|Yes| UML[User Manual Retrieval Tools<br/>Parallel Execution]
        UMT -->|No| UMS[Answer Synthesis]
        UML --> UMC{Continue?}
        UMC -->|Yes| QR4[Query Enhancement/<br/>Refinement]
        QR4 --> UMT
        UMC -->|No| UMS
    end
    
    SRS --> SPP[Citation Builder]
    SPP --> END1([Response with Citations])
    
    UMS --> END2([Response])

    style IR fill:#e1f5fe
    style SRA fill:#f3e5f5
    style UMA fill:#e8f5e8
    style STL fill:#fff3e0
    style UML fill:#fff3e0

Agentic Workflow Features Demonstrated:

  • Orchestrated Graph Structure: LangGraph manages the overall workflow with defined state transitions
  • Embedded Specialized Agents: Different agents (Standards/Regulations, User Manual) handle domain-specific queries
  • Intelligent Query Rewriting/Decomposition: Core agentic feature where agents autonomously analyze, decompose, and rewrite queries for optimal retrieval coverage - demonstrating true query understanding and strategic planning
  • Autonomous Decision Making: Agents decide whether tools are needed and when to continue or finish
  • Built-in Guardrails: Tool round limits and workflow constraints prevent infinite loops
  • Memory Integration: Conversation context influences agent decisions throughout the workflow
  • Tool Orchestration: Agents autonomously select and execute appropriate tools within defined boundaries
  • Adaptive Query Intelligence: Agents learn from retrieval results and iteratively refine queries, showcasing emergent intelligence
  • Controllable Citation List and Links: The workflow tracks citations precisely, automatically mapping retrieved sources to generated content, and dynamically constructs formatted citation lists and secure link URLs from rule-based logic

Query Rewriting/Decomposition in Agentic Workflow - The Core Intelligence Feature:

This is the defining characteristic that elevates our solution from simple RAG to true Agentic RAG. The agents demonstrate genuine understanding and strategic thinking through sophisticated query processing:

  • Cognitive Query Analysis: Agents autonomously analyze user queries to understand intent, identify ambiguities, and infer implicit information requirements
  • Strategic Multi-Perspective Decomposition: Agents intelligently break down complex queries into 2-3 complementary sub-queries that explore different conceptual aspects, ensuring comprehensive coverage
  • Cross-Language Intelligence: Agents automatically generate semantically equivalent bilingual query variants (Chinese/English), demonstrating deep linguistic understanding
  • Context-Aware Strategic Rewriting: Agents incorporate conversation history and domain knowledge to refine and enhance queries, showing memory-driven intelligence
  • Autonomous Parallel Query Orchestration: Agents independently decide to execute multiple rewritten queries in parallel, optimizing for both speed and coverage
  • Iterative Learning and Refinement: Based on retrieval results, agents autonomously enhance queries for subsequent rounds, demonstrating learning and adaptation
  • Metadata-Informed Query Enhancement: For Phase 2 retrieval, agents intelligently synthesize metadata constraints from Phase 1 results, showing multi-step reasoning capability

Citation Management in Agentic Workflow - Enhanced Accountability and Traceability:

The Agentic Workflow provides fine-grained control and precision in citation management, beyond what traditional RAG systems offer:

  • Autonomous Citation Tracking: Agents automatically track all tool calls and their results throughout multi-step workflows, maintaining complete provenance information
  • Fine-Grained Source Mapping: Each citation is precisely mapped to specific tool call results with unique identifiers, enabling exact source traceability
  • Multi-Round Citation Coherence: Agents maintain consistent citation numbering across multiple tool execution rounds, preventing citation conflicts or duplication
  • Intelligent Citation Placement: Agents strategically place citations based on content relevance and source quality, not just chronological order
  • Cross-Tool Citation Integration: Citations seamlessly integrate results from different tools (metadata search, content search) within a unified numbering system
  • Post-Processing Citation Enhancement: Dedicated post-processing nodes enrich citations with additional metadata (URLs, document titles, publication dates) for comprehensive reference lists
  • Citation Quality Control: Agents filter and validate citation sources based on relevance scores and metadata quality, ensuring only high-quality references are included

Citation Processing Workflow:

  1. Real-time Citation Capture: As agents execute tools, each result is automatically tagged with tool call ID and order number
  2. Strategic Citation Assignment: Agents intelligently assign citation numbers based on content importance and source authority
  3. Citation Map Generation: Agents generate structured citation mappings in standardized CSV format for processing
  4. Post-Processing Enhancement: Dedicated nodes transform raw citation data into formatted reference lists with complete metadata
  5. Quality Validation: Final citation lists undergo validation to ensure accuracy and completeness

This systematic approach ensures that every piece of information can be traced back to its exact source, providing users with the confidence and transparency required for regulatory and compliance use cases.
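
Steps 3 and 4 above can be sketched as follows. The CSV column names (`citation_no`, `tool_call_id`, `order`) and the metadata fields are assumptions for illustration; the actual citation-map schema may differ.

```python
import csv
import io

def build_reference_list(citation_csv: str, sources: dict) -> list[str]:
    """Turn an agent-emitted citation map (CSV with assumed columns
    citation_no,tool_call_id,order) into a formatted reference list,
    enriching each entry from the tool-result metadata in `sources`
    keyed by (tool_call_id, order)."""
    refs = []
    for row in csv.DictReader(io.StringIO(citation_csv)):
        meta = sources[(row["tool_call_id"], int(row["order"]))]
        refs.append(f'[{row["citation_no"]}] {meta["title"]} ({meta["url"]})')
    return refs
```

Because every row keys back to a specific tool call and result position, each reference in the final list remains traceable to its exact source.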

Query Processing Strategies - Domain-Specific Intelligence in Action:

The following strategies demonstrate how our agentic approach applies query rewriting/decomposition differently based on the target domain, showcasing true adaptive intelligence:

  1. Standards/Regulations Queries:

    • Phase 1: Generate 2-3 parallel metadata-focused sub-queries
    • Phase 2: Enhance queries with document codes and metadata constraints from Phase 1
    • Lucene Syntax: Intelligent use of advanced search syntax for precision filtering
  2. User Manual Queries:

    • Content-Focused: Generate queries targeting procedural and instructional content
    • Multi-Modal: Consider both textual content and structural elements (headers, sections)
    • Context Integration: Incorporate previous tool results for query refinement
  3. Cross-Agent Learning - Advanced Agentic Intelligence:

    • Query Pattern Recognition: Agents learn from successful query patterns across sessions, demonstrating emergent learning capabilities
    • Adaptive Rewriting: Query strategies evolve and adapt based on retrieval success rates, showing continuous improvement
    • Domain-Specific Optimization: Each agent develops specialized query rewriting patterns for its domain, demonstrating specialized expertise development

Two-Phase Retrieval Strategy

The standards and regulations agent employs a sophisticated two-phase retrieval strategy designed to maximize accuracy and relevance:

Phase 1: Metadata Discovery with Query Decomposition

  • Query Analysis: Agent analyzes user intent and decomposes complex queries into focused sub-queries
  • Multi-Perspective Rewriting: Generates 2-3 parallel sub-queries exploring different aspects of the user's intent
  • Cross-Language Coverage: Automatically includes both Chinese and English query variants for comprehensive search
  • Metadata-Focused Queries: Searches for document attributes, codes, titles, and publication information
  • Parallel Execution: Multiple rewritten queries execute simultaneously to maximize metadata coverage
  • Result Analysis: Agent synthesizes metadata findings to identify relevant standards and regulations
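
The parallel execution step above can be sketched with `asyncio`. Here `search_metadata` is a hypothetical stand-in for the Azure AI Search call, and deduplicating merged hits by document code is an assumption for the example.

```python
import asyncio

async def phase1_metadata_discovery(sub_queries, search_metadata):
    """Run all decomposed Phase 1 sub-queries concurrently and merge the
    metadata hits (search_metadata stands in for the Azure AI Search call;
    dedup by document_code is illustrative)."""
    results = await asyncio.gather(*(search_metadata(q) for q in sub_queries))
    seen, merged = set(), []
    for hits in results:
        for hit in hits:
            if hit["document_code"] not in seen:
                seen.add(hit["document_code"])
                merged.append(hit)
    return merged
```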

Phase 2: Content Retrieval with Query Enhancement (conditional)

  • Need Assessment: Agent autonomously determines if detailed content retrieval is required
  • Query Enhancement: Intelligently incorporates metadata constraints from Phase 1 results
  • Lucene Syntax Integration: Uses advanced search syntax with metadata filtering (e.g., (content_query) AND (document_code:(ISO45001 OR GB6722)))
  • Context-Aware Refinement: Enhances queries with conversation history and previous tool results
  • Focused Content Search: Retrieves detailed document chunks with full context and precise filtering
  • Multi-Round Capability: Can perform additional query refinement based on initial content results

Query Rewriting Examples:

Original Query: "汽车安全要求标准" (Automotive Safety Requirements Standards)

Phase 1 Decomposed Queries:

  1. "汽车安全标准 automotive safety standards GB ISO requirements"
  2. "车辆安全要求 vehicle safety requirements regulations 法规"
  3. "automotive safety standards ISO GB national standards 汽车"

Phase 2 Enhanced Queries (if Phase 1 found relevant documents):

  1. (安全要求 safety requirements) AND (document_code:(GB11551 OR ISO26262 OR GB7258))
  2. (automotive safety testing procedures) AND (document_category:Standard) AND (x_Standard_Vehicle_Type:passenger)
  3. (车辆安全技术条件) AND (publisher:国家标准委 OR SAC) AND (x_Standard_Published_State_EN:Effective)

This strategy ensures that users receive both overview information and detailed content as needed, while maintaining high precision through metadata-enhanced filtering and intelligent query decomposition.
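
The Phase 2 enhancement pattern from the examples above can be sketched as a small query builder. The `document_code` field name comes from the examples; term escaping and the richer constraints (publisher, category) are omitted for brevity.

```python
def enhance_with_metadata(content_query: str, document_codes: list[str]) -> str:
    """Build a Phase 2 Lucene query from a content query plus the document
    codes discovered in Phase 1, mirroring the examples above (escaping and
    additional metadata constraints omitted)."""
    if not document_codes:
        return content_query                      # no Phase 1 constraints found
    code_filter = " OR ".join(document_codes)
    return f"({content_query}) AND (document_code:({code_filter}))"
```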

Agentic Workflow in Two-Phase Retrieval:

  • Autonomous Phase Detection: Agents autonomously determine when Phase 2 retrieval is needed based on query analysis
  • Dynamic Query Enhancement: Agents intelligently enhance Phase 2 queries using metadata from Phase 1 results
  • Controlled Tool Execution: Tool usage is governed by workflow guardrails while allowing agent flexibility
  • Memory-Informed Decisions: Previous conversation context influences retrieval strategy decisions
  • Parallel Processing Autonomy: Agents can autonomously decide on parallel query execution for optimal coverage

sequenceDiagram
    participant U as User
    participant A as Agent
    participant QR as Query Rewriter
    participant P1 as Phase 1 Tool
    participant P2 as Phase 2 Tool
    participant AS as Azure Search
    participant LLM as LLM Service
    
    U->>A: Original query about standards
    A->>QR: Analyze and decompose query
    QR->>QR: Generate 2-3 sub-queries
    QR->>QR: Add cross-language variants
    QR-->>A: Rewritten query set
    
    par Phase 1: Parallel Metadata Discovery
        A->>P1: retrieve_standard_regulation(rewritten_query_1)
        A->>P1: retrieve_standard_regulation(rewritten_query_2)
        A->>P1: retrieve_standard_regulation(rewritten_query_3)
        P1->>AS: Search metadata index (parallel)
        AS-->>P1: Standards metadata results
        P1-->>A: Document codes, titles, dates
    end
    
    A->>A: Analyze metadata results
    A->>A: Determine if content needed
    A->>QR: Assess need for Phase 2
    
    opt Phase 2: Enhanced Content Retrieval
        QR->>QR: Enhance queries with metadata constraints
        QR->>QR: Apply Lucene syntax filtering
        QR-->>A: Enhanced query with metadata filters
        A->>P2: retrieve_doc_chunk_standard_regulation(enhanced_query_with_constraints)
        P2->>AS: Search content index + metadata filters
        AS-->>P2: Filtered document chunks
        P2-->>A: Detailed content with context
    end
    
    A->>LLM: Synthesize with retrieved data
    LLM-->>A: Generated response
    A->>A: Extract citations from all sources
    A-->>U: Final answer with citations
    
    Note over QR: Query Rewriting Strategies:<br/>- Multi-perspective decomposition<br/>- Cross-language variants<br/>- Context-aware enhancement<br/>- Metadata constraint integration

Memory Management Flow

The system implements sophisticated session management with PostgreSQL-based persistence:

Session Lifecycle Management:

  • Unique session IDs generated for each conversation thread
  • Automatic session initialization with proper memory allocation
  • Conversation turns tracked with message ordering and timestamps
  • Intelligent message trimming to stay within context length limits
  • Persistent storage with 7-day TTL for automatic cleanup
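
The message-trimming step can be sketched as below. This is a simplified stand-in: a crude character count replaces real token counting, and the budget value is illustrative.

```python
def trim_messages(messages, max_chars=8000, keep_system=True):
    """Drop the oldest conversation turns until the history fits the budget
    (character count is a crude stand-in for token counting; the budget is
    an illustrative value, not the system's actual limit)."""
    system = [m for m in messages if m["role"] == "system"] if keep_system else []
    turns = [m for m in messages if m["role"] != "system"]
    while turns and sum(len(m["content"]) for m in system + turns) > max_chars:
        turns.pop(0)                      # drop the oldest non-system message
    return system + turns
```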

Memory Architecture Benefits:

  • Cross-Request Continuity: Conversations persist across browser sessions
  • Context-Aware Responses: Agents can reference previous exchanges
  • Scalable Storage: PostgreSQL provides reliable, scalable persistence
  • Automatic Cleanup: TTL-based garbage collection prevents storage bloat
  • Fault Tolerance: Graceful fallback to in-memory storage if PostgreSQL is unavailable

Agentic Workflow Memory Integration:

  • Context-Driven Autonomy: Agents make informed decisions based on conversation history
  • Memory-Aware Tool Selection: Previous tool results influence future tool choices
  • Session-Aware Guardrails: Memory context helps agents avoid redundant operations
  • Adaptive Workflow Paths: Conversation context guides agent workflow decisions
  • Persistent Learning: Agents can build upon previous conversation context for improved responses

flowchart TD
    subgraph "Session Lifecycle"
        SS[Session Start] --> SI[Session ID Generation]
        SI --> SM[Memory Initialization]
        SM --> CT[Conversation Turns]
        CT --> TM[Message Trimming]
        TM --> PS[Persistent Storage]
        PS --> TTL[TTL Cleanup]
        TTL --> SE[Session End]
    end
    
    subgraph "PostgreSQL Memory"
        SM --> CP[Create Checkpointer]
        CP --> PG[(PostgreSQL DB)]
        PS --> PW[Put Writes]
        PW --> PG
        TM --> TR[Trim Messages]
        TR --> PG
        TTL --> CL[Cleanup Old Records]
        CL --> PG
    end
    
    subgraph "Fallback Strategy"
        CP --> FB{PostgreSQL Available?}
        FB -->|No| IM[In-Memory Store]
        FB -->|Yes| PG
    end
    
    style PG fill:#e3f2fd
    style IM fill:#fff3e0
    style FB fill:#ffebee

Feature Architecture

The Agentic RAG system provides a comprehensive set of features designed for professional manufacturing standards and regulations queries. Each feature is implemented with production-grade quality and user experience considerations.

Feature Design Philosophy

  • User-Centric Design: All features prioritize ease of use and clear information presentation
  • Real-Time Feedback: Users receive immediate feedback on system processing and tool execution
  • Source Transparency: All responses include clear attribution and citation links
  • Multi-Modal Support: Text, visual, and interactive elements enhance information comprehension
  • Accessibility: Interface supports multiple languages and responsive design patterns

Core Features

mindmap
  root((Agentic RAG Features))
    Multi-Intent System
      Intent Recognition
      Domain Routing
      Specialized Agents
    Real-time Streaming
      SSE Protocol
      Token Streaming
      Tool Progress
      Citation Updates
    Advanced Retrieval
      Two-Phase Strategy
      Parallel Queries
      Metadata Enhancement
      Content Filtering
    Session Memory
      PostgreSQL Storage
      7-Day TTL
      Context Trimming
      Cross-Request State
    Modern Web UI
      assistant-ui Components
      Tool Visualizations
      Multi-language Support
      Responsive Design
    Production Ready
      Error Handling
      Health Monitoring
      Configuration Management
      Docker Support

Tool System Architecture

The tool system provides the core retrieval and analysis capabilities that power the agent workflows:

Tool Design Principles:

  • Query Intelligence: Advanced query rewriting and decomposition before tool execution
  • Modularity: Each tool has a single, well-defined responsibility
  • Composability: Tools can be combined in various workflows with rewritten queries
  • Observability: All tool executions provide detailed progress feedback
  • Error Resilience: Robust error handling with meaningful error messages
  • Performance: Optimized for both accuracy and response time through smart query enhancement

Query Processing Integration: Before any tool execution, the system applies sophisticated query rewriting and decomposition:

  1. Multi-perspective Decomposition: Breaking complex queries into focused sub-queries
  2. Cross-language Variants: Generating Chinese/English query variants for comprehensive coverage
  3. Context Enhancement: Adding domain-specific context and terminology
  4. Metadata Constraint Integration: Incorporating document type, date, and source constraints

This preprocessing ensures that each tool receives optimally crafted queries for maximum retrieval effectiveness.
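
The decomposition contract can be sketched with a template-based stand-in. The production system prompts an LLM to do this; the fixed templates and terms below are examples only.

```python
# Template-based stand-in for the LLM-driven query rewriting described
# above; the real system generates sub-queries with an LLM, and the added
# terms here are illustrative.
def decompose_query(query: str, translation: str) -> list[str]:
    """Produce complementary bilingual sub-queries, mirroring the
    multi-perspective / cross-language strategy."""
    return [
        f"{query} {translation} standards requirements",
        f"{translation} regulations 法规 {query}",
        f"{query} GB ISO national standards",
    ]
```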

Tool Categories:

Standards Tools: Specialized for regulatory and standards documents with intelligent query enhancement

  • retrieve_standard_regulation: Discovers document metadata using decomposed and rewritten queries
  • retrieve_doc_chunk_standard_regulation: Retrieves detailed content with metadata-enhanced filtering

User Manual Tools: Optimized for system documentation with context-aware query processing

  • retrieve_doc_chunk_user_manual: Searches user guides using rewritten queries for better coverage

Query Enhancement Integration: All tools benefit from the query processing pipeline:

  • Phase 1 Tools receive multiple decomposed queries for comprehensive metadata discovery
  • Phase 2 Tools receive enhanced queries with metadata constraints for precise content retrieval
  • Cross-tool Coordination ensures consistent query interpretation across different tool types

Azure AI Search Integration: All tools leverage advanced search capabilities with query intelligence

  • Smart Query Processing: Handles multiple rewritten queries with parallel execution
  • Hybrid Search: Combines keyword and vector search for decomposed query components
  • Semantic Ranking: Improved result relevance through query understanding
  • Cross-language Support: Processes Chinese/English query variants seamlessly
  • Metadata-aware Filtering: Applies enhanced constraints from query rewriting
  • Score Aggregation: Combines results from multiple query variants for comprehensive coverage
  • Multi-field Search: Searches across content and metadata with context-enhanced queries

graph TB
    subgraph "Query Processing Pipeline"
        QI[Query Input] --> QR[Query Rewriter & Decomposer]
        QR --> QA[Query Analyzer]
        QA --> QD[Query Dispatcher]
    end
    
    subgraph "Query Rewriting Strategies"
        QR --> QR1[Multi-perspective Decomposition]
        QR --> QR2[Cross-language Variants] 
        QR --> QR3[Context Enhancement]
        QR --> QR4[Metadata Constraint Integration]
    end
    
    subgraph "Tool Categories"
        ST[Standards Tools]
        UT[User Manual Tools]
    end
    
    subgraph "Standards Tools"
        ST1[retrieve_standard_regulation<br/>Metadata Search + Query Decomposition]
        ST2[retrieve_doc_chunk_standard_regulation<br/>Content Search + Enhanced Queries]
    end
    
    subgraph "User Manual Tools"
        UT1[retrieve_doc_chunk_user_manual<br/>Manual Search + Rewritten Queries]
    end
    
    subgraph "Tool Execution"
        TE[Tool Executor]
        PS[Parallel Scheduling]
        ER[Error Recovery]
        RF[Result Formatting]
    end
    
    subgraph "Azure AI Search Integration"
        HS[Hybrid Search]
        VS[Vector Search]
        SR[Semantic Ranking]
        RS[Result Scoring]
    end
    
    QD --> ST
    QD --> UT
    
    ST --> ST1
    ST --> ST2
    UT --> UT1
    
    ST1 --> TE
    ST2 --> TE
    UT1 --> TE
    
    TE --> PS
    PS --> ER
    ER --> RF
    
    TE --> HS
    HS --> VS
    VS --> SR
    SR --> RS

Data Flow Architecture

The system implements sophisticated data flow patterns optimized for real-time streaming and multi-step processing. Understanding these flows is crucial for system maintenance and optimization.

Data Flow Design Principles

  • Streaming-First: All responses use streaming protocols for immediate user feedback
  • Event-Driven: System components communicate through well-defined events
  • Backpressure Handling: Proper flow control prevents system overload
  • Error Propagation: Errors are handled gracefully with meaningful user feedback
  • Observability: Comprehensive logging and monitoring throughout all flows

Request-Response Flow

sequenceDiagram
    participant Browser as Web Browser
    participant NextJS as Next.js API
    participant FastAPI as FastAPI Backend
    participant LangGraph as LangGraph Engine
    participant Memory as PostgreSQL
    participant Search as Azure Search
    participant LLM as LLM Provider
    
    Browser->>NextJS: POST /api/chat
    NextJS->>FastAPI: Forward request
    FastAPI->>Memory: Load session
    Memory-->>FastAPI: Session data
    
    FastAPI->>LangGraph: Start workflow
    LangGraph->>LangGraph: Intent recognition
    
    alt Standards/Regulations Query
        LangGraph->>Search: Phase 1: Metadata search
        Search-->>LangGraph: Standards metadata
        LangGraph->>Search: Phase 2: Content search
        Search-->>LangGraph: Document chunks
    else User Manual Query
        LangGraph->>Search: Manual content search
        Search-->>LangGraph: Manual chunks
    end
    
    LangGraph->>LLM: Generate response
    LLM-->>LangGraph: Streamed tokens
    LangGraph->>LangGraph: Extract citations
    LangGraph->>Memory: Save conversation
    
    LangGraph-->>FastAPI: Streamed response
    FastAPI-->>NextJS: SSE stream
    NextJS-->>Browser: Data stream protocol

Streaming Data Flow

The streaming architecture implements the Data Stream Protocol for real-time communication:

Stream Event Types:

  • Text Events: text-start, text-delta, text-end for response content
  • Tool Events: tool-input-start, tool-input-delta, tool-input-available for tool parameters
  • Tool Results: tool-output-available for tool execution results
  • Step Events: start-step, finish-step for workflow progress
  • Control Events: finish, DONE for stream completion
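
On the wire, each of these events is framed as a Server-Sent Events message. The sketch below shows the framing only; the payload fields are illustrative, and the `[DONE]` sentinel follows the completion convention named in the list above.

```python
import json

def sse_event(event: dict) -> str:
    """Frame one stream event as an SSE message ("data: <json>\\n\\n");
    the payload shape is illustrative, not the exact protocol schema."""
    return f"data: {json.dumps(event)}\n\n"

def sse_done() -> str:
    """Emit the stream-completion sentinel."""
    return "data: [DONE]\n\n"
```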

Frontend Processing:

  • Data Stream Runtime: Parses and routes stream events to appropriate UI components
  • UI Components: Update in real-time based on received events
  • Tool UIs: Specialized visualizations for different tool types and their progress

Benefits:

  • Immediate Feedback: Users see processing start immediately
  • Progress Visibility: Tool execution progress is visible in real-time
  • Error Handling: Stream errors are displayed with context
  • Responsive UX: Interface remains interactive during processing
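Taken together, the event types above can be framed as SSE messages in a few lines of Python. This is a minimal sketch: the `sse_frame` helper and the exact payload fields (`id`, `delta`) are illustrative assumptions, not the backend's actual adapter code.

```python
import json

def sse_frame(event_type: str, payload: dict) -> str:
    """Format one Data Stream Protocol event as an SSE frame.
    The payload shape here is an assumption for illustration."""
    body = {"type": event_type, **payload}
    return f"data: {json.dumps(body)}\n\n"

def stream_answer(tokens):
    """Yield a minimal text-start / text-delta / text-end sequence for a
    single text part, terminated by the DONE sentinel."""
    yield sse_frame("text-start", {"id": "msg-1"})
    for token in tokens:
        yield sse_frame("text-delta", {"id": "msg-1", "delta": token})
    yield sse_frame("text-end", {"id": "msg-1"})
    yield "data: [DONE]\n\n"
```

On the frontend, the Data Stream Runtime parses each `data:` line back into an event and routes it to the matching UI component.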
```mermaid
flowchart LR
    subgraph "Backend Streaming"
        LG[LangGraph Events]
        AD[AI SDK Adapter]
        SSE[SSE Controller]
    end
    
    subgraph "Stream Events"
        TS[text-start]
        TD[text-delta]
        TE[text-end]
        TIS[tool-input-start]
        TID[tool-input-delta]
        TIA[tool-input-available]
        TOA[tool-output-available]
        SS[start-step]
        FS[finish-step]
        FIN[finish]
        DONE[DONE]
    end
    
    subgraph "Frontend Processing"
        DS[Data Stream Runtime]
        UI[UI Components]
        TU[Tool UIs]
    end
    
    LG --> AD
    AD --> TS
    AD --> TD
    AD --> TE
    AD --> TIS
    AD --> TID
    AD --> TIA
    AD --> TOA
    AD --> SS
    AD --> FS
    AD --> FIN
    AD --> DONE
    
    SSE --> DS
    DS --> UI
    DS --> TU
    
    style LG fill:#e1f5fe
    style DS fill:#e8f5e8
    style SSE fill:#fff3e0
```

## Configuration Architecture

The system uses a sophisticated configuration management approach that balances flexibility, security, and maintainability. Configuration is layered and validated to ensure system reliability.

### Configuration Design Philosophy

- **Separation of Concerns**: Different types of configuration are managed separately
- **Environment Flexibility**: Easy adaptation to different deployment environments
- **Security First**: Sensitive data is handled through secure channels
- **Type Safety**: All configuration is validated using Pydantic models
- **Runtime Adaptability**: Configuration can be updated without a system restart (where appropriate)

### Configuration Layers

**Core Application Settings (`config.yaml`):**

- Application server configuration (ports, CORS, memory settings)
- Database connection parameters
- Logging configuration
- Tool execution limits and timeouts

**LLM and Prompt Configuration (`llm_prompt.yaml`):**

- LLM provider settings and model parameters
- Specialized prompt templates for different agents
- Token limits and generation parameters

**Environment Variables:**

- Sensitive credentials (API keys, passwords)
- Environment-specific overrides
- Security tokens and certificates
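As a sketch of how these layers compose — assuming illustrative field names rather than the project's real schema — file-based settings can be validated with Pydantic while secrets arrive via environment variables:

```python
import os

from pydantic import BaseModel, Field

class PostgreSQLConfig(BaseModel):
    host: str = "localhost"
    port: int = 5432
    password: str = ""  # secret: filled from the environment, never from YAML

class AppConfig(BaseModel):
    server_port: int = 8000
    cors_enabled: bool = True
    postgres: PostgreSQLConfig = Field(default_factory=PostgreSQLConfig)

def load_config(yaml_data: dict) -> AppConfig:
    """Validate file-based settings, then apply environment overrides.
    `yaml_data` stands in for the parsed contents of config.yaml."""
    cfg = AppConfig(**yaml_data)
    # Hypothetical variable name; the real deployment's secret names may differ.
    if pw := os.environ.get("POSTGRES_PASSWORD"):
        cfg.postgres.password = pw
    return cfg
```

Validation fails fast at startup if a file value has the wrong type, which is what the "Type Safety" principle above buys in practice.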

### Configuration Management

```mermaid
graph TB
    subgraph "Configuration Sources"
        CF1[config.yaml<br/>Core Settings]
        CF2[llm_prompt.yaml<br/>LLM & Prompts]
        CF3[Environment Variables<br/>Secrets]
    end
    
    subgraph "Configuration Models"
        CM1[AppConfig]
        CM2[LLMConfig]
        CM3[PostgreSQLConfig]
        CM4[RetrievalConfig]
        CM5[LoggingConfig]
    end
    
    subgraph "Runtime Configuration"
        RC1[Cached Config]
        RC2[Validation]
        RC3[Type Safety]
        RC4[Hot Reload]
    end
    
    CF1 --> CM1
    CF1 --> CM3
    CF1 --> CM4
    CF1 --> CM5
    CF2 --> CM2
    CF3 --> CM1
    CF3 --> CM3
    CF3 --> CM4
    
    CM1 --> RC1
    CM2 --> RC1
    CM3 --> RC1
    CM4 --> RC1
    CM5 --> RC1
    
    RC1 --> RC2
    RC2 --> RC3
    RC3 --> RC4
    
    style CF1 fill:#e3f2fd
    style CF2 fill:#e8f5e8
    style CF3 fill:#fff3e0
```

### Service Configuration

The service configuration demonstrates the system's production-ready architecture:

**Core Services Configuration:**

- **Application Server**: FastAPI running on port 8000 with CORS enabled for cross-origin requests
- **Database**: Azure PostgreSQL with a 7-day TTL for automatic session cleanup
- **LLM Provider**: Configurable OpenAI/Azure OpenAI with multiple model support

**Retrieval Services Configuration:**

- **Azure AI Search**: Hybrid search with semantic ranking across multiple indices
- **Embedding Service**: Dedicated embedding generation service for vector search
- **Multi-Index Support**: Separate indices for standards, regulations, and user manuals

**Frontend Configuration:**

- **Next.js Web Server**: Port 3001 with server-side rendering and client-side hydration
- **API Proxy Layer**: CORS handling and request forwarding to backend services
- **Static Asset Management**: Optimized delivery of UI components and resources
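For illustration only, the core settings file might look like the fragment below — every key name here is an assumption, not the project's actual schema:

```yaml
# Illustrative config.yaml fragment; keys are assumptions, not the real schema.
app:
  port: 8000
  cors:
    enabled: true
postgres:
  host: example.postgres.database.azure.com   # placeholder host
  session_ttl_days: 7
retrieval:
  azure_search:
    mode: hybrid            # hybrid keyword + vector search
    semantic_ranking: true
frontend:
  port: 3001
```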
```mermaid
graph LR
    subgraph "Core Services"
        APP[Application<br/>Port: 8000<br/>CORS: Enabled]
        DB[PostgreSQL<br/>Host: Azure<br/>TTL: 7 days]
        LLM[LLM Provider<br/>OpenAI/Azure<br/>Model: Configurable]
    end
    
    subgraph "Retrieval Services"
        AZ[Azure AI Search<br/>Hybrid + Semantic<br/>Multi-Index]
        EM[Embedding Service<br/>qwen3-embedding-8b<br/>Vector Generation]
    end
    
    subgraph "Frontend Services"
        WEB[Next.js Web<br/>Port: 3001<br/>SSR + Client]
        API[API Routes<br/>Proxy Layer<br/>CORS Handling]
    end
    
    APP --> DB
    APP --> LLM
    APP --> AZ
    AZ --> EM
    WEB --> API
    API --> APP
```

## Deployment Architecture

The deployment architecture is designed for production scalability, reliability, and maintainability. It supports both cloud-native and containerized deployment patterns.

### Deployment Design Principles

- **Cloud-Native**: Leverages Azure cloud services for scalability and reliability
- **Containerization**: Docker-based deployment for consistency across environments
- **Load Distribution**: Multiple instances with proper load balancing
- **Health Monitoring**: Comprehensive health checks and performance monitoring
- **Graceful Scaling**: Auto-scaling capabilities based on demand

### Production Deployment

The production deployment implements a multi-tier architecture with proper separation of concerns:

**Load Balancer Tier:**

- Azure Load Balancer for high availability and traffic distribution
- SSL termination and security policy enforcement
- Health check routing to ensure traffic only reaches healthy instances

**Frontend Tier:**

- Multiple Next.js instances for redundancy and load distribution
- Static asset caching and CDN integration
- Server-side rendering for optimal performance

**Backend Tier:**

- Horizontally scalable FastAPI instances
- Connection pooling for database efficiency
- Shared session state through PostgreSQL

**Data Tier:**

- Azure PostgreSQL for persistent session storage
- Azure AI Search for document retrieval
- External LLM services (OpenAI/Azure OpenAI)

**Monitoring Tier:**

- Structured logging with centralized collection
- Health check endpoints for all services
- Performance metrics and alerting
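The health-check portion of the monitoring tier can be sketched as concurrent dependency probes. The probe bodies below are placeholders; in the real system they would touch PostgreSQL, Azure AI Search, and the LLM endpoint:

```python
import asyncio

async def check_postgres() -> bool:
    return True  # placeholder: would run `SELECT 1` against the session store

async def check_search() -> bool:
    return True  # placeholder: would ping the Azure AI Search index

async def health_report() -> dict:
    """Run all dependency probes concurrently and aggregate a status."""
    names = ["postgres", "azure_search"]
    results = await asyncio.gather(
        check_postgres(), check_search(), return_exceptions=True
    )
    checks = {name: result is True for name, result in zip(names, results)}
    return {"status": "ok" if all(checks.values()) else "degraded",
            "checks": checks}
```

Because `gather` captures exceptions, a failing probe degrades the report instead of crashing the endpoint — which is what lets the load balancer route traffic away from unhealthy instances.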
```mermaid
graph TB
    subgraph "Load Balancer"
        LB[Azure Load Balancer]
    end
    
    subgraph "Frontend Tier"
        WEB1[Next.js Instance 1]
        WEB2[Next.js Instance 2]
        WEB3[Next.js Instance N]
    end
    
    subgraph "Backend Tier"
        API1[FastAPI Instance 1]
        API2[FastAPI Instance 2]
        API3[FastAPI Instance N]
    end
    
    subgraph "Data Tier"
        PG[(Azure PostgreSQL<br/>Session Memory)]
        AZ[(Azure AI Search<br/>Document Indices)]
        LLM[LLM Services<br/>OpenAI/Azure OpenAI]
    end
    
    subgraph "Monitoring"
        LOG[Structured Logging]
        HEALTH[Health Checks]
        METRICS[Performance Metrics]
    end
    
    LB --> WEB1
    LB --> WEB2
    LB --> WEB3
    
    WEB1 --> API1
    WEB2 --> API2
    WEB3 --> API3
    
    API1 --> PG
    API2 --> PG
    API3 --> PG
    
    API1 --> AZ
    API2 --> AZ
    API3 --> AZ
    
    API1 --> LLM
    API2 --> LLM
    API3 --> LLM
    
    API1 --> LOG
    API2 --> LOG
    API3 --> LOG
    
    LOG --> HEALTH
    LOG --> METRICS
    
    style LB fill:#e1f5fe
    style PG fill:#e8f5e8
    style AZ fill:#fff3e0
    style LLM fill:#f3e5f5
```

### Container Architecture

The containerized deployment provides consistency and portability across environments:

**Frontend Container:**

- Next.js application with Node.js runtime
- Optimized build with static asset pre-generation
- Environment variable injection for configuration
- Health check endpoints for load balancer integration

**Backend Container:**

- FastAPI application with Python 3.12+ runtime
- Complete dependency tree including LangGraph and database drivers
- Multi-stage build for optimized container size
- Configuration validation on startup

**External Service Integration:**

- Azure PostgreSQL for session persistence
- Azure AI Search for document retrieval
- Azure OpenAI for language model capabilities

**Configuration Management:**

- Environment variables for runtime configuration
- Mounted configuration files for complex settings
- Secret management for sensitive credentials
- Health check integration for service discovery

**Benefits:**

- **Consistency**: Identical runtime environment across all deployments
- **Scalability**: Easy horizontal scaling of individual services
- **Maintainability**: Clear separation of application and infrastructure concerns
- **Security**: Isolated execution environments with minimal attack surface
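A multi-stage backend build along these lines might look like the sketch below — base image tags, paths, and the entrypoint module are assumptions, not the project's actual Dockerfile:

```dockerfile
# Illustrative multi-stage build; image tags and paths are assumptions.
FROM python:3.12-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --prefix=/install -r requirements.txt

FROM python:3.12-slim
WORKDIR /app
COPY --from=builder /install /usr/local
COPY . .
# Assumes the backend exposes a /health endpoint for the load balancer.
HEALTHCHECK CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```

The builder stage keeps compilers and build caches out of the final image, which is what "multi-stage build for optimized container size" refers to above.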
```mermaid
graph TB
    subgraph "Docker Containers"
        subgraph "Frontend Container"
            NEXT[Next.js<br/>Node.js Runtime<br/>Port: 3000]
        end
        
        subgraph "Backend Container"
            FAST[FastAPI<br/>Python Runtime<br/>Port: 8000]
            DEPS[Dependencies<br/>- LangGraph<br/>- psycopg<br/>- httpx]
        end
    end
    
    subgraph "External Services"
        PG_EXT[(Azure PostgreSQL)]
        AZ_EXT[(Azure AI Search)]
        LLM_EXT[Azure OpenAI]
    end
    
    subgraph "Configuration"
        ENV[Environment Variables]
        CONFIG[Configuration Files]
        SECRETS[Secret Management]
    end
    
    NEXT --> FAST
    FAST --> DEPS
    FAST --> PG_EXT
    FAST --> AZ_EXT
    FAST --> LLM_EXT
    
    ENV --> FAST
    ENV --> NEXT
    CONFIG --> FAST
    SECRETS --> FAST
    
    style NEXT fill:#e1f5fe
    style FAST fill:#e8f5e8
    style PG_EXT fill:#fff3e0
    style AZ_EXT fill:#fff3e0
    style LLM_EXT fill:#f3e5f5
```

## Security Architecture

Security is implemented as a multi-layered defense system addressing threats at every level of the application stack. The architecture follows security best practices and industry standards.

### Security Design Principles

- **Defense in Depth**: Multiple security layers prevent single points of failure
- **Least Privilege**: Components have minimal required permissions
- **Zero Trust**: All requests are validated regardless of source
- **Data Protection**: Sensitive data is encrypted at rest and in transit
- **Audit Trail**: Comprehensive logging for security monitoring and compliance
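Of the API-level controls in this architecture, rate limiting is the easiest to make concrete. The token bucket below is a minimal in-memory sketch; a multi-instance deployment would back it with a shared store rather than per-process state:

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter (illustrative, in-memory only)."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate              # tokens replenished per second
        self.capacity = capacity      # maximum burst size
        self.tokens = float(capacity)
        self.updated = time.monotonic()

    def allow(self) -> bool:
        """Consume one token if available; otherwise reject the request."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A per-session bucket keyed by session ID would slot in at the API gateway, before requests reach the LangGraph workflow.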

### Security Layers

```mermaid
graph TB
    subgraph "Frontend Security"
        CSP[Content Security Policy]
        CORS[CORS Configuration]
        XSS[XSS Protection]
        HTTPS[HTTPS Enforcement]
    end
    
    subgraph "API Security"
        AUTH[Session Authentication]
        RATE[Rate Limiting]
        VAL[Input Validation]
        CSRF[CSRF Protection]
    end
    
    subgraph "Data Security"
        ENC[Data Encryption]
        TLS[TLS Connections]
        KEY[Key Management]
        TTL[Data TTL/Cleanup]
    end
    
    subgraph "Infrastructure Security"
        VPN[Network Isolation]
        FW[Firewall Rules]
        IAM[Identity Management]
        AUDIT[Audit Logging]
    end
    
    CSP --> AUTH
    CORS --> AUTH
    XSS --> VAL
    HTTPS --> TLS
    
    AUTH --> ENC
    RATE --> ENC
    VAL --> ENC
    CSRF --> ENC
    
    ENC --> VPN
    TLS --> VPN
    KEY --> IAM
    TTL --> AUDIT
    
    style CSP fill:#ffebee
    style AUTH fill:#fff3e0
    style ENC fill:#e8f5e8
    style VPN fill:#e1f5fe
```

## Performance Architecture

The system is designed for optimal performance across all components, with careful attention to latency, throughput, and resource utilization. Performance optimization is implemented at every layer.

### Performance Design Principles

- **Latency Optimization**: Minimize time to first response and overall response time
- **Throughput Maximization**: Handle maximum concurrent users efficiently
- **Resource Efficiency**: Optimal use of CPU, memory, and network resources
- **Predictable Performance**: Consistent response times under varying loads
- **Scalable Architecture**: Performance scales linearly with additional resources

### Performance Optimization Strategies

**Frontend Performance:**

- **Server-Side Rendering**: Faster initial page loads and better SEO
- **Code Splitting**: Load only necessary JavaScript for each page
- **Browser Caching**: Aggressive caching of static assets and API responses
- **CDN Distribution**: Global content delivery for reduced latency

**Backend Performance:**

- **Asynchronous Processing**: Non-blocking I/O for maximum concurrency
- **Connection Pooling**: Efficient database connection management
- **Retry Logic**: Intelligent retry mechanisms for transient failures
- **Streaming Responses**: Immediate user feedback with progressive loading
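The retry logic above can be sketched as exponential backoff with jitter. Parameter names here are illustrative; the real policy would live in the HTTP client configuration:

```python
import random
import time

def retry(fn, attempts: int = 3, base_delay: float = 0.1, sleep=time.sleep):
    """Call `fn`, retrying transient failures with exponential backoff.
    Jitter spreads retries out so concurrent clients don't retry in lockstep."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = base_delay * (2 ** attempt) * (1 + random.random())
            sleep(delay)
```

In practice only transient errors (timeouts, 5xx responses) should be retried; retrying validation errors just wastes the backoff budget.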

**Data Performance:**

- **Search Indexing**: Optimized indices for fast document retrieval
- **Vector Optimization**: Efficient similarity search and ranking
- **Memory Management**: Smart caching and memory usage patterns
- **TTL Optimization**: Automatic cleanup to prevent performance degradation

**Infrastructure Performance:**

- **Auto Scaling**: Dynamic resource allocation based on demand
- **Load Balancing**: Optimal distribution of requests across instances
- **Performance Monitoring**: Real-time metrics and alerting
- **Alert Systems**: Proactive notification of performance issues
```mermaid
graph LR
    subgraph "Frontend Optimization"
        SSR[Server-Side Rendering]
        CODE[Code Splitting]
        CACHE[Browser Caching]
        CDN[CDN Distribution]
    end
    
    subgraph "Backend Optimization"
        ASYNC[Async Processing]
        POOL[Connection Pooling]
        RETRY[Retry Logic]
        STREAM[Streaming Responses]
    end
    
    subgraph "Data Optimization"
        INDEX[Search Indexing]
        VECTOR[Vector Optimization]
        MEMORY[Memory Management]
        TTL_OPT[TTL Optimization]
    end
    
    subgraph "Infrastructure Optimization"
        SCALE[Auto Scaling]
        BALANCE[Load Balancing]
        MONITOR[Performance Monitoring]
        ALERT[Alert Systems]
    end
    
    SSR --> ASYNC
    CODE --> POOL
    CACHE --> RETRY
    CDN --> STREAM
    
    ASYNC --> INDEX
    POOL --> VECTOR
    RETRY --> MEMORY
    STREAM --> TTL_OPT
    
    INDEX --> SCALE
    VECTOR --> BALANCE
    MEMORY --> MONITOR
    TTL_OPT --> ALERT
```

## Technology Stack

The technology stack represents a carefully curated selection of modern, production-ready technologies that work together seamlessly to deliver a robust and scalable solution.

### Technology Selection Criteria

- **Maturity**: Proven technologies with strong community support
- **Performance**: Optimized for speed and efficiency
- **Scalability**: Can grow with increasing demands
- **Developer Experience**: Tools that enhance productivity and maintainability
- **Ecosystem Integration**: Technologies that work well together

### Stack Components

**Frontend Technologies:**

- **Next.js 15**: Latest React framework with advanced features like the App Router and Server Components
- **React 19**: Modern React with concurrent features and improved performance
- **TypeScript**: Type safety and a better developer experience
- **Tailwind CSS**: Utility-first CSS framework for rapid UI development
- **assistant-ui**: Specialized components for AI chat interfaces

**Backend Technologies:**

- **FastAPI**: High-performance Python web framework with automatic API documentation
- **Python 3.12+**: Latest Python with performance improvements and new features
- **LangGraph v0.6+**: Advanced workflow orchestration for AI agents
- **Pydantic**: Data validation and settings management
- **asyncio**: Asynchronous programming for optimal concurrency

**Data Technologies:**

- **PostgreSQL**: Robust relational database for session storage
- **psycopg3**: Modern PostgreSQL adapter with async support
- **Azure AI Search**: Advanced search capabilities with hybrid and semantic search
- **Vector Embeddings**: Semantic similarity search for improved relevance

**Infrastructure Technologies:**

- **Docker**: Containerization for consistent deployments
- **Azure Cloud**: Comprehensive cloud platform with managed services
- **Health Monitoring**: Built-in monitoring and alerting capabilities
- **Structured Logging**: Comprehensive logging for debugging and monitoring

### Complete Technology Stack

```mermaid
mindmap
  root((Technology Stack))
    Frontend
      Next.js 15
      React 19
      TypeScript
      Tailwind CSS
      assistant-ui
    Backend
      FastAPI
      Python 3.12+
      LangGraph v0.6+
      Pydantic
      asyncio
    Memory
      PostgreSQL
      psycopg3
      LangGraph Checkpointer
      Connection Pooling
    Search
      Azure AI Search
      Hybrid Search
      Vector Embeddings
      Semantic Ranking
    LLM
      OpenAI API
      Azure OpenAI
      Streaming Support
      Function Calling
    DevOps
      Docker
      Azure Cloud
      Health Monitoring
      Structured Logging
```

## Conclusion

This Agentic RAG system represents a comprehensive solution for manufacturing standards and regulations queries, featuring:

### Key Architectural Achievements

- **Sophisticated Multi-Layer Architecture**: Clear separation of concerns with well-defined interfaces between the frontend, API gateway, backend services, and data layers
- **Advanced AI Capabilities**: LangGraph-powered multi-intent agents with intelligent routing and streaming responses
- **Production-Ready Implementation**: Comprehensive error handling, monitoring, health checks, and graceful fallback mechanisms
- **Modern Technology Stack**: Latest frameworks and best practices, including Next.js 15, React 19, FastAPI, and LangGraph v0.6+
- **Scalable Design**: Architecture ready for enterprise-scale deployment with horizontal scaling capabilities

### System Benefits

**For Users:**

- Intelligent, context-aware responses to complex manufacturing standards queries
- Real-time streaming with immediate feedback and progress visibility
- Multi-language support with automatic browser detection
- Persistent conversation history across sessions

**For Developers:**

- Clear, maintainable architecture with excellent documentation
- Comprehensive testing framework with unit and integration tests
- Configuration-driven deployment with environment flexibility
- Modern development tools and practices

**For Operations:**

- Docker-based deployment for consistency across environments
- Comprehensive monitoring and alerting capabilities
- Graceful degradation and fault tolerance
- Automated scaling and load balancing

### Design Excellence

The system demonstrates several aspects of excellent software design:

1. **Modularity**: Each component has a single, well-defined responsibility
2. **Extensibility**: New agents, tools, and features can be added without breaking existing functionality
3. **Reliability**: Multiple layers of error handling and fallback mechanisms
4. **Performance**: Optimized for both latency and throughput with streaming responses
5. **Security**: Multi-layered security architecture following industry best practices
6. **Maintainability**: Clean code structure with comprehensive documentation and testing

This architecture provides a solid foundation for current requirements while remaining flexible enough to accommodate future growth and enhancement. The system successfully combines the power of retrieval-augmented generation with intelligent agent orchestration to deliver accurate, grounded, and citable responses to complex manufacturing standards queries.