
Agentic RAG System Design Document

Overview

This document provides a comprehensive architectural overview of the Agentic RAG (Retrieval-Augmented Generation) system for manufacturing standards and regulations. The system combines LangGraph orchestration, streaming responses, and authoritative document retrieval to provide grounded answers with proper citations.

Design Philosophy

The Agentic RAG system is built on several key design principles:

  1. Intelligent Intent Recognition: The system automatically classifies user queries into different knowledge domains (standards/regulations vs. user manuals) to route them to specialized agents for optimal handling.

  2. Two-Phase Retrieval Strategy: For standards and regulations queries, the system first discovers relevant document metadata, then performs detailed content retrieval with enhanced query conditions based on the metadata findings.

  3. Streaming-First Architecture: All responses are delivered via Server-Sent Events (SSE) with real-time token streaming and tool execution progress, providing immediate feedback to users.

  4. Session-Aware Memory: Persistent conversation history stored in PostgreSQL enables context-aware multi-turn conversations while maintaining session isolation.

  5. Production-Ready Design: Comprehensive error handling, health monitoring, configuration management, and graceful fallback mechanisms ensure system reliability.

System Architecture

The Agentic RAG system employs a modern microservices architecture with clear separation of concerns across multiple layers. Each layer has specific responsibilities and communicates through well-defined interfaces.

Architecture Design Principles

  • Layered Architecture: Clear separation between presentation, business logic, data access, and external services
  • Asynchronous Processing: Non-blocking operations throughout the request pipeline for optimal performance
  • Horizontal Scalability: Stateless services that can be scaled independently based on load
  • Fault Tolerance: Graceful degradation and fallback mechanisms at every layer
  • Configuration-Driven: Environment-specific settings externalized for flexible deployment

High-Level Architecture

graph TB
    subgraph "Frontend Layer"
        UI[Next.js Web UI<br/>@assistant-ui/react]
        TR[Thread Component]
        TU[Tool UI Components]
        LS[Language Switcher]
    end
    
    subgraph "API Gateway Layer"
        NX[Next.js API Routes<br/>/api/chat]
        DP[Data Stream Protocol<br/>SSE Adapter]
    end
    
    subgraph "Backend Service Layer"
        FA[FastAPI Server<br/>Port 8000]
        AS[AI SDK Adapter]
        SC[SSE Controller]
    end
    
    subgraph "Agent Orchestration Layer"
        LG[LangGraph Workflow]
        IR[Intent Recognition]
        SA[Standards Agent]
        MA[Manual Agent]
        PP[Post Processor]
    end
    
    subgraph "Memory Layer"
        PG[(PostgreSQL<br/>Session Store)]
        CH[Checkpointer]
        MM[Memory Manager]
    end
    
    subgraph "Retrieval Layer"
        AZ[Azure AI Search]
        EM[Embedding Service]
        IDX[Search Indices]
    end
    
    subgraph "LLM Layer"
        LLM[LLM Provider<br/>OpenAI/Azure OpenAI]
        CF[Configuration]
    end
    
    UI --> NX
    TR --> NX
    TU --> NX
    LS --> UI
    NX --> DP
    DP --> FA
    FA --> AS
    AS --> SC
    SC --> LG
    LG --> IR
    IR --> SA
    IR --> MA
    SA --> PP
    MA --> PP
    LG --> CH
    CH --> PG
    MM --> PG
    SA --> AZ
    MA --> AZ
    AZ --> EM
    AZ --> IDX
    SA --> LLM
    MA --> LLM
    LLM --> CF

Component Architecture

The system is organized into several key component groups, each responsible for specific aspects of the application functionality:

Web Frontend Components:

  • Assistant Component: The main orchestrator that manages the overall chat experience
  • Thread UI: Handles conversation display and user interaction
  • Tool UIs: Specialized visualizations for different tool types (search, retrieval, analysis)
  • Language Support: Multi-language interface with automatic browser detection

Backend Core Components:

  • FastAPI Main: Central application server handling HTTP requests and responses
  • AI SDK Chat Endpoint: Specialized endpoint implementing the Data Stream Protocol for streaming responses
  • SSE Stream Controller: Manages Server-Sent Events for real-time communication
  • Configuration Manager: Centralized configuration loading and validation

Agent System Components:

  • LangGraph StateGraph: Core workflow engine managing agent execution
  • Intent Router: Intelligent classifier determining the appropriate agent for each query
  • Agent Nodes: Specialized processing units for different query types
  • Tool Nodes: Execution environment for retrieval and analysis tools
  • Memory System: Persistent storage and retrieval of conversation context

graph LR
    subgraph "Web Frontend"
        direction TB
        A1[Assistant Component]
        A2[Thread UI]
        A3[Tool UIs]
        A4[Language Support]
        A1 --> A2
        A1 --> A3
        A1 --> A4
    end
    
    subgraph "Backend Core"
        direction TB
        B1[FastAPI Main]
        B2[AI SDK Chat Endpoint]
        B3[SSE Stream Controller]
        B4[Configuration Manager]
        B1 --> B2
        B2 --> B3
        B1 --> B4
    end
    
    subgraph "Agent System"
        direction TB
        C1[LangGraph StateGraph]
        C2[Intent Router]
        C3[Agent Nodes]
        C4[Tool Nodes]
        C5[Memory System]
        C1 --> C2
        C2 --> C3
        C3 --> C4
        C1 --> C5
    end
    
    subgraph "Data Layer"
        direction TB
        D1[PostgreSQL Memory]
        D2[Azure AI Search]
        D3[LLM Services]
        D4[Configuration Store]
    end
    
    A1 -.-> B2
    B3 --> C1
    C4 --> D2
    C3 --> D3
    C5 --> D1
    B4 --> D4

Workflow Design

The Agentic RAG system implements sophisticated workflow patterns to handle different types of queries efficiently. The workflows are designed to be autonomous, adaptive, and optimized for the specific characteristics of each query type.

Agentic Workflow Architecture Advantages

Our system adopts the Agentic Workflow paradigm, which balances autonomy and control in AI system design. This approach combines the strengths of both traditional AI workflows and AI agents:

AI Workflow Patterns Comparison:

  1. AI Workflows: Deterministic, predesigned pipelines with highest predictability but lowest autonomy
  2. AI Agents: Reason-act loops that decide next steps with higher autonomy but variable reliability
  3. Agentic Workflows: Orchestrated graphs that embed one or more agents with guardrails, memory, and tools - delivering both autonomy and control

Our Agentic Workflow Benefits:

  • Controlled Autonomy: Agents can make autonomous decisions within well-defined guardrails and tool constraints
  • Predictable Behavior: LangGraph orchestration ensures reproducible workflows while allowing agent flexibility
  • Robust Error Handling: Built-in guardrails prevent agents from making unreliable or unsafe decisions
  • Memory-Aware Processing: Persistent session memory enables context-aware autonomous decision making
  • Tool-Constrained Intelligence: Agents operate within a curated set of tools, ensuring reliable and relevant outputs
  • Multi-Agent Coordination: Different specialized agents handle different query types with orchestrated handoffs
  • Adaptive Execution: Agents can autonomously decide on tool usage and multi-round execution while staying within system limits

Architectural Implementation:

  • LangGraph StateGraph: Provides the orchestrated graph structure with defined state transitions
  • Intent Recognition Router: Ensures queries reach the most appropriate specialized agent
  • Tool Round Limits: Guardrails prevent infinite loops while allowing autonomous multi-step reasoning
  • Session Memory: Enables context-aware decisions across conversation turns
  • Streaming Feedback: Real-time progress visibility provides user confidence in autonomous processing
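
The tool-round guardrail described above can be sketched as a plain loop. This is an illustrative stand-in, not the actual LangGraph implementation; `MAX_TOOL_ROUNDS`, `call_llm`, and `execute_tools` are hypothetical names introduced here for the example.

```python
# Illustrative sketch of the tool-round guardrail (names are hypothetical,
# not the system's actual API).
MAX_TOOL_ROUNDS = 3  # guardrail: hard cap on autonomous tool rounds

def run_agent_turn(query, call_llm, execute_tools):
    """Run one agent turn: the agent may request tools for up to
    MAX_TOOL_ROUNDS rounds before it must synthesize an answer."""
    context = [query]
    for _ in range(MAX_TOOL_ROUNDS):
        decision = call_llm(context)
        if not decision.get("tool_calls"):
            return decision["answer"]           # agent chose to finish early
        context.append(execute_tools(decision["tool_calls"]))
    # Guardrail reached: force a final synthesis without further tool use
    final = call_llm(context + ["<no more tool calls allowed>"])
    return final["answer"]
```

The same bound prevents infinite reason-act loops while still letting the agent stop early when it already has enough information.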

Workflow Design Principles

  1. Intent-Driven Routing: Automatic classification ensures queries are handled by the most appropriate specialized agent
  2. Multi-Round Tool Execution: Agents can autonomously decide to use multiple tools in sequence to gather comprehensive information
  3. Parallel Processing: Multiple retrieval operations can execute simultaneously to reduce response time
  4. Context Preservation: Conversation history is maintained and used to enhance subsequent queries
  5. Citation Generation: All responses include proper source attribution with automatic citation extraction
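
The routing contract from principle 1 can be illustrated with a minimal sketch. The real router is LLM-based; the keyword matching below is only a simplified stand-in, and the label names "standards" and "manual" are assumptions for the example.

```python
# Simplified stand-in for the LLM-based intent router: keyword matching
# here only illustrates the routing contract, not the production classifier.
MANUAL_HINTS = ("how do i", "user manual", "操作", "使用说明")

def route_intent(query: str) -> str:
    """Return which specialized agent should handle the query."""
    q = query.lower()
    if any(hint in q for hint in MANUAL_HINTS):
        return "manual"       # route to the User Manual RAG Agent
    return "standards"        # default: Standards/Regulations RAG Agent
```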

Agentic Workflow

The core workflow demonstrates the Agentic Workflow pattern with orchestrated agent execution, guardrails, and autonomous decision-making within controlled boundaries. Each specialized agent operates with autonomy while being constrained by system guardrails and tool limitations.

flowchart TD
    START([User Query]) --> IR{Intent Recognition}
    
    IR -->|User Manual| UMA[User Manual RAG Agent]
    IR -->|Standards/Regulations| SRA[Standards/Regulations RAG Agent]
    
    subgraph "Standards/Regulations RAG"
        SRA --> SRT{Need Tools?}
        SRT -->|Yes| STL[Standards/Regulations Retrieval Tools<br/>Parallel Execution]
        SRT -->|No| SRS[Answer Synthesis]
        STL --> STC{Continue?}
        STC -->|Yes| QR2[Query Enhancement/<br/>Refinement]
        QR2 --> SRT
        STC -->|No| SRS
    end
    
    subgraph "User Manual RAG"
        UMA --> UMT{Need Tools?}
        UMT -->|Yes| UML[User Manual Retrieval Tools<br/>Parallel Execution]
        UMT -->|No| UMS[Answer Synthesis]
        UML --> UMC{Continue?}
        UMC -->|Yes| QR4[Query Enhancement/<br/>Refinement]
        QR4 --> UMT
        UMC -->|No| UMS
    end
    
    SRS --> SPP[Citation Builder]
    SPP --> END1([Response with Citations])
    
    UMS --> END2([Response])

    style IR fill:#e1f5fe
    style SRA fill:#f3e5f5
    style UMA fill:#e8f5e8
    style STL fill:#fff3e0
    style UML fill:#fff3e0

Agentic Workflow Features Demonstrated:

  • Orchestrated Graph Structure: LangGraph manages the overall workflow with defined state transitions
  • Embedded Specialized Agents: Different agents (Standards/Regulations, User Manual) handle domain-specific queries
  • Intelligent Query Rewriting/Decomposition: Core agentic feature where agents autonomously analyze, decompose, and rewrite queries for optimal retrieval coverage - demonstrating true query understanding and strategic planning
  • Autonomous Decision Making: Agents decide whether tools are needed and when to continue or finish
  • Built-in Guardrails: Tool round limits and workflow constraints prevent infinite loops
  • Memory Integration: Conversation context influences agent decisions throughout the workflow
  • Tool Orchestration: Agents autonomously select and execute appropriate tools within defined boundaries
  • Adaptive Query Intelligence: Agents learn from retrieval results and iteratively refine queries, showcasing emergent intelligence
  • Controllable Citation List and Links: The workflow tracks citations precisely, automatically mapping retrieved sources to generated content, and dynamically constructs formatted citation lists and secure link URLs from rule-based logic

Query Rewriting/Decomposition in Agentic Workflow - The Core Intelligence Feature:

This is the defining characteristic that elevates our solution from simple RAG to true Agentic RAG. The agents demonstrate genuine understanding and strategic thinking through sophisticated query processing:

  • Cognitive Query Analysis: Agents autonomously analyze user queries to understand intent, identify ambiguities, and infer implicit information requirements
  • Strategic Multi-Perspective Decomposition: Agents intelligently break down complex queries into 2-3 complementary sub-queries that explore different conceptual aspects, ensuring comprehensive coverage
  • Cross-Language Intelligence: Agents automatically generate semantically equivalent bilingual query variants (Chinese/English), demonstrating deep linguistic understanding
  • Context-Aware Strategic Rewriting: Agents incorporate conversation history and domain knowledge to refine and enhance queries, showing memory-driven intelligence
  • Autonomous Parallel Query Orchestration: Agents independently decide to execute multiple rewritten queries in parallel, optimizing for both speed and coverage
  • Iterative Learning and Refinement: Based on retrieval results, agents autonomously enhance queries for subsequent rounds, demonstrating learning and adaptation
  • Metadata-Informed Query Enhancement: For Phase 2 retrieval, agents intelligently synthesize metadata constraints from Phase 1 results, showing multi-step reasoning capability

Citation Management in Agentic Workflow - Enhanced Accountability and Traceability:

The Agentic Workflow provides fine-grained control and precision in citation management, beyond what traditional RAG systems offer:

  • Autonomous Citation Tracking: Agents automatically track all tool calls and their results throughout multi-step workflows, maintaining complete provenance information
  • Fine-Grained Source Mapping: Each citation is precisely mapped to specific tool call results with unique identifiers, enabling exact source traceability
  • Multi-Round Citation Coherence: Agents maintain consistent citation numbering across multiple tool execution rounds, preventing citation conflicts or duplication
  • Intelligent Citation Placement: Agents strategically place citations based on content relevance and source quality, not just chronological order
  • Cross-Tool Citation Integration: Citations seamlessly integrate results from different tools (metadata search, content search) within a unified numbering system
  • Post-Processing Citation Enhancement: Dedicated post-processing nodes enrich citations with additional metadata (URLs, document titles, publication dates) for comprehensive reference lists
  • Citation Quality Control: Agents filter and validate citation sources based on relevance scores and metadata quality, ensuring only high-quality references are included

Citation Processing Workflow:

  1. Real-time Citation Capture: As agents execute tools, each result is automatically tagged with tool call ID and order number
  2. Strategic Citation Assignment: Agents intelligently assign citation numbers based on content importance and source authority
  3. Citation Map Generation: Agents generate structured citation mappings in standardized CSV format for processing
  4. Post-Processing Enhancement: Dedicated nodes transform raw citation data into formatted reference lists with complete metadata
  5. Quality Validation: Final citation lists undergo validation to ensure accuracy and completeness

This systematic approach ensures that every piece of information can be traced back to its exact source, providing users with the confidence and transparency required for regulatory and compliance use cases.
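
Steps 3 and 4 above can be sketched as follows. The CSV column names (`citation_no`, `tool_call_id`, `order`) and the metadata fields are assumptions for illustration; the actual citation-map schema may differ.

```python
import csv
import io

def build_reference_list(citation_csv: str, sources: dict) -> list[str]:
    """Turn an agent-emitted citation map (CSV with assumed columns
    citation_no,tool_call_id,order) into a formatted reference list,
    enriching each entry from the tool-result metadata in `sources`
    keyed by (tool_call_id, order)."""
    refs = []
    for row in csv.DictReader(io.StringIO(citation_csv)):
        meta = sources[(row["tool_call_id"], int(row["order"]))]
        refs.append(f'[{row["citation_no"]}] {meta["title"]} ({meta["url"]})')
    return refs
```

Because every row keys back to a specific tool call and result position, each reference in the final list remains traceable to its exact source.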

Query Processing Strategies - Domain-Specific Intelligence in Action:

The following strategies demonstrate how our agentic approach applies query rewriting/decomposition differently based on the target domain, showcasing true adaptive intelligence:

  1. Standards/Regulations Queries:

    • Phase 1: Generate 2-3 parallel metadata-focused sub-queries
    • Phase 2: Enhance queries with document codes and metadata constraints from Phase 1
    • Lucene Syntax: Intelligent use of advanced search syntax for precision filtering
  2. User Manual Queries:

    • Content-Focused: Generate queries targeting procedural and instructional content
    • Multi-Modal: Consider both textual content and structural elements (headers, sections)
    • Context Integration: Incorporate previous tool results for query refinement
  3. Cross-Agent Learning - Advanced Agentic Intelligence:

    • Query Pattern Recognition: Agents learn from successful query patterns across sessions, demonstrating emergent learning capabilities
    • Adaptive Rewriting: Query strategies evolve and adapt based on retrieval success rates, showing continuous improvement
    • Domain-Specific Optimization: Each agent develops specialized query rewriting patterns for its domain, demonstrating specialized expertise development

Two-Phase Retrieval Strategy

The standards and regulations agent employs a sophisticated two-phase retrieval strategy designed to maximize accuracy and relevance:

Phase 1: Metadata Discovery with Query Decomposition

  • Query Analysis: Agent analyzes user intent and decomposes complex queries into focused sub-queries
  • Multi-Perspective Rewriting: Generates 2-3 parallel sub-queries exploring different aspects of the user's intent
  • Cross-Language Coverage: Automatically includes both Chinese and English query variants for comprehensive search
  • Metadata-Focused Queries: Searches for document attributes, codes, titles, and publication information
  • Parallel Execution: Multiple rewritten queries execute simultaneously to maximize metadata coverage
  • Result Analysis: Agent synthesizes metadata findings to identify relevant standards and regulations
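
The parallel execution step above can be sketched with `asyncio`. Here `search_metadata` is a hypothetical stand-in for the Azure AI Search call, and deduplicating merged hits by document code is an assumption for the example.

```python
import asyncio

async def phase1_metadata_discovery(sub_queries, search_metadata):
    """Run all decomposed Phase 1 sub-queries concurrently and merge the
    metadata hits (search_metadata stands in for the Azure AI Search call;
    dedup by document_code is illustrative)."""
    results = await asyncio.gather(*(search_metadata(q) for q in sub_queries))
    seen, merged = set(), []
    for hits in results:
        for hit in hits:
            if hit["document_code"] not in seen:
                seen.add(hit["document_code"])
                merged.append(hit)
    return merged
```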

Phase 2: Content Retrieval with Query Enhancement (conditional)

  • Need Assessment: Agent autonomously determines if detailed content retrieval is required
  • Query Enhancement: Intelligently incorporates metadata constraints from Phase 1 results
  • Lucene Syntax Integration: Uses advanced search syntax with metadata filtering (e.g., (content_query) AND (document_code:(ISO45001 OR GB6722)))
  • Context-Aware Refinement: Enhances queries with conversation history and previous tool results
  • Focused Content Search: Retrieves detailed document chunks with full context and precise filtering
  • Multi-Round Capability: Can perform additional query refinement based on initial content results

Query Rewriting Examples:

Original Query: "汽车安全要求标准" (Automotive Safety Requirements Standards)

Phase 1 Decomposed Queries:

  1. "汽车安全标准 automotive safety standards GB ISO requirements"
  2. "车辆安全要求 vehicle safety requirements regulations 法规"
  3. "automotive safety standards ISO GB national standards 汽车"

Phase 2 Enhanced Queries (if Phase 1 found relevant documents):

  1. (安全要求 safety requirements) AND (document_code:(GB11551 OR ISO26262 OR GB7258))
  2. (automotive safety testing procedures) AND (document_category:Standard) AND (x_Standard_Vehicle_Type:passenger)
  3. (车辆安全技术条件) AND (publisher:国家标准委 OR SAC) AND (x_Standard_Published_State_EN:Effective)

This strategy ensures that users receive both overview information and detailed content as needed, while maintaining high precision through metadata-enhanced filtering and intelligent query decomposition.
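
The Phase 2 enhancement pattern from the examples above can be sketched as a small query builder. The `document_code` field name comes from the examples; term escaping and the richer constraints (publisher, category) are omitted for brevity.

```python
def enhance_with_metadata(content_query: str, document_codes: list[str]) -> str:
    """Build a Phase 2 Lucene query from a content query plus the document
    codes discovered in Phase 1, mirroring the examples above (escaping and
    additional metadata constraints omitted)."""
    if not document_codes:
        return content_query                      # no Phase 1 constraints found
    code_filter = " OR ".join(document_codes)
    return f"({content_query}) AND (document_code:({code_filter}))"
```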

Agentic Workflow in Two-Phase Retrieval:

  • Autonomous Phase Detection: Agents autonomously determine when Phase 2 retrieval is needed based on query analysis
  • Dynamic Query Enhancement: Agents intelligently enhance Phase 2 queries using metadata from Phase 1 results
  • Controlled Tool Execution: Tool usage is governed by workflow guardrails while allowing agent flexibility
  • Memory-Informed Decisions: Previous conversation context influences retrieval strategy decisions
  • Parallel Processing Autonomy: Agents can autonomously decide on parallel query execution for optimal coverage

sequenceDiagram
    participant U as User
    participant A as Agent
    participant QR as Query Rewriter
    participant P1 as Phase 1 Tool
    participant P2 as Phase 2 Tool
    participant AS as Azure Search
    participant LLM as LLM Service
    
    U->>A: Original query about standards
    A->>QR: Analyze and decompose query
    QR->>QR: Generate 2-3 sub-queries
    QR->>QR: Add cross-language variants
    QR-->>A: Rewritten query set
    
    par Phase 1: Parallel Metadata Discovery
        A->>P1: retrieve_standard_regulation(rewritten_query_1)
        A->>P1: retrieve_standard_regulation(rewritten_query_2)
        A->>P1: retrieve_standard_regulation(rewritten_query_3)
        P1->>AS: Search metadata index (parallel)
        AS-->>P1: Standards metadata results
        P1-->>A: Document codes, titles, dates
    end
    
    A->>A: Analyze metadata results
    A->>A: Determine if content needed
    A->>QR: Assess need for Phase 2
    
    opt Phase 2: Enhanced Content Retrieval
        QR->>QR: Enhance queries with metadata constraints
        QR->>QR: Apply Lucene syntax filtering
        QR-->>A: Enhanced query with metadata filters
        A->>P2: retrieve_doc_chunk_standard_regulation(enhanced_query_with_constraints)
        P2->>AS: Search content index + metadata filters
        AS-->>P2: Filtered document chunks
        P2-->>A: Detailed content with context
    end
    
    A->>LLM: Synthesize with retrieved data
    LLM-->>A: Generated response
    A->>A: Extract citations from all sources
    A-->>U: Final answer with citations
    
    Note over QR: Query Rewriting Strategies:<br/>- Multi-perspective decomposition<br/>- Cross-language variants<br/>- Context-aware enhancement<br/>- Metadata constraint integration

Memory Management Flow

The system implements sophisticated session management with PostgreSQL-based persistence:

Session Lifecycle Management:

  • Unique session IDs generated for each conversation thread
  • Automatic session initialization with proper memory allocation
  • Conversation turns tracked with message ordering and timestamps
  • Intelligent message trimming to stay within context length limits
  • Persistent storage with 7-day TTL for automatic cleanup
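
The message-trimming step can be sketched as below. This is a simplified stand-in: a crude character count replaces real token counting, and the budget value is illustrative.

```python
def trim_messages(messages, max_chars=8000, keep_system=True):
    """Drop the oldest conversation turns until the history fits the budget
    (character count is a crude stand-in for token counting; the budget is
    an illustrative value, not the system's actual limit)."""
    system = [m for m in messages if m["role"] == "system"] if keep_system else []
    turns = [m for m in messages if m["role"] != "system"]
    while turns and sum(len(m["content"]) for m in system + turns) > max_chars:
        turns.pop(0)                      # drop the oldest non-system message
    return system + turns
```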

Memory Architecture Benefits:

  • Cross-Request Continuity: Conversations persist across browser sessions
  • Context-Aware Responses: Agents can reference previous exchanges
  • Scalable Storage: PostgreSQL provides reliable, scalable persistence
  • Automatic Cleanup: TTL-based garbage collection prevents storage bloat
  • Fault Tolerance: Graceful fallback to in-memory storage if PostgreSQL is unavailable

Agentic Workflow Memory Integration:

  • Context-Driven Autonomy: Agents make informed decisions based on conversation history
  • Memory-Aware Tool Selection: Previous tool results influence future tool choices
  • Session-Aware Guardrails: Memory context helps agents avoid redundant operations
  • Adaptive Workflow Paths: Conversation context guides agent workflow decisions
  • Persistent Learning: Agents can build upon previous conversation context for improved responses

flowchart TD
    subgraph "Session Lifecycle"
        SS[Session Start] --> SI[Session ID Generation]
        SI --> SM[Memory Initialization]
        SM --> CT[Conversation Turns]
        CT --> TM[Message Trimming]
        TM --> PS[Persistent Storage]
        PS --> TTL[TTL Cleanup]
        TTL --> SE[Session End]
    end
    
    subgraph "PostgreSQL Memory"
        SM --> CP[Create Checkpointer]
        CP --> PG[(PostgreSQL DB)]
        PS --> PW[Put Writes]
        PW --> PG
        TM --> TR[Trim Messages]
        TR --> PG
        TTL --> CL[Cleanup Old Records]
        CL --> PG
    end
    
    subgraph "Fallback Strategy"
        CP --> FB{PostgreSQL Available?}
        FB -->|No| IM[In-Memory Store]
        FB -->|Yes| PG
    end
    
    style PG fill:#e3f2fd
    style IM fill:#fff3e0
    style FB fill:#ffebee

Feature Architecture

The Agentic RAG system provides a comprehensive set of features designed for professional manufacturing standards and regulations queries. Each feature is implemented with production-grade quality and user experience considerations.

Feature Design Philosophy

  • User-Centric Design: All features prioritize ease of use and clear information presentation
  • Real-Time Feedback: Users receive immediate feedback on system processing and tool execution
  • Source Transparency: All responses include clear attribution and citation links
  • Multi-Modal Support: Text, visual, and interactive elements enhance information comprehension
  • Accessibility: Interface supports multiple languages and responsive design patterns

Core Features

mindmap
  root((Agentic RAG Features))
    Multi-Intent System
      Intent Recognition
      Domain Routing
      Specialized Agents
    Real-time Streaming
      SSE Protocol
      Token Streaming
      Tool Progress
      Citation Updates
    Advanced Retrieval
      Two-Phase Strategy
      Parallel Queries
      Metadata Enhancement
      Content Filtering
    Session Memory
      PostgreSQL Storage
      7-Day TTL
      Context Trimming
      Cross-Request State
    Modern Web UI
      assistant-ui Components
      Tool Visualizations
      Multi-language Support
      Responsive Design
    Production Ready
      Error Handling
      Health Monitoring
      Configuration Management
      Docker Support

Tool System Architecture

The tool system provides the core retrieval and analysis capabilities that power the agent workflows:

Tool Design Principles:

  • Query Intelligence: Advanced query rewriting and decomposition before tool execution
  • Modularity: Each tool has a single, well-defined responsibility
  • Composability: Tools can be combined in various workflows with rewritten queries
  • Observability: All tool executions provide detailed progress feedback
  • Error Resilience: Robust error handling with meaningful error messages
  • Performance: Optimized for both accuracy and response time through smart query enhancement

Query Processing Integration: Before any tool execution, the system applies sophisticated query rewriting and decomposition:

  1. Multi-perspective Decomposition: Breaking complex queries into focused sub-queries
  2. Cross-language Variants: Generating Chinese/English query variants for comprehensive coverage
  3. Context Enhancement: Adding domain-specific context and terminology
  4. Metadata Constraint Integration: Incorporating document type, date, and source constraints

This preprocessing ensures that each tool receives optimally crafted queries for maximum retrieval effectiveness.
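
The decomposition contract can be sketched with a template-based stand-in. The production system prompts an LLM to do this; the fixed templates and terms below are examples only.

```python
# Template-based stand-in for the LLM-driven query rewriting described
# above; the real system generates sub-queries with an LLM, and the added
# terms here are illustrative.
def decompose_query(query: str, translation: str) -> list[str]:
    """Produce complementary bilingual sub-queries, mirroring the
    multi-perspective / cross-language strategy."""
    return [
        f"{query} {translation} standards requirements",
        f"{translation} regulations 法规 {query}",
        f"{query} GB ISO national standards",
    ]
```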

Tool Categories:

Standards Tools: Specialized for regulatory and standards documents with intelligent query enhancement

  • retrieve_standard_regulation: Discovers document metadata using decomposed and rewritten queries
  • retrieve_doc_chunk_standard_regulation: Retrieves detailed content with metadata-enhanced filtering

User Manual Tools: Optimized for system documentation with context-aware query processing

  • retrieve_doc_chunk_user_manual: Searches user guides using rewritten queries for better coverage

Query Enhancement Integration: All tools benefit from the query processing pipeline:

  • Phase 1 Tools receive multiple decomposed queries for comprehensive metadata discovery
  • Phase 2 Tools receive enhanced queries with metadata constraints for precise content retrieval
  • Cross-tool Coordination ensures consistent query interpretation across different tool types

Azure AI Search Integration: All tools leverage advanced search capabilities with query intelligence

  • Smart Query Processing: Handles multiple rewritten queries with parallel execution
  • Hybrid Search: Combines keyword and vector search for decomposed query components
  • Semantic Ranking: Improved result relevance through query understanding
  • Cross-language Support: Processes Chinese/English query variants seamlessly
  • Metadata-aware Filtering: Applies enhanced constraints from query rewriting
  • Score Aggregation: Combines results from multiple query variants for comprehensive coverage
  • Multi-field Search: Searches across content and metadata with context-enhanced queries

graph TB
    subgraph "Query Processing Pipeline"
        QI[Query Input] --> QR[Query Rewriter & Decomposer]
        QR --> QA[Query Analyzer]
        QA --> QD[Query Dispatcher]
    end
    
    subgraph "Query Rewriting Strategies"
        QR --> QR1[Multi-perspective Decomposition]
        QR --> QR2[Cross-language Variants] 
        QR --> QR3[Context Enhancement]
        QR --> QR4[Metadata Constraint Integration]
    end
    
    subgraph "Tool Categories"
        ST[Standards Tools]
        UT[User Manual Tools]
    end
    
    subgraph "Standards Tools"
        ST1[retrieve_standard_regulation<br/>Metadata Search + Query Decomposition]
        ST2[retrieve_doc_chunk_standard_regulation<br/>Content Search + Enhanced Queries]
    end
    
    subgraph "User Manual Tools"
        UT1[retrieve_doc_chunk_user_manual<br/>Manual Search + Rewritten Queries]
    end
    
    subgraph "Tool Execution"
        TE[Tool Executor]
        PS[Parallel Scheduling]
        ER[Error Recovery]
        RF[Result Formatting]
    end
    
    subgraph "Azure AI Search Integration"
        HS[Hybrid Search]
        VS[Vector Search]
        SR[Semantic Ranking]
        RS[Result Scoring]
    end
    
    QD --> ST
    QD --> UT
    
    ST --> ST1
    ST --> ST2
    UT --> UT1
    
    ST1 --> TE
    ST2 --> TE
    UT1 --> TE
    
    TE --> PS
    PS --> ER
    ER --> RF
    
    TE --> HS
    HS --> VS
    VS --> SR
    SR --> RS

Data Flow Architecture

The system implements sophisticated data flow patterns optimized for real-time streaming and multi-step processing. Understanding these flows is crucial for system maintenance and optimization.

Data Flow Design Principles

  • Streaming-First: All responses use streaming protocols for immediate user feedback
  • Event-Driven: System components communicate through well-defined events
  • Backpressure Handling: Proper flow control prevents system overload
  • Error Propagation: Errors are handled gracefully with meaningful user feedback
  • Observability: Comprehensive logging and monitoring throughout all flows

Request-Response Flow

sequenceDiagram
    participant Browser as Web Browser
    participant NextJS as Next.js API
    participant FastAPI as FastAPI Backend
    participant LangGraph as LangGraph Engine
    participant Memory as PostgreSQL
    participant Search as Azure Search
    participant LLM as LLM Provider
    
    Browser->>NextJS: POST /api/chat
    NextJS->>FastAPI: Forward request
    FastAPI->>Memory: Load session
    Memory-->>FastAPI: Session data
    
    FastAPI->>LangGraph: Start workflow
    LangGraph->>LangGraph: Intent recognition
    
    alt Standards/Regulations Query
        LangGraph->>Search: Phase 1: Metadata search
        Search-->>LangGraph: Standards metadata
        LangGraph->>Search: Phase 2: Content search
        Search-->>LangGraph: Document chunks
    else User Manual Query
        LangGraph->>Search: Manual content search
        Search-->>LangGraph: Manual chunks
    end
    
    LangGraph->>LLM: Generate response
    LLM-->>LangGraph: Streamed tokens
    LangGraph->>LangGraph: Extract citations
    LangGraph->>Memory: Save conversation
    
    LangGraph-->>FastAPI: Streamed response
    FastAPI-->>NextJS: SSE stream
    NextJS-->>Browser: Data stream protocol

Streaming Data Flow

The streaming architecture implements the Data Stream Protocol for real-time communication:

Stream Event Types:

  • Text Events: text-start, text-delta, text-end for response content
  • Tool Events: tool-input-start, tool-input-delta, tool-input-available for tool parameters
  • Tool Results: tool-output-available for tool execution results
  • Step Events: start-step, finish-step for workflow progress
  • Control Events: finish, DONE for stream completion
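
On the wire, each of these events is framed as a Server-Sent Events message. The sketch below shows the framing only; the payload fields are illustrative, and the `[DONE]` sentinel follows the completion convention named in the list above.

```python
import json

def sse_event(event: dict) -> str:
    """Frame one stream event as an SSE message ("data: <json>\\n\\n");
    the payload shape is illustrative, not the exact protocol schema."""
    return f"data: {json.dumps(event)}\n\n"

def sse_done() -> str:
    """Emit the stream-completion sentinel."""
    return "data: [DONE]\n\n"
```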

Frontend Processing:

  • Data Stream Runtime: Parses and routes stream events to appropriate UI components
  • UI Components: Update in real-time based on received events
  • Tool UIs: Specialized visualizations for different tool types and their progress

Benefits:

  • Immediate Feedback: Users see processing start immediately
  • Progress Visibility: Tool execution progress is visible in real-time
  • Error Handling: Stream errors are displayed with context
  • Responsive UX: Interface remains interactive during processing
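Taken together, the event types above can be framed as SSE messages in a few lines of Python. This is a minimal sketch: the `sse_frame` helper and the exact payload fields (`id`, `delta`) are illustrative assumptions, not the backend's actual adapter code.

```python
import json

def sse_frame(event_type: str, payload: dict) -> str:
    """Format one Data Stream Protocol event as an SSE frame.
    The payload shape here is an assumption for illustration."""
    body = {"type": event_type, **payload}
    return f"data: {json.dumps(body)}\n\n"

def stream_answer(tokens):
    """Yield a minimal text-start / text-delta / text-end sequence for a
    single text part, terminated by the DONE sentinel."""
    yield sse_frame("text-start", {"id": "msg-1"})
    for token in tokens:
        yield sse_frame("text-delta", {"id": "msg-1", "delta": token})
    yield sse_frame("text-end", {"id": "msg-1"})
    yield "data: [DONE]\n\n"
```

On the frontend, the Data Stream Runtime parses each `data:` line back into an event and routes it to the matching UI component.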
```mermaid
flowchart LR
    subgraph "Backend Streaming"
        LG[LangGraph Events]
        AD[AI SDK Adapter]
        SSE[SSE Controller]
    end
    
    subgraph "Stream Events"
        TS[text-start]
        TD[text-delta]
        TE[text-end]
        TIS[tool-input-start]
        TID[tool-input-delta]
        TIA[tool-input-available]
        TOA[tool-output-available]
        SS[start-step]
        FS[finish-step]
        FIN[finish]
        DONE[DONE]
    end
    
    subgraph "Frontend Processing"
        DS[Data Stream Runtime]
        UI[UI Components]
        TU[Tool UIs]
    end
    
    LG --> AD
    AD --> TS
    AD --> TD
    AD --> TE
    AD --> TIS
    AD --> TID
    AD --> TIA
    AD --> TOA
    AD --> SS
    AD --> FS
    AD --> FIN
    AD --> DONE
    
    SSE --> DS
    DS --> UI
    DS --> TU
    
    style LG fill:#e1f5fe
    style DS fill:#e8f5e8
    style SSE fill:#fff3e0
```

## Configuration Architecture

The system uses a sophisticated configuration management approach that balances flexibility, security, and maintainability. Configuration is layered and validated to ensure system reliability.

### Configuration Design Philosophy

- **Separation of Concerns**: Different types of configuration are managed separately
- **Environment Flexibility**: Easy adaptation to different deployment environments
- **Security First**: Sensitive data is handled through secure channels
- **Type Safety**: All configuration is validated using Pydantic models
- **Runtime Adaptability**: Configuration can be updated without a system restart (where appropriate)

### Configuration Layers

**Core Application Settings (`config.yaml`):**

- Application server configuration (ports, CORS, memory settings)
- Database connection parameters
- Logging configuration
- Tool execution limits and timeouts

**LLM and Prompt Configuration (`llm_prompt.yaml`):**

- LLM provider settings and model parameters
- Specialized prompt templates for different agents
- Token limits and generation parameters

**Environment Variables:**

- Sensitive credentials (API keys, passwords)
- Environment-specific overrides
- Security tokens and certificates
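As a sketch of how these layers compose — assuming illustrative field names rather than the project's real schema — file-based settings can be validated with Pydantic while secrets arrive via environment variables:

```python
import os

from pydantic import BaseModel, Field

class PostgreSQLConfig(BaseModel):
    host: str = "localhost"
    port: int = 5432
    password: str = ""  # secret: filled from the environment, never from YAML

class AppConfig(BaseModel):
    server_port: int = 8000
    cors_enabled: bool = True
    postgres: PostgreSQLConfig = Field(default_factory=PostgreSQLConfig)

def load_config(yaml_data: dict) -> AppConfig:
    """Validate file-based settings, then apply environment overrides.
    `yaml_data` stands in for the parsed contents of config.yaml."""
    cfg = AppConfig(**yaml_data)
    # Hypothetical variable name; the real deployment's secret names may differ.
    if pw := os.environ.get("POSTGRES_PASSWORD"):
        cfg.postgres.password = pw
    return cfg
```

Validation fails fast at startup if a file value has the wrong type, which is what the "Type Safety" principle above buys in practice.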

### Configuration Management

```mermaid
graph TB
    subgraph "Configuration Sources"
        CF1[config.yaml<br/>Core Settings]
        CF2[llm_prompt.yaml<br/>LLM & Prompts]
        CF3[Environment Variables<br/>Secrets]
    end
    
    subgraph "Configuration Models"
        CM1[AppConfig]
        CM2[LLMConfig]
        CM3[PostgreSQLConfig]
        CM4[RetrievalConfig]
        CM5[LoggingConfig]
    end
    
    subgraph "Runtime Configuration"
        RC1[Cached Config]
        RC2[Validation]
        RC3[Type Safety]
        RC4[Hot Reload]
    end
    
    CF1 --> CM1
    CF1 --> CM3
    CF1 --> CM4
    CF1 --> CM5
    CF2 --> CM2
    CF3 --> CM1
    CF3 --> CM3
    CF3 --> CM4
    
    CM1 --> RC1
    CM2 --> RC1
    CM3 --> RC1
    CM4 --> RC1
    CM5 --> RC1
    
    RC1 --> RC2
    RC2 --> RC3
    RC3 --> RC4
    
    style CF1 fill:#e3f2fd
    style CF2 fill:#e8f5e8
    style CF3 fill:#fff3e0
```

### Service Configuration

The service configuration demonstrates the system's production-ready architecture:

**Core Services Configuration:**

- **Application Server**: FastAPI running on port 8000 with CORS enabled for cross-origin requests
- **Database**: Azure PostgreSQL with a 7-day TTL for automatic session cleanup
- **LLM Provider**: Configurable OpenAI/Azure OpenAI with multiple model support

**Retrieval Services Configuration:**

- **Azure AI Search**: Hybrid search with semantic ranking across multiple indices
- **Embedding Service**: Dedicated embedding generation service for vector search
- **Multi-Index Support**: Separate indices for standards, regulations, and user manuals

**Frontend Configuration:**

- **Next.js Web Server**: Port 3001 with server-side rendering and client-side hydration
- **API Proxy Layer**: CORS handling and request forwarding to backend services
- **Static Asset Management**: Optimized delivery of UI components and resources
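For illustration only, the core settings file might look like the fragment below — every key name here is an assumption, not the project's actual schema:

```yaml
# Illustrative config.yaml fragment; keys are assumptions, not the real schema.
app:
  port: 8000
  cors:
    enabled: true
postgres:
  host: example.postgres.database.azure.com   # placeholder host
  session_ttl_days: 7
retrieval:
  azure_search:
    mode: hybrid            # hybrid keyword + vector search
    semantic_ranking: true
frontend:
  port: 3001
```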
```mermaid
graph LR
    subgraph "Core Services"
        APP[Application<br/>Port: 8000<br/>CORS: Enabled]
        DB[PostgreSQL<br/>Host: Azure<br/>TTL: 7 days]
        LLM[LLM Provider<br/>OpenAI/Azure<br/>Model: Configurable]
    end
    
    subgraph "Retrieval Services"
        AZ[Azure AI Search<br/>Hybrid + Semantic<br/>Multi-Index]
        EM[Embedding Service<br/>qwen3-embedding-8b<br/>Vector Generation]
    end
    
    subgraph "Frontend Services"
        WEB[Next.js Web<br/>Port: 3001<br/>SSR + Client]
        API[API Routes<br/>Proxy Layer<br/>CORS Handling]
    end
    
    APP --> DB
    APP --> LLM
    APP --> AZ
    AZ --> EM
    WEB --> API
    API --> APP
```

## Deployment Architecture

The deployment architecture is designed for production scalability, reliability, and maintainability. It supports both cloud-native and containerized deployment patterns.

### Deployment Design Principles

- **Cloud-Native**: Leverages Azure cloud services for scalability and reliability
- **Containerization**: Docker-based deployment for consistency across environments
- **Load Distribution**: Multiple instances with proper load balancing
- **Health Monitoring**: Comprehensive health checks and performance monitoring
- **Graceful Scaling**: Auto-scaling capabilities based on demand

### Production Deployment

The production deployment implements a multi-tier architecture with proper separation of concerns:

**Load Balancer Tier:**

- Azure Load Balancer for high availability and traffic distribution
- SSL termination and security policy enforcement
- Health check routing to ensure traffic only reaches healthy instances

**Frontend Tier:**

- Multiple Next.js instances for redundancy and load distribution
- Static asset caching and CDN integration
- Server-side rendering for optimal performance

**Backend Tier:**

- Horizontally scalable FastAPI instances
- Connection pooling for database efficiency
- Shared session state through PostgreSQL

**Data Tier:**

- Azure PostgreSQL for persistent session storage
- Azure AI Search for document retrieval
- External LLM services (OpenAI/Azure OpenAI)

**Monitoring Tier:**

- Structured logging with centralized collection
- Health check endpoints for all services
- Performance metrics and alerting
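The health-check portion of the monitoring tier can be sketched as concurrent dependency probes. The probe bodies below are placeholders; in the real system they would touch PostgreSQL, Azure AI Search, and the LLM endpoint:

```python
import asyncio

async def check_postgres() -> bool:
    return True  # placeholder: would run `SELECT 1` against the session store

async def check_search() -> bool:
    return True  # placeholder: would ping the Azure AI Search index

async def health_report() -> dict:
    """Run all dependency probes concurrently and aggregate a status."""
    names = ["postgres", "azure_search"]
    results = await asyncio.gather(
        check_postgres(), check_search(), return_exceptions=True
    )
    checks = {name: result is True for name, result in zip(names, results)}
    return {"status": "ok" if all(checks.values()) else "degraded",
            "checks": checks}
```

Because `gather` captures exceptions, a failing probe degrades the report instead of crashing the endpoint — which is what lets the load balancer route traffic away from unhealthy instances.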
```mermaid
graph TB
    subgraph "Load Balancer"
        LB[Azure Load Balancer]
    end
    
    subgraph "Frontend Tier"
        WEB1[Next.js Instance 1]
        WEB2[Next.js Instance 2]
        WEB3[Next.js Instance N]
    end
    
    subgraph "Backend Tier"
        API1[FastAPI Instance 1]
        API2[FastAPI Instance 2]
        API3[FastAPI Instance N]
    end
    
    subgraph "Data Tier"
        PG[(Azure PostgreSQL<br/>Session Memory)]
        AZ[(Azure AI Search<br/>Document Indices)]
        LLM[LLM Services<br/>OpenAI/Azure OpenAI]
    end
    
    subgraph "Monitoring"
        LOG[Structured Logging]
        HEALTH[Health Checks]
        METRICS[Performance Metrics]
    end
    
    LB --> WEB1
    LB --> WEB2
    LB --> WEB3
    
    WEB1 --> API1
    WEB2 --> API2
    WEB3 --> API3
    
    API1 --> PG
    API2 --> PG
    API3 --> PG
    
    API1 --> AZ
    API2 --> AZ
    API3 --> AZ
    
    API1 --> LLM
    API2 --> LLM
    API3 --> LLM
    
    API1 --> LOG
    API2 --> LOG
    API3 --> LOG
    
    LOG --> HEALTH
    LOG --> METRICS
    
    style LB fill:#e1f5fe
    style PG fill:#e8f5e8
    style AZ fill:#fff3e0
    style LLM fill:#f3e5f5
```

### Container Architecture

The containerized deployment provides consistency and portability across environments:

**Frontend Container:**

- Next.js application with Node.js runtime
- Optimized build with static asset pre-generation
- Environment variable injection for configuration
- Health check endpoints for load balancer integration

**Backend Container:**

- FastAPI application with Python 3.12+ runtime
- Complete dependency tree including LangGraph and database drivers
- Multi-stage build for optimized container size
- Configuration validation on startup

**External Service Integration:**

- Azure PostgreSQL for session persistence
- Azure AI Search for document retrieval
- Azure OpenAI for language model capabilities

**Configuration Management:**

- Environment variables for runtime configuration
- Mounted configuration files for complex settings
- Secret management for sensitive credentials
- Health check integration for service discovery

**Benefits:**

- **Consistency**: Identical runtime environment across all deployments
- **Scalability**: Easy horizontal scaling of individual services
- **Maintainability**: Clear separation of application and infrastructure concerns
- **Security**: Isolated execution environments with minimal attack surface
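A multi-stage backend build along these lines might look like the sketch below — base image tags, paths, and the entrypoint module are assumptions, not the project's actual Dockerfile:

```dockerfile
# Illustrative multi-stage build; image tags and paths are assumptions.
FROM python:3.12-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --prefix=/install -r requirements.txt

FROM python:3.12-slim
WORKDIR /app
COPY --from=builder /install /usr/local
COPY . .
# Assumes the backend exposes a /health endpoint for the load balancer.
HEALTHCHECK CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```

The builder stage keeps compilers and build caches out of the final image, which is what "multi-stage build for optimized container size" refers to above.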
```mermaid
graph TB
    subgraph "Docker Containers"
        subgraph "Frontend Container"
            NEXT[Next.js<br/>Node.js Runtime<br/>Port: 3000]
        end
        
        subgraph "Backend Container"
            FAST[FastAPI<br/>Python Runtime<br/>Port: 8000]
            DEPS[Dependencies<br/>- LangGraph<br/>- psycopg<br/>- httpx]
        end
    end
    
    subgraph "External Services"
        PG_EXT[(Azure PostgreSQL)]
        AZ_EXT[(Azure AI Search)]
        LLM_EXT[Azure OpenAI]
    end
    
    subgraph "Configuration"
        ENV[Environment Variables]
        CONFIG[Configuration Files]
        SECRETS[Secret Management]
    end
    
    NEXT --> FAST
    FAST --> DEPS
    FAST --> PG_EXT
    FAST --> AZ_EXT
    FAST --> LLM_EXT
    
    ENV --> FAST
    ENV --> NEXT
    CONFIG --> FAST
    SECRETS --> FAST
    
    style NEXT fill:#e1f5fe
    style FAST fill:#e8f5e8
    style PG_EXT fill:#fff3e0
    style AZ_EXT fill:#fff3e0
    style LLM_EXT fill:#f3e5f5
```

## Security Architecture

Security is implemented as a multi-layered defense system addressing threats at every level of the application stack. The architecture follows security best practices and industry standards.

### Security Design Principles

- **Defense in Depth**: Multiple security layers prevent single points of failure
- **Least Privilege**: Components have minimal required permissions
- **Zero Trust**: All requests are validated regardless of source
- **Data Protection**: Sensitive data is encrypted at rest and in transit
- **Audit Trail**: Comprehensive logging for security monitoring and compliance
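Of the API-level controls in this architecture, rate limiting is the easiest to make concrete. The token bucket below is a minimal in-memory sketch; a multi-instance deployment would back it with a shared store rather than per-process state:

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter (illustrative, in-memory only)."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate              # tokens replenished per second
        self.capacity = capacity      # maximum burst size
        self.tokens = float(capacity)
        self.updated = time.monotonic()

    def allow(self) -> bool:
        """Consume one token if available; otherwise reject the request."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A per-session bucket keyed by session ID would slot in at the API gateway, before requests reach the LangGraph workflow.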

### Security Layers

```mermaid
graph TB
    subgraph "Frontend Security"
        CSP[Content Security Policy]
        CORS[CORS Configuration]
        XSS[XSS Protection]
        HTTPS[HTTPS Enforcement]
    end
    
    subgraph "API Security"
        AUTH[Session Authentication]
        RATE[Rate Limiting]
        VAL[Input Validation]
        CSRF[CSRF Protection]
    end
    
    subgraph "Data Security"
        ENC[Data Encryption]
        TLS[TLS Connections]
        KEY[Key Management]
        TTL[Data TTL/Cleanup]
    end
    
    subgraph "Infrastructure Security"
        VPN[Network Isolation]
        FW[Firewall Rules]
        IAM[Identity Management]
        AUDIT[Audit Logging]
    end
    
    CSP --> AUTH
    CORS --> AUTH
    XSS --> VAL
    HTTPS --> TLS
    
    AUTH --> ENC
    RATE --> ENC
    VAL --> ENC
    CSRF --> ENC
    
    ENC --> VPN
    TLS --> VPN
    KEY --> IAM
    TTL --> AUDIT
    
    style CSP fill:#ffebee
    style AUTH fill:#fff3e0
    style ENC fill:#e8f5e8
    style VPN fill:#e1f5fe
```

## Performance Architecture

The system is designed for optimal performance across all components, with careful attention to latency, throughput, and resource utilization. Performance optimization is implemented at every layer.

### Performance Design Principles

- **Latency Optimization**: Minimize time to first response and overall response time
- **Throughput Maximization**: Handle maximum concurrent users efficiently
- **Resource Efficiency**: Optimal use of CPU, memory, and network resources
- **Predictable Performance**: Consistent response times under varying loads
- **Scalable Architecture**: Performance scales linearly with additional resources

### Performance Optimization Strategies

**Frontend Performance:**

- **Server-Side Rendering**: Faster initial page loads and better SEO
- **Code Splitting**: Load only necessary JavaScript for each page
- **Browser Caching**: Aggressive caching of static assets and API responses
- **CDN Distribution**: Global content delivery for reduced latency

**Backend Performance:**

- **Asynchronous Processing**: Non-blocking I/O for maximum concurrency
- **Connection Pooling**: Efficient database connection management
- **Retry Logic**: Intelligent retry mechanisms for transient failures
- **Streaming Responses**: Immediate user feedback with progressive loading
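The retry logic above can be sketched as exponential backoff with jitter. Parameter names here are illustrative; the real policy would live in the HTTP client configuration:

```python
import random
import time

def retry(fn, attempts: int = 3, base_delay: float = 0.1, sleep=time.sleep):
    """Call `fn`, retrying transient failures with exponential backoff.
    Jitter spreads retries out so concurrent clients don't retry in lockstep."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = base_delay * (2 ** attempt) * (1 + random.random())
            sleep(delay)
```

In practice only transient errors (timeouts, 5xx responses) should be retried; retrying validation errors just wastes the backoff budget.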

**Data Performance:**

- **Search Indexing**: Optimized indices for fast document retrieval
- **Vector Optimization**: Efficient similarity search and ranking
- **Memory Management**: Smart caching and memory usage patterns
- **TTL Optimization**: Automatic cleanup to prevent performance degradation

**Infrastructure Performance:**

- **Auto Scaling**: Dynamic resource allocation based on demand
- **Load Balancing**: Optimal distribution of requests across instances
- **Performance Monitoring**: Real-time metrics and alerting
- **Alert Systems**: Proactive notification of performance issues
```mermaid
graph LR
    subgraph "Frontend Optimization"
        SSR[Server-Side Rendering]
        CODE[Code Splitting]
        CACHE[Browser Caching]
        CDN[CDN Distribution]
    end
    
    subgraph "Backend Optimization"
        ASYNC[Async Processing]
        POOL[Connection Pooling]
        RETRY[Retry Logic]
        STREAM[Streaming Responses]
    end
    
    subgraph "Data Optimization"
        INDEX[Search Indexing]
        VECTOR[Vector Optimization]
        MEMORY[Memory Management]
        TTL_OPT[TTL Optimization]
    end
    
    subgraph "Infrastructure Optimization"
        SCALE[Auto Scaling]
        BALANCE[Load Balancing]
        MONITOR[Performance Monitoring]
        ALERT[Alert Systems]
    end
    
    SSR --> ASYNC
    CODE --> POOL
    CACHE --> RETRY
    CDN --> STREAM
    
    ASYNC --> INDEX
    POOL --> VECTOR
    RETRY --> MEMORY
    STREAM --> TTL_OPT
    
    INDEX --> SCALE
    VECTOR --> BALANCE
    MEMORY --> MONITOR
    TTL_OPT --> ALERT
```

## Technology Stack

The technology stack represents a carefully curated selection of modern, production-ready technologies that work together seamlessly to deliver a robust and scalable solution.

### Technology Selection Criteria

- **Maturity**: Proven technologies with strong community support
- **Performance**: Optimized for speed and efficiency
- **Scalability**: Can grow with increasing demands
- **Developer Experience**: Tools that enhance productivity and maintainability
- **Ecosystem Integration**: Technologies that work well together

### Stack Components

**Frontend Technologies:**

- **Next.js 15**: Latest React framework with advanced features like the App Router and Server Components
- **React 19**: Modern React with concurrent features and improved performance
- **TypeScript**: Type safety and a better developer experience
- **Tailwind CSS**: Utility-first CSS framework for rapid UI development
- **assistant-ui**: Specialized components for AI chat interfaces

**Backend Technologies:**

- **FastAPI**: High-performance Python web framework with automatic API documentation
- **Python 3.12+**: Latest Python with performance improvements and new features
- **LangGraph v0.6+**: Advanced workflow orchestration for AI agents
- **Pydantic**: Data validation and settings management
- **asyncio**: Asynchronous programming for optimal concurrency

**Data Technologies:**

- **PostgreSQL**: Robust relational database for session storage
- **psycopg3**: Modern PostgreSQL adapter with async support
- **Azure AI Search**: Advanced search capabilities with hybrid and semantic search
- **Vector Embeddings**: Semantic similarity search for improved relevance

**Infrastructure Technologies:**

- **Docker**: Containerization for consistent deployments
- **Azure Cloud**: Comprehensive cloud platform with managed services
- **Health Monitoring**: Built-in monitoring and alerting capabilities
- **Structured Logging**: Comprehensive logging for debugging and monitoring

### Complete Technology Stack

```mermaid
mindmap
  root((Technology Stack))
    Frontend
      Next.js 15
      React 19
      TypeScript
      Tailwind CSS
      assistant-ui
    Backend
      FastAPI
      Python 3.12+
      LangGraph v0.6+
      Pydantic
      asyncio
    Memory
      PostgreSQL
      psycopg3
      LangGraph Checkpointer
      Connection Pooling
    Search
      Azure AI Search
      Hybrid Search
      Vector Embeddings
      Semantic Ranking
    LLM
      OpenAI API
      Azure OpenAI
      Streaming Support
      Function Calling
    DevOps
      Docker
      Azure Cloud
      Health Monitoring
      Structured Logging
```

## Conclusion

This Agentic RAG system represents a comprehensive solution for manufacturing standards and regulations queries, featuring:

### Key Architectural Achievements

- **Sophisticated Multi-Layer Architecture**: Clear separation of concerns with well-defined interfaces between the frontend, API gateway, backend services, and data layers
- **Advanced AI Capabilities**: LangGraph-powered multi-intent agents with intelligent routing and streaming responses
- **Production-Ready Implementation**: Comprehensive error handling, monitoring, health checks, and graceful fallback mechanisms
- **Modern Technology Stack**: Latest frameworks and best practices, including Next.js 15, React 19, FastAPI, and LangGraph v0.6+
- **Scalable Design**: Architecture ready for enterprise-scale deployment with horizontal scaling capabilities

### System Benefits

**For Users:**

- Intelligent, context-aware responses to complex manufacturing standards queries
- Real-time streaming with immediate feedback and progress visibility
- Multi-language support with automatic browser detection
- Persistent conversation history across sessions

**For Developers:**

- Clear, maintainable architecture with excellent documentation
- Comprehensive testing framework with unit and integration tests
- Configuration-driven deployment with environment flexibility
- Modern development tools and practices

**For Operations:**

- Docker-based deployment for consistency across environments
- Comprehensive monitoring and alerting capabilities
- Graceful degradation and fault tolerance
- Automated scaling and load balancing

### Design Excellence

The system demonstrates several aspects of excellent software design:

1. **Modularity**: Each component has a single, well-defined responsibility
2. **Extensibility**: New agents, tools, and features can be added without breaking existing functionality
3. **Reliability**: Multiple layers of error handling and fallback mechanisms
4. **Performance**: Optimized for both latency and throughput with streaming responses
5. **Security**: Multi-layered security architecture following industry best practices
6. **Maintainability**: Clean code structure with comprehensive documentation and testing

This architecture provides a solid foundation for current requirements while remaining flexible enough to accommodate future growth and enhancement. The system successfully combines the power of retrieval-augmented generation with intelligent agent orchestration to deliver accurate, grounded, and citable responses to complex manufacturing standards queries.