# Agentic RAG System Design Document

## Overview

This document provides a comprehensive architectural overview of the Agentic RAG (Retrieval-Augmented Generation) system for manufacturing standards and regulations. The system combines LangGraph orchestration, streaming responses, and authoritative document retrieval to provide grounded answers with proper citations.

### Design Philosophy

The Agentic RAG system is built on several key design principles:

1. **Intelligent Intent Recognition**: The system automatically classifies user queries into different knowledge domains (standards/regulations vs. user manuals) and routes them to specialized agents for optimal handling.
2. **Two-Phase Retrieval Strategy**: For standards and regulations queries, the system first discovers relevant document metadata, then performs detailed content retrieval with query conditions enhanced by the metadata findings.
3. **Streaming-First Architecture**: All responses are delivered via Server-Sent Events (SSE) with real-time token streaming and tool execution progress, providing immediate feedback to users.
4. **Session-Aware Memory**: Persistent conversation history stored in PostgreSQL enables context-aware multi-turn conversations while maintaining session isolation.
5. **Production-Ready Design**: Comprehensive error handling, health monitoring, configuration management, and graceful fallback mechanisms ensure system reliability.

## System Architecture

The Agentic RAG system employs a modern microservices architecture with clear separation of concerns across multiple layers. Each layer has specific responsibilities and communicates through well-defined interfaces.
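The intent-recognition principle above can be illustrated with a small routing function that maps a query to one of the two specialized agents. This is a minimal, keyword-based sketch for illustration only; the production router is LLM-based inside a LangGraph StateGraph, and the function name, hint lists, and labels here are hypothetical.

```python
# Minimal sketch of intent routing between the two specialized agents.
# The real system uses an LLM classifier; this keyword heuristic and all
# names below are illustrative assumptions, not the production code.

MANUAL_HINTS = ("how do i", "user guide", "manual", "click", "login")
STANDARDS_HINTS = ("standard", "regulation", "gb", "iso", "compliance")

def classify_intent(query: str) -> str:
    """Return the target agent label: 'user_manual' or 'standards_regulations'."""
    q = query.lower()
    if any(hint in q for hint in MANUAL_HINTS):
        return "user_manual"
    if any(hint in q for hint in STANDARDS_HINTS):
        return "standards_regulations"
    # Default route: standards/regulations is the primary domain.
    return "standards_regulations"

print(classify_intent("Which ISO standard covers automotive functional safety?"))
print(classify_intent("How do I reset my password in the user manual?"))
```

In the actual workflow this decision is a conditional edge in the StateGraph, so the router's output directly selects the next agent node.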
### Architecture Design Principles

- **Layered Architecture**: Clear separation between presentation, business logic, data access, and external services
- **Asynchronous Processing**: Non-blocking operations throughout the request pipeline for optimal performance
- **Horizontal Scalability**: Stateless services that can be scaled independently based on load
- **Fault Tolerance**: Graceful degradation and fallback mechanisms at every layer
- **Configuration-Driven**: Environment-specific settings externalized for flexible deployment

### High-Level Architecture

```mermaid
graph TB
    subgraph "Frontend Layer"
        UI[Next.js Web UI<br/>@assistant-ui/react]
        TR[Thread Component]
        TU[Tool UI Components]
        LS[Language Switcher]
    end
    subgraph "API Gateway Layer"
        NX[Next.js API Routes<br/>/api/chat]
        DP[Data Stream Protocol<br/>SSE Adapter]
    end
    subgraph "Backend Service Layer"
        FA[FastAPI Server<br/>Port 8000]
        AS[AI SDK Adapter]
        SC[SSE Controller]
    end
    subgraph "Agent Orchestration Layer"
        LG[LangGraph Workflow]
        IR[Intent Recognition]
        SA[Standards Agent]
        MA[Manual Agent]
        PP[Post Processor]
    end
    subgraph "Memory Layer"
        PG[(PostgreSQL<br/>Session Store)]
        CH[Checkpointer]
        MM[Memory Manager]
    end
    subgraph "Retrieval Layer"
        AZ[Azure AI Search]
        EM[Embedding Service]
        IDX[Search Indices]
    end
    subgraph "LLM Layer"
        LLM[LLM Provider<br/>OpenAI/Azure OpenAI]
        CF[Configuration]
    end
    UI --> NX
    TR --> NX
    TU --> NX
    LS --> UI
    NX --> DP
    DP --> FA
    FA --> AS
    AS --> SC
    SC --> LG
    LG --> IR
    IR --> SA
    IR --> MA
    SA --> PP
    MA --> PP
    LG --> CH
    CH --> PG
    MM --> PG
    SA --> AZ
    MA --> AZ
    AZ --> EM
    AZ --> IDX
    SA --> LLM
    MA --> LLM
    LLM --> CF
```

### Component Architecture

The system is organized into several key component groups, each responsible for specific aspects of the application functionality:

**Web Frontend Components**:
- **Assistant Component**: The main orchestrator that manages the overall chat experience
- **Thread UI**: Handles conversation display and user interaction
- **Tool UIs**: Specialized visualizations for different tool types (search, retrieval, analysis)
- **Language Support**: Multi-language interface with automatic browser detection

**Backend Core Components**:
- **FastAPI Main**: Central application server handling HTTP requests and responses
- **AI SDK Chat Endpoint**: Specialized endpoint implementing the Data Stream Protocol for streaming responses
- **SSE Stream Controller**: Manages Server-Sent Events for real-time communication
- **Configuration Manager**: Centralized configuration loading and validation

**Agent System Components**:
- **LangGraph StateGraph**: Core workflow engine managing agent execution
- **Intent Router**: Intelligent classifier determining the appropriate agent for each query
- **Agent Nodes**: Specialized processing units for different query types
- **Tool Nodes**: Execution environment for retrieval and analysis tools
- **Memory System**: Persistent storage and retrieval of conversation context

```mermaid
graph LR
    subgraph "Web Frontend"
        direction TB
        A1[Assistant Component]
        A2[Thread UI]
        A3[Tool UIs]
        A4[Language Support]
        A1 --> A2
        A1 --> A3
        A1 --> A4
    end
    subgraph "Backend Core"
        direction TB
        B1[FastAPI Main]
        B2[AI SDK Chat Endpoint]
        B3[SSE Stream Controller]
        B4[Configuration Manager]
        B1 --> B2
        B2 --> B3
        B1 --> B4
    end
    subgraph "Agent System"
        direction TB
        C1[LangGraph<br/>StateGraph]
        C2[Intent Router]
        C3[Agent Nodes]
        C4[Tool Nodes]
        C5[Memory System]
        C1 --> C2
        C2 --> C3
        C3 --> C4
        C1 --> C5
    end
    subgraph "Data Layer"
        direction TB
        D1[PostgreSQL Memory]
        D2[Azure AI Search]
        D3[LLM Services]
        D4[Configuration Store]
    end
    A1 -.-> B2
    B3 --> C1
    C4 --> D2
    C3 --> D3
    C5 --> D1
    B4 --> D4
```

## Workflow Design

The Agentic RAG system implements sophisticated workflow patterns to handle different types of queries efficiently. The workflows are designed to be autonomous, adaptive, and optimized for the specific characteristics of each query type.

### Agentic Workflow Architecture Advantages

Our system adopts the **Agentic Workflow** paradigm, which represents a practical balance between autonomy and control in AI system design. This approach combines the best aspects of both traditional AI workflows and AI agents.

**AI Workflow Patterns Comparison**:

1. **AI Workflows**: Deterministic, predesigned pipelines with the highest predictability but the lowest autonomy
2. **AI Agents**: Reason-act loops that decide next steps, with higher autonomy but variable reliability
3. **Agentic Workflows**: Orchestrated graphs that embed one or more agents with guardrails, memory, and tools, delivering both autonomy and control

**Our Agentic Workflow Benefits**:
- **Controlled Autonomy**: Agents can make autonomous decisions within well-defined guardrails and tool constraints
- **Predictable Behavior**: LangGraph orchestration ensures reproducible workflows while allowing agent flexibility
- **Robust Error Handling**: Built-in guardrails prevent agents from making unreliable or unsafe decisions
- **Memory-Aware Processing**: Persistent session memory enables context-aware autonomous decision making
- **Tool-Constrained Intelligence**: Agents operate within a curated set of tools, ensuring reliable and relevant outputs
- **Multi-Agent Coordination**: Different specialized agents handle different query types with orchestrated handoffs
- **Adaptive Execution**: Agents can autonomously decide on tool usage and multi-round execution while staying within system limits

**Architectural Implementation**:
- **LangGraph StateGraph**: Provides the orchestrated graph structure with defined state transitions
- **Intent Recognition Router**: Ensures queries reach the most appropriate specialized agent
- **Tool Round Limits**: Guardrails prevent infinite loops while allowing autonomous multi-step reasoning
- **Session Memory**: Enables context-aware decisions across conversation turns
- **Streaming Feedback**: Real-time progress visibility gives users confidence in autonomous processing

### Workflow Design Principles

1. **Intent-Driven Routing**: Automatic classification ensures queries are handled by the most appropriate specialized agent
2. **Multi-Round Tool Execution**: Agents can autonomously decide to use multiple tools in sequence to gather comprehensive information
3. **Parallel Processing**: Multiple retrieval operations can execute simultaneously to reduce response time
4. **Context Preservation**: Conversation history is maintained and used to enhance subsequent queries
5. **Citation Generation**: All responses include proper source attribution with automatic citation extraction

### Agentic Workflow

The core workflow demonstrates the Agentic Workflow pattern with orchestrated agent execution, guardrails, and autonomous decision-making within controlled boundaries. Each specialized agent operates with autonomy while being constrained by system guardrails and tool limitations.

```mermaid
flowchart TD
    START([User Query]) --> IR{Intent Recognition}
    IR -->|User Manual| UMA[User Manual RAG Agent]
    IR -->|Standards/Regulations| SRA[Standards/Regulations RAG Agent]
    subgraph "Standards/Regulations RAG"
        SRA --> SRT{Need Tools?}
        SRT -->|Yes| STL[Standards/Regulations Retrieval Tools<br/>Parallel Execution]
        SRT -->|No| SRS[Answer Synthesis]
        STL --> STC{Continue?}
        STC -->|Yes| QR2[Query Enhancement/<br/>Refinement]
        QR2 --> SRT
        STC -->|No| SRS
    end
    subgraph "User Manual RAG"
        UMA --> UMT{Need Tools?}
        UMT -->|Yes| UML[User Manual Retrieval Tools<br/>Parallel Execution]
        UMT -->|No| UMS[Answer Synthesis]
        UML --> UMC{Continue?}
        UMC -->|Yes| QR4[Query Enhancement/<br/>Refinement]
        QR4 --> UMT
        UMC -->|No| UMS
    end
    SRS --> SPP[Citation Builder]
    SPP --> END1([Response with Citations])
    UMS --> END2([Response])
    style IR fill:#e1f5fe
    style SRA fill:#f3e5f5
    style UMA fill:#e8f5e8
    style STL fill:#fff3e0
    style UML fill:#fff3e0
```

**Agentic Workflow Features Demonstrated**:
- **Orchestrated Graph Structure**: LangGraph manages the overall workflow with defined state transitions
- **Embedded Specialized Agents**: Different agents (Standards/Regulations, User Manual) handle domain-specific queries
- **Intelligent Query Rewriting/Decomposition**: Core agentic feature where agents autonomously analyze, decompose, and rewrite queries for optimal retrieval coverage, demonstrating true query understanding and strategic planning
- **Autonomous Decision Making**: Agents decide whether tools are needed and when to continue or finish
- **Built-in Guardrails**: Tool round limits and workflow constraints prevent infinite loops
- **Memory Integration**: Conversation context influences agent decisions throughout the workflow
- **Tool Orchestration**: Agents autonomously select and execute appropriate tools within defined boundaries
- **Adaptive Query Intelligence**: Agents learn from retrieval results and iteratively refine queries
- **Controllable Citation List and Links**: The workflow provides precise, controllable citation tracking with automatic mapping between retrieved sources and generated content, and can dynamically construct formatted citation lists and secure link URLs based on rule logic

**Query Rewriting/Decomposition in Agentic Workflow** - The Core Intelligence Feature:

This is the defining characteristic that elevates our solution from simple RAG to true Agentic RAG.
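The Need Tools? / Continue? loop shown in the workflow above, bounded by a tool-round guardrail, can be sketched as a simple control loop. The round limit, function parameters, and decision hooks below are illustrative assumptions, not the production implementation.

```python
# Sketch of the guardrailed agent loop: the agent may request tool rounds
# until it decides to answer or hits the round limit. All names are
# illustrative; the real loop runs inside a LangGraph StateGraph.

MAX_TOOL_ROUNDS = 3  # assumed guardrail value; prevents infinite loops

def run_agent_loop(query, decide_need_tools, run_tools, refine_query, synthesize):
    """Execute up to MAX_TOOL_ROUNDS tool rounds, then synthesize an answer."""
    gathered = []
    current_query = query
    for _ in range(MAX_TOOL_ROUNDS):
        if not decide_need_tools(current_query, gathered):
            break  # agent decided it has enough context ("Continue?" -> No)
        gathered.extend(run_tools(current_query))          # parallel tools in practice
        current_query = refine_query(current_query, gathered)  # query enhancement step
    return synthesize(current_query, gathered)

# Toy usage: one tool round, after which the agent is satisfied.
answer = run_agent_loop(
    "automotive safety standards",
    decide_need_tools=lambda q, ctx: len(ctx) == 0,
    run_tools=lambda q: [f"result for: {q}"],
    refine_query=lambda q, ctx: q + " (refined)",
    synthesize=lambda q, ctx: f"answer from {len(ctx)} source(s)",
)
print(answer)  # answer from 1 source(s)
```

The guardrail is what makes this an agentic workflow rather than a free-running agent: the agent keeps autonomy over *whether* and *how* to use tools, but the orchestrator caps *how long* it may do so.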
The agents demonstrate genuine understanding and strategic thinking through sophisticated query processing:

- **Cognitive Query Analysis**: Agents autonomously analyze user queries to understand intent, identify ambiguities, and infer implicit information requirements
- **Strategic Multi-Perspective Decomposition**: Agents intelligently break down complex queries into 2-3 complementary sub-queries that explore different conceptual aspects, ensuring comprehensive coverage
- **Cross-Language Intelligence**: Agents automatically generate semantically equivalent bilingual query variants (Chinese/English)
- **Context-Aware Strategic Rewriting**: Agents incorporate conversation history and domain knowledge to refine and enhance queries
- **Autonomous Parallel Query Orchestration**: Agents independently decide to execute multiple rewritten queries in parallel, optimizing for both speed and coverage
- **Iterative Learning and Refinement**: Based on retrieval results, agents autonomously enhance queries for subsequent rounds
- **Metadata-Informed Query Enhancement**: For Phase 2 retrieval, agents synthesize metadata constraints from Phase 1 results, showing multi-step reasoning capability

**Citation Management in Agentic Workflow** - Enhanced Accountability and Traceability:

The Agentic Workflow provides fine-grained control and precision in citation management, going well beyond traditional RAG systems:

- **Autonomous Citation Tracking**: Agents automatically track all tool calls and their results throughout multi-step workflows, maintaining complete provenance information
- **Fine-Grained Source Mapping**: Each citation is precisely mapped to a specific tool call result with a unique identifier, enabling exact source traceability
- **Multi-Round Citation Coherence**: Agents maintain consistent citation numbering across multiple tool execution rounds, preventing citation conflicts or duplication
- **Intelligent Citation Placement**: Agents place citations based on content relevance and source quality, not just chronological order
- **Cross-Tool Citation Integration**: Citations seamlessly integrate results from different tools (metadata search, content search) within a unified numbering system
- **Post-Processing Citation Enhancement**: Dedicated post-processing nodes enrich citations with additional metadata (URLs, document titles, publication dates) for comprehensive reference lists
- **Citation Quality Control**: Agents filter and validate citation sources based on relevance scores and metadata quality, ensuring only high-quality references are included

**Citation Processing Workflow**:

1. **Real-time Citation Capture**: As agents execute tools, each result is automatically tagged with a tool call ID and order number
2. **Strategic Citation Assignment**: Agents assign citation numbers based on content importance and source authority
3. **Citation Map Generation**: Agents generate structured citation mappings in a standardized CSV format for processing
4. **Post-Processing Enhancement**: Dedicated nodes transform raw citation data into formatted reference lists with complete metadata
5. **Quality Validation**: Final citation lists undergo validation to ensure accuracy and completeness

This systematic approach ensures that every piece of information can be traced back to its exact source, providing users with the confidence and transparency required for regulatory and compliance use cases.

**Query Processing Strategies** - Domain-Specific Intelligence in Action:

The following strategies show how the agentic approach applies query rewriting/decomposition differently depending on the target domain:

1. **Standards/Regulations Queries**:
   - **Phase 1**: Generate 2-3 parallel metadata-focused sub-queries
   - **Phase 2**: Enhance queries with document codes and metadata constraints from Phase 1
   - **Lucene Syntax**: Intelligent use of advanced search syntax for precision filtering
2. **User Manual Queries**:
   - **Content-Focused**: Generate queries targeting procedural and instructional content
   - **Multi-Modal**: Consider both textual content and structural elements (headers, sections)
   - **Context Integration**: Incorporate previous tool results for query refinement
3. **Cross-Agent Learning** - Advanced Agentic Intelligence:
   - **Query Pattern Recognition**: Agents learn from successful query patterns across sessions
   - **Adaptive Rewriting**: Query strategies evolve based on retrieval success rates
   - **Domain-Specific Optimization**: Each agent develops specialized query rewriting patterns for its domain

### Two-Phase Retrieval Strategy

The standards and regulations agent employs a two-phase retrieval strategy designed to maximize accuracy and relevance:

**Phase 1: Metadata Discovery with Query Decomposition**
- **Query Analysis**: The agent analyzes user intent and decomposes complex queries into focused sub-queries
- **Multi-Perspective Rewriting**: Generates 2-3 parallel sub-queries exploring different aspects of the user's intent
- **Cross-Language Coverage**: Automatically includes both Chinese and English query variants for comprehensive search
- **Metadata-Focused Queries**: Searches for document attributes, codes, titles, and publication information
- **Parallel Execution**: Multiple rewritten queries execute simultaneously to maximize metadata coverage
- **Result Analysis**: The agent synthesizes metadata findings to identify relevant standards and regulations

**Phase 2: Content Retrieval with Query Enhancement** (conditional)
- **Need Assessment**: The agent autonomously determines whether detailed content retrieval is required
- **Query Enhancement**: Incorporates metadata constraints from Phase 1 results
- **Lucene Syntax Integration**: Uses advanced search syntax with metadata filtering (e.g., `(content_query) AND (document_code:(ISO45001 OR GB6722))`)
- **Context-Aware Refinement**: Enhances queries with conversation history and previous tool results
- **Focused Content Search**: Retrieves detailed document chunks with full context and precise filtering
- **Multi-Round Capability**: Can perform additional query refinement based on initial content results

**Query Rewriting Examples**:

*Original Query*: "汽车安全要求标准" (Automotive Safety Requirements Standards)

*Phase 1 Decomposed Queries*:
1. "汽车安全标准 automotive safety standards GB ISO requirements"
2. "车辆安全要求 vehicle safety requirements regulations 法规"
3. "automotive safety standards ISO GB national standards 汽车"

*Phase 2 Enhanced Queries* (if Phase 1 found relevant documents):
1. `(安全要求 safety requirements) AND (document_code:(GB11551 OR ISO26262 OR GB7258))`
2. `(automotive safety testing procedures) AND (document_category:Standard) AND (x_Standard_Vehicle_Type:passenger)`
3. `(车辆安全技术条件) AND (publisher:国家标准委 OR SAC) AND (x_Standard_Published_State_EN:Effective)`

This strategy ensures that users receive both overview information and detailed content as needed, while maintaining high precision through metadata-enhanced filtering and intelligent query decomposition.
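The Phase 2 enhancement shown in the examples above amounts to wrapping the content query with Lucene metadata constraints built from Phase 1 results. A minimal sketch of that construction (the field name `document_code` is taken from the examples; the helper function itself is hypothetical):

```python
# Build a Phase 2 content query constrained by document codes found in Phase 1.
# Mirrors the documented pattern: (content_query) AND (document_code:(A OR B)).
# The helper name and signature are illustrative assumptions.

def enhance_with_metadata(content_query: str, document_codes: list[str]) -> str:
    """Combine a content query with a Lucene document_code filter."""
    if not document_codes:
        return content_query  # no Phase 1 constraints to apply
    code_filter = " OR ".join(document_codes)
    return f"({content_query}) AND (document_code:({code_filter}))"

q = enhance_with_metadata("安全要求 safety requirements", ["GB11551", "ISO26262", "GB7258"])
print(q)
# (安全要求 safety requirements) AND (document_code:(GB11551 OR ISO26262 OR GB7258))
```

Additional filter clauses (e.g., `document_category` or publication state, as in the later examples) would be appended the same way, as extra `AND (...)` groups.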
**Agentic Workflow in Two-Phase Retrieval**:
- **Autonomous Phase Detection**: Agents autonomously determine when Phase 2 retrieval is needed based on query analysis
- **Dynamic Query Enhancement**: Agents enhance Phase 2 queries using metadata from Phase 1 results
- **Controlled Tool Execution**: Tool usage is governed by workflow guardrails while allowing agent flexibility
- **Memory-Informed Decisions**: Previous conversation context influences retrieval strategy decisions
- **Parallel Processing Autonomy**: Agents can autonomously decide on parallel query execution for optimal coverage

```mermaid
sequenceDiagram
    participant U as User
    participant A as Agent
    participant QR as Query Rewriter
    participant P1 as Phase 1 Tool
    participant P2 as Phase 2 Tool
    participant AS as Azure Search
    participant LLM as LLM Service
    U->>A: Original query about standards
    A->>QR: Analyze and decompose query
    QR->>QR: Generate 2-3 sub-queries
    QR->>QR: Add cross-language variants
    QR-->>A: Rewritten query set
    par Phase 1: Parallel Metadata Discovery
        A->>P1: retrieve_standard_regulation(rewritten_query_1)
        A->>P1: retrieve_standard_regulation(rewritten_query_2)
        A->>P1: retrieve_standard_regulation(rewritten_query_3)
        P1->>AS: Search metadata index (parallel)
        AS-->>P1: Standards metadata results
        P1-->>A: Document codes, titles, dates
    end
    A->>A: Analyze metadata results
    A->>A: Determine if content needed
    A->>QR: Assess need for Phase 2
    opt Phase 2: Enhanced Content Retrieval
        QR->>QR: Enhance queries with metadata constraints
        QR->>QR: Apply Lucene syntax filtering
        QR-->>A: Enhanced query with metadata filters
        A->>P2: retrieve_doc_chunk(enhanced_query_with_constraints)
        P2->>AS: Search content index + metadata filters
        AS-->>P2: Filtered document chunks
        P2-->>A: Detailed content with context
    end
    A->>LLM: Synthesize with retrieved data
    LLM-->>A: Generated response
    A->>A: Extract citations from all sources
    A-->>U: Final answer with citations
    Note over QR: Query Rewriting Strategies:<br/>- Multi-perspective decomposition<br/>- Cross-language variants<br/>- Context-aware enhancement<br/>- Metadata constraint integration
```

### Memory Management Flow

The system implements sophisticated session management with PostgreSQL-based persistence.

**Session Lifecycle Management**:
- Unique session IDs generated for each conversation thread
- Automatic session initialization with proper memory allocation
- Conversation turns tracked with message ordering and timestamps
- Intelligent message trimming to stay within context length limits
- Persistent storage with a 7-day TTL for automatic cleanup

**Memory Architecture Benefits**:
- **Cross-Request Continuity**: Conversations persist across browser sessions
- **Context-Aware Responses**: Agents can reference previous exchanges
- **Scalable Storage**: PostgreSQL provides reliable, scalable persistence
- **Automatic Cleanup**: TTL-based garbage collection prevents storage bloat
- **Fault Tolerance**: Graceful fallback to in-memory storage if PostgreSQL is unavailable

**Agentic Workflow Memory Integration**:
- **Context-Driven Autonomy**: Agents make informed decisions based on conversation history
- **Memory-Aware Tool Selection**: Previous tool results influence future tool choices
- **Session-Aware Guardrails**: Memory context helps agents avoid redundant operations
- **Adaptive Workflow Paths**: Conversation context guides agent workflow decisions
- **Persistent Learning**: Agents can build upon previous conversation context for improved responses

```mermaid
flowchart TD
    subgraph "Session Lifecycle"
        SS[Session Start] --> SI[Session ID Generation]
        SI --> SM[Memory Initialization]
        SM --> CT[Conversation Turns]
        CT --> TM[Message Trimming]
        TM --> PS[Persistent Storage]
        PS --> TTL[TTL Cleanup]
        TTL --> SE[Session End]
    end
    subgraph "PostgreSQL Memory"
        SM --> CP[Create Checkpointer]
        CP --> PG[(PostgreSQL DB)]
        PS --> PW[Put Writes]
        PW --> PG
        TM --> TR[Trim Messages]
        TR --> PG
        TTL --> CL[Cleanup Old Records]
        CL --> PG
    end
    subgraph "Fallback Strategy"
        CP --> FB{PostgreSQL Available?}
        FB -->|No| IM[In-Memory Store]
        FB -->|Yes| PG
    end
    style PG fill:#e3f2fd
    style IM fill:#fff3e0
    style FB fill:#ffebee
```

## Feature Architecture

The Agentic RAG system provides a comprehensive set of features designed for professional manufacturing standards and regulations queries. Each feature is implemented with production-grade quality and user experience considerations.

### Feature Design Philosophy

- **User-Centric Design**: All features prioritize ease of use and clear information presentation
- **Real-Time Feedback**: Users receive immediate feedback on system processing and tool execution
- **Source Transparency**: All responses include clear attribution and citation links
- **Multi-Modal Support**: Text, visual, and interactive elements enhance information comprehension
- **Accessibility**: Interface supports multiple languages and responsive design patterns

### Core Features

```mermaid
mindmap
  root((Agentic RAG Features))
    Multi-Intent System
      Intent Recognition
      Domain Routing
      Specialized Agents
    Real-time Streaming
      SSE Protocol
      Token Streaming
      Tool Progress
      Citation Updates
    Advanced Retrieval
      Two-Phase Strategy
      Parallel Queries
      Metadata Enhancement
      Content Filtering
    Session Memory
      PostgreSQL Storage
      7-Day TTL
      Context Trimming
      Cross-Request State
    Modern Web UI
      assistant-ui Components
      Tool Visualizations
      Multi-language Support
      Responsive Design
    Production Ready
      Error Handling
      Health Monitoring
      Configuration Management
      Docker Support
```

### Tool System Architecture

The tool system provides the core retrieval and analysis capabilities that power the agent workflows.

**Tool Design Principles**:
- **Query Intelligence**: Advanced query rewriting and decomposition before tool execution
- **Modularity**: Each tool has a single, well-defined responsibility
- **Composability**: Tools can be combined in various workflows with rewritten queries
- **Observability**: All tool executions provide detailed progress feedback
- **Error Resilience**: Robust error handling with meaningful error messages
- **Performance**: Optimized for both accuracy and response time through smart query enhancement

**Query Processing Integration**:

Before any tool execution, the system applies query rewriting and decomposition:

1. **Multi-perspective Decomposition**: Breaking complex queries into focused sub-queries
2. **Cross-language Variants**: Generating Chinese/English query variants for comprehensive coverage
3. **Context Enhancement**: Adding domain-specific context and terminology
4. **Metadata Constraint Integration**: Incorporating document type, date, and source constraints

This preprocessing ensures that each tool receives optimally crafted queries for maximum retrieval effectiveness.

**Tool Categories**:

**Standards Tools**: Specialized for regulatory and standards documents with intelligent query enhancement
- `retrieve_standard_regulation`: Discovers document metadata using decomposed and rewritten queries
- `retrieve_doc_chunk_standard_regulation`: Retrieves detailed content with metadata-enhanced filtering

**User Manual Tools**: Optimized for system documentation with context-aware query processing
- `retrieve_doc_chunk_user_manual`: Searches user guides using rewritten queries for better coverage

**Query Enhancement Integration**: All tools benefit from the query processing pipeline:
- **Phase 1 Tools** receive multiple decomposed queries for comprehensive metadata discovery
- **Phase 2 Tools** receive enhanced queries with metadata constraints for precise content retrieval
- **Cross-tool Coordination** ensures consistent query interpretation across different tool types

**Azure AI Search Integration**: All tools leverage advanced search capabilities with query intelligence:
- **Smart Query Processing**: Handles multiple rewritten queries with parallel execution
- **Hybrid Search**: Combines keyword and vector search for decomposed query components
- **Semantic Ranking**: Improved result relevance through query understanding
- **Cross-language Support**: Processes Chinese/English query variants seamlessly
- **Metadata-aware Filtering**: Applies enhanced constraints from query rewriting
- **Score Aggregation**: Combines results from multiple query variants for comprehensive coverage
- **Multi-field Search**: Searches across content and metadata with context-enhanced queries

```mermaid
graph TB
    subgraph "Query Processing Pipeline"
        QI[Query Input] --> QR[Query Rewriter & Decomposer]
        QR --> QA[Query Analyzer]
        QA --> QD[Query Dispatcher]
    end
    subgraph "Query Rewriting Strategies"
        QR --> QR1[Multi-perspective Decomposition]
        QR --> QR2[Cross-language Variants]
        QR --> QR3[Context Enhancement]
        QR --> QR4[Metadata Constraint Integration]
    end
    subgraph "Tool Categories"
        ST[Standards Tools]
        UT[User Manual Tools]
    end
    subgraph "Standards Tools"
        ST1[retrieve_standard_regulation<br/>Metadata Search + Query Decomposition]
        ST2[retrieve_doc_chunk_standard_regulation<br/>Content Search + Enhanced Queries]
    end
    subgraph "User Manual Tools"
        UT1[retrieve_doc_chunk_user_manual<br/>Manual Search + Rewritten Queries]
    end
    subgraph "Tool Execution"
        TE[Tool Executor]
        PS[Parallel Scheduling]
        ER[Error Recovery]
        RF[Result Formatting]
    end
    subgraph "Azure AI Search Integration"
        HS[Hybrid Search]
        VS[Vector Search]
        SR[Semantic Ranking]
        RS[Result Scoring]
    end
    QD --> ST
    QD --> UT
    ST --> ST1
    ST --> ST2
    UT --> UT1
    ST1 --> TE
    ST2 --> TE
    UT1 --> TE
    TE --> PS
    PS --> ER
    ER --> RF
    TE --> HS
    HS --> VS
    VS --> SR
    SR --> RS
```

## Data Flow Architecture

The system implements data flow patterns optimized for real-time streaming and multi-step processing. Understanding these flows is crucial for system maintenance and optimization.

### Data Flow Design Principles

- **Streaming-First**: All responses use streaming protocols for immediate user feedback
- **Event-Driven**: System components communicate through well-defined events
- **Backpressure Handling**: Proper flow control prevents system overload
- **Error Propagation**: Errors are handled gracefully with meaningful user feedback
- **Observability**: Comprehensive logging and monitoring throughout all flows

### Request-Response Flow

```mermaid
sequenceDiagram
    participant Browser as Web Browser
    participant NextJS as Next.js API
    participant FastAPI as FastAPI Backend
    participant LangGraph as LangGraph Engine
    participant Memory as PostgreSQL
    participant Search as Azure Search
    participant LLM as LLM Provider
    Browser->>NextJS: POST /api/chat
    NextJS->>FastAPI: Forward request
    FastAPI->>Memory: Load session
    Memory-->>FastAPI: Session data
    FastAPI->>LangGraph: Start workflow
    LangGraph->>LangGraph: Intent recognition
    alt Standards/Regulations Query
        LangGraph->>Search: Phase 1: Metadata search
        Search-->>LangGraph: Standards metadata
        LangGraph->>Search: Phase 2: Content search
        Search-->>LangGraph: Document chunks
    else User Manual Query
        LangGraph->>Search: Manual content search
        Search-->>LangGraph: Manual chunks
    end
    LangGraph->>LLM: Generate response
    LLM-->>LangGraph: Streamed tokens
    LangGraph->>LangGraph: Extract citations
    LangGraph->>Memory: Save conversation
    LangGraph-->>FastAPI: Streamed response
    FastAPI-->>NextJS: SSE stream
    NextJS-->>Browser: Data stream protocol
```

### Streaming Data Flow

The streaming architecture implements the Data Stream Protocol for real-time communication.

**Stream Event Types**:
- **Text Events**: `text-start`, `text-delta`, `text-end` for response content
- **Tool Events**: `tool-input-start`, `tool-input-delta`, `tool-input-available` for tool parameters
- **Tool Results**: `tool-output-available` for tool execution results
- **Step Events**: `start-step`, `finish-step` for workflow progress
- **Control Events**: `finish`, `DONE` for stream completion

**Frontend Processing**:
- **Data Stream Runtime**: Parses and routes stream events to appropriate UI components
- **UI Components**: Update in real time based on received events
- **Tool UIs**: Specialized visualizations for different tool types and their progress

**Benefits**:
- **Immediate Feedback**: Users see processing start immediately
- **Progress Visibility**: Tool execution progress is visible in real time
- **Error Handling**: Stream errors are displayed with context
- **Responsive UX**: Interface remains interactive during processing

```mermaid
flowchart LR
    subgraph "Backend Streaming"
        LG[LangGraph Events]
        AD[AI SDK Adapter]
        SSE[SSE Controller]
    end
    subgraph "Stream Events"
        TS[text-start]
        TD[text-delta]
        TE[text-end]
        TIS[tool-input-start]
        TID[tool-input-delta]
        TIA[tool-input-available]
        TOA[tool-output-available]
        SS[start-step]
        FS[finish-step]
        FIN[finish]
        DONE[DONE]
    end
    subgraph "Frontend Processing"
        DS[Data Stream Runtime]
        UI[UI Components]
        TU[Tool UIs]
    end
    LG --> AD
    AD --> TS
    AD --> TD
    AD --> TE
    AD --> TIS
    AD --> TID
    AD --> TIA
    AD --> TOA
    AD --> SS
    AD --> FS
    AD --> FIN
    AD --> DONE
    SSE --> DS
    DS --> UI
    DS --> TU
    style LG fill:#e1f5fe
    style DS fill:#e8f5e8
    style SSE fill:#fff3e0
```

## Configuration Architecture

The system uses a configuration management approach
that balances flexibility, security, and maintainability. Configuration is layered and validated to ensure system reliability. ### Configuration Design Philosophy - **Separation of Concerns**: Different types of configuration are managed separately - **Environment Flexibility**: Easy adaptation to different deployment environments - **Security First**: Sensitive data is handled through secure channels - **Type Safety**: All configuration is validated using Pydantic models - **Runtime Adaptability**: Configuration can be updated without system restart (where appropriate) ### Configuration Layers **Core Application Settings** (`config.yaml`): - Application server configuration (ports, CORS, memory settings) - Database connection parameters - Logging configuration - Tool execution limits and timeouts **LLM and Prompt Configuration** (`llm_prompt.yaml`): - LLM provider settings and model parameters - Specialized prompt templates for different agents - Token limits and generation parameters **Environment Variables**: - Sensitive credentials (API keys, passwords) - Environment-specific overrides - Security tokens and certificates ### Configuration Management ```mermaid graph TB subgraph "Configuration Sources" CF1[config.yaml
Core Settings]
        CF2[llm_prompt.yaml
LLM & Prompts]
        CF3[Environment Variables
Secrets]
    end
    subgraph "Configuration Models"
        CM1[AppConfig]
        CM2[LLMConfig]
        CM3[PostgreSQLConfig]
        CM4[RetrievalConfig]
        CM5[LoggingConfig]
    end
    subgraph "Runtime Configuration"
        RC1[Cached Config]
        RC2[Validation]
        RC3[Type Safety]
        RC4[Hot Reload]
    end
    CF1 --> CM1
    CF1 --> CM3
    CF1 --> CM4
    CF1 --> CM5
    CF2 --> CM2
    CF3 --> CM1
    CF3 --> CM3
    CF3 --> CM4
    CM1 --> RC1
    CM2 --> RC1
    CM3 --> RC1
    CM4 --> RC1
    CM5 --> RC1
    RC1 --> RC2
    RC2 --> RC3
    RC3 --> RC4
    style CF1 fill:#e3f2fd
    style CF2 fill:#e8f5e8
    style CF3 fill:#fff3e0
```

### Service Configuration

The service configuration demonstrates the system's production-ready architecture:

**Core Services Configuration**:

- **Application Server**: FastAPI running on port 8000 with CORS enabled for cross-origin requests
- **Database**: Azure PostgreSQL with a 7-day TTL for automatic session cleanup
- **LLM Provider**: Configurable OpenAI/Azure OpenAI with multiple model support

**Retrieval Services Configuration**:

- **Azure AI Search**: Hybrid search with semantic ranking across multiple indices
- **Embedding Service**: Dedicated embedding generation service for vector search
- **Multi-Index Support**: Separate indices for standards, regulations, and user manuals

**Frontend Configuration**:

- **Next.js Web Server**: Port 3001 with server-side rendering and client-side hydration
- **API Proxy Layer**: CORS handling and request forwarding to backend services
- **Static Asset Management**: Optimized delivery of UI components and resources

```mermaid
graph LR
    subgraph "Core Services"
        APP[Application
Port: 8000
CORS: Enabled]
        DB[PostgreSQL
Host: Azure
TTL: 7 days]
        LLM[LLM Provider
OpenAI/Azure
Model: Configurable]
    end
    subgraph "Retrieval Services"
        AZ[Azure AI Search
Hybrid + Semantic
Multi-Index]
        EM[Embedding Service
qwen3-embedding-8b
Vector Generation]
    end
    subgraph "Frontend Services"
        WEB[Next.js Web
Port: 3001
SSR + Client]
        API[API Routes
Proxy Layer
CORS Handling]
    end
    APP --> DB
    APP --> LLM
    APP --> AZ
    AZ --> EM
    WEB --> API
    API --> APP
```

## Deployment Architecture

The deployment architecture is designed for production scalability, reliability, and maintainability. It supports both cloud-native and containerized deployment patterns.

### Deployment Design Principles

- **Cloud-Native**: Leverages Azure cloud services for scalability and reliability
- **Containerization**: Docker-based deployment for consistency across environments
- **Load Distribution**: Multiple instances with proper load balancing
- **Health Monitoring**: Comprehensive health checks and performance monitoring
- **Graceful Scaling**: Auto-scaling capabilities based on demand

### Production Deployment

The production deployment implements a multi-tier architecture with proper separation of concerns:

**Load Balancer Tier**:

- Azure Load Balancer for high availability and traffic distribution
- SSL termination and security policy enforcement
- Health check routing to ensure traffic only reaches healthy instances

**Frontend Tier**:

- Multiple Next.js instances for redundancy and load distribution
- Static asset caching and CDN integration
- Server-side rendering for optimal performance

**Backend Tier**:

- Horizontally scalable FastAPI instances
- Connection pooling for database efficiency
- Shared session state through PostgreSQL

**Data Tier**:

- Azure PostgreSQL for persistent session storage
- Azure AI Search for document retrieval
- External LLM services (OpenAI/Azure OpenAI)

**Monitoring Tier**:

- Structured logging with centralized collection
- Health check endpoints for all services
- Performance metrics and alerting

```mermaid
graph TB
    subgraph "Load Balancer"
        LB[Azure Load Balancer]
    end
    subgraph "Frontend Tier"
        WEB1[Next.js Instance 1]
        WEB2[Next.js Instance 2]
        WEB3[Next.js Instance N]
    end
    subgraph "Backend Tier"
        API1[FastAPI Instance 1]
        API2[FastAPI Instance 2]
        API3[FastAPI Instance N]
    end
    subgraph "Data Tier"
        PG[(Azure PostgreSQL
Session Memory)]
        AZ[(Azure AI Search
Document Indices)]
        LLM[LLM Services
OpenAI/Azure OpenAI]
    end
    subgraph "Monitoring"
        LOG[Structured Logging]
        HEALTH[Health Checks]
        METRICS[Performance Metrics]
    end
    LB --> WEB1
    LB --> WEB2
    LB --> WEB3
    WEB1 --> API1
    WEB2 --> API2
    WEB3 --> API3
    API1 --> PG
    API2 --> PG
    API3 --> PG
    API1 --> AZ
    API2 --> AZ
    API3 --> AZ
    API1 --> LLM
    API2 --> LLM
    API3 --> LLM
    API1 --> LOG
    API2 --> LOG
    API3 --> LOG
    LOG --> HEALTH
    LOG --> METRICS
    style LB fill:#e1f5fe
    style PG fill:#e8f5e8
    style AZ fill:#fff3e0
    style LLM fill:#f3e5f5
```

### Container Architecture

The containerized deployment provides consistency and portability across environments:

**Frontend Container**:

- Next.js application with Node.js runtime
- Optimized build with static asset pre-generation
- Environment variable injection for configuration
- Health check endpoints for load balancer integration

**Backend Container**:

- FastAPI application with Python 3.12+ runtime
- Complete dependency tree including LangGraph and database drivers
- Multi-stage build for optimized container size
- Configuration validation on startup

**External Service Integration**:

- Azure PostgreSQL for session persistence
- Azure AI Search for document retrieval
- Azure OpenAI for language model capabilities

**Configuration Management**:

- Environment variables for runtime configuration
- Mounted configuration files for complex settings
- Secret management for sensitive credentials
- Health check integration for service discovery

**Benefits**:

- **Consistency**: Identical runtime environment across all deployments
- **Scalability**: Easy horizontal scaling of individual services
- **Maintainability**: Clear separation of application and infrastructure concerns
- **Security**: Isolated execution environments with minimal attack surface

```mermaid
graph TB
    subgraph "Docker Containers"
        subgraph "Frontend Container"
            NEXT[Next.js
Node.js Runtime
Port: 3000]
        end
        subgraph "Backend Container"
            FAST[FastAPI
Python Runtime
Port: 8000]
            DEPS[Dependencies
- LangGraph
- psycopg
- httpx]
        end
    end
    subgraph "External Services"
        PG_EXT[(Azure PostgreSQL)]
        AZ_EXT[(Azure AI Search)]
        LLM_EXT[Azure OpenAI]
    end
    subgraph "Configuration"
        ENV[Environment Variables]
        CONFIG[Configuration Files]
        SECRETS[Secret Management]
    end
    NEXT --> FAST
    FAST --> DEPS
    FAST --> PG_EXT
    FAST --> AZ_EXT
    FAST --> LLM_EXT
    ENV --> FAST
    ENV --> NEXT
    CONFIG --> FAST
    SECRETS --> FAST
    style NEXT fill:#e1f5fe
    style FAST fill:#e8f5e8
    style PG_EXT fill:#fff3e0
    style AZ_EXT fill:#fff3e0
    style LLM_EXT fill:#f3e5f5
```

## Security Architecture

Security is implemented as a multi-layered defense system addressing threats at every level of the application stack. The architecture follows security best practices and industry standards.

### Security Design Principles

- **Defense in Depth**: Multiple security layers prevent single points of failure
- **Least Privilege**: Components have minimal required permissions
- **Zero Trust**: All requests are validated regardless of source
- **Data Protection**: Sensitive data is encrypted at rest and in transit
- **Audit Trail**: Comprehensive logging for security monitoring and compliance

### Security Layers

```mermaid
graph TB
    subgraph "Frontend Security"
        CSP[Content Security Policy]
        CORS[CORS Configuration]
        XSS[XSS Protection]
        HTTPS[HTTPS Enforcement]
    end
    subgraph "API Security"
        AUTH[Session Authentication]
        RATE[Rate Limiting]
        VAL[Input Validation]
        CSRF[CSRF Protection]
    end
    subgraph "Data Security"
        ENC[Data Encryption]
        TLS[TLS Connections]
        KEY[Key Management]
        TTL[Data TTL/Cleanup]
    end
    subgraph "Infrastructure Security"
        VPN[Network Isolation]
        FW[Firewall Rules]
        IAM[Identity Management]
        AUDIT[Audit Logging]
    end
    CSP --> AUTH
    CORS --> AUTH
    XSS --> VAL
    HTTPS --> TLS
    AUTH --> ENC
    RATE --> ENC
    VAL --> ENC
    CSRF --> ENC
    ENC --> VPN
    TLS --> VPN
    KEY --> IAM
    TTL --> AUDIT
    style CSP fill:#ffebee
    style AUTH fill:#fff3e0
    style ENC fill:#e8f5e8
    style VPN fill:#e1f5fe
```

## Performance Architecture

The system is designed for optimal performance across all
components, with careful attention to latency, throughput, and resource utilization. Performance optimization is implemented at every layer.

### Performance Design Principles

- **Latency Optimization**: Minimize time to first response and overall response time
- **Throughput Maximization**: Handle maximum concurrent users efficiently
- **Resource Efficiency**: Optimal use of CPU, memory, and network resources
- **Predictable Performance**: Consistent response times under varying loads
- **Scalable Architecture**: Performance scales linearly with additional resources

### Performance Optimization Strategies

**Frontend Performance**:

- **Server-Side Rendering**: Faster initial page loads and better SEO
- **Code Splitting**: Load only necessary JavaScript for each page
- **Browser Caching**: Aggressive caching of static assets and API responses
- **CDN Distribution**: Global content delivery for reduced latency

**Backend Performance**:

- **Asynchronous Processing**: Non-blocking I/O for maximum concurrency
- **Connection Pooling**: Efficient database connection management
- **Retry Logic**: Intelligent retry mechanisms for transient failures
- **Streaming Responses**: Immediate user feedback with progressive loading

**Data Performance**:

- **Search Indexing**: Optimized indices for fast document retrieval
- **Vector Optimization**: Efficient similarity search and ranking
- **Memory Management**: Smart caching and memory usage patterns
- **TTL Optimization**: Automatic cleanup to prevent performance degradation

**Infrastructure Performance**:

- **Auto Scaling**: Dynamic resource allocation based on demand
- **Load Balancing**: Optimal distribution of requests across instances
- **Performance Monitoring**: Real-time metrics and alerting
- **Alert Systems**: Proactive notification of performance issues

```mermaid
graph LR
    subgraph "Frontend Optimization"
        SSR[Server-Side Rendering]
        CODE[Code Splitting]
        CACHE[Browser Caching]
        CDN[CDN Distribution]
    end
    subgraph "Backend
Optimization"
        ASYNC[Async Processing]
        POOL[Connection Pooling]
        RETRY[Retry Logic]
        STREAM[Streaming Responses]
    end
    subgraph "Data Optimization"
        INDEX[Search Indexing]
        VECTOR[Vector Optimization]
        MEMORY[Memory Management]
        TTL_OPT[TTL Optimization]
    end
    subgraph "Infrastructure Optimization"
        SCALE[Auto Scaling]
        BALANCE[Load Balancing]
        MONITOR[Performance Monitoring]
        ALERT[Alert Systems]
    end
    SSR --> ASYNC
    CODE --> POOL
    CACHE --> RETRY
    CDN --> STREAM
    ASYNC --> INDEX
    POOL --> VECTOR
    RETRY --> MEMORY
    STREAM --> TTL_OPT
    INDEX --> SCALE
    VECTOR --> BALANCE
    MEMORY --> MONITOR
    TTL_OPT --> ALERT
```

## Technology Stack

The technology stack represents a carefully curated selection of modern, production-ready technologies that work together seamlessly to deliver a robust and scalable solution.

### Technology Selection Criteria

- **Maturity**: Proven technologies with strong community support
- **Performance**: Optimized for speed and efficiency
- **Scalability**: Can grow with increasing demands
- **Developer Experience**: Tools that enhance productivity and maintainability
- **Ecosystem Integration**: Technologies that work well together

### Stack Components

**Frontend Technologies**:

- **Next.js 15**: Latest React framework with advanced features like App Router and Server Components
- **React 19**: Modern React with concurrent features and improved performance
- **TypeScript**: Type safety and a better developer experience
- **Tailwind CSS**: Utility-first CSS framework for rapid UI development
- **assistant-ui**: Specialized components for AI chat interfaces

**Backend Technologies**:

- **FastAPI**: High-performance Python web framework with automatic API documentation
- **Python 3.12+**: Latest Python with performance improvements and new features
- **LangGraph v0.6+**: Advanced workflow orchestration for AI agents
- **Pydantic**: Data validation and settings management
- **asyncio**: Asynchronous programming for optimal concurrency

**Data Technologies**:

- **PostgreSQL**: Robust relational database for session storage
- **psycopg3**: Modern PostgreSQL adapter with async support
- **Azure AI Search**: Advanced search capabilities with hybrid and semantic search
- **Vector Embeddings**: Semantic similarity search for improved relevance

**Infrastructure Technologies**:

- **Docker**: Containerization for consistent deployments
- **Azure Cloud**: Comprehensive cloud platform with managed services
- **Health Monitoring**: Built-in monitoring and alerting capabilities
- **Structured Logging**: Comprehensive logging for debugging and monitoring

### Complete Technology Stack

```mermaid
mindmap
  root((Technology Stack))
    Frontend
      Next.js 15
      React 19
      TypeScript
      Tailwind CSS
      assistant-ui
    Backend
      FastAPI
      Python 3.12+
      LangGraph v0.6+
      Pydantic
      asyncio
    Memory
      PostgreSQL
      psycopg3
      LangGraph Checkpointer
      Connection Pooling
    Search
      Azure AI Search
      Hybrid Search
      Vector Embeddings
      Semantic Ranking
    LLM
      OpenAI API
      Azure OpenAI
      Streaming Support
      Function Calling
    DevOps
      Docker
      Azure Cloud
      Health Monitoring
      Structured Logging
```

## Conclusion

This Agentic RAG system represents a comprehensive solution for manufacturing standards and regulations queries, featuring:

### Key Architectural Achievements

- **Sophisticated Multi-Layer Architecture**: Clear separation of concerns with well-defined interfaces between the frontend, API gateway, backend services, and data layers
- **Advanced AI Capabilities**: LangGraph-powered multi-intent agents with intelligent routing and streaming responses
- **Production-Ready Implementation**: Comprehensive error handling, monitoring, health checks, and graceful fallback mechanisms
- **Modern Technology Stack**: Latest frameworks and best practices, including Next.js 15, React 19, FastAPI, and LangGraph v0.6+
- **Scalable Design**: Architecture ready for enterprise-scale deployment with horizontal scaling capabilities

### System Benefits

**For Users**:

- Intelligent, context-aware responses to complex manufacturing standards queries
- Real-time streaming with immediate feedback and progress visibility
- Multi-language support with automatic browser detection
- Persistent conversation history across sessions

**For Developers**:

- Clear, maintainable architecture with excellent documentation
- Comprehensive testing framework with unit and integration tests
- Configuration-driven deployment with environment flexibility
- Modern development tools and practices

**For Operations**:

- Docker-based deployment for consistency across environments
- Comprehensive monitoring and alerting capabilities
- Graceful degradation and fault tolerance
- Automated scaling and load balancing

### Design Excellence

The system demonstrates several aspects of excellent software design:

1. **Modularity**: Each component has a single, well-defined responsibility
2. **Extensibility**: New agents, tools, and features can be added without breaking existing functionality
3. **Reliability**: Multiple layers of error handling and fallback mechanisms
4. **Performance**: Optimized for both latency and throughput with streaming responses
5. **Security**: Multi-layered security architecture following industry best practices
6. **Maintainability**: Clean code structure with comprehensive documentation and testing

This architecture provides a solid foundation for current requirements while remaining flexible enough to accommodate future growth and enhancement. The system successfully combines the power of retrieval-augmented generation with intelligent agent orchestration to provide accurate, grounded, and citable responses to complex manufacturing standards queries.
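### Appendix: Stream Consumption Sketch

To make the Streaming Data Flow section concrete, the sketch below folds the stream events listed there into a final response string on the consuming side. It is illustrative only: the event names (`text-delta`, `tool-output-available`, `finish`) come from this document, while the wire format — SSE `data:` lines carrying JSON with `type` and `delta` fields, terminated by a `[DONE]` sentinel — is an assumption about how the Data Stream Protocol is serialized here.

```python
import json

def consume_stream(lines):
    """Fold assumed Data Stream Protocol events into the final response text."""
    text_parts = []
    for raw in lines:
        if not raw.startswith("data: "):
            continue  # skip SSE comments and keep-alive lines
        payload = raw[len("data: "):]
        if payload == "[DONE]":
            break  # assumed control sentinel: stream complete
        event = json.loads(payload)
        kind = event.get("type")
        if kind == "text-delta":
            # accumulate streamed response tokens
            text_parts.append(event.get("delta", ""))
        elif kind == "tool-output-available":
            pass  # a real UI would render the tool result here
    return "".join(text_parts)

# Example with a hand-written stream (hypothetical payloads):
sample = [
    'data: {"type": "text-start"}',
    'data: {"type": "text-delta", "delta": "ISO 9001 "}',
    'data: {"type": "text-delta", "delta": "applies here."}',
    'data: {"type": "text-end"}',
    'data: {"type": "finish"}',
    "data: [DONE]",
]
print(consume_stream(sample))  # ISO 9001 applies here.
```

The frontend's Data Stream Runtime performs the equivalent routing per event type, updating UI components and tool visualizations as events arrive rather than concatenating text after the fact.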