catonline_ai/vw-agentic-rag/docs/CHANGELOG.md
2025-09-26 17:15:54 +08:00
# Changelog
## v1.2.8 - Enhanced Agentic Workflow and Citation Management Documentation - Thu Sep 12 2025
### 📋 **Documentation** *(Design Document Enhancement)*
**Enhanced the system design documentation with detailed coverage of Agentic Workflow features and advanced citation management capabilities.**
#### Changes Made:
**1. Agentic Workflow Features Enhancement**:
- **Enhanced**: Agentic Workflow Features Demonstrated section with comprehensive query rewriting/decomposition coverage
- **Added**: Detailed "Query Rewriting/Decomposition in Agentic Workflow" section highlighting core intelligence features
- **Added**: "Citation Management in Agentic Workflow" section documenting advanced citation capabilities
- **Updated**: Workflow diagrams to explicitly show query rewriting and citation processing flows
**2. Citation Management Documentation**:
- **Enhanced**: Citation tracking and management documentation with controllable citation lists and links
- **Added**: Detailed citation processing workflow with real-time capture and quality validation
- **Updated**: Tool system architecture to show query processing pipeline integration
- **Added**: Multi-round citation coherence and cross-tool citation integration documentation
**3. Technical Architecture Updates**:
- **Updated**: Sequence diagrams to show query rewriter components and parallel execution
- **Enhanced**: Tool system architecture with query processing strategies
- **Added**: Domain-specific intelligence documentation for different query types
- **Updated**: Cross-agent learning documentation with advanced agentic intelligence features
**4. Design Principles Refinement**:
- **Updated**: Core feature list to highlight controllable citation management
- **Enhanced**: Query processing integration documentation
- **Added**: Strategic citation assignment and post-processing enhancement details
- **Updated**: System benefits documentation to reflect enhanced capabilities
---
## v1.2.7 - Comprehensive System Design Documentation - Tue Sep 10 2025
### 📋 **Documentation** *(System Architecture & Design Documentation)*
**Created comprehensive system design documentation with detailed architectural diagrams and design explanations.**
#### Changes Made:
**1. System Design Document Creation**:
- **Created**: `docs/design.md` - Complete architectural design documentation
- **Architecture Diagrams**: 15+ mermaid diagrams covering all system aspects
- **Design Explanations**: Detailed design principles and implementation rationale
- **Comprehensive Coverage**: All system layers from frontend to infrastructure
**2. Architecture Documentation**:
- **High-Level Architecture**: Multi-layer system overview with component relationships
- **Component Architecture**: Detailed breakdown of frontend, backend, and agent components
- **Workflow Design**: Multi-intent agent workflows and two-phase retrieval strategy
- **Data Flow Architecture**: Request-response flows and streaming data patterns
**3. Feature & System Documentation**:
- **Feature Architecture**: Core capabilities and tool system design
- **Memory Management**: PostgreSQL-based session persistence architecture
- **Configuration Architecture**: Layered configuration management approach
- **Security Architecture**: Multi-layered security implementation
**4. Deployment & Performance Documentation**:
- **Deployment Architecture**: Production deployment patterns and container architecture
- **Performance Architecture**: Optimization strategies across all system layers
- **Technology Stack**: Complete technology selection rationale and integration
- **Future Enhancements**: Roadmap and enhancement strategy
#### Documentation Features:
**Visual Architecture**:
- **15+ Mermaid Diagrams**: Comprehensive visual representation of system architecture
- **Component Relationships**: Clear visualization of component interactions
- **Data Flow Patterns**: Detailed request-response and streaming flow diagrams
- **Deployment Topology**: Production deployment and scaling architecture
**Design Explanations**:
- **Design Philosophy**: Core principles driving architectural decisions
- **Implementation Rationale**: Detailed explanation of design choices
- **Best Practices**: Production-ready patterns and recommendations
- **Performance Considerations**: Optimization strategies and trade-offs
**Comprehensive Coverage**:
- **Frontend Architecture**: Next.js, React, and assistant-ui integration
- **Backend Architecture**: FastAPI, LangGraph, and agent orchestration
- **Data Architecture**: PostgreSQL memory, Azure AI Search, and LLM integration
- **Infrastructure Architecture**: Cloud deployment, security, and monitoring
#### Technical Documentation:
**System Layers Documented**:
```
- Frontend Layer: Next.js Web UI, Thread Components, Tool UIs
- API Gateway Layer: Next.js API Routes, Data Stream Protocol
- Backend Service Layer: FastAPI Server, AI SDK Adapter, SSE Controller
- Agent Orchestration Layer: LangGraph Workflow, Intent Recognition, Agents
- Memory Layer: PostgreSQL Session Store, Checkpointer, Memory Manager
- Retrieval Layer: Azure AI Search, Embedding Service, Search Indices
- LLM Layer: LLM Provider, Configuration Management
```
**Key Architectural Patterns**:
- **Multi-Intent Agent System**: Intent recognition and specialized agent routing
- **Two-Phase Retrieval**: Metadata discovery followed by content retrieval
- **Streaming Architecture**: Real-time SSE with tool progress tracking
- **Session Memory**: PostgreSQL-based persistent conversation history
- **Tool System**: Modular, composable retrieval and analysis tools
#### Benefits:
**For Development Team**:
- **Clear Architecture Understanding**: Complete system overview for new team members
- **Design Rationale**: Understanding of architectural decisions and trade-offs
- **Implementation Guidance**: Best practices and patterns for future development
- **Maintenance Support**: Clear documentation for troubleshooting and updates
**For System Architecture**:
- **Documentation Standards**: Establishes pattern for future architectural documentation
- **Design Consistency**: Ensures architectural decisions align with documented principles
- **Knowledge Preservation**: Captures institutional knowledge about system design
- **Future Planning**: Provides foundation for system evolution and enhancement
**For Operations**:
- **Deployment Understanding**: Clear view of production architecture and dependencies
- **Troubleshooting Guide**: Architectural context for debugging and issue resolution
- **Scaling Guidance**: Understanding of system scaling patterns and limitations
- **Security Overview**: Complete security architecture and implementation details
#### File Structure:
```
docs/
├── design.md # Comprehensive system design document (NEW)
├── CHANGELOG.md # This changelog with design documentation entry
├── deployment.md # Deployment-specific guidance
├── development.md # Development setup and guidelines
└── testing.md # Testing strategies and procedures
```
#### Next Steps:
- **Living Documentation**: Keep design document updated with system changes
- **Architecture Reviews**: Use document as reference for architectural decisions
- **Onboarding**: Include design document in new developer onboarding process
- **Documentation Standards**: Apply similar documentation patterns to other system aspects
---
## v1.2.6 - GPT-5 Model Integration and Prompt Template Refinement - Mon Sep 9 2025
### 🚀 **Major Update** *(Model Integration & Enhanced Agent Capabilities)*
**Integrated GPT-5 Chat model with refined prompt templates for improved reasoning and tool coordination.**
#### Changes Made:
**1. GPT-5 Model Integration**:
- **Model Upgrade**: Switched from GPT-4o to `gpt-5-chat` deployment
- **Azure Endpoint**: Updated to `aihubeus21512504059.cognitiveservices.azure.com`
- **API Version**: Upgraded to `2024-12-01-preview` for latest capabilities
- **Enhanced Reasoning**: Leveraging GPT-5's improved reasoning for complex multi-step retrieval
**2. Prompt Template Optimization for GPT-5**:
- **Tool Coordination**: Enhanced instructions for better parallel tool execution
- **Context Management**: Optimized for GPT-5's extended context handling capabilities
- **Reasoning Chain**: Improved workflow instructions leveraging advanced reasoning abilities
**3. Agent System Refinements**:
- **Phase Detection**: Better triggering conditions for Phase 2 document content retrieval
- **Query Rewriting**: Enhanced sub-query generation strategies optimized for GPT-5
- **Citation Accuracy**: Improved metadata tracking and source verification
#### Technical Implementation:
**Updated [`config.yaml`](config.yaml)**:
```yaml
azure:
  base_url: https://aihubeus21512504059.cognitiveservices.azure.com/
  api_key: "<redacted>"
  api_version: 2024-12-01-preview
  deployment: gpt-5-chat
```
**Enhanced [`llm_prompt.yaml`](llm_prompt.yaml)** - Phase 2 Triggers:
```yaml
# Phase 2: Document Content Detailed Retrieval
- **When to execute**: execute Phase 2 if the user asks about:
  - "How to..." / "如何..." (procedures, methods, steps)
  - Testing methods / 测试方法
  - Requirements / 要求
  - Technical details / 技术细节
  - Implementation guidance / 实施指导
  - Specific content within standards/regulations
```
**Tool Coordination Instructions**:
```yaml
# Parallel Retrieval Tool Call:
- Use each rewritten sub-query to call retrieval tools **in parallel**
- This maximizes coverage and ensures comprehensive information gathering
```
#### Key Features:
**GPT-5 Enhanced Capabilities**:
- **Advanced Reasoning**: Better understanding of complex technical queries
- **Improved Tool Coordination**: More efficient parallel tool execution planning
- **Enhanced Context Synthesis**: Better integration of multi-source information
- **Precise Citation Generation**: More accurate source tracking and reference mapping
**Optimized Retrieval Strategy**:
- **Smart Phase Detection**: GPT-5 better determines when detailed content retrieval is needed
- **Context-Aware Queries**: More sophisticated query rewriting based on conversation context
- **Cross-Reference Validation**: Enhanced ability to verify information across multiple sources
**Enhanced User Experience**:
- **Faster Response**: More efficient tool coordination reduces overall response time
- **Higher Accuracy**: Improved reasoning leads to more precise answers
- **Better Coverage**: Enhanced query strategies maximize information discovery
#### Performance Improvements:
- **Tool Efficiency**: Better parallel execution planning reduces redundant calls
- **Context Utilization**: Enhanced ability to maintain context across tool rounds
- **Quality Assurance**: Improved verification and synthesis of retrieved information
#### Migration Notes:
- **Seamless Upgrade**: No breaking changes to existing API or user interfaces
- **Backward Compatibility**: Existing conversation histories remain compatible
- **Enhanced Responses**: Users will notice improved response quality and accuracy
- **Tool Round Optimization**: GPT-5's reasoning works optimally with configured tool round limits
---
## v1.2.5 - Enhanced Multi-Phase Retrieval and Tool Round Optimization - Thu Sep 5 2025
### 🔧 **Enhancement** *(Agent System Prompt & Retrieval Strategy)*
**Optimized retrieval workflow with explicit parallel tool calling strategy and enhanced multi-language query coverage.**
#### Changes Made:
**1. Enhanced Multi-Phase Retrieval Strategy**:
- **Phase 1 - Metadata Discovery**: Added explicit "2-3 parallel rewritten queries" strategy for standards/regulations metadata discovery
- **Phase 2 - Document Content**: Refined detailed retrieval with "2-3 parallel rewritten queries with different content focus"
- **Cross-Language Coverage**: Mandatory inclusion of both Chinese and English query variants for comprehensive search coverage
**2. Parallel Tool Calling Optimization**:
- **Query Strategy Specification**: Clear guidance on generating 2-3 distinct parallel sub-queries per retrieval phase
- **Azure AI Search Optimization**: Enhanced for Hybrid Search (keyword + vector search) with specific terminology and synonyms
- **Tool Calling Efficiency**: Explicit instruction to execute rewritten sub-queries in parallel for maximum coverage
**3. Intent Classification Improvements**:
- **Standard_Regulation_RAG**: Enhanced examples covering content, scope, testing methods, and technical details
- **User_Manual_RAG**: Comprehensive coverage of CATOnline system usage, TRRC processes, and administrative functions
- **Clearer Boundaries**: Better distinction between technical content queries vs system usage queries
**4. User Manual Prompt Refinement**:
- **Evidence-Based Only**: Strengthened directive for 100% grounded responses from user manual content
- **Visual Integration**: Enhanced screenshot embedding requirements with strict formatting templates
- **Context Disambiguation**: Added role-based function differentiation (User vs Administrator)
#### Technical Implementation:
**Updated [`llm_prompt.yaml`](llm_prompt.yaml)** - Agent System Prompt:
```yaml
# Query Optimization & Parallel Retrieval Tool Calling
* Sub-queries Rewriting:
  - Generate 2-3 (mostly 2) distinct rewritten sub-queries
  - If the user's query is in Chinese, include 1 rewritten sub-query in English
  - If the user's query is in English, include 1 rewritten sub-query in Chinese
* Parallel Retrieval Tool Call:
  - Use each rewritten sub-query to call retrieval tools **in parallel**
  - This maximizes coverage and ensures comprehensive information gathering
```
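The parallel-call directive above can be sketched with `asyncio.gather`; the `retrieve` coroutine here is a hypothetical stand-in for the real retrieval tools in `service/graph/tools.py`:

```python
import asyncio

async def retrieve(sub_query: str) -> list[dict]:
    # Stand-in for a retrieval tool call (hypothetical signature);
    # the real system would hit Azure AI Search here.
    return [{"content": f"result for: {sub_query}", "@order_num": 1}]

async def parallel_retrieval(sub_queries: list[str]) -> list[dict]:
    # Fire all rewritten sub-queries concurrently, then flatten the result lists
    # in sub-query order (asyncio.gather preserves ordering).
    result_lists = await asyncio.gather(*(retrieve(q) for q in sub_queries))
    return [hit for hit_list in result_lists for hit in hit_list]

# e.g. one Chinese and one English variant of the same question
hits = asyncio.run(parallel_retrieval(["电动汽车安全测试方法", "EV safety test methods"]))
```

Running the sub-queries concurrently rather than sequentially is what keeps the extra cross-language coverage from adding latency.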
**Enhanced Intent Classification**:
```yaml
# Standard_Regulation_RAG Examples:
- "What regulations relate to intelligent driving?"
- "How do you test the safety of electric vehicles?"
- "What are the main points of GB/T 34567-2023?"
# User_Manual_RAG Examples:
- What is CATOnline (the system)/TRRC/TRRC processes
- How to search for standards, regulations, TRRC news and deliverables
- User management, system configuration, administrative functionalities
```
**User Manual Prompt Template**:
```yaml
Step Template:
  Step N: <Action / Instruction from manual>
  (Optional short clarification from manual)
  ![Screenshot: <concise caption>](<image_url_or_placeholder>)
  Notes: <business rules / warnings from manual>
```
#### Key Features:
**Multi-Phase Retrieval Workflow**:
- **Round 1**: Parallel metadata discovery with 2-3 optimized queries
- **Round 2**: Focused document content retrieval based on Round 1 insights
- **Round 3+**: Additional targeted retrieval for remaining gaps
**Cross-Language Query Strategy**:
- **Automatic Translation**: Chinese queries include English variants, English queries include Chinese variants
- **Terminology Optimization**: Technical terms, acronyms, and domain-specific language inclusion
- **Azure AI Search Enhancement**: Optimized for hybrid keyword + vector search capabilities
**Enhanced Citation System**:
- **Metadata Tracking**: Precise @tool_call_id and @order_num mapping
- **CSV Format**: Structured citations mapping in HTML comments
- **Source Verification**: Cross-referencing across multiple retrieval results
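The changelog does not show the exact comment layout, but the `@tool_call_id`/`@order_num` CSV mapping embedded in an HTML comment might look roughly like this (the column names and helper function are illustrative, not the actual implementation):

```python
def build_citation_comment(citations: list[dict]) -> str:
    # Render the citation map as CSV rows inside an HTML comment, so it is
    # machine-readable but invisible in rendered markdown (illustrative format).
    rows = ["citation_id,tool_call_id,order_num"]
    for i, c in enumerate(citations, start=1):
        rows.append(f"{i},{c['tool_call_id']},{c['order_num']}")
    return "<!--\n" + "\n".join(rows) + "\n-->"

comment = build_citation_comment([
    {"tool_call_id": "call_abc", "order_num": 1},
    {"tool_call_id": "call_def", "order_num": 3},
])
```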
#### Benefits:
- **Coverage**: Parallel queries with cross-language variants maximize information discovery
- **Efficiency**: Strategic tool calling reduces unnecessary rounds while ensuring thoroughness
- **Accuracy**: Enhanced intent classification improves routing to appropriate RAG systems
- **User Experience**: Better visual integration in user manual responses with mandatory screenshots
- **Consistency**: Standardized formatting templates across all response types
#### Migration Notes:
- Enhanced prompt templates automatically improve response quality
- No breaking changes to existing API or user interfaces
- Cross-language query strategy improves search coverage for multilingual content
- Tool round limits (max_tool_rounds: 4, max_tool_rounds_user_manual: 2) work optimally with new parallel strategy
---
## v1.2.4 - Intent Classification Reference Consolidation - Wed Sep 4 2025
### 🔧 **Enhancement** *(Intent Classification Documentation)*
**Consolidated and enhanced UserManual intent classification examples by merging reference files.**
#### Changes Made:
- **Reference File Consolidation**: Merged UserManual examples from `intent-ref-1.txt` into `intent-ref-2.txt`
- **Enhanced Coverage**: Added more comprehensive use cases for UserManual intent classification
- **Improved Clarity**: Better organized examples to help with accurate intent recognition
#### Technical Implementation:
**Updated `.vibe/ref/intent-ref-2.txt`**:
- **Added from intent-ref-1.txt**:
- What is CATOnline (the system), TRRC, TRRC processes
- How to search for standards, regulations, TRRC news and deliverables in the system
- How to create and update standards, regulations and their documents
- How to download or export data
- How to do administrative functionalities
- Other questions about this (CatOnline) system's functions, or user guide
- **Preserved existing examples**:
- Questions directly about CatOnline functions or features
- TRRC-related processes/standards/regulations as implemented in CatOnline
- How to manage/search/download documents in the system
- User management or system configuration within CatOnline
- Use of admin features or data export in CatOnline
#### Categories Covered:
1. **System Introduction**: CATOnline system, TRRC concepts
2. **Search Functions**: Standards, regulations, TRRC news and deliverables search
3. **Document Management**: Create, update, manage, download documents
4. **System Configuration**: User management, system settings
5. **Administrative Functions**: Admin features, data export
6. **General Help**: System functions, user guides
#### Benefits:
- **Accuracy**: More comprehensive examples improve intent classification precision
- **Coverage**: Better coverage of UserManual use cases
- **Consistency**: Unified reference documentation for intent classification
- **Maintainability**: Single consolidated reference file easier to maintain
## v1.2.3 - User Manual Screenshot Format Clarification - Tue Sep 3 2025
### 🔧 **Enhancement** *(User Manual Prompt Refinement)*
**Added explicit clarification about UI screenshot embedding format in user manual responses.**
#### Changes Made:
- **Screenshot Format Guidance**: Added specific instruction about how UI screenshots should be embedded
- **Format Specification**: Clarified that operational UI screenshots are typically embedded in explanatory text using markdown image format
#### Technical Implementation:
**Updated `llm_prompt.yaml` - User Manual Prompt**:
```yaml
- **Visuals First**: ALWAYS include screenshots for explaining features or procedures. Every instructional step must be immediately followed by its screenshot on a new line.
- **Screenshot Format**: 操作步骤的相关UI截图通常会以markdown图片格式嵌入到说明文字中  # (UI screenshots for operational steps are typically embedded in the explanatory text as markdown images)
```
#### Benefits:
- **Clarity**: AI assistant now has explicit guidance on screenshot embedding format
- **Consistency**: Ensures uniform approach to including UI screenshots in responses
- **User Experience**: Improves the formatting and presentation of instructional content
## v1.2.2 - Prompt Enhancement for Knowledge Boundary Control - Tue Sep 3 2025
### 🔧 **Enhancement** *(LLM Prompt Optimization)*
**Enhanced LLM prompts to strictly prevent model from outputting general knowledge when retrieval yields insufficient results.**
#### Problem Addressed:
- AI assistant was outputting model's built-in general knowledge about topics when specific information wasn't found in retrieval
- Users received generic information about systems/concepts instead of clear "information not available" responses
- Example: When asked about "CATOnline system", AI would provide general CAT (Computer-Assisted Testing) information from its training data
#### Solution Implemented:
- **Enhanced Agent System Prompt**: Added explicit "NO GENERAL KNOWLEDGE" directive
- **Enhanced User Manual Prompt**: Added similar strict knowledge boundary controls
- **Improved Fallback Messages**: Standardized response template for insufficient information scenarios
- **Multiple Reinforcement**: Added the restriction in multiple sections for emphasis
#### Technical Changes:
**Enhanced `llm_prompt.yaml`**:
- Added **"Critical: NO GENERAL KNOWLEDGE"** instruction in agent system prompt
- Enhanced fallback response template: "The system does not contain specific information about [specific topic/feature searched for]."
- Added similar controls in user manual prompt with template: "The user manual does not contain specific information about [specific topic/feature you searched for]."
- Reinforced the restriction in multiple workflow sections
#### Key Prompt Updates:
**Agent System Prompt**:
```yaml
* **Critical: NO GENERAL KNOWLEDGE**: If retrieval yields insufficient or no relevant results, **do not provide any general knowledge or assumptions**. Instead, clearly state "The system does not contain specific information about [specific topic/feature searched for]." and suggest how the user might reformulate their query.
```
**User Manual Prompt**:
```yaml
- **NO GENERAL KNOWLEDGE**: When retrieved content is insufficient, do NOT provide any general knowledge about systems, software, or common practices. State clearly: "The user manual does not contain specific information about [specific topic/feature you searched for]."
```
#### Benefits:
- **Accuracy**: Eliminates confusion from generic information
- **Transparency**: Users clearly understand when information is not available in the system
- **Trust**: Builds user confidence in system's knowledge boundaries
- **Guidance**: Provides clear direction for reformulating queries
#### Testing:
- Verified all prompt sections contain the new "NO GENERAL KNOWLEDGE" instructions
- Confirmed fallback message templates are properly implemented
- Tested that both agent and user manual prompts include the restrictions
## v1.2.1 - Retrieval Module Refactoring and Optimization - Mon Sep 2 2025
### 🔧 **Refactoring** *(Retrieval Module Structure Optimization)*
**Refactored retrieval module structure and optimized normalize_search_result function for better maintainability and performance.**
#### Key Changes:
- **File Renaming**: `service/retrieval/agentic_retrieval.py` → `service/retrieval/retrieval.py` for clearer naming
- **Function Optimization**: Simplified `normalize_search_result` by removing unnecessary `include_content` parameter
- **Logic Consolidation**: Moved result normalization to `search_azure_ai` method to eliminate redundancy
- **Import Updates**: Updated all references across the codebase to use the new module name
#### Technical Implementation:
- **Simplified normalize_search_result**:
- Removed `include_content` parameter (content is now always preserved)
- Function now focuses solely on cleaning search results and removing empty fields
- Eliminates the need for conditional content handling
- **Optimized Result Processing**:
- `normalize_search_result` is now called directly in `search_azure_ai` method
- Removed duplicate field removal logic between `search_azure_ai` and `normalize_search_result`
- Cleaner separation of concerns
- **Updated File References**:
- `service/graph/tools.py`
- `service/graph/user_manual_tools.py`
- `tests/unit/test_retrieval.py`
- `tests/unit/test_user_manual_tool.py`
- `tests/conftest.py`
- `scripts/debug_user_manual_retrieval.py`
- `scripts/final_verification.py`
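The simplified function might look like this sketch (field names are taken from this changelog; the real implementation lives in `service/retrieval/retrieval.py` and may differ):

```python
def normalize_search_result(hit: dict) -> dict:
    # Drop Azure Search internals and empty fields; content is always preserved,
    # so no include_content flag is needed anymore.
    internal = {"@search.score", "@search.rerankerScore", "@search.captions"}
    return {
        k: v for k, v in hit.items()
        if k not in internal and v not in (None, "", [], {})
    }

clean = normalize_search_result({
    "content": "Clause 5.2 test procedure ...",
    "title": "",                 # empty -> removed
    "@search.score": 0.83,       # internal -> removed
    "@order_num": 1,             # kept: ranking order survives normalization
})
```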
#### Benefits:
- **Cleaner Code**: Eliminated redundant logic and simplified function signatures
- **Better Performance**: Single point of result normalization reduces processing overhead
- **Improved Maintainability**: Clearer module naming and consolidated logic
- **Consistent Behavior**: Content is always preserved, eliminating conditional handling complexity
#### Testing:
- Updated all test cases to match new function signatures
- Verified that all retrieval functionality works correctly
- Confirmed that result normalization properly removes unwanted fields while preserving content
## v1.2.0 - Azure AI Search Direct Integration - Mon Sep 2 2025
### ⚡ **Major Enhancement** *(Direct Azure AI Search Integration)*
**Replaced intermediate retrieval service with direct Azure AI Search REST API calls for improved performance and better control.**
#### Key Changes:
- **Direct Azure AI Search Integration**: Eliminated dependency on intermediate retrieval service, now calling Azure AI Search REST API directly
- **Hybrid Search with Semantic Ranking**: Implemented proper hybrid search combining text search + vector search with semantic ranking
- **Enhanced Result Processing**: Added automatic filtering by `@search.rerankerScore` threshold and `@order_num` field injection
- **Improved Configuration**: Extended config structure to support embedding service, API versions, and semantic configuration
#### Technical Implementation:
- **New Config Structure**: Added `EmbeddingConfig`, `IndexConfig` to support embedding generation and Azure Search parameters
- **Vector Query Support**: Implemented proper vector queries with field-specific targeting:
- `retrieve_standard_regulation`: `full_metadata_vector`
- `retrieve_doc_chunk_standard_regulation`: `contentVector,full_metadata_vector`
- `retrieve_doc_chunk_user_manual`: `contentVector`
- **Result Filtering**: Automatic removal of Azure Search metadata fields (`@search.score`, `@search.rerankerScore`, `@search.captions`)
- **Order Numbering**: Added `@order_num` field to track result ranking order
- **Score Threshold Filtering**: Filter results by reranker score threshold for quality control
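As a hedged sketch of the direct REST call plus the post-processing described above: the URL path, `vectorQueries` body, and `semantic` query type follow the public Azure AI Search REST API, while the function names and the 1.5 threshold are placeholders, not values from this codebase:

```python
import json
import urllib.request

def postprocess(hits: list[dict], threshold: float) -> list[dict]:
    # Keep only hits at or above the reranker-score threshold,
    # then tag each surviving hit with its rank order.
    kept = [h for h in hits if h.get("@search.rerankerScore", 0.0) >= threshold]
    for i, h in enumerate(kept, start=1):
        h["@order_num"] = i
    return kept

def hybrid_search(endpoint: str, index: str, api_key: str, query: str,
                  embedding: list[float], vector_fields: str,
                  reranker_threshold: float = 1.5) -> list[dict]:
    # Hybrid search: "search" is the keyword leg, "vectorQueries" the vector leg,
    # and queryType=semantic applies the semantic reranker on top.
    body = {
        "search": query,
        "vectorQueries": [{
            "kind": "vector",
            "vector": embedding,
            "fields": vector_fields,   # e.g. "contentVector,full_metadata_vector"
            "k": 10,
        }],
        "queryType": "semantic",
        "semanticConfiguration": "default",
        "top": 10,
    }
    req = urllib.request.Request(
        f"{endpoint}/indexes/{index}/docs/search?api-version=2024-11-01-preview",
        data=json.dumps(body).encode("utf-8"),
        headers={"api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        hits = json.load(resp)["value"]
    return postprocess(hits, reranker_threshold)
```

Keeping `postprocess` separate from the HTTP call makes the threshold filtering and `@order_num` injection testable without a live search endpoint.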
#### Configuration Updates:
```yaml
retrieval:
  endpoint: "https://search-endpoint.search.azure.cn"
  api_key: "search-api-key"
  api_version: "2024-11-01-preview"
  semantic_configuration: "default"
  embedding:
    base_url: "http://embedding-service/v1-openai"
    api_key: "embedding-api-key"
    model: "qwen3-embedding-8b"
    dimension: 4096
  index:
    standard_regulation_index: "index-name-1"
    chunk_index: "index-name-2"
    chunk_user_manual_index: "index-name-3"
```
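One plausible shape for the `EmbeddingConfig` and `IndexConfig` structures named above, mirroring the YAML fields (the real classes live in the service's config module and may differ):

```python
from dataclasses import dataclass

@dataclass
class EmbeddingConfig:
    base_url: str
    api_key: str
    model: str
    dimension: int

@dataclass
class IndexConfig:
    standard_regulation_index: str
    chunk_index: str
    chunk_user_manual_index: str

@dataclass
class RetrievalConfig:
    endpoint: str
    api_key: str
    api_version: str
    semantic_configuration: str
    embedding: EmbeddingConfig
    index: IndexConfig

def load_retrieval_config(raw: dict) -> RetrievalConfig:
    # Build the typed config from a parsed `retrieval:` YAML section.
    return RetrievalConfig(
        endpoint=raw["endpoint"],
        api_key=raw["api_key"],
        api_version=raw["api_version"],
        semantic_configuration=raw["semantic_configuration"],
        embedding=EmbeddingConfig(**raw["embedding"]),
        index=IndexConfig(**raw["index"]),
    )
```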
#### Benefits:
- **Performance**: Eliminated intermediate service latency
- **Control**: Direct control over search parameters and result processing
- **Reliability**: Reduced dependencies and potential points of failure
- **Feature Support**: Full access to Azure AI Search capabilities including semantic ranking
#### Testing:
- Updated unit tests to work with new Azure AI Search implementation
- Verified hybrid search functionality with real Azure AI Search endpoints
- Confirmed proper result filtering and ordering
## v1.1.9 - Intent Recognition Structured Output Compatibility Fix - Mon Sep 2 2025
### 🔧 **Bug Fix** *(Intent Recognition Compatibility)*
**Fixed intent recognition error for models that don't support OpenAI's structured output format (json_schema).**
#### Problem Addressed:
- Intent recognition failed with error: "Invalid parameter: 'response_format' of type 'json_schema' is not supported with this model"
- DeepSeek and other non-OpenAI models don't support OpenAI's structured output feature
- System would default to Standard_Regulation_RAG but log errors continuously
#### Root Cause:
- `intent_recognition_node` used `llm_client.llm.with_structured_output(Intent)` which automatically adds `json_schema` response_format
- This feature is specific to OpenAI GPT models and not supported by DeepSeek, Claude, or other model providers
#### Solution:
- **Removed structured output dependency**: Replaced `with_structured_output()` with standard LLM calls
- **Enhanced text parsing**: Added robust response parsing to extract intent labels from text responses
- **Improved prompt engineering**: Added explicit output format instructions to system prompt
- **Enhanced error handling**: Better handling of different response content types (string/list)
#### Technical Changes:
**Modified**: `service/graph/intent_recognition.py`
```python
# Before (broken with non-OpenAI models):
intent_llm = llm_client.llm.with_structured_output(Intent)
intent_result = await intent_llm.ainvoke([SystemMessage(content=system_prompt)])

# After (compatible with all models):
system_prompt = (
    intent_prompt_template.format(...)
    + "\n\nIMPORTANT: You must respond with ONLY one of these two exact labels: "
    + "'Standard_Regulation_RAG' or 'User_Manual_RAG'. Do not include any other text."
)
intent_result = await llm_client.llm.ainvoke([SystemMessage(content=system_prompt)])

# Enhanced response parsing
if isinstance(intent_result.content, str):
    response_text = intent_result.content.strip()
elif isinstance(intent_result.content, list):
    response_text = " ".join(
        str(item) for item in intent_result.content if isinstance(item, str)
    ).strip()
```
#### Key Improvements:
**Model Compatibility**:
- Works with all LLM providers (OpenAI, Azure OpenAI, DeepSeek, Claude, etc.)
- No dependency on provider-specific features
- Maintains accuracy through enhanced prompt engineering
**Error Resolution**:
- Eliminated "json_schema not supported" errors
- Improved system reliability and user experience
- Maintained intent classification accuracy
**Robustness**:
- Better handling of different response formats
- Fallback mechanisms for unparseable responses
- Enhanced logging for debugging
#### Testing:
- ✅ Standard regulation queries correctly classified as `Standard_Regulation_RAG`
- ✅ User manual queries correctly classified as `User_Manual_RAG`
- ✅ Compatible with DeepSeek, Azure OpenAI, and other model providers
- ✅ No more structured output errors in logs
---
## v1.1.8 - User Manual Prompt Anti-Hallucination Enhancement - Sun Sep 1 2025
### 🧠 **Prompt Engineering Enhancement** *(User Manual Anti-Hallucination)*
**Enhanced the user_manual_prompt to reduce hallucinations by adopting grounded response principles from agent_system_prompt.**
#### Problem Addressed:
- User manual assistant could speculate about undocumented system features
- Inconsistent handling of missing information compared to main agent prompt
- Less structured approach to failing gracefully when manual information was insufficient
- Potential for inferring functionality not explicitly documented in user manuals
#### Solution:
- **Grounded Response Principles**: Adopted evidence-based response requirements from agent_system_prompt
- **Enhanced Fail-Safe Mechanisms**: Implemented comprehensive "No-Answer with Suggestions" framework
- **Explicit Anti-Speculation**: Added clear prohibitions against guessing or inferring undocumented features
- **Consistent Evidence Requirements**: Aligned with main agent prompt's evidence standards
#### Technical Changes:
**Modified**: `llm_prompt.yaml` - `user_manual_prompt`
```yaml
# Enhanced Core Directives
- **Answer with evidence** from retrieved user manual sources; avoid speculation.
  Never guess or infer functionality not explicitly documented.
- **Fail gracefully**: if retrieval yields insufficient or no relevant results,
  **do not guess**—produce a clear *No-Answer with Suggestions* section.

# Enhanced Workflow - Verify & Synthesize
- Cross-check all retrieved information for consistency.
- Only include information supported by retrieved user manual evidence.
- If evidence is insufficient, follow the *No-Answer with Suggestions* approach.

# Added No-Answer Framework
When retrieved user manual content is insufficient:
- State clearly what specific information is missing
- Do not guess or provide information not explicitly found
- Provide constructive next steps and alternative approaches
```
#### Key Improvements:
**Evidence Requirements**:
- Enhanced from basic "Evidence-Based Only" to comprehensive evidence validation
- Added explicit prohibition against speculation and inference
- Aligned with agent_system_prompt's grounded response standards
**Graceful Failure Handling**:
- Upgraded from simple "state it clearly" to structured "No-Answer with Suggestions"
- Provides specific guidance for reformulating queries
- Offers constructive next steps when information is missing
**Anti-Hallucination Measures**:
- ✅ Grounded responses principle
- ✅ No speculation directive
- ✅ Explicit no-guessing rule
- ✅ Evidence-only responses
- ✅ Constructive suggestions framework
#### Consistency Achievement:
- **Unified Approach**: Same evidence standards across agent_system_prompt and user_manual_prompt
- **Standardized Failure Handling**: Consistent "No-Answer with Suggestions" methodology
- **Preserved Specialization**: Maintained user manual specific features (screenshots, step-by-step format)
#### Files Added:
- `docs/topics/USER_MANUAL_PROMPT_ANTI_HALLUCINATION.md` - Detailed technical documentation
- `scripts/test_user_manual_prompt_improvements.py` - Comprehensive validation test suite
#### Expected Benefits:
- **Reduced Hallucinations**: No speculation about undocumented CATOnline features
- **Improved Reliability**: More accurate step-by-step instructions based only on manual content
- **Better User Guidance**: Structured suggestions when manual information is incomplete
- **System Consistency**: Unified anti-hallucination approach across all prompt types
---
## v1.1.7 - GPT-5 Mini Temperature Parameter Fix - Sun Sep 1 2025
### 🔧 **LLM Compatibility Fix** *(GPT-5 Mini Temperature Support)*
**Fixed temperature parameter handling to support GPT-5 mini model which only accepts default temperature values.**
#### Problem Solved:
- GPT-5 mini model rejected requests with explicit `temperature` parameter (e.g., 0.0, 0.2)
- Error: "Unsupported value: 'temperature' does not support 0.0 with this model. Only the default (1) value is supported."
- System always passed temperature even when commented out in configuration
#### Solution:
- **Conditional parameter passing**: Only include `temperature` in LLM requests when explicitly set in configuration
- **Optional configuration**: Changed temperature from required to optional in both new and legacy config classes
- **Model default usage**: When temperature not specified, model uses its own default value
#### Technical Changes:
**Modified**: `service/config.py`
```python
# Changed temperature from required to optional
from typing import Any, Dict, Optional

from pydantic import BaseModel

class LLMParametersConfig(BaseModel):
    temperature: Optional[float] = None  # Was: float = 0

class LLMRagConfig(BaseModel):
    temperature: Optional[float] = None  # Was: float = 0.2

# Only include temperature in config when explicitly set
def get_llm_config(self) -> Dict[str, Any]:
    base_config = {...}  # base_url, api_key, model, etc. (assembled earlier)
    if self.llm_prompt.parameters.temperature is not None:
        base_config["temperature"] = self.llm_prompt.parameters.temperature
    return base_config
```
**Modified**: `service/llm_client.py`
```python
# Only pass temperature parameter when present in config
from langchain_openai import ChatOpenAI

def _create_llm(self):
    # llm_config is produced by the config layer's get_llm_config() shown above
    params = {
        "base_url": llm_config["base_url"],
        "api_key": llm_config["api_key"],
        "model": llm_config["model"],
        "streaming": True,
    }
    # Only add temperature if explicitly set
    if "temperature" in llm_config:
        params["temperature"] = llm_config["temperature"]
    return ChatOpenAI(**params)
```
#### Configuration Examples:
**No Temperature (Uses Model Default)**:
```yaml
# llm_prompt.yaml
parameters:
# temperature: 0 # Commented out - model uses default
max_context_length: 100000
```
**Explicit Temperature**:
```yaml
# llm_prompt.yaml
parameters:
temperature: 0.7 # Will be passed to model
max_context_length: 100000
```
#### Backward Compatibility:
- ✅ Existing configurations continue to work
- ✅ Legacy `config.yaml` LLM settings still supported
- ✅ No breaking changes when temperature is explicitly set
#### Files Added:
- `docs/topics/GPT5_MINI_TEMPERATURE_FIX.md` - Detailed technical documentation
- `scripts/test_temperature_fix.py` - Comprehensive test suite
---
## v1.1.6 - Enhanced I18n Multi-Language Support - Sat Aug 31 2025
### 🌐 **Internationalization Enhancement** *(I18n Multi-Language Support)*
**Added comprehensive internationalization (i18n) support for Chinese and English languages across the web interface.**
---
## v1.1.5 - Aggressive Tool Call History Trimming - Sat Aug 31 2025
### 🚀 **Enhanced Token Optimization** *(Aggressive Trimming Strategy)*
**Modified trimming strategy to proactively clean historical tool call results regardless of token count, while protecting current conversation turn's tool calls.**
#### New Behavior:
- **Always trim when multiple tool rounds exist** - regardless of total token count
- **Preserve current conversation turn's tool calls** - never trim active tool execution results
- **Remove historical tool call results** - from previous conversation turns to minimize context pollution
#### Why This Change:
- Historical tool call results accumulate quickly in conversation history
- Large retrieval results consume significant tokens even when total context is manageable
- Proactive trimming prevents context bloat before hitting token limits
- Current tool calls must remain intact for proper agent workflow
#### Technical Implementation:
**Modified**: `service/graph/message_trimmer.py`
- **Enhanced `should_trim()`**: Now triggers when detecting multiple tool rounds (>1), not just on token limit
- **Preserved Strategy**: `_optimize_multi_round_tool_calls()` continues to keep only the most recent tool round
- **Current Turn Protection**: Agent workflow ensures current turn's tool calls are never trimmed during execution
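The trigger described above can be sketched as follows. Plain-dict messages stand in for LangChain message objects, and the function names mirror but are not the actual `message_trimmer.py` API:

```python
def count_tool_rounds(messages):
    """Count assistant messages that issued tool calls; each one marks
    the start of a tool round (dict messages are an illustrative
    simplification of LangChain message objects)."""
    return sum(
        1 for m in messages
        if m.get("role") == "assistant" and m.get("tool_calls")
    )

def should_trim(messages):
    # Trim whenever more than one historical tool round exists,
    # regardless of total token count.
    return count_tool_rounds(messages) > 1

history = [
    {"role": "user", "content": "q1"},
    {"role": "assistant", "tool_calls": [{"name": "retrieve_standard"}]},
    {"role": "tool", "content": "large retrieval payload ..."},
    {"role": "assistant", "tool_calls": [{"name": "retrieve_standard"}]},
    {"role": "tool", "content": "large retrieval payload ..."},
]
print(should_trim(history))      # two rounds -> True
print(should_trim(history[:3]))  # one round -> False
```

Because the check counts rounds rather than tokens, trimming fires as soon as a second historical round appears, before any context limit is approached.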
#### Impact:
- **Proactive Cleanup**: Tool call history cleaned before reaching token limits
- **Context Quality**: Conversation stays focused on recent, relevant context
- **Workflow Protection**: Current tool execution results always preserved
- **Token Efficiency**: Maintains optimal token usage across conversation lifetime
---
## v1.1.4 - Multi-Round Tool Call Token Optimization - Sat Aug 31 2025
### 🚀 **Performance Enhancement** *(Token Optimization)*
**Implemented intelligent token optimization for multi-round tool calling scenarios to significantly reduce LLM context usage.**
#### Problem Solved:
- In multi-round tool calling scenarios, previous rounds' tool call results (ToolMessage) were consuming excessive tokens
- Large JSON responses from retrieval tools accumulated in conversation history
- Token usage could exceed LLM context limits, causing API failures
#### Key Features:
1. **Multi-Round Tool Call Detection**:
- Automatically identifies tool calling rounds in conversation history
- Recognizes patterns of AI messages with tool_calls followed by ToolMessage responses
2. **Intelligent Message Optimization**:
- Preserves system messages and original user queries
- Keeps only the most recent tool calling round for context continuity
- Removes older ToolMessage content that typically contains large response data
3. **Token Usage Reduction**:
- Achieves 60-80% reduction in token usage for multi-round scenarios
- Maintains conversation quality while respecting LLM context constraints
- Prevents API failures due to context length overflow
#### Technical Implementation:
- **File**: `service/graph/message_trimmer.py`
- **New Methods**:
- `_optimize_multi_round_tool_calls()` - Core optimization logic
- `_identify_tool_rounds()` - Tool round pattern recognition
- Enhanced `trim_conversation_history()` - Integrated optimization workflow
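The round-identification and keep-only-latest strategy can be sketched as follows; dict messages and function names are illustrative, not the real implementation:

```python
def identify_tool_rounds(messages):
    """Return (start, end) index spans: one span per assistant message
    with tool_calls plus its following tool-result messages."""
    rounds, i = [], 0
    while i < len(messages):
        m = messages[i]
        if m.get("role") == "assistant" and m.get("tool_calls"):
            j = i + 1
            while j < len(messages) and messages[j].get("role") == "tool":
                j += 1
            rounds.append((i, j))  # half-open span of one tool round
            i = j
        else:
            i += 1
    return rounds

def keep_latest_round(messages):
    """Drop every tool round except the most recent, preserving system
    prompts, user queries, and the final answer."""
    drop = set()
    for start, end in identify_tool_rounds(messages)[:-1]:
        drop.update(range(start, end))
    return [m for k, m in enumerate(messages) if k not in drop]

history = [
    {"role": "user", "content": "What is GB/T 18488?"},
    {"role": "assistant", "tool_calls": [{"name": "retrieve_standard"}]},
    {"role": "tool", "content": "round-1 retrieval payload ..."},
    {"role": "assistant", "tool_calls": [{"name": "retrieve_standard"}]},
    {"role": "tool", "content": "round-2 retrieval payload ..."},
    {"role": "assistant", "content": "final synthesized answer"},
]
print(len(keep_latest_round(history)))  # 4: round 1 dropped, round 2 kept
```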
#### Test Results:
- **Message Reduction**: 60% fewer messages in multi-round scenarios
- **Token Savings**: 70-80% reduction in token consumption
- **Context Preservation**: Maintains conversation flow and quality
#### Configuration:
```yaml
parameters:
max_context_length: 96000 # Configurable context length
# Optimization automatically applies when multiple tool rounds detected
```
#### Benefits:
- **Cost Efficiency**: Significant reduction in LLM API costs
- **Reliability**: Prevents context overflow errors
- **Performance**: Faster processing with smaller context windows
- **Scalability**: Supports longer multi-round conversations
#### Files Modified:
- `service/graph/message_trimmer.py`
- `tests/unit/test_message_trimmer.py`
- `docs/topics/MULTI_ROUND_TOKEN_OPTIMIZATION.md`
- `docs/CHANGELOG.md`
---
## v1.1.3 - UI Text Update - Fri Aug 30 2025
### ✏️ **Content Update** *(UI Improvement)*
**Updated the example questions in the frontend UI.**
#### Changes Made:
- Modified the third and fourth example questions in both Chinese and English in `web/src/utils/i18n.ts` to be more relevant to user needs.
- **Chinese**:
- `根据标准,如何测试电动汽车充电功能的兼容性`
- `如何注册申请CATOnline权限`
- **English**:
- `According to the standard, how to test the compatibility of electric vehicle charging function?`
- `How to register for CATOnline access?`
#### Benefits:
- Provides users with more practical and common question examples.
- Improves user experience by guiding them to ask more effective questions.
#### Files Modified:
- `web/src/utils/i18n.ts`
- `docs/CHANGELOG.md`
---
## v1.1.2 - Prompt Optimization - Fri Aug 30 2025
### 🚀 **Prompt Optimization** *(Prompt Engineering)*
**Optimized and compressed `intent_recognition_prompt` and `user_manual_prompt` in `llm_prompt.yaml`.**
#### Changes Made:
1. **`intent_recognition_prompt`**:
* Condensed background information into key bullet points.
* Refined classification descriptions for clarity.
* Simplified classification guidelines with keyword hints for better decision-making.
2. **`user_manual_prompt`**:
* Elevated key instructions to **Core Directives** for emphasis.
* Streamlined the workflow description.
* Made the **Response Formatting** rules more stringent, especially regarding screenshots.
* Retained the crucial **Context Disambiguation** section.
#### Benefits:
- **Efficiency**: More compact prompts for faster processing.
- **Reliability**: Clearer and more direct instructions reduce the likelihood of incorrect outputs.
- **Maintainability**: Improved structure makes the prompts easier to read and update.
#### Files Modified:
- `llm_prompt.yaml`
- `docs/CHANGELOG.md`
---
## v1.1.1 - User Manual Tool Rounds Configuration - Fri Aug 29 2025
### 🔧 **Configuration Enhancement** *(Configuration Update)*
**Added Independent Tool Rounds Configuration for User Manual RAG**
#### Changes Made:
1. **Configuration Structure**
- Added `max_tool_rounds_user_manual: 3` to `config.yaml`
- Separated user manual agent tool rounds from main agent configuration
- Maintained backward compatibility with existing configuration
2. **Code Updates**
- Updated `AppConfig` class in `service/config.py` to include `max_tool_rounds_user_manual` field
- Added `max_tool_rounds_user_manual` to `AgentState` in `service/graph/state.py`
- Modified `service/graph/user_manual_rag.py` to use separate configuration
- Updated graph initialization in `service/graph/graph.py` to include new config
3. **Prompt System Updates**
- Updated `user_manual_prompt` in `llm_prompt.yaml`:
- Removed citation-related instructions (no [1] citations or citation mapping)
- Set all rewritten queries to use English language
- Streamlined response format without citation requirements
#### Technical Details:
- **Configuration Priority**: State-level config takes precedence over file config
- **Independent Configuration**: User manual agent now has its own `max_tool_rounds_user_manual` setting
- **Default Values**: Both main agent (3 rounds) and user manual agent (3 rounds) use same default
- **Validation**: All syntax checks and configuration loading tests passed
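The state-over-file precedence can be sketched as follows; all names are illustrative of the pattern, not the actual `AppConfig`/`AgentState` API:

```python
def resolve_max_tool_rounds(state, file_config, default=3):
    """State-level value takes precedence over the file-level config,
    with a final fallback to the shared default of 3 rounds."""
    for value in (state.get("max_tool_rounds_user_manual"),
                  file_config.get("max_tool_rounds_user_manual")):
        if value is not None:
            return value
    return default

print(resolve_max_tool_rounds({"max_tool_rounds_user_manual": 5},
                              {"max_tool_rounds_user_manual": 3}))  # 5
print(resolve_max_tool_rounds({}, {}))                              # 3
```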
#### Benefits:
- **Flexibility**: Different tool round limits for different agent types
- **Maintainability**: Clear separation of concerns between agent configurations
- **Consistency**: Follows same configuration pattern as main agent
- **Customization**: Allows fine-tuning user manual agent behavior independently
#### Files Modified:
- `config.yaml`
- `service/config.py`
- `service/graph/state.py`
- `service/graph/graph.py`
- `service/graph/user_manual_rag.py`
- `llm_prompt.yaml`
---
## v1.1.0 - User Manual Agent Update Summary - Fri Aug 29 22:20:20 HKT 2025
### ✅ Successfully Completed
1. **Prompt Configuration Update**
- Updated `user_manual_prompt` in `llm_prompt.yaml`
- Integrated query optimization, parallel retrieval, and evidence-based answering from `agent_system_prompt`
- Verified prompt loading with test script (6566 chars)
2. **Agent Node Logic**
- User manual agent node is autonomous with multi-round tool calls (3 rounds max)
- Intent classification correctly routes to User_Manual_RAG
- Agent node redirects to user_manual_agent_node correctly
3. **Multi-Round Tool Execution**
- Successfully executes multiple tool rounds
- Tool calls increment properly (1/3, 2/3, 3/3)
- Max rounds protection works (forces final synthesis)
### 🚨 Issues Discovered
1. **Citation Number Error**:
- Error: "AgentWorkflow error: 'citation number'"
- Occurring during user manual agent execution
2. **SSE Streaming Issue**:
- TypeError: 'coroutine' object is not iterable
- Affecting streaming response delivery
- StreamingResponse configuration needs fixing
### 📊 Test Results
- ✅ Prompt configuration test: PASSED
- ✅ Intent recognition: PASSED
- ✅ Agent routing: PASSED
- ✅ Multi-round tool calls: PASSED
- ❌ Citation processing: FAILED
- ❌ SSE streaming: FAILED
### 🔍 Next Steps
1. Fix citation number error in user manual agent
2. Fix SSE streaming response format
3. Complete end-to-end validation
---
## v1.0.9 - 2025-08-29 🤖
### 🤖 **User Manual Agent Transformation** *(Major Feature Enhancement)*
#### **🔄 Autonomous User Manual Agent Implementation** *(Architecture Upgrade)*
- **Agent Node Conversion**: Transformed `service/graph/user_manual_rag.py` from simple RAG to autonomous agent
- **Detect-First-Then-Stream Strategy**: Implemented optimal multi-round behavior with tool detection and streaming synthesis
- **Tool Round Management**: Added intelligent tool calling with configurable round limits and state tracking
- **Conversation Trimming**: Integrated automatic context length management for long conversations
- **Streaming Support**: Enhanced real-time response generation with HTML comment filtering
- **User Manual Tool Integration**: Specialized tool ecosystem for user manual operations
- **Tool Schema Generation**: Automatic schema generation from `service/graph/user_manual_tools.py`
- **Force Tool Choice**: Enabled autonomous tool selection for optimal response generation
- **Tool Execution Pipeline**: Parallel-capable tool execution with streaming events and error handling
- **Routing Logic Enhancement**: Sophisticated routing system for multi-round workflows
- **Smart Routing**: Routes between `user_manual_tools`, `user_manual_agent`, and `post_process`
- **State-Aware Decisions**: Context-aware routing based on tool calls and conversation state
- **Final Synthesis Detection**: Automatic transition to synthesis mode when appropriate
- **Error Handling & Recovery**: Comprehensive error management system
- **Graceful Degradation**: User-friendly error messages with proper error categorization
- **Stream Error Events**: Real-time error notification through streaming interface
- **Tool Error Recovery**: Resilient tool execution with fallback mechanisms
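The routing decision described above can be roughly sketched as follows. This is a dict-based simplification; the real `user_manual_should_continue()` operates on LangGraph state objects:

```python
def user_manual_should_continue(state):
    """Route after the agent node: pending tool calls go to the tool
    executor while rounds remain; otherwise hand off to post-processing
    for final synthesis."""
    last = state["messages"][-1]
    wants_tools = bool(last.get("tool_calls"))
    if wants_tools and state["tool_rounds"] < state["max_tool_rounds"]:
        return "user_manual_tools"
    return "post_process"  # no tool calls, or max rounds exhausted

print(user_manual_should_continue({
    "messages": [{"role": "assistant", "tool_calls": [{"name": "retrieve"}]}],
    "tool_rounds": 1,
    "max_tool_rounds": 3,
}))  # user_manual_tools
```

Exhausting `max_tool_rounds` forces the `post_process` branch even when the model requests more tools, which is the "max rounds protection" behavior noted in the v1.1.0 summary.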
#### **🔧 Technical Implementation Details** *(System Architecture)*
- **Function Signatures**: New agent functions following established patterns from main agent
- `user_manual_agent_node()`: Main autonomous agent function
- `user_manual_should_continue()`: Intelligent routing logic
- `run_user_manual_tools_with_streaming()`: Enhanced tool execution
- **Configuration Integration**: Seamless integration with existing configuration system
- **Prompt Template Usage**: Uses existing `user_manual_prompt` from `llm_prompt.yaml`
- **Dynamic Prompt Formatting**: Contextual prompt generation with conversation history and retrieved content
- **Tool Configuration**: Automatic tool binding and schema management
- **Backward Compatibility**: Maintained legacy function for seamless transition
- **Legacy Wrapper**: `user_manual_rag_node()` redirects to new agent implementation
- **API Consistency**: No breaking changes to existing interfaces
- **Migration Path**: Smooth upgrade path for existing implementations
#### **✅ Testing & Validation** *(Quality Assurance)*
- **Comprehensive Test Suite**: New test script `scripts/test_user_manual_agent.py`
- **Basic Agent Testing**: Tool detection, calling, and routing validation
- **Integration Workflow Testing**: Complete multi-round conversation scenarios
- **Error Handling Testing**: Graceful error recovery and user feedback
- **Performance Validation**: Streaming response and tool execution timing
- **Functionality Validation**: All core features tested and validated
- ✅ Tool detection and autonomous calling
- ✅ Multi-round workflow execution
- ✅ Streaming response generation
- ✅ Error handling and recovery
- ✅ State management and routing logic
#### **📚 Documentation & Examples** *(Knowledge Management)*
- **Implementation Guide**: Comprehensive documentation in `docs/topics/USER_MANUAL_AGENT_IMPLEMENTATION.md`
- **Usage Examples**: Practical code examples and implementation patterns
- **Architecture Overview**: Technical details and design decisions
- **Migration Guide**: Step-by-step upgrade instructions
**Impact**: Transforms user manual functionality from simple retrieval to intelligent autonomous agent capable of multi-round conversations, tool usage, and sophisticated response generation while maintaining full backward compatibility.
---
## v1.0.8 - 2025-08-29 📚
### 📚 **User Manual Prompt Enhancement** *(Functional Improvement)*
#### **🎯 Enhanced User Manual Assistant Prompt** *(Content Update)*
- **Context Disambiguation Rules**: Added comprehensive disambiguation guidelines for overlapping concepts
- **Function Distinction**: Clear separation between Homepage functions (User) vs Admin Console functions (Administrator)
- **Management Clarity**: Differentiated between user management vs user group management operations
- **Role-based Operations**: Defined default roles for different operations (view/search for Users, edit/delete/configure for Administrators)
- **Clarification Protocol**: Added requirement to ask for clarification when user context is unclear
- **Response Structure Standards**: Implemented standardized response formatting
- **Step-by-Step Instructions**: Mandated complete procedural guidance with figures
- **Structured Format**: Required specific format for each step (description, screenshot, additional notes)
- **Business Rules Integration**: Ensured inclusion of all relevant business rules from source sections
- **Documentation Structure**: Maintained original documentation hierarchy and organization
- **Content Reproduction Rules**: Established strict content fidelity guidelines
- **Exact Wording**: Required copying exact wording and sequence from source sections
- **Complete Information**: Mandated inclusion of ALL information without summarization
- **Format Preservation**: Maintained original formatting and hierarchical structure
- **No Reorganization**: Prohibited modification or reorganization of original content
- **Reference Integration**: Successfully merged guidance from `.vibe/ref/user_manual_prompt-ref.txt`
- **Quality Assurance**: Enhanced accuracy and completeness of user manual responses
#### **📋 Reference File Analysis** *(Content Optimization)*
- **catonline-ref.txt Assessment**: Evaluated system background reference content
- **Content Alignment**: Confirmed existing content already covers CATOnline system background
- **Redundancy Avoidance**: Decided against merging to prevent duplicate instructions
- **Content Validation**: Verified accuracy and completeness of existing background information
- **user_manual_prompt-ref.txt Integration**: Successfully incorporated valuable operational guidelines
- **Value Assessment**: Identified high-value content missing from existing prompt
- **Strategic Merge**: Integrated content to enhance response quality without duplication
- **Instruction Optimization**: Improved prompt effectiveness while maintaining conciseness
---
## v1.0.7 - 2025-08-29 🎯
### 🎯 **Intent Recognition Enhancement** *(Functional Improvement)*
#### **📝 Enhanced Intent Classification Prompt** *(Content Update)*
- **Detailed Guidelines**: Added comprehensive classification criteria based on reference files
- **Content vs System Operation**: Clear distinction between standard/regulation content queries and CATOnline system operation queries
- **Standard_Regulation_RAG Examples**:
- "What regulations relate to intelligent driving?"
- "How do you test the safety of electric vehicles?"
- "What are the main points of GB/T 34567-2023?"
- "What is the scope of ISO 26262?"
- **User_Manual_RAG Examples**:
- "What is CATOnline (the system)?"
- "How to do search for standards, regulations, TRRC news and deliverables?"
- "How to create and update standards, regulations and their documents?"
- "How to download or export data?"
- **Classification Guidelines**: Added specific rules for edge cases and ambiguous queries
- **Reference Integration**: Incorporated guidance from `.vibe/ref/intent-ref-1.txt` and `.vibe/ref/intent-ref-2.txt`
#### **🏢 CATOnline Background Information Integration** *(Context Enhancement)*
- **Background Context**: Added comprehensive CATOnline system background information to intent recognition prompt
- **System Definition**: Integrated explanation that CATOnline is the China Automotive Technical Regulatory Online System
- **Feature Coverage**: Included details about CATOnline capabilities:
- TRRC process introductions and business areas
- Standards/laws/regulations/protocols search and viewing
- Document download and Excel export functionality
- Consumer test and voluntary certification checking
- Deliverable reminders and TRRC deliverable retrieval
- Admin features: popup configuration, working groups management, standards/regulations CRUD operations
- **TRRC Context**: Added clarification that TRRC stands for Technical Regulation Region China of Volkswagen
- **Enhanced Classification**: Background information helps improve intent classification accuracy for CATOnline-specific queries
#### **🧪 Testing & Validation** *(Quality Assurance)*
- **Intent Recognition Tests**: Verified enhanced prompt with multiple test scenarios
- **Multi-Intent Workflow**: Validated proper routing between Standard_Regulation_RAG and User_Manual_RAG
- **Edge Case Handling**: Tested classification accuracy for ambiguous queries
- **TRRC Edge Case**: Added specific handling for TRRC-related queries to distinguish between content vs. system operation
- **CATOnline Background Tests**: Created comprehensive test suite for CATOnline-specific scenarios
- **100% Accuracy**: Maintained perfect classification accuracy on all test suites including background-enhanced scenarios
---
## v1.0.6 - 2025-08-28 🔧
### 🔧 **Code Architecture Refactoring & Optimization** *(Technical Improvement)*
#### **🧹 Code Structure Cleanup** *(Breaking Fix)*
- **Duplicate State Removal**: Eliminated duplicate `AgentState` definitions across modules
- **Unified Definition**: Consolidated all state management to `/service/graph/state.py`
- **Import Cleanup**: Removed redundant AgentState from `graph.py`
- **Type Safety**: Ensured consistent state typing across all graph nodes
- **Circular Import Resolution**: Fixed circular dependency issues in module imports
- **Clean Dependencies**: Streamlined import statements and removed unused context variables
#### **📁 Module Separation & Organization** *(Code Organization)*
- **Intent Recognition Module**: Moved `intent_recognition_node` to dedicated `/service/graph/intent_recognition.py`
- **Pure Function**: Self-contained intent classification logic
- **LLM Integration**: Structured output with Pydantic Intent model
- **Context Handling**: Intelligent conversation history rendering
- **User Manual RAG Module**: Extracted `user_manual_rag_node` to `/service/graph/user_manual_rag.py`
- **Specialized Processing**: Dedicated user manual query handling
- **Tool Integration**: Direct integration with user manual retrieval tools
- **Stream Support**: Complete SSE streaming capabilities
- **Graph Simplification**: Cleaned up main `graph.py` by removing redundant code
#### **⚙️ Configuration Enhancement** *(Configuration)*
- **Prompt Externalization**: Moved all hardcoded prompts to `llm_prompt.yaml`
- **Intent Recognition Prompt**: Configurable intent classification instructions
- **User Manual Prompt**: Configurable user manual response template
- **Agent System Prompt**: Existing agent behavior remains configurable
- **Runtime Configuration**: All prompts now loaded dynamically from config file
- **Deployment Flexibility**: Different environments can use different prompt configurations
#### **🧪 Testing & Validation** *(Quality Assurance)*
- **Graph Compilation Tests**: Verified successful compilation after refactoring
- **Multi-Intent Workflow Tests**: End-to-end validation of both intent pathways
- **Module Integration Tests**: Confirmed proper module separation and imports
- **Configuration Loading Tests**: Validated dynamic prompt loading from config files
#### **📋 Technical Details**
- **Files Modified**:
- `/service/graph/graph.py` - Removed duplicate definitions, clean imports
- `/service/graph/state.py` - Single source of truth for AgentState
- `/service/graph/intent_recognition.py` - New dedicated module
- `/service/graph/user_manual_rag.py` - New dedicated module
- `/llm_prompt.yaml` - Added configurable prompts
- **Import Chain**: Fixed circular imports between graph nodes
- **Type Safety**: Consistent `AgentState` usage across all modules
- **Testing**: 100% pass rate on graph compilation and workflow tests
#### **🚀 Developer Experience**
- **Code Maintainability**: Better separation of concerns and module boundaries
- **Configuration Management**: Centralized prompt management for easier tuning
- **Debug Support**: Cleaner stack traces with resolved circular imports
- **Extension Ready**: Easier to add new intent types or modify existing behavior
#### **🌐 Internationalization & UX Improvements** *(User Experience)*
- **English Prompts**: Updated intent recognition prompts to use English for improved LLM classification accuracy
- **English User Manual Prompts**: Updated user manual RAG prompts to use English for consistency
- **Error Messages**: Converted all error messages to English for consistency
- **No Default Prompts**: Removed hardcoded fallback prompts, ensuring explicit configuration management
- **Enhanced Conversation Rendering**: Updated conversation history format to use `<user>...</user>` and `<ai>...</ai>` tags for better LLM parsing
- **Configuration Integration**: Added `intent_recognition_prompt` and `user_manual_prompt` to configuration loading system
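The tagged rendering format can be sketched as follows; dict messages stand in for the real conversation objects:

```python
def render_history(messages):
    """Render conversation history with <user>/<ai> tags for injection
    into the intent recognition prompt, matching the format described
    above. Messages with other roles (e.g. system) are skipped."""
    tag = {"user": "user", "assistant": "ai"}
    return "\n".join(
        f"<{tag[m['role']]}>{m['content']}</{tag[m['role']]}>"
        for m in messages
        if m["role"] in tag
    )

print(render_history([
    {"role": "user", "content": "How do I log in?"},
    {"role": "assistant", "content": "Use the CATOnline login page."},
]))
# <user>How do I log in?</user>
# <ai>Use the CATOnline login page.</ai>
```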
#### **🎨 UI/UX Improvements** *(User Interface)*
- **Tool Icon Enhancement**: Updated `retrieve_system_usermanual` tool icon to `user-guide.png`
- **Visual Distinction**: Better visual differentiation between standard regulation and user manual tools
- **User Experience**: More intuitive icon representing user manual/guide functionality
- **Icon Asset**: Leveraged existing `user-guide.png` icon from public assets
---
## v1.0.5 - 2025-08-28 🎯
### 🎯 **Multi-Intent RAG System Implementation** *(Major Feature)*
#### **🧠 Intent Recognition Engine** *(New)*
- **Intent Classification**: LLM-powered intelligent intent recognition with context awareness
- **Supported Intents**:
- `Standard_Regulation_RAG`: Manufacturing standards, regulations, and compliance queries
- `User_Manual_RAG`: CATOnline system usage, features, and operational guidance
- **Technology**: Structured output with Pydantic models for reliable classification
- **Accuracy**: 100% classification accuracy in testing across Chinese and English queries
- **Context Awareness**: Leverages conversation history for improved intent disambiguation
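The classification contract can be sketched without the LLM call. The real node binds a Pydantic model via structured output; this dependency-free sketch shows the same allowed-values check and the documented fallback to the standard path:

```python
from typing import Literal, get_args

# The two supported intents, as a Literal type (mirrors the Pydantic model's field)
Intent = Literal["Standard_Regulation_RAG", "User_Manual_RAG"]

def parse_intent(raw: str) -> str:
    """Validate the classifier's output against the allowed intents.
    An unrecognized label falls back to the standard/regulation path,
    which is the documented error-recovery behavior."""
    return raw if raw in get_args(Intent) else "Standard_Regulation_RAG"

print(parse_intent("User_Manual_RAG"))  # User_Manual_RAG
print(parse_intent("not-an-intent"))    # Standard_Regulation_RAG
```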
#### **🔄 Enhanced Workflow Architecture** *(Breaking Change)*
- **New Graph Structure**: `START → intent_recognition → [conditional_routing] → {Standard_RAG | User_Manual_RAG}`
- **Entry Point Change**: All queries now start with intent recognition instead of direct agent processing
- **Dual Processing Paths**:
- **Standard_Regulation_RAG**: Multi-round agent workflow with tool orchestration (existing behavior)
- **User_Manual_RAG**: Single-round specialized processing with user manual retrieval
- **Backward Compatibility**: Existing standard/regulation queries maintain full functionality
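The conditional routing step can be sketched as a plain dispatch; node names are illustrative of the graph wiring, not guaranteed to match the code:

```python
def route_by_intent(state):
    """Conditional edge after intent recognition: pick the processing
    path for the query, defaulting to the multi-round agent workflow
    when the intent is missing or unrecognized."""
    routes = {
        "Standard_Regulation_RAG": "agent",    # multi-round tool workflow
        "User_Manual_RAG": "user_manual_rag",  # single-round specialized path
    }
    return routes.get(state.get("intent"), "agent")
```

In LangGraph terms this is the function handed to `add_conditional_edges` after the intent recognition node, so both paths share the same entry point.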
#### **📚 User Manual RAG Specialization** *(New)*
- **Dedicated Node**: `user_manual_rag_node` for specialized user manual processing
- **Tool Integration**: Direct integration with `retrieve_system_usermanual` tool
- **Response Template**: Professional user manual assistance with structured guidance
- **Streaming Support**: Real-time token streaming for immediate user feedback
- **Error Handling**: Graceful degradation with support contact suggestions
#### **🏗️ Technical Architecture Improvements**
- **State Management**: Enhanced `AgentState` with `intent` field for workflow routing
- **Modular Design**: Separated user manual tools into dedicated module (`user_manual_tools.py`)
- **Type Safety**: Full TypeScript-style type annotations with Literal types for intent routing
- **Memory Persistence**: Both intent paths support PostgreSQL session memory and conversation history
- **Testing Suite**: Comprehensive test coverage including intent recognition and end-to-end workflow validation
#### **🚀 Performance & Reliability**
- **Smart Routing**: Eliminates unnecessary tool calls for user manual queries
- **Optimized Flow**: Single-round processing for user manual queries vs multi-round for standards
- **Error Recovery**: Intent recognition failure gracefully defaults to standard regulation processing
- **Session Management**: Complete session persistence across both intent pathways
#### **📋 Query Classification Examples**
**Standard_Regulation_RAG Path**:
- "请问GB/T 18488标准的具体内容是什么"
- "ISO 26262 functional safety standard requirements"
- "汽车安全法规相关规定"
**User_Manual_RAG Path**:
- "如何使用CATOnline系统进行搜索"
- "How do I log into the CATOnline system?"
- "CATOnline系统的用户管理功能怎么使用"
#### **🔧 Implementation Files**
- **Core Logic**: Enhanced `service/graph/graph.py` with intent nodes and routing
- **Intent Recognition**: `intent_recognition_node()` function with LLM classification
- **User Manual Processing**: `user_manual_rag_node()` function with specialized handling
- **State Management**: Updated `service/graph/state.py` with intent support
- **Tool Organization**: New `service/graph/user_manual_tools.py` module
- **Documentation**: Comprehensive implementation guide in `docs/topics/MULTI_INTENT_IMPLEMENTATION.md`
#### **📈 Impact**
- **User Experience**: Intelligent query routing for more relevant responses
- **System Efficiency**: Optimized processing paths based on query type
- **Extensibility**: Framework ready for additional intent types
- **Maintainability**: Clear separation of concerns between different query domains
---
## v1.0.4 - 2025-08-27 🔧
### 🔧 **New Tool Implementation**
#### **📚 System User Manual Retrieval Tool** *(New)*
- **Tool Name**: `retrieve_system_usermanual`
- **Purpose**: Search for document content chunks of user manual of this system (CATOnline)
- **Integration**: Full LangGraph integration with @tool decorator pattern
- **UI Support**: Complete frontend integration with multilingual UI labels
- Chinese: "系统使用手册检索"
- English: "System User Manual Retrieval"
- **Configuration**: Added `chunk_user_manual_index` support in SearchConfig
- **Error Handling**: Robust error handling with proper logging and fallback responses
- **Testing**: Comprehensive unit tests for tool structure and integration validation
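A rough sketch of the tool's shape follows. The search client and index wiring are stubbed assumptions; in the real service the function carries LangChain's `@tool` decorator, so its name and docstring become the tool schema exposed to the LLM:

```python
def retrieve_system_usermanual(query: str, top_k: int = 5):
    """Search for document content chunks of the user manual of this
    system (CATOnline). Stubbed sketch: the real tool queries the index
    configured as chunk_user_manual_index."""
    try:
        results = []  # placeholder: would hold hits from the search index
        if not results:
            # Fallback response mirroring the documented error handling
            return [{"content": "No user manual content found for: " + query,
                     "source": None}]
        return results[:top_k]
    except Exception:
        return [{"content": "User manual retrieval failed.", "source": None}]

print(retrieve_system_usermanual("how to export data")[0]["source"])  # None
```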
#### **🎯 Technical Implementation Details**
- **Backend**: Added to `service/graph/tools.py` following LangGraph best practices
- **Frontend**: Integrated into `web/src/components/ToolUIs.tsx` with consistent styling
- **Translation**: Updated `web/src/utils/i18n.ts` with bilingual support
- **Configuration**: Enhanced `service/config.py` with user manual index configuration
- **Tool Registration**: Automatically included in tools list and schema generation
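A plausible shape for the tool is sketched below. In the repo it is registered via LangChain's `@tool` decorator; that wrapper is omitted here so the sketch stays dependency-free, and the search client is injected rather than constructed:

```python
# Hypothetical sketch of retrieve_system_usermanual; the real tool lives in
# service/graph/tools.py behind the @tool decorator.
def retrieve_system_usermanual(query: str, search_client=None) -> list:
    """Search document content chunks of the CATOnline user manual."""
    index = "index-cat-usermanual-chunk-prd"  # chunk_user_manual_index from SearchConfig
    try:
        results = search_client.search(query, index=index)
        return [{"content": r["content"], "source": r.get("source", index)}
                for r in results]
    except Exception as exc:
        # Fallback response, mirroring the robust error handling noted above
        return [{"error": f"user manual search failed: {exc}"}]
```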
#### **📝 Note**
The search index `index-cat-usermanual-chunk-prd` referenced in the configuration is not yet available, but the tool framework is fully implemented and ready for use once the index is created.
## v1.0.3 - 2025-08-26 ✨
### ✨ **UI Enhancements & Example Questions**
#### **📱 Latest CSS Improvements** *(Just Updated)*
- **Enhanced Example Question Layout**: Increased min-width to 360px and max-width to 450px for better readability
- **Perfect Centering**: Added `justify-items: center` for professional grid alignment
- **Improved Spacing**: Enhanced padding and gap values for optimal visual hierarchy
- **Mobile Optimization**: Consistent responsive design with improved touch targets on mobile devices
#### **🎯 Welcome Page Example Questions**
- **Multilingual Support**: Added 4 interactive example questions with Chinese/English translations
- **Smart Interaction**: Click-to-send functionality using the `useComposerRuntime()` hook for seamless assistant-ui integration
- **Responsive Design**: Auto-adjusting grid layout (2x2 on desktop, single column on mobile)
- **Professional Styling**: Card-based design with hover effects, shadows, and smooth animations
#### **🌐 Updated Branding & Messaging**
- **App Title**: Updated to "CATOnline AI助手" / "CATOnline AI Assistant"
- **Enhanced Descriptions**: Comprehensive service descriptions highlighting CATOnline semantic search capabilities
- **Detailed Welcome Messages**: Multi-paragraph welcome text explaining current service scope and upcoming features
- **Consistent Multilingual Content**: Perfect alignment between Chinese and English versions
#### **📝 Example Questions Added**
**Chinese**:
1. 电力储能用锂离子电池最新标准发布时间?
2. 如何测试电动汽车的充电性能?
3. 提供关于车辆通讯安全的法规
4. 自动驾驶L2和L3的定义
**English**:
1. When was the latest standard for lithium-ion batteries for power storage released?
2. How to test electric vehicle charging performance?
3. Provide regulations on vehicle communication security
4. Definition of L2 and L3 in autonomous driving
#### **🎨 Technical Implementation**
- **Custom Components**: Created `ExampleQuestionButton` component with proper TypeScript typing
- **CSS Enhancements**: Added responsive grid styles with mobile optimization
- **Architecture**: Seamlessly integrated with existing assistant-ui framework patterns
- **Language Detection**: Automatic language switching via URL parameters and browser detection
## v1.0.2 - 2025-08-26 🔧
### 🔧 **Error Handling & Code Quality Improvements**
#### **🛡️ DRY Error Handling System**
- **Backend Error Handler**: Added unified `error_handler.py` module with structured logging, decorators, and error categorization
- **Frontend Error Components**: Created ErrorBoundary and ErrorToast components with TypeScript support
- **Error Middleware**: Implemented centralized error handling middleware for FastAPI
- **Structured Logging**: JSON-formatted logs with timezone-aware timestamps
- **User-Friendly Messages**: Categorized error types (error/warning/network) with appropriate UI feedback
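A minimal sketch of such a decorator, assuming the described combination of structured JSON logging, error categorization, and a safe fallback return (names here are illustrative, not the actual `error_handler.py` API):

```python
import functools
import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger("service.error_handler")

def handle_errors(category="error"):
    """Hypothetical decorator: log a structured record, return a safe message."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            try:
                return fn(*args, **kwargs)
            except Exception as exc:
                # Timezone-aware timestamp + category, serialized as JSON
                logger.error(json.dumps({
                    "timestamp": datetime.now(timezone.utc).isoformat(),
                    "category": category,
                    "function": fn.__name__,
                    "message": str(exc),
                }))
                return {"error": "An internal error occurred. Please try again."}
        return wrapper
    return decorator
```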
#### **🌐 Error Message Internationalization**
- **English Default**: All user-facing error messages now default to English for better accessibility
- **Consistent Messaging**: Updated error handler to provide clear, professional English error messages
- **Frontend Updates**: ErrorBoundary component now displays English error messages
- **Backend Messages**: Standardized API error responses in English across all endpoints
#### **🐛 Bug Fixes**
- **Configuration Loading**: Fixed `NameError: 'config' is not defined` in `main.py` by restructuring config loading order
- **Service Startup**: Resolved backend startup issues in both foreground and background modes
- **Deprecation Warnings**: Updated `datetime.utcnow()` to `datetime.now(timezone.utc)` for future compatibility
- **Type Safety**: Fixed TypeScript type conflicts in frontend error handling components
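The deprecation fix amounts to a one-line substitution, replacing the naive-UTC call with an explicitly timezone-aware one:

```python
from datetime import datetime, timezone

# Before (deprecated, returns a naive datetime with no tzinfo):
#   ts = datetime.utcnow()
# After (timezone-aware, future-compatible):
ts = datetime.now(timezone.utc)
print(ts.isoformat())  # ISO 8601 with a +00:00 offset
```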
#### **🔄 Code Optimizations**
- **DRY Principles**: Eliminated code duplication in error handling across backend and frontend
- **Modular Architecture**: Separated error handling concerns into reusable, testable modules
- **Component Separation**: Split Toast functionality into distinct hook and component files
- **Clean Code**: Applied consistent naming conventions and removed redundant imports
---
## v1.0.1 - 2025-08-26 🔧
### 🔧 **Configuration Management Improvements**
#### **📋 Environment Configuration Extraction**
- **Centralized Configuration**: Extracted hardcoded environment settings to `config.yaml`
- `max_tool_rounds`: Maximum tool calling rounds (configurable, default: 3)
- `service.host` & `service.port`: Service binding configuration
- `search.standard_regulation_index` & `search.chunk_index`: Search index names
- `citation.base_url`: Citation link base URL for CAT system
- **Code Optimization**: Reduced duplicate `get_config()` calls in `graph.py` with module-level caching
- **Enhanced Maintainability**: Environment-specific values now externalized for easier deployment management
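An illustrative `config.yaml` fragment with the extracted keys might look like this (values are placeholders, not the production settings):

```yaml
max_tool_rounds: 3                 # maximum tool calling rounds
service:
  host: 0.0.0.0                    # service binding
  port: 8000
search:
  standard_regulation_index: "<your-standard-regulation-index>"
  chunk_index: "<your-chunk-index>"
citation:
  base_url: "https://<cat-host>/"  # citation link base URL for the CAT system
```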
#### **🚀 Performance Optimizations**
- **Configuration Caching**: Implemented `get_cached_config()` to avoid repeated configuration loading
- **Reduced Code Duplication**: Eliminated 4 duplicate `get_config()` calls across the workflow
- **Memory Efficiency**: Single configuration instance shared across the application
#### **✅ Quality Assurance**
- **Comprehensive Testing**: All configuration changes validated with existing test suite
- **Backward Compatibility**: No breaking changes to API or functionality
- **Configuration Validation**: Added verification of configuration loading and usage
---
## v1.0.0 - 2025-08-25 🎉
### 🚀 **STABLE RELEASE** - Agentic RAG System for Standards & Regulations
This marks the first stable release of our **Agentic RAG System** - a production-ready AI assistant for enterprise standards and regulations search and management.
---
### 🎯 **Core Features**
#### **🤖 Autonomous Agent Architecture**
- **LangGraph-Powered Workflow**: Multi-step autonomous agent using LangGraph OSS for intelligent tool orchestration
- **2-Phase Retrieval Strategy**: Intelligent metadata discovery followed by detailed content retrieval
- **Parallel Tool Execution**: Optimized parallel query processing for maximum information coverage
- **Multi-Round Intelligence**: Adaptive retrieval rounds based on information gaps and user requirements
#### **🔍 Advanced Retrieval System**
- **Dual Retrieval Tools**:
- `retrieve_standard_regulation`: Standards/regulations metadata discovery
- `retrieve_doc_chunk_standard_regulation`: Detailed document content chunks
- **Smart Query Optimization**: Automatic sub-query generation with bilingual support (Chinese/English)
- **Version Management**: Intelligent selection of latest published and current versions
- **Hybrid Search Integration**: Optimized for Azure AI Search's keyword + vector search capabilities
#### **💬 Real-time Streaming Interface**
- **Server-Sent Events (SSE)**: Real-time streaming responses with tool execution visibility
- **Assistant-UI Integration**: Modern conversational interface with tool call visualization
- **Progressive Enhancement**: Token-by-token streaming with tool progress indicators
- **Citation Tracking**: Real-time citation mapping and reference management
---
### 🛠 **Technical Architecture**
#### **Backend (Python + FastAPI)**
- **FastAPI Framework**: High-performance async API with comprehensive CORS support
- **PostgreSQL Memory**: Persistent conversation history with 7-day TTL
- **Configuration Management**: YAML-based configuration with environment variable support
- **Structured Logging**: JSON-formatted logs with request tracing and performance metrics
#### **Frontend (Next.js + Assistant-UI)**
- **Next.js 15**: Modern React framework with optimized performance
- **Assistant-UI Components**: Pre-built conversational UI elements with streaming support
- **Markdown Rendering**: Enhanced markdown with LaTeX formula support and external links
- **Responsive Design**: Mobile-friendly interface with dark/light theme support
#### **AI/ML Pipeline**
- **LLM Support**: OpenAI and Azure OpenAI integration with configurable models
- **Prompt Engineering**: Sophisticated system prompts with context-aware instructions
- **Citation System**: Automatic citation mapping with source tracking
- **Error Handling**: Graceful fallbacks with constructive user guidance
---
### 🔧 **Production Features**
#### **Memory & State Management**
- **PostgreSQL Integration**: Robust conversation persistence with automatic cleanup
- **Session Management**: User session isolation with configurable TTL
- **State Recovery**: Conversation context restoration across sessions
#### **Monitoring & Observability**
- **Structured Logging**: Comprehensive request/response logging with timing metrics
- **Error Tracking**: Detailed error reporting with stack traces and context
- **Performance Metrics**: Token usage tracking and response time monitoring
#### **Security & Reliability**
- **Input Validation**: Comprehensive request validation and sanitization
- **Rate Limiting**: Built-in protection against abuse
- **Error Isolation**: Graceful error handling without system crashes
- **Configuration Security**: Environment-based secrets management
---
### 📊 **Performance Metrics**
- **Response Time**: < 200ms for token streaming initiation
- **Context Capacity**: 100k tokens for extended conversations
- **Tool Efficiency**: Optimized "mostly 2" parallel queries strategy
- **Memory Management**: 7-day conversation retention with automatic cleanup
- **Concurrent Users**: Designed for enterprise-scale deployment
---
### 🎨 **User Experience**
#### **Intelligent Interaction**
- **Bilingual Support**: Seamless Chinese/English query processing and responses
- **Visual Content**: Smart image relevance checking and embedding
- **Citation Excellence**: Professional citation mapping with source links
- **Error Recovery**: Constructive suggestions when information is insufficient
#### **Professional Interface**
- **Tool Visualization**: Real-time tool execution progress with clear status indicators
- **Document Previews**: Rich preview of retrieved standards and regulations
- **Export Capabilities**: Easy copying and sharing of responses with citations
- **Accessibility**: WCAG-compliant interface design
---
### 🔄 **Deployment & Operations**
#### **Development Workflow**
- **UV Package Manager**: Fast, Rust-based Python dependency management
- **Hot Reload**: Development server with automatic code reloading
- **Testing Suite**: Comprehensive unit and integration tests
- **Documentation**: Complete API documentation and user guides
#### **Production Deployment**
- **Docker Support**: Containerized deployment with multi-stage builds
- **Environment Configuration**: Flexible configuration for different deployment environments
- **Health Checks**: Built-in health monitoring endpoints
- **Scaling Ready**: Designed for horizontal scaling and load balancing
---
### 📈 **Business Impact**
- **Enterprise Ready**: Production-grade system for standards and regulations management
- **Efficiency Gains**: Automated intelligent search replacing manual document review
- **Accuracy Improvement**: AI-powered relevance filtering and version management
- **User Satisfaction**: Intuitive interface with professional citation handling
- **Scalability**: Architecture supports growing enterprise needs
---
### 🎁 **What's Included**
- Complete source code with documentation
- Production deployment configurations
- Comprehensive testing suite
- User and administrator guides
- API documentation and examples
- Docker containerization setup
- Monitoring and logging configurations
---
### 🚀 **Getting Started**
```bash
# Clone and setup
git clone <repository>
cd agentic-rag-4
# Install dependencies
uv sync
# Configure environment
cp config.yaml.example config.yaml
# Edit config.yaml with your settings
# Start services
make dev-backend # Start backend service
make dev-web # Start frontend interface
# Access the application
open http://localhost:3000
```
---
**🎉 Thank you to all contributors who made this stable release possible!**
## v0.11.4 - 2025-08-25
### 📝 LLM Prompt Restructuring and Optimization
- **Major Workflow Restructuring**: Reorganized retrieval strategy for better clarity and efficiency
- **Simplified Workflow Structure**: Restructured "2-Phase Retrieval Strategy" section with clearer organization
- Combined retrieval phases under unified "Retrieval Strategy (for Standards/Regulations)" section
- Moved multi-round strategy explanation to the beginning for better flow
- **Enhanced Context Parameters**: Updated max_context_length from 96k to 100k tokens for better conversation handling
- **Query Strategy Optimization**: Refined sub-query generation approach
- Changed from "2-3 parallel rewritten queries" to "parallel rewritten queries" for flexibility
- Specified "2-3 (mostly 2)" for sub-query generation to optimize efficiency
- Reorganized language mixing strategy placement for better readability
- **Duplicate Rule Consolidation**: Added version selection rule to synthesis phase (step 4) for consistency
- Ensures version prioritization applies throughout the entire workflow, not just metadata discovery
- **Enhanced Error Handling**: Improved "No-Answer with Suggestions" section
- Added specific guidance to "propose 3-5 example rewrite queries" for better user assistance
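As a rough sketch of how a context budget like the 100k-token limit can be enforced when trimming history — the estimator below is a crude characters/4 heuristic, not the real tokenizer used by `message_trimmer.py`:

```python
def trim_to_context(messages, max_context_length=100_000,
                    estimate=lambda m: len(m) // 4):
    """Keep the most recent messages whose estimated token total fits the window."""
    kept, total = [], 0
    for msg in reversed(messages):          # walk newest -> oldest
        cost = estimate(msg)
        if total + cost > max_context_length:
            break                           # budget exhausted; drop older history
        kept.append(msg)
        total += cost
    return list(reversed(kept))             # restore chronological order
```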
### 🔧 Technical Improvements
- **Query Optimization**: Streamlined sub-query generation process for better performance
- **Workflow Consistency**: Ensured version selection rules apply consistently across all workflow phases
- **Parameter Tuning**: Increased context window capacity for handling longer conversations
### 🎯 Quality Enhancements
- **User Guidance**: Enhanced fallback suggestions with specific query rewrite examples
- **Retrieval Efficiency**: Optimized parallel query generation strategy
- **Version Management**: Extended version selection logic to synthesis phase for comprehensive coverage
### 📊 Impact
- **Performance**: More efficient query generation with "mostly 2" sub-queries approach
- **Consistency**: Unified version selection behavior across all workflow phases
- **User Experience**: Better guidance when retrieval yields insufficient results
- **Scalability**: Increased context capacity supports longer conversation histories
## v0.11.3 - 2025-08-25
### 📝 LLM Prompt Enhancement - Version Selection Rules
- **Standards/Regulations Version Management**: Added intelligent version selection logic to Phase 1 metadata discovery
- **Version Selection Rule**: Added rule to handle multiple versions of the same standard/regulation
- When retrieval results contain similar items (likely different versions), default to the latest published and current version
- Only applies when user hasn't specified a particular version requirement
- **Image Processing Enhancement**: Improved visual content handling instructions
- Added relevance check by reviewing `<figcaption>` before embedding images
- Ensures only relevant figures/images are included in responses
- **Terminology Refinement**: Updated "official version" to "published and current version" for better precision
- Reflects the concept of "发布的现行" ("published and current") - emphasizing both official publication and current validity
### 🎯 Quality Improvements
- **Smart Version Prioritization**: Enhanced metadata discovery to automatically select the most appropriate document versions
- **Visual Content Validation**: Added systematic approach to verify image relevance before inclusion
- **Linguistic Precision**: Improved terminology to better reflect regulatory document status
### 📊 Impact
- **User Experience**: Reduces confusion when multiple document versions are available
- **Content Quality**: Ensures responses include only relevant visual aids
- **Regulatory Accuracy**: Better alignment with how regulatory documents are categorized and prioritized
## v0.11.2 - 2025-08-24
### 🔧 Configuration and Development Workflow Improvements
- **LLM Prompt Configuration**: Enhanced prompt wording and removed redundant "ALWAYS" requirement for Phase 2 retrieval
- **Workflow Flexibility**: Changed "ALWAYS follow this 2-phase strategy for ANY standards/regulations query" to "Follow this 2-phase strategy for standards/regulations query"
- **Phase Organization**: Reordered Phase 1 metadata discovery sections for better logical flow (Purpose → Tool → Query strategy)
- **Clearer Tool Description**: Enhanced Phase 2 tool description for better clarity
- **Sub-query Generation**: Improved instructions for generating different rewritten sub-queries
- **Configuration Updates**:
- **Tool Loop Limit**: Commented out `max_tool_loops` setting in config to use default value (5 instead of 10)
- **Service Configuration**: Updated default `max_tool_loops` from 3 to 5 in AppConfig for better balance
- **Frontend Dependencies**: Added `rehype-raw` dependency for enhanced HTML processing in markdown rendering
### 🎯 Code Organization
- **Development Workflow**: Enhanced prompt management and configuration structure
- **Documentation**: Updated project structure to reflect latest changes and improvements
- **Dependencies**: Added necessary frontend packages for improved markdown and HTML processing
### 📝 Development Notes
- **Prompt Engineering**: Refined retrieval strategy instructions for more flexible execution
- **Configuration Management**: Simplified configuration by using sensible defaults
- **Frontend Enhancement**: Added support for raw HTML processing in markdown content
## v0.11.1 - 2025-08-24
### 📝 LLM Prompt Optimization
- **English Wording Improvements**: Comprehensive optimization of LLM prompt for better clarity and professional tone
- **Grammar and Articles**: Fixed grammatical issues and article usage throughout the prompt
- "for CATOnline system" → "for **the** CATOnline system"
- "information got from retrieval tools" → "information **retrieved from** search tools"
- "CATOnline is an standards" → "CATOnline is **a** standards"
- **Word Choice Enhancement**: Improved vocabulary and clarity
- "anwser questions" → "**answer** questions" (spelling correction)
- "Give a Citations Mapping" → "**Provide** a Citations Mapping"
- "Response in the user's language" → "**Respond** in the user's language"
- "refuse and redirect" → "**decline** and redirect"
- **Improved Flow and Structure**: Enhanced readability and professional presentation
- "maintain core intent" → "maintain **the** core intent"
- "in the below exact format" → "in the exact format **below**"
- "citations_map is as:" → "citations_map **is:**"
- **Technical Accuracy**: Fixed technical description issues in Phase 2 query strategy
- **Consistency**: Ensured parallel structure and consistent terminology throughout
### 🎯 Quality Improvements
- **Professional Tone**: Enhanced overall professionalism of AI assistant instructions
- **Clarity**: Improved instruction clarity for better LLM understanding and execution
- **Readability**: Better structured sections with clearer headings and formatting
## v0.11.0 - 2025-08-24
### 🔧 HTML Comment Filtering Fix
- **Streaming Response Cleanup**: Fixed HTML comments leaking to client in streaming responses
- **Robust HTML Comment Removal**: Implemented comprehensive filtering using regex pattern `<!--.*?-->` with DOTALL flag
- **Citations Map Protection**: Specifically prevents `<!-- citations_map ... -->` comments from reaching client
- **Multi-Point Filtering**: Applied filtering in both `call_model` and `post_process_node` functions
- **Token Accumulation Strategy**: Enhanced streaming logic to accumulate tokens and batch-filter HTML comments
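The accumulate-then-filter idea can be sketched as a generator. This is a simplified stand-in for the actual streaming logic in `call_model`/`post_process_node`; the tricky part it illustrates is holding back a comment opener that arrives split across chunks:

```python
import re

COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)

def filter_stream(chunks):
    """Accumulate streamed tokens and strip HTML comments, even across chunk boundaries."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        buffer = COMMENT_RE.sub("", buffer)   # drop complete comments
        cut = buffer.rfind("<!--")            # hold back an unfinished comment
        if cut == -1:
            # also hold back a split opener such as "<!" or "<!-" at the tail
            for prefix in ("<!-", "<!", "<"):
                if buffer.endswith(prefix):
                    cut = len(buffer) - len(prefix)
                    break
        if cut == -1:
            yield buffer
            buffer = ""
        else:
            yield buffer[:cut]
            buffer = buffer[cut:]
    # Flush: an unterminated comment at end-of-stream is dropped, not leaked
    yield re.sub(r"<!--.*", "", buffer, flags=re.DOTALL)
```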
### 🛡️ Security and Data Integrity
- **Client-Side Protection**: Ensured no internal processing comments are exposed to end users
- **Citation Processing**: Maintained proper citation functionality while filtering internal metadata
- **Content Integrity**: Preserved all legitimate markdown content including citation links and references
### 🧪 Comprehensive Validation
- **HTML Comment Filtering Test**: Created dedicated test script `test_html_comment_filtering.py`
- **1700+ Event Analysis**: Validated 1714 streaming events with zero HTML comment leakage
- **Real HTTP API Testing**: Used actual streaming endpoint for authentic validation
- **Pattern Detection**: Comprehensive regex pattern matching for all HTML comment variations
- **All Existing Tests Maintained**: Confirmed no regression in existing functionality
- **Unit Tests**: 41/41 passing
- **Multi-Round Tool Calls**: Working correctly
- **2-Phase Retrieval**: Functioning as expected
- **Streaming Response**: Clean and efficient
### 📊 Technical Implementation Details
- **Streaming Logic Enhancement**:
```python
# Remove HTML comments while preserving content
content = re.sub(r'<!--.*?-->', '', content, flags=re.DOTALL)
```
- **Performance Optimization**: Minimal impact on streaming performance through efficient regex processing
- **Error Handling**: Robust handling of edge cases in comment filtering
- **Backward Compatibility**: Full compatibility with existing citation and markdown processing
### 🎯 Quality Assurance Results
- **Zero HTML Comments**: No `<!-- citations_map ... -->` or other HTML comments found in client output
- **Citation Functionality**: All citation links and references render correctly
- **Streaming Performance**: No degradation in response time or user experience
- **Cross-Platform Testing**: Validated on multiple query types and response patterns
## v0.10.0 - 2025-08-24
### 🎯 Optimal Multi-Round Architecture Implementation
- **Streaming Only at Final Step**: Refactored architecture to follow optimal "streaming only at final step" pattern
- **Non-Streaming Planning**: All tool calling phases now use non-streaming LLM calls for better stability
- **Streaming Final Synthesis**: Only the final response generation step streams to the user
- **Tool Results Accumulation**: Enhanced AgentState with `Annotated[List[Dict[str, Any]], reducer]` for proper tool result aggregation
- **Temporary Tool Disabling**: Tools are automatically disabled during final synthesis phase to prevent infinite loops
- **Simplified Routing Logic**: Streamlined `should_continue` logic based on tool_calls presence rather than complex state checks
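The state shape this describes can be sketched as a `TypedDict` whose annotated reducer tells LangGraph how to merge updates; field names follow the changelog, and `operator.add` (list concatenation) stands in for the reducer function used in the repo:

```python
import operator
from typing import Annotated, Any, Dict, List, TypedDict

class AgentState(TypedDict):
    # LangGraph applies the annotated reducer (list concatenation here) when
    # merging tool_results updates emitted by the agent and tools nodes, so
    # results accumulate across rounds instead of overwriting each other.
    tool_results: Annotated[List[Dict[str, Any]], operator.add]
    tool_rounds: int
    max_tool_rounds: int
```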
### 🔧 Architecture Optimization
- **Enhanced State Management**: Improved AgentState design for robust multi-round execution
- Added `tool_results` accumulation with proper reducer function
- Enhanced `tool_rounds` tracking with automatic increment logic
- Simplified state updates and transitions between agent and tools nodes
- **Tool Execution Improvements**: Refined parallel tool execution and error handling
- Fixed tool disabling logic to prevent termination issues
- Enhanced logging for better debugging and monitoring
- Improved tool result processing and aggregation
- **Graph Flow Optimization**: Streamlined workflow routing for better reliability
- Simplified conditional routing logic
- Enhanced error handling and recovery mechanisms
- Improved final synthesis triggering and tool state management
### 🧪 Comprehensive Test Validation
- **All Tests Passing**: Achieved 100% test success rate across all test categories
- **Unit Tests**: 41/41 passed - Core functionality validated
- **Script Tests**: 10/10 passed - Multi-round, streaming, and 2-phase retrieval confirmed
- **Integration Tests**: Properly skipped (service-dependent tests)
- **Test Framework Improvements**: Enhanced script tests with proper async pytest decorators
- Fixed import order and pytest.mark.asyncio decorators in all script test files
- Resolved async function compatibility issues
- Improved test reliability and execution speed
### ✅ Feature Validation Complete
- **Multi-Round Tool Calls**: ✅ Automatic execution of 1-3 rounds confirmed via service logs
- **Parallel Tool Execution**: ✅ Concurrent tool execution within each round validated
- **2-Phase Retrieval Strategy**: ✅ Both metadata and content retrieval tools used systematically
- **Streaming Response**: ✅ Final response streams properly after all tool execution
- **Error Handling**: ✅ Robust error handling for tool failures, timeouts, and edge cases
- **Tool State Management**: ✅ Proper tool disabling during synthesis prevents infinite loops
### 📝 Documentation Updates
- **Implementation Notes**: Updated documentation to reflect optimal architecture
- **Test Coverage**: Comprehensive documentation of test validation results
- **Service Logs**: Confirmed multi-round behavior through actual service execution logs
## v0.9.0 - 2025-08-24
### 🎯 Multi-Round Parallel Tool Calling Implementation
- **Auto Multi-Round Tool Execution**: Implemented true automatic multi-round parallel tool calling capability
- Added `tool_rounds` and `max_tool_rounds` tracking to `AgentState` (default: 3 rounds)
- Enhanced agent node with round-based tool calling logic and round limits
- Fixed workflow routing to ensure final synthesis after completing all tool rounds
- Agent can now automatically execute multiple rounds of tool calls within a single user interaction
- Each round supports parallel tool execution for maximum efficiency
### 🔍 2-Phase Retrieval Strategy Enforcement
- **Mandatory 2-Phase Retrieval**: Fixed agent to consistently follow 2-phase retrieval for content queries
- **Phase 1**: Metadata discovery using `retrieve_standard_regulation`
- **Phase 2**: Content chunk retrieval using `retrieve_doc_chunk_standard_regulation`
- Updated system prompt to make 2-phase retrieval mandatory for content-focused queries
- Enhanced query construction with document_code filtering for Phase 2
- Agent now correctly uses both tools for queries requiring detailed content (testing methods, procedures, requirements)
### 🧪 Comprehensive Testing Framework
- **Multi-Round Test Suite**: Created extensive test scripts to validate new functionality
- `test_2phase_retrieval.py`: Validates both metadata and content retrieval phases
- `test_multi_round_tool_calls.py`: Tests multi-round automatic tool calling behavior
- `test_streaming_multi_round.py`: Confirms streaming works with multi-round execution
- All tests confirm proper parallel execution and multi-round behavior
### 🔧 Technical Enhancements
- **Workflow Routing Logic**: Improved `should_continue()` function for proper multi-round flow
- Enhanced routing logic to handle tool completion and round progression
- Fixed final synthesis routing after maximum rounds reached
- Maintained streaming response capability throughout multi-round execution
- **State Management**: Enhanced AgentState with round tracking and management
- **Tool Integration**: Verified both retrieval tools work correctly in multi-round scenarios
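A minimal sketch of the routing predicate, assuming messages may be either objects with a `tool_calls` attribute or plain dicts (the real `should_continue()` in `service/graph/graph.py` may differ in detail):

```python
def should_continue(state):
    """Route to tools while the last LLM message still requests them and rounds remain."""
    last = state["messages"][-1]
    wants_tools = bool(
        getattr(last, "tool_calls", None)
        or (isinstance(last, dict) and last.get("tool_calls"))
    )
    if wants_tools and state["tool_rounds"] < state["max_tool_rounds"]:
        return "tools"
    return "final_synthesis"   # no more tool calls, or max rounds reached
```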
### ✅ Validation Results
- **Multi-Round Capability**: ✅ Agent executes 1-3 rounds of tool calls automatically
- **Parallel Execution**: ✅ Tools execute in parallel within each round
- **2-Phase Retrieval**: ✅ Agent uses both metadata and content retrieval tools
- **Streaming Response**: ✅ Full streaming support maintained throughout workflow
- **Round Management**: ✅ Proper progression and final synthesis after max rounds
## v0.8.7 - 2025-08-24
### 🛠 Tool Modularization
- **Tool Code Organization**: Extracted tool definitions and schemas into separate module
- Created new `service/graph/tools.py` module containing all tool implementations
- Moved `retrieve_standard_regulation` and `retrieve_doc_chunk_standard_regulation` functions
- Added `get_tool_schemas()` and `get_tools_by_name()` utility functions
- Updated `service/graph/graph.py` to import tools from the new module
- Updated test imports to reference tools from the correct module location
- Improved code maintainability and separation of concerns
## v0.8.6 - 2025-08-24
### 🔧 Configuration Restructuring
- **LLM Configuration Separation**: Extracted LLM parameters and prompt templates to dedicated `llm_prompt.yaml`
- Created new `llm_prompt.yaml` file containing parameters and prompts sections
- Added support for loading both `config.yaml` and `llm_prompt.yaml` configurations
- Enhanced configuration models with `LLMParametersConfig` and `LLMPromptsConfig`
- Added `get_max_context_length()` method for consistent context length access
- Updated `message_trimmer.py` to use new configuration structure
- Maintains backward compatibility with legacy configuration format
### 📂 File Structure Changes
- **New file**: `llm_prompt.yaml` - Contains all LLM-related parameters and prompt templates
- **Updated**: `service/config.py` - Enhanced to support dual configuration files
- **Updated**: `service/graph/message_trimmer.py` - Uses new configuration method
## v0.8.5 - 2025-08-24
### 🚀 Performance Improvements
- **Parallel Tool Execution**: Fixed sequential tool calling to implement true parallel execution
- Modified `run_tools_with_streaming()` to use `asyncio.gather()` for concurrent tool calls
- Added proper error handling and result aggregation for parallel execution
- Improved tool execution performance when LLM calls multiple tools simultaneously
- Enhanced logging to track parallel execution completion
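The `asyncio.gather()` pattern described above looks roughly like this; the function and call shapes are illustrative, not the exact signatures in `run_tools_with_streaming()`:

```python
import asyncio

async def run_tools_parallel(tool_calls, tools_by_name):
    """Execute every tool call of a round concurrently and aggregate results."""
    async def run_one(call):
        try:
            result = await tools_by_name[call["name"]](**call["args"])
            return {"name": call["name"], "result": result}
        except Exception as exc:
            # Per-call error capture keeps one failed tool from sinking the round
            return {"name": call["name"], "error": str(exc)}
    # gather() preserves input order, so results line up with tool_calls
    return await asyncio.gather(*(run_one(c) for c in tool_calls))
```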
### 🔧 Technical Enhancements
- **Query Optimization Strategy**: Enhanced agent prompt to encourage multiple parallel tool calls
- Agent now generates 1-3 rewritten queries before retrieval
- Cross-language query generation (Chinese ↔ English) for broader coverage
- Optimized for Azure AI Search's Hybrid Search capabilities
- True parallel tool calling implementation in LangGraph workflow
## v0.8.4 - 2025-08-24
### 🚀 Agent Intelligence Improvements
- **Advanced Query Rewriting Strategy**: Enhanced agent system prompt with intelligent query optimization
- Added mandatory query rewriting step before retrieval tool calls
- Generates 1-3 rewritten queries to explore different aspects of user intent
- Cross-language query generation (Chinese ↔ English) for broader search coverage
- Optimized queries for Azure AI Search's Hybrid Search (keyword + vector search)
- Parallel retrieval tool calling for comprehensive information gathering
- Enhanced coverage through synonyms, technical terms, and alternative phrasings
## v0.8.3 - 2025-08-24
### 🎨 UI/UX Improvements
- **Citation Format Update**: Changed citation format from superscript HTML tags `<sup>1</sup>` to square brackets `[1]`
- Updated agent system prompt to use square bracket citations for improved readability
- Modified citation examples in configuration to reflect new format
- Enhanced Markdown compatibility with bracket-style citations
### 🔧 Configuration Updates
- **Agent System Prompt Optimization**: Enhanced prompt engineering for better query rewriting capabilities
- Added support for generating 1-3 rewritten queries based on conversation context
- Improved parallel tool calling workflow for comprehensive information retrieval
- Added cross-language query generation (Chinese ↔ English) for broader search coverage
- Optimized query text for Azure AI Search's Hybrid Search (keyword + vector search)
## v0.8.2 - 2025-08-24
### 🐛 Code Quality Fixes
- **Removed Duplicate Route Definitions**: Fixed main.py having duplicate endpoint definitions
- Removed duplicate `/api/chat`, `/api/ai-sdk/chat`, `/health`, and `/` route definitions
- Removed duplicate `if __name__ == "__main__"` blocks
- Standardized `/api/chat` endpoint to use proper SSE configuration (`text/event-stream`)
- **Code Deduplication**: Cleaned up redundant code that could cause routing conflicts
- **Consistent Headers**: Unified streaming response headers for better browser compatibility
## v0.8.1 - 2025-08-24
### 🧪 Integration Test Modernization
- **Complete Integration Test Rewrite**: Modernized all integration tests to match latest codebase features
- **Remote Service Testing**: All integration tests now connect to running service at `http://localhost:8000` using `httpx.AsyncClient`
- **LangGraph v0.6+ Compatibility**: Updated streaming contract validation for latest LangGraph features
- **PostgreSQL Memory Testing**: Added session persistence testing with PostgreSQL backend
- **AI SDK Endpoints**: Comprehensive testing of `/api/chat` and `/api/ai-sdk/chat` endpoints
### 🔄 Test Infrastructure Updates
- **Modern Async Patterns**: Converted all tests to use `pytest.mark.asyncio` and async/await
- **Server-Sent Events (SSE)**: Added streaming response validation with proper SSE format parsing
- **Citation Processing**: Testing of citation CSV format and tool result aggregation
- **Concurrent Testing**: Multi-session and rapid-fire request testing for performance validation
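The SSE parsing used by these tests can be sketched as a small helper that splits the stream into blank-line-delimited events and decodes `data:` payloads (a hypothetical `parse_sse_events`; the actual test helpers may differ):

```python
import json
from typing import Any


def parse_sse_events(raw: str) -> list[dict[str, Any]]:
    """Parse a Server-Sent Events payload into a list of event dicts.

    Each SSE event is a block of lines separated by a blank line; JSON
    payloads ride on `data:` lines. Non-JSON payloads (e.g. [DONE]) are
    kept under a "raw" key instead of being dropped.
    """
    events: list[dict[str, Any]] = []
    for block in raw.split("\n\n"):
        data_lines = [
            line[len("data:"):].strip()
            for line in block.splitlines()
            if line.startswith("data:")
        ]
        if not data_lines:
            continue
        payload = "\n".join(data_lines)
        try:
            events.append(json.loads(payload))
        except json.JSONDecodeError:
            events.append({"raw": payload})
    return events
```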
### 📁 Test File Organization
- **`test_api.py`**: Basic API endpoints, request validation, CORS/security headers, error handling
- **`test_full_workflow.py`**: End-to-end workflows, session continuity, real-world scenarios
- **`test_streaming_integration.py`**: Streaming behavior, performance, concurrent requests, content validation
- **`test_e2e_tool_ui.py`**: Complete tool UI workflows, multi-turn conversations, specialized queries
- **`test_mocked_streaming.py`**: Mocked streaming tests for internal validation without external dependencies
### 🎯 Test Coverage Enhancements
- **Real-World Scenarios**: Compliance officer and engineer research workflow testing
- **Performance Testing**: Response timing, large context handling, rapid request sequences
- **Error Recovery**: Session recovery after errors, timeout handling, malformed request validation
- **Content Validation**: Unicode support, encoding verification, response consistency testing
### ⚙️ Test Execution
- **Service Dependency**: Integration tests require running service (fail appropriately when service unavailable)
- **Flag-based Execution**: Use `--run-integration` flag to execute integration tests
- **Comprehensive Validation**: All tests validate response structure, streaming format, and business logic
## v0.8.0 - 2025-08-23
### 🚀 Major Changes - PostgreSQL Migration
- **Breaking Change**: Migrated session memory storage from Redis to PostgreSQL
- **Complete removal of Redis dependencies**: Removed `redis` and `langgraph-checkpoint-redis` packages
- **New PostgreSQL-based session persistence**: Using `langgraph-checkpoint-postgres` for robust session management
- **Azure Database for PostgreSQL**: Configured for production Azure environment with SSL security
- **7-day TTL**: Automatic cleanup of old conversation data with PostgreSQL-based retention policy
### 🔧 Session Memory Infrastructure
- **PostgreSQL Storage**: Implemented comprehensive session-level memory with PostgreSQL persistence
- Created `PostgreSQLCheckpointerWrapper` for complete LangGraph checkpointer interface compatibility
- Automatic schema migration and table creation via LangGraph PostgresSaver
- Robust connection pooling with `psycopg[binary]` driver
- Context-managed database connections with automatic cleanup
- **Backward Compatibility**: Full interface compatibility with existing Redis implementation
- All checkpointer methods (sync/async): `get`, `put`, `list`, `get_tuple`, `put_writes`, etc.
- Graceful fallback mechanisms for async methods not natively supported by PostgresSaver
- Thread-safe execution with proper async/sync method bridging
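The async/sync bridging can be sketched as follows (a minimal stand-in, assuming the underlying saver raises `NotImplementedError` for async methods; class and method names other than the pattern itself are illustrative):

```python
import asyncio
from typing import Any


class CheckpointerFallbackMixin:
    """Sketch of the async-to-sync bridge used when the underlying saver
    (e.g. PostgresSaver) has no native async support: fall back to the
    sync method, executed in a worker thread."""

    def get_tuple(self, config: dict) -> Any:  # sync path (stand-in)
        raise NotImplementedError

    async def _native_aget_tuple(self, config: dict) -> Any:
        # Stand-in for the saver's native async method.
        raise NotImplementedError

    async def aget_tuple(self, config: dict) -> Any:
        try:
            return await self._native_aget_tuple(config)
        except NotImplementedError:
            # No native async support here; bridge to the sync method.
            return await asyncio.to_thread(self.get_tuple, config)


class DemoCheckpointer(CheckpointerFallbackMixin):
    def get_tuple(self, config: dict) -> Any:
        return {"thread_id": config["thread_id"], "checkpoint": "ok"}
```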
### 🛠️ Technical Improvements
- **Configuration Updates**:
- Added `postgresql` configuration section to `config.yaml`
- Removed `redis` configuration sections completely
- Updated all logging and comments from "Redis" to "PostgreSQL"
- **Memory Management**:
- `PostgreSQLMemoryManager` for conditional PostgreSQL/in-memory checkpointer initialization
- Connection testing and validation during startup
- Improved error handling with detailed logging and connection diagnostics
- **Code Architecture**:
- Updated `AgenticWorkflow` to use PostgreSQL checkpointer for session memory
- Fixed variable name conflicts in `ai_sdk_chat.py` (config vs graph_config)
- Proper state management using `TurnState` objects in workflow execution
### 🐛 Bug Fixes
- **Workflow Execution**: Fixed async method compatibility issues with PostgresSaver
- Resolved `NotImplementedError` for `aget_tuple` and other async methods
- Added fallback to sync methods with proper thread pool execution
- Fixed LangGraph integration with correct `AgentState` format usage
- **Session History**: Restored conversation memory functionality
- Fixed session history loading and persistence across conversation turns
- Verified multi-turn conversations correctly remember previous context
- Ensured proper message threading with session IDs
### 🧹 Cleanup & Maintenance
- **Removed Legacy Code**:
- Deleted `redis_memory.py` and all Redis-related implementations
- Cleaned up temporary test files and development artifacts
- Removed all `__pycache__` directories
- Deleted obsolete backup and version files
- **Updated Documentation**:
- All code comments updated from Redis to PostgreSQL references
- Logging messages updated to reflect PostgreSQL usage
- Maintained existing API documentation and interfaces
### ✅ Verification & Testing
- **Functional Testing**: All core features verified working with PostgreSQL backend
- Chat functionality with tool calling and streaming responses
- Session persistence across multiple conversation turns
- PostgreSQL schema auto-creation and TTL cleanup functionality
- Health check endpoints and service startup/shutdown procedures
- **Performance**: No degradation in response times or functionality
- Maintained all existing streaming capabilities
- Tool execution and result processing unchanged
- Citation processing and response formatting intact
### 📈 Impact
- **Production Ready**: Fully migrated from Redis to Azure Database for PostgreSQL
- **Scalability**: Better long-term data management with relational database benefits
- **Reliability**: Enhanced data consistency and backup capabilities through PostgreSQL
- **Maintainability**: Simplified dependency management with single database backend
---
## v0.7.9 - 2025-08-23
### 🐛 Bug Fixes
- **Fixed**: Syntax errors in `service/graph/graph.py`
- Fixed type annotation errors with message parameters by adding proper type casting
- Fixed graph.astream call type errors by using proper `RunnableConfig` and `AgentState` typing
- Added missing `cast` import for better type handling
- Ensured compatibility with LangGraph and LangChain type system
---
## v0.7.8 - 2025-08-23
### 🔧 Configuration Updates
- **Breaking Change**: Replaced `max_tokens` with `max_context_length` in configuration
- **Added**: Optional `max_output_tokens` setting for LLM response length control
- Default: `None` (no output token limit)
- When set: Applied as `max_tokens` parameter to LLM calls
- Provides flexibility to limit output length when needed
- Updated conversation history management to use 96k context length by default
- Improved token allocation: 85% for conversation history, 15% reserved for responses
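A hypothetical `config.yaml` fragment showing the new keys (names follow this changelog; the real file may nest them differently):

```yaml
llm:
  max_context_length: 96000   # total context window budget
  max_output_tokens: null     # optional; when set, passed as max_tokens to the LLM
  # Token allocation: ~85% of max_context_length for conversation history,
  # ~15% reserved for the model's response.
```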
### 🔄 Conversation Management
- Enhanced conversation trimmer to handle larger context windows
- Updated trimming strategy to allow ending on AI messages for better conversation flow
- Improved error handling and fallback mechanisms in message trimming
### 📝 Documentation
- Updated conversation history management documentation
- Clarified distinction between context length and output token limits
- Added examples for optional output token limiting
---
## v0.7.7 - 2025-08-23
### Added
- **Conversation History Management**: Implemented automatic context length management
- Added `ConversationTrimmer` class to handle conversation history trimming
- Integrated with LangChain's `trim_messages` utility for intelligent message truncation
- Automatic token counting and trimming to prevent context window overflow
- Preserves system messages and maintains conversation validity
- Fallback to message count-based trimming when token counting fails
- Configurable token limits with 70% allocation for conversation history
- Smart conversation flow preservation (starts with human, ends with human/tool)
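The count-based fallback can be sketched as follows (an illustrative stand-in for the `ConversationTrimmer` fallback path, not its actual implementation):

```python
from dataclasses import dataclass


@dataclass
class Msg:
    role: str     # "system" | "human" | "ai" | "tool"
    content: str


def trim_by_count(messages: list[Msg], max_messages: int) -> list[Msg]:
    """Fallback count-based trimming: keep the system message, keep the
    most recent messages, and drop leading messages until the kept window
    starts with a human turn (preserving conversation validity)."""
    system = [m for m in messages if m.role == "system"]
    rest = [m for m in messages if m.role != "system"]
    budget = max(max_messages - len(system), 0)
    window = rest[-budget:] if budget else []
    # The kept window should start with a human turn.
    while window and window[0].role != "human":
        window = window[1:]
    return system + window
```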
### Enhanced
- **Context Window Protection**: Prevents API failures due to exceeded token limits
- Monitors conversation length and applies trimming when necessary
- Maintains conversation quality while respecting LLM context constraints
- Improves reliability for long-running conversations
## v0.7.6 - 2025-08-23
### Enhanced
- **Universal Tool Calling**: Implemented consistent forced tool calling across all query types
- Modified graph.py to always use `tool_choice="required"` for better DeepSeek compatibility
- Ensures reliable tool invocation for both technical and non-technical queries
- Provides consistent behavior across all LLM providers (Azure, OpenAI, DeepSeek)
- Maintains response quality while guaranteeing tool usage for retrieval-based queries
### Validated
- **DeepSeek Integration**: Comprehensive testing confirms optimal configuration
- Verified that ChatOpenAI with custom endpoints fully supports DeepSeek models
- Confirmed that forced tool calling resolves DeepSeek tool invocation issues
- Tested both technical queries (GB/T standards) and general queries (greetings)
- Established that current implementation requires no DeepSeek-specific handling
## v0.7.5 - 2025-01-18
### Improved
- **Code Simplification**: Removed unnecessary ChatDeepSeek dependency and complexity
- Simplified LLMClient to use only ChatOpenAI for all OpenAI-compatible endpoints (including custom DeepSeek)
- Removed unused `langchain-deepseek` dependency as ChatOpenAI handles custom DeepSeek endpoints perfectly
- Cleaned up _create_llm method by removing DeepSeek-specific handling logic
- Maintained full compatibility with existing tool calling functionality
- Code is now more maintainable and follows KISS principle
## v0.7.4 - 2025-08-23
### Fixed
- **OpenAI Provider Tool Calling**: Fixed DeepSeek model tool calling issues for custom endpoints
- Added `langchain-deepseek` dependency for better DeepSeek model support
- Modified LLMClient to use ChatOpenAI for custom DeepSeek endpoints (instead of ChatDeepSeek which only works with official api.deepseek.com)
- Implemented forced tool calling using `tool_choice="required"` for initial queries to ensure tool usage
- Enhanced agent system prompt to explicitly require tool usage for all information queries
- Resolved issue where DeepSeek models weren't calling tools consistently when using provider: openai
- Now both Azure and OpenAI providers (including custom DeepSeek endpoints) work correctly with tool calling
### Enhanced
- **System Prompt Optimization**: Improved agent prompts for better tool usage reliability
- Added explicit tool listing and mandatory workflow instructions
- Enhanced prompts specifically for GB/T standards and technical information queries
- Better handling of Chinese technical queries with forced tool retrieval
## v0.7.3 - 2025-08-23
### Fixed
- **Citation Display**: Fixed citation header visibility logic
- Modified `_build_citation_markdown` function to only display "### 📘 Citations:" header when valid citations exist
- Prevents empty citation sections from appearing when agent response doesn't contain citation mapping
- Improved user experience by removing unnecessary empty citation headers
## v0.7.2 - 2025-01-16
### Enhanced
- **Tool Conversation Context**: Added conversation history parameter support to retrieval tools
- Both `retrieve_standard_regulation` and `retrieve_doc_chunk_standard_regulation` now accept `conversation_history` parameter
- Enhanced agent node to autonomously use tools with conversation context for better multi-turn understanding
- Improved tool call responses with contextual information for citations mapping
- **Citation Processing**: Improved citation mapping and metadata handling
- Updated `_build_citation_markdown` to prioritize English titles over Chinese for internationalization
- Enhanced `_normalize_result` function with dynamic structure and selective field removal
- Removed noise fields (`@search.score`, `@search.rerankerScore`, `@search.captions`, `@subquery_id`) from tool responses
- Improved tool result metadata structure with `@tool_call_id` and `@order_num` for accurate citation mapping
- **Agent Optimization**: Refined autonomous agent workflow for better tool usage
- Function calling mode (not ReAct) to minimize LLM calls and token consumption
- Enhanced multi-step tool loops with improved context passing between tool calls
- Optimized retrieval API configurations with `include_trace: False` for cleaner responses
- **Session Management**: Improved session behavior for better user experience
- Changed session ID generation to create new session on every page refresh
- Switched from localStorage to sessionStorage for session ID persistence
- New sessions start fresh conversations while maintaining session isolation per browser tab
### Fixed
- **Tool Configuration**: Updated retrieval API field selections and search parameters
- Standardized field lists for `select`, `search_fields`, and `fields_for_gen_rerank` across tools
- Removed deprecated `timestamp` and `x_Standard_Code` fields from standard regulation tool
- Added missing metadata fields (`func_uuid`, `filepath`, `x_Standard_Regulation_Id`) for proper citation link generation
## v0.7.1 - 2025-01-16
### Fixed
- **Session Memory Bug**: Fixed critical multi-turn conversation context loss in webchat
- **Root Cause**: `ai_sdk_chat.py` was creating new `TurnState` for each request without loading previous conversation history from Redis/LangGraph memory
- **Additional Issue**: Frontend was generating new `session_id` for each request instead of maintaining persistent session
- **Solution**: Refactored to let LangGraph's checkpointer handle session history automatically using `thread_id`
- **Frontend Fix**: Added `useSessionId` hook to maintain persistent session ID in localStorage, passed via headers to backend
- **Implementation**: Removed manual state creation, pass only new user message and `session_id` to compiled graph
- **Validation**: Tested multi-turn conversations with same `session_id` - second message correctly references first message context
- **Session Isolation**: Verified different sessions maintain separate conversation contexts without cross-contamination
### Enhanced
- **Memory Integration**: Improved LangGraph session memory reliability
- Stream callback handling via contextvars for proper async streaming
- Automatic fallback to in-memory checkpointer when Redis modules unavailable
- Robust error handling for Redis connection issues while maintaining session functionality
- **Frontend Session Management**: Added persistent session ID management
- `useSessionId` React hook for localStorage-based session persistence
- Session ID passed via `X-Session-ID` header from frontend to backend
- Graceful fallback to generated session ID if none provided
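The header-with-fallback behavior can be sketched as follows (hypothetical helper; the real logic lives in the FastAPI route handler):

```python
import uuid


def resolve_session_id(headers: dict[str, str]) -> str:
    """Prefer the X-Session-ID header sent by the frontend; otherwise
    generate a fresh session ID. Header lookup is case-insensitive in
    real HTTP stacks, so normalize keys here."""
    normalized = {k.lower(): v for k, v in headers.items()}
    session_id = normalized.get("x-session-id", "").strip()
    return session_id or f"session-{uuid.uuid4().hex}"
```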
## v0.7.0 - 2025-08-22
### Added
- **Redis Session Memory**: Implemented robust session-level memory with Redis persistence
- Redis-based chat history storage with 7-day TTL using Azure Cache for Redis
- LangGraph `RedisSaver` integration for session persistence and state management
- Graceful fallback to `InMemorySaver` if Redis is unavailable or modules missing
- Session-level memory isolation using `thread_id` for proper conversation context
- Config validation with dedicated `RedisConfig` model for connection parameters
- Session memory verification tests confirming isolation and persistence
### Enhanced
- **Memory Architecture**: Refactored from simple in-memory store to session-based graph memory
- Migrated from `InMemoryStore` to LangGraph's checkpoint system
- Updated `AgenticWorkflow` graph to use `MessagesState` with Redis persistence
- Added `RedisMemoryManager` for conditional Redis/in-memory checkpointer initialization
- Session-based conversation tracking via `session_id` as LangGraph `thread_id`
## v0.6.2 - 2025-08-22
### Added
- **Stream Filtering for Citations Mapping**: Implemented intelligent filtering of citations mapping HTML comments from token stream
- Agent-generated citations mapping is now filtered from the client-side stream while preserved in the complete response
- Added buffer-based detection of HTML comment boundaries (`<!--` and `-->`)
- Ensures citations mapping CSV remains available for post-processing while not displaying to users
- Maintains complete response integrity in state for `post_process_node` to access citations mapping
- Enhanced token streaming logic with comment detection and filtering state management
### Improved
- **Optimized Stream Buffering Logic**: Enhanced token filtering to minimize latency
- Non-comment tokens are now sent immediately to client without unnecessary buffering
- Only potential HTML comment prefixes (`<`, `<!`, `<!-`) are buffered for detection
- Reduced buffer size from 10 characters to 4 characters (minimum needed for `<!--`)
- Improved user experience with faster token delivery for normal content
- **Citation List Block Return**: Changed citation list delivery from character-by-character streaming to single block return
- Citations are now sent as a complete markdown block in post-processing
- Improves rendering performance and reduces UI jitter
- Better user experience with instant citation list appearance
### Technical
- **Stream Token Filtering Logic**: Enhanced `call_model` function in agent node with sophisticated filtering
- Implements intelligent buffering that only delays tokens when necessary for comment detection
- Maintains filtering state to handle multi-token HTML comments
- Preserves all content in response while selectively filtering stream output
- Compatible with existing streaming protocol and post-processing pipeline
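The buffering logic described above can be sketched as a stand-alone state machine (illustrative; the real filtering is inlined in `call_model`):

```python
class CommentStreamFilter:
    """Filters HTML comments (<!-- ... -->) from a token stream while
    retaining the full, unfiltered text for post-processing."""

    def __init__(self) -> None:
        self.buffer = ""       # holds at most a potential "<!--" prefix (<= 4 chars)
        self.tail = ""         # last chars seen inside a comment, to spot "-->"
        self.in_comment = False
        self.full_text = ""    # complete response, comments included

    def feed(self, token: str) -> str:
        """Return the characters of `token` that are safe to stream now."""
        self.full_text += token
        out: list[str] = []
        for ch in token:
            if self.in_comment:
                self.tail = (self.tail + ch)[-3:]
                if self.tail == "-->":
                    self.in_comment = False
                continue
            self.buffer += ch
            if self.buffer == "<!--":
                self.in_comment = True
                self.buffer, self.tail = "", ""
            else:
                # Flush the longest front part that cannot start a comment;
                # non-comment tokens are emitted immediately.
                while self.buffer and not "<!--".startswith(self.buffer):
                    out.append(self.buffer[0])
                    self.buffer = self.buffer[1:]
        return "".join(out)

    def flush(self) -> str:
        """Emit any held (non-comment) prefix at end of stream."""
        out, self.buffer = self.buffer, ""
        return out
```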
## v0.6.1 - 2025-08-22
### Added
- **Citation List and Link Building**: Enhanced `post_process_node` to build complete citation lists with links
- Added citation mapping extraction from agent responses using CSV format in HTML comments
- Implemented citation markdown generation following `build_citations.py` logic
- Added automatic link generation for CAT system with proper URL encoding
- Added helper functions: `_extract_citations_mapping`, `_build_citation_markdown`, `_remove_citations_comment`
- **Frontend External Links Support**: Added `rehype-external-links` plugin for secure external link handling
- Installed `rehype-external-links` v3.0.0 dependency in web frontend
- Configured automatic `target="_blank"` and `rel="noopener noreferrer"` for external links
- Enhanced security and UX for citation links and external references
### Fixed
- **Chat UI Link Rendering**: Fixed links not being properly rendered in the chat interface
- Resolved component configuration conflict between `MyChat` and `AiAssistantMessage`
- Updated `AiAssistantMessage` to properly use `MarkdownText` component with external links support
- Added `@tailwindcss/typography` plugin for proper prose styling
- Enhanced link styling with blue color and hover effects
- Added intelligent content detection to handle both Markdown and HTML content
- Installed `isomorphic-dompurify` for safe HTML sanitization
- Enhanced Agent prompt to explicitly require Markdown-only output (no HTML tags)
### Changed
- **Enhanced Post-Processing**: `post_process_node` now processes citations mapping and generates structured citation lists
- Extracts citations mapping CSV from agent response HTML comments
- Builds proper citation markdown with document titles, headers, and clickable links
- Streams citation markdown to client for real-time display
- Maintains clean separation between agent response and citation processing
### Technical
- Added URL encoding support for document codes and titles
- Improved error handling in citation processing with fallback to error messages
- Maintained backward compatibility with existing streaming protocol
- Enhanced markdown rendering with proper external link security attributes
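The link-building step can be sketched as follows (field names and the URL pattern are illustrative assumptions, not the exact ones in `build_citations.py`):

```python
from urllib.parse import quote


def build_citation_markdown(citations: list[dict], base_url: str) -> str:
    """Render a citation list with URL-encoded links. The field names
    (`code`, `title_en`, `title_cn`) and URL layout are hypothetical.
    Returns an empty string when there are no citations, so no empty
    header is emitted."""
    if not citations:
        return ""
    lines = ["### 📘 Citations:"]
    for i, c in enumerate(citations, start=1):
        title = c.get("title_en") or c.get("title_cn") or "Untitled"
        code = c.get("code", "")
        # Encode the code and title so spaces and slashes survive in the URL.
        url = f"{base_url}/doc/{quote(code, safe='')}?title={quote(title, safe='')}"
        lines.append(f"{i}. [{code} {title}]({url})")
    return "\n".join(lines)
```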
## v0.6.0 - 2025-08-22
### Changed
- **Removed `agent_done` event**: The streaming protocol no longer includes the deprecated `agent_done` event.
- Removed handling in `AISDKEventAdapter` (`service/ai_sdk_adapter.py`).
- Cleaned up commented-out `create_agent_done_event` in `service/sse.py` and related imports in `service/graph/graph.py`.
- Updated tests to no longer expect `agent_done` events across unit and integration suites.
### Technical
- Simplified adapter logic by eliminating obsolete event type handling.
- Version bump to reflect breaking change in streaming protocol.
## v0.5.3 - 2025-01-27
### Fixed
- **Tool Result Retrieval**: Fixed agent not receiving tool results correctly
- Fixed tool node serialization in `service/graph/graph.py`
- Tool results now passed directly as dicts to agent instead of using `model_dump()`
- Agent can now correctly retrieve and use tool results in conversation flow
- Verified through SSE stream testing that tool results are properly transmitted
## v0.5.2 - 2025-01-27
### Changed
- **Simplified Data Structure**: Rewrote `_normalize_result` function to return dynamic data structure
- Returns `Dict[str, Any]` instead of rigid `RetrievalResult` class
- Automatically removes search-specific fields: `@search.score`, `@search.rerankerScore`, `@search.captions`, `@subquery_id`
- Removes empty fields (None, empty string, empty list, empty dict)
- Cleaner, more flexible result processing
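A minimal sketch of the dynamic normalization (the noise-field list comes from this changelog; the stand-in name and exact empty-value rules are illustrative):

```python
# Search-specific fields stripped from every result, per the changelog.
NOISE_FIELDS = {"@search.score", "@search.rerankerScore", "@search.captions", "@subquery_id"}


def normalize_result(raw: dict) -> dict:
    """Drop search-specific noise fields and empty values (None, empty
    string, empty list, empty dict), returning a plain dynamic dict."""
    return {
        key: value
        for key, value in raw.items()
        if key not in NOISE_FIELDS and value not in (None, "", [], {})
    }
```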
### Removed
- **Removed Schema Dependencies**: Eliminated `service/schemas/retrieval.py`
- No longer need `RetrievalResult` class or `metadata` field
- Simplified `RetrievalResponse` class moved inline to `agentic_retrieval.py`
- Reduced code complexity and maintenance overhead
### Technical
- Updated `AgenticRetrieval` class to use dynamic result normalization
- Maintained backward compatibility with existing tool interfaces
- Improved data processing efficiency
## v0.5.1 - 2025-01-27
### Added
- **Citations Mapping CSV**: Added citations mapping CSV functionality to agent responses
- Updated `agent_system_prompt` in `config.yaml` to instruct LLM to generate citations mapping CSV
- Citations mapping CSV format: `{citation_number},{tool_call_id},{search_result_code}`
- Citations mapping embedded in HTML comment at end of response: `<!-- citations_map ... -->`
- Includes brief example in system prompt for clarity
- Fully compatible with existing streaming and markdown processing
### Technical
- Verified agent node and post-processing node support citations mapping output
- Confirmed SSE streaming handles citations mapping within markdown content
- Created validation test script to verify output format
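Extraction of the mapping can be sketched as follows (the `citations_map` comment marker and CSV columns follow this changelog; the helper name is hypothetical):

```python
import re

# Matches the trailing HTML comment carrying the citations-mapping CSV.
CITATIONS_RE = re.compile(r"<!--\s*citations_map\s*(.*?)\s*-->", re.DOTALL)


def extract_citations_mapping(response: str) -> tuple[str, list[tuple[str, str, str]]]:
    """Return the response without the comment, plus parsed
    (citation_number, tool_call_id, search_result_code) rows."""
    match = CITATIONS_RE.search(response)
    if not match:
        return response, []
    rows: list[tuple[str, str, str]] = []
    for line in match.group(1).strip().splitlines():
        parts = [p.strip() for p in line.split(",", 2)]
        if len(parts) == 3:
            rows.append((parts[0], parts[1], parts[2]))
    cleaned = CITATIONS_RE.sub("", response).rstrip()
    return cleaned, rows
```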
## v0.5.0 - 2025-08-21
### Changed - Major Simplification
- **Simplified `post_process_node`**: Greatly simplified the post-processing node; it now returns only a simple summary of the number of tool-call result entries
- Removed the complex answer and citation extraction logic
- Removed the multiple post-append event streams and the special `tool_summary` event
- **Tool summary as a regular message**: The tool execution summary is now returned directly as a regular AI message, rendered in Markdown
- **Unified message handling**: Removed the special event-handling logic; tool summaries flow through the standard message stream and the frontend renders them as plain markdown
- Significantly reduces code complexity and maintenance cost, and improves generality
### Removed
- **Simplified `AgentState` fields**: Removed the `citations_mapping_csv` field from `AgentState`
- The field was used only for complex citation processing and is no longer needed
- Kept the `stream_callback` field, since it is used throughout the graph for event streaming
- Removed the `citations_mapping_csv` field from `TurnState` accordingly
- **Removed unused helper functions**:
- `_extract_citations_from_markdown()`: complex logic for extracting citations from Markdown
- `_generate_basic_citations()`: function for generating a basic citation mapping
- `create_post_append_events()`: function for creating the complex post-append event sequence (replaced by the simplified tool summary)
- `create_tool_summary_event()`: function for creating the special tool summary event (now handled as a regular message)
- Simplified the codebase by removing citation-processing logic that is no longer needed
- **Cleaned up the SSE module**: Removed business-specific event creation functions
- Deleted the `create_post_append_events()` and `create_tool_summary_event()` functions and their related tests
- The SSE module now contains only generic event-creation utility functions
- Improves the module's cohesion and reusability
### Added
- **Unified message-handling architecture**: Tool execution summaries are now processed through the standard LangGraph message stream
- Tool summaries are rendered in Markdown with a `**Tool Execution Summary**` heading
- The frontend renders them as plain markdown, with no special event-handling logic required
- Improves the system's generality and consistency
### Impact
- **Code complexity**: Significantly reduced the complexity of the post-processing logic
- **Maintainability**: An easier-to-understand and easier-to-maintain post-processing flow
- **Performance**: Less event-handling overhead and faster response times
- **Backward compatibility**: API interfaces remain compatible; only the internal implementation is simplified
## v0.4.9 - 2024-12-21
### Changed
- Renamed frontend directory: `web/src/lib` → `web/src/utils`
- Updated all related references to use the new directory structure
- Removed unused imports in `web/src/components/ToolUIs.tsx`
- Improved code-organization consistency: the utils directory more accurately reflects the nature of its utility functions
### Fixed
- Fixed frontend build errors: removed references to non-existent schemas
- Verified that the frontend builds successfully and the service runs normally
## v0.4.8 - 2024-12-21
### Removed
- Deleted the redundant `service/retrieval/schemas.py` file
- The static tool schemas it defined had been replaced by dynamic generation in graph.py
- Eliminates code duplication, simplifies maintenance, and avoids the risk of static and dynamic definitions drifting out of sync
### Improved
- Tool schemas are now generated entirely dynamically, based on tool object attributes
- Reduced code redundancy and improved maintainability
- Unified the way tool schemas are defined, ensuring consistency
### Technical
- Verified that the service still runs normally after the deletion
- Backward compatible, with no breaking changes
## [0.4.7] - 2024-12-21
### Refactored
- Restructured the code directory layout for better semantic clarity and modularity
- `service/tools/` → `service/retrieval/`
- `service/tools/retrieval.py` → `service/retrieval/agentic_retrieval.py`
- Updated all related import paths so the code structure is clearer and more professional
- Cleaned up Python cache files to avoid import conflicts
### Verified
- Verified that the service starts normally after the refactor and all features work correctly
- Tool calling, the agent flow, and the post-processing node all work as expected
- HTTP API calls and response streaming run smoothly
- No breaking changes; backward compatible
### Technical
- Improved code maintainability and readability
- Lays a better architectural foundation for future feature expansion
- Follows directory-naming conventions from Python project best practices
## [0.4.6] - 2024-12-21
### Improved
- Reduced the flashing frequency of icons during tool execution for a better visual experience
- Extended the pulse animation from 2 seconds to 3-4 seconds, making it less distracting
- Adjusted the opacity range from 0.6 to 0.75/0.85 for a softer effect
- Added a gentle scaling effect (pulse-gentle) to replace the strong opacity changes
- Added a small spinning loading indicator for better running-state feedback
- Optimized animation performance with smoother transitions
### Technical
- Added new CSS animation classes: animate-pulse-gentle, animate-spin-slow
- Improved the visual design of the tool UI loading state
- Provides multiple animation-intensity options to suit different user preferences
## [0.4.5] - 2024-12-21
### Fixed
- 修复工具调用抽屉展开后显示原始JSON的问题
- 为检索工具结果提供格式化显示,包含文档标题、评分、内容预览和元数据
- 添加"格式化显示/原始数据"切换按钮,用户可选择查看方式
- 改进结果展示的用户体验,文档内容支持行截断显示
- 添加CSS line-clamp工具类支持文本截断
### Improved
- 工具UI结果显示更加用户友好和直观
- 支持长文档内容的截断预览超过200字符自动截断
- 增强了检索结果的可读性,突出显示关键信息
## [0.4.4] - 2024-12-21
### Changed
- Completely refactored `/web` codebase for DRY and best practices
- Created unified `ToolUIRenderer` component with TypeScript strict typing
- Eliminated all `any` types and improved type safety throughout
- Simplified tool UI generation with generic `createToolUI` factory function
- Fixed all TypeScript compilation errors and ESLint warnings
- Added missing dependencies: `@langchain/langgraph-sdk`, `@assistant-ui/react-langgraph`
### Removed
- All legacy test directories and components (`simplified`, `ui-test`, `chat-simplified`)
- Duplicate tool UI components (`EnhancedAssistant.tsx`, `ModernAssistant.tsx`, etc.)
- Empty directories and backup files
- TypeScript `any` type usage across API routes
### Fixed
- React Hooks usage in assistant-ui tool render functions
- TypeScript strict type checking compliance
- Build process now passes without errors or warnings
- Proper module exports and imports throughout codebase
### Technical
- Codebase now fully compliant with assistant-ui + LangGraph v0.6.0+ best practices
- All components properly typed with TypeScript strict mode
- Single source of truth for UI logic with `Assistant.tsx` component
- DRY tool UI implementation reduces code duplication by ~60%
## [0.4.3] - 2024-12-21
### ⚙️ Web UI Best Practices Implementation
- Updated frontend `/web` using `@assistant-ui/react@0.10.43`, `@assistant-ui/react-ui@0.1.8`, `@assistant-ui/react-markdown@0.10.9`, `@assistant-ui/react-data-stream@0.10.1`
- Improved Next.js API routes under `/web/src/app/api` for AI SDK Data Stream Protocol compatibility and enhanced error handling
- Added `EnhancedAssistant`, `SimpleAssistant`, and `FrontendTools` React components demonstrating assistant-ui best practices
- Created `docs/topics/ASSISTANT_UI_BEST_PRACTICES.md` guideline documentation
- Added unit tests in `tests/unit/test_assistant_ui_best_practices.py` validating dependencies, config, API routes, components, and documentation
- Switched to `pnpm` for dependency management with updated install scripts (`pnpm install`, `pnpm dev`)
### ✅ Tests
- All existing and new unit tests and integration tests passed, including best practices validation tests
## v0.4.2 - 2025-08-20
### 🧹 Code Cleanup and Refactoring
**Code cleanup and refactoring**: Simplified the project structure and removed redundant code and configuration
#### File Refactoring
- **Renamed main file**: `improved_graph.py` → `graph.py` for simpler naming
- **Renamed function**: `build_improved_graph()` → `build_graph()` for naming consistency
- **Removed redundant files**: Deleted the old graph.py backup and temporary files
#### Configuration Cleanup
- **Slimmed down config.yaml**: Removed commented-out legacy configuration entries and redundant fields
- **Removed stale prompts**: Cleaned up legacy prompts and unused synthesis prompts
- **Unified logging configuration**: Simplified the logging configuration structure
#### Import Updates
- **Updated main module**: Revised the import statements in service/main.py
- **Cleared caches**: Removed all __pycache__ directories
#### Verification
- ✅ Service starts normally
- ✅ Health check passes
- ✅ API functionality works correctly
---
## v0.4.1 - 2025-08-20
### 🎨 Markdown Output Format Upgrade
**Major UX improvement**: Agent output format switched from JSON to Markdown for better readability and user experience
#### Core Improvements
- **Markdown output**: The agent now generates Markdown-formatted responses with structured headings, lists, and citations
- **Enhanced citation handling**: Added an `_extract_citations_from_markdown()` function to extract citation information from Markdown text
- **Backward compatibility**: The post-process node supports both legacy JSON and new Markdown responses
- **Smart format detection**: Automatically detects the response format and processes it accordingly
- **Full logging**: Added detailed debug logs tracing response-format detection and processing
#### Technical Implementation
- **System prompt update**: Modified agent_system_prompt to explicitly require Markdown output
- **Dual-format handling**: `post_process_node` now supports both JSON and Markdown formats
- **Streaming event validation**: Verified that all streaming events (tool_start, tool_result, tokens, agent_done) work correctly
- **Service restart detection**: Configuration changes require a service restart to take effect
#### Test Validation
- ✅ Streaming integration tests confirm Markdown output
- ✅ Event-stream validation passed
- ✅ Citation mappings generated correctly
- ✅ agent_done events sent correctly
---
## v0.4.0 - 2025-08-20
### 🚀 LangGraph v0.6.0+ Best Practices Implementation
**Major architecture upgrade**: Fully refactored the LangGraph implementation to follow v0.6.0+ best practices and achieve a truly autonomous agent workflow
#### Core Improvements
- **TypedDict state management**: Replaced `BaseModel` with `TypedDict`, fully conforming to the LangGraph v0.6.0+ standard
- **Function-calling agent**: Implemented a pure function-calling mode (instead of ReAct), reducing LLM calls and token consumption
- **Autonomous tool usage**: The agent automatically selects appropriate tools based on context and supports consecutive tool calls that build on earlier outputs
- **Integrated synthesis**: Folded the synthesis step into the agent node, eliminating an extra LLM call
#### Architecture Optimizations
- **Simplified workflow**: Agent → Tools → Agent → Post-process (closer to the standard LangGraph pattern)
- **Fewer LLM calls**: Reduced from 3 LLM calls to 1-2, significantly lowering token consumption
- **Standardized tool binding**: Uses LangChain `bind_tools()` and the standard tool schema
- **Improved state passing**: Follows the LangGraph `add_messages` pattern
#### Technical Details
- **New file**: `service/graph/improved_graph.py` - implements v0.6.0+ best practices
- **Agent system prompt**: Updated to a prompt that supports autonomous function calling
- **Tool execution**: Simplified execution logic while retaining streaming support
- **Post-processing node**: Handles only formatting and event emission; no longer calls the LLM
#### Testing & Validation
- **Test script**: `scripts/test_improved_langgraph.py` - validates the new implementation
- **Tool calling**: ✅ Automatically calls retrieve_standard_regulation and retrieve_doc_chunk_standard_regulation
- **Event stream**: ✅ Supports streaming events such as tool_start and tool_result
- **State management**: ✅ Correct TypedDict state passing
#### Configuration Updates
- **Added**: `agent_system_prompt` - a system prompt designed for the autonomous agent
- **Backward compatible**: Existing configuration and interfaces remain unchanged
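The conditional routing behind the Agent → Tools → Agent loop can be sketched as follows (an illustrative callback over plain dict messages; the real graph uses LangChain message objects with `conditional_edges`):

```python
def should_continue(state: dict) -> str:
    """Route to the tool node while the last AI message requests tool
    calls; otherwise fall through to post-processing."""
    last_message = state["messages"][-1]
    if last_message.get("tool_calls"):
        return "tools"
    return "post_process"
```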
## v0.3.6 - 2025-08-20
### Major LangGraph Optimization Implementation ⚡
- **LangGraph optimization officially implemented**: Completed the LangGraph best-practices rollout in production code
- **Refactored major components**:
- Replaced the custom workflow with `StateGraph`, `add_node`, and `conditional_edges`
- Adopted the `@tool` decorator pattern, improving DRY in tool definitions
- Simplified state management using the standard LangGraph `AgentState`
- Modularized node functions: `call_model`, `run_tools`, `synthesis_node`, `post_process_node`
### Technical Improvements
- **Code quality**: Follows the design patterns from the official LangGraph examples
- **Maintainability**: Less duplicated code; better readability and testability
- **Standardization**: Uses the community-accepted LangGraph workflow orchestration approach
- **Dependency management**: Added langgraph>=0.2.0 to project dependencies
### Performance & Architecture
- **Expected performance gain**: Based on earlier analysis, an estimated 35% performance improvement
- **Clearer control flow**: Uses conditional_edges for decision routing
- **Optimized tool execution**: Standardized the tool-calling and result-processing flow
- **Error handling**: Improved exception handling and fallback strategies
### Implementation Status
- ✅ Core LangGraph workflow implementation complete
- ✅ Tool decorator pattern implemented
- ✅ State management optimized
- ✅ Dependencies updated and imports fixed
- ✅ **All integration tests passed** (4/4, 100% success rate)
- ✅ **All unit tests passed** (20/20, 100% success rate)
- ✅ **Workflow validated**: Tool calling, streaming responses, and conditional routing work correctly
- ✅ **API compatibility**: Fully compatible with the existing frontend and interfaces
### Test Results
- **Core functionality**: Service health, API docs, and graph construction all normal
- **Workflow execution**: The call_model → tools → synthesis flow validated successfully
- **Tool calling**: Correct tool-call events detected (retrieve_standard_regulation, retrieve_doc_chunk_standard_regulation)
- **Streaming responses**: 376 SSE events received and processed correctly
- **Session management**: Multi-turn conversation works correctly
## v0.3.5 - 2025-08-20
### Research & Analysis
- **LangGraph implementation optimization research**
- **Official example analysis**: Studied the official assistant-ui-langgraph-fastapi example
- **Created a simplified version**: Implemented a simplified version based on LangGraph best practices (`simplified_graph.py`)
- **Performance comparison**: The simplified version is 35% faster than the current implementation, with 50% less code
- **Best practices applied**: Uses the `@tool` decorator, standard LangGraph patterns, and simplified state management
### Key Findings
- **More concise code**: Reduced from 400 to 200 lines of code
- **More standardized**: Follows LangGraph community conventions and best practices
- **Performance improvement**: 35% faster execution time
- **Maintainability**: A more modular and testable code structure
### Next Steps
- Bring the simplified version to feature parity with the current version
- Consider gradually migrating to the standard LangGraph pattern
- Preserve the existing SSE streaming and citation functionality
## v0.3.4 - 2025-08-20
### Housekeeping
- **Code directory organization**
- **Temporary script migration**: Moved all temporary test and demo scripts from `scripts/` to `tests/tmp/`
- **Script separation**: The `scripts/` directory now contains only production scripts (service management, etc.)
- **Clean architecture**: Improves code maintainability and the clarity of the directory structure
### Moved Files
- `scripts/startup_demo.py` → `tests/tmp/startup_demo.py`
- `scripts/test_startup_modes.py` → `tests/tmp/test_startup_modes.py`
### Directory Structure Clean-up
- **`scripts/`**: Contains only production scripts (start_service.sh, stop_service.sh, etc.)
- **`tests/tmp/`**: Contains all temporary test and demo scripts
- **`.tmp/`**: Contains temporary files for debugging and development
## v0.3.3 - 2025-08-20
### Enhanced
- **Service Startup Improvements**
- **Foreground by default**: The service now runs in the foreground by default, making development debugging and live log viewing easier
- **Graceful shutdown**: Foreground mode supports graceful shutdown via `Ctrl+C`
- **Multiple startup modes**: Foreground, background, and development modes
- **Improved script**: `scripts/start_service.sh` accepts `--background` and `--dev` flags
- **Enhanced Makefile**: New `make start-bg` target for background startup
- **Detailed guide**: New `docs/SERVICE_STARTUP_GUIDE.md` with full instructions
### Service Management Commands
- `make start` - Run in the foreground (default, recommended for development)
- `make start-bg` - Run in the background (suited for production)
- `make dev-backend` - Development mode (auto-reload)
- `make stop` - Stop the service
- `make status` - Check service status
### Script Options
- `./scripts/start_service.sh` - Run in the foreground (default)
- `./scripts/start_service.sh --background` - Run in the background
- `./scripts/start_service.sh --dev` - Development mode
### Documentation
- Added `docs/SERVICE_STARTUP_GUIDE.md` - detailed service startup guide
- Updated `README.md` - reflects the new startup modes and best practices
- Updated the Makefile help text
## v0.3.2 - 2025-08-20
### Enhanced
- **UI 优化 (UI Improvements)**
- **图标闪烁频率降低**: 将工具执行时的图标闪烁从快速脉冲改为2秒慢速脉冲 (`animate-pulse-slow`),减少视觉干扰
- **移除头像区域**: 隐藏助手和用户头像,为聊天内容提供更大显示空间
- **布局优化**: 将主容器最大宽度从 `max-w-4xl` 扩展到 `max-w-5xl`,充分利用移除头像后的额外空间
- **消息间距优化**: 增加助手回复内容区域上方的间距 (`margin-top: 1.5rem`),改善工具调用框与回答内容的视觉分离
- **自动隐藏滚动条**: 为聊天区域添加自动隐藏滚动条样式,提升视觉美观度
- **消息区域底色**: 为助手消息区域添加淡色背景 (`bg-muted/30`),提升内容可读性
- **等待动画效果**: 启用assistant-ui等待消息内容时的动画效果包括"AI is thinking..."指示器、类型输入点、工具调用微光效果和消息出现动画
- **工具状态颜色优化**: 优化工具调用进度文字颜色,使其符合整体设计系统色谱
- **工具状态对齐优化**: 调整工具调用进度文字位置,使其与工具标题横向对齐
- **CSS改进**: 通过CSS选择器隐藏头像元素调整消息布局以移除头像占用的空间
### Technical Details
- Added the `animate-pulse-slow` custom animation class (2-second cycle, opacity 0.6-1.0)
- Hid the `[data-testid="avatar"]` and `.aui-avatar` elements via CSS
- Set the message container's `margin-left` and `padding-left` to 0
- Tool icons use `animate-pulse-slow` instead of `animate-pulse`
- Added `margin-top: 1.5rem` to the assistant message content area to widen the gap from the tool-call box
- Scrollbar styling: `scrollbar-hide` (webkit) and `scrollbar-width: none` (firefox)
- assistant-ui waiting animations include:
- `.aui-composer-attachment-root[data-state="loading"]`: pulse animation while loading
- `.aui-message[data-loading="true"]`: typing-dots animation while a message is loading
- `.aui-tool-call[data-state="loading"]`: tool-call shimmer effect
- `.aui-thread[data-state="running"] .aui-composer::before`: "AI is thinking..." indicator
- Tool status color system:
- `.tool-status-running`: Primary blue (80% opacity) - running state
- `.tool-status-processing`: Warm amber (80% opacity) - processing state
- `.tool-status-complete`: Emerald green - completed state
- `.tool-status-error`: Destructive red (80% opacity) - error state
- Tool layout: `justify-between` aligns the title and status text horizontally
## v0.3.1 - 2025-08-20
### Enhanced
- **UI Animations**: Applied `assistant-ui` animation effects with fade-in and slide-in for tool calls and responses using custom Tailwind CSS utilities.
- **Tool Icons**: Configured `retrieve_standard_regulation` tool to use `legal-document.png` icon and `retrieve_doc_chunk_standard_regulation` to use `search.png`.
- **Component Updates**: Updated `ToolUIs.tsx` to integrate Next.js `Image` component for custom icons.
- **CSS Enhancements**: Defined custom keyframes and utility classes in `globals.css` for animation support.
- **Tailwind Config**: Added `tailwindcss-animate` and `@assistant-ui/react-ui/tailwindcss` plugins in `tailwind.config.ts`.
## v0.3.0 - 2025-08-20
### Added
- **Function-call based autonomous agent**
- LLM-driven dynamic tool selection and multi-round iteration
- Integration of `retrieve_standard_regulation` and `retrieve_doc_chunk_standard_regulation` tools via OpenAI function calling
- **LLM client enhancements**: `bind_tools()`, `ainvoke_with_tools()` for function-calling support
- **Agent workflow refactoring**: `AgentNode` and `AgentWorkflow` redesigned for autonomous execution
- **Configuration updates**: New prompts in `config.yaml` (`agent_system_prompt`, `synthesis_system_prompt`, `synthesis_user_prompt`)
- **Test scripts**: Added `scripts/test_autonomous_agent.py` and `scripts/test_autonomous_api.py`
- **Documentation**: Created `docs/topics/AUTONOMOUS_AGENT_UPGRADE.md` covering the new architecture
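The multi-round, LLM-driven tool selection described above can be sketched as a loop: the model either requests tool calls or returns a final answer, and the agent iterates until it stops asking for tools. This is a hypothetical stand-in, not the real Azure OpenAI client or the `bind_tools()`/`ainvoke_with_tools()` implementations; `fake_llm` and the stub tool body are illustrative assumptions.

```python
# Hypothetical function-calling loop: `fake_llm` stands in for the real LLM
# client; tool names match the changelog but their bodies are stubs.
import json

TOOLS = {
    "retrieve_standard_regulation": lambda query: [{"doc": "stub result", "score": 0.92}],
}

def fake_llm(messages):
    # Request a tool on the first turn; answer once tool output is present.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_calls": [{"name": "retrieve_standard_regulation",
                                "arguments": json.dumps({"query": messages[-1]["content"]})}]}
    return {"content": "Final answer based on retrieved regulations."}

def run_agent(user_query, llm=fake_llm, max_rounds=5):
    messages = [{"role": "user", "content": user_query}]
    for _ in range(max_rounds):
        reply = llm(messages)
        if "tool_calls" not in reply:
            return reply["content"]              # the agent decided it is done
        for call in reply["tool_calls"]:         # execute each requested tool
            args = json.loads(call["arguments"])
            result = TOOLS[call["name"]](**args)
            messages.append({"role": "tool", "name": call["name"],
                             "content": json.dumps(result)})
    return "Stopped after max_rounds without a final answer."

print(run_agent("truck length limits"))
```

The `max_rounds` guard is the usual safety net for autonomous loops: it bounds iteration when the model keeps requesting tools.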
### Changed
- Refactored RAG pipeline to function-call based autonomy
- Backward-compatible CLI/API endpoints and prompts maintained
### Fixed
- N/A
## v0.2.9
### Added
- **🌍 Multi-Language Support**
- **Automatic language detection**: The UI language switches automatically based on the browser's preferred language
- **URL parameter override**: A language can be forced via the `?lang=zh` or `?lang=en` URL parameter
- **Language switcher**: A convenient toggle button in the top-right corner of the page
- **Persistent preference**: The user's language choice is saved to localStorage
- **Full localization**: Covers all UI elements, including the page title, tool names, status messages, and button text
### Technical Features
- **i18n architecture**: Complete internationalization infrastructure
- Type-safe translation system (`lib/i18n.ts`)
- React Hook integration (`hooks/useTranslation.ts`)
- Live language-switching support
- **URL state sync**: The selected language is synced to the URL, so multilingual links can be shared directly
- **Event-driven updates**: Reactive language switching based on custom events
### Languages Supported
- **中文** (zh): Complete Chinese interface, including tool-call status and result display
- **English** (en): Complete English interface with accurate translations of technical terms
### User Experience
- **Smart defaults**:
1. Use the language specified by the URL parameter, if present
2. Otherwise use the user's saved language preference
3. Otherwise fall back to the browser's preferred language
- **Seamless switching**: Language changes take effect instantly, with no page refresh
- **Developer friendly**: Easy to add new languages; translation strings are managed centrally
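The three-step fallback above can be written as a single resolution function. The real logic lives in the TypeScript frontend (`lib/i18n.ts`); this is a Python port for illustration, and the function name is an assumption.

```python
# Illustrative port of the "smart defaults" resolution order:
# URL parameter -> saved preference -> browser languages -> default.
SUPPORTED = {"zh", "en"}

def resolve_language(url_param, stored_pref, browser_langs, default="en"):
    for candidate in (url_param, stored_pref, *browser_langs):
        if candidate:
            code = candidate.split("-")[0].lower()   # "zh-CN" -> "zh"
            if code in SUPPORTED:
                return code
    return default

print(resolve_language(None, None, ["fr-FR", "zh-CN"]))  # → zh
```

Unsupported candidates (here `fr-FR`) are skipped rather than treated as errors, so the chain always resolves to a usable language.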
## v0.2.8
### Enhanced
- **Tool UI Redesign**: Completely redesigned tool call UI with assistant-ui pre-built components
- **Drawer-style Interface**: Tool calls now display as collapsible cards by default, showing only name and status
- **Expandable Details**: Click to expand/collapse tool details (query, results, etc.)
- **Simplified Components**: Removed complex inline styling in favor of Tailwind CSS classes
- **Better UX**: Tool calls are less intrusive while remaining accessible
- **Status Indicators**: Clear visual feedback for running, completed, and error states
- **Chinese Localization**: Tool names and status messages in Chinese for better user experience
### Technical
- **Tailwind Integration**: Enhanced Tailwind config with full shadcn/ui color variables and animation support
- Added `tailwindcss-animate` dependency via pnpm
- Configured `@assistant-ui/react-ui/tailwindcss` with shadcn theme support
- Added comprehensive CSS variables for consistent theming
- **Component Architecture**: Improved separation of concerns with cleaner component structure
- **State Management**: Added local state management for tool expansion/collapse functionality
## v0.2.7
### Changed
- **Script Organization**: Moved `start_service.sh` and `stop_service.sh` into the `/scripts` directory for better structure.
- **Makefile Updates**: Updated `make start`, `make stop`, and `make dev-backend` to reference scripts in `/scripts`.
- **VSCode Tasks**: Adjusted `.vscode/tasks.json` to run service management scripts from `/scripts`.
## v0.2.6
### Fixed
- **Markdown Rendering**: Enabled rendering of assistant messages as markdown in the chat UI.
- Correctly pass `assistantMessage.components.Text` to the `Thread` component.
- Updated CSS import to use `@assistant-ui/react-markdown/styles/dot.css`.
### Added
- **MarkdownText Component**: Introduced `MarkdownText` via `makeMarkdownText()` in `web/src/components/ui/markdown-text.tsx`.
- **Thread Configuration**: Updated `web/src/app/page.tsx` to configure `Thread` for markdown with `assistantMessage.components`.
### Changed
- **CSS Imports**: Replaced incorrect markdown CSS imports in `globals.css` with the correct path from `@assistant-ui/react-markdown`.
## v0.2.5
### Fixed
- **React Infinite Loop Error**: Resolved "Maximum update depth exceeded" error in tool UI registration
- **Problem**: Incorrect usage of the useToolUIs hook caused a setState loop, resulting in infinite forceStoreRerender calls
- **Solution**: Adopted correct assistant-ui pattern - direct component usage instead of manual registration
- **Implementation**: Place tool UI components directly inside AssistantRuntimeProvider (not via setToolUI)
- **UI Stability**: The frontend now loads normally with no React runtime errors
### Added
- **Tool UI Components**: Implemented custom assistant-ui tool UI components for enhanced user experience
- **RetrieveStandardRegulationUI**: Visual component for standard regulation search with query display and result summary
- **RetrieveDocChunkStandardRegulationUI**: Visual component for document chunk retrieval with content preview
- **Tool UI Registration**: Proper registration system using useToolUIs hook and setToolUI method
- **Visual Feedback**: Tool calls now display as interactive UI elements instead of raw JSON data
### Enhanced
- **Interactive Tool Display**: Tool calls now rendered as branded UI components with:
- 🔍 Search icons and status indicators (Searching... / Processing...)
- Query display with formatted text
- Result summaries with document codes, titles, and content previews
- Color-coded status (blue for running, green/orange for results)
- Responsive design with proper spacing and typography
### Technical
- **Frontend Architecture**: Updated page.tsx to properly register tool UI components
- Import useToolUIs hook from @assistant-ui/react
- Created ToolUIRegistration component for clean separation of concerns
- TypeScript-safe implementation with proper type handling for args, result, and status
## v0.2.4
### Fixed
- **Post-Append Events Display**: Fixed missing UI display of post-processing events
- **Problem**: Last 3 post-append events were sent as type 2 (data) events but not displayed in UI
- **Solution**: Modified AI SDK adapter to convert post-append events to visible text streams
- **post_append_2**: Tool execution summary now displays as formatted text: "🛠️ **Tool Execution Summary**"
- **post_append_3**: Notice message now displays as formatted text: "⚠️ **AI can make mistakes. Please check important info.**"
- **UI Compliance**: All three post-append events now visible in assistant-ui interface
### Enhanced
- **User Experience**: Post-processing information now properly integrated into chat flow
- Tool execution summaries provide transparency about backend operations
- Warning notices ensure users are informed about AI limitations
- Formatted display improves readability and user awareness
## v0.2.3
### Verified
- **Post-Processing Node Compliance**: Confirmed full compliance with prompt.md specification
- ✅ Post-append event 1: Agent's final answer + citations_mapping_csv (excluding tool raw prints)
- ✅ Post-append event 2: Consolidated printout of all tool call outputs used for this turn
- ✅ Post-append event 3: Trailing notice "AI can make mistakes. Please check important info."
- All three events sent in correct order after agent completion
- Events properly formatted in AI SDK Data Stream Protocol (type 2 - data events)
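The required ordering can be sketched as a small generator that emits the three post-append events after the agent's answer stream completes. The event names follow the verification log above; the payload field names are illustrative, not the exact internal schema.

```python
# Sketch of the three post-append events, in the order the spec requires.
# Payload field names are assumptions for illustration.
def post_append_events(answer, citations_csv, tool_outputs):
    yield {"event": "post_append_1",
           "data": {"answer": answer, "citations_mapping_csv": citations_csv}}
    yield {"event": "post_append_2",
           "data": {"tool_outputs": tool_outputs}}
    yield {"event": "post_append_3",
           "data": {"notice": "AI can make mistakes. Please check important info."}}

events = list(post_append_events("answer text", "id,url\n1,https://example.com",
                                 ["tool A output", "tool B output"]))
print([e["event"] for e in events])
```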
### Debugging Tools Added
- **Debug Scripts**: Added comprehensive debugging utilities for post-processing verification
- `debug_ai_sdk_raw.py`: Inspects raw AI SDK endpoint responses for post-append events
- `test_post_append_final.py`: Validates all three post-append events in correct order
- `debug_post_append_format.py`: Analyzes post-append event structure and content
- Server-side logging in PostProcessNode for event generation verification
### Tests
- **Post-Append Compliance Test**: Complete validation of prompt.md requirements
- ✅ Total chunks: 864, all post-append events found at correct positions (861, 862, 863)
- ✅ Post-append 1: Contains answer (854 chars) + citations (494 chars)
- ✅ Post-append 2: Contains tool outputs (2 tools executed)
- ✅ Post-append 3: Contains exact notice message as specified
- **Final Result**: FULLY COMPLIANT with prompt.md specification
## v0.2.2
### Fixed
- **UI Content Display**: Fixed PostProcessNode content not appearing in assistant-ui interface
- Modified AI SDK adapter to stream final answers as text events (type 0)
- Updated adapter to extract answer content from post_append_1 events correctly
- Fixed event formatting to ensure proper UI rendering compatibility
### Tests
- **Integration Test Success**: Complete workflow validation confirms perfect system integration
- ✅ AI SDK endpoint streaming protocol fully operational
- ✅ Tool call events (type 9) and tool result events (type a) working correctly
- ✅ Text streaming events (type 0) rendering final answers properly
- ✅ Assistant-ui compatibility with LangGraph backend confirmed
- **Test Results**: 2 tool calls, 2 tool results, 509 text events, 1 finish event
- **Content Validation**: Complete answer with citations, references, and proper formatting
- **UI Rendering**: Real-time streaming display with tool execution visualization
## v0.2.1
### Fixed
- **Message Format Compatibility**: Fixed assistant-ui to backend message format conversion
- assistant-ui sends `content: [{"type": "text", "text": "message"}]` array format
- Backend expects `content: "message"` string format
- Added transformation logic in `/web/src/app/api/chat/route.ts` to convert formats
- Resolved Pydantic validation error: "Input should be a valid string [type=string_type]"
- **End-to-End Chat Flow**: Verified complete user input → format conversion → tool execution → streaming response pipeline
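The conversion itself is a small transform. The real implementation lives in TypeScript in `/web/src/app/api/chat/route.ts`; this is a Python sketch of the same idea, with the function name as an assumption.

```python
# Sketch of the message-format conversion: assistant-ui sends content as a
# list of typed parts, while the backend expects a plain string.
def to_backend_message(msg):
    content = msg["content"]
    if isinstance(content, list):
        # Join only the text parts: [{"type": "text", "text": "..."}] -> "..."
        content = "".join(part["text"] for part in content if part.get("type") == "text")
    return {"role": msg["role"], "content": content}

ui_msg = {"role": "user", "content": [{"type": "text", "text": "hello"}]}
print(to_backend_message(ui_msg))  # → {'role': 'user', 'content': 'hello'}
```

Messages whose content is already a string pass through unchanged, which keeps the route backward compatible.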
### Added
- **Assistant-UI Integration**: Complete integration with @assistant-ui/react framework for professional chat interface
- **Data Stream Protocol**: Full implementation of Vercel AI SDK Data Stream Protocol for real-time streaming
- **Custom Tool UIs**: Rich visual components for different tool types:
- Document retrieval UI with relevance scoring and source information
- Web search UI with result links and snippets
- Python code execution UI with stdout/stderr display
- URL fetching UI with page content preview
- Code analysis UI with suggestions and feedback
- **Next.js 15 Frontend**: Modern React 19 + TypeScript + Tailwind CSS v3 web application
- **Responsive Design**: Mobile-friendly interface with dark/light theme support
- **Streaming Visualization**: Real-time display of AI reasoning steps and tool executions
### Enhanced
- **Simplified UI Architecture**: Streamlined web interface with minimal code and default styling
- Removed custom tool UI components in favor of assistant-ui defaults
- Reduced `/web/src/app/page.tsx` to essential AssistantRuntimeProvider and Thread components
- Simplified `/web/src/app/globals.css` to basic reset and assistant-ui imports only
- Minimized `/web/tailwind.config.ts` configuration for cleaner build
- Removed unnecessary dependencies for lighter bundle size
- **Backend Protocol Compliance**: Updated AI SDK adapter to match official Data Stream Protocol specification
- **Event Format**: Standardized to `TYPE_ID:JSON\n` format for all streaming events
- **Tool Call Visualization**: Step-by-step visualization of multi-tool workflows
- **Error Handling**: Comprehensive error states and recovery mechanisms
- **Performance**: Optimized streaming and rendering for smooth user experience
### Technical Implementation
- **Protocol Mapping**: Proper mapping of LangGraph events to Data Stream Protocol types:
- Type 0: Text streaming (tokens)
- Type 9: Tool calls with arguments
- Type a: Tool results
- Type d: Message completion
- Type 3: Error handling
- **Runtime Integration**: `useDataStreamRuntime` for seamless assistant-ui integration
- **API Proxy**: Next.js API route for backend communication with proper headers
- **Component Architecture**: Modular tool UI components with makeAssistantToolUI
### Integration Testing Results ✅
- **Frontend Service**: Successfully deployed on localhost:3000 with Next.js 15 + Turbopack
- **Backend Service**: Healthy and responsive on localhost:8000 (FastAPI + LangGraph)
- **API Proxy**: Correct routing from `/api/chat` to backend AI SDK endpoint with format conversion
- **Message Format**: assistant-ui array format correctly converted to backend string format
- **Streaming Protocol**: Data Stream Protocol events properly formatted and transmitted
- **Tool Execution**: Multi-step tool calls working (retrieve_standard_regulation, etc.)
- **UI Rendering**: assistant-ui components properly rendered with default styling
- **End-to-End Flow**: Complete user query → tool execution → streaming response pipeline verified
- Format conversion: assistant-ui array format → backend string format
- Tool execution validation: retrieve_standard_regulation, retrieve_doc_chunk_standard_regulation
- Real-time streaming with proper Data Stream Protocol compliance
- Content relevance verification: automotive safety standards and testing procedures
### Documentation
- **Protocol Reference**: Enhanced `docs/topics/AI_SDK_UI.md` with implementation details
- **Integration Guide**: Comprehensive setup and testing procedures
- **API Compatibility**: Dual endpoint support for legacy and modern integrations
## v0.1.7
### Changed
- **Simplified Web UI**: Replaced Tailwind CSS with inline styles for simpler, more maintainable code
- **Reduced Dependencies**: Removed complex styling frameworks in favor of vanilla CSS-in-JS approach
- **Cleaner Interface**: Simplified chatbot UI with essential functionality and clean default styling
- **Streamlined Code**: Reduced component complexity by removing unnecessary features like timestamps and session display
### Improved
- **Code Maintainability**: Easier to understand and modify without external CSS framework dependencies
- **Performance**: Lighter bundle size without Tailwind CSS classes
- **Accessibility**: Cleaner DOM structure with semantic HTML and inline styles
### Removed
- **Tailwind CSS Classes**: Replaced complex utility classes with simple inline styles
- **Timestamp Display**: Removed message timestamps for cleaner interface
- **Session ID Display**: Simplified footer by removing session information
- **Complex Animations**: Simplified loading indicators and removed complex animations
### Technical Details
- Maintained all core functionality (streaming, error handling, message management)
- Preserved AI SDK Data Stream Protocol compatibility
- Kept responsive design with percentage-based layouts
- Used standard CSS properties for styling (flexbox, basic colors, borders)
## v0.1.6
### Fixed
- **Web UI Component Error**: Resolved "The default export is not a React Component in '/page'" error caused by empty `page.tsx` file
- **AI SDK v5 Compatibility**: Fixed compatibility issues with Vercel AI SDK v5 API changes by implementing custom streaming solution
- **TypeScript Errors**: Resolved compilation errors related to deprecated `useChat` hook properties in AI SDK v5
- **Frontend Dependencies**: Ensured all required AI SDK dependencies are properly installed and configured
### Changed
- **Custom Streaming Implementation**: Replaced AI SDK v5 `useChat` hook with custom streaming solution for better control and compatibility
- **Direct Protocol Handling**: Implemented direct AI SDK Data Stream Protocol parsing in frontend for real-time message updates
- **Enhanced Error Handling**: Added comprehensive error handling for network issues and streaming failures
- **Message State Management**: Improved message state management with TypeScript interfaces and proper typing
### Technical Implementation
- **Custom Stream Reader**: Implemented `ReadableStream` processing with `TextDecoder` for chunk-by-chunk data handling
- **Protocol Parsing**: Direct parsing of AI SDK protocol lines (`0:`, `9:`, `a:`, `d:`, `2:`) in frontend
- **Real-time Updates**: Optimized message content updates during streaming for smooth user experience
- **Session Management**: Added session ID generation and tracking for conversation context
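The direct protocol parsing described above amounts to splitting each line at the first `:` and decoding the JSON payload. This is a minimal Python sketch of what the frontend stream reader does (the real code is TypeScript); only the type ids mentioned in this changelog are mapped, and payload handling is simplified.

```python
# Minimal parser for AI SDK Data Stream Protocol lines ("TYPE_ID:JSON\n").
import json

EVENT_TYPES = {"0": "text", "9": "tool_call", "a": "tool_result",
               "d": "finish", "2": "data", "3": "error"}

def parse_stream_lines(raw):
    events = []
    for line in raw.splitlines():
        if not line or ":" not in line:
            continue                              # skip blanks and malformed lines
        type_id, payload = line.split(":", 1)     # split at the FIRST colon only
        kind = EVENT_TYPES.get(type_id)
        if kind:
            events.append((kind, json.loads(payload)))
    return events

chunk = '0:"Hel"\n0:"lo"\nd:{"finishReason":"stop"}\n'
for kind, payload in parse_stream_lines(chunk):
    print(kind, payload)
```

Note that type-0 payloads are JSON strings, so streamed tokens arrive already quoted and are decoded like any other payload.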
### Validated
- ✅ Frontend compiles without TypeScript errors
- ✅ Chat interface loads successfully at http://localhost:3000
- ✅ Custom streaming implementation works with backend AI SDK endpoint
- ✅ Real-time message updates during streaming responses
- ✅ Error handling for failed requests and network issues
## v0.1.5
### Added
- **Web UI Chatbot**: Created comprehensive Next.js chatbot interface using Vercel AI SDK Elements in `/web` directory
- **AI SDK Protocol Adapter**: Implemented `service/ai_sdk_adapter.py` to convert internal SSE events to Vercel AI SDK Data Stream Protocol
- **AI SDK Compatible Endpoint**: Added new `/api/ai-sdk/chat` endpoint for frontend integration while maintaining backward compatibility
- **Frontend API Proxy**: Created Next.js API route `/api/chat/route.ts` to proxy requests between frontend and backend
- **Streaming UI Components**: Integrated real-time streaming display for tool calls, intermediate steps, and final answers
- **End-to-End Testing**: Added `test_ai_sdk_endpoint.py` for backend AI SDK endpoint validation
### Changed
- **Protocol Implementation**: Fully migrated to Vercel AI SDK Data Stream Protocol (SSE) for client-service communication
- **Event Type Mapping**: Enhanced event handling to support AI SDK protocol types (`9:`, `a:`, `0:`, `d:`, `2:`)
- **Multi-line SSE Processing**: Improved adapter to correctly handle multi-line SSE events from internal system
- **Frontend Architecture**: Established modern React-based chat interface with TypeScript and Tailwind CSS
### Technical Implementation
- **Frontend Stack**: Next.js 15.4.7, Vercel AI SDK (`ai`, `@ai-sdk/react`, `@ai-sdk/ui-utils`), TypeScript, Tailwind CSS
- **Backend Adapter**: Protocol conversion layer between internal LangGraph events and AI SDK format
- **Streaming Pipeline**: End-to-end streaming from LangGraph → Internal SSE → AI SDK Protocol → Frontend UI
- **Tool Call Visualization**: Real-time display of multi-step agent workflow including retrieval and generation phases
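The emit side of the adapter can be sketched as a per-event formatter producing `TYPE_ID:JSON\n` lines. This is an illustrative stand-in for `service/ai_sdk_adapter.py`, not its actual code; the internal event shape and field names here are assumptions.

```python
# Sketch of the protocol-conversion direction: internal event -> AI SDK line.
# The internal event schema shown here is illustrative.
import json

def to_ai_sdk_line(event):
    kind = event["type"]
    if kind == "token":
        return "0:" + json.dumps(event["delta"]) + "\n"
    if kind == "tool_call":
        return "9:" + json.dumps({"toolCallId": event["id"],
                                  "toolName": event["name"],
                                  "args": event["args"]}) + "\n"
    if kind == "tool_result":
        return "a:" + json.dumps({"toolCallId": event["id"],
                                  "result": event["result"]}) + "\n"
    if kind == "done":
        return 'd:{"finishReason":"stop"}\n'
    raise ValueError(f"unmapped internal event type: {kind}")

print(to_ai_sdk_line({"type": "token", "delta": "Hi"}), end="")  # → 0:"Hi"
```

Using `json.dumps` for every payload (including plain token strings) is what keeps the output parseable by the generic line parser on the frontend.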
### Validated
- ✅ Backend AI SDK endpoint streaming compatibility
- ✅ Frontend-backend protocol integration
- ✅ Tool call event mapping and display
- ✅ Multi-line SSE event parsing
- ✅ End-to-end chat workflow functionality
- ✅ Service deployed and accessible at http://localhost:3001
### Documentation
- **Protocol Reference**: Enhanced `docs/topics/AI_SDK_UI.md` with implementation details
- **Integration Guide**: Comprehensive setup and testing procedures
- **API Compatibility**: Dual endpoint support for legacy and modern integrations
## v0.1.4
### Fixed
- **Streaming Token Display**: Fixed streaming test script to correctly read token content from `delta` field
- **Event Parsing**: Resolved issue where streaming logs showed empty answer tokens due to incorrect field access
- **Stream Validation**: Verified streaming API returns proper token content and LLM responses
### Added
- **Debug Script**: Added `debug_llm_stream.py` to inspect streaming chunk structure and validate token flow
- **Stream Testing**: Enhanced streaming test with proper token parsing and validation
### Changed
- **Test Script Enhancement**: Updated `scripts/test_real_streaming.py` to display actual streamed tokens correctly
- **Event Processing**: Improved streaming event parsing and display logic for better debugging
## v0.1.3
### Added
- **Jinja2 Template Support**: Added comprehensive Jinja2 template rendering for LLM prompts
- **Template Utilities**: Created `service/utils/templates.py` for robust template processing
- **Template Validation**: Added test script `test_templates.py` to verify template rendering
- **Enhanced VS Code Debug Support**: Complete debugging configuration for development workflow
### Changed
- **Template Engine Migration**: Replaced Python `.format()` with Jinja2 template rendering
- **Variable Substitution**: Fixed template variable replacement in user and system prompts
- **Template Variables**: Added support for `output_language`, `user_query`, `conversation_history`, and `reference_document_chunks`
- **Error Handling**: Improved template rendering error handling and logging
### Fixed
- **Variable Substitution Bug**: Fixed issue where `{{variable}}` syntax was not being replaced in prompts
- **Template Context**: Ensured all required variables are properly passed to template renderer
- **Language Support**: Added configurable output language support (default: zh-CN)
### Technical Details
- Added `jinja2>=3.1.0` dependency to pyproject.toml
- Updated `service/graph/graph.py` to use Jinja2 template rendering
- Template variables now support complex data structures and safe rendering
- All template variables are properly escaped and validated
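A minimal example of the Jinja2-based prompt rendering, using the template variables listed above. The template text and variable values are illustrative, not the project's actual prompts; `StrictUndefined` is shown because it makes a missing variable raise instead of rendering silently as empty, which matches the validation goal.

```python
# Minimal Jinja2 prompt rendering with strict undefined-variable checking.
from jinja2 import Environment, StrictUndefined

env = Environment(undefined=StrictUndefined)
template = env.from_string(
    "Answer in {{ output_language }}.\n"
    "Question: {{ user_query }}\n"
    "References:\n{% for chunk in reference_document_chunks %}- {{ chunk }}\n{% endfor %}"
)
prompt = template.render(output_language="zh-CN",
                         user_query="What are the crash-test requirements?",
                         reference_document_chunks=["chunk A", "chunk B"])
print(prompt)
```

This is also why the earlier `{{variable}}` bug could not survive the migration: Python's `.format()` treats `{{` as a literal brace, whereas Jinja2 treats it as a substitution.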
## v0.1.2
### Fixed
- Fixed configuration access pattern: refactored `config.prompts.rag` to use `config.get_rag_prompts()` method
- Fixed Azure OpenAI endpoint configuration: corrected `base_url` to use root endpoint without API path
- Fixed Azure OpenAI API version mismatch: updated `api_version` from "2024-02-01" to "2024-02-15-preview"
- Fixed streaming API error handling to properly propagate HTTP errors without silent failures
### Changed
- Improved error handling in streaming responses to surface external service errors
- Enhanced service stability by ensuring config/code consistency
### Validated
- Streaming API end-to-end functionality with tool execution and answer generation
- Azure OpenAI integration with correct endpoint configuration
- Error propagation and robust exception handling in streaming workflow
## v0.1.1
### Added
- Added service startup and stop scripts (`start_service.sh`, `stop_service.sh`)
- Added comprehensive service setup documentation (`SERVICE_SETUP.md`)
- Added support for environment variable substitution with default values (`${VAR:-default}`)
- Added LLM configuration structure in config.yaml for better organization
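The `${VAR:-default}` substitution can be implemented with a single regex over the raw config text. The exact implementation in `service/config.py` may differ; this stdlib sketch shows the shell-style semantics being emulated, including leaving unresolved `${VAR}` references untouched.

```python
# Stdlib sketch of shell-style "${VAR:-default}" substitution for config text.
import os
import re

_PATTERN = re.compile(r"\$\{(\w+)(?::-([^}]*))?\}")

def substitute_env(text, environ=os.environ):
    def repl(match):
        name, default = match.group(1), match.group(2)
        value = environ.get(name)
        if value is not None:
            return value            # environment wins over the default
        if default is not None:
            return default          # ":-default" fallback
        return match.group(0)       # no default: leave "${VAR}" untouched
    return _PATTERN.sub(repl, text)

print(substitute_env("host: ${API_HOST:-localhost}", environ={}))  # → host: localhost
```

Passing `environ` explicitly makes the function testable without mutating the process environment.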
### Changed
- Updated `docs/config.yaml` based on `.coding/config.yaml` configuration
- Moved `config.yaml` to root directory for easier access
- Restructured configuration to support `llm.rag` section for prompts and parameters
- Improved `service/config.py` to handle new configuration structure
- Enhanced environment variable substitution logic
### Fixed
- Fixed SSE event parsing logic in integration test script to correctly associate `event:` and `data:` lines
- Improved streaming event validation for tool execution, error handling, and answer generation
- Fixed configuration loading to work with root directory placement
- Fixed port mismatch in integration test script to connect to correct service port
- Fixed prompt access issue: changed from `config.prompts.rag` to `config.get_rag_prompts()` method
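The SSE parsing fix above comes down to standard SSE framing: each `event:` line names the event, the following `data:` line(s) carry its payload, and a blank line terminates the record. A minimal sketch of that association, as the test script might implement it:

```python
# Associate "event:" lines with their following "data:" lines using
# blank-line-terminated SSE framing.
def parse_sse(raw):
    events, event_name, data_lines = [], None, []
    for line in raw.splitlines() + [""]:        # trailing "" flushes the last event
        if line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "" and (event_name or data_lines):
            events.append({"event": event_name or "message",
                           "data": "\n".join(data_lines)})
            event_name, data_lines = None, []
    return events

raw = 'event: tool_result\ndata: {"ok": true}\n\nevent: done\ndata: {}\n'
print(parse_sse(raw))
```

Pairing by framing rather than by fixed line offsets is what makes the parser robust to multi-line `data:` payloads.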
### Tests
- Added comprehensive integration tests for streaming functionality
- Added robust error handling for missing OpenAI API key scenarios
- Added event streaming validation for tool results, errors, and completion events
- Added configurable port/host support in test scripts for flexible service connection
## Previous Changes
- Initial implementation of Agentic RAG system
- FastAPI-based streaming endpoints
- LangGraph-inspired workflow orchestration
- Retrieval tool integration
- Memory management with TTL
- Web client with EventSource streaming
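The TTL-based memory mentioned above can be sketched as a session store whose entries expire after a fixed lifetime and are pruned lazily on access. This is a toy illustration with an assumed interface, not the project's actual memory module; the injectable `clock` exists only to make expiry testable.

```python
# Toy TTL session memory: entries expire after `ttl_seconds` and are
# pruned lazily when read.
import time

class TTLMemory:
    def __init__(self, ttl_seconds=3600.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock                      # injectable for testing
        self._store = {}                        # session_id -> (expiry, history)

    def append(self, session_id, message):
        history = self.get(session_id)          # prunes the entry if expired
        history.append(message)
        self._store[session_id] = (self.clock() + self.ttl, history)

    def get(self, session_id):
        entry = self._store.get(session_id)
        if entry is None or entry[0] < self.clock():
            self._store.pop(session_id, None)   # expired: drop it
            return []
        return entry[1]

now = [0.0]
mem = TTLMemory(ttl_seconds=10, clock=lambda: now[0])
mem.append("s1", "hello")
now[0] = 5.0
print(mem.get("s1"))   # → ['hello']
now[0] = 20.0
print(mem.get("s1"))   # → []
```

Appending refreshes the expiry, so an active conversation stays alive while idle sessions age out.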