catonline_ai/vw-agentic-rag/docs/CHANGELOG.md
2025-09-26 17:15:54 +08:00
# Changelog
## v1.2.8 - Enhanced Agentic Workflow and Citation Management Documentation - Thu Sep 12 2025
### 📋 **Documentation** *(Design Document Enhancement)*
**Enhanced the system design documentation with detailed coverage of Agentic Workflow features and advanced citation management capabilities.**
#### Changes Made:
**1. Agentic Workflow Features Enhancement**:
- **Enhanced**: Agentic Workflow Features Demonstrated section with comprehensive query rewriting/decomposition coverage
- **Added**: Detailed "Query Rewriting/Decomposition in Agentic Workflow" section highlighting core intelligence features
- **Added**: "Citation Management in Agentic Workflow" section documenting advanced citation capabilities
- **Updated**: Workflow diagrams to explicitly show query rewriting and citation processing flows
**2. Citation Management Documentation**:
- **Enhanced**: Citation tracking and management documentation with controllable citation lists and links
- **Added**: Detailed citation processing workflow with real-time capture and quality validation
- **Updated**: Tool system architecture to show query processing pipeline integration
- **Added**: Multi-round citation coherence and cross-tool citation integration documentation
**3. Technical Architecture Updates**:
- **Updated**: Sequence diagrams to show query rewriter components and parallel execution
- **Enhanced**: Tool system architecture with query processing strategies
- **Added**: Domain-specific intelligence documentation for different query types
- **Updated**: Cross-agent learning documentation with advanced agentic intelligence features
**4. Design Principles Refinement**:
- **Updated**: Core feature list to highlight controllable citation management
- **Enhanced**: Query processing integration documentation
- **Added**: Strategic citation assignment and post-processing enhancement details
- **Updated**: System benefits documentation to reflect enhanced capabilities
---
## v1.2.7 - Comprehensive System Design Documentation - Tue Sep 10 2025
### 📋 **Documentation** *(System Architecture & Design Documentation)*
**Created comprehensive system design documentation with detailed architectural diagrams and design explanations.**
#### Changes Made:
**1. System Design Document Creation**:
- **Created**: `docs/design.md` - Complete architectural design documentation
- **Architecture Diagrams**: 15+ mermaid diagrams covering all system aspects
- **Design Explanations**: Detailed design principles and implementation rationale
- **Comprehensive Coverage**: All system layers from frontend to infrastructure
**2. Architecture Documentation**:
- **High-Level Architecture**: Multi-layer system overview with component relationships
- **Component Architecture**: Detailed breakdown of frontend, backend, and agent components
- **Workflow Design**: Multi-intent agent workflows and two-phase retrieval strategy
- **Data Flow Architecture**: Request-response flows and streaming data patterns
**3. Feature & System Documentation**:
- **Feature Architecture**: Core capabilities and tool system design
- **Memory Management**: PostgreSQL-based session persistence architecture
- **Configuration Architecture**: Layered configuration management approach
- **Security Architecture**: Multi-layered security implementation
**4. Deployment & Performance Documentation**:
- **Deployment Architecture**: Production deployment patterns and container architecture
- **Performance Architecture**: Optimization strategies across all system layers
- **Technology Stack**: Complete technology selection rationale and integration
- **Future Enhancements**: Roadmap and enhancement strategy
#### Documentation Features:
**Visual Architecture**:
- **15+ Mermaid Diagrams**: Comprehensive visual representation of system architecture
- **Component Relationships**: Clear visualization of component interactions
- **Data Flow Patterns**: Detailed request-response and streaming flow diagrams
- **Deployment Topology**: Production deployment and scaling architecture
**Design Explanations**:
- **Design Philosophy**: Core principles driving architectural decisions
- **Implementation Rationale**: Detailed explanation of design choices
- **Best Practices**: Production-ready patterns and recommendations
- **Performance Considerations**: Optimization strategies and trade-offs
**Comprehensive Coverage**:
- **Frontend Architecture**: Next.js, React, and assistant-ui integration
- **Backend Architecture**: FastAPI, LangGraph, and agent orchestration
- **Data Architecture**: PostgreSQL memory, Azure AI Search, and LLM integration
- **Infrastructure Architecture**: Cloud deployment, security, and monitoring
#### Technical Documentation:
**System Layers Documented**:
```
- Frontend Layer: Next.js Web UI, Thread Components, Tool UIs
- API Gateway Layer: Next.js API Routes, Data Stream Protocol
- Backend Service Layer: FastAPI Server, AI SDK Adapter, SSE Controller
- Agent Orchestration Layer: LangGraph Workflow, Intent Recognition, Agents
- Memory Layer: PostgreSQL Session Store, Checkpointer, Memory Manager
- Retrieval Layer: Azure AI Search, Embedding Service, Search Indices
- LLM Layer: LLM Provider, Configuration Management
```
**Key Architectural Patterns**:
- **Multi-Intent Agent System**: Intent recognition and specialized agent routing
- **Two-Phase Retrieval**: Metadata discovery followed by content retrieval
- **Streaming Architecture**: Real-time SSE with tool progress tracking
- **Session Memory**: PostgreSQL-based persistent conversation history
- **Tool System**: Modular, composable retrieval and analysis tools
#### Benefits:
**For Development Team**:
- **Clear Architecture Understanding**: Complete system overview for new team members
- **Design Rationale**: Understanding of architectural decisions and trade-offs
- **Implementation Guidance**: Best practices and patterns for future development
- **Maintenance Support**: Clear documentation for troubleshooting and updates
**For System Architecture**:
- **Documentation Standards**: Establishes pattern for future architectural documentation
- **Design Consistency**: Ensures architectural decisions align with documented principles
- **Knowledge Preservation**: Captures institutional knowledge about system design
- **Future Planning**: Provides foundation for system evolution and enhancement
**For Operations**:
- **Deployment Understanding**: Clear view of production architecture and dependencies
- **Troubleshooting Guide**: Architectural context for debugging and issue resolution
- **Scaling Guidance**: Understanding of system scaling patterns and limitations
- **Security Overview**: Complete security architecture and implementation details
#### File Structure:
```
docs/
├── design.md # Comprehensive system design document (NEW)
├── CHANGELOG.md # This changelog with design documentation entry
├── deployment.md # Deployment-specific guidance
├── development.md # Development setup and guidelines
└── testing.md # Testing strategies and procedures
```
#### Next Steps:
- **Living Documentation**: Keep design document updated with system changes
- **Architecture Reviews**: Use document as reference for architectural decisions
- **Onboarding**: Include design document in new developer onboarding process
- **Documentation Standards**: Apply similar documentation patterns to other system aspects
---
## v1.2.6 - GPT-5 Model Integration and Prompt Template Refinement - Mon Sep 9 2025
### 🚀 **Major Update** *(Model Integration & Enhanced Agent Capabilities)*
**Integrated GPT-5 Chat model with refined prompt templates for improved reasoning and tool coordination.**
#### Changes Made:
**1. GPT-5 Model Integration**:
- **Model Upgrade**: Switched from GPT-4o to `gpt-5-chat` deployment
- **Azure Endpoint**: Updated to `aihubeus21512504059.cognitiveservices.azure.com`
- **API Version**: Upgraded to `2024-12-01-preview` for latest capabilities
- **Enhanced Reasoning**: Leveraging GPT-5's improved reasoning for complex multi-step retrieval
**2. Prompt Template Optimization for GPT-5**:
- **Tool Coordination**: Enhanced instructions for better parallel tool execution
- **Context Management**: Optimized for GPT-5's extended context handling capabilities
- **Reasoning Chain**: Improved workflow instructions leveraging advanced reasoning abilities
**3. Agent System Refinements**:
- **Phase Detection**: Better triggering conditions for Phase 2 document content retrieval
- **Query Rewriting**: Enhanced sub-query generation strategies optimized for GPT-5
- **Citation Accuracy**: Improved metadata tracking and source verification
#### Technical Implementation:
**Updated [`config.yaml`](config.yaml)**:
```yaml
azure:
  base_url: https://aihubeus21512504059.cognitiveservices.azure.com/
  api_key: "<redacted>"
  api_version: 2024-12-01-preview
  deployment: gpt-5-chat
```
**Enhanced [`llm_prompt.yaml`](llm_prompt.yaml)** - Phase 2 Triggers:
```yaml
# Phase 2: Document Content Detailed Retrieval
- **When to execute**: execute Phase 2 if the user asks about:
  - "How to..." / "如何..." (procedures, methods, steps)
  - Testing methods / 测试方法
  - Requirements / 要求
  - Technical details / 技术细节
  - Implementation guidance / 实施指导
  - Specific content within standards/regulations
```
**Tool Coordination Instructions**:
```yaml
# Parallel Retrieval Tool Call:
- Use each rewritten sub-query to call retrieval tools **in parallel**
- This maximizes coverage and ensures comprehensive information gathering
```
#### Key Features:
**GPT-5 Enhanced Capabilities**:
- **Advanced Reasoning**: Better understanding of complex technical queries
- **Improved Tool Coordination**: More efficient parallel tool execution planning
- **Enhanced Context Synthesis**: Better integration of multi-source information
- **Precise Citation Generation**: More accurate source tracking and reference mapping
**Optimized Retrieval Strategy**:
- **Smart Phase Detection**: GPT-5 better determines when detailed content retrieval is needed
- **Context-Aware Queries**: More sophisticated query rewriting based on conversation context
- **Cross-Reference Validation**: Enhanced ability to verify information across multiple sources
**Enhanced User Experience**:
- **Faster Response**: More efficient tool coordination reduces overall response time
- **Higher Accuracy**: Improved reasoning leads to more precise answers
- **Better Coverage**: Enhanced query strategies maximize information discovery
#### Performance Improvements:
- **Tool Efficiency**: Better parallel execution planning reduces redundant calls
- **Context Utilization**: Enhanced ability to maintain context across tool rounds
- **Quality Assurance**: Improved verification and synthesis of retrieved information
#### Migration Notes:
- **Seamless Upgrade**: No breaking changes to existing API or user interfaces
- **Backward Compatibility**: Existing conversation histories remain compatible
- **Enhanced Responses**: Users will notice improved response quality and accuracy
- **Tool Round Optimization**: GPT-5's reasoning works optimally with configured tool round limits
---
## v1.2.5 - Enhanced Multi-Phase Retrieval and Tool Round Optimization - Thu Sep 5 2025
### 🔧 **Enhancement** *(Agent System Prompt & Retrieval Strategy)*
**Optimized retrieval workflow with explicit parallel tool calling strategy and enhanced multi-language query coverage.**
#### Changes Made:
**1. Enhanced Multi-Phase Retrieval Strategy**:
- **Phase 1 - Metadata Discovery**: Added explicit "2-3 parallel rewritten queries" strategy for standards/regulations metadata discovery
- **Phase 2 - Document Content**: Refined detailed retrieval with "2-3 parallel rewritten queries with different content focus"
- **Cross-Language Coverage**: Mandatory inclusion of both Chinese and English query variants for comprehensive search coverage
**2. Parallel Tool Calling Optimization**:
- **Query Strategy Specification**: Clear guidance on generating 2-3 distinct parallel sub-queries per retrieval phase
- **Azure AI Search Optimization**: Enhanced for Hybrid Search (keyword + vector search) with specific terminology and synonyms
- **Tool Calling Efficiency**: Explicit instruction to execute rewritten sub-queries in parallel for maximum coverage
**3. Intent Classification Improvements**:
- **Standard_Regulation_RAG**: Enhanced examples covering content, scope, testing methods, and technical details
- **User_Manual_RAG**: Comprehensive coverage of CATOnline system usage, TRRC processes, and administrative functions
- **Clearer Boundaries**: Better distinction between technical content queries vs system usage queries
**4. User Manual Prompt Refinement**:
- **Evidence-Based Only**: Strengthened directive for 100% grounded responses from user manual content
- **Visual Integration**: Enhanced screenshot embedding requirements with strict formatting templates
- **Context Disambiguation**: Added role-based function differentiation (User vs Administrator)
#### Technical Implementation:
**Updated [`llm_prompt.yaml`](llm_prompt.yaml)** - Agent System Prompt:
```yaml
# Query Optimization & Parallel Retrieval Tool Calling
* Sub-queries Rewriting:
  - Generate 2-3 (mostly 2) distinct rewritten sub-queries
  - If the user's query is in Chinese, include 1 rewritten sub-query in English
  - If the user's query is in English, include 1 rewritten sub-query in Chinese
* Parallel Retrieval Tool Call:
  - Use each rewritten sub-query to call retrieval tools **in parallel**
  - This maximizes coverage and ensures comprehensive information gathering
```
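The parallel-call directive above can be sketched with `asyncio.gather`; the `retrieve` coroutine here is a hypothetical stand-in for the real retrieval tools in `service/graph/tools.py`:

```python
import asyncio

async def retrieve(sub_query: str) -> list[dict]:
    # Stand-in for a retrieval tool call (hypothetical signature);
    # the real system would hit Azure AI Search here.
    return [{"content": f"result for: {sub_query}", "@order_num": 1}]

async def parallel_retrieval(sub_queries: list[str]) -> list[dict]:
    # Fire all rewritten sub-queries concurrently, then flatten the result lists
    # in sub-query order (asyncio.gather preserves ordering).
    result_lists = await asyncio.gather(*(retrieve(q) for q in sub_queries))
    return [hit for hit_list in result_lists for hit in hit_list]

# e.g. one Chinese and one English variant of the same question
hits = asyncio.run(parallel_retrieval(["电动汽车安全测试方法", "EV safety test methods"]))
```

Running the sub-queries concurrently rather than sequentially is what keeps the extra cross-language coverage from adding latency.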
**Enhanced Intent Classification**:
```yaml
# Standard_Regulation_RAG Examples:
- "What regulations relate to intelligent driving?"
- "How do you test the safety of electric vehicles?"
- "What are the main points of GB/T 34567-2023?"
# User_Manual_RAG Examples:
- What is CATOnline (the system)/TRRC/TRRC processes
- How to search for standards, regulations, TRRC news and deliverables
- User management, system configuration, administrative functionalities
```
**User Manual Prompt Template**:
```yaml
Step Template:
  Step N: <Action / Instruction from manual>
  (Optional short clarification from manual)
  ![Screenshot: <concise caption>](<image_url_or_placeholder>)
  Notes: <business rules / warnings from manual>
```
#### Key Features:
**Multi-Phase Retrieval Workflow**:
- **Round 1**: Parallel metadata discovery with 2-3 optimized queries
- **Round 2**: Focused document content retrieval based on Round 1 insights
- **Round 3+**: Additional targeted retrieval for remaining gaps
**Cross-Language Query Strategy**:
- **Automatic Translation**: Chinese queries include English variants, English queries include Chinese variants
- **Terminology Optimization**: Technical terms, acronyms, and domain-specific language inclusion
- **Azure AI Search Enhancement**: Optimized for hybrid keyword + vector search capabilities
**Enhanced Citation System**:
- **Metadata Tracking**: Precise @tool_call_id and @order_num mapping
- **CSV Format**: Structured citations mapping in HTML comments
- **Source Verification**: Cross-referencing across multiple retrieval results
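The changelog does not show the exact comment layout, but the `@tool_call_id`/`@order_num` CSV mapping embedded in an HTML comment might look roughly like this (the column names and helper function are illustrative, not the actual implementation):

```python
def build_citation_comment(citations: list[dict]) -> str:
    # Render the citation map as CSV rows inside an HTML comment, so it is
    # machine-readable but invisible in rendered markdown (illustrative format).
    rows = ["citation_id,tool_call_id,order_num"]
    for i, c in enumerate(citations, start=1):
        rows.append(f"{i},{c['tool_call_id']},{c['order_num']}")
    return "<!--\n" + "\n".join(rows) + "\n-->"

comment = build_citation_comment([
    {"tool_call_id": "call_abc", "order_num": 1},
    {"tool_call_id": "call_def", "order_num": 3},
])
```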
#### Benefits:
- **Coverage**: Parallel queries with cross-language variants maximize information discovery
- **Efficiency**: Strategic tool calling reduces unnecessary rounds while ensuring thoroughness
- **Accuracy**: Enhanced intent classification improves routing to appropriate RAG systems
- **User Experience**: Better visual integration in user manual responses with mandatory screenshots
- **Consistency**: Standardized formatting templates across all response types
#### Migration Notes:
- Enhanced prompt templates automatically improve response quality
- No breaking changes to existing API or user interfaces
- Cross-language query strategy improves search coverage for multilingual content
- Tool round limits (max_tool_rounds: 4, max_tool_rounds_user_manual: 2) work optimally with new parallel strategy
---
## v1.2.4 - Intent Classification Reference Consolidation - Wed Sep 4 2025
### 🔧 **Enhancement** *(Intent Classification Documentation)*
**Consolidated and enhanced UserManual intent classification examples by merging reference files.**
#### Changes Made:
- **Reference File Consolidation**: Merged UserManual examples from `intent-ref-1.txt` into `intent-ref-2.txt`
- **Enhanced Coverage**: Added more comprehensive use cases for UserManual intent classification
- **Improved Clarity**: Better organized examples to help with accurate intent recognition
#### Technical Implementation:
**Updated `.vibe/ref/intent-ref-2.txt`**:
- **Added from intent-ref-1.txt**:
- What is CATOnline (the system), TRRC, TRRC processes
- How to search for standards, regulations, TRRC news and deliverables in the system
- How to create and update standards, regulations and their documents
- How to download or export data
- How to do administrative functionalities
- Other questions about this (CatOnline) system's functions, or user guide
- **Preserved existing examples**:
- Questions directly about CatOnline functions or features
- TRRC-related processes/standards/regulations as implemented in CatOnline
- How to manage/search/download documents in the system
- User management or system configuration within CatOnline
- Use of admin features or data export in CatOnline
#### Categories Covered:
1. **System Introduction**: CATOnline system, TRRC concepts
2. **Search Functions**: Standards, regulations, TRRC news and deliverables search
3. **Document Management**: Create, update, manage, download documents
4. **System Configuration**: User management, system settings
5. **Administrative Functions**: Admin features, data export
6. **General Help**: System functions, user guides
#### Benefits:
- **Accuracy**: More comprehensive examples improve intent classification precision
- **Coverage**: Better coverage of UserManual use cases
- **Consistency**: Unified reference documentation for intent classification
- **Maintainability**: Single consolidated reference file easier to maintain
## v1.2.3 - User Manual Screenshot Format Clarification - Tue Sep 3 2025
### 🔧 **Enhancement** *(User Manual Prompt Refinement)*
**Added explicit clarification about UI screenshot embedding format in user manual responses.**
#### Changes Made:
- **Screenshot Format Guidance**: Added specific instruction about how UI screenshots should be embedded
- **Format Specification**: Clarified that operational UI screenshots are typically embedded in explanatory text using markdown image format
#### Technical Implementation:
**Updated `llm_prompt.yaml` - User Manual Prompt**:
```yaml
- **Visuals First**: ALWAYS include screenshots for explaining features or procedures. Every instructional step must be immediately followed by its screenshot on a new line.
- **Screenshot Format**: 操作步骤的相关UI截图通常会以markdown图片格式嵌入到说明文字中  # (UI screenshots for operational steps are typically embedded in the explanatory text as markdown images)
```
#### Benefits:
- **Clarity**: AI assistant now has explicit guidance on screenshot embedding format
- **Consistency**: Ensures uniform approach to including UI screenshots in responses
- **User Experience**: Improves the formatting and presentation of instructional content
## v1.2.2 - Prompt Enhancement for Knowledge Boundary Control - Tue Sep 3 2025
### 🔧 **Enhancement** *(LLM Prompt Optimization)*
**Enhanced LLM prompts to strictly prevent model from outputting general knowledge when retrieval yields insufficient results.**
#### Problem Addressed:
- AI assistant was outputting model's built-in general knowledge about topics when specific information wasn't found in retrieval
- Users received generic information about systems/concepts instead of clear "information not available" responses
- Example: When asked about "CATOnline system", AI would provide general CAT (Computer-Assisted Testing) information from its training data
#### Solution Implemented:
- **Enhanced Agent System Prompt**: Added explicit "NO GENERAL KNOWLEDGE" directive
- **Enhanced User Manual Prompt**: Added similar strict knowledge boundary controls
- **Improved Fallback Messages**: Standardized response template for insufficient information scenarios
- **Multiple Reinforcement**: Added the restriction in multiple sections for emphasis
#### Technical Changes:
**Enhanced `llm_prompt.yaml`**:
- Added **"Critical: NO GENERAL KNOWLEDGE"** instruction in agent system prompt
- Enhanced fallback response template: "The system does not contain specific information about [specific topic/feature searched for]."
- Added similar controls in user manual prompt with template: "The user manual does not contain specific information about [specific topic/feature you searched for]."
- Reinforced the restriction in multiple workflow sections
#### Key Prompt Updates:
**Agent System Prompt**:
```yaml
* **Critical: NO GENERAL KNOWLEDGE**: If retrieval yields insufficient or no relevant results, **do not provide any general knowledge or assumptions**. Instead, clearly state "The system does not contain specific information about [specific topic/feature searched for]." and suggest how the user might reformulate their query.
```
**User Manual Prompt**:
```yaml
- **NO GENERAL KNOWLEDGE**: When retrieved content is insufficient, do NOT provide any general knowledge about systems, software, or common practices. State clearly: "The user manual does not contain specific information about [specific topic/feature you searched for]."
```
#### Benefits:
- **Accuracy**: Eliminates confusion from generic information
- **Transparency**: Users clearly understand when information is not available in the system
- **Trust**: Builds user confidence in system's knowledge boundaries
- **Guidance**: Provides clear direction for reformulating queries
#### Testing:
- Verified all prompt sections contain the new "NO GENERAL KNOWLEDGE" instructions
- Confirmed fallback message templates are properly implemented
- Tested that both agent and user manual prompts include the restrictions
## v1.2.1 - Retrieval Module Refactoring and Optimization - Mon Sep 2 2025
### 🔧 **Refactoring** *(Retrieval Module Structure Optimization)*
**Refactored retrieval module structure and optimized normalize_search_result function for better maintainability and performance.**
#### Key Changes:
- **File Renaming**: `service/retrieval/agentic_retrieval.py` → `service/retrieval/retrieval.py` for clearer naming
- **Function Optimization**: Simplified `normalize_search_result` by removing unnecessary `include_content` parameter
- **Logic Consolidation**: Moved result normalization to `search_azure_ai` method to eliminate redundancy
- **Import Updates**: Updated all references across the codebase to use the new module name
#### Technical Implementation:
- **Simplified normalize_search_result**:
- Removed `include_content` parameter (content is now always preserved)
- Function now focuses solely on cleaning search results and removing empty fields
- Eliminates the need for conditional content handling
- **Optimized Result Processing**:
- `normalize_search_result` is now called directly in `search_azure_ai` method
- Removed duplicate field removal logic between `search_azure_ai` and `normalize_search_result`
- Cleaner separation of concerns
- **Updated File References**:
- `service/graph/tools.py`
- `service/graph/user_manual_tools.py`
- `tests/unit/test_retrieval.py`
- `tests/unit/test_user_manual_tool.py`
- `tests/conftest.py`
- `scripts/debug_user_manual_retrieval.py`
- `scripts/final_verification.py`
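The simplified function might look like this sketch (field names are taken from this changelog; the real implementation lives in `service/retrieval/retrieval.py` and may differ):

```python
def normalize_search_result(hit: dict) -> dict:
    # Drop Azure Search internals and empty fields; content is always preserved,
    # so no include_content flag is needed anymore.
    internal = {"@search.score", "@search.rerankerScore", "@search.captions"}
    return {
        k: v for k, v in hit.items()
        if k not in internal and v not in (None, "", [], {})
    }

clean = normalize_search_result({
    "content": "Clause 5.2 test procedure ...",
    "title": "",                 # empty -> removed
    "@search.score": 0.83,       # internal -> removed
    "@order_num": 1,             # kept: ranking order survives normalization
})
```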
#### Benefits:
- **Cleaner Code**: Eliminated redundant logic and simplified function signatures
- **Better Performance**: Single point of result normalization reduces processing overhead
- **Improved Maintainability**: Clearer module naming and consolidated logic
- **Consistent Behavior**: Content is always preserved, eliminating conditional handling complexity
#### Testing:
- Updated all test cases to match new function signatures
- Verified that all retrieval functionality works correctly
- Confirmed that result normalization properly removes unwanted fields while preserving content
## v1.2.0 - Azure AI Search Direct Integration - Mon Sep 2 2025
### ⚡ **Major Enhancement** *(Direct Azure AI Search Integration)*
**Replaced intermediate retrieval service with direct Azure AI Search REST API calls for improved performance and better control.**
#### Key Changes:
- **Direct Azure AI Search Integration**: Eliminated dependency on intermediate retrieval service, now calling Azure AI Search REST API directly
- **Hybrid Search with Semantic Ranking**: Implemented proper hybrid search combining text search + vector search with semantic ranking
- **Enhanced Result Processing**: Added automatic filtering by `@search.rerankerScore` threshold and `@order_num` field injection
- **Improved Configuration**: Extended config structure to support embedding service, API versions, and semantic configuration
#### Technical Implementation:
- **New Config Structure**: Added `EmbeddingConfig`, `IndexConfig` to support embedding generation and Azure Search parameters
- **Vector Query Support**: Implemented proper vector queries with field-specific targeting:
- `retrieve_standard_regulation`: `full_metadata_vector`
- `retrieve_doc_chunk_standard_regulation`: `contentVector,full_metadata_vector`
- `retrieve_doc_chunk_user_manual`: `contentVector`
- **Result Filtering**: Automatic removal of Azure Search metadata fields (`@search.score`, `@search.rerankerScore`, `@search.captions`)
- **Order Numbering**: Added `@order_num` field to track result ranking order
- **Score Threshold Filtering**: Filter results by reranker score threshold for quality control
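As a hedged sketch of the direct REST call plus the post-processing described above: the URL path, `vectorQueries` body, and `semantic` query type follow the public Azure AI Search REST API, while the function names and the 1.5 threshold are placeholders, not values from this codebase:

```python
import json
import urllib.request

def postprocess(hits: list[dict], threshold: float) -> list[dict]:
    # Keep only hits at or above the reranker-score threshold,
    # then tag each surviving hit with its rank order.
    kept = [h for h in hits if h.get("@search.rerankerScore", 0.0) >= threshold]
    for i, h in enumerate(kept, start=1):
        h["@order_num"] = i
    return kept

def hybrid_search(endpoint: str, index: str, api_key: str, query: str,
                  embedding: list[float], vector_fields: str,
                  reranker_threshold: float = 1.5) -> list[dict]:
    # Hybrid search: "search" is the keyword leg, "vectorQueries" the vector leg,
    # and queryType=semantic applies the semantic reranker on top.
    body = {
        "search": query,
        "vectorQueries": [{
            "kind": "vector",
            "vector": embedding,
            "fields": vector_fields,   # e.g. "contentVector,full_metadata_vector"
            "k": 10,
        }],
        "queryType": "semantic",
        "semanticConfiguration": "default",
        "top": 10,
    }
    req = urllib.request.Request(
        f"{endpoint}/indexes/{index}/docs/search?api-version=2024-11-01-preview",
        data=json.dumps(body).encode("utf-8"),
        headers={"api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        hits = json.load(resp)["value"]
    return postprocess(hits, reranker_threshold)
```

Keeping `postprocess` separate from the HTTP call makes the threshold filtering and `@order_num` injection testable without a live search endpoint.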
#### Configuration Updates:
```yaml
retrieval:
  endpoint: "https://search-endpoint.search.azure.cn"
  api_key: "search-api-key"
  api_version: "2024-11-01-preview"
  semantic_configuration: "default"
  embedding:
    base_url: "http://embedding-service/v1-openai"
    api_key: "embedding-api-key"
    model: "qwen3-embedding-8b"
    dimension: 4096
  index:
    standard_regulation_index: "index-name-1"
    chunk_index: "index-name-2"
    chunk_user_manual_index: "index-name-3"
```
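One plausible shape for the `EmbeddingConfig` and `IndexConfig` structures named above, mirroring the YAML fields (the real classes live in the service's config module and may differ):

```python
from dataclasses import dataclass

@dataclass
class EmbeddingConfig:
    base_url: str
    api_key: str
    model: str
    dimension: int

@dataclass
class IndexConfig:
    standard_regulation_index: str
    chunk_index: str
    chunk_user_manual_index: str

@dataclass
class RetrievalConfig:
    endpoint: str
    api_key: str
    api_version: str
    semantic_configuration: str
    embedding: EmbeddingConfig
    index: IndexConfig

def load_retrieval_config(raw: dict) -> RetrievalConfig:
    # Build the typed config from a parsed `retrieval:` YAML section.
    return RetrievalConfig(
        endpoint=raw["endpoint"],
        api_key=raw["api_key"],
        api_version=raw["api_version"],
        semantic_configuration=raw["semantic_configuration"],
        embedding=EmbeddingConfig(**raw["embedding"]),
        index=IndexConfig(**raw["index"]),
    )
```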
#### Benefits:
- **Performance**: Eliminated intermediate service latency
- **Control**: Direct control over search parameters and result processing
- **Reliability**: Reduced dependencies and potential points of failure
- **Feature Support**: Full access to Azure AI Search capabilities including semantic ranking
#### Testing:
- Updated unit tests to work with new Azure AI Search implementation
- Verified hybrid search functionality with real Azure AI Search endpoints
- Confirmed proper result filtering and ordering
## v1.1.9 - Intent Recognition Structured Output Compatibility Fix - Mon Sep 2 2025
### 🔧 **Bug Fix** *(Intent Recognition Compatibility)*
**Fixed intent recognition error for models that don't support OpenAI's structured output format (json_schema).**
#### Problem Addressed:
- Intent recognition failed with error: "Invalid parameter: 'response_format' of type 'json_schema' is not supported with this model"
- DeepSeek and other non-OpenAI models don't support OpenAI's structured output feature
- System would default to Standard_Regulation_RAG but log errors continuously
#### Root Cause:
- `intent_recognition_node` used `llm_client.llm.with_structured_output(Intent)` which automatically adds `json_schema` response_format
- This feature is specific to OpenAI GPT models and not supported by DeepSeek, Claude, or other model providers
#### Solution:
- **Removed structured output dependency**: Replaced `with_structured_output()` with standard LLM calls
- **Enhanced text parsing**: Added robust response parsing to extract intent labels from text responses
- **Improved prompt engineering**: Added explicit output format instructions to system prompt
- **Enhanced error handling**: Better handling of different response content types (string/list)
#### Technical Changes:
**Modified**: `service/graph/intent_recognition.py`
```python
# Before (broken with non-OpenAI models):
intent_llm = llm_client.llm.with_structured_output(Intent)
intent_result = await intent_llm.ainvoke([SystemMessage(content=system_prompt)])

# After (compatible with all models):
system_prompt = (
    intent_prompt_template.format(...)
    + "\n\nIMPORTANT: You must respond with ONLY one of these two exact labels: "
    + "'Standard_Regulation_RAG' or 'User_Manual_RAG'. Do not include any other text."
)
intent_result = await llm_client.llm.ainvoke([SystemMessage(content=system_prompt)])

# Enhanced response parsing
if isinstance(intent_result.content, str):
    response_text = intent_result.content.strip()
elif isinstance(intent_result.content, list):
    response_text = " ".join(
        str(item) for item in intent_result.content if isinstance(item, str)
    ).strip()
```
#### Key Improvements:
**Model Compatibility**:
- Works with all LLM providers (OpenAI, Azure OpenAI, DeepSeek, Claude, etc.)
- No dependency on provider-specific features
- Maintains accuracy through enhanced prompt engineering
**Error Resolution**:
- Eliminated "json_schema not supported" errors
- Improved system reliability and user experience
- Maintained intent classification accuracy
**Robustness**:
- Better handling of different response formats
- Fallback mechanisms for unparseable responses
- Enhanced logging for debugging
#### Testing:
- ✅ Standard regulation queries correctly classified as `Standard_Regulation_RAG`
- ✅ User manual queries correctly classified as `User_Manual_RAG`
- ✅ Compatible with DeepSeek, Azure OpenAI, and other model providers
- ✅ No more structured output errors in logs
---
## v1.1.8 - User Manual Prompt Anti-Hallucination Enhancement - Sun Sep 1 2025
### 🧠 **Prompt Engineering Enhancement** *(User Manual Anti-Hallucination)*
**Enhanced the user_manual_prompt to reduce hallucinations by adopting grounded response principles from agent_system_prompt.**
#### Problem Addressed:
- User manual assistant could speculate about undocumented system features
- Inconsistent handling of missing information compared to main agent prompt
- Less structured approach to failing gracefully when manual information was insufficient
- Potential for inferring functionality not explicitly documented in user manuals
#### Solution:
- **Grounded Response Principles**: Adopted evidence-based response requirements from agent_system_prompt
- **Enhanced Fail-Safe Mechanisms**: Implemented comprehensive "No-Answer with Suggestions" framework
- **Explicit Anti-Speculation**: Added clear prohibitions against guessing or inferring undocumented features
- **Consistent Evidence Requirements**: Aligned with main agent prompt's evidence standards
#### Technical Changes:
**Modified**: `llm_prompt.yaml` - `user_manual_prompt`
```yaml
# Enhanced Core Directives
- **Answer with evidence** from retrieved user manual sources; avoid speculation.
  Never guess or infer functionality not explicitly documented.
- **Fail gracefully**: if retrieval yields insufficient or no relevant results,
  **do not guess**—produce a clear *No-Answer with Suggestions* section.

# Enhanced Workflow - Verify & Synthesize
- Cross-check all retrieved information for consistency.
- Only include information supported by retrieved user manual evidence.
- If evidence is insufficient, follow the *No-Answer with Suggestions* approach.

# Added No-Answer Framework
When retrieved user manual content is insufficient:
- State clearly what specific information is missing
- Do not guess or provide information not explicitly found
- Provide constructive next steps and alternative approaches
```
#### Key Improvements:
**Evidence Requirements**:
- Enhanced from basic "Evidence-Based Only" to comprehensive evidence validation
- Added explicit prohibition against speculation and inference
- Aligned with agent_system_prompt's grounded response standards
**Graceful Failure Handling**:
- Upgraded from simple "state it clearly" to structured "No-Answer with Suggestions"
- Provides specific guidance for reformulating queries
- Offers constructive next steps when information is missing
**Anti-Hallucination Measures**:
- ✅ Grounded responses principle
- ✅ No speculation directive
- ✅ Explicit no-guessing rule
- ✅ Evidence-only responses
- ✅ Constructive suggestions framework
#### Consistency Achievement:
- **Unified Approach**: Same evidence standards across agent_system_prompt and user_manual_prompt
- **Standardized Failure Handling**: Consistent "No-Answer with Suggestions" methodology
- **Preserved Specialization**: Maintained user manual specific features (screenshots, step-by-step format)
#### Files Added:
- `docs/topics/USER_MANUAL_PROMPT_ANTI_HALLUCINATION.md` - Detailed technical documentation
- `scripts/test_user_manual_prompt_improvements.py` - Comprehensive validation test suite
#### Expected Benefits:
- **Reduced Hallucinations**: No speculation about undocumented CATOnline features
- **Improved Reliability**: More accurate step-by-step instructions based only on manual content
- **Better User Guidance**: Structured suggestions when manual information is incomplete
- **System Consistency**: Unified anti-hallucination approach across all prompt types
---
## v1.1.7 - GPT-5 Mini Temperature Parameter Fix - Sun Sep 1 2025
### 🔧 **LLM Compatibility Fix** *(GPT-5 Mini Temperature Support)*
**Fixed temperature parameter handling to support GPT-5 mini model which only accepts default temperature values.**
#### Problem Solved:
- GPT-5 mini model rejected requests with explicit `temperature` parameter (e.g., 0.0, 0.2)
- Error: "Unsupported value: 'temperature' does not support 0.0 with this model. Only the default (1) value is supported."
- System always passed temperature even when commented out in configuration
#### Solution:
- **Conditional parameter passing**: Only include `temperature` in LLM requests when explicitly set in configuration
- **Optional configuration**: Changed temperature from required to optional in both new and legacy config classes
- **Model default usage**: When temperature not specified, model uses its own default value
#### Technical Changes:
**Modified**: `service/config.py`
```python
# Changed temperature from required to optional
from typing import Any, Dict, Optional

from pydantic import BaseModel

class LLMParametersConfig(BaseModel):
    temperature: Optional[float] = None  # Was: float = 0

class LLMRagConfig(BaseModel):
    temperature: Optional[float] = None  # Was: float = 0.2

# Only include temperature in config when explicitly set
def get_llm_config(self) -> Dict[str, Any]:
    base_config = {...}  # base_url, api_key, model, etc. (assembled earlier)
    if self.llm_prompt.parameters.temperature is not None:
        base_config["temperature"] = self.llm_prompt.parameters.temperature
    return base_config
```
**Modified**: `service/llm_client.py`
```python
# Only pass temperature parameter when present in config
from langchain_openai import ChatOpenAI

def _create_llm(self):
    # llm_config is produced by the config layer's get_llm_config() shown above
    params = {
        "base_url": llm_config["base_url"],
        "api_key": llm_config["api_key"],
        "model": llm_config["model"],
        "streaming": True,
    }
    # Only add temperature if explicitly set
    if "temperature" in llm_config:
        params["temperature"] = llm_config["temperature"]
    return ChatOpenAI(**params)
```
#### Configuration Examples:
**No Temperature (Uses Model Default)**:
```yaml
# llm_prompt.yaml
parameters:
# temperature: 0 # Commented out - model uses default
max_context_length: 100000
```
**Explicit Temperature**:
```yaml
# llm_prompt.yaml
parameters:
temperature: 0.7 # Will be passed to model
max_context_length: 100000
```
#### Backward Compatibility:
- ✅ Existing configurations continue to work
- ✅ Legacy `config.yaml` LLM settings still supported
- ✅ No breaking changes when temperature is explicitly set
#### Files Added:
- `docs/topics/GPT5_MINI_TEMPERATURE_FIX.md` - Detailed technical documentation
- `scripts/test_temperature_fix.py` - Comprehensive test suite
---
## v1.1.6 - Enhanced I18n Multi-Language Support - Sat Aug 31 2025
### 🌐 **Internationalization Enhancement** *(I18n Multi-Language Support)*
**Added comprehensive internationalization (i18n) support for Chinese and English languages across the web interface.**
---
## v1.1.5 - Aggressive Tool Call History Trimming - Sat Aug 31 2025
### 🚀 **Enhanced Token Optimization** *(Aggressive Trimming Strategy)*
**Modified trimming strategy to proactively clean historical tool call results regardless of token count, while protecting current conversation turn's tool calls.**
#### New Behavior:
- **Always trim when multiple tool rounds exist** - regardless of total token count
- **Preserve current conversation turn's tool calls** - never trim active tool execution results
- **Remove historical tool call results** - from previous conversation turns to minimize context pollution
#### Why This Change:
- Historical tool call results accumulate quickly in conversation history
- Large retrieval results consume significant tokens even when total context is manageable
- Proactive trimming prevents context bloat before hitting token limits
- Current tool calls must remain intact for proper agent workflow
#### Technical Implementation:
**Modified**: `service/graph/message_trimmer.py`
- **Enhanced `should_trim()`**: Now triggers when detecting multiple tool rounds (>1), not just on token limit
- **Preserved Strategy**: `_optimize_multi_round_tool_calls()` continues to keep only the most recent tool round
- **Current Turn Protection**: Agent workflow ensures current turn's tool calls are never trimmed during execution
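The trigger described above can be sketched as follows. Plain-dict messages stand in for LangChain message objects, and the function names mirror but are not the actual `message_trimmer.py` API:

```python
def count_tool_rounds(messages):
    """Count assistant messages that issued tool calls; each one marks
    the start of a tool round (dict messages are an illustrative
    simplification of LangChain message objects)."""
    return sum(
        1 for m in messages
        if m.get("role") == "assistant" and m.get("tool_calls")
    )

def should_trim(messages):
    # Trim whenever more than one historical tool round exists,
    # regardless of total token count.
    return count_tool_rounds(messages) > 1

history = [
    {"role": "user", "content": "q1"},
    {"role": "assistant", "tool_calls": [{"name": "retrieve_standard"}]},
    {"role": "tool", "content": "large retrieval payload ..."},
    {"role": "assistant", "tool_calls": [{"name": "retrieve_standard"}]},
    {"role": "tool", "content": "large retrieval payload ..."},
]
print(should_trim(history))      # two rounds -> True
print(should_trim(history[:3]))  # one round -> False
```

Because the check counts rounds rather than tokens, trimming fires as soon as a second historical round appears, before any context limit is approached.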
#### Impact:
- **Proactive Cleanup**: Tool call history cleaned before reaching token limits
- **Context Quality**: Conversation stays focused on recent, relevant context
- **Workflow Protection**: Current tool execution results always preserved
- **Token Efficiency**: Maintains optimal token usage across conversation lifetime
---
## v1.1.4 - Multi-Round Tool Call Token Optimization - Sat Aug 31 2025
### 🚀 **Performance Enhancement** *(Token Optimization)*
**Implemented intelligent token optimization for multi-round tool calling scenarios to significantly reduce LLM context usage.**
#### Problem Solved:
- In multi-round tool calling scenarios, previous rounds' tool call results (ToolMessage) were consuming excessive tokens
- Large JSON responses from retrieval tools accumulated in conversation history
- Token usage could exceed LLM context limits, causing API failures
#### Key Features:
1. **Multi-Round Tool Call Detection**:
- Automatically identifies tool calling rounds in conversation history
- Recognizes patterns of AI messages with tool_calls followed by ToolMessage responses
2. **Intelligent Message Optimization**:
- Preserves system messages and original user queries
- Keeps only the most recent tool calling round for context continuity
- Removes older ToolMessage content that typically contains large response data
3. **Token Usage Reduction**:
- Achieves 60-80% reduction in token usage for multi-round scenarios
- Maintains conversation quality while respecting LLM context constraints
- Prevents API failures due to context length overflow
#### Technical Implementation:
- **File**: `service/graph/message_trimmer.py`
- **New Methods**:
- `_optimize_multi_round_tool_calls()` - Core optimization logic
- `_identify_tool_rounds()` - Tool round pattern recognition
- Enhanced `trim_conversation_history()` - Integrated optimization workflow
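The round-identification and keep-only-latest strategy can be sketched as follows; dict messages and function names are illustrative, not the real implementation:

```python
def identify_tool_rounds(messages):
    """Return (start, end) index spans: one span per assistant message
    with tool_calls plus its following tool-result messages."""
    rounds, i = [], 0
    while i < len(messages):
        m = messages[i]
        if m.get("role") == "assistant" and m.get("tool_calls"):
            j = i + 1
            while j < len(messages) and messages[j].get("role") == "tool":
                j += 1
            rounds.append((i, j))  # half-open span of one tool round
            i = j
        else:
            i += 1
    return rounds

def keep_latest_round(messages):
    """Drop every tool round except the most recent, preserving system
    prompts, user queries, and the final answer."""
    drop = set()
    for start, end in identify_tool_rounds(messages)[:-1]:
        drop.update(range(start, end))
    return [m for k, m in enumerate(messages) if k not in drop]

history = [
    {"role": "user", "content": "What is GB/T 18488?"},
    {"role": "assistant", "tool_calls": [{"name": "retrieve_standard"}]},
    {"role": "tool", "content": "round-1 retrieval payload ..."},
    {"role": "assistant", "tool_calls": [{"name": "retrieve_standard"}]},
    {"role": "tool", "content": "round-2 retrieval payload ..."},
    {"role": "assistant", "content": "final synthesized answer"},
]
print(len(keep_latest_round(history)))  # 4: round 1 dropped, round 2 kept
```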
#### Test Results:
- **Message Reduction**: 60% fewer messages in multi-round scenarios
- **Token Savings**: 70-80% reduction in token consumption
- **Context Preservation**: Maintains conversation flow and quality
#### Configuration:
```yaml
parameters:
max_context_length: 96000 # Configurable context length
# Optimization automatically applies when multiple tool rounds detected
```
#### Benefits:
- **Cost Efficiency**: Significant reduction in LLM API costs
- **Reliability**: Prevents context overflow errors
- **Performance**: Faster processing with smaller context windows
- **Scalability**: Supports longer multi-round conversations
#### Files Modified:
- `service/graph/message_trimmer.py`
- `tests/unit/test_message_trimmer.py`
- `docs/topics/MULTI_ROUND_TOKEN_OPTIMIZATION.md`
- `docs/CHANGELOG.md`
---
## v1.1.3 - UI Text Update - Fri Aug 30 2025
### ✏️ **Content Update** *(UI Improvement)*
**Updated the example questions in the frontend UI.**
#### Changes Made:
- Modified the third and fourth example questions in both Chinese and English in `web/src/utils/i18n.ts` to be more relevant to user needs.
- **Chinese**:
- `根据标准,如何测试电动汽车充电功能的兼容性`
- `如何注册申请CATOnline权限`
- **English**:
- `According to the standard, how to test the compatibility of electric vehicle charging function?`
- `How to register for CATOnline access?`
#### Benefits:
- Provides users with more practical and common question examples.
- Improves user experience by guiding them to ask more effective questions.
#### Files Modified:
- `web/src/utils/i18n.ts`
- `docs/CHANGELOG.md`
---
## v1.1.2 - Prompt Optimization - Fri Aug 30 2025
### 🚀 **Prompt Optimization** *(Prompt Engineering)*
**Optimized and compressed `intent_recognition_prompt` and `user_manual_prompt` in `llm_prompt.yaml`.**
#### Changes Made:
1. **`intent_recognition_prompt`**:
* Condensed background information into key bullet points.
* Refined classification descriptions for clarity.
* Simplified classification guidelines with keyword hints for better decision-making.
2. **`user_manual_prompt`**:
* Elevated key instructions to **Core Directives** for emphasis.
* Streamlined the workflow description.
* Made the **Response Formatting** rules more stringent, especially regarding screenshots.
* Retained the crucial **Context Disambiguation** section.
#### Benefits:
- **Efficiency**: More compact prompts for faster processing.
- **Reliability**: Clearer and more direct instructions reduce the likelihood of incorrect outputs.
- **Maintainability**: Improved structure makes the prompts easier to read and update.
#### Files Modified:
- `llm_prompt.yaml`
- `docs/CHANGELOG.md`
---
## v1.1.1 - User Manual Tool Rounds Configuration - Fri Aug 29 2025
### 🔧 **Configuration Enhancement** *(Configuration Update)*
**Added Independent Tool Rounds Configuration for User Manual RAG**
#### Changes Made:
1. **Configuration Structure**
- Added `max_tool_rounds_user_manual: 3` to `config.yaml`
- Separated user manual agent tool rounds from main agent configuration
- Maintained backward compatibility with existing configuration
2. **Code Updates**
- Updated `AppConfig` class in `service/config.py` to include `max_tool_rounds_user_manual` field
- Added `max_tool_rounds_user_manual` to `AgentState` in `service/graph/state.py`
- Modified `service/graph/user_manual_rag.py` to use separate configuration
- Updated graph initialization in `service/graph/graph.py` to include new config
3. **Prompt System Updates**
- Updated `user_manual_prompt` in `llm_prompt.yaml`:
- Removed citation-related instructions (no [1] citations or citation mapping)
- Set all rewritten queries to use English language
- Streamlined response format without citation requirements
#### Technical Details:
- **Configuration Priority**: State-level config takes precedence over file config
- **Independent Configuration**: User manual agent now has its own `max_tool_rounds_user_manual` setting
- **Default Values**: Both main agent (3 rounds) and user manual agent (3 rounds) use same default
- **Validation**: All syntax checks and configuration loading tests passed
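The state-over-file precedence can be sketched as follows; all names are illustrative of the pattern, not the actual `AppConfig`/`AgentState` API:

```python
def resolve_max_tool_rounds(state, file_config, default=3):
    """State-level value takes precedence over the file-level config,
    with a final fallback to the shared default of 3 rounds."""
    for value in (state.get("max_tool_rounds_user_manual"),
                  file_config.get("max_tool_rounds_user_manual")):
        if value is not None:
            return value
    return default

print(resolve_max_tool_rounds({"max_tool_rounds_user_manual": 5},
                              {"max_tool_rounds_user_manual": 3}))  # 5
print(resolve_max_tool_rounds({}, {}))                              # 3
```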
#### Benefits:
- **Flexibility**: Different tool round limits for different agent types
- **Maintainability**: Clear separation of concerns between agent configurations
- **Consistency**: Follows same configuration pattern as main agent
- **Customization**: Allows fine-tuning user manual agent behavior independently
#### Files Modified:
- `config.yaml`
- `service/config.py`
- `service/graph/state.py`
- `service/graph/graph.py`
- `service/graph/user_manual_rag.py`
- `llm_prompt.yaml`
---
## v1.1.0 - User Manual Agent Update Summary - Fri Aug 29 22:20:20 HKT 2025
### ✅ Successfully Completed
1. **Prompt Configuration Update**
- Updated `user_manual_prompt` in `llm_prompt.yaml`
- Integrated query optimization, parallel retrieval, and evidence-based answering from `agent_system_prompt`
- Verified prompt loading with test script (6566 chars)
2. **Agent Node Logic**
- User manual agent node is autonomous with multi-round tool calls (3 rounds max)
- Intent classification correctly routes to User_Manual_RAG
- Agent node redirects to user_manual_agent_node correctly
3. **Multi-Round Tool Execution**
- Successfully executes multiple tool rounds
- Tool calls increment properly (1/3, 2/3, 3/3)
- Max rounds protection works (forces final synthesis)
### 🚨 Issues Discovered
1. **Citation Number Error**:
- Error: "AgentWorkflow error: 'citation number'"
- Occurring during user manual agent execution
2. **SSE Streaming Issue**:
- TypeError: 'coroutine' object is not iterable
- Affecting streaming response delivery
- StreamingResponse configuration needs fixing
### 📊 Test Results
- ✅ Prompt configuration test: PASSED
- ✅ Intent recognition: PASSED
- ✅ Agent routing: PASSED
- ✅ Multi-round tool calls: PASSED
- ❌ Citation processing: FAILED
- ❌ SSE streaming: FAILED
### 🔍 Next Steps
1. Fix citation number error in user manual agent
2. Fix SSE streaming response format
3. Complete end-to-end validation
---
## v1.0.9 - 2025-08-29 🤖
### 🤖 **User Manual Agent Transformation** *(Major Feature Enhancement)*
#### **🔄 Autonomous User Manual Agent Implementation** *(Architecture Upgrade)*
- **Agent Node Conversion**: Transformed `service/graph/user_manual_rag.py` from simple RAG to autonomous agent
- **Detect-First-Then-Stream Strategy**: Implemented optimal multi-round behavior with tool detection and streaming synthesis
- **Tool Round Management**: Added intelligent tool calling with configurable round limits and state tracking
- **Conversation Trimming**: Integrated automatic context length management for long conversations
- **Streaming Support**: Enhanced real-time response generation with HTML comment filtering
- **User Manual Tool Integration**: Specialized tool ecosystem for user manual operations
- **Tool Schema Generation**: Automatic schema generation from `service/graph/user_manual_tools.py`
- **Force Tool Choice**: Enabled autonomous tool selection for optimal response generation
- **Tool Execution Pipeline**: Parallel-capable tool execution with streaming events and error handling
- **Routing Logic Enhancement**: Sophisticated routing system for multi-round workflows
- **Smart Routing**: Routes between `user_manual_tools`, `user_manual_agent`, and `post_process`
- **State-Aware Decisions**: Context-aware routing based on tool calls and conversation state
- **Final Synthesis Detection**: Automatic transition to synthesis mode when appropriate
- **Error Handling & Recovery**: Comprehensive error management system
- **Graceful Degradation**: User-friendly error messages with proper error categorization
- **Stream Error Events**: Real-time error notification through streaming interface
- **Tool Error Recovery**: Resilient tool execution with fallback mechanisms
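The routing decision described above can be roughly sketched as follows. This is a dict-based simplification; the real `user_manual_should_continue()` operates on LangGraph state objects:

```python
def user_manual_should_continue(state):
    """Route after the agent node: pending tool calls go to the tool
    executor while rounds remain; otherwise hand off to post-processing
    for final synthesis."""
    last = state["messages"][-1]
    wants_tools = bool(last.get("tool_calls"))
    if wants_tools and state["tool_rounds"] < state["max_tool_rounds"]:
        return "user_manual_tools"
    return "post_process"  # no tool calls, or max rounds exhausted

print(user_manual_should_continue({
    "messages": [{"role": "assistant", "tool_calls": [{"name": "retrieve"}]}],
    "tool_rounds": 1,
    "max_tool_rounds": 3,
}))  # user_manual_tools
```

Exhausting `max_tool_rounds` forces the `post_process` branch even when the model requests more tools, which is the "max rounds protection" behavior noted in the v1.1.0 summary.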
#### **🔧 Technical Implementation Details** *(System Architecture)*
- **Function Signatures**: New agent functions following established patterns from main agent
- `user_manual_agent_node()`: Main autonomous agent function
- `user_manual_should_continue()`: Intelligent routing logic
- `run_user_manual_tools_with_streaming()`: Enhanced tool execution
- **Configuration Integration**: Seamless integration with existing configuration system
- **Prompt Template Usage**: Uses existing `user_manual_prompt` from `llm_prompt.yaml`
- **Dynamic Prompt Formatting**: Contextual prompt generation with conversation history and retrieved content
- **Tool Configuration**: Automatic tool binding and schema management
- **Backward Compatibility**: Maintained legacy function for seamless transition
- **Legacy Wrapper**: `user_manual_rag_node()` redirects to new agent implementation
- **API Consistency**: No breaking changes to existing interfaces
- **Migration Path**: Smooth upgrade path for existing implementations
#### **✅ Testing & Validation** *(Quality Assurance)*
- **Comprehensive Test Suite**: New test script `scripts/test_user_manual_agent.py`
- **Basic Agent Testing**: Tool detection, calling, and routing validation
- **Integration Workflow Testing**: Complete multi-round conversation scenarios
- **Error Handling Testing**: Graceful error recovery and user feedback
- **Performance Validation**: Streaming response and tool execution timing
- **Functionality Validation**: All core features tested and validated
- ✅ Tool detection and autonomous calling
- ✅ Multi-round workflow execution
- ✅ Streaming response generation
- ✅ Error handling and recovery
- ✅ State management and routing logic
#### **📚 Documentation & Examples** *(Knowledge Management)*
- **Implementation Guide**: Comprehensive documentation in `docs/topics/USER_MANUAL_AGENT_IMPLEMENTATION.md`
- **Usage Examples**: Practical code examples and implementation patterns
- **Architecture Overview**: Technical details and design decisions
- **Migration Guide**: Step-by-step upgrade instructions
**Impact**: Transforms user manual functionality from simple retrieval to intelligent autonomous agent capable of multi-round conversations, tool usage, and sophisticated response generation while maintaining full backward compatibility.
---
## v1.0.8 - 2025-08-29 📚
### 📚 **User Manual Prompt Enhancement** *(Functional Improvement)*
#### **🎯 Enhanced User Manual Assistant Prompt** *(Content Update)*
- **Context Disambiguation Rules**: Added comprehensive disambiguation guidelines for overlapping concepts
- **Function Distinction**: Clear separation between Homepage functions (User) vs Admin Console functions (Administrator)
- **Management Clarity**: Differentiated between user management vs user group management operations
- **Role-based Operations**: Defined default roles for different operations (view/search for Users, edit/delete/configure for Administrators)
- **Clarification Protocol**: Added requirement to ask for clarification when user context is unclear
- **Response Structure Standards**: Implemented standardized response formatting
- **Step-by-Step Instructions**: Mandated complete procedural guidance with figures
- **Structured Format**: Required specific format for each step (description, screenshot, additional notes)
- **Business Rules Integration**: Ensured inclusion of all relevant business rules from source sections
- **Documentation Structure**: Maintained original documentation hierarchy and organization
- **Content Reproduction Rules**: Established strict content fidelity guidelines
- **Exact Wording**: Required copying exact wording and sequence from source sections
- **Complete Information**: Mandated inclusion of ALL information without summarization
- **Format Preservation**: Maintained original formatting and hierarchical structure
- **No Reorganization**: Prohibited modification or reorganization of original content
- **Reference Integration**: Successfully merged guidance from `.vibe/ref/user_manual_prompt-ref.txt`
- **Quality Assurance**: Enhanced accuracy and completeness of user manual responses
#### **📋 Reference File Analysis** *(Content Optimization)*
- **catonline-ref.txt Assessment**: Evaluated system background reference content
- **Content Alignment**: Confirmed existing content already covers CATOnline system background
- **Redundancy Avoidance**: Decided against merging to prevent duplicate instructions
- **Content Validation**: Verified accuracy and completeness of existing background information
- **user_manual_prompt-ref.txt Integration**: Successfully incorporated valuable operational guidelines
- **Value Assessment**: Identified high-value content missing from existing prompt
- **Strategic Merge**: Integrated content to enhance response quality without duplication
- **Instruction Optimization**: Improved prompt effectiveness while maintaining conciseness
---
## v1.0.7 - 2025-08-29 🎯
### 🎯 **Intent Recognition Enhancement** *(Functional Improvement)*
#### **📝 Enhanced Intent Classification Prompt** *(Content Update)*
- **Detailed Guidelines**: Added comprehensive classification criteria based on reference files
- **Content vs System Operation**: Clear distinction between standard/regulation content queries and CATOnline system operation queries
- **Standard_Regulation_RAG Examples**:
- "What regulations relate to intelligent driving?"
- "How do you test the safety of electric vehicles?"
- "What are the main points of GB/T 34567-2023?"
- "What is the scope of ISO 26262?"
- **User_Manual_RAG Examples**:
- "What is CATOnline (the system)?"
- "How to do search for standards, regulations, TRRC news and deliverables?"
- "How to create and update standards, regulations and their documents?"
- "How to download or export data?"
- **Classification Guidelines**: Added specific rules for edge cases and ambiguous queries
- **Reference Integration**: Incorporated guidance from `.vibe/ref/intent-ref-1.txt` and `.vibe/ref/intent-ref-2.txt`
#### **🏢 CATOnline Background Information Integration** *(Context Enhancement)*
- **Background Context**: Added comprehensive CATOnline system background information to intent recognition prompt
- **System Definition**: Integrated explanation that CATOnline is the China Automotive Technical Regulatory Online System
- **Feature Coverage**: Included details about CATOnline capabilities:
- TRRC process introductions and business areas
- Standards/laws/regulations/protocols search and viewing
- Document download and Excel export functionality
- Consumer test and voluntary certification checking
- Deliverable reminders and TRRC deliverable retrieval
- Admin features: popup configuration, working groups management, standards/regulations CRUD operations
- **TRRC Context**: Added clarification that TRRC stands for Technical Regulation Region China of Volkswagen
- **Enhanced Classification**: Background information helps improve intent classification accuracy for CATOnline-specific queries
#### **🧪 Testing & Validation** *(Quality Assurance)*
- **Intent Recognition Tests**: Verified enhanced prompt with multiple test scenarios
- **Multi-Intent Workflow**: Validated proper routing between Standard_Regulation_RAG and User_Manual_RAG
- **Edge Case Handling**: Tested classification accuracy for ambiguous queries
- **TRRC Edge Case**: Added specific handling for TRRC-related queries to distinguish between content vs. system operation
- **CATOnline Background Tests**: Created comprehensive test suite for CATOnline-specific scenarios
- **100% Accuracy**: Maintained perfect classification accuracy on all test suites including background-enhanced scenarios
---
## v1.0.6 - 2025-08-28 🔧
### 🔧 **Code Architecture Refactoring & Optimization** *(Technical Improvement)*
#### **🧹 Code Structure Cleanup** *(Breaking Fix)*
- **Duplicate State Removal**: Eliminated duplicate `AgentState` definitions across modules
- **Unified Definition**: Consolidated all state management to `/service/graph/state.py`
- **Import Cleanup**: Removed redundant AgentState from `graph.py`
- **Type Safety**: Ensured consistent state typing across all graph nodes
- **Circular Import Resolution**: Fixed circular dependency issues in module imports
- **Clean Dependencies**: Streamlined import statements and removed unused context variables
#### **📁 Module Separation & Organization** *(Code Organization)*
- **Intent Recognition Module**: Moved `intent_recognition_node` to dedicated `/service/graph/intent_recognition.py`
- **Pure Function**: Self-contained intent classification logic
- **LLM Integration**: Structured output with Pydantic Intent model
- **Context Handling**: Intelligent conversation history rendering
- **User Manual RAG Module**: Extracted `user_manual_rag_node` to `/service/graph/user_manual_rag.py`
- **Specialized Processing**: Dedicated user manual query handling
- **Tool Integration**: Direct integration with user manual retrieval tools
- **Stream Support**: Complete SSE streaming capabilities
- **Graph Simplification**: Cleaned up main `graph.py` by removing redundant code
#### **⚙️ Configuration Enhancement** *(Configuration)*
- **Prompt Externalization**: Moved all hardcoded prompts to `llm_prompt.yaml`
- **Intent Recognition Prompt**: Configurable intent classification instructions
- **User Manual Prompt**: Configurable user manual response template
- **Agent System Prompt**: Existing agent behavior remains configurable
- **Runtime Configuration**: All prompts now loaded dynamically from config file
- **Deployment Flexibility**: Different environments can use different prompt configurations
#### **🧪 Testing & Validation** *(Quality Assurance)*
- **Graph Compilation Tests**: Verified successful compilation after refactoring
- **Multi-Intent Workflow Tests**: End-to-end validation of both intent pathways
- **Module Integration Tests**: Confirmed proper module separation and imports
- **Configuration Loading Tests**: Validated dynamic prompt loading from config files
#### **📋 Technical Details**
- **Files Modified**:
- `/service/graph/graph.py` - Removed duplicate definitions, clean imports
- `/service/graph/state.py` - Single source of truth for AgentState
- `/service/graph/intent_recognition.py` - New dedicated module
- `/service/graph/user_manual_rag.py` - New dedicated module
- `/llm_prompt.yaml` - Added configurable prompts
- **Import Chain**: Fixed circular imports between graph nodes
- **Type Safety**: Consistent `AgentState` usage across all modules
- **Testing**: 100% pass rate on graph compilation and workflow tests
#### **🚀 Developer Experience**
- **Code Maintainability**: Better separation of concerns and module boundaries
- **Configuration Management**: Centralized prompt management for easier tuning
- **Debug Support**: Cleaner stack traces with resolved circular imports
- **Extension Ready**: Easier to add new intent types or modify existing behavior
#### **🌐 Internationalization & UX Improvements** *(User Experience)*
- **English Prompts**: Updated intent recognition prompts to use English for improved LLM classification accuracy
- **English User Manual Prompts**: Updated user manual RAG prompts to use English for consistency
- **Error Messages**: Converted all error messages to English for consistency
- **No Default Prompts**: Removed hardcoded fallback prompts, ensuring explicit configuration management
- **Enhanced Conversation Rendering**: Updated conversation history format to use `<user>...</user>` and `<ai>...</ai>` tags for better LLM parsing
- **Configuration Integration**: Added `intent_recognition_prompt` and `user_manual_prompt` to configuration loading system
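The tagged rendering format can be sketched as follows; dict messages stand in for the real conversation objects:

```python
def render_history(messages):
    """Render conversation history with <user>/<ai> tags for injection
    into the intent recognition prompt, matching the format described
    above. Messages with other roles (e.g. system) are skipped."""
    tag = {"user": "user", "assistant": "ai"}
    return "\n".join(
        f"<{tag[m['role']]}>{m['content']}</{tag[m['role']]}>"
        for m in messages
        if m["role"] in tag
    )

print(render_history([
    {"role": "user", "content": "How do I log in?"},
    {"role": "assistant", "content": "Use the CATOnline login page."},
]))
# <user>How do I log in?</user>
# <ai>Use the CATOnline login page.</ai>
```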
#### **🎨 UI/UX Improvements** *(User Interface)*
- **Tool Icon Enhancement**: Updated `retrieve_system_usermanual` tool icon to `user-guide.png`
- **Visual Distinction**: Better visual differentiation between standard regulation and user manual tools
- **User Experience**: More intuitive icon representing user manual/guide functionality
- **Icon Asset**: Leveraged existing `user-guide.png` icon from public assets
---
## v1.0.5 - 2025-08-28 🎯
### 🎯 **Multi-Intent RAG System Implementation** *(Major Feature)*
#### **🧠 Intent Recognition Engine** *(New)*
- **Intent Classification**: LLM-powered intelligent intent recognition with context awareness
- **Supported Intents**:
- `Standard_Regulation_RAG`: Manufacturing standards, regulations, and compliance queries
- `User_Manual_RAG`: CATOnline system usage, features, and operational guidance
- **Technology**: Structured output with Pydantic models for reliable classification
- **Accuracy**: 100% classification accuracy in testing across Chinese and English queries
- **Context Awareness**: Leverages conversation history for improved intent disambiguation
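The classification contract can be sketched without the LLM call. The real node binds a Pydantic model via structured output; this dependency-free sketch shows the same allowed-values check and the documented fallback to the standard path:

```python
from typing import Literal, get_args

# The two supported intents, as a Literal type (mirrors the Pydantic model's field)
Intent = Literal["Standard_Regulation_RAG", "User_Manual_RAG"]

def parse_intent(raw: str) -> str:
    """Validate the classifier's output against the allowed intents.
    An unrecognized label falls back to the standard/regulation path,
    which is the documented error-recovery behavior."""
    return raw if raw in get_args(Intent) else "Standard_Regulation_RAG"

print(parse_intent("User_Manual_RAG"))  # User_Manual_RAG
print(parse_intent("not-an-intent"))    # Standard_Regulation_RAG
```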
#### **🔄 Enhanced Workflow Architecture** *(Breaking Change)*
- **New Graph Structure**: `START → intent_recognition → [conditional_routing] → {Standard_RAG | User_Manual_RAG}`
- **Entry Point Change**: All queries now start with intent recognition instead of direct agent processing
- **Dual Processing Paths**:
- **Standard_Regulation_RAG**: Multi-round agent workflow with tool orchestration (existing behavior)
- **User_Manual_RAG**: Single-round specialized processing with user manual retrieval
- **Backward Compatibility**: Existing standard/regulation queries maintain full functionality
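The conditional routing step can be sketched as a plain dispatch; node names are illustrative of the graph wiring, not guaranteed to match the code:

```python
def route_by_intent(state):
    """Conditional edge after intent recognition: pick the processing
    path for the query, defaulting to the multi-round agent workflow
    when the intent is missing or unrecognized."""
    routes = {
        "Standard_Regulation_RAG": "agent",    # multi-round tool workflow
        "User_Manual_RAG": "user_manual_rag",  # single-round specialized path
    }
    return routes.get(state.get("intent"), "agent")
```

In LangGraph terms this is the function handed to `add_conditional_edges` after the intent recognition node, so both paths share the same entry point.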
#### **📚 User Manual RAG Specialization** *(New)*
- **Dedicated Node**: `user_manual_rag_node` for specialized user manual processing
- **Tool Integration**: Direct integration with `retrieve_system_usermanual` tool
- **Response Template**: Professional user manual assistance with structured guidance
- **Streaming Support**: Real-time token streaming for immediate user feedback
- **Error Handling**: Graceful degradation with support contact suggestions
#### **🏗️ Technical Architecture Improvements**
- **State Management**: Enhanced `AgentState` with `intent` field for workflow routing
- **Modular Design**: Separated user manual tools into dedicated module (`user_manual_tools.py`)
- **Type Safety**: Full TypeScript-style type annotations with Literal types for intent routing
- **Memory Persistence**: Both intent paths support PostgreSQL session memory and conversation history
- **Testing Suite**: Comprehensive test coverage including intent recognition and end-to-end workflow validation
#### **🚀 Performance & Reliability**
- **Smart Routing**: Eliminates unnecessary tool calls for user manual queries
- **Optimized Flow**: Single-round processing for user manual queries vs multi-round for standards
- **Error Recovery**: Intent recognition failure gracefully defaults to standard regulation processing
- **Session Management**: Complete session persistence across both intent pathways
#### **📋 Query Classification Examples**
**Standard_Regulation_RAG Path**:
- "请问GB/T 18488标准的具体内容是什么"
- "ISO 26262 functional safety standard requirements"
- "汽车安全法规相关规定"
**User_Manual_RAG Path**:
- "如何使用CATOnline系统进行搜索"
- "How do I log into the CATOnline system?"
- "CATOnline系统的用户管理功能怎么使用"
#### **🔧 Implementation Files**
- **Core Logic**: Enhanced `service/graph/graph.py` with intent nodes and routing
- **Intent Recognition**: `intent_recognition_node()` function with LLM classification
- **User Manual Processing**: `user_manual_rag_node()` function with specialized handling
- **State Management**: Updated `service/graph/state.py` with intent support
- **Tool Organization**: New `service/graph/user_manual_tools.py` module
- **Documentation**: Comprehensive implementation guide in `docs/topics/MULTI_INTENT_IMPLEMENTATION.md`
#### **📈 Impact**
- **User Experience**: Intelligent query routing for more relevant responses
- **System Efficiency**: Optimized processing paths based on query type
- **Extensibility**: Framework ready for additional intent types
- **Maintainability**: Clear separation of concerns between different query domains
---
## v1.0.4 - 2025-08-27 🔧
### 🔧 **New Tool Implementation**
#### **📚 System User Manual Retrieval Tool** *(New)*
- **Tool Name**: `retrieve_system_usermanual`
- **Purpose**: Search for document content chunks of user manual of this system (CATOnline)
- **Integration**: Full LangGraph integration with @tool decorator pattern
- **UI Support**: Complete frontend integration with multilingual UI labels
- Chinese: "系统使用手册检索"
- English: "System User Manual Retrieval"
- **Configuration**: Added `chunk_user_manual_index` support in SearchConfig
- **Error Handling**: Robust error handling with proper logging and fallback responses
- **Testing**: Comprehensive unit tests for tool structure and integration validation
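A rough sketch of the tool's shape follows. The search client and index wiring are stubbed assumptions; in the real service the function carries LangChain's `@tool` decorator, so its name and docstring become the tool schema exposed to the LLM:

```python
def retrieve_system_usermanual(query: str, top_k: int = 5):
    """Search for document content chunks of the user manual of this
    system (CATOnline). Stubbed sketch: the real tool queries the index
    configured as chunk_user_manual_index."""
    try:
        results = []  # placeholder: would hold hits from the search index
        if not results:
            # Fallback response mirroring the documented error handling
            return [{"content": "No user manual content found for: " + query,
                     "source": None}]
        return results[:top_k]
    except Exception:
        return [{"content": "User manual retrieval failed.", "source": None}]

print(retrieve_system_usermanual("how to export data")[0]["source"])  # None
```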
#### **🎯 Technical Implementation Details**
- **Backend**: Added to `service/graph/tools.py` following LangGraph best practices
- **Frontend**: Integrated into `web/src/components/ToolUIs.tsx` with consistent styling
- **Translation**: Updated `web/src/utils/i18n.ts` with bilingual support
- **Configuration**: Enhanced `service/config.py` with user manual index configuration
- **Tool Registration**: Automatically included in tools list and schema generation
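A plausible shape for the tool is sketched below. In the repo it is registered via LangChain's `@tool` decorator; that wrapper is omitted here so the sketch stays dependency-free, and the search client is injected rather than constructed:

```python
# Hypothetical sketch of retrieve_system_usermanual; the real tool lives in
# service/graph/tools.py behind the @tool decorator.
def retrieve_system_usermanual(query: str, search_client=None) -> list:
    """Search document content chunks of the CATOnline user manual."""
    index = "index-cat-usermanual-chunk-prd"  # chunk_user_manual_index from SearchConfig
    try:
        results = search_client.search(query, index=index)
        return [{"content": r["content"], "source": r.get("source", index)}
                for r in results]
    except Exception as exc:
        # Fallback response, mirroring the robust error handling noted above
        return [{"error": f"user manual search failed: {exc}"}]
```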
#### **📝 Note**
The search index `index-cat-usermanual-chunk-prd` referenced in the configuration is not yet available, but the tool framework is fully implemented and ready for use once the index is created.
## v1.0.3 - 2025-08-26 ✨
### ✨ **UI Enhancements & Example Questions**
#### **📱 Latest CSS Improvements** *(Just Updated)*
- **Enhanced Example Question Layout**: Increased min-width to 360px and max-width to 450px for better readability
- **Perfect Centering**: Added `justify-items: center` for professional grid alignment
- **Improved Spacing**: Enhanced padding and gap values for optimal visual hierarchy
- **Mobile Optimization**: Consistent responsive design with improved touch targets on mobile devices
#### **🎯 Welcome Page Example Questions**
- **Multilingual Support**: Added 4 interactive example questions with Chinese/English translations
- **Smart Interaction**: Click-to-send functionality using the `useComposerRuntime()` hook for seamless assistant-ui integration
- **Responsive Design**: Auto-adjusting grid layout (2x2 on desktop, single column on mobile)
- **Professional Styling**: Card-based design with hover effects, shadows, and smooth animations
#### **🌐 Updated Branding & Messaging**
- **App Title**: Updated to "CATOnline AI助手" / "CATOnline AI Assistant"
- **Enhanced Descriptions**: Comprehensive service descriptions highlighting CATOnline semantic search capabilities
- **Detailed Welcome Messages**: Multi-paragraph welcome text explaining current service scope and upcoming features
- **Consistent Multilingual Content**: Perfect alignment between Chinese and English versions
#### **📝 Example Questions Added**
**Chinese**:
1. 电力储能用锂离子电池最新标准发布时间?
2. 如何测试电动汽车的充电性能?
3. 提供关于车辆通讯安全的法规
4. 自动驾驶L2和L3的定义
**English**:
1. When was the latest standard for lithium-ion batteries for power storage released?
2. How to test electric vehicle charging performance?
3. Provide regulations on vehicle communication security
4. Definition of L2 and L3 in autonomous driving
#### **🎨 Technical Implementation**
- **Custom Components**: Created `ExampleQuestionButton` component with proper TypeScript typing
- **CSS Enhancements**: Added responsive grid styles with mobile optimization
- **Architecture**: Seamlessly integrated with existing assistant-ui framework patterns
- **Language Detection**: Automatic language switching via URL parameters and browser detection
## v1.0.2 - 2025-08-26 🔧
### 🔧 **Error Handling & Code Quality Improvements**
#### **🛡️ DRY Error Handling System**
- **Backend Error Handler**: Added unified `error_handler.py` module with structured logging, decorators, and error categorization
- **Frontend Error Components**: Created ErrorBoundary and ErrorToast components with TypeScript support
- **Error Middleware**: Implemented centralized error handling middleware for FastAPI
- **Structured Logging**: JSON-formatted logs with timezone-aware timestamps
- **User-Friendly Messages**: Categorized error types (error/warning/network) with appropriate UI feedback
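A minimal sketch of such a decorator, assuming the described combination of structured JSON logging, error categorization, and a safe fallback return (names here are illustrative, not the actual `error_handler.py` API):

```python
import functools
import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger("service.error_handler")

def handle_errors(category="error"):
    """Hypothetical decorator: log a structured record, return a safe message."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            try:
                return fn(*args, **kwargs)
            except Exception as exc:
                # Timezone-aware timestamp + category, serialized as JSON
                logger.error(json.dumps({
                    "timestamp": datetime.now(timezone.utc).isoformat(),
                    "category": category,
                    "function": fn.__name__,
                    "message": str(exc),
                }))
                return {"error": "An internal error occurred. Please try again."}
        return wrapper
    return decorator
```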
#### **🌐 Error Message Internationalization**
- **English Default**: All user-facing error messages now default to English for better accessibility
- **Consistent Messaging**: Updated error handler to provide clear, professional English error messages
- **Frontend Updates**: ErrorBoundary component now displays English error messages
- **Backend Messages**: Standardized API error responses in English across all endpoints
#### **🐛 Bug Fixes**
- **Configuration Loading**: Fixed `NameError: 'config' is not defined` in `main.py` by restructuring config loading order
- **Service Startup**: Resolved backend startup issues in both foreground and background modes
- **Deprecation Warnings**: Updated `datetime.utcnow()` to `datetime.now(timezone.utc)` for future compatibility
- **Type Safety**: Fixed TypeScript type conflicts in frontend error handling components
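The deprecation fix amounts to a one-line substitution, replacing the naive-UTC call with an explicitly timezone-aware one:

```python
from datetime import datetime, timezone

# Before (deprecated, returns a naive datetime with no tzinfo):
#   ts = datetime.utcnow()
# After (timezone-aware, future-compatible):
ts = datetime.now(timezone.utc)
print(ts.isoformat())  # ISO 8601 with a +00:00 offset
```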
#### **🔄 Code Optimizations**
- **DRY Principles**: Eliminated code duplication in error handling across backend and frontend
- **Modular Architecture**: Separated error handling concerns into reusable, testable modules
- **Component Separation**: Split Toast functionality into distinct hook and component files
- **Clean Code**: Applied consistent naming conventions and removed redundant imports
---
## v1.0.1 - 2025-08-26 🔧
### 🔧 **Configuration Management Improvements**
#### **📋 Environment Configuration Extraction**
- **Centralized Configuration**: Extracted hardcoded environment settings to `config.yaml`
- `max_tool_rounds`: Maximum tool calling rounds (configurable, default: 3)
- `service.host` & `service.port`: Service binding configuration
- `search.standard_regulation_index` & `search.chunk_index`: Search index names
- `citation.base_url`: Citation link base URL for CAT system
- **Code Optimization**: Reduced duplicate `get_config()` calls in `graph.py` with module-level caching
- **Enhanced Maintainability**: Environment-specific values now externalized for easier deployment management
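An illustrative `config.yaml` fragment with the extracted keys might look like this (values are placeholders, not the production settings):

```yaml
max_tool_rounds: 3                 # maximum tool calling rounds
service:
  host: 0.0.0.0                    # service binding
  port: 8000
search:
  standard_regulation_index: "<your-standard-regulation-index>"
  chunk_index: "<your-chunk-index>"
citation:
  base_url: "https://<cat-host>/"  # citation link base URL for the CAT system
```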
#### **🚀 Performance Optimizations**
- **Configuration Caching**: Implemented `get_cached_config()` to avoid repeated configuration loading
- **Reduced Code Duplication**: Eliminated 4 duplicate `get_config()` calls across the workflow
- **Memory Efficiency**: Single configuration instance shared across the application
#### **✅ Quality Assurance**
- **Comprehensive Testing**: All configuration changes validated with existing test suite
- **Backward Compatibility**: No breaking changes to API or functionality
- **Configuration Validation**: Added verification of configuration loading and usage
---
## v1.0.0 - 2025-08-25 🎉
### 🚀 **STABLE RELEASE** - Agentic RAG System for Standards & Regulations
This marks the first stable release of our **Agentic RAG System** - a production-ready AI assistant for enterprise standards and regulations search and management.
---
### 🎯 **Core Features**
#### **🤖 Autonomous Agent Architecture**
- **LangGraph-Powered Workflow**: Multi-step autonomous agent using LangGraph OSS for intelligent tool orchestration
- **2-Phase Retrieval Strategy**: Intelligent metadata discovery followed by detailed content retrieval
- **Parallel Tool Execution**: Optimized parallel query processing for maximum information coverage
- **Multi-Round Intelligence**: Adaptive retrieval rounds based on information gaps and user requirements
#### **🔍 Advanced Retrieval System**
- **Dual Retrieval Tools**:
- `retrieve_standard_regulation`: Standards/regulations metadata discovery
- `retrieve_doc_chunk_standard_regulation`: Detailed document content chunks
- **Smart Query Optimization**: Automatic sub-query generation with bilingual support (Chinese/English)
- **Version Management**: Intelligent selection of latest published and current versions
- **Hybrid Search Integration**: Optimized for Azure AI Search's keyword + vector search capabilities
#### **💬 Real-time Streaming Interface**
- **Server-Sent Events (SSE)**: Real-time streaming responses with tool execution visibility
- **Assistant-UI Integration**: Modern conversational interface with tool call visualization
- **Progressive Enhancement**: Token-by-token streaming with tool progress indicators
- **Citation Tracking**: Real-time citation mapping and reference management
---
### 🛠 **Technical Architecture**
#### **Backend (Python + FastAPI)**
- **FastAPI Framework**: High-performance async API with comprehensive CORS support
- **PostgreSQL Memory**: Persistent conversation history with 7-day TTL
- **Configuration Management**: YAML-based configuration with environment variable support
- **Structured Logging**: JSON-formatted logs with request tracing and performance metrics
#### **Frontend (Next.js + Assistant-UI)**
- **Next.js 15**: Modern React framework with optimized performance
- **Assistant-UI Components**: Pre-built conversational UI elements with streaming support
- **Markdown Rendering**: Enhanced markdown with LaTeX formula support and external links
- **Responsive Design**: Mobile-friendly interface with dark/light theme support
#### **AI/ML Pipeline**
- **LLM Support**: OpenAI and Azure OpenAI integration with configurable models
- **Prompt Engineering**: Sophisticated system prompts with context-aware instructions
- **Citation System**: Automatic citation mapping with source tracking
- **Error Handling**: Graceful fallbacks with constructive user guidance
---
### 🔧 **Production Features**
#### **Memory & State Management**
- **PostgreSQL Integration**: Robust conversation persistence with automatic cleanup
- **Session Management**: User session isolation with configurable TTL
- **State Recovery**: Conversation context restoration across sessions
#### **Monitoring & Observability**
- **Structured Logging**: Comprehensive request/response logging with timing metrics
- **Error Tracking**: Detailed error reporting with stack traces and context
- **Performance Metrics**: Token usage tracking and response time monitoring
#### **Security & Reliability**
- **Input Validation**: Comprehensive request validation and sanitization
- **Rate Limiting**: Built-in protection against abuse
- **Error Isolation**: Graceful error handling without system crashes
- **Configuration Security**: Environment-based secrets management
---
### 📊 **Performance Metrics**
- **Response Time**: < 200ms for token streaming initiation
- **Context Capacity**: 100k tokens for extended conversations
- **Tool Efficiency**: Optimized "mostly 2" parallel queries strategy
- **Memory Management**: 7-day conversation retention with automatic cleanup
- **Concurrent Users**: Designed for enterprise-scale deployment
---
### 🎨 **User Experience**
#### **Intelligent Interaction**
- **Bilingual Support**: Seamless Chinese/English query processing and responses
- **Visual Content**: Smart image relevance checking and embedding
- **Citation Excellence**: Professional citation mapping with source links
- **Error Recovery**: Constructive suggestions when information is insufficient
#### **Professional Interface**
- **Tool Visualization**: Real-time tool execution progress with clear status indicators
- **Document Previews**: Rich preview of retrieved standards and regulations
- **Export Capabilities**: Easy copying and sharing of responses with citations
- **Accessibility**: WCAG-compliant interface design
---
### 🔄 **Deployment & Operations**
#### **Development Workflow**
- **UV Package Manager**: Fast, Rust-based Python dependency management
- **Hot Reload**: Development server with automatic code reloading
- **Testing Suite**: Comprehensive unit and integration tests
- **Documentation**: Complete API documentation and user guides
#### **Production Deployment**
- **Docker Support**: Containerized deployment with multi-stage builds
- **Environment Configuration**: Flexible configuration for different deployment environments
- **Health Checks**: Built-in health monitoring endpoints
- **Scaling Ready**: Designed for horizontal scaling and load balancing
---
### 📈 **Business Impact**
- **Enterprise Ready**: Production-grade system for standards and regulations management
- **Efficiency Gains**: Automated intelligent search replacing manual document review
- **Accuracy Improvement**: AI-powered relevance filtering and version management
- **User Satisfaction**: Intuitive interface with professional citation handling
- **Scalability**: Architecture supports growing enterprise needs
---
### 🎁 **What's Included**
- Complete source code with documentation
- Production deployment configurations
- Comprehensive testing suite
- User and administrator guides
- API documentation and examples
- Docker containerization setup
- Monitoring and logging configurations
---
### 🚀 **Getting Started**
```bash
# Clone and setup
git clone <repository>
cd agentic-rag-4
# Install dependencies
uv sync
# Configure environment
cp config.yaml.example config.yaml
# Edit config.yaml with your settings
# Start services
make dev-backend # Start backend service
make dev-web # Start frontend interface
# Access the application
open http://localhost:3000
```
---
**🎉 Thank you to all contributors who made this stable release possible!**
## v0.11.4 - 2025-08-25
### 📝 LLM Prompt Restructuring and Optimization
- **Major Workflow Restructuring**: Reorganized retrieval strategy for better clarity and efficiency
- **Simplified Workflow Structure**: Restructured "2-Phase Retrieval Strategy" section with clearer organization
- Combined retrieval phases under unified "Retrieval Strategy (for Standards/Regulations)" section
- Moved multi-round strategy explanation to the beginning for better flow
- **Enhanced Context Parameters**: Updated max_context_length from 96k to 100k tokens for better conversation handling
- **Query Strategy Optimization**: Refined sub-query generation approach
- Changed from "2-3 parallel rewritten queries" to "parallel rewritten queries" for flexibility
- Specified "2-3 (mostly 2)" for sub-query generation to optimize efficiency
- Reorganized language mixing strategy placement for better readability
- **Duplicate Rule Consolidation**: Added version selection rule to synthesis phase (step 4) for consistency
- Ensures version prioritization applies throughout the entire workflow, not just metadata discovery
- **Enhanced Error Handling**: Improved "No-Answer with Suggestions" section
- Added specific guidance to "propose 3-5 example rewrite queries" for better user assistance
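As a rough sketch of how a context budget like the 100k-token limit can be enforced when trimming history — the estimator below is a crude characters/4 heuristic, not the real tokenizer used by `message_trimmer.py`:

```python
def trim_to_context(messages, max_context_length=100_000,
                    estimate=lambda m: len(m) // 4):
    """Keep the most recent messages whose estimated token total fits the window."""
    kept, total = [], 0
    for msg in reversed(messages):          # walk newest -> oldest
        cost = estimate(msg)
        if total + cost > max_context_length:
            break                           # budget exhausted; drop older history
        kept.append(msg)
        total += cost
    return list(reversed(kept))             # restore chronological order
```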
### 🔧 Technical Improvements
- **Query Optimization**: Streamlined sub-query generation process for better performance
- **Workflow Consistency**: Ensured version selection rules apply consistently across all workflow phases
- **Parameter Tuning**: Increased context window capacity for handling longer conversations
### 🎯 Quality Enhancements
- **User Guidance**: Enhanced fallback suggestions with specific query rewrite examples
- **Retrieval Efficiency**: Optimized parallel query generation strategy
- **Version Management**: Extended version selection logic to synthesis phase for comprehensive coverage
### 📊 Impact
- **Performance**: More efficient query generation with "mostly 2" sub-queries approach
- **Consistency**: Unified version selection behavior across all workflow phases
- **User Experience**: Better guidance when retrieval yields insufficient results
- **Scalability**: Increased context capacity supports longer conversation histories
## v0.11.3 - 2025-08-25
### 📝 LLM Prompt Enhancement - Version Selection Rules
- **Standards/Regulations Version Management**: Added intelligent version selection logic to Phase 1 metadata discovery
- **Version Selection Rule**: Added rule to handle multiple versions of the same standard/regulation
- When retrieval results contain similar items (likely different versions), default to the latest published and current version
- Only applies when user hasn't specified a particular version requirement
- **Image Processing Enhancement**: Improved visual content handling instructions
- Added relevance check by reviewing `<figcaption>` before embedding images
- Ensures only relevant figures/images are included in responses
- **Terminology Refinement**: Updated "official version" to "published and current version" for better precision
- Reflects the concept of "发布的现行" ("published and current") - emphasizing both official publication and current validity
### 🎯 Quality Improvements
- **Smart Version Prioritization**: Enhanced metadata discovery to automatically select the most appropriate document versions
- **Visual Content Validation**: Added systematic approach to verify image relevance before inclusion
- **Linguistic Precision**: Improved terminology to better reflect regulatory document status
### 📊 Impact
- **User Experience**: Reduces confusion when multiple document versions are available
- **Content Quality**: Ensures responses include only relevant visual aids
- **Regulatory Accuracy**: Better alignment with how regulatory documents are categorized and prioritized
## v0.11.2 - 2025-08-24
### 🔧 Configuration and Development Workflow Improvements
- **LLM Prompt Configuration**: Enhanced prompt wording and removed redundant "ALWAYS" requirement for Phase 2 retrieval
- **Workflow Flexibility**: Changed "ALWAYS follow this 2-phase strategy for ANY standards/regulations query" to "Follow this 2-phase strategy for standards/regulations query"
- **Phase Organization**: Reordered Phase 1 metadata discovery sections for better logical flow (Purpose → Tool → Query strategy)
- **Clearer Tool Description**: Enhanced Phase 2 tool description for better clarity
- **Sub-query Generation**: Improved instructions for generating different rewritten sub-queries
- **Configuration Updates**:
- **Tool Loop Limit**: Commented out `max_tool_loops` setting in config to use default value (5 instead of 10)
- **Service Configuration**: Updated default `max_tool_loops` from 3 to 5 in AppConfig for better balance
- **Frontend Dependencies**: Added `rehype-raw` dependency for enhanced HTML processing in markdown rendering
### 🎯 Code Organization
- **Development Workflow**: Enhanced prompt management and configuration structure
- **Documentation**: Updated project structure to reflect latest changes and improvements
- **Dependencies**: Added necessary frontend packages for improved markdown and HTML processing
### 📝 Development Notes
- **Prompt Engineering**: Refined retrieval strategy instructions for more flexible execution
- **Configuration Management**: Simplified configuration by using sensible defaults
- **Frontend Enhancement**: Added support for raw HTML processing in markdown content
## v0.11.1 - 2025-08-24
### 📝 LLM Prompt Optimization
- **English Wording Improvements**: Comprehensive optimization of LLM prompt for better clarity and professional tone
- **Grammar and Articles**: Fixed grammatical issues and article usage throughout the prompt
- "for CATOnline system" → "for **the** CATOnline system"
- "information got from retrieval tools" → "information **retrieved from** search tools"
- "CATOnline is an standards" → "CATOnline is **a** standards"
- **Word Choice Enhancement**: Improved vocabulary and clarity
- "anwser questions" → "**answer** questions" (spelling correction)
- "Give a Citations Mapping" → "**Provide** a Citations Mapping"
- "Response in the user's language" → "**Respond** in the user's language"
- "refuse and redirect" → "**decline** and redirect"
- **Improved Flow and Structure**: Enhanced readability and professional presentation
- "maintain core intent" → "maintain **the** core intent"
- "in the below exact format" → "in the exact format **below**"
- "citations_map is as:" → "citations_map **is:**"
- **Technical Accuracy**: Fixed technical description issues in Phase 2 query strategy
- **Consistency**: Ensured parallel structure and consistent terminology throughout
### 🎯 Quality Improvements
- **Professional Tone**: Enhanced overall professionalism of AI assistant instructions
- **Clarity**: Improved instruction clarity for better LLM understanding and execution
- **Readability**: Better structured sections with clearer headings and formatting
## v0.11.0 - 2025-08-24
### 🔧 HTML Comment Filtering Fix
- **Streaming Response Cleanup**: Fixed HTML comments leaking to client in streaming responses
- **Robust HTML Comment Removal**: Implemented comprehensive filtering using regex pattern `<!--.*?-->` with DOTALL flag
- **Citations Map Protection**: Specifically prevents `<!-- citations_map ... -->` comments from reaching client
- **Multi-Point Filtering**: Applied filtering in both `call_model` and `post_process_node` functions
- **Token Accumulation Strategy**: Enhanced streaming logic to accumulate tokens and batch-filter HTML comments
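The accumulate-then-filter idea can be sketched as a generator. This is a simplified stand-in for the actual streaming logic in `call_model`/`post_process_node`; the tricky part it illustrates is holding back a comment opener that arrives split across chunks:

```python
import re

COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)

def filter_stream(chunks):
    """Accumulate streamed tokens and strip HTML comments, even across chunk boundaries."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        buffer = COMMENT_RE.sub("", buffer)   # drop complete comments
        cut = buffer.rfind("<!--")            # hold back an unfinished comment
        if cut == -1:
            # also hold back a split opener such as "<!" or "<!-" at the tail
            for prefix in ("<!-", "<!", "<"):
                if buffer.endswith(prefix):
                    cut = len(buffer) - len(prefix)
                    break
        if cut == -1:
            yield buffer
            buffer = ""
        else:
            yield buffer[:cut]
            buffer = buffer[cut:]
    # Flush: an unterminated comment at end-of-stream is dropped, not leaked
    yield re.sub(r"<!--.*", "", buffer, flags=re.DOTALL)
```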
### 🛡️ Security and Data Integrity
- **Client-Side Protection**: Ensured no internal processing comments are exposed to end users
- **Citation Processing**: Maintained proper citation functionality while filtering internal metadata
- **Content Integrity**: Preserved all legitimate markdown content including citation links and references
### 🧪 Comprehensive Validation
- **HTML Comment Filtering Test**: Created dedicated test script `test_html_comment_filtering.py`
- **1700+ Event Analysis**: Validated 1714 streaming events with zero HTML comment leakage
- **Real HTTP API Testing**: Used actual streaming endpoint for authentic validation
- **Pattern Detection**: Comprehensive regex pattern matching for all HTML comment variations
- **All Existing Tests Maintained**: Confirmed no regression in existing functionality
- **Unit Tests**: 41/41 passing
- **Multi-Round Tool Calls**: Working correctly
- **2-Phase Retrieval**: Functioning as expected
- **Streaming Response**: Clean and efficient
### 📊 Technical Implementation Details
- **Streaming Logic Enhancement**:
```python
# Remove HTML comments while preserving content
content = re.sub(r'<!--.*?-->', '', content, flags=re.DOTALL)
```
- **Performance Optimization**: Minimal impact on streaming performance through efficient regex processing
- **Error Handling**: Robust handling of edge cases in comment filtering
- **Backward Compatibility**: Full compatibility with existing citation and markdown processing
### 🎯 Quality Assurance Results
- **Zero HTML Comments**: No `<!-- citations_map ... -->` or other HTML comments found in client output
- **Citation Functionality**: All citation links and references render correctly
- **Streaming Performance**: No degradation in response time or user experience
- **Cross-Platform Testing**: Validated on multiple query types and response patterns
## v0.10.0 - 2025-08-24
### 🎯 Optimal Multi-Round Architecture Implementation
- **Streaming Only at Final Step**: Refactored architecture to follow optimal "streaming only at final step" pattern
- **Non-Streaming Planning**: All tool calling phases now use non-streaming LLM calls for better stability
- **Streaming Final Synthesis**: Only the final response generation step streams to the user
- **Tool Results Accumulation**: Enhanced AgentState with `Annotated[List[Dict[str, Any]], reducer]` for proper tool result aggregation
- **Temporary Tool Disabling**: Tools are automatically disabled during final synthesis phase to prevent infinite loops
- **Simplified Routing Logic**: Streamlined `should_continue` logic based on tool_calls presence rather than complex state checks
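The state shape this describes can be sketched as a `TypedDict` whose annotated reducer tells LangGraph how to merge updates; field names follow the changelog, and `operator.add` (list concatenation) stands in for the reducer function used in the repo:

```python
import operator
from typing import Annotated, Any, Dict, List, TypedDict

class AgentState(TypedDict):
    # LangGraph applies the annotated reducer (list concatenation here) when
    # merging tool_results updates emitted by the agent and tools nodes, so
    # results accumulate across rounds instead of overwriting each other.
    tool_results: Annotated[List[Dict[str, Any]], operator.add]
    tool_rounds: int
    max_tool_rounds: int
```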
### 🔧 Architecture Optimization
- **Enhanced State Management**: Improved AgentState design for robust multi-round execution
- Added `tool_results` accumulation with proper reducer function
- Enhanced `tool_rounds` tracking with automatic increment logic
- Simplified state updates and transitions between agent and tools nodes
- **Tool Execution Improvements**: Refined parallel tool execution and error handling
- Fixed tool disabling logic to prevent termination issues
- Enhanced logging for better debugging and monitoring
- Improved tool result processing and aggregation
- **Graph Flow Optimization**: Streamlined workflow routing for better reliability
- Simplified conditional routing logic
- Enhanced error handling and recovery mechanisms
- Improved final synthesis triggering and tool state management
### 🧪 Comprehensive Test Validation
- **All Tests Passing**: Achieved 100% test success rate across all test categories
- **Unit Tests**: 41/41 passed - Core functionality validated
- **Script Tests**: 10/10 passed - Multi-round, streaming, and 2-phase retrieval confirmed
- **Integration Tests**: Properly skipped (service-dependent tests)
- **Test Framework Improvements**: Enhanced script tests with proper async pytest decorators
- Fixed import order and pytest.mark.asyncio decorators in all script test files
- Resolved async function compatibility issues
- Improved test reliability and execution speed
### ✅ Feature Validation Complete
- **Multi-Round Tool Calls**: ✅ Automatic execution of 1-3 rounds confirmed via service logs
- **Parallel Tool Execution**: ✅ Concurrent tool execution within each round validated
- **2-Phase Retrieval Strategy**: ✅ Both metadata and content retrieval tools used systematically
- **Streaming Response**: ✅ Final response streams properly after all tool execution
- **Error Handling**: ✅ Robust error handling for tool failures, timeouts, and edge cases
- **Tool State Management**: ✅ Proper tool disabling during synthesis prevents infinite loops
### 📝 Documentation Updates
- **Implementation Notes**: Updated documentation to reflect optimal architecture
- **Test Coverage**: Comprehensive documentation of test validation results
- **Service Logs**: Confirmed multi-round behavior through actual service execution logs
## v0.9.0 - 2025-08-24
### 🎯 Multi-Round Parallel Tool Calling Implementation
- **Auto Multi-Round Tool Execution**: Implemented true automatic multi-round parallel tool calling capability
- Added `tool_rounds` and `max_tool_rounds` tracking to `AgentState` (default: 3 rounds)
- Enhanced agent node with round-based tool calling logic and round limits
- Fixed workflow routing to ensure final synthesis after completing all tool rounds
- Agent can now automatically execute multiple rounds of tool calls within a single user interaction
- Each round supports parallel tool execution for maximum efficiency
### 🔍 2-Phase Retrieval Strategy Enforcement
- **Mandatory 2-Phase Retrieval**: Fixed agent to consistently follow 2-phase retrieval for content queries
- **Phase 1**: Metadata discovery using `retrieve_standard_regulation`
- **Phase 2**: Content chunk retrieval using `retrieve_doc_chunk_standard_regulation`
- Updated system prompt to make 2-phase retrieval mandatory for content-focused queries
- Enhanced query construction with document_code filtering for Phase 2
- Agent now correctly uses both tools for queries requiring detailed content (testing methods, procedures, requirements)
### 🧪 Comprehensive Testing Framework
- **Multi-Round Test Suite**: Created extensive test scripts to validate new functionality
- `test_2phase_retrieval.py`: Validates both metadata and content retrieval phases
- `test_multi_round_tool_calls.py`: Tests multi-round automatic tool calling behavior
- `test_streaming_multi_round.py`: Confirms streaming works with multi-round execution
- All tests confirm proper parallel execution and multi-round behavior
### 🔧 Technical Enhancements
- **Workflow Routing Logic**: Improved `should_continue()` function for proper multi-round flow
- Enhanced routing logic to handle tool completion and round progression
- Fixed final synthesis routing after maximum rounds reached
- Maintained streaming response capability throughout multi-round execution
- **State Management**: Enhanced AgentState with round tracking and management
- **Tool Integration**: Verified both retrieval tools work correctly in multi-round scenarios
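A minimal sketch of the routing predicate, assuming messages may be either objects with a `tool_calls` attribute or plain dicts (the real `should_continue()` in `service/graph/graph.py` may differ in detail):

```python
def should_continue(state):
    """Route to tools while the last LLM message still requests them and rounds remain."""
    last = state["messages"][-1]
    wants_tools = bool(
        getattr(last, "tool_calls", None)
        or (isinstance(last, dict) and last.get("tool_calls"))
    )
    if wants_tools and state["tool_rounds"] < state["max_tool_rounds"]:
        return "tools"
    return "final_synthesis"   # no more tool calls, or max rounds reached
```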
### ✅ Validation Results
- **Multi-Round Capability**: ✅ Agent executes 1-3 rounds of tool calls automatically
- **Parallel Execution**: ✅ Tools execute in parallel within each round
- **2-Phase Retrieval**: ✅ Agent uses both metadata and content retrieval tools
- **Streaming Response**: ✅ Full streaming support maintained throughout workflow
- **Round Management**: ✅ Proper progression and final synthesis after max rounds
## v0.8.7 - 2025-08-24
### 🛠 Tool Modularization
- **Tool Code Organization**: Extracted tool definitions and schemas into separate module
- Created new `service/graph/tools.py` module containing all tool implementations
- Moved `retrieve_standard_regulation` and `retrieve_doc_chunk_standard_regulation` functions
- Added `get_tool_schemas()` and `get_tools_by_name()` utility functions
- Updated `service/graph/graph.py` to import tools from the new module
- Updated test imports to reference tools from the correct module location
- Improved code maintainability and separation of concerns
## v0.8.6 - 2025-08-24
### 🔧 Configuration Restructuring
- **LLM Configuration Separation**: Extracted LLM parameters and prompt templates to dedicated `llm_prompt.yaml`
- Created new `llm_prompt.yaml` file containing parameters and prompts sections
- Added support for loading both `config.yaml` and `llm_prompt.yaml` configurations
- Enhanced configuration models with `LLMParametersConfig` and `LLMPromptsConfig`
- Added `get_max_context_length()` method for consistent context length access
- Updated `message_trimmer.py` to use new configuration structure
- Maintains backward compatibility with legacy configuration format
### 📂 File Structure Changes
- **New file**: `llm_prompt.yaml` - Contains all LLM-related parameters and prompt templates
- **Updated**: `service/config.py` - Enhanced to support dual configuration files
- **Updated**: `service/graph/message_trimmer.py` - Uses new configuration method
## v0.8.5 - 2025-08-24
### 🚀 Performance Improvements
- **Parallel Tool Execution**: Fixed sequential tool calling to implement true parallel execution
- Modified `run_tools_with_streaming()` to use `asyncio.gather()` for concurrent tool calls
- Added proper error handling and result aggregation for parallel execution
- Improved tool execution performance when LLM calls multiple tools simultaneously
- Enhanced logging to track parallel execution completion
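The `asyncio.gather()` pattern described above looks roughly like this; the function and call shapes are illustrative, not the exact signatures in `run_tools_with_streaming()`:

```python
import asyncio

async def run_tools_parallel(tool_calls, tools_by_name):
    """Execute every tool call of a round concurrently and aggregate results."""
    async def run_one(call):
        try:
            result = await tools_by_name[call["name"]](**call["args"])
            return {"name": call["name"], "result": result}
        except Exception as exc:
            # Per-call error capture keeps one failed tool from sinking the round
            return {"name": call["name"], "error": str(exc)}
    # gather() preserves input order, so results line up with tool_calls
    return await asyncio.gather(*(run_one(c) for c in tool_calls))
```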
### 🔧 Technical Enhancements
- **Query Optimization Strategy**: Enhanced agent prompt to encourage multiple parallel tool calls
- Agent now generates 1-3 rewritten queries before retrieval
- Cross-language query generation (Chinese ↔ English) for broader coverage
- Optimized for Azure AI Search's Hybrid Search capabilities
- True parallel tool calling implementation in LangGraph workflow
## v0.8.4 - 2025-08-24
### 🚀 Agent Intelligence Improvements
- **Advanced Query Rewriting Strategy**: Enhanced agent system prompt with intelligent query optimization
- Added mandatory query rewriting step before retrieval tool calls
- Generates 1-3 rewritten queries to explore different aspects of user intent
- Cross-language query generation (Chinese ↔ English) for broader search coverage
- Optimized queries for Azure AI Search's Hybrid Search (keyword + vector search)
- Parallel retrieval tool calling for comprehensive information gathering
- Enhanced coverage through synonyms, technical terms, and alternative phrasings
## v0.8.3 - 2025-08-24
### 🎨 UI/UX Improvements
- **Citation Format Update**: Changed citation format from superscript HTML tags `<sup>1</sup>` to square brackets `[1]`
- Updated agent system prompt to use square bracket citations for improved readability
- Modified citation examples in configuration to reflect new format
- Enhanced Markdown compatibility with bracket-style citations
### 🔧 Configuration Updates
- **Agent System Prompt Optimization**: Enhanced prompt engineering for better query rewriting capabilities
- Added support for generating 1-3 rewritten queries based on conversation context
- Improved parallel tool calling workflow for comprehensive information retrieval
- Added cross-language query generation (Chinese ↔ English) for broader search coverage
- Optimized query text for Azure AI Search's Hybrid Search (keyword + vector search)
## v0.8.2 - 2025-08-24
### 🐛 Code Quality Fixes
- **Removed Duplicate Route Definitions**: Fixed main.py having duplicate endpoint definitions
- Removed duplicate `/api/chat`, `/api/ai-sdk/chat`, `/health`, and `/` route definitions
- Removed duplicate `if __name__ == "__main__"` blocks
- Standardized `/api/chat` endpoint to use proper SSE configuration (`text/event-stream`)
- **Code Deduplication**: Cleaned up redundant code that could cause routing conflicts
- **Consistent Headers**: Unified streaming response headers for better browser compatibility
## v0.8.1 - 2025-08-24
### 🧪 Integration Test Modernization
- **Complete Integration Test Rewrite**: Modernized all integration tests to match latest codebase features
- **Remote Service Testing**: All integration tests now connect to running service at `http://localhost:8000` using `httpx.AsyncClient`
- **LangGraph v0.6+ Compatibility**: Updated streaming contract validation for latest LangGraph features
- **PostgreSQL Memory Testing**: Added session persistence testing with PostgreSQL backend
- **AI SDK Endpoints**: Comprehensive testing of `/api/chat` and `/api/ai-sdk/chat` endpoints
### 🔄 Test Infrastructure Updates
- **Modern Async Patterns**: Converted all tests to use `pytest.mark.asyncio` and async/await
- **Server-Sent Events (SSE)**: Added streaming response validation with proper SSE format parsing
- **Citation Processing**: Testing of citation CSV format and tool result aggregation
- **Concurrent Testing**: Multi-session and rapid-fire request testing for performance validation
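The SSE parsing used by these tests can be sketched as a small helper that splits the stream into blank-line-delimited events and decodes `data:` payloads (a hypothetical `parse_sse_events`; the actual test helpers may differ):

```python
import json
from typing import Any


def parse_sse_events(raw: str) -> list[dict[str, Any]]:
    """Parse a Server-Sent Events payload into a list of event dicts.

    Each SSE event is a block of lines separated by a blank line; JSON
    payloads ride on `data:` lines. Non-JSON payloads (e.g. [DONE]) are
    kept under a "raw" key instead of being dropped.
    """
    events: list[dict[str, Any]] = []
    for block in raw.split("\n\n"):
        data_lines = [
            line[len("data:"):].strip()
            for line in block.splitlines()
            if line.startswith("data:")
        ]
        if not data_lines:
            continue
        payload = "\n".join(data_lines)
        try:
            events.append(json.loads(payload))
        except json.JSONDecodeError:
            events.append({"raw": payload})
    return events
```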
### 📁 Test File Organization
- **`test_api.py`**: Basic API endpoints, request validation, CORS/security headers, error handling
- **`test_full_workflow.py`**: End-to-end workflows, session continuity, real-world scenarios
- **`test_streaming_integration.py`**: Streaming behavior, performance, concurrent requests, content validation
- **`test_e2e_tool_ui.py`**: Complete tool UI workflows, multi-turn conversations, specialized queries
- **`test_mocked_streaming.py`**: Mocked streaming tests for internal validation without external dependencies
### 🎯 Test Coverage Enhancements
- **Real-World Scenarios**: Compliance officer and engineer research workflow testing
- **Performance Testing**: Response timing, large context handling, rapid request sequences
- **Error Recovery**: Session recovery after errors, timeout handling, malformed request validation
- **Content Validation**: Unicode support, encoding verification, response consistency testing
### ⚙️ Test Execution
- **Service Dependency**: Integration tests require running service (fail appropriately when service unavailable)
- **Flag-based Execution**: Use `--run-integration` flag to execute integration tests
- **Comprehensive Validation**: All tests validate response structure, streaming format, and business logic
## v0.8.0 - 2025-08-23
### 🚀 Major Changes - PostgreSQL Migration
- **Breaking Change**: Migrated session memory storage from Redis to PostgreSQL
- **Complete removal of Redis dependencies**: Removed `redis` and `langgraph-checkpoint-redis` packages
- **New PostgreSQL-based session persistence**: Using `langgraph-checkpoint-postgres` for robust session management
- **Azure Database for PostgreSQL**: Configured for production Azure environment with SSL security
- **7-day TTL**: Automatic cleanup of old conversation data with PostgreSQL-based retention policy
### 🔧 Session Memory Infrastructure
- **PostgreSQL Storage**: Implemented comprehensive session-level memory with PostgreSQL persistence
- Created `PostgreSQLCheckpointerWrapper` for complete LangGraph checkpointer interface compatibility
- Automatic schema migration and table creation via LangGraph PostgresSaver
- Robust connection pooling with `psycopg[binary]` driver
- Context-managed database connections with automatic cleanup
- **Backward Compatibility**: Full interface compatibility with existing Redis implementation
- All checkpointer methods (sync/async): `get`, `put`, `list`, `get_tuple`, `put_writes`, etc.
- Graceful fallback mechanisms for async methods not natively supported by PostgresSaver
- Thread-safe execution with proper async/sync method bridging
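The async/sync bridging can be sketched as follows (a minimal stand-in, assuming the underlying saver raises `NotImplementedError` for async methods; class and method names other than the pattern itself are illustrative):

```python
import asyncio
from typing import Any


class CheckpointerFallbackMixin:
    """Sketch of the async-to-sync bridge used when the underlying saver
    (e.g. PostgresSaver) has no native async support: fall back to the
    sync method, executed in a worker thread."""

    def get_tuple(self, config: dict) -> Any:  # sync path (stand-in)
        raise NotImplementedError

    async def _native_aget_tuple(self, config: dict) -> Any:
        # Stand-in for the saver's native async method.
        raise NotImplementedError

    async def aget_tuple(self, config: dict) -> Any:
        try:
            return await self._native_aget_tuple(config)
        except NotImplementedError:
            # No native async support here; bridge to the sync method.
            return await asyncio.to_thread(self.get_tuple, config)


class DemoCheckpointer(CheckpointerFallbackMixin):
    def get_tuple(self, config: dict) -> Any:
        return {"thread_id": config["thread_id"], "checkpoint": "ok"}
```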
### 🛠️ Technical Improvements
- **Configuration Updates**:
- Added `postgresql` configuration section to `config.yaml`
- Removed `redis` configuration sections completely
- Updated all logging and comments from "Redis" to "PostgreSQL"
- **Memory Management**:
- `PostgreSQLMemoryManager` for conditional PostgreSQL/in-memory checkpointer initialization
- Connection testing and validation during startup
- Improved error handling with detailed logging and connection diagnostics
- **Code Architecture**:
- Updated `AgenticWorkflow` to use PostgreSQL checkpointer for session memory
- Fixed variable name conflicts in `ai_sdk_chat.py` (config vs graph_config)
- Proper state management using `TurnState` objects in workflow execution
### 🐛 Bug Fixes
- **Workflow Execution**: Fixed async method compatibility issues with PostgresSaver
- Resolved `NotImplementedError` for `aget_tuple` and other async methods
- Added fallback to sync methods with proper thread pool execution
- Fixed LangGraph integration with correct `AgentState` format usage
- **Session History**: Restored conversation memory functionality
- Fixed session history loading and persistence across conversation turns
- Verified multi-turn conversations correctly remember previous context
- Ensured proper message threading with session IDs
### 🧹 Cleanup & Maintenance
- **Removed Legacy Code**:
- Deleted `redis_memory.py` and all Redis-related implementations
- Cleaned up temporary test files and development artifacts
- Removed all `__pycache__` directories
- Deleted obsolete backup and version files
- **Updated Documentation**:
- All code comments updated from Redis to PostgreSQL references
- Logging messages updated to reflect PostgreSQL usage
- Maintained existing API documentation and interfaces
### ✅ Verification & Testing
- **Functional Testing**: All core features verified working with PostgreSQL backend
- Chat functionality with tool calling and streaming responses
- Session persistence across multiple conversation turns
- PostgreSQL schema auto-creation and TTL cleanup functionality
- Health check endpoints and service startup/shutdown procedures
- **Performance**: No degradation in response times or functionality
- Maintained all existing streaming capabilities
- Tool execution and result processing unchanged
- Citation processing and response formatting intact
### 📈 Impact
- **Production Ready**: Fully migrated from Redis to Azure Database for PostgreSQL
- **Scalability**: Better long-term data management with relational database benefits
- **Reliability**: Enhanced data consistency and backup capabilities through PostgreSQL
- **Maintainability**: Simplified dependency management with single database backend
---
## v0.7.9 - 2025-08-23
### 🐛 Bug Fixes
- **Fixed**: Syntax errors in `service/graph/graph.py`
- Fixed type annotation errors with message parameters by adding proper type casting
- Fixed graph.astream call type errors by using proper `RunnableConfig` and `AgentState` typing
- Added missing `cast` import for better type handling
- Ensured compatibility with LangGraph and LangChain type system
---
## v0.7.8 - 2025-08-23
### 🔧 Configuration Updates
- **Breaking Change**: Replaced `max_tokens` with `max_context_length` in configuration
- **Added**: Optional `max_output_tokens` setting for LLM response length control
- Default: `None` (no output token limit)
- When set: Applied as `max_tokens` parameter to LLM calls
- Provides flexibility to limit output length when needed
- Updated conversation history management to use 96k context length by default
- Improved token allocation: 85% for conversation history, 15% reserved for responses
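A hypothetical `config.yaml` fragment showing the new keys (names follow this changelog; the real file may nest them differently):

```yaml
llm:
  max_context_length: 96000   # total context window budget
  max_output_tokens: null     # optional; when set, passed as max_tokens to the LLM
  # Token allocation: ~85% of max_context_length for conversation history,
  # ~15% reserved for the model's response.
```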
### 🔄 Conversation Management
- Enhanced conversation trimmer to handle larger context windows
- Updated trimming strategy to allow ending on AI messages for better conversation flow
- Improved error handling and fallback mechanisms in message trimming
### 📝 Documentation
- Updated conversation history management documentation
- Clarified distinction between context length and output token limits
- Added examples for optional output token limiting
---
## v0.7.7 - 2025-08-23
### Added
- **Conversation History Management**: Implemented automatic context length management
- Added `ConversationTrimmer` class to handle conversation history trimming
- Integrated with LangChain's `trim_messages` utility for intelligent message truncation
- Automatic token counting and trimming to prevent context window overflow
- Preserves system messages and maintains conversation validity
- Fallback to message count-based trimming when token counting fails
- Configurable token limits with 70% allocation for conversation history
- Smart conversation flow preservation (starts with human, ends with human/tool)
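The count-based fallback can be sketched as follows (an illustrative stand-in for the `ConversationTrimmer` fallback path, not its actual implementation):

```python
from dataclasses import dataclass


@dataclass
class Msg:
    role: str     # "system" | "human" | "ai" | "tool"
    content: str


def trim_by_count(messages: list[Msg], max_messages: int) -> list[Msg]:
    """Fallback count-based trimming: keep the system message, keep the
    most recent messages, and drop leading messages until the kept window
    starts with a human turn (preserving conversation validity)."""
    system = [m for m in messages if m.role == "system"]
    rest = [m for m in messages if m.role != "system"]
    budget = max(max_messages - len(system), 0)
    window = rest[-budget:] if budget else []
    # The kept window should start with a human turn.
    while window and window[0].role != "human":
        window = window[1:]
    return system + window
```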
### Enhanced
- **Context Window Protection**: Prevents API failures due to exceeded token limits
- Monitors conversation length and applies trimming when necessary
- Maintains conversation quality while respecting LLM context constraints
- Improves reliability for long-running conversations
## v0.7.6 - 2025-08-23
### Enhanced
- **Universal Tool Calling**: Implemented consistent forced tool calling across all query types
- Modified graph.py to always use `tool_choice="required"` for better DeepSeek compatibility
- Ensures reliable tool invocation for both technical and non-technical queries
- Provides consistent behavior across all LLM providers (Azure, OpenAI, DeepSeek)
- Maintains response quality while guaranteeing tool usage for retrieval-based queries
### Validated
- **DeepSeek Integration**: Comprehensive testing confirms optimal configuration
- Verified that ChatOpenAI with custom endpoints fully supports DeepSeek models
- Confirmed that forced tool calling resolves DeepSeek tool invocation issues
- Tested both technical queries (GB/T standards) and general queries (greetings)
- Established that current implementation requires no DeepSeek-specific handling
## v0.7.5 - 2025-01-18
### Improved
- **Code Simplification**: Removed unnecessary ChatDeepSeek dependency and complexity
- Simplified LLMClient to use only ChatOpenAI for all OpenAI-compatible endpoints (including custom DeepSeek)
- Removed unused `langchain-deepseek` dependency as ChatOpenAI handles custom DeepSeek endpoints perfectly
- Cleaned up _create_llm method by removing DeepSeek-specific handling logic
- Maintained full compatibility with existing tool calling functionality
- Code is now more maintainable and follows KISS principle
## v0.7.4 - 2025-08-23
### Fixed
- **OpenAI Provider Tool Calling**: Fixed DeepSeek model tool calling issues for custom endpoints
- Added `langchain-deepseek` dependency for better DeepSeek model support
- Modified LLMClient to use ChatOpenAI for custom DeepSeek endpoints (instead of ChatDeepSeek which only works with official api.deepseek.com)
- Implemented forced tool calling using `tool_choice="required"` for initial queries to ensure tool usage
- Enhanced agent system prompt to explicitly require tool usage for all information queries
- Resolved issue where DeepSeek models weren't calling tools consistently when using provider: openai
- Now both Azure and OpenAI providers (including custom DeepSeek endpoints) work correctly with tool calling
### Enhanced
- **System Prompt Optimization**: Improved agent prompts for better tool usage reliability
- Added explicit tool listing and mandatory workflow instructions
- Enhanced prompts specifically for GB/T standards and technical information queries
- Better handling of Chinese technical queries with forced tool retrieval
## v0.7.3 - 2025-08-23
### Fixed
- **Citation Display**: Fixed citation header visibility logic
- Modified `_build_citation_markdown` function to only display "### 📘 Citations:" header when valid citations exist
- Prevents empty citation sections from appearing when agent response doesn't contain citation mapping
- Improved user experience by removing unnecessary empty citation headers
## v0.7.2 - 2025-01-16
### Enhanced
- **Tool Conversation Context**: Added conversation history parameter support to retrieval tools
- Both `retrieve_standard_regulation` and `retrieve_doc_chunk_standard_regulation` now accept `conversation_history` parameter
- Enhanced agent node to autonomously use tools with conversation context for better multi-turn understanding
- Improved tool call responses with contextual information for citations mapping
- **Citation Processing**: Improved citation mapping and metadata handling
- Updated `_build_citation_markdown` to prioritize English titles over Chinese for internationalization
- Enhanced `_normalize_result` function with dynamic structure and selective field removal
- Removed noise fields (`@search.score`, `@search.rerankerScore`, `@search.captions`, `@subquery_id`) from tool responses
- Improved tool result metadata structure with `@tool_call_id` and `@order_num` for accurate citation mapping
- **Agent Optimization**: Refined autonomous agent workflow for better tool usage
- Function calling mode (not ReAct) to minimize LLM calls and token consumption
- Enhanced multi-step tool loops with improved context passing between tool calls
- Optimized retrieval API configurations with `include_trace: False` for cleaner responses
- **Session Management**: Improved session behavior for better user experience
- Changed session ID generation to create new session on every page refresh
- Switched from localStorage to sessionStorage for session ID persistence
- New sessions start fresh conversations while maintaining session isolation per browser tab
### Fixed
- **Tool Configuration**: Updated retrieval API field selections and search parameters
- Standardized field lists for `select`, `search_fields`, and `fields_for_gen_rerank` across tools
- Removed deprecated `timestamp` and `x_Standard_Code` fields from standard regulation tool
- Added missing metadata fields (`func_uuid`, `filepath`, `x_Standard_Regulation_Id`) for proper citation link generation
## v0.7.1 - 2025-01-16
### Fixed
- **Session Memory Bug**: Fixed critical multi-turn conversation context loss in webchat
- **Root Cause**: `ai_sdk_chat.py` was creating new `TurnState` for each request without loading previous conversation history from Redis/LangGraph memory
- **Additional Issue**: Frontend was generating new `session_id` for each request instead of maintaining persistent session
- **Solution**: Refactored to let LangGraph's checkpointer handle session history automatically using `thread_id`
- **Frontend Fix**: Added `useSessionId` hook to maintain persistent session ID in localStorage, passed via headers to backend
- **Implementation**: Removed manual state creation, pass only new user message and `session_id` to compiled graph
- **Validation**: Tested multi-turn conversations with same `session_id` - second message correctly references first message context
- **Session Isolation**: Verified different sessions maintain separate conversation contexts without cross-contamination
### Enhanced
- **Memory Integration**: Improved LangGraph session memory reliability
- Stream callback handling via contextvars for proper async streaming
- Automatic fallback to in-memory checkpointer when Redis modules unavailable
- Robust error handling for Redis connection issues while maintaining session functionality
- **Frontend Session Management**: Added persistent session ID management
- `useSessionId` React hook for localStorage-based session persistence
- Session ID passed via `X-Session-ID` header from frontend to backend
- Graceful fallback to generated session ID if none provided
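The header-with-fallback behavior can be sketched as follows (hypothetical helper; the real logic lives in the FastAPI route handler):

```python
import uuid


def resolve_session_id(headers: dict[str, str]) -> str:
    """Prefer the X-Session-ID header sent by the frontend; otherwise
    generate a fresh session ID. Header lookup is case-insensitive in
    real HTTP stacks, so normalize keys here."""
    normalized = {k.lower(): v for k, v in headers.items()}
    session_id = normalized.get("x-session-id", "").strip()
    return session_id or f"session-{uuid.uuid4().hex}"
```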
## v0.7.0 - 2025-08-22
### Added
- **Redis Session Memory**: Implemented robust session-level memory with Redis persistence
- Redis-based chat history storage with 7-day TTL using Azure Cache for Redis
- LangGraph `RedisSaver` integration for session persistence and state management
- Graceful fallback to `InMemorySaver` if Redis is unavailable or modules missing
- Session-level memory isolation using `thread_id` for proper conversation context
- Config validation with dedicated `RedisConfig` model for connection parameters
- Session memory verification tests confirming isolation and persistence
### Enhanced
- **Memory Architecture**: Refactored from simple in-memory store to session-based graph memory
- Migrated from `InMemoryStore` to LangGraph's checkpoint system
- Updated `AgenticWorkflow` graph to use `MessagesState` with Redis persistence
- Added `RedisMemoryManager` for conditional Redis/in-memory checkpointer initialization
- Session-based conversation tracking via `session_id` as LangGraph `thread_id`
## v0.6.2 - 2025-08-22
### Added
- **Stream Filtering for Citations Mapping**: Implemented intelligent filtering of citations mapping HTML comments from token stream
- Agent-generated citations mapping is now filtered from the client-side stream while preserved in the complete response
- Added buffer-based detection of HTML comment boundaries (`<!--` and `-->`)
- Ensures citations mapping CSV remains available for post-processing while not displaying to users
- Maintains complete response integrity in state for `post_process_node` to access citations mapping
- Enhanced token streaming logic with comment detection and filtering state management
### Improved
- **Optimized Stream Buffering Logic**: Enhanced token filtering to minimize latency
- Non-comment tokens are now sent immediately to client without unnecessary buffering
- Only potential HTML comment prefixes (`<`, `<!`, `<!-`) are buffered for detection
- Reduced buffer size from 10 characters to 4 characters (minimum needed for `<!--`)
- Improved user experience with faster token delivery for normal content
- **Citation List Block Return**: Changed citation list delivery from character-by-character streaming to single block return
- Citations are now sent as a complete markdown block in post-processing
- Improves rendering performance and reduces UI jitter
- Better user experience with instant citation list appearance
### Technical
- **Stream Token Filtering Logic**: Enhanced `call_model` function in agent node with sophisticated filtering
- Implements intelligent buffering that only delays tokens when necessary for comment detection
- Maintains filtering state to handle multi-token HTML comments
- Preserves all content in response while selectively filtering stream output
- Compatible with existing streaming protocol and post-processing pipeline
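The buffering logic described above can be sketched as a stand-alone state machine (illustrative; the real filtering is inlined in `call_model`):

```python
class CommentStreamFilter:
    """Filters HTML comments (<!-- ... -->) from a token stream while
    retaining the full, unfiltered text for post-processing."""

    def __init__(self) -> None:
        self.buffer = ""       # holds at most a potential "<!--" prefix (<= 4 chars)
        self.tail = ""         # last chars seen inside a comment, to spot "-->"
        self.in_comment = False
        self.full_text = ""    # complete response, comments included

    def feed(self, token: str) -> str:
        """Return the characters of `token` that are safe to stream now."""
        self.full_text += token
        out: list[str] = []
        for ch in token:
            if self.in_comment:
                self.tail = (self.tail + ch)[-3:]
                if self.tail == "-->":
                    self.in_comment = False
                continue
            self.buffer += ch
            if self.buffer == "<!--":
                self.in_comment = True
                self.buffer, self.tail = "", ""
            else:
                # Flush the longest front part that cannot start a comment;
                # non-comment tokens are emitted immediately.
                while self.buffer and not "<!--".startswith(self.buffer):
                    out.append(self.buffer[0])
                    self.buffer = self.buffer[1:]
        return "".join(out)

    def flush(self) -> str:
        """Emit any held (non-comment) prefix at end of stream."""
        out, self.buffer = self.buffer, ""
        return out
```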
## v0.6.1 - 2025-08-22
### Added
- **Citation List and Link Building**: Enhanced `post_process_node` to build complete citation lists with links
- Added citation mapping extraction from agent responses using CSV format in HTML comments
- Implemented citation markdown generation following `build_citations.py` logic
- Added automatic link generation for CAT system with proper URL encoding
- Added helper functions: `_extract_citations_mapping`, `_build_citation_markdown`, `_remove_citations_comment`
- **Frontend External Links Support**: Added `rehype-external-links` plugin for secure external link handling
- Installed `rehype-external-links` v3.0.0 dependency in web frontend
- Configured automatic `target="_blank"` and `rel="noopener noreferrer"` for external links
- Enhanced security and UX for citation links and external references
### Fixed
- **Chat UI Link Rendering**: Fixed links not being properly rendered in the chat interface
- Resolved component configuration conflict between `MyChat` and `AiAssistantMessage`
- Updated `AiAssistantMessage` to properly use `MarkdownText` component with external links support
- Added `@tailwindcss/typography` plugin for proper prose styling
- Enhanced link styling with blue color and hover effects
- Added intelligent content detection to handle both Markdown and HTML content
- Installed `isomorphic-dompurify` for safe HTML sanitization
- Enhanced Agent prompt to explicitly require Markdown-only output (no HTML tags)
### Changed
- **Enhanced Post-Processing**: `post_process_node` now processes citations mapping and generates structured citation lists
- Extracts citations mapping CSV from agent response HTML comments
- Builds proper citation markdown with document titles, headers, and clickable links
- Streams citation markdown to client for real-time display
- Maintains clean separation between agent response and citation processing
### Technical
- Added URL encoding support for document codes and titles
- Improved error handling in citation processing with fallback to error messages
- Maintained backward compatibility with existing streaming protocol
- Enhanced markdown rendering with proper external link security attributes
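The link-building step can be sketched as follows (field names and the URL pattern are illustrative assumptions, not the exact ones in `build_citations.py`):

```python
from urllib.parse import quote


def build_citation_markdown(citations: list[dict], base_url: str) -> str:
    """Render a citation list with URL-encoded links. The field names
    (`code`, `title_en`, `title_cn`) and URL layout are hypothetical.
    Returns an empty string when there are no citations, so no empty
    header is emitted."""
    if not citations:
        return ""
    lines = ["### 📘 Citations:"]
    for i, c in enumerate(citations, start=1):
        title = c.get("title_en") or c.get("title_cn") or "Untitled"
        code = c.get("code", "")
        # Encode the code and title so spaces and slashes survive in the URL.
        url = f"{base_url}/doc/{quote(code, safe='')}?title={quote(title, safe='')}"
        lines.append(f"{i}. [{code} {title}]({url})")
    return "\n".join(lines)
```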
## v0.6.0 - 2025-08-22
### Changed
- **Removed `agent_done` event**: The streaming protocol no longer includes the deprecated `agent_done` event.
- Removed handling in `AISDKEventAdapter` (`service/ai_sdk_adapter.py`).
- Cleaned up commented-out `create_agent_done_event` in `service/sse.py` and related imports in `service/graph/graph.py`.
- Updated tests to no longer expect `agent_done` events across unit and integration suites.
### Technical
- Simplified adapter logic by eliminating obsolete event type handling.
- Version bump to reflect breaking change in streaming protocol.
## v0.5.3 - 2025-01-27
### Fixed
- **Tool Result Retrieval**: Fixed agent not receiving tool results correctly
- Fixed tool node serialization in `service/graph/graph.py`
- Tool results now passed directly as dicts to agent instead of using `model_dump()`
- Agent can now correctly retrieve and use tool results in conversation flow
- Verified through SSE stream testing that tool results are properly transmitted
## v0.5.2 - 2025-01-27
### Changed
- **Simplified Data Structure**: Rewrote `_normalize_result` function to return dynamic data structure
- Returns `Dict[str, Any]` instead of rigid `RetrievalResult` class
- Automatically removes search-specific fields: `@search.score`, `@search.rerankerScore`, `@search.captions`, `@subquery_id`
- Removes empty fields (None, empty string, empty list, empty dict)
- Cleaner, more flexible result processing
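A minimal sketch of the dynamic normalization (the noise-field list comes from this changelog; the stand-in name and exact empty-value rules are illustrative):

```python
# Search-specific fields stripped from every result, per the changelog.
NOISE_FIELDS = {"@search.score", "@search.rerankerScore", "@search.captions", "@subquery_id"}


def normalize_result(raw: dict) -> dict:
    """Drop search-specific noise fields and empty values (None, empty
    string, empty list, empty dict), returning a plain dynamic dict."""
    return {
        key: value
        for key, value in raw.items()
        if key not in NOISE_FIELDS and value not in (None, "", [], {})
    }
```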
### Removed
- **Removed Schema Dependencies**: Eliminated `service/schemas/retrieval.py`
- No longer need `RetrievalResult` class or `metadata` field
- Simplified `RetrievalResponse` class moved inline to `agentic_retrieval.py`
- Reduced code complexity and maintenance overhead
### Technical
- Updated `AgenticRetrieval` class to use dynamic result normalization
- Maintained backward compatibility with existing tool interfaces
- Improved data processing efficiency
## v0.5.1 - 2025-01-27
### Added
- **Citations Mapping CSV**: Added citations mapping CSV functionality to agent responses
- Updated `agent_system_prompt` in `config.yaml` to instruct LLM to generate citations mapping CSV
- Citations mapping CSV format: `{citation_number},{tool_call_id},{search_result_code}`
- Citations mapping embedded in HTML comment at end of response: `<!-- citations_map ... -->`
- Includes brief example in system prompt for clarity
- Fully compatible with existing streaming and markdown processing
### Technical
- Verified agent node and post-processing node support citations mapping output
- Confirmed SSE streaming handles citations mapping within markdown content
- Created validation test script to verify output format
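Extraction of the mapping can be sketched as follows (the `citations_map` comment marker and CSV columns follow this changelog; the helper name is hypothetical):

```python
import re

# Matches the trailing HTML comment carrying the citations-mapping CSV.
CITATIONS_RE = re.compile(r"<!--\s*citations_map\s*(.*?)\s*-->", re.DOTALL)


def extract_citations_mapping(response: str) -> tuple[str, list[tuple[str, str, str]]]:
    """Return the response without the comment, plus parsed
    (citation_number, tool_call_id, search_result_code) rows."""
    match = CITATIONS_RE.search(response)
    if not match:
        return response, []
    rows: list[tuple[str, str, str]] = []
    for line in match.group(1).strip().splitlines():
        parts = [p.strip() for p in line.split(",", 2)]
        if len(parts) == 3:
            rows.append((parts[0], parts[1], parts[2]))
    cleaned = CITATIONS_RE.sub("", response).rstrip()
    return cleaned, rows
```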
## v0.5.0 - 2025-08-21
### Changed - Major Simplification
- **Simplified `post_process_node`**: Greatly simplified the post-processing node; it now returns only a simple summary of the number of tool-call result entries
- Removed the complex answer and citation extraction logic
- Removed the multiple post-append event streams and the special `tool_summary` event
- **Tool summary as a regular message**: The tool execution summary is now returned directly as a regular AI message, rendered in Markdown
- **Unified message handling**: Removed the special event-handling logic; tool summaries flow through the standard message stream and the frontend renders them as plain markdown
- Significantly reduces code complexity and maintenance cost, and improves generality
### Removed
- **Simplified `AgentState` fields**: Removed the `citations_mapping_csv` field from `AgentState`
- The field was used only for complex citation processing and is no longer needed
- Kept the `stream_callback` field, since it is used throughout the graph for event streaming
- Removed the `citations_mapping_csv` field from `TurnState` accordingly
- **Removed unused helper functions**:
- `_extract_citations_from_markdown()`: complex logic for extracting citations from Markdown
- `_generate_basic_citations()`: function for generating a basic citation mapping
- `create_post_append_events()`: function for creating the complex post-append event sequence (replaced by the simplified tool summary)
- `create_tool_summary_event()`: function for creating the special tool summary event (now handled as a regular message)
- Simplified the codebase by removing citation-processing logic that is no longer needed
- **Cleaned up the SSE module**: Removed business-specific event creation functions
- Deleted the `create_post_append_events()` and `create_tool_summary_event()` functions and their related tests
- The SSE module now contains only generic event-creation utility functions
- Improves the module's cohesion and reusability
### Added
- **Unified message-handling architecture**: Tool execution summaries are now processed through the standard LangGraph message stream
- Tool summaries are rendered in Markdown with a `**Tool Execution Summary**` heading
- The frontend renders them as plain markdown, with no special event-handling logic required
- Improves the system's generality and consistency
### Impact
- **Code complexity**: Significantly reduced the complexity of the post-processing logic
- **Maintainability**: An easier-to-understand and easier-to-maintain post-processing flow
- **Performance**: Less event-handling overhead and faster response times
- **Backward compatibility**: API interfaces remain compatible; only the internal implementation is simplified
## v0.4.9 - 2024-12-21
### Changed
- Renamed frontend directory: `web/src/lib` → `web/src/utils`
- Updated all related references to use the new directory structure
- Removed unused imports in `web/src/components/ToolUIs.tsx`
- Improved code-organization consistency: the utils directory more accurately reflects the nature of its utility functions
### Fixed
- Fixed frontend build errors: removed references to non-existent schemas
- Verified that the frontend builds successfully and the service runs normally
## v0.4.8 - 2024-12-21
### Removed
- Deleted the redundant `service/retrieval/schemas.py` file
- The static tool schemas it defined had been replaced by dynamic generation in graph.py
- Eliminates code duplication, simplifies maintenance, and avoids the risk of static and dynamic definitions drifting out of sync
### Improved
- Tool schemas are now generated entirely dynamically, based on tool object attributes
- Reduced code redundancy and improved maintainability
- Unified the way tool schemas are defined, ensuring consistency
### Technical
- Verified that the service still runs normally after the deletion
- Backward compatible, with no breaking changes
## [0.4.7] - 2024-12-21
### Refactored
- Restructured the code directory layout for better semantic clarity and modularity
- `service/tools/` → `service/retrieval/`
- `service/tools/retrieval.py` → `service/retrieval/agentic_retrieval.py`
- Updated all related import paths so the code structure is clearer and more professional
- Cleaned up Python cache files to avoid import conflicts
### Verified
- Verified that the service starts normally after the refactor and all features work correctly
- Tool calling, the agent flow, and the post-processing node all work as expected
- HTTP API calls and response streaming run smoothly
- No breaking changes; backward compatible
### Technical
- Improved code maintainability and readability
- Lays a better architectural foundation for future feature expansion
- Follows directory-naming conventions from Python project best practices
## [0.4.6] - 2024-12-21
### Improved
- Reduced the flashing frequency of icons during tool execution for a better visual experience
- Extended the pulse animation from 2 seconds to 3-4 seconds, making it less distracting
- Adjusted the opacity range from 0.6 to 0.75/0.85 for a softer effect
- Added a gentle scaling effect (pulse-gentle) to replace the strong opacity changes
- Added a small spinning loading indicator for better running-state feedback
- Optimized animation performance with smoother transitions
### Technical
- Added new CSS animation classes: animate-pulse-gentle, animate-spin-slow
- Improved the visual design of the tool UI loading state
- Provides multiple animation-intensity options to suit different user preferences
## [0.4.5] - 2024-12-21
### Fixed
- 修复工具调用抽屉展开后显示原始JSON的问题
- 为检索工具结果提供格式化显示,包含文档标题、评分、内容预览和元数据
- 添加"格式化显示/原始数据"切换按钮,用户可选择查看方式
- 改进结果展示的用户体验,文档内容支持行截断显示
- 添加CSS line-clamp工具类支持文本截断
### Improved
- 工具UI结果显示更加用户友好和直观
- 支持长文档内容的截断预览超过200字符自动截断
- 增强了检索结果的可读性,突出显示关键信息
## [0.4.4] - 2024-12-21
### Changed
- Completely refactored `/web` codebase for DRY and best practices
- Created unified `ToolUIRenderer` component with TypeScript strict typing
- Eliminated all `any` types and improved type safety throughout
- Simplified tool UI generation with generic `createToolUI` factory function
- Fixed all TypeScript compilation errors and ESLint warnings
- Added missing dependencies: `@langchain/langgraph-sdk`, `@assistant-ui/react-langgraph`
### Removed
- All legacy test directories and components (`simplified`, `ui-test`, `chat-simplified`)
- Duplicate tool UI components (`EnhancedAssistant.tsx`, `ModernAssistant.tsx`, etc.)
- Empty directories and backup files
- TypeScript `any` type usage across API routes
### Fixed
- React Hooks usage in assistant-ui tool render functions
- TypeScript strict type checking compliance
- Build process now passes without errors or warnings
- Proper module exports and imports throughout codebase
### Technical
- Codebase now fully compliant with assistant-ui + LangGraph v0.6.0+ best practices
- All components properly typed with TypeScript strict mode
- Single source of truth for UI logic with `Assistant.tsx` component
- DRY tool UI implementation reduces code duplication by ~60%
## [0.4.3] - 2024-12-21
### ⚙️ Web UI Best Practices Implementation
- Updated frontend `/web` using `@assistant-ui/react@0.10.43`, `@assistant-ui/react-ui@0.1.8`, `@assistant-ui/react-markdown@0.10.9`, `@assistant-ui/react-data-stream@0.10.1`
- Improved Next.js API routes under `/web/src/app/api` for AI SDK Data Stream Protocol compatibility and enhanced error handling
- Added `EnhancedAssistant`, `SimpleAssistant`, and `FrontendTools` React components demonstrating assistant-ui best practices
- Created `docs/topics/ASSISTANT_UI_BEST_PRACTICES.md` guideline documentation
- Added unit tests in `tests/unit/test_assistant_ui_best_practices.py` validating dependencies, config, API routes, components, and documentation
- Switched to `pnpm` for dependency management with updated install scripts (`pnpm install`, `pnpm dev`)
### ✅ Tests
- All existing and new unit tests and integration tests passed, including best practices validation tests
## v0.4.2 - 2025-08-20
### 🧹 Code Cleanup and Refactoring
**Code cleanup and refactoring**: Simplified the project structure and removed redundant code and configuration
#### File Refactoring
- **Renamed main file**: `improved_graph.py` → `graph.py` for simpler naming
- **Renamed function**: `build_improved_graph()` → `build_graph()` for naming consistency
- **Removed redundant files**: Deleted the old graph.py backup and temporary files
#### Configuration Cleanup
- **Slimmed down config.yaml**: Removed commented-out legacy configuration entries and redundant fields
- **Removed stale prompts**: Cleaned up legacy prompts and unused synthesis prompts
- **Unified logging configuration**: Simplified the logging configuration structure
#### Import Updates
- **Updated main module**: Revised the import statements in service/main.py
- **Cleared caches**: Removed all __pycache__ directories
#### Verification
- ✅ Service starts normally
- ✅ Health check passes
- ✅ API functionality works correctly
---
## v0.4.1 - 2025-08-20
### 🎨 Markdown Output Format Upgrade
**Major UX improvement**: Agent output format switched from JSON to Markdown for better readability and user experience
#### Core Improvements
- **Markdown output**: The agent now generates Markdown-formatted responses with structured headings, lists, and citations
- **Enhanced citation handling**: Added an `_extract_citations_from_markdown()` function to extract citation information from Markdown text
- **Backward compatibility**: The post-process node supports both legacy JSON and new Markdown responses
- **Smart format detection**: Automatically detects the response format and processes it accordingly
- **Full logging**: Added detailed debug logs tracing response-format detection and processing
#### Technical Implementation
- **System prompt update**: Modified agent_system_prompt to explicitly require Markdown output
- **Dual-format handling**: `post_process_node` now supports both JSON and Markdown formats
- **Streaming event validation**: Verified that all streaming events (tool_start, tool_result, tokens, agent_done) work correctly
- **Service restart detection**: Configuration changes require a service restart to take effect
#### Test Validation
- ✅ Streaming integration tests confirm Markdown output
- ✅ Event-stream validation passed
- ✅ Citation mappings generated correctly
- ✅ agent_done events sent correctly
---
## v0.4.0 - 2025-08-20
### 🚀 LangGraph v0.6.0+ Best Practices Implementation
**Major architecture upgrade**: Fully refactored the LangGraph implementation to follow v0.6.0+ best practices and achieve a truly autonomous agent workflow
#### Core Improvements
- **TypedDict state management**: Replaced `BaseModel` with `TypedDict`, fully conforming to the LangGraph v0.6.0+ standard
- **Function-calling agent**: Implemented a pure function-calling mode (instead of ReAct), reducing LLM calls and token consumption
- **Autonomous tool usage**: The agent automatically selects appropriate tools based on context and supports consecutive tool calls that build on earlier outputs
- **Integrated synthesis**: Folded the synthesis step into the agent node, eliminating an extra LLM call
#### Architecture Optimizations
- **Simplified workflow**: Agent → Tools → Agent → Post-process (closer to the standard LangGraph pattern)
- **Fewer LLM calls**: Reduced from 3 LLM calls to 1-2, significantly lowering token consumption
- **Standardized tool binding**: Uses LangChain `bind_tools()` and the standard tool schema
- **Improved state passing**: Follows the LangGraph `add_messages` pattern
#### Technical Details
- **New file**: `service/graph/improved_graph.py` - implements v0.6.0+ best practices
- **Agent system prompt**: Updated to a prompt that supports autonomous function calling
- **Tool execution**: Simplified execution logic while retaining streaming support
- **Post-processing node**: Handles only formatting and event emission; no longer calls the LLM
#### Testing & Validation
- **Test script**: `scripts/test_improved_langgraph.py` - validates the new implementation
- **Tool calling**: ✅ Automatically calls retrieve_standard_regulation and retrieve_doc_chunk_standard_regulation
- **Event stream**: ✅ Supports streaming events such as tool_start and tool_result
- **State management**: ✅ Correct TypedDict state passing
#### Configuration Updates
- **Added**: `agent_system_prompt` - a system prompt designed for the autonomous agent
- **Backward compatible**: Existing configuration and interfaces remain unchanged
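The conditional routing behind the Agent → Tools → Agent loop can be sketched as follows (an illustrative callback over plain dict messages; the real graph uses LangChain message objects with `conditional_edges`):

```python
def should_continue(state: dict) -> str:
    """Route to the tool node while the last AI message requests tool
    calls; otherwise fall through to post-processing."""
    last_message = state["messages"][-1]
    if last_message.get("tool_calls"):
        return "tools"
    return "post_process"
```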
## v0.3.6 - 2025-08-20
### Major LangGraph Optimization Implementation ⚡
- **LangGraph optimization officially implemented**: Completed the LangGraph best-practices rollout in production code
- **Refactored major components**:
- Replaced the custom workflow with `StateGraph`, `add_node`, and `conditional_edges`
- Adopted the `@tool` decorator pattern, improving DRY in tool definitions
- Simplified state management using the standard LangGraph `AgentState`
- Modularized node functions: `call_model`, `run_tools`, `synthesis_node`, `post_process_node`
### Technical Improvements
- **Code quality**: Follows the design patterns from the official LangGraph examples
- **Maintainability**: Less duplicated code; better readability and testability
- **Standardization**: Uses the community-accepted LangGraph workflow orchestration approach
- **Dependency management**: Added langgraph>=0.2.0 to project dependencies
### Performance & Architecture
- **Expected performance gain**: Based on earlier analysis, an estimated 35% performance improvement
- **Clearer control flow**: Uses conditional_edges for decision routing
- **Optimized tool execution**: Standardized the tool-calling and result-processing flow
- **Error handling**: Improved exception handling and fallback strategies
### Implementation Status
- ✅ Core LangGraph workflow implementation complete
- ✅ Tool decorator pattern implemented
- ✅ State management optimized
- ✅ Dependencies updated and imports fixed
- ✅ **All integration tests passed** (4/4, 100% success rate)
- ✅ **All unit tests passed** (20/20, 100% success rate)
- ✅ **Workflow validated**: Tool calling, streaming responses, and conditional routing work correctly
- ✅ **API compatibility**: Fully compatible with the existing frontend and interfaces
### Test Results
- **Core functionality**: Service health, API docs, and graph construction all normal
- **Workflow execution**: The call_model → tools → synthesis flow validated successfully
- **Tool calling**: Correct tool-call events detected (retrieve_standard_regulation, retrieve_doc_chunk_standard_regulation)
- **Streaming responses**: 376 SSE events received and processed correctly
- **Session management**: Multi-turn conversation works correctly
## v0.3.5 - 2025-08-20
### Research & Analysis
- **LangGraph implementation optimization research**
- **Official example analysis**: Studied the official assistant-ui-langgraph-fastapi example
- **Created a simplified version**: Implemented a simplified version based on LangGraph best practices (`simplified_graph.py`)
- **Performance comparison**: The simplified version is 35% faster than the current implementation, with 50% less code
- **Best practices applied**: Uses the `@tool` decorator, standard LangGraph patterns, and simplified state management
### Key Findings
- **More concise code**: Reduced from 400 to 200 lines of code
- **More standardized**: Follows LangGraph community conventions and best practices
- **Performance improvement**: 35% faster execution time
- **Maintainability**: A more modular and testable code structure
### Next Steps
- Bring the simplified version to feature parity with the current version
- Consider gradually migrating to the standard LangGraph pattern
- Preserve the existing SSE streaming and citation functionality
## v0.3.4 - 2025-08-20
### Housekeeping
- **Code directory organization**
- **Temporary script migration**: Moved all temporary test and demo scripts from `scripts/` to `tests/tmp/`
- **Script separation**: The `scripts/` directory now contains only production scripts (service management, etc.)
- **Clean architecture**: Improves code maintainability and the clarity of the directory structure
### Moved Files
- `scripts/startup_demo.py` → `tests/tmp/startup_demo.py`
- `scripts/test_startup_modes.py` → `tests/tmp/test_startup_modes.py`
### Directory Structure Clean-up
- **`scripts/`**: Contains only production scripts (start_service.sh, stop_service.sh, etc.)
- **`tests/tmp/`**: Contains all temporary test and demo scripts
- **`.tmp/`**: Contains temporary files for debugging and development
## v0.3.3 - 2025-08-20
### Enhanced
- **Service Startup Improvements**
- **Foreground by default**: The service now runs in the foreground by default, making development debugging and live log viewing easier
- **Graceful shutdown**: Foreground mode supports graceful shutdown via `Ctrl+C`
- **Multiple startup modes**: Foreground, background, and development modes
- **Improved script**: `scripts/start_service.sh` accepts `--background` and `--dev` flags
- **Enhanced Makefile**: New `make start-bg` target for background startup
- **Detailed guide**: New `docs/SERVICE_STARTUP_GUIDE.md` with full instructions
### Service Management Commands
- `make start` - Run in the foreground (default, recommended for development)
- `make start-bg` - Run in the background (suited for production)
- `make dev-backend` - Development mode (auto-reload)
- `make stop` - Stop the service
- `make status` - Check service status
### Script Options
- `./scripts/start_service.sh` - Run in the foreground (default)
- `./scripts/start_service.sh --background` - Run in the background
- `./scripts/start_service.sh --dev` - Development mode
### Documentation
- Added `docs/SERVICE_STARTUP_GUIDE.md` - detailed service startup guide
- Updated `README.md` - reflects the new startup modes and best practices
- Updated the Makefile help text
## v0.3.2 - 2025-08-20
### Enhanced
- **UI 优化 (UI Improvements)**
- **图标闪烁频率降低**: 将工具执行时的图标闪烁从快速脉冲改为2秒慢速脉冲 (`animate-pulse-slow`),减少视觉干扰
- **移除头像区域**: 隐藏助手和用户头像,为聊天内容提供更大显示空间
- **布局优化**: 将主容器最大宽度从 `max-w-4xl` 扩展到 `max-w-5xl`,充分利用移除头像后的额外空间
- **消息间距优化**: 增加助手回复内容区域上方的间距 (`margin-top: 1.5rem`),改善工具调用框与回答内容的视觉分离
- **自动隐藏滚动条**: 为聊天区域添加自动隐藏滚动条样式,提升视觉美观度
- **消息区域底色**: 为助手消息区域添加淡色背景 (`bg-muted/30`),提升内容可读性
- **等待动画效果**: 启用assistant-ui等待消息内容时的动画效果包括"AI is thinking..."指示器、类型输入点、工具调用微光效果和消息出现动画
- **工具状态颜色优化**: 优化工具调用进度文字颜色,使其符合整体设计系统色谱
- **工具状态对齐优化**: 调整工具调用进度文字位置,使其与工具标题横向对齐
- **CSS改进**: 通过CSS选择器隐藏头像元素调整消息布局以移除头像占用的空间
### Technical Details
- Added the `animate-pulse-slow` custom animation class (2-second cycle, opacity 0.6-1.0)
- Hid the `[data-testid="avatar"]` and `.aui-avatar` elements via CSS
- Set the message container's `margin-left` and `padding-left` to 0
- Tool icons use `animate-pulse-slow` instead of `animate-pulse`
- Added `margin-top: 1.5rem` to the assistant message content area to widen the gap from the tool-call box
- Scrollbar styling: `scrollbar-hide` (webkit) and `scrollbar-width: none` (firefox)
- assistant-ui waiting animations include:
- `.aui-composer-attachment-root[data-state="loading"]`: pulse animation while loading
- `.aui-message[data-loading="true"]`: typing-dots animation while a message is loading
- `.aui-tool-call[data-state="loading"]`: tool-call shimmer effect
- `.aui-thread[data-state="running"] .aui-composer::before`: "AI is thinking..." indicator
- Tool status color system:
- `.tool-status-running`: Primary blue (80% opacity) - running state
- `.tool-status-processing`: Warm amber (80% opacity) - processing state
- `.tool-status-complete`: Emerald green - completed state
- `.tool-status-error`: Destructive red (80% opacity) - error state
- Tool layout: `justify-between` aligns the title and status text horizontally
## v0.3.1 - 2025-08-20
### Enhanced
- **UI Animations**: Applied `assistant-ui` animation effects with fade-in and slide-in for tool calls and responses using custom Tailwind CSS utilities.
- **Tool Icons**: Configured `retrieve_standard_regulation` tool to use `legal-document.png` icon and `retrieve_doc_chunk_standard_regulation` to use `search.png`.
- **Component Updates**: Updated `ToolUIs.tsx` to integrate Next.js `Image` component for custom icons.
- **CSS Enhancements**: Defined custom keyframes and utility classes in `globals.css` for animation support.
- **Tailwind Config**: Added `tailwindcss-animate` and `@assistant-ui/react-ui/tailwindcss` plugins in `tailwind.config.ts`.
## v0.3.0 - 2025-08-20
### Added
- **Function-call based autonomous agent**
- LLM-driven dynamic tool selection and multi-round iteration
- Integration of `retrieve_standard_regulation` and `retrieve_doc_chunk_standard_regulation` tools via OpenAI function calling
- **LLM client enhancements**: `bind_tools()`, `ainvoke_with_tools()` for function-calling support
- **Agent workflow refactoring**: `AgentNode` and `AgentWorkflow` redesigned for autonomous execution
- **Configuration updates**: New prompts in `config.yaml` (`agent_system_prompt`, `synthesis_system_prompt`, `synthesis_user_prompt`)
- **Test scripts**: Added `scripts/test_autonomous_agent.py` and `scripts/test_autonomous_api.py`
- **Documentation**: Created `docs/topics/AUTONOMOUS_AGENT_UPGRADE.md` covering the new architecture
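The multi-round, LLM-driven tool selection described above can be sketched as a loop: the model either requests tool calls or returns a final answer, and the agent iterates until it stops asking for tools. This is a hypothetical stand-in, not the real Azure OpenAI client or the `bind_tools()`/`ainvoke_with_tools()` implementations; `fake_llm` and the stub tool body are illustrative assumptions.

```python
# Hypothetical function-calling loop: `fake_llm` stands in for the real LLM
# client; tool names match the changelog but their bodies are stubs.
import json

TOOLS = {
    "retrieve_standard_regulation": lambda query: [{"doc": "stub result", "score": 0.92}],
}

def fake_llm(messages):
    # Request a tool on the first turn; answer once tool output is present.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_calls": [{"name": "retrieve_standard_regulation",
                                "arguments": json.dumps({"query": messages[-1]["content"]})}]}
    return {"content": "Final answer based on retrieved regulations."}

def run_agent(user_query, llm=fake_llm, max_rounds=5):
    messages = [{"role": "user", "content": user_query}]
    for _ in range(max_rounds):
        reply = llm(messages)
        if "tool_calls" not in reply:
            return reply["content"]              # the agent decided it is done
        for call in reply["tool_calls"]:         # execute each requested tool
            args = json.loads(call["arguments"])
            result = TOOLS[call["name"]](**args)
            messages.append({"role": "tool", "name": call["name"],
                             "content": json.dumps(result)})
    return "Stopped after max_rounds without a final answer."

print(run_agent("truck length limits"))
```

The `max_rounds` guard is the usual safety net for autonomous loops: it bounds iteration when the model keeps requesting tools.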
### Changed
- Refactored RAG pipeline to function-call based autonomy
- Backward-compatible CLI/API endpoints and prompts maintained
### Fixed
- N/A
## v0.2.9
### Added
- **🌍 Multi-Language Support**
- **Automatic language detection**: The UI language switches automatically based on the browser's preferred language
- **URL parameter override**: A language can be forced via the `?lang=zh` or `?lang=en` URL parameter
- **Language switcher**: A convenient toggle button in the top-right corner of the page
- **Persistent preference**: The user's language choice is saved to localStorage
- **Full localization**: Covers all UI elements, including the page title, tool names, status messages, and button text
### Technical Features
- **i18n architecture**: Complete internationalization infrastructure
- Type-safe translation system (`lib/i18n.ts`)
- React Hook integration (`hooks/useTranslation.ts`)
- Live language-switching support
- **URL state sync**: The selected language is synced to the URL, so multilingual links can be shared directly
- **Event-driven updates**: Reactive language switching based on custom events
### Languages Supported
- **中文** (zh): Complete Chinese interface, including tool-call status and result display
- **English** (en): Complete English interface with accurate translations of technical terms
### User Experience
- **Smart defaults**:
1. Use the language specified by the URL parameter, if present
2. Otherwise use the user's saved language preference
3. Otherwise fall back to the browser's preferred language
- **Seamless switching**: Language changes take effect instantly, with no page refresh
- **Developer friendly**: Easy to add new languages; translation strings are managed centrally
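The three-step fallback above can be written as a single resolution function. The real logic lives in the TypeScript frontend (`lib/i18n.ts`); this is a Python port for illustration, and the function name is an assumption.

```python
# Illustrative port of the "smart defaults" resolution order:
# URL parameter -> saved preference -> browser languages -> default.
SUPPORTED = {"zh", "en"}

def resolve_language(url_param, stored_pref, browser_langs, default="en"):
    for candidate in (url_param, stored_pref, *browser_langs):
        if candidate:
            code = candidate.split("-")[0].lower()   # "zh-CN" -> "zh"
            if code in SUPPORTED:
                return code
    return default

print(resolve_language(None, None, ["fr-FR", "zh-CN"]))  # → zh
```

Unsupported candidates (here `fr-FR`) are skipped rather than treated as errors, so the chain always resolves to a usable language.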
## v0.2.8
### Enhanced
- **Tool UI Redesign**: Completely redesigned tool call UI with assistant-ui pre-built components
- **Drawer-style Interface**: Tool calls now display as collapsible cards by default, showing only name and status
- **Expandable Details**: Click to expand/collapse tool details (query, results, etc.)
- **Simplified Components**: Removed complex inline styling in favor of Tailwind CSS classes
- **Better UX**: Tool calls are less intrusive while remaining accessible
- **Status Indicators**: Clear visual feedback for running, completed, and error states
- **Chinese Localization**: Tool names and status messages in Chinese for better user experience
### Technical
- **Tailwind Integration**: Enhanced Tailwind config with full shadcn/ui color variables and animation support
- Added `tailwindcss-animate` dependency via pnpm
- Configured `@assistant-ui/react-ui/tailwindcss` with shadcn theme support
- Added comprehensive CSS variables for consistent theming
- **Component Architecture**: Improved separation of concerns with cleaner component structure
- **State Management**: Added local state management for tool expansion/collapse functionality
## v0.2.7
### Changed
- **Script Organization**: Moved `start_service.sh` and `stop_service.sh` into the `/scripts` directory for better structure.
- **Makefile Updates**: Updated `make start`, `make stop`, and `make dev-backend` to reference scripts in `/scripts`.
- **VSCode Tasks**: Adjusted `.vscode/tasks.json` to run service management scripts from `/scripts`.
## v0.2.6
### Fixed
- **Markdown Rendering**: Enabled rendering of assistant messages as markdown in the chat UI.
- Correctly pass `assistantMessage.components.Text` to the `Thread` component.
- Updated CSS import to use `@assistant-ui/react-markdown/styles/dot.css`.
### Added
- **MarkdownText Component**: Introduced `MarkdownText` via `makeMarkdownText()` in `web/src/components/ui/markdown-text.tsx`.
- **Thread Configuration**: Updated `web/src/app/page.tsx` to configure `Thread` for markdown with `assistantMessage.components`.
### Changed
- **CSS Imports**: Replaced incorrect markdown CSS imports in `globals.css` with the correct path from `@assistant-ui/react-markdown`.
## v0.2.5
### Fixed
- **React Infinite Loop Error**: Resolved "Maximum update depth exceeded" error in tool UI registration
- **Problem**: Incorrect usage of the useToolUIs hook caused a setState loop, resulting in infinite forceStoreRerender calls
- **Solution**: Adopted correct assistant-ui pattern - direct component usage instead of manual registration
- **Implementation**: Place tool UI components directly inside AssistantRuntimeProvider (not via setToolUI)
- **UI Stability**: The frontend now loads normally with no React runtime errors
### Added
- **Tool UI Components**: Implemented custom assistant-ui tool UI components for enhanced user experience
- **RetrieveStandardRegulationUI**: Visual component for standard regulation search with query display and result summary
- **RetrieveDocChunkStandardRegulationUI**: Visual component for document chunk retrieval with content preview
- **Tool UI Registration**: Proper registration system using useToolUIs hook and setToolUI method
- **Visual Feedback**: Tool calls now display as interactive UI elements instead of raw JSON data
### Enhanced
- **Interactive Tool Display**: Tool calls now rendered as branded UI components with:
- 🔍 Search icons and status indicators (Searching... / Processing...)
- Query display with formatted text
- Result summaries with document codes, titles, and content previews
- Color-coded status (blue for running, green/orange for results)
- Responsive design with proper spacing and typography
### Technical
- **Frontend Architecture**: Updated page.tsx to properly register tool UI components
- Import useToolUIs hook from @assistant-ui/react
- Created ToolUIRegistration component for clean separation of concerns
- TypeScript-safe implementation with proper type handling for args, result, and status
## v0.2.4
### Fixed
- **Post-Append Events Display**: Fixed missing UI display of post-processing events
- **Problem**: Last 3 post-append events were sent as type 2 (data) events but not displayed in UI
- **Solution**: Modified AI SDK adapter to convert post-append events to visible text streams
- **post_append_2**: Tool execution summary now displays as formatted text: "🛠️ **Tool Execution Summary**"
- **post_append_3**: Notice message now displays as formatted text: "⚠️ **AI can make mistakes. Please check important info.**"
- **UI Compliance**: All three post-append events now visible in assistant-ui interface
### Enhanced
- **User Experience**: Post-processing information now properly integrated into chat flow
- Tool execution summaries provide transparency about backend operations
- Warning notices ensure users are informed about AI limitations
- Formatted display improves readability and user awareness
## v0.2.3
### Verified
- **Post-Processing Node Compliance**: Confirmed full compliance with prompt.md specification
- ✅ Post-append event 1: Agent's final answer + citations_mapping_csv (excluding tool raw prints)
- ✅ Post-append event 2: Consolidated printout of all tool call outputs used for this turn
- ✅ Post-append event 3: Trailing notice "AI can make mistakes. Please check important info."
- All three events sent in correct order after agent completion
- Events properly formatted in AI SDK Data Stream Protocol (type 2 - data events)
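The required ordering can be sketched as a small generator that emits the three post-append events after the agent's answer stream completes. The event names follow the verification log above; the payload field names are illustrative, not the exact internal schema.

```python
# Sketch of the three post-append events, in the order the spec requires.
# Payload field names are assumptions for illustration.
def post_append_events(answer, citations_csv, tool_outputs):
    yield {"event": "post_append_1",
           "data": {"answer": answer, "citations_mapping_csv": citations_csv}}
    yield {"event": "post_append_2",
           "data": {"tool_outputs": tool_outputs}}
    yield {"event": "post_append_3",
           "data": {"notice": "AI can make mistakes. Please check important info."}}

events = list(post_append_events("answer text", "id,url\n1,https://example.com",
                                 ["tool A output", "tool B output"]))
print([e["event"] for e in events])
```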
### Debugging Tools Added
- **Debug Scripts**: Added comprehensive debugging utilities for post-processing verification
- `debug_ai_sdk_raw.py`: Inspects raw AI SDK endpoint responses for post-append events
- `test_post_append_final.py`: Validates all three post-append events in correct order
- `debug_post_append_format.py`: Analyzes post-append event structure and content
- Server-side logging in PostProcessNode for event generation verification
### Tests
- **Post-Append Compliance Test**: Complete validation of prompt.md requirements
- ✅ Total chunks: 864, all post-append events found at correct positions (861, 862, 863)
- ✅ Post-append 1: Contains answer (854 chars) + citations (494 chars)
- ✅ Post-append 2: Contains tool outputs (2 tools executed)
- ✅ Post-append 3: Contains exact notice message as specified
- **Final Result**: FULLY COMPLIANT with prompt.md specification
## v0.2.2
### Fixed
- **UI Content Display**: Fixed PostProcessNode content not appearing in assistant-ui interface
- Modified AI SDK adapter to stream final answers as text events (type 0)
- Updated adapter to extract answer content from post_append_1 events correctly
- Fixed event formatting to ensure proper UI rendering compatibility
### Tests
- **Integration Test Success**: Complete workflow validation confirms perfect system integration
- ✅ AI SDK endpoint streaming protocol fully operational
- ✅ Tool call events (type 9) and tool result events (type a) working correctly
- ✅ Text streaming events (type 0) rendering final answers properly
- ✅ Assistant-ui compatibility with LangGraph backend confirmed
- **Test Results**: 2 tool calls, 2 tool results, 509 text events, 1 finish event
- **Content Validation**: Complete answer with citations, references, and proper formatting
- **UI Rendering**: Real-time streaming display with tool execution visualization
## v0.2.1
### Fixed
- **Message Format Compatibility**: Fixed assistant-ui to backend message format conversion
- assistant-ui sends `content: [{"type": "text", "text": "message"}]` array format
- Backend expects `content: "message"` string format
- Added transformation logic in `/web/src/app/api/chat/route.ts` to convert formats
- Resolved Pydantic validation error: "Input should be a valid string [type=string_type]"
- **End-to-End Chat Flow**: Verified complete user input → format conversion → tool execution → streaming response pipeline
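The conversion itself is a small transform. The real implementation lives in TypeScript in `/web/src/app/api/chat/route.ts`; this is a Python sketch of the same idea, with the function name as an assumption.

```python
# Sketch of the message-format conversion: assistant-ui sends content as a
# list of typed parts, while the backend expects a plain string.
def to_backend_message(msg):
    content = msg["content"]
    if isinstance(content, list):
        # Join only the text parts: [{"type": "text", "text": "..."}] -> "..."
        content = "".join(part["text"] for part in content if part.get("type") == "text")
    return {"role": msg["role"], "content": content}

ui_msg = {"role": "user", "content": [{"type": "text", "text": "hello"}]}
print(to_backend_message(ui_msg))  # → {'role': 'user', 'content': 'hello'}
```

Messages whose content is already a string pass through unchanged, which keeps the route backward compatible.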
### Added
- **Assistant-UI Integration**: Complete integration with @assistant-ui/react framework for professional chat interface
- **Data Stream Protocol**: Full implementation of Vercel AI SDK Data Stream Protocol for real-time streaming
- **Custom Tool UIs**: Rich visual components for different tool types:
- Document retrieval UI with relevance scoring and source information
- Web search UI with result links and snippets
- Python code execution UI with stdout/stderr display
- URL fetching UI with page content preview
- Code analysis UI with suggestions and feedback
- **Next.js 15 Frontend**: Modern React 19 + TypeScript + Tailwind CSS v3 web application
- **Responsive Design**: Mobile-friendly interface with dark/light theme support
- **Streaming Visualization**: Real-time display of AI reasoning steps and tool executions
### Enhanced
- **Simplified UI Architecture**: Streamlined web interface with minimal code and default styling
- Removed custom tool UI components in favor of assistant-ui defaults
- Reduced `/web/src/app/page.tsx` to essential AssistantRuntimeProvider and Thread components
- Simplified `/web/src/app/globals.css` to basic reset and assistant-ui imports only
- Minimized `/web/tailwind.config.ts` configuration for cleaner build
- Removed unnecessary dependencies for lighter bundle size
- **Backend Protocol Compliance**: Updated AI SDK adapter to match official Data Stream Protocol specification
- **Event Format**: Standardized to `TYPE_ID:JSON\n` format for all streaming events
- **Tool Call Visualization**: Step-by-step visualization of multi-tool workflows
- **Error Handling**: Comprehensive error states and recovery mechanisms
- **Performance**: Optimized streaming and rendering for smooth user experience
### Technical Implementation
- **Protocol Mapping**: Proper mapping of LangGraph events to Data Stream Protocol types:
- Type 0: Text streaming (tokens)
- Type 9: Tool calls with arguments
- Type a: Tool results
- Type d: Message completion
- Type 3: Error handling
- **Runtime Integration**: `useDataStreamRuntime` for seamless assistant-ui integration
- **API Proxy**: Next.js API route for backend communication with proper headers
- **Component Architecture**: Modular tool UI components with makeAssistantToolUI
### Integration Testing Results ✅
- **Frontend Service**: Successfully deployed on localhost:3000 with Next.js 15 + Turbopack
- **Backend Service**: Healthy and responsive on localhost:8000 (FastAPI + LangGraph)
- **API Proxy**: Correct routing from `/api/chat` to backend AI SDK endpoint with format conversion
- **Message Format**: assistant-ui array format correctly converted to backend string format
- **Streaming Protocol**: Data Stream Protocol events properly formatted and transmitted
- **Tool Execution**: Multi-step tool calls working (retrieve_standard_regulation, etc.)
- **UI Rendering**: assistant-ui components properly rendered with default styling
- **End-to-End Flow**: Complete user query → tool execution → streaming response pipeline verified
- Format conversion: assistant-ui array format → backend string format
- Tool execution validation: retrieve_standard_regulation, retrieve_doc_chunk_standard_regulation
- Real-time streaming with proper Data Stream Protocol compliance
- Content relevance verification: automotive safety standards and testing procedures
### Documentation
- **Protocol Reference**: Enhanced `docs/topics/AI_SDK_UI.md` with implementation details
- **Integration Guide**: Comprehensive setup and testing procedures
- **API Compatibility**: Dual endpoint support for legacy and modern integrations
## v0.1.7
### Changed
- **Simplified Web UI**: Replaced Tailwind CSS with inline styles for simpler, more maintainable code
- **Reduced Dependencies**: Removed complex styling frameworks in favor of vanilla CSS-in-JS approach
- **Cleaner Interface**: Simplified chatbot UI with essential functionality and clean default styling
- **Streamlined Code**: Reduced component complexity by removing unnecessary features like timestamps and session display
### Improved
- **Code Maintainability**: Easier to understand and modify without external CSS framework dependencies
- **Performance**: Lighter bundle size without Tailwind CSS classes
- **Accessibility**: Cleaner DOM structure with semantic HTML and inline styles
### Removed
- **Tailwind CSS Classes**: Replaced complex utility classes with simple inline styles
- **Timestamp Display**: Removed message timestamps for cleaner interface
- **Session ID Display**: Simplified footer by removing session information
- **Complex Animations**: Simplified loading indicators and removed complex animations
### Technical Details
- Maintained all core functionality (streaming, error handling, message management)
- Preserved AI SDK Data Stream Protocol compatibility
- Kept responsive design with percentage-based layouts
- Used standard CSS properties for styling (flexbox, basic colors, borders)
## v0.1.6
### Fixed
- **Web UI Component Error**: Resolved "The default export is not a React Component in '/page'" error caused by empty `page.tsx` file
- **AI SDK v5 Compatibility**: Fixed compatibility issues with Vercel AI SDK v5 API changes by implementing custom streaming solution
- **TypeScript Errors**: Resolved compilation errors related to deprecated `useChat` hook properties in AI SDK v5
- **Frontend Dependencies**: Ensured all required AI SDK dependencies are properly installed and configured
### Changed
- **Custom Streaming Implementation**: Replaced AI SDK v5 `useChat` hook with custom streaming solution for better control and compatibility
- **Direct Protocol Handling**: Implemented direct AI SDK Data Stream Protocol parsing in frontend for real-time message updates
- **Enhanced Error Handling**: Added comprehensive error handling for network issues and streaming failures
- **Message State Management**: Improved message state management with TypeScript interfaces and proper typing
### Technical Implementation
- **Custom Stream Reader**: Implemented `ReadableStream` processing with `TextDecoder` for chunk-by-chunk data handling
- **Protocol Parsing**: Direct parsing of AI SDK protocol lines (`0:`, `9:`, `a:`, `d:`, `2:`) in frontend
- **Real-time Updates**: Optimized message content updates during streaming for smooth user experience
- **Session Management**: Added session ID generation and tracking for conversation context
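The direct protocol parsing described above amounts to splitting each line at the first `:` and decoding the JSON payload. This is a minimal Python sketch of what the frontend stream reader does (the real code is TypeScript); only the type ids mentioned in this changelog are mapped, and payload handling is simplified.

```python
# Minimal parser for AI SDK Data Stream Protocol lines ("TYPE_ID:JSON\n").
import json

EVENT_TYPES = {"0": "text", "9": "tool_call", "a": "tool_result",
               "d": "finish", "2": "data", "3": "error"}

def parse_stream_lines(raw):
    events = []
    for line in raw.splitlines():
        if not line or ":" not in line:
            continue                              # skip blanks and malformed lines
        type_id, payload = line.split(":", 1)     # split at the FIRST colon only
        kind = EVENT_TYPES.get(type_id)
        if kind:
            events.append((kind, json.loads(payload)))
    return events

chunk = '0:"Hel"\n0:"lo"\nd:{"finishReason":"stop"}\n'
for kind, payload in parse_stream_lines(chunk):
    print(kind, payload)
```

Note that type-0 payloads are JSON strings, so streamed tokens arrive already quoted and are decoded like any other payload.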
### Validated
- ✅ Frontend compiles without TypeScript errors
- ✅ Chat interface loads successfully at http://localhost:3000
- ✅ Custom streaming implementation works with backend AI SDK endpoint
- ✅ Real-time message updates during streaming responses
- ✅ Error handling for failed requests and network issues
## v0.1.5
### Added
- **Web UI Chatbot**: Created comprehensive Next.js chatbot interface using Vercel AI SDK Elements in `/web` directory
- **AI SDK Protocol Adapter**: Implemented `service/ai_sdk_adapter.py` to convert internal SSE events to Vercel AI SDK Data Stream Protocol
- **AI SDK Compatible Endpoint**: Added new `/api/ai-sdk/chat` endpoint for frontend integration while maintaining backward compatibility
- **Frontend API Proxy**: Created Next.js API route `/api/chat/route.ts` to proxy requests between frontend and backend
- **Streaming UI Components**: Integrated real-time streaming display for tool calls, intermediate steps, and final answers
- **End-to-End Testing**: Added `test_ai_sdk_endpoint.py` for backend AI SDK endpoint validation
### Changed
- **Protocol Implementation**: Fully migrated to Vercel AI SDK Data Stream Protocol (SSE) for client-service communication
- **Event Type Mapping**: Enhanced event handling to support AI SDK protocol types (`9:`, `a:`, `0:`, `d:`, `2:`)
- **Multi-line SSE Processing**: Improved adapter to correctly handle multi-line SSE events from internal system
- **Frontend Architecture**: Established modern React-based chat interface with TypeScript and Tailwind CSS
### Technical Implementation
- **Frontend Stack**: Next.js 15.4.7, Vercel AI SDK (`ai`, `@ai-sdk/react`, `@ai-sdk/ui-utils`), TypeScript, Tailwind CSS
- **Backend Adapter**: Protocol conversion layer between internal LangGraph events and AI SDK format
- **Streaming Pipeline**: End-to-end streaming from LangGraph → Internal SSE → AI SDK Protocol → Frontend UI
- **Tool Call Visualization**: Real-time display of multi-step agent workflow including retrieval and generation phases
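The emit side of the adapter can be sketched as a per-event formatter producing `TYPE_ID:JSON\n` lines. This is an illustrative stand-in for `service/ai_sdk_adapter.py`, not its actual code; the internal event shape and field names here are assumptions.

```python
# Sketch of the protocol-conversion direction: internal event -> AI SDK line.
# The internal event schema shown here is illustrative.
import json

def to_ai_sdk_line(event):
    kind = event["type"]
    if kind == "token":
        return "0:" + json.dumps(event["delta"]) + "\n"
    if kind == "tool_call":
        return "9:" + json.dumps({"toolCallId": event["id"],
                                  "toolName": event["name"],
                                  "args": event["args"]}) + "\n"
    if kind == "tool_result":
        return "a:" + json.dumps({"toolCallId": event["id"],
                                  "result": event["result"]}) + "\n"
    if kind == "done":
        return 'd:{"finishReason":"stop"}\n'
    raise ValueError(f"unmapped internal event type: {kind}")

print(to_ai_sdk_line({"type": "token", "delta": "Hi"}), end="")  # → 0:"Hi"
```

Using `json.dumps` for every payload (including plain token strings) is what keeps the output parseable by the generic line parser on the frontend.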
### Validated
- ✅ Backend AI SDK endpoint streaming compatibility
- ✅ Frontend-backend protocol integration
- ✅ Tool call event mapping and display
- ✅ Multi-line SSE event parsing
- ✅ End-to-end chat workflow functionality
- ✅ Service deployed and accessible at http://localhost:3001
### Documentation
- **Protocol Reference**: Enhanced `docs/topics/AI_SDK_UI.md` with implementation details
- **Integration Guide**: Comprehensive setup and testing procedures
- **API Compatibility**: Dual endpoint support for legacy and modern integrations
## v0.1.4
### Fixed
- **Streaming Token Display**: Fixed streaming test script to correctly read token content from `delta` field
- **Event Parsing**: Resolved issue where streaming logs showed empty answer tokens due to incorrect field access
- **Stream Validation**: Verified streaming API returns proper token content and LLM responses
### Added
- **Debug Script**: Added `debug_llm_stream.py` to inspect streaming chunk structure and validate token flow
- **Stream Testing**: Enhanced streaming test with proper token parsing and validation
### Changed
- **Test Script Enhancement**: Updated `scripts/test_real_streaming.py` to display actual streamed tokens correctly
- **Event Processing**: Improved streaming event parsing and display logic for better debugging
## v0.1.3
### Added
- **Jinja2 Template Support**: Added comprehensive Jinja2 template rendering for LLM prompts
- **Template Utilities**: Created `service/utils/templates.py` for robust template processing
- **Template Validation**: Added test script `test_templates.py` to verify template rendering
- **Enhanced VS Code Debug Support**: Complete debugging configuration for development workflow
### Changed
- **Template Engine Migration**: Replaced Python `.format()` with Jinja2 template rendering
- **Variable Substitution**: Fixed template variable replacement in user and system prompts
- **Template Variables**: Added support for `output_language`, `user_query`, `conversation_history`, and `reference_document_chunks`
- **Error Handling**: Improved template rendering error handling and logging
### Fixed
- **Variable Substitution Bug**: Fixed issue where `{{variable}}` syntax was not being replaced in prompts
- **Template Context**: Ensured all required variables are properly passed to template renderer
- **Language Support**: Added configurable output language support (default: zh-CN)
### Technical Details
- Added `jinja2>=3.1.0` dependency to pyproject.toml
- Updated `service/graph/graph.py` to use Jinja2 template rendering
- Template variables now support complex data structures and safe rendering
- All template variables are properly escaped and validated
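A minimal example of the Jinja2-based prompt rendering, using the template variables listed above. The template text and variable values are illustrative, not the project's actual prompts; `StrictUndefined` is shown because it makes a missing variable raise instead of rendering silently as empty, which matches the validation goal.

```python
# Minimal Jinja2 prompt rendering with strict undefined-variable checking.
from jinja2 import Environment, StrictUndefined

env = Environment(undefined=StrictUndefined)
template = env.from_string(
    "Answer in {{ output_language }}.\n"
    "Question: {{ user_query }}\n"
    "References:\n{% for chunk in reference_document_chunks %}- {{ chunk }}\n{% endfor %}"
)
prompt = template.render(output_language="zh-CN",
                         user_query="What are the crash-test requirements?",
                         reference_document_chunks=["chunk A", "chunk B"])
print(prompt)
```

This is also why the earlier `{{variable}}` bug could not survive the migration: Python's `.format()` treats `{{` as a literal brace, whereas Jinja2 treats it as a substitution.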
## v0.1.2
### Fixed
- Fixed configuration access pattern: refactored `config.prompts.rag` to use `config.get_rag_prompts()` method
- Fixed Azure OpenAI endpoint configuration: corrected `base_url` to use root endpoint without API path
- Fixed Azure OpenAI API version mismatch: updated `api_version` from "2024-02-01" to "2024-02-15-preview"
- Fixed streaming API error handling to properly propagate HTTP errors without silent failures
### Changed
- Improved error handling in streaming responses to surface external service errors
- Enhanced service stability by ensuring config/code consistency
### Validated
- Streaming API end-to-end functionality with tool execution and answer generation
- Azure OpenAI integration with correct endpoint configuration
- Error propagation and robust exception handling in streaming workflow
## v0.1.1
### Added
- Added service startup and stop scripts (`start_service.sh`, `stop_service.sh`)
- Added comprehensive service setup documentation (`SERVICE_SETUP.md`)
- Added support for environment variable substitution with default values (`${VAR:-default}`)
- Added LLM configuration structure in config.yaml for better organization
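The `${VAR:-default}` substitution can be implemented with a single regex over the raw config text. The exact implementation in `service/config.py` may differ; this stdlib sketch shows the shell-style semantics being emulated, including leaving unresolved `${VAR}` references untouched.

```python
# Stdlib sketch of shell-style "${VAR:-default}" substitution for config text.
import os
import re

_PATTERN = re.compile(r"\$\{(\w+)(?::-([^}]*))?\}")

def substitute_env(text, environ=os.environ):
    def repl(match):
        name, default = match.group(1), match.group(2)
        value = environ.get(name)
        if value is not None:
            return value            # environment wins over the default
        if default is not None:
            return default          # ":-default" fallback
        return match.group(0)       # no default: leave "${VAR}" untouched
    return _PATTERN.sub(repl, text)

print(substitute_env("host: ${API_HOST:-localhost}", environ={}))  # → host: localhost
```

Passing `environ` explicitly makes the function testable without mutating the process environment.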
### Changed
- Updated `docs/config.yaml` based on `.coding/config.yaml` configuration
- Moved `config.yaml` to root directory for easier access
- Restructured configuration to support `llm.rag` section for prompts and parameters
- Improved `service/config.py` to handle new configuration structure
- Enhanced environment variable substitution logic
### Fixed
- Fixed SSE event parsing logic in integration test script to correctly associate `event:` and `data:` lines
- Improved streaming event validation for tool execution, error handling, and answer generation
- Fixed configuration loading to work with root directory placement
- Fixed port mismatch in integration test script to connect to correct service port
- Fixed prompt access issue: changed from `config.prompts.rag` to `config.get_rag_prompts()` method
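The SSE parsing fix above comes down to standard SSE framing: each `event:` line names the event, the following `data:` line(s) carry its payload, and a blank line terminates the record. A minimal sketch of that association, as the test script might implement it:

```python
# Associate "event:" lines with their following "data:" lines using
# blank-line-terminated SSE framing.
def parse_sse(raw):
    events, event_name, data_lines = [], None, []
    for line in raw.splitlines() + [""]:        # trailing "" flushes the last event
        if line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "" and (event_name or data_lines):
            events.append({"event": event_name or "message",
                           "data": "\n".join(data_lines)})
            event_name, data_lines = None, []
    return events

raw = 'event: tool_result\ndata: {"ok": true}\n\nevent: done\ndata: {}\n'
print(parse_sse(raw))
```

Pairing by framing rather than by fixed line offsets is what makes the parser robust to multi-line `data:` payloads.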
### Tests
- Added comprehensive integration tests for streaming functionality
- Added robust error handling for missing OpenAI API key scenarios
- Added event streaming validation for tool results, errors, and completion events
- Added configurable port/host support in test scripts for flexible service connection
## Previous Changes
- Initial implementation of Agentic RAG system
- FastAPI-based streaming endpoints
- LangGraph-inspired workflow orchestration
- Retrieval tool integration
- Memory management with TTL
- Web client with EventSource streaming
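The TTL-based memory mentioned above can be sketched as a session store whose entries expire after a fixed lifetime and are pruned lazily on access. This is a toy illustration with an assumed interface, not the project's actual memory module; the injectable `clock` exists only to make expiry testable.

```python
# Toy TTL session memory: entries expire after `ttl_seconds` and are
# pruned lazily when read.
import time

class TTLMemory:
    def __init__(self, ttl_seconds=3600.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock                      # injectable for testing
        self._store = {}                        # session_id -> (expiry, history)

    def append(self, session_id, message):
        history = self.get(session_id)          # prunes the entry if expired
        history.append(message)
        self._store[session_id] = (self.clock() + self.ttl, history)

    def get(self, session_id):
        entry = self._store.get(session_id)
        if entry is None or entry[0] < self.clock():
            self._store.pop(session_id, None)   # expired: drop it
            return []
        return entry[1]

now = [0.0]
mem = TTLMemory(ttl_seconds=10, clock=lambda: now[0])
mem.append("s1", "hello")
now[0] = 5.0
print(mem.get("s1"))   # → ['hello']
now[0] = 20.0
print(mem.get("s1"))   # → []
```

Appending refreshes the expiry, so an active conversation stays alive while idle sessions age out.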