Doris MCP (Model Context Protocol) Server is a backend service built with Python and FastAPI. It implements the MCP, allowing clients to interact with it through defined "Tools". It's primarily designed to connect to Apache Doris databases, potentially leveraging Large Language Models (LLMs) for tasks like converting natural language queries to SQL (NL2SQL), executing queries, and performing metadata management and analysis.
- **🛠️ Enhanced Monitoring Tools Module**: Advanced memory tracking, metrics collection, and BE node discovery with modular, extensible design
- **🔍 Mature Query Information Tools**: Enhanced SQL explain and profiling with configurable content truncation, file export for LLM attachments, and advanced query analytics
- **⚙️ Unified Configuration Framework**: Centralized configuration management through `config.py` with comprehensive validation and standardized parameter naming
- **🗃️ Smart Default Database Handling**: Automatic fallback to `information_schema` database, eliminating startup configuration requirements and improving reliability
- **🗑️ Production-Ready Architecture**: Removed experimental tools in favor of battle-tested, production-ready alternatives with better performance and monitoring
> **⚠️ Breaking Changes**: Experimental tools (`column_analysis`, `performance_stats`) have been removed and replaced with enhanced monitoring and query information tools.
***Streamable HTTP Communication**: Unified HTTP endpoint supporting both request/response and streaming communication for optimal performance and reliability.
***Stdio Communication**: Standard input/output mode for direct integration with MCP clients like Cursor.
***Tools Manager**: Centralized tool registration and routing with unified interfaces (`doris_mcp_server/tools/tools_manager.py`)
***Enhanced Monitoring Tools Module**: Advanced memory tracking, metrics collection, and flexible BE node discovery (Enhanced in v0.4.0)
***Query Information Tools**: Enhanced SQL explain and profiling with configurable content management and LLM file export capabilities (Enhanced in v0.4.0)
> **💡 Command Compatibility**: After installation, both `doris-mcp-server` and `mcp-doris-server` commands are available for backward compatibility. You can use either command interchangeably.
| `get_table_column_comments` | Get comment information for all columns in table. | `table_name` (string, Required), `db_name` (string, Optional), `catalog_name` (string, Optional) |
| `get_table_indexes` | Get index information for specified table. | `table_name` (string, Required), `db_name` (string, Optional), `catalog_name` (string, Optional) |
| `get_recent_audit_logs` | Get audit log records for recent period. | `days` (integer, Optional), `limit` (integer, Optional) |
| `get_catalog_list` | Get list of all catalog names. | `random_string` (string, Required) |
| `get_sql_explain` | Get SQL execution plan with configurable content truncation and file export for LLM analysis. | `sql` (string, Required), `verbose` (boolean, Optional), `db_name` (string, Optional), `catalog_name` (string, Optional) |
| `get_sql_profile` | Get SQL execution profile with content management and file export for LLM optimization workflows. | `sql` (string, Required), `db_name` (string, Optional), `catalog_name` (string, Optional), `timeout` (integer, Optional) |
| `get_table_data_size` | Get table data size information via FE HTTP API. | `db_name` (string, Optional), `table_name` (string, Optional), `single_replica` (boolean, Optional) |
| `get_monitoring_metrics_data` | Get actual Doris monitoring metrics data from nodes with flexible BE discovery. | `role` (string, Optional), `monitor_type` (string, Optional), `priority` (string, Optional) |
| `get_realtime_memory_stats` | Get real-time memory statistics via BE Memory Tracker with auto/manual BE discovery. | `tracker_type` (string, Optional), `include_details` (boolean, Optional) |
| `get_historical_memory_stats` | Get historical memory statistics via BE Bvar interface with flexible BE configuration. | `tracker_names` (array, Optional), `time_range` (string, Optional) |
**Note:** All metadata tools support catalog federation for multi-catalog environments. The `get_catalog_list` tool requires a `random_string` parameter for compatibility reasons. Enhanced monitoring tools in v0.4.0 provide comprehensive memory tracking and metrics collection capabilities with flexible BE node discovery.
Interaction with the Doris MCP Server requires an **MCP Client**. The client connects to the server's Streamable HTTP endpoint and sends requests according to the MCP specification to invoke the server's tools.
The Doris MCP Server supports **catalog federation**, enabling interaction with multiple data catalogs (internal Doris tables and external data sources like Hive, MySQL, etc.) within a unified interface.
#### Key Features:
***Multi-Catalog Metadata Access**: All metadata tools (`get_db_list`, `get_db_table_list`, `get_table_schema`, etc.) support an optional `catalog_name` parameter to query specific catalogs.
***Cross-Catalog SQL Queries**: Execute SQL queries that span multiple catalogs using three-part table naming.
***Catalog Discovery**: Use `mcp_doris_get_catalog_list` to discover available catalogs and their types.
#### Three-Part Naming Requirement:
**All SQL queries MUST use three-part naming for table references:**
The Doris MCP Server includes a comprehensive security framework that provides enterprise-level protection through authentication, authorization, SQL security validation, and data masking capabilities.
### Security Features
***🔐 Authentication**: Support for token-based and basic authentication
***🛡️ Authorization**: Role-based access control (RBAC) with security levels
***🚫 SQL Security**: SQL injection protection and blocked operations
***🎭 Data Masking**: Automatic sensitive data masking based on user permissions
| `id_mask` | Masks ID card numbers | `110101****1234` |
| `name_mask` | Masks personal names | `张*明` |
| `partial_mask` | Partial masking with ratio | `abc***xyz` |
#### Custom Masking Rules
Add custom masking rules in your configuration:
```python
# Custom masking rule
custom_rule = {
"column_pattern": r".*salary.*|.*income.*",
"algorithm": "partial_mask",
"parameters": {
"mask_char": "*",
"mask_ratio": 0.6
},
"security_level": "confidential"
}
```
### Security Configuration Examples
#### Environment Variables
```bash
# .env file
AUTH_TYPE=token
TOKEN_SECRET=your_jwt_secret_key
ENABLE_MASKING=true
MAX_RESULT_ROWS=10000
BLOCKED_SQL_OPERATIONS=DROP,DELETE,TRUNCATE,ALTER
MAX_QUERY_COMPLEXITY=100
ENABLE_AUDIT=true
```
#### Sensitive Tables Configuration
```python
# Configure sensitive tables with security levels
sensitive_tables = {
"user_profiles": "confidential",
"payment_records": "secret",
"employee_salaries": "secret",
"customer_data": "confidential",
"public_reports": "public"
}
```
### Security Best Practices
1.**🔑 Strong Authentication**: Use JWT tokens with proper expiration
2.**🎯 Principle of Least Privilege**: Grant minimum required permissions
3.**🔍 Regular Auditing**: Enable audit logging for security monitoring
4.**🛡️ Input Validation**: All SQL queries are automatically validated
5.**🎭 Data Classification**: Properly classify data with security levels
6.**🔄 Regular Updates**: Keep security rules and configurations updated
### Security Monitoring
The system provides comprehensive security monitoring:
```python
# Security audit log example
{
"timestamp": "2024-01-15T10:30:00Z",
"user_id": "analyst_user",
"action": "query_execution",
"resource": "customer_data",
"result": "blocked",
"reason": "insufficient_permissions",
"risk_level": "medium"
}
```
> **⚠️ Important**: Always test security configurations in a development environment before deploying to production. Regularly review and update security policies based on your organization's requirements.
Stdio mode allows Cursor to manage the server process directly. Configuration is done within Cursor's MCP Server settings file (typically `~/.cursor/mcp.json` or similar).
1.**Configure `.env`:** Ensure your database credentials and any other necessary settings are correctly configured in the `.env` file within the project directory.
This script reads the `.env` file and starts the FastAPI server with Streamable HTTP support. Note the host and port the server is listening on (default is `0.0.0.0:3000`).
3.**Configure Cursor:** Add an entry like the following to your Cursor MCP configuration, pointing to the running server's Streamable HTTP endpoint:
This section outlines the process for adding new MCP tools to the Doris MCP Server, based on the unified modular architecture with centralized tool management.
Add your new tool to the `DorisToolsManager` class in `doris_mcp_server/tools/tools_manager.py`. The tools manager provides a centralized approach to tool registration and execution with unified interfaces.
The project includes a unified MCP client (`doris_mcp_client/`) for testing and integration purposes. The client supports multiple connection modes and provides a convenient interface for interacting with the MCP server.
### Q: Why do Qwen3-32b and other small parameter models always fail when calling tools?
**A:** This is a common issue. The main reason is that these models need more explicit guidance to correctly use MCP tools. It's recommended to add the following instruction prompt for the model:
Use MCP tools to complete tasks as much as possible. Carefully read the annotations, method names, and parameter descriptions of each tool. Please follow these steps:
1. Carefully analyze the user's question and match the most appropriate tool from the existing Tools list.
2. Ensure tool names, method names, and parameters are used exactly as defined in the tool annotations. Do not create tool names or parameters on your own.
3. When passing parameters, strictly follow the parameter format and requirements specified in the tool annotations.
4. When calling tools, call them directly as needed, but refer to the following request format for parameters: {"mcp_sse_call_tool": {"tool_name": "$tools_name", "arguments": "{}"}}
5. When outputting results, do not include any XML tags, return plain text content only.
<input>
User question: user_query
</input>
<output>
Return tool call results or final answer, along with analysis of the results.
</output>
</instruction>
```
If you have further requirements for the returned results, you can describe the specific requirements in the `<output>` tag.
### Q: How to configure different database connections?
**A:** You can configure database connections in several ways: