Doris MCP (Model Context Protocol) Server is a backend service built with Python and FastAPI. It implements MCP, allowing clients to interact with it through defined "Tools". It is primarily designed to connect to Apache Doris databases, potentially leveraging Large Language Models (LLMs) for tasks such as converting natural language queries to SQL (NL2SQL), executing queries, and performing metadata management and analysis.
* **Multiple Communication Modes** (Updated in v0.3.0):
    * **Stdio**: Standard input/output mode for direct integration with MCP clients like Cursor.
    * **Streamable HTTP**: Unified HTTP endpoint supporting request/response and streaming (Primary mode since v0.3.0).
> **⚠️ Breaking Change in v0.3.0**: SSE (Server-Sent Events) mode has been completely removed in favor of the more robust Streamable HTTP implementation.
* **Enterprise-Grade Architecture**: Modular design with comprehensive functionality:
    * **Tools Manager**: Centralized tool registration and routing (`doris_mcp_server/tools/tools_manager.py`)
    * **Resources Manager**: Resource management and metadata exposure (`doris_mcp_server/tools/resources_manager.py`)
    * **Prompts Manager**: Intelligent prompt templates for data analysis (`doris_mcp_server/tools/prompts_manager.py`)
* **Advanced Database Features**:
    * **Query Execution**: High-performance SQL execution with caching and optimization (`doris_mcp_server/utils/query_executor.py`)
    * **Security Management**: SQL security validation, data masking, and access control (`doris_mcp_server/utils/security.py`)
    * **Metadata Extraction**: Comprehensive database metadata with catalog federation support (`doris_mcp_server/utils/schema_extractor.py`)
    * **Performance Analysis**: Column statistics, performance monitoring, and data analysis tools (`doris_mcp_server/utils/analysis_tools.py`)
* **Catalog Federation Support**: Full support for multi-catalog environments (internal Doris tables and external data sources like Hive, MySQL, etc.)
* **Enterprise Security**: Comprehensive security framework with authentication, authorization, SQL injection protection, and data masking (`doris_mcp_server/utils/security.py`)
* **Flexible Configuration**: Comprehensive configuration management with environment variables, file-based config, and validation (`doris_mcp_server/utils/config.py`)
> **💡 Command Compatibility**: After installation, both `doris-mcp-server` and `mcp-doris-server` commands are available for backward compatibility. You can use either command interchangeably.
### Start Streamable HTTP Mode (Web Service)
```bash
# Full configuration with database connection
doris-mcp-server \
    --transport http \
    --host 0.0.0.0 \
    --port 3000 \
    --db-host 127.0.0.1 \
    --db-port 9030 \
    --db-user root \
    --db-password your_password
```
### Start Stdio Mode (for Cursor and other MCP clients)
```bash
# For direct integration with MCP clients like Cursor
doris-mcp-server --transport stdio
```
### Verify Installation
```bash
# Check installation
doris-mcp-server --help
# Test HTTP mode (in another terminal)
curl http://localhost:3000/health
```
### Environment Variables (Optional)
Instead of command-line arguments, you can use environment variables:
| Tool Name | Description | Parameters | Status |
|-----------|-------------|------------|--------|
| `exec_query` | Execute SQL query with catalog federation support. | `sql` (string, Required - MUST use three-part naming), `db_name` (string, Optional), `catalog_name` (string, Optional), `max_rows` (integer, Optional, default 100), `timeout` (integer, Optional, default 30) | ✅ Active |
| `get_catalog_list` | Get a list of all catalogs with detailed information. | `random_string` (string, Required) | ✅ Active |
| `get_db_list` | Get a list of all database names in the specified catalog. | `catalog_name` (string, Optional, defaults to internal catalog) | ✅ Active |
| `get_db_table_list` | Get a list of all table names in the specified database. | `db_name` (string, Optional), `catalog_name` (string, Optional) | ✅ Active |
| `get_table_schema` | Get detailed structure of the specified table. | `table_name` (string, Required), `db_name` (string, Optional), `catalog_name` (string, Optional) | ✅ Active |
| `get_table_comment` | Get the comment for the specified table. | `table_name` (string, Required), `db_name` (string, Optional), `catalog_name` (string, Optional) | ✅ Active |
| `get_table_column_comments` | Get comments for all columns in the specified table. | `table_name` (string, Required), `db_name` (string, Optional), `catalog_name` (string, Optional) | ✅ Active |
| `get_table_indexes` | Get index information for the specified table. | `table_name` (string, Required), `db_name` (string, Optional), `catalog_name` (string, Optional) | ✅ Active |
| `get_recent_audit_logs` | Get audit log records for a recent period. | `days` (integer, Optional, default 7), `limit` (integer, Optional, default 100) | ✅ Active |
| `column_analysis` | Analyze statistical information and data distribution. | `table_name` (string, Required), `column_name` (string, Required), `analysis_type` (string, Optional: basic/distribution/detailed) | ⚠️ Experimental |
**Note:** All metadata tools support catalog federation for multi-catalog environments. The `get_catalog_list` tool requires a `random_string` parameter for compatibility reasons.
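As a rough illustration of how a client might invoke one of the tools above, the sketch below builds a JSON-RPC 2.0 `tools/call` request in the shape defined by the MCP specification. The helper name `build_tool_call`, the request id, and the sample table `internal.demo_db.orders` are assumptions made for this example; in practice an MCP client library would normally construct and send this message for you.

```python
import json


def build_tool_call(tool_name: str, arguments: dict, request_id: int = 1) -> dict:
    """Build an MCP `tools/call` JSON-RPC request (illustrative helper, not part of the server)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical query; the table name is a placeholder that uses three-part naming.
request = build_tool_call(
    "exec_query",
    {
        "sql": "SELECT COUNT(*) FROM internal.demo_db.orders",
        "max_rows": 100,
        "timeout": 30,
    },
)
print(json.dumps(request, indent=2))
```

The `arguments` keys mirror the parameter names listed in the table; optional parameters can simply be left out.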
Interaction with the Doris MCP Server requires an **MCP Client**. The client connects to the server's Streamable HTTP endpoint and sends requests according to the MCP specification to invoke the server's tools.
> **Migration Note**: If you're upgrading from v0.2.x, note that tool names have been simplified (the `mcp_doris_` prefix was removed) and that HTTP communication now uses Streamable HTTP exclusively, since SSE mode has been removed.
The Doris MCP Server supports **catalog federation**, enabling interaction with multiple data catalogs (internal Doris tables and external data sources like Hive, MySQL, etc.) within a unified interface.
#### Key Features:
* **Multi-Catalog Metadata Access**: All metadata tools (`get_db_list`, `get_db_table_list`, `get_table_schema`, etc.) support an optional `catalog_name` parameter to query specific catalogs.
* **Cross-Catalog SQL Queries**: Execute SQL queries that span multiple catalogs using three-part table naming.
* **Catalog Discovery**: Use `get_catalog_list` to discover available catalogs and their types.
#### Three-Part Naming Requirement:
**All SQL queries MUST use three-part naming for table references:**
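In Doris, a three-part name has the form `catalog_name.db_name.table_name`, with `internal` as the name of the built-in catalog. The following is a minimal sketch of building and checking such names on the client side before sending a query. The helpers `qualify` and `is_three_part` and the catalog, database, and table names are hypothetical, and the check only covers plain unquoted identifiers.

```python
import re

# Matches three dot-separated plain identifiers (letters, digits, underscores).
_THREE_PART = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*){2}")


def qualify(catalog: str, db: str, table: str) -> str:
    """Join catalog, database, and table into a three-part table reference."""
    return f"{catalog}.{db}.{table}"


def is_three_part(name: str) -> bool:
    """Return True if `name` looks like catalog.db.table."""
    return _THREE_PART.fullmatch(name) is not None


# Hypothetical cross-catalog join between an internal table and a Hive table.
orders = qualify("internal", "sales_db", "orders")
customers = qualify("hive_catalog", "warehouse", "customers")
sql = (
    f"SELECT c.name, o.amount FROM {orders} o "
    f"JOIN {customers} c ON o.customer_id = c.id"
)
print(sql)
```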
The Doris MCP Server includes a comprehensive security framework that provides enterprise-level protection through authentication, authorization, SQL security validation, and data masking capabilities.
### Security Features
* **🔐 Authentication**: Support for token-based and basic authentication
* **🛡️ Authorization**: Role-based access control (RBAC) with security levels
* **🚫 SQL Security**: SQL injection protection and blocked operations
* **🎭 Data Masking**: Automatic sensitive data masking based on user permissions
| Algorithm | Description | Example |
|-----------|-------------|---------|
| `id_mask` | Masks ID card numbers | `110101****1234` |
| `name_mask` | Masks personal names | `张*明` |
| `partial_mask` | Partial masking with ratio | `abc***xyz` |
#### Custom Masking Rules
Add custom masking rules in your configuration:
```python
# Custom masking rule
custom_rule = {
    "column_pattern": r".*salary.*|.*income.*",
    "algorithm": "partial_mask",
    "parameters": {
        "mask_char": "*",
        "mask_ratio": 0.6
    },
    "security_level": "confidential"
}
```
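To make the rule fields more concrete, here is a minimal sketch of how a rule of this shape could be applied to a column value. It is not the server's actual masking code: the functions `partial_mask` and `apply_rule`, and the choice to mask the middle of the value, are assumptions made for illustration.

```python
import re

custom_rule = {
    "column_pattern": r".*salary.*|.*income.*",
    "algorithm": "partial_mask",
    "parameters": {"mask_char": "*", "mask_ratio": 0.6},
    "security_level": "confidential",
}


def partial_mask(value: str, mask_char: str = "*", mask_ratio: float = 0.6) -> str:
    """Replace roughly `mask_ratio` of the characters, centered in the value."""
    n = len(value)
    if n == 0:
        return value
    masked_len = max(1, int(n * mask_ratio))
    start = (n - masked_len) // 2
    return value[:start] + mask_char * masked_len + value[start + masked_len:]


def apply_rule(column_name: str, value, rule: dict) -> str:
    """Mask `value` if the column name matches the rule's pattern (hypothetical logic)."""
    if re.fullmatch(rule["column_pattern"], column_name, re.IGNORECASE):
        return partial_mask(str(value), **rule["parameters"])
    return str(value)


print(apply_rule("base_salary", "1234567890", custom_rule))
print(apply_rule("email", "a@example.com", custom_rule))
```

Columns whose names do not match `column_pattern` are returned unchanged in this sketch.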
### Security Configuration Examples
#### Environment Variables
```bash
# .env file
AUTH_TYPE=token
TOKEN_SECRET=your_jwt_secret_key
ENABLE_MASKING=true
MAX_RESULT_ROWS=10000
BLOCKED_SQL_OPERATIONS=DROP,DELETE,TRUNCATE,ALTER
MAX_QUERY_COMPLEXITY=100
ENABLE_AUDIT=true
```
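The sketch below shows one way settings like these could be read and used from Python. It is only an illustration: the function names, the fallback defaults, and the leading-keyword check are assumptions, and the server's own configuration loader (`doris_mcp_server/utils/config.py`) may behave differently.

```python
import os


def load_security_settings(env=None) -> dict:
    """Read security settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env

    def as_bool(name: str, default: str = "false") -> bool:
        return env.get(name, default).strip().lower() in {"1", "true", "yes", "on"}

    return {
        "auth_type": env.get("AUTH_TYPE", "token"),  # fallback values are placeholders
        "enable_masking": as_bool("ENABLE_MASKING"),
        "max_result_rows": int(env.get("MAX_RESULT_ROWS", "10000")),
        "blocked_operations": {
            op.strip().upper()
            for op in env.get("BLOCKED_SQL_OPERATIONS", "").split(",")
            if op.strip()
        },
        "enable_audit": as_bool("ENABLE_AUDIT"),
    }


def is_blocked(sql: str, blocked_operations: set) -> bool:
    """Naive check: is the statement's first keyword in the blocked set?"""
    words = sql.strip().split(None, 1)
    return bool(words) and words[0].upper() in blocked_operations


settings = load_security_settings(
    {"ENABLE_MASKING": "true", "BLOCKED_SQL_OPERATIONS": "DROP,DELETE,TRUNCATE,ALTER"}
)
print(is_blocked("DROP TABLE internal.demo_db.orders", settings["blocked_operations"]))
print(is_blocked("SELECT 1", settings["blocked_operations"]))
```

A first-keyword check like this ignores comments and multi-statement input, so it should be read as a demonstration of the setting's intent rather than a complete SQL validator.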
#### Sensitive Tables Configuration
```python
# Configure sensitive tables with security levels
sensitive_tables = {
    "user_profiles": "confidential",
    "payment_records": "secret",
    "employee_salaries": "secret",
    "customer_data": "confidential",
    "public_reports": "public"
}
```
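A mapping like this is typically combined with a per-user clearance level. The following sketch assumes the levels are ordered `public` < `confidential` < `secret` and that unlisted tables are treated as `public`; both points, along with the `can_access` helper, are assumptions for illustration and not taken from the server's code.

```python
# Assumed ordering of the security levels used in the mapping.
LEVEL_ORDER = {"public": 0, "confidential": 1, "secret": 2}

sensitive_tables = {
    "user_profiles": "confidential",
    "payment_records": "secret",
    "employee_salaries": "secret",
    "customer_data": "confidential",
    "public_reports": "public",
}


def can_access(user_level: str, table_name: str, default_level: str = "public") -> bool:
    """Return True if the user's level is at least the table's level."""
    table_level = sensitive_tables.get(table_name, default_level)
    return LEVEL_ORDER[user_level] >= LEVEL_ORDER[table_level]


print(can_access("confidential", "customer_data"))
print(can_access("confidential", "payment_records"))
```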
### Security Best Practices
1. **🔑 Strong Authentication**: Use JWT tokens with proper expiration
2. **🎯 Principle of Least Privilege**: Grant minimum required permissions
3. **🔍 Regular Auditing**: Enable audit logging for security monitoring
4. **🛡️ Input Validation**: All SQL queries are automatically validated
5. **🎭 Data Classification**: Properly classify data with security levels
6. **🔄 Regular Updates**: Keep security rules and configurations updated
### Security Monitoring
The system provides comprehensive security monitoring:
```python
# Security audit log example
{
    "timestamp": "2024-01-15T10:30:00Z",
    "user_id": "analyst_user",
    "action": "query_execution",
    "resource": "customer_data",
    "result": "blocked",
    "reason": "insufficient_permissions",
    "risk_level": "medium"
}
```
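Records in this shape are straightforward to post-process. As a small sketch, assuming the audit entries are available as one JSON object per line (the storage format, the second sample record, and its `"allowed"` result value are assumptions), blocked requests can be counted by risk level like this:

```python
import json
from collections import Counter

# Sample lines; the first mirrors the record above, the second is invented for contrast.
raw_records = [
    '{"timestamp": "2024-01-15T10:30:00Z", "user_id": "analyst_user", '
    '"action": "query_execution", "resource": "customer_data", '
    '"result": "blocked", "reason": "insufficient_permissions", "risk_level": "medium"}',
    '{"timestamp": "2024-01-15T10:31:00Z", "user_id": "analyst_user", '
    '"action": "query_execution", "resource": "public_reports", '
    '"result": "allowed", "reason": "", "risk_level": "low"}',
]


def blocked_by_risk(lines) -> Counter:
    """Count blocked audit records per risk level."""
    records = (json.loads(line) for line in lines)
    return Counter(r["risk_level"] for r in records if r["result"] == "blocked")


print(blocked_by_risk(raw_records))
```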
> **⚠️ Important**: Always test security configurations in a development environment before deploying to production. Regularly review and update security policies based on your organization's requirements.
Stdio mode allows Cursor to manage the server process directly. Configuration is done within Cursor's MCP Server settings file (typically `~/.cursor/mcp.json` or similar).
1. **Configure `.env`:** Ensure your database credentials and any other necessary settings are correctly configured in the `.env` file within the project directory.
This script reads the `.env` file and starts the FastAPI server with Streamable HTTP support. Note the host and port the server is listening on (default is `0.0.0.0:3000`).
3. **Configure Cursor:** Add an entry like the following to your Cursor MCP configuration, pointing to the running server's Streamable HTTP endpoint:
> **Note**: Adjust the host/port if your server runs on a different address. The `/mcp` endpoint is the unified Streamable HTTP interface introduced in v0.3.0.
After configuring either mode in Cursor, you should be able to select the server (e.g., `doris-stdio` or `doris-http`) and use its tools.
> **⚠️ Migration from v0.2.x**: If you were using SSE mode (`/sse` endpoint), update your configuration to use the new Streamable HTTP endpoint (`/mcp`).
Add your new tool to the `DorisToolsManager` class in `doris_mcp_server/tools/tools_manager.py`. The tools manager provides a centralized approach to tool registration and execution.
The project includes a unified MCP client (`doris_mcp_client/`) for testing and integration purposes. The client supports multiple connection modes and provides a convenient interface for interacting with the MCP server.
### Q: Why do Qwen3-32b and other smaller-parameter models often fail when calling tools?
**A:** This is a common issue. The main reason is that these models need more explicit guidance to correctly use MCP tools. It's recommended to add the following instruction prompt for the model:
```xml
<instruction>
Use MCP tools to complete tasks as much as possible. Carefully read the annotations, method names, and parameter descriptions of each tool. Please follow these steps:
1. Carefully analyze the user's question and match the most appropriate tool from the existing Tools list.
2. Ensure tool names, method names, and parameters are used exactly as defined in the tool annotations. Do not create tool names or parameters on your own.
3. When passing parameters, strictly follow the parameter format and requirements specified in the tool annotations.
4. When calling tools, call them directly as needed, but refer to the following request format for parameters: {"mcp_sse_call_tool": {"tool_name": "$tools_name", "arguments": "{}"}}
5. When outputting results, do not include any XML tags, return plain text content only.
<input>
User question: user_query
</input>
<output>
Return tool call results or final answer, along with analysis of the results.
</output>
</instruction>
```
If you have further requirements for the returned results, you can describe the specific requirements in the `<output>` tag.
### Q: How do I configure different database connections?
**A:** You can configure database connections in several ways: