support Multi-Catalog

FreeOnePlus
2025-06-06 14:41:14 +08:00
parent 5e98e5ba41
commit d9fed06c92

@@ -9,15 +9,16 @@ Doris MCP (Model Context Protocol) Server is a backend service built with Python
* **SSE (Server-Sent Events)**: Served via `/sse` (initialization) and `/mcp/messages` (communication) endpoints (`src/sse_server.py`).
* **Streamable HTTP**: Served via the unified `/mcp` endpoint, supporting request/response and streaming (`src/streamable_server.py`).
* **(Optional) Stdio**: Interaction possible via standard input/output (`src/stdio_server.py`), requires specific startup configuration.
- * **Tool-Based Interface**: Core functionalities are encapsulated as MCP tools that clients can call as needed. Currently available key tools focus on direct database interaction:
- * SQL Execution (`mcp_doris_exec_query`)
+ * **Tool-Based Interface**: Core functionalities are encapsulated as MCP tools that clients can call as needed. Currently available key tools focus on direct database interaction with full catalog federation support:
+ * SQL Execution with Catalog Federation (`mcp_doris_exec_query`)
+ * Catalog Management (`mcp_doris_get_catalog_list`)
* Database and Table Listing (`mcp_doris_get_db_list`, `mcp_doris_get_db_table_list`)
* Metadata Retrieval (`mcp_doris_get_table_schema`, `mcp_doris_get_table_comment`, `mcp_doris_get_table_column_comments`, `mcp_doris_get_table_indexes`)
* Audit Log Retrieval (`mcp_doris_get_recent_audit_logs`)
- *Note: Current tools primarily focus on direct DB operations.*
+ *Note: All metadata tools support catalog federation for multi-catalog environments.*
* **Database Interaction**: Provides functionality to connect to Apache Doris (or other compatible databases) and execute queries (`src/utils/db.py`).
* **Flexible Configuration**: Configured via a `.env` file, supporting settings for database connections, LLM providers/models, API keys, logging levels, etc.
- * **Metadata Extraction**: Capable of extracting database metadata information (`src/utils/schema_extractor.py`).
+ * **Metadata Extraction**: Capable of extracting database metadata information with full catalog federation support (`src/utils/schema_extractor.py`).
## System Requirements
@@ -72,13 +73,14 @@ The following table lists the main tools currently available for invocation via
| Tool Name | Description | Parameters | Status |
| :-------------------------------- | :---------------------------------------------------------- | :--------------------------------------------------------------------------------------------------------- | :------- |
- | `mcp_doris_get_db_list` | Get a list of all database names on the server. | `random_string` (string, Required) | ✅ Active |
- | `mcp_doris_get_db_table_list` | Get a list of all table names in the specified database. | `random_string` (string, Required), `db_name` (string, Optional, defaults to current db) | ✅ Active |
- | `mcp_doris_get_table_schema` | Get detailed structure of the specified table. | `random_string` (string, Required), `table_name` (string, Required), `db_name` (string, Optional) | ✅ Active |
- | `mcp_doris_get_table_comment` | Get the comment for the specified table. | `random_string` (string, Required), `table_name` (string, Required), `db_name` (string, Optional) | ✅ Active |
- | `mcp_doris_get_table_column_comments` | Get comments for all columns in the specified table. | `random_string` (string, Required), `table_name` (string, Required), `db_name` (string, Optional) | ✅ Active |
- | `mcp_doris_get_table_indexes` | Get index information for the specified table. | `random_string` (string, Required), `table_name` (string, Required), `db_name` (string, Optional) | ✅ Active |
- | `mcp_doris_exec_query` | Execute SQL query and return result command. | `random_string` (string, Required), `sql` (string, Required), `db_name` (string, Optional), `max_rows` (integer, Optional, default 100), `timeout` (integer, Optional, default 30) | ✅ Active |
+ | `mcp_doris_get_catalog_list` | Get a list of all catalogs with detailed information. | `random_string` (string, Required) | ✅ Active |
+ | `mcp_doris_get_db_list` | Get a list of all database names in the specified catalog. | `random_string` (string, Required), `catalog_name` (string, Optional, defaults to internal catalog) | ✅ Active |
+ | `mcp_doris_get_db_table_list` | Get a list of all table names in the specified database. | `random_string` (string, Required), `db_name` (string, Optional), `catalog_name` (string, Optional) | ✅ Active |
+ | `mcp_doris_get_table_schema` | Get detailed structure of the specified table. | `random_string` (string, Required), `table_name` (string, Required), `db_name` (string, Optional), `catalog_name` (string, Optional) | ✅ Active |
+ | `mcp_doris_get_table_comment` | Get the comment for the specified table. | `random_string` (string, Required), `table_name` (string, Required), `db_name` (string, Optional), `catalog_name` (string, Optional) | ✅ Active |
+ | `mcp_doris_get_table_column_comments` | Get comments for all columns in the specified table. | `random_string` (string, Required), `table_name` (string, Required), `db_name` (string, Optional), `catalog_name` (string, Optional) | ✅ Active |
+ | `mcp_doris_get_table_indexes` | Get index information for the specified table. | `random_string` (string, Required), `table_name` (string, Required), `db_name` (string, Optional), `catalog_name` (string, Optional) | ✅ Active |
+ | `mcp_doris_exec_query` | Execute SQL query with catalog federation support. | `random_string` (string, Required), `sql` (string, Required - MUST use three-part naming), `db_name` (string, Optional), `catalog_name` (string, Optional), `max_rows` (integer, Optional, default 100), `timeout` (integer, Optional, default 30) | ✅ Active |
| `mcp_doris_get_recent_audit_logs` | Get audit log records for a recent period. | `random_string` (string, Required), `days` (integer, Optional, default 7), `limit` (integer, Optional, default 100) | ✅ Active |
**Note:** All tools require a `random_string` parameter as a call identifier, typically handled automatically by the MCP client. "Optional" and "Required" refer to the tool's internal logic; the client might need to provide values for all parameters depending on its implementation. The tool names listed here are the base names; clients might see them prefixed (e.g., `mcp_doris_stdio3_get_db_list`) depending on the connection mode.
@@ -112,13 +114,81 @@ Interaction with the Doris MCP Server requires an **MCP Client**. The client con
3. **Call Tool**: The client sends a `tool_call` message/request, specifying the `tool_name` and `arguments`.
* **Example: Get Table Schema**
* `tool_name`: `mcp_doris_get_table_schema` (or the mode-specific name)
- * `arguments`: Include `random_string`, `table_name`, `db_name`.
+ * `arguments`: Include `random_string`, `table_name`, `db_name`, `catalog_name`.
4. **Handle Response**:
* **Non-streaming**: The client receives a response containing `result` or `error`.
* **Streaming**: The client receives a series of `tools/progress` notifications, followed by a final response containing the `result` or `error`.
Specific tool names and parameters should be referenced from the `src/tools/` code or obtained via MCP discovery mechanisms.
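The response handling described in step 4 can be sketched on the client side roughly as follows. This is a minimal illustration, not the server's or any MCP client's actual API: `ToolCallError`, `unwrap_response`, and `consume_stream` are hypothetical names, and the assumption that progress notifications carry a `"method": "tools/progress"` key is ours.

```python
from typing import Any, Dict, Iterable, List, Tuple


class ToolCallError(RuntimeError):
    """Raised when a response carries `error` instead of `result`."""


def unwrap_response(response: Dict[str, Any]) -> Any:
    """Return the `result` of a final response, or raise if it holds `error`."""
    if "error" in response:
        raise ToolCallError(str(response["error"]))
    if "result" not in response:
        raise ToolCallError("response has neither `result` nor `error`")
    return response["result"]


def consume_stream(messages: Iterable[Dict[str, Any]]) -> Tuple[List[Dict[str, Any]], Any]:
    """Collect progress notifications, then unwrap the first non-progress message."""
    progress: List[Dict[str, Any]] = []
    for message in messages:
        # Assumption: notifications are tagged with a "method" key.
        if message.get("method") == "tools/progress":
            progress.append(message)
            continue
        return progress, unwrap_response(message)
    raise ToolCallError("stream ended without a final response")
```

In non-streaming mode only `unwrap_response` is needed; in streaming mode `consume_stream` returns the collected notifications together with the final result.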
### Catalog Federation Support
The Doris MCP Server supports **catalog federation**, enabling interaction with multiple data catalogs (internal Doris tables and external data sources like Hive, MySQL, etc.) within a unified interface.
#### Key Features:
* **Multi-Catalog Metadata Access**: All metadata tools (`get_db_list`, `get_db_table_list`, `get_table_schema`, etc.) support an optional `catalog_name` parameter to query specific catalogs.
* **Cross-Catalog SQL Queries**: Execute SQL queries that span multiple catalogs using three-part table naming.
* **Catalog Discovery**: Use `mcp_doris_get_catalog_list` to discover available catalogs and their types.
#### Three-Part Naming Requirement:
**All SQL queries MUST use three-part naming for table references:**
* **Internal Tables**: `internal.database_name.table_name`
* **External Tables**: `catalog_name.database_name.table_name`
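A client that generates SQL may find it convenient to build and check three-part names before sending a query. The helpers below are a minimal sketch under the assumption that catalog, database, and table names are plain unquoted identifiers; `qualify_table` and `is_three_part` are hypothetical and not part of the server code.

```python
import re

# Assumption: names are simple unquoted identifiers (letters, digits, underscore).
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def qualify_table(table: str, database: str, catalog: str = "internal") -> str:
    """Build a three-part name; the catalog defaults to `internal`."""
    for part in (catalog, database, table):
        if not _IDENT.fullmatch(part):
            raise ValueError(f"not a plain identifier: {part!r}")
    return f"{catalog}.{database}.{table}"


def is_three_part(name: str) -> bool:
    """Check that a table reference has exactly catalog.database.table parts."""
    parts = name.split(".")
    return len(parts) == 3 and all(_IDENT.fullmatch(p) is not None for p in parts)
```

For example, `qualify_table("customer", "ssb")` yields `internal.ssb.customer`, while `is_three_part("ssb.customer")` is false because the catalog part is missing.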
#### Examples:
1. **Get Available Catalogs:**
```json
{
  "tool_name": "mcp_doris_get_catalog_list",
  "arguments": {"random_string": "unique_id"}
}
```
2. **Get Databases in Specific Catalog:**
```json
{
  "tool_name": "mcp_doris_get_db_list",
  "arguments": {"random_string": "unique_id", "catalog_name": "mysql"}
}
```
3. **Query Internal Catalog:**
```json
{
  "tool_name": "mcp_doris_exec_query",
  "arguments": {
    "random_string": "unique_id",
    "sql": "SELECT COUNT(*) FROM internal.ssb.customer"
  }
}
```
4. **Query External Catalog:**
```json
{
"tool_name": "mcp_doris_exec_query",
"arguments": {
"random_string": "unique_id",
"sql": "SELECT COUNT(*) FROM mysql.ssb.customer"
}
}
```
5. **Cross-Catalog Query:**
```json
{
"tool_name": "mcp_doris_exec_query",
"arguments": {
"random_string": "unique_id",
"sql": "SELECT i.c_name, m.external_data FROM internal.ssb.customer i JOIN mysql.test.user_info m ON i.c_custkey = m.customer_id"
}
}
```
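The payloads above could be assembled programmatically along these lines. This is a sketch only: `build_tool_call` is a hypothetical helper, the `tool_name`/`arguments` shape simply mirrors the examples in this section, and the exact wire format depends on the MCP client and connection mode in use.

```python
import json
import uuid
from typing import Any, Dict


def build_tool_call(tool_name: str, **arguments: Any) -> Dict[str, Any]:
    """Assemble a payload shaped like the examples above.

    A fresh `random_string` is generated unless the caller passes one.
    """
    return {
        "tool_name": tool_name,
        "arguments": {"random_string": uuid.uuid4().hex, **arguments},
    }


# Discover catalogs first, then run a query that uses three-part naming.
list_catalogs = build_tool_call("mcp_doris_get_catalog_list")
count_rows = build_tool_call(
    "mcp_doris_exec_query",
    sql="SELECT COUNT(*) FROM internal.ssb.customer",
)
print(json.dumps(count_rows, indent=2))
```

Generating `random_string` in one place keeps call sites short; many MCP clients fill this parameter in automatically, in which case the helper is unnecessary.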
## Connecting with Cursor
You can connect Cursor to this MCP server using either Stdio or SSE mode.
@@ -225,7 +295,7 @@ This section outlines the process for adding new MCP tools to the Doris MCP Serv
Before writing new database interaction logic from scratch, check the existing utility modules:
* **`doris_mcp_server/utils/db.py`**: Provides basic functions for getting database connections (`get_db_connection`) and executing raw queries (`execute_query`, `execute_query_df`).
- * **`doris_mcp_server/utils/schema_extractor.py` (`MetadataExtractor` class)**: Offers high-level methods to retrieve database metadata, such as listing databases/tables (`get_all_databases`, `get_database_tables`), getting table schemas/comments/indexes (`get_table_schema`, `get_table_comment`, `get_column_comments`, `get_table_indexes`), and accessing audit logs (`get_recent_audit_logs`). It includes caching mechanisms.
+ * **`doris_mcp_server/utils/schema_extractor.py` (`MetadataExtractor` class)**: Offers high-level methods to retrieve database metadata with catalog federation support, such as listing databases/tables (`get_all_databases`, `get_database_tables`), getting table schemas/comments/indexes (`get_table_schema`, `get_table_comment`, `get_column_comments`, `get_table_indexes`), and accessing audit logs (`get_recent_audit_logs`). All methods support optional `catalog_name` parameters for multi-catalog environments. It includes caching mechanisms.
* **`doris_mcp_server/utils/sql_executor_tools.py` (`execute_sql_query` function)**: Provides a wrapper around `db.execute_query` that includes security checks (optional, controlled by `ENABLE_SQL_SECURITY_CHECK` env var), adds automatic `LIMIT` to SELECT queries, handles result serialization (dates, decimals), and formats the output into the standard MCP success/error structure. **It's recommended to use this for executing user-provided or generated SQL.**
You can import and combine functionalities from these modules to build your new tool.