support Multi-Catalog
96
README.md
@@ -9,15 +9,16 @@ Doris MCP (Model Context Protocol) Server is a backend service built with Python
|
|||||||
* **SSE (Server-Sent Events)**: Served via `/sse` (initialization) and `/mcp/messages` (communication) endpoints (`src/sse_server.py`).
|
* **SSE (Server-Sent Events)**: Served via `/sse` (initialization) and `/mcp/messages` (communication) endpoints (`src/sse_server.py`).
|
||||||
* **Streamable HTTP**: Served via the unified `/mcp` endpoint, supporting request/response and streaming (`src/streamable_server.py`).
|
* **Streamable HTTP**: Served via the unified `/mcp` endpoint, supporting request/response and streaming (`src/streamable_server.py`).
|
||||||
* **(Optional) Stdio**: Interaction is possible via standard input/output (`src/stdio_server.py`), but this mode requires specific startup configuration.
|
* **(Optional) Stdio**: Interaction is possible via standard input/output (`src/stdio_server.py`), but this mode requires specific startup configuration.
|
||||||
* **Tool-Based Interface**: Core functionalities are encapsulated as MCP tools that clients can call as needed. Currently available key tools focus on direct database interaction:
|
* **Tool-Based Interface**: Core functionalities are encapsulated as MCP tools that clients can call as needed. Currently available key tools focus on direct database interaction with full catalog federation support:
|
||||||
* SQL Execution (`mcp_doris_exec_query`)
|
* SQL Execution with Catalog Federation (`mcp_doris_exec_query`)
|
||||||
|
* Catalog Management (`mcp_doris_get_catalog_list`)
|
||||||
* Database and Table Listing (`mcp_doris_get_db_list`, `mcp_doris_get_db_table_list`)
|
* Database and Table Listing (`mcp_doris_get_db_list`, `mcp_doris_get_db_table_list`)
|
||||||
* Metadata Retrieval (`mcp_doris_get_table_schema`, `mcp_doris_get_table_comment`, `mcp_doris_get_table_column_comments`, `mcp_doris_get_table_indexes`)
|
* Metadata Retrieval (`mcp_doris_get_table_schema`, `mcp_doris_get_table_comment`, `mcp_doris_get_table_column_comments`, `mcp_doris_get_table_indexes`)
|
||||||
* Audit Log Retrieval (`mcp_doris_get_recent_audit_logs`)
|
* Audit Log Retrieval (`mcp_doris_get_recent_audit_logs`)
|
||||||
*Note: Current tools primarily focus on direct DB operations.*
|
*Note: All metadata tools support catalog federation for multi-catalog environments.*
|
||||||
* **Database Interaction**: Provides functionality to connect to Apache Doris (or other compatible databases) and execute queries (`src/utils/db.py`).
|
* **Database Interaction**: Provides functionality to connect to Apache Doris (or other compatible databases) and execute queries (`src/utils/db.py`).
|
||||||
* **Flexible Configuration**: Configured via a `.env` file, supporting settings for database connections, LLM providers/models, API keys, logging levels, etc.
|
* **Flexible Configuration**: Configured via a `.env` file, supporting settings for database connections, LLM providers/models, API keys, logging levels, etc.
|
||||||
* **Metadata Extraction**: Capable of extracting database metadata information (`src/utils/schema_extractor.py`).
|
* **Metadata Extraction**: Capable of extracting database metadata information with full catalog federation support (`src/utils/schema_extractor.py`).
|
||||||
|
|
||||||
## System Requirements
|
## System Requirements
|
||||||
|
|
||||||
@@ -72,13 +73,14 @@ The following table lists the main tools currently available for invocation via
|
|||||||
|
|
||||||
| Tool Name | Description | Parameters | Status |
|
| Tool Name | Description | Parameters | Status |
|
||||||
| :-------------------------------- | :---------------------------------------------------------- | :--------------------------------------------------------------------------------------------------------- | :------- |
|
| :-------------------------------- | :---------------------------------------------------------- | :--------------------------------------------------------------------------------------------------------- | :------- |
|
||||||
| `mcp_doris_get_db_list` | Get a list of all database names on the server. | `random_string` (string, Required) | ✅ Active |
|
| `mcp_doris_get_catalog_list` | Get a list of all catalogs with detailed information. | `random_string` (string, Required) | ✅ Active |
|
||||||
| `mcp_doris_get_db_table_list` | Get a list of all table names in the specified database. | `random_string` (string, Required), `db_name` (string, Optional, defaults to current db) | ✅ Active |
|
| `mcp_doris_get_db_list` | Get a list of all database names in the specified catalog. | `random_string` (string, Required), `catalog_name` (string, Optional, defaults to internal catalog) | ✅ Active |
|
||||||
| `mcp_doris_get_table_schema` | Get detailed structure of the specified table. | `random_string` (string, Required), `table_name` (string, Required), `db_name` (string, Optional) | ✅ Active |
|
| `mcp_doris_get_db_table_list` | Get a list of all table names in the specified database. | `random_string` (string, Required), `db_name` (string, Optional), `catalog_name` (string, Optional) | ✅ Active |
|
||||||
| `mcp_doris_get_table_comment` | Get the comment for the specified table. | `random_string` (string, Required), `table_name` (string, Required), `db_name` (string, Optional) | ✅ Active |
|
| `mcp_doris_get_table_schema` | Get detailed structure of the specified table. | `random_string` (string, Required), `table_name` (string, Required), `db_name` (string, Optional), `catalog_name` (string, Optional) | ✅ Active |
|
||||||
| `mcp_doris_get_table_column_comments` | Get comments for all columns in the specified table. | `random_string` (string, Required), `table_name` (string, Required), `db_name` (string, Optional) | ✅ Active |
|
| `mcp_doris_get_table_comment` | Get the comment for the specified table. | `random_string` (string, Required), `table_name` (string, Required), `db_name` (string, Optional), `catalog_name` (string, Optional) | ✅ Active |
|
||||||
| `mcp_doris_get_table_indexes` | Get index information for the specified table. | `random_string` (string, Required), `table_name` (string, Required), `db_name` (string, Optional) | ✅ Active |
|
| `mcp_doris_get_table_column_comments` | Get comments for all columns in the specified table. | `random_string` (string, Required), `table_name` (string, Required), `db_name` (string, Optional), `catalog_name` (string, Optional) | ✅ Active |
|
||||||
| `mcp_doris_exec_query` | Execute SQL query and return result command. | `random_string` (string, Required), `sql` (string, Required), `db_name` (string, Optional), `max_rows` (integer, Optional, default 100), `timeout` (integer, Optional, default 30) | ✅ Active |
|
| `mcp_doris_get_table_indexes` | Get index information for the specified table. | `random_string` (string, Required), `table_name` (string, Required), `db_name` (string, Optional), `catalog_name` (string, Optional) | ✅ Active |
|
||||||
|
| `mcp_doris_exec_query` | Execute SQL query with catalog federation support. | `random_string` (string, Required), `sql` (string, Required - MUST use three-part naming), `db_name` (string, Optional), `catalog_name` (string, Optional), `max_rows` (integer, Optional, default 100), `timeout` (integer, Optional, default 30) | ✅ Active |
|
||||||
| `mcp_doris_get_recent_audit_logs` | Get audit log records for a recent period. | `random_string` (string, Required), `days` (integer, Optional, default 7), `limit` (integer, Optional, default 100) | ✅ Active |
|
| `mcp_doris_get_recent_audit_logs` | Get audit log records for a recent period. | `random_string` (string, Required), `days` (integer, Optional, default 7), `limit` (integer, Optional, default 100) | ✅ Active |
|
||||||
|
|
||||||
**Note:** All tools require a `random_string` parameter as a call identifier, typically handled automatically by the MCP client. "Optional" and "Required" refer to the tool's internal logic; the client might need to provide values for all parameters depending on its implementation. The tool names listed here are the base names; clients might see them prefixed (e.g., `mcp_doris_stdio3_get_db_list`) depending on the connection mode.
|
**Note:** All tools require a `random_string` parameter as a call identifier, typically handled automatically by the MCP client. "Optional" and "Required" refer to the tool's internal logic; the client might need to provide values for all parameters depending on its implementation. The tool names listed here are the base names; clients might see them prefixed (e.g., `mcp_doris_stdio3_get_db_list`) depending on the connection mode.
|
||||||
@@ -112,13 +114,81 @@ Interaction with the Doris MCP Server requires an **MCP Client**. The client con
|
|||||||
3. **Call Tool**: The client sends a `tool_call` message/request, specifying the `tool_name` and `arguments`.
|
3. **Call Tool**: The client sends a `tool_call` message/request, specifying the `tool_name` and `arguments`.
|
||||||
* **Example: Get Table Schema**
|
* **Example: Get Table Schema**
|
||||||
* `tool_name`: `mcp_doris_get_table_schema` (or the mode-specific name)
|
* `tool_name`: `mcp_doris_get_table_schema` (or the mode-specific name)
|
||||||
* `arguments`: Include `random_string`, `table_name`, `db_name`.
|
* `arguments`: Include `random_string`, `table_name`, `db_name`, `catalog_name`.
|
||||||
4. **Handle Response**:
|
4. **Handle Response**:
|
||||||
* **Non-streaming**: The client receives a response containing `result` or `error`.
|
* **Non-streaming**: The client receives a response containing `result` or `error`.
|
||||||
* **Streaming**: The client receives a series of `tools/progress` notifications, followed by a final response containing the `result` or `error`.
|
* **Streaming**: The client receives a series of `tools/progress` notifications, followed by a final response containing the `result` or `error`.
|
||||||
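The two response paths in step 4 can be sketched in Python as follows. This is only an illustration: the function name `handle_messages` and the exact message shapes (a `method` field on progress notifications, a `params` payload) are assumptions made for this sketch, not something the README specifies.

```python
def handle_messages(messages):
    """Consume messages in arrival order; return (result, progress) or raise on error."""
    progress = []
    for msg in messages:
        # Assumed shape: progress notifications carry method == "tools/progress".
        if msg.get("method") == "tools/progress":
            progress.append(msg.get("params"))
            continue
        # The final response contains either "error" or "result".
        if "error" in msg:
            raise RuntimeError(f"tool call failed: {msg['error']}")
        if "result" in msg:
            return msg["result"], progress
    raise RuntimeError("stream ended without a final response")


# Streaming: one progress notification, then the final response.
stream = [
    {"method": "tools/progress", "params": {"percent": 50}},
    {"result": {"rows": []}},
]
result, progress = handle_messages(stream)
```

A non-streaming response is the degenerate case: a list with a single message and no progress notifications.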
|
|
||||||
Specific tool names and parameters should be referenced from the `src/tools/` code or obtained via MCP discovery mechanisms.
|
Specific tool names and parameters should be referenced from the `src/tools/` code or obtained via MCP discovery mechanisms.
|
||||||
|
|
||||||
### Catalog Federation Support

The Doris MCP Server supports **catalog federation**, enabling interaction with multiple data catalogs (internal Doris tables and external data sources like Hive, MySQL, etc.) within a unified interface.

#### Key Features:

* **Multi-Catalog Metadata Access**: All metadata tools (`get_db_list`, `get_db_table_list`, `get_table_schema`, etc.) support an optional `catalog_name` parameter to query specific catalogs.
* **Cross-Catalog SQL Queries**: Execute SQL queries that span multiple catalogs using three-part table naming.
* **Catalog Discovery**: Use `mcp_doris_get_catalog_list` to discover available catalogs and their types.

#### Three-Part Naming Requirement:

**All SQL queries MUST use three-part naming for table references:**

* **Internal Tables**: `internal.database_name.table_name`
* **External Tables**: `catalog_name.database_name.table_name`
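The naming rule can be sketched as a small Python helper. This is a minimal illustration and not part of the server: `qualify_table` is a hypothetical name, and its default of `internal` simply mirrors the internal catalog mentioned in this README.

```python
def qualify_table(table_name: str, db_name: str, catalog_name: str = "internal") -> str:
    """Build a three-part table reference: catalog.database.table."""
    parts = [catalog_name, db_name, table_name]
    for part in parts:
        # Reject empty parts and embedded dots so the result always has exactly three segments.
        if not part or "." in part:
            raise ValueError(f"invalid identifier part: {part!r}")
    return ".".join(parts)


print(qualify_table("customer", "ssb"))           # internal.ssb.customer
print(qualify_table("customer", "ssb", "mysql"))  # mysql.ssb.customer
```

The helper only concatenates names; it does not quote or escape identifiers, which a real client would have to handle separately.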

#### Examples:

1. **Get Available Catalogs:**
    ```json
    {
      "tool_name": "mcp_doris_get_catalog_list",
      "arguments": {"random_string": "unique_id"}
    }
    ```

2. **Get Databases in Specific Catalog:**
    ```json
    {
      "tool_name": "mcp_doris_get_db_list",
      "arguments": {"random_string": "unique_id", "catalog_name": "mysql"}
    }
    ```

3. **Query Internal Catalog:**
    ```json
    {
      "tool_name": "mcp_doris_exec_query",
      "arguments": {
        "random_string": "unique_id",
        "sql": "SELECT COUNT(*) FROM internal.ssb.customer"
      }
    }
    ```

4. **Query External Catalog:**
    ```json
    {
      "tool_name": "mcp_doris_exec_query",
      "arguments": {
        "random_string": "unique_id",
        "sql": "SELECT COUNT(*) FROM mysql.ssb.customer"
      }
    }
    ```

5. **Cross-Catalog Query:**
    ```json
    {
      "tool_name": "mcp_doris_exec_query",
      "arguments": {
        "random_string": "unique_id",
        "sql": "SELECT i.c_name, m.external_data FROM internal.ssb.customer i JOIN mysql.test.user_info m ON i.c_custkey = m.customer_id"
      }
    }
    ```
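A client could assemble payloads of the same shape programmatically. The sketch below is an assumption-laden illustration: `build_tool_call` is a hypothetical helper, and using a UUID for `random_string` is just one way to produce a unique call identifier.

```python
import json
import uuid


def build_tool_call(tool_name: str, **arguments) -> dict:
    """Assemble a payload with the same shape as the JSON examples in this section."""
    # random_string is required by every tool; a UUID is one way to make it unique.
    args = {"random_string": uuid.uuid4().hex}
    # Drop optional arguments left as None so the tool's own defaults apply.
    args.update({k: v for k, v in arguments.items() if v is not None})
    return {"tool_name": tool_name, "arguments": args}


payload = build_tool_call(
    "mcp_doris_exec_query",
    sql="SELECT COUNT(*) FROM mysql.ssb.customer",
    max_rows=10,
    catalog_name=None,  # omitted from the payload because it is None
)
print(json.dumps(payload, indent=2))
```

How the payload is actually transported (SSE, Streamable HTTP, or Stdio) depends on the MCP client and is outside this sketch.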
|
|
||||||
## Connecting with Cursor
|
## Connecting with Cursor
|
||||||
|
|
||||||
You can connect Cursor to this MCP server using either Stdio or SSE mode.
|
You can connect Cursor to this MCP server using either Stdio or SSE mode.
|
||||||
@@ -225,7 +295,7 @@ This section outlines the process for adding new MCP tools to the Doris MCP Serv
|
|||||||
Before writing new database interaction logic from scratch, check the existing utility modules:
|
Before writing new database interaction logic from scratch, check the existing utility modules:
|
||||||
|
|
||||||
* **`doris_mcp_server/utils/db.py`**: Provides basic functions for getting database connections (`get_db_connection`) and executing raw queries (`execute_query`, `execute_query_df`).
|
* **`doris_mcp_server/utils/db.py`**: Provides basic functions for getting database connections (`get_db_connection`) and executing raw queries (`execute_query`, `execute_query_df`).
|
||||||
* **`doris_mcp_server/utils/schema_extractor.py` (`MetadataExtractor` class)**: Offers high-level methods to retrieve database metadata, such as listing databases/tables (`get_all_databases`, `get_database_tables`), getting table schemas/comments/indexes (`get_table_schema`, `get_table_comment`, `get_column_comments`, `get_table_indexes`), and accessing audit logs (`get_recent_audit_logs`). It includes caching mechanisms.
|
* **`doris_mcp_server/utils/schema_extractor.py` (`MetadataExtractor` class)**: Offers high-level methods to retrieve database metadata with catalog federation support, such as listing databases/tables (`get_all_databases`, `get_database_tables`), getting table schemas/comments/indexes (`get_table_schema`, `get_table_comment`, `get_column_comments`, `get_table_indexes`), and accessing audit logs (`get_recent_audit_logs`). All methods support optional `catalog_name` parameters for multi-catalog environments. It includes caching mechanisms.
|
||||||
* **`doris_mcp_server/utils/sql_executor_tools.py` (`execute_sql_query` function)**: Provides a wrapper around `db.execute_query` that includes security checks (optional, controlled by `ENABLE_SQL_SECURITY_CHECK` env var), adds automatic `LIMIT` to SELECT queries, handles result serialization (dates, decimals), and formats the output into the standard MCP success/error structure. **It's recommended to use this for executing user-provided or generated SQL.**
|
* **`doris_mcp_server/utils/sql_executor_tools.py` (`execute_sql_query` function)**: Provides a wrapper around `db.execute_query` that includes security checks (optional, controlled by `ENABLE_SQL_SECURITY_CHECK` env var), adds automatic `LIMIT` to SELECT queries, handles result serialization (dates, decimals), and formats the output into the standard MCP success/error structure. **It's recommended to use this for executing user-provided or generated SQL.**
|
||||||
|
|
||||||
You can import and combine functionalities from these modules to build your new tool.
|
You can import and combine functionalities from these modules to build your new tool.
|
||||||
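To give a feel for the post-processing attributed to `execute_sql_query` above (automatic `LIMIT`, serialization of dates and decimals), here is a rough standalone sketch. The helpers `add_default_limit` and `to_jsonable` are hypothetical names written for this illustration; the project's actual implementation may differ.

```python
import datetime
import decimal
import re


def add_default_limit(sql: str, max_rows: int = 100) -> str:
    """Append LIMIT to a SELECT statement that does not already end with one."""
    stripped = sql.strip().rstrip(";")
    is_select = re.match(r"(?i)\s*select\b", stripped) is not None
    has_limit = re.search(
        r"(?i)\blimit\s+\d+(\s*,\s*\d+|\s+offset\s+\d+)?\s*$", stripped
    ) is not None
    if is_select and not has_limit:
        return f"{stripped} LIMIT {max_rows}"
    return stripped


def to_jsonable(value):
    """Convert dates and decimals to strings so json.dumps accepts the value."""
    if isinstance(value, (datetime.date, datetime.datetime)):
        return value.isoformat()
    if isinstance(value, decimal.Decimal):
        return str(value)
    return value


print(add_default_limit("SELECT * FROM internal.ssb.customer;"))
```

The regular expressions only cover simple statements; a query with a trailing comment or a `LIMIT` inside a subquery would need a real SQL parser.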
|
|||||||