将flask改成fastapi

This commit is contained in:
2025-10-13 13:18:03 +08:00
commit 88db2539b0
476 changed files with 739741 additions and 0 deletions

85
docker/README.md Normal file
View File

@@ -0,0 +1,85 @@
# RAGFlow Docker 服务管理
## 问题解决
原来的配置每次启动 `docker-compose.yml` 都会重新创建 `docker-compose-base.yml` 中的服务,现在已修改为只启动 ragflow 服务。
## Docker Compose 网络命名说明
Docker Compose 会自动为网络名称添加项目前缀:
- **项目名** + **网络名** = 最终网络名称
- 默认项目名通常是目录名:`ragflow-20250916`
- 最终网络名:`ragflow-20250916_ragflow`
## 修改内容
1. **移除了 `include` 指令**:不再包含 `docker-compose-base.yml`
2. **使用外部网络**ragflow 服务连接到由 `docker-compose-base.yml` 创建的 `ragflow-20250916_ragflow` 网络
3. **移除了 `depends_on`**:不再依赖 postgres 健康检查
4. **网络配置**
- `docker-compose-base.yml` 创建名为 `ragflow-20250916_ragflow` 的网络
- `docker-compose.yml` 使用 `external: true` 连接到已存在的网络
5. **使用项目名**:通过 `-p ragflow` 参数统一项目名
## 使用方法
### 首次使用(初始化)
```bash
# 1. 启动基础服务(创建网络)
docker-compose -p ragflow -f docker-compose-base.yml up -d
# 2. 启动 ragflow 服务
docker-compose -p ragflow -f docker-compose.yml up -d ragflow
```
### 日常使用(只启动 ragflow
```bash
# 使用脚本(推荐)
./start-ragflow.sh
# 或手动启动
docker-compose -p ragflow -f docker-compose.yml up -d ragflow
```
### 使用 ragflow.sh完整管理
```bash
# 启动 RAGFlow 服务(不重新创建基础服务)
./ragflow.sh start
# 停止 RAGFlow 服务(保留基础服务)
./ragflow.sh stop
# 重启 RAGFlow 服务
./ragflow.sh restart
# 查看服务状态
./ragflow.sh status
# 查看日志
./ragflow.sh logs
```
### 手动操作
```bash
# 只启动基础服务
docker-compose -f docker-compose-base.yml up -d
# 只启动 ragflow 服务
docker-compose -f docker-compose.yml up -d ragflow
# 停止 ragflow 服务
docker-compose -f docker-compose.yml down
```
## 服务说明
- **基础服务**postgres、redis、minio、opensearch
- **应用服务**ragflow-server
- **网络**ragflow外部网络
## 优势
1. **快速启动**:只启动需要的服务
2. **数据持久**:基础服务数据不会丢失
3. **灵活管理**:可以独立管理各个服务
4. **资源节约**:避免不必要的服务重建

View File

@@ -0,0 +1,33 @@
# The RAGFlow team do not actively maintain docker-compose-CN-oc9.yml, so use them at your own risk.
# However, you are welcome to file a pull request to improve it.
include:
- ./docker-compose-base.yml
services:
ragflow:
depends_on:
mysql:
condition: service_healthy
image: edwardelric233/ragflow:oc9
container_name: ragflow-server
ports:
- ${SVR_HTTP_PORT}:9380
- 80:80
- 443:443
volumes:
- ./ragflow-logs:/ragflow/logs
- ./nginx/ragflow.conf:/etc/nginx/conf.d/ragflow.conf
- ./nginx/proxy.conf:/etc/nginx/proxy.conf
- ./nginx/nginx.conf:/etc/nginx/nginx.conf
env_file: .env
environment:
- TZ=${TIMEZONE}
- HF_ENDPOINT=${HF_ENDPOINT}
- MACOS=${MACOS}
networks:
- ragflow
restart: on-failure
# https://docs.docker.com/engine/daemon/prometheus/#create-a-prometheus-configuration
# If you're using Docker Desktop, the --add-host flag is optional. This flag makes sure that the host's internal IP gets exposed to the Prometheus container.
extra_hosts:
- "host.docker.internal:host-gateway"

View File

@@ -0,0 +1,126 @@
services:
opensearch01:
container_name: ragflow-opensearch-01
image: hub.icert.top/opensearchproject/opensearch:2.19.1
volumes:
- osdata01:/usr/share/opensearch/data
ports:
- ${OS_PORT}:9201
env_file: .env
environment:
- node.name=opensearch01
- OPENSEARCH_PASSWORD=${OPENSEARCH_PASSWORD}
- OPENSEARCH_INITIAL_ADMIN_PASSWORD=${OPENSEARCH_PASSWORD}
- bootstrap.memory_lock=false
- discovery.type=single-node
- plugins.security.disabled=false
- plugins.security.ssl.http.enabled=false
- plugins.security.ssl.transport.enabled=true
- cluster.routing.allocation.disk.watermark.low=5gb
- cluster.routing.allocation.disk.watermark.high=3gb
- cluster.routing.allocation.disk.watermark.flood_stage=2gb
- TZ=${TIMEZONE}
- http.port=9201
mem_limit: ${MEM_LIMIT}
ulimits:
memlock:
soft: -1
hard: -1
healthcheck:
test: ["CMD-SHELL", "curl http://localhost:9201"]
interval: 10s
timeout: 10s
retries: 120
networks:
- ragflow
restart: on-failure
postgres:
image: postgres:15
container_name: ragflow-postgres
env_file: .env
environment:
- POSTGRES_DB=${POSTGRES_DBNAME}
- POSTGRES_USER=${POSTGRES_USER}
- POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
- TZ=${TIMEZONE}
ports:
- ${POSTGRES_PORT-5440}:5432
volumes:
- postgres_data:/var/lib/postgresql/data
networks:
- ragflow
healthcheck:
test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER} -d ${POSTGRES_DBNAME}"]
interval: 10s
timeout: 5s
retries: 5
restart: on-failure
minio:
image: quay.io/minio/minio:RELEASE.2025-06-13T11-33-47Z
container_name: ragflow-minio
command: server --console-address ":9001" /data
ports:
- ${MINIO_PORT}:9000
- ${MINIO_CONSOLE_PORT}:9001
env_file: .env
environment:
- MINIO_ROOT_USER=${MINIO_USER}
- MINIO_ROOT_PASSWORD=${MINIO_PASSWORD}
- TZ=${TIMEZONE}
volumes:
- minio_data:/data
networks:
- ragflow
restart: on-failure
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
interval: 30s
timeout: 20s
retries: 3
redis:
# swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/valkey/valkey:8
image: valkey/valkey
container_name: ragflow-redis
command: redis-server --requirepass ${REDIS_PASSWORD} --maxmemory 128mb --maxmemory-policy allkeys-lru
env_file: .env
ports:
- ${REDIS_PORT}:6379
volumes:
- redis_data:/data
networks:
- ragflow
restart: on-failure
healthcheck:
test: ["CMD", "redis-cli", "-a", "${REDIS_PASSWORD}", "ping"]
interval: 5s
timeout: 3s
retries: 3
start_period: 10s
volumes:
esdata01:
driver: local
osdata01:
driver: local
infinity_data:
driver: local
mysql_data:
driver: local
minio_data:
driver: local
redis_data:
driver: local
postgres_data:
driver: local
networks:
ragflow:
name: ragflow-20250916_ragflow
driver: bridge

View File

@@ -0,0 +1,40 @@
# The RAGFlow team do not actively maintain docker-compose-gpu-CN-oc9.yml, so use them at your own risk.
# However, you are welcome to file a pull request to improve it.
include:
- ./docker-compose-base.yml
services:
ragflow:
depends_on:
mysql:
condition: service_healthy
image: edwardelric233/ragflow:oc9
container_name: ragflow-server
ports:
- ${SVR_HTTP_PORT}:9380
- 80:80
- 443:443
volumes:
- ./ragflow-logs:/ragflow/logs
- ./nginx/ragflow.conf:/etc/nginx/conf.d/ragflow.conf
- ./nginx/proxy.conf:/etc/nginx/proxy.conf
- ./nginx/nginx.conf:/etc/nginx/nginx.conf
env_file: .env
environment:
- TZ=${TIMEZONE}
- HF_ENDPOINT=${HF_ENDPOINT}
- MACOS=${MACOS}
networks:
- ragflow
restart: on-failure
# https://docs.docker.com/engine/daemon/prometheus/#create-a-prometheus-configuration
# If you're using Docker Desktop, the --add-host flag is optional. This flag makes sure that the host's internal IP gets exposed to the Prometheus container.
extra_hosts:
- "host.docker.internal:host-gateway"
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: all
capabilities: [gpu]

View File

@@ -0,0 +1,40 @@
# The RAGFlow team do not actively maintain docker-compose-gpu.yml, so use them at your own risk.
# Pull requests to improve it are welcome.
include:
- ./docker-compose-base.yml
services:
ragflow:
depends_on:
mysql:
condition: service_healthy
image: ${RAGFLOW_IMAGE}
container_name: ragflow-server
ports:
- ${SVR_HTTP_PORT}:9380
- 80:80
- 443:443
volumes:
- ./ragflow-logs:/ragflow/logs
- ./nginx/ragflow.conf:/etc/nginx/conf.d/ragflow.conf
- ./nginx/proxy.conf:/etc/nginx/proxy.conf
- ./nginx/nginx.conf:/etc/nginx/nginx.conf
env_file: .env
environment:
- TZ=${TIMEZONE}
- HF_ENDPOINT=${HF_ENDPOINT}
- MACOS=${MACOS}
networks:
- ragflow
restart: on-failure
# https://docs.docker.com/engine/daemon/prometheus/#create-a-prometheus-configuration
# If you're using Docker Desktop, the --add-host flag is optional. This flag makes sure that the host's internal IP gets exposed to the Prometheus container.
extra_hosts:
- "host.docker.internal:host-gateway"
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: all
capabilities: [gpu]

View File

@@ -0,0 +1,57 @@
include:
- ./docker-compose-base.yml
services:
ragflow:
platform: linux/amd64
depends_on:
mysql:
condition: service_healthy
build:
context: ../
dockerfile: Dockerfile
container_name: ragflow-server
ports:
- ${SVR_HTTP_PORT}:9380
- 80:80
- 443:443
volumes:
- ./ragflow-logs:/ragflow/logs
- ./nginx/ragflow.conf:/etc/nginx/conf.d/ragflow.conf
- ./nginx/proxy.conf:/etc/nginx/proxy.conf
- ./nginx/nginx.conf:/etc/nginx/nginx.conf
env_file: .env
environment:
- TZ=${TIMEZONE}
- HF_ENDPOINT=${HF_ENDPOINT}
- MACOS=${MACOS:-1}
- LIGHTEN=${LIGHTEN:-1}
networks:
- ragflow
restart: on-failure
# https://docs.docker.com/engine/daemon/prometheus/#create-a-prometheus-configuration
# If you're using Docker Desktop, the --add-host flag is optional. This flag makes sure that the host's internal IP gets exposed to the Prometheus container.
extra_hosts:
- "host.docker.internal:host-gateway"
# executor:
# depends_on:
# mysql:
# condition: service_healthy
# image: ${RAGFLOW_IMAGE}
# container_name: ragflow-executor
# volumes:
# - ./ragflow-logs:/ragflow/logs
# - ./nginx/ragflow.conf:/etc/nginx/conf.d/ragflow.conf
# env_file: .env
# environment:
# - TZ=${TIMEZONE}
# - HF_ENDPOINT=${HF_ENDPOINT}
# - MACOS=${MACOS}
# entrypoint: "/ragflow/entrypoint_task_executor.sh 1 3"
# networks:
# - ragflow
# restart: on-failure
# # https://docs.docker.com/engine/daemon/prometheus/#create-a-prometheus-configuration
# # If you're using Docker Desktop, the --add-host flag is optional. This flag makes sure that the host's internal IP gets exposed to the Prometheus container.
# extra_hosts:
# - "host.docker.internal:host-gateway"

53
docker/docker-compose.yml Normal file
View File

@@ -0,0 +1,53 @@
# To ensure that the container processes the locally modified `service_conf.yaml.template` instead of the one included in its image, you need to mount the local `service_conf.yaml.template` to the container.
services:
ragflow:
image: ${RAGFLOW_IMAGE}
# Example configuration to set up an MCP server:
# command:
# - --enable-mcpserver
# - --mcp-host=0.0.0.0
# - --mcp-port=9382
# - --mcp-base-url=http://127.0.0.1:9380
# - --mcp-script-path=/ragflow/mcp/server/server.py
# - --mcp-mode=self-host
# - --mcp-host-api-key=ragflow-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
# Optional transport flags for MCP (customize if needed).
# Host mode need to combined with --no-transport-streamable-http-enabled flag, namely, host+streamable-http is not supported yet.
# The following are enabled by default unless explicitly disabled with --no-<flag>.
# - --no-transport-sse-enabled # Disable legacy SSE endpoints (/sse and /messages/)
# - --no-transport-streamable-http-enabled # Disable Streamable HTTP transport (/mcp endpoint)
# - --no-json-response # Disable JSON response mode in Streamable HTTP transport (instead of SSE over HTTP)
container_name: ragflow-server
ports:
- ${SVR_HTTP_PORT}:9380
- 8000:80
- 8443:443
- 15678:5678
- 15679:5679
- 19382:9382 # entry for MCP (host_port:docker_port). The docker_port must match the value you set for `mcp-port` above.
volumes:
- ./ragflow-logs:/ragflow/logs
- ./nginx/ragflow.conf:/etc/nginx/conf.d/ragflow.conf
- ./nginx/proxy.conf:/etc/nginx/proxy.conf
- ./nginx/nginx.conf:/etc/nginx/nginx.conf
- ../history_data_agent:/ragflow/history_data_agent
- ./service_conf.yaml.template:/ragflow/conf/service_conf.yaml.template
- ./entrypoint.sh:/ragflow/entrypoint.sh
env_file: .env
environment:
- TZ=${TIMEZONE}
- HF_ENDPOINT=${HF_ENDPOINT-}
- MACOS=${MACOS-}
- DB_TYPE=postgres
networks:
- ragflow
restart: on-failure
# https://docs.docker.com/engine/daemon/prometheus/#create-a-prometheus-configuration
# If you use Docker Desktop, the --add-host flag is optional. This flag ensures that the host's internal IP is exposed to the Prometheus container.
extra_hosts:
- "host.docker.internal:host-gateway"
networks:
ragflow:
name: ragflow-20250916_ragflow
external: true

210
docker/entrypoint.sh Normal file
View File

@@ -0,0 +1,210 @@
#!/usr/bin/env bash
set -e
# -----------------------------------------------------------------------------
# Usage and command-line argument parsing
# -----------------------------------------------------------------------------
function usage() {
echo "Usage: $0 [--disable-webserver] [--disable-taskexecutor] [--consumer-no-beg=<num>] [--consumer-no-end=<num>] [--workers=<num>] [--host-id=<string>]"
echo
echo " --disable-webserver Disables the web server (nginx + ragflow_server)."
echo " --disable-taskexecutor Disables task executor workers."
echo " --enable-mcpserver Enables the MCP server."
echo " --consumer-no-beg=<num> Start range for consumers (if using range-based)."
echo " --consumer-no-end=<num> End range for consumers (if using range-based)."
echo " --workers=<num> Number of task executors to run (if range is not used)."
echo " --host-id=<string> Unique ID for the host (defaults to \`hostname\`)."
echo
echo "Examples:"
echo " $0 --disable-taskexecutor"
echo " $0 --disable-webserver --consumer-no-beg=0 --consumer-no-end=5"
echo " $0 --disable-webserver --workers=2 --host-id=myhost123"
echo " $0 --enable-mcpserver"
exit 1
}
ENABLE_WEBSERVER=1 # Default to enable web server
ENABLE_TASKEXECUTOR=1 # Default to enable task executor
ENABLE_MCP_SERVER=0
CONSUMER_NO_BEG=0
CONSUMER_NO_END=0
WORKERS=1
MCP_HOST="127.0.0.1"
MCP_PORT=9382
MCP_BASE_URL="http://127.0.0.1:9380"
MCP_SCRIPT_PATH="/ragflow/mcp/server/server.py"
MCP_MODE="self-host"
MCP_HOST_API_KEY=""
MCP_TRANSPORT_SSE_FLAG="--transport-sse-enabled"
MCP_TRANSPORT_STREAMABLE_HTTP_FLAG="--transport-streamable-http-enabled"
MCP_JSON_RESPONSE_FLAG="--json-response"
# -----------------------------------------------------------------------------
# Host ID logic:
# 1. By default, use the system hostname if length <= 32
# 2. Otherwise, use the full MD5 hash of the hostname (32 hex chars)
# -----------------------------------------------------------------------------
CURRENT_HOSTNAME="$(hostname)"
if [ ${#CURRENT_HOSTNAME} -le 32 ]; then
DEFAULT_HOST_ID="$CURRENT_HOSTNAME"
else
DEFAULT_HOST_ID="$(echo -n "$CURRENT_HOSTNAME" | md5sum | cut -d ' ' -f 1)"
fi
HOST_ID="$DEFAULT_HOST_ID"
# Parse arguments
for arg in "$@"; do
case $arg in
--disable-webserver)
ENABLE_WEBSERVER=0
shift
;;
--disable-taskexecutor)
ENABLE_TASKEXECUTOR=0
shift
;;
--enable-mcpserver)
ENABLE_MCP_SERVER=1
shift
;;
--mcp-host=*)
MCP_HOST="${arg#*=}"
shift
;;
--mcp-port=*)
MCP_PORT="${arg#*=}"
shift
;;
--mcp-base-url=*)
MCP_BASE_URL="${arg#*=}"
shift
;;
--mcp-mode=*)
MCP_MODE="${arg#*=}"
shift
;;
--mcp-host-api-key=*)
MCP_HOST_API_KEY="${arg#*=}"
shift
;;
--mcp-script-path=*)
MCP_SCRIPT_PATH="${arg#*=}"
shift
;;
--no-transport-sse-enabled)
MCP_TRANSPORT_SSE_FLAG="--no-transport-sse-enabled"
shift
;;
--no-transport-streamable-http-enabled)
MCP_TRANSPORT_STREAMABLE_HTTP_FLAG="--no-transport-streamable-http-enabled"
shift
;;
--no-json-response)
MCP_JSON_RESPONSE_FLAG="--no-json-response"
shift
;;
--consumer-no-beg=*)
CONSUMER_NO_BEG="${arg#*=}"
shift
;;
--consumer-no-end=*)
CONSUMER_NO_END="${arg#*=}"
shift
;;
--workers=*)
WORKERS="${arg#*=}"
shift
;;
--host-id=*)
HOST_ID="${arg#*=}"
shift
;;
*)
usage
;;
esac
done
# -----------------------------------------------------------------------------
# Replace env variables in the service_conf.yaml file
# -----------------------------------------------------------------------------
CONF_DIR="/ragflow/conf"
TEMPLATE_FILE="${CONF_DIR}/service_conf.yaml.template"
CONF_FILE="${CONF_DIR}/service_conf.yaml"
rm -f "${CONF_FILE}"
while IFS= read -r line || [[ -n "$line" ]]; do
eval "echo \"$line\"" >> "${CONF_FILE}"
done < "${TEMPLATE_FILE}"
export LD_LIBRARY_PATH="/usr/lib/x86_64-linux-gnu/"
PY=python3
# -----------------------------------------------------------------------------
# Function(s)
# -----------------------------------------------------------------------------
function task_exe() {
local consumer_id="$1"
local host_id="$2"
JEMALLOC_PATH="$(pkg-config --variable=libdir jemalloc)/libjemalloc.so"
while true; do
LD_PRELOAD="$JEMALLOC_PATH" \
"$PY" rag/svr/task_executor.py "${host_id}_${consumer_id}"
done
}
function start_mcp_server() {
echo "Starting MCP Server on ${MCP_HOST}:${MCP_PORT} with base URL ${MCP_BASE_URL}..."
"$PY" "${MCP_SCRIPT_PATH}" \
--host="${MCP_HOST}" \
--port="${MCP_PORT}" \
--base-url="${MCP_BASE_URL}" \
--mode="${MCP_MODE}" \
--api-key="${MCP_HOST_API_KEY}" \
"${MCP_TRANSPORT_SSE_FLAG}" \
"${MCP_TRANSPORT_STREAMABLE_HTTP_FLAG}" \
"${MCP_JSON_RESPONSE_FLAG}" &
}
# -----------------------------------------------------------------------------
# Start components based on flags
# -----------------------------------------------------------------------------
if [[ "${ENABLE_WEBSERVER}" -eq 1 ]]; then
echo "Starting nginx..."
/usr/sbin/nginx
echo "Starting ragflow_server..."
while true; do
"$PY" api/ragflow_server.py
done &
fi
if [[ "${ENABLE_MCP_SERVER}" -eq 1 ]]; then
start_mcp_server
fi
if [[ "${ENABLE_TASKEXECUTOR}" -eq 1 ]]; then
if [[ "${CONSUMER_NO_END}" -gt "${CONSUMER_NO_BEG}" ]]; then
echo "Starting task executors on host '${HOST_ID}' for IDs in [${CONSUMER_NO_BEG}, ${CONSUMER_NO_END})..."
for (( i=CONSUMER_NO_BEG; i<CONSUMER_NO_END; i++ ))
do
task_exe "${i}" "${HOST_ID}" &
done
else
# Otherwise, start a fixed number of workers
echo "Starting ${WORKERS} task executor(s) on host '${HOST_ID}'..."
for (( i=0; i<WORKERS; i++ ))
do
task_exe "${i}" "${HOST_ID}" &
done
fi
fi
wait

57
docker/infinity_conf.toml Normal file
View File

@@ -0,0 +1,57 @@
[general]
version = "0.6.0"
time_zone = "utc-8"
[network]
server_address = "0.0.0.0"
postgres_port = 5432
http_port = 23820
client_port = 23817
connection_pool_size = 128
[log]
log_filename = "infinity.log"
log_dir = "/var/infinity/log"
log_to_stdout = true
log_file_max_size = "100MB"
log_file_rotate_count = 10
# trace/debug/info/warning/error/critical 6 log levels, default: info
log_level = "trace"
[storage]
persistence_dir = "/var/infinity/persistence"
data_dir = "/var/infinity/data"
# periodically activates garbage collection:
# 0 means real-time,
# s means seconds, for example "60s", 60 seconds
# m means minutes, for example "60m", 60 minutes
# h means hours, for example "1h", 1 hour
optimize_interval = "10s"
cleanup_interval = "60s"
compact_interval = "120s"
storage_type = "local"
# dump memory index entry when it reachs the capacity
mem_index_capacity = 65536
# S3 storage config example:
# [storage.object_storage]
# url = "127.0.0.1:9000"
# bucket_name = "infinity"
# access_key = "minioadmin"
# secret_key = "minioadmin"
# enable_https = false
[buffer]
buffer_manager_size = "8GB"
lru_num = 7
temp_dir = "/var/infinity/tmp"
result_cache = "off"
memindex_memory_quota = "1GB"
[wal]
wal_dir = "/var/infinity/wal"
[resource]
resource_dir = "/var/infinity/resource"

2
docker/init.sql Normal file
View File

@@ -0,0 +1,2 @@
CREATE DATABASE IF NOT EXISTS rag_flow;
USE rag_flow;

View File

@@ -0,0 +1,129 @@
#!/bin/bash
# Exit immediately if a command exits with a non-zero status
set -e
# Function to load environment variables from .env file
load_env_file() {
# Get the directory of the current script
local script_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
local env_file="$script_dir/.env"
# Check if .env file exists
if [ -f "$env_file" ]; then
echo "Loading environment variables from: $env_file"
# Source the .env file
set -a
source "$env_file"
set +a
else
echo "Warning: .env file not found at: $env_file"
fi
}
# Load environment variables
load_env_file
# Unset HTTP proxies that might be set by Docker daemon
export http_proxy=""; export https_proxy=""; export no_proxy=""; export HTTP_PROXY=""; export HTTPS_PROXY=""; export NO_PROXY=""
export PYTHONPATH=$(pwd)
export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu/
JEMALLOC_PATH=$(pkg-config --variable=libdir jemalloc)/libjemalloc.so
PY=python3
# Set default number of workers if WS is not set or less than 1
if [[ -z "$WS" || $WS -lt 1 ]]; then
WS=1
fi
# Maximum number of retries for each task executor and server
MAX_RETRIES=5
# Flag to control termination
STOP=false
# Array to keep track of child PIDs
PIDS=()
# Set the path to the NLTK data directory
export NLTK_DATA="./nltk_data"
# Function to handle termination signals
cleanup() {
echo "Termination signal received. Shutting down..."
STOP=true
# Terminate all child processes
for pid in "${PIDS[@]}"; do
if kill -0 "$pid" 2>/dev/null; then
echo "Killing process $pid"
kill "$pid"
fi
done
exit 0
}
# Trap SIGINT and SIGTERM to invoke cleanup
trap cleanup SIGINT SIGTERM
# Function to execute task_executor with retry logic
task_exe(){
local task_id=$1
local retry_count=0
while ! $STOP && [ $retry_count -lt $MAX_RETRIES ]; do
echo "Starting task_executor.py for task $task_id (Attempt $((retry_count+1)))"
LD_PRELOAD=$JEMALLOC_PATH $PY rag/svr/task_executor.py "$task_id"
EXIT_CODE=$?
if [ $EXIT_CODE -eq 0 ]; then
echo "task_executor.py for task $task_id exited successfully."
break
else
echo "task_executor.py for task $task_id failed with exit code $EXIT_CODE. Retrying..." >&2
retry_count=$((retry_count + 1))
sleep 2
fi
done
if [ $retry_count -ge $MAX_RETRIES ]; then
echo "task_executor.py for task $task_id failed after $MAX_RETRIES attempts. Exiting..." >&2
cleanup
fi
}
# Function to execute ragflow_server with retry logic
run_server(){
local retry_count=0
while ! $STOP && [ $retry_count -lt $MAX_RETRIES ]; do
echo "Starting ragflow_server.py (Attempt $((retry_count+1)))"
$PY api/ragflow_server.py
EXIT_CODE=$?
if [ $EXIT_CODE -eq 0 ]; then
echo "ragflow_server.py exited successfully."
break
else
echo "ragflow_server.py failed with exit code $EXIT_CODE. Retrying..." >&2
retry_count=$((retry_count + 1))
sleep 2
fi
done
if [ $retry_count -ge $MAX_RETRIES ]; then
echo "ragflow_server.py failed after $MAX_RETRIES attempts. Exiting..." >&2
cleanup
fi
}
# Start task executors
for ((i=0;i<WS;i++))
do
task_exe "$i" &
PIDS+=($!)
done
# Start the main server
run_server &
PIDS+=($!)
# Wait for all background processes to finish
wait

298
docker/migration.sh Normal file
View File

@@ -0,0 +1,298 @@
#!/bin/bash
# RAGFlow Data Migration Script
# Usage: ./migration.sh [backup|restore] [backup_folder]
#
# This script helps you backup and restore RAGFlow Docker volumes
# including MySQL, MinIO, Redis, and Elasticsearch data.
set -e # Exit on any error
# Instead, we'll handle errors manually for better debugging experience
# Default values
DEFAULT_BACKUP_FOLDER="backup"
VOLUMES=("docker_mysql_data" "docker_minio_data" "docker_redis_data" "docker_esdata01")
BACKUP_FILES=("mysql_backup.tar.gz" "minio_backup.tar.gz" "redis_backup.tar.gz" "es_backup.tar.gz")
# Function to display help information
show_help() {
echo "RAGFlow Data Migration Tool"
echo ""
echo "USAGE:"
echo " $0 <operation> [backup_folder]"
echo ""
echo "OPERATIONS:"
echo " backup - Create backup of all RAGFlow data volumes"
echo " restore - Restore RAGFlow data volumes from backup"
echo " help - Show this help message"
echo ""
echo "PARAMETERS:"
echo " backup_folder - Name of backup folder (default: '$DEFAULT_BACKUP_FOLDER')"
echo ""
echo "EXAMPLES:"
echo " $0 backup # Backup to './backup' folder"
echo " $0 backup my_backup # Backup to './my_backup' folder"
echo " $0 restore # Restore from './backup' folder"
echo " $0 restore my_backup # Restore from './my_backup' folder"
echo ""
echo "DOCKER VOLUMES:"
echo " - docker_mysql_data (MySQL database)"
echo " - docker_minio_data (MinIO object storage)"
echo " - docker_redis_data (Redis cache)"
echo " - docker_esdata01 (Elasticsearch indices)"
}
# Function to check if Docker is running
check_docker() {
if ! docker info >/dev/null 2>&1; then
echo "❌ Error: Docker is not running or not accessible"
echo "Please start Docker and try again"
exit 1
fi
}
# Function to check if volume exists
volume_exists() {
local volume_name=$1
docker volume inspect "$volume_name" >/dev/null 2>&1
}
# Function to check if any containers are using the target volumes
check_containers_using_volumes() {
echo "🔍 Checking for running containers that might be using target volumes..."
# Get all running containers
local running_containers=$(docker ps --format "{{.Names}}")
if [ -z "$running_containers" ]; then
echo "✅ No running containers found"
return 0
fi
# Check each running container for volume usage
local containers_using_volumes=()
local volume_usage_details=()
for container in $running_containers; do
# Get container's mount information
local mounts=$(docker inspect "$container" --format '{{range .Mounts}}{{.Source}}{{"|"}}{{end}}' 2>/dev/null || echo "")
# Check if any of our target volumes are used by this container
for volume in "${VOLUMES[@]}"; do
if echo "$mounts" | grep -q "$volume"; then
containers_using_volumes+=("$container")
volume_usage_details+=("$container -> $volume")
break
fi
done
done
# If any containers are using our volumes, show error and exit
if [ ${#containers_using_volumes[@]} -gt 0 ]; then
echo ""
echo "❌ ERROR: Found running containers using target volumes!"
echo ""
echo "📋 Running containers status:"
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Image}}"
echo ""
echo "🔗 Volume usage details:"
for detail in "${volume_usage_details[@]}"; do
echo " - $detail"
done
echo ""
echo "🛑 SOLUTION: Stop the containers before performing backup/restore operations:"
echo " docker-compose -f docker/<your-docker-compose-file>.yml down"
echo ""
echo "💡 After backup/restore, you can restart with:"
echo " docker-compose -f docker/<your-docker-compose-file>.yml up -d"
echo ""
exit 1
fi
echo "✅ No containers are using target volumes, safe to proceed"
return 0
}
# Function to confirm user action
confirm_action() {
local message=$1
echo -n "$message (y/N): "
read -r response
case "$response" in
[yY]|[yY][eE][sS]) return 0 ;;
*) return 1 ;;
esac
}
# Function to perform backup
perform_backup() {
local backup_folder=$1
echo "🚀 Starting RAGFlow data backup..."
echo "📁 Backup folder: $backup_folder"
echo ""
# Check if any containers are using the volumes
check_containers_using_volumes
# Create backup folder if it doesn't exist
mkdir -p "$backup_folder"
# Backup each volume
for i in "${!VOLUMES[@]}"; do
local volume="${VOLUMES[$i]}"
local backup_file="${BACKUP_FILES[$i]}"
local step=$((i + 1))
echo "📦 Step $step/4: Backing up $volume..."
if volume_exists "$volume"; then
docker run --rm \
-v "$volume":/source \
-v "$(pwd)/$backup_folder":/backup \
alpine tar czf "/backup/$backup_file" -C /source .
echo "✅ Successfully backed up $volume to $backup_folder/$backup_file"
else
echo "⚠️ Warning: Volume $volume does not exist, skipping..."
fi
echo ""
done
echo "🎉 Backup completed successfully!"
echo "📍 Backup location: $(pwd)/$backup_folder"
# List backup files with sizes
echo ""
echo "📋 Backup files created:"
for backup_file in "${BACKUP_FILES[@]}"; do
if [ -f "$backup_folder/$backup_file" ]; then
local size=$(ls -lh "$backup_folder/$backup_file" | awk '{print $5}')
echo " - $backup_file ($size)"
fi
done
}
# Function to perform restore
perform_restore() {
local backup_folder=$1
echo "🔄 Starting RAGFlow data restore..."
echo "📁 Backup folder: $backup_folder"
echo ""
# Check if any containers are using the volumes
check_containers_using_volumes
# Check if backup folder exists
if [ ! -d "$backup_folder" ]; then
echo "❌ Error: Backup folder '$backup_folder' does not exist"
exit 1
fi
# Check if all backup files exist
local missing_files=()
for backup_file in "${BACKUP_FILES[@]}"; do
if [ ! -f "$backup_folder/$backup_file" ]; then
missing_files+=("$backup_file")
fi
done
if [ ${#missing_files[@]} -gt 0 ]; then
echo "❌ Error: Missing backup files:"
for file in "${missing_files[@]}"; do
echo " - $file"
done
echo "Please ensure all backup files are present in '$backup_folder'"
exit 1
fi
# Check for existing volumes and warn user
local existing_volumes=()
for volume in "${VOLUMES[@]}"; do
if volume_exists "$volume"; then
existing_volumes+=("$volume")
fi
done
if [ ${#existing_volumes[@]} -gt 0 ]; then
echo "⚠️ WARNING: The following Docker volumes already exist:"
for volume in "${existing_volumes[@]}"; do
echo " - $volume"
done
echo ""
echo "🔴 IMPORTANT: Restoring will OVERWRITE existing data!"
echo "💡 Recommendation: Create a backup of your current data first:"
echo " $0 backup current_backup_$(date +%Y%m%d_%H%M%S)"
echo ""
if ! confirm_action "Do you want to continue with the restore operation?"; then
echo "❌ Restore operation cancelled by user"
exit 0
fi
fi
# Create volumes and restore data
for i in "${!VOLUMES[@]}"; do
local volume="${VOLUMES[$i]}"
local backup_file="${BACKUP_FILES[$i]}"
local step=$((i + 1))
echo "🔧 Step $step/4: Restoring $volume..."
# Create volume if it doesn't exist
if ! volume_exists "$volume"; then
echo " 📋 Creating Docker volume: $volume"
docker volume create "$volume"
else
echo " 📋 Using existing Docker volume: $volume"
fi
# Restore data
echo " 📥 Restoring data from $backup_file..."
docker run --rm \
-v "$volume":/target \
-v "$(pwd)/$backup_folder":/backup \
alpine tar xzf "/backup/$backup_file" -C /target
echo "✅ Successfully restored $volume"
echo ""
done
echo "🎉 Restore completed successfully!"
echo "💡 You can now start your RAGFlow services"
}
# Main script logic
main() {
# Check if Docker is available
check_docker
# Parse command line arguments
local operation=${1:-}
local backup_folder=${2:-$DEFAULT_BACKUP_FOLDER}
# Handle help or no arguments
if [ -z "$operation" ] || [ "$operation" = "help" ] || [ "$operation" = "-h" ] || [ "$operation" = "--help" ]; then
show_help
exit 0
fi
# Validate operation
case "$operation" in
backup)
perform_backup "$backup_folder"
;;
restore)
perform_restore "$backup_folder"
;;
*)
echo "❌ Error: Invalid operation '$operation'"
echo ""
show_help
exit 1
;;
esac
}
# Run main function with all arguments
main "$@"

33
docker/nginx/nginx.conf Normal file
View File

@@ -0,0 +1,33 @@
user root;
worker_processes auto;
error_log /var/log/nginx/error.log notice;
pid /var/run/nginx.pid;
events {
worker_connections 1024;
}
http {
include /etc/nginx/mime.types;
default_type application/octet-stream;
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';
access_log /var/log/nginx/access.log main;
sendfile on;
#tcp_nopush on;
keepalive_timeout 65;
#gzip on;
client_max_body_size 1024M;
include /etc/nginx/conf.d/ragflow.conf;
}

12
docker/nginx/proxy.conf Normal file
View File

@@ -0,0 +1,12 @@
proxy_set_header Host $host;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_http_version 1.1;
proxy_set_header Connection "";
proxy_buffering off;
proxy_read_timeout 3600s;
proxy_send_timeout 3600s;
proxy_buffer_size 1024k;
proxy_buffers 16 1024k;
proxy_busy_buffers_size 2048k;
proxy_temp_file_write_size 2048k;

29
docker/nginx/ragflow.conf Normal file
View File

@@ -0,0 +1,29 @@
server {
listen 80;
server_name _;
root /ragflow/web/dist;
gzip on;
gzip_min_length 1k;
gzip_comp_level 9;
gzip_types text/plain application/javascript application/x-javascript text/css application/xml text/javascript application/x-httpd-php image/jpeg image/gif image/png;
gzip_vary on;
gzip_disable "MSIE [1-6]\.";
location ~ ^/(v1|api) {
proxy_pass http://ragflow:9380;
include proxy.conf;
}
location / {
index index.html;
try_files $uri $uri/ /index.html;
}
# Cache-Control: max-age~@~AExpires
location ~ ^/static/(css|js|media)/ {
expires 10y;
access_log off;
}
}

View File

@@ -0,0 +1,41 @@
server {
listen 80;
server_name your-ragflow-domain.com;
return 301 https://$host$request_uri;
}
server {
listen 443 ssl;
server_name your-ragflow-domain.com;
ssl_certificate /etc/nginx/ssl/fullchain.pem;
ssl_certificate_key /etc/nginx/ssl/privkey.pem;
root /ragflow/web/dist;
gzip on;
gzip_min_length 1k;
gzip_comp_level 9;
gzip_types text/plain application/javascript application/x-javascript text/css application/xml text/javascript application/x-httpd-php image/jpeg image/gif image/png;
gzip_vary on;
gzip_disable "MSIE [1-6]\.";
location ~ ^/(v1|api) {
proxy_pass http://ragflow:9380;
include proxy.conf;
}
location / {
index index.html;
try_files $uri $uri/ /index.html;
}
# Cache-Control: max-age~@~AExpires
location ~ ^/static/(css|js|media)/ {
expires 10y;
access_log off;
}
}

63
docker/ragflow.sh Normal file
View File

@@ -0,0 +1,63 @@
#!/bin/bash
# RAGFlow 服务管理脚本
case "$1" in
"start")
echo "启动 RAGFlow 服务..."
./start-ragflow.sh
;;
"stop")
echo "停止 RAGFlow 服务..."
./stop-ragflow.sh
;;
"restart")
echo "重启 RAGFlow 服务..."
./stop-ragflow.sh
sleep 5
./start-ragflow.sh
;;
"status")
echo "检查服务状态..."
echo "=== RAGFlow 服务 ==="
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}" | grep -E "(ragflow-server|ragflow-postgres|ragflow-redis|ragflow-minio|ragflow-opensearch)"
;;
"logs")
echo "查看 RAGFlow 日志..."
docker-compose -f docker-compose.yml logs -f ragflow
;;
"init")
echo "初始化所有服务(仅首次使用)..."
docker-compose -f docker-compose-base.yml up -d
echo "等待基础服务启动..."
sleep 30
docker-compose -f docker-compose.yml up -d ragflow
echo "所有服务启动完成!"
;;
"clean")
echo "清理所有服务(包括数据)..."
read -p "确定要删除所有数据吗?(y/N): " -n 1 -r
echo
if [[ $REPLY =~ ^[Yy]$ ]]; then
docker-compose -f docker-compose.yml down
docker-compose -f docker-compose-base.yml down -v
docker network rm ragflow 2>/dev/null || true
echo "所有服务已清理"
else
echo "操作已取消"
fi
;;
*)
echo "用法: $0 {start|stop|restart|status|logs|init|clean}"
echo ""
echo "命令说明:"
echo " start - 启动 RAGFlow 服务(不重新创建基础服务)"
echo " stop - 停止 RAGFlow 服务(保留基础服务)"
echo " restart - 重启 RAGFlow 服务"
echo " status - 查看服务状态"
echo " logs - 查看 RAGFlow 日志"
echo " init - 初始化所有服务(仅首次使用)"
echo " clean - 清理所有服务(包括数据)"
exit 1
;;
esac

View File

@@ -0,0 +1,135 @@
ragflow:
host: ${RAGFLOW_HOST:-0.0.0.0}
http_port: 9380
admin:
host: ${RAGFLOW_HOST:-0.0.0.0}
http_port: 9381
mysql:
name: '${MYSQL_DBNAME:-rag_flow}'
user: '${MYSQL_USER:-root}'
password: '${MYSQL_PASSWORD:-infini_rag_flow}'
host: '${MYSQL_HOST:-mysql}'
port: 3306
max_connections: 900
stale_timeout: 300
max_allowed_packet: ${MYSQL_MAX_PACKET:-1073741824}
minio:
user: '${MINIO_USER:-rag_flow}'
password: '${MINIO_PASSWORD:-infini_rag_flow}'
host: '${MINIO_HOST:-minio}:9000'
es:
hosts: 'http://${ES_HOST:-es01}:9200'
username: '${ES_USER:-elastic}'
password: '${ELASTIC_PASSWORD:-infini_rag_flow}'
os:
hosts: 'http://${OS_HOST:-opensearch01}:9201'
username: '${OS_USER:-admin}'
password: '${OPENSEARCH_PASSWORD:-infini_rag_flow_OS_01}'
infinity:
uri: '${INFINITY_HOST:-infinity}:23817'
db_name: 'default_db'
redis:
db: 1
password: '${REDIS_PASSWORD:-infini_rag_flow}'
host: '${REDIS_HOST:-redis}:6379'
postgres:
name: '${POSTGRES_DBNAME:-rag_flow}'
user: '${POSTGRES_USER:-rag_flow}'
password: '${POSTGRES_PASSWORD:-infini_rag_flow}'
host: '${POSTGRES_HOST:-postgres}'
port: 5432
max_connections: 100
stale_timeout: 30
# s3:
# access_key: 'access_key'
# secret_key: 'secret_key'
# region: 'region'
# endpoint_url: 'endpoint_url'
# bucket: 'bucket'
# prefix_path: 'prefix_path'
# signature_version: 'v4'
# addressing_style: 'path'
# oss:
# access_key: '${ACCESS_KEY}'
# secret_key: '${SECRET_KEY}'
# endpoint_url: '${ENDPOINT}'
# region: '${REGION}'
# bucket: '${BUCKET}'
# prefix_path: '${OSS_PREFIX_PATH}'
# azure:
# auth_type: 'sas'
# container_url: 'container_url'
# sas_token: 'sas_token'
# azure:
# auth_type: 'spn'
# account_url: 'account_url'
# client_id: 'client_id'
# secret: 'secret'
# tenant_id: 'tenant_id'
# container_name: 'container_name'
# The OSS object storage uses the MySQL configuration above by default. If you need to switch to another object storage service, please uncomment and configure the following parameters.
# opendal:
# scheme: 'mysql' # Storage type, such as s3, oss, azure, etc.
# config:
# oss_table: 'opendal_storage'
# user_default_llm:
# factory: 'BAAI'
# api_key: 'backup'
# base_url: 'backup_base_url'
# default_models:
# chat_model:
# name: 'qwen2.5-7b-instruct'
# factory: 'xxxx'
# api_key: 'xxxx'
# base_url: 'https://api.xx.com'
# embedding_model:
# name: 'bge-m3'
# rerank_model: 'bge-reranker-v2'
# asr_model:
# model: 'whisper-large-v3' # alias of name
# image2text_model: ''
# oauth:
# oauth2:
# display_name: "OAuth2"
# client_id: "your_client_id"
# client_secret: "your_client_secret"
# authorization_url: "https://your-oauth-provider.com/oauth/authorize"
# token_url: "https://your-oauth-provider.com/oauth/token"
# userinfo_url: "https://your-oauth-provider.com/oauth/userinfo"
# redirect_uri: "https://your-app.com/v1/user/oauth/callback/oauth2"
# oidc:
# display_name: "OIDC"
# client_id: "your_client_id"
# client_secret: "your_client_secret"
# issuer: "https://your-oauth-provider.com/oidc"
# scope: "openid email profile"
# redirect_uri: "https://your-app.com/v1/user/oauth/callback/oidc"
# github:
# type: "github"
# icon: "github"
# display_name: "Github"
# client_id: "your_client_id"
# client_secret: "your_client_secret"
# redirect_uri: "https://your-app.com/v1/user/oauth/callback/github"
# authentication:
# client:
# switch: false
# http_app_key:
# http_secret_key:
# site:
# switch: false
# permission:
# switch: false
# component: false
# dataset: false
# smtp:
# mail_server: ""
# mail_port: 465
# mail_use_ssl: true
# mail_use_tls: false
# mail_username: ""
# mail_password: ""
# mail_default_sender:
# - "RAGFlow" # display name
# - "" # sender email address
# mail_frontend_url: "https://your-frontend.example.com"

27
docker/start-ragflow.sh Normal file
View File

@@ -0,0 +1,27 @@
#!/bin/bash
# 启动脚本:只启动 ragflow 服务,不重新创建基础服务
echo "检查基础服务是否运行..."
# 检查基础服务是否在运行
if ! docker ps --format "table {{.Names}}" | grep -q "ragflow-postgres\|ragflow-redis\|ragflow-minio\|ragflow-opensearch-01"; then
echo "基础服务未运行,正在启动基础服务..."
docker-compose -p ragflow -f docker-compose-base.yml up -d
echo "等待基础服务启动完成..."
sleep 30
else
echo "基础服务已在运行"
fi
# 检查网络是否存在
if ! docker network ls --format "{{.Name}}" | grep -q "ragflow-20250916_ragflow"; then
echo "ragflow 网络不存在,请先运行基础服务创建网络"
exit 1
fi
echo "启动 ragflow 服务..."
docker-compose -p ragflow -f docker-compose.yml up -d ragflow
echo "ragflow 服务启动完成!"
echo "访问地址: http://localhost:${SVR_HTTP_PORT:-9380}"

9
docker/stop-ragflow.sh Normal file
View File

@@ -0,0 +1,9 @@
#!/bin/bash
# 停止脚本:只停止 ragflow 服务,保留基础服务
echo "停止 ragflow 服务..."
docker-compose -f docker-compose.yml down
echo "ragflow 服务已停止"
echo "基础服务postgres、redis、minio、opensearch仍在运行"