Backend-Controlled, UI-Compatible API Flow

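Warning: This tutorial is a community contribution and is not supported by the Open WebUI team. It serves only as a demonstration of how to customize Open WebUI for a specific use case. Want to contribute? Check out the contributing tutorial.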



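This tutorial demonstrates server-side orchestration of Open WebUI conversations while ensuring that assistant replies display correctly in the frontend UI. The approach requires no frontend involvement and gives the backend full control over the chat flow. It has been verified against Open WebUI v0.6.15; later versions may change behavior or API structure.

Prerequisites

Before following this tutorial, make sure you have:

  • A running Open WebUI instance
  • A valid API authentication token
  • Access to the Open WebUI backend APIs
  • Basic knowledge of REST APIs and JSON
  • Command-line tools: curl and jq (jq is optional, for JSON parsing)

Overview

This tutorial describes a 7-step process that enables server-side orchestration of Open WebUI conversations while ensuring that assistant replies display correctly in the frontend UI.

Process Flow

The key steps are:

  1. Create a new chat with a user message - Initialize the conversation with the user's input
  2. Enrich the chat response with an assistant message - Add an assistant message to the in-memory response object
  3. Update the chat with the assistant message - Send the enriched chat state to the server
  4. Trigger the assistant completion - Generate the actual AI response (optionally with knowledge integration)
  5. Wait for the response to complete - Monitor the assistant response until it is fully generated
  6. Complete the assistant message - Mark the response as completed
  7. Fetch and process the final chat - Retrieve and parse the completed conversation

This makes server-side orchestration possible while replies still appear in the frontend UI exactly as if they had been generated through normal user interaction.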

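Implementation Guide

Critical Step: Enrich the Chat Response with an Assistant Message

Before triggering the completion, the assistant message must be added to the in-memory chat response object. This is a hard prerequisite because the Open WebUI frontend expects the assistant message to exist in a specific structure.

The assistant message must appear in both of the following locations:

  • chat.messages[] - The main message array
  • chat.history.messages[<assistantId>] - The indexed message history

Expected structure of the assistant message: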

{
"id": "<uuid>",
"role": "assistant",
"content": "",
"parentId": "<user-msg-id>",
"modelName": "gpt-4o",
"modelIdx": 0,
"timestamp": "<currentTimestamp>"
}

Without this enrichment, the assistant's response will not appear in the frontend interface even if the completion succeeds.
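A minimal shell sketch of the placeholder described above. The helper name assistant_placeholder is hypothetical, and the numeric millisecond timestamp follows the request examples in this tutorial rather than anything confirmed by the Open WebUI API itself:

```shell
#!/usr/bin/env bash
# Print an empty assistant placeholder message as one line of JSON.
# Arguments: message id, parent (user) message id, model name, timestamp in ms.
# Values are inserted verbatim, so they must not contain quotes or backslashes.
assistant_placeholder() {
  local id="$1" parent_id="$2" model="$3" ts="$4"
  printf '{"id":"%s","role":"assistant","content":"","parentId":"%s","modelName":"%s","modelIdx":0,"timestamp":%s}\n' \
    "$id" "$parent_id" "$model" "$ts"
}

# Example call using the placeholder IDs from this tutorial
assistant_placeholder "assistant-msg-id" "user-msg-id" "gpt-4o" 1720000001000
```

Because the values are interpolated as-is, this suits IDs and model names only; for arbitrary text, a JSON-aware tool such as jq is the safer choice.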

Step-by-Step Implementation

Step 1: Create a Chat with a User Message

This starts the chat and returns a chatId that is used in subsequent requests.

curl -X POST https://<host>/api/v1/chats/new \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{
"chat": {
"title": "New Chat",
"models": ["gpt-4o"],
"messages": [
{
"id": "user-msg-id",
"role": "user",
"content": "Hi, what is the capital of France?",
"timestamp": 1720000000000,
"models": ["gpt-4o"]
}
],
"history": {
"current_id": "user-msg-id",
"messages": {
"user-msg-id": {
"id": "user-msg-id",
"role": "user",
"content": "Hi, what is the capital of France?",
"timestamp": 1720000000000,
"models": ["gpt-4o"]
}
}
}
}
}'

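Step 2: Enrich the Chat Response with an Assistant Message

Add the assistant message to the in-memory chat response object. This can be combined with Step 1 by including the assistant message when the chat is first created:

// Example Java implementation
public void enrichChatWithAssistantMessage(OWUIChatResponse chatResponse, String model) {
    // Build an empty assistant placeholder (role "assistant", content "")
    OWUIMessage assistantOWUIMessage = buildAssistantMessage(chatResponse, model, "assistant", "");
    // Link the placeholder to the user message created in Step 1
    assistantOWUIMessage.setParentId(chatResponse.getChat().getMessages().get(0).getId());

    // Register the placeholder in both chat.messages[] and chat.history.messages
    chatResponse.getChat().getMessages().add(assistantOWUIMessage);
    chatResponse.getChat().getHistory().getMessages().put(assistantOWUIMessage.getId(), assistantOWUIMessage);
    // The Step 3 payload also points history.current_id at the assistant message; update it here if your DTO exposes it
}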
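
Note: This step can be performed in memory on the response object, or combined with Step 1 by including both the user message and an empty assistant message when the chat is first created.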
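For shell-based orchestration, the same enrichment can be sketched with jq. This assumes the Step 1 response nests the chat under a top-level chat key, as in the response examples in this tutorial; the function name enrich_chat is hypothetical:

```shell
#!/usr/bin/env bash
# Add an assistant placeholder to a chat response.
# stdin: chat response JSON; $1: placeholder message JSON (must contain "id").
enrich_chat() {
  jq -c --argjson msg "$1" '
    .chat.messages += [$msg]                    # main message array
    | .chat.history.messages[$msg.id] = $msg    # indexed message history
    | .chat.history.current_id = $msg.id        # point the history at the placeholder
  '
}

# Usage (file name and variable are placeholders):
#   enriched=$(enrich_chat "$placeholder_json" < create_response.json)
```

Only the chat object is needed for the Step 3 request body, so piping the result through jq '{chat: .chat}' yields a body in the shape Step 3 shows.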

Step 3: Update the Chat with the Assistant Message

Send the enriched chat state, containing both the user and assistant messages, to the server:

curl -X POST https://<host>/api/v1/chats/<chatId> \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{
"chat": {
"id": "<chatId>",
"title": "New Chat",
"models": ["gpt-4o"],
"messages": [
{
"id": "user-msg-id",
"role": "user",
"content": "Hi, what is the capital of France?",
"timestamp": 1720000000000,
"models": ["gpt-4o"]
},
{
"id": "assistant-msg-id",
"role": "assistant",
"content": "",
"parentId": "user-msg-id",
"modelName": "gpt-4o",
"modelIdx": 0,
"timestamp": 1720000001000
}
],
"history": {
"current_id": "assistant-msg-id",
"messages": {
"user-msg-id": {
"id": "user-msg-id",
"role": "user",
"content": "Hi, what is the capital of France?",
"timestamp": 1720000000000,
"models": ["gpt-4o"]
},
"assistant-msg-id": {
"id": "assistant-msg-id",
"role": "assistant",
"content": "",
"parentId": "user-msg-id",
"modelName": "gpt-4o",
"modelIdx": 0,
"timestamp": 1720000001000
}
}
}
}
}'

Step 4: Trigger the Assistant Completion

Generate the actual AI response using the completions endpoint:

curl -X POST https://<host>/api/chat/completions \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{
"chat_id": "<chatId>",
"id": "assistant-msg-id",
"messages": [
{
"role": "user",
"content": "Hi, what is the capital of France?"
}
],
"model": "gpt-4o",
"stream": true,
"background_tasks": {
"title_generation": true,
"tags_generation": false,
"follow_up_generation": false
},
"features": {
"code_interpreter": false,
"web_search": false,
"image_generation": false,
"memory": false
},
"variables": {
"{{USER_NAME}}": "",
"{{USER_LANGUAGE}}": "en-US",
"{{CURRENT_DATETIME}}": "2025-07-14T12:00:00Z",
"{{CURRENT_TIMEZONE}}": "Europe"
},
"session_id": "session-id"
}'
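Hand-written JSON breaks as soon as the prompt contains quotes or newlines. As a hedged alternative, the request body can be assembled with jq -n, which escapes string values. The function name build_completion_payload is hypothetical, and the sketch includes only the fields this tutorial lists as required, plus stream:

```shell
#!/usr/bin/env bash
# Build a completions request body with properly escaped string values.
# Arguments: chat id, assistant message id, model, user prompt, session id.
build_completion_payload() {
  jq -n \
    --arg chat_id "$1" \
    --arg id "$2" \
    --arg model "$3" \
    --arg prompt "$4" \
    --arg session_id "$5" '
    {
      chat_id: $chat_id,
      id: $id,
      messages: [{role: "user", content: $prompt}],
      model: $model,
      stream: true,
      session_id: $session_id
    }'
}

# Usage (host and token are placeholders):
#   curl -X POST "https://<host>/api/chat/completions" \
#     -H "Authorization: Bearer <token>" \
#     -H "Content-Type: application/json" \
#     -d "$(build_completion_payload "<chatId>" "assistant-msg-id" "gpt-4o" "Hi, what is the capital of France?" "session-id")"
```

Optional objects such as background_tasks, features, and variables can be added to the jq template in the same way.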

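Step 4.1: Trigger the Assistant Completion with Knowledge Integration (RAG)

For advanced use cases involving knowledge bases or document collections, include the knowledge files in the completion request: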

curl -X POST https://<host>/api/chat/completions \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{
"chat_id": "<chatId>",
"id": "assistant-msg-id",
"messages": [
{
"role": "user",
"content": "Hi, what is the capital of France?"
}
],
"model": "gpt-4o",
"stream": true,
"files": [
{
"id": "knowledge-collection-id",
"type": "collection",
"status": "processed"
}
],
"background_tasks": {
"title_generation": true,
"tags_generation": false,
"follow_up_generation": false
},
"features": {
"code_interpreter": false,
"web_search": false,
"image_generation": false,
"memory": false
},
"variables": {
"{{USER_NAME}}": "",
"{{USER_LANGUAGE}}": "en-US",
"{{CURRENT_DATETIME}}": "2025-07-14T12:00:00Z",
"{{CURRENT_TIMEZONE}}": "Europe"
},
"session_id": "session-id"
}'

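Step 5: Wait for the Assistant Response to Complete

Depending on your implementation needs, the assistant response can be handled in two ways:

Option A: Stream Processing (Recommended)

If the completion request uses stream: true, you can process the streamed response in real time and wait for the stream to finish. This is the approach used by the Open WebUI web interface, and it provides immediate feedback.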
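The tutorial does not show the wire format of the stream. Assuming the endpoint emits OpenAI-style server-sent events (lines prefixed with data: and a final data: [DONE] sentinel), which you should confirm against your own instance, a stream reader can be sketched as follows. The function name read_sse_payloads is hypothetical:

```shell
#!/usr/bin/env bash
# Read an event stream on stdin and print each data payload, one per line.
# Stops at the "data: [DONE]" sentinel; other lines (blank, comments) are skipped.
read_sse_payloads() {
  local line cr
  cr=$(printf '\r')
  while IFS= read -r line || [ -n "$line" ]; do
    line="${line%"$cr"}"                 # tolerate CRLF line endings
    case "$line" in
      "data: [DONE]") break ;;
      "data: "*)      printf '%s\n' "${line#data: }" ;;
    esac
  done
}

# Small self-contained demonstration with a canned stream
printf 'data: {"n":1}\n\ndata: {"n":2}\ndata: [DONE]\n' | read_sse_payloads
```

In practice the function would sit at the end of a curl -N pipeline so that payloads are handled as they arrive.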

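Option B: Polling

For implementations that cannot handle streaming, poll the chat endpoint until the response is ready. Use a retry mechanism with a backoff delay between attempts; the example below retries at a fixed interval, and adding a multiplier to @Backoff makes the delay exponential:

// Example Java implementation (Spring Retry)
@Retryable(
    retryFor = AssistantResponseNotReadyException.class,
    maxAttemptsExpression = "#{${webopenui.retries:50}}",
    backoff = @Backoff(delayExpression = "#{${webopenui.backoffmilliseconds:2000}}")
)
public String getAssistantResponseWhenReady(String chatId, ChatCompletedRequest chatCompletedRequest) {
    // Fetch the current chat state and look for the assistant message
    OWUIChatResponse response = owuiService.fetchFinalChatResponse(chatId);
    Optional<OWUIMessage> assistantMsg = extractAssistantResponse(response);

    // Non-blank content is treated as ready: mark it completed (Step 6) and return it
    if (assistantMsg.isPresent() && !assistantMsg.get().getContent().isBlank()) {
        owuiService.completeAssistantMessage(chatCompletedRequest);
        return assistantMsg.get().getContent();
    }

    // Throwing this exception triggers the next retry
    throw new AssistantResponseNotReadyException("Assistant response not ready yet for chatId: " + chatId);
}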

For manual polling, you can use:

# Poll every 2 seconds until the assistant content is populated
while true; do
  response=$(curl -s -X GET "https://<host>/api/v1/chats/<chatId>" \
    -H "Authorization: Bearer <token>")

  # jq -e exits non-zero when no assistant message with non-empty content is found
  if echo "$response" | jq -e '.chat.messages[] | select(.role=="assistant" and .id=="assistant-msg-id" and .content != "")' > /dev/null; then
    echo "Assistant response is ready!"
    break
  fi

  echo "Waiting for assistant response..."
  sleep 2
done
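The loop above waits a fixed 2 seconds between attempts and never gives up. A sketch of bounded polling with exponential backoff follows; backoff_delay, poll_with_backoff, and the defaults (2-second base, 30-second cap) are illustrative choices, not values taken from Open WebUI:

```shell
#!/usr/bin/env bash
# Delay in seconds before retry number $1 (0-based): base * 2^attempt, capped.
# Optional arguments: $2 = base delay (default 2), $3 = cap (default 30).
backoff_delay() {
  local attempt="$1" base="${2:-2}" cap="${3:-30}"
  local delay=$(( base * (1 << attempt) ))
  if [ "$delay" -gt "$cap" ]; then delay="$cap"; fi
  printf '%s\n' "$delay"
}

# Run the given command until it succeeds or $1 attempts have been made.
poll_with_backoff() {
  local max_attempts="$1" attempt=0
  shift
  while [ "$attempt" -lt "$max_attempts" ]; do
    "$@" && return 0
    attempt=$(( attempt + 1 ))
    # Sleep only if another attempt will follow
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$(backoff_delay "$(( attempt - 1 ))")"
    fi
  done
  return 1
}
```

For example, poll_with_backoff 8 check_ready would run a hypothetical check_ready function (such as the curl and jq test from the loop above) up to eight times, waiting 2, 4, 8, 16, and then 30 seconds between attempts.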

Step 6: Complete the Assistant Message

Once the assistant response is ready, mark it as completed:

curl -X POST https://<host>/api/chat/completed \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{
"chat_id": "<chatId>",
"id": "assistant-msg-id",
"session_id": "session-id",
"model": "gpt-4o"
}'

Step 7: Fetch the Final Chat

Retrieve the completed conversation:

curl -X GET https://<host>/api/v1/chats/<chatId> \
-H "Authorization: Bearer <token>"

Other Useful API Endpoints

Fetch a Knowledge Collection

Retrieve knowledge base information for RAG integration:

curl -X GET https://<host>/api/v1/knowledge/<knowledge-id> \
-H "Authorization: Bearer <token>"

Fetch Model Information

Get details about a specific model:

curl -X GET "https://<host>/api/v1/models/model?id=<model-name>" \
-H "Authorization: Bearer <token>"

Send Additional Messages to a Chat

For multi-turn conversations, you can send additional messages to an existing chat:

curl -X POST https://<host>/api/v1/chats/<chatId> \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{
"chat": {
"id": "<chatId>",
"messages": [
{
"id": "new-user-msg-id",
"role": "user",
"content": "Can you tell me more about this?",
"timestamp": 1720000002000,
"models": ["gpt-4o"]
}
],
"history": {
"current_id": "new-user-msg-id",
"messages": {
"new-user-msg-id": {
"id": "new-user-msg-id",
"role": "user",
"content": "Can you tell me more about this?",
"timestamp": 1720000002000,
"models": ["gpt-4o"]
}
}
}
}
}'
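Depending on how the update endpoint merges its input, posting only the new message may replace the stored messages rather than extend them; this tutorial does not specify the behavior. A conservative sketch, assuming the current chat has first been fetched as in Step 7 and nests its data under chat, appends the follow-up to the existing state and links it to the previous message. The function name append_user_message is hypothetical:

```shell
#!/usr/bin/env bash
# Append a follow-up user message to an existing chat and emit an update body.
# stdin: fetched chat JSON; $1: new user message JSON (must contain "id").
append_user_message() {
  jq -c --argjson msg "$1" '
    .chat.history.current_id as $parent           # last message in the thread
    | ($msg + {parentId: $parent}) as $m          # link the new message to it
    | {chat: (.chat
        | .messages += [$m]
        | .history.messages[$m.id] = $m
        | .history.current_id = $m.id)}
  '
}

# Usage (file name is a placeholder):
#   body=$(append_user_message "$new_user_msg_json" < fetched_chat.json)
```

The output contains only the chat key, so it can be sent directly as the request body of the update call shown above.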

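Response Handling

Parsing the Assistant Response

The assistant response may be wrapped in a Markdown code block. Here is how to clean it up:


# Example raw response from the assistant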
raw_response='```json
{
"result": "The capital of France is Paris.",
"confidence": 0.99
}
```'

# Clean the response (remove the Markdown wrapper)
cleaned_response=$(echo "$raw_response" | sed 's/^```json//' | sed 's/```$//' | sed 's/^[[:space:]]*//' | sed 's/[[:space:]]*$//')

echo "$cleaned_response" | jq '.'

This cleanup handles:

  • Removing the ```json prefix
  • Removing the ``` suffix
  • Trimming whitespace
  • JSON validation (performed by jq)
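The sed pipeline above only recognizes a json language tag. A slightly more general sketch, written as a hypothetical strip_code_fence function, drops an opening fence with any alphanumeric tag and a closing fence, and leaves unfenced input untouched:

```shell
#!/usr/bin/env bash
# Remove a Markdown code fence wrapper from text on stdin.
# The first line is dropped only if it is an opening fence (optionally tagged),
# and the last line only if it is a closing fence.
strip_code_fence() {
  sed -e '1{/^[[:space:]]*```[[:alnum:]]*[[:space:]]*$/d;}' \
      -e '${/^[[:space:]]*```[[:space:]]*$/d;}'
}

# Demonstration with a fenced JSON answer
printf '```json\n{"result": "Paris"}\n```\n' | strip_code_fence
```

Unlike the original pipeline, this version does not trim leading or trailing whitespace inside the block, which keeps indentation-sensitive content intact.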

API Reference

DTO Structures

Chat DTO (Complete Structure)

{
"id": "chat-uuid-12345",
"title": "New Chat",
"models": ["gpt-4o"],
"files": [],
"tags": [
{
"id": "tag-id",
"name": "important",
"color": "#FF5733"
}
],
"params": {
"temperature": 0.7,
"max_tokens": 1000
},
"timestamp": 1720000000000,
"messages": [
{
"id": "user-msg-id",
"role": "user",
"content": "Hi, what is the capital of France?",
"timestamp": 1720000000000,
"models": ["gpt-4o"]
},
{
"id": "assistant-msg-id",
"role": "assistant",
"content": "",
"parentId": "user-msg-id",
"modelName": "gpt-4o",
"modelIdx": 0,
"timestamp": 1720000001000
}
],
"history": {
"current_id": "assistant-msg-id",
"messages": {
"user-msg-id": {
"id": "user-msg-id",
"role": "user",
"content": "Hi, what is the capital of France?",
"timestamp": 1720000000000,
"models": ["gpt-4o"]
},
"assistant-msg-id": {
"id": "assistant-msg-id",
"role": "assistant",
"content": "",
"parentId": "user-msg-id",
"modelName": "gpt-4o",
"modelIdx": 0,
"timestamp": 1720000001000
}
}
},
"currentId": "assistant-msg-id"
}

ChatCompletionsRequest DTO

{
"chat_id": "chat-uuid-12345",
"id": "assistant-msg-id",
"messages": [
{
"role": "user",
"content": "Hi, what is the capital of France?"
}
],
"model": "gpt-4o",
"stream": true,
"background_tasks": {
"title_generation": true,
"tags_generation": false,
"follow_up_generation": false
},
"features": {
"code_interpreter": false,
"web_search": false,
"image_generation": false,
"memory": false
},
"variables": {
"{{USER_NAME}}": "",
"{{USER_LANGUAGE}}": "en-US",
"{{CURRENT_DATETIME}}": "2025-07-14T12:00:00Z",
"{{CURRENT_TIMEZONE}}": "Europe"
},
"session_id": "session-uuid-67890",
"filter_ids": [],
"files": [
{
"id": "knowledge-collection-id",
"type": "collection",
"status": "processed"
}
]
}

ChatCompletedRequest DTO

{
"model": "gpt-4o",
"chat_id": "chat-uuid-12345",
"id": "assistant-msg-id",
"session_id": "session-uuid-67890",
"messages": [
{
"id": "user-msg-id",
"role": "user",
"content": "Hi, what is the capital of France?",
"timestamp": 1720000000000,
"models": ["gpt-4o"]
},
{
"id": "assistant-msg-id",
"role": "assistant",
"content": "The capital of France is Paris.",
"parentId": "user-msg-id",
"modelName": "gpt-4o",
"modelIdx": 0,
"timestamp": 1720000001000
}
]
}

ChatCompletionMessage DTO

{
"role": "user",
"content": "Hi, what is the capital of France?"
}

History DTO

{
"current_id": "assistant-msg-id",
"messages": {
"user-msg-id": {
"id": "user-msg-id",
"role": "user",
"content": "Hi, what is the capital of France?",
"timestamp": 1720000000000,
"models": ["gpt-4o"]
},
"assistant-msg-id": {
"id": "assistant-msg-id",
"role": "assistant",
"content": "The capital of France is Paris.",
"parentId": "user-msg-id",
"modelName": "gpt-4o",
"modelIdx": 0,
"timestamp": 1720000001000
}
}
}

Message DTO (Complete Structure)

{
"id": "msg-id",
"role": "user",
"content": "Hi, what is the capital of France?",
"timestamp": 1720000000000,
"models": ["gpt-4o"]
}
{
"id": "assistant-msg-id",
"role": "assistant",
"content": "The capital of France is Paris.",
"parentId": "user-msg-id",
"modelName": "gpt-4o",
"modelIdx": 0,
"timestamp": 1720000001000
}

Response Examples

Create Chat Response

{
"success": true,
"chat": {
"id": "chat-uuid-12345",
"title": "New Chat",
"models": ["gpt-4o"],
"files": [],
"tags": [],
"params": {},
"timestamp": 1720000000000,
"messages": [
{
"id": "user-msg-id",
"role": "user",
"content": "Hi, what is the capital of France?",
"timestamp": 1720000000000,
"models": ["gpt-4o"]
}
],
"history": {
"current_id": "user-msg-id",
"messages": {
"user-msg-id": {
"id": "user-msg-id",
"role": "user",
"content": "Hi, what is the capital of France?",
"timestamp": 1720000000000,
"models": ["gpt-4o"]
}
}
},
"currentId": "user-msg-id"
}
}

Final Chat Response (After Completion)

{
"id": "chat-uuid-12345",
"title": "Discussion About the Capital of France",
"models": ["gpt-4o"],
"files": [],
"tags": [
{
"id": "auto-tag-1",
"name": "geography",
"color": "#4CAF50"
}
],
"params": {},
"timestamp": 1720000000000,
"messages": [
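{
"id": "user-msg-id",
"role": "user",
"content": "Hi, what is the capital of France?",
"timestamp": 1720000000000,
"models": ["gpt-4o"]
},
{
"id": "assistant-msg-id",
"role": "assistant",
"content": "The capital of France is Paris. Paris is not only the capital but also the most populous city in France, known for landmarks such as the Eiffel Tower, the Louvre Museum, and Notre-Dame Cathedral.",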
"parentId": "user-msg-id",
"modelName": "gpt-4o",
"modelIdx": 0,
"timestamp": 1720000001000
}
],
"history": {
"current_id": "assistant-msg-id",
"messages": {
"user-msg-id": {
"id": "user-msg-id",
"role": "user",
"content": "Hi, what is the capital of France?",
"timestamp": 1720000000000,
"models": ["gpt-4o"]
},
"assistant-msg-id": {
"id": "assistant-msg-id",
"role": "assistant",
"content": "The capital of France is Paris. Paris is not only the capital but also the most populous city in France, known for landmarks such as the Eiffel Tower, the Louvre Museum, and Notre-Dame Cathedral.",
"parentId": "user-msg-id",
"modelName": "gpt-4o",
"modelIdx": 0,
"timestamp": 1720000001000
}
}
},
"currentId": "assistant-msg-id"
}

Tag DTO

{
"id": "tag-uuid-123",
"name": "geography",
"color": "#4CAF50"
}

OWUIKnowledge DTO (Knowledge Collection)

{
"id": "knowledge-collection-id",
"type": "collection",
"status": "processed",
"name": "Geography Knowledge Base",
"description": "Contains information about world geography and capitals",
"created_at": 1720000000000,
"updated_at": 1720000001000
}

Knowledge Collection Response

{
"id": "knowledge-collection-id",
"name": "Geography Knowledge Base",
"description": "Contains information about world geography and capitals",
"type": "collection",
"status": "processed",
"files_count": 15,
"total_size": 2048576,
"created_at": 1720000000000,
"updated_at": 1720000001000,
"metadata": {
"indexing_status": "complete",
"last_indexed": 1720000001000
}
}

Model Information Response

{
"id": "gpt-4o",
"name": "GPT-4 Optimized",
"model": "gpt-4o",
"base_model_id": "gpt-4o",
"meta": {
"description": "Most advanced GPT-4 model optimized for performance",
"capabilities": ["text", "vision", "function_calling"],
"context_length": 128000,
"max_output_tokens": 4096
},
"params": {
"temperature": 0.7,
"top_p": 1.0,
"frequency_penalty": 0.0,
"presence_penalty": 0.0
},
"created_at": 1720000000000,
"updated_at": 1720000001000
}

Field Reference Guide

Required vs Optional Fields

Chat Creation - Required Fields:

  • title - Chat title (string)
  • models - Array of model names (string[])
  • messages - Initial message array

Chat Creation - Optional Fields:

  • files - Knowledge files for RAG (defaults to empty array)
  • tags - Chat tags (defaults to empty array)
  • params - Model parameters (defaults to empty object)

Message Structure - User Message:

  • Required: id, role, content, timestamp, models
  • Optional: parentId (for threading)

Message Structure - Assistant Message:

  • Required: id, role, content, parentId, modelName, modelIdx, timestamp
  • Optional: Additional metadata fields

ChatCompletionsRequest - Required Fields:

  • chat_id - Target chat ID
  • id - Assistant message ID
  • messages - Array of ChatCompletionMessage
  • model - Model identifier
  • session_id - Session identifier

ChatCompletionsRequest - Optional Fields:

  • stream - Enable streaming (defaults to false)
  • background_tasks - Control automatic tasks
  • features - Enable/disable features
  • variables - Template variables
  • filter_ids - Pipeline filters
  • files - Knowledge collections for RAG

Field Constraints

Timestamps:

  • Format: Unix timestamp in milliseconds
  • Example: 1720000000000 (July 3, 2024, 09:46:40 UTC)

UUIDs:

  • All ID fields should use valid UUID format
  • Example: 550e8400-e29b-41d4-a716-446655440000
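Both the timestamp and UUID constraints can be met from the shell. In this sketch the helper names now_ms and new_uuid are hypothetical; now_ms has one-second resolution padded to milliseconds, and new_uuid relies on uuidgen or the Linux /proc interface being available:

```shell
#!/usr/bin/env bash
# Current Unix time in milliseconds (second resolution, padded with 000).
now_ms() {
  printf '%s000\n' "$(date +%s)"
}

# Random lowercase UUID; returns non-zero if no generator is available.
new_uuid() {
  if command -v uuidgen >/dev/null 2>&1; then
    uuidgen | tr 'A-Z' 'a-z'
  elif [ -r /proc/sys/kernel/random/uuid ]; then
    cat /proc/sys/kernel/random/uuid
  else
    return 1
  fi
}

# Print a fresh timestamp, for example for a new message
now_ms
```

A new message could then take its id from new_uuid and its timestamp from now_ms before being placed into the request bodies shown earlier.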

Model Names:

  • Must match available models in your Open WebUI instance
  • Common examples: gpt-4o, gpt-3.5-turbo, claude-3-sonnet

Session IDs:

  • Can be any unique string identifier
  • Recommendation: Use UUID format for consistency

Knowledge File Status:

  • Valid values: "processed", "processing", "error"
  • Only use "processed" files for completions

Important Notes

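  • This workflow is intended for scenarios where a backend service orchestrates Open WebUI conversations
  • Critical: The assistant message enrichment (Step 2) is performed in memory on the response object rather than through a dedicated API call; Step 3 then sends the enriched chat to the server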
  • Alternative Approach: You can include both user and assistant messages in the initial chat creation (Step 1) instead of doing Step 2 separately
  • No frontend code changes are required for this approach
  • The stream: true parameter allows for real-time response streaming if needed
  • Response Monitoring: Use streaming for real-time processing or polling for simpler implementations that cannot handle streams
  • Background tasks like title generation can be controlled via the background_tasks object
  • Session IDs help maintain conversation context across requests
  • Knowledge Integration: Use the files array to include knowledge collections for RAG capabilities
  • Response Parsing: Handle JSON responses that may be wrapped in markdown code blocks
  • Error Handling: Implement proper retry mechanisms for network timeouts and server errors

Summary

Use the Open WebUI backend APIs to:

  1. Start a chat - Create the initial conversation with user input
  2. Enrich with assistant message - Add assistant placeholder to the response object in memory (can be combined with Step 1)
  3. Update chat state - Send the enriched chat to the server
  4. Trigger a reply - Generate the AI response (with optional knowledge integration)
  5. Monitor completion - Wait for the assistant response using streaming or polling
  6. Complete the message - Mark the response as completed
  7. Fetch the final chat - Retrieve and parse the completed conversation

Enhanced Capabilities:

  • RAG Integration - Include knowledge collections for context-aware responses
  • Asynchronous Processing - Handle long-running AI operations with streaming or polling
  • Response Parsing - Clean and validate JSON responses from the assistant
  • Session Management - Maintain conversation context across requests

This enables backend-controlled workflows that still appear properly in the Web UI frontend chat interface, providing seamless integration between programmatic control and user experience.

The key advantage of this approach is that it maintains full compatibility with the Open WebUI frontend while allowing complete backend orchestration of the conversation flow, including advanced features like knowledge integration and asynchronous response handling.

Testing

You can test your implementation by following the step-by-step curl examples provided above. Make sure to replace the placeholder values with your actual:

  • Host URL
  • Authentication token
  • Chat IDs
  • Message IDs
  • Model names

Tip: Start with a simple user message and gradually add complexity like knowledge integration and advanced features once the basic flow is working.