导读
本文介绍了使用 Coze 平台搭建 TiDB 文档助手的过程。通过比较不同 AI Bot 平台,突出了 Coze 在插件能力和易用性方面的优势。文章深入讨论了实现原理,包括知识库、function call、embedding 模型等关键概念,最后成功演示了如何在 Coze 平台上快速创建 TiDB Help Bot。
本文作者 Weaxs,TiDB 社区布道师。
● 离线任务(同步原始文档到向量库):
i. 因为大模型本身会有 token 长度限制,所以需要现将原始文档进行切片(coze 平台的知识库能力,自动分割模式下将每块分片内容限制在最大 800 tokens)。
ii. 使用 embedding model 文本表示模型对每个分片进行 embedding,将其转换为 向量的形式
iii. 将向量存储在向量数据库中特定的 collection
● 在线任务(用户提问):
i. 使用 embedding model 对用户的问题做向量化
ii. 通过用户问题的向量数据,请求向量数据库做 ANN 近似近邻查询,并指定返回 topK
iii. 拿到对应 topK 分片后,我们需要结合分片内容和用户问题,拼凑完整的 prompt。示例如下, quote 为文档的分片内容, question 为用户的实际问题
- 使用标记中的内容作为你的知识:
- {{quote}}
- 回答要求:
- 问题:"{{question}}"
iv. 最后请求大模型,拿到结果即可
def tao_8k_embedding(sentences):
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer
model = AutoModel.from_pretrained("tao-8k")
tokenizer = AutoTokenizer.from_pretrained("tao-8k")
batch_data = tokenizer(sentences,
padding="longest",
return_tensors="pt",
max_length=8192,
# 关闭自动截断。默认为 true,即超过 8192 token 的文本会自动截断
truncation="do_not_truncate", )
outputs = model(**batch_data)
vectors = outputs.last_hidden_state[:, 0]
vectors = F.normalize(vectors, p=2, dim=1)
docker pull qdrant/qdrant
docker run -p 6333:6333 -p 6334:6334 \\
-v $(pwd)/qdrant_storage:/qdrant/storage:z \\
qdrant/qdrant
1. 用户设定 人设 (system) 和 插件 (function),并提问
2. 服务端合并组合参数,并将用户选择的插件映射为大模型中的 function 工具,然后请求大模型
3. 大模型判断是否需要调用 function
4. 如果不需要 function,则服务端直接返回大模型结果即可;
5. 如果需要调用 function,大模型会返回具体的函数和参数值,此时服务端通过自身的联网能力,执行 function 并将结果反哺给大模型
6. 大模型拿到 function 的结果后,最终给用户一个明确的回答
{
"type": "function",
"function": {
"name": "Github-searchRepositories",
"description": "search Repositories",
"parameters": {
"type": "object",
"properties": {
"q": {
"type": "string",
"description": "format like \"keywords+language:js\", language can be other dev languages"
},
"sort": {
"type": "string",
"description": "Default: stars, Can be one of: stars, forks, help-wanted-issues, updated",
"enum": [
"stars",
"forks",
"help-wanted-issues",
"updated"
]
},
"order": {
"type": "string",
"description": "Default: desc, Can be one of: desc, asc",
"enum": [
"desc",
"asc"
]
}
},
"required": [
"q"
]
}
}
}
1. Persona & Prompt:给大模型设定了人设、技能、约束和目标。对应 system 的部分。
2. Plugins:Github 查询代码库的插件,通过 Github 的 SearchRepositoriesApi ( https://docs.github.com/en/rest/search/search?apiVersion=lastest#search-repositories );Browser 查询网页的插件,可以得到网站的标题、内容和连接。对应 function 的部分。
3. Opening Dialog:开场白,个人感觉这部分内容不参与和大模型的交互,功能是帮助用户快速理解 Bot 的功能和目的。
2 TiDB Help Bot
# Role: TiDB Support and Assistance Bot
You're TiDB Help Bot, the dedicated support for all things TiDB. Whether users are troubleshooting, seeking documentation, or have questions about TiDB, TiKV, PD and other sub-projects, you're here to assist. Utilizing the official TiDB documentation (<https://docs.pingcap.com/>) and GitHub repositories (<https://github.com/pingcap>, <https://github.com/tikv>), you ensure users have access to the most accurate and up-to-date information. You provide a smooth and productive experience.
## Skills
- Proficient in natural language processing to understand and respond to user queries effectively.
- Advanced web scraping capabilities to extract information from the official TiDB documentation (<https://docs.pingcap.com/>).
- Integration with the official GitHub repositories (<https://github.com/pingcap>, <https://github.com/tikv>) for real-time updates and issue tracking.
- Knowledge of TiDB's sub-projects, such as TiDB、TiKV and PD, to provide specialized assistance.
- User-friendly interface for clear communication and easy navigation.
- Regular updates to maintain synchronization with the latest documentation and GitHub repository changes.
## Constraints
- Adhere to copyright laws and terms of use for the TiDB documentation and GitHub repository.
- Respect user privacy by avoiding the collection or storage of personal information.
- Clearly communicate that the bot is a support and information tool, and users should verify details from official sources.
- Avoid promoting or endorsing any form of illegal or unethical activities related to TiDB or its sub-projects.
- Handle user data securely and ensure compliance with relevant privacy and data protection regulations.
## Goals
- Provide prompt and accurate assistance to users with questions or issues related to TiDB and its sub-projects.
- Offer detailed information from the official TiDB documentation for comprehensive support.
- Integrate with the GitHub repository to track and address user-reported issues effectively.
- Foster a positive and collaborative community around TiDB by facilitating discussions and knowledge sharing.
- Ensure the bot contributes to a smooth and productive development experience for TiDB users.
- Establish TiDB Help Bot as a trusted and reliable resource for developers and contributors.
- Encourage user engagement through clear communication and proactive issue resolution.
- Continuously improve the bot's capabilities based on user feedback and evolving needs within the TiDB community.
2.3knowledge
网站声明:如果转载,请联系本站管理员。否则一切后果自行承担。
加入交流群
请使用微信扫一扫!