AI-driven consultant with LangChain



This story is about an AI-driven consultant chatbot that I built with LangChain and Chainlit. The bot asks potential customers about their problems in the enterprise data space, building a dynamic questionnaire on the fly to better understand those problems. Once it has gathered enough information about the user's problem, it gives advice on how to solve it. While asking questions, it also checks whether the user is confused and has questions of their own. If so, it tries to answer them.

The chatbot is built around a knowledge base on topics related to AI governance, security, data quality and so on, but you could use other topics of your choice.

This knowledge base is stored in a vector database and is used at every step, either to generate questions or to give advice.

The chatbot could, however, be based on any knowledge base and used in different contexts, so you can take it as a blueprint for other consulting chatbots.

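Interaction flow

The normal flow of a chatbot is simple: the user asks a question, the bot answers, and so on. The bot normally remembers the previous interactions, so there is a history.

The interaction with this bot is different, however. It goes like this:

AI-driven consultant chatbot interaction

In this case the chatbot asks a question, the user answers, and this exchange repeats a couple of times. If the accumulated knowledge is sufficient to give a response, or the number of questions reaches a certain threshold, a response is given; otherwise another question is asked.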

Rough architecture

Here are the participants in this application:

Main participants in the AI driven consultant chatbot

We have four participants:

The user
The application, which orchestrates the workflow between ChatGPT, the knowledge base and the user
ChatGPT 4 (gpt-4-0613)
The knowledge base (held in a vector database)

We also tried ChatGPT 3.5, but the results were not that great and it was hard to generate meaningful questions. ChatGPT 4 (gpt-4-0613) seems to produce better questions and advice, and it is also more stable.

We have also experimented with the latest ChatGPT 4 model (gpt-4-1106-preview, GPT 4 Turbo), but we frequently got unexpected results from the OpenAI function calls, and would often see error logs like this one:

File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 2 validation errors for ResponseTags
extracted_questions
  field required (type=value_error.missing)
questions_related_to_data_analytics
  field required (type=value_error.missing)
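
The log above shows the function-call output missing two fields that the ResponseTags schema requires. One way to catch this before validation fails is to inspect the raw arguments first. The following is a minimal sketch, assuming the function-call arguments arrive as a JSON string; the two field names come from the log, while `missing_fields` is a hypothetical helper and not part of the project.

```python
import json
from typing import List

# Field names taken from the error log; the helper itself is hypothetical.
REQUIRED_FIELDS = ("extracted_questions", "questions_related_to_data_analytics")


def missing_fields(raw_arguments: str) -> List[str]:
    """Return the required fields absent from a function-call JSON payload."""
    try:
        payload = json.loads(raw_arguments)
    except json.JSONDecodeError:
        # Unparseable output: treat every required field as missing.
        return list(REQUIRED_FIELDS)
    if not isinstance(payload, dict):
        return list(REQUIRED_FIELDS)
    return [name for name in REQUIRED_FIELDS if name not in payload]
```

A caller could retry the model call whenever the returned list is not empty, instead of letting the validation error surface to the user.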

Workflow: how it works

This diagram shows how the tool works internally:

Chatbot workflow

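Here are the steps of the workflow:

The tool asks the user a pre-defined question. Typically this is: "Which area of your data ecosystem are you most concerned about?"
The user answers the initial question.
The chatbot checks whether the user's reply contains a legitimate question (i.e. one that is not off-topic).
- If yes, a simple query agent is started to clarify the question. This agent uses ChatGPT 4 and the DuckDuckGo search engine.
The chatbot then decides whether to generate more questions or to give advice. The decision follows a simple rule: if fewer than 4 questions have been asked, another question is asked; otherwise we let ChatGPT decide whether to give advice or to keep asking.
- If the decision is to continue asking questions, the vector database holding the knowledge base is queried to retrieve the texts most similar to the user's answer. The search results are sent to ChatGPT 4, together with the questions and answers, to generate more questions.
- If the decision is to give advice, the knowledge base is queried with all questions and answers. The most similar parts of the knowledge base are extracted and included in the advice-generation prompt for ChatGPT, together with the whole questionnaire (questions and answers). After the advice is given, the flow terminates.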
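
The decision rule in the steps above can be sketched as follows. This is an illustration only: the threshold of 4 comes from the text, while `should_ask_another_question` and the `llm_wants_more` callable (a stand-in for the ChatGPT call) are hypothetical names.

```python
from typing import Callable

MIN_QUESTIONS = 4  # threshold taken from the rule described in the text


def should_ask_another_question(
    questions_asked: int,
    llm_wants_more: Callable[[], bool],
) -> bool:
    """Return True to keep asking questions, False to give advice."""
    if questions_asked < MIN_QUESTIONS:
        # Below the threshold: always ask, without consulting the model.
        return True
    # At or above the threshold: defer the decision to the model.
    return llm_wants_more()
```

Passing the model call in as a callable keeps the rule testable without network access and avoids paying for a model call while the questionnaire is still below the threshold.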

Implementation

The whole implementation can be found in this repository:

GitHub - onepointconsulting/data-questionnaire-agent: Data Questionnaire Agent Chatbot

The installation instructions can be found in the project's README file:

http://github.com/onepointconsulting/data-questionnaire-agent/blob/main/README.md

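Application modules

The bot contains a service module in which you can find all the services that interact with ChatGPT 4 and perform other operations, such as generating the PDF report and sending an email to the user.

Services

This is the folder containing the services: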

http://github.com/onepointconsulting/data-questionnaire-agent/tree/main/data_questionnaire_agent/service

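The most important services are:

Advice service: creates the LangChain LLMChain used to generate advice. The LLMChain uses OpenAI functions, like most of the chains in this application. The output schema for this function is in the openai_schema.py file.
Clarifications agent: creates the LangChain agent used to clarify any legitimate user questions. It also uses OpenAI functions under the hood.
Embedding service: creates OpenAI-based embeddings from the knowledge base, which should be a list of text documents.
HTML generator: functions used to generate HTML for the email and the PDF report.
Initial question service: creates the LLMChain which generates the first question after the user's answer to the initial question.
Mail sender: used to send emails.
Question generation service: used to generate all questions except the first generated one. It also uses a LangChain LLMChain with OpenAI functions.
Similarity search: queries the vector database. The most interesting function is similarity_search, which performs multiple searches to maximise the number of tokens sent to ChatGPT 4 until a limit is reached.
Tagging service: used to figure out whether the user's answers contain legitimate questions. In this service we use LangChain's create_tagging_chain_pydantic method to generate the tagging chain.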
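
The idea behind the similarity search service, repeating the search to send as much context as possible without exceeding a token limit, could look roughly like this. It is a sketch under stated assumptions and not the project's code: `search` is a hypothetical callable standing in for the vector database query, and `approx_tokens` is a deliberately crude word-count estimate rather than a real tokenizer.

```python
from typing import Callable, List


def approx_tokens(text: str) -> int:
    """Crude estimate (assumption): one token per whitespace-separated word."""
    return len(text.split())


def search_within_budget(
    search: Callable[[str, int], List[str]],
    query: str,
    token_limit: int,
    max_k: int = 20,
) -> List[str]:
    """Repeat the search with a growing k; keep the largest result set within the limit."""
    best: List[str] = []
    for k in range(1, max_k + 1):
        results = search(query, k)
        if sum(approx_tokens(doc) for doc in results) > token_limit:
            break  # this result set no longer fits the budget
        best = results
        if len(results) < k:
            break  # the index has no further documents to offer
    return best
```

In a real setup the word count would be replaced by the tokenizer of the target model, since word counts tend to underestimate token counts.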

Data structures

There is a module with the data structures used in this application:

http://github.com/onepointconsulting/data-questionnaire-agent/tree/main/data_questionnaire_agent/model

In this case we have two modules:

Application schema: contains all data classes used for operating the application, such as the Questionnaire class.
OpenAI schema: contains all classes used in the context of OpenAI functions.
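
To give an idea of what an application-level data class for a questionnaire might look like, here is a hypothetical sketch using plain dataclasses. The class and field names are assumptions made for illustration and may differ from the classes in the repository.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QuestionAnswer:
    """One question put to the user, with the reply once it is given (hypothetical)."""

    question: str
    answer: str = ""  # stays empty until the user replies


@dataclass
class Questionnaire:
    """Ordered list of question/answer pairs collected so far (hypothetical)."""

    questions: List[QuestionAnswer] = field(default_factory=list)

    def __len__(self) -> int:
        return len(self.questions)

    def __str__(self) -> str:
        # Plain-text rendering that could be placed inside a prompt section.
        return "\n\n".join(f"Q: {qa.question}\nA: {qa.answer}" for qa in self.questions)
```

Keeping a text rendering on the class makes it easy to drop the whole questionnaire into a prompt in one step.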

User interface

This is the module with the Chainlit-based user interface code:

http://github.com/onepointconsulting/data-questionnaire-agent/tree/main/data_questionnaire_agent/ui

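The file with the main implementation of the Chainlit user interface is:

data_questionnaire_chainlit.py. It contains the application's main entry point as well as the logic for running the agent.

The method in this file that contains the workflow implementation is process_questionnaire.

Notes on the UI

The Chainlit version was forked from version 0.7.0 and modified to meet some requirements given to us. The project should, however, also work with more recent Chainlit versions.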

Prompts

We have separated the prompts from the Python code and put them in a TOML file:

http://github.com/onepointconsulting/data-questionnaire-agent/blob/main/prompts.toml

The prompts use delimiters to separate the instructions from the knowledge base, the questions and the answers. ChatGPT 4 seems to understand delimiters well, unlike ChatGPT 3.5, which is easily confused by them. Here is an example of the prompt used for question generation:

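[questionnaire]

[questionnaire.initial]
question = "Which area of your data ecosystem are you most concerned about?"
system_message = "You are a data integration and governance expert who can ask questions about data integration and governance to help a customer with data integration and governance problems"
human_message = """Based on the best practices and knowledge base and on an answer to a question answered by a customer, \
please generate {questions_per_batch} questions that are helpful to this customer to solve data integration and governance issues.
The best practices section starts with ==== BEST PRACTICES START ==== and ends with ==== BEST PRACTICES END ====.
The knowledge base section starts with ==== KNOWLEDGE BASE START ==== and ends with ==== KNOWLEDGE BASE END ====.
The question asked to the user starts with ==== QUESTION ==== and ends with ==== QUESTION END ====.
The user answer provided by the customer starts with ==== ANSWER ==== and ends with ==== ANSWER END ====.

==== KNOWLEDGE BASE START ====
{knowledge_base}
==== KNOWLEDGE BASE END ====

==== QUESTION ====
{question}
==== QUESTION END ====

==== ANSWER ====
{answer}
==== ANSWER END ====
"""

[questionnaire.secondary]
system_message = "You are a British data integration and governance expert who can ask questions about data integration and governance to help a customer with data integration and governance problems"
human_message = """Based on the best practices and knowledge base and on answers to multiple questions answered by a customer, \
please generate {questions_per_batch} questions that are helpful to this customer to solve data integration, governance and quality issues.
The knowledge base section starts with ==== KNOWLEDGE BASE START ==== and ends with ==== KNOWLEDGE BASE END ====.
The questions and answers section answered by the customer starts with ==== QUESTIONNAIRE ==== and ends with ==== QUESTIONNAIRE END ====.
The user answers are in the section that starts with ==== ANSWERS ==== and ends with ==== ANSWERS END ====.

==== KNOWLEDGE BASE START ====
{knowledge_base}
==== KNOWLEDGE BASE END ====

==== QUESTIONNAIRE ====
{questions_answers}
==== QUESTIONNAIRE END ====

==== ANSWERS ====
{answers}
==== ANSWERS END ====
"""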

As you can see, we are using delimiter sections such as ==== KNOWLEDGE BASE START ==== or ==== KNOWLEDGE BASE END ====.
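
Filling such a delimited template from Python can be sketched as below. This is an illustrative example, not the project's code: the template is a shortened stand-in for the prompts in prompts.toml, and `placeholders` and `render_prompt` are hypothetical helpers that fail early when a value is missing instead of sending an incomplete prompt.

```python
from string import Formatter
from typing import Set

# Shortened stand-in for a prompt template; the delimiter names follow the article.
TEMPLATE = """\
==== KNOWLEDGE BASE START ====
{knowledge_base}
==== KNOWLEDGE BASE END ====

==== QUESTIONNAIRE ====
{questions_answers}
==== QUESTIONNAIRE END ====
"""


def placeholders(template: str) -> Set[str]:
    """Collect the named placeholders that appear in a format-style template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def render_prompt(template: str, **values: str) -> str:
    """Fill the template, raising ValueError if any placeholder has no value."""
    missing = placeholders(template) - set(values)
    if missing:
        raise ValueError(f"Missing prompt values: {sorted(missing)}")
    return template.format(**values)
```

For example, `render_prompt(TEMPLATE, knowledge_base="...", questions_answers="...")` returns the prompt with both sections wrapped in their delimiters.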

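Takeaways

We tried to build meaningful interactions using ChatGPT 3.5, but this model could not understand the prompt delimiters well, whereas ChatGPT 4 (gpt-4-0613) could, which allowed us to create meaningful interactions with users. That is why we chose ChatGPT 4 for this application.

As mentioned earlier, we tried to replace gpt-4-0613 with gpt-4-1106-preview, but the results were not good: function calls failed frequently.

When we started this project we had a limit of 10,000 tokens per minute, which caused some annoying errors. OpenAI has since raised the limit to 300,000 tokens, which made the application more stable:

Increased tokens-per-minute limit

Another important takeaway is that you need to limit the scope of the interaction very carefully, otherwise your bot might be misused for something else, as in this example:

Off-topic question

We did, however, find a way to prevent this, and the bot can now recognise off-topic questions (see the [tagging] section in the prompts.toml file).

The final conclusion is that ChatGPT 4 is up to the challenge of producing a meaningful consultant-style interaction: it can generate an open-ended questionnaire that ends with a set of meaningful pieces of advice.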
