🌠 Improving RAG by Optimizing Retrieval and Reranking Models#

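In this tutorial, we will show how to improve a RAG (Retrieval Augmented Generation) model by optimizing the retrieval and reranking models. To do so, we will use the ArgillaTrainer to fine-tune a bi-encoder and a cross-encoder on two datasets built from our own data.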

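The steps are as follows:

📝 Set up a QA pipeline for RAG using Haystack
🗃️ Get the answers and contexts to create our own datasets
📩 Create the Argilla datasets and push them to the Argilla UI
💫 Fine-tune the bi-encoder and the cross-encoder
🌌 Use our fine-tuned models to improve the original RAG model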

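Introduction#

LLMs have become a reality in our daily lives. They are used in search engines, chatbots, and question-answering systems. However, they are not perfect: they often produce irrelevant, inaccurate, or unverifiable responses. RAG (Retrieval Augmented Generation) was introduced to address this problem.

RAG is a framework that combines a pre-trained LLM with a retrieval model to improve response quality. The retrieval model fetches relevant information from a knowledge base (the web or your own documents), which makes the output more trustworthy for the user. RAG also addresses common LLM shortcomings: it can supply up-to-date and domain-specific data (and even cite its sources), and it is more efficient and affordable because the model does not need to be retrained from scratch.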

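To optimize the retrieval model, we can use sentence similarity models. Why? Because capturing the user's intent improves the accuracy and relevance of the retrieved information. These models convert texts into embeddings (vectors that represent semantic information) and compute the similarity between them, thereby "understanding" the meaning of the input text.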
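
As a minimal illustration of this idea (not the model used in this tutorial), the sketch below compares toy embeddings with cosine similarity. The three-dimensional vectors are invented for the example; real sentence embeddings typically have hundreds of dimensions.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (close to 1.0 means same direction)."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


# Hypothetical toy embeddings, made up for illustration only
query = [1.0, 0.0, 1.0]
related = [2.0, 0.0, 2.0]    # points in the same direction as the query
unrelated = [0.0, 3.0, 0.0]  # orthogonal to the query

scores = {
    "related": cosine_similarity(query, related),
    "unrelated": cosine_similarity(query, unrelated),
}
best = max(scores, key=scores.get)
print(best, scores)
```

The text whose embedding points in the most similar direction to the query gets the highest score, which is the basis for ranking retrieved passages.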

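In this tutorial, we will fine-tune sentence similarity models of two kinds: a bi-encoder (faster but less accurate) and a cross-encoder (slower but more accurate). The bi-encoder creates sentence embeddings for the data and the query and then compares them by computing the similarity between the vectors. The cross-encoder does not use sentence embeddings; instead, it classifies pairs of texts and outputs a value indicating how similar they are.

They can be used independently in the retriever or together, as shown in the figure below. Retrieval is the initial step: it searches a large dataset or collection to identify a subset of candidate documents, passages, or sentences that are likely to be relevant to a given query or information need. It is followed by the reranking stage, in which the initially retrieved candidates are re-evaluated and reordered according to their actual relevance to the query.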
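
The two-stage flow described above can be sketched in plain Python. This is only an outline under stated assumptions: `toy_bi_score` and `toy_cross_score` are hypothetical stand-ins based on word overlap, not real encoders, and the corpus is invented for the example.

```python
def retrieve(query, corpus, bi_score, top_k):
    """Stage 1: score every document cheaply and keep the top_k candidates."""
    ranked = sorted(corpus, key=lambda doc: bi_score(query, doc), reverse=True)
    return ranked[:top_k]


def rerank(query, candidates, cross_score):
    """Stage 2: re-score only the candidates with the slower, finer scorer."""
    return sorted(candidates, key=lambda doc: cross_score(query, doc), reverse=True)


# Hypothetical stand-ins for the two models (NOT real encoders)
def toy_bi_score(query, doc):
    # Shared-word count, playing the role of an embedding similarity
    return len(set(query.lower().split()) & set(doc.lower().split()))


def toy_cross_score(query, doc):
    # Overlap normalized by document length, playing the role of a cross-encoder
    return toy_bi_score(query, doc) / len(doc.lower().split())


corpus = [
    "support tickets are answered within four hours",
    "the ticketing system is part of the base support plan and many other things are too",
    "pricing depends on the volume of records",
    "ticketing system for support",
]
query = "ticketing system support"

candidates = retrieve(query, corpus, toy_bi_score, top_k=3)
reranked = rerank(query, candidates, toy_cross_score)
print(reranked[0])
```

The first stage discards the clearly unrelated document, and the second stage reorders the survivors, so the expensive scorer only runs on a handful of candidates.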

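Running Argilla#

For this tutorial, you will need to have an Argilla server running. There are two main options for deploying and running Argilla:

Deploy Argilla on Hugging Face Spaces: If you want to run the tutorial with an external notebook (e.g., Google Colab) and you have an account on Hugging Face, you can deploy Argilla on Spaces with a few clicks

For details about configuring your deployment, check the official Hugging Face Hub guide.

Launch Argilla using Argilla's quickstart Docker image: This is the recommended option if you want Argilla running on your local machine. Note that this option only lets you run the tutorial locally, not with an external notebook service.

For more information on deployment options, please check the Deployment section of the documentation.

Tip

This tutorial is a Jupyter Notebook. There are two options to run it:

Use the "Open in Colab" button at the top of this page. This option allows you to run the notebook directly on Google Colab. Don't forget to change the runtime type to GPU for faster model training and inference.
Download the .ipynb file by clicking on the "View source" link at the top of the page. This option allows you to download the notebook and run it on your local machine or on a Jupyter Notebook tool of your choice.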

Setting Up the Environment#

To complete this tutorial, you will need to install the Argilla client and a few third-party libraries using pip:

[ ]:

# %pip install --upgrade pip
%pip install argilla -qqq
%pip install datasets
%pip install sentence-transformers
%pip install farm-haystack[colab,faiss,inference]

Let's make the required imports:

[2]:

import argilla as rg
from argilla.feedback import TrainingTask
from argilla.feedback import ArgillaTrainer

import random
import locale
import re

from datasets import load_dataset
from tqdm import tqdm

from haystack.document_stores import FAISSDocumentStore
from haystack.nodes import PreProcessor, TextConverter, EmbeddingRetriever, PromptNode, PromptTemplate, AnswerParser, SentenceTransformersRanker
from haystack.pipelines import Pipeline
from haystack.pipelines.standard_pipelines import TextIndexingPipeline

If you are running Argilla using the Docker quickstart image or a public Hugging Face Space, you need to initialize the Argilla client with the URL and API_KEY:

[ ]:

# Replace api_url with the url to your HF Spaces URL if using Spaces
# Replace api_key if you configured a custom API key
# Replace workspace with the name of your workspace
rg.init(
    api_url="http://localhost:6900",
    api_key="owner.apikey",
    workspace="admin"
)

If you are running a private Hugging Face Space, you will also need to set the HF_TOKEN as follows:

[ ]:

# # Set the HF_TOKEN environment variable
# import os
# os.environ['HF_TOKEN'] = "your-hf-token"

# # Replace api_url with the url to your HF Spaces URL
# # Replace api_key if you configured a custom API key
# rg.init(
#     api_url="https://[your-owner-name]-[your_space_name].hf.space",
#     api_key="admin.apikey",
#     extra_headers={"Authorization": f"Bearer {os.environ['HF_TOKEN']}"},
# )

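Enable Telemetry#

We gain valuable insights from how you interact with our tutorials. Running the following lines of code helps us understand whether this tutorial is serving you effectively, so that we can offer you the most suitable content. The data is entirely anonymous, and you can skip this step if you prefer. For more information, please check the Telemetry page.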

[ ]:

try:
    from argilla.utils.telemetry import tutorial_running
    tutorial_running()
except ImportError:
    print("Telemetry is introduced in Argilla 1.20.0 and not found in the current installation. Skipping telemetry.")

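Datasets#

Datasets for sentence similarity come in several types: pairs of positive similar sentences (sentence-compression), natural language inference pairs (snli), sentences with labels (QQP_triplets), and so on. You can find more information about the different types of datasets here.

In this example, we want to create two datasets from our own data and, once they are annotated, use their sentence pairs and labels: one to optimize the retrieval model by comparing the query with the contexts, and another to optimize the reranking model by comparing the contexts with each other. To do so, we will use Haystack's generative QA pipeline with RAG to get the contexts and answers from a knowledge base. Then, we will use the corresponding FeedbackDatasetTemplates to upload them to Argilla so that we can work on them in the Argilla UI.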

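Generating Responses#

Haystack is an open-source framework that provides the means to build end-to-end pipelines for a variety of NLP tasks. It is model-agnostic and can be used for tasks such as question answering, document search, and summarization. In this tutorial, we will use Haystack to generate the responses for our datasets.

First, we download the following dataset from Hugging Face, which contains questions about Argilla Cloud and was created in this tutorial. Then, we download the text file that will serve as the input data for the RAG model.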

[ ]:

# Load the questions
dataset = load_dataset("argilla/cloud_assistant_questions")

# Download the document for RAG
# locale.getpreferredencoding = lambda: "UTF-8" # Run it if UTF-8 encoding error
!curl https://hugging-face.cn/datasets/argilla/cloud_assistant_questions/raw/main/argilla_cloud.txt > argilla_cloud.txt

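Now, let's preprocess our document using the predefined TextIndexingPipeline. This pipeline lets us initialize the DocumentStore (the database for the retriever), the PreProcessor (to clean the document and split it into smaller units), and the TextConverter (to convert the txt file into a Document object).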

[ ]:

# Initialize the DocumentStore
# Note: embedding_dim must match the output dimension of the retriever's embedding model
document_store = FAISSDocumentStore(faiss_index_factory_str="Flat", similarity="dot_product", embedding_dim=384)

# Initialize the PreProcessor
preprocessor = PreProcessor(
    clean_empty_lines=True,
    clean_whitespace=True,
    clean_header_footer=False,
    split_by="word",
    split_length=100,
    split_respect_sentence_boundary=True,
)

# Initialize the TextConverter
text_converter = TextConverter()

# Run the TextIndexingPipeline
pipeline = TextIndexingPipeline(document_store, text_converter, preprocessor)
result = pipeline.run(file_path="argilla_cloud.txt")
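
To make the effect of `split_by="word"` and `split_length=100` more concrete, here is a simplified sketch of word-based splitting. It is not Haystack's implementation: the helper name `split_by_words` is hypothetical, and it ignores the cleaning options and `split_respect_sentence_boundary`, which the real PreProcessor also applies.

```python
def split_by_words(text, split_length=100):
    """Split text into chunks of at most split_length whitespace-separated words."""
    words = text.split()
    return [
        " ".join(words[start:start + split_length])
        for start in range(0, len(words), split_length)
    ]


# Hypothetical 250-word document, generated only for the example
sample = " ".join(f"word{i}" for i in range(250))
chunks = split_by_words(sample, split_length=100)
print([len(chunk.split()) for chunk in chunks])  # [100, 100, 50]
```

Each chunk becomes one retrievable unit, so `split_length` directly controls how much text a single retrieved context contains.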

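Once the document is ready, we initialize the retriever and the prompt node. In our case, we will use the dense retriever EmbeddingRetriever, which computes embeddings for the documents and the query. The most common architecture, and the one we will use here, is sentence-transformers; you can find more information about it here.

For the PromptNode, we will use flan-t5-large, an open-source LLM from Google available on Hugging Face, although other models can be used. It will use the custom prompt template that we define, rag_prompt.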

# Use this code if you prefer to use the OpenAI API for the prompt node
# Remember to add your OpenAI API key for generation
prompt_node = PromptNode(
   model_name_or_path="text-davinci-003", api_key='api_key', default_prompt_template=rag_prompt
)

For more information, check the documentation.

[ ]:

# Initialize the EmbeddingRetriever
retriever = EmbeddingRetriever(
    document_store=document_store, embedding_model="sentence-transformers/multi-qa-mpnet-base-dot-v1"
)
document_store.update_embeddings(retriever)

# Write the prompt for RAG
rag_prompt = PromptTemplate(
    prompt="""Synthesize a comprehensive answer from the following text for the given question.
            Provide a clear and concise response.
            Your answer should be in your own words and be no longer than 50 words.
            \n\n Related text: {join(documents, delimiter=new_line, pattern=new_line+'Document[$idx]: $content', str_replace={new_line: ' ', '[': '(', ']': ')'})} \n Question: {query}; Answer: """,
            output_parser=AnswerParser(reference_pattern=r"Document\[(\d+)\]"),
)

# Initialize PromptNode
prompt_node = PromptNode(model_name_or_path="google/flan-t5-large", default_prompt_template=rag_prompt)

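Now, we can run the QA pipeline, which we build from the retriever and the prompt node, using pipe.run. To do so, we loop over the questions and collect a list of answers and a list of the contexts retrieved for each question.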

[ ]:

# Create the QA pipeline
pipe = Pipeline()
pipe.add_node(component=retriever, name="retriever", inputs=["Query"])
pipe.add_node(component=prompt_node, name="prompt_node", inputs=["retriever"])

# Run the pipeline
questions = dataset["train"]["question"]
answers = []
contexts = []

for question in tqdm(questions):

    # Get the response and save it
    response = pipe.run(query=question)
    answers.append(response["answers"][0].answer)

    # Get the document contexts and save them
    prompt = response["answers"][0].meta['prompt']
    segments = re.split(r'Document\[\d+\]:', prompt)
    document_segments = [segment.strip() for segment in segments[1:]]
    contexts.append(document_segments)
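
The context extraction in the loop above relies on the `Document[n]:` markers that the prompt template inserts before each retrieved passage. The following sketch shows the same split on a hypothetical prompt string, shaped like the template's output but invented for the example:

```python
import re

# Hypothetical prompt, shaped like the one produced by the rag_prompt template
prompt = (
    "Related text: \n"
    "Document[1]: First passage. \n"
    "Document[2]: Second passage. \n"
    "Document[3]: Third passage."
)

# Everything before the first marker is the instruction text, so it is dropped
segments = re.split(r"Document\[\d+\]:", prompt)
document_segments = [segment.strip() for segment in segments[1:]]
print(document_segments)  # ['First passage.', 'Second passage.', 'Third passage.']
```

Because the split only looks for the markers, any text that follows the last document in the prompt stays attached to the last segment.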

[ ]:

# Question: the query asked the model
print(f"Question: {questions[0]}")

# Answer: the answer generated by the model
print(f"Answer: {answers[0]}")

# Contexts: a list of contexts retrieved by the model
print(f"Contexts: {contexts[0]}")

Question: What is the ticketing system used by Argilla for customer support?
Answer: volume of records processed.
Contexts: ["In summary, Argilla Cloud's support services are designed to provide timely and efficient assistance for issues of varying severity, ensuring a smooth and reliable user experience. All plans include Monday to Friday during office hours (8am to 17pm CEST) with additional support upon request. The Support Channels and features of each tier are shown below: Starter: Slack Community. Severity 1 - Response time < 4 hours. Severity 2 - Response time < 8 hours. Severity 3 - Response time < 48 hours. Severity 4 not specified. Base: Ticketing System, Severity 1 - Response time < 4 hours.", "They have the option to configure settings as per their team's requirements, including assigning datasets to specific workspaces and managing access permissions. Step 5: Training and Support Argilla provides open resources and support to aid in the onboarding process. This includes user manuals, tutorials, and access to our support team for any queries or issues that may arise during the setup and onboarding process. By following these steps, new users can be quickly onboarded and begin using the Argilla Cloud service with minimal downtime.", "This process ensures the client administrator has full control over their team's access and can manage their workspace efficiently. Plans The plans for the Argilla Cloud service depend on the volume of records processed, with several tiers available to suit varying needs. Each tier has a corresponding monthly and annual price, with a 10% discount applied to the annual pricing option. The tier selection and associated price will be determined by the client'"]

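Creating the Argilla Datasets#

We now have our raw data, so we can create the Argilla datasets using the FeedbackDatasetTemplates. In this case, we will use the FeedbackDataset.for_retrieval_augmented_generation template for retrieval, storing the query and the contexts, and the FeedbackDataset.for_sentence_similarity template for reranking, storing the contexts.

For RAG#

We initialize the template with the default parameters, except for number_of_retrievals, which is set to 3 (because we have three contexts), and add a new metadata property (the source of the contexts, in case we later want to compare them with contexts obtained using another model). Then, we build the records by iterating over the lists we created, add them to the dataset, and push it to Argilla with the push_to_argilla method. This lets us visualize the dataset in the Argilla UI.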

[2]:

# Initialize the FeedbackDatasetTemplate
dataset_rag = rg.FeedbackDataset.for_retrieval_augmented_generation(
    number_of_retrievals=3,
    rating_scale=10,
    use_markdown=False,
    guidelines=None,
    metadata_properties=None,
    vectors_settings=None,
)
dataset_rag

[2]:

FeedbackDataset(
   fields=[TextField(name='query', title='Query', required=True, type='text', use_markdown=False), TextField(name='retrieved_document_1', title='Retrieved Document 1', required=True, type='text', use_markdown=False), TextField(name='retrieved_document_2', title='Retrieved Document 2', required=False, type='text', use_markdown=False), TextField(name='retrieved_document_3', title='Retrieved Document 3', required=False, type='text', use_markdown=False)]
   questions=[RatingQuestion(name='rating_retrieved_document_1', title='Rate the relevance of the Retrieved Document 1 for the query', description='Rate the relevance of the retrieved document.', required=True, type='rating', values=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]), RatingQuestion(name='rating_retrieved_document_2', title='Rate the relevance of the Retrieved Document 2 for the query', description='Rate the relevance of the retrieved document.', required=False, type='rating', values=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]), RatingQuestion(name='rating_retrieved_document_3', title='Rate the relevance of the Retrieved Document 3 for the query', description='Rate the relevance of the retrieved document.', required=False, type='rating', values=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]), TextQuestion(name='response', title='Write a helpful, harmless, accurate response to the query.', description='Write the response to the query.', required=False, type='text', use_markdown=False)]
   guidelines=This is a retrieval augmented generation dataset that contains queries and retrieved documents. Please rate the relevancy of retrieved document and write the response to the query in the response field.)
   metadata_properties=[])
)

[ ]:

# Add the new metadata property
metadata = rg.TermsMetadataProperty(name="source", title="Source model")

dataset_rag.add_metadata_property(metadata)

TermsMetadataProperty(name='source', title='Source model', visible_for_annotators=True, type='terms', values=None)

[ ]:

# Create the proper records
records = [
    rg.FeedbackRecord(
        fields={"query": question, "retrieved_document_1": context[0], "retrieved_document_2": context[1], "retrieved_document_3": context[2]},
        metadata={"source": "flan-t5-large"}
    )
    for question, context in tqdm(zip(questions, contexts))
]

# Add records to the dataset
dataset_rag.add_records(records)

[ ]:

# Publish the dataset in the Argilla UI
dataset_rag = dataset_rag.push_to_argilla(name="my_rag_dataset", workspace="admin")

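For Sentence Similarity#

The process is similar to the previous one. We initialize the template and add a metadata property (here, sources helps track where the contexts being compared came from). Since we want to compare three contexts but a cross-encoder only compares two texts at a time, we iterate over the three contexts and pair them as 1-2, 1-3, and 2-3. Then, we add the records and push the dataset to Argilla.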

[ ]:

# Initialize the FeedbackDatasetTemplate
dataset_ssim = rg.FeedbackDataset.for_sentence_similarity(
    rating_scale=10,
    use_markdown=True,
    guidelines=None,
    metadata_properties=None,
    vectors_settings=None,
)
dataset_ssim

FeedbackDataset(
   fields=[TextField(name='sentence-1', title='Sentence-1', required=True, type='text', use_markdown=True), TextField(name='sentence-2', title='Sentence-2', required=True, type='text', use_markdown=True)]
   questions=[RatingQuestion(name='similarity', title='Similarity', description='Rate the similarity between the two sentences.', required=True, type='rating', values=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10])]
   guidelines=This is a sentence similarity dataset that contains two sentences. Please rate the similarity between the two sentences.)
   metadata_properties=[])
)

[ ]:

# Add the new metadata property
metadata = rg.TermsMetadataProperty(name="sources", title="Model sources", values=["flan-t5-large/flan-t5-large"])

dataset_ssim.add_metadata_property(metadata)

TermsMetadataProperty(name='sources', title='Model sources', visible_for_annotators=True, type='terms', values=['flan-t5-large/flan-t5-large'])

[ ]:

# Create the proper records
records = []
for context in tqdm(contexts):
    for i in range(len(context)):
        for j in range(i + 1, len(context)):
            record = rg.FeedbackRecord(
                fields={"sentence-1": context[i], "sentence-2": context[j]},
                metadata={"sources": "flan-t5-large/flan-t5-large"}
            )
            records.append(record)

# Add records to the dataset
dataset_ssim.add_records(records)
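
The nested loops above enumerate every unordered pair of contexts. As a small aside, the same pairing can be expressed with `itertools.combinations` from the standard library; the context strings below are placeholders, not data from the tutorial.

```python
from itertools import combinations

# Placeholder contexts for a single question
contexts_for_one_question = ["context 1", "context 2", "context 3"]

# All unordered pairs: (1, 2), (1, 3), (2, 3)
pairs = list(combinations(contexts_for_one_question, 2))

# Equivalent nested-loop version, mirroring the indices used in the tutorial code
loop_pairs = [
    (contexts_for_one_question[i], contexts_for_one_question[j])
    for i in range(len(contexts_for_one_question))
    for j in range(i + 1, len(contexts_for_one_question))
]
print(pairs)
```

With three contexts per question this yields three records per question, so the sentence similarity dataset holds three times as many records as there are questions.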

[ ]:

# Publish the dataset in the Argilla UI
dataset_ssim = dataset_ssim.push_to_argilla(name="my_ssim_dataset", workspace="admin")

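Fine-Tuning the Models#

To improve the retrieval model, we need to fine-tune sentence similarity models: the bi-encoder, which compares the embedding of the query with the embeddings of the retrieved contexts, and the cross-encoder, which ranks the contexts. In this tutorial, we will fine-tune both models using the ArgillaTrainer and the datasets we created.

Bi-Encoder Model#

First, we prepare the data to fine-tune the bi-encoder. Instead of the default TrainingTask.for_sentence_similarity arguments, we use a formatting_func so that each training example pairs two sentences (the query and one of the contexts) and carries the rating that annotators gave in the Argilla UI as its label, which makes the training signal more precise.

Fine-tuning a sentence similarity model with the ArgillaTrainer is straightforward: we only need to initialize the trainer and call train. To choose between a bi-encoder and a cross-encoder, set framework_kwargs={"cross_encoder": False} or framework_kwargs={"cross_encoder": True}, respectively. Check the documentation for further customization.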

[35]:

# Load the dataset from Argilla
dataset_rag = rg.FeedbackDataset.from_argilla("my_rag_dataset", workspace="admin")

[38]:

# Define the training task using the formatting function
def formatting_func(sample):

    records = []

    for i in range(1, 4):
        record = {"sentence-1": sample["query"], "sentence-2": sample[f"retrieved_document_{i}"]}
        values = [resp["value"] for resp in sample[f"rating_retrieved_document_{i}"]]
        label = int(values[0])
        record["label"] = label
        records.append(record)

    return records

task = TrainingTask.for_sentence_similarity(formatting_func=formatting_func)
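
To see what a formatting function of this kind produces, the sketch below runs a variant of it on a hand-written sample. Both the sample and the helper name `format_sample` are hypothetical: the dictionary only imitates the shape the function above expects (a list of responses with a `"value"` key per rating question). Skipping documents without a rating is an addition made for this sketch, not part of the tutorial's function.

```python
def format_sample(sample, number_of_retrievals=3):
    """Turn one annotated RAG record into (query, document, label) training rows."""
    rows = []
    for i in range(1, number_of_retrievals + 1):
        responses = sample.get(f"rating_retrieved_document_{i}") or []
        values = [resp["value"] for resp in responses]
        if not values:
            continue  # assumption of this sketch: skip documents nobody rated
        rows.append({
            "sentence-1": sample["query"],
            "sentence-2": sample[f"retrieved_document_{i}"],
            "label": int(values[0]),  # first annotator's rating, as in the tutorial
        })
    return rows


# Hypothetical annotated sample, invented for illustration
sample = {
    "query": "What is the ticketing system?",
    "retrieved_document_1": "Base: Ticketing System.",
    "retrieved_document_2": "Plans depend on the volume of records.",
    "retrieved_document_3": "Step 5: Training and Support.",
    "rating_retrieved_document_1": [{"value": 9}],
    "rating_retrieved_document_2": [{"value": 3}],
    "rating_retrieved_document_3": [],
}

rows = format_sample(sample)
print(rows)
```

Each annotated record therefore yields up to three training pairs, one per rated context, all sharing the same query as the first sentence.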

[ ]:

# Fine-tune the bi-encoder
trainer_bi = ArgillaTrainer(
    dataset=dataset_rag,
    task=task,
    framework="sentence-transformers",
    framework_kwargs={"cross_encoder": False}
)
trainer_bi.train(output_dir="my_bi_sentence_transformer_model")

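Cross-Encoder Model#

Our first task is to prepare the data to fine-tune the cross-encoder. In TrainingTask.for_sentence_similarity, we explicitly pass only the two contexts to compare, together with the rating annotation as the label, so that the model learns to rank them. This approach ensures that when the retrieval model retrieves documents, the cross-encoder can rank them effectively and return the most similar ones.

Then, we initialize and run the trainer just as we did for the bi-encoder. The only difference is that we set framework_kwargs={"cross_encoder": True}. Check the documentation for further customization.

💭 Remember that a cross-encoder cannot be trained with triplets.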

[ ]:

# Load the dataset from Argilla
dataset_ssim = rg.FeedbackDataset.from_argilla("my_ssim_dataset", workspace="admin")

[ ]:

# Define the training task
task = TrainingTask.for_sentence_similarity(
    texts=[dataset_ssim.field_by_name("sentence-1"), dataset_ssim.field_by_name("sentence-2")],
    label=dataset_ssim.question_by_name("similarity")
)

[ ]:

# Fine-tune the cross-encoder
trainer_cross = ArgillaTrainer(
    dataset=dataset_ssim,
    task=task,
    framework="sentence-transformers",
    framework_kwargs={"cross_encoder": True}
)
trainer_cross.train(output_dir="my_cross_sentence_transformer_model")

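Using Our Models#

We are now at the final step: we will use our fine-tuned models with Haystack again to obtain new and better predictions.

To do so, we will use the same questions and document for RAG as before. If you have completed all the previous steps, you can continue.

Keep in mind that if you are using the same document, there is no need to reinitialize the document_store. However, make sure that its embedding_dim matches the embedding dimension of our fine-tuned model. If there is a mismatch, reinitialize the document_store with the correct value and run the TextIndexingPipeline again.

First, we initialize the EmbeddingRetriever with our fine-tuned bi-encoder model and update the embeddings. Then, we add a ranker by initializing the SentenceTransformersRanker with our fine-tuned cross-encoder model, which will boost our RAG model following the same idea as in the diagram at the top.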

[ ]:

# Initialize the EmbeddingRetriever with our model
retriever = EmbeddingRetriever(
    document_store=document_store, embedding_model="my_bi_sentence_transformer_model"
)
document_store.update_embeddings(retriever)

[ ]:

# Initialize the SentenceTransformersRanker with our model
ranker = SentenceTransformersRanker(model_name_or_path="my_cross_sentence_transformer_model")

As before, we also initialize the prompt node with the flan-t5-large model and the rag_prompt template.

[ ]:

# Write the prompt for RAG
rag_prompt = PromptTemplate(
    prompt="""Synthesize a comprehensive answer from the following text for the given question.
            Provide a clear and concise response.
            Your answer should be in your own words and be no longer than 50 words.
            \n\n Related text: {join(documents, delimiter=new_line, pattern=new_line+'Document[$idx]: $content', str_replace={new_line: ' ', '[': '(', ']': ')'})} \n Question: {query}; Answer: """,
            output_parser=AnswerParser(reference_pattern=r"Document\[(\d+)\]"),
)

# Initialize PromptNode
prompt_node = PromptNode(model_name_or_path="google/flan-t5-large", default_prompt_template=rag_prompt)

Now, let's create the final pipeline that connects all the components, and run it!

[ ]:

# Create the QA pipeline
pipe = Pipeline()
pipe.add_node(component=retriever, name="retriever", inputs=["Query"])
pipe.add_node(component=ranker, name="ranker", inputs=["retriever"])
pipe.add_node(component=prompt_node, name="prompt_node", inputs=["ranker"])

# Run the pipeline
questions = dataset["train"]["question"]
answers = []
contexts = []

for question in tqdm(questions):

    # Get the response and save it
    response = pipe.run(query=question)
    answers.append(response["answers"][0].answer)

    # Get the document context and save it
    prompt = response["answers"][0].meta['prompt']
    segments = re.split(r'Document\[\d+\]:', prompt)
    document_segments = [segment.strip() for segment in segments[1:]]
    contexts.append(document_segments)

[ ]:

# Question: the query asked the model
print(f"Question: {questions[0]}")

# Answer: the answer generated by the model
print(f"Answer: {answers[0]}")

# Contexts: a list of contexts retrieved by the model
print(f"Contexts: {contexts[0]}")

Question: What is the ticketing system used by Argilla for customer support?
Answer: Argilla Cloud's support services are designed to provide timely and efficient assistance for issues of varying severity, ensuring a smooth and reliable user experience. All plans include Monday to Friday during office hours (8am to 17pm CEST) with additional support upon request.
Contexts: ["In summary, Argilla Cloud's support services are designed to provide timely and efficient assistance for issues of varying severity, ensuring a smooth and reliable user experience. All plans include Monday to Friday during office hours (8am to 17pm CEST) with additional support upon request. The Support Channels and features of each tier are shown below: Starter: Slack Community. Severity 1 - Response time < 4 hours. Severity 2 - Response time < 8 hours. Severity 3 - Response time < 48 hours. Severity 4 not specified. Base: Ticketing System, Severity 1 - Response time < 4 hours.", 'This documents an overview of the Argilla Cloud service - a comprehensive Software as a Service (SaaS) solution for data labeling and curation. The service is specifically designed to meet the needs of businesses seeking a reliable, secure, and user-friendly platform for data management. The key components of our service include advanced security measures, robust data backup and recovery protocols, flexible pricing options, and dedicated customer support. The onboarding process is efficient, enabling clients to start using the service within one business day.', "They have the option to configure settings as per their team's requirements, including assigning datasets to specific workspaces and managing access permissions. Step 5: Training and Support Argilla provides open resources and support to aid in the onboarding process. This includes user manuals, tutorials, and access to our support team for any queries or issues that may arise during the setup and onboarding process. By following these steps, new users can be quickly on"]

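As we can see, the result is much better than before: the answer now draws on the support-services passage, and the ranker is giving us the best contexts! 👏

Now, all that remains is for you to experiment with the various parameters. I'm sure you can improve these results even further! 💪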