Use of the LLM system prompt by Knowledge Bases for Bedrock in RAG workflows

13 minute read
Content level: Expert

How Knowledge Bases for Bedrock leverages thorough system prompts and RAG content when querying the LLM to obtain high-quality responses

Knowledge Bases for Bedrock: retrieval workflow

Figure 1: Knowledge Bases for Bedrock: RAG workflow for the RetrieveAndGenerate API

Introduction

In this article, we analyze in detail how the “retrieve-and-generate” mechanism of Knowledge Bases for Amazon Bedrock (KBB), implemented by the RetrieveAndGenerate API, actually works. In particular, we will see how it leverages the “system” prompt feature of the Anthropic Claude models (available on Amazon Bedrock) to keep the actual user prompt separate from the RAG data and from very detailed guidance on the response structure. Because this guidance is cleanly isolated, the service can generate high-quality results for the user, based solely on the knowledge retrieved from the database, without altering the user’s question.

The LLM chosen by the user is fed the results obtained from the RAG database (OpenSearch Serverless in our case) managed by the KBB service, and constructs its answer solely from those results. To analyze this mechanism, we will leverage the Bedrock Model Invocation logging feature, which provides full observability of the prompts exchanged with the models for embedding and text generation.

System prompts

The Anthropic documentation clearly explains the value of the system prompt sent to the LLM in the same inference request as the user prompt.

What is a system prompt?

A system prompt is a way to provide context, instructions, and guidelines to Claude before presenting it with a question or task. By using a system prompt, you can set the stage for the conversation, specifying Claude’s role, personality, tone, or any other relevant information that will help it better understand and respond to the user’s input.

System prompts can include:

1) Task instructions and objectives
2) Personality traits, roles, and tone guidelines
3) Contextual information for the user input
4) Creativity constraints and style guidance
5) External knowledge, data, or reference material
6) Rules, guidelines, and guardrails
7) Output verification standards and requirements

Benefits of using system prompts

Incorporating well-crafted system prompts can significantly enhance Claude’s performance and output quality. 
Some key benefits include:
1. Improved role-playing and character consistency: When assigning Claude a specific role or personality 
through a system prompt, it can maintain that character more effectively throughout the conversation, 
exhibiting more natural and creative responses while staying in character.
2. Increased adherence to rules and instructions: System prompts can help Claude better understand 
and follow guidelines, making it less likely to perform prohibited tasks, output restricted content, 
or deviate from the given instructions.
3. Enhanced context understanding: By providing relevant background information or reference 
material in the system prompt, you can improve Claude’s comprehension of the user’s input and 
enable it to generate more accurate and context-aware responses.
4. Customized output formatting: System prompts can be used to specify desired output formats, 
such as headers, lists, tables, or code blocks, ensuring that Claude’s responses are structured and 
presented in a way that best suits your needs.

It’s important to note that while system prompts can increase Claude’s robustness and resilience 
against unwanted behavior, they do not guarantee complete protection against jailbreaks or leaks. 
However, they do provide an additional layer of guidance and control over Claude’s output.

We will see in the next sections how the prompt sent by the KBB service to the generation LLM includes many of the traits above to obtain high-quality answers.

Architecture of Knowledge Bases for Bedrock

The AWS documentation clearly presents the 4 steps (see Figure 1 above) of the standard Retrieval Augmented Generation (RAG) workflow implemented by KBB. KBB is a managed service that removes the “undifferentiated heavy lifting” from the hands of the customer. The workflow orchestrates how the requests to the GenAI models and to the knowledge base (implemented with a Vector DB) are sequenced during a call to the RetrieveAndGenerate API.

This can happen once the content of a data source (an S3 bucket, to keep things simple) has been indexed into the Vector DB by a so-called ingestion job, which indexes all content in the data source with embedding vectors. As per the Bedrock glossary, embedding is “The process of condensing information by transforming input into a vector of numerical values, known as the embeddings, in order to compare the similarity between different objects by using a shared numerical representation. For example, sentences can be compared to determine the similarity in meaning, images can be compared to determine visual similarity, or text and image can be compared to see if they're relevant to each other”.
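For readers who want to reproduce the call, here is a minimal sketch of a RetrieveAndGenerate request made with boto3. The helper names, the default region and the placeholder identifiers in the usage note are assumptions made for illustration; only the request shape follows the `retrieve_and_generate` operation of the `bedrock-agent-runtime` client.

```python
def build_rag_request(question, knowledge_base_id, model_arn):
    """Build the keyword arguments for a RetrieveAndGenerate call."""
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,
                "modelArn": model_arn,
            },
        },
    }


def ask_knowledge_base(question, knowledge_base_id, model_arn, region="us-west-2"):
    """Send the question to the knowledge base; return the answer text and citations."""
    # Imported lazily so that build_rag_request stays usable without boto3 installed.
    import boto3

    client = boto3.client("bedrock-agent-runtime", region_name=region)
    response = client.retrieve_and_generate(
        **build_rag_request(question, knowledge_base_id, model_arn)
    )
    return response["output"]["text"], response.get("citations", [])
```

A call could look like `ask_knowledge_base("what is django?", "<knowledge-base-id>", "<model-arn>")`, where both identifiers are placeholders to be replaced with the values of your own knowledge base and chosen model.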

So, the following 4 steps happen after the user submits a query to the service via RetrieveAndGenerate:

  1. The text of the user query is passed by the KBB service to the embedding engine, which computes its embedding vector, i.e. its position in the high-dimensional embedding space. In our example we use Cohere Embed English v3, which has 1,024 dimensions.
  2. The theory of embeddings tells us that 2 texts are **semantically** close to each other when a standard metric says so. Usually their Euclidean distance (close to 0) or cosine similarity (close to 1) is used to define this Semantic Text Similarity (STS). So, the Knowledge Base materialized by the Vector DB (Amazon OpenSearch Serverless in our case) is queried to find the knowledge elements that are close to the embedding of the user request.
  3. The question and the text chunks (details below) returned by the Vector DB as most similar to the question are supplied to the LLM as user prompt and system prompt, respectively, for response generation.
  4. The KBB service uses the response parts generated by the LLM to produce a nicely formatted answer, which is eventually returned to the user. Citations of the sources (i.e. the RAG elements) are included to allow the user to validate the answer based on the knowledge pieces that were used.
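To make the two similarity metrics of step 2 concrete, here is a small pure-Python sketch. The function names and the brute-force `top_k` ranking are illustrative assumptions: a vector database would typically rely on an approximate nearest-neighbor index rather than scanning every vector, and this is not how OpenSearch Serverless is implemented.

```python
import math


def euclidean_distance(a, b):
    """Distance between two vectors; close to 0 means the texts are similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; close to 1 means similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def top_k(query_vector, chunks, k=5):
    """Rank (text, vector) pairs by cosine similarity to the query, best first."""
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vector, chunk[1]),
        reverse=True,
    )
    return ranked[:k]
```

With real embeddings, each vector would hold 1,024 floats, as in the Cohere output shown later in this article; the toy vectors used to try the functions can be of any matching length.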

Workflow

We start by creating a Knowledge Base and loading it with the GitHub repository of the Python Django framework. We then run the initial ingestion job to create the index of the Vector DB with the embeddings of each file contained in this repository. To create your own KBB, follow the guidance of this introductory blog post.

In order to observe the various prompts, questions and responses, we need to activate Bedrock Model Invocation logging. By activating logging to S3, we get, in our bucket, a JSON file containing 2 objects for each request made to the RetrieveAndGenerate API.

For example, for the request “what is django?” submitted to KBB via the RetrieveAndGenerate API, we get in the JSON file stored in S3:
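As a rough sketch, logging to S3 can be enabled with the `put_model_invocation_logging_configuration` operation of the boto3 `bedrock` client, and the logged objects can then be read back one by one. The helper names, the bucket name and the key prefix are hypothetical, and the reader assumes the log file has already been downloaded and decoded to text. The bucket also needs a policy that allows the Bedrock service to write to it, which is not shown here.

```python
import json


def s3_logging_config(bucket_name, key_prefix):
    """Logging configuration delivering text and embedding data to S3."""
    return {
        "s3Config": {"bucketName": bucket_name, "keyPrefix": key_prefix},
        "textDataDeliveryEnabled": True,
        "embeddingDataDeliveryEnabled": True,
        "imageDataDeliveryEnabled": False,
    }


def enable_invocation_logging(bucket_name, key_prefix, region="us-west-2"):
    """Activate Bedrock Model Invocation logging to the given bucket."""
    # Imported lazily so the other helpers work without boto3 installed.
    import boto3

    client = boto3.client("bedrock", region_name=region)
    client.put_model_invocation_logging_configuration(
        loggingConfig=s3_logging_config(bucket_name, key_prefix)
    )


def iter_log_objects(text):
    """Yield each JSON object found in the text of a log file, in order."""
    decoder = json.JSONDecoder()
    position = 0
    while True:
        # Skip whitespace and newlines between consecutive objects.
        while position < len(text) and text[position].isspace():
            position += 1
        if position >= len(text):
            return
        obj, position = decoder.raw_decode(text, position)
        yield obj
```

Iterating over a log file and printing `obj["modelId"]` for each object is a quick way to tell the embedding request apart from the text generation request.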

  • a first JSON object (available in full in this GitHub gist) representing the request to and the response from Cohere Embed English, the embedding engine that we chose when configuring our Bedrock KB. We see below the beginning of this object: it continues with a long list of 1,024 floats, since Cohere embeddings comprise 1,024 dimensions.
{
  "schemaType": "ModelInvocationLog",
  "schemaVersion": "1.0",
  "timestamp": "2024-05-30T06:22:26Z",
  "accountId": "<account-id>",
  "identity": {
    "arn": "<identity-arn>"
  },
  "region": "us-west-2",
  "requestId": "de4843a4-8b97-46a9-b005-878dfdf0a123",
  "operation": "InvokeModel",
  "modelId": "arn:aws:bedrock:us-west-2::foundation-model/cohere.embed-english-v3",
  "input": {
    "inputContentType": "application/json",
    "inputBodyJson": {
      "texts": [
        "what is django?"
      ],
      "input_type": "search_query",
      "truncate": "NONE"
    },
    "inputTokenCount": 5
  },
  "output": {
    "outputContentType": "application/json",
    "outputBodyJson": {
      "id": "de4843a4-8b97-46a9-b005-878dfdf0a123",
      "texts": [
        "what is django?"
      ],
      "embeddings": [
        [
          -0.028045654,
          0.036346436,
          -0.055114746,
          -0.021499634,
          0.030227661,
          -0.007637024,
          -0.010116577,
          0.026992798,
          -0.0131073,

  • Then, Bedrock KB calls Amazon OpenSearch Serverless with the embedding vector of the user request to get the chunks of content semantically close to the question. These chunks will be included in the query to the LLM via the XML part of the system prompt. (This call is not present in the trace as it is not a Bedrock mechanism.)

  • Next, Bedrock KB crafts a prompt (user + system, available in full in this GitHub gist) using Anthropic’s Messages API to inject a system prompt containing both the guidance on how to generate the response and the content elements:

"input": {
    "inputContentType": "application/json",
    "inputBodyJson": {
      "anthropic_version": "bedrock-2023-05-31",
      "messages": [
        {
          "role": "user",
          "content": [
            {
              "type": "text",
              "text": "what is django?"
            }
          ]
        }
      ],
      "system": "You are a question answering agent. I will provide you with a set of search results. The user will provide you with a question. [etc. See gist for whole content] ",
      "max_tokens": 2048,
      "temperature": 0,
      "top_p": 1,
      "stop_sequences": [
        "\nObservation"
      ],
      "top_k": 50
    }

Part of the “system” section (removed from the extract above but available in the gist) consists of the search results returned by the Vector DB. They are:

<search_results>
    <search_result>
        <content>\n:source:`A copy of the Python license 
            <LICENSE.python>` is included with Django for compliance with Python's terms.  Which sites use Django? =======================  `BuiltWithDjango.com`_ features a constantly growing list of Django-powered sites.  .. _BuiltWithDjango.com: https://builtwithdjango.com/projects/  .. _faq-mtv:  Django appears to be a MVC framework, but you call the Controller the \"view\", and the View the \"template\". How come you don't use the standard names?\n
            </content>
            <source>1</source>
     </search_result>
      <search_result>
            <content>\n================================== Organization of the Django Project ==================================  Principles ==========  The Django Project is managed by a team of volunteers pursuing three goals:  - Driving the development of the Django web framework, - Fostering the ecosystem of Django-related software, - Leading the Django community in accordance with the values described in the   `Django Code of Conduct`_.  The Django Project isn't a legal entity. The `Django Software Foundation`_, a non-profit organization, handles financial and legal matters related to the Django Project. Other than that, the Django Software Foundation lets the Django Project manage the development of the Django framework, its ecosystem and its community.  .. _Django Code of Conduct: https://www.djangoproject.com/conduct/ .. _Django Software Foundation: https://www.djangoproject.com/foundation/  ..\n</content>\n
            <source>2</source>
      </search_result>
      <search_result>
           <content>\n============ FAQ: General ============  Why does this project exist? ============================  Django grew from a very practical need: World Online, a newspaper web operation, is responsible for building intensive web applications on journalism deadlines. In the fast-paced newsroom, World Online often has only a matter of hours to take a complicated web application from concept to public launch.  At the same time, the World Online web developers have consistently been perfectionists when it comes to following best practices of web development.  In fall 2003, the World Online developers (Adrian Holovaty and Simon Willison) ditched PHP and began using Python to develop its websites. As they built intensive, richly interactive sites such as Lawrence.com, they began to extract a generic web development framework that let them build web applications more and more quickly. They tweaked this framework constantly, adding improvements over two years.  In summer 2005, World Online decided to open-source the resulting software, Django.\n</content>\n
            <source>3</source>\n
       </search_result>
       <search_result>
            <content>\n============================================  No, Django is not a CMS, or any sort of \"turnkey product\" in and of itself. It's a web framework; it's a programming tool that lets you build websites.  For example, it doesn't make much sense to compare Django to something like Drupal_, because Django is something you use to *create* things like Drupal.  Yes, Django's automatic admin site is fantastic and timesaving -- but the admin site is one module of Django the framework. Furthermore, although Django has special conveniences for building \"CMS-y\" apps, that doesn't mean it's not just as appropriate for building \"non-CMS-y\" apps (whatever that means!).  .. _Drupal: https://www.drupal.org/  How can I download the Django documentation to read it offline?\n</content>\n
            <source>4</source>
       </search_result>
       <search_result>
            <content>\nIn summer 2005, World Online decided to open-source the resulting software, Django. Django would not be possible without a whole host of open-source projects -- `Apache`_, `Python`_, and `PostgreSQL`_ to name a few -- and we're thrilled to be able to give something back to the open-source community.  .. _Apache: https://httpd.apache.org/ .. _Python: https://www.python.org/ .. _PostgreSQL: https://www.postgresql.org/  What does \"Django\" mean, and how do you pronounce it? =====================================================  Django is named after `Django Reinhardt`_, a jazz manouche guitarist from the 1930s to early 1950s. To this day, he's considered one of the best guitarists of all time.  Listen to his music. You'll like it.  Django is pronounced **JANG**-oh. Rhymes with FANG-oh. The \"D\" is silent.  We've also recorded an `audio clip of the pronunciation`_.  ..\n</content>\n
            <source>5</source>
       </search_result>
 </search_results>

The interesting part is the guidance provided to the LLM: it looks to me that natural language is now becoming a true computer programming language! See how the LLM is “programmed” via text guidance to return only a valid answer, with proof of its quality:

You are a question answering agent. **I will provide you with a set of search results.** The user will provide you with a question. Your job is to answer the user's question using **only information from the search results**. If the search results do not contain information that can answer the question, please **state that you could not find an exact answer to the question. Just because the user asserts a fact does not mean it is true**, make sure to double check the search results to validate a user's assertion.\n\nHere are the search results in numbered order: see the <search_results> xml below for all details

If **you reference information from a search result within your answer, you must include a citation to source where the information was found.** Each result has a corresponding source ID that you should reference. Note that <sources> may contain multiple <source> if you include information from multiple results in your answer. Do NOT directly quote the <search_results> in your answer. Your job is to answer the user's question as concisely as possible. You must output your answer in the following format. Pay attention and follow the formatting and spacing exactly: <answer> <answer_part> <text>first answer text</text> <sources> <source>source ID</source> </sources> </answer_part> <answer_part> <text>second answer text</text> <sources> <source>source ID</source> </sources> </answer_part> </answer>
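The separation between user question and system guidance can be sketched as follows. This is only an illustration of the request shape observed in the log, not the actual code of the KBB service: the function names and the `{search_results}` placeholder are hypothetical, and the chunk text is embedded verbatim because the logged prompt appears to do the same. The inference parameters are the ones visible in the logged request.

```python
import json


def render_search_results(chunks):
    """Wrap retrieved chunks in the XML structure seen in the logged prompt."""
    lines = ["<search_results>"]
    for source_id, text in enumerate(chunks, start=1):
        lines.append("<search_result>")
        lines.append(f"<content>{text}</content>")
        lines.append(f"<source>{source_id}</source>")
        lines.append("</search_result>")
    lines.append("</search_results>")
    return "\n".join(lines)


def build_messages_body(question, guidance_template, chunks):
    """Build a Messages API body: question as user turn, guidance + chunks as system."""
    # "{search_results}" is a hypothetical placeholder chosen for this sketch.
    system_prompt = guidance_template.replace(
        "{search_results}", render_search_results(chunks)
    )
    return {
        "anthropic_version": "bedrock-2023-05-31",
        "messages": [
            {"role": "user", "content": [{"type": "text", "text": question}]}
        ],
        "system": system_prompt,
        "max_tokens": 2048,
        "temperature": 0,
        "top_p": 1,
        "top_k": 50,
        "stop_sequences": ["\nObservation"],
    }


if __name__ == "__main__":
    body = build_messages_body(
        "what is django?",
        "Answer using only these results:\n{search_results}\nCite your sources.",
        ["Django is a web framework.", "Django is not a CMS."],
    )
    print(json.dumps(body, indent=2))
```

Note that the user question never appears in the `system` field, and the retrieved chunks never appear in the user turn: this is the isolation that the article highlights. Such a body could be sent with the `invoke_model` operation of the boto3 `bedrock-runtime` client, passing `json.dumps(body)` as the request body.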

  • Finally, KBB gets the answer below from the LLM (we chose Claude 3 Haiku here): it matches the structure required by the guidance and, in particular, contains source references matching the search results. The UI of KBB completes the task by returning a nicely HTML-formatted version of the returned answer.
<answer>
    <answer_part>
        <text>
            Django is a web framework written in Python. It was originally developed by a team of developers at World Online, a newspaper web operation, to build web applications quickly and efficiently. Django was open-sourced in 2005 and has since grown into a popular and widely-used web framework.\n
        </text>
        <sources>
            <source>3</source>
        </sources>
    </answer_part>
    <answer_part>
        <text>
            Django is not a content management system (CMS) or a \"turnkey product\". It is a programming tool that allows developers to build websites and web applications. Django provides many features and conveniences that make web development faster and more efficient, but it is ultimately a framework that developers use to create their own custom applications.\n
        </text>
        <sources>
            <source>4</source>
        </sources>
    </answer_part>
</answer>       
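To illustrate how such a structured answer can be turned into text with citations, here is a minimal sketch based on the Python standard library. The function name is hypothetical and this is not the actual post-processing done by KBB. It also assumes that the model returned well-formed XML, which is not guaranteed: `ET.fromstring` raises a `ParseError` otherwise.

```python
import xml.etree.ElementTree as ET


def parse_answer(xml_text):
    """Return a list of (text, [source IDs]) tuples, one per <answer_part>."""
    root = ET.fromstring(xml_text.strip())
    parts = []
    for part in root.findall("answer_part"):
        text = (part.findtext("text") or "").strip()
        # Source IDs are kept as strings, exactly as the model wrote them.
        sources = [
            source.text.strip()
            for source in part.findall("sources/source")
            if source.text and source.text.strip()
        ]
        parts.append((text, sources))
    return parts
```

Each source ID can then be matched with the `<source>` number of the corresponding search result to display the citation next to the answer part.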

Conclusion

In this article, by observing a query made to a Bedrock Knowledge Base via the RetrieveAndGenerate API, we analyzed the orchestration of the requests made to the various components (embedding engine, vector database, LLM generating the final answer) of a canonical RAG-based system.

All those coordinated interactions demonstrate the value of a managed service like Knowledge Bases for Bedrock. The user only supplies content in a standard S3 bucket and asks simple questions like “what is django?”. The service takes care of the setup and operation of high-end components, and orchestrates the interactions among them to deliver high-quality, verifiable responses to those questions. The very thorough natural-language guidance included in the system prompt supplied to Claude 3 Haiku plays a key role in the quality of the results.