Prompt Flow in Azure ML: exploring new features

Ben Burtenshaw

NLP Engineer

In Natural Language Processing (NLP), precise model prompting is paramount: minor fluctuations in phrasing or structure can affect the quality and consistency of the generated text. Automatic Prompt Engineering (APE) is a technique for optimising prompt definitions, crafting prompts that steer language models towards more accurate and contextually relevant outputs while reducing inaccuracies.


Automatic prompt engineering is a mechanism for optimising how effectively large language models (LLMs) generate relevant and accurate outputs. While LLMs demonstrate immense capabilities across a wide variety of tasks, their performance is often contingent on the quality and structure of the prompts fed to them. With automatic prompt engineering, the objective is to systematically fine-tune these prompts against evaluation metrics, ensuring that the model’s responses align with the intended context and are precise. Techniques like chain-of-thought (CoT) prompting, which encourages step-by-step reasoning, or the Zero-shot-CoT method, can surface knowledge embedded within LLMs that a naive prompt would miss. This not only amplifies the model’s zero-shot reasoning skills but also underscores how much latent potential can be unlocked through strategic prompting, while eliminating the often tedious manual task of crafting prompts.
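As a concrete illustration, Zero-shot-CoT leaves the question untouched and simply appends a reasoning trigger; a minimal example in Python:

question = "A bakery sells 12 boxes of 6 muffins each. How many muffins is that in total?"

plain_prompt = question
# Zero-shot-CoT: append the trigger phrase to elicit step-by-step reasoning
zero_shot_cot_prompt = f"{question}\nLet's think step by step."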

APE’s significance is highlighted in the paper “Large Language Models are Human-level Prompt Engineers”, which delineates three pivotal strategies for optimising LLM outputs:

  • LLM as an Inference Model: This is the conventional LLM usage where a direct question yields a straightforward answer.
  • LLM as a Resampling/Paraphrase Model: Here, the LLM suggests alternative phrasings for a given question, potentially enhancing clarity or specificity.
  • LLM as a Scoring Model: The LLM evaluates the quality of answers based on criteria like truthfulness, informativeness, and objectivity. This evaluation can incorporate metrics such as ROUGE and BLEU, or even regular expressions that check for the presence of links, quotes, etc.

Following this evaluation, one can refine the prompts either by resampling or by making incremental adjustments to enhance their effectiveness.
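Conceptually, both refinement routes follow the same generate-score-select loop. The sketch below is purely illustrative: paraphrase stands in for a resampling LLM call and score for an evaluation such as ROUGE, BLEU, or an LLM-based judge over a small development set.

import random


def paraphrase(prompt: str) -> str:
    # Stand-in for a resampling LLM that rewrites the prompt
    return prompt + " (rephrased)"


def score(prompt: str, eval_set) -> float:
    # Stand-in for scoring the prompt on (question, reference) pairs
    return random.random()


def optimise_prompt(seed_prompt: str, eval_set, n_candidates: int = 10, n_rounds: int = 3):
    """Illustrative APE loop: generate candidate prompts, score them, keep the best."""
    best_prompt, best_score = seed_prompt, score(seed_prompt, eval_set)
    for _ in range(n_rounds):
        for candidate in (paraphrase(best_prompt) for _ in range(n_candidates)):
            candidate_score = score(candidate, eval_set)
            if candidate_score > best_score:
                best_prompt, best_score = candidate, candidate_score
    return best_prompt, best_score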

In this guide, we’ll delve into the latter approach. Building on our prior exploration of Vector Index creation using AzureML, we’ll now integrate this Vector Index with an LLM. Our focus will be on fine-tuning prompts based on evaluation metrics, ensuring that the LLM’s responses are not only accurate but also contextually relevant.

Hands-on guide: combining Retrieval Augmented Generation (RAG) with Automatic Prompt Engineering (APE) in AzureML

Connecting to AzureML Workspace

First, input your workspace details; these are written to a workspace.json file in your current directory, which the next steps rely on. The MLClient is your primary interface with AzureML: it's through this client that you'll interact with all the functionalities AzureML offers.
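The workspace.json file follows the standard Azure ML configuration format; the values below are placeholders:

{
    "subscription_id": "<your-subscription-id>",
    "resource_group": "<your-resource-group>",
    "workspace_name": "<your-workspace-name>"
}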

from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential
from azure.ai.ml import MLClient
from azureml.core import Workspace

# Authenticate interactively; DefaultAzureCredential can be used instead in non-interactive environments
credential = InteractiveBrowserCredential()

# v2 client, created from the workspace.json config file
ml_client = MLClient.from_config(credential=credential, path="workspace.json")

# Handle to the same workspace through the v1 SDK (azureml.core)
ws = Workspace(
    subscription_id=ml_client.subscription_id,
    resource_group=ml_client.resource_group_name,
    workspace_name=ml_client.workspace_name,
)

 

Creating the Vector Index

For this tutorial, we’ll use a public git repository as our data source, and the text-embedding-ada-002 embedding model from Azure OpenAI to build the FAISS Vector Index. If you're interested in other data sources, embedding types, or vector index stores, there are additional resources available.

Once you’ve set up your workspace connection for Azure OpenAI, ensure that the text-embedding-ada-002 model has been deployed and is ready for inference. If not, you'll need to deploy the model using Azure OpenAI.
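The pipeline below also needs the resource ID of that Azure OpenAI workspace connection. Assuming the connection has already been created in the workspace (the connection name used here is a placeholder), it can be looked up through the MLClient:

# Look up the Azure OpenAI workspace connection (the name is a placeholder)
aoai_connection = ml_client.connections.get("my-aoai-connection")
aoai_connection_id = aoai_connection.id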

Setting Up the Pipeline

AzureML Pipelines are a powerful tool for connecting multiple components seamlessly. Each component has inputs, code that processes those inputs, and outputs. A pipeline also has its own inputs and outputs, which are determined by the individual components it connects.

For our purposes, we’ll be chaining together multiple components, each responsible for a specific step in the workflow. These components are published to a registry, which you should have access to by default.
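The component definitions live in an AzureML registry rather than in your workspace, so a second, registry-scoped client is needed. A minimal sketch, assuming the components are published to the public azureml registry:

# Registry-scoped client used to fetch the published pipeline components
ml_registry = MLClient(credential=credential, registry_name="azureml")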

auto_prompt_component = ml_registry.components.get("llm_autoprompt_qna", label="latest")

register_auto_prompt_data_component = ml_registry.components.get(
    "llm_rag_register_autoprompt_data_asset", label="latest"
)

promptflow_creation_component = ml_registry.components.get(
    "llm_rag_create_promptflow", label="latest"
)

# The remaining components used in the pipeline below (git_clone_component,
# crack_and_chunk_component, generate_embeddings_component, create_faiss_index_component,
# register_mlindex_component, data_generation_component and register_qa_data_component)
# are retrieved from the same registry in the same way.

To build the pipeline, we define a Python function, decorated with @pipeline, that connects the components we’ve discussed. The function takes in parameters such as the git URL, branch name, and embeddings model, and returns a dictionary defining the pipeline’s outputs.
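The pipeline body below relies on two small helper functions, use_automatic_compute and use_aoai_connection, whose definitions are not included in this post. A hedged sketch of what they might look like is given here; the default instance type and the environment-variable name used to pass the connection are assumptions and may differ from the official AzureML RAG samples:

def use_automatic_compute(component, instance_count=1, instance_type="Standard_E8s_v3"):
    # Ask AzureML to provision compute for this pipeline step automatically
    component.set_resources(
        instance_count=instance_count,
        instance_type=instance_type,
        properties={"compute_specification": {"automatic": True}},
    )
    return component


def use_aoai_connection(component, aoai_connection_id):
    # Pass the Azure OpenAI connection to the step via an environment variable
    # (the variable name is an assumption)
    component.environment_variables = dict(
        component.environment_variables or {},
        AZUREML_WORKSPACE_CONNECTION_ID_AOAI=aoai_connection_id,
    )
    return component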

 

from azure.ai.ml.dsl import pipeline


@pipeline(default_compute="serverless")
def git_to_faiss_with_testgen_and_autoprompt(
    git_url: str,
    branch_name: str,
    embeddings_model: str,
    asset_name: str,
    llm_completion_config: str,
    data_source_glob: str = None,
    data_source_url: str = None,
    document_path_replacement_regex: str = None,
    aoai_connection_id: str = None,
):
    """Generate embeddings from a git repository and create a Faiss index; in parallel, generate test data and evaluate prompts."""
    # Vector index creation
    git_clone = git_clone_component(git_repository=git_url, branch_name=branch_name)
    use_automatic_compute(git_clone)

    crack_and_chunk = crack_and_chunk_component(
        input_data=git_clone.outputs.output_data,
        input_glob=data_source_glob,
        chunk_size=1024,
        data_source_url=data_source_url,
        document_path_replacement_regex=document_path_replacement_regex,
    )
    use_automatic_compute(crack_and_chunk)

    generate_embeddings = generate_embeddings_component(
        chunks_source=crack_and_chunk.outputs.output_chunks,
        embeddings_model=embeddings_model,
    )
    use_automatic_compute(generate_embeddings)
    use_aoai_connection(generate_embeddings, aoai_connection_id)

    create_faiss_index = create_faiss_index_component(
        embeddings=generate_embeddings.outputs.embeddings,
    )
    use_automatic_compute(create_faiss_index)

    register_mlindex = register_mlindex_component(
        storage_uri=create_faiss_index.outputs.index, asset_name=asset_name
    )
    use_automatic_compute(register_mlindex)

    # Test data generation
    data_generation = data_generation_component(
        input_data=crack_and_chunk.outputs.output_chunks,
        llm_config=llm_completion_config,
        dataset_size=50,
    )
    use_automatic_compute(data_generation)
    use_aoai_connection(data_generation, aoai_connection_id)

    register_qa_data = register_qa_data_component(
        storage_uri=data_generation.outputs.output_data,
        asset_name=asset_name,
        register_output=True,
    )
    use_automatic_compute(register_qa_data)

    # Autoprompt creation
    auto_prompt = auto_prompt_component(
        llm_config=llm_completion_config,
        task_type="abstractive",
        primary_metric="gpt_similarity",
        n_prompts=10,
        dev_data=data_generation.outputs.output_data,
        test_data=data_generation.outputs.output_data,
        best_of=100,
        top_k=3,
    )
    use_automatic_compute(auto_prompt)
    use_aoai_connection(auto_prompt, aoai_connection_id)

    register_auto_prompt_data = register_auto_prompt_data_component(
        storage_uri=auto_prompt.outputs.best_prompt,
        asset_name=asset_name,
        register_output=True,
    )
    use_automatic_compute(register_auto_prompt_data)

    # Prompt flow creation
    promptflow = promptflow_creation_component(
        best_prompts=auto_prompt.outputs.best_prompt,
        mlindex_asset_id=register_mlindex.outputs.asset_id,
        mlindex_name=asset_name,
        llm_connection_name=aoai_connection_id,
        llm_config=llm_completion_config,
        embedding_connection=aoai_connection_id,
        embeddings_model=embeddings_model,
    )
    use_automatic_compute(promptflow)

    return {
        "mlindex_asset_uri": create_faiss_index.outputs.index,
        "mlindex_asset_id": register_mlindex.outputs.asset_id,
        "best_prompts": auto_prompt.outputs.best_prompt,
        "test_data": data_generation.outputs.output_data,
    }

 

Submitting the Pipeline

Once you’ve set up your pipeline, it’s time to submit it. If you encounter any errors during this process, there are troubleshooting resources available. After submission, you can inspect the output of each step in the pipeline via the Workspace UI.
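Before instantiating the pipeline, you need concrete values for its inputs. The snippet below is a hedged sketch: the repository URL, glob, deployment names, and asset name are placeholders, and the exact shape of llm_completion_config and of the embeddings model URI should be checked against the documentation of the components in your registry.

import json

# Placeholder values: adapt these to your own repository and deployments
git_url = "https://github.com/<org>/<repo>"
data_source_glob = "**/*.md"
data_source_url = None
document_path_replacement_regex = None
asset_name = "my-docs-mlindex"

# aoai_connection_id was retrieved earlier via ml_client.connections.get(...)

# Assumed URI format pointing at the text-embedding-ada-002 deployment
embeddings_model_uri = (
    "azure_open_ai://deployment/text-embedding-ada-002/model/text-embedding-ada-002"
)

# Assumed config for the completion model used by the test generation and autoprompt steps
llm_completion_config = json.dumps(
    {
        "type": "azure_open_ai",
        "model_name": "gpt-35-turbo",
        "deployment_name": "gpt-35-turbo",
        "temperature": 0,
        "max_tokens": 1500,
    }
)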

from azure.ai.ml import Input
import json

pipeline_job = git_to_faiss_with_testgen_and_autoprompt(
    git_url=git_url,
    branch_name=None,
    data_source_glob=data_source_glob,
    data_source_url=data_source_url,
    document_path_replacement_regex=document_path_replacement_regex,
    embeddings_model=embeddings_model_uri,
    aoai_connection_id=aoai_connection_id,
    llm_completion_config=llm_completion_config,
    # Name of the asset to register the MLIndex under; other assets (like prompt and test data) derive their names from it
    asset_name=asset_name,
)

# Tag the job so the Workspace UI recognises the MLIndex asset it produces
pipeline_job.properties["azureml.mlIndexAssetName"] = asset_name
pipeline_job.properties["azureml.mlIndexAssetKind"] = "faiss"
pipeline_job.properties["azureml.mlIndexAssetSource"] = "Git Repository"

running_pipeline_job = ml_client.jobs.create_or_update(
    pipeline_job, experiment_name="mlindex_with_testgen_and_autoprompt"
)

 

Reviewing the Data and Prompts

After the pipeline job finishes, you’ll have access to the autogenerated test dataset. This dataset is registered as a URI File data asset in your workspace. You can review this data either by browsing it on the ‘Explore’ tab of the registered data asset or by using the provided code.

Additionally, you can review the best prompts and their associated evaluation metrics. This will give you insights into the performance of your model and the quality of the prompts.

import fsspec
import pandas as pd

# Load the autogenerated QA test data registered by the pipeline
qa_data = ml_client.data.get(f"{asset_name}-test-data", label="latest")
with fsspec.open(qa_data.path) as f:
    df = pd.read_json(f, lines=True)
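The best prompts produced by the autoprompt step are registered in the same way. The asset name and file format below are assumptions that follow the pattern used for the test data; check the registered data assets in your workspace for the exact name:

# Read the registered autoprompt output (asset name and format are assumptions)
prompt_data = ml_client.data.get(f"{asset_name}-autoprompt-data", label="latest")
with fsspec.open(prompt_data.path) as f:
    best_prompts_df = pd.read_json(f, lines=True)  # drop lines=True if the output is plain JSON
print(best_prompts_df.head())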

 

Testing with Prompt Flow

Prompt Flow offers an interactive experience for testing and authoring Retrieval-Augmented Generation (RAG). Once you’re satisfied with your setup, you can save and bulk test the entire RAG process using the autogenerated test dataset.


Conclusion

AzureML Prompt Flow offers a comprehensive suite of tools for NLP practitioners. The advantages of Retrieval Augmented Generation (RAG) are well documented: by integrating a vector index with an LLM, you can enhance the capabilities of your models, leading to more accurate and dynamic results. By combining Automatic Prompt Engineering with RAG, we get even more out of the model through an optimised prompt.

The next step is to optimise the models themselves through finetuning. In the next series, I will tackle finetuning techniques and focus on open-source LLMs.
