# Integrating the Llumo Evaluation API into an LLM Pipeline

This guide explains how to integrate Llumo's evaluation API into an LLM pipeline built on top of the OpenAI API. Llumo can help you gain complete insights into your LLM outputs and customer success using its proprietary framework, EvalLM. We'll go through the process step by step to make it easy for developers of all levels.

## Setting up the environment

### Importing libraries

  • os: This module provides a way to use operating system dependent functionality, like reading environment variables.
  • OpenAI: This is the official OpenAI Python client, used to interact with OpenAI’s API.
  • requests: A popular library for making HTTP requests in Python.
  • json: Used for parsing JSON data, which is common in API responses.
  • logging: Provides a flexible framework for generating log messages in Python.
  • getpass: Allows secure password prompts where the input is not displayed on the screen.

### Setting up logging

  • We use logging.basicConfig() to configure the logging system. The level=logging.INFO argument sets the threshold for logging messages to INFO level and above.
  • We create a logger object named logger that we’ll use throughout our script to log important information and errors.

This setup ensures we have all necessary tools to interact with APIs, handle data, and track any issues that might occur during execution.

!pip install openai
import os
from openai import OpenAI
import requests
import json
import logging
from getpass import getpass

# Set up logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

print("Libraries imported and logging set up successfully.")

## Secure API key handling

### Secure API key input

  • We use getpass() to prompt the user for their API keys. This function hides the input, making it more secure than using regular input().
  • This approach is safer than hardcoding API keys in your script, which could accidentally be shared or exposed.

### Setting environment variables

  • We use os.environ to set environment variables for both API keys.
  • Environment variables are a secure way to store sensitive information, as they’re not part of your code and are only accessible within the current process.

### Initializing the OpenAI client

  • We create an instance of the OpenAI client using the API key we just set.
  • Using os.getenv("OPENAI_API_KEY") retrieves the API key from the environment variables.

This method ensures that your API keys are handled securely and are readily available for use in your script.

# Cell 2: Secure API key handling

# Securely input API keys
openai_api_key = getpass("Enter your OpenAI API key: ")
llumo_api_key = getpass("Enter your Llumo API key: ")

# Set environment variables
os.environ['OPENAI_API_KEY'] = openai_api_key
os.environ['LLUMO_API_KEY'] = llumo_api_key

del openai_api_key
del llumo_api_key


# Initialize OpenAI client
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

print("API keys securely set and OpenAI client initialized.")

## Defining the Llumo evaluation function

### Function definition

  • We define a function evaluate_with_llumo that takes a prompt and output as inputs.

### API setup

  • We retrieve the Llumo API key from environment variables.
  • We set the API endpoint and prepare headers for the HTTP request.

### Payload preparation

  • We create a payload dictionary with the prompt, output, and predefined analytics type (“Clarity”).

### API request

  • We use requests.post() to send a POST request to the Llumo API.
  • response.raise_for_status() will raise an exception for HTTP errors.

### Response parsing

  • We parse the JSON response and extract the evaluation data.
  • We print the API response for debugging purposes.

### Error handling

  • We use a try-except block to catch potential errors:

    • JSON decoding errors
    • Request exceptions
  • If an error occurs, we log it and return an empty dictionary with a failure indicator.

### Return values

  • The function returns a tuple containing:
    • Evaluation data (or empty dictionary if evaluation failed)
    • Success boolean

This function encapsulates the entire process of interacting with the Llumo API for text evaluation, including error handling and result processing.

def evaluate_with_llumo(prompt, output):
    LLUMO_API_KEY = os.getenv("LLUMO_API_KEY")
    LLUMO_ENDPOINT = "https://app.llumo.ai/api/create-eval-analytics"
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {LLUMO_API_KEY}"
    }
    payload = {
        "prompt": prompt,
        "input": {},
        "output": output,
        "analytics": ["Clarity"]
    }

    try:
        response = requests.post(LLUMO_ENDPOINT, json=payload, headers=headers)
        response.raise_for_status()  # Raise an exception for HTTP error status codes

        try:
            result = response.json()
            print("API Response:", result)
            return result.get('data', {}), True
        except json.JSONDecodeError as json_error:
            logger.error(f"Error decoding JSON: {str(json_error)}")
            return {}, False

    except requests.RequestException as e:
        logger.error(f"Error evaluating with Llumo: {str(e)}")
        return {}, False
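Before wiring the function into the pipeline, it can be called directly as a quick sanity check. The prompt/output pair below is made up for illustration, and the exact keys inside the returned analytics data depend on the Llumo API response:

# Quick sanity check with a made-up prompt/output pair
sample_data, ok = evaluate_with_llumo(
    prompt="Explain why logging is useful in Python scripts.",
    output="Logging records events and errors without relying on print statements."
)
print("Evaluation succeeded:", ok)
print("Returned analytics:", sample_data)  # key names depend on the Llumo API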

## Getting a response from OpenAI

### Define an example prompt

  • We define an example prompt to be used for testing without compression.


The code below defines the example prompt and generates a response from the OpenAI API without applying any compression. It covers the prompt definition, the API request, and extraction of the response text.

# Cell 4: Define example prompt and test without compression

# Example prompt
prompt = """
Quite a long time ago, there lived two talking birds with their parents. One day when their parents were away, a villager who generally had an eye on those special birds caught them and took them away. One of the birds got away from the trap and looked for his parents. The bird, at last, arrived at a hermitage where it was welcomed and had some food. He lived joyfully.
An explorer once ran over the other bird that the villager caught. He talked very rudely to him. He was shocked to see a talking bird but at the same time was angry with his way of behaving. The explorer visited the hermitage and recognised a similar talking bird, yet this bird talked respectfully and welcomed him to remain there.
Moral of the Story
Staying in a good company gives us a good way of behaving. Bad company influences our way of behaving negatively.
"""



response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt}
    ]
)
openai_output = response.choices[0].message.content  # Extract the assistant's reply text
print(openai_output)
print(f"Tokens used: {response.usage.total_tokens}\n")

## Evaluating responses with Llumo

### Evaluate the OpenAI response

  • We evaluate the response generated by the OpenAI API using the Llumo evaluation function.
  • We call the evaluate_with_llumo function with the example prompt and the openai_output.
  • We check if the evaluation was successful:
    • If successful, we print the Llumo evaluation results.
    • If it fails, we print an error message; the pipeline can then fall back to the unevaluated output (a minimal sketch of this fallback follows the code below).

print("Evaluating Responses:")
llumo_result, success = evaluate_with_llumo(prompt, openai_output)

if success:
    print("Llumo Evaluation Results:")
    for analytic, score in llumo_result.items():
        print(f"{analytic}: {score}")
else:
    print("Evaluation with Llumo failed.")
    # Fall back to the original, unevaluated output if evaluation fails (see the sketch below)
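
If the evaluation call fails, one simple way to keep the pipeline moving is to fall back to the unevaluated OpenAI output. The sketch below assumes the success flag and openai_output from the cells above; the final_output name is only illustrative:

# Fallback sketch: continue with the raw OpenAI output when evaluation is unavailable
final_output = openai_output
if not success:
    logger.warning("Llumo evaluation unavailable; continuing with the unevaluated output.")

print(final_output)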